1
Colas JT, Dundon NM, Gerraty RT, Saragosa‐Harris NM, Szymula KP, Tanwisuth K, Tyszka JM, van Geen C, Ju H, Toga AW, Gold JI, Bassett DS, Hartley CA, Shohamy D, Grafton ST, O'Doherty JP. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Hum Brain Mapp 2022; 43:4750-4790. [PMID: 35860954] [PMCID: PMC9491297] [DOI: 10.1002/hbm.25988]
Abstract
The model-free algorithms of "reinforcement learning" (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This "generalized reinforcement learning" (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
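The core GRL idea described above can be sketched in a few lines. This is an illustrative toy, not the authors' fitted model: the two-state/two-action table, parameter names, and the sign-flipped counterfactual updates are assumptions meant to convey how a single RPE can be relayed for bidirectional updating with inverse generalization.

```python
# Hypothetical sketch of generalized reinforcement learning (GRL): a single
# reward-prediction error (RPE) from the experienced state-action pair also
# updates related pairs via generalization weights. All names and the
# anticorrelated-option structure are illustrative assumptions.

def grl_update(Q, state, action, reward, alpha=0.5, g_state=0.3, g_action=0.3):
    """One GRL step on a value table Q[state][action] with two states/actions."""
    other_state = 1 - state
    other_action = 1 - action

    # Single RPE from the experienced state and action.
    rpe = reward - Q[state][action]

    # Direct (experienced) update, as in standard model-free RL.
    Q[state][action] += alpha * rpe

    # Counterfactual updates: the same RPE is relayed, sign-flipped, to the
    # unchosen action and the unseen state (inverse generalization).
    Q[state][other_action] -= alpha * g_action * rpe
    Q[other_state][action] -= alpha * g_state * rpe
    return rpe

Q = [[0.0, 0.0], [0.0, 0.0]]
rpe = grl_update(Q, state=0, action=0, reward=1.0)
# rpe = 1.0; Q[0][0] rises to 0.5 while Q[0][1] and Q[1][0] fall to -0.15
```

Setting `g_state` or `g_action` to zero recovers plain model-free RL, which is one way to see the paper's point that GRL is a frugal extension rather than a separate system.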
Affiliation(s)
- Jaron T. Colas: Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA; Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA; Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
- Neil M. Dundon: Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA; Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University of Freiburg, Freiburg im Breisgau, Germany
- Raphael T. Gerraty: Department of Psychology, Columbia University, New York, New York, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA; Center for Science and Society, Columbia University, New York, New York, USA
- Natalie M. Saragosa‐Harris: Department of Psychology, New York University, New York, New York, USA; Department of Psychology, University of California, Los Angeles, California, USA
- Karol P. Szymula: Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Koranis Tanwisuth: Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA; Department of Psychology, University of California, Berkeley, California, USA
- J. Michael Tyszka: Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Camilla van Geen: Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA; Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Harang Ju: Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arthur W. Toga: Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
- Joshua I. Gold: Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Dani S. Bassett: Department of Bioengineering; Department of Electrical and Systems Engineering; Department of Neurology; Department of Psychiatry; Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Santa Fe Institute, Santa Fe, New Mexico, USA
- Catherine A. Hartley: Department of Psychology, New York University, New York, New York, USA; Center for Neural Science, New York University, New York, New York, USA
- Daphna Shohamy: Department of Psychology; Zuckerman Mind Brain Behavior Institute; Kavli Institute for Brain Science, Columbia University, New York, New York, USA
- Scott T. Grafton: Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- John P. O'Doherty: Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA; Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
2
Fry BR, Roberts D, Thakkar KN, Johnson AW. Variables influencing conditioning-evoked hallucinations: overview and future applications. Psychol Med 2022; 52:2937-2949. [PMID: 36138518] [PMCID: PMC9693682] [DOI: 10.1017/s0033291722002100]
Abstract
Hallucinations occur in the absence of sensory stimulation and result in vivid perceptual experiences of nonexistent events that manifest across a range of sensory modalities. Approaches from the field of experimental and cognitive psychology have leveraged the idea that associative learning experiences can evoke conditioning-induced hallucinations in both animals and humans. In this review, we describe classical and contemporary findings and highlight the variables eliciting these experiences. We also provide an overview of the neurobiological mechanisms, along with the associative and computational factors that may explain hallucinations that are generated by representation-mediated conditioning phenomena. Through the integration of animal and human research, significant advances into the psychobiology of hallucinations are possible, which may ultimately translate to more effective clinical applications.
Affiliation(s)
- Benjamin R. Fry: Department of Psychology, Michigan State University, East Lansing, MI, USA
- Dominic Roberts: Department of Psychology, Michigan State University, East Lansing, MI, USA
- Katharine N. Thakkar: Department of Psychology, Michigan State University, East Lansing, MI, USA; Neuroscience Program, Michigan State University, East Lansing, MI, USA
- Alexander W. Johnson: Department of Psychology, Michigan State University, East Lansing, MI, USA; Neuroscience Program, Michigan State University, East Lansing, MI, USA
3
Abstract
BACKGROUND Studies that examine course and outcome in psychosis have reported considerable heterogeneity in terms of recovery, remission, employment, symptom presentation, social outcomes, and antipsychotic medication effects. Even with demonstrated heterogeneity in course and outcome, prophylactic antipsychotic maintenance therapy remains the prominent practice, particularly in participants with schizophrenia. Lack of efficacy of maintenance antipsychotic treatment and concerns over health detriments give cause to re-examine guidelines.
METHODS This study was conducted as part of the Chicago follow-up study, a naturalistic prospective longitudinal study designed to investigate the course, outcome, symptomatology, and effects of antipsychotic medication on recovery and rehospitalization in participants with serious mental illness. A total of 139 participants with 734 observations were included in the analysis. Generalized estimating equation (GEE) logistic models were applied to adjust for confounding factors measured at index hospitalization and at follow-ups.
RESULTS Our data show that the majority of participants with schizophrenia or affective psychosis experience future episodes of psychosis at some point during the 20-year follow-up. There was a significant diagnostic difference between groups, with a greater number of future episodes of psychosis in participants with schizophrenia. Participants with schizophrenia not on antipsychotics after the first 2 years had better outcomes than participants prescribed antipsychotics. The adjusted odds ratio for not being on antipsychotic medication was 5.989 (95% CI 3.588-9.993) for recovery and 0.134 (95% CI 0.070-0.259) for rehospitalization. That is, regardless of diagnosis, after the second year the absence of antipsychotics predicted a higher probability of recovery and a lower probability of rehospitalization at subsequent follow-ups after adjusting for confounders.
CONCLUSION This study reports multiple findings that bring into question the use of continuous antipsychotic medication, regardless of diagnosis. Even when confounding by indication for prescribing antipsychotic medication is controlled for, unmedicated participants with schizophrenia and affective psychosis do better than their medicated counterparts, strongly confirming the importance of exposing the role of aiDSP and antipsychotic drug resistance.
Affiliation(s)
- Martin Harrow: Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA
- Thomas H Jobe: Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA
- Liping Tong: Advocate Aurora Health, Downers Grove, IL, USA
4
Model-based learning retrospectively updates model-free values. Sci Rep 2022; 12:2358. [PMID: 35149713] [PMCID: PMC8837618] [DOI: 10.1038/s41598-022-05567-3]
Abstract
Reinforcement learning (RL) is widely regarded as divisible into two distinct computational strategies. Model-free learning is a simple RL process in which a value is associated with actions, whereas model-based learning relies on the formation of internal models of the environment to maximise reward. Recently, theoretical and animal work has suggested that such models might be used to train model-free behaviour, reducing the burden of costly forward planning. Here we devised a way to probe this possibility in human behaviour. We adapted a two-stage decision task and found evidence that model-based processes at the time of learning can alter model-free valuation in healthy individuals. We asked people to rate the subjective value of an irrelevant feature that was seen at the time a model-based decision would have been made. These irrelevant feature value ratings were updated by rewards, but in a way that accounted for whether the selected action retrospectively ought to have been taken. This model-based influence on model-free value ratings was best accounted for by a reward prediction error calculated relative to the decision path that would most likely have led to the reward. This effect occurred independently of attention and was not present when participants were not explicitly told about the structure of the environment. These findings suggest that current conceptions of model-based and model-free learning require updating in favour of a more integrated approach. Our task provides an empirical handle for further study of the dialogue between these two learning systems.
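The retrospective credit assignment described above can be illustrated with a toy two-stage structure. This is our own hedged sketch, not the authors' fitted model: the transition probabilities, variable names, and single-step update are assumptions chosen to show how a known transition model can redirect a model-free prediction error to the path that most likely produced the reward.

```python
# Illustrative sketch: after reward at a second-stage state, a model-based
# system assigns the reward-prediction error retrospectively to the
# first-stage option that, under the known transition model, would most
# likely have led to that state.

def retrospective_update(v_first, trans_probs, outcome_state, reward, alpha=0.4):
    """Update model-free first-stage values v_first (list) retrospectively.

    trans_probs[a][s] = P(second-stage state s | first-stage action a),
    assumed known to the agent (participants were told the structure).
    """
    # Which first-stage action most likely produced the observed state?
    likely_action = max(range(len(v_first)),
                        key=lambda a: trans_probs[a][outcome_state])
    rpe = reward - v_first[likely_action]
    v_first[likely_action] += alpha * rpe
    return likely_action, rpe

v = [0.0, 0.0]
# Action 1 leads to state 1 with probability 0.7; reward arrives in state 1.
a, rpe = retrospective_update(v, [[0.7, 0.3], [0.3, 0.7]], outcome_state=1, reward=1.0)
# Credits action 1 even if action 0 was actually taken: a == 1, v[1] == 0.4
```

The key contrast with plain model-free updating is that credit goes to the action that *ought* to have been taken given the structure, which is why the effect vanished when participants were not told about the environment.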
5
Langdon A, Botvinick M, Nakahara H, Tanaka K, Matsumoto M, Kanai R. Meta-learning, social cognition and consciousness in brains and machines. Neural Netw 2021; 145:80-89. [PMID: 34735893] [DOI: 10.1016/j.neunet.2021.10.004]
Abstract
The intersection between neuroscience and artificial intelligence (AI) research has created synergistic effects in both fields. While neuroscientific discoveries have inspired the development of AI architectures, new ideas and algorithms from AI research have produced new ways to study brain mechanisms. A well-known example is reinforcement learning (RL), which has stimulated neuroscience research on how animals learn to adjust their behavior to maximize reward. In this review article, we cover recent collaborative work between the two fields in the context of meta-learning and its extension to social cognition and consciousness. Meta-learning refers to the ability to learn how to learn, such as learning to adjust the hyperparameters of existing learning algorithms and to reuse existing models and knowledge to solve new tasks efficiently. This capability is important for making AI systems more adaptive and flexible; since it is one of the areas where a gap remains between human performance and current AI systems, successful collaboration should produce new ideas and progress. Starting from the role of RL algorithms in driving neuroscience, we discuss recent developments in deep RL applied to modeling prefrontal cortex functions. From a broader perspective, we discuss the similarities and differences between social cognition and meta-learning, and we conclude with speculations on the potential links between intelligence as endowed by model-based RL and consciousness. For future work, we highlight data efficiency, autonomy, and intrinsic motivation as key research areas for advancing both fields.
Affiliation(s)
- Angela Langdon: Princeton Neuroscience Institute, Princeton University, USA
- Matthew Botvinick: DeepMind, London, UK; Gatsby Computational Neuroscience Unit, University College London, London, UK
- Keiji Tanaka: RIKEN Center for Brain Science, Wako, Saitama, Japan
- Masayuki Matsumoto: Division of Biomedical Science, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan; Graduate School of Comprehensive Human Sciences, University of Tsukuba, Ibaraki, Japan; Transborder Medical Research Center, University of Tsukuba, Ibaraki, Japan
6
Iglesias S, Kasper L, Harrison SJ, Manka R, Mathys C, Stephan KE. Cholinergic and dopaminergic effects on prediction error and uncertainty responses during sensory associative learning. Neuroimage 2020; 226:117590. [PMID: 33285332] [DOI: 10.1016/j.neuroimage.2020.117590]
Abstract
Navigating the physical world requires learning probabilistic associations between sensory events and their change in time (volatility). Bayesian accounts of this learning process rest on hierarchical prediction errors (PEs) that are weighted by estimates of uncertainty (or its inverse, precision). In a previous fMRI study we found that low-level precision-weighted PEs about visual outcomes (that update beliefs about associations) activated the putative dopaminergic midbrain; by contrast, precision-weighted PEs about cue-outcome associations (that update beliefs about volatility) activated the cholinergic basal forebrain. These findings suggested selective dopaminergic and cholinergic influences on precision-weighted PEs at different hierarchical levels. Here, we tested this hypothesis, repeating our fMRI study under pharmacological manipulations in healthy participants. Specifically, we performed two pharmacological fMRI studies with a between-subject double-blind placebo-controlled design: study 1 used antagonists of dopaminergic (amisulpride) and muscarinic (biperiden) receptors, study 2 used enhancing drugs of dopaminergic (levodopa) and cholinergic (galantamine) modulation. Pooled across all pharmacological conditions of study 1 and study 2, respectively, we found that low-level precision-weighted PEs activated the midbrain and high-level precision-weighted PEs the basal forebrain as in our previous study. However, we found pharmacological effects on brain activity associated with these computational quantities only when splitting the precision-weighted PEs into their PE and precision components: in a brainstem region putatively containing cholinergic (pedunculopontine and laterodorsal tegmental) nuclei, biperiden (compared to placebo) enhanced low-level PE responses and attenuated high-level PE activity, while amisulpride reduced high-level PE responses. 
Additionally, in the putative dopaminergic midbrain, galantamine compared to placebo enhanced low-level PE responses (in a body-weight dependent manner) and amisulpride enhanced high-level precision activity. Task behaviour was not affected by any of the drugs. These results do not support our hypothesis of a clear-cut dichotomy between different hierarchical inference levels and neurotransmitter systems, but suggest a more complex interaction between these neuromodulatory systems and hierarchical Bayesian quantities. However, our present results may have been affected by confounds inherent to pharmacological fMRI. We discuss these confounds and outline improved experimental tests for the future.
Affiliation(s)
- Sandra Iglesias: Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & Swiss Federal Institute of Technology (ETH Zurich), Wilfriedstr. 6, 8032 Zurich, Switzerland
- Lars Kasper: Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & Swiss Federal Institute of Technology (ETH Zurich), Wilfriedstr. 6, 8032 Zurich, Switzerland; Institute for Biomedical Engineering, ETH Zurich and University of Zurich, Switzerland
- Samuel J Harrison: Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & Swiss Federal Institute of Technology (ETH Zurich), Wilfriedstr. 6, 8032 Zurich, Switzerland
- Robert Manka: Department of Cardiology, University Hospital Zurich, Switzerland
- Christoph Mathys: Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & Swiss Federal Institute of Technology (ETH Zurich), Wilfriedstr. 6, 8032 Zurich, Switzerland; Interacting Minds Centre, Aarhus University, Aarhus, Denmark
- Klaas E Stephan: Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & Swiss Federal Institute of Technology (ETH Zurich), Wilfriedstr. 6, 8032 Zurich, Switzerland; Max Planck Institute for Metabolism Research, Cologne, Germany
7
Context-Dependent Multiplexing by Individual VTA Dopamine Neurons. J Neurosci 2020; 40:7489-7509. [PMID: 32859713] [DOI: 10.1523/jneurosci.0502-20.2020]
Abstract
Dopamine (DA) neurons of the VTA track cues and rewards to generate a reward prediction error signal during Pavlovian conditioning. Here we explored how these neurons respond in a self-paced, operant task in freely moving mice. The animal could trigger a reward-predicting cue by remaining in a specific location of an operant box for a brief time before moving to a spout for reward collection. VTA DA neurons were identified using DAT-Cre male mice that carried an optrode with minimal impact on the behavioral task. In vivo single-unit recordings revealed transient fast-spiking responses to the cue and reward in correct trials, while in incorrect trials the activity paused, reflecting positive and negative reward prediction error signals. In parallel, a majority of VTA DA neurons simultaneously encoded multiple action variables (e.g., movement velocity, acceleration, distance to goal, and licking) through sustained slow modulation of firing. Applying a generalized linear model (GLM), we show that such multiplexed encoding of reward and motor variables by individual DA neurons was apparent only while the mouse was engaged in the task. Downstream targets may exploit such goal-directed multiplexing of VTA DA neurons to adjust actions and optimize the task's outcome.
SIGNIFICANCE STATEMENT VTA DA neurons code for multiple functions, including the reward prediction error but also motivation and locomotion. Here we show that about half of the recorded VTA DA neurons perform multiplexing: they exploit the phasic and tonic activity modes to encode, respectively, the cue/reward responses and motor parameters, most prominently when the mouse engages in a self-paced operant task. VTA non-DA neurons, by contrast, encode motor parameters regardless of task engagement.
8
Watabe-Uchida M, Uchida N. Multiple Dopamine Systems: Weal and Woe of Dopamine. Cold Spring Harb Symp Quant Biol 2019; 83:83-95. [PMID: 30787046] [DOI: 10.1101/sqb.2018.83.037648]
Abstract
The ability to predict future outcomes increases the fitness of the animal. Decades of research have shown that dopamine neurons broadcast reward prediction error (RPE) signals-the discrepancy between actual and predicted reward-to drive learning to predict future outcomes. Recent studies have begun to show, however, that dopamine neurons are more diverse than previously thought. In this review, we will summarize a series of our studies that have shown unique properties of dopamine neurons projecting to the posterior "tail" of the striatum (TS) in terms of anatomy, activity, and function. Specifically, TS-projecting dopamine neurons are activated by a subset of negative events including threats from a novel object, send prediction errors for external threats, and reinforce avoidance behaviors. These results indicate that there are at least two axes of dopamine-mediated reinforcement learning in the brain-one learning from canonical RPEs and another learning from threat prediction errors. We argue that the existence of multiple learning systems is an adaptive strategy that makes possible each system optimized for its own needs. The compartmental organization in the mammalian striatum resembles that of a dopamine-recipient area in insects (mushroom body), pointing to a principle of dopamine function conserved across phyla.
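The "two axes" of dopamine-mediated learning described above can be made concrete with a toy delta rule applied to two separate teaching signals. This is our illustrative assumption, not a model from the review: the variable names and the idea of running the same update on reward and threat channels are sketched only to show how two systems can share one computation while learning different predictions.

```python
# Minimal sketch of the review's "two axes" idea: one dopamine system learns
# from canonical reward prediction errors (RPEs), while a separate
# TS-projecting system learns from threat prediction errors to reinforce
# avoidance. Both apply the same delta rule to different teaching signals.

def delta_rule(value, outcome, alpha=0.2):
    """Return (updated value, prediction error) for one observation."""
    pe = outcome - value
    return value + alpha * pe, pe

reward_value, threat_value = 0.0, 0.0
# A cue is followed by reward (1.0) but no threat (0.0):
reward_value, rpe = delta_rule(reward_value, 1.0)   # rpe = 1.0
threat_value, tpe = delta_rule(threat_value, 0.0)   # tpe = 0.0
# reward_value -> 0.2, threat_value stays 0.0: the two systems diverge,
# so the same cue can come to predict reward without predicting threat.
```

Keeping the learning rule identical across channels reflects the review's argument that multiple learning systems are an adaptive strategy, each optimized for its own needs rather than each needing its own algorithm.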
Affiliation(s)
- Mitsuko Watabe-Uchida: Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
- Naoshige Uchida: Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
9
Sharpe MJ, Schoenbaum G. Evaluation of the hypothesis that phasic dopamine constitutes a cached-value signal. Neurobiol Learn Mem 2018; 153:131-136. [PMID: 29269085] [PMCID: PMC6136434] [DOI: 10.1016/j.nlm.2017.12.002]
Abstract
The phasic dopamine error signal is currently argued to be synonymous with the prediction error in Sutton and Barto's (1987, 1998) model-free reinforcement-learning algorithm (Schultz et al., 1997). This theory holds that phasic dopamine reflects a cached-value signal that endows reward-predictive cues with the scalar value inherent in reward. Such an interpretation does not envision a role for dopamine in the more complex cognitive representations between events that underlie many forms of associative learning, restricting the role dopamine can play in learning. The cached-value hypothesis of dopamine makes three concrete predictions about when a phasic dopamine response should be seen and what types of learning this signal should be able to promote. We discuss these predictions in light of recent evidence that we believe provides particularly strong tests of their validity. In doing so, we find that while the phasic dopamine signal conforms to a cached-value account in some circumstances, other evidence demonstrates that this signal is not restricted to a model-free cached-value reinforcement-learning signal. In light of this evidence, we argue that the phasic dopamine signal functions more generally to signal violations of expectancies and thereby drive real-world associations between events.
Affiliation(s)
- Melissa J Sharpe: National Institute on Drug Abuse, Baltimore, MD, USA; Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA; School of Psychology, UNSW Australia
- Geoffrey Schoenbaum: National Institute on Drug Abuse, Baltimore, MD, USA; Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University, Baltimore, MD, USA
10
Langdon AJ, Sharpe MJ, Schoenbaum G, Niv Y. Model-based predictions for dopamine. Curr Opin Neurobiol 2018; 49:1-7. [PMID: 29096115] [PMCID: PMC6034703] [DOI: 10.1016/j.conb.2017.10.006]
Abstract
Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning.
Affiliation(s)
- Angela J Langdon: Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton, NJ 08540, United States
- Melissa J Sharpe: Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton, NJ 08540, United States; National Institute on Drug Abuse, Baltimore, MD 21224, United States; School of Psychology, University of New South Wales, Australia
- Yael Niv: Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton, NJ 08540, United States
11
The many worlds hypothesis of dopamine prediction error: implications of a parallel circuit architecture in the basal ganglia. Curr Opin Neurobiol 2017; 46:241-247. [PMID: 28985550] [DOI: 10.1016/j.conb.2017.08.015]
Abstract
Computational models of reinforcement learning (RL) strive to produce behavior that maximises reward, and thus allow software or robots to behave adaptively [1]. At the core of RL models is a learned mapping between 'states'-situations or contexts that an agent might encounter in the world-and actions. A wealth of physiological and anatomical data suggests that the basal ganglia (BG) is important for learning these mappings [2,3]. However, the computations performed by specific circuits are unclear. In this brief review, we highlight recent work concerning the anatomy and physiology of BG circuits that suggest refinements in our understanding of computations performed by the basal ganglia. We focus on one important component of basal ganglia circuitry, midbrain dopamine neurons, drawing attention to data that has been cast as supporting or departing from the RL framework that has inspired experiments in basal ganglia research over the past two decades. We suggest that the parallel circuit architecture of the BG might be expected to produce variability in the response properties of different dopamine neurons, and that variability in response profile may not reflect variable functions, but rather different arguments that serve as inputs to a common function: the computation of prediction error.
12
Russek EM, Momennejad I, Botvinick MM, Gershman SJ, Daw ND. Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLoS Comput Biol 2017; 13:e1005768. [PMID: 28945743] [PMCID: PMC5628940] [DOI: 10.1371/journal.pcbi.1005768]
Abstract
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
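The successor representation at the heart of the framework above can be shown in a few lines. The 3-state chain, discount factor, and closed-form computation are our illustrative assumptions: for a fixed policy the SR matrix M (discounted expected future state occupancies) satisfies M = (I - γT)⁻¹, and values follow as V = M·R, so changing the reward vector immediately revalues all states without replanning.

```python
# Sketch of the successor representation (SR): values are computed as
# V(s) = sum over s' of M[s][s'] * R(s'), where M holds discounted expected
# future state occupancies and can itself be learned by TD. The 3-state
# chain below is an illustrative assumption, not the paper's simulations.

import numpy as np

gamma = 0.9
# Deterministic chain under the current policy: 0 -> 1 -> 2 (absorbing).
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])

# Closed-form SR for a fixed policy: M = (I - gamma * T)^-1.
M = np.linalg.inv(np.eye(3) - gamma * T)

R = np.array([0.0, 0.0, 1.0])   # reward delivered only in the absorbing state
V = M @ R                        # model-based-like values from a TD-learnable M
# V = [8.1, 9.0, 10.0]: each state's value is the discounted occupancy of
# the rewarded state, recovered without any explicit forward planning.
```

This is why the SR produces a subset of model-based behavior at model-free decision-time cost: a new R revalues every state with one matrix-vector product, whereas a changed transition structure still requires relearning M.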
Affiliation(s)
- Evan M. Russek: Center for Neural Science, New York University, New York, NY, USA
- Ida Momennejad: Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ, USA
- Matthew M. Botvinick: DeepMind, London, UK; Gatsby Computational Neuroscience Unit, University College London, UK
- Samuel J. Gershman: Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Nathaniel D. Daw: Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ, USA
13
Sharpe MJ, Chang CY, Liu MA, Batchelor HM, Mueller LE, Jones JL, Niv Y, Schoenbaum G. Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nat Neurosci 2017; 20:735-742. [PMID: 28368385 PMCID: PMC5413864 DOI: 10.1038/nn.4538]
Abstract
Associative learning is driven by prediction errors. Dopamine transients correlate with these errors, which current interpretations limit to endowing cues with a scalar quantity reflecting the value of future rewards. We tested whether dopamine might act more broadly to support learning of an associative model of the environment. Using sensory preconditioning, we show that prediction errors underlying stimulus-stimulus learning can be blocked behaviorally and reinstated by optogenetically activating dopamine neurons. We further show that suppressing the firing of these neurons across the transition prevents normal stimulus-stimulus learning. These results establish that the acquisition of model-based information about transitions between nonrewarding events is also driven by prediction errors and that, contrary to existing canon, dopamine transients are both sufficient and necessary to support this type of learning. Our findings open new possibilities for how these biological signals might support associative learning in the mammalian brain in these and other contexts.
Affiliation(s)
- Melissa J Sharpe: NIDA Intramural Research Program, Baltimore, Maryland, USA; Department of Psychology and Neuroscience Institute, Princeton University, Princeton, New Jersey, USA
- Chun Yun Chang: NIDA Intramural Research Program, Baltimore, Maryland, USA
- Melissa A Liu: NIDA Intramural Research Program, Baltimore, Maryland, USA
- Joshua L Jones: NIDA Intramural Research Program, Baltimore, Maryland, USA
- Yael Niv: Department of Psychology and Neuroscience Institute, Princeton University, Princeton, New Jersey, USA
- Geoffrey Schoenbaum: NIDA Intramural Research Program, Baltimore, Maryland, USA; Departments of Anatomy and of Neurobiology and Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, USA; Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University, Baltimore, Maryland, USA
14
Nasser HM, Calu DJ, Schoenbaum G, Sharpe MJ. The Dopamine Prediction Error: Contributions to Associative Models of Reward Learning. Front Psychol 2017; 8:244. [PMID: 28275359 PMCID: PMC5319959 DOI: 10.3389/fpsyg.2017.00244]
Abstract
Phasic activity of midbrain dopamine neurons is currently thought to encapsulate the prediction-error signal described in Sutton and Barto’s (1981) model-free reinforcement learning algorithm. This phasic signal is thought to contain information about the quantitative value of reward, which transfers to the reward-predictive cue after learning. This is argued to endow the reward-predictive cue with the value inherent in the reward, motivating behavior toward cues signaling the presence of reward. Yet theoretical and empirical research has implicated prediction-error signaling in learning that extends far beyond a transfer of quantitative value to a reward-predictive cue. Here, we review the research which demonstrates the complexity of how dopaminergic prediction errors facilitate learning. After briefly discussing the literature demonstrating that phasic dopaminergic signals can act in the manner described by Sutton and Barto (1981), we consider how these signals may also influence attentional processing across multiple attentional systems in distinct brain circuits. Then, we discuss how prediction errors encode and promote the development of context-specific associations between cues and rewards. Finally, we consider recent evidence that shows dopaminergic activity contains information about causal relationships between cues and rewards that reflect information garnered from rich associative models of the world that can be adapted in the absence of direct experience. In discussing this research we hope to support the expansion of how dopaminergic prediction errors are thought to contribute to the learning process beyond the traditional concept of transferring quantitative value.
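The Sutton and Barto (1981)-style error signal this review builds on can be written in a few lines. This is a generic trial-level sketch, assuming a single cue reliably followed by reward; the learning rate and trial count are arbitrary choices for illustration.

```python
# Trial-level prediction-error sketch: delta = r - V(cue) updates the cue's
# value, so the error signal migrates from reward time to cue time over
# training. Learning rate and trial count are illustrative.
alpha = 0.2
V_cue = 0.0
cue_rpes, reward_rpes = [], []

for _ in range(30):
    # At cue onset, the (unpredicted) cue itself evokes an error of size V_cue
    cue_rpes.append(V_cue)
    # At reward delivery, the error is the reward minus the cue's prediction
    delta = 1.0 - V_cue
    reward_rpes.append(delta)
    V_cue += alpha * delta
```

Early trials show a large reward-time error and none at the cue; late trials show the reverse, the pattern classically reported for phasic dopamine, and the baseline against which the broader roles reviewed here are contrasted.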
Affiliation(s)
- Helen M Nasser: Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD, USA
- Donna J Calu: Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD, USA
- Geoffrey Schoenbaum: Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD, USA; Cellular Neurobiology Research Branch, National Institute on Drug Abuse Intramural Research Program, Baltimore, MD, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
- Melissa J Sharpe: Cellular Neurobiology Research Branch, National Institute on Drug Abuse Intramural Research Program, Baltimore, MD, USA; Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
15
Iglesias S, Tomiello S, Schneebeli M, Stephan KE. Models of neuromodulation for computational psychiatry. Wiley Interdiscip Rev Cogn Sci 2016; 8:e1420. [PMID: 27653804 DOI: 10.1002/wcs.1420]
Abstract
Psychiatry faces fundamental challenges: built on a syndrome-based nosology, it presently lacks clinical tests for inferring the disease processes that cause an individual patient's symptoms, and it must resort to trial-and-error treatment strategies. These challenges have fueled the recent emergence of a novel field, computational psychiatry, that strives for mathematical models of disease processes at physiological and computational (information-processing) levels. This review is motivated by one particular goal of computational psychiatry: the development of 'computational assays' that can be applied to behavioral or neuroimaging data from individual patients to support differential diagnosis and guide patient-specific treatment. Because the majority of available pharmacotherapeutic approaches in psychiatry target neuromodulatory transmitters, models that infer the (patho)physiological and (patho)computational actions of different neuromodulatory transmitters are of central interest for computational psychiatry. This article reviews the many outstanding questions on the computational roles of neuromodulators (dopamine, acetylcholine, serotonin, and noradrenaline), outlines the available evidence, and discusses promises and pitfalls in translating these findings to clinical applications.
Affiliation(s)
- Sandra Iglesias: Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland
- Sara Tomiello: Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland
- Maya Schneebeli: Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland
- Klaas E Stephan: Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland; Wellcome Trust Centre for Neuroimaging, University College London, London, UK; Max Planck Institute for Metabolism Research, Cologne, Germany
16
Westbrook A, Braver TS. Dopamine Does Double Duty in Motivating Cognitive Effort. Neuron 2016; 89:695-710.
Abstract
Cognitive control is subjectively costly, suggesting that engagement is modulated in relationship to incentive state. Dopamine appears to play key roles. In particular, dopamine may mediate cognitive effort by two broad classes of functions: (1) modulating the functional parameters of working memory circuits subserving effortful cognition, and (2) mediating value-learning and decision-making about effortful cognitive action. Here, we tie together these two lines of research, proposing how dopamine serves "double duty", translating incentive information into cognitive motivation.
Affiliation(s)
- Andrew Westbrook: Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, MO 63130, USA
- Todd S Braver: Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, MO 63130, USA; Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
17
Lam J, Globas C, Hosp J, Karnath HO, Wächter T, Luft A. Impaired implicit learning and feedback processing after stroke. Neuroscience 2016; 314:116-124. [DOI: 10.1016/j.neuroscience.2015.11.051]
18
Stephan K, Iglesias S, Heinzle J, Diaconescu A. Translational Perspectives for Computational Neuroimaging. Neuron 2015; 87:716-732. [DOI: 10.1016/j.neuron.2015.07.008]
19
Chen C, Takahashi T, Nakagawa S, Inoue T, Kusumi I. Reinforcement learning in depression: A review of computational research. Neurosci Biobehav Rev 2015; 55:247-267. [PMID: 25979140 DOI: 10.1016/j.neubiorev.2015.05.005]
Abstract
Despite being considered primarily a mood disorder, major depressive disorder (MDD) is characterized by cognitive and decision-making deficits. Recent research has employed computational models of reinforcement learning (RL) to address these deficits. The computational approach has the advantages of making explicit predictions about learning and behavior, specifying the process parameters of RL, differentiating between model-free and model-based RL, and enabling model-based analyses of functional magnetic resonance imaging and electroencephalography data. These merits have given rise to the emerging field of computational psychiatry, and here we review specific studies that focused on MDD. Considerable evidence suggests that MDD is associated with impaired brain signals of reward prediction error and expected value ('wanting'), decreased reward sensitivity ('liking'), and/or impaired learning (be it model-free or model-based), although the causality remains unclear. These parameters may serve as valuable intermediate phenotypes of MDD, linking general clinical symptoms to underlying molecular dysfunctions. We believe future computational research at clinical, systems, and cellular/molecular/genetic levels will propel us toward a better understanding of the disease.
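As a concrete illustration of the kind of model-free model fit in these studies, here is a minimal Q-learning agent for a two-armed bandit in which a reward-sensitivity parameter ('liking') scales subjective reward. The task, parameter values, and group labels are illustrative assumptions, not taken from any specific study in the review.

```python
import math
import random

def simulate_choices(alpha, rho, beta, n_trials=2000, p_reward=(0.8, 0.2)):
    """Q-learning on a two-armed bandit; rho scales subjective reward,
    mimicking the blunted reward sensitivity discussed for MDD."""
    Q = [0.0, 0.0]
    better_arm_choices = 0
    for _ in range(n_trials):
        # Softmax choice between the two arms
        p0 = 1.0 / (1.0 + math.exp(-beta * (Q[0] - Q[1])))
        a = 0 if random.random() < p0 else 1
        r = rho * (1.0 if random.random() < p_reward[a] else 0.0)
        Q[a] += alpha * (r - Q[a])          # model-free RPE update
        better_arm_choices += (a == 0)
    return better_arm_choices / n_trials

random.seed(0)
healthy = simulate_choices(alpha=0.3, rho=1.0, beta=5.0)
blunted = simulate_choices(alpha=0.3, rho=0.2, beta=5.0)
# Lower rho compresses Q-value differences, yielding noisier, less
# reward-guided choices despite an identical learning rule.
```

Fitting such parameters to patient choice data, by maximum likelihood or hierarchical Bayesian methods, is what allows a clinical symptom to be decomposed into candidate mechanisms like reduced rho versus reduced alpha.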
Affiliation(s)
- Chong Chen: Department of Psychiatry, Hokkaido University Graduate School of Medicine, Sapporo 060-8638, Japan
- Taiki Takahashi: Department of Behavioral Science/Center for Experimental Research in Social Sciences, Hokkaido University, Sapporo 060-0810, Japan
- Shin Nakagawa: Department of Psychiatry, Hokkaido University Graduate School of Medicine, Sapporo 060-8638, Japan
- Takeshi Inoue: Department of Psychiatry, Hokkaido University Graduate School of Medicine, Sapporo 060-8638, Japan
- Ichiro Kusumi: Department of Psychiatry, Hokkaido University Graduate School of Medicine, Sapporo 060-8638, Japan
20
Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making. Proc Natl Acad Sci U S A 2015; 112:1595-1600. [PMID: 25605941 DOI: 10.1073/pnas.1417219112]
Abstract
Dual system theories suggest that behavioral control is parsed between a deliberative "model-based" and a more reflexive "model-free" system. A balance of control exerted by these systems is thought to be related to dopamine neurotransmission. However, in the absence of direct measures of human dopamine, it remains unknown whether this reflects a quantitative relation with dopamine either in the striatum or other brain areas. Using a sequential decision task performed during functional magnetic resonance imaging, combined with striatal measures of dopamine using [(18)F]DOPA positron emission tomography, we show that higher presynaptic ventral striatal dopamine levels were associated with a behavioral bias toward more model-based control. Higher presynaptic dopamine in ventral striatum was associated with greater coding of model-based signatures in lateral prefrontal cortex and diminished coding of model-free prediction errors in ventral striatum. Thus, interindividual variability in ventral striatal presynaptic dopamine reflects a balance in the behavioral expression and the neural signatures of model-free and model-based control. Our data provide a novel perspective on how alterations in presynaptic dopamine levels might be accompanied by a disruption of behavioral control as observed in aging or neuropsychiatric diseases such as schizophrenia and addiction.
21
Kaveri S, Nakahara H. Dual reward prediction components yield Pavlovian sign- and goal-tracking. PLoS One 2014; 9:e108142. [PMID: 25310184 PMCID: PMC4195585 DOI: 10.1371/journal.pone.0108142] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2013] [Accepted: 08/26/2014] [Indexed: 11/18/2022] Open
Abstract
Reinforcement learning (RL) has become a dominant paradigm for understanding animal behaviors and neural correlates of decision-making, in part because of its ability to explain Pavlovian conditioned behaviors and the role of midbrain dopamine activity as reward prediction error (RPE). However, recent experimental findings indicate that dopamine activity, contrary to the RL hypothesis, may not signal RPE and differs based on the type of Pavlovian response (e.g. sign- and goal-tracking responses). In this study, we address this discrepancy by introducing a new neural correlate for learning reward predictions; the correlate is called "cue-evoked reward". It refers to a recall of reward evoked by the cue that is learned through simple cue-reward associations. We introduce a temporal difference learning model, in which neural correlates of the cue itself and cue-evoked reward underlie learning of reward predictions. The animal's reward prediction supported by these two correlates is divided into sign and goal components respectively. We relate the sign and goal components to approach responses towards the cue (i.e. sign-tracking) and the food-tray (i.e. goal-tracking) respectively. We found a number of correspondences between simulated models and the experimental findings (i.e. behavior and neural responses). First, the development of modeled responses is consistent with those observed in the experimental task. Second, the model's RPEs were similar to dopamine activity in respective response groups. Finally, goal-tracking, but not sign-tracking, responses rapidly emerged when RPE was restored in the simulated models, similar to experiments with recovery from dopamine-antagonist. These results suggest two complementary neural correlates, corresponding to the cue and its evoked reward, form the basis for learning reward predictions in the sign- and goal-tracking rats.
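The two-component decomposition described in this abstract can be caricatured in a few lines: a shared RPE trains both a cue-tied ("sign") value and a reward-recall-tied ("goal") value. The mixing weights below are hypothetical, purely to show how one error signal can apportion a reward prediction across two correlates.

```python
# Caricature of a dual-component TD account: one RPE, computed from the sum
# of the components, updates both. The mixing weights are hypothetical.
alpha = 0.1
v_sign, v_goal = 0.0, 0.0        # cue-tied and reward-recall-tied components
w_sign, w_goal = 0.3, 0.7        # hypothetical learning shares per component

for _ in range(300):
    rpe = 1.0 - (v_sign + v_goal)       # shared reward-prediction error
    v_sign += alpha * w_sign * rpe      # supports approach to the cue
    v_goal += alpha * w_goal * rpe      # supports approach to the food tray
# The total prediction converges to the reward; each component's share is
# set by its weight, one possible handle on sign- vs goal-tracking biases.
```

Under this toy scheme, manipulations that suppress the shared RPE (e.g. a dopamine antagonist) stall both components, while restoring it lets each recover at its own rate, qualitatively the pattern the simulations in the paper compare against rat behavior.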
Affiliation(s)
- Sivaramakrishnan Kaveri: Lab for Integrated Theoretical Neuroscience, RIKEN BSI, Wako, Japan; Dept. of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Japan
22
Abstract
Advances in experimental techniques, including behavioral paradigms using rich stimuli under closed loop conditions and the interfacing of neural systems with external inputs and outputs, reveal complex dynamics in the neural code and require a revisiting of standard concepts of representation. High-throughput recording and imaging methods along with the ability to observe and control neuronal subpopulations allow increasingly detailed access to the neural circuitry that subserves neural representations and the computations they support. How do we harness theory to build biologically grounded models of complex neural function?
Affiliation(s)
- Adrienne Fairhall: Department of Physiology and Biophysics, University of Washington, 1705 NE Pacific St., HSB G424, Box 357290, Seattle, WA 98195-7290, USA
23

24
Abstract
Computational neuroscience has focused largely on the dynamics and function of local circuits of neuronal populations dedicated to a common task, such as processing a common sensory input, storing its features in working memory, choosing between a set of options dictated by controlled experimental settings, or generating the appropriate actions. Most current circuit models suggest mechanisms for computations that can be captured by networks of simplified neurons connected via simple synaptic weights. In this article I review the progress of this approach and its limitations. It is argued that new experimental techniques will yield data that might challenge the present paradigms in that they will (1) demonstrate the computational importance of microscopic structural and physiological complexity and specificity; (2) highlight the importance of models of large brain structures engaged in a variety of tasks; and (3) reveal the necessity of coupling the neuronal networks to chemical and environmental variables.
Affiliation(s)
- Haim Sompolinsky: Edmond and Lily Safra Center for Brain Sciences, The Hebrew University, Jerusalem 91904, Israel; Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
|