1
Mah A, Golden C, Constantinople C. Mesolimbic dopamine encodes reward prediction errors independent of learning rates. bioRxiv 2024:2024.04.18.590090. [PMID: 38659861] [PMCID: PMC11042285] [DOI: 10.1101/2024.04.18.590090]
Abstract
Biological accounts of reinforcement learning posit that dopamine encodes reward prediction errors (RPEs), which are multiplied by a learning rate to update state or action values. These values are thought to be represented in synaptic weights in the striatum, and updated by dopamine-dependent plasticity, suggesting that dopamine release might reflect the product of the learning rate and RPE. Here, we leveraged the fact that animals learn faster in volatile environments to characterize dopamine encoding of learning rates. We trained rats on a task with semi-observable states offering different rewards, and rats adjusted how quickly they initiated trials across states using RPEs. Computational modeling and behavioral analyses showed that learning rates were higher following state transitions, and scaled with trial-by-trial changes in beliefs about hidden states, approximating normative Bayesian strategies. Notably, dopamine release in the nucleus accumbens encoded RPEs independent of learning rates, suggesting that dopamine-independent mechanisms instantiate dynamic learning rates.
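The delta-rule update the abstract refers to can be sketched in a few lines. The trial values and the declining learning-rate schedule below are illustrative stand-ins for a rat's behavior just after a state transition, not the authors' fitted model.

```python
# Value update via RPE times a dynamic learning rate (a hedged sketch).
def update_value(value, reward, alpha):
    rpe = reward - value            # reward prediction error (RPE)
    return value + alpha * rpe, rpe

value = 0.0
# Hypothetical trial sequence: the learning rate is highest right after
# a state transition (first trial) and decays as beliefs stabilize.
rewards = [1.0, 1.0, 1.0]
alphas  = [0.8, 0.4, 0.2]
for r, a in zip(rewards, alphas):
    value, rpe = update_value(value, r, a)
```

On this account, dopamine release would track `rpe` but not `alpha * rpe`, which is the study's central finding.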
Affiliation(s)
- Andrew Mah
- Center for Neural Science, New York University
2
Barry MLLR, Gerstner W. Fast adaptation to rule switching using neuronal surprise. PLoS Comput Biol 2024; 20:e1011839. [PMID: 38377112] [PMCID: PMC10906910] [DOI: 10.1371/journal.pcbi.1011839]
Abstract
In humans and animals, surprise is a physiological reaction to an unexpected event, but how surprise can be linked to plausible models of neuronal activity is an open problem. We propose a self-supervised spiking neural network model where a surprise signal is extracted from an increase in neural activity after an imbalance of excitation and inhibition. The surprise signal modulates synaptic plasticity via a three-factor learning rule which increases plasticity at moments of surprise. The surprise signal remains small when transitions between sensory events follow a previously learned rule but increases immediately after rule switching. In a spiking network with several modules, previously learned rules are protected against overwriting, as long as the number of modules is larger than the total number of rules-making a step towards solving the stability-plasticity dilemma in neuroscience. Our model relates the subjective notion of surprise to specific predictions on the circuit level.
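The three-factor rule described above can be sketched as a surprise-gated Hebbian update. The toy network size, learning rate, and scalar surprise values are illustrative; the paper's actual model is a spiking network.

```python
import numpy as np

# Three-factor rule sketch: weight change = eta * surprise (third
# factor) * pre-post Hebbian coincidence. All numbers are illustrative.
def three_factor_update(w, pre, post, surprise, eta=0.1):
    return w + eta * surprise * np.outer(post, pre)

pre  = np.array([1.0, 0.0])   # presynaptic activity
post = np.array([0.0, 1.0])   # postsynaptic activity
w = np.zeros((2, 2))

# Plasticity is weak when transitions follow the learned rule (low
# surprise) and strong right after a rule switch (high surprise).
w_after_expected = three_factor_update(w, pre, post, surprise=0.1)
w_after_surprise = three_factor_update(w, pre, post, surprise=2.0)
```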
Affiliation(s)
- Martin L. L. R. Barry
- School of Computer and Communication Sciences and School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Wulfram Gerstner
- School of Computer and Communication Sciences and School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
3
Modirshanechi A, Becker S, Brea J, Gerstner W. Surprise and novelty in the brain. Curr Opin Neurobiol 2023; 82:102758. [PMID: 37619425] [DOI: 10.1016/j.conb.2023.102758]
Abstract
Notions of surprise and novelty have been used in various experimental and theoretical studies across multiple brain areas and species. However, 'surprise' and 'novelty' refer to different quantities in different studies, which raises concerns about whether these studies indeed relate to the same functionalities and mechanisms in the brain. Here, we address these concerns through a systematic investigation of how different aspects of surprise and novelty relate to different brain functions and physiological signals. We review recent classifications of definitions proposed for surprise and novelty along with links to experimental observations. We show that computational modeling and quantifiable definitions enable novel interpretations of previous findings and form a foundation for future theoretical and experimental studies.
Affiliation(s)
- Alireza Modirshanechi
- Brain-Mind Institute, School of Life Sciences, EPFL, Lausanne, Switzerland; School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland.
- Sophia Becker
- Brain-Mind Institute, School of Life Sciences, EPFL, Lausanne, Switzerland; School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland.
- Johanni Brea
- Brain-Mind Institute, School of Life Sciences, EPFL, Lausanne, Switzerland; School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland
- Wulfram Gerstner
- Brain-Mind Institute, School of Life Sciences, EPFL, Lausanne, Switzerland; School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland.
4
Visalli A, Capizzi M, Ambrosini E, Kopp B, Vallesi A. P3-like signatures of temporal predictions: a computational EEG study. Exp Brain Res 2023. [PMID: 37354350] [DOI: 10.1007/s00221-023-06656-z]
Abstract
Many cognitive processes, ranging from perception to action, depend on the ability to predict the timing of forthcoming events. Yet, how the brain uses predictive models in the temporal domain is still an unsolved question. In previous work, we began to explore the neural correlates of temporal predictions by using a computational approach in which an ideal Bayesian observer learned the temporal probabilities of target onsets in a simple reaction time task. Because the task was specifically designed to disambiguate updating of predictive models and surprise, changes in temporal probabilities were explicitly cued. However, in the real world, we are usually incidentally exposed to changes in the statistics of the environment. Here, we thus aimed to further investigate the electroencephalographic (EEG) correlates of Bayesian belief updating and surprise associated with incidental learning of temporal probabilities. In line with our previous EEG study, results showed distinct P3-like modulations for updating and surprise. While surprise was indexed by an early fronto-central P3-like modulation, updating was associated with a later and more posterior P3 modulation. Moreover, updating was associated with a P2-like potential at centro-parietal electrodes, likely capturing integration processes between prior beliefs and likelihood of the observed event. These findings support previous evidence of trial-by-trial variability of P3 amplitudes as an index of dissociable inferential processes. Coupled with our previous findings, the present study strongly bolsters the view of the P3 as a key brain signature of temporal Bayesian inference. Data and scripts are shared on OSF: osf.io/sdy8j/.
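A minimal ideal-observer sketch of the kind of Bayesian belief updating and surprise the study models: a count-based (Dirichlet) posterior over K possible target-onset times, with Shannon surprise for the observed onset and a KL-divergence measure of updating. K and the observation sequence are hypothetical.

```python
import numpy as np

K = 3                    # hypothetical number of possible onset times
counts = np.ones(K)      # uniform Dirichlet prior over onset times

def observe(counts, onset):
    prior = counts / counts.sum()
    surprise = -np.log(prior[onset])          # Shannon surprise
    counts = counts.copy()
    counts[onset] += 1                        # incidental learning: count it
    posterior = counts / counts.sum()
    # Size of the Bayesian update: KL(posterior || prior)
    update = np.sum(posterior * np.log(posterior / prior))
    return counts, surprise, update

counts, s1, u1 = observe(counts, onset=0)
counts, s2, u2 = observe(counts, onset=0)   # repeated onset: less surprising
```

In the study's framing, the surprise and update quantities would be regressed against distinct P3-like EEG components.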
Affiliation(s)
- Antonino Visalli
- Department of Neuroscience, University of Padova, 35121, Padua, Italy.
- Padova Neuroscience Center, University of Padova, Padua, Italy.
- IRCCS San Camillo Hospital, 30126, Venice, Italy.
- M Capizzi
- Brain and Behavior Research Center (CIMCYC), Department of Experimental Psychology, University of Granada, Granada, Spain
- E Ambrosini
- Department of Neuroscience, University of Padova, 35121, Padua, Italy
- Padova Neuroscience Center, University of Padova, Padua, Italy
- Department of General Psychology, University of Padova, Padua, Italy
- B Kopp
- Department of Neurology, Hannover Medical School, 30625, Hannover, Germany
- Antonino Vallesi
- Department of Neuroscience, University of Padova, 35121, Padua, Italy.
- Padova Neuroscience Center, University of Padova, Padua, Italy.
5
Soltani A, Koechlin E. Computational models of adaptive behavior and prefrontal cortex. Neuropsychopharmacology 2022; 47:58-71. [PMID: 34389808] [PMCID: PMC8617006] [DOI: 10.1038/s41386-021-01123-1]
Abstract
The real world is uncertain and ever changing, constantly presenting new sets of behavioral options. To attain the flexibility required to tackle these challenges successfully, most mammalian brains are equipped with certain computational abilities that rely on the prefrontal cortex (PFC). By examining learning in terms of internal models associating stimuli, actions, and outcomes, we argue here that adaptive behavior relies on specific interactions between multiple systems including: (1) selective models learning stimulus-action associations through rewards; (2) predictive models learning stimulus- and/or action-outcome associations through statistical inferences anticipating behavioral outcomes; and (3) contextual models learning external cues associated with latent states of the environment. Critically, the PFC combines these internal models by forming task sets to drive behavior and, moreover, constantly evaluates the reliability of actor task sets in predicting external contingencies to switch between task sets or create new ones. We review different models of adaptive behavior to demonstrate how their components map onto this unifying framework and specific PFC regions. Finally, we discuss how our framework may help to better understand the neural computations and the cognitive architecture of PFC regions guiding adaptive behavior.
Affiliation(s)
- Alireza Soltani
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA.
- Etienne Koechlin
- Institut National de la Sante et de la Recherche Medicale, Universite Pierre et Marie Curie, Ecole Normale Superieure, Paris, France.
6
Relative salience signaling within a thalamo-orbitofrontal circuit governs learning rate. Curr Biol 2021; 31:5176-5191.e5. [PMID: 34637750] [DOI: 10.1016/j.cub.2021.09.037]
Abstract
Learning to predict rewards is essential for the sustained fitness of animals. Contemporary views suggest that such learning is driven by a reward prediction error (RPE)-the difference between received and predicted rewards. The magnitude of learning induced by an RPE is proportional to the product of the RPE and a learning rate. Here we demonstrate using two-photon calcium imaging and optogenetics in mice that certain functionally distinct subpopulations of ventral/medial orbitofrontal cortex (vmOFC) neurons signal learning rate control. Consistent with learning rate control, trial-by-trial fluctuations in vmOFC activity positively correlate with behavioral updating when the RPE is positive, and negatively correlate with behavioral updating when the RPE is negative. Learning rate is affected by many variables including the salience of a reward. We found that the average reward response of these neurons signals the relative salience of a reward, because it decreases after reward prediction learning or the introduction of another highly salient aversive stimulus. The relative salience signaling in vmOFC is sculpted by medial thalamic inputs. These results support emerging theoretical views that prefrontal cortex encodes and controls learning parameters.
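The reported decline of the average reward response over learning is consistent with a response that tracks the unsigned RPE: as the reward becomes predicted, the error (and hence the salience-like signal) shrinks. A toy session, with illustrative parameters:

```python
# Hedged sketch: a salience-like response proportional to |RPE| falls
# as prediction learning proceeds. alpha and reward are illustrative.
def salience_session(n_trials, alpha=0.3, reward=1.0):
    v, responses = 0.0, []
    for _ in range(n_trials):
        rpe = reward - v
        responses.append(abs(rpe))   # response tracks unsigned RPE
        v += alpha * rpe             # standard value learning
    return responses

responses = salience_session(20)     # monotonically declining response
```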
7
Foucault C, Meyniel F. Gated recurrence enables simple and accurate sequence prediction in stochastic, changing, and structured environments. eLife 2021; 10:e71801. [PMID: 34854377] [PMCID: PMC8735865] [DOI: 10.7554/elife.71801]
Abstract
From decision making to perception to language, predicting what is coming next is crucial. It is also challenging in stochastic, changing, and structured environments; yet the brain makes accurate predictions in many situations. What computational architecture could enable this feat? Bayesian inference makes optimal predictions but is prohibitively difficult to compute. Here, we show that a specific recurrent neural network architecture enables simple and accurate solutions in several environments. This architecture relies on three mechanisms: gating, lateral connections, and recurrent weight training. Like the optimal solution and the human brain, such networks develop internal representations of their changing environment (including estimates of the environment’s latent variables and the precision of these estimates), leverage multiple levels of latent structure, and adapt their effective learning rate to changes without changing their connection weights. Being ubiquitous in the brain, gated recurrence could therefore serve as a generic building block to predict in real-life environments.
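The key mechanism, an effective learning rate set by activity-dependent gating rather than by weight changes, can be sketched with a single gated running estimate. The error-driven gate below is an illustrative stand-in for the trained gating the paper studies, not its network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hedged sketch: a gate (as in a GRU update gate) sets the effective
# learning rate of a probability estimate on each step, so adaptation
# happens through activity with fixed "weights" (here, fixed constants).
def gated_estimate(observations, k=4.0, bias=-2.0):
    p = 0.5
    for x in observations:
        err = x - p
        gate = sigmoid(k * abs(err) + bias)  # large errors open the gate
        p = p + gate * err                   # gate acts as learning rate
    return p

p_stable = gated_estimate([1.0] * 50)              # converges near 1
p_switch = gated_estimate([1.0] * 50 + [0.0] * 5)  # re-adapts fast
```

When the environment switches, the large errors reopen the gate, reproducing the adaptive effective learning rate described above.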
Affiliation(s)
- Cédric Foucault
- INSERM, CEA, Université Paris-Saclay, Gif sur Yvette, France
8
Abstract
We live in a world that changes on many timescales. To learn and make decisions appropriately, the human brain has evolved to integrate various types of information, such as sensory evidence and reward feedback, on multiple timescales. This is reflected in cortical hierarchies of timescales consisting of heterogeneous neuronal activities and expression of genes related to neurotransmitters critical for learning. We review the recent findings on how timescales of sensory and reward integration are affected by the temporal properties of sensory and reward signals in the environment. Despite existing evidence linking behavioral and neuronal timescales, future studies must examine how neural computations at multiple timescales are adjusted and combined to influence behavior flexibly.
Affiliation(s)
- Alireza Soltani
- Department of Psychological and Brain Sciences, Dartmouth College, Moore Hall, 3 Maynard St, Hanover, NH 03755
- John D. Murray
- Department of Psychiatry, Yale School of Medicine, 300 George Street, New Haven, CT 06511
- Hyojung Seo
- Department of Psychiatry, Yale School of Medicine, 300 George Street, New Haven, CT 06511
- Daeyeol Lee
- The Zanvyl Krieger Mind/Brain Institute, Department of Neuroscience, Department of Psychological Sciences, Kavli Neuroscience Discovery Institute, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218
9
Inglis JB, Valentin VV, Ashby FG. Modulation of Dopamine for Adaptive Learning: A Neurocomputational Model. Comput Brain Behav 2021; 4:34-52. [PMID: 34151186] [PMCID: PMC8210637] [DOI: 10.1007/s42113-020-00083-x]
Abstract
There have been many proposals that learning rates in the brain are adaptive, in the sense that they increase or decrease depending on environmental conditions. The majority of these models are abstract and make no attempt to describe the neural circuitry that implements the proposed computations. This article describes a biologically detailed computational model that overcomes this shortcoming. Specifically, we propose a neural circuit that implements adaptive learning rates by modulating the gain on the dopamine response to reward prediction errors, and we model activity within this circuit at the level of spiking neurons. The model generates a dopamine signal that depends on the size of the tonically active dopamine neuron population and the phasic spike rate. The model was tested successfully against results from two single-neuron recording studies and a fast-scan cyclic voltammetry study. We conclude by discussing the general applicability of the model to dopamine mediated tasks that transcend the experimental phenomena it was initially designed to address.
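The model's central quantity, a dopamine signal that scales the RPE by a gain depending on the size of the tonically active dopamine population, can be caricatured in a few lines. The linear form and all numbers are illustrative; they stand in for, and do not reproduce, the spiking-neuron circuit.

```python
# Hedged sketch: adaptive learning rates via gain modulation of the
# dopamine response to RPEs. The linear gain is an illustrative
# simplification of the population/phasic-rate mechanism described.
def dopamine_signal(rpe, n_tonic_active, phasic_gain=0.01):
    return phasic_gain * n_tonic_active * rpe

def adaptive_value_update(v, reward, n_tonic_active):
    rpe = reward - v
    return v + dopamine_signal(rpe, n_tonic_active)

# Same RPE, different gain: a larger tonically active population yields
# a bigger dopamine signal and thus a faster value update.
v_stable   = adaptive_value_update(0.0, 1.0, n_tonic_active=10)
v_volatile = adaptive_value_update(0.0, 1.0, n_tonic_active=50)
```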
Affiliation(s)
- Jeffrey B Inglis
- Interdepartmental Graduate Program in Dynamical Neuroscience, University of California, Santa Barbara
- Vivian V Valentin
- Department of Psychological & Brain Sciences, University of California, Santa Barbara
- F Gregory Ashby
- Department of Psychological & Brain Sciences, University of California, Santa Barbara
10
Olasagasti I, Giraud AL. Integrating prediction errors at two time scales permits rapid recalibration of speech sound categories. eLife 2020; 9:e44516. [PMID: 32223894] [PMCID: PMC7217692] [DOI: 10.7554/elife.44516]
Abstract
Speech perception presumably arises from internal models of how specific sensory features are associated with speech sounds. These features change constantly (e.g. different speakers, articulation modes etc.), and listeners need to recalibrate their internal models by appropriately weighing new versus old evidence. Models of speech recalibration classically ignore this volatility. The effect of volatility in tasks where sensory cues were associated with arbitrary experimenter-defined categories were well described by models that continuously adapt the learning rate while keeping a single representation of the category. Using neurocomputational modelling we show that recalibration of natural speech sound categories is better described by representing the latter at different time scales. We illustrate our proposal by modeling fast recalibration of speech sounds after experiencing the McGurk effect. We propose that working representations of speech categories are driven both by their current environment and their long-term memory representations.

People can distinguish words or syllables even though they may sound different with every speaker. This striking ability reflects the fact that our brain is continually modifying the way we recognise and interpret the spoken word based on what we have heard before, by comparing past experience with the most recent one to update expectations. This phenomenon also occurs in the McGurk effect: an auditory illusion in which someone hears one syllable but sees a person saying another syllable and ends up perceiving a third distinct sound. Abstract models, which provide a functional rather than a mechanistic description of what the brain does, can test how humans use expectations and prior knowledge to interpret the information delivered by the senses at any given moment. Olasagasti and Giraud have now built an abstract model of how brains recalibrate perception of natural speech sounds. By fitting the model with existing experimental data using the McGurk effect, the results suggest that, rather than using a single sound representation that is adjusted with each sensory experience, the brain recalibrates sounds at two different timescales. Over and above slow "procedural" learning, the findings show that there is also rapid recalibration of how different sounds are interpreted. This working representation of speech enables adaptation to changing or noisy environments and illustrates that the process is far more dynamic and flexible than previously thought.
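A two-timescale recalibration scheme of the kind proposed can be sketched with a fast working representation that is anchored to a slowly changing long-term one. The rates and the anchoring term are illustrative, not the paper's fitted neurocomputational model.

```python
# Hedged sketch: a category's working representation tracks recent
# evidence quickly, while its long-term representation changes slowly
# and pulls the working value back toward memory. Rates are illustrative.
def recalibrate(working, longterm, evidence, fast=0.5, slow=0.02, pull=0.1):
    working = working + fast * (evidence - working)     # rapid recalibration
    working = working + pull * (longterm - working)     # anchor to memory
    longterm = longterm + slow * (evidence - longterm)  # slow learning
    return working, longterm

working, longterm = 0.0, 0.0
for _ in range(10):   # brief exposure to shifted evidence (e.g. McGurk-like)
    working, longterm = recalibrate(working, longterm, evidence=1.0)
```

After a short exposure, the working representation has shifted substantially while long-term memory has barely moved, so perception can snap back once the environment reverts.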
Affiliation(s)
- Itsaso Olasagasti
- Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
- Anne-Lise Giraud
- Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
11
Soltani A, Izquierdo A. Adaptive learning under expected and unexpected uncertainty. Nat Rev Neurosci 2020; 20:635-644. [PMID: 31147631] [DOI: 10.1038/s41583-019-0180-y]
Abstract
The outcome of a decision is often uncertain, and outcomes can vary over repeated decisions. Whether decision outcomes should substantially affect behaviour and learning depends on whether they are representative of a typically experienced range of outcomes or signal a change in the reward environment. Successful learning and decision-making therefore require the ability to estimate expected uncertainty (related to the variability of outcomes) and unexpected uncertainty (related to the variability of the environment). Understanding the bases and effects of these two types of uncertainty and the interactions between them - at the computational and the neural level - is crucial for understanding adaptive learning. Here, we examine computational models and experimental findings to distil computational principles and neural mechanisms for adaptive learning under uncertainty.
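The core computational principle, that outcomes within the typically experienced range (expected uncertainty) should barely move estimates while outliers signaling environmental change (unexpected uncertainty) should move them a lot, can be sketched with a variance-gated learning rate. The threshold and rates are illustrative, not a model from the review.

```python
# Hedged sketch: learning rate gated by whether the error is within the
# expected spread of outcomes. Constants are illustrative.
def adaptive_update(mean, var, x, base_alpha=0.05, var_alpha=0.1):
    err = x - mean
    # Errors inside ~2 SD get the base rate; far-outside errors (likely
    # an environmental change) trigger a much higher rate.
    alpha = base_alpha if err**2 <= 4 * var else 0.5
    mean = mean + alpha * err
    var = var + var_alpha * (err**2 - var)   # track expected uncertainty
    return mean, var

m1, v1 = adaptive_update(0.0, 1.0, 0.5)    # typical outcome: small update
m2, v2 = adaptive_update(0.0, 1.0, 5.0)    # outlier: large update
```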
Affiliation(s)
- Alireza Soltani
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA.
- Alicia Izquierdo
- Department of Psychology, The Brain Research Institute, University of California, Los Angeles, Los Angeles, CA, USA.
12
Decodability of Reward Learning Signals Predicts Mood Fluctuations. Curr Biol 2019; 28:1433-1439.e7. [PMID: 29706512] [PMCID: PMC5954908] [DOI: 10.1016/j.cub.2018.03.038]
Abstract
Our mood often fluctuates without warning. Recent accounts propose that these fluctuations might be preceded by changes in how we process reward. According to this view, the degree to which reward improves our mood reflects not only characteristics of the reward itself (e.g., its magnitude) but also how receptive to reward we happen to be. Differences in receptivity to reward have been suggested to play an important role in the emergence of mood episodes in psychiatric disorders [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. However, despite substantial theory, the relationship between reward processing and daily fluctuations of mood has yet to be tested directly. In particular, it is unclear whether the extent to which people respond to reward changes from day to day and whether such changes are followed by corresponding shifts in mood. Here, we use a novel mobile-phone platform with dense data sampling and wearable heart-rate and electroencephalographic sensors to examine mood and reward processing over an extended period of one week. Subjects regularly performed a trial-and-error choice task in which different choices were probabilistically rewarded. Subjects’ choices revealed two complementary learning processes, one fast and one slow. Reward prediction errors [17, 18] indicative of these two processes were decodable from subjects’ physiological responses. Strikingly, more accurate decodability of prediction-error signals reflective of the fast process predicted improvement in subjects’ mood several hours later, whereas more accurate decodability of the slow process’ signals predicted better mood a whole day later. We conclude that real-life mood fluctuations follow changes in responsivity to reward at multiple timescales. 
- Choices in a week-long reward learning task reveal slow- and fast-learning processes
- Both processes' prediction errors can be decoded from EEG and heart-rate responses
- Greater fast-process decodability predicts positive mood change a few hours later
- Greater slow-process decodability predicts positive mood change one day later
14
Deviation from the matching law reflects an optimal strategy involving learning over multiple timescales. Nat Commun 2019; 10:1466. [PMID: 30931937] [PMCID: PMC6443814] [DOI: 10.1038/s41467-019-09388-3]
Abstract
Behavior deviating from our normative expectations often appears irrational. For example, even though behavior following the so-called matching law can maximize reward in a stationary foraging task, actual behavior commonly deviates from matching. Such behavioral deviations are interpreted as a failure of the subject; here, however, we instead suggest that they reflect an adaptive strategy, suitable for uncertain, non-stationary environments. To test this, we analyzed the behavior of primates performing a dynamic foraging task. In such a nonstationary environment, learning on both fast and slow timescales is beneficial: fast learning allows the animal to react to sudden changes, at the price of large fluctuations (variance) in the estimates of task-relevant variables. Slow learning reduces the fluctuations but introduces a bias that causes systematic behavioral deviations. Our behavioral analysis shows that the animals solved this bias-variance tradeoff by combining learning on both fast and slow timescales, suggesting that learning on multiple timescales can be a biologically plausible mechanism for optimizing decisions under uncertainty.

Recent experience can only provide limited information to guide decisions in a volatile environment. Here, the authors report that the choices made by a monkey in a dynamic foraging task can be better explained by a model that combines learning on both fast and slow timescales.
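The bias-variance tradeoff described above can be illustrated by mixing two exponential trackers of reward rate. The switch point, learning rates, and the equal mixing weight are illustrative choices, not the fitted model.

```python
# Hedged sketch: a fast tracker reacts to change but is noisy; a slow
# tracker is stable but lags. Combining them trades off bias and
# variance. All parameters are illustrative.
def track(rewards, alpha):
    est, out = 0.0, []
    for r in rewards:
        est += alpha * (r - est)    # exponential (leaky) average
        out.append(est)
    return out

rewards = [0.2] * 50 + [0.8] * 50   # environment switches mid-session
fast = track(rewards, 0.5)
slow = track(rewards, 0.05)
mixed = [0.5 * f + 0.5 * s for f, s in zip(fast, slow)]
```

Right after the switch, the fast tracker has already caught up while the slow one still lags; with noisy rewards the slow tracker would in turn damp the fast one's fluctuations.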
15
Heilbron M, Meyniel F. Confidence resets reveal hierarchical adaptive learning in humans. PLoS Comput Biol 2019; 15:e1006972. [PMID: 30964861] [PMCID: PMC6474633] [DOI: 10.1371/journal.pcbi.1006972]
Abstract
Hierarchical processing is pervasive in the brain, but its computational significance for learning under uncertainty is disputed. On the one hand, hierarchical models provide an optimal framework and are becoming increasingly popular to study cognition. On the other hand, non-hierarchical (flat) models remain influential and can learn efficiently, even in uncertain and changing environments. Here, we show that previously proposed hallmarks of hierarchical learning, which relied on reports of learned quantities or choices in simple experiments, are insufficient to categorically distinguish hierarchical from flat models. Instead, we present a novel test which leverages a more complex task, whose hierarchical structure allows generalization between different statistics tracked in parallel. We use reports of confidence to quantitatively and qualitatively arbitrate between the two accounts of learning. Our results support the hierarchical learning framework, and demonstrate how confidence can be a useful metric in learning theory.
Affiliation(s)
- Micha Heilbron
- Cognitive Neuroimaging Unit / NeuroSpin center / Institute for Life Sciences Frédéric Joliot / Fundamental Research Division / Commissariat à l'Energie Atomique et aux énergies alternatives; INSERM, Université Paris-Sud; Université Paris-Saclay; Gif-sur-Yvette, France
- Florent Meyniel
- Cognitive Neuroimaging Unit / NeuroSpin center / Institute for Life Sciences Frédéric Joliot / Fundamental Research Division / Commissariat à l'Energie Atomique et aux énergies alternatives; INSERM, Université Paris-Sud; Université Paris-Saclay; Gif-sur-Yvette, France
16
Muller TH, Mars RB, Behrens TE, O'Reilly JX. Control of entropy in neural models of environmental state. eLife 2019; 8:e39404. [PMID: 30816090] [PMCID: PMC6395063] [DOI: 10.7554/elife.39404]
Abstract
Humans and animals construct internal models of their environment in order to select appropriate courses of action. The representation of uncertainty about the current state of the environment is a key feature of these models that controls the rate of learning as well as directly affecting choice behaviour. To maintain flexibility, given that uncertainty naturally decreases over time, most theoretical inference models include a dedicated mechanism to drive up model uncertainty. Here we probe the long-standing hypothesis that noradrenaline is involved in determining the uncertainty, or entropy, and thus flexibility, of neural models. Pupil diameter, which indexes neuromodulatory state including noradrenaline release, predicted increases (but not decreases) in entropy in a neural state model encoded in human medial orbitofrontal cortex, as measured using multivariate functional MRI. Activity in anterior cingulate cortex predicted pupil diameter. These results provide evidence for top-down, neuromodulatory control of entropy in neural state models.
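The notion of driving up the entropy of a state model can be sketched by mixing a belief distribution with a uniform one, which necessarily makes the belief less certain. This mixing is an illustrative stand-in for the top-down neuromodulatory mechanism probed in the study, not a claim about its implementation.

```python
import numpy as np

def entropy(p):
    # Shannon entropy of a discrete belief over environmental states
    p = np.asarray(p, float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def drive_up_entropy(belief, gamma):
    # Mix the belief with a uniform distribution: a simple stand-in for
    # a signal that restores uncertainty (and hence flexibility).
    belief = np.asarray(belief, float)
    uniform = np.full(belief.size, 1.0 / belief.size)
    return (1 - gamma) * belief + gamma * uniform

belief = np.array([0.9, 0.05, 0.05])   # confident state estimate
```

Higher-entropy beliefs assign more probability to alternative states, which in inference models translates into a higher learning rate for new evidence.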
Affiliation(s)
- Timothy H Muller
- Wellcome Centre for Integrative Neuroimaging, Centre for Functional Magnetic Resonance Imaging of the Brain, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
- Rogier B Mars
- Wellcome Centre for Integrative Neuroimaging, Centre for Functional Magnetic Resonance Imaging of the Brain, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Timothy E Behrens
- Wellcome Centre for Integrative Neuroimaging, Centre for Functional Magnetic Resonance Imaging of the Brain, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
- Jill X O'Reilly
- Wellcome Centre for Integrative Neuroimaging, Centre for Functional Magnetic Resonance Imaging of the Brain, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
17
Iigaya K, Fonseca MS, Murakami M, Mainen ZF, Dayan P. An effect of serotonergic stimulation on learning rates for rewards apparent after long intertrial intervals. Nat Commun 2018; 9:2477. [PMID: 29946069] [PMCID: PMC6018802] [DOI: 10.1038/s41467-018-04840-2]
Abstract
Serotonin has widespread, but computationally obscure, modulatory effects on learning and cognition. Here, we studied the impact of optogenetic stimulation of dorsal raphe serotonin neurons in mice performing a non-stationary, reward-driven decision-making task. Animals showed two distinct choice strategies. Choices after short inter-trial-intervals (ITIs) depended only on the last trial outcome and followed a win-stay-lose-switch pattern. In contrast, choices after long ITIs reflected outcome history over multiple trials, as described by reinforcement learning models. We found that optogenetic stimulation during a trial significantly boosted the rate of learning that occurred due to the outcome of that trial, but these effects were only exhibited on choices after long ITIs. This suggests that serotonin neurons modulate reinforcement learning rates, and that this influence is masked by alternate, unaffected, decision mechanisms. These results provide insight into the role of serotonin in treating psychiatric disorders, particularly its modulation of neural plasticity and learning. Serotonin (5-HT) plays many important roles in reward, punishment, patience and beyond, and optogenetic stimulation of 5-HT neurons has not crisply parsed them. The authors report a novel analysis of a reward-based decision-making experiment, and show that 5-HT stimulation increases the learning rate, but only on a select subset of choices.
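The two choice strategies described above can be caricatured in a few lines. The multiplicative `boost` applied on stimulated trials is an illustrative assumption, not the paper's fitted model:

```python
def rl_update(value, reward, alpha=0.1, stim=False, boost=2.0):
    """Delta-rule value update; optogenetic stimulation is modelled here
    (as an assumption) as a multiplicative boost to the learning rate."""
    lr = alpha * boost if stim else alpha
    return value + lr * (reward - value)

def wsls_choice(last_choice, last_rewarded):
    """Win-stay-lose-switch between two options (0 and 1),
    as observed after short inter-trial intervals."""
    return last_choice if last_rewarded else 1 - last_choice
```

On this caricature, choices after long ITIs would track the slowly integrated `value`, while short-ITI choices follow `wsls_choice` and are therefore untouched by any learning-rate boost, masking the stimulation effect.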
Collapse
Affiliation(s)
- Kiyohito Iigaya
- Gatsby Computational Neuroscience Unit, University College London, 25 Howland Street, London, W1T 4JG, UK; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Russell Square House, 10-12 Russell Square, London, WC1B 5EH, UK; Division of Humanities and Social Sciences, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA.
| | - Madalena S Fonseca
- Champalimaud Research, Champalimaud Centre for the Unknown, Avenida Brasília, 1400-038, Lisbon, Portugal
| | - Masayoshi Murakami
- Champalimaud Research, Champalimaud Centre for the Unknown, Avenida Brasília, 1400-038, Lisbon, Portugal
| | - Zachary F Mainen
- Champalimaud Research, Champalimaud Centre for the Unknown, Avenida Brasília, 1400-038, Lisbon, Portugal
| | - Peter Dayan
- Gatsby Computational Neuroscience Unit, University College London, 25 Howland Street, London, W1T 4JG, UK; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Russell Square House, 10-12 Russell Square, London, WC1B 5EH, UK
| |
Collapse
|
18
|
Wang JX, Kurth-Nelson Z, Kumaran D, Tirumala D, Soyer H, Leibo JZ, Hassabis D, Botvinick M. Prefrontal cortex as a meta-reinforcement learning system. Nat Neurosci 2018; 21:860-868. [DOI: 10.1038/s41593-018-0147-8] [Citation(s) in RCA: 258] [Impact Index Per Article: 43.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2017] [Accepted: 04/05/2018] [Indexed: 11/09/2022]
|
19
|
Faraji M, Preuschoff K, Gerstner W. Balancing New against Old Information: The Role of Puzzlement Surprise in Learning. Neural Comput 2017; 30:34-83. [PMID: 29064784 DOI: 10.1162/neco_a_01025] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Surprise describes a range of phenomena from unexpected events to behavioral responses. We propose a novel measure of surprise and use it for surprise-driven learning. Our surprise measure takes into account data likelihood as well as the degree of commitment to a belief via the entropy of the belief distribution. We find that surprise-minimizing learning dynamically adjusts the balance between new and old information without requiring knowledge of the temporal statistics of the environment. We apply our framework to a dynamic decision-making task and a maze exploration task. Our surprise-minimizing framework is suitable for learning in complex environments, even if the environment undergoes gradual or sudden changes, and it could eventually provide a framework to study the behavior of humans and animals as they encounter surprising events.
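A toy surprise measure that, like the one proposed here, grows with both the unlikeliness of the data and the commitment to the current belief (this simplification is not the paper's exact confidence-corrected definition):

```python
import math

def toy_surprise(likelihoods, belief):
    """Expected negative log-likelihood of the observation under the belief,
    amplified by commitment (1 minus the normalised belief entropy)."""
    nll = -sum(b * math.log(l) for b, l in zip(belief, likelihoods) if b > 0)
    ent = -sum(b * math.log(b) for b in belief if b > 0)
    commitment = 1.0 - ent / math.log(len(belief))  # 0 = flat, 1 = certain
    return nll * (1.0 + commitment)
```

A confidently wrong belief is more surprised by the same observation than a flat one, which is the property that lets surprise gate how strongly new information overrides old.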
Collapse
Affiliation(s)
- Mohammadjavad Faraji
- School of Computer and Communication Sciences and School of Life Sciences, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, 1015 Lausanne EPFL, Switzerland
| | - Kerstin Preuschoff
- Geneva Finance Research Institute and Center for Affective Sciences, University of Geneva, 1211 Geneva, Switzerland
| | - Wulfram Gerstner
- School of Computer and Communication Sciences and School of Life Sciences, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, 1015 Lausanne EPFL, Switzerland
| |
Collapse
|
20
|
Kolling N, Akam T. (Reinforcement?) Learning to forage optimally. Curr Opin Neurobiol 2017; 46:162-169. [PMID: 28918312 DOI: 10.1016/j.conb.2017.08.008] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2017] [Revised: 08/06/2017] [Accepted: 08/17/2017] [Indexed: 11/24/2022]
Abstract
Foraging effectively is critical to the survival of all animals and this imperative is thought to have profoundly shaped brain evolution. Decisions made by foraging animals often approximate optimal strategies, but the learning and decision mechanisms generating these choices remain poorly understood. Recent work with laboratory foraging tasks in humans suggests that behaviour is poorly explained by model-free reinforcement learning: simple heuristic strategies better describe behaviour in some tasks, while others show evidence of prospective prediction of the future state of the environment. We suggest that model-based average reward reinforcement learning may provide a common framework for understanding these apparently divergent foraging strategies.
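The average-reward idea can be sketched in a patch-leaving setting, where a marginal-value-theorem-style heuristic leaves the patch once its yield drops below a learned estimate of the environment's average reward rate. This toy is offered only as an illustration of the framework the review advocates:

```python
def leave_time(harvests, alpha=0.1, rho=0.5):
    """Harvest a depleting patch; track the long-run average reward rho
    with a delta rule and leave once the current harvest falls below it."""
    for t, r in enumerate(harvests):
        if r < rho:
            return t              # leave the patch at this step
        rho += alpha * (r - rho)  # update the average-reward estimate
    return len(harvests)

# A richer environment (higher rho) implies earlier patch-leaving.
```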
Collapse
Affiliation(s)
- Nils Kolling
- Department of Experimental Psychology, University of Oxford, United Kingdom
| | - Thomas Akam
- Department of Experimental Psychology, University of Oxford, United Kingdom; Champalimaud Neuroscience Program, Champalimaud Center for the Unknown, Portugal.
| |
Collapse
|
21
|
Farashahi S, Donahue CH, Khorsand P, Seo H, Lee D, Soltani A. Metaplasticity as a Neural Substrate for Adaptive Learning and Choice under Uncertainty. Neuron 2017; 94:401-414.e6. [PMID: 28426971 DOI: 10.1016/j.neuron.2017.03.044] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2016] [Revised: 09/02/2016] [Accepted: 03/29/2017] [Indexed: 10/19/2022]
Abstract
Value-based decision making often involves integration of reward outcomes over time, but this becomes considerably more challenging if reward assignments on alternative options are probabilistic and non-stationary. Despite the existence of various models for optimally integrating reward under uncertainty, the underlying neural mechanisms are still unknown. Here we propose that reward-dependent metaplasticity (RDMP) can provide a plausible mechanism for both integration of reward under uncertainty and estimation of uncertainty itself. We show that a model based on RDMP can robustly perform the probabilistic reversal learning task via dynamic adjustment of learning based on reward feedback, while changes in its activity signal unexpected uncertainty. The model predicts time-dependent and choice-specific learning rates that strongly depend on reward history. Key predictions from this model were confirmed with behavioral data from non-human primates. Overall, our results suggest that metaplasticity can provide a neural substrate for adaptive learning and choice under uncertainty.
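The qualitative behaviour described here, learning rates that rise with unexpected uncertainty, can be sketched with a surprise-gated delta rule. Note that this toy is not the RDMP synapse model itself, only the adaptive-learning signature it produces:

```python
def surprise_gated_learning(outcomes, alpha=0.2, eta=0.5, v=0.5):
    """Track a reward probability estimate v; large unsigned prediction
    errors transiently raise the effective learning rate, so the estimate
    re-adapts quickly after a reversal."""
    values = []
    for r in outcomes:
        pe = r - v
        lr = (1 - eta) * alpha + eta * abs(pe)  # surprise-gated rate
        v += lr * pe
        values.append(v)
    return values
```

After a reversal (a block of rewarded trials followed by omissions), the first big prediction error inflates the learning rate, so the value estimate collapses within a trial or two rather than decaying at the baseline rate.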
Collapse
Affiliation(s)
- Shiva Farashahi
- Department of Psychological and Brain Sciences, Dartmouth College, NH 03755, USA
| | - Christopher H Donahue
- The Gladstone Institutes, San Francisco, CA 94158, USA; Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
| | - Peyman Khorsand
- Department of Psychological and Brain Sciences, Dartmouth College, NH 03755, USA
| | - Hyojung Seo
- Department of Psychiatry, Yale School of Medicine, New Haven, CT 06511, USA
| | - Daeyeol Lee
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA; Department of Psychiatry, Yale School of Medicine, New Haven, CT 06511, USA; Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA; Department of Psychology, Yale University, New Haven, CT 06520, USA
| | - Alireza Soltani
- Department of Psychological and Brain Sciences, Dartmouth College, NH 03755, USA.
| |
Collapse
|