1
Etani T, Miura A, Kawase S, Fujii S, Keller PE, Vuust P, Kudo K. A review of psychological and neuroscientific research on musical groove. Neurosci Biobehav Rev 2024; 158:105522. PMID: 38141692. DOI: 10.1016/j.neubiorev.2023.105522.
Abstract
When listening to music, we naturally move our bodies rhythmically to the beat, which can be pleasurable and difficult to resist. This pleasurable sensation of wanting to move the body to music has been called "groove." Following pioneering humanities research, psychological and neuroscientific studies have provided insights into associated musical features, behavioral responses, phenomenological aspects, and brain structural and functional correlates of the groove experience. Groove research has advanced the field of music science and, more generally, informed our understanding of bidirectional links between perception and action and of the role of the motor system in prediction. Activity in motor and reward-related brain networks during music listening is associated with the groove experience, and this neural activity is linked to temporal prediction and learning. This article reviews research on groove as a psychological phenomenon with neurophysiological correlates that link musical rhythm perception, sensorimotor prediction, and reward processing. Promising future research directions range from elucidating specific neural mechanisms to exploring clinical applications and socio-cultural implications of groove.
Affiliation(s)
- Takahide Etani
- School of Medicine, College of Medical, Pharmaceutical, and Health, Kanazawa University, Kanazawa, Japan; Graduate School of Media and Governance, Keio University, Fujisawa, Japan; Advanced Research Center for Human Sciences, Waseda University, Tokorozawa, Japan.
- Akito Miura
- Faculty of Human Sciences, Waseda University, Tokorozawa, Japan
- Satoshi Kawase
- The Faculty of Psychology, Kobe Gakuin University, Kobe, Japan
- Shinya Fujii
- Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan
- Peter E Keller
- Center for Music in the Brain, Aarhus University, Aarhus, Denmark/The Royal Academy of Music Aarhus/Aalborg, Denmark; The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, Australia
- Peter Vuust
- Center for Music in the Brain, Aarhus University, Aarhus, Denmark/The Royal Academy of Music Aarhus/Aalborg, Denmark
- Kazutoshi Kudo
- Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
2
Krausz TA, Comrie AE, Kahn AE, Frank LM, Daw ND, Berke JD. Dual credit assignment processes underlie dopamine signals in a complex spatial environment. Neuron 2023; 111:3465-3478.e7. PMID: 37611585. PMCID: PMC10841332. DOI: 10.1016/j.neuron.2023.07.017.
Abstract
Animals frequently make decisions based on expectations of future reward ("values"). Values are updated by ongoing experience: places and choices that result in reward are assigned greater value. Yet, the specific algorithms used by the brain for such credit assignment remain unclear. We monitored accumbens dopamine as rats foraged for rewards in a complex, changing environment. We observed brief dopamine pulses both at reward receipt (scaling with prediction error) and at novel path opportunities. Dopamine also ramped up as rats ran toward reward ports, in proportion to the value at each location. By examining the evolution of these dopamine place-value signals, we found evidence for two distinct update processes: progressive propagation of value along taken paths, as in temporal difference learning, and inference of value throughout the maze, using internal models. Our results demonstrate that within rich, naturalistic environments dopamine conveys place values that are updated via multiple, complementary learning algorithms.
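The temporal-difference half of the dual update described above can be sketched in a few lines. This is a generic tabular TD(0) illustration, not the paper's model: the linear "track," learning rate, discount, and reward magnitude are all invented for the example.

```python
# Hedged sketch: tabular TD(0) on a linear "track" of 5 states, reward at the end.
# Illustrates the "progressive propagation of value along taken paths" that the
# abstract attributes to temporal-difference learning; all parameters are
# illustrative assumptions, not the paper's task or fitted values.

N_STATES = 5        # positions along the path to the reward port
ALPHA = 0.5         # learning rate (assumed)
GAMMA = 0.9         # temporal discount (assumed)
REWARD = 1.0        # reward delivered on reaching the final state

V = [0.0] * N_STATES  # place values, initially zero

def run_traversal(V):
    """One run from start to reward port, applying the TD(0) update at each step."""
    for s in range(N_STATES - 1):
        r = REWARD if s + 1 == N_STATES - 1 else 0.0
        rpe = r + GAMMA * V[s + 1] - V[s]   # reward prediction error (dopamine-like)
        V[s] += ALPHA * rpe
    return V

# Value creeps backward from the reward, one state per traversal at first:
for trial in range(3):
    run_traversal(V)
    print(f"trial {trial + 1}: " + ", ".join(f"{v:.2f}" for v in V))
```

Note how only the state adjacent to reward gains value on the first traversal; model-based inference, the second process the paper reports, would instead update locations the animal never visited.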
Affiliation(s)
- Timothy A Krausz
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Alison E Comrie
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Ari E Kahn
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Loren M Frank
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA; Department of Physiology, University of California, San Francisco, San Francisco, CA 94158, USA
- Nathaniel D Daw
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Joshua D Berke
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA; Kavli Institute for Fundamental Neuroscience and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Neurology and Department of Psychiatry and Behavioral Science, University of California, San Francisco, San Francisco, CA 94158, USA
3
Hennig JA, Romero Pinto SA, Yamaguchi T, Linderman SW, Uchida N, Gershman SJ. Emergence of belief-like representations through reinforcement learning. PLoS Comput Biol 2023; 19:e1011067. PMID: 37695776. PMCID: PMC10513382. DOI: 10.1371/journal.pcbi.1011067.
Abstract
To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs"-optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
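The "beliefs" the abstract refers to are recursive Bayesian estimates of hidden state. A minimal sketch of that computation follows; the two-state world, transition matrix, and observation likelihoods are invented for illustration and are not the paper's task.

```python
# Hedged sketch of the "belief" computation the abstract contrasts with the RNN:
# one step of Bayesian filtering in a two-state hidden Markov world.
# States, T, and likelihoods below are assumptions made up for this example.

def belief_update(b, obs_likelihood, T):
    """One Bayesian filtering step: predict with T, then correct with the likelihood."""
    n = len(b)
    predicted = [sum(T[s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    unnorm = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hidden states: 0 = "cue period", 1 = "reward may arrive". T[s][s2] = P(s2 | s).
T = [[0.8, 0.2],
     [0.0, 1.0]]
# P(observe a click | state): clicks are more likely in the reward state (assumed).
lik_click = [0.1, 0.6]

b = [1.0, 0.0]                    # start certain we are in the cue period
for step in range(3):
    b = belief_update(b, lik_click, T)
    print(f"step {step + 1}: P(reward state) = {b[1]:.3f}")
```

The paper's point is that an RNN trained only to predict value can come to encode this quantity implicitly, without ever executing an update like the one above.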
Affiliation(s)
- Jay A. Hennig
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Sandra A. Romero Pinto
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, United States of America
- Takahiro Yamaguchi
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Future Research Department, Toyota Research Institute of North America, Toyota Motor North America, Ann Arbor, Michigan, United States of America
- Scott W. Linderman
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Samuel J. Gershman
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
4
Krausz TA, Comrie AE, Frank LM, Daw ND, Berke JD. Dual credit assignment processes underlie dopamine signals in a complex spatial environment. bioRxiv [Preprint] 2023:2023.02.15.528738. PMID: 36993482. PMCID: PMC10054934. DOI: 10.1101/2023.02.15.528738.
Abstract
Dopamine in the nucleus accumbens helps motivate behavior based on expectations of future reward ("values"). These values need to be updated by experience: after receiving reward, the choices that led to reward should be assigned greater value. There are multiple theoretical proposals for how this credit assignment could be achieved, but the specific algorithms that generate updated dopamine signals remain uncertain. We monitored accumbens dopamine as freely behaving rats foraged for rewards in a complex, changing environment. We observed brief pulses of dopamine both when rats received reward (scaling with prediction error), and when they encountered novel path opportunities. Furthermore, dopamine ramped up as rats ran towards reward ports, in proportion to the value at each location. By examining the evolution of these dopamine place-value signals, we found evidence for two distinct update processes: progressive propagation along taken paths, as in temporal-difference learning, and inference of value throughout the maze, using internal models. Our results demonstrate that within rich, naturalistic environments dopamine conveys place values that are updated via multiple, complementary learning algorithms.
Affiliation(s)
- Timothy A Krausz
- Neuroscience Graduate Program, University of California, San Francisco
- Alison E Comrie
- Neuroscience Graduate Program, University of California, San Francisco
- Loren M Frank
- Neuroscience Graduate Program, University of California, San Francisco
- Kavli Institute for Fundamental Neuroscience, and Weill Institute for Neurosciences, UCSF
- Howard Hughes Medical Institute
- Department of Physiology, UCSF
- Nathaniel D Daw
- Department of Psychology, and Princeton Neuroscience Institute, Princeton University, NJ
- Joshua D Berke
- Neuroscience Graduate Program, University of California, San Francisco
- Kavli Institute for Fundamental Neuroscience, and Weill Institute for Neurosciences, UCSF
- Department of Neurology, and Department of Psychiatry and Behavioral Science, UCSF
5
Alhassen W, Alhassen S, Chen J, Monfared RV, Alachkar A. Cilia in the Striatum Mediate Timing-Dependent Functions. Mol Neurobiol 2023; 60:545-565. PMID: 36322337. PMCID: PMC9849326. DOI: 10.1007/s12035-022-03095-9.
Abstract
Almost all brain cells contain cilia, antenna-like microtubule-based organelles. Yet the significance of cilia, once considered vestigial organelles, for higher-order brain functions is unknown. Cilia act as a hub that senses and transduces environmental sensory stimuli to generate an appropriate cellular response. Similarly, the striatum, a brain structure enriched in cilia, functions as a hub that receives and integrates various types of environmental information to drive appropriate motor responses. To understand the role of cilia in striatal function, we used loxP/Cre technology to ablate cilia from the dorsal striatum of male mice and monitored the behavioral consequences. Our results revealed an essential role for striatal cilia in the acquisition and brief storage of information, including learning new motor skills, but not in the long-term consolidation of information or the maintenance of habitual/learned motor skills. A fundamental aspect of all disrupted functions was a deficit in time perception/judgment. Furthermore, the observed behavioral deficits form a cluster of clinical manifestations that overlap across psychiatric disorders which involve striatal function and are known to exhibit timing deficits. Thus, striatal cilia may act as a calibrator of the timing functions of the basal ganglia-cortical circuit by maintaining proper timing perception. Our findings suggest that dysfunctional cilia may contribute to the pathophysiology of neuropsychiatric disorders related to deficits in timing perception.
Affiliation(s)
- Wedad Alhassen
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University of California-Irvine, 356A Med Surge II, Irvine, CA 92697-4625, USA
- Sammy Alhassen
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University of California-Irvine, 356A Med Surge II, Irvine, CA 92697-4625, USA
- Jiaqi Chen
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University of California-Irvine, 356A Med Surge II, Irvine, CA 92697-4625, USA
- Roudabeh Vakil Monfared
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University of California-Irvine, 356A Med Surge II, Irvine, CA 92697-4625, USA
- Amal Alachkar
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University of California-Irvine, 356A Med Surge II, Irvine, CA 92697-4625, USA; UC Irvine Center for the Neurobiology of Learning and Memory, University of California-Irvine, Irvine, CA 92697, USA; Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California-Irvine, Irvine, CA 92697, USA
6
Gallistel CR, Johansson F, Jirenhed DA, Rasmussen A, Ricci M, Hesslow G. Quantitative properties of the creation and activation of a cell-intrinsic duration-encoding engram. Front Comput Neurosci 2022; 16:1019812. PMID: 36405788. PMCID: PMC9669310. DOI: 10.3389/fncom.2022.1019812.
Abstract
The engram encoding the interval between the conditional stimulus (CS) and the unconditional stimulus (US) in eyeblink conditioning resides within a small population of cerebellar Purkinje cells. CSs activate this engram to produce a pause in the spontaneous firing rate of the cell, which times the CS-conditional blink. We developed a Bayesian algorithm that finds pause onsets and offsets in the records from individual CS-alone trials. We find that the pause consists of a single unusually long interspike interval. Its onset and offset latencies and their trial-to-trial variability are proportional to the CS-US interval. The coefficients of variation (CoV = σ/μ) are comparable to the CoVs for the conditional eye blink. The average trial-to-trial correlation between the onset latencies and the offset latencies is close to 0, implying that the onsets and offsets are mediated by two stochastically independent readings of the engram. The onset of the pause is step-like; there is no decline in firing rate between the onset of the CS and the onset of the pause. A single presynaptic spike volley suffices to trigger the reading of the engram, and the pause parameters are unaffected by subsequent volleys. The Fano factors for trial-to-trial variations in the distribution of interspike intervals within the intertrial intervals indicate pronounced non-stationarity in the endogenous spontaneous spiking rate, on which the CS-triggered firing pause supervenes. These properties of the spontaneous firing and of the engram readout may prove useful in finding the cell-intrinsic, molecular-level structure that encodes the CS-US interval.
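The two variability statistics this abstract relies on, the coefficient of variation and the Fano factor, are easy to state concretely. A minimal sketch with invented sample data (the latencies and counts below are not from the paper):

```python
# Hedged sketch of the two statistics used above: the coefficient of
# variation (CoV = sigma/mu) of latencies, and the Fano factor
# (variance-to-mean ratio) of event counts. Sample data are invented.
from statistics import mean, pstdev, pvariance

def cov(xs):
    """Coefficient of variation: population standard deviation over the mean."""
    return pstdev(xs) / mean(xs)

def fano(counts):
    """Fano factor: variance-to-mean ratio of event counts."""
    return pvariance(counts) / mean(counts)

# Hypothetical pause-onset latencies (ms) on four CS-alone trials:
onsets = [102.0, 98.0, 110.0, 90.0]
# Hypothetical spike counts in matched intertrial windows:
counts = [40, 55, 38, 62]

print(f"CoV of onsets: {cov(onsets):.3f}")
print(f"Fano factor:   {fano(counts):.2f}")
# A Poisson process has Fano factor 1; values well above 1 indicate the kind
# of non-stationary spontaneous rate the abstract describes.
```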
Affiliation(s)
- Fredrik Johansson
- Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden
- Dan-Anders Jirenhed
- Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden
- Anders Rasmussen
- Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden
- Matthew Ricci
- Carney Institute for Brain Sciences, Brown University, Providence, RI, United States
- Germund Hesslow
- Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden
- Correspondence: Germund Hesslow
7
Jakob AMV, Mikhael JG, Hamilos AE, Assad JA, Gershman SJ. Dopamine mediates the bidirectional update of interval timing. Behav Neurosci 2022; 136:445-452. PMID: 36222637. PMCID: PMC9725808. DOI: 10.1037/bne0000529.
Abstract
The role of dopamine (DA) as a reward prediction error (RPE) signal in reinforcement learning (RL) tasks has been well-established over the past decades. Recent work has shown that the RPE interpretation can also account for the effects of DA on interval timing by controlling the speed of subjective time. According to this theory, the timing of the dopamine signal relative to reward delivery dictates whether subjective time speeds up or slows down: Early DA signals speed up subjective time and late signals slow it down. To test this bidirectional prediction, we reanalyzed measurements of dopaminergic neurons in the substantia nigra pars compacta of mice performing a self-timed movement task. Using the slope of ramping dopamine activity as a readout of subjective time speed, we found that trial-by-trial changes in the slope could be predicted from the timing of dopamine activity on the previous trial. This result provides a key piece of evidence supporting a unified computational theory of RL and interval timing.
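The bidirectional prediction being tested can be restated as a toy update rule. Only the direction of each update (an early signal speeds the subjective clock, a late one slows it) comes from the abstract; the multiplicative form and step size below are assumptions invented for illustration.

```python
# Toy restatement of the bidirectional rule described above. The multiplicative
# update and the rate ETA are invented; only the sign convention (early signal
# -> faster subjective clock, late signal -> slower) follows the abstract.

ETA = 0.1  # per-trial adjustment step (assumed)

def update_clock_speed(speed, da_time, reward_time):
    if da_time < reward_time:      # early signal: subjective time speeds up
        return speed * (1 + ETA)
    if da_time > reward_time:      # late signal: subjective time slows down
        return speed * (1 - ETA)
    return speed

speed = 1.0
for da_t in [0.8, 0.9, 1.2]:       # signal times on three trials; reward at t=1.0
    speed = update_clock_speed(speed, da_t, 1.0)
    print(f"clock speed: {speed:.3f}")
```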
Collapse
Affiliation(s)
- Anthony M V Jakob
- Section of Life Sciences Engineering, École Polytechnique Fédérale de Lausanne
- John A Assad
- Department of Neurobiology, Harvard Medical School
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University
8
Namboodiri VMK. How do real animals account for the passage of time during associative learning? Behav Neurosci 2022; 136:383-391. PMID: 35482634. PMCID: PMC9561011. DOI: 10.1037/bne0000516.
Abstract
Animals routinely learn to associate environmental stimuli and self-generated actions with their outcomes such as rewards. One of the most popular theoretical models of such learning is the reinforcement learning (RL) framework. The simplest form of RL, model-free RL, is widely applied to explain animal behavior in numerous neuroscientific studies. More complex RL versions assume that animals build and store an explicit model of the world in memory. To apply these approaches to explain animal behavior, typical neuroscientific RL models make implicit assumptions about how real animals represent the passage of time. In this perspective, I explicitly list these assumptions and show that they have several problematic implications. I hope that the explicit discussion of these problems encourages the field to seriously examine the assumptions underlying timing and reinforcement learning.
9
Tsao A, Yousefzadeh SA, Meck WH, Moser MB, Moser EI. The neural bases for timing of durations. Nat Rev Neurosci 2022; 23:646-665. PMID: 36097049. DOI: 10.1038/s41583-022-00623-3.
Abstract
Durations are defined by a beginning and an end, and a major distinction is drawn between durations that start in the present and end in the future ('prospective timing') and durations that start in the past and end either in the past or the present ('retrospective timing'). Different psychological processes are thought to be engaged in each of these cases. The former is thought to engage a clock-like mechanism that accurately tracks the continuing passage of time, whereas the latter is thought to engage a reconstructive process that utilizes both temporal and non-temporal information from the memory of past events. We propose that, from a biological perspective, these two forms of duration 'estimation' are supported by computational processes that are both reliant on population state dynamics but are nevertheless distinct. Prospective timing is effectively carried out in a single step where the ongoing dynamics of population activity directly serve as the computation of duration, whereas retrospective timing is carried out in two steps: the initial generation of population state dynamics through the process of event segmentation and the subsequent computation of duration utilizing the memory of those dynamics.
Affiliation(s)
- Albert Tsao
- Department of Biology, Stanford University, Stanford, CA, USA
- Warren H Meck
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
- May-Britt Moser
- Centre for Neural Computation, Kavli Institute for Systems Neuroscience, Norwegian University of Science and Technology, Trondheim, Norway
- Edvard I Moser
- Centre for Neural Computation, Kavli Institute for Systems Neuroscience, Norwegian University of Science and Technology, Trondheim, Norway
10
Lourenco I, Mattila R, Ventura R, Wahlberg B. A Biologically Inspired Computational Model of Time Perception. IEEE Trans Cogn Dev Syst 2022. DOI: 10.1109/tcds.2021.3120301.
Affiliation(s)
- Ines Lourenco
- Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, Sweden
- Robert Mattila
- Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, Sweden
- Rodrigo Ventura
- Institute for Systems and Robotics, Instituto Superior Técnico, Lisbon, Portugal
- Bo Wahlberg
- Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, Sweden
11
Parker NF, Baidya A, Cox J, Haetzel LM, Zhukovskaya A, Murugan M, Engelhard B, Goldman MS, Witten IB. Choice-selective sequences dominate in cortical relative to thalamic inputs to NAc to support reinforcement learning. Cell Rep 2022; 39:110756. PMID: 35584665. PMCID: PMC9218875. DOI: 10.1016/j.celrep.2022.110756.
Abstract
How are actions linked with subsequent outcomes to guide choices? The nucleus accumbens, which is implicated in this process, receives glutamatergic inputs from the prelimbic cortex and midline regions of the thalamus. However, little is known about whether and how representations differ across these input pathways. By comparing these inputs during a reinforcement learning task in mice, we discovered that prelimbic cortical inputs preferentially represent actions and choices, whereas midline thalamic inputs preferentially represent cues. Choice-selective activity in the prelimbic cortical inputs is organized in sequences that persist beyond the outcome. Through computational modeling, we demonstrate that these sequences can support the neural implementation of reinforcement-learning algorithms, in both a circuit model based on synaptic plasticity and one based on neural dynamics. Finally, we test and confirm a prediction of our circuit models by direct manipulation of nucleus accumbens input neurons.
Affiliation(s)
- Nathan F Parker
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Avinash Baidya
- Center for Neuroscience, University of California, Davis, Davis, CA 95616, USA; Department of Physics and Astronomy, University of California, Davis, Davis, CA 95616, USA
- Julia Cox
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA; Department of Neuroscience, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Laura M Haetzel
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Anna Zhukovskaya
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Malavika Murugan
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Ben Engelhard
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Mark S Goldman
- Center for Neuroscience, University of California, Davis, Davis, CA 95616, USA; Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, CA 95616, USA; Department of Ophthalmology and Vision Science, University of California, Davis, Davis, CA 95616, USA
- Ilana B Witten
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA; Department of Psychology, Princeton University, Princeton, NJ 08544, USA
12
Calderon CB, Verguts T, Frank MJ. Thunderstruck: The ACDC model of flexible sequences and rhythms in recurrent neural circuits. PLoS Comput Biol 2022; 18:e1009854. PMID: 35108283. PMCID: PMC8843237. DOI: 10.1371/journal.pcbi.1009854.
Abstract
Adaptive sequential behavior is a hallmark of human cognition. In particular, humans can learn to produce precise spatiotemporal sequences given a certain context. For instance, musicians can not only reproduce learned action sequences in a context-dependent manner, but can also quickly and flexibly reapply them in any desired tempo or rhythm without overwriting previous learning. Existing neural network models fail to account for these properties. We argue that this limitation emerges from the fact that sequence information (i.e., the position of the action) and timing (i.e., the moment of response execution) are typically stored in the same neural network weights. Here, we augment a biologically plausible recurrent neural network of cortical dynamics to include a basal ganglia-thalamic module which uses reinforcement learning to dynamically modulate action. This "associative cluster-dependent chain" (ACDC) model modularly stores sequence and timing information in distinct loci of the network. This feature increases computational power and allows ACDC to display a wide range of temporal properties (e.g., multiple sequences, temporal shifting, rescaling, and compositionality), while still accounting for several behavioral and neurophysiological empirical observations. Finally, we apply this ACDC network to show how it can learn the famous "Thunderstruck" song intro and then flexibly play it in a "bossa nova" rhythm without further training. How do humans flexibly adapt action sequences? For instance, musicians can learn a song and quickly speed up or slow down the tempo, or even play the song following a completely different rhythm (e.g., a rock song using a bossa nova rhythm). In this work, we build a biologically plausible network of cortico-basal ganglia interactions that explains how this temporal flexibility may emerge in the brain. Crucially, our model factorizes sequence order and action timing, represented in cortical and basal ganglia dynamics, respectively. This factorization allows full temporal flexibility: the timing of a learned action sequence can be recomposed without interfering with the order of the sequence. As such, our model is capable of learning asynchronous action sequences and flexibly shifting, rescaling, and recomposing them, while accounting for biological data.
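The factorization at the heart of this claim, order stored separately from timing, can be illustrated without any neural machinery. A minimal sketch with invented notes and intervals (this is a restatement of the idea, not the ACDC model itself):

```python
# Hedged sketch of the factorization the ACDC abstract describes: the *order*
# of a learned action sequence is stored separately from its *timing*, so the
# rhythm can be swapped or the tempo rescaled without touching the order.
# Note names and interval values are invented for illustration.

sequence = ["B", "A", "B", "D", "B", "E"]      # learned order (fixed)
straight = [0.25, 0.25, 0.25, 0.25, 0.25]      # inter-onset intervals, seconds
bossa    = [0.375, 0.125, 0.25, 0.375, 0.125]  # a different rhythm, same order

def schedule(seq, intervals, tempo_scale=1.0):
    """Pair each action with an onset time; timing changes, order never does."""
    t, out = 0.0, []
    for i, action in enumerate(seq):
        out.append((round(t, 3), action))
        if i < len(intervals):
            t += intervals[i] * tempo_scale
    return out

original = schedule(sequence, straight)
rerhythmed = schedule(sequence, bossa)            # new rhythm, no retraining
double_time = schedule(sequence, straight, 0.5)   # rescaled tempo

# Order is identical under every timing: only onsets differ.
assert [a for _, a in original] == [a for _, a in rerhythmed] == sequence
print(rerhythmed)
```

Storing order and timing in one set of weights would force relearning for each new rhythm; keeping them in distinct variables, as here, is the point of the factorization.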
Affiliation(s)
- Cristian Buc Calderon
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, Rhode Island, United States of America
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Carney Institute for Brain Science, Brown University, Providence, Rhode Island, United States of America
- Tom Verguts
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Michael J. Frank
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, Rhode Island, United States of America
- Carney Institute for Brain Science, Brown University, Providence, Rhode Island, United States of America
13
Abstract
Psychological and neural distinctions between the technical concepts of "liking" and "wanting" pose important problems for motivated choice for goods. Why could we "want" something that we do not "like," or "like" something but be unwilling to exert effort to acquire it? Here, we suggest a framework for answering these questions through the medium of reinforcement learning. We consider "liking" to provide immediate, but preliminary and ultimately cancellable, information about the true, long-run worth of a good. Such initial estimates, viewed through the lens of what is known as potential-based shaping, help solve the temporally complex learning problems faced by animals.
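The potential-based shaping invoked above has a standard form: each per-step reward is augmented with F = γΦ(s') − Φ(s), and along any trajectory these terms telescope, so an early "liking" signal can guide learning without changing long-run worth. A minimal sketch with invented potentials and rewards:

```python
# Hedged sketch of potential-based shaping, the mechanism the abstract invokes:
# "liking" acts as a potential Phi(s) giving immediate, but ultimately
# cancellable, information. Shaping adds F = gamma*Phi(s') - Phi(s) to each
# reward; summed along a trajectory, the added terms telescope to
# gamma-weighted Phi at the endpoints. All numbers below are invented.

GAMMA = 1.0  # undiscounted, for a clean telescoping sum (assumption)

def shaped_rewards(rewards, potentials):
    """Add the potential-based shaping term to each step's reward."""
    return [r + GAMMA * potentials[i + 1] - potentials[i]
            for i, r in enumerate(rewards)]

# A 4-step trajectory with one true reward at the end:
rewards = [0.0, 0.0, 0.0, 1.0]
# "Liking" potential: early positive signal about the goal (invented values),
# one entry per visited state (5 states for 4 steps):
phi = [0.0, 0.6, 0.7, 0.8, 0.0]

shaped = shaped_rewards(rewards, phi)
print("shaped per-step:", shaped)
print("raw total:   ", sum(rewards))
print("shaped total:", sum(shaped))
```

The shaped first step is immediately positive ("liking" speaks early), yet the totals agree: the preliminary information is cancelled by later terms, which is exactly the "cancellable" property the abstract describes.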
Affiliation(s)
- Peter Dayan
- MPI for Biological Cybernetics, Tübingen, Germany
- University of Tübingen, Tübingen, Germany
14
Polti I, Nau M, Kaplan R, van Wassenhove V, Doeller CF. Rapid encoding of task regularities in the human hippocampus guides sensorimotor timing. eLife 2022; 11:e79027. PMID: 36317500. PMCID: PMC9625083. DOI: 10.7554/elife.79027.
Abstract
The brain encodes the statistical regularities of the environment in a task-specific yet flexible and generalizable format. Here, we seek to understand this process by bridging two parallel lines of research, one centered on sensorimotor timing, and the other on cognitive mapping in the hippocampal system. By combining functional magnetic resonance imaging (fMRI) with a fast-paced time-to-contact (TTC) estimation task, we found that the hippocampus signaled behavioral feedback received in each trial as well as performance improvements across trials along with reward-processing regions. Critically, it signaled performance improvements independent from the tested intervals, and its activity accounted for the trial-wise regression-to-the-mean biases in TTC estimation. This is in line with the idea that the hippocampus supports the rapid encoding of temporal context even on short time scales in a behavior-dependent manner. Our results emphasize the central role of the hippocampus in statistical learning and position it at the core of a brain-wide network updating sensorimotor representations in real time for flexible behavior.
Affiliation(s)
- Ignacio Polti
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Matthias Nau
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Raphael Kaplan
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Department of Basic Psychology, Clinical Psychology, and Psychobiology, Universitat Jaume I, Castellón de la Plana, Spain
- Virginie van Wassenhove
- CEA DRF/Joliot, NeuroSpin; INSERM, Cognitive Neuroimaging Unit; CNRS, Université Paris-Saclay, Gif-sur-Yvette, France
- Christian F Doeller
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; Wilhelm Wundt Institute of Psychology, Leipzig University, Leipzig, Germany
15
Mikhael JG, Lai L, Gershman SJ. Rational inattention and tonic dopamine. PLoS Comput Biol 2021; 17:e1008659. [PMID: 33760806 PMCID: PMC7990190 DOI: 10.1371/journal.pcbi.1008659] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Accepted: 12/28/2020] [Indexed: 11/27/2022] Open
Abstract
Slow-timescale (tonic) changes in dopamine (DA) contribute to a wide variety of processes in reinforcement learning, interval timing, and other domains. Furthermore, changes in tonic DA exert distinct effects depending on when they occur (e.g., during learning vs. performance) and what task the subject is performing (e.g., operant vs. classical conditioning). Two influential theories of tonic DA - the average reward theory and the Bayesian theory in which DA controls precision - have each been successful at explaining a subset of empirical findings. But how the same DA signal performs two seemingly distinct functions without creating crosstalk is not well understood. Here we reconcile the two theories under the unifying framework of 'rational inattention,' which (1) conceptually links average reward and precision, (2) outlines how DA manipulations affect this relationship, and in so doing, (3) captures new empirical phenomena. In brief, rational inattention asserts that agents can increase their precision in a task (and thus improve their performance) by paying a cognitive cost. Crucially, whether this cost is worth paying depends on average reward availability, reported by DA. The monotonic relationship between average reward and precision means that the DA signal contains the information necessary to retrieve the precision. When this information is needed after the task is performed, as presumed by Bayesian inference, acute manipulations of DA will bias behavior in predictable ways. We show how this framework reconciles a remarkably large collection of experimental findings. In reinforcement learning, the rational inattention framework predicts that learning from positive and negative feedback should be enhanced in high and low DA states, respectively, and that DA should tip the exploration-exploitation balance toward exploitation. In interval timing, this framework predicts that DA should increase the speed of the internal clock and decrease the extent of interference by other temporal stimuli during temporal reproduction (the central tendency effect). Finally, rational inattention makes the new predictions that these effects should be critically dependent on the controllability of rewards, that post-reward delays in intertemporal choice tasks should be underestimated, and that average reward manipulations should affect the speed of the clock - thus capturing empirical findings that are unexplained by either theory alone. Our results suggest that a common computational repertoire may underlie the seemingly heterogeneous roles of DA.
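The core cost-benefit logic lends itself to a toy calculation. Under functional forms chosen here purely for illustration (performance saturating as 1 - exp(-p), a linear attentional cost) rather than taken from the paper, the precision that maximizes net payoff grows monotonically with average reward, which is the relationship the framework attributes to tonic DA:

```python
import math

def optimal_precision(avg_reward, unit_cost=0.5):
    """Precision p maximizing avg_reward * (1 - exp(-p)) - unit_cost * p.

    Setting the derivative avg_reward * exp(-p) - unit_cost to zero
    gives p* = ln(avg_reward / unit_cost), clipped at zero: when the
    average reward is too low, no attentional cost is worth paying.
    """
    if avg_reward <= unit_cost:
        return 0.0
    return math.log(avg_reward / unit_cost)
```

On this sketch, raising average reward (a higher tonic DA state, on the framework's reading) raises the chosen precision and thus performance, while lowering it drives precision to zero.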
Affiliation(s)
- John G. Mikhael
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- MD-PhD Program, Harvard Medical School, Boston, Massachusetts, United States of America
- Lucy Lai
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
16
Fung BJ, Sutlief E, Hussain Shuler MG. Dopamine and the interdependency of time perception and reward. Neurosci Biobehav Rev 2021; 125:380-391. [PMID: 33652021 DOI: 10.1016/j.neubiorev.2021.02.030] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Revised: 02/16/2021] [Accepted: 02/19/2021] [Indexed: 01/14/2023]
Abstract
Time is a fundamental dimension of our perception of the world and is therefore of critical importance to the organization of human behavior. A corpus of work - including recent optogenetic evidence - implicates striatal dopamine as a crucial factor influencing the perception of time. Another stream of literature implicates dopamine in reward and motivation processes. However, these two domains of research have remained largely separated, despite neurobiological overlap and the apothegmatic notion that "time flies when you're having fun". This article constitutes a review of the literature linking time perception and reward, including neurobiological and behavioral studies. Together, these provide compelling support for the idea that time perception and reward processing interact via a common dopaminergic mechanism.
Affiliation(s)
- Bowen J Fung
- The Behavioural Insights Team, Suite 3, Level 13/9 Hunter St, Sydney NSW 2000, Australia.
- Elissa Sutlief
- The Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University School of Medicine, Woods Basic Science Building Rm914, 725 N. Wolfe Street, Baltimore, MD 21205, USA
- Marshall G Hussain Shuler
- The Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University School of Medicine, Woods Basic Science Building Rm914, 725 N. Wolfe Street, Baltimore, MD 21205, USA; Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, 725 N Wolfe Street, Baltimore, MD 21205, USA.
17
Jabłońska J, Szumiec Ł, Zieliński P, Rodriguez Parkitna J. Time elapsed between choices in a probabilistic task correlates with repeating the same decision. Eur J Neurosci 2021; 53:2639-2654. [PMID: 33559232 PMCID: PMC8248175 DOI: 10.1111/ejn.15144] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Revised: 01/01/2021] [Accepted: 02/02/2021] [Indexed: 12/30/2022]
Abstract
Reinforcement learning makes an action that yields a positive outcome more likely to be taken in the future. Here, we investigate how the time elapsed from an action affects subsequent decisions. Groups of C57BL6/J mice were housed in IntelliCages with access to water and chow ad libitum; they also had access to bottles with a reward: saccharin solution, alcohol, or a mixture of the two. The probability of receiving a reward in two of the cage corners changed between 0.9 and 0.3 every 48 hr over a period of ~33 days. As expected, in most animals, the odds of repeating a corner choice were increased if that choice was previously rewarded. Interestingly, the time elapsed from the previous choice also influenced the probability of repeating the choice, and this effect was independent of the previous outcome. Behavioral data were fitted to a series of reinforcement learning models. Best fits were achieved when the reward prediction update was coupled with separate learning rates for positive and negative outcomes and, additionally, a "fictitious" update of the expected value of the nonselected choice. Additional inclusion of a time-dependent decay of the expected values improved the fit marginally in some cases.
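A sketch of the best-fitting update just described: separate learning rates for positive and negative prediction errors, a fictitious update of the unchosen corner (assumed here to be updated toward the opposite outcome), and an optional time-dependent decay toward a baseline. The parameter values and the 0.5 baseline are illustrative, not the fitted ones:

```python
def update_values(values, choice, reward, alpha_pos=0.3, alpha_neg=0.1,
                  decay=0.0):
    """One trial of a dual-learning-rate model with a fictitious update.

    values: expected reward probability for each corner.
    reward: 1 if the visit was rewarded, 0 otherwise.
    decay:  fraction of the distance back to the 0.5 baseline that
            accumulates with the time elapsed between choices.
    """
    updated = []
    for i, v in enumerate(values):
        # The unchosen option is updated as if it had received the
        # opposite outcome (the "fictitious" counterfactual update).
        outcome = reward if i == choice else 1 - reward
        delta = outcome - v
        alpha = alpha_pos if delta > 0 else alpha_neg
        v = v + alpha * delta
        updated.append(v + decay * (0.5 - v))
    return updated
```

With decay > 0, long gaps between choices pull both expected values back toward indifference, which is one way the elapsed time could modulate the probability of repeating a choice.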
Affiliation(s)
- Judyta Jabłońska
- Department of Molecular Neuropharmacology, Maj Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
- Łukasz Szumiec
- Department of Molecular Neuropharmacology, Maj Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
- Piotr Zieliński
- Department of Structure Research of Condensed Matter, The Henryk Niewodniczański Institute of Nuclear Physics, Polish Academy of Sciences, Krakow, Poland
- Jan Rodriguez Parkitna
- Department of Molecular Neuropharmacology, Maj Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
18
Gallistel CR, Papachristos EB. Number and time in acquisition, extinction and recovery. J Exp Anal Behav 2019; 113:15-36. [PMID: 31856323 DOI: 10.1002/jeab.571] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2019] [Revised: 11/19/2019] [Accepted: 11/21/2019] [Indexed: 01/13/2023]
Abstract
We measured rate of acquisition, trials to extinction, cumulative responses in extinction, and the spontaneous recovery of anticipatory hopper poking in a Pavlovian protocol with mouse subjects. We varied by factors of 4 the number of sessions, trials per session, intersession interval, and span of training (the number of days over which training extended). We find that different variables affect each measure: rate of acquisition [1/(trials to acquisition)] is faster when there are fewer trials per session. Terminal rate of responding is faster when there are more total training trials. Trials to extinction and amount of responding during extinction are unaffected by these variables. The number of training trials has no effect on recovery in a 4-trial probe session 21 days after extinction. However, recovery is greater when the span of training is greater, regardless of how many sessions there are within that span. Our results and those of others suggest that the numbers, durations, and spacings of longer-duration "episodes" in a conditioning protocol (sessions and the spans in days of training and extinction) are important variables and that different variables affect different aspects of subjects' behavior. We discuss the theoretical and clinical implications of these and related findings and conclusions for theories of conditioning and for neuroscience.
19
Sanabria F, Daniels CW, Gupta T, Santos C. A computational formulation of the behavior systems account of the temporal organization of motivated behavior. Behav Processes 2019; 169:103952. [PMID: 31543283 PMCID: PMC6907728 DOI: 10.1016/j.beproc.2019.103952] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2019] [Revised: 08/30/2019] [Accepted: 08/31/2019] [Indexed: 02/02/2023]
Abstract
The behavior systems framework suggests that motivated behavior (e.g., seeking food and mates, avoiding predators) consists of sequences of actions organized within nested behavioral states. This framework has bridged behavioral ecology and experimental psychology, providing key insights into critical behavioral processes. In particular, the behavior systems framework entails a particular organization of behavior over time. The present paper examines whether such organization emerges from a generic Markov process, where the current behavioral state determines the probability distribution of subsequent behavioral states. This proposition is developed as a systematic examination of increasingly complex Markov models, seeking a computational formulation that balances adherence to the behavior systems approach, parsimony, and conformity to data. As a result of this exercise, a nonstationary partially hidden Markov model is selected as a computational formulation of the predatory subsystem. It is noted that the temporal distribution of discrete responses may further unveil the structure and parameters of the model but, without proper mathematical modeling, these discrete responses may be misleading. Opportunities for further elaboration of the proposed computational formulation are identified, including developments in its architecture, extensions to defensive and reproductive subsystems, and methodological refinements.
Affiliation(s)
- Carter W Daniels
- Arizona State University, United States; Columbia University, United States
20
Abstract
Midbrain dopamine signals are widely thought to report reward prediction errors that drive learning in the basal ganglia. However, dopamine has also been implicated in various probabilistic computations, such as encoding uncertainty and controlling exploration. Here, we show how these different facets of dopamine signalling can be brought together under a common reinforcement learning framework. The key idea is that multiple sources of uncertainty impinge on reinforcement learning computations: uncertainty about the state of the environment, the parameters of the value function and the optimal action policy. Each of these sources plays a distinct role in the prefrontal cortex-basal ganglia circuit for reinforcement learning and is ultimately reflected in dopamine activity. The view that dopamine plays a central role in the encoding and updating of beliefs brings the classical prediction error theory into alignment with more recent theories of Bayesian reinforcement learning.
Affiliation(s)
- Samuel J Gershman
- Department of Psychology, Center for Brain Science, Harvard University, Cambridge, MA, USA.
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA
21
Petter EA, Gershman SJ, Meck WH. Integrating Models of Interval Timing and Reinforcement Learning. Trends Cogn Sci 2019; 22:911-922. [PMID: 30266150 DOI: 10.1016/j.tics.2018.08.004] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2018] [Revised: 07/23/2018] [Accepted: 08/13/2018] [Indexed: 10/28/2022]
Abstract
We present an integrated view of interval timing and reinforcement learning (RL) in the brain. The computational goal of RL is to maximize future rewards, and this depends crucially on a representation of time. Different RL systems in the brain process time in distinct ways. A model-based system learns 'what happens when', employing this internal model to generate action plans, while a model-free system learns to predict reward directly from a set of temporal basis functions. We describe how these systems are subserved by a computational division of labor between several brain regions, with a focus on the basal ganglia and the hippocampus, as well as how these regions are influenced by the neuromodulator dopamine.
Affiliation(s)
- Elijah A Petter
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Warren H Meck
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA.
22
Kalmbach A, Chun E, Taylor K, Gallistel CR, Balsam PD. Time-scale-invariant information-theoretic contingencies in discrimination learning. J Exp Psychol Anim Learn Cogn 2019; 45:280-289. [PMID: 31021132 PMCID: PMC7771212 DOI: 10.1037/xan0000205] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Animals optimize their behavior to maximize rewards by utilizing cues from the environment. In discrimination learning, cues signal when rewards can and cannot be earned by making a particular response. In our experiment, we trained male mice to press a lever to receive a reward on a random interval schedule. We then introduced a prolonged tone (20, 40, or 80 sec), during which no rewards could be earned. We sought to test our hypothesis that the duration of the tone and the frequency of reward during the inter-tone intervals affect the informativeness of cues and lead to differences in discriminative behavior. Learning was expressed as an increase in lever pressing during the intertrial interval (ITI) and, when the informativeness of the cue was high, animals also reduced their lever pressing during the tone. Additionally, we found that the depth of discriminative learning was linearly related to the informativeness of the cues. Our results show that the time-scale-invariant information-theoretic definition of contingency applied to excitatory conditioning can also be applied to inhibitory conditioning.
23
Mikhael JG, Gershman SJ. Adapting the flow of time with dopamine. J Neurophysiol 2019; 121:1748-1760. [PMID: 30864882 PMCID: PMC6589719 DOI: 10.1152/jn.00817.2018] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2018] [Revised: 02/04/2019] [Accepted: 02/20/2019] [Indexed: 01/25/2023] Open
Abstract
The modulation of interval timing by dopamine (DA) has been well established over decades of research. The nature of this modulation, however, has remained controversial: Although the pharmacological evidence has largely suggested that time intervals are overestimated with higher DA levels, more recent optogenetic work has shown the opposite effect. In addition, a large body of work has asserted DA's role as a "reward prediction error" (RPE), or a teaching signal that allows the basal ganglia to learn to predict future rewards in reinforcement learning tasks. Whether these two seemingly disparate accounts of DA may be related has remained an open question. By taking a reinforcement learning-based approach to interval timing, we show here that the RPE interpretation of DA naturally extends to its role as a modulator of timekeeping and furthermore that this view reconciles the seemingly conflicting observations. We derive a biologically plausible, DA-dependent plasticity rule that can modulate the rate of timekeeping in either direction and whose effect depends on the timing of the DA signal itself. This bidirectional update rule can account for the results from pharmacology and optogenetics as well as the behavioral effects of reward rate on interval timing and the temporal selectivity of striatal neurons. Hence, by adopting a single RPE interpretation of DA, our results take a step toward unifying computational theories of reinforcement learning and interval timing. NEW & NOTEWORTHY How does dopamine (DA) influence interval timing? A large body of pharmacological evidence has suggested that DA accelerates timekeeping mechanisms. However, recent optogenetic work has shown exactly the opposite effect. In this article, we relate DA's role in timekeeping to its most established role, as a critical component of reinforcement learning. This allows us to derive a neurobiologically plausible framework that reconciles a large body of DA's temporal effects, including pharmacological, behavioral, electrophysiological, and optogenetic findings.
Affiliation(s)
- John G Mikhael
- Program in Neuroscience and MD-PhD Program, Harvard Medical School, Boston, Massachusetts
- Samuel J Gershman
- Center for Brain Science and Department of Psychology, Harvard University, Cambridge, Massachusetts
24
Austen JM, Sanderson DJ. Delay of reinforcement versus rate of reinforcement in Pavlovian conditioning. J Exp Psychol Anim Learn Cogn 2019; 45:203-221. [PMID: 30843717 PMCID: PMC6448483 DOI: 10.1037/xan0000199] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/16/2018] [Revised: 12/11/2018] [Accepted: 12/26/2018] [Indexed: 11/08/2022]
Abstract
Conditioned stimulus (CS) duration is a determinant of conditioned responding, with increases in duration leading to reductions in response rates. The CS duration effect has been proposed to reflect sensitivity to the reinforcement rate across cumulative exposure to the CS, suggesting that the delay of reinforcement from the onset of the cue is not crucial. Here, we compared the effects of delay and rate of reinforcement on Pavlovian appetitive conditioning in mice. In Experiment 1, the influence of reinforcement delay on the timing of responding was removed by making the duration of cues variable across trials. Mice trained with variable duration cues were sensitive to differences in the rate of reinforcement to a similar extent as mice trained with fixed duration cues. Experiments 2 and 3 tested the independent effects of delay and reinforcement rate. In Experiment 2, food was presented at either the termination of the CS or during the CS. In Experiment 3, food occurred during the CS for all cues. The latter experiment demonstrated an effect of delay, but not reinforcement rate. Experiment 4 ruled out the possibility that the lack of effect of reinforcement rate in Experiment 3 was due to mice failing to learn about the nonreinforced CS exposure after the presentation of food within a trial. These results demonstrate that although the CS duration effect is not simply a consequence of timing of conditioned responses, it is dependent on the delay of reinforcement. The results provide a challenge to current associative and nonassociative, time-accumulation models of learning.
25
Temporal updating, behavioral learning, and the phenomenology of time-consciousness. Behav Brain Sci 2019; 42:e254. [DOI: 10.1017/s0140525x19000517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Hoerl & McCormack claim that the temporal updating system only represents the world as present. This generates puzzles regarding the phenomenology of temporal experience. We argue that recent models of reinforcement learning suggest that temporal updating must have a minimal temporal structure; and we suggest that this helps to clarify what it means to experience the world as temporally structured.
26
Rajendran VG, Teki S, Schnupp JWH. Temporal Processing in Audition: Insights from Music. Neuroscience 2018; 389:4-18. [PMID: 29108832 PMCID: PMC6371985 DOI: 10.1016/j.neuroscience.2017.10.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2017] [Revised: 10/24/2017] [Accepted: 10/27/2017] [Indexed: 11/28/2022]
Abstract
Music is a curious example of a temporally patterned acoustic stimulus, and a compelling pan-cultural phenomenon. This review strives to bring some insights from decades of music psychology and sensorimotor synchronization (SMS) literature into the mainstream auditory domain, arguing that musical rhythm perception is shaped in important ways by temporal processing mechanisms in the brain. The feature that unites these disparate disciplines is an appreciation of the central importance of timing, sequencing, and anticipation. Perception of musical rhythms relies on an ability to form temporal predictions, a general feature of temporal processing that is equally relevant to auditory scene analysis, pattern detection, and speech perception. By bringing together findings from the music and auditory literature, we hope to inspire researchers to look beyond the conventions of their respective fields and consider the cross-disciplinary implications of studying auditory temporal sequence processing. We begin by highlighting music as an interesting sound stimulus that may provide clues to how temporal patterning in sound drives perception. Next, we review the SMS literature and discuss possible neural substrates for the perception of, and synchronization to, musical beat. We then move away from music to explore the perceptual effects of rhythmic timing in pattern detection, auditory scene analysis, and speech perception. Finally, we review the neurophysiology of general timing processes that may underlie aspects of the perception of rhythmic patterns. We conclude with a brief summary and outlook for future research.
Affiliation(s)
- Vani G Rajendran
- Auditory Neuroscience Group, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, UK
- Sundeep Teki
- Auditory Neuroscience Group, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, UK
- Jan W H Schnupp
- City University of Hong Kong, Department of Biomedical Sciences, 31 To Yuen Street, Kowloon Tong, Hong Kong.
27
Langdon AJ, Sharpe MJ, Schoenbaum G, Niv Y. Model-based predictions for dopamine. Curr Opin Neurobiol 2018; 49:1-7. [PMID: 29096115 PMCID: PMC6034703 DOI: 10.1016/j.conb.2017.10.006] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2017] [Revised: 10/07/2017] [Accepted: 10/09/2017] [Indexed: 01/16/2023]
Abstract
Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning.
Affiliation(s)
- Angela J Langdon
- Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton, NJ 08540, United States.
- Melissa J Sharpe
- Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton, NJ 08540, United States; National Institute on Drug Abuse, Baltimore, MD 21224, United States; School of Psychology, University of New South Wales, Australia
- Yael Niv
- Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton, NJ 08540, United States
28
A cerebellar mechanism for learning prior distributions of time intervals. Nat Commun 2018; 9:469. [PMID: 29391392 PMCID: PMC5794805 DOI: 10.1038/s41467-017-02516-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2017] [Accepted: 12/05/2017] [Indexed: 01/14/2023] Open
Abstract
Knowledge about the statistical regularities of the world is essential for cognitive and sensorimotor function. In the domain of timing, prior statistics are crucial for optimal prediction, adaptation and planning. Where and how the nervous system encodes temporal statistics is, however, not known. Based on physiological and anatomical evidence for cerebellar learning, we develop a computational model that demonstrates how the cerebellum could learn prior distributions of time intervals and support Bayesian temporal estimation. The model shows that salient features observed in human Bayesian time interval estimates can be readily captured by learning in the cerebellar cortex and circuit level computations in the cerebellar deep nuclei. We test human behavior in two cerebellar timing tasks and find prior-dependent biases in timing that are consistent with the predictions of the cerebellar model. Human timing behavior is biased towards previously encountered intervals and is predicted by Bayesian models. Here, the authors develop a computational model based on properties of the cerebellum to show how we might encode time estimates based on prior experience.
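The prior-dependent bias these models predict is standard Bayes-least-squares shrinkage. For a Gaussian prior and Gaussian measurement noise (a simplification of the paper's setup, used here only to show the effect), the estimate is a precision-weighted average that pulls noisy measurements toward the prior mean:

```python
def bayes_interval_estimate(measured, prior_mean, prior_var, noise_var):
    """Posterior-mean estimate of an interval from one noisy measurement.

    The weight on the measurement falls as sensory noise grows, so
    estimates regress toward the prior mean: the central-tendency effect.
    """
    w = prior_var / (prior_var + noise_var)
    return w * measured + (1.0 - w) * prior_mean
```

An interval longer than the prior mean is underestimated and a shorter one overestimated, with the bias growing as noise_var increases — the pattern described as prior-dependent biases in the two timing tasks.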
29
Abstract
The hypothesis that the phasic dopamine response reports a reward prediction error has become deeply entrenched. However, dopamine neurons exhibit several notable deviations from this hypothesis. A coherent explanation for these deviations can be obtained by analyzing the dopamine response in terms of Bayesian reinforcement learning. The key idea is that prediction errors are modulated by probabilistic beliefs about the relationship between cues and outcomes, updated through Bayesian inference. This account can explain dopamine responses to inferred value in sensory preconditioning, the effects of cue preexposure (latent inhibition), and adaptive coding of prediction errors when rewards vary across orders of magnitude. We further postulate that orbitofrontal cortex transforms the stimulus representation through recurrent dynamics, such that a simple error-driven learning rule operating on the transformed representation can implement the Bayesian reinforcement learning update.
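One concrete way to see belief-modulated prediction errors is a Kalman-filter variant of the Rescorla-Wagner rule, a common formalization of Bayesian associative learning used here as an illustration rather than as the paper's full model: the learning rate becomes the Kalman gain, which depends on the agent's uncertainty about the cue-outcome relationship.

```python
def kalman_rw_update(mean, var, reward, noise_var=1.0, drift_var=0.01):
    """Kalman-filter (Bayesian) version of the Rescorla-Wagner update.

    mean, var: posterior belief about a cue's value.
    The prediction error is scaled by a belief-dependent learning rate
    (the Kalman gain), so values of uncertain cues update faster.
    """
    var += drift_var                # the hidden value may drift over time
    gain = var / (var + noise_var)  # belief-modulated learning rate
    mean += gain * (reward - mean)  # error weighted by uncertainty
    var *= (1.0 - gain)             # observing the outcome shrinks variance
    return mean, var
```

Preexposure without reward shrinks the posterior variance, so the same later prediction error moves the value estimate less — the latent-inhibition pattern discussed above.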
Affiliation(s)
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, U.S.A
30
Linderman SW, Gershman SJ. Using computational theory to constrain statistical models of neural data. Curr Opin Neurobiol 2017; 46:14-24. [PMID: 28732273 PMCID: PMC5660645 DOI: 10.1016/j.conb.2017.06.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2017] [Revised: 06/07/2017] [Accepted: 06/25/2017] [Indexed: 11/27/2022]
Abstract
Computational neuroscience is, to first order, dominated by two approaches: the 'bottom-up' approach, which searches for statistical patterns in large-scale neural recordings, and the 'top-down' approach, which begins with a theory of computation and considers plausible neural implementations. While this division is not clear-cut, we argue that these approaches should be much more intimately linked. From a Bayesian perspective, computational theories provide constrained prior distributions on neural data, albeit highly sophisticated ones. By connecting theory to observation via a probabilistic model, we provide the link necessary to test, evaluate, and revise our theories in a data-driven and statistically rigorous fashion. This review highlights examples of this theory-driven pipeline for neural data analysis in recent literature and illustrates it with a worked example based on the temporal difference learning model of dopamine.
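The worked example mentioned above, the temporal difference learning model of dopamine, can be reproduced in a few lines: with a serial state representation of the cue-reward interval, the prediction error at reward time vanishes with training, while an unexpectedly presented cue retains a large error. A generic TD(0) sketch with illustrative parameters, not the review's specific implementation:

```python
# TD(0) over a tapped-delay-line state space: states 0..T-1 follow the cue,
# reward arrives at the last state. Parameters are illustrative.
T, alpha, gamma, reward = 5, 0.2, 1.0, 1.0
V = [0.0] * T  # value of each post-cue state

def run_trial(V):
    """One cue-to-reward trial; returns the per-step prediction errors."""
    deltas = []
    for t in range(T):
        r = reward if t == T - 1 else 0.0
        v_next = V[t + 1] if t + 1 < T else 0.0
        delta = r + gamma * v_next - V[t]  # TD error
        V[t] += alpha * delta              # value update
        deltas.append(delta)
    return deltas

first = run_trial(V)          # naive values: full error at reward time
for _ in range(500):
    last = run_trial(V)       # trained values: error at reward time ~ 0

# RPE at an unexpected cue onset: transition from a zero-value baseline
cue_delta = gamma * V[0]
```

On the first trial the error appears at reward delivery (`first[-1] == 1.0`); after training the within-trial errors vanish and the error has effectively transferred to the unexpected cue (`cue_delta` near 1), the classic dopamine-like pattern.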
Affiliation(s)
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, United States.
31
Dopamine reward prediction errors reflect hidden-state inference across time. Nat Neurosci 2017; 20:581-589. [PMID: 28263301 PMCID: PMC5374025 DOI: 10.1038/nn.4520] [Citation(s) in RCA: 97] [Impact Index Per Article: 13.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2016] [Accepted: 01/25/2017] [Indexed: 12/14/2022]
Abstract
Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In this work, we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden state inference.
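A minimal caricature of the 'belief state' idea: when reward timing is uncertain, continued reward omission is itself evidence about the hidden state, and the value estimate follows the belief. The two-state update below is an illustration, not the paper's task model:

```python
# Two hidden states: ISI (reward pending) and ITI (no reward pending).
# Each timestep without reward is a Bayesian observation that shifts belief
# toward ITI; value is the belief-weighted average of per-state values.
# All numbers are illustrative.

def belief_value(b_isi, v_isi, v_iti):
    """Value estimate under the current belief over hidden states."""
    return b_isi * v_isi + (1 - b_isi) * v_iti

def update_belief(b_isi, omission_likelihood):
    """Bayes update after a timestep with no reward: the longer the reward
    fails to arrive, the more probable the ITI state becomes."""
    p_isi = b_isi * omission_likelihood  # reward pending but not yet delivered
    p_iti = (1 - b_isi) * 1.0            # ITI never delivers reward
    return p_isi / (p_isi + p_iti)

b = 0.9  # initial belief that a reward is pending
for _ in range(5):
    b = update_belief(b, omission_likelihood=0.5)

value_estimate = belief_value(b, v_isi=1.0, v_iti=0.0)
# belief in the reward-pending state decays as reward stays absent,
# dragging the value estimate (and hence the RPE baseline) down with it
```

This captures why omission responses can differ between deterministic and probabilistic reward tasks: only in the latter does omission meaningfully update the hidden-state belief.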
32
Kato A, Morita K. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation. PLoS Comput Biol 2016; 12:e1005145. [PMID: 27736881 PMCID: PMC5063413 DOI: 10.1371/journal.pcbi.1005145] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2016] [Accepted: 09/14/2016] [Indexed: 12/12/2022] Open
Abstract
It has been suggested that dopamine (DA) represents the reward-prediction-error (RPE) defined in reinforcement learning, and that DA therefore responds to unpredicted but not predicted reward. However, recent studies have found sustained DA responses toward predictable reward in tasks involving self-paced behavior, and have suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if there is decay/forgetting of learned values, which can be implemented as decay of the synaptic strengths storing those values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of value-decay in self-paced approach behavior, modeled as a series of ‘Go’ or ‘No-Go’ selections towards a goal. Through simulations, we found that, counterintuitively, value-decay can enhance motivation, specifically by facilitating fast goal-reaching. Mathematical analyses revealed that the underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of ‘Go’ values towards a goal, and (2) value-contrasts between ‘Go’ and ‘No-Go’ are generated because, while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account of DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning are active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium in which learning and forgetting are balanced.

Dopamine (DA) has been suggested to have two reward-related roles: (1) representing reward-prediction-error (RPE), and (2) providing motivational drive. Role (1) is based on the physiological results that DA responds to unpredicted but not predicted reward, whereas role (2) is supported by the pharmacological results that blockade of DA signaling causes motivational impairments such as slowdown of self-paced behavior. So far, these two roles have been considered to be played by two different temporal patterns of DA signals: role (1) by phasic signals and role (2) by tonic/sustained signals. However, recent studies have found sustained DA signals with features indicative of both roles (1) and (2), complicating this picture. Meanwhile, whereas the synaptic/circuit mechanisms for role (1), i.e., how RPE is calculated upstream of DA neurons and how RPE-dependent updates of learned values occur through DA-dependent synaptic plasticity, have now become clarified, the mechanisms for role (2) remain unclear. In this work, we modeled self-paced behavior as a series of ‘Go’ or ‘No-Go’ selections in the framework of reinforcement learning assuming DA's role (1), and demonstrated that incorporating decay/forgetting of learned values, presumably implemented as decay of the synaptic strengths storing those values, provides a potential unified mechanistic account of DA's two roles, together with its various temporal patterns.
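The core mechanism, sustained RPE under value decay, can be shown with a scalar value update: with decay, the learned value settles below the reward magnitude, so a positive prediction error persists even for a fully predicted reward. A sketch with illustrative parameters, not the paper's full Go/No-Go model:

```python
# TD-style value update with decay ('forgetting') of the learned value.
# Without decay the prediction error for a fully predicted reward goes to
# zero; with decay it settles at a positive steady-state value.
# Parameters are illustrative.
alpha, decay, reward = 0.5, 0.1, 1.0

def steady_state_rpe(alpha, decay, reward, trials=1000):
    V = 0.0
    for _ in range(trials):
        delta = reward - V    # prediction error on receiving the reward
        V += alpha * delta    # learning
        V *= (1 - decay)      # forgetting between trials
    return delta, V

rpe_decay, _ = steady_state_rpe(alpha, decay, reward)
rpe_nodecay, _ = steady_state_rpe(alpha, 0.0, reward)
```

At the decay fixed point the value and the error are in dynamic equilibrium, matching the abstract's point that learning and forgetting balance even after apparent convergence.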
Affiliation(s)
- Ayaka Kato
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
33
Abstract
To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning: a computational theory of reward optimization that readily prescribes algorithmic solutions bearing a striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.
Affiliation(s)
- Yael Niv
- Psychology Department & Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, 08540
- Angela Langdon
- Psychology Department & Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, 08540
34
Marblestone AH, Wayne G, Kording KP. Toward an Integration of Deep Learning and Neuroscience. Front Comput Neurosci 2016; 10:94. [PMID: 27683554 PMCID: PMC5021692 DOI: 10.3389/fncom.2016.00094] [Citation(s) in RCA: 234] [Impact Index Per Article: 29.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2016] [Accepted: 08/24/2016] [Indexed: 01/22/2023] Open
Abstract
Neuroscience has focused on the detailed implementation of computation, studying neural codes, dynamics and circuits. In machine learning, however, artificial neural networks tend to eschew precisely designed codes, dynamics or circuits in favor of brute force optimization of a cost function, often using simple and relatively uniform initial architectures. Two recent developments have emerged within machine learning that create an opportunity to connect these seemingly divergent perspectives. First, structured architectures are used, including dedicated systems for attention, recursion and various forms of short- and long-term memory storage. Second, cost functions and training procedures have become more complex and are varied across layers and over time. Here we think about the brain in terms of these ideas. We hypothesize that (1) the brain optimizes cost functions, (2) the cost functions are diverse and differ across brain locations and over development, and (3) optimization operates within a pre-structured architecture matched to the computational problems posed by behavior. In support of these hypotheses, we argue that a range of implementations of credit assignment through multiple layers of neurons are compatible with our current knowledge of neural circuitry, and that the brain's specialized systems can be interpreted as enabling efficient optimization for specific problem classes. Such a heterogeneously optimized system, enabled by a series of interacting cost functions, serves to make learning data-efficient and precisely targeted to the needs of the organism. We suggest directions by which neuroscience could seek to refine and test these hypotheses.
Affiliation(s)
- Adam H. Marblestone
- Synthetic Neurobiology Group, Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Konrad P. Kording
- Rehabilitation Institute of Chicago, Northwestern University, Chicago, IL, USA
35
Berthet P, Lindahl M, Tully PJ, Hellgren-Kotaleski J, Lansner A. Functional Relevance of Different Basal Ganglia Pathways Investigated in a Spiking Model with Reward Dependent Plasticity. Front Neural Circuits 2016; 10:53. [PMID: 27493625 PMCID: PMC4954853 DOI: 10.3389/fncir.2016.00053] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2015] [Accepted: 07/06/2016] [Indexed: 11/13/2022] Open
Abstract
The brain enables animals to behaviorally adapt in order to survive in a complex and dynamic environment, but how reward-oriented behaviors are achieved and computed by its underlying neural circuitry is an open question. To address this concern, we have developed a spiking model of the basal ganglia (BG) that learns to dis-inhibit the action leading to a reward despite ongoing changes in the reward schedule. The architecture of the network features the two pathways commonly described in BG, the direct (denoted D1) and the indirect (denoted D2) pathway, as well as a loop involving striatum and the dopaminergic system. The activity of these dopaminergic neurons conveys the reward prediction error (RPE), which determines the magnitude of synaptic plasticity within the different pathways. All plastic connections implement a versatile four-factor learning rule derived from Bayesian inference that depends upon pre- and post-synaptic activity, receptor type, and dopamine level. Synaptic weight updates occur in the D1 or D2 pathways depending on the sign of the RPE, and an efference copy informs upstream nuclei about the action selected. We demonstrate successful performance of the system in a multiple-choice learning task with a transiently changing reward schedule. We simulate lesioning of the various pathways and show that a condition without the D2 pathway fares worse than one without D1. Additionally, we simulate the degeneration observed in Parkinson's disease (PD) by decreasing the number of dopaminergic neurons during learning. The results suggest that the D1 pathway impairment in PD might have been overlooked. Furthermore, an analysis of the alterations in the synaptic weights shows that using the absolute reward value instead of the RPE leads to a larger change in D1.
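The sign-dependent routing of plasticity described above can be caricatured with scalar weights: positive RPEs update the D1 ('Go') weight for the chosen action and negative RPEs update the D2 ('No-Go') weight. This is a toy scalar version, not the paper's spiking four-factor Bayesian rule:

```python
# Scalar caricature of sign-dependent corticostriatal plasticity:
# positive RPEs strengthen the direct (D1) pathway for the chosen action,
# negative RPEs strengthen the indirect (D2) pathway.
# Learning rate is illustrative.
alpha = 0.2

def update(w_d1, w_d2, rpe):
    """Route the weight update by the sign of the reward prediction error."""
    if rpe > 0:
        w_d1 += alpha * rpe      # reinforce 'Go' for this action
    else:
        w_d2 += alpha * (-rpe)   # reinforce 'No-Go' for this action
    return w_d1, w_d2

w1, w2 = 0.0, 0.0
w1, w2 = update(w1, w2, +1.0)    # rewarded choice strengthens D1
w1, w2 = update(w1, w2, -0.5)    # punished choice strengthens D2
```

Under this routing, lesioning one pathway selectively removes learning from one sign of error, which is the logic behind the simulated D1/D2 lesion comparisons in the abstract.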
Affiliation(s)
- Pierre Berthet
- Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Mikael Lindahl
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Philip J. Tully
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
- Jeanette Hellgren-Kotaleski
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Department of Neuroscience, Karolinska Institute, Stockholm, Sweden
- Anders Lansner
- Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
36
Fontes R, Ribeiro J, Gupta DS, Machado D, Lopes-Júnior F, Magalhães F, Bastos VH, Rocha K, Marinho V, Lima G, Velasques B, Ribeiro P, Orsini M, Pessoa B, Leite MAA, Teixeira S. Time Perception Mechanisms at Central Nervous System. Neurol Int 2016; 8:5939. [PMID: 27127597 PMCID: PMC4830363 DOI: 10.4081/ni.2016.5939] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2015] [Revised: 11/24/2015] [Accepted: 11/30/2015] [Indexed: 12/20/2022] Open
Abstract
The five senses have specific ways to receive environmental information and convey it to the central nervous system. The perception of time is the sum of stimuli associated with cognitive processes and environmental changes. Thus, the perception of time requires a complex neural mechanism and may be changed by emotional state, level of attention, memory and diseases. Despite this knowledge, the neural mechanisms of time perception are not yet fully understood. The objective is to relate the mechanisms involved in the neurofunctional aspects, theories, executive functions and pathologies that contribute to the understanding of temporal perception. Articles from 1980 to 2015 were searched using the key themes: neuroanatomy, neurophysiology, theories, time cells, memory, schizophrenia, depression, attention-deficit hyperactivity disorder and Parkinson’s disease combined with the term perception of time. We evaluated 158 articles within the inclusion criteria for the purpose of the study. We conclude that research on the contributions of the frontal cortex, parietal cortex, basal ganglia, cerebellum and hippocampus has provided advances in the understanding of the regions related to the perception of time. In neurological and psychiatric disorders, the understanding of time depends on the severity of the diseases and the type of tasks.
Affiliation(s)
- Rhailana Fontes
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Jéssica Ribeiro
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Daya S Gupta
- Department of Biology, Camden County College, Blackwood, NJ, USA
- Dionis Machado
- Laboratory of Brain Mapping and Functionality, Federal University of Piauí, Parnaíba
- Fernando Lopes-Júnior
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Francisco Magalhães
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Victor Hugo Bastos
- Laboratory of Brain Mapping and Functionality, Federal University of Piauí, Parnaíba
- Kaline Rocha
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Victor Marinho
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Gildário Lima
- Neurophisic Applied Laboratory, Federal University of Piauí, Parnaíba
- Bruna Velasques
- Brain Mapping and Sensory-Motor Integration Laboratory, Psychiatry Institute of Federal University of Rio de Janeiro, Rio de Janeiro
- Pedro Ribeiro
- Brain Mapping and Sensory-Motor Integration Laboratory, Psychiatry Institute of Federal University of Rio de Janeiro, Rio de Janeiro
- Bruno Pessoa
- Neurology Department, Federal Fluminense University, Niterói, Brazil
- Silmar Teixeira
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
37
A Simple Network Architecture Accounts for Diverse Reward Time Responses in Primary Visual Cortex. J Neurosci 2015; 35:12659-72. [PMID: 26377457 DOI: 10.1523/jneurosci.0871-15.2015] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
Many actions performed by animals and humans depend on an ability to learn, estimate, and produce temporal intervals of behavioral relevance. Exemplifying such learning of cued expectancies is the observation of reward-timing activity in the primary visual cortex (V1) of rodents, wherein neural responses to visual cues come to predict the time of future reward as behaviorally experienced in the past. These reward-timing responses exhibit significant heterogeneity in at least three qualitatively distinct classes: sustained increase or sustained decrease in firing rate until the time of expected reward, and a class of cells that reach a peak in firing at the expected delay. We elaborate upon our existing model by including inhibitory and excitatory units while imposing simple connectivity rules to demonstrate what role these inhibitory elements and the simple architectures play in sculpting the response dynamics of the network. We find that simply adding inhibition is not sufficient for obtaining the distinct response classes, and that a broad distribution of inhibitory projections is necessary for obtaining peak-type responses. Furthermore, although changes in connection strength that modulate the effects of inhibition onto excitatory units have a strong impact on the firing rate profile of these peaked responses, the network exhibits robustness in its overall ability to predict the expected time of reward. Finally, we demonstrate how the magnitude of expected reward can be encoded at the expected delay in the network and how peaked responses express this reward expectancy.

SIGNIFICANCE STATEMENT Heterogeneity in single-neuron responses is a common feature of neuronal systems, although sometimes, in theoretical approaches, it is treated as a nuisance and seldom considered as conveying a different aspect of a signal. In this study, we focus on the heterogeneous responses in the primary visual cortex of rodents trained with a predictable delayed reward time. We describe under what conditions this heterogeneity can arise by self-organization, and what information it can convey. This study, while focusing on a specific system, provides insight into how heterogeneity can arise in general while also shedding light on mechanisms of reinforcement learning using realistic biological assumptions.
38
Lloyd K, Dayan P. Tamping Ramping: Algorithmic, Implementational, and Computational Explanations of Phasic Dopamine Signals in the Accumbens. PLoS Comput Biol 2015; 11:e1004622. [PMID: 26699940 PMCID: PMC4689534 DOI: 10.1371/journal.pcbi.1004622] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2015] [Accepted: 10/25/2015] [Indexed: 11/26/2022] Open
Abstract
Substantial evidence suggests that the phasic activity of dopamine neurons represents reinforcement learning’s temporal difference prediction error. However, recent reports of ramp-like increases in dopamine concentration in the striatum when animals are about to act, or are about to reach rewards, appear to pose a challenge to established thinking. This is because the implied activity is persistently predictable by preceding stimuli, and so cannot arise as this sort of prediction error. Here, we explore three possible accounts of such ramping signals: (a) the resolution of uncertainty about the timing of action; (b) the direct influence of dopamine over mechanisms associated with making choices; and (c) a new model of discounted vigour. Collectively, these suggest that dopamine ramps may be explained, with only minor disturbance, by standard theoretical ideas, though urgent questions remain regarding their proximal cause. We suggest experimental approaches to disentangling which of the proposed mechanisms are responsible for dopamine ramps.

Dopamine has long been implicated in reward-motivated behaviour. Theory and experiments suggest that activity of dopamine-containing neurons resembles a temporally-sophisticated prediction error used to learn expectations of future reward. This account would appear to be inconsistent with recent observations of ‘ramps’, i.e., gradual increases in extracellular dopamine concentration prior to the execution of actions or the acquisition of rewards. We explore three different possible explanations of such ramping signals as arising: (a) when subjects experience uncertainty about when actions will be executed; (b) when dopamine itself influences the timecourse of choice; and (c) under a new model in which ‘quasi-tonic’ dopamine signals arise through a form of temporal discounting. We thereby show that dopamine ramps can be integrated with current theories, and also suggest experiments to clarify which mechanisms are involved.
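Account (c) can be illustrated directly: under temporal discounting, the value of a fixed future reward grows as the reward approaches, so a signal that reports value ramps upward without any prediction error being generated. A sketch with illustrative parameters, not the paper's discounted-vigour model:

```python
# Under temporal discounting, the value of a reward R arriving at time T
# grows as t approaches T: V(t) = gamma**(T - t) * R. A quasi-tonic signal
# reporting V(t) therefore ramps up toward the reward.
# gamma, R and T are illustrative.
gamma, R, T = 0.8, 1.0, 5

ramp = [gamma ** (T - t) * R for t in range(T + 1)]
# ramp increases monotonically from gamma**T * R and ends at R
```

The ramp is fully predictable from the task structure, which is exactly why it poses no problem for the prediction-error account once it is read as a value signal rather than an error signal.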
Affiliation(s)
- Kevin Lloyd
- Gatsby Computational Neuroscience Unit, London, United Kingdom
- Peter Dayan
- Gatsby Computational Neuroscience Unit, London, United Kingdom
39
Gouvêa TS, Monteiro T, Motiwala A, Soares S, Machens C, Paton JJ. Striatal dynamics explain duration judgments. eLife 2015; 4:e11386. [PMID: 26641377 PMCID: PMC4721960 DOI: 10.7554/elife.11386] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2015] [Accepted: 12/07/2015] [Indexed: 11/25/2022] Open
Abstract
The striatum is an input structure of the basal ganglia implicated in several time-dependent functions including reinforcement learning, decision making, and interval timing. To determine whether striatal ensembles drive subjects' judgments of duration, we manipulated and recorded from striatal neurons in rats performing a duration categorization psychophysical task. We found that the dynamics of striatal neurons predicted duration judgments, and that simultaneously recorded ensembles could judge duration as well as the animal. Furthermore, striatal neurons were necessary for duration judgments, as muscimol infusions produced a specific impairment in animals' duration sensitivity. Lastly, we show that time as encoded by striatal populations ran faster or slower when rats judged a duration as longer or shorter, respectively. These results demonstrate that the speed with which striatal population state changes supports the fundamental ability of animals to judge the passage of time. DOI:http://dx.doi.org/10.7554/eLife.11386.001

You know someone is a good cook from their rice: grains must be well cooked, but not to the point of being mushy. Despite consistently using the same pot and stove, we, however, will sometimes overcook it. It is as if our inner sense of time itself is variable. What is it about the brain that explains this variability in time estimation and indeed our ability to estimate time in the first place? One issue the brain must confront in order to estimate time is that individual brain cells typically fire in bursts that last for tens of milliseconds. So how does the brain use this short-lived activity to track minutes and hours? One possibility is that individual neurons in a given brain region are programmed to fire at different points in time. The overall firing pattern of a group of neurons will therefore change in a predictable way as time passes. Gouvêa, Monteiro et al. found such predictably changing patterns of activity in the striatum of rats trained to estimate and categorize the duration of time intervals as longer or shorter than 1.5 seconds. Interestingly, when rats mistakenly categorized a short interval as a long one, population activity had travelled farther down its path than it would normally (and vice-versa for long intervals incorrectly categorized as short), suggesting that variability in subjective estimates of the passage of time might arise from variability in the speed of a changing pattern of activity across groups of neurons. As further evidence for the involvement of the striatum, inactivating the structure impaired the rats’ ability to correctly classify even the longest and shortest interval durations. The next challenge is to determine exactly how the striatum generates these time-keeping signals, at which stage variability originates, and how the brain regions that the striatum signals to use them to control an animal’s behavior. DOI:http://dx.doi.org/10.7554/eLife.11386.002
Affiliation(s)
- Thiago S Gouvêa
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Tiago Monteiro
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Asma Motiwala
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Sofia Soares
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Christian Machens
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Joseph J Paton
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
40
Gershman SJ. A Unifying Probabilistic View of Associative Learning. PLoS Comput Biol 2015; 11:e1004567. [PMID: 26535896 PMCID: PMC4633133 DOI: 10.1371/journal.pcbi.1004567] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2015] [Accepted: 09/22/2015] [Indexed: 11/19/2022] Open
Abstract
Two important ideas about associative learning have emerged in recent decades: (1) Animals are Bayesian learners, tracking their uncertainty about associations; and (2) animals acquire long-term reward predictions through reinforcement learning. Both of these ideas are normative, in the sense that they are derived from rational design principles. They are also descriptive, capturing a wide range of empirical phenomena that troubled earlier theories. This article describes a unifying framework encompassing Bayesian and reinforcement learning theories of associative learning. Each perspective captures a different aspect of associative learning, and their synthesis offers insight into phenomena that neither perspective can explain on its own.
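One concrete bridge between the two perspectives is the Kalman filter, in which a Rescorla-Wagner-style error-driven update acquires an uncertainty-dependent learning rate. A one-cue sketch with illustrative noise parameters, not the article's full framework:

```python
# One-cue Kalman filter for associative learning: the learning rate
# (Kalman gain) depends on the posterior uncertainty about the association,
# which shrinks with experience. Noise variances are illustrative.
q, r_noise = 0.01, 1.0   # diffusion and observation noise variances

def kalman_step(w, var, reward):
    """One trial of Bayesian associative learning for a single cue."""
    var += q                          # uncertainty grows between trials
    gain = var / (var + r_noise)      # uncertainty-dependent learning rate
    delta = reward - w                # prediction error
    w += gain * delta                 # error-driven (Rescorla-Wagner-like) update
    var *= (1 - gain)                 # uncertainty shrinks after observing the outcome
    return w, var, gain

w, var = 0.0, 1.0   # start uncertain, with no learned association
gains = []
for _ in range(20):
    w, var, g = kalman_step(w, var, reward=1.0)
    gains.append(g)
# the effective learning rate declines as the association becomes certain
```

The declining gain is what lets the Bayesian view capture phenomena such as faster learning about novel or volatile cues, while the update itself retains the familiar error-driven form.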
Affiliation(s)
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
41
Morita K, Kawaguchi Y. Computing reward-prediction error: an integrated account of cortical timing and basal-ganglia pathways for appetitive and aversive learning. Eur J Neurosci 2015; 42:2003-21. [PMID: 26095906 PMCID: PMC5034842 DOI: 10.1111/ejn.12994] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2015] [Revised: 06/11/2015] [Accepted: 06/17/2015] [Indexed: 12/12/2022]
Abstract
There are two prevailing notions regarding the involvement of the corticobasal ganglia system in value-based learning: (i) the direct and indirect pathways of the basal ganglia are crucial for appetitive and aversive learning, respectively, and (ii) the activity of midbrain dopamine neurons represents reward-prediction error. Although (ii) constitutes a critical assumption of (i), it remains elusive how (ii) holds given (i), with the basal-ganglia influence on the dopamine neurons. Here we present a computational neural-circuit model that potentially resolves this issue. Based on the latest analyses of the heterogeneous corticostriatal neurons and connections, our model posits that the direct and indirect pathways, respectively, represent the values of upcoming and previous actions, and up-regulate and down-regulate the dopamine neurons via the basal-ganglia output nuclei. This explains how the difference between the upcoming and previous values, which constitutes the core of reward-prediction error, is calculated. Simultaneously, it predicts that blockade of the direct/indirect pathway causes a negative/positive shift of reward-prediction error and thereby impairs learning from positive/negative error, i.e. appetitive/aversive learning. Through simulation of reward-reversal learning and punishment-avoidance learning, we show that our model could indeed account for the experimentally observed features that are suggested to support notion (i) and could also provide predictions on neural activity. We also present a behavioral prediction of our model, through simulation of inter-temporal choice, on how the balance between the two pathways relates to the subject's time preference. These results indicate that our model, incorporating the heterogeneity of the cortical influence on the basal ganglia, is expected to provide a closed-circuit mechanistic understanding of appetitive/aversive learning.
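The proposed decomposition can be written against the standard TD error, delta = r + gamma * V(upcoming) - V(previous), with the direct pathway contributing the discounted upcoming value and the indirect pathway the negated previous value. A notational sketch with illustrative numbers, not the paper's circuit simulation:

```python
# TD error decomposed into pathway contributions, following the idea that
# the direct (D1) pathway up-regulates dopamine with the upcoming action's
# value and the indirect (D2) pathway down-regulates it with the previous
# action's value. Numbers are illustrative.
gamma = 0.9

def rpe(reward, v_upcoming, v_previous):
    direct = gamma * v_upcoming   # excitatory contribution via the direct route
    indirect = -v_previous        # inhibitory contribution via the indirect route
    return reward + direct + indirect

delta = rpe(reward=0.0, v_upcoming=1.0, v_previous=0.5)  # 0.9 - 0.5 = 0.4
```

In this form, removing the indirect contribution shifts the error positively, which mirrors the abstract's prediction that indirect-pathway blockade biases learning toward positive errors.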
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Yasuo Kawaguchi
- Division of Cerebral Circuitry, National Institute for Physiological Sciences, Okazaki, Japan; Department of Physiological Sciences, SOKENDAI (The Graduate University for Advanced Studies), Okazaki, Japan; Japan Science and Technology Agency, Core Research for Evolutional Science and Technology, Tokyo, Japan
42

43
Moustafa AA. On the relationship among different motor processes: a computational modeling approach. Front Comput Neurosci 2015; 9:34. [PMID: 25852532 PMCID: PMC4364174 DOI: 10.3389/fncom.2015.00034]
44
Moustafa AA, Bar-Gad I, Korngreen A, Bergman H. Basal ganglia: physiological, behavioral, and computational studies. Front Syst Neurosci 2014; 8:150. [PMID: 25191233 PMCID: PMC4139593 DOI: 10.3389/fnsys.2014.00150]
Affiliation(s)
- Ahmed A Moustafa
- Department of Veterans Affairs, New Jersey Health Care System; School of Social Sciences and Psychology, MARCS Institute for Brain and Behaviour, University of Western Sydney, Sydney, NSW, Australia
- Izhar Bar-Gad
- Gonda Brain Research Center, Bar-Ilan University, Ramat Gan, Israel
- Alon Korngreen
- Gonda Brain Research Center, Bar-Ilan University, Ramat Gan, Israel; Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
- Hagai Bergman
- Department of Neurobiology (Physiology), Faculty of Medicine, Edmond and Lily Safra Center for Brain Research, Institute of Medical Research Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, Israel
45
Abstract
The dopamine clock hypothesis suggests that the dopamine level determines the speed of the hypothetical internal clock. However, dopaminergic function has also been implicated in motivation, so the effect of dopaminergic manipulations on timing behavior might also be independently mediated by an altered motivational state. This paper reviews studies that investigated the effect of motivational manipulations on peak responding. The majority of these studies show that a higher reward magnitude leads to a leftward shift, whereas reward devaluation leads to a rightward shift, in the initiation of timed anticipatory behavior, typically in the absence of an effect on the timing of response termination. Similar behavioral effects are also present in a number of studies that investigated the effect of dopamine agonists and dopamine-related genetic factors on peak responding. These results can be readily accounted for by independent modulation of decision thresholds for the initiation and termination of timed responding.
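The two-threshold account summarized above can be illustrated with a minimal sketch: responding starts and stops when subjective elapsed time crosses separate decision thresholds, and motivation moves only the start threshold. This is hypothetical illustrative Python; the interval and threshold values are assumed, not parameters from the paper:

```python
# Minimal sketch of the decision-threshold account of timed responding.
# Thresholds are expressed as proportions of the target interval. A larger
# reward lowers only the start threshold, producing a leftward shift in
# response initiation while leaving response termination unchanged.

def response_window(target_s, start_threshold, stop_threshold):
    """Return (start, stop) times of anticipatory responding, in seconds."""
    return (target_s * start_threshold, target_s * stop_threshold)

# Baseline vs. larger reward: only the start threshold differs (assumed values).
baseline = response_window(target_s=30.0, start_threshold=0.75,
                           stop_threshold=1.25)
large_reward = response_window(target_s=30.0, start_threshold=0.5,
                               stop_threshold=1.25)
# large_reward starts earlier than baseline but stops at the same time.
```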
Affiliation(s)
- Fuat Balcı
- Department of Psychology, Koç University, Rumelifeneri yolu, Sarıyer, Istanbul, Turkey