1
Etani T, Miura A, Kawase S, Fujii S, Keller PE, Vuust P, Kudo K. A review of psychological and neuroscientific research on musical groove. Neurosci Biobehav Rev 2024; 158:105522. PMID: 38141692. DOI: 10.1016/j.neubiorev.2023.105522.
Abstract
When listening to music, we naturally move our bodies rhythmically to the beat, which can be pleasurable and difficult to resist. This pleasurable sensation of wanting to move the body to music has been called "groove." Following pioneering humanities research, psychological and neuroscientific studies have provided insights into associated musical features, behavioral responses, phenomenological aspects, and brain structural and functional correlates of the groove experience. Groove research has advanced the field of music science and, more generally, informed our understanding of bidirectional links between perception and action and of the role of the motor system in prediction. Activity in motor and reward-related brain networks during music listening is associated with the groove experience, and this neural activity is linked to temporal prediction and learning. This article reviews research on groove as a psychological phenomenon with neurophysiological correlates that link musical rhythm perception, sensorimotor prediction, and reward processing. Promising future research directions range from elucidating specific neural mechanisms to exploring clinical applications and socio-cultural implications of groove.
Affiliation(s)
- Takahide Etani
- School of Medicine, College of Medical, Pharmaceutical, and Health, Kanazawa University, Kanazawa, Japan; Graduate School of Media and Governance, Keio University, Fujisawa, Japan; Advanced Research Center for Human Sciences, Waseda University, Tokorozawa, Japan.
- Akito Miura
- Faculty of Human Sciences, Waseda University, Tokorozawa, Japan
- Satoshi Kawase
- The Faculty of Psychology, Kobe Gakuin University, Kobe, Japan
- Shinya Fujii
- Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan
- Peter E Keller
- Center for Music in the Brain, Aarhus University, Aarhus, Denmark/The Royal Academy of Music Aarhus/Aalborg, Denmark; The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, Australia
- Peter Vuust
- Center for Music in the Brain, Aarhus University, Aarhus, Denmark/The Royal Academy of Music Aarhus/Aalborg, Denmark
- Kazutoshi Kudo
- Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
2
Krausz TA, Comrie AE, Kahn AE, Frank LM, Daw ND, Berke JD. Dual credit assignment processes underlie dopamine signals in a complex spatial environment. Neuron 2023; 111:3465-3478.e7. PMID: 37611585. PMCID: PMC10841332. DOI: 10.1016/j.neuron.2023.07.017.
Abstract
Animals frequently make decisions based on expectations of future reward ("values"). Values are updated by ongoing experience: places and choices that result in reward are assigned greater value. Yet, the specific algorithms used by the brain for such credit assignment remain unclear. We monitored accumbens dopamine as rats foraged for rewards in a complex, changing environment. We observed brief dopamine pulses both at reward receipt (scaling with prediction error) and at novel path opportunities. Dopamine also ramped up as rats ran toward reward ports, in proportion to the value at each location. By examining the evolution of these dopamine place-value signals, we found evidence for two distinct update processes: progressive propagation of value along taken paths, as in temporal difference learning, and inference of value throughout the maze, using internal models. Our results demonstrate that within rich, naturalistic environments dopamine conveys place values that are updated via multiple, complementary learning algorithms.
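The temporal-difference half of the dual update described above can be sketched in a few lines. This is a generic tabular TD(0) illustration, not the paper's model: the linear "track," learning rate, discount, and reward magnitude are all invented for the example.

```python
# Hedged sketch: tabular TD(0) on a linear "track" of 5 states, reward at the end.
# Illustrates the "progressive propagation of value along taken paths" that the
# abstract attributes to temporal-difference learning; all parameters are
# illustrative assumptions, not the paper's task or fitted values.

N_STATES = 5        # positions along the path to the reward port
ALPHA = 0.5         # learning rate (assumed)
GAMMA = 0.9         # temporal discount (assumed)
REWARD = 1.0        # reward delivered on reaching the final state

V = [0.0] * N_STATES  # place values, initially zero

def run_traversal(V):
    """One run from start to reward port, applying the TD(0) update at each step."""
    for s in range(N_STATES - 1):
        r = REWARD if s + 1 == N_STATES - 1 else 0.0
        rpe = r + GAMMA * V[s + 1] - V[s]   # reward prediction error (dopamine-like)
        V[s] += ALPHA * rpe
    return V

# Value creeps backward from the reward, one state per traversal at first:
for trial in range(3):
    run_traversal(V)
    print(f"trial {trial + 1}: " + ", ".join(f"{v:.2f}" for v in V))
```

Note how only the state adjacent to reward gains value on the first traversal; model-based inference, the second process the paper reports, would instead update locations the animal never visited.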
Affiliation(s)
- Timothy A Krausz
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Alison E Comrie
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Ari E Kahn
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Loren M Frank
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA; Department of Physiology, University of California, San Francisco, San Francisco, CA 94158, USA
- Nathaniel D Daw
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Joshua D Berke
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA; Kavli Institute for Fundamental Neuroscience and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Neurology and Department of Psychiatry and Behavioral Science, University of California, San Francisco, San Francisco, CA 94158, USA
3
Hennig JA, Romero Pinto SA, Yamaguchi T, Linderman SW, Uchida N, Gershman SJ. Emergence of belief-like representations through reinforcement learning. PLoS Comput Biol 2023; 19:e1011067. PMID: 37695776. PMCID: PMC10513382. DOI: 10.1371/journal.pcbi.1011067.
Abstract
To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs"-optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
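The "beliefs" the abstract refers to are recursive Bayesian estimates of hidden state. A minimal sketch of that computation follows; the two-state world, transition matrix, and observation likelihoods are invented for illustration and are not the paper's task.

```python
# Hedged sketch of the "belief" computation the abstract contrasts with the RNN:
# one step of Bayesian filtering in a two-state hidden Markov world.
# States, T, and likelihoods below are assumptions made up for this example.

def belief_update(b, obs_likelihood, T):
    """One Bayesian filtering step: predict with T, then correct with the likelihood."""
    n = len(b)
    predicted = [sum(T[s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    unnorm = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hidden states: 0 = "cue period", 1 = "reward may arrive". T[s][s2] = P(s2 | s).
T = [[0.8, 0.2],
     [0.0, 1.0]]
# P(observe a click | state): clicks are more likely in the reward state (assumed).
lik_click = [0.1, 0.6]

b = [1.0, 0.0]                    # start certain we are in the cue period
for step in range(3):
    b = belief_update(b, lik_click, T)
    print(f"step {step + 1}: P(reward state) = {b[1]:.3f}")
```

The paper's point is that an RNN trained only to predict value can come to encode this quantity implicitly, without ever executing an update like the one above.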
Affiliation(s)
- Jay A. Hennig
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Sandra A. Romero Pinto
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, United States of America
- Takahiro Yamaguchi
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Future Research Department, Toyota Research Institute of North America, Toyota Motor North America, Ann Arbor, Michigan, United States of America
- Scott W. Linderman
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Samuel J. Gershman
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
4
Krausz TA, Comrie AE, Frank LM, Daw ND, Berke JD. Dual credit assignment processes underlie dopamine signals in a complex spatial environment. bioRxiv [Preprint] 2023:2023.02.15.528738. PMID: 36993482. PMCID: PMC10054934. DOI: 10.1101/2023.02.15.528738.
Abstract
Dopamine in the nucleus accumbens helps motivate behavior based on expectations of future reward ("values"). These values need to be updated by experience: after receiving reward, the choices that led to reward should be assigned greater value. There are multiple theoretical proposals for how this credit assignment could be achieved, but the specific algorithms that generate updated dopamine signals remain uncertain. We monitored accumbens dopamine as freely behaving rats foraged for rewards in a complex, changing environment. We observed brief pulses of dopamine both when rats received reward (scaling with prediction error), and when they encountered novel path opportunities. Furthermore, dopamine ramped up as rats ran towards reward ports, in proportion to the value at each location. By examining the evolution of these dopamine place-value signals, we found evidence for two distinct update processes: progressive propagation along taken paths, as in temporal-difference learning, and inference of value throughout the maze, using internal models. Our results demonstrate that within rich, naturalistic environments dopamine conveys place values that are updated via multiple, complementary learning algorithms.
Affiliation(s)
- Timothy A Krausz
- Neuroscience Graduate Program, University of California, San Francisco
- Alison E Comrie
- Neuroscience Graduate Program, University of California, San Francisco
- Loren M Frank
- Neuroscience Graduate Program, University of California, San Francisco
- Kavli Institute for Fundamental Neuroscience, and Weill Institute for Neurosciences, UCSF
- Howard Hughes Medical Institute
- Department of Physiology, UCSF
- Nathaniel D Daw
- Department of Psychology, and Princeton Neuroscience Institute, Princeton University, NJ
- Joshua D Berke
- Neuroscience Graduate Program, University of California, San Francisco
- Kavli Institute for Fundamental Neuroscience, and Weill Institute for Neurosciences, UCSF
- Department of Neurology, and Department of Psychiatry and Behavioral Science, UCSF
5
Alhassen W, Alhassen S, Chen J, Monfared RV, Alachkar A. Cilia in the Striatum Mediate Timing-Dependent Functions. Mol Neurobiol 2023; 60:545-565. PMID: 36322337. PMCID: PMC9849326. DOI: 10.1007/s12035-022-03095-9.
Abstract
Almost all brain cells contain cilia, antenna-like microtubule-based organelles. Yet the significance of cilia, once considered vestigial organelles, for higher-order brain functions is unknown. Cilia act as a hub that senses and transduces environmental sensory stimuli to generate an appropriate cellular response. Similarly, the striatum, a brain structure enriched in cilia, functions as a hub that receives and integrates various types of environmental information to drive appropriate motor responses. To understand the role of cilia in striatal function, we used loxP/Cre technology to ablate cilia from the dorsal striatum of male mice and monitored the behavioral consequences. Our results revealed an essential role for striatal cilia in the acquisition and brief storage of information, including learning new motor skills, but not in the long-term consolidation of information or the maintenance of habitual/learned motor skills. A fundamental aspect of all disrupted functions was a deficit in time perception/judgment. Furthermore, the observed behavioral deficits form a cluster of clinical manifestations that overlap across psychiatric disorders which involve striatal function and are known to exhibit timing deficits. Thus, striatal cilia may act as a calibrator of the timing functions of the basal ganglia-cortical circuit by maintaining proper timing perception. Our findings suggest that dysfunctional cilia may contribute to the pathophysiology of neuropsychiatric disorders related to deficits in timing perception.
Affiliation(s)
- Wedad Alhassen
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University of California-Irvine, 356A Med Surge II, Irvine, CA 92697-4625, USA
- Sammy Alhassen
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University of California-Irvine, 356A Med Surge II, Irvine, CA 92697-4625, USA
- Jiaqi Chen
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University of California-Irvine, 356A Med Surge II, Irvine, CA 92697-4625, USA
- Roudabeh Vakil Monfared
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University of California-Irvine, 356A Med Surge II, Irvine, CA 92697-4625, USA
- Amal Alachkar
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University of California-Irvine, 356A Med Surge II, Irvine, CA 92697-4625, USA; UC Irvine Center for the Neurobiology of Learning and Memory, University of California-Irvine, Irvine, CA 92697, USA; Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California-Irvine, Irvine, CA 92697, USA
6
Gallistel CR, Johansson F, Jirenhed DA, Rasmussen A, Ricci M, Hesslow G. Quantitative properties of the creation and activation of a cell-intrinsic duration-encoding engram. Front Comput Neurosci 2022; 16:1019812. PMID: 36405788. PMCID: PMC9669310. DOI: 10.3389/fncom.2022.1019812.
Abstract
The engram encoding the interval between the conditional stimulus (CS) and the unconditional stimulus (US) in eyeblink conditioning resides within a small population of cerebellar Purkinje cells. CSs activate this engram to produce a pause in the spontaneous firing rate of the cell, which times the CS-conditional blink. We developed a Bayesian algorithm that finds pause onsets and offsets in the records from individual CS-alone trials. We find that the pause consists of a single unusually long interspike interval. Its onset and offset latencies and their trial-to-trial variability are proportional to the CS-US interval. The coefficients of variation (CoV = σ/μ) are comparable to the CoVs for the conditional eye blink. The average trial-to-trial correlation between the onset latencies and the offset latencies is close to 0, implying that the onsets and offsets are mediated by two stochastically independent readings of the engram. The onset of the pause is step-like; there is no decline in firing rate between the onset of the CS and the onset of the pause. A single presynaptic spike volley suffices to trigger the reading of the engram, and the pause parameters are unaffected by subsequent volleys. The Fano factors for trial-to-trial variations in the distribution of interspike intervals within the intertrial intervals indicate pronounced non-stationarity in the endogenous spontaneous spiking rate, on which the CS-triggered firing pause supervenes. These properties of the spontaneous firing and of the engram readout may prove useful in finding the cell-intrinsic, molecular-level structure that encodes the CS-US interval.
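The two variability statistics this abstract relies on, the coefficient of variation and the Fano factor, are easy to state concretely. A minimal sketch with invented sample data (the latencies and counts below are not from the paper):

```python
# Hedged sketch of the two statistics used above: the coefficient of
# variation (CoV = sigma/mu) of latencies, and the Fano factor
# (variance-to-mean ratio) of event counts. Sample data are invented.
from statistics import mean, pstdev, pvariance

def cov(xs):
    """Coefficient of variation: population standard deviation over the mean."""
    return pstdev(xs) / mean(xs)

def fano(counts):
    """Fano factor: variance-to-mean ratio of event counts."""
    return pvariance(counts) / mean(counts)

# Hypothetical pause-onset latencies (ms) on four CS-alone trials:
onsets = [102.0, 98.0, 110.0, 90.0]
# Hypothetical spike counts in matched intertrial windows:
counts = [40, 55, 38, 62]

print(f"CoV of onsets: {cov(onsets):.3f}")
print(f"Fano factor:   {fano(counts):.2f}")
# A Poisson process has Fano factor 1; values well above 1 indicate the kind
# of non-stationary spontaneous rate the abstract describes.
```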
Affiliation(s)
- Fredrik Johansson
- Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden
- Dan-Anders Jirenhed
- Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden
- Anders Rasmussen
- Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden
- Matthew Ricci
- Carney Institute for Brain Sciences, Brown University, Providence, RI, United States
- Germund Hesslow
- Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden
- Correspondence: Germund Hesslow
7
Jakob AMV, Mikhael JG, Hamilos AE, Assad JA, Gershman SJ. Dopamine mediates the bidirectional update of interval timing. Behav Neurosci 2022; 136:445-452. PMID: 36222637. PMCID: PMC9725808. DOI: 10.1037/bne0000529.
Abstract
The role of dopamine (DA) as a reward prediction error (RPE) signal in reinforcement learning (RL) tasks has been well-established over the past decades. Recent work has shown that the RPE interpretation can also account for the effects of DA on interval timing by controlling the speed of subjective time. According to this theory, the timing of the dopamine signal relative to reward delivery dictates whether subjective time speeds up or slows down: Early DA signals speed up subjective time and late signals slow it down. To test this bidirectional prediction, we reanalyzed measurements of dopaminergic neurons in the substantia nigra pars compacta of mice performing a self-timed movement task. Using the slope of ramping dopamine activity as a readout of subjective time speed, we found that trial-by-trial changes in the slope could be predicted from the timing of dopamine activity on the previous trial. This result provides a key piece of evidence supporting a unified computational theory of RL and interval timing.
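The bidirectional prediction being tested can be restated as a toy update rule. Only the direction of each update (an early signal speeds the subjective clock, a late one slows it) comes from the abstract; the multiplicative form and step size below are assumptions invented for illustration.

```python
# Toy restatement of the bidirectional rule described above. The multiplicative
# update and the rate ETA are invented; only the sign convention (early signal
# -> faster subjective clock, late signal -> slower) follows the abstract.

ETA = 0.1  # per-trial adjustment step (assumed)

def update_clock_speed(speed, da_time, reward_time):
    if da_time < reward_time:      # early signal: subjective time speeds up
        return speed * (1 + ETA)
    if da_time > reward_time:      # late signal: subjective time slows down
        return speed * (1 - ETA)
    return speed

speed = 1.0
for da_t in [0.8, 0.9, 1.2]:       # signal times on three trials; reward at t=1.0
    speed = update_clock_speed(speed, da_t, 1.0)
    print(f"clock speed: {speed:.3f}")
```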
Collapse
Affiliation(s)
- Anthony M V Jakob
- Section of Life Sciences Engineering, École Polytechnique Fédérale de Lausanne
- John A Assad
- Department of Neurobiology, Harvard Medical School
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University
8
Namboodiri VMK. How do real animals account for the passage of time during associative learning? Behav Neurosci 2022; 136:383-391. PMID: 35482634. PMCID: PMC9561011. DOI: 10.1037/bne0000516.
Abstract
Animals routinely learn to associate environmental stimuli and self-generated actions with their outcomes such as rewards. One of the most popular theoretical models of such learning is the reinforcement learning (RL) framework. The simplest form of RL, model-free RL, is widely applied to explain animal behavior in numerous neuroscientific studies. More complex RL versions assume that animals build and store an explicit model of the world in memory. To apply these approaches to explain animal behavior, typical neuroscientific RL models make implicit assumptions about how real animals represent the passage of time. In this perspective, I explicitly list these assumptions and show that they have several problematic implications. I hope that the explicit discussion of these problems encourages the field to seriously examine the assumptions underlying timing and reinforcement learning.
9
Tsao A, Yousefzadeh SA, Meck WH, Moser MB, Moser EI. The neural bases for timing of durations. Nat Rev Neurosci 2022; 23:646-665. PMID: 36097049. DOI: 10.1038/s41583-022-00623-3.
Abstract
Durations are defined by a beginning and an end, and a major distinction is drawn between durations that start in the present and end in the future ('prospective timing') and durations that start in the past and end either in the past or the present ('retrospective timing'). Different psychological processes are thought to be engaged in each of these cases. The former is thought to engage a clock-like mechanism that accurately tracks the continuing passage of time, whereas the latter is thought to engage a reconstructive process that utilizes both temporal and non-temporal information from the memory of past events. We propose that, from a biological perspective, these two forms of duration 'estimation' are supported by computational processes that are both reliant on population state dynamics but are nevertheless distinct. Prospective timing is effectively carried out in a single step where the ongoing dynamics of population activity directly serve as the computation of duration, whereas retrospective timing is carried out in two steps: the initial generation of population state dynamics through the process of event segmentation and the subsequent computation of duration utilizing the memory of those dynamics.
Affiliation(s)
- Albert Tsao
- Department of Biology, Stanford University, Stanford, CA, USA
- Warren H Meck
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
- May-Britt Moser
- Centre for Neural Computation, Kavli Institute for Systems Neuroscience, Norwegian University of Science and Technology, Trondheim, Norway
- Edvard I Moser
- Centre for Neural Computation, Kavli Institute for Systems Neuroscience, Norwegian University of Science and Technology, Trondheim, Norway
10
Lourenco I, Mattila R, Ventura R, Wahlberg B. A Biologically Inspired Computational Model of Time Perception. IEEE Trans Cogn Dev Syst 2022. DOI: 10.1109/tcds.2021.3120301.
Affiliation(s)
- Ines Lourenco
- Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, Sweden
- Robert Mattila
- Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, Sweden
- Rodrigo Ventura
- Institute for Systems and Robotics, Instituto Superior Técnico, Lisbon, Portugal
- Bo Wahlberg
- Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, Sweden
11
Parker NF, Baidya A, Cox J, Haetzel LM, Zhukovskaya A, Murugan M, Engelhard B, Goldman MS, Witten IB. Choice-selective sequences dominate in cortical relative to thalamic inputs to NAc to support reinforcement learning. Cell Rep 2022; 39:110756. PMID: 35584665. PMCID: PMC9218875. DOI: 10.1016/j.celrep.2022.110756.
Abstract
How are actions linked with subsequent outcomes to guide choices? The nucleus accumbens, which is implicated in this process, receives glutamatergic inputs from the prelimbic cortex and midline regions of the thalamus. However, little is known about whether and how representations differ across these input pathways. By comparing these inputs during a reinforcement learning task in mice, we discovered that prelimbic cortical inputs preferentially represent actions and choices, whereas midline thalamic inputs preferentially represent cues. Choice-selective activity in the prelimbic cortical inputs is organized in sequences that persist beyond the outcome. Through computational modeling, we demonstrate that these sequences can support the neural implementation of reinforcement-learning algorithms, in both a circuit model based on synaptic plasticity and one based on neural dynamics. Finally, we test and confirm a prediction of our circuit models by direct manipulation of nucleus accumbens input neurons.
Affiliation(s)
- Nathan F Parker
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Avinash Baidya
- Center for Neuroscience, University of California, Davis, Davis, CA 95616, USA; Department of Physics and Astronomy, University of California, Davis, Davis, CA 95616, USA
- Julia Cox
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA; Department of Neuroscience, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Laura M Haetzel
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Anna Zhukovskaya
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Malavika Murugan
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Ben Engelhard
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Mark S Goldman
- Center for Neuroscience, University of California, Davis, Davis, CA 95616, USA; Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, CA 95616, USA; Department of Ophthalmology and Vision Science, University of California, Davis, Davis, CA 95616, USA
- Ilana B Witten
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA; Department of Psychology, Princeton University, Princeton, NJ 08544, USA
12
Calderon CB, Verguts T, Frank MJ. Thunderstruck: The ACDC model of flexible sequences and rhythms in recurrent neural circuits. PLoS Comput Biol 2022; 18:e1009854. PMID: 35108283. PMCID: PMC8843237. DOI: 10.1371/journal.pcbi.1009854.
Abstract
Adaptive sequential behavior is a hallmark of human cognition. In particular, humans can learn to produce precise spatiotemporal sequences given a certain context. For instance, musicians can not only reproduce learned action sequences in a context-dependent manner, but can also quickly and flexibly reapply them in any desired tempo or rhythm without overwriting previous learning. Existing neural network models fail to account for these properties. We argue that this limitation emerges from the fact that sequence information (i.e., the position of the action) and timing (i.e., the moment of response execution) are typically stored in the same neural network weights. Here, we augment a biologically plausible recurrent neural network of cortical dynamics to include a basal ganglia-thalamic module which uses reinforcement learning to dynamically modulate action. This "associative cluster-dependent chain" (ACDC) model modularly stores sequence and timing information in distinct loci of the network. This feature increases computational power and allows ACDC to display a wide range of temporal properties (e.g., multiple sequences, temporal shifting, rescaling, and compositionality), while still accounting for several behavioral and neurophysiological empirical observations. Finally, we apply this ACDC network to show how it can learn the famous "Thunderstruck" song intro and then flexibly play it in a "bossa nova" rhythm without further training. How do humans flexibly adapt action sequences? For instance, musicians can learn a song and quickly speed up or slow down the tempo, or even play the song following a completely different rhythm (e.g., a rock song using a bossa nova rhythm). In this work, we build a biologically plausible network of cortico-basal ganglia interactions that explains how this temporal flexibility may emerge in the brain. Crucially, our model factorizes sequence order and action timing, represented in cortical and basal ganglia dynamics, respectively. This factorization allows full temporal flexibility: the timing of a learned action sequence can be recomposed without interfering with the order of the sequence. As such, our model is capable of learning asynchronous action sequences and flexibly shifting, rescaling, and recomposing them, while accounting for biological data.
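The factorization at the heart of this claim, order stored separately from timing, can be illustrated without any neural machinery. A minimal sketch with invented notes and intervals (this is a restatement of the idea, not the ACDC model itself):

```python
# Hedged sketch of the factorization the ACDC abstract describes: the *order*
# of a learned action sequence is stored separately from its *timing*, so the
# rhythm can be swapped or the tempo rescaled without touching the order.
# Note names and interval values are invented for illustration.

sequence = ["B", "A", "B", "D", "B", "E"]      # learned order (fixed)
straight = [0.25, 0.25, 0.25, 0.25, 0.25]      # inter-onset intervals, seconds
bossa    = [0.375, 0.125, 0.25, 0.375, 0.125]  # a different rhythm, same order

def schedule(seq, intervals, tempo_scale=1.0):
    """Pair each action with an onset time; timing changes, order never does."""
    t, out = 0.0, []
    for i, action in enumerate(seq):
        out.append((round(t, 3), action))
        if i < len(intervals):
            t += intervals[i] * tempo_scale
    return out

original = schedule(sequence, straight)
rerhythmed = schedule(sequence, bossa)            # new rhythm, no retraining
double_time = schedule(sequence, straight, 0.5)   # rescaled tempo

# Order is identical under every timing: only onsets differ.
assert [a for _, a in original] == [a for _, a in rerhythmed] == sequence
print(rerhythmed)
```

Storing order and timing in one set of weights would force relearning for each new rhythm; keeping them in distinct variables, as here, is the point of the factorization.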
Affiliation(s)
- Cristian Buc Calderon
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, Rhode Island, United States of America
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Carney Institute for Brain Science, Brown University, Providence, Rhode Island, United States of America
- Tom Verguts
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Michael J. Frank
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, Rhode Island, United States of America
- Carney Institute for Brain Science, Brown University, Providence, Rhode Island, United States of America
13
Abstract
Psychological and neural distinctions between the technical concepts of "liking" and "wanting" pose important problems for motivated choice for goods. Why could we "want" something that we do not "like," or "like" something but be unwilling to exert effort to acquire it? Here, we suggest a framework for answering these questions through the medium of reinforcement learning. We consider "liking" to provide immediate, but preliminary and ultimately cancellable, information about the true, long-run worth of a good. Such initial estimates, viewed through the lens of what is known as potential-based shaping, help solve the temporally complex learning problems faced by animals.
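The potential-based shaping invoked above has a standard form: each per-step reward is augmented with F = γΦ(s') − Φ(s), and along any trajectory these terms telescope, so an early "liking" signal can guide learning without changing long-run worth. A minimal sketch with invented potentials and rewards:

```python
# Hedged sketch of potential-based shaping, the mechanism the abstract invokes:
# "liking" acts as a potential Phi(s) giving immediate, but ultimately
# cancellable, information. Shaping adds F = gamma*Phi(s') - Phi(s) to each
# reward; summed along a trajectory, the added terms telescope to
# gamma-weighted Phi at the endpoints. All numbers below are invented.

GAMMA = 1.0  # undiscounted, for a clean telescoping sum (assumption)

def shaped_rewards(rewards, potentials):
    """Add the potential-based shaping term to each step's reward."""
    return [r + GAMMA * potentials[i + 1] - potentials[i]
            for i, r in enumerate(rewards)]

# A 4-step trajectory with one true reward at the end:
rewards = [0.0, 0.0, 0.0, 1.0]
# "Liking" potential: early positive signal about the goal (invented values),
# one entry per visited state (5 states for 4 steps):
phi = [0.0, 0.6, 0.7, 0.8, 0.0]

shaped = shaped_rewards(rewards, phi)
print("shaped per-step:", shaped)
print("raw total:   ", sum(rewards))
print("shaped total:", sum(shaped))
```

The shaped first step is immediately positive ("liking" speaks early), yet the totals agree: the preliminary information is cancelled by later terms, which is exactly the "cancellable" property the abstract describes.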
Affiliation(s)
- Peter Dayan
- MPI for Biological Cybernetics, Tübingen, Germany
- University of Tübingen, Tübingen, Germany
14
Polti I, Nau M, Kaplan R, van Wassenhove V, Doeller CF. Rapid encoding of task regularities in the human hippocampus guides sensorimotor timing. eLife 2022; 11:e79027. PMID: 36317500. PMCID: PMC9625083. DOI: 10.7554/elife.79027.
Abstract
The brain encodes the statistical regularities of the environment in a task-specific yet flexible and generalizable format. Here, we seek to understand this process by bridging two parallel lines of research, one centered on sensorimotor timing, and the other on cognitive mapping in the hippocampal system. By combining functional magnetic resonance imaging (fMRI) with a fast-paced time-to-contact (TTC) estimation task, we found that the hippocampus signaled behavioral feedback received in each trial as well as performance improvements across trials along with reward-processing regions. Critically, it signaled performance improvements independent from the tested intervals, and its activity accounted for the trial-wise regression-to-the-mean biases in TTC estimation. This is in line with the idea that the hippocampus supports the rapid encoding of temporal context even on short time scales in a behavior-dependent manner. Our results emphasize the central role of the hippocampus in statistical learning and position it at the core of a brain-wide network updating sensorimotor representations in real time for flexible behavior.
Affiliation(s)
- Ignacio Polti
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Matthias Nau
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Raphael Kaplan
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Department of Basic Psychology, Clinical Psychology, and Psychobiology, Universitat Jaume I, Castellón de la Plana, Spain
- Virginie van Wassenhove
- CEA DRF/Joliot, NeuroSpin; INSERM, Cognitive Neuroimaging Unit; CNRS, Université Paris-Saclay, Gif-sur-Yvette, France
- Christian F Doeller
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; Wilhelm Wundt Institute of Psychology, Leipzig University, Leipzig, Germany
15
Mikhael JG, Lai L, Gershman SJ. Rational inattention and tonic dopamine. PLoS Comput Biol 2021; 17:e1008659. [PMID: 33760806 PMCID: PMC7990190 DOI: 10.1371/journal.pcbi.1008659] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Accepted: 12/28/2020] [Indexed: 11/27/2022] Open
Abstract
Slow-timescale (tonic) changes in dopamine (DA) contribute to a wide variety of processes in reinforcement learning, interval timing, and other domains. Furthermore, changes in tonic DA exert distinct effects depending on when they occur (e.g., during learning vs. performance) and what task the subject is performing (e.g., operant vs. classical conditioning). Two influential theories of tonic DA - the average reward theory and the Bayesian theory in which DA controls precision - have each been successful at explaining a subset of empirical findings. But how the same DA signal performs two seemingly distinct functions without creating crosstalk is not well understood. Here we reconcile the two theories under the unifying framework of 'rational inattention,' which (1) conceptually links average reward and precision, (2) outlines how DA manipulations affect this relationship, and in so doing, (3) captures new empirical phenomena. In brief, rational inattention asserts that agents can increase their precision in a task (and thus improve their performance) by paying a cognitive cost. Crucially, whether this cost is worth paying depends on average reward availability, reported by DA. The monotonic relationship between average reward and precision means that the DA signal contains the information necessary to retrieve the precision. When this information is needed after the task is performed, as presumed by Bayesian inference, acute manipulations of DA will bias behavior in predictable ways. We show how this framework reconciles a remarkably large collection of experimental findings. In reinforcement learning, the rational inattention framework predicts that learning from positive and negative feedback should be enhanced in high and low DA states, respectively, and that DA should tip the exploration-exploitation balance toward exploitation. In interval timing, this framework predicts that DA should increase the speed of the internal clock and decrease the extent of interference by other temporal stimuli during temporal reproduction (the central tendency effect). Finally, rational inattention makes the new predictions that these effects should be critically dependent on the controllability of rewards, that post-reward delays in intertemporal choice tasks should be underestimated, and that average reward manipulations should affect the speed of the clock - thus capturing empirical findings that are unexplained by either theory alone. Our results suggest that a common computational repertoire may underlie the seemingly heterogeneous roles of DA.
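The core cost-benefit logic lends itself to a toy calculation. Under functional forms chosen here purely for illustration (performance saturating as 1 - exp(-p), a linear attentional cost) rather than taken from the paper, the precision that maximizes net payoff grows monotonically with average reward, which is the relationship the framework attributes to tonic DA:

```python
import math

def optimal_precision(avg_reward, unit_cost=0.5):
    """Precision p maximizing avg_reward * (1 - exp(-p)) - unit_cost * p.

    Setting the derivative avg_reward * exp(-p) - unit_cost to zero
    gives p* = ln(avg_reward / unit_cost), clipped at zero: when the
    average reward is too low, no attentional cost is worth paying.
    """
    if avg_reward <= unit_cost:
        return 0.0
    return math.log(avg_reward / unit_cost)
```

On this sketch, raising average reward (a higher tonic DA state, on the framework's reading) raises the chosen precision and thus performance, while lowering it drives precision to zero.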
Affiliation(s)
- John G. Mikhael
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- MD-PhD Program, Harvard Medical School, Boston, Massachusetts, United States of America
- Lucy Lai
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
16
Fung BJ, Sutlief E, Hussain Shuler MG. Dopamine and the interdependency of time perception and reward. Neurosci Biobehav Rev 2021; 125:380-391. [PMID: 33652021 DOI: 10.1016/j.neubiorev.2021.02.030] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Revised: 02/16/2021] [Accepted: 02/19/2021] [Indexed: 01/14/2023]
Abstract
Time is a fundamental dimension of our perception of the world and is therefore of critical importance to the organization of human behavior. A corpus of work - including recent optogenetic evidence - implicates striatal dopamine as a crucial factor influencing the perception of time. Another stream of literature implicates dopamine in reward and motivation processes. However, these two domains of research have remained largely separated, despite neurobiological overlap and the apothegmatic notion that "time flies when you're having fun". This article constitutes a review of the literature linking time perception and reward, including neurobiological and behavioral studies. Together, these provide compelling support for the idea that time perception and reward processing interact via a common dopaminergic mechanism.
Affiliation(s)
- Bowen J Fung
- The Behavioural Insights Team, Suite 3, Level 13/9 Hunter St, Sydney NSW 2000, Australia.
- Elissa Sutlief
- The Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University School of Medicine, Woods Basic Science Building Rm914, 725 N. Wolfe Street, Baltimore, MD 21205, USA
- Marshall G Hussain Shuler
- The Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University School of Medicine, Woods Basic Science Building Rm914, 725 N. Wolfe Street, Baltimore, MD 21205, USA; Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, 725 N Wolfe Street, Baltimore, MD 21205, USA.
17
Jabłońska J, Szumiec Ł, Zieliński P, Rodriguez Parkitna J. Time elapsed between choices in a probabilistic task correlates with repeating the same decision. Eur J Neurosci 2021; 53:2639-2654. [PMID: 33559232 PMCID: PMC8248175 DOI: 10.1111/ejn.15144] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Revised: 01/01/2021] [Accepted: 02/02/2021] [Indexed: 12/30/2022]
Abstract
Reinforcement learning makes an action that yields a positive outcome more likely to be taken in the future. Here, we investigate how the time elapsed from an action affects subsequent decisions. Groups of C57BL6/J mice were housed in IntelliCages with access to water and chow ad libitum; they also had access to bottles with a reward: saccharin solution, alcohol, or a mixture of the two. The probability of receiving a reward in two of the cage corners changed between 0.9 and 0.3 every 48 hr over a period of ~33 days. As expected, in most animals, the odds of repeating a corner choice were increased if that choice was previously rewarded. Interestingly, the time elapsed from the previous choice also influenced the probability of repeating the choice, and this effect was independent of the previous outcome. Behavioral data were fitted to a series of reinforcement learning models. Best fits were achieved when the reward prediction update was coupled with separate learning rates for positive and negative outcomes and, additionally, a "fictitious" update of the expected value of the nonselected choice. Additional inclusion of a time-dependent decay of the expected values improved the fit marginally in some cases.
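A sketch of the best-fitting update just described: separate learning rates for positive and negative prediction errors, a fictitious update of the unchosen corner (assumed here to be updated toward the opposite outcome), and an optional time-dependent decay toward a baseline. The parameter values and the 0.5 baseline are illustrative, not the fitted ones:

```python
def update_values(values, choice, reward, alpha_pos=0.3, alpha_neg=0.1,
                  decay=0.0):
    """One trial of a dual-learning-rate model with a fictitious update.

    values: expected reward probability for each corner.
    reward: 1 if the visit was rewarded, 0 otherwise.
    decay:  fraction of the distance back to the 0.5 baseline that
            accumulates with the time elapsed between choices.
    """
    updated = []
    for i, v in enumerate(values):
        # The unchosen option is updated as if it had received the
        # opposite outcome (the "fictitious" counterfactual update).
        outcome = reward if i == choice else 1 - reward
        delta = outcome - v
        alpha = alpha_pos if delta > 0 else alpha_neg
        v = v + alpha * delta
        updated.append(v + decay * (0.5 - v))
    return updated
```

With decay > 0, long gaps between choices pull both expected values back toward indifference, which is one way the elapsed time could modulate the probability of repeating a choice.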
Affiliation(s)
- Judyta Jabłońska
- Department of Molecular Neuropharmacology, Maj Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
- Łukasz Szumiec
- Department of Molecular Neuropharmacology, Maj Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
- Piotr Zieliński
- Department of Structure Research of Condensed Matter, The Henryk Niewodniczański Institute of Nuclear Physics, Polish Academy of Sciences, Krakow, Poland
- Jan Rodriguez Parkitna
- Department of Molecular Neuropharmacology, Maj Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
18
Gallistel CR, Papachristos EB. Number and time in acquisition, extinction and recovery. J Exp Anal Behav 2019; 113:15-36. [PMID: 31856323 DOI: 10.1002/jeab.571] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2019] [Revised: 11/19/2019] [Accepted: 11/21/2019] [Indexed: 01/13/2023]
Abstract
We measured rate of acquisition, trials to extinction, cumulative responses in extinction, and the spontaneous recovery of anticipatory hopper poking in a Pavlovian protocol with mouse subjects. We varied by factors of 4 the number of sessions, trials per session, intersession interval, and span of training (the number of days over which training extended). We find that different variables affect each measure: rate of acquisition [1/(trials to acquisition)] is faster when there are fewer trials per session. Terminal rate of responding is faster when there are more total training trials. Trials to extinction and amount of responding during extinction are unaffected by these variables. The number of training trials has no effect on recovery in a 4-trial probe session 21 days after extinction. However, recovery is greater when the span of training is greater, regardless of how many sessions there are within that span. Our results and those of others suggest that the numbers, durations, and spacings of longer-duration "episodes" in a conditioning protocol (sessions and the spans in days of training and extinction) are important variables and that different variables affect different aspects of subjects' behavior. We discuss the theoretical and clinical implications of these and related findings and conclusions for theories of conditioning and for neuroscience.
19
Sanabria F, Daniels CW, Gupta T, Santos C. A computational formulation of the behavior systems account of the temporal organization of motivated behavior. Behav Processes 2019; 169:103952. [PMID: 31543283 PMCID: PMC6907728 DOI: 10.1016/j.beproc.2019.103952] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2019] [Revised: 08/30/2019] [Accepted: 08/31/2019] [Indexed: 02/02/2023]
Abstract
The behavior systems framework suggests that motivated behavior (e.g., seeking food and mates, avoiding predators) consists of sequences of actions organized within nested behavioral states. This framework has bridged behavioral ecology and experimental psychology, providing key insights into critical behavioral processes. In particular, the behavior systems framework entails a particular organization of behavior over time. The present paper examines whether such organization emerges from a generic Markov process, where the current behavioral state determines the probability distribution of subsequent behavioral states. This proposition is developed as a systematic examination of increasingly complex Markov models, seeking a computational formulation that balances adherence to the behavior systems approach, parsimony, and conformity to data. As a result of this exercise, a nonstationary partially hidden Markov model is selected as a computational formulation of the predatory subsystem. It is noted that the temporal distribution of discrete responses may further unveil the structure and parameters of the model but, without proper mathematical modeling, these discrete responses may be misleading. Opportunities for further elaboration of the proposed computational formulation are identified, including developments in its architecture, extensions to defensive and reproductive subsystems, and methodological refinements.
Affiliation(s)
- Carter W Daniels
- Arizona State University, United States; Columbia University, United States
20
Abstract
Midbrain dopamine signals are widely thought to report reward prediction errors that drive learning in the basal ganglia. However, dopamine has also been implicated in various probabilistic computations, such as encoding uncertainty and controlling exploration. Here, we show how these different facets of dopamine signalling can be brought together under a common reinforcement learning framework. The key idea is that multiple sources of uncertainty impinge on reinforcement learning computations: uncertainty about the state of the environment, the parameters of the value function and the optimal action policy. Each of these sources plays a distinct role in the prefrontal cortex-basal ganglia circuit for reinforcement learning and is ultimately reflected in dopamine activity. The view that dopamine plays a central role in the encoding and updating of beliefs brings the classical prediction error theory into alignment with more recent theories of Bayesian reinforcement learning.
Affiliation(s)
- Samuel J Gershman
- Department of Psychology, Center for Brain Science, Harvard University, Cambridge, MA, USA.
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA
21
Petter EA, Gershman SJ, Meck WH. Integrating Models of Interval Timing and Reinforcement Learning. Trends Cogn Sci 2019; 22:911-922. [PMID: 30266150 DOI: 10.1016/j.tics.2018.08.004] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2018] [Revised: 07/23/2018] [Accepted: 08/13/2018] [Indexed: 10/28/2022]
Abstract
We present an integrated view of interval timing and reinforcement learning (RL) in the brain. The computational goal of RL is to maximize future rewards, and this depends crucially on a representation of time. Different RL systems in the brain process time in distinct ways. A model-based system learns 'what happens when', employing this internal model to generate action plans, while a model-free system learns to predict reward directly from a set of temporal basis functions. We describe how these systems are subserved by a computational division of labor between several brain regions, with a focus on the basal ganglia and the hippocampus, as well as how these regions are influenced by the neuromodulator dopamine.
Affiliation(s)
- Elijah A Petter
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Warren H Meck
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA.
22
Kalmbach A, Chun E, Taylor K, Gallistel CR, Balsam PD. Time-scale-invariant information-theoretic contingencies in discrimination learning. J Exp Psychol Anim Learn Cogn 2019; 45:280-289. [PMID: 31021132 PMCID: PMC7771212 DOI: 10.1037/xan0000205] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Animals optimize their behavior to maximize rewards by utilizing cues from the environment. In discrimination learning, cues signal when rewards can and cannot be earned by making a particular response. In our experiment, we trained male mice to press a lever to receive a reward on a random interval schedule. We then introduced a prolonged tone (20, 40, or 80 sec), during which no rewards could be earned. We sought to test our hypothesis that the duration of the tone and the frequency of reward during the inter-tone intervals affect the informativeness of cues and lead to differences in discriminative behavior. Learning was expressed as an increase in lever pressing during the intertrial interval (ITI) and, when the informativeness of the cue was high, animals also reduced their lever pressing during the tone. Additionally, we found that the depth of discriminative learning was linearly related to the informativeness of the cues. Our results show that the time-scale-invariant information-theoretic definition of contingency applied to excitatory conditioning can also be applied to inhibitory conditioning.
23
Mikhael JG, Gershman SJ. Adapting the flow of time with dopamine. J Neurophysiol 2019; 121:1748-1760. [PMID: 30864882 PMCID: PMC6589719 DOI: 10.1152/jn.00817.2018] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2018] [Revised: 02/04/2019] [Accepted: 02/20/2019] [Indexed: 01/25/2023] Open
Abstract
The modulation of interval timing by dopamine (DA) has been well established over decades of research. The nature of this modulation, however, has remained controversial: Although the pharmacological evidence has largely suggested that time intervals are overestimated with higher DA levels, more recent optogenetic work has shown the opposite effect. In addition, a large body of work has asserted DA's role as a "reward prediction error" (RPE), or a teaching signal that allows the basal ganglia to learn to predict future rewards in reinforcement learning tasks. Whether these two seemingly disparate accounts of DA may be related has remained an open question. By taking a reinforcement learning-based approach to interval timing, we show here that the RPE interpretation of DA naturally extends to its role as a modulator of timekeeping and furthermore that this view reconciles the seemingly conflicting observations. We derive a biologically plausible, DA-dependent plasticity rule that can modulate the rate of timekeeping in either direction and whose effect depends on the timing of the DA signal itself. This bidirectional update rule can account for the results from pharmacology and optogenetics as well as the behavioral effects of reward rate on interval timing and the temporal selectivity of striatal neurons. Hence, by adopting a single RPE interpretation of DA, our results take a step toward unifying computational theories of reinforcement learning and interval timing. NEW & NOTEWORTHY How does dopamine (DA) influence interval timing? A large body of pharmacological evidence has suggested that DA accelerates timekeeping mechanisms. However, recent optogenetic work has shown exactly the opposite effect. In this article, we relate DA's role in timekeeping to its most established role, as a critical component of reinforcement learning. This allows us to derive a neurobiologically plausible framework that reconciles a large body of DA's temporal effects, including pharmacological, behavioral, electrophysiological, and optogenetic findings.
Affiliation(s)
- John G Mikhael
- Program in Neuroscience and MD-PhD Program, Harvard Medical School, Boston, Massachusetts
- Samuel J Gershman
- Center for Brain Science and Department of Psychology, Harvard University, Cambridge, Massachusetts
24
Austen JM, Sanderson DJ. Delay of reinforcement versus rate of reinforcement in Pavlovian conditioning. J Exp Psychol Anim Learn Cogn 2019; 45:203-221. [PMID: 30843717 PMCID: PMC6448483 DOI: 10.1037/xan0000199] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/16/2018] [Revised: 12/11/2018] [Accepted: 12/26/2018] [Indexed: 11/08/2022]
Abstract
Conditioned stimulus (CS) duration is a determinant of conditioned responding, with increases in duration leading to reductions in response rates. The CS duration effect has been proposed to reflect sensitivity to the reinforcement rate across cumulative exposure to the CS, suggesting that the delay of reinforcement from the onset of the cue is not crucial. Here, we compared the effects of delay and rate of reinforcement on Pavlovian appetitive conditioning in mice. In Experiment 1, the influence of reinforcement delay on the timing of responding was removed by making the duration of cues variable across trials. Mice trained with variable duration cues were sensitive to differences in the rate of reinforcement to a similar extent as mice trained with fixed duration cues. Experiments 2 and 3 tested the independent effects of delay and reinforcement rate. In Experiment 2, food was presented at either the termination of the CS or during the CS. In Experiment 3, food occurred during the CS for all cues. The latter experiment demonstrated an effect of delay, but not reinforcement rate. Experiment 4 ruled out the possibility that the lack of effect of reinforcement rate in Experiment 3 was due to mice failing to learn about the nonreinforced CS exposure after the presentation of food within a trial. These results demonstrate that although the CS duration effect is not simply a consequence of timing of conditioned responses, it is dependent on the delay of reinforcement. The results provide a challenge to current associative and nonassociative, time-accumulation models of learning.
25
Temporal updating, behavioral learning, and the phenomenology of time-consciousness. Behav Brain Sci 2019; 42:e254. [DOI: 10.1017/s0140525x19000517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Hoerl & McCormack claim that the temporal updating system only represents the world as present. This generates puzzles regarding the phenomenology of temporal experience. We argue that recent models of reinforcement learning suggest that temporal updating must have a minimal temporal structure; and we suggest that this helps to clarify what it means to experience the world as temporally structured.
26
Rajendran VG, Teki S, Schnupp JWH. Temporal Processing in Audition: Insights from Music. Neuroscience 2018; 389:4-18. [PMID: 29108832 PMCID: PMC6371985 DOI: 10.1016/j.neuroscience.2017.10.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2017] [Revised: 10/24/2017] [Accepted: 10/27/2017] [Indexed: 11/28/2022]
Abstract
Music is a curious example of a temporally patterned acoustic stimulus, and a compelling pan-cultural phenomenon. This review strives to bring some insights from decades of music psychology and sensorimotor synchronization (SMS) literature into the mainstream auditory domain, arguing that musical rhythm perception is shaped in important ways by temporal processing mechanisms in the brain. The feature that unites these disparate disciplines is an appreciation of the central importance of timing, sequencing, and anticipation. Perception of musical rhythms relies on an ability to form temporal predictions, a general feature of temporal processing that is equally relevant to auditory scene analysis, pattern detection, and speech perception. By bringing together findings from the music and auditory literature, we hope to inspire researchers to look beyond the conventions of their respective fields and consider the cross-disciplinary implications of studying auditory temporal sequence processing. We begin by highlighting music as an interesting sound stimulus that may provide clues to how temporal patterning in sound drives perception. Next, we review the SMS literature and discuss possible neural substrates for the perception of, and synchronization to, musical beat. We then move away from music to explore the perceptual effects of rhythmic timing in pattern detection, auditory scene analysis, and speech perception. Finally, we review the neurophysiology of general timing processes that may underlie aspects of the perception of rhythmic patterns. We conclude with a brief summary and outlook for future research.
Affiliation(s)
- Vani G Rajendran
- Auditory Neuroscience Group, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, UK
- Sundeep Teki
- Auditory Neuroscience Group, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, UK
- Jan W H Schnupp
- City University of Hong Kong, Department of Biomedical Sciences, 31 To Yuen Street, Kowloon Tong, Hong Kong.
27
Langdon AJ, Sharpe MJ, Schoenbaum G, Niv Y. Model-based predictions for dopamine. Curr Opin Neurobiol 2018; 49:1-7. [PMID: 29096115 PMCID: PMC6034703 DOI: 10.1016/j.conb.2017.10.006] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2017] [Revised: 10/07/2017] [Accepted: 10/09/2017] [Indexed: 01/16/2023]
Abstract
Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning.
Affiliation(s)
- Angela J Langdon
- Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton, NJ 08540, United States.
- Melissa J Sharpe
- Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton, NJ 08540, United States; National Institute on Drug Abuse, Baltimore, MD 21224, United States; School of Psychology, University of New South Wales, Australia
- Yael Niv
- Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton, NJ 08540, United States
28
A cerebellar mechanism for learning prior distributions of time intervals. Nat Commun 2018; 9:469. [PMID: 29391392 PMCID: PMC5794805 DOI: 10.1038/s41467-017-02516-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2017] [Accepted: 12/05/2017] [Indexed: 01/14/2023] Open
Abstract
Knowledge about the statistical regularities of the world is essential for cognitive and sensorimotor function. In the domain of timing, prior statistics are crucial for optimal prediction, adaptation and planning. Where and how the nervous system encodes temporal statistics is, however, not known. Based on physiological and anatomical evidence for cerebellar learning, we develop a computational model that demonstrates how the cerebellum could learn prior distributions of time intervals and support Bayesian temporal estimation. The model shows that salient features observed in human Bayesian time interval estimates can be readily captured by learning in the cerebellar cortex and circuit level computations in the cerebellar deep nuclei. We test human behavior in two cerebellar timing tasks and find prior-dependent biases in timing that are consistent with the predictions of the cerebellar model. Human timing behavior is biased towards previously encountered intervals and is predicted by Bayesian models. Here, the authors develop a computational model based on properties of the cerebellum to show how we might encode time estimates based on prior experience.
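The prior-dependent bias these models predict is standard Bayes-least-squares shrinkage. For a Gaussian prior and Gaussian measurement noise (a simplification of the paper's setup, used here only to show the effect), the estimate is a precision-weighted average that pulls noisy measurements toward the prior mean:

```python
def bayes_interval_estimate(measured, prior_mean, prior_var, noise_var):
    """Posterior-mean estimate of an interval from one noisy measurement.

    The weight on the measurement falls as sensory noise grows, so
    estimates regress toward the prior mean: the central-tendency effect.
    """
    w = prior_var / (prior_var + noise_var)
    return w * measured + (1.0 - w) * prior_mean
```

An interval longer than the prior mean is underestimated and a shorter one overestimated, with the bias growing as noise_var increases — the pattern described as prior-dependent biases in the two timing tasks.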
29
Abstract
The hypothesis that the phasic dopamine response reports a reward prediction error has become deeply entrenched. However, dopamine neurons exhibit several notable deviations from this hypothesis. A coherent explanation for these deviations can be obtained by analyzing the dopamine response in terms of Bayesian reinforcement learning. The key idea is that prediction errors are modulated by probabilistic beliefs about the relationship between cues and outcomes, updated through Bayesian inference. This account can explain dopamine responses to inferred value in sensory preconditioning, the effects of cue preexposure (latent inhibition), and adaptive coding of prediction errors when rewards vary across orders of magnitude. We further postulate that orbitofrontal cortex transforms the stimulus representation through recurrent dynamics, such that a simple error-driven learning rule operating on the transformed representation can implement the Bayesian reinforcement learning update.
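One concrete way to see belief-modulated prediction errors is a Kalman-filter variant of the Rescorla-Wagner rule, a common formalization of Bayesian associative learning used here as an illustration rather than as the paper's full model: the learning rate becomes the Kalman gain, which depends on the agent's uncertainty about the cue-outcome relationship.

```python
def kalman_rw_update(mean, var, reward, noise_var=1.0, drift_var=0.01):
    """Kalman-filter (Bayesian) version of the Rescorla-Wagner update.

    mean, var: posterior belief about a cue's value.
    The prediction error is scaled by a belief-dependent learning rate
    (the Kalman gain), so values of uncertain cues update faster.
    """
    var += drift_var                # the hidden value may drift over time
    gain = var / (var + noise_var)  # belief-modulated learning rate
    mean += gain * (reward - mean)  # error weighted by uncertainty
    var *= (1.0 - gain)             # observing the outcome shrinks variance
    return mean, var
```

Preexposure without reward shrinks the posterior variance, so the same later prediction error moves the value estimate less — the latent-inhibition pattern discussed above.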
Affiliation(s)
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, U.S.A
30
Linderman SW, Gershman SJ. Using computational theory to constrain statistical models of neural data. Curr Opin Neurobiol 2017; 46:14-24. [PMID: 28732273 PMCID: PMC5660645 DOI: 10.1016/j.conb.2017.06.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2017] [Revised: 06/07/2017] [Accepted: 06/25/2017] [Indexed: 11/27/2022]
Abstract
Computational neuroscience is, to first order, dominated by two approaches: the 'bottom-up' approach, which searches for statistical patterns in large-scale neural recordings, and the 'top-down' approach, which begins with a theory of computation and considers plausible neural implementations. While this division is not clear-cut, we argue that these approaches should be much more intimately linked. From a Bayesian perspective, computational theories provide constrained prior distributions on neural data, albeit highly sophisticated ones. By connecting theory to observation via a probabilistic model, we provide the link necessary to test, evaluate, and revise our theories in a data-driven and statistically rigorous fashion. This review highlights examples of this theory-driven pipeline for neural data analysis in recent literature and illustrates it with a worked example based on the temporal difference learning model of dopamine.
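The worked example mentioned above, the temporal difference learning model of dopamine, can be reproduced in a few lines: with a serial state representation of the cue-reward interval, the prediction error at reward time vanishes with training, while an unexpectedly presented cue retains a large error. A generic TD(0) sketch with illustrative parameters, not the review's specific implementation:

```python
# TD(0) over a tapped-delay-line state space: states 0..T-1 follow the cue,
# reward arrives at the last state. Parameters are illustrative.
T, alpha, gamma, reward = 5, 0.2, 1.0, 1.0
V = [0.0] * T  # value of each post-cue state

def run_trial(V):
    """One cue-to-reward trial; returns the per-step prediction errors."""
    deltas = []
    for t in range(T):
        r = reward if t == T - 1 else 0.0
        v_next = V[t + 1] if t + 1 < T else 0.0
        delta = r + gamma * v_next - V[t]  # TD error
        V[t] += alpha * delta              # value update
        deltas.append(delta)
    return deltas

first = run_trial(V)          # naive values: full error at reward time
for _ in range(500):
    last = run_trial(V)       # trained values: error at reward time ~ 0

# RPE at an unexpected cue onset: transition from a zero-value baseline
cue_delta = gamma * V[0]
```

On the first trial the error appears at reward delivery (`first[-1] == 1.0`); after training the within-trial errors vanish and the error has effectively transferred to the unexpected cue (`cue_delta` near 1), the classic dopamine-like pattern.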
Affiliation(s)
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, United States.
31
Dopamine reward prediction errors reflect hidden-state inference across time. Nat Neurosci 2017; 20:581-589. [PMID: 28263301 PMCID: PMC5374025 DOI: 10.1038/nn.4520] [Citation(s) in RCA: 97] [Impact Index Per Article: 13.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2016] [Accepted: 01/25/2017] [Indexed: 12/14/2022]
Abstract
Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In this work, we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden state inference.
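A minimal caricature of the 'belief state' idea: when reward timing is uncertain, continued reward omission is itself evidence about the hidden state, and the value estimate follows the belief. The two-state update below is an illustration, not the paper's task model:

```python
# Two hidden states: ISI (reward pending) and ITI (no reward pending).
# Each timestep without reward is a Bayesian observation that shifts belief
# toward ITI; value is the belief-weighted average of per-state values.
# All numbers are illustrative.

def belief_value(b_isi, v_isi, v_iti):
    """Value estimate under the current belief over hidden states."""
    return b_isi * v_isi + (1 - b_isi) * v_iti

def update_belief(b_isi, omission_likelihood):
    """Bayes update after a timestep with no reward: the longer the reward
    fails to arrive, the more probable the ITI state becomes."""
    p_isi = b_isi * omission_likelihood  # reward pending but not yet delivered
    p_iti = (1 - b_isi) * 1.0            # ITI never delivers reward
    return p_isi / (p_isi + p_iti)

b = 0.9  # initial belief that a reward is pending
for _ in range(5):
    b = update_belief(b, omission_likelihood=0.5)

value_estimate = belief_value(b, v_isi=1.0, v_iti=0.0)
# belief in the reward-pending state decays as reward stays absent,
# dragging the value estimate (and hence the RPE baseline) down with it
```

This captures why omission responses can differ between deterministic and probabilistic reward tasks: only in the latter does omission meaningfully update the hidden-state belief.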
32
Kato A, Morita K. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation. PLoS Comput Biol 2016; 12:e1005145. [PMID: 27736881 PMCID: PMC5063413 DOI: 10.1371/journal.pcbi.1005145] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2016] [Accepted: 09/14/2016] [Indexed: 12/12/2022] Open
Abstract
It has been suggested that dopamine (DA) represents the reward-prediction-error (RPE) defined in reinforcement learning, and that DA therefore responds to unpredicted but not predicted reward. However, recent studies have found sustained DA responses toward predictable reward in tasks involving self-paced behavior, and have suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if there is decay/forgetting of learned values, which can be implemented as decay of the synaptic strengths storing those values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of value-decay in self-paced approach behavior, modeled as a series of ‘Go’ or ‘No-Go’ selections towards a goal. Through simulations, we found that, counterintuitively, value-decay can enhance motivation, specifically by facilitating fast goal-reaching. Mathematical analyses revealed that the underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of ‘Go’ values towards a goal, and (2) value-contrasts between ‘Go’ and ‘No-Go’ are generated because, while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account of DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning are active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium in which learning and forgetting are balanced.

Dopamine (DA) has been suggested to have two reward-related roles: (1) representing reward-prediction-error (RPE), and (2) providing motivational drive. Role (1) is based on the physiological results that DA responds to unpredicted but not predicted reward, whereas role (2) is supported by the pharmacological results that blockade of DA signaling causes motivational impairments such as slowdown of self-paced behavior. So far, these two roles have been considered to be played by two different temporal patterns of DA signals: role (1) by phasic signals and role (2) by tonic/sustained signals. However, recent studies have found sustained DA signals with features indicative of both roles (1) and (2), complicating this picture. Meanwhile, whereas the synaptic/circuit mechanisms for role (1), i.e., how RPE is calculated upstream of DA neurons and how RPE-dependent updates of learned values occur through DA-dependent synaptic plasticity, have now become clarified, the mechanisms for role (2) remain unclear. In this work, we modeled self-paced behavior as a series of ‘Go’ or ‘No-Go’ selections in the framework of reinforcement learning assuming DA's role (1), and demonstrated that incorporating decay/forgetting of learned values, presumably implemented as decay of the synaptic strengths storing those values, provides a potential unified mechanistic account of DA's two roles, together with its various temporal patterns.
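The core mechanism, sustained RPE under value decay, can be shown with a scalar value update: with decay, the learned value settles below the reward magnitude, so a positive prediction error persists even for a fully predicted reward. A sketch with illustrative parameters, not the paper's full Go/No-Go model:

```python
# TD-style value update with decay ('forgetting') of the learned value.
# Without decay the prediction error for a fully predicted reward goes to
# zero; with decay it settles at a positive steady-state value.
# Parameters are illustrative.
alpha, decay, reward = 0.5, 0.1, 1.0

def steady_state_rpe(alpha, decay, reward, trials=1000):
    V = 0.0
    for _ in range(trials):
        delta = reward - V    # prediction error on receiving the reward
        V += alpha * delta    # learning
        V *= (1 - decay)      # forgetting between trials
    return delta, V

rpe_decay, _ = steady_state_rpe(alpha, decay, reward)
rpe_nodecay, _ = steady_state_rpe(alpha, 0.0, reward)
```

At the decay fixed point the value and the error are in dynamic equilibrium, matching the abstract's point that learning and forgetting balance even after apparent convergence.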
Affiliation(s)
- Ayaka Kato
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
33
Abstract
To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning: a computational theory of reward optimization that readily prescribes algorithmic solutions bearing a striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.
Affiliation(s)
- Yael Niv
- Psychology Department & Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, 08540
- Angela Langdon
- Psychology Department & Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, 08540
34
Marblestone AH, Wayne G, Kording KP. Toward an Integration of Deep Learning and Neuroscience. Front Comput Neurosci 2016; 10:94. [PMID: 27683554 PMCID: PMC5021692 DOI: 10.3389/fncom.2016.00094] [Citation(s) in RCA: 234] [Impact Index Per Article: 29.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2016] [Accepted: 08/24/2016] [Indexed: 01/22/2023] Open
Abstract
Neuroscience has focused on the detailed implementation of computation, studying neural codes, dynamics and circuits. In machine learning, however, artificial neural networks tend to eschew precisely designed codes, dynamics or circuits in favor of brute force optimization of a cost function, often using simple and relatively uniform initial architectures. Two recent developments have emerged within machine learning that create an opportunity to connect these seemingly divergent perspectives. First, structured architectures are used, including dedicated systems for attention, recursion and various forms of short- and long-term memory storage. Second, cost functions and training procedures have become more complex and are varied across layers and over time. Here we think about the brain in terms of these ideas. We hypothesize that (1) the brain optimizes cost functions, (2) the cost functions are diverse and differ across brain locations and over development, and (3) optimization operates within a pre-structured architecture matched to the computational problems posed by behavior. In support of these hypotheses, we argue that a range of implementations of credit assignment through multiple layers of neurons are compatible with our current knowledge of neural circuitry, and that the brain's specialized systems can be interpreted as enabling efficient optimization for specific problem classes. Such a heterogeneously optimized system, enabled by a series of interacting cost functions, serves to make learning data-efficient and precisely targeted to the needs of the organism. We suggest directions by which neuroscience could seek to refine and test these hypotheses.
Affiliation(s)
- Adam H. Marblestone
- Synthetic Neurobiology Group, Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Konrad P. Kording
- Rehabilitation Institute of Chicago, Northwestern University, Chicago, IL, USA
35
Berthet P, Lindahl M, Tully PJ, Hellgren-Kotaleski J, Lansner A. Functional Relevance of Different Basal Ganglia Pathways Investigated in a Spiking Model with Reward Dependent Plasticity. Front Neural Circuits 2016; 10:53. [PMID: 27493625 PMCID: PMC4954853 DOI: 10.3389/fncir.2016.00053] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2015] [Accepted: 07/06/2016] [Indexed: 11/13/2022] Open
Abstract
The brain enables animals to behaviorally adapt in order to survive in a complex and dynamic environment, but how reward-oriented behaviors are achieved and computed by its underlying neural circuitry is an open question. To address this concern, we have developed a spiking model of the basal ganglia (BG) that learns to dis-inhibit the action leading to a reward despite ongoing changes in the reward schedule. The architecture of the network features the two pathways commonly described in BG, the direct (denoted D1) and the indirect (denoted D2) pathway, as well as a loop involving striatum and the dopaminergic system. The activity of these dopaminergic neurons conveys the reward prediction error (RPE), which determines the magnitude of synaptic plasticity within the different pathways. All plastic connections implement a versatile four-factor learning rule derived from Bayesian inference that depends upon pre- and post-synaptic activity, receptor type, and dopamine level. Synaptic weight updates occur in the D1 or D2 pathways depending on the sign of the RPE, and an efference copy informs upstream nuclei about the action selected. We demonstrate successful performance of the system in a multiple-choice learning task with a transiently changing reward schedule. We simulate lesioning of the various pathways and show that a condition without the D2 pathway fares worse than one without D1. Additionally, we simulate the degeneration observed in Parkinson's disease (PD) by decreasing the number of dopaminergic neurons during learning. The results suggest that the D1 pathway impairment in PD might have been overlooked. Furthermore, an analysis of the alterations in the synaptic weights shows that using the absolute reward value instead of the RPE leads to a larger change in D1.
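The sign-dependent routing of plasticity described above can be caricatured with scalar weights: positive RPEs update the D1 ('Go') weight for the chosen action and negative RPEs update the D2 ('No-Go') weight. This is a toy scalar version, not the paper's spiking four-factor Bayesian rule:

```python
# Scalar caricature of sign-dependent corticostriatal plasticity:
# positive RPEs strengthen the direct (D1) pathway for the chosen action,
# negative RPEs strengthen the indirect (D2) pathway.
# Learning rate is illustrative.
alpha = 0.2

def update(w_d1, w_d2, rpe):
    """Route the weight update by the sign of the reward prediction error."""
    if rpe > 0:
        w_d1 += alpha * rpe      # reinforce 'Go' for this action
    else:
        w_d2 += alpha * (-rpe)   # reinforce 'No-Go' for this action
    return w_d1, w_d2

w1, w2 = 0.0, 0.0
w1, w2 = update(w1, w2, +1.0)    # rewarded choice strengthens D1
w1, w2 = update(w1, w2, -0.5)    # punished choice strengthens D2
```

Under this routing, lesioning one pathway selectively removes learning from one sign of error, which is the logic behind the simulated D1/D2 lesion comparisons in the abstract.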
Affiliation(s)
- Pierre Berthet
- Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Mikael Lindahl
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Philip J. Tully
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
- Jeanette Hellgren-Kotaleski
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Department of Neuroscience, Karolinska Institute, Stockholm, Sweden
- Anders Lansner
- Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
36
Fontes R, Ribeiro J, Gupta DS, Machado D, Lopes-Júnior F, Magalhães F, Bastos VH, Rocha K, Marinho V, Lima G, Velasques B, Ribeiro P, Orsini M, Pessoa B, Leite MAA, Teixeira S. Time Perception Mechanisms at Central Nervous System. Neurol Int 2016; 8:5939. [PMID: 27127597 PMCID: PMC4830363 DOI: 10.4081/ni.2016.5939] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2015] [Revised: 11/24/2015] [Accepted: 11/30/2015] [Indexed: 12/20/2022] Open
Abstract
The five senses have specific ways to receive environmental information and convey it to the central nervous system. The perception of time is the sum of stimuli associated with cognitive processes and environmental changes. Thus, the perception of time requires a complex neural mechanism and may be changed by emotional state, level of attention, memory and diseases. Despite this knowledge, the neural mechanisms of time perception are not yet fully understood. The objective is to relate the mechanisms involved in the neurofunctional aspects, theories, executive functions and pathologies that contribute to the understanding of temporal perception. Articles from 1980 to 2015 were searched using the key themes: neuroanatomy, neurophysiology, theories, time cells, memory, schizophrenia, depression, attention-deficit hyperactivity disorder and Parkinson’s disease combined with the term perception of time. We evaluated 158 articles within the inclusion criteria for the purpose of the study. We conclude that research on the contributions of the frontal cortex, parietal cortex, basal ganglia, cerebellum and hippocampus has provided advances in the understanding of the regions related to the perception of time. In neurological and psychiatric disorders, the understanding of time depends on the severity of the diseases and the type of tasks.
Affiliation(s)
- Rhailana Fontes
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Jéssica Ribeiro
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Daya S Gupta
- Department of Biology, Camden County College, Blackwood, NJ, USA
- Dionis Machado
- Laboratory of Brain Mapping and Functionality, Federal University of Piauí, Parnaíba
- Fernando Lopes-Júnior
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Francisco Magalhães
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Victor Hugo Bastos
- Laboratory of Brain Mapping and Functionality, Federal University of Piauí, Parnaíba
- Kaline Rocha
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Victor Marinho
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
- Gildário Lima
- Neurophisic Applied Laboratory, Federal University of Piauí, Parnaíba
- Bruna Velasques
- Brain Mapping and Sensory-Motor Integration Laboratory, Psychiatry Institute of Federal University of Rio de Janeiro, Rio de Janeiro
- Pedro Ribeiro
- Brain Mapping and Sensory-Motor Integration Laboratory, Psychiatry Institute of Federal University of Rio de Janeiro, Rio de Janeiro
- Bruno Pessoa
- Neurology Department, Federal Fluminense University, Niterói, Brazil
- Silmar Teixeira
- Brain Mapping and Plasticity Laboratory, Federal University of Piauí, Parnaíba, Brazil
37
A Simple Network Architecture Accounts for Diverse Reward Time Responses in Primary Visual Cortex. J Neurosci 2015; 35:12659-72. [PMID: 26377457 DOI: 10.1523/jneurosci.0871-15.2015] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
Many actions performed by animals and humans depend on an ability to learn, estimate, and produce temporal intervals of behavioral relevance. Exemplifying such learning of cued expectancies is the observation of reward-timing activity in the primary visual cortex (V1) of rodents, wherein neural responses to visual cues come to predict the time of future reward as behaviorally experienced in the past. These reward-timing responses exhibit significant heterogeneity in at least three qualitatively distinct classes: sustained increase or sustained decrease in firing rate until the time of expected reward, and a class of cells that reach a peak in firing at the expected delay. We elaborate upon our existing model by including inhibitory and excitatory units while imposing simple connectivity rules to demonstrate what role these inhibitory elements and the simple architectures play in sculpting the response dynamics of the network. We find that simply adding inhibition is not sufficient for obtaining the distinct response classes, and that a broad distribution of inhibitory projections is necessary for obtaining peak-type responses. Furthermore, although changes in connection strength that modulate the effects of inhibition onto excitatory units have a strong impact on the firing rate profile of these peaked responses, the network exhibits robustness in its overall ability to predict the expected time of reward. Finally, we demonstrate how the magnitude of expected reward can be encoded at the expected delay in the network and how peaked responses express this reward expectancy.

SIGNIFICANCE STATEMENT Heterogeneity in single-neuron responses is a common feature of neuronal systems, although sometimes, in theoretical approaches, it is treated as a nuisance and seldom considered as conveying a different aspect of a signal. In this study, we focus on the heterogeneous responses in the primary visual cortex of rodents trained with a predictable delayed reward time. We describe under what conditions this heterogeneity can arise by self-organization, and what information it can convey. This study, while focusing on a specific system, provides insight into how heterogeneity can arise in general while also shedding light on mechanisms of reinforcement learning using realistic biological assumptions.
38
Lloyd K, Dayan P. Tamping Ramping: Algorithmic, Implementational, and Computational Explanations of Phasic Dopamine Signals in the Accumbens. PLoS Comput Biol 2015; 11:e1004622. [PMID: 26699940 PMCID: PMC4689534 DOI: 10.1371/journal.pcbi.1004622] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2015] [Accepted: 10/25/2015] [Indexed: 11/26/2022] Open
Abstract
Substantial evidence suggests that the phasic activity of dopamine neurons represents reinforcement learning’s temporal difference prediction error. However, recent reports of ramp-like increases in dopamine concentration in the striatum when animals are about to act, or are about to reach rewards, appear to pose a challenge to established thinking. This is because the implied activity is persistently predictable by preceding stimuli, and so cannot arise as this sort of prediction error. Here, we explore three possible accounts of such ramping signals: (a) the resolution of uncertainty about the timing of action; (b) the direct influence of dopamine over mechanisms associated with making choices; and (c) a new model of discounted vigour. Collectively, these suggest that dopamine ramps may be explained, with only minor disturbance, by standard theoretical ideas, though urgent questions remain regarding their proximal cause. We suggest experimental approaches to disentangling which of the proposed mechanisms are responsible for dopamine ramps.

Dopamine has long been implicated in reward-motivated behaviour. Theory and experiments suggest that activity of dopamine-containing neurons resembles a temporally-sophisticated prediction error used to learn expectations of future reward. This account would appear to be inconsistent with recent observations of ‘ramps’, i.e., gradual increases in extracellular dopamine concentration prior to the execution of actions or the acquisition of rewards. We explore three different possible explanations of such ramping signals as arising: (a) when subjects experience uncertainty about when actions will be executed; (b) when dopamine itself influences the timecourse of choice; and (c) under a new model in which ‘quasi-tonic’ dopamine signals arise through a form of temporal discounting. We thereby show that dopamine ramps can be integrated with current theories, and also suggest experiments to clarify which mechanisms are involved.
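Account (c) can be illustrated directly: under temporal discounting, the value of a fixed future reward grows as the reward approaches, so a signal that reports value ramps upward without any prediction error being generated. A sketch with illustrative parameters, not the paper's discounted-vigour model:

```python
# Under temporal discounting, the value of a reward R arriving at time T
# grows as t approaches T: V(t) = gamma**(T - t) * R. A quasi-tonic signal
# reporting V(t) therefore ramps up toward the reward.
# gamma, R and T are illustrative.
gamma, R, T = 0.8, 1.0, 5

ramp = [gamma ** (T - t) * R for t in range(T + 1)]
# ramp increases monotonically from gamma**T * R and ends at R
```

The ramp is fully predictable from the task structure, which is exactly why it poses no problem for the prediction-error account once it is read as a value signal rather than an error signal.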
Affiliation(s)
- Kevin Lloyd
- Gatsby Computational Neuroscience Unit, London, United Kingdom
- Peter Dayan
- Gatsby Computational Neuroscience Unit, London, United Kingdom
39
Gouvêa TS, Monteiro T, Motiwala A, Soares S, Machens C, Paton JJ. Striatal dynamics explain duration judgments. eLife 2015; 4:e11386. [PMID: 26641377 PMCID: PMC4721960 DOI: 10.7554/elife.11386] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2015] [Accepted: 12/07/2015] [Indexed: 11/25/2022] Open
Abstract
The striatum is an input structure of the basal ganglia implicated in several time-dependent functions including reinforcement learning, decision making, and interval timing. To determine whether striatal ensembles drive subjects' judgments of duration, we manipulated and recorded from striatal neurons in rats performing a duration categorization psychophysical task. We found that the dynamics of striatal neurons predicted duration judgments, and that simultaneously recorded ensembles could judge duration as well as the animal. Furthermore, striatal neurons were necessary for duration judgments, as muscimol infusions produced a specific impairment in animals' duration sensitivity. Lastly, we show that time as encoded by striatal populations ran faster or slower when rats judged a duration as longer or shorter, respectively. These results demonstrate that the speed with which striatal population state changes supports the fundamental ability of animals to judge the passage of time. DOI:http://dx.doi.org/10.7554/eLife.11386.001

You know someone is a good cook from their rice: grains must be well cooked, but not to the point of being mushy. Despite consistently using the same pot and stove, we, however, will sometimes overcook it. It is as if our inner sense of time itself is variable. What is it about the brain that explains this variability in time estimation and indeed our ability to estimate time in the first place? One issue the brain must confront in order to estimate time is that individual brain cells typically fire in bursts that last for tens of milliseconds. So how does the brain use this short-lived activity to track minutes and hours? One possibility is that individual neurons in a given brain region are programmed to fire at different points in time. The overall firing pattern of a group of neurons will therefore change in a predictable way as time passes. Gouvêa, Monteiro et al. found such predictably changing patterns of activity in the striatum of rats trained to estimate and categorize the duration of time intervals as longer or shorter than 1.5 seconds. Interestingly, when rats mistakenly categorized a short interval as a long one, population activity had travelled farther down its path than it would normally (and vice-versa for long intervals incorrectly categorized as short), suggesting that variability in subjective estimates of the passage of time might arise from variability in the speed of a changing pattern of activity across groups of neurons. As further evidence for the involvement of the striatum, inactivating the structure impaired the rats’ ability to correctly classify even the longest and shortest interval durations. The next challenge is to determine exactly how the striatum generates these time-keeping signals, at which stage variability originates, and how the brain regions that the striatum signals to use them to control an animal’s behavior. DOI:http://dx.doi.org/10.7554/eLife.11386.002
Affiliation(s)
- Thiago S Gouvêa
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Tiago Monteiro
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Asma Motiwala
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Sofia Soares
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Christian Machens
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Joseph J Paton
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
40
Gershman SJ. A Unifying Probabilistic View of Associative Learning. PLoS Comput Biol 2015; 11:e1004567. [PMID: 26535896 PMCID: PMC4633133 DOI: 10.1371/journal.pcbi.1004567] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2015] [Accepted: 09/22/2015] [Indexed: 11/19/2022] Open
Abstract
Two important ideas about associative learning have emerged in recent decades: (1) Animals are Bayesian learners, tracking their uncertainty about associations; and (2) animals acquire long-term reward predictions through reinforcement learning. Both of these ideas are normative, in the sense that they are derived from rational design principles. They are also descriptive, capturing a wide range of empirical phenomena that troubled earlier theories. This article describes a unifying framework encompassing Bayesian and reinforcement learning theories of associative learning. Each perspective captures a different aspect of associative learning, and their synthesis offers insight into phenomena that neither perspective can explain on its own.
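One concrete bridge between the two perspectives is the Kalman filter, in which a Rescorla-Wagner-style error-driven update acquires an uncertainty-dependent learning rate. A one-cue sketch with illustrative noise parameters, not the article's full framework:

```python
# One-cue Kalman filter for associative learning: the learning rate
# (Kalman gain) depends on the posterior uncertainty about the association,
# which shrinks with experience. Noise variances are illustrative.
q, r_noise = 0.01, 1.0   # diffusion and observation noise variances

def kalman_step(w, var, reward):
    """One trial of Bayesian associative learning for a single cue."""
    var += q                          # uncertainty grows between trials
    gain = var / (var + r_noise)      # uncertainty-dependent learning rate
    delta = reward - w                # prediction error
    w += gain * delta                 # error-driven (Rescorla-Wagner-like) update
    var *= (1 - gain)                 # uncertainty shrinks after observing the outcome
    return w, var, gain

w, var = 0.0, 1.0   # start uncertain, with no learned association
gains = []
for _ in range(20):
    w, var, g = kalman_step(w, var, reward=1.0)
    gains.append(g)
# the effective learning rate declines as the association becomes certain
```

The declining gain is what lets the Bayesian view capture phenomena such as faster learning about novel or volatile cues, while the update itself retains the familiar error-driven form.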
Affiliation(s)
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
41
Morita K, Kawaguchi Y. Computing reward-prediction error: an integrated account of cortical timing and basal-ganglia pathways for appetitive and aversive learning. Eur J Neurosci 2015; 42:2003-21. [PMID: 26095906 PMCID: PMC5034842 DOI: 10.1111/ejn.12994] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2015] [Revised: 06/11/2015] [Accepted: 06/17/2015] [Indexed: 12/12/2022]
Abstract
There are two prevailing notions regarding the involvement of the corticobasal ganglia system in value-based learning: (i) the direct and indirect pathways of the basal ganglia are crucial for appetitive and aversive learning, respectively, and (ii) the activity of midbrain dopamine neurons represents reward-prediction error. Although (ii) constitutes a critical assumption of (i), it remains elusive how (ii) holds given (i), with the basal-ganglia influence on the dopamine neurons. Here we present a computational neural-circuit model that potentially resolves this issue. Based on the latest analyses of the heterogeneous corticostriatal neurons and connections, our model posits that the direct and indirect pathways, respectively, represent the values of upcoming and previous actions, and up-regulate and down-regulate the dopamine neurons via the basal-ganglia output nuclei. This explains how the difference between the upcoming and previous values, which constitutes the core of reward-prediction error, is calculated. Simultaneously, it predicts that blockade of the direct/indirect pathway causes a negative/positive shift of reward-prediction error and thereby impairs learning from positive/negative error, i.e. appetitive/aversive learning. Through simulation of reward-reversal learning and punishment-avoidance learning, we show that our model could indeed account for the experimentally observed features that are suggested to support notion (i) and could also provide predictions on neural activity. We also present a behavioral prediction of our model, through simulation of inter-temporal choice, on how the balance between the two pathways relates to the subject's time preference. These results indicate that our model, incorporating the heterogeneity of the cortical influence on the basal ganglia, is expected to provide a closed-circuit mechanistic understanding of appetitive/aversive learning.
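The proposed decomposition can be written against the standard TD error, delta = r + gamma * V(upcoming) - V(previous), with the direct pathway contributing the discounted upcoming value and the indirect pathway the negated previous value. A notational sketch with illustrative numbers, not the paper's circuit simulation:

```python
# TD error decomposed into pathway contributions, following the idea that
# the direct (D1) pathway up-regulates dopamine with the upcoming action's
# value and the indirect (D2) pathway down-regulates it with the previous
# action's value. Numbers are illustrative.
gamma = 0.9

def rpe(reward, v_upcoming, v_previous):
    direct = gamma * v_upcoming   # excitatory contribution via the direct route
    indirect = -v_previous        # inhibitory contribution via the indirect route
    return reward + direct + indirect

delta = rpe(reward=0.0, v_upcoming=1.0, v_previous=0.5)  # 0.9 - 0.5 = 0.4
```

In this form, removing the indirect contribution shifts the error positively, which mirrors the abstract's prediction that indirect-pathway blockade biases learning toward positive errors.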
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Yasuo Kawaguchi
- Division of Cerebral Circuitry, National Institute for Physiological Sciences, Okazaki, Japan; Department of Physiological Sciences, SOKENDAI (The Graduate University for Advanced Studies), Okazaki, Japan; Japan Science and Technology Agency, Core Research for Evolutional Science and Technology, Tokyo, Japan
42

43
Moustafa AA. On the relationship among different motor processes: a computational modeling approach. Front Comput Neurosci 2015; 9:34. [PMID: 25852532 PMCID: PMC4364174 DOI: 10.3389/fncom.2015.00034]
44
Moustafa AA, Bar-Gad I, Korngreen A, Bergman H. Basal ganglia: physiological, behavioral, and computational studies. Front Syst Neurosci 2014; 8:150. [PMID: 25191233 PMCID: PMC4139593 DOI: 10.3389/fnsys.2014.00150]
Affiliation(s)
- Ahmed A Moustafa
- Department of Veterans Affairs, New Jersey Health Care System; School of Social Sciences and Psychology, MARCS Institute for Brain and Behaviour, University of Western Sydney, Sydney, NSW, Australia
- Izhar Bar-Gad
- Gonda Brain Research Center, Bar-Ilan University, Ramat Gan, Israel
- Alon Korngreen
- Gonda Brain Research Center, Bar-Ilan University, Ramat Gan, Israel; Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
- Hagai Bergman
- Department of Neurobiology (Physiology), Faculty of Medicine, Edmond and Lily Safra Center for Brain Research, Institute of Medical Research Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, Israel
45
Abstract
The dopamine clock hypothesis suggests that the dopamine level determines the speed of the hypothetical internal clock. However, dopaminergic function has also been implicated in motivation, so the effect of dopaminergic manipulations on timing behavior might also be independently mediated by an altered motivational state. This paper reviews studies that investigated the effect of motivational manipulations on peak responding. The majority of these studies show that a higher reward magnitude leads to a leftward shift, whereas reward devaluation leads to a rightward shift, in the initiation of timed anticipatory behavior, typically in the absence of an effect on the timing of response termination. Similar behavioral effects are also present in a number of studies that investigated the effect of dopamine agonists and dopamine-related genetic factors on peak responding. These results can be readily accounted for by independent modulation of decision thresholds for the initiation and termination of timed responding.
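The two-threshold account summarized above can be illustrated with a minimal sketch: responding starts and stops when subjective elapsed time crosses separate decision thresholds, and motivation moves only the start threshold. This is hypothetical illustrative Python; the interval and threshold values are assumed, not parameters from the paper:

```python
# Minimal sketch of the decision-threshold account of timed responding.
# Thresholds are expressed as proportions of the target interval. A larger
# reward lowers only the start threshold, producing a leftward shift in
# response initiation while leaving response termination unchanged.

def response_window(target_s, start_threshold, stop_threshold):
    """Return (start, stop) times of anticipatory responding, in seconds."""
    return (target_s * start_threshold, target_s * stop_threshold)

# Baseline vs. larger reward: only the start threshold differs (assumed values).
baseline = response_window(target_s=30.0, start_threshold=0.75,
                           stop_threshold=1.25)
large_reward = response_window(target_s=30.0, start_threshold=0.5,
                               stop_threshold=1.25)
# large_reward starts earlier than baseline but stops at the same time.
```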
Affiliation(s)
- Fuat Balcı
- Department of Psychology, Koç University, Rumelifeneri yolu, Sarıyer, Istanbul, Turkey