1. Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. PMID: 38552190; PMCID: PMC10980507; DOI: 10.1371/journal.pcbi.1011950.
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of previously chosen actions? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting across multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
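As a rough illustration of how such a mixture of expert and nonexpert controllers can be expressed, the sketch below combines learned action values with a static action bias and a hysteresis term in a softmax choice rule; this is a generic sketch, not the authors' fitted model, and all names and parameter values are illustrative. A negative hysteresis weight produces the alternation bias described above.

```python
import math

def choice_probs(q, bias, last_action, beta=3.0, kappa=-0.5):
    """Softmax choice rule combining learned action values (q, weighted
    by inverse temperature beta), a static per-action bias, and a
    hysteresis term kappa applied to the previously chosen action
    (kappa > 0 favors repetition; kappa < 0 favors alternation)."""
    logits = [beta * q[a] + bias[a] + (kappa if a == last_action else 0.0)
              for a in range(len(q))]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# With equal values and no static bias, a negative kappa (alternation
# bias) shifts choice probability away from the last action chosen.
p = choice_probs(q=[0.5, 0.5], bias=[0.0, 0.0], last_action=0)
```

Even when the learned values are identical, the nonexpert terms alone tilt the choice distribution, which is how bias and hysteresis can masquerade as (or mask) learning effects.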
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
2. Sato R, Shimomura K, Morita K. Opponent learning with different representations in the cortico-basal ganglia pathways can develop obsession-compulsion cycle. PLoS Comput Biol 2023; 19:e1011206. PMID: 37319256; PMCID: PMC10306209; DOI: 10.1371/journal.pcbi.1011206.
Abstract
Obsessive-compulsive disorder (OCD) has been suggested to be associated with impairment of model-based behavioral control. Meanwhile, recent work suggested a shorter memory trace for negative than for positive prediction errors (PEs) in OCD. We explored relations between these two suggestions through computational modeling. Based on the properties of the cortico-basal ganglia pathways, we modeled a human as an agent having a combination of a successor representation (SR)-based system that enables model-based-like control and an individual representation (IR)-based system that hosts only model-free control, with the two systems potentially learning from positive and negative PEs at different rates. We simulated the agent's behavior in the environmental model used in the recent work that describes potential development of the obsession-compulsion cycle. We found that the dual-system agent could develop an enhanced obsession-compulsion cycle, similarly to the agent with memory trace imbalance in the recent work, if the SR- and IR-based systems learned mainly from positive and negative PEs, respectively. We then simulated the behavior of such an opponent SR+IR agent in the two-stage decision task, in comparison with an agent having only SR-based control. Fitting the agents' behavior with the model weighing model-based and model-free control developed in the original two-stage-task study yielded smaller weights of model-based control for the opponent SR+IR agent than for the SR-only agent. These results reconcile the previous suggestions about OCD, i.e., impaired model-based control and memory trace imbalance, raising a novel possibility that opponent learning in model(SR)-based and model-free controllers underlies obsession-compulsion.
Our model cannot explain the behavior of OCD patients in punishment, rather than reward, contexts. This limitation could be resolved, however, if opponent SR+IR learning also operates in the recently revealed non-canonical cortico-basal ganglia-dopamine circuit for threat/aversiveness reinforcement learning; an aversive-SR + appetitive-IR agent could then actually develop obsession-compulsion if the environment is modeled differently.
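The opponent arrangement described above can be caricatured in a few lines: two parallel value systems share one reward prediction error but learn from its positive and negative parts at different rates. This is an illustrative sketch, not the authors' simulation code; all rate values are arbitrary.

```python
def opponent_update(v_sr, v_ir, reward,
                    a_sr_pos=0.3, a_sr_neg=0.05,
                    a_ir_pos=0.05, a_ir_neg=0.3):
    """One learning step for two parallel value systems that share a
    single reward prediction error (RPE) computed from their summed
    value. With these rates, the SR-like system learns mainly from
    positive RPEs and the IR-like system mainly from negative RPEs."""
    rpe = reward - (v_sr + v_ir)
    if rpe >= 0:
        v_sr += a_sr_pos * rpe
        v_ir += a_ir_pos * rpe
    else:
        v_sr += a_sr_neg * rpe
        v_ir += a_ir_neg * rpe
    return v_sr, v_ir, rpe
```

Over many trials the two systems converge to values of opposite sign around the shared estimate, which is the opponency exploited in the simulations above.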
Affiliation(s)
- Reo Sato
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Kanji Shimomura
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
3. Morita K, Shimomura K, Kawaguchi Y. Opponent Learning with Different Representations in the Cortico-Basal Ganglia Circuits. eNeuro 2023; 10:ENEURO.0422-22.2023. PMID: 36653187; PMCID: PMC9884109; DOI: 10.1523/eneuro.0422-22.2023.
Abstract
The direct and indirect pathways of the basal ganglia (BG) have been suggested to learn mainly from positive and negative feedback, respectively. Since these pathways unevenly receive inputs from different cortical neuron types and/or regions, they may preferentially use different state/action representations. We explored whether such a combined use of different representations, coupled with different learning rates for positive and negative reward prediction errors (RPEs), has computational benefits. We modeled an animal as an agent equipped with two learning systems, each of which adopted an individual representation (IR) or a successor representation (SR) of states. Varying the combination of IR or SR as well as the learning rates from positive and negative RPEs in each system, we examined how the agent performed in a dynamic reward-navigation task. We found that the combination of an SR-based system learning mainly from positive RPEs and an IR-based system learning mainly from negative RPEs achieved good performance in the task, as compared with other combinations. In such a combination of appetitive SR-based and aversive IR-based systems, both systems showed activities of comparable magnitudes with opposite signs, consistent with the suggested profiles of the two BG pathways. Moreover, the architecture of such a combination provides a novel coherent explanation for the functional significance and underlying mechanisms of diverse findings about the cortico-BG circuits. These results suggest that combining different representations with appetitive and aversive learning could be an effective learning strategy in certain dynamic environments, and that it might actually be implemented in the cortico-BG circuits.
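For readers unfamiliar with the successor representation itself, a minimal tabular TD sketch follows (assumptions: a tiny three-state chain, arbitrary rates; this is not the paper's exact agent). Row M[s] predicts discounted future state occupancies, so value factorizes into occupancy predictions times learned per-state rewards.

```python
import numpy as np

def sr_td_step(M, w, s, s_next, reward, gamma=0.9, alpha=0.1, alpha_w=0.1):
    """One TD step for a tabular successor-representation agent:
    row M[s] holds discounted expected future occupancies of every
    state, and state value is the dot product of M[s] with the
    learned per-state reward weights w."""
    onehot = np.eye(M.shape[0])[s]
    # Bootstrap the occupancy predictions from the next state's row.
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    # Learn the per-state reward weight from the observed reward.
    w[s_next] += alpha_w * (reward - w[s_next])

def value(M, w, s):
    return float(M[s] @ w)

# Three states in a chain (0 -> 1 -> 2) with reward on entering state 2.
M, w = np.eye(3), np.zeros(3)
sr_td_step(M, w, 0, 1, 0.0)
sr_td_step(M, w, 1, 2, 1.0)
```

Because rewards enter only through w, an SR-based system can propagate value changes through its occupancy map in a partially model-based way, which an individual (one-hot) representation cannot.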
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo 113-0033, Japan
- Kanji Shimomura
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- Department of Behavioral Medicine, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira 187-8551, Japan
- Yasuo Kawaguchi
- Brain Science Institute, Tamagawa University, Machida 194-8610, Japan
- National Institute for Physiological Sciences (NIPS), Okazaki 444-8787, Japan
4. Colas JT, Dundon NM, Gerraty RT, Saragosa‐Harris NM, Szymula KP, Tanwisuth K, Tyszka JM, van Geen C, Ju H, Toga AW, Gold JI, Bassett DS, Hartley CA, Shohamy D, Grafton ST, O'Doherty JP. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Hum Brain Mapp 2022; 43:4750-4790. PMID: 35860954; PMCID: PMC9491297; DOI: 10.1002/hbm.25988.
Abstract
The model-free algorithms of "reinforcement learning" (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This "generalized reinforcement learning" (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
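A minimal sketch of the bidirectional counterfactual-update idea follows (binary states and actions and all parameter values are assumed for illustration; this is not the fitted GRL model itself). One RPE from the experienced pair also drives sign-flipped updates of the alternative state and action.

```python
def grl_update(Q, s, a, reward, alpha=0.2, g_state=0.5, g_action=0.5):
    """Single-RPE update with inverse (sign-flipped) counterfactual
    generalization: the RPE from the experienced state-action pair
    also updates the alternative action and the alternative state in
    the opposite direction, scaled by g_action and g_state. Binary
    state and action spaces are assumed for simplicity."""
    rpe = reward - Q[s][a]
    s_alt, a_alt = 1 - s, 1 - a
    Q[s][a] += alpha * rpe                  # direct update
    Q[s][a_alt] -= alpha * g_action * rpe   # inverse generalization across actions
    Q[s_alt][a] -= alpha * g_state * rpe    # inverse generalization across states
    return rpe

Q = [[0.0, 0.0], [0.0, 0.0]]
rpe = grl_update(Q, 0, 0, 1.0)
```

Setting g_state or g_action to 0 recovers under-generalization, while values above the environment's true anticorrelation correspond to over-generalization, matching the inference taxonomy described above.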
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
- Neil M. Dundon
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University of Freiburg, Freiburg im Breisgau, Germany
- Raphael T. Gerraty
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Center for Science and Society, Columbia University, New York, New York, USA
- Natalie M. Saragosa‐Harris
- Department of Psychology, New York University, New York, New York, USA
- Department of Psychology, University of California, Los Angeles, California, USA
- Karol P. Szymula
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Koranis Tanwisuth
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Department of Psychology, University of California, Berkeley, California, USA
- J. Michael Tyszka
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Camilla van Geen
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Harang Ju
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arthur W. Toga
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
- Joshua I. Gold
- Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Dani S. Bassett
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Santa Fe Institute, Santa Fe, New Mexico, USA
- Catherine A. Hartley
- Department of Psychology, New York University, New York, New York, USA
- Center for Neural Science, New York University, New York, New York, USA
- Daphna Shohamy
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Kavli Institute for Brain Science, Columbia University, New York, New York, USA
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- John P. O'Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
5. Barakchian Z, Vahabie AH, Nili Ahmadabadi M. Implicit Counterfactual Effect in Partial Feedback Reinforcement Learning: Behavioral and Modeling Approach. Front Neurosci 2022; 16:631347. PMID: 35620668; PMCID: PMC9127865; DOI: 10.3389/fnins.2022.631347.
Abstract
Context remarkably affects learning behavior by adjusting option values according to the distribution of available options. Displaying counterfactual outcomes, i.e., the outcomes of the unchosen option alongside the chosen one (complete feedback), would increase the contextual effect by inducing participants to compare the two outcomes during learning. However, when the context consists only of the juxtaposition of several options and there is no such explicit counterfactual factor (i.e., only partial feedback is provided), it is not clear whether and how the contextual effect emerges. In this research, we employ partial- and complete-feedback paradigms in which options are associated with different reward distributions. Our modeling analysis shows that a model that uses the outcome of the chosen option to update the values of both the chosen and unchosen options in opposing directions can better account for the behavioral data. This is also in line with the diffusive effect of dopamine on the striatum. Furthermore, our data show that the contextual effect is not limited to probabilistic rewards but also extends to magnitude rewards. These results suggest that by extending the counterfactual concept to include the effect of the chosen outcome on the unchosen option, we can better explain why a contextual effect arises in situations with no extra information about the unchosen outcome.
Affiliation(s)
- Zahra Barakchian
- Department of Cognitive Neuroscience, Institute for Research in Fundamental Sciences, Tehran, Iran
- Correspondence: Zahra Barakchian
- Abdol-Hossein Vahabie
- Cognitive Systems Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Department of Psychology, Faculty of Psychology and Education, University of Tehran, Tehran, Iran
- Majid Nili Ahmadabadi
- Cognitive Systems Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
6. Phasic Dopamine Changes and Hebbian Mechanisms during Probabilistic Reversal Learning in Striatal Circuits: A Computational Study. Int J Mol Sci 2022; 23:3452. PMID: 35408811; PMCID: PMC8998230; DOI: 10.3390/ijms23073452.
Abstract
Cognitive flexibility is essential for modifying our behavior in a non-stationary environment and is often explored with reversal-learning tasks. The basal ganglia (BG) dopaminergic system, under top-down control of the prefrontal cortex, is known to be involved in flexible action selection through reinforcement learning. However, how adaptive dopamine changes regulate this process, and which learning mechanisms train the striatal synapses, remain open questions. The current study uses a neurocomputational model of the BG, based on dopamine-dependent direct (Go) and indirect (NoGo) pathways, to investigate reinforcement learning in a probabilistic environment through a task that associates different stimuli with different actions. Here, we investigated the efficacy of several versions of the Hebb rule, based on covariance between pre- and postsynaptic neurons, as well as the control of phasic dopamine changes required to achieve proper reversal learning. Furthermore, an original mechanism for modulating the phasic dopamine changes is proposed, assuming that the expected reward probability is coded by the activity of the winning Go neuron before a reward/punishment takes place. Simulations show that this original formulation for automatic phasic dopamine control allows good flexible reversal even in difficult conditions. The current outcomes may contribute to understanding the mechanisms for active control of dopamine changes during flexible behavior. In perspective, it may be applied to neuropsychiatric or neurological disorders, such as Parkinson's disease or schizophrenia, in which reinforcement learning is impaired.
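A covariance-style Hebb rule gated by phasic dopamine can be sketched generically as below; this illustrates the rule family examined, not the study's specific network, and the function name, learning rate, and activity vectors are all assumptions for the example.

```python
import numpy as np

def covariance_hebb(w, pre, post, dopamine, lr=0.01):
    """Covariance-style Hebbian update gated by the phasic dopamine
    change: weights grow for pre/post pairs whose activities deviate
    from their mean in the same direction when dopamine rises, and
    weaken for the same pairs when dopamine falls (punishment)."""
    dpre = pre - pre.mean()
    dpost = post - post.mean()
    return w + lr * dopamine * np.outer(dpost, dpre)

# One winner among two striatal units, rewarded (positive dopamine change):
w = covariance_hebb(np.zeros((2, 2)),
                    pre=np.array([1.0, 0.0]),
                    post=np.array([1.0, 0.0]),
                    dopamine=1.0)
```

Using deviations from mean activity (rather than raw activities) lets the same rule both potentiate the winning stimulus-action association and depress competing ones, which is what supports reversal when the dopamine sign flips.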
7. Morita K, Kato A. Dopamine ramps for accurate value learning under uncertainty. Trends Neurosci 2022; 45:254-256. PMID: 35181147; DOI: 10.1016/j.tins.2022.01.008.
Abstract
Dopamine signals ramping toward reward timings have become widely reported, but their functions remain elusive. Through modeling analyses and experiments in mice, a recent study by Mikhael, Kim et al. shows that such signals represent reward prediction errors used for accurate value learning under uncertainty about the upcoming state and its resolution by sensory feedback.
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
- Ayaka Kato
- Laboratory for Circuit Mechanisms of Sensory Perception, RIKEN Center for Brain Science, Wako, Japan
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
8. Dopamine firing plays a dual role in coding reward prediction errors and signaling motivation in a working memory task. Proc Natl Acad Sci U S A 2022; 119:2113311119. PMID: 34992139; PMCID: PMC8764687; DOI: 10.1073/pnas.2113311119.
Abstract
Little is known about how dopamine (DA) neuron firing rates behave in cognitively demanding decision-making tasks. Here, we investigated midbrain DA activity in monkeys performing a discrimination task in which the animal had to use working memory (WM) to report which of two sequentially applied vibrotactile stimuli had the higher frequency. We found that perception was altered by an internal bias, likely generated by deterioration of the representation of the first frequency during the WM period. This bias greatly controlled the DA phasic response during the two stimulation periods, confirming that DA reward prediction errors reflected stimulus perception. In contrast, tonic dopamine activity during WM was not affected by the bias and did not encode the stored frequency. More interestingly, both delay-period activity and phasic responses before the second stimulus negatively correlated with reaction times of the animals after the trial start cue and thus represented motivated behavior on a trial-by-trial basis. During WM, this motivation signal underwent a ramp-like increase. At the same time, motivation positively correlated with accuracy, especially in difficult trials, probably by decreasing the effect of the bias. Overall, our results indicate that DA activity, in addition to encoding reward prediction errors, could at the same time be involved in motivation and WM. In particular, the ramping activity during the delay period suggests a possible DA role in stabilizing sustained cortical activity, hypothetically by increasing the gain communicated to prefrontal neurons in a motivation-dependent way.
9. Feng Z, Nagase AM, Morita K. A Reinforcement Learning Approach to Understanding Procrastination: Does Inaccurate Value Approximation Cause Irrational Postponing of a Task? Front Neurosci 2021; 15:660595. PMID: 34602962; PMCID: PMC8481628; DOI: 10.3389/fnins.2021.660595.
Abstract
Procrastination is the voluntary but irrational postponing of a task despite being aware that the delay can lead to worse consequences. It has been extensively studied in psychology, from contributing factors to theoretical models. From a value-based decision-making and reinforcement learning (RL) perspective, procrastination has been suggested to be caused by non-optimal choice resulting from cognitive limitations. Exactly what sort of cognitive limitations are involved, however, remains elusive. In the current study, we examined whether a particular type of cognitive limitation, namely inaccurate valuation resulting from inadequate state representation, would cause procrastination. Recent work has suggested that humans may adopt a particular type of state representation called the successor representation (SR) and that humans can learn to represent states by relatively low-dimensional features. Combining these suggestions, we assumed a dimension-reduced version of the SR. We modeled a series of behaviors of a "student" doing assignments during the school term, when putting off the assignments (i.e., procrastination) is not allowed, and during the vacation, when whether to procrastinate can be freely chosen. We assumed that the "student" had acquired a rigid reduced SR of each state, corresponding to each step in completing an assignment, under the policy without procrastination. The "student" learned the approximated value of each state, computed as a linear function of the features of the states in the rigid reduced SR, through temporal-difference (TD) learning. During the vacation, the "student" decided at each time step whether to procrastinate based on these approximated values. Simulation results showed that the reduced-SR-based RL model generated procrastination behavior, which worsened across episodes. According to the values approximated by the "student," procrastinating was the better choice, whereas not procrastinating was mostly better according to the true values. Thus, the current model generated procrastination behavior caused by inaccurate value approximation, which resulted from the adoption of the reduced SR as the state representation. These findings indicate that the reduced SR, or more generally dimension reduction in state representation, can be a form of cognitive limitation that leads to procrastination.
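The TD learning over fixed low-dimensional features described above can be sketched as a generic linear-approximation step (this stands in for the rigid reduced SR; it is not the study's full "student" model, and the feature vectors and rates are illustrative):

```python
import numpy as np

def td_step_linear(w, phi_s, phi_next, reward, gamma=0.95, alpha=0.1):
    """One temporal-difference step with linear value approximation
    over fixed state features: V(s) = w . phi(s), and the TD error
    updates w along the current feature vector. When features of
    distinct states overlap, their learned values become coupled,
    which is how systematic approximation errors can arise."""
    delta = reward + gamma * (w @ phi_next) - (w @ phi_s)
    w = w + alpha * delta * phi_s
    return w, delta

w = np.zeros(2)
w, delta = td_step_linear(w, np.array([1.0, 0.0]), np.array([0.0, 1.0]), 1.0)
```

With rigid (non-updatable) features, the learned w can make a postponed path look better than it truly is, which is the mechanism for procrastination proposed above.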
Affiliation(s)
- Zheyu Feng
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Asako Mitsuto Nagase
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Division of Neurology, Department of Brain and Neurosciences, Faculty of Medicine, Tottori University, Yonago, Japan
- Research Fellowship for Young Scientists, Japan Society for the Promotion of Science, Tokyo, Japan
- Department of Neurology, Faculty of Medicine, Shimane University, Izumo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
10. Suzuki S, Yamashita Y, Katahira K. Psychiatric symptoms influence reward-seeking and loss-avoidance decision-making through common and distinct computational processes. Psychiatry Clin Neurosci 2021; 75:277-285. PMID: 34151477; PMCID: PMC8457174; DOI: 10.1111/pcn.13279.
Abstract
AIM: Psychiatric symptoms are often accompanied by impairments in decision-making to attain rewards and avoid losses. However, due to the complex nature of mental disorders (e.g., high comorbidity), the symptoms that are specifically associated with deficits in decision-making remain unidentified. Furthermore, the influence of psychiatric symptoms on the computations underpinning reward-seeking and loss-avoidance decision-making remains elusive. Here, we aimed to address these issues by leveraging a large-scale online experiment and computational modeling.
METHODS: In the online experiment, we recruited 1,900 undiagnosed participants from the general population. They performed either a reward-seeking or a loss-avoidance decision-making task and subsequently completed questionnaires about psychiatric symptoms.
RESULTS: We found that one trans-diagnostic dimension of psychiatric symptoms related to compulsive behavior and intrusive thought (CIT) was negatively correlated with overall decision-making performance in both the reward-seeking and loss-avoidance tasks. A deeper analysis further revealed that, in both tasks, the CIT dimension was associated with lower preference for the options that recently led to better outcomes (i.e., reward or no loss). In the reward-seeking task only, the CIT dimension was also associated with lower preference for recently unchosen options.
CONCLUSION: These findings suggest that psychiatric symptoms influence the two types of decision-making, reward-seeking and loss-avoidance, through both common and distinct computational processes.
Affiliation(s)
- Shinsuke Suzuki
- Brain, Mind and Markets Laboratory, Department of Finance, Faculty of Business and Economics, The University of Melbourne, Melbourne, Victoria, Australia
- Frontier Research Institute for Interdisciplinary Sciences, Tohoku University, Sendai, Japan
- Yuichi Yamashita
- Department of Information Medicine, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, Japan
- Kentaro Katahira
- Department of Psychological and Cognitive Sciences, Graduate School of Informatics, Nagoya University, Nagoya, Japan
- Mental and Physical Functions Modeling Group, Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
11. Shimomura K, Kato A, Morita K. Rigid reduced successor representation as a potential mechanism for addiction. Eur J Neurosci 2021; 53:3768-3790. PMID: 33840120; PMCID: PMC8252639; DOI: 10.1111/ejn.15227.
Abstract
Difficulty in the cessation of drinking, smoking, or gambling has been widely recognized. Conventional theories proposed relative dominance of habitual over goal-directed control, but human studies have not convincingly supported them. Referring to the recently suggested "successor representation (SR)" of states, which enables partially goal-directed control, we propose a dopamine-related mechanism that makes resistance to habitual reward-obtaining particularly difficult. We considered that long-standing behavior towards a certain reward without resisting temptation can (but does not always) lead to the formation of a rigid dimension-reduced SR based on the goal state, which cannot be updated. In our model assuming such a rigid reduced SR, no reward prediction error (RPE) is generated at the goal while no resistance is made, but a sustained large positive RPE is generated upon goal-reaching once the person starts resisting temptation. Such a sustained RPE is somewhat similar to the hypothesized sustained fictitious RPE caused by drug-induced dopamine. In contrast, if a rigid reduced SR is not formed and states are represented individually, as in simple reinforcement learning models, no sustained RPE is generated at the goal. Formation of a rigid reduced SR also attenuates the resistance-dependent decrease in the value of the cue for behavior, makes subsequent introduction of punishment after the goal ineffective, and potentially enhances the propensity of non-resistance through the influence of RPEs via the spiral striatum-midbrain circuit. These results suggest that formation of a rigid reduced SR makes cessation of habitual reward-obtaining particularly difficult and can thus be a mechanism for addiction, common to substance and non-substance rewards.
Affiliation(s)
- Kanji Shimomura
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Department of Behavioral Medicine, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira, Japan
- Ayaka Kato
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
- Laboratory for Circuit Mechanisms of Sensory Perception, RIKEN Center for Brain Science, Wako, Japan
- Research Fellowship for Young Scientists, Japan Society for the Promotion of Science, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
|
12
|
Revisiting the importance of model fitting for model-based fMRI: It does matter in computational psychiatry. PLoS Comput Biol 2021; 17:e1008738. [PMID: 33561125 PMCID: PMC7899379 DOI: 10.1371/journal.pcbi.1008738] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2020] [Revised: 02/22/2021] [Accepted: 01/25/2021] [Indexed: 11/19/2022] Open
Abstract
Computational modeling has been applied for data analysis in psychology, neuroscience, and psychiatry. One of its important uses is to infer the latent variables underlying behavior by which researchers can evaluate corresponding neural, physiological, or behavioral measures. This feature is especially crucial for computational psychiatry, in which altered computational processes underlying mental disorders are of interest. For instance, several studies employing model-based fMRI-a method for identifying brain regions correlated with latent variables-have shown that patients with mental disorders (e.g., depression) exhibit diminished neural responses to reward prediction errors (RPEs), which are the differences between experienced and predicted rewards. Such model-based analysis has the drawback that the parameter estimates and inference of latent variables are not necessarily correct-rather, they usually contain some errors. A previous study theoretically and empirically showed that the error in model-fitting does not necessarily cause a serious error in model-based fMRI. However, the study did not deal with certain situations relevant to psychiatry, such as group comparisons between patients and healthy controls. We developed a theoretical framework to explore such situations. We demonstrate that parameter misspecification can critically affect the results of group comparison. We demonstrate that even if the RPE response in patients is completely intact, a spurious difference from healthy controls is observable. Such a situation occurs when the ground-truth learning rate differs between groups but a common learning rate is used, as in previous studies. Furthermore, even if the parameters are appropriately fitted to individual participants, spurious group differences in RPE responses are observable when the model lacks a component that differs between groups.
These results highlight the importance of appropriate model-fitting and the need for caution when interpreting the results of model-based fMRI.
|
13
|
Wiencke K, Horstmann A, Mathar D, Villringer A, Neumann J. Dopamine release, diffusion and uptake: A computational model for synaptic and volume transmission. PLoS Comput Biol 2020; 16:e1008410. [PMID: 33253315 PMCID: PMC7728201 DOI: 10.1371/journal.pcbi.1008410] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2019] [Revised: 12/10/2020] [Accepted: 09/30/2020] [Indexed: 11/19/2022] Open
Abstract
Computational modeling of dopamine transmission is challenged by complex underlying mechanisms. Here we present a new computational model that (I) simultaneously regards release, diffusion and uptake of dopamine, (II) considers multiple terminal release events and (III) comprises both synaptic and volume transmission by incorporating the geometry of the synaptic cleft. We were able to validate our model in that it simulates concentration values comparable to physiological values observed in empirical studies. Further, although synaptic dopamine diffuses into extra-synaptic space, our model reflects a very localized signal occurring on the synaptic level, i.e. synaptic dopamine release is negligibly recognized by neighboring synapses. Moreover, increasing evidence suggests that cognitive performance can be predicted by signal variability of neuroimaging data (e.g. BOLD). Signal variability in target areas of dopaminergic neurons (striatum, cortex) may arise from dopamine concentration variability. On that account we compared spatio-temporal variability in a simulation mimicking normal dopamine transmission in striatum to scenarios of enhanced dopamine release and dopamine uptake inhibition. We found different variability characteristics between the three settings, which may in part account for differences in empirical observations. From a clinical perspective, differences in striatal dopaminergic signaling contribute to differential learning and reward processing, with relevant implications for addictive- and compulsive-like behavior. Specifically, dopaminergic tone is assumed to impact on phasic dopamine and hence on the integration of reward-related signals. However, in humans DA tone is classically assessed using PET, which is an indirect measure of endogenous DA availability and suffers from temporal and spatial resolution issues. 
We discuss how this can lead to discrepancies with observations from other methods such as microdialysis and show how computational modeling can help to refine our understanding of DA transmission.
Affiliation(s)
- Kathleen Wiencke
- IFB Adiposity Diseases, Leipzig University Medical Center, Germany
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Annette Horstmann
- IFB Adiposity Diseases, Leipzig University Medical Center, Germany
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki
- David Mathar
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Arno Villringer
- IFB Adiposity Diseases, Leipzig University Medical Center, Germany
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Clinic of Cognitive Neurology, University Hospital Leipzig, Germany
- Mind & Brain Institute, Berlin School of Mind and Brain, Humboldt-University, Berlin, Germany
- Jane Neumann
- IFB Adiposity Diseases, Leipzig University Medical Center, Germany
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Department of Medical Engineering and Biotechnology, University of Applied Sciences, Jena, Germany
|
14
|
Tanimoto S, Kondo M, Morita K, Yoshida E, Matsuzaki M. Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward. Front Behav Neurosci 2020; 14:141. [PMID: 33100979 PMCID: PMC7498735 DOI: 10.3389/fnbeh.2020.00141] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2020] [Accepted: 07/22/2020] [Indexed: 01/20/2023] Open
Abstract
“To do or not to do” is a fundamental decision that has to be made in daily life. Behaviors related to multiple “to do” choice tasks have long been explained by reinforcement learning, and “to do or not to do” tasks such as the go/no-go task have also been recently discussed within the framework of reinforcement learning. In this learning framework, alternative actions and/or the non-action to take are determined by evaluating explicitly given (overt) reward and punishment. However, we assume that there are real life cases in which an action/non-action is repeated, even though there is no obvious reward or punishment, because implicitly given outcomes such as saving physical energy and regret (we refer to this as “covert reward”) can affect the decision-making. In the current task, mice chose to pull a lever or not according to two tone cues assigned with different water reward probabilities (70% and 30% in condition 1, and 30% and 10% in condition 2). As the mice learned, the probability that they would choose to pull the lever decreased (<0.25) in trials with a 30% reward probability cue (30% cue) in condition 1, and in trials with a 10% cue in condition 2, but increased (>0.8) in trials with a 70% cue in condition 1 and a 30% cue in condition 2, even though a non-pull was followed by neither an overt reward nor avoidance of overt punishment in any trial. This behavioral tendency was not well explained by a combination of commonly used Q-learning models, which take only the action choice with an overt reward outcome into account. Instead, we found that the non-action preference of the mice was best explained by Q-learning models, which regarded the non-action as the other choice, and updated non-action values with a covert reward. We propose that “doing nothing” can be actively chosen as an alternative to “doing something,” and that a covert reward could serve as a reinforcer of “doing nothing.”
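The best-fitting model class described above, with non-action treated as an explicit second choice reinforced by a covert reward, can be sketched as follows. All parameter values (covert reward, effort cost, learning rate, inverse temperature) are invented for illustration, not fitted to the mouse data:

```python
import numpy as np

# Hedged sketch of the model class the study favored: "no pull" is an
# explicit second action whose value is updated with a covert reward.
# All parameter values below are invented, not fitted to the mouse data.
rng = np.random.default_rng(1)
alpha, beta = 0.2, 5.0
covert_reward = 0.15               # assumed benefit of doing nothing
effort_cost = 0.1                  # assumed cost of pulling the lever
p_reward = {"70% cue": 0.7, "30% cue": 0.3}
Q = {cue: np.zeros(2) for cue in p_reward}   # [pull, no-pull]

for _ in range(3000):
    cue = "70% cue" if rng.random() < 0.5 else "30% cue"
    q = Q[cue]
    p_pull = 1.0 / (1.0 + np.exp(-beta * (q[0] - q[1])))
    if rng.random() < p_pull:      # pull: overt reward minus effort
        outcome = float(rng.random() < p_reward[cue]) - effort_cost
        q[0] += alpha * (outcome - q[0])
    else:                          # no pull: covert reward only
        q[1] += alpha * (covert_reward - q[1])

p_final = {cue: 1.0 / (1.0 + np.exp(-beta * (Q[cue][0] - Q[cue][1])))
           for cue in Q}
print({cue: round(p, 2) for cue, p in p_final.items()})
```

The key design choice mirrors the abstract: the non-action value is updated on no-pull trials even though no overt outcome occurs, so pull probability can fall below chance for low-value cues.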
Affiliation(s)
- Sai Tanimoto
- Department of Physiology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Masashi Kondo
- Department of Physiology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study, Tokyo, Japan
- Eriko Yoshida
- Department of Physiology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Masanori Matsuzaki
- Department of Physiology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study, Tokyo, Japan
- Brain Functional Dynamics Collaboration Laboratory, RIKEN Center for Brain Science, Saitama, Japan
|
15
|
Abstract
This paper describes a framework for modelling dopamine function in the mammalian brain. It proposes that both learning and action planning involve processes minimizing prediction errors encoded by dopaminergic neurons. In this framework, dopaminergic neurons projecting to different parts of the striatum encode errors in predictions made by the corresponding systems within the basal ganglia. The dopaminergic neurons encode differences between rewards and expectations in the goal-directed system, and differences between the chosen and habitual actions in the habit system. These prediction errors trigger learning about rewards and habit formation, respectively. Additionally, dopaminergic neurons in the goal-directed system play a key role in action planning: They compute the difference between a desired reward and the reward expected from the current motor plan, and they facilitate action planning until this difference diminishes. Presented models account for dopaminergic responses during movements, effects of dopamine depletion on behaviour, and make several experimental predictions. In the brain, chemicals such as dopamine allow nerve cells to ‘talk’ to each other and to relay information from and to the environment. Dopamine, in particular, is released when pleasant surprises are experienced: this helps the organism to learn about the consequences of certain actions. If a new flavour of ice-cream tastes better than expected, for example, the release of dopamine tells the brain that this flavour is worth choosing again. However, dopamine has an additional role in controlling movement. When the cells that produce dopamine die, for instance in Parkinson’s disease, individuals may find it difficult to initiate deliberate movements. Here, Rafal Bogacz aimed to develop a comprehensive framework that could reconcile the two seemingly unrelated roles played by dopamine. 
The new theory proposes that dopamine is released when an outcome differs from expectations, which helps the organism to adjust and minimise these differences. In the ice-cream example, the difference is between how good the treat is expected to taste, and how tasty it really is. By learning to select the same flavour repeatedly, the brain aligns expectation and the result of the choice. This ability would also apply when movements are planned. In this case, the brain compares the desired reward with the predicted results of the planned actions. For example, while planning to get a spoonful of ice-cream, the brain compares the pleasure expected from the movement that is currently planned, and the pleasure of eating a full spoon of the treat. If the two differ, for example because no movement has been planned yet, the brain releases dopamine to form a better version of the action plan. The theory was then tested using a computer simulation of nerve cells that release dopamine; this showed that the behaviour of the virtual cells closely matched that of their real-life counterparts. This work offers a comprehensive description of the fundamental role of dopamine in the brain. The model now needs to be verified through experiments on living nerve cells; ultimately, it could help doctors and researchers to develop better treatments for conditions such as Parkinson’s disease or ADHD, which are linked to a lack of dopamine.
Affiliation(s)
- Rafal Bogacz
- MRC Brain Networks Dynamics Unit, University of Oxford, Oxford, United Kingdom
|
16
|
Song MR, Lee SW. Dynamic resource allocation during reinforcement learning accounts for ramping and phasic dopamine activity. Neural Netw 2020; 126:95-107. [PMID: 32203877 DOI: 10.1016/j.neunet.2020.03.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2019] [Revised: 01/22/2020] [Accepted: 03/02/2020] [Indexed: 11/29/2022]
Abstract
For an animal to learn about its environment with limited motor and cognitive resources, it should focus its resources on potentially important stimuli. However, too narrow focus is disadvantageous for adaptation to environmental changes. Midbrain dopamine neurons are excited by potentially important stimuli, such as reward-predicting or novel stimuli, and allocate resources to these stimuli by modulating how an animal approaches, exploits, explores, and attends. The current study examined the theoretical possibility that dopamine activity reflects the dynamic allocation of resources for learning. Dopamine activity may transition between two patterns: (1) phasic responses to cues and rewards, and (2) ramping activity arising as the agent approaches the reward. Phasic excitation has been explained by prediction errors generated by experimentally inserted cues. However, when and why dopamine activity transitions between the two patterns remain unknown. By parsimoniously modifying a standard temporal difference (TD) learning model to accommodate a mixed presentation of both experimental and environmental stimuli, we simulated dopamine transitions and compared them with experimental data from four different studies. The results suggested that dopamine transitions from ramping to phasic patterns as the agent focuses its resources on a small number of reward-predicting stimuli, thus leading to task dimensionality reduction. The opposite occurs when the agent re-distributes its resources to adapt to environmental changes, resulting in task dimensionality expansion. This research elucidates the role of dopamine in a broader context, providing a potential explanation for the diverse repertoire of dopamine activity that cannot be explained solely by prediction error.
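The "standard temporal difference (TD) learning model" that the study parsimoniously modifies can be sketched in a few lines on a toy state chain (not the paper's resource-allocation model): over training, the prediction error at reward delivery shrinks toward zero as value predictions propagate back along the chain.

```python
import numpy as np

# Standard TD(0) learning on a toy 5-state chain (the baseline the study
# modifies, not its resource-allocation model): the prediction error at
# reward delivery shrinks toward zero as values propagate backward.
gamma, alpha = 0.9, 0.1
V = np.zeros(5)                    # values of states s0..s4
reward_rpes = []
for trial in range(200):
    for s in range(5):
        r = 1.0 if s == 4 else 0.0             # reward on the final step
        v_next = V[s + 1] if s < 4 else 0.0    # terminal value is 0
        delta = r + gamma * v_next - V[s]      # TD error
        if s == 4:
            reward_rpes.append(delta)
        V[s] += alpha * delta
print(round(reward_rpes[0], 2), round(reward_rpes[-1], 3), round(V[0], 3))
```

On the first trial the full error (1.0) occurs at the reward; after training it is near zero there, and the earliest state's value approaches γ⁴ ≈ 0.656.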
Affiliation(s)
- Minryung R Song
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, South Korea
- Sang Wan Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, South Korea
- Program of Brain and Cognitive Engineering, Daejeon, 34141, South Korea
- KAIST Institute for Health, Science, and Technology, Daejeon, 34141, South Korea
- KAIST Institute for Artificial Intelligence, Daejeon, 34141, South Korea
- KAIST Center for Neuroscience-inspired AI, Daejeon, 34141, South Korea
|
17
|
Adams RA, Moutoussis M, Nour MM, Dahoun T, Lewis D, Illingworth B, Veronese M, Mathys C, de Boer L, Guitart-Masip M, Friston KJ, Howes OD, Roiser JP. Variability in Action Selection Relates to Striatal Dopamine 2/3 Receptor Availability in Humans: A PET Neuroimaging Study Using Reinforcement Learning and Active Inference Models. Cereb Cortex 2020; 30:3573-3589. [PMID: 32083297 PMCID: PMC7233027 DOI: 10.1093/cercor/bhz327] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2019] [Revised: 11/18/2019] [Accepted: 12/05/2019] [Indexed: 12/17/2022] Open
Abstract
Choosing actions that result in advantageous outcomes is a fundamental function of nervous systems. All computational decision-making models contain a mechanism that controls the variability of (or confidence in) action selection, but its neural implementation is unclear-especially in humans. We investigated this mechanism using two influential decision-making frameworks: active inference (AI) and reinforcement learning (RL). In AI, the precision (inverse variance) of beliefs about policies controls action selection variability-similar to decision 'noise' parameters in RL-and is thought to be encoded by striatal dopamine signaling. We tested this hypothesis by administering a 'go/no-go' task to 75 healthy participants, and measuring striatal dopamine 2/3 receptor (D2/3R) availability in a subset (n = 25) using [11C]-(+)-PHNO positron emission tomography. In behavioral model comparison, RL performed best across the whole group but AI performed best in participants performing above chance levels. Limbic striatal D2/3R availability had linear relationships with AI policy precision (P = 0.029) as well as with RL irreducible decision 'noise' (P = 0.020), and this relationship with D2/3R availability was confirmed with a 'decision stochasticity' factor that aggregated across both models (P = 0.0006). These findings are consistent with occupancy of inhibitory striatal D2/3Rs decreasing the variability of action selection in humans.
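The shared quantity at stake here, a precision or inverse-temperature parameter scaling action-selection variability, works the same way in both frameworks. A minimal sketch with illustrative values:

```python
import numpy as np

# Minimal sketch of the quantity both frameworks share: a precision /
# inverse-temperature parameter that scales action-selection variability.
# The action values q and the beta values are illustrative.
def softmax(q, beta):
    z = beta * (q - q.max())          # subtract max for numerical safety
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    return float(-(p * np.log(p)).sum())

q = np.array([0.6, 0.4])
low_precision = softmax(q, beta=1.0)    # noisy, near-random selection
high_precision = softmax(q, beta=20.0)  # precise, near-deterministic
print(entropy(low_precision), entropy(high_precision))
```

Higher precision concentrates choice probability on the best action and lowers choice entropy, which is the behavioral signature the study relates to striatal D2/3R availability.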
Affiliation(s)
- Rick A Adams
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
- Division of Psychiatry, University College London, London W1T 7NF, UK
- Psychiatric Imaging Group, Robert Steiner MRI Unit, MRC London Institute of Medical Sciences, Hammersmith Hospital, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, UK
- Michael Moutoussis
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
- Max Planck-UCL Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, UK
- Matthew M Nour
- Psychiatric Imaging Group, Robert Steiner MRI Unit, MRC London Institute of Medical Sciences, Hammersmith Hospital, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, UK
- Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London, London SE5 8AF, UK
- Tarik Dahoun
- Psychiatric Imaging Group, Robert Steiner MRI Unit, MRC London Institute of Medical Sciences, Hammersmith Hospital, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, UK
- Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford OX3 7JX, UK
- Declan Lewis
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
- Benjamin Illingworth
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
- Mattia Veronese
- Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London, London SE5 8AF, UK
- Christoph Mathys
- Max Planck-UCL Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, UK
- Scuola Internazionale Superiore di Studi Avanzati (SISSA), 34136 Trieste, Italy
- Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich and ETH Zurich, 8032 Zurich, Switzerland
- Lieke de Boer
- Aging Research Center, Karolinska Institute, 171 65 Stockholm, Sweden
- Marc Guitart-Masip
- Max Planck-UCL Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, UK
- Aging Research Center, Karolinska Institute, 171 65 Stockholm, Sweden
- Karl J Friston
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
- Oliver D Howes
- Psychiatric Imaging Group, Robert Steiner MRI Unit, MRC London Institute of Medical Sciences, Hammersmith Hospital, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, UK
- Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London, London SE5 8AF, UK
- Jonathan P Roiser
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
|
18
|
Joshi VV, Patel ND, Rehan MA, Kuppa A. Mysterious Mechanisms of Memory Formation: Are the Answers Hidden in Synapses? Cureus 2019; 11:e5795. [PMID: 31728242 PMCID: PMC6827877 DOI: 10.7759/cureus.5795] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2019] [Accepted: 09/28/2019] [Indexed: 12/18/2022] Open
Abstract
After decades of research on memory formation and retention, we are still searching for the definite concept and process behind neuroplasticity. This review article will address the relationship between synapses, memory formation, and memory retention and their genetic correlations. In the last six decades, there have been enormous improvements in the neurochemistry domain, especially in the area of neural plasticity. In the central nervous system, the complexity of the synapses between neurons allows communication among them. It is believed that each time certain types of sensory signals pass through sequences of synapses, these synapses can transmit the same signals more efficiently the following time. The concept of Hebb synapse has provided revolutionary thinking about the nature of neural mechanisms of learning and memory formation. To improve the local circuitry for memory formation and behavioral change and stabilization in the mammalian central nervous system, long-term potentiation and long-term depression are the crucial components of Hebbian plasticity. In this review, we will be discussing the role of glutamatergic synapses, engram cells, cytokines, neuropeptides, neurosteroids and many aspects, covering the synaptic basis of memory. Lastly, we have tried to cover the etiology of neurodegenerative disorders due to synaptic dysfunction. To enhance pharmacological interventions for neurodegenerative diseases, we need more research in this direction. With the help of technology, and a better understanding of the disease etiology, not only can we identify the missing pieces of synaptic functions, but we might also cure or even prevent serious neurodegenerative diseases like Alzheimer's disease (AD).
Affiliation(s)
- Viraj V Joshi
- Neuropsychiatry, California Institute of Behavioral Neurosciences and Psychology, Fairfield, USA
- Nishita D Patel
- Research, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA
- Muhammad Awais Rehan
- Miscellaneous, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA
- Annapurna Kuppa
- Internal Medicine and Gastroenterology, University of Michigan, Ann Arbor, USA
|
19
|
Jordan J, Weidel P, Morrison A. A Closed-Loop Toolchain for Neural Network Simulations of Learning Autonomous Agents. Front Comput Neurosci 2019; 13:46. [PMID: 31427939 PMCID: PMC6687756 DOI: 10.3389/fncom.2019.00046] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2019] [Accepted: 06/25/2019] [Indexed: 11/17/2022] Open
Abstract
Neural network simulation is an important tool for generating and evaluating hypotheses on the structure, dynamics, and function of neural circuits. For scientific questions addressing organisms operating autonomously in their environments, in particular where learning is involved, it is crucial to be able to operate such simulations in a closed-loop fashion. In such a set-up, the neural agent continuously receives sensory stimuli from the environment and provides motor signals that manipulate the environment or move the agent within it. So far, most studies requiring such functionality have been conducted with custom simulation scripts and manually implemented tasks. This makes it difficult for other researchers to reproduce and build upon previous work and nearly impossible to compare the performance of different learning architectures. In this work, we present a novel approach to solve this problem, connecting benchmark tools from the field of machine learning and state-of-the-art neural network simulators from computational neuroscience. The resulting toolchain enables researchers in both fields to make use of well-tested high-performance simulation software supporting biologically plausible neuron, synapse and network models and allows them to evaluate and compare their approach on the basis of standardized environments with various levels of complexity. We demonstrate the functionality of the toolchain by implementing a neuronal actor-critic architecture for reinforcement learning in the NEST simulator and successfully training it on two different environments from the OpenAI Gym. We compare its performance to a previously suggested neural network model of reinforcement learning in the basal ganglia and a generic Q-learning algorithm.
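The closed-loop pattern the toolchain standardizes, with the agent emitting actions and the environment returning observations and rewards, follows the Gym-style reset/step interface. A dependency-free sketch with a stand-in environment (the bandit and its parameters are invented, not part of the toolchain):

```python
import random

# Dependency-free sketch of the closed-loop agent-environment pattern,
# using a Gym-style reset/step interface. The bandit environment and its
# reward probabilities are invented stand-ins for OpenAI Gym tasks.
class TwoArmedBandit:
    def reset(self):
        return 0                       # a single dummy observation
    def step(self, action):
        p = 0.8 if action == 0 else 0.2
        reward = 1.0 if random.random() < p else 0.0
        return 0, reward, False, {}    # obs, reward, done, info

random.seed(0)
env = TwoArmedBandit()
q = [0.0, 0.0]                         # simple value learner as the agent
alpha, eps = 0.1, 0.1
obs = env.reset()
for _ in range(2000):
    # closed loop: the agent acts, the environment responds
    a = random.randrange(2) if random.random() < eps else q.index(max(q))
    obs, reward, done, info = env.step(a)
    q[a] += alpha * (reward - q[a])
print([round(v, 2) for v in q])
```

Swapping in a spiking actor-critic for the table `q`, or a real Gym task for the bandit, changes nothing about the loop itself, which is the point of standardizing the interface.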
Affiliation(s)
- Jakob Jordan
- Department of Physiology, University of Bern, Bern, Switzerland
- Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure Function Relationship (JBI 1/INM-10), Research Centre Jülich, Jülich, Germany
- Philipp Weidel
- Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure Function Relationship (JBI 1/INM-10), Research Centre Jülich, Jülich, Germany
- aiCTX, Zurich, Switzerland
- Department of Computer Science, RWTH Aachen University, Aachen, Germany
- Abigail Morrison
- Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure Function Relationship (JBI 1/INM-10), Research Centre Jülich, Jülich, Germany
- Faculty of Psychology, Institute of Cognitive Neuroscience, Ruhr-University Bochum, Bochum, Germany
|
20
|
Moens V, Zénon A. Learning and forgetting using reinforced Bayesian change detection. PLoS Comput Biol 2019; 15:e1006713. [PMID: 30995214 PMCID: PMC6488101 DOI: 10.1371/journal.pcbi.1006713] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Revised: 04/29/2019] [Accepted: 12/09/2018] [Indexed: 12/17/2022] Open
Abstract
Agents living in volatile environments must be able to detect changes in contingencies while refraining to adapt to unexpected events that are caused by noise. In Reinforcement Learning (RL) frameworks, this requires learning rates that adapt to past reliability of the model. The observation that behavioural flexibility in animals tends to decrease following prolonged training in stable environment provides experimental evidence for such adaptive learning rates. However, in classical RL models, learning rate is either fixed or scheduled and can thus not adapt dynamically to environmental changes. Here, we propose a new Bayesian learning model, using variational inference, that achieves adaptive change detection by the use of Stabilized Forgetting, updating its current belief based on a mixture of fixed, initial priors and previous posterior beliefs. The weight given to these two sources is optimized alongside the other parameters, allowing the model to adapt dynamically to changes in environmental volatility and to unexpected observations. This approach is used to implement the "critic" of an actor-critic RL model, while the actor samples the resulting value distributions to choose which action to undertake. We show that our model can emulate different adaptation strategies to contingency changes, depending on its prior assumptions of environmental stability, and that model parameters can be fit to real data with high accuracy. The model also exhibits trade-offs between flexibility and computational costs that mirror those observed in real data. Overall, the proposed method provides a general framework to study learning flexibility and decision making in RL contexts.
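The stabilized-forgetting idea, updating beliefs from a mixture of the fixed initial prior and the previous posterior, can be sketched with a conjugate Beta-Bernoulli approximation rather than the paper's full variational model (the mixture weight, reward rates, and change point below are invented):

```python
import random

# Hedged sketch of "stabilized forgetting" via a conjugate Beta-Bernoulli
# approximation (the paper uses a fuller variational model). Before each
# observation the pseudo-counts are shrunk toward the fixed initial prior,
# so old evidence decays and contingency changes can be tracked.
random.seed(2)
a0, b0 = 1.0, 1.0          # fixed initial Beta prior
w = 0.95                   # weight on the previous posterior
a, b = a0, b0
estimates = []
p_true = 0.8
for t in range(400):
    if t == 200:
        p_true = 0.2       # hidden contingency change
    a = w * a + (1 - w) * a0   # mix previous posterior with initial prior
    b = w * b + (1 - w) * b0
    x = 1.0 if random.random() < p_true else 0.0
    a, b = a + x, b + (1 - x)
    estimates.append(a / (a + b))
print(round(estimates[199], 2), round(estimates[-1], 2))
```

Because the pseudo-counts saturate at a finite effective sample size, the estimate tracks the pre-change rate (near 0.8) and then relaxes toward the post-change rate (near 0.2) instead of averaging over the whole history.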
Affiliation(s)
- Vincent Moens
- CoAction Lab, Institute of Neuroscience, Université Catholique de Louvain, Bruxelles, Belgium
- Alexandre Zénon
- CoAction Lab, Institute of Neuroscience, Université Catholique de Louvain, Bruxelles, Belgium
- INCIA, Université de Bordeaux, Bordeaux, France
|
21
|
Möller M, Bogacz R. Learning the payoffs and costs of actions. PLoS Comput Biol 2019; 15:e1006285. [PMID: 30818357 PMCID: PMC6413954 DOI: 10.1371/journal.pcbi.1006285] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2018] [Revised: 03/12/2019] [Accepted: 01/15/2019] [Indexed: 11/19/2022] Open
Abstract
A set of sub-cortical nuclei called basal ganglia is critical for learning the values of actions. The basal ganglia include two pathways, which have been associated with approach and avoid behavior respectively and are differentially modulated by dopamine projections from the midbrain. Inspired by the influential opponent actor learning model, we demonstrate that, under certain circumstances, these pathways may represent learned estimates of the positive and negative consequences (payoffs and costs) of individual actions. In the model, the level of dopamine activity encodes the motivational state and controls to what extent payoffs and costs enter the overall evaluation of actions. We show that a set of previously proposed plasticity rules is suitable to extract payoffs and costs from a prediction error signal if they occur at different moments in time. For those plasticity rules, successful learning requires differential effects of positive and negative outcome prediction errors on the two pathways and a weak decay of synaptic weights over trials. We also confirm through simulations that the model reproduces drug-induced changes of willingness to work, as observed in classical experiments with the D2-antagonist haloperidol. The basal ganglia are structures underneath the surface of the vertebrate brain, associated with error-driven learning. Much is known about the anatomical and biological features of the basal ganglia; scientists now try to understand the algorithms implemented by these structures. Numerous models aspire to capture the learning functionality, but many of them only cover some specific aspect of the algorithm. Instead of further adding to that pool of partial models, we unify two existing ones—one which captures what the basal ganglia learn, and one that describes the learning mechanism itself. The first model suggests that the basal ganglia weigh positive against negative consequences of actions according to the motivational state. 
It hints at how payoff and cost might be represented, but does not explain how those representations arise. The other model consists of biologically plausible plasticity rules, which describe how learning takes place, but not how the brain makes use of what is learned. We show that the two theories are compatible. Together, they form a model of learning and decision making that integrates the motivational state as well as the learned payoffs and costs of opportunities.
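As a rough illustration of the scheme this abstract describes (two pathway weights tracking payoffs and costs, asymmetric updates from positive and negative prediction errors, weak decay of synaptic weights, and motivation-weighted evaluation), here is a minimal Python sketch. The zero-reward baseline, scalar update rules, and parameter values are illustrative simplifications, not the model's actual plasticity rules:

```python
def simulate_payoff_cost_learning(payoff=2.0, cost=1.0, n_trials=2000,
                                  alpha=0.1, decay=0.01):
    """Two-pathway learning: G tracks payoffs, N tracks costs.

    Positive prediction errors increment the payoff weight G and negative
    prediction errors increment the cost weight N; both weights also decay
    weakly each step, as the abstract requires for successful learning.
    """
    G, N = 0.0, 0.0
    for _ in range(n_trials):
        # the cost is incurred first and the payoff arrives later, so the
        # two consequences occur at different moments in time
        for outcome in (-cost, payoff):
            delta = outcome  # prediction error against a zero baseline
            if delta > 0:
                G += alpha * delta      # "direct" pathway learns payoffs
            else:
                N += alpha * (-delta)   # "indirect" pathway learns costs
            G *= 1.0 - decay            # weak synaptic decay
            N *= 1.0 - decay
    return G, N

def evaluate(G, N, dopamine):
    """Dopamine level (motivational state) weighs payoff against cost."""
    return dopamine * G - (1.0 - dopamine) * N
```

With a high dopamine level the payoff term dominates and the action is evaluated positively; with a low level the cost term dominates, in keeping with the drug-induced changes of willingness to work that the model reproduces.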
Affiliation(s)
- Moritz Möller
- MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
22
Morita K, Kawaguchi Y. A Dual Role Hypothesis of the Cortico-Basal-Ganglia Pathways: Opponency and Temporal Difference Through Dopamine and Adenosine. Front Neural Circuits 2019; 12:111. [PMID: 30687019 PMCID: PMC6338031 DOI: 10.3389/fncir.2018.00111] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2018] [Accepted: 11/29/2018] [Indexed: 01/07/2023] Open
Abstract
The hypothesis that the basal-ganglia direct and indirect pathways represent goodness (or benefit) and badness (or cost) of options, respectively, explains a wide range of phenomena. However, this hypothesis, named the Opponent Actor Learning (OpAL), still has limitations. Structurally, the OpAL model does not incorporate differentiation of the two types of cortical inputs to the basal-ganglia pathways received from intratelencephalic (IT) and pyramidal-tract (PT) neurons. Functionally, the OpAL model does not describe the temporal-difference (TD)-type reward-prediction-error (RPE), nor explains how RPE is calculated in the circuitry connecting to the DA neurons. In fact, there is a different hypothesis on the basal-ganglia pathways and DA, named the Cortico-Striatal-Temporal-Difference (CS-TD) model. The CS-TD model differentiates the IT and PT inputs, describes the TD-type RPE, and explains how TD-RPE is calculated. However, a critical difficulty in this model lies in its assumption that DA induces the same direction of plasticity in both direct and indirect pathways, which apparently contradicts the experimentally observed opposite effects of DA on these pathways. Here, we propose a new hypothesis that integrates the OpAL and CS-TD models. Specifically, we propose that the IT-basal-ganglia pathways represent goodness/badness of current options while the PT-indirect pathway represents the overall value of the previously chosen option, and both of these have influence on the DA neurons, through the basal-ganglia output, so that a variant of TD-RPE is calculated. A key assumption is that opposite directions of plasticity are induced upon phasic activation of DA neurons in the IT-indirect pathway and PT-indirect pathway because of different profiles of IT and PT inputs. 
Specifically, at PT→indirect-pathway-medium-spiny-neuron (iMSN) synapses, sustained glutamatergic inputs generate rich adenosine, which allosterically prevents DA-D2 receptor signaling and instead favors adenosine-A2A receptor signaling. Then, phasic DA-induced phasic adenosine, which reflects TD-RPE, causes long-term synaptic potentiation. In contrast, at IT→iMSN synapses where adenosine is scarce, phasic DA causes long-term synaptic depression via D2 receptor signaling. This new Opponency and Temporal-Difference (OTD) model provides unique predictions, part of which is potentially in line with recently reported activity patterns of neurons in the globus pallidus externus on the indirect pathway.
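The qualitative sign structure of this hypothesized plasticity rule can be written down directly. The toy function below encodes only the directions stated above; the sign reversal for dopamine dips is our added assumption for symmetry, and this is not a biophysical model:

```python
def imsn_plasticity_direction(input_type, phasic_da):
    """Plasticity direction at indirect-pathway (iMSN) synapses under the
    OTD hypothesis: positive phasic dopamine potentiates adenosine-rich
    PT synapses (A2A signaling) but depresses adenosine-scarce IT
    synapses (D2 signaling).

    Returns +1 for potentiation, -1 for depression, 0 for no change.
    The reversal for negative phasic dopamine is an added assumption.
    """
    if phasic_da == 0:
        return 0
    sign = 1 if phasic_da > 0 else -1
    if input_type == "PT":
        return sign    # adenosine-mediated LTP with positive TD-RPE
    if input_type == "IT":
        return -sign   # D2-mediated LTD with phasic dopamine
    raise ValueError("input_type must be 'PT' or 'IT'")
```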
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan; International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study, Tokyo, Japan
- Yasuo Kawaguchi
- Division of Cerebral Circuitry, National Institute for Physiological Sciences, Okazaki, Japan; Department of Physiological Sciences, Graduate University for Advanced Studies, Okazaki, Japan
23
Hallquist MN, Dombrovski AY. Selective maintenance of value information helps resolve the exploration/exploitation dilemma. Cognition 2018; 183:226-243. [PMID: 30502584 DOI: 10.1016/j.cognition.2018.11.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2017] [Revised: 11/06/2018] [Accepted: 11/08/2018] [Indexed: 10/27/2022]
Abstract
In natural environments with many options of uncertain value, one faces a difficult tradeoff between exploiting familiar, valuable options or searching for better alternatives. Reinforcement learning models of this exploration/exploitation dilemma typically modulate the rate of exploratory choices or preferentially sample uncertain options. The extent to which such models capture human behavior remains unclear, in part because they do not consider the constraints on remembering what is learned. Using reinforcement-based timing as a motivating example, we show that selectively maintaining high-value actions compresses the amount of information to be tracked in learning, as quantified by Shannon's entropy. In turn, the information content of the value representation controls the balance between exploration (high entropy) and exploitation (low entropy). Selectively maintaining preferred action values while allowing others to decay renders the choices increasingly exploitative across learning episodes. To adjudicate among alternative maintenance and sampling strategies, we developed a new reinforcement learning model, StrategiC ExPloration/ExPloitation of Temporal Instrumental Contingencies (SCEPTIC). In computational studies, a resource-rational selective maintenance approach was as successful as more resource-intensive strategies. Furthermore, human behavior was consistent with selective maintenance; information compression was most pronounced in subjects with superior performance and non-verbal intelligence, and in learnable vs. unlearnable contingencies. Cognitively demanding uncertainty-directed exploration recovered a more accurate representation in simulations with no foraging advantage and was strongly unsupported in our human study.
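The core idea (letting unchosen action values decay compresses the value representation, so the entropy of the policy falls and behavior shifts toward exploitation) can be sketched in a few lines. This is a simplified discrete-option illustration with made-up parameters, not the radial-basis-function SCEPTIC model itself:

```python
import math
import random

def softmax(values, beta):
    """Softmax policy with inverse temperature beta."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    """Shannon entropy (nats) of a probability distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def selective_maintenance(true_rewards, n_trials=1000, alpha=0.2,
                          beta=3.0, gamma=0.05, seed=1):
    """Delta-rule learning in which unchosen values decay toward zero.

    gamma is the selective-decay rate; as the preferred value comes to
    dominate, the entropy of the softmax policy falls and choices become
    increasingly exploitative. All parameters are illustrative.
    """
    rng = random.Random(seed)
    v = [0.0] * len(true_rewards)
    for _ in range(n_trials):
        p = softmax(v, beta)
        c = rng.choices(range(len(v)), weights=p)[0]
        r = true_rewards[c] + rng.gauss(0.0, 0.1)
        v[c] += alpha * (r - v[c])       # update the chosen value
        for i in range(len(v)):
            if i != c:
                v[i] *= 1.0 - gamma      # decay unchosen values
    return v, entropy(softmax(v, beta))
```

After learning, the final policy entropy sits well below the uniform maximum of log(n options), which is the information-compression signature the study measures.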
Affiliation(s)
- Michael N Hallquist
- Penn State University, Department of Psychology, 309 Moore Building, University Park, PA 16801, USA; University of Pittsburgh, Department of Psychiatry, 3811 O'Hara St., BT 742, Pittsburgh, PA 15213, USA
- Alexandre Y Dombrovski
- University of Pittsburgh, Department of Psychiatry, 3811 O'Hara St., BT 742, Pittsburgh, PA 15213, USA
24
A Neural Circuit Mechanism for the Involvements of Dopamine in Effort-Related Choices: Decay of Learned Values, Secondary Effects of Depletion, and Calculation of Temporal Difference Error. eNeuro 2018; 5:eN-NWR-0021-18. [PMID: 29468191 PMCID: PMC5820541 DOI: 10.1523/eneuro.0021-18.2018] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2018] [Accepted: 01/11/2018] [Indexed: 12/17/2022] Open
Abstract
Dopamine has been suggested to be crucially involved in effort-related choices. Key findings are that dopamine depletion (i) changed preference for a high-cost, large-reward option to a low-cost, small-reward option, (ii) but not when the large-reward option was also low-cost or the small-reward option gave no reward, (iii) while increasing the latency in all the cases but only transiently, and (iv) that antagonism of either dopamine D1 or D2 receptors also specifically impaired selection of the high-cost, large-reward option. The underlying neural circuit mechanisms remain unclear. Here we show that findings i–iii can be explained by the dopaminergic representation of temporal-difference reward-prediction error (TD-RPE), whose mechanisms have now become clarified, if (1) the synaptic strengths storing the values of actions mildly decay in time and (2) the obtained-reward-representing excitatory input to dopamine neurons increases after dopamine depletion. The former is potentially caused by background neural activity–induced weak synaptic plasticity, and the latter is assumed to occur through post-depletion increase of neural activity in the pedunculopontine nucleus, where neurons representing obtained reward exist and presumably send excitatory projections to dopamine neurons. We further show that finding iv, which is nontrivial given the suggested distinct functions of the D1 and D2 corticostriatal pathways, can also be explained if we additionally assume a proposed mechanism of TD-RPE calculation, in which the D1 and D2 pathways encode the values of actions with a temporal difference. These results suggest a possible circuit mechanism for the involvements of dopamine in effort-related choices and, simultaneously, provide implications for the mechanisms of TD-RPE calculation.
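Assumption (1) above, a mild decay of stored action values alongside ordinary prediction-error learning, amounts to a one-line modification of a standard delta-rule update. This is a generic sketch with illustrative parameters and labels, not the paper's circuit model:

```python
def q_update_with_decay(q, chosen, reward, alpha=0.1, decay=0.02):
    """One trial of action-value learning with weak decay of all values.

    Every stored value relaxes toward zero each trial (the hypothesized
    background-activity-induced weak plasticity), so values must be
    continually re-earned; only the chosen action gets a delta-rule update.
    """
    q = {a: (1.0 - decay) * v for a, v in q.items()}  # weak forgetting
    q[chosen] += alpha * (reward - q[chosen])         # delta rule
    return q
```

Because of the decay term, learned values settle below the true reward and fall whenever an action stops being chosen, which is what lets the model capture transient post-depletion changes in choice and latency.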
25
Colas JT, Pauli WM, Larsen T, Tyszka JM, O’Doherty JP. Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI. PLoS Comput Biol 2017; 13:e1005810. [PMID: 29049406 PMCID: PMC5673235 DOI: 10.1371/journal.pcbi.1005810] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2017] [Revised: 11/06/2017] [Accepted: 10/09/2017] [Indexed: 11/19/2022] Open
Abstract
Prediction-error signals consistent with formal models of "reinforcement learning" (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models-namely, "actor/critic" models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.
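The two prediction-error signatures contrasted here differ only in which value estimates enter the temporal-difference error. Their textbook forms, written out for reference (function and argument names are ours):

```python
def state_value_prediction_error(reward, v_next, v_current, gamma=1.0):
    """SVPE: TD error over state values, independent of the chosen action;
    the critic's teaching signal in an actor/critic architecture."""
    return reward + gamma * v_next - v_current

def action_value_prediction_error(reward, q_next, q_current, gamma=1.0):
    """AVPE: TD error over the value of the chosen action, the signature
    of direct action-value learning; in Q-learning, q_next is the maximum
    action value available in the next state."""
    return reward + gamma * q_next - q_current
```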
Affiliation(s)
- Jaron T. Colas
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, CA, United States of America
- Wolfgang M. Pauli
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, CA, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, United States of America
- Tobias Larsen
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, United States of America
- Center for Mind/Brain Sciences, University of Trento, Trento, Italy
- J. Michael Tyszka
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, United States of America
- John P. O’Doherty
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, CA, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, United States of America