1
Hitchcock PF, Frank MJ. The challenge of learning adaptive mental behavior. Journal of Psychopathology and Clinical Science 2024; 133:413-426. [PMID: 38815082 PMCID: PMC11229419 DOI: 10.1037/abn0000924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Many psychotherapies aim to help people replace maladaptive mental behaviors (such as those leading to unproductive worry) with more adaptive ones (such as those leading to active problem solving). Yet, little is known empirically about how challenging it is to learn adaptive mental behaviors. Mental behaviors entail carrying out mental operations and thus may be more challenging to perform than motor actions; this challenge may enhance or impair learning. In particular, challenge when learning is often desirable because it improves retention. Yet, it is also plausible that the necessity of carrying out mental operations interferes with learning the expected values of mental actions by impeding credit assignment: the process of updating an action's value after reinforcement. Then, it may be more challenging not only to perform, but also to learn the consequences of, mental (vs. motor) behaviors. We designed a task to assess learning to take adaptive mental versus motor actions via matched probabilistic feedback. In two experiments (N = 300), most participants found it more difficult to learn to select optimal mental (vs. motor) actions, as evident in worse accuracy not only in a learning phase but also in a test (retention) phase. Computational modeling traced this impairment to an indicator of worse credit assignment (impaired construction and maintenance of expected values) when learning mental actions, accounting for worse accuracy in the learning and retention phases. The results suggest that people have particular difficulty learning adaptive mental behavior and pave the way for novel interventions to scaffold credit assignment and promote adaptive thinking.
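As a rough illustration of the credit assignment step this abstract refers to (updating an action's value after reinforcement), the following Python sketch applies a simple delta rule in a two-option task with probabilistic feedback. It is not the authors' computational model; the function names, the learning rate `alpha`, the exploration rate `epsilon`, and the reward probabilities are all illustrative assumptions.

```python
import random


def update_value(q, action, reward, alpha=0.1):
    """Delta-rule update: move the chosen action's value toward the reward.

    Returns the prediction error (reward minus the prior expected value).
    """
    prediction_error = reward - q[action]
    q[action] += alpha * prediction_error
    return prediction_error


def simulate(p_reward=(0.8, 0.2), n_trials=500, alpha=0.1, epsilon=0.1, seed=0):
    """Learn the values of two actions from probabilistic feedback.

    p_reward, alpha, and epsilon are hypothetical settings, not values
    taken from the study.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]  # expected value of each action
    for _ in range(n_trials):
        # Epsilon-greedy choice: mostly exploit, occasionally explore.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if q[0] >= q[1] else 1
        # Probabilistic feedback: reward of 1 with the action's probability.
        reward = 1.0 if rng.random() < p_reward[action] else 0.0
        update_value(q, action, reward, alpha)
    return q


if __name__ == "__main__":
    print(simulate())
```

In a sketch like this, a smaller `alpha` means each outcome changes the stored value less, which is one simple way to picture weaker credit assignment; whether that maps onto the indicator reported in the paper is not something this example establishes.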
Affiliation(s)
- Peter F. Hitchcock
- Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI
- Department of Psychology, Emory University, Atlanta GA
- Michael J. Frank
- Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI
- Carney Institute for Brain Science, Brown University, Providence, RI
2
Wurm F, Ernst B, Steinhauser M. Surprise-minimization as a solution to the structural credit assignment problem. PLoS Comput Biol 2024; 20:e1012175. [PMID: 38805546 PMCID: PMC11175464 DOI: 10.1371/journal.pcbi.1012175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 06/13/2024] [Accepted: 05/18/2024] [Indexed: 05/30/2024] Open
Abstract
The structural credit assignment problem arises when the causal structure between actions and subsequent outcomes is hidden from direct observation. To solve this problem and enable goal-directed behavior, an agent has to infer structure and form a representation thereof. In the scope of this study, we investigate a possible solution in the human brain. We recorded behavioral and electrophysiological data from human participants in a novel variant of the bandit task, where multiple actions lead to multiple outcomes. Crucially, the mapping between actions and outcomes was hidden and not instructed to the participants. Human choice behavior revealed clear hallmarks of credit assignment and learning. Moreover, a computational model that formalizes action selection as the competition between multiple representations of the hidden structure was fit to account for participants' data. Starting in a state of uncertainty about the correct representation, the central mechanism of this model is the arbitration of action control towards the representation that minimizes surprise about outcomes. Crucially, single-trial latent-variable analysis reveals that the neural patterns clearly support central quantitative predictions of this surprise minimization model. The results suggest that neural activity is not only related to reinforcement learning under correct as well as incorrect task representations but also reflects central mechanisms of credit assignment and behavioral arbitration.
Affiliation(s)
- Franz Wurm
- Catholic University of Eichstätt-Ingolstadt, Eichstätt, Germany
- Leiden University, Leiden, the Netherlands
- Leiden Institute for Brain and Cognition, Leiden University, Leiden, the Netherlands
- Benjamin Ernst
- Catholic University of Eichstätt-Ingolstadt, Eichstätt, Germany
3
Zimmerman CA, Pan-Vazquez A, Wu B, Keppler EF, Guthman EM, Fetcho RN, Bolkan SS, McMannon B, Lee J, Hoag AT, Lynch LA, Janarthanan SR, López Luna JF, Bondy AG, Falkner AL, Wang SSH, Witten IB. A neural mechanism for learning from delayed postingestive feedback. bioRxiv 2024:2023.10.06.561214. [PMID: 37873112 PMCID: PMC10592633 DOI: 10.1101/2023.10.06.561214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Animals learn the value of foods based on their postingestive effects and thereby develop aversions to foods that are toxic [1-6] and preferences to those that are nutritious [7-14]. However, it remains unclear how the brain is able to assign credit to flavors experienced during a meal with postingestive feedback signals that can arise after a substantial delay. Here, we reveal an unexpected role for postingestive reactivation of neural flavor representations in this temporal credit assignment process. To begin, we leverage the fact that mice learn to associate novel [15-18], but not familiar, flavors with delayed gastric malaise signals to investigate how the brain represents flavors that support aversive postingestive learning. Surveying cellular resolution brainwide activation patterns reveals that a network of amygdala regions is unique in being preferentially activated by novel flavors across every stage of the learning process: the initial meal, delayed malaise, and memory retrieval. By combining high-density recordings in the amygdala with optogenetic stimulation of genetically defined hindbrain malaise cells, we find that postingestive malaise signals potently and specifically reactivate amygdalar novel flavor representations from a recent meal. The degree of malaise-driven reactivation of individual neurons predicts strengthening of flavor responses upon memory retrieval, leading to stabilization of the population-level representation of the recently consumed flavor. In contrast, meals without postingestive consequences degrade neural flavor representations as flavors become familiar and safe. Thus, our findings demonstrate that interoceptive reactivation of amygdalar flavor representations provides a neural mechanism to resolve the temporal credit assignment problem inherent to postingestive learning.
Affiliation(s)
- Bichan Wu
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Emma F Keppler
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Eartha Mae Guthman
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Robert N Fetcho
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Scott S Bolkan
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Brenna McMannon
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Junuk Lee
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Austin T Hoag
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Laura A Lynch
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Juan F López Luna
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Adrian G Bondy
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Annegret L Falkner
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Samuel S-H Wang
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Ilana B Witten
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
4
Lamba A, Nassar MR, FeldmanHall O. Prefrontal cortex state representations shape human credit assignment. eLife 2023; 12:e84888. [PMID: 37399050 PMCID: PMC10351919 DOI: 10.7554/elife.84888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Accepted: 06/16/2023] [Indexed: 07/04/2023] Open
Abstract
People learn adaptively from feedback, but the rate of such learning differs drastically across individuals and contexts. Here, we examine whether this variability reflects differences in what is learned. Leveraging a neurocomputational approach that merges fMRI and an iterative reward learning task, we link the specificity of credit assignment (how well people are able to appropriately attribute outcomes to their causes) to the precision of neural codes in the prefrontal cortex (PFC). Participants credit task-relevant cues more precisely in social compared to nonsocial contexts, a process that is mediated by high-fidelity (i.e., distinct and consistent) state representations in the PFC. Specifically, the medial PFC and orbitofrontal cortex work in concert to match the neural codes from feedback to those at choice, and the strength of these common neural codes predicts credit assignment precision. Together, this work provides a window into how neural representations drive adaptive learning.
Affiliation(s)
- Amrita Lamba
- Department of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, United States
- Matthew R Nassar
- Department of Neuroscience, Brown University, Providence, United States
- Carney Institute of Brain Sciences, Brown University, Providence, United States
- Oriel FeldmanHall
- Department of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, United States
- Carney Institute of Brain Sciences, Brown University, Providence, United States
5
Csorba BA, Krause MR, Zanos TP, Pack CC. Long-range cortical synchronization supports abrupt visual learning. Curr Biol 2022; 32:2467-2479.e4. [PMID: 35523181 DOI: 10.1016/j.cub.2022.04.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Revised: 03/08/2022] [Accepted: 04/12/2022] [Indexed: 11/29/2022]
Abstract
Visual plasticity declines sharply after the critical period, yet we easily learn to recognize new faces and places, even as adults. Such learning is often characterized by a "moment of insight," an abrupt and dramatic improvement in recognition. The mechanisms that support abrupt learning are unknown, but one hypothesis is that they involve changes in synchronization between brain regions. To test this hypothesis, we used a behavioral task in which non-human primates rapidly learned to recognize novel images and to associate them with specific responses. Simultaneous recordings from inferotemporal and prefrontal cortices revealed a transient synchronization of neural activity between these areas that peaked around the moment of insight. Synchronization was strongest between inferotemporal sites that encoded images and reward-sensitive prefrontal sites. Moreover, its magnitude intensified gradually over image exposures, suggesting that abrupt learning is the culmination of a search for informative signals within a circuit linking sensory information to task demands.
Affiliation(s)
- Bennett A Csorba
- Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
- Matthew R Krause
- Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
- Christopher C Pack
- Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
6
Parker NF, Baidya A, Cox J, Haetzel LM, Zhukovskaya A, Murugan M, Engelhard B, Goldman MS, Witten IB. Choice-selective sequences dominate in cortical relative to thalamic inputs to NAc to support reinforcement learning. Cell Rep 2022; 39:110756. [PMID: 35584665 PMCID: PMC9218875 DOI: 10.1016/j.celrep.2022.110756] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/28/2019] [Revised: 02/18/2022] [Accepted: 04/07/2022] [Indexed: 11/25/2022] Open
Abstract
How are actions linked with subsequent outcomes to guide choices? The nucleus accumbens, which is implicated in this process, receives glutamatergic inputs from the prelimbic cortex and midline regions of the thalamus. However, little is known about whether and how representations differ across these input pathways. By comparing these inputs during a reinforcement learning task in mice, we discovered that prelimbic cortical inputs preferentially represent actions and choices, whereas midline thalamic inputs preferentially represent cues. Choice-selective activity in the prelimbic cortical inputs is organized in sequences that persist beyond the outcome. Through computational modeling, we demonstrate that these sequences can support the neural implementation of reinforcement-learning algorithms, in both a circuit model based on synaptic plasticity and one based on neural dynamics. Finally, we test and confirm a prediction of our circuit models by direct manipulation of nucleus accumbens input neurons.
Affiliation(s)
- Nathan F Parker
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Avinash Baidya
- Center for Neuroscience, University of California, Davis, Davis, CA 95616, USA; Department of Physics and Astronomy, University of California, Davis, Davis, CA 95616, USA
- Julia Cox
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA; Department of Neuroscience, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Laura M Haetzel
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Anna Zhukovskaya
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Malavika Murugan
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Ben Engelhard
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Mark S Goldman
- Center for Neuroscience, University of California, Davis, Davis, CA 95616, USA; Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, CA 95616, USA; Department of Ophthalmology and Vision Science, University of California, Davis, Davis, CA 95616, USA
- Ilana B Witten
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA; Department of Psychology, Princeton University, Princeton, NJ 08544, USA
7
Costa RM, Baxter DA, Byrne JH. Neuronal population activity dynamics reveal a low-dimensional signature of operant learning in Aplysia. Commun Biol 2022; 5:90. [PMID: 35075264 PMCID: PMC8786933 DOI: 10.1038/s42003-022-03044-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/28/2021] [Accepted: 01/07/2022] [Indexed: 11/24/2022] Open
Abstract
Learning engages a high-dimensional neuronal population space spanning multiple brain regions. However, it remains unknown whether it is possible to identify a low-dimensional signature associated with operant conditioning, a ubiquitous form of learning in which animals learn from the consequences of behavior. Using single-neuron resolution voltage imaging, here we identify two low-dimensional motor modules in the neuronal population underlying Aplysia feeding. Our findings point to a temporal shift in module recruitment as the primary signature of operant learning. Our findings can help guide characterization of learning signatures in systems in which only a smaller fraction of the relevant neuronal population can be monitored. Costa et al. use single-neuron resolution voltage imaging to identify two low-dimensional motor modules in the neuronal population underlying Aplysia feeding. Their findings point to a temporal shift in module recruitment as the primary signature of operant learning.
8
Murray EA, Fellows LK. Prefrontal cortex interactions with the amygdala in primates. Neuropsychopharmacology 2022; 47:163-179. [PMID: 34446829 PMCID: PMC8616954 DOI: 10.1038/s41386-021-01128-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 05/29/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023]
Abstract
This review addresses functional interactions between the primate prefrontal cortex (PFC) and the amygdala, with emphasis on their contributions to behavior and cognition. The interplay between these two telencephalic structures contributes to adaptive behavior and to the evolutionary success of all primate species. In our species, dysfunction in this circuitry creates vulnerabilities to psychopathologies. Here, we describe amygdala-PFC contributions to behaviors that have direct relevance to Darwinian fitness: learned approach and avoidance, foraging, predator defense, and social signaling, which have in common the need for flexibility and sensitivity to specific and rapidly changing contexts. Examples include the prediction of positive outcomes, such as food availability, food desirability, and various social rewards, or of negative outcomes, such as threats of harm from predators or conspecifics. To promote fitness optimally, these stimulus-outcome associations need to be rapidly updated when an associative contingency changes or when the value of a predicted outcome changes. We review evidence from nonhuman primates implicating the PFC, the amygdala, and their functional interactions in these processes, with links to experimental work and clinical findings in humans where possible.
Affiliation(s)
- Lesley K Fellows
- Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
9
Martinez MC, Zold CL, Coletti MA, Murer MG, Belluscio MA. Dorsal striatum coding for the timely execution of action sequences. eLife 2022; 11:74929. [PMID: 36426715 PMCID: PMC9699698 DOI: 10.7554/elife.74929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Accepted: 10/27/2022] [Indexed: 11/27/2022] Open
Abstract
The automatic initiation of actions can be highly functional. But occasionally these actions cannot be withheld and are released at inappropriate times, impulsively. Striatal activity has been shown to participate in the timing of action sequence initiation and it has been linked to impulsivity. Using a self-initiated task, we trained adult male rats to withhold a rewarded action sequence until a waiting time interval has elapsed. By analyzing neuronal activity we show that the striatal response preceding the initiation of the learned sequence is strongly modulated by the time subjects wait before eliciting the sequence. Interestingly, the modulation is steeper in adolescent rats, which show a strong prevalence of impulsive responses compared to adults. We hypothesize this anticipatory striatal activity reflects the animals’ subjective reward expectation, based on the elapsed waiting time, while the steeper waiting modulation in adolescence reflects age-related differences in temporal discounting, internal urgency states, or explore–exploit balance.
Affiliation(s)
- Maria Cecilia Martinez
- Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Fisiología, Biología Molecular y Celular “Dr. Héctor Maldonado”, Buenos Aires, Argentina; Universidad de Buenos Aires - CONICET, Instituto de Fisiología y Biofísica “Dr. Bernardo Houssay” (IFIBIO-Houssay), Grupo de Neurociencia de Sistemas, Buenos Aires, Argentina
- Camila Lidia Zold
- Universidad de Buenos Aires - CONICET, Instituto de Fisiología y Biofísica “Dr. Bernardo Houssay” (IFIBIO-Houssay), Grupo de Neurociencia de Sistemas, Buenos Aires, Argentina; Universidad de Buenos Aires, Facultad de Ciencias Médicas, Departamento de Fisiología, Buenos Aires, Argentina
- Marcos Antonio Coletti
- Universidad de Buenos Aires - CONICET, Instituto de Fisiología y Biofísica “Dr. Bernardo Houssay” (IFIBIO-Houssay), Grupo de Neurociencia de Sistemas, Buenos Aires, Argentina; Universidad de Buenos Aires, Facultad de Ciencias Médicas, Departamento de Fisiología, Buenos Aires, Argentina
- Mario Gustavo Murer
- Universidad de Buenos Aires - CONICET, Instituto de Fisiología y Biofísica “Dr. Bernardo Houssay” (IFIBIO-Houssay), Grupo de Neurociencia de Sistemas, Buenos Aires, Argentina; Universidad de Buenos Aires, Facultad de Ciencias Médicas, Departamento de Fisiología, Buenos Aires, Argentina
- Mariano Andrés Belluscio
- Universidad de Buenos Aires - CONICET, Instituto de Fisiología y Biofísica “Dr. Bernardo Houssay” (IFIBIO-Houssay), Grupo de Neurociencia de Sistemas, Buenos Aires, Argentina; Universidad de Buenos Aires, Facultad de Ciencias Médicas, Departamento de Fisiología, Buenos Aires, Argentina
10
Anatomical dissociation of intracerebral signals for reward and punishment prediction errors in humans. Nat Commun 2021; 12:3344. [PMID: 34099678 PMCID: PMC8184756 DOI: 10.1038/s41467-021-23704-w] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/29/2020] [Accepted: 05/06/2021] [Indexed: 11/17/2022] Open
Abstract
Whether maximizing rewards and minimizing punishments rely on distinct brain systems remains debated, given inconsistent results coming from human neuroimaging and animal electrophysiology studies. Bridging the gap across techniques, we recorded intracerebral activity from twenty participants while they performed an instrumental learning task. We found that both reward and punishment prediction errors (PE), estimated from computational modeling of choice behavior, correlate positively with broadband gamma activity (BGA) in several brain regions. In all cases, BGA scaled positively with the outcome (reward or punishment versus nothing) and negatively with the expectation (predictability of reward or punishment). However, reward PE were better signaled in some regions (such as the ventromedial prefrontal and lateral orbitofrontal cortex), and punishment PE in other regions (such as the anterior insula and dorsolateral prefrontal cortex). These regions might therefore belong to brain systems that differentially contribute to the repetition of rewarded choices and the avoidance of punished choices. Whether maximizing rewards and minimizing punishments rely on distinct brain learning systems remains debated. Here, using intracerebral recordings in humans, the authors provide evidence for brain regions differentially engaged in signaling reward and punishment prediction errors that prescribe repetition versus avoidance of past choices.
11
Active maintenance of eligibility trace in rodent prefrontal cortex. Sci Rep 2020; 10:18860. [PMID: 33139778 PMCID: PMC7608665 DOI: 10.1038/s41598-020-75820-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/10/2020] [Accepted: 09/29/2020] [Indexed: 12/05/2022] Open
Abstract
Even though persistent neural activity has been proposed as a mechanism for maintaining eligibility trace, direct empirical evidence for active maintenance of eligibility trace has been lacking. We recorded neuronal activity in the medial prefrontal cortex (mPFC) in rats performing a dynamic foraging task in which a choice must be remembered until its outcome on the timescale of seconds for correct credit assignment. We found that mPFC neurons maintain significant choice signals during the time period between action selection and choice outcome. We also found that neural signals for choice, outcome, and action value converge in the mPFC when choice outcome was revealed. Our results indicate that the mPFC maintains choice signals necessary for temporal credit assignment in the form of persistent neural activity in our task. They also suggest that the mPFC might update action value by combining actively maintained eligibility trace with action value and outcome signals.
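The idea of combining a maintained eligibility trace with action value and outcome signals can be sketched as follows. This is a textbook-style eligibility-trace update written in Python, not the analysis used in the study; the exponential `decay`, the learning rate `alpha`, and the function name `update_after_delay` are assumptions made for illustration.

```python
def update_after_delay(values, choice, reward, delay_steps, decay=0.9, alpha=0.2):
    """Update action values once a delayed outcome arrives.

    values      : list of current action values
    choice      : index of the action taken at the start of the delay
    reward      : outcome delivered after the delay
    delay_steps : number of time steps between choice and outcome
    decay, alpha: hypothetical trace-decay and learning-rate parameters
    """
    # Mark the chosen action as eligible for credit.
    trace = [0.0] * len(values)
    trace[choice] = 1.0

    # Carry the trace across the delay; it fades a little at each step.
    for _ in range(delay_steps):
        trace = [decay * e for e in trace]

    # At outcome time, combine the outcome, the stored value, and the trace.
    prediction_error = reward - values[choice]
    return [v + alpha * prediction_error * e for v, e in zip(values, trace)]


if __name__ == "__main__":
    print(update_after_delay([0.0, 0.0], choice=0, reward=1.0, delay_steps=2))
```

With `decay=1.0` the trace is held at full strength across the delay, which is closer in spirit to a persistently maintained choice signal; values below 1.0 model a trace that weakens before the outcome is revealed.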
12
Shen X, Zhang X, Huang Y, Chen S, Wang Y. Reinforcement Learning based Decoding Using Internal Reward for Time Delayed Task in Brain Machine Interfaces. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2020; 2020:3351-3354. [PMID: 33018722 DOI: 10.1109/embc44109.2020.9175964] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
Reinforcement learning (RL) algorithms interpret neural signals into movement intentions under the guidance of reward in brain-machine interfaces (BMIs). Current RL algorithms generally work for tasks with immediate reward delivery and lack efficiency in delayed-reward tasks. The prefrontal cortex, including the medial prefrontal cortex (mPFC), has been demonstrated to assign credit to intermediate steps, which reinforces preceding actions more efficiently. In this paper, we propose to simulate the functionality of mPFC activity as intermediate rewards to train an RL-based decoder in a two-step movement task. A support vector machine (SVM) is adopted to determine from mPFC activity whether the subject expects a reward due to the accomplishment of a subtask. This discrimination result is then used to guide the training of the RL decoder for each step. Here, we apply Sarsa-style attention-gated reinforcement learning (SAGREL) as the decoder to interpret primary motor cortex (M1) activity into action states. We test on in vivo M1 and mPFC data collected from rats, where the rats first need to trigger the start and then press a lever for rewards using M1 signals. SAGREL using intermediate rewards from mPFC activity achieves a prediction accuracy of 66.8% ± 2.0% (mean ± std), which is significantly better than that obtained using only the reward at the end of the trial (45.9% ± 1.2%). This reveals the potential of modeling mPFC activity as intermediate rewards for delayed-reward tasks.
13
Phase of firing coding of learning variables across the fronto-striatal network during feature-based learning. Nat Commun 2020; 11:4669. [PMID: 32938940 PMCID: PMC7495418 DOI: 10.1038/s41467-020-18435-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/25/2020] [Accepted: 08/24/2020] [Indexed: 11/26/2022] Open
Abstract
The prefrontal cortex and striatum form a recurrent network whose spiking activity encodes multiple types of learning-relevant information. This spike-encoded information is evident in average firing rates, but finer temporal coding might allow multiplexing and enhanced readout across the connected network. We tested this hypothesis in the fronto-striatal network of nonhuman primates during reversal learning of feature values. We found that populations of neurons encoding choice outcomes, outcome prediction errors, and outcome history in their firing rates also carry significant information in their phase-of-firing at a 10–25 Hz band-limited beta frequency at which they synchronize across lateral prefrontal cortex, anterior cingulate cortex and anterior striatum when outcomes were processed. The phase-of-firing code exceeds information that can be obtained from firing rates alone and is evident for inter-areal connections between anterior cingulate cortex, lateral prefrontal cortex and anterior striatum. For the majority of connections, the phase-of-firing information gain is maximal at phases of the beta cycle that were offset from the preferred spiking phase of neurons. Taken together, these findings document enhanced information of three important learning variables at specific phases of firing in the beta cycle at an inter-areally shared beta oscillation frequency during goal-directed behavior. The average spiking frequency in the fronto-striatal network encodes multiple types of learning-relevant information. Here, the authors show that populations of neurons in non-human primates also carry significant information in their phase-of-firing when learning-relevant outcomes are processed.
14
Specializations for reward-guided decision-making in the primate ventral prefrontal cortex. Nat Rev Neurosci 2018; 19:404-417. [PMID: 29795133 DOI: 10.1038/s41583-018-0013-4] [Citation(s) in RCA: 107] [Impact Index Per Article: 21.4] [Indexed: 01/09/2023]
Abstract
The estimated values of choices, and therefore decision-making based on those values, are influenced by both the chance that the chosen items or goods can be obtained (availability) and their current worth (desirability) as well as by the ability to link the estimated values to choices (a process sometimes called credit assignment). In primates, the prefrontal cortex (PFC) has been thought to contribute to each of these processes; however, causal relationships between particular subdivisions of the PFC and specific functions have been difficult to establish. Recent lesion-based research studies have defined the roles of two different parts of the primate PFC - the orbitofrontal cortex (OFC) and the ventral lateral frontal cortex (VLFC) - and their subdivisions in evaluating each of these factors and in mediating credit assignment during reward-based decision-making.
15
Oemisch M, Westendorff S, Azimi M, Hassani SA, Ardid S, Tiesinga P, Womelsdorf T. Feature-specific prediction errors and surprise across macaque fronto-striatal circuits. Nat Commun 2019; 10:176. [PMID: 30635579 PMCID: PMC6329800 DOI: 10.1038/s41467-018-08184-9] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 05/01/2018] [Accepted: 12/20/2018] [Indexed: 01/23/2023] Open
Abstract
To adjust expectations efficiently, prediction errors need to be associated with the precise features that gave rise to the unexpected outcome, but this credit assignment may be problematic if stimuli differ on multiple dimensions and it is ambiguous which feature dimension caused the outcome. Here, we report a potential solution: neurons in four recorded areas of the anterior fronto-striatal networks encode prediction errors that are specific to feature values of different dimensions of attended multidimensional stimuli. The most ubiquitous prediction error occurred for the reward-relevant dimension. Feature-specific prediction error signals a) emerge on average shortly after non-specific prediction error signals, b) arise earliest in the anterior cingulate cortex and later in dorsolateral prefrontal cortex, caudate and ventral striatum, and c) contribute to feature-based stimulus selection after learning. Thus, a widely-distributed feature-specific eligibility trace may be used to update synaptic weights for improved feature-based attention. In order to adjust expectations efficiently, prediction errors need to be associated with the features that gave rise to the unexpected outcome. Here, the authors show that neurons in anterior fronto-striatal networks encode prediction errors that are specific to feature values of different stimulus dimensions.
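The idea of a prediction error tied to individual feature values can be illustrated with a generic delta-rule update. This is a minimal sketch, not the analysis or model used in the study: the function `update_feature_values`, the learning rate `alpha`, and the toy colour/location features are all assumptions introduced here for illustration.

```python
def update_feature_values(values, chosen_features, reward, alpha=0.2):
    """Apply a delta-rule update to each feature of the chosen stimulus.

    Returns the per-feature prediction errors (reward minus prior value).
    The update rule and alpha are illustrative assumptions.
    """
    errors = {}
    for feature in chosen_features:
        errors[feature] = reward - values[feature]
        values[feature] += alpha * errors[feature]
    return errors


# Hypothetical feature values for two dimensions: colour and location.
values = {"red": 0.0, "green": 0.0, "left": 0.0, "right": 0.0}

# Hypothetical block: only colour predicts reward (red -> 1, green -> 0),
# while location is balanced across colours and therefore uninformative.
trials = [("red", "left"), ("green", "left"),
          ("red", "right"), ("green", "right")] * 25

for colour, location in trials:
    reward = 1.0 if colour == "red" else 0.0
    update_feature_values(values, (colour, location), reward)

print(values)
```

In this toy setting the values of the reward-relevant dimension separate (red approaches 1, green stays at 0), while the location values hover at an intermediate level, which is one simple way to picture why errors specific to the relevant dimension are the informative ones.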
Affiliation(s)
- Mariann Oemisch
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada; Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06510, USA
- Stephanie Westendorff
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada; Institute of Neurobiology, University of Tübingen, Tübingen, 72076, Germany
- Marzyeh Azimi
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada
- Seyed Alireza Hassani
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada; Department of Psychology, Vanderbilt University, Nashville, TN, 37240, USA
- Salva Ardid
- Department of Mathematics and Statistics, Boston University, Boston, MA, 02215, USA
- Paul Tiesinga
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, 6525 EN, Netherlands
- Thilo Womelsdorf
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada; Department of Psychology, Vanderbilt University, Nashville, TN, 37240, USA
|
16
|
Gmaz JM, Carmichael JE, van der Meer MA. Persistent coding of outcome-predictive cue features in the rat nucleus accumbens. eLife 2018; 7:37275. [PMID: 30234485 PMCID: PMC6195350 DOI: 10.7554/elife.37275] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Accepted: 09/15/2018] [Indexed: 01/09/2023] Open
Abstract
The nucleus accumbens (NAc) is important for learning from feedback, and for biasing and invigorating behaviour in response to cues that predict motivationally relevant outcomes. NAc encodes outcome-related cue features such as the magnitude and identity of reward. However, little is known about how features of cues themselves are encoded. We designed a decision making task where rats learned multiple sets of outcome-predictive cues, and recorded single-unit activity in the NAc during performance. We found that coding of cue identity and location occurred alongside coding of expected outcome. Furthermore, this coding persisted both during a delay period, after the rat made a decision and was waiting for an outcome, and after the outcome was revealed. Encoding of cue features in the NAc may enable contextual modulation of on-going behaviour, and provide an eligibility trace of outcome-predictive stimuli for updating stimulus-outcome associations to inform future behaviour.
Affiliation(s)
- Jimmie M Gmaz
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, United States
- James E Carmichael
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, United States
|
17
|
Monitoring and Updating of Action Selection for Goal-Directed Behavior through the Striatal Direct and Indirect Pathways. Neuron 2018; 99:1302-1314.e5. [PMID: 30146299 DOI: 10.1016/j.neuron.2018.08.002] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2018] [Revised: 06/29/2018] [Accepted: 08/01/2018] [Indexed: 12/28/2022]
Abstract
The basal ganglia play key roles in adaptive behaviors guided by reward and punishment. However, despite accumulating knowledge, few studies have tested how heterogeneous signals in the basal ganglia are organized and coordinated for goal-directed behavior. In this study, we investigated neuronal signals of the direct and indirect pathways of the basal ganglia as rats performed a lever push/pull task for a probabilistic reward. In the dorsomedial striatum, we found that optogenetically and electrophysiologically identified direct pathway neurons encoded reward outcomes, whereas indirect pathway neurons encoded no-reward outcome and next-action selection. Outcome coding occurred in association with the chosen action. In support of pathway-specific neuronal coding, light activation induced a bias on repeat selection of the same action in the direct pathway, but on switch selection in the indirect pathway. Our data reveal the mechanisms underlying monitoring and updating of action selection for goal-directed behavior through basal ganglia circuits.
|
18
|
Massi B, Donahue CH, Lee D. Volatility Facilitates Value Updating in the Prefrontal Cortex. Neuron 2018; 99:598-608.e4. [PMID: 30033151 DOI: 10.1016/j.neuron.2018.06.033] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2018] [Revised: 05/13/2018] [Accepted: 06/21/2018] [Indexed: 01/24/2023]
Abstract
Adaptation of learning and decision-making might depend on the regulation of activity in the prefrontal cortex. Here we examined how volatility of reward probabilities influences learning and neural activity in the primate prefrontal cortex. We found that animals selected recently rewarded targets more often when reward probabilities of different options fluctuated across trials than when they were fixed. Additionally, neurons in the orbitofrontal cortex displayed more sustained activity related to the outcomes of their previous choices when reward probabilities changed over time. Such volatility also enhanced signals in the dorsolateral prefrontal cortex related to the current but not the previous location of the previously rewarded target. These results suggest that prefrontal activity related to choice and reward is dynamically regulated by the volatility of the environment and underscore the role of the prefrontal cortex in identifying aspects of the environment that are responsible for previous outcomes and should be learned.
Affiliation(s)
- Bart Massi
- Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT 06510, USA
- Daeyeol Lee
- Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT 06510, USA; Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA; Department of Psychiatry, Yale School of Medicine, New Haven, CT 06510, USA; Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA; Department of Psychology, Yale University, New Haven, CT 06520, USA.
|
19
|
Timme NM, Lapish C. A Tutorial for Information Theory in Neuroscience. eNeuro 2018; 5:ENEURO.0052-18.2018. [PMID: 30211307 PMCID: PMC6131830 DOI: 10.1523/eneuro.0052-18.2018] [Citation(s) in RCA: 92] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2018] [Revised: 04/10/2018] [Accepted: 05/30/2018] [Indexed: 11/21/2022] Open
Abstract
Understanding how neural systems integrate, encode, and compute information is central to understanding brain function. Frequently, data from neuroscience experiments are multivariate, the interactions between the variables are nonlinear, and the landscape of hypothesized or possible interactions between variables is extremely broad. Information theory is well suited to address these types of data, as it possesses multivariate analysis tools, it can be applied to many different types of data, it can capture nonlinear interactions, and it does not require assumptions about the structure of the underlying data (i.e., it is model independent). In this article, we walk through the mathematics of information theory along with common logistical problems associated with data type, data binning, data quantity requirements, bias, and significance testing. Next, we analyze models inspired by canonical neuroscience experiments to improve understanding and demonstrate the strengths of information theory analyses. To facilitate the use of information theory analyses, and an understanding of how these analyses are implemented, we also provide a free MATLAB software package that can be applied to a wide range of data from neuroscience experiments, as well as from other fields of study.
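As a concrete illustration of the kind of quantity such analyses estimate, the following is a minimal Python sketch of a plug-in (histogram-based) mutual information estimate for paired discrete samples. It is not the MATLAB package described in the abstract: the function name `mutual_information` and the toy stimulus and spike-count data are assumptions made here for illustration only.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and equally long")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: a binary stimulus and a binarized spike count.
stimulus = [0, 0, 1, 1, 0, 1, 0, 1]
spikes_dependent = [0, 0, 1, 1, 0, 1, 0, 1]    # copies the stimulus
spikes_independent = [0, 1, 0, 1, 0, 1, 1, 0]  # balanced within each stimulus

print(mutual_information(stimulus, spikes_dependent))
print(mutual_information(stimulus, spikes_independent))
```

With real recordings the plug-in estimate is biased upward when samples are scarce, which is why the binning, data quantity, bias, and significance-testing issues listed in the abstract matter in practice.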
Affiliation(s)
- Nicholas M Timme
- Department of Psychology, Indiana University - Purdue University Indianapolis, 402 N. Blackford St, Indianapolis, IN 46202
- Christopher Lapish
- Department of Psychology, Indiana University - Purdue University Indianapolis, 402 N. Blackford St, Indianapolis, IN 46202
|
20
|
Stolyarova A. Solving the Credit Assignment Problem With the Prefrontal Cortex. Front Neurosci 2018; 12:182. [PMID: 29636659 PMCID: PMC5881225 DOI: 10.3389/fnins.2018.00182] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Accepted: 03/06/2018] [Indexed: 12/13/2022] Open
Abstract
In naturalistic multi-cue and multi-step learning tasks, where outcomes of behavior are delayed in time, discovering which choices are responsible for rewards can present a challenge, known as the credit assignment problem. In this review, I summarize recent work that highlighted a critical role for the prefrontal cortex (PFC) in assigning credit where it is due in tasks where only a few of the multitude of cues or choices are relevant to the final outcome of behavior. Collectively, these investigations have provided compelling support for specialized roles of the orbitofrontal (OFC), anterior cingulate (ACC), and dorsolateral prefrontal (dlPFC) cortices in contingent learning. However, recent work has similarly revealed shared contributions and emphasized rich and heterogeneous response properties of neurons in these brain regions. Such functional overlap is not surprising given the complexity of reciprocal projections spanning the PFC. In the concluding section, I overview the evidence suggesting that the OFC, ACC and dlPFC communicate extensively, sharing the information about presented options, executed decisions and received rewards, which enables them to assign credit for outcomes to choices on which they are contingent. This account suggests that lesion or inactivation/inhibition experiments targeting a localized PFC subregion will be insufficient to gain a fine-grained understanding of credit assignment during learning and instead poses refined questions for future research, shifting the focus from focal manipulations to experimental techniques targeting cortico-cortical projections.
Affiliation(s)
- Alexandra Stolyarova
- Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States
|
21
|
Grossberg S. Desirability, availability, credit assignment, category learning, and attention: Cognitive-emotional and working memory dynamics of orbitofrontal, ventrolateral, and dorsolateral prefrontal cortices. Brain Neurosci Adv 2018; 2:2398212818772179. [PMID: 32166139 PMCID: PMC7058233 DOI: 10.1177/2398212818772179] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2017] [Accepted: 03/16/2018] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND The prefrontal cortices play an essential role in cognitive-emotional and working memory processes through interactions with multiple brain regions. METHODS This article further develops a unified neural architecture that explains many recent and classical data about prefrontal function and makes testable predictions. RESULTS Prefrontal properties of desirability, availability, credit assignment, category learning, and feature-based attention are explained. These properties arise through interactions of orbitofrontal, ventrolateral prefrontal, and dorsolateral prefrontal cortices with the inferotemporal cortex, perirhinal cortex, parahippocampal cortices; ventral bank of the principal sulcus, ventral prearcuate gyrus, frontal eye fields, hippocampus, amygdala, basal ganglia, hypothalamus, and visual cortical areas V1, V2, V3A, V4, middle temporal cortex, medial superior temporal area, lateral intraparietal cortex, and posterior parietal cortex. Model explanations also include how the value of visual objects and events is computed, which objects and events cause desired consequences and which may be ignored as predictively irrelevant, and how to plan and act to realise these consequences, including how to selectively filter expected versus unexpected events, leading to movements towards, and conscious perception of, expected events. Modelled processes include reinforcement learning and incentive motivational learning; object and spatial working memory dynamics; and category learning, including the learning of object categories, value categories, object-value categories, and sequence categories, or list chunks. CONCLUSION This article hereby proposes a unified neural theory of prefrontal cortex and its functions.
Affiliation(s)
- Stephen Grossberg
- Center for Adaptive Systems, Graduate Program in Cognitive and Neural Systems, Departments of Mathematics & Statistics, Psychological & Brain Sciences, Biomedical Engineering, Boston University, Boston, MA, USA
|