1. Dubinsky JM, Hamid AA. The neuroscience of active learning and direct instruction. Neurosci Biobehav Rev 2024;163:105737. PMID: 38796122. DOI: 10.1016/j.neubiorev.2024.105737.
Abstract
Throughout the educational system, students experiencing active learning pedagogy perform better and fail less than those taught through direct instruction. Can this be ascribed to differences in learning from a neuroscientific perspective? This review examines mechanistic, neuroscientific evidence that might explain differences in cognitive engagement contributing to learning outcomes between these instructional approaches. In classrooms, direct instruction comprehensively describes academic content, while active learning provides structured opportunities for learners to explore, apply, and manipulate content. Synaptic plasticity and its modulation by arousal or novelty are central to all learning and both approaches. As a form of social learning, direct instruction relies upon working memory. The reinforcement learning circuit, associated agency, curiosity, and peer-to-peer social interactions combine to enhance motivation, improve retention, and build higher-order-thinking skills in active learning environments. When working memory becomes overwhelmed, additionally engaging the reinforcement learning circuit improves retention, providing an explanation for the benefits of active learning. This analysis provides a mechanistic examination of how emerging neuroscience principles might inform pedagogical choices at all educational levels.
Affiliation(s)
- Janet M Dubinsky
- Department of Neuroscience, University of Minnesota, Minneapolis, MN, USA
- Arif A Hamid
- Department of Neuroscience, University of Minnesota, Minneapolis, MN, USA
2. Fascianelli V, Battista A, Stefanini F, Tsujimoto S, Genovesio A, Fusi S. Neural representational geometries reflect behavioral differences in monkeys and recurrent neural networks. Nat Commun 2024;15:6479. PMID: 39090091. PMCID: PMC11294567. DOI: 10.1038/s41467-024-50503-w.
Abstract
Animals likely use a variety of strategies to solve laboratory tasks. Combining behavioral and neural recording data across subjects that employ different strategies may obscure important signals and give confusing results. Hence, it is essential to develop techniques that can infer strategy at the single-subject level. We analyzed an experiment in which two male monkeys performed a visually cued rule-based task. The analysis of their performance shows no indication that they used different strategies. However, when we examined the geometry of stimulus representations in the state space of the neural activities recorded in dorsolateral prefrontal cortex, we found striking differences between the two monkeys. Our purely neural results prompted us to reanalyze the behavior. The new analysis showed that the differences in representational geometry are associated with differences in the reaction times, revealing behavioral differences we were unaware of. All these analyses suggest that the monkeys are using different strategies. Finally, using recurrent neural network models trained to perform the same task, we show that these strategies correlate with the amount of training, suggesting a possible explanation for the observed neural and behavioral differences.
Affiliation(s)
- Valeria Fascianelli
- Center for Theoretical Neuroscience, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Aldo Battista
- Center for Neural Science, New York University, New York, NY, USA
- Fabio Stefanini
- Center for Theoretical Neuroscience, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Aldo Genovesio
- Department of Physiology and Pharmacology, Sapienza University of Rome, Rome, Italy
- Stefano Fusi
- Center for Theoretical Neuroscience, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Department of Neuroscience, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, USA
- Kavli Institute for Brain Science, Columbia University, New York, NY, USA
3. Scott DN, Mukherjee A, Nassar MR, Halassa MM. Thalamocortical architectures for flexible cognition and efficient learning. Trends Cogn Sci 2024;28:739-756. PMID: 38886139. PMCID: PMC11305962. DOI: 10.1016/j.tics.2024.05.006.
Abstract
The brain exhibits a remarkable ability to learn and execute context-appropriate behaviors. How it achieves such flexibility, without sacrificing learning efficiency, is an important open question. Neuroscience, psychology, and engineering suggest that reusing and repurposing computations are part of the answer. Here, we review evidence that thalamocortical architectures may have evolved to facilitate these objectives of flexibility and efficiency by coordinating distributed computations. Recent work suggests that distributed prefrontal cortical networks compute with flexible codes, and that the mediodorsal thalamus provides regularization to promote efficient reuse. Thalamocortical interactions resemble hierarchical Bayesian computations, and their network implementation can be related to existing gating, synchronization, and hub theories of thalamic function. By reviewing recent findings and providing a novel synthesis, we highlight key research horizons integrating computation, cognition, and systems neuroscience.
Affiliation(s)
- Daniel N Scott
- Department of Neuroscience, Brown University, Providence, RI, USA
- Robert J. and Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI, USA
- Arghya Mukherjee
- Department of Neuroscience, Tufts University School of Medicine, Boston, MA, USA
- Matthew R Nassar
- Department of Neuroscience, Brown University, Providence, RI, USA
- Robert J. and Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI, USA
- Michael M Halassa
- Department of Neuroscience, Tufts University School of Medicine, Boston, MA, USA
- Department of Psychiatry, Tufts University School of Medicine, Boston, MA, USA
4. Lv Q, Chen G, Yang Z, Zhong W, Chen CYC. Meta learning with graph attention networks for low-data drug discovery. IEEE Trans Neural Netw Learn Syst 2024;35:11218-11230. PMID: 37028032. DOI: 10.1109/tnnls.2023.3250324.
Abstract
Finding candidate molecules with favorable pharmacological activity, low toxicity, and proper pharmacokinetic properties is an important task in drug discovery. Deep neural networks have made impressive progress in accelerating and improving drug discovery. However, these techniques rely on a large amount of labeled data to form accurate predictions of molecular properties. At each stage of the drug discovery pipeline, usually only limited biological data on candidate molecules and their derivatives are available, indicating that the application of deep neural networks to low-data drug discovery remains a formidable challenge. Here, we propose a meta-learning architecture with a graph attention network, Meta-GAT, to predict molecular properties in low-data drug discovery. The GAT captures the local effects of atomic groups at the atom level through a triple attentional mechanism and implicitly captures the interactions between different atomic groups at the molecular level. GAT is used to perceive the molecular chemical environment and connectivity, thereby effectively reducing sample complexity. Meta-GAT further develops a meta-learning strategy based on bilevel optimization, which transfers meta-knowledge from other attribute prediction tasks to low-data target tasks. In summary, our work demonstrates how meta-learning can reduce the amount of data required to make meaningful predictions of molecules in low-data scenarios. Meta-learning is likely to become the new learning paradigm in low-data drug discovery. The source code is publicly available at: https://github.com/lol88/Meta-GAT.
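The bilevel optimization at the heart of this abstract follows the general MAML recipe: adapt to each task with an inner gradient step, then update the shared initialization from the post-adaptation loss. A minimal first-order sketch on a one-parameter linear model (the toy tasks and learning rates are invented for illustration; this is not Meta-GAT itself):

```python
# First-order bilevel (MAML-style) meta-learning on a one-parameter
# linear model y = w * x: adapt per task with one inner gradient step,
# then update the shared initialization from the post-adaptation loss.

def inner_adapt(w, x, y, lr=0.1):
    grad = 2 * x * (w * x - y)      # d/dw of the squared error
    return w - lr * grad

w_meta = 0.0
tasks = [(1.0, 2.0), (1.0, 3.0)]    # (input, target) per toy task
for _ in range(100):
    for x, y in tasks:
        w_task = inner_adapt(w_meta, x, y)        # inner loop: adapt
        grad_outer = 2 * x * (w_task * x - y)     # loss after adapting
        w_meta -= 0.05 * grad_outer               # outer (meta) update
```

The meta-initialization settles near the midpoint of the task targets, from which a single inner step moves quickly toward either task.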
5. Zhang R, Pitkow X, Angelaki DE. Inductive biases of neural network modularity in spatial navigation. Sci Adv 2024;10:eadk1256. PMID: 39028809. PMCID: PMC11259174. DOI: 10.1126/sciadv.adk1256.
Abstract
The brain may have evolved a modular architecture for daily tasks, with circuits featuring functionally specialized modules that match the task structure. We hypothesize that this architecture enables better learning and generalization than architectures with less specialized modules. To test this, we trained reinforcement learning agents with various neural architectures on a naturalistic navigation task. We found that the modular agent, with an architecture that segregates computations of state representation, value, and action into specialized modules, achieved better learning and generalization. Its learned state representation combines prediction and observation, weighted by their relative uncertainty, akin to recursive Bayesian estimation. This agent's behavior also resembles macaques' behavior more closely. Our results shed light on the possible rationale for the brain's modularity and suggest that artificial systems can use this insight from neuroscience to improve learning and generalization in natural tasks.
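The uncertainty-weighted fusion of prediction and observation described here is the scalar form of a recursive Bayesian (Kalman-style) update. A minimal sketch, with made-up numbers, of how the posterior mean interpolates between the two sources in proportion to their relative uncertainty:

```python
# Scalar recursive Bayesian (Kalman-style) update: the posterior mean
# combines prediction and observation, weighted by their relative
# uncertainties (variances).

def bayes_update(pred_mean, pred_var, obs, obs_var):
    """Fuse a prediction and an observation of the same latent state."""
    k = pred_var / (pred_var + obs_var)   # weight given to the observation
    mean = pred_mean + k * (obs - pred_mean)
    var = (1.0 - k) * pred_var
    return mean, var

# Equal uncertainties: the estimate lands halfway between the two.
m, v = bayes_update(pred_mean=0.0, pred_var=1.0, obs=2.0, obs_var=1.0)
```

A near-noiseless observation dominates the fused estimate, while a very noisy one is largely ignored.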
Affiliation(s)
- Ruiyi Zhang
- Tandon School of Engineering, New York University, New York, NY, USA
- Xaq Pitkow
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Dora E. Angelaki
- Tandon School of Engineering, New York University, New York, NY, USA
- Center for Neural Science, New York University, New York, NY, USA
6. Cone I, Clopath C, Shouval HZ. Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time. Nat Commun 2024;15:5856. PMID: 38997276. PMCID: PMC11245539. DOI: 10.1038/s41467-024-50205-3.
Abstract
The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) learning, whereby certain units signal reward prediction errors (RPE). The TD algorithm has traditionally been mapped onto the dopaminergic system, as the firing properties of dopamine neurons can resemble RPEs. However, certain predictions of TD learning are inconsistent with experimental results, and previous implementations of the algorithm have made unscalable assumptions regarding stimulus-specific fixed temporal bases. We propose an alternate framework to describe dopamine signaling in the brain, FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, dopamine release is similar, but not identical, to RPE, leading to predictions that contrast with those of TD. While FLEX itself is a general theoretical framework, we describe a specific, biophysically plausible implementation, the results of which are consistent with a preponderance of both existing and reanalyzed experimental data.
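For readers comparing FLEX against the classical account, a minimal tabular TD(0) sketch shows the RPE the abstract refers to, delta = r + gamma * V(s') - V(s), and how the prediction migrates from reward to cue with training (illustrative background only, not the FLEX model):

```python
# Minimal tabular TD(0): the reward prediction error (RPE)
# delta = r + gamma * V(s') - V(s) is the quantity classically
# mapped onto dopaminergic firing.

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Compute the RPE and apply one TD(0) value update in place."""
    rpe = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * rpe
    return rpe

# Two-state chain: cue -> reward. With training, V(cue) comes to
# predict the upcoming reward, so the RPE at the cue shrinks.
V = {}
for _ in range(200):
    td0_update(V, "cue", 0.0, "reward_state")
    td0_update(V, "reward_state", 1.0, "terminal")
```

After training, V("reward_state") approaches 1 and V("cue") approaches gamma * 1 = 0.9, so the reward itself is no longer surprising.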
Affiliation(s)
- Ian Cone
- Department of Bioengineering, Imperial College London, London, UK
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX, USA
- Applied Physics Program, Rice University, Houston, TX, USA
- Claudia Clopath
- Department of Bioengineering, Imperial College London, London, UK
- Harel Z Shouval
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
7. Lippl S, Kay K, Jensen G, Ferrera VP, Abbott LF. A mathematical theory of relational generalization in transitive inference. Proc Natl Acad Sci U S A 2024;121:e2314511121. PMID: 38968113. PMCID: PMC11252811. DOI: 10.1073/pnas.2314511121.
Abstract
Humans and animals routinely infer relations between different items or events and generalize these relations to novel combinations of items. This allows them to respond appropriately to radically novel circumstances and is fundamental to advanced cognition. However, how learning systems (including the brain) can implement the necessary inductive biases has been unclear. We investigated transitive inference (TI), a classic relational task paradigm in which subjects must learn a relation (e.g., A > B and B > C) and generalize it to new combinations of items (A > C). Through mathematical analysis, we found that a broad range of biologically relevant learning models (e.g. gradient flow or ridge regression) perform TI successfully and recapitulate signature behavioral patterns long observed in living subjects. First, we found that models with item-wise additive representations automatically encode transitive relations. Second, for more general representations, a single scalar "conjunctivity factor" determines model behavior on TI and, further, the principle of norm minimization (a standard statistical inductive bias) enables models with fixed, partly conjunctive representations to generalize transitively. Finally, neural networks in the "rich regime," which enables representation learning and improves generalization on many tasks, unexpectedly show poor generalization and anomalous behavior on TI. We find that such networks implement a form of norm minimization (over hidden weights) that yields a local encoding mechanism lacking transitivity. Our findings show how minimal statistical learning principles give rise to a classical relational inductive bias (transitivity), explain empirically observed behaviors, and establish a formal approach to understanding the neural basis of relational abstraction.
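The finding that item-wise additive representations plus norm minimization yield transitive generalization can be reproduced in a few lines: train ridge regression on adjacent pairs only and probe an untrained, non-adjacent pair (the item indices and ridge penalty are arbitrary choices for illustration):

```python
import numpy as np

# Train on adjacent pairs of a ranked list (items 0 > 1 > 2 > 3)
# using an item-wise additive code, then probe a non-adjacent pair.
n = 4
X, y = [], []
for i, j in [(0, 1), (1, 2), (2, 3)]:
    v = np.zeros(n)
    v[i], v[j] = 1.0, -1.0        # additive item-wise representation
    X.append(v);  y.append(1.0)   # higher-ranked item on the left
    X.append(-v); y.append(-1.0)  # same pair, reversed presentation
X, y = np.array(X), np.array(y)

# Ridge regression: the norm-minimizing inductive bias.
lam = 0.1
w = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Untrained, non-adjacent pair (1, 3): a positive output means the
# model orders the pair transitively.
probe = np.zeros(n)
probe[1], probe[3] = 1.0, -1.0
```

The learned weights come out monotone in rank (w[0] > w[1] > w[2] > w[3]), so every untrained pair is ordered correctly.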
Affiliation(s)
- Samuel Lippl
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027
- Center for Theoretical Neuroscience, Department of Neuroscience, Columbia University, New York, NY 10027
- Department of Neuroscience, Columbia University Medical Center, New York, NY 10032
- Kenneth Kay
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027
- Center for Theoretical Neuroscience, Department of Neuroscience, Columbia University, New York, NY 10027
- Grossman Center for the Statistics of Mind, Columbia University, New York, NY 10027
- Greg Jensen
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027
- Department of Neuroscience, Columbia University Medical Center, New York, NY 10032
- Department of Psychology, Reed College, Portland, OR 97202
- Vincent P. Ferrera
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027
- Department of Neuroscience, Columbia University Medical Center, New York, NY 10032
- Department of Psychiatry, Columbia University Medical Center, New York, NY 10032
- L. F. Abbott
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027
- Center for Theoretical Neuroscience, Department of Neuroscience, Columbia University, New York, NY 10027
- Department of Neuroscience, Columbia University Medical Center, New York, NY 10032
8. Jensen KT, Hennequin G, Mattar MG. A recurrent network model of planning explains hippocampal replay and human behavior. Nat Neurosci 2024;27:1340-1348. PMID: 38849521. PMCID: PMC11239510. DOI: 10.1038/s41593-024-01675-7.
Abstract
When faced with a novel situation, people often spend substantial periods of time contemplating possible futures. For such planning to be rational, the benefits to behavior must compensate for the time spent thinking. Here, we capture these features of behavior by developing a neural network model where planning itself is controlled by the prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences from its own policy, which we call 'rollouts'. In a spatial navigation task, the agent learns to plan when it is beneficial, which provides a normative explanation for empirical variability in human thinking times. Additionally, the patterns of policy rollouts used by the artificial agent closely resemble patterns of rodent hippocampal replays. Our work provides a theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by, and adaptively affect, prefrontal dynamics.
Affiliation(s)
- Kristopher T Jensen
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge, UK
- Sainsbury Wellcome Centre, University College London, London, UK
- Guillaume Hennequin
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge, UK
- Marcelo G Mattar
- Department of Cognitive Science, University of California, San Diego, CA, USA
- Department of Psychology, New York University, New York, NY, USA
9. Hosoda K, Nishida K, Seno S, Mashita T, Kashioka H, Ohzawa I. A single fast Hebbian-like process enabling one-shot class addition in deep neural networks without backbone modification. Front Neurosci 2024;18:1344114. PMID: 38933813. PMCID: PMC11202076. DOI: 10.3389/fnins.2024.1344114.
Abstract
One-shot learning, the ability to learn a new concept from a single instance, is a distinctive brain function that has garnered substantial interest in machine learning. While modeling physiological mechanisms poses challenges, advancements in artificial neural networks have led to performances in specific tasks that rival human capabilities. Proposing one-shot learning methods that build on these advancements, especially methods involving simple mechanisms, not only enhances technological development but also contributes to neuroscience by offering functionally valid hypotheses. Among the simplest methods for one-shot class addition with deep learning image classifiers is "weight imprinting," which uses the neural activity elicited by an image of a new class as the corresponding new synaptic weights. Despite its simplicity, its relevance to neuroscience is ambiguous, and it often interferes with the original image classification, which is a significant drawback in practical applications. This study introduces a novel interpretation in which part of the weight imprinting process aligns with the Hebbian rule. We show that a single Hebbian-like process enables pre-trained deep learning image classifiers to perform one-shot class addition without any modification to the original classifier's backbone. Using non-parametric normalization to mimic the brain's fast Hebbian plasticity significantly reduces the interference observed in previous methods. Our method is one of the simplest and most practical for one-shot class addition tasks, and its reliance on a single fast Hebbian-like process contributes valuable insights to neuroscience hypotheses.
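The core imprinting step, independent of any particular backbone, is just copying a normalized embedding into the classifier as a new weight row. A toy sketch with random stand-in embeddings (the backbone and dimensions are hypothetical; this illustrates plain weight imprinting, not the paper's full normalization scheme):

```python
import numpy as np

# Weight imprinting: a frozen backbone maps inputs to embeddings; a
# new class is added by copying the normalized embedding of a single
# example into the classifier as a new weight row. The "embeddings"
# here are random stand-ins for backbone outputs.
rng = np.random.default_rng(0)

def normalize(v):
    return v / np.linalg.norm(v)

W = np.stack([normalize(rng.normal(size=8)) for _ in range(2)])  # old classes
novel = normalize(rng.normal(size=8))       # embedding of one novel example

W = np.vstack([W, novel])                   # one-shot class addition

def classify(embedding):
    """Nearest-prototype (cosine) classification over imprinted weights."""
    return int(np.argmax(W @ normalize(embedding)))
```

No gradient step touches the original rows of W, which is why the backbone and existing classes need no modification.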
Affiliation(s)
- Kazufumi Hosoda
- Center for Information and Neural Networks, Advanced ICT Research Institute, National Institute of Information and Communications Technology, Suita, Japan
- Life and Medical Sciences Area, Health Sciences Discipline, Kobe University, Kobe, Japan
- Keigo Nishida
- Laboratory for Computational Molecular Design, RIKEN Center for Biosystems Dynamics Research, Suita, Japan
- Shigeto Seno
- Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Suita, Japan
- Hideki Kashioka
- Center for Information and Neural Networks, Advanced ICT Research Institute, National Institute of Information and Communications Technology, Suita, Japan
- Izumi Ohzawa
- Center for Information and Neural Networks, Advanced ICT Research Institute, National Institute of Information and Communications Technology, Suita, Japan
10. Gong L, Pasqualetti F, Papouin T, Ching S. Astrocytes as a mechanism for contextually-guided network dynamics and function. PLoS Comput Biol 2024;20:e1012186. PMID: 38820533. PMCID: PMC11168681. DOI: 10.1371/journal.pcbi.1012186.
Abstract
Astrocytes are a ubiquitous and enigmatic type of non-neuronal cell and are found in the brain of all vertebrates. While traditionally viewed as being supportive of neurons, it is increasingly recognized that astrocytes play a more direct and active role in brain function and neural computation. On account of their sensitivity to a host of physiological covariates and ability to modulate neuronal activity and connectivity on slower time scales, astrocytes may be particularly well poised to modulate the dynamics of neural circuits in functionally salient ways. In the current paper, we seek to capture these features via actionable abstractions within computational models of neuron-astrocyte interaction. Specifically, we examine how nested feedback loops of neuron-astrocyte interaction, acting over separated time-scales, may endow astrocytes with the capability to enable learning in context-dependent settings, where fluctuations in task parameters may occur much more slowly than within-task requirements. We pose a general model of neuron-synapse-astrocyte interaction and use formal analysis to characterize how astrocytic modulation may constitute a form of meta-plasticity, altering the ways in which synapses and neurons adapt as a function of time. We then embed this model in a bandit-based reinforcement learning task environment, and show how the presence of time-scale separated astrocytic modulation enables learning over multiple fluctuating contexts. Indeed, these networks learn far more reliably compared to dynamically homogeneous networks and conventional non-network-based bandit algorithms. Our results fuel the notion that neuron-astrocyte interactions in the brain benefit learning over different time-scales and the conveyance of task-relevant contextual information onto circuit dynamics.
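As context for the task environment described here, a minimal non-stationary two-armed bandit with one slow reward-probability switch; a constant learning rate lets value estimates track the drift. This sketches the setting only, not the neuron-astrocyte model (the arm probabilities and hyperparameters are invented):

```python
import random

# Non-stationary two-armed bandit with one slow context switch at the
# midpoint. A constant learning rate keeps value estimates tracking
# the drifting reward probabilities.
random.seed(1)

def run(p_reward, steps=2000, alpha=0.1, eps=0.1):
    q = [0.0, 0.0]
    for t in range(steps):
        if t == steps // 2:
            p_reward = p_reward[::-1]       # slow context switch
        explore = random.random() < eps
        a = random.randrange(2) if explore else q.index(max(q))
        r = 1.0 if random.random() < p_reward[a] else 0.0
        q[a] += alpha * (r - q[a])
    return q

q = run([0.8, 0.2])
```

A purely sample-averaging learner would freeze after the switch; constant-alpha tracking (or, in the paper, slow astrocytic modulation) is what lets the agent re-adapt.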
Affiliation(s)
- Lulu Gong
- Department of Electrical and Systems Engineering, Washington University, St. Louis, Missouri, United States of America
- Fabio Pasqualetti
- Department of Mechanical Engineering, University of California, Riverside, California, United States of America
- Thomas Papouin
- Department of Neuroscience, Washington University School of Medicine, St. Louis, Missouri, United States of America
- ShiNung Ching
- Department of Electrical and Systems Engineering, Washington University, St. Louis, Missouri, United States of America
11. Lakshminarasimhan KJ, Xie M, Cohen JD, Sauerbrei BA, Hantman AW, Litwin-Kumar A, Escola S. Specific connectivity optimizes learning in thalamocortical loops. Cell Rep 2024;43:114059. PMID: 38602873. PMCID: PMC11104520. DOI: 10.1016/j.celrep.2024.114059.
Abstract
Thalamocortical loops have a central role in cognition and motor control, but precisely how they contribute to these processes is unclear. Recent studies showing evidence of plasticity in thalamocortical synapses indicate a role for the thalamus in shaping cortical dynamics through learning. Since signals undergo a compression from the cortex to the thalamus, we hypothesized that the computational role of the thalamus depends critically on the structure of corticothalamic connectivity. To test this, we identified the optimal corticothalamic structure that promotes biologically plausible learning in thalamocortical synapses. We found that corticothalamic projections specialized to communicate an efference copy of the cortical output benefit motor control, while communicating the modes of highest variance is optimal for working memory tasks. We analyzed neural recordings from mice performing grasping and delayed discrimination tasks and found corticothalamic communication consistent with these predictions. These results suggest that the thalamus orchestrates cortical dynamics in a functionally precise manner through structured connectivity.
Affiliation(s)
- Marjorie Xie
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Jeremy D Cohen
- Neuroscience Center, University of North Carolina, Chapel Hill, NC 27559, USA
- Britton A Sauerbrei
- Department of Neurosciences, Case Western Reserve University, Cleveland, OH 44106, USA
- Adam W Hantman
- Neuroscience Center, University of North Carolina, Chapel Hill, NC 27559, USA
- Ashok Litwin-Kumar
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Sean Escola
- Department of Psychiatry, Columbia University, New York, NY 10032, USA
12. Menéndez JA, Hennig JA, Golub MD, Oby ER, Sadtler PT, Batista AP, Chase SM, Yu BM, Latham PE. A theory of brain-computer interface learning via low-dimensional control. bioRxiv 2024:2024.04.18.589952 [Preprint]. PMID: 38712193. PMCID: PMC11071278. DOI: 10.1101/2024.04.18.589952.
Abstract
A remarkable demonstration of the flexibility of mammalian motor systems is primates' ability to learn to control brain-computer interfaces (BCIs). This constitutes a completely novel motor behavior, yet primates are capable of learning to control BCIs under a wide range of conditions. BCIs with carefully calibrated decoders, for example, can be learned with only minutes to hours of practice. With a few weeks of practice, even BCIs with randomly constructed decoders can be learned. What are the biological substrates of this learning process? Here, we develop a theory based on a re-aiming strategy, whereby learning operates within a low-dimensional subspace of task-relevant inputs driving the local population of recorded neurons. Through comprehensive numerical and formal analysis, we demonstrate that this theory can provide a unifying explanation for disparate phenomena previously reported in three different BCI learning tasks, and we derive a novel experimental prediction that we verify with previously published data. By explicitly modeling the underlying neural circuitry, the theory reveals an interpretation of these phenomena in terms of biological constraints on neural activity.
13. Subramoney A, Bellec G, Scherr F, Legenstein R, Maass W. Fast learning without synaptic plasticity in spiking neural networks. Sci Rep 2024;14:8557. PMID: 38609429. PMCID: PMC11015027. DOI: 10.1038/s41598-024-55769-0.
Abstract
Spiking neural networks are of high current interest, both from the perspective of modelling neural networks of the brain and for porting their fast learning capability and energy efficiency into neuromorphic hardware. But so far we have not been able to reproduce fast learning capabilities of the brain in spiking neural networks. Biological data suggest that a synergy of synaptic plasticity on a slow time scale with network dynamics on a faster time scale is responsible for fast learning capabilities of the brain. We show here that a suitable orchestration of this synergy between synaptic plasticity and network dynamics does in fact reproduce fast learning capabilities of generic recurrent networks of spiking neurons. This points to the important role of recurrent connections in spiking networks, since these are necessary for enabling salient network dynamics. We show more specifically that the proposed synergy enables synaptic weights to encode more general information such as priors and task structures, since moment-to-moment processing of new information can be delegated to the network dynamics.
Collapse
Affiliation(s)
- Anand Subramoney: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria; Department of Computer Science, Royal Holloway University of London, Egham, UK
- Guillaume Bellec: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria; Laboratory of Computational Neuroscience, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Franz Scherr: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria
- Robert Legenstein: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria
- Wolfgang Maass: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria
14
Pereira-Obilinovic U, Hou H, Svoboda K, Wang XJ. Brain mechanism of foraging: Reward-dependent synaptic plasticity versus neural integration of values. Proc Natl Acad Sci U S A 2024; 121:e2318521121. [PMID: 38551832] [PMCID: PMC10998608] [DOI: 10.1073/pnas.2318521121]
Abstract
During foraging behavior, action values are persistently encoded in neural activity and updated depending on the history of choice outcomes. What is the neural mechanism for action value maintenance and updating? Here, we explore two contrasting network models: synaptic learning of action value versus neural integration. We show that both models can reproduce extant experimental data, but they yield distinct predictions about the underlying biological neural circuits. In particular, the neural integrator model but not the synaptic model requires that reward signals are mediated by neural pools selective for action alternatives and their projections are aligned with linear attractor axes in the valuation system. We demonstrate experimentally observable neural dynamical signatures and feasible perturbations to differentiate the two contrasting scenarios, suggesting that the synaptic model is a more robust candidate mechanism. Overall, this work provides a modeling framework to guide future experimental research on probabilistic foraging.
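The contrast between the two candidate mechanisms can be caricatured in a few lines of Python (our illustration, not the authors' published models; all names and parameter values are ours). With leak equal to one minus the learning rate, a delta-rule synaptic learner and a leaky neural integrator trace out identical value trajectories, which is one way to see why the paper finds that behavioral data alone cannot separate them:

```python
def synaptic_update(value, reward, lr=0.1):
    """Delta rule: a stored synaptic value moves toward each outcome."""
    return value + lr * (reward - value)

def integrator_update(activity, reward, leak=0.9, gain=0.1):
    """Leaky integration: value is held as persistent neural activity."""
    return leak * activity + gain * reward

v = x = 0.0
for r in [1, 1, 0, 1, 0, 0, 1, 1]:  # an arbitrary choice-outcome history
    v = synaptic_update(v, r)
    x = integrator_update(x, r)

# With leak = 1 - lr the two rules are algebraically identical, so only
# circuit-level predictions (not behavior) can distinguish the mechanisms.
```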
Affiliation(s)
- Ulises Pereira-Obilinovic: Center for Neural Science, New York University, New York, NY 10003; Allen Institute for Neural Dynamics, Seattle, WA 98109
- Han Hou: Allen Institute for Neural Dynamics, Seattle, WA 98109
- Karel Svoboda: Allen Institute for Neural Dynamics, Seattle, WA 98109
- Xiao-Jing Wang: Center for Neural Science, New York University, New York, NY 10003
15
Mohebi A, Wei W, Pelattini L, Kim K, Berke JD. Dopamine transients follow a striatal gradient of reward time horizons. Nat Neurosci 2024; 27:737-746. [PMID: 38321294] [PMCID: PMC11001583] [DOI: 10.1038/s41593-023-01566-3]
Abstract
Animals make predictions to guide their behavior and update those predictions through experience. Transient increases in dopamine (DA) are thought to be critical signals for updating predictions. However, it is unclear how this mechanism handles a wide range of behavioral timescales-from seconds or less (for example, if singing a song) to potentially hours or more (for example, if hunting for food). Here we report that DA transients in distinct rat striatal subregions convey prediction errors based on distinct time horizons. DA dynamics systematically accelerated from ventral to dorsomedial to dorsolateral striatum, in the tempo of spontaneous fluctuations, the temporal integration of prior rewards and the discounting of future rewards. This spectrum of timescales for evaluative computations can help achieve efficient learning and adaptive motivation for a broad range of behaviors.
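The spectrum of time horizons can be illustrated with exponential discounting at different rates. A sketch (the discount factors below are hypothetical illustrations of a ventral-to-dorsolateral gradient, not values reported in the paper):

```python
# Hypothetical discount factors along a ventral -> dorsolateral gradient.
horizons = {"ventral": 0.99, "dorsomedial": 0.90, "dorsolateral": 0.50}

def discounted_value(rewards, gamma):
    """Present value of a future reward stream under exponential discounting."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

delayed_reward = [0, 0, 0, 1]  # a single reward three steps in the future
values = {region: discounted_value(delayed_reward, g)
          for region, g in horizons.items()}
# A long-horizon (ventral-like) evaluator values the delayed reward far more
# than a short-horizon (dorsolateral-like) one, supporting slow behaviors
# like foraging versus fast ones like movement sequencing.
```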
Affiliation(s)
- Ali Mohebi: Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Wei Wei: Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Lilian Pelattini: Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Kyoungjun Kim: Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Joshua D Berke: Department of Neurology; Department of Psychiatry and Behavioral Sciences; Neuroscience Graduate Program; Kavli Institute for Fundamental Neuroscience; and Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA
16
Wang X, Wang S, Liang X, Zhao D, Huang J, Xu X, Dai B, Miao Q. Deep Reinforcement Learning: A Survey. IEEE Trans Neural Netw Learn Syst 2024; 35:5064-5078. [PMID: 36170386] [DOI: 10.1109/tnnls.2022.3207346]
Abstract
Deep reinforcement learning (DRL) integrates the feature representation ability of deep learning with the decision-making ability of reinforcement learning so that it can achieve powerful end-to-end learning control capabilities. In the past decade, DRL has made substantial advances in many tasks that require perceiving high-dimensional input and making optimal or near-optimal decisions. However, there are still many challenging problems in the theory and applications of DRL, especially in learning control tasks with limited samples, sparse rewards, and multiple agents. Researchers have proposed various solutions and new theories to solve these problems and promote the development of DRL. In addition, deep learning has stimulated the further development of many subfields of reinforcement learning, such as hierarchical reinforcement learning (HRL), multiagent reinforcement learning, and imitation learning. This article gives a comprehensive overview of the fundamental theories, key algorithms, and primary research domains of DRL. In addition to value-based and policy-based DRL algorithms, the advances in maximum entropy-based DRL are summarized. The future research topics of DRL are also analyzed and discussed.
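The value-based family the survey covers builds on the tabular Q-learning update; deep variants such as DQN replace the table with a neural network but keep the same bootstrapped target. A minimal sketch on a toy chain environment (ours, for orientation only):

```python
import random

random.seed(0)
n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]  # the "network" is a table
alpha, gamma, eps = 0.5, 0.9, 0.1

def step(s, a):
    """Toy chain: action 1 moves right, action 0 left; reward at the right end."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for _ in range(500):                      # episodes
    s = 0
    for _ in range(20):                   # steps per episode
        if random.random() < eps:         # epsilon-greedy exploration
            a = random.randrange(n_actions)
        else:                             # greedy with random tie-breaking
            a = max(range(n_actions), key=lambda i: (Q[s][i], random.random()))
        s2, r = step(s, a)
        # Bootstrapped TD target: r + gamma * max_a' Q(s', a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        if r == 1.0:
            break                         # episode ends at the goal
        s = s2
```

After training, the learned values prefer moving right from the start state, and the state adjacent to the goal approaches the undiscounted reward of 1.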
17
Kay K, Biderman N, Khajeh R, Beiran M, Cueva CJ, Shohamy D, Jensen G, Wei XX, Ferrera VP, Abbott LF. Emergent neural dynamics and geometry for generalization in a transitive inference task. PLoS Comput Biol 2024; 20:e1011954. [PMID: 38662797] [PMCID: PMC11125559] [DOI: 10.1371/journal.pcbi.1011954]
Abstract
Relational cognition-the ability to infer relationships that generalize to novel combinations of objects-is fundamental to human and animal intelligence. Despite this importance, it remains unclear how relational cognition is implemented in the brain due in part to a lack of hypotheses and predictions at the levels of collective neural activity and behavior. Here we discovered, analyzed, and experimentally tested neural networks (NNs) that perform transitive inference (TI), a classic relational task (if A > B and B > C, then A > C). We found NNs that (i) generalized perfectly, despite lacking overt transitive structure prior to training, (ii) generalized when the task required working memory (WM), a capacity thought to be essential to inference in the brain, (iii) emergently expressed behaviors long observed in living subjects, in addition to a novel order-dependent behavior, and (iv) expressed different task solutions yielding alternative behavioral and neural predictions. Further, in a large-scale experiment, we found that human subjects performing WM-based TI showed behavior inconsistent with a class of NNs that characteristically expressed an intuitive task solution. These findings provide neural insights into a classical relational ability, with wider implications for how the brain realizes relational cognition.
Affiliation(s)
- Kenneth Kay: Mortimer B. Zuckerman Mind Brain Behavior Institute; Center for Theoretical Neuroscience; and Grossman Center for the Statistics of Mind, Columbia University, New York, NY, USA
- Natalie Biderman: Mortimer B. Zuckerman Mind Brain Behavior Institute and Department of Psychology, Columbia University, New York, NY, USA
- Ramin Khajeh: Mortimer B. Zuckerman Mind Brain Behavior Institute and Center for Theoretical Neuroscience, Columbia University, New York, NY, USA
- Manuel Beiran: Mortimer B. Zuckerman Mind Brain Behavior Institute and Center for Theoretical Neuroscience, Columbia University, New York, NY, USA
- Christopher J. Cueva: Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA
- Daphna Shohamy: Mortimer B. Zuckerman Mind Brain Behavior Institute; Department of Psychology; and The Kavli Institute for Brain Science, Columbia University, New York, NY, USA
- Greg Jensen: Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Neuroscience, Columbia University Medical Center, New York, NY, USA; Department of Psychology, Reed College, Portland, OR, USA
- Xue-Xin Wei: Departments of Neuroscience and Psychology, The University of Texas at Austin, Austin, TX, USA
- Vincent P. Ferrera: Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Departments of Neuroscience and Psychiatry, Columbia University Medical Center, New York, NY, USA
- LF Abbott: Mortimer B. Zuckerman Mind Brain Behavior Institute; Center for Theoretical Neuroscience; and The Kavli Institute for Brain Science, Columbia University, New York, NY, USA; Department of Neuroscience, Columbia University Medical Center, New York, NY, USA
18
McNamee DC. The generative neural microdynamics of cognitive processing. Curr Opin Neurobiol 2024; 85:102855. [PMID: 38428170] [DOI: 10.1016/j.conb.2024.102855]
Abstract
The entorhinal cortex and hippocampus form a recurrent network that informs many cognitive processes, including memory, planning, navigation, and imagination. Neural recordings from these regions reveal spatially organized population codes corresponding to external environments and abstract spaces. Aligning the former cognitive functionalities with the latter neural phenomena is a central challenge in understanding the entorhinal-hippocampal circuit (EHC). Disparate experiments demonstrate a surprising level of complexity and apparent disorder in the intricate spatiotemporal dynamics of sequential non-local hippocampal reactivations, which occur particularly, though not exclusively, during immobile pauses and rest. We review these phenomena with a particular focus on their apparent lack of physical simulative realism. These observations are then integrated within a theoretical framework and proposed neural circuit mechanisms that normatively characterize this neural complexity by conceiving different regimes of hippocampal microdynamics as neuromarkers of diverse cognitive computations.
19
Kuroki S, Mizuseki K. CA3 Circuit Model Compressing Sequential Information in Theta Oscillation and Replay. Neural Comput 2024; 36:501-548. [PMID: 38457750] [DOI: 10.1162/neco_a_01641]
Abstract
The hippocampus plays a critical role in the compression and retrieval of sequential information. During wakefulness, it achieves this through theta phase precession and theta sequences. Subsequently, during periods of sleep or rest, the compressed information reactivates through sharp-wave ripple events, manifesting as memory replay. However, how these sequential neuronal activities are generated and how they store information about the external environment remain unknown. We developed a hippocampal cornu ammonis 3 (CA3) computational model based on anatomical and electrophysiological evidence from the biological CA3 circuit to address these questions. The model comprises theta rhythm inhibition, place input, and CA3-CA3 plastic recurrent connection. The model can compress the sequence of the external inputs, reproduce theta phase precession and replay, learn additional sequences, and reorganize previously learned sequences. A gradual increase in synaptic inputs, controlled by interactions between theta-paced inhibition and place inputs, explained the mechanism of sequence acquisition. This model highlights the crucial role of plasticity in the CA3 recurrent connection and theta oscillational dynamics and hypothesizes how the CA3 circuit acquires, compresses, and replays sequential information.
Affiliation(s)
- Satoshi Kuroki: Department of Physiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka 545-8585, Japan
- Kenji Mizuseki: Department of Physiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka 545-8585, Japan
20
Wolff M, Halassa MM. The mediodorsal thalamus in executive control. Neuron 2024; 112:893-908. [PMID: 38295791] [DOI: 10.1016/j.neuron.2024.01.002]
Abstract
Executive control, the ability to organize thoughts and action plans in real time, is a defining feature of higher cognition. Classical theories have emphasized cortical contributions to this process, but recent studies have reinvigorated interest in the role of the thalamus. Although it is well established that local thalamic damage diminishes cognitive capacity, such observations have been difficult to inform functional models. Recent progress in experimental techniques is beginning to enrich our understanding of the anatomical, physiological, and computational substrates underlying thalamic engagement in executive control. In this review, we discuss this progress and particularly focus on the mediodorsal thalamus, which regulates the activity within and across frontal cortical areas. We end with a synthesis that highlights frontal thalamocortical interactions in cognitive computations and discusses its functional implications in normal and pathological conditions.
Affiliation(s)
- Mathieu Wolff: University of Bordeaux, CNRS, INCIA, UMR 5287, 33000 Bordeaux, France
- Michael M Halassa: Department of Neuroscience and Department of Psychiatry, Tufts University School of Medicine, Boston, MA, USA
21
Jahn CI, Markov NT, Morea B, Daw ND, Ebitz RB, Buschman TJ. Learning attentional templates for value-based decision-making. Cell 2024; 187:1476-1489.e21. [PMID: 38401541] [DOI: 10.1016/j.cell.2024.01.041]
Abstract
Attention filters sensory inputs to enhance task-relevant information. It is guided by an "attentional template" that represents the stimulus features that are currently relevant. To understand how the brain learns and uses templates, we trained monkeys to perform a visual search task that required them to repeatedly learn new attentional templates. Neural recordings found that templates were represented across the prefrontal and parietal cortex in a structured manner, such that perceptually neighboring templates had similar neural representations. When the task changed, a new attentional template was learned by incrementally shifting the template toward rewarded features. Finally, we found that attentional templates transformed stimulus features into a common value representation that allowed the same decision-making mechanisms to deploy attention, regardless of the identity of the template. Altogether, our results provide insight into the neural mechanisms by which the brain learns to control attention and how attention can be flexibly deployed across tasks.
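The incremental shift toward rewarded features that the abstract describes is, in computational terms, a delta rule applied to the template itself. A schematic sketch (our toy feature space and learning rate, not the authors' fitted model):

```python
def shift_template(template, features, rewarded, lr=0.2):
    """Move the attentional template toward the features of rewarded choices."""
    if not rewarded:
        return template
    return [t + lr * (f - t) for t, f in zip(template, features)]

template = [0.0, 0.0]   # template in a 2D feature space (e.g. color, shape)
target = [1.0, 0.5]     # features of the currently rewarded stimulus
for _ in range(30):     # repeated rewarded trials after a task change
    template = shift_template(template, target, rewarded=True)
# The template converges on the rewarded feature combination.
```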
Affiliation(s)
- Caroline I Jahn: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Nikola T Markov: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Britney Morea: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Nathaniel D Daw: Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- R Becket Ebitz: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Neurosciences, Université de Montréal, Montréal, QC H3C 3J7, Canada
- Timothy J Buschman: Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08540, USA
22
Carter F, Cossette MP, Trujillo-Pisanty I, Pallikaras V, Breton YA, Conover K, Caplan J, Solis P, Voisard J, Yaksich A, Shizgal P. Does phasic dopamine release cause policy updates? Eur J Neurosci 2024; 59:1260-1277. [PMID: 38039083] [DOI: 10.1111/ejn.16199]
Abstract
Phasic dopamine activity is believed to both encode reward-prediction errors (RPEs) and to cause the adaptations that these errors engender. If so, a rat working for optogenetic stimulation of dopamine neurons will repeatedly update its policy and/or action values, thus iteratively increasing its work rate. Here, we challenge this view by demonstrating stable, non-maximal work rates in the face of repeated optogenetic stimulation of midbrain dopamine neurons. Furthermore, we show that rats learn to discriminate between world states distinguished only by their history of dopamine activation. Comparison of these results to reinforcement learning simulations suggests that the induced dopamine transients acted more as rewards than RPEs. However, pursuit of dopaminergic stimulation drifted upwards over a time scale of days and weeks, despite its stability within trials. To reconcile the results with prior findings, we consider multiple roles for dopamine signalling.
Affiliation(s)
- Francis Carter: Department of Psychology, Concordia University, Montreal, Quebec, Canada; Montreal Institute for Learning Algorithms, Université de Montréal, Montreal, Quebec, Canada
- Ivan Trujillo-Pisanty: Department of Psychology, Concordia University, Montreal, Quebec, Canada; Department of Psychology, Langara College, Vancouver, British Columbia, Canada
- Kent Conover: Department of Psychology, Concordia University, Montreal, Quebec, Canada
- Jill Caplan: Department of Psychology, Concordia University, Montreal, Quebec, Canada
- Pavel Solis: Department of Psychology, Concordia University, Montreal, Quebec, Canada
- Jacques Voisard: Department of Psychology, Concordia University, Montreal, Quebec, Canada
- Alexandra Yaksich: Department of Psychology, Concordia University, Montreal, Quebec, Canada
- Peter Shizgal: Department of Psychology, Concordia University, Montreal, Quebec, Canada
23
Muller TH, Butler JL, Veselic S, Miranda B, Wallis JD, Dayan P, Behrens TEJ, Kurth-Nelson Z, Kennerley SW. Distributional reinforcement learning in prefrontal cortex. Nat Neurosci 2024; 27:403-408. [PMID: 38200183] [PMCID: PMC10917656] [DOI: 10.1038/s41593-023-01535-w]
Abstract
The prefrontal cortex is crucial for learning and decision-making. Classic reinforcement learning (RL) theories center on learning the expectation of potential rewarding outcomes and explain a wealth of neural data in the prefrontal cortex. Distributional RL, on the other hand, learns the full distribution of rewarding outcomes and better explains dopamine responses. In the present study, we show that distributional RL also better explains macaque anterior cingulate cortex neuronal responses, suggesting that it is a common mechanism for reward-guided learning.
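The core distributional idea can be sketched with an expectile-style update, in which units with different asymmetries between positive and negative prediction errors converge to different statistics of the same reward distribution (an illustrative sketch under our own parameter choices, not the paper's analysis):

```python
import random

random.seed(1)
taus = [0.1, 0.5, 0.9]       # pessimistic, neutral, optimistic units
values = [0.0, 0.0, 0.0]
lr = 0.02

for _ in range(20000):
    r = random.choice([0.0, 1.0])   # bimodal reward: 0 or 1, equally often
    for i, tau in enumerate(taus):
        err = r - values[i]
        # Asymmetric scaling: positive errors weighted by tau,
        # negative errors by (1 - tau).
        values[i] += lr * (tau if err > 0 else 1 - tau) * err
# The units settle near different expectiles of the reward distribution,
# jointly encoding its spread rather than only its mean (which a classic
# expected-value learner would report for every unit).
```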
Affiliation(s)
- Timothy H Muller: Department of Experimental Psychology, University of Oxford, Oxford, UK; Department of Clinical and Movement Neurosciences, University College London, London, UK
- James L Butler: Department of Experimental Psychology, University of Oxford, Oxford, UK; Department of Clinical and Movement Neurosciences, University College London, London, UK
- Sebastijan Veselic: Department of Experimental Psychology, University of Oxford, Oxford, UK; Department of Clinical and Movement Neurosciences, University College London, London, UK; Wellcome Trust Centre for Human Neuroimaging, University College London, London, UK
- Bruno Miranda: Department of Clinical and Movement Neurosciences, University College London, London, UK; Institute of Physiology and Institute of Molecular Medicine, Lisbon School of Medicine, University of Lisbon, Lisbon, Portugal
- Joni D Wallis: Department of Psychology and Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA
- Peter Dayan: Max Planck Institute for Biological Cybernetics, Tübingen, Germany; University of Tübingen, Tübingen, Germany
- Timothy E J Behrens: Wellcome Trust Centre for Human Neuroimaging, University College London, London, UK; Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford, UK; Sainsbury Wellcome Centre for Neural Circuits and Behaviour, University College London, London, UK
- Zeb Kurth-Nelson: Google DeepMind, London, UK; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
- Steven W Kennerley: Department of Experimental Psychology, University of Oxford, Oxford, UK; Department of Clinical and Movement Neurosciences, University College London, London, UK; Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford, UK
24
Simoens J, Verguts T, Braem S. Learning environment-specific learning rates. PLoS Comput Biol 2024; 20:e1011978. [PMID: 38517916] [PMCID: PMC10990245] [DOI: 10.1371/journal.pcbi.1011978]
Abstract
People often have to switch back and forth between different environments that come with different problems and volatilities. While volatile environments require fast learning (i.e., high learning rates), stable environments call for lower learning rates. Previous studies have shown that people adapt their learning rates, but it remains unclear whether they can also learn about environment-specific learning rates, and instantaneously retrieve them when revisiting environments. Here, using optimality simulations and hierarchical Bayesian analyses across three experiments, we show that people can learn to use different learning rates when switching back and forth between two different environments. We even observe a signature of these environment-specific learning rates when the volatility of both environments is suddenly the same. We conclude that humans can flexibly adapt and learn to associate different learning rates to different environments, offering important insights for developing theories of meta-learning and context-specific control.
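The central claim, that learners store and retrieve a learning rate per environment rather than re-adapting from scratch on each visit, can be sketched as a delta-rule learner with a context-indexed learning rate (our illustrative values, not the fitted parameters):

```python
# Hypothetical per-environment learning rates, retrieved on re-entry.
context_lr = {"volatile": 0.5, "stable": 0.05}

def update(value, reward, context):
    """One delta-rule step using the learning rate stored for this context."""
    return value + context_lr[context] * (reward - value)

# Re-entering the volatile environment immediately yields large updates...
step_volatile = update(0.0, 1.0, "volatile")
# ...while the same surprise in the stable environment moves the estimate little.
step_stable = update(0.0, 1.0, "stable")
```

The point of the sketch is the instantaneous retrieval: no within-environment re-adaptation of the learning rate is needed, matching the signature the authors report when switching between environments.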
Affiliation(s)
- Jonas Simoens: Department of Experimental Psychology, Ghent University, Belgium
- Tom Verguts: Department of Experimental Psychology, Ghent University, Belgium
- Senne Braem: Department of Experimental Psychology, Ghent University, Belgium
25
Hocker D, Constantinople CM, Savin C. Curriculum learning inspired by behavioral shaping trains neural networks to adopt animal-like decision making strategies. bioRxiv [Preprint] 2024:2024.01.12.575461. [PMID: 38318205] [PMCID: PMC10843159] [DOI: 10.1101/2024.01.12.575461]
Abstract
Recurrent neural networks (RNN) are ubiquitously used in neuroscience to capture both neural dynamics and behaviors of living systems. However, when it comes to complex cognitive tasks, traditional methods for training RNNs can fall short in capturing crucial aspects of animal behavior. To address this challenge, we take inspiration from a commonly used (though rarely appreciated) approach from the experimental neuroscientist's toolkit: behavioral shaping. Our solution leverages task compositionality and models the animal's relevant learning experiences prior to the task. Taking as target a temporal wagering task previously studied in rats, we designed a pretraining curriculum of simpler cognitive tasks that are prerequisites for performing it well. These pretraining tasks are not just simplified versions of the temporal wagering task, but reflect relevant sub-computations. We show that this approach is required for RNNs to adopt similar strategies as rats, including long-timescale inference of latent states, which conventional pretraining approaches fail to capture. Mechanistically, our pretraining supports the development of key dynamical systems features needed for implementing both inference and value-based decision making. Overall, our approach addresses a gap in neural network model training by incorporating inductive biases of animals, which is important when modeling complex behaviors that rely on computational abilities acquired from past experiences.
26
Wientjes S, Holroyd CB. The successor representation subserves hierarchical abstraction for goal-directed behavior. PLoS Comput Biol 2024; 20:e1011312. [PMID: 38377074] [PMCID: PMC10906840] [DOI: 10.1371/journal.pcbi.1011312]
Abstract
Humans have the ability to craft abstract, temporally extended and hierarchically organized plans. For instance, when considering how to make spaghetti for dinner, we typically concern ourselves with useful "subgoals" in the task, such as cutting onions, boiling pasta, and cooking a sauce, rather than particulars such as how many cuts to make to the onion, or exactly which muscles to contract. A core question is how such decomposition of a more abstract task into logical subtasks happens in the first place. Previous research has shown that humans are sensitive to a form of higher-order statistical learning named "community structure". Community structure is a common feature of abstract tasks characterized by a logical ordering of subtasks. This structure can be captured by a model where humans learn predictions of upcoming events multiple steps into the future, discounting predictions of events further away in time. One such model is the "successor representation", which has been argued to be useful for hierarchical abstraction. As of yet, no study has convincingly shown that this hierarchical abstraction can be put to use for goal-directed behavior. Here, we investigate whether participants utilize learned community structure to craft hierarchically informed action plans for goal-directed behavior. Participants were asked to search for paintings in a virtual museum, where the paintings were grouped together in "wings" representing community structure in the museum. We find that participants' choices accord with the hierarchical structure of the museum and that their response times are best predicted by a successor representation. The degree to which the response times reflect the community structure of the museum correlates with several measures of performance, including the ability to craft temporally abstract action plans. These results suggest that successor representation learning subserves hierarchical abstractions relevant for goal-directed behavior.
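The successor representation itself has a compact temporal-difference formulation: M[s][s'] estimates the discounted expected future occupancy of state s' starting from s, learned from transitions alone. A minimal tabular sketch on a toy ring of states (ours; the museum task's community structure is richer):

```python
import random

random.seed(0)
n = 4                         # states arranged on a ring
gamma, lr = 0.9, 0.1
M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def sr_update(s, s2):
    """TD update of row s toward one-hot(s) + gamma * M[s2]."""
    for j in range(n):
        target = (1.0 if j == s else 0.0) + gamma * M[s2][j]
        M[s][j] += lr * (target - M[s][j])

s = 0
for _ in range(5000):         # random-walk experience on the ring
    s2 = (s + random.choice([-1, 1])) % n
    sr_update(s, s2)
    s = s2
# Rows of M now encode multi-step predictive proximity: states close in the
# walk have higher expected future occupancy than distant ones, which is the
# structure the paper argues supports hierarchical abstraction.
```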
Affiliation(s)
- Sven Wientjes: Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Clay B. Holroyd: Department of Experimental Psychology, Ghent University, Ghent, Belgium
27
Wise T, Emery K, Radulescu A. Naturalistic reinforcement learning. Trends Cogn Sci 2024; 28:144-158. [PMID: 37777463] [PMCID: PMC10878983] [DOI: 10.1016/j.tics.2023.08.016]
Abstract
Humans possess a remarkable ability to make decisions within real-world environments that are expansive, complex, and multidimensional. Human cognitive computational neuroscience has sought to exploit reinforcement learning (RL) as a framework within which to explain human decision-making, often focusing on constrained, artificial experimental tasks. In this article, we review recent efforts that use naturalistic approaches to determine how humans make decisions in complex environments that better approximate the real world, providing a clearer picture of how humans navigate the challenges posed by real-world decisions. These studies purposely embed elements of naturalistic complexity within experimental paradigms, rather than focusing on simplification, generating insights into the processes that likely underpin humans' ability to navigate complex, multidimensional real-world environments so successfully.
Affiliation(s)
- Toby Wise, Department of Neuroimaging, King's College London, London, UK
- Kara Emery, Center for Data Science, New York University, New York, NY, USA
- Angela Radulescu, Center for Computational Psychiatry, Icahn School of Medicine at Mt. Sinai, New York, NY, USA

28
Blanco-Pozo M, Akam T, Walton ME. Dopamine-independent effect of rewards on choices through hidden-state inference. Nat Neurosci 2024; 27:286-297. PMID: 38216649; PMCID: PMC10849965; DOI: 10.1038/s41593-023-01542-x.
Abstract
Dopamine is implicated in adaptive behavior through reward prediction error (RPE) signals that update value estimates. There is also accumulating evidence that animals in structured environments can use inference processes to facilitate behavioral flexibility. However, it is unclear how these two accounts of reward-guided decision-making should be integrated. Using a two-step task for mice, we show that dopamine reports RPEs using value information inferred from task structure knowledge, alongside information about reward rate and movement. Nonetheless, although rewards strongly influenced choices and dopamine activity, neither activating nor inhibiting dopamine neurons at trial outcome affected future choice. These data were recapitulated by a neural network model where cortex learned to track hidden task states by predicting observations, while basal ganglia learned values and actions via RPEs. This shows that the influence of rewards on choices can stem from dopamine-independent information they convey about the world's state, not the dopaminergic RPEs they produce.
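For readers unfamiliar with the reward prediction error (RPE) account that this abstract contrasts with hidden-state inference, a minimal temporal-difference sketch follows. The two-state chain and parameters are toy illustrations, not the paper's two-step task.

```python
# Minimal TD(0) learner: the prediction error delta = r + gamma*V(s') - V(s)
# is the quantity dopamine is classically thought to report.
GAMMA = 1.0
ALPHA = 0.2

V = {"choice": 0.0, "outcome": 0.0}

def td_update(V, s, r, s_next):
    """One TD(0) update for state s; returns the RPE."""
    v_next = V[s_next] if s_next is not None else 0.0
    delta = r + GAMMA * v_next - V[s]
    V[s] += ALPHA * delta
    return delta

# A repeatedly rewarded outcome: the RPE at reward shrinks as V("outcome")
# is learned, and value propagates back to the predictive "choice" state.
errors = []
for _ in range(50):
    td_update(V, "choice", 0.0, "outcome")
    errors.append(td_update(V, "outcome", 1.0, None))
```

The paper's point is that the values entering this delta can themselves be inferred from task-structure knowledge, and that the reward's effect on choice survives even when the dopaminergic delta is silenced.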
Affiliation(s)
- Marta Blanco-Pozo, Department of Experimental Psychology, Oxford University, Oxford, UK; Wellcome Centre for Integrative Neuroimaging, Oxford University, Oxford, UK
- Thomas Akam, Department of Experimental Psychology, Oxford University, Oxford, UK; Wellcome Centre for Integrative Neuroimaging, Oxford University, Oxford, UK
- Mark E Walton, Department of Experimental Psychology, Oxford University, Oxford, UK; Wellcome Centre for Integrative Neuroimaging, Oxford University, Oxford, UK

29
Valentin S, Kleinegesse S, Bramley NR, Seriès P, Gutmann MU, Lucas CG. Designing optimal behavioral experiments using machine learning. eLife 2024; 13:e86224. PMID: 38261382; PMCID: PMC10805374; DOI: 10.7554/elife.86224.
Abstract
Computational models are powerful tools for understanding human cognition and behavior. They let us express our theories clearly and precisely and offer predictions that can be subtle and often counter-intuitive. However, this same richness and ability to surprise mean that our scientific intuitions and traditional tools are ill-suited to designing experiments to test and compare these models. To avoid these pitfalls and realize the full potential of computational modeling, we require tools to design experiments that provide clear answers about which models explain human behavior and the auxiliary assumptions those models must make. Bayesian optimal experimental design (BOED) formalizes the search for optimal experimental designs by identifying experiments that are expected to yield informative data. In this work, we provide a tutorial on leveraging recent advances in BOED and machine learning to find optimal experiments for any kind of model that we can simulate data from, and show how by-products of this procedure allow for quick and straightforward evaluation of models and their parameters against real experimental data. As a case study, we consider theories of how people balance exploration and exploitation in multi-armed bandit decision-making tasks. We validate the presented approach using simulations and a real-world experiment. As compared to experimental designs commonly used in the literature, we show that our optimal designs more efficiently determine which of a set of models best accounts for individual human behavior, and more efficiently characterize behavior given a preferred model. At the same time, formalizing a scientific question such that it can be adequately addressed with BOED can be challenging, and we discuss several potential caveats and pitfalls that practitioners should be aware of. We provide code to replicate all analyses as well as tutorial notebooks and pointers to adapt the methodology to different experimental settings.
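The core BOED quantity, expected information gain (EIG), can be computed exactly in a stripped-down version of the bandit setting: choose the reward-probability gap between two arms that best discriminates a random responder from a softmax value-based chooser. Both candidate models and the softmax temperature are illustrative assumptions, not the models used in the paper.

```python
# Exact expected information gain for model discrimination from one binary
# choice, as a function of the experimental design (the reward gap).
import math

BETA = 3.0  # assumed softmax inverse temperature for the value-based model

def p_choose_best(model, gap):
    """Probability each candidate model assigns to picking the better arm."""
    if model == "random":
        return 0.5
    return 1.0 / (1.0 + math.exp(-BETA * gap))  # softmax over a 2-arm gap

def expected_information_gain(gap, prior=0.5):
    """EIG about model identity: E_y[ KL( p(m|y,d) || p(m) ) ] in nats."""
    eig = 0.0
    for y in (0, 1):  # y = 1: the better arm was chosen
        p_y_m = {m: p_choose_best(m, gap) if y else 1 - p_choose_best(m, gap)
                 for m in ("random", "softmax")}
        p_y = prior * p_y_m["random"] + (1 - prior) * p_y_m["softmax"]
        for m, w in (("random", prior), ("softmax", 1 - prior)):
            eig += w * p_y_m[m] * math.log(p_y_m[m] / p_y)
    return eig

designs = [g / 10 for g in range(11)]
best = max(designs, key=expected_information_gain)
```

In realistic settings this expectation has no closed form, which is where the paper's simulation-based machine-learning estimators come in; the design-search logic is the same.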
Affiliation(s)
- Simon Valentin, School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Neil R Bramley, Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom
- Peggy Seriès, School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Michael U Gutmann, School of Informatics, University of Edinburgh, Edinburgh, United Kingdom

30
Algermissen J, Swart JC, Scheeringa R, Cools R, den Ouden HEM. Prefrontal signals precede striatal signals for biased credit assignment in motivational learning biases. Nat Commun 2024; 15:19. PMID: 38168089; PMCID: PMC10762147; DOI: 10.1038/s41467-023-44632-x.
Abstract
Actions are biased by the outcomes they can produce: Humans are more likely to show action under reward prospect, but hold back under punishment prospect. Such motivational biases derive not only from biased response selection, but also from biased learning: humans tend to attribute rewards to their own actions, but are reluctant to attribute punishments to having held back. The neural origin of these biases is unclear. Specifically, it remains open whether motivational biases arise primarily from the architecture of subcortical regions or also reflect cortical influences, the latter being typically associated with increased behavioral flexibility and control beyond stereotyped behaviors. Simultaneous EEG-fMRI allowed us to track which regions encoded biased prediction errors in which order. Biased prediction errors occurred in cortical regions (dorsal anterior and posterior cingulate cortices) before subcortical regions (striatum). These results highlight that biased learning is not a mere feature of the basal ganglia, but arises through prefrontal cortical contributions, revealing motivational biases to be a potentially flexible, sophisticated mechanism.
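The "biased learning" described above can be captured by learning-rate asymmetries: rewards following an action are credited in full, while punishments following inaction are only partially credited. The asymmetric rates below are illustrative assumptions, not the paper's fitted model.

```python
# Sketch of biased credit assignment with an attenuated learning rate for
# the disfavored action-outcome pairing (NoGo followed by punishment).
ALPHA = 0.3
BIAS = 0.5   # fraction of the update retained for NoGo + punishment

def biased_update(q, action, outcome):
    """One biased value update for an (action, outcome) pair."""
    kappa = BIAS if (action == "nogo" and outcome < 0) else 1.0
    return q + ALPHA * kappa * (outcome - q)

# Same outcome magnitudes, opposite pairings: the Go action's value grows
# toward +1 faster than the NoGo value shrinks toward -1.
q_go, q_nogo = 0.0, 0.0
for _ in range(10):
    q_go = biased_update(q_go, "go", +1.0)      # full credit
    q_nogo = biased_update(q_nogo, "nogo", -1.0)  # attenuated credit
```

The paper's EEG-fMRI contribution is about *where* such biased prediction errors are computed (cingulate before striatum), not the update rule itself.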
Affiliation(s)
- Johannes Algermissen, Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- Jennifer C Swart, Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- René Scheeringa, Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands; Erwin L. Hahn Institute for Magnetic Resonance Imaging, University of Duisburg-Essen, Essen, Germany
- Roshan Cools, Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands; Department of Psychiatry, Radboud University Medical Centre, Nijmegen, The Netherlands
- Hanneke E M den Ouden, Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

31
Seifert G, Sealander A, Marzen S, Levin M. From reinforcement learning to agency: Frameworks for understanding basal cognition. Biosystems 2024; 235:105107. PMID: 38128873; DOI: 10.1016/j.biosystems.2023.105107.
Abstract
Organisms play, explore, and mimic those around them. Is there a purpose to this behavior? Are organisms just behaving, or are they trying to achieve goals? We believe this is a false dichotomy. To that end, to understand organisms, we attempt to unify two approaches for understanding complex agents, whether evolved or engineered. We argue that formalisms describing multiscale competencies and goal-directedness in biology (e.g., TAME), and reinforcement learning (RL), can be combined in a symbiotic framework. While RL has been largely focused on higher-level organisms and robots of high complexity, TAME is naturally capable of describing lower-level organisms and minimal agents as well. We propose several novel questions that come from using RL/TAME to understand biology as well as ones that come from using biology to formulate new theory in AI. We hope that the research programs proposed in this piece shape future efforts to understand biological organisms and also future efforts to build artificial agents.
Affiliation(s)
- Gabriella Seifert, Department of Physics, University of Colorado, Boulder, CO 80309, USA; W. M. Keck Science Department, Pitzer, Scripps, and Claremont McKenna College, Claremont, CA 91711, USA
- Ava Sealander, Department of Electrical Engineering, School of Engineering and Applied Sciences, Columbia University, New York, NY 10027, USA; W. M. Keck Science Department, Pitzer, Scripps, and Claremont McKenna College, Claremont, CA 91711, USA
- Sarah Marzen, W. M. Keck Science Department, Pitzer, Scripps, and Claremont McKenna College, Claremont, CA 91711, USA
- Michael Levin, Department of Biology, Tufts University, Medford, MA 02155, USA; Allen Discovery Center at Tufts University, Medford, MA 02155, USA

32
Leimar O, Quiñones AE, Bshary R. Flexible learning in complex worlds. Behav Ecol 2024; 35:arad109. PMID: 38162692; PMCID: PMC10756056; DOI: 10.1093/beheco/arad109.
Abstract
Cognitive flexibility can enhance the ability to adjust to changing environments. Here, we use learning simulations to investigate the possible advantages of flexible learning in volatile (changing) environments. We compare two established learning mechanisms, one with constant learning rates and one with rates that adjust to volatility. We study an ecologically relevant case of volatility, based on observations of developing cleaner fish Labroides dimidiatus that experience a transition from a simpler to a more complex foraging environment. There are other similar transitions in nature, such as migrating to a new and different habitat. We also examine two traditional approaches to volatile environments in experimental psychology and behavioral ecology: reversal learning, and learning set formation (consisting of a sequence of different discrimination tasks). These provide experimental measures of cognitive flexibility. Concerning transitions to a complex world, we show that both constant and flexible learning rates perform well, losing only a small proportion of available rewards in the period after a transition, but flexible rates perform better than constant rates. For reversal learning, flexible rates improve the performance with each successive reversal because of increasing learning rates, but this does not happen for constant rates. For learning set formation, we find no improvement in performance with successive shifts to new stimuli to discriminate for either flexible or constant learning rates. Flexible learning rates might thus explain increasing performance in reversal learning but not in learning set formation, and this can shed light on the nature of cognitive flexibility in a given system.
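The contrast between constant and volatility-adjusted learning rates can be sketched with a Pearce-Hall-style rule, in which the rate tracks recent surprise, used here as an illustrative stand-in for the adaptive mechanism the paper simulates. Parameter values are arbitrary.

```python
# Rescorla-Wagner learner with a constant rate vs. a flexible learner whose
# rate follows recent |prediction error|, so it speeds up after a reversal.
ETA = 0.3          # how fast the flexible rate tracks surprise
ALPHA_FIXED = 0.15

def run(rewards, flexible):
    """Return the trial-by-trial learning rates used on a reward sequence."""
    v, alpha = 0.0, ALPHA_FIXED
    alphas = []
    for r in rewards:
        delta = r - v
        if flexible:
            alpha = (1 - ETA) * alpha + ETA * abs(delta)  # surprise-driven
        alphas.append(alpha)
        v += alpha * delta
    return alphas

# A stable world, then a reversal at trial 30
rewards = [1.0] * 30 + [0.0] * 30
flex = run(rewards, flexible=True)
fixed = run(rewards, flexible=False)
```

The flexible rate decays while the world is stable and spikes right after the reversal, which is the mechanism behind the improvement across successive reversals described above; the constant rate cannot do either.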
Affiliation(s)
- Olof Leimar, Department of Zoology, Stockholm University, 106 91 Stockholm, Sweden
- Andrés E Quiñones, Institute of Biology, University of Neuchâtel, Emile-Argand 11, 2000 Neuchâtel, Switzerland
- Redouan Bshary, Institute of Biology, University of Neuchâtel, Emile-Argand 11, 2000 Neuchâtel, Switzerland

33
Quispe Escudero D. It's all about making new contacts: How being metabotropic and phasicity help D1-like receptors promote LTP in the PFC. Prog Neuropsychopharmacol Biol Psychiatry 2023; 127:110784. PMID: 37169273; DOI: 10.1016/j.pnpbp.2023.110784.
Abstract
D1-like receptors have two important qualities: they are metabotropic, and they are activated by phasic dopamine. After analyzing the molecular implications of each of these qualities separately and then combining them for the specific case of the prefrontal cortex, we propose a model that explains why long-term potentiation in this cortical area depends on the amount of contact between D1-like receptors and dopamine. This simple model also explains why, in order to promote long-term potentiation, dopamine transporters should be scarce in the prefrontal cortex. Additionally, it explains why stimulants like methamphetamine could have such detrimental cognitive effects on regular substance consumers.
Affiliation(s)
- David Quispe Escudero, Departamento de Psicobiología, Facultad de Psicología, Universidad Complutense de Madrid, Madrid E-28040, Spain

34
Bernotat J, Landolfi L, Pasquali D, Nardelli A, Rea F. Remember me - user-centered implementation of working memory architectures on an industrial robot. Front Robot AI 2023; 10:1257690. PMID: 38116169; PMCID: PMC10728719; DOI: 10.3389/frobt.2023.1257690.
Abstract
The present research is innovative as we followed a user-centered approach to implement and train two working memory architectures on an industrial RB-KAIROS + robot: GRU, a state-of-the-art architecture, and WorkMATe, a biologically inspired alternative. Although user-centered approaches are essential to create a comfortable and safe human-robot interaction (HRI), they are still rare in industrial settings. Closing this research gap, we conducted two online user studies with large heterogeneous samples. The major aim of these studies was to evaluate the RB-KAIROS + robot's appearance, movements, and perceived memory functions before (User Study 1) and after the implementation and training of robot working memory (User Study 2). In User Study 1, we furthermore explored participants' ideas about robot memory and what aspects of the robot's movements participants found positive and what aspects they would change. The effects of participants' demographic background and attitudes were controlled for. In User Study 1, participants' overall evaluations of the robot were moderate. Participant age and negative attitudes toward robots led to more negative robot evaluations. According to exploratory analyses, these effects were driven by perceived low experience with robots. Participants expressed clear ideas of robot memory and precise suggestions for a safe, efficient, and comfortable robot navigation which are valuable for further research and development. In User Study 2, the implementation of WorkMATe and GRU led to more positive evaluations of perceived robot memory, but not of robot appearance and movements. Participants' robot evaluations were driven by their positive views of robots. Our results demonstrate that considering potential users' views can greatly contribute to an efficient and positively perceived robot navigation, while users' experience with robots is crucial for a positive HRI.
Affiliation(s)
- Jasmin Bernotat, COgNiTive Architecture for Collaborative Technologies (CONTACT) Unit, Italian Institute of Technology (IIT), Genoa, Italy
- Lorenzo Landolfi, COgNiTive Architecture for Collaborative Technologies (CONTACT) Unit, Italian Institute of Technology (IIT), Genoa, Italy
- Dario Pasquali, COgNiTive Architecture for Collaborative Technologies (CONTACT) Unit, Italian Institute of Technology (IIT), Genoa, Italy
- Alice Nardelli, COgNiTive Architecture for Collaborative Technologies (CONTACT) Unit, Italian Institute of Technology (IIT), Genoa, Italy; Department of Informatics, Bioengineering, Robotics and Systems Engineering (DIBRIS), University of Genoa, Genoa, Italy
- Francesco Rea, COgNiTive Architecture for Collaborative Technologies (CONTACT) Unit, Italian Institute of Technology (IIT), Genoa, Italy

35
Hattori R, Hedrick NG, Jain A, Chen S, You H, Hattori M, Choi JH, Lim BK, Yasuda R, Komiyama T. Meta-reinforcement learning via orbitofrontal cortex. Nat Neurosci 2023; 26:2182-2191. PMID: 37957318; PMCID: PMC10689244; DOI: 10.1038/s41593-023-01485-3.
Abstract
The meta-reinforcement learning (meta-RL) framework, which involves RL over multiple timescales, has been successful in training deep RL models that generalize to new environments. It has been hypothesized that the prefrontal cortex may mediate meta-RL in the brain, but the evidence is scarce. Here we show that the orbitofrontal cortex (OFC) mediates meta-RL. We trained mice and deep RL models on a probabilistic reversal learning task across sessions during which they improved their trial-by-trial RL policy through meta-learning. Ca2+/calmodulin-dependent protein kinase II-dependent synaptic plasticity in OFC was necessary for this meta-learning but not for the within-session trial-by-trial RL in experts. After meta-learning, OFC activity robustly encoded value signals, and OFC inactivation impaired the RL behaviors. Longitudinal tracking of OFC activity revealed that meta-learning gradually shapes population value coding to guide the ongoing behavioral policy. Our results indicate that two distinct RL algorithms with distinct neural mechanisms and timescales coexist in OFC to support adaptive decision-making.
Affiliation(s)
- Ryoma Hattori, Department of Neurobiology, Center for Neural Circuits and Behavior, Department of Neurosciences, and Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA; Department of Neuroscience, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, University of Florida, Jupiter, FL, USA
- Nathan G Hedrick, Department of Neurobiology, Center for Neural Circuits and Behavior, Department of Neurosciences, and Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Anant Jain, Max Planck Florida Institute for Neuroscience, Jupiter, FL, USA
- Shuqi Chen, Department of Neurobiology, Center for Neural Circuits and Behavior, Department of Neurosciences, and Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Hanjia You, Department of Neurobiology, Center for Neural Circuits and Behavior, Department of Neurosciences, and Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Mariko Hattori, Department of Neurobiology, Center for Neural Circuits and Behavior, Department of Neurosciences, and Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Jun-Hyeok Choi, Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Byung Kook Lim, Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Ryohei Yasuda, Max Planck Florida Institute for Neuroscience, Jupiter, FL, USA
- Takaki Komiyama, Department of Neurobiology, Center for Neural Circuits and Behavior, Department of Neurosciences, and Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA

36
Jiménez GA, de la Escalera Hueso A, Gómez-Silva MJ. Reinforcement Learning Algorithms for Autonomous Mission Accomplishment by Unmanned Aerial Vehicles: A Comparative View with DQN, SARSA and A2C. Sensors (Basel) 2023; 23:9013. PMID: 37960711; PMCID: PMC10649256; DOI: 10.3390/s23219013.
Abstract
Unmanned aerial vehicles (UAVs) can be controlled in diverse ways. One of the most common is through artificial intelligence (AI), which comprises different methods, such as reinforcement learning (RL). The article aims to provide a comparison of three RL algorithms (DQN as the benchmark, SARSA as a same-family algorithm, and A2C as a different-structure one) to address the problem of a UAV navigating from departure point A to endpoint B while avoiding obstacles and, simultaneously, using the least possible time and flying the shortest distance. Under fixed premises, this investigation reports the performance obtained for this task. A neighborhood environment was selected because it is likely one of the most common areas of use for commercial drones. Taking DQN as the benchmark and not having previous knowledge of the behavior of SARSA or A2C in the employed environment, the comparison showed that DQN was the only algorithm to achieve the target, while SARSA and A2C did not. However, a deeper analysis of the results led to the conclusion that a fine-tuning of A2C could overcome the performance of DQN under certain conditions, finding the maximum faster with a more straightforward structure.
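The structural difference between two of the compared algorithms fits in a few lines: Q-learning (the tabular ancestor of DQN) bootstraps from the *greedy* next action, while SARSA bootstraps from the action actually taken. The two-state example and parameters are generic illustrations, not the paper's UAV environment.

```python
# Off-policy Q-learning vs. on-policy SARSA: identical experience, different
# backup targets when the next action taken is not the greedy one.
ALPHA, GAMMA = 0.5, 0.9

def q_learning_update(Q, s, a, r, s_next):
    best_next = max(Q[s_next])                 # greedy backup (off-policy)
    Q[s][a] += ALPHA * (r + GAMMA * best_next - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next):
    Q[s][a] += ALPHA * (r + GAMMA * Q[s_next][a_next] - Q[s][a])  # on-policy

# Two states, two actions; in state 1, action 0 is worth 1.0, action 1 is 0.
Q1 = [[0.0, 0.0], [1.0, 0.0]]
Q2 = [[0.0, 0.0], [1.0, 0.0]]
q_learning_update(Q1, 0, 0, r=0.0, s_next=1)          # backs up max = 1.0
sarsa_update(Q2, 0, 0, r=0.0, s_next=1, a_next=1)     # backs up Q[1][1] = 0
print(Q1[0][0], Q2[0][0])  # Q-learning moved, SARSA did not
```

A2C differs more fundamentally: it learns a parameterized policy and a value baseline rather than an action-value table, which is why its tuning behaves so differently in the comparison above.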
Affiliation(s)
- Gonzalo Aguilar Jiménez, Dana SAC Spain, S.A., Dana Off-Highway, C/Abedul S/N, Pol. Ind. Los Huertecillos, 28350 Ciempozuelos, Madrid, Spain
- Arturo de la Escalera Hueso, Intelligent Systems Lab, Universidad Carlos III de Madrid, Avda de la Universidad 30, 28911 Leganés, Madrid, Spain
- Maria J. Gómez-Silva, Department of Computer Architecture and Automation, Facultad de Ciencias Físicas, Universidad Complutense de Madrid, Plaza Ciencias 1, 28040 Madrid, Spain

37
Krausz TA, Comrie AE, Kahn AE, Frank LM, Daw ND, Berke JD. Dual credit assignment processes underlie dopamine signals in a complex spatial environment. Neuron 2023; 111:3465-3478.e7. PMID: 37611585; PMCID: PMC10841332; DOI: 10.1016/j.neuron.2023.07.017.
Abstract
Animals frequently make decisions based on expectations of future reward ("values"). Values are updated by ongoing experience: places and choices that result in reward are assigned greater value. Yet, the specific algorithms used by the brain for such credit assignment remain unclear. We monitored accumbens dopamine as rats foraged for rewards in a complex, changing environment. We observed brief dopamine pulses both at reward receipt (scaling with prediction error) and at novel path opportunities. Dopamine also ramped up as rats ran toward reward ports, in proportion to the value at each location. By examining the evolution of these dopamine place-value signals, we found evidence for two distinct update processes: progressive propagation of value along taken paths, as in temporal difference learning, and inference of value throughout the maze, using internal models. Our results demonstrate that within rich, naturalistic environments dopamine conveys place values that are updated via multiple, complementary learning algorithms.
Affiliation(s)
- Timothy A Krausz, Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Alison E Comrie, Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Ari E Kahn, Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Loren M Frank, Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA; Department of Physiology, University of California, San Francisco, San Francisco, CA 94158, USA
- Nathaniel D Daw, Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Joshua D Berke, Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA; Kavli Institute for Fundamental Neuroscience and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Neurology and Department of Psychiatry and Behavioral Science, University of California, San Francisco, San Francisco, CA 94158, USA

38
Tsuda B, Richmond BJ, Sejnowski TJ. Exploring strategy differences between humans and monkeys with recurrent neural networks. PLoS Comput Biol 2023; 19:e1011618. PMID: 37983250; PMCID: PMC10695363; DOI: 10.1371/journal.pcbi.1011618.
Abstract
Animal models are used to understand principles of human biology. Within cognitive neuroscience, non-human primates are considered the premier model for studying decision-making behaviors in which direct manipulation experiments are still possible. Some prominent studies have brought to light major discrepancies between monkey and human cognition, highlighting problems with unverified extrapolation from monkey to human. Here, we use a parallel model system, artificial neural networks (ANNs), to investigate a well-established discrepancy identified between monkeys and humans with a working memory task, in which monkeys appear to use a recency-based strategy while humans use a target-selective strategy. We find that ANNs trained on the same task exhibit a progression of behavior from random behavior (untrained) to recency-like behavior (partially trained) and finally to selective behavior (further trained), suggesting monkeys and humans may occupy different points in the same overall learning progression. Surprisingly, what appears to be recency-like behavior in the ANN is in fact an emergent non-recency-based property of the organization of the neural network's state space during its development through training. We find that explicit encouragement of recency behavior during training has a dual effect, not only causing an accentuated recency-like behavior, but also speeding up the learning process altogether, resulting in an efficient shaping mechanism to achieve the optimal strategy. Our results suggest a new explanation for the discrepancy observed between monkeys and humans and reveal that what can appear to be a recency-based strategy in some cases may not be recency at all.
Affiliation(s)
- Ben Tsuda, Computational Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, California, United States of America; Neurosciences Graduate Program, University of California San Diego, La Jolla, California, United States of America; Medical Scientist Training Program, University of California San Diego, La Jolla, California, United States of America
- Barry J. Richmond, Section on Neural Coding and Computation, National Institute of Mental Health, Bethesda, Maryland, United States of America
- Terrence J. Sejnowski, Computational Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, California, United States of America; Institute for Neural Computation, University of California San Diego, La Jolla, California, United States of America; Division of Biological Sciences, University of California San Diego, La Jolla, California, United States of America

39
Soo WWM, Goudar V, Wang XJ. Training biologically plausible recurrent neural networks on cognitive tasks with long-term dependencies. bioRxiv 2023:2023.10.10.561588. PMID: 37873445; PMCID: PMC10592728; DOI: 10.1101/2023.10.10.561588.
Abstract
Training recurrent neural networks (RNNs) has become a go-to approach for generating and evaluating mechanistic neural hypotheses for cognition. The ease and efficiency of training RNNs with backpropagation through time and the availability of robustly supported deep learning libraries have made RNN modeling more approachable and accessible to neuroscience. Yet, a major technical hindrance remains. Cognitive processes such as working memory and decision making involve neural population dynamics over a long period of time within a behavioral trial and across trials. It is difficult to train RNNs to accomplish tasks where neural representations and dynamics have long temporal dependencies without gating mechanisms such as LSTMs or GRUs, which currently lack experimental support and prohibit direct comparison between RNNs and biological neural circuits. We tackled this problem based on the idea of specialized skip-connections through time to support the emergence of task-relevant dynamics, and subsequently reinstitute biological plausibility by reverting to the original architecture. We show that this approach enables RNNs to successfully learn cognitive tasks that prove impractical if not impossible to learn using conventional methods. Over the numerous tasks considered here, we achieve fewer training steps and shorter wall-clock times, particularly in tasks that require learning long-term dependencies via temporal integration over long timescales or maintaining a memory of past events in hidden states. Our methods expand the range of experimental tasks that biologically plausible RNN models can learn, thereby supporting the development of theory for the emergent neural mechanisms of computations involving long-term dependencies.
40
Rajagopalan AE, Darshan R, Hibbard KL, Fitzgerald JE, Turner GC. Reward expectations direct learning and drive operant matching in Drosophila. Proc Natl Acad Sci U S A 2023; 120:e2221415120. [PMID: 37733736 PMCID: PMC10523640 DOI: 10.1073/pnas.2221415120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Accepted: 08/11/2023] [Indexed: 09/23/2023] Open
Abstract
Foraging animals must use decision-making strategies that dynamically adapt to the changing availability of rewards in the environment. A wide diversity of animals do this by distributing their choices in proportion to the rewards received from each option, a behavior known as Herrnstein's operant matching law. Theoretical work suggests an elegant mechanistic explanation for this ubiquitous behavior: operant matching follows automatically from simple synaptic plasticity rules acting within behaviorally relevant neural circuits. However, no past work has mapped operant matching onto plasticity mechanisms in the brain, leaving the biological relevance of the theory unclear. Here, we discovered operant matching in Drosophila and showed that it requires synaptic plasticity that acts in the mushroom body and incorporates the expectation of reward. We began by developing a dynamic foraging paradigm to measure choices from individual flies as they learn to associate odor cues with probabilistic rewards. We then built a model of the fly mushroom body to explain each fly's sequential choice behavior using a family of biologically realistic synaptic plasticity rules. As predicted by past theoretical work, we found that synaptic plasticity rules could explain fly matching behavior by incorporating stimulus expectations, reward expectations, or both. However, by optogenetically bypassing the representation of reward expectation, we abolished matching behavior and showed that the plasticity rule must specifically incorporate reward expectations. Altogether, these results reveal the first synapse-level mechanisms of operant matching and provide compelling evidence for the role of reward expectation signals in the fly brain.
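The matching behavior at the heart of this study can be illustrated with a toy simulation, assuming a simple "fractional income" allocation rule rather than the paper's mushroom-body model; the schedule probabilities and decay constant below are illustrative. On a baited (variable-interval-like) two-option schedule, allocating choices in proportion to a leaky estimate of income from each option yields Herrnstein matching: the fraction of choices to an option tracks the fraction of rewards obtained from it.

```python
import random

def simulate_matching(p_bait=(0.4, 0.1), n_trials=20000, tau=0.005, seed=0):
    """Toy two-option dynamic foraging task with baiting: an unchosen
    option keeps its reward until collected.  The agent allocates choices
    in proportion to leaky per-option income estimates."""
    rng = random.Random(seed)
    baited = [False, False]
    income = [0.5, 0.5]            # leaky estimates of reward income per option
    choices = [0, 0]
    rewards = [0, 0]
    for _ in range(n_trials):
        # Baiting: each option arms independently and holds its reward.
        for i in (0, 1):
            if rng.random() < p_bait[i]:
                baited[i] = True
        # Income-proportional (matching) choice allocation.
        p0 = income[0] / (income[0] + income[1])
        c = 0 if rng.random() < p0 else 1
        r = 1 if baited[c] else 0
        baited[c] = False
        choices[c] += 1
        rewards[c] += r
        # Leaky-integrator update: each option's income tracks the
        # reward stream actually obtained from it.
        for i in (0, 1):
            income[i] += tau * ((r if i == c else 0) - income[i])
    choice_frac = choices[0] / n_trials
    reward_frac = rewards[0] / max(1, sum(rewards))
    return choice_frac, reward_frac

choice_frac, reward_frac = simulate_matching()
# Matching law: the fraction of choices to option 0 tracks the
# fraction of obtained rewards that came from option 0.
```

Baiting is what makes matching non-trivial here: with plain Bernoulli rewards, this allocation rule would collapse toward exclusive choice of the richer option, whereas baiting keeps both options worth sampling.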
Affiliation(s)
- Adithya E. Rajagopalan
- Janelia Research Campus, HHMI, Ashburn, VA 20147
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD 21205
| | - Ran Darshan
- Janelia Research Campus, HHMI, Ashburn, VA 20147
- Department of Physiology and Pharmacology, Sackler Faculty of Medicine, Sagol School of Neuroscience, The School of Physics and Astronomy, Tel Aviv University, Tel Aviv 6997801, Israel
41
Miconi T, Kay K. An active neural mechanism for relational learning and fast knowledge reassembly. bioRxiv [Preprint] 2023:2023.07.27.550739. [PMID: 37546842 PMCID: PMC10402151 DOI: 10.1101/2023.07.27.550739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/08/2023]
Abstract
How do we gain general insights from limited novel experiences? Humans and animals have a striking ability to learn relationships between experienced items, enabling efficient generalization and rapid assimilation of new information. One fundamental instance of such relational learning is transitive inference (learn A>B and B>C, infer A>C), which can be quickly and globally reorganized upon learning a new item (learn A>B>C and D>E>F, then C>D, and infer B>E). Despite considerable study, neural mechanisms of transitive inference and fast reassembly of existing knowledge remain elusive. Here we adopt a meta-learning ("learning-to-learn") approach. We train artificial neural networks, endowed with synaptic plasticity and neuromodulation, to be able to learn novel orderings of arbitrary stimuli from repeated presentation of stimulus pairs. We then obtain a complete mechanistic understanding of this discovered neural learning algorithm. Remarkably, this learning involves active cognition: items from previous trials are selectively reinstated in working memory, enabling delayed, self-generated learning and knowledge reassembly. These findings identify a new mechanism for relational learning and insight, suggest new interpretations of neural activity in cognitive tasks, and highlight a novel approach to discovering neural mechanisms capable of supporting cognitive behaviors.
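The transitive-inference task itself (though not the meta-learned network mechanism the paper uncovers) can be sketched with a simple delta rule that compresses pairwise outcomes into scalar ranks; the learning rate and epoch count are illustrative. Trained only on adjacent premise pairs, the ranks settle into an ordered gradient from which unseen comparisons such as B>D can be read out.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_ranks(pairs, epochs=2000, lr=0.05):
    """Learn one scalar rank per item from (winner, loser) pairs with a
    delta rule: updates are large when the observed outcome is surprising
    under the current ranks, and shrink as the ranks explain the data."""
    items = {i for pair in pairs for i in pair}
    rank = {i: 0.0 for i in items}
    for _ in range(epochs):
        for winner, loser in pairs:
            p_win = sigmoid(rank[winner] - rank[loser])
            rank[winner] += lr * (1.0 - p_win)
            rank[loser] -= lr * (1.0 - p_win)
    return rank

# Train only on adjacent premise pairs: A>B, B>C, C>D, D>E.
adjacent = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
rank = train_ranks(adjacent)
# Transitive inference: B>D was never presented, but follows from the
# ordered gradient of learned ranks.
infers_b_gt_d = rank["B"] > rank["D"]
```

At equilibrium the interior items space themselves evenly between the end anchors, which is why non-adjacent probes fall out of the ranks for free.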
42
Hennig JA, Romero Pinto SA, Yamaguchi T, Linderman SW, Uchida N, Gershman SJ. Emergence of belief-like representations through reinforcement learning. PLoS Comput Biol 2023; 19:e1011067. [PMID: 37695776 PMCID: PMC10513382 DOI: 10.1371/journal.pcbi.1011067] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2023] [Revised: 09/21/2023] [Accepted: 08/27/2023] [Indexed: 09/13/2023] Open
Abstract
To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs", optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
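For orientation, the classical reward-prediction-error setup that the RNN is compared against can be sketched with tabular TD(0) on a fully observable chain; partial observability, the paper's central complication, is deliberately omitted, and all parameters are illustrative.

```python
def td_chain(n_states=5, alpha=0.1, n_episodes=500):
    """Tabular TD(0) on a deterministic chain S0 -> S1 -> ... -> reward.
    Returns the learned state values and the reward-prediction errors
    (RPEs) recorded on the final episode."""
    V = [0.0] * n_states
    for _ in range(n_episodes):
        rpes = []
        for s in range(n_states):
            if s < n_states - 1:
                r, v_next = 0.0, V[s + 1]
            else:
                r, v_next = 1.0, 0.0          # reward delivered at the end
            delta = r + v_next - V[s]         # reward-prediction error
            rpes.append(delta)
            V[s] += alpha * delta
    return V, rpes

V, rpes = td_chain()
# After learning, every state predicts the upcoming reward, and the RPE
# at reward delivery has shrunk toward zero: the dopamine-like signature
# that the paper's RNN reproduces from raw observations.
```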
Affiliation(s)
- Jay A. Hennig
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
| | - Sandra A. Romero Pinto
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Takahiro Yamaguchi
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Future Research Department, Toyota Research Institute of North America, Toyota Motor North America, Ann Arbor, Michigan, United States of America
| | - Scott W. Linderman
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Statistics, Stanford University, Stanford, California, United States of America
| | - Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Samuel J. Gershman
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
43
Kumar S, Dasgupta I, Daw ND, Cohen JD, Griffiths TL. Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning. PLoS Comput Biol 2023; 19:e1011316. [PMID: 37624841 PMCID: PMC10497163 DOI: 10.1371/journal.pcbi.1011316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2023] [Revised: 09/12/2023] [Accepted: 06/29/2023] [Indexed: 08/27/2023] Open
Abstract
The ability to acquire abstract knowledge is a hallmark of human intelligence and is believed by many to be one of the core differences between humans and neural network models. Agents can be endowed with an inductive bias towards abstraction through meta-learning, where they are trained on a distribution of tasks that share some abstract structure that can be learned and applied. However, because neural networks are hard to interpret, it can be difficult to tell whether agents have learned the underlying abstraction, or alternatively statistical patterns that are characteristic of that abstraction. In this work, we compare the performance of humans and agents in a meta-reinforcement learning paradigm in which tasks are generated from abstract rules. We define a novel methodology for building "task metamers" that closely match the statistics of the abstract tasks but use a different underlying generative process, and evaluate performance on both abstract and metamer tasks. We find that humans perform better at abstract tasks than metamer tasks whereas common neural network architectures typically perform worse on the abstract tasks than the matched metamers. This work provides a foundation for characterizing differences between humans and machine learning that can be used in future work towards developing machines with more human-like behavior.
Affiliation(s)
- Sreejan Kumar
- Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
| | | | - Nathaniel D. Daw
- Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Department of Psychology, Princeton University, Princeton, New Jersey, United States of America
| | - Jonathan D. Cohen
- Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Department of Psychology, Princeton University, Princeton, New Jersey, United States of America
| | - Thomas L. Griffiths
- Department of Psychology, Princeton University, Princeton, New Jersey, United States of America
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
44
Astle DE, Johnson MH, Akarca D. Toward computational neuroconstructivism: a framework for developmental systems neuroscience. Trends Cogn Sci 2023; 27:726-744. [PMID: 37263856 DOI: 10.1016/j.tics.2023.04.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Revised: 01/05/2023] [Accepted: 04/19/2023] [Indexed: 06/03/2023]
Abstract
Brain development is underpinned by complex interactions between neural assemblies, driving structural and functional change. This neuroconstructivism (the notion that neural functions are shaped by these interactions) is core to some developmental theories. However, because of this complexity, understanding the underlying developmental mechanisms is challenging. Elsewhere in neurobiology, a computational revolution has shown that mathematical models of hidden biological mechanisms can bridge observations with theory building. Can we build a similar computational framework yielding mechanistic insights for brain development? Here, we outline the conceptual and technical challenges of addressing this theory gap, and demonstrate that there is great potential in specifying brain development as mathematically defined processes operating within physical constraints. We provide examples, alongside the broader ingredients needed, as the field explores computational explanations of system-wide development.
Affiliation(s)
- Duncan E Astle
- Department of Psychiatry, University of Cambridge, Cambridge, CB2 2QQ, UK; MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK.
| | - Mark H Johnson
- Department of Psychology, University of Cambridge, Cambridge, CB2 3EB, UK; Centre for Brain and Cognitive Development, Birkbeck, University of London, London, WC1E 7JL, UK
| | - Danyal Akarca
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
45
Sugiyama T, Schweighofer N, Izawa J. Reinforcement learning establishes a minimal metacognitive process to monitor and control motor learning performance. Nat Commun 2023; 14:3988. [PMID: 37422476 PMCID: PMC10329706 DOI: 10.1038/s41467-023-39536-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Accepted: 06/16/2023] [Indexed: 07/10/2023] Open
Abstract
Humans and animals develop learning-to-learn strategies throughout their lives to accelerate learning. One theory suggests that this is achieved by a metacognitive process of controlling and monitoring learning. Although such learning-to-learn is also observed in motor learning, the metacognitive aspect of learning regulation has not been considered in classical theories of motor learning. Here, we formulated a minimal mechanism of this process as reinforcement learning of motor learning properties, which regulates a policy for memory update in response to sensory prediction error while monitoring its performance. This theory was confirmed in human motor learning experiments, in which the subjective sense of learning-outcome association determined the direction of up- and down-regulation of both learning speed and memory retention. Thus, it provides a simple, unifying account for variations in learning speeds, where the reinforcement learning mechanism monitors and controls the motor learning process.
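The idea of "reinforcement learning of motor learning properties" can be pictured as a two-level loop, sketched below under strong simplifications (a scalar adaptation model and a gradient-free, finite-difference meta-update; this is not the authors' model, and all quantities are illustrative). The inner loop adapts to a perturbation with learning rate alpha; the outer loop adjusts alpha itself based on episode performance.

```python
def inner_episode(alpha, perturbation=1.0, n_trials=10):
    """One motor adaptation episode: update an internal estimate x of a
    constant perturbation from sensory prediction errors, and return the
    total squared error as the episode's cost."""
    x, cost = 0.0, 0.0
    for _ in range(n_trials):
        error = perturbation - x
        cost += error ** 2
        x += alpha * error          # memory update scaled by learning rate
    return cost

def meta_learn_alpha(alpha=0.1, delta=0.02, n_episodes=50):
    """Outer loop: finite-difference hill climbing on episode cost, so
    performance feedback controls the learning rate itself."""
    for _ in range(n_episodes):
        up = inner_episode(min(alpha + delta, 1.0))
        down = inner_episode(max(alpha - delta, 0.0))
        alpha = min(alpha + delta, 1.0) if up < down else max(alpha - delta, 0.0)
    return alpha

alpha_final = meta_learn_alpha()
# Starting slow (alpha = 0.1), the meta-level speeds learning up, because
# in this stationary task faster adaptation always yields lower error.
```

In a task where the perturbation flipped sign frequently, the same outer loop would instead push alpha down, which is the monitoring-and-control intuition behind the paper's account of variable learning speeds.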
Affiliation(s)
- Taisei Sugiyama
- Empowerment Informatics, University of Tsukuba, Tsukuba, Ibaraki, 305-8573, Japan
| | - Nicolas Schweighofer
- Biokinesiology and Physical Therapy, University of Southern California, Los Angeles, CA, 90089-9006, USA
| | - Jun Izawa
- Institute of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, 305-8573, Japan.
46
Ambrogioni L, Ólafsdóttir HF. Rethinking the hippocampal cognitive map as a meta-learning computational module. Trends Cogn Sci 2023:S1364-6613(23)00128-6. [PMID: 37357064 DOI: 10.1016/j.tics.2023.05.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2023] [Revised: 04/26/2023] [Accepted: 05/24/2023] [Indexed: 06/27/2023]
Abstract
A hallmark of biological intelligence is the ability to adaptively draw on past experience to guide behaviour under novel situations. Yet, the neurobiological principles that underlie this form of meta-learning remain relatively unexplored. In this Opinion, we review the existing literature on hippocampal spatial representations and reinforcement learning theory and describe a novel theoretical framework that aims to account for biological meta-learning. We conjecture that so-called hippocampal cognitive maps of familiar environments are part of a larger meta-representation (meta-map) that encodes information states and sources, which supports exploration and provides a foundation for learning. We also introduce concrete hypotheses on how these generic states can be encoded using a principle of superposition.
Affiliation(s)
- Luca Ambrogioni
- Donders Institute for Brain, Cognition & Behaviour, Radboud Universiteit, Nijmegen, The Netherlands.
| | - H Freyja Ólafsdóttir
- Donders Institute for Brain, Cognition & Behaviour, Radboud Universiteit, Nijmegen, The Netherlands.
47
Poli F, Ghilardi T, Mars RB, Hinne M, Hunnius S. Eight-Month-Old Infants Meta-Learn by Downweighting Irrelevant Evidence. Open Mind (Camb) 2023; 7:141-155. [PMID: 37416070 PMCID: PMC10320826 DOI: 10.1162/opmi_a_00079] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2022] [Accepted: 04/06/2023] [Indexed: 07/08/2023] Open
Abstract
Infants learn to navigate the complexity of the physical and social world at an outstanding pace, but how they accomplish this learning is still largely unknown. Recent advances in human and artificial intelligence research propose that a key feature to achieving quick and efficient learning is meta-learning, the ability to make use of prior experiences to learn how to learn better in the future. Here we show that 8-month-old infants successfully engage in meta-learning within very short timespans after being exposed to a new learning environment. We developed a Bayesian model that captures how infants attribute informativity to incoming events, and how this process is optimized by the meta-parameters of their hierarchical models over the task structure. We fitted the model to infants' gaze behavior during a learning task. Our results reveal how infants actively use past experiences to generate new inductive biases that allow future learning to proceed faster.
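The flavor of this meta-learning, prior experience reshaping the inductive biases that govern future learning, can be captured with a toy empirical-Bayes calculation; the Beta prior and observation counts below are illustrative, not the paper's model. Hyperparameters distilled from past tasks let the learner reach an accurate estimate on a new task from far fewer observations than a naive learner would need.

```python
def beta_posterior_mean(a, b, successes, failures):
    """Posterior mean of a Bernoulli rate under a Beta(a, b) prior."""
    return (a + successes) / (a + b + successes + failures)

# Past tasks all had high success rates: summarize them as a Beta prior by
# simple moment matching (mean 0.9 with pseudo-count 10 -> Beta(9, 1)).
learned_prior = (9.0, 1.0)
uniform_prior = (1.0, 1.0)     # a learner with no meta-learned bias

# New task (true rate 0.9): only two observations, both successes.
obs_s, obs_f = 2, 0
with_meta = beta_posterior_mean(*learned_prior, obs_s, obs_f)
without_meta = beta_posterior_mean(*uniform_prior, obs_s, obs_f)

true_rate = 0.9
meta_error = abs(with_meta - true_rate)      # 11/12, already near the truth
naive_error = abs(without_meta - true_rate)  # 3/4, still far off
```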
Affiliation(s)
- Francesco Poli
- Donders Center for Cognition, Radboud University Nijmegen, Nijmegen, The Netherlands
| | - Tommaso Ghilardi
- Donders Center for Cognition, Radboud University Nijmegen, Nijmegen, The Netherlands
| | - Rogier B. Mars
- Donders Center for Cognition, Radboud University Nijmegen, Nijmegen, The Netherlands
- Nuffield Department of Clinical Neurosciences, Wellcome Centre for Integrative Neuroimaging, FMRIB, University of Oxford, John Radcliffe Hospital, Headington, Oxford, UK
| | - Max Hinne
- Donders Center for Cognition, Radboud University Nijmegen, Nijmegen, The Netherlands
| | - Sabine Hunnius
- Donders Center for Cognition, Radboud University Nijmegen, Nijmegen, The Netherlands
48
Brea J, Clayton NS, Gerstner W. Computational models of episodic-like memory in food-caching birds. Nat Commun 2023; 14:2979. [PMID: 37221167 DOI: 10.1038/s41467-023-38570-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2022] [Accepted: 05/08/2023] [Indexed: 05/25/2023] Open
Abstract
Birds of the crow family adapt food-caching strategies to anticipated needs at the time of cache recovery and rely on memory of the what, where and when of previous caching events to recover their hidden food. It is unclear if this behavior can be explained by simple associative learning or if it relies on higher cognitive processes like mental time-travel. We present a computational model and propose a neural implementation of food-caching behavior. The model has hunger variables for motivational control, reward-modulated update of retrieval and caching policies and an associative neural network for remembering caching events with a memory consolidation mechanism for flexible decoding of the age of a memory. Our methodology of formalizing experimental protocols is transferable to other domains and facilitates model evaluation and experiment design. Here, we show that memory-augmented, associative reinforcement learning without mental time-travel is sufficient to explain the results of 28 behavioral experiments with food-caching birds.
Affiliation(s)
- Johanni Brea
- School of Computer and Communication Science, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
- School of Life Science, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
| | - Nicola S Clayton
- Department of Psychology, University of Cambridge, Cambridge, UK
| | - Wulfram Gerstner
- School of Computer and Communication Science, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- School of Life Science, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
49
Coldren J. Conditions under which college students cease learning. Front Psychol 2023; 14:1116853. [PMID: 37151351 PMCID: PMC10157072 DOI: 10.3389/fpsyg.2023.1116853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2022] [Accepted: 03/30/2023] [Indexed: 05/09/2023] Open
Abstract
Introduction: Effective learning involves the acquisition of information toward a goal and cessation upon reaching that goal. Whereas the process of acquisition is well understood, comparatively little is known about how or when learning ceases under naturalistic, open-ended learning conditions in which the criterion for performance is not specified. Ideally, learning should cease once there is no progress toward the goal, although this has never been directly tested in human learners. The present set of experiments explored the conditions under which college students stopped attempting to learn a series of inductive perceptual discrimination problems.
Methods: Each problem varied by whether it was solvable and whether it had a criterion for success. The first problem was solvable and involved a pre-determined criterion. The second problem was solvable but had no criterion for ending it, so that learners eventually achieved a highly accurate level of performance (overlearning). The third problem was unsolvable, as the correct answer varied randomly across features. Measures included the number of trials attempted and the outcome of each problem.
Results and Discussion: Results revealed that college students rarely ceased learning in the overlearning or unsolvable problems, even though there was no possibility of further progress. Learning cessation increased only when time demands for completion were manipulated or the opportunity for reinforcement was reduced. These results suggest that human learners make laudable, but inefficient and unproductive, attempts to master problems they should abandon.
Affiliation(s)
- Jeffrey Coldren
- Department of Psychological Sciences and Counseling, Youngstown State University, Youngstown, OH, United States
50
Goudar V, Peysakhovich B, Freedman DJ, Buffalo EA, Wang XJ. Schema formation in a neural population subspace underlies learning-to-learn in flexible sensorimotor problem-solving. Nat Neurosci 2023; 26:879-890. [PMID: 37024575 DOI: 10.1038/s41593-023-01293-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2021] [Accepted: 02/27/2023] [Indexed: 04/08/2023]
Abstract
Learning-to-learn, a progressive speedup of learning while solving a series of similar problems, represents a core process of knowledge acquisition that draws attention in both neuroscience and artificial intelligence. To investigate its underlying brain mechanism, we trained a recurrent neural network model on arbitrary sensorimotor mappings known to depend on the prefrontal cortex. The network displayed an exponential time course of accelerated learning. The neural substrate of a schema emerges within a low-dimensional subspace of population activity; its reuse in new problems facilitates learning by limiting connection weight changes. Our work highlights the weight-driven modifications of the vector field, which determines the population trajectory of a recurrent network and behavior. Such plasticity is especially important for preserving and reusing the learned schema in spite of undesirable changes of the vector field due to the transition to learning a new problem; the accumulated changes across problems account for the learning-to-learn dynamics.
Affiliation(s)
- Vishwa Goudar
- Center for Neural Science, New York University, New York, NY, USA
| | | | - David J Freedman
- Department of Neurobiology, University of Chicago, Chicago, IL, USA
| | - Elizabeth A Buffalo
- Department of Physiology and Biophysics, University of Washington School of Medicine, Seattle, WA, USA
- Washington National Primate Research Center, Seattle, WA, USA
| | - Xiao-Jing Wang
- Center for Neural Science, New York University, New York, NY, USA.