1
Löwe AT, Touzo L, Muhle-Karbe PS, Saxe AM, Summerfield C, Schuck NW. Abrupt and spontaneous strategy switches emerge in simple regularised neural networks. PLoS Comput Biol 2024; 20:e1012505. PMID: 39432516; PMCID: PMC11527165; DOI: 10.1371/journal.pcbi.1012505.
Abstract
Humans sometimes have an insight that leads to a sudden and drastic performance improvement on the task they are working on. Sudden strategy adaptations are often linked to insights, considered to be a unique aspect of human cognition tied to complex processes such as creativity or meta-cognitive reasoning. Here, we take a learning perspective and ask whether insight-like behaviour can occur in simple artificial neural networks, even when the models only learn to form input-output associations through gradual gradient descent. We compared learning dynamics in humans and regularised neural networks in a perceptual decision task that included a hidden regularity to solve the task more efficiently. Our results show that only some humans discover this regularity, and that behaviour is marked by a sudden and abrupt strategy switch that reflects an aha-moment. Notably, we find that simple neural networks with a gradual learning rule and a constant learning rate closely mimicked behavioural characteristics of human insight-like switches, exhibiting delay of insight, suddenness and selective occurrence in only some networks. Analyses of network architectures and learning dynamics revealed that insight-like behaviour crucially depended on a regularised gating mechanism and noise added to gradient updates, which allowed the networks to accumulate "silent knowledge" that is initially suppressed by regularised gating. This suggests that insight-like behaviour can arise from gradual learning in simple neural networks, where it reflects the combined influences of noise, gating and regularisation. These results have potential implications for more complex systems, such as the brain, and guide the way for future insight research.
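The mechanism described here can be illustrated with a toy gated linear model. The Python/NumPy sketch below is not the authors' network or task; it assumes a two-cue classification problem in which a noise-free "hidden" cue sits alongside a noisy one, a multiplicative gate per cue with an L1 penalty, and Gaussian noise added to every gradient step. With these ingredients the weight on the hidden cue can grow "silently" while its gate stays suppressed, until the gate opens and performance jumps.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy task: cue A is noisy but always predictive; cue B is a noise-free
    # "hidden regularity" that allows perfect performance once it is used.
    def make_trial(rng, noise_a=0.6):
        label = rng.choice([-1.0, 1.0])
        x_a = label + noise_a * rng.normal()
        x_b = label
        return np.array([x_a, x_b]), label

    # Gated linear model: y_hat = sum_i g_i * w_i * x_i, with an L1 penalty on the
    # gates g and Gaussian noise added to every gradient update.
    w = 0.1 * rng.normal(size=2)
    g = np.array([0.5, 0.01])            # gate on the hidden cue starts near zero
    lr, lam, sigma = 0.05, 0.01, 0.05    # learning rate, L1 strength, gradient noise

    for t in range(5000):
        x, y = make_trial(rng)
        err = np.dot(g * w, x) - y
        grad_w = err * g * x
        grad_g = err * w * x + lam * np.sign(g)
        w -= lr * (grad_w + sigma * rng.normal(size=2))
        g -= lr * (grad_g + sigma * rng.normal(size=2))

    # A delayed, abrupt rise of g[1] would be the network-level analogue of an insight.
    print("weights:", np.round(w, 2), " gates:", np.round(g, 2))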
Affiliation(s)
- Anika T. Löwe
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Berlin, Germany
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Berlin, Germany
- Institute of Psychology, Universität Hamburg, Hamburg, Germany
- Léo Touzo
- Laboratoire de Physique de l’Ecole Normale Supérieure, CNRS, ENS, Université PSL, Sorbonne Université, Université Paris Cité, Paris, France
- Paul S. Muhle-Karbe
- Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- School of Psychology, University of Birmingham, Birmingham, United Kingdom
- Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
- Andrew M. Saxe
- Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
- Sainsbury Wellcome Centre, University College London, London, United Kingdom
- CIFAR Azrieli Global Scholar, CIFAR, Toronto, Canada
- Nicolas W. Schuck
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Berlin, Germany
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Berlin, Germany
- Institute of Psychology, Universität Hamburg, Hamburg, Germany
2
Chen X, Bialek W. Searching for long timescales without fine tuning. Phys Rev E 2024; 110:034407. PMID: 39425360; DOI: 10.1103/physreve.110.034407.
Abstract
Animal behavior occurs on timescales much longer than the response times of individual neurons. In many cases, it is plausible that these long timescales emerge from the recurrent dynamics of electrical activity in networks of neurons. In linear models, timescales are set by the eigenvalues of a dynamical matrix whose elements measure the strengths of synaptic connections between neurons. It is not clear to what extent these matrix elements need to be tuned to generate long timescales; in some cases, one needs not just a single long timescale but a whole range. Starting from the simplest case of random symmetric connections, we combine maximum entropy and random matrix theory methods to construct ensembles of networks, exploring the constraints required for long timescales to become generic. We argue that a single long timescale can emerge generically from realistic constraints, but a full spectrum of slow modes requires more tuning. Langevin dynamics that generates patterns of synaptic connections drawn from these ensembles involves a combination of Hebbian learning and activity-dependent synaptic scaling.
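To make the linear setting concrete, the sketch below builds a symmetric random connectivity matrix and reads off relaxation timescales of the linear rate dynamics dx/dt = -x + Jx, where eigenmode i decays with τ_i = 1/(1 - λ_i). The scaling and the value g = 0.95 are illustrative choices, not taken from the paper; the point is that a single long timescale needs the top eigenvalue close to 1, and a full spectrum of slow modes needs many eigenvalues there.

    import numpy as np

    rng = np.random.default_rng(1)
    N, g = 500, 0.95

    # GOE-like symmetric couplings, rescaled so the spectrum lies roughly in [-g, g].
    A = rng.normal(size=(N, N)) / np.sqrt(N)
    A = (A + A.T) / np.sqrt(2)           # semicircle spectrum of radius ~2
    J = (g / 2.0) * A

    # Linear rate dynamics dx/dt = -x + J x: eigenmode i relaxes with
    # timescale tau_i = 1 / (1 - lambda_i), so long timescales need lambda_i near 1.
    lam = np.linalg.eigvalsh(J)
    tau = 1.0 / (1.0 - lam)
    print("largest eigenvalue :", round(lam[-1], 3))
    print("slowest timescales :", np.round(tau[-5:][::-1], 1))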
Affiliation(s)
- Xiaowen Chen
- Joseph Henry Laboratories of Physics, and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, PSL Université, CNRS, Sorbonne Université, Université Paris Cité, F-75005 Paris, France
- William Bialek
- Joseph Henry Laboratories of Physics, and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, 365 Fifth Avenue, New York, New York 10016, USA
3
Pazó D. Discontinuous transition to chaos in a canonical random neural network. Phys Rev E 2024; 110:014201. PMID: 39161016; DOI: 10.1103/physreve.110.014201.
Abstract
We study a paradigmatic random recurrent neural network introduced by Sompolinsky, Crisanti, and Sommers (SCS). In the infinite size limit, this system exhibits a direct transition from a homogeneous rest state to chaotic behavior, with the Lyapunov exponent gradually increasing from zero. We generalize the SCS model by considering odd saturating nonlinear transfer functions, beyond the usual choice ϕ(x) = tanh(x). A discontinuous transition to chaos occurs whenever the slope of ϕ at 0 is a local minimum [i.e., for ϕ'''(0) > 0]. Chaos appears out of the blue, by an attractor-repeller fold. Accordingly, the Lyapunov exponent stays away from zero at the birth of chaos.
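A minimal numerical companion to the SCS model is sketched below: the classic network dx_i/dt = -x_i + Σ_j J_ij ϕ(x_j) with Gaussian couplings of standard deviation g/√N and ϕ = tanh, together with a Benettin-style estimate of the largest Lyapunov exponent. All parameter values are illustrative; probing the discontinuous transition discussed in the paper would mean swapping in an odd transfer function whose slope at 0 is a local minimum and scanning g.

    import numpy as np

    rng = np.random.default_rng(2)
    N, g, dt = 200, 1.5, 0.05
    J = g * rng.normal(size=(N, N)) / np.sqrt(N)
    phi = np.tanh                        # classic SCS choice; the paper studies other odd functions

    def step(x):
        return x + dt * (-x + J @ phi(x))

    x = rng.normal(size=N)
    for _ in range(2000):                # discard the transient
        x = step(x)

    # Benettin-style estimate of the largest Lyapunov exponent: evolve a reference
    # and a slightly perturbed trajectory, renormalising the gap after every step.
    d0 = 1e-8
    y = x + d0 * rng.normal(size=N) / np.sqrt(N)
    lyap_sum, steps = 0.0, 4000
    for _ in range(steps):
        x, y = step(x), step(y)
        d = np.linalg.norm(y - x)
        lyap_sum += np.log(d / d0)
        y = x + (y - x) * (d0 / d)
    print("largest Lyapunov exponent ≈", round(lyap_sum / (steps * dt), 3))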
4
Marzen SE, Riechers PM, Crutchfield JP. Complexity-calibrated benchmarks for machine learning reveal when prediction algorithms succeed and mislead. Sci Rep 2024; 14:8727. PMID: 38622279; PMCID: PMC11018857; DOI: 10.1038/s41598-024-58814-0.
Abstract
Recurrent neural networks are used to forecast time series in finance, climate, language, and many other domains. Reservoir computers are a particularly easily trainable form of recurrent neural network. Recently, a "next-generation" reservoir computer was introduced in which the memory trace involves only a finite number of previous symbols. We explore the inherent limitations of finite-past memory traces in this intriguing proposal. A lower bound from Fano's inequality shows that, on highly non-Markovian processes generated by large probabilistic state machines, next-generation reservoir computers with reasonably long memory traces have an error probability that is at least ∼60% higher than the minimal attainable error probability in predicting the next observation. More generally, it appears that popular recurrent neural networks fall far short of optimally predicting such complex processes. These results highlight the need for a new generation of optimized recurrent neural network architectures. Alongside this finding, we present concentration-of-measure results for randomly generated but complex processes. One conclusion is that large probabilistic state machines, specifically large ϵ-machines, are key to generating challenging and structurally unbiased stimuli for ground-truthing recurrent neural network architectures.
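The limitation of finite-past memory traces can be seen on a toy ε-machine. The sketch below is not the paper's benchmark suite; it assumes the classic "even process" (runs of 1s always have even length), whose optimal predictor must track the parity of the current run, something no fixed-length window of past symbols can always recover. A purely window-based predictor therefore keeps an excess error even as the window grows.

    import numpy as np

    rng = np.random.default_rng(3)

    # "Even process": a tiny two-state probabilistic state machine in which
    # runs of 1s always have even length.
    def even_process(T, p=0.5):
        s, out = 0, []                     # state 0 = "A", state 1 = "B"
        for _ in range(T):
            if s == 0:
                x = int(rng.random() < p)
                s = x                      # emit 1 -> go to B, emit 0 -> stay in A
            else:
                x, s = 1, 0                # B always emits 1 and returns to A
            out.append(x)
        return np.array(out, dtype=np.int8)

    x = even_process(200_000)

    # Finite-window predictor: estimate P(next symbol | last k symbols) from counts
    # on a training half, then measure the error rate on a held-out half.
    def window_error(x, k):
        train, test = x[:100_000], x[100_000:]
        counts = {}
        for t in range(k, len(train)):
            c = counts.setdefault(tuple(train[t - k:t]), [0, 0])
            c[train[t]] += 1
        errs = 0
        for t in range(k, len(test)):
            c = counts.get(tuple(test[t - k:t]), [1, 1])
            errs += int((c[1] > c[0]) != test[t])
        return errs / (len(test) - k)

    for k in (1, 2, 4, 8):
        print(f"window k={k}: error rate {window_error(x, k):.3f}")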
Affiliation(s)
- Sarah E Marzen
- W. M. Keck Science Department of Pitzer, Scripps, and Claremont McKenna College, Claremont, CA, 91711, USA.
- Paul M Riechers
- Beyond Institute for Theoretical Science, San Francisco, CA, USA
- James P Crutchfield
- Complexity Sciences Center and Physics Department, University of California at Davis, One Shields Avenue, Davis, CA, 95616, USA
5
Pang R, Baker C, Murthy M, Pillow J. Inferring neural dynamics of memory during naturalistic social communication. bioRxiv [Preprint] 2024:2024.01.26.577404. PMID: 38328156; PMCID: PMC10849655; DOI: 10.1101/2024.01.26.577404.
Abstract
Memory processes in complex behaviors like social communication require forming representations of the past that grow with time. The neural mechanisms that support such continually growing memory remain unknown. We address this gap in the context of fly courtship, a natural social behavior involving the production and perception of long, complex song sequences. To study female memory for male song history in unrestrained courtship, we present 'Natural Continuation' (NC), a general, simulation-based model comparison procedure to evaluate candidate neural codes for complex stimuli using naturalistic behavioral data. Applying NC to fly courtship revealed strong evidence for an adaptive population mechanism by which female auditory neural dynamics could convert long song histories into a rich mnemonic format. Song temporal patterning is continually transformed by heterogeneous nonlinear adaptation dynamics, then integrated into persistent activity, enabling common neural mechanisms to retain continuously unfolding information over long periods and yielding state-of-the-art predictions of female courtship behavior. At a population level, this coding model produces multi-dimensional advection-diffusion-like responses that separate songs over a continuum of timescales and can be linearly transformed into flexible output signals, illustrating its potential to create a generic, scalable mnemonic format for extended input signals poised to drive complex behavioral responses. This work thus shows how naturalistic behavior can directly inform neural population coding models, revealing here a novel process for memory formation.
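The adaptation-then-integration motif can be sketched generically. The code below is not the fitted model from the paper; it assumes a bank of units with subtractive adaptation at log-spaced timescales driving slow leaky integrators, so that a pulse-train "song" is converted into a multi-timescale memory vector.

    import numpy as np

    rng = np.random.default_rng(4)

    # Toy "song": a sparse binary pulse train with irregular inter-pulse intervals.
    dt, T = 0.001, 30.0
    t = np.arange(0, T, dt)
    song = (rng.random(t.size) < 0.01).astype(float)

    # Stage 1: heterogeneous subtractive adaptation -- each unit high-pass filters
    # the input with its own timescale, re-encoding the song's temporal patterning.
    taus = np.logspace(-1.5, 1.0, 16)      # adaptation timescales, ~30 ms to 10 s
    a = np.zeros_like(taus)
    # Stage 2: near-persistent integrators accumulate the adapted responses.
    m = np.zeros_like(taus)
    tau_m = 60.0                           # slow integrator leak (seconds)

    M = np.zeros((t.size, taus.size))
    for k, s in enumerate(song):
        r = np.maximum(s - a, 0.0)         # adapted (rectified) response
        a += dt * (s - a) / taus           # adaptation state tracks the input
        m += dt * (r - m / tau_m)          # leaky integration of the adapted drive
        M[k] = m

    # M holds a multi-timescale trace of the song history, one column per timescale.
    print("final memory vector:", np.round(M[-1], 3))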
Affiliation(s)
- Rich Pang
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Center for the Physics of Biological Function, Princeton, NJ and New York, NY, USA
- Christa Baker
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Present address: Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA
- Mala Murthy
- Princeton Neuroscience Institute, Princeton, NJ, USA
6
Clark DG, Abbott LF, Litwin-Kumar A. Dimension of Activity in Random Neural Networks. Phys Rev Lett 2023; 131:118401. PMID: 37774280; DOI: 10.1103/physrevlett.131.118401.
Abstract
Neural networks are high-dimensional nonlinear dynamical systems that process information through the coordinated activity of many connected units. Understanding how biological and machine-learning networks function and learn requires knowledge of the structure of this coordinated activity, information contained, for example, in cross covariances between units. Self-consistent dynamical mean field theory (DMFT) has elucidated several features of random neural networks, in particular that they can generate chaotic activity; however, a calculation of cross covariances using this approach has not been provided. Here, we calculate cross covariances self-consistently via a two-site cavity DMFT. We use this theory to probe spatiotemporal features of activity coordination in a classic random-network model with independent and identically distributed (i.i.d.) couplings, showing an extensive but fractionally low effective dimension of activity and a long population-level timescale. Our formulas apply to a wide range of single-unit dynamics and generalize to non-i.i.d. couplings. As an example of the latter, we analyze the case of partially symmetric couplings.
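A simple numerical counterpart of the quantity studied here: simulate the classic random rate network, form the equal-time covariance of the rates, and take the participation ratio of its eigenvalues as the effective dimension. The sketch below does that for one illustrative realization (N = 400, g = 2); it is a single-realization simulation estimate, not the paper's cavity/DMFT calculation.

    import numpy as np

    rng = np.random.default_rng(5)
    N, g, dt = 400, 2.0, 0.05
    J = g * rng.normal(size=(N, N)) / np.sqrt(N)

    # Simulate the classic chaotic rate network dx/dt = -x + J tanh(x).
    x = rng.normal(size=N)
    for _ in range(2000):                      # discard transient
        x = x + dt * (-x + J @ np.tanh(x))

    T = 10000
    X = np.empty((T, N))
    for k in range(T):
        x = x + dt * (-x + J @ np.tanh(x))
        X[k] = x

    # Effective dimension from the equal-time covariance of the rates:
    # participation ratio D = (sum of eigenvalues)^2 / sum of squared eigenvalues.
    C = np.cov(np.tanh(X), rowvar=False)
    lam = np.linalg.eigvalsh(C)
    D = lam.sum() ** 2 / (lam ** 2).sum()
    print(f"effective dimension ≈ {D:.1f} out of N = {N} units")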
Affiliation(s)
- David G Clark
- Zuckerman Institute, Department of Neuroscience, Columbia University, New York, New York 10027, USA
- L F Abbott
- Zuckerman Institute, Department of Neuroscience, Columbia University, New York, New York 10027, USA
- Ashok Litwin-Kumar
- Zuckerman Institute, Department of Neuroscience, Columbia University, New York, New York 10027, USA
7
Ogawa S, Fumarola F, Mazzucato L. Multitasking via baseline control in recurrent neural networks. Proc Natl Acad Sci U S A 2023; 120:e2304394120. PMID: 37549275; PMCID: PMC10437433; DOI: 10.1073/pnas.2304394120.
Abstract
Changes in behavioral state, such as arousal and movements, strongly affect neural activity in sensory areas and can be modeled as long-range projections regulating the mean and variance of baseline input currents. What are the computational benefits of these baseline modulations? We investigate this question within a brain-inspired framework for reservoir computing, where we vary the quenched baseline inputs to a recurrent neural network with random couplings. We found that baseline modulations control the dynamical phase of the reservoir network, unlocking a vast repertoire of network phases. We uncovered a number of bistable phases exhibiting the simultaneous coexistence of fixed points and chaos, of two fixed points, and of weak and strong chaos. We identified several phenomena, including noise-driven enhancement of chaos, ergodicity breaking, and neural hysteresis, whereby transitions across a phase boundary retain the memory of the preceding phase. In each bistable phase, the reservoir performs a different binary decision-making task. Fast switching between different tasks can be controlled by adjusting the baseline input mean and variance. Moreover, we found that the reservoir network achieves optimal memory performance at any first-order phase boundary. In summary, baseline control enables multitasking without any optimization of the network couplings, opening directions for brain-inspired artificial intelligence and providing an interpretation for the ubiquitously observed behavioral modulations of cortical activity.
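The effect of quenched baseline inputs can be probed with a crude numerical experiment: fix random couplings, add a baseline b_i drawn with mean μ and standard deviation σ, and measure whether the rates keep fluctuating in time (chaos) or settle down (a fixed point). The sketch below is an illustrative probe with arbitrary parameter values, not the paper's phase-diagram analysis.

    import numpy as np

    rng = np.random.default_rng(6)
    N, g, dt = 300, 1.8, 0.05
    J = g * rng.normal(size=(N, N)) / np.sqrt(N)

    def temporal_fluctuation(mu, sigma, burn=4000, record=2000):
        """Simulate dx/dt = -x + J tanh(x) + b with quenched baseline b_i ~ N(mu, sigma^2);
        return the mean temporal variance of the rates (≈0 at a fixed point, >0 if chaotic)."""
        b = mu + sigma * rng.normal(size=N)
        x = rng.normal(size=N)
        for _ in range(burn):                       # let transients die out
            x = x + dt * (-x + J @ np.tanh(x) + b)
        traj = np.empty((record, N))
        for k in range(record):
            x = x + dt * (-x + J @ np.tanh(x) + b)
            traj[k] = np.tanh(x)
        return traj.var(axis=0).mean()

    # Sweeping the baseline mean and spread moves the network between dynamical regimes.
    for mu, sigma in [(0.0, 0.0), (0.0, 1.0), (1.5, 0.0), (1.5, 1.0)]:
        print(f"mu={mu:.1f}, sigma={sigma:.1f}: temporal variance {temporal_fluctuation(mu, sigma):.4f}")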
Affiliation(s)
- Shun Ogawa
- Laboratory for Neural Computation and Adaptation, RIKEN Center for Brain Science, Wako, Saitama 351-0198, Japan
- Francesco Fumarola
- Laboratory for Neural Computation and Adaptation, RIKEN Center for Brain Science, Wako, Saitama 351-0198, Japan
- Luca Mazzucato
- Department of Biology, Institute of Neuroscience, University of Oregon, Eugene, OR 97403
- Department of Mathematics, Institute of Neuroscience, University of Oregon, Eugene, OR 97403
8
Tiberi L, Stapmanns J, Kühn T, Luu T, Dahmen D, Helias M. Gell-Mann-Low Criticality in Neural Networks. Phys Rev Lett 2022; 128:168301. PMID: 35522522; DOI: 10.1103/physrevlett.128.168301.
Abstract
Criticality is deeply related to optimal computational capacity. The lack of a renormalized theory of critical brain dynamics, however, so far limits insights into this form of biological information processing to mean-field results. These methods neglect a key feature of critical systems: the interaction between degrees of freedom across all length scales, required for complex nonlinear computation. We present a renormalized theory of a prototypical neural field theory, the stochastic Wilson-Cowan equation. We compute the flow of couplings, which parametrize interactions on increasing length scales. Despite similarities with the Kardar-Parisi-Zhang model, the theory is of a Gell-Mann-Low type, the archetypal form of a renormalizable quantum field theory. Here, nonlinear couplings vanish, flowing towards the Gaussian fixed point, but logarithmically slowly, thus remaining effective on most scales. We show this critical structure of interactions to implement a desirable trade-off between linearity, optimal for information storage, and nonlinearity, required for computation.
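For orientation, the sketch below integrates a generic one-dimensional Wilson-Cowan-type stochastic field with Euler-Maruyama: leak, diffusive coupling, a saturating nonlinearity tuned just past the linear instability, and additive noise. The additive-noise form and all parameter values are assumptions of this sketch, not the action analysed in the paper, whose results come from renormalization-group methods rather than simulation.

    import numpy as np

    rng = np.random.default_rng(7)

    # 1D lattice with periodic boundaries.
    L, dx, dt, T = 128, 1.0, 0.01, 2000
    D, g, sigma = 1.0, 1.02, 0.05          # coupling slightly above the linear instability

    u = 0.01 * rng.normal(size=L)
    phi = np.tanh
    snapshots = []
    for step in range(T):
        lap = (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx ** 2
        noise = sigma * np.sqrt(dt) * rng.normal(size=L)
        u = u + dt * (-u + D * lap + phi(g * u)) + noise
        if step % 200 == 0:
            snapshots.append(u.copy())

    # Near the instability the field develops slowly decaying, spatially extended fluctuations.
    print("field std over time:", [f"{s.std():.3f}" for s in snapshots])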
Affiliation(s)
- Lorenzo Tiberi
- Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA-Institute Brain Structure-Function Relationships (INM-10), Jülich Research Centre, 52425 Jülich, Germany
- Institute for Theoretical Solid State Physics, RWTH Aachen University, 52074 Aachen, Germany
- Center for Advanced Simulation and Analytics, Forschungszentrum Jülich, 52425 Jülich, Germany
- Jonas Stapmanns
- Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA-Institute Brain Structure-Function Relationships (INM-10), Jülich Research Centre, 52425 Jülich, Germany
- Institute for Theoretical Solid State Physics, RWTH Aachen University, 52074 Aachen, Germany
- Tobias Kühn
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Thomas Luu
- Center for Advanced Simulation and Analytics, Forschungszentrum Jülich, 52425 Jülich, Germany
- Institut für Kernphysik (IKP-3), Institute for Advanced Simulation (IAS-4) and Jülich Center for Hadron Physics, Jülich Research Centre, 52425 Jülich, Germany
- David Dahmen
- Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA-Institute Brain Structure-Function Relationships (INM-10), Jülich Research Centre, 52425 Jülich, Germany
- Moritz Helias
- Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA-Institute Brain Structure-Function Relationships (INM-10), Jülich Research Centre, 52425 Jülich, Germany
- Institute for Theoretical Solid State Physics, RWTH Aachen University, 52074 Aachen, Germany
- Center for Advanced Simulation and Analytics, Forschungszentrum Jülich, 52425 Jülich, Germany