1
Agliari E, Alemanno F, Aquaro M, Fachechi A. Regularization, early-stopping and dreaming: A Hopfield-like setup to address generalization and overfitting. Neural Netw 2024; 177:106389. PMID: 38788291. DOI: 10.1016/j.neunet.2024.106389.
Abstract
In this work we approach attractor neural networks from a machine learning perspective: we look for optimal network parameters by applying gradient descent to a regularized loss function. Within this framework, the optimal neuron-interaction matrices turn out to be a class of matrices corresponding to Hebbian kernels revised by a reiterated unlearning protocol. Remarkably, the extent of such unlearning is proved to be related to the regularization hyperparameter of the loss function and to the training time. Thus, we can design strategies to avoid overfitting that are formulated in terms of regularization and early-stopping tuning. The generalization capabilities of these attractor networks are also investigated: analytical results are obtained for random synthetic datasets; the emerging picture is then corroborated by numerical experiments that highlight the existence of several regimes (i.e., overfitting, failure and success) as the dataset parameters are varied.
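A minimal sketch of the kind of training loop described in the abstract: gradient descent on a regularized quadratic loss over a symmetric coupling matrix J, starting from the Hebbian kernel. The specific loss, sizes and hyperparameters below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 50, 10                        # neurons and random synthetic patterns (assumed sizes)
lam, lr, steps = 0.01, 0.05, 2000    # regularization strength, learning rate, training time

xi = rng.choice([-1.0, 1.0], size=(P, N))   # random patterns
J = (xi.T @ xi) / N                          # Hebbian initialisation
np.fill_diagonal(J, 0.0)

for _ in range(steps):
    # Quadratic loss on the local fields h = J xi (each pattern should be a fixed point)
    # plus an L2 penalty on J; per the abstract, the amount of effective unlearning is
    # controlled by lam and by the number of gradient steps (early stopping).
    h = xi @ J.T                                     # (P, N) local fields
    grad = -(xi - h).T @ xi / (N * P) + lam * J      # gradient of the regularized loss
    J -= lr * 0.5 * (grad + grad.T)                  # symmetrized update
    np.fill_diagonal(J, 0.0)                         # no self-couplings
```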
Affiliation(s)
- E Agliari: Dipartimento di Matematica "Guido Castelnuovo", Sapienza Università di Roma, Italy; GNFM-INdAM, Gruppo Nazionale di Fisica Matematica (Istituto Nazionale di Alta Matematica), Italy.
- F Alemanno: Dipartimento di Matematica, Università di Bologna, Italy; GNFM-INdAM, Gruppo Nazionale di Fisica Matematica (Istituto Nazionale di Alta Matematica), Italy.
- M Aquaro: Dipartimento di Matematica "Guido Castelnuovo", Sapienza Università di Roma, Italy; GNFM-INdAM, Gruppo Nazionale di Fisica Matematica (Istituto Nazionale di Alta Matematica), Italy.
- A Fachechi: Dipartimento di Matematica "Guido Castelnuovo", Sapienza Università di Roma, Italy; GNFM-INdAM, Gruppo Nazionale di Fisica Matematica (Istituto Nazionale di Alta Matematica), Italy.
2
Mildner JN, Tamir DI. Why do we think? The dynamics of spontaneous thought reveal its functions. PNAS Nexus 2024; 3:pgae230. PMID: 38939015. PMCID: PMC11210302. DOI: 10.1093/pnasnexus/pgae230.
Abstract
Spontaneous thought (mind wandering, daydreaming, and creative ideation) makes up most of everyday cognition. Is this idle thought, or does it serve an adaptive function? We test two hypotheses about the functions of spontaneous thought. First, spontaneous thought improves memory efficiency; under this hypothesis, spontaneous thought should prioritize detailed, vivid episodic simulations. Second, spontaneous thought helps us achieve our goals; under this hypothesis, spontaneous thought should prioritize content relevant to ongoing goal pursuits, or current concerns. We use natural language processing and machine learning to quantify the dynamics of thought in a large sample (N = 3,359) of think-aloud data. Results suggest that spontaneous thought both supports memory optimization and keeps us focused on current concerns.
Affiliation(s)
- Judith N Mildner: Department of Psychology, Princeton University, Princeton, NJ 08540, USA.
- Diana I Tamir: Department of Psychology, Princeton University, Princeton, NJ 08540, USA.
3
Agliari E, Alemanno F, Aquaro M, Barra A, Durante F, Kanter I. Hebbian dreaming for small datasets. Neural Netw 2024; 173:106174. PMID: 38359641. DOI: 10.1016/j.neunet.2024.106174.
Abstract
The dreaming Hopfield model constitutes a generalization of the Hebbian paradigm for neural networks that is able to perform on-line learning when "awake" and also to account for off-line "sleeping" mechanisms. The latter have been shown to enhance storage such that, in the long sleep-time limit, the model reaches the maximal storage capacity achievable by networks equipped with symmetric pairwise interactions. In this paper, we inspect the minimal amount of information that must be supplied to such a network to guarantee successful generalization, and we test it both on random synthetic and on standard structured datasets (i.e., MNIST, Fashion-MNIST and Olivetti). By comparing these minimal information thresholds with those required by the standard (i.e., always "awake") Hopfield model, we prove that the present network can save up to ∼90% of the dataset size while preserving the same performance as its standard counterpart. This suggests that sleep may play a pivotal role in explaining the gap between the large volumes of data required to train artificial neural networks and the relatively small volumes needed by their biological counterparts. Further, we prove that the model's Cost function (typically used in statistical mechanics) admits a representation in terms of a standard Loss function (typically used in machine learning), which allows us to analyze its emergent computational skills both theoretically and computationally: a quantitative picture of its capabilities as a function of its control parameters is achieved, and consistency between the two approaches is highlighted. The resulting network is an associative memory for pattern recognition tasks that learns from examples on-line, generalizes correctly (in suitable regions of its control parameters) and optimizes its storage capacity by off-line sleeping: such a reduction of the training cost can be inspiring toward sustainable AI and for situations where data are relatively sparse.
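The abstract does not reproduce the coupling matrix, but in this line of work the dreaming (sleep) kernel is commonly written as below, with t >= 0 the sleep time; we quote this standard form as background, not as a statement of the paper's exact definition:

$$
J_{ij}(t)=\frac{1}{N}\sum_{\mu,\nu=1}^{P}\xi_i^{\mu}\left(\frac{1+t}{\mathbb{1}+t\,C}\right)_{\mu\nu}\xi_j^{\nu},
\qquad
C_{\mu\nu}=\frac{1}{N}\sum_{k=1}^{N}\xi_k^{\mu}\xi_k^{\nu}.
$$

Here t = 0 recovers the Hebbian kernel of the standard (always "awake") model, while t → ∞ gives the projection rule, consistent with the maximal-capacity limit mentioned in the abstract.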
Affiliation(s)
- Elena Agliari: Department of Mathematics of Sapienza Università di Roma, Rome, Italy.
- Francesco Alemanno: Department of Mathematics and Physics of Università del Salento, Lecce, Italy.
- Miriam Aquaro: Department of Mathematics of Sapienza Università di Roma, Rome, Italy.
- Adriano Barra: Department of Mathematics and Physics of Università del Salento, Lecce, Italy.
- Fabrizio Durante: Department of Economic Sciences of Università del Salento, Lecce, Italy.
- Ido Kanter: Department of Physics of Bar-Ilan University, Ramat Gan, Israel.
4
Nicolle A, Deng S, Ihme M, Kuzhagaliyeva N, Ibrahim EA, Farooq A. Mixtures Recomposition by Neural Nets: A Multidisciplinary Overview. J Chem Inf Model 2024; 64:597-620. PMID: 38284618. DOI: 10.1021/acs.jcim.3c01633.
Abstract
Artificial Neural Networks (ANNs) are transforming how we understand chemical mixtures, providing an expressive view of the chemical space and multiscale processes. Their hybridization with physical knowledge can bridge the gap between predictivity and understanding of the underlying processes. This overview explores recent progress in ANNs, particularly their potential in the 'recomposition' of chemical mixtures. Graph-based representations reveal patterns among mixture components, and deep learning models excel in capturing complexity and symmetries when compared to traditional Quantitative Structure-Property Relationship models. Key components, such as Hamiltonian networks and convolution operations, play a central role in representing multiscale mixtures. The integration of ANNs with Chemical Reaction Networks and Physics-Informed Neural Networks for inverse chemical kinetic problems is also examined. The combination of sensors with ANNs shows promise in optical and biomimetic applications. A common ground is identified in the context of statistical physics, where ANN-based methods iteratively adapt their models by blending their initial states with training data. The concept of mixture recomposition unveils a reciprocal inspiration between ANNs and reactive mixtures, highlighting learning behaviors influenced by the training environment.
Affiliation(s)
- Andre Nicolle: Aramco Fuel Research Center, Rueil-Malmaison 92852, France.
- Sili Deng: Massachusetts Institute of Technology, Cambridge 02139, Massachusetts, United States.
- Matthias Ihme: Stanford University, Stanford 94305, California, United States.
- Emad Al Ibrahim: King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia.
- Aamir Farooq: King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia.
5
Shen X, Wu Y, Lou X, Li Z, Ma L, Bian X. Central pattern generator network model for the alternating hind limb gait of rats based on the modified Van der Pol equation. Med Biol Eng Comput 2023; 61:555-566. PMID: 36538267. DOI: 10.1007/s11517-022-02734-6.
Abstract
Herein, we studied the central pattern generator (CPG), the spinal cord neural network that regulates lower-limb gait, using intra-spinal micro-stimulation (ISMS). In particular, ISMS was used to determine the spatial distribution of CPG sites in the spinal cord and the stimulation pattern that induces the CPG network to produce coordinated actions. Based on the oscillatory behaviour of single CPG neurons modelled as Van der Pol (VDP) oscillators, a double-cell CPG neural network model was constructed to represent the two lower limbs and their six joints, simulating 12 neural circuits, the CPG loci whose stimulation induces alternating movements, changes in the polarity of stimulation signals in rat hindlimbs, and leg-state transitions. The feasibility and effectiveness of the CPG neural network were verified by recording the electromyographic burst-release patterns of the flexor and extensor muscles of the knee joints during CPG electrical stimulation. The results revealed that the output pattern of the CPG exhibited stable rhythm and coordination. The 12-neuron CPG model based on the improved VDP equation achieves single-point control while significantly reducing the number of stimulation electrodes needed in gait training of spinal cord injury patients. We believe that this study advances motor function recovery in rehabilitation medicine.
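As a rough illustration of the building block (not the paper's 12-neuron model, whose modified VDP equation and parameters are not given in the abstract), two classical Van der Pol oscillators with mutual inhibitory coupling can produce the anti-phase rhythm underlying alternating gait; the parameter values here are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

MU, OMEGA, K = 1.0, 2.0, 0.3      # damping, frequency, inhibitory coupling (assumed values)

def half_centre(t, y):
    # y = [x1, v1, x2, v2]: two classical Van der Pol units, each inhibiting the other;
    # started from mirror-image initial conditions the pair stays in anti-phase,
    # mimicking alternating flexor/extensor (or left/right leg) activity.
    x1, v1, x2, v2 = y
    a1 = MU * (1 - x1**2) * v1 - OMEGA**2 * (x1 + K * x2)
    a2 = MU * (1 - x2**2) * v2 - OMEGA**2 * (x2 + K * x1)
    return [v1, a1, v2, a2]

sol = solve_ivp(half_centre, (0.0, 60.0), [0.2, 0.0, -0.2, 0.0], max_step=0.01)
# sol.y[0] and sol.y[2] oscillate half a period apart: the alternating rhythm
# that the full CPG model reproduces joint by joint.
```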
Affiliation(s)
- Xiaoyan Shen: School of Information Science and Technology, Nantong University, 9 Seyuan Road, Nantong 226019, Jiangsu Province, China; Nantong Research Institute for Advanced Communication Technologies, Nantong, Jiangsu, China; Co-Innovation Center of Neuroregeneration, Nantong University, Nantong, Jiangsu, China.
- Yan Wu: School of Information Science and Technology, Nantong University, 9 Seyuan Road, Nantong 226019, Jiangsu Province, China.
- Xiongjie Lou: School of Information Science and Technology, Nantong University, 9 Seyuan Road, Nantong 226019, Jiangsu Province, China.
- Zhiling Li: School of Information Science and Technology, Nantong University, 9 Seyuan Road, Nantong 226019, Jiangsu Province, China.
- Lei Ma: School of Information Science and Technology, Nantong University, 9 Seyuan Road, Nantong 226019, Jiangsu Province, China.
- Xiongheng Bian: School of Information Science and Technology, Nantong University, 9 Seyuan Road, Nantong 226019, Jiangsu Province, China.
6
Jedlicka P, Tomko M, Robins A, Abraham WC. Contributions by metaplasticity to solving the Catastrophic Forgetting Problem. Trends Neurosci 2022; 45:656-666. PMID: 35798611. DOI: 10.1016/j.tins.2022.06.002.
Abstract
Catastrophic forgetting (CF) refers to the sudden and severe loss of prior information in learning systems when acquiring new information. CF has been an Achilles' heel of standard artificial neural networks (ANNs) when learning multiple tasks sequentially. The brain, by contrast, has solved this problem during evolution. Modellers now use a variety of strategies to overcome CF, many of which have parallels to cellular and circuit functions in the brain. One common strategy, based on metaplasticity phenomena, controls the future rate of change at key connections to help retain previously learned information. However, the metaplasticity properties used so far are only a subset of those existing in neurobiology. We propose that, as models become more sophisticated, there could be value in drawing on a richer set of metaplasticity rules, especially when promoting continual learning in agents moving about the environment.
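As a toy illustration of the strategy described (our own sketch, not a rule proposed in the review), a per-synapse plasticity rate that decays with accumulated change slows future updates at heavily used connections, protecting previously stored information while new learning continues elsewhere.

```python
import numpy as np

def metaplastic_step(w, grad, usage, base_lr=0.1, sharpness=5.0):
    """One illustrative metaplasticity-style update: connections that have already
    changed a lot become less plastic, so new learning is routed away from synapses
    that carry old memories. All names and constants here are assumptions."""
    lr = base_lr / (1.0 + sharpness * usage)   # per-synapse learning rate shrinks with use
    w_new = w - lr * grad
    usage = usage + np.abs(w_new - w)          # cumulative change as a crude importance estimate
    return w_new, usage

# Example: repeated updates on the same two synapses slow down over successive steps.
w, usage = np.zeros(4), np.zeros(4)
for _ in range(3):
    w, usage = metaplastic_step(w, np.array([1.0, 1.0, 0.0, 0.0]), usage)
```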
Affiliation(s)
- Peter Jedlicka: ICAR3R - Interdisciplinary Centre for 3Rs in Animal Research, Faculty of Medicine, Justus Liebig University, Giessen, Germany; Institute of Clinical Neuroanatomy, Neuroscience Center, Goethe University Frankfurt, Frankfurt/Main, Germany; Frankfurt Institute for Advanced Studies, Frankfurt 60438, Germany.
- Matus Tomko: ICAR3R - Interdisciplinary Centre for 3Rs in Animal Research, Faculty of Medicine, Justus Liebig University, Giessen, Germany; Institute of Molecular Physiology and Genetics, Centre of Biosciences, Slovak Academy of Sciences, Bratislava, Slovakia.
- Anthony Robins: Department of Computer Science, University of Otago, Dunedin 9016, New Zealand.
- Wickliffe C Abraham: Department of Psychology, Brain Health Research Centre, University of Otago, Dunedin 9054, New Zealand.
7
Fachechi A, Barra A, Agliari E, Alemanno F. Outperforming RBM Feature-Extraction Capabilities by "Dreaming" Mechanism. IEEE Trans Neural Netw Learn Syst 2022; PP:1172-1181. PMID: 35724278. DOI: 10.1109/tnnls.2022.3182882.
Abstract
Inspired by a formal equivalence between the Hopfield model and restricted Boltzmann machines (RBMs), we design a Boltzmann machine, referred to as the dreaming Boltzmann machine (DBM), which achieves better performance than the standard one. The novelty of our model lies in a precise prescription for intralayer connections among hidden neurons, whose strengths depend on feature correlations. We analyze learning and retrieval capabilities in DBMs, both theoretically and numerically, and compare them to the RBM reference. We find that, in a supervised scenario, the former significantly outperforms the latter. Furthermore, in the unsupervised case, the DBM achieves better performance in both feature extraction and representation learning, especially when the network is properly pretrained. Finally, we compare both models in simple classification tasks and find that the DBM again outperforms the RBM reference.
8
Benedetti M, Ventura E, Marinari E, Ruocco G, Zamponi F. Supervised perceptron learning versus unsupervised Hebbian unlearning: approaching optimal memory retrieval in Hopfield-like networks. J Chem Phys 2022; 156:104107. DOI: 10.1063/5.0084219.
9
10
Jiang Z, Zhou J, Hou T, Wong KYM, Huang H. Associative memory model with arbitrary Hebbian length. Phys Rev E 2021; 104:064306. PMID: 35030887. DOI: 10.1103/physreve.104.064306.
Abstract
Conversion of temporal to spatial correlations in the cortex is one of the most intriguing functions in the brain. The learning at synapses triggering the correlation conversion can take place in a wide integration window, whose influence on the correlation conversion remains elusive. Here we propose a generalized associative memory model of pattern sequences, in which pattern separations within an arbitrary Hebbian length are learned. The model can be analytically solved, and predicts that a small Hebbian length can already significantly enhance the correlation conversion, i.e., the stimulus-induced attractor can be highly correlated with a significant number of patterns in the stored sequence, thereby facilitating state transitions in the neural representation space. Moreover, an anti-Hebbian component is able to reshape the energy landscape of memories, akin to the memory regulation function during sleep. Our work thus establishes the fundamental connection between associative memory, Hebbian length, and correlation conversion in the brain.
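The notion of Hebbian length can be made concrete with a toy construction (ours, under stated assumptions; the exact kernel used in the paper may differ): besides the usual same-pattern Hebbian term, every stored pattern is also coupled to its successors up to d steps ahead in the sequence, which is how temporal adjacency gets written into spatial couplings.

```python
import numpy as np

rng = np.random.default_rng(3)
N, P, d = 200, 20, 3                        # neurons, sequence length, Hebbian length (assumed)

xi = rng.choice([-1.0, 1.0], size=(P, N))   # stored pattern sequence xi^1, ..., xi^P

J = np.zeros((N, N))
for k in range(d + 1):                      # k = 0 is the standard Hebbian term
    for mu in range(P - k):
        J += np.outer(xi[mu + k], xi[mu])   # associate each pattern with its k-step successor
J /= N
np.fill_diagonal(J, 0.0)
# With d > 0, the attractor reached from a stored pattern overlaps several of its
# neighbours in the sequence: the "correlation conversion" the abstract refers to.
```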
Affiliation(s)
- Zijian Jiang: PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou 510275, People's Republic of China.
- Jianwen Zhou: PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou 510275, People's Republic of China; CAS Key Laboratory for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, People's Republic of China.
- Tianqi Hou: Department of Physics, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, People's Republic of China; Theory Lab, Central Research Institute, 2012 Labs, Huawei Technologies Co., Ltd., Hong Kong Science Park, People's Republic of China.
- K Y Michael Wong: Department of Physics, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, People's Republic of China.
- Haiping Huang: PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou 510275, People's Republic of China.
11
Zhou J, Jiang Z, Hou T, Chen Z, Wong KYM, Huang H. Eigenvalue spectrum of neural networks with arbitrary Hebbian length. Phys Rev E 2021; 104:064307. PMID: 35030940. DOI: 10.1103/physreve.104.064307.
Abstract
Associative memory is a fundamental function in the brain. Here, we generalize the standard associative memory model to include long-range Hebbian interactions at the learning stage, corresponding to a large synaptic integration window. In our model, the Hebbian length can be arbitrarily large. The spectral density of the coupling matrix is derived using the replica method, which is also shown to be consistent with the results obtained by applying the free probability method. The maximal eigenvalue is then obtained by an iterative equation, related to the paramagnetic to spin glass transition in the model. Altogether, this work establishes the connection between the associative memory with arbitrary Hebbian length and the asymptotic eigen-spectrum of the neural-coupling matrix.
Affiliation(s)
- Jianwen Zhou: PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou 510275, People's Republic of China; CAS Key Laboratory for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, People's Republic of China.
- Zijian Jiang: PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou 510275, People's Republic of China.
- Tianqi Hou: Department of Physics, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, People's Republic of China; Theory Lab, Central Research Institute, 2012 Labs, Huawei Technologies Co., Ltd., Hong Kong Science Park, People's Republic of China.
- Ziming Chen: PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou 510275, People's Republic of China; Department of Electronic Engineering, Tsinghua University, Beijing 100084, People's Republic of China.
- K Y Michael Wong: Department of Physics, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, People's Republic of China.
- Haiping Huang: PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou 510275, People's Republic of China.
12
Golosio B, De Luca C, Capone C, Pastorelli E, Stegel G, Tiddia G, De Bonis G, Paolucci PS. Thalamo-cortical spiking model of incremental learning combining perception, context and NREM-sleep. PLoS Comput Biol 2021; 17:e1009045. PMID: 34181642. PMCID: PMC8270441. DOI: 10.1371/journal.pcbi.1009045.
Abstract
The brain exhibits capabilities of fast incremental learning from few noisy examples, as well as the ability to associate similar memories in autonomously created categories and to combine contextual hints with sensory perceptions. Together with sleep, these mechanisms are thought to be key components of many high-level cognitive functions. Yet, little is known about the underlying processes and the specific roles of different brain states. In this work, we exploited the combination of context and perception in a thalamo-cortical model based on a soft winner-take-all circuit of excitatory and inhibitory spiking neurons. After calibrating this model to express awake and deep-sleep states with features comparable to biological measures, we demonstrate its capability for fast incremental learning from few examples, its resilience when presented with noisy perceptions and contextual signals, and an improvement in visual classification after sleep due to induced synaptic homeostasis and association of similar memories. We created a thalamo-cortical spiking model (ThaCo) with the purpose of demonstrating a link between two phenomena that we believe to be essential for the brain's capability of efficient incremental learning from few examples in noisy environments. Grounded in two experimental observations (the first about the effects of deep sleep on pre- and post-sleep firing rate distributions, the second about the combination of perceptual and contextual information in pyramidal neurons), our model joins these two ingredients. ThaCo alternates phases of incremental learning, classification and deep sleep. Memories of handwritten digit examples are learned through thalamo-cortical and cortico-cortical plastic synapses. In the absence of noise, the combination of contextual information with perception enables fast incremental learning. Deep sleep becomes crucial when noisy inputs are considered. We observed in ThaCo both homeostatic and associative processes: deep sleep fights noise in perceptual and internal knowledge, and it supports the categorical association of examples belonging to the same digit class through reinforcement of class-specific cortico-cortical synapses. The distributions of pre-sleep and post-sleep firing rates during classification change in a manner similar to that observed experimentally. These changes promote energetic efficiency during recall of memories, better representation of individual memories and categories, and higher classification performance.
Affiliation(s)
- Bruno Golosio: Dipartimento di Fisica, Università di Cagliari, Cagliari, Italy; Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Cagliari, Cagliari, Italy.
- Chiara De Luca: Ph.D. Program in Behavioural Neuroscience, “Sapienza” Università di Roma, Rome, Italy; Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Roma, Rome, Italy.
- Cristiano Capone: Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Roma, Rome, Italy.
- Elena Pastorelli: Ph.D. Program in Behavioural Neuroscience, “Sapienza” Università di Roma, Rome, Italy; Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Roma, Rome, Italy.
- Giovanni Stegel: Dipartimento di Chimica e Farmacia, Università di Sassari, Sassari, Italy.
- Gianmarco Tiddia: Dipartimento di Fisica, Università di Cagliari, Cagliari, Italy; Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Cagliari, Cagliari, Italy.
- Giulia De Bonis: Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Roma, Rome, Italy.
13
Leonelli FE, Agliari E, Albanese L, Barra A. On the effective initialisation for restricted Boltzmann machines via duality with Hopfield model. Neural Netw 2021; 143:314-326. PMID: 34175807. DOI: 10.1016/j.neunet.2021.06.017.
Abstract
Restricted Boltzmann machines (RBMs) with a binary visible layer of size N and a Gaussian hidden layer of size P have been proved to be equivalent to a Hopfield neural network (HNN) made of N binary neurons and storing P patterns ξ, as long as the weights w in the former are identified with the patterns. Here we aim to leverage this equivalence to find effective initialisations for the weights of the RBM when what is available is a set of noisy examples of each pattern, aiming to translate the statistical-mechanics background available for HNNs to the study of the RBM's learning and retrieval abilities. In particular, given a set of definite, structureless patterns, we build a sample of blurred examples and prove that the initialisation where w corresponds to the empirical average ξ¯ over the sample is a fixed point under stochastic gradient descent. Further, as a toy application of the duality between HNN and RBM, we consider the simplest random auto-encoder (a three-layer network made of two RBMs coupled by their hidden layer) and show that, as long as the parameter setting corresponds to the retrieval region of the dual HNN, reconstruction and denoising can be accomplished trivially, while when the system is in the spin-glass phase inference algorithms are necessary. This raises the question of how to obtain larger retrieval regions, which we address by applying a Gram-Schmidt orthogonalisation to the patterns: this procedure yields a set of patterns devoid of correlations, for which the largest retrieval region is achieved. Finally, we consider an application of the duality in a structured case: we test this approach on the MNIST dataset and find that the network already achieves ∼67% successful classifications, suggesting it can be exploited as a computationally cheap pre-training.
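A minimal sketch of the proposed initialisation (sizes and noise level below are assumptions): each hidden unit's weight vector is set to the empirical average of the noisy examples of the corresponding pattern, the point the paper proves to be a fixed point of stochastic gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, M, p_flip = 100, 5, 50, 0.1     # visible units, patterns, examples per pattern, noise (assumed)

patterns = rng.choice([-1, 1], size=(P, N))              # structureless archetypes (unknown to the RBM)
flips = rng.random((P, M, N)) < p_flip                   # independent sign flips
examples = patterns[:, None, :] * np.where(flips, -1, 1) # blurred examples actually available for training

# Initialisation suggested by the RBM/Hopfield duality: one hidden unit per archetype,
# its weight vector set to the empirical average of that archetype's examples.
w_init = examples.mean(axis=1)                           # shape (P, N)
```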
Affiliation(s)
- Francesca Elisa Leonelli: Dipartimento di Matematica "Guido Castelnuovo", Sapienza Università di Roma, Italy; Istituto di Scienze Marine, ISMAR-CNR, Italy.
- Elena Agliari: Dipartimento di Matematica "Guido Castelnuovo", Sapienza Università di Roma, Italy.
- Linda Albanese: Dipartimento di Matematica e Fisica "Ennio De Giorgi", Università del Salento, Italy; Scuola Superiore ISUFI, Università del Salento, Italy.
- Adriano Barra: Dipartimento di Matematica e Fisica "Ennio De Giorgi", Università del Salento, Italy; I.N.F.N., Sezione di Lecce, Italy.
14
Marullo C, Agliari E. Boltzmann Machines as Generalized Hopfield Networks: A Review of Recent Results and Outlooks. Entropy 2020; 23:e23010034. PMID: 33383716. PMCID: PMC7823871. DOI: 10.3390/e23010034.
Abstract
The Hopfield model and the Boltzmann machine are among the most popular examples of neural networks. The latter, widely used for classification and feature detection, is able to efficiently learn a generative model from observed data and constitutes the benchmark for statistical learning. The former, designed to mimic the retrieval phase of an artificial associative memory, lies between two paradigmatic statistical-mechanics models, namely the Curie-Weiss and the Sherrington-Kirkpatrick models, which are recovered as the limiting cases of, respectively, one and many stored memories. Interestingly, the Boltzmann machine and the Hopfield network, if considered as two cognitive processes (learning and information retrieval), are nothing more than two sides of the same coin: in fact, it is possible to exactly map the one into the other. We inspect this equivalence by retracing the most representative steps of the research in this field.
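The mapping referred to here can be stated in one line; with a binary visible layer σ, a Gaussian hidden layer z and weights ξ (our notation, chosen for illustration), marginalising the hidden layer of the Boltzmann machine yields the Hopfield Hamiltonian:

$$
\int\!\prod_{\mu=1}^{P}dz_\mu\,
e^{-\frac{\beta}{2}\sum_{\mu}z_\mu^{2}+\frac{\beta}{\sqrt{N}}\sum_{i,\mu}\xi_i^{\mu}\sigma_i z_\mu}
\;\propto\;
\exp\!\Big[\frac{\beta}{2N}\sum_{\mu}\Big(\sum_{i}\xi_i^{\mu}\sigma_i\Big)^{2}\Big]
= e^{-\beta H_{\mathrm{Hopfield}}(\sigma)},
$$

so that, up to a constant diagonal term, $H_{\mathrm{Hopfield}}(\sigma)=-\tfrac{1}{2}\sum_{i\neq j}J_{ij}\sigma_i\sigma_j$ with Hebbian couplings $J_{ij}=\tfrac{1}{N}\sum_\mu \xi_i^\mu\xi_j^\mu$: learning the machine's weights and retrieving with the Hopfield network are two sides of the same coin.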
15
Wang Z, Baruni S, Parastesh F, Jafari S, Ghosh D, Perc M, Hussain I. Chimeras in an adaptive neuronal network with burst-timing-dependent plasticity. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.03.083.
16
González OC, Sokolov Y, Krishnan GP, Delanois JE, Bazhenov M. Can sleep protect memories from catastrophic forgetting? eLife 2020; 9:e51005. PMID: 32748786. PMCID: PMC7440920. DOI: 10.7554/elife.51005.
Abstract
Continual learning remains an unsolved problem in artificial neural networks. The brain has evolved mechanisms to prevent catastrophic forgetting of old knowledge during new training. Building upon data suggesting the importance of sleep in learning and memory, we tested a hypothesis that sleep protects old memories from being forgotten after new learning. In the thalamocortical model, training a new memory interfered with previously learned old memories leading to degradation and forgetting of the old memory traces. Simulating sleep after new learning reversed the damage and enhanced old and new memories. We found that when a new memory competed for previously allocated neuronal/synaptic resources, sleep replay changed the synaptic footprint of the old memory to allow overlapping neuronal populations to store multiple memories. Our study predicts that memory storage is dynamic, and sleep enables continual learning by combining consolidation of new memory traces with reconsolidation of old memory traces to minimize interference.
Affiliation(s)
- Oscar C González: Department of Medicine, University of California, San Diego, La Jolla, United States.
- Yury Sokolov: Department of Medicine, University of California, San Diego, La Jolla, United States.
- Giri P Krishnan: Department of Medicine, University of California, San Diego, La Jolla, United States.
- Jean Erik Delanois: Department of Medicine, University of California, San Diego, La Jolla, United States; Department of Computer Science and Engineering, University of California, San Diego, La Jolla, United States.
- Maxim Bazhenov: Department of Medicine, University of California, San Diego, La Jolla, United States.
17
Agliari E, Alemanno F, Barra A, Fachechi A. Generalized Guerra's interpolation schemes for dense associative neural networks. Neural Netw 2020; 128:254-267. PMID: 32454370. DOI: 10.1016/j.neunet.2020.05.009.
Abstract
In this work we develop analytical techniques to investigate a broad class of associative neural networks set in the high-storage regime. These techniques translate the original statistical-mechanical problem into an analytical-mechanical one, which implies solving a set of partial differential equations rather than tackling the canonical probabilistic route. We test the method on the classical Hopfield model, where the cost function includes only two-body interactions (i.e., quadratic terms), and on the "relativistic" Hopfield model, where the (expansion of the) cost function includes p-body (i.e., of degree p) contributions. Under the replica-symmetric assumption, we paint the phase diagrams of these models by obtaining the explicit expression of their free energy as a function of the model parameters (i.e., noise level and memory storage). Further, since for non-pairwise models ergodicity breaking is not necessarily a critical phenomenon, we develop a fluctuation analysis and find that criticality is preserved in the relativistic model.
Affiliation(s)
- Francesco Alemanno: Dipartimento di Matematica e Fisica Ennio De Giorgi, Università del Salento, Italy; C.N.R. Nanotec Lecce, Italy.
- Adriano Barra: Dipartimento di Matematica e Fisica Ennio De Giorgi, Università del Salento, Italy; I.N.F.N., Sezione di Lecce, Italy.
- Alberto Fachechi: Dipartimento di Matematica e Fisica Ennio De Giorgi, Università del Salento, Italy; I.N.F.N., Sezione di Lecce, Italy.
18
Morales A, Froese T. Unsupervised Learning Facilitates Neural Coordination Across the Functional Clusters of the C. elegans Connectome. Front Robot AI 2020; 7:40. PMID: 33501208. PMCID: PMC7805867. DOI: 10.3389/frobt.2020.00040.
Abstract
Modeling of complex adaptive systems has revealed a still poorly understood benefit of unsupervised learning: when neural networks are enabled to form an associative memory of a large set of their own attractor configurations, they begin to reorganize their connectivity in a direction that minimizes the coordination constraints posed by the initial network architecture. This self-optimization process has been replicated in various neural network formalisms, but it is still unclear whether it can be applied to biologically more realistic network topologies and scaled up to larger networks. Here we continue our efforts to respond to these challenges by demonstrating the process on the connectome of the widely studied nematode worm C. elegans. We extend our previous work by considering the contributions made by hierarchical partitions of the connectome that form functional clusters, and we explore possible beneficial effects of inter-cluster inhibitory connections. We conclude that the self-optimization process can be applied to neural network topologies characterized by greater biological realism, and that long-range inhibitory connections can facilitate the generalization capacity of the process.
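A minimal sketch of the self-optimization loop referred to above (random Hopfield-style constraint network, repeated relaxation from random states, Hebbian learning of the visited attractors); sizes and rates are assumptions, and the C. elegans connectome with its clustered topology is not modelled here.

```python
import numpy as np

rng = np.random.default_rng(2)
N, resets, alpha = 60, 200, 0.0005            # network size, learning episodes, Hebbian rate (assumed)

W = rng.choice([-1.0, 1.0], size=(N, N))      # fixed random constraint/weight matrix
W = np.triu(W, 1)
W = W + W.T                                   # symmetric, zero diagonal

J = W.copy()                                  # plastic couplings, initialised to the constraints
for episode in range(resets):
    s = rng.choice([-1.0, 1.0], size=N)       # reset to a random state
    for _ in range(20 * N):                   # asynchronous relaxation on the learned couplings
        i = rng.integers(N)
        s[i] = 1.0 if J[i] @ s >= 0 else -1.0
    J += alpha * np.outer(s, s)               # Hebbian reinforcement of the visited attractor
    np.fill_diagonal(J, 0.0)
    # Constraint satisfaction on the original network, -0.5 * s @ W @ s, tends to improve
    # across episodes: learning an associative memory of the network's own attractors
    # widens the basins of the better (lower-energy) configurations.
```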
Affiliation(s)
- Alejandro Morales: Embodied Cognitive Science Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan; Computer Science and Engineering Postgraduate Program, National Autonomous University of Mexico, Mexico City, Mexico.
- Tom Froese: Embodied Cognitive Science Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.
19
Mildner JN, Tamir DI. Spontaneous Thought as an Unconstrained Memory Process. Trends Neurosci 2019; 42:763-777. PMID: 31627848. DOI: 10.1016/j.tins.2019.09.001.
Abstract
The stream of thought can flow freely, without much guidance from attention or cognitive control. What determines what we think about from one moment to the next? Spontaneous thought shares many commonalities with memory processes. We use insights from computational models of memory to explain how the stream of thought flows through the landscape of memory. In this framework of spontaneous thought, semantic memory scaffolds episodic memory to form the content of thought, and drifting context, modulated by one's current state (both internal and external), constrains the area of memory to explore. This conceptualization of spontaneous thought can help answer outstanding questions such as: what is the function of spontaneous thought, and how does the mind select what to think about?
Affiliation(s)
- Judith N Mildner: Department of Psychology, Princeton University, Princeton, NJ 08540, USA.
- Diana I Tamir: Department of Psychology, Princeton University, Princeton, NJ 08540, USA.