1. Catania G, Decelle A, Seoane B. Copycat perceptron: Smashing barriers through collective learning. Phys Rev E 2024; 109:065313. PMID: 39020926. DOI: 10.1103/physreve.109.065313.
Abstract
We characterize the equilibrium properties of a model of y coupled binary perceptrons in the teacher-student scenario, subject to a suitable cost function, with an explicit ferromagnetic coupling proportional to the Hamming distance between the students' weights. In contrast to recent works, we analyze a more general setting in which thermal noise is present and affects each student's generalization performance. In the nonzero-temperature regime, we find that the coupling of replicas bends the phase diagram towards smaller values of α: this suggests that, at a fixed fraction of examples, the free-entropy landscape gets smoother around the solution with perfect generalization (i.e., the teacher), allowing standard thermal updating algorithms such as simulated annealing to reach the teacher solution easily and to avoid getting trapped in metastable states, as happens in the unreplicated case, even in the computationally easy regime of the inference phase diagram. These results provide additional analytic and numerical evidence for the recently conjectured Bayes-optimality of replicated simulated annealing for a sufficient number of replicas. From a learning perspective, they also suggest that multiple students working together (in this case, reviewing the same data) can learn the same rule both significantly faster and with fewer examples, a property that could be exploited in cooperative and federated learning.
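The replicated construction lends itself to a compact numerical illustration. The following is a minimal, unoptimized Python sketch (our illustration, not the authors' code; the sizes, the annealing schedule, and the coupling strength gamma are assumptions) of y binary students coupled through their pairwise Hamming distances and annealed with Metropolis flips on teacher-generated data:

import numpy as np

rng = np.random.default_rng(0)
N, P, y = 101, 150, 3            # weights per student, examples, replicas
gamma = 0.5                      # ferromagnetic coupling strength (illustrative)

teacher = rng.choice([-1, 1], size=N)
X = rng.choice([-1, 1], size=(P, N))
labels = np.sign(X @ teacher)

def n_errors(w):                 # number-of-errors cost for one student
    return np.sum(np.sign(X @ w) != labels)

def energy(W):                   # sum of costs plus Hamming-distance coupling
    e = float(sum(n_errors(W[a]) for a in range(y)))
    for a in range(y):
        for b in range(a + 1, y):
            e += gamma * np.sum(W[a] != W[b]) / N
    return e

W = rng.choice([-1, 1], size=(y, N))
E = energy(W)
for beta in np.linspace(0.1, 5.0, 60):     # simulated-annealing schedule
    for _ in range(1000):
        a, i = rng.integers(y), rng.integers(N)
        W[a, i] *= -1                      # propose a single weight flip
        E_new = energy(W)
        if E_new <= E or rng.random() < np.exp(-beta * (E_new - E)):
            E = E_new                      # accept
        else:
            W[a, i] *= -1                  # reject

print("errors per replica:", [n_errors(w) for w in W])
print("teacher overlaps  :", [round(float(w @ teacher) / N, 3) for w in W])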
2. Agliari E, Alemanno F, Aquaro M, Barra A, Durante F, Kanter I. Hebbian dreaming for small datasets. Neural Netw 2024; 173:106174. PMID: 38359641. DOI: 10.1016/j.neunet.2024.106174.
Abstract
The dreaming Hopfield model generalizes the Hebbian paradigm for neural networks: it can perform on-line learning when "awake" and also accounts for off-line "sleeping" mechanisms. The latter have been shown to enhance storage in such a way that, in the long sleep-time limit, the model reaches the maximal storage capacity achievable by networks equipped with symmetric pairwise interactions. In this paper, we inspect the minimal amount of information that must be supplied to such a network to guarantee successful generalization, and we test it both on random synthetic and on standard structured datasets (i.e., MNIST, Fashion-MNIST, and Olivetti). By comparing these minimal information thresholds with those required by the standard (i.e., always "awake") Hopfield model, we prove that the present network can save up to ∼90% of the dataset size while preserving the same performance as its standard counterpart. This suggests that sleep may play a pivotal role in explaining the gap between the large volumes of data required to train artificial neural networks and the relatively small volumes needed by their biological counterparts. Further, we prove that the model's cost function (typically used in statistical mechanics) admits a representation in terms of a standard loss function (typically used in machine learning), which allows us to analyze its emergent computational skills both theoretically and computationally: a quantitative picture of its capabilities as a function of its control parameters is achieved, and consistency between the two approaches is highlighted. The resulting network is an associative memory for pattern recognition that learns from examples on-line, generalizes correctly (in suitable regions of its control parameters), and optimizes its storage capacity by off-line sleeping: such a reduction of the training cost could inspire more sustainable AI and help in situations where data are relatively sparse.
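The paper's dreaming kernel has a closed analytical form; as a rough numerical stand-in, here is a minimal sketch in the spirit of classic Hopfield unlearning, where "sleep" means relaxing to spurious attractors and subtracting their Hebbian contribution (the load, unlearning rate, and amount of sleep are illustrative assumptions and may need tuning):

import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 40                        # neurons, patterns: a high load for pure Hebb
xi = rng.choice([-1, 1], size=(K, N))
J = (xi.T @ xi) / N                   # Hebbian ("awake") couplings
np.fill_diagonal(J, 0.0)

def relax(s, J, sweeps=20):           # asynchronous zero-temperature dynamics
    for _ in range(sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if J[i] @ s >= 0 else -1
    return s

def recall(J):                        # mean overlap after 10% corruption
    m = []
    for p in xi:
        s = p.copy()
        s[: N // 10] *= -1
        m.append(abs(relax(s, J) @ p) / N)
    return float(np.mean(m))

print("recall before sleep:", recall(J))

eps = 0.01
for _ in range(150):                  # "sleep": unlearn spurious attractors
    s = relax(rng.choice([-1, 1], size=N), J)
    J -= eps * np.outer(s, s) / N
    np.fill_diagonal(J, 0.0)

print("recall after sleep :", recall(J))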
Affiliations:
- Elena Agliari: Department of Mathematics, Sapienza Università di Roma, Rome, Italy
- Francesco Alemanno: Department of Mathematics and Physics, Università del Salento, Lecce, Italy
- Miriam Aquaro: Department of Mathematics, Sapienza Università di Roma, Rome, Italy
- Adriano Barra: Department of Mathematics and Physics, Università del Salento, Lecce, Italy
- Fabrizio Durante: Department of Economic Sciences, Università del Salento, Lecce, Italy
- Ido Kanter: Department of Physics, Bar-Ilan University, Ramat Gan, Israel
3
|
Nawaz S, Saleem M, Kusmartsev FV, Anjum DH. Major Role of Multiscale Entropy Evolution in Complex Systems and Data Science. ENTROPY (BASEL, SWITZERLAND) 2024; 26:330. [PMID: 38667884 PMCID: PMC11048943 DOI: 10.3390/e26040330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/22/2024] [Revised: 04/07/2024] [Accepted: 04/10/2024] [Indexed: 04/28/2024]
Abstract
Complex systems are prevalent in disciplines across the natural and social sciences, such as physics, biology, economics, and sociology. Data science techniques, particularly those rooted in artificial intelligence and machine learning, offer a promising avenue for understanding complex systems without requiring detailed knowledge of their underlying dynamics. In this paper, we demonstrate that multiscale entropy (MSE) is pivotal in describing the steady state of complex systems. Introducing the multiscale entropy dynamics (MED) methodology, we provide a framework for dissecting system dynamics and uncovering the driving forces behind their evolution. Our investigation reveals that the MED methodology allows the dynamics of a complex system to be expressed through a generalized nonlinear Schrödinger equation (GNSE), demonstrating potential applicability across diverse complex systems. By elucidating the entropic underpinnings of complexity, our study paves the way for a deeper understanding of dynamic phenomena and offers insights into the behavior of complex systems across various domains.
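Multiscale entropy itself is straightforward to compute: coarse-grain the series at increasing scales and evaluate the sample entropy of each coarse-grained series. A minimal sketch (standard Costa-style MSE, not the paper's MED pipeline; the tolerance convention of rescaling r by the coarse series' standard deviation is one common variant):

import numpy as np

def sample_entropy(x, m=2, r=0.2):
    # SampEn: -log of the ratio of (m+1)- to m-template match counts;
    # tolerance set from the std of the (coarse-grained) series, one common choice
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)
    def matches(mm):
        t = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        d = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=-1)
        return np.sum(d <= tol) - len(t)       # exclude self-matches
    B, A = matches(m), matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

def multiscale_entropy(x, max_scale=10, m=2, r=0.2):
    out = []
    for tau in range(1, max_scale + 1):
        n = len(x) // tau                      # coarse-grain by block averaging
        coarse = np.asarray(x[: n * tau]).reshape(n, tau).mean(axis=1)
        out.append(sample_entropy(coarse, m, r))
    return out

rng = np.random.default_rng(2)
white = rng.normal(size=1000)
walk = np.cumsum(rng.normal(size=1000))        # strongly correlated, for contrast
print("white noise :", np.round(multiscale_entropy(white), 2))
print("random walk :", np.round(multiscale_entropy(walk), 2))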
Affiliations:
- Shahid Nawaz: Department of Physics, Loughborough University, Loughborough LE11 3TU, UK
- Muhammad Saleem: Department of Physics, Bellarmine University, 2001 Newburg Road, Louisville, KY 40205, USA
- Fedor V. Kusmartsev: Department of Physics, Khalifa University, Abu Dhabi P.O. Box 127788, United Arab Emirates
- Dalaver H. Anjum: Department of Physics, Khalifa University, Abu Dhabi P.O. Box 127788, United Arab Emirates
4. Gerace F, Krzakala F, Loureiro B, Stephan L, Zdeborová L. Gaussian universality of perceptrons with random labels. Phys Rev E 2024; 109:034305. PMID: 38632742. DOI: 10.1103/physreve.109.034305.
Abstract
While classical in many theoretical settings, in particular in works inspired by statistical physics, the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, also known as the perceptron model, with random labels. We argue that there is a large universality class of high-dimensional input data for which we obtain the same minimum training loss as for Gaussian data with the corresponding data covariance. In the limit of vanishing regularization, we further demonstrate that the training loss is independent of the data covariance. On the theoretical side, we prove this universality for an arbitrary mixture of homogeneous Gaussian clouds. Empirically, we show that the universality also holds for a broad range of real datasets.
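The universality claim is easy to probe numerically: fit the same regularized model with random labels on structured data and on a Gaussian clone with matched first and second moments, and compare the minimum training losses. A minimal sketch (square loss with ridge regularization, our choice; the two losses should agree up to finite-size fluctuations):

import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 400, 200, 1e-2                 # samples, dimension, ridge strength

# structured data: a mixture of two shifted, anisotropic Gaussian clouds
A = rng.normal(size=(d, d)) / np.sqrt(d)
shift = rng.choice([-2.0, 2.0], size=(n, 1))
X_mix = rng.normal(size=(n, d)) @ A + shift

# Gaussian surrogate with matched mean and covariance
mu, C = X_mix.mean(0), np.cov(X_mix.T)
L = np.linalg.cholesky(C + 1e-8 * np.eye(d))
X_gauss = mu + rng.normal(size=(n, d)) @ L.T

y = rng.choice([-1.0, 1.0], size=n)        # random labels

def min_train_loss(X):                     # ridge-regularized square loss
    w = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
    return 0.5 * np.mean((X @ w - y) ** 2)

print("mixture data   :", min_train_loss(X_mix))
print("gaussian clone :", min_train_loss(X_gauss))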
Affiliations:
- Federica Gerace: International School of Advanced Studies (SISSA), Via Bonomea 265, 34136 Trieste, Italy; EPFL Statistical Physics of Computation (SPOC) Laboratory, Rte Cantonale, 1015 Lausanne, Switzerland
- Florent Krzakala: EPFL, Information, Learning and Physics (IdePHICS) Laboratory, Rte Cantonale, 1015 Lausanne, Switzerland
- Bruno Loureiro: EPFL, Information, Learning and Physics (IdePHICS) Laboratory, Rte Cantonale, 1015 Lausanne, Switzerland; Département d'Informatique, École Normale Supérieure (ENS)-PSL & CNRS, F-75230 Paris Cedex 05, France
- Ludovic Stephan: EPFL, Information, Learning and Physics (IdePHICS) Laboratory, Rte Cantonale, 1015 Lausanne, Switzerland
- Lenka Zdeborová: EPFL Statistical Physics of Computation (SPOC) Laboratory, Rte Cantonale, 1015 Lausanne, Switzerland
5. Shi C, Pan L, Hu H, Dokmanić I. Homophily modulates double descent generalization in graph convolution networks. Proc Natl Acad Sci U S A 2024; 121:e2309504121. PMID: 38346190. PMCID: PMC10895367. DOI: 10.1073/pnas.2309504121.
Abstract
Graph neural networks (GNNs) excel in modeling relational data such as biological, social, and transportation networks, but the underpinnings of their success are not well understood. Traditional complexity measures from statistical learning theory fail to account for observed phenomena like the double descent or the impact of relational semantics on generalization error. Motivated by experimental observations of "transductive" double descent in key networks and datasets, we use analytical tools from statistical physics and random matrix theory to precisely characterize generalization in simple graph convolution networks on the contextual stochastic block model. Our results illuminate the nuances of learning on homophilic versus heterophilic data and predict double descent whose existence in GNNs has been questioned by recent work. We show how risk is shaped by the interplay between the graph noise, feature noise, and the number of training labels. Our findings apply beyond stylized models, capturing qualitative trends in real-world GNNs and datasets. As a case in point, we use our analytic insights to improve performance of state-of-the-art graph convolution networks on heterophilic datasets.
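The model the analysis is built on is simple to set up for experiments. Below is a minimal sketch (our illustration, not the paper's derivation) of the contextual stochastic block model with a one-step graph convolution and a ridge readout; the sizes, noise levels, and label fraction are assumptions, and swapping p_in and p_out makes the graph heterophilic:

import numpy as np

rng = np.random.default_rng(4)
n, d = 600, 40                              # nodes, feature dimension
y = rng.choice([-1.0, 1.0], size=n)         # two communities
p_in, p_out = 0.05, 0.01                    # homophilic; swap them for heterophily

# contextual stochastic block model: community graph plus noisy spiked features
prob = np.where(np.outer(y, y) > 0, p_in, p_out)
A = np.triu((rng.random((n, n)) < prob), 1).astype(float)
A = A + A.T
u = rng.normal(size=d) / np.sqrt(d)
X = np.outer(y, u) + 0.5 * rng.normal(size=(n, d))

# one graph-convolution step with self-loops, then a ridge readout
deg = A.sum(1) + 1.0
H = ((A + np.eye(n)) / deg[:, None]) @ X

train = rng.random(n) < 0.1                 # fraction of revealed labels
w = np.linalg.solve(H[train].T @ H[train] + 1e-2 * np.eye(d),
                    H[train].T @ y[train])
acc = np.mean(np.sign(H[~train] @ w) == y[~train])
print("test accuracy:", float(acc))

Sweeping the number of training labels in this setup is one way to look for the transductive double-descent behavior the abstract describes.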
Affiliations:
- Cheng Shi: Departement Mathematik und Informatik, Universität Basel, 4051 Basel, Switzerland
- Liming Pan: School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230026, China; School of Computer and Electronic Information, Nanjing Normal University, Nanjing 210023, China
- Hong Hu: Wharton Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA 19104-1686, USA
- Ivan Dokmanić: Departement Mathematik und Informatik, Universität Basel, 4051 Basel, Switzerland; Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
6. Ingrosso A, Panizon E. Machine learning at the mesoscale: A computation-dissipation bottleneck. Phys Rev E 2024; 109:014132. PMID: 38366483. DOI: 10.1103/physreve.109.014132.
Abstract
The cost of information processing in physical systems calls for a trade-off between performance and energetic expenditure. Here we formulate and study a computation-dissipation bottleneck in mesoscopic systems used as input-output devices. Using both real datasets and synthetic tasks, we show how nonequilibrium dynamics lead to enhanced performance. Our framework sheds light on a crucial compromise between information compression, input-output computation, and the dynamic irreversibility induced by nonreciprocal interactions.
Affiliations:
- Alessandro Ingrosso: Quantitative Life Sciences, Abdus Salam International Centre for Theoretical Physics, 34151 Trieste, Italy
- Emanuele Panizon: Quantitative Life Sciences, Abdus Salam International Centre for Theoretical Physics, 34151 Trieste, Italy
7. Lu CK. Bayesian inference with finitely wide neural networks. Phys Rev E 2023; 108:014311. PMID: 37583227. DOI: 10.1103/physreve.108.014311.
Abstract
Analytic inference, e.g., a closed-form predictive distribution, can be an appealing benefit for machine learning practitioners who treat wide neural networks as Gaussian processes in a Bayesian setting. Realistic widths, however, are finite and cause weak deviations from the Gaussianity under which partial marginalization of random variables in a model is straightforward. On the basis of a multivariate Edgeworth expansion, we propose a non-Gaussian distribution in differential form to model a finite set of outputs from a random neural network, and we derive the corresponding marginal and conditional properties. We are thus able to derive the non-Gaussian posterior distribution in a Bayesian regression task. In addition, in bottlenecked deep neural networks, a weight-space representation of a deep Gaussian process, the non-Gaussianity is investigated through the marginal kernel and the accompanying small parameters.
Affiliations:
- Chi-Ken Lu: Department of Mathematics and Computer Science, Rutgers University, Newark, New Jersey 07102, USA
8. Herrera Segura C, Montoya E, Tapias D. Subaging in underparametrized Deep Neural Networks. Mach Learn Sci Technol 2022. DOI: 10.1088/2632-2153/ac8f1b.
Abstract
In this work, we consider a simple classification problem to show that the dynamics of finite-width deep neural networks in the underparametrized regime give rise to effects similar to those associated with glassy systems, namely a slow evolution of the loss function and aging. Remarkably, the aging is sublinear in the waiting time (subaging), and the power-law exponent characterizing it is robust to different architectures under the constraint of a constant total number of parameters. Our results persist in the more complex scenario of the MNIST database, for which we find a unique exponent ruling the subaging behavior in the whole phase.
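A standard way to probe aging is through a two-time correlation function measured at several waiting times. Here is a toy sketch of the measurement itself (a small underparametrized logistic model under SGD, our choice, not the paper's networks; whether this toy shows subaging is not guaranteed):

import numpy as np

rng = np.random.default_rng(13)
N, P = 30, 500                            # few parameters, many random labels
X = rng.normal(size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)

w = 0.1 * rng.normal(size=N)
eta, batch = 0.05, 8
t_w = [100, 400, 1600, 6400]              # waiting times
snap = {}
for t in range(1, 2 * max(t_w) + 1):      # SGD on the logistic loss
    idx = rng.integers(0, P, size=batch)
    m = y[idx] * (X[idx] @ w)
    w -= eta * (-(y[idx] / (1.0 + np.exp(m))) @ X[idx] / batch)
    if t in t_w or t in [2 * s for s in t_w]:
        snap[t] = w.copy()

# two-time weight correlation C(2 t_w, t_w): its t_w dependence signals aging
for tw in t_w:
    a, b = snap[tw], snap[2 * tw]
    c = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    print(f"t_w = {tw:5d}   C(2 t_w, t_w) = {c:.4f}")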
9. Richert F, Worschech R, Rosenow B. Soft mode in the dynamics of over-realizable online learning for soft committee machines. Phys Rev E 2022; 105:L052302. PMID: 35706279. DOI: 10.1103/physreve.105.l052302.
Abstract
Overparametrized deep neural networks trained by stochastic gradient descent are successful in performing many tasks of practical relevance. One aspect of overparametrization is the possibility that the student network has a larger expressivity than the data generating process. In the context of a student-teacher scenario, this corresponds to the so-called over-realizable case, where the student network has a larger number of hidden units than the teacher. For online learning of a two-layer soft committee machine in the over-realizable case, we present evidence that the approach to perfect learning occurs in a power-law fashion rather than exponentially as in the realizable case. All student nodes learn and replicate one of the teacher nodes if teacher and student outputs are suitably rescaled and if the numbers of student and teacher hidden units are commensurate.
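The over-realizable setup can be simulated directly. A minimal sketch of online SGD for a soft committee machine with erf activation in the standard teacher-student scaling (the teacher/student sizes, learning rate, and run length are illustrative assumptions, not the paper's parameters):

import numpy as np
from scipy.special import erf

rng = np.random.default_rng(5)
N, M, K, eta = 500, 1, 2, 0.5            # input dim, teacher/student sizes, rate

B = rng.normal(size=(M, N))              # teacher
W = 0.5 * rng.normal(size=(K, N))        # over-realizable student (K > M)
g = lambda h: erf(h / np.sqrt(2.0))

def gen_error(W, n=5000):
    X = rng.normal(size=(n, N))
    diff = g(X @ W.T / np.sqrt(N)).sum(1) - g(X @ B.T / np.sqrt(N)).sum(1)
    return 0.5 * float(np.mean(diff ** 2))

for t in range(200 * N):                 # one-pass (online) SGD, alpha = t/N
    xi = rng.normal(size=N)
    hW, hB = W @ xi / np.sqrt(N), B @ xi / np.sqrt(N)
    delta = g(hW).sum() - g(hB).sum()
    gprime = np.sqrt(2.0 / np.pi) * np.exp(-hW ** 2 / 2.0)
    W -= (eta / np.sqrt(N)) * delta * gprime[:, None] * xi[None, :]
    if t % (50 * N) == 0:
        print(f"alpha = {t / N:6.1f}   eps_g = {gen_error(W):.2e}")
print(f"final eps_g = {gen_error(W):.2e}")

Tracking eps_g over alpha on a log-log scale is one way to compare the power-law approach reported here against the exponential decay of the realizable case.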
Affiliations:
- Frederieke Richert: Institut für Theoretische Physik, Universität Leipzig, Brüderstrasse 16, 04103 Leipzig, Germany
- Roman Worschech: Institut für Theoretische Physik, Universität Leipzig, Brüderstrasse 16, 04103 Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- Bernd Rosenow: Institut für Theoretische Physik, Universität Leipzig, Brüderstrasse 16, 04103 Leipzig, Germany
10. Agliari E, Alemanno F, Barra A, De Marzo G. The emergence of a concept in shallow neural networks. Neural Netw 2022; 148:232-253. DOI: 10.1016/j.neunet.2022.01.017.
11. Rau N, Lücke J, Hartmann AK. Phase transition for parameter learning of hidden Markov models. Phys Rev E 2021; 104:044105. PMID: 34781434. DOI: 10.1103/physreve.104.044105.
Abstract
We study a phase transition in parameter learning of hidden Markov models (HMMs). We do this by generating sequences of observed symbols from given discrete HMMs with uniformly distributed transition probabilities and a noise level encoded in the output probabilities. We apply the Baum-Welch (BW) algorithm, an expectation-maximization algorithm from the field of machine learning, to estimate the parameters of each investigated realization of an HMM. We study HMMs with n = 4, 8, and 16 states. By changing the amount of accessible learning data and the noise level, we observe a phase-transition-like change in the performance of the learning algorithm. For larger HMMs and more learning data, learning improves tremendously below a certain threshold in the noise strength; for a noise level above the threshold, learning is not possible. Furthermore, we use an overlap parameter applied to the results of a maximum a posteriori (Viterbi) algorithm to investigate the accuracy of the hidden-state estimation around the phase transition.
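A sketch of the experimental loop, under stated assumptions: we generate sequences from a random HMM whose emissions are "diagonal plus noise" (our guess at how the noise enters the output probabilities) and fit with Baum-Welch via the hmmlearn package, whose CategoricalHMM class (recent versions) is an assumed dependency here:

import itertools
import numpy as np
from hmmlearn import hmm        # assumed dependency; CategoricalHMM in recent versions

rng = np.random.default_rng(6)
n, noise = 4, 0.2               # states (= output symbols), output noise level

# random transition matrix; noisy near-diagonal emissions (our assumption)
T = rng.random((n, n)); T /= T.sum(axis=1, keepdims=True)
E = np.full((n, n), noise / (n - 1)); np.fill_diagonal(E, 1.0 - noise)

def sample(length):
    s, out = rng.integers(n), []
    for _ in range(length):
        out.append(rng.choice(n, p=E[s]))
        s = rng.choice(n, p=T[s])
    return out

seqs = [sample(200) for _ in range(50)]
X = np.concatenate(seqs).reshape(-1, 1)
lengths = [len(s) for s in seqs]

model = hmm.CategoricalHMM(n_components=n, n_iter=200, random_state=0)
model.fit(X, lengths)           # Baum-Welch (expectation-maximization)

# crude quality score: transition-matrix error after the best state relabeling
err = min(np.abs(model.transmat_[np.ix_(p, p)] - T).mean()
          for p in itertools.permutations(range(n)))
print("mean |T_est - T| after relabeling:", round(float(err), 4))

Sweeping noise and the number of sequences in this loop is the kind of scan that exposes the learnable/unlearnable threshold described above.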
Affiliations:
- Nikita Rau: Institut für Physik, Universität Oldenburg, D-26111 Oldenburg, Germany
- Jörg Lücke: Department of Medical Physics and Acoustics, Universität Oldenburg, D-26111 Oldenburg, Germany
12. Bonnasse-Gahot L, Nadal JP. Categorical Perception: A Groundwork for Deep Learning. Neural Comput 2021; 34:437-475. PMID: 34758487. DOI: 10.1162/neco_a_01454.
Abstract
Classification is one of the major tasks that deep learning is successfully tackling. Categorization is also a fundamental cognitive ability. A well-known perceptual consequence of categorization in humans and other animals, categorical perception, is notably characterized by a within-category compression and a between-category separation: two items, close in input space, are perceived closer if they belong to the same category than if they belong to different categories. Elaborating on experimental and theoretical results in cognitive science, here we study categorical effects in artificial neural networks. We combine a theoretical analysis that makes use of mutual and Fisher information quantities and a series of numerical simulations on networks of increasing complexity. These formal and numerical analyses provide insights into the geometry of the neural representation in deep layers, with expansion of space near category boundaries and contraction far from category boundaries. We investigate categorical representation by using two complementary approaches: one mimics experiments in psychophysics and cognitive neuroscience by means of morphed continua between stimuli of different categories, while the other introduces a categoricality index that, for each layer in the network, quantifies the separability of the categories at the neural population level. We show on both shallow and deep neural networks that category learning automatically induces categorical perception. We further show that the deeper a layer, the stronger the categorical effects. As an outcome of our study, we propose a coherent view of the efficacy of different heuristic practices of the dropout regularization technique. More generally, our view, which finds echoes in the neuroscience literature, insists on the differential impact of noise in any given layer depending on the geometry of the neural representation that is being learned, that is, on how this geometry reflects the structure of the categories.
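A simple per-layer separability ratio illustrates the idea of a categoricality index. The sketch below (our simplified stand-in, not the paper's definition) trains a small classifier on two Gaussian categories and compares between- to within-category distances layer by layer, via a manual forward pass through scikit-learn's stored weights:

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
# two Gaussian categories in the plane
X = np.vstack([rng.normal([-1.5, 0.0], 1.0, size=(150, 2)),
               rng.normal([+1.5, 0.0], 1.0, size=(150, 2))])
y = np.repeat([0, 1], 150)

clf = MLPClassifier(hidden_layer_sizes=(32, 32, 32), activation="relu",
                    max_iter=3000, random_state=0).fit(X, y)

def hidden_activations(X):               # manual forward pass through clf
    acts, h = [], X
    for Wl, bl in zip(clf.coefs_[:-1], clf.intercepts_[:-1]):
        h = np.maximum(h @ Wl + bl, 0.0)
        acts.append(h)
    return acts

def categoricality(h, y):                # between- / within-category distance
    d = np.linalg.norm(h[:, None, :] - h[None, :, :], axis=-1)
    same = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    diff = y[:, None] != y[None, :]
    return d[diff].mean() / d[same].mean()

for k, h in enumerate(hidden_activations(X), start=1):
    print(f"layer {k}: categoricality index = {categoricality(h, y):.2f}")

If the abstract's claim holds in this toy setting, the index should grow with depth.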
Affiliations:
- Laurent Bonnasse-Gahot: Centre d'Analyse et de Mathématique Sociales, École des Hautes Études en Sciences Sociales, 75006 Paris, France
- Jean-Pierre Nadal: Centre d'Analyse et de Mathématique Sociales, École des Hautes Études en Sciences Sociales, 75006 Paris, France; Laboratoire de Physique de l'ENS, Université de Paris, École Normale Supérieure, 75006 Paris, France
13. Raman DV, O'Leary T. Optimal plasticity for memory maintenance during ongoing synaptic change. eLife 2021; 10:e62912. PMID: 34519270. PMCID: PMC8504970. DOI: 10.7554/elife.62912.
Abstract
Synaptic connections in many brain circuits fluctuate, exhibiting substantial turnover and remodelling over hours to days. Surprisingly, experiments show that most of this flux in connectivity persists in the absence of learning or known plasticity signals. How can neural circuits retain learned information despite a large proportion of ongoing and potentially disruptive synaptic changes? We address this question from first principles by analysing how much compensatory plasticity would be required to optimally counteract ongoing fluctuations, regardless of whether fluctuations are random or systematic. Remarkably, we find that the answer is largely independent of plasticity mechanisms and circuit architectures: compensatory plasticity should be at most equal in magnitude to fluctuations, and often less, in direct agreement with previously unexplained experimental observations. Moreover, our analysis shows that a high proportion of learning-independent synaptic change is consistent with plasticity mechanisms that accurately compute error gradients.
Affiliations:
- Dhruva V Raman: Department of Engineering, University of Cambridge, Cambridge, United Kingdom
- Timothy O'Leary: Department of Engineering, University of Cambridge, Cambridge, United Kingdom
14. Canatar A, Bordelon B, Pehlevan C. Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks. Nat Commun 2021; 12:2914. PMID: 34006842. PMCID: PMC8131612. DOI: 10.1038/s41467-021-23103-1.
Abstract
A theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. Here, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also describes certain infinitely overparameterized neural networks. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel and data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with simple functions, characterize whether a kernel is compatible with a learning task, and show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks.
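The theory gives a closed-form generalization error from the kernel's eigendecomposition; the empirical object it predicts is just the kernel-ridge-regression learning curve, which is a few lines to measure. A minimal sketch (RBF kernel and a smooth target of our choosing):

import numpy as np

rng = np.random.default_rng(8)

def rbf(A, B, ell=1.0):                     # Gaussian (RBF) kernel matrix
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ell ** 2))

d, lam, noise = 5, 1e-3, 0.1
target = lambda X: np.sin(X.sum(axis=1))    # a smooth target function
Xte = rng.normal(size=(500, d))
yte = target(Xte)

for p in [10, 30, 100, 300, 1000]:          # empirical learning curve
    errs = []
    for _ in range(5):                      # average over training-set draws
        Xtr = rng.normal(size=(p, d))
        ytr = target(Xtr) + noise * rng.normal(size=p)
        alpha = np.linalg.solve(rbf(Xtr, Xtr) + lam * np.eye(p), ytr)
        errs.append(np.mean((rbf(Xte, Xtr) @ alpha - yte) ** 2))
    print(f"p = {p:4d}   generalization error = {np.mean(errs):.4f}")

Raising the label noise, or choosing a target outside the kernel's span, is how one can look for the non-monotonic learning curves the abstract mentions.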
Affiliations:
- Abdulkadir Canatar: Department of Physics, Harvard University, Cambridge, MA, USA; Center for Brain Science, Harvard University, Cambridge, MA, USA
- Blake Bordelon: Center for Brain Science, Harvard University, Cambridge, MA, USA; John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Cengiz Pehlevan: Center for Brain Science, Harvard University, Cambridge, MA, USA; John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
15. Steinberg J, Advani M, Sompolinsky H. New role for circuit expansion for learning in neural networks. Phys Rev E 2021; 103:022404. PMID: 33736047. DOI: 10.1103/physreve.103.022404.
Abstract
Many sensory pathways in the brain include sparsely active populations of neurons downstream from the input stimuli. The biological purpose of this expanded structure is unclear, but it may be beneficial due to the increased expressive power of the network. In this work, we show that certain ways of expanding a neural network can improve its generalization performance even when the expanded structure is pruned after the learning period. To study this setting, we use a teacher-student framework where a perceptron teacher network generates labels corrupted with small amounts of noise. We then train a student network structurally matched to the teacher. In this scenario, the student can achieve optimal accuracy if given the teacher's synaptic weights. We find that sparse expansion of the input layer of a student perceptron network both increases its capacity and improves its generalization performance when learning a noisy rule from a teacher perceptron, even when the expansion is pruned after learning. We find similar behavior when the expanded units are stochastic and uncorrelated with the input, and we analyze this network in the mean-field limit. By solving the mean-field equations, we show that the generalization error of the stochastic expanded student network continues to drop as the size of the network increases. This improvement in generalization performance occurs despite the increased complexity of the student network relative to the teacher it is trying to learn. We show that this effect is closely related to the addition of slack variables in artificial neural networks and suggest possible implications for artificial and biological neural networks.
Affiliations:
- Julia Steinberg: Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138, USA; Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA
- Madhu Advani: Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138, USA
- Haim Sompolinsky: Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138, USA; Edmond and Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem 91904, Israel
16. Advani MS, Saxe AM, Sompolinsky H. High-dimensional dynamics of generalization error in neural networks. Neural Netw 2020; 132:428-446. PMID: 33022471. PMCID: PMC7685244. DOI: 10.1016/j.neunet.2020.08.022.
Abstract
We perform an analysis of the average generalization dynamics of large neural networks trained using gradient descent. We study the practically relevant "high-dimensional" regime where the number of free parameters in the network is on the order of or even larger than the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization error and training error dynamics of learning and analyze how they depend on the dimensionality of data and the signal-to-noise ratio of the learning problem. We find that the dynamics of gradient descent learning naturally protect against overtraining and overfitting in large networks. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and thus can be reduced by making a network smaller or larger. Additionally, in the high-dimensional regime, low generalization error requires starting with small initial weights. We then turn to non-linear neural networks, and show that making networks very large does not harm their generalization performance. On the contrary, it can in fact reduce overtraining, even without early stopping or regularization of any sort. We identify two novel phenomena underlying this behavior in overcomplete models: first, there is a frozen subspace of the weights in which no learning occurs under gradient descent; and second, the statistical properties of the high-dimensional regime yield better-conditioned input correlations which protect against overtraining. We demonstrate that standard application of theories such as Rademacher complexity is inaccurate in predicting the generalization performance of deep neural networks, and we derive an alternative bound which incorporates the frozen subspace and conditioning effects and qualitatively matches the behavior observed in simulation.
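The frozen subspace is easy to see in the linear case: the gradient always lies in the row space of the data matrix, so directions orthogonal to it never move. A minimal sketch (sizes and learning rate are illustrative):

import numpy as np

rng = np.random.default_rng(9)
N, P = 100, 60                         # more parameters than examples
X = rng.normal(size=(P, N)) / np.sqrt(N)
y = X @ rng.normal(size=N) + 0.1 * rng.normal(size=P)

w0 = 0.01 * rng.normal(size=N)         # small initial weights
w = w0.copy()
for _ in range(2000):                  # full-batch gradient descent
    w -= 1.0 * X.T @ (X @ w - y) / P

# directions orthogonal to the row space of X receive no gradient signal
# and form a frozen subspace of dimension N - P
_, _, Vt = np.linalg.svd(X, full_matrices=True)
delta = w - w0
print("movement in row space       :", np.linalg.norm(Vt[:P] @ delta))
print("movement in frozen subspace :", np.linalg.norm(Vt[P:] @ delta))

The second number is zero to machine precision: whatever the initialization put in that subspace stays there, which is also why small initial weights matter in this regime.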
Affiliations:
- Madhu S Advani: Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Andrew M Saxe: Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Haim Sompolinsky: Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; Edmond and Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem 91904, Israel
17. Goldt S, Advani MS, Saxe AM, Krzakala F, Zdeborová L. Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup. J Stat Mech 2020; 2020:124010. PMID: 34262607. PMCID: PMC8252911. DOI: 10.1088/1742-5468/abc61e.
Abstract
Deep neural networks achieve stellar generalisation even when they have enough parameters to easily fit all their training data. We study this phenomenon by analysing the dynamics and the performance of over-parameterised two-layer neural networks in the teacher-student setup, where one network, the student, is trained on data generated by another network, called the teacher. We show how the dynamics of stochastic gradient descent (SGD) is captured by a set of differential equations and prove that this description is asymptotically exact in the limit of large inputs. Using this framework, we calculate the final generalisation error of student networks that have more parameters than their teachers. We find that the final generalisation error of the student increases with network size when training only the first layer, but stays constant or even decreases with size when training both layers. We show that these different behaviours have their root in the different solutions SGD finds for different activation functions. Our results indicate that achieving good generalisation in neural networks goes beyond the properties of SGD alone and depends on the interplay of at least the algorithm, the model architecture, and the data set.
Affiliations:
- Sebastian Goldt: Institut de Physique Théorique, CNRS, CEA, Université Paris-Saclay, France
- Madhu S Advani: Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Andrew M Saxe: Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Florent Krzakala: Laboratoire de Physique Statistique, Sorbonne Universités, Université Pierre et Marie Curie Paris 6, École Normale Supérieure, 75005 Paris, France
- Lenka Zdeborová: Institut de Physique Théorique, CNRS, CEA, Université Paris-Saclay, France
18. Ingrosso A. Optimal learning with excitatory and inhibitory synapses. PLoS Comput Biol 2020; 16:e1008536. PMID: 33370266. PMCID: PMC7793294. DOI: 10.1371/journal.pcbi.1008536.
Abstract
Characterizing the relation between weight structure and input/output statistics is fundamental for understanding the computational capabilities of neural circuits. In this work, I study the problem of storing associations between analog signals in the presence of correlations, using methods from statistical mechanics. I characterize the typical learning performance in terms of the power spectrum of random input and output processes. I show that optimal synaptic weight configurations reach a capacity of 0.5 for any fraction of excitatory to inhibitory weights and have a peculiar synaptic distribution with a finite fraction of silent synapses. I further provide a link between typical learning performance and principal components analysis in single cases. These results may shed light on the synaptic profile of brain circuits, such as cerebellar structures, that are thought to engage in processing time-dependent signals and performing on-line prediction.
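Sign-constrained learning is simple to prototype. The sketch below is a simplified binary-label variant of the setting (the paper treats analog, temporally correlated signals): a perceptron stores random associations while every update is projected back onto the excitatory/inhibitory sign constraints, and clipped weights illustrate how silent synapses can arise. Sizes, learning rate, and the E/I split are assumptions:

import numpy as np

rng = np.random.default_rng(10)
N, P, f_exc = 200, 80, 0.8                  # synapses, associations, E fraction
sign = np.where(np.arange(N) < int(f_exc * N), 1.0, -1.0)

X = rng.normal(size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)         # random associations to store

w = 0.01 * sign.copy()                      # tiny, sign-consistent start
for _ in range(5000):
    margins = y * (X @ w)
    j = np.argmin(margins)
    if margins[j] > 0.0:
        break
    w += 0.05 * y[j] * X[j]                 # perceptron step on the worst case,
    w = sign * np.maximum(sign * w, 0.0)    # projected back onto the sign cone

print("stored associations:", int(np.sum(y * (X @ w) > 0)), "of", P)
print("silent synapses    :", int(np.sum(w == 0.0)), "of", N)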
Affiliations:
- Alessandro Ingrosso: Zuckerman Mind, Brain, Behavior Institute, Columbia University, New York, New York, USA
19. Saxe A, Nelli S, Summerfield C. If deep learning is the answer, what is the question? Nat Rev Neurosci 2020; 22:55-67. PMID: 33199854. DOI: 10.1038/s41583-020-00395-8.
Abstract
Neuroscience research is undergoing a minor revolution. Recent advances in machine learning and artificial intelligence research have opened up new ways of thinking about neural computation. Many researchers are excited by the possibility that deep neural networks may offer theories of perception, cognition and action for biological brains. This approach has the potential to radically reshape our approach to understanding neural systems, because the computations performed by deep networks are learned from experience, and not endowed by the researcher. If so, how can neuroscientists use deep networks to model and understand biological brains? What is the outlook for neuroscientists who seek to characterize computations or neural codes, or who wish to understand perception, attention, memory and executive functions? In this Perspective, our goal is to offer a road map for systems neuroscience research in the age of deep learning. We discuss the conceptual and methodological challenges of comparing behaviour, learning dynamics and neural representations in artificial and biological systems, and we highlight new research questions that have emerged for neuroscience as a direct consequence of recent advances in machine learning.
Affiliations:
- Andrew Saxe: Department of Experimental Psychology, University of Oxford, Oxford, UK
- Stephanie Nelli: Department of Experimental Psychology, University of Oxford, Oxford, UK
20. Rotondo P, Pastore M, Gherardi M. Beyond the Storage Capacity: Data-Driven Satisfiability Transition. Phys Rev Lett 2020; 125:120601. PMID: 33016711. DOI: 10.1103/physrevlett.125.120601.
Abstract
Data structure has a dramatic impact on the properties of neural networks, yet its significance in the established theoretical frameworks is poorly understood. Here we compute the Vapnik-Chervonenkis entropy of a kernel machine operating on data grouped into equally labeled subsets. At variance with the unstructured scenario, entropy is nonmonotonic in the size of the training set, and displays an additional critical point besides the storage capacity. Remarkably, the same behavior occurs in margin classifiers even with randomly labeled data, as is elucidated by identifying the synaptic volume encoding the transition. These findings reveal aspects of expressivity lying beyond the condensed description provided by the storage capacity, and they indicate the path towards more realistic bounds for the generalization error of neural networks.
Affiliations:
- Pietro Rotondo: Istituto Nazionale di Fisica Nucleare, Sezione di Milano, via Celoria 16, 20133 Milano, Italy; Università degli Studi di Milano, via Celoria 16, 20133 Milano, Italy
- Mauro Pastore: Istituto Nazionale di Fisica Nucleare, Sezione di Milano, via Celoria 16, 20133 Milano, Italy; Università degli Studi di Milano, via Celoria 16, 20133 Milano, Italy
- Marco Gherardi: Istituto Nazionale di Fisica Nucleare, Sezione di Milano, via Celoria 16, 20133 Milano, Italy; Università degli Studi di Milano, via Celoria 16, 20133 Milano, Italy
21. Agliari E, Alemanno F, Barra A, Fachechi A. Generalized Guerra's interpolation schemes for dense associative neural networks. Neural Netw 2020; 128:254-267. PMID: 32454370. DOI: 10.1016/j.neunet.2020.05.009.
Abstract
In this work we develop analytical techniques to investigate a broad class of associative neural networks set in the high-storage regime. These techniques translate the original statistical-mechanical problem into an analytical-mechanical one which implies solving a set of partial differential equations, rather than tackling the canonical probabilistic route. We test the method on the classical Hopfield model - where the cost function includes only two-body interactions (i.e., quadratic terms) - and on the "relativistic" Hopfield model - where the (expansion of the) cost function includes p-body (i.e., of degree p) contributions. Under the replica symmetric assumption, we paint the phase diagrams of these models by obtaining the explicit expression of their free energy as a function of the model parameters (i.e., noise level and memory storage). Further, since for non-pairwise models ergodicity breaking is not necessarily a critical phenomenon, we develop a fluctuation analysis and find that criticality is preserved in the relativistic model.
Affiliations:
- Francesco Alemanno: Dipartimento di Matematica e Fisica Ennio De Giorgi, Università del Salento, Italy; C.N.R. Nanotec Lecce, Italy
- Adriano Barra: Dipartimento di Matematica e Fisica Ennio De Giorgi, Università del Salento, Italy; I.N.F.N., Sezione di Lecce, Italy
- Alberto Fachechi: Dipartimento di Matematica e Fisica Ennio De Giorgi, Università del Salento, Italy; I.N.F.N., Sezione di Lecce, Italy
22. Westerhout T, Astrakhantsev N, Tikhonov KS, Katsnelson MI, Bagrov AA. Generalization properties of neural network approximations to frustrated magnet ground states. Nat Commun 2020; 11:1593. PMID: 32221284. PMCID: PMC7101385. DOI: 10.1038/s41467-020-15402-w.
Abstract
Neural quantum states (NQS) attract a lot of attention due to their potential to serve as a very expressive variational ansatz for quantum many-body systems. Here we study the main factors governing the applicability of NQS to frustrated magnets by training neural networks to approximate ground states of several moderately-sized Hamiltonians using the corresponding wave function structure on a small subset of the Hilbert space basis as training dataset. We notice that generalization quality, i.e. the ability to learn from a limited number of samples and correctly approximate the target state on the rest of the space, drops abruptly when frustration is increased. We also show that learning the sign structure is considerably more difficult than learning amplitudes. Finally, we conclude that the main issue to be addressed at this stage, in order to use the method of NQS for simulating realistic models, is that of generalization rather than expressibility.
Affiliations:
- Tom Westerhout: Institute for Molecules and Materials, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
- Nikita Astrakhantsev: Physik-Institut, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland; Moscow Institute of Physics and Technology, Institutsky lane 9, 141700 Dolgoprudny, Russia; Institute for Theoretical and Experimental Physics NRC Kurchatov Institute, 117218 Moscow, Russia
- Konstantin S Tikhonov: Skolkovo Institute of Science and Technology, 143026 Skolkovo, Russia; Institut für Nanotechnologie, Karlsruhe Institute of Technology, 76021 Karlsruhe, Germany; Landau Institute for Theoretical Physics RAS, 119334 Moscow, Russia
- Mikhail I Katsnelson: Institute for Molecules and Materials, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands; Theoretical Physics and Applied Mathematics Department, Ural Federal University, 620002 Yekaterinburg, Russia
- Andrey A Bagrov: Institute for Molecules and Materials, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands; Theoretical Physics and Applied Mathematics Department, Ural Federal University, 620002 Yekaterinburg, Russia; Department of Physics and Astronomy, Uppsala University, Box 516, SE-75120 Uppsala, Sweden
23. Franz S, Hwang S, Urbani P. Jamming in Multilayer Supervised Learning Models. Phys Rev Lett 2019; 123:160602. PMID: 31702370. DOI: 10.1103/physrevlett.123.160602.
Abstract
Critical jamming transitions are characterized by an astonishing degree of universality. Analytic and numerical evidence points to the existence of a large universality class that encompasses finite- and infinite-dimensional spheres and continuous constraint satisfaction problems (CCSPs) such as the nonconvex perceptron and related models. In this Letter we investigate multilayer neural networks (MLNNs) learning random associations as models of CCSPs that could potentially define different jamming universality classes. As opposed to simple perceptrons and infinite-dimensional spheres, which are described by a single effective field in terms of which the constraints appear to be one dimensional, the description of MLNNs involves multiple fields, and the constraints acquire a multidimensional character. We first study the models numerically and show that, similarly to the perceptron, the sphere universality class is recovered whenever jamming is isostatic. We then write the exact mean-field equations for the models and identify a dimensional reduction mechanism that leads to a scaling regime identical to that of spheres.
Affiliations:
- Silvio Franz: LPTMS, Bâtiment Pascal n° 530, rue André Rivière, Université Paris-Sud, 91405 Orsay Cedex, France
- Sungmin Hwang: LPTMS, Bâtiment Pascal n° 530, rue André Rivière, Université Paris-Sud, 91405 Orsay Cedex, France
- Pierfrancesco Urbani: Institut de Physique Théorique, Université Paris-Saclay, CNRS, CEA, F-91191 Gif-sur-Yvette, France
24. Consistent validation of gray-level thresholding image segmentation algorithms based on machine learning classifiers. Stat Pap (Berl) 2019. DOI: 10.1007/s00362-019-01138-3.
25. Raman DV, Rotondo AP, O'Leary T. Fundamental bounds on learning performance in neural circuits. Proc Natl Acad Sci U S A 2019; 116:10537-10546. PMID: 31061133. PMCID: PMC6535002. DOI: 10.1073/pnas.1813416116.
Abstract
How does the size of a neural circuit influence its learning performance? Larger brains tend to be found in species with higher cognitive function and learning ability. Intuitively, we expect the learning capacity of a neural circuit to grow with the number of neurons and synapses. We show how adding apparently redundant neurons and connections to a network can make a task more learnable. Consequently, large neural circuits can either devote connectivity to generating complex behaviors or exploit this connectivity to achieve faster and more precise learning of simpler behaviors. However, we show that in a biologically relevant setting where synapses introduce an unavoidable amount of noise, there is an optimal size of network for a given task. Above the optimal network size, the addition of neurons and synaptic connections starts to impede learning performance. This suggests that the size of brain circuits may be constrained by the need to learn efficiently with unreliable synapses and provides a hypothesis for why some neurological learning deficits are associated with hyperconnectivity. Our analysis is independent of specific learning rules and uncovers fundamental relationships between learning rate, task performance, network size, and intrinsic noise in neural circuits.
Affiliations:
- Dhruva Venkita Raman: Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- Adriana Perez Rotondo: Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- Timothy O'Leary: Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
26. Barbier J, Krzakala F, Macris N, Miolane L, Zdeborová L. Optimal errors and phase transitions in high-dimensional generalized linear models. Proc Natl Acad Sci U S A 2019; 116:5451-5460. PMID: 30824595. PMCID: PMC6431156. DOI: 10.1073/pnas.1802705116.
Abstract
Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or "free entropy") from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.
Affiliations:
- Jean Barbier: Quantitative Life Sciences, International Center for Theoretical Physics, 34151 Trieste, Italy; Laboratoire de Physique de l'Ecole Normale Supérieure, Université Paris-Sciences-et-Lettres, Centre National de la Recherche Scientifique, Sorbonne Université, Université Paris-Diderot, Sorbonne Paris Cité, 75005 Paris, France
- Florent Krzakala: Laboratoire de Physique de l'Ecole Normale Supérieure, Université Paris-Sciences-et-Lettres, Centre National de la Recherche Scientifique, Sorbonne Université, Université Paris-Diderot, Sorbonne Paris Cité, 75005 Paris, France
- Nicolas Macris: Communication Theory Laboratory, School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Léo Miolane: Département d'Informatique de l'Ecole Normale Supérieure, Université Paris-Sciences-et-Lettres, Centre National de la Recherche Scientifique, Inria, 75005 Paris, France
- Lenka Zdeborová: Institut de Physique Théorique, Centre National de la Recherche Scientifique et Commissariat à l'Energie Atomique, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
27. Papo D. Neurofeedback: Principles, appraisal, and outstanding issues. Eur J Neurosci 2019; 49:1454-1469. PMID: 30570194. DOI: 10.1111/ejn.14312.
Abstract
Neurofeedback is a form of brain training in which subjects are fed back information about some measure of their brain activity which they are instructed to modify in a way thought to be functionally advantageous. Over the last 20 years, neurofeedback has been used to treat various neurological and psychiatric conditions, and to improve cognitive function in various contexts. However, in spite of a growing popularity, neurofeedback protocols typically make (often covert) assumptions on what aspects of brain activity to target, where in the brain to act and how, which have far-reaching implications for the assessment of its potential and efficacy. Here we critically examine some conceptual and methodological issues associated with the way neurofeedback's general objectives and neural targets are defined. The neural mechanisms through which neurofeedback may act at various spatial and temporal scales, and the way its efficacy is appraised are reviewed, and the extent to which neurofeedback may be used to control functional brain activity discussed. Finally, it is proposed that gauging neurofeedback's potential, as well as assessing and improving its efficacy will require better understanding of various fundamental aspects of brain dynamics and a more precise definition of functional brain activity and brain-behaviour relationships.
Affiliations:
- David Papo: SCALab, CNRS, Université de Lille, Villeneuve d'Ascq, France
28. Straat M, Abadi F, Göpfert C, Hammer B, Biehl M. Statistical Mechanics of On-Line Learning Under Concept Drift. Entropy (Basel) 2018; 20:775. PMID: 33265863. PMCID: PMC7512337. DOI: 10.3390/e20100775.
Abstract
We introduce a modeling framework for the investigation of on-line machine learning processes in non-stationary environments. We exemplify the approach in terms of two specific model situations: In the first, we consider the learning of a classification scheme from clustered data by means of prototype-based Learning Vector Quantization (LVQ). In the second, we study the training of layered neural networks with sigmoidal activations for the purpose of regression. In both cases, the target, i.e., the classification or regression scheme, is considered to change continuously while the system is trained from a stream of labeled data. We extend and apply methods borrowed from statistical physics which have been used frequently for the exact description of training dynamics in stationary environments. Extensions of the approach allow for the computation of typical learning curves in the presence of concept drift in a variety of model situations. First results are presented and discussed for stochastic drift processes in classification and regression problems. They indicate that LVQ is capable of tracking a classification scheme under drift to a non-trivial extent. Furthermore, we show that concept drift can cause the persistence of sub-optimal plateau states in gradient based training of layered neural networks for regression.
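Tracking under drift is easy to illustrate numerically. A minimal LVQ1 sketch (our toy, not the paper's model: two Gaussian clusters whose centers rotate slowly while prototypes are updated from the labeled stream; the rates are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(11)
eta, drift = 0.05, 0.002                        # learning rate, drift per step

means = np.array([[+1.0, 0.0], [-1.0, 0.0]])    # two drifting class centers
protos = rng.normal(size=(2, 2))                # one prototype per class

hits = []
for t in range(20000):
    k = rng.integers(2)
    x = means[k] + 0.5 * rng.normal(size=2)     # stream of labeled examples
    win = int(np.argmin(((protos - x) ** 2).sum(axis=1)))
    hits.append(win == k)
    s = 1.0 if win == k else -1.0               # LVQ1: attract/repel the winner
    protos[win] += eta * s * (x - protos[win])
    R = np.array([[np.cos(drift), -np.sin(drift)],
                  [np.sin(drift),  np.cos(drift)]])
    means = means @ R.T                          # slow rotation = concept drift

print("accuracy, first 1000 examples:", float(np.mean(hits[:1000])))
print("accuracy, last 1000 examples :", float(np.mean(hits[-1000:])))

Sustained accuracy at late times, despite the centers having rotated many times over, is the simplest signature of the tracking behavior described above.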
Affiliations:
- Michiel Straat: Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Nijenborgh 9, 9747 AG Groningen, The Netherlands
- Fthi Abadi: Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Nijenborgh 9, 9747 AG Groningen, The Netherlands
- Christina Göpfert: Center of Excellence Cognitive Interaction Technology (CITEC), Bielefeld University, Inspiration 1, 33619 Bielefeld, Germany
- Barbara Hammer: Center of Excellence Cognitive Interaction Technology (CITEC), Bielefeld University, Inspiration 1, 33619 Bielefeld, Germany
- Michael Biehl: Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Nijenborgh 9, 9747 AG Groningen, The Netherlands (correspondence; tel.: +31-50-363-3997)
29. Zhang Y, Saxe AM, Advani MS, Lee AA. Energy–entropy competition and the effectiveness of stochastic gradient descent in machine learning. Mol Phys 2018. DOI: 10.1080/00268976.2018.1483535.
Affiliation(s)
- Yao Zhang
- Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Andrew M. Saxe
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Madhu S. Advani
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Alpha A. Lee
- Cavendish Laboratory, University of Cambridge, Cambridge, UK
|
30
|
Li B, Saad D. Exploring the Function Space of Deep-Learning Machines. PHYSICAL REVIEW LETTERS 2018; 120:248301. [PMID: 29956949 DOI: 10.1103/physrevlett.120.248301] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/05/2017] [Revised: 04/10/2018] [Indexed: 06/08/2023]
Abstract
The function space of deep-learning machines is investigated by studying the growth in the entropy of functions at a given error with respect to a reference function realized by a deep-learning machine. Using physics-inspired methods, we study both sparsely and densely connected architectures. We discover a layerwise convergence of candidate functions, marked by a corresponding reduction in entropy when approaching the reference function; we gain insight into the importance of having a large number of layers; and we observe phase transitions as the error increases.
Affiliation(s)
- Bo Li
- Department of Physics, The Hong Kong University of Science and Technology, Hong Kong
- David Saad
- Non-linearity and Complexity Research Group, Aston University, Birmingham B4 7ET, United Kingdom
|
31
|
Barra A, Genovese G, Sollich P, Tantari D. Phase diagram of restricted Boltzmann machines and generalized Hopfield networks with arbitrary priors. Phys Rev E 2018; 97:022310. [PMID: 29548112 DOI: 10.1103/physreve.97.022310] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2017] [Indexed: 06/08/2023]
Abstract
Restricted Boltzmann machines are described by the Gibbs measure of a bipartite spin glass, which in turn can be seen as a generalized Hopfield network. This equivalence allows us to characterize the state of these systems in terms of their retrieval capabilities of pure states, at both low and high load. We study the paramagnetic-spin glass and the spin glass-retrieval phase transitions, as the pattern (i.e., weight) distribution and spin (i.e., unit) priors vary smoothly from Gaussian real variables to Boolean discrete variables. Our analysis shows that the presence of a retrieval phase is robust and not peculiar to the standard Hopfield model with Boolean patterns. The retrieval region becomes larger when the pattern entries and retrieval units get more peaked and, conversely, when the hidden units acquire a broader prior and therefore have a stronger response to high fields. Moreover, at low load retrieval always exists below some critical temperature, for every pattern distribution ranging from the Boolean to the Gaussian case.
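The bipartite-to-Hopfield equivalence invoked here can be made explicit in one line for the special case of Gaussian hidden priors (a standard computation, shown only for orientation; the paper treats general priors):

```latex
% RBM energy with visible spins \sigma_i, hidden units z_\mu, weights \xi_i^\mu:
%   E(\sigma,z) = \tfrac{1}{2}\sum_\mu z_\mu^2 - \sum_{i,\mu}\xi_i^{\mu}\sigma_i z_\mu .
% Integrating out the Gaussian hidden units gives
\int \prod_\mu dz_\mu\,
  e^{-\frac{\beta}{2}\sum_\mu z_\mu^2 + \beta\sum_{i,\mu}\xi_i^{\mu}\sigma_i z_\mu}
\;\propto\;
\exp\!\Big(\tfrac{\beta}{2}\sum_{i,j}\Big(\sum_\mu \xi_i^{\mu}\xi_j^{\mu}\Big)\sigma_i\sigma_j\Big),
% i.e., a generalized Hopfield model whose patterns are the RBM weights.
```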
Affiliation(s)
- Adriano Barra
- Dipartimento di Matematica e Fisica Ennio De Giorgi, Università del Salento, 73100 Lecce, Italy
- Giuseppe Genovese
- Institut für Mathematik, Universität Zürich, CH-8057 Zürich, Switzerland
- Peter Sollich
- Department of Mathematics, King's College London, WC2R 2LS London, United Kingdom
- Daniele Tantari
- Scuola Normale Superiore, Centro Ennio de Giorgi, Piazza dei Cavalieri 3, I-56100 Pisa, Italy
|
32
|
Wei Q, Melko RG, Chen JZY. Identifying polymer states by machine learning. Phys Rev E 2017; 95:032504. [PMID: 28415199 DOI: 10.1103/physreve.95.032504] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2017] [Indexed: 06/07/2023]
Abstract
The ability of a feed-forward neural network to learn and classify different states of polymer configurations is systematically explored. Performing numerical experiments, we find that a simple network model can, after adequate training, recognize multiple structures, including gaslike coil, liquidlike globular, and crystalline anti-Mackay and Mackay structures. The network can be trained to identify the transition points between various states, which compare well with those identified by independent specific-heat calculations. Our study demonstrates that neural networks provide an unconventional tool to study the phase transitions in polymeric systems.
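The workflow the abstract describes (train a feed-forward network on labeled configurations, then classify) looks roughly like the following sketch; the synthetic coil/globule generator and all hyperparameters are stand-ins, not the paper's data or architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def synthetic_config(state, n=64):
    # Stand-in for a polymer configuration: coil-like states (0) are spread
    # out, globule-like states (1) are compact. Illustrative data only.
    scale = 3.0 if state == 0 else 0.7
    return rng.normal(0.0, scale, size=(n, 3)).ravel()

labels = rng.integers(0, 2, size=400)
X = np.array([synthetic_config(s) for s in labels])

Xtr, Xte, ytr, yte = train_test_split(X, labels, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=500, random_state=0)
clf.fit(Xtr, ytr)
print("test accuracy:", clf.score(Xte, yte))
```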
Affiliation(s)
- Qianshi Wei
- Department of Physics and Astronomy, University of Waterloo, Waterloo N2L 3G1, Canada
- Roger G Melko
- Department of Physics and Astronomy, University of Waterloo, Waterloo N2L 3G1, Canada
- Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada
- Jeff Z Y Chen
- Department of Physics and Astronomy, University of Waterloo, Waterloo N2L 3G1, Canada
|
33
|
Walschaers M, Mulet R, Wellens T, Buchleitner A. Statistical theory of designed quantum transport across disordered networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 91:042137. [PMID: 25974468 DOI: 10.1103/physreve.91.042137] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/09/2014] [Indexed: 06/04/2023]
Abstract
We explain how centrosymmetry, together with a dominant doublet of energy eigenstates in the local density of states, can guarantee interference-assisted, strongly enhanced, strictly coherent quantum excitation transport between two predefined sites of a random network of two-level systems. Starting from a generalization of the chaos-assisted tunnelling mechanism, we formulate a random matrix theoretical framework for the analytical prediction of the transfer time distribution, of lower bounds of the transfer efficiency, and of the scaling behavior of characteristic statistical properties with the size of the network. We show that these analytical predictions compare well to numerical simulations, using Hamiltonians sampled from the Gaussian orthogonal ensemble.
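A toy version of the setup, for orientation only: sample a GOE-type Hamiltonian, project it onto the centrosymmetric matrices, and follow the coherent transfer probability between the two mirror-related sites. The system size and time grid are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
J = np.fliplr(np.eye(N))            # exchange matrix: J H J = H <=> centrosymmetric

A = rng.normal(size=(N, N))
H = (A + A.T) / np.sqrt(2 * N)      # GOE-type sample
H = (H + J @ H @ J) / 2             # project onto the centrosymmetric matrices

evals, V = np.linalg.eigh(H)
psi_in = np.zeros(N)
psi_in[0] = 1.0                     # input site |in>
psi_out = np.zeros(N)
psi_out[-1] = 1.0                   # output site |out>, the mirror image of |in>

# |<out| exp(-i H t) |in>|^2 via the spectral decomposition
amps = (psi_out @ V) * (V.T @ psi_in)
ts = np.linspace(0, 50, 500)
p = np.abs(np.array([np.sum(amps * np.exp(-1j * evals * t)) for t in ts])) ** 2
print("max transfer probability:", p.max())
```

Repeating this over many disorder realizations, with and without the centrosymmetry projection, shows the enhancement of transfer efficiency that the random-matrix analysis quantifies.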
Affiliation(s)
- Mattia Walschaers
- Physikalisches Institut, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Str. 3, D-79104 Freiburg, Germany
- Instituut voor Theoretische Fysica, University of Leuven, Celestijnenlaan 200D, B-3001 Heverlee, Belgium
- Roberto Mulet
- Physikalisches Institut, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Str. 3, D-79104 Freiburg, Germany
- Complex Systems Group, Department of Theoretical Physics, University of Havana, Cuba
- Thomas Wellens
- Physikalisches Institut, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Str. 3, D-79104 Freiburg, Germany
- Andreas Buchleitner
- Physikalisches Institut, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Str. 3, D-79104 Freiburg, Germany
- Freiburg Institute for Advanced Studies, Albert-Ludwigs-Universität Freiburg, Albertstr. 19, D-79104 Freiburg, Germany
|
34
|
Ganguli S, Sompolinsky H. Compressed sensing, sparsity, and dimensionality in neuronal information processing and data analysis. Annu Rev Neurosci 2012; 35:485-508. [PMID: 22483042 DOI: 10.1146/annurev-neuro-062111-150410] [Citation(s) in RCA: 123] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
The curse of dimensionality poses severe challenges to both technical and conceptual progress in neuroscience. In particular, it plagues our ability to acquire, process, and model high-dimensional data sets. Moreover, neural systems must cope with the challenge of processing data in high dimensions to learn and operate successfully within a complex world. We review recent mathematical advances that provide ways to combat dimensionality in specific situations. These advances shed light on two dual questions in neuroscience. First, how can we as neuroscientists rapidly acquire high-dimensional data from the brain and subsequently extract meaningful models from limited amounts of these data? And second, how do brains themselves process information in their intrinsically high-dimensional patterns of neural activity as well as learn meaningful, generalizable models of the external world from limited experience?
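As a concrete instance of the compressed-sensing ideas reviewed here, the following sketch recovers a sparse vector from underdetermined random measurements with ISTA (iterative soft thresholding); the dimensions and the penalty lam are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 200, 60, 8                       # signal dim, measurements, sparsity
A = rng.normal(size=(m, n)) / np.sqrt(m)   # random sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
y = A @ x_true                             # m << n measurements

lam = 0.01
L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(2000):                      # ISTA: gradient step + soft threshold
    z = x - A.T @ (A @ x - y) / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

print("recovery error:", np.linalg.norm(x - x_true))
```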
Affiliation(s)
- Surya Ganguli
- Department of Applied Physics, Stanford University, Stanford, California 94305, USA.
|
35
|
Ribeiro F, Opper M. Expectation propagation with factorizing distributions: a Gaussian approximation and performance results for simple models. Neural Comput 2011; 23:1047-69. [PMID: 21222527 DOI: 10.1162/neco_a_00104] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
We discuss the expectation propagation (EP) algorithm for approximate Bayesian inference using a factorizing posterior approximation. For neural network models, we use a central limit theorem argument to make EP tractable when the number of parameters is large. For two types of models, we show that EP can achieve optimal generalization performance when data are drawn from a simple distribution.
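The core EP step, projecting an intractable "tilted" distribution onto a Gaussian by moment matching, can be illustrated in one dimension. The sigmoidal factor and cavity parameters below are arbitrary, and numerical quadrature stands in for the closed-form moments used in practice.

```python
import numpy as np

def gauss(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

xs = np.linspace(-10, 10, 20001)
dx = xs[1] - xs[0]
cavity = gauss(xs, 1.0, 4.0)                  # cavity (prior) Gaussian
factor = 1.0 / (1.0 + np.exp(-3.0 * xs))      # a sigmoidal likelihood factor

tilted = cavity * factor                      # intractable tilted distribution
Z = tilted.sum() * dx                         # its normalizer
mean = (xs * tilted).sum() * dx / Z           # matched first moment
var = ((xs - mean) ** 2 * tilted).sum() * dx / Z   # matched second moment
print(f"EP projection: N({mean:.3f}, {var:.3f}), log Z = {np.log(Z):.3f}")
```

A full EP sweep repeats this projection for each factor, dividing out the old approximate factor to form the cavity each time; the central limit theorem argument in the paper is what makes the cavity Gaussian for large networks.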
Affiliation(s)
- Fabiano Ribeiro
- Instituto de Física, Universidade de São Paulo, São Paulo, 05508-090, Brazil
|
36
|
Bouguila N. Count data modeling and classification using finite mixtures of distributions. IEEE Trans Neural Netw 2010; 22:186-98. [PMID: 21095862 DOI: 10.1109/tnn.2010.2091428] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
In this paper, we consider the problem of constructing accurate and flexible statistical representations for count data, which we often confront in many areas such as data mining, computer vision, and information retrieval. In particular, we analyze and compare several generative approaches widely used for count data clustering, namely multinomial, multinomial Dirichlet, and multinomial generalized Dirichlet mixture models. Moreover, we propose a clustering approach via a mixture model based on a composition of the Liouville family of distributions, from which we select the Beta-Liouville distribution, and the multinomial. The novel proposed model, which we call multinomial Beta-Liouville mixture, is optimized by deterministic annealing expectation-maximization and minimum description length, and strives to achieve a high accuracy of count data clustering and model selection. An important feature of the multinomial Beta-Liouville mixture is that it has fewer parameters than the recently proposed multinomial generalized Dirichlet mixture. The performance evaluation is conducted through a set of extensive empirical experiments, which concern text and image texture modeling and classification and shape modeling, and highlights the merits of the proposed models and approaches.
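For the simplest member of the model family compared here, a plain multinomial mixture, the EM updates look as follows (synthetic counts; the paper's Beta-Liouville extension and MDL-based model selection are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(4)
K, V, N, D = 2, 5, 300, 50        # components, vocabulary, documents, counts/doc

# Synthetic count data from two multinomial components
true_theta = rng.dirichlet(np.ones(V), size=K)
z = rng.integers(K, size=N)
X = np.array([rng.multinomial(D, true_theta[zi]) for zi in z])

pi = np.full(K, 1.0 / K)                      # mixing weights
theta = rng.dirichlet(np.ones(V), size=K)     # component parameters
for _ in range(100):                          # EM iterations
    # E-step: responsibilities from unnormalized log posteriors
    logp = X @ np.log(theta).T + np.log(pi)
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: reweighted maximum likelihood
    pi = r.mean(axis=0)
    theta = (r.T @ X) + 1e-9
    theta /= theta.sum(axis=1, keepdims=True)

print("recovered mixing weights:", np.round(pi, 3))
```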
Affiliation(s)
- Nizar Bouguila
- Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC H3G 1T7, Canada.
|
37
|
Fontanari JF. Social interaction as a heuristic for combinatorial optimization problems. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2010; 82:056118. [PMID: 21230556 DOI: 10.1103/physreve.82.056118] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/20/2010] [Indexed: 05/30/2023]
Abstract
We investigate the performance of a variant of Axelrod's model for the dissemination of culture, the Adaptive Culture Heuristic (ACH), on solving an NP-complete optimization problem, namely, the classification of binary input patterns of size F by a Boolean binary perceptron. In this heuristic, N agents, characterized by binary strings of length F which represent possible solutions to the optimization problem, are fixed at the sites of a square lattice and interact with their nearest neighbors only. The interactions are such that the agents' strings (or cultures) become more similar to the low-cost strings of their neighbors, resulting in the dissemination of these strings across the lattice. Eventually the dynamics freezes into a homogeneous absorbing configuration in which all agents exhibit identical solutions to the optimization problem. We find through extensive simulations that the probability of finding the optimal solution is a function of the reduced variable $F/N^{1/4}$, so that the number of agents must increase with the fourth power of the problem size, $N \propto F^{4}$, to guarantee a fixed probability of success. In this case, we find that the relaxation time to reach an absorbing configuration scales with $F^{6}$, which can be interpreted as the overall computational cost of the ACH to find an optimal set of weights for a Boolean binary perceptron, given a fixed probability of success.
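A stripped-down rendering of the heuristic, for orientation: agents on a periodic square lattice copy one differing trait from a lower-cost neighbor until the population freezes. The cost used below, Hamming distance to a hidden target string, is a stand-in for the paper's perceptron classification cost.

```python
import numpy as np

rng = np.random.default_rng(5)
L, F = 10, 16                                # lattice side, string length F
target = rng.integers(0, 2, F)               # hidden optimum (stand-in cost)
agents = rng.integers(0, 2, (L, L, F))       # one binary string per lattice site

def cost(s):
    return int(np.sum(s != target))          # Hamming distance to the optimum

moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
for step in range(50000):
    i, j = rng.integers(L), rng.integers(L)
    di, dj = moves[rng.integers(4)]
    ni, nj = (i + di) % L, (j + dj) % L      # periodic nearest neighbor
    a, b = agents[i, j], agents[ni, nj]
    diff = np.flatnonzero(a != b)
    if diff.size and cost(b) < cost(a):      # imitate a lower-cost neighbor:
        k = rng.choice(diff)                 # copy one differing trait
        a[k] = b[k]

best = min(cost(agents[i, j]) for i in range(L) for j in range(L))
print("best cost on the lattice:", best)
```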
Affiliation(s)
- José F Fontanari
- Instituto de Física de São Carlos, Universidade de São Paulo, Caixa Postal 369, 13560-970 São Carlos, SP, Brazil
|
38
|
Affiliation(s)
- Wolfgang Kinzel
- Institut für Theoretische Physik, Universität Würzburg, Am Hubland, D-97074 Würzburg, Germany
|
39
|
Phase transitions in vector quantization and neural gas. Neurocomputing 2009. [DOI: 10.1016/j.neucom.2008.10.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
40
|
Neirotti JP, Saad D. Inference by replication in densely connected systems. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2007; 76:046121. [PMID: 17995074 DOI: 10.1103/physreve.76.046121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/04/2007] [Indexed: 05/25/2023]
Abstract
An efficient Bayesian inference method for problems that can be mapped onto dense graphs is presented. The approach is based on message passing, where messages are averaged over a large number of replicated variable systems exposed to the same evidential nodes. An assumption about the symmetry of the solutions is required for carrying out the averages; here we extend the previous derivation, based on a replica-symmetric (RS)-like structure, to include a more complex one-step replica-symmetry-breaking-like (1RSB-like) ansatz. To demonstrate the potential of the approach, it is employed for studying critical properties of the Ising linear perceptron and for multiuser detection in code division multiple access (CDMA) under different noise models. Results obtained under the RS assumption in the noncritical regime give rise to a highly efficient signal detection algorithm in the context of CDMA, while in the critical regime one observes a first-order transition line that ends in a continuous phase transition point. Finite-size effects are also observed. While the 1RSB ansatz is not required for the original problems, it was applied to the CDMA signal detection problem with a more complex noise model that exhibits RSB behavior, resulting in an improvement in performance.
Affiliation(s)
- Juan P Neirotti
- The Neural Computing Research Group, Aston University, Birmingham B4 7ET, United Kingdom
|
41
|
de Almeida RMC, Espinosa A, Idiart MAP. Concatenated retrieval of correlated stored information in neural networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2006; 74:041912. [PMID: 17155101 DOI: 10.1103/physreve.74.041912] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/20/2006] [Revised: 09/04/2006] [Indexed: 05/12/2023]
Abstract
We consider a coupled map lattice defined on a hypercube in M dimensions, taken here as the information space, to model memory retrieval and information association by a neural network. We assume that both neuronal activity and spike timing may carry information. In this model the state of the network at a given time t is completely determined by the intensity $y(\sigma,t)$ with which the information pattern represented by the integer $\sigma$ is being expressed by the network. Logistic maps, coupled in the information space, are used to describe the evolution of the intensity function $y(\vec{\sigma},t)$, with the intent to model memory retrieval in neural systems. We calculate the phase diagram of the system regarding the model's ability to work as an associative memory. We show that this model is capable of retrieving simultaneously a correlated set of memories, after a relatively long transient that may be associated with the retrieval of concatenated memorized patterns that lead to a final attractor.
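A minimal coupled-map lattice of this kind can be set up as follows; the diffusive coupling form, the logistic parameter, and the coupling strength are generic illustrative choices rather than the paper's exact dynamics. Patterns are indexed by integers whose bit strings define the hypercube, so the neighbors of a pattern differ from it by exactly one bit.

```python
import numpy as np

M = 6                                        # hypercube dimension (2^M patterns)
eps, a = 0.3, 3.8                            # coupling strength, logistic parameter

def f(y):
    return a * y * (1.0 - y)                 # logistic map

# neighbors of pattern sigma differ by exactly one bit
idx = np.arange(2 ** M)
neigh = np.array([[s ^ (1 << b) for b in range(M)] for s in idx])

rng = np.random.default_rng(6)
y = rng.uniform(0, 1, 2 ** M)                # intensity of each pattern
for t in range(500):                         # diffusively coupled update
    fy = f(y)
    y = (1 - eps) * fy + (eps / M) * fy[neigh].sum(axis=1)

print("dominant pattern:", format(int(np.argmax(y)), f"0{M}b"),
      "intensity:", round(float(y.max()), 3))
```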
Affiliation(s)
- R M C de Almeida
- Instituto de Física, Universidade Federal do Rio Grande do Sul, Caixa Postal 15051, 91501-970 Porto Alegre, RS, Brazil.
|
42
|
Abstract
In many cortical and subcortical areas, neurons are known to modulate their average firing rate in response to certain external stimulus features. It is widely believed that information about the stimulus features is coded by a weighted average of the neural responses. Recent theoretical studies have shown that the information capacity of such a coding scheme is very limited in the presence of the experimentally observed pairwise correlations. However, central to the analysis of these studies was the assumption of a homogeneous population of neurons. Experimental findings show a considerable measure of heterogeneity in the response properties of different neurons. In this study, we investigate the effect of neuronal heterogeneity on the information capacity of a correlated population of neurons. We show that the information capacity of a heterogeneous network is not limited by the correlated noise, but scales linearly with the number of cells in the population. This information cannot be extracted by the population vector readout, whose accuracy is greatly suppressed by the correlated noise. On the other hand, we show that an optimal linear readout that takes into account the neuronal heterogeneity can extract most of this information. We study analytically the nature of the dependence of the optimal linear readout weights on the neuronal diversity. We show that simple online learning can generate readout weights with the appropriate dependence on the neuronal diversity, thereby yielding an efficient readout.
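The contrast between the population-vector readout and the optimal linear readout can be checked directly in a linear-Gaussian caricature: heterogeneous gains g, uniformly correlated noise, and the standard optimal unbiased weights proportional to C⁻¹g. All modeling choices below are illustrative, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(7)
N, c, sig2, s = 200, 0.3, 1.0, 0.5
g = rng.lognormal(0.0, 0.5, N)               # heterogeneous response gains
C = sig2 * ((1 - c) * np.eye(N) + c * np.ones((N, N)))   # correlated noise

# responses over many trials: r = g*s + correlated Gaussian noise
Lchol = np.linalg.cholesky(C)
R = g * s + rng.normal(size=(5000, N)) @ Lchol.T

Cinv_g = np.linalg.solve(C, g)
w_opt = Cinv_g / (g @ Cinv_g)                # optimal unbiased linear readout
w_pv = g / (g @ g)                           # population-vector-like readout

for name, w in [("optimal", w_opt), ("popvec", w_pv)]:
    est = R @ w
    print(name, "readout MSE:", np.mean((est - s) ** 2))
```

Both readouts are unbiased here (w·g = 1), so the gap in mean-squared error isolates the variance suppression that exploiting heterogeneity provides.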
Affiliation(s)
- Maoz Shamir
- Center for BioDynamics, Boston University, Boston, MA 02215, U.S.A.
|
43
|
Braunstein A, Zecchina R. Learning by message passing in networks of discrete synapses. PHYSICAL REVIEW LETTERS 2006; 96:030201. [PMID: 16486667 DOI: 10.1103/physrevlett.96.030201] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/08/2005] [Indexed: 05/06/2023]
Abstract
We show that a message-passing process allows us to store in binary "material" synapses a number of random patterns which almost saturates the information-theoretic bounds. We apply the learning algorithm to networks characterized by a wide range of different connection topologies and of size comparable with that of biological systems (e.g., [EQUATION: SEE TEXT]). The algorithm can be turned into an on-line, fault-tolerant learning protocol of potential interest in modeling aspects of synaptic plasticity and in building neuromorphic devices.
|
44
|
Abstract
Sensory perception is a learned trait. The brain strategies we use to perceive the world are constantly modified by experience. With practice, we subconsciously become better at identifying familiar objects or distinguishing fine details in our environment. Current theoretical models simulate some properties of perceptual learning, but neglect the underlying cortical circuits. Future neural network models must incorporate the top-down alteration of cortical function by expectation or perceptual tasks. These newly found dynamic processes are challenging earlier views of static and feedforward processing of sensory information.
Affiliation(s)
- Misha Tsodyks
- Department of Neurobiology, Weizmann Institute, Rehovot 76100, Israel
- Charles Gilbert
- The Rockefeller University, 1230 York Avenue, New York, New York 10021, USA
|
45
|
Abstract
This letter analyzes the Fisher kernel from a statistical point of view. The Fisher kernel is a particularly interesting method for constructing a model of the posterior probability that makes intelligent use of unlabeled data (i.e., of the underlying data density). It is important to analyze and ultimately understand the statistical properties of the Fisher kernel. To this end, we first establish sufficient conditions under which the constructed posterior model is realizable (i.e., it contains the true distribution). Realizability immediately leads to consistency results. Subsequently, we focus on an asymptotic analysis of the generalization error, which elucidates the learning curves of the Fisher kernel and how unlabeled data contribute to learning. We also point out that the squared or log loss is theoretically preferable (because both yield consistent estimators) to other losses such as the exponential loss, when a linear classifier is used together with the Fisher kernel. Therefore, this letter underlines that the Fisher kernel should be viewed not as a heuristic but as a powerful statistical tool with well-controlled statistical properties.
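For a concrete instance of the construction analyzed here, take an independent-Bernoulli generative model: the Fisher score and Fisher information are available in closed form, and the kernel follows directly. The model and parameters below are illustrative, not the letter's examples.

```python
import numpy as np

def fisher_kernel_bernoulli(X1, X2, theta):
    """Fisher kernel for an independent-Bernoulli model p(x|theta).
    Score: d/d theta_d log p = x_d/theta_d - (1 - x_d)/(1 - theta_d);
    the Fisher information is diagonal: F_dd = 1/(theta_d (1 - theta_d))."""
    def score(X):
        return X / theta - (1 - X) / (1 - theta)
    Finv = theta * (1 - theta)                    # diagonal of F^{-1}
    return score(X1) @ (Finv * score(X2)).T       # k(x,x') = s(x)^T F^{-1} s(x')

rng = np.random.default_rng(8)
theta = rng.uniform(0.2, 0.8, 10)                 # a fitted density model
X = rng.binomial(1, theta, size=(5, 10)).astype(float)
K = fisher_kernel_bernoulli(X, X, theta)
print("Gram matrix:\n", np.round(K, 2))
```

The Gram matrix K can then be fed to any kernel classifier; the letter's point is that the choice of loss for that classifier matters for consistency.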
Affiliation(s)
- Koji Tsuda
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany.
|
46
|
Rosen-Zvi M, Engel A, Kanter I. Generalization and capacity of extensively large two-layered perceptrons. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2002; 66:036138. [PMID: 12366215 DOI: 10.1103/physreve.66.036138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/25/2002] [Indexed: 05/23/2023]
Abstract
The generalization ability and storage capacity of a treelike two-layered neural network with a number of hidden units scaling as the input dimension is examined. The mapping from the input to the hidden layer is via Boolean functions; the mapping from the hidden layer to the output is done by a perceptron. The analysis is within the replica framework where an order parameter characterizing the overlap between two networks in the combined space of Boolean functions and hidden-to-output couplings is introduced. The maximal capacity of such networks is found to scale linearly with the logarithm of the number of Boolean functions per hidden unit. The generalization process exhibits a first-order phase transition from poor to perfect learning for the case of discrete hidden-to-output couplings. The critical number of examples per input dimension, $\alpha_c$, at which the transition occurs, again scales linearly with the logarithm of the number of Boolean functions. In the case of continuous hidden-to-output couplings, the generalization error decreases according to the same power law as for the perceptron, with the prefactor being different.
Affiliation(s)
- Michal Rosen-Zvi
- Minerva Center and Department of Physics, Bar-Ilan University, Ramat-Gan, 52900 Israel
|
47
|
de Almeida RMC, Idiart MAP. Information space dynamics for neural networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2002; 65:061908. [PMID: 12188760 DOI: 10.1103/physreve.65.061908] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/06/2002] [Indexed: 05/23/2023]
Abstract
We propose a coupled map lattice defined on a hypercube in M dimensions, the information space, to model memory retrieval by a neural network. We consider that both neuronal activity and the spiking phase may carry information. In this model the state of the network at a given time t is completely determined by a function $y(\vec{\sigma},t)$ of the bit strings $\vec{\sigma}=(\sigma_1,\sigma_2,\ldots,\sigma_M)$, where $\sigma_i=\pm 1$ with $i=1,2,\ldots,M$, that gives the intensity with which the information $\vec{\sigma}$ is being expressed by the network. As an example, we consider logistic maps, coupled in the information space, to describe the evolution of the intensity function $y(\vec{\sigma},t)$. We propose an interpretation of the maps in terms of the physiological state of the neurons and the coupling between them, obtain Hebb-like learning rules, show that the model works as an associative memory, numerically investigate the capacity of the network and the size of the basins of attraction, and estimate finite-size effects. We finally show that the model, when exposed to sequences of uncorrelated stimuli, shows recency and latency effects that depend on the noise level, delay time of measurement, and stimulus intensity.
Affiliation(s)
- R M C de Almeida
- Instituto de Física, Universidade Federal do Rio Grande do Sul, Caixa Postal 15051, 91501-970 Porto Alegre, RS, Brazil
|
48
|
Samengo I. Estimating probabilities from experimental frequencies. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2002; 65:046124. [PMID: 12005943 DOI: 10.1103/physreve.65.046124] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/09/2001] [Indexed: 05/23/2023]
Abstract
Estimating the probability distribution q governing the behavior of a certain variable by sampling its value a finite number of times typically involves an error. Successive measurements allow the construction of a histogram, or frequency count f, of each of the possible outcomes. In this work, the probability that the true distribution is q, given that the frequency count f was sampled, is studied. Such a probability may be written as a Gibbs distribution. A thermodynamic potential, which allows an easy evaluation of the mean Kullback-Leibler divergence between the true and measured distributions, is defined. For a large number of samples, the expectation value of any function of q is expanded in powers of the inverse number of samples. As an example, the moments, the entropy, and the mutual information are analyzed.
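A quick numerical companion to this analysis, under the assumption of a flat prior: the Gibbs-form posterior over q given a frequency count f is then Dirichlet(f+1), and the posterior-averaged Kullback-Leibler divergence to the empirical distribution can be compared with its leading-order (K-1)/(2n) behavior. The true distribution and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(9)
q_true = np.array([0.5, 0.3, 0.15, 0.05])
n = 100
f = rng.multinomial(n, q_true)               # frequency count from n samples

# With a flat prior, P(q | f) ∝ prod_i q_i^{f_i}, a Dirichlet(f+1) posterior
post = rng.dirichlet(f + 1, size=20000)      # posterior samples over q
q_hat = f / n                                # empirical (measured) distribution

# Monte Carlo estimate of the mean KL divergence between true and measured
eps = 1e-12
kl = np.sum(post * np.log((post + eps) / (q_hat + eps)), axis=1)
print("posterior mean of q:", np.round(post.mean(axis=0), 3))
print("E[KL(q || f/n)] =", kl.mean(),
      " vs leading order (K-1)/(2n) =", (len(q_true) - 1) / (2 * n))
```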
Affiliation(s)
- Inés Samengo
- Centro Atómico Bariloche and Instituto Balseiro, 8400 San Carlos de Bariloche, Río Negro, Argentina.
|
49
|
Luo P, Michael Wong KY. Cavity approach to noisy learning in nonlinear perceptrons. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2001; 64:061912. [PMID: 11736215 DOI: 10.1103/physreve.64.061912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/19/2001] [Indexed: 05/23/2023]
Abstract
We analyze the learning of noisy teacher-generated examples by nonlinear and differentiable student perceptrons using the cavity method. The generic activation of an example is a function of the cavity activation of the example, which is its activation in the perceptron that learns without the example. Mean-field equations for the macroscopic parameters and the stability condition yield results consistent with the replica method. When a single value of the cavity activation maps to multiple values of the generic activation, there is a competition in learning strategy between preferentially learning an example and sacrificing it in favor of the background adjustment. We find parameter regimes in which examples are learned preferentially or sacrificially, leading to a gap in the activation distribution. Full phase diagrams of this complex system are presented, and the theory predicts the existence of a phase transition from poor to good generalization states in the system. Simulation results confirm the theoretical predictions.
Affiliation(s)
- P Luo
- Department of Physics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
|
50
|
Abstract
We define predictive information $I_{\mathrm{pred}}(T)$ as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: $I_{\mathrm{pred}}(T)$ can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then $I_{\mathrm{pred}}(T)$ grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite-parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and in the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of $I_{\mathrm{pred}}(T)$ provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
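The first of the three behaviors, finite predictive information, is easy to see numerically: for a first-order Markov chain the plug-in estimate of the past-future mutual information saturates as the window length grows. The chain and window sizes below are arbitrary illustrative choices.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(10)
p_stay = 0.9                                   # sticky binary Markov chain
x = [0]
for _ in range(100000):
    x.append(x[-1] if rng.random() < p_stay else 1 - x[-1])
x = np.array(x)

def mutual_info(past_len, fut_len):
    # plug-in estimate of I(past; future) from empirical window counts
    joint = Counter()
    for t in range(past_len, len(x) - fut_len):
        joint[(tuple(x[t - past_len:t]), tuple(x[t:t + fut_len]))] += 1
    n = sum(joint.values())
    pj = {k: v / n for k, v in joint.items()}
    pp, pf = Counter(), Counter()
    for (a, b), v in pj.items():
        pp[a] += v
        pf[b] += v
    return sum(v * np.log2(v / (pp[a] * pf[b])) for (a, b), v in pj.items())

for T in (1, 2, 4):
    print(f"I(past_{T}; future_{T}) = {mutual_info(T, T):.4f} bits")
```

Because a first-order chain carries all its predictive structure in the last symbol, the estimate levels off with T, the "remains finite" case; logarithmic or power-law growth requires richer processes.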
Affiliation(s)
- W Bialek
- NEC Research Institute, Princeton, NJ 08540, USA
|