1
Abstract
Deep neural networks (DNNs) are becoming fundamental learning devices for extracting information from data in a variety of real-world applications and in the natural and social sciences. The learning process in a DNN consists of finding a minimizer of a loss function that measures how well the data are classified. This optimization task is typically solved by tuning millions of parameters with stochastic gradient algorithms, a process that can be thought of as the exploration of a highly nonconvex landscape. Here we show that such landscapes possess very peculiar wide flat minima and that the current models have been shaped to make the loss functions and the algorithms focus on those minima. We also derive efficient algorithmic solutions.

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex one- and two-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD with the cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian, and their generalization performance on real data.
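The closing algorithmic observation lends itself to a compact illustration. Below is a minimal sketch, with illustrative sizes and schedule (our assumptions, not the paper's protocol), of SGD on the cross-entropy loss for a single perceptron storing random patterns while the weight norm is slowly reduced:

```python
# Minimal sketch (illustrative sizes/schedule, not the paper's protocol):
# SGD on the cross-entropy (logistic) loss for a perceptron storing random
# +/-1 patterns, while the weight norm is slowly reduced.
import numpy as np

rng = np.random.default_rng(0)
N, P, epochs, lr = 200, 120, 500, 0.05
X = rng.choice([-1.0, 1.0], size=(P, N))   # random input patterns
y = rng.choice([-1.0, 1.0], size=P)        # random labels
w = rng.normal(0.0, 1.0 / np.sqrt(N), size=N)

for t in range(epochs):
    for mu in rng.permutation(P):          # stochastic single-pattern updates
        margin = y[mu] * (X[mu] @ w)
        # logistic-loss gradient: -y * x * sigmoid(-y * w.x)
        w += lr * y[mu] * X[mu] / (1.0 + np.exp(margin))
    target = np.sqrt(N) * (1.0 - 0.5 * t / epochs)   # slowly shrinking norm
    w *= target / np.linalg.norm(w)

print("training errors:", int(np.sum(np.sign(X @ w) != y)), "/", P)
```

Only the direction of w matters for the error count, so the norm schedule changes which minima the cross-entropy loss prefers, not the classification rule itself.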
2
Baldassi C, Gerace F, Kappen HJ, Lucibello C, Saglietti L, Tartaglione E, Zecchina R. Role of Synaptic Stochasticity in Training Low-Precision Neural Networks. Physical Review Letters 2018; 120:268103. [PMID: 30004730] [DOI: 10.1103/physrevlett.120.268103]
Abstract
Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses. Both analytical and numerical results are presented. An algorithmic extension that makes it possible to train discrete deep neural networks is also investigated.
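The abstract's core construction, gradient descent on real values parametrizing a distribution over binary synapses, can be sketched as follows; the Gaussian (central-limit) treatment of the output field and all parameters are our assumptions for illustration, not the paper's exact update:

```python
# Hedged sketch of the general idea (not the paper's exact procedure): each
# binary weight w_i = +/-1 gets a magnetization m_i = tanh(h_i); the output
# field is treated as Gaussian (central limit), and we descend the expected
# misclassification probability with respect to the real parameters h.
import numpy as np

rng = np.random.default_rng(1)
N, P, steps, lr = 101, 40, 2000, 0.1
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)
h = 0.1 * rng.normal(size=N)                # real parameters over binary weights

for _ in range(steps):
    m = np.tanh(h)
    M = y * (X @ m)                         # mean stability of each pattern
    V = ((1 - m**2) * X**2).sum(axis=1) + 1e-9
    s = M / np.sqrt(V)
    phi = np.exp(-0.5 * s**2) / np.sqrt(2 * np.pi)
    # gradient of sum_mu Phi(-s_mu) with respect to m, via the chain rule
    ds_dm = (y[:, None] * X) / np.sqrt(V)[:, None] \
            + (M / V**1.5)[:, None] * (m[None, :] * X**2)
    grad_m = -(phi[:, None] * ds_dm).sum(axis=0)
    h -= lr * grad_m * (1 - m**2)           # d m / d h = 1 - m^2

w = np.sign(h)                              # final binary assignment
print("binary training errors:", int(np.sum(np.sign(X @ w) != y)), "/", P)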
Affiliation(s)
- Carlo Baldassi
- Bocconi Institute for Data Science and Analytics, Bocconi University, Milano 20136, Italy
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- Istituto Nazionale di Fisica Nucleare, Sezione di Torino, Torino 10129, Italy
- Federica Gerace
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Hilbert J Kappen
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, 6525 EZ Nijmegen, Netherlands
- Carlo Lucibello
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Luca Saglietti
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Enzo Tartaglione
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Riccardo Zecchina
- Bocconi Institute for Data Science and Analytics, Bocconi University, Milano 20136, Italy
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- International Centre for Theoretical Physics, Trieste 34151, Italy
3
Abstract
Quantum annealers aim at solving nonconvex optimization problems by exploiting cooperative tunneling effects to escape local minima. The underlying idea consists of designing a classical energy function whose ground states are the sought optimal solutions of the original optimization problem and adding a controllable quantum transverse field to generate tunneling processes. A key challenge is to identify classes of nonconvex optimization problems for which quantum annealing remains efficient while thermal annealing fails. We show that this happens for a wide class of problems which are central to machine learning. Their energy landscapes are dominated by local minima that cause exponential slowdown of classical thermal annealers, while simulated quantum annealing converges efficiently to rare dense regions of optimal solutions.
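For concreteness, here is a hedged sketch of simulated quantum annealing via the standard path-integral (Trotter) mapping applied to a binary perceptron, the problem class discussed above; schedules and sizes are illustrative choices of ours:

```python
# Hedged sketch: simulated quantum annealing via the standard path-integral
# (Trotter) mapping for a binary perceptron. Parameters and the annealing
# schedule are illustrative, not the paper's settings.
import numpy as np

rng = np.random.default_rng(2)
N, Ppat, Ptrot, beta, sweeps = 101, 30, 16, 4.0, 200
X = rng.choice([-1, 1], size=(Ppat, N))
y = rng.choice([-1, 1], size=Ppat)

S = rng.choice([-1, 1], size=(Ptrot, N))    # Trotter replicas of the weights
F = S @ X.T                                 # cached pattern fields per slice

for t in range(sweeps):
    Gamma = 2.5 * (1.0 - t / sweeps) + 1e-3               # transverse field -> 0
    K = 0.5 * np.log(1.0 / np.tanh(beta * Gamma / Ptrot))  # inter-slice coupling
    for _ in range(N * Ptrot):
        k, i = rng.integers(Ptrot), rng.integers(N)
        dF = -2 * S[k, i] * X[:, i]
        dE = np.sum(y * (F[k] + dF) <= 0) - np.sum(y * F[k] <= 0)
        up, dn = S[(k + 1) % Ptrot, i], S[(k - 1) % Ptrot, i]
        dlogw = -(beta / Ptrot) * dE + K * (-2 * S[k, i] * (up + dn))
        if dlogw >= 0 or rng.random() < np.exp(dlogw):
            S[k, i] *= -1
            F[k] += dF

print("best slice errors:", min(int(np.sum(y * F[k] <= 0)) for k in range(Ptrot)))
```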
4
Baldassi C, Gerace F, Lucibello C, Saglietti L, Zecchina R. Learning may need only a few bits of synaptic precision. Physical Review E 2016; 93:052313. [PMID: 27300916] [DOI: 10.1103/physreve.93.052313]
Abstract
Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by hardware implementation considerations as well. In this paper we extend a previous large-deviations analysis which unveiled the existence of peculiar dense regions in the space of synaptic states which account for the possibility of learning efficiently in networks with binary synapses. We extend the analysis to synapses with multiple states and generally more plausible biological features. The results clearly indicate that the overall qualitative picture is unchanged with respect to the binary case and very robust to variations of the details of the model. We also provide quantitative results which suggest that the advantages of increasing the synaptic precision (i.e., the number of internal synaptic states) rapidly vanish after the first few bits, and therefore that, for practical applications, only a few bits may be needed for near-optimal performance, consistent with recent biological findings. Finally, we demonstrate how the theoretical analysis can be exploited to design efficient algorithmic search strategies.
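The "few bits suffice" message can be checked with a toy experiment (our own construction, not the paper's analysis): train a continuous perceptron, quantize its weights to roughly b-bit synapses, and watch the accuracy saturate after the first few bits:

```python
# Illustrative check (our construction, not the paper's): train a continuous
# perceptron on random patterns, quantize the weights to ~b-bit synapses,
# and watch training accuracy saturate after the first few bits.
import numpy as np

rng = np.random.default_rng(3)
N, P = 200, 150
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)

w = np.zeros(N)
for _ in range(200):                        # plain perceptron learning rule
    for mu in range(P):
        if y[mu] * (X[mu] @ w) <= 0:
            w += y[mu] * X[mu]

for b in range(1, 7):
    L = 2 ** (b - 1)                        # symmetric integer levels -L..L
    q = np.clip(np.round(w / np.abs(w).max() * L), -L, L)
    acc = np.mean(np.sign(X @ q) == y)
    print(f"{b}-bit synapses: training accuracy {acc:.3f}")
```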
Affiliation(s)
- Carlo Baldassi
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Federica Gerace
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Carlo Lucibello
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Luca Saglietti
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Riccardo Zecchina
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Collegio Carlo Alberto, Via Real Collegio 30, I-10024 Moncalieri, Italy
5
Baldassi C, Ingrosso A, Lucibello C, Saglietti L, Zecchina R. Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses. Physical Review Letters 2015; 115:128101. [PMID: 26431018] [DOI: 10.1103/physrevlett.115.128101]
Abstract
We show that discrete synaptic weights can be efficiently used for learning in large-scale neural systems and lead to unanticipated computational performance. We focus on the representative case of learning random patterns with binary synapses in single-layer networks. The standard statistical analysis shows that this problem is exponentially dominated by isolated solutions that are extremely hard to find algorithmically. Here, we introduce a novel method that allows us to find analytical evidence for the existence of subdominant and extremely dense regions of solutions. Numerical experiments confirm these findings. We also show that the dense regions are surprisingly accessible by simple learning protocols, and that these synaptic configurations are robust to perturbations and generalize better than typical solutions. These outcomes extend to synapses with multiple states and to deeper neural architectures. The large-deviation measure also suggests how to design novel algorithmic schemes for optimization based on local entropy maximization.
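A crude numerical probe of solution density, our own proxy rather than the paper's large-deviation measure, is to flip random subsets of weights around a known solution and record how often the perturbed vector still solves the problem:

```python
# Rough numerical probe (our proxy, not the paper's large-deviation measure):
# around a known solution, flip random subsets of d weights and record how
# often the perturbed vector still solves the problem. Solutions inside dense
# regions keep solving up to much larger d than isolated ones.
import numpy as np

rng = np.random.default_rng(4)
N, P = 101, 20
X = rng.choice([-1, 1], size=(P, N))
w_t = rng.choice([-1, 1], size=N)
y = np.sign(X @ w_t)                        # teacher labels: w_t is a solution

def local_density(w, d, trials=2000):
    hits = 0
    for _ in range(trials):
        idx = rng.choice(N, size=d, replace=False)
        v = w.copy(); v[idx] *= -1
        hits += bool(np.all(y * (X @ v) > 0))
    return hits / trials

for d in (1, 3, 5, 10, 20):
    print(f"d={d}: fraction of flips still solving = {local_density(w_t, d):.3f}")
```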
Affiliation(s)
- Carlo Baldassi
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Alessandro Ingrosso
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Carlo Lucibello
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Luca Saglietti
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Riccardo Zecchina
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Collegio Carlo Alberto, Via Real Collegio 30, I-10024 Moncalieri, Italy
6
Huang H, Kabashima Y. Origin of the computational hardness for learning with binary synapses. Physical Review E 2014; 90:052813. [PMID: 25493840] [DOI: 10.1103/physreve.90.052813]
Abstract
Through supervised learning in a binary perceptron, one is able to classify an extensive number of random patterns by a proper assignment of binary synaptic weights. However, finding such assignments in practice is quite a nontrivial task. The relation between the weight-space structure and algorithmic hardness has not yet been fully understood. To this end, we analytically derive the Franz-Parisi potential for the binary perceptron problem, starting from an equilibrium solution of weights and exploring the weight-space structure around it. Our result reveals the geometrical organization of the weight space: it is composed of isolated solutions, rather than clusters of exponentially many close-by solutions. These pointlike clusters, far apart from each other in the weight space, explain the previously observed glassy behavior of stochastic local-search heuristics.
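The isolation picture can be illustrated by brute force at tiny sizes (our toy check, not the Franz-Parisi computation): enumerate all binary weight vectors, keep the solutions, and inspect nearest-neighbour Hamming distances:

```python
# Tiny-N toy check of the geometry (not the Franz-Parisi computation):
# enumerate all binary weight vectors of a small perceptron, collect the
# solutions, and look at each solution's nearest-neighbour Hamming distance.
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
N, P = 15, 10
X = rng.choice([-1, 1], size=(P, N))
y = rng.choice([-1, 1], size=P)

W = np.array(list(product([-1, 1], repeat=N)))       # all 2^15 weight vectors
sols = W[np.all(y * (W @ X.T) > 0, axis=1)]
print("number of solutions:", len(sols))
if len(sols) > 1:
    dist = (N - sols @ sols.T) / 2                   # pairwise Hamming distances
    np.fill_diagonal(dist, N)                        # mask self-distances
    print("median nearest-neighbour distance:", np.median(dist.min(axis=1)))
```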
Affiliation(s)
- Haiping Huang
- Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama 226-8502, Japan
- Yoshiyuki Kabashima
- Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama 226-8502, Japan
7
Alamino RC, Neirotti JP, Saad D. Replication-based inference algorithms for hard computational problems. Physical Review E 2013; 88:013313. [PMID: 23944589] [DOI: 10.1103/physreve.88.013313]
Abstract
Inference algorithms based on evolving interactions between replicated solutions are introduced and analyzed on a prototypical NP-hard problem: the capacity of the binary Ising perceptron. The efficiency of the algorithm is examined numerically against that of the parallel tempering algorithm, showing improved performance in terms of the results obtained, computing requirements, and simplicity of implementation.
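A generic sketch of a replication-based local search follows; the specific attraction of each replica toward the replicas' barycenter is our assumption, not necessarily the authors' interaction scheme:

```python
# Generic sketch of replication-based search (the coupling form is our
# assumption, not necessarily the authors' scheme): R coupled Metropolis
# walkers on the binary-perceptron energy, each attracted to the replicas'
# barycenter, which biases the ensemble toward wide low-energy regions.
import numpy as np

rng = np.random.default_rng(6)
N, P, R, beta, gamma, sweeps = 101, 40, 6, 2.0, 0.5, 200
X = rng.choice([-1, 1], size=(P, N))
y = rng.choice([-1, 1], size=P)
W = rng.choice([-1, 1], size=(R, N))        # R replicas of the weights
F = W @ X.T                                 # cached pattern fields

for _ in range(sweeps):
    m = W.mean(axis=0)                      # barycenter, frozen within a sweep
    for a in range(R):
        for _ in range(N):
            i = rng.integers(N)
            dF = -2 * W[a, i] * X[:, i]
            dE = np.sum(y * (F[a] + dF) <= 0) - np.sum(y * F[a] <= 0)
            d_coup = -2 * W[a, i] * m[i]    # change in the w.m attraction
            if rng.random() < np.exp(-beta * dE + gamma * d_coup):
                W[a, i] *= -1
                F[a] += dF

print("replica energies:", [int(np.sum(y * F[a] <= 0)) for a in range(R)])
```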
Affiliation(s)
- Roberto C Alamino
- Non-linearity and Complexity Research Group, Aston University, Birmingham B4 7ET, United Kingdom
9
Braunstein A, Zecchina R. Learning by message passing in networks of discrete synapses. Physical Review Letters 2006; 96:030201. [PMID: 16486667] [DOI: 10.1103/physrevlett.96.030201]
Abstract
We show that a message-passing process allows us to store in binary "material" synapses a number of random patterns that almost saturates the information-theoretic bounds. We apply the learning algorithm to networks characterized by a wide range of different connection topologies and of size comparable with that of biological systems (e.g., [EQUATION: SEE TEXT]). The algorithm can be turned into an online, fault-tolerant learning protocol of potential interest in modeling aspects of synaptic plasticity and in building neuromorphic devices.
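A simplified dense-graph variant of such message passing for the binary perceptron can be written in a few lines; the Gaussian approximation of the factor messages is standard, while damping and sizes are illustrative assumptions of ours:

```python
# Simplified dense-graph sketch of message-passing learning for a binary
# perceptron (Gaussian approximation for factor messages; damping and sizes
# are our illustrative choices, not the paper's construction).
import math
import numpy as np

erf = np.vectorize(math.erf)
def Phi(x):                                 # standard normal CDF, clipped
    return np.clip(0.5 * (1.0 + erf(x / np.sqrt(2.0))), 1e-12, 1.0)

rng = np.random.default_rng(7)
N, P, iters, damp = 101, 40, 200, 0.5
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)
u = np.zeros((P, N))                        # factor -> variable cavity fields

for _ in range(iters):
    h = u.sum(axis=0)[None, :] - u          # variable -> factor cavity fields
    m = np.tanh(h)
    M = (X * m).sum(axis=1, keepdims=True) - X * m        # cavity means
    V = ((1 - m**2) * X**2).sum(axis=1, keepdims=True) - (1 - m**2) * X**2
    sV = np.sqrt(np.maximum(V, 1e-9))
    yc = y[:, None]
    u_new = 0.5 * (np.log(Phi(yc * (M + X) / sV)) - np.log(Phi(yc * (M - X) / sV)))
    u = damp * u + (1.0 - damp) * u_new     # damped update for stability

w = np.sign(u.sum(axis=0))
print("training errors:", int(np.sum(np.sign(X @ w) != y)), "/", P)
```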
10
Xiong YS, Saad D. Noise, regularizers, and unrealizable scenarios in online learning from restricted training sets. Physical Review E 2001; 64:011919. [PMID: 11461300] [DOI: 10.1103/physreve.64.011919]
Abstract
We study the dynamics of online learning in multilayer neural networks where training examples are sampled with repetition and where the number of examples scales with the number of network weights. The analysis is carried out using the dynamical replica method aimed at obtaining a closed set of coupled equations for a set of macroscopic variables from which both training and generalization errors can be calculated. We focus on scenarios whereby training examples are corrupted by additive Gaussian output noise and regularizers are introduced to improve the network performance. The dependence of the dynamics on the noise level, with and without regularizers, is examined, as well as that of the asymptotic values obtained for both training and generalization errors. We also demonstrate the ability of the method to approximate the learning dynamics in structurally unrealizable scenarios. The theoretical results show good agreement with those obtained from computer simulations.
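The setting is easy to reenact numerically on a toy linear student (our illustrative parameters, not the dynamical-replica computation): a fixed restricted set of noisy teacher examples, sampled with repetition, learned online with weight decay:

```python
# Toy reenactment (illustrative parameters, ours): online learning of a linear
# student from a fixed restricted set of noisy teacher examples, sampled with
# repetition, with an L2 regularizer implemented as weight decay.
import numpy as np

rng = np.random.default_rng(8)
N, alpha, steps, lr, noise, decay = 100, 2.0, 20000, 0.05, 0.3, 0.01
B = rng.normal(size=N); B *= np.sqrt(N) / np.linalg.norm(B)   # teacher, |B|^2 = N
P = int(alpha * N)
X = rng.normal(size=(P, N)) / np.sqrt(N)    # restricted training set
y = X @ B + noise * rng.normal(size=P)      # outputs corrupted by Gaussian noise

w = np.zeros(N)
for _ in range(steps):
    mu = rng.integers(P)                    # sampling with repetition
    w += lr * (y[mu] - X[mu] @ w) * X[mu] - lr * decay * w

E_train = 0.5 * np.mean((y - X @ w) ** 2)
E_gen = 0.5 * np.sum((w - B) ** 2) / N      # exact for this linear setup
print(f"training error {E_train:.4f}, generalization error {E_gen:.4f}")
```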
Affiliation(s)
- Y S Xiong
- The Neural Computing Research Group, Aston University, Birmingham B4 7ET, United Kingdom
11
Botelho E, Mattos CR, Caticha N. Variational studies and replica symmetry breaking in the generalization problem of the binary perceptron. Physical Review E 2000; 62:6999-7007. [PMID: 11102056] [DOI: 10.1103/physreve.62.6999]
Abstract
We analyze the average performance of a general class of learning algorithms for the NP-complete problem of rule extraction by a binary perceptron. The examples are generated by a rule implemented by a teacher network of similar architecture. A variational approach is used in trying to identify the potential energy that leads to the largest generalization in the thermodynamic limit. We restrict our search to algorithms that always satisfy the binary constraints. A replica-symmetric ansatz leads to a learning algorithm which presents a phase transition in violation of an information-theoretic bound. Stability analysis shows that this is due to a failure of the replica-symmetric ansatz, and the first step of replica symmetry breaking (RSB) is studied. The variational method does not determine a unique potential, but it allows construction of a class with a unique minimum within each first-order valley. Members of this class improve on the performance of the Gibbs algorithm but fail to reach the Bayesian limit in the low-generalization phase. They even fail to reach the performance of the best binary vector, an optimal clipping of the barycenter of version space. We find a trade-off between good performance in the low-generalization phase and an early onset of perfect generalization. Although the RSB may be locally stable, we discuss the possibility that it fails to be the correct saddle point globally.
Affiliation(s)
- E Botelho
- Instituto de Física da Universidade de São Paulo, Caixa Postal 66318, São Paulo, SP 05315-970, Brazil
12
Coolen AC, Saad D. Dynamics of learning with restricted training sets. Physical Review E 2000; 62:5444-87. [PMID: 11089107] [DOI: 10.1103/physreve.62.5444]
Abstract
We study the dynamics of supervised learning in layered neural networks, in the regime where the size p of the training set is proportional to the number N of inputs. Here the local fields are no longer described by Gaussian probability distributions and the learning dynamics is of a spin-glass nature, with the composition of the training set playing the role of quenched disorder. We show how dynamical replica theory can be used to predict the evolution of macroscopic observables, including the two relevant performance measures (training error and generalization error), incorporating the old formalism developed for complete training sets in the limit α = p/N → ∞ as a special case. For simplicity, we restrict ourselves in this paper to single-layer networks and realizable tasks. In the case of (on-line and batch) Hebbian learning, where a direct exact solution is possible, we show that our theory provides exact results at any time in many different verifiable cases. For non-Hebbian learning rules, such as PERCEPTRON and ADATRON, we find very good agreement between the predictions of our theory and numerical simulations. Finally, we derive three approximation schemes aimed at eliminating the need to solve a functional saddle-point equation at each time step, and we assess their performance. The simplest of these schemes leads to a fully explicit and relatively simple nonlinear diffusion equation for the joint field distribution, which already describes the learning dynamics surprisingly well over a wide range of parameters.
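For the Hebbian case, where the paper gives exact results, a direct simulation is a useful sanity check; the sketch below (our sizes) measures training and generalization error of batch Hebbian learning at several values of α = p/N:

```python
# Sanity-check simulation for the Hebbian case (our sizes): batch Hebbian
# weights from a restricted set of p = alpha*N teacher-labelled examples,
# with training and generalization errors measured directly.
import numpy as np

rng = np.random.default_rng(9)
N, n_test = 400, 4000
B = np.sign(rng.normal(size=N))             # teacher weights

for alpha in (0.5, 1.0, 2.0, 4.0):
    P = int(alpha * N)
    X = rng.choice([-1.0, 1.0], size=(P, N))
    y = np.sign(X @ B)
    w = X.T @ y                             # batch Hebb: w = sum_mu y_mu x_mu
    E_t = np.mean(np.sign(X @ w) != y)
    Xs = rng.choice([-1.0, 1.0], size=(n_test, N))
    E_g = np.mean(np.sign(Xs @ w) != np.sign(Xs @ B))
    print(f"alpha={alpha}: E_train={E_t:.3f}  E_gen={E_g:.3f}")
```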
Affiliation(s)
- AC Coolen
- Department of Mathematics, King's College London, Strand, London WC2R 2LS, United Kingdom
13
Wong KY, Li S, Tong YW. Many-body approach to the dynamics of batch learning. Physical Review E 2000; 62:4036-4042. [PMID: 11088927] [DOI: 10.1103/physreve.62.4036]
Abstract
Using the cavity method and diagrammatic methods, we model the dynamics of batch learning of restricted sets of examples, widely applicable to general learning cost functions, and fully taking into account the temporal correlations introduced by the recycling of the examples. The approach is illustrated using the Adaline rule, learning teacher-generated or random examples.
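A minimal version of that illustration, with parameters of our choosing: batch Adaline (LMS) learning, i.e. gradient descent on the quadratic cost over a fixed, recycled example set:

```python
# Minimal batch-Adaline illustration (our parameters): gradient descent on the
# quadratic cost over a fixed example set that is recycled every epoch.
import numpy as np

rng = np.random.default_rng(10)
N, alpha, epochs, lr = 200, 1.5, 100, 0.2
B = rng.normal(size=N)
P = int(alpha * N)
X = rng.normal(size=(P, N))
y = np.sign(X @ B)                          # teacher-generated labels
w = np.zeros(N)

for t in range(epochs):
    w -= lr * X.T @ (X @ w - y) / P         # full-batch Adaline (LMS) step
    if t % 20 == 0:
        print(f"epoch {t}: training cost {0.5 * np.mean((X @ w - y) ** 2):.4f}")
```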
Affiliation(s)
- KY Wong
- Department of Physics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
14
Malzahn D. Learning strategies for the maximally stable diluted binary perceptron. Physical Review E 2000; 61:6261-9. [PMID: 11088299] [DOI: 10.1103/physreve.61.6261]
Abstract
I show analytically that an optimally chosen continuous precursor J in the hypercube is highly correlated with the maximally stable diluted binary perceptron that solves the same storage problem. J allows the construction of a diluted binary perceptron D by a simple rule. Performing simulations for perceptrons of size N=100, I demonstrate that D is highly stable and can be improved efficiently by partial enumeration, thereby incorporating information from the precursor components. The precursor highlights the vector components for which partial enumeration improves the stability of the vector most efficiently. Moreover, for each vector component i it discriminates at least one of the three possible values D(i) = -1, 0, 1 as being extremely unlikely.
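The clipping construction can be sketched as follows; the perceptron-rule precursor and the dilution threshold are simple stand-ins of ours for the optimized precursor in the paper:

```python
# Sketch of the clipping construction (perceptron-rule precursor and threshold
# are our stand-ins for the optimized precursor in the paper): train a
# continuous J, then set D_i = sign(J_i) for large |J_i| and D_i = 0 otherwise.
import numpy as np

rng = np.random.default_rng(11)
N, P = 100, 60
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)

J = np.zeros(N)
for _ in range(500):                        # cheap continuous precursor
    for mu in range(P):
        if y[mu] * (X[mu] @ J) <= 0:
            J += y[mu] * X[mu]

theta = 0.5 * np.abs(J).mean()              # dilution threshold (assumption)
D = np.where(np.abs(J) > theta, np.sign(J), 0.0)
print("nonzero components:", int(np.sum(D != 0)), "/", N)
print("diluted binary training errors:", int(np.sum(np.sign(X @ D) != y)), "/", P)
```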
Affiliation(s)
- D Malzahn
- Institut für Theoretische Physik, Otto-von-Guericke-Universität, Postfach 4120, D-39016 Magdeburg, Germany
15
Nieuwenhuizen TM. Thermodynamic picture of the glassy state gained from exactly solvable models. Physical Review E 2000; 61:267-292. [PMID: 11046265] [DOI: 10.1103/physreve.61.267]
Abstract
A picture for thermodynamics of the glassy state was introduced recently by us [Phys. Rev. Lett. 79, 1317 (1997); 80, 5580 (1998)]. It starts by assuming that one extra parameter, the effective temperature, is needed to describe the glassy state. This approach connects responses of macroscopic observables to a field change with their temporal fluctuations, and with the fluctuation-dissipation relation, in a generalized, nonequilibrium way. Similar universal relations do not hold between energy fluctuations and the specific heat. In the present paper, the underlying arguments are discussed in greater length. The main part of the paper involves details of the exact dynamical solution of two simple models introduced recently: uncoupled harmonic oscillators subject to parallel Monte Carlo dynamics, and independent spherical spins in a random field with such dynamics. At low temperature, the relaxation time of both models diverges as an Arrhenius law, which causes glassy behavior in typical situations. In the glassy regime, we are able to verify the above-mentioned relations for the thermodynamics of the glassy state. In the course of the analysis, it is argued that stretched exponential behavior is not a fundamental property of the glassy state, though it may be useful for fitting in a limited parameter regime.
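The first of the two models is easy to simulate; the sketch below (our parameters) updates all oscillators in parallel and accepts or rejects the global move with a single Metropolis test, which is what produces the activated, glassy slowdown at low temperature:

```python
# Toy version of the first model (parameters ours): N uncoupled harmonic
# oscillators with *parallel* Monte Carlo dynamics, i.e. all coordinates are
# shifted at once and the global move passes a single Metropolis test.
import numpy as np

rng = np.random.default_rng(12)
N, delta, steps = 1000, 0.1, 20000
x0 = rng.normal(size=N)                     # start equilibrated at T = 1

for T in (1.0, 0.1, 0.02):
    x = x0.copy()
    for _ in range(steps):
        prop = x + delta * rng.normal(size=N)          # parallel move
        dE = 0.5 * (np.sum(prop**2) - np.sum(x**2))
        if dE <= 0 or rng.random() < np.exp(-dE / T):  # single global test
            x = prop
    print(f"T={T}: energy/oscillator {np.sum(x**2) / (2 * N):.4f}"
          f" (equilibrium {T / 2:.4f})")
```

At low T the typical proposed change ΔE ≈ Nδ²/2 stays fixed while the thermal scale shrinks, so acceptances become exponentially rare, giving the Arrhenius-like freezing discussed in the abstract.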
Affiliation(s)
- TM Nieuwenhuizen
- Department of Physics and Astronomy, University of Amsterdam, Valckenierstraat 65, 1018 XE Amsterdam, The Netherlands
20
Cule D, Shapir Y. Broken ergodicity in the self-consistent dynamics of the two-dimensional random sine-Gordon model. Physical Review E 1996; 53:1553-1565. [PMID: 9964417] [DOI: 10.1103/physreve.53.1553]
21
Cannas SA, Stariolo D, Tamarit FA. Learning dynamics of simple perceptrons with non-extensive cost functions. Network: Computation in Neural Systems 1996; 7:141-149. [PMID: 29480149] [DOI: 10.1080/0954898x.1996.11978659]
Abstract
A Tsallis-statistics-based generalization of the gradient descent dynamics (using non-extensive cost functions), recently introduced by one of us, is proposed as a learning rule in a simple perceptron. The resulting Langevin equations are solved numerically for different values of an index q (q = 1 and q ≠ 1 correspond to the extensive and non-extensive cases, respectively) and for different cost functions. The results are compared with the learning curve (mean error versus time) obtained from a learning experiment carried out with human beings, showing excellent agreement for values of q slightly above unity. This fact illustrates the possible importance of including some degree of non-locality (non-extensivity) in computational learning procedures whenever one wants to mimic human behaviour.
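Heavily hedged sketch: we assume the non-extensive deformation enters as a q·E^(q-1) prefactor on the gradient force (i.e., descending E^q instead of E); the paper's exact non-extensive cost functions may differ:

```python
# Heavily hedged sketch: we assume the non-extensive deformation acts by
# replacing the cost E with E^q, giving a q*E^(q-1) prefactor on the force;
# the paper's exact non-extensive cost functions may differ.
import numpy as np

rng = np.random.default_rng(13)
N, P, steps, lr = 50, 25, 3000, 0.05
X = rng.normal(size=(P, N))
B = rng.normal(size=N)
y = np.sign(X @ B)                          # teacher-generated labels

for q in (1.0, 1.2):
    w = 0.1 * rng.normal(size=N)
    for _ in range(steps):
        r = X @ w - y
        E = 0.5 * np.mean(r**2) + 1e-12     # extensive (q = 1) cost
        grad = X.T @ r / P
        w -= lr * q * E ** (q - 1) * grad   # gradient of E^q
    print(f"q={q}: final cost {0.5 * np.mean((X @ w - y) ** 2):.5f}")
```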
Affiliation(s)
- S A Cannas
- Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Haya de la Torre y Medina Allende S/N, Ciudad Universitaria, 5000 Córdoba, Argentina
- D Stariolo
- Centro Brasileiro de Pesquisas Físicas, Rua Xavier Sigaud 150, 22290-180, Rio de Janeiro, Brazil
- F A Tamarit
- Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Haya de la Torre y Medina Allende S/N, Ciudad Universitaria, 5000 Córdoba, Argentina
- Centro Brasileiro de Pesquisas Físicas, Rua Xavier Sigaud 150, 22290-180, Rio de Janeiro, Brazil
22
Cule D. Dynamical properties of a growing surface on a random substrate. Physical Review E 1995; 52:R1-R4. [PMID: 9963541] [DOI: 10.1103/physreve.52.r1]
23
Nieuwenhuizen TM. To Maximize or Not to Maximize the Free Energy of Glassy Systems. Physical Review Letters 1995; 74:3463-3466. [PMID: 10058207] [DOI: 10.1103/physrevlett.74.3463]
24
Cule D, Shapir Y. Nonergodic dynamics of the two-dimensional random-phase sine-Gordon model: Applications to vortex-glass arrays and disordered-substrate surfaces. Physical Review B 1995; 51:3305-3308. [PMID: 9979135] [DOI: 10.1103/physrevb.51.3305]
25
Steffan H, Kühn R. Replica symmetry breaking in attractor neural network models. Zeitschrift für Physik B Condensed Matter 1994. [DOI: 10.1007/bf01312198]