1
Abstract
Deep neural networks (DNNs) are becoming fundamental learning devices for extracting information from data in a variety of real-world applications and in the natural and social sciences. The learning process in a DNN consists of finding a minimizer of a loss function that measures how well the data are classified. This optimization task is typically solved by tuning millions of parameters with stochastic gradient algorithms, a process that can be thought of as the exploration of a highly nonconvex landscape. Here we show that such landscapes possess very peculiar wide flat minima and that the current models have been shaped to make the loss functions and the algorithms focus on those minima. We also derive efficient algorithmic solutions.

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex one- and two-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD with the cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian, and their generalization performance on real data.
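The closing algorithmic observation lends itself to a compact illustration. Below is a minimal sketch, with illustrative sizes and schedule (our assumptions, not the paper's protocol), of SGD on the cross-entropy loss for a single perceptron storing random patterns while the weight norm is slowly reduced:

```python
# Minimal sketch (illustrative sizes/schedule, not the paper's protocol):
# SGD on the cross-entropy (logistic) loss for a perceptron storing random
# +/-1 patterns, while the weight norm is slowly reduced.
import numpy as np

rng = np.random.default_rng(0)
N, P, epochs, lr = 200, 120, 500, 0.05
X = rng.choice([-1.0, 1.0], size=(P, N))   # random input patterns
y = rng.choice([-1.0, 1.0], size=P)        # random labels
w = rng.normal(0.0, 1.0 / np.sqrt(N), size=N)

for t in range(epochs):
    for mu in rng.permutation(P):          # stochastic single-pattern updates
        margin = y[mu] * (X[mu] @ w)
        # logistic-loss gradient: -y * x * sigmoid(-y * w.x)
        w += lr * y[mu] * X[mu] / (1.0 + np.exp(margin))
    target = np.sqrt(N) * (1.0 - 0.5 * t / epochs)   # slowly shrinking norm
    w *= target / np.linalg.norm(w)

print("training errors:", int(np.sum(np.sign(X @ w) != y)), "/", P)
```

Only the direction of w matters for the error count, so the norm schedule changes which minima the cross-entropy loss prefers, not the classification rule itself.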
2
Baldassi C, Gerace F, Kappen HJ, Lucibello C, Saglietti L, Tartaglione E, Zecchina R. Role of Synaptic Stochasticity in Training Low-Precision Neural Networks. Physical Review Letters 2018; 120:268103. [PMID: 30004730] [DOI: 10.1103/physrevlett.120.268103]
Abstract
Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses. Both analytical and numerical results are presented. An algorithmic extension that makes it possible to train discrete deep neural networks is also investigated.
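The abstract's core construction, gradient descent on real values parametrizing a distribution over binary synapses, can be sketched as follows; the Gaussian (central-limit) treatment of the output field and all parameters are our assumptions for illustration, not the paper's exact update:

```python
# Hedged sketch of the general idea (not the paper's exact procedure): each
# binary weight w_i = +/-1 gets a magnetization m_i = tanh(h_i); the output
# field is treated as Gaussian (central limit), and we descend the expected
# misclassification probability with respect to the real parameters h.
import numpy as np

rng = np.random.default_rng(1)
N, P, steps, lr = 101, 40, 2000, 0.1
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)
h = 0.1 * rng.normal(size=N)                # real parameters over binary weights

for _ in range(steps):
    m = np.tanh(h)
    M = y * (X @ m)                         # mean stability of each pattern
    V = ((1 - m**2) * X**2).sum(axis=1) + 1e-9
    s = M / np.sqrt(V)
    phi = np.exp(-0.5 * s**2) / np.sqrt(2 * np.pi)
    # gradient of sum_mu Phi(-s_mu) with respect to m, via the chain rule
    ds_dm = (y[:, None] * X) / np.sqrt(V)[:, None] \
            + (M / V**1.5)[:, None] * (m[None, :] * X**2)
    grad_m = -(phi[:, None] * ds_dm).sum(axis=0)
    h -= lr * grad_m * (1 - m**2)           # d m / d h = 1 - m^2

w = np.sign(h)                              # final binary assignment
print("binary training errors:", int(np.sum(np.sign(X @ w) != y)), "/", P)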
Affiliation(s)
- Carlo Baldassi
- Bocconi Institute for Data Science and Analytics, Bocconi University, Milano 20136, Italy
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- Istituto Nazionale di Fisica Nucleare, Sezione di Torino, Torino 10129, Italy
- Federica Gerace
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Hilbert J Kappen
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, 6525 EZ Nijmegen, Netherlands
- Carlo Lucibello
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Luca Saglietti
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Enzo Tartaglione
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Riccardo Zecchina
- Bocconi Institute for Data Science and Analytics, Bocconi University, Milano 20136, Italy
- Italian Institute for Genomic Medicine, Torino 10126, Italy
- International Centre for Theoretical Physics, Trieste 34151, Italy
3
Abstract
Quantum annealers aim at solving nonconvex optimization problems by exploiting cooperative tunneling effects to escape local minima. The underlying idea consists of designing a classical energy function whose ground states are the sought optimal solutions of the original optimization problem and adding a controllable quantum transverse field to generate tunneling processes. A key challenge is to identify classes of nonconvex optimization problems for which quantum annealing remains efficient while thermal annealing fails. We show that this happens for a wide class of problems which are central to machine learning. Their energy landscapes are dominated by local minima that cause exponential slowdown of classical thermal annealers, while simulated quantum annealing converges efficiently to rare dense regions of optimal solutions.
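For concreteness, here is a hedged sketch of simulated quantum annealing via the standard path-integral (Trotter) mapping applied to a binary perceptron, the problem class discussed above; schedules and sizes are illustrative choices of ours:

```python
# Hedged sketch: simulated quantum annealing via the standard path-integral
# (Trotter) mapping for a binary perceptron. Parameters and the annealing
# schedule are illustrative, not the paper's settings.
import numpy as np

rng = np.random.default_rng(2)
N, Ppat, Ptrot, beta, sweeps = 101, 30, 16, 4.0, 200
X = rng.choice([-1, 1], size=(Ppat, N))
y = rng.choice([-1, 1], size=Ppat)

S = rng.choice([-1, 1], size=(Ptrot, N))    # Trotter replicas of the weights
F = S @ X.T                                 # cached pattern fields per slice

for t in range(sweeps):
    Gamma = 2.5 * (1.0 - t / sweeps) + 1e-3               # transverse field -> 0
    K = 0.5 * np.log(1.0 / np.tanh(beta * Gamma / Ptrot))  # inter-slice coupling
    for _ in range(N * Ptrot):
        k, i = rng.integers(Ptrot), rng.integers(N)
        dF = -2 * S[k, i] * X[:, i]
        dE = np.sum(y * (F[k] + dF) <= 0) - np.sum(y * F[k] <= 0)
        up, dn = S[(k + 1) % Ptrot, i], S[(k - 1) % Ptrot, i]
        dlogw = -(beta / Ptrot) * dE + K * (-2 * S[k, i] * (up + dn))
        if dlogw >= 0 or rng.random() < np.exp(dlogw):
            S[k, i] *= -1
            F[k] += dF

print("best slice errors:", min(int(np.sum(y * F[k] <= 0)) for k in range(Ptrot)))
```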
4
Baldassi C, Gerace F, Lucibello C, Saglietti L, Zecchina R. Learning may need only a few bits of synaptic precision. Physical Review E 2016; 93:052313. [PMID: 27300916] [DOI: 10.1103/physreve.93.052313]
Abstract
Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by hardware implementation considerations as well. In this paper we extend a previous large-deviations analysis which unveiled the existence of peculiar dense regions in the space of synaptic states which account for the possibility of learning efficiently in networks with binary synapses. We extend the analysis to synapses with multiple states and generally more plausible biological features. The results clearly indicate that the overall qualitative picture is unchanged with respect to the binary case and very robust to variations of the details of the model. We also provide quantitative results which suggest that the advantages of increasing the synaptic precision (i.e., the number of internal synaptic states) rapidly vanish after the first few bits, and therefore that, for practical applications, only a few bits may be needed for near-optimal performance, consistent with recent biological findings. Finally, we demonstrate how the theoretical analysis can be exploited to design efficient algorithmic search strategies.
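The "few bits suffice" message can be checked with a toy experiment (our own construction, not the paper's analysis): train a continuous perceptron, quantize its weights to roughly b-bit synapses, and watch the accuracy saturate after the first few bits:

```python
# Illustrative check (our construction, not the paper's): train a continuous
# perceptron on random patterns, quantize the weights to ~b-bit synapses,
# and watch training accuracy saturate after the first few bits.
import numpy as np

rng = np.random.default_rng(3)
N, P = 200, 150
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)

w = np.zeros(N)
for _ in range(200):                        # plain perceptron learning rule
    for mu in range(P):
        if y[mu] * (X[mu] @ w) <= 0:
            w += y[mu] * X[mu]

for b in range(1, 7):
    L = 2 ** (b - 1)                        # symmetric integer levels -L..L
    q = np.clip(np.round(w / np.abs(w).max() * L), -L, L)
    acc = np.mean(np.sign(X @ q) == y)
    print(f"{b}-bit synapses: training accuracy {acc:.3f}")
```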
Affiliation(s)
- Carlo Baldassi
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Federica Gerace
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Carlo Lucibello
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Luca Saglietti
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Riccardo Zecchina
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Collegio Carlo Alberto, Via Real Collegio 30, I-10024 Moncalieri, Italy
5
Baldassi C, Ingrosso A, Lucibello C, Saglietti L, Zecchina R. Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses. Physical Review Letters 2015; 115:128101. [PMID: 26431018] [DOI: 10.1103/physrevlett.115.128101]
Abstract
We show that discrete synaptic weights can be efficiently used for learning in large-scale neural systems and lead to unanticipated computational performance. We focus on the representative case of learning random patterns with binary synapses in single-layer networks. The standard statistical analysis shows that this problem is exponentially dominated by isolated solutions that are extremely hard to find algorithmically. Here, we introduce a novel method that allows us to find analytical evidence for the existence of subdominant and extremely dense regions of solutions. Numerical experiments confirm these findings. We also show that the dense regions are surprisingly accessible by simple learning protocols, and that these synaptic configurations are robust to perturbations and generalize better than typical solutions. These outcomes extend to synapses with multiple states and to deeper neural architectures. The large-deviation measure also suggests how to design novel algorithmic schemes for optimization based on local entropy maximization.
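A crude numerical probe of solution density, our own proxy rather than the paper's large-deviation measure, is to flip random subsets of weights around a known solution and record how often the perturbed vector still solves the problem:

```python
# Rough numerical probe (our proxy, not the paper's large-deviation measure):
# around a known solution, flip random subsets of d weights and record how
# often the perturbed vector still solves the problem. Solutions inside dense
# regions keep solving up to much larger d than isolated ones.
import numpy as np

rng = np.random.default_rng(4)
N, P = 101, 20
X = rng.choice([-1, 1], size=(P, N))
w_t = rng.choice([-1, 1], size=N)
y = np.sign(X @ w_t)                        # teacher labels: w_t is a solution

def local_density(w, d, trials=2000):
    hits = 0
    for _ in range(trials):
        idx = rng.choice(N, size=d, replace=False)
        v = w.copy(); v[idx] *= -1
        hits += bool(np.all(y * (X @ v) > 0))
    return hits / trials

for d in (1, 3, 5, 10, 20):
    print(f"d={d}: fraction of flips still solving = {local_density(w_t, d):.3f}")
```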
Affiliation(s)
- Carlo Baldassi
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Alessandro Ingrosso
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Carlo Lucibello
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Luca Saglietti
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Riccardo Zecchina
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Collegio Carlo Alberto, Via Real Collegio 30, I-10024 Moncalieri, Italy
6
Huang H, Kabashima Y. Origin of the computational hardness for learning with binary synapses. Physical Review E 2014; 90:052813. [PMID: 25493840] [DOI: 10.1103/physreve.90.052813]
Abstract
Through supervised learning in a binary perceptron, one is able to classify an extensive number of random patterns by a proper assignment of binary synaptic weights. However, finding such assignments in practice is quite a nontrivial task. The relation between the weight-space structure and algorithmic hardness has not yet been fully understood. To this end, we analytically derive the Franz-Parisi potential for the binary perceptron problem, starting from an equilibrium solution of weights and exploring the weight-space structure around it. Our result reveals the geometrical organization of the weight space: it is composed of isolated solutions, rather than clusters of exponentially many close-by solutions. These pointlike clusters, far apart from each other in the weight space, explain the previously observed glassy behavior of stochastic local-search heuristics.
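The isolation picture can be illustrated by brute force at tiny sizes (our toy check, not the Franz-Parisi computation): enumerate all binary weight vectors, keep the solutions, and inspect nearest-neighbour Hamming distances:

```python
# Tiny-N toy check of the geometry (not the Franz-Parisi computation):
# enumerate all binary weight vectors of a small perceptron, collect the
# solutions, and look at each solution's nearest-neighbour Hamming distance.
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
N, P = 15, 10
X = rng.choice([-1, 1], size=(P, N))
y = rng.choice([-1, 1], size=P)

W = np.array(list(product([-1, 1], repeat=N)))       # all 2^15 weight vectors
sols = W[np.all(y * (W @ X.T) > 0, axis=1)]
print("number of solutions:", len(sols))
if len(sols) > 1:
    dist = (N - sols @ sols.T) / 2                   # pairwise Hamming distances
    np.fill_diagonal(dist, N)                        # mask self-distances
    print("median nearest-neighbour distance:", np.median(dist.min(axis=1)))
```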
Affiliation(s)
- Haiping Huang
- Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama 226-8502, Japan
- Yoshiyuki Kabashima
- Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama 226-8502, Japan
7
Alamino RC, Neirotti JP, Saad D. Replication-based inference algorithms for hard computational problems. Physical Review E 2013; 88:013313. [PMID: 23944589] [DOI: 10.1103/physreve.88.013313]
Abstract
Inference algorithms based on evolving interactions between replicated solutions are introduced and analyzed on a prototypical NP-hard problem: the capacity of the binary Ising perceptron. The efficiency of the algorithm is examined numerically against that of the parallel tempering algorithm, showing improved performance in terms of the results obtained, computing requirements, and simplicity of implementation.
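A generic sketch of a replication-based local search follows; the specific attraction of each replica toward the replicas' barycenter is our assumption, not necessarily the authors' interaction scheme:

```python
# Generic sketch of replication-based search (the coupling form is our
# assumption, not necessarily the authors' scheme): R coupled Metropolis
# walkers on the binary-perceptron energy, each attracted to the replicas'
# barycenter, which biases the ensemble toward wide low-energy regions.
import numpy as np

rng = np.random.default_rng(6)
N, P, R, beta, gamma, sweeps = 101, 40, 6, 2.0, 0.5, 200
X = rng.choice([-1, 1], size=(P, N))
y = rng.choice([-1, 1], size=P)
W = rng.choice([-1, 1], size=(R, N))        # R replicas of the weights
F = W @ X.T                                 # cached pattern fields

for _ in range(sweeps):
    m = W.mean(axis=0)                      # barycenter, frozen within a sweep
    for a in range(R):
        for _ in range(N):
            i = rng.integers(N)
            dF = -2 * W[a, i] * X[:, i]
            dE = np.sum(y * (F[a] + dF) <= 0) - np.sum(y * F[a] <= 0)
            d_coup = -2 * W[a, i] * m[i]    # change in the w.m attraction
            if rng.random() < np.exp(-beta * dE + gamma * d_coup):
                W[a, i] *= -1
                F[a] += dF

print("replica energies:", [int(np.sum(y * F[a] <= 0)) for a in range(R)])
```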
Affiliation(s)
- Roberto C Alamino
- Non-linearity and Complexity Research Group, Aston University, Birmingham B4 7ET, United Kingdom
9
Braunstein A, Zecchina R. Learning by message passing in networks of discrete synapses. Physical Review Letters 2006; 96:030201. [PMID: 16486667] [DOI: 10.1103/physrevlett.96.030201]
Abstract
We show that a message-passing process allows us to store in binary "material" synapses a number of random patterns that almost saturates the information-theoretic bounds. We apply the learning algorithm to networks characterized by a wide range of different connection topologies and of size comparable with that of biological systems (e.g., [EQUATION: SEE TEXT]). The algorithm can be turned into an online, fault-tolerant learning protocol of potential interest in modeling aspects of synaptic plasticity and in building neuromorphic devices.
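A simplified dense-graph variant of such message passing for the binary perceptron can be written in a few lines; the Gaussian approximation of the factor messages is standard, while damping and sizes are illustrative assumptions of ours:

```python
# Simplified dense-graph sketch of message-passing learning for a binary
# perceptron (Gaussian approximation for factor messages; damping and sizes
# are our illustrative choices, not the paper's construction).
import math
import numpy as np

erf = np.vectorize(math.erf)
def Phi(x):                                 # standard normal CDF, clipped
    return np.clip(0.5 * (1.0 + erf(x / np.sqrt(2.0))), 1e-12, 1.0)

rng = np.random.default_rng(7)
N, P, iters, damp = 101, 40, 200, 0.5
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)
u = np.zeros((P, N))                        # factor -> variable cavity fields

for _ in range(iters):
    h = u.sum(axis=0)[None, :] - u          # variable -> factor cavity fields
    m = np.tanh(h)
    M = (X * m).sum(axis=1, keepdims=True) - X * m        # cavity means
    V = ((1 - m**2) * X**2).sum(axis=1, keepdims=True) - (1 - m**2) * X**2
    sV = np.sqrt(np.maximum(V, 1e-9))
    yc = y[:, None]
    u_new = 0.5 * (np.log(Phi(yc * (M + X) / sV)) - np.log(Phi(yc * (M - X) / sV)))
    u = damp * u + (1.0 - damp) * u_new     # damped update for stability

w = np.sign(u.sum(axis=0))
print("training errors:", int(np.sum(np.sign(X @ w) != y)), "/", P)
```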
10
Xiong YS, Saad D. Noise, regularizers, and unrealizable scenarios in online learning from restricted training sets. Physical Review E 2001; 64:011919. [PMID: 11461300] [DOI: 10.1103/physreve.64.011919]
Abstract
We study the dynamics of online learning in multilayer neural networks where training examples are sampled with repetition and where the number of examples scales with the number of network weights. The analysis is carried out using the dynamical replica method aimed at obtaining a closed set of coupled equations for a set of macroscopic variables from which both training and generalization errors can be calculated. We focus on scenarios whereby training examples are corrupted by additive Gaussian output noise and regularizers are introduced to improve the network performance. The dependence of the dynamics on the noise level, with and without regularizers, is examined, as well as that of the asymptotic values obtained for both training and generalization errors. We also demonstrate the ability of the method to approximate the learning dynamics in structurally unrealizable scenarios. The theoretical results show good agreement with those obtained from computer simulations.
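The setting is easy to reenact numerically on a toy linear student (our illustrative parameters, not the dynamical-replica computation): a fixed restricted set of noisy teacher examples, sampled with repetition, learned online with weight decay:

```python
# Toy reenactment (illustrative parameters, ours): online learning of a linear
# student from a fixed restricted set of noisy teacher examples, sampled with
# repetition, with an L2 regularizer implemented as weight decay.
import numpy as np

rng = np.random.default_rng(8)
N, alpha, steps, lr, noise, decay = 100, 2.0, 20000, 0.05, 0.3, 0.01
B = rng.normal(size=N); B *= np.sqrt(N) / np.linalg.norm(B)   # teacher, |B|^2 = N
P = int(alpha * N)
X = rng.normal(size=(P, N)) / np.sqrt(N)    # restricted training set
y = X @ B + noise * rng.normal(size=P)      # outputs corrupted by Gaussian noise

w = np.zeros(N)
for _ in range(steps):
    mu = rng.integers(P)                    # sampling with repetition
    w += lr * (y[mu] - X[mu] @ w) * X[mu] - lr * decay * w

E_train = 0.5 * np.mean((y - X @ w) ** 2)
E_gen = 0.5 * np.sum((w - B) ** 2) / N      # exact for this linear setup
print(f"training error {E_train:.4f}, generalization error {E_gen:.4f}")
```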
Affiliation(s)
- Y S Xiong
- The Neural Computing Research Group, Aston University, Birmingham B4 7ET, United Kingdom
11
Botelho E, Mattos CR, Caticha N. Variational studies and replica symmetry breaking in the generalization problem of the binary perceptron. Physical Review E 2000; 62:6999-7007. [PMID: 11102056] [DOI: 10.1103/physreve.62.6999]
Abstract
We analyze the average performance of a general class of learning algorithms for the NP-complete problem of rule extraction by a binary perceptron. The examples are generated by a rule implemented by a teacher network of similar architecture. A variational approach is used in trying to identify the potential energy that leads to the largest generalization in the thermodynamic limit. We restrict our search to algorithms that always satisfy the binary constraints. A replica-symmetric ansatz leads to a learning algorithm which presents a phase transition in violation of an information-theoretic bound. Stability analysis shows that this is due to a failure of the replica-symmetric ansatz, and the first step of replica symmetry breaking (RSB) is studied. The variational method does not determine a unique potential, but it allows construction of a class with a unique minimum within each first-order valley. Members of this class improve on the performance of the Gibbs algorithm but fail to reach the Bayesian limit in the low-generalization phase. They even fail to reach the performance of the best binary vector, an optimal clipping of the barycenter of version space. We find a trade-off between good performance in the low-generalization phase and an early onset of perfect generalization. Although the RSB may be locally stable, we discuss the possibility that it fails to be the correct saddle point globally.
Affiliation(s)
- E Botelho
- Instituto de Física da Universidade de São Paulo, Caixa Postal 66318, São Paulo, SP 05315-970, Brazil
12
Coolen AC, Saad D. Dynamics of learning with restricted training sets. Physical Review E 2000; 62:5444-87. [PMID: 11089107] [DOI: 10.1103/physreve.62.5444]
Abstract
We study the dynamics of supervised learning in layered neural networks, in the regime where the size p of the training set is proportional to the number N of inputs. Here the local fields are no longer described by Gaussian probability distributions and the learning dynamics is of a spin-glass nature, with the composition of the training set playing the role of quenched disorder. We show how dynamical replica theory can be used to predict the evolution of macroscopic observables, including the two relevant performance measures (training error and generalization error), incorporating the old formalism developed for complete training sets in the limit α = p/N → ∞ as a special case. For simplicity, we restrict ourselves in this paper to single-layer networks and realizable tasks. In the case of (on-line and batch) Hebbian learning, where a direct exact solution is possible, we show that our theory provides exact results at any time in many different verifiable cases. For non-Hebbian learning rules, such as PERCEPTRON and ADATRON, we find very good agreement between the predictions of our theory and numerical simulations. Finally, we derive three approximation schemes aimed at eliminating the need to solve a functional saddle-point equation at each time step, and we assess their performance. The simplest of these schemes leads to a fully explicit and relatively simple nonlinear diffusion equation for the joint field distribution, which already describes the learning dynamics surprisingly well over a wide range of parameters.
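For the Hebbian case, where the paper gives exact results, a direct simulation is a useful sanity check; the sketch below (our sizes) measures training and generalization error of batch Hebbian learning at several values of α = p/N:

```python
# Sanity-check simulation for the Hebbian case (our sizes): batch Hebbian
# weights from a restricted set of p = alpha*N teacher-labelled examples,
# with training and generalization errors measured directly.
import numpy as np

rng = np.random.default_rng(9)
N, n_test = 400, 4000
B = np.sign(rng.normal(size=N))             # teacher weights

for alpha in (0.5, 1.0, 2.0, 4.0):
    P = int(alpha * N)
    X = rng.choice([-1.0, 1.0], size=(P, N))
    y = np.sign(X @ B)
    w = X.T @ y                             # batch Hebb: w = sum_mu y_mu x_mu
    E_t = np.mean(np.sign(X @ w) != y)
    Xs = rng.choice([-1.0, 1.0], size=(n_test, N))
    E_g = np.mean(np.sign(Xs @ w) != np.sign(Xs @ B))
    print(f"alpha={alpha}: E_train={E_t:.3f}  E_gen={E_g:.3f}")
```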
Affiliation(s)
- AC Coolen
- Department of Mathematics, King's College London, Strand, London WC2R 2LS, United Kingdom
13
Wong KY, Li S, Tong YW. Many-body approach to the dynamics of batch learning. Physical Review E 2000; 62:4036-4042. [PMID: 11088927] [DOI: 10.1103/physreve.62.4036]
Abstract
Using the cavity method and diagrammatic methods, we model the dynamics of batch learning of restricted sets of examples, widely applicable to general learning cost functions, and fully taking into account the temporal correlations introduced by the recycling of the examples. The approach is illustrated using the Adaline rule, learning teacher-generated or random examples.
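A minimal version of that illustration, with parameters of our choosing: batch Adaline (LMS) learning, i.e. gradient descent on the quadratic cost over a fixed, recycled example set:

```python
# Minimal batch-Adaline illustration (our parameters): gradient descent on the
# quadratic cost over a fixed example set that is recycled every epoch.
import numpy as np

rng = np.random.default_rng(10)
N, alpha, epochs, lr = 200, 1.5, 100, 0.2
B = rng.normal(size=N)
P = int(alpha * N)
X = rng.normal(size=(P, N))
y = np.sign(X @ B)                          # teacher-generated labels
w = np.zeros(N)

for t in range(epochs):
    w -= lr * X.T @ (X @ w - y) / P         # full-batch Adaline (LMS) step
    if t % 20 == 0:
        print(f"epoch {t}: training cost {0.5 * np.mean((X @ w - y) ** 2):.4f}")
```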
Affiliation(s)
- KY Wong
- Department of Physics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
14
Malzahn D. Learning strategies for the maximally stable diluted binary perceptron. Physical Review E 2000; 61:6261-9. [PMID: 11088299] [DOI: 10.1103/physreve.61.6261]
Abstract
I show analytically that an optimally chosen continuous precursor J in the hypercube is highly correlated with the maximally stable diluted binary perceptron that solves the same storage problem. J allows the construction of a diluted binary perceptron D by a simple rule. Performing simulations for perceptrons of size N=100, I demonstrate that D is highly stable and can be improved efficiently by partial enumeration, thereby incorporating information from the precursor components. The precursor highlights the vector components for which partial enumeration improves the stability of the vector most efficiently. Moreover, for each vector component i it discriminates at least one of the three possible values D(i) = -1, 0, 1 as being extremely unlikely.
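The clipping construction can be sketched as follows; the perceptron-rule precursor and the dilution threshold are simple stand-ins of ours for the optimized precursor in the paper:

```python
# Sketch of the clipping construction (perceptron-rule precursor and threshold
# are our stand-ins for the optimized precursor in the paper): train a
# continuous J, then set D_i = sign(J_i) for large |J_i| and D_i = 0 otherwise.
import numpy as np

rng = np.random.default_rng(11)
N, P = 100, 60
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)

J = np.zeros(N)
for _ in range(500):                        # cheap continuous precursor
    for mu in range(P):
        if y[mu] * (X[mu] @ J) <= 0:
            J += y[mu] * X[mu]

theta = 0.5 * np.abs(J).mean()              # dilution threshold (assumption)
D = np.where(np.abs(J) > theta, np.sign(J), 0.0)
print("nonzero components:", int(np.sum(D != 0)), "/", N)
print("diluted binary training errors:", int(np.sum(np.sign(X @ D) != y)), "/", P)
```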
Affiliation(s)
- D Malzahn
- Institut für Theoretische Physik, Otto-von-Guericke-Universität, Postfach 4120, D-39016 Magdeburg, Germany
15
Nieuwenhuizen TM. Thermodynamic picture of the glassy state gained from exactly solvable models. Physical Review E 2000; 61:267-292. [PMID: 11046265] [DOI: 10.1103/physreve.61.267]
Abstract
A picture for thermodynamics of the glassy state was introduced recently by us [Phys. Rev. Lett. 79, 1317 (1997); 80, 5580 (1998)]. It starts by assuming that one extra parameter, the effective temperature, is needed to describe the glassy state. This approach connects responses of macroscopic observables to a field change with their temporal fluctuations, and with the fluctuation-dissipation relation, in a generalized, nonequilibrium way. Similar universal relations do not hold between energy fluctuations and the specific heat. In the present paper, the underlying arguments are discussed in greater length. The main part of the paper involves details of the exact dynamical solution of two simple models introduced recently: uncoupled harmonic oscillators subject to parallel Monte Carlo dynamics, and independent spherical spins in a random field with such dynamics. At low temperature, the relaxation time of both models diverges as an Arrhenius law, which causes glassy behavior in typical situations. In the glassy regime, we are able to verify the above-mentioned relations for the thermodynamics of the glassy state. In the course of the analysis, it is argued that stretched exponential behavior is not a fundamental property of the glassy state, though it may be useful for fitting in a limited parameter regime.
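The first of the two models is easy to simulate; the sketch below (our parameters) updates all oscillators in parallel and accepts or rejects the global move with a single Metropolis test, which is what produces the activated, glassy slowdown at low temperature:

```python
# Toy version of the first model (parameters ours): N uncoupled harmonic
# oscillators with *parallel* Monte Carlo dynamics, i.e. all coordinates are
# shifted at once and the global move passes a single Metropolis test.
import numpy as np

rng = np.random.default_rng(12)
N, delta, steps = 1000, 0.1, 20000
x0 = rng.normal(size=N)                     # start equilibrated at T = 1

for T in (1.0, 0.1, 0.02):
    x = x0.copy()
    for _ in range(steps):
        prop = x + delta * rng.normal(size=N)          # parallel move
        dE = 0.5 * (np.sum(prop**2) - np.sum(x**2))
        if dE <= 0 or rng.random() < np.exp(-dE / T):  # single global test
            x = prop
    print(f"T={T}: energy/oscillator {np.sum(x**2) / (2 * N):.4f}"
          f" (equilibrium {T / 2:.4f})")
```

At low T the typical proposed change ΔE ≈ Nδ²/2 stays fixed while the thermal scale shrinks, so acceptances become exponentially rare, giving the Arrhenius-like freezing discussed in the abstract.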
Affiliation(s)
- TM Nieuwenhuizen
- Department of Physics and Astronomy, University of Amsterdam, Valckenierstraat 65, 1018 XE Amsterdam, The Netherlands
20
Cule D, Shapir Y. Broken ergodicity in the self-consistent dynamics of the two-dimensional random sine-Gordon model. Physical Review E 1996; 53:1553-1565. [PMID: 9964417] [DOI: 10.1103/physreve.53.1553]
21
Cannas SA, Stariolo D, Tamarit FA. Learning dynamics of simple perceptrons with non-extensive cost functions. Network: Computation in Neural Systems 1996; 7:141-149. [PMID: 29480149] [DOI: 10.1080/0954898x.1996.11978659]
Abstract
A Tsallis-statistics-based generalization of the gradient descent dynamics (using non-extensive cost functions), recently introduced by one of us, is proposed as a learning rule in a simple perceptron. The resulting Langevin equations are solved numerically for different values of an index q (q = 1 and q ≠ 1 correspond to the extensive and non-extensive cases, respectively) and for different cost functions. The results are compared with the learning curve (mean error versus time) obtained from a learning experiment carried out with human beings, showing excellent agreement for values of q slightly above unity. This fact illustrates the possible importance of including some degree of non-locality (non-extensivity) in computational learning procedures whenever one wants to mimic human behaviour.
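Heavily hedged sketch: we assume the non-extensive deformation enters as a q·E^(q-1) prefactor on the gradient force (i.e., descending E^q instead of E); the paper's exact non-extensive cost functions may differ:

```python
# Heavily hedged sketch: we assume the non-extensive deformation acts by
# replacing the cost E with E^q, giving a q*E^(q-1) prefactor on the force;
# the paper's exact non-extensive cost functions may differ.
import numpy as np

rng = np.random.default_rng(13)
N, P, steps, lr = 50, 25, 3000, 0.05
X = rng.normal(size=(P, N))
B = rng.normal(size=N)
y = np.sign(X @ B)                          # teacher-generated labels

for q in (1.0, 1.2):
    w = 0.1 * rng.normal(size=N)
    for _ in range(steps):
        r = X @ w - y
        E = 0.5 * np.mean(r**2) + 1e-12     # extensive (q = 1) cost
        grad = X.T @ r / P
        w -= lr * q * E ** (q - 1) * grad   # gradient of E^q
    print(f"q={q}: final cost {0.5 * np.mean((X @ w - y) ** 2):.5f}")
```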
Affiliation(s)
- S A Cannas
- Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Haya de la Torre y Medina Allende S/N, Ciudad Universitaria, 5000 Córdoba, Argentina
- D Stariolo
- Centro Brasileiro de Pesquisas Físicas, Rua Xavier Sigaud 150, 22290-180, Rio de Janeiro, Brazil
- F A Tamarit
- Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Haya de la Torre y Medina Allende S/N, Ciudad Universitaria, 5000 Córdoba, Argentina
- Centro Brasileiro de Pesquisas Físicas, Rua Xavier Sigaud 150, 22290-180, Rio de Janeiro, Brazil
22
Cule D. Dynamical properties of a growing surface on a random substrate. Physical Review E 1995; 52:R1-R4. [PMID: 9963541] [DOI: 10.1103/physreve.52.r1]
23
Nieuwenhuizen TM. To Maximize or Not to Maximize the Free Energy of Glassy Systems. Physical Review Letters 1995; 74:3463-3466. [PMID: 10058207] [DOI: 10.1103/physrevlett.74.3463]
24
Cule D, Shapir Y. Nonergodic dynamics of the two-dimensional random-phase sine-Gordon model: Applications to vortex-glass arrays and disordered-substrate surfaces. Physical Review B 1995; 51:3305-3308. [PMID: 9979135] [DOI: 10.1103/physrevb.51.3305]
25
Steffan H, Kühn R. Replica symmetry breaking in attractor neural network models. Zeitschrift für Physik B Condensed Matter 1994. [DOI: 10.1007/bf01312198]