1. Yue X, Nouiehed M, Al Kontar R. SALR: Sharpness-Aware Learning Rate Scheduler for Improved Generalization. IEEE Trans Neural Netw Learn Syst 2024; 35:12518-12527. PMID: 37027266. DOI: 10.1109/tnnls.2023.3263393.
Abstract
In an effort to improve generalization in deep learning and automate the process of learning rate scheduling, we propose SALR: a sharpness-aware learning rate update technique designed to recover flat minimizers. Our method dynamically updates the learning rate of gradient-based optimizers based on the local sharpness of the loss function. This allows optimizers to automatically increase learning rates at sharp valleys to increase the chance of escaping them. We demonstrate the effectiveness of SALR when adopted by various algorithms over a broad range of networks. Our experiments indicate that SALR improves generalization, converges faster, and drives solutions to significantly flatter regions.
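The idea above lends itself to a compact illustration. Below is a minimal sketch of a sharpness-aware learning-rate rule, assuming that local sharpness is approximated by the loss increase after a small step along the normalized gradient; the function names, toy loss, and scaling constants are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def sharpness_estimate(loss_fn, w, grad, eps=1e-2):
    """Approximate local sharpness as the loss increase per unit step
    along the normalized gradient direction (illustrative definition)."""
    g_norm = np.linalg.norm(grad) + 1e-12
    return (loss_fn(w + eps * grad / g_norm) - loss_fn(w)) / eps

def salr_step(loss_fn, grad_fn, w, base_lr=0.05, ref_sharpness=1.0):
    """One gradient step with a sharpness-scaled learning rate:
    a sharper local landscape gives a larger step, helping escape sharp valleys."""
    g = grad_fn(w)
    s = max(sharpness_estimate(loss_fn, w, g), 0.0)
    lr = base_lr * s / ref_sharpness
    return w - lr * g

# Toy usage on a one-dimensional double-well loss.
loss = lambda w: (w[0]**2 - 1.0)**2
grad = lambda w: np.array([4.0 * w[0] * (w[0]**2 - 1.0)])
w = np.array([0.5])
for _ in range(100):
    w = salr_step(loss, grad, w)
print(w)
```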
2. Catania G, Decelle A, Seoane B. Copycat perceptron: Smashing barriers through collective learning. Phys Rev E 2024; 109:065313. PMID: 39020926. DOI: 10.1103/physreve.109.065313.
Abstract
We characterize the equilibrium properties of a model of y coupled binary perceptrons in the teacher-student scenario, subject to a suitable cost function, with an explicit ferromagnetic coupling proportional to the Hamming distance between the students' weights. In contrast to recent works, we analyze a more general setting in which thermal noise is present that affects each student's generalization performance. In the nonzero temperature regime, we find that the coupling of replicas leads to a bend of the phase diagram towards smaller values of α: This suggests that the free entropy landscape gets smoother around the solution with perfect generalization (i.e., the teacher) at a fixed fraction of examples, allowing standard thermal updating algorithms such as Simulated Annealing to easily reach the teacher solution and avoid getting trapped in metastable states as happens in the unreplicated case, even in the computationally easy regime of the inference phase diagram. These results provide additional analytic and numerical evidence for the recently conjectured Bayes-optimal property of Replicated Simulated Annealing for a sufficient number of replicas. From a learning perspective, these results also suggest that multiple students working together (in this case reviewing the same data) are able to learn the same rule both significantly faster and with fewer examples, a property that could be exploited in the context of cooperative and federated learning.
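As a rough numerical companion to the setting above, the sketch below runs Metropolis dynamics on a few binary perceptron "students" coupled by a ferromagnetic term that grows with their pairwise Hamming distances, in a teacher-student setup. The sizes, coupling strength, and temperature are placeholder values, and the energy is recomputed from scratch at each step purely for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, y = 51, 80, 3              # weights, patterns, replicas (illustrative sizes)
gamma, beta = 0.5, 2.0           # coupling strength and inverse temperature (assumed)

teacher = rng.choice([-1, 1], size=N)
X = rng.choice([-1, 1], size=(P, N))
labels = np.sign(X @ teacher)

def n_errors(w):
    """Training errors of one binary perceptron student."""
    return int(np.sum(np.sign(X @ w) != labels))

def total_energy(students):
    """Sum of student errors plus a ferromagnetic coupling proportional to the
    pairwise Hamming distances between the students' weight vectors."""
    E = sum(n_errors(w) for w in students)
    for a in range(y):
        for b in range(a + 1, y):
            E += gamma * np.sum(students[a] != students[b]) / N
    return E

students = [rng.choice([-1, 1], size=N) for _ in range(y)]
E = total_energy(students)
for step in range(10000):        # Metropolis single-weight flips
    a, i = rng.integers(y), rng.integers(N)
    students[a][i] *= -1
    E_new = total_energy(students)
    if E_new <= E or rng.random() < np.exp(-beta * (E_new - E)):
        E = E_new                # accept the flip
    else:
        students[a][i] *= -1     # reject and undo

print([n_errors(w) for w in students],
      [int(np.sum(w == teacher)) for w in students])
```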
3. Bi Z, Li H, Tian L. Top-down generation of low-resolution representations improves visual perception and imagination. Neural Netw 2024; 171:440-456. PMID: 38150870. DOI: 10.1016/j.neunet.2023.12.030.
Abstract
Perception or imagination requires top-down signals from high-level cortex to primary visual cortex (V1) to reconstruct or simulate the representations bottom-up stimulated by the seen images. Interestingly, top-down signals in V1 have lower spatial resolution than bottom-up representations. It is unclear why the brain uses low-resolution signals to reconstruct or simulate high-resolution representations. By modeling the top-down pathway of the visual system using the decoder of a variational auto-encoder (VAE), we reveal that low-resolution top-down signals can better reconstruct or simulate the information contained in the sparse activities of V1 simple cells, which facilitates perception and imagination. This advantage of low-resolution generation is related to facilitating high-level cortex to form geometry-respecting representations observed in experiments. Furthermore, we present two findings regarding this phenomenon in the context of AI-generated sketches, a style of drawings made of lines. First, we found that the quality of the generated sketches critically depends on the thickness of the lines in the sketches: thin-line sketches are harder to generate than thick-line sketches. Second, we propose a technique to generate high-quality thin-line sketches: instead of directly using original thin-line sketches, we use blurred sketches to train VAE or GAN (generative adversarial network), and then infer the thin-line sketches from the VAE- or GAN-generated blurred sketches. Collectively, our work suggests that low-resolution top-down generation is a strategy the brain uses to improve visual perception and imagination, which inspires new sketch-generation AI techniques.
Affiliation(s)
- Zedong Bi: Lingang Laboratory, Shanghai 200031, China
- Haoran Li: Department of Physics, Hong Kong Baptist University, Hong Kong, China
- Liang Tian: Department of Physics, Hong Kong Baptist University, Hong Kong, China; Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Hong Kong, China; Institute of Systems Medicine and Health Sciences, Hong Kong Baptist University, Hong Kong, China; State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong, China
4. Zambon A, Zecchina R, Tiana G. Structure of the space of folding protein sequences defined by large language models. Phys Biol 2024; 21:026002. PMID: 38237200. DOI: 10.1088/1478-3975/ad205c.
Abstract
Proteins populate a manifold in the high-dimensional sequence space whose geometrical structure guides their natural evolution. Leveraging recently developed structure prediction tools based on transformer models, we first examine the protein sequence landscape as defined by an effective energy that is a proxy of sequence foldability. This landscape shares characteristics with optimization challenges encountered in machine learning and constraint satisfaction problems. Our analysis reveals that natural proteins predominantly reside in wide, flat minima within this energy landscape. To investigate further, we employ statistical mechanics algorithms specifically designed to explore regions with high local entropy in relatively flat landscapes. Our findings indicate that these specialized algorithms can identify valleys with higher entropy compared to those found using traditional methods such as Markov chain Monte Carlo. In a proof-of-concept case, we find that these highly entropic minima exhibit significant similarities to natural sequences, especially at key sites and in local entropy. Additionally, evaluations through molecular dynamics suggest that the stability of these sequences closely resembles that of natural proteins. Our tool combines advancements in machine learning and statistical physics, providing new insights into the exploration of sequence landscapes where wide, flat minima coexist alongside a majority of narrower minima.
Affiliation(s)
- A Zambon: Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano, Via Celoria 16, 20133 Milano, Italy
- R Zecchina: Bocconi University, via Roentgen 1, 20136 Milano, Italy
- G Tiana: Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano, Via Celoria 16, 20133 Milano, Italy; INFN, Sezione di Milano, Via Celoria 16, 20133 Milano, Italy
5. Ingrosso A, Panizon E. Machine learning at the mesoscale: A computation-dissipation bottleneck. Phys Rev E 2024; 109:014132. PMID: 38366483. DOI: 10.1103/physreve.109.014132.
Abstract
The cost of information processing in physical systems calls for a trade-off between performance and energetic expenditure. Here we formulate and study a computation-dissipation bottleneck in mesoscopic systems used as input-output devices. Using both real data sets and synthetic tasks, we show how nonequilibrium leads to enhanced performance. Our framework sheds light on a crucial compromise between information compression, input-output computation and dynamic irreversibility induced by nonreciprocal interactions.
Affiliation(s)
- Alessandro Ingrosso: Quantitative Life Sciences, Abdus Salam International Centre for Theoretical Physics, 34151 Trieste, Italy
- Emanuele Panizon: Quantitative Life Sciences, Abdus Salam International Centre for Theoretical Physics, 34151 Trieste, Italy
6. Bi Z. Cognition of Time and Thinking Beyond. Adv Exp Med Biol 2024; 1455:171-195. PMID: 38918352. DOI: 10.1007/978-3-031-60183-5_10.
Abstract
A common research protocol in cognitive neuroscience is to train subjects to perform deliberately designed experiments while recording brain activity, with the aim of understanding the brain mechanisms underlying cognition. However, how the results of this protocol of research can be applied in technology is seldom discussed. Here, I review the studies on time processing of the brain as examples of this research protocol, as well as two main application areas of neuroscience (neuroengineering and brain-inspired artificial intelligence). Time processing is a fundamental dimension of cognition, and time is also an indispensable dimension of any real-world signal to be processed in technology. Therefore, one may expect that the studies of time processing in cognition profoundly influence brain-related technology. Surprisingly, I found that the results from cognitive studies on timing processing are hardly helpful in solving practical problems. This awkward situation may be due to the lack of generalizability of the results of cognitive studies, which are under well-controlled laboratory conditions, to real-life situations. This lack of generalizability may be rooted in the fundamental unknowability of the world (including cognition). Overall, this paper questions and criticizes the usefulness and prospect of the abovementioned research protocol of cognitive neuroscience. I then give three suggestions for future research. First, to improve the generalizability of research, it is better to study brain activity under real-life conditions instead of in well-controlled laboratory experiments. Second, to overcome the unknowability of the world, we can engineer an easily accessible surrogate of the object under investigation, so that we can predict the behavior of the object under investigation by experimenting on the surrogate. Third, the paper calls for technology-oriented research, with the aim of technology creation instead of knowledge discovery.
Affiliation(s)
- Zedong Bi: Lingang Laboratory, Shanghai, China; Institute for Future, Qingdao University, Qingdao, China; School of Automation, Shandong Key Laboratory of Industrial Control Technology, Qingdao University, Qingdao, China
7. Annesi BL, Lauditi C, Lucibello C, Malatesta EM, Perugini G, Pittorino F, Saglietti L. Star-Shaped Space of Solutions of the Spherical Negative Perceptron. Phys Rev Lett 2023; 131:227301. PMID: 38101365. DOI: 10.1103/physrevlett.131.227301.
Abstract
Empirical studies on the landscape of neural networks have shown that low-energy configurations are often found in complex connected structures, where zero-energy paths between pairs of distant solutions can be constructed. Here, we consider the spherical negative perceptron, a prototypical nonconvex neural network model framed as a continuous constraint satisfaction problem. We introduce a general analytical method for computing energy barriers in the simplex with vertex configurations sampled from the equilibrium. We find that in the overparametrized regime the solution manifold displays simple connectivity properties. There exists a large geodesically convex component that is attractive for a wide range of optimization dynamics. Inside this region we identify a subset of atypical high-margin solutions that are geodesically connected with most other solutions, giving rise to a star-shaped geometry. We analytically characterize the organization of the connected space of solutions and show numerical evidence of a transition, at larger constraint densities, where the aforementioned simple geodesic connectivity breaks down.
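The notion of barriers between solutions can be probed numerically in a very simple way. The sketch below evaluates the number of violated margin constraints of a spherical negative perceptron along the great-circle path between two configurations; the instance size, the margin kappa, and the use of random endpoints (in place of solutions returned by an optimizer) are illustrative assumptions, not the paper's analytical method.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P, kappa = 200, 120, -0.5          # size, constraints, negative margin (illustrative)
X = rng.standard_normal((P, N))

def project(w):
    """Rescale onto the sphere of radius sqrt(N)."""
    return w * np.sqrt(N) / np.linalg.norm(w)

def energy(w):
    """Number of constraints whose stability falls below the margin kappa."""
    return int(np.sum(X @ w / np.sqrt(N) < kappa))

def path_barrier(w1, w2, steps=101):
    """Maximum energy along the projected linear (great-circle) interpolation."""
    return max(energy(project((1 - t) * w1 + t * w2))
               for t in np.linspace(0.0, 1.0, steps))

# In practice w1 and w2 would be two solutions found by an optimizer;
# random points are used here only to exercise the barrier computation.
w1, w2 = project(rng.standard_normal(N)), project(rng.standard_normal(N))
print(energy(w1), energy(w2), path_barrier(w1, w2))
```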
Affiliation(s)
- Clarissa Lauditi: Department of Applied Science and Technology, Politecnico di Torino, 10129 Torino, Italy
- Carlo Lucibello: Department of Computing Sciences, Bocconi University, 20136 Milano, Italy; Bocconi Institute for Data Science and Analytics, 20136 Milano, Italy
- Enrico M Malatesta: Department of Computing Sciences, Bocconi University, 20136 Milano, Italy; Bocconi Institute for Data Science and Analytics, 20136 Milano, Italy
- Gabriele Perugini: Department of Computing Sciences, Bocconi University, 20136 Milano, Italy
- Fabrizio Pittorino: Bocconi Institute for Data Science and Analytics, 20136 Milano, Italy; Department of Electronics, Information, and Bioengineering, Politecnico di Milano, 20125 Milano, Italy
- Luca Saglietti: Department of Computing Sciences, Bocconi University, 20136 Milano, Italy; Bocconi Institute for Data Science and Analytics, 20136 Milano, Italy
8. Baldassi C, Malatesta EM, Perugini G, Zecchina R. Typical and atypical solutions in nonconvex neural networks with discrete and continuous weights. Phys Rev E 2023; 108:024310. PMID: 37723812. DOI: 10.1103/physreve.108.024310.
Abstract
We study the binary and continuous negative-margin perceptrons as simple nonconvex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions which are composed of an exponential number of algorithmically inaccessible small clusters for the binary case (the frozen 1-RSB phase) or a hierarchical structure of clusters of different sizes for the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes nonmonotonic, indicating a breakup of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behavior is different, since even beyond the disappearance of the wide flat minima the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers even when trained in the highly underconstrained regime of very negative margins.
Affiliation(s)
- Carlo Baldassi: Department of Computing Sciences, Bocconi University, 20136 Milano, Italy
- Enrico M Malatesta: Department of Computing Sciences, Bocconi University, 20136 Milano, Italy
- Gabriele Perugini: Department of Computing Sciences, Bocconi University, 20136 Milano, Italy
- Riccardo Zecchina: Department of Computing Sciences, Bocconi University, 20136 Milano, Italy
9. Baldassi C, Lauditi C, Malatesta EM, Pacelli R, Perugini G, Zecchina R. Learning through atypical phase transitions in overparameterized neural networks. Phys Rev E 2022; 106:014116. PMID: 35974501. DOI: 10.1103/physreve.106.014116.
Abstract
Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for nonconvex optimization. In this paper, we use methods from statistical physics of disordered systems to analytically study the computational fallout of overparameterization in nonconvex binary neural network models, trained on data generated from a structurally simpler but "hidden" network. As the number of connection weights increases, we follow the changes of the geometrical structure of different minima of the error loss function and relate them to learning and generalization performance. A first transition happens at the so-called interpolation point, when solutions begin to exist (perfect fitting becomes possible). This transition reflects the properties of typical solutions, which however are in sharp minima and hard to sample. After a gap, a second transition occurs, with the discontinuous appearance of a different kind of "atypical" structures: wide regions of the weight space that are particularly solution dense and have good generalization properties. The two kinds of solutions coexist, with the typical ones being exponentially more numerous, but empirically we find that efficient algorithms sample the atypical, rare ones. This suggests that the atypical phase transition is the relevant one for learning. The results of numerical tests with realistic networks on observables suggested by the theory are consistent with this scenario.
Affiliation(s)
- Carlo Baldassi: Artificial Intelligence Lab, Bocconi University, 20136 Milano, Italy
- Clarissa Lauditi: Department of Applied Science and Technology, Politecnico di Torino, 10129 Torino, Italy
- Rosalba Pacelli: Department of Applied Science and Technology, Politecnico di Torino, 10129 Torino, Italy
- Gabriele Perugini: Artificial Intelligence Lab, Bocconi University, 20136 Milano, Italy
- Riccardo Zecchina: Artificial Intelligence Lab, Bocconi University, 20136 Milano, Italy
10. Lucibello C, Pittorino F, Perugini G, Zecchina R. Deep learning via message passing algorithms based on belief propagation. Mach Learn Sci Technol 2022. DOI: 10.1088/2632-2153/ac7d3b.
Abstract
Message-passing algorithms based on the Belief Propagation (BP) equations constitute a well-known distributed computational scheme. They yield exact marginals on tree-like graphical models and have also proven to be effective in many problems defined on loopy graphs, from inference to optimization, from signal processing to clustering. The BP-based schemes are fundamentally different from stochastic gradient descent (SGD), on which the current success of deep networks is based. In this paper, we present and adapt to mini-batch training on GPUs a family of BP-based message-passing algorithms with a reinforcement term that biases distributions towards locally entropic solutions. These algorithms are capable of training multi-layer neural networks with performance comparable to SGD heuristics in a diverse set of experiments on natural datasets including multi-class image classification and continual learning, while also yielding improved performance on sparse networks. Furthermore, they make it possible to produce approximate Bayesian predictions that have higher accuracy than point-wise ones.
11. Baldassi C, Lauditi C, Malatesta EM, Perugini G, Zecchina R. Unveiling the Structure of Wide Flat Minima in Neural Networks. Phys Rev Lett 2021; 127:278301. PMID: 35061428. DOI: 10.1103/physrevlett.127.278301.
Abstract
The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks. Moreover, such algorithms are able to fit the data even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here, we show that wide flat minima arise as complex extensive structures, from the coalescence of minima around "high-margin" (i.e., locally robust) configurations. Despite being exponentially rare compared to zero-margin ones, high-margin minima tend to concentrate in particular regions. These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to dense regions of solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies.
Affiliation(s)
- Carlo Baldassi: Artificial Intelligence Lab, Bocconi University, 20136 Milano, Italy
- Clarissa Lauditi: Department of Applied Science and Technology, Politecnico di Torino, 10129 Torino, Italy
- Gabriele Perugini: Artificial Intelligence Lab, Bocconi University, 20136 Milano, Italy
- Riccardo Zecchina: Artificial Intelligence Lab, Bocconi University, 20136 Milano, Italy
12. Cui H, Saglietti L, Zdeborová L. Large deviations in the perceptron model and consequences for active learning. Mach Learn Sci Technol 2021. DOI: 10.1088/2632-2153/abfbbb.
Abstract
Active learning (AL) is a branch of machine learning that deals with problems where unlabeled data is abundant yet obtaining labels is expensive. The learning algorithm has the possibility of querying a limited number of samples to obtain the corresponding labels, subsequently used for supervised learning. In this work, we consider the task of choosing the subset of samples to be labeled from a fixed finite pool of samples. We assume the pool of samples to be a random matrix and the ground truth labels to be generated by a single-layer teacher random neural network. We employ replica methods to analyze the large deviations for the accuracy achieved after supervised learning on a subset of the original pool. These large deviations then provide optimal achievable performance boundaries for any AL algorithm. We show that the optimal learning performance can be efficiently approached by simple message-passing AL algorithms. We also provide a comparison with the performance of some other popular active learning strategies.
13. Negri M, Tiana G, Zecchina R. Native state of natural proteins optimizes local entropy. Phys Rev E 2021; 104:064117. PMID: 35030941. DOI: 10.1103/physreve.104.064117.
Abstract
The differing ability of polypeptide conformations to act as the native state of proteins has long been rationalized in terms of differing kinetic accessibility or thermodynamic stability. Building on the successful applications of physical concepts and sampling algorithms recently introduced in the study of disordered systems, in particular artificial neural networks, we quantitatively explore how well a quantity known as the local entropy describes the native state of model proteins. In lattice models and all-atom representations of proteins, we are able to efficiently sample high local entropy states and to provide a proof of concept of enhanced stability and folding rate. Our methods are based on simple and general statistical-mechanics arguments, and thus we expect that they are of very general use.
Affiliation(s)
- M Negri: Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Turin, Italy
- G Tiana: Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, Via Celoria 16, 20133 Milan, Italy
- R Zecchina: Artificial Intelligence Lab, Bocconi University, Via Sarfatti 25, 20136 Milan, Italy
14. Sicilia A, Zhao X, Sosnovskikh A, Hwang SJ. PAC Bayesian Performance Guarantees for Deep (Stochastic) Networks in Medical Imaging. Med Image Comput Comput Assist Interv 2021; 12903:560-570. PMID: 34957473. DOI: 10.1007/978-3-030-87199-4_53.
Abstract
Application of deep neural networks to medical imaging tasks has in some sense become commonplace. Still, a "thorn in the side" of the deep learning movement is the argument that deep networks are prone to overfitting and are thus unable to generalize well when datasets are small (as is common in medical imaging tasks). One way to bolster confidence is to provide mathematical guarantees, or bounds, on network performance after training which explicitly quantify the possibility of overfitting. In this work, we explore recent advances using the PAC-Bayesian framework to provide bounds on generalization error for large (stochastic) networks. While previous efforts focus on classification in larger natural image datasets (e.g., MNIST and CIFAR-10), we apply these techniques to both classification and segmentation in a smaller medical imaging dataset: the ISIC 2018 challenge set. We observe the resultant bounds are competitive compared to a simpler baseline, while also being more explainable and alleviating the need for holdout sets.
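For orientation, a commonly used bound of this type is the Langford-Seeger/Maurer PAC-Bayes inequality, kl(empirical risk || true risk) <= (KL(Q||P) + ln(2*sqrt(m)/delta)) / m, which can be inverted numerically to obtain an upper bound on the true risk of the Gibbs classifier. The sketch below does exactly that; the plugged-in values of the KL term, sample size, and empirical risk are made up for illustration and are not taken from the paper.

```python
import math

def binary_kl(q, p):
    """kl(q || p) for Bernoulli distributions, with the usual 0*log0 = 0 convention."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def pac_bayes_risk_bound(emp_risk, kl_qp, m, delta=0.05):
    """Invert kl(emp_risk || r) <= (kl_qp + ln(2*sqrt(m)/delta)) / m by bisection
    to obtain an upper bound r on the true risk of the Gibbs classifier."""
    rhs = (kl_qp + math.log(2.0 * math.sqrt(m) / delta)) / m
    lo, hi = emp_risk, 1.0 - 1e-9
    for _ in range(100):                  # bisection on the monotone map r -> kl(q||r)
        mid = 0.5 * (lo + hi)
        if binary_kl(emp_risk, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical numbers: 10% empirical error, KL(Q||P) = 5000 nats, 2500 training images.
print(pac_bayes_risk_bound(emp_risk=0.10, kl_qp=5000.0, m=2500))
```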
Affiliation(s)
- Anthony Sicilia: Intelligent Systems Program, University of Pittsburgh, Pittsburgh, USA
- Xingchen Zhao: Department of Computer Science, University of Pittsburgh, Pittsburgh, USA
- Seong Jae Hwang: Intelligent Systems Program, University of Pittsburgh, Pittsburgh, USA; Department of Computer Science, University of Pittsburgh, Pittsburgh, USA
15. Musso D. Partial local entropy and anisotropy in deep weight spaces. Phys Rev E 2021; 103:042303. PMID: 34005873. DOI: 10.1103/physreve.103.042303.
Abstract
We refine a recently proposed class of local entropic loss functions by restricting the smoothening regularization to only a subset of weights. The new loss functions are referred to as partial local entropies. They can adapt to the weight-space anisotropy, thus outperforming their isotropic counterparts. We support the theoretical analysis with experiments on image classification tasks performed with multilayer, fully connected, and convolutional neural networks. The present study suggests how to better exploit the anisotropic nature of deep landscapes, and it provides direct probes of the shape of the minima encountered by stochastic gradient descent algorithms. As a byproduct, we observe an asymptotic dynamical regime at late training times where the temperature of all the layers obeys a common cooling behavior.
Affiliation(s)
- Daniele Musso: Departamento de Física de Partículas, Universidade de Santiago de Compostela (USC), Instituto Galego de Física de Altas Enerxías (IGFAE), E-15782 Santiago de Compostela, Spain; Inovalabs Digital S.L. (TECHEYE), E-36202 Vigo, Spain; Centro de Supercomputación de Galicia (CESGA), Avenida de Vigo s/n, 15705 Santiago de Compostela, Spain
16. Zhao H, Zhou HJ. Maximally flexible solutions of a random K-satisfiability formula. Phys Rev E 2020; 102:012301. PMID: 32794979. DOI: 10.1103/physreve.102.012301.
Abstract
Random K-satisfiability (K-SAT) is a paradigmatic model system for studying phase transitions in constraint satisfaction problems and for developing empirical algorithms. The statistical properties of the random K-SAT solution space have been extensively investigated, but most earlier efforts focused on solutions that are typical. Here we consider maximally flexible solutions which satisfy all the constraints only using the minimum number of variables. Such atypical solutions have high internal entropy because they contain a maximum number of null variables which are completely free to choose their states. Each maximally flexible solution indicates a dense region of the solution space. We estimate the maximum fraction of null variables by the replica-symmetric cavity method, and implement message-passing algorithms to construct maximally flexible solutions for single K-SAT instances.
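The notion of a null variable admits a direct check: with respect to a given satisfying assignment, a variable is completely free if every clause containing it is already satisfied by some other literal. Below is a small sketch of that test on a toy CNF formula; the formula and assignment are made up for illustration, and the message-passing construction used in the paper is not reproduced here.

```python
def is_satisfied(clause, assignment):
    """A clause is a list of nonzero ints: +i means x_i, -i means NOT x_i."""
    return any((lit > 0) == assignment[abs(lit)] for lit in clause)

def null_variables(cnf, assignment):
    """Variables that can take either value without violating any clause:
    every clause containing the variable must be satisfied by another literal."""
    free = set(assignment)
    for clause in cnf:
        for lit in clause:
            others = [l for l in clause if abs(l) != abs(lit)]
            if not is_satisfied(others, assignment):
                free.discard(abs(lit))
    return free

# Toy 3-SAT instance over variables 1..4 and one satisfying assignment.
cnf = [[1, 2, -3], [-1, 3, 4], [2, -4, 3]]
assignment = {1: True, 2: True, 3: True, 4: True}
assert all(is_satisfied(c, assignment) for c in cnf)
print(sorted(null_variables(cnf, assignment)))
```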
Affiliation(s)
- Han Zhao: CAS Key Laboratory for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China; School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Hai-Jun Zhou: CAS Key Laboratory for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China; School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
17. Construction of cascaded depth model based on boosting feature selection and classification. Evol Intell 2020. DOI: 10.1007/s12065-020-00413-9.
18. Westerhout T, Astrakhantsev N, Tikhonov KS, Katsnelson MI, Bagrov AA. Generalization properties of neural network approximations to frustrated magnet ground states. Nat Commun 2020; 11:1593. PMID: 32221284. PMCID: PMC7101385. DOI: 10.1038/s41467-020-15402-w.
Abstract
Neural quantum states (NQS) attract a lot of attention due to their potential to serve as a very expressive variational ansatz for quantum many-body systems. Here we study the main factors governing the applicability of NQS to frustrated magnets by training neural networks to approximate ground states of several moderately-sized Hamiltonians using the corresponding wave function structure on a small subset of the Hilbert space basis as training dataset. We notice that generalization quality, i.e. the ability to learn from a limited number of samples and correctly approximate the target state on the rest of the space, drops abruptly when frustration is increased. We also show that learning the sign structure is considerably more difficult than learning amplitudes. Finally, we conclude that the main issue to be addressed at this stage, in order to use the method of NQS for simulating realistic models, is that of generalization rather than expressibility.
Affiliation(s)
- Tom Westerhout: Institute for Molecules and Materials, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
- Nikita Astrakhantsev: Physik-Institut, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland; Moscow Institute of Physics and Technology, Institutsky lane 9, 141700 Dolgoprudny, Russia; Institute for Theoretical and Experimental Physics NRC Kurchatov Institute, 117218 Moscow, Russia
- Konstantin S Tikhonov: Skolkovo Institute of Science and Technology, 143026 Skolkovo, Russia; Institut für Nanotechnologie, Karlsruhe Institute of Technology, 76021 Karlsruhe, Germany; Landau Institute for Theoretical Physics RAS, 119334 Moscow, Russia
- Mikhail I Katsnelson: Institute for Molecules and Materials, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands; Theoretical Physics and Applied Mathematics Department, Ural Federal University, 620002 Yekaterinburg, Russia
- Andrey A Bagrov: Institute for Molecules and Materials, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands; Theoretical Physics and Applied Mathematics Department, Ural Federal University, 620002 Yekaterinburg, Russia; Department of Physics and Astronomy, Uppsala University, Box 516, SE-75120 Uppsala, Sweden
Collapse
|
19
|
Abstract
Deep neural networks (DNN) are becoming fundamental learning devices for extracting information from data in a variety of real-world applications and in natural and social sciences. The learning process in DNN consists of finding a minimizer of a loss function that measures how well the data are classified. This optimization task is typically solved by tuning millions of parameters by stochastic gradient algorithms. This process can be thought of as an exploration process of a highly nonconvex landscape. Here we show that such landscapes possess very peculiar wide flat minima and that the current models have been shaped to make the loss functions and the algorithms focus on those minima. We also derive efficient algorithmic solutions.

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex 1- and 2-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian, and their generalization performance on real data.
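One of the observables mentioned above, the local "volume" or flatness around a minimizer, can be probed crudely by measuring how the training error degrades under random weight perturbations of growing amplitude. The sketch below does this for a toy perceptron; the perturbation scheme, sizes, and the use of the teacher itself as the "trained" weights are illustrative assumptions rather than the estimators used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 100, 60
X = rng.standard_normal((P, N))
w_star = rng.standard_normal(N)
y = np.sign(X @ w_star)                 # teacher-generated labels

# Stand-in "trained" weights: simply the teacher, so the unperturbed error is zero.
w = w_star.copy()

def train_error(w):
    return float(np.mean(np.sign(X @ w) != y))

def flatness_profile(w, radii, n_samples=200):
    """Average training error of random perturbations w + r*u with |u| = |w|,
    as a function of the relative radius r: flatter minima degrade more slowly."""
    profile = []
    for r in radii:
        errs = []
        for _ in range(n_samples):
            u = rng.standard_normal(w.size)
            u *= np.linalg.norm(w) / np.linalg.norm(u)
            errs.append(train_error(w + r * u))
        profile.append(np.mean(errs))
    return profile

radii = [0.05, 0.1, 0.2, 0.4, 0.8]
print(list(zip(radii, flatness_profile(w, radii))))
```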
20. Baldassi C, Malatesta EM, Zecchina R. Properties of the Geometry of Solutions and Capacity of Multilayer Neural Networks with Rectified Linear Unit Activations. Phys Rev Lett 2019; 123:170602. PMID: 31702271. DOI: 10.1103/physrevlett.123.170602.
Abstract
Rectified linear units (ReLUs) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem which can undercut stochastic gradient descent learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights. We study the problem of storing an extensive number of random patterns and find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases, at odds with the case of threshold units in which the capacity diverges. Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure: While the majority of solutions are close in distance but still isolated, there exist rare regions of solutions which are much more dense than the similar ones in the case of threshold units. These solutions are robust to perturbations of the weights and can tolerate large perturbations of the inputs. The analytical results are corroborated by numerical findings.
Affiliation(s)
- Carlo Baldassi: Artificial Intelligence Lab, Institute for Data Science and Analytics, Bocconi University, Milano 20135, Italy
- Enrico M Malatesta: Artificial Intelligence Lab, Institute for Data Science and Analytics, Bocconi University, Milano 20135, Italy
- Riccardo Zecchina: Artificial Intelligence Lab, Institute for Data Science and Analytics, Bocconi University, Milano 20135, Italy
21. García Trillos N, Kaplan Z, Sanz-Alonso D. Variational Characterizations of Local Entropy and Heat Regularization in Deep Learning. Entropy (Basel) 2019; 21:511. PMID: 33267225. PMCID: PMC7515000. DOI: 10.3390/e21050511.
Abstract
The aim of this paper is to provide new theoretical and computational understanding on two loss regularizations employed in deep learning, known as local entropy and heat regularization. For both regularized losses, we introduce variational characterizations that naturally suggest a two-step scheme for their optimization, based on the iterative shift of a probability density and the calculation of a best Gaussian approximation in Kullback-Leibler divergence. Disregarding approximation error in these two steps, the variational characterizations allow us to show a simple monotonicity result for training error along optimization iterates. The two-step optimization schemes for local entropy and heat regularized loss differ only over which argument of the Kullback-Leibler divergence is used to find the best Gaussian approximation. Local entropy corresponds to minimizing over the second argument, and the solution is given by moment matching. This allows replacing traditional backpropagation calculation of gradients by sampling algorithms, opening an avenue for gradient-free, parallelizable training of neural networks. However, our presentation also acknowledges the potential increase in computational cost of naive optimization of regularized costs, thus giving a less optimistic view than existing works of the gains facilitated by loss regularization.
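The two-step scheme described above (shift a probability density, then take the best Gaussian approximation in KL, which for local entropy reduces to moment matching) can be sketched with simple importance sampling on a toy loss. The step size, temperature, sampling width, and the loss itself are illustrative assumptions, not the configuration studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def loss(W):
    """Toy nonconvex loss, vectorized over rows of W; stands in for a training loss."""
    return 0.5 * np.sum(W**2, axis=1) + np.sum(np.sin(3.0 * W), axis=1)

def local_entropy_update(mu, tau=0.5, sigma=0.3, n_samples=4000, step=0.5):
    """One iteration: (1) tilt a Gaussian centered at mu by exp(-loss/tau),
    (2) moment-match a Gaussian to the tilted density (its mean, estimated via
    importance weights), (3) move mu towards that matched mean."""
    samples = mu + sigma * rng.standard_normal((n_samples, mu.size))
    log_w = -loss(samples) / tau
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    matched_mean = weights @ samples
    return mu + step * (matched_mean - mu)

mu = np.array([2.0, -2.0])
for _ in range(50):
    mu = local_entropy_update(mu)
print(mu, loss(mu[None, :])[0])
```

Note that no backpropagation is used: only loss evaluations at sampled points, which is the gradient-free aspect highlighted in the abstract.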
Affiliation(s)
- Zachary Kaplan: Division of Applied Mathematics, Brown University, Providence, RI 02906, USA
22. Optimization of neural networks via finite-value quantum fluctuations. Sci Rep 2018; 8:9950. PMID: 29967442. PMCID: PMC6028692. DOI: 10.1038/s41598-018-28212-4.
Abstract
We numerically test an optimization method for deep neural networks (DNNs) using quantum fluctuations inspired by quantum annealing. For efficient optimization, our method utilizes the quantum tunneling effect beyond the potential barriers. The path integral formulation of the DNN optimization generates an attracting force to simulate the quantum tunneling effect. In the standard quantum annealing method, the quantum fluctuations vanish at the last stage of optimization. In this study, we propose a learning protocol that keeps the quantum fluctuation strength at a finite value to obtain higher generalization performance, which is a type of robustness. We demonstrate the performance of our method using two well-known open datasets: the MNIST dataset and the Olivetti face dataset. Although computational costs prevent us from testing our method on large datasets with high-dimensional data, results show that our method can enhance generalization performance by retaining a finite quantum fluctuation strength.
23. Baldassi C, Gerace F, Kappen HJ, Lucibello C, Saglietti L, Tartaglione E, Zecchina R. Role of Synaptic Stochasticity in Training Low-Precision Neural Networks. Phys Rev Lett 2018; 120:268103. PMID: 30004730. DOI: 10.1103/physrevlett.120.268103.
Abstract
Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses. Both analytical and numerical results are presented. An algorithmic extension that makes it possible to train discrete deep neural networks is also investigated.
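A minimal sketch of the general idea of descending on real parameters of a distribution over binary weights is given below, using a mean-field (central-limit) approximation of the pre-activation; the specific gradient, loss surrogate, sizes, and learning rate are my own illustrative choices and not the procedure derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
N, P = 101, 150
X = rng.choice([-1, 1], size=(P, N))
teacher = rng.choice([-1, 1], size=N)
y = np.sign(X @ teacher)

theta = 0.01 * rng.standard_normal(N)       # real parameters; magnetizations m = tanh(theta)

def expected_error_gradient(theta):
    """Gradient (w.r.t. theta) of the expected number of misclassified patterns
    under independent binary weights with magnetizations m = tanh(theta), using a
    Gaussian approximation of the pre-activation a = sum_i w_i x_i."""
    m = np.tanh(theta)
    mu = X @ m                               # mean pre-activation per pattern
    var = np.sum(1.0 - m**2) + 1e-12         # x_i^2 = 1, so the variance is pattern independent
    s = np.sqrt(var)
    z = -y * mu / s                          # P(error on a pattern) ~= Phi(z)
    pdf = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # dz/dm_i = -y*x_i/s - y*mu*m_i/s^3, then chain rule through m = tanh(theta)
    dz_dm = -(y[:, None] * X) / s - (y * mu)[:, None] * m[None, :] / s**3
    grad_m = pdf @ dz_dm
    return grad_m * (1.0 - m**2)

def clipped_errors(theta):
    """Errors of the deterministic network obtained by taking w = sign(m)."""
    w = np.sign(np.tanh(theta))
    return int(np.sum(np.sign(X @ w) != y))

lr = 0.1
for step in range(1000):
    theta -= lr * expected_error_gradient(theta)
print(clipped_errors(theta))
```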
Affiliation(s)
- Carlo Baldassi: Bocconi Institute for Data Science and Analytics, Bocconi University, Milano 20136, Italy; Italian Institute for Genomic Medicine, Torino 10126, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Torino, Torino 10129, Italy
- Federica Gerace: Italian Institute for Genomic Medicine, Torino 10126, Italy; Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Hilbert J Kappen: Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, 6525 EZ Nijmegen, Netherlands
- Carlo Lucibello: Italian Institute for Genomic Medicine, Torino 10126, Italy; Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Luca Saglietti: Italian Institute for Genomic Medicine, Torino 10126, Italy; Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Enzo Tartaglione: Italian Institute for Genomic Medicine, Torino 10126, Italy; Department of Applied Science and Technology, Politecnico di Torino, Torino 10129, Italy
- Riccardo Zecchina: Bocconi Institute for Data Science and Analytics, Bocconi University, Milano 20136, Italy; Italian Institute for Genomic Medicine, Torino 10126, Italy; International Centre for Theoretical Physics, Trieste 34151, Italy
24.
Abstract
Quantum annealers aim at solving nonconvex optimization problems by exploiting cooperative tunneling effects to escape local minima. The underlying idea consists of designing a classical energy function whose ground states are the sought optimal solutions of the original optimization problem and add a controllable quantum transverse field to generate tunneling processes. A key challenge is to identify classes of nonconvex optimization problems for which quantum annealing remains efficient while thermal annealing fails. We show that this happens for a wide class of problems which are central to machine learning. Their energy landscapes are dominated by local minima that cause exponential slowdown of classical thermal annealers while simulated quantum annealing converges efficiently to rare dense regions of optimal solutions.
25. Rubin SJ, Xu N, Sandvik AW. Dual time scales in simulated annealing of a two-dimensional Ising spin glass. Phys Rev E 2017; 95:052133. PMID: 28618601. DOI: 10.1103/physreve.95.052133.
Abstract
We apply a generalized Kibble-Zurek out-of-equilibrium scaling ansatz to simulated annealing when approaching the spin-glass transition at temperature T=0 of the two-dimensional Ising model with random J=±1 couplings. Analyzing the spin-glass order parameter and the excess energy as functions of the system size and the annealing velocity in Monte Carlo simulations with Metropolis dynamics, we find scaling where the energy relaxes slower than the spin-glass order parameter, i.e., there are two different dynamic exponents. The values of the exponents relating the relaxation time scales to the system length, τ∼L^{z}, are z=8.28±0.03 for the relaxation of the order parameter and z=10.31±0.04 for the energy relaxation. We argue that the behavior with dual time scales arises as a consequence of the entropy-driven ordering mechanism within droplet theory. We point out that the dynamic exponents found here for T→0 simulated annealing are different from the temperature-dependent equilibrium dynamic exponent z_{eq}(T), for which previous studies have found a divergent behavior: z_{eq}(T→0)→∞. Thus, our study shows that, within Metropolis dynamics, it is easier to relax the system to one of its degenerate ground states than to migrate at low temperatures between regions of the configuration space surrounding different ground states. In a more general context of optimization, our study provides an example of robust dense-region solutions for which the excess energy (the conventional cost function) may not be the best measure of success.
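The protocol analyzed above can be reproduced in miniature: Metropolis dynamics on a small two-dimensional ±J Edwards-Anderson model while the temperature is ramped down linearly at a chosen velocity, recording the final energy per spin. The lattice size, annealing schedule, and number of sweeps below are illustrative placeholders, far from the scales needed for the scaling analysis in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
L = 16                                   # linear lattice size (illustrative)
# Random +-1 couplings on horizontal and vertical bonds (periodic boundaries).
Jh = rng.choice([-1, 1], size=(L, L))    # bond between (i, j) and (i, j+1)
Jv = rng.choice([-1, 1], size=(L, L))    # bond between (i, j) and (i+1, j)
spins = rng.choice([-1, 1], size=(L, L))

def local_field(i, j):
    """Sum of J * s over the four neighbors of site (i, j)."""
    return (Jh[i, j] * spins[i, (j + 1) % L] + Jh[i, (j - 1) % L] * spins[i, (j - 1) % L]
            + Jv[i, j] * spins[(i + 1) % L, j] + Jv[(i - 1) % L, j] * spins[(i - 1) % L, j])

def energy_per_spin():
    e = -np.sum(Jh * spins * np.roll(spins, -1, axis=1))
    e -= np.sum(Jv * spins * np.roll(spins, -1, axis=0))
    return e / L**2

def anneal(T_init=3.0, sweeps=1000):
    """Linear temperature ramp from T_init down to ~0; velocity is T_init / sweeps."""
    for t in range(sweeps):
        T = T_init * (1.0 - t / sweeps) + 1e-9
        for _ in range(L * L):           # one Monte Carlo sweep
            i, j = rng.integers(L), rng.integers(L)
            dE = 2.0 * spins[i, j] * local_field(i, j)
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i, j] *= -1
    return energy_per_spin()

print(anneal())
```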
Affiliation(s)
- Shanon J Rubin: Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Na Xu: Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Anders W Sandvik: Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, USA
26. Baldassi C, Borgs C, Chayes JT, Ingrosso A, Lucibello C, Saglietti L, Zecchina R. Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes. Proc Natl Acad Sci U S A 2016; 113:E7655-E7662. PMID: 27856745. PMCID: PMC5137727. DOI: 10.1073/pnas.1608103113.
Abstract
In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare, but extremely dense and accessible, regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
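The general scheme described above (a sum of replicated losses with a term pulling the replicas toward a common center) can be written down directly for gradient descent. The sketch below couples a few replicas of a toy loss to their running mean; the loss, the number of replicas, and the coupling schedule are illustrative assumptions, not the specific algorithms derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(6)

def loss(w):
    """Toy rough loss standing in for a network's training cost."""
    return 0.5 * np.sum(w**2) + np.sum(np.sin(5.0 * w))

def grad(w, eps=1e-5):
    """Numerical central-difference gradient, to keep the sketch loss-agnostic."""
    g = np.zeros_like(w)
    for k in range(w.size):
        e = np.zeros_like(w)
        e[k] = eps
        g[k] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

def replicated_descent(dim=2, n_replicas=5, steps=2000, lr=0.01, gamma0=0.1):
    """Gradient descent on sum_a [ loss(w_a) + gamma/2 * |w_a - center|^2 ],
    with the center taken as the replica mean and gamma slowly increased so the
    replicas eventually collapse onto a consensus (dense) region."""
    replicas = [rng.standard_normal(dim) * 2.0 for _ in range(n_replicas)]
    for t in range(steps):
        gamma = gamma0 * (1.0 + 5.0 * t / steps)
        center = np.mean(replicas, axis=0)
        replicas = [w - lr * (grad(w) + gamma * (w - center)) for w in replicas]
    return np.mean(replicas, axis=0), [loss(w) for w in replicas]

center, losses = replicated_descent()
print(center, losses)
```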
Affiliation(s)
- Carlo Baldassi: Department of Applied Science and Technology, Politecnico di Torino, I-10129 Torino, Italy; Human Genetics Foundation-Torino, I-10126 Torino, Italy
- Alessandro Ingrosso: Department of Applied Science and Technology, Politecnico di Torino, I-10129 Torino, Italy; Human Genetics Foundation-Torino, I-10126 Torino, Italy
- Carlo Lucibello: Department of Applied Science and Technology, Politecnico di Torino, I-10129 Torino, Italy; Human Genetics Foundation-Torino, I-10126 Torino, Italy
- Luca Saglietti: Department of Applied Science and Technology, Politecnico di Torino, I-10129 Torino, Italy; Human Genetics Foundation-Torino, I-10126 Torino, Italy
- Riccardo Zecchina: Department of Applied Science and Technology, Politecnico di Torino, I-10129 Torino, Italy; Human Genetics Foundation-Torino, I-10126 Torino, Italy; Collegio Carlo Alberto, I-10024 Moncalieri, Italy
27. Baldassi C, Gerace F, Lucibello C, Saglietti L, Zecchina R. Learning may need only a few bits of synaptic precision. Phys Rev E 2016; 93:052313. PMID: 27300916. DOI: 10.1103/physreve.93.052313.
Abstract
Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by hardware implementation considerations as well. In this paper we extend a previous large deviations analysis which unveiled the existence of peculiar dense regions in the space of synaptic states which account for the possibility of learning efficiently in networks with binary synapses. We extend the analysis to synapses with multiple states and generally more plausible biological features. The results clearly indicate that the overall qualitative picture is unchanged with respect to the binary case, and very robust to variation of the details of the model. We also provide quantitative results which suggest that the advantages of increasing the synaptic precision (i.e., the number of internal synaptic states) rapidly vanish after the first few bits, and therefore that, for practical applications, only a few bits may be needed for near-optimal performance, consistent with recent biological findings. Finally, we demonstrate how the theoretical analysis can be exploited to design efficient algorithmic search strategies.
Affiliation(s)
- Carlo Baldassi: Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy; Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Federica Gerace: Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy; Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Carlo Lucibello: Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy; Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Luca Saglietti: Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy; Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy
- Riccardo Zecchina: Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy; Human Genetics Foundation-Torino, Via Nizza 52, I-10126 Torino, Italy; Collegio Carlo Alberto, Via Real Collegio 30, I-10024 Moncalieri, Italy
28. Rastegari M, Ordonez V, Redmon J, Farhadi A. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. Computer Vision – ECCV 2016. DOI: 10.1007/978-3-319-46493-0_32.