1
Baglioni P, Pacelli R, Aiudi R, Di Renzo F, Vezzani A, Burioni R, Rotondo P. Predictive Power of a Bayesian Effective Action for Fully Connected One Hidden Layer Neural Networks in the Proportional Limit. Phys Rev Lett 2024; 133:027301. [PMID: 39073956] [DOI: 10.1103/PhysRevLett.133.027301]
Abstract
We perform accurate numerical experiments with fully connected one hidden layer neural networks trained with a discretized Langevin dynamics on the MNIST and CIFAR10 datasets. Our goal is to empirically determine the regimes of validity of a recently derived Bayesian effective action for shallow architectures in the proportional limit. We explore the predictive power of the theory as a function of the parameters (the temperature T, the magnitudes of the Gaussian priors λ₁, λ₀, the size of the hidden layer N₁, and the size of the training set P) by comparing the experimental and predicted generalization error. The very good agreement between the effective theory and the experiments indicates that global rescaling of the infinite-width kernel is a main physical mechanism for kernel renormalization in fully connected Bayesian standard-scaled shallow networks.
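The training scheme described here can be sketched in a few lines. The sketch below is our construction, not the authors' code: toy data, layer sizes, and hyperparameters are illustrative. It trains a standard-scaled one-hidden-layer network by discretized Langevin dynamics, where each update combines the gradient of the squared loss plus Gaussian-prior (L2) terms with Gaussian noise at temperature T.

```python
# Minimal sketch of discretized Langevin training of a one-hidden-layer net.
# All sizes and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N0, N1, P = 20, 100, 50            # input dim, hidden width, training set size
T, lr, lam1, lam0 = 0.01, 1e-3, 1.0, 1.0
X = rng.standard_normal((P, N0))
y = np.sign(X[:, 0])[:, None]       # toy binary targets

W = rng.standard_normal((N0, N1)) / np.sqrt(N0)   # first-layer weights
v = rng.standard_normal((N1, 1)) / np.sqrt(N1)    # readout weights

for step in range(5000):
    h = np.tanh(X @ W)                       # hidden activations
    f = h @ v / np.sqrt(N1)                  # standard-scaled output
    err = f - y
    # gradients of squared loss plus Gaussian priors (lam0 on W, lam1 on v)
    gv = h.T @ err / np.sqrt(N1) + lam1 * v
    gW = X.T @ ((err @ v.T / np.sqrt(N1)) * (1 - h**2)) + lam0 * W
    # Langevin step: gradient drift plus thermal noise at temperature T
    v -= lr * gv + np.sqrt(2 * lr * T) * rng.standard_normal(v.shape)
    W -= lr * gW + np.sqrt(2 * lr * T) * rng.standard_normal(W.shape)

print("train MSE:", float(np.mean((np.tanh(X @ W) @ v / np.sqrt(N1) - y) ** 2)))
```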
Affiliation(s)
- P Baglioni
- Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università degli Studi di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
- INFN, Gruppo Collegato di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
- R Aiudi
- Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università degli Studi di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
- INFN, Gruppo Collegato di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
- F Di Renzo
- Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università degli Studi di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
- INFN, Gruppo Collegato di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
- A Vezzani
- Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università degli Studi di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
- INFN, Gruppo Collegato di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
- Istituto dei Materiali per l'Elettronica ed il Magnetismo (IMEM-CNR), Parco Area delle Scienze 37/A, 43124 Parma, Italy
- R Burioni
- Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università degli Studi di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
- INFN, Gruppo Collegato di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
2
Lippl S, Kay K, Jensen G, Ferrera VP, Abbott LF. A mathematical theory of relational generalization in transitive inference. Proc Natl Acad Sci U S A 2024; 121:e2314511121. [PMID: 38968113] [PMCID: PMC11252811] [DOI: 10.1073/pnas.2314511121]
Abstract
Humans and animals routinely infer relations between different items or events and generalize these relations to novel combinations of items. This allows them to respond appropriately to radically novel circumstances and is fundamental to advanced cognition. However, how learning systems (including the brain) can implement the necessary inductive biases has been unclear. We investigated transitive inference (TI), a classic relational task paradigm in which subjects must learn a relation (A > B and B > C) and generalize it to new combinations of items (A > C). Through mathematical analysis, we found that a broad range of biologically relevant learning models (e.g. gradient flow or ridge regression) perform TI successfully and recapitulate signature behavioral patterns long observed in living subjects. First, we found that models with item-wise additive representations automatically encode transitive relations. Second, for more general representations, a single scalar "conjunctivity factor" determines model behavior on TI and, further, the principle of norm minimization (a standard statistical inductive bias) enables models with fixed, partly conjunctive representations to generalize transitively. Finally, neural networks in the "rich regime," which enables representation learning and improves generalization on many tasks, unexpectedly show poor generalization and anomalous behavior on TI. We find that such networks implement a form of norm minimization (over hidden weights) that yields a local encoding mechanism lacking transitivity. Our findings show how minimal statistical learning principles give rise to a classical relational inductive bias (transitivity), explain empirically observed behaviors, and establish a formal approach to understanding the neural basis of relational abstraction.
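The additive-representation result can be illustrated directly. The toy sketch below is our construction, not the paper's code: items are one-hot coded, a pair is represented item-wise additively, and a ridge-regression readout fit only on adjacent premise pairs is then tested on all non-adjacent pairs, which it orders transitively.

```python
# Toy demonstration that item-wise additive codes generalize transitively.
# Item count, labels, and regularization are illustrative assumptions.
import numpy as np
from itertools import combinations

n_items = 7                                   # A > B > ... > G

def pair_rep(i, j):
    """Additive representation of the pair (left item i, right item j)."""
    x = np.zeros(2 * n_items)
    x[i] = 1.0                                # left slot
    x[n_items + j] = 1.0                      # right slot
    return x

# premises: adjacent pairs in both orders; label +1 if left item ranks higher
X_tr = np.array([pair_rep(i, i + 1) for i in range(n_items - 1)]
                + [pair_rep(i + 1, i) for i in range(n_items - 1)])
y_tr = np.array([1.0] * (n_items - 1) + [-1.0] * (n_items - 1))

lam = 1e-3                                    # ridge regularization
w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(2 * n_items), X_tr.T @ y_tr)

# transitive generalization: all non-adjacent pairs should come out positive
correct = [w @ pair_rep(i, j) > 0 for i, j in combinations(range(n_items), 2)
           if j - i > 1]
print(f"non-adjacent pairs correct: {sum(correct)}/{len(correct)}")
```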
Affiliation(s)
- Samuel Lippl
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027
- Center for Theoretical Neuroscience, Department of Neuroscience, Columbia University, New York, NY 10027
- Department of Neuroscience, Columbia University Medical Center, New York, NY 10032
- Kenneth Kay
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027
- Center for Theoretical Neuroscience, Department of Neuroscience, Columbia University, New York, NY 10027
- Grossman Center for the Statistics of Mind, Columbia University, New York, NY 10027
- Greg Jensen
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027
- Department of Neuroscience, Columbia University Medical Center, New York, NY 10032
- Department of Psychology, Reed College, Portland, OR 97202
- Vincent P. Ferrera
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027
- Department of Neuroscience, Columbia University Medical Center, New York, NY 10032
- Department of Psychiatry, Columbia University Medical Center, New York, NY 10032
- L. F. Abbott
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027
- Center for Theoretical Neuroscience, Department of Neuroscience, Columbia University, New York, NY 10027
- Department of Neuroscience, Columbia University Medical Center, New York, NY 10032
3
Bahri Y, Dyer E, Kaplan J, Lee J, Sharma U. Explaining neural scaling laws. Proc Natl Acad Sci U S A 2024; 121:e2311878121. [PMID: 38913889] [PMCID: PMC11228526] [DOI: 10.1073/pnas.2311878121]
Abstract
The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four scaling regimes in the controlled setting of large random feature and pretrained models and test the predictions empirically on a range of standard architectures and datasets. We also observe several empirical relationships between datasets and scaling exponents under modifications of task and architecture aspect ratio. Our work provides a taxonomy for classifying different scaling regimes, underscores that there can be different mechanisms driving improvements in loss, and lends insight into the microscopic origin and relationships between scaling exponents.
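In practice, a scaling exponent of this kind is measured by a linear fit in log-log coordinates. A minimal sketch with synthetic stand-in losses (the exponent and sizes below are our placeholders, not the paper's data):

```python
# Estimate the dataset-size scaling exponent alpha_D in L(D) ~ c * D^(-alpha_D)
# by fitting log(loss) against log(D); the slope gives -alpha_D.
import numpy as np

rng = np.random.default_rng(1)
D = np.array([1e3, 3e3, 1e4, 3e4, 1e5, 3e5])          # training set sizes
loss = 2.0 * D ** -0.35 * (1 + 0.01 * rng.random(D.size))  # synthetic losses

slope, intercept = np.polyfit(np.log(D), np.log(loss), 1)
print(f"estimated exponent alpha_D ≈ {-slope:.3f}")    # ≈ 0.35 by construction
```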
Affiliation(s)
- Jared Kaplan
- Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218
- Utkarsh Sharma
- Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218
4
Li Q, Sorscher B, Sompolinsky H. Representations and generalization in artificial and brain neural networks. Proc Natl Acad Sci U S A 2024; 121:e2311805121. [PMID: 38913896] [PMCID: PMC11228472] [DOI: 10.1073/pnas.2311805121]
Abstract
Humans and animals excel at generalizing from limited data, a capability yet to be fully replicated in artificial intelligence. This perspective investigates generalization in biological and artificial deep neural networks (DNNs), in both in-distribution and out-of-distribution contexts. We introduce two hypotheses: First, the geometric properties of the neural manifolds associated with discrete cognitive entities, such as objects, words, and concepts, are powerful order parameters. They link the neural substrate to the generalization capabilities and provide a unified methodology bridging gaps between neuroscience, machine learning, and cognitive science. We review recent progress in studying the geometry of neural manifolds, particularly in visual object recognition, and discuss theories connecting manifold dimension and radius to generalization capacity. Second, we suggest that the theory of learning in wide DNNs, especially in the thermodynamic limit, provides mechanistic insights into the learning processes generating desired neural representational geometries and generalization. This includes the role of weight norm regularization, network architecture, and hyperparameters. We explore recent advances in this theory and ongoing challenges. We also discuss the dynamics of learning and its relevance to the issue of representational drift in the brain.
Affiliation(s)
- Qianyi Li
- The Harvard Biophysics Graduate Program, Harvard University, Cambridge, MA 02138
- Center for Brain Science, Harvard University, Cambridge, MA 02138
- Ben Sorscher
- The Applied Physics Department, Stanford University, Stanford, CA 94305
- Haim Sompolinsky
- Center for Brain Science, Harvard University, Cambridge, MA 02138
- Edmond and Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem 9190401, Israel
5
Lippl S, Kay K, Jensen G, Ferrera VP, Abbott L. A mathematical theory of relational generalization in transitive inference. bioRxiv 2024:2023.08.22.554287. [PMID: 37662223] [PMCID: PMC10473627] [DOI: 10.1101/2023.08.22.554287]
Abstract
Humans and animals routinely infer relations between different items or events and generalize these relations to novel combinations of items. This allows them to respond appropriately to radically novel circumstances and is fundamental to advanced cognition. However, how learning systems (including the brain) can implement the necessary inductive biases has been unclear. Here we investigated transitive inference (TI), a classic relational task paradigm in which subjects must learn a relation (A > B and B > C) and generalize it to new combinations of items (A > C). Through mathematical analysis, we found that a broad range of biologically relevant learning models (e.g. gradient flow or ridge regression) perform TI successfully and recapitulate signature behavioral patterns long observed in living subjects. First, we found that models with item-wise additive representations automatically encode transitive relations. Second, for more general representations, a single scalar "conjunctivity factor" determines model behavior on TI and, further, the principle of norm minimization (a standard statistical inductive bias) enables models with fixed, partly conjunctive representations to generalize transitively. Finally, neural networks in the "rich regime," which enables representation learning and has been found to improve generalization, unexpectedly show poor generalization and anomalous behavior. We find that such networks implement a form of norm minimization (over hidden weights) that yields a local encoding mechanism lacking transitivity. Our findings show how minimal statistical learning principles give rise to a classical relational inductive bias (transitivity), explain empirically observed behaviors, and establish a formal approach to understanding the neural basis of relational abstraction.
Affiliation(s)
- Samuel Lippl
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, NY
- Center for Theoretical Neuroscience, Columbia University, NY
- Department of Neuroscience, Columbia University Medical Center, NY
- Kenneth Kay
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, NY
- Center for Theoretical Neuroscience, Columbia University, NY
- Grossman Center for the Statistics of Mind, Columbia University, NY
- Greg Jensen
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, NY
- Department of Neuroscience, Columbia University Medical Center, NY
- Department of Psychology, Reed College, OR
- Vincent P. Ferrera
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, NY
- Department of Neuroscience, Columbia University Medical Center, NY
- Department of Psychiatry, Columbia University Medical Center, NY
- L.F. Abbott
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, NY
- Center for Theoretical Neuroscience, Columbia University, NY
- Department of Neuroscience, Columbia University Medical Center, NY
6
Shi C, Pan L, Hu H, Dokmanić I. Homophily modulates double descent generalization in graph convolution networks. Proc Natl Acad Sci U S A 2024; 121:e2309504121. [PMID: 38346190] [PMCID: PMC10895367] [DOI: 10.1073/pnas.2309504121]
Abstract
Graph neural networks (GNNs) excel in modeling relational data such as biological, social, and transportation networks, but the underpinnings of their success are not well understood. Traditional complexity measures from statistical learning theory fail to account for observed phenomena like the double descent or the impact of relational semantics on generalization error. Motivated by experimental observations of "transductive" double descent in key networks and datasets, we use analytical tools from statistical physics and random matrix theory to precisely characterize generalization in simple graph convolution networks on the contextual stochastic block model. Our results illuminate the nuances of learning on homophilic versus heterophilic data and predict double descent whose existence in GNNs has been questioned by recent work. We show how risk is shaped by the interplay between the graph noise, feature noise, and the number of training labels. Our findings apply beyond stylized models, capturing qualitative trends in real-world GNNs and datasets. As a case in point, we use our analytic insights to improve performance of state-of-the-art graph convolution networks on heterophilic datasets.
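The analyzed setting can be reproduced in miniature. The sketch below uses our own parameter choices, not the paper's: it builds a two-block contextual stochastic block model, applies one degree-normalized graph convolution, and fits a least-squares readout on a subset of labeled nodes. Swapping p_in and p_out turns the homophilic graph heterophilic.

```python
# Toy contextual stochastic block model + one graph convolution layer.
# Sizes, edge probabilities, and noise levels are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, p_in, p_out = 400, 40, 0.05, 0.01   # homophilic; swap p_in/p_out to flip
y = np.repeat([1.0, -1.0], n // 2)        # two communities

# graph: block-dependent edge probabilities; features: noisy class means
probs = np.where(np.equal.outer(y, y), p_in, p_out)
A = (rng.random((n, n)) < probs).astype(float)
A = np.triu(A, 1); A = A + A.T            # symmetric, no self-loops
mu = rng.standard_normal(d) / np.sqrt(d)
X = np.outer(y, mu) + 0.5 * rng.standard_normal((n, d))

# single graph convolution with self-loops and degree normalization
A_hat = A + np.eye(n)
A_hat /= A_hat.sum(1, keepdims=True)
H = A_hat @ X

# transductive split: fit a linear readout on a few labeled nodes
train = rng.choice(n, size=80, replace=False)
test = np.setdiff1d(np.arange(n), train)
w, *_ = np.linalg.lstsq(H[train], y[train], rcond=None)
print(f"test accuracy: {np.mean(np.sign(H[test] @ w) == y[test]):.2f}")
```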
Affiliation(s)
- Cheng Shi
- Departement Mathematik und Informatik, Universität Basel, Basel 4051, Switzerland
- Liming Pan
- School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230026, China
- School of Computer and Electronic Information, Nanjing Normal University, Nanjing 210023, China
- Hong Hu
- Wharton Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA 19104-1686
- Ivan Dokmanić
- Departement Mathematik und Informatik, Universität Basel, Basel 4051, Switzerland
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801
7
Ruben BS, Pehlevan C. Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles. arXiv 2024:arXiv:2307.03176v3. [PMID: 37461424] [PMCID: PMC10350086]
Abstract
Feature bagging is a well-established ensembling method which aims to reduce prediction variance by combining predictions of many estimators trained on subsets or projections of features. Here, we develop a theory of feature-bagging in noisy least-squares ridge ensembles and simplify the resulting learning curves in the special case of equicorrelated data. Using analytical learning curves, we demonstrate that subsampling shifts the double-descent peak of a linear predictor. This leads us to introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, as a computationally efficient method to mitigate double-descent. Then, we compare the performance of a feature-subsampling ensemble to a single linear predictor, describing a trade-off between noise amplification due to subsampling and noise reduction due to ensembling. Our qualitative insights carry over to linear classifiers applied to image classification tasks with realistic datasets constructed using a state-of-the-art deep learning feature map.
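Feature bagging itself is simple to state in code. A minimal sketch with toy data and our own parameter choices: each ensemble member is a ridge regressor fit on a random subset of the features, and the ensemble averages the members' predictions against a single ridge fit on all features.

```python
# Feature-bagging ridge ensemble vs. a single ridge predictor (toy setup).
import numpy as np

rng = np.random.default_rng(0)
P, N, k, lam = 100, 200, 8, 1e-2             # samples, features, members, ridge
w_true = rng.standard_normal(N) / np.sqrt(N)
X = rng.standard_normal((P, N))
y = X @ w_true + 0.3 * rng.standard_normal(P)
X_test = rng.standard_normal((1000, N))
y_test = X_test @ w_true

def ridge_fit(X, y, lam):
    """Closed-form ridge solution."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# ensemble of k ridge predictors, each seeing a random half of the features
preds = np.zeros(len(y_test))
for _ in range(k):
    idx = rng.choice(N, size=N // 2, replace=False)
    w = ridge_fit(X[:, idx], y, lam)
    preds += X_test[:, idx] @ w / k

single = X_test @ ridge_fit(X, y, lam)
print("ensemble MSE:", np.mean((preds - y_test) ** 2))
print("single MSE:  ", np.mean((single - y_test) ** 2))
```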
Affiliation(s)
- Cengiz Pehlevan
- Center for Brain Science, Harvard University, Cambridge, MA 02138
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA 02138
8
Yao B, Xu Y, Jing J, Zhang W, Guo Y, Zhang Z, Zhang S, Liu J, Xue C. Comparison and Verification of Three Algorithms for Accuracy Improvement of Quartz Resonant Pressure Sensors. Micromachines 2023; 15:23. [PMID: 38258142] [PMCID: PMC10819135] [DOI: 10.3390/mi15010023]
Abstract
Pressure measurement is of great importance due to its wide range of applications in many fields. AT-cut quartz, with its exceptional precision and durability, stands out as an excellent pressure transducer due to its superior accuracy and stable performance over time. However, its intrinsic temperature dependence significantly hinders its potential application in varying temperature environments. Herein, three different learning algorithms (multivariate polynomial regression (MPR), multilayer perceptron networks, and support vector regression) are elaborated in detail and applied to establish prediction models that compensate the temperature effect of the resonant pressure sensor. The AC-cut quartz, which is sensitive to temperature variations, is paired with the AT-cut quartz, providing the essential temperature information. The output frequencies derived from the AT-cut and AC-cut quartzes are selected as input data for these learning algorithms. Through experimental validation, all three methods are effective, and a remarkable improvement in accuracy can be achieved. Among the three methods, the MPR model has exceptionally high accuracy in predicting pressure: the calculated residual error over the temperature range of −10 °C to 40 °C is less than 0.008% of 40 MPa full scale (FS). An intelligent automatic compensation and real-time processing system for the resonant pressure sensor is developed as well, which may contribute to improving the efficiency of online calibration and large-scale industrialization. This paper paves a promising way for the temperature compensation of resonant pressure sensors.
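As an illustration of the MPR approach, the sketch below uses synthetic calibration data and toy frequency models, not the authors' measurements: the two output frequencies are mapped to pressure by a polynomial fit. Inputs are standardized first, since raw megahertz-scale frequencies raised to the third power would make the design matrix badly conditioned.

```python
# Schematic MPR temperature compensation: (f_AT, f_AC) -> pressure.
# Calibration grid and frequency models below are invented placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# calibration grid: temperature in degC, pressure in MPa
Tg, Pg = np.meshgrid(np.linspace(-10, 40, 11), np.linspace(0, 40, 21))
T, P = Tg.ravel(), Pg.ravel()

# toy frequencies (Hz): f_AT carries pressure with a temperature drift,
# f_AC serves as the temperature reference
f_AT = 5e6 + 120.0 * P + 3.0 * T + 0.02 * T * P + rng.normal(0, 0.5, P.size)
f_AC = 3e6 + 80.0 * T + rng.normal(0, 0.5, T.size)

# standardize, expand to degree-3 polynomial features, fit linearly
model = make_pipeline(StandardScaler(), PolynomialFeatures(degree=3),
                      LinearRegression())
model.fit(np.column_stack([f_AT, f_AC]), P)

resid = model.predict(np.column_stack([f_AT, f_AC])) - P
print(f"max residual: {np.abs(resid).max():.4f} MPa "
      f"({100 * np.abs(resid).max() / 40:.4f}% of 40 MPa full scale)")
```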
Affiliation(s)
- Bin Yao
- Key Laboratory of Instrumentation Science & Dynamic Measurement, Ministry of Education, North University of China, Taiyuan 030051, China; (B.Y.); (Y.G.); (S.Z.); (J.L.)
- Yanbo Xu
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361102, China; (Y.X.); (J.J.); (W.Z.)
- Junming Jing
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361102, China; (Y.X.); (J.J.); (W.Z.)
- Wenjun Zhang
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361102, China; (Y.X.); (J.J.); (W.Z.)
- Yuzhen Guo
- Key Laboratory of Instrumentation Science & Dynamic Measurement, Ministry of Education, North University of China, Taiyuan 030051, China; (B.Y.); (Y.G.); (S.Z.); (J.L.)
- Zengxing Zhang
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361102, China; (Y.X.); (J.J.); (W.Z.)
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, China
- Shiqiang Zhang
- Key Laboratory of Instrumentation Science & Dynamic Measurement, Ministry of Education, North University of China, Taiyuan 030051, China; (B.Y.); (Y.G.); (S.Z.); (J.L.)
- Jianwei Liu
- Key Laboratory of Instrumentation Science & Dynamic Measurement, Ministry of Education, North University of China, Taiyuan 030051, China; (B.Y.); (Y.G.); (S.Z.); (J.L.)
- Chenyang Xue
- Key Laboratory of Instrumentation Science & Dynamic Measurement, Ministry of Education, North University of China, Taiyuan 030051, China; (B.Y.); (Y.G.); (S.Z.); (J.L.)
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, China
9
Farrell M, Recanatesi S, Shea-Brown E. From lazy to rich to exclusive task representations in neural networks and neural codes. Curr Opin Neurobiol 2023; 83:102780. [PMID: 37757585] [DOI: 10.1016/j.conb.2023.102780]
Abstract
Neural circuits, both in the brain and in "artificial" neural network models, learn to solve a remarkable variety of tasks, and there is a great current opportunity to use neural networks as models for brain function. Key to this endeavor is the ability to characterize the representations formed by both artificial and biological brains. Here, we investigate this potential through the lens of recently developed theory that characterizes neural networks as "lazy" or "rich" depending on the approach they use to solve tasks: lazy networks solve tasks by making small changes in connectivity, while rich networks solve tasks by significantly modifying weights throughout the network (including "hidden layers"). We further elucidate rich networks through the lens of compression and "neural collapse," ideas that have recently been of significant interest to neuroscience and machine learning. We then show how these ideas apply to a domain of increasing importance to both fields: extracting latent structures through self-supervised learning.
Affiliation(s)
- Matthew Farrell
- John A. Paulson School of Engineering and Applied Sciences, Harvard University and Center for Brain Science, Harvard University, United States
- Stefano Recanatesi
- Applied Mathematics, Physiology and Biophysics, and Computational Neuroscience Center, University of Washington, United States
- Eric Shea-Brown
- Applied Mathematics, Physiology and Biophysics, and Computational Neuroscience Center, University of Washington, United States.
10
Xie M, Muscinelli SP, Decker Harris K, Litwin-Kumar A. Task-dependent optimal representations for cerebellar learning. eLife 2023; 12:e82914. [PMID: 37671785] [PMCID: PMC10541175] [DOI: 10.7554/eLife.82914]
Abstract
The cerebellar granule cell layer has inspired numerous theoretical models of neural representations that support learned behaviors, beginning with the work of Marr and Albus. In these models, granule cells form a sparse, combinatorial encoding of diverse sensorimotor inputs. Such sparse representations are optimal for learning to discriminate random stimuli. However, recent observations of dense, low-dimensional activity across granule cells have called into question the role of sparse coding in these neurons. Here, we generalize theories of cerebellar learning to determine the optimal granule cell representation for tasks beyond random stimulus discrimination, including continuous input-output transformations as required for smooth motor control. We show that for such tasks, the optimal granule cell representation is substantially denser than predicted by classical theories. Our results provide a general theory of learning in cerebellum-like systems and suggest that optimal cerebellar representations are task-dependent.
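The central knob in such models, the coding level, can be demonstrated compactly. Below is a sketch with illustrative sizes of our choosing: granule-like units form a random expansion of mossy-fiber-like inputs, and a single activation threshold moves the representation from dense to sparse.

```python
# Random expansion with a threshold controlling the coding level
# (fraction of active units). Sizes and thresholds are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_granule, n_patterns = 50, 1000, 200
J = rng.standard_normal((n_granule, n_inputs)) / np.sqrt(n_inputs)
x = rng.standard_normal((n_patterns, n_inputs))      # input patterns

for theta in [0.0, 1.0, 2.0]:                        # activation threshold
    h = np.maximum(J @ x.T - theta, 0.0)             # thresholded expansion
    print(f"threshold {theta}: coding level {np.mean(h > 0):.2f}")
```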
Affiliation(s)
- Marjorie Xie
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Samuel P Muscinelli
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Kameron Decker Harris
- Department of Computer Science, Western Washington University, Bellingham, United States
- Ashok Litwin-Kumar
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
11
Mastrogiuseppe F, Hiratani N, Latham P. Evolution of neural activity in circuits bridging sensory and abstract knowledge. eLife 2023; 12:e79908. [PMID: 36881019] [PMCID: PMC9991064] [DOI: 10.7554/eLife.79908]
Abstract
The ability to associate sensory stimuli with abstract classes is critical for survival. How are these associations implemented in brain circuits? And what governs how neural activity evolves during abstract knowledge acquisition? To investigate these questions, we consider a circuit model that learns to map sensory input to abstract classes via gradient-descent synaptic plasticity. We focus on typical neuroscience tasks (simple, and context-dependent, categorization), and study how both synaptic connectivity and neural activity evolve during learning. To make contact with the current generation of experiments, we analyze activity via standard measures such as selectivity, correlations, and tuning symmetry. We find that the model is able to recapitulate experimental observations, including seemingly disparate ones. We determine how, in the model, the behaviour of these measures depends on details of the circuit and the task. These dependencies make experimentally testable predictions about the circuitry supporting abstract knowledge acquisition in the brain.
Affiliation(s)
- Naoki Hiratani
- Center for Brain Science, Harvard University, Cambridge, United States
- Peter Latham
- Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
12
Beiran M, Meirhaeghe N, Sohn H, Jazayeri M, Ostojic S. Parametric control of flexible timing through low-dimensional neural manifolds. Neuron 2023; 111:739-753.e8. [PMID: 36640766] [PMCID: PMC9992137] [DOI: 10.1016/j.neuron.2022.12.016]
Abstract
Biological brains possess an unparalleled ability to adapt behavioral responses to changing stimuli and environments. How neural processes enable this capacity is a fundamental open question. Previous works have identified two candidate mechanisms: a low-dimensional organization of neural activity and a modulation by contextual inputs. We hypothesized that combining the two might facilitate generalization and adaptation in complex tasks. We tested this hypothesis in flexible timing tasks where dynamics play a key role. Examining trained recurrent neural networks, we found that confining the dynamics to a low-dimensional subspace allowed tonic inputs to parametrically control the overall input-output transform, enabling generalization to novel inputs and adaptation to changing conditions. Reverse-engineering and theoretical analyses demonstrated that this parametric control relies on a mechanism where tonic inputs modulate the dynamics along non-linear manifolds while preserving their geometry. Comparisons with data from behaving monkeys confirmed the behavioral and neural signatures of this mechanism.
Affiliation(s)
- Manuel Beiran
- Laboratoire de Neurosciences Cognitives et Computationnelles, INSERM U960, Ecole Normale Superieure - PSL University, 75005 Paris, France; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Nicolas Meirhaeghe
- Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Institut de Neurosciences de la Timone (INT), UMR 7289, CNRS, Aix-Marseille Université, Marseille 13005, France
- Hansem Sohn
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Mehrdad Jazayeri
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
- Srdjan Ostojic
- Laboratoire de Neurosciences Cognitives et Computationnelles, INSERM U960, Ecole Normale Superieure - PSL University, 75005 Paris, France.
13
Do Q, Li Y, Kane GA, McGuire JT, Scott BB. Assessing evidence accumulation and rule learning in humans with an online game. J Neurophysiol 2023; 129:131-143. [PMID: 36475830] [DOI: 10.1152/jn.00124.2022]
Abstract
Evidence accumulation, an essential component of perception and decision making, is frequently studied with psychophysical tasks involving noisy or ambiguous stimuli. In these tasks, participants typically receive verbal or written instructions that describe the strategy that should be used to guide decisions. Although convenient and effective, explicit instructions can influence learning and decision making strategies and can limit comparisons with animal models, in which behaviors are reinforced through feedback. Here, we developed an online video game and nonverbal training pipeline, inspired by pulse-based tasks for rodents, as an alternative to traditional psychophysical tasks used to study evidence accumulation. Using this game, we collected behavioral data from hundreds of participants trained with an explicit description of the decision rule or with experiential feedback. Participants trained with feedback alone learned the game rules rapidly and used strategies and displayed biases similar to those of participants who received explicit instructions. Finally, by leveraging data across hundreds of participants, we show that perceptual judgments were well described by an accumulation process in which noise scaled nonlinearly with evidence, consistent with previous animal studies but inconsistent with diffusion models widely used to describe perceptual decisions in humans. These results challenge the conventional description of the accumulation process and suggest that online games provide a valuable platform to examine perceptual decision making and learning in humans. In addition, the feedback-based training pipeline developed for this game may be useful for evaluating perceptual decision making in human populations with difficulty following verbal instructions. NEW & NOTEWORTHY: Perceptual uncertainty sets critical constraints on our ability to accumulate evidence and make decisions; however, its sources remain unclear. We developed a video game, and feedback-based training pipeline, to study uncertainty during decision making. Leveraging choices from hundreds of subjects, we demonstrate that human choices are inconsistent with popular diffusion models of human decision making and instead are best fit by models in which perceptual uncertainty scales nonlinearly with the strength of sensory evidence.
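The modeling contrast at issue can be caricatured in a few lines. The sketch below is a deliberate simplification of ours, not the paper's fitted model: it compares choice accuracy when the trial-level noise is constant, as in standard diffusion models, versus when it scales as a power of the accumulated evidence.

```python
# Toy pulse accumulator: noise std scales as |evidence|**beta.
# beta = 0 recovers constant (diffusion-like) noise; pulse train is invented.
import numpy as np

rng = np.random.default_rng(0)

def p_correct(pulses, beta, n_trials=2000):
    """Fraction of trials whose noisy decision variable matches the true sign."""
    e = pulses.sum()                          # net evidence (right minus left)
    noise = rng.standard_normal(n_trials) * (abs(e) ** beta)
    return np.mean(np.sign(e + noise) == np.sign(e))

pulses = np.array([1, 1, -1, 1, 1, -1, 1])    # example pulse train
for beta in [0.0, 0.5, 1.0]:
    print(f"beta={beta}: P(correct) ≈ {p_correct(pulses, beta):.2f}")
```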
Affiliation(s)
- Quan Do
- Department of Psychological and Brain Sciences and Center for Systems Neuroscience, Boston University, Boston, Massachusetts
- Yutong Li
- Department of Psychological and Brain Sciences and Center for Systems Neuroscience, Boston University, Boston, Massachusetts
- Gary A Kane
- Department of Psychological and Brain Sciences and Center for Systems Neuroscience, Boston University, Boston, Massachusetts
- Joseph T McGuire
- Department of Psychological and Brain Sciences and Center for Systems Neuroscience, Boston University, Boston, Massachusetts
- Benjamin B Scott
- Department of Psychological and Brain Sciences and Center for Systems Neuroscience, Boston University, Boston, Massachusetts
14
Benjamin AS, Zhang LQ, Qiu C, Stocker AA, Kording KP. Efficient neural codes naturally emerge through gradient descent learning. Nat Commun 2022; 13:7972. [PMID: 36581618] [PMCID: PMC9800366] [DOI: 10.1038/s41467-022-35659-7]
Abstract
Human sensory systems are more sensitive to common features in the environment than uncommon features. For example, small deviations from the more frequently encountered horizontal orientations can be more easily detected than small deviations from the less frequent diagonal ones. Here we find that artificial neural networks trained to recognize objects also have patterns of sensitivity that match the statistics of features in images. To interpret these findings, we show mathematically that learning with gradient descent in neural networks preferentially creates representations that are more sensitive to common features, a hallmark of efficient coding. This effect occurs in systems with otherwise unconstrained coding resources, and additionally when learning towards both supervised and unsupervised objectives. This result demonstrates that efficient codes can naturally emerge from gradient-like learning.
Affiliation(s)
- Ari S Benjamin
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.
- Ling-Qi Zhang
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Cheng Qiu
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Alan A Stocker
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Konrad P Kording
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
15
Bordelon B, Pehlevan C. Population codes enable learning from few examples by shaping inductive bias. eLife 2022; 11:e78606. [PMID: 36524716] [PMCID: PMC9839349] [DOI: 10.7554/eLife.78606]
Abstract
Learning from a limited number of experiences requires suitable inductive biases. To identify how inductive biases are implemented in and shaped by neural codes, we analyze sample-efficient learning of arbitrary stimulus-response maps from arbitrary neural codes with biologically plausible readouts. We develop an analytical theory that predicts the generalization error of the readout as a function of the number of observed examples. Our theory illustrates in a mathematically precise way how the structure of population codes shapes inductive bias, and how a match between the code and the task is crucial for sample-efficient learning. It elucidates a bias to explain observed data with simple stimulus-response maps. Using recordings from the mouse primary visual cortex, we demonstrate the existence of an efficiency bias towards low-frequency orientation discrimination tasks for grating stimuli and low spatial frequency reconstruction tasks for natural images. We reproduce the discrimination bias in a simple model of primary visual cortex, and further show how invariances in the code to certain stimulus variations alter learning performance. We extend our methods to time-dependent neural codes and predict the sample efficiency of readouts from recurrent networks. We observe that many different codes can support the same inductive bias. By analyzing recordings from the mouse primary visual cortex, we demonstrate that biological codes have lower total activity than other codes with identical bias. Finally, we discuss implications of our theory in the context of recent developments in neuroscience and artificial intelligence. Overall, our study provides a concrete method for elucidating inductive biases of the brain and promotes sample-efficient learning as a general normative coding principle.
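The basic object of this theory, the generalization error of a readout as a function of the number of training examples, is easy to estimate numerically. Below is a sketch with a synthetic population code and task of our choosing, simplified from the paper's analytical treatment.

```python
# Empirical learning curve of a regularized linear readout from a toy
# population code. Tuning curves, task, and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_stim, n_neurons = 200, 100
stim = np.linspace(0, 2 * np.pi, n_stim, endpoint=False)
# toy tuning curves: von-Mises-like bumps over a circular stimulus
centers = rng.uniform(0, 2 * np.pi, n_neurons)
R = np.exp(3 * np.cos(stim[:, None] - centers[None, :]))  # stim x neuron
y = np.sin(stim)                                          # low-frequency task

def gen_error(P, lam=1e-3, reps=50):
    """Mean generalization error of a readout fit on P random examples."""
    errs = []
    for _ in range(reps):
        idx = rng.choice(n_stim, size=P, replace=False)
        w = np.linalg.solve(R[idx].T @ R[idx] + lam * np.eye(n_neurons),
                            R[idx].T @ y[idx])
        errs.append(np.mean((R @ w - y) ** 2))
    return np.mean(errs)

for P in [5, 10, 20, 40, 80]:
    print(f"P={P:3d}: generalization error {gen_error(P):.4f}")
```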
Affiliation(s)
- Blake Bordelon
- John A Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, United States
- Center for Brain Science, Harvard University, Cambridge, United States
- Cengiz Pehlevan
- John A Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, United States
- Center for Brain Science, Harvard University, Cambridge, United States
16
Pandey B, Pachitariu M, Brunton BW, Harris KD. Structured random receptive fields enable informative sensory encodings. PLoS Comput Biol 2022; 18:e1010484. [PMID: 36215307] [PMCID: PMC9584455] [DOI: 10.1371/journal.pcbi.1010484]
Abstract
Brains must represent the outside world so that animals survive and thrive. In early sensory systems, neural populations have diverse receptive fields structured to detect important features in inputs, yet significant variability has been ignored in classical models of sensory neurons. We model neuronal receptive fields as random, variable samples from parameterized distributions and demonstrate this model in two sensory modalities using data from insect mechanosensors and mammalian primary visual cortex. Our approach leads to a significant theoretical connection between the foundational concepts of receptive fields and random features, a leading theory for understanding artificial neural networks. The modeled neurons perform a randomized wavelet transform on inputs, which removes high frequency noise and boosts the signal. Further, these random feature neurons enable learning from fewer training samples and with smaller networks in artificial tasks. This structured random model of receptive fields provides a unifying, mathematically tractable framework to understand sensory encodings across both spatial and temporal domains.
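The random-feature view of receptive fields can be sketched directly: draw filters from a parameterized distribution and use them as a fixed expansion for a linear readout. The construction below, with 1D Gabor-like fields and a toy task, uses our own parameters and is illustrative only.

```python
# Structured random features: Gabor-like receptive fields sampled from a
# parameterized distribution, used as a fixed feature map. Task is invented.
import numpy as np

rng = np.random.default_rng(0)
d, n_features = 64, 300
t = np.arange(d)

# sample random structured receptive fields (center, frequency, phase, width)
centers = rng.uniform(0, d, n_features)
freqs = rng.uniform(0.05, 0.5, n_features)
phases = rng.uniform(0, 2 * np.pi, n_features)
widths = rng.uniform(3, 10, n_features)
W = np.array([np.exp(-((t - c) ** 2) / (2 * s ** 2)) * np.cos(f * t + p)
              for c, f, p, s in zip(centers, freqs, phases, widths)])

# toy task: classify signals by the sign of a low-frequency component
X = rng.standard_normal((400, d))
y = np.sign(np.cos(0.1 * t) @ X.T)
Z = np.maximum(W @ X.T, 0).T                     # fixed nonlinear expansion

w = np.linalg.lstsq(Z[:200], y[:200], rcond=None)[0]   # linear readout
print("test accuracy:", np.mean(np.sign(Z[200:] @ w) == y[200:]))
```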
Affiliation(s)
- Biraj Pandey
- Department of Applied Mathematics, University of Washington, Seattle, Washington, United States of America
- Marius Pachitariu
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America
- Bingni W. Brunton
- Department of Biology, University of Washington, Seattle, Washington, United States of America
- Kameron Decker Harris
- Department of Computer Science, Western Washington University, Bellingham, Washington, United States of America
17
Zavatone-Veth JA, Tong WL, Pehlevan C. Contrasting random and learned features in deep Bayesian linear regression. Phys Rev E 2022; 105:064118. [PMID: 35854590] [DOI: 10.1103/PhysRevE.105.064118]
Abstract
Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch. We show that both models display samplewise double-descent behavior in the presence of label noise. Random feature models can also display modelwise double descent if there are narrow bottleneck layers, while deep networks do not show these divergences. Random feature models can have particular widths that are optimal for generalization at a given data density, while making neural networks as wide or as narrow as possible is always optimal. Moreover, we show that the leading-order correction to the kernel-limit learning curve cannot distinguish between random feature models and deep networks in which all layers are trained. Taken together, our findings begin to elucidate how architectural details affect generalization performance in this simple class of deep regression models.
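Samplewise double descent of the kind described here is easy to reproduce in the simplest linear setting. Below is a quick numeric sketch with toy dimensions of our choosing: with label noise and a min-norm least-squares fit, the test error peaks near the interpolation threshold P = N.

```python
# Samplewise double descent for min-norm least squares with label noise.
# Dimensions, noise level, and repetitions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N = 50                                            # number of parameters
w_true = rng.standard_normal(N) / np.sqrt(N)

def test_err(P, reps=100, sigma=0.5):
    """Average parameter-recovery error (population risk up to a constant)."""
    errs = []
    for _ in range(reps):
        X = rng.standard_normal((P, N))
        y = X @ w_true + sigma * rng.standard_normal(P)
        w = np.linalg.lstsq(X, y, rcond=None)[0]  # min-norm least squares
        errs.append(np.sum((w - w_true) ** 2))
    return np.mean(errs)

for P in [10, 25, 45, 50, 55, 100, 200]:
    print(f"P={P:3d}: test error {test_err(P):.3f}")   # peak near P = N = 50
```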
Affiliation(s)
- Jacob A Zavatone-Veth
- Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA
- Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138, USA
- William L Tong
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, USA
- Cengiz Pehlevan
- Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138, USA
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, USA
18
Ariosto S, Pacelli R, Ginelli F, Gherardi M, Rotondo P. Universal mean-field upper bound for the generalization gap of deep neural networks. Phys Rev E 2022; 105:064309. [PMID: 35854557] [DOI: 10.1103/PhysRevE.105.064309]
Abstract
Modern deep neural networks (DNNs) represent a formidable challenge for theorists: according to the commonly accepted probabilistic framework that describes their performance, these architectures should overfit due to the huge number of parameters to train, but in practice they do not. Here we employ results from replica mean field theory to compute the generalization gap of machine learning models with quenched features, in the teacher-student scenario and for regression problems with quadratic loss function. Notably, this framework includes the case of DNNs where the last layer is optimized given a specific realization of the remaining weights. We show how these results, combined with ideas from statistical learning theory, provide a stringent asymptotic upper bound on the generalization gap of fully trained DNNs as a function of the size of the dataset P. In particular, in the limit of large P and N_out (where N_out is the size of the last layer) with N_out ≪ P, the generalization gap approaches zero faster than 2N_out/P, for any choice of both architecture and teacher function. Notably, this result greatly improves existing bounds from statistical learning theory. We test our predictions on a broad range of architectures, from toy fully connected neural networks with few hidden layers to state-of-the-art deep convolutional neural networks.
Affiliation(s)
- S Ariosto
- Dipartimento di Scienza e Alta Tecnologia and Center for Nonlinear and Complex Systems, Università degli Studi dell'Insubria, Via Valleggio 11, 22100 Como, Italy
- I.N.F.N. Sezione di Milano, Via Celoria 16, 20133 Milan, Italy
- R Pacelli
- Dipartimento di Scienza Applicata e Tecnologia, Politecnico di Torino, 10129 Turin, Italy
- F Ginelli
- Dipartimento di Scienza e Alta Tecnologia and Center for Nonlinear and Complex Systems, Università degli Studi dell'Insubria, Via Valleggio 11, 22100 Como, Italy
- I.N.F.N. Sezione di Milano, Via Celoria 16, 20133 Milan, Italy
- M Gherardi
- I.N.F.N. Sezione di Milano, Via Celoria 16, 20133 Milan, Italy
- Università degli Studi di Milano, Via Celoria 16, 20133 Milan, Italy
- P Rotondo
- I.N.F.N. Sezione di Milano, Via Celoria 16, 20133 Milan, Italy
- Università degli Studi di Milano, Via Celoria 16, 20133 Milan, Italy
19
Du Y, Tu Z, Yuan X, Tao D. Efficient Measure for the Expressivity of Variational Quantum Algorithms. Phys Rev Lett 2022; 128:080506. [PMID: 35275658] [DOI: 10.1103/PhysRevLett.128.080506]
Abstract
The superiority of variational quantum algorithms (VQAs) such as quantum neural networks (QNNs) and variational quantum eigensolvers (VQEs) heavily depends on the expressivity of the employed Ansätze. Namely, a simple Ansatz is insufficient to capture the optimal solution, while an intricate Ansatz leads to the hardness of trainability. Despite its fundamental importance, an effective strategy for measuring the expressivity of VQAs remains largely unknown. Here, we exploit an advanced tool in statistical learning theory, the covering number, to study the expressivity of VQAs. Particularly, we first exhibit how the expressivity of VQAs with arbitrary Ansätze is upper bounded by the number of quantum gates and the measurement observable. We next explore the expressivity of VQAs on near-term quantum chips, where the system noise is considered. We observe an exponential decay of the expressivity with increasing circuit depth. We also utilize the achieved expressivity to analyze the generalization of QNNs and the accuracy of VQE. We numerically verify our theory employing VQAs with different levels of expressivity. Our Letter opens an avenue toward a quantitative understanding of the expressivity of VQAs.
Affiliation(s)
- Yuxuan Du
- JD Explore Academy, Beijing 101111, China
- Zhuozhuo Tu
- School of Computer Science, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
- Xiao Yuan
- Center on Frontiers of Computing Studies, Department of Computer Science, Peking University, Beijing 100871, China
20
Asnaashari K, Krems RV. Gradient domain machine learning with composite kernels: improving the accuracy of PES and force fields for large molecules. Mach Learn: Sci Technol 2021. [DOI: 10.1088/2632-2153/ac3845]
Abstract
The generalization accuracy of machine learning models of potential energy surfaces (PES) and force fields (FF) for large polyatomic molecules can be improved either by increasing the number of training points or by improving the models. In order to build accurate models based on expensive ab initio calculations, much of recent work has focused on the latter. In particular, it has been shown that gradient domain machine learning (GDML) models produce accurate results for high-dimensional molecular systems with a small number of ab initio calculations. The present work extends GDML to models with composite kernels built to maximize inference from a small number of molecular geometries. We illustrate that GDML models can be improved by increasing the complexity of underlying kernels through a greedy search algorithm using the Bayesian information criterion as the model selection metric. We show that this requires including anisotropy into kernel functions and produces models with significantly smaller generalization errors. The results are presented for ethanol, uracil, malonaldehyde and aspirin. For aspirin, the model with composite kernels trained by forces at 1000 randomly sampled molecular geometries produces a global 57-dimensional PES with a mean absolute accuracy of 0.177 kcal mol⁻¹ (61.9 cm⁻¹) and FFs with a mean absolute error of 0.457 kcal mol⁻¹ Å⁻¹.
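The greedy, BIC-guided kernel search can be sketched with standard tools. The code below is our condensed construction, not the paper's implementation: it fits energies with ordinary Gaussian process regression rather than training on forces as GDML does, and the data, base kernels, and number of growth rounds are placeholders. It grows a composite kernel by sums and products of base kernels, keeping the BIC-best candidate each round.

```python
# Greedy composite-kernel search with BIC as the selection metric.
# Data and kernel grammar below are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (60, 2))                    # placeholder "geometries"
y = np.sin(X[:, 0]) * np.exp(-X[:, 1] ** 2) + 0.01 * rng.standard_normal(60)

base = [RBF(), Matern(nu=2.5), RationalQuadratic()]

def bic(kernel):
    """BIC = -2 * log marginal likelihood + (#kernel params) * log(#data)."""
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    k = len(gp.kernel_.theta)
    return -2.0 * gp.log_marginal_likelihood_value_ + k * np.log(len(y))

# start from the best base kernel ...
best_kernel = min(base, key=bic)
best_bic = bic(best_kernel)

# ... then greedily extend by sums and products, keeping BIC improvements
for _ in range(2):
    for cand in [best_kernel + b for b in base] + [best_kernel * b for b in base]:
        score = bic(cand)
        if score < best_bic:
            best_bic, best_kernel = score, cand

print("selected kernel:", best_kernel, "| BIC:", round(best_bic, 2))
```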