1. Anisetti VR, Kandala A, Scellier B, Schwarz JM. Frequency Propagation: Multimechanism Learning in Nonlinear Physical Networks. Neural Comput 2024;36:596-620. [PMID: 38457749] [DOI: 10.1162/neco_a_01648]
Abstract
We introduce frequency propagation, a learning algorithm for nonlinear physical networks. In a resistive electrical circuit with variable resistors, an activation current is applied at a set of input nodes at one frequency, and an error current is applied at a set of output nodes at another frequency. The voltage response of the circuit to these boundary currents is the superposition of an activation signal and an error signal, whose coefficients can be read off at different frequencies in the frequency domain. Each conductance is updated in proportion to the product of the two coefficients. The learning rule is local, and we prove that it performs gradient descent on a loss function. We argue that frequency propagation is an instance of a multimechanism learning strategy for physical networks, whether resistive, elastic, or flow networks. Multimechanism learning strategies incorporate at least two physical quantities, potentially governed by independent physical mechanisms, to act as activation and error signals during training. Locally available information about these two signals is then used to update the trainable parameters so as to perform gradient descent. We demonstrate how earlier work implementing learning via chemical signaling in flow networks (Anisetti, Scellier, et al., 2023) also falls under the rubric of multimechanism learning.
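As a toy illustration of the local rule the abstract describes, here is a minimal NumPy sketch for a linear resistive network. Because the network is linear, the two frequency components of the voltage response can be recovered by superposition, i.e., by solving for the activation and error boundary currents separately; the topology, function names, current values, and the sign convention of the update are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def laplacian(edges, g, n):
    """Weighted graph Laplacian of a resistive network with edge conductances g."""
    L = np.zeros((n, n))
    for (i, j), ge in zip(edges, g):
        L[i, i] += ge
        L[j, j] += ge
        L[i, j] -= ge
        L[j, i] -= ge
    return L

def node_voltages(edges, g, n, currents):
    """Voltage response to injected boundary currents, with node 0 grounded."""
    L = laplacian(edges, g, n)
    v = np.zeros(n)
    v[1:] = np.linalg.solve(L[1:, 1:], currents[1:])
    return v

# Hypothetical toy circuit: 4 nodes, 5 variable resistors.
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
g = np.ones(len(edges))                   # trainable conductances
n, lr = 4, 0.1

i_act = np.array([0.0, 1.0, 0.0, 0.0])    # activation current at input node 1
i_err = np.array([0.0, 0.0, 0.0, 0.01])   # weak error current at output node 3

# Superposition stands in for demodulating the two frequency components
# of a single time-domain response.
v_act = node_voltages(edges, g, n, i_act)
v_err = node_voltages(edges, g, n, i_err)

# Local update: each conductance changes in proportion to the product of
# the activation and error voltage drops across its own edge.
for e, (i, j) in enumerate(edges):
    g[e] -= lr * (v_act[i] - v_act[j]) * (v_err[i] - v_err[j])
```

The point of the sketch is locality: the update for each conductance uses only the two signals measured across that resistor, yet (per the paper's result) the ensemble of such updates descends a global loss.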
Affiliation(s)
- Ananth Kandala
- Department of Physics, University of Florida, Gainesville, FL 32611, U.S.A.
- J M Schwarz
- Physics Department, Syracuse University, Syracuse, NY 13244, U.S.A.
- Indian Creek Farm, Ithaca, NY 14850, U.S.A.
3. Laborieux A, Ernoult M, Scellier B, Bengio Y, Grollier J, Querlioz D. Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing Its Gradient Estimator Bias. Front Neurosci 2021;15:633674. [PMID: 33679315] [PMCID: PMC7930909] [DOI: 10.3389/fnins.2021.633674]
Abstract
Equilibrium Propagation is a biologically inspired algorithm that trains convergent recurrent neural networks with a local learning rule. This approach is a promising route toward learning-capable neuromorphic systems and comes with strong theoretical guarantees. Equilibrium Propagation operates in two phases: the network first evolves freely and is then "nudged" toward a target; the weights of the network are then updated based solely on the states of the neurons that they connect. The weight updates of Equilibrium Propagation have been shown mathematically to approach those of Backpropagation Through Time (BPTT), the mainstream approach to training recurrent neural networks, in the limit of infinitesimally weak nudging. In practice, however, the standard implementation of Equilibrium Propagation does not scale to visual tasks harder than MNIST. In this work, we show that a bias in the gradient estimate of Equilibrium Propagation, inherent in the use of finite nudging, is responsible for this phenomenon, and that canceling it allows training deep convolutional neural networks. We show that this bias can be greatly reduced by using symmetric nudging (a positive nudging and a negative one). We also generalize Equilibrium Propagation to the cross-entropy loss (as opposed to the squared error). As a result of these advances, we achieve a test error of 11.7% on CIFAR-10, which approaches the one achieved by BPTT and is a major improvement over standard Equilibrium Propagation, which gives 86% test error. We also apply these techniques to train an architecture with unidirectional forward and backward connections, yielding a 13.2% test error. These results highlight Equilibrium Propagation as a compelling biologically plausible approach to computing error gradients in deep neuromorphic systems.
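To make the symmetric-nudging estimator concrete, here is a minimal scalar sketch (a toy construction of ours, not the paper's convolutional setup): a state s relaxes to an equilibrium of a quadratic energy E plus a nudging term beta*C, and the weight update contrasts a positively and a negatively nudged phase, which cancels the O(beta) term of the one-sided estimator.

```python
import numpy as np

def relax(w, x, y, beta, steps=2000, dt=0.05):
    """Settle the state s to an equilibrium of E(s) + beta * C(s) by
    gradient descent, with toy choices E(s) = s**2/2 - w*x*s and
    C(s) = (s - y)**2 / 2."""
    s = 0.0
    for _ in range(steps):
        dE_ds = s - w * x
        dC_ds = s - y
        s -= dt * (dE_ds + beta * dC_ds)
    return s

w, x, y = 0.5, 1.0, 2.0        # weight, input, target (all hypothetical)
beta, lr = 0.1, 0.2

for step in range(100):
    s_pos = relax(w, x, y, +beta)   # positively nudged phase
    s_neg = relax(w, x, y, -beta)   # negatively nudged phase
    # dE/dw = -x * s; the symmetric estimator of dLoss/dw contrasts the
    # two nudged phases, reducing the bias from O(beta) to O(beta**2):
    grad_w = ((-x * s_pos) - (-x * s_neg)) / (2 * beta)
    w -= lr * grad_w                # local gradient-descent update

print(w * x)  # converges toward the target y
```

Replacing s_neg with the free equilibrium at beta = 0 recovers the standard one-sided estimator, which still learns this toy problem but carries an O(beta) bias; in deeper networks that bias is what the paper identifies as the obstacle to scaling.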
Affiliation(s)
- Axel Laborieux
- Université Paris-Saclay, CNRS, Centre de Nanosciences et de Nanotechnologies, Palaiseau, France
- Maxence Ernoult
- Université Paris-Saclay, CNRS, Centre de Nanosciences et de Nanotechnologies, Palaiseau, France
- Unité Mixte de Physique, CNRS, Thales, Université Paris-Saclay, Palaiseau, France
- Mila, Université de Montréal, Montréal, QC, Canada
- Yoshua Bengio
- Mila, Université de Montréal, Montréal, QC, Canada
- Canadian Institute for Advanced Research, Toronto, ON, Canada
- Julie Grollier
- Unité Mixte de Physique, CNRS, Thales, Université Paris-Saclay, Palaiseau, France
- Damien Querlioz
- Université Paris-Saclay, CNRS, Centre de Nanosciences et de Nanotechnologies, Palaiseau, France
4. Richards BA, Lillicrap TP, Beaudoin P, Bengio Y, Bogacz R, Christensen A, Clopath C, Costa RP, de Berker A, Ganguli S, Gillon CJ, Hafner D, Kepecs A, Kriegeskorte N, Latham P, Lindsay GW, Miller KD, Naud R, Pack CC, Poirazi P, Roelfsema P, Sacramento J, Saxe A, Scellier B, Schapiro AC, Senn W, Wayne G, Yamins D, Zenke F, Zylberberg J, Therien D, Kording KP. A deep learning framework for neuroscience. Nat Neurosci 2019;22:1761-1770. [PMID: 31659335] [PMCID: PMC7115933] [DOI: 10.1038/s41593-019-0520-2]
Abstract
Systems neuroscience seeks explanations for how the brain implements a wide variety of perceptual, cognitive and motor tasks. Conversely, artificial intelligence attempts to design computational systems based on the tasks they will have to solve. In artificial neural networks, the three components specified by design are the objective functions, the learning rules and the architectures. With the growing success of deep learning, which utilizes brain-inspired architectures, these three designed components have increasingly become central to how we model, engineer and optimize complex artificial learning systems. Here we argue that a greater focus on these components would also benefit systems neuroscience. We give examples of how this optimization-based framework can drive theoretical and experimental progress in neuroscience. We contend that this principled perspective on systems neuroscience will help to generate more rapid progress.
Affiliation(s)
- Blake A Richards
- Mila, Montréal, Quebec, Canada
- School of Computer Science, McGill University, Montréal, Quebec, Canada
- Department of Neurology & Neurosurgery, McGill University, Montréal, Quebec, Canada
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Timothy P Lillicrap
- DeepMind, Inc., London, UK
- Centre for Computation, Mathematics and Physics in the Life Sciences and Experimental Biology, University College London, London, UK
- Yoshua Bengio
- Mila, Montréal, Quebec, Canada
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Université de Montréal, Montréal, Quebec, Canada
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, UK
- Amelia Christensen
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Claudia Clopath
- Department of Bioengineering, Imperial College London, London, UK
- Rui Ponte Costa
- Computational Neuroscience Unit, School of Computer Science, Electrical and Electronic Engineering, and Engineering Maths, University of Bristol, Bristol, UK
- Department of Physiology, Universität Bern, Bern, Switzerland
- Surya Ganguli
- Department of Applied Physics, Stanford University, Stanford, CA, USA
- Google Brain, Mountain View, CA, USA
- Colleen J Gillon
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Danijar Hafner
- Google Brain, Mountain View, CA, USA
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Adam Kepecs
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Nikolaus Kriegeskorte
- Department of Psychology and Neuroscience, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Peter Latham
- Gatsby Computational Neuroscience Unit, University College London, London, UK
- Grace W Lindsay
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Center for Theoretical Neuroscience, Columbia University, New York, NY, USA
- Kenneth D Miller
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Center for Theoretical Neuroscience, Columbia University, New York, NY, USA
- Department of Neuroscience, College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Richard Naud
- University of Ottawa Brain and Mind Institute, Ottawa, Ontario, Canada
- Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Christopher C Pack
- Department of Neurology & Neurosurgery, McGill University, Montréal, Quebec, Canada
- Panayiota Poirazi
- Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology-Hellas (FORTH), Heraklion, Crete, Greece
- Pieter Roelfsema
- Department of Vision & Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- João Sacramento
- Institute of Neuroinformatics, ETH Zürich and University of Zürich, Zürich, Switzerland
- Andrew Saxe
- Department of Experimental Psychology, University of Oxford, Oxford, UK
- Benjamin Scellier
- Mila, Montréal, Quebec, Canada
- Université de Montréal, Montréal, Quebec, Canada
- Anna C Schapiro
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Walter Senn
- Department of Physiology, Universität Bern, Bern, Switzerland
- Daniel Yamins
- Department of Psychology, Stanford University, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Friedemann Zenke
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- Centre for Neural Circuits and Behaviour, University of Oxford, Oxford, UK
- Joel Zylberberg
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Department of Physics and Astronomy, York University, Toronto, Ontario, Canada
- Center for Vision Research, York University, Toronto, Ontario, Canada
- Konrad P Kording
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, USA