1. Ricci M, Cadène R, Serre T. Same-different conceptualization: a machine vision perspective. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2020.08.008]
2. George D, Lázaro-Gredilla M, Guntupalli JS. From CAPTCHA to Commonsense: How Brain Can Teach Us About Artificial Intelligence. Front Comput Neurosci 2020; 14:554097. [PMID: 33192426] [PMCID: PMC7645629] [DOI: 10.3389/fncom.2020.554097]
Abstract
Despite recent progress in deep-learning-based AI on narrow tasks, we are not close to human intelligence in its flexibility, versatility, and efficiency. Efficient learning and effective generalization come from inductive biases, and building Artificial General Intelligence (AGI) is an exercise in finding the right set of inductive biases that make fast learning possible while being general enough to apply widely across the tasks humans excel at. To make progress toward AGI, we argue that we can look to the human brain for such inductive biases and principles of generalization. To that end, we propose a strategy for gaining insights from the brain by simultaneously looking at the world it acts upon and at the computational framework needed to support efficient learning and generalization. We present a neuroscience-inspired generative model of vision as a case study for such an approach and discuss some open problems on the path to AGI.
3. Kass RE, Amari SI, Arai K, Brown EN, Diekman CO, Diesmann M, Doiron B, Eden UT, Fairhall AL, Fiddyment GM, Fukai T, Grün S, Harrison MT, Helias M, Nakahara H, Teramae JN, Thomas PJ, Reimers M, Rodu J, Rotstein HG, Shea-Brown E, Shimazaki H, Shinomoto S, Yu BM, Kramer MA. Computational Neuroscience: Mathematical and Statistical Perspectives. Annu Rev Stat Appl 2018; 5:183-214. [PMID: 30976604] [PMCID: PMC6454918] [DOI: 10.1146/annurev-statistics-041715-033733]
Abstract
Mathematical and statistical models have played important roles in neuroscience, especially by describing the electrical activity of neurons recorded individually, or collectively across large networks. As the field moves forward rapidly, new challenges are emerging. For maximal effectiveness, those working to advance computational neuroscience will need to appreciate and exploit the complementary strengths of mechanistic theory and the statistical paradigm.
Affiliation(s)
- Robert E Kass: Carnegie Mellon University, Pittsburgh, PA, USA 15213
- Shun-Ichi Amari: RIKEN Brain Science Institute, Wako, Saitama Prefecture, Japan 351-0198
- Emery N Brown: Massachusetts Institute of Technology, Cambridge, MA, USA 02139; Harvard Medical School, Boston, MA, USA 02115
- Markus Diesmann: Jülich Research Centre, Jülich, Germany 52428; RWTH Aachen University, Aachen, Germany 52062
- Brent Doiron: University of Pittsburgh, Pittsburgh, PA, USA 15260
- Uri T Eden: Boston University, Boston, MA, USA 02215
- Tomoki Fukai: RIKEN Brain Science Institute, Wako, Saitama Prefecture, Japan 351-0198
- Sonja Grün: Jülich Research Centre, Jülich, Germany 52428; RWTH Aachen University, Aachen, Germany 52062
- Moritz Helias: Jülich Research Centre, Jülich, Germany 52428; RWTH Aachen University, Aachen, Germany 52062
- Hiroyuki Nakahara: RIKEN Brain Science Institute, Wako, Saitama Prefecture, Japan 351-0198
- Peter J Thomas: Case Western Reserve University, Cleveland, OH, USA 44106
- Mark Reimers: Michigan State University, East Lansing, MI, USA 48824
- Jordan Rodu: Carnegie Mellon University, Pittsburgh, PA, USA 15213
- Hideaki Shimazaki: Honda Research Institute Japan, Wako, Saitama Prefecture, Japan 351-0188; Kyoto University, Kyoto, Kyoto Prefecture, Japan 606-8502
- Byron M Yu: Carnegie Mellon University, Pittsburgh, PA, USA 15213
4. George D, Lehrach W, Kansky K, Lázaro-Gredilla M, Laan C, Marthi B, Lou X, Meng Z, Liu Y, Wang H, Lavin A, Phoenix DS. A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science 2017; 358:eaag2612. [PMID: 29074582] [DOI: 10.1126/science.aag2612]
Abstract
Learning from a few examples and generalizing to markedly different situations are capabilities of human visual intelligence that are yet to be matched by leading machine learning models. By drawing inspiration from systems neuroscience, we introduce a probabilistic generative model for vision in which message-passing-based inference handles recognition, segmentation, and reasoning in a unified way. The model demonstrates excellent generalization and occlusion-reasoning capabilities and outperforms deep neural networks on a challenging scene text recognition benchmark while being 300-fold more data efficient. In addition, the model fundamentally breaks the defense of modern text-based CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) by generatively segmenting characters without CAPTCHA-specific heuristics. Our model emphasizes aspects such as data efficiency and compositionality that may be important in the path toward general artificial intelligence.
Affiliation(s)
- Dileep George, Ken Kansky, Xinghua Lou, Zhaoshi Meng, Yi Liu, Huayan Wang, Alex Lavin: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
5. Ullman S, Assif L, Fetaya E, Harari D. Atoms of recognition in human and computer vision. Proc Natl Acad Sci U S A 2016; 113:2744-9. [PMID: 26884200] [PMCID: PMC4790978] [DOI: 10.1073/pnas.1513198113]
Abstract
Discovering the visual features and representations used by the brain to recognize objects is a central problem in the study of vision. Recently, neural network models of visual object recognition, including biological and deep network models, have shown remarkable progress and have begun to rival human performance in some challenging tasks. These models are trained on image examples and learn to extract features and representations and to use them for categorization. It remains unclear, however, whether the representations and learning processes discovered by current models are similar to those used by the human visual system. Here we show, by introducing and using minimal recognizable images, that the human visual system uses features and processes that are not used by current models and that are critical for recognition. We found by psychophysical studies that at the level of minimal recognizable images a minute change in the image can have a drastic effect on recognition, thus identifying features that are critical for the task. Simulations then showed that current models cannot explain this sensitivity to precise feature configurations and, more generally, do not learn to recognize minimal images at a human level. The role of the features shown here is revealed uniquely at the minimal level, where the contribution of each feature is essential. A full understanding of the learning and use of such features will extend our understanding of visual recognition and its cortical mechanisms and will enhance the capacity of computational models to learn from visual experience and to deal with recognition and detailed image interpretation.
Affiliation(s)
- Shimon Ullman: Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Liav Assif: Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Ethan Fetaya: Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Daniel Harari: Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; McGovern Institute for Brain Research, Cambridge, MA 02139
6. Establishing a Statistical Link between Network Oscillations and Neural Synchrony. PLoS Comput Biol 2015; 11:e1004549. [PMID: 26465621] [PMCID: PMC4605746] [DOI: 10.1371/journal.pcbi.1004549]
Abstract
Pairs of active neurons frequently fire action potentials or "spikes" nearly synchronously (i.e., within 5 ms of each other). This spike synchrony may occur by chance, based solely on the neurons' fluctuating firing patterns, or it may occur too frequently to be explicable by chance alone. When spike synchrony above chance levels is present, it may subserve computation for a specific cognitive process, or it could be an irrelevant byproduct of such computation. Either way, spike synchrony is a feature of neural data that should be explained. A point process regression framework has been developed previously for this purpose, using generalized linear models (GLMs). In this framework, the observed number of synchronous spikes is compared to the number predicted by chance under varying assumptions about the factors that affect each individual neuron's firing-rate function. An important possible source of spike synchrony is network-wide oscillations, which may provide an essential mechanism of network information flow. To establish the statistical link between spike synchrony and network-wide oscillations, we have integrated oscillatory field potentials into our point process regression framework. We first extended a previously published model of spike-field association and showed that we could recover phase relationships between oscillatory field potentials and firing rates. We then used this new framework to demonstrate the statistical relationship between oscillatory field potentials and spike synchrony in 1) simulated neurons, 2) in vitro recordings of hippocampal CA1 pyramidal cells, and 3) in vivo recordings of neocortical V4 neurons. Our results provide a rigorous method for establishing a statistical link between network oscillations and neural synchrony.
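The chance-level comparison at the heart of this framework can be sketched in a few lines. The toy simulation below (not the paper's GLM machinery) draws two conditionally independent neurons with known firing-rate functions and compares the observed count of same-bin coincidences to the count expected under independence; the rates, bin size, and oscillation frequency are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 200_000  # 1 ms bins

# Oscillation-modulated firing rates (spike probability per bin)
r1 = 0.02 + 0.01 * np.sin(np.linspace(0, 40 * np.pi, n_bins))
r2 = 0.03 + 0.01 * np.cos(np.linspace(0, 40 * np.pi, n_bins))

# Two neurons that are conditionally independent given their rates
s1 = rng.random(n_bins) < r1
s2 = rng.random(n_bins) < r2

observed_sync = np.sum(s1 & s2)   # same-bin coincidences
expected_sync = np.sum(r1 * r2)   # chance level under independence
# Standardized excess synchrony; near 0 here since no fine-time coupling
excess = (observed_sync - expected_sync) / np.sqrt(expected_sync)
```

Note that even though both rates oscillate, conditional independence keeps the excess synchrony near zero; a real analysis would fit the rate functions with a GLM rather than assume them known.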
7. Ambiguity and nonidentifiability in the statistical analysis of neural codes. Proc Natl Acad Sci U S A 2015; 112:6455-60. [PMID: 25934918] [DOI: 10.1073/pnas.1506400112]
Abstract
Many experimental studies of neural coding rely on a statistical interpretation of the theoretical notion of the rate at which a neuron fires spikes. For example, neuroscientists often ask, "Does a population of neurons exhibit more synchronous spiking than one would expect from the covariability of their instantaneous firing rates?" For another example, "How much of a neuron's observed spiking variability is caused by the variability of its instantaneous firing rate, and how much is caused by spike timing variability?" However, a neuron's theoretical firing rate is not necessarily well-defined. Consequently, neuroscientific questions involving the theoretical firing rate do not have a meaning in isolation but can only be interpreted in light of additional statistical modeling choices. Ignoring this ambiguity can lead to inconsistent reasoning or wayward conclusions. We illustrate these issues with examples drawn from the neural-coding literature.
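One way to see this nonidentifiability concretely: the same distribution of spike counts arises from a "variable rate, Poisson spiking" story and from a "fixed rate, extra count variability" story, so counts alone cannot apportion variability between the rate and the spiking. The parameter values below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 100_000
k, theta = 4.0, 2.5  # gamma shape/scale; mean rate = k * theta = 10

# Story A: "rate variability" -- a rate is drawn per trial, then Poisson spiking
rates = rng.gamma(k, theta, n_trials)
counts_a = rng.poisson(rates)

# Story B: "spiking variability" -- no rate variation, overdispersed counts
# (gamma-mixed Poisson is exactly negative binomial with p = 1 / (1 + theta))
p = 1.0 / (1.0 + theta)
counts_b = rng.negative_binomial(k, p, n_trials)

# The two stories are observationally identical at the level of spike counts
mean_a, mean_b = counts_a.mean(), counts_b.mean()
var_a, var_b = counts_a.var(), counts_b.var()
```

Both samples have mean about 10 and variance about 35; no statistic of the counts can say which story generated them, which is the abstract's point about additional modeling choices.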
9. Rodríguez-Sánchez AJ, Tsotsos JK. The roles of endstopped and curvature tuned computations in a hierarchical representation of 2D shape. PLoS One 2012; 7:e42058. [PMID: 22912683] [PMCID: PMC3415424] [DOI: 10.1371/journal.pone.0042058]
Abstract
That shape is important for perception has been known for almost a thousand years (thanks to Alhazen in 1083) and has been a subject of study ever since by scientists and philosophers (such as Descartes, Helmholtz, or the Gestalt psychologists). Shapes are important object descriptors. If there were any remote doubt regarding the importance of shape, recent experiments have shown that intermediate areas of primate visual cortex such as V2, V4, and TEO are involved in analyzing shape features such as corners and curvatures. The primate brain appears to perform a wide variety of complex tasks by means of simple operations. These operations are applied across several layers of neurons, representing increasingly complex, abstract intermediate processing stages. Recently, new models have attempted to emulate the human visual system. However, the role and importance of intermediate representations in the visual cortex have not been adequately studied in computational modeling. This paper proposes a model of shape-selective neurons whose shape selectivity is achieved through intermediate layers of visual representation not previously fully explored. We hypothesize that hypercomplex (also known as endstopped) neurons play a critical role in achieving shape selectivity and show how shape-selective neurons may be modeled by integrating endstopping and curvature computations. This model, a representational and computational system for the detection of 2-dimensional object silhouettes that we term 2DSIL, provides a highly accurate fit with neural data and replicates responses from neurons in area V4 with an average of 83% accuracy. We successfully test a biologically plausible hypothesis on how to connect early representations based on Gabor or Difference of Gaussian filters with later representations closer to object categories, without the need for a learning phase as in most recent models.
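As a toy illustration of the endstopping idea (not the paper's 2DSIL model), a hypothetical endstopped unit can be sketched as an excitatory center segment with inhibitory zones at both ends along the preferred orientation: the unit stays silent along a long bar's interior but fires where the bar terminates. The receptive-field sizes, inhibition weight, and test image are all made-up values.

```python
import numpy as np

def endstopped_response(image, row, col, w=0.5):
    """Excitatory center segment minus inhibition from collinear end zones."""
    center = image[row, col - 2 : col + 3].sum()       # 5-pixel center
    left_end = image[row, col - 7 : col - 2].sum()     # 5-pixel end zone
    right_end = image[row, col + 3 : col + 8].sum()    # 5-pixel end zone
    return max(center - w * (left_end + right_end), 0.0)

# A long horizontal bar: in the interior both end zones are stimulated and
# cancel the center; near the termination one end zone is empty.
img = np.zeros((11, 21))
img[5, 2:19] = 1.0                           # bar spanning columns 2..18
resp_mid = endstopped_response(img, 5, 10)   # interior of the bar
resp_end = endstopped_response(img, 5, 16)   # near the bar's right end
```

With these numbers the interior response is 0.0 and the end response is 2.5, i.e., the unit signals line terminations, which is the building block the paper combines with curvature computations.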
10. Crouzet SM, Serre T. What are the Visual Features Underlying Rapid Object Recognition? Front Psychol 2011; 2:326. [PMID: 22110461] [PMCID: PMC3216029] [DOI: 10.3389/fpsyg.2011.00326]
Abstract
Research progress in machine vision has been very significant in recent years. Robust face detection and identification algorithms are already readily available to consumers, and modern computer vision algorithms for generic object recognition are now coping with the richness and complexity of natural visual scenes. Unlike early vision models of object recognition that emphasized the role of figure-ground segmentation and spatial information between parts, recent successful approaches are based on the computation of loose collections of image features without prior segmentation or any explicit encoding of spatial relations. While these models remain simplistic models of visual processing, they suggest that, in principle, bottom-up activation of a loose collection of image features could support the rapid recognition of natural object categories and provide an initial coarse visual representation before more complex visual routines and attentional mechanisms take place. Focusing on biologically plausible computational models of (bottom-up) pre-attentive visual recognition, we review some of the key visual features that have been described in the literature. We discuss the consistency of these feature-based representations with classical theories from visual psychology and test their ability to account for human performance on a rapid object categorization task.
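The "loose collection of image features" idea can be sketched as a bag-of-features representation: assign every local patch to its nearest codeword and keep only the histogram of assignments, discarding spatial relations entirely. The random codebook, patch size, and test images below are arbitrary stand-ins; a real system would learn the codebook from data and use richer descriptors.

```python
import numpy as np

rng = np.random.default_rng(3)

def bag_of_features(image, codebook, patch=3):
    """Normalized histogram of nearest-codeword assignments over all
    patches -- an orderless representation with no spatial encoding."""
    h, w = image.shape
    hist = np.zeros(len(codebook))
    for i in range(h - patch + 1):
        for j in range(w - patch + 1):
            v = image[i : i + patch, j : j + patch].ravel()
            hist[np.argmin(((codebook - v) ** 2).sum(axis=1))] += 1
    return hist / hist.sum()

codebook = rng.random((8, 9))        # 8 random 3x3 codewords
img = rng.random((16, 16))
shifted = np.roll(img, 5, axis=1)    # same content, different spatial layout

h1 = bag_of_features(img, codebook)
h2 = bag_of_features(shifted, codebook)
```

Because layout is discarded, the two histograms are nearly identical even though the images differ pixelwise, illustrating both the robustness and the spatial blindness the abstract attributes to such models.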
Affiliation(s)
- Sébastien M Crouzet: Cognitive, Linguistic, and Psychological Sciences Department, Institute for Brain Sciences, Brown University, Providence, RI, USA
11. Amarasingham A, Harrison MT, Hatsopoulos NG, Geman S. Conditional modeling and the jitter method of spike resampling. J Neurophysiol 2011; 107:517-31. [PMID: 22031767] [DOI: 10.1152/jn.00633.2011]
Abstract
The existence and role of fine-temporal structure in the spiking activity of central neurons is the subject of an enduring debate among physiologists. To a large extent, the problem is a statistical one: what inferences can be drawn from neurons monitored in the absence of full control over their presynaptic environments? In principle, properly crafted resampling methods can still produce statistically correct hypothesis tests. We focus on the approach to resampling known as jitter. We review a wide range of jitter techniques, illustrated by both simulation experiments and selected analyses of spike data from motor cortical neurons. We rely on an intuitive and rigorous statistical framework known as conditional modeling to reveal otherwise hidden assumptions and to support precise conclusions. Among other applications, we review statistical tests for exploring any proposed limit on the rate of change of spiking probabilities, exact tests for the significance of repeated fine-temporal patterns of spikes, and the construction of acceptance bands for testing any purported relationship between sensory or motor variables and synchrony or other fine-temporal events.
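A minimal version of the basic jitter idea (one of several variants the paper reviews): resample each spike uniformly within a fixed window, which preserves coarse rate structure while destroying fine timing, then ask how often jittered surrogates reproduce the observed synchrony count. The window width, spike counts, and synchrony tolerance below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def sync_count(a, b, tol=5):
    """Number of spikes in `a` with a spike in `b` within `tol` ms."""
    return sum(np.any(np.abs(b - t) <= tol) for t in a)

def basic_jitter(spikes, width=25, rng=rng):
    """Resample each spike uniformly within its own `width`-ms window,
    preserving coarse rate structure while destroying fine timing."""
    windows = (spikes // width) * width
    return np.sort(windows + rng.uniform(0, width, size=spikes.shape))

# Surrogate test: does jittered data reproduce the observed synchrony?
t1 = np.sort(rng.uniform(0, 1000, 40))
t2 = np.sort(t1[:20] + rng.normal(0, 1, 20))  # injected fine-time synchrony
observed = sync_count(t1, t2)
null = [sync_count(basic_jitter(t1), t2) for _ in range(500)]
p_value = (1 + sum(n >= observed for n in null)) / (1 + len(null))
```

The injected millisecond-scale coincidences survive in the observed data but not in the jittered surrogates, so the p-value is small; with no injected synchrony it would be roughly uniform.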
Affiliation(s)
- Asohan Amarasingham: Department of Mathematics, The City College of New York, and Program in Cognitive Neuroscience, The Graduate Center, City University of New York, New York, New York, USA
12. O'Connor KN, Yin P, Petkov CI, Sutter ML. Complex spectral interactions encoded by auditory cortical neurons: relationship between bandwidth and pattern. Front Syst Neurosci 2010; 4:145. [PMID: 21152347] [PMCID: PMC2998047] [DOI: 10.3389/fnsys.2010.00145]
Abstract
The focus of most research on auditory cortical neurons has concerned the effects of rather simple stimuli, such as pure tones or broad-band noise, or the modulation of a single acoustic parameter. Extending these findings to feature coding in more complex stimuli such as natural sounds may be difficult, however. Generalizing results from the simple to more complex case may be complicated by non-linear interactions occurring between multiple, simultaneously varying acoustic parameters in complex sounds. To examine this issue in the frequency domain, we performed a parametric study of the effects of two global features, spectral pattern (here ripple frequency) and bandwidth, on primary auditory (A1) neurons in awake macaques. Most neurons were tuned for one or both variables and most also displayed an interaction between bandwidth and pattern implying that their effects were conditional or interdependent. A spectral linear filter model was able to qualitatively reproduce the basic effects and interactions, indicating that a simple neural mechanism may be able to account for these interdependencies. Our results suggest that the behavior of most A1 neurons is likely to depend on multiple parameters, and so most are unlikely to respond independently or invariantly to specific acoustic features.
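A sketch of how a single spectral linear filter can produce an interaction between ripple frequency and bandwidth: summarize the stimulus by its spectral envelope, the neuron by a center-surround spectral weighting function, and the response by a rectified inner product. The filter shape, frequency axis, and parameter values are illustrative assumptions, not fits to the paper's data.

```python
import numpy as np

freqs = np.linspace(-2, 2, 201)  # octaves relative to best frequency

def ripple_envelope(ripple_freq, bandwidth):
    """Sinusoidal spectral envelope ('ripple') windowed to a bandwidth."""
    env = 0.5 * (1 + np.cos(2 * np.pi * ripple_freq * freqs))
    env[np.abs(freqs) > bandwidth / 2] = 0.0
    return env

# Spectral filter: excitatory peak at best frequency, inhibitory sidebands
w = np.exp(-freqs**2 / 0.1) - 0.5 * np.exp(-freqs**2 / 0.8)

def response(ripple_freq, bandwidth):
    """Rectified linear response of the filter to a ripple stimulus."""
    return max(ripple_envelope(ripple_freq, bandwidth) @ w, 0.0)

# Ripple-frequency tuning measured at two bandwidths: the profiles differ,
# so the two stimulus parameters interact even for this purely linear model.
narrow = [response(rf, 0.5) for rf in (0.25, 0.5, 1.0)]
wide = [response(rf, 4.0) for rf in (0.25, 0.5, 1.0)]
```

At narrow bandwidth the inhibitory sidebands are never stimulated, while at wide bandwidth they are, so apparent ripple tuning is conditional on bandwidth, matching the interdependence the abstract reports.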
Affiliation(s)
- Kevin N O'Connor: Center for Neuroscience, University of California Davis, Davis, CA, USA
13.
Abstract
The human visual system recognizes objects and their constituent parts rapidly and with high accuracy. Standard models of recognition by the visual cortex use feed-forward processing, in which an object's parts are detected before the complete object. However, parts are often ambiguous on their own and require the prior detection and localization of the entire object. We show how a cortical-like hierarchy obtains recognition and localization of objects and parts at multiple levels nearly simultaneously by a single feed-forward sweep from low to high levels of the hierarchy, followed by a feedback sweep from high- to low-level areas.