1
Lesage X, Tran R, Mancini S, Fesquet L. Velocity and Color Estimation Using Event-Based Clustering. Sensors (Basel, Switzerland) 2023; 23:9768. doi: 10.3390/s23249768. PMID: 38139614; PMCID: PMC10747939.
Abstract
Event-based clustering provides a low-power embedded solution for low-level feature extraction in a scene. The algorithm exploits the non-uniform sampling capability of event-based image sensors to measure local intensity variations within a scene, and it forms groups of similar events while simultaneously estimating their attributes. This work takes advantage of additional event information to provide new attributes for further processing. We elaborate on estimating object velocity from the mean motion of a cluster. We then examine a novel form of event that includes an intensity measurement of the color at the pixel concerned; such events can be processed to estimate either the rough color of a cluster or its color distribution. Lastly, this paper presents applications that use these features. The resulting algorithms are exercised with a custom event-based simulator that generates videos of outdoor scenes. The velocity estimation methods provide satisfactory results, with a trade-off between accuracy and convergence speed. Regarding color estimation, luminance is challenging to estimate in the test cases, whereas chrominance is estimated precisely. The estimated quantities are adequate for accurately classifying objects into predefined categories.
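As a rough illustration of the velocity attribute described above (the authors' exact update rules are not reproduced; the class, smoothing constants, and synthetic data below are placeholders), a cluster can track its velocity incrementally from the motion of its smoothed centroid:

```python
import numpy as np

class Cluster:
    """Toy event cluster: exponentially weighted centroid plus a smoothed velocity."""
    def __init__(self, x, y, t, alpha=0.05, beta=0.1):
        self.cx, self.cy, self.t = float(x), float(y), float(t)
        self.vx, self.vy = 0.0, 0.0
        self.alpha = alpha          # smoothing factor for the centroid
        self.beta = beta            # smoothing factor for the velocity

    def update(self, x, y, t):
        old_cx, old_cy, dt = self.cx, self.cy, t - self.t
        self.cx += self.alpha * (x - self.cx)
        self.cy += self.alpha * (y - self.cy)
        if dt > 0:
            # instantaneous centroid motion, smoothed over successive events
            self.vx += self.beta * ((self.cx - old_cx) / dt - self.vx)
            self.vy += self.beta * ((self.cy - old_cy) / dt - self.vy)
        self.t = t

c = Cluster(10, 10, 0.0)
for k in range(1, 100):                       # synthetic events drifting right at 200 px/s
    c.update(10 + 0.2 * k + np.random.randn() * 0.3, 10, k * 1e-3)
print(c.vx, c.vy)                             # vx approaches ~200 px/s
```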
Affiliation(s)
- Xavier Lesage
- Univ. Grenoble Alpes, CNRS (National Centre for Scientific Research), Grenoble INP (Institute of Engineering), TIMA (Techniques of Informatics and Microelectronics for Integrated Systems Architecture), F-38000 Grenoble, France
- Orioma, F-38430 Moirans, France
- Rosalie Tran
- Univ. Grenoble Alpes, CNRS (National Centre for Scientific Research), Grenoble INP (Institute of Engineering), TIMA (Techniques of Informatics and Microelectronics for Integrated Systems Architecture), F-38000 Grenoble, France
- Stéphane Mancini
- Univ. Grenoble Alpes, CNRS (National Centre for Scientific Research), Grenoble INP (Institute of Engineering), TIMA (Techniques of Informatics and Microelectronics for Integrated Systems Architecture), F-38000 Grenoble, France
- Laurent Fesquet
- Univ. Grenoble Alpes, CNRS (National Centre for Scientific Research), Grenoble INP (Institute of Engineering), TIMA (Techniques of Informatics and Microelectronics for Integrated Systems Architecture), F-38000 Grenoble, France
2
Schmid D, Jarvers C, Neumann H. Canonical circuit computations for computer vision. Biological Cybernetics 2023; 117:299-329. doi: 10.1007/s00422-023-00966-9. PMID: 37306782; PMCID: PMC10600314.
Abstract
Advanced computer vision mechanisms have been inspired by neuroscientific findings. However, with the focus on improving benchmark achievements, technical solutions have been shaped by application and engineering constraints. This includes the training of neural networks, which has led to feature detectors optimally suited to the application domain. The limitations of such approaches motivate the need to identify computational principles, or motifs, in biological vision that can enable further foundational advances in machine vision. We propose to utilize structural and functional principles of neural systems that have been largely overlooked and that potentially provide new inspirations for computer vision mechanisms and models. Recurrent feedforward, lateral, and feedback interactions characterize general principles underlying processing in mammals. We derive a formal specification of core computational motifs that utilize these principles. These are combined to define model mechanisms for visual shape and motion processing. We demonstrate how such a framework can be adopted to run on neuromorphic brain-inspired hardware platforms and can be extended to automatically adapt to environment statistics. We argue that the identified principles and their formalization inspire sophisticated computational mechanisms with improved explanatory scope. These and other elaborated, biologically inspired models can be employed to design computer vision solutions for different tasks, and they can be used to advance neural network architectures for learning.
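A minimal sketch of the kind of canonical motif the abstract refers to, assuming the commonly described three-stage combination of driving feedforward input, purely modulatory feedback, and divisive pool normalization (not the paper's exact equations; all names and constants are placeholders):

```python
import numpy as np

def canonical_stage(feedforward, feedback, lam=2.0, sigma=0.1):
    """One columnar stage: driving feedforward input, modulating feedback that can
    only enhance (never create) activity, and divisive normalization over the pool."""
    enhanced = feedforward * (1.0 + lam * feedback)
    pool = enhanced.sum(axis=-1, keepdims=True)      # activity pool per spatial location
    return enhanced / (sigma + pool)

ff = np.random.rand(8, 8, 16)    # e.g. 16 orientation/direction channels on an 8x8 grid
fb = np.zeros_like(ff)           # no feedback: a pure feedforward sweep
print(canonical_stage(ff, fb).shape)
```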
Affiliation(s)
- Daniel Schmid
- Institute for Neural Information Processing, Ulm University, James-Franck-Ring, Ulm, 89081 Germany
- Christian Jarvers
- Institute for Neural Information Processing, Ulm University, James-Franck-Ring, Ulm, 89081 Germany
- Heiko Neumann
- Institute for Neural Information Processing, Ulm University, James-Franck-Ring, Ulm, 89081 Germany
3
Fu Q. Motion perception based on ON/OFF channels: A survey. Neural Netw 2023; 165:1-18. doi: 10.1016/j.neunet.2023.05.031. PMID: 37263088.
Abstract
Motion perception is an essential ability for animals and artificially intelligent systems to interact effectively and safely with surrounding objects and environments. Biological visual systems, which have evolved over hundreds of millions of years, are efficient and robust at motion perception, whereas artificial vision systems are far from such capability. This paper argues that the gap can be significantly reduced by formulating ON/OFF channels in motion perception models, encoding luminance increment (ON) and decrement (OFF) responses within the receptive field separately. Such a signal-bifurcating structure has been found in the neural systems of many animal species, showing that early motion signals are split and processed in segregated pathways. However, the corresponding biological substrates and the necessity for artificial vision systems have never been elucidated together, leaving open questions about the uniqueness and advantages of ON/OFF channels for building dynamic vision systems that address real-world challenges. This paper highlights the importance of ON/OFF channels in motion perception by surveying current progress covering both neuroscience and computational modelling works with applications. Compared with the related literature, this paper provides, for the first time, insights into the implementation of different selectivities to the directional motion of looming, translating, and small-sized target movement based on ON/OFF channels, in keeping with the soundness and robustness of biological principles. Finally, existing challenges and future trends of such a biologically plausible computational structure for visual perception are discussed in connection with hotspots of machine learning and advanced vision sensors such as event-driven cameras.
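A minimal illustration of the ON/OFF split itself (my own toy example, not a model from the survey): the temporal luminance derivative is half-wave rectified into two parallel channels before any motion-selective stage:

```python
import numpy as np

def on_off_split(frame_prev, frame_curr):
    """Split the temporal luminance change into ON (brightening) and OFF (darkening) channels."""
    diff = frame_curr.astype(float) - frame_prev.astype(float)
    on_channel = np.maximum(diff, 0.0)     # luminance increments
    off_channel = np.maximum(-diff, 0.0)   # luminance decrements
    return on_channel, off_channel

prev = np.zeros((4, 4))
curr = np.eye(4)
on, off = on_off_split(prev, curr)
print(on.sum(), off.sum())   # 4.0 0.0
```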
Affiliation(s)
- Qinbing Fu
- Machine Life and Intelligence Research Centre, School of Mathematics and Information Science, Guangzhou University, Guangzhou, 510006, China.
4
Zhang Y, Lv H, Zhao Y, Feng Y, Liu H, Bi G. Event-Based Optical Flow Estimation with Spatio-Temporal Backpropagation Trained Spiking Neural Network. Micromachines 2023; 14:203. doi: 10.3390/mi14010203. PMID: 36677264; PMCID: PMC9867051.
Abstract
The advantages of an event camera, such as low power consumption, large dynamic range, and low data redundancy, enable it to shine in extreme environments where traditional image sensors are not competent, especially for high-speed moving-target capture and extreme lighting conditions. Optical flow reflects a target's movement information, and the target's detailed movement can be obtained from the event camera's optical flow. However, existing neural network methods for optical flow prediction with event cameras involve extensive computation and high energy consumption in hardware implementations. The spiking neural network has spatiotemporal coding characteristics, so it is compatible with the spatiotemporal data of an event camera. Moreover, its sparse coding allows it to run with ultra-low power consumption on neuromorphic hardware. However, because of algorithmic and training complexity, spiking neural networks have not previously been applied to optical flow prediction for event cameras. For this case, this paper proposes an end-to-end spiking neural network to predict the optical flow of the discrete spatiotemporal data stream from an event camera. The network is trained with the spatio-temporal backpropagation method in a self-supervised way, which fully exploits the spatiotemporal characteristics of the event camera while improving network performance. Compared with existing methods on the public dataset, the experimental results show that the proposed method matches the best existing methods in optical flow prediction accuracy while consuming about 99% less power than the existing algorithm, which is greatly beneficial to hardware implementation and lays the groundwork for future low-power hardware implementations of optical flow prediction for event cameras.
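A sketch of the leaky integrate-and-fire dynamics that spatio-temporal backpropagation unrolls in time, assuming a plain fully connected layer with a hard reset (the paper's actual architecture, loss, and surrogate-gradient details are not shown):

```python
import numpy as np

def lif_forward(inputs, w, tau=0.5, v_th=1.0):
    """inputs: (T, N_in) binary event tensor; w: (N_in, N_out) weights."""
    T, _ = inputs.shape
    v = np.zeros(w.shape[1])
    spikes = np.zeros((T, w.shape[1]))
    for t in range(T):
        v = tau * v + inputs[t] @ w              # leaky integration of synaptic current
        spikes[t] = (v >= v_th).astype(float)    # emit a spike where the threshold is crossed
        v = v * (1.0 - spikes[t])                # hard reset of fired neurons
        # During training, d(spike)/dv is replaced by a surrogate (e.g. a narrow
        # rectangular or sigmoid window around v_th) so the unrolled graph is differentiable.
    return spikes

events = (np.random.rand(20, 64) < 0.1).astype(float)   # T=20 time bins, 64 input pixels
out = lif_forward(events, 0.3 * np.random.randn(64, 8))
print(out.shape, out.sum())
```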
Affiliation(s)
- Yisa Zhang
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- Hengyi Lv
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yuchen Zhao
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yang Feng
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Hailong Liu
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Guoling Bi
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
5
Ozawa T, Sekikawa Y, Saito H. Accuracy and Speed Improvement of Event Camera Motion Estimation Using a Bird's-Eye View Transformation. Sensors (Basel, Switzerland) 2022; 22:773. doi: 10.3390/s22030773. PMID: 35161519; PMCID: PMC8840125.
Abstract
Event cameras are bio-inspired sensors with a high dynamic range and temporal resolution. These properties enable motion estimation from textures with repeating patterns, which is difficult to achieve with RGB cameras; motion estimation with an event camera is therefore expected to be applicable to vehicle position estimation. Contrast maximization is an existing method that can be used for event-camera motion estimation from captured road surfaces. However, contrast maximization tends to fall into a local solution when estimating three-dimensional motion, which makes correct estimation difficult. To solve this problem, we propose a method for motion estimation by optimizing contrast in the bird's-eye view space. Instead of performing three-dimensional motion estimation, we reduce the dimensionality to two-dimensional motion estimation by transforming the event data to a bird's-eye view using a homography calculated from the event camera position. This transformation mitigates the problem of the loss function becoming non-convex, which occurs in conventional methods. As a quantitative experiment, we created event data with a car simulator and evaluated our motion estimation method, showing an improvement in accuracy and speed. In addition, we conducted estimation from real event data and evaluated the results qualitatively, again showing an improvement in accuracy.
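A hedged sketch of the underlying idea: warp events into a bird's-eye view with a homography, compensate a candidate two-dimensional motion, and score the candidate by the contrast (variance) of the resulting image of warped events. The nearest-pixel accumulation, the synthetic data, and the grid search below are my own simplifications:

```python
import numpy as np

def warp_to_bev(xy, H):
    """Map pixel coordinates into the bird's-eye view with homography H."""
    pts = np.c_[xy, np.ones(len(xy))] @ H.T
    return pts[:, :2] / pts[:, 2:3]

def contrast(xy_bev, t, v, shape=(128, 128)):
    """Variance of the event image after compensating a candidate velocity v (px/s)."""
    warped = xy_bev - np.outer(t - t[0], v)
    img = np.zeros(shape)
    ix = np.clip(warped.astype(int), 0, np.array(shape) - 1)
    np.add.at(img, (ix[:, 1], ix[:, 0]), 1.0)          # accumulate event counts
    return img.var()

# synthetic vertical edge translating at 20 px/s; identity homography for the demo
n = 5000
t = np.sort(np.random.rand(n))
xy = np.c_[50.0 + 20.0 * t, np.random.rand(n) * 100.0]
bev = warp_to_bev(xy, np.eye(3))
scores = [(contrast(bev, t, np.array([vx, 0.0])), vx) for vx in range(0, 41, 5)]
print(max(scores))        # the contrast score peaks near vx = 20
```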
Affiliation(s)
- Takehiro Ozawa
- Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan
- Hideo Saito
- Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan
6
Akolkar H, Ieng SH, Benosman R. Real-Time High Speed Motion Prediction Using Fast Aperture-Robust Event-Driven Visual Flow. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:361-372. doi: 10.1109/tpami.2020.3010468. PMID: 32750822.
Abstract
Optical flow is a crucial component of the feature space for early visual processing of dynamic scenes, especially in new applications such as self-driving vehicles, drones, and autonomous robots. Dynamic vision sensors are well suited to such applications because of their asynchronous, sparse, and temporally precise representation of the visual dynamics. Many algorithms proposed for computing visual flow from these sensors suffer from the aperture problem, as the direction of the estimated flow is governed by the curvature of the object rather than the true motion direction. Some methods that do overcome this problem through temporal windowing under-utilize the precise temporal nature of the dynamic sensors. In this paper, we propose a novel multi-scale plane-fitting-based visual flow algorithm that is robust to the aperture problem and is also computationally fast and efficient. Our algorithm performs well in many scenarios, ranging from a fixed camera recording simple geometric shapes to real-world scenarios such as a camera mounted on a moving car, and can successfully perform event-by-event motion estimation of objects in the scene, allowing predictions up to 500 ms ahead, i.e., the equivalent of 10 to 25 frames with traditional cameras.
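A sketch of the local plane-fitting step that this family of methods builds on: fit t = a·x + b·y + c to the events in a small neighbourhood and read the normal flow off the plane gradient (the paper's multi-scale, aperture-robust refinement is not reproduced here):

```python
import numpy as np

def plane_fit_flow(x, y, t):
    """Fit t = a*x + b*y + c to local events; return the normal flow (px/s)."""
    A = np.c_[x, y, np.ones_like(x)]
    (a, b, _), *_ = np.linalg.lstsq(A, t, rcond=None)
    g2 = a * a + b * b
    if g2 < 1e-12:
        return np.zeros(2)              # flat plane: flow undefined
    return np.array([a, b]) / g2        # speed 1/|(a,b)| along the gradient direction

# vertical edge sweeping right at 50 px/s: timestamps grow with x only
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
ts = xs / 50.0
print(plane_fit_flow(xs.ravel(), ys.ravel(), ts.ravel()))   # ~[50, 0]
```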
7
Gallego G, Delbruck T, Orchard G, Bartolozzi C, Taba B, Censi A, Leutenegger S, Davison AJ, Conradt J, Daniilidis K, Scaramuzza D. Event-Based Vision: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:154-180. doi: 10.1109/tpami.2020.3008413. PMID: 32750812.
Abstract
Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of μs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
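A minimal illustration of the data model the survey describes: an event is a tuple (x, y, t, polarity), and one simple (lossy) way to interface with frame-based code is to accumulate signed polarities over a short window. The structured-array layout below is my own choice, not a format from the paper:

```python
import numpy as np

events = np.array([(12, 7, 0.001, +1),      # (x, y, timestamp [s], polarity)
                   (12, 8, 0.002, +1),
                   (40, 30, 0.004, -1)],
                  dtype=[('x', int), ('y', int), ('t', float), ('p', int)])

def accumulate(events, shape, t0, t1):
    """Sum signed polarities of events with t0 <= t < t1 into a frame."""
    frame = np.zeros(shape)
    sel = events[(events['t'] >= t0) & (events['t'] < t1)]
    np.add.at(frame, (sel['y'], sel['x']), sel['p'])
    return frame

print(accumulate(events, (64, 64), 0.0, 0.005).sum())   # 1.0 = (+1) + (+1) + (-1)
```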
8
Kim Y, Panda P. Optimizing Deeper Spiking Neural Networks for Dynamic Vision Sensing. Neural Netw 2021; 144:686-698. doi: 10.1016/j.neunet.2021.09.022. PMID: 34662827.
Abstract
Spiking Neural Networks (SNNs) have recently emerged as a new generation of low-power deep neural networks due to sparse, asynchronous, and binary event-driven processing. Most previous deep SNN optimization methods focus on static datasets (e.g., MNIST) from a conventional frame-based camera. On the other hand, optimization techniques for event data from Dynamic Vision Sensor (DVS) cameras are still in their infancy. Most prior SNN techniques handling DVS data are limited to shallow networks and thus show low performance. Generally, we observe that the integrate-and-fire behavior of spiking neurons diminishes spike activity in deeper layers, and the resulting sparse spike activity leads to a sub-optimal solution during training (i.e., performance degradation). To address this limitation, we propose novel algorithmic and architectural advances to accelerate the training of very deep SNNs on DVS data. Specifically, we propose Spike Activation Lift Training (SALT), which increases spike activity across all layers by optimizing both weights and thresholds in convolutional layers. After applying SALT, we train the weights based on the cross-entropy loss. SALT helps the networks convey ample information across all layers during training and therefore improves performance. Furthermore, we propose a simple and effective architecture, called Switched-BN, which exploits Batch Normalization (BN). Previous work shows that standard BN is incompatible with the temporal dynamics of SNNs. Therefore, in the Switched-BN architecture, we apply BN to the last layer of an SNN after accumulating all the spikes from the previous layer with a spike voltage accumulator (i.e., converting temporal spike information to a float value). Even though we apply BN in just one layer of the SNN, our results demonstrate a considerable performance gain without any significant computational overhead. Through extensive experiments, we show the effectiveness of SALT and Switched-BN for training very deep SNNs from scratch on various benchmarks including DVS-CIFAR10, N-Caltech, DHP19, CIFAR10, and CIFAR100. To the best of our knowledge, this is the first work showing state-of-the-art performance with deep SNNs on DVS data.
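A rough sketch of the Switched-BN idea as stated in the abstract (the shapes and the plain NumPy batch normalization are my own simplifications): the spikes of the last spiking layer are summed over time into a float tensor, and ordinary batch normalization is applied only to that accumulated value:

```python
import numpy as np

def accumulate_and_bn(spikes, gamma=1.0, beta=0.0, eps=1e-5):
    """spikes: (T, B, C) binary outputs of the last spiking layer."""
    acc = spikes.sum(axis=0)                               # spike-voltage accumulator
    mu, var = acc.mean(axis=0), acc.var(axis=0)            # per-channel batch statistics
    return gamma * (acc - mu) / np.sqrt(var + eps) + beta  # standard BN on float values

spk = (np.random.rand(10, 32, 100) < 0.2).astype(float)    # T=10, batch=32, 100 units
print(accumulate_and_bn(spk).shape)                        # (32, 100)
```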
Affiliation(s)
- Youngeun Kim
- Department of Electrical Engineering, Yale University, New Haven, CT, USA.
9
Nagata J, Sekikawa Y, Aoki Y. Optical Flow Estimation by Matching Time Surface with Event-Based Cameras. Sensors (Basel, Switzerland) 2021; 21:1150. doi: 10.3390/s21041150. PMID: 33562162; PMCID: PMC7915966.
Abstract
In this work, we propose a novel method for estimating optical flow from event-based cameras by matching the time surface of events. The proposed loss function measures the timestamp consistency between the time surface formed by the latest timestamp at each pixel and one that is slightly shifted in time. This makes it possible to estimate dense optical flow with high accuracy without restoring luminance or using additional sensor information. In experiments, we show that the gradient is more accurate and the loss landscape more stable than with the variance loss of the motion-compensation approach. In addition, we show that the optical flow can be estimated with high accuracy by optimization with L1 smoothness regularization on publicly available datasets.
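A sketch of the two ingredients named above, with the details simplified by me: a time surface holding the latest timestamp per pixel, and a loss that compares it against a copy sampled at positions shifted by the candidate flow over a small time step dt:

```python
import numpy as np

def time_surface(events, shape):
    """Keep the latest timestamp at each pixel; undefined pixels stay at -inf."""
    ts = np.full(shape, -np.inf)
    for x, y, t in events:
        ts[int(y), int(x)] = max(ts[int(y), int(x)], t)
    return ts

def timestamp_consistency(ts, flow, dt):
    """Sum of |T(x) + dt - T(x + flow*dt)| over pixels where both surfaces are defined."""
    h, w = ts.shape
    yy, xx = np.mgrid[0:h, 0:w]
    xs = np.clip((xx + flow[..., 0] * dt).astype(int), 0, w - 1)
    ys = np.clip((yy + flow[..., 1] * dt).astype(int), 0, h - 1)
    shifted = ts[ys, xs]
    valid = np.isfinite(ts) & np.isfinite(shifted)
    return np.abs(ts[valid] + dt - shifted[valid]).sum()

evts = [(x, 5, x * 0.01) for x in range(20)]               # edge moving at 100 px/s
ts = time_surface(evts, (16, 32))
flow_good = np.full((16, 32, 2), [100.0, 0.0])
flow_bad = np.zeros((16, 32, 2))
print(timestamp_consistency(ts, flow_good, 0.01),
      timestamp_consistency(ts, flow_bad, 0.01))           # the correct flow yields the lower loss
```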
Affiliation(s)
- Jun Nagata
- Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan
- Yusuke Sekikawa
- Denso IT Laboratory, 2-15-1, Shibuya, Shibuya-ku, Tokyo 150-0002, Japan
- Yoshimitsu Aoki
- Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan
10
Kong F, Lambert A, Joubert D, Cohen G. Shack-Hartmann wavefront sensing using spatial-temporal data from an event-based image sensor. Optics Express 2020; 28:36159-36175. doi: 10.1364/oe.409682. PMID: 33379717.
Abstract
An event-based image sensor works dramatically differently from conventional frame-based image sensors: it responds only to local brightness changes, whereas its counterparts' output is a linear representation of the illumination over a fixed exposure time. The output of an event-based image sensor is therefore an asynchronous stream of spatio-temporal events tagged with the location, timestamp, and polarity of the triggered events. Compared with traditional frame-based image sensors, event-based image sensors offer high temporal resolution, low latency, high dynamic range, and low power consumption. Although event-based image sensors have been used in many computer vision, navigation, and even space situational awareness applications, little work has been done to explore their applicability to wavefront sensing. In this work, we present the integration of an event camera in a Shack-Hartmann wavefront sensor and the use of event data to determine spot displacement and estimate the wavefront. We show that it achieves the same functionality with substantially higher speed and can operate in extremely low light conditions. This makes an event-based Shack-Hartmann wavefront sensor a preferable choice for adaptive optics systems where the light budget is limited or high bandwidth is required.
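An illustrative sketch (not the authors' pipeline): within each lenslet subaperture, the centroid of recent events approximates the focal-spot position, and its shift from a reference position is proportional to the local wavefront slope. The grid layout and synthetic data are placeholders:

```python
import numpy as np

def spot_shifts(events_xy, subaperture_size, grid, reference):
    """events_xy: (N, 2) pixel coordinates of recent events; reference: (grid, grid, 2)."""
    shifts = np.zeros((grid, grid, 2))
    cell = (events_xy // subaperture_size).astype(int)   # which lenslet each event falls in
    for i in range(grid):
        for j in range(grid):
            pts = events_xy[(cell[:, 0] == j) & (cell[:, 1] == i)]
            if len(pts):
                shifts[i, j] = pts.mean(axis=0) - reference[i, j]
    return shifts                    # proportional to the local wavefront gradient

ref = np.stack(np.meshgrid(np.arange(4) * 16 + 8.0, np.arange(4) * 16 + 8.0), -1)
evts = ref.reshape(-1, 2) + np.array([1.5, 0.0]) + np.random.randn(16, 2) * 0.2
print(spot_shifts(evts, 16, 4, ref).mean(axis=(0, 1)))   # ~[1.5, 0.0]
```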
11
Paredes-Valles F, Scheper KYW, de Croon GCHE. Unsupervised Learning of a Hierarchical Spiking Neural Network for Optical Flow Estimation: From Events to Global Motion Perception. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020; 42:2051-2064. doi: 10.1109/tpami.2019.2903179. PMID: 30843817.
Abstract
The combination of spiking neural networks and event-based vision sensors holds the potential of highly efficient and high-bandwidth optical flow estimation. This paper presents the first hierarchical spiking architecture in which motion (direction and speed) selectivity emerges in an unsupervised fashion from the raw stimuli generated with an event-based camera. A novel adaptive neuron model and stable spike-timing-dependent plasticity formulation are at the core of this neural network governing its spike-based processing and learning, respectively. After convergence, the neural architecture exhibits the main properties of biological visual motion systems, namely feature extraction and local and global motion perception. Convolutional layers with input synapses characterized by single and multiple transmission delays are employed for feature and local motion perception, respectively; while global motion selectivity emerges in a final fully-connected layer. The proposed solution is validated using synthetic and real event sequences. Along with this paper, we provide the cuSNN library, a framework that enables GPU-accelerated simulations of large-scale spiking neural networks. Source code and samples are available at https://github.com/tudelft/cuSNN.
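For orientation, a generic pair-based STDP update (the paper uses its own stable formulation and an adaptive neuron model, neither of which is reproduced here; the constants are placeholders):

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=0.02):
    """Pair-based STDP: potentiate causal pre-before-post pairings, depress the reverse."""
    dt = t_post - t_pre
    if dt >= 0:
        dw = a_plus * np.exp(-dt / tau)      # pre before post -> potentiation
    else:
        dw = -a_minus * np.exp(dt / tau)     # post before pre -> depression
    return np.clip(w + dw, 0.0, 1.0)

w = 0.5
w = stdp_update(w, t_pre=0.010, t_post=0.015)   # causal pairing, w increases
w = stdp_update(w, t_pre=0.040, t_post=0.030)   # anti-causal pairing, w decreases
print(w)
```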
12
Almatrafi M, Baldwin R, Aizawa K, Hirakawa K. Distance Surface for Event-Based Optical Flow. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020; 42:1547-1556. doi: 10.1109/tpami.2020.2986748. PMID: 32305894.
Abstract
We propose DistSurf-OF, a novel optical flow method for neuromorphic cameras. Neuromorphic cameras (or event detection cameras) are an emerging sensor modality that uses dynamic vision sensors (DVS) to report asynchronously the log-intensity changes (called "events") exceeding a predefined threshold at each pixel. In the absence of an intensity value at each pixel location, we introduce the notion of a "distance surface" (the distance transform computed from the detected events) as a proxy for object texture. The distance surface is then used as input to intensity-based optical flow methods to recover the two-dimensional pixel motion. Real-sensor experiments verify that the proposed DistSurf-OF accurately estimates the angle and speed of each event.
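A sketch of the distance-surface construction, assuming SciPy's Euclidean distance transform as the implementation detail; the resulting surface would then be fed to a standard intensity-based optical flow method (e.g., Lucas-Kanade or Horn-Schunck):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_surface(event_mask):
    """event_mask: boolean image, True where at least one event fired."""
    # distance_transform_edt measures distance to the nearest zero, so invert the mask
    return distance_transform_edt(~event_mask)

mask = np.zeros((64, 64), dtype=bool)
mask[32, 10:50] = True                      # events along a horizontal edge
surf = distance_surface(mask)
print(surf[32, 20], surf[36, 20])           # 0.0 on the edge, 4.0 four pixels away
```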
13
Deng Y, Li Y, Chen H. AMAE: Adaptive Motion-Agnostic Encoder for Event-Based Object Classification. IEEE Robot Autom Lett 2020. doi: 10.1109/lra.2020.3002480.
14
D'Angelo G, Janotte E, Schoepe T, O'Keeffe J, Milde MB, Chicca E, Bartolozzi C. Event-Based Eccentric Motion Detection Exploiting Time Difference Encoding. Front Neurosci 2020; 14:451. doi: 10.3389/fnins.2020.00451. PMID: 32457575; PMCID: PMC7227134.
Abstract
Attentional selectivity tends to follow events considered to be interesting stimuli. Indeed, the motion of visual stimuli present in the environment attracts our attention and allows us to react and interact with our surroundings. Extracting relevant motion information from the environment presents a challenge, given the high information content of the visual input. In this work we propose a novel integration of an eccentric down-sampling of the visual field, taking inspiration from the varying size of receptive fields (RFs) in the mammalian retina, with the Spiking Elementary Motion Detector (sEMD) model. We characterize the system's functionality with simulated data and real-world data collected with bio-inspired event-driven cameras, successfully implementing motion detection along the four cardinal directions and diagonally.
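A rough sketch of eccentricity-dependent pooling (my own simplification, not the paper's receptive-field layout): the pooling radius grows linearly with distance from the fovea, so peripheral events are grouped by larger cells before the sEMD stage:

```python
import numpy as np

def rf_radius(x, y, cx, cy, r0=1.0, slope=0.05):
    """Receptive-field radius as a linear function of eccentricity (placeholder constants)."""
    ecc = np.hypot(x - cx, y - cy)      # eccentricity in pixels from the fovea (cx, cy)
    return r0 + slope * ecc

for x in (64, 100, 127):                # pixels at increasing eccentricity
    print(x, rf_radius(x, 64, 64, 64))  # 1.0, 2.8, 4.15
```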
Affiliation(s)
- Giulia D'Angelo
- Event Driven Perception for Robotics, Italian Institute of Technology, iCub Facility, Genoa, Italy
- Ella Janotte
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
- Thorben Schoepe
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
- James O'Keeffe
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Moritz B Milde
- International Centre for Neuromorphic Systems, The MARCS Institute, Western Sydney University, Sydney, NSW, Australia
- Elisabetta Chicca
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
- Chiara Bartolozzi
- Event Driven Perception for Robotics, Italian Institute of Technology, iCub Facility, Genoa, Italy
15
Khoei MA, Ieng SH, Benosman R. Asynchronous Event-Based Motion Processing: From Visual Events to Probabilistic Sensory Representation. Neural Comput 2019; 31:1114-1138. doi: 10.1162/neco_a_01191. PMID: 30979350.
Abstract
In this work, we propose a two-layered descriptive model for motion processing from the retina to the cortex, with event-based input from the asynchronous time-based image sensor (ATIS) camera. Spatial and spatiotemporal filtering of visual scenes by motion energy detectors is implemented in two steps: a simple layer of a lateral geniculate nucleus model and a set of three-dimensional Gabor kernels, eventually forming a probabilistic population response. The high temporal resolution of independent and asynchronous local sensory pixels from the ATIS provides realistic stimulation for studying biological motion processing, as well as for developing bio-inspired motion processors for computer vision applications. Our study combines two significant theories in neuroscience: event-based stimulation and probabilistic sensory representation. We have modeled how this might be done at the level of vision, and we suggest this framework as a generic computational principle across different sensory modalities.
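A simplified one-dimensional-space-plus-time sketch in the spirit of the Gabor population described above: a quadrature pair of spatio-temporal filters tuned to one velocity is squared and summed into a phase-invariant motion energy (all parameters and the drifting-grating test are placeholders, not the paper's kernels):

```python
import numpy as np

def gabor_st(xs, ts, fx, ft, sx=2.0, st=0.05, phase=0.0):
    """Spatio-temporal Gabor: Gaussian envelope times an oriented carrier."""
    env = np.exp(-(xs ** 2) / (2 * sx ** 2) - (ts ** 2) / (2 * st ** 2))
    return env * np.cos(2 * np.pi * (fx * xs + ft * ts) + phase)

x = np.arange(-8, 9.0)
t = np.linspace(-0.1, 0.1, 21)
X, T = np.meshgrid(x, t)                       # spatio-temporal support

def energy(stimulus, fx, ft):
    even = (stimulus * gabor_st(X, T, fx, ft)).sum()
    odd = (stimulus * gabor_st(X, T, fx, ft, phase=-np.pi / 2)).sum()
    return even ** 2 + odd ** 2                # phase-invariant motion energy

moving_right = np.cos(2 * np.pi * (0.25 * X - 5.0 * T))   # rightward drifting grating
print(energy(moving_right, 0.25, -5.0) > energy(moving_right, 0.25, 5.0))   # True
```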
Affiliation(s)
- Mina A Khoei
- Vision and Natural Computation Team, Vision Institute, Université Pierre et Marie Curie-Paris 6 (UPMC), Sorbonne Université UMR S968 Inserm, UPMC, CHNO des Quinze-Vingts, CNRS UMRS 7210, Paris 75012, France
- Sio-Hoi Ieng
- Vision and Natural Computation Team, Vision Institute, Université Pierre et Marie Curie-Paris 6 (UPMC), Sorbonne Université UMR S968 Inserm, UPMC, CHNO des Quinze-Vingts, CNRS UMRS 7210, Paris 75012, France
- Ryad Benosman
- Vision and Natural Computation Team, Vision Institute, Université Pierre et Marie Curie-Paris 6 (UPMC), Sorbonne Université UMR S968 Inserm, UPMC, CHNO des Quinze-Vingts, CNRS UMRS 7210, Paris 75012, France; University of Pittsburgh Medical Center, Pittsburgh, PA 15213; and Carnegie Mellon University, Robotics Institute, Pittsburgh, PA 15213, U.S.A.
16
Milde MB, Bertrand OJN, Ramachandran H, Egelhaaf M, Chicca E. Spiking Elementary Motion Detector in Neuromorphic Systems. Neural Comput 2018; 30:2384-2417. doi: 10.1162/neco_a_01112. PMID: 30021082.
Abstract
Apparent motion of the surroundings on an agent's retina can be used to navigate through cluttered environments, avoid collisions with obstacles, or track targets of interest. The pattern of apparent motion of objects (i.e., the optic flow) contains spatial information about the surrounding environment. For a small, fast-moving agent, as used in search and rescue missions, it is crucial to estimate the distance to close-by objects quickly to avoid collisions. This estimation cannot be done by conventional methods, such as frame-based optic flow estimation, given the size, power, and latency constraints of the necessary hardware. A practical alternative makes use of event-based vision sensors. Contrary to the frame-based approach, they produce so-called events only when there are changes in the visual scene. We propose a novel asynchronous circuit, the spiking elementary motion detector (sEMD), composed of a single silicon neuron and synapse, to detect elementary motion from an event-based vision sensor. The sEMD encodes the time an object's image needs to travel across the retina into a burst of spikes. The number of spikes within the burst is proportional to the speed of events across the retina. A fast but imprecise estimate of the time-to-travel can already be obtained from the first two spikes of a burst and refined by subsequent interspike intervals. The latter encoding scheme is possible thanks to an adaptive nonlinear synaptic efficacy scaling. We show that the sEMD can be used to compute a collision-avoidance direction in the context of robotic navigation in a cluttered outdoor environment, and we compare this direction with that of a frame-based algorithm. The proposed computational principle constitutes a generic spiking temporal correlation detector that can be applied to other sensory modalities (e.g., sound localization), and it provides a novel perspective on gating information in spiking neural networks.
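A conceptual sketch of the sEMD encoding with the circuit dynamics abstracted away (the silicon neuron and the adaptive synapse are not modelled; the gain and cap below are placeholders): the time an edge needs to travel between two neighbouring pixels maps to a burst whose spike count grows with speed:

```python
import numpy as np

def semd_burst(t_first, t_second, gain=2e-2, max_spikes=20):
    """Number of spikes in the burst as a decreasing function of the time-to-travel."""
    dt = t_second - t_first             # time-to-travel between the two pixels
    if dt <= 0:
        return 0                        # wrong order/direction: facilitation has decayed
    return min(max_spikes, int(np.ceil(gain / dt)))

for dt in (0.001, 0.005, 0.02):         # fast, medium, slow edges
    print(dt, semd_burst(0.0, dt))      # 20, 4, 1 spikes
```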
Affiliation(s)
- M B Milde
- Institute of Neuroinformatics, University of Zurich, and ETH Zurich, 8057 Zurich, Switzerland
- O J N Bertrand
- Neurobiology, Faculty of Biology, Bielefeld University, 33615 Bielefeld, and Cognitive Interaction Technology, Center of Excellence, Bielefeld University, 33501 Bielefeld, Germany
- H Ramachandran
- Faculty of Technology, Bielefeld University, 33615 Bielefeld, and Cognitive Interaction Technology, Center of Excellence, Bielefeld University, 33501 Bielefeld, Germany
- M Egelhaaf
- Neurobiology, Faculty of Biology, Bielefeld University, 33615 Bielefeld, and Cognitive Interaction Technology, Center of Excellence, Bielefeld University, 33501 Bielefeld, Germany
- E Chicca
- Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany, and Cognitive Interaction Technology, Center of Excellence, Bielefeld University, 33501 Bielefeld, Germany
17
Clady X, Maro JM, Barré S, Benosman RB. A Motion-Based Feature for Event-Based Pattern Recognition. Front Neurosci 2017; 10:594. doi: 10.3389/fnins.2016.00594. PMID: 28101001; PMCID: PMC5209354.
Abstract
This paper introduces an event-based, luminance-free feature computed from the output of asynchronous event-based neuromorphic retinas. The feature consists of mapping the distribution of the optical flow along the contours of the moving objects in the visual scene into a matrix. Asynchronous event-based neuromorphic retinas are composed of autonomous pixels, each of them asynchronously generating "spiking" events that encode relative changes in the pixel's illumination at high temporal resolution. The optical flow is computed at each event and is integrated locally or globally in a grid based on a speed and direction coordinate frame, using speed-tuned temporal kernels. The latter ensure that the resulting feature equitably represents the distribution of the normal motion along the current moving edges, whatever their respective dynamics. The usefulness and generality of the proposed feature are demonstrated in pattern recognition applications: local corner detection and global gesture recognition.
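A sketch of the feature construction as I read it (the speed-tuned temporal kernels are omitted, and the bin counts are placeholders): each event contributes its flow vector to a speed-by-direction histogram, yielding a luminance-free motion signature matrix:

```python
import numpy as np

def motion_signature(flow_vectors, n_dir=8, n_speed=4, max_speed=200.0):
    """Normalized speed-by-direction histogram of per-event flow vectors."""
    vx, vy = flow_vectors[:, 0], flow_vectors[:, 1]
    direction = np.arctan2(vy, vx)                       # in [-pi, pi]
    speed = np.clip(np.hypot(vx, vy), 0, max_speed - 1e-6)
    hist, _, _ = np.histogram2d(direction, speed,
                                bins=[n_dir, n_speed],
                                range=[[-np.pi, np.pi], [0, max_speed]])
    return hist / max(hist.sum(), 1)                     # normalized signature matrix

flows = np.r_[np.tile([50.0, 0.0], (80, 1)),             # mostly rightward motion
              np.tile([0.0, -120.0], (20, 1))]           # some downward motion
print(motion_signature(flows))
```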
Affiliation(s)
- Xavier Clady
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06 Paris, France
- Jean-Matthieu Maro
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06 Paris, France
- Sébastien Barré
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06 Paris, France
- Ryad B Benosman
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06 Paris, France
18
Rueckauer B, Delbruck T. Evaluation of Event-Based Algorithms for Optical Flow with Ground-Truth from Inertial Measurement Sensor. Front Neurosci 2016; 10:176. doi: 10.3389/fnins.2016.00176. PMID: 27199639; PMCID: PMC4842780.
Abstract
In this study we compare nine optical flow algorithms that locally measure the flow normal to edges, according to accuracy and computational cost. In contrast to conventional, frame-based motion flow algorithms, our open-source implementations compute optical flow based on address-events from a neuromorphic Dynamic Vision Sensor (DVS). For this benchmarking we created a dataset of two synthesized and three real samples recorded from a 240 × 180 pixel Dynamic and Active-pixel Vision Sensor (DAVIS). This dataset contains events from the DVS as well as conventional frames to support testing state-of-the-art frame-based methods. We introduce a new source of ground truth: in the special case that the perceived motion stems solely from a rotation of the vision sensor around its three camera axes, the true optical flow can be estimated using gyro data from the inertial measurement unit integrated with the DAVIS camera. This provides a ground truth to which we can compare algorithms that measure optical flow by means of motion cues. An analysis of error sources led to the use of a refractory period, more accurate numerical derivatives, and a Savitzky-Golay filter to achieve significant improvements in accuracy. Our pure Java implementations of two recently published algorithms reduce computational cost by up to 29% compared with the original implementations. Two of the algorithms introduced in this paper further speed up processing by a factor of 10 compared with the original implementations, at equal or better accuracy. On a desktop PC, they run in real time on dense natural input recorded by a DAVIS camera.
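A sketch of how the rotational ground truth can be generated from gyro rates, using the standard rotational optical flow equations for normalized image coordinates (a textbook formula, not code from the paper; calibration and de-rotation of the IMU are omitted):

```python
import numpy as np

def rotational_flow(x, y, omega):
    """x, y: normalized image coordinates; omega: (wx, wy, wz) angular rates in rad/s."""
    wx, wy, wz = omega
    u = x * y * wx - (1 + x ** 2) * wy + y * wz
    v = (1 + y ** 2) * wx - x * y * wy - x * wz
    return np.stack([u, v], axis=-1)

xs, ys = np.meshgrid(np.linspace(-0.5, 0.5, 5), np.linspace(-0.5, 0.5, 5))
print(rotational_flow(xs, ys, omega=(0.0, 0.0, 1.0))[2, 4])   # pure yaw: flow ~ (y, -x)
```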
Affiliation(s)
- Bodo Rueckauer
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Tobi Delbruck
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
19
Abdul-Kreem LI, Neumann H. Neural Mechanisms of Cortical Motion Computation Based on a Neuromorphic Sensory System. PLoS One 2015; 10:e0142488. doi: 10.1371/journal.pone.0142488. PMID: 26554589; PMCID: PMC4640561.
Abstract
The visual cortex analyzes motion information along hierarchically arranged visual areas that interact through bidirectional interconnections. This work suggests a bio-inspired visual model focusing on the interactions of the cortical areas, in which new mechanisms of feedforward and feedback processing are introduced. The model uses a neuromorphic vision sensor (silicon retina) that simulates the spike-generation functionality of the biological retina. Our model takes into account two main model visual areas, namely V1 and MT, with different feature selectivities. The initial motion is estimated in model area V1 using spatiotemporal filters to locally detect the direction of motion. Here, we adapt the filtering scheme originally suggested by Adelson and Bergen to make it consistent with the spike representation of the DVS. The responses of area V1 are weighted and pooled by area MT cells, which are selective to different velocities, i.e., direction and speed. Such feature selectivity is derived here from compositions of activities in the spatio-temporal domain, integrating over larger space-time regions (receptive fields). In order to account for the bidirectional coupling of cortical areas, we match properties of the feature selectivity in both areas for feedback processing. For such linkage we integrate the responses over different speeds along a particular preferred direction. Normalization of activities is carried out over the spatial as well as the feature domains to balance the activities of individual neurons in model areas V1 and MT. Our model was tested using different stimuli that moved in different directions. The results reveal that the error margin between the estimated motion and the synthetic ground truth is decreased in area MT compared with the initial estimate in area V1. In addition, the modulated V1 cell activations show an enhancement of the initial motion estimate that is steered by feedback signals from MT cells.
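A schematic of the V1-to-MT stage described above, with placeholder pooling sizes and uniform weights (not the paper's tuned kernels): MT cells integrate V1 responses over a larger spatial region and are normalized divisively across the feature domain:

```python
import numpy as np

def mt_pool(v1, pool=4, sigma=0.01):
    """v1: (H, W, n_dir) V1 responses; spatial integration over pool x pool blocks,
    followed by divisive normalization over the direction channels."""
    h, w, d = v1.shape
    blocks = v1[:h - h % pool, :w - w % pool].reshape(h // pool, pool, w // pool, pool, d)
    mt = blocks.mean(axis=(1, 3))                            # spatial integration
    return mt / (sigma + mt.sum(axis=-1, keepdims=True))     # normalization over features

v1 = np.random.rand(32, 32, 8)
print(mt_pool(v1).shape)    # (8, 8, 8)
```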
Affiliation(s)
- Luma Issa Abdul-Kreem
- Institute for Neural Information Processing, Ulm University, Ulm, Germany
- Control and Systems Engineering Department, University of Technology, Baghdad, Iraq
- Heiko Neumann
- Institute for Neural Information Processing, Ulm University, Ulm, Germany
20
Brosch T, Neumann H, Roelfsema PR. Reinforcement Learning of Linking and Tracing Contours in Recurrent Neural Networks. PLoS Comput Biol 2015; 11:e1004489. doi: 10.1371/journal.pcbi.1004489. PMID: 26496502; PMCID: PMC4619762.
Abstract
The processing of a visual stimulus can be subdivided into a number of stages. Upon stimulus presentation there is an early phase of feedforward processing where the visual information is propagated from lower to higher visual areas for the extraction of basic and complex stimulus features. This is followed by a later phase where horizontal connections within areas and feedback connections from higher areas back to lower areas come into play. In this later phase, image elements that are behaviorally relevant are grouped by Gestalt grouping rules and are labeled in the cortex with enhanced neuronal activity (object-based attention in psychology). Recent neurophysiological studies revealed that reward-based learning influences these recurrent grouping processes, but it is not well understood how rewards train recurrent circuits for perceptual organization. This paper examines the mechanisms for reward-based learning of new grouping rules. We derive a learning rule that can explain how rewards influence the information flow through feedforward, horizontal, and feedback connections. We illustrate its efficiency with two tasks that have been used to study the neuronal correlates of perceptual organization in early visual cortex. The first task, contour integration, demands the integration of collinear contour elements into an elongated curve. We show how reward-based learning causes an enhancement of the representation of the to-be-grouped elements at early levels of a recurrent neural network, just as is observed in the visual cortex of monkeys. The second task is curve tracing, where the aim is to determine the endpoint of an elongated curve composed of connected image elements. If trained with the new learning rule, neural networks learn to propagate enhanced activity over the curve, in accordance with neurophysiological data. We close the paper with a number of model predictions that can be tested in future neurophysiological and computational studies.

Our experience with the visual world allows us to group image elements that belong to the same perceptual object and to segregate them from other objects and the background. If subjects learn to group contour elements, this experience influences neuronal activity in early visual cortical areas, including the primary visual cortex (V1). Learning presumably depends on alterations in the pattern of connections within and between areas of the visual cortex. However, the processes that control changes in connectivity are not well understood. Here we present the first computational model that can train a neural network to integrate collinear contour elements into elongated curves and to trace a curve through the visual field. The new learning algorithm trains fully recurrent neural networks, provided the connectivity causes the networks to reach a stable state. The model reproduces the behavioral performance of monkeys trained in these tasks and explains the patterns of neuronal activity in the visual cortex that emerge during learning, which is remarkable because the only feedback for the model is a reward for successful trials. We discuss a number of the model predictions that can be tested in future neuroscientific work.
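A very small sketch of the flavour of learning rule discussed (not the paper's derivation): a global reward signal gates a Hebbian update formed from pre- and post-synaptic activity, so only connections active on rewarded trials are strengthened:

```python
import numpy as np

def reward_gated_hebb(w, pre, post, reward, lr=0.01):
    """delta_w = lr * reward * post * pre, applied to every connection."""
    return w + lr * reward * np.outer(post, pre)

w = np.zeros((3, 4))
pre = np.array([1.0, 0.0, 1.0, 0.0])                 # inputs active on this trial
post = np.array([0.0, 1.0, 0.0])                     # unit that drove the response
w = reward_gated_hebb(w, pre, post, reward=+1.0)     # rewarded trial: active synapses grow
print(w)
```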
Affiliation(s)
- Tobias Brosch
- University of Ulm, Institute of Neural Information Processing, Ulm, Germany
- Heiko Neumann
- University of Ulm, Institute of Neural Information Processing, Ulm, Germany
- Pieter R. Roelfsema
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam, The Netherlands
- Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
- Psychiatry Department, Academic Medical Center, Amsterdam, The Netherlands