1. Cohen-Duwek H, Tsur EE. Colorful image reconstruction from neuromorphic event cameras with biologically inspired deep color fusion neural networks. Bioinspir Biomim 2024;19:036001. PMID: 38373337. DOI: 10.1088/1748-3190/ad2a7c.
Abstract
Neuromorphic event-based cameras communicate transients in luminance instead of frames, providing visual information with fine temporal resolution, high dynamic range, and a high signal-to-noise ratio. Enriching event data with color information allows for the reconstruction of colorful frame-like intensity maps, supporting improved performance and visually appealing results in various computer vision tasks. In this work, we simulated a biologically inspired color fusion system featuring a three-stage convolutional neural network for reconstructing color intensity maps from event data and sparse color cues. While current approaches to color fusion use full RGB frames at high resolution, our design uses event data together with quantized color cues of low spatial and tonal resolution, yielding a small, high-performing model for efficient colorful image reconstruction. The proposed model outperforms existing coloring schemes in terms of SSIM, LPIPS, PSNR, and CIEDE2000 metrics. We demonstrate that limited auxiliary color information can be used in conjunction with event data to successfully reconstruct both color and intensity frames, paving the way for more efficient hardware designs.
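A minimal sketch of how such inputs might be prepared: an RGB reference is reduced to low spatial and tonal resolution to play the role of the sparse color cues, then stacked with a per-pixel event-count map as a fusion-network input. The downsampling factor, bit depth, and fusion-by-concatenation are illustrative assumptions, not the authors' exact design.

```python
import numpy as np

def make_color_cues(rgb, spatial_factor=8, tone_bits=3):
    """Downsample an RGB frame and quantize its tonal resolution.

    Simulates the sparse, low-resolution color cues described in the
    abstract (the factor and bit depth here are illustrative assumptions).
    """
    h, w, _ = rgb.shape
    small = rgb[::spatial_factor, ::spatial_factor]           # spatial subsampling
    step = 256 // (2 ** tone_bits)
    quantized = (small // step) * step                        # tonal quantization
    # upsample back to sensor resolution by nearest-neighbor repetition
    cues = np.repeat(np.repeat(quantized, spatial_factor, 0), spatial_factor, 1)
    return cues[:h, :w]

def fuse_inputs(event_counts, rgb):
    """Stack per-pixel event counts with quantized color cues as CNN input."""
    cues = make_color_cues(rgb).astype(np.float32) / 255.0
    events = event_counts.astype(np.float32)[..., None]
    return np.concatenate([events, cues], axis=-1)            # H x W x 4 tensor

# toy usage
rgb = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
counts = np.random.poisson(0.5, (64, 64))
print(fuse_inputs(counts, rgb).shape)  # (64, 64, 4)
```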
Affiliation(s)
- Hadar Cohen-Duwek
- The Neuro-Biomorphic Engineering Lab, Department of Mathematics and Computer Science, The Open University of Israel, Ra'anana, Israel
- Elishai Ezra Tsur
- The Neuro-Biomorphic Engineering Lab, Department of Mathematics and Computer Science, The Open University of Israel, Ra'anana, Israel
2. Sajwani H, Ayyad A, Alkendi Y, Halwani M, Abdulrahman Y, Abusafieh A, Zweiri Y. TactiGraph: An Asynchronous Graph Neural Network for Contact Angle Prediction Using Neuromorphic Vision-Based Tactile Sensing. Sensors (Basel) 2023;23:6451. PMID: 37514745. PMCID: PMC10383597. DOI: 10.3390/s23146451.
Abstract
Vision-based tactile sensors (VBTSs) have become the de facto method for giving robots the ability to obtain tactile feedback from their environment. Unlike other solutions to tactile sensing, VBTSs offer high-spatial-resolution feedback without compromising on instrumentation costs or incurring additional maintenance expenses. However, the conventional cameras used in VBTSs have a fixed update rate and output redundant data, leading to computational overhead. In this work, we present a neuromorphic vision-based tactile sensor (N-VBTS) that employs observations from an event-based camera for contact angle prediction. In particular, we design and develop a novel graph neural network, dubbed TactiGraph, that operates asynchronously on graphs constructed from raw N-VBTS streams, exploiting their spatiotemporal correlations to perform predictions. Although conventional VBTSs use an internal illumination source, TactiGraph performs efficiently both with and without one, further reducing instrumentation costs. Rigorous experiments revealed that TactiGraph achieved a mean absolute error of 0.62° in predicting the contact angle and was faster and more efficient than both conventional VBTSs and other N-VBTSs, with lower instrumentation costs. Specifically, the N-VBTS requires only 5.5% of the computing time needed by the VBTS when both are tested on the same scenario.
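A minimal illustration of the graph-construction step: each event becomes a node, and edges connect it to its nearest neighbors in a scaled (x, y, t) space so that the spatiotemporal structure of the stream is preserved. The k-nearest-neighbor rule and time scaling beta are assumptions for illustration, not the exact TactiGraph construction.

```python
import numpy as np

def build_event_graph(events, k=8, beta=1e4):
    """Connect each event to its k nearest neighbors in (x, y, beta*t) space.

    A sketch of graph construction from a raw event stream; a graph neural
    network can then operate on the resulting edge index.
    """
    pts = events[:, :3].astype(np.float64)
    pts[:, 2] *= beta                          # bring time to pixel-like units
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # no self-edges
    nbrs = np.argsort(d, axis=1)[:, :k]        # k nearest neighbors per event
    src = np.repeat(np.arange(len(pts)), k)
    dst = nbrs.ravel()
    return np.stack([src, dst])                # 2 x (N*k) edge index

# toy stream: columns are (x, y, t, polarity)
ev = np.array([[10, 12, 0.0010, 1], [11, 12, 0.0012, 1],
               [40,  3, 0.0100, 0], [10, 13, 0.0013, 1]])
print(build_event_graph(ev, k=2).shape)  # (2, 8)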
Affiliation(s)
- Hussain Sajwani
- UAE National Service & Reserve Authority, Abu Dhabi, United Arab Emirates
- Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi 127788, United Arab Emirates
- Abdulla Ayyad
- Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi 127788, United Arab Emirates
- Yusra Alkendi
- Department of Aerospace Engineering, Khalifa University, Abu Dhabi 127788, United Arab Emirates
- Mohamad Halwani
- Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi 127788, United Arab Emirates
- Yusra Abdulrahman
- Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi 127788, United Arab Emirates
- Department of Aerospace Engineering, Khalifa University, Abu Dhabi 127788, United Arab Emirates
- Abdulqader Abusafieh
- Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi 127788, United Arab Emirates
- Research and Development, Strata Manufacturing PJSC, Al Ain 86519, United Arab Emirates
- Yahya Zweiri
- Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi 127788, United Arab Emirates
- Department of Aerospace Engineering, Khalifa University, Abu Dhabi 127788, United Arab Emirates
3. Kim J, Jung YJ. Multi-Stage Network for Event-Based Video Deblurring with Residual Hint Attention. Sensors (Basel) 2023;23:2880. PMID: 36991602. PMCID: PMC10056412. DOI: 10.3390/s23062880.
Abstract
Video deblurring aims at removing the motion blur caused by object movement or camera shake. Traditional video deblurring methods have mainly focused on frame-based deblurring, which takes only blurry frames as input to produce sharp frames. However, frame-based deblurring shows poor picture quality in challenging cases of video restoration where severely blurred frames are provided as input. To overcome this issue, recent studies have begun to explore the event-based approach, which uses the event sequence captured by an event camera for motion deblurring. Event cameras have several advantages over conventional frame cameras: they have low latency in imaging data acquisition (0.001 ms for event cameras vs. 10 ms for frame cameras), so event data can be acquired at a very high rate, with microsecond temporal resolution. This means that the event sequence contains more accurate motion information than video frames, and event data can be acquired with less motion blur. Due to these advantages, the use of event data is highly beneficial for improving the quality of deblurred frames. Accordingly, the results of event-based video deblurring are superior to those of frame-based methods, even for severely blurred video frames. However, the direct use of event data can often generate visual artifacts in the final output frame (e.g., image noise and incorrect textures), because event data intrinsically contain insufficient texture and event noise. To tackle this issue in event-based deblurring, we propose a two-stage coarse-refinement network by adding a frame-based refinement stage that utilizes all the available frames, with their more abundant textures, to further improve the picture quality of the first-stage coarse output. Specifically, a coarse intermediate frame is estimated by performing event-based video deblurring in the first-stage network. A residual hint attention (RHA) module is also proposed to extract useful attention information from the coarse output and all the available frames. This module connects the two stages and effectively guides the frame-based refinement of the coarse output. The final deblurred frame is then obtained by refining the coarse output using the residual hint attention and all the available frame information in the second-stage network. We validated the deblurring performance of the proposed network on the GoPro synthetic dataset (33 videos and 4702 frames) and the HQF real dataset (11 videos and 2212 frames). Compared to the state-of-the-art method (D2Net), we achieved a performance improvement of 1 dB in PSNR and 0.05 in SSIM on the GoPro dataset, and an improvement of 1.7 dB in PSNR and 0.03 in SSIM on the HQF dataset.
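The RHA module's role — deriving a spatial attention map from the coarse output and the available blurry frames, then gating refinement features with it — can be sketched as below. The concat-conv-sigmoid structure, channel counts, and gating by elementwise multiplication are assumptions for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ResidualHintAttention(nn.Module):
    """Sketch of a residual-hint-style spatial attention gate.

    Derives a per-pixel attention map from the coarse deblurred frame and
    the available blurry frames, then gates the second-stage refinement
    features with it. An illustrative stand-in, not the paper's RHA design.
    """
    def __init__(self, frame_ch, feat_ch):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(frame_ch, feat_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 3, padding=1),
            nn.Sigmoid(),                          # attention values in [0, 1]
        )

    def forward(self, coarse, frames, feats):
        hint = torch.cat([coarse, frames], dim=1)  # residual hint sources
        return feats * self.attn(hint)             # gated refinement features

# toy usage: coarse RGB output plus two neighboring blurry RGB frames
rha = ResidualHintAttention(frame_ch=9, feat_ch=32)
out = rha(torch.rand(1, 3, 64, 64), torch.rand(1, 6, 64, 64),
          torch.rand(1, 32, 64, 64))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```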
4. Azevedo GODA, Fernandes BJT, Silva LHDS, Freire A, de Araújo RP, Cruz F. Event-Based Angular Speed Measurement and Movement Monitoring. Sensors (Basel) 2022;22:7963. PMID: 36298314. PMCID: PMC9609678. DOI: 10.3390/s22207963.
Abstract
Computer vision techniques can monitor the rotational speed of rotating equipment or machines to understand their working conditions and prevent failures. Such techniques are highly precise, contactless, and potentially suitable for applications without massive setup changes. However, traditional vision sensors collect a significant amount of data to process when measuring the rotation of high-speed systems, and they are susceptible to motion blur. This work proposes a new method for measuring the rotational speed of high-speed systems by processing event-based data from a neuromorphic sensor. This sensor produces event-based data and is designed to work with high temporal resolution and high dynamic range. The main advantages of the Event-based Angular Speed Measurement (EB-ASM) method are the high dynamic range, the absence of motion blur, and the possibility of measuring multiple rotations simultaneously with a single device. The proposed method uses the time difference between spikes within a kernel, i.e., a window selected within the sensor's field of view. It is evaluated in two experimental scenarios by measuring the rotational speed of a fan and of a computer numerical control (CNC) router spindle, comparing the measurements against a calibrated digital photo-tachometer. Based on the performed tests, the EB-ASM can measure rotational speed with a mean absolute error below 0.2% in both scenarios.
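The core idea — timing the bursts of events that a passing blade or marker produces inside a fixed window — fits in a few lines. A minimal sketch follows; the burst-grouping rule (a gap much larger than the median inter-event gap) and its constants are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def angular_speed_rpm(timestamps, markers_per_rev=1):
    """Estimate rotational speed from event timestamps inside a fixed window.

    Each pass of a blade/marker through the window produces a burst of
    events; the mean time between consecutive bursts gives the period.
    """
    t = np.sort(timestamps)
    gaps = np.diff(t)
    # split the stream into bursts wherever the gap is unusually large
    burst_starts = t[np.r_[True, gaps > 10 * np.median(gaps)]]
    period = np.mean(np.diff(burst_starts)) * markers_per_rev
    return 60.0 / period

# toy data: one burst every 20 ms (3000 rpm), 5 events per burst
bursts = np.arange(0, 0.2, 0.02)
ts = (bursts[:, None] + np.linspace(0, 1e-4, 5)[None, :]).ravel()
print(round(angular_speed_rpm(ts)))  # ~3000
```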
Affiliation(s)
- Leandro Honorato de Souza Silva
- Escola Politécnica de Pernambuco, University of Pernambuco, Recife 50720-001, Brazil
- Unidade Acadêmica da Área de Indústria, Federal Institute of Paraíba, Cajazeiras 58900-000, Brazil
- Agostinho Freire
- Escola Politécnica de Pernambuco, University of Pernambuco, Recife 50720-001, Brazil
- Francisco Cruz
- School of Computer Science and Engineering, University of New South Wales, Sydney 1466, Australia
- Escuela de Ingeniería, Universidad Central de Chile, Santiago 8330601, Chile
5. Wang Y, Yang J, Peng X, Wu P, Gao L, Huang K, Chen J, Kneip L. Visual Odometry with an Event Camera Using Continuous Ray Warping and Volumetric Contrast Maximization. Sensors (Basel) 2022;22:5687. PMID: 35957244. PMCID: PMC9370870. DOI: 10.3390/s22155687.
Abstract
We present a new solution to tracking and mapping with an event camera. The camera motion contains both rotational and translational displacements in the plane, and these displacements occur in an arbitrarily structured environment. As a result, the image matching may no longer be represented by a low-dimensional homographic warping, which complicates the application of the commonly used Image of Warped Events (IWE). We introduce a new solution to this problem by performing contrast maximization in 3D. The 3D location of the rays cast for each event is smoothly varied as a function of a continuous-time motion parametrization, and the optimal parameters are found by maximizing the contrast in a volumetric ray density field. Our method thus performs joint optimization over motion and structure. The practical validity of our approach is supported by an application to automated guided vehicle (AGV) motion estimation and 3D reconstruction with a single vehicle-mounted event camera. The method approaches the performance obtained with regular cameras and outperforms them in challenging visual conditions.
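For intuition, the planar IWE objective that this work generalizes to 3D can be sketched in a few lines: warp events along a candidate motion, accumulate them into an image, and score the candidate by the image's variance (its contrast). The constant-velocity warp, image size, and grid search below are illustrative simplifications of the continuous-time, volumetric formulation.

```python
import numpy as np

def iwe_contrast(events, v, size=64):
    """Variance of the Image of Warped Events for a candidate velocity v.

    Events (x, y, t) are warped back to t=0 along a constant flow; a well
    motion-compensated (sharp) image has high per-pixel variance.
    """
    x = np.round(events[:, 0] - v[0] * events[:, 2]).astype(int)
    y = np.round(events[:, 1] - v[1] * events[:, 2]).astype(int)
    img = np.zeros((size, size))
    valid = (x >= 0) & (x < size) & (y >= 0) & (y < size)
    np.add.at(img, (y[valid], x[valid]), 1.0)   # accumulate warped events
    return img.var()

# toy scene: an edge at x=20 moving right at 100 px/s for 0.1 s
t = np.random.rand(2000) * 0.1
ev = np.stack([20 + 100 * t + np.random.randn(2000) * 0.3,
               np.random.rand(2000) * 63, t], axis=1)
best = max(((vx, iwe_contrast(ev, (vx, 0.0))) for vx in range(0, 201, 10)),
           key=lambda p: p[1])
print(best[0])  # ~100: contrast peaks at the true velocity
```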
Affiliation(s)
- Yifu Wang, Jiaqi Yang, Xin Peng, Peng Wu, Ling Gao, Kun Huang, Jiaben Chen, Laurent Kneip
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Laurent Kneip is also with the Shanghai Engineering Research Center of Intelligent Vision and Imaging, ShanghaiTech University, Shanghai 201210, China
6. Beck M, Maier G, Flitter M, Gruna R, Längle T, Heizmann M, Beyerer J. An Extended Modular Processing Pipeline for Event-Based Vision in Automatic Visual Inspection. Sensors (Basel) 2021;21:6143. PMID: 34577349. PMCID: PMC8472878. DOI: 10.3390/s21186143.
Abstract
Dynamic Vision Sensors differ from conventional cameras in that only intensity changes of individual pixels are perceived and transmitted as an asynchronous stream instead of entire frames. The technology promises, among other things, high temporal resolution, low latency, and low data rates. While such sensors currently enjoy much scientific attention, there are few publications on practical applications. One field of application that has hardly been considered so far, yet potentially fits the sensor principle well due to its special properties, is automatic visual inspection. In this paper, we evaluate current state-of-the-art processing algorithms in this new application domain. We further propose an algorithmic approach for identifying ideal time windows within an event stream for object classification. For the evaluation of our method, we acquire two novel datasets that contain typical visual inspection scenarios, i.e., the inspection of objects on a conveyor belt and during free fall. The success of our algorithmic extension for data processing is demonstrated on these new datasets by showing that the classification accuracy of current algorithms is substantially increased. By making our new datasets publicly available, we intend to stimulate further research on the application of Dynamic Vision Sensors in machine vision.
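A minimal sketch of time-window selection: slide a fixed-width window over the stream and keep the start time that captures the most events, using event density as a proxy for "the object is fully in view." The paper's actual selection criterion may be more elaborate; the window width and the density heuristic here are assumptions.

```python
import numpy as np

def best_time_window(timestamps, width=5e-3):
    """Return the window start (and event count) that captures the most events.

    Each event timestamp is tried as a candidate window start; counting is
    vectorized with a binary search over the sorted stream.
    """
    t = np.sort(timestamps)
    ends = np.searchsorted(t, t + width, side="right")
    counts = ends - np.arange(len(t))
    i = int(np.argmax(counts))
    return t[i], counts[i]

# toy stream: background noise plus a dense burst while the object passes
noise = np.random.uniform(0.0, 0.1, 300)
passage = np.random.uniform(0.040, 0.045, 500)
start, n = best_time_window(np.concatenate([noise, passage]))
print(round(start, 3), n)  # window locks onto the passage around t=0.04
```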
Affiliation(s)
- Moritz Beck, Georg Maier (corresponding author), Merle Flitter, Robin Gruna, Thomas Längle, Jürgen Beyerer
- Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, 76131 Karlsruhe, Germany
- Michael Heizmann
- Institute of Industrial Information Technology (IIIT), Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany
- Jürgen Beyerer is also with the Vision and Fusion Laboratory (IES), Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany
7. Kreiser R, Renner A, Leite VRC, Serhan B, Bartolozzi C, Glover A, Sandamirskaya Y. An On-chip Spiking Neural Network for Estimation of the Head Pose of the iCub Robot. Front Neurosci 2020;14:551. PMID: 32655350. PMCID: PMC7325709. DOI: 10.3389/fnins.2020.00551.
Abstract
In this work, we present a neuromorphic architecture for head pose estimation and scene representation for the humanoid iCub robot. The spiking neuronal network is fully realized in Intel's neuromorphic research chip, Loihi, and precisely integrates the issued motor commands to estimate the iCub's head pose in a neuronal path-integration process. The neuromorphic vision system of the iCub is used to correct for drift in the pose estimation. Positions of objects in front of the robot are memorized using on-chip synaptic plasticity. We present real-time robotic experiments using 2 degrees of freedom (DoF) of the robot's head and show precise path integration, visual reset, and object position learning on-chip. We discuss the requirements for integrating the robotic system and neuromorphic hardware with current technologies.
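At a high level, the estimation loop combines two ingredients that are easy to state in scalar form: accumulate motor commands (path integration) and occasionally pull the estimate toward a vision-derived pose to cancel drift. The sketch below captures only this logic; the correction gain is an assumption, and on Loihi the same computation is carried out by spiking neuronal dynamics rather than floating-point arithmetic.

```python
import numpy as np

def path_integrate(motor_cmds, visual_fixes, gain=1.0, alpha=0.5):
    """Integrate head-velocity commands into a pose estimate, with visual resets.

    motor_cmds: per-step (pan, tilt) velocity commands.
    visual_fixes: {step: observed pose} corrections from the vision system.
    """
    pose = np.zeros(2)                                 # pan, tilt (degrees)
    for k, cmd in enumerate(motor_cmds):
        pose += gain * cmd                             # path-integration step
        if k in visual_fixes:                          # visual landmark observed
            pose += alpha * (visual_fixes[k] - pose)   # partial reset of drift
    return pose

cmds = [np.array([1.0, 0.0])] * 10        # drift-prone open-loop integration
fixes = {5: np.array([4.0, 0.0])}         # vision reports pan = 4 deg at step 5
print(path_integrate(cmds, fixes))        # estimate pulled toward the observation
```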
Affiliation(s)
- Raphaela Kreiser, Alpha Renner, Vanessa R. C. Leite, Yulia Sandamirskaya
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Baris Serhan
- Lincoln Centre for Autonomous Systems, University of Lincoln, Lincoln, United Kingdom
8. Kugele A, Pfeil T, Pfeiffer M, Chicca E. Efficient Processing of Spatio-Temporal Data Streams With Spiking Neural Networks. Front Neurosci 2020;14:439. PMID: 32431592. PMCID: PMC7214871. DOI: 10.3389/fnins.2020.00439.
Abstract
Spiking neural networks (SNNs) are potentially highly efficient models for inference on fully parallel neuromorphic hardware, but existing training methods that convert conventional artificial neural networks (ANNs) into SNNs are unable to exploit these advantages. Although ANN-to-SNN conversion has achieved state-of-the-art accuracy for static image classification tasks, a subtle but important difference in the way SNNs and ANNs integrate information over time makes the direct application of conversion techniques to sequence processing tasks challenging. Whereas all connections in SNNs have a nonzero propagation delay, ANNs assign different roles to feed-forward connections, which immediately update all neurons within the same time step, and recurrent connections, which have to be rolled out in time and are typically assigned a delay of one time step. Here, we present a novel method to obtain highly accurate SNNs for sequence processing by modifying the ANN training before conversion, such that the delays induced by ANN rollouts match the propagation delays in the targeted SNN implementation. Our method builds on the recently introduced framework of streaming rollouts, which aims for fully parallel model execution of ANNs and inherently allows for temporal integration by merging paths of different delays between the input and output of the network. The resulting networks achieve state-of-the-art accuracy on multiple event-based benchmark datasets, including N-MNIST, CIFAR10-DVS, N-CARS, and DvsGesture, and through the use of spatio-temporal shortcut connections yield low-latency approximate network responses that improve over time as more of the input sequence is processed. In addition, our converted SNNs are consistently more energy-efficient than their corresponding ANNs.
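The delay-matching idea can be illustrated with a toy rollout in which every connection, feed-forward included, carries a one-step delay, so information needs as many steps to reach the output as there are connections on its path — just as in the target SNN. This is a minimal sketch of the principle only, not the paper's streaming-rollout framework.

```python
import numpy as np

def streaming_rollout(x_seq, W1, W2, steps):
    """Roll out a two-layer net where EVERY connection has a one-step delay.

    Unlike a standard ANN forward pass (all layers update within one step),
    each layer reads the previous step's activation of the layer below,
    mirroring the nonzero propagation delays of an SNN.
    """
    relu = lambda a: np.maximum(a, 0.0)
    x_prev = np.zeros(W1.shape[1])
    h_prev = np.zeros(W1.shape[0])
    outputs = []
    for k in range(steps):
        h = relu(W1 @ x_prev)        # hidden sees the input delayed by one step
        out = relu(W2 @ h_prev)      # output sees the hidden state of step k-1
        x_prev, h_prev = x_seq[min(k, len(x_seq) - 1)], h
        outputs.append(out)
    return outputs

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(2, 8))
ys = streaming_rollout([rng.normal(size=4)] * 6, W1, W2, steps=6)
print([y.round(2) for y in ys[:3]])  # first nonzero output at step 2:
                                     # one delay per connection on the path
```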
Affiliation(s)
- Alexander Kugele
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
- Bosch Center for Artificial Intelligence, Renningen, Germany
- Thomas Pfeil
- Bosch Center for Artificial Intelligence, Renningen, Germany
- Elisabetta Chicca
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
9. Afshar S, Ralph N, Xu Y, Tapson J, van Schaik A, Cohen G. Event-Based Feature Extraction Using Adaptive Selection Thresholds. Sensors (Basel) 2020;20:1600. PMID: 32183052. PMCID: PMC7146588. DOI: 10.3390/s20061600.
Abstract
Unsupervised feature extraction algorithms form one of the most important building blocks in machine learning systems. These algorithms are often adapted to the event-based domain to perform online learning in neuromorphic hardware. However, because they were not designed for this purpose, such algorithms typically require significant simplification during implementation to meet hardware constraints, trading off performance. Furthermore, conventional feature extraction algorithms are not designed to generate the useful intermediary signals that become valuable in the context of neuromorphic hardware limitations. In this work, a novel event-based feature extraction method is proposed that focuses on these issues. The algorithm operates via simple adaptive selection thresholds, which allow a simpler implementation of network homeostasis than previous works by trading off a small amount of information loss in the form of missed events that fall outside the selection thresholds. The behavior of the selection thresholds and the output of the network as a whole are shown to provide uniquely useful signals indicating network weight convergence without the need to access network weights. A novel heuristic method for network size selection is proposed which makes use of noise events and their feature representations. The use of selection thresholds is shown to produce network activation patterns that predict classification accuracy, allowing rapid evaluation and optimization of system parameters without the need to run back-end classifiers. The feature extraction method is tested on both the N-MNIST (Neuromorphic-MNIST) benchmarking dataset and a dataset of airplanes passing through the field of view. Multiple configurations with different classifiers are tested, with the results quantifying the resultant performance gains at each processing stage.
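The adaptive-selection-threshold mechanism can be sketched per event: the best-matching feature neuron wins only if its similarity clears that neuron's own threshold; winners tighten their threshold and adapt their weights, while a missed event relaxes every threshold. The learning rate, threshold step, and cosine-similarity matching below are illustrative assumptions in the spirit of the method, not its exact constants.

```python
import numpy as np

def threshold_step(surface, weights, thresholds, eta=0.01, d_thr=0.002):
    """One adaptive-selection-threshold update for a single event.

    surface: local event surface around the incoming event.
    weights: feature matrix (neurons x surface size), rows unit-norm.
    thresholds: per-neuron selection thresholds, updated in place.
    """
    s = surface / (np.linalg.norm(surface) + 1e-9)
    sims = weights @ s                           # cosine similarity per neuron
    winner = int(np.argmax(sims))
    if sims[winner] >= thresholds[winner]:       # event selected
        weights[winner] = (1 - eta) * weights[winner] + eta * s
        weights[winner] /= np.linalg.norm(weights[winner])
        thresholds[winner] += d_thr              # winner becomes choosier
        return winner
    thresholds -= d_thr                          # missed event: relax everyone
    return -1

rng = np.random.default_rng(1)
W = rng.random((8, 25)); W /= np.linalg.norm(W, axis=1, keepdims=True)
thr = np.full(8, 0.5)
for _ in range(100):
    threshold_step(rng.random(25), W, thr)
print(thr.round(2))  # thresholds drift apart as neurons specialize
```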
10.
Abstract
Object tracking based on an event-based camera or dynamic vision sensor (DVS) remains a challenging task due to noise events, rapid changes of event-stream shape, the chaos of complex background textures, and occlusion. To address these challenges, this paper presents a robust event-stream object tracking method based on a correlation filter mechanism and convolutional neural network (CNN) representations. In the proposed method, rate coding is used to encode the event-stream object. Feature representations from hierarchical convolutional layers of a pre-trained CNN are used to represent the appearance of the rate-encoded event-stream object. Results show that the proposed method not only achieves good tracking performance in many complicated scenes with noise events, complex background textures, occlusion, and intersecting trajectories, but is also robust to variable scale, variable pose, and non-rigid deformations. In addition, the correlation filter-based method has the advantage of high speed. The proposed approach should promote applications of these event-based vision sensors in autonomous driving, robotics, and many other high-speed scenes.
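Rate coding here amounts to turning a slice of the event stream into a frame of per-pixel event counts that a pre-trained CNN can consume. A minimal sketch follows; the slice length and max-normalization are tuning assumptions, not values from the paper.

```python
import numpy as np

def rate_encode(events, shape, t0, dt):
    """Rate-code an event slice into a frame of per-pixel counts in [t0, t0+dt).

    The resulting frame can feed a pre-trained CNN whose hierarchical
    features then drive a correlation-filter tracker.
    """
    sel = (events[:, 2] >= t0) & (events[:, 2] < t0 + dt)
    frame = np.zeros(shape, dtype=np.float32)
    np.add.at(frame,
              (events[sel, 1].astype(int), events[sel, 0].astype(int)), 1.0)
    if frame.max() > 0:
        frame /= frame.max()                 # normalize firing rates to [0, 1]
    return frame

# toy usage with (x, y, t) events
ev = np.array([[5, 5, 0.01], [5, 5, 0.02], [9, 2, 0.03], [5, 5, 0.20]])
print(rate_encode(ev, (16, 16), t0=0.0, dt=0.1)[5, 5])  # 1.0 (two events, max)
```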
Affiliation(s)
- Hongmin Li
- Department of Precision Instrument, Center for Brain-Inspired Computing Research, Tsinghua University, Beijing, China
- Luping Shi
- Department of Precision Instrument, Center for Brain-Inspired Computing Research, Tsinghua University, Beijing, China
11. Afshar S, Hamilton TJ, Tapson J, van Schaik A, Cohen G. Investigation of Event-Based Surfaces for High-Speed Detection, Unsupervised Feature Extraction, and Object Recognition. Front Neurosci 2019;12:1047. PMID: 30705618. PMCID: PMC6344467. DOI: 10.3389/fnins.2018.01047.
Abstract
In this work, we investigate event-based feature extraction through a rigorous framework of testing. We test a hardware-efficient variant of Spike Timing Dependent Plasticity (STDP) on a range of spatio-temporal kernels with different surface decaying methods, decay functions, receptive field sizes, feature numbers, and back-end classifiers. This detailed investigation can provide helpful insights and rules of thumb for performance vs. complexity trade-offs in more generalized networks, especially in the context of hardware implementation, where design choices can incur significant resource costs. The investigation is performed using a new dataset consisting of model airplanes being dropped free-hand close to the sensor. The target objects exhibit a wide range of relative orientations and velocities. This range of target velocities, analyzed in multiple configurations, allows a rigorous comparison of time-based decaying surfaces (time surfaces) vs. event-index-based decaying surfaces (index surfaces), which are used to perform unsupervised feature extraction, followed by target detection and recognition. We examine each processing stage by comparison to the use of raw events, as well as a range of alternative layer structures and the use of random features. By comparing results from a linear classifier and an ELM classifier, we evaluate how each element of the system affects accuracy. To generate time and index surfaces, the most commonly used kernels, namely event-binning kernels and linearly and exponentially decaying kernels, are investigated. Index surfaces were found to outperform time surfaces in recognition when invariance to target velocity was made a requirement. In the investigation of network structure, larger networks of neurons with large receptive field sizes were found to perform best. We find that a small number of event-based feature extractors can project the complex spatio-temporal event patterns of the dataset to an almost linearly separable representation in feature space, with the best-performing linear classifier achieving 98.75% recognition accuracy using only 25 feature-extracting neurons.
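The contrast between the two surface types is easy to see in code: both build the same local exponentially decaying surface around an event, but one decays with elapsed time and the other with elapsed event count, which makes the latter insensitive to how fast the target moves. The radius and decay constants below are illustrative choices, not the paper's parameters.

```python
import numpy as np

def time_surface(events, i, radius=3, tau=0.05):
    """Exponentially decaying surface around event i; decay in seconds."""
    return _surface(events, i, radius, key=lambda e: e[:, 2], scale=tau)

def index_surface(events, i, radius=3, tau=200.0):
    """Same surface, but decaying in event index instead of wall-clock time,
    which the paper found more robust to target velocity."""
    return _surface(events, i, radius, key=lambda e: np.arange(len(e)), scale=tau)

def _surface(events, i, radius, key, scale):
    xi, yi = events[i, :2]
    vals = key(events)
    surf = np.zeros((2 * radius + 1, 2 * radius + 1))
    for j in range(i + 1):                       # only past events contribute
        dx, dy = int(events[j, 0] - xi), int(events[j, 1] - yi)
        if abs(dx) <= radius and abs(dy) <= radius:
            surf[dy + radius, dx + radius] = np.exp(-(vals[i] - vals[j]) / scale)
    return surf

# toy stream (x, y, t): the same four events, viewed by both surfaces
ev = np.array([[5, 5, 0.00], [6, 5, 0.04], [6, 6, 0.08], [5, 6, 0.09]])
print(round(time_surface(ev, 3)[2, 3], 3),
      round(index_surface(ev, 3)[2, 3], 3))
# 0.165 vs 0.985: the index surface barely decays over three events,
# regardless of how slowly they arrived
```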
Affiliation(s)
- Saeed Afshar, Tara Julia Hamilton, Jonathan Tapson, André van Schaik, Gregory Cohen
- Biomedical Engineering and Neuroscience Program, The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, NSW, Australia
12.
Abstract
In order to safely navigate and orient in their local surroundings, autonomous systems need to rapidly extract and persistently track visual features from the environment. While there are many algorithms tackling these tasks for traditional frame-based cameras, those have to deal with the fact that conventional cameras sample their environment with a fixed frequency. Most prominently, the same features have to be found in consecutive frames, and corresponding features then need to be matched using elaborate techniques, as any information between the two frames is lost. We introduce a novel method to detect and track line structures in data streams of event-based silicon retinae, also known as dynamic vision sensors (DVS). In contrast to conventional cameras, these biologically inspired sensors generate a quasi-continuous stream of visual information analogous to the information stream created by the ganglion cells in mammalian retinae. All pixels of a DVS operate asynchronously without a periodic sampling rate and emit a so-called DVS address event as soon as they perceive a luminance change exceeding an adjustable threshold. We use the high temporal resolution achieved by the DVS to track features continuously through time instead of only at fixed points in time. The focus of this work lies on tracking lines in a mostly static environment observed by a moving camera, a typical setting in mobile robotics. Since DVS events are mostly generated at object boundaries and edges, which in man-made environments often form lines, lines were chosen as the feature to track. Our method is based on detecting planes of DVS address events in x-y-t space and tracing these planes through time. It is robust against noise and runs in real time on a standard computer, hence it is suitable for low-latency robotics. The efficacy and performance are evaluated on real-world datasets that show artificial structures in an office building, using event data for tracking and frame data for ground-truth estimation from a DAVIS240C sensor.
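The geometric core — events from a moving line lie on a plane in x-y-t space — reduces to a plane fit. A minimal least-squares sketch, without the paper's clustering, plane tracing, or outlier handling:

```python
import numpy as np

def fit_event_plane(events):
    """Least-squares fit of a plane t = a*x + b*y + c to a cluster of events.

    A line moving with constant image velocity sweeps such a plane in
    x-y-t space; (a, b) encode the line's normal flow.
    """
    A = np.column_stack([events[:, 0], events[:, 1], np.ones(len(events))])
    coef, *_ = np.linalg.lstsq(A, events[:, 2], rcond=None)
    return coef                                # (a, b, c)

# toy data: a vertical edge at x = 10 + 50*t generates events on that plane
t = np.random.rand(200) * 0.1
ev = np.stack([10 + 50 * t, np.random.rand(200) * 64, t], axis=1)
a, b, c = fit_event_plane(ev)
print(round(1 / a), round(b, 3))  # 1/a ~ 50 px/s along x, b ~ 0
```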
Affiliation(s)
- Lukas Everding
- Department of Electrical and Computer Engineering, Neuroscientific System Theory, Technical University of Munich, Munich, Germany
- Jörg Conradt
- Department of Electrical and Computer Engineering, Neuroscientific System Theory, Technical University of Munich, Munich, Germany
13.
Abstract
Neuromorphic vision research requires high-quality and appropriately challenging event-stream datasets to support continuous improvement of algorithms and methods. However, creating event-stream datasets is a time-consuming task, since the data need to be recorded using neuromorphic cameras, and only a limited number of event-stream datasets are currently available. In this work, by utilizing the popular computer vision dataset CIFAR-10, we converted 10,000 frame-based images into 10,000 event streams using a dynamic vision sensor (DVS), providing an event-stream dataset of intermediate difficulty across 10 classes, named "CIFAR10-DVS." The conversion was implemented by a repeated closed-loop smooth (RCLS) movement of the frame-based images. Unlike conversions that move the camera over static images, moving the images themselves is more realistic with respect to practical applications. The repeated closed-loop image movement generates rich local intensity changes in continuous time, which are quantized by each pixel of the DVS camera to generate events. Furthermore, a performance benchmark for event-driven object classification is provided based on state-of-the-art classification algorithms. This work provides a large event-stream dataset and an initial benchmark for comparison, which may boost algorithm development in event-driven pattern recognition and object classification.
Affiliation(s)
- Hongmin Li, Hanchao Liu, Xiangyang Ji, Guoqi Li, Luping Shi
- Department of Precision Instrument, Center for Brain-Inspired Computing Research, Tsinghua University, Beijing, China
- Xiangyang Ji is also with the Department of Automation, Tsinghua University, Beijing, China
14. Boluda JA, Pardo F, Vegara F. A Selective Change Driven System for High-Speed Motion Analysis. Sensors (Basel) 2016;16:1875. PMID: 27834800. PMCID: PMC5134534. DOI: 10.3390/s16111875.
Abstract
Vision-based sensing algorithms are computationally demanding due to the large amount of data acquired and processed. Visual sensors deliver a great deal of information, much of it redundant data that provide no additional information. A Selective Change Driven (SCD) sensing system is based on a sensor that delivers, ordered by the magnitude of their change, only those pixels that have changed most since the last read-out. This allows the information stream to be adjusted to the available computation capabilities. Following this strategy, a new SCD processing architecture for high-speed motion analysis, based on processing pixels instead of full frames, has been developed and implemented in a Field-Programmable Gate Array (FPGA). The programmable device controls the data stream, delivering a new object distance calculation for every new pixel. The acquisition, processing, and delivery of a new object distance takes just 1.7 μs. Obtaining a similar result using a conventional frame-based camera would require a device working at roughly 500 kfps, which is far from being practical or even feasible. This system, built with the recently developed 64 × 64 CMOS SCD sensor, shows the potential of the SCD approach when combined with a hardware processing system.
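The pixel-driven (rather than frame-driven) style of processing can be sketched as a running estimate that is refreshed by every delivered pixel. The exponentially forgotten, intensity-weighted centroid below is an illustrative stand-in for the paper's FPGA distance pipeline, not its actual computation.

```python
class PixelDrivenTracker:
    """Update an object position estimate from one SCD pixel at a time.

    Each delivered pixel (x, y, value) nudges a running, intensity-weighted
    centroid, so a fresh estimate exists after EVERY pixel instead of after
    every frame. The forgetting factor is an illustrative assumption.
    """
    def __init__(self, decay=0.98):
        self.decay = decay
        self.sx = self.sy = self.sw = 0.0

    def update(self, x, y, value):
        self.sx = self.decay * self.sx + value * x
        self.sy = self.decay * self.sy + value * y
        self.sw = self.decay * self.sw + value
        return self.sx / self.sw, self.sy / self.sw   # current centroid

trk = PixelDrivenTracker()
for px in [(10, 20, 200), (11, 20, 180), (12, 21, 220)]:
    print(trk.update(*px))   # the estimate refines pixel by pixel
```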
Affiliation(s)
- Jose A. Boluda, Fernando Pardo, Francisco Vegara
- Departament d'Informàtica, Escola Tècnica Superior d'Enginyeria, Universitat de València, Avd. de la Universidad, s/n, 46100 Burjassot, Spain
15. Hu Y, Liu H, Pfeiffer M, Delbruck T. DVS Benchmark Datasets for Object Tracking, Action Recognition, and Object Recognition. Front Neurosci 2016;10:405. PMID: 27630540. PMCID: PMC5006598. DOI: 10.3389/fnins.2016.00405.
Affiliation(s)
- Yuhuang Hu, Hongjie Liu, Michael Pfeiffer, Tobi Delbruck
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
16. Boluda JA, Zuccarello P, Pardo F, Vegara F. Selective change driven imaging: a biomimetic visual sensing strategy. Sensors (Basel) 2011;11:11000-11020. PMID: 22346684. DOI: 10.3390/s111111000.
Abstract
Selective Change Driven (SCD) Vision is a biologically inspired strategy for acquiring, transmitting, and processing images that significantly speeds up image sensing. SCD vision is based on a new CMOS image sensor which delivers, ordered by the absolute magnitude of their change, the pixels that have changed since the last time they were read out. As part of this biomimetic approach, the traditional full-frame processing hardware and programming methodology has to be replaced by a new processing paradigm based on pixel processing in a data-flow manner instead of full-frame image processing.
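The delivery policy itself — read out the most-changed pixels first — can be simulated on ordinary frame pairs in a few lines. A minimal sketch; a real SCD sensor performs this ordering on-chip, one pixel event at a time, rather than by sorting a full difference image.

```python
import numpy as np

def scd_readout(prev, curr, n_pixels):
    """Deliver the n pixels that changed most since the last read-out,
    ordered by the absolute magnitude of their change."""
    diff = curr.astype(np.int32) - prev.astype(np.int32)
    order = np.argsort(-np.abs(diff), axis=None)[:n_pixels]   # biggest first
    ys, xs = np.unravel_index(order, curr.shape)
    return [(x, y, curr[y, x], diff[y, x]) for x, y in zip(xs, ys)]

# toy usage on a 64x64 frame pair: read out the 10 most-changed pixels
rng = np.random.default_rng(2)
f0 = rng.integers(0, 256, (64, 64), dtype=np.uint8)
f1 = f0.copy(); f1[30:33, 30:33] = 255                        # a bright change
for x, y, val, d in scd_readout(f0, f1, 10)[:3]:
    print(x, y, val, d)   # delivered pixels cluster on the changed patch
```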