1. Yu C, Gu Z, Li D, Wang G, Wang A, Li E. STSC-SNN: Spatio-Temporal Synaptic Connection with temporal convolution and attention for spiking neural networks. Front Neurosci 2022; 16:1079357. PMID: 36620452; PMCID: PMC9817103; DOI: 10.3389/fnins.2022.1079357.
Abstract
Spiking neural networks (SNNs), as one of the algorithmic models in neuromorphic computing, have attracted a great deal of research attention owing to their temporal information processing capability, low power consumption, and high biological plausibility. Their potential to efficiently extract spatio-temporal features makes them suitable for processing event streams. However, existing synaptic structures in SNNs are almost exclusively fully connected layers or spatial 2D convolutions, neither of which adequately extracts temporal dependencies. In this work, we take inspiration from biological synapses and propose a Spatio-Temporal Synaptic Connection SNN (STSC-SNN) model to enhance the spatio-temporal receptive fields of synaptic connections, thereby establishing temporal dependencies across layers. Specifically, we incorporate temporal convolution and attention mechanisms to implement synaptic filtering and gating functions. We show that endowing synaptic models with temporal dependencies improves the performance of SNNs on classification tasks. In addition, we investigate how varied spatio-temporal receptive fields affect performance, and reevaluate the temporal modules in SNNs. Our approach is tested on neuromorphic datasets, including DVS128 Gesture (gesture recognition), N-MNIST and CIFAR10-DVS (image classification), and SHD (speech digit recognition). The results show that the proposed model achieves state-of-the-art accuracy on nearly all of these datasets.
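As a rough illustration of the synaptic filtering and gating idea, the sketch below (NumPy; the function names, the shared causal kernel, and the mean-activity attention score are simplifying assumptions for illustration, not the paper's architecture) convolves each input channel's spike train with a temporal kernel, so a synapse sees a short history of inputs rather than only the current timestep, and then gates the result with a sigmoid attention weight:

```python
import numpy as np

def temporal_synaptic_filter(spikes, kernel):
    """Causal 1-D convolution of each channel's spike train with a
    shared temporal kernel: kernel[0] weights the current step,
    kernel[k] weights the input k steps in the past.

    spikes: (T, C) binary spike matrix; kernel: (K,) weights.
    """
    T, C = spikes.shape
    K = len(kernel)
    out = np.zeros((T, C))
    for t in range(T):
        for k in range(K):
            if t - k >= 0:
                out[t] += kernel[k] * spikes[t - k]
    return out

def temporal_attention_gate(filtered):
    """Sigmoid gate per timestep computed from the mean channel
    activity, a crude stand-in for a learned attention score."""
    score = filtered.mean(axis=1, keepdims=True)
    gate = 1.0 / (1.0 + np.exp(-score))
    return gate * filtered
```

A single input spike thus influences several subsequent timesteps through the kernel, which is the temporal dependency that plain fully connected or 2D-convolutional synapses lack.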
Affiliation(s)
- Chengting Yu
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China; Zhejiang University - University of Illinois at Urbana-Champaign Institute, Zhejiang University, Haining, China
- Zheming Gu
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China
- Da Li
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China
- Gaoang Wang
- Zhejiang University - University of Illinois at Urbana-Champaign Institute, Zhejiang University, Haining, China
- Aili Wang (corresponding author)
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China; Zhejiang University - University of Illinois at Urbana-Champaign Institute, Zhejiang University, Haining, China
- Erping Li
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China; Zhejiang University - University of Illinois at Urbana-Champaign Institute, Zhejiang University, Haining, China

2. Zhang S, Wang W, Li H, Zhang S. EVtracker: An Event-Driven Spatiotemporal Method for Dynamic Object Tracking. Sensors (Basel) 2022; 22:6090. PMID: 36015851; PMCID: PMC9414578; DOI: 10.3390/s22166090.
Abstract
An event camera is a novel bio-inspired sensor that effectively compensates for the shortcomings of current frame cameras, such as high latency, low dynamic range, and motion blur. Rather than capturing images at a fixed frame rate, an event camera produces an asynchronous signal by measuring the brightness change of each pixel. Consequently, an appropriate algorithm framework is required to handle this unique data type of event-based vision. In this paper, we propose a dynamic object tracking framework that uses an event camera to achieve long-term, stable tracking of event objects. A key novel feature of our approach is an adaptive strategy that adjusts the spatiotemporal domain of event data. To achieve this, we reconstruct event images from high-speed asynchronous streaming data via online learning. Additionally, we apply a Siamese network to extract features from event data. In contrast to earlier models that extract only hand-crafted features, our method provides a powerful feature description and a more flexible reconstruction strategy for event data. We assess our algorithm in three challenging scenarios: 6-DoF (six degrees of freedom), translation, and rotation. Unlike the fixed cameras of traditional object tracking tasks, all three scenarios involve simultaneous violent rotation and shaking of both the camera and the objects. Results from extensive experiments suggest that our approach achieves superior accuracy and robustness compared to other state-of-the-art methods, exhibiting a 30% increase in accuracy over other recent models without reducing time efficiency. Furthermore, the results indicate that event cameras are capable of robust object tracking in tasks that conventional cameras cannot adequately perform, especially super-fast motion tracking and challenging lighting situations.
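The spatiotemporal-window idea can be sketched roughly as follows (NumPy; the event tuple layout, the function names, and the proportional window-adjustment rule are illustrative assumptions, not EVtracker's actual reconstruction, which is learned online): events inside an adjustable time window are accumulated into a signed frame, and the window grows or shrinks to track a target event count:

```python
import numpy as np

def events_to_frame(events, width, height, t_ref, window):
    """Accumulate events falling inside (t_ref - window, t_ref]
    into a signed 2-D frame: +1 per ON event, -1 per OFF event."""
    frame = np.zeros((height, width))
    for t, x, y, polarity in events:
        if t_ref - window < t <= t_ref:
            frame[y, x] += 1.0 if polarity > 0 else -1.0
    return frame

def adapt_window(events, t_ref, window, target_count, gain=0.5):
    """Crude adaptive rule: shrink the window when it captures more
    events than target_count, grow it when it captures fewer."""
    count = sum(1 for t, *_ in events if t_ref - window < t <= t_ref)
    ratio = target_count / max(count, 1)
    return window * (1.0 + gain * (ratio - 1.0))
```

Fast motion (many events) then produces short, sharp windows, while slow motion integrates over longer spans, which is the adaptivity the abstract argues fixed frame rates cannot provide.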

3. Li J, Li J, Zhu L, Xiang X, Huang T, Tian Y. Asynchronous Spatio-Temporal Memory Network for Continuous Event-Based Object Detection. IEEE Trans Image Process 2022; 31:2975-2987. PMID: 35377848; DOI: 10.1109/TIP.2022.3162962.
Abstract
Event cameras, offering extremely high temporal resolution and high dynamic range, have brought a new perspective to common object detection challenges (e.g., motion blur and low light). However, how to learn a better spatio-temporal representation and exploit rich temporal cues from asynchronous events for object detection remains an open issue. To address this problem, we propose a novel asynchronous spatio-temporal memory network (ASTMNet) that directly consumes asynchronous events instead of pre-built event images, and can therefore detect objects in a continuous manner. Technically, ASTMNet learns an asynchronous attention embedding from the continuous event stream by adopting an adaptive temporal sampling strategy and a temporal attention convolutional module. In addition, a spatio-temporal memory module is designed to exploit rich temporal cues via a lightweight yet efficient interweaved recurrent-convolutional architecture. Empirically, our approach outperforms state-of-the-art feed-forward frame-based detectors by a large margin on three datasets (7.6% on the KITTI Simulated Dataset, 10.8% on the Gen1 Automotive Dataset, and 10.5% on the 1Mpx Detection Dataset). The results demonstrate that event cameras can perform robust object detection even in cases where conventional cameras fail, e.g., fast motion and challenging lighting conditions.
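Adaptive temporal sampling of this general kind can be illustrated with a small sketch (NumPy; the quantile-based binning and softmax scoring are stand-ins assumed for illustration, not ASTMNet's learned modules): busy periods of the stream get finer slices than quiet ones, and each slice can then be weighted by a temporal score:

```python
import numpy as np

def adaptive_slices(timestamps, n_slices):
    """Split an event stream into n_slices bins that each hold
    roughly the same number of events, so busy periods get finer
    temporal resolution than quiet ones.

    timestamps: sorted 1-D array of event times.
    Returns the slice boundary times (length n_slices + 1).
    """
    qs = np.linspace(0.0, 1.0, n_slices + 1)
    return np.quantile(timestamps, qs)

def slice_attention(counts):
    """Softmax over per-slice event counts, an illustrative
    stand-in for a learned temporal-attention score."""
    e = np.exp(counts - np.max(counts))
    return e / e.sum()
```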

4. Iyer LR, Chua Y, Li H. Is Neuromorphic MNIST Neuromorphic? Analyzing the Discriminative Power of Neuromorphic Datasets in the Time Domain. Front Neurosci 2021; 15:608567. PMID: 33841072; PMCID: PMC8027306; DOI: 10.3389/fnins.2021.608567.
Abstract
A major characteristic of spiking neural networks (SNNs) over conventional artificial neural networks (ANNs) is their ability to spike, enabling them to use spike timing for coding and efficient computing. In this paper, we assess whether neuromorphic datasets recorded from static images are able to evaluate the ability of SNNs to use spike timings in their calculations. We have analyzed N-MNIST, N-Caltech101, and DvsGesture along these lines, but focus our study on N-MNIST. First, we evaluate whether additional information is encoded in the time domain in a neuromorphic dataset. We show that an ANN trained with backpropagation on frame-based versions of N-MNIST and N-Caltech101 images achieves 99.23% and 78.01% accuracy, respectively. These results are comparable to the state of the art, showing that an algorithm that works purely on spatial data can classify these datasets. Second, we compare N-MNIST and DvsGesture on two STDP algorithms: RD-STDP, which can classify only spatial data, and STDP-tempotron, which classifies spatiotemporal data. We demonstrate that RD-STDP performs very well on N-MNIST, while STDP-tempotron performs better on DvsGesture. Since DvsGesture has a temporal dimension, it requires STDP-tempotron, while N-MNIST can be adequately classified by an algorithm that works on spatial data alone. This shows that precise spike timings are not important in N-MNIST; N-MNIST therefore does not highlight the ability of SNNs to classify temporal data. The conclusions of this paper open a question: what dataset can evaluate the ability of SNNs to classify temporal data?
Affiliation(s)
- Laxmi R. Iyer
- Neuromorphic Computing, Institute for Infocomm Research, A*STAR, Singapore, Singapore
- Yansong Chua
- Neuromorphic Computing, Institute for Infocomm Research, A*STAR, Singapore, Singapore
- Haizhou Li
- Neuromorphic Computing, Institute for Infocomm Research, A*STAR, Singapore, Singapore
- Huawei Technologies Co., Ltd., Shenzhen, China

5. Comparing SNNs and RNNs on neuromorphic vision datasets: Similarities and differences. Neural Netw 2020; 132:108-120. PMID: 32866745; DOI: 10.1016/j.neunet.2020.08.001.
Abstract
Neuromorphic data, which record frameless spike events, have attracted considerable attention for their spatiotemporal information components and event-driven processing fashion. Spiking neural networks (SNNs) represent a family of event-driven models with spatiotemporal dynamics for neuromorphic computing, and are widely benchmarked on neuromorphic data. Interestingly, researchers in the machine learning community can argue that recurrent (artificial) neural networks (RNNs) also have the capability to extract spatiotemporal features, although they are not event-driven. Thus, the question of what will happen if we benchmark these two kinds of models together on neuromorphic data arises, but remains open. In this work, we make a systematic study comparing SNNs and RNNs on neuromorphic data, taking vision datasets as a case study. First, we identify the similarities and differences between SNNs and RNNs (including vanilla RNNs and LSTM) from the modeling and learning perspectives. To improve comparability and fairness, we unify the supervised learning algorithm based on backpropagation through time (BPTT), the loss function exploiting the outputs at all timesteps, the network structure with stacked fully connected or convolutional layers, and the hyper-parameters during training. In particular, given the mainstream loss function used in RNNs, we modify it, inspired by the rate coding scheme, to approach that of SNNs. Furthermore, we tune the temporal resolution of the datasets to test model robustness and generalization. Finally, a series of comparative experiments is conducted on two types of neuromorphic datasets: DVS-converted (N-MNIST) and DVS-captured (DVS Gesture). Extensive insights regarding recognition accuracy, feature extraction, temporal resolution and contrast, learning generalization, computational complexity, and parameter volume are provided, which are beneficial for model selection on different workloads and even for the invention of novel neural models in the future.
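The modeling difference such a comparison rests on is visible in one timestep of each cell (a minimal sketch; the hard reset, the fixed leak factor tau, and the absence of a surrogate gradient are simplifications of the models the paper actually trains with BPTT):

```python
import numpy as np

def lif_step(u, x, W, tau=0.5, threshold=1.0):
    """One timestep of a discrete leaky integrate-and-fire layer:
    leak the membrane potential, add weighted input, emit binary
    spikes where the potential crosses threshold, then hard-reset
    the units that fired."""
    u = tau * u + W @ x
    s = (u >= threshold).astype(float)   # binary, event-like output
    u = u * (1.0 - s)                    # reset fired units
    return u, s

def rnn_step(h, x, W, U):
    """One timestep of a vanilla RNN cell for comparison: the same
    recurrent structure, but a smooth hidden state and no spikes."""
    return np.tanh(W @ x + U @ h)
```

Both cells carry state across timesteps, which is why they can be benchmarked under one BPTT setup; the SNN differs in its binary, reset-coupled output, which is what makes it event-driven.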

6. D'Angelo G, Janotte E, Schoepe T, O'Keeffe J, Milde MB, Chicca E, Bartolozzi C. Event-Based Eccentric Motion Detection Exploiting Time Difference Encoding. Front Neurosci 2020; 14:451. PMID: 32457575; PMCID: PMC7227134; DOI: 10.3389/fnins.2020.00451.
Abstract
Attentional selectivity tends to follow events that are considered interesting stimuli. Indeed, the motion of visual stimuli in the environment attracts our attention and allows us to react and interact with our surroundings. Extracting relevant motion information from the environment is challenging given the high information content of the visual input. In this work, we propose a novel integration between an eccentric down-sampling of the visual field, taking inspiration from the varying size of receptive fields (RFs) in the mammalian retina, and the Spiking Elementary Motion Detector (sEMD) model. We characterize the system's functionality with simulated data and real-world data collected with bio-inspired event-driven cameras, successfully implementing motion detection along the four cardinal directions and diagonally.
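The core of a time-difference motion detector can be sketched in a few lines (illustrative; the function name and the 1/|dt| speed proxy are assumptions, not the sEMD's actual spiking implementation): two neighboring pixels each report the time of their last event, and the sign and magnitude of the difference give direction and a speed estimate along that axis.

```python
def semd_direction(t_a, t_b):
    """Time-difference motion cue between two neighboring pixels
    (A to the left of B): if A spikes first, the edge moved A->B.

    Returns (direction, speed_proxy): direction is +1 for A->B,
    -1 for B->A, and 0 if a pixel did not fire or both fired at
    the same instant. The smaller the delay, the faster the motion.
    """
    if t_a is None or t_b is None:
        return 0, 0.0
    dt = t_b - t_a
    if dt == 0:
        return 0, 0.0
    direction = 1 if dt > 0 else -1
    return direction, 1.0 / abs(dt)
```

Running one such detector per axis (horizontal, vertical, and both diagonals) yields the four cardinal directions plus diagonal motion that the paper reports.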
Affiliation(s)
- Giulia D'Angelo
- Event Driven Perception for Robotics, Italian Institute of Technology, iCub Facility, Genoa, Italy
- Ella Janotte
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
- Thorben Schoepe
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
- James O'Keeffe
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Moritz B Milde
- International Centre for Neuromorphic Systems, The MARCS Institute, Western Sydney University, Sydney, NSW, Australia
- Elisabetta Chicca
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
- Chiara Bartolozzi
- Event Driven Perception for Robotics, Italian Institute of Technology, iCub Facility, Genoa, Italy

7. Cheng X, Ren Y, Cheng K, Cao J, Hao Q. Method for Training Convolutional Neural Networks for In Situ Plankton Image Recognition and Classification Based on the Mechanisms of the Human Eye. Sensors (Basel) 2020; 20:2592. PMID: 32370162; PMCID: PMC7248961; DOI: 10.3390/s20092592.
Abstract
In this study, we propose a method for training convolutional neural networks to identify and classify images with higher accuracy. The method describes each plankton image in both the Cartesian and polar coordinate systems, and we construct optimized classification and recognition networks, suitable for in situ plankton images, that exploit the advantages of both coordinate systems during training. The two feature vectors, derived from the different coordinate descriptions of each image, are fused and used as input to a conventional machine learning model for classification; support vector machines (SVMs) are selected as the classifiers to combine these two kinds of features. The accuracy of the proposed model was markedly higher than that of the initial classical convolutional neural networks on in situ plankton image data, with classification accuracy and recall rate increasing by 5.3% and 5.1%, respectively. In addition, the proposed training method improves classification performance considerably on the public CIFAR-10 dataset.
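A minimal sketch of the dual-coordinate description (NumPy; the nearest-neighbour polar resampling and the plain concatenation are assumptions for illustration, whereas the paper trains CNNs on each view before fusing their features):

```python
import numpy as np

def to_polar(img, n_r=16, n_theta=16):
    """Resample a square grayscale image onto a polar (r, theta)
    grid centred on the image middle, by nearest-neighbour lookup.
    Rotations of the image become shifts along the theta axis."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    out = np.zeros((n_r, n_theta))
    for i in range(n_r):
        for j in range(n_theta):
            r = r_max * (i + 0.5) / n_r
            th = 2 * np.pi * j / n_theta
            y = int(round(cy + r * np.sin(th)))
            x = int(round(cx + r * np.cos(th)))
            out[i, j] = img[y, x]
    return out

def fused_descriptor(img):
    """Concatenate the flattened Cartesian and polar views into one
    feature vector, e.g. as input for an SVM classifier."""
    return np.concatenate([img.ravel(), to_polar(img).ravel()])
```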
Affiliation(s)
- Xuemin Cheng (corresponding author)
- Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Yong Ren
- Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Kaichang Cheng
- Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Jie Cao
- School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
- Qun Hao
- School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China

8. Ramesh B, Ussa A, Della Vedova L, Yang H, Orchard G. Low-Power Dynamic Object Detection and Classification With Freely Moving Event Cameras. Front Neurosci 2020; 14:135. PMID: 32153357; PMCID: PMC7044237; DOI: 10.3389/fnins.2020.00135.
Abstract
We present the first purely event-based, energy-efficient approach for dynamic object detection and categorization with a freely moving event camera. Compared to systems based on traditional cameras, event-based object recognition systems are considerably behind in terms of accuracy and algorithmic maturity. To this end, this paper presents an event-based feature extraction method devised by accumulating local activity across the image frame and then applying principal component analysis (PCA) to the normalized neighborhood region. Subsequently, we propose a backtracking-free k-d tree mechanism for efficient feature matching that takes advantage of the low dimensionality of the feature representation. Additionally, the proposed k-d tree mechanism allows for feature selection, yielding a lower-dimensional object representation when hardware resources are too limited to implement PCA. Consequently, the proposed system can be realized on a field-programmable gate array (FPGA) device with a high performance-to-resource ratio. The proposed system is tested on real-world event-based datasets for object categorization, showing superior classification performance compared to state-of-the-art algorithms. Additionally, we verified the real-time FPGA performance of the proposed object detection method, trained with limited data as opposed to deep learning methods, under a closed-loop aerial vehicle flight mode. We also compare the proposed object categorization framework to pre-trained convolutional neural networks using transfer learning, and highlight the drawbacks of using frame-based sensors under dynamic camera motion. Finally, we provide critical insights into how the feature extraction method and the classification parameters affect system performance, which aids in adapting the framework to various low-power (less than a few watts) application scenarios.
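The descriptor pipeline, PCA over normalized neighborhood patches followed by nearest-neighbour matching, can be sketched as below (NumPy; a brute-force search stands in for the paper's backtracking-free k-d tree, which changes only the query cost, not the matching result, and the function names are assumptions):

```python
import numpy as np

def pca_basis(patches, k=4):
    """Principal components of a set of flattened, mean-normalised
    neighbourhood patches, via SVD; the k returned rows span the
    low-dimensional descriptor space."""
    X = patches - patches.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]

def describe(patch, basis, mean):
    """Project one patch onto the learned basis."""
    return basis @ (patch - mean)

def nearest_class(desc, train_descs, train_labels):
    """Brute-force nearest-neighbour match in descriptor space;
    a k-d tree would answer the same query faster."""
    d = np.linalg.norm(train_descs - desc, axis=1)
    return train_labels[int(np.argmin(d))]
```

Keeping only the top-k components is also where the feature selection mentioned in the abstract would act: dropping components trades accuracy for a cheaper FPGA implementation.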
Affiliation(s)
- Bharath Ramesh
- Life Science Institute, The N.1 Institute for Health, National University of Singapore, Singapore, Singapore; Temasek Laboratories, National University of Singapore, Singapore, Singapore
- Andrés Ussa
- Life Science Institute, The N.1 Institute for Health, National University of Singapore, Singapore, Singapore; Temasek Laboratories, National University of Singapore, Singapore, Singapore
- Luca Della Vedova
- Temasek Laboratories, National University of Singapore, Singapore, Singapore
- Hong Yang
- Temasek Laboratories, National University of Singapore, Singapore, Singapore
- Garrick Orchard
- Life Science Institute, The N.1 Institute for Health, National University of Singapore, Singapore, Singapore; Temasek Laboratories, National University of Singapore, Singapore, Singapore

9. Taherkhani A, Belatreche A, Li Y, Cosma G, Maguire LP, McGinnity TM. A review of learning in biologically plausible spiking neural networks. Neural Netw 2019; 122:253-272. PMID: 31726331; DOI: 10.1016/j.neunet.2019.09.036.
Abstract
Artificial neural networks have been used as a powerful processing tool in various areas such as pattern recognition, control, robotics, and bioinformatics. Their wide applicability has encouraged researchers to improve artificial neural networks by investigating the biological brain. Neurological research has progressed significantly in recent years and continues to reveal new characteristics of biological neurons. New technologies can now capture temporal changes in the internal activity of the brain in more detail and help clarify the relationship between brain activity and the perception of a given stimulus. This new knowledge has led to a new type of artificial neural network, the spiking neural network (SNN), that draws more faithfully on biological properties to provide higher processing abilities. This paper presents a review of recent developments in the learning of spiking neurons. First, the biological background of SNN learning algorithms is reviewed. The important elements of a learning algorithm, such as the neuron model, synaptic plasticity, information encoding, and SNN topologies, are then presented, followed by a critical review of the state-of-the-art learning algorithms for SNNs using single and multiple spikes. Additionally, deep spiking neural networks are reviewed, and challenges and opportunities in the SNN field are discussed.
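Among the synaptic plasticity rules such a review covers, pair-based spike-timing-dependent plasticity (STDP) is the canonical example; a minimal sketch (the parameter values here are illustrative defaults, not drawn from the review):

```python
import math

def stdp_update(w, dt, a_plus=0.05, a_minus=0.06,
                tau_plus=20.0, tau_minus=20.0,
                w_min=0.0, w_max=1.0):
    """Pair-based STDP for one pre/post spike pair.

    dt = t_post - t_pre (ms). Pre-before-post (dt > 0) potentiates
    the weight, post-before-pre (dt < 0) depresses it, both with an
    exponential dependence on the spike-time difference; the result
    is clipped to [w_min, w_max].
    """
    if dt > 0:
        w += a_plus * math.exp(-dt / tau_plus)
    elif dt < 0:
        w -= a_minus * math.exp(dt / tau_minus)
    return min(max(w, w_min), w_max)
```

The asymmetry between the potentiation and depression branches is what lets synapses become selective to causal input-output timing, the property the reviewed learning algorithms build on.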
Affiliation(s)
- Aboozar Taherkhani
- School of Computer Science and Informatics, Faculty of Computing, Engineering and Media, De Montfort University, Leicester, UK
- Ammar Belatreche
- Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, UK
- Yuhua Li
- School of Computer Science and Informatics, Cardiff University, Cardiff, UK
- Georgina Cosma
- Department of Computer Science, Loughborough University, Loughborough, UK
- Liam P Maguire
- Intelligent Systems Research Centre, Ulster University, Derry, Northern Ireland, UK
- T M McGinnity
- Intelligent Systems Research Centre, Ulster University, Derry, Northern Ireland, UK; School of Science and Technology, Nottingham Trent University, Nottingham, UK