1. Gao Y, Lu J, Li S, Li Y, Du S. Hypergraph-Based Multi-View Action Recognition Using Event Cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:6610-6622. [PMID: 38536691] [DOI: 10.1109/tpami.2024.3382117]
Abstract
Action recognition from video data forms a cornerstone with wide-ranging applications. Single-view action recognition faces limitations due to its reliance on a single viewpoint. In contrast, multi-view approaches capture complementary information from various viewpoints for improved accuracy. Recently, event cameras have emerged as innovative bio-inspired sensors, leading to advancements in event-based action recognition. However, existing works predominantly focus on single-view scenarios, leaving a gap in multi-view event data exploitation, particularly regarding challenges such as information deficit and semantic misalignment. To bridge this gap, we introduce HyperMV, a multi-view event-based action recognition framework. HyperMV converts discrete event data into frame-like representations and extracts view-related features using a shared convolutional network. By treating segments as vertices and constructing hyperedges using rule-based and KNN-based strategies, a multi-view hypergraph neural network that captures relationships across viewpoint and temporal features is established. A vertex attention hypergraph propagation scheme is also introduced for enhanced feature fusion. To prompt research in this area, we present the largest multi-view event-based action dataset, THUMV-EACT-50, comprising 50 actions from 6 viewpoints, which surpasses existing datasets by over tenfold. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds state-of-the-art methods in frame-based multi-view action recognition.
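As a rough illustration of the KNN-based hyperedge construction mentioned in the abstract, the sketch below builds a hypergraph incidence matrix from segment features; the shapes, variable names, and choice of k are illustrative, and it does not reproduce HyperMV's rule-based hyperedges or vertex-attention propagation.

```python
import torch

def knn_hyperedges(features: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Build a hypergraph incidence matrix H (N vertices x N hyperedges).

    Each vertex (here, one temporal segment from one viewpoint) spawns a
    hyperedge that connects it to its k nearest neighbours in feature space.
    """
    n = features.size(0)
    dist = torch.cdist(features, features)           # (N, N) pairwise distances
    knn = dist.topk(k + 1, largest=False).indices    # self plus k neighbours
    H = torch.zeros(n, n)
    for e in range(n):
        H[knn[e], e] = 1.0                           # vertices joined by hyperedge e
    return H

# Toy usage: 6 viewpoints x 8 temporal segments, 128-D features per segment.
feats = torch.randn(6 * 8, 128)
H = knn_hyperedges(feats, k=4)
print(H.shape)  # torch.Size([48, 48])
```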
2. Yu Q, Gao J, Wei J, Li J, Tan KC, Huang T. Improving Multispike Learning With Plastic Synaptic Delays. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:10254-10265. [PMID: 35442893] [DOI: 10.1109/tnnls.2022.3165527]
Abstract
Emulating the spike-based processing of the brain, spiking neural networks (SNNs) have been developed as a promising candidate for a new generation of artificial neural networks that aim to achieve efficient, brain-like cognition. Due to the complex dynamics and nonlinearity of SNNs, designing efficient learning algorithms has remained a major difficulty and attracts great research attention. Most existing algorithms focus on the adjustment of synaptic weights. However, other components, such as synaptic delays, are found to be adaptive and important in modulating neural behavior. How plasticity on different components could cooperate to improve the learning of SNNs remains an open question. Advancing our previous multispike learning, we propose a new joint weight-delay plasticity rule, named TDP-DL, in this article. Plastic delays are integrated into the learning framework, and as a result, the performance of multispike learning is significantly improved. Simulation results highlight the effectiveness and efficiency of our TDP-DL rule compared to baseline ones. Moreover, we reveal the underlying principle of how synaptic weights and delays cooperate with each other through a synthetic task of interval selectivity, and show that plastic delays can enhance the selectivity and flexibility of neurons by shifting information across time. Due to this capability, useful information distributed across the time domain can be effectively integrated for better accuracy, as highlighted in our generalization tasks of image, speech, and event-based object recognition. Our work is thus valuable for improving the performance of spike-based neuromorphic computing.
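The sketch below illustrates, under simplified assumptions, how a per-synapse integer delay can shift presynaptic spike trains in time before weighted integration; it shows the mechanism the abstract appeals to, not the TDP-DL learning rule itself, and all names and values are illustrative.

```python
import torch

def delayed_psp(spikes: torch.Tensor, weights: torch.Tensor,
                delays: torch.Tensor) -> torch.Tensor:
    """Weighted sum of presynaptic spike trains after per-synapse delays.

    spikes:  (T, N) binary spike trains from N presynaptic neurons.
    weights: (N,)   synaptic weights.
    delays:  (N,)   integer delays in time steps (the quantity made plastic
                    by joint weight-delay rules such as TDP-DL).
    Returns the postsynaptic drive at every time step, shape (T,).
    """
    T, N = spikes.shape
    shifted = torch.zeros_like(spikes)
    for i in range(N):
        d = int(delays[i])
        if d < T:
            shifted[d:, i] = spikes[: T - d, i]   # shift spikes later by d steps
    return shifted @ weights

# Toy usage: shifting spikes lets temporally scattered inputs coincide.
spikes = torch.bernoulli(torch.full((100, 5), 0.05))
weights = torch.rand(5)
delays = torch.tensor([0, 2, 4, 1, 3])
drive = delayed_psp(spikes, weights, delays)
print(drive.shape)  # torch.Size([100])
```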
3. Gao Y, Lu J, Li S, Ma N, Du S, Li Y, Dai Q. Action Recognition and Benchmark Using Event Cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:14081-14097. [PMID: 37527291] [DOI: 10.1109/tpami.2023.3300741]
Abstract
Recent years have witnessed remarkable achievements in video-based action recognition. Unlike traditional frame-based cameras, event cameras are bio-inspired vision sensors that only record pixel-wise brightness changes rather than absolute brightness values. However, little effort has been made in event-based action recognition, and large-scale public datasets are nearly unavailable. In this paper, we propose an event-based action recognition framework called EV-ACT. The Learnable Multi-Fused Representation (LMFR) is first proposed to integrate multiple types of event information in a learnable manner. The LMFR with dual temporal granularity is fed into an event-based slow-fast network for the fusion of appearance and motion features. A spatial-temporal attention mechanism is introduced to further enhance the learning capability of action recognition. To prompt research in this direction, we have collected the largest event-based action recognition benchmark, named THUE-ACT-50, and the accompanying THUE-ACT-50-CHL dataset captured in challenging environments, including a total of over 12,830 recordings from 50 action categories, which is over 4 times the size of the previous largest dataset. Experimental results show that our proposed framework achieves improvements of over 14.5%, 7.6%, 11.2%, and 7.4% compared to previous works on four benchmarks. We have also deployed the proposed EV-ACT framework on a mobile platform to validate its practicality and efficiency.
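Since LMFR itself is learnable and not specified in the abstract, the sketch below shows only the conventional fixed alternative it improves upon: accumulating an event stream into polarity-split, temporally binned frame-like tensors. All names and shapes are illustrative.

```python
import numpy as np

def events_to_frames(x, y, t, p, H, W, n_bins=8):
    """Accumulate an event stream into frame-like tensors.

    x, y, t, p: 1-D arrays of pixel coordinates, timestamps and polarities (0/1).
    Returns an array of shape (n_bins, 2, H, W): one 2-channel (ON/OFF) count
    image per temporal bin.  LMFR instead fuses several such representations
    with learned weights; this fixed histogram is only a stand-in.
    """
    frames = np.zeros((n_bins, 2, H, W), dtype=np.float32)
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    bins = np.minimum((t_norm * n_bins).astype(int), n_bins - 1)
    np.add.at(frames, (bins, p.astype(int), y, x), 1.0)
    return frames

# Toy usage with random events on a 128x128 sensor.
n = 10_000
frames = events_to_frames(
    x=np.random.randint(0, 128, n), y=np.random.randint(0, 128, n),
    t=np.sort(np.random.rand(n)), p=np.random.randint(0, 2, n),
    H=128, W=128)
print(frames.shape)  # (8, 2, 128, 128)
```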
4. Fang W, Chen Y, Ding J, Yu Z, Masquelier T, Chen D, Huang L, Zhou H, Li G, Tian Y. SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence. Science Advances 2023; 9:eadi1480. [PMID: 37801497] [PMCID: PMC10558124] [DOI: 10.1126/sciadv.adi1480]
Abstract
Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency by introducing neural dynamics and spike properties. As the emerging spiking deep learning paradigm attracts increasing interest, traditional programming frameworks cannot meet its demands for automatic differentiation, parallel computation acceleration, and tight integration of neuromorphic dataset processing and deployment. In this work, we present the SpikingJelly framework to address this dilemma. We contribute a full-stack toolkit for preprocessing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips. Compared to existing methods, the training of deep SNNs can be accelerated 11×, and the superior extensibility and flexibility of SpikingJelly enable users to accelerate custom models at low cost through multilevel inheritance and semiautomatic code generation. SpikingJelly paves the way for synthesizing truly energy-efficient SNN-based machine intelligence systems, which will enrich the ecology of neuromorphic computing.
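A minimal usage sketch is given below, assuming the activation_based module layout used in recent SpikingJelly releases (module paths differ in older versions); the layer sizes, time window, and firing-rate readout are illustrative choices, not the framework's prescribed recipe.

```python
import torch
import torch.nn as nn
from spikingjelly.activation_based import neuron, surrogate, functional

# A tiny SNN classifier: surrogate gradients make the LIF neurons trainable
# with ordinary back-propagation through time.
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    neuron.LIFNode(tau=2.0, surrogate_function=surrogate.ATan()),
    nn.Linear(256, 10),
    neuron.LIFNode(tau=2.0, surrogate_function=surrogate.ATan()),
)

T = 8                                  # number of simulation time steps
x = torch.rand(32, 1, 28, 28)          # a dummy image batch
out = 0.0
for _ in range(T):                     # present the same input at every step
    out = out + net(x)                 # accumulate output spikes
rate = out / T                         # firing-rate readout for classification
loss = nn.functional.cross_entropy(rate, torch.randint(0, 10, (32,)))
loss.backward()
functional.reset_net(net)              # reset membrane states between samples
print(rate.shape)  # torch.Size([32, 10])
```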
Affiliation(s)
- Wei Fang: School of Computer Science, Peking University, China; Peng Cheng Laboratory, China; School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, China.
- Yanqi Chen: School of Computer Science, Peking University, China; Peng Cheng Laboratory, China.
- Jianhao Ding: School of Computer Science, Peking University, China.
- Zhaofei Yu: Institute for Artificial Intelligence, Peking University, China.
- Timothée Masquelier: Centre de Recherche Cerveau et Cognition (CERCO), UMR5549 CNRS–Université Toulouse 3, France.
- Ding Chen: Peng Cheng Laboratory, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, China.
- Liwei Huang: School of Computer Science, Peking University, China; Peng Cheng Laboratory, China.
- Guoqi Li: Institute of Automation, Chinese Academy of Sciences, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China.
- Yonghong Tian: School of Computer Science, Peking University, China; Peng Cheng Laboratory, China; School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, China.
5. Liu S, Leung VCH, Dragotti PL. First-spike coding promotes accurate and efficient spiking neural networks for discrete events with rich temporal structures. Frontiers in Neuroscience 2023; 17:1266003. [PMID: 37849889] [PMCID: PMC10577212] [DOI: 10.3389/fnins.2023.1266003]
Abstract
Spiking neural networks (SNNs) are well-suited to processing asynchronous event-based data. Most existing SNNs use rate-coding schemes that focus on the firing rate (FR), and so they generally ignore the spike timing in events. In contrast, methods based on temporal coding, particularly time-to-first-spike (TTFS) coding, can be accurate and efficient but are difficult to train. Currently, there is limited research on applying TTFS coding to real events, since traditional TTFS-based methods impose a one-spike constraint, which is not realistic for event-based data. In this study, we present a novel decision-making strategy based on first-spike (FS) coding that encodes the FS timings of the output neurons to investigate the role of first-spike timing in classifying real-world event sequences with complex temporal structures. To achieve FS coding, we propose a novel surrogate gradient learning method for discrete spike trains. In the forward pass, output spikes are encoded into discrete times to generate FS times. In backpropagation, we develop an error assignment method that propagates error from FS times to spikes through a Gaussian window, and supervised learning for spikes is then implemented through a surrogate gradient approach. Additional strategies are introduced to facilitate the training of FS timings, such as adding empty sequences and employing different parameters for different layers. We make a comprehensive comparison between FS and FR coding in our experiments. Our results show that FS coding achieves comparable accuracy to FR coding while leading to superior energy efficiency and distinct neuronal dynamics on data sequences with very rich temporal structures. Additionally, a longer time delay in the first spike leads to higher accuracy, indicating that important information is encoded in the timing of the first spike.
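The sketch below shows a bare-bones first-spike readout, i.e., predicting the class whose output neuron fires earliest. It covers only the decision step, not the paper's surrogate-gradient training or Gaussian error window, and the padding and tie-breaking choices are illustrative.

```python
import torch

def first_spike_decision(out_spikes: torch.Tensor) -> torch.Tensor:
    """Pick the class whose output neuron fires first.

    out_spikes: (T, B, C) binary output spikes over T time steps.
    Neurons that never fire are assigned time T (i.e. 'last').
    Returns predicted class indices of shape (B,).
    """
    T = out_spikes.size(0)
    t_idx = torch.arange(T, dtype=torch.float32).view(T, 1, 1)
    # Time of each spike; silent positions become +inf so they never win.
    spike_times = torch.where(out_spikes > 0, t_idx.expand_as(out_spikes),
                              torch.full_like(out_spikes, float("inf")))
    first_times = spike_times.min(dim=0).values          # (B, C) earliest spike per class
    first_times = torch.where(torch.isinf(first_times),
                              torch.full_like(first_times, float(T)), first_times)
    return first_times.argmin(dim=1)                     # earliest-firing class wins

# Toy usage.
spikes = (torch.rand(20, 4, 10) > 0.97).float()
print(first_spike_decision(spikes))  # tensor of 4 predicted labels
```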
Affiliation(s)
- Siying Liu: Communications and Signal Processing Group, Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom.
6. Yao M, Zhang H, Zhao G, Zhang X, Wang D, Cao G, Li G. Sparser spiking activity can be better: Feature Refine-and-Mask spiking neural network for event-based visual recognition. Neural Networks 2023; 166:410-423. [PMID: 37549609] [DOI: 10.1016/j.neunet.2023.07.008]
Abstract
Event-based vision, a new visual paradigm with bio-inspired dynamic perception and microsecond-level temporal resolution, has prominent advantages in many specific visual scenarios and has gained much research interest. Spiking neural networks (SNNs) are naturally suited to dealing with event streams due to their temporal information processing capability and event-driven nature. However, existing SNN works neglect the fact that input event streams are spatially sparse and temporally non-uniform, and simply treat these variant inputs equally. This interferes with the effectiveness and efficiency of existing SNNs. In this paper, we propose the feature Refine-and-Mask SNN (RM-SNN), which can self-adaptively regulate the spiking response in a data-dependent way. We use the Refine-and-Mask (RM) module to refine all features and mask the unimportant ones in order to optimize the membrane potential of spiking neurons, which in turn reduces spiking activity. Inspired by the fact that not all events in spatio-temporal streams are task-relevant, we apply the RM module in both the temporal and channel dimensions. Extensive experiments on seven event-based benchmarks (DVS128 Gesture, DVS128 Gait, CIFAR10-DVS, N-Caltech101, DailyAction-DVS, UCF101-DVS, and HMDB51-DVS) demonstrate that, under multi-scale constraints of the input time window, RM-SNN can significantly reduce the network's average spiking activity rate while improving task performance. In addition, by visualizing spiking responses, we analyze why sparser spiking activity can be better.
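The sketch below gives a generic data-dependent channel-gating layer in the spirit of refine-and-mask: channels are scored from global statistics, and low-scoring channels are suppressed so the downstream spiking neurons receive less input current. It is an illustrative stand-in, not the paper's RM module; the bottleneck size and threshold are assumptions.

```python
import torch
import torch.nn as nn

class ChannelMask(nn.Module):
    """Data-dependent channel gating, in the spirit of refine-and-mask.

    A small bottleneck MLP scores each channel from its global average;
    low-scoring channels are suppressed, which lowers the input current to
    the following spiking neurons and therefore their spiking activity.
    """
    def __init__(self, channels: int, reduction: int = 4, threshold: float = 0.3):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, H, W)
        s = self.score(x.mean(dim=(2, 3)))                 # (B, C) channel scores
        mask = (s > self.threshold).float() * s            # refine (scale) and mask
        return x * mask.unsqueeze(-1).unsqueeze(-1)

# Toy usage on one event-frame feature map.
feat = torch.randn(2, 32, 16, 16)
print(ChannelMask(32)(feat).shape)  # torch.Size([2, 32, 16, 16])
```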
Affiliation(s)
- Man Yao: School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China; Peng Cheng Laboratory, Shenzhen 518000, China.
- Hengyu Zhang: School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China; Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518000, China.
- Guangshe Zhao: School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China.
- Xiyu Zhang: School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China.
- Dingheng Wang: Northwest Institute of Mechanical & Electrical Engineering, Xianyang, Shaanxi, China.
- Gang Cao: Beijing Academy of Artificial Intelligence, Beijing 100089, China.
- Guoqi Li: Peng Cheng Laboratory, Shenzhen 518000, China; Institute of Automation, Chinese Academy of Sciences, Beijing 100089, China.
7. Yi Z, Lian J, Liu Q, Zhu H, Liang D, Liu J. Learning Rules in Spiking Neural Networks: A Survey. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.02.026]
8. Zheng Y, Yu Z, Wang S, Huang T. Spike-based Motion Estimation for Object Tracking through Bio-inspired Unsupervised Learning. IEEE Transactions on Image Processing 2022; PP:335-349. [PMID: 37015554] [DOI: 10.1109/tip.2022.3228168]
Abstract
Neuromorphic vision sensors, whose pixels output events/spikes asynchronously with a high temporal resolution according to changes in scene radiance, are naturally appropriate for capturing high-speed motion in scenes. However, how to utilize the events/spikes to smoothly track high-speed moving objects remains a challenging problem. Existing approaches either employ time-consuming iterative optimization or require large amounts of labeled data to train an object detector. To this end, we propose a bio-inspired unsupervised learning framework that takes advantage of the spatiotemporal information of events/spikes generated by neuromorphic vision sensors to capture intrinsic motion patterns. Without offline training, our models can filter redundant signals with a dynamic adaptation module based on short-term plasticity, and extract motion patterns with a motion estimation module based on spike-timing-dependent plasticity. Combined with the spatiotemporal and motion information of the filtered spike stream, the traditional DBSCAN clustering algorithm and Kalman filter can effectively track multiple targets in extreme scenes. We evaluate the proposed unsupervised framework on object detection and tracking tasks using synthetic data, publicly available event-based datasets, and spiking camera datasets. The experimental results show that the proposed model can robustly detect and smoothly track moving targets in various challenging scenarios and outperforms state-of-the-art approaches.
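Only the conventional back-end named in the abstract is sketched below: DBSCAN clustering of (x, y, t) events followed by constant-velocity Kalman smoothing of a cluster centroid. The bio-inspired filtering and motion-estimation modules are not reproduced, and the eps, min_samples, and noise covariances are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_centroids(events_xyt: np.ndarray, eps=5.0, min_samples=20):
    """Cluster (x, y, t) events and return one centroid per detected object."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(events_xyt).labels_
    return [events_xyt[labels == k, :2].mean(axis=0)
            for k in set(labels) if k != -1]             # -1 is DBSCAN noise

class ConstantVelocityKalman:
    """Minimal constant-velocity Kalman filter for a 2-D centroid track."""
    def __init__(self, x0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0[0], x0[1], 0.0, 0.0])      # [px, py, vx, vy]
        self.P = np.eye(4)
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
        self.Q, self.R = q * np.eye(4), r * np.eye(2)

    def step(self, z):
        self.x = self.F @ self.x                          # predict
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R           # update with measurement z
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                                 # smoothed position

# Toy usage: smooth a centroid measurement sequence with the Kalman filter.
kf = ConstantVelocityKalman(x0=(10.0, 10.0))
print(kf.step((11.0, 10.4)), kf.step((12.1, 10.9)))
```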
9. Spike-train level supervised learning algorithm based on bidirectional modification for liquid state machines. Applied Intelligence 2022. [DOI: 10.1007/s10489-022-04152-5]
10. Dong J, Jiang R, Xiao R, Yan R, Tang H. Event stream learning using spatio-temporal event surface. Neural Networks 2022; 154:543-559. [DOI: 10.1016/j.neunet.2022.07.010]
11. Kim Y, Panda P. Optimizing Deeper Spiking Neural Networks for Dynamic Vision Sensing. Neural Networks 2021; 144:686-698. [PMID: 34662827] [DOI: 10.1016/j.neunet.2021.09.022]
Abstract
Spiking Neural Networks (SNNs) have recently emerged as a new generation of low-power deep neural networks due to their sparse, asynchronous, and binary event-driven processing. Most previous deep SNN optimization methods focus on static datasets (e.g., MNIST) from conventional frame-based cameras. On the other hand, optimization techniques for event data from Dynamic Vision Sensor (DVS) cameras are still in their infancy. Most prior SNN techniques handling DVS data are limited to shallow networks and thus show low performance. Generally, we observe that the integrate-and-fire behavior of spiking neurons diminishes spike activity in deeper layers. The sparse spike activity results in a sub-optimal solution during training (i.e., performance degradation). To address this limitation, we propose novel algorithmic and architectural advances to accelerate the training of very deep SNNs on DVS data. Specifically, we propose Spike Activation Lift Training (SALT), which increases spike activity across all layers by optimizing both weights and thresholds in convolutional layers. After applying SALT, we train the weights based on the cross-entropy loss. SALT helps the networks convey ample information across all layers during training and therefore improves performance. Furthermore, we propose a simple and effective architecture, called Switched-BN, which exploits Batch Normalization (BN). Previous methods show that standard BN is incompatible with the temporal dynamics of SNNs. Therefore, in the Switched-BN architecture, we apply BN to the last layer of an SNN after accumulating all the spikes from the previous layer with a spike voltage accumulator (i.e., converting temporal spike information to a float value). Even though we apply BN in just one layer of the SNN, our results demonstrate a considerable performance gain without any significant computational overhead. Through extensive experiments, we show the effectiveness of SALT and Switched-BN for training very deep SNNs from scratch on various benchmarks, including DVS-CIFAR10, N-Caltech, DHP19, CIFAR10, and CIFAR100. To the best of our knowledge, this is the first work showing state-of-the-art performance with deep SNNs on DVS data.
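A minimal sketch of the Switched-BN idea as described above is given below: spikes from the penultimate layer are accumulated into float values over time, and Batch Normalization is applied once before the final classifier. The SNN body is omitted, and the class name and feature sizes are hypothetical.

```python
import torch
import torch.nn as nn

class SpikeAccumulatorHead(nn.Module):
    """Accumulate spikes over time, then apply BatchNorm once before the classifier.

    This mirrors the Switched-BN description: BN is only used after temporal
    spike information has been converted to a float value, sidestepping the
    incompatibility of per-step BN with spiking dynamics.
    """
    def __init__(self, features: int, num_classes: int):
        super().__init__()
        self.bn = nn.BatchNorm1d(features)
        self.fc = nn.Linear(features, num_classes)

    def forward(self, spikes_t: torch.Tensor) -> torch.Tensor:  # (T, B, features)
        acc = spikes_t.sum(dim=0)       # spike-count (float) accumulation over time
        return self.fc(self.bn(acc))

# Toy usage with binary spikes from a hypothetical penultimate SNN layer.
spikes = (torch.rand(10, 8, 256) > 0.8).float()   # T=10, batch=8, 256 features
logits = SpikeAccumulatorHead(256, 10)(spikes)
print(logits.shape)  # torch.Size([8, 10])
```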
Affiliation(s)
- Youngeun Kim: Department of Electrical Engineering, Yale University, New Haven, CT, USA.
12. Lele A, Fang Y, Ting J, Raychowdhury A. An End-to-end Spiking Neural Network Platform for Edge Robotics: From Event-Cameras to Central Pattern Generation. IEEE Transactions on Cognitive and Developmental Systems 2021. [DOI: 10.1109/tcds.2021.3097675]