1. Lei T, Guan B, Liang M, Liu Z, Liu J, Shang Y, Yu Q. Motion measurements of explosive shock waves based on an event camera. Optics Express 2024; 32:15390-15409. [PMID: 38859191] [DOI: 10.1364/oe.506662]
Abstract
Shock wave measurement is vital for assessing explosive power and designing warheads. To obtain satisfactory observation data of explosive shock waves, optical sensors should ideally offer both high dynamic range and high temporal resolution. In this paper, an event camera is employed for the first time to observe explosive shock waves, leveraging its high dynamic range and low latency. A comprehensive procedure is devised to measure the motion parameters of shock waves accurately. First, a plane-lines-based calibration method is proposed to compute the calibration parameters of the event camera, exploiting the camera's edge-sensitive characteristic. Then, the fitted ellipse parameters of the shock wave are estimated from a concise set of events, obtained by exploiting the characteristics of event triggering and the shock wave's morphology. Finally, the geometric relationship between the ellipse parameters and the radius of the shock wave is derived, and the motion parameters of the shock wave are estimated. To verify the performance of our method, we compare our measurement results in a TNT explosion test with pressure sensor readings and empirical formula predictions. The relative measurement error with respect to the pressure sensors ranges from 0.33% to 7.58%. The experimental results verify the rationality and effectiveness of our methods.
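
The ellipse-fitting step at the heart of this pipeline is easy to illustrate. Below is a minimal sketch (not the authors' implementation) that fits a conic to the (x, y) coordinates of events assumed to lie on the shock front and recovers the ellipse center; the synthetic event data and all parameter values are assumptions made for the example.

```python
import numpy as np

def fit_ellipse(x, y):
    """Least-squares conic fit a*x^2 + b*x*y + c*y^2 + d*x + e*y = 1."""
    D = np.column_stack([x**2, x * y, y**2, x, y])
    coeffs, *_ = np.linalg.lstsq(D, np.ones_like(x), rcond=None)
    return coeffs  # (a, b, c, d, e)

def ellipse_center(coeffs):
    """Center is where the conic's gradient vanishes: solve a 2x2 linear system."""
    a, b, c, d, e = coeffs
    M = np.array([[2 * a, b], [b, 2 * c]])
    return np.linalg.solve(M, -np.array([d, e]))

# Synthetic "shock front": noisy samples of an ellipse (assumed data).
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 500)
x = 320 + 120 * np.cos(t) + rng.normal(0, 1.5, t.size)
y = 240 + 80 * np.sin(t) + rng.normal(0, 1.5, t.size)
print("estimated center:", ellipse_center(fit_ellipse(x, y)))  # ~(320, 240)
```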

2. Alkendi Y, Azzam R, Ayyad A, Javed S, Seneviratne L, Zweiri Y. Neuromorphic Camera Denoising Using Graph Neural Network-Driven Transformers. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:4110-4124. [PMID: 36107888] [DOI: 10.1109/tnnls.2022.3201830]
Abstract
Neuromorphic vision is a bio-inspired technology that has triggered a paradigm shift in the computer vision community and is serving as a key enabler for a wide range of applications. This technology offers significant advantages, including reduced power consumption, reduced processing needs, and communication speedups. However, neuromorphic cameras suffer from significant amounts of measurement noise, which deteriorates the performance of neuromorphic event-based perception and navigation algorithms. In this article, we propose a novel noise filtration algorithm to eliminate events that do not represent real log-intensity variations in the observed scene. We employ a graph neural network (GNN)-driven transformer algorithm, called GNN-Transformer, to classify every active event pixel in the raw stream as either a real log-intensity variation or noise. Within the GNN, a message-passing framework referred to as EventConv captures the spatiotemporal correlation among events while preserving their asynchronous nature. We also introduce the known-object ground-truth labeling (KoGTL) approach for generating approximate ground-truth labels of event streams under various illumination conditions. KoGTL is used to generate labeled datasets from experiments recorded in challenging lighting conditions, including moonlight. These datasets are used to train and extensively test our proposed algorithm. When tested on unseen datasets, the proposed algorithm outperforms state-of-the-art methods by at least 8.8% in terms of filtration accuracy. Additional tests on publicly available datasets (the ETH Zürich Color-DAVIS346 datasets) demonstrate the generalization capabilities of the proposed algorithm in the presence of illumination variations and different motion dynamics. Compared with state-of-the-art solutions, qualitative results verify the superior capability of the proposed algorithm to eliminate noise while preserving meaningful events in the scene.
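
To make the EventConv idea concrete, the following sketch builds the kind of causal spatiotemporal neighbourhood graph such a network operates on and runs one hand-written mean-aggregation pass over it. The radius, time window, and feature choices are illustrative assumptions, not the paper's learned EventConv.

```python
import numpy as np

def build_event_graph(events, r_px=3.0, dt_us=2000.0):
    """Connect each event to earlier events within a space-time neighbourhood.

    `events` is an (N, 4) float array of (x, y, t_us, polarity) sorted by t.
    Returns one array of neighbour indices per event (causal edges only).
    """
    xy, t = events[:, :2], events[:, 2]
    neighbours = []
    for i in range(len(events)):
        lo = np.searchsorted(t, t[i] - dt_us)     # oldest event inside the window
        cand = np.arange(lo, i)                   # only strictly earlier events
        d = np.linalg.norm(xy[cand] - xy[i], axis=1)
        neighbours.append(cand[d <= r_px])
    return neighbours

def message_pass(events, neighbours):
    """One mean-aggregation step over the graph; a hand-written stand-in for a
    learned EventConv layer (polarity is used as the initial node feature)."""
    feats = events[:, 3:4].copy()
    out = np.zeros_like(feats)
    for i, nb in enumerate(neighbours):
        agg = feats[nb].mean() if len(nb) else 0.0
        out[i] = feats[i] + agg                   # self term + aggregated message
    return out

rng = np.random.default_rng(1)
ev = np.column_stack([rng.integers(0, 64, 200), rng.integers(0, 64, 200),
                      np.sort(rng.integers(0, 1_000_000, 200)),
                      rng.choice([-1, 1], 200)]).astype(float)
nb = build_event_graph(ev)
print("mean degree:", np.mean([len(n) for n in nb]))
print("first outputs:", message_pass(ev, nb)[:3].ravel())
```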

3. Adhuran J, Khan N, Martini MG. Lossless Encoding of Time-Aggregated Neuromorphic Vision Sensor Data Based on Point-Cloud Compression. Sensors (Basel) 2024; 24:1382. [PMID: 38474918] [DOI: 10.3390/s24051382]
Abstract
Neuromorphic Vision Sensors (NVSs) are emerging sensors that acquire visual information asynchronously when changes occur in the scene. Their advantages over synchronous capture (frame-based video) include low power consumption, high dynamic range, extremely high temporal resolution, and lower data rates. Although the acquisition strategy already results in much lower data rates than conventional video, NVS data can be compressed further. For this purpose, we recently proposed Time Aggregation-based Lossless Video Encoding for Neuromorphic Vision Sensor Data (TALVEN), which consists of time aggregation of NVS events in the form of pixel-based event histograms, arrangement of the data in a specific format, and lossless compression inspired by video encoding. In this paper, we again leverage time aggregation but, rather than performing encoding inspired by frame-based video coding, we encode an appropriate representation of the time-aggregated data via point-cloud compression (similar to one of our previous works in which time aggregation was not used). The proposed strategy, Time-Aggregated Lossless Encoding of Events based on Point-Cloud Compression (TALEN-PCC), outperforms the original TALVEN encoding strategy on the considered dataset. The gain in compression ratio is highest for low-event-rate, low-complexity scenes, whereas the improvement is minimal for high-complexity, high-event-rate scenes. Experiments on outdoor and indoor spike event data show that TALEN-PCC achieves higher compression gains for time aggregation intervals longer than 5 ms, but lower gains than state-of-the-art approaches for intervals shorter than 5 ms.
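
The time-aggregation step shared by TALVEN and TALEN-PCC can be sketched in a few lines: events are binned into per-pixel, per-polarity count images, which can then be handed to a point-cloud codec (not shown here). The sensor size and interval below are assumptions for the example.

```python
import numpy as np

def aggregate_events(events, interval_us, width, height):
    """Bin events into per-pixel, per-polarity count images per time interval.

    `events` is an (N, 4) int array of (x, y, t_us, polarity). The result has
    shape (n_intervals, 2, height, width); the point-cloud codec that would
    compress these histograms is outside the scope of this sketch.
    """
    t0 = events[:, 2].min()
    b = (events[:, 2] - t0) // interval_us                 # interval index
    hists = np.zeros((int(b.max()) + 1, 2, height, width), dtype=np.uint16)
    p = (events[:, 3] > 0).astype(int)                     # 0 = neg, 1 = pos
    np.add.at(hists, (b, p, events[:, 1], events[:, 0]), 1)
    return hists

def to_point_cloud(hist):
    """One polarity histogram as (x, y, count) points, ready for a PCC codec."""
    ys, xs = np.nonzero(hist)
    return np.column_stack([xs, ys, hist[ys, xs]])

rng = np.random.default_rng(2)
n = 10_000
ev = np.column_stack([rng.integers(0, 346, n), rng.integers(0, 260, n),
                      rng.integers(0, 100_000, n), rng.choice([-1, 1], n)])
hists = aggregate_events(ev, interval_us=5_000, width=346, height=260)
print(hists.shape, to_point_cloud(hists[0, 1]).shape)
```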

Affiliation(s)
- Jayasingam Adhuran
- Faculty of Engineering, Computing, and the Environment, Kingston University London, Penrhyn Rd., Kingston upon Thames KT1 2EE, UK
- Nabeel Khan
- Department of Computer Science, University of Chester, Parkgate Road, Chester CH1 4BJ, UK
- Maria G Martini
- Faculty of Engineering, Computing, and the Environment, Kingston University London, Penrhyn Rd., Kingston upon Thames KT1 2EE, UK

4. Huang X, Kachole S, Ayyad A, Naeini FB, Makris D, Zweiri Y. A neuromorphic dataset for tabletop object segmentation in indoor cluttered environment. Sci Data 2024; 11:127. [PMID: 38272894] [PMCID: PMC10810887] [DOI: 10.1038/s41597-024-02920-1]
Abstract
Event-based cameras are commonly leveraged to mitigate issues such as motion blur, low dynamic range, and limited time sampling, which plague conventional cameras. However, there is a lack of dedicated event-based datasets for benchmarking segmentation algorithms, especially ones offering the critical depth information needed for occluded scenes. In response, this paper introduces a novel Event-based Segmentation Dataset (ESD), a high-quality 3D spatiotemporal event dataset designed for indoor object segmentation in cluttered environments. ESD encompasses 145 sequences featuring 14,166 manually annotated RGB frames, along with 21.88 million and 20.80 million events from two stereo-configured event-based cameras. Notably, this densely annotated 3D spatiotemporal event-based segmentation benchmark for tabletop objects is a pioneering initiative, providing event-wise depth and annotated instance labels in addition to corresponding RGBD frames. By releasing ESD, we aim to offer the research community a challenging segmentation benchmark of exceptional quality.

Affiliation(s)
- Xiaoqian Huang
- Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE
- Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University, Abu Dhabi, UAE
- Sanket Kachole
- School of Computer Science and Mathematics, Kingston University, London, UK
- Abdulla Ayyad
- Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE
- Dimitrios Makris
- School of Computer Science and Mathematics, Kingston University, London, UK
- Yahya Zweiri
- Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE
- Department of Aerospace Engineering, Khalifa University, Abu Dhabi, UAE

5. Lu Z, Chen X, Chung VYY, Cai W, Shen Y. EV-LFV: Synthesizing Light Field Event Streams from an Event Camera and Multiple RGB Cameras. IEEE Transactions on Visualization and Computer Graphics 2023; 29:4546-4555. [PMID: 37788211] [DOI: 10.1109/tvcg.2023.3320271]
Abstract
Light field videos captured in RGB frames (RGB-LFV) can provide users with a 6 degree-of-freedom immersive video experience by capturing dense multi-subview video. Despite its potential benefits, the processing of dense multi-subview video is extremely resource-intensive, which currently limits the frame rate of RGB-LFV (i.e., lower than 30 fps) and results in blurred frames when capturing fast motion. To address this issue, we propose leveraging event cameras, which provide high temporal resolution for capturing fast motion. However, the cost of current event camera models makes it prohibitive to use multiple event cameras for RGB-LFV platforms. Therefore, we propose EV-LFV, an event synthesis framework that generates full multi-subview event-based RGB-LFV with only one event camera and multiple traditional RGB cameras. EV-LFV utilizes spatial-angular convolution, ConvLSTM, and Transformer to model RGB-LFV's angular features, temporal features, and long-range dependency, respectively, to effectively synthesize event streams for RGB-LFV. To train EV-LFV, we construct the first event-to-LFV dataset consisting of 200 RGB-LFV sequences with ground-truth event streams. Experimental results demonstrate that EV-LFV outperforms state-of-the-art event synthesis methods for generating event-based RGB-LFV, effectively alleviating motion blur in the reconstructed RGB-LFV.

6. Tang S, Lv H, Zhao Y, Feng Y, Liu H, Bi G. Denoising Method Based on Salient Region Recognition for the Spatiotemporal Event Stream. Sensors (Basel) 2023; 23:6655. [PMID: 37571439] [PMCID: PMC10422208] [DOI: 10.3390/s23156655]
Abstract
Event cameras, also known as dynamic vision sensors, are emerging biomimetic sensors with microsecond-level responsiveness. Because event camera hardware is inherently sensitive to light sources and subject to interference from various external factors, various types of noise are inevitably present in the camera's output. This noise degrades the camera's perception of events and the performance of algorithms that process event streams. Moreover, since the output of event cameras is in the form of an address-event representation, efficient denoising methods for traditional frame images are no longer applicable. Most existing denoising methods for event cameras target background activity noise and sometimes remove real events as noise. Furthermore, these methods are ineffective against noise generated by high-frequency flickering light sources and changes in diffuse reflection. To address these issues, we propose an event-stream denoising method based on salient region recognition. This method can effectively remove conventional background activity noise as well as irregular noise caused by diffuse reflection and flickering light sources, without significant loss of real events. Additionally, we introduce an evaluation metric for assessing how well different denoising methods remove noise while preserving real events.

Affiliation(s)
- Sichao Tang
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- Hengyi Lv
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yuchen Zhao
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yang Feng
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Hailong Liu
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Guoling Bi
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China

7. Liang L, Pei H. Affine Iterative Closest Point Algorithm Based on Color Information and Correntropy for Precise Point Set Registration. Sensors (Basel) 2023; 23:6475. [PMID: 37514769] [PMCID: PMC10383488] [DOI: 10.3390/s23146475]
Abstract
In this paper, we propose a novel affine iterative closest point algorithm based on color information and correntropy, which can effectively handle registration problems involving large amounts of noise and outliers and small deformations in RGB-D datasets. First, to alleviate the low registration accuracy on data with weak geometric structure, we introduce color features into the traditional affine algorithm to establish more accurate and reliable correspondences. Second, we introduce the correntropy measure to overcome the influence of the abundant noise and outliers in RGB-D datasets, further improving registration accuracy. Experimental results demonstrate that the proposed registration algorithm achieves higher registration accuracy, reducing error by almost a factor of ten, and is more robust than other advanced algorithms.
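
A minimal sketch of the key idea, assuming SciPy is available: a correntropy (Welsch-type) kernel turns each ICP iteration into a weighted least-squares problem, so gross outliers are down-weighted automatically. This toy uses 2D geometry only and omits the paper's colour term; all parameter values and data are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def correntropy_weights(residuals, sigma):
    """Welsch/correntropy weights: large residuals are smoothly down-weighted."""
    return np.exp(-residuals**2 / (2 * sigma**2))

def affine_icp_step(src, dst, sigma=0.1):
    """One correntropy-weighted affine ICP iteration (geometry only)."""
    d, idx = cKDTree(dst).query(src)               # closest-point correspondences
    w = np.sqrt(correntropy_weights(d, sigma))[:, None]
    X = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coordinates
    M, *_ = np.linalg.lstsq(w * X, w * dst[idx], rcond=None)
    A, t = M[:-1].T, M[-1]
    return src @ A.T + t, A, t

rng = np.random.default_rng(3)
src = rng.uniform(-1, 1, (300, 2))
A_true = np.array([[1.05, 0.10], [-0.08, 0.95]])
dst = src @ A_true.T + np.array([0.2, -0.1])
dst[:30] += rng.uniform(-3, 3, (30, 2))            # gross outliers

A_tot, t_tot, cur = np.eye(2), np.zeros(2), src
for _ in range(20):                                # accumulate the per-step affines
    cur, A, t = affine_icp_step(cur, dst)
    A_tot, t_tot = A @ A_tot, A @ t_tot + t
print("A estimate:\n", A_tot)                      # should approach A_true
print("t estimate:", t_tot)                        # should approach (0.2, -0.1)
```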

Affiliation(s)
- Lexian Liang
- Key Laboratory of Autonomous Systems and Networked Control, Ministry of Education, Unmanned Aerial Vehicle Systems Engineering Technology Research Center of Guangdong, South China University of Technology, Guangzhou 510640, China
- Hailong Pei
- Key Laboratory of Autonomous Systems and Networked Control, Ministry of Education, Unmanned Aerial Vehicle Systems Engineering Technology Research Center of Guangdong, South China University of Technology, Guangzhou 510640, China

8. Li J, Fu Y, Dong S, Yu Z, Huang T, Tian Y. Asynchronous Spatiotemporal Spike Metric for Event Cameras. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:1742-1753. [PMID: 33684047] [DOI: 10.1109/tnnls.2021.3061122]
Abstract
Event cameras as bioinspired vision sensors have shown great advantages in high dynamic range and high temporal resolution in vision tasks. Asynchronous spikes from event cameras can be depicted using marked spatiotemporal point processes (MSTPPs). However, how to measure the distance between asynchronous spikes in MSTPPs remains an open issue. To address this problem, we propose a general asynchronous spatiotemporal spike metric that considers both spatiotemporal structural properties and polarity attributes for event cameras. Technically, a conditional probability density function is first introduced to describe the spatiotemporal distribution and polarity prior in MSTPPs. In addition, a spatiotemporal Gaussian kernel is defined to capture the spatiotemporal structure, transforming discrete spikes into a continuous function in a reproducing kernel Hilbert space (RKHS). Finally, the distance between asynchronous spikes can be quantified by the inner product in the RKHS. The experimental results demonstrate that the proposed approach outperforms state-of-the-art methods and achieves significant improvement in computational efficiency. In particular, it better depicts changes involving spatiotemporal structural properties and polarity attributes.
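
The RKHS construction admits a direct numerical reading: embed each polarity-signed event as a Gaussian bump in (x, y, t) and compute distances from closed-form kernel inner products. The sketch below is a simplified stand-in for the paper's metric (it drops the conditional-density prior); bandwidths and data are assumptions.

```python
import numpy as np

def cross_kernel(ev_a, ev_b, sigma_xy=2.0, sigma_t=1e4):
    """Sum of pairwise polarity-signed Gaussian kernels between two event sets.

    Events are rows of (x, y, t, polarity); opposite polarities subtract, so
    the metric is sensitive to polarity structure as well as geometry.
    """
    dx = ev_a[:, None, 0] - ev_b[None, :, 0]
    dy = ev_a[:, None, 1] - ev_b[None, :, 1]
    dt = ev_a[:, None, 2] - ev_b[None, :, 2]
    K = np.exp(-(dx**2 + dy**2) / (2 * sigma_xy**2) - dt**2 / (2 * sigma_t**2))
    return float(np.sum(ev_a[:, None, 3] * ev_b[None, :, 3] * K))

def spike_distance(ev_a, ev_b, **kw):
    """RKHS distance: ||f - g||^2 = <f, f> + <g, g> - 2 <f, g>."""
    return (cross_kernel(ev_a, ev_a, **kw) + cross_kernel(ev_b, ev_b, **kw)
            - 2 * cross_kernel(ev_a, ev_b, **kw))

rng = np.random.default_rng(4)
base = np.column_stack([rng.uniform(0, 64, 100), rng.uniform(0, 64, 100),
                        np.sort(rng.uniform(0, 1e5, 100)),
                        rng.choice([-1.0, 1.0], 100)])
jittered = base + np.array([0.5, 0.5, 500.0, 0.0])   # small space-time perturbation
scrambled = base.copy()
scrambled[:, 3] = rng.permutation(scrambled[:, 3])   # same geometry, broken polarity
print("distance to jittered copy:  ", spike_distance(base, jittered))
print("distance to scrambled copy: ", spike_distance(base, scrambled))
```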

9. Zhu Y. 3D Reconstruction of Ancient Building Structure Scene Based on Computer Image Recognition. International Journal of Information Technologies and Systems Approach 2023. [DOI: 10.4018/ijitsa.320826]
Abstract
With the extensive application of computer image recognition (CIR), the high cost of three-dimensional (3D) models, long construction cycles, and poor data visualization have become the main bottlenecks to its further development. Artificial intelligence (AI) is an important branch of computer science with a wide range of application prospects and high practical value, especially in medical and health applications of intelligent machines. This article introduces the background of 3D reconstruction of ancient architectural scenes, presents academic research and a summary of two key applications of CIR, and then discusses 3D reconstruction and media technology combined with AI as used for medical diagnosis. An algorithm model is established, and various algorithms are proposed to provide a theoretical basis for research on 3D reconstruction of ancient building scenes based on CIR.

10. Nunes UM, Demiris Y. Robust Event-Based Vision Model Estimation by Dispersion Minimisation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:9561-9573. [PMID: 34813470] [DOI: 10.1109/tpami.2021.3130049]
Abstract
We propose a novel Dispersion Minimisation framework for event-based vision model estimation, with applications to optical flow and high-speed motion estimation. The framework extends previous event-based motion compensation algorithms by avoiding the computation of an optimisation score based on an explicit image-based representation, which provides three main benefits: i) the framework can be extended to perform incremental estimation, i.e., on an event-by-event basis; ii) besides purely visual transformations in 2D, the framework can readily use additional information, e.g., by augmenting the events with depth, to estimate the parameters of motion models in higher-dimensional spaces; iii) the optimisation complexity depends only on the number of events. We achieve this by modelling the event alignment according to candidate parameters and minimising the resultant dispersion, which is computed by a family of suitable entropy-based measures. Data whitening is also proposed as a simple and effective pre-processing step that makes the accuracy of the framework, as well as that of other event-based motion-compensation methods, more robust. The framework is evaluated on several challenging motion estimation problems, including 6-DOF transformation, rotational motion, and optical flow estimation, achieving state-of-the-art performance.
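
A toy instance of the dispersion idea, assuming a simple 2-parameter optical-flow warp and plain Shannon entropy of a histogram as the dispersion measure (the paper uses a more refined entropy family and supports incremental, event-by-event operation):

```python
import numpy as np

def warp(events, v):
    """Transport each event to t = 0 under a candidate flow v (px per second)."""
    return np.column_stack([events[:, 0] - v[0] * events[:, 2],
                            events[:, 1] - v[1] * events[:, 2]])

def dispersion(points, bins=64):
    """Shannon entropy of a 2D histogram of the warped events: low when the
    candidate flow collapses each trajectory onto a single location."""
    H, *_ = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    p = H.ravel() / H.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Toy scene: 50 points translating with true flow (20, -10) px/s, 20 events each.
rng = np.random.default_rng(5)
pts = rng.uniform(10, 50, (50, 2))
t = rng.uniform(0, 1, (50, 20))
v_true = np.array([20.0, -10.0])
ev = np.column_stack([(pts[:, :1] + v_true[0] * t).ravel(),
                      (pts[:, 1:] + v_true[1] * t).ravel(), t.ravel()])

grid = [(vx, vy) for vx in range(0, 41, 5) for vy in range(-20, 1, 5)]
best = min(grid, key=lambda v: dispersion(warp(ev, v)))
print("flow with minimal dispersion:", best)   # the true flow (20, -10)
```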

11. IBISCape: A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments. J Intell Robot Syst 2022. [DOI: 10.1007/s10846-022-01753-7]

12. Liu Y, Zhang F, Chen C, Wang S, Wang Y, Yu Y. Act Like a Radiologist: Towards Reliable Multi-View Correspondence Reasoning for Mammogram Mass Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:5947-5961. [PMID: 34061740] [DOI: 10.1109/tpami.2021.3085783]
Abstract
Mammogram mass detection is crucial for diagnosing and preventing breast cancer in clinical practice. The complementary effect of multi-view mammogram images provides valuable information about the breast's anatomical prior structure and is of great significance in digital mammography interpretation. However, unlike radiologists, who can use natural reasoning to identify masses across multiple mammographic views, endowing existing object detection models with multi-view reasoning capability is vital for clinical decision-making but remains largely unexplored. In this paper, we propose an anatomy-aware graph convolutional network (AGN), which is tailored for mammogram mass detection and endows existing detection methods with multi-view reasoning ability. The proposed AGN consists of three steps. First, we introduce a bipartite graph convolutional network (BGN) to model the intrinsic geometric and semantic relations of ipsilateral views. Second, considering that the visual asymmetry of bilateral views is widely adopted in clinical practice to assist the diagnosis of breast lesions, we propose an inception graph convolutional network (IGN) to model the structural similarities of bilateral views. Finally, based on the constructed graphs, multi-view information is propagated methodically through the nodes, which equips the features learned from the examined view with multi-view reasoning ability. Experiments on two standard benchmarks reveal that AGN significantly exceeds state-of-the-art performance. Visualization results show that AGN provides interpretable visual cues for clinical diagnosis.

13. Liu X, Zhao Y, Yang L, Ge SS. A Spatial-Motion-Segmentation Algorithm by Fusing EDPA and Motion Compensation. Sensors (Basel) 2022; 22:6732. [PMID: 36146090] [PMCID: PMC9502573] [DOI: 10.3390/s22186732]
Abstract
Motion segmentation is one of the fundamental steps for detection, tracking, and recognition; it separates moving objects from the background. In this paper, we propose a spatial-motion-segmentation algorithm that fuses the events-dimensionality-preprocessing algorithm (EDPA) and the volume of warped events (VWE). The EDPA consists of depth estimation, linear interpolation, and coordinate normalization to obtain an extra dimension (Z) for the events. The VWE is built by accumulating the warped events (i.e., motion compensation), and an iterative clustering algorithm is introduced to maximize the contrast (i.e., variance) of the VWE. We established our datasets with the event-camera simulator (ESIM), which can simulate high-frame-rate videos that are decomposed into frames to generate a large amount of reliable event data. Exterior and interior scenes were segmented in the first part of the experiments. We also present sparrow search algorithm-based gradient ascent (SSA-Gradient Ascent), which was evaluated against plain gradient ascent and particle swarm optimization (PSO) in the second part. In Motion Flow 1, SSA-Gradient Ascent achieved a variance 0.402% higher than the baseline value and converged 52.941% faster than the baseline rate. In Motion Flow 2, SSA-Gradient Ascent again outperformed the other methods. The experimental results validate the feasibility of the proposed algorithm.
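
The contrast-maximisation core behind the VWE can be sketched compactly: warp events by a candidate motion, accumulate them into an image, and search for the motion that maximises the variance. The sketch below seeds a local optimiser with a coarse grid instead of the paper's sparrow search algorithm; data and parameters are assumptions, and SciPy is assumed available.

```python
import numpy as np
from scipy.optimize import minimize

def iwe(events, v, shape=(64, 64)):
    """Image of Warped Events: motion-compensate, then accumulate per pixel."""
    x = np.clip(np.round(events[:, 0] - v[0] * events[:, 2]), 0, shape[1] - 1)
    y = np.clip(np.round(events[:, 1] - v[1] * events[:, 2]), 0, shape[0] - 1)
    img = np.zeros(shape)
    np.add.at(img, (y.astype(int), x.astype(int)), 1.0)
    return img

def contrast(events, v):
    """Variance of the IWE: maximal when events are aligned by the true motion."""
    return iwe(events, np.asarray(v)).var()

rng = np.random.default_rng(6)
pts = rng.uniform(5, 40, (40, 2))
t = rng.uniform(0, 1, (40, 25))
v_true = np.array([12.0, -6.0])
ev = np.column_stack([(pts[:, :1] + v_true[0] * t).ravel(),
                      (pts[:, 1:] + v_true[1] * t).ravel(), t.ravel()])

# Coarse grid seed, then local refinement (Nelder-Mead stands in for the
# paper's SSA-seeded gradient ascent).
grid = [(vx, vy) for vx in range(0, 21, 4) for vy in range(-12, 5, 4)]
v0 = max(grid, key=lambda v: contrast(ev, v))
res = minimize(lambda v: -contrast(ev, v), np.asarray(v0, float), method="Nelder-Mead")
print("estimated motion:", res.x)   # should land near (12, -6)
```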

Affiliation(s)
- Xinghua Liu
- School of Electrical Engineering, Xi’an University of Technology, Xi’an 710048, China
- Yunan Zhao
- School of Electrical Engineering, Xi’an University of Technology, Xi’an 710048, China
- Lei Yang
- School of Electrical Engineering, Xi’an University of Technology, Xi’an 710048, China
- Shuzhi Sam Ge
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore 119077, Singapore

14. Wang Y, Yang J, Peng X, Wu P, Gao L, Huang K, Chen J, Kneip L. Visual Odometry with an Event Camera Using Continuous Ray Warping and Volumetric Contrast Maximization. Sensors (Basel) 2022; 22:5687. [PMID: 35957244] [PMCID: PMC9370870] [DOI: 10.3390/s22155687]
Abstract
We present a new solution to tracking and mapping with an event camera. The camera motion contains both rotational and translational displacements in the plane, and the displacements happen in an arbitrarily structured environment. As a result, the image matching may no longer be represented by a low-dimensional homographic warping, which complicates the application of the commonly used Image of Warped Events (IWE). We introduce a new solution to this problem by performing contrast maximization in 3D. The 3D location of the rays cast for each event is smoothly varied as a function of a continuous-time motion parametrization, and the optimal parameters are found by maximizing the contrast in a volumetric ray density field. Our method thus performs joint optimization over motion and structure. The practical validity of our approach is supported by an application to automated guided vehicle (AGV) motion estimation and 3D reconstruction with a single vehicle-mounted event camera. The method approaches the performance obtained with regular cameras and outperforms them in challenging visual conditions.

Affiliation(s)
- Yifu Wang
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Jiaqi Yang
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Xin Peng
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Peng Wu
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Ling Gao
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Kun Huang
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Jiaben Chen
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Laurent Kneip
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Shanghai Engineering Research Center of Intelligent Vision and Imaging, ShanghaiTech University, Shanghai 201210, China

15. Shiba S, Aoki Y, Gallego G. Event Collapse in Contrast Maximization Frameworks. Sensors (Basel) 2022; 22:5190. [PMID: 35890869] [PMCID: PMC9315985] [DOI: 10.3390/s22145190]
Abstract
Contrast maximization (CMax) is a framework that provides state-of-the-art results on several event-based computer vision tasks, such as ego-motion or optical flow estimation. However, it may suffer from a problem called event collapse, which is an undesired solution where events are warped into too few pixels. As prior works have largely ignored the issue or proposed workarounds, it is imperative to analyze this phenomenon in detail. Our work demonstrates event collapse in its simplest form and proposes collapse metrics by using first principles of space-time deformation based on differential geometry and physics. We experimentally show on publicly available datasets that the proposed metrics mitigate event collapse and do not harm well-posed warps. To the best of our knowledge, regularizers based on the proposed metrics are the only effective solution against event collapse in the experimental settings considered, compared with other methods. We hope that this work inspires further research to tackle more complex warp models.
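
Event collapse is easy to reproduce in a toy setting: under a 1-DoF zoom-like warp, raw contrast keeps growing as events are squeezed into fewer pixels, while a contraction penalty on the warp's Jacobian (a simplified stand-in for the paper's divergence-based metrics) restores the correct optimum. All numbers below are illustrative assumptions.

```python
import numpy as np

def warp_zoom(events, theta, center=32.0):
    """Radial (zoom-like) 1-DoF warp: theta > 0 contracts events toward the
    image centre over time, a warp known to admit event collapse."""
    scale = 1.0 - theta * events[:, 2:3]
    return (events[:, :2] - center) * scale + center

def contrast(points, shape=(64, 64)):
    """Variance of the accumulated warped-event image."""
    img = np.zeros(shape)
    xy = np.clip(np.round(points), 0, shape[0] - 1).astype(int)
    np.add.at(img, (xy[:, 1], xy[:, 0]), 1.0)
    return img.var()

def collapse_penalty(events, theta):
    """Mean contraction of the per-event area scale (Jacobian determinant); a
    simplified stand-in for the paper's divergence-based collapse metrics."""
    det = (1.0 - theta * events[:, 2]) ** 2
    return np.mean(np.maximum(0.0, 1.0 - det))   # expanding warps go unpenalised

rng = np.random.default_rng(7)
ev = np.column_stack([rng.uniform(0, 64, (2000, 2)), rng.uniform(0, 1, 2000)])
thetas = np.linspace(0.0, 0.99, 50)
plain = [contrast(warp_zoom(ev, th)) for th in thetas]
reg = [contrast(warp_zoom(ev, th)) - 50.0 * collapse_penalty(ev, th) for th in thetas]
print("plain CMax picks theta =", thetas[np.argmax(plain)])      # drifts toward collapse
print("regularised CMax picks theta =", thetas[np.argmax(reg)])  # stays near 0 (static scene)
```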

Affiliation(s)
- Shintaro Shiba
- Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Kohoku-ku, Yokohama 223-8522, Kanagawa, Japan
- Department of Electrical Engineering and Computer Science, Technische Universität Berlin, 10587 Berlin, Germany
- Yoshimitsu Aoki
- Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Kohoku-ku, Yokohama 223-8522, Kanagawa, Japan
- Guillermo Gallego
- Department of Electrical Engineering and Computer Science, Technische Universität Berlin, 10587 Berlin, Germany
- Einstein Center Digital Future and Science of Intelligence Excellence Cluster, 10117 Berlin, Germany

16. Chamorro W, Sola J, Andrade-Cetto J. Event-Based Line SLAM in Real-Time. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3187266]
Affiliation(s)
- William Chamorro
- Institut de Robòtica i Informàtica Industrial, CSIC-UPC, C/ Llorens Artigas 4-6, Barcelona, Spain
- Joan Sola
- Institut de Robòtica i Informàtica Industrial, CSIC-UPC, C/ Llorens Artigas 4-6, Barcelona, Spain
- Juan Andrade-Cetto
- Institut de Robòtica i Informàtica Industrial, CSIC-UPC, C/ Llorens Artigas 4-6, Barcelona, Spain

17. Peng X, Gao L, Wang Y, Kneip L. Globally-Optimal Contrast Maximisation for Event Cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:3479-3495. [PMID: 33471749] [DOI: 10.1109/tpami.2021.3053243]
Abstract
Event cameras are bio-inspired sensors that perform well in challenging illumination conditions and have high temporal resolution. However, their concept is fundamentally different from that of traditional frame-based cameras. The pixels of an event camera operate independently and asynchronously: they measure changes of the logarithmic brightness and return them in the highly discretised form of time-stamped events indicating a relative change of a certain quantity since the last event. New models and algorithms are needed to process this kind of measurement. The present work looks at several motion estimation problems with event cameras. The flow of the events is modelled by a general homographic warping in a space-time volume, and the objective is formulated as a maximisation of contrast within the image of warped events. Our core contribution consists of deriving globally optimal solutions to these generally non-convex problems, which removes the dependency on a good initial guess that plagues existing methods. Our methods rely on branch-and-bound optimisation and employ novel, efficient, recursive upper and lower bounds derived for six different contrast estimation functions. The practical validity of our approach is demonstrated by a successful application to three different event camera motion estimation problems.

18. Gallego G, Delbruck T, Orchard G, Bartolozzi C, Taba B, Censi A, Leutenegger S, Davison AJ, Conradt J, Daniilidis K, Scaramuzza D. Event-Based Vision: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:154-180. [PMID: 32750812] [DOI: 10.1109/tpami.2020.3008413]
Abstract
Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of μs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those requiring low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available, and the tasks they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
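
The working principle summarised above reduces to a simple per-pixel rule: emit an event whenever log intensity drifts by a contrast threshold C from the level stored at the last event. A minimal single-pixel sketch of this idealised model follows; the threshold and input signal are assumptions.

```python
import numpy as np

def events_from_log_intensity(L, ts, C=0.2):
    """Emit (t, polarity) events for one pixel from samples of log intensity L(t).

    A pixel fires whenever the log intensity drifts by the contrast threshold C
    from the level stored at its last event: the standard idealised DVS model.
    """
    events, ref = [], L[0]
    for t, l in zip(ts, L):
        while l - ref >= C:                # brightness increased by at least C
            ref += C
            events.append((t, +1))
        while ref - l >= C:                # brightness decreased by at least C
            ref -= C
            events.append((t, -1))
    return events

# A pixel watching a sinusoidal brightness signal (assumed test input).
ts = np.linspace(0.0, 1.0, 10_000)
L = 0.5 * np.sin(2 * np.pi * 3 * ts)       # log-intensity signal
ev = events_from_log_intensity(L, ts)
print(len(ev), "events; first five:", ev[:5])
```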

19. ESPEE: Event-Based Sensor Pose Estimation Using an Extended Kalman Filter. Sensors (Basel) 2021; 21:7840. [PMID: 34883852] [PMCID: PMC8659537] [DOI: 10.3390/s21237840]
Abstract
Event-based vision sensors show great promise for use in embedded applications requiring low-latency passive sensing at a low computational cost. In this paper, we present an event-based algorithm that relies on an Extended Kalman Filter for 6-Degree of Freedom sensor pose estimation. The algorithm updates the sensor pose event-by-event with low latency (worst case of less than 2 μs on an FPGA). Using a single handheld sensor, we test the algorithm on multiple recordings, ranging from a high contrast printed planar scene to a more natural scene consisting of objects viewed from above. The pose is accurately estimated under rapid motions, up to 2.7 m/s. Thereafter, an extension to multiple sensors is described and tested, highlighting the improved performance of such a setup, as well as the integration with an off-the-shelf mapping algorithm to allow point cloud updates with a 3D scene and enhance the potential applications of this visual odometry solution.
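
The event-by-event filtering mechanics can be sketched with a deliberately simplified measurement model: a linear 2-DoF translation toy rather than ESPEE's full 6-DoF pose, so each event triggers one cheap Kalman update. All models, landmarks, and noise values are assumptions made for the example.

```python
import numpy as np

class EventEKF:
    """Per-event Kalman update for a 2-DoF sensor translation (orthographic toy).

    The real ESPEE filter estimates a full 6-DoF pose; this sketch keeps the
    measurement model linear (pixel = landmark - translation) so the whole
    event-by-event mechanics fit in a few lines.
    """
    def __init__(self, q=1e-6, r=0.5):
        self.x = np.zeros(2)               # estimated translation
        self.P = np.eye(2)                 # state covariance
        self.Q, self.R = q * np.eye(2), r * np.eye(2)

    def update(self, pixel, landmark):
        self.P += self.Q                   # random-walk process model per event
        H = -np.eye(2)                     # d(pixel)/d(translation)
        z_pred = landmark - self.x
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x += K @ (pixel - z_pred)
        self.P = (np.eye(2) - K @ H) @ self.P

rng = np.random.default_rng(8)
landmarks = rng.uniform(0, 64, (50, 2))
true_t = np.array([3.0, -2.0])
ekf = EventEKF()
for _ in range(2000):                      # events arrive one at a time
    j = rng.integers(50)
    pixel = landmarks[j] - true_t + rng.normal(0, 0.7, 2)   # noisy event location
    ekf.update(pixel, landmarks[j])
print("estimated translation:", ekf.x)     # ~ (3, -2)
```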

22. Kim H, Kim HJ. Real-Time Rotational Motion Estimation With Contrast Maximization Over Globally Aligned Events. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3088793]

23. Tayarani-Najaran MH, Schmuker M. Event-Based Sensing and Signal Processing in the Visual, Auditory, and Olfactory Domain: A Review. Front Neural Circuits 2021; 15:610446. [PMID: 34135736] [PMCID: PMC8203204] [DOI: 10.3389/fncir.2021.610446]
Abstract
The nervous system converts the physical quantities sensed by its primary receptors into trains of events that are then processed in the brain. This unmatched efficiency in information processing has long inspired engineers to seek brain-like approaches to sensing and signal processing. The key principle pursued in neuromorphic sensing is to shed the traditional approach of periodic sampling in favor of an event-driven scheme that mimics sampling as it occurs in the nervous system, where events are preferably emitted upon a change of the sensed stimulus. In this paper we highlight the advantages and challenges of event-based sensing and signal processing in the visual, auditory, and olfactory domains. We also provide a survey of the literature covering neuromorphic sensing and signal processing in all three modalities. Our aim is to facilitate research in event-based sensing and signal processing by providing a comprehensive overview of the research performed previously, as well as by highlighting conceptual advantages, current progress, and future challenges in the field.

Affiliation(s)
- Michael Schmuker
- School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, United Kingdom

24. An Asynchronous Real-Time Corner Extraction and Tracking Algorithm for Event Camera. Sensors (Basel) 2021; 21:1475. [PMID: 33672510] [PMCID: PMC7923767] [DOI: 10.3390/s21041475]
Abstract
Event cameras have many advantages over conventional frame-based cameras, such as high temporal resolution, low latency, and high dynamic range. However, state-of-the-art event-based algorithms either require too much computation time or deliver poor accuracy. In this paper, we propose an asynchronous real-time corner extraction and tracking algorithm for an event camera. Our primary motivation is to enhance the accuracy of corner detection and tracking while ensuring computational efficiency. First, according to the polarities of the events, a simple yet effective filter is applied to construct two restrictive Surfaces of Active Events (SAEs), named RSAE+ and RSAE−, which accurately represent high-contrast patterns while filtering out noise and redundant events. Afterwards, a new coarse-to-fine corner extractor is proposed to extract corner events efficiently and accurately. Finally, a space-, time-, and velocity-direction-constrained data association method is presented to realize corner event tracking: a newly arriving corner event is associated with the latest active corner that satisfies the velocity direction constraint in its neighborhood. The experiments are run on a standard event camera dataset, and the results indicate that our method achieves excellent corner detection and tracking performance. Moreover, the proposed method can process more than 4.5 million events per second, showing promising potential for real-time computer vision applications.
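
The Surface of Active Events underlying this detector is simply a per-pixel map of the latest event timestamps, kept separately per polarity. The sketch below maintains such polarity-split surfaces with a crude staleness rule; it only loosely mirrors the paper's RSAE+/RSAE− construction, whose exact filtering rule differs, and the corner extractor itself is not reproduced.

```python
import numpy as np

def update_saes(events, shape=(180, 240), dt_max=50_000):
    """Maintain polarity-split Surfaces of Active Events (illustrative sketch).

    Each surface stores, per pixel, the timestamp of the most recent event of
    that polarity; entries older than dt_max are cleared so the surface tracks
    only recent activity. The per-event full-array sweep is inefficient but
    keeps the logic obvious.
    """
    sae = {+1: np.full(shape, -np.inf), -1: np.full(shape, -np.inf)}
    for x, y, t, p in events:
        s = sae[int(p)]
        s[int(y), int(x)] = t              # newest timestamp wins
        s[s < t - dt_max] = -np.inf        # drop stale entries
    return sae

rng = np.random.default_rng(9)
n = 5_000
ev = np.column_stack([rng.integers(0, 240, n), rng.integers(0, 180, n),
                      np.sort(rng.integers(0, 1_000_000, n)),
                      rng.choice([-1, 1], n)])
sae = update_saes(ev)
print("active pixels (+):", np.isfinite(sae[+1]).sum())
print("active pixels (-):", np.isfinite(sae[-1]).sum())
```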

25. Learning to Reconstruct HDR Images from Events, with Applications to Depth and Flow Prediction. Int J Comput Vis 2021. [DOI: 10.1007/s11263-020-01410-2]

26. Tawiah TAQ. A review of algorithms and techniques for image-based recognition and inference in mobile robotic systems. Int J Adv Robot Syst 2020. [DOI: 10.1177/1729881420972278]
Abstract
Autonomous vehicles include driverless, self-driving, and robotic cars, and other platforms capable of sensing and interacting with their environment and navigating without human help. Semiautonomous vehicles, by contrast, achieve partial autonomy with human intervention, for example in driver-assisted vehicles. Autonomous vehicles first sense their surroundings using mounted sensors. Typically, visual sensors acquire images, and computer vision, signal processing, machine learning, and other techniques are applied to acquire, process, and extract information. The control subsystem interprets sensory information to identify an appropriate navigation path to the destination and an action plan to carry out tasks. Feedback is also elicited from the environment to improve behavior. To increase sensing accuracy, autonomous vehicles are equipped with many sensors (light detection and ranging (LiDAR), infrared, sonar, inertial measurement units, etc.), as well as a communication subsystem. Autonomous vehicles face several challenges, such as unknown environments, blind spots (unseen views), non-line-of-sight scenarios, poor sensor performance due to weather conditions, sensor errors, false alarms, limited energy, limited computational resources, algorithmic complexity, human-machine communication, and size and weight constraints. To tackle these problems, several algorithmic approaches have been implemented, covering sensor design, processing, control, and navigation. This review seeks to provide up-to-date information on the requirements, algorithms, and main challenges in the use of machine vision-based techniques for navigation and control in autonomous vehicles. An application using a land-based vehicle as an Internet of Things-enabled platform for pedestrian detection and tracking is also presented.

27. Hadviger A, Marković I, Petrović I. Stereo dense depth tracking based on optical flow using frames and events. Adv Robot 2020. [DOI: 10.1080/01691864.2020.1821770]
Affiliation(s)
- Antea Hadviger
- Laboratory for Autonomous Systems and Mobile Robotics (LAMOR), University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia
- Ivan Marković
- Laboratory for Autonomous Systems and Mobile Robotics (LAMOR), University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia
- Ivan Petrović
- Laboratory for Autonomous Systems and Mobile Robotics (LAMOR), University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia

28. Gehrig D, Rebecq H, Gallego G, Scaramuzza D. EKLT: Asynchronous Photometric Feature Tracking Using Events and Frames. Int J Comput Vis 2019. [DOI: 10.1007/s11263-019-01209-w]

29. Steffen L, Reichard D, Weinland J, Kaiser J, Roennau A, Dillmann R. Neuromorphic Stereo Vision: A Survey of Bio-Inspired Sensors and Algorithms. Front Neurorobot 2019; 13:28. [PMID: 31191287] [PMCID: PMC6546825] [DOI: 10.3389/fnbot.2019.00028]
Abstract
Any visual sensor, whether artificial or biological, maps the 3D world onto a 2D representation. The missing dimension is depth, and most species use stereo vision to recover it. Stereo vision implies multiple perspectives and matching; hence, it obtains depth from a pair of images. Algorithms for stereo vision are also used successfully in robotics. Although biological systems seem to compute disparities effortlessly, artificial methods suffer from high energy demands and latency. The crucial part is the correspondence problem: finding the matching points of two images. The development of event-based cameras, inspired by the retina, enables the exploitation of an additional physical constraint: time. Due to their asynchronous course of operation, considering the precise occurrence of spikes, spiking neural networks take advantage of this constraint. In this work, we investigate sensors and algorithms for event-based stereo vision, leading toward more biologically plausible robots. We focus mainly on binocular stereo vision.

Affiliation(s)
- Lea Steffen
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Daniel Reichard
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Jakob Weinland
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Jacques Kaiser
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Arne Roennau
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Rüdiger Dillmann
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Humanoids and Intelligence Systems Lab, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

31. Pfeiffer M, Pfeil T. Deep Learning With Spiking Neurons: Opportunities and Challenges. Front Neurosci 2018; 12:774. [PMID: 30410432] [PMCID: PMC6209684] [DOI: 10.3389/fnins.2018.00774]
Abstract
Spiking neural networks (SNNs) are inspired by information processing in biology, where sparse and asynchronous binary signals are communicated and processed in a massively parallel fashion. SNNs on neuromorphic hardware exhibit favorable properties such as low power consumption, fast inference, and event-driven information processing. This makes them interesting candidates for the efficient implementation of deep neural networks, the method of choice for many machine learning tasks. In this review, we address the opportunities that deep spiking networks offer and investigate in detail the challenges associated with training SNNs in a way that makes them competitive with conventional deep learning while simultaneously allowing for efficient mapping to hardware. A wide range of training methods for SNNs is presented, ranging from the conversion of conventional deep networks into SNNs, constrained training before conversion, and spiking variants of backpropagation to biologically motivated variants of STDP. The goal of our review is to define a categorization of SNN training methods and summarize their advantages and drawbacks. We further discuss relationships between SNNs and binary networks, which are becoming popular for efficient digital hardware implementation. Neuromorphic hardware platforms have great potential to enable deep spiking networks in real-world applications. We compare the suitability of various neuromorphic systems that have been developed over the past years and investigate potential use cases. Neuromorphic approaches and conventional machine learning should not be considered simply as two solutions to the same classes of problems; instead, it is possible to identify and exploit their task-specific advantages. Deep SNNs offer great opportunities to work with new types of event-based sensors, exploit temporal codes, and use local on-chip learning, and we have so far just scratched the surface of realizing these advantages in practical applications.
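
The basic unit behind the networks reviewed here is the leaky integrate-and-fire (LIF) neuron, which can be simulated in a few lines. The parameters below are generic textbook values, not taken from the review.

```python
import numpy as np

def lif_neuron(input_current, dt=1e-3, tau=20e-3, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron, the basic unit of the SNNs reviewed.

    Integrates input current with a leak; emits a binary spike and resets when
    the membrane potential crosses the threshold.
    """
    v, spikes = v_reset, []
    for i in input_current:
        v += dt / tau * (-v + i)          # leaky integration (Euler step)
        if v >= v_th:
            spikes.append(1)
            v = v_reset                   # hard reset after the spike
        else:
            spikes.append(0)
    return np.array(spikes)

# Constant supra-threshold drive produces a regular spike train.
spikes = lif_neuron(np.full(1000, 2.0))
print("spike count over 1 s:", spikes.sum())
```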

Affiliation(s)
- Michael Pfeiffer
- Bosch Center for Artificial Intelligence, Robert Bosch GmbH, Renningen, Germany

32. Asynchronous, Photometric Feature Tracking Using Events and Frames. Computer Vision – ECCV 2018. 2018. [DOI: 10.1007/978-3-030-01258-8_46]