1
Zhai H, Huang G, Hu Q, Li G, Bao H, Zhang G. NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding. IEEE Transactions on Visualization and Computer Graphics 2024; 30:7129-7139. PMID: 39255118. DOI: 10.1109/tvcg.2024.3456201.
Abstract
In recent years, the paradigm of neural implicit representations has gained substantial attention in the field of Simultaneous Localization and Mapping (SLAM). However, a notable gap exists in the existing approaches when it comes to scene understanding. In this paper, we introduce NIS-SLAM, an efficient neural implicit semantic RGB-D SLAM system that leverages a pre-trained 2D segmentation network to learn consistent semantic representations. Specifically, for high-fidelity surface reconstruction and spatially consistent scene understanding, we combine high-frequency multi-resolution tetrahedron-based features and low-frequency positional encoding as the implicit scene representation. Besides, to address the inconsistency of 2D segmentation results across multiple views, we propose a fusion strategy that integrates the semantic probabilities from previous non-keyframes into keyframes to achieve consistent semantic learning. Furthermore, we implement confidence-based pixel sampling and a progressive optimization weight function for robust camera tracking. Extensive experimental results on various datasets show better or more competitive performance of our system compared with other existing neural dense implicit RGB-D SLAM approaches. Finally, we also show that our approach can be used in augmented reality applications. Project page: https://zju3dv.github.io/nis_slam.
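The abstract describes the keyframe fusion strategy only at a high level. A minimal Python sketch of probability-level fusion, assuming the non-keyframe predictions have already been warped into the keyframe view and that fusion is a weighted average followed by renormalization (neither detail is stated above), might look like:

```python
import numpy as np

def fuse_semantic_probs(keyframe_probs, nonkeyframe_probs_list, weights=None):
    """Fuse per-pixel class probabilities from non-keyframes into a keyframe.

    keyframe_probs: (H, W, C) softmax output of the 2D segmentation network.
    nonkeyframe_probs_list: list of (H, W, C) arrays already warped into the
    keyframe's viewpoint (the warping step is omitted in this sketch).
    """
    if weights is None:
        weights = [1.0] * len(nonkeyframe_probs_list)
    fused = keyframe_probs.copy()
    for probs, w in zip(nonkeyframe_probs_list, weights):
        fused += w * probs                       # accumulate evidence
    fused /= fused.sum(axis=-1, keepdims=True)   # renormalize to a distribution
    return fused

# Example: 2 classes, one keyframe and one warped non-keyframe view.
kf = np.full((4, 4, 2), 0.5)
nk = np.zeros((4, 4, 2)); nk[..., 0] = 0.9; nk[..., 1] = 0.1
print(fuse_semantic_probs(kf, [nk])[0, 0])  # class 0 now dominates: [0.7 0.3]
```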
2
Huang X, Chen X, Zhang N, He H, Feng S. ADM-SLAM: Accurate and Fast Dynamic Visual SLAM with Adaptive Feature Point Extraction, Deeplabv3pro, and Multi-View Geometry. Sensors 2024; 24:3578. PMID: 38894374. PMCID: PMC11175307. DOI: 10.3390/s24113578.
Abstract
Visual Simultaneous Localization and Mapping (V-SLAM) plays a crucial role in the development of intelligent robotics and autonomous navigation systems, but it still faces significant challenges in highly dynamic environments. Deep learning is the prevalent method for recognizing dynamic objects in the environment; however, models such as YOLOv5 and Mask R-CNN require significant computational resources, which limits their potential in real-time applications due to hardware and time constraints. To overcome this limitation, this paper proposes ADM-SLAM, a visual SLAM system for dynamic environments that builds upon ORB-SLAM2. The system integrates efficient adaptive feature-point homogenization extraction, lightweight deep-learning semantic segmentation based on an improved DeepLabv3, and multi-view geometric segmentation. It optimizes keyframe extraction, segments potential dynamic objects using contextual information from the semantic segmentation network, and detects the motion states of dynamic objects using multi-view geometric methods, thereby eliminating dynamic interference points. The results indicate that ADM-SLAM outperforms ORB-SLAM2 in dynamic environments, especially in high-dynamic scenes, where it achieves up to a 97% reduction in Absolute Trajectory Error (ATE). In various highly dynamic test sequences, ADM-SLAM outperforms DS-SLAM and DynaSLAM in both real-time performance and accuracy, proving its excellent adaptability.
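The multi-view geometric check is not spelled out above. A common form of such a test, given here as an illustrative sketch rather than ADM-SLAM's actual criterion, flags matched points that violate the epipolar constraint between two views:

```python
import numpy as np

def epipolar_dynamic_check(pts1, pts2, F, thresh=1.0):
    """Flag matches whose distance to the epipolar line exceeds `thresh` px.

    pts1, pts2: (N, 2) matched pixel coordinates in two views.
    F: (3, 3) fundamental matrix between the views, e.g. estimated from the
    presumed-static matches with cv2.findFundamentalMat.
    Returns a boolean mask, True where a point likely lies on a moving object.
    """
    ones = np.ones((len(pts1), 1))
    p1 = np.hstack([pts1, ones])          # homogeneous coordinates
    p2 = np.hstack([pts2, ones])
    lines = (F @ p1.T).T                  # epipolar lines l = F x1 in image 2
    num = np.abs(np.sum(lines * p2, axis=1))
    den = np.linalg.norm(lines[:, :2], axis=1)
    return (num / den) > thresh           # point-to-line distance test
```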
Affiliation(s)
- Xiaotao Huang
- School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Xingbin Chen
- School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Guangdong Productivity Promotion Center, Guangzhou 510075, China
- Ning Zhang
- School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Hongjie He
- School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Sang Feng
- School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China
3
Zhao S, Yu Z, Wang Z, Liu H, Zhou Z, Ruan L, Wang Q. A Learning-Free Method for Locomotion Mode Prediction by Terrain Reconstruction and Visual-Inertial Odometry. IEEE Trans Neural Syst Rehabil Eng 2023; 31:3895-3905. PMID: 37782585. DOI: 10.1109/tnsre.2023.3321077.
Abstract
This research introduces a novel, highly precise, and learning-free approach to locomotion mode prediction, a technique with potential for broad application in lower-limb wearable robotics. This study is the first to combine 3D reconstruction and Visual-Inertial Odometry (VIO) into a locomotion mode prediction method; it yields robust prediction performance across diverse subjects and terrains, and resilience against factors including camera view, walking direction, step size, and disturbances from moving obstacles, without the need for parameter adjustment. The proposed Depth-enhanced Visual-Inertial Odometry (D-VIO) is designed to operate within the computational constraints of wearable configurations while remaining resilient against unpredictable human movements and sparse features. Its effectiveness, in terms of both accuracy and operational time, is substantiated through tests on an open-source dataset and closed-loop evaluations. Comprehensive experiments validated its prediction accuracy across various test conditions, including subjects, scenarios, sensor mounting positions, camera views, step sizes, walking directions, and disturbances from moving obstacles. An overall prediction accuracy of 99.00% confirms the efficacy, generality, and robustness of the proposed method.
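The abstract does not detail how a locomotion mode is read off the reconstructed terrain. As a hedged illustration of what a learning-free geometric classifier could look like (the mode names, thresholds, and the assumption of a height profile sampled at fixed horizontal spacing along the walking direction are all illustrative, not taken from the paper):

```python
import numpy as np

def predict_locomotion_mode(heights, slope_thresh=0.15, step_thresh=0.08):
    """Classify upcoming terrain from a 1D height profile (meters), sampled
    at fixed horizontal spacing along the VIO-estimated walking direction.
    Thresholds here are hypothetical placeholders.
    """
    dz = np.diff(heights)
    if np.any(np.abs(dz) > step_thresh):   # abrupt height change -> stairs
        return "stair_ascent" if dz[np.argmax(np.abs(dz))] > 0 else "stair_descent"
    slope = (heights[-1] - heights[0]) / max(len(heights) - 1, 1)
    if slope > slope_thresh:               # sustained gentle rise -> ramp
        return "ramp_ascent"
    if slope < -slope_thresh:
        return "ramp_descent"
    return "level_walking"

print(predict_locomotion_mode(np.array([0.0, 0.0, 0.12, 0.12, 0.24])))  # stairs
```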
4
Wu R, Gao Y. Research on Underwater Complex Scene SLAM Algorithm Based on Image Enhancement. Sensors 2022; 22:8517. PMID: 36366215. PMCID: PMC9656716. DOI: 10.3390/s22218517.
Abstract
Underwater images typically suffer from weak feature-point information and abundant redundant information due to wild imaging conditions. To address these degradation problems, we propose an improved algorithm based on VINS-Mono to enhance the quality of underwater visual SLAM. Specifically, we first used the FAST feature-point extraction algorithm to improve extraction speed. Then, the inverse optical flow method was used to improve the accuracy of feature extraction. At the same time, several kinds of residual information were extracted and marginalized separately in the back-end marginalization step, in order to improve marginalization speed. Extensive experiments on the underwater dataset HAUD-Dataset and the public dataset EuRoC show that our approach is superior to the original VINS-Mono algorithm. In particular, the improved algorithm handles situations in which feature-point information is not obvious and redundant information is complex in the underwater environment, effectively improving the visual quality of the underwater image.
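"Inverse optical flow" is commonly realized as a forward-backward consistency check. The sketch below assumes that interpretation and uses OpenCV's pyramidal Lucas-Kanade tracker; the paper's exact formulation may differ:

```python
import cv2
import numpy as np

def forward_backward_track(img0, img1, pts0, fb_thresh=1.0):
    """Track keypoints with pyramidal LK flow, then track back and keep only
    points whose round-trip error is below `fb_thresh` pixels.
    """
    pts0 = pts0.astype(np.float32).reshape(-1, 1, 2)
    pts1, st1, _ = cv2.calcOpticalFlowPyrLK(img0, img1, pts0, None)
    pts0b, st2, _ = cv2.calcOpticalFlowPyrLK(img1, img0, pts1, None)
    fb_err = np.linalg.norm(pts0 - pts0b, axis=2).ravel()
    good = (st1.ravel() == 1) & (st2.ravel() == 1) & (fb_err < fb_thresh)
    return pts1.reshape(-1, 2)[good], good

# Usage with FAST corners, as in the pipeline described above:
# fast = cv2.FastFeatureDetector_create(threshold=20)
# pts0 = np.array([kp.pt for kp in fast.detect(img0)], dtype=np.float32)
# tracked, mask = forward_backward_track(img0, img1, pts0)
```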
5
Liu J, Li X, Liu Y, Chen H. RGB-D Inertial Odometry for a Resource-Restricted Robot in Dynamic Environments. IEEE Robot Autom Lett 2022. DOI: 10.1109/lra.2022.3191193.
Affiliation(s)
- Jianheng Liu
- School of Mechanical Engineering and Automation, Harbin Institute of Technology Shenzhen, Shenzhen, Guangdong, China
- Xuanfu Li
- Department of HiSilicon Research, Huawei Technology Co., Ltd, Shenzhen, Guangdong, China
- Yueqian Liu
- School of Mechanical Engineering and Automation, Harbin Institute of Technology Shenzhen, Shenzhen, Guangdong, China
- Haoyao Chen
- School of Mechanical Engineering and Automation, Harbin Institute of Technology Shenzhen, Shenzhen, Guangdong, China
6
CDSFusion: Dense Semantic SLAM for Indoor Environment Using CPU Computing. Remote Sensing 2022. DOI: 10.3390/rs14040979.
Abstract
Unmanned Aerial Vehicles (UAVs) require the ability to robustly perceive surrounding scenes for autonomous navigation. Semantic reconstruction of the scene provides a truly functional understanding of the environment. However, high-performance computing is generally not available on most UAVs, so a lightweight real-time semantic reconstruction method is necessary. Existing methods rely on GPUs, and real-time semantic reconstruction on a CPU is difficult to achieve. To solve this problem, this paper proposes an indoor dense semantic Simultaneous Localization and Mapping (SLAM) method using CPU computing, named CDSFusion. CDSFusion is the first system to integrate RGB-D-based Visual-Inertial Odometry (VIO), semantic segmentation, and 3D reconstruction in real time on a CPU. In our VIO method, depth information is introduced to improve the accuracy of pose estimation, and FAST features are used for faster tracking. In our semantic reconstruction method, a pre-trained PSPNet (Pyramid Scene Parsing Network) model is optimized to provide semantic information in real time on the CPU, and the semantic point clouds are fused using Voxblox. The experimental results demonstrate that camera tracking is accelerated without loss of accuracy in our VIO, and that a 3D semantic map is reconstructed in real time, comparable to one generated by a GPU-dependent method.
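Voxblox itself fuses geometry (TSDF volumes); how CDSFusion attaches semantics to voxels is not specified above. A simplified sketch of semantic fusion by per-voxel label counting, an assumption rather than the paper's method:

```python
import numpy as np
from collections import defaultdict

def fuse_semantic_cloud(voxel_counts, points, labels, voxel_size=0.05):
    """Accumulate per-voxel class counts from a labeled point cloud.

    voxel_counts: dict mapping voxel index -> per-class count array.
    points: (N, 3) points in the world frame; labels: (N,) class ids.
    """
    idx = np.floor(points / voxel_size).astype(np.int64)
    for key, c in zip(map(tuple, idx), labels):
        voxel_counts[key][c] += 1
    return voxel_counts

def voxel_labels(voxel_counts):
    """Most frequently observed class per voxel (majority vote)."""
    return {v: int(np.argmax(c)) for v, c in voxel_counts.items()}

num_classes = 21  # hypothetical size of the segmentation label set
counts = defaultdict(lambda: np.zeros(num_classes, dtype=np.int64))
pts = np.random.rand(100, 3)
lbl = np.random.randint(0, num_classes, 100)
semantic_map = voxel_labels(fuse_semantic_cloud(counts, pts, lbl))
```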
7
Seel T, Kok M, McGinnis RS. Inertial Sensors: Applications and Challenges in a Nutshell. Sensors 2020; 20:6221. PMID: 33142738. PMCID: PMC7662337. DOI: 10.3390/s20216221.
Abstract
This editorial provides a concise introduction to the methods and applications of inertial sensors. We briefly describe the main characteristics of inertial sensors and highlight the broad range of applications as well as the methodological challenges. Finally, for the reader’s guidance, we give a succinct overview of the papers included in this special issue.
Affiliation(s)
- Thomas Seel
- Control Systems Group, Technische Universität Berlin, 10587 Berlin, Germany
- Manon Kok
- Delft Center for Systems and Control, Delft University of Technology, 2628 CD Delft, The Netherlands
- Ryan S. McGinnis
- Department of Electrical and Biomedical Engineering, University of Vermont, Burlington, VT 05405, USA
8
Zhao X, Miao C, Zhang H. Multi-Feature Nonlinear Optimization Motion Estimation Based on RGB-D and Inertial Fusion. Sensors 2020; 20:4666. PMID: 32824978. PMCID: PMC7506712. DOI: 10.3390/s20174666.
Abstract
To achieve high-precision estimation of indoor robot motion, a tightly coupled RGB-D visual-inertial SLAM system based on multiple features is proposed herein. Most traditional visual SLAM methods rely only on points for feature matching and often underperform in low-textured scenes. Besides point features, line segments can also provide geometric structure information about the environment. We therefore utilize both points and lines in low-textured scenes to increase the robustness of the RGB-D SLAM system. In addition, we implement a fast initialization process based on the RGB-D camera to improve the real-time performance of the proposed system and design a new back-end nonlinear optimization framework. The state vector is optimized by minimizing the cost function formed by the pre-integrated IMU residuals and the re-projection errors of points and lines in a sliding window. Experiments on public datasets show that our system achieves higher trajectory accuracy and more robust pose estimation than several state-of-the-art visual SLAM systems.
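The sliding-window objective described above has the usual visual-inertial bundle-adjustment form. The notation below is assumed rather than quoted from the paper, with X the window states, rho a robust kernel, and P and L the sets of point and line observations:

```latex
\min_{\mathcal{X}} \;
  \sum_{k \in \mathcal{B}}
    \left\| r_{\mathrm{IMU}}\!\left(\hat{z}_{k,k+1}, \mathcal{X}\right) \right\|^{2}_{\Sigma_{k}}
+ \sum_{(i,j) \in \mathcal{P}}
    \rho\!\left( \left\| r_{\mathrm{pt}}\!\left(\hat{z}_{ij}, \mathcal{X}\right) \right\|^{2}_{\Sigma_{\mathrm{pt}}} \right)
+ \sum_{(i,l) \in \mathcal{L}}
    \rho\!\left( \left\| r_{\mathrm{ln}}\!\left(\hat{z}_{il}, \mathcal{X}\right) \right\|^{2}_{\Sigma_{\mathrm{ln}}} \right)
```

Here r_IMU is the pre-integrated IMU residual between consecutive keyframes in the window B, and r_pt and r_ln are the point and line re-projection residuals, each weighted by its measurement covariance.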
9
Chen S, Chang CW, Wen CY. Perception in the Dark: Development of a ToF Visual Inertial Odometry System. Sensors 2020; 20:1263. PMID: 32110910. PMCID: PMC7085618. DOI: 10.3390/s20051263.
Abstract
Visual inertial odometry (VIO) is the front end of visual simultaneous localization and mapping (vSLAM) methods and has been actively studied in recent years. In this context, a time-of-flight (ToF) camera, with its high depth-measurement accuracy and strong resilience to ambient light of variable intensity, draws our interest. Thus, in this paper, we present a real-time visual-inertial system based on a low-cost ToF camera. The iterative closest point (ICP) methodology is adopted, incorporating salient-point selection criteria and a robustness-weighting function. In addition, an error-state Kalman filter is used and fused with inertial measurement unit (IMU) data. To test its capability, the ToF-VIO system is mounted on an unmanned aerial vehicle (UAV) platform and operated in a variable-light environment. The estimated flight trajectory is compared with ground-truth data captured by a motion capture system. Real flight experiments are also conducted in a dark indoor environment, showing good agreement with the estimated performance. The current system is thus shown to be accurate and efficient for UAV applications in dark and Global Navigation Satellite System (GNSS)-denied environments.
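As an illustration of the robustness-weighting idea mentioned above, here is a minimal weighted ICP alignment step using Huber weights; the paper's salient-point selection and exact weighting function are not given in the abstract, so this is a sketch under those assumptions:

```python
import numpy as np

def weighted_icp_step(src, dst, huber_delta=0.05):
    """One ICP alignment step with Huber robustness weights.

    src, dst: (N, 3) corresponding 3D points (correspondence search and the
    salient-point selection step are omitted here).
    Returns R, t minimizing the weighted sum of squared residuals.
    """
    r = np.linalg.norm(src - dst, axis=1)
    w = np.where(r <= huber_delta, 1.0, huber_delta / np.maximum(r, 1e-12))
    mu_s = (w[:, None] * src).sum(0) / w.sum()   # weighted centroids
    mu_d = (w[:, None] * dst).sum(0) / w.sum()
    H = ((src - mu_s) * w[:, None]).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)                  # weighted Kabsch alignment
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                           # reflection-corrected rotation
    t = mu_d - R @ mu_s
    return R, t
```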
10
ACK-MSCKF: Tightly-Coupled Ackermann Multi-State Constraint Kalman Filter for Autonomous Vehicle Localization. Sensors 2019; 19:4816. PMID: 31694304. PMCID: PMC6864737. DOI: 10.3390/s19214816.
Abstract
Visual-Inertial Odometry (VIO) is subject to additional unobservable directions under the special motions of ground vehicles, resulting in larger pose estimation errors. To address this problem, a tightly coupled Ackermann visual-inertial odometry (ACK-MSCKF) is proposed to fuse Ackermann error-state measurements into the Stereo Multi-State Constraint Kalman Filter (S-MSCKF) with a tightly coupled filter-based mechanism. In contrast with S-MSCKF, in which the inertial measurement unit (IMU) propagates the vehicle motion and the propagation is then corrected by stereo visual measurements, we successively update the propagation with Ackermann error-state measurements and visual measurements after the process model and state augmentation. In this way, additional constraints from the Ackermann measurements are exploited to improve pose estimation accuracy. Both qualitative and quantitative experimental results on real-world datasets from an Ackermann steering vehicle demonstrate that ACK-MSCKF significantly improves the pose estimation accuracy of S-MSCKF under the special motions of autonomous vehicles, and keeps pose estimation accurate and robust under different vehicle driving cycles and environmental conditions. The source code accompanies this paper for the robotics community.
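A simplified sketch of how an Ackermann-derived velocity pseudo-measurement can correct the IMU propagation in a Kalman filter follows; the actual ACK-MSCKF error-state formulation and Jacobians are more involved than shown here, and the zero lateral/vertical velocity assumption is the standard non-slipping ground-vehicle constraint rather than a detail quoted from the paper:

```python
import numpy as np

def ackermann_velocity_update(x, P, v_body, H, v_wheel, R_meas):
    """Generic Kalman measurement update, used here for an Ackermann
    pseudo-measurement: forward speed from wheel odometry plus (near-)zero
    lateral and vertical body velocity for a non-slipping ground vehicle.

    x, P: error state and covariance; H: measurement Jacobian mapping the
    error state to the 3D body-velocity error; v_body: predicted body-frame
    velocity; v_wheel: measured forward speed; R_meas: measurement noise.
    """
    z = np.array([v_wheel, 0.0, 0.0])        # [forward, lateral, vertical]
    y = z - v_body                           # innovation
    S = H @ P @ H.T + R_meas                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

The visual update would then follow in sequence, mirroring the successive Ackermann-then-visual update order described in the abstract.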