1
Jiang F, Wang W, You H, Jiang S, Meng X, Kim J, Wang S. TS-LCD: Two-Stage Loop-Closure Detection Based on Heterogeneous Data Fusion. Sensors (Basel, Switzerland) 2024; 24:3702. [PMID: 38931487; PMCID: PMC11207695; DOI: 10.3390/s24123702]
Abstract
Loop-closure detection plays a pivotal role in simultaneous localization and mapping (SLAM). It serves to minimize cumulative errors and ensure the overall consistency of the generated map. This paper introduces a multi-sensor fusion-based loop-closure detection scheme (TS-LCD) to address the challenges of low robustness and inaccurate loop-closure detection encountered in single-sensor systems under varying lighting conditions and structurally similar environments. Our method comprises two innovative components: a timestamp synchronization method based on data processing and interpolation, and a two-stage loop-closure detection scheme based on the fusion validation of visual and laser loops. Experimental results on the publicly available KITTI dataset reveal that the proposed method outperforms baseline algorithms, achieving a significant average reduction of 2.76% in the trajectory error (TE) and a notable decrease of 1.381 m per 100 m in the relative error (RE). Furthermore, it boosts loop-closure detection efficiency by an average of 15.5%, thereby effectively enhancing the positioning accuracy of odometry.
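No code accompanies the abstract; purely as an illustration of the first component, the sketch below shows generic timestamp synchronization by linearly interpolating one sensor's samples onto another sensor's timeline. All names, rates, and dimensions are hypothetical assumptions, not taken from TS-LCD.

```python
import numpy as np

def interpolate_to_timestamps(src_times, src_values, dst_times):
    """Linearly interpolate one sensor's samples onto another sensor's timeline.

    src_times:  (N,) increasing timestamps of the source sensor.
    src_values: (N, D) source samples (e.g., odometry states).
    dst_times:  (M,) timestamps of the destination sensor (e.g., camera frames).
    Returns an (M, D) array of samples aligned with dst_times.
    """
    src_values = np.asarray(src_values, float)
    return np.stack(
        [np.interp(dst_times, src_times, src_values[:, d])
         for d in range(src_values.shape[1])],
        axis=1,
    )

# Toy usage: align 10 Hz samples with 30 Hz camera timestamps.
lidar_t = np.arange(0.0, 1.0, 0.1)
lidar_xy = np.c_[np.sin(lidar_t), np.cos(lidar_t)]
camera_t = np.arange(0.0, 0.85, 1.0 / 30.0)
print(interpolate_to_timestamps(lidar_t, lidar_xy, camera_t).shape)  # (26, 2)
```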
Affiliation(s)
- Fangdi Jiang
- School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China
- Wanqiu Wang
- School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China
- Hongru You
- School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China
- Shuhang Jiang
- School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China
- Xin Meng
- School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China
- Jonghyuk Kim
- Center of Excellence in Cybercrimes and Digital Forensics, Naif Arab University for Security Sciences, Riyadh 11452, Saudi Arabia
- Shifeng Wang
- School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China
- Zhongshan Institute of Changchun University of Science and Technology, Zhongshan 528400, China
2
Xia Z, Booij O, Kooij JFP. Convolutional Cross-View Pose Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:3813-3831. [PMID: 38145533; DOI: 10.1109/tpami.2023.3346924]
Abstract
We propose a novel end-to-end method for cross-view pose estimation. Given a ground-level query image and an aerial image that covers the query's local neighborhood, the 3 Degrees-of-Freedom camera pose of the query is estimated by matching its image descriptor to descriptors of local regions within the aerial image. The orientation-aware descriptors are obtained by using a translationally equivariant convolutional ground image encoder and contrastive learning. The Localization Decoder produces a dense probability distribution in a coarse-to-fine manner with a novel Localization Matching Upsampling module. A smaller Orientation Decoder produces a vector field to condition the orientation estimate on the localization. Our method is validated on the VIGOR and KITTI datasets, where it surpasses the state-of-the-art baseline by 72% and 36% in median localization error for comparable orientation estimation accuracy. The predicted probability distribution can represent localization ambiguity and enables rejecting possibly erroneous predictions. Without re-training, the model can infer on ground images with different fields of view and utilize orientation priors if available. On the Oxford RobotCar dataset, our method can reliably estimate the ego-vehicle's pose over time, achieving a median localization error under 1 m and a median orientation error of around 1° at 14 FPS.
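As a rough, hypothetical sketch of the matching step described above (not the authors' network or code), the snippet below scores a single ground-level descriptor against dense aerial-region descriptors and turns the scores into a localization probability map; the descriptor size and temperature are arbitrary assumptions.

```python
import numpy as np

def localization_probability(ground_desc, aerial_desc_map, temperature=0.07):
    """Match one ground-image descriptor against per-cell aerial descriptors.

    ground_desc:     (D,) L2-normalised descriptor of the ground-level query.
    aerial_desc_map: (H, W, D) L2-normalised descriptors of aerial image regions.
    Returns an (H, W) probability map over the query's planar location.
    """
    sims = aerial_desc_map @ ground_desc          # cosine similarity per cell
    logits = sims / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

rng = np.random.default_rng(0)
g = rng.normal(size=16)
g /= np.linalg.norm(g)
a = rng.normal(size=(32, 32, 16))
a /= np.linalg.norm(a, axis=2, keepdims=True)
p = localization_probability(g, a)
row, col = np.unravel_index(p.argmax(), p.shape)  # most likely cell
print(round(p.sum(), 3), (row, col))
```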
3
Pizzino CAP, Costa RR, Mitchell D, Vargas PA. NeoSLAM: Long-Term SLAM Using Computational Models of the Brain. Sensors (Basel, Switzerland) 2024; 24:1143. [PMID: 38400301; PMCID: PMC10892990; DOI: 10.3390/s24041143]
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental problem in the field of robotics, enabling autonomous robots to navigate and create maps of unknown environments. Nevertheless, the SLAM methods that use cameras face problems in maintaining accurate localization over extended periods across various challenging conditions and scenarios. Following advances in neuroscience, we propose NeoSLAM, a novel long-term visual SLAM, which uses computational models of the brain to deal with this problem. Inspired by the human neocortex, NeoSLAM is based on a hierarchical temporal memory model that has the potential to identify temporal sequences of spatial patterns using sparse distributed representations. Being known to have a high representational capacity and high tolerance to noise, sparse distributed representations have several properties, enabling the development of a novel neuroscience-based loop-closure detector that allows for real-time performance, especially in resource-constrained robotic systems. The proposed method has been thoroughly evaluated in terms of environmental complexity by using a wheeled robot deployed in the field and demonstrated that the accuracy of loop-closure detection was improved compared with the traditional RatSLAM system.
Affiliation(s)
- Carlos Alexandre Pontes Pizzino
- PEE/COPPE—Department of Electrical Engineering, Federal University of Rio de Janeiro, Cidade Universitária, Centro de Tecnologia, Bloco H, Rio de Janeiro 21941-972, RJ, Brazil
- Ramon Romankevicius Costa
- PEE/COPPE—Department of Electrical Engineering, Federal University of Rio de Janeiro, Cidade Universitária, Centro de Tecnologia, Bloco H, Rio de Janeiro 21941-972, RJ, Brazil
- Daniel Mitchell
- Edinburgh Centre for Robotics, Heriot-Watt University, Edinburgh EH14 4AS, UK
- Patrícia Amâncio Vargas
- Edinburgh Centre for Robotics, Heriot-Watt University, Edinburgh EH14 4AS, UK
4
Arshad S, Park TH. SVS-VPR: A Semantic Visual and Spatial Information-Based Hierarchical Visual Place Recognition for Autonomous Navigation in Challenging Environmental Conditions. Sensors (Basel, Switzerland) 2024; 24:906. [PMID: 38339624; PMCID: PMC10857550; DOI: 10.3390/s24030906]
Abstract
Robust visual place recognition (VPR) enables mobile robots to identify previously visited locations. For this purpose, the extracted visual information and the place matching method play a significant role. In this paper, we critically review the existing VPR methods and group them into three major categories based on the visual information used, i.e., handcrafted features, deep features, and semantics. Focusing on the benefits of convolutional neural networks (CNNs) and semantics, and on the limitations of existing research, we propose a robust appearance-based place recognition method, termed SVS-VPR, which is implemented as a hierarchical model consisting of two major components: global scene-based and local feature-based matching. The global scene semantics are extracted and compared with previously visited images to filter the match candidates while reducing the search space and computational cost. The local feature-based matching involves the extraction of robust local features from a CNN, which possess invariant properties against environmental conditions, and a place matching method utilizing semantic, visual, and spatial information. SVS-VPR is evaluated on publicly available benchmark datasets using the true positive detection rate, recall at 100% precision, and area under the curve. Experimental findings demonstrate that SVS-VPR surpasses several state-of-the-art deep learning-based methods, boosting robustness against significant changes in viewpoint and appearance while maintaining efficient matching time performance.
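To make the hierarchical idea concrete, here is a minimal, hypothetical two-stage retrieval sketch (not the SVS-VPR implementation): global descriptors first shrink the search space, then a simple mutual-nearest-neighbour count over local features re-ranks the shortlist in place of the paper's semantic-visual-spatial matching.

```python
import numpy as np

def hierarchical_vpr(query_global, db_globals, query_locals, db_locals, top_k=10):
    """Two-stage place recognition: global filtering, then local re-ranking.

    query_global: (D,) global descriptor of the query image.
    db_globals:   (N, D) global descriptors of previously visited places.
    query_locals / db_locals: local feature arrays, used only on the shortlist.
    Returns the database index of the best-matching place.
    """
    # Stage 1: cosine similarity on global descriptors shrinks the search space.
    q = query_global / np.linalg.norm(query_global)
    db = db_globals / np.linalg.norm(db_globals, axis=1, keepdims=True)
    shortlist = np.argsort(db @ q)[::-1][:top_k]

    # Stage 2: re-rank the shortlist with a simple mutual-nearest-neighbour count
    # over local features (a placeholder for richer local matching).
    def local_score(a, b):
        sims = a @ b.T
        return np.sum(sims.argmax(1)[sims.argmax(0)] == np.arange(b.shape[0]))

    scores = [local_score(query_locals, db_locals[i]) for i in shortlist]
    return int(shortlist[int(np.argmax(scores))])

rng = np.random.default_rng(1)
db_globals = rng.normal(size=(100, 64))
db_locals = [rng.normal(size=(50, 32)) for _ in range(100)]
query_idx = 42
print(hierarchical_vpr(db_globals[query_idx] + 0.05 * rng.normal(size=64),
                       db_globals, db_locals[query_idx], db_locals))  # 42
```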
Affiliation(s)
- Saba Arshad
- Industrial Artificial Intelligence Research Center, Chungbuk National University, Cheongju 28644, Republic of Korea
- Tae-Hyoung Park
- Department of Intelligent Systems and Robotics, Chungbuk National University, Cheongju 28644, Republic of Korea
5
Joo HJ, Kim J. IS-CAT: Intensity-Spatial Cross-Attention Transformer for LiDAR-Based Place Recognition. Sensors (Basel, Switzerland) 2024; 24:582. [PMID: 38257678; PMCID: PMC10821456; DOI: 10.3390/s24020582]
Abstract
LiDAR place recognition is a crucial component of autonomous navigation, essential for loop closure in simultaneous localization and mapping (SLAM) systems. Notably, while camera-based methods struggle in fluctuating environments, such as changing weather or light, LiDAR demonstrates robustness against such challenges. This study introduces the intensity-spatial cross-attention transformer (IS-CAT), a novel approach that utilizes LiDAR to generate global descriptors by fusing spatial and intensity data for enhanced place recognition. The proposed model applies cross-attention to a concatenation mechanism to process and integrate multi-layered LiDAR projections. Consequently, the previously unexplored synergy between spatial and intensity data is addressed. We demonstrate the performance of IS-CAT through extensive validation on the NCLT dataset. Additionally, we performed indoor evaluations on our Sejong indoor-5F dataset and demonstrated successful application to a 3D LiDAR SLAM system. Our findings highlight descriptors that demonstrate superior performance in various environments. This performance enhancement is evident in both indoor and outdoor settings, underscoring the practical effectiveness and advancements of our approach.
Affiliation(s)
- Hyeong-Jun Joo
- Department of Information and Communications Engineering, Sejong University, Seoul 05006, Republic of Korea
- Jaeho Kim
- Department of Electrical Engineering, Sejong University, Seoul 05006, Republic of Korea
6
Zheng H, Zheng Z, Hu R, Xiao B, Wu Y, Yu F, Liu X, Li G, Deng L. Temporal dendritic heterogeneity incorporated with spiking neural networks for learning multi-timescale dynamics. Nat Commun 2024; 15:277. [PMID: 38177124; PMCID: PMC10766638; DOI: 10.1038/s41467-023-44614-z]
Abstract
It is widely believed that brain-inspired spiking neural networks have the capability of processing temporal information owing to their dynamic attributes. However, how such mechanisms contribute to the learning ability, and how to exploit the rich dynamic properties of spiking neural networks to satisfactorily solve complex temporal computing tasks in practice, still remain to be explored. In this article, we identify the importance of capturing the multi-timescale components, based on which a multi-compartment spiking neural model with temporal dendritic heterogeneity is proposed. The model enables multi-timescale dynamics by automatically learning heterogeneous timing factors on different dendritic branches. Two breakthroughs are made through extensive experiments: the working mechanism of the proposed model is revealed via an elaborated temporal spiking XOR problem that analyzes temporal feature integration at different levels, and comprehensive performance benefits of the model over ordinary spiking neural networks are achieved on several temporal computing benchmarks for speech recognition, visual recognition, electroencephalogram signal recognition, and robot place recognition, showing the best-reported accuracy and model compactness, promising robustness and generalization, and high execution efficiency on neuromorphic hardware. This work moves neuromorphic computing a significant step toward real-world applications by appropriately exploiting biological observations.
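As a toy illustration of the multi-timescale idea only (a deliberate simplification, not the paper's multi-compartment neuron model), the sketch below filters a single spike train through dendritic branches with heterogeneous time constants; all values are arbitrary.

```python
import numpy as np

def dendritic_filter(spike_train, taus, dt=1.0):
    """Leaky integration of one spike train on branches with different time constants.

    spike_train: (T,) binary input spikes.
    taus:        (B,) time constants, one per dendritic branch.
    Returns (T, B) branch activations: fast and slow memories of the same input.
    """
    decay = np.exp(-dt / np.asarray(taus, float))   # per-branch decay factor
    out = np.zeros((len(spike_train), len(decay)))
    state = np.zeros(len(decay))
    for t, s in enumerate(spike_train):
        state = decay * state + s                   # heterogeneous timing factors
        out[t] = state
    return out

spikes = (np.random.default_rng(0).random(100) < 0.05).astype(float)
traces = dendritic_filter(spikes, taus=[2.0, 10.0, 50.0])
print(traces.shape)   # (100, 3): same input seen at three timescales
```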
Affiliation(s)
- Hanle Zheng
- Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
- Zhong Zheng
- Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
- Rui Hu
- Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
- Bo Xiao
- Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
- Yujie Wu
- Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria
- Fangwen Yu
- Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
- Xue Liu
- Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
- Guoqi Li
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Lei Deng
- Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
7
Zhu L, Mangan M, Webb B. Neuromorphic sequence learning with an event camera on routes through vegetation. Sci Robot 2023; 8:eadg3679. [PMID: 37756384; DOI: 10.1126/scirobotics.adg3679]
Abstract
For many robotics applications, it is desirable to have relatively low-power and efficient onboard solutions. We took inspiration from insects, such as ants, that are capable of learning and following routes in complex natural environments using relatively constrained sensory and neural systems. Such capabilities are particularly relevant to applications such as agricultural robotics, where visual navigation through dense vegetation remains a challenging task. In this scenario, a route is likely to have high self-similarity and be subject to changing lighting conditions and motion over uneven terrain, and the effects of wind on leaves increase the variability of the input. We used a bioinspired event camera on a terrestrial robot to collect visual sequences along routes in natural outdoor environments and applied a neural algorithm for spatiotemporal memory that is closely based on a known neural circuit in the insect brain. We show that this method is plausible to support route recognition for visual navigation and more robust than SeqSLAM when evaluated on repeated runs on the same route or routes with small lateral offsets. By encoding memory in a spiking neural network running on a neuromorphic computer, our model can evaluate visual familiarity in real time from event camera footage.
Affiliation(s)
- Le Zhu
- School of Informatics, University of Edinburgh, EH8 9AB Edinburgh, UK
- Michael Mangan
- Sheffield Robotics, Department of Computer Science, University of Sheffield, S1 4DP Sheffield, UK
- Barbara Webb
- School of Informatics, University of Edinburgh, EH8 9AB Edinburgh, UK
8
Rostkowska M, Skrzypczyński P. Optimizing Appearance-Based Localization with Catadioptric Cameras: Small-Footprint Models for Real-Time Inference on Edge Devices. Sensors (Basel, Switzerland) 2023; 23:6485. [PMID: 37514780; PMCID: PMC10385632; DOI: 10.3390/s23146485]
Abstract
This paper considers the task of appearance-based localization: visual place recognition from omnidirectional images obtained from catadioptric cameras. The focus is on designing an efficient neural network architecture that accurately and reliably recognizes indoor scenes on distorted images from a catadioptric camera, even in self-similar environments with few discernible features. As the target application is the global localization of a low-cost service mobile robot, the proposed solutions are optimized toward being small-footprint models that provide real-time inference on edge devices, such as Nvidia Jetson. We compare several design choices for the neural network-based architecture of the localization system and then demonstrate that the best results are achieved with embeddings (global descriptors) yielded by exploiting transfer learning and fine tuning on a limited number of catadioptric images. We test our solutions on two small-scale datasets collected using different catadioptric cameras in the same office building. Next, we compare the performance of our system to state-of-the-art visual place recognition systems on the publicly available COLD Freiburg and Saarbrücken datasets that contain images collected under different lighting conditions. Our system compares favourably to the competitors both in terms of the accuracy of place recognition and the inference time, providing a cost- and energy-efficient means of appearance-based localization for an indoor service robot.
Affiliation(s)
- Marta Rostkowska
- Institute of Robotics and Machine Intelligence, Poznan University of Technology, 60-965 Poznan, Poland
- Piotr Skrzypczyński
- Institute of Robotics and Machine Intelligence, Poznan University of Technology, 60-965 Poznan, Poland
9
Wozniak P, Ozog D. Cross-Domain Indoor Visual Place Recognition for Mobile Robot via Generalization Using Style Augmentation. Sensors (Basel, Switzerland) 2023; 23:6134. [PMID: 37447982; PMCID: PMC10346347; DOI: 10.3390/s23136134]
Abstract
The article presents an algorithm for multi-domain visual recognition of an indoor place, based on a convolutional neural network and style randomization. The authors propose a scene classification mechanism and improve the performance of models trained on synthetic and real data from various domains. In the proposed dataset, a domain change was defined as a change of camera model. A dataset of images collected from several rooms was used to cover different scenarios, human actions, equipment changes, and lighting conditions. The proposed method was tested on a scene classification problem with multi-domain data. The basis was a transfer learning approach extended with style augmentation and applied to various combinations of source and target data, with a focus on improving the score on unknown domains and on multi-domain support. The results of the experiments were analyzed in the context of data collected on a humanoid robot. The article shows that the average score was highest when multi-domain data and style augmentation were used, with the proposed method reaching an average score of 92.08% and improving on the result previously reported by another research team.
Affiliation(s)
- Piotr Wozniak
- Department of Computer and Control Engineering, Faculty of Electrical and Computer Engineering, Rzeszow University of Technology, Al. Powstańców Warszawy 12, 35-959 Rzeszow, Poland
10
Yu F, Wu Y, Ma S, Xu M, Li H, Qu H, Song C, Wang T, Zhao R, Shi L. Brain-inspired multimodal hybrid neural network for robot place recognition. Sci Robot 2023; 8:eabm6996. [PMID: 37163608; DOI: 10.1126/scirobotics.abm6996]
Abstract
Place recognition is an essential spatial intelligence capability for robots to understand and navigate the world. However, recognizing places in natural environments remains a challenging task for robots because of resource limitations and changing environments. In contrast, humans and animals can robustly and efficiently recognize hundreds of thousands of places in different conditions. Here, we report a brain-inspired general place recognition system, dubbed NeuroGPR, that enables robots to recognize places by mimicking the neural mechanism of multimodal sensing, encoding, and computing through a continuum of space and time. Our system consists of a multimodal hybrid neural network (MHNN) that encodes and integrates multimodal cues from both conventional and neuromorphic sensors. Specifically, to encode different sensory cues, we built various neural networks of spatial view cells, place cells, head direction cells, and time cells. To integrate these cues, we designed a multiscale liquid state machine that can process and fuse multimodal information effectively and asynchronously using diverse neuronal dynamics and bioinspired inhibitory circuits. We deployed the MHNN on Tianjic, a hybrid neuromorphic chip, and integrated it into a quadruped robot. Our results show that NeuroGPR achieves better performance compared with conventional and existing biologically inspired approaches, exhibiting robustness to diverse environmental uncertainty, including perceptual aliasing, motion blur, light, or weather changes. Running NeuroGPR as an overall multi-neural network workload on Tianjic showcases its advantages with 10.5 times lower latency and 43.6% lower power consumption than the commonly used mobile robot processor Jetson Xavier NX.
Affiliation(s)
- Fangwen Yu
- Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, and Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- Yujie Wu
- Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, and Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria
- Songchen Ma
- Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, and Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- Mingkun Xu
- Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, and Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- Hongyi Li
- Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, and Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- Huanyu Qu
- Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, and Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- Chenhang Song
- Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, and Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- Taoyi Wang
- Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, and Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- Rong Zhao
- Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, and Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing 100084, China
- Luping Shi
- Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, and Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing 100084, China
- THU-CET HIK Joint Research Center for Brain-Inspired Computing, Tsinghua University, Beijing 100084, China
11
Jing J, Gao T, Zhang W, Gao Y, Sun C. Image Feature Information Extraction for Interest Point Detection: A Comprehensive Review. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:4694-4712. [PMID: 36001516; DOI: 10.1109/tpami.2022.3201185]
Abstract
Interest point detection is one of the most fundamental and critical problems in computer vision and image processing. In this paper, we carry out a comprehensive review of image feature information (IFI) extraction techniques for interest point detection. To systematically introduce how the existing interest point detection methods extract IFI from an input image, we propose a taxonomy of the IFI extraction techniques for interest point detection. According to this taxonomy, we discuss different types of IFI extraction techniques for interest point detection. Furthermore, we identify the main unresolved issues related to the existing IFI extraction techniques for interest point detection and any interest point detection methods that have not been discussed before. The existing popular datasets and evaluation standards are provided, and the performances of fifteen state-of-the-art approaches are evaluated and discussed. Moreover, future research directions on IFI extraction techniques for interest point detection are elaborated.
12
Condition-invariant and compact visual place description by convolutional autoencoder. Robotica 2023. [DOI: 10.1017/s0263574723000085]
Abstract
Visual place recognition (VPR) in condition-varying environments is still an open problem. Popular solutions are convolutional neural network (CNN)-based image descriptors, which have been shown to outperform traditional image descriptors based on hand-crafted visual features. However, there are two drawbacks of current CNN-based descriptors: (a) their high dimension and (b) lack of generalization, leading to low efficiency and poor performance in real robotic applications. In this paper, we propose to use a convolutional autoencoder (CAE) to tackle this problem. We employ a high-level layer of a pre-trained CNN to generate features and train a CAE to map the features to a low-dimensional space to improve the condition invariance property of the descriptor and reduce its dimension at the same time. We verify our method in four challenging real-world datasets involving significant illumination changes, and our method is shown to be superior to the state-of-the-art. The code of our work is publicly available at https://github.com/MedlarTea/CAE-VPR.
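The released code lives at the URL above; the snippet below is only a generic sketch of the idea, assuming PyTorch: pre-extracted CNN feature maps are compressed by a small convolutional autoencoder whose bottleneck serves as the low-dimensional descriptor. Channel sizes and the training loop are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureCAE(nn.Module):
    """Compress high-dimensional CNN feature maps into a compact place descriptor."""
    def __init__(self, in_channels=512, code_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, code_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                      # (B, code_dim, 1, 1)
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(code_dim, 256, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, in_channels, 4, stride=2, padding=1),
        )

    def forward(self, feats):
        code = self.encoder(feats)                        # compact descriptor
        h, w = feats.shape[2] // 4, feats.shape[3] // 4
        recon = self.decoder(code.repeat(1, 1, h, w))     # reconstruct the CNN features
        return code.flatten(1), recon

# Train on pre-extracted CNN features (random stand-ins here), then use the
# L2-normalised bottleneck code as the compact, condition-robust place descriptor.
model = FeatureCAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
feats = torch.randn(8, 512, 16, 16)
for _ in range(5):
    code, recon = model(feats)
    loss = F.mse_loss(recon, feats)
    opt.zero_grad(); loss.backward(); opt.step()
descriptor = F.normalize(code.detach(), dim=1)            # (8, 128)
print(descriptor.shape)
```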
13
Song X, Zhijiang Z, Liang X, Huaidong Z. Monocular camera and laser based semantic mapping system with temporal-spatial data association for indoor mobile robots. Multimedia Tools and Applications 2023; 82:1-26. [PMID: 37362690; PMCID: PMC9990965; DOI: 10.1007/s11042-023-14796-1]
Abstract
In the future, the goal of service robots is to operate in human-centric indoor environments, requiring close cooperation with humans. In order to enable the robot to perform various interactive tasks, it is necessary for robots to perceive and understand environments from a human perspective. Semantic map is an augmented representation of the environment, containing both geometric information and high-level qualitative features. It can help the robot to comprehensively understand the environment and bridge the gap in human-robot interaction. In this paper, we propose a unified semantic mapping system for indoor mobile robots. This system utilizes the techniques of scene classification and object detection to construct semantic representations of indoor environments by fusing the data of a camera and a laser. In order to improve the accuracy of semantic mapping, the temporal-spatial correlation of semantics is leveraged to realize data association of semantic maps. Also, the proposed semantic mapping system is scalable and portable, which can be applied to different indoor scenarios. The proposed system was evaluated with collected datasets captured in indoor environments. Extensive experimental results indicate that the proposed semantic mapping system exhibits great performance in the robustness and accuracy of semantic mapping.
Affiliation(s)
- Xu Song
- School of Smart Manufacturing, Jianghan University, Wuhan, 430056 China
- Zuo Zhijiang
- School of Smart Manufacturing, Jianghan University, Wuhan, 430056 China
- Xuan Liang
- School of Smart Manufacturing, Jianghan University, Wuhan, 430056 China
- Zhou Huaidong
- School of Mechanical Engineering & Automation, Beihang University, Beijing, 100191 China
14
Qin C, Zhang Y, Liu Y, Zhu D, Coleman SA, Kerr D. Structure-Aware Feature Disentanglement With Knowledge Transfer for Appearance-Changing Place Recognition. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:1278-1290. [PMID: 34460387; DOI: 10.1109/tnnls.2021.3105175]
Abstract
Long-term visual place recognition (VPR) is challenging as the environment is subject to drastic appearance changes across different temporal resolutions, such as time of the day, month, and season. A wide variety of existing methods address the problem by means of feature disentangling or image style transfer but ignore the structural information that often remains stable even under environmental condition changes. To overcome this limitation, this article presents a novel structure-aware feature disentanglement network (SFDNet) based on knowledge transfer and adversarial learning. Explicitly, probabilistic knowledge transfer (PKT) is employed to transfer knowledge obtained from the Canny edge detector to the structure encoder. An appearance teacher module is then designed to ensure that the learning of the appearance encoder does not rely solely on metric learning. The generated content features with structural information are used to measure the similarity of images. We finally evaluate the proposed approach and compare it to state-of-the-art place recognition methods using six datasets with extreme environmental changes. Experimental results demonstrate the effectiveness and improvements achieved using the proposed framework. Source code and some trained models will be available at http://www.tianshu.org.cn.
15
Shi Y, Yu X, Liu L, Campbell D, Koniusz P, Li H. Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:2682-2697. [PMID: 35816536; DOI: 10.1109/tpami.2022.3189702]
Abstract
We address the problem of ground-to-satellite image geo-localization, that is, estimating the camera latitude, longitude and orientation (azimuth angle) by matching a query image captured at the ground level against a large-scale database with geotagged satellite images. Our prior arts treat the above task as pure image retrieval by selecting the most similar satellite reference image matching the ground-level query image. However, such an approach often produces coarse location estimates because the geotag of the retrieved satellite image only corresponds to the image center while the ground camera can be located at any point within the image. To further consolidate our prior research finding, we present a novel geometry-aware geo-localization method. Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image, once its coarse location and orientation have been determined. Moreover, we propose a new geometry-aware image retrieval pipeline to improve the coarse localization accuracy. Apart from a polar transform in our conference work, this new pipeline also maps satellite image pixels to the ground-level plane in the ground-view via a geometry-constrained projective transform to emphasize informative regions, such as road structures, for cross-view geo-localization. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our newly proposed framework. We also significantly improve the performance of coarse localization results compared to the state-of-the-art in terms of location recalls.
16
A Review of Common Techniques for Visual Simultaneous Localization and Mapping. Journal of Robotics 2023. [DOI: 10.1155/2023/8872822]
Abstract
Mobile robots are widely used in medicine, agriculture, home furnishing, and industry. Simultaneous localization and mapping (SLAM) is the working basis of mobile robots, so research on SLAM technology is both necessary and meaningful. SLAM technology involves robot mechanism kinematics, logic, mathematics, perceptual detection, and other fields, but classifying its technical content is difficult, which has led to diverse technical frameworks for SLAM. Among the various kinds of SLAM, visual SLAM (V-SLAM) has become a key focus of academic research due to its advantages of low price, easy installation, and simple algorithm models. Firstly, we illustrate the advantages of V-SLAM by comparing it with other localization techniques. Secondly, we survey several open-source V-SLAM algorithms and compare their real-time performance, robustness, and innovation. Then, we analyze the frameworks, mathematical models, and related basic theoretical knowledge of V-SLAM. Meanwhile, we review related works from four aspects: visual odometry, back-end optimization, loop closure detection, and mapping. Finally, we discuss future development trends and lay a foundation for researchers to extend this work. In all, this paper classifies each module of V-SLAM in detail and aims to give readers a readable, comprehensive overview of recent V-SLAM research.
17
Locality-constrained continuous place recognition for SLAM in extreme conditions. Applied Intelligence 2023. [DOI: 10.1007/s10489-022-04415-1]
18
Usman M, Ali A, Tahir A, Rahman MZU, Khan AM. Efficient Approach for Extracting High-Level B-Spline Features from LIDAR Data for Light-Weight Mapping. Sensors (Basel, Switzerland) 2022; 22:9168. [PMID: 36501874; PMCID: PMC9737135; DOI: 10.3390/s22239168]
Abstract
Light-weight and accurate mapping is made possible by high-level feature extraction from sensor readings. In this paper, high-level B-spline features are extracted from a 2D LIDAR with a faster method as a solution to the mapping problem, making it possible for the robot to interact with its environment while navigating. The computation time of feature extraction is crucial when mobile robots perform real-time tasks. In addition to the existing assessment measures for B-spline feature extraction methods, the paper also includes a new benchmark time metric for evaluating how well the extracted features perform. For point-to-point association, the most reliable vertex control points of the spline features, generated from the hints of the low-level point feature detector FALKO, were chosen. Three standard indoor data sets and one outdoor data set were used for the experiments. The experimental results based on benchmark performance metrics, specifically computation time, show that the presented approach achieves better results than the state-of-the-art methods for extracting B-spline features. A classification of the methods implemented for B-spline feature detection, together with the algorithms, is also presented in the paper.
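As a generic illustration of B-spline feature fitting (a SciPy-based sketch, not the paper's FALKO-seeded pipeline), the snippet below fits a smoothing B-spline to an ordered 2D LiDAR segment and returns its control points, which can serve as light-weight map features; the smoothing value and toy scan are assumptions.

```python
import numpy as np
from scipy.interpolate import splev, splprep

def fit_bspline_segment(points, smoothing=0.01, n_samples=50):
    """Fit a smoothing B-spline to an ordered 2D LiDAR point segment.

    points: (N, 2) consecutive scan points belonging to one contour segment.
    Returns the spline control points (usable as compact map features) and a
    densely sampled curve for visualisation or association.
    """
    tck, _ = splprep([points[:, 0], points[:, 1]], s=smoothing)
    _, ctrl, _ = tck                                   # tck = (knots, control points, degree)
    curve = np.array(splev(np.linspace(0.0, 1.0, n_samples), tck)).T
    return np.array(ctrl).T, curve

# Toy quarter-circle "wall" seen by a 2D scanner, with a little range noise.
theta = np.linspace(0.0, np.pi / 2, 40)
scan = np.c_[np.cos(theta), np.sin(theta)]
scan += 0.005 * np.random.default_rng(0).normal(size=scan.shape)
ctrl_pts, curve = fit_bspline_segment(scan)
print(ctrl_pts.shape, curve.shape)
```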
Affiliation(s)
- Muhammad Usman
- Department of Mechanical, Mechatronics, and Manufacturing Engineering, University of Engineering & Technology, Faisalabad Campus, Faisalabad 38000, Pakistan
- Ahmad Ali
- Department of Mechanical, Mechatronics, and Manufacturing Engineering, University of Engineering & Technology, Faisalabad Campus, Faisalabad 38000, Pakistan
- Abdullah Tahir
- Department of Mechanical, Mechatronics, and Manufacturing Engineering, University of Engineering & Technology, Faisalabad Campus, Faisalabad 38000, Pakistan
- Muhammad Zia Ur Rahman
- Department of Mechanical, Mechatronics, and Manufacturing Engineering, University of Engineering & Technology, Faisalabad Campus, Faisalabad 38000, Pakistan
- Abdul Manan Khan
- Department of Mechanical Engineering, Hanbat National University, Daejeon 34158, Republic of Korea
19
Ghaffari M, Zhang R, Zhu M, Lin CE, Lin TY, Teng S, Li T, Liu T, Song J. Progress in symmetry preserving robot perception and control through geometry and learning. Front Robot AI 2022; 9:969380. [PMID: 36185972; PMCID: PMC9515513; DOI: 10.3389/frobt.2022.969380]
Abstract
This article reports on recent progress in robot perception and control methods developed by taking the symmetry of the problem into account. Inspired by existing mathematical tools for studying the symmetry structures of geometric spaces, geometric sensor registration, state estimator, and control methods provide indispensable insights into the problem formulations and generalization of robotics algorithms to challenging unknown environments. When combined with computational methods for learning hard-to-measure quantities, symmetry-preserving methods unleash tremendous performance. The article supports this claim by showcasing experimental results of robot perception, state estimation, and control in real-world scenarios.
Affiliation(s)
- Maani Ghaffari
- Computational Autonomy and Robotics Laboratory (CURLY), University of Michigan, Ann Arbor, MI, United States
20
FastFusion: Real-Time Indoor Scene Reconstruction with Fast Sensor Motion. Remote Sensing 2022. [DOI: 10.3390/rs14153551]
Abstract
Real-time 3D scene reconstruction has attracted a great amount of attention in the fields of augmented reality, virtual reality and robotics. Previous works usually assumed slow sensor motions to avoid large interframe differences and strong image blur, but this limits the applicability of the techniques in real cases. In this study, we propose an end-to-end 3D reconstruction system that combines color, depth and inertial measurements to achieve a robust reconstruction with fast sensor motions. We involved an extended Kalman filter (EKF) to fuse RGB-D-IMU data and jointly optimize feature correspondences, camera poses and scene geometry by using an iterative method. A novel geometry-aware patch deformation technique is proposed to adapt the changes in patch features in the image domain, leading to highly accurate feature tracking with fast sensor motions. In addition, we maintained the global consistency of the reconstructed model by achieving loop closure with submap-based depth image encoding and 3D map deformation. The experiments revealed that our patch deformation method improves the accuracy of feature tracking, that our improved loop detection method is more efficient than the original method and that our system possesses superior 3D reconstruction results compared with the state-of-the-art solutions in handling fast camera motions.
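The abstract does not give the filter equations; below is a textbook-style EKF skeleton of the kind used to fuse an IMU motion model with visual measurements. The models, noise values, and the constant-velocity toy example are illustrative assumptions, not FastFusion's actual filter.

```python
import numpy as np

class SimpleEKF:
    """Generic EKF skeleton: user-supplied motion model f (e.g., IMU propagation)
    and measurement model h (e.g., visually estimated pose), with Jacobians F, H."""
    def __init__(self, x0, P0):
        self.x, self.P = np.asarray(x0, float), np.asarray(P0, float)

    def predict(self, f, F, Q):
        self.x = f(self.x)                       # propagate state with the motion model
        self.P = F @ self.P @ F.T + Q

    def update(self, z, h, H, R):
        y = np.asarray(z, float) - h(self.x)     # innovation from the measurement
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ H) @ self.P

# Toy constant-velocity example: state [position, velocity], position measurements.
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
ekf = SimpleEKF([0.0, 1.0], np.eye(2))
for k in range(5):
    ekf.predict(lambda x: F @ x, F, Q=0.01 * np.eye(2))
    ekf.update([0.1 * (k + 1)], lambda x: H @ x, H, R=np.array([[0.05]]))
print(ekf.x)
```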
21
Jaenal A, Moreno FA, Gonzalez-Jimenez J. Unsupervised Appearance Map Abstraction for Indoor Visual Place Recognition With Mobile Robots. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3186768]
Affiliation(s)
- Alberto Jaenal
- Machine Perception and Intelligent Robotics Group (MAPIR-UMA), Malaga Institute for Mechatronics Engineering and Cyber-Physical Systems (IMECH.UMA), University of Malaga, Malaga, Spain
- Francisco-Angel Moreno
- Machine Perception and Intelligent Robotics Group (MAPIR-UMA), Malaga Institute for Mechatronics Engineering and Cyber-Physical Systems (IMECH.UMA), University of Malaga, Malaga, Spain
- Javier Gonzalez-Jimenez
- Machine Perception and Intelligent Robotics Group (MAPIR-UMA), Malaga Institute for Mechatronics Engineering and Cyber-Physical Systems (IMECH.UMA), University of Malaga, Malaga, Spain
22
Ma J, Zhang J, Xu J, Ai R, Gu W, Chen X. OverlapTransformer: An Efficient and Yaw-Angle-Invariant Transformer Network for LiDAR-Based Place Recognition. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3178797]
Affiliation(s)
- Junyi Ma
- Vehicle Engineering, Beijing Institute of Technology, Beijing, China
- Jun Zhang
- HAOMO.AI Technology Co., Ltd, Beijing, China
- Jintao Xu
- HAOMO.AI Technology Co., Ltd, Beijing, China
- Rui Ai
- HAOMO.AI Technology Co., Ltd, Beijing, China
- Weihao Gu
- HAOMO.AI Technology Co., Ltd, Beijing, China
23
Paolicelli V, Berton G, Montagna F, Masone C, Caputo B. Adaptive-Attentive Geolocalization From Few Queries: A Hybrid Approach. Frontiers in Computer Science 2022. [DOI: 10.3389/fcomp.2022.841817]
Abstract
We tackle the task of cross-domain visual geo-localization, where the goal is to geo-localize a given query image against a database of geo-tagged images, in the case where the query and the database belong to different visual domains. In particular, at training time, we consider having access to only few unlabeled queries from the target domain. To adapt our deep neural network to the database distribution, we rely on a 2-fold domain adaptation technique, based on a hybrid generative-discriminative approach. To further enhance the architecture, and to ensure robustness across domains, we employ a novel attention layer that can easily be plugged into existing architectures. Through a large number of experiments, we show that this adaptive-attentive approach makes the model robust to large domain shifts, such as unseen cities or weather conditions. Finally, we propose a new large-scale dataset for cross-domain visual geo-localization, called SVOX.
24
Zhang H, Zhao T, Zhong Y, Yin Y, Yuan H, Dian S. An efficient loop closure detection method based on spatially constrained feature matching. Intelligent Service Robotics 2022. [DOI: 10.1007/s11370-022-00423-9]
25
Siva S, Zhang H. Robot perceptual adaptation to environment changes for long-term human teammate following. Int J Rob Res 2022. [DOI: 10.1177/0278364919896625]
Abstract
Perception is one of the several fundamental abilities required by robots, and it also poses significant challenges, especially in real-world field applications. Long-term autonomy introduces additional difficulties to robot perception, including short- and long-term changes of the robot operation environment (e.g., lighting changes). In this article, we propose an innovative human-inspired approach named robot perceptual adaptation (ROPA) that is able to calibrate perception according to the environment context, which enables perceptual adaptation in response to environmental variations. ROPA jointly performs feature learning, sensor fusion, and perception calibration under a unified regularized optimization framework. We also implement a new algorithm to solve the formulated optimization problem, which has a theoretical guarantee to converge to the optimal solution. In addition, we collect a large-scale dataset from physical robots in the field, called perceptual adaptation to environment changes (PEAC), with the aim to benchmark methods for robot adaptation to short-term and long-term, and fast and gradual lighting changes for human detection based upon different feature modalities extracted from color and depth sensors. Utilizing the PEAC dataset, we conduct extensive experiments in the application of human recognition and following in various scenarios to evaluate ROPA. Experimental results have validated that the ROPA approach obtains promising performance in terms of accuracy and efficiency, and effectively adapts robot perception to address short-term and long-term lighting changes in human detection and following applications.
Affiliation(s)
- Sriram Siva
- Human-Centered Robotics Lab, Colorado School of Mines, Golden, CO, USA
- Hao Zhang
- Human-Centered Robotics Lab, Colorado School of Mines, Golden, CO, USA
26
Investigating the Role of Image Retrieval for Visual Localization. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01615-7]
27
Djenouri Y, Hatleskog J, Hjelmervik J, Bjorne E, Utstumo T, Mobarhan M. Deep learning based decomposition for visual navigation in industrial platforms. Applied Intelligence 2022. [DOI: 10.1007/s10489-021-02908-z]
Abstract
In the heavy asset industry, such as oil & gas, offshore personnel need to locate various equipment on the installation on a daily basis for inspection and maintenance purposes. However, locating equipment in such GPS denied environments is very time consuming due to the complexity of the environment and the large amount of equipment. To address this challenge we investigate an alternative approach to study the navigation problem based on visual imagery data instead of current ad-hoc methods where engineering drawings or large CAD models are used to find equipment. In particular, this paper investigates the combination of deep learning and decomposition for the image retrieval problem which is central for visual navigation. A convolutional neural network is first used to extract relevant features from the image database. The database is then decomposed into clusters of visually similar images, where several algorithms have been explored in order to make the clusters as independent as possible. The Bag-of-Words (BoW) approach is then applied on each cluster to build a vocabulary forest. During the searching process the vocabulary forest is exploited to find the most relevant images to the query image. To validate the usefulness of the proposed framework, intensive experiments have been carried out using both standard datasets and images from industrial environments. We show that the suggested approach outperforms the BoW-based image retrieval solutions, both in terms of computing time and accuracy. We also show the applicability of this approach on real industrial scenarios by applying the model on imagery data from offshore oil platforms.
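As a simplified sketch of the decomposition idea (clustering the database and searching only the query's cluster), the snippet below uses k-means over global image descriptors and plain cosine ranking within the selected cluster, assuming scikit-learn is available; the per-cluster Bag-of-Words vocabulary described in the paper is omitted, and all sizes are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_index(db_descriptors, n_clusters=8, seed=0):
    """Decompose the image database into clusters of visually similar images."""
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(db_descriptors)

def search(query_descriptor, db_descriptors, km, top_k=5):
    """Search only the query's cluster, then rank candidates by cosine similarity."""
    cluster = km.predict(query_descriptor[None, :])[0]
    candidates = np.flatnonzero(km.labels_ == cluster)
    q = query_descriptor / np.linalg.norm(query_descriptor)
    db = db_descriptors[candidates]
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    order = np.argsort(db @ q)[::-1][:top_k]
    return candidates[order]                 # indices of the most similar images

rng = np.random.default_rng(0)
db = rng.normal(size=(200, 64))              # stand-in for CNN image descriptors
km = build_index(db)
print(search(db[3] + 0.01 * rng.normal(size=64), db, km))   # index 3 should rank first
```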
28
Rozsypálek Z, Broughton G, Linder P, Rouček T, Blaha J, Mentzl L, Kusumam K, Krajník T. Contrastive Learning for Image Registration in Visual Teach and Repeat Navigation. Sensors (Basel, Switzerland) 2022; 22:2975. [PMID: 35458959; PMCID: PMC9030179; DOI: 10.3390/s22082975]
Abstract
Visual teach and repeat navigation (VT&R) is popular in robotics thanks to its simplicity and versatility. It enables mobile robots equipped with a camera to traverse learned paths without the need to create globally consistent metric maps. Although teach and repeat frameworks have been reported to be relatively robust to changing environments, they still struggle with day-to-night and seasonal changes. This paper aims to find the horizontal displacement between prerecorded and currently perceived images required to steer a robot towards the previously traversed path. We employ a fully convolutional neural network to obtain dense representations of the images that are robust to changes in the environment and variations in illumination. The proposed model achieves state-of-the-art performance on multiple datasets with seasonal and day/night variations. In addition, our experiments show that it is possible to use the model to generate additional training examples that can be used to further improve the original model’s robustness. We also conducted a real-world experiment on a mobile robot to demonstrate the suitability of our method for VT&R.
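A hypothetical sketch of the registration objective follows: given dense column-wise embeddings of the taught and current images (random stand-ins here instead of the paper's fully convolutional network), it estimates the horizontal displacement by sliding one representation over the other and scoring the overlap.

```python
import numpy as np

def horizontal_shift(map_repr, live_repr):
    """Estimate the horizontal displacement (in columns) between two dense
    image representations by sliding one over the other and scoring the overlap."""
    W = map_repr.shape[1]
    scores = []
    for s in range(-W // 2, W // 2 + 1):
        cols = np.arange(max(0, s), min(W, W + s))        # overlapping columns
        a = map_repr[:, cols]
        b = live_repr[:, cols - s]
        scores.append((a * b).sum() / cols.size)          # mean correlation
    return int(np.argmax(scores)) - W // 2

rng = np.random.default_rng(0)
taught = rng.normal(size=(32, 64))                        # stand-in column embeddings
current = np.roll(taught, 7, axis=1)                      # simulate a lateral offset
print(horizontal_shift(taught, current))                  # prints -7 with this sign convention
```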
Affiliation(s)
- Zdeněk Rozsypálek
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- George Broughton
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- Pavel Linder
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- Tomáš Rouček
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- Jan Blaha
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- Leonard Mentzl
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- Keerthy Kusumam
- Department of Computer Science, University of Nottingham, Jubilee Campus, 7301 Wollaton Rd, Lenton, Nottingham NG8 1BB, UK
- Tomáš Krajník
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
29
Rouček T, Amjadi AS, Rozsypálek Z, Broughton G, Blaha J, Kusumam K, Krajník T. Self-Supervised Robust Feature Matching Pipeline for Teach and Repeat Navigation. Sensors (Basel, Switzerland) 2022; 22:2836. [PMID: 35458823; PMCID: PMC9032253; DOI: 10.3390/s22082836]
Abstract
The performance of deep neural networks and the low costs of computational hardware have made computer vision a popular choice in many robotic systems. An attractive feature of deep-learned methods is their ability to cope with appearance changes caused by day-night cycles and seasonal variations. However, deep learning of neural networks typically relies on large numbers of hand-annotated images, which requires significant effort for data collection and annotation. We present a method that allows autonomous, self-supervised training of a neural network in visual teach-and-repeat (VT&R) tasks, where a mobile robot has to traverse a previously taught path repeatedly. Our method is based on a fusion of two image registration schemes: one based on a Siamese neural network and another on point-feature matching. As the robot traverses the taught paths, it uses the results of feature-based matching to train the neural network, which, in turn, provides coarse registration estimates to the feature matcher. We show that as the neural network gets trained, the accuracy and robustness of the navigation increases, making the robot capable of dealing with significant changes in the environment. This method can significantly reduce the data annotation efforts when designing new robotic systems or introducing robots into new environments. Moreover, the method provides annotated datasets that can be deployed in other navigation systems. To promote the reproducibility of the research presented herein, we provide our datasets, codes and trained models online.
Affiliation(s)
- Tomáš Rouček
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- Arash Sadeghi Amjadi
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- Zdeněk Rozsypálek
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- George Broughton
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- Jan Blaha
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
- Keerthy Kusumam
- Department of Computer Science, University of Nottingham, Jubilee Campus, 7301 Wollaton Rd, Lenton, Nottingham NG8 1BB, UK
- Tomáš Krajník
- Artificial Intelligence Center, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague 6, Czech Republic
30
|
Malone C, Garg S, Xu M, Peynot T, Milford M. Improving Road Segmentation in Challenging Domains Using Similar Place Priors. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3146894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
31
|
Ozdemir A, Scerri M, Barron AB, Philippides A, Mangan M, Vasilaki E, Manneschi L. EchoVPR: Echo State Networks for Visual Place Recognition. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3150505] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
32
|
Liu S, Fan J, Ai D, Song H, Fu T, Wang Y, Yang J. Feature matching for texture-less endoscopy images via superpixel vector field consistency. BIOMEDICAL OPTICS EXPRESS 2022; 13:2247-2265. [PMID: 35519251 PMCID: PMC9045917 DOI: 10.1364/boe.450259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/02/2021] [Revised: 01/05/2022] [Accepted: 01/23/2022] [Indexed: 06/14/2023]
Abstract
Feature matching is an important technology for obtaining the surface morphology of soft tissues in intraoperative endoscopy images. The extraction of features from clinical endoscopy images is a difficult problem, especially for texture-less images. The reduction of surface details makes the problem more challenging. We proposed an adaptive gradient-preserving method to improve the visual features of texture-less images. For feature matching, we first constructed a spatial motion field by using the superpixel blocks and estimated its information entropy matching with the motion consistency algorithm to obtain the initial outlier feature screening. Second, we extended the superpixel spatial motion field to a vector field and constrained it with the vector feature to optimize the confidence of the initial matching set. Evaluations were implemented on public and undisclosed datasets. Our method increased the number of feature points extracted by the three feature-point extraction methods by an order of magnitude compared with the original images. In the public dataset, the accuracy and F1-score increased to 92.6% and 91.5%, respectively. The matching score was improved by 1.92%. In the undisclosed dataset, the reconstructed surface integrity of the proposed method was improved from 30% to 85%. Furthermore, we also presented the surface reconstruction results for differently sized images to validate the robustness of our method, which showed high-quality feature matching results. Overall, the experimental results proved the effectiveness of the proposed matching method. This demonstrates its capability to extract sufficient visual feature points and generate reliable feature matches for 3D reconstruction and meaningful applications in clinical practice.
Collapse
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Danni Ai
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Hong Song
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
| | - Tianyu Fu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| |
Collapse
|
33
|
Toft C, Maddern W, Torii A, Hammarstrand L, Stenborg E, Safari D, Okutomi M, Pollefeys M, Sivic J, Pajdla T, Kahl F, Sattler T. Long-Term Visual Localization Revisited. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:2074-2088. [PMID: 33074802 DOI: 10.1109/tpami.2020.3032010] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds. Practical visual localization approaches need to be robust to a wide variety of viewing conditions, including day-night changes, as well as weather and seasonal variations, while providing highly accurate six degree-of-freedom (6DOF) camera pose estimates. In this paper, we extend three publicly available datasets containing images captured under a wide variety of viewing conditions, but lacking camera pose information, with ground truth pose information, making evaluation of the impact of various factors on 6DOF camera pose estimation accuracy possible. We also discuss the performance of state-of-the-art localization approaches on these datasets. Additionally, we release around half of the poses for all conditions, and keep the remaining half private as a test set, in the hopes that this will stimulate research on long-term visual localization, learned local image features, and related research areas. Our datasets are available at visuallocalization.net, where we are also hosting a benchmarking server for automatic evaluation of results on the test set. The presented state-of-the-art results are to a large degree based on submissions to our server.
Collapse
|
34
|
Extracting Statistical Signatures of Geometry and Structure in 2D Occupancy Grid Maps for Global Localization. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3151154] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
35
|
Hussaini S, Milford M, Fischer T. Spiking Neural Networks for Visual Place Recognition Via Weighted Neuronal Assignments. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3149030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Affiliation(s)
- Somayeh Hussaini
- QUT Centre for Robotics, Queensland University of Technology, Brisbane, QLD, Australia
| | - Michael Milford
- QUT Centre for Robotics, Queensland University of Technology, Brisbane, QLD, Australia
| | - Tobias Fischer
- QUT Centre for Robotics, Queensland University of Technology, Brisbane, QLD, Australia
| |
Collapse
|
36
|
Arcanjo B, Ferrarini B, Milford M, McDonald-Maier KD, Ehsan S. An Efficient and Scalable Collection of Fly-Inspired Voting Units for Visual Place Recognition in Changing Environments. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3140827] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
37
|
Shen Y, Wang R, Zuo W, Zheng N. TCL: Tightly Coupled Learning Strategy for Weakly Supervised Hierarchical Place Recognition. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3141663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
38
|
|
39
|
Company-Corcoles JP, Garcia-Fidalgo E, Ortiz A. Appearance-based loop closure detection combining lines and learned points for low-textured environments. Auton Robots 2022. [DOI: 10.1007/s10514-021-10032-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
Hand-crafted point descriptors have traditionally been used for visual loop closure detection. However, in low-textured environments, it is usually difficult to find enough point features and, hence, the performance of such algorithms degrades. In this context, this paper proposes a loop closure detection method that combines lines and learned points to work particularly in scenarios where hand-crafted points fail. To index previous images, we adopt separate incremental binary Bag-of-Words (BoW) schemes for points and lines. Moreover, we adopt a binarization procedure for feature descriptors to bring the advantages of learned features into a binary BoW model. Furthermore, image candidates from each BoW instance are merged using a novel query-adaptive late fusion approach. Finally, a spatial verification stage, which integrates appearance and geometry perspectives, allows us to enhance the global performance of the method. Our approach is validated using several public datasets, outperforming other state-of-the-art solutions in most cases, especially in low-textured scenarios.
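To make the late-fusion step concrete, here is a small, hedged sketch of merging loop-closure candidates from a point-based and a line-based BoW index. The query-adaptive weighting used below (the relative margin of each modality's top candidate) is an illustrative assumption, not necessarily the fusion rule of the cited paper.

```python
# Hedged sketch: query-adaptive late fusion of two candidate score lists.
import numpy as np

def adaptive_weight(scores):
    """Give a modality more weight when its best score clearly stands out."""
    s = np.sort(np.asarray(scores, dtype=float))[::-1]
    if len(s) < 2 or s[0] <= 0:
        return 0.0
    return (s[0] - s[1]) / s[0]          # relative margin of the top candidate

def fuse_candidates(point_scores, line_scores):
    """point_scores / line_scores: dict image_id -> BoW similarity score."""
    w_p = adaptive_weight(list(point_scores.values()))
    w_l = adaptive_weight(list(line_scores.values()))
    fused = {}
    for img_id in set(point_scores) | set(line_scores):
        fused[img_id] = (w_p * point_scores.get(img_id, 0.0)
                         + w_l * line_scores.get(img_id, 0.0))
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Example: candidates returned by both indices for one query image.
print(fuse_candidates({3: 0.8, 7: 0.2}, {3: 0.5, 9: 0.4})[:2])
```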
Collapse
|
40
|
An S, Zhu H, Wei D, Tsintotas KA, Gasteratos A. Fast and incremental loop closure detection with deep features and proximity graphs. J FIELD ROBOT 2022. [DOI: 10.1002/rob.22060] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Affiliation(s)
- Shan An
- State Key Lab of Software Development Environment Beihang University Beijing China
| | - Haogang Zhu
- State Key Lab of Software Development Environment Beihang University Beijing China
| | - Dong Wei
- Tech & Data Center JD.COM Inc. Beijing China
| | | | - Antonios Gasteratos
- Department of Production and Management Engineering Democritus University of Thrace Xanthi Greece
| |
Collapse
|
41
|
InstaIndoor and multi-modal deep learning for indoor scene recognition. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06781-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
42
|
Wang Y, Xue T, Li Q. A Robust Image-Sequence-Based Framework for Visual Place Recognition in Changing Environments. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:152-163. [PMID: 32203043 DOI: 10.1109/tcyb.2020.2977128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
This article proposes a robust image-sequence-based framework to deal with two challenges of visual place recognition in changing environments: 1) viewpoint variations and 2) environmental condition variations. Our framework includes two main parts. The first part is to calculate the distance between two images from a reference image sequence and a query image sequence. In this part, we remove the deep features of non-overlapping content in these two images and utilize the remaining deep features to calculate the distance. As the deep features of non-overlapping content are caused by viewpoint variations, removing these deep features improves the robustness to viewpoint variations. Based on the first part, in the second part, we first calculate the distances of all pairs of images from a reference image sequence and a query image sequence and obtain a distance matrix. Afterward, we design two convolutional operators to retrieve the distance submatrix with the minimum diagonal distribution. The minimum diagonal distribution contains more environmental information, which is insensitive to environmental condition variations. The experimental results suggest that our framework exhibits better performance than several state-of-the-art methods. Moreover, the analysis of runtime shows that our framework has the potential to satisfy real-time demands.
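The second part of the framework can be pictured with a small NumPy sketch: build the reference-versus-query distance matrix, then score diagonal alignments with a convolution and keep the cheapest one. The single identity kernel used here is a generic stand-in for the paper's two convolutional operators.

```python
# Illustrative sequence matching over a distance matrix (not the authors' code).
import numpy as np
from scipy.signal import convolve2d

def sequence_match(ref_desc, qry_desc, L=5):
    """ref_desc: (R, d) descriptors, qry_desc: (Q, d) descriptors."""
    # pairwise Euclidean distance matrix, shape (R, Q)
    D = np.linalg.norm(ref_desc[:, None, :] - qry_desc[None, :, :], axis=-1)
    kernel = np.eye(L)                               # sums values along a main diagonal
    diag_cost = convolve2d(D, kernel, mode='valid')  # cost of every length-L diagonal
    i, j = np.unravel_index(np.argmin(diag_cost), diag_cost.shape)
    return i, j, diag_cost[i, j]                     # start indices of the best alignment

rng = np.random.default_rng(0)
ref, qry = rng.normal(size=(50, 16)), rng.normal(size=(20, 16))
print(sequence_match(ref, qry))
```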
Collapse
|
43
|
Yang B, Xu X, Ren J, Cheng L, Guo L, Zhang Z. SAM-Net: Semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2021.11.028] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
44
|
Pearson MJ, Dora S, Struckmeier O, Knowles TC, Mitchinson B, Tiwari K, Kyrki V, Bohte S, Pennartz CMA. Multimodal Representation Learning for Place Recognition Using Deep Hebbian Predictive Coding. Front Robot AI 2021; 8:732023. [PMID: 34966789 PMCID: PMC8710724 DOI: 10.3389/frobt.2021.732023] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Accepted: 11/19/2021] [Indexed: 11/13/2022] Open
Abstract
Recognising familiar places is a competence required in many engineering applications that interact with the real world, such as robot navigation. Combining information from different sensory sources promotes robustness and accuracy of place recognition. However, mismatches in data registration, dimensionality, and timing between modalities remain challenging problems in multisensory place recognition. Spurious data generated by sensor drop-out in multisensory environments is particularly problematic and often resolved through ad hoc and brittle solutions. An effective approach to these problems is demonstrated by animals as they gracefully move through the world. Therefore, we take a neuro-ethological approach by adopting self-supervised representation learning based on a neuroscientific model of visual cortex known as predictive coding. We demonstrate how this parsimonious network algorithm, which is trained using a local learning rule, can be extended to combine visual and tactile sensory cues from a biomimetic robot as it naturally explores a visually aliased environment. The place recognition performance obtained using joint latent representations generated by the network is significantly better than that of contemporary representation learning techniques. Further, we see evidence of improved robustness of place recognition in the face of unimodal sensor drop-out. The proposed multimodal deep predictive coding algorithm is also linearly extensible to accommodate more than two sensory modalities, thereby providing an intriguing example of the value of neuro-biologically plausible representation learning for multimodal navigation.
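For intuition only, the following sketch shows the joint-latent principle with simple linear predictive coding (Rao-and-Ballard style): one shared latent vector is inferred by minimising the prediction errors of whichever modalities are present, so a dropped sensor simply contributes nothing. This is a toy approximation supplied for illustration, not the authors' deep Hebbian network.

```python
# Toy multimodal predictive coding: shared latent inferred from two modalities.
import numpy as np

def infer_joint_latent(x_vis, x_tac, W_vis, W_tac, steps=200, eta=0.05):
    """x_vis / x_tac: observation vectors, or None if that sensor dropped out."""
    r = np.zeros(W_vis.shape[1])                   # shared latent representation
    for _ in range(steps):
        grad = np.zeros_like(r)
        if x_vis is not None:
            grad += W_vis.T @ (x_vis - W_vis @ r)  # visual prediction error term
        if x_tac is not None:
            grad += W_tac.T @ (x_tac - W_tac @ r)  # tactile prediction error term
        r += eta * grad                            # gradient step on the latent
    return r                                       # usable as a place descriptor

rng = np.random.default_rng(1)
# column-scaled generative weights keep the gradient step stable
W_v = rng.normal(size=(64, 16)) / np.sqrt(64)
W_t = rng.normal(size=(32, 16)) / np.sqrt(32)
r_true = rng.normal(size=16)
r_both = infer_joint_latent(W_v @ r_true, W_t @ r_true, W_v, W_t)
r_drop = infer_joint_latent(W_v @ r_true, None, W_v, W_t)   # tactile drop-out
```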
Collapse
Affiliation(s)
- Martin J Pearson
- Bristol Robotics Laboratory, University of the West of England, Bristol, United Kingdom
| | - Shirin Dora
- Department of Computer Science, Loughborough University, Loughborough, United Kingdom; Center for Mathematics and Informatics, Amsterdam, Netherlands
| | | | - Thomas C Knowles
- Bristol Robotics Laboratory, University of the West of England, Bristol, United Kingdom
| | - Ben Mitchinson
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
| | - Kshitij Tiwari
- Intelligent Robotics Group, Aalto University, Helsinki, Finland
| | - Ville Kyrki
- Intelligent Robotics Group, Aalto University, Helsinki, Finland
| | - Sander Bohte
- Center for Mathematics and Informatics, Amsterdam, Netherlands; Department of Cognitive and Systems Neuroscience, University of Amsterdam, Amsterdam, Netherlands
| | - Cyriel M A Pennartz
- Department of Cognitive and Systems Neuroscience, University of Amsterdam, Amsterdam, Netherlands
| |
Collapse
|
45
|
Gopalapillai R, Gupta D, Zakariah M, Alotaibi YA. Convolution-Based Encoding of Depth Images for Transfer Learning in RGB-D Scene Classification. SENSORS 2021; 21:s21237950. [PMID: 34883955 PMCID: PMC8659746 DOI: 10.3390/s21237950] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 11/18/2021] [Accepted: 11/25/2021] [Indexed: 11/16/2022]
Abstract
Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened up a new research area of using depth information in addition to color image (RGB) data for scene understanding. Transfer learning of deep convolutional networks with pairs of RGB and depth (RGB-D) images has to deal with integrating these two modalities. Single-channel depth images are often converted to three-channel images by extracting horizontal disparity, height above ground, and the angle of the pixel’s local surface normal (HHA) to apply transfer learning using networks trained on the Places365 dataset. The high computational cost of HHA encoding can be a major disadvantage for the real-time prediction of scenes, although this may be less important during the training phase. We propose a new, computationally efficient encoding method that can be integrated with any convolutional neural network. We show that our encoding approach performs equally well or better in a multimodal transfer learning setup for scene classification. Our encoding is implemented in a customized and pretrained VGG16 Net. We address the class imbalance problem seen in the image dataset using a method based on the synthetic minority oversampling technique (SMOTE) at the feature level. With appropriate image augmentation and fine-tuning, our network achieves scene classification accuracy comparable to that of other state-of-the-art architectures.
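The feature-level SMOTE step lends itself to a brief sketch. The random arrays below stand in for pooled CNN features (the actual pipeline in the paper uses a customised, pretrained VGG16), and imbalanced-learn's SMOTE is used as the oversampler; the classifier head is an illustrative placeholder.

```python
# Sketch of feature-level oversampling: SMOTE applied to CNN feature vectors,
# not to raw images. Requires numpy, scikit-learn and imbalanced-learn.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 512))                 # stand-in for pooled CNN features
y = np.array([0] * 100 + [1] * 20)              # imbalanced scene labels

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)    # synthesise minority features
clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)  # scene classifier head
print(np.bincount(y_bal))                       # classes are now balanced
```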
Collapse
Affiliation(s)
- Radhakrishnan Gopalapillai
- Department of Computer Science & Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India;
| | - Deepa Gupta
- Department of Computer Science & Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India;
- Correspondence:
| | - Mohammed Zakariah
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11543, Saudi Arabia; (M.Z.); (Y.A.A.)
| | - Yousef Ajami Alotaibi
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11543, Saudi Arabia; (M.Z.); (Y.A.A.)
| |
Collapse
|
46
|
Bielecki A, Śmigielski P. Three-Dimensional Outdoor Analysis of Single Synthetic Building Structures by an Unmanned Flying Agent Using Monocular Vision. SENSORS 2021; 21:s21217270. [PMID: 34770577 PMCID: PMC8587298 DOI: 10.3390/s21217270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/25/2021] [Revised: 09/22/2021] [Accepted: 10/27/2021] [Indexed: 11/16/2022]
Abstract
An algorithm for the analysis and understanding of a 3D urban-type environment by an autonomous flying agent equipped only with monocular vision is presented. The algorithm is hierarchical and is based on the structural representation of the analyzed scene. Firstly, the robot observes the scene from a high altitude to build a 2D representation of a single object and a graph representation of the 2D scene. The 3D representation of each object arises as a consequence of the robot's actions, as a result of which it projects the object's solid onto different planes. The robot assigns the obtained representations to the corresponding vertex of the created graph. The algorithm was tested by using the embodied robot operating in a real scene. The tests showed that the robot equipped with the algorithm was able not only to localize the predefined object, but also to perform safe, collision-free maneuvers close to the structures in the scene.
Collapse
Affiliation(s)
- Andrzej Bielecki
- Institute of Computer Science, Faculty of Exact and Natural Sciences, Pedagogical University in Kraków, Podchorążych 2, 30-084 Kraków, Poland
- Correspondence:
| | | |
Collapse
|
47
|
Yu M, Zhang L, Wang W, Huang H. Loop Closure Detection by Using Global and Local Features With Photometric and Viewpoint Invariance. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:8873-8885. [PMID: 34699356 DOI: 10.1109/tip.2021.3116898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Loop closure detection plays an important role in many Simultaneous Localization and Mapping (SLAM) systems, while the main challenge lies in photometric and viewpoint variance. This paper presents a novel loop closure detection algorithm that is more robust to such variance by using both global and local features. Specifically, the global feature, with the consolidation of photometric and viewpoint invariance, is learned by a Siamese network from the intensity, depth, gradient and normal vector distributions. The local feature, with rotation invariance, is based on the histogram of relative pixel intensity and geometric information such as curvature and coplanarity. Then, these two types of features are jointly leveraged for the robust detection of loop closures. Extensive experiments have been conducted on publicly available RGB-D benchmark datasets such as TUM and KITTI. The results demonstrate that our algorithm can effectively address challenging scenarios with large photometric and viewpoint variance, outperforming other state-of-the-art methods.
Collapse
|
48
|
Manzoor S, Joo SH, Kim EJ, Bae SH, In GG, Pyo JW, Kuc TY. 3D Recognition Based on Sensor Modalities for Robotic Systems: A Survey. SENSORS (BASEL, SWITZERLAND) 2021; 21:7120. [PMID: 34770429 PMCID: PMC8587961 DOI: 10.3390/s21217120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Revised: 10/17/2021] [Accepted: 10/20/2021] [Indexed: 11/16/2022]
Abstract
3D visual recognition is a prerequisite for most autonomous robotic systems operating in the real world. It empowers robots to perform a variety of tasks, such as tracking, understanding the environment, and human-robot interaction. Autonomous robots equipped with 3D recognition capability can better perform their social roles through supportive task assistance in professional jobs and effective domestic services. For active assistance, social robots must recognize their surroundings, including objects and places, to perform their tasks more efficiently. This article first highlights the value-centric role of social robots in society by presenting recently developed robots and describing their main features. Motivated by the recognition capability of social robots, we present an analysis of data representation methods based on sensor modalities for 3D object and place recognition using deep learning models. In this direction, we delineate the research gaps that need to be addressed, summarize 3D recognition datasets, and present performance comparisons. Finally, a discussion of future research directions concludes the article. This survey is intended to show how recent developments in 3D visual recognition based on sensor modalities and deep-learning-based approaches can lay the groundwork for further research, and it serves as a guide to those interested in vision-based robotics applications.
Collapse
Affiliation(s)
| | | | | | | | | | | | - Tae-Yong Kuc
- Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Korea; (S.M.); (S.-H.J.); (E.-J.K.); (S.-H.B.); (G.-G.I.); (J.-W.P.)
| |
Collapse
|
49
|
Chen S, Wu J, Lu Q, Wang Y, Lin Z. Cross-scene loop-closure detection with continual learning for visual simultaneous localization and mapping. INT J ADV ROBOT SYST 2021. [DOI: 10.1177/17298814211050560] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Humans maintain good memory and recognition capability of previous environments while they are learning about new ones; thus, humans are able to continually learn and increase their experience. This capability is also of obvious importance for autonomous mobile robots. The simultaneous localization and mapping system plays an important role in the localization and navigation of robots. The loop-closure detection method is an indispensable part of relocalization and map construction, and is critical for correcting map-point errors in simultaneous localization and mapping. Existing visual loop-closure detection methods based on deep learning are not capable of continual learning across scenes, which greatly limits their application scope. In this article, we propose a novel end-to-end loop-closure detection method based on continual learning, which can effectively suppress the decline of the memory capability of the simultaneous localization and mapping system by introducing, for the first time, an orthogonal projection operator into loop-closure detection to overcome the catastrophic forgetting problem of mobile robots in large-scale and multi-scene environments. Based on three scenes from public data sets, the experimental results show that the proposed method has a strong capability for continual learning in cross-scene environments where existing state-of-the-art methods fail.
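For readers unfamiliar with orthogonal-projection-based continual learning, one common construction (as used in orthogonal weight modification methods; the abstract does not specify the exact operator, so this is only an illustrative form) maintains a projector P over the input subspace of previously learned scenes and passes new weight updates through it:

```latex
P \;\leftarrow\; P - \frac{P\,x\,x^{\top}P}{\alpha + x^{\top}P\,x},
\qquad
\Delta W \;\leftarrow\; \lambda\, P\, \Delta W_{\mathrm{BP}}
```

Because P is approximately orthogonal to the inputs x of earlier scenes, the projected update leaves their responses nearly unchanged, which is the general mechanism by which catastrophic forgetting is suppressed.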
Collapse
Affiliation(s)
- Shilang Chen
- School of Mechatronic Engineering and Automation, Foshan University, Foshan, China
| | - Junjun Wu
- School of Mechatronic Engineering and Automation, Foshan University, Foshan, China
| | - Qinghua Lu
- School of Mechatronic Engineering and Automation, Foshan University, Foshan, China
| | - Yanran Wang
- Department of Computer Science, Jinan University, Guangzhou, China
| | - Zeqin Lin
- School of Mechatronic Engineering and Automation, Foshan University, Foshan, China
| |
Collapse
|
50
|
Wang J, Li C, Li B, Pang C, Fang Z. High-precision and robust localization system for mobile robots in complex and large-scale indoor scenes. INT J ADV ROBOT SYST 2021. [DOI: 10.1177/17298814211047690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
High-precision and robust localization is a key issue for the long-term autonomous navigation of mobile robots in industrial scenes. In this article, we propose a high-precision and robust localization system based on laser scans and artificial landmarks. The proposed localization system is mainly composed of three modules, namely a scoring-mechanism-based global localization module, a laser and artificial-landmark-based localization module, and a relocalization trigger module. The global localization module processes the global map to obtain a map pyramid, thus improving global localization speed and accuracy when robots are powered on or kidnapped. The laser and artificial-landmark-based localization module is employed to achieve robust localization in highly dynamic scenes and high-precision localization in target areas. The relocalization trigger module is used to monitor the current localization quality in real time by matching the current laser scan with the global map, and feeds the result back to the global localization module to improve the robustness of the system. Experimental results show that our method can achieve robust robot localization and real-time detection of the current localization quality in indoor and industrial environments. In the target area, the position error is less than 0.004 m and the angle error is less than 0.01 rad.
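The relocalization-trigger idea admits a simple hedged sketch: score the current pose by the fraction of laser points that land on occupied cells of the global map, and request global relocalization when the score drops. The scoring rule, indexing convention, and threshold below are illustrative assumptions rather than the paper's implementation.

```python
# Hedged sketch of a scan-to-map localization-quality monitor.
import numpy as np

def localization_score(scan_xy, pose, occ_grid, resolution, origin):
    """scan_xy: (N, 2) points in the robot frame; pose: (x, y, yaw);
    occ_grid: 2D bool array of occupied cells indexed [ix, iy];
    resolution: cell size in metres; origin: map origin (x, y)."""
    x, y, yaw = pose
    R = np.array([[np.cos(yaw), -np.sin(yaw)], [np.sin(yaw), np.cos(yaw)]])
    pts = scan_xy @ R.T + np.array([x, y])               # transform scan into map frame
    cells = np.floor((pts - origin) / resolution).astype(int)
    inside = ((cells >= 0) & (cells < occ_grid.shape)).all(axis=1)
    hits = occ_grid[cells[inside, 0], cells[inside, 1]]
    return hits.mean() if hits.size else 0.0             # fraction of matched scan points

def relocalization_needed(score, threshold=0.6):
    return score < threshold                             # fed back to the global module
```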
Collapse
Affiliation(s)
- Jibo Wang
- Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China
| | - Chengpeng Li
- SIASUN Robot & Automation Co., Ltd, Shenyang, China
| | - Bangyu Li
- SIASUN Robot & Automation Co., Ltd, Shenyang, China
| | - Chenglin Pang
- Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China
| | - Zheng Fang
- Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China
| |
Collapse
|