1
Zhang C, Weng X, Cao Y, Ding M. Monocular Absolute Depth Estimation from Motion for Small Unmanned Aerial Vehicles by Geometry-Based Scale Recovery. Sensors (Basel) 2024;24:4541. [PMID: 39065938] [PMCID: PMC11281144] [DOI: 10.3390/s24144541]
Abstract
In recent years, there has been extensive research and application of unsupervised monocular depth estimation methods for intelligent vehicles. However, a major limitation of most existing approaches is their inability to predict absolute depth values in physical units, as they generally suffer from the scale problem. Furthermore, most research efforts have focused on ground vehicles, neglecting the potential application of these methods to unmanned aerial vehicles (UAVs). To address these gaps, this paper proposes a novel absolute depth estimation method specifically designed for flight scenes using a monocular vision sensor, in which a geometry-based scale recovery algorithm serves as a post-processing stage for scale-consistent relative depth estimation results. By exploiting the feature correspondence between successive images and using the pose data provided by the onboard navigation sensors, the scale factor between relative and absolute scales is calculated according to a multi-view geometry model, and absolute depth maps are then generated by pixel-wise multiplication of the relative depth maps with the scale factor. As a result, unsupervised monocular depth estimation technology is extended from relative depth estimation in semi-structured scenes to absolute depth estimation in unstructured scenes. Experiments on the publicly available Mid-Air dataset and customized data demonstrate the effectiveness of our method in different cases and settings, as well as its robustness to navigation sensor noise. The proposed method only requires UAVs to be equipped with a monocular camera and common navigation sensors, and the obtained absolute depth information can be directly used for downstream tasks; this is significant for a class of vehicle that has rarely been explored in previous depth estimation studies.
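The scale-recovery step described above reduces to a single pixel-wise multiplication once the scale factor is known. Below is a minimal Python sketch of one common way to obtain that factor, assuming the up-to-scale translation from two-view geometry and the metric translation from the navigation sensors are both available; the function and variable names are illustrative, not the paper's.

```python
import numpy as np

def recover_absolute_depth(relative_depth, t_visual, t_nav):
    """Rescale a scale-consistent relative depth map to metric units.

    relative_depth : HxW array from an unsupervised depth network
    t_visual       : up-to-scale camera translation between two frames
                     (e.g., from essential-matrix decomposition)
    t_nav          : the same translation in metres, from navigation sensors
    """
    # Scale factor = ratio of metric to up-to-scale baseline length.
    s = np.linalg.norm(t_nav) / np.linalg.norm(t_visual)
    # Pixel-wise multiplication yields an absolute (metric) depth map.
    return s * relative_depth
```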
Affiliation(s)
- Chuanqi Zhang
- College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Xiangrui Weng
- College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Yunfeng Cao
- College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Meng Ding
- College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2
Bai J, Qin H, Lai S, Guo J, Guo Y. GLPanoDepth: Global-to-Local Panoramic Depth Estimation. IEEE Transactions on Image Processing 2024;33:2936-2949. [PMID: 38619939] [DOI: 10.1109/tip.2024.3386403]
Abstract
Depth estimation is a fundamental task in many vision applications. With the growing popularity of omnidirectional cameras, tackling this problem in spherical space has become a new trend. In this paper, we propose a learning-based method for predicting dense depth values of a scene from a monocular omnidirectional image. An omnidirectional image has a full field-of-view, providing a much more complete description of the scene than perspective images. However, the fully convolutional networks that most current solutions rely on fail to capture rich global contexts from the panorama. To address this issue, as well as the distortion introduced by equirectangular projection, we propose Cubemap Vision Transformers (CViT), a new transformer-based architecture that can model long-range dependencies and extract distortion-free global features from the panorama. We show that cubemap vision transformers have a global receptive field at every stage and can provide globally coherent predictions for spherical signals. As a general architecture, CViT removes restrictions that many other monocular panoramic depth estimation methods impose on the panorama. To preserve important local features, we further design a convolution-based branch in our pipeline (dubbed GLPanoDepth) and fuse global features from cubemap vision transformers at multiple scales. This global-to-local strategy allows us to fully exploit useful global and local features in the panorama, achieving state-of-the-art performance in panoramic depth estimation.
3
Jiang Z, Cai Z, Hui N, Li B. Multi-Level Optimization for Data-Driven Camera-LiDAR Calibration in Data Collection Vehicles. Sensors (Basel) 2023;23:8889. [PMID: 37960588] [PMCID: PMC10648985] [DOI: 10.3390/s23218889]
Abstract
Accurately calibrating camera-LiDAR systems is crucial for effective data fusion, particularly in data collection vehicles. Data-driven calibration methods have gained prominence over target-based methods due to their superior adaptability to diverse environments. However, current data-driven calibration methods are susceptible to suboptimal initialization parameters, which can significantly degrade the accuracy and efficiency of the calibration process. In response to these challenges, this paper proposes a general model for camera-LiDAR calibration that abstracts away the technical details of existing methods, introduces an improved objective function that effectively mitigates the issue of suboptimal parameter initialization, and develops a multi-level parameter optimization algorithm that balances accuracy and efficiency during iterative optimization. The experimental results demonstrate that the proposed method effectively mitigates the effects of suboptimal initial calibration parameters, achieving highly accurate and efficient calibration. The technique is versatile enough to accommodate various sensor configurations, making it a notable advancement in camera-LiDAR calibration, with potential applications in autonomous driving, robotics, and computer vision.
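Data-driven camera-LiDAR calibration typically scores a candidate extrinsic by how well projected LiDAR points line up with image structure. The sketch below shows a generic edge-alignment objective of this kind; it is not the paper's specific objective function, and the names, the intrinsic matrix K, and the precomputed edge map are all assumptions.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project Nx3 LiDAR points to pixels with a candidate 4x4 extrinsic
    T_cam_lidar and 3x3 intrinsics K."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]        # keep points in front of the camera
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]               # perspective divide

def alignment_cost(T_cam_lidar, points_lidar, edge_image, K):
    """Negative mean edge response under the projected points; lower cost
    means the LiDAR structure lines up better with image edges."""
    uv = np.round(project_lidar_to_image(points_lidar, T_cam_lidar, K)).astype(int)
    h, w = edge_image.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    if not ok.any():
        return 0.0
    return -float(edge_image[uv[ok, 1], uv[ok, 0]].mean())
```

A multi-level optimizer like the one proposed would then search over the 6-DOF extrinsic to minimize such a cost, coarse to fine.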
Affiliation(s)
- Zhongliang Cai
- School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China
4
Cen Y, Huang X, Liu J, Qin Y, Wu X, Ye S, Du S, Liao W. Application of three-dimensional reconstruction technology in dentistry: a narrative review. BMC Oral Health 2023;23:630. [PMID: 37667286] [PMCID: PMC10476426] [DOI: 10.1186/s12903-023-03142-4]
Abstract
BACKGROUND Three-dimensional (3D) reconstruction technology is a method of transforming real objects into mathematical models consistent with computer logic expressions. It has been widely used in dentistry, but the lack of review and summary has led to confusion and misinterpretation of information. The purpose of this review is to provide the first comprehensive link between, and scientific analysis of, 3D reconstruction technology and dentistry, bridging the information gap between these two disciplines. METHODS The IEEE Xplore and PubMed databases were searched rigorously based on specific inclusion and exclusion criteria, supplemented by Google Scholar as a complementary tool, to retrieve all literature up to February 2023. We conducted a narrative review focusing on the empirical findings of the application of 3D reconstruction technology in dentistry. RESULTS We classify the technologies applied in dentistry according to their principles, summarize the characteristics of each category, and describe the application scenarios determined by those characteristics. In addition, we indicate development prospects and promising research directions in the field of dentistry, from individual techniques to the overall discipline of 3D reconstruction technology. CONCLUSIONS Researchers and clinicians should choose 3D reconstruction technology according to their objectives. The main trend in the future development of 3D reconstruction technology is the joint application of multiple techniques.
Affiliation(s)
- Yueyan Cen
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd Section of Ren Min Nan Rd., Chengdu, Sichuan 610041, China
- Xinyue Huang
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd Section of Ren Min Nan Rd., Chengdu, Sichuan 610041, China
- Jialing Liu
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd Section of Ren Min Nan Rd., Chengdu, Sichuan 610041, China
- Yichun Qin
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd Section of Ren Min Nan Rd., Chengdu, Sichuan 610041, China
- Xinrui Wu
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd Section of Ren Min Nan Rd., Chengdu, Sichuan 610041, China
- Shiyang Ye
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd Section of Ren Min Nan Rd., Chengdu, Sichuan 610041, China
- Shufang Du
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd Section of Ren Min Nan Rd., Chengdu, Sichuan 610041, China
- Wen Liao
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd Section of Ren Min Nan Rd., Chengdu, Sichuan 610041, China
5
Gonzalez-Romo NI, Hanalioglu S, Mignucci-Jiménez G, Abramov I, Xu Y, Preul MC. Anatomic Depth Estimation and 3-Dimensional Reconstruction of Microsurgical Anatomy Using Monoscopic High-Definition Photogrammetry and Machine Learning. Oper Neurosurg (Hagerstown) 2023;24:432-444. [PMID: 36701667] [DOI: 10.1227/ons.0000000000000544]
Abstract
BACKGROUND Immersive anatomic environments offer an alternative when anatomic laboratory access is limited, but current three-dimensional (3D) renderings cannot simulate the anatomic detail and surgical perspectives needed for microsurgical education. OBJECTIVE To perform a proof-of-concept study of a novel photogrammetry 3D reconstruction technique, converting high-definition (monoscopic) microsurgical images into a navigable, interactive, immersive anatomy simulation. METHODS Images were acquired from cadaveric dissections and from an open-access comprehensive online microsurgical anatomic image database. A pretrained neural network capable of depth estimation from a single image was used to create depth maps (pixelated images containing distance information that could be used for spatial reprojection and 3D rendering). The virtual reality (VR) experience was assessed using a VR headset, and augmented reality was assessed using a quick response (QR) code-based application and a tablet camera. RESULTS Significant correlation was found between processed image depth estimations and neuronavigation-defined coordinates at different levels of magnification. Immersive anatomic models were created from dissection images captured in the authors' laboratory and from images retrieved from the Rhoton Collection. Interactive visualization and magnification allowed multiple perspectives for an enhanced experience in VR. The QR code offered a convenient method for importing anatomic models into the real world for rehearsal and for comparing other anatomic preparations side by side. CONCLUSION This proof-of-concept study validated the use of machine learning to render 3D reconstructions from two-dimensional microsurgical images through depth estimation. This spatial information can be used to develop convenient, realistic, and immersive anatomy image models.
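A depth map of this kind (one distance value per pixel) can be spatially reprojected into a 3D point cloud with a standard pinhole back-projection, sketched below under the assumption that the camera intrinsics are known; the paper's pipeline details may differ.

```python
import numpy as np

def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map into an (H*W)x3 point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx      # along the pixel ray, scaled by depth
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```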
Affiliation(s)
- Nicolas I Gonzalez-Romo
- Department of Neurosurgery, The Loyal and Edith Davis Neurosurgical Research Laboratory, Barrow Neurological Institute, St. Joseph's Hospital and Medical Center, Phoenix, AZ, USA
6
Romero-Lugo A, Magadan-Salazar A, Fuentes-Pacheco J, Pinto-Elías R. A Comparison of Deep Neural Networks for Monocular Depth Map Estimation in Natural Environments Flying at Low Altitude. Sensors (Basel) 2022;22:9830. [PMID: 36560196] [PMCID: PMC9785825] [DOI: 10.3390/s22249830]
Abstract
The use of Unmanned Aerial Vehicles (UAVs) in natural and complex environments has been increasing because they are appropriate and affordable solutions for supporting tasks such as rescue, forestry, and agriculture by collecting and analyzing high-resolution monocular images. Autonomous navigation at low altitude is an important area of research, as it would allow monitoring of parts of a crop occluded by foliage or by other plants. This task is difficult due to the large number of obstacles that might be encountered in the drone's path. The generation of high-quality depth maps is one alternative for providing real-time obstacle detection and collision avoidance for autonomous UAVs. In this paper, we present a comparative analysis of four supervised-learning deep neural networks, plus a combination of two of them, for monocular depth map estimation on images captured at low altitude in simulated natural environments. Our results show that the Boosting Monocular network performs best in terms of depth map accuracy because of its capability to process the same image at different scales, avoiding the loss of fine details.
Affiliation(s)
- Jorge Fuentes-Pacheco
- CONACyT-Centro de Investigación en Ciencias, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos, Cuernavaca 62209, Morelos, Mexico
- Raúl Pinto-Elías
- Tecnológico Nacional de México, CENIDET, Cuernavaca 62490, Morelos, Mexico
7
Davydov Y, Chen WH, Lin YC. Supervised Object-Specific Distance Estimation from Monocular Images for Autonomous Driving. Sensors (Basel) 2022;22:8846. [PMID: 36433443] [PMCID: PMC9693490] [DOI: 10.3390/s22228846]
Abstract
Accurate distance estimation is a requirement for advanced driver assistance systems (ADAS) to provide safety-related functions such as adaptive cruise control and collision avoidance. Radar and lidar can provide distance information; however, they are either expensive or deliver poor object-level information compared to image sensors. In this study, we propose a lightweight convolutional deep learning model that extracts object-specific distance information from monocular images. We explore a variety of training configurations and five structural settings of the model, and we conduct tests on the KITTI dataset covering seven road agents: person, bicycle, car, motorcycle, bus, train, and truck. In all experiments, a comparison with the Monodepth2 model is carried out. Experimental results show that the proposed model outperforms Monodepth2 by 15% in terms of average weighted mean absolute error (MAE).
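For reference, one plausible reading of the reported metric is a per-class mean absolute error averaged with class-frequency weights; the paper's exact weighting is not specified here, so the sketch below is an assumption.

```python
import numpy as np

def weighted_mae(errors_by_class, counts_by_class):
    """Average per-class MAEs, weighted by the number of samples per class."""
    maes = np.array([np.mean(np.abs(e)) for e in errors_by_class])
    weights = np.asarray(counts_by_class, dtype=float)
    return float(np.sum(maes * weights) / np.sum(weights))

# Example: two classes with different sample counts
print(weighted_mae([[1.0, -2.0], [0.5]], [2, 1]))  # (1.5*2 + 0.5*1) / 3 = 1.1666...
```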
Affiliation(s)
- Yury Davydov
- Graduate Institute of Automation Technology, National Taipei University of Technology, Taipei 10608, Taiwan
- Wen-Hui Chen
- Graduate Institute of Automation Technology, National Taipei University of Technology, Taipei 10608, Taiwan
- Yu-Chen Lin
- Department of Automatic Control Engineering, Feng Chia University, Taichung 40724, Taiwan
8
Masoumian A, Rashwan HA, Cristiano J, Asif MS, Puig D. Monocular Depth Estimation Using Deep Learning: A Review. Sensors (Basel) 2022;22:5353. [PMID: 35891033] [PMCID: PMC9325018] [DOI: 10.3390/s22145353]
Abstract
In recent decades, significant advances in robotics engineering and autonomous vehicles have increased the demand for precise depth measurements. Depth estimation (DE) is a classical task in computer vision that can be tackled by numerous procedures, and it is vital in disparate applications such as augmented reality and target tracking. Conventional monocular DE (MDE) procedures rely on depth cues for depth prediction. Various deep learning techniques have demonstrated their potential for addressing this traditionally ill-posed problem. The principal purpose of this paper is to present a state-of-the-art review of current developments in MDE based on deep learning techniques. To this end, the paper highlights the critical points of state-of-the-art work on MDE from several aspects, including input data shapes and training manners such as supervised, semi-supervised, and unsupervised learning approaches, in combination with different datasets and evaluation indicators. Finally, limitations regarding the accuracy of DL-based MDE models, computational time requirements, real-time inference, transferability, input image shape and domain adaptation, and generalization are discussed to open new directions for future research.
Affiliation(s)
- Armin Masoumian
- Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain
- Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521, USA
- Hatem A. Rashwan
- Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain
- Julián Cristiano
- Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain
- M. Salman Asif
- Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521, USA
- Domenec Puig
- Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain
9
Application Research of Bridge Damage Detection Based on the Improved Lightweight Convolutional Neural Network Model. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12126225]
Abstract
Ensuring the safety and rational use of bridge traffic lines requires reliable damage detection, but existing bridge structural damage detection models extract features imperfectly and struggle to meet the practical requirements of detection equipment. Based on the YOLO (You Only Look Once) algorithm, this paper proposes a lightweight target detection algorithm with enhanced feature extraction for bridge structural damage. The BiFPN (Bidirectional Feature Pyramid Network) structure is used for multi-scale feature fusion, which strengthens the extraction of damage features, and EFL (Equalized Focal Loss) is used to optimize the handling of sample imbalance, which improves the accuracy of damage detection. The model was evaluated on the constructed BDD (Bridge Damage Dataset). Compared with the YOLOv3-tiny, YOLOv5S, and B-YOLOv5S models, the mAP@.5 of the BE-YOLOv5S model increased by 45.1%, 2%, and 1.6%, respectively. The analysis and comparison of the experimental results show that the proposed BE-YOLOv5S network performs better and more reliably in detecting bridge structural damage and can meet the real-time and flexibility requirements of bridge damage detection engineering.
10
Hu H, Zhu M, Li M, Chan KL. Deep Learning-Based Monocular 3D Object Detection with Refinement of Depth Information. Sensors (Basel) 2022;22:2576. [PMID: 35408191] [PMCID: PMC9003335] [DOI: 10.3390/s22072576]
Abstract
Research on monocular 3D target detection based on pseudo-LiDAR data has recently made progress, but pseudo-LiDAR methods remain less robust than LiDAR-based algorithms. After in-depth experiments, we found that the main limitations are the inaccuracy of the target position and the uncertainty in the depth distribution of the foreground target, both of which arise from inaccurate depth estimation. To deal with these problems, we propose two innovative solutions. The first is a novel method based on joint image segmentation and geometric constraints, used to predict the target depth and provide a confidence measure for the depth prediction. The predicted target depth is fused with the overall depth of the scene to yield the optimal target position. The second uses the target scale, normalized with a Gaussian function, as prior information; the uncertainty of the depth distribution, which can be visualized as long-tail noise, is thereby reduced. With the refined depth information, we convert the optimized depth map into a point cloud representation, called a pseudo-LiDAR point cloud, and feed it to a LiDAR-based algorithm to detect the 3D target. We conducted extensive experiments on the challenging KITTI dataset. The results demonstrate that our proposed framework outperforms various state-of-the-art methods by more than 12.37% and 5.34% on the easy and hard settings of the KITTI validation subset, respectively. On the KITTI test set, our framework also outperformed state-of-the-art methods by 5.1% and 1.76% on the easy and hard settings, respectively.
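As a toy illustration of the Gaussian scale-prior idea (not the paper's exact formulation), depth values in a target region that fall far from the expected target depth can be down-weighted with a Gaussian window and the remaining long-tail outliers pulled back toward the mode:

```python
import numpy as np

def suppress_long_tail(depth_roi, mu, sigma, floor=0.05):
    """Replace depth values whose Gaussian weight around the expected
    target depth mu falls below `floor` (long-tail outliers) with mu
    itself. All names and the threshold are illustrative."""
    w = np.exp(-0.5 * ((depth_roi - mu) / sigma) ** 2)
    return np.where(w > floor, depth_roi, mu)
```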
Affiliation(s)
- Henan Hu
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
- Ming Zhu
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Muyu Li
- Centre for Intelligent Multidimensional Data Analysis Limited, Hong Kong, China
- Kwok-Leung Chan
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
11
Wade L, Needham L, McGuigan P, Bilzon J. Applications and limitations of current markerless motion capture methods for clinical gait biomechanics. PeerJ 2022;10:e12995. [PMID: 35237469] [PMCID: PMC8884063] [DOI: 10.7717/peerj.12995]
Abstract
BACKGROUND Markerless motion capture has the potential to perform movement analysis with reduced data collection and processing time compared to marker-based methods. This technology is now starting to be applied in clinical and rehabilitation settings, so it is crucial that users of these systems understand both their potential and their limitations. This literature review aims to provide a comprehensive overview of the current state of markerless motion capture for both single-camera and multi-camera systems. Additionally, it explores how markerless technology is being used in clinical and rehabilitation settings and examines the future challenges and directions markerless research must explore to facilitate full integration of this technology within clinical biomechanics. METHODOLOGY A scoping review is needed to examine this emerging broad body of literature and determine where gaps in knowledge exist; this is key to developing motion capture methods that are cost-effective and practically relevant to clinicians, coaches, and researchers around the world. Literature searches were performed to examine studies that report the accuracy of markerless motion capture methods, explore current practical applications in clinical biomechanics, and identify gaps in knowledge relevant to future developments in this area. RESULTS Markerless methods increase motion capture data versatility, enabling datasets to be re-analyzed using updated pose estimation algorithms, and may even allow clinicians to collect data while patients are wearing normal clothing. While markerless temporospatial measures generally appear to be equivalent to marker-based motion capture, joint center locations and joint angles are not yet sufficiently accurate for clinical applications. Pose estimation algorithms are approaching the error rates of marker-based motion capture; however, without comparison to a gold standard, such as bi-planar videoradiography, the true accuracy of markerless systems remains unknown. CONCLUSIONS Current open-source pose estimation algorithms were never designed for biomechanical applications; therefore, the datasets on which they were trained are inconsistently and inaccurately labelled. Improvements to the labelling of open-source training data, as well as assessment of markerless accuracy against gold standard methods, will be vital next steps in the development of this technology.
Affiliation(s)
- Logan Wade
- Department for Health, University of Bath, Bath, United Kingdom; Centre for Analysis of Motion, Entertainment Research and Applications, University of Bath, Bath, United Kingdom
- Laurie Needham
- Department for Health, University of Bath, Bath, United Kingdom; Centre for Analysis of Motion, Entertainment Research and Applications, University of Bath, Bath, United Kingdom
- Polly McGuigan
- Department for Health, University of Bath, Bath, United Kingdom; Centre for Analysis of Motion, Entertainment Research and Applications, University of Bath, Bath, United Kingdom
- James Bilzon
- Department for Health, University of Bath, Bath, United Kingdom; Centre for Analysis of Motion, Entertainment Research and Applications, University of Bath, Bath, United Kingdom; Centre for Sport Exercise and Osteoarthritis Research Versus Arthritis, University of Bath, Bath, United Kingdom
12
Abstract
Vision-based three-dimensional (3D) shape measurement techniques have been widely applied over the past decades in numerous applications due to their high precision, high efficiency, and non-contact operation. Recently, great advances in computing devices and artificial intelligence have accelerated the development of vision-based measurement technology. This paper focuses on state-of-the-art vision-based methods that can perform 3D shape measurement with high precision and high resolution. Specifically, the basic principles and typical techniques of triangulation-based measurement methods, along with their advantages and limitations, are elaborated, and the learning-based techniques used for 3D vision measurement are enumerated. Finally, the advances in, and prospects for, further improvement of vision-based 3D shape measurement techniques are discussed.
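As a concrete instance of the triangulation principle underlying these methods, depth from a rectified stereo pair follows z = f·B/d, with focal length f in pixels, baseline B in metres, and disparity d in pixels:

```python
def stereo_depth(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Triangulated depth for a rectified stereo pair: z = f * B / d."""
    return f_px * baseline_m / disparity_px

# Example: f = 700 px, B = 0.12 m, d = 14 px  ->  z = 6.0 m
print(stereo_depth(700.0, 0.12, 14.0))
```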
13
Ni NJ, Ma FY, Wu XM, Liu X, Zhang HY, Yu YF, Guo MC, Zhu SY. Novel application of multispectral refraction topography in the observation of myopic control effect by orthokeratology lens in adolescents. World J Clin Cases 2021;9:8985-8998. [PMID: 34786382] [PMCID: PMC8567508] [DOI: 10.12998/wjcc.v9.i30.8985]
Abstract
BACKGROUND Myopia, one of the most common ocular diseases, often develops during adolescence. Beyond the harm it causes directly, it may also lead to serious complications, so the prevention and control of myopia are attracting increasing attention. Previous research revealed that single-focal glasses and orthokeratology (OK) lenses play an important part in slowing the progression of myopia and preventing high myopia.
AIM To compare the clinical effects of OK lenses and frame glasses against the increase of diopter in adolescent myopia, and to further explore the mechanism of the OK lens.
METHODS Changes in diopter and axial length were collected from 70 adolescent myopia patients (124 eyes) who wore OK lenses for 1 year (group A) and 59 adolescent myopia patients (113 eyes) who wore frame glasses (group B). The refractive states of their retinas were inspected through multispectral refraction topography. The obtained hyperopic defocus was analyzed to explore how OK lenses slow the increase of myopic diopter by delaying the growth of axial length and reducing near hyperopic defocus.
RESULTS Teenagers in groups A and B were divided into low myopia (0 D to -3.00 D) and moderate myopia (-3.25 D to -6.00 D) subgroups, with no statistically significant differences in gender or age. After 1 year of treatment, the increases in diopter and axial length and the changes in retinal hyperopic defocus in group A were significantly smaller than those in group B. According to the multiple linear analysis, retinal defocus in the upper, lower, nasal, and temporal directions contributed almost equally to the total defocus. The amount of peripheral retinal defocus (15°-53°) in group A was significantly lower than that in group B.
CONCLUSION Multispectral refraction topography is progressive and instructive in the clinical prevention and control of myopia.
Affiliation(s)
- Ning-Jun Ni
- Department of Technology, Zigong Yuan-Xin Energy Saving Technology Co. Ltd, Zigong 643030, Sichuan Province, China
- Fei-Yan Ma
- Department of Ophthalmology, The Second Hospital of Hebei Medical University, Shijiazhuang 050000, Hebei Province, China
- Xiao-Mei Wu
- Department of Ophthalmology, The First People’s Hospital of Zigong, Zigong 643000, Sichuan Province, China
- Xiao Liu
- Department of Ophthalmology, The First People’s Hospital of Zigong, Zigong 643000, Sichuan Province, China
- Hong-Yan Zhang
- Department of Ophthalmology, The First People’s Hospital of Zigong, Zigong 643000, Sichuan Province, China
- Yi-Fei Yu
- Department of Optometry, North Sichuan Medical College, Nanchong 637000, Sichuan Province, China
- Mei-Chen Guo
- Department of Ophthalmology, The First People’s Hospital of Zigong, Zigong 643000, Sichuan Province, China
- Sheng-Yong Zhu
- Department of Ophthalmology, The First People’s Hospital of Zigong, Zigong 643000, Sichuan Province, China
14
MADNet 2.0: Pixel-Scale Topography Retrieval from Single-View Orbital Imagery of Mars Using Deep Learning. Remote Sensing 2021. [DOI: 10.3390/rs13214220]
Abstract
The High-Resolution Imaging Science Experiment (HiRISE) onboard the Mars Reconnaissance Orbiter provides remotely sensed imagery of the surface of Mars at the highest available spatial resolution of 25–50 cm/pixel. However, because the spatial resolution is so high, the total area covered by HiRISE targeted stereo acquisitions is very limited, resulting in a lack of high-resolution digital terrain models (DTMs) better than 1 m/pixel. Such high-resolution DTMs have always been considered desirable by the international community of planetary scientists for carrying out fine-scale geological analysis of the Martian surface. Recently, deep learning-based techniques that retrieve DTMs from single optical orbital imagery have been developed and applied to single HiRISE observational data. In this paper, we improve upon a previously developed single-image DTM estimation system called MADNet (1.0). We propose optimisations, collectively called MADNet 2.0, based on a supervised image-to-height estimation network, multi-scale DTM reconstruction, and 3D co-alignment processes. In particular, we employ optimised single-scale inference and multi-scale reconstruction (in MADNet 2.0), instead of multi-scale inference and single-scale reconstruction (in MADNet 1.0), to produce more accurate large-scale topographic retrieval with boosted fine-scale resolution. We demonstrate the improvements of the MADNet 2.0 DTMs produced using HiRISE images, in comparison to the MADNet 1.0 DTMs and the published Planetary Data System (PDS) DTMs, over the ExoMars Rosalind Franklin rover's landing site at Oxia Planum. Qualitative and quantitative assessments suggest the proposed MADNet 2.0 system is capable of pixel-scale DTM retrieval at the same spatial resolution (25 cm/pixel) as the input HiRISE images.
15
SFA-MDEN: Semantic-Feature-Aided Monocular Depth Estimation Network Using Dual Branches. Sensors (Basel) 2021;21:5476. [PMID: 34450917] [PMCID: PMC8398641] [DOI: 10.3390/s21165476]
Abstract
Monocular depth estimation based on unsupervised learning has attracted great attention due to the rising demand for lightweight monocular vision sensors. Inspired by multi-task learning, semantic information has been used to improve monocular depth estimation models. However, multi-task learning is still limited by the need for multiple types of annotations; as far as we know, there are scarcely any large public datasets that provide all the necessary information. Therefore, we propose a novel network architecture, the Semantic-Feature-Aided Monocular Depth Estimation Network (SFA-MDEN), which extracts multi-resolution depth features and semantic features that are merged and fed into the decoder, with the goal of predicting depth with the support of semantics. Instead of relating semantics and depth through loss functions, the fusion of semantic and depth feature maps is used to predict the monocular depth. Consequently, two accessible datasets with similar topics for depth estimation and semantic segmentation can satisfy SFA-MDEN's training requirements. We explored the performance of SFA-MDEN with experiments on different datasets, including KITTI, Make3D, and our own dataset, BHDE-v1. The experimental results demonstrate that SFA-MDEN achieves competitive accuracy and generalization capacity compared to state-of-the-art methods.
16
Rapid Single Image-Based DTM Estimation from ExoMars TGO CaSSIS Images Using Generative Adversarial U-Nets. Remote Sensing 2021. [DOI: 10.3390/rs13152877]
Abstract
The lack of adequate stereo coverage and, where stereo is available, the lengthy processing times, various artefacts, unsatisfactory quality, and the complexity of automating the selection of the best set of processing parameters have long been major barriers to large-area planetary 3D mapping. In this paper, we propose a deep learning-based solution, called MADNet (Multi-scale generative Adversarial u-net with Dense convolutional and up-projection blocks), that avoids or resolves all of these issues. We demonstrate the wide applicability of this technique with the ExoMars Trace Gas Orbiter Colour and Stereo Surface Imaging System (CaSSIS) 4.6 m/pixel images of Mars. Only a single input image and a coarse global 3D reference are required, without knowledge of any camera models or imaging parameters, to produce high-quality, high-resolution full-strip Digital Terrain Models (DTMs) in a few seconds. We discuss the technical details of the MADNet system and provide detailed comparisons and assessments of the results. The resultant MADNet 8 m/pixel CaSSIS DTMs are qualitatively very similar to the 1 m/pixel HiRISE DTMs. They show excellent agreement with nested Mars Reconnaissance Orbiter Context Camera (CTX), Mars Express High-Resolution Stereo Camera (HRSC), and Mars Orbiter Laser Altimeter (MOLA) DTMs at large scale, and fairly good correlation with High-Resolution Imaging Science Experiment (HiRISE) DTMs for fine-scale details. In addition, we show how MADNet outperforms traditional photogrammetric methods, in both speed and quality, on other datasets such as HRSC, CTX, and HiRISE, without any parameter tuning or re-training of the model. We demonstrate the results for Oxia Planum (the landing site of the European Space Agency's Rosalind Franklin ExoMars rover 2023) and a couple of sites of high scientific interest.
17
Khan F, Hussain S, Basak S, Lemley J, Corcoran P. An efficient encoder-decoder model for portrait depth estimation from single images trained on pixel-accurate synthetic data. Neural Networks 2021;142:479-491. [PMID: 34280691] [DOI: 10.1016/j.neunet.2021.07.007]
Abstract
Depth estimation from a single image frame is a fundamental challenge in computer vision, with many applications such as augmented reality, action recognition, image understanding, and autonomous driving. Large and diverse training sets are required for accurate depth estimation from a single image frame. Because dense ground-truth depth is difficult to obtain, a new 3D pipeline of 100 synthetic virtual human models is presented to generate multiple 2D facial images and corresponding ground-truth depth data, allowing complete control over image variation. To validate the synthetic facial depth data, we evaluate state-of-the-art single-image depth estimation algorithms on the generated synthetic dataset. Furthermore, an improved encoder-decoder neural network is presented. This network is computationally efficient and outperforms the current state of the art when tested across four public datasets. Our training methodology relies on synthetic data samples, which provide a more reliable ground truth for depth estimation, and a combination of appropriate loss functions leads to better performance than current state-of-the-art networks. Our approach clearly outperforms competing methods across the different test datasets, setting a new state of the art for facial depth estimation from synthetic data.
Affiliation(s)
- Faisal Khan
- Department of Electronic Engineering, College of Science and Engineering, National University of Ireland Galway, Galway H91 TK33, Ireland
- Shahid Hussain
- Data Science Institute, National University of Ireland Galway, Galway H91 TK33, Ireland
- Shubhajit Basak
- School of Computer Science, National University of Ireland Galway, Galway H91 TK33, Ireland
- Joseph Lemley
- Xperi Corporation, Block 5 Parkmore East Business Park, Galway H91 V0TX, Ireland
- Peter Corcoran
- Department of Electronic Engineering, College of Science and Engineering, National University of Ireland Galway, Galway H91 TK33, Ireland
18
Hwang SJ, Park SJ, Kim GM, Baek JH. Unsupervised Monocular Depth Estimation for Colonoscope System Using Feedback Network. Sensors (Basel) 2021;21:2691. [PMID: 33920357] [PMCID: PMC8069522] [DOI: 10.3390/s21082691]
Abstract
A colonoscopy is a medical examination used to check for disease or abnormalities in the large intestine. If necessary, polyps or adenomas can be removed through the scope during the procedure, which helps prevent colorectal cancer. However, the polyp detection rate differs depending on the condition and skill level of the endoscopist, and some endoscopists may miss an adenoma as often as 90% of the time. Artificial intelligence and robotic technologies for colonoscopy are being studied to compensate for these problems. In this study, we propose self-supervised monocular depth estimation using spatiotemporal consistency in the colon environment. Our contributions are a loss function for reconstruction errors between adjacent predicted depths and a depth feedback network that uses the predicted depth information of the previous frame to predict the depth of the next frame. We performed quantitative and qualitative evaluations of our approach, and the proposed FBNet (depth FeedBack Network) outperformed state-of-the-art results for unsupervised depth estimation on the UCL datasets.
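A generic form of the adjacent-frame consistency idea, an L1 reconstruction error between the depth predicted at frame t and the depth predicted at frame t+1 warped into frame t's view, can be written in PyTorch as follows; the paper's actual loss may differ in detail, and the warping is assumed to be computed elsewhere.

```python
import torch

def adjacent_depth_consistency(depth_t, depth_t1_warped, valid_mask=None):
    """L1 error between the depth at frame t and the depth of frame t+1
    warped into frame t's view (a sketch of spatiotemporal consistency)."""
    diff = torch.abs(depth_t - depth_t1_warped)
    if valid_mask is not None:
        diff = diff[valid_mask]   # ignore pixels with no valid correspondence
    return diff.mean()
```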
19
Liu P, Zhang Z, Meng Z, Gao N. Monocular Depth Estimation with Joint Attention Feature Distillation and Wavelet-Based Loss Function. Sensors (Basel) 2020;21:54. [PMID: 33374278] [PMCID: PMC7794707] [DOI: 10.3390/s21010054]
Abstract
Depth estimation is a crucial component in many 3D vision applications. Monocular depth estimation is gaining increasing interest due to its flexibility and extremely low system requirements, but its inherently ill-posed and ambiguous nature still leads to unsatisfactory estimation results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and a wavelet-based loss function to recover the depth information of a scene. Two improvements were achieved compared with previous methods. First, we combined feature distillation and joint attention mechanisms to boost the discriminative power of feature modulation. The network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates them with a joint attention operation. Second, we adopted a wavelet-based loss function for network training, which improves the loss function's effectiveness by capturing more structural detail. Experimental results on challenging indoor and outdoor benchmark datasets verified the proposed method's superiority over current state-of-the-art methods.
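The wavelet-based loss idea can be sketched with PyWavelets by comparing the detail coefficients of predicted and ground-truth depth maps; this NumPy version is illustrative only, since a differentiable wavelet transform would be needed inside an actual training loop.

```python
import numpy as np
import pywt

def wavelet_l1(pred, gt, wavelet="haar", level=2):
    """L1 distance between multi-level wavelet detail coefficients of two
    depth maps; the detail bands carry the structural (edge) information."""
    coeffs_p = pywt.wavedec2(pred, wavelet, level=level)
    coeffs_g = pywt.wavedec2(gt, wavelet, level=level)
    loss = 0.0
    for details_p, details_g in zip(coeffs_p[1:], coeffs_g[1:]):  # skip approximation band
        for band_p, band_g in zip(details_p, details_g):          # (cH, cV, cD) per level
            loss += np.mean(np.abs(band_p - band_g))
    return loss
```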
Affiliation(s)
- Peng Liu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300130, China
- School of Mechanical Engineering, Hebei University of Technology, Tianjin 300130, China
- Key Laboratory of Intelligent Data Information Processing and Control of Hebei Province, Tangshan University, Tangshan 063000, China
- Zonghua Zhang
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300130, China
- School of Mechanical Engineering, Hebei University of Technology, Tianjin 300130, China
- Corresponding author; Tel.: +86-1862-288-0015
- Zhaozong Meng
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300130, China
- School of Mechanical Engineering, Hebei University of Technology, Tianjin 300130, China
- Nan Gao
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300130, China
- School of Mechanical Engineering, Hebei University of Technology, Tianjin 300130, China
20
Yang J, Li S, Wang Z, Dong H, Wang J, Tang S. Using Deep Learning to Detect Defects in Manufacturing: A Comprehensive Survey and Current Challenges. Materials (Basel) 2020;13:5755. [PMID: 33339413] [PMCID: PMC7766692] [DOI: 10.3390/ma13245755]
Abstract
The detection of product defects is essential for quality control in manufacturing. This study surveys state-of-the-art deep-learning methods for defect detection. First, we classify the defects of products such as electronic components, pipes, welded parts, and textile materials into categories. Second, recent mainstream techniques and deep-learning methods for defect detection are reviewed, with their characteristics, strengths, and shortcomings described. Third, we summarize and analyze the application of ultrasonic testing, filtering, deep learning, machine vision, and other technologies used for defect detection, focusing on the methods used and the experimental results. To further understand the difficulties in the field, we investigate the functions and characteristics of existing defect detection equipment. The core ideas and code of studies related to high precision, accurate positioning, rapid detection, small objects, complex backgrounds, occluded-object detection, and object association are summarized. Lastly, we outline the achievements and limitations of existing methods, along with current research challenges, to help the defect detection research community set an agenda for future studies.
Affiliation(s)
- Jing Yang
- School of Mechanical Engineering, Guizhou University, Guiyang 550025, China
- Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
- Shaobo Li
- School of Mechanical Engineering, Guizhou University, Guiyang 550025, China
- Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
- Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China
- Corresponding author
- Zheng Wang
- School of Mechanical Engineering, Guizhou University, Guiyang 550025, China
- Hao Dong
- School of Mechanical Engineering, Guizhou University, Guiyang 550025, China
- Jun Wang
- School of Mechanical Engineering, Guizhou University, Guiyang 550025, China
- Shihao Tang
- Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China
21
Zhao Z, Zhu Y, Li Y, Qiu Z, Luo Y, Xie C, Zhang Z. Multi-Camera-Based Universal Measurement Method for 6-DOF of Rigid Bodies in World Coordinate System. Sensors (Basel) 2020;20:5547. [PMID: 32998291] [PMCID: PMC7583861] [DOI: 10.3390/s20195547]
Abstract
The measurement of the six degrees of freedom (6-DOF) of rigid bodies plays an important role in many industries, but it often requires professional instruments and software, or imposes limitations on the shape of the measured objects. In this paper, a multi-camera 6-DOF measurement method is proposed that needs only two or more ordinary cameras and is applicable to rigid bodies of most shapes. First, multi-camera calibration based on Zhang Zhengyou's calibration method is introduced; in addition to the intrinsic and extrinsic parameters of the cameras, the pose relationship between the camera coordinate system and the world coordinate system is also obtained. Second, the 6-DOF calculation model of the proposed method is derived step by step using matrix analysis. With the help of control points arranged on the rigid body, the 6-DOF of the rigid body can be calculated by the least-squares method. Finally, the Phantom 3D high-speed photogrammetry system (P3HPS), with an accuracy of 0.1 mm/m, was used to evaluate the method. The experimental results show that the average error of the rotational degree-of-freedom measurements is less than 1.1 deg, and the average error of the translational degree-of-freedom measurements is less than 0.007 m. In conclusion, the accuracy of the proposed method meets the requirements.
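With the control points' 3D coordinates known in both the body frame and the world frame, the least-squares rigid transform has a standard SVD-based (Kabsch) closed form; the sketch below shows that general solution, not necessarily the paper's exact computation.

```python
import numpy as np

def rigid_transform_lstsq(P, Q):
    """Least-squares rigid motion (R, t) mapping Nx3 control points P onto Q,
    via the SVD-based (Kabsch) solution: minimises sum ||R @ p + t - q||^2."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                         # proper rotation, det(R) = +1
    t = cQ - R @ cP
    return R, t
```

The rotation R can then be converted to the three rotational DOF (e.g., Euler angles), and t gives the three translational DOF.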
Affiliation(s)
- Zuoxi Zhao
- College of Engineering, South China Agricultural University, Guangzhou 510642, China
- Key Laboratory of Key Technology on Agricultural Machine and Equipment (South China Agricultural University), Ministry of Education, Guangzhou 510642, China
- Yuchang Zhu
- College of Engineering, South China Agricultural University, Guangzhou 510642, China
- Key Laboratory of Key Technology on Agricultural Machine and Equipment (South China Agricultural University), Ministry of Education, Guangzhou 510642, China
- Yuanhong Li
- College of Engineering, South China Agricultural University, Guangzhou 510642, China
- Key Laboratory of Key Technology on Agricultural Machine and Equipment (South China Agricultural University), Ministry of Education, Guangzhou 510642, China
- Zhi Qiu
- College of Engineering, South China Agricultural University, Guangzhou 510642, China
- Key Laboratory of Key Technology on Agricultural Machine and Equipment (South China Agricultural University), Ministry of Education, Guangzhou 510642, China
- Yangfan Luo
- College of Engineering, South China Agricultural University, Guangzhou 510642, China
- Key Laboratory of Key Technology on Agricultural Machine and Equipment (South China Agricultural University), Ministry of Education, Guangzhou 510642, China
- Chaoshi Xie
- College of Engineering, South China Agricultural University, Guangzhou 510642, China
- Key Laboratory of Key Technology on Agricultural Machine and Equipment (South China Agricultural University), Ministry of Education, Guangzhou 510642, China
- Zhuangzhuang Zhang
- College of Engineering, South China Agricultural University, Guangzhou 510642, China
- Key Laboratory of Key Technology on Agricultural Machine and Equipment (South China Agricultural University), Ministry of Education, Guangzhou 510642, China
22
Abstract
Recently, deep learning frameworks have been deployed in visual odometry systems and have achieved results comparable to traditional feature-matching-based systems. However, most deep learning-based frameworks inevitably need labeled data as ground truth for training. On the other hand, monocular odometry systems are incapable of recovering absolute scale, so external or prior information has to be introduced for scale recovery. To solve these problems, we present a novel deep learning-based RGB-D visual odometry system. Our two main contributions are: (i) during network training and pose estimation, the depth images are fed into the network together with the RGB images to form a dual-stream structure, and a dual-stream deep neural network is proposed; and (ii) the system adopts an unsupervised end-to-end training method, so the labor-intensive data labeling task is not required. We tested our system on the KITTI dataset, and the results show that the proposed RGB-D visual odometry (VO) system has clear advantages over other state-of-the-art systems in terms of both translation and rotation errors.