1
Gómez J, Aycard O, Baber J. Efficient Detection and Tracking of Human Using 3D LiDAR Sensor. Sensors (Basel) 2023;23:4720. doi: 10.3390/s23104720. PMID: 37430633.
Abstract
Light Detection and Ranging (LiDAR) technology is becoming a main tool in applications such as autonomous driving and human-robot collaboration. Point-cloud-based 3D object detection is gaining acceptance in industry and everyday life because it remains effective in challenging environments where cameras struggle. In this paper, we present a modular approach to detect, track and classify persons using a 3D LiDAR sensor. It combines multiple principles: a robust implementation of object segmentation, a classifier based on local geometric descriptors, and a tracking solution. Moreover, we achieve real-time performance on a low-performance machine by reducing the number of points to be processed: regions of interest are obtained and predicted via movement detection and motion prediction, without any prior knowledge of the environment. Furthermore, our prototype detects and tracks persons consistently even in challenging cases caused by limitations of the sensor's field of view or by extreme pose changes such as crouching, jumping, and stretching. Lastly, the proposed solution is tested and evaluated on multiple real 3D LiDAR recordings taken in an indoor environment. The results show great potential, with particularly high confidence in positive classifications of the human body compared to state-of-the-art approaches.
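As a rough illustration of the point-reduction step described above, the sketch below keeps only the LiDAR points that fall inside regions of interest around positions predicted by a motion model. It is not the authors' code; the function name, the one-metre radius and the use of a KD-tree are assumptions.

```python
# Hedged sketch: restrict the current LiDAR frame to regions of interest (ROIs)
# around predicted person positions before segmentation and classification.
import numpy as np
from scipy.spatial import cKDTree

def roi_filter(points, predicted_centers, radius=1.0):
    """points: (N, 3) LiDAR points; predicted_centers: (M, 3) predicted positions."""
    if len(predicted_centers) == 0:
        return points                      # no prior tracks: process the full frame
    tree = cKDTree(points[:, :2])          # ROI test on the ground plane (x, y)
    keep = set()
    for c in predicted_centers:
        keep.update(tree.query_ball_point(c[:2], r=radius))
    return points[sorted(keep)]
```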
Affiliation(s)
- Juan Gómez
- Laboratoire d'Informatique (LIG), University of Grenoble Alpes, 38000 Grenoble, France
- Olivier Aycard
- Laboratoire d'Informatique (LIG), University of Grenoble Alpes, 38000 Grenoble, France
- Junaid Baber
- Laboratoire d'Informatique (LIG), University of Grenoble Alpes, 38000 Grenoble, France
2
Abstract
Pedestrian detection and tracking are necessary for autonomous vehicles and traffic management. This paper presents a novel solution to pedestrian detection and tracking for urban scenarios based on Doppler LiDAR, which records both the position and velocity of the targets. The workflow consists of two stages. In the detection stage, the input point cloud is first segmented into clusters, frame by frame. A subsequent multiple-pedestrian separation process is introduced to further segment pedestrians close to each other. While a simple speed classifier is capable of extracting most of the moving pedestrians, a supervised machine-learning classifier is adopted to detect pedestrians with insignificant radial velocity. In the tracking stage, the pedestrian’s state is estimated by a Kalman filter, which uses the speed information to estimate the pedestrian’s dynamics. Based on the similarity between the predicted and detected states of pedestrians, a greedy algorithm is adopted to associate the trajectories with the detection results. The presented detection and tracking methods are tested on two data sets collected in San Francisco, California by a mobile Doppler LiDAR system. The results of the pedestrian detection demonstrate that the proposed two-step classifier improves the detection performance, particularly for detecting pedestrians far from the sensor. For both data sets, the use of Doppler speed information improves the F1-score and the recall by 15% to 20%. The subsequent tracking with the Kalman filter achieves a multiple object tracking accuracy (MOTA) of between 55.3% and 83.9%, where the contribution of the speed measurements is secondary and insignificant.
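As a rough sketch of the tracking stage, the snippet below shows a constant-velocity Kalman filter whose measurement vector contains both position and a planar velocity estimate, on the assumption that the Doppler speed has been resolved into (vx, vy); all matrix values are illustrative and not taken from the paper.

```python
# Minimal constant-velocity Kalman filter sketch (state: x, y, vx, vy).
import numpy as np

DT = 0.1                                            # assumed frame interval [s]
F = np.block([[np.eye(2), DT * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])       # state transition
H = np.eye(4)                                       # measure position and velocity
Q = 0.05 * np.eye(4)                                # process noise (assumed)
R = np.diag([0.1, 0.1, 0.3, 0.3])                   # measurement noise (assumed)

def kf_predict(x, P):
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

The predicted states would then be matched greedily to detections, most similar pair first, before the update step.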
3
Wen J, Guillen L, Abe T, Suganuma T. A Hierarchy-Based System for Recognizing Customer Activity in Retail Environments. Sensors (Basel) 2021;21:4712. doi: 10.3390/s21144712. PMID: 34300452; PMCID: PMC8309534.
Abstract
Customer activity (CA) in retail environments, which ranges over various shopper situations in store spaces, provides valuable information for store management and marketing planning. Several systems have been proposed for customer activity recognition (CAR) from in-store camera videos, and most of them use machine-learning-based end-to-end (E2E) CAR models because of their remarkable performance. Usually, such E2E models are trained for specific target conditions (i.e., particular CA types in specific store spaces). Accordingly, existing systems do not adapt well to changes in target conditions, because they require complete retraining of their specialized E2E models and the concurrent use of additional E2E models for new target conditions. This paper proposes a novel CAR system based on a hierarchy that organizes CA types into different levels of abstraction, from lowest to highest. The proposed system consists of multiple CAR models, each of which performs the CAR tasks of one level of the hierarchy on the output of the level below, so that CAR for a video is carried out through the models level by level. Since these models are separated, the system can deal efficiently with changes in target conditions by modifying individual models. Experimental results show the effectiveness of the proposed system in adapting to different target conditions.
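A conceptual sketch of the level-by-level design is shown below. The `HierarchicalCAR` class and its methods are hypothetical; they only illustrate how keeping the levels separate lets one model be retrained or replaced without touching the others.

```python
# Hypothetical sketch of level-by-level customer activity recognition (CAR).
from typing import Callable, List

class HierarchicalCAR:
    def __init__(self, levels: List[Callable]):
        # levels[0] maps video features to the lowest-level activities;
        # levels[i] maps level-(i-1) output to level-i activities.
        self.levels = levels

    def recognise(self, video_features):
        out = video_features
        for level in self.levels:
            out = level(out)               # abstraction increases level by level
        return out

    def replace_level(self, index: int, new_model: Callable) -> None:
        self.levels[index] = new_model     # adapt to a new target condition locally
```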
Affiliation(s)
- Jiahao Wen
- Graduate School of Information Sciences, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan
- Luis Guillen
- Research Institute of Electrical Communication, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan
- Toru Abe
- Cyberscience Center, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan
- Takuo Suganuma
- Cyberscience Center, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan
4
Korkalo O, Takala T. Measurement Noise Model for Depth Camera-Based People Tracking. Sensors (Basel) 2021;21:4488. doi: 10.3390/s21134488. PMID: 34209168; PMCID: PMC8271657.
Abstract
Depth cameras are widely used in people tracking applications. They typically suffer from significant range measurement noise, which causes uncertainty in the detections of people. The data fusion, state estimation and data association tasks require that the measurement uncertainty be modelled, especially in multi-sensor systems. Measurement noise models for different kinds of depth sensors have been proposed; however, the existing approaches require manual calibration procedures that can be impractical to conduct in real-life scenarios. In this paper, we present a new measurement noise model for depth camera-based people tracking. In our tracking solution, we utilise the so-called plan-view approach, where the 3D measurements are transformed to the floor plane and the tracking problem is solved in 2D. We model the measurement noise directly in the plan-view domain, combining the errors that originate from the imaging process and from the geometric transformations of the 3D data. We also present a method for defining the noise models directly from the observations. Together with our depth sensor network self-calibration routine, the approach allows fast and practical deployment of depth-based people tracking systems.
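The plan-view idea can be sketched as follows; this is an assumed implementation rather than the authors' code. Points are transformed to the floor plane with a homogeneous transform, and a per-detection measurement covariance is estimated empirically from the projected observations.

```python
# Hedged sketch: plan-view projection and an empirical plan-view noise model.
import numpy as np

def to_plan_view(points_cam, T_cam_to_floor):
    """points_cam: (N, 3) camera-frame points; T_cam_to_floor: (4, 4) transform."""
    pts_h = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    pts_floor = (T_cam_to_floor @ pts_h.T).T[:, :3]
    return pts_floor[:, :2]                    # keep the floor-plane (x, y) part

def plan_view_noise(plan_points):
    """Empirical 2D mean and covariance of one person's projected points,
    usable as the measurement noise R of a plan-view tracker."""
    return plan_points.mean(axis=0), np.cov(plan_points, rowvar=False)
```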
Affiliation(s)
- Otto Korkalo
- VTT Technical Research Centre of Finland Ltd., P.O. Box 1000, FI-02044 Espoo, Finland
- Tapio Takala
- Department of Computer Science, Aalto University, FI-00076 Espoo, Finland
5
Asymmetric Adaptive Fusion in a Two-Stream Network for RGB-D Human Detection. Sensors (Basel) 2021;21:916. doi: 10.3390/s21030916. PMID: 33572928; PMCID: PMC7866388.
Abstract
In recent years, human detection in indoor scenes has been widely applied in smart buildings and smart security, but challenges such as frequent occlusion, low illumination and multiple poses remain difficult to address. This paper proposes an asymmetric adaptive fusion two-stream network (AAFTS-net) for RGB-D human detection. The network fully extracts person-specific depth features and RGB features while reducing the typical complexity of a two-stream network. A depth feature pyramid is constructed from contextual information, with the aim of combining multiscale depth features to improve adaptability to targets of different sizes. An adaptive channel weighting (ACW) module weights the RGB-D feature channels to achieve efficient feature selection and information complementation. This paper also introduces a novel RGB-D dataset for human detection, called RGBD-human, on which we verify the performance of the proposed algorithm. The experimental results show that AAFTS-net outperforms existing state-of-the-art methods and maintains stable performance under conditions of frequent occlusion, low illumination and multiple poses.
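For illustration only, the sketch below implements a squeeze-and-excitation-style channel gate in the spirit of the adaptive channel weighting (ACW) module; the layer sizes and exact structure are assumptions, not the published architecture.

```python
# Hedged PyTorch sketch of adaptive channel weighting over fused RGB-D features.
import torch
import torch.nn as nn

class AdaptiveChannelWeighting(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # `channels` is the total number of channels after concatenating
        # the RGB and depth feature maps (an assumption of this sketch).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze spatial dims
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # per-channel weights
        )

    def forward(self, rgb_feat, depth_feat):
        x = torch.cat([rgb_feat, depth_feat], dim=1)       # fuse the two streams
        return x * self.gate(x)                            # re-weight channels
```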
6
Zhang WL, Yang K, Xin YT, Zhao TS. Multi-Object Tracking Algorithm for RGB-D Images Based on Asymmetric Dual Siamese Networks. Sensors (Basel) 2020;20:6745. doi: 10.3390/s20236745. PMID: 33255800; PMCID: PMC7728318.
Abstract
Currently, intelligent security systems are widely deployed in indoor buildings to ensure the safety of people in shopping malls, banks, train stations, and other indoor spaces. Multi-Object Tracking (MOT), as an important component of intelligent security systems, has received much attention from researchers in recent years. However, existing multi-object tracking algorithms still suffer from trajectory drift and interruption in crowded scenes, and therefore cannot provide valuable data to managers. To solve these problems, this paper proposes a multi-object tracking algorithm for RGB-D images based on asymmetric dual Siamese networks (ADSiamMOT-RGBD). The algorithm combines appearance information from RGB images with target contour information from depth images. Furthermore, an attention module is applied to suppress redundant information in the combined features and thus overcome the trajectory drift problem. We also propose a trajectory analysis module, which uses temporal context to check whether a head movement trajectory is correct; this reduces the number of erroneous person trajectories. The experimental results show that the proposed method achieves better tracking quality on the MICC, EPFL, and UM datasets than previous work.
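A minimal sketch of a trajectory-analysis check in the spirit of the module described above is given below; the speed threshold, frame rate and function name are assumptions rather than the paper's actual criterion.

```python
# Hedged sketch: reject head trajectories with physically implausible motion.
import numpy as np

def trajectory_is_plausible(track_xy, fps=30.0, max_speed=3.0):
    """track_xy: (T, 2) head positions in metres, ordered in time."""
    steps = np.linalg.norm(np.diff(track_xy, axis=0), axis=1)   # per-frame moves
    return bool(np.all(steps * fps <= max_speed))               # metres per second
```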
7
Wang Y, Wei X, Shen H, Ding L, Wan J. Robust fusion for RGB-D tracking using CNN features. Appl Soft Comput 2020. doi: 10.1016/j.asoc.2020.106302.
8
A Novel RGB-D SLAM Algorithm Based on Cloud Robotics. Sensors (Basel) 2019;19:5288. doi: 10.3390/s19235288. PMID: 31805628; PMCID: PMC6928679.
Abstract
In this paper, we present a novel red-green-blue-depth simultaneous localization and mapping (RGB-D SLAM) algorithm based on cloud robotics, which combines RGB-D SLAM with a cloud robot and offloads the back-end process of the RGB-D SLAM algorithm to the cloud. We analyze the front-end and back-end parts of the original RGB-D SLAM algorithm and improve the algorithm in three respects: feature extraction, point cloud registration, and pose optimization. Experiments show the superiority of the improved algorithm. In addition, taking advantage of cloud robotics, the RGB-D SLAM algorithm is combined with the cloud robot, and the computationally intensive back-end part of the algorithm is offloaded to the cloud. Experimental validation is provided comparing the cloud-robotics-based RGB-D SLAM algorithm with the local RGB-D SLAM algorithm, and the results demonstrate the superiority of our framework. The combination of cloud robotics and RGB-D SLAM can not only improve the efficiency of SLAM but also reduce the robot's price and size.
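The offloading architecture can be sketched as below. This is a hypothetical illustration, not the paper's implementation: the robot runs the SLAM front-end locally and posts keyframe poses and loop-closure constraints to a cloud service for pose-graph optimization. The endpoint URL and JSON schema are invented.

```python
# Hedged sketch: offload the SLAM back-end (pose-graph optimisation) to the cloud.
import json
import urllib.request

CLOUD_ENDPOINT = "http://cloud-slam.example.com/optimize"   # hypothetical URL

def offload_backend(keyframe_poses, loop_constraints):
    payload = json.dumps({"poses": keyframe_poses,
                          "constraints": loop_constraints}).encode()
    req = urllib.request.Request(CLOUD_ENDPOINT, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:                # blocking for simplicity
        return json.loads(resp.read())["optimized_poses"]
```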
9
Rasoulidanesh M, Yadav S, Herath S, Vaghei Y, Payandeh S. Deep Attention Models for Human Tracking Using RGBD. Sensors (Basel) 2019;19:750. doi: 10.3390/s19040750. PMID: 30781737; PMCID: PMC6412970.
Abstract
Visual tracking performance has long been limited by the lack of better appearance models. These models fail either when the appearance changes rapidly, as in motion-based tracking, or when accurate information about the object is not available, as in color camouflage (where background and foreground colors are similar). This paper proposes a robust, adaptive appearance model which works accurately in situations of color camouflage, even in the presence of complex natural objects. The proposed model includes depth as an additional feature in a hierarchical modular neural framework for online object tracking. The model adapts to confusing appearance by identifying the stable property of depth between the target and the surrounding object(s). Depth complements the existing RGB features in scenarios where the RGB features fail to adapt, hence becoming unstable over a long duration of time. The parameters of the model are learned efficiently in a deep network consisting of three modules: (1) the spatial attention layer, which discards the majority of the background by selecting a region containing the object of interest; (2) the appearance attention layer, which extracts appearance and spatial information about the tracked object; and (3) the state estimation layer, which enables the framework to predict future object appearance and location. Three different models were trained and tested to analyze the effect of depth along with RGB information, and a further model is proposed that uses only depth as a standalone input for tracking. The proposed models were also evaluated in real time using KinectV2 and showed very promising results. The results of our proposed network structures and their comparison with a state-of-the-art RGB tracking model demonstrate that adding depth significantly improves tracking accuracy in more challenging (i.e., cluttered and camouflaged) environments. Furthermore, the results of the depth-based models show that depth data can provide enough information for accurate tracking, even without RGB information.
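The three-module structure can be read as a simple composition, sketched below with placeholder sub-networks; the class is hypothetical and only shows how the spatial attention, appearance attention and state estimation layers feed into one another.

```python
# Hedged PyTorch sketch of the three-module tracking pipeline.
import torch.nn as nn

class AttentionTracker(nn.Module):
    def __init__(self, spatial_att: nn.Module, appearance_att: nn.Module,
                 state_estimator: nn.Module):
        super().__init__()
        self.spatial_att = spatial_att          # selects a region containing the target
        self.appearance_att = appearance_att    # extracts appearance/spatial features
        self.state_estimator = state_estimator  # predicts next appearance and location

    def forward(self, rgbd_frame):
        glimpse = self.spatial_att(rgbd_frame)  # discard most of the background
        features = self.appearance_att(glimpse)
        return self.state_estimator(features)   # e.g. bounding box for the next frame
```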
Affiliation(s)
- Maryamsadat Rasoulidanesh
- Networked Robotics and Sensing Laboratory, School of Engineering Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.
- Srishti Yadav
- Networked Robotics and Sensing Laboratory, School of Engineering Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.
- Sachini Herath
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.
- Yasaman Vaghei
- School of Mechatronic Systems Engineering, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.
- Shahram Payandeh
- Networked Robotics and Sensing Laboratory, School of Engineering Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.
10
Dimitrievski M, Veelaert P, Philips W. Behavioral Pedestrian Tracking Using a Camera and LiDAR Sensors on a Moving Vehicle. Sensors (Basel) 2019;19:391. doi: 10.3390/s19020391. PMID: 30669359; PMCID: PMC6359120.
Abstract
In this paper, we present a novel 2D–3D pedestrian tracker designed for applications in autonomous vehicles. The system operates on a tracking-by-detection principle and can track multiple pedestrians in complex urban traffic situations. By using a behavioral motion model and a non-parametric distribution as the state model, we are able to accurately track unpredictable pedestrian motion in the presence of heavy occlusion. Tracking is performed independently on the image plane and the ground plane, in global, motion-compensated coordinates. We employ camera and LiDAR data fusion to solve the association problem, where the optimal solution is found by matching 2D and 3D detections to tracks using a joint log-likelihood observation model. Each 2D–3D particle filter then updates its state from the associated observations and a behavioral motion model. Each particle moves independently following the pedestrian motion parameters, which we learned offline from an annotated training dataset. Temporal stability of the state variables is achieved by modeling each track as a Markov Decision Process with probabilistic state transition properties. A novel track management system then handles high-level actions such as track creation, deletion and interaction. Using a probabilistic track score, the track manager can cull false and ambiguous detections while updating tracks with detections from actual pedestrians. Our system is implemented on a GPU and exploits the massively parallelizable nature of particle filters. Due to the Markovian nature of our track representation, the system achieves real-time performance with a minimal memory footprint. Exhaustive and independent evaluation of our tracker was performed by the KITTI benchmark server, where it was tested against a wide variety of unknown pedestrian tracking situations. On this realistic benchmark, we outperform all published pedestrian trackers across a multitude of tracking metrics.
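A generic particle-filter step in the spirit of the per-track estimation described above is sketched below; the motion model and observation log-likelihood are placeholders, and the systematic resampling shown is a standard choice rather than necessarily the paper's.

```python
# Hedged sketch: one predict/weight/resample step of a particle filter.
import numpy as np

def pf_step(particles, weights, motion_model, log_likelihood, observation):
    particles = motion_model(particles)                       # behavioural prediction
    weights = weights * np.exp(log_likelihood(particles, observation))
    weights /= weights.sum()
    cum = np.cumsum(weights)
    cum[-1] = 1.0                                             # guard against round-off
    # systematic resampling to avoid particle degeneracy
    positions = (np.arange(len(weights)) + np.random.rand()) / len(weights)
    idx = np.searchsorted(cum, positions)
    return particles[idx], np.full(len(weights), 1.0 / len(weights))
```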
Affiliation(s)
- Martin Dimitrievski
- TELIN-IPI, Ghent University - imec, St-Pietersnieuwstraat 41, B-9000 Gent, Belgium.
- Peter Veelaert
- TELIN-IPI, Ghent University - imec, St-Pietersnieuwstraat 41, B-9000 Gent, Belgium.
- Wilfried Philips
- TELIN-IPI, Ghent University - imec, St-Pietersnieuwstraat 41, B-9000 Gent, Belgium.
11
Liu H, Luo J, Wu P, Xie S, Li H. People detection and tracking using RGB-D cameras for mobile robots. Int J Adv Robot Syst 2016. doi: 10.1177/1729881416657746.
Abstract
People detection and tracking is an essential capability for mobile robots in order to achieve natural human–robot interaction. In this article, a human detection and tracking system is designed and validated for mobile robots using RGB-depth (RGB-D) cameras, which provide color data with depth information. The whole framework is composed of human detection, tracking and re-identification. Firstly, ground points and ceiling planes are removed to reduce computational effort; a prior-knowledge-guided random sample consensus (RANSAC) fitting algorithm is used to detect the ground plane and ceiling points. All remaining points are projected onto the ground plane, and subclusters are segmented for candidate detection. Mean-shift clustering with an Epanechnikov kernel is conducted to partition the points into subclusters. We propose the new idea of spatial region-of-interest plan-view maps, which are employed to identify human candidates from the point-cloud subclusters. A depth-weighted histogram is extracted online to characterize each human candidate. Then, a particle filter algorithm is adopted to track the human's motion; the integration of the depth-weighted histogram and the particle filter provides a precise tool for tracking the motion of human targets. Finally, data association is used to re-identify humans who are tracked. Extensive experiments demonstrate the effectiveness and robustness of our human detection and tracking system.
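The first steps of the pipeline can be illustrated with a small NumPy sketch (not the authors' code): a plain RANSAC plane fit stands in for the prior-knowledge-guided fitting, followed by projection of the remaining points onto the ground plane. The iteration count and distance threshold are assumptions.

```python
# Hedged sketch: RANSAC ground-plane fitting and projection onto the plane.
import numpy as np

def ransac_ground_plane(points, iters=200, dist_thresh=0.05, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers, best_plane = np.array([], dtype=int), None
    for _ in range(iters):
        p = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p[1] - p[0], p[2] - p[0])
        if np.linalg.norm(n) < 1e-9:
            continue                                   # degenerate sample
        n = n / np.linalg.norm(n)
        d = -n @ p[0]
        inliers = np.flatnonzero(np.abs(points @ n + d) < dist_thresh)
        if len(inliers) > len(best_inliers):
            best_inliers, best_plane = inliers, (n, d)
    return best_plane, best_inliers

def project_to_plane(points, plane):
    n, d = plane
    return points - np.outer(points @ n + d, n)        # drop the out-of-plane component
```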
Affiliation(s)
- Hengli Liu
- School of Electrical and Automation Engineering, Shanghai University, Shanghai, China
- Jun Luo
- School of Electrical and Automation Engineering, Shanghai University, Shanghai, China
- Peng Wu
- School of Electrical and Automation Engineering, Shanghai University, Shanghai, China
- Shaorong Xie
- School of Electrical and Automation Engineering, Shanghai University, Shanghai, China
- Hengyu Li
- School of Electrical and Automation Engineering, Shanghai University, Shanghai, China