1. Cao J, Pang Y, Xie J, Khan FS, Shao L. From Handcrafted to Deep Features for Pedestrian Detection: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:4913-4934. [PMID: 33929956] [DOI: 10.1109/tpami.2021.3076733]
Abstract
Pedestrian detection is an important but challenging problem in computer vision, especially in human-centric tasks. Over the past decade, significant improvement has been achieved with the help of handcrafted and deep features. Here we present a comprehensive survey of recent advances in pedestrian detection. First, we provide a detailed review of single-spectral pedestrian detection, covering both handcrafted-feature-based methods and deep-feature-based approaches. For handcrafted-feature-based methods, we present an extensive review and find that features with large degrees of freedom in shape and space perform better. We split deep-feature-based approaches into pure CNN-based methods and methods that employ both handcrafted and CNN-based features, and give a statistical analysis of their trends, in which feature-enhancement, part-aware, and post-processing methods have attracted the most attention. In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection, which provides features that are more robust to illumination variation. Furthermore, we introduce related datasets and evaluation metrics, together with an in-depth experimental analysis. We conclude by emphasizing open problems that need to be addressed and highlighting future directions. Researchers can track an up-to-date list at https://github.com/JialeCao001/PedSurvey.
2. Remote Sensing Approaches for Meteorological Disaster Monitoring: Recent Achievements and New Challenges. International Journal of Environmental Research and Public Health 2022; 19:ijerph19063701. [PMID: 35329388] [PMCID: PMC8951235] [DOI: 10.3390/ijerph19063701]
Abstract
Meteorological disaster monitoring is an important research direction for remote sensing technology in the field of meteorology and can serve many disaster-management tasks. The key issues in remote sensing monitoring of meteorological disasters are monitoring-task arrangement and organization, disaster-information extraction, and multi-temporal change detection of disaster information. To represent monitoring tasks accurately, it is necessary to determine the timescale, perform sensor planning, and construct a representation model for the monitored information. On this basis, meteorological disaster information is extracted with remote sensing data-processing approaches, and multi-temporal disaster information is compared to detect the evolution of the disaster. Because meteorological disasters are highly dynamic, the process characteristics of their monitoring have attracted increasing attention. Although many remote sensing approaches have been used successfully for meteorological disaster monitoring, gaps remain in process monitoring. In the future, research on sensor planning, information representation models, multi-source data fusion, and related topics will provide an important basis and direction for promoting process monitoring of meteorological disasters. A process-monitoring strategy will further help uncover correlations and impact mechanisms in the evolution of meteorological disasters.
3. Ortiz Castelló V, Salvador Igual I, del Tejo Catalá O, Perez-Cortes JC. High-Profile VRU Detection on Resource-Constrained Hardware Using YOLOv3/v4 on BDD100K. J Imaging 2020; 6:142. [PMID: 34460539] [PMCID: PMC8321163] [DOI: 10.3390/jimaging6120142]
Abstract
Vulnerable Road User (VRU) detection is a major application of object detection, aimed at reducing accidents in advanced driver-assistance systems and enabling the development of autonomous vehicles. Owing to the intrinsic complexity of computer vision and to limits on processing capacity and bandwidth, the task is not yet fully solved. We therefore assess the well-established YOLOv3 network and the newer YOLOv4 by training them on a large, recent on-road image dataset (BDD100K), both for VRU classes and for the full set of on-road classes. Compared with the authors' generic MS-COCO-trained models, detection quality improves greatly at a negligible cost in forward-pass time. We also retrained some models after replacing the Leaky ReLU convolutional activations of the original YOLO implementation with two recent activation functions: the self-regularized non-monotonic Mish and the self-gated Swish, obtaining significant improvements over the original activation's detection performance. Finally, trials with recent data-augmentation techniques (mosaic and CutMix) and with several grid-size configurations yielded cumulative improvements over the previous results, covering different performance-throughput trade-offs.
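The three activation functions compared in this entry have simple closed forms; a minimal scalar sketch (not the authors' network integration, and with a hypothetical 0.1 leak slope for Leaky ReLU) could look like:

```python
import math

def softplus(x: float) -> float:
    # Numerically stable softplus: log(1 + e^x).
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def mish(x: float) -> float:
    # Self-regularized non-monotonic activation: x * tanh(softplus(x)).
    return x * math.tanh(softplus(x))

def swish(x: float) -> float:
    # Self-gated activation: x * sigmoid(x).
    return x / (1.0 + math.exp(-x))

def leaky_relu(x: float, slope: float = 0.1) -> float:
    # YOLO's default convolutional activation, shown for comparison.
    return x if x > 0 else slope * x
```

Both Mish and Swish are smooth and, unlike Leaky ReLU, non-monotonic near zero, which is the property the paper's retraining experiments exploit.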
Affiliation(s)
- Vicent Ortiz Castelló
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, 46022 Valencia, Spain
- Ismael Salvador Igual
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, 46022 Valencia, Spain
- Omar del Tejo Catalá
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, 46022 Valencia, Spain
- Juan-Carlos Perez-Cortes
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, 46022 Valencia, Spain
- Departamento de Informática de Sistemas y Computadores (DISCA), Universitat Politècnica de València, 46022 Valencia, Spain
4. Yamaguchi A, Maya S, Maruchi K, Ueno K. LTSpAUC: Learning Time-Series Shapelets for Partial AUC Maximization. Big Data 2020; 8:391-411. [PMID: 33090026] [DOI: 10.1089/big.2020.0069]
Abstract
Shapelets are discriminative segments used to classify time-series instances. Shapelet methods that jointly learn both classifiers and shapelets have been studied in recent years because they provide interpretable results as well as superior accuracy. The partial area under the receiver operating characteristic curve (pAUC) over a low range of false-positive rates (FPRs) is an important performance measure in practical settings such as medicine, manufacturing, and maintenance. In this article, we propose a method that jointly learns shapelets and a classifier to optimize the pAUC over any FPR range, including the full AUC. We also propose two extensions for shapelet methods: (1) reducing the algorithmic complexity in the time-series length to linear time, and (2) explicitly determining the classes that each shapelet tends to match. Compared with state-of-the-art learning-based shapelet methods, we demonstrate superior pAUC on the UCR time-series datasets and show the method's effectiveness in industrial case studies from medicine, manufacturing, and maintenance.
Affiliation(s)
- Akihiro Yamaguchi
- System AI Laboratory, Corporate R&D Center, Toshiba Corporation, Kawasaki, Japan
- Shigeru Maya
- System AI Laboratory, Corporate R&D Center, Toshiba Corporation, Kawasaki, Japan
- Kohei Maruchi
- System AI Laboratory, Corporate R&D Center, Toshiba Corporation, Kawasaki, Japan
- Ken Ueno
- System AI Laboratory, Corporate R&D Center, Toshiba Corporation, Kawasaki, Japan
5. Development of Land Cover Classification Model Using AI Based FusionNet Network. Remote Sensing 2020. [DOI: 10.3390/rs12193171]
Abstract
Prompt updates of land cover maps are important, as spatial land cover information is widely used in many areas. However, current manual digitizing methods are time-consuming and labor-intensive, hindering rapid updates of land cover maps. The objective of this study was to develop an artificial intelligence (AI) based land cover classification model that allows rapid classification from high-resolution remote sensing (HRRS) images. The model comprises three modules: pre-processing, land cover classification, and post-processing. The pre-processing module splits the HRRS image into multiple patches with 75% overlap using a sliding-window algorithm. The classification module, a convolutional neural network (CNN) based on the FusionNet architecture, assigns a land cover type to each patch. The post-processing module determines the final land cover types by aggregating the per-patch results from the classification module. Model training and validation were conducted to evaluate the performance of the developed model. Land cover maps and orthographic images covering 547.29 km² of Jeonnam province, Korea, were used to train the model. For validation, two spatially and temporally distinct sites were chosen at random: Subuk-myeon in Jeonnam province (2018) and Daseo-myeon in Chungbuk province (2016). The model performed reasonably well, with overall accuracies of 0.81 and 0.71 and kappa coefficients of 0.75 and 0.64 at the respective validation sites. Performance was better when only the agricultural area was considered, with an overall accuracy of 0.83 and a kappa coefficient of 0.73. We conclude that the developed model may assist rapid land cover updates, especially for agricultural areas, and suggest incorporating field-boundary delineation in future work to further improve accuracy.
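The pre-processing step above is a standard overlapping sliding window. A minimal sketch of window placement (square windows, assumed no larger than the image; the 75% overlap figure is the paper's, the edge-snapping rule is an assumption):

```python
def sliding_windows(width, height, win, overlap=0.75):
    """Top-left corners of square windows tiling an image with the given
    fractional overlap between neighboring windows. Assumes win <= width
    and win <= height."""
    stride = max(1, int(win * (1.0 - overlap)))
    xs = list(range(0, width - win + 1, stride))
    ys = list(range(0, height - win + 1, stride))
    # Snap a final window to the right/bottom edge so the image is covered.
    if xs[-1] != width - win:
        xs.append(width - win)
    if ys[-1] != height - win:
        ys.append(height - win)
    return [(x, y) for y in ys for x in xs]

corners = sliding_windows(200, 100, 100)  # stride = 25 px at 75% overlap
```

Each window would then be cropped, classified by the CNN, and the overlapping per-window labels aggregated (e.g. by per-pixel voting) in post-processing.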
6.
7. Zhao ZQ, Zheng P, Xu ST, Wu X. Object Detection With Deep Learning: A Review. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:3212-3232. [PMID: 30703038] [DOI: 10.1109/tnnls.2018.2876865]
Abstract
Because object detection is closely related to video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures, and their performance stagnates even when complex ensembles combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development of deep learning, more powerful tools that can learn semantic, high-level, deeper features have been introduced to address the problems of traditional architectures; these models differ in network architecture, training strategy, and optimization function. In this paper, we provide a review of deep learning-based object detection frameworks. We begin with a brief introduction to the history of deep learning and its representative tool, the convolutional neural network. We then focus on typical generic object detection architectures, along with modifications and useful tricks that further improve detection performance. As distinct detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are offered as guidelines for future work in both object detection and relevant neural-network-based learning systems.
8. Abari ME, Naghsh-Nilchi A. Toward a pedestrian detection method by various feature combinations. International Journal of Knowledge-Based and Intelligent Engineering Systems 2019. [DOI: 10.3233/kes-190411]
9. Liu C, Guo Y, Li S, Chang F. ACF Based Region Proposal Extraction for YOLOv3 Network Towards High-Performance Cyclist Detection in High Resolution Images. Sensors (Basel, Switzerland) 2019; 19:E2671. [PMID: 31200511] [PMCID: PMC6630625] [DOI: 10.3390/s19122671]
Abstract
The You Only Look Once (YOLO) deep network detects objects quickly and with high precision, and it has been applied successfully to many detection problems. Its main shortcoming is that it usually cannot achieve high precision on small objects in high-resolution images. To overcome this problem, we propose an effective region-proposal extraction method for the YOLO network, forming a detection structure named ACF-PR-YOLO, and demonstrate it on the cyclist detection problem. Instead of directly using the generated region proposals for classification or regression, as most region-proposal methods do, we generate large potential regions containing objects for the subsequent deep network. The proposed ACF-PR-YOLO structure has three main parts. First, a region-proposal extraction method based on aggregated channel features (ACF), called the ACF-based region proposal (ACF-PR) method, uses ACF to extract candidates quickly and then merges and extends the resulting bounding boxes into region proposals for the following YOLO network. Second, we design a YOLO network suited to fine detection within the region proposals generated by ACF-PR. Last, we design a post-processing step that maps the YOLO results back into the original image to output the detection and localization results. Experiments on the Tsinghua-Daimler Cyclist Benchmark, with high-resolution images and complex scenes, show that the proposed method outperforms the other representative detection methods tested in average precision, exceeding YOLOv3 by 13.69% and SSD by 25.27% average precision.
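The paper's box merging-and-extension rule is its own design; a simplified stand-in that conveys the idea (greedy IoU-based merging into enclosing regions, then padding for context; the 0.1 threshold and 10 px padding are made-up parameters) might look like:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_boxes(boxes, thr=0.1, pad=10):
    """Greedily merge overlapping candidate boxes into enclosing regions,
    then pad each region so the second-stage detector sees some context."""
    merged = []
    for box in boxes:
        for i, m in enumerate(merged):
            if iou(box, m) > thr:
                merged[i] = (min(box[0], m[0]), min(box[1], m[1]),
                             max(box[2], m[2]), max(box[3], m[3]))
                break
        else:
            merged.append(box)
    return [(b[0] - pad, b[1] - pad, b[2] + pad, b[3] + pad) for b in merged]

# Two overlapping ACF candidates collapse into one padded region;
# a distant candidate stays separate.
regions = merge_boxes([(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)])
```

Each resulting region would be cropped from the high-resolution image and passed to the YOLO stage, which is what keeps small cyclists large enough to detect.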
Affiliation(s)
- Chunsheng Liu
- School of Control Science and Engineering, Shandong University, Ji'nan 250061, China
- Yu Guo
- School of Control Science and Engineering, Shandong University, Ji'nan 250061, China
- Shuang Li
- School of Control Science and Engineering, Shandong University, Ji'nan 250061, China
- Faliang Chang
- School of Control Science and Engineering, Shandong University, Ji'nan 250061, China
10. Pedestrian and Cyclist Detection and Intent Estimation for Autonomous Vehicles: A Survey. Applied Sciences (Basel) 2019. [DOI: 10.3390/app9112335]
Abstract
As autonomous vehicles become more common on the roads, their advancement raises safety concerns for vulnerable road users (VRUs) such as pedestrians and cyclists. This paper reviews recent developments in pedestrian and cyclist detection and intent estimation that aim to increase the safety of autonomous vehicles for both the driver and other road users. Understanding the intentions of a pedestrian or cyclist enables a self-driving vehicle to act in time to avoid incidents. We explore the methods and techniques, particularly deep learning (DL), that make this possible for the autonomous vehicle. Pedestrian detection, for example, has advanced significantly through DL approaches such as the Fast Region-based Convolutional Neural Network (R-CNN), Faster R-CNN, and the Single Shot Detector (SSD). Although DL has existed for several decades, the hardware needed to realise these techniques has only recently become viable. Applying DL-based pedestrian and cyclist detection to tracking, motion modelling, and pose estimation can yield a successful and accurate method of intent estimation for VRUs. While research on vision-based pedestrian detection has grown, cyclist detection deserves further attention. To further improve safety for these vulnerable road users, approaches such as sensor fusion and intent estimation should be investigated.
11. Zhu C, Yin XC. Detecting Multi-Resolution Pedestrians Using Group Cost-Sensitive Boosting with Channel Features. Sensors 2019; 19:s19040780. [PMID: 30769813] [PMCID: PMC6412415] [DOI: 10.3390/s19040780]
Abstract
Significant progress has been achieved in recent years on the challenging task of pedestrian detection. Nevertheless, a major bottleneck of existing state-of-the-art approaches is a large drop in performance as the resolution of the detected targets decreases. For the boosting-based detectors popular in the pedestrian detection literature, a possible cause is that during boosting, low-resolution samples, which are usually harder to detect because details are missing, are treated as equally important as high-resolution samples. They are therefore more easily rejected in the early stages, can hardly be recovered in the later stages, and end up as false negatives. To address this problem, we propose a robust multi-resolution detection approach built on a novel group cost-sensitive boosting algorithm. Derived from the standard AdaBoost algorithm, it assigns different costs to different resolution groups of samples in the boosting process, placing greater emphasis on low-resolution groups to better handle the detection of multi-resolution targets. The effectiveness of the proposed approach is evaluated on the Caltech pedestrian benchmark and the KAIST (Korea Advanced Institute of Science and Technology) multispectral pedestrian benchmark, and validated by its promising performance on the resolution-specific test sets of both benchmarks.
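The paper derives its update from AdaBoost with per-group costs; a schematic (not the authors' exact derivation, with made-up groups, costs, and a fixed weak-learner weight) of how group costs skew one round's sample re-weighting:

```python
import math

def update_weights(weights, groups, labels, preds, alpha, cost):
    """One round of a cost-sensitive AdaBoost-style update: misclassified
    samples are up-weighted, scaled by their resolution group's cost, so
    low-resolution pedestrians keep influence in later rounds."""
    new = [w * math.exp(alpha * cost[g] * (y != p))
           for w, g, y, p in zip(weights, groups, labels, preds)]
    total = sum(new)           # renormalize to a distribution
    return [w / total for w in new]

# Two misclassified pedestrians (samples 0 and 2); the low-resolution one
# is boosted harder because its group cost is higher.
w = update_weights(
    weights=[0.25] * 4,
    groups=["low", "low", "high", "high"],
    labels=[1, 1, 1, 1],
    preds=[0, 1, 0, 1],
    alpha=1.0,
    cost={"low": 2.0, "high": 1.0},
)
```

After the update, the misclassified low-resolution sample carries the largest weight, which is exactly the emphasis the group cost-sensitive formulation is after.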
Affiliation(s)
- Chao Zhu
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
- Xu-Cheng Yin
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
12. Ouyang W, Zhou H, Li H, Li Q, Yan J, Wang X. Jointly Learning Deep Features, Deformable Parts, Occlusion and Classification for Pedestrian Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018; 40:1874-1887. [PMID: 28809675] [DOI: 10.1109/tpami.2017.2738645]
Abstract
Feature extraction, deformation handling, occlusion handling, and classification are four important components of pedestrian detection. Existing methods learn or design these components either individually or sequentially, and the interaction among them is not yet well explored. This paper proposes that they should be learned jointly in order to maximize their strengths through cooperation. We formulate the four components into a joint deep learning framework and propose a new deep network architecture (code available at www.ee.cuhk.edu.hk/wlouyang/projects/ouyangWiccv13Joint/index.html). By establishing automatic, mutual interaction among the components, the deep model achieves an average miss rate of 8.57/11.71 percent on the Caltech benchmark dataset with the new/original annotations.
13. Zuo X, Shen J, Yu H, Xu D, Qian C, Shan Y. Fast Pedestrian Detection Based on the Selective Window Differential Filter. Neural Process Lett 2017. [DOI: 10.1007/s11063-017-9746-8]
14.
15. Wang Y, Piérard S, Su SZ, Jodoin PM. Improving pedestrian detection using motion-guided filtering. Pattern Recognit Lett 2017. [DOI: 10.1016/j.patrec.2016.11.020]
16. Cao J, Pang Y, Li X. Learning Multilayer Channel Features for Pedestrian Detection. IEEE Transactions on Image Processing 2017; 26:3210-3220. [PMID: 28459686] [DOI: 10.1109/tip.2017.2694224]
Abstract
Pedestrian detection based on the combination of convolutional neural networks (CNNs) and traditional handcrafted features (i.e., HOG+LUV) has achieved great success. In general, HOG+LUV generates the candidate proposals and a CNN then classifies them. Despite this success, there is still room for improvement: the CNN classifies proposals using only its fully connected layer features, while the proposal scores and the features in the CNN's inner layers are ignored. In this paper, we propose a unifying framework called multi-layer channel features (MCF) to overcome this drawback. It first integrates HOG+LUV with each layer of the CNN into multi-layer image channels. Based on these channels, a multi-stage cascade AdaBoost is learned, with the weak classifiers in each stage drawn from the image channels of the corresponding layer. Experiments were conducted on the Caltech, INRIA, ETH, TUD-Brussels, and KITTI datasets. With its more abundant features, MCF achieves the state of the art on the Caltech pedestrian dataset (a 10.40% miss rate), and 7.98% with the new, more accurate annotations. Because many non-pedestrian detection windows are quickly rejected by the first few stages, detection is accelerated by a factor of 1.43; by additionally eliminating highly overlapped detection windows with lower scores after the first stage, MCF runs 4.07 times faster with negligible performance loss.
17. Multi-channel Convolutional Neural Network Ensemble for Pedestrian Detection. Pattern Recognition and Image Analysis 2017. [DOI: 10.1007/978-3-319-58838-4_14]
18.
19. A Theoretical Analysis of Why Hybrid Ensembles Work. Computational Intelligence and Neuroscience 2017; 2017:1930702. [PMID: 28255296] [PMCID: PMC5307253] [DOI: 10.1155/2017/1930702]
Abstract
Inspired by the group decision-making process, ensembles, or combinations of classifiers, have proved favorable in a wide variety of application domains. Some researchers propose mixing two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question has remained open. Following the concept of diversity, one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting the use of different algorithms to accuracy gain. We also run experiments on the classification performance of hybrid ensembles built from the decision tree and naïve Bayes algorithms, each of which is a top data mining algorithm often used to create non-hybrid ensembles. Through this paper, we thus provide a complement to the theoretical foundation for creating and using hybrid ensembles.
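The mechanism the analysis rests on, diverse base learners whose errors fall in different places, can be shown with a toy majority vote (the three threshold "classifiers" here are hypothetical stand-ins, not the paper's decision tree and naïve Bayes learners):

```python
def majority_vote(classifiers, x):
    """Ensemble prediction: each base classifier votes on the input,
    and the most common class label wins."""
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)

# Three diverse rules that disagree near their thresholds; the ensemble
# is correct wherever any two of the three agree, which is the diversity
# argument behind hybrid ensembles.
stump_a = lambda x: 1 if x > 0.3 else 0
stump_b = lambda x: 1 if x > 0.5 else 0
stump_c = lambda x: 1 if x > 0.7 else 0
ensemble = [stump_a, stump_b, stump_c]
```

A hybrid ensemble pushes this further by drawing the base learners from different algorithm families, which tends to decorrelate their errors more than resampling alone.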
20. Object Detection and Classification by Decision-Level Fusion for Intelligent Vehicle Systems. Sensors 2017; 17:s17010207. [PMID: 28117742] [PMCID: PMC5298778] [DOI: 10.3390/s17010207]
Abstract
To understand driving environments effectively, sensor-based intelligent vehicle systems must detect and classify objects accurately. Object detection localizes objects, whereas object classification recognizes object classes within the detected regions. For accurate detection and classification, fusing information from multiple sensors is a key component of the representation and perception processes. In this paper, we propose a new object-detection and classification method using decision-level fusion: we fuse the classification outputs of independent unary classifiers operating on 3D point clouds and on image data using a convolutional neural network (CNN). The unary classifier for each sensor is a five-layer CNN that uses more than two pre-trained convolutional layers to represent local-to-global features of the data. To do so, we apply region-of-interest (ROI) pooling to the outputs of each layer on the object candidate regions generated by object-proposal generation, realizing color flattening and semantic grouping for the charge-coupled device and Light Detection And Ranging (LiDAR) sensors. We evaluate the proposed method on the KITTI benchmark dataset for three object classes: cars, pedestrians, and cyclists. The results show that the method outperforms previous approaches: it extracts approximately 500 proposals on a 1226×370 image, whereas the original selective search method extracts approximately 10^6×n proposals, and it attains 77.72% mean average precision over all classes at the moderate difficulty level of the KITTI benchmark.
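Decision-level fusion means combining each sensor's classifier output rather than its raw features. A minimal sketch using a weighted average of per-class posteriors (the camera/LiDAR scores and the equal weighting are illustrative assumptions, not the paper's learned fusion):

```python
def fuse_decisions(posteriors_per_sensor, weights=None):
    """Decision-level fusion: combine per-class posteriors from
    independent unary classifiers by a (weighted) average."""
    n = len(posteriors_per_sensor)
    weights = weights or [1.0 / n] * n
    classes = posteriors_per_sensor[0].keys()
    return {c: sum(w * p[c] for w, p in zip(weights, posteriors_per_sensor))
            for c in classes}

# Hypothetical posteriors for one candidate region from each sensor.
camera = {"car": 0.7, "pedestrian": 0.2, "cyclist": 0.1}
lidar = {"car": 0.5, "pedestrian": 0.4, "cyclist": 0.1}
fused = fuse_decisions([camera, lidar])
```

The fused distribution keeps the camera's confident "car" call while letting the LiDAR evidence temper it; the argmax over the fused scores gives the final class.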
21. Cao J, Pang Y, Li X. Pedestrian Detection Inspired by Appearance Constancy and Shape Symmetry. IEEE Transactions on Image Processing 2016; 25:5538-5551. [PMID: 27654480] [DOI: 10.1109/tip.2016.2609807]
Abstract
Most state-of-the-art methods in pedestrian detection fail to achieve a good trade-off between accuracy and efficiency: ACF, for example, is fast but has a relatively low detection rate, while Checkerboards has a high detection rate but is slow. Inspired by two simple inherent attributes of pedestrians, appearance constancy and shape symmetry, we propose two new types of non-neighboring features: side-inner difference features (SIDF) and symmetrical similarity features (SSF). SIDF characterizes the difference between the background and the pedestrian, and between the pedestrian contour and its inner part; SSF captures the symmetrical similarity of pedestrian shape. Such characterization is difficult for conventional neighboring features. Finally, we propose combining non-neighboring and neighboring features for pedestrian detection, and find that the non-neighboring features further decrease the log-average miss rate by 4.44%. The relationship between our proposed method and several state-of-the-art methods is also discussed. Experimental results on the INRIA, Caltech, and KITTI datasets demonstrate the effectiveness and efficiency of the proposed method. Among methods that do not use CNNs, ours achieves the best detection performance on Caltech, outperforming the second best (Checkerboards) by 2.27%; with the new Caltech annotations, it achieves an 11.87% miss rate, outperforming the other methods.
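The defining trait of a non-neighboring feature is that it compares two regions that need not touch. A toy sketch of a side-inner difference on a plain intensity patch (the real SIDF operates on ACF-style channels over many region configurations; the regions and patch here are made up):

```python
def side_inner_difference(patch, side, inner):
    """Side-inner difference feature: mean intensity of a side region
    minus mean intensity of an inner region. Unlike conventional channel
    features, the two regions need not be adjacent.
    Regions are (x1, y1, x2, y2) with exclusive upper bounds."""
    def mean(region):
        x1, y1, x2, y2 = region
        vals = [patch[y][x] for y in range(y1, y2) for x in range(x1, x2)]
        return sum(vals) / len(vals)
    return mean(side) - mean(inner)

# A 2x4 patch: dark "background" columns on the left, bright "torso"
# columns on the right; the feature responds to that contrast.
patch = [[0, 0, 9, 9],
         [0, 0, 9, 9]]
```

A boosted classifier would evaluate many such region pairs (plus symmetric pairs for SSF) and keep the most discriminative ones as weak learners.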
22.