1. Kaur J, Singh W. Tools, techniques, datasets and application areas for object detection in an image: a review. Multimedia Tools and Applications 2022; 81:38297-38351. [PMID: 35493415] [PMCID: PMC9033309] [DOI: 10.1007/s11042-022-13153-y]
Abstract
Object detection, the task of locating objects in images and videos, is one of the most fundamental and challenging problems in computer vision. Over the past years it has attracted considerable research attention, alongside related tasks such as object classification, object counting, and object monitoring. This study provides a detailed literature review of object detection and discusses the principal detection techniques. A systematic review methodology is followed to summarize the findings of current research and to discuss seven research questions related to object detection. The contributions of this work are (i) an analysis of traditional, two-stage, and one-stage object detection techniques, (ii) a survey of dataset preparation and available standard datasets, (iii) a survey of annotation tools, and (iv) a review of performance evaluation metrics. In addition, a comparative analysis shows that the surveyed techniques differ in their architectures, optimization functions, and training strategies. With the remarkable success of deep neural networks in object detection, detector performance has improved substantially. Research challenges and future directions for object detection are also discussed.
Affiliation(s)
- Jaskirat Kaur, Department of Computer Science, Punjabi University, Patiala, India
- Williamjeet Singh, Department of Computer Science and Engineering, Punjabi University, Patiala, India
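Entry 1 above surveys performance evaluation metrics for object detectors. As a minimal illustration of the most basic of these, the sketch below computes the intersection-over-union (IoU) of two axis-aligned boxes; the (x1, y1, x2, y2) box format and the 0.5 true-positive threshold are common conventions assumed for the example, not details taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# By a common convention, a detection counts as a true positive when IoU >= 0.5.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143
```

Metrics such as mAP are then built by sweeping confidence thresholds over matches made with exactly this kind of overlap test.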
2. Sparse robust multiview feature selection via adaptive-weighting strategy. International Journal of Machine Learning and Cybernetics 2021. [DOI: 10.1007/s13042-021-01453-y]
3. Sun G, Cong Y, Zhang Y, Zhao G, Fu Y. Continual Multiview Task Learning via Deep Matrix Factorization. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:139-150. [PMID: 32175877] [DOI: 10.1109/tnnls.2020.2977497]
Abstract
The state-of-the-art multitask multiview (MTMV) learning tackles a scenario where multiple tasks are related to each other via multiple shared feature views. However, in many real-world scenarios where multiview tasks arrive sequentially, the high storage requirement and computational cost of retraining previous tasks with MTMV models present a formidable challenge for this lifelong learning scenario. To address this challenge, in this article, we propose a new continual multiview task learning model that integrates deep matrix factorization and sparse subspace learning in a unified framework, termed deep continual multiview task learning (DCMvTL). More specifically, as a new multiview task arrives, DCMvTL first adopts a deep matrix factorization technique to capture hidden and hierarchical representations for the new task while accumulating the fresh multiview knowledge in a layerwise manner. Then, a sparse subspace learning model is employed for the extracted factors at each layer, which further reveals cross-view correlations via a self-expressive constraint. For model optimization, we derive a general multiview learning formulation for each newly arriving task and apply an alternating minimization strategy to achieve lifelong learning. Extensive experiments on benchmark data sets demonstrate the effectiveness of the proposed DCMvTL model compared with existing state-of-the-art MTMV and lifelong multiview task learning models.
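To make the layerwise factorization idea in entry 3 concrete, here is a toy two-layer matrix factorization X ≈ W1 W2 H fitted by alternating least squares. It sketches only the deep-matrix-factorization building block under assumed dimensions; the paper's DCMvTL adds sparse subspace learning, the self-expressive constraint, and continual per-task updates, none of which are modeled here.

```python
import numpy as np

def deep_mf(X, dims=(20, 10), iters=200, seed=0):
    """Toy two-layer factorization X ~= W1 @ W2 @ H via alternating
    least squares (a heuristic pinv-based update for the middle factor)."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((X.shape[0], dims[0]))
    W2 = rng.standard_normal(dims)
    H = rng.standard_normal((dims[1], X.shape[1]))
    for _ in range(iters):
        # Solve each factor in turn with the other two held fixed.
        H = np.linalg.lstsq(W1 @ W2, X, rcond=None)[0]
        W2 = np.linalg.lstsq(W1, X @ np.linalg.pinv(H), rcond=None)[0]
        W1 = np.linalg.lstsq((W2 @ H).T, X.T, rcond=None)[0].T
    return W1, W2, H

X = np.random.default_rng(1).standard_normal((50, 40))
W1, W2, H = deep_mf(X)
print(np.linalg.norm(X - W1 @ W2 @ H))  # reconstruction error after fitting
```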
4. Wu M, Ling H, Bi N, Gao S, Hu Q, Sheng H, Yu J. Visual Tracking with Multiview Trajectory Prediction. IEEE Transactions on Image Processing 2020; 29:8355-8367. [PMID: 32790628] [DOI: 10.1109/tip.2020.3014952]
Abstract
Recent progress in visual tracking has greatly improved tracking performance. However, challenges such as occlusion and view change remain obstacles to real-world deployment. A natural solution to these challenges is to use multiple cameras with multiview inputs, though existing systems are mostly limited to specific targets (e.g., humans), static cameras, and/or require camera calibration. To break through these limitations, we propose a generic multiview tracking (GMT) framework that allows camera movement while requiring neither a specific object model nor camera calibration. A key innovation in our framework is a cross-camera trajectory prediction network (TPN), which implicitly and dynamically encodes camera geometric relations and hence addresses missing-target issues such as occlusion. Moreover, during tracking, we assemble information across cameras to dynamically update a novel collaborative correlation filter (CCF), which is shared among cameras to achieve robustness against view change. The two components are integrated into a correlation filter tracking framework, where features are trained offline using existing single-view tracking datasets. For evaluation, we first contribute a new generic multiview tracking dataset (GMTD) with careful annotations, and then run experiments on the GMTD and CAMPUS datasets. The proposed GMT algorithm shows clear advantages in robustness over state-of-the-art trackers.
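Entry 4 builds both of its components on top of a correlation filter tracking backbone. The sketch below shows a generic single-view MOSSE-style filter update and response step of the kind such frameworks share; it is a stand-in for context only, not the paper's trajectory prediction network or collaborative correlation filter.

```python
import numpy as np

def gaussian_peak(shape, sigma=2.0):
    """Desired filter response: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h / 2) ** 2 + (xs - w / 2) ** 2) / (2 * sigma ** 2))

def update_filter(num, den, patch, response, lr=0.125, eps=1e-5):
    """Running MOSSE-style update of the filter in the Fourier domain."""
    P, G = np.fft.fft2(patch), np.fft.fft2(response)
    num = (1 - lr) * num + lr * G * np.conj(P)
    den = (1 - lr) * den + lr * (P * np.conj(P) + eps)
    return num, den

def respond(num, den, patch):
    """Correlate the filter with a new patch; the peak offset from the
    centre gives the estimated target displacement."""
    return np.real(np.fft.ifft2((num / den) * np.fft.fft2(patch)))

patch = np.random.default_rng(0).standard_normal((32, 32))
num = np.zeros((32, 32), complex)
den = np.full((32, 32), 1e-5, complex)
num, den = update_filter(num, den, patch, gaussian_peak((32, 32)), lr=1.0)
print(np.unravel_index(respond(num, den, patch).argmax(), (32, 32)))  # near (16, 16)
```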
5. Fan B, Cong Y, Tian J, Tang Y. Reliable Multi-kernel Subtask Graph Correlation Tracker. IEEE Transactions on Image Processing 2020; 29:8120-8133. [PMID: 32746242] [DOI: 10.1109/tip.2020.3009883]
Abstract
Many impressive correlation filter trackers pay limited attention to tracking reliability and localization accuracy. To address these issues, we propose a reliable and accurate cross-correlation particle filter tracker via graph-regularized multi-kernel multi-subtask learning. Specifically, multiple non-linear kernels are assigned to multi-channel features with reliable feature selection, each kernel space corresponding to one type of reliable and discriminative feature. We then define the trace of each target subregion under one feature as a single view, and exploit the cooperations and interdependencies among views to jointly learn multi-kernel subtask cross-correlation particle filters that complement and boost each other. The learned filters consist of two complementary parts: a weighted combination of base kernels and a reliable integration of base filters. The former is associated with feature reliability via an importance map, whose weights reflect each feature's contribution to accurate localization. The latter finds the reliable target subtasks via the response map, excluding distractive subtasks and background. Moreover, the proposed tracker constructs a Laplacian graph regularization from the cross-similarity of different subtasks, which not only exploits the intrinsic structure among subtasks and preserves their spatial layout, but also maintains their temporal-spatial consistency. Comprehensive experiments on five datasets demonstrate remarkable and competitive performance against state-of-the-art methods.
6.
Abstract
Deep features extracted from convolutional neural networks have recently been utilized in visual tracking to obtain a generic and semantic representation of target candidates. In this paper, we propose a robust structured tracker using local deep features (STLDF). The tracker exploits the deep features of local patches inside target candidates and sparsely represents them by a set of templates in the particle filter framework. The proposed STLDF utilizes a new optimization model, which employs a group-sparsity regularization term to adopt the local and spatial information of the target candidates and capture the spatial layout structure among them. To solve the optimization model, we propose an efficient and fast numerical algorithm that consists of two subproblems with closed-form solutions. Evaluations in terms of success and precision on challenging benchmark image sequences (e.g., OTB50 and OTB100) demonstrate the superior performance of STLDF against several state-of-the-art trackers.
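The group-sparsity regularization term in entry 6 is the kind of penalty usually minimized through the proximal operator of the l2,1 (group lasso) norm, which zeroes out whole groups (here, whole local patches) at once. Below is a minimal sketch of that generic operator with made-up groups; the paper's full optimization model and its closed-form subproblem solutions are not reproduced.

```python
import numpy as np

def prox_group_l2(x, groups, lam):
    """Proximal operator of lam * sum_g ||x_g||_2 (group sparsity).
    Shrinks each group toward zero and zeroes it out entirely when
    its norm falls below lam, selecting whole groups of coefficients."""
    out = np.zeros_like(x)
    for g in groups:
        norm = np.linalg.norm(x[g])
        if norm > lam:
            out[g] = (1 - lam / norm) * x[g]
    return out

x = np.array([3.0, 4.0, 0.1, -0.1])
print(prox_group_l2(x, [[0, 1], [2, 3]], lam=1.0))
# first group shrunk to [2.4, 3.2]; second group set exactly to zero
```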
7. Chen X, Chen H, Wu H, Huang Y, Yang Y, Zhang W, Xiong P. Robust Visual Ship Tracking with an Ensemble Framework via Multi-View Learning and Wavelet Filter. Sensors 2020; 20:932. [PMID: 32050581] [PMCID: PMC7039392] [DOI: 10.3390/s20030932]
Abstract
Maritime surveillance videos provide crucial on-spot kinematic traffic information (traffic volume, ship speeds, headings, etc.) for various traffic participants (maritime regulation departments, ship crews, ship owners, etc.), which greatly benefits automated maritime situational awareness and maritime safety improvement. Conventional models rely heavily on visual ship features to track ships in maritime image sequences, which may contain arbitrary tracking oscillations. To address this issue, we propose an ensemble ship tracking framework with a multi-view learning algorithm and a wavelet filter model. First, the proposed model samples ship candidates with a particle filter following the sequential importance sampling rule. Second, we propose a multi-view learning algorithm that obtains raw ship tracking results in two steps: extracting a group of distinct ship-contour-relevant features (i.e., Laplacian of Gaussian, local binary pattern, Gabor filter, histogram of oriented gradients, and Canny descriptors) and learning high-level intrinsic ship features by jointly exploiting the underlying relationships shared by each type of contour feature. Third, with the help of the wavelet filter, we perform a data quality control procedure to identify abnormal oscillations in the ship positions, which are then corrected to generate the final ship tracking results. We demonstrate the proposed ship tracker's performance on typical maritime traffic scenarios through four maritime surveillance videos.
Collapse
Affiliation(s)
- Xinqiang Chen, Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306, China
- Huixing Chen, Merchant Marine College, Shanghai Maritime University, Shanghai 201306, China
- Huafeng Wu, Merchant Marine College, Shanghai Maritime University, Shanghai 201306, China (corresponding author)
- Yanguo Huang, School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
- Yongsheng Yang, Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306, China
- Wenhui Zhang, School of Traffic and Transportation, Northeast Forestry University, Harbin 150040, China (corresponding author)
- Pengwen Xiong, School of Information Engineering, Nanchang University, Nanchang 330031, China
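Entry 7's first step samples ship candidates with a particle filter under the sequential importance sampling rule. A minimal predict-weight-resample cycle might look as follows; the Gaussian motion model, the `observe` likelihood callback, and the effective-sample-size test are illustrative assumptions rather than the paper's design.

```python
import numpy as np

def resample(particles, weights, rng):
    """Systematic resampling of a weighted particle set."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx], np.full(n, 1.0 / n)

def particle_step(particles, weights, observe, rng, motion_std=3.0):
    """One predict-weight-resample cycle for 2-D candidate positions;
    `observe` is an assumed caller-supplied appearance likelihood."""
    particles = particles + rng.normal(0.0, motion_std, particles.shape)  # predict
    weights = weights * observe(particles)             # reweight by likelihood
    weights = weights / weights.sum()
    if 1.0 / np.sum(weights ** 2) < len(weights) / 2:  # low effective sample size
        particles, weights = resample(particles, weights, rng)
    return particles, weights

rng = np.random.default_rng(0)
pts, w = rng.normal(size=(100, 2)), np.full(100, 1 / 100)
pts, w = particle_step(pts, w, lambda p: np.exp(-(p ** 2).sum(axis=1) / 50), rng)
```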
8
|
Zheng W, Gou C, Wang FY. A novel approach inspired by optic nerve characteristics for few-shot occluded face recognition. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.09.045] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
9.
10. Li K, Kong Y, Fu Y. Visual Object Tracking via Multi-Stream Deep Similarity Learning Networks. IEEE Transactions on Image Processing 2019; 29:3311-3320. [PMID: 31869790] [DOI: 10.1109/tip.2019.2959249]
Abstract
Visual tracking remains a challenging research problem because of appearance variations of the object over time, changing cluttered backgrounds, and the requirement for real-time speed. In this paper, we investigate real-time accurate tracking in an instance-level tracking-by-verification mechanism. We propose a multi-stream deep similarity learning network to learn a similarity comparison model purely offline. Our loss function encourages the distance between a positive patch and the background patches to be larger than that between the positive patch and the target template. The learned model is then used directly to determine the patch in each frame that is most distinctive from the background context and most similar to the target template. Within the learned feature space, even if the distance between positive patches becomes large due to background clutter, hard distractors from the same class, or appearance changes of the target, our method can still distinguish the target robustly using the relative distance. Besides, we propose a complete framework incorporating recovery from failures and template updating to further improve tracking performance without consuming excessive computing resources. Experiments on visual tracking benchmarks show the effectiveness of the proposed tracker compared with several recent real-time trackers as well as the trackers already included in the benchmarks.
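The loss described in entry 10, which pushes the distance from a positive patch to background patches above the distance from the positive patch to the target template, is a triplet-style hinge loss. A hedged NumPy rendering, with the Euclidean metric and feature shapes assumed purely for illustration:

```python
import numpy as np

def margin_loss(f_pos, f_tmpl, f_bgs, margin=1.0):
    """Hinge loss: push d(positive, background) above d(positive, template)
    by a margin. Shapes: f_pos and f_tmpl are (d,), f_bgs is (k, d)."""
    d_tmpl = np.linalg.norm(f_pos - f_tmpl)          # positive-to-template
    d_bgs = np.linalg.norm(f_bgs - f_pos, axis=1)    # positive-to-backgrounds
    return float(np.maximum(0.0, margin + d_tmpl - d_bgs).mean())

rng = np.random.default_rng(0)
print(margin_loss(rng.normal(size=8), rng.normal(size=8), rng.normal(size=(5, 8))))
```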
11.
12. Zhu G, Zhang Z, Wang J, Wu Y, Lu H. Dynamic Collaborative Tracking. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:3035-3046. [PMID: 32175852] [DOI: 10.1109/tnnls.2018.2861838]
Abstract
Correlation filters have recently demonstrated remarkable success in visual tracking. However, most existing methods suffer model drift caused by factors such as the unlimited boundary effect, heavy occlusion, fast motion, and distracter perturbation. To address this issue, this paper proposes a unified dynamic collaborative tracking framework that performs more flexible and robust position prediction. Specifically, the framework learns the object appearance model by jointly training an objective function with three components: a target regression submodule, a distracter suppression submodule, and a maximum-margin relation submodule. The first submodule mainly takes advantage of the circulant structure of training samples to distinguish the target from its surrounding background. The second submodule optimizes the label response of possible distracting regions toward zero, reducing the peak of the confidence map in those regions. Inspired by structured output support vector machines, the third submodule utilizes the differences between target and distracter appearance representations in a discriminative mapping space to alleviate the disturbance of the most likely hard negative samples. In addition, a CUR filter is embedded as an assistant detector to provide effective object candidates and alleviate the model drift problem. Comprehensive experimental results show that the proposed approach achieves state-of-the-art performance on several public benchmark data sets.
13. Tian X, Li Y, Liu T, Wang X, Tao D. Eigenfunction-Based Multitask Learning in a Reproducing Kernel Hilbert Space. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:1818-1830. [PMID: 30371390] [DOI: 10.1109/tnnls.2018.2873649]
Abstract
Multitask learning aims to improve the performance on related tasks by exploring the interdependence among them. Existing multitask learning methods explore the relatedness among tasks on the basis of the input features and the model parameters. In this paper, we focus on nonparametric multitask learning and propose to measure task relatedness from a novel perspective in a reproducing kernel Hilbert space (RKHS). Past works have shown that the objective function for a given task can be approximated using the top eigenvalues and corresponding eigenfunctions of a predefined integral operator on an RKHS. In our method, we formulate our objective for multitask learning as a linear combination of two sets of eigenfunctions, common eigenfunctions shared by different tasks and unique eigenfunctions in individual tasks, such that the eigenfunctions for one task can provide additional information on another and help to improve its performance. We present both theoretical and empirical validations of our proposed approach. The theoretical analysis demonstrates that our learning algorithm is uniformly argument stable and that the convergence rate of the generalization upper bound can be improved by learning multiple tasks. Experiments on several benchmark multitask learning data sets show that our method yields promising results.
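The objective in entry 13, as the abstract describes it, combines eigenfunctions shared across tasks with eigenfunctions unique to each task. In schematic form (the notation is mine, not the paper's):

```latex
f_t(x) \;=\; \sum_{k=1}^{K} \alpha_k \,\phi_k(x) \;+\; \sum_{j=1}^{J} \beta_{t,j} \,\psi_{t,j}(x)
```

Here the \phi_k are top eigenfunctions of the predefined integral operator shared by all tasks, the \psi_{t,j} are specific to task t, and the coefficients are learned jointly so that shared structure in one task can inform the others.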
14. Li J, Zhang B, Zhang D. Shared Autoencoder Gaussian Process Latent Variable Model for Visual Classification. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:4272-4286. [PMID: 29990089] [DOI: 10.1109/tnnls.2017.2761401]
Abstract
Multiview learning reveals the latent correlation among different modalities and utilizes complementary information to achieve better performance in many applications. In this paper, we propose a novel multiview learning model based on the Gaussian process latent variable model (GPLVM) that learns a set of nonlinear and nonparametric mapping functions and obtains a shared latent variable in the manifold space. Different from previous work on the GPLVM, the proposed shared autoencoder Gaussian process (SAGP) latent variable model assumes an additional mapping from the observed data to the shared manifold space. Owing to the autoencoder framework, nonlinear projections both from and to the observations are considered simultaneously. Additionally, instead of the full connections used in a conventional autoencoder, the SAGP realizes the mappings with GPs, which remarkably reduces the number of estimated parameters and avoids overfitting. To adapt the proposed method for classification, a discriminative regularization is embedded into it. In the optimization process, an efficient algorithm based on the alternating direction method and gradient descent techniques is designed to solve the encoder and decoder parts alternately. Experimental results on three real-world data sets substantiate the effectiveness and superiority of the proposed approach compared with the state of the art.
15. Li Y, Tian X, Liu T, Tao D. On Better Exploring and Exploiting Task Relationships in Multitask Learning: Joint Model and Feature Learning. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1975-1985. [PMID: 28436901] [DOI: 10.1109/tnnls.2017.2690683]
Abstract
Multitask learning (MTL) aims to learn multiple tasks simultaneously through the interdependence between different tasks. How to measure the relatedness between tasks remains an open issue. There are mainly two ways to do so: sharing common parameters and sharing common features across different tasks. However, these two types of relatedness are usually learned independently, leading to a loss of information. In this paper, we propose a new strategy that jointly learns shared parameters and shared feature representations. The objective of our proposed method is to transform the features of different tasks into a common feature space in which the tasks are closely related and the shared parameters can be better optimized. We give a detailed introduction to our proposed MTL method, and an alternating algorithm is introduced to optimize the nonconvex objective. A theoretical bound demonstrates that the relatedness between tasks can be better measured by our proposed algorithm. We conduct various experiments to verify the superiority of the proposed joint model and feature MTL method.
16. Chen Z, You X, Zhong B, Li J, Tao D. Dynamically Modulated Mask Sparse Tracking. IEEE Transactions on Cybernetics 2017; 47:3706-3718. [PMID: 28113386] [DOI: 10.1109/tcyb.2016.2577718]
Abstract
Visual tracking is a critical task in many computer vision applications such as surveillance and robotics. However, although robustness to local corruptions has improved, prevailing trackers remain sensitive to large-scale corruptions such as occlusions and illumination variations. In this paper, we propose a novel robust object tracking technique that depends on a subspace-learning-based appearance model. Our contributions are twofold. First, mask templates produced by frame differencing are introduced into our template dictionary. Since the mask templates contain abundant structural information about corruptions, the model can encode corruptions on the object more efficiently. Meanwhile, the robustness of the tracker is further enhanced by adopting system dynamics, which consider the moving tendency of the object. Second, we provide a theoretical guarantee that, with the modulated template dictionary, our new sparse model can be solved by the accelerated proximal gradient algorithm as efficiently as in traditional sparse tracking methods. Extensive experimental evaluations demonstrate that our method significantly outperforms 21 other cutting-edge algorithms in both speed and tracking accuracy, especially under challenges such as pose variation, occlusion, and illumination changes.
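Entry 16's theoretical claim is that its modulated-dictionary sparse model can still be solved with the accelerated proximal gradient (APG) algorithm. For orientation, here is a generic FISTA-style APG for the plain l1 sparse-coding subproblem min_c 0.5||y - Dc||^2 + lam||c||_1; the mask templates and modulation of the paper are not modeled.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def apg_lasso(D, y, lam=0.1, iters=200):
    """FISTA-style accelerated proximal gradient for the lasso."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    c = z = np.zeros(D.shape[1])
    t = 1.0
    for _ in range(iters):
        g = D.T @ (D @ z - y)              # gradient of the smooth part
        c_new = soft(z - g / L, lam / L)   # proximal (soft-threshold) step
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = c_new + ((t - 1) / t_new) * (c_new - c)   # Nesterov momentum
        c, t = c_new, t_new
    return c

rng = np.random.default_rng(0)
D, y = rng.standard_normal((30, 60)), rng.standard_normal(30)
print(np.count_nonzero(apg_lasso(D, y)))  # sparse coefficient vector
```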
17. Zhang S, Lan X, Yao H, Zhou H, Tao D, Li X. A Biologically Inspired Appearance Model for Robust Visual Tracking. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:2357-2370. [PMID: 27448375] [DOI: 10.1109/tnnls.2016.2586194]
Abstract
In this paper, we propose a biologically inspired appearance model for robust visual tracking. Motivated in part by the success of the hierarchical organization of the primary visual cortex (area V1), we establish an architecture consisting of five layers: whitening, rectification, normalization, coding, and pooling. The first three layers stem from the models developed for object recognition. In this paper, our attention focuses on the coding and pooling layers. In particular, we use a discriminative sparse coding method in the coding layer along with spatial pyramid representation in the pooling layer, which makes it easier to distinguish the target to be tracked from its background in the presence of appearance variations. An extensive experimental study shows that the proposed method has higher tracking accuracy than several state-of-the-art trackers.
18. Nie L, Zhang L, Meng L, Song X, Chang X, Li X. Modeling Disease Progression via Multisource Multitask Learners: A Case Study With Alzheimer's Disease. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:1508-1519. [PMID: 26929064] [DOI: 10.1109/tnnls.2016.2520964]
Abstract
Understanding the progression of chronic diseases can empower sufferers to take proactive care. Various machine learning approaches have been proposed to predict disease status at future time points, yet few of them jointly consider the dual heterogeneities of chronic disease progression: the prediction task at each time point has features from multiple sources, and the tasks are related to each other in chronological order. To tackle this problem, we propose a novel and unified scheme to co-regularize the prior knowledge of source consistency and temporal smoothness. We theoretically prove that our proposed model is a linear model. Before training, we adopt a matrix factorization approach to address the missing-data problem. Extensive evaluations on a real-world Alzheimer's disease data set demonstrate the effectiveness and efficiency of our model. It is worth mentioning that our model is generally applicable to a wide range of chronic diseases.
19. Yang Y, Hu W, Xie Y, Zhang W, Zhang T. Temporal Restricted Visual Tracking via Reverse-Low-Rank Sparse Learning. IEEE Transactions on Cybernetics 2017; 47:485-498. [PMID: 27046920] [DOI: 10.1109/tcyb.2016.2519532]
Abstract
An effective representation model, which aims to mine the most meaningful information in the data, plays an important role in visual tracking. Some recent particle-filter-based trackers achieve promising results by introducing a low-rank assumption into the representation model. However, the assumed low-rank structure of candidates limits robustness under severe challenges such as abrupt motion. To avoid this limitation, we propose a temporally restricted reverse-low-rank learning algorithm for visual tracking with the following advantages: 1) the reverse-low-rank model jointly represents target and background templates via candidates, which exploits the low-rank structure among consecutive target observations and enforces the temporal consistency of the target at a global level; 2) since appearance consistency may be broken when the target undergoes sudden changes, we propose a local constraint via an l1,2 mixed norm, which not only ensures the local consistency of target appearance but also tolerates sudden changes between adjacent frames; and 3) to alleviate the interference of unreasonable representation values due to outlier candidates, an adaptive weighting scheme is designed to improve the robustness of the tracker. Evaluations on 26 challenging video sequences show the effectiveness and favorable performance of the proposed algorithm against 12 state-of-the-art visual trackers.
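Low-rank representation models like the one in entry 19 are typically optimized with a singular value thresholding (SVT) step, the proximal operator of the nuclear norm. A minimal version is sketched below as background only; the paper's reverse-low-rank formulation and l1,2 constraint are not reproduced.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * (nuclear norm).
    Shrinks singular values, driving M toward a low-rank matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
M = np.outer([1.0, 2.0, 3.0], [1.0, 0.5]) + 0.01 * rng.standard_normal((3, 2))
print(np.linalg.matrix_rank(svt(M, tau=0.1)))  # small noise component thresholded away
```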
20. He Z, Li X, You X, Tao D, Tang YY. Connected Component Model for Multi-Object Tracking. IEEE Transactions on Image Processing 2016; 25:3698-3711. [PMID: 27214900] [DOI: 10.1109/tip.2016.2570553]
Abstract
In multi-object tracking, it is critical to explore data associations by exploiting the temporal information from a sequence of frames rather than only the information from two adjacent frames. Since straightforwardly obtaining data associations over multiple frames is an NP-hard multi-dimensional assignment (MDA) problem, most existing methods solve it either by developing complicated approximate algorithms or by simplifying MDA to a 2D assignment problem based only on information extracted from adjacent frames. In this paper, we show that the relation between associations of two observations is an equivalence relation in the data association problem, based on the spatial-temporal constraint that the trajectories of different objects must be disjoint. Therefore, the MDA problem can be equivalently divided into independent subproblems by equivalence partitioning. In contrast to existing work, we develop a connected component model (CCM) that exploits the constraints of data association and the equivalence relation on those constraints. Based upon the CCM, we efficiently obtain the global solution of the MDA problem for multi-object tracking by optimizing a sequence of independent data association subproblems. Experiments on challenging public data sets demonstrate that our algorithm outperforms state-of-the-art approaches.
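Entry 20's equivalence partitioning amounts to computing connected components over observations linked by feasible associations, so that each component becomes an independent subproblem. A union-find sketch of that decomposition step (the association constraints themselves are the paper's contribution and are not modeled here):

```python
class DisjointSet:
    """Union-find over observation indices."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

# Observations linked by any feasible association share a component,
# so each component is an independent association subproblem.
ds = DisjointSet(6)
for a, b in [(0, 1), (1, 2), (4, 5)]:
    ds.union(a, b)
print([ds.find(i) for i in range(6)])  # components {0,1,2}, {3}, {4,5}
```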
21. Cai B, Xu X, Xing X, Jia K, Miao J, Tao D. BIT: Biologically Inspired Tracker. IEEE Transactions on Image Processing 2016; 25:1327-1339. [PMID: 26800541] [DOI: 10.1109/tip.2016.2520358]
Abstract
Visual tracking is challenging due to image variations caused by factors such as object deformation, scale change, illumination change, and occlusion. Given the superior tracking performance of the human visual system (HVS), a well-designed biologically inspired model is expected to improve computer visual tracking. This is, however, a difficult task owing to the incomplete understanding of how neurons in the HVS work. This paper addresses the challenge by analyzing the visual cognitive mechanism of the ventral stream in the visual cortex: it simulates shallow neurons (S1 and C1 units) to extract low-level biologically inspired features for the target appearance and imitates an advanced learning mechanism (S2 and C2 units) to combine generative and discriminative models for target location. In addition, fast Gabor approximation and the fast Fourier transform are adopted for real-time learning and detection in this framework. Extensive experiments on large-scale benchmark data sets show that the proposed biologically inspired tracker performs favorably against state-of-the-art methods in terms of efficiency, accuracy, and robustness; the acceleration techniques in particular keep it running at approximately 45 frames/s.
22. Lan X, Ma AJ, Yuen PC, Chellappa R. Joint Sparse Representation and Robust Feature-Level Fusion for Multi-Cue Visual Tracking. IEEE Transactions on Image Processing 2015; 24:5826-5841. [PMID: 26415172] [DOI: 10.1109/tip.2015.2481325]
Abstract
Visual tracking using multiple features has proved to be a robust approach because features can complement each other. Since different types of variations, such as illumination, occlusion, and pose changes, may occur in a video sequence, especially in long sequences, how to properly select and fuse appropriate features has become one of the key problems in this approach. To address this issue, this paper proposes a new joint sparse representation model for robust feature-level fusion. The proposed method dynamically removes unreliable features from the fusion by exploiting the advantages of sparse representation. To capture the non-linear similarity of features, we extend the method into a general kernelized framework, which is able to perform feature fusion in various kernel spaces. As a result, robust tracking performance is obtained. Both qualitative and quantitative experimental results on publicly available videos show that the proposed method outperforms both sparse-representation-based and fusion-based trackers.