1. Yuan D, Chang X, Liu Q, Yang Y, Wang D, Shu M, He Z, Shi G. Active Learning for Deep Visual Tracking. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13284-13296. [PMID: 37163401] [DOI: 10.1109/tnnls.2023.3266837]
Abstract
Convolutional neural networks (CNNs) have been successfully applied to the single-target tracking task in recent years. Generally, training a deep CNN model requires numerous labeled training samples, and the number and quality of these samples directly affect the representational capability of the trained model. However, this requirement is restrictive in practice, because manually labeling such a large number of training samples is time-consuming and prohibitively expensive. In this article, we propose an active learning method for deep visual tracking, which selects and annotates unlabeled samples to train the deep CNN model. Under the guidance of active learning, the tracker based on the trained deep CNN model can achieve competitive tracking performance while reducing the labeling cost. More specifically, to ensure the diversity of the selected samples, we propose an active learning method based on multiframe collaboration to select the training samples that most need to be annotated. Meanwhile, considering the representativeness of these selected samples, we adopt a nearest-neighbor discrimination method based on the average nearest-neighbor distance to screen out isolated and low-quality samples. Therefore, the subset of training samples selected by our method requires only a given budget to maintain the diversity and representativeness of the entire sample set. Furthermore, we adopt a Tversky loss to improve the bounding box estimation of our tracker, which helps the tracker achieve more accurate target states. Extensive experimental results confirm that our active-learning-based tracker (ALT) achieves competitive tracking accuracy and speed compared with state-of-the-art trackers on the seven most challenging evaluation benchmarks. Project website: https://sites.google.com/view/altrack/.
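The Tversky loss mentioned above generalizes the Dice loss by weighting false positives and false negatives differently. A minimal NumPy sketch of the idea (the alpha/beta weights and the mask-based formulation are illustrative assumptions, not the authors' exact implementation):

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    """Tversky loss between a predicted soft mask and a binary target mask.

    pred, target: arrays in [0, 1] of the same shape.
    alpha / beta: weights on false positives / false negatives.
    """
    tp = np.sum(pred * target)           # true positives
    fp = np.sum(pred * (1.0 - target))   # false positives
    fn = np.sum((1.0 - pred) * target)   # false negatives
    tversky_index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky_index
```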
2. Fu Y, Li M, Liu W, Wang Y, Zhang J, Yin B, Wei X, Yang X. Distractor-Aware Event-Based Tracking. IEEE Transactions on Image Processing 2023; 32:6129-6141. [PMID: 37889807] [DOI: 10.1109/tip.2023.3326683]
Abstract
Event cameras, or dynamic vision sensors, have recently achieved success in tasks ranging from fundamental vision problems to high-level vision research. Because they asynchronously capture changes in light intensity, event cameras have an inherent advantage in capturing moving objects in challenging scenarios, including low light, high dynamic range, and fast motion. Event cameras are therefore a natural fit for visual object tracking. However, current event-based trackers derived from RGB trackers simply replace the input images with event frames and still follow a conventional tracking pipeline that relies mainly on object texture to distinguish the target. As a result, these trackers may not be robust in challenging scenarios such as moving cameras and cluttered foregrounds. In this paper, we propose a distractor-aware event-based tracker that introduces transformer modules into a Siamese network architecture (named DANet). Specifically, our model is mainly composed of a motion-aware network and a target-aware network, which jointly exploit motion cues and object contours from event data to discover moving objects and identify the target by removing dynamic distractors. Our DANet can be trained in an end-to-end manner without any post-processing and can run at over 80 FPS on a single V100. We conduct comprehensive experiments on two large event tracking datasets to validate the proposed model. We demonstrate that our tracker outperforms state-of-the-art trackers in terms of both accuracy and efficiency.
3. Chen H, Wang Z, Tian H, Yuan L, Wang X, Leng P. A Robust Visual Tracking Method Based on Reconstruction Patch Transformer Tracking. Sensors 2022; 22:6558. [PMID: 36081017] [PMCID: PMC9460596] [DOI: 10.3390/s22176558]
Abstract
Recently, the transformer model has moved from visual classification to target tracking, where it is mainly used to replace the cross-correlation operation in Siamese trackers while the backbone remains a convolutional neural network (CNN). However, existing transformer-based trackers simply reshape the features extracted by the CNN into patches and feed them into the transformer encoder; each patch contains a single element of the spatial dimension of the extracted features, and cross-attention is used in place of cross-correlation. This paper proposes a reconstruction patch strategy that combines multiple spatial elements of the extracted features into a new patch. The reconstruction operation has the following advantages: (1) the correlation between adjacent elements is preserved, and the features extracted by the CNN remain usable for classification and regression; (2) the Performer operation reduces the amount of computation and the dimension of the patches sent to the transformer, thereby sharply reducing the number of network parameters and improving tracking speed.
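One way to read the reconstruction patch strategy is as grouping each k x k spatial neighborhood of the CNN feature map into a single token, so tokens become fewer but higher-dimensional. A minimal NumPy sketch of that grouping (the patch size and layout are illustrative assumptions, not the paper's exact scheme):

```python
import numpy as np

def reconstruct_patches(feat, k=2):
    """Group k x k spatial neighborhoods of a (C, H, W) feature map into tokens.

    Returns an array of shape (H//k * W//k, C * k * k), i.e. fewer but
    higher-dimensional patches than the usual one-pixel-per-token flattening.
    """
    c, h, w = feat.shape
    assert h % k == 0 and w % k == 0, "spatial size must be divisible by k"
    x = feat.reshape(c, h // k, k, w // k, k)
    x = x.transpose(1, 3, 0, 2, 4)              # (H//k, W//k, C, k, k)
    return x.reshape((h // k) * (w // k), c * k * k)

tokens = reconstruct_patches(np.random.rand(256, 16, 16), k=2)  # shape (64, 1024)
```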
Affiliation(s)
- Hui Chen, College of Information Science and Engineering, Linyi University, Linyi 276000, China
- Zhenhai Wang, College of Information Science and Engineering, Linyi University, Linyi 276000, China
- Hongyu Tian, School of Physics and Electronic Engineering, Linyi University, Linyi 276005, China
- Lutao Yuan, College of Information Science and Engineering, Linyi University, Linyi 276000, China
- Xing Wang, College of Information Science and Engineering, Linyi University, Linyi 276000, China
- Peng Leng, Shandong (Linyi) Modern Agricultural Research Institute, Zhejiang University, Linyi 276000, China
4. Zhou Y, Du X, Wang M, Huo S, Zhang Y, Kung SY. Cross-Scale Residual Network: A General Framework for Image Super-Resolution, Denoising, and Deblocking. IEEE Transactions on Cybernetics 2022; 52:5855-5867. [PMID: 33531310] [DOI: 10.1109/tcyb.2020.3044374]
Abstract
In general, image restoration involves mapping from low-quality images to their high-quality counterparts. Such an optimal mapping is usually nonlinear and learnable by machine learning. Recently, deep convolutional neural networks have proven promising for learning such mappings. It is desirable for an image processing network to support three vital tasks well, namely: 1) super-resolution; 2) denoising; and 3) deblocking. It is commonly recognized that these tasks have strong correlations, which enables us to design a general framework to support all three tasks. In particular, the selection of feature scales is known to significantly impact performance on these tasks. To this end, we propose the cross-scale residual network to exploit scale-related features among the three tasks. The proposed network can extract spatial features across different scales and establish cross-temporal feature reuse, so as to handle different tasks in a general framework. Our experiments show that the proposed approach outperforms state-of-the-art methods in both quantitative and qualitative evaluations for multiple image restoration tasks.
5. Face Recognition and Tracking Framework for Human–Robot Interaction. Applied Sciences 2022. [DOI: 10.3390/app12115568]
Abstract
Recently, face recognition has become a key element of social cognition and is used in various applications, including human–robot interaction (HRI), pedestrian identification, and surveillance systems. Deep convolutional neural networks (CNNs) have achieved notable progress in recognizing faces. However, achieving accurate and real-time face recognition is still a challenging problem, especially in unconstrained environments, due to occlusion, lighting conditions, and the diversity of head poses. In this paper, we present a robust face recognition and tracking framework for unconstrained settings. We developed our framework based on lightweight CNNs for all face recognition stages, including face detection, alignment, and feature extraction, to achieve higher accuracy in these challenging circumstances while maintaining the real-time capabilities required for HRI systems. To maintain accuracy, single-shot multi-level face localization in the wild (RetinaFace) is utilized for face detection, and additive angular margin loss (ArcFace) is employed for recognition. For further enhancement, we introduce a face tracking algorithm that combines the information from tracked faces with the recognized identity for use in subsequent frames. This tracking algorithm improves the overall processing time and accuracy. The performance of the proposed system is tested in real-time experiments within an HRI study. Our proposed framework achieves real-time performance with average precision, recall, and F-score of 99%, 95%, and 97%, respectively. In addition, we implemented our system as a modular ROS package that makes it straightforward to integrate into different real-world HRI systems.
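The recognition step described above amounts to comparing a face embedding against a gallery of enrolled identities. A minimal sketch of cosine-similarity matching (the embedding extractor and the 512-dimensional size are assumptions; ArcFace-style embeddings are typically compared this way):

```python
import numpy as np

def identify(embedding, gallery, names, threshold=0.5):
    """Match one L2-normalized face embedding against a gallery.

    gallery: (N, D) array of enrolled, L2-normalized embeddings.
    names:   list of N identity labels.
    Returns (name, score), or ("unknown", score) if the best score is below the threshold.
    """
    sims = gallery @ embedding            # cosine similarity for unit vectors
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return names[best], float(sims[best])
    return "unknown", float(sims[best])

# Toy usage with random unit vectors standing in for ArcFace embeddings.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(3, 512))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
query = gallery[1] + 0.05 * rng.normal(size=512)
query /= np.linalg.norm(query)
print(identify(query, gallery, ["alice", "bob", "carol"]))
```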
6. He X, Chen CYC. Exploring reliable visual tracking via target embedding network. Knowledge-Based Systems 2022. [DOI: 10.1016/j.knosys.2022.108584]
7. Yuan D, Chang X, Huang PY, Liu Q, He Z. Self-Supervised Deep Correlation Tracking. IEEE Transactions on Image Processing 2020; 30:976-985. [PMID: 33259298] [DOI: 10.1109/tip.2020.3037518]
Abstract
The training of a feature extraction network typically requires abundant manually annotated training samples, making this a time-consuming and costly process. Accordingly, we propose an effective self-supervised learning-based tracker in a deep correlation framework (named self-SDCT). Motivated by the forward-backward tracking consistency of a robust tracker, we propose a multi-cycle consistency loss as self-supervised information for learning a feature extraction network from adjacent video frames. At the training stage, we generate pseudo-labels of consecutive video frames by forward-backward prediction under a Siamese correlation tracking framework and utilize the proposed multi-cycle consistency loss to learn a feature extraction network. Furthermore, we propose a similarity dropout strategy that discards low-quality training sample pairs and adopt a cycle trajectory consistency loss for each sample pair to improve the training loss function. At the tracking stage, we employ the pre-trained feature extraction network to extract features and utilize a Siamese correlation tracking framework to locate the target using forward tracking alone. Extensive experimental results indicate that the proposed self-supervised deep correlation tracker (self-SDCT) achieves competitive tracking performance compared with state-of-the-art supervised and unsupervised tracking methods on standard evaluation benchmarks.
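The forward-backward consistency idea underlying the multi-cycle loss can be stated compactly: track forward over a few frames, track back to the first frame, and penalize the distance between the starting position and the back-tracked position. A minimal sketch under the assumption that a tracker exposes a step(pos, frame_from, frame_to) function (a hypothetical interface, not the paper's code):

```python
import numpy as np

def cycle_consistency_loss(track_step, frames, start_pos):
    """Forward-backward (cycle) consistency over a short frame sequence.

    track_step(pos, f_from, f_to) -> new (x, y) position; assumed interface.
    frames: list of consecutive frames; start_pos: (x, y) in frames[0].
    """
    pos = np.asarray(start_pos, dtype=float)
    # Forward pass through the sequence.
    for a, b in zip(frames[:-1], frames[1:]):
        pos = track_step(pos, a, b)
    # Backward pass from the last frame to the first.
    rev = frames[::-1]
    for a, b in zip(rev[:-1], rev[1:]):
        pos = track_step(pos, a, b)
    # Consistency: the back-tracked position should match where we started.
    return float(np.sum((pos - np.asarray(start_pos, dtype=float)) ** 2))
```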
8. Zheng L, Chen Y, Tang M, Wang J, Lu H. Siamese Deformable Cross-Correlation Network for Real-Time Visual Tracking. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.02.080]
9. Real-Time Object Tracking via Adaptive Correlation Filters. Sensors 2020; 20:4124. [PMID: 32722140] [PMCID: PMC7435421] [DOI: 10.3390/s20154124]
Abstract
Although correlation filter-based trackers (CFTs) have achieved strong robustness and accuracy, their performance can still be improved, because most existing trackers use either a single filter template or fixed feature-fusion weights to represent a target. Herein, a real-time dual-template CFT for various challenging scenarios is proposed. First, color histogram, histogram of oriented gradients (HOG), and color naming (CN) features are extracted from the target image patch. Then, the dual template is utilized based on the target response confidence. Meanwhile, to handle the various appearance variations in complicated scenarios, a discriminative appearance model, multi-peak target re-detection, and scale adaptation are integrated into the proposed tracker. Furthermore, the problem that the filter model may drift or even become corrupted is addressed by a high-confidence template-updating technique. In the experiments, 27 existing competitors, including 16 handcrafted-feature-based trackers (HFTs) and 11 deep-feature-based trackers (DFTs), are compared in a comprehensive contrastive analysis on four benchmark databases. The experimental results demonstrate that the proposed tracker performs favorably against state-of-the-art HFTs and is comparable with the DFTs.
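At the core of any CFT is learning a filter in the Fourier domain and locating the target at the peak of the correlation response. A minimal single-channel, MOSSE-style sketch (the regularization value and the Gaussian-shaped label are illustrative assumptions):

```python
import numpy as np

def train_filter(patch, label, lam=1e-2):
    """Closed-form single-channel correlation filter in the Fourier domain."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(label)                      # desired Gaussian-shaped response
    return (np.conj(F) * G) / (np.conj(F) * F + lam)

def detect(filter_hat, patch):
    """Correlation response for a new search patch; the peak gives the target shift."""
    response = np.real(np.fft.ifft2(filter_hat * np.fft.fft2(patch)))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return response, (dy, dx)
```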
10. Zheng Y, Song H, Zhang K, Fan J, Liu X. Dynamically Spatiotemporal Regularized Correlation Tracking. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2336-2347. [PMID: 31443056] [DOI: 10.1109/tnnls.2019.2929407]
Abstract
Recently, owing to its high performance, the spatially regularized strategy has been widely applied to address the boundary effects that exist in correlation filter (CF)-based visual tracking. Specifically, it introduces a spatial regularization term that penalizes the coefficients of the CFs depending on their spatial locations. However, the regularization weights are often a fixed Gaussian function, and the inflexible constraints they place on the ever-changing CFs learned over time may cause the model to degenerate. To address this issue, in this paper, we develop a dynamically spatiotemporal regularization model that constrains the CFs with regularization weights learned from two consecutive frames. The proposed method jointly learns the CFs along with the dynamically spatiotemporal constraint term, which can be efficiently solved in the Fourier domain by the alternating direction method. Extensive evaluations on the popular data sets OTB-100 and VOT-2016 demonstrate that the proposed tracker performs favorably against the baseline tracker and several recently proposed state-of-the-art methods.
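In SRDCF-style formulations, the spatially (and here temporally) regularized objective has roughly the following form; this is a schematic reconstruction rather than the paper's exact notation, with w_t the frame-dependent spatial weights and the last term tying the filter to its previous value:

```latex
\min_{f_t}\; \Big\| \sum_{d=1}^{D} x_t^{d} \ast f_t^{d} - y \Big\|_2^2
\;+\; \lambda_1 \sum_{d=1}^{D} \big\| w_t \odot f_t^{d} \big\|_2^2
\;+\; \lambda_2 \big\| f_t - f_{t-1} \big\|_2^2
```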
11.
Abstract
An important area of computer vision is real-time object tracking, which is now widely used in intelligent transportation and smart industry technologies. Although correlation filter object tracking methods achieve good real-time performance, they still face many challenges such as scale variation, occlusion, and boundary effects. Many scholars have continuously improved existing methods for better efficiency and tracking performance in some aspects. To provide a comprehensive understanding of the background, key technologies, and algorithms of single object tracking, this article focuses on correlation filter-based object tracking algorithms. Specifically, the background and current advances in object tracking methodologies are introduced, and the main datasets are presented. Various methods are summarized together with their tracking results on different vision problems, and a reliability-based visual tracking method is discussed.
12. Mechanical System and Template-Matching-Based Position-Measuring Method for Automatic Spool Positioning and Loading in Welding Wire Winding. Applied Sciences 2020. [DOI: 10.3390/app10113762]
Abstract
Welding wire is a major type of welding consumable, which needs to be wound onto spools for sale. Currently, the winding process is accomplished manually owing to obstacles in automatic spool loading and clamping. When loading the spool, the angular position of the spool is a prerequisite for matching the drive rod on the spindle with the drive bore on the spool. Therefore, this paper proposes a template-matching method that combines area-based matching and feature-point detection to measure the angular position of the spool, and presents a mechanical system that can rotate the spool to match the drive rod and push the spool onto the spindle. A novel feature-point distribution density (FPDD) method was developed to accelerate the matching process and improve matching reliability by pre-locating the search area. The robustness and accuracy of the template-matching-based measuring method were validated using a prototype of the mechanical system. The comparison results show that the proposed method is superior in robustness, accuracy, and speed and is effective for automatic spool loading in the welding wire winding process.
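Area-based template matching of the kind described above is commonly done with normalized cross-correlation; a minimal OpenCV sketch (the file names are placeholders, and in practice the search window would first be narrowed by the FPDD step):

```python
import cv2

# Hypothetical file names; the search window would normally be pre-located
# by the feature-point distribution density (FPDD) step.
scene = cv2.imread("spool_image.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("drive_bore_template.png", cv2.IMREAD_GRAYSCALE)

# Normalized cross-correlation over the (pre-located) search area.
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

h, w = template.shape
print("match score:", max_val, "top-left corner:", max_loc, "size:", (w, h))
```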
13. Wang N, Zhou W, Song Y, Ma C, Li H. Real-Time Correlation Tracking via Joint Model Compression and Transfer. IEEE Transactions on Image Processing 2020; 29:6123-6135. [PMID: 32356748] [DOI: 10.1109/tip.2020.2989544]
Abstract
Correlation filters (CF) have received considerable attention in visual tracking because of their computational efficiency. Leveraging deep features via off-the-shelf CNN models (e.g., VGG), CF trackers achieve state-of-the-art performance while consuming a large amount of computing resources. This prevents deep CF trackers from being deployed on many mobile platforms where only a single-core CPU is available. In this paper, we propose to jointly compress and transfer off-the-shelf CNN models within a knowledge distillation framework. We formulate a CNN model pretrained on the image classification task as a teacher network, and distill this teacher network into a lightweight student network that serves as the feature extractor to speed up CF trackers. In the distillation process, we propose a fidelity loss to enable the student network to maintain the representation capability of the teacher network. Meanwhile, we design a tracking loss to adapt the objective of the student network from object recognition to visual tracking. The distillation is performed offline on multiple layers, and the student network is then adaptively updated using a background-aware online learning scheme. The online adaptation stage exploits the background contents to improve the feature discrimination of the student network. Extensive experiments on six standard datasets demonstrate that the lightweight student network accelerates state-of-the-art deep CF trackers to real-time speed on a single-core CPU while maintaining almost the same tracking accuracy.
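The two distillation terms described above can be sketched as a feature-matching (fidelity) term plus a task-specific tracking term on the response map. A minimal PyTorch sketch (the loss weights and the use of plain MSE for both terms are assumptions, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat, student_response, gt_response,
                      w_fid=1.0, w_track=1.0):
    """Fidelity loss (match teacher features) + tracking loss (match desired response)."""
    fidelity = F.mse_loss(student_feat, teacher_feat.detach())   # keep teacher fixed
    tracking = F.mse_loss(student_response, gt_response)         # e.g. Gaussian label map
    return w_fid * fidelity + w_track * tracking

# Toy shapes: batch of 2, 64-channel features, 31x31 response maps.
s_feat, t_feat = torch.randn(2, 64, 31, 31), torch.randn(2, 64, 31, 31)
s_resp, g_resp = torch.randn(2, 1, 31, 31), torch.randn(2, 1, 31, 31)
print(distillation_loss(s_feat, t_feat, s_resp, g_resp))
```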
14. A Novel Changing Athlete Body Real-Time Visual Tracking Algorithm Based on Distractor-Aware SiamRPN and HOG-SVM. Electronics 2020. [DOI: 10.3390/electronics9020378]
Abstract
Athlete detection in sports videos is a challenging task due to the dynamic and cluttered background. Distractor-aware SiamRPN (DaSiamRPN) has a simple network structure and can be utilized for long-term tracking on large data sets. However, as in other Siamese networks, the tracking results rely heavily on the position given in the initial frame. Hence, solutions are lacking for some complex tracking scenarios, such as the running and changing body poses of athletes, especially in the transition from squatting to standing to running. A Haar-feature-based cascade classifier is employed to catch the key frame, i.e., the video frame with the most dramatic changes in the athlete's pose. DaSiamRPN is implemented as the tracking method. In each frame after the key frame, a detection window is given based on the bounding box generated by the DaSiamRPN tracker. In the new detection window, a fusion method (HOG-SVM) combining histogram of oriented gradients (HOG) features with a linear support vector machine (SVM) is proposed for detecting the athlete, and the tracking results are updated in real time by fusing the outputs of DaSiamRPN and HOG-SVM. The proposed method achieves stable and accurate tracking on men's 100 m video sequences and runs in real time.
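The HOG-SVM stage is essentially a binary classifier over HOG descriptors of candidate windows. A minimal scikit-image / scikit-learn sketch (the window size, HOG parameters, and training data are placeholders, not the paper's configuration):

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(window):
    """HOG feature vector for a fixed-size grayscale window (here 128x64)."""
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Toy training data: random windows standing in for athlete / background crops.
rng = np.random.default_rng(0)
pos = [rng.random((128, 64)) for _ in range(10)]   # "athlete" windows
neg = [rng.random((128, 64)) for _ in range(10)]   # "background" windows
X = np.array([hog_descriptor(w) for w in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

clf = LinearSVC().fit(X, y)
score = clf.decision_function([hog_descriptor(rng.random((128, 64)))])
print("SVM score for a candidate window:", score[0])
```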
15. Ge S, Luo Z, Zhang C, Hua Y, Tao D. Distilling Channels for Efficient Deep Tracking. IEEE Transactions on Image Processing 2019; 29:2610-2621. [PMID: 31714225] [DOI: 10.1109/tip.2019.2950508]
Abstract
Deep trackers have proven successful in visual tracking. Typically, these trackers employ optimally pre-trained deep networks to represent all diverse objects with multi-channel features from some fixed layers. The networks employed are usually trained to extract rich knowledge from massive data used in object classification, so they are capable of representing generic objects very well. However, these networks are too complex to represent a specific moving object, leading to poor generalization as well as high computational and memory costs. This paper presents a novel and general framework termed channel distillation to facilitate deep trackers. To validate the effectiveness of channel distillation, we take the discriminative correlation filter (DCF) and ECO trackers as examples. We demonstrate that an integrated formulation can turn feature compression, response map generation, and model update into a unified energy minimization problem that adaptively selects informative feature channels to improve the efficacy of tracking moving objects on the fly. Channel distillation can accurately extract good channels, alleviating the influence of noisy channels and generally reducing the number of channels, as well as adaptively generalizing to different channels and networks. The resulting deep tracker is accurate, fast, and has low memory requirements. Extensive experimental evaluations on popular benchmarks clearly demonstrate the effectiveness and generalizability of our framework.
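A simple proxy for "selecting informative feature channels" is to rank channels by how much energy they contribute inside the target region and keep the top k. This sketch only illustrates the idea; it is not the paper's energy-minimization formulation:

```python
import numpy as np

def select_channels(features, target_mask, k=64):
    """Keep the k channels with the highest energy inside the target region.

    features: (C, H, W) deep feature map; target_mask: (H, W) binary mask.
    Returns the indices of the selected channels.
    """
    energy = (features ** 2 * target_mask[None]).sum(axis=(1, 2))
    return np.argsort(energy)[::-1][:k]

feat = np.random.rand(512, 31, 31)
mask = np.zeros((31, 31)); mask[10:20, 10:20] = 1.0
print(select_channels(feat, mask, k=8))
```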
16. Xu T, Feng ZH, Wu XJ, Kittler J. Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Object Tracking. IEEE Transactions on Image Processing 2019; 28:5596-5609. [PMID: 31170074] [DOI: 10.1109/tip.2019.2919201]
Abstract
With efficient appearance learning models, discriminative correlation filter (DCF) has been proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistent constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by the lasso regularization. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Last, a unified optimization framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123, and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches.
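The joint objective sketched in this abstract pairs a lasso-style sparsity term on the spatial filter support with a temporal term keeping the filter near its historical value; schematically (a reconstruction of the general form, not the paper's exact equation):

```latex
\min_{f_t}\; \Big\| \sum_{d=1}^{D} x_t^{d} \ast f_t^{d} - y \Big\|_2^2
\;+\; \lambda_1 \sum_{d=1}^{D} \big\| f_t^{d} \big\|_1
\;+\; \lambda_2 \big\| f_t - f_{t-1} \big\|_2^2
```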
17. Teng Z, Xing J, Wang Q, Zhang B, Fan J. Deep Spatial and Temporal Network for Robust Visual Object Tracking. IEEE Transactions on Image Processing 2019; 29:1762-1775. [PMID: 31562088] [DOI: 10.1109/tip.2019.2942502]
Abstract
There are two key components that can be leveraged for visual tracking: (a) object appearances; and (b) object motions. Many existing techniques have recently employed deep learning to enhance visual tracking due to its superior representation power and strong learning ability; most of them exploit object appearance, but few exploit object motion. In this work, a deep spatial and temporal network (DSTN) is developed for visual tracking by explicitly exploiting both the object representations from each frame and their dynamics along multiple frames in a video, such that it can seamlessly integrate the object appearances with their motions to produce compact object appearances and capture their temporal variations effectively. Our DSTN method, which is deployed into a tracking pipeline in a coarse-to-fine form, can perceive subtle spatial and temporal variations of the target (the object being tracked), and thus benefits from both offline training and online fine-tuning. We have conducted experiments on four of the largest tracking benchmarks, including OTB-2013, OTB-2015, VOT2015, and VOT2017, and the results demonstrate that our DSTN method achieves competitive performance compared with state-of-the-art techniques. The source code, trained models, and all the experimental results of this work will be made publicly available to facilitate further studies of this problem.
18.
Abstract
Kernel correlation filters (KCF) demonstrate significant potential in visual object tracking by employing robust descriptors. Proper selection of color and texture features can provide robustness against appearance variations. However, the use of multiple descriptors leads to a considerable feature dimension. In this paper, we propose a novel low-rank descriptor that provides better precision and success rates than state-of-the-art trackers. We accomplished this by concatenating the magnitude component of the Overlapped Multi-oriented Tri-scale Local Binary Pattern (OMTLBP), Robustness-Driven Hybrid Descriptor (RDHD), Histogram of Oriented Gradients (HoG), and Color Naming (CN) features. We reduced the rank of our proposed multi-channel feature to reduce the computational complexity. We formulated the Support Vector Machine (SVM) model by utilizing the circulant matrix of our proposed feature vector in the kernel correlation filter. The use of the discrete Fourier transform in the iterative learning of the SVM reduces the computational complexity of the proposed visual tracking algorithm. Extensive experimental results on the Visual Tracker Benchmark dataset show better accuracy than other state-of-the-art trackers.
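The circulant-matrix trick referenced above lets the (kernel) ridge regression over all cyclic shifts of the training patch be solved element-wise in the Fourier domain. A minimal single-channel, linear-kernel KCF-style sketch (the regularization value and the Gaussian-shaped label are illustrative assumptions):

```python
import numpy as np

def kcf_train(x, y, lam=1e-4):
    """Dual coefficients of linear-kernel ridge regression over all cyclic shifts of x."""
    xf = np.fft.fft2(x)
    kxx = np.real(np.fft.ifft2(xf * np.conj(xf))) / x.size   # linear kernel correlation
    return np.fft.fft2(y) / (np.fft.fft2(kxx) + lam)          # alpha in the Fourier domain

def kcf_detect(alpha_f, x, z):
    """Response map for a search patch z given the training patch x and dual coefficients."""
    kxz = np.real(np.fft.ifft2(np.fft.fft2(z) * np.conj(np.fft.fft2(x)))) / x.size
    return np.real(np.fft.ifft2(np.fft.fft2(kxz) * alpha_f))
```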
19. Zhuo T, Cheng Z, Zhang P, Wong Y, Kankanhalli M. Unsupervised Online Video Object Segmentation With Motion Property Understanding. IEEE Transactions on Image Processing 2019; 29:237-249. [PMID: 31369377] [DOI: 10.1109/tip.2019.2930152]
Abstract
Unsupervised video object segmentation aims to automatically segment moving objects over an unconstrained video without any user annotation. So far, only a few unsupervised online methods have been reported in the literature, and their performance is still far from satisfactory because the complementary information from future frames cannot be processed under the online setting. To solve this challenging problem, in this paper, we propose a novel unsupervised online video object segmentation (UOVOS) framework by construing the motion property to mean that segmented regions move in concurrence with a generic object. By incorporating salient motion detection and object proposals, a pixel-wise fusion strategy is developed to effectively remove detection noise, such as dynamic background and stationary objects. Furthermore, by leveraging the segmentation obtained from immediately preceding frames, a forward propagation algorithm is employed to deal with unreliable motion detection and object proposals. Experimental results on several benchmark datasets demonstrate the efficacy of the proposed method. Compared to state-of-the-art unsupervised online segmentation algorithms, the proposed method achieves an absolute gain of 6.2%. Moreover, our method achieves better performance than the best unsupervised offline algorithm on the DAVIS-2016 benchmark dataset. Our code is available on the project website: https://www.github.com/visiontao/uovos.
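The pixel-wise fusion step can be read as intersecting a salient-motion mask with an objectness (proposal) mask so that dynamic background and stationary objects are suppressed. A minimal NumPy sketch of that idea (the thresholds are assumptions, not the paper's values):

```python
import numpy as np

def fuse_masks(motion_saliency, objectness, tau_m=0.5, tau_o=0.5):
    """Keep pixels that are both salient in motion and covered by an object proposal."""
    motion_mask = motion_saliency > tau_m       # suppresses stationary objects
    object_mask = objectness > tau_o            # suppresses dynamic background
    return (motion_mask & object_mask).astype(np.uint8)

seg = fuse_masks(np.random.rand(240, 320), np.random.rand(240, 320))
print("foreground pixels:", int(seg.sum()))
```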
20. Han Y, Deng C, Zhao B, Tao D. State-aware Anti-drift Object Tracking. IEEE Transactions on Image Processing 2019; 28:4075-4086. [PMID: 30892207] [DOI: 10.1109/tip.2019.2905984]
Abstract
Correlation filter (CF)-based trackers have attracted increasing attention in the visual tracking field due to their superior performance on several datasets while maintaining high running speed. For each frame, an ideal filter is trained in order to discriminate the target from its surrounding background. Considering that the target always undergoes external and internal interference during tracking, the trained tracker should not only be able to judge the current state when failure occurs but also resist the model drift caused by challenging distractions. To this end, we present a State-aware Anti-drift Tracker (SAT) in this paper, which jointly models discrimination and reliability information in filter learning. Specifically, global context patches are incorporated into the filter training stage to better distinguish the target from the background. Meanwhile, a color-based reliable mask is learned to encourage the filter to focus on regions more suitable for tracking. We show that the proposed optimization problem can be efficiently solved using the alternating direction method of multipliers and carried out entirely in the Fourier domain. Furthermore, a kurtosis-based updating scheme is advocated to reveal the tracking condition and guarantee high-confidence template updating. Extensive experiments are conducted on the OTB-100 and UAV-20L datasets to compare the SAT tracker with other relevant state-of-the-art methods. Both quantitative and qualitative evaluations further demonstrate the effectiveness and robustness of the proposed method.
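A kurtosis-based update rule of the kind mentioned above checks how sharply peaked the correlation response is and only updates the template when the peak is sharp. A minimal SciPy sketch (the threshold and learning rate are assumptions, not the paper's values):

```python
import numpy as np
from scipy.stats import kurtosis

def maybe_update_template(template, new_patch, response, kurt_thresh=10.0, lr=0.02):
    """Update the template only when the response map is sharply peaked (high kurtosis)."""
    k = kurtosis(response.ravel())          # Fisher kurtosis of the response distribution
    if k >= kurt_thresh:                    # high confidence: blend in the new appearance
        return (1.0 - lr) * template + lr * new_patch
    return template                         # low confidence: keep the old template

resp = np.zeros((31, 31)); resp[15, 15] = 1.0   # toy, sharply peaked response
tmpl = maybe_update_template(np.zeros((31, 31)), np.ones((31, 31)), resp)
print("template mean after update:", tmpl.mean())
```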
21. Wang Y, Shi W, Wu S. Robust UAV-based Tracking Using Hybrid Classifiers. Machine Vision and Applications 2019; 30:125-137. [PMID: 30983700] [PMCID: PMC6456073] [DOI: 10.1007/s00138-018-0981-4]
Abstract
Visual object tracking is a key capability for unmanned aerial vehicles (UAVs). In this paper, we propose a robust and effective visual object tracking method with an appearance model based on the locally adaptive regression kernel (LARK). The proposed appearance model encodes the geometric structure of the target. The tracking problem is formulated with two binary classifiers, implemented as two support vector machines (SVMs) with online model updates. Backward tracking, which tracks the target in reverse time order, is utilized to evaluate the accuracy and robustness of the two SVMs. The final locations are adaptively fused based on the results of forward tracking and backward-tracking validation. Several state-of-the-art tracking algorithms are evaluated on large-scale benchmark datasets that include challenging factors such as heavy occlusion, pose variation, illumination variation, and motion blur. Experimental results demonstrate that our method achieves appealing performance.
Affiliation(s)
- Yong Wang, School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
- Wei Shi, INSKY Lab, Leotail Intelligent Tech, 1628, Pudong District, Shanghai
- Shandong Wu, Departments of Radiology, Biomedical Informatics, Bioengineering, and Intelligent Systems, University of Pittsburgh, USA