1
Fu M, Lin Y, Yang J, Cheng J, Lin L, Wang G, Long C, Xu S, Lu J, Li G, Yan J, Chen G, Zhuo S, Chen D. Multitask machine learning-based tumor-associated collagen signatures predict peritoneal recurrence and disease-free survival in gastric cancer. Gastric Cancer 2024; 27:1242-1257. [PMID: 39271552] [DOI: 10.1007/s10120-024-01551-0]
Abstract
BACKGROUND Accurate prediction of peritoneal recurrence in gastric cancer (GC) is crucial in clinical practice. Collagen alterations in the tumor microenvironment affect the migration and treatment response of cancer cells. Herein, we proposed multitask machine learning-based tumor-associated collagen signatures (TACS), composed of quantitative collagen features derived from multiphoton imaging, to simultaneously predict peritoneal recurrence (TACSPR) and disease-free survival (TACSDFS). METHODS Among 713 consecutive patients, with 275 in the training cohort, 222 in the internal validation cohort, and 216 in the external validation cohort, we developed and validated a multitask machine learning model that simultaneously predicts peritoneal recurrence (TACSPR) and disease-free survival (TACSDFS). The accuracy of the model for predicting peritoneal recurrence and prognosis, as well as its association with adjuvant chemotherapy, was evaluated. RESULTS TACSPR and TACSDFS were independently associated with peritoneal recurrence and disease-free survival, respectively, in all three cohorts (all P < 0.001). TACSPR demonstrated favorable performance for peritoneal recurrence in all three cohorts, and TACSDFS likewise showed satisfactory accuracy for disease-free survival among the included patients. For stage II and III disease, adjuvant chemotherapy improved the survival of patients with low TACSPR and low TACSDFS, high TACSPR and low TACSDFS, or low TACSPR and high TACSDFS, but had no impact on patients with both high TACSPR and high TACSDFS. CONCLUSIONS The multitask machine learning model allows accurate prediction of peritoneal recurrence and survival in GC and could distinguish patients who might benefit from adjuvant chemotherapy.
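As a concrete illustration of the subgroup logic described in the results, the sketch below assigns patients to risk strata from the two signature scores; the thresholds, names, and binary cut are hypothetical, not the authors' code:

```python
# Hypothetical illustration of the TACS_PR / TACS_DFS subgrouping described
# above; thresholds and function names are assumptions, not the authors' code.
from dataclasses import dataclass

@dataclass
class Patient:
    tacs_pr: float   # predicted peritoneal-recurrence score
    tacs_dfs: float  # predicted disease-free-survival risk score

def stratify(p: Patient, cut_pr: float = 0.5, cut_dfs: float = 0.5) -> str:
    """Return the subgroup used to reason about adjuvant-chemotherapy benefit."""
    high_pr = p.tacs_pr >= cut_pr
    high_dfs = p.tacs_dfs >= cut_dfs
    if high_pr and high_dfs:
        return "high-PR/high-DFS: no observed chemotherapy benefit"
    return "chemotherapy-responsive subgroup"

print(stratify(Patient(tacs_pr=0.8, tacs_dfs=0.7)))
```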
Affiliation(s)
- Meiting Fu
  - Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
  - Department of Gastroenterology, Guangdong Provincial Key Laboratory of Gastroenterology, Nanfang Hospital, Guangzhou, 510515, People's Republic of China
  - School of Science, Jimei University, Xiamen, 361021, People's Republic of China
- Yuyu Lin
  - Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Junyao Yang
  - Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Jiaxin Cheng
  - Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Liyan Lin
  - Department of Pathology, Fujian Key Laboratory of Translational Cancer Medicine, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, People's Republic of China
- Guangxing Wang
  - School of Science, Jimei University, Xiamen, 361021, People's Republic of China
- Chenyan Long
  - Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Shuoyu Xu
  - Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Jianping Lu
  - Department of Pathology, Fujian Key Laboratory of Translational Cancer Medicine, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, People's Republic of China
- Guoxin Li
  - Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Jun Yan
  - Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Gang Chen
  - Department of Pathology, Fujian Key Laboratory of Translational Cancer Medicine, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, People's Republic of China
- Shuangmu Zhuo
  - School of Science, Jimei University, Xiamen, 361021, People's Republic of China
  - Key Laboratory of OptoElectronic Science and Technology for Medicine of Ministry of Education, Fujian Normal University, Fuzhou, 350007, People's Republic of China
- Dexin Chen
  - Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
2
Wu BR, Ormazabal Arriagada S, Hsu TC, Lin TW, Lin C. Exploiting common patterns in diverse cancer types via multi-task learning. NPJ Precis Oncol 2024; 8:245. [PMID: 39472543] [PMCID: PMC11522563] [DOI: 10.1038/s41698-024-00700-z]
Abstract
Cancer prognosis requires precision to identify high-risk patients and improve survival outcomes. Conventional methods struggle with the complexity of genetic biomarkers and diverse medical data. Our study uses deep learning to distil high-dimensional medical data into low-dimensional feature vectors, exploring shared patterns across cancer types. We developed a multi-task bimodal neural network integrating RNA sequencing and clinical data from three datasets of The Cancer Genome Atlas project: Breast Invasive Carcinoma, Lung Adenocarcinoma, and Colon Adenocarcinoma. Our approach significantly improved prognosis prediction, especially for Colon Adenocarcinoma, with up to a 26% increase in concordance index and a 41% increase in the area under the precision-recall curve. External validation with Small Cell Lung Cancer achieved comparable metrics, indicating that supplementing small datasets with data from other cancers can improve performance. This work represents initial strides in using multi-task learning for prognosis prediction across cancer types, potentially revealing shared mechanisms among cancers and contributing to future applications in precision medicine.
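A minimal sketch of the kind of multi-task bimodal architecture the abstract describes, assuming a shared encoder over fused RNA-seq and clinical features with one prognosis head per cancer type; layer sizes and the fusion-by-concatenation choice are illustrative assumptions, not the paper's design:

```python
# Illustrative multi-task bimodal network: shared encoder, per-cancer heads.
# Dimensions and the concatenation-based fusion are assumptions.
import torch
import torch.nn as nn

class MultiTaskBimodalNet(nn.Module):
    def __init__(self, rna_dim: int, clin_dim: int, n_tasks: int, latent: int = 64):
        super().__init__()
        self.rna_enc = nn.Sequential(nn.Linear(rna_dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.clin_enc = nn.Sequential(nn.Linear(clin_dim, 32), nn.ReLU(), nn.Linear(32, latent))
        self.shared = nn.Sequential(nn.Linear(2 * latent, latent), nn.ReLU())
        # One risk-prediction head per cancer type (e.g., BRCA, LUAD, COAD).
        self.heads = nn.ModuleList([nn.Linear(latent, 1) for _ in range(n_tasks)])

    def forward(self, rna: torch.Tensor, clin: torch.Tensor, task: int) -> torch.Tensor:
        z = self.shared(torch.cat([self.rna_enc(rna), self.clin_enc(clin)], dim=-1))
        return self.heads[task](z)  # scalar risk score for the selected cancer type

model = MultiTaskBimodalNet(rna_dim=20000, clin_dim=16, n_tasks=3)
risk = model(torch.randn(8, 20000), torch.randn(8, 16), task=2)  # a COAD batch
```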
Affiliation(s)
- Bo-Run Wu
  - Graduate Institute of Communication Engineering, National Taiwan University (NTU), Taipei, Taiwan
- Sofia Ormazabal Arriagada
  - Graduate Institute of Communication Engineering, National Taiwan University (NTU), Taipei, Taiwan
  - Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
  - Taiwan International Graduate Program in Artificial Intelligence of Things, NTU, Taipei, Taiwan
- Te-Cheng Hsu
  - Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan
- Tsung-Wei Lin
  - Graduate Institute of Communication Engineering, National Taiwan University (NTU), Taipei, Taiwan
- Che Lin
  - Graduate Institute of Communication Engineering, National Taiwan University (NTU), Taipei, Taiwan
  - Department of Electrical Engineering, NTU, Taipei, Taiwan
  - Center for Advanced Computing and Imaging in Biomedicine, NTU, Taipei, Taiwan
  - Smart Medicine and Health Informatics Program, NTU, Taipei, Taiwan
  - School of Medicine, NTU, Taipei, Taiwan
  - Center for Biotechnology, NTU, Taipei, Taiwan
  - Computer and Information Networking Center of Electrical Engineering, NTU, Taipei, Taiwan
3
Kang B, Liang D, Mei J, Tan X, Zhou Q, Zhang D. Robust RGB-T Tracking via Graph Attention-Based Bilinear Pooling. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:9900-9911. [PMID: 35417355] [DOI: 10.1109/tnnls.2022.3161969]
Abstract
The RGB-T tracker possesses a strong capability of fusing two different yet complementary target observations, thus providing a promising solution for all-weather tracking in intelligent transportation systems. Existing convolutional neural network (CNN)-based RGB-T tracking methods often consider multisource-oriented deep feature fusion from a global viewpoint, but fail to yield satisfactory performance when the target pair contains only partially useful information. To solve this problem, we propose a four-stream oriented Siamese network (FS-Siamese) for RGB-T tracking. The key innovation of our network structure lies in formulating multidomain multilayer feature map fusion as a multiple graph learning problem, based on which we develop a graph attention-based bilinear pooling module to explore the partial feature interaction between the RGB and thermal targets. This can effectively prevent uninformative image blocks from disturbing feature embedding fusion. To enhance the efficiency of the proposed Siamese network structure, we adopt meta-learning to incorporate category information in the updating of bilinear pooling results, which can enforce online that the exemplar and the current target appearance obtain similar semantic representations. Extensive experiments on the grayscale-thermal object tracking (GTOT) and RGBT234 datasets demonstrate that the proposed method outperforms the state-of-the-art methods for the task of RGB-T tracking.
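For intuition, bilinear pooling of two modality feature vectors can be written as an outer product followed by flattening. The sketch below shows plain bilinear pooling only; the paper's graph-attention weighting is not reproduced here:

```python
# Plain bilinear pooling of RGB and thermal features; the graph-attention
# weighting described in the abstract is omitted for brevity.
import torch
import torch.nn.functional as F

def bilinear_pool(f_rgb: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """f_rgb: (B, C1), f_t: (B, C2) -> fused (B, C1*C2) descriptor."""
    outer = torch.einsum("bi,bj->bij", f_rgb, f_t)   # pairwise channel interactions
    fused = outer.flatten(start_dim=1)
    # Signed square-root and L2 normalisation, a common stabilisation step.
    fused = torch.sign(fused) * torch.sqrt(fused.abs() + 1e-12)
    return F.normalize(fused, dim=1)

fused = bilinear_pool(torch.randn(4, 128), torch.randn(4, 128))
```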
4
Wang Z, Chen H, Yuan L, Ren Y, Tian H, Wang X. SiamMLT: Siamese Hybrid Multi-layer Transformer Fusion Tracker. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11219-y]
5
Xu T, Feng Z, Wu XJ, Kittler J. Towards Robust Visual Object Tracking with Independent Target-Agnostic Detection and Effective Siamese Cross-Task Interaction. IEEE Transactions on Image Processing 2023; PP:1541-1554. [PMID: 37027596] [DOI: 10.1109/tip.2023.3246800]
Abstract
Advanced Siamese visual object tracking architectures are jointly trained using pair-wise input images to perform target classification and bounding box regression. They have achieved promising results in recent benchmarks and competitions. However, the existing methods suffer from two limitations: First, though the Siamese structure can estimate the target state in an instance frame, provided the target appearance does not deviate too much from the template, the detection of the target in an image cannot be guaranteed in the presence of severe appearance variations. Second, despite the classification and regression tasks sharing the same output from the backbone network, their specific modules and loss functions are invariably designed independently, without promoting any interaction. Yet, in a general tracking task, the centre classification and bounding box regression tasks are collaboratively working to estimate the final target location. To address the above issues, it is essential to perform target-agnostic detection so as to promote cross-task interactions in a Siamese-based tracking framework. In this work, we endow a novel network with a target-agnostic object detection module to complement the direct target inference, and to avoid or minimise the misalignment of the key cues of potential template-instance matches. To unify the multi-task learning formulation, we develop a cross-task interaction module to ensure consistent supervision of the classification and regression branches, improving the synergy of different branches. To eliminate potential inconsistencies that may arise within a multi-task architecture, we assign adaptive labels, rather than fixed hard labels, to supervise the network training more effectively. The experimental results obtained on several benchmarks, i.e., OTB100, UAV123, VOT2018, VOT2019, and LaSOT, demonstrate the effectiveness of the advanced target detection module, as well as the cross-task interaction, exhibiting superior tracking performance as compared with the state-of-the-art tracking methods.
6
Wu B, Wei B, Liu J, Wu K, Wang M. Faceted Text Segmentation via Multitask Learning. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:3846-3857. [PMID: 32894723] [DOI: 10.1109/tnnls.2020.3015996]
Abstract
Text segmentation is a fundamental step in natural language processing (NLP) and information retrieval (IR) tasks. Most existing approaches do not explicitly take into account the facet information of documents for segmentation. Text segmentation and facet annotation are often addressed as separate problems, yet they operate in a common input space. This article proposes FTS, a novel model for faceted text segmentation via multitask learning (MTL). FTS models faceted text segmentation as an MTL problem comprising text segmentation and facet annotation. The model employs a bidirectional long short-term memory (Bi-LSTM) network to learn the feature representation of sentences within a document. The feature representation is shared and adjusted with common parameters through MTL, which helps the model learn a shared, robust feature representation spanning text segmentation and facet annotation. Moreover, text segmentation is modeled as a sequence tagging task using an LSTM with a conditional random field (CRF) classification layer. Extensive experiments are conducted on five data sets from five domains: data structure, data mining, computer network, solid mechanics, and crystallography. The results indicate that the FTS model outperforms several highly cited and state-of-the-art approaches to text segmentation and facet annotation.
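A minimal sketch of the shared-encoder multitask setup the abstract outlines, assuming pre-computed sentence embeddings as input; the CRF layer is simplified to a per-step classifier, so this is an illustration rather than the paper's model:

```python
# Sketch of a shared Bi-LSTM with two task heads (segmentation tagging and
# facet annotation). A real CRF layer is replaced by per-step logits here.
import torch
import torch.nn as nn

class FTSSketch(nn.Module):
    def __init__(self, emb_dim: int = 100, hidden: int = 128,
                 n_seg_tags: int = 2, n_facets: int = 5):
        super().__init__()
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.seg_head = nn.Linear(2 * hidden, n_seg_tags)   # boundary / no-boundary
        self.facet_head = nn.Linear(2 * hidden, n_facets)   # facet label per sentence

    def forward(self, sent_embs: torch.Tensor):
        h, _ = self.encoder(sent_embs)      # shared representation (B, T, 2H)
        return self.seg_head(h), self.facet_head(h)

model = FTSSketch()
seg_logits, facet_logits = model(torch.randn(2, 30, 100))  # 30 sentences per doc
```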
8
Adaptive Channel Selection for Robust Visual Object Tracking with Discriminative Correlation Filters. Int J Comput Vis 2021. [DOI: 10.1007/s11263-021-01435-1]
Abstract
Discriminative Correlation Filters (DCF) have been shown to achieve impressive performance in visual object tracking. However, existing DCF-based trackers rely heavily on learning regularised appearance models from invariant image feature representations. To further improve the performance of DCF in accuracy and provide a parsimonious model from the attribute perspective, we propose to gauge the relevance of multi-channel features for the purpose of channel selection. This is achieved by assessing the information conveyed by the features of each channel as a group, using an adaptive group elastic net inducing independent sparsity and temporal smoothness on the DCF solution. The robustness and stability of the learned appearance model are significantly enhanced by the proposed method, as the process of channel selection performs implicit spatial regularisation. We use the augmented Lagrangian method to optimise the discriminative filters efficiently. The experimental results obtained on a number of well-known benchmarking datasets demonstrate the effectiveness and stability of the proposed method. A superior performance over the state-of-the-art trackers is achieved using less than 10% of the deep feature channels.
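To make the channel-selection idea concrete, the sketch below ranks filter channels by their group (per-channel) energy and keeps the top fraction; this is a simplified stand-in for the adaptive group elastic net, which additionally induces sparsity and temporal smoothness jointly with filter learning:

```python
# Group-norm channel ranking: a simplified stand-in for the adaptive group
# elastic net described above (which also enforces temporal smoothness).
import numpy as np

def select_channels(filters: np.ndarray, keep_ratio: float = 0.1) -> np.ndarray:
    """filters: (C, H, W) multi-channel DCF solution.
    Returns indices of channels with the largest per-channel L2 energy."""
    group_energy = np.sqrt((filters ** 2).sum(axis=(1, 2)))  # one score per channel
    k = max(1, int(keep_ratio * filters.shape[0]))
    return np.argsort(group_energy)[::-1][:k]

keep = select_channels(np.random.randn(512, 31, 31), keep_ratio=0.1)
print(f"keeping {keep.size} of 512 channels")
```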
10
The framework of learnable kernel function and its application to dictionary learning of SPD data. Pattern Anal Appl 2021. [DOI: 10.1007/s10044-020-00941-1]
11
Gurkan F, Gunsel B. Integration of regularized l1 tracking and instance segmentation for video object tracking. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.072]
12
Zhang Y, Gao X, Chen Z, Zhong H, Xie H, Yan C. Mining Spatial-Temporal Similarity for Visual Tracking. IEEE Transactions on Image Processing 2020; PP:8107-8119. [PMID: 32746237] [DOI: 10.1109/tip.2020.2981813]
Abstract
Correlation filter (CF) is a critical technique for improving accuracy and speed in the field of visual object tracking. Despite being studied extensively, most existing CF methods fail to make the most of the inherent spatial-temporal prior of videos. To address this limitation, and since consecutive frames closely resemble one another in most videos, we investigate a novel scheme to predict the target's future state by exploiting previous observations. Specifically, in this paper, we propose a prediction-based CF tracking framework that learns the spatial-temporal similarity of consecutive frames for sample managing, template regularization, and training response pre-weighting. We model the learning problem theoretically as a novel objective and provide effective optimization algorithms to solve the learning task. In addition, we implement two CF trackers with different features. Extensive experiments are conducted on three popular benchmarks to validate our scheme. The encouraging results demonstrate that the proposed scheme can significantly boost the accuracy of CF tracking, and the two trackers achieve competitive performance against state-of-the-art trackers. We finally present a comprehensive analysis of the efficacy of our proposed method and the efficiency of our trackers to facilitate real-world visual tracking applications.
13
Yan S, Smith JS, Lu W, Zhang B. Abnormal Event Detection From Videos Using a Two-Stream Recurrent Variational Autoencoder. IEEE Trans Cogn Dev Syst 2020. [DOI: 10.1109/tcds.2018.2883368]
14
Lan X, Ye M, Zhang S, Zhou H, Yuen PC. Modality-correlation-aware sparse representation for RGB-infrared object tracking. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2018.10.002]
16
Kang B, Liang D, Ding W, Zhou H, Zhu WP. Grayscale-Thermal Tracking via Inverse Sparse Representation based Collaborative Encoding. IEEE Transactions on Image Processing 2019; 29:3401-3415. [PMID: 31880552] [DOI: 10.1109/tip.2019.2959912]
Abstract
Grayscale-thermal tracking has attracted a great deal of attention due to its capability of fusing two different yet complementary target observations. Existing methods often treat extracting discriminative target information and exploring the target correlation among different images as two separate issues, ignoring their interdependence. This may cause tracking drift in challenging video pairs. This paper presents a collaborative encoding model called joint correlation and discriminant analysis based inverse sparse representation (JCDA-InvSR) to jointly encode the target candidates in the grayscale and thermal video sequences. In particular, we develop a multi-objective program to integrate feature selection and multi-view correlation analysis into a unified optimization problem in JCDA-InvSR, which can simultaneously highlight the special characters of the grayscale and thermal targets by alternately optimizing two aspects: the target discrimination within a given image and the target correlation across different images. For robust grayscale-thermal tracking, we also incorporate the prior knowledge of target candidate codes into the SVM-based target classifier to overcome the overfitting caused by limited training labels. Extensive experiments on the GTOT and RGBT234 datasets illustrate the promising performance of our tracking framework.
17
Fang Y, Ko S, Jo GS. Robust visual tracking based on global-and-local search with confidence reliability estimation. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.08.005]
18
Xu T, Feng ZH, Wu XJ, Kittler J. Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Object Tracking. IEEE Transactions on Image Processing 2019; 28:5596-5609. [PMID: 31170074] [DOI: 10.1109/tip.2019.2919201]
Abstract
With efficient appearance learning models, discriminative correlation filter (DCF) has been proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistent constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by the lasso regularization. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Last, a unified optimization framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123, and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches.
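One plausible way to write the joint objective sketched above is a DCF data term with a lasso (structured sparsity) penalty on the multi-channel filter and a temporal consistency penalty tying the filter to its previous value; this is a reading of the description, not the paper's exact formulation:

```latex
\min_{\mathbf{f}} \; \Big\| \mathbf{y} - \sum_{k=1}^{K} \mathbf{x}_{k} \ast \mathbf{f}_{k} \Big\|_{2}^{2}
\; + \; \lambda_{1} \sum_{k=1}^{K} \| \mathbf{f}_{k} \|_{1}
\; + \; \lambda_{2} \| \mathbf{f} - \mathbf{f}_{t-1} \|_{2}^{2}
```

Here x_k and f_k denote the k-th feature channel and its filter, the asterisk denotes circular correlation, y is the desired response, and f_{t-1} is the filter learned in the previous frame.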
21
Xiao Y, Li J, Du B, Wu J, Li X, Chang J, Zhou Y. Robust correlation filter tracking with multi-scale spatial view. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.05.017]
22
Li Z, Wei W, Zhang T, Wang M, Hou S, Peng X. Online Multi-expert Learning for Visual Tracking. IEEE Transactions on Image Processing 2019; 29:934-946. [PMID: 31425073] [DOI: 10.1109/tip.2019.2931082]
Abstract
Correlation filter-based trackers have achieved excellent performance for object tracking in recent years. However, most existing methods use only one filter and ignore the information of the previous filters. In this paper, we propose a novel online multi-expert learning algorithm for visual tracking. In our proposed scheme, the former trackers retain the previous filters and give their predictions in each frame; the current tracker represents the filter of the current frame, and together they constitute our expert ensemble. We use an adaptive second-order quantile strategy to learn the weights of each expert, which can take full advantage of all the experts. To simplify our model and remove weak experts, we prune our models via a minimum entropy criterion. Finally, we propose a new update strategy to avoid the model corruption problem. Extensive experimental results on both the OTB2013 and OTB2015 benchmarks demonstrate that our proposed tracker performs favorably against state-of-the-art methods.
23
Gao J, Zhang T, Xu C. SMART: Joint Sampling and Regression for Visual Tracking. IEEE Transactions on Image Processing 2019; 28:3923-3935. [PMID: 30872227] [DOI: 10.1109/tip.2019.2904434]
Abstract
Most existing trackers are either sampling-based or regression-based methods. Sampling-based methods estimate the target state by sampling many target candidates. Although these methods achieve significant performance, they often suffer from a high computational burden. Regression-based methods often learn a computationally efficient regression function to directly predict the geometric distortion between frames. However, most of these methods require large-scale external training videos and are still not very impressive in terms of accuracy. To make both types of methods enhance and complement each other, in this paper, we propose a joint sampling and regression scheme for visual tracking, which leverages the region proposal network by a novel design. Specifically, our method can jointly exploit discriminative target proposal generation and structural target regression to predict target location in a simple feedforward propagation. We evaluate the proposed method on five challenging benchmarks, and extensive experimental results demonstrate that our method performs favorably compared with state-of-the-art trackers with respect to both accuracy and speed.
24
Zhang D, Zakir A. Top-Down Saliency Detection Based on Deep-Learned Features. International Journal of Computational Intelligence and Applications 2019. [DOI: 10.1142/s1469026819500093]
Abstract
How to localize objects in images accurately and efficiently is a challenging problem in computer vision. In this paper, a novel top-down fine-grained salient object detection method based on deep-learned features is proposed, which can detect in the input image the same object as in the query image. The query image and its three subsampled images are used as top-down cues to guide saliency detection. We adapt a convolutional neural network (CNN) using the fast VGG network (VGG-f), pre-trained on ImageNet and re-trained on the Pascal VOC 2012 dataset. Experiments on the FiFA dataset demonstrate that the proposed method can localize the saliency region and find the specific object (e.g., a human face) given as the query. Experiments on the David1 and Face1 sequences conclusively prove that the proposed algorithm is able to effectively deal with many challenging factors, including illumination change, shape deformation, scale change, and partial occlusion.
Affiliation(s)
- Duzhen Zhang
  - School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, Jiangsu, P. R. China
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, P. R. China
- Ali Zakir
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, P. R. China
26
Sun J, Chen Q, Sun J, Zhang T, Fang W, Wu X. Graph-structured multitask sparsity model for visual tracking. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.02.043]
27
Parallel Correlation Filters for Real-Time Visual Tracking. Sensors 2019; 19:2362. [PMID: 31121983] [PMCID: PMC6566153] [DOI: 10.3390/s19102362]
Abstract
Correlation filter-based methods have recently performed remarkably well in terms of accuracy and speed in the visual object tracking research field. However, most existing correlation filter-based methods are not robust to significant appearance changes in the target, especially when the target undergoes deformation, illumination variation, and rotation. In this paper, a novel parallel correlation filters (PCF) framework is proposed for real-time visual object tracking. Firstly, the proposed method constructs two parallel correlation filters, one for tracking the appearance changes in the target and the other for tracking the translation of the target. Secondly, by weighted merging of the response maps of these two parallel correlation filters, the proposed method accurately locates the center position of the target. Finally, in the training stage, a new, more reasonable distribution of the correlation output is proposed to replace the original Gaussian distribution in training more accurate correlation filters, which prevents the model from drifting and achieves excellent tracking performance. Extensive qualitative and quantitative experiments on the common object tracking benchmarks OTB-2013 and OTB-2015 demonstrate that the proposed PCF tracker outperforms most state-of-the-art trackers and achieves high real-time tracking performance.
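The target-localization step the abstract describes, merging the two filters' response maps and taking the peak, reduces to a few lines; the fixed weight used here is an illustrative choice, whereas the paper's merging weights may be set differently:

```python
# Weighted merging of two correlation-filter response maps, as the PCF
# framework describes; the fixed weight here is illustrative.
import numpy as np

def fuse_and_locate(resp_appearance: np.ndarray, resp_translation: np.ndarray,
                    w: float = 0.5) -> tuple:
    """Merge two response maps and return the peak position (row, col)."""
    fused = w * resp_appearance + (1.0 - w) * resp_translation
    return np.unravel_index(int(np.argmax(fused)), fused.shape)

row, col = fuse_and_locate(np.random.rand(61, 61), np.random.rand(61, 61), w=0.6)
```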
30
Zhang T, Xu C, Yang MH. Robust Structural Sparse Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019; 41:473-486. [PMID: 29994599] [DOI: 10.1109/tpami.2018.2797082]
Abstract
Sparse representations have been applied to visual tracking by finding the best candidate region with minimal reconstruction error based on a set of target templates. However, most existing sparse trackers only consider holistic or local representations and do not make full use of the intrinsic structure among and inside target candidate regions, thereby making them less effective when similar objects appear at close proximity or under occlusion. In this paper, we propose a novel structural sparse representation, which not only exploits the intrinsic relationships among target candidate regions and local patches to learn their representations jointly, but also preserves the spatial structure among the local patches inside each target candidate region. For robust visual tracking, we take outliers resulting from occlusion and noise into account when searching for the best target region. Constructed within a Bayesian filtering framework, we show that the proposed algorithm accommodates most existing sparse trackers with respective merits. The formulated problem can be efficiently solved using an accelerated proximal gradient method that yields a sequence of closed form updates. Qualitative and quantitative evaluations on challenging benchmark datasets demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.
31
Zhu G, Wang J, Wang P, Wu Y, Lu H. Feature Distilled Tracking. IEEE Transactions on Cybernetics 2019; 49:440-452. [PMID: 29990247] [DOI: 10.1109/tcyb.2017.2776977]
Abstract
Feature extraction and representation is one of the most important components of fast, accurate, and robust visual tracking. Very deep convolutional neural networks (CNNs) provide effective tools for feature extraction with good generalization ability. However, extracting features using very deep CNN models requires high-performance hardware due to their large computational complexity, which prohibits their use in real-time applications. To alleviate this problem, we aim at obtaining small and fast-to-execute shallow models based on model compression for visual tracking. Specifically, we propose a small feature distilled network (FDN) for tracking that imitates the intermediate representations of a much deeper network. The FDN extracts rich visual features at higher speed than the original deeper network. For a further speed-up, we introduce a shift-and-stitch method that reduces the arithmetic operations while keeping the spatial resolution of the distilled feature maps unchanged. Finally, a scale-adaptive discriminative correlation filter is learned on the distilled features for visual tracking to handle scale variation of the target. Comprehensive experimental results on object tracking benchmark datasets show that the proposed approach achieves a 5× speed-up with performance competitive with the state-of-the-art deep trackers.
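A minimal sketch of the representation-imitation objective the abstract describes: a small student matches a deeper teacher's intermediate feature map through a channel adapter. The architecture and shapes are illustrative assumptions, and the paper's shift-and-stitch speed-up is not shown:

```python
# Feature-imitation (distillation) loss: a small student network matches the
# intermediate representation of a deeper, frozen teacher. Shapes are illustrative.
import torch
import torch.nn as nn

teacher_feat = torch.randn(1, 512, 28, 28)            # from a deep, frozen CNN
student = nn.Sequential(                              # small fast-to-execute net
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
)
adapter = nn.Conv2d(128, 512, 1)                      # align channel dimensions

image = torch.randn(1, 3, 56, 56)
student_feat = adapter(student(image))                # (1, 512, 28, 28)
loss = nn.functional.mse_loss(student_feat, teacher_feat)
loss.backward()                                       # trains student + adapter only
```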
32
Zhang T, Xu C, Yang MH. Learning Multi-Task Correlation Particle Filters for Visual Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019; 41:365-378. [PMID: 29994598] [DOI: 10.1109/tpami.2018.2797062]
Abstract
In this paper, we propose a multi-task correlation particle filter (MCPF) for robust visual tracking. We first present the multi-task correlation filter (MCF), which takes the interdependencies among different object parts and features into account to learn the correlation filters jointly. Next, the proposed MCPF is introduced to exploit and complement the strengths of the MCF and the particle filter. Compared with existing tracking methods based on correlation filters and particle filters, the proposed MCPF enjoys several merits. First, it exploits the interdependencies among different features to derive the correlation filters jointly, and makes the learned filters complement and enhance each other to obtain consistent responses. Second, it handles partial occlusion via a part-based representation, and exploits the intrinsic relationship among local parts via spatial constraints to preserve object structure and learn the correlation filters jointly. Third, it effectively handles large scale variation via a sampling scheme that draws particles at different scales for target object state estimation. Fourth, it shepherds the sampled particles toward the modes of the target state distribution via the MCF, covering object states well with fewer particles than conventional particle filters, thereby resulting in robust tracking performance and low computational cost. Extensive experimental results on four challenging benchmark datasets demonstrate that the proposed MCPF tracking algorithm performs favorably against the state-of-the-art methods.
33
A novel reverse sparse model utilizing the spatio-temporal relationship of target templates for object tracking. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.10.007]
34
Mondal A, Ghosh A, Ghosh S. Scaled and oriented object tracking using ensemble of multilayer perceptrons. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.09.028]
35
Liu X, Xu Q, Mu Y, Yang J, Lin L, Yan S. High-Precision Camera Localization in Scenes with Repetitive Patterns. ACM Trans Intell Syst Technol 2018. [DOI: 10.1145/3226111]
Abstract
This article presents a high-precision multi-modal approach for localizing moving cameras with monocular videos, which has wide potentials in many intelligent applications, including robotics, autonomous vehicles, and so on. Existing visual odometry methods often suffer from symmetric or repetitive scene patterns, e.g., windows on buildings or parking stalls. To address this issue, we introduce a robust camera localization method that contributes in two aspects. First, we formulate feature tracking, the critical step of visual odometry, as a hierarchical min-cost network flow optimization task, and we regularize the formula with flow constraints, cross-scale consistencies, and motion heuristics. The proposed regularized formula is capable of adaptively selecting distinctive features or feature combinations, which is more effective than traditional methods that detect and group repetitive patterns in a separate step. Second, we develop a joint formula for integrating dense visual odometry and sparse GPS readings in a common reference coordinate. The fusion process is guided with high-order statistics knowledge to suppress the impacts of noises, clusters, and model drifting. We evaluate the proposed camera localization method on both public video datasets and a newly created dataset that includes scenes full of repetitive patterns. Results with comparisons show that our method can achieve comparable performance to state-of-the-art methods and is particularly effective for addressing repetitive pattern issues.
Affiliation(s)
- Xiaobai Liu
  - Department of Computer Science, San Diego State University, San Diego, CA, USA
- Qian Xu
  - XreLab Inc., San Diego, CA, USA
- Yadong Mu
  - Institute of Computer Science and Technology, Beijing, China
- Jiadi Yang
  - University of California, San Jose, California, USA
- Liang Lin
  - School of Advanced Computing, Sun Yat-Sen University, Guangzhou, China
- Shuicheng Yan
  - Qihoo/360 Inc., China; National University of Singapore, Singapore
36
Yoon GJ, Hwang HJ, Yoon SM. Visual Object Tracking Using Structured Sparse PCA-Based Appearance Representation and Online Learning. Sensors (Basel) 2018; 18:3513. [PMID: 30340356] [PMCID: PMC6209897] [DOI: 10.3390/s18103513]
Abstract
Visual object tracking is a fundamental research area in the field of computer vision and pattern recognition because it can be utilized by various intelligent systems. However, visual object tracking faces various challenging issues because tracking is influenced by illumination change, pose change, partial occlusion, and background clutter. Sparse representation-based appearance modeling and dictionary learning that optimizes the tracking history have been proposed as one possible solution to the problems of visual object tracking. However, there are limitations in representing high-dimensional descriptors using the standard sparse representation approach. Therefore, this study proposes a structured sparse principal component analysis to represent the complex appearance descriptors of the target object effectively with a linear combination of a small number of elementary atoms chosen from an over-complete dictionary. Learning and updating an online dictionary, by selecting similar dictionaries with high probability, makes it possible to track the target object in a variety of environments. Qualitative and quantitative experimental results, including comparisons with current state-of-the-art visual object tracking algorithms, validate that the proposed tracking algorithm performs favorably under changes in the target object and environment on benchmark video sequences.
Affiliation(s)
- Gang-Joon Yoon
  - National Institute for Mathematical Science, 70 Yuseong-daero 1689 beon-gil, Yuseong-gu, Daejeon 34047, Korea
- Hyeong Jae Hwang
  - Artificial Intelligence Research Institute, 22, Daewangpangyo-ro 712beon-gil, Bundang-gu, Seongnam-si 463400, Gyeonggi-do, Korea
- Sang Min Yoon
  - College of Computer Science, Kookmin University, 77 Jeongneung-ro, Seongbuk-gu, Seoul 02707, Korea
37
Li Z, Zhang J, Zhang K, Li Z. Visual Tracking With Weighted Adaptive Local Sparse Appearance Model via Spatio-Temporal Context Learning. IEEE Transactions on Image Processing 2018; 27:4478-4489. [PMID: 29897873] [DOI: 10.1109/tip.2018.2839916]
Abstract
Sparse representation has been widely exploited to develop an effective appearance model for object tracking due to its well discriminative capability in distinguishing the target from its surrounding background. However, most of these methods only consider either the holistic representation or the local one for each patch with equal importance, and hence may fail when the target suffers from severe occlusion or large-scale pose variation. In this paper, we propose a simple yet effective approach that exploits rich feature information from reliable patches based on weighted local sparse representation that takes into account the importance of each patch. Specifically, we design a reconstruction-error based weight function with the reconstruction error of each patch via sparse coding to measure the patch reliability. Moreover, we explore spatio-temporal context information to enhance the robustness of the appearance model, in which the global temporal context is learned via incremental subspace and sparse representation learning with a novel dynamic template update strategy to update the dictionary, while the local spatial context considers the correlation between the target and its surrounding background via measuring the similarity among their sparse coefficients. Extensive experimental evaluations on two large tracking benchmarks demonstrate favorable performance of the proposed method over some state-of-the-art trackers.
38
Sun S, An Z, Jiang X, Zhang B, Zhang J. Robust object tracking with the inverse relocation strategy. Neural Comput Appl 2018. [DOI: 10.1007/s00521-018-3667-y]
39
Spatio-Context-Based Target Tracking with Adaptive Multi-Feature Fusion for Real-World Hazy Scenes. Cognit Comput 2018. [DOI: 10.1007/s12559-018-9550-4]
41
Target Tracking Algorithm Based on an Adaptive Feature and Particle Filter. Information 2018. [DOI: 10.3390/info9060140]
42
Bo C, Zhang J, Liu J, Yao Q. Robust online object tracking via the convex hull representation model. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.02.013]
43
Yuen PC, Chellappa R. Learning Common and Feature-Specific Patterns: A Novel Multiple-Sparse-Representation-Based Tracker. IEEE Transactions on Image Processing 2018; 27:2022-2037. [PMID: 29989985] [DOI: 10.1109/tip.2017.2777183]
Abstract
The use of multiple features has been shown to be an effective strategy for visual tracking because of their complementary contributions to appearance modeling. The key problem is how to learn a fused representation from multiple features for appearance modeling. Different features extracted from the same object should share some commonalities in their representations while each feature should also have some feature-specific representation patterns which reflect its complementarity in appearance modeling. Different from existing multi-feature sparse trackers which only consider the commonalities among the sparsity patterns of multiple features, this paper proposes a novel multiple sparse representation framework for visual tracking which jointly exploits the shared and feature-specific properties of different features by decomposing multiple sparsity patterns. Moreover, we introduce a novel online multiple metric learning to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple features are more representative. Experimental results on tracking benchmark videos and other challenging videos demonstrate the effectiveness of the proposed tracker.
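A common way to formalize shared plus feature-specific sparsity patterns, which may approximate what the abstract describes, is to decompose the coefficient matrix into a row-sparse part (commonalities across features) and an entry-sparse part (feature-specific patterns); the notation below is hypothetical, not the paper's exact objective:

```latex
\min_{P, Q} \; \sum_{k=1}^{K} \big\| \mathbf{y}_{k} - X_{k} (\mathbf{p}_{k} + \mathbf{q}_{k}) \big\|_{2}^{2}
\; + \; \lambda_{1} \| P \|_{2,1}
\; + \; \lambda_{2} \| Q \|_{1}
```

Here column p_k of P carries the representation shared across the K features (the l2,1 norm encourages a common row support), while q_k in Q carries the feature-specific pattern for feature k.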
44
Sui Y, Wang G, Zhang L. Correlation Filter Learning Toward Peak Strength for Visual Tracking. IEEE Transactions on Cybernetics 2018; 48:1290-1303. [PMID: 28422678] [DOI: 10.1109/tcyb.2017.2690860]
Abstract
This paper presents a novel visual tracking approach to correlation filter learning toward peak strength of correlation response. Previous methods leverage all features of the target and the immediate background to learn a correlation filter. Some features, however, may be distractive to tracking, like those from occlusion and local deformation, resulting in unstable tracking performance. This paper aims at solving this issue and proposes a novel algorithm to learn the correlation filter. The proposed approach, by imposing an elastic net constraint on the filter, can adaptively eliminate those distractive features in the correlation filtering. A new peak strength metric is proposed to measure the discriminative capability of the learned correlation filter. It is demonstrated that the proposed approach effectively strengthens the peak of the correlation response, leading to more discriminative performance than previous methods. Extensive experiments on a challenging visual tracking benchmark demonstrate that the proposed tracker outperforms most state-of-the-art methods.
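The elastic net constraint on the filter that the abstract mentions combines an l1 term, which can zero out distractive feature dimensions, with an l2 term that keeps the solution stable; in its generic form (a sketch of the standard elastic net regularized filter, not necessarily the paper's exact objective):

```latex
\min_{\mathbf{f}} \; \big\| \mathbf{y} - \mathbf{x} \ast \mathbf{f} \big\|_{2}^{2}
\; + \; \lambda_{1} \| \mathbf{f} \|_{1}
\; + \; \lambda_{2} \| \mathbf{f} \|_{2}^{2}
```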
45
Gao J, Zhang T, Yang X, Xu C. P2T: Part-to-Target Tracking via Deep Regression Learning. IEEE Transactions on Image Processing 2018; 27:3074-3086. [PMID: 29994065] [DOI: 10.1109/tip.2018.2813166]
Abstract
Most existing part-based tracking methods are part-to-part trackers, which usually involve two separate steps: part matching and target localization. Different from existing methods, in this paper, we propose a novel part-to-target (P2T) tracker in a unified fashion by inferring the target location from parts directly. To achieve this goal, we propose a novel deep regression model for part-to-target regression in an end-to-end framework via convolutional neural networks. The proposed model is able not only to exploit part context information to preserve the object spatial layout structure, but also to learn part reliability to emphasize part importance for robust part-to-target regression. We evaluate the proposed tracker on four challenging benchmark sequences, and extensive experimental results demonstrate that our method performs favorably against state-of-the-art trackers because of the powerful capacity of the proposed deep regression model.
46
Gundogdu E, Alatan AA. Good Features to Correlate for Visual Tracking. IEEE Transactions on Image Processing 2018; 27:2526-2540. [PMID: 29994635] [DOI: 10.1109/tip.2018.2806280]
Abstract
In recent years, correlation filters have shown dominant and spectacular results for visual object tracking. The types of features employed in this family of trackers significantly affect the performance of visual tracking. The ultimate goal is to utilize robust features invariant to any kind of appearance change of the object, while predicting the object location as accurately as in the case of no appearance change. As deep learning-based methods have emerged, the study of learning features for specific tasks has accelerated. For instance, discriminative visual tracking methods based on deep architectures have been studied with promising performance. Nevertheless, correlation filter-based (CFB) trackers confine themselves to pre-trained networks trained for the object classification problem. To this end, this manuscript formulates the problem of learning deep fully convolutional features for CFB visual tracking. In order to learn the proposed model, a novel and efficient backpropagation algorithm is presented based on the loss function of the network. The proposed learning framework enables the network model to be flexible for a custom design. Moreover, it alleviates the dependency on networks trained for classification. Extensive performance analysis shows the efficacy of the proposed custom design in the CFB tracking framework. By fine-tuning the convolutional parts of a state-of-the-art network and integrating this model into a CFB tracker, the top performer of VOT2016, an 18% increase is achieved in terms of expected average overlap and tracking failures are decreased by 25%, while maintaining superiority over the state-of-the-art methods on the OTB-2013 and OTB-2015 tracking datasets.
47
Shi G, Xu T, Guo J, Luo J, Li Y. Consistently Sampled Correlation Filters with Space Anisotropic Regularization for Visual Tracking. Sensors 2017; 17:2889. [PMID: 29231876] [PMCID: PMC5750837] [DOI: 10.3390/s17122889]
Abstract
Most existing correlation filter-based tracking algorithms, which use fixed patches and cyclic shifts as training and detection measures, assume that the training samples are reliable and ignore the inconsistencies between training samples and detection samples. We propose to construct and study a consistently sampled correlation filter with space anisotropic regularization (CSSAR) to solve these two problems simultaneously. Our approach constructs a spatiotemporally consistent sample strategy to alleviate the redundancies in training samples caused by the cyclical shifts, eliminate the inconsistencies between training samples and detection samples, and introduce space anisotropic regularization to constrain the correlation filter for alleviating drift caused by occlusion. Moreover, an optimization strategy based on the Gauss-Seidel method was developed for obtaining robust and efficient online learning. Both qualitative and quantitative evaluations demonstrate that our tracker outperforms state-of-the-art trackers in object tracking benchmarks (OTBs).
Affiliation(s)
- Guokai Shi
  - School of Optoelectronics, Image Engineering & Video Technology Lab, Beijing Institute of Technology, Beijing 100081, China
- Tingfa Xu
  - School of Optoelectronics, Image Engineering & Video Technology Lab, Beijing Institute of Technology, Beijing 100081, China
  - Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education of China, Beijing 100081, China
- Jie Guo
  - School of Optoelectronics, Image Engineering & Video Technology Lab, Beijing Institute of Technology, Beijing 100081, China
- Jiqiang Luo
  - School of Optoelectronics, Image Engineering & Video Technology Lab, Beijing Institute of Technology, Beijing 100081, China
- Yuankun Li
  - School of Optoelectronics, Image Engineering & Video Technology Lab, Beijing Institute of Technology, Beijing 100081, China
48
Wu T, Lu Y, Zhu SC. Online Object Tracking, Learning and Parsing with And-Or Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017; 39:2465-2480. [PMID: 28026751] [DOI: 10.1109/tpami.2016.2644963]
Abstract
This paper presents a method, called AOGTracker, for simultaneous tracking, learning and parsing (TLP) of unknown objects in video sequences with a hierarchical and compositional And-Or graph (AOG) representation. The TLP method is formulated in the Bayesian framework, with spatial and temporal dynamic programming (DP) algorithms inferring object bounding boxes on-the-fly. During online learning, the AOG is discriminatively learned using latent SVM [1] to account for appearance variations (e.g., lighting and partial occlusion) and structural variations (e.g., different poses and viewpoints) of a tracked object, as well as distractors (e.g., similar objects) in the background. Three key issues in online inference and learning are addressed: (i) maintaining the purity of positive and negative examples collected online, (ii) controlling model complexity in latent structure learning, and (iii) identifying critical moments to re-learn the structure of the AOG based on its intrackability. The intrackability measures the uncertainty of an AOG based on its score maps in a frame. In experiments, our AOGTracker is tested on two popular tracking benchmarks with the same parameter setting: the TB-100/50/CVPR2013 benchmarks [3] and the VOT benchmarks [4] (VOT2013, 2014, 2015 and TIR2015, thermal imagery tracking). On the former, our AOGTracker outperforms state-of-the-art tracking algorithms, including two trackers based on deep convolutional networks [5], [6]. On the latter, our AOGTracker outperforms all other trackers in VOT2013 and is comparable to the state-of-the-art methods in VOT2014, 2015 and TIR2015.
49
Chen Z, You X, Zhong B, Li J, Tao D. Dynamically Modulated Mask Sparse Tracking. IEEE Transactions on Cybernetics 2017; 47:3706-3718. [PMID: 28113386] [DOI: 10.1109/tcyb.2016.2577718]
Abstract
Visual tracking is a critical task in many computer vision applications such as surveillance and robotics. However, although robustness to local corruptions has been improved, prevailing trackers are still sensitive to large-scale corruptions, such as occlusions and illumination variations. In this paper, we propose a novel robust object tracking technique that depends on a subspace learning-based appearance model. Our contributions are twofold. First, mask templates produced by frame differencing are introduced into our template dictionary. Since the mask templates contain abundant structural information about corruptions, the model can encode information about the corruptions on the object more efficiently. Meanwhile, the robustness of the tracker is further enhanced by adopting system dynamics, which consider the moving tendency of the object. Second, we provide a theoretical guarantee that, by adapting the modulated template dictionary system, our new sparse model can be solved by the accelerated proximal gradient algorithm as efficiently as in traditional sparse tracking methods. Extensive experimental evaluations demonstrate that our method significantly outperforms 21 other cutting-edge algorithms in both speed and tracking accuracy, especially under challenges such as pose variation, occlusion, and illumination changes.
50
Gu X, Wu S, Peng P, Shou L, Chen K, Chen G. CSIR4G: An effective and efficient cross-scenario image retrieval model for glasses. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.07.027]