1. Yuan D, Chang X, Liu Q, Yang Y, Wang D, Shu M, He Z, Shi G. Active Learning for Deep Visual Tracking. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13284-13296. [PMID: 37163401] [DOI: 10.1109/tnnls.2023.3266837]
Abstract
Convolutional neural networks (CNNs) have been successfully applied to the single-target tracking task in recent years. Generally, training a deep CNN model requires numerous labeled training samples, and the number and quality of these samples directly affect the representational capability of the trained model. However, this approach is restrictive in practice, because manually labeling such a large number of training samples is time-consuming and prohibitively expensive. In this article, we propose an active learning method for deep visual tracking, which selects and annotates unlabeled samples to train the deep CNN model. Under the guidance of active learning, the tracker based on the trained deep CNN model can achieve competitive tracking performance while reducing the labeling cost. More specifically, to ensure the diversity of selected samples, we propose an active learning method based on multiframe collaboration to select the training samples that most need to be annotated. Meanwhile, considering the representativeness of these selected samples, we adopt a nearest-neighbor discrimination method based on the average nearest-neighbor distance to screen out isolated and low-quality samples. Therefore, the subset of training samples selected by our method requires only a given budget to maintain the diversity and representativeness of the entire sample set. Furthermore, we adopt a Tversky loss to improve the bounding box estimation of our tracker, ensuring that it produces more accurate target states. Extensive experimental results confirm that our active-learning-based tracker (ALT) achieves competitive tracking accuracy and speed compared with state-of-the-art trackers on seven challenging evaluation benchmarks. Project website: https://sites.google.com/view/altrack/.
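The Tversky loss named in this abstract has a standard closed form, TI = TP / (TP + alpha * FN + beta * FP) with loss 1 - TI. A minimal NumPy sketch follows; this is a generic formulation, not the authors' tracking-specific implementation, and note that the alpha/beta convention varies between papers:

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Tversky loss for binary masks/probabilities in [0, 1].

    Here alpha weights soft false negatives and beta weights soft false
    positives; alpha = beta = 0.5 reduces to the Dice loss.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    tp = np.sum(pred * target)           # soft true positives
    fn = np.sum((1.0 - pred) * target)   # soft false negatives
    fp = np.sum(pred * (1.0 - target))   # soft false positives
    tversky_index = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return 1.0 - tversky_index
```

A perfect prediction gives a loss near 0, while a fully wrong one approaches 1; raising alpha above beta penalizes missed foreground more than false alarms.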
2. Dai W, Liu R, Wu T, Wang M, Yin J, Liu J. Deeply Supervised Skin Lesions Diagnosis With Stage and Branch Attention. IEEE J Biomed Health Inform 2024; 28:719-729. [PMID: 37624725] [DOI: 10.1109/jbhi.2023.3308697]
Abstract
Accurate and unbiased examination of skin lesions is critical for the early diagnosis and treatment of skin diseases. Visual features of skin lesions vary significantly because images are collected from patients with different lesion colours and morphologies using dissimilar imaging equipment. Recent studies have reported that ensembled convolutional neural networks (CNNs) are effective for classifying such images for early diagnosis of skin disorders. However, the practical use of these ensembled CNNs is limited, as the networks are heavyweight and inadequate for processing contextual information. Although lightweight networks (e.g., MobileNetV3 and EfficientNet) have been developed to reduce parameter counts for deploying deep neural networks on mobile devices, insufficient depth of feature representation restricts their performance. To address these limitations, we develop a new lightweight and effective neural network, HierAttn. HierAttn applies a novel deep supervision strategy to learn local and global features through multi-stage and multi-branch attention mechanisms with only one training loss. The efficacy of HierAttn was evaluated on the dermoscopy image dataset ISIC2019 and the smartphone photo dataset PAD-UFES-20 (PAD2020). The experimental results show that HierAttn achieves the best accuracy and area under the curve (AUC) among the state-of-the-art lightweight networks.
3. Liu R, Liu T, Dan T, Yang S, Li Y, Luo B, Zhuang Y, Fan X, Zhang X, Cai H, Teng Y. AIDMAN: An AI-based object detection system for malaria diagnosis from smartphone thin-blood-smear images. Patterns (New York, N.Y.) 2023; 4:100806. [PMID: 37720337] [PMCID: PMC10499858] [DOI: 10.1016/j.patter.2023.100806]
Abstract
Malaria is a significant public health concern, with ∼95% of cases occurring in Africa, but accurate and timely diagnosis is problematic in remote and low-income areas. Here, we developed an artificial intelligence-based object detection system for malaria diagnosis (AIDMAN). In this system, the YOLOv5 model is used to detect cells in a thin blood smear. An attentional aligner model (AAM) is then applied for cellular classification that consists of multi-scale features, a local context aligner, and multi-scale attention. Finally, a convolutional neural network classifier is applied for diagnosis using blood-smear images, reducing interference caused by false positive cells. The results demonstrate that AIDMAN handles interference well, with a diagnostic accuracy of 98.62% for cells and 97% for blood-smear images. The prospective clinical validation accuracy of 98.44% is comparable to that of microscopists. AIDMAN shows clinically acceptable detection of malaria parasites and could aid malaria diagnosis, especially in areas lacking experienced parasitologists and equipment.
Affiliation(s)
- Ruicun Liu: State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Tuoyu Liu: State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Tingting Dan: School of Computer Science and Engineering, South China University of Technology, Guangzhou 510600, China
- Shan Yang: State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Yanbing Li: State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Boyu Luo: State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Yingtan Zhuang: State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Xinyue Fan: State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Xianchao Zhang: Key Laboratory of Medical Electronics and Digital Health of Zhejiang Province, Jiaxing University, Jiaxing 314001, China; Engineering Research Center of Intelligent Human Health Situation Awareness of Zhejiang Province, Jiaxing University, Jiaxing 314001, China
- Hongmin Cai: School of Computer Science and Engineering, South China University of Technology, Guangzhou 510600, China
- Yue Teng: State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
4. Xu T, Feng Z, Wu XJ, Kittler J. Towards Robust Visual Object Tracking with Independent Target-Agnostic Detection and Effective Siamese Cross-Task Interaction. IEEE Transactions on Image Processing 2023; PP:1541-1554. [PMID: 37027596] [DOI: 10.1109/tip.2023.3246800]
Abstract
Advanced Siamese visual object tracking architectures are jointly trained using pair-wise input images to perform target classification and bounding box regression. They have achieved promising results in recent benchmarks and competitions. However, the existing methods suffer from two limitations: First, though the Siamese structure can estimate the target state in an instance frame, provided the target appearance does not deviate too much from the template, the detection of the target in an image cannot be guaranteed in the presence of severe appearance variations. Second, despite the classification and regression tasks sharing the same output from the backbone network, their specific modules and loss functions are invariably designed independently, without promoting any interaction. Yet, in a general tracking task, the centre classification and bounding box regression tasks are collaboratively working to estimate the final target location. To address the above issues, it is essential to perform target-agnostic detection so as to promote cross-task interactions in a Siamese-based tracking framework. In this work, we endow a novel network with a target-agnostic object detection module to complement the direct target inference, and to avoid or minimise the misalignment of the key cues of potential template-instance matches. To unify the multi-task learning formulation, we develop a cross-task interaction module to ensure consistent supervision of the classification and regression branches, improving the synergy of different branches. To eliminate potential inconsistencies that may arise within a multi-task architecture, we assign adaptive labels, rather than fixed hard labels, to supervise the network training more effectively. 
The experimental results obtained on several benchmarks, i.e., OTB100, UAV123, VOT2018, VOT2019, and LaSOT, demonstrate the effectiveness of the advanced target detection module, as well as the cross-task interaction, exhibiting superior tracking performance as compared with the state-of-the-art tracking methods.
5. Wei B, Chen H, Cao S, Ding Q, Luo H. An IoU-aware Siamese network for real-time visual tracking. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.041]
6. Fan N, Liu Q, Li X, Zhou Z, He Z. Siamese Residual Network for Efficient Visual Tracking. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2022.12.082]
7. Nai K, Li Z, Gan Y, Wang Q. Robust Visual Tracking via Multitask Sparse Correlation Filters Learning. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:502-515. [PMID: 34310327] [DOI: 10.1109/tnnls.2021.3097498]
Abstract
In this article, a novel multitask sparse correlation filters (MTSCF) model, which introduces multitask sparse learning into the correlation filter (CF) framework, is proposed for visual tracking. Specifically, the proposed MTSCF method exploits multitask learning to take the interdependencies among different visual features (e.g., histogram of oriented gradients (HOG), color names, and CNN features) into account, simultaneously learning the CFs so that the learned filters enhance and complement each other to boost tracking performance. Moreover, it performs feature selection to dynamically select discriminative spatial features from the target region to distinguish the target object from the background. An l2,1 regularization term is adopted to realize multitask sparse learning. To solve the objective model, the alternating direction method of multipliers (ADMM) is used to learn the CFs. By incorporating multitask sparse learning, the proposed MTSCF model can fully exploit the strength of different visual features and select effective spatial features to better model the appearance of the target object. Extensive experimental results on multiple tracking benchmarks demonstrate that our MTSCF tracker achieves competitive performance in comparison with several state-of-the-art trackers.
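The l2,1 regularizer this abstract relies on is simply the sum of row-wise l2 norms, and its proximal operator (the row-shrinkage step that typically appears inside an ADMM solver for such objectives) has a closed form. A generic NumPy sketch, not the MTSCF solver itself:

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum of the l2 norms of the rows of W.

    Penalizing ||W||_{2,1} drives entire rows of W to zero, which gives
    the row-sparse, feature-selecting effect described in the abstract.
    """
    return float(np.sum(np.linalg.norm(W, axis=1)))

def row_shrink(W, tau):
    """Proximal operator of tau * ||.||_{2,1}: rows with l2 norm below
    tau are zeroed, the rest are shrunk toward zero. This is the kind of
    update step an ADMM solver would apply for the l2,1 term."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return W * scale
```

For example, shrinking a matrix whose second row has norm below tau zeroes that row entirely while only mildly scaling the larger rows.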
8. Shen J, Liu Y, Dong X, Lu X, Khan FS, Hoi S. Distilled Siamese Networks for Visual Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:8896-8909. [PMID: 34762585] [DOI: 10.1109/tpami.2021.3127492]
Abstract
In recent years, Siamese network based trackers have significantly advanced the state-of-the-art in real-time tracking. Despite their success, Siamese trackers tend to suffer from high memory costs, which restrict their applicability to mobile devices with tight memory budgets. To address this issue, we propose a distilled Siamese tracking framework to learn small, fast and accurate trackers (students), which capture critical knowledge from large Siamese trackers (teachers) by a teacher-students knowledge distillation model. This model is intuitively inspired by the one teacher versus multiple students learning method typically employed in schools. In particular, our model contains a single teacher-student distillation module and a student-student knowledge sharing mechanism. The former is designed using a tracking-specific distillation strategy to transfer knowledge from a teacher to students. The latter is utilized for mutual learning between students to enable in-depth knowledge understanding. Extensive empirical evaluations on several popular Siamese trackers demonstrate the generality and effectiveness of our framework. Moreover, the results on five tracking benchmarks show that the proposed distilled trackers achieve compression rates of up to 18× and frame-rates of 265 FPS, while obtaining comparable tracking accuracy compared to base models.
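The teacher-student transfer described here builds on the standard soft-label distillation loss of Hinton et al.: a temperature-softened KL divergence between teacher and student outputs. A generic NumPy sketch, not the paper's tracking-specific distillation strategy or its student-student sharing mechanism:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax with a max-shift for numerical stability."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label distillation term: KL(teacher || student) at temperature T,
    scaled by T^2 so gradients keep a comparable magnitude as T grows."""
    p = softmax(teacher_logits, T)   # softened teacher distribution
    q = softmax(student_logits, T)   # softened student distribution
    return (T ** 2) * float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

In practice this term is mixed with the ordinary hard-label loss; the loss is zero when the student reproduces the teacher's logits exactly and positive otherwise.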
9. Liu Y, Lian L, Zhang E, Xu L, Xiao C, Zhong X, Li F, Jiang B, Dong Y, Ma L, Huang Q, Xu M, Zhang Y, Yu D, Yan C, Qin P. Mixed-UNet: Refined class activation mapping for weakly-supervised semantic segmentation with multi-scale inference. Frontiers in Computer Science 2022. [DOI: 10.3389/fcomp.2022.1036934]
Abstract
Deep learning techniques have shown great potential in medical image processing, particularly through accurate and reliable image segmentation on magnetic resonance imaging (MRI) scans or computed tomography (CT) scans, which allow the localization and diagnosis of lesions. However, training these segmentation models requires a large number of manually annotated pixel-level labels, which are time-consuming and labor-intensive, in contrast to image-level labels that are easier to obtain. It is imperative to resolve this problem through weakly-supervised semantic segmentation models using image-level labels as supervision since it can significantly reduce human annotation efforts. Most of the advanced solutions exploit class activation mapping (CAM). However, the original CAMs rarely capture the precise boundaries of lesions. In this study, we propose the strategy of multi-scale inference to refine CAMs by reducing the detail loss in single-scale reasoning. For segmentation, we develop a novel model named Mixed-UNet, which has two parallel branches in the decoding phase. The results can be obtained after fusing the extracted features from two branches. We evaluate the designed Mixed-UNet against several prevalent deep learning-based segmentation approaches on our dataset collected from the local hospital and public datasets. The validation results demonstrate that our model surpasses available methods under the same supervision level in the segmentation of various lesions from brain imaging.
10. Ruan W, Ye M, Wu Y, Liu W, Chen J, Liang C, Li G, Lin CW. TICNet: A Target-Insight Correlation Network for Object Tracking. IEEE Transactions on Cybernetics 2022; 52:12150-12162. [PMID: 34033563] [DOI: 10.1109/tcyb.2021.3070677]
Abstract
Recently, the correlation filter (CF) and Siamese network have become the two most popular frameworks in object tracking. Existing CF trackers, however, are limited by feature learning and context usage, making them sensitive to boundary effects. In contrast, Siamese trackers can easily suffer from the interference of semantic distractors. To address the above problems, we propose an end-to-end target-insight correlation network (TICNet) for object tracking, which aims at breaking the above limitations on top of a unified network. TICNet is an asymmetric dual-branch network involving a target-background awareness model (TBAM), a spatial-channel attention network (SCAN), and a distractor-aware filter (DAF) for end-to-end learning. Specifically, TBAM aims to distinguish a target from the background in the pixel level, yielding a target likelihood map based on color statistics to mine distractors for DAF learning. SCAN consists of a basic convolutional network, a channel-attention network, and a spatial-attention network, aiming to generate attentive weights to enhance the representation learning of the tracker. Especially, we formulate a differentiable DAF and employ it as a learnable layer in the network, thus helping suppress distracting regions in the background. During testing, DAF, together with TBAM, yields a response map for the final target estimation. Extensive experiments on seven benchmarks demonstrate that TICNet outperforms the state-of-the-art methods while running at real-time speed.
11. Gao L, Liu P, Ning J, Li Y. Visual object tracking via non-local correlation attention learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109666]
12. Wang X, Chen Z, Jiang B, Tang J, Luo B, Tao D. Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-Based Beam Search. IEEE Transactions on Image Processing 2022; 31:6239-6254. [PMID: 36166563] [DOI: 10.1109/tip.2022.3208437]
Abstract
To track the target in a video, current visual trackers usually adopt greedy search for target object localization in each frame; that is, the candidate region with the maximum response score is selected as the tracking result of each frame. However, we found that this may not be an optimal choice, especially in challenging tracking scenarios such as heavy occlusion and fast motion. In particular, if a tracker drifts, errors accumulate and make the response scores estimated by the tracker unreliable in future frames. To address this issue, we propose to maintain multiple tracking trajectories and apply a beam search strategy for visual tracking, so that the trajectory with fewer accumulated errors can be identified. Accordingly, this paper introduces a novel multi-agent reinforcement learning based beam search tracking strategy, termed BeamTracking. It is mainly inspired by the image captioning task, which takes an image as input and generates diverse descriptions using a beam search algorithm. We formulate tracking as a sample selection problem fulfilled by multiple parallel decision-making processes, each of which aims at picking out one sample as its tracking result in each frame. Each maintained trajectory is associated with an agent that performs the decision making and determines what actions should be taken to update related information. More specifically, using a classification-based tracker as the baseline, we first adopt a bi-GRU to encode the target feature, proposal feature, and its response score into a unified state representation. The state feature and greedy search result are then fed into the first agent for independent action selection. Afterwards, the output action and state features are fed into subsequent agents for diverse result prediction. When all the frames are processed, we select the trajectory with the maximum accumulated score as the tracking result. Extensive experiments on seven popular tracking benchmark datasets validate the effectiveness of the proposed algorithm.
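The contrast with greedy search can be made concrete with a plain beam search over scored trajectories. A toy sketch in which `candidates_fn` and `score_fn` are hypothetical stand-ins for the tracker's proposal generation and response scoring (beam_width=1 reduces to greedy search):

```python
def beam_search(init_state, candidates_fn, score_fn, steps, beam_width=3):
    """Keep the beam_width trajectories with the highest accumulated
    score instead of greedily committing to the single best candidate
    at each step. Generic sketch, not the paper's RL-based variant."""
    beams = [([init_state], 0.0)]  # (trajectory, accumulated score)
    for _ in range(steps):
        expanded = []
        for path, acc in beams:
            for cand in candidates_fn(path[-1]):
                expanded.append((path + [cand], acc + score_fn(cand)))
        # prune to the top-scoring trajectories
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
    return max(beams, key=lambda b: b[1])
```

On a toy graph where the locally best first step leads to a dead end, a beam of width 2 recovers the globally best trajectory while width 1 (greedy) commits to the wrong branch, which is exactly the failure mode the abstract describes.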
13. Chen D, Chen L, Zhang Y, Wen B, Yang C. A Multiscale Interactive Recurrent Network for Time-Series Forecasting. IEEE Transactions on Cybernetics 2022; 52:8793-8803. [PMID: 33710967] [DOI: 10.1109/tcyb.2021.3055951]
Abstract
Time-series forecasting is a key component in the automation and optimization of intelligent applications. It is not a trivial task, as there are various short-term and/or long-term temporal dependencies. Multiscale modeling has been considered as a promising strategy to solve this problem. However, the existing multiscale models either apply an implicit way to model the temporal dependencies or ignore the interrelationships between multiscale subseries. In this article, we propose a multiscale interactive recurrent network (MiRNN) to jointly capture multiscale patterns. MiRNN employs a deep wavelet decomposition network to decompose the raw time series into multiscale subseries. MiRNN introduces three key strategies (truncation, initialization, and message passing) to model the inherent interrelationships between multiscale subseries, as well as a dual-stage attention mechanism to capture multiscale temporal dependencies. Experiments on four real-world datasets demonstrate that our model achieves promising performance compared with the state-of-the-art methods.
14. Song D, Nie WZ, Li WH, Kankanhalli M, Liu AA. Monocular Image-Based 3-D Model Retrieval: A Benchmark. IEEE Transactions on Cybernetics 2022; 52:8114-8127. [PMID: 33531330] [DOI: 10.1109/tcyb.2021.3051016]
Abstract
Monocular image-based 3-D model retrieval aims to search for relevant 3-D models from a dataset given one RGB image captured in the real world, which can significantly benefit several applications, such as self-service checkout, online shopping, etc. To help advance this promising yet challenging research topic, we built a novel dataset and organized the first international contest for monocular image-based 3-D model retrieval. Moreover, we conduct a thorough analysis of the state-of-the-art methods. Existing methods can be classified into supervised and unsupervised methods. The supervised methods can be analyzed based on several important aspects, such as the strategies of domain adaptation, view fusion, loss function, and similarity measure. The unsupervised methods focus on solving this problem with unlabeled data and domain adaptation. Seven popular metrics are employed to evaluate the performance, and accordingly, we provide a thorough analysis and guidance for future work. To the best of our knowledge, this is the first benchmark for monocular image-based 3-D model retrieval, which aims to help related research in multiview feature learning, domain adaptation, and information retrieval.
15. Huang B, Xu T, Shen Z, Jiang S, Zhao B, Bian Z. SiamATL: Online Update of Siamese Tracking Network via Attentional Transfer Learning. IEEE Transactions on Cybernetics 2022; 52:7527-7540. [PMID: 33417585] [DOI: 10.1109/tcyb.2020.3043520]
Abstract
Visual object tracking with semantic deep features has recently attracted much attention in computer vision. In particular, Siamese trackers, which learn a similarity evaluation for decision making, are widely used in the tracking community. However, online updating of Siamese trackers remains a tricky issue because of the inherent tradeoff between model adaptation and degradation. To address this issue, in this article, we propose a novel attentional transfer learning-based Siamese network (SiamATL), which fully exploits previous knowledge to guide the current tracker's learning in the decision-making module. First, we explicitly model the template and surroundings by using an attentional online update strategy to avoid template pollution. Then, we introduce an instance-transfer discriminative correlation filter (ITDCF) to enhance the distinguishing ability of the tracker. Finally, we propose a mutual compensation mechanism that integrates cross-correlation matching and ITDCF detection into the decision-making subnetwork to achieve online tracking. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art tracking algorithms on multiple large-scale tracking datasets.
16. Teacher-student knowledge distillation for real-time correlation tracking. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.064]
17. Bai L, Shao YH, Wang Z, Chen WJ, Deng NY. Multiple Flat Projections for Cross-Manifold Clustering. IEEE Transactions on Cybernetics 2022; 52:7704-7718. [PMID: 33523821] [DOI: 10.1109/tcyb.2021.3050487]
Abstract
Cross-manifold clustering is an extremely challenging learning problem. Since the low-density hypothesis is not satisfied in cross-manifold problems, many traditional clustering methods fail to discover cross-manifold structures. In this article, we propose multiple flat projections clustering (MFPC) for cross-manifold clustering. In MFPC, the given samples are projected into multiple localized flats to discover the global structures of implicit manifolds, so that intersected clusters can be distinguished in the various projection flats. The resulting series of nonconvex matrix optimization problems is solved by a proposed recursive algorithm. Furthermore, a nonlinear version of MFPC is derived via the kernel trick to deal with more complex cross-manifold learning situations. Synthetic tests show that our MFPC handles cross-manifold structures well. Moreover, experimental results on benchmark datasets and object tracking videos show the excellent performance of MFPC compared with some state-of-the-art manifold clustering methods.
18. Tan K, Xu TB, Wei Z. IMSiam: IoU-aware Matching-adaptive Siamese network for object tracking. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.003]
19. Li J, Chen Z, Zhong Y, Lam HK, Han J, Ouyang G, Li X, Liu H. Appearance-Based Gaze Estimation for ASD Diagnosis. IEEE Transactions on Cybernetics 2022; 52:6504-6517. [PMID: 35468077] [DOI: 10.1109/tcyb.2022.3165063]
Abstract
Biomarkers such as magnetic resonance imaging (MRI) and electroencephalography have been used to help diagnose autism spectrum disorder (ASD). However, such diagnosis requires the assistance of specialized medical equipment in a hospital or laboratory. To diagnose ASD in a more effective and convenient way, in this article we propose an appearance-based gaze estimation algorithm, AttentionGazeNet, to accurately estimate the subject's 3-D gaze from raw video. The experimental results show its competitive performance on the MPIIGaze dataset and improvements of 14.7% for static head pose and 46.7% for moving head pose on the EYEDIAP dataset compared with state-of-the-art gaze estimation algorithms. After projecting the obtained gaze vector onto the screen coordinate system, we apply an accumulated histogram to take into account both spatial and temporal information of the estimated gaze-point and head-pose sequences. Finally, classification is conducted on our self-collected autistic children video dataset (ACVD), which contains 405 videos from 135 different ASD children, 135 typically developing (TD) children in a primary school, and 135 TD children in a kindergarten. The classification results on ACVD show the effectiveness and efficiency of our proposed method, with 94.8% accuracy, 91.1% sensitivity, and 96.7% specificity for ASD.
20. Huang Y, Li Y, Heyes T, Jourjon G, Cheng A, Seneviratne S, Thilakarathna K, Webb D, Xu RYD. Task adaptive siamese neural networks for open-set recognition of encrypted network traffic with bidirectional dropout. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.05.011]
21. Monowar MM, Hamid MA, Ohi AQ, Alassafi MO, Mridha MF. AutoRet: A Self-Supervised Spatial Recurrent Network for Content-Based Image Retrieval. Sensors 2022; 22:2188. [PMID: 35336358] [PMCID: PMC8954462] [DOI: 10.3390/s22062188]
Abstract
Image retrieval techniques are becoming increasingly important due to the vast availability of multimedia data. Current image retrieval systems perform excellently on labeled data. However, data labeling is often costly and sometimes impossible, so self-supervised and unsupervised learning strategies are gaining prominence. Most self- or unsupervised strategies are sensitive to the number of classes and cannot incorporate labeled data when it is available. In this paper, we introduce AutoRet, a deep convolutional neural network (DCNN) based self-supervised image retrieval system. The system is trained on pairwise constraints; therefore, it can work in self-supervision and can also be trained on a partially labeled dataset. The overall strategy includes a DCNN that extracts embeddings from multiple patches of an image. The embeddings are then fused to obtain richer information for the image retrieval process. The method is benchmarked on three different datasets. The overall benchmark shows that the proposed method works well in a self-supervised manner, and the evaluation further shows highly convincing performance when a small portion of labeled data is mixed in where available.
Affiliation(s)
- Muhammad Mostafa Monowar
- Department of Information Technology, Faculty of Computing & Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Correspondence:
- Md. Abdul Hamid
- Department of Information Technology, Faculty of Computing & Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Abu Quwsar Ohi
- Department of Computer Science & Engineering, Bangladesh University of Business & Technology, Dhaka 1216, Bangladesh
- Madini O. Alassafi
- Department of Information Technology, Faculty of Computing & Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- M. F. Mridha
- Department of Computer Science, American International University-Bangladesh, Dhaka 1229, Bangladesh
22
Zeng Y, Zeng B, Yin X, Chen G. SiamPCF: siamese point regression with coarse-fine classification network for visual tracking. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02651-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
23
Li J, Wang H, Wu K, Liu C, Tan J. Cross-attention-map-based regularization for adversarial domain adaptation. Neural Netw 2021; 145:128-138. [PMID: 34735891 DOI: 10.1016/j.neunet.2021.10.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2021] [Revised: 09/01/2021] [Accepted: 10/14/2021] [Indexed: 11/18/2022]
Abstract
In unsupervised domain adaptation (UDA), much effort goes into pulling the source domain and the target domain closer through adversarial training. Most methods focus on aligning distributions or features between the source domain and the target domain. However, little attention is paid to interaction at finer-grained levels, such as between classes or samples of the two domains. In contrast to UDA, another transfer learning task, few-shot learning (FSL), takes full advantage of finer-grained-level alignment. Many FSL methods implement interaction between samples of support sets and query sets, leading to significant improvements. We ask whether such ideas from FSL can be brought to UDA. To this end, we first take a closer look at the differences between FSL and UDA and bridge the gap between them by high-confidence sample selection (HCSS). We then propose a cross-attention map generation module (CAMGM) to let the samples selected by HCSS interact. Moreover, we propose a simple but efficient method called cross-attention-map-based regularization (CAMR) to regularize the feature maps generated by the feature extractor. Experiments on three challenging datasets demonstrate that CAMR brings solid improvements when added to the original objective. More specifically, the proposed CAMR outperforms the original methods by 1% to 2% in most tasks without bells and whistles.
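The high-confidence sample selection (HCSS) step lends itself to a short sketch. The abstract does not spell out the selection rule, so the version below assumes a simple confidence threshold on softmax outputs with the arg-max class taken as the pseudo-label; the name `hcss` and the threshold `tau` are hypothetical.

```python
import numpy as np

def hcss(probs, tau=0.9):
    """Keep target-domain samples whose top predicted class probability
    exceeds tau; return their indices and pseudo-labels."""
    conf = probs.max(axis=1)
    keep = conf >= tau
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]

# Three target samples, three classes: only the confident predictions
# (rows 0 and 2) survive the selection.
probs = np.array([[0.95, 0.03, 0.02],
                  [0.40, 0.35, 0.25],
                  [0.05, 0.92, 0.03]])
idx, pseudo = hcss(probs, tau=0.9)
```

The selected, pseudo-labeled samples are what a finer-grained module like CAMGM could then operate on, which is the bridge to FSL-style sample interaction the abstract describes.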
Affiliation(s)
- Jingwei Li
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
- Huanjie Wang
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
- Ke Wu
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
- Chengbao Liu
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Jie Tan
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
24
Xiao N, Zhang L, Xu X, Guo T, Ma H. Label Disentangled Analysis for unsupervised visual domain adaptation. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107309] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
25
Wang Y, Chen X, Mao Z, Yan J. Object Tracking Based on Global Context Attention. INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE 2021. [DOI: 10.4018/ijcini.287595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Previous research has shown that tracking algorithms unable to capture long-distance information tend to lose the object when it deforms, the illumination changes, or the background contains similar distractor objects. To remedy this, this article proposes an object-tracking method that introduces a Global Context attention module into the Multi-Domain Network (MDNet) tracker. The method learns a robust feature representation of the object through the Global Context attention module, so the background can be better distinguished from the object in the presence of interference. Extensive experiments on the OTB2013, OTB2015, and UAV20L datasets show that the proposed method improves significantly on MDNet and is competitive with mainstream tracking algorithms. The method achieves particularly good results when the video sequence contains object deformation, illumination change, or background interference from similar objects.
Affiliation(s)
- Yucheng Wang
- School of Computer Science, Wuhan University, China
- Xi Chen
- School of Computer Science, Wuhan University, China
- Zhongjie Mao
- School of Computer Science, Wuhan University, China
- Jia Yan
- Department of Electrical Engineering, School of Electronic Information, Wuhan University, China
26
Sun D, Wang X, Lin Y, Yang T, Wu S. Introducing Depth Information Into Generative Target Tracking. Front Neurorobot 2021; 15:718681. [PMID: 34539372 PMCID: PMC8442731 DOI: 10.3389/fnbot.2021.718681] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2021] [Accepted: 07/19/2021] [Indexed: 11/23/2022] Open
Abstract
Common visual features used in target tracking, including colour and grayscale, are prone to failure against a confusingly similar-looking background. As three-dimensional visual information acquisition has gradually gained ground in recent years, the conditions for the wide use of depth information in target tracking have become available. This study discusses possible ways to introduce depth information into generative target tracking methods based on kernel density estimation, and the performance of each method of introduction, thereby providing a reference for the use of depth information in practical target tracking systems. First, the mean-shift technical framework, a typical algorithm for generative target tracking, is analysed, and four methods of introducing depth information are proposed: thresholding of the data source, thresholding of the density distribution of the applied dataset, weighting of the data source, and weighting of the density distribution of the dataset. An experimental study evaluating the validity, characteristics, and advantages of each method is then described. The experimental results showed that all four methods improve the validity of the basic method to a certain extent and meet the requirements of real-time target tracking against a confusingly similar background. Weighting the density distribution of the dataset with depth information is the prime choice in engineering practice because it delivers excellent comprehensive performance and the highest accuracy, whereas thresholding of either the data source or the density distribution of the dataset is less time-consuming. A comparison with a state-of-the-art tracker further verifies the practicality of the proposed approach. Finally, the results also provide a reference for introducing depth information into other target tracking methods.
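One of the four introduction methods, weighting of the data source, can be sketched as a mean-shift iteration whose per-sample weights combine a spatial kernel with a depth-similarity weight. This is an illustrative reading of the abstract, not the authors' implementation; all function and parameter names are assumptions.

```python
import numpy as np

def depth_weighted_mean_shift(points, depths, start, target_depth,
                              bandwidth=1.0, depth_sigma=0.5, iters=20):
    """Mean-shift mode seeking where each sample's contribution is
    scaled by how close its depth is to the tracked target's depth."""
    x = start.astype(float)
    for _ in range(iters):
        spatial_w = np.exp(-np.sum((points - x) ** 2, axis=1)
                           / (2 * bandwidth ** 2))
        depth_w = np.exp(-((depths - target_depth) ** 2)
                         / (2 * depth_sigma ** 2))
        w = spatial_w * depth_w
        x = (w[:, None] * points).sum(axis=0) / w.sum()
    return x

# Foreground cluster at depth ~1 m around (0, 0); a similar-looking
# distractor cluster at depth ~3 m around (4, 4) is suppressed by the
# depth weight, so the iteration converges onto the foreground mode.
rng = np.random.default_rng(0)
fg = rng.normal([0, 0], 0.3, size=(50, 2))
bg = rng.normal([4, 4], 0.3, size=(50, 2))
pts = np.vstack([fg, bg])
dep = np.concatenate([np.full(50, 1.0), np.full(50, 3.0)])
mode = depth_weighted_mean_shift(pts, dep, start=np.array([2.0, 2.0]),
                                 target_depth=1.0)
```

Without the depth weight, a start point midway between the two clusters could be pulled toward the distractor; the depth term is what breaks the tie, which is the failure mode against similar-looking backgrounds that the study targets.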
Affiliation(s)
- Dongyue Sun
- School of Mechanical Engineering, Hunan University of Science and Technology, Xiangtan, China
- Xian Wang
- School of Mechanical Engineering, Hunan University of Science and Technology, Xiangtan, China
- Yonghong Lin
- School of Mechanical Engineering, Hunan University of Science and Technology, Xiangtan, China
- Tianlong Yang
- School of Mechanical Engineering, Hunan University of Science and Technology, Xiangtan, China
- Shixu Wu
- Changsha Shi-Lang Technology Co., Ltd., Changsha, China
27
Li Y, Yang J, Ni J, Elazab A, Wu J. TA-Net: Triple attention network for medical image segmentation. Comput Biol Med 2021; 137:104836. [PMID: 34507157 DOI: 10.1016/j.compbiomed.2021.104836] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2021] [Revised: 09/01/2021] [Accepted: 09/02/2021] [Indexed: 11/16/2022]
Abstract
The automatic segmentation of medical images has made continuous progress due to the development of convolutional neural networks (CNNs) and attention mechanisms. However, previous works usually explore attention features along a single dimension of the image and thus may ignore the correlation between feature maps in other dimensions; capturing global features across various dimensions therefore remains challenging. To deal with this problem, we propose a triple attention network (TA-Net) that exploits the attention mechanism's ability to simultaneously recognize global contextual information in the channel domain, the spatial domain, and the feature-internal domain. Specifically, in the encoder step we propose a channel self-attention encoder (CSE) block to learn long-range dependencies between pixels. The CSE effectively increases the receptive field and enhances the representation of target features. In the decoder step, we propose a spatial attention up-sampling (SU) block that makes the network pay more attention to the positions of useful pixels when fusing low-level and high-level features. Extensive experiments were conducted on four public datasets and one local dataset, covering retinal blood vessels (DRIVE and STARE), cells (ISBI 2012), cutaneous melanoma (ISIC 2017), and intracranial blood vessels. Experimental results demonstrate that the proposed TA-Net is overall superior to previous state-of-the-art methods in different medical image segmentation tasks, with high accuracy, promising robustness, and relatively low redundancy.
Affiliation(s)
- Yang Li
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
- Jun Yang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
- Jiajia Ni
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Ahmed Elazab
- School of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Jianhuang Wu
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
28
29
Zhang Y, Wang T, Liu K, Zhang B, Chen L. Recent advances of single-object tracking methods: A brief survey. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
30
Multi-level dictionary learning for fine-grained images categorization with attention model. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.07.147] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
31
Yang Y, Xing W, Wang D, Zhang S, Yu Q, Wang L. AEVRNet: Adaptive exploration network with variance reduced optimization for visual tracking. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
32
An effective AI integrated system for neuron tracing on anisotropic electron microscopy volume. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102829] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
33
Zhang H, Chen J, Nie G, Lin Y, Yang G, Zhang W(C). Light regression memory and multi-perspective object special proposals for abrupt motion tracking. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
34
Zhang S, Gao H, Rao Q. Defense Against Adversarial Attacks by Reconstructing Images. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:6117-6129. [PMID: 34197323 DOI: 10.1109/tip.2021.3092582] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Convolutional neural networks (CNNs) are vulnerable to being deceived by adversarial examples generated by adding small, human-imperceptible perturbations to a clean image. In this paper, we propose an image reconstruction network that reconstructs an input adversarial example into a clean output image to defend against such adversarial attacks. Due to the powerful learning capabilities of the residual block structure, our model can learn a precise mapping from adversarial examples to reconstructed examples. The use of a perceptual loss greatly suppresses the error amplification effect and improves the performance of our reconstruction network. In addition, adding randomization layers to the end of the network further suppresses the effects of additional noise, especially for iterative attacks. Our model has four advantages: 1) it greatly reduces the impact of adversarial perturbations while having little influence on the prediction performance of clean images; 2) during the inference phase, it performs better than most existing model-agnostic defense methods; 3) it has better generalization capability; and 4) it can be flexibly combined with other methods, such as adversarially trained models.
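The training objective sketched below combines a pixel-space loss with a perceptual term, the combination the abstract credits with suppressing error amplification. It is a minimal NumPy illustration: the real perceptual loss compares features from a pretrained network, which is stood in for here by a fixed linear map; `feat_fn`, `lam`, and all data are assumptions.

```python
import numpy as np

def reconstruction_loss(recon, clean, feat_fn, lam=0.1):
    """Pixel MSE plus a perceptual term: also compare the two images in
    the feature space of a fixed extractor `feat_fn`."""
    pixel = np.mean((recon - clean) ** 2)
    perceptual = np.mean((feat_fn(recon) - feat_fn(clean)) ** 2)
    return pixel + lam * perceptual

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))               # stand-in "perceptual" features
feat = lambda img: W @ img
clean = rng.normal(size=16)
adv = clean + 0.05 * rng.normal(size=16)   # small adversarial perturbation
loss_adv = reconstruction_loss(adv, clean, feat)
loss_clean = reconstruction_loss(clean, clean, feat)
```

A perfect reconstruction drives both terms to zero, while a residual perturbation is penalized in pixel space and, amplified through the feature map, in perceptual space as well.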
35
Jiang PT, Zhang CB, Hou Q, Cheng MM, Wei Y. LayerCAM: Exploring Hierarchical Class Activation Maps for Localization. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:5875-5888. [PMID: 34156941 DOI: 10.1109/tip.2021.3089943] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Class activation maps are generated from the final convolutional layer of a CNN. They can highlight discriminative object regions for the class of interest, and these discovered regions have been widely used for weakly-supervised tasks. However, due to the small spatial resolution of the final convolutional layer, such class activation maps often locate only coarse regions of the target objects, limiting the performance of weakly-supervised tasks that need pixel-accurate object locations. We therefore aim to generate finer-grained object localization information from the class activation maps to locate the target objects more accurately. In this paper, by rethinking the relationships between feature maps and their corresponding gradients, we propose a simple yet effective method called LayerCAM. It produces reliable class activation maps for different layers of a CNN, which lets us collect object localization information from coarse (rough spatial localization) to fine (precise fine-grained details) levels. We further integrate them into a high-quality class activation map in which object-related pixels are better highlighted. To evaluate the quality of the class activation maps produced by LayerCAM, we apply them to weakly-supervised object localization and semantic segmentation. Experiments demonstrate that the class activation maps generated by our method are more effective and reliable than those of existing attention methods. The code will be made publicly available.
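The per-location weighting that distinguishes LayerCAM from global-pooling CAM variants can be shown in a few lines. This NumPy sketch follows the commonly stated rule (each activation is weighted by the positive part of its own gradient, summed over channels, then rectified); the toy activations and gradients are invented for illustration.

```python
import numpy as np

def layercam(activations, gradients):
    """LayerCAM combination rule for one layer.
    activations, gradients: arrays of shape (channels, H, W)."""
    weighted = np.maximum(gradients, 0) * activations  # element-wise weights
    cam = np.maximum(weighted.sum(axis=0), 0)          # sum channels, ReLU
    if cam.max() > 0:
        cam /= cam.max()                               # normalise to [0, 1]
    return cam

# Toy maps: the gradient is positive only in the top-left 2x2 corner,
# so the resulting CAM highlights exactly that region.
acts = np.ones((3, 4, 4))
grads = np.full((3, 4, 4), -1.0)
grads[:, :2, :2] = 1.0
cam = layercam(acts, grads)
```

Because the weights are spatial rather than channel-global, the same rule stays reliable at shallow layers with higher resolution, which is what enables the coarse-to-fine integration the abstract describes.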
36
Yang F, Li X, Shen J. MSB-FCN: Multi-Scale Bidirectional FCN for Object Skeleton Extraction. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:2301-2312. [PMID: 33226943 DOI: 10.1109/tip.2020.3038483] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The performance of state-of-the-art object skeleton detection (OSD) methods has been greatly boosted by Convolutional Neural Networks (CNNs). However, most existing CNN-based OSD methods rely on a 'skip-layer' structure where low-level and high-level features are combined to gather multi-level contextual information. Unfortunately, as shallow features tend to be noisy and lack semantic knowledge, they cause errors and inaccuracy. Therefore, to improve the accuracy of object skeleton detection, we propose a novel network architecture, the Multi-Scale Bidirectional Fully Convolutional Network (MSB-FCN), to better gather and enhance multi-scale high-level contextual information. The advantage is that only deep features are used to construct multi-scale feature representations, along with a bidirectional structure for better capturing contextual knowledge. This enables the proposed MSB-FCN to learn semantic-level information from different sub-regions. Moreover, we introduce dense connections into the bidirectional structure so that the learning process at each scale can directly encode information from all other scales. An attention pyramid is also integrated into our MSB-FCN to dynamically control information propagation and reduce unreliable features. Extensive experiments on various benchmarks demonstrate that the proposed MSB-FCN achieves significant improvements over state-of-the-art algorithms.
37
Guo Q, Feng W, Gao R, Liu Y, Wang S. Exploring the Effects of Blur and Deblurring to Visual Object Tracking. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:1812-1824. [PMID: 33417542 DOI: 10.1109/tip.2020.3045630] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
The existence of motion blur inevitably influences the performance of visual object tracking. However, in contrast to the rapid development of visual trackers, the quantitative effects of increasing levels of motion blur on tracker performance remain unstudied. Meanwhile, although image deblurring can produce visually sharp videos for pleasant visual perception, it is also unknown whether visual object tracking can benefit from it. In this paper, we present a Blurred Video Tracking (BVT) benchmark to address these two problems; it contains a large variety of videos with different levels of motion blur, as well as ground-truth tracking results. To explore the effects of blur and deblurring on visual object tracking, we extensively evaluate 25 trackers on the proposed BVT benchmark and obtain several interesting new findings. Specifically, we find that light motion blur may improve the accuracy of many trackers, but heavy blur usually hurts tracking performance. We also observe that image deblurring helps tracking accuracy on heavily-blurred videos but hurts performance on lightly-blurred videos. Based on these observations, we propose a new general GAN-based scheme to improve a tracker's robustness to motion blur, in which a fine-tuned discriminator serves as an adaptive blur assessor that enables selective frame deblurring during tracking. We use this scheme to improve the accuracy of 6 state-of-the-art trackers on motion-blurred videos.
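The selective-deblurring control flow implied by the findings (deblur heavy blur, leave light blur alone) is simple to sketch. Everything below is a toy stand-in: frames are plain numbers, the blur assessor is the identity, and "deblurring" just halves the value; none of it is the paper's GAN discriminator.

```python
def selective_deblur(frames, blur_score, deblur, threshold=0.5):
    """Deblur only the frames the assessor judges heavily blurred;
    lightly blurred frames pass through untouched, since light blur
    can even help tracking per the benchmark findings."""
    return [deblur(f) if blur_score(f) > threshold else f for f in frames]

# Toy stand-ins: score is the frame value itself, deblurring halves it.
out = selective_deblur([0.2, 0.9, 0.6],
                       blur_score=lambda f: f,
                       deblur=lambda f: f / 2,
                       threshold=0.5)
```

The design point is that the assessor gates a relatively expensive deblurring step per frame, so the tracker pays that cost only where deblurring was observed to help.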
38
Wu D, Dong X, Shen J, Hoi SCH. Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:4933-4945. [PMID: 31940565 DOI: 10.1109/tnnls.2019.2959129] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
The overestimation caused by function approximation is a well-known property of Q-learning algorithms, especially in single-critic models, and it leads to poor performance in practical tasks. However, the opposite property, underestimation, which often occurs in Q-learning methods with double critics, has been largely left untouched. In this article, we investigate the underestimation phenomenon in the recent twin delayed deep deterministic actor-critic algorithm and theoretically demonstrate its existence. We also observe that this underestimation bias does indeed hurt performance in various experiments. Considering the opposite properties of single-critic and double-critic methods, we propose a novel triplet-average deep deterministic policy gradient algorithm that takes the weighted action value of three target critics to reduce the estimation bias. Given the connection between estimation bias and approximation error, we suggest averaging previous target values to reduce per-update error and further improve performance. Extensive empirical results over various continuous control tasks in OpenAI Gym show that our approach outperforms the state-of-the-art methods.
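The core idea, a Bellman target built from the weighted value of three target critics, reduces to one line. This sketch assumes equal weights for illustration; the paper's actual weighting, network details, and the name `tadd_target` are not taken from the abstract.

```python
import numpy as np

def tadd_target(reward, gamma, target_qs, weights=(1/3, 1/3, 1/3)):
    """Bellman target from the weighted average of three target-critic
    estimates, intended to sit between single-critic overestimation
    and double-critic (min-based) underestimation."""
    q = float(np.dot(weights, target_qs))
    return reward + gamma * q

qs = [10.0, 12.0, 8.0]          # three target-critic estimates
y = tadd_target(reward=1.0, gamma=0.99, target_qs=qs)
```

Compare the alternatives on the same estimates: a single critic would bootstrap from one (possibly inflated) value, a twin-critic minimum would use 8.0, while the average, 10.0, lands between the two extremes, which is the bias trade-off the abstract argues for.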
39
Wang F, Xu Z, Gan Y, Vong CM, Liu Q. SCNet: Scale-aware coupling-structure network for efficient video object detection. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
40
Ntwari T, Park H, Shin J, Paik J. SNS-CF: Siamese Network with Spatially Semantic Correlation Features for Object Tracking. SENSORS (BASEL, SWITZERLAND) 2020; 20:s20174881. [PMID: 32872299 PMCID: PMC7506687 DOI: 10.3390/s20174881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/10/2020] [Revised: 08/24/2020] [Accepted: 08/25/2020] [Indexed: 06/11/2023]
Abstract
Recent advances in object tracking based on deep Siamese networks have shifted attention away from correlation filters. However, a Siamese network alone does not reach the accuracy of state-of-the-art correlation filter-based trackers, whereas correlation filter-based trackers alone have a frame update problem. In this paper, we present a Siamese network with spatially semantic correlation features (SNS-CF) for accurate, robust object tracking. To deal with the various types of features spread across many regions of the input frame, the proposed SNS-CF consists of (1) a Siamese feature extractor, (2) a spatially semantic feature extractor, and (3) an adaptive correlation filter. To the best of the authors' knowledge, SNS-CF is the first attempt to fuse a Siamese network and a correlation filter to provide high-frame-rate, real-time visual tracking with performance favorable against state-of-the-art methods on multiple benchmarks.
41
Attention shake siamese network with auxiliary relocation branch for visual object tracking. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.02.120] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
42
Fiaz M, Mahmood A, Jung SK. Learning Soft Mask Based Feature Fusion with Channel and Spatial Attention for Robust Visual Object Tracking. SENSORS 2020; 20:s20144021. [PMID: 32698339 PMCID: PMC7412361 DOI: 10.3390/s20144021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/17/2020] [Revised: 07/03/2020] [Accepted: 07/15/2020] [Indexed: 11/16/2022]
Abstract
We propose to improve visual object tracking by introducing a soft-mask-based low-level feature fusion technique, further strengthened by integrating channel and spatial attention mechanisms. The proposed approach is integrated within a Siamese framework to demonstrate its effectiveness for visual object tracking. The soft mask gives more importance to the target regions than to other regions, enabling effective target feature representation and increasing discriminative power. The low-level feature fusion improves the tracker's robustness against distractors. The channel attention identifies more discriminative channels for better target representation, and the spatial attention complements the soft mask to better localize target objects in challenging tracking scenarios. We evaluated the proposed approach on five publicly available benchmark datasets and performed extensive comparisons with 39 state-of-the-art tracking algorithms. The proposed tracker demonstrates excellent performance compared to existing state-of-the-art trackers.
Affiliation(s)
- Mustansar Fiaz
- School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea
- Arif Mahmood
- Department of Computer Science, Information Technology University, Lahore 54000, Pakistan
- Soon Ki Jung
- School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea
- Correspondence:
43
Fiaz M, Mahmood A, Baek KY, Farooq SS, Jung SK. Improving Object Tracking by Added Noise and Channel Attention. SENSORS (BASEL, SWITZERLAND) 2020; 20:s20133780. [PMID: 32640545 PMCID: PMC7374383 DOI: 10.3390/s20133780] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/03/2020] [Revised: 06/26/2020] [Accepted: 06/28/2020] [Indexed: 06/11/2023]
Abstract
CNN-based trackers, especially those based on Siamese networks, have recently attracted considerable attention because of their relatively good performance and low computational cost. For many Siamese trackers, learning a generic object model from a large-scale dataset is still a challenging task. In the current study, we introduce input noise as regularization in the training data to improve generalization of the learned model. We propose an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker which exhibits improved generalization compared to the current state-of-the-art trackers. In particular, we exploit offline learning by introducing additive noise for input data augmentation to mitigate the overfitting problem. We propose feature fusion from noisy and clean input channels, which improves target localization. Channel attention integrated with our framework helps find more useful target features, resulting in further performance improvement. The proposed IRCA-Siam enhances target/background discrimination and improves fault tolerance and generalization. An extensive experimental evaluation on six benchmark datasets, including OTB2013, OTB2015, TC128, UAV123, VOT2016, and VOT2017, demonstrates superior performance of the proposed IRCA-Siam tracker compared to 30 existing state-of-the-art trackers.
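The additive-noise input regularization described above amounts to a small augmentation step at training time. This NumPy sketch assumes zero-mean Gaussian noise on images normalized to [0, 1]; the function name and noise level are hypothetical, and IRCA-Siam's actual pipeline also fuses the clean and noisy channels downstream.

```python
import numpy as np

def noisy_augment(batch, sigma=0.05, rng=None):
    """Offline augmentation: add zero-mean Gaussian noise to the input
    as a regularizer, then clip back to the valid intensity range."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.normal(0.0, sigma, size=batch.shape)
    return np.clip(batch + noise, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((2, 8, 8))            # a toy batch of 8x8 patches
noisy = noisy_augment(clean, sigma=0.05, rng=rng)
```

Training on the noisy copy alongside the clean one exposes the model to perturbed inputs it would not otherwise see, which is the overfitting mitigation the abstract claims.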
Affiliation(s)
- Mustansar Fiaz
- School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea
- Arif Mahmood
- Department of Computer Science, Information Technology University, Lahore 54000, Pakistan
- Ki Yeol Baek
- School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea
- Sehar Shahzad Farooq
- School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea
- Soon Ki Jung
- School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea
44
Tang Y, Yang X, Wang N, Song B, Gao X. CGAN-TM: A novel domain-to-domain transferring method for person re-identification. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 29:5641-5651. [PMID: 32286985 DOI: 10.1109/tip.2020.2985545] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Person re-identification (re-ID) is a technique aiming to recognize persons across different cameras. Although some supervised methods have achieved favorable performance, they are far from practical application owing to the lack of labeled data, so unsupervised person re-ID methods are in urgent need. The commonly used approach in existing unsupervised methods is to first train a model on the source image dataset in a supervised manner, and then transfer the source image domain to the target image domain. However, images may lose their identity information after translation, and the distributions of the different domains remain far apart. To solve these problems, we propose an image domain-to-domain translation method that preserves pedestrian identity information and pulls the domain distributions closer for unsupervised person re-ID tasks. Our work exploits CycleGAN to transfer the existing labeled image domain to the unlabeled image domain. Specifically, a Self-labeled Triplet Net is proposed to maintain pedestrian identity information, and maximum mean discrepancy (MMD) is introduced to pull the domain distributions closer. Extensive experiments demonstrate that the proposed method performs better than state-of-the-art unsupervised methods on DukeMTMC-reID and Market-1501.
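The maximum mean discrepancy term used to pull the domain distributions together has a standard closed form. The sketch below computes the (biased) squared MMD with an RBF kernel in NumPy; the kernel choice, bandwidth, and toy data are assumptions, not the paper's settings.

```python
import numpy as np

def mmd_rbf(X, Y, sigma=1.0):
    """Biased squared maximum mean discrepancy with an RBF kernel:
    small when the two sample sets come from similar distributions."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(100, 4))       # "source" features
tgt_near = rng.normal(0.0, 1.0, size=(100, 4))  # similar distribution
tgt_far = rng.normal(3.0, 1.0, size=(100, 4))   # shifted distribution
close = mmd_rbf(src, tgt_near)
far = mmd_rbf(src, tgt_far)
```

Minimizing this quantity over translated features penalizes exactly the "distributions far apart" failure mode the abstract identifies, since matched distributions drive the statistic toward zero.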
45
Liang Z, Shen J. Local Semantic Siamese Networks for Fast Tracking. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:3351-3364. [PMID: 31869793 DOI: 10.1109/tip.2019.2959256] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Learning a powerful feature representation is critical for constructing a robust Siamese tracker. However, most existing Siamese trackers learn the global appearance features of the entire object, which usually suffer from drift problems caused by partial occlusion or non-rigid appearance deformation. In this paper, we propose a new Local Semantic Siamese (LSSiam) network that extracts more robust features to address these drift problems, since local semantic features contain more fine-grained and partial information. We learn the semantic features during offline training by adding a classification branch to the classical Siamese framework. To further enhance the feature representation, we design a general focal logistic loss to mine hard negative samples. During online tracking, we remove the classification branch and propose an efficient template updating strategy to avoid excessive computational load. The proposed tracker can thus run at a high speed of 100 frames per second (FPS), far beyond the real-time requirement. Extensive experiments on popular benchmarks demonstrate that the proposed LSSiam tracker achieves state-of-the-art performance at high speed. Our source code is available at.