1. Huang W, Zhang H, Guo H, Li W, Quan X, Zhang Y. ADDNS: An asymmetric dual deep network with sharing mechanism for medical image fusion of CT and MR-T2. Comput Biol Med 2023;166:107531. PMID: 37806056; DOI: 10.1016/j.compbiomed.2023.107531.
Abstract
Medical images of different modalities have different semantic characteristics. Medical image fusion, which aims to improve the visual quality and practical value of such images, has become important in medical diagnostics. However, previous methods do not fully represent semantic and visual features, and their generalization ability needs improvement. Furthermore, the brightness-stacking phenomenon tends to occur during fusion. In this paper, we propose an asymmetric dual deep network with a sharing mechanism (ADDNS) for medical image fusion. In our asymmetric model-level dual framework, the primal Unet part learns to fuse medical images of different modalities into a single fusion image, while the dual Unet part learns to invert the fusion task for multi-modal image reconstruction. This asymmetry of network settings not only enables ADDNS to fully extract semantic and visual features, but also reduces model complexity and accelerates convergence. Furthermore, the sharing mechanism, designed according to task relevance, further reduces model complexity and improves the generalization ability of our model. Finally, we use an intermediate supervision method to minimize the difference between the fusion image and the source images so as to prevent the brightness-stacking problem. Experimental results show that our algorithm achieves better quantitative and qualitative results than several state-of-the-art methods.
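The intermediate-supervision idea in this abstract can be illustrated with a toy loss. This is a minimal sketch under our own assumptions (a per-pixel squared error against both source images), not the paper's actual training objective:

```python
import numpy as np

def fusion_consistency_loss(fused, src_a, src_b):
    """Toy intermediate-supervision loss: penalize the fused image for
    drifting in intensity from either source image. A fusion that simply
    stacks brightness (fused ~ src_a + src_b) scores worse than one that
    stays between the two sources."""
    return np.mean((fused - src_a) ** 2) + np.mean((fused - src_b) ** 2)
```

For two constant images with intensities 0 and 2, a brightness-stacked fusion (intensity 2) incurs a loss of 4, while an averaged fusion (intensity 1) incurs only 2, which is the behavior such a supervision term rewards.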
Affiliation(s)
- Wanwan Huang, Han Zhang, Huike Guo, Wei Li, Xiongwen Quan: College of Artificial Intelligence, Nankai University, Tianjin, 300350, China
- Yuzhi Zhang: College of Software, Nankai University, Tianjin, 300350, China
2. Zhang D, Chen CLP, Li T, Zuo Y, Duy NQ. Target tracking method of Siamese networks based on the broad learning system. CAAI Transactions on Intelligence Technology 2022. DOI: 10.1049/cit2.12134.
Affiliation(s)
- Dan Zhang: Navigation College, Dalian Maritime University, Dalian, China; Innovation and Entrepreneurship Education College, Dalian Minzu University, Dalian, China
- C. L. Philip Chen: Navigation College, Dalian Maritime University, Dalian, China; Computer Science and Engineering College, South China University of Technology, Guangzhou, China; Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China
- Tieshan Li: Navigation College, Dalian Maritime University, Dalian, China; School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Yi Zuo: Navigation College, Dalian Maritime University, Dalian, China
- Nguyen Quang Duy: Faculty of Navigation, Vietnam Maritime University, Haiphong, Vietnam
3.
4. Wei X, Lin Z, Liu T, Zhang L. Probabilistic Matrix Factorization for Visual Tracking. Pattern Recognition and Image Analysis 2022. DOI: 10.1134/s1054661822010114.
5. Learning deep convolutional descriptor aggregation for efficient visual tracking. Neural Comput Appl 2022. DOI: 10.1007/s00521-021-06638-8.
6. Wen H, Yan C, Zhou X, Cong R, Sun Y, Zheng B, Zhang J, Bao Y, Ding G. Dynamic Selective Network for RGB-D Salient Object Detection. IEEE Transactions on Image Processing 2021;30:9179-9192. PMID: 34739374; DOI: 10.1109/tip.2021.3123548.
Abstract
RGB-D saliency detection has received increasing attention in recent years. Many efforts have been devoted to this area, most of which try to integrate multi-modal information, i.e., RGB images and depth maps, via various fusion strategies. However, some of them ignore the inherent difference between the two modalities, which leads to performance degradation in some challenging scenes. In this paper, we therefore propose a novel RGB-D saliency model, namely Dynamic Selective Network (DSNet), to perform salient object detection (SOD) in RGB-D images by taking full advantage of the complementarity between the two modalities. Specifically, we first deploy a cross-modal global context module (CGCM) to acquire high-level semantic information, which can be used to roughly locate salient objects. Then, we design a dynamic selective module (DSM) to dynamically mine the cross-modal complementary information between RGB images and depth maps, and to further optimize the multi-level and multi-scale information by executing gated and pooling-based selection, respectively. Moreover, we conduct boundary refinement to obtain high-quality saliency maps with clear boundary details. Extensive experiments on eight public RGB-D datasets show that the proposed DSNet achieves competitive and excellent performance against 17 current state-of-the-art RGB-D SOD models.
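The "gated selection" of cross-modal features described here can be sketched generically. This is a hypothetical elementwise gate under our own assumptions, not DSNet's learned module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_rgb, f_depth, gate_logits):
    """Generic gated cross-modal selection: a per-element gate in (0, 1)
    decides how much of each modality's feature to keep. In a real model
    the gate logits come from a learned subnetwork; here they are given."""
    g = sigmoid(gate_logits)
    return g * f_rgb + (1.0 - g) * f_depth
```

With zero logits the gate is 0.5 everywhere and the fusion reduces to a plain average; strongly positive logits pass the RGB feature through almost unchanged.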
7
|
Liu R, Chen Q, Yao Y, Fan X, Luo Z. Location-Aware and Regularization-Adaptive Correlation Filters for Robust Visual Tracking. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:2430-2442. [PMID: 32749966 DOI: 10.1109/tnnls.2020.3005447] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Correlation filters (CF) have recently been widely used for visual tracking. The estimation of the search window and the filter-learning strategy are key components of CF trackers. Nevertheless, prevalent CF models address these issues separately and heuristically. Commonly used CF models directly set the location estimated in the previous frame as the search center for the current one. Moreover, these models usually rely on simple, fixed regularization for filter learning, so their performance is compromised by the search window size and optimization heuristics. To break these limits, this article proposes a location-aware and regularization-adaptive CF (LRCF) for robust visual tracking. LRCF establishes a novel bilevel optimization model to address the location-estimation and filter-training problems simultaneously. We prove that our bilevel formulation can successfully obtain a globally converged CF and the corresponding object location in a collaborative manner. Moreover, based on the LRCF framework, we design two trackers, LRCF-S and LRCF-SA, and a series of comparisons to demonstrate the flexibility and effectiveness of the framework. Extensive experiments on different challenging benchmark datasets demonstrate that our LRCF trackers perform favorably against state-of-the-art methods in practice.
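For readers new to the CF machinery this abstract builds on, here is a minimal single-channel correlation filter trained in closed form in the Fourier domain (a MOSSE-style ridge regression). It is a generic sketch of the classical baseline, not the authors' LRCF model:

```python
import numpy as np

def train_cf(x, y, lam=1e-2):
    """Closed-form correlation filter (MOSSE-style ridge regression).
    x: (H, W) feature patch; y: (H, W) desired response, e.g. a Gaussian
    peaked on the target. Returns the filter in the Fourier domain."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    # Elementwise solution of min_w ||w corr x - y||^2 + lam ||w||^2
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def detect(w_hat, z):
    """Dense response map: correlate the learned filter with a search
    patch z; the argmax of the map is the predicted target location."""
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))
```

Running `detect` on the training patch itself recovers (approximately) the desired Gaussian response, with its peak at the labeled target center.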
8.
9
|
Chakraborty DB, Pal SK. Rough video conceptualization for real-time event precognition with motion entropy. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.09.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
10. Yang T, Chan AB. Visual Tracking via Dynamic Memory Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021;43:360-374. PMID: 31352331; DOI: 10.1109/tpami.2019.2929034.
Abstract
Template-matching methods for visual tracking have gained popularity recently due to their good performance and fast speed. However, they lack effective ways to adapt to changes in the target object's appearance, leaving their tracking accuracy still far from state-of-the-art. In this paper, we propose a dynamic memory network to adapt the template to the target's appearance variations during tracking. The reading and writing of the external memory are controlled by an LSTM network with the search feature map as input. Because the location of the target is initially unknown, a spatial attention mechanism is applied to concentrate the LSTM input on the potential target. To prevent overly aggressive model adaptation, we apply gated residual template learning to control how much retrieved memory is combined with the initial template. To alleviate the drift problem, we also design a "negative" memory unit that stores templates for distractors, which are used to cancel out wrong responses from the object template. To further boost tracking performance, an auxiliary classification loss is added after the feature extractor. Unlike tracking-by-detection methods, where the object's information is maintained by the weight parameters of neural networks and expensive online fine-tuning is required for adaptation, our tracker runs completely feed-forward and adapts to the target's appearance changes by updating the external memory. Moreover, the capacity of our model is not determined by the network size as with other trackers: the capacity can easily be enlarged as the memory requirements of a task increase, which is favorable for memorizing long-term object information. Extensive experiments on the OTB and VOT datasets demonstrate that our trackers perform favorably against state-of-the-art tracking methods while retaining real-time speed.
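The memory read and the gated residual combination described above can be sketched in a few lines. This is a toy content-based read with a scalar gate, standing in for the paper's LSTM-controlled addressing; all names here are illustrative:

```python
import numpy as np

def read_memory(query, memory):
    """Content-based memory read: softmax attention over the similarity
    between the query and each memory slot, then a weighted sum of slots.
    query: (d,) feature vector; memory: (num_slots, d) stored templates."""
    scores = memory @ query
    w = np.exp(scores - scores.max())
    w = w / w.sum()                      # softmax attention weights
    return w @ memory                    # retrieved template

def gated_residual_template(t_init, t_retrieved, gate):
    """Gated residual template learning: only a gated portion of the
    retrieved memory is added to the fixed initial template."""
    return t_init + gate * t_retrieved
```

When the query strongly matches one slot, the read is dominated by that slot; the gate then bounds how far the working template can move from the initial one.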
11. Zhang C, Liu A, Liu X, Xu Y, Yu H, Ma Y, Li T. Interpreting and Improving Adversarial Robustness of Deep Neural Networks With Neuron Sensitivity. IEEE Transactions on Image Processing 2020;30:1291-1304. PMID: 33290221; DOI: 10.1109/tip.2020.3042083.
Abstract
Deep neural networks (DNNs) are vulnerable to adversarial examples: inputs with imperceptible perturbations that mislead DNNs into incorrect results. Despite the potential risk they pose, adversarial examples are also valuable for providing insights into the weaknesses and blind spots of DNNs. Thus, interpretability of a DNN in the adversarial setting aims to explain the rationale behind its decision-making process and enables deeper understanding, which in turn leads to better practical applications. To address this issue, we explain adversarial robustness for deep models from the new perspective of neuron sensitivity, measured by the intensity of neuron behavior variation between benign and adversarial examples. In this paper, we first draw the close connection between adversarial robustness and neuron sensitivities, as sensitive neurons make the most non-trivial contributions to model predictions in the adversarial setting. Based on that, we further propose to improve adversarial robustness by stabilizing the behaviors of sensitive neurons. Moreover, we demonstrate that state-of-the-art adversarial training methods improve model robustness by reducing neuron sensitivities, which in turn confirms the strong connection between adversarial robustness and neuron sensitivity. Extensive experiments on various datasets demonstrate that our algorithm effectively achieves excellent results. To the best of our knowledge, we are the first to study adversarial robustness using neuron sensitivities.
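A simple way to operationalize the sensitivity measure described here is the mean absolute activation change per neuron over paired benign/adversarial inputs. This is our reading of the abstract's definition, sketched under the assumption of precomputed activation matrices:

```python
import numpy as np

def neuron_sensitivity(acts_benign, acts_adv):
    """Per-neuron sensitivity: mean absolute activation change between
    paired benign and adversarial inputs.
    acts_*: (num_samples, num_neurons) activation matrices, row i of each
    coming from the same underlying image (clean vs. perturbed)."""
    return np.mean(np.abs(acts_benign - acts_adv), axis=0)
```

Neurons whose activations barely move under attack get a sensitivity near zero; neurons that swing wildly are the "sensitive" ones the paper proposes to stabilize.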
12.
13
|
Gao P, Yuan R, Wang F, Xiao L, Fujita H, Zhang Y. Siamese attentional keypoint network for high performance visual tracking. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.105448] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
14. Li G, Liu Z, Ling H. ICNet: Information Conversion Network for RGB-D Based Salient Object Detection. IEEE Transactions on Image Processing 2020;29:4873-4884. PMID: 32149689; DOI: 10.1109/tip.2020.2976689.
Abstract
RGB-D based salient object detection (SOD) methods leverage the depth map as valuable complementary information for better SOD performance. Previous methods mainly exploit the correlation between the RGB image and the depth map in three fusion domains: input images, extracted features, and output results. However, these fusion strategies cannot fully capture the complex correlation between the RGB image and the depth map. Besides, these methods do not fully explore the cross-modal complementarity and cross-level continuity of information, and they treat information from different sources without discrimination. In this paper, to address these problems, we propose a novel Information Conversion Network (ICNet) for RGB-D based SOD that employs a siamese structure with an encoder-decoder architecture. To fuse high-level RGB and depth features in an interactive and adaptive way, we propose a novel Information Conversion Module (ICM), which contains concatenation operations and correlation layers. Furthermore, we design a Cross-modal Depth-weighted Combination (CDC) block to discriminate cross-modal features from different sources and to enhance RGB features with depth features at each level. Extensive experiments on five commonly tested datasets demonstrate the superiority of our ICNet over 15 state-of-the-art RGB-D based SOD methods, and validate the effectiveness of the proposed ICM and CDC block.
15. Kang B, Liang D, Ding W, Zhou H, Zhu WP. Grayscale-Thermal Tracking via Inverse Sparse Representation based Collaborative Encoding. IEEE Transactions on Image Processing 2019;29:3401-3415. PMID: 31880552; DOI: 10.1109/tip.2019.2959912.
Abstract
Grayscale-thermal tracking has attracted a great deal of attention due to its capability of fusing two different yet complementary target observations. Existing methods often treat extracting discriminative target information and exploring target correlation among different images as two separate issues, ignoring their interdependence; this may cause tracking drift in challenging video pairs. This paper presents a collaborative encoding model, called joint correlation and discriminant analysis based inverse sparse representation (JCDA-InvSR), to jointly encode the target candidates in grayscale and thermal video sequences. In particular, we develop a multi-objective program that integrates feature selection and multi-view correlation analysis into a unified optimization problem in JCDA-InvSR, which can simultaneously highlight the special characteristics of the grayscale and thermal targets by alternately optimizing two aspects: target discrimination within a given image and target correlation across different images. For robust grayscale-thermal tracking, we also incorporate the prior knowledge of target candidate codes into the SVM-based target classifier to overcome the overfitting caused by limited training labels. Extensive experiments on the GTOT and RGBT234 datasets illustrate the promising performance of our tracking framework.
16.
Abstract
Object tracking has always been an interesting and essential research topic in computer vision, and the model update mechanism is an essential part of it; the robustness of the update mechanism has therefore become a crucial factor influencing tracking quality on a sequence. This review analyzes recent tracking-model update strategies. The occasion for updating the target model is discussed first; we then give a detailed discussion of target-model update strategies under the mainstream tracking frameworks, followed by background update frameworks. The experimental performance of the trackers from recent research on specific sequences is listed, their strengths and some failure cases are discussed, and conclusions are drawn from those performances. A crucial point is that the design of a proper background model, as well as its update strategy, ought to be taken into consideration. A cascade update of the template corresponding to each deep network layer, based on each layer's contribution to target recognition, can also help with more accurate target localization, and target saliency information can be utilized as a tool for state estimation.
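The most common target-model update strategy surveyed in such reviews is the running-average (linear interpolation) update. As a minimal illustration under our own naming:

```python
import numpy as np

def update_template(template, new_obs, lr=0.1):
    """Running-average template update used by many classical trackers.
    A small learning rate keeps the model stable; a large one adapts
    fast but risks drifting onto occluders or background."""
    return (1.0 - lr) * template + lr * new_obs
```

With `lr=0` the template is frozen at its initial state; with `lr=1` the tracker trusts only the newest observation, which is exactly the trade-off update-occasion strategies try to manage.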
17. Liu Y, Sui X, Kuang X, Liu C, Gu G, Chen Q. Object Tracking Based on Vector Convolutional Network and Discriminant Correlation Filters. Sensors 2019;19:1818. PMID: 30995781; PMCID: PMC6515056; DOI: 10.3390/s19081818.
Abstract
Due to their fast speed and high efficiency, discriminant correlation filters (DCF) have drawn great attention in online object tracking recently. However, with the improvement in performance come increases in parameters and declines in speed. In this paper, we propose a novel visual tracking algorithm, VDCFNet, which combines DCF with a vector convolutional network (VCNN). We replace one traditional convolutional filter with two novel vector convolutional filters in the convolutional stage of our network. This enables our model, which has a small memory footprint (only 59 KB) and is trained offline, to learn generic image features. In the online tracking stage, we propose a coarse-to-fine search strategy to address drift under fast motion. Besides, we update the model selectively to speed up tracking and increase robustness. Experiments on OTB benchmarks demonstrate that the proposed VDCFNet achieves competitive performance while running at over real-time speed.
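The parameter saving from replacing one square filter with two vector filters can be counted directly. The factorization below (1 x k followed by k x 1) is a generic sketch of the idea; the paper's exact filter design may differ:

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution layer (bias ignored)."""
    return c_in * c_out * k * k

def factorized_conv_params(c_in, c_out, k):
    """Weights when the k x k kernel is replaced by a 1 x k convolution
    (c_in -> c_out) followed by a k x 1 convolution (c_out -> c_out)."""
    return c_in * c_out * k + c_out * c_out * k
```

For a 3 x 3 layer with 64 input and 64 output channels, the standard form needs 36,864 weights and the factorized form 24,576, a one-third reduction, which is how such networks stay small enough to fit in tens of kilobytes.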
Affiliation(s)
- Yuan Liu, Xiubao Sui, Xiaodong Kuang, Chengwei Liu, Guohua Gu, Qian Chen: School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
18. Fan H, Ling H. Parallel Tracking and Verifying. IEEE Transactions on Image Processing 2019;28:4130-4144. PMID: 30892205; DOI: 10.1109/tip.2019.2904789.
Abstract
Visual object tracking has played a crucial role in computer vision with many applications. Intensively studied in recent decades, visual tracking has witnessed great advances in either speed (e.g., with correlation filters) or accuracy (e.g., with deep features); real-time, high-accuracy tracking algorithms nevertheless remain scarce. In this paper, we study the problem from a new perspective and present a novel parallel tracking and verifying (PTAV) framework, taking advantage of the ubiquity of multi-thread techniques and borrowing ideas from the success of parallel tracking and mapping in visual SLAM. The PTAV framework consists of two components, a (base) tracker T and a verifier V, working in parallel on two separate threads. The tracker T aims at providing super real-time tracking inference and is expected to perform well most of the time; by contrast, the verifier V validates the tracking results and corrects T when needed. The key innovation is that V does not work on every frame but only upon requests from T; in turn, T may adjust its tracking according to feedback from V. With such collaboration, PTAV enjoys both the high efficiency provided by T and the strong discriminative power of V. Meanwhile, to adapt V to object appearance changes, we maintain a dynamic target template pool for adaptive verification, resulting in further improvement. In extensive experiments on OTB2015, TC128, UAV20L and VOT2016, PTAV achieves top tracking accuracy among all real-time trackers, and in fact even outperforms many deep learning based algorithms. Moreover, as a general framework, PTAV is very flexible, with great potential for future improvement and generalization.
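The request/correct protocol between T and V can be sketched as a control loop. For clarity this sketch runs sequentially (the paper runs the two components on parallel threads), and `track`, `verify`, and `interval` are illustrative stand-ins:

```python
def ptav_loop(frames, track, verify, interval=5):
    """Sequential sketch of the PTAV control flow: the tracker handles
    every frame; the verifier is consulted only every `interval` frames
    and may override the tracker's result when it rejects it."""
    results = []
    for i, frame in enumerate(frames):
        box = track(frame, results[-1] if results else None)
        if i % interval == 0:              # T sends a request to V
            ok, corrected = verify(frame, box)
            if not ok:                     # V rejects: correct T
                box = corrected
        results.append(box)
    return results
```

Because V runs only on a fraction of frames, the loop's cost is dominated by the fast tracker, which is the source of the framework's real-time behavior.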
19.
20. Zhong B, Bai B, Li J, Zhang Y, Fu Y. Hierarchical Tracking by Reinforcement Learning based Searching and Coarse-to-fine Verifying. IEEE Transactions on Image Processing 2018;28:2331-2341. PMID: 30530365; DOI: 10.1109/tip.2018.2885238.
Abstract
A class-agnostic tracker typically consists of three key components: a motion model, a target appearance model, and an updating strategy. However, most recent top-performing trackers mainly focus on constructing complicated appearance models and updating strategies, while using comparatively simple and heuristic motion models that may result in an inefficient search and degrade tracking performance. To address this issue, we propose a hierarchical tracker that learns to move and track based on the combination of data-driven search at the coarse level and coarse-to-fine verification at the fine level. At the coarse level, a data-driven motion model learned through deep recurrent reinforcement learning provides our tracker with a coarse localization of the object. By formulating motion search as an action-decision problem in reinforcement learning, our tracker utilizes a recurrent convolutional neural network based deep Q-network to effectively learn data-driven search policies. The learned motion model not only significantly reduces the search space, but also provides more reliable regions of interest for further verification. At the fine level, a kernelized correlation filter (KCF) based appearance model is adopted to densely yet efficiently verify a local region centered on the location predicted by the motion model. Through the use of circulant matrices and the fast Fourier transform, a large number of candidate samples in the local region can be efficiently and effectively evaluated by the KCF based appearance model. Finally, a simple yet robust estimator is designed to detect possible tracking failure. Experiments on OTB50 and OTB100 show that our tracker achieves better performance than state-of-the-art trackers.
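The circulant-matrix trick mentioned here is what lets KCF evaluate every cyclic shift of a patch at once. A minimal 1-D illustration of the identity (not the full kernelized tracker) compares the one-shot FFT evaluation against explicit shifting:

```python
import numpy as np

def dense_response_fft(w, z):
    """Filter responses at all circular shifts of patch z in one FFT
    pass: ifft(fft(z) * conj(fft(w))) evaluates every shifted sample
    simultaneously, the identity behind KCF-style dense evaluation."""
    return np.real(np.fft.ifft(np.fft.fft(z) * np.conj(np.fft.fft(w))))

def dense_response_naive(w, z):
    """The same responses computed by explicitly rolling the patch,
    one dot product per shift: O(n^2) instead of O(n log n)."""
    return np.array([np.dot(np.roll(z, -s), w) for s in range(len(z))])
```

The two functions agree to numerical precision, which is exactly why the dense sampling of candidates in KCF costs little more than a pair of FFTs.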
21.