1
Zhu X, Sun S, Lin L, Wu Y, Ma X. Transformer-based approaches for neuroimaging: an in-depth review of their role in classification and regression tasks. Rev Neurosci 2024:revneuro-2024-0088. PMID: 39333087. DOI: 10.1515/revneuro-2024-0088. Received: 07/02/2024; Accepted: 09/13/2024.
Abstract
In the ever-evolving landscape of deep learning (DL), the transformer model has emerged as a formidable neural network architecture, gaining significant traction in neuroimaging-based classification and regression tasks. This paper presents an extensive examination of the transformer's application in neuroimaging, surveying recent literature to elucidate its current status and research advancement. Commencing with an exposition on the fundamental principles and structures of the transformer model and its variants, this review navigates through the methodologies and experimental findings pertaining to their utilization in neuroimage classification and regression tasks. We highlight the transformer model's prowess in neuroimaging, showcasing its exceptional performance in classification tasks and its burgeoning potential in regression tasks. Concluding with an assessment of prevailing challenges and future trajectories, this paper proffers insights into prospective research directions. By elucidating the current landscape and envisaging future trends, this review enhances comprehension of the transformer's role in neuroimaging tasks, furnishing valuable guidance for further inquiry.
Affiliation(s)
- Xinyu Zhu, Shen Sun, Lan Lin, Yutong Wu, Xiangge Ma: Department of Biomedical Engineering, College of Chemistry and Life Sciences, Beijing University of Technology, Beijing 100124, China
2
Discriminative appearance model with template spatial adjustment for visual object tracking. Soft Comput 2023. DOI: 10.1007/s00500-023-07820-x.
3
Zha Z, Yuan X, Wen B, Zhang J, Zhu C. Nonconvex Structural Sparsity Residual Constraint for Image Restoration. IEEE Trans Cybern 2022; 52:12440-12453. PMID: 34161250. DOI: 10.1109/tcyb.2021.3084931.
Abstract
This article proposes a novel nonconvex structural sparsity residual constraint (NSSRC) model for image restoration, which integrates structural sparse representation (SSR) with a nonconvex sparsity residual constraint (NC-SRC). While SSR is itself powerful for image restoration, combining the local sparsity and nonlocal self-similarity of natural images, in this work we explicitly incorporate the novel NC-SRC prior into SSR. Our proposed approach provides more effective sparse modeling for natural images by applying a more flexible sparse representation scheme, leading to high-quality restored images. Moreover, an alternating minimization framework is developed to solve the proposed NSSRC-based image restoration problems. Extensive experimental results on image denoising and image deblocking validate that the proposed NSSRC achieves better results than many popular and state-of-the-art methods on several publicly available datasets.
4
Discriminative Deep Non-Linear Dictionary Learning for Visual Object Tracking. Neural Process Lett 2022. DOI: 10.1007/s11063-022-11025-y.
5
Wei X, Lin Z, Liu T, Zhang L. Probabilistic Matrix Factorization for Visual Tracking. Pattern Recognit Image Anal 2022. DOI: 10.1134/s1054661822010114.
6
Dun Y, Da Z, Yang S, Qian X. Image super-resolution based on residually dense distilled attention network. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.02.008.
7
Du Y, Han G, Quan Y, Yu Z, Wong HS, Chen CLP, Zhang J. Exploiting Global Low-Rank Structure and Local Sparsity Nature for Tensor Completion. IEEE Trans Cybern 2019; 49:3898-3910. PMID: 30047919. DOI: 10.1109/tcyb.2018.2853122.
Abstract
In the era of data science, a huge amount of data has emerged in the form of tensors. In many applications, the collected tensor data are incomplete with missing entries, which affects the analysis process. In this paper, we investigate a new method for tensor completion, in which a low-rank tensor approximation is used to exploit the global structure of data, and sparse coding is used for elucidating the local patterns of data. Regarding the characterization of low-rank structures, a weighted nuclear norm for the tensor is introduced. Meanwhile, an orthogonal dictionary learning process is incorporated into sparse coding for more effective discovery of the local details of data. By simultaneously using the global patterns and local cues, the proposed method can effectively and efficiently recover the lost information of incomplete tensor data. The capability of the proposed method is demonstrated with several experiments on recovering MRI data and visual data, and the experimental results have shown the excellent performance of the proposed method in comparison with recent related methods.
8
Sun J, Chen Q, Sun J, Zhang T, Fang W, Wu X. Graph-structured multitask sparsity model for visual tracking. Inf Sci (N Y) 2019. DOI: 10.1016/j.ins.2019.02.043.
9
Wang X, Wang S, Huang Z, Du Y. Structure regularized sparse coding for data representation. Knowl Based Syst 2019. DOI: 10.1016/j.knosys.2019.02.035.
10
Zhao Z, Feng G, Zhang L, Zhu J, Shen Q. Novel orthogonal based collaborative dictionary learning for efficient face recognition. Knowl Based Syst 2019. DOI: 10.1016/j.knosys.2018.09.014.
11
12
Zhou T, Liu F, Bhaskar H, Yang J, Zhang H, Cai P. Online discriminative dictionary learning for robust object tracking. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.10.019.
13
Li A, Lu Z, Wang L, Han P, Wen JR. Large-Scale Sparse Learning From Noisy Tags for Semantic Segmentation. IEEE Trans Cybern 2018; 48:253-263. PMID: 28114055. DOI: 10.1109/tcyb.2016.2631528.
Abstract
In this paper, we present a large-scale sparse learning (LSSL) approach to solve the challenging task of semantic segmentation of images with noisy tags. Unlike traditional strongly supervised methods that exploit pixel-level labels for semantic segmentation, we make use of much weaker supervision (i.e., noisy tags of images) and formulate the task of semantic segmentation as a weakly supervised learning (WSL) problem from the viewpoint of noise reduction of superpixel labels. By learning the data manifolds, we transform the WSL problem into an LSSL problem. Based on nonlinear approximation and dimension reduction techniques, a linear-time-complexity algorithm is developed to solve the LSSL problem efficiently. We further extend the LSSL approach to visual feature refinement for semantic segmentation. The experiments demonstrate that the proposed LSSL approach achieves promising results in semantic segmentation of images with noisy tags.
14
Yu J, Yang X, Gao F, Tao D. Deep Multimodal Distance Metric Learning Using Click Constraints for Image Ranking. IEEE Trans Cybern 2017; 47:4014-4024. PMID: 27529881. DOI: 10.1109/tcyb.2016.2591583.
Abstract
How do we retrieve images accurately? And how do we rank a group of images precisely and efficiently for specific queries? These problems are critical for researchers and engineers building novel image search engines. First, it is important to obtain an appropriate description that effectively represents the images. In this paper, multimodal features are considered for describing images. The images' unique properties are reflected by visual features, which are correlated to each other. However, semantic gaps always exist between images' visual features and their semantics. Therefore, we utilize click features to reduce the semantic gap. The second key issue is learning an appropriate distance metric to combine these multimodal features. This paper develops a novel deep multimodal distance metric learning (Deep-MDML) method. A structured ranking model is adopted to utilize both visual and click features in distance metric learning (DML). Specifically, images and their related ranking results are first collected to form the training set, and multimodal features, including click and visual features, are collected with these images. Next, a group of autoencoders is applied to obtain an initial distance metric in different visual spaces, and an MDML method is used to assign optimal weights to the different modalities. Alternating optimization is then conducted to train the ranking model, which is used to rank new queries with click features. Compared with existing image ranking methods, the proposed method adopts a new ranking model that uses multimodal features, including click and visual features, in DML. We conducted experiments to analyze the proposed Deep-MDML on two benchmark data sets, and the results validate the effectiveness of the method.
15
Wu L, Wang Y, Pan S. Exploiting Attribute Correlations: A Novel Trace Lasso-Based Weakly Supervised Dictionary Learning Method. IEEE Trans Cybern 2017; 47:4497-4508. PMID: 28113537. DOI: 10.1109/tcyb.2016.2612686.
Abstract
It is now well established that sparse representation models work effectively for many visual recognition tasks and have pushed forward the success of dictionary learning therein. Recent studies of dictionary learning focus on learning discriminative atoms instead of purely reconstructive ones. However, the existence of intraclass diversities (i.e., data objects within the same category that exhibit large visual dissimilarities) and interclass similarities (i.e., data objects from distinct classes that share many visual similarities) makes it challenging to learn effective recognition models. A large number of labeled data objects is required to learn models that can effectively characterize these subtle differences; however, labeled data objects are often limited, making it difficult to learn a monolithic dictionary that is discriminative enough. To address these limitations, in this paper, we propose a weakly supervised dictionary learning method that automatically learns a discriminative dictionary by fully exploiting visual attribute correlations rather than label priors. In particular, the intrinsic attribute correlations are deployed as a critical cue to guide the process of object categorization, and a set of subdictionaries is jointly learned with respect to each category. The resulting dictionary is highly discriminative and leads to intraclass-diversity-aware sparse representations. Extensive experiments on image classification and object recognition show the effectiveness of our approach.
16
Du D, Qi H, Wen L, Tian Q, Huang Q, Lyu S. Geometric Hypergraph Learning for Visual Tracking. IEEE Trans Cybern 2017; 47:4182-4195. PMID: 27875238. DOI: 10.1109/tcyb.2016.2626275.
Abstract
Graph-based representation is widely used in the visual tracking field to find correct correspondences between target parts in different frames. However, most graph-based trackers consider only pairwise geometric relations between local parts and do not make full use of the target's intrinsic structure, making the representation easily disturbed by errors in pairwise affinities when large deformation or occlusion occurs. In this paper, we propose a geometric hypergraph learning-based tracking method, which fully exploits high-order geometric relations among multiple correspondences of parts in different frames. Visual tracking is then formulated as a mode-seeking problem on the hypergraph, in which vertices represent correspondence hypotheses and hyperedges describe high-order geometric relations among correspondences. In addition, a confidence-aware sampling method is developed to select representative vertices and hyperedges to construct the geometric hypergraph for greater robustness and scalability. Experiments carried out on three challenging datasets (VOT2014, OTB100, and Deform-SOT) demonstrate that our method performs favorably against existing trackers.
17
Chen Z, You X, Zhong B, Li J, Tao D. Dynamically Modulated Mask Sparse Tracking. IEEE Trans Cybern 2017; 47:3706-3718. PMID: 28113386. DOI: 10.1109/tcyb.2016.2577718.
Abstract
Visual tracking is a critical task in many computer vision applications such as surveillance and robotics. However, although robustness to local corruptions has improved, prevailing trackers remain sensitive to large-scale corruptions, such as occlusions and illumination variations. In this paper, we propose a novel robust object tracking technique based on a subspace-learning appearance model. Our contributions are twofold. First, mask templates produced by frame differencing are introduced into our template dictionary. Since the mask templates contain abundant structural information about corruptions, the model can encode information about corruptions on the object more efficiently. Meanwhile, the robustness of the tracker is further enhanced by adopting system dynamics, which account for the moving tendency of the object. Second, we provide a theoretical guarantee that, by adapting the modulated template dictionary system, our new sparse model can be solved by the accelerated proximal gradient algorithm as efficiently as in traditional sparse tracking methods. Extensive experimental evaluations demonstrate that our method significantly outperforms 21 other cutting-edge algorithms in both speed and tracking accuracy, especially under challenges such as pose variation, occlusion, and illumination changes.
18
Martinel N, Foresti GL, Micheloni C. Person Reidentification in a Distributed Camera Network Framework. IEEE Trans Cybern 2017; 47:3530-3541. PMID: 27249845. DOI: 10.1109/tcyb.2016.2568264.
Abstract
Plenty of research has been conducted to obtain the best reidentification performance between single camera pairs, but none of the current approaches addresses reidentification in a camera network by considering the network topology (i.e., the structure of the monitored environment). We introduce a distributed person reidentification framework with the following contributions: 1) a camera matching cost that measures the reidentification performance between nodes of the network and 2) a derivation of the distance-vector algorithm that learns the network topology and thus prioritizes and limits the cameras queried for matching a probe. Results on three benchmark datasets show that the network topology can be learned in an unsupervised fashion and that network-wide reidentification performance improves. As a side effect, communication bandwidth usage is reduced.
19
Yuan Y, Feng Y, Lu X. Statistical Hypothesis Detector for Abnormal Event Detection in Crowded Scenes. IEEE Trans Cybern 2017; 47:3597-3608. PMID: 27323389. DOI: 10.1109/tcyb.2016.2572609.
Abstract
Abnormal event detection is a challenging task, especially in crowded scenes. Many existing methods learn a normal event model in the training phase and treat events that cannot be well represented as abnormalities. However, they fail to make use of abnormal event patterns, the elements that comprise abnormal events. Moreover, normal patterns in testing videos may diverge from training ones due to the existence of abnormalities. To address these problems, in this paper, an abnormality detector is proposed to detect abnormal events based on a statistical hypothesis test. The proposed detector treats each sample as a combination of a set of event patterns. Because labeled abnormalities are unavailable for training, abnormal patterns are adaptively extracted from incoming unlabeled testing samples. The contributions of this paper are as follows: 1) we introduce the idea of a statistical hypothesis test into the framework of abnormality detection, and abnormal events are identified as those containing abnormal event patterns while possessing high abnormality detector scores; 2) because of the complexity of video events, noise seldom follows a simple distribution, so we approximate the complex noise distribution with a mixture of Gaussians, which benefits the modeling of video events and improves abnormality detection accuracy; and 3) because of the existence of abnormalities, some unusually occurring normal events in the testing videos differ from the training ones; to represent normal events precisely, an online updating strategy is proposed to cover these cases in the normal event patterns, eliminating most false detections. Extensive experiments and comparisons with state-of-the-art methods verify the effectiveness of the proposed algorithm.
20
Zhang L, Suganthan PN. Visual Tracking With Convolutional Random Vector Functional Link Network. IEEE Trans Cybern 2017; 47:3243-3253. PMID: 27542188. DOI: 10.1109/tcyb.2016.2588526.
Abstract
Deep neural network-based methods have recently achieved excellent performance in visual tracking. As very few training samples are available in the visual tracking task, those approaches rely heavily on extremely large auxiliary datasets such as ImageNet to pretrain the model. To address the discrepancy between the source domain (the auxiliary data) and the target domain (the object being tracked), they must be fine-tuned during the tracking process. However, those methods are sensitive to hyper-parameters such as the learning rate, maximum number of epochs, and mini-batch size. It is thus worth investigating whether pretraining and fine-tuning through conventional backpropagation are essential for visual tracking. In this paper, we shed light on this line of research by proposing the convolutional random vector functional link (CRVFL) neural network, which can be regarded as a marriage of the convolutional neural network and the random vector functional link network, to simplify the visual tracking system. The parameters in the convolutional layer are randomly initialized and kept fixed; only the parameters in the fully connected layer need to be learned. We further propose an elegant approach to update the tracker. On the widely used visual tracking benchmark, without any auxiliary data, a single CRVFL model achieves 79.0% on the precision plot at a threshold of 20 pixels, and an ensemble of CRVFL models yields the best result of 86.3%.
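The central mechanism this abstract describes, a random feature layer that stays fixed while only the output weights are learned, can be sketched in a few lines. The data, layer sizes, and the use of plain random projections with a closed-form ridge solve (in place of the paper's convolutional layer and tracking pipeline) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 32 features, labels from a noisy linear rule.
X = rng.standard_normal((200, 32))
y = (X[:, 0] + 0.1 * rng.standard_normal(200) > 0).astype(float)

# Random hidden layer, generated once and kept fixed
# (a stand-in for the frozen, randomly initialized convolutional layer).
W_hidden = rng.standard_normal((32, 64))
H = np.tanh(X @ W_hidden)

# RVFL-style design matrix: direct input links concatenated with random features.
D = np.hstack([X, H])

# Only the output-layer weights are learned, here in closed form (ridge regression).
lam = 1e-2
beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)

accuracy = ((D @ beta > 0.5).astype(float) == y).mean()
```

Because nothing upstream of the output layer is trained, there is no backpropagation and none of the learning-rate or epoch-count hyper-parameters the abstract criticizes.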
21
Wang Y, Tang YY, Li L. Correntropy Matching Pursuit With Application to Robust Digit and Face Recognition. IEEE Trans Cybern 2017; 47:1354-1366. PMID: 27076481. DOI: 10.1109/tcyb.2016.2544852.
Abstract
As an efficient sparse representation algorithm, orthogonal matching pursuit (OMP) has attracted massive attention in recent years. However, OMP and most of its variants estimate the sparse vector using the mean square error criterion, which depends on the Gaussianity assumption of the error distribution; a violation of this assumption, e.g., non-Gaussian noise, may lead to performance degradation. In this paper, a correntropy matching pursuit (CMP) method is proposed to alleviate this problem. Unlike many other matching pursuit methods, our method is independent of the error distribution. We show that CMP can adaptively assign small weights to severely corrupted entries of the data and large weights to clean ones, thus reducing the effect of large noise. Another contribution is the development of a robust sparse representation-based recognition method built on CMP. Experiments on synthetic and real data show the effectiveness of our method for both sparse approximation and pattern recognition, especially for noisy, corrupted, and incomplete data.
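The adaptive weighting the abstract describes corresponds to the Welsch (correntropy-induced) weight w_i = exp(-r_i^2 / (2*sigma^2)) on each residual r_i. A minimal numeric sketch, with synthetic residuals and an arbitrary kernel bandwidth standing in for real data and a tuned parameter:

```python
import numpy as np

rng = np.random.default_rng(1)

# Residuals of a fit: mostly small Gaussian noise plus a few gross corruptions.
residuals = rng.normal(0.0, 0.1, size=100)
residuals[:5] = 10.0  # severely corrupted entries

# Correntropy-induced (Welsch) weights: clean entries keep weight near 1,
# heavily corrupted entries are effectively ignored in the next fit.
sigma = 0.5  # Gaussian kernel bandwidth (a tuning parameter in practice)
weights = np.exp(-residuals**2 / (2.0 * sigma**2))
```

In a weighted least-squares refit, the near-zero weights on the corrupted entries are what make the estimate insensitive to non-Gaussian outliers, unlike the plain mean-square-error criterion.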
22
Wang X, Shen S, Ning C, Zhang Y, Lv G. Robust object tracking based on local discriminative sparse representation. J Opt Soc Am A Opt Image Sci Vis 2017; 34:533-544. PMID: 28375323. DOI: 10.1364/josaa.34.000533.
Abstract
Despite much success in the application of sparse representation to object tracking, most of the existing sparse-representation-based tracking methods are still not robust enough for challenges such as pose variations, illumination changes, occlusions, and background distractions. In this paper, we propose a robust object-tracking algorithm via local discriminative sparse representation. The key idea in our method is to develop what we believe is a novel local discriminative sparse representation method for object appearance modeling, which can be helpful to overcome issues such as appearance variations and occlusions. Then a robust tracker based on the local discriminative sparse appearance model is proposed to track the object over time. Additionally, an online dictionary update strategy is introduced in our approach for further robustness. Experimental results on challenging sequences demonstrate the effectiveness and robustness of our proposed method.
23
Lu X, Yuan Y, Zheng X. Joint Dictionary Learning for Multispectral Change Detection. IEEE Trans Cybern 2017; 47:884-897. PMID: 26955060. DOI: 10.1109/tcyb.2016.2531179.
Abstract
Change detection is one of the most important applications of remote sensing technology. It is a challenging task due to the obvious variations in the radiometric value of the spectral signature and the limited capability of utilizing spectral information. In this paper, an improved sparse coding method for change detection is proposed. The intuition of the proposed method is that unchanged pixels in different images can be well reconstructed by the joint dictionary, which encodes knowledge of unchanged pixels, while changed pixels cannot. First, a query image pair is projected onto the joint dictionary to constitute the knowledge of unchanged pixels. The reconstruction error is then used to discriminate between changed and unchanged pixels in the different images. To select proper thresholds for determining changed regions, an automatic threshold selection strategy is presented that minimizes the reconstruction errors of the changed pixels. Extensive experiments on multispectral data have been conducted, and the results, compared with state-of-the-art methods, prove the superiority of the proposed method. The contributions of the proposed method can be summarized as follows: 1) joint dictionary learning is proposed to explore the intrinsic information of different images for change detection, so that change detection can be transformed into a sparse representation problem; to the authors' knowledge, few publications utilize joint dictionary learning in change detection; 2) an automatic threshold selection strategy is presented that minimizes the reconstruction errors of the changed pixels without prior assumptions about the spectral signature; as a result, the threshold value provided by the proposed method can adapt to different data due to the characteristics of joint dictionary learning; and 3) the proposed method makes no prior assumptions about the modeling and handling of the spectral signature and can therefore be adapted to different data.
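The intuition that unchanged pixels reconstruct well over the joint dictionary while changed pixels do not can be illustrated with a simplified least-squares coding; the paper itself uses sparse coding with automatically selected thresholds, and the dictionary and pixel data below are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical joint dictionary: orthonormal columns spanning the
# "unchanged" subspace of a 20-dimensional pixel feature space.
D, _ = np.linalg.qr(rng.standard_normal((20, 5)))

unchanged = D @ rng.standard_normal((5, 50))  # pixels lying in span(D)
changed = rng.standard_normal((20, 50))       # generic pixels, mostly outside it

def recon_error(dictionary, pixels):
    """Code each pixel over the dictionary by least squares; return residual norms."""
    coeffs, *_ = np.linalg.lstsq(dictionary, pixels, rcond=None)
    return np.linalg.norm(pixels - dictionary @ coeffs, axis=0)

err_unchanged = recon_error(D, unchanged)
err_changed = recon_error(D, changed)
# Any threshold between the two error populations flags the changed pixels.
```

The separation between the two residual populations is what makes the automatic threshold selection the abstract describes feasible.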
24
Yang Y, Hu W, Xie Y, Zhang W, Zhang T. Temporal Restricted Visual Tracking Via Reverse-Low-Rank Sparse Learning. IEEE Trans Cybern 2017; 47:485-498. PMID: 27046920. DOI: 10.1109/tcyb.2016.2519532.
Abstract
An effective representation model, which aims to mine the most meaningful information in the data, plays an important role in visual tracking. Some recent particle-filter-based trackers achieve promising results by introducing the low-rank assumption into the representation model. However, their assumed low-rank structure of candidates limits robustness when facing severe challenges such as abrupt motion. To avoid this limitation, we propose a temporal restricted reverse-low-rank learning algorithm for visual tracking with the following advantages: 1) the reverse-low-rank model jointly represents target and background templates via candidates, which exploits the low-rank structure among consecutive target observations and enforces the temporal consistency of the target at a global level; 2) appearance consistency may be broken when the target undergoes sudden changes; to overcome this issue, we propose a local constraint via the l1,2 mixed norm, which not only ensures the local consistency of the target appearance but also tolerates sudden changes between two adjacent frames; and 3) to alleviate the influence of unreasonable representation values due to outlier candidates, an adaptive weighting scheme is designed to improve the robustness of the tracker. Evaluations on 26 challenging video sequences show the effectiveness and favorable performance of the proposed algorithm against 12 state-of-the-art visual trackers.
25
Zeng K, Yu J, Wang R, Li C, Tao D. Coupled Deep Autoencoder for Single Image Super-Resolution. IEEE Trans Cybern 2017; 47:27-37. PMID: 26625442. DOI: 10.1109/tcyb.2015.2501373.
Abstract
Sparse coding has been widely applied to learning-based single image super-resolution (SR) and has obtained promising performance by jointly learning effective representations for low-resolution (LR) and high-resolution (HR) image patch pairs. However, the resulting HR images often suffer from ringing, jaggy, and blurring artifacts due to the strong yet ad hoc assumptions that the LR image patch representation is equal to, is linear with, lies on a manifold similar to, or has the same support set as the corresponding HR image patch representation. Motivated by the success of deep learning, we develop a data-driven model, the coupled deep autoencoder (CDA), for single image SR. CDA is based on a new deep architecture and has high representational capability. CDA simultaneously learns the intrinsic representations of LR and HR image patches and a big-data-driven function that precisely maps the LR representations to their corresponding HR representations. Extensive experimentation demonstrates the superior effectiveness and efficiency of CDA for single image SR compared to other state-of-the-art methods on the Set5 and Set14 datasets.
26
Jiang P, Cheng Y, Wang X, Feng Z. Unfalsified Visual Servoing for Simultaneous Object Recognition and Pose Tracking. IEEE Trans Cybern 2016; 46:3032-3046. PMID: 27723610. DOI: 10.1109/tcyb.2015.2495157.
Abstract
In a complex environment, simultaneous object recognition and tracking has been one of the challenging topics in computer vision and robotics. Current approaches are usually fragile due to spurious feature matching and local convergence for pose determination. Once a failure happens, these approaches lack a mechanism to recover automatically. In this paper, data-driven unfalsified control is proposed for solving this problem in visual servoing. It recognizes a target through matching image features with a 3-D model and then tracks them through dynamic visual servoing. The features can be falsified or unfalsified by a supervisory mechanism according to their tracking performance. Supervisory visual servoing is repeated until a consensus between the model and the selected features is reached, so that model recognition and object tracking are accomplished. Experiments show the effectiveness and robustness of the proposed algorithm to deal with matching and tracking failures caused by various disturbances, such as fast motion, occlusions, and illumination variation.
Collapse
|
27
|
Kruger U. Learning Linear Representation of Space Partitioning Trees Based on Unsupervised Kernel Dimension Reduction. IEEE TRANSACTIONS ON CYBERNETICS 2016; 46:3427-3438. [PMID: 28055934 DOI: 10.1109/tcyb.2015.2507362] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Space partitioning trees, which sequentially divide and subdivide a space into disjoint subsets using splitting hyperplanes, play a key role in accelerating the query of samples in the cybernetics and computer vision domains. Associated methods, however, suffer from the curse of dimensionality or stringent assumptions on the data distribution. This paper presents a new concept, termed the kernel dimension reduction tree (KDR-tree), which relies on linear projections computed with an unsupervised kernel dimension reduction approach. The proposed concept does not rely on any assumption on the data distribution and can capture higher-order statistical information encapsulated within the data. This paper then develops two variants of the KDR-tree concept: 1) to handle residual data [i.e., the residual-based KDR-tree (rKDR-tree) algorithm] and 2) to cope with larger datasets [i.e., the sampling-based KDR-tree (sKDR-tree) algorithm]. By directly comparing the KDR-tree concept to competitive techniques on several benchmark datasets, this paper shows that the sKDR-tree yields better performance for non-Gaussian distributed datasets. Based on the analysis of three datasets, this paper highlights, experimentally, that the rKDR-tree has the potential to discover the intrinsic dimension. This paper also provides a theoretical analysis of the KDR-tree concept to outline why it outperforms existing techniques when the data distribution is non-Gaussian.
Collapse
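One node of such a tree splits the data at a hyperplane defined by a learned projection direction. A minimal sketch, using the top principal direction as a stand-in (the KDR-tree instead derives its directions from unsupervised kernel dimension reduction, which this toy code does not implement):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 3))  # toy point set

# Stand-in projection: top principal direction of the centered data.
# KDR-tree would compute the direction via unsupervised kernel
# dimension reduction, avoiding distributional assumptions.
direction = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[2][0]

# One tree node: split at the median of the projected values, then
# recurse on each half (recursion omitted in this sketch).
proj = X @ direction
median = np.median(proj)
left, right = X[proj <= median], X[proj > median]
```

A median split guarantees a balanced node, which is what makes such trees useful for accelerating nearest-neighbor queries.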
|
29
|
Li X, Han Z, Wang L, Lu H. Visual Tracking via Random Walks on Graph Model. IEEE TRANSACTIONS ON CYBERNETICS 2016; 46:2144-2155. [PMID: 26292358 DOI: 10.1109/tcyb.2015.2466437] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
In this paper, we formulate visual tracking as random walks on graph models with nodes representing superpixels and edges denoting relationships between superpixels. We integrate two novel graphs with the theory of Markov random walks, resulting in two Markov chains. First, an ergodic Markov chain is enforced to globally search for the candidate nodes with similar features to the template nodes. Second, an absorbing Markov chain is utilized to model the temporal coherence between consecutive frames. The final confidence map is generated by a structural model which combines the appearance similarity measure derived from the random walks with the internal spatial layout captured by different target parts. The effectiveness of the proposed Markov chains as well as the structural model is evaluated both qualitatively and quantitatively. Experimental results on challenging sequences show that the proposed tracking algorithm performs favorably against state-of-the-art methods.
Collapse
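The ergodic-chain component can be illustrated on a toy superpixel graph: power iteration on a row-stochastic transition matrix converges to the stationary distribution used to score candidate nodes. The affinities below are random placeholders, not image features:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6  # toy graph: one node per superpixel

# Symmetric, nonnegative affinities between superpixels (placeholder
# values; in the tracker these would come from feature similarity).
A = rng.random((n, n))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)

# Row-normalize into a transition matrix and run power iteration to
# obtain the stationary distribution of the ergodic chain.
P = A / A.sum(axis=1, keepdims=True)
pi = np.full(n, 1.0 / n)
for _ in range(500):
    pi = pi @ P
```

Because every off-diagonal affinity is positive, the chain is ergodic and the iteration converges to a unique stationary distribution regardless of the starting vector.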
|
30
|
Abul Aziz MA, Niu J, Zhao X, Li X. Efficient and Robust Learning for Sustainable and Reacquisition-Enabled Hand Tracking. IEEE TRANSACTIONS ON CYBERNETICS 2016; 46:945-958. [PMID: 25898327 DOI: 10.1109/tcyb.2015.2418275] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
The use of machine learning approaches for long-term hand tracking poses some major challenges, such as attaining robustness to inconsistencies in lighting, scale, and object appearance, background clutter, and total object occlusion or disappearance. To address these issues, in this paper we present a robust machine learning approach based on enhanced particle filter trackers. The inherent drawbacks of the particle filter approach, i.e., sample degeneration and sample impoverishment, are minimized by infusing the particle filter with the mean shift approach. Moreover, to instill our tracker with reacquisition ability, we propose a rotation-invariant and efficient detection framework named beta histograms of oriented gradients. Our robust appearance model operates on the red, green, blue color histogram and our newly proposed rotation invariant noise compensated local binary patterns descriptor, a noise-compensated, rotation-invariant version of the local binary patterns descriptor. Through our experiments, we demonstrate that our proposed hand tracker performs favorably against state-of-the-art algorithms on numerous challenging video sequences of hand postures, and overcomes the largely unsolved problem of redetecting hands after they vanish from and reappear in the frame.
Collapse
|
31
|
Liu M, Zhang D. Pairwise Constraint-Guided Sparse Learning for Feature Selection. IEEE TRANSACTIONS ON CYBERNETICS 2016; 46:298-310. [PMID: 26151948 DOI: 10.1109/tcyb.2015.2401733] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Feature selection aims to identify the most informative features for a compact and accurate data representation. As typical supervised feature selection methods, Lasso and its variants using L1-norm-based regularization terms have received much attention in recent studies, most of which use class labels as supervised information. Besides class labels, there are other types of supervised information, e.g., pairwise constraints that specify whether a pair of data samples belong to the same class (must-link constraint) or different classes (cannot-link constraint). However, most existing L1-norm-based sparse learning methods do not take advantage of pairwise constraints, which provide weaker but more general supervised information. To address this problem, we propose a pairwise constraint-guided sparse (CGS) learning method for feature selection, where the must-link and the cannot-link constraints are used as discriminative regularization terms that directly concentrate on the local discriminative structure of data. Furthermore, we develop two variants of CGS, including: 1) semi-supervised CGS that utilizes labeled data, pairwise constraints, and unlabeled data and 2) ensemble CGS that uses the ensemble of pairwise constraint sets. We conduct a series of experiments on a number of data sets from the University of California-Irvine machine learning repository, a gene expression data set, two real-world neuroimaging-based classification tasks, and two large-scale attribute classification tasks. Experimental results demonstrate the efficacy of our proposed methods, compared with several established feature selection methods.
Collapse
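The L1-regularized backbone such methods extend can be sketched with plain ISTA on a toy regression. This illustrates only the Lasso-style selection step; the must-link/cannot-link regularizers of CGS are not implemented here, and all data are synthetic:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]           # only 3 informative features
y = X @ w_true + 0.1 * rng.standard_normal(100)

lam = 10.0                               # L1 regularization strength
step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant
w = np.zeros(20)
for _ in range(2000):                    # ISTA: gradient step + shrinkage
    w = soft_threshold(w - step * X.T @ (X @ w - y), step * lam)

selected = np.flatnonzero(np.abs(w) > 1e-3)
```

Features with nonzero weights are "selected"; CGS adds pairwise-constraint terms to this objective so that the selected features also respect the local discriminative structure.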
|
32
|
Tang YY, Wang Y, Li L, Chen CLP. Structural Atomic Representation for Classification. IEEE TRANSACTIONS ON CYBERNETICS 2015; 45:2905-2913. [PMID: 25622336 DOI: 10.1109/tcyb.2015.2389232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Recently, a large family of representation-based classification methods has been proposed and has attracted great interest in pattern recognition and computer vision. This paper presents a general framework, termed the atomic representation-based classifier (ARC), to systematically unify many of them. By defining different atomic sets, most popular representation-based classifiers (RCs) follow ARC as special cases. Despite good performance, most RCs treat test samples separately and fail to consider the correlation between them. In this paper, we develop a structural ARC (SARC) based on Bayesian analysis, generalizing a Markov random field-based multilevel logistic prior. The proposed SARC can utilize the structural information among the test data to further improve the performance of every RC belonging to the ARC framework. Experimental results on both synthetic and real databases demonstrate the effectiveness of the proposed framework.
Collapse
|
33
|
Lai ZR, Dai DQ, Ren CX, Huang KK. Discriminative and Compact Coding for Robust Face Recognition. IEEE TRANSACTIONS ON CYBERNETICS 2015; 45:1900-1912. [PMID: 25343776 DOI: 10.1109/tcyb.2014.2361770] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
In this paper, we propose a novel discriminative and compact coding (DCC) method for robust face recognition. It introduces multiple error measurements into the regression model; these collaborate to tune regression codes with different properties (sparsity, compactness, high discriminating ability, etc.), further improving the robustness and adaptivity of the regression model. We propose two types of coding models: 1) one with multiscale error measurements that produces sparse and highly discriminative codes and 2) one that incorporates within-class collaborative representation and produces sparse and compact codes. The update of the codes and the combination of the different errors are processed automatically. DCC is also robust to the choice of parameters, producing stable regression residuals that are crucial to classification. Extensive experiments on benchmark datasets show that DCC has promising performance and outperforms other state-of-the-art regression models.
Collapse
|
34
|
Yin Y, Xu D, Wang X, Bai M. Online State-Based Structured SVM Combined With Incremental PCA for Robust Visual Tracking. IEEE TRANSACTIONS ON CYBERNETICS 2015; 45:1988-2000. [PMID: 25700478 DOI: 10.1109/tcyb.2014.2363078] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
In this paper, we propose a robust state-based structured support vector machine (SVM) tracking algorithm combined with incremental principal component analysis (PCA). Unlike current structured SVM trackers, our method directly learns and predicts the object's states rather than the 2-D translation transformation during tracking. We define the object's virtual state to combine the state-based structured SVM and incremental PCA; the virtual state is the most confident state of the object in every frame. The incremental PCA is used to update the virtual feature vector corresponding to the virtual state and the principal subspace of the object's feature vectors. To improve the accuracy of prediction, all feature vectors are projected onto the principal subspace in the learning and prediction process of the state-based structured SVM. Experimental results on several challenging video sequences validate the effectiveness and robustness of our approach.
Collapse
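The subspace-projection step can be illustrated with batch PCA as a stand-in (the paper uses incremental PCA so the subspace can be updated frame by frame; this sketch computes it once from synthetic feature vectors):

```python
import numpy as np

rng = np.random.default_rng(3)
# 50 synthetic feature vectors with two dominant directions of variation.
F = rng.standard_normal((50, 10)) @ np.diag([5.0, 3.0] + [0.1] * 8)

mean = F.mean(axis=0)
_, _, Vt = np.linalg.svd(F - mean, full_matrices=False)
basis = Vt[:2].T                     # principal subspace, k = 2

# Project feature vectors onto the subspace before learning and
# prediction, as the tracker does for the state-based structured SVM.
coords = (F - mean) @ basis
recon = coords @ basis.T + mean
rel_err = np.linalg.norm(F - recon) / np.linalg.norm(F - mean)
```

Because most of the variance lies in the first two directions, the low-dimensional coordinates retain nearly all the information the SVM needs while discarding noise dimensions.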
|
35
|
Quinton JC, Smeding A. Dynamic competition and binding of concepts through time and space. Cogn Process 2015. [PMID: 26220703 DOI: 10.1007/s10339-015-0674-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
Models of implicit stereotypes (e.g., the association of male with math or female with language) usually explain the faster responses observed for stereotype-congruent trials in the Implicit Association Test (IAT) by requiring a fundamental opposition between the male and female concepts (or math and language), limiting the decision-making dynamics to abstract dimensions. This paper introduces alternative models that exploit the sensorimotor dimensions of the IAT, which naturally account for the opposition between concepts because these are typically mapped to opposite corners of the screen and to different response actions. In addition to the emergence of the IAT effect, dynamic characteristics of the decision-making process within these models are tested against human data obtained with a mouse-tracking adapted IAT procedure.
Collapse
|
36
|
Yang Y, Xie Y, Zhang W, Hu W, Tan Y. Global Coupled Learning and Local Consistencies Ensuring for sparse-based tracking. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.12.060] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
37
|
Huang W, Wang X, Li J, Jin Z. A Novel Feature Extraction Method Based on Collaborative Representation for Face Recognition. INT J PATTERN RECOGN 2015. [DOI: 10.1142/s0218001415560042] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Representation-based classification has received much attention in the field of face recognition, and collaborative representation-based classification (CRC) in particular has shown robustness and high performance. In this paper, we propose a new feature extraction method based on collaborative representation. First, we obtain the coefficients of all face samples by collaborative representation. Then we define the inter-class and intra-class reconstruction errors for each sample. After that, the Fisher criterion is used to obtain discriminative features. Finally, CRC is executed to obtain the identification results in the new feature space. Unlike other feature extraction methods, the proposed method integrates the classification criterion into the feature extraction, so the resulting feature space fits the classifier better. Experimental results on several face databases show that the proposed method is more effective than other state-of-the-art face recognition methods.
Collapse
Affiliation(s)
- Wei Huang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, P. R. China
- Department of Computer Science and Engineering, Hanshan Normal University, Chaozhou, P. R. China
| | - Xiaohui Wang
- Department of Computer Science and Engineering, Hanshan Normal University, Chaozhou, P. R. China
| | - Jianzhong Li
- Department of Mathematics and Statistics, Hanshan Normal University, Chaozhou, P. R. China
| | - Zhong Jin
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, P. R. China
| |
Collapse
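The CRC step used in the final stage of the method above can be sketched in a few lines: code the query over all training samples with ridge-regularized least squares, then assign the class whose samples reconstruct the query with the smallest residual. The data below are a toy two-class example; the paper's feature-extraction stage is not implemented here:

```python
import numpy as np

def crc_classify(X, labels, y, lam=1e-2):
    # Collaborative coding: represent y over ALL training columns
    # with an l2-regularized least-squares fit.
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    # Assign the class whose samples reconstruct y best.
    best_c, best_r = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        r = np.linalg.norm(y - X[:, mask] @ alpha[mask])
        if r < best_r:
            best_c, best_r = c, r
    return best_c

rng = np.random.default_rng(5)
m0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # toy class-0 mean
m1 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])   # toy class-1 mean
X = np.column_stack(
    [m0 + 0.05 * rng.standard_normal(5) for _ in range(10)]
    + [m1 + 0.05 * rng.standard_normal(5) for _ in range(10)]
)
labels = np.array([0] * 10 + [1] * 10)
query = m0 + 0.05 * rng.standard_normal(5)
pred = crc_classify(X, labels, query)
```

The ridge term makes the coding well-posed even when training samples outnumber feature dimensions, which is the usual situation in face recognition.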
|