1. Shen X, Zhou Y, Yuan YH, Yang X, Lan L, Zheng Y. Contrastive Transformer Hashing for Compact Video Representation. IEEE Transactions on Image Processing 2023; 32:5992-6003. [PMID: 37903046] [DOI: 10.1109/tip.2023.3326994]
Abstract
Video hashing learns compact representations by mapping videos into a low-dimensional Hamming space, and has achieved promising performance in large-scale video retrieval. Effectively exploiting temporal and spatial structure in an unsupervised setting remains challenging. To fill this gap, this paper proposes Contrastive Transformer Hashing (CTH) for effective video retrieval. Specifically, CTH develops a bidirectional transformer autoencoder, on which a visual reconstruction loss is built; it captures bidirectional correlations among frames more effectively than conventional unidirectional models. In addition, CTH devises a multi-modality contrastive loss to reveal the intrinsic structure among videos: it constructs inter-modality and intra-modality triplet sets and exploits inter-modality and intra-modality similarities simultaneously. We perform video retrieval on four benchmark datasets, i.e., UCF101, HMDB51, SVW30, and FCVID, using the learned compact hash representations, and extensive empirical results demonstrate that the proposed CTH outperforms several state-of-the-art video hashing methods.
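As a rough, illustrative sketch of the hashing-plus-triplet idea described above (not the paper's transformer model: the projection matrix, margin, and helper names here are hypothetical stand-ins), binarization by sign and a hinge-style triplet loss over Hamming distances can be written as:

```python
import numpy as np

def hash_codes(features, projection):
    """Map real-valued features to {-1, +1} codes via the sign of a linear
    projection (a stand-in for CTH's learned transformer encoder)."""
    return np.where(features @ projection >= 0, 1, -1)

def hamming_distance(a, b):
    # For codes in {-1, +1}, Hamming distance = (bits - dot product) / 2.
    return (a.shape[-1] - a @ b) / 2

def triplet_contrastive_loss(anchor, positive, negative, margin=2.0):
    """Hinge-style triplet loss on Hamming distances: pull the positive
    closer to the anchor than the negative by at least `margin` bits."""
    d_pos = hamming_distance(anchor, positive)
    d_neg = hamming_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
proj = rng.standard_normal((8, 16))            # 8-dim features -> 16-bit codes
anchor = hash_codes(rng.standard_normal(8), proj)
# A perfect triplet (positive identical, negative maximally far) incurs no loss:
loss = triplet_contrastive_loss(anchor, anchor, -anchor, margin=2.0)
```

In the paper, such triplets are drawn both within and across modalities, so the same loss form covers the inter- and intra-modality terms.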
2. Tian D, Gong C, Gong M, Wei Y, Feng X. Modeling Cardinality in Image Hashing. IEEE Transactions on Cybernetics 2023; 53:114-123. [PMID: 34236987] [DOI: 10.1109/tcyb.2021.3089879]
Abstract
Cardinality constraints, i.e., constraints on the number of nonzero outputs of a model, have been widely used in structured learning, for example to model dependencies between the entries of multidimensional labels. In hashing, the final outputs are also binary codes, which resemble multidimensional labels. It has been validated that estimating how many 1s a multidimensional label vector contains is easier than directly predicting which elements are 1, and that estimating cardinality as a prior step improves classification performance. Hence, in this article, we incorporate a cardinality constraint into the unsupervised image hashing problem. The proposed model proceeds in two steps: 1) estimate the cardinality of each hash code and 2) estimate which bits are 1. Unlike multidimensional labels, which are known and fixed during training, hash codes are generally learned iteratively, so their cardinalities are unknown and change during learning. We use a neural network as the cardinality predictor and learn its parameters jointly with the hash code generator, which in our model is an autoencoder. Experiments demonstrate the effectiveness of the proposed method.
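The two-step scheme in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's model: the `cardinality_predictor` below is a hypothetical stand-in for the learned neural network, and thresholding at zero is just one plausible choice.

```python
import numpy as np

def two_step_binarize(scores, cardinality_predictor):
    """Step 1: predict how many bits should be 1; step 2: set exactly that
    many bits, choosing the positions with the highest real-valued scores."""
    k = int(cardinality_predictor(scores))
    code = np.zeros_like(scores, dtype=int)
    if k > 0:
        top = np.argsort(scores)[-k:]        # indices of the k largest scores
        code[top] = 1
    return code

# Hypothetical predictor: count scores above zero (a learned network in the paper).
predictor = lambda s: (s > 0).sum()

scores = np.array([0.9, -0.3, 0.2, -0.8, 0.5])
code = two_step_binarize(scores, predictor)   # exactly 3 bits set
```

The point of predicting `k` first is that the second step becomes a ranking problem with a known budget, rather than independent per-bit decisions.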
3. Wang H, Sun J, Luo X, Xiang W, Zhang S, Chen C, Hua XS. Toward Effective Domain Adaptive Retrieval. IEEE Transactions on Image Processing 2023; 32:1285-1299. [PMID: 37027745] [DOI: 10.1109/tip.2023.3242777]
Abstract
This paper studies unsupervised domain adaptive hashing, a less-explored but emerging problem in efficient image retrieval, particularly cross-domain retrieval. The problem is typically tackled by learning hashing networks with pseudo-labeling and domain alignment techniques. Nevertheless, these approaches usually suffer from overconfident, biased pseudo-labels and from domain alignment that does not sufficiently explore semantics, and thus fail to achieve satisfactory retrieval performance. To tackle this, we present PEACE, a principled framework that holistically explores semantic information in both source and target data and extensively incorporates it for effective domain alignment. For comprehensive semantic learning, PEACE leverages label embeddings to guide the optimization of hash codes for source data. More importantly, to mitigate the effects of noisy pseudo-labels, we propose a novel method that holistically measures the uncertainty of pseudo-labels for unlabeled target data and progressively minimizes it through alternating optimization, guided by the domain discrepancy. Additionally, PEACE effectively removes domain discrepancy in the Hamming space from two perspectives: it not only introduces composite adversarial learning to implicitly explore the semantic information embedded in hash codes, but also aligns cluster semantic centroids across domains to explicitly exploit label information. Experimental results on several popular domain adaptive retrieval benchmarks demonstrate the superiority of PEACE over various state-of-the-art methods on both single-domain and cross-domain retrieval tasks. Our source code is available at https://github.com/WillDreamer/PEACE.
4. Badr H, Wanas N, Fayek M. Unsupervised domain adaptation with post-adaptation labeled domain performance preservation. Machine Learning with Applications 2022. [DOI: 10.1016/j.mlwa.2022.100439]
5. Huang F, Zhang L, Gao X. Domain Adaptation Preconceived Hashing for Unconstrained Visual Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5641-5655. [PMID: 33852407] [DOI: 10.1109/tnnls.2021.3071127]
Abstract
Learning to hash has been widely applied to image retrieval due to its low storage cost and high retrieval efficiency. Existing hashing methods assume that the distributions of the retrieval pool (i.e., the data set being retrieved) and the query data are similar, which does not reflect real-world conditions, where unconstrained visual cues such as illumination, pose, and background vary. Due to the large distribution gap between the retrieval pool and the query set, the performance of traditional hashing methods degrades severely. We therefore propose a new, efficient yet transferable hashing model for unconstrained cross-domain visual retrieval, in which the retrieval pool and the query samples are drawn from different but semantically relevant domains. Specifically, we propose a simple yet effective unsupervised method, domain adaptation preconceived hashing (DAPH), for learning domain-invariant hash representations. DAPH has three merits: 1) to the best of our knowledge, we are the first to introduce domain adaptation (DA) into hashing for unconstrained visual retrieval with transferable hash codes; 2) it learns a domain-invariant feature transformation with marginal discrepancy minimization and a feature reconstruction constraint, so the hash codes are both domain adaptive and content preserving; and 3) it introduces a DA-preconceived quantization loss that further guarantees the discriminative power of the learned hash codes for retrieval. Extensive experiments on various benchmark data sets verify that DAPH outperforms many state-of-the-art hashing methods on unconstrained (unrestricted) instance retrieval in both single- and cross-domain scenarios.
6. Shi W, Gong Y, Chen B, Hei X. Transductive Semisupervised Deep Hashing. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:3713-3726. [PMID: 33544678] [DOI: 10.1109/tnnls.2021.3054386]
Abstract
Deep hashing methods have shown their superiority over traditional ones, but they usually require a large amount of labeled training data to achieve high retrieval accuracy. We propose a novel transductive semisupervised deep hashing (TSSDH) method that effectively trains deep convolutional neural network (DCNN) models with both labeled and unlabeled training samples. TSSDH consists of four main ingredients. First, we extend the traditional transductive learning (TL) principle to make it applicable to DCNN-based deep hashing. Second, we introduce confidence levels for unlabeled samples to reduce the adverse effects of uncertain samples. Third, we employ a Gaussian likelihood loss for hash code learning to sufficiently penalize large Hamming distances between similar sample pairs. Fourth, we design large-margin feature (LMF) regularization, which minimizes the distances of similar sample pairs while keeping the distances of dissimilar pairs above a predefined margin. Comprehensive experiments show that TSSDH produces superior image retrieval accuracy compared with representative semisupervised deep hashing methods given the same number of labeled training samples.
7. Wang X, Hu P, Liu P, Peng D. Deep Semisupervised Class- and Correlation-Collapsed Cross-View Learning. IEEE Transactions on Cybernetics 2022; 52:1588-1601. [PMID: 32386174] [DOI: 10.1109/tcyb.2020.2984489]
Abstract
In many computer vision applications, an object can be represented by multiple different views. Due to the heterogeneity gap caused by the views' inconsistent distributions, it is challenging to exploit such multiview data for cross-view retrieval and classification. Motivated by the fact that both labeled and unlabeled data can strengthen the relations among different views, this article proposes a deep cross-view learning framework, deep semisupervised class- and correlation-collapsed cross-view learning (DSC3L), for cross-view retrieval and classification. Unlike existing methods, which focus on two-view problems, the proposed method learns U (generally U ≥ 2) view-specific deep transformations that gradually project the U views into a shared space in which the projection combines supervised and unsupervised learning. First, we propose collapsing the instances of the same class, from all views, into a single point while pushing the instances of different classes toward distinct points. Second, to exploit abundant unlabeled U-wise multiview data, we propose collapsing correlated data into the same point and uncorrelated data into distinct points. These two processes are formulated as minimizing, for each instance, two Kullback-Leibler (KL) divergences between the conditional distribution and a desirable one. Finally, the two KL divergences are integrated into a joint optimization that learns a discriminative shared space. Experimental results on five widely used public datasets demonstrate the effectiveness of the proposed method.
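The class-collapsing KL term described above can be illustrated in miniature. This is a hedged sketch under simplifying assumptions (a single shared space, Gaussian-kernel neighbor probabilities, uniform desired distribution over same-class instances); the function names are invented for illustration and do not come from the paper.

```python
import numpy as np

def neighbor_distribution(Z, i):
    """Softmax over negative squared distances from instance i to all others:
    the conditional probability that i picks j as its neighbor."""
    d2 = np.sum((Z - Z[i]) ** 2, axis=1)
    d2[i] = np.inf                      # an instance never picks itself
    e = np.exp(-d2)
    return e / e.sum()

def class_collapse_kl(Z, labels, i):
    """KL(desired || actual) for instance i, where the desired distribution
    spreads mass uniformly over the other instances sharing i's class (the
    'collapse to the same point' objective, in miniature)."""
    p = neighbor_distribution(Z, i)
    same = (labels == labels[i])
    same[i] = False
    q = same / same.sum()
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

labels = np.array([0, 0, 1])
collapsed = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0]])  # same class coincides
scattered = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 0.0]])  # same class far apart
kl_collapsed = class_collapse_kl(collapsed, labels, 0)
kl_scattered = class_collapse_kl(scattered, labels, 0)
```

When same-class instances coincide, the divergence is near zero; scattering them makes it large, which is exactly the signal the joint optimization minimizes.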
8. Ma Q, Chen E, Lin Z, Yan J, Yu Z, Ng WWY. Convolutional Multitimescale Echo State Network. IEEE Transactions on Cybernetics 2021; 51:1613-1625. [PMID: 31217137] [DOI: 10.1109/tcyb.2019.2919648]
Abstract
As efficient recurrent neural network (RNN) models, echo state networks (ESNs) have attracted widespread attention and been applied in many application domains in the last decade. Although they have achieved great success in modeling time series, a single ESN may have difficulty in capturing the multitimescale structures that naturally exist in temporal data. In this paper, we propose the convolutional multitimescale ESN (ConvMESN), which is a novel training-efficient model for capturing multitimescale structures and multiscale temporal dependencies of temporal data. In particular, a multitimescale memory encoder is constructed with a multireservoir structure, in which different reservoirs have recurrent connections with different skip lengths (or time spans). By collecting all past echo states in each reservoir, this multireservoir structure encodes the history of a time series as nonlinear multitimescale echo state representations (MESRs). Our visualization analysis verifies that the MESRs provide better discriminative features for time series. Finally, multiscale temporal dependencies of MESRs are learned by a convolutional layer. By leveraging the multitimescale reservoirs followed by a convolutional learner, the ConvMESN has not only efficient memory encoding ability for temporal data with multitimescale structures but also strong learning ability for complex temporal dependencies. Furthermore, the training-free reservoirs and the single convolutional layer provide high-computational efficiency for the ConvMESN to model complex temporal data. Extensive experiments on 18 multivariate time series (MTS) benchmark datasets and 3 skeleton-based action recognition datasets demonstrate that the ConvMESN captures multitimescale dynamics and outperforms existing methods.
9. Zhou JT, Zhang H, Jin D, Peng X. Dual Adversarial Transfer for Sequence Labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:434-446. [PMID: 31369370] [DOI: 10.1109/tpami.2019.2931569]
Abstract
We propose a new architecture for sequence labeling, termed Dual Adversarial Transfer Network (DATNet). The proposed DATNet includes two variants, DATNet-F and DATNet-P, which explore effective feature fusion between high- and low-resource data. To address noisy and imbalanced training data, we propose a novel Generalized Resource-Adversarial Discriminator (GRAD) and adopt adversarial training to boost model generalization. We investigate the effects of the different components of DATNet across domains and languages, and show that significant improvements can be obtained, especially for low-resource data. Without any additional hand-crafted features, we achieve state-of-the-art performance on CoNLL, Twitter, PTB-WSJ, OntoNotes, and Universal Dependencies across three popular sequence labeling tasks: named entity recognition (NER), part-of-speech (POS) tagging, and chunking.
10. Li X, Zhang R, Wang Q, Zhang H. Autoencoder Constrained Clustering With Adaptive Neighbors. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:443-449. [PMID: 32217483] [DOI: 10.1109/tnnls.2020.2978389]
Abstract
Conventional subspace clustering methods obtain an explicit data representation that captures the global structure of the data and cluster via the associated subspace. However, owing to their intrinsic linearity and fixed structure, the benefit of such prior structure is limited. To address this problem, in this brief, we embed structured graph learning with adaptive neighbors into deep autoencoder networks, yielding an adaptive deep clustering approach named autoencoder constrained clustering with adaptive neighbors (ACC_AN). The proposed method can not only adaptively investigate the nonlinear structure of the data via a parameter-free graph built on deep features, but also iteratively strengthen the correlations among the deep representations during learning. In addition, the local structure of the raw data is preserved by minimizing the reconstruction error. Compared with state-of-the-art works, ACC_AN is the first deep clustering method to embed adaptive structured graph learning, updating the latent representation of the data and the structured deep graph simultaneously.
11. Liu X, Fu Q, Wang D, Bai X, Wu X, Tao D. Distributed Complementary Binary Quantization for Joint Hash Table Learning. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:5312-5323. [PMID: 32078562] [DOI: 10.1109/tnnls.2020.2965992]
Abstract
Building multiple hash tables is a very successful technique for indexing gigantic datasets, as it can simultaneously guarantee search accuracy and efficiency. However, most existing multitable indexing solutions lack informative hash codes and strong table complementarity, and thus suffer from table redundancy. To address this, we propose a complementary binary quantization (CBQ) method that jointly learns multiple tables and the corresponding informative hash functions in a centralized way. Based on CBQ, we further design a distributed learning algorithm (D-CBQ) to accelerate training over large-scale distributed data sets. The proposed (D-)CBQ exploits prototype-based incomplete binary coding to align the data distributions in the original space and the Hamming space, and further utilizes the nature of multi-index search to jointly reduce the quantization loss. (D-)CBQ possesses several attractive properties, including extensibility for generating long hash codes in the product space and scalability with linear training time. Extensive experiments on two popular large-scale tasks, Euclidean and semantic nearest neighbor search, demonstrate that (D-)CBQ enjoys efficient computation, informative binary quantization, and strong table complementarity, together yielding relative performance gains of up to 57.76% over the state of the art.
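The multi-index search that this abstract leans on can be sketched with plain dictionaries. This is an illustration of the generic multi-table lookup idea only, not of CBQ's learned quantizers; the chunking scheme and names below are assumptions.

```python
import numpy as np

def build_tables(codes, splits):
    """Index binary codes into one hash table per bit chunk. Complementary
    chunks mean an item missed by one table may be caught by another."""
    tables = []
    for lo, hi in splits:
        table = {}
        for idx, code in enumerate(codes):
            table.setdefault(tuple(code[lo:hi]), set()).add(idx)
        tables.append(table)
    return tables

def multi_index_lookup(tables, splits, query):
    """Union of candidates whose chunk matches the query exactly in any table."""
    hits = set()
    for table, (lo, hi) in zip(tables, splits):
        hits |= table.get(tuple(query[lo:hi]), set())
    return hits

codes = np.array([[0, 1, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
splits = [(0, 2), (2, 4)]                       # two complementary 2-bit tables
tables = build_tables(codes, splits)
# The query matches item 0 on both chunks, item 1 on the first, item 2 on the second:
hits = multi_index_lookup(tables, splits, np.array([0, 1, 1, 0]))
```

Candidates retrieved this way are then typically re-ranked by full Hamming distance; CBQ's contribution is learning the codes and tables so that the chunks are informative and non-redundant.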
12. Liu H, Li X, Zhang S, Tian Q. Adaptive Hashing With Sparse Matrix Factorization. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:4318-4329. [PMID: 31899436] [DOI: 10.1109/tnnls.2019.2954856]
Abstract
Hashing offers a desirable and effective solution for efficiently retrieving the nearest neighbors from large-scale data because of its low storage and computation costs. One of the most appealing techniques for hashing learning is matrix factorization. However, most hashing methods focus only on building the mapping relationships between the Euclidean and Hamming spaces and, unfortunately, underestimate the naturally sparse structures of the data. In addition, parameter tuning is always a challenging and head-scratching problem for sparse hashing learning. To address these problems, in this article, we propose a novel hashing method termed adaptively sparse matrix factorization hashing (SMFH), which exploits sparse matrix factorization to explore the parsimonious structures of the data. Moreover, SMFH adopts an orthogonal transformation to minimize the quantization loss while deriving the binary codes. The most distinguished property of SMFH is that it is adaptive and parameter-free, that is, SMFH can automatically generate sparse representations and does not require human involvement to tune the regularization parameters for the sparse models. Empirical studies on four publicly available benchmark data sets show that the proposed method can achieve promising performance and is competitive with a variety of state-of-the-art hashing methods.
13. Qiang H, Wan Y, Liu Z, Xiang L, Meng X. Discriminative deep asymmetric supervised hashing for cross-modal retrieval. Knowledge-Based Systems 2020. [DOI: 10.1016/j.knosys.2020.106188]
14. He S, Wang B, Wang Z, Yang Y, Shen F, Huang Z, Shen HT. Bidirectional Discrete Matrix Factorization Hashing for Image Search. IEEE Transactions on Cybernetics 2020; 50:4157-4168. [PMID: 31603830] [DOI: 10.1109/tcyb.2019.2941284]
Abstract
Unsupervised image hashing has recently gained significant momentum due to the scarcity of reliable supervision, such as class labels and pairwise relationships. Previous unsupervised methods rely heavily on constructing a sufficiently large affinity matrix to explore the geometric structure of the data. Nevertheless, because they do not adequately preserve the intrinsic information of the original visual data, satisfactory performance can hardly be achieved. In this article, we propose bidirectional discrete matrix factorization hashing (BDMFH), which alternates two mutually promoting processes: 1) learning binary codes from data and 2) recovering data from the binary codes. In particular, we design an inverse factorization model that forces the learned binary codes to inherit the intrinsic structure of the original visual data. Moreover, we develop an efficient discrete optimization algorithm for BDMFH. Comprehensive experimental results on three large-scale benchmark datasets show that BDMFH not only significantly outperforms the state of the art but also provides satisfactory computational efficiency.
15. Zhang L, Liu J, Huang F, Yang Y, Zhang D. Deep-Like Hashing-in-Hash for Visual Retrieval: An Embarrassingly Simple Method. IEEE Transactions on Image Processing 2020; PP:8149-8162. [PMID: 32746246] [DOI: 10.1109/tip.2020.3011796]
Abstract
Existing hashing methods have yielded significant performance in image and multimedia retrieval and can be categorized into two groups: shallow hashing and deep hashing. However, both have intrinsic limitations. The former generally adopts a one-step strategy to learn hash codes for discovering discriminative binary features, but does not fully exploit the latent discriminative information in the learned codes. The latter, deep-neural-network-based hashing, can learn highly discriminative and compact features, but relies on large-scale data and computational resources for tuning numerous network parameters via back-propagation; training deep hashing models from scratch on small-scale data is almost impossible. Therefore, to develop an efficient yet effective learning-to-hash algorithm that depends only on small-scale data, we propose a novel non-neural-network, deep-like learning framework: a multi-level cascaded hashing (MCH) approach with a hierarchical learning strategy for image retrieval. The contributions are threefold. First, a hashing-in-hash architecture is designed in MCH that inherits the desirable traits of neural-network-based deep learning, so that discriminative binary features beneficial to image retrieval can be effectively captured. Second, at each level, the binary features of all preceding levels and the visual appearance features are cascaded as inputs of all subsequent levels for retraining, which fully exploits the implied discriminative information. Third, a basic learning-to-hash (BLH) model with a label constraint is proposed for hierarchical learning. Without loss of generality, existing hashing models can easily be integrated into the MCH framework. We show experimentally on small- and large-scale visual retrieval tasks that our method outperforms several state-of-the-art approaches.
16. Zhou JT, Zhang H, Jin D, Peng X, Xiao Y, Cao Z. RoSeq: Robust Sequence Labeling. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2304-2314. [PMID: 31071057] [DOI: 10.1109/tnnls.2019.2911236]
Abstract
In this paper, we investigate two issues in sequence labeling, label imbalance and noisy data, which commonly arise in named entity recognition (NER) and are largely ignored in existing work. To address them, we propose robust sequence labeling (RoSeq). Specifically, to handle label imbalance, we first incorporate label statistics into a novel conditional random field (CRF) loss, and design an additional loss that reduces the weights of overwhelmingly easy tokens to augment the CRF loss. To address noisy training data, we adopt an adversarial training strategy to improve model generalization. In experiments, RoSeq achieves state-of-the-art performance on CoNLL and Twitter NER without using additional data: 88.07% on CoNLL-2002 Dutch, 87.33% on CoNLL-2002 Spanish, 52.94% on WNUT-2016 Twitter, and 43.03% on WNUT-2017 Twitter.
17. Zhou T, Zhang C, Gong C, Bhaskar H, Yang J. Multiview Latent Space Learning With Feature Redundancy Minimization. IEEE Transactions on Cybernetics 2020; 50:1655-1668. [PMID: 30571651] [DOI: 10.1109/tcyb.2018.2883673]
Abstract
Multiview learning has received extensive research interest and demonstrated promising results in recent years. Despite this progress, two significant challenges remain. First, some existing methods directly use the original features to reconstruct data points without considering feature redundancy. Second, existing methods cannot fully exploit the complementary information across multiple views while preserving view-specific properties, which degrades learning performance. To address these issues, we propose a novel multiview latent space learning framework with feature redundancy minimization. We learn a latent space that mitigates feature redundancy and use the learned representation to reconstruct every original data point. More specifically, we first project the original features of the multiple views onto a latent space, and then learn a shared dictionary and view-specific dictionaries to, respectively, exploit the correlations across views and preserve the view-specific properties. Furthermore, the Hilbert-Schmidt independence criterion is adopted as a diversity constraint to explore the complementarity of the multiview representations, further ensuring diversity across views and preserving the local structure of the data in each view. Experimental results on six public datasets demonstrate the effectiveness of our approach against other state-of-the-art methods.
18. Jiang S, Mao H, Ding Z, Fu Y. Deep Decision Tree Transfer Boosting. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:383-395. [PMID: 30932853] [DOI: 10.1109/tnnls.2019.2901273]
Abstract
Instance transfer approaches consider source and target data together during training, borrowing examples from the source domain to augment the training data when labeled target data are limited or absent. Among them, boosting-based transfer learning methods (e.g., TrAdaBoost) are the most widely used. When dealing with more complex data, one may consider more complex hypotheses (e.g., decision trees with deeper layers). However, with hypotheses of fixed, high complexity, TrAdaBoost and its variants may overfit; worse, in the transfer setting, a deep decision tree may overfit the differently distributed data of the source domain. In this paper, we propose a new instance transfer learning method, Deep Decision Tree Transfer Boosting (DTrBoost), whose weights are learned and assigned to base learners by minimizing data-dependent learning bounds across both source and target domains in terms of Rademacher complexities. This guarantees that we can learn decision trees with deep layers without overfitting. Theoretical proofs and experimental results indicate the effectiveness of the proposed method.
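For context on the TrAdaBoost family this abstract builds on, one boosting round's reweighting can be sketched as follows. This follows the classic TrAdaBoost update (not DTrBoost's Rademacher-bound weighting); the function name and the 0/1 error encoding are assumptions of this sketch.

```python
import numpy as np

def tradaboost_weight_update(w_src, w_tgt, err_src, err_tgt, n_rounds):
    """One round of TrAdaBoost-style reweighting: misclassified SOURCE
    examples are down-weighted (they look unlike the target distribution),
    while misclassified TARGET examples are up-weighted as in classic
    boosting. err_* are 0/1 arrays marking the base learner's mistakes."""
    # Weighted error on the target portion, clipped away from 0 and 1/2.
    eps = np.clip((w_tgt * err_tgt).sum() / w_tgt.sum(), 1e-10, 0.499)
    beta_t = eps / (1.0 - eps)                       # target update rate
    beta_s = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_rounds))
    new_src = w_src * beta_s ** err_src              # shrink wrong source weights
    new_tgt = w_tgt * beta_t ** (-err_tgt)           # grow wrong target weights
    return new_src, new_tgt

w_src, w_tgt = np.ones(4), np.ones(4)
err_src = np.array([1, 0, 0, 0])                     # one source mistake
err_tgt = np.array([1, 0, 0, 0])                     # one target mistake
new_src, new_tgt = tradaboost_weight_update(w_src, w_tgt, err_src, err_tgt, n_rounds=10)
```

The asymmetry of the two updates is the essence of instance transfer: source examples the learner keeps getting wrong gradually drop out of the effective training set.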
19. Lan X, Ye M, Zhang S, Zhou H, Yuen PC. Modality-correlation-aware sparse representation for RGB-infrared object tracking. Pattern Recognition Letters 2020. [DOI: 10.1016/j.patrec.2018.10.002]
20. Zhu G, Zhang Z, Wang J, Wu Y, Lu H. Dynamic Collaborative Tracking. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:3035-3046. [PMID: 32175852] [DOI: 10.1109/tnnls.2018.2861838]
Abstract
Correlation filters have demonstrated remarkable success in visual tracking recently. However, most existing methods suffer from model drift caused by factors such as boundary effects, heavy occlusion, fast motion, and distracter perturbation. To address this, this paper proposes a unified dynamic collaborative tracking framework that performs more flexible and robust position prediction. Specifically, the framework learns the object appearance model by jointly training an objective function with three components: a target regression submodule, a distracter suppression submodule, and a maximum-margin relation submodule. The first exploits the circulant structure of training samples to distinguish the target from its surrounding background. The second drives the label response of likely distracting regions toward zero, reducing the peaks of the confidence map in those regions. Inspired by structured-output support vector machines, the third utilizes the differences between target and distracter appearance representations in a discriminative mapping space to alleviate the disturbance of the hardest negative samples. In addition, a CUR filter is embedded as an assistant detector to provide effective object candidates and alleviate model drift. Comprehensive experimental results show that the proposed approach achieves state-of-the-art performance on several public benchmark data sets.
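The circulant-structure trick underlying the target regression submodule is standard correlation-filter machinery and can be shown in 1-D. This is a generic sketch (single channel, no windowing, ridge parameter chosen arbitrarily), not this paper's full tracker.

```python
import numpy as np

def train_filter(x, y, lam=1e-2):
    """Ridge regression over all circular shifts of x, solved in the Fourier
    domain thanks to the circulant structure of the shifted-sample matrix."""
    X, Y = np.fft.fft(x), np.fft.fft(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def respond(H, z):
    """Correlation response of the learned filter on a new signal z."""
    return np.real(np.fft.ifft(H * np.fft.fft(z)))

rng = np.random.default_rng(1)
x = rng.standard_normal(64)           # training signal centered on the target
y = np.zeros(64)
y[0] = 1.0                            # desired response: a peak at the target
H = train_filter(x, y)
resp = respond(H, np.roll(x, 5))      # the target shifted by 5 samples
peak = int(np.argmax(resp))           # the response peak recovers the shift
```

Because all circular shifts are regressed at once in O(n log n), the tracker can afford dense translation search every frame, which is exactly why this family of methods is fast.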
Collapse
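The target regression submodule above builds on the classic circulant trick: ridge regression over all cyclic shifts of a patch has a closed-form solution per frequency in the Fourier domain. A minimal sketch of that underlying mechanism (MOSSE/KCF-style, not this paper's full three-submodule model) follows; `train_filter` and `detect` are illustrative names.

```python
import numpy as np

def train_filter(x, y, lam=1e-2):
    """Closed-form correlation filter in the Fourier domain.

    x: training patch (2-D array), y: desired response (e.g. a Gaussian
    peaked at the target center). Returns the filter in frequency space.
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    # Ridge regression over all cyclic shifts of x, solved per frequency.
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def detect(h_freq, z):
    """Correlate the learned filter with a search patch z; return the
    location of the response peak (the predicted target position)."""
    Z = np.fft.fft2(z)
    response = np.real(np.fft.ifft2(h_freq * Z))
    return np.unravel_index(np.argmax(response), response.shape)
```

Training and detection are both O(n log n) thanks to the FFT, which is why correlation filters became the workhorse of real-time tracking; the distracter suppression submodule described above additionally pushes the desired response toward zero in distracting regions.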
|
21
|
Zhou JT, Fang M, Zhang H, Gong C, Peng X, Cao Z, Goh RSM. Learning With Annotation of Various Degrees. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:2794-2804. [PMID: 30640630 DOI: 10.1109/tnnls.2018.2885854] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
In this paper, we study a new problem in the scenario of sequence labeling. Specifically, we consider training data with annotations of various degrees, namely, fully labeled, unlabeled, and partially labeled sequences. Learning with fully labeled or unlabeled sequences corresponds to the standard supervised or unsupervised setting, while the proposed partial labeling specifies a class that an element does not belong to. Partially labeled data are cheaper to obtain than fully labeled data, though less informative, especially for tasks that require substantial domain knowledge. To address this practical challenge, we propose a novel deep conditional random field (CRF) model that smoothly handles fully labeled, unlabeled, and partially labeled sequences within a unified end-to-end framework. To the best of our knowledge, this is one of the first works to utilize partially labeled instances for sequence labeling, and the proposed algorithm unifies deep learning and CRFs in an end-to-end framework. Extensive experiments show that our method achieves state-of-the-art performance on two sequence labeling tasks across several popular data sets.
Collapse
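Partial labels of the kind described above ("this element is not class c") fit naturally into a CRF: one masks out the ruled-out labels and sums the probability of all remaining consistent label sequences with the forward algorithm. A minimal sketch of that constrained partition computation (generic linear-chain CRF, not the paper's deep architecture) is shown below; `forward_logZ` is an illustrative name.

```python
import numpy as np

def forward_logZ(emissions, transitions, allowed=None):
    """Log-partition of a linear-chain CRF via the forward algorithm.

    emissions:   (T, K) per-position label scores.
    transitions: (K, K) transition scores.
    allowed:     optional (T, K) boolean mask; False marks labels that a
                 partial annotation has ruled out at that position.
    """
    T, K = emissions.shape
    scores = emissions.astype(float).copy()
    if allowed is not None:
        scores = np.where(allowed, scores, -np.inf)
    alpha = scores[0]
    for t in range(1, T):
        # log-sum-exp over the previous label for each current label
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(transitions)) + scores[t]
    m = alpha.max()
    return m + np.log(np.sum(np.exp(alpha - m)))
```

The per-sequence training loss is then `forward_logZ(e, t) - forward_logZ(e, t, allowed=mask)`, i.e. the negative log-probability of the set of sequences consistent with the partial annotation; a fully labeled sequence is the special case where the mask allows exactly one label per position.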
|
22
|
Wei P, Ke Y, Goh CK. Feature Analysis of Marginalized Stacked Denoising Autoenconder for Unsupervised Domain Adaptation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:1321-1334. [PMID: 30281483 DOI: 10.1109/tnnls.2018.2868709] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The marginalized stacked denoising autoencoder (mSDA) has recently emerged with demonstrated effectiveness in domain adaptation. In this paper, we investigate why mSDA benefits domain adaptation tasks from the perspective of adaptive regularization. Our investigation focuses on two types of feature corruption noise: Gaussian noise (mSDA_g) and Bernoulli dropout noise (mSDA_bd). Both theoretical and empirical results demonstrate that mSDA_bd successfully boosts adaptation performance while mSDA_g fails to do so. We then propose a new mSDA with data-dependent multinomial dropout noise (mSDA_md) that overcomes the limitations of mSDA_bd and further improves adaptation performance. mSDA_md is based on a more realistic assumption: different features are correlated and should therefore be corrupted with different probabilities. Experimental results demonstrate the superiority of mSDA_md over mSDA_bd in both adaptation performance and convergence speed. Finally, we propose a deep transferable feature coding (DTFC) framework for unsupervised domain adaptation, motivated by the fact that mSDA fails to consider the distribution discrepancy across domains in the feature learning process. We introduce a new element to mSDA: domain divergence minimization by maximum mean discrepancy. This element is essential for domain adaptation, as it ensures that the extracted deep features have a small distribution discrepancy. The effectiveness of DTFC is verified by extensive experiments on three benchmark data sets for both Bernoulli and multinomial dropout noise.
Collapse
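The "marginalized" part of mSDA means the expectation over Bernoulli dropout corruptions is taken analytically, giving a closed-form layer with no sampled corruptions. A sketch of one such layer under uniform dropout (the standard mSDA_bd construction; the paper's mSDA_md variant would replace the uniform retention vector with data-dependent probabilities) is shown below; `msda_layer` is an illustrative name.

```python
import numpy as np

def msda_layer(X, p):
    """One marginalized denoising layer with Bernoulli dropout noise.

    X: d x n data matrix (features x samples), p: corruption probability.
    Returns the d x d reconstruction mapping W minimizing the *expected*
    denoising loss E||X - W X_corrupted||^2 in closed form.
    """
    d = X.shape[0]
    q = np.full(d, 1.0 - p)                  # per-feature retention probabilities
    S = X @ X.T                              # scatter matrix
    Q = S * np.outer(q, q)                   # E[x~ x~^T], off-diagonal terms
    np.fill_diagonal(Q, q * np.diag(S))      # diagonal scales by q_i, not q_i^2
    P = S * q[np.newaxis, :]                 # E[x x~^T]
    W = P @ np.linalg.inv(Q + 1e-6 * np.eye(d))  # small ridge for stability
    return W
```

Stacking then applies a nonlinearity (e.g. `tanh`) to `W @ X` and feeds the result to the next layer; with `p = 0` the corruption vanishes and `W` reduces to (approximately) the identity.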
|
23
|
Shen F, Zhou X, Yu J, Yang Y, Liu L, Shen HT. Scalable Zero-Shot Learning via Binary Visual-Semantic Embeddings. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 28:3662-3674. [PMID: 30794175 DOI: 10.1109/tip.2019.2899987] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Zero-shot learning aims to classify visual instances from unseen classes in the absence of training examples. This is typically achieved by directly mapping visual features to a semantic embedding space of classes (e.g., attributes or word vectors), where the similarity between the two modalities can be readily measured. However, the semantic space may not be reliable for recognition due to noisy class embeddings or the visual bias problem. In this work, we propose a novel Binary embedding based Zero-Shot Learning (BZSL) method, which recognizes visual instances from unseen classes through an intermediate discriminative Hamming space. Specifically, BZSL jointly learns two binary coding functions to encode both visual instances and class embeddings into the Hamming space, which alleviates the visual-semantic bias problem. As a desirable property, an unseen instance can then be classified efficiently by retrieving its nearest class codes with minimal Hamming distance. During training, by introducing two auxiliary variables for the coding functions, we formulate an equivalent correlation maximization problem that admits an analytical solution. The resulting algorithm thus enjoys both highly efficient training and scalable novel-class inference. Extensive experiments on four benchmark datasets, including the full ImageNet Fall 2011 dataset with over 20K unseen classes, demonstrate the superiority of our method on the zero-shot learning task. In particular, we show that increasing the binary embedding dimension consistently improves recognition accuracy.
Collapse
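The inference step described above, once both coding functions are learned, reduces to nearest-neighbor search in Hamming space: XOR the instance code against each class code and count the differing bits. A minimal sketch (illustrative names; the learned coding functions themselves are out of scope here):

```python
import numpy as np

def hamming_classify(instance_code, class_codes):
    """Assign an instance to the class whose binary code is nearest in
    Hamming distance.

    instance_code: (b,) array of 0/1 bits for one visual instance.
    class_codes:   (C, b) array, one binary code per (unseen) class.
    Returns the index of the predicted class.
    """
    # Hamming distance = number of bit positions where the codes differ.
    dists = np.count_nonzero(class_codes != instance_code, axis=1)
    return int(np.argmin(dists))
```

In production the bits would be packed into machine words so each comparison is a handful of XOR/popcount instructions, which is what makes inference over 20K-class label sets scalable.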
|
24
|
|
25
|
A Reweighted Symmetric Smoothed Function Approximating L0-Norm Regularized Sparse Reconstruction Method. Symmetry (Basel) 2018. [DOI: 10.3390/sym10110583] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Sparse-signal recovery in noisy conditions is a problem that can be solved with current compressive-sensing (CS) technology. Although current algorithms based on L1 regularization can solve this problem, the L1 regularization mechanism cannot promote signal sparsity under noisy conditions, resulting in low recovery accuracy. Motivated by this, we propose a regularized reweighted composite trigonometric smoothed L0-norm minimization (RRCTSL0) algorithm in this paper. The main contributions of this paper are as follows: (1) a new smoothed symmetric composite trigonometric (CT) function is proposed to fit the L0-norm; (2) a new reweighted function is proposed; and (3) a new L0 regularization objective function framework is constructed based on the idea of Tikhonov regularization. In this framework, Contributions (1) and (2) are combined as sparsity regularization terms, with the errors as deviation terms. Furthermore, the conjugate-gradient (CG) method is used to optimize the objective function, so as to achieve accurate recovery of sparse signals and images under noisy conditions. Numerical experiments on both simulated and real data verify that the proposed algorithm is superior to other state-of-the-art algorithms and achieves advanced performance under noisy conditions.
Collapse
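The idea of fitting the L0-norm with a smooth surrogate and annealing its width is the core of the smoothed-L0 family this paper extends. A minimal sketch of the classic SL0 scheme (Gaussian surrogate `1 - exp(-x^2/2sigma^2)` with projection onto `Ax = y`; the paper instead uses a composite trigonometric surrogate plus reweighting and a Tikhonov-style noise term) follows; `sl0` and its parameters are illustrative.

```python
import numpy as np

def sl0(A, y, sigma_min=1e-3, sigma_decay=0.7, inner_iters=5, mu=1.0):
    """Smoothed-L0 sparse recovery for the noiseless model A x = y.

    Approximates ||x||_0 by sum(1 - exp(-x_i^2 / (2 sigma^2))), descends
    that surrogate, projects back onto {x : A x = y}, and anneals sigma.
    """
    A_pinv = np.linalg.pinv(A)
    x = A_pinv @ y                        # minimum-L2-norm initial solution
    sigma = 2.0 * np.max(np.abs(x))
    while sigma > sigma_min:
        for _ in range(inner_iters):
            # (scaled) gradient of the smoothed penalty; shrinks small entries
            delta = x * np.exp(-x ** 2 / (2 * sigma ** 2))
            x = x - mu * delta
            x = x - A_pinv @ (A @ x - y)  # orthogonal projection onto A x = y
        sigma *= sigma_decay
    return x
```

Large `sigma` makes the surrogate smooth and easy to descend; shrinking it makes the surrogate approach the true L0 count, so the iterate is steered from the dense minimum-energy solution toward a sparse one.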
|
26
|
Chen J, Mao H, Zhang H, Yi Z. Symmetric low-rank preserving projections for subspace learning. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.07.031] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
27
|
Detecting Building Edges from High Spatial Resolution Remote Sensing Imagery Using Richer Convolution Features Network. REMOTE SENSING 2018. [DOI: 10.3390/rs10091496] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
As a basic feature of buildings, building edges play an important role in many fields such as urbanization monitoring, city planning, surveying, and mapping. Building edge detection from high spatial resolution remote sensing (HSRRS) imagery has long been a challenging problem. Inspired by the recent success of deep-learning-based edge detection, this paper employs a richer convolutional features (RCF) network to detect building edges. First, a dataset for building edge detection is constructed by the proposed most peripheral constraint conversion algorithm. The RCF network is then retrained on this dataset. Finally, the edge probability map is obtained by the RCF-building model, and a geomorphological concept is introduced to refine the edge probability map according to geometric morphological analysis of the topographic surface. The experimental results suggest that the RCF-building model detects building edges accurately and completely, with an edge detection F-measure at least 5% higher than that of three other typical building extraction methods. In addition, an ablation experiment shows that the most peripheral constraint conversion algorithm generates a superior dataset, and the proposed refinement algorithm achieves a higher F-measure and better visual effect than the non-maximal suppression algorithm.
Collapse
|