1
|
Seyedi SA, Tab FA, Lotfi A, Salahian N, Chavoshinejad J. Elastic Adversarial Deep Nonnegative Matrix Factorization for Matrix Completion. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
|
2
|
Yu Y, Wang W, Shao M, Wu N, Sun Y, Sun Y, Tian Q. Multi-users interaction anomalous subgraph detection for event mining. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.08.072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
3
|
|
4
|
Michalak M. Theoretical backgrounds of Boolean reasoning-based binary n-clustering. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-022-01708-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
5
|
Yan X, Mao Y, Ye Y, Yu H, Wang FY. Explanation guided cross-modal social image clustering. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.01.065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|
6
|
Liu M, Hu H, Li L, Yu Y, Guan W. Chinese Image Caption Generation via Visual Attention and Topic Modeling. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:1247-1257. [PMID: 32568717 DOI: 10.1109/tcyb.2020.2997034] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/11/2023]
Abstract
Automatic image captioning converts the visual content of an image into natural-language text. Involving both computer vision (CV) and natural language processing (NLP), it has become one of the most sophisticated research issues in artificial intelligence. Based on deep neural networks, the neural image caption (NIC) model has achieved remarkable performance in image captioning, yet some essential challenges remain, such as the deviation between the descriptive sentences generated by the model and the intrinsic content expressed by the image, the low accuracy of the image scene description, and the monotony of generated sentences. In addition, most current datasets and methods for image captioning are in English. However, considering the differences between Chinese and English in syntax and semantics, it is necessary to develop specialized Chinese image caption generation methods. To solve these problems, we design the NICVATP2L model via visual attention and topic modeling, in which the visual attention mechanism reduces the deviation and the topic model improves the accuracy and diversity of generated sentences. Specifically, in the encoding phase, a convolutional neural network (CNN) and a topic model are used to extract visual and topic features of the input images, respectively. In the decoding phase, an attention mechanism is applied to the image visual features to obtain visual region features. Finally, the topic features and the visual region features are combined to guide a two-layer long short-term memory (LSTM) network in generating Chinese image captions. To validate our model, we have conducted experiments on the Chinese AIC-ICC image dataset. The experimental results show that our model can automatically generate more informative and descriptive captions in Chinese in a more natural way, and that it outperforms the existing NIC image captioning model.
|
7
|
Liu J, Xiao J, Ma H, Li X, Qi Z, Meng X, Meng L. Prompt Learning with Cross-Modal Feature Alignment for Visual Domain Adaptation. ARTIF INTELL 2022. [DOI: 10.1007/978-3-031-20497-5_34] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
8
|
Abstract
Relational learning generally relies on tools for modeling relational data. An undirected graph can represent these data, with vertices depicting entities and edges describing the relationships between the entities. These relationships can be well represented by multiple undirected graphs over the same set of vertices, with edges arising from different graphs capturing heterogeneous relations. The vertices of those networks are often structured in unknown clusters with varying connectivity properties. These multiple graphs can be structured as a three-way tensor, where each slice of the tensor depicts a graph that is represented by a count data matrix. To extract relevant clusters, we propose an appropriate model-based co-clustering capable of dealing with multiple graphs. The proposed model can be seen as a suitable tensor extension of mixture models of graphs, while the obtained co-clustering can be treated as a consensus clustering of nodes from multiple graphs. Applications on real datasets and comparisons with multi-view clustering and tensor decomposition methods show the interest of our contribution.
|
9
|
Hao W, Pang S, Chen Z. Multi-view spectral clustering via common structure maximization of local and global representations. Neural Netw 2021; 143:595-606. [PMID: 34343774 DOI: 10.1016/j.neunet.2021.07.020] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/09/2020] [Revised: 06/03/2021] [Accepted: 07/16/2021] [Indexed: 11/16/2022]
Abstract
The essential problem of multi-view spectral clustering is to learn a good common representation by effectively utilizing multi-view information. A popular strategy for improving the quality of the common representation is utilizing global and local information jointly. Most existing methods capture local manifold information by graph regularization. However, once local graphs are constructed, they do not change during the whole optimization process. This may lead to a degenerated common representation in the case of existing unreliable graphs. To address this problem, rather than directly using fixed local representations, we propose a dynamic strategy to construct a common local representation. Then, we impose a fusion term to maximize the common structure of the local and global representations so that they can boost each other in a mutually reinforcing manner. With this fusion term, we integrate local and global representation learning in a unified framework and design an alternative iteration based optimization procedure to solve it. Extensive experiments conducted on a number of benchmark datasets support the superiority of our algorithm over several state-of-the-art methods.
Affiliation(s)
- Wenyu Hao: School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- Shanmin Pang: School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- Zhikai Chen: School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
|
10
|
An Efficient Blind Image Deblurring Using a Smoothing Function. APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING 2021. [DOI: 10.1155/2021/6684345] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022] Open
Abstract
This paper introduces an efficient image deblurring method based on convolution and iteration. Our method does not require specific conditions on images, so it can be widely applied to generic images. Kernel estimation is performed first, and the estimated kernel is then used to estimate a latent image in each iteration. The final deblurred image is obtained from the convolution of the blurred image with the final estimated kernel. However, image deblurring is an ill-posed problem due to the nonuniqueness of solutions. Therefore, we propose a smoothing function, unlike previous approaches that applied piecewise functions when estimating a latent image. In our approach, we employ L2-regularization on intensity and gradient prior to converging to a solution of the deblurring problem. Moreover, our work is based on the quadratic splitting method, which guarantees that each subproblem has a closed-form solution. Various experiments on synthesized and real-world images confirm that our approach outperforms several existing methods, especially on images corrupted by noise. Moreover, our method gives more reasonable and more natural deblurred images than other methods.
|
11
|
Gu M, Zhao Z, Jin W, Hong R, Wu F. Graph-Based Multi-Interaction Network for Video Question Answering. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:2758-2770. [PMID: 33476268 DOI: 10.1109/tip.2021.3051756] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/12/2023]
Abstract
Video question answering is an important task combining both Natural Language Processing and Computer Vision, which requires a machine to obtain a thorough understanding of the video. Most existing approaches simply capture spatio-temporal information in videos by using a combination of recurrent and convolutional neural networks. Nonetheless, most previous work focuses on only salient frames or regions, which normally lack some significant details, such as potential location and action relations. In this paper, we propose a new method called Graph-based Multi-interaction Network for video question answering. In our model, a new attention mechanism named multi-interaction is designed to capture both element-wise and segment-wise sequence interactions simultaneously, which can be found between and inside the multi-modal inputs. Moreover, we propose a graph-based relation-aware neural network to explore a more fine-grained visual representation, which could explore the relationships and dependencies between objects spatially and temporally. We evaluate our method on TGIF-QA and two other video QA datasets. The qualitative and quantitative experimental results show the effectiveness of our model, which achieves state-of-the-art performance.
|
12
|
Biswal BS, Patra S, Mohapatra A, Vipsita S. TriRNSC: triclustering of gene expression microarray data using restricted neighbourhood search. IET Syst Biol 2020; 14:323-333. [PMID: 33399096 PMCID: PMC8687346 DOI: 10.1049/iet-syb.2020.0024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/08/2020] [Revised: 04/26/2020] [Accepted: 06/22/2020] [Indexed: 11/20/2022] Open
Abstract
Computational analysis of microarray data is crucial for understanding gene behaviours and deriving meaningful results. Clustering and biclustering of gene expression microarray data in the unsupervised domain are extremely important as their outcomes directly dominate healthcare research in many aspects. However, these approaches fail when the time factor is added as the third dimension to the microarray datasets. This three-dimensional data set can be analysed using triclustering, which discovers similar gene sets that pursue identical behaviour under a subset of conditions at a specific time point. A novel triclustering algorithm (TriRNSC) is proposed in this manuscript to discover meaningful triclusters in gene expression profiles. TriRNSC is based on restricted neighbourhood search clustering (RNSC), a popular graph-based clustering approach, and considers the genes, the experimental conditions and the time points at an instance. The performance of the proposed algorithm is evaluated in terms of volume and some performance measures. Gene Ontology and KEGG pathway analysis are used to validate the TriRNSC results biologically. The efficiency of TriRNSC indicates its capability and reliability and also demonstrates its usability over other state-of-the-art schemes. The proposed framework initiates the application of the RNSC algorithm in the triclustering of gene expression profiles.
Affiliation(s)
- Bhawani Sankar Biswal: DST-FIST Bioinformatics Lab, Department of Computer Science and Engineering, International Institute of Information Technology (IIIT), Bhubaneswar, India.
- Sabyasachi Patra: DST-FIST Bioinformatics Lab, Department of Computer Science and Engineering, International Institute of Information Technology (IIIT), Bhubaneswar, India.
- Anjali Mohapatra: DST-FIST Bioinformatics Lab, Department of Computer Science and Engineering, International Institute of Information Technology (IIIT), Bhubaneswar, India.
- Swati Vipsita: DST-FIST Bioinformatics Lab, Department of Computer Science and Engineering, International Institute of Information Technology (IIIT), Bhubaneswar, India.
|
13
|
|
14
|
He X, Tang J, Du X, Hong R, Ren T, Chua TS. Fast Matrix Factorization With Nonuniform Weights on Missing Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:2791-2804. [PMID: 30676983 DOI: 10.1109/tnnls.2018.2890117] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 06/09/2023]
Abstract
Matrix factorization (MF) has been widely used to discover the low-rank structure and to predict the missing entries of data matrix. In many real-world learning systems, the data matrix can be very high dimensional but sparse. This poses an imbalanced learning problem since the scale of missing entries is usually much larger than that of the observed entries, but they cannot be ignored due to the valuable negative signal. For efficiency concern, existing work typically applies a uniform weight on missing entries to allow a fast learning algorithm. However, this simplification will decrease modeling fidelity, resulting in suboptimal performance for downstream applications. In this paper, we weight the missing data nonuniformly, and more generically, we allow any weighting strategy on the missing data. To address the efficiency challenge, we propose a fast learning method, for which the time complexity is determined by the number of observed entries in the data matrix rather than the matrix size. The key idea is twofold: 1) we apply truncated singular value decomposition on the weight matrix to get a more compact representation of the weights and 2) we learn MF parameters with elementwise alternating least squares (eALS) and memorize the key intermediate variables to avoid repeating computations that are unnecessary. We conduct extensive experiments on two recommendation benchmarks, demonstrating the correctness, efficiency, and effectiveness of our fast eALS method.
|
15
|
Liu Q, Li Z, Tang J. Discriminative supplementary representation learning for novel-category classification. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.03.100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
16
|
Li Z, Tang J, Zhang L, Yang J. Weakly-supervised Semantic Guided Hashing for Social Image Retrieval. Int J Comput Vis 2020. [DOI: 10.1007/s11263-020-01331-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 10/24/2022]
|
17
|
Yang Z, Li Q, Liu W, Lv J. Shared Multi-View Data Representation for Multi-Domain Event Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:1243-1256. [PMID: 30668464 DOI: 10.1109/tpami.2019.2893953] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/09/2023]
Abstract
Internet platforms provide new ways for people to share experiences, generating massive amounts of data related to various real-world concepts. In this paper, we present an event detection framework to discover real-world events from multiple data domains, including online news media and social media. As multi-domain data possess multiple data views that are heterogeneous, initial dictionaries consisting of labeled data samples are exploited to align the multi-view data. Furthermore, a shared multi-view data representation (SMDR) model is devised, which learns underlying and intrinsic structures shared among the data views by considering the structures underlying the data, data variations, and informativeness of dictionaries. SMDR incorporates various constraints in the objective function, including shared representation, low-rank, local invariance, reconstruction error, and dictionary independence constraints. Given the data representations achieved by SMDR, class-wise residual models are designed to discover the events underlying the data based on the reconstruction residuals. Extensive experiments conducted on two real-world event detection datasets, i.e., the Multi-domain and Multi-modality Event Detection dataset and the MediaEval Social Event Detection 2014 dataset, indicate the effectiveness of the proposed approaches.
|
18
|
Hao M, Cao WH, Liu ZT, Wu M, Xiao P. Visual-audio emotion recognition based on multi-task and ensemble learning with multiple features. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.048] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 11/17/2022]
|
19
|
Wang X, Gao S. Image encryption algorithm for synchronously updating Boolean networks based on matrix semi-tensor product theory. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.08.041] [Citation(s) in RCA: 232] [Impact Index Per Article: 58.0] [Indexed: 11/30/2022]
|
20
|
Li Z, Tang J, Mei T. Deep Collaborative Embedding for Social Image Understanding. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019; 41:2070-2083. [PMID: 29994391 DOI: 10.1109/tpami.2018.2852750] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Indexed: 06/08/2023]
Abstract
In this work, we investigate the problem of learning knowledge from the massive community-contributed images with rich weakly-supervised context information, which can benefit multiple image understanding tasks simultaneously, such as social image tag refinement and assignment, content-based image retrieval, tag-based image retrieval and tag expansion. Towards this end, we propose a Deep Collaborative Embedding (DCE) model to uncover a unified latent space for images and tags. The proposed method incorporates the end-to-end learning and collaborative factor analysis in one unified framework for the optimal compatibility of representation learning and latent space discovery. A nonnegative and discrete refined tagging matrix is learned to guide the end-to-end learning. To collaboratively explore the rich context information of social images, the proposed method integrates the weakly-supervised image-tag correlation, image correlation and tag correlation simultaneously and seamlessly. The proposed model is also extended to embed new tags in the uncovered space. To verify the effectiveness of the proposed method, extensive experiments are conducted on two widely-used social image benchmarks for multiple social image understanding tasks. The encouraging performance of the proposed method over the state-of-the-art approaches demonstrates its superiority.
|
21
|
Tang J, Shu X, Li Z, Jiang YG, Tian Q. Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019; 41:2027-2034. [PMID: 30908192 DOI: 10.1109/tpami.2019.2906603] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/09/2023]
Abstract
Image retagging aims to improve the tag quality of social images by completing the missing tags, rectifying the noise-corrupted tags, and assigning new high-quality tags. Recent approaches simultaneously explore visual, user and tag information to improve the performance of image retagging by mining the tag-image-user associations. However, such methods will become computationally infeasible with the rapidly increasing number of images, tags and users. It has been proven that the anchor graph can significantly accelerate large-scale graph-based learning by exploring only a small number of anchor points. Inspired by this, we propose a novel Social anchor-Unit GrAph Regularized Tensor Completion (SUGAR-TC) method to efficiently refine the tags of social images, which is insensitive to the scale of data. First, we construct an anchor-unit graph across multiple domains (e.g., image and user domains) rather than traditional anchor graph in a single domain. Second, a tensor completion based on Social anchor-Unit GrAph Regularization (SUGAR) is implemented to refine the tags of the anchor images. Finally, we efficiently assign tags to non-anchor images by leveraging the relationship between the non-anchor units and the anchor units. Experimental results on a real-world social image database well demonstrate the effectiveness and efficiency of SUGAR-TC, outperforming the state-of-the-art methods.
|
22
|
|
23
|
Zhang J, Wang Z, Mu Y, Wang Z. Image region label refinement using spatial position relation graph. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2018.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
24
|
|
25
|
Pan J, Sun D, Pfister H, Yang MH. Deblurring Images via Dark Channel Prior. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2018; 40:2315-2328. [PMID: 28952935 DOI: 10.1109/tpami.2017.2753804] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 06/07/2023]
Abstract
We present an effective blind image deblurring algorithm based on the dark channel prior. The motivation of this work is an interesting observation that the dark channel of blurred images is less sparse. While most patches in a clean image contain some dark pixels, this is not the case when they are averaged with neighboring ones by motion blur. This change in sparsity of the dark channel pixels is an inherent property of the motion blur process, which we prove mathematically and validate using image data. Enforcing sparsity of the dark channel thus helps blind deblurring in various scenarios such as natural, face, text, and low-illumination images. However, imposing sparsity of the dark channel introduces a non-convex non-linear optimization problem. In this work, we introduce a linear approximation to address this issue. Extensive experiments demonstrate that the proposed deblurring algorithm achieves the state-of-the-art results on natural images and performs favorably against methods designed for specific scenarios. In addition, we show that the proposed method can be applied to image dehazing.
|
26
|
Li Z, Zhang J, Zhang K, Li Z. Visual Tracking With Weighted Adaptive Local Sparse Appearance Model via Spatio-Temporal Context Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:4478-4489. [PMID: 29897873 DOI: 10.1109/tip.2018.2839916] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Sparse representation has been widely exploited to develop effective appearance models for object tracking due to its strong discriminative capability in distinguishing the target from its surrounding background. However, most of these methods only consider either the holistic representation or the local one for each patch with equal importance, and hence may fail when the target suffers from severe occlusion or large-scale pose variation. In this paper, we propose a simple yet effective approach that exploits rich feature information from reliable patches based on a weighted local sparse representation that takes into account the importance of each patch. Specifically, we design a weight function based on the reconstruction error of each patch, obtained via sparse coding, to measure the patch reliability. Moreover, we explore spatio-temporal context information to enhance the robustness of the appearance model, in which the global temporal context is learned via incremental subspace and sparse representation learning with a novel dynamic template update strategy to update the dictionary, while the local spatial context considers the correlation between the target and its surrounding background via measuring the similarity among their sparse coefficients. Extensive experimental evaluations on two large tracking benchmarks demonstrate favorable performance of the proposed method over some state-of-the-art trackers.
|
27
|
Li Z, Tang J, He X. Robust Structured Nonnegative Matrix Factorization for Image Representation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:1947-1960. [PMID: 28436905 DOI: 10.1109/tnnls.2017.2691725] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Indexed: 06/07/2023]
Abstract
Dimensionality reduction has attracted increasing attention, because high-dimensional data have arisen naturally in numerous domains in recent years. As one popular dimensionality reduction method, nonnegative matrix factorization (NMF), whose goal is to learn parts-based representations, has been widely studied and applied to various applications. In contrast to the previous approaches, this paper proposes a novel semisupervised NMF learning framework, called robust structured NMF, that learns a robust discriminative representation by leveraging the block-diagonal structure and the $\ell_{2,p}$-norm (especially when $0 < p \le 1$) loss function. Specifically, the problems of noise and outliers are well addressed by the $\ell_{2,p}$-norm ($0 < p \le 1$) loss function, while the discriminative representations of both the labeled and unlabeled data are simultaneously learned by explicitly exploring the block-diagonal structure. The proposed problem is formulated as an optimization problem with a well-defined objective function solved by the proposed iterative algorithm. The convergence of the proposed optimization algorithm is analyzed both theoretically and empirically. In addition, we discuss the relationships between the proposed method and some previous methods. Extensive experiments on both the synthetic and real-world data sets are conducted, and the experimental results demonstrate the effectiveness of the proposed method in comparison to the state-of-the-art methods.
|
28
|
A dual-kernel spectral-spatial classification approach for hyperspectral images based on Mahalanobis distance metric learning. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2017.11.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 10/18/2022]
|
29
|
|
30
|
Huang TS. LEGO-MM: LEarning Structured Model by Probabilistic loGic Ontology Tree for MultiMedia. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2017; 26:196-207. [PMID: 28113970 DOI: 10.1109/tip.2016.2612825] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Recent advances in multimedia ontology have resulted in a number of concept models, e.g., large-scale concept for multimedia and Mediamill 101, which are accessible and public to other researchers. However, most current research effort still focuses on building new concepts from scratch, and very little work explores appropriate methods to construct new concepts upon the existing models already in the warehouse. To address this issue, we propose a new framework in this paper, termed LEarning Structured Model by Probabilistic loGic Ontology Tree for MultiMedia (LEGO-MM), which can seamlessly integrate both the new target training examples and the existing primitive concept models to infer the more complex concept models. LEGO-MM treats the primitive concept models as the lego toy to potentially construct an unlimited vocabulary of new concepts. Specifically, we first formulate the logic operations to be the lego connectors to combine the existing concept models hierarchically in probabilistic logic ontology trees. Then, we incorporate new target training information simultaneously to efficiently disambiguate the underlying logic tree and correct the error propagation. Extensive experiments are conducted on a large vehicle domain data set from ImageNet. The results demonstrate that LEGO-MM has significantly superior performance over the existing state-of-the-art methods, which build new concept models from scratch.
|