1. Li R, Wang Y, Wang L, Lu H, Wei X, Zhang Q. From Pixels to Semantics: Self-Supervised Video Object Segmentation With Multiperspective Feature Mining. IEEE Transactions on Image Processing 2022; 31:5801-5812. [PMID: 36054396] [DOI: 10.1109/tip.2022.3201603]
Abstract
Existing self-supervised methods pose one-shot video object segmentation (O-VOS) as pixel-level matching to enable segmentation mask propagation across frames. However, the two tasks are not fully equivalent, since O-VOS relies more on semantic correspondence than on accurate pixel matching. To remedy this issue, we explore a new self-supervised framework that integrates pixel-level correspondence learning with semantic-level adaptation. The pixel-level correspondence learning is performed through photometric reconstruction of adjacent RGB frames during offline training, while the semantic-level adaptation operates at test time by enforcing a bi-directional agreement of the predicted segmentation masks. In addition, we propose a new network architecture with a multi-perspective feature mining mechanism that not only enhances reliable features but also suppresses noisy ones, facilitating more robust image matching. By training the network with the proposed self-supervised framework, we achieve state-of-the-art performance on widely adopted datasets, further closing the gap between self-supervised learning methods and their fully supervised counterparts.
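To make the offline training signal concrete, here is a minimal PyTorch sketch (assumed shapes, names, and temperature, not the authors' code) of photometric-reconstruction correspondence learning: each pixel of the target frame is expressed as an affinity-weighted copy of the reference frame's colors, and the L1 reconstruction error supervises the embedding.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(feat_ref, feat_tgt, rgb_ref, rgb_tgt, temperature=0.07):
    """feat_*: (B, C, H, W) embeddings; rgb_*: (B, 3, H, W) adjacent frames."""
    f_ref = F.normalize(feat_ref.flatten(2), dim=1)       # (B, C, HW)
    f_tgt = F.normalize(feat_tgt.flatten(2), dim=1)
    # Affinity from every target pixel to every reference pixel.
    affinity = torch.bmm(f_tgt.transpose(1, 2), f_ref)    # (B, HW, HW)
    weights = F.softmax(affinity / temperature, dim=2)
    # Reconstruct target colors as an affinity-weighted copy of reference colors.
    colors_ref = rgb_ref.flatten(2).transpose(1, 2)       # (B, HW, 3)
    recon = torch.bmm(weights, colors_ref)
    return F.l1_loss(recon, rgb_tgt.flatten(2).transpose(1, 2))
```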
2. Zhang N, Han J, Liu N. Learning Implicit Class Knowledge for RGB-D Co-Salient Object Detection With Transformers. IEEE Transactions on Image Processing 2022; 31:4556-4570. [PMID: 35763477] [DOI: 10.1109/tip.2022.3185550]
Abstract
RGB-D co-salient object detection aims to segment co-occurring salient objects given a group of relevant images and depth maps. Previous methods often adopt separate pipelines and hand-crafted features, which makes it hard to capture the patterns of co-occurring salient objects and leads to unsatisfactory results. Using end-to-end CNN models is a straightforward idea, but they are less effective in exploiting global cues due to their intrinsic limitations. Thus, in this paper, we propose an end-to-end transformer-based model, denoted CTNet, which uses class tokens to explicitly capture implicit class knowledge for RGB-D co-salient object detection. Specifically, we first design adaptive class tokens for individual images to explore intra-saliency cues, and then develop common class tokens for the whole group to explore inter-saliency cues. Besides, we leverage the complementary cues between RGB images and depth maps to promote the learning of these two types of class tokens. In addition, to facilitate model evaluation, we construct a challenging, large-scale benchmark dataset, named RGBD CoSal1k, which collects 106 groups containing 1000 pairs of RGB-D images with complex scenarios and diverse appearances. Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method.
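For readers unfamiliar with the mechanism, the sketch below shows the generic class-token pattern this line of work builds on: a learnable token is prepended to the patch tokens and, through self-attention, aggregates image-wide cues. CTNet's adaptive/common tokens and RGB-D fusion are beyond this illustrative PyTorch fragment, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ClassTokenEncoder(nn.Module):
    """Prepends a learnable class token that aggregates patch-level cues."""
    def __init__(self, dim=256, heads=4, depth=2):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, patch_tokens):                    # (B, N, dim)
        B = patch_tokens.size(0)
        tokens = torch.cat([self.cls.expand(B, -1, -1), patch_tokens], dim=1)
        out = self.encoder(tokens)
        return out[:, 0], out[:, 1:]                    # class token, patch tokens
```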
3. Li C, Xuan S, Liu F, Chang E, Wu H. Global attention network for collaborative saliency detection. International Journal of Machine Learning and Cybernetics 2022. [DOI: 10.1007/s13042-022-01531-9]
4. Predicting atypical visual saliency for autism spectrum disorder via scale-adaptive inception module and discriminative region enhancement loss. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.06.125]
5. Chen J, Chen Y, Li W, Ning G, Tong M, Hilton A. Channel and spatial attention based deep object co-segmentation. Knowledge-Based Systems 2021. [DOI: 10.1016/j.knosys.2020.106550]
6. Wang F, Xu Z, Gan Y, Vong CM, Liu Q. SCNet: Scale-aware coupling-structure network for efficient video object detection. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.110]
7. Wang W, Shen J, Dong X, Borji A, Yang R. Inferring Salient Objects from Human Fixations. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020; 42:1913-1927. [PMID: 30892201] [DOI: 10.1109/tpami.2019.2905607]
Abstract
Previous research in visual saliency has focused on two major types of models, namely fixation prediction and salient object detection. The relationship between the two, however, has been less explored. In this work, we propose to employ the former model type to identify salient objects. We build a novel Attentive Saliency Network (ASNet, available at https://github.com/wenguanwang/ASNet) that learns to detect salient objects from fixations. The fixation map, derived at the upper network layers, mimics human visual attention mechanisms and captures a high-level understanding of the scene from a global view. Salient object detection is then viewed as fine-grained object-level saliency segmentation and is progressively optimized with the guidance of the fixation map in a top-down manner. ASNet is based on a hierarchy of convLSTMs that offers an efficient recurrent mechanism to sequentially refine the saliency features over multiple steps. Several loss functions, derived from existing saliency evaluation metrics, are incorporated to further boost performance. Extensive experiments on several challenging datasets show that ASNet outperforms existing methods and is capable of generating accurate segmentation maps with the help of the computed fixation prior. Our work offers deeper insight into the mechanisms of attention and narrows the gap between salient object detection and fixation prediction.
10. Xiao Y, Jiang B, Zheng A, Zhou A, Hussain A, Tang J. Saliency detection via multi-view graph based saliency optimization. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.03.066]
11. Wang J, Sheng B, Li P, Jin Y, Feng DD. Illumination-Guided Video Composition via Gradient Consistency Optimization. IEEE Transactions on Image Processing 2019; 28:5077-5090. [PMID: 31107653] [DOI: 10.1109/tip.2019.2916769]
Abstract
Video composition aims at cloning a patch from a source video into a target scene to create a seamless and harmonious blended frame sequence. Previous work in video composition usually suffers from artifacts around the blending region and from spatial-temporal inconsistency when illumination intensity varies between the source and target videos. We propose an illumination-guided video composition method based on a unified spatial and temporal optimization framework. Our method produces globally consistent composition results and maintains temporal coherency. We first compute a spatial-temporal blending boundary iteratively. For each frame, the gradient fields of the target and source frames are mixed adaptively based on gradients and inter-frame color differences. Temporal consistency is further obtained by optimizing luminance gradients throughout all composition frames. Moreover, we extend mean-value cloning by smoothing discrepancies between the source and target frames, and then eliminate color distribution overflow exponentially to reduce falsely blended pixels. Various experiments have shown the effectiveness and high-quality performance of our illumination-guided composition.
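The single-frame building block behind such methods is gradient-domain (Poisson) cloning, which the following minimal scipy sketch illustrates on a rectangular region; the paper's adaptive gradient mixing, temporal terms, and mean-value extension are omitted, and a direct sparse solve is used for clarity.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def poisson_blend(src, tgt):
    """src, tgt: (H, W) float images; blends src's interior into tgt."""
    H, W = src.shape
    inner = np.zeros((H, W), bool)
    inner[1:-1, 1:-1] = True
    idx = -np.ones((H, W), int)
    idx[inner] = np.arange(inner.sum())
    A = lil_matrix((inner.sum(), inner.sum()))
    b = np.zeros(inner.sum())
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            k = idx[y, x]
            A[k, k] = 4
            # Guidance: the discrete Laplacian of the source patch.
            b[k] = 4 * src[y, x] - src[y-1, x] - src[y+1, x] - src[y, x-1] - src[y, x+1]
            for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                if inner[ny, nx]:
                    A[k, idx[ny, nx]] = -1
                else:
                    b[k] += tgt[ny, nx]      # Dirichlet boundary from the target
    out = tgt.copy()
    out[inner] = spsolve(A.tocsc(), b)
    return out
```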
12. Tao Z, Liu H, Fu H, Fu Y. Multi-View Saliency-Guided Clustering for Image Cosegmentation. IEEE Transactions on Image Processing 2019; 28:4634-4645. [PMID: 31071036] [DOI: 10.1109/tip.2019.2913555]
Abstract
Image cosegmentation aims at extracting the common objects from multiple images simultaneously. Existing methods mainly solve cosegmentation via a pre-defined graph, which lacks the flexibility and robustness to handle various visual patterns. Besides, similar backgrounds also confuse the identification of the common foreground. To address these issues, we propose a novel Multi-view Saliency-Guided Clustering algorithm (MvSGC) for the image cosegmentation task. In our model, an unsupervised saliency prior is used as partition-level side information to guide the foreground clustering process. To achieve robustness to noise and missing observations, similarities at both the instance level and the partition level are considered. Specifically, a unified clustering model with cosine similarity is proposed to capture the intrinsic structure of the data and keep the partition result consistent with the side information. Moreover, we leverage multi-view weight learning to integrate multiple feature representations and further improve the robustness of our approach. A K-means-like optimization algorithm, with theoretical support, is developed to perform the constrained clustering efficiently. Experimental results on three benchmark datasets (the iCoseg, MSRC, and Internet image datasets) and one RGB-D image dataset demonstrate the superiority of applying our clustering method to image cosegmentation.
13. Liu Y, Shen J, Wang W, Sun H, Shao L. Better Dense Trajectories by Motion in Videos. IEEE Transactions on Cybernetics 2019; 49:159-170. [PMID: 29990074] [DOI: 10.1109/tcyb.2017.2769097]
Abstract
Currently, the most widely used point trajectory generation methods estimate trajectories from dense optical flow, using a consistency-check strategy to detect occluded regions. However, these methods miss some important trajectories, breaking up smooth areas without structure, especially around motion boundaries (MBs). We suggest exploiting MBs in video to generate more accurate dense point trajectories. Estimating MBs from the video improves trajectory accuracy in discontinuous or occluded areas. We then obtain trajectories by tracking the initial feature points through all frames. The experimental results demonstrate that our method outperforms the state-of-the-art methods on a challenging benchmark.
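The forward-backward consistency check mentioned here is easy to state concretely. The following minimal numpy sketch (assuming dense forward and backward flows are given as (H, W, 2) displacement arrays, and using the standard thresholds of Sundaram et al.) flags pixels where the check fails, which is where trajectories are usually terminated.

```python
import numpy as np

def occlusion_mask(flow_fwd, flow_bwd, alpha=0.01, beta=0.5):
    """flow_*: (H, W, 2) displacement fields, channel order (dx, dy)."""
    H, W, _ = flow_fwd.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Follow the forward flow, then sample the backward flow there.
    x2 = np.clip(np.rint(xs + flow_fwd[..., 0]), 0, W - 1).astype(int)
    y2 = np.clip(np.rint(ys + flow_fwd[..., 1]), 0, H - 1).astype(int)
    bwd = flow_bwd[y2, x2]
    # A consistent (non-occluded) pixel returns close to its start point.
    err = np.sum((flow_fwd + bwd) ** 2, axis=-1)
    bound = alpha * (np.sum(flow_fwd ** 2, -1) + np.sum(bwd ** 2, -1)) + beta
    return err > bound        # True where the check fails (likely occluded)
```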
15. Kamranian Z, Naghsh Nilchi AR, Monadjemi A, Navab N. Iterative algorithm for interactive co-segmentation using semantic information propagation. Applied Intelligence 2018. [DOI: 10.1007/s10489-018-1221-3]
16. Guo F, Wang W, Shen J, Shao L, Yang J, Tao D, Tang YY. Video Saliency Detection Using Object Proposals. IEEE Transactions on Cybernetics 2018; 48:3159-3170. [PMID: 29990032] [DOI: 10.1109/tcyb.2017.2761361]
Abstract
In this paper, we introduce a novel approach to identify salient object regions in videos via object proposals. The core idea is to solve the saliency detection problem by ranking and selecting salient proposals based on object-level saliency cues. Object proposals offer a more complete and high-level representation, which naturally caters to the needs of salient object detection. As well as introducing this novel solution for video salient object detection, we reorganize various discriminative saliency cues and traditional saliency assumptions on object proposals. Given object candidates, a proposal ranking and voting scheme, based on various object-level saliency cues, is designed to screen out nonsalient parts, select salient object regions, and infer an initial saliency estimate. A saliency optimization process that considers temporal consistency and appearance differences between salient and nonsalient regions is then used to refine the initial saliency estimates. Our experiments on public datasets (SegTrackV2, the Freiburg-Berkeley Motion Segmentation dataset, and Densely Annotated Video Segmentation) validate the effectiveness of the approach, which produces significant improvements over state-of-the-art algorithms.
18. Qiu W, Gao X, Han B. Eye Fixation Assisted Video Saliency Detection via Total Variation-based Pairwise Interaction. IEEE Transactions on Image Processing 2018; 27:4724-4739. [PMID: 29993549] [DOI: 10.1109/tip.2018.2843680]
Abstract
As human visual attention is naturally biased towards foreground objects in a scene, it can be used to extract salient objects in video clips. In this work, we propose a weakly supervised video saliency detection algorithm utilizing eye fixation information from multiple subjects. Our main idea is to extend eye fixations to saliency regions step by step. First, visual seeds are collected using multiple color-space geodesic-distance-based seed region mapping with filtered and extended eye fixations. This operation helps raw fixation points spread to the most likely salient regions, namely visual seed regions. Second, in order to seize the essential scene structure from video sequences, we introduce a total variation-based pairwise interaction model to learn the potential pairwise relationship between foreground and background within a frame or across video frames. In this way, visual seed regions eventually grow into salient regions. Compared with previous approaches, the generated saliency maps have two outstanding properties, integrity and purity, which are conducive to segmenting the foreground and beneficial to follow-up tasks. Extensive quantitative and qualitative experiments on various video sequences demonstrate that the proposed method outperforms state-of-the-art image and video saliency detection algorithms.
19. Sá Junior JJDM, Backes AR, Bruno OM. Randomized neural network based descriptors for shape classification. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.05.099]
20. Ma B, Hu H, Shen J, Zhang Y, Shao L, Porikli F. Robust Object Tracking by Nonlinear Learning. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:4769-4781. [PMID: 29990266] [DOI: 10.1109/tnnls.2017.2776124]
Abstract
We propose a method that obtains a discriminative visual dictionary and a nonlinear classifier for visual tracking in a sparse-coding manner, based on globally linear approximation from nonlinear learning theory. Traditional discriminative tracking methods based on sparse representation learn a dictionary in an unsupervised way and then train a classifier; by treating dictionary learning and classifier learning separately, they may not produce models that are both descriptive and discriminative for the target. In contrast, the proposed tracking approach constructs a dictionary that fully reflects the intrinsic manifold structure of the visual data and introduces more discriminative ability within a unified learning framework. Finally, an iterative optimization approach is introduced that computes the optimal dictionary, the associated sparse coding, and a classifier. Experiments on two benchmarks show that our tracker achieves better performance than several popular tracking algorithms.
22. Ren Y, Jiao L, Yang S, Wang S. Mutual Learning Between Saliency and Similarity: Image Cosegmentation via Tree Structured Sparsity and Tree Graph Matching. IEEE Transactions on Image Processing 2018; 27:4690-4704. [PMID: 29993547] [DOI: 10.1109/tip.2018.2842207]
Abstract
This paper proposes a unified mutual learning framework based on image hierarchies, which integrates structured sparsity with tree-graph matching to tackle weakly supervised image cosegmentation. We focus on the interaction between two common-object properties, saliency and similarity; most existing cosegmentation methods emphasize only one of them. The proposed method learns the prior knowledge for structured sparsity with the help of tree-graph matching, which is capable of generating object-oriented salient regions. Meanwhile, it also reduces the search space and computational complexity of tree-graph matching with the aid of structured sparsity. In this way, we exploit the hierarchically geometrical relationships of coherent objects. Experimental results on benchmark datasets, compared with the state of the art, confirm that the mutual learning framework is capable of effectively delineating co-existing object patterns in multiple images.
23. Liu C, Wang W, Shen J, Shao L. Stereo Video Object Segmentation Using Stereoscopic Foreground Trajectories. IEEE Transactions on Cybernetics 2018; 49:3665-3676. [PMID: 29994416] [DOI: 10.1109/tcyb.2018.2846361]
Abstract
We present an unsupervised segmentation framework for stereo videos using stereoscopic trajectories. The proposed stereo trajectory shows favorable properties for modeling long-term motion information through the whole sequence and explicitly capturing the corresponding relationships between the two stereo views. The stereo prior is important for inferring the desired object and guarantees consistent spatial-temporal segmentation, which contributes to an enjoyable stereo experience. We start by deriving stereo trajectories from the left and right views simultaneously, represented via a graph structure. We then detect object-like stereo trajectories on this graph to efficiently infer the desired object. Finally, an energy optimization function is proposed to produce the stereo segmentation results by leveraging the object information from stereo trajectories. To benefit future research, we collected a new stereoscopic video benchmark, which consists of 50 stereo video clips and includes many segmentation challenges. Extensive experimental results demonstrate that our stereo segmentation method achieves higher performance and preserves better stereo structures compared with prevailing competitors. The source code and results are available at: https://github.com/shenjianbing/StereoSeg.
26. Li X, Zhao B, Lu X. Key Frame Extraction in the Summary Space. IEEE Transactions on Cybernetics 2018; 48:1923-1934. [PMID: 28693004] [DOI: 10.1109/tcyb.2017.2718579]
Abstract
Key frame extraction is an efficient way to create a video summary, which helps users obtain a quick comprehension of the video content. Generally, the key frames should be representative of the video content and, at the same time, diverse enough to reduce redundancy. Based on the assumption that the video data lie near a subspace of a high-dimensional space, a new approach, named key frame extraction in the summary space, is proposed in this paper. The proposed approach aims to find the representative frames of the video and filter out similar frames from the representative frame set. First, the video data are mapped to a high-dimensional space, named the summary space. Then, a new representation is learned for each frame by analyzing the intrinsic structure of the summary space. Specifically, the learned representation reflects the representativeness of the frame and is utilized to select representative frames. Next, a perceptual hash algorithm is employed to measure the similarity of representative frames. As a result, the key frame set is obtained after filtering out similar frames from the representative frame set. Finally, the video summary is constructed by arranging the key frames in temporal order. Additionally, the ground truth, created by filtering out similar frames from human-created summaries, is utilized to evaluate the quality of the video summary. Compared with several traditional approaches, experimental results on 80 videos from two datasets indicate the superior performance of our approach.
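The redundancy-filtering step based on a perceptual hash can be sketched compactly. The numpy fragment below uses an average hash with an assumed Hamming threshold (not necessarily the paper's exact hash or pipeline) to drop representative frames that hash too close to an already accepted key frame.

```python
import numpy as np

def average_hash(gray, size=8):
    """gray: 2-D float array; returns a size*size boolean hash vector."""
    h, w = gray.shape
    gray = gray[:h - h % size, :w - w % size]        # crop to a multiple of size
    bh, bw = gray.shape[0] // size, gray.shape[1] // size
    small = gray.reshape(size, bh, size, bw).mean(axis=(1, 3))   # block means
    return (small > small.mean()).ravel()            # 1 where above mean brightness

def filter_key_frames(frames, max_hamming=10):
    """Keep a frame only if its hash is far from every accepted key frame."""
    keys, hashes = [], []
    for idx, frame in enumerate(frames):
        h = average_hash(frame)
        if all(np.count_nonzero(h != k) > max_hamming for k in hashes):
            keys.append(idx)
            hashes.append(h)
    return keys
```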
27. Hu H, Ma B, Shen J, Shao L. Manifold Regularized Correlation Object Tracking. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1786-1795. [PMID: 28422697] [DOI: 10.1109/tnnls.2017.2688448]
Abstract
In this paper, we propose a manifold regularized correlation tracking method with augmented samples. To make better use of the unlabeled data and the manifold structure of the sample space, a manifold regularization-based correlation filter is introduced, which aims to assign similar labels to neighbor samples. Meanwhile, the regression model is learned by exploiting the block-circulant structure of matrices resulting from the augmented translated samples over multiple base samples cropped from both target and nontarget regions. Thus, the final classifier in our method is trained with positive, negative, and unlabeled base samples, which is a semisupervised learning framework. A block optimization strategy is further introduced to learn a manifold regularization-based correlation filter for efficient online tracking. Experiments on two public tracking data sets demonstrate the superior performance of our tracker compared with the state-of-the-art tracking approaches.
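As background for this abstract, a plain correlation filter (in the spirit of MOSSE, without the paper's manifold regularization or augmented unlabeled samples) can be trained and applied in a few lines of numpy; the Gaussian target response and the regularization constant below are assumptions.

```python
import numpy as np

def train_correlation_filter(patch, sigma=2.0, lam=1e-2):
    """patch: (H, W) grayscale template; returns the filter in the Fourier domain."""
    H, W = patch.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Desired response: a Gaussian peaked at the patch center.
    g = np.exp(-((ys - H // 2) ** 2 + (xs - W // 2) ** 2) / (2 * sigma ** 2))
    F_x, F_g = np.fft.fft2(patch), np.fft.fft2(g)
    # Ridge-regression solution in the frequency domain (cf. MOSSE).
    return (F_g * np.conj(F_x)) / (F_x * np.conj(F_x) + lam)

def respond(filt, patch):
    """Correlation response map; its peak locates the target shift."""
    return np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))
```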
28. Yang Z, Lian J, Li S, Guo Y, Qi Y, Ma Y. Heterogeneous SPCNN and its application in image segmentation. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.01.044]
29. Li X, Liu L, Lu X. Person Reidentification Based on Elastic Projections. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1314-1327. [PMID: 28422688] [DOI: 10.1109/tnnls.2016.2602855]
Abstract
Person reidentification usually refers to matching people across different camera views in nonoverlapping multicamera networks. Many existing methods learn a similarity measure by projecting the raw features to a latent subspace so that the same target's distance is smaller than different targets' distances. However, the same targets captured in different camera views should hold the same intrinsic attributes, while different targets should hold different intrinsic attributes; projecting all the data to the same subspace loses this information and yields comparably poor discriminability. To address this problem, this paper proposes a method based on elastic projections that learns a pairwise similarity measure for person reidentification. The proposed model learns two projections, a positive projection and a negative projection, which are both representative and discriminative. Representability means that, for the same targets captured in two camera views, the positive projection can bridge the corresponding appearance variation and represent the intrinsic attributes of the same targets, while for different targets the negative projection can explore and utilize their differing attributes. Discriminability means that the intraclass distance becomes smaller than its original distance after projection, while the interclass distance becomes larger, which is the elastic property of the proposed model. In this way, prior information from the original data space guides the learning phase; more importantly, confusion between similar (but not identical) targets is effectively reduced by forcing the same targets to become more similar and different targets to become more distinct. The proposed model is evaluated on three benchmark datasets, VIPeR, GRID, and CUHK, and achieves better performance than other methods.
30. Yuen PC, Chellappa R. Learning Common and Feature-Specific Patterns: A Novel Multiple-Sparse-Representation-Based Tracker. IEEE Transactions on Image Processing 2018; 27:2022-2037. [PMID: 29989985] [DOI: 10.1109/tip.2017.2777183]
Abstract
The use of multiple features has been shown to be an effective strategy for visual tracking because of their complementary contributions to appearance modeling. The key problem is how to learn a fused representation from multiple features for appearance modeling. Different features extracted from the same object should share some commonalities in their representations while each feature should also have some feature-specific representation patterns which reflect its complementarity in appearance modeling. Different from existing multi-feature sparse trackers which only consider the commonalities among the sparsity patterns of multiple features, this paper proposes a novel multiple sparse representation framework for visual tracking which jointly exploits the shared and feature-specific properties of different features by decomposing multiple sparsity patterns. Moreover, we introduce a novel online multiple metric learning to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple features are more representative. Experimental results on tracking benchmark videos and other challenging videos demonstrate the effectiveness of the proposed tracker.
31. Wang W, Shen J, Porikli F, Yang R. Semi-Supervised Video Object Segmentation with Super-Trajectories. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018; 41:985-998. [PMID: 29993770] [DOI: 10.1109/tpami.2018.2819173]
Abstract
We introduce a semi-supervised video segmentation approach based on an efficient video representation called a "super-trajectory". A super-trajectory corresponds to a group of compact point trajectories that exhibit consistent motion patterns, similar appearance, and close spatiotemporal relationships. We generate the compact trajectories using a probabilistic model, which handles occlusions and drift effectively. To reliably group point trajectories, we adopt a modified version of the density-peaks-based clustering algorithm, which captures rich spatiotemporal relations among trajectories during clustering. We incorporate two intuitive mechanisms, reverse-tracking and object re-occurrence, for robustness and improved performance. Building on the proposed video representation, our segmentation method is discriminative enough to accurately propagate the initial annotations in the first frame onto the remaining frames. Our extensive experimental analyses on three challenging benchmarks demonstrate that, given the annotation in the first frame, our method is capable of extracting the target objects from complex backgrounds, and even re-identifying them after prolonged occlusions, producing high-quality video object segments.
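The clustering backbone referenced here, density-peaks clustering (Rodriguez and Laio, 2014), admits a short numpy sketch. The trajectory-specific distance used in the paper is replaced below by plain Euclidean distance between feature vectors, so this only illustrates the base algorithm, not the authors' modified version.

```python
import numpy as np

def density_peaks(X, dc, n_clusters):
    """X: (n, d) features; dc: density cutoff radius; returns cluster labels."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    rho = (D < dc).sum(axis=1) - 1                 # local density (excluding self)
    order = np.argsort(-rho)                       # points by decreasing density
    delta = np.full(len(X), D.max())               # distance to nearest denser point
    nearest = np.arange(len(X))
    for i, p in enumerate(order[1:], start=1):
        higher = order[:i]                         # all points denser than p
        j = higher[np.argmin(D[p, higher])]
        delta[p], nearest[p] = D[p, j], j
    gamma = rho * delta                            # centers: high rho AND high delta
    gamma[order[0]] = np.inf                       # densest point is always a center
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(n_clusters)
    for p in order:                                # assign in density order
        if labels[p] == -1:
            labels[p] = labels[nearest[p]]
    return labels
```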
32. Gong YJ, Zhou Y. Differential Evolutionary Superpixel Segmentation. IEEE Transactions on Image Processing 2018; 27:1390-1404. [PMID: 29990063] [DOI: 10.1109/tip.2017.2778569]
Abstract
Superpixel segmentation has become increasingly important in many computer vision applications. Most state-of-the-art algorithms adopt either a local color variance model or a local optimization algorithm. This paper develops a new approach, named differential evolutionary superpixels, which optimizes the global properties of a segmentation by means of a global optimizer. We design a comprehensive objective function aggregating within-superpixel error, boundary gradient, and a regularization term. Minimizing the within-superpixel error enforces the homogeneity of superpixels. The boundary gradient term drives superpixel boundaries to follow natural image boundaries, so that each superpixel overlaps a single object. The regularizer further encourages similarly sized superpixels that are friendly to human vision. The optimization is then accomplished by a powerful global optimizer, differential evolution. The algorithm evolves the superpixels by mimicking natural evolution, with complexity linear in the image size. Experimental results and comparisons with eleven state-of-the-art peer algorithms verify the promising performance of our algorithm.
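For concreteness, here is a minimal numpy sketch of the classic DE/rand/1/bin differential-evolution loop that this family of methods builds on. The superpixel objective is replaced by a stand-in sphere function so the sketch stays self-contained, and all hyperparameters are assumptions.

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=30, F=0.5, CR=0.9, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = len(lo)
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([f(x) for x in pop])
    for _ in range(iters):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)       # DE/rand/1 mutation
            cross = rng.random(dim) < CR                    # binomial crossover
            cross[rng.integers(dim)] = True                 # keep >= 1 mutant gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            if f_trial < fit[i]:                            # greedy selection
                pop[i], fit[i] = trial, f_trial
    best = np.argmin(fit)
    return pop[best], fit[best]

# Stand-in objective (the paper instead aggregates within-superpixel error,
# boundary gradient, and a size regularizer).
x_best, f_best = differential_evolution(lambda x: float(np.sum(x ** 2)),
                                        bounds=[(-5.0, 5.0)] * 4)
```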
33. Zhang D, Fu H, Han J, Borji A, Li X. A Review of Co-Saliency Detection Algorithms. ACM Transactions on Intelligent Systems and Technology 2018. [DOI: 10.1145/3158674]
Abstract
Co-saliency detection is a newly emerging and rapidly growing research area in the computer vision community. As a novel branch of visual saliency, co-saliency detection refers to the discovery of common and salient foregrounds from two or more relevant images, and it can be widely used in many computer vision tasks. The existing co-saliency detection algorithms mainly consist of three components: extracting effective features to represent the image regions, exploring the informative cues or factors to characterize co-saliency, and designing effective computational frameworks to formulate co-saliency. Although numerous methods have been developed, the literature is still lacking a deep review and evaluation of co-saliency detection techniques. In this article, we aim at providing a comprehensive review of the fundamentals, challenges, and applications of co-saliency detection. Specifically, we provide an overview of some related computer vision works, review the history of co-saliency detection, summarize and categorize the major algorithms in this research area, discuss some open issues in this area, present the potential applications of co-saliency detection, and finally point out some unsolved challenges and promising future works. We expect this review to be beneficial to both fresh and senior researchers in this field and to give insights to researchers in other related areas regarding the utility of co-saliency detection algorithms.
Affiliation(s)
- Huazhu Fu: Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore
- Junwei Han: Northwestern Polytechnical University, Xi'an, China
- Ali Borji: University of Central Florida, Orlando, USA
- Xuelong Li: Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China
34. Shen J, Peng J, Shao L. Submodular Trajectories for Better Motion Segmentation in Videos. IEEE Transactions on Image Processing 2018; 27:2688-2700. [PMID: 29994180] [DOI: 10.1109/tip.2018.2795740]
Abstract
We propose a new trajectory clustering method using submodular optimization for better motion segmentation in videos. A small number of representative trajectories are first selected automatically by submodular maximization. All the initial trajectories are then segmented into fragments, with the representative trajectories as fragment centers. Finally, fragments are merged into clusters by a two-stage bottom-up clustering method, and each cluster captures the motion of one moving object. The submodular energy function integrates the quality of all trajectories and their correlations. As a result, thousands of initial trajectories are replaced by only dozens of representative ones, which reduces the negative influence of inaccurate initial trajectories on motion segmentation. The representative trajectories receive larger weights when extracting the color or texture information of each moving entity during motion segmentation. Experimental results demonstrate that our method divides trajectories into more accurate clusters. The final motion segmentation results also show that our method outperforms state-of-the-art motion segmentation methods based on trajectory clustering.
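Selecting a few representatives by submodular maximization is typically done greedily. The sketch below uses a facility-location objective over a pairwise similarity matrix (a standard choice, not necessarily the paper's exact energy), for which the greedy rule enjoys the usual (1 - 1/e) approximation guarantee.

```python
import numpy as np

def greedy_representatives(S, k):
    """S: (n, n) nonnegative similarity matrix; returns indices of k representatives."""
    n = S.shape[0]
    chosen = []
    cover = np.zeros(n)                      # best similarity to any chosen rep
    for _ in range(k):
        # Marginal facility-location gain of adding each candidate row.
        gains = np.maximum(S, cover[None, :]).sum(axis=1) - cover.sum()
        gains[chosen] = -np.inf              # never re-pick a chosen index
        j = int(np.argmax(gains))
        chosen.append(j)
        cover = np.maximum(cover, S[j])
    return chosen
```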
35. Wang W, Shen J, Shao L. Video Salient Object Detection via Fully Convolutional Networks. IEEE Transactions on Image Processing 2018; 27:38-49. [PMID: 28945593] [DOI: 10.1109/tip.2017.2754941]
Abstract
This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) deep video saliency model training in the absence of sufficiently large, pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules, capturing spatial and temporal saliency information, respectively. The dynamic saliency model, explicitly incorporating saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image datasets, which enables our network to learn diverse saliency information and prevents overfitting with the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, producing accurate spatiotemporal saliency estimates. We advance the state of the art on the densely annotated video segmentation dataset (MAE of .06) and the Freiburg-Berkeley Motion Segmentation dataset (MAE of .07), and do so with much improved speed (2 fps with all steps).
Affiliation(s)
- Wenguan Wang: Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, China
- Jianbing Shen: Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, China
- Ling Shao: School of Computing Sciences, University of East Anglia, Norwich, U.K.
36. Cheng G, Zhou P, Han J. Duplex Metric Learning for Image Set Classification. IEEE Transactions on Image Processing 2018; 27:281-292. [PMID: 28991740] [DOI: 10.1109/tip.2017.2760512]
Abstract
Image set classification has attracted much attention because of its broad applications. Despite the success made so far, the problems of intra-class diversity and inter-class similarity still remain two major challenges. To explore a possible solution to these challenges, this paper proposes a novel approach, termed duplex metric learning (DML), for image set classification. The proposed DML consists of two progressive metric learning stages with different objectives used for feature learning and image classification, respectively. The metric learning regularization is not only used to learn powerful feature representations but also well explored to train an effective classifier. At the first stage, we first train a discriminative stacked autoencoder (DSAE) by layer-wisely imposing a metric learning regularization term on the neurons in the hidden layers and meanwhile minimizing the reconstruction error to obtain new feature mappings in which similar samples are mapped closely to each other and dissimilar samples are mapped farther apart. At the second stage, we discriminatively train a classifier and simultaneously fine-tune the DSAE by optimizing a new objective function, which consists of a classification error term and a metric learning regularization term. Finally, two simple voting strategies are devised for image set classification based on the learnt classifier. In the experiments, we extensively evaluate the proposed framework for the tasks of face recognition, object recognition, and face verification on several commonly-used data sets and state-of-the-art results are achieved in comparison with existing methods.
37. Wang W, Shen J, Yang R, Porikli F. Saliency-Aware Video Object Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018; 40:20-33. [PMID: 28166489] [DOI: 10.1109/tpami.2017.2662005]
Abstract
Video saliency, aiming to estimate a single dominant object in a sequence, offers strong object-level cues for unsupervised video object segmentation. In this paper, we present a geodesic-distance-based technique that provides reliable and temporally consistent saliency measurements of superpixels as a prior for pixel-wise labeling. Using undirected intra-frame and inter-frame graphs constructed from spatiotemporal edges based on appearance and motion, and a skeleton abstraction step that further enhances the saliency estimates, our method formulates the pixel-wise segmentation task as an energy minimization problem over a function consisting of unary terms from global foreground and background models and dynamic location models, plus pairwise label-smoothness potentials. We perform extensive quantitative and qualitative experiments on benchmark datasets. Our method achieves superior performance compared with the current state of the art in terms of both accuracy and speed.
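The geodesic prior can be illustrated with a small scipy sketch: the saliency of a superpixel is taken as its shortest-path (geodesic) cost to the image boundary over an appearance-weighted adjacency graph. This omits the paper's motion terms and temporal graph; the edge weights and node indices below are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_saliency(edges, weights, n_nodes, boundary_nodes):
    """edges: (i, j) superpixel adjacencies; weights: positive appearance costs."""
    i, j = np.array(edges).T
    W = csr_matrix((weights, (i, j)), shape=(n_nodes, n_nodes))
    W = W.maximum(W.T)                        # make the graph undirected
    d = dijkstra(W, indices=boundary_nodes)   # (n_boundary, n_nodes) distances
    return d.min(axis=0)                      # geodesic distance to the boundary

# Tiny example: node 2 is far (in appearance) from boundary node 0.
print(geodesic_saliency([(0, 1), (1, 2)], [0.2, 0.9], n_nodes=3,
                        boundary_nodes=[0]))  # -> [0.  0.2 1.1]
```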
38. Chen S, Fang X, Shen J, Wang L, Shao L. Single-Image Distance Measurement by a Smart Mobile Device. IEEE Transactions on Cybernetics 2017; 47:4451-4462. [PMID: 27705877] [DOI: 10.1109/tcyb.2016.2611599]
Abstract
Existing distance measurement methods either require multiple images and special photographing poses, or only measure height under a special view configuration. We propose a novel image-based method that can measure various types of distance from a single image captured by a smart mobile device. The embedded accelerometer is used to determine the view orientation of the device. Consequently, pixels can be back-projected to the ground plane, thanks to an efficient calibration method using two known distances. The distance in pixels is then transformed to a real distance in centimeters with a linear model parameterized by the magnification ratio. Various types of distance specified in the image can be computed accordingly. Experimental results demonstrate the effectiveness of the proposed method.
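The geometry of back-projecting an image row to a ground distance can be sketched in a few lines. The fragment below assumes a plain pinhole model with a known camera height and focal length (the paper instead calibrates with two known distances), with the pitch angle supplied by the accelerometer.

```python
import math

def ground_distance(y_pixel, cy, focal_px, pitch_rad, cam_height_m):
    """Distance along the ground to the point imaged at row y_pixel.

    pitch_rad: downward tilt of the optical axis below the horizontal,
    as reported by the device accelerometer.
    """
    # Total angle of the viewing ray below the horizontal.
    ray = pitch_rad + math.atan2(y_pixel - cy, focal_px)
    if ray <= 0:
        raise ValueError("ray does not intersect the ground")
    return cam_height_m / math.tan(ray)
```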
39. Porikli F. Selective Video Object Cutout. IEEE Transactions on Image Processing 2017; 26:5645-5655. [PMID: 28858791] [DOI: 10.1109/tip.2017.2745098]
Abstract
Conventional video segmentation approaches rely heavily on appearance models. Such methods often use appearance descriptors that have limited discriminative power under complex scenarios. To improve the segmentation performance, this paper presents a pyramid histogram-based confidence map that incorporates structure information into appearance statistics. It also combines geodesic distance-based dynamic models. Then, it employs an efficient measure of uncertainty propagation using local classifiers to determine the image regions, where the object labels might be ambiguous. The final foreground cutout is obtained by refining on the uncertain regions. Additionally, to reduce manual labeling, our method determines the frames to be labeled by the human operator in a principled manner, which further boosts the segmentation performance and minimizes the labeling effort. Our extensive experimental analyses on two big benchmarks demonstrate that our solution achieves superior performance, favorable computational efficiency, and reduced manual labeling in comparison to the state of the art.
40. Porikli F. Visual Tracking by Sampling in Part Space. IEEE Transactions on Image Processing 2017; 26:5800-5810. [PMID: 28858801] [DOI: 10.1109/tip.2017.2745204]
Abstract
In this paper, we present a novel part-based visual tracking method from the perspective of probability sampling. Specifically, we represent the target by a part space with two online learned probabilities to capture the structure of the target. The proposal distribution memorizes the historical performance of different parts, and it is used for the first round of part selection. The acceptance probability validates the specific tracking stability of each part in a frame, and it determines whether to accept its vote or to reject it. By doing this, we transform the complex online part selection problem into a probability learning one, which is easier to tackle. The observation model of each part is constructed by an improved supervised descent method and is learned in an incremental manner. Experimental results on two benchmarks demonstrate the competitive performance of our tracker against state-of-the-art methods.
41. Wang L, Hua G, Sukthankar R, Xue J, Niu Z, Zheng N. Video Object Discovery and Co-Segmentation with Extremely Weak Supervision. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017; 39:2074-2088. [PMID: 28113741] [DOI: 10.1109/tpami.2016.2612187]
Abstract
We present a spatio-temporal energy minimization formulation for simultaneous video object discovery and co-segmentation across multiple videos containing irrelevant frames. Our approach overcomes a limitation that most existing video co-segmentation methods possess, i.e., they perform poorly when dealing with practical videos in which the target objects are not present in many frames. Our formulation incorporates a spatio-temporal auto-context model, which is combined with appearance modeling for superpixel labeling. The superpixel-level labels are propagated to the frame level through a multiple instance boosting algorithm with spatial reasoning, based on which frames containing the target object are identified. Our method only needs to be bootstrapped with the frame-level labels for a few video frames (e.g., usually 1 to 3) to indicate if they contain the target objects or not. Extensive experiments on four datasets validate the efficacy of our proposed method: 1) object segmentation from a single video on the SegTrack dataset, 2) object co-segmentation from multiple videos on a video co-segmentation dataset, and 3) joint object discovery and co-segmentation from multiple videos containing irrelevant frames on the MOViCS dataset and XJTU-Stevens, a new dataset that we introduce in this paper. The proposed method compares favorably with the state-of-the-art in all of these experiments.
42. Porikli F. Higher Order Energies for Image Segmentation. IEEE Transactions on Image Processing 2017; 26:4911-4922. [PMID: 28682257] [DOI: 10.1109/tip.2017.2722691]
Abstract
A novel energy minimization method for general higher-order binary energy functions is proposed in this paper. We first relax a discrete higher-order function to a continuous one and use a Taylor expansion to obtain an approximate lower-order function, which is optimized by quadratic pseudo-Boolean optimization or other discrete optimizers. The minimum of this lower-order function is then used as a new local point, around which we expand the original higher-order energy function again. Our algorithm is not restricted to any specific form of higher-order binary function, nor does it introduce extra auxiliary variables. For concreteness, we show an application to segmentation with an appearance-entropy term, which is solved efficiently by our method. Experimental results demonstrate that our method outperforms the state-of-the-art methods.
43. Wang W, Shen J, Yu Y, Ma KL. Stereoscopic Thumbnail Creation via Efficient Stereo Saliency Detection. IEEE Transactions on Visualization and Computer Graphics 2017; 23:2014-2027. [PMID: 27541994] [DOI: 10.1109/tvcg.2016.2600594]
Abstract
In this paper, we propose a framework for automatically producing thumbnails from stereo image pairs. It has two components focusing respectively on stereo saliency detection and stereo thumbnail generation. The first component analyzes stereo saliency through various saliency stimuli, stereoscopic perception and the relevance between two stereo views. The second component uses stereo saliency to guide stereo thumbnail generation. We develop two types of thumbnail generation methods, both changing image size automatically. The first method is called content-persistent cropping (CPC), which aims at cropping stereo images for display devices with different aspect ratios while preserving as much content as possible. The second method is an object-aware cropping method (OAC) for generating the smallest possible thumbnail pair that retains the most important content only and facilitates quick visual exploration of a stereo image database. Quantitative and qualitative experimental evaluations demonstrate promising performance of our thumbnail generation methods in comparison to state-of-the-art algorithms.
44. Li X, Zhao B, Lu X. A General Framework for Edited Video and Raw Video Summarization. IEEE Transactions on Image Processing 2017; 26:3652-3664. [PMID: 28436870] [DOI: 10.1109/tip.2017.2695887]
Abstract
In this paper, we build a general summarization framework for both edited video and raw video summarization. Our work comprises three parts. 1) Four models are designed to capture the properties of good video summaries: containing important people and objects (importance), being representative of the video content (representativeness), containing no similar key-shots (diversity), and preserving the smoothness of the storyline (storyness). These models apply to both edited and raw videos. 2) A comprehensive score function is built as the weighted combination of the four models. The weights of the four models in the score function, denoted property-weights, are learned in a supervised manner, separately for edited videos and for raw videos. 3) The training set is constructed from both edited and raw videos to make up for the lack of training data. In particular, each training video is equipped with a pair of mixing coefficients, which reduces the structural mess in the training set caused by the rough mixture. We test our framework on three datasets covering edited videos, short raw videos, and long raw videos. Experimental results verify the effectiveness of the proposed framework.
45. Zhao S, Lei Z, Sun M, Ma A, Shen J. Diffusion-based saliency detection with optimal seed selection scheme. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.02.007]
46. Zhang D, Han J, Jiang L, Ye S, Chang X. Revealing Event Saliency in Unconstrained Video Collection. IEEE Transactions on Image Processing 2017; 26:1746-1758. [PMID: 28141520] [DOI: 10.1109/tip.2017.2658957]
Abstract
Recent progress in multimedia event detection has enabled us to find videos about a predefined event in a large-scale video collection. Research towards more intrinsic, unsupervised video understanding is an interesting but understudied field. Specifically, given a collection of videos sharing a common event of interest, the goal is to discover from each video the salient fragments, i.e., the short video fragments that can concisely portray the underlying event of interest. To explore this novel direction, this paper proposes an unsupervised event saliency revealing framework. It first extracts features from multiple modalities to represent each shot in the given video collection. These shots are then clustered to build the cluster-level event saliency revealing framework, which explores useful information cues (the intra-cluster prior, inter-cluster discriminability, and inter-cluster smoothness) through a concise optimization model. Compared with existing methods, our approach can highlight the intrinsic stimulus of an unseen event within a video in an unsupervised fashion, and could thus benefit a wide range of multimedia tasks such as video browsing, understanding, and search. To quantitatively verify the proposed method, we systematically compare it with a number of baseline methods on the TRECVID benchmarks. Experimental results demonstrate its effectiveness and efficiency.
47. Yuan Y, Zheng X, Lu X. Discovering Diverse Subset for Unsupervised Hyperspectral Band Selection. IEEE Transactions on Image Processing 2017; 26:51-64. [PMID: 28113180] [DOI: 10.1109/tip.2016.2617462]
Abstract
Band selection, as a special case of the feature selection problem, aims to remove redundant bands and select a few important bands to represent the whole image cube. It has attracted much attention, since the selected bands provide discriminative information for further applications and reduce the computational burden. Although hyperspectral band selection has developed rapidly in recent years, it remains a challenging task because of the following requirements: 1) an effective model must capture the underlying relations between different high-dimensional spectral bands; 2) a fast and robust measure function must adapt to general hyperspectral tasks; and 3) an efficient search strategy must find the desired band subset in reasonable computational time. To satisfy these requirements, a multigraph determinantal point process (MDPP) model is proposed to capture the full structure between different bands and efficiently find the optimal band subset in extensive hyperspectral applications. There are three main contributions: 1) the graphical model is naturally transferred to the band selection problem by the proposed MDPP; 2) multiple graphs are designed to capture the intrinsic relationships between hyperspectral bands; and 3) a mixture DPP is proposed to model the multiple dependencies in the proposed graphs and offers an efficient search strategy to select the optimal bands. To verify the superiority of the proposed method, experiments have been conducted on three hyperspectral applications: classification, anomaly detection, and target detection. The reliability of the proposed method in generic hyperspectral tasks is experimentally demonstrated on four real-world hyperspectral datasets.
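DPP-based selection favors subsets whose kernel determinant is large, trading band quality against redundancy. The numpy sketch below shows generic greedy MAP inference for a DPP over an assumed PSD band-similarity kernel, not the paper's multigraph mixture model.

```python
import numpy as np

def greedy_dpp_map(L, k):
    """L: (n, n) PSD kernel over bands; returns indices of k diverse bands."""
    n = L.shape[0]
    chosen = []
    for _ in range(k):
        best_j, best_logdet = -1, -np.inf
        for j in range(n):
            if j in chosen:
                continue
            S = chosen + [j]
            sign, logdet = np.linalg.slogdet(L[np.ix_(S, S)])
            # The previously chosen block is fixed this step, so comparing
            # the full log-determinants compares the marginal gains.
            if sign > 0 and logdet > best_logdet:
                best_j, best_logdet = j, logdet
        chosen.append(best_j)
    return chosen

# Example kernel from random band signatures (rows = bands).
rng = np.random.default_rng(0)
B = rng.normal(size=(20, 100))                  # 20 bands, 100 pixels each
L = B @ B.T / 100 + 1e-6 * np.eye(20)           # PSD similarity kernel
print(greedy_dpp_map(L, k=5))
```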
48. Pang Y, Cao J, Li X. Learning Sampling Distributions for Efficient Object Detection. IEEE Transactions on Cybernetics 2017; 47:117-129. [PMID: 26742154] [DOI: 10.1109/tcyb.2015.2508603]
Abstract
Object detection is an important task in computer vision and machine intelligence systems. Multistage particle windows (MPW), proposed by Gualdi et al., is an algorithm for fast and accurate object detection. By sampling particle windows (PWs) from a proposal distribution (PD), MPW avoids exhaustively scanning the image. Despite its success, it is unclear how to determine the number of stages and the number of PWs in each stage. Moreover, MPW has to generate too many PWs in the initialization step and unnecessarily regenerates too many PWs around object-like regions. In this paper, we attempt to solve these problems. An important fact we use is that a randomly generated PW has a large probability of not containing the object, because the object is a sparse event relative to the huge number of candidate windows. Therefore, we design a PD that efficiently rejects the huge number of nonobject windows. Specifically, we propose the concepts of rejection, acceptance, and ambiguity windows and regions. These concepts are used to form and update a dented uniform distribution and a dented Gaussian distribution. This contrasts with MPW, which utilizes only one region of support: the PD of MPW is acceptance-oriented, whereas the PD of our method (called iPW) is rejection-oriented. Experimental results on human and face detection demonstrate the efficiency and effectiveness of the iPW algorithm. The source code is publicly accessible.
49. Lu X, Zheng X, Li X. Latent Semantic Minimal Hashing for Image Retrieval. IEEE Transactions on Image Processing 2017; 26:355-368. [PMID: 27849528] [DOI: 10.1109/tip.2016.2627801]
Abstract
Hashing-based similarity search is an important technique for large-scale query-by-example image retrieval, since it provides fast search with computational and memory efficiency. However, it is challenging to design compact codes that represent the original features well. Recently, many unsupervised hashing methods have been proposed that focus on preserving the geometric structure similarity of the data in the original feature space, but they have not yet simultaneously refined the image features and explored the latent semantic feature embedding in the data. To address this problem, this paper proposes a novel joint binary code learning method, referred to as latent semantic minimal hashing, which combines image features with latent semantic features under a minimum encoding loss. The latent semantic feature is learned via matrix decomposition to refine the original feature, which makes the learned feature more discriminative. Moreover, a minimum encoding loss is combined with the latent semantic feature learning process, so that the obtained binary codes are discriminative as well. Extensive experiments on several well-known large databases demonstrate that the proposed method outperforms most state-of-the-art hashing methods.
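The general recipe in the abstract (factorize features into a latent semantic embedding, then binarize) can be sketched in a few lines. The fragment below uses a plain truncated SVD plus sign thresholding; the paper couples the two steps in one objective, so this is only an assumed simplification.

```python
import numpy as np

def latent_semantic_hash(X, n_bits):
    """X: (n_samples, n_features); returns (n_samples, n_bits) binary codes."""
    Xc = X - X.mean(axis=0)                    # center the features
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_bits].T                     # latent semantic embedding
    return (Z > 0).astype(np.uint8)            # binarize by sign

codes = latent_semantic_hash(np.random.default_rng(0).normal(size=(1000, 64)), 32)
hamming = np.count_nonzero(codes[0] != codes[1])   # query via Hamming distance
```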
50. Zheng X, Yuan Y, Lu X. A target detection method for hyperspectral image based on mixture noise model. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.08.015]