1
Li Y, Hong Y, Song Y, Zhu C, Zhang Y, Wang R. SiamPolar: Semi-supervised realtime video object segmentation with polar representation. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.09.063]
2
Abstract
Object segmentation and object tracking are fundamental research areas in the computer vision community. Both topics must contend with common challenges such as occlusion, deformation, motion blur, and scale variation. The former additionally faces heterogeneous objects, interacting objects, edge ambiguity, and shape complexity; the latter suffers from fast motion, out-of-view targets, and real-time processing constraints. Combining the two problems into Video Object Segmentation and Tracking (VOST) can overcome their respective difficulties and improve performance. VOST can be widely applied in practice, including video summarization, high-definition video compression, human-computer interaction, and autonomous vehicles. This survey aims to provide a comprehensive review of state-of-the-art VOST methods, classify them into different categories, and identify new trends. First, we broadly categorize VOST methods into Video Object Segmentation (VOS) and Segmentation-based Object Tracking (SOT), further classify each category into various types based on the segmentation and tracking mechanism, and present representative VOS and SOT methods at each time node. Second, we provide a detailed discussion and overview of the technical characteristics of the different methods. Third, we summarize the characteristics of the related video datasets and present a variety of evaluation metrics. Finally, we point out a set of interesting directions for future work and draw our conclusions.
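The evaluation metrics this survey summarizes typically include the region-similarity measure J, i.e. the Jaccard index (mask IoU) used by benchmarks such as DAVIS. A minimal sketch in Python, with toy masks for illustration:

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J: intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else inter / union

pred = np.zeros((4, 4), int); pred[1:3, 1:3] = 1   # 2x2 predicted mask
gt   = np.zeros((4, 4), int); gt[1:4, 1:4] = 1     # 3x3 ground-truth mask
print(jaccard(pred, gt))  # 4 / 9
```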
Affiliation(s)
- Rui Yao
- School of Computer Science and Technology, China University of Mining and Technology, China; Engineering Research Center of Mine Digitization, Ministry of Education of the People’s Republic of China, China; The Suzhou Smart City Research Institute, Suzhou University of Science and Technology, Xuzhou, China
- Shixiong Xia
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Jiaqi Zhao
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Yong Zhou
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
3
Symmetry Encoder-Decoder Network with Attention Mechanism for Fast Video Object Segmentation. Symmetry (Basel) 2019. [DOI: 10.3390/sym11081006]
Abstract
Semi-supervised video object segmentation (VOS) has made significant progress in recent years. The general goal of VOS methods is to segment objects in a video sequence given a single annotation in the first frame. However, many recent successful methods rely on heavy fine-tuning on the first-frame object mask, which decreases their efficiency. In this work, to address this issue, we propose a symmetry encoder-decoder network with an attention mechanism for video object segmentation (SAVOS) that requires only one forward pass to segment the target object in a video. Specifically, the encoder generates a low-resolution mask with smoothed boundaries, while the decoder refines the details of the segmentation mask and progressively integrates lower-level features. In addition, to obtain accurate segmentation results, we sequentially apply the attention module on multi-scale feature maps for refinement. Experiments on three challenging datasets (DAVIS 2016, DAVIS 2017, and SegTrack v2) show that SAVOS achieves competitive performance against the state of the art.
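As a rough illustration of the attention-based refinement described above, the following sketch gates a multi-channel feature map with a spatial attention map. The projection vector `w` and the sigmoid gating are simplified stand-ins for the paper's learned attention module, not its actual architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_refine(feat, w):
    """Gate a C x H x W feature map with a spatial attention map.

    The attention score at each pixel is a dot product of that pixel's
    channel vector with `w` (a 1x1-convolution-like projection), squashed
    to (0, 1) by a sigmoid and broadcast over channels."""
    score = np.tensordot(w, feat, axes=([0], [0]))   # H x W attention scores
    gate = sigmoid(score)                            # spatial attention in (0, 1)
    return feat * gate[None, :, :]                   # gated (refined) features

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))   # toy multi-channel feature map
w = rng.standard_normal(8)
refined = attention_refine(feat, w)
```

In the real network this refinement would be applied sequentially at several feature-map scales inside the decoder.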
4
Automatic Annotation of Airborne Images by Label Propagation Based on a Bayesian-CRF Model. Remote Sensing 2019. [DOI: 10.3390/rs11020145]
Abstract
The tremendous advances in deep neural networks have demonstrated the superiority of deep learning techniques for applications such as object recognition and image classification. Nevertheless, deep learning-based methods usually require a large amount of training data, which mainly comes from manual annotation and is quite labor-intensive. To reduce the amount of manual work required to generate enough training data, we propose to leverage existing labeled data to generate image annotations automatically. Specifically, pixel labels are first transferred from one image modality to another via geometric transformation to create initial image annotations, and additional information (e.g., height measurements) is then incorporated through Bayesian inference to update the labeling beliefs. Finally, the updated label assignments are optimized with a fully connected conditional random field (CRF), yielding refined labels for all pixels in the image. The proposed approach is tested on two scenarios: (1) label propagation from annotated aerial imagery to unmanned aerial vehicle (UAV) imagery and (2) label propagation from a map database to aerial imagery. In each scenario, the refined image labels are used as pseudo-ground-truth data for training a convolutional neural network (CNN). Results demonstrate that our model produces accurate label assignments even around complex object boundaries; moreover, the generated image labels can be effectively leveraged for training CNNs and achieve classification accuracy comparable to manual image annotations: the per-class classification accuracy of networks trained on the manual annotations and on the generated labels differs by less than ±5%.
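The Bayesian belief update described above can be sketched per pixel as posterior ∝ prior × likelihood. In the sketch below, the class priors and the height-based likelihood values are illustrative assumptions, not the paper's measured distributions:

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Per-pixel Bayesian belief update: posterior ∝ prior * likelihood.

    prior, likelihood: arrays of shape (K, H, W) holding class-wise beliefs
    for K classes at each pixel; the posterior is renormalized per pixel."""
    post = prior * likelihood
    return post / post.sum(axis=0, keepdims=True)

# Toy 2-class, 1-pixel example: the propagated label favours class 0,
# but the (hypothetical) height evidence strongly favours class 1.
prior = np.array([[[0.6]], [[0.4]]])
lik   = np.array([[[0.1]], [[0.9]]])
post = bayes_update(prior, lik)
```

The updated beliefs would then go into the fully connected CRF for spatial refinement.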
5
Mishra A, Ghosh R, Principe JC, Thakor NV, Kukreja SL. A Saccade Based Framework for Real-Time Motion Segmentation Using Event Based Vision Sensors. Front Neurosci 2017; 11:83. [PMID: 28316563] [PMCID: PMC5334512] [DOI: 10.3389/fnins.2017.00083]
Abstract
Motion segmentation is a critical pre-processing step for autonomous robotic systems to facilitate tracking of moving objects in cluttered environments. Event based sensors are low power analog devices that represent a scene by means of asynchronous information updates of only the dynamic details at high temporal resolution and, hence, require significantly fewer computations. However, motion segmentation using spatiotemporal data is a challenging task due to data asynchrony. Prior approaches to object tracking using neuromorphic sensors perform well only while the sensor is static or when a known model of the object to be followed is available. To address these limitations, in this paper we develop a technique for generalized motion segmentation based on spatial statistics across time frames. First, we create micromotion on the platform to facilitate the separation of static and dynamic elements of a scene, inspired by human saccadic eye movements. Second, we introduce the concept of spike-groups as a methodology for partitioning spatio-temporal event groups, which facilitates the computation of scene statistics and the characterization of objects in the scene. Experimental results show that our algorithm is able to classify dynamic objects with a moving camera with a maximum accuracy of 92%.
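A greedy, simplified reading of the spike-group idea, partitioning asynchronous (t, x, y) events into spatio-temporal groups, might look like the following. The grouping rule here (time gap plus centroid distance) is an assumption for illustration, not the paper's exact formulation:

```python
import numpy as np

def spike_groups(events, dt, radius):
    """Greedily partition (t, x, y) events into spatio-temporal groups.

    An event joins an existing group if it lies within `dt` of the group's
    last timestamp and within `radius` of the group's spatial centroid;
    otherwise it seeds a new group."""
    groups = []  # each group: dict with 'events', 'last_t', 'centroid'
    for t, x, y in sorted(events):
        placed = False
        for g in groups:
            cx, cy = g["centroid"]
            if t - g["last_t"] <= dt and (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                g["events"].append((t, x, y))
                g["last_t"] = t
                pts = np.array([(ex, ey) for _, ex, ey in g["events"]])
                g["centroid"] = pts.mean(axis=0)   # running spatial centroid
                placed = True
                break
        if not placed:
            groups.append({"events": [(t, x, y)], "last_t": t,
                           "centroid": np.array([x, y], float)})
    return groups

# Two spatially separated event clusters should form two groups.
events = [(0.0, 1, 1), (0.1, 1, 2), (0.2, 2, 1),   # cluster A
          (0.0, 20, 20), (0.1, 21, 20)]            # cluster B
groups = spike_groups(events, dt=0.5, radius=3.0)
```

Per-group statistics (spread, event rate) could then feed the dynamic-versus-static classification step.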
Affiliation(s)
- Abhishek Mishra
- Singapore Institute for Neurotechnology, National University of Singapore, Singapore
- Rohan Ghosh
- Singapore Institute for Neurotechnology, National University of Singapore, Singapore
- Jose C Principe
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA
- Nitish V Thakor
- Singapore Institute for Neurotechnology, National University of Singapore, Singapore; Biomedical Engineering Department, Johns Hopkins University, Baltimore, MD, USA
- Sunil L Kukreja
- Singapore Institute for Neurotechnology, National University of Singapore, Singapore
6
Li K, Zhang J, Tao W. Unsupervised Co-Segmentation for Indefinite Number of Common Foreground Objects. IEEE Transactions on Image Processing 2016; 25:1898-1909. [PMID: 26886987] [DOI: 10.1109/tip.2016.2526900]
Abstract
Co-segmentation addresses the problem of simultaneously extracting the common targets that appear in multiple images. Object co-segmentation involving multiple common targets, which is very common in practice, has recently become a research hotspot. In this paper, an unsupervised object co-segmentation method for an indefinite number of common targets is proposed. This method overcomes the inherent limitation of traditional proposal-selection-based methods on images containing multiple common targets while retaining their original advantages for object extraction. For each image, the proposed multi-search strategy extracts each target individually, and an adaptive decision criterion is introduced to automatically give each candidate a reliable judgment, i.e., target or non-target. Comparison experiments conducted on the public datasets iCoseg and MSRC, and on the more challenging dataset Coseg-INCT, demonstrate the superior performance of the proposed method.
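The multi-search strategy plus decision criterion can be caricatured as a greedy select-and-suppress loop over scored proposals. In this sketch a fixed threshold `tau` stands in for the paper's adaptive criterion, and box IoU is used for suppression; both choices are assumptions for illustration:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def multi_search(boxes, scores, tau=0.5, overlap=0.3):
    """Extract an indefinite number of targets from one image: repeatedly
    accept the best remaining proposal if its score passes `tau`, then
    suppress proposals overlapping the accepted one. Returns kept indices."""
    live = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while live and scores[live[0]] >= tau:
        best = live.pop(0)
        kept.append(best)
        live = [i for i in live if iou(boxes[best], boxes[i]) < overlap]
    return kept

# Four proposals: two near-duplicates, one distinct target, one weak candidate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (30, 30, 40, 40), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
targets = multi_search(boxes, scores)
```

The loop naturally handles an unknown target count: it stops as soon as no remaining candidate passes the decision criterion.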
7
Incremental multi-class semi-supervised clustering regularized by Kalman filtering. Neural Netw 2015; 71:88-104. [PMID: 26319050] [DOI: 10.1016/j.neunet.2015.08.001]
Abstract
This paper introduces an on-line semi-supervised learning algorithm formulated as a regularized kernel spectral clustering (KSC) approach. We consider the case where new data arrive sequentially but only a small fraction of them is labeled. The available labeled data act as prototypes and help to improve the algorithm's performance in estimating the labels of the unlabeled data points. We adopt a recently proposed multi-class semi-supervised KSC-based algorithm (MSS-KSC) and make it applicable to on-line data clustering. Given a few user-labeled data points, the initial model is learned, and the class memberships of the remaining data points in the current and subsequent time instants are then estimated and propagated in an on-line fashion. The membership update is carried out mainly using the out-of-sample extension property of the model. The algorithm is first tested on computer-generated data sets; we then show that video segmentation can be cast as a semi-supervised learning problem. Furthermore, we show how the tracking capabilities of the Kalman filter can be used to provide the labels of objects in motion, thus regularizing the solution obtained by the MSS-KSC algorithm. In the experiments, we demonstrate the performance of the proposed method on synthetic data sets and real-life videos where the clusters evolve smoothly over time.
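The Kalman filter that supplies labels for objects in motion follows the standard linear predict/update cycle. The sketch below tracks a 1-D constant-velocity target; the model matrices and noise levels are illustrative choices, not the paper's:

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict with the motion model
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with measurement z
    y = z - H @ x                       # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Constant-velocity model: state [position, velocity], position measured.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)                    # small process noise
R = np.array([[0.1]])                   # measurement noise

x, P = np.zeros(2), np.eye(2)
for z in [1.0, 2.0, 3.0, 4.0]:          # object moving at speed ~1
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
```

In the paper's setting, the filtered track would assign a class label to the moving cluster, which then regularizes the MSS-KSC membership estimates.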
8
Abstract
In this paper, semi-automatic methods based on Gaussian random fields (GRF) for online object labeling in video are presented. Given a user-specified region of interest (ROI), the object of interest can be labeled in all frames. Two methods, an updated GRF with fixed SmartLabel (UGFS) and a fixed GRF with fixed SmartLabel (FGFS), are proposed and compared. Evaluations across object categories indicate that the UGFS method not only improves the real-time performance of object labeling in video but also achieves relatively high labeling accuracy.
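Labeling with a Gaussian random field is commonly solved via the harmonic-function formulation of Zhu et al., in which the unlabeled beliefs satisfy f_u = (D_uu − W_uu)^(−1) W_ul f_l. Whether the paper uses exactly this solver is not stated in the abstract; the sketch below shows the standard formulation on a toy graph:

```python
import numpy as np

def grf_propagate(W, labels):
    """Harmonic-function label propagation on a Gaussian random field.

    W: symmetric affinity matrix; labels: array with known labels in {0, 1}
    and np.nan for unlabeled nodes. Solves L_uu f_u = W_ul f_l, where L is
    the graph Laplacian D - W."""
    u = np.isnan(labels)                # unlabeled node mask
    l = ~u                              # labeled node mask
    L = np.diag(W.sum(axis=1)) - W      # graph Laplacian
    f_u = np.linalg.solve(L[np.ix_(u, u)], W[np.ix_(u, l)] @ labels[l])
    out = labels.copy()
    out[u] = f_u
    return out

# Chain graph 0 - 1 - 2 - 3; node 0 labeled 0, node 3 labeled 1.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
labels = np.array([0.0, np.nan, np.nan, 1.0])
f = grf_propagate(W, labels)   # interior nodes interpolate: 1/3, 2/3
```

Thresholding the propagated beliefs at 0.5 yields the binary object/background labeling for unlabeled pixels.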
9
The research and application of visual saliency and adaptive support vector machine in target tracking field. Comput Math Methods Med 2013; 2013:925341. [PMID: 24363779] [PMCID: PMC3865687] [DOI: 10.1155/2013/925341]
Abstract
Efficient target tracking algorithms have become a research focus in intelligent robotics. The main difficulty in mobile-robot target tracking is environmental uncertainty: target state estimation is hampered by illumination changes, target shape changes, complex backgrounds, occlusion, and other factors that degrade tracking robustness. To further improve tracking accuracy and reliability, we present a novel target tracking algorithm based on visual saliency and an adaptive support vector machine (ASVM). The algorithm is built on the mixture saliency of image features, including color, brightness, and motion. During execution, these common visual saliency characteristics are combined to express the target's saliency. Numerous experiments demonstrate the effectiveness and timeliness of the proposed target tracking algorithm on video sequences in which the target objects undergo large changes in pose, scale, and illumination.
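The mixture of saliency features (color, brightness, motion) can be sketched as a normalize-and-average fusion of per-feature maps. The equal-weight rule below is an illustrative assumption standing in for the paper's actual fusion:

```python
import numpy as np

def mixture_saliency(maps, weights=None):
    """Fuse per-feature saliency maps into a single saliency map.

    Each map is min-max normalized to [0, 1] and the results are combined
    as a weighted average (equal weights by default)."""
    norm = []
    for m in maps:
        m = np.asarray(m, float)
        lo, hi = m.min(), m.max()
        norm.append((m - lo) / (hi - lo) if hi > lo else np.zeros_like(m))
    w = np.ones(len(maps)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    return sum(wi * m for wi, m in zip(w, norm))

# Toy maps for the three features named in the abstract.
color      = [[0, 2], [4, 8]]
brightness = [[1, 1], [1, 3]]
motion     = [[0, 0], [0, 1]]
S = mixture_saliency([color, brightness, motion])
```

The fused map would then localize candidate target regions before the ASVM classification step.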
10