1. Li X, Liu Y, Zhao H. Saliency Detection Based on Multiple-Level Feature Learning. Entropy (Basel) 2024; 26:383. DOI: 10.3390/e26050383; PMID: 38785632; PMCID: PMC11119044.
Abstract
Saliency detection aims to locate the most visually interesting regions of an image. Conventional methods based on low-level features rely on biological cues such as texture and color, but they struggle with complicated or low-contrast images. In this paper, we introduce a saliency detection method based on deep neural networks. First, using semantic segmentation, we construct a pixel-level model that assigns each pixel a saliency value according to its semantic category. Second, we build a region feature model that combines hand-crafted and deep features, extracting and fusing the local and global information of each superpixel region. Third, we combine the results of the previous two steps with the over-segmented superpixel images and the original images to construct a multi-level feature model. This model is fed into a deep convolutional network, which generates the final saliency map by learning to integrate the macro and micro information carried by the pixels and superpixels. We evaluate our method on five benchmark datasets against 14 state-of-the-art saliency detection algorithms. Experimental results show that our method outperforms the others in F-measure, precision, recall, and runtime. We also analyze the limitations of our method and propose directions for future work.
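The fusion step this abstract describes can be loosely illustrated in code. The sketch below is not the authors' implementation; it only shows the general idea of combining a pixel-level saliency map with region scores pooled over superpixels. The function name `fuse_saliency`, the equal fusion weights, and the min-max normalization are illustrative assumptions.

```python
import numpy as np

def fuse_saliency(pixel_map, region_map, labels):
    """Combine pixel-level and superpixel-level saliency (illustrative sketch).

    pixel_map:  per-pixel saliency from a pixel-level (e.g. semantic) model
    region_map: per-pixel saliency from a region-level model
    labels:     integer superpixel label per pixel
    """
    # Pool the region scores so they are constant inside each superpixel.
    pooled = np.zeros_like(region_map, dtype=np.float64)
    for lab in np.unique(labels):
        mask = labels == lab
        pooled[mask] = region_map[mask].mean()
    # Equal-weight fusion of pixel and region cues (an assumption, not
    # the learned integration used in the paper).
    fused = 0.5 * np.asarray(pixel_map, dtype=np.float64) + 0.5 * pooled
    # Min-max normalize to [0, 1] for display as a saliency map.
    lo, hi = fused.min(), fused.max()
    return (fused - lo) / (hi - lo + 1e-12)
```

In the paper itself this integration is learned by a deep convolutional network rather than fixed weights; the sketch only makes the pixel-versus-superpixel structure of the input concrete.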
Affiliation(s)
- Xiaoli Li
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110169, China; (Y.L.); (H.Z.)
- Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Key Laboratory of Opto-Electronic Information Processing, Shenyang 110169, China
- The Key Lab of Image Understanding and Computer Vision, Shenyang 110619, China
- School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, China
- Yunpeng Liu
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110169, China; (Y.L.); (H.Z.)
- Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Key Laboratory of Opto-Electronic Information Processing, Shenyang 110169, China
- The Key Lab of Image Understanding and Computer Vision, Shenyang 110619, China
- Huaici Zhao
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110169, China; (Y.L.); (H.Z.)
- Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Key Laboratory of Opto-Electronic Information Processing, Shenyang 110169, China
- The Key Lab of Image Understanding and Computer Vision, Shenyang 110619, China
2. Zong M, Wang R, Ma Y, Ji W. Spatial and temporal saliency based four-stream network with multi-task learning for action recognition. Appl Soft Comput 2023. DOI: 10.1016/j.asoc.2022.109884.
3. Song S, Jia Z, Yang J, Kasabov N. Salient Detection via the Fusion of Background-based and Multiscale Frequency-domain Features. Inf Sci (N Y) 2022. DOI: 10.1016/j.ins.2022.10.103.
4. Zhang H, Qu S, Li H, Xu W, Du X. A motion-appearance-aware network for object change detection. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.109612.
5. Hassan M, Wang Y, Pang W, Wang D, Li D, Zhou Y, Xu D. GUV-Net for high fidelity shoeprint generation. Complex Intell Syst 2022. DOI: 10.1007/s40747-021-00558-9.
Abstract
Shoeprints contain valuable information for tracing evidence at forensic scenes, and they need to be rendered as clean, sharp, high-fidelity images. Most acquired shoeprints are of low quality and/or distorted, so high-fidelity shoeprint generation is of great significance in forensic science. A wide range of deep learning models has been suggested for super-resolution, either as generalized approaches or as application-specific ones. Considering the crucial challenges in shoeprint-based processing and the lack of specific algorithms, we propose a deep learning based GUV-Net model for high-fidelity shoeprint generation. GUV-Net learns features in the manner of VAE, U-Net, and GAN network models, with special treatment for the absence of ground-truth shoeprints. GUV-Net encodes efficient probabilistic distributions in the latent space and decodes variants of samples together with passed key features. The learned samples are forwarded to a refinement unit that precedes the generation of the high-fidelity output; this unit receives low-level features from the decoding module at distinct levels. Furthermore, the refinement process is made more efficient by inverse encoding into a high-dimensional space through a parallel inverse-encoding network. Objective functions at different levels enable the model to optimize its parameters efficiently, mapping a low-quality image to a high-fidelity one while maintaining the salient features that matter for forensics. Finally, the performance of the proposed model is evaluated against state-of-the-art super-resolution network models.
6. Ji W, Wang R, Tian Y, Wang X. An attention based dual learning approach for video captioning. Appl Soft Comput 2022. DOI: 10.1016/j.asoc.2021.108332.
7. Ji Y, Zhang H, Gao F, Sun H, Wei H, Wang N, Yang B. LGCNet: A local-to-global context-aware feature augmentation network for salient object detection. Inf Sci (N Y) 2022. DOI: 10.1016/j.ins.2021.10.055.
8. Singular Spectrum Analysis for Background Initialization with Spatio-Temporal RGB Color Channel Data. Entropy 2021; 23:e23121644. DOI: 10.3390/e23121644; PMID: 34945951; PMCID: PMC8699993.
Abstract
In video processing, background initialization aims to obtain a scene without foreground objects. The problem has recently attracted researchers' attention because of real-world applications such as video segmentation, computational photography, and video surveillance, yet it remains challenging under complex variations in illumination, intermittent motion, camera jitter, shadow, and so on. This paper proposes a novel and effective background initialization method using singular spectrum analysis. First, we extract the video's color frames and split them into RGB color channels. Next, the RGB color channels are saved as color-channel spatio-temporal data. After decomposing these data by singular spectrum analysis, we obtain stable and dynamic components from different eigentriple groups. Our study indicates that the stable component contains the background image while the dynamic component contains the foreground. Finally, the color background image is reconstructed by merging the RGB channel images obtained by reshaping the stable component data. Experimental results on public scene background initialization databases show that our method produces good color background images compared with state-of-the-art methods.
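The pipeline in this abstract (stack each channel as spatio-temporal data, decompose, keep the stable component as background) can be sketched with a deliberate simplification: a plain SVD is used below in place of full singular spectrum analysis, and only the leading eigentriple is treated as the stable component. The function name and the frame layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def background_from_frames(frames):
    """Estimate a background image from a short clip (illustrative sketch).

    frames: uint8 array of shape (T, H, W, 3) -- T video frames.
    Simplification: SVD of the per-channel spatio-temporal matrix stands in
    for singular spectrum analysis; the leading eigentriple is taken as the
    stable (background) component, the rest as the dynamic (foreground) part.
    """
    T, H, W, C = frames.shape
    bg = np.empty((H, W, C))
    for c in range(C):
        # One column per frame: a (H*W, T) spatio-temporal matrix per channel.
        X = frames[..., c].reshape(T, -1).astype(np.float64).T
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        # Rank-1 reconstruction from the leading eigentriple = stable part.
        stable = s[0] * np.outer(U[:, 0], Vt[0])
        # Average the stable part over time and reshape back to an image.
        bg[..., c] = stable.mean(axis=1).reshape(H, W)
    return np.clip(bg, 0, 255).astype(np.uint8)
```

On a clip whose background is static and whose foreground occupies few pixels, the leading eigentriple is dominated by the background, so this toy version already suppresses a small moving object.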
9. Liu Y, Yuan X, Jiang X, Wang P, Kou J, Wang H, Liu M. Dilated Adversarial U-Net Network for automatic gross tumor volume segmentation of nasopharyngeal carcinoma. Appl Soft Comput 2021. DOI: 10.1016/j.asoc.2021.107722.
10. Wang L, Yuan X, Zong M, Ma Y, Ji W, Liu M, Wang R. Multi-cue based four-stream 3D ResNets for video-based action recognition. Inf Sci (N Y) 2021. DOI: 10.1016/j.ins.2021.07.079.
11.
12. Panda S, Nanda PK. Kernel density estimation and correntropy based background modeling and camera model parameter estimation for underwater video object detection. Soft Comput 2021. DOI: 10.1007/s00500-021-05919-7.
13. Cascade Object Detection and Remote Sensing Object Detection Method Based on Trainable Activation Function. Remote Sensing 2021. DOI: 10.3390/rs13020200.
Abstract
Object detection locates objects in a scene and is a major application in computer vision, notably in surveillance systems. Many Convolutional Neural Network (CNN) based models have been developed for object detection to achieve higher performance, but existing models suffer from overfitting and low efficiency on small objects; in remote sensing, existing methods additionally have poor localization. Cascade object detection methods have been applied to improve the learning process of the detection model. In this research, an Additive Activation Function (AAF) is applied in a Faster Region-based CNN (RCNN) for object detection. The proposed AAF-Faster RCNN method has the advantages of better convergence and clear bounding variance. A Fourier series and a linear combination of activation functions are used to update the loss function. The Microsoft (MS) COCO and Pascal VOC 2007/2012 datasets are used to evaluate the performance of the AAF-Faster RCNN model, which is also analyzed for small object detection on these benchmarks. The analysis shows that the proposed AAF-Faster RCNN model is more efficient in object detection than the state-of-the-art Pay Attention to Them (PAT) model. To evaluate object detection in remote sensing, the proposed method is further tested on the NWPU VHR-10 remote sensing dataset. On the Pascal VOC 2007 dataset, the AAF-Faster RCNN model achieves a mean Average Precision (mAP) of 83.1%, versus 81.7% mAP for the existing PAT-SSD512 method.
14. Zheng H, Wang R, Ji W, Zong M, Wong WK, Lai Z, Lv H. Discriminative deep multi-task learning for facial expression recognition. Inf Sci (N Y) 2020. DOI: 10.1016/j.ins.2020.04.041.
15.
16. A robust contour detection operator with combined push-pull inhibition and surround suppression. Inf Sci (N Y) 2020. DOI: 10.1016/j.ins.2020.03.026.
17. Kwon J. Robust visual tracking based on variational auto-encoding Markov chain Monte Carlo. Inf Sci (N Y) 2020. DOI: 10.1016/j.ins.2019.09.015.
18.