1
Chen J, Cong R, Ip HHS, Kwong S. KepSalinst: Using Peripheral Points to Delineate Salient Instances. IEEE TRANSACTIONS ON CYBERNETICS 2024; 54:3392-3405. [PMID: 37943655 DOI: 10.1109/tcyb.2023.3326165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2023]
Abstract
Salient instance segmentation (SIS) is an emerging field that evolves from salient object detection (SOD), aiming at identifying individual salient instances using segmentation maps. Inspired by the success of dynamic convolutions in segmentation tasks, this article introduces a keypoints-based SIS network (KepSalinst). It employs multiple keypoints, that is, the center and several peripheral points of an instance, as effective geometrical guidance for dynamic convolutions. The features at peripheral points can help roughly delineate the spatial extent of the instance and complement the information inside the central features. To fully exploit the complementary components within these features, we design a differentiated patterns fusion (DPF) module. This ensures that the resulting dynamic convolutional filters formed by these features are sufficiently comprehensive for precise segmentation. Furthermore, we introduce a high-level semantic guided saliency (HSGS) module. This module enhances the perception of saliency by predicting a map for the input image to estimate a saliency score for each segmented instance. On four SIS datasets (ILSO, SOC, SIS10K, and COME15K), our KepSalinst outperforms all previous models qualitatively and quantitatively.
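The keypoint-conditioned dynamic convolution that KepSalinst builds on can be illustrated with a short PyTorch sketch. This is not the authors' code: the filter-generation head, the naive averaging of keypoint features (the paper's DPF module is richer), and all shapes are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def sample_point_features(feat, points):
    """feat: (C, H, W); points: (K, 2) normalized (x, y) coords in [-1, 1].
    Returns (K, C) features bilinearly sampled at the instance's keypoints."""
    grid = points.view(1, 1, -1, 2)                                    # (1, 1, K, 2)
    sampled = F.grid_sample(feat.unsqueeze(0), grid, align_corners=True)
    return sampled.squeeze(0).squeeze(1).t()                           # (K, C)

def dynamic_instance_mask(mask_feat, kp_feats, filter_head):
    """mask_feat: (C, H, W) shared mask features; kp_feats: (K, C) center and
    peripheral point features; filter_head: small MLP producing a 1x1 dynamic filter."""
    fused = kp_feats.mean(dim=0)                   # naive fusion of center/peripheral cues
    weight = filter_head(fused).view(1, -1, 1, 1)  # (1, C, 1, 1) instance-specific filter
    logits = F.conv2d(mask_feat.unsqueeze(0), weight)
    return torch.sigmoid(logits).squeeze(0)        # (1, H, W) instance mask

# toy usage: one instance described by a center point plus four peripheral points
C, H, W = 64, 32, 32
feat = torch.randn(C, H, W)
keypoints = torch.rand(5, 2) * 2 - 1
filter_head = torch.nn.Linear(C, C)                # hypothetical filter generator
mask = dynamic_instance_mask(feat, sample_point_features(feat, keypoints), filter_head)
print(mask.shape)                                  # torch.Size([1, 32, 32])
```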
2
Li H, Chen X, Yang W, Huang J, Sun K, Wang Y, Huang A, Mei L. Global Semantic-Sense Aggregation Network for Salient Object Detection in Remote Sensing Images. ENTROPY (BASEL, SWITZERLAND) 2024; 26:445. [PMID: 38920454 PMCID: PMC11203128 DOI: 10.3390/e26060445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 05/16/2024] [Accepted: 05/23/2024] [Indexed: 06/27/2024]
Abstract
Salient object detection (SOD) aims to accurately identify significant geographical objects in remote sensing images (RSI), providing reliable support and guidance for extensive geographical information analyses and decisions. However, SOD in RSI faces numerous challenges, including shadow interference, inter-class feature confusion, and unclear target edge contours. Therefore, we designed an effective Global Semantic-aware Aggregation Network (GSANet) to aggregate salient information in RSI. GSANet computes the information entropy of different regions and prioritizes areas with high information entropy as potential target regions, thereby achieving precise localization and semantic understanding of salient objects in remote sensing imagery. Specifically, we proposed a Semantic Detail Embedding Module (SDEM), which explores the potential connections among multi-level features and adaptively fuses shallow texture details with deep semantic features, efficiently aggregating the information entropy of salient regions and enhancing the information content of salient targets. Additionally, we proposed a Semantic Perception Fusion Module (SPFM) to analyze the mapping relationships between contextual information and local details, enhancing the perceptual capability for salient objects while suppressing irrelevant information entropy, thereby addressing the semantic dilution of salient objects during the up-sampling process. The experimental results on two publicly available datasets, ORSSD and EORSSD, demonstrated the outstanding performance of our method, which achieved 93.91% Sα, 98.36% Eξ, and 89.37% Fβ on the EORSSD dataset.
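The entropy-driven prioritization of candidate regions can be sketched, in a generic way, as a patch-wise Shannon entropy map; the patch size and bin count below are arbitrary choices, not GSANet's settings.

```python
import torch
import torch.nn.functional as F

def patch_entropy(gray, patch=16, bins=32):
    """gray: (H, W) tensor with values in [0, 1]. Returns an (H//patch, W//patch)
    map of per-patch Shannon entropy; high values mark information-rich regions."""
    q = (gray.clamp(0, 1) * (bins - 1)).round().long()            # quantize intensities
    onehot = F.one_hot(q, bins).permute(2, 0, 1).float()           # (bins, H, W)
    p = F.avg_pool2d(onehot.unsqueeze(0), patch, stride=patch)     # per-patch bin frequencies
    return -(p * (p + 1e-12).log2()).sum(dim=1).squeeze(0)         # entropy per patch

heat = patch_entropy(torch.rand(256, 256))
print(heat.shape)   # torch.Size([16, 16])
```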
Affiliation(s)
- Hongli Li
- School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
- Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
- Xuhui Chen
- School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
- Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
- Wei Yang
- School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430064, China
- Jian Huang
- School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430064, China
- Kaimin Sun
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
- Ying Wang
- School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430064, China
- Andong Huang
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- Liye Mei
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- The Institute of Technological Sciences, Wuhan University, Wuhan 430072, China
3
Zhu G, Li J, Guo Y. Supplement and Suppression: Both Boundary and Nonboundary Are Helpful for Salient Object Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:6615-6627. [PMID: 34818196 DOI: 10.1109/tnnls.2021.3127959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Current methods aggregate multilevel features from the backbone and introduce edge information to obtain more refined saliency maps. However, little attention is paid to how to suppress background regions with saliency-like appearances. These regions usually lie in the vicinity of salient objects and have high contrast with the background, so they are easily misclassified as foreground. To solve this problem, we propose a gated feature interaction network (GFINet) to integrate multiple saliency features, which can utilize nonboundary features with background information to suppress pseudosalient objects and simultaneously apply boundary features to supplement edge details. Different from previous methods that only consider the complementarity between saliency and boundary, the proposed network introduces nonboundary features into the decoder to filter the pseudosalient objects. Specifically, GFINet consists of a global features aggregation branch (GFAB), a boundary and nonboundary features' perception branch (B&NFPB), and a gated feature interaction module (GFIM). Guided by the global features generated by GFAB and the boundary and nonboundary features produced by B&NFPB, GFIM employs a gate structure to adaptively optimize the saliency information interchange among these features and thus predict the final saliency maps. Besides, due to the imbalanced distribution of boundary and nonboundary pixels, the binary cross-entropy (BCE) loss struggles to predict the pixels near the boundary. Therefore, we design a border region aware (BRA) loss to further boost the quality of the boundary and nonboundary predictions, which guides the network to focus more on the hard pixels near the boundary by assigning different weights to different positions. Compared with 12 counterparts, experimental results on five benchmark datasets show that our method has better generalization and improves on the state-of-the-art approach by 4.85% on average in terms of regional and boundary evaluation measures. In addition, our model is more efficient, with an inference speed of 50.3 FPS when processing a 320 × 320 image. Code has been made available at https://github.com/lesonly/GFINet.
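The BRA loss is described as a position-weighted BCE that emphasizes hard pixels near the boundary. A plausible sketch (the exact weighting scheme in the paper may differ) derives a boundary band from the ground truth by morphological dilation and erosion, implemented with max-pooling:

```python
import torch
import torch.nn.functional as F

def border_weighted_bce(pred_logits, gt, k=5, lam=4.0):
    """pred_logits, gt: (N, 1, H, W); gt is a binary mask. Pixels inside a band of
    width ~k around the object boundary receive weight 1 + lam, all others weight 1."""
    dilated = F.max_pool2d(gt, k, stride=1, padding=k // 2)    # morphological dilation
    eroded = -F.max_pool2d(-gt, k, stride=1, padding=k // 2)   # morphological erosion
    border = (dilated - eroded).clamp(0, 1)                    # thin band around the boundary
    weight = 1.0 + lam * border
    return F.binary_cross_entropy_with_logits(pred_logits, gt, weight=weight)

loss = border_weighted_bce(torch.randn(2, 1, 64, 64),
                           (torch.rand(2, 1, 64, 64) > 0.5).float())
```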
4
Zhang X, Yu Y, Wang Y, Chen X, Wang C. Alignment Integration Network for Salient Object Detection and Its Application for Optical Remote Sensing Images. SENSORS (BASEL, SWITZERLAND) 2023; 23:6562. [PMID: 37514856 PMCID: PMC10386270 DOI: 10.3390/s23146562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 07/18/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
Salient object detection has made substantial progress due to the exploitation of multi-level convolutional features. The key question is how to combine these convolutional features effectively and efficiently. Because of the step-by-step down-sampling operations in almost all CNNs, multi-level features usually have different scales. Methods based on fully convolutional networks directly apply bilinear up-sampling to low-resolution deep features and then combine them with high-resolution shallow features by addition or concatenation, which neglects the compatibility of the features and results in misalignment problems. In this paper, to solve this problem, we propose an alignment integration network (ALNet), which aligns adjacent-level features progressively to generate powerful combinations. To capture long-range dependencies for high-level integrated features while maintaining high computational efficiency, a strip attention module (SAM) is introduced into the alignment integration procedure. Benefiting from SAM, multi-level semantics can be selectively propagated to predict precise salient objects. Furthermore, although integrating multi-level convolutional features can alleviate the blurred-boundary problem to a certain extent, it is still unsatisfactory for restoring real object boundaries. Therefore, we design a simple but effective boundary enhancement module (BEM) to guide the network to focus on boundaries and other error-prone parts. Based on BEM, an attention-weighted loss is proposed to push the network to generate sharper object boundaries. Experimental results on five benchmark datasets demonstrate that the proposed method achieves state-of-the-art performance on salient object detection. Moreover, we extend the experiments to remote sensing datasets, and the results further prove the universality and scalability of ALNet.
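The strip attention idea (long-range context from 1-D pooled strips at low cost) can be sketched roughly as below; this mirrors generic strip pooling, and the kernel sizes are illustrative rather than ALNet's actual SAM configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripAttention(nn.Module):
    """Gates a feature map with context pooled along horizontal and vertical strips."""
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.conv_v = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        horiz = self.conv_h(F.adaptive_avg_pool2d(x, (1, w)))   # (n, c, 1, w) row context
        vert = self.conv_v(F.adaptive_avg_pool2d(x, (h, 1)))    # (n, c, h, 1) column context
        gate = torch.sigmoid(self.fuse(horiz + vert))           # broadcasts to (n, c, h, w)
        return x * gate

y = StripAttention(64)(torch.randn(2, 64, 32, 32))
print(y.shape)   # torch.Size([2, 64, 32, 32])
```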
Affiliation(s)
- Xiaoning Zhang
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yi Yu
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yuqing Wang
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Xiaolin Chen
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Chenglong Wang
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
5
Cong R, Yang N, Li C, Fu H, Zhao Y, Huang Q, Kwong S. Global-and-Local Collaborative Learning for Co-Salient Object Detection. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:1920-1931. [PMID: 35867373 DOI: 10.1109/tcyb.2022.3169431] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 06/15/2023]
Abstract
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images. Therefore, how to effectively extract interimage correspondence is crucial for the CoSOD task. In this article, we propose a global-and-local collaborative learning (GLNet) architecture, which includes a global correspondence modeling (GCM) and a local correspondence modeling (LCM) to capture the comprehensive interimage corresponding relationship among different images from the global and local perspectives. First, we treat different images as different time slices and use 3-D convolution to integrate all intrafeatures intuitively, which can more fully extract the global group semantics. Second, we design a pairwise correlation transformation (PCT) to explore similarity correspondence between pairwise images and combine the multiple local pairwise correspondences to generate the local interimage relationship. Third, the interimage relationships of the GCM and LCM are integrated through a global-and-local correspondence aggregation (GLA) module to explore more comprehensive interimage collaboration cues. Finally, the intra and inter features are adaptively integrated by an intra-and-inter weighting fusion (AEWF) module to learn co-saliency features and predict the co-saliency map. The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms 11 state-of-the-art competitors trained on some large datasets (about 8k-200k images).
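Treating the images of a query group as "time slices" and integrating them with a 3-D convolution can be sketched as follows; this is a minimal illustration of the idea, and the kernel size and the mean-pooling over the group axis are assumptions rather than GLNet's exact GCM.

```python
import torch
import torch.nn as nn

class GroupSemantics3D(nn.Module):
    """Stacks per-image feature maps along a depth axis and mixes them with Conv3d."""
    def __init__(self, channels):
        super().__init__()
        self.conv3d = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, group_feats):
        # group_feats: (N, C, H, W) features of the N relevant images in one group
        x = group_feats.permute(1, 0, 2, 3).unsqueeze(0)     # (1, C, N, H, W), N acts as depth
        x = torch.relu(self.conv3d(x))
        return x.mean(dim=2).squeeze(0)                      # (C, H, W) group-level semantics

sem = GroupSemantics3D(32)(torch.randn(5, 32, 24, 24))
print(sem.shape)   # torch.Size([32, 24, 24])
```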
6
An Improved Wake Vortex-Based Inversion Method for Submarine Maneuvering State. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2023; 2023:5632128. [PMID: 36820055 PMCID: PMC9938766 DOI: 10.1155/2023/5632128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 01/28/2023] [Accepted: 01/31/2023] [Indexed: 02/12/2023]
Abstract
As the noise reduction performance of submarines continues to improve, it is becoming difficult to detect and track submarines with acoustic detection techniques, so nonacoustic detection techniques are becoming more and more important. A moving submarine leaves a wake vortex, and information about this vortex can be used to invert the submarine's maneuvering state. However, the wake vortex dissipates as it evolves and its strength continually decreases, so its characteristic features gradually weaken, which makes inversion of the maneuvering state difficult and less accurate. To solve these problems, this paper proposes an improved wake vortex-based inversion method for the submarine maneuvering state. Firstly, a random finite set of submarine wake vortex observation features is established, from which the feature with the highest degree of correlation with the maneuvering state is selected. Secondly, a multiscale fusion module and an attention mechanism are used to re-encode the weak features of the wake vortex image and extract its salient features. Finally, the maneuvering state is inverted from the extracted salient features. The experimental results show that the proposed algorithm improves the average inversion accuracy by 1.27% for maneuvering-state inversion on weak-feature wake vortex images. The algorithm can invert the submarine maneuvering state even when the wake vortex image features are weak and the feature information is incomplete, and it provides a basis for detection techniques based on submarine wake characteristics.
7
Fan DP, Zhang J, Xu G, Cheng MM, Shao L. Salient Objects in Clutter. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:2344-2366. [PMID: 35404809 DOI: 10.1109/tpami.2022.3166451] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 06/14/2023]
Abstract
In this paper, we identify and address a serious design bias of existing salient object detection (SOD) datasets, which unrealistically assume that each image should contain at least one clear and uncluttered salient object. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. However, these models are still far from satisfactory when applied to real-world scenes. Based on our analyses, we propose a new high-quality dataset and update the previous saliency benchmark. Specifically, our dataset, called Salient Objects in Clutter (SOC), includes images with both salient and non-salient objects from several common object categories. In addition to object category annotations, each salient image is accompanied by attributes that reflect common challenges in common scenes, which can help provide deeper insight into the SOD problem. Further, with a given saliency encoder, e.g., the backbone network, existing saliency models are designed to achieve mapping from the training image set to the training ground-truth set. We therefore argue that improving the dataset can yield higher performance gains than focusing only on the decoder design. With this in mind, we investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy to learn from small datasets. Our extensive results demonstrate the effectiveness of these tricks. We also provide a comprehensive benchmark for SOD, which can be found in our repository: https://github.com/DengPingFan/SODBenchmark.
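Among the dataset-enhancement strategies listed, label smoothing is the simplest to illustrate. The sketch below is one hedged reading: a uniform smoothing term pulls targets away from hard 0/1 values, and an optional local average additionally softens a thin band at the object boundary (the blur step is an assumption, not necessarily the paper's recipe).

```python
import torch
import torch.nn.functional as F

def smooth_saliency_labels(gt, eps=0.1, blur_kernel=5):
    """gt: (N, 1, H, W) binary saliency masks. Returns soft targets in (0, 1)."""
    soft = gt * (1.0 - eps) + 0.5 * eps                              # uniform label smoothing
    pad = blur_kernel // 2
    soft = F.avg_pool2d(soft, blur_kernel, stride=1, padding=pad)    # soften the boundary band
    return soft

targets = smooth_saliency_labels((torch.rand(2, 1, 64, 64) > 0.5).float())
print(targets.min().item(), targets.max().item())   # strictly inside (0, 1)
```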
8
Xu R, Wang C, Zhang J, Xu S, Meng W, Zhang X. RSSFormer: Foreground Saliency Enhancement for Remote Sensing Land-Cover Segmentation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; PP:1052-1064. [PMID: 37022079 DOI: 10.1109/tip.2023.3238648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
High spatial resolution (HSR) remote sensing images contain complex foreground-background relationships, which makes remote sensing land-cover segmentation a special semantic segmentation task. The main challenges come from large-scale variation, complex background samples, and an imbalanced foreground-background distribution. These issues make recent context modeling methods sub-optimal due to the lack of foreground saliency modeling. To handle these problems, we propose a Remote Sensing Segmentation framework (RSSFormer), including an Adaptive Transformer Fusion Module, a Detail-aware Attention Layer, and a Foreground Saliency Guided Loss. Specifically, from the perspective of relation-based foreground saliency modeling, our Adaptive Transformer Fusion Module can adaptively suppress background noise and enhance object saliency when fusing multi-scale features. Our Detail-aware Attention Layer then extracts detail and foreground-related information via the interplay of spatial attention and channel attention, which further enhances the foreground saliency. From the perspective of optimization-based foreground saliency modeling, our Foreground Saliency Guided Loss guides the network to focus on hard samples with low foreground saliency responses, achieving balanced optimization. Experimental results on the LoveDA, Vaihingen, Potsdam, and iSAID datasets validate that our method outperforms existing general semantic segmentation methods and remote sensing segmentation methods, and achieves a good compromise between computational overhead and accuracy. Our code is available at https://github.com/Rongtao-Xu/RepresentationLearning/tree/main/RSSFormer-TIP2023.
9
Zhou X, Shen K, Weng L, Cong R, Zheng B, Zhang J, Yan C. Edge-Guided Recurrent Positioning Network for Salient Object Detection in Optical Remote Sensing Images. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:539-552. [PMID: 35417369 DOI: 10.1109/tcyb.2022.3163152] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 06/14/2023]
Abstract
Optical remote sensing images (RSIs) have been widely used in many applications, and one of the interesting issues about optical RSIs is salient object detection (SOD). However, due to diverse object types, various object scales, numerous object orientations, and cluttered backgrounds in optical RSIs, the performance of existing SOD models often degrades significantly. Meanwhile, cutting-edge SOD models targeting optical RSIs typically focus on suppressing cluttered backgrounds while neglecting edge information, which is crucial for obtaining precise saliency maps. To address this dilemma, this article proposes an edge-guided recurrent positioning network (ERPNet) to pop out salient objects in optical RSIs, whose key component is the edge-aware position attention unit (EPAU). First, the encoder is used to give salient objects a good representation, that is, multilevel deep features, which are then delivered into two parallel decoders: 1) an edge extraction part and 2) a feature fusion part. The edge extraction module and the encoder form a U-shape architecture, which not only provides accurate salient edge clues but also ensures the integrity of edge information by additionally deploying intra-connections. That is to say, edge features can be generated and reinforced by incorporating object features from the encoder. Meanwhile, each decoding step of the feature fusion module provides position attention about salient objects, where the position cues are sharpened by the effective edge information and are used to recurrently calibrate the misaligned decoding process. After that, we obtain the final saliency map by fusing all position attention cues. Extensive experiments are conducted on two public optical RSI datasets, and the results show that the proposed ERPNet can accurately and completely pop out salient objects, consistently outperforming the state-of-the-art SOD models.
10
Li G, Liu Z, Zeng D, Lin W, Ling H. Adjacent Context Coordination Network for Salient Object Detection in Optical Remote Sensing Images. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:526-538. [PMID: 35417367 DOI: 10.1109/tcyb.2022.3162945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Salient object detection (SOD) in optical remote sensing images (RSIs), or RSI-SOD, is an emerging topic in understanding optical RSIs. However, due to the difference between optical RSIs and natural scene images (NSIs), directly applying NSI-SOD methods to optical RSIs fails to achieve satisfactory results. In this article, we propose a novel adjacent context coordination network (ACCoNet) to explore the coordination of adjacent features in an encoder-decoder architecture for RSI-SOD. Specifically, ACCoNet consists of three parts: 1) an encoder; 2) adjacent context coordination modules (ACCoMs); and 3) a decoder. As the key component of ACCoNet, ACCoM activates the salient regions of output features of the encoder and transmits them to the decoder. ACCoM contains a local branch and two adjacent branches to coordinate the multilevel features simultaneously. The local branch highlights the salient regions in an adaptive way, while the adjacent branches introduce global information of adjacent levels to enhance salient regions. In addition, to extend the capabilities of the classic decoder block (i.e., several cascaded convolutional layers), we extend it with two bifurcations and propose a bifurcation-aggregation block (BAB) to capture the contextual information in the decoder. Extensive experiments on two benchmark datasets demonstrate that the proposed ACCoNet outperforms 22 state-of-the-art methods under nine evaluation metrics, and runs up to 81 fps on a single NVIDIA Titan X GPU. The code and results of our method are available at https://github.com/MathLee/ACCoNet.
11
Three-stream interaction decoder network for RGB-thermal salient object detection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
12
Scribble-attention hierarchical network for weakly supervised salient object detection in optical remote sensing images. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04014-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
13
Lawrance N, Angel TS. Performance evaluation of image fusion techniques and implementation of new fusion technique for remote sensing satellite data. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-213573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Image fusion is the technique of integrating images of the same place or object taken by two or more sensors. The goal is to obtain more spectral and spatial information from the combined image as a whole than from the individual images; fusing the images is required to improve the spatial and spectral quality of both panchromatic and multispectral images. This study introduces a novel method for fusing remote sensing images that combines L0 smoothing, NSCT (Non-subsampled Contourlet Transform), SR (Sparse Representation), and MAR (max absolute rule). The multispectral and panchromatic images are first decomposed into low- and high-frequency components using the L0 smoothing filter. The low-frequency components are then fused using a technique that combines NSCT and SR, while the high-frequency components are fused with the max-absolute rule. Finally, recombining the fused low-frequency and high-frequency components yields the final image. Our method produces improved results in terms of correlation coefficient, entropy, spatial frequency, and fusion mutual information, for both image quality enhancement and visual evaluation. This study uses the Landsat-7 ETM+, IKONOS, and QuickBird datasets; the images are taken by different satellites, and two examples of each image are used. Compared with previous traditional methods, the output of the proposed image fusion technique is more than 20% higher in quality.
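Of the components listed, the max absolute rule for the high-frequency sub-bands is easy to illustrate in isolation (the L0 smoothing, NSCT, and SR stages need dedicated implementations and are omitted); at each position, the coefficient with the larger magnitude is kept:

```python
import numpy as np

def max_absolute_fuse(coeff_a, coeff_b):
    """Fuse two high-frequency sub-bands by keeping, per pixel,
    the coefficient with the larger absolute value."""
    return np.where(np.abs(coeff_a) >= np.abs(coeff_b), coeff_a, coeff_b)

# toy sub-bands standing in for panchromatic / multispectral detail coefficients
band_pan = np.random.randn(128, 128)
band_ms = np.random.randn(128, 128)
fused = max_absolute_fuse(band_pan, band_ms)
print(fused.shape)   # (128, 128)
```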
Affiliation(s)
- N.A. Lawrance
- Research scholar, Computer Science Engineering, SRM Institute of Science and Technology, Tamilnadu, India
- T.S. Shiny Angel
- Computer Science Engineering, SRM Institute of Science and Technology, Tamilnadu, India
14
Zhang N, Han J, Liu N. Learning Implicit Class Knowledge for RGB-D Co-Salient Object Detection With Transformers. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:4556-4570. [PMID: 35763477 DOI: 10.1109/tip.2022.3185550] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/15/2023]
Abstract
RGB-D co-salient object detection aims to segment co-occurring salient objects when given a group of relevant images and depth maps. Previous methods often adopt separate pipelines and hand-crafted features, which makes it hard to capture the patterns of co-occurring salient objects and leads to unsatisfactory results. Using end-to-end CNN models is a straightforward idea, but they are less effective in exploiting global cues due to their intrinsic limitations. Thus, in this paper, we instead propose an end-to-end transformer-based model, denoted CTNet, which uses class tokens to explicitly capture implicit class knowledge for RGB-D co-salient object detection. Specifically, we first design adaptive class tokens for individual images to explore intra-saliency cues and then develop common class tokens for the whole group to explore inter-saliency cues. Besides, we also leverage the complementary cues between RGB images and depth maps to promote the learning of the above two types of class tokens. In addition, to promote model evaluation, we construct a challenging and large-scale benchmark dataset, named RGBD CoSal1k, which collects 106 groups containing 1000 pairs of RGB-D images with complex scenarios and diverse appearances. Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method.
15
Zheng L, Xiao G, Shi Z, Wang S, Ma J. MSA-Net: Establishing Reliable Correspondences by Multiscale Attention Network. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:4598-4608. [PMID: 35776808 DOI: 10.1109/tip.2022.3186535] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
In this paper, we propose a novel multi-scale attention based network (called MSA-Net) for feature matching problems. Current deep-network-based feature matching methods suffer from limited effectiveness and robustness when applied to different scenarios, due to random distributions of outliers and insufficient information learning. To address this issue, we propose a multi-scale attention block that enhances robustness to outliers and improves the representational ability of the feature map. In addition, we design a novel context channel refine block and a context spatial refine block to mine the context information with fewer parameters along the channel and spatial dimensions, respectively. The proposed MSA-Net is able to effectively infer the probability of correspondences being inliers with fewer parameters. Extensive experiments on outlier removal and relative pose estimation show the performance improvements of our network over current state-of-the-art methods with fewer parameters on both outdoor and indoor datasets. Notably, when trained on the YFCC100M dataset, our proposed network achieves an 11.7% improvement over the state-of-the-art method on the relative pose estimation task at an error threshold of 5° without RANSAC.
16
Pei J, Zhou T, Tang H, Liu C, Chen C. FGO-Net: Feature and Gaussian Optimization Network for visual saliency prediction. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03647-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
17
Lyu L, Han R, Chen Z. Cascaded parallel crowd counting network with multi-resolution collaborative representation. APPL INTELL 2022; 53:3002-3016. [PMID: 35607431 PMCID: PMC9117858 DOI: 10.1007/s10489-022-03639-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/14/2022] [Indexed: 01/14/2023]
Abstract
Accurately estimating the size and density distribution of a crowd from images is of great importance to public safety and crowd management during the COVID-19 pandemic, but it is very challenging as it is affected by many complex factors, including perspective distortion and background noise information. In this paper, we propose a novel multi-resolution collaborative representation framework called the cascaded parallel network (CP-Net), consisting of three parallel scale-specific branches connected in a cascading mode. In the framework, the three cascaded multi-resolution branches efficiently capture multi-scale features through their specific receptive fields. Additionally, multi-level feature fusion and information filtering are performed continuously on each branch to resist noise interference and perspective distortion. Moreover, we design an information exchange module across independent branches to refine the features extracted by each specific branch and deal with perspective distortion by using complementary information of multiple resolutions. To further improve the robustness of the network to scale variance and generate high-quality density maps, we construct a multi-receptive field fusion module to aggregate multi-scale features more comprehensively. The performance of our proposed CP-Net is verified on the challenging counting datasets (UCF_CC_50, UCF-QNRF, Shanghai Tech A&B, and WorldExpo'10), and the experimental results demonstrate the superiority of the proposed method.
Affiliation(s)
- Lei Lyu
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358 China
- Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, 250358 China
- Run Han
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358 China
- Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, 250358 China
- Ziming Chen
- Shandong Zhengzhong Information Technology Co., LTD, Jinan, 250014 China
- Shandong Digital Applied Science Research Institute Co.,LTD, Jinan, 250101 China
18
Wu YH, Liu Y, Zhang L, Cheng MM, Ren B. EDN: Salient Object Detection via Extremely-Downsampled Network. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:3125-3136. [PMID: 35412981 DOI: 10.1109/tip.2022.3164550] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/14/2023]
Abstract
Recent progress on salient object detection (SOD) mainly benefits from multi-scale learning, where high-level and low-level features collaborate in locating salient objects and discovering fine details, respectively. However, most efforts are devoted to low-level feature learning by fusing multi-scale features or enhancing boundary representations. High-level features, although they have long proven effective for many other tasks, have barely been studied for SOD. In this paper, we tap into this gap and show that enhancing high-level features is essential for SOD as well. To this end, we introduce an Extremely-Downsampled Network (EDN), which employs an extreme downsampling technique to effectively learn a global view of the whole image, leading to accurate salient object localization. To accomplish better multi-level feature fusion, we construct the Scale-Correlated Pyramid Convolution (SCPC) to build an elegant decoder for recovering object details from the above extreme downsampling. Extensive experiments demonstrate that EDN achieves state-of-the-art performance with real-time speed. Our efficient EDN-Lite also achieves competitive performance at a speed of 316 FPS. Hence, this work is expected to spark some new thinking in SOD. Code is available at https://github.com/yuhuan-wu/EDN.
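The extreme-downsampling idea, pushing features well below the usual deepest backbone resolution to obtain a global view for localization, can be sketched as an extra downsampling stage appended to the backbone. The strides, depth, and residual fusion below are assumptions, not EDN's actual blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExtremeDownsampleHead(nn.Module):
    """Further downsamples the deepest backbone feature to a tiny map, learns a
    global view there, then upsamples it back to re-inject global localization cues."""
    def __init__(self, channels):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.mix = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, deep_feat):                       # e.g. (N, C, H/32, W/32)
        tiny = self.down(deep_feat)                     # much coarser global view
        tiny = torch.relu(self.mix(tiny))
        up = F.interpolate(tiny, size=deep_feat.shape[-2:],
                           mode='bilinear', align_corners=False)
        return deep_feat + up                           # global cues added back

out = ExtremeDownsampleHead(512)(torch.randn(1, 512, 12, 12))
print(out.shape)   # torch.Size([1, 512, 12, 12])
```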
19
Huang Z, Li W, Xia XG, Tao R. A General Gaussian Heatmap Label Assignment for Arbitrary-Oriented Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:1895-1910. [PMID: 35139019 DOI: 10.1109/tip.2022.3148874] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Recently, many arbitrary-oriented object detection (AOOD) methods have been proposed and have attracted widespread attention in many fields. However, most of them are based on anchor boxes or standard Gaussian heatmaps. Such a label assignment strategy may not only fail to reflect the shape and direction characteristics of arbitrary-oriented objects, but also require considerable parameter-tuning effort. In this paper, a novel AOOD method called General Gaussian Heatmap Label Assignment (GGHL) is proposed. Specifically, an anchor-free object-adaptation label assignment (OLA) strategy is presented to define positive candidates based on two-dimensional (2D) oriented Gaussian heatmaps, which reflect the shape and direction features of arbitrary-oriented objects. Based on OLA, an oriented-bounding-box (OBB) representation component (ORC) is developed to indicate OBBs and adjust the Gaussian center prior weights to fit the characteristics of different objects adaptively through neural network learning. Moreover, a joint-optimization loss (JOL) with area normalization and dynamic confidence weighting is designed to refine the misaligned optimization results of the different subtasks. Extensive experiments on public datasets demonstrate that the proposed GGHL improves AOOD performance with low parameter-tuning and time costs. Furthermore, it is generally applicable to most AOOD methods, including lightweight models on embedded platforms, and improves their performance.
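The core of such an assignment is a 2-D Gaussian heatmap elongated and rotated to match an oriented box; a generic generator is sketched below (the variance scaling is an arbitrary choice, not GGHL's exact setting).

```python
import numpy as np

def oriented_gaussian_heatmap(h, w, cx, cy, bw, bh, theta, sigma_scale=6.0):
    """Returns an (h, w) heatmap with a 2-D Gaussian centered at (cx, cy), whose std
    devs are proportional to the box width/height and which is rotated by theta (rad)."""
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # rotate pixel offsets into the box's local frame
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    sx, sy = bw / sigma_scale, bh / sigma_scale
    return np.exp(-0.5 * ((u / sx) ** 2 + (v / sy) ** 2))

heat = oriented_gaussian_heatmap(128, 128, cx=64, cy=64, bw=60, bh=20, theta=np.pi / 6)
print(heat.shape, round(float(heat.max()), 3))   # (128, 128), ~1.0 at the center
```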
20
Sun Y, Li L, Yao T, Lu T, Zheng B, Yan C, Zhang H, Bao Y, Ding G, Slabaugh G. Bidirectional difference locating and semantic consistency reasoning for change captioning. INT J INTELL SYST 2022. [DOI: 10.1002/int.22821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Yaoqi Sun
- School of Automation, College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Liang Li
- Institute of Computing Technology, CAS, Beijing, China
- Tingting Yao
- School of Automation, College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Tongyv Lu
- School of Automation, College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Bolun Zheng
- School of Automation, College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Chenggang Yan
- School of Automation, College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Hua Zhang
- School of Automation, College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Yongjun Bao
- Data and Intelligence Department, JD.com, Beijing, China
- Guiguang Ding
- School of Software, Tsinghua University, Beijing, China
- Gregory Slabaugh
- Digital Environment Research Institute (DERI), Queen Mary University of London, London, UK
22
Yang S, Lin W, Lin G, Jiang Q, Liu Z. Progressive Self-Guided Loss for Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:8426-8438. [PMID: 34606454 DOI: 10.1109/tip.2021.3113794] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
We present a simple yet effective progressive self-guided loss function to facilitate deep learning-based salient object detection (SOD) in images. The saliency maps produced by the most relevant works still suffer from incomplete predictions due to the internal complexity of salient objects. Our proposed progressive self-guided loss simulates a morphological closing operation on the model predictions for progressively creating auxiliary training supervisions to step-wisely guide the training process. We demonstrate that this new loss function can guide the SOD model to highlight more complete salient objects step-by-step and meanwhile help to uncover the spatial dependencies of the salient object pixels in a region growing manner. Moreover, a new feature aggregation module is proposed to capture multi-scale features and aggregate them adaptively by a branch-wise attention mechanism. Benefiting from this module, our SOD framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively. Experimental results on several benchmark datasets show that our loss function not only advances the performance of existing SOD models without architecture modification but also helps our proposed framework to achieve state-of-the-art performance.
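The morphological-closing construction of auxiliary supervision can be approximated with max-pooling. The sketch below is one hedged reading of a single training step: the closed (hole-filled) detached prediction is merged with the ground truth to form a slightly "more complete" auxiliary target; the progressive scheduling and the exact combination rule follow the paper and are simplified here.

```python
import torch
import torch.nn.functional as F

def closing(x, k=9):
    """Grayscale morphological closing: dilation then erosion, both via max-pooling."""
    pad = k // 2
    dilated = F.max_pool2d(x, k, stride=1, padding=pad)
    return -F.max_pool2d(-dilated, k, stride=1, padding=pad)

def self_guided_target(pred, gt, k=9):
    """pred: (N, 1, H, W) predicted probabilities; gt: binary ground-truth masks.
    Returns an auxiliary target that keeps the ground truth and adds the closed prediction."""
    aux = torch.maximum(gt, closing(pred.detach(), k))
    return aux.clamp(0.0, 1.0)

aux = self_guided_target(torch.rand(2, 1, 64, 64),
                         (torch.rand(2, 1, 64, 64) > 0.5).float())
print(aux.shape)   # torch.Size([2, 1, 64, 64])
```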
23
Song D, Dong Y, Li X. Hierarchical Edge Refinement Network for Saliency Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:7567-7577. [PMID: 34464260 DOI: 10.1109/tip.2021.3106798] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
At present, most saliency detection methods are based on fully convolutional neural networks (FCNs). However, FCNs usually blur the edges of salient objects, because their repeated convolution and pooling operations limit the spatial resolution of the feature maps. To alleviate this issue and obtain accurate edges, we propose a hierarchical edge refinement network (HERNet) for accurate saliency detection. In detail, HERNet is mainly composed of a saliency prediction network and an edge preserving network. Firstly, the saliency prediction network, based on a modified U-Net structure, is used to roughly detect the regions of salient objects. Then, the edge preserving network, mainly composed of the atrous spatial pyramid pooling (ASPP) module, is used to accurately detect the edges of salient objects. Different from the previous indiscriminate supervision strategy, we adopt a new one-to-one hierarchical supervision strategy to supervise the different outputs of the entire network. Experimental results on five traditional benchmark datasets demonstrate that the proposed HERNet performs well when compared with the state-of-the-art methods.
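The ASPP block at the heart of the edge preserving network is a standard construct; a compact version is shown below, with illustrative channel counts and dilation rates (the full DeepLab-style block also adds an image-level pooling branch, omitted here).

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convolutions capture
    multi-scale context without reducing spatial resolution."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        feats = [torch.relu(b(x)) for b in self.branches]
        return self.project(torch.cat(feats, dim=1))

out = ASPP(256, 64)(torch.randn(2, 256, 32, 32))
print(out.shape)   # torch.Size([2, 64, 32, 32])
```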
24
Luo H, Han G, Wu X, Liu P, Yang H, Zhang X. LF3Net: Leader-follower feature fusing network for fast saliency detection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
25
Semantic-Guided Attention Refinement Network for Salient Object Detection in Optical Remote Sensing Images. REMOTE SENSING 2021. [DOI: 10.3390/rs13112163] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
Abstract
Although remarkable progress has been made in salient object detection (SOD) in natural scene images (NSI), SOD in optical remote sensing images (RSI) still faces significant challenges due to various spatial resolutions, cluttered backgrounds, and complex imaging conditions, mainly in two respects: (1) the accurate location of salient objects; and (2) the subtle boundaries of salient objects. This paper explores the inherent properties of multi-level features to develop a novel semantic-guided attention refinement network (SARNet) for SOD in optical RSI. Specifically, the proposed semantic guided decoder (SGD) roughly but accurately locates multi-scale objects by aggregating multiple high-level features, and this global semantic information then guides the integration of subsequent features in a step-by-step feedback manner to make full use of deep multi-level features. Simultaneously, the proposed parallel attention fusion (PAF) module combines cross-level features and semantic-guided information to refine the object's boundary and gradually highlight the entire object area. Finally, the proposed network architecture is trained through an end-to-end fully supervised model. Quantitative and qualitative evaluations on two public RSI datasets and additional NSI datasets across five metrics show that our SARNet is superior to 14 state-of-the-art (SOTA) methods without any post-processing.
26
Abstract
Object detection in remote sensing images (RSIs) is one of the basic tasks in the field of remote sensing image automatic interpretation. In recent years, deep object detection frameworks developed for natural scene images (NSIs) have been introduced to object detection on RSIs, and detection performance has improved significantly because of their powerful feature representations. However, there are still many challenges related to the particularities of remote sensing objects. One of the main challenges is the missed detection of small objects, which contain less than five percent of the pixels of the large objects. Generally, existing algorithms deal with this problem through multi-scale feature fusion based on a feature pyramid. However, the benefits of this strategy are limited, because the locations of small objects in the feature map have vanished by the time the detection task is processed at the end of the network. In this study, we propose a subtask attention network (StAN), which handles the detection task directly on a shallow layer of the network. First, StAN contains one shared feature branch and two subtask attention branches (a semantic auxiliary subtask and a detection subtask) based on the multi-task attention network (MTAN). Second, the detection branch uses only low-level features, in consideration of small objects. Third, an attention map guidance mechanism is put forward to optimize the network while keeping its identification ability. Fourth, a multi-dimensional sampling module (MdS), global multi-view channel weights (GMulW), and target-guided pixel attention (TPA) are designed to further improve detection accuracy in complex scenes. Experimental results on the NWPU VHR-10 dataset and the DOTA dataset demonstrate that the proposed algorithm achieved SOTA performance and reduced the missed detection of small objects. Ablation experiments also verified the effects of MdS, GMulW, and TPA.