1. Li H, Chen X, Yang W, Huang J, Sun K, Wang Y, Huang A, Mei L. Global Semantic-Sense Aggregation Network for Salient Object Detection in Remote Sensing Images. Entropy (Basel, Switzerland) 2024; 26:445. PMID: 38920454; PMCID: PMC11203128; DOI: 10.3390/e26060445.
Abstract
Salient object detection (SOD) aims to accurately identify significant geographical objects in remote sensing images (RSI), providing reliable support and guidance for large-scale geographical information analysis and decision making. However, SOD in RSI faces numerous challenges, including shadow interference, inter-class feature confusion, and unclear target edge contours. We therefore designed an effective Global Semantic-aware Aggregation Network (GSANet) to aggregate salient information in RSI. GSANet computes the information entropy of different regions and prioritizes areas with high information entropy as potential target regions, thereby achieving precise localization and semantic understanding of salient objects in remote sensing imagery. Specifically, we proposed a Semantic Detail Embedding Module (SDEM), which explores the potential connections among multi-level features and adaptively fuses shallow texture details with deep semantic features, efficiently aggregating the information entropy of salient regions and enhancing the information content of salient targets. Additionally, we proposed a Semantic Perception Fusion Module (SPFM) to analyze the mapping relationships between contextual information and local details, enhancing the perceptual capability for salient objects while suppressing irrelevant information entropy, thereby addressing the semantic dilution of salient objects during up-sampling. Experimental results on two publicly available datasets, ORSSD and EORSSD, demonstrated the outstanding performance of our method, which achieved 93.91% Sα, 98.36% Eξ, and 89.37% Fβ on the EORSSD dataset.
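The entropy-driven region prioritization described in this abstract can be illustrated with a minimal, hypothetical sketch: compute Shannon entropy over local tiles of a grayscale image and treat high-entropy tiles as candidate salient regions. The tile size, bin count, and function name below are illustrative assumptions, not part of GSANet.

```python
import numpy as np

def local_entropy_map(gray, win=16, bins=32):
    """Shannon entropy of pixel intensities in non-overlapping win x win tiles.

    gray: 2-D float array in [0, 1]. Returns an (H//win, W//win) entropy map;
    tiles with higher entropy are treated as candidate salient regions.
    (Illustrative prior only, not the GSANet implementation.)"""
    h, w = gray.shape
    h, w = h - h % win, w - w % win                      # crop to a multiple of the tile size
    tiles = gray[:h, :w].reshape(h // win, win, w // win, win).swapaxes(1, 2)
    ent = np.empty(tiles.shape[:2])
    for i in range(tiles.shape[0]):
        for j in range(tiles.shape[1]):
            hist, _ = np.histogram(tiles[i, j], bins=bins, range=(0.0, 1.0))
            p = hist / hist.sum()
            p = p[p > 0]
            ent[i, j] = -(p * np.log2(p)).sum()          # 0 for uniform tiles, high for textured ones
    return ent

# toy usage: a flat background with one textured patch
rng = np.random.default_rng(0)
img = np.full((128, 128), 0.5)
img[32:64, 32:64] = rng.random((32, 32))                 # textured (high-entropy) region
print(local_entropy_map(img).round(2))                   # the textured tiles score highest
```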
Affiliation(s)
- Hongli Li: School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China; Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
- Xuhui Chen: School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China; Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
- Wei Yang: School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430064, China
- Jian Huang: School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430064, China
- Kaimin Sun: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
- Ying Wang: School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430064, China
- Andong Huang: School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- Liye Mei: School of Computer Science, Hubei University of Technology, Wuhan 430068, China; The Institute of Technological Sciences, Wuhan University, Wuhan 430072, China
2. Bao L, Zhou X, Lu X, Sun Y, Yin H, Hu Z, Zhang J, Yan C. Quality-Aware Selective Fusion Network for V-D-T Salient Object Detection. IEEE Transactions on Image Processing 2024; 33:3212-3226. PMID: 38687650; DOI: 10.1109/tip.2024.3393365.
Abstract
Depth images and thermal images contain spatial geometry information and surface temperature information, respectively, which can act as complementary cues for the RGB modality. However, the quality of depth and thermal images is often unreliable in challenging scenarios, which degrades the performance of two-modal salient object detection (SOD). Meanwhile, some researchers have turned to the triple-modal SOD task, namely visible-depth-thermal (VDT) SOD, attempting to exploit the complementarity of the RGB, depth, and thermal images. However, existing triple-modal SOD methods fail to perceive the quality of depth maps and thermal images, which leads to performance degradation on scenes with low-quality depth and thermal inputs. Therefore, in this paper, we propose a quality-aware selective fusion network (QSF-Net) for VDT salient object detection, which contains three subnets: an initial feature extraction subnet, a quality-aware region selection subnet, and a region-guided selective fusion subnet. First, in addition to extracting features, the initial feature extraction subnet generates a preliminary prediction map from each modality via a shrinkage pyramid architecture equipped with a multi-scale fusion (MSF) module. Then, we design a weakly supervised quality-aware region selection subnet to generate quality-aware maps. Concretely, we first identify high-quality and low-quality regions from the preliminary predictions; these regions constitute the pseudo labels used to train this subnet. Finally, the region-guided selective fusion subnet purifies the initial features under the guidance of the quality-aware maps, then fuses the triple-modal features and refines the edge details of the prediction maps through the intra-modality and inter-modality attention (IIA) module and the edge refinement (ER) module, respectively. Extensive experiments on the VDT-2048 dataset show that our saliency model consistently outperforms 13 state-of-the-art methods by a large margin. Our code and results are available at https://github.com/Lx-Bao/QSFNet.
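As a rough illustration of the quality-aware selection idea (not QSF-Net's actual subnets), the sketch below predicts a per-pixel quality map for the depth and thermal branches and uses it to gate their contribution before fusion. The channel width and the 1x1-convolution quality heads are assumptions made for the example.

```python
import torch
import torch.nn as nn

class QualityWeightedFusion(nn.Module):
    """Toy quality-aware fusion: predict a per-pixel quality map for the depth
    and thermal streams and use it to gate their contribution to the RGB stream.
    Hypothetical module, not the paper's region-guided selective fusion subnet."""

    def __init__(self, c=64):
        super().__init__()
        self.q_depth = nn.Sequential(nn.Conv2d(c, 1, 1), nn.Sigmoid())
        self.q_thermal = nn.Sequential(nn.Conv2d(c, 1, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(3 * c, c, 3, padding=1)

    def forward(self, f_rgb, f_d, f_t):
        wd = self.q_depth(f_d)            # (B,1,H,W) in [0,1]: estimated depth quality
        wt = self.q_thermal(f_t)          # (B,1,H,W): estimated thermal quality
        fused = torch.cat([f_rgb, wd * f_d, wt * f_t], dim=1)
        return self.fuse(fused)

# toy usage with random feature maps
f_rgb, f_d, f_t = (torch.randn(2, 64, 56, 56) for _ in range(3))
print(QualityWeightedFusion()(f_rgb, f_d, f_t).shape)    # torch.Size([2, 64, 56, 56])
```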
3. Lv C, Wan B, Zhou X, Sun Y, Zhang J, Yan C. Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection. Entropy (Basel, Switzerland) 2024; 26:130. PMID: 38392385; PMCID: PMC10888287; DOI: 10.3390/e26020130.
Abstract
RGB-T salient object detection (SOD) has made significant progress in recent years. However, most existing works rely on heavy models, which are not suitable for mobile devices, and there is still room for improvement in the design of cross-modal and cross-level feature fusion. To address these issues, we propose a lightweight cross-modal information mutual reinforcement network for RGB-T SOD. Our network consists of a lightweight encoder, a cross-modal information mutual reinforcement (CMIMR) module, and a semantic-information-guided fusion (SIGF) module. To reduce the computational cost and the number of parameters, we employ lightweight modules in both the encoder and the decoder. Furthermore, to fuse the complementary information between the two modalities, we design the CMIMR module to enhance the two-modal features; it effectively refines them by absorbing previous-level semantic information and inter-modal complementary information. In addition, to fuse cross-level features and detect multiscale salient objects, we design the SIGF module, which effectively suppresses background noise in low-level features and extracts multiscale information. We conduct extensive experiments on three RGB-T datasets, and our method achieves competitive performance compared with 15 other state-of-the-art methods.
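A minimal sketch of cross-modal mutual reinforcement, under stated assumptions: each modality is re-weighted by channel attention derived from the other, and a depthwise-separable convolution keeps the block lightweight. The widths and the attention form are illustrative choices, not the paper's CMIMR design.

```python
import torch
import torch.nn as nn

class CrossModalMutualReinforce(nn.Module):
    """Toy mutual-reinforcement block for RGB and thermal features. Each stream is
    re-weighted by a channel-attention vector computed from the other stream, then
    refined with a depthwise-separable conv to keep the parameter count small."""

    def __init__(self, c=32):
        super().__init__()
        self.att_from_t = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.att_from_rgb = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                          nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.refine = nn.Sequential(nn.Conv2d(c, c, 3, padding=1, groups=c),  # depthwise
                                    nn.Conv2d(c, c, 1), nn.ReLU(inplace=True))

    def forward(self, f_rgb, f_t):
        f_rgb = f_rgb * self.att_from_t(f_t)      # thermal guides RGB channels
        f_t = f_t * self.att_from_rgb(f_rgb)      # and vice versa
        return self.refine(f_rgb + f_t)

out = CrossModalMutualReinforce()(torch.randn(1, 32, 40, 40), torch.randn(1, 32, 40, 40))
print(out.shape)                                  # torch.Size([1, 32, 40, 40])
```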
Affiliation(s)
- Chengtao Lv: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Bin Wan: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Xiaofei Zhou: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Yaoqi Sun: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; Lishui Institute, Hangzhou Dianzi University, Lishui 323000, China
- Jiyong Zhang: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Chenggang Yan: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
4. Yang X, Xiao S, Zhang H, Xu L, Wu L, Zhang J, Zhang Y. PE-RASP: range image stitching of photon-efficient imaging through reconstruction, alignment, stitching integration network based on intensity image priors. Optics Express 2024; 32:2817-2838. PMID: 38297801; DOI: 10.1364/oe.514027.
Abstract
Single-photon imaging integrates advanced single-photon detection technology with laser radar (LiDAR) technology, offering heightened sensitivity and precise time measurement. This approach finds extensive application in biological imaging, remote sensing, and non-line-of-sight imaging. Nevertheless, current single-photon LiDAR systems encounter challenges such as low spatial resolution and a limited field of view in their intensity and range images, owing to constraints in the imaging detector hardware. To overcome these challenges, this study introduces a novel deep learning image-stitching algorithm tailored for single-photon imaging. Leveraging the robust feature extraction capabilities of neural networks and the richer feature information present in intensity images, the algorithm stitches range images based on intensity image priors. This approach significantly enhances the spatial resolution and imaging range of single-photon LiDAR systems. Simulation and experimental results demonstrate that the proposed method generates high-quality stitched single-photon intensity images, and that range images stitched with prior information from the intensity images exhibit comparably high quality.
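To make the intensity-prior idea concrete, here is a hypothetical classical stand-in (not the learned PE-RASP pipeline): estimate the alignment between two views from their feature-rich intensity images with ORB matching and a RANSAC homography, then apply the same warp to the corresponding range images. The function name and the simple side-by-side canvas are assumptions for the sketch.

```python
import cv2
import numpy as np

def stitch_range_by_intensity(int_a, int_b, rng_a, rng_b):
    """Align view B to view A using their intensity images, then warp B's range image.

    int_a, int_b: 8-bit grayscale intensity images; rng_a, rng_b: float32 range images
    of the same size. Assumes the views overlap enough for feature matching."""
    orb = cv2.ORB_create(1000)
    ka, da = orb.detectAndCompute(int_a, None)
    kb, db = orb.detectAndCompute(int_b, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(da, db)
    src = np.float32([kb[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)   # points in B
    dst = np.float32([ka[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)   # points in A
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # maps B coords into A's frame

    h, w = int_a.shape[:2]
    canvas = (2 * w, h)                                    # simple side-by-side output canvas
    warped_rng = cv2.warpPerspective(rng_b, H, canvas)     # range warped by the intensity-derived H
    warped_rng[:h, :w] = np.where(rng_a > 0, rng_a, warped_rng[:h, :w])   # keep A where it is valid
    return warped_rng
```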
5. Xu R, Wang C, Zhang J, Xu S, Meng W, Zhang X. RSSFormer: Foreground Saliency Enhancement for Remote Sensing Land-Cover Segmentation. IEEE Transactions on Image Processing 2023; 32:1052-1064. PMID: 37022079; DOI: 10.1109/tip.2023.3238648.
Abstract
High spatial resolution (HSR) remote sensing images contain complex foreground-background relationships, which makes remote sensing land-cover segmentation a distinctive semantic segmentation task. The main challenges come from large scale variation, complex background samples, and an imbalanced foreground-background distribution. These issues make recent context modeling methods sub-optimal due to their lack of foreground saliency modeling. To handle these problems, we propose a Remote Sensing Segmentation framework (RSSFormer), comprising an Adaptive Transformer Fusion Module, a Detail-aware Attention Layer, and a Foreground Saliency Guided Loss. Specifically, from the perspective of relation-based foreground saliency modeling, our Adaptive Transformer Fusion Module adaptively suppresses background noise and enhances object saliency when fusing multi-scale features. Our Detail-aware Attention Layer then extracts detail and foreground-related information via the interplay of spatial and channel attention, further enhancing foreground saliency. From the perspective of optimization-based foreground saliency modeling, our Foreground Saliency Guided Loss guides the network to focus on hard samples with low foreground saliency responses, achieving balanced optimization. Experimental results on the LoveDA, Vaihingen, Potsdam, and iSAID datasets validate that our method outperforms existing general semantic segmentation and remote sensing segmentation methods, and achieves a good compromise between computational overhead and accuracy. Our code is available at https://github.com/Rongtao-Xu/RepresentationLearning/tree/main/RSSFormer-TIP2023.
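One plausible reading of a "foreground saliency guided" objective, sketched under assumptions: foreground pixels with a low predicted response receive extra weight in a binary cross-entropy loss, pushing optimization toward hard foreground regions. The focal-style (1 - p)^gamma weighting is borrowed for illustration and is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def foreground_weighted_bce(logits, target, gamma=2.0):
    """BCE where hard foreground pixels (low predicted response) get extra weight.
    Illustrative loss only; not the RSSFormer Foreground Saliency Guided Loss."""
    p = torch.sigmoid(logits).detach()                       # predicted foreground response
    weight = torch.where(target > 0.5,                       # foreground pixels only
                         (1.0 - p).pow(gamma) + 1.0,         # harder pixel -> larger weight
                         torch.ones_like(p))                 # background keeps weight 1
    return F.binary_cross_entropy_with_logits(logits, target, weight=weight)

# toy usage with a sparse foreground mask
logits = torch.randn(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.8).float()
print(foreground_weighted_bce(logits, target).item())
```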
6. Song M, Song W, Yang G, Chen C. Improving RGB-D Salient Object Detection via Modality-Aware Decoder. IEEE Transactions on Image Processing 2022; 31:6124-6138. PMID: 36112559; DOI: 10.1109/tip.2022.3205747.
Abstract
Most existing RGB-D salient object detection (SOD) methods focus primarily on cross-modal and cross-level saliency fusion, which has proved efficient and effective. However, these methods still have a critical limitation: their fusion patterns, typically selective combination and its variations, depend too heavily on the network's non-linear adaptability. In such methods, the balance between RGB and D (depth) is formulated individually for intermediate feature slices, so the relation at the modality level may not be learned properly. The optimal RGB-D combination differs across scenarios, and the exact complementary status is frequently determined by multiple modality-level factors, such as depth quality, the complexity of the RGB scene, and the degree of harmony between them. It may therefore be difficult for existing approaches, which are comparatively insensitive to modality-level cues, to achieve further performance breakthroughs. To address this problem, this paper presents the Modality-aware Decoder (MaD). The critical technical innovations include a series of feature embedding, modality reasoning, and feature back-projecting and collecting strategies, all of which upgrade the widely used multi-scale, multi-level decoding process to be modality-aware. Our MaD achieves competitive performance over other state-of-the-art (SOTA) models without any fancy tricks in the decoder design. Codes and results will be publicly available at https://github.com/MengkeSong/MaD.
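A toy sketch of modality-level (rather than feature-slice-level) rebalancing, under assumptions that go beyond the abstract: pool each stream globally, predict one scalar weight per modality per image, and reweight the RGB and depth streams before decoding. MaD's actual embedding, reasoning, and back-projection pipeline is considerably more elaborate.

```python
import torch
import torch.nn as nn

class ModalityGate(nn.Module):
    """Toy modality-level gate: one scalar weight per modality per image,
    predicted from globally pooled features. Hypothetical illustration only."""

    def __init__(self, c=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * c, c), nn.ReLU(inplace=True),
                                 nn.Linear(c, 2))

    def forward(self, f_rgb, f_d):
        g = torch.cat([f_rgb.mean(dim=(2, 3)), f_d.mean(dim=(2, 3))], dim=1)  # (B, 2c) global context
        w = torch.softmax(self.mlp(g), dim=1)                                 # (B, 2) modality weights
        return (w[:, 0, None, None, None] * f_rgb +
                w[:, 1, None, None, None] * f_d)

fused = ModalityGate()(torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28))
print(fused.shape)                                                            # torch.Size([2, 64, 28, 28])
```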