1
Yue G, Zhuo G, Yan W, Zhou T, Tang C, Yang P, Wang T. Boundary uncertainty aware network for automated polyp segmentation. Neural Netw 2024; 170:390-404. [PMID: 38029720 DOI: 10.1016/j.neunet.2023.11.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 07/15/2023] [Accepted: 11/22/2023] [Indexed: 12/01/2023]
Abstract
Recently, leveraging deep neural networks for automated colorectal polyp segmentation has emerged as a hot topic because it avoids the limitations of visual inspection, e.g., overwork and subjectivity. However, most existing methods do not pay enough attention to the uncertain areas of colonoscopy images and often provide unsatisfactory segmentation performance. In this paper, we propose a novel boundary uncertainty aware network (BUNet) for precise and robust colorectal polyp segmentation. Specifically, considering that polyps vary greatly in size and shape, we first adopt a pyramid vision transformer encoder to learn multi-scale feature representations. Then, a simple yet effective boundary exploration module (BEM) is proposed to explore boundary cues from the low-level features. To make the network focus on the ambiguous area where the prediction score is biased toward neither the foreground nor the background, we further introduce a boundary uncertainty aware module (BUM) that explores error-prone regions from the high-level features with the assistance of boundary cues provided by the BEM. Through top-down hybrid deep supervision, our BUNet implements coarse-to-fine polyp segmentation and finally localizes polyp regions precisely. Extensive experiments on five public datasets show that BUNet is superior to thirteen competing methods in terms of both effectiveness and generalization ability.
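The notion of an "ambiguous area where the prediction score is biased to neither the foreground nor the background" can be illustrated with a small sketch. This is not the BUM described in the abstract; it only assumes that each pixel carries a foreground probability in [0, 1] and scores ambiguity as the closeness of that probability to 0.5. The function names and the 0.5 threshold are hypothetical.

```python
def uncertainty_map(prob):
    """Score each pixel by how close its foreground probability is to 0.5.

    Returns 1.0 for a fully ambiguous pixel (p = 0.5) and 0.0 for a
    confident one (p = 0 or p = 1). Illustrative assumption, not BUNet's BUM.
    """
    return [[1.0 - abs(2.0 * p - 1.0) for p in row] for row in prob]


def ambiguous_mask(prob, threshold=0.5):
    """Mark pixels whose uncertainty score reaches a (hypothetical) threshold."""
    return [[u >= threshold for u in row] for row in uncertainty_map(prob)]


if __name__ == "__main__":
    prediction = [[0.1, 0.4, 0.9]]
    print(uncertainty_map(prediction))
    print(ambiguous_mask(prediction))
```

In this sketch only the middle pixel (probability 0.4) would be flagged as ambiguous; a real network would learn where to focus rather than apply a fixed rule.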
Affiliation(s)
- Guanghui Yue
  - National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
- Guibin Zhuo
  - National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
- Weiqing Yan
  - School of Computer and Control Engineering, Yantai University, Yantai 264005, China
- Tianwei Zhou
  - College of Management, Shenzhen University, Shenzhen 518060, China
- Chang Tang
  - School of Computer Science, China University of Geosciences, Wuhan 430074, China
- Peng Yang
  - National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
- Tianfu Wang
  - National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
2
Xu K, Guo J. RGB-D salient object detection via convolutional capsule network based on feature extraction and integration. Sci Rep 2023; 13:17652. [PMID: 37848501 PMCID: PMC10582015 DOI: 10.1038/s41598-023-44698-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 10/11/2023] [Indexed: 10/19/2023] Open
Abstract
Fully convolutional neural networks have shown advantages in salient object detection using RGB or RGB-D images. However, there is an object-part dilemma, since most fully convolutional neural networks inevitably produce an incomplete segmentation of the salient object. Although the capsule network is capable of recognizing a complete object, it is computationally demanding and time-consuming. In this paper, we propose a novel convolutional capsule network based on feature extraction and integration for dealing with the object-part relationship with less computational demand. First, RGB features are extracted and integrated by using the VGG backbone and a feature extraction module. Then, these features, integrated with depth images by a feature depth module, are upsampled progressively to produce a feature map. Next, the feature map is fed into the feature-integrated convolutional capsule network to explore the object-part relationship. The proposed capsule network extracts object-part information by using convolutional capsules with locally-connected routing and predicts the final saliency map based on the deconvolutional capsules. Experimental results on four RGB-D benchmark datasets show that our proposed method outperforms 23 state-of-the-art algorithms.
Affiliation(s)
- Kun Xu
  - School of Electrical and Information Engineering, Tianjin University, Tianjin, 300000, People's Republic of China
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250014, People's Republic of China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Jichang Guo
  - School of Electrical and Information Engineering, Tianjin University, Tianjin, 300000, People's Republic of China
3
Zhang Y, Chen F, Peng Z, Zou W, Zhang C. Exploring Focus and Depth-Induced Saliency Detection for Light Field. Entropy (Basel, Switzerland) 2023; 25:1336. [PMID: 37761635 PMCID: PMC10530224 DOI: 10.3390/e25091336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 08/30/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023]
Abstract
An abundance of features in the light field has been demonstrated to be useful for saliency detection in complex scenes. However, bottom-up saliency detection models are limited in their ability to explore light field features. In this paper, we propose a light field saliency detection method that focuses on depth-induced saliency, which can more deeply explore the interactions between different cues. First, we localize a rough saliency region based on the compactness of color and depth. Then, the relationships among depth, focus, and salient objects are carefully investigated, and the focus cue of the focal stack is used to highlight the foreground objects. Meanwhile, the depth cue is utilized to refine the coarse salient objects. Furthermore, considering the consistency of color smoothing and depth space, an optimization model referred to as color and depth-induced cellular automata is improved to increase the accuracy of saliency maps. Finally, to avoid interference of redundant information, the mean absolute error is chosen as the indicator of the filter to obtain the best results. The experimental results on three public light field datasets show that the proposed method performs favorably against the state-of-the-art conventional light field saliency detection approaches and even light field saliency detection approaches based on deep learning.
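The mean absolute error (MAE) mentioned as the filter indicator is the average per-pixel absolute difference between two maps. The sketch below shows that computation and one possible way to pick the candidate with the lowest MAE. The abstract does not say which map serves as the reference, so the `reference` argument and the `pick_best` helper are assumptions made for illustration only.

```python
def mean_absolute_error(saliency, reference):
    """Average absolute per-pixel difference between two equally sized maps."""
    total = 0.0
    count = 0
    for row_s, row_r in zip(saliency, reference):
        for s, r in zip(row_s, row_r):
            total += abs(s - r)
            count += 1
    return total / count


def pick_best(candidates, reference):
    """Return the index of the candidate map with the lowest MAE.

    Hypothetical selection rule; the reference map is an assumption.
    """
    return min(
        range(len(candidates)),
        key=lambda i: mean_absolute_error(candidates[i], reference),
    )


if __name__ == "__main__":
    ref = [[0, 1]]
    maps = [[[1, 1]], [[0, 1]]]
    print(pick_best(maps, ref))
```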
Affiliation(s)
- Yani Zhang
  - School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
- Fen Chen
  - School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
  - Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China
- Zongju Peng
  - School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
  - Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China
- Wenhui Zou
  - Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China
- Changhe Zhang
  - School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
4
Cong R, Yang N, Li C, Fu H, Zhao Y, Huang Q, Kwong S. Global-and-Local Collaborative Learning for Co-Salient Object Detection. IEEE Transactions on Cybernetics 2023; 53:1920-1931. [PMID: 35867373 DOI: 10.1109/tcyb.2022.3169431] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images. Therefore, how to effectively extract interimage correspondence is crucial for the CoSOD task. In this article, we propose a global-and-local collaborative learning (GLNet) architecture, which includes a global correspondence modeling (GCM) and a local correspondence modeling (LCM) to capture the comprehensive interimage corresponding relationship among different images from the global and local perspectives. First, we treat different images as different time slices and use 3-D convolution to integrate all intrafeatures intuitively, which can more fully extract the global group semantics. Second, we design a pairwise correlation transformation (PCT) to explore similarity correspondence between pairwise images and combine the multiple local pairwise correspondences to generate the local interimage relationship. Third, the interimage relationships of the GCM and LCM are integrated through a global-and-local correspondence aggregation (GLA) module to explore more comprehensive interimage collaboration cues. Finally, the intra and inter features are adaptively integrated by an intra-and-inter weighting fusion (AEWF) module to learn co-saliency features and predict the co-saliency map. The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms 11 state-of-the-art competitors trained on some large datasets (about 8k-200k images).
5
Piao Y, Jiang Y, Zhang M, Wang J, Lu H. PANet: Patch-Aware Network for Light Field Salient Object Detection. IEEE Transactions on Cybernetics 2023; 53:379-391. [PMID: 34406954 DOI: 10.1109/tcyb.2021.3095512] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/13/2023]
Abstract
Most existing light field saliency detection methods have achieved great success by exploiting the unique light field data (focus information in focal slices). However, they process light field data in a slicewise way, leading to suboptimal results because the relative contribution of different regions in focal slices is ignored. How can we comprehensively explore and integrate focused saliency regions that would positively contribute to accurate saliency detection? Answering this question inspires us to develop a new insight. In this article, we propose a patch-aware network to explore light field data in a regionwise way. First, we excavate focused salient regions with a proposed multisource learning module (MSLM), which generates a filtering strategy for integration followed by three guidances based on saliency, boundary, and position. Second, we design a sharpness recognition module (SRM) to refine and update this strategy and perform feature integration. With our proposed MSLM and SRM, we can obtain more accurate and complete saliency maps. Comprehensive experiments on three benchmark datasets show that our proposed method achieves competitive performance against 2-D, 3-D, and 4-D salient object detection methods. The code and results of our method are available at https://github.com/OIPLab-DUT/IEEE-TCYB-PANet.
6
Few-shot learning-based RGB-D salient object detection: A case study. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
7
Zong G, Wei L, Guo S, Wang Y. A cascaded refined RGB-D salient object detection network based on the attention mechanism. Appl Intell 2022. [DOI: 10.1007/s10489-022-04186-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
8
Learning the cross-modal discriminative feature representation for RGB-T crowd counting. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
9
FCMNet: Frequency-aware cross-modality attention networks for RGB-D salient object detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
10
Xu Y, Yu X, Zhang J, Zhu L, Wang D. Weakly Supervised RGB-D Salient Object Detection With Prediction Consistency Training and Active Scribble Boosting. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2022; 31:2148-2161. [PMID: 35196231 DOI: 10.1109/tip.2022.3151999] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
RGB-D salient object detection (SOD) has attracted increasingly more attention as it shows more robust results in complex scenes compared with RGB SOD. However, state-of-the-art RGB-D SOD approaches heavily rely on a large amount of pixel-wise annotated data for training. Such densely labeled annotations are often labor-intensive and costly. To reduce the annotation burden, we investigate RGB-D SOD from a weakly supervised perspective. More specifically, we use annotator-friendly scribble annotations as supervision signals for model training. Since scribble annotations are much sparser compared to ground-truth masks, some critical object structure information might be neglected. To preserve such structure information, we explicitly exploit the complementary edge information from two modalities (i.e., RGB and depth). Specifically, we leverage the dual-modal edge guidance and introduce a new network architecture with a dual-edge detection module and a modality-aware feature fusion module. In order to use the useful information of unlabeled pixels, we introduce a prediction consistency training scheme by comparing the predictions of two networks optimized by different strategies. Moreover, we develop an active scribble boosting strategy to provide extra supervision signals with negligible annotation cost, leading to significant SOD performance improvement. Extensive experiments on seven benchmarks validate the superiority of our proposed method. Remarkably, the proposed method with scribble annotations achieves competitive performance in comparison to fully supervised state-of-the-art methods.
11
Wang X, Zhu L, Tang S, Fu H, Li P, Wu F, Yang Y, Zhuang Y. Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2022; 31:1107-1119. [PMID: 34990359 DOI: 10.1109/tip.2021.3139232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images. However, RGB-D data is not easily acquired, which limits the development of RGB-D SOD techniques. To alleviate this issue, we present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection. We first devise a depth decoupling convolutional neural network (DDCNN), which contains a depth estimation branch and a saliency detection branch. The depth estimation branch is trained with RGB-D images and then used to estimate the pseudo depth maps for all unlabeled RGB images to form the paired data. The saliency detection branch is used to fuse the RGB feature and depth feature to predict the RGB-D saliency. Then, the whole DDCNN is assigned as the backbone in a teacher-student framework for semi-supervised learning. Moreover, we also introduce a consistency loss on the intermediate attention and saliency maps for the unlabeled data, as well as a supervised depth and saliency loss for labeled data. Experimental results on seven widely-used benchmark datasets demonstrate that our DDCNN outperforms state-of-the-art methods both quantitatively and qualitatively. We also demonstrate that our semi-supervised DS-Net can further improve the performance, even when using an RGB image with the pseudo depth map.
12
Unsupervised RGB-T saliency detection by node classification distance and sparse constrained graph learning. Appl Intell 2022. [DOI: 10.1007/s10489-021-02434-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
13
14
Zhai Y, Fan DP, Yang J, Borji A, Shao L, Han J, Wang L. Bifurcated Backbone Strategy for RGB-D Salient Object Detection. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2021; 30:8727-8742. [PMID: 34613915 DOI: 10.1109/tip.2021.3116793] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 06/13/2023]
Abstract
Multi-level feature fusion is a fundamental topic in computer vision. It has been exploited to detect, segment and classify objects at various scales. When multi-level features meet multi-modal cues, the optimal feature aggregation and multi-modal learning strategy become a hot potato. In this paper, we leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel Bifurcated Backbone Strategy Network (BBS-Net). Our architecture is simple, efficient, and backbone-independent. In particular, first, we propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS). Second, we introduce a depth-enhanced module (DEM) to excavate informative depth cues from the channel and spatial views. Then, RGB and depth modalities are fused in a complementary way. Extensive experiments show that BBS-Net significantly outperforms 18 state-of-the-art (SOTA) models on eight challenging datasets under five evaluation measures, demonstrating the superiority of our approach (~4% improvement in S-measure vs. the top-ranked model: DMRA). In addition, we provide a comprehensive analysis on the generalization ability of different RGB-D datasets and provide a powerful training set for future research. The complete algorithm, benchmark results, and post-processing toolbox are publicly available at https://github.com/zyjwuyan/BBS-Net.
15
Yang S, Lin W, Lin G, Jiang Q, Liu Z. Progressive Self-Guided Loss for Salient Object Detection. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2021; 30:8426-8438. [PMID: 34606454 DOI: 10.1109/tip.2021.3113794] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
We present a simple yet effective progressive self-guided loss function to facilitate deep learning-based salient object detection (SOD) in images. The saliency maps produced by the most relevant works still suffer from incomplete predictions due to the internal complexity of salient objects. Our proposed progressive self-guided loss simulates a morphological closing operation on the model predictions for progressively creating auxiliary training supervisions to step-wisely guide the training process. We demonstrate that this new loss function can guide the SOD model to highlight more complete salient objects step-by-step and meanwhile help to uncover the spatial dependencies of the salient object pixels in a region growing manner. Moreover, a new feature aggregation module is proposed to capture multi-scale features and aggregate them adaptively by a branch-wise attention mechanism. Benefiting from this module, our SOD framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively. Experimental results on several benchmark datasets show that our loss function not only advances the performance of existing SOD models without architecture modification but also helps our proposed framework to achieve state-of-the-art performance.
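A morphological closing, the operation this loss is said to simulate, is a dilation followed by an erosion; on a binary mask it fills small holes and gaps while leaving the rest of the shape unchanged. The sketch below applies a 3x3 closing to a binarized prediction. It illustrates the operation only and is not the authors' loss: the 3x3 window, the binary input, and the choice to ignore out-of-bounds neighbours at the border are all assumptions.

```python
def _window(mask, r, c):
    """Values in the 3x3 neighbourhood of (r, c), clipped to the image bounds."""
    h, w = len(mask), len(mask[0])
    return [
        mask[i][j]
        for i in range(max(0, r - 1), min(h, r + 2))
        for j in range(max(0, c - 1), min(w, c + 2))
    ]


def dilate(mask):
    """Binary dilation: a pixel becomes 1 if any neighbour is 1."""
    return [[max(_window(mask, r, c)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask):
    """Binary erosion: a pixel stays 1 only if all in-bounds neighbours are 1."""
    return [[min(_window(mask, r, c)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def closing(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask))


if __name__ == "__main__":
    holed = [[1, 1, 1],
             [1, 0, 1],
             [1, 1, 1]]
    for row in closing(holed):
        print(row)
```

Applied to an incomplete prediction, the closed mask covers the hole that the original missed, which matches the abstract's description of guiding the model toward more complete salient objects.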
16
Song D, Dong Y, Li X. Hierarchical Edge Refinement Network for Saliency Detection. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2021; 30:7567-7577. [PMID: 34464260 DOI: 10.1109/tip.2021.3106798] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
At present, most saliency detection methods are based on fully convolutional neural networks (FCNs). However, FCNs usually blur the edges of salient objects, because their multiple convolution and pooling operations limit the spatial resolution of the feature maps. To alleviate this issue and obtain accurate edges, we propose a hierarchical edge refinement network (HERNet) for accurate saliency detection. In detail, the HERNet is mainly composed of a saliency prediction network and an edge preserving network. First, the saliency prediction network, which is based on a modified U-Net structure, is used to roughly detect the regions of salient objects. Then, the edge preserving network is used to accurately detect the edges of salient objects, and this network is mainly composed of the atrous spatial pyramid pooling (ASPP) module. Different from the previous indiscriminate supervision strategy, we adopt a new one-to-one hierarchical supervision strategy to supervise the different outputs of the entire network. Experimental results on five traditional benchmark datasets demonstrate that the proposed HERNet performs well when compared with the state-of-the-art methods.
17
Zhu X, Li Y, Fu H, Fan X, Shi Y, Lei J. RGB-D salient object detection via cross-modal joint feature extraction and low-bound fusion loss. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.05.110] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/23/2022]
18
Chen Z, Cong R, Xu Q, Huang Q. DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2021; 30:7012-7024. [PMID: 33141667 DOI: 10.1109/tip.2020.3028289] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 06/11/2023]
Abstract
There are two main issues in RGB-D salient object detection: (1) how to effectively integrate the complementarity from the cross-modal RGB-D data; (2) how to prevent the contamination effect from the unreliable depth map. In fact, these two problems are linked and intertwined, but previous methods tend to focus only on the first problem and ignore depth map quality, which may cause the model to fall into a sub-optimal state. In this paper, we address these two issues in a holistic model synergistically, and propose a novel network named DPANet to explicitly model the potentiality of the depth map and effectively integrate the cross-modal complementarity. By introducing depth potentiality perception, the network can perceive the potentiality of depth information in a learning-based manner and guide the fusion process of the two modalities to prevent contamination. The gated multi-modality attention module in the fusion process exploits the attention mechanism with a gate controller to capture long-range dependencies from a cross-modal perspective. Experimental results compared with 16 state-of-the-art methods on 8 datasets demonstrate the validity of the proposed approach both quantitatively and qualitatively. https://github.com/JosephChenHub/DPANet.
19
Luo H, Han G, Wu X, Liu P, Yang H, Zhang X. LF3Net: Leader-follower feature fusing network for fast saliency detection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
20
Zou W, Zhuo S, Tang Y, Tian S, Li X, Xu C. STA3D: Spatiotemporally attentive 3D network for video saliency prediction. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.04.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
21
Stereo superpixel: An iterative framework based on parallax consistency and collaborative optimization. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.12.031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
22
Fu K, Fan DP, Ji GP, Zhao Q, Shen J, Zhu C. Siamese Network for RGB-D Salient Object Detection and Beyond. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; PP:1-1. [PMID: 33861691 DOI: 10.1109/tpami.2021.3073689] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/12/2023]
Abstract
Existing RGB-D salient object detection (SOD) models usually treat RGB and depth as independent information and design separate networks for feature extraction from each. Such schemes can easily be constrained by a limited amount of training data or over-reliance on an elaborately designed training process. Inspired by the observation that RGB and depth modalities actually present certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as the Siamese architecture. In this paper, we propose two effective components: joint learning (JL), and densely cooperative fusion (DCF). The JL module provides robust saliency feature learning by exploiting cross-modal commonality via a Siamese network, while the DCF module is introduced for complementary feature discovery. Comprehensive experiments using 5 popular metrics show that the designed framework yields a robust RGB-D saliency detector with good generalization. As a result, JL-DCF significantly advances the SOTAs by an average of ~2.0% (F-measure) across 7 challenging datasets. In addition, we show that JL-DCF is readily applicable to other related multi-modal detection tasks, including RGB-T SOD and video SOD, achieving comparable or better performance.
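The F-measure used to report the improvement above combines precision and recall as F = (1 + β²)·P·R / (β²·P + R). The sketch below computes it for a single saliency map against a binary ground truth. The fixed 0.5 binarization threshold and β² = 0.3 are assumptions (β² = 0.3 is a common choice in the saliency literature, but this abstract does not state the setting used), and the function name is hypothetical.

```python
def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """Weighted F-measure of a thresholded saliency map against a binary mask.

    threshold and beta_sq are assumed defaults, not values from the paper.
    """
    tp = fp = fn = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            predicted = p >= threshold
            if predicted and g:
                tp += 1          # salient pixel correctly detected
            elif predicted and not g:
                fp += 1          # background pixel marked salient
            elif not predicted and g:
                fn += 1          # salient pixel missed
    if tp == 0:
        return 0.0               # no true positives: precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


if __name__ == "__main__":
    print(f_measure([[0.9, 0.9, 0.1, 0.1]], [[1, 0, 1, 0]]))
```

With β² below 1, precision is weighted more heavily than recall, which is why the value of β² matters when comparing reported numbers across papers.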
23
Wang X, Li S, Chen C, Hao A, Qin H. Depth quality-aware selective saliency fusion for RGB-D image salient object detection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.071] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 10/22/2022]
24
Li G, Liu Z, Chen M, Bai Z, Lin W, Ling H. Hierarchical Alternate Interaction Network for RGB-D Salient Object Detection. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2021; 30:3528-3542. [PMID: 33667161 DOI: 10.1109/tip.2021.3062689] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 06/12/2023]
Abstract
Existing RGB-D Salient Object Detection (SOD) methods take advantage of depth cues to improve the detection accuracy, but pay insufficient attention to the quality of depth information. In practice, a depth map is often of uneven quality and sometimes suffers from distractors, due to various factors in the acquisition procedure. In this article, to mitigate distractors in depth maps and highlight salient objects in RGB images, we propose a Hierarchical Alternate Interactions Network (HAINet) for RGB-D SOD. Specifically, HAINet consists of three key stages: feature encoding, cross-modal alternate interaction, and saliency reasoning. The main innovation in HAINet is the Hierarchical Alternate Interaction Module (HAIM), which plays a key role in the second stage for cross-modal feature interaction. HAIM first uses RGB features to filter distractors in depth features, and then the purified depth features are exploited to enhance RGB features in turn. The alternate RGB-depth-RGB interaction proceeds in a hierarchical manner, which progressively integrates local and global contexts within a single feature scale. In addition, we adopt a hybrid loss function to facilitate the training of HAINet. Extensive experiments on seven datasets demonstrate that our HAINet not only achieves competitive performance as compared with 19 relevant state-of-the-art methods, but also reaches a real-time processing speed of 43 fps on a single NVIDIA Titan X GPU. The code and results of our method are available at https://github.com/MathLee/HAINet.
25
Li P, Xing X, Xu X, Cai B, Cheng J. Attention-aware concentrated network for saliency prediction. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.10.083] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/23/2022]
26
Ding Y, Ma Z, Wen S, Xie J, Chang D, Si Z, Wu M, Ling H. AP-CNN: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2021; 30:2826-2836. [PMID: 33556008 DOI: 10.1109/tip.2021.3055617] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 06/12/2023]
Abstract
Classifying the sub-categories of an object from the same super-category (e.g., bird species and cars) in fine-grained visual classification (FGVC) highly relies on discriminative feature representation and accurate region localization. Existing approaches mainly focus on distilling information from high-level features. In this article, by contrast, we show that by integrating low-level information (e.g., color, edge junctions, texture patterns), performance can be improved with enhanced feature representation and accurately located discriminative regions. Our solution, named Attention Pyramid Convolutional Neural Network (AP-CNN), consists of 1) a dual pathway hierarchy structure with a top-down feature pathway and a bottom-up attention pathway, hence learning both high-level semantic and low-level detailed feature representation, and 2) an ROI-guided refinement strategy with ROI-guided dropblock and ROI-guided zoom-in operation, which refines features with discriminative local regions enhanced and background noises eliminated. The proposed AP-CNN can be trained end-to-end, without the need of any additional bounding box/part annotation. Extensive experiments on three popularly tested FGVC datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft) demonstrate that our approach achieves state-of-the-art performance. Models and code are available at https://github.com/PRIS-CV/AP-CNN_Pytorch-master.
|
27
|
Chen C, Wei J, Peng C, Qin H. Depth-Quality-Aware Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:2350-2363. [PMID: 33481710 DOI: 10.1109/tip.2021.3052069] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/12/2023]
Abstract
The existing fusion-based RGB-D salient object detection methods usually adopt the bistream structure to strike a balance in the fusion trade-off between RGB and depth (D). While the D quality usually varies among the scenes, the state-of-the-art bistream approaches are depth-quality-unaware, resulting in substantial difficulties in achieving complementary fusion status between RGB and D and leading to poor fusion results for low-quality D. Thus, this paper attempts to integrate a novel depth-quality-aware subnet into the classic bistream structure in order to assess the depth quality prior to conducting the selective RGB-D fusion. Compared to the SOTA bistream methods, the major advantage of our method is its ability to lessen the importance of the low-quality, no-contribution, or even negative-contribution D regions during RGB-D fusion, achieving a much improved complementary status between RGB and D. Our source code and data are available online at https://github.com/qdu1995/DQSD.
|
28
|
Wu J, Han G, Liu P, Yang H, Luo H, Li Q. Saliency Detection with Bilateral Absorbing Markov Chain Guided by Depth Information. SENSORS 2021; 21:s21030838. [PMID: 33513849 PMCID: PMC7865590 DOI: 10.3390/s21030838] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/28/2020] [Revised: 01/19/2021] [Accepted: 01/22/2021] [Indexed: 11/23/2022]
Abstract
The effectiveness of depth information in saliency detection has been fully proved. However, it is still worth exploring how to utilize the depth information more efficiently. Erroneous depth information may cause detection failure, while non-salient objects may be closer to the camera, which also leads to erroneous emphasis on non-salient regions. Moreover, most of the existing RGB-D saliency detection models have poor robustness when the salient object touches the image boundaries. To mitigate these problems, we propose a multi-stage saliency detection model with the bilateral absorbing Markov chain guided by depth information. The proposed model progressively extracts the saliency cues in three stages (low-, mid-, and high-level). First, we generate low-level saliency cues by explicitly combining color and depth information. Then, we design a bilateral absorbing Markov chain to calculate mid-level saliency maps. At the mid-level, to suppress the boundary-touch problem, we present the background seed screening mechanism (BSSM) for improving the construction of the two-layer sparse graph and better selecting background-based absorbing nodes. Furthermore, the cross-modal multi-graph learning model (CMLM) is designed to fully explore the intrinsic complementary relationship between color and depth information. Finally, to obtain a more highlighted and homogeneous saliency map at the high level, we construct a depth-guided optimization module which combines cellular automata and a suppression-enhancement function pair. This optimization module refines the saliency map in color space and depth space, respectively. Comprehensive experiments on three challenging benchmark datasets demonstrate the effectiveness of our proposed method both qualitatively and quantitatively.
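As background for the absorbing Markov chain mentioned in this abstract, the sketch below computes the generic "absorbed time" of each transient state, i.e. the expected number of steps before the chain reaches any absorbing state. It is not the bilateral model, the BSSM, or the CMLM from the paper; the function name `absorbed_time` and the toy matrix are hypothetical and chosen only for illustration. It assumes `Q` is the transient-to-transient block of the transition matrix and solves y = 1 + Qy by fixed-point iteration, which is equivalent to y = (I - Q)^(-1) 1.

```python
def absorbed_time(Q, iters=1000, tol=1e-12):
    """Expected steps before absorption for each transient state.

    Q: square list of rows, the transient-to-transient transition block
       (hypothetical input; row sums must be < 1 somewhere so that
       absorption is certain).
    Solves y = 1 + Q y by fixed-point iteration.
    """
    n = len(Q)
    y = [1.0] * n
    for _ in range(iters):
        new = [1.0 + sum(Q[i][j] * y[j] for j in range(n)) for i in range(n)]
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(a - b) for a, b in zip(new, y)) < tol:
            return new
        y = new
    return y


# Toy chain: each transient state moves to the other with probability 0.5
# and is absorbed with probability 0.5, so each absorbed time is about 2.
times = absorbed_time([[0.0, 0.5], [0.5, 0.0]])
```

In graph-based saliency pipelines of this family, the transient states are typically image regions and the absorbing states are background seeds, so a longer absorbed time is usually read as higher saliency; how this paper combines the two directions of its bilateral chain is not stated in the abstract.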
Affiliation(s)
- Jiajia Wu
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; (J.W.); (P.L.); (H.Y.); (H.L.); (Q.L.)
  - School of Optoelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
- Guangliang Han
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; (J.W.); (P.L.); (H.Y.); (H.L.); (Q.L.)
  - Correspondence:
- Peixun Liu
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; (J.W.); (P.L.); (H.Y.); (H.L.); (Q.L.)
- Hang Yang
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; (J.W.); (P.L.); (H.Y.); (H.L.); (Q.L.)
- Huiyuan Luo
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; (J.W.); (P.L.); (H.Y.); (H.L.); (Q.L.)
  - School of Optoelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
- Qingqing Li
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; (J.W.); (P.L.); (H.Y.); (H.L.); (Q.L.)
  - School of Optoelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
|
29
|
Ju M, Ding C, Ren W, Yang Y, Zhang D, Guo YJ. IDE: Image Dehazing and Exposure Using an Enhanced Atmospheric Scattering Model. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:2180-2192. [PMID: 33476267 DOI: 10.1109/tip.2021.3050643] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 06/12/2023]
Abstract
The atmospheric scattering model (ASM) is one of the most widely used models to describe the imaging process of hazy images. However, we found that ASM has an intrinsic limitation which leads to a dim effect in the recovered results. In this paper, by introducing a new parameter, i.e., light absorption coefficient, into ASM, an enhanced ASM (EASM) is attained, which can address the dim effect and better model outdoor hazy scenes. Relying on this EASM, a simple yet effective gray-world-assumption-based technique called IDE is then developed to enhance the visibility of hazy images. Experimental results show that IDE eliminates the dim effect and exhibits excellent dehazing performance. It is worth mentioning that IDE does not require any training process or extra information related to scene depth, which makes it very fast and robust. Moreover, the global stretch strategy used in IDE can effectively avoid some undesirable effects in recovery results, e.g., over-enhancement, over-saturation, and mist residue. Comparison between the proposed IDE and other state-of-the-art techniques reveals the superiority of IDE in terms of both dehazing quality and efficiency over all the comparable techniques.
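For reference, the standard atmospheric scattering model that this abstract builds on is commonly written as below. The symbols follow the usual convention in the dehazing literature and are not taken from the abstract itself: I is the observed hazy image, J the scene radiance, A the global atmospheric light, t the transmission, β the scattering coefficient, and d the scene depth. The enhanced model's light absorption coefficient is not reproduced here, since the abstract does not state how it enters the equation.

```latex
I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr), \qquad t(x) = e^{-\beta d(x)}
```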
|
30
|
Zhang Z, Lin Z, Xu J, Jin WD, Lu SP, Fan DP. Bilateral Attention Network for RGB-D Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:1949-1961. [PMID: 33439842 DOI: 10.1109/tip.2021.3049959] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Indexed: 06/12/2023]
Abstract
RGB-D salient object detection (SOD) aims to segment the most attractive objects in a pair of cross-modal RGB and depth images. Currently, most existing RGB-D SOD methods focus on the foreground region when utilizing the depth images. However, the background also provides important information in traditional SOD methods for promising performance. To better explore salient information in both foreground and background regions, this paper proposes a Bilateral Attention Network (BiANet) for the RGB-D SOD task. Specifically, we introduce a Bilateral Attention Module (BAM) with a complementary attention mechanism: foreground-first (FF) attention and background-first (BF) attention. The FF attention focuses on the foreground region with a gradual refinement style, while the BF one recovers potentially useful salient information in the background region. Benefiting from the proposed BAM, our BiANet can capture more meaningful foreground and background cues, and shift more attention to refining the uncertain details between foreground and background regions. Additionally, we extend our BAM by leveraging multi-scale techniques for better SOD performance. Extensive experiments on six benchmark datasets demonstrate that our BiANet outperforms other state-of-the-art RGB-D SOD methods in terms of objective metrics and subjective visual comparison. Our BiANet can run at up to 80 fps on 224×224 RGB-D images, with an NVIDIA GeForce RTX 2080Ti GPU. Comprehensive ablation studies also validate our contributions.
|
31
|
Zhou T, Fan DP, Cheng MM, Shen J, Shao L. RGB-D salient object detection: A survey. COMPUTATIONAL VISUAL MEDIA 2021; 7:37-69. [PMID: 33432275 PMCID: PMC7788385 DOI: 10.1007/s41095-020-0199-z] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 07/31/2020] [Accepted: 10/07/2020] [Indexed: 06/12/2023]
Abstract
Salient object detection, which simulates human visual perception in locating the most significant object(s) in a scene, has been widely applied to various computer vision tasks. Now, the advent of depth sensors means that depth maps can easily be captured; this additional spatial information can boost the performance of salient object detection. Although various RGB-D based salient object detection models with promising performance have been proposed over the past several years, an in-depth understanding of these models and the challenges in this field remains lacking. In this paper, we provide a comprehensive survey of RGB-D based salient object detection models from various perspectives, and review related benchmark datasets in detail. Further, as light fields can also provide depth maps, we review salient object detection models and popular benchmark datasets from this domain too. Moreover, to investigate the ability of existing models to detect salient objects, we have carried out a comprehensive attribute-based evaluation of several representative RGB-D based salient object detection models. Finally, we discuss several challenges and open directions of RGB-D based salient object detection for future research. All collected models, benchmark datasets, datasets constructed for attribute-based evaluation, and related code are publicly available at https://github.com/taozh2017/RGBD-SODsurvey.
Affiliation(s)
- Tao Zhou
  - Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Deng-Ping Fan
  - Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Jianbing Shen
  - Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Ling Shao
  - Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
|
32
|
Li C, Cong R, Kwong S, Hou J, Fu H, Zhu G, Zhang D, Huang Q. ASIF-Net: Attention Steered Interweave Fusion Network for RGB-D Salient Object Detection. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:88-100. [PMID: 32078571 DOI: 10.1109/tcyb.2020.2969255] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Indexed: 06/10/2023]
Abstract
Salient object detection from RGB-D images is an important yet challenging vision task, which aims at detecting the most distinctive objects in a scene by combining color information and depth constraints. Unlike prior fusion manners, we propose an attention steered interweave fusion network (ASIF-Net) to detect salient objects, which progressively integrates cross-modal and cross-level complementarity from the RGB image and corresponding depth map via steering of an attention mechanism. Specifically, the complementary features from RGB-D images are jointly extracted and hierarchically fused in a dense and interweaved manner. Such a manner breaks down the barriers of inconsistency existing in the cross-modal data and also sufficiently captures the complementarity. Meanwhile, an attention mechanism is introduced to locate the potential salient regions in an attention-weighted fashion, which advances in highlighting the salient objects and suppressing the cluttered background regions. Instead of focusing only on pixelwise saliency, we also ensure that the detected salient objects have the objectness characteristics (e.g., complete structure and sharp boundary) by incorporating the adversarial learning that provides a global semantic constraint for RGB-D salient object detection. Quantitative and qualitative experiments demonstrate that the proposed method performs favorably against 17 state-of-the-art saliency detectors on four publicly available RGB-D salient object detection datasets. The code and results of our method are available at https://github.com/Li-Chongyi/ASIF-Net.
|
33
|
Zhang Q, Cong R, Li C, Cheng MM, Fang Y, Cao X, Zhao Y, Kwong S. Dense Attention Fluid Network for Salient Object Detection in Optical Remote Sensing Images. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 30:1305-1317. [PMID: 33306467 DOI: 10.1109/tip.2020.3042084] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 06/12/2023]
Abstract
Despite the remarkable advances in visual saliency analysis for natural scene images (NSIs), salient object detection (SOD) for optical remote sensing images (RSIs) remains an open and challenging problem. In this paper, we propose an end-to-end Dense Attention Fluid Network (DAFNet) for SOD in optical RSIs. A Global Context-aware Attention (GCA) module is proposed to adaptively capture long-range semantic context relationships, and is further embedded in a Dense Attention Fluid (DAF) structure that enables shallow attention cues to flow into deep layers to guide the generation of high-level feature attention maps. Specifically, the GCA module is composed of two key components, where the global feature aggregation module achieves mutual reinforcement of salient feature embeddings from any two spatial locations, and the cascaded pyramid attention module tackles the scale variation issue by building up a cascaded pyramid framework to progressively refine the attention map in a coarse-to-fine manner. In addition, we construct a new and challenging optical RSI dataset for SOD that contains 2,000 images with pixel-wise saliency annotations, which is currently the largest publicly available benchmark. Extensive experiments demonstrate that our proposed DAFNet significantly outperforms the existing state-of-the-art SOD competitors. https://github.com/rmcong/DAFNet_TIP20.
|
34
|
Li C, Cong R, Guo C, Li H, Zhang C, Zheng F, Zhao Y. A parallel down-up fusion network for salient object detection in optical remote sensing images. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.108] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 10/23/2022]
|
35
|
Chen H, Deng Y, Li Y, Hung TY, Lin G. RGBD Salient Object Detection via Disentangled Cross-modal Fusion. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; PP:8407-8416. [PMID: 32784141 DOI: 10.1109/tip.2020.3014734] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 06/11/2023]
Abstract
Depth is beneficial for salient object detection (SOD) for its additional saliency cues. Existing RGBD SOD methods focus on tailoring complicated cross-modal fusion topologies, which, although achieving encouraging performance, carry a high risk of over-fitting and are ambiguous in studying cross-modal complementarity. Different from these conventional approaches, which combine cross-modal features entirely without differentiation, we concentrate our attention on decoupling the diverse cross-modal complements to simplify the fusion process and enhance the fusion sufficiency. We argue that if cross-modal heterogeneous representations can be disentangled explicitly, the cross-modal fusion process can hold less uncertainty, while enjoying better adaptability. To this end, we design a disentangled cross-modal fusion network to expose structural and content representations from both modalities by cross-modal reconstruction. For different scenes, the disentangled representations allow the fusion module to easily identify and incorporate desired complements for informative multi-modal fusion. Extensive experiments show the effectiveness of our designs and a large improvement over state-of-the-art methods.
|
36
|
Zhang M, Ji W, Piao Y, Li J, Zhang Y, Xu S, Lu H. LFNet: Light Field Fusion Network for Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 29:6276-6287. [PMID: 32365027 DOI: 10.1109/tip.2020.2990341] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 06/11/2023]
Abstract
In this work, we propose a novel light field fusion network, LFNet, a CNN-based light field saliency model using 4D light field data containing abundant spatial and contextual information. The proposed method can reliably locate and identify salient objects even in a complex scene. Our LFNet contains a light field refinement module (LFRM) and a light field integration module (LFIM) which can fully refine and integrate focusness, depth and objectness cues from light field images. The LFRM learns the light field residual between light field and RGB images for refining features with useful light field cues, and then the LFIM weights each refined light field feature and learns the spatial correlation between them to predict saliency maps. Our method can take full advantage of light field information and achieve excellent performance especially in complex scenes, e.g., similar foreground and background, multiple or transparent objects and low-contrast environments. Experiments show our method outperforms the state-of-the-art 2D, 3D and 4D methods across three light field datasets.
|