1. Pei J, Jiang T, Tang H, Liu N, Jin Y, Fan DP, Heng PA. CalibNet: Dual-Branch Cross-Modal Calibration for RGB-D Salient Instance Segmentation. IEEE Transactions on Image Processing 2024; 33:4348-4362. [PMID: 39074016] [DOI: 10.1109/tip.2024.3432328]
Abstract
In this study, we propose a novel approach for RGB-D salient instance segmentation using a dual-branch cross-modal feature calibration architecture called CalibNet. Our method simultaneously calibrates depth and RGB features in the kernel and mask branches to generate instance-aware kernels and mask features. CalibNet consists of three simple modules: a dynamic interactive kernel (DIK) and a weight-sharing fusion (WSF), which work together to generate effective instance-aware kernels and integrate cross-modal features, and a depth similarity assessment (DSA) module placed before DIK and WSF to improve the quality of the depth features. In addition, we contribute a new DSIS dataset, which contains 1,940 images with elaborate instance-level annotations. Extensive experiments on three challenging benchmarks show that CalibNet yields promising results, i.e., 58.0% AP with a 320×480 input size on the COME15K-E test set, significantly surpassing alternative frameworks. Our code and dataset will be publicly available at https://github.com/PJLallen/CalibNet.
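The dynamic-kernel idea above can be illustrated with a minimal sketch: a kernel branch predicts one small convolution kernel per detected instance, which is then applied to shared mask features to produce that instance's mask. This is a generic illustration of instance-aware dynamic kernels in the spirit of the DIK module, not CalibNet's actual implementation; all tensor shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def dynamic_instance_masks(mask_feats: torch.Tensor, inst_kernels: torch.Tensor) -> torch.Tensor:
    """Apply per-instance dynamic kernels to shared mask features.

    mask_feats:   (C, H, W)  fused RGB-D mask features for one image
    inst_kernels: (N, C)     one 1x1 kernel per predicted instance
    returns:      (N, H, W)  per-instance mask probabilities
    """
    C, H, W = mask_feats.shape
    # Treat each instance kernel as a 1x1 conv filter over the mask features.
    weight = inst_kernels.view(-1, C, 1, 1)             # (N, C, 1, 1)
    logits = F.conv2d(mask_feats.unsqueeze(0), weight)  # (1, N, H, W)
    return torch.sigmoid(logits.squeeze(0))

# Toy usage with random tensors.
masks = dynamic_instance_masks(torch.randn(64, 80, 120), torch.randn(5, 64))
print(masks.shape)  # torch.Size([5, 80, 120])
```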
2. Bao L, Zhou X, Lu X, Sun Y, Yin H, Hu Z, Zhang J, Yan C. Quality-Aware Selective Fusion Network for V-D-T Salient Object Detection. IEEE Transactions on Image Processing 2024; 33:3212-3226. [PMID: 38687650] [DOI: 10.1109/tip.2024.3393365]
Abstract
Depth images and thermal images contain spatial geometry information and surface temperature information, respectively, which can act as complementary cues for the RGB modality. However, the quality of depth and thermal images is often unreliable in challenging scenarios, which degrades the performance of two-modal salient object detection (SOD). Meanwhile, some researchers have turned to the triple-modal SOD task, namely visible-depth-thermal (VDT) SOD, and attempt to exploit the complementarity of the RGB image, the depth image, and the thermal image. However, existing triple-modal SOD methods fail to perceive the quality of the depth maps and thermal images, which leads to performance degradation in scenes with low-quality depth and thermal inputs. Therefore, in this paper, we propose a quality-aware selective fusion network (QSF-Net) for VDT salient object detection, which contains three subnets: an initial feature extraction subnet, a quality-aware region selection subnet, and a region-guided selective fusion subnet. First, besides extracting features, the initial feature extraction subnet generates a preliminary prediction map from each modality via a shrinkage pyramid architecture equipped with a multi-scale fusion (MSF) module. Then, we design a weakly supervised quality-aware region selection subnet to generate quality-aware maps. Concretely, we first locate the high-quality and low-quality regions using the preliminary predictions; these regions constitute the pseudo labels used to train this subnet. Finally, the region-guided selective fusion subnet purifies the initial features under the guidance of the quality-aware maps, then fuses the triple-modal features and refines the edge details of the prediction maps through an intra-modality and inter-modality attention (IIA) module and an edge refinement (ER) module, respectively. Extensive experiments on the VDT-2048 dataset show that our saliency model consistently outperforms 13 state-of-the-art methods by a large margin. Our code and results are available at https://github.com/Lx-Bao/QSFNet.
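A minimal sketch of the quality-aware selective fusion idea: per-modality quality maps (here assumed to lie in [0, 1]) gate each modality's features before fusion, so unreliable depth or thermal regions contribute less. The module below is an illustrative approximation, not the authors' QSF-Net code; the layer sizes and gating form are assumptions.

```python
import torch
import torch.nn as nn

class QualityGatedFusion(nn.Module):
    """Fuse visible/depth/thermal features weighted by per-pixel quality maps."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.merge = nn.Conv2d(3 * channels, channels, kernel_size=3, padding=1)

    def forward(self, f_rgb, f_d, f_t, q_d, q_t):
        # q_d, q_t: (B, 1, H, W) quality maps in [0, 1]; RGB is trusted by default.
        gated = torch.cat([f_rgb, f_d * q_d, f_t * q_t], dim=1)
        return self.merge(gated)

fusion = QualityGatedFusion(64)
b, h, w = 2, 32, 32
out = fusion(torch.randn(b, 64, h, w), torch.randn(b, 64, h, w), torch.randn(b, 64, h, w),
             torch.rand(b, 1, h, w), torch.rand(b, 1, h, w))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```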
3. Huo D, Wang J, Qian Y, Yang YH. Glass Segmentation with RGB-Thermal Image Pairs. IEEE Transactions on Image Processing 2023; PP:1911-1926. [PMID: 37030759] [DOI: 10.1109/tip.2023.3256762]
Abstract
This paper proposes a new glass segmentation method utilizing paired RGB and thermal images. Because visible light and thermal energy are transmitted through glass very differently, with most glass being transparent to visible light but opaque to thermal energy, glass regions of a scene are more distinguishable in a pair of RGB and thermal images than in an RGB image alone. To exploit this unique property, we propose a neural network architecture that effectively combines an RGB-thermal image pair with a new attention-based multi-modal fusion module, and integrates a CNN and a transformer to extract local features and non-local dependencies, respectively. In addition, we have collected a new dataset containing 5,551 RGB-thermal image pairs with ground-truth segmentation annotations. Qualitative and quantitative evaluations demonstrate the effectiveness of the proposed approach in fusing RGB and thermal data for glass segmentation. Our code and data are available at https://github.com/Dong-Huo/RGB-T-Glass-Segmentation.
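The attention-based multi-modal fusion can be sketched as follows: each modality is re-weighted by channel attention computed from the other, and the two refined streams are summed. This is only a generic cross-modal attention sketch under assumed shapes, not the fusion module proposed in the paper.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Channel attention from one modality gates the other, then both are merged."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.rgb_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.thermal_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                          nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, f_rgb, f_thermal):
        # Each modality is modulated by the channel statistics of the other.
        rgb_refined = f_rgb * self.thermal_gate(f_thermal)
        thermal_refined = f_thermal * self.rgb_gate(f_rgb)
        return rgb_refined + thermal_refined

fuse = CrossModalAttentionFusion(64)
print(fuse(torch.randn(1, 64, 48, 48), torch.randn(1, 64, 48, 48)).shape)
```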
4. Cong R, Yang N, Li C, Fu H, Zhao Y, Huang Q, Kwong S. Global-and-Local Collaborative Learning for Co-Salient Object Detection. IEEE Transactions on Cybernetics 2023; 53:1920-1931. [PMID: 35867373] [DOI: 10.1109/tcyb.2022.3169431]
Abstract
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images. Therefore, how to effectively extract interimage correspondence is crucial for the CoSOD task. In this article, we propose a global-and-local collaborative learning (GLNet) architecture, which includes a global correspondence modeling (GCM) and a local correspondence modeling (LCM) to capture the comprehensive interimage corresponding relationship among different images from the global and local perspectives. First, we treat different images as different time slices and use 3-D convolution to integrate all intrafeatures intuitively, which can more fully extract the global group semantics. Second, we design a pairwise correlation transformation (PCT) to explore similarity correspondence between pairwise images and combine the multiple local pairwise correspondences to generate the local interimage relationship. Third, the interimage relationships of the GCM and LCM are integrated through a global-and-local correspondence aggregation (GLA) module to explore more comprehensive interimage collaboration cues. Finally, the intra and inter features are adaptively integrated by an intra-and-inter weighting fusion (AEWF) module to learn co-saliency features and predict the co-saliency map. The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms 11 state-of-the-art competitors trained on some large datasets (about 8k-200k images).
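The local correspondence idea above, a pairwise correlation between two images' features, can be sketched with a simple scaled dot-product affinity in the style of non-local attention. This is an assumed, simplified form of the PCT described in the abstract, not the paper's exact module.

```python
import torch

def pairwise_correlation(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Dense affinity between every spatial location of two feature maps.

    feat_a, feat_b: (C, H, W) features of two images in the query group
    returns:        (H*W, H*W) row-normalized correlation matrix
    """
    C, H, W = feat_a.shape
    a = feat_a.reshape(C, H * W)          # (C, HW)
    b = feat_b.reshape(C, H * W)          # (C, HW)
    corr = a.t() @ b / C ** 0.5           # scaled dot-product similarity
    return torch.softmax(corr, dim=-1)    # each row sums to 1 over locations in image B

corr = pairwise_correlation(torch.randn(64, 20, 20), torch.randn(64, 20, 20))
print(corr.shape)  # torch.Size([400, 400])
```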
5. Zhou W, Zhu Y, Lei J, Yang R, Yu L. LSNet: Lightweight Spatial Boosting Network for Detecting Salient Objects in RGB-Thermal Images. IEEE Transactions on Image Processing 2023; PP:1329-1340. [PMID: 37022901] [DOI: 10.1109/tip.2023.3242775]
Abstract
Most recent methods for RGB (red-green-blue)-thermal salient object detection (SOD) involve a large number of floating-point operations and parameters, resulting in slow inference, especially on common processors, and impeding their deployment on mobile devices for practical applications. To address these problems, we propose a lightweight spatial boosting network (LSNet) for efficient RGB-thermal SOD with a lightweight MobileNetV2 backbone replacing a conventional backbone (e.g., VGG, ResNet). To improve feature extraction with a lightweight backbone, we propose a boundary boosting algorithm that optimizes the predicted saliency maps and reduces information collapse in low-dimensional features. The algorithm generates boundary maps based on the predicted saliency maps without incurring additional calculations or complexity. As multimodality processing is essential for high-performance SOD, we adopt attentive feature distillation and selection, and propose semantic and geometric transfer learning to enhance the backbone without increasing complexity during testing. Experimental results demonstrate that the proposed LSNet achieves state-of-the-art performance against 14 RGB-thermal SOD methods on three datasets while reducing the number of floating-point operations (1.025G), parameters (5.39M), and model size (22.1 MB), and improving inference speed (9.95 fps for PyTorch, batch size of 1, and Intel i5-7500 processor; 93.53 fps for PyTorch, batch size of 1, and NVIDIA TITAN V graphics processor; 936.68 fps for PyTorch, batch size of 20, and graphics processor; 538.01 fps for TensorRT and batch size of 1; and 903.01 fps for TensorRT/FP16 and batch size of 1). The code and results can be found at https://github.com/zyrant/LSNet.
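The parameter-free boundary generation step can be approximated cheaply: subtracting a min-pooled (eroded) saliency map from a max-pooled (dilated) one yields a boundary map with no learned weights. The snippet below is a plausible reading of the described algorithm, not the released LSNet code; the kernel size is an assumption.

```python
import torch
import torch.nn.functional as F

def boundary_map(saliency: torch.Tensor, kernel: int = 3) -> torch.Tensor:
    """Derive a boundary map from a predicted saliency map (B, 1, H, W) in [0, 1]."""
    pad = kernel // 2
    dilated = F.max_pool2d(saliency, kernel, stride=1, padding=pad)
    eroded = -F.max_pool2d(-saliency, kernel, stride=1, padding=pad)  # min-pooling
    return dilated - eroded  # large only near predicted object boundaries

pred = torch.rand(1, 1, 64, 64)
print(boundary_map(pred).shape)  # torch.Size([1, 1, 64, 64])
```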
6. Wen H, Song K, Huang L, Wang H, Yan Y. Cross-modality salient object detection network with universality and anti-interference. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110322]
7. MENet: Lightweight Multimodality Enhancement Network for Detecting Salient Objects in RGB-Thermal Images. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.024]
8. Chen T, Xiao J, Hu X, Zhang G, Wang S. Adaptive Fusion Network For RGB-D Salient Object Detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.12.004]
9. Li F, Tang S, Chen Y, Zou H. Deep attentive convolutional neural network for automatic grading of imbalanced diabetic retinopathy in retinal fundus images. Biomedical Optics Express 2022; 13:5813-5835. [PMID: 36733744] [PMCID: PMC9872872] [DOI: 10.1364/boe.472176]
Abstract
Automated fine-grained diabetic retinopathy (DR) grading is of great significance for assisting ophthalmologists in monitoring DR and designing tailored treatments for patients. Nevertheless, it is a challenging task as a result of high intra-class variation, high inter-class similarity, small lesions, and imbalanced data distributions. The pivotal factor for success in fine-grained DR grading is to discern the subtle lesion-related features, such as microaneurysms (MA), hemorrhages (HM), soft exudates (SE), and hard exudates (HE). In this paper, we constructed a simple yet effective deep attentive convolutional neural network (DACNN) for DR grading and lesion discovery with only image-wise supervision. Designed as a top-down architecture, our model incorporated stochastic atrous spatial pyramid pooling (sASPP), a global attention mechanism (GAM), a category attention mechanism (CAM), and a learnable connected module (LCM) to better extract lesion-related features and maximize DR grading performance. To be concrete, we devised sASPP, which combines randomness with atrous spatial pyramid pooling (ASPP), to accommodate the various scales of the lesions and to counteract the co-adaptation of multiple atrous convolutions. Then, GAM was introduced to extract class-agnostic global attention feature details, whilst CAM was explored to seek class-specific, distinctive region-level lesion features and to treat each DR severity grade equally, which tackles the problem of imbalanced DR data distributions. Further, the LCM was designed to automatically and adaptively search for the optimal connections among layers to better extract detailed small-lesion feature representations. The proposed approach obtained an accuracy of 88.0% and a kappa score of 88.6% on the multi-class DR grading task on the EyePACS dataset, and 98.5% AUC, 93.8% accuracy, 87.9% kappa, 90.7% recall, 94.6% precision, and 92.6% F1-score for referral versus non-referral classification on the Messidor dataset. Extensive experimental results on three challenging benchmarks demonstrated that the proposed approach achieves competitive performance in DR grading and lesion discovery from retinal fundus images compared with existing cutting-edge methods, and has good generalization capacity for unseen DR datasets. These promising results highlight its potential as an efficient and reliable tool to assist ophthalmologists in large-scale DR screening.
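A minimal sketch of the stochastic ASPP idea: a standard ASPP head whose parallel atrous branches are randomly dropped during training (and all kept at test time), which discourages co-adaptation of the dilated convolutions. The dilation rates and drop probability below are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class StochasticASPP(nn.Module):
    """ASPP whose atrous branches are randomly dropped during training."""

    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18), p_drop: float = 0.3):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])
        self.p_drop = p_drop
        self.project = nn.Conv2d(out_ch, out_ch, 1)

    def forward(self, x):
        outs = []
        for branch in self.branches:
            if self.training and torch.rand(1).item() < self.p_drop:
                continue  # stochastically skip this dilation rate
            outs.append(branch(x))
        if not outs:  # keep at least one branch so the output is defined
            outs.append(self.branches[0](x))
        return self.project(torch.stack(outs, dim=0).mean(dim=0))

aspp = StochasticASPP(256, 64)
print(aspp(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```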
Affiliation(s)
- Feng Li: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Shiqing Tang: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Yuyang Chen: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Haidong Zou: Shanghai Eye Disease Prevention & Treatment Center, Shanghai 200040, China; Ophthalmology Center, Shanghai General Hospital, Shanghai 200080, China
10. Transformers and CNNs Fusion Network for Salient Object Detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.10.081]
11. Zong G, Wei L, Guo S, Wang Y. A cascaded refined RGB-D salient object detection network based on the attention mechanism. Appl Intell 2022. [DOI: 10.1007/s10489-022-04186-9]
12. Song M, Song W, Yang G, Chen C. Improving RGB-D Salient Object Detection via Modality-Aware Decoder. IEEE Transactions on Image Processing 2022; 31:6124-6138. [PMID: 36112559] [DOI: 10.1109/tip.2022.3205747]
Abstract
Most existing RGB-D salient object detection (SOD) methods focus primarily on cross-modal and cross-level saliency fusion, which has proved efficient and effective. However, these methods still have a critical limitation: their fusion patterns, typically the combination of selective characteristics and its variations, depend too heavily on the network's non-linear adaptability. In such methods, the balance between RGB and D (depth) is formulated individually for the intermediate feature slices, but the relation at the modality level may not be learned properly. The optimal RGB-D combination differs from scene to scene, and the exact complementary status is frequently determined by multiple modality-level factors, such as the depth quality, the complexity of the RGB scene, and the degree of harmony between them. Therefore, it may be difficult for existing approaches to achieve further performance breakthroughs, as their methodologies are not sufficiently modality sensitive. To conquer this problem, this paper presents the Modality-aware Decoder (MaD). The critical technical innovations include a series of feature embedding, modality reasoning, and feature back-projecting and collecting strategies, all of which upgrade the widely used multi-scale and multi-level decoding process to be modality-aware. MaD achieves competitive performance over other state-of-the-art (SOTA) models without using any fancy tricks in the decoder's design. Codes and results will be publicly available at https://github.com/MengkeSong/MaD.
13. Wang Z, Wang P, Han Y, Zhang X, Sun M, Tian Q. Curiosity-Driven Salient Object Detection With Fragment Attention. IEEE Transactions on Image Processing 2022; 31:5989-6001. [PMID: 36099213] [DOI: 10.1109/tip.2022.3203605]
Abstract
Recent deep learning based salient object detection methods with attention mechanisms have achieved great success. However, existing attention mechanisms generally fall into two categories: some calculate weights indiscriminately, which yields computational redundancy, while others, such as hard attention, focus randomly on a small part of the image and suffer from inaccuracy owing to insufficiently targeted selection of a subset of tokens. To alleviate these problems, we design a Curiosity-driven Network (CNet) and a Curiosity-driven Learning Algorithm (CLA) based on the fragment attention (FA) mechanism newly defined in this paper. FA imitates the process of cognitive perception driven by human curiosity and divides the degree of curiosity into three levels, i.e., curious, a little curious, and not curious. These three levels correspond to five saliency degrees: salient and non-salient, likely salient and likely non-salient, and completely uncertain. As the network gains more knowledge, CLA transforms the curiosity degree of each pixel to yield enhanced, detail-enriched saliency maps. In order to extract more context-aware information about potential salient objects and lay a better foundation for CLA, a high-level feature extraction module (HFEM) is further proposed. Based on the much better high-level features extracted by HFEM, FA can classify the curiosity degree of each pixel more reasonably and accurately. Extensive experiments on five popular datasets clearly demonstrate that our method outperforms state-of-the-art approaches without any pre-processing or post-processing operations.
14. SL-Net: self-learning and mutual attention-based distinguished window for RGBD complex salient object detection. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07772-7]
15. Sun P, Zhang W, Li S, Guo Y, Song C, Li X. Learnable Depth-Sensitive Attention for Deep RGB-D Saliency Detection with Multi-modal Fusion Architecture Search. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01646-0]
16. Zhu J, Zhang X, Fang X, Rahman MRU, Dong F, Li Y, Yan S, Tan P. Boosting RGB-D salient object detection with adaptively cooperative dynamic fusion network. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109205]
17. Decoupled Dynamic Group Equivariant Filter for Saliency Prediction on Omnidirectional Image. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.107]
18. Modal complementary fusion network for RGB-T salient object detection. Appl Intell 2022. [DOI: 10.1007/s10489-022-03950-1]
19. Hidden Dangerous Object Recognition in Terahertz Images Using Deep Learning Methods. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12157354]
Abstract
As a harmless detection method, terahertz imaging has become a new trend in security screening. However, there are inherent problems, such as the low quality of the images collected by terahertz equipment and insufficient detection accuracy for dangerous goods. This work adds a BiFPN to the neck of the YOLOv5 deep learning model as a mechanism to compensate for the low image resolution. We also perform transfer learning, fine-tuning the pre-trained backbone weights in our model. Experimental results reveal that the mAP@0.5 and mAP@0.5:0.95 values increase by 0.2% and 1.7%, respectively, attesting to the superiority of the proposed model over YOLOv5, a state-of-the-art model in object detection.
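The BiFPN neck mentioned above combines multi-scale features with learnable, fast-normalized weights. A single fusion node can be sketched as follows; this illustrates the general BiFPN weighting scheme rather than the authors' modified YOLOv5.

```python
import torch
import torch.nn as nn

class WeightedFeatureFusion(nn.Module):
    """Fast normalized fusion of N equally shaped feature maps (BiFPN-style node)."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = torch.relu(self.weights)     # keep the fusion weights non-negative
        w = w / (w.sum() + self.eps)     # fast normalization instead of softmax
        return sum(wi * fi for wi, fi in zip(w, feats))

node = WeightedFeatureFusion(2)
out = node([torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40)])
print(out.shape)  # torch.Size([1, 128, 40, 40])
```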
20. Zhang N, Han J, Liu N. Learning Implicit Class Knowledge for RGB-D Co-Salient Object Detection With Transformers. IEEE Transactions on Image Processing 2022; 31:4556-4570. [PMID: 35763477] [DOI: 10.1109/tip.2022.3185550]
Abstract
RGB-D co-salient object detection aims to segment co-occurring salient objects given a group of relevant images and depth maps. Previous methods often adopt separate pipelines and hand-crafted features, making it hard to capture the patterns of co-occurring salient objects and leading to unsatisfactory results. Using end-to-end CNN models is a straightforward idea, but they are less effective in exploiting global cues due to their intrinsic limitations. Thus, in this paper, we instead propose an end-to-end transformer-based model, denoted CTNet, which uses class tokens to explicitly capture implicit class knowledge for RGB-D co-salient object detection. Specifically, we first design adaptive class tokens for individual images to explore intra-saliency cues and then develop common class tokens for the whole group to explore inter-saliency cues. Besides, we also leverage the complementary cues between RGB images and depth maps to promote the learning of these two types of class tokens. In addition, to facilitate model evaluation, we construct a challenging and large-scale benchmark dataset, named RGBD CoSal1k, which collects 106 groups containing 1,000 pairs of RGB-D images with complex scenarios and diverse appearances. Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method.
21. A2TPNet: Alternate Steered Attention and Trapezoidal Pyramid Fusion Network for RGB-D Salient Object Detection. Electronics 2022. [DOI: 10.3390/electronics11131968]
Abstract
RGB-D salient object detection (SOD) aims at locating the most eye-catching object in visual input by fusing the complementary information of the RGB and depth modalities. Most existing RGB-D SOD methods integrate multi-modal features indiscriminately to generate the saliency map, ignoring the ambiguity between different modalities. To better use multi-modal complementary information and alleviate the negative impact of ambiguity among different modalities, this paper proposes a novel Alternate Steered Attention and Trapezoidal Pyramid Fusion Network (A2TPNet) for RGB-D SOD, composed of a Cross-modal Alternate Fusion Module (CAFM) and a Trapezoidal Pyramid Fusion Module (TPFM). CAFM focuses on fusing cross-modal features, taking full consideration of the ambiguity between cross-modal data through an Alternate Steered Attention (ASA), and it reduces the interference of redundant information and non-salient features in the interactive process through a collaboration mechanism containing channel attention and spatial attention. TPFM endows the RGB-D SOD model with more powerful feature expression capabilities by combining multi-scale features to enhance the expressive ability of the model's contextual semantics. Extensive experimental results on five publicly available datasets demonstrate that the proposed model consistently outperforms 17 state-of-the-art methods.
22. Automated Detection of Greenhouse Structures Using Cascade Mask R-CNN. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12115553]
Abstract
Automated detection of the content of images remains a challenging problem in artificial intelligence. Hence, continuous manual monitoring of restricted development zones is critical to maintaining territorial integrity and national security. In this regard, local governments of the Republic of Korea conduct four periodic inspections per year to protect national territories from illegal encroachments and unauthorized developments in restricted zones. The considerable expense makes responding to illegal developments difficult for local governments. To address this challenge, we propose a deep-learning-based Cascade Mask region-based convolutional neural network (R-CNN) algorithm designed to perform automated detection of greenhouses in aerial photographs for efficient and continuous monitoring of restricted development zones in the Republic of Korea. Our proposed model is region-specific because it was optimized for the Republic of Korea via transfer learning and hyperparameter tuning, which improved the efficiency of the automated detection of greenhouse facilities. The experimental results demonstrate that the mAP value of the proposed Cascade Mask R-CNN model was 83.6, which was 12.83 higher than the baseline Mask R-CNN and 0.9 higher than Mask R-CNN with hyperparameter tuning and transfer learning. Similarly, the F1-score of the proposed Cascade Mask R-CNN model was 62.07, which outperformed the baseline Mask R-CNN and the Mask R-CNN with hyperparameter tuning and transfer learning (F1-scores of 52.33 and 59.13, respectively). The proposed improved Cascade Mask R-CNN model is expected to facilitate efficient and continuous monitoring of restricted development zones through routine screening procedures. Moreover, this work provides a baseline for developing an integrated management system for national-scale land-use planning and development infrastructure by synergizing geographical information systems, remote sensing, and deep learning models.
23. Wu YH, Liu Y, Zhang L, Cheng MM, Ren B. EDN: Salient Object Detection via Extremely-Downsampled Network. IEEE Transactions on Image Processing 2022; 31:3125-3136. [PMID: 35412981] [DOI: 10.1109/tip.2022.3164550]
Abstract
Recent progress on salient object detection (SOD) mainly benefits from multi-scale learning, where high-level and low-level features collaborate in locating salient objects and discovering fine details, respectively. However, most efforts are devoted to low-level feature learning by fusing multi-scale features or enhancing boundary representations. High-level features, although they have long proven effective for many other tasks, have barely been studied for SOD. In this paper, we tap into this gap and show that enhancing high-level features is essential for SOD as well. To this end, we introduce an Extremely-Downsampled Network (EDN), which employs an extreme downsampling technique to effectively learn a global view of the whole image, leading to accurate salient object localization. To accomplish better multi-level feature fusion, we construct the Scale-Correlated Pyramid Convolution (SCPC) to build an elegant decoder for recovering object details from the extreme downsampling. Extensive experiments demonstrate that EDN achieves state-of-the-art performance at real-time speed. Our efficient EDN-Lite also achieves competitive performance at a speed of 316 fps. Hence, this work is expected to spark new thinking in SOD. Code is available at https://github.com/yuhuan-wu/EDN.
24. Ji W, Yan G, Li J, Piao Y, Yao S, Zhang M, Cheng L, Lu H. DMRA: Depth-Induced Multi-Scale Recurrent Attention Network for RGB-D Saliency Detection. IEEE Transactions on Image Processing 2022; 31:2321-2336. [PMID: 35245195] [DOI: 10.1109/tip.2022.3154931]
Abstract
In this work, we propose a novel depth-induced multi-scale recurrent attention network for RGB-D saliency detection, named DMRA. It achieves outstanding performance, especially in complex scenarios. Our network makes four main contributions that are experimentally demonstrated to have significant practical merit. First, we design an effective depth refinement block using residual connections to fully extract and fuse cross-modal complementary cues from the RGB and depth streams. Second, depth cues with abundant spatial information are innovatively combined with multi-scale contextual features to accurately locate salient objects. Third, a novel recurrent attention module inspired by the Internal Generative Mechanism of the human brain is designed to generate more accurate saliency results by comprehensively learning the internal semantic relations of the fused features and progressively optimizing local details with memory-oriented scene understanding. Finally, a cascaded hierarchical feature fusion strategy is designed to promote efficient information interaction among multi-level contextual features and further improve the contextual representability of the model. In addition, we introduce a new real-life RGB-D saliency dataset containing a variety of complex scenarios that has been widely used as a benchmark in recent RGB-D saliency detection research. Extensive empirical experiments demonstrate that our method can accurately identify salient objects and achieves appealing performance against 18 state-of-the-art RGB-D saliency models on nine benchmark datasets.
25. Xu Y, Yu X, Zhang J, Zhu L, Wang D. Weakly Supervised RGB-D Salient Object Detection With Prediction Consistency Training and Active Scribble Boosting. IEEE Transactions on Image Processing 2022; 31:2148-2161. [PMID: 35196231] [DOI: 10.1109/tip.2022.3151999]
Abstract
RGB-D salient object detection (SOD) has attracted increasing attention as it produces more robust results in complex scenes than RGB SOD. However, state-of-the-art RGB-D SOD approaches rely heavily on a large amount of pixel-wise annotated data for training. Such densely labeled annotations are often labor-intensive and costly. To reduce the annotation burden, we investigate RGB-D SOD from a weakly supervised perspective. More specifically, we use annotator-friendly scribble annotations as supervision signals for model training. Since scribble annotations are much sparser than ground-truth masks, some critical object structure information might be neglected. To preserve such structure information, we explicitly exploit the complementary edge information from the two modalities (i.e., RGB and depth). Specifically, we leverage dual-modal edge guidance and introduce a new network architecture with a dual-edge detection module and a modality-aware feature fusion module. In order to use the useful information of unlabeled pixels, we introduce a prediction consistency training scheme that compares the predictions of two networks optimized by different strategies. Moreover, we develop an active scribble boosting strategy to provide extra supervision signals at negligible annotation cost, leading to significant SOD performance improvement. Extensive experiments on seven benchmarks validate the superiority of our proposed method. Remarkably, the proposed method with scribble annotations achieves competitive performance in comparison to fully supervised state-of-the-art methods.
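The prediction consistency idea can be sketched as a loss with two terms: a partial cross-entropy on scribble-labeled pixels plus an agreement term between the predictions of two differently optimized networks on all pixels. The exact losses and weighting used in the paper are not specified here; this is an assumed formulation for illustration only.

```python
import torch
import torch.nn.functional as F

def weak_consistency_loss(pred_a, pred_b, scribble, alpha: float = 0.5):
    """pred_a, pred_b: (B, 1, H, W) saliency logits from two networks.
    scribble: (B, 1, H, W) with 1 = salient, 0 = background, -1 = unlabeled."""
    labeled = scribble >= 0
    # Partial BCE only on scribbled pixels (applied here to the first network).
    sup = F.binary_cross_entropy_with_logits(pred_a[labeled], scribble[labeled].float())
    # Consistency between the two networks on every pixel, labeled or not.
    cons = F.mse_loss(torch.sigmoid(pred_a), torch.sigmoid(pred_b))
    return sup + alpha * cons

p1, p2 = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
scr = torch.full((2, 1, 64, 64), -1)
scr[:, :, 30:34, 30:34] = 1   # a foreground scribble
scr[:, :, :4, :4] = 0         # a background scribble
print(weak_consistency_loss(p1, p2, scr).item())
```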
26. Wang F, Pan J, Xu S, Tang J. Learning Discriminative Cross-Modality Features for RGB-D Saliency Detection. IEEE Transactions on Image Processing 2022; 31:1285-1297. [PMID: 35015637] [DOI: 10.1109/tip.2022.3140606]
Abstract
How to explore useful information from depth is key to the success of RGB-D saliency detection methods. Because the RGB and depth images come from different domains, the modality gap leads to unsatisfactory results when features are simply concatenated. For better performance, most methods focus on bridging this gap by designing different cross-modal fusion modules for features, while neglecting to explicitly extract useful consistent information from them. To overcome this problem, we develop a simple yet effective RGB-D saliency detection method by learning discriminative cross-modality features based on a deep neural network. The proposed method first learns modality-specific features for the RGB and depth inputs. We then separately calculate the correlations of every pixel pair in a cross-modality consistent way, i.e., the distribution ranges are consistent for the correlations calculated from features extracted from the RGB input (RGB correlation) or the depth input (depth correlation). From different perspectives, color or spatial, the RGB and depth correlations end up at the same point: depicting how tightly each pixel pair is related. Second, to gather RGB and depth information in a complementary way, we propose a novel correlation-fusion to fuse the RGB and depth correlations, resulting in a cross-modality correlation. Finally, the features are refined with both long-range cross-modality correlations and local depth correlations to predict saliency maps, in which the long-range cross-modality correlation provides context information for accurate localization, and the local depth correlation preserves subtle structures for fine segmentation. In addition, a lightweight DepthNet is designed for efficient depth feature extraction. We train the proposed network in an end-to-end manner. Both quantitative and qualitative experimental results demonstrate that the proposed algorithm achieves favorable performance against state-of-the-art methods.
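The pixel-pair correlation described above can be sketched as two non-local-style affinity matrices, one from RGB features and one from depth features, computed in the same normalized range and then fused into a single cross-modality correlation. The cosine normalization and the averaging fusion below are assumptions; the paper's correlation-fusion is more elaborate.

```python
import torch
import torch.nn.functional as F

def modality_correlation(feat: torch.Tensor) -> torch.Tensor:
    """(C, H, W) features -> (HW, HW) cosine-similarity matrix, values in [-1, 1]."""
    C, H, W = feat.shape
    f = F.normalize(feat.reshape(C, H * W), dim=0)  # unit-norm feature per location
    return f.t() @ f

def fused_correlation(f_rgb: torch.Tensor, f_depth: torch.Tensor) -> torch.Tensor:
    """Combine the RGB and depth correlations into one cross-modality correlation."""
    return 0.5 * (modality_correlation(f_rgb) + modality_correlation(f_depth))

corr = fused_correlation(torch.randn(64, 16, 16), torch.randn(64, 16, 16))
print(corr.shape)  # torch.Size([256, 256])
```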
27. Chen T, Hu X, Xiao J, Zhang G, Wang S. CFIDNet: cascaded feature interaction decoder for RGB-D salient object detection. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06845-3]
28. Zhou W, Liu C, Lei J, Yu L, Luo T. HFNet: Hierarchical feedback network with multilevel atrous spatial pyramid pooling for RGB-D saliency detection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.11.100]
29.
30. Yang S, Lin W, Lin G, Jiang Q, Liu Z. Progressive Self-Guided Loss for Salient Object Detection. IEEE Transactions on Image Processing 2021; 30:8426-8438. [PMID: 34606454] [DOI: 10.1109/tip.2021.3113794]
Abstract
We present a simple yet effective progressive self-guided loss function to facilitate deep learning-based salient object detection (SOD) in images. The saliency maps produced by the most relevant works still suffer from incomplete predictions due to the internal complexity of salient objects. Our proposed progressive self-guided loss simulates a morphological closing operation on the model predictions to progressively create auxiliary training supervision that guides the training process step by step. We demonstrate that this new loss function can guide the SOD model to highlight more complete salient objects step by step and, meanwhile, helps uncover the spatial dependencies of salient object pixels in a region-growing manner. Moreover, a new feature aggregation module is proposed to capture multi-scale features and aggregate them adaptively by a branch-wise attention mechanism. Benefiting from this module, our SOD framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively. Experimental results on several benchmark datasets show that our loss function not only advances the performance of existing SOD models without architecture modification but also helps our proposed framework achieve state-of-the-art performance.
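The self-guided supervision can be sketched as follows: a morphological closing (dilation then erosion, implemented with max-pooling) of the current prediction forms a more complete auxiliary target that supplements the ground-truth loss. The kernel size, the auxiliary weight, and the way the closed prediction is used as a target are assumptions made only to convey the idea, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def closing(x: torch.Tensor, k: int = 9) -> torch.Tensor:
    """Morphological closing of a soft map (B, 1, H, W) via max/min pooling."""
    pad = k // 2
    dilated = F.max_pool2d(x, k, stride=1, padding=pad)
    return -F.max_pool2d(-dilated, k, stride=1, padding=pad)

def self_guided_loss(pred_logits: torch.Tensor, gt: torch.Tensor, aux_weight: float = 0.5):
    pred = torch.sigmoid(pred_logits)
    with torch.no_grad():
        aux_target = closing(pred)  # a more complete, hole-free version of the prediction
    main = F.binary_cross_entropy_with_logits(pred_logits, gt)
    aux = F.binary_cross_entropy(pred, aux_target)
    return main + aux_weight * aux

loss = self_guided_loss(torch.randn(2, 1, 64, 64),
                        torch.randint(0, 2, (2, 1, 64, 64)).float())
print(loss.item())
```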
31. Zhao Y, Zhao J, Li J, Chen X. RGB-D Salient Object Detection With Ubiquitous Target Awareness. IEEE Transactions on Image Processing 2021; 30:7717-7731. [PMID: 34478368] [DOI: 10.1109/tip.2021.3108412]
Abstract
Conventional RGB-D salient object detection methods aim to leverage depth as complementary information to find the salient regions in both modalities. However, the salient object detection results heavily rely on the quality of the captured depth data, which is sometimes unavailable. In this work, we make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework. This framework relies only on RGB data in the testing phase, utilizing captured depth data as supervision for representation learning. To construct our framework and achieve accurate salient object detection results, we propose a Ubiquitous Target Awareness (UTA) network to solve three important challenges in the RGB-D SOD task: 1) a depth awareness module to excavate depth information and to mine ambiguous regions via adaptive depth-error weights; 2) a spatial-aware cross-modal interaction and a channel-aware cross-level interaction, exploiting the low-level boundary cues and amplifying high-level salient channels; and 3) a gated multi-scale predictor module to perceive object saliency at different contextual scales. Besides its high performance, our proposed UTA network is depth-free at inference and runs in real time at 43 FPS. Experimental evidence demonstrates that our proposed network not only surpasses the state-of-the-art methods on five public RGB-D SOD benchmarks by a large margin, but also verifies its extensibility on five public RGB SOD benchmarks.
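The adaptive depth-error weighting can be illustrated with a simple scheme: regions where an auxiliary depth prediction disagrees with the captured depth are treated as ambiguous and emphasized in the saliency loss. The weighting function below is an assumption used only to convey the idea, not the module in the paper.

```python
import torch
import torch.nn.functional as F

def depth_error_weighted_loss(sal_logits, sal_gt, depth_pred, depth_gt, gamma: float = 1.0):
    """Per-pixel saliency loss reweighted by the depth prediction error.

    sal_logits, sal_gt:   (B, 1, H, W) saliency logits and binary ground truth
    depth_pred, depth_gt: (B, 1, H, W) predicted and captured depth, scaled to [0, 1]
    """
    depth_err = (depth_pred - depth_gt).abs().detach()
    weight = 1.0 + gamma * depth_err  # emphasize ambiguous (high-error) regions
    per_pixel = F.binary_cross_entropy_with_logits(sal_logits, sal_gt, reduction="none")
    return (weight * per_pixel).mean()

loss = depth_error_weighted_loss(torch.randn(1, 1, 64, 64),
                                 torch.randint(0, 2, (1, 1, 64, 64)).float(),
                                 torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))
print(loss.item())
```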
32. Luo H, Han G, Wu X, Liu P, Yang H, Zhang X. LF3Net: Leader-follower feature fusing network for fast saliency detection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.080]
33. Wang X, Li S, Chen C, Hao A, Qin H. Depth quality-aware selective saliency fusion for RGB-D image salient object detection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.071]
34. Li G, Liu Z, Chen M, Bai Z, Lin W, Ling H. Hierarchical Alternate Interaction Network for RGB-D Salient Object Detection. IEEE Transactions on Image Processing 2021; 30:3528-3542. [PMID: 33667161] [DOI: 10.1109/tip.2021.3062689]
Abstract
Existing RGB-D Salient Object Detection (SOD) methods take advantage of depth cues to improve detection accuracy, while paying insufficient attention to the quality of the depth information. In practice, a depth map often has uneven quality and sometimes suffers from distractors, due to various factors in the acquisition procedure. In this article, to mitigate distractors in depth maps and highlight salient objects in RGB images, we propose a Hierarchical Alternate Interaction Network (HAINet) for RGB-D SOD. Specifically, HAINet consists of three key stages: feature encoding, cross-modal alternate interaction, and saliency reasoning. The main innovation in HAINet is the Hierarchical Alternate Interaction Module (HAIM), which plays a key role in the second stage for cross-modal feature interaction. HAIM first uses RGB features to filter distractors in the depth features, and the purified depth features are then exploited to enhance the RGB features in turn. This alternate RGB-depth-RGB interaction proceeds in a hierarchical manner, progressively integrating local and global contexts within a single feature scale. In addition, we adopt a hybrid loss function to facilitate the training of HAINet. Extensive experiments on seven datasets demonstrate that our HAINet not only achieves competitive performance compared with 19 relevant state-of-the-art methods, but also reaches a real-time processing speed of 43 fps on a single NVIDIA Titan X GPU. The code and results of our method are available at https://github.com/MathLee/HAINet.
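The alternate interaction can be sketched in two steps: RGB features first produce a spatial gate that suppresses distractors in the depth features, and the purified depth features then produce a gate that enhances the RGB features. This is a simplified, assumed form of HAIM that ignores its hierarchical local/global design.

```python
import torch
import torch.nn as nn

class AlternateInteraction(nn.Module):
    """RGB filters depth, then the purified depth enhances RGB (single scale)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.rgb_to_gate = nn.Sequential(nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid())
        self.depth_to_gate = nn.Sequential(nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, f_rgb, f_depth):
        f_depth_pure = f_depth * self.rgb_to_gate(f_rgb)             # suppress depth distractors
        f_rgb_enh = f_rgb * (1 + self.depth_to_gate(f_depth_pure))   # enhance RGB in turn
        return f_rgb_enh

m = AlternateInteraction(64)
print(m(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56)).shape)
```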
35. Ju M, Ding C, Ren W, Yang Y, Zhang D, Guo YJ. IDE: Image Dehazing and Exposure Using an Enhanced Atmospheric Scattering Model. IEEE Transactions on Image Processing 2021; 30:2180-2192. [PMID: 33476267] [DOI: 10.1109/tip.2021.3050643]
Abstract
The atmospheric scattering model (ASM) is one of the most widely used models for describing the imaging process of hazy images. However, we found that ASM has an intrinsic limitation that leads to a dim effect in the recovered results. In this paper, by introducing a new parameter, i.e., a light absorption coefficient, into ASM, an enhanced ASM (EASM) is obtained that can address the dim effect and better model outdoor hazy scenes. Relying on this EASM, a simple yet effective gray-world-assumption-based technique called IDE is then developed to enhance the visibility of hazy images. Experimental results show that IDE eliminates the dim effect and exhibits excellent dehazing performance. It is worth mentioning that IDE does not require any training process or extra information related to scene depth, which makes it very fast and robust. Moreover, the global stretch strategy used in IDE effectively avoids undesirable effects in the recovered results, e.g., over-enhancement, over-saturation, and mist residue. Comparisons between the proposed IDE and other state-of-the-art techniques reveal the superiority of IDE in terms of both dehazing quality and efficiency over all the compared techniques.
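For context, the classical atmospheric scattering model that the paper extends can be written as below. The second equation is only a schematic placement of a light absorption coefficient; the exact form of the enhanced model is given in the paper and is not reproduced here.

```latex
% Classical atmospheric scattering model (ASM):
%   I(x): observed hazy image, J(x): scene radiance, A: global atmospheric light,
%   t(x) = e^{-\beta d(x)}: transmission along scene depth d(x).
\[
I(x) = J(x)\,t(x) + A\bigl(1 - t(x)\bigr)
\]
% Schematic enhanced ASM (EASM): an absorption coefficient m modulates the direct
% transmission term (placement of m is illustrative, not the paper's exact equation).
\[
I(x) = m\,J(x)\,t(x) + A\bigl(1 - t(x)\bigr)
\]
```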
36. Zhou T, Fan DP, Cheng MM, Shen J, Shao L. RGB-D salient object detection: A survey. Computational Visual Media 2021; 7:37-69. [PMID: 33432275] [PMCID: PMC7788385] [DOI: 10.1007/s41095-020-0199-z]
Abstract
Salient object detection, which simulates human visual perception in locating the most significant object(s) in a scene, has been widely applied to various computer vision tasks. Now, the advent of depth sensors means that depth maps can easily be captured; this additional spatial information can boost the performance of salient object detection. Although various RGB-D based salient object detection models with promising performance have been proposed over the past several years, an in-depth understanding of these models and the challenges in this field remains lacking. In this paper, we provide a comprehensive survey of RGB-D based salient object detection models from various perspectives, and review related benchmark datasets in detail. Further, as light fields can also provide depth maps, we review salient object detection models and popular benchmark datasets from this domain too. Moreover, to investigate the ability of existing models to detect salient objects, we have carried out a comprehensive attribute-based evaluation of several representative RGB-D based salient object detection models. Finally, we discuss several challenges and open directions of RGB-D based salient object detection for future research. All collected models, benchmark datasets, datasets constructed for attribute-based evaluation, and related code are publicly available at https://github.com/taozh2017/RGBD-SODsurvey.
Affiliation(s)
- Tao Zhou: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Deng-Ping Fan: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Jianbing Shen: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Ling Shao: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates