1. Fu H, Wang S, Duan P, Xiao C, Dian R, Li S, Li Z. LRAF-Net: Long-Range Attention Fusion Network for Visible-Infrared Object Detection. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13232-13245. [PMID: 37279125] [DOI: 10.1109/tnnls.2023.3266452]
Abstract
Visible-infrared object detection aims to improve detector performance by fusing the complementary information of visible and infrared images. However, most existing methods only use local intramodality information to enhance the feature representation while ignoring the efficient latent interaction of long-range dependence between different modalities, which leads to unsatisfactory detection performance in complex scenes. To solve these problems, we propose a feature-enhanced long-range attention fusion network (LRAF-Net), which improves detection performance by fusing the long-range dependence of the enhanced visible and infrared features. First, a two-stream CSPDarknet53 network is used to extract the deep features from visible and infrared images, in which a novel data augmentation (DA) method based on asymmetric complementary masks is designed to reduce the bias toward a single modality. Then, we propose a cross-feature enhancement (CFE) module to improve the intramodality feature representation by exploiting the discrepancy between visible and infrared images. Next, we propose a long-range dependence fusion (LDF) module to fuse the enhanced features by associating the positional encoding of the multimodality features. Finally, the fused features are fed into a detection head to obtain the final detection results. Experiments on several public datasets, i.e., VEDAI, FLIR, and LLVIP, show that the proposed method obtains state-of-the-art performance compared with other methods.
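
The general idea of fusing long-range cross-modal dependence can be illustrated with a minimal PyTorch cross-attention sketch: visible and infrared feature maps are flattened into token sequences, positional encodings are added, and each modality attends to the other before merging. The class name, shapes, and positional-encoding choice below are assumptions for illustration, not the authors' LRAF-Net implementation.

```python
# Minimal sketch of cross-modal long-range attention fusion (illustrative only;
# module names and dimensions are assumptions, not the authors' implementation).
import torch
import torch.nn as nn


class LongRangeFusion(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4, hw: int = 20 * 20):
        super().__init__()
        # Learnable positional encoding shared by both modalities.
        self.pos = nn.Parameter(torch.zeros(1, hw, channels))
        self.vis_to_ir = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.ir_to_vis = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, vis: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        b, c, h, w = vis.shape
        v = vis.flatten(2).transpose(1, 2) + self.pos   # (B, HW, C) visible tokens
        t = ir.flatten(2).transpose(1, 2) + self.pos    # (B, HW, C) infrared tokens
        # Each modality queries the other to capture long-range cross-modal cues.
        v_enh, _ = self.vis_to_ir(query=v, key=t, value=t)
        t_enh, _ = self.ir_to_vis(query=t, key=v, value=v)
        v_map = v_enh.transpose(1, 2).reshape(b, c, h, w)
        t_map = t_enh.transpose(1, 2).reshape(b, c, h, w)
        return self.merge(torch.cat([v_map, t_map], dim=1))


fused = LongRangeFusion(channels=64)(torch.randn(2, 64, 20, 20), torch.randn(2, 64, 20, 20))
print(fused.shape)  # torch.Size([2, 64, 20, 20])
```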
2. Yue H, Guo J, Yin X, Zhang Y, Zheng S. Salient object detection in low-light RGB-T scene via spatial-frequency cues mining. Neural Netw 2024; 178:106406. [PMID: 38838393] [DOI: 10.1016/j.neunet.2024.106406]
Abstract
Low-light conditions pose significant challenges to vision tasks, such as salient object detection (SOD), due to insufficient photons. Light-insensitive RGB-T SOD models mitigate the above problems to some extent, but they are limited in performance as they only focus on spatial feature fusion while ignoring the frequency discrepancy. To this end, we propose an RGB-T SOD model for low-light scenes that mines spatial-frequency cues, called SFMNet. Our SFMNet consists of spatial-frequency feature exploration (SFFE) modules and spatial-frequency feature interaction (SFFI) modules. To be specific, the SFFE module aims to separate spatial-frequency features and adaptively extract high- and low-frequency features. Moreover, the SFFI module integrates cross-modality and cross-domain information to capture effective feature representations. By deploying both modules in a top-down pathway, our method generates high-quality saliency predictions. Furthermore, we construct the first low-light RGB-T SOD dataset as a benchmark for evaluating performance. Extensive experiments demonstrate that our SFMNet can achieve higher accuracy than the existing models for low-light scenes.
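
The idea of separating a feature map into high- and low-frequency components can be sketched with a basic FFT-based split. The cutoff radius and shapes below are assumptions for the sketch; this is not the SFFE module itself.

```python
# Illustrative high/low-frequency split of a feature map in the Fourier domain.
import torch


def frequency_split(feat: torch.Tensor, radius: float = 0.25):
    """Return (low_freq, high_freq) parts of a (B, C, H, W) feature map."""
    b, c, h, w = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat, norm="ortho"), dim=(-2, -1))
    # Build a centered circular low-pass mask in normalized frequency coordinates.
    yy = torch.linspace(-0.5, 0.5, h).view(h, 1).expand(h, w)
    xx = torch.linspace(-0.5, 0.5, w).view(1, w).expand(h, w)
    low_mask = ((yy ** 2 + xx ** 2).sqrt() <= radius).to(feat.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * low_mask, dim=(-2, -1)), norm="ortho").real
    high = feat - low  # complementary high-frequency residual
    return low, high


low, high = frequency_split(torch.randn(2, 32, 64, 64))
print(low.shape, high.shape)
```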
Affiliation(s)
- Huihui Yue: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Jichang Guo: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Xiangjun Yin: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Yi Zhang: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Sida Zheng: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
3. Liu B. The analysis of art design under improved convolutional neural network based on the Internet of Things technology. Sci Rep 2024; 14:21113. [PMID: 39256455] [PMCID: PMC11387743] [DOI: 10.1038/s41598-024-72343-w]
Abstract
This work aims to explore the application of an improved convolutional neural network (CNN) combined with Internet of Things (IoT) technology in art design education and teaching. The development of IoT technology has created new opportunities for art design education, while deep learning and improved CNN models can provide more accurate and effective tools for image processing and analysis. To enhance the effectiveness of art design teaching and students' creative expression, this work proposes an improved CNN model. In the model construction, the number of convolutional layers and neurons is increased, and batch normalization and dropout layers are incorporated to enhance feature extraction capabilities and reduce overfitting. In addition, this work creates an experimental environment using IoT technology, capturing art image samples and environmental data with cameras, sensors, and other devices. In the model application phase, image samples undergo preprocessing and are input into the CNN for feature extraction. Sensor data are concatenated with the image feature vectors and fed into the fully connected layers to comprehensively understand the artwork. Finally, the model is trained with a cross-entropy loss function and L2 regularization, and hyperparameters are adjusted to optimize performance. The results indicate that the improved CNN model can effectively acquire art sample data and student creative expression data, providing accurate and timely feedback and guidance for art design education and teaching, with promising applications. This work offers new insights and methods for the development of art design education.
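
A minimal sketch of the kind of model the abstract describes, an image CNN with batch normalization and dropout whose features are concatenated with a sensor vector before the fully connected layers, is given below. All layer sizes, the sensor dimensionality, the class count, and the optimizer settings are assumptions, not the paper's exact configuration.

```python
# Sketch of an image CNN whose features are concatenated with IoT sensor data.
import torch
import torch.nn as nn


class ArtDesignNet(nn.Module):
    def __init__(self, sensor_dim: int = 8, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(
            nn.Linear(128 + sensor_dim, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, image: torch.Tensor, sensors: torch.Tensor) -> torch.Tensor:
        img_feat = self.features(image).flatten(1)     # (B, 128) image descriptor
        joint = torch.cat([img_feat, sensors], dim=1)  # append sensor readings
        return self.classifier(joint)


model = ArtDesignNet()
criterion = nn.CrossEntropyLoss()
# L2 regularization expressed as weight decay (an assumed stand-in for the paper's setup).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
logits = model(torch.randn(4, 3, 128, 128), torch.randn(4, 8))
loss = criterion(logits, torch.randint(0, 10, (4,)))
loss.backward()
optimizer.step()
print(logits.shape, loss.item())
```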
Affiliation(s)
- Bo Liu: Shandong Institute of Petroleum and Chemical Technology, Dongying 257000, China
4. Ma X, Li T, Deng J, Li T, Li J, Chang C, Wang R, Li G, Qi T, Hao S. Infrared and Visible Image Fusion Algorithm Based on Double-Domain Transform Filter and Contrast Transform Feature Extraction. Sensors (Basel) 2024; 24:3949. [PMID: 38931733] [PMCID: PMC11207559] [DOI: 10.3390/s24123949]
Abstract
Current challenges in visible and infrared image fusion include color information distortion, texture detail loss, and target edge blur. To address these issues, a fusion algorithm based on a double-domain transform filter and nonlinear contrast transform feature extraction (DDCTFuse) is proposed. First, to address the incomplete detail extraction of traditional transform-domain image decomposition, an adaptive high-pass filter is proposed to decompose images into high-frequency and low-frequency portions. Second, to address the blurred fusion targets caused by contrast loss during fusion, a novel feature extraction algorithm is devised based on a novel nonlinear transform function. Finally, the fusion results are optimized and color-corrected by the proposed spatial-domain logical filter to resolve the color loss and edge blur generated in the fusion process. To validate the benefits of the proposed algorithm, nine classical algorithms are compared on the LLVIP, MSRS, INO, and Roadscene datasets. The results of these experiments indicate that the proposed fusion algorithm produces distinct targets, preserves comprehensive scene information, and offers significant image contrast.
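
The two-band decomposition plus nonlinear contrast boost described here can be sketched as follows. A generic Gaussian low-pass filter and a tanh contrast curve are used as stand-ins; the paper's adaptive high-pass filter, transform function, and logical filter are not reproduced.

```python
# Illustrative two-band decomposition and nonlinear contrast boost for fusion.
import torch
import torch.nn.functional as F


def gaussian_kernel(size: int = 11, sigma: float = 2.0) -> torch.Tensor:
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    kernel_1d = (g / g.sum()).view(1, -1)
    return (kernel_1d.t() @ kernel_1d).view(1, 1, size, size)


def decompose(img: torch.Tensor, kernel: torch.Tensor):
    """Split a (B, 1, H, W) image into a low-frequency base and high-frequency detail."""
    low = F.conv2d(img, kernel, padding=kernel.shape[-1] // 2)
    return low, img - low


def contrast_transform(detail: torch.Tensor, gain: float = 3.0) -> torch.Tensor:
    # Nonlinear curve that amplifies weak details while saturating strong ones.
    return torch.tanh(gain * detail)


kernel = gaussian_kernel()
vis, ir = torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128)
vis_low, vis_high = decompose(vis, kernel)
ir_low, ir_high = decompose(ir, kernel)
# Simple fusion rule for the sketch: average the bases, keep the stronger boosted detail.
fused = 0.5 * (vis_low + ir_low) + torch.maximum(contrast_transform(vis_high),
                                                 contrast_transform(ir_high))
print(fused.shape)
```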
Affiliation(s)
- Xu Ma: College of Safety Science and Engineering, Xi’an University of Science and Technology, Xi’an 710054, China; College of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
- Tianqi Li: College of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
- Jun Deng: College of Safety Science and Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
- Tong Li: College of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
- Jiahao Li: College of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
- Chi Chang: College of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
- Rui Wang: College of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
- Guoliang Li: College of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
- Tianrui Qi: College of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
- Shuai Hao: College of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
5. Peng D, Zhou W, Pan J, Wang D. MSEDNet: Multi-scale fusion and edge-supervised network for RGB-T salient object detection. Neural Netw 2024; 171:410-422. [PMID: 38141476] [DOI: 10.1016/j.neunet.2023.12.031]
Abstract
RGB-T salient object detection (SOD) aims to accurately segment salient regions in paired visible-light and thermal infrared images. However, most existing SOD methods neglect the critical complementarity between the two modalities, which could further improve detection accuracy. Therefore, this work introduces the MSEDNet RGB-T SOD method. We utilize an encoder to extract multi-level features from both visible-light and thermal infrared images, which are subsequently categorized into high, medium, and low levels. Additionally, we propose three separate feature fusion modules to comprehensively extract complementary information between the modalities during fusion. These modules are applied to specific feature levels: the Edge Dilation Sharpening module for low-level features, the Spatial and Channel-Aware module for mid-level features, and the Cross-Residual Fusion module for high-level features. Finally, we introduce an edge fusion loss function for supervised learning, which effectively extracts edge information from different modalities and suppresses background noise. Comparative experiments demonstrate the superiority of the proposed MSEDNet over other state-of-the-art methods. The code and results can be found at the following link: https://github.com/Zhou-wy/MSEDNet.
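
Edge supervision of this kind is commonly implemented by deriving an edge map from the ground-truth mask and adding a weighted edge term to the loss. The sketch below follows that generic recipe under assumed weights; it is not the MSEDNet loss itself.

```python
# Sketch of an edge-supervised loss: edges are derived from the ground-truth mask
# with a morphological gradient and supervised with BCE.
import torch
import torch.nn.functional as F


def edge_map(mask: torch.Tensor, ksize: int = 3) -> torch.Tensor:
    """Morphological gradient (dilation - erosion) of a binary (B, 1, H, W) mask."""
    pad = ksize // 2
    dilated = F.max_pool2d(mask, ksize, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, ksize, stride=1, padding=pad)
    return (dilated - eroded).clamp(0, 1)


def edge_supervised_loss(pred_sal: torch.Tensor, pred_edge: torch.Tensor,
                         gt_mask: torch.Tensor, edge_weight: float = 1.0) -> torch.Tensor:
    gt_edge = edge_map(gt_mask)
    sal_loss = F.binary_cross_entropy_with_logits(pred_sal, gt_mask)
    edge_loss = F.binary_cross_entropy_with_logits(pred_edge, gt_edge)
    return sal_loss + edge_weight * edge_loss


gt = (torch.rand(2, 1, 64, 64) > 0.5).float()
loss = edge_supervised_loss(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64), gt)
print(loss.item())
```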
Affiliation(s)
- Daogang Peng: College of Automation Engineering, Shanghai University of Electric Power, 2588 Changyang Road, Yangpu, Shanghai 200090, China
- Weiyi Zhou: College of Automation Engineering, Shanghai University of Electric Power, 2588 Changyang Road, Yangpu, Shanghai 200090, China
- Junzhen Pan: College of Automation Engineering, Shanghai University of Electric Power, 2588 Changyang Road, Yangpu, Shanghai 200090, China
- Danhao Wang: College of Automation Engineering, Shanghai University of Electric Power, 2588 Changyang Road, Yangpu, Shanghai 200090, China
6. Lv C, Wan B, Zhou X, Sun Y, Zhang J, Yan C. Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection. Entropy (Basel) 2024; 26:130. [PMID: 38392385] [PMCID: PMC10888287] [DOI: 10.3390/e26020130]
Abstract
RGB-T salient object detection (SOD) has made significant progress in recent years. However, most existing works are based on heavy models, which are not applicable to mobile devices. Additionally, there is still room for improvement in the design of cross-modal and cross-level feature fusion. To address these issues, we propose a lightweight cross-modal information mutual reinforcement network for RGB-T SOD. Our network consists of a lightweight encoder, the cross-modal information mutual reinforcement (CMIMR) module, and the semantic-information-guided fusion (SIGF) module. To reduce the computational cost and the number of parameters, we employ lightweight modules in both the encoder and decoder. Furthermore, to fuse the complementary information between the two modalities, we design the CMIMR module to enhance the two-modal features. This module effectively refines the two-modal features by absorbing previous-level semantic information and inter-modal complementary information. In addition, to fuse cross-level features and detect multiscale salient objects, we design the SIGF module, which effectively suppresses background noise in low-level features and extracts multiscale information. We conduct extensive experiments on three RGB-T datasets, and our method achieves competitive performance compared with 15 other state-of-the-art methods.
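
Cross-modal mutual reinforcement is often realized with lightweight channel attention in which each modality gates the other. The sketch below illustrates that general pattern; the module names and reduction ratio are assumptions, not the CMIMR module.

```python
# Generic sketch of lightweight cross-modal mutual reinforcement via channel attention.
import torch
import torch.nn as nn


class ChannelGate(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)  # per-channel weights in (0, 1)


class MutualReinforcement(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.rgb_gate = ChannelGate(channels)
        self.thermal_gate = ChannelGate(channels)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        # Each modality is re-weighted by attention computed from the other modality.
        rgb_out = rgb + rgb * self.thermal_gate(thermal)
        thermal_out = thermal + thermal * self.rgb_gate(rgb)
        return rgb_out, thermal_out


r, t = MutualReinforcement(32)(torch.randn(2, 32, 40, 40), torch.randn(2, 32, 40, 40))
print(r.shape, t.shape)
```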
Affiliation(s)
- Chengtao Lv: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Bin Wan: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Xiaofei Zhou: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Yaoqi Sun: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; Lishui Institute, Hangzhou Dianzi University, Lishui 323000, China
- Jiyong Zhang: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Chenggang Yan: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
7. Huo D, Wang J, Qian Y, Yang YH. Glass Segmentation with RGB-Thermal Image Pairs. IEEE Transactions on Image Processing 2023; PP:1911-1926. [PMID: 37030759] [DOI: 10.1109/tip.2023.3256762]
Abstract
This paper proposes a new glass segmentation method utilizing paired RGB and thermal images. Because most glass is transparent to visible light but opaque to thermal energy, the transmission properties of the two modalities through glass differ greatly, so glass regions of a scene are more distinguishable from a pair of RGB and thermal images than from an RGB image alone. To exploit this property, we propose a neural network architecture that combines an RGB-thermal image pair with a new attention-based multi-modal fusion module and integrates a CNN and a transformer to extract local features and non-local dependencies, respectively. In addition, we have collected a new dataset containing 5551 RGB-thermal image pairs with ground-truth segmentation annotations. Qualitative and quantitative evaluations demonstrate the effectiveness of the proposed approach for fusing RGB and thermal data for glass segmentation. Our code and data are available at https://github.com/Dong-Huo/RGB-T-Glass-Segmentation.
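
The combination of a CNN branch for local features with a self-attention branch for non-local dependencies can be sketched as a small hybrid block. Shapes, head counts, and the class name are assumptions for illustration, not the authors' network.

```python
# Sketch of a hybrid block: a depthwise-conv branch for local features plus a
# self-attention branch for non-local context.
import torch
import torch.nn as nn


class LocalGlobalBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depthwise conv
            nn.Conv2d(channels, channels, 1), nn.BatchNorm2d(channels), nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))     # (B, HW, C)
        global_feat, _ = self.attn(tokens, tokens, tokens)   # non-local dependencies
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return self.local(x) + global_feat                   # merge local and global cues


out = LocalGlobalBlock(64)(torch.randn(2, 64, 24, 24))
print(out.shape)
```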
8. Zhou W, Zhu Y, Lei J, Yang R, Yu L. LSNet: Lightweight Spatial Boosting Network for Detecting Salient Objects in RGB-Thermal Images. IEEE Transactions on Image Processing 2023; PP:1329-1340. [PMID: 37022901] [DOI: 10.1109/tip.2023.3242775]
Abstract
Most recent methods for RGB (red-green-blue)-thermal salient object detection (SOD) involve a large number of floating-point operations and parameters, resulting in slow inference, especially on common processors, and impeding their deployment on mobile devices for practical applications. To address these problems, we propose a lightweight spatial boosting network (LSNet) for efficient RGB-thermal SOD with a lightweight MobileNetV2 backbone in place of a conventional backbone (e.g., VGG, ResNet). To improve feature extraction with a lightweight backbone, we propose a boundary boosting algorithm that optimizes the predicted saliency maps and reduces information collapse in low-dimensional features. The algorithm generates boundary maps based on the predicted saliency maps without incurring additional calculations or complexity. As multimodality processing is essential for high-performance SOD, we adopt attentive feature distillation and selection and propose semantic and geometric transfer learning to enhance the backbone without increasing complexity during testing. Experimental results demonstrate that the proposed LSNet achieves state-of-the-art performance compared with 14 RGB-thermal SOD methods on three datasets, while reducing the number of floating-point operations (1.025G) and parameters (5.39M) and the model size (22.1 MB), and delivering high inference speed (9.95 fps for PyTorch, batch size of 1, and Intel i5-7500 processor; 93.53 fps for PyTorch, batch size of 1, and NVIDIA TITAN V graphics processor; 936.68 fps for PyTorch, batch size of 20, and graphics processor; 538.01 fps for TensorRT and batch size of 1; and 903.01 fps for TensorRT/FP16 and batch size of 1). The code and results can be found at https://github.com/zyrant/LSNet.
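
A boundary cue can be derived from a predicted saliency map without any learnable parameters, for instance as the difference between the map and a locally smoothed copy. The sketch below shows that generic idea under an assumed kernel size; it is not LSNet's boundary boosting algorithm.

```python
# Parameter-free sketch: derive a boundary cue from a predicted saliency map as the
# difference between the map and a locally smoothed copy.
import torch
import torch.nn.functional as F


def boundary_from_saliency(saliency: torch.Tensor, ksize: int = 5) -> torch.Tensor:
    """saliency: (B, 1, H, W) probabilities in [0, 1]; returns a normalized boundary map."""
    smoothed = F.avg_pool2d(saliency, ksize, stride=1, padding=ksize // 2)
    boundary = (saliency - smoothed).abs()
    return boundary / (boundary.amax(dim=(-2, -1), keepdim=True) + 1e-6)


sal = torch.sigmoid(torch.randn(2, 1, 96, 96))
print(boundary_from_saliency(sal).shape)  # torch.Size([2, 1, 96, 96])
```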
9. Pang Y, Zhao X, Zhang L, Lu H. CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection. IEEE Transactions on Image Processing 2023; PP:892-904. [PMID: 37018701] [DOI: 10.1109/tip.2023.3234702]
Abstract
Most existing bi-modal (RGB-D and RGB-T) salient object detection methods utilize the convolution operation and construct complex interweaved fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation imposes a performance ceiling on convolution-based methods. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down, transformer-based information propagation path. CAVER treats multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. In addition, considering the quadratic complexity with respect to the number of input tokens, we design a parameter-free patch-wise token re-embedding strategy to simplify operations. Extensive experimental results on RGB-D and RGB-T SOD datasets demonstrate that such a simple two-stream encoder-decoder framework can surpass recent state-of-the-art methods when equipped with the proposed components.
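
One way to illustrate parameter-free patch-wise token re-embedding is to pool tokens inside non-overlapping patches before they serve as keys and values, which shrinks the quadratic attention cost. The sketch below shows that idea under assumed patch size and shapes; it is not CAVER's exact strategy.

```python
# Sketch of parameter-free patch-wise token re-embedding: keys/values are average-
# pooled over non-overlapping patches to cut the quadratic attention cost.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchReEmbedAttention(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4, patch: int = 4):
        super().__init__()
        self.patch = patch
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)          # (B, H*W, C) full-resolution queries
        kv = F.avg_pool2d(x, self.patch)          # (B, C, H/p, W/p) pooled patches
        kv = kv.flatten(2).transpose(1, 2)        # far fewer key/value tokens
        out, _ = self.attn(q, kv, kv)             # cost ~ HW * (HW / p^2) instead of (HW)^2
        return out.transpose(1, 2).reshape(b, c, h, w)


out = PatchReEmbedAttention(64)(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```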
10. Wen H, Song K, Huang L, Wang H, Yan Y. Cross-modality salient object detection network with universality and anti-interference. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110322]
11. MENet: Lightweight Multimodality Enhancement Network for Detecting Salient Objects in RGB-Thermal Images. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.024]
12. Three-stream interaction decoder network for RGB-thermal salient object detection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110007]
13. Xu C, Li Q, Zhou Q, Jiang X, Yu D, Zhou Y. Asymmetric cross-modal activation network for RGB-T salient object detection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110047]
14. Bi H, Wu R, Liu Z, Zhang J, Zhang C, Xiang TZ, Wang X. PSNet: Parallel symmetric network for RGB-T salient object detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.052]
15. Fu B, Cao T, Zheng Y, Fang Z, Chen L, Wang Y, Wang Y, Wang Y. Polarization-driven camouflaged object segmentation via gated fusion. Applied Optics 2022; 61:8017-8027. [PMID: 36255923] [DOI: 10.1364/ao.466339]
Abstract
Polarization-based models for camouflaged object segmentation have recently attracted research attention. The main challenge in constructing such a model is to effectively fuse polarization and light-intensity features. We therefore propose a multi-modal camouflaged object segmentation method via gated fusion. First, a spatial positioning module is designed to perform channel calibration and global spatial attention alignment between the polarization and light-intensity modalities on high-level feature representations, so that objects are located accurately. Then, a gated fusion module (GFM) is designed to selectively fuse the object information contained in the polarization and light-intensity features; semantic information from the positioning features is introduced into the GFM to guide each modality to aggregate its dominant features. Finally, the features of each layer are aggregated to obtain an accurate segmentation map. Considering the lack of public training and evaluation data for light-intensity-polarization (I-P) camouflaged detection, we also build a light I-P camouflaged detection dataset. Experimental results demonstrate that our proposed method outperforms other typical multi-modal segmentation methods on this dataset.
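
Gated fusion of two modalities is typically realized by predicting a spatial gate from the concatenated features and blending the modalities with it. The sketch below illustrates that generic pattern under assumed shapes; it is not the paper's GFM.

```python
# Generic sketch of gated fusion: a learned spatial gate blends intensity and
# polarization features.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 1), nn.Sigmoid(),
        )

    def forward(self, intensity: torch.Tensor, polarization: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([intensity, polarization], dim=1))  # (B, 1, H, W) gate
        # g close to 1 keeps intensity cues, close to 0 favors polarization cues.
        return g * intensity + (1 - g) * polarization


fused = GatedFusion(32)(torch.randn(2, 32, 48, 48), torch.randn(2, 32, 48, 48))
print(fused.shape)
```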
16. Modal complementary fusion network for RGB-T salient object detection. Appl Intell 2022. [DOI: 10.1007/s10489-022-03950-1]
17. Tu Z, Li Z, Li C, Tang J. Weakly Alignment-Free RGBT Salient Object Detection With Deep Correlation Network. IEEE Transactions on Image Processing 2022; 31:3752-3764. [PMID: 35604973] [DOI: 10.1109/tip.2022.3176540]
Abstract
RGBT salient object detection (SOD) focuses on the common salient regions of a pair of visible and thermal infrared images. Existing methods operate on well-aligned RGBT image pairs, but captured image pairs are often unaligned, and aligning them requires considerable labor. To handle this problem, we propose a novel deep correlation network (DCNet), which explores the correlations between the RGB and thermal modalities, for weakly alignment-free RGBT SOD. In particular, DCNet includes a modality alignment module based on a spatial affine transformation, a feature-wise affine transformation, and dynamic convolution to model the strong correlation of the two modalities. Moreover, we propose a novel bi-directional decoder model that combines coarse-to-fine and fine-to-coarse processes for better feature enhancement. Specifically, we design a modality correlation ConvLSTM by adding the first two components of the modality alignment module and a global context reinforcement module into ConvLSTM, which is used to decode hierarchical features in both top-down and bottom-up manners. Extensive experiments on three public benchmark datasets show the remarkable performance of our method against state-of-the-art methods.
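
Alignment by a predicted spatial affine transformation follows the spatial-transformer pattern: a small regressor predicts affine parameters and the thermal features are resampled accordingly. The sketch below illustrates that pattern under assumed layer sizes; it is not DCNet's modality alignment module.

```python
# Spatial-transformer style sketch: predict a 2x3 affine matrix from both modalities
# and warp the thermal feature onto the RGB feature grid.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AffineAlign(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 6),
        )
        # Initialize to the identity transform so training starts from "no warp".
        nn.init.zeros_(self.loc[-1].weight)
        self.loc[-1].bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        theta = self.loc(torch.cat([rgb, thermal], dim=1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, size=rgb.shape, align_corners=False)
        return F.grid_sample(thermal, grid, align_corners=False)


aligned = AffineAlign(32)(torch.randn(2, 32, 40, 40), torch.randn(2, 32, 40, 40))
print(aligned.shape)
```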
18. RGB-T salient object detection via CNN feature and result saliency map fusion. Appl Intell 2022. [DOI: 10.1007/s10489-021-02984-1]
19. Zhou T, Fan DP, Cheng MM, Shen J, Shao L. RGB-D salient object detection: A survey. Computational Visual Media 2021; 7:37-69. [PMID: 33432275] [PMCID: PMC7788385] [DOI: 10.1007/s41095-020-0199-z]
Abstract
Salient object detection, which simulates human visual perception in locating the most significant object(s) in a scene, has been widely applied to various computer vision tasks. Now, the advent of depth sensors means that depth maps can easily be captured; this additional spatial information can boost the performance of salient object detection. Although various RGB-D based salient object detection models with promising performance have been proposed over the past several years, an in-depth understanding of these models and the challenges in this field remains lacking. In this paper, we provide a comprehensive survey of RGB-D based salient object detection models from various perspectives, and review related benchmark datasets in detail. Further, as light fields can also provide depth maps, we review salient object detection models and popular benchmark datasets from this domain too. Moreover, to investigate the ability of existing models to detect salient objects, we have carried out a comprehensive attribute-based evaluation of several representative RGB-D based salient object detection models. Finally, we discuss several challenges and open directions of RGB-D based salient object detection for future research. All collected models, benchmark datasets, datasets constructed for attribute-based evaluation, and related code are publicly available at https://github.com/taozh2017/RGBD-SODsurvey.
Affiliation(s)
- Tao Zhou: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Deng-Ping Fan: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Jianbing Shen: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Ling Shao: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates