1. Ferreira A, Li J, Pomykala KL, Kleesiek J, Alves V, Egger J. GAN-based generation of realistic 3D volumetric data: A systematic review and taxonomy. Med Image Anal 2024; 93:103100. [PMID: 38340545] [DOI: 10.1016/j.media.2024.103100]
Abstract
With the massive proliferation of data-driven algorithms, such as deep learning-based approaches, the availability of high-quality data is of great interest. Volumetric data are very important in medicine, with uses ranging from disease diagnosis to therapy monitoring. When sufficient data are available, models can be trained to help doctors with these tasks. Unfortunately, there are scenarios where large amounts of data are unavailable; for example, rare diseases and privacy concerns can restrict data availability. In non-medical fields, the high cost of obtaining enough high-quality data can also be a concern. A solution to these problems can be the generation of realistic synthetic data using Generative Adversarial Networks (GANs). Such mechanisms are a valuable asset, especially in healthcare, where the data must be of good quality, realistic, and free of privacy issues. Accordingly, most publications on volumetric GANs are within the medical domain. In this review, we summarize works that generate realistic volumetric synthetic data using GANs. We outline GAN-based methods in these areas along with common architectures, loss functions, and evaluation metrics, including their advantages and disadvantages. We present a novel taxonomy, evaluations, challenges, and research opportunities to provide a holistic overview of the current state of volumetric GANs.
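As a concrete anchor for the class of methods surveyed here, the following is a minimal, self-contained sketch of one 3D GAN training step in PyTorch; the architecture, the 32³ volume size, and all hyperparameters are illustrative assumptions, not any specific model from the review.

```python
# Minimal 3D GAN sketch (illustrative; not any specific surveyed model).
# The generator maps a latent vector to a 32^3 volume; the discriminator
# scores volumes with Conv3d layers.
import torch
import torch.nn as nn

class Generator3D(nn.Module):
    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(z_dim, 256, 4, 1, 0), nn.BatchNorm3d(256), nn.ReLU(True),  # 4^3
            nn.ConvTranspose3d(256, 128, 4, 2, 1), nn.BatchNorm3d(128), nn.ReLU(True),    # 8^3
            nn.ConvTranspose3d(128, 64, 4, 2, 1), nn.BatchNorm3d(64), nn.ReLU(True),      # 16^3
            nn.ConvTranspose3d(64, 1, 4, 2, 1), nn.Tanh(),                                # 32^3
        )
    def forward(self, z):                      # z: (B, z_dim)
        return self.net(z.view(z.size(0), -1, 1, 1, 1))

class Discriminator3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),    # 16^3
            nn.Conv3d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, True),  # 8^3
            nn.Conv3d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2, True), # 4^3
            nn.Conv3d(256, 1, 4, 1, 0),                            # 1^3 logit
        )
    def forward(self, x):
        return self.net(x).view(-1)

G, D = Generator3D(), Discriminator3D()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

real = torch.rand(4, 1, 32, 32, 32) * 2 - 1    # stand-in for a batch of real volumes
z = torch.randn(4, 128)

# Discriminator step: real -> 1, fake -> 0.
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(4)) + bce(D(fake), torch.zeros(4))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to fool the discriminator.
loss_g = bce(D(G(z)), torch.ones(4))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```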
Affiliation(s)
- André Ferreira: Center Algoritmi/LASI, University of Minho, Braga, 4710-057, Portugal; Computer Algorithms for Medicine Laboratory, Graz, Austria; Institute for AI in Medicine (IKIM), University Medicine Essen, Girardetstraße 2, Essen, 45131, Germany; Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, 52074 Aachen, Germany; Institute of Medical Informatics, University Hospital RWTH Aachen, 52074 Aachen, Germany.
- Jianning Li: Computer Algorithms for Medicine Laboratory, Graz, Austria; Institute for AI in Medicine (IKIM), University Medicine Essen, Girardetstraße 2, Essen, 45131, Germany; Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Hufelandstraße 55, Essen, 45147, Germany.
- Kelsey L Pomykala: Institute for AI in Medicine (IKIM), University Medicine Essen, Girardetstraße 2, Essen, 45131, Germany.
- Jens Kleesiek: Institute for AI in Medicine (IKIM), University Medicine Essen, Girardetstraße 2, Essen, 45131, Germany; Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Hufelandstraße 55, Essen, 45147, Germany; German Cancer Consortium (DKTK), Partner Site Essen, Hufelandstraße 55, Essen, 45147, Germany; Department of Physics, TU Dortmund University, Otto-Hahn-Straße 4, 44227 Dortmund, Germany.
- Victor Alves: Center Algoritmi/LASI, University of Minho, Braga, 4710-057, Portugal.
- Jan Egger: Computer Algorithms for Medicine Laboratory, Graz, Austria; Institute for AI in Medicine (IKIM), University Medicine Essen, Girardetstraße 2, Essen, 45131, Germany; Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Hufelandstraße 55, Essen, 45147, Germany; Institute of Computer Graphics and Vision, Graz University of Technology, Inffeldgasse 16, Graz, 8010, Austria.
2. Liu Z, Hayat M, Yang H, Peng D, Lei Y. Deep Hypersphere Feature Regularization for Weakly Supervised RGB-D Salient Object Detection. IEEE Trans Image Process 2023; 32:5423-5437. [PMID: 37773910] [DOI: 10.1109/tip.2023.3318953]
Abstract
We propose a weakly supervised approach for salient object detection from multi-modal RGB-D data. Our approach relies only on scribble labels, which are much easier to annotate than the dense labels used in the conventional fully supervised setting. In contrast to existing methods that apply supervision signals on the output space, our design regularizes the intermediate latent space to enhance discrimination between salient and non-salient objects. We further introduce a contour detection branch to implicitly constrain the semantic boundaries and achieve precise edges for detected salient objects. To enhance the long-range dependencies among local features, we introduce a Cross-Padding Attention Block (CPAB). Extensive experiments on seven benchmark datasets demonstrate that our method not only outperforms existing weakly supervised methods, but is also on par with several fully supervised state-of-the-art models. Code is available at https://github.com/leolyj/DHFR-SOD.
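Scribble supervision of the kind described above is commonly implemented as a partial cross-entropy that ignores unannotated pixels. The sketch below illustrates that generic recipe as an assumption for illustration; it is not the authors' released code, which additionally regularizes the latent space.

```python
# Partial cross-entropy for scribble supervision (a common weak-supervision
# recipe; illustrative, not the paper's implementation).
import torch
import torch.nn.functional as F

def partial_bce(logits, scribble, ignore_index=255):
    """logits: (B,1,H,W) raw saliency scores.
    scribble: (B,H,W) with 1 = salient stroke, 0 = background stroke,
    and ignore_index marking unannotated pixels (no gradient)."""
    mask = scribble != ignore_index
    if mask.sum() == 0:
        return logits.new_zeros(())
    target = scribble.clamp(max=1).float()       # masked pixels never contribute
    loss = F.binary_cross_entropy_with_logits(
        logits.squeeze(1), target, reduction="none")
    return loss[mask].mean()

logits = torch.randn(2, 1, 64, 64)
scribble = torch.full((2, 64, 64), 255)          # mostly unannotated
scribble[:, 10:12, :] = 1                        # a salient scribble
scribble[:, 50:52, :] = 0                        # a background scribble
print(partial_bce(logits, scribble))
```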
3. Belmekki MAA, Leach J, Tobin R, Buller GS, McLaughlin S, Halimi A. 3D target detection and spectral classification for single-photon LiDAR data. Opt Express 2023; 31:23729-23745. [PMID: 37475217] [DOI: 10.1364/oe.487896]
Abstract
3D single-photon LiDAR imaging plays an important role in many applications. However, full deployment of this modality will require the analysis of low signal-to-noise-ratio target returns and very high volumes of data. This is particularly evident when imaging through obscurants or under high ambient background light. This paper proposes a multiscale approach for 3D surface detection from the photon timing histogram that permits a significant reduction in data volume. The resulting surfaces are background-free and can be used to infer depth and reflectivity information about the target. We demonstrate this by proposing a hierarchical Bayesian model for 3D reconstruction and spectral classification of multispectral single-photon LiDAR data. The reconstruction method promotes spatial correlation between point-cloud estimates and uses a coordinate gradient descent algorithm for parameter estimation. Results on simulated and real data show the benefits of the proposed target detection and reconstruction approaches when compared to state-of-the-art processing algorithms.
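A minimal illustration of the starting point of such pipelines: detecting a surface return in a photon timing histogram with a matched filter and a robust background test. The instrument response, photon rates, and threshold below are invented for the example; the paper's hierarchical Bayesian model goes well beyond this baseline.

```python
# Surface detection in a photon timing histogram (illustrative matched-filter
# baseline with made-up rates; not the paper's multiscale Bayesian method).
import numpy as np

rng = np.random.default_rng(0)
T = 1024                                            # timing bins
irf = np.exp(-0.5 * (np.arange(-20, 21) / 4.0) ** 2)  # Gaussian stand-in IRF
irf /= irf.sum()

# Simulate: uniform ambient background plus a return at bin 400.
background, r, depth_bin = 1.0, 50.0, 400           # counts/bin, photons, bin
rate = np.full(T, background)
rate[depth_bin - 20:depth_bin + 21] += r * irf
hist = rng.poisson(rate)

# Matched filter: correlate the histogram with the IRF, then test the peak
# against a robust estimate of the background score level.
score = np.correlate(hist.astype(float), irf, mode="same")
bkg = np.median(score)
sigma = 1.4826 * np.median(np.abs(score - bkg))     # robust std via MAD
peak = int(np.argmax(score))
detected = score[peak] > bkg + 5 * sigma            # ~5-sigma rule of thumb

print(f"peak bin = {peak}, detected = {detected}")  # depth = peak bin * bin width
# Reflectivity can be estimated from the photon count in the peak window
# after subtracting the expected background contribution.
refl = hist[peak - 20:peak + 21].sum() - background * 41
print(f"estimated reflectivity ~ {refl:.1f} photons")
```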
4. Fan DP, Zhang J, Xu G, Cheng MM, Shao L. Salient Objects in Clutter. IEEE Trans Pattern Anal Mach Intell 2023; 45:2344-2366. [PMID: 35404809] [DOI: 10.1109/tpami.2022.3166451]
Abstract
In this paper, we identify and address a serious design bias of existing salient object detection (SOD) datasets, which unrealistically assume that each image should contain at least one clear and uncluttered salient object. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. However, these models are still far from satisfactory when applied to real-world scenes. Based on our analyses, we propose a new high-quality dataset and update the previous saliency benchmark. Specifically, our dataset, called Salient Objects in Clutter (SOC), includes images with both salient and non-salient objects from several common object categories. In addition to object category annotations, each salient image is accompanied by attributes that reflect common challenges in real-world scenes, which can help provide deeper insight into the SOD problem. Further, given a saliency encoder, e.g., the backbone network, existing saliency models are designed to learn a mapping from the training image set to the training ground-truth set. We therefore argue that improving the dataset can yield higher performance gains than focusing only on the decoder design. With this in mind, we investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy for learning from small datasets. Our extensive results demonstrate the effectiveness of these strategies. We also provide a comprehensive benchmark for SOD, available in our repository: https://github.com/DengPingFan/SODBenchmark.
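Of the dataset-enhancement strategies listed, label smoothing is the easiest to make concrete. The sketch below softens labels near mask boundaries with a box filter; this is one plausible reading of the idea, not necessarily the paper's exact recipe.

```python
# Boundary-softening label smoothing for saliency masks (a sketch of the
# dataset-enhancement idea; the paper's exact recipe may differ).
import torch
import torch.nn.functional as F

def smooth_saliency_mask(mask, kernel_size=5, eps=0.1):
    """mask: (B,1,H,W) binary ground truth in {0,1}.
    Averaging with a box filter softens labels near object boundaries, so
    the loss penalizes boundary pixels less sharply; eps additionally pulls
    all labels away from hard 0/1."""
    pad = kernel_size // 2
    kernel = torch.ones(1, 1, kernel_size, kernel_size) / kernel_size ** 2
    soft = F.conv2d(mask.float(), kernel, padding=pad)
    return soft * (1 - eps) + 0.5 * eps

mask = torch.zeros(1, 1, 32, 32)
mask[..., 8:24, 8:24] = 1.0
soft = smooth_saliency_mask(mask)
print(soft[0, 0, 7:10, 15])   # values transition smoothly across the boundary
```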
5. Pang Y, Zhao X, Zhang L, Lu H. CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection. IEEE Trans Image Process 2023; PP:892-904. [PMID: 37018701] [DOI: 10.1109/tip.2023.3234702]
Abstract
Most existing bi-modal (RGB-D and RGB-T) salient object detection methods utilize the convolution operation and construct complex interweaved fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation places a ceiling on the performance of convolution-based methods. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path. CAVER treats multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. In addition, considering the quadratic complexity with respect to the number of input tokens, we design a parameter-free patch-wise token re-embedding strategy to simplify operations. Extensive experimental results on RGB-D and RGB-T SOD datasets demonstrate that such a simple two-stream encoder-decoder framework can surpass recent state-of-the-art methods when equipped with the proposed components.
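The token re-embedding idea, reducing the key/value token count before attention so the cost drops below O(N²), can be sketched with simple average pooling. This is a generic, parameter-free stand-in for the concept, not CAVER's actual view-mixed formulation.

```python
# Parameter-free token re-embedding before cross-attention: pooling the
# key/value tokens cuts attention cost from O(N^2) to O(N*M) with M << N.
# (Generic sketch of the idea, not CAVER's exact strategy.)
import torch
import torch.nn.functional as F

def pooled_cross_attention(q_feat, kv_feat, patch=4):
    """q_feat, kv_feat: (B, C, H, W) features from the two modalities."""
    B, C, H, W = kv_feat.shape
    q = q_feat.flatten(2).transpose(1, 2)                  # (B, N, C), N = H*W
    kv = F.avg_pool2d(kv_feat, patch)                      # re-embed: (B, C, H/p, W/p)
    kv = kv.flatten(2).transpose(1, 2)                     # (B, M, C), M = N/p^2
    attn = torch.softmax(q @ kv.transpose(1, 2) / C ** 0.5, dim=-1)  # (B, N, M)
    out = attn @ kv                                        # (B, N, C)
    return out.transpose(1, 2).view(B, C, H, W)

rgb = torch.randn(2, 64, 32, 32)
depth = torch.randn(2, 64, 32, 32)
fused = pooled_cross_attention(rgb, depth)
print(fused.shape)   # torch.Size([2, 64, 32, 32])
```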
6. Conditional TransGAN-Based Data Augmentation for PCB Electronic Component Inspection. Comput Intell Neurosci 2023; 2023:2024237. [PMID: 36660560] [PMCID: PMC9845033] [DOI: 10.1155/2023/2024237]
Abstract
Automatic recognition and positioning of electronic components on PCBs can enhance quality-inspection efficiency for electronic products during manufacturing. Efficient PCB inspection requires identification and classification of PCB components as well as defects for better quality assurance. The small size of electronic components and PCB defect targets means that there are fewer feature areas for the neural network to detect, and the complex grain backgrounds of both datasets can cause significant interference, making the target detection task challenging. Meanwhile, the detection performance of deep learning models is significantly impaired by the lack of samples. In this paper, we propose conditional TransGAN (cTransGAN), a generative model for data augmentation, which enhances the quantity and diversity of the original training set and further improves the accuracy of PCB electronic component recognition. The design of cTransGAN brings together the merits of both conditional GAN and TransGAN, allowing a trained model to generate high-quality synthetic images conditioned on class embeddings. To validate the proposed method, we conduct extensive experiments on two datasets, including a self-developed dataset for PCB component detection and an existing dataset for PCB defect detection. We also evaluate three existing object detection algorithms (Faster R-CNN ResNet101, YOLO V3 DarkNet-53, and SCNet ResNet101), each validated under four experimental settings to form an ablation study. Results demonstrate that the proposed cTransGAN can effectively enhance the quality and diversity of the training set, leading to superior performance on both tasks. We have open-sourced the project to facilitate further studies.
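The conditional-GAN half of this design, generation conditioned on class embeddings, can be shown with a toy generator. The MLP body, dimensions, and class count below are illustrative assumptions, not cTransGAN's transformer architecture.

```python
# Conditioning a generator on learned class embeddings (the cGAN ingredient
# of cTransGAN, sketched with a toy MLP; dimensions are illustrative).
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=128, n_classes=10, emb_dim=64, out_dim=3 * 32 * 32):
        super().__init__()
        self.class_emb = nn.Embedding(n_classes, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(z_dim + emb_dim, 512), nn.ReLU(True),
            nn.Linear(512, out_dim), nn.Tanh(),
        )

    def forward(self, z, labels):
        # Concatenate noise with the class embedding so each sample is
        # generated conditioned on its target component class.
        h = torch.cat([z, self.class_emb(labels)], dim=1)
        return self.net(h).view(-1, 3, 32, 32)

G = ConditionalGenerator()
z = torch.randn(8, 128)
labels = torch.randint(0, 10, (8,))      # e.g., 10 PCB component classes
fake = G(z, labels)
print(fake.shape)                         # torch.Size([8, 3, 32, 32])
```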
7. Wu Z, Allibert G, Meriaudeau F, Ma C, Demonceaux C. HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness. IEEE Trans Image Process 2023; 32:2160-2173. [PMID: 37027289] [DOI: 10.1109/tip.2023.3263111]
Abstract
RGB-D saliency detection aims to fuse multi-modal cues to accurately localize salient regions. Existing works often adopt attention modules for feature modeling, and few methods explicitly leverage fine-grained details to merge with semantic cues. Thus, despite the auxiliary depth information, it is still challenging for existing models to distinguish objects with similar appearances but at distinct camera distances. In this paper, from a new perspective, we propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection. Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with neural network hierarchies. To realize multi-modal and multi-level fusion, we first use a granularity-based attention scheme to strengthen the discriminatory power of RGB and depth features separately. Then we introduce a unified cross dual-attention module for multi-modal and multi-level fusion in a coarse-to-fine manner. The encoded multi-modal features are gradually aggregated into a shared decoder. Further, we exploit a multi-scale loss to take full advantage of the hierarchical information. Extensive experiments on challenging benchmark datasets demonstrate that our HiDAnet outperforms state-of-the-art methods by large margins. The source code can be found at https://github.com/Zongwei97/HIDANet/.
8. Zhou X, Shen K, Weng L, Cong R, Zheng B, Zhang J, Yan C. Edge-Guided Recurrent Positioning Network for Salient Object Detection in Optical Remote Sensing Images. IEEE Trans Cybern 2023; 53:539-552. [PMID: 35417369] [DOI: 10.1109/tcyb.2022.3163152]
Abstract
Optical remote sensing images (RSIs) have been widely used in many applications, and salient object detection (SOD) in optical RSIs is an issue of growing interest. However, due to diverse object types, various object scales, numerous object orientations, and cluttered backgrounds in optical RSIs, the performance of existing SOD models often degrades considerably. Meanwhile, cutting-edge SOD models targeting optical RSIs typically focus on suppressing cluttered backgrounds but neglect the importance of edge information, which is crucial for obtaining precise saliency maps. To address this dilemma, this article proposes an edge-guided recurrent positioning network (ERPNet) to pop out salient objects in optical RSIs, where the key lies in the edge-aware position attention unit (EPAU). First, the encoder is used to give salient objects a good representation, i.e., multi-level deep features, which are then delivered into two parallel decoders: 1) an edge extraction part and 2) a feature fusion part. The edge extraction module and the encoder form a U-shaped architecture, which not only provides accurate salient edge cues but also ensures the integrity of edge information by additionally deploying intra-connections. That is to say, edge features can be generated and reinforced by incorporating object features from the encoder. Meanwhile, each decoding step of the feature fusion module provides position attention for salient objects, where position cues are sharpened by the effective edge information and are used to recurrently calibrate the misaligned decoding process. After that, we obtain the final saliency map by fusing all position attention cues. Extensive experiments are conducted on two public optical RSI datasets, and the results show that the proposed ERPNet can accurately and completely pop out salient objects, consistently outperforming state-of-the-art SOD models.
9. Wen H, Song K, Huang L, Wang H, Yan Y. Cross-modality salient object detection network with universality and anti-interference. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110322]
10. Liu N, Zhang N, Shao L, Han J. Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection. IEEE Trans Pattern Anal Mach Intell 2022; 44:9026-9042. [PMID: 34699348] [DOI: 10.1109/tpami.2021.3122139]
Abstract
How to effectively fuse cross-modal information is a key problem for RGB-D salient object detection. Early-fusion and result-fusion schemes fuse RGB and depth information at the input and output stages, respectively, and hence incur distribution gaps or information loss. Many models instead employ a feature fusion strategy, but they are limited by their use of low-order point-to-point fusion methods. In this paper, we propose a novel mutual attention model that fuses attention and context from different modalities. We use the non-local attention of one modality to propagate long-range contextual dependencies for the other, thus leveraging complementary attention cues to achieve high-order and trilinear cross-modal interaction. We also propose to induce contrast inference from the mutual attention, obtaining a unified model. Considering that low-quality depth data may be detrimental to model performance, we further propose a selective attention mechanism to reweight the added depth cues. We embed the proposed modules in a two-stream CNN for RGB-D SOD. Experimental results demonstrate the effectiveness of our proposed model. Moreover, we construct a new, challenging, and high-quality large-scale RGB-D SOD dataset, which can promote both the training and evaluation of deep models.
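The core mechanism, borrowing one modality's non-local attention to gather context for the other, can be sketched as follows. This simplified version omits the paper's selective reweighting and contrast inference.

```python
# Mutual attention across modalities (sketch: affinities computed from one
# modality reweight the values of the other, and vice versa; a simplified
# reading of the mechanism, not the authors' full model).
import torch

def non_local_attn(query_feat, key_feat, value_feat):
    B, C, H, W = query_feat.shape
    q = query_feat.flatten(2).transpose(1, 2)          # (B, N, C)
    k = key_feat.flatten(2)                            # (B, C, N)
    v = value_feat.flatten(2).transpose(1, 2)          # (B, N, C)
    attn = torch.softmax(q @ k / C ** 0.5, dim=-1)     # (B, N, N)
    return (attn @ v).transpose(1, 2).view(B, C, H, W)

rgb = torch.randn(2, 64, 16, 16)
depth = torch.randn(2, 64, 16, 16)
# Depth-derived affinities gather RGB context, and RGB-derived affinities
# gather depth context: each modality borrows the other's attention.
rgb_ctx = non_local_attn(depth, depth, rgb)
depth_ctx = non_local_attn(rgb, rgb, depth)
print(rgb_ctx.shape, depth_ctx.shape)
```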
11. Wu YH, Liu Y, Xu J, Bian JW, Gu YC, Cheng MM. MobileSal: Extremely Efficient RGB-D Salient Object Detection. IEEE Trans Pattern Anal Mach Intell 2022; 44:10261-10269. [PMID: 34898430] [DOI: 10.1109/tpami.2021.3134684]
Abstract
The high computational cost of neural networks has prevented recent successes in RGB-D salient object detection (SOD) from benefiting real-world applications. Hence, this article introduces a novel network, MobileSal, which focuses on efficient RGB-D SOD using mobile networks for deep feature extraction. However, mobile networks are less powerful in feature representation than cumbersome networks. To compensate, we observe that the depth information accompanying color images can strengthen feature representations related to SOD if leveraged properly. We therefore propose an implicit depth restoration (IDR) technique to strengthen the feature representation capability of mobile networks for RGB-D SOD. IDR is adopted only in the training phase and is omitted during testing, so it is computationally free at inference. We also propose compact pyramid refinement (CPR) for efficient multi-level feature aggregation to derive salient objects with clear boundaries. With IDR and CPR incorporated, MobileSal performs favorably against state-of-the-art methods on six challenging RGB-D SOD datasets with much faster speed (450 fps for an input size of 320×320) and fewer parameters (6.5M). The code is released at https://mmcheng.net/mobilesal.
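The IDR idea, an auxiliary head supervised during training and skipped at inference, is easy to illustrate. The toy network and loss weight below are assumptions, not MobileSal's implementation.

```python
# Training-only auxiliary supervision in the spirit of implicit depth
# restoration (IDR): an extra head regresses depth from RGB features during
# training and is dropped at test time. Sketch; not MobileSal's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyWithIDR(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, ch, 3, 1, 1), nn.ReLU(True))
        self.sal_head = nn.Conv2d(ch, 1, 1)      # kept at test time
        self.idr_head = nn.Conv2d(ch, 1, 1)      # training only

    def forward(self, rgb, depth=None):
        feat = self.backbone(rgb)
        sal = self.sal_head(feat)
        if self.training and depth is not None:
            # Forcing features to also predict depth makes them depth-aware
            # at zero inference cost (this branch is skipped in eval mode).
            return sal, self.idr_head(feat)
        return sal

model = SaliencyWithIDR()
rgb = torch.randn(2, 3, 64, 64)
depth, gt = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
sal, depth_pred = model(rgb, depth)
loss = F.binary_cross_entropy_with_logits(sal, gt) + 0.3 * F.l1_loss(depth_pred, depth)
loss.backward()

model.eval()
with torch.no_grad():
    sal_only = model(rgb)                         # depth branch skipped
print(sal_only.shape)
```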
12. Xu C, Li Q, Zhou Q, Jiang X, Yu D, Zhou Y. Asymmetric cross-modal activation network for RGB-T salient object detection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110047]
13. Bi H, Wu R, Liu Z, Zhang J, Zhang C, Xiang TZ, Wang X. PSNet: Parallel symmetric network for RGB-T salient object detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.052]
14. Sun P, Zhang W, Li S, Guo Y, Song C, Li X. Learnable Depth-Sensitive Attention for Deep RGB-D Saliency Detection with Multi-modal Fusion Architecture Search. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01646-0]
15. Zhang J, Fan DP, Dai Y, Anwar S, Saleh F, Aliakbarian S, Barnes N. Uncertainty Inspired RGB-D Saliency Detection. IEEE Trans Pattern Anal Mach Intell 2022; 44:5761-5779. [PMID: 33856982] [DOI: 10.1109/tpami.2021.3073564]
Abstract
We propose the first stochastic framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Existing RGB-D saliency detection models treat this task as a point estimation problem, predicting a single saliency map following a deterministic learning pipeline. We argue, however, that this deterministic solution is relatively ill-posed. Inspired by the saliency data labeling process, we propose a generative architecture that achieves probabilistic RGB-D saliency detection, using a latent variable to model labeling variation. Our framework includes two main models: 1) a generator model, which maps the input image and latent variable to a stochastic saliency prediction, and 2) an inference model, which gradually updates the latent variable by sampling it from the true or approximate posterior distribution. The generator model is an encoder-decoder saliency network. To infer the latent variable, we introduce two different solutions: i) a Conditional Variational Auto-encoder with an extra encoder to approximate the posterior distribution of the latent variable; and ii) an Alternating Back-Propagation technique, which samples the latent variable directly from the true posterior distribution. Qualitative and quantitative results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps. The source code is publicly available via our project page: https://github.com/JingZhang617/UCNet.
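The CVAE branch of this framework reduces, in miniature, to predicting a posterior over a latent variable, sampling it with the reparameterization trick, and adding a KL term. The sketch below shows just that piece with toy dimensions; the networks are stand-ins, not the authors' architecture.

```python
# Latent-variable sampling with the reparameterization trick plus the
# closed-form KL term, the CVAE ingredient of the framework (toy sketch).
import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    def __init__(self, in_dim=256, z_dim=8):
        super().__init__()
        self.fc_mu = nn.Linear(in_dim, z_dim)
        self.fc_logvar = nn.Linear(in_dim, z_dim)

    def forward(self, feat):
        mu, logvar = self.fc_mu(feat), self.fc_logvar(feat)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        # KL( q(z|x,y) || N(0, I) ) in closed form:
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
        return z, kl

enc = LatentEncoder()
feat = torch.randn(4, 256)        # image(+GT) features in the posterior network
z, kl = enc(feat)
print(z.shape, kl.item())
# At test time, several saliency maps can be produced by decoding multiple
# z ~ N(0, I) samples, exposing the labeling uncertainty.
```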
16. Zhu J, Zhang X, Fang X, Rahman MRU, Dong F, Li Y, Yan S, Tan P. Boosting RGB-D salient object detection with adaptively cooperative dynamic fusion network. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109205]
17. Prediction and Estimation of River Velocity Based on GAN and Multifeature Fusion. Comput Intell Neurosci 2022; 2022:7316133. [PMID: 36045976] [PMCID: PMC9420598] [DOI: 10.1155/2022/7316133]
Abstract
The need to predict and estimate river velocity motivates the development of a prediction method based on GAN image enhancement and multifeature fusion. In this method, a GAN is used to enhance the river images, improving image quality and the completeness of the image dataset. To improve prediction accuracy, multiple features are extracted from the images and fused, and the fused features are taken as the input of a CNN. The results show that at velocities of 0.25 m/s, 0.50 m/s, and 0.75 m/s, the accuracy of the improved method reaches 85%, 90%, and 92%, respectively, higher than the SVM, VGG-16, and BPNET baselines. These results indicate that the improvement has positive practical application value.
18. Li J, Cheng S, Tao Y, Liu H, Zhou J, Zhang J. Dual-Branch Discrimination Network Using Multiple Sparse Priors for Image Deblurring. Sensors (Basel) 2022; 22:6216. [PMID: 36015974] [PMCID: PMC9413728] [DOI: 10.3390/s22166216]
Abstract
Blind image deblurring is a challenging problem in computer vision, aiming to restore a sharp image from a blurred observation. Due to the incompatibility between complex unknown degradations and simple synthetic models, directly training a deep convolutional neural network (CNN) usually cannot sufficiently handle real-world blurry images. Existing generative adversarial networks (GANs) can generate more detailed and realistic images, but the game between generator and discriminator is unbalanced, so the training parameters cannot converge to the ideal Nash equilibrium. In this paper, we propose a GAN with a dual-branch discriminator using multiple sparse priors for image deblurring (DBSGAN) to overcome this limitation. Adding the multiple sparse priors to the second branch of the discriminator makes the discriminator's task more complex, which balances the game between the generator and the discriminator. Extensive experimental results on both synthetic and real-world blurry image datasets demonstrate the superior performance of our method over the state of the art in terms of quantitative metrics and visual quality. On the GOPRO dataset in particular, the average PSNR improves by 1.7% over other methods.
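A sketch of the dual-branch discriminator idea: a second branch judges sparse-prior maps, and its logit is fused with the image branch's. Here simple image gradients stand in for the paper's multiple sparse priors; everything else is illustrative.

```python
# Dual-branch discriminator sketch: one branch scores the image, the other
# scores sparse-prior maps (gradients as a stand-in for the paper's priors).
import torch
import torch.nn as nn

def branch(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 4, 2, 1), nn.LeakyReLU(0.2, True),
        nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
    )

class DualBranchDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.img_branch = branch(3)
        self.prior_branch = branch(2)    # horizontal/vertical gradient maps

    @staticmethod
    def gradients(x):
        gray = x.mean(1, keepdim=True)
        gx = gray[..., :, 1:] - gray[..., :, :-1]
        gy = gray[..., 1:, :] - gray[..., :-1, :]
        # Pad back to the input size so both maps can be stacked.
        gx = nn.functional.pad(gx, (0, 1, 0, 0))
        gy = nn.functional.pad(gy, (0, 0, 0, 1))
        return torch.cat([gx, gy], dim=1)

    def forward(self, x):
        # Making the discriminator also judge sparse priors hardens its task,
        # which is the paper's route to a better-balanced adversarial game.
        return self.img_branch(x) + self.prior_branch(self.gradients(x))

D = DualBranchDiscriminator()
print(D(torch.randn(2, 3, 64, 64)).shape)   # torch.Size([2, 1])
```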
Affiliation(s)
- Jialuo Li: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
- Shichao Cheng: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou 310018, China; Shangyu Institute of Science and Engineering, Hangzhou Dianzi University, Shaoxing 310005, China
- Yueqiang Tao: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
- Huasheng Liu: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou 310018, China
- Junzhe Zhou: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
- Jianhai Zhang: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou 310018, China
19. Fan DP, Li T, Lin Z, Ji GP, Zhang D, Cheng MM, Fu H, Shen J. Re-Thinking Co-Salient Object Detection. IEEE Trans Pattern Anal Mach Intell 2022; 44:4339-4354. [PMID: 33600309] [DOI: 10.1109/tpami.2021.3060412]
Abstract
In this article, we conduct a comprehensive study on the co-salient object detection (CoSOD) problem for images. CoSOD is an emerging and rapidly growing extension of salient object detection (SOD) that aims to detect the co-occurring salient objects in a group of images. However, existing CoSOD datasets often have a serious data bias, assuming that each group of images contains salient objects of similar visual appearance. This bias can leave models trained under such ideal settings impaired in real-life situations, where similarities are usually semantic or conceptual. To tackle this issue, we first introduce a new benchmark collected in the wild, called CoSOD3k, which requires a large amount of semantic context, making it more challenging than existing CoSOD datasets. Our CoSOD3k consists of 3,316 high-quality, elaborately selected images divided into 160 groups with hierarchical annotations. The images span a wide range of categories, shapes, object sizes, and backgrounds. Second, we integrate existing SOD techniques to build a unified, trainable CoSOD framework, which is long overdue in this field. Specifically, we propose a novel CoEG-Net that augments our prior model EGNet with a co-attention projection strategy to enable fast common-information learning. CoEG-Net fully leverages previous large-scale SOD datasets and significantly improves model scalability and stability. Third, we comprehensively summarize 40 cutting-edge algorithms, benchmark 18 of them over three challenging CoSOD datasets (iCoSeg, CoSal2015, and our CoSOD3k), and report more detailed (i.e., group-level) performance analysis. Finally, we discuss the challenges and future work for CoSOD. We hope that our study will give a strong boost to growth in the CoSOD community. The benchmark toolbox and results are available on our project page at https://dpfan.net/CoSOD3K.
20. Zhang N, Han J, Liu N. Learning Implicit Class Knowledge for RGB-D Co-Salient Object Detection With Transformers. IEEE Trans Image Process 2022; 31:4556-4570. [PMID: 35763477] [DOI: 10.1109/tip.2022.3185550]
Abstract
RGB-D co-salient object detection aims to segment co-occurring salient objects given a group of relevant images and depth maps. Previous methods often adopt separate pipelines and hand-crafted features, which makes it hard to capture the patterns of co-occurring salient objects and leads to unsatisfactory results. Using end-to-end CNN models is a straightforward idea, but they are less effective at exploiting global cues due to their intrinsic limitations. Thus, in this paper, we instead propose an end-to-end transformer-based model, denoted CTNet, which uses class tokens to explicitly capture implicit class knowledge for RGB-D co-salient object detection. Specifically, we first design adaptive class tokens for individual images to explore intra-saliency cues and then develop common class tokens for the whole group to explore inter-saliency cues. Besides, we also leverage the complementary cues between RGB images and depth maps to promote the learning of these two types of class tokens. In addition, to facilitate model evaluation, we construct a challenging and large-scale benchmark dataset, named RGBD CoSal1k, which collects 106 groups containing 1,000 pairs of RGB-D images with complex scenarios and diverse appearances. Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method.
21. Chen T, Xiao J, Hu X, Zhang G, Wang S. Boundary-guided network for camouflaged object detection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108901]
22. ISVD-Based Advanced Simultaneous Localization and Mapping (SLAM) Algorithm for Mobile Robots. Machines 2022. [DOI: 10.3390/machines10070519]
Abstract
In simultaneous localization and mapping, route planning and navigation are based on data captured by multiple sensors, including built-in cameras. Nowadays, mobile devices frequently have more than one camera with overlapping fields of view, leading to solutions where depth information can be gathered along with ordinary RGB color data. Using these RGB-D sensors, two- and three-dimensional point clouds can be recorded from the mobile devices, providing additional information for localization and mapping. The method of matching point clouds during the movement of the device is essential: reducing noise while maintaining an acceptable processing time is crucial for real-life applications. In this paper, we present a novel ISVD-based method for displacement estimation, using key points detected by SURF and ORB feature detectors. The ISVD algorithm is a fitting procedure based on SVD resolution that removes outliers from the point clouds to be fitted in several steps: in each iteration, it examines the relative error of the point pairs and then progressively tightens the maximum error allowed for the next matching step. An advantage over related methods is that it always gives the same result, as no random steps are involved.
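The described fitting procedure can be illustrated with a generic SVD (Kabsch) rigid fit wrapped in iterative outlier trimming. The trimming schedule below is an assumption made for illustration, not the authors' exact relative-error test.

```python
# SVD-based rigid fit with iterative outlier rejection, in the spirit of the
# ISVD procedure described above (generic Kabsch + trimming sketch).
import numpy as np

def svd_fit(P, Q):
    """Least-squares R, t with R @ P_i + t ~ Q_i (Kabsch algorithm)."""
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ S @ U.T
    return R, cQ - R @ cP

def trimmed_svd_fit(P, Q, iters=5, keep=0.8):
    """Refit repeatedly, each time keeping the best-matching fraction of
    point pairs, so gross outliers are removed step by step."""
    idx = np.arange(len(P))
    for _ in range(iters):
        R, t = svd_fit(P[idx], Q[idx])
        err = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        idx = np.argsort(err)[: max(3, int(keep * len(idx)))]  # shrink inliers
    return svd_fit(P[idx], Q[idx])

rng = np.random.default_rng(1)
P = rng.normal(size=(200, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
Q = P @ R_true.T + np.array([0.5, -0.2, 1.0])
Q[:20] += rng.normal(scale=5.0, size=(20, 3))    # simulated bad matches
R, t = trimmed_svd_fit(P, Q)
print(np.allclose(R, R_true, atol=1e-2), np.round(t, 3))
```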
23. FCMNet: Frequency-aware cross-modality attention networks for RGB-D salient object detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.015]
24. Ji W, Yan G, Li J, Piao Y, Yao S, Zhang M, Cheng L, Lu H. DMRA: Depth-Induced Multi-Scale Recurrent Attention Network for RGB-D Saliency Detection. IEEE Trans Image Process 2022; 31:2321-2336. [PMID: 35245195] [DOI: 10.1109/tip.2022.3154931]
Abstract
In this work, we propose a novel depth-induced multi-scale recurrent attention network for RGB-D saliency detection, named DMRA. It achieves strong performance, especially in complex scenarios. Our network makes four main contributions that are experimentally demonstrated to have significant practical merit. First, we design an effective depth refinement block using residual connections to fully extract and fuse cross-modal complementary cues from the RGB and depth streams. Second, depth cues with abundant spatial information are innovatively combined with multi-scale contextual features to accurately locate salient objects. Third, a novel recurrent attention module inspired by the Internal Generative Mechanism of the human brain is designed to generate more accurate saliency results by comprehensively learning the internal semantic relations of the fused features and progressively optimizing local details with memory-oriented scene understanding. Finally, a cascaded hierarchical feature fusion strategy is designed to promote efficient information interaction among multi-level contextual features and further improve the contextual representability of the model. In addition, we introduce a new real-life RGB-D saliency dataset containing a variety of complex scenarios, which has been widely used as a benchmark in recent RGB-D saliency detection research. Extensive empirical experiments demonstrate that our method can accurately identify salient objects and achieves appealing performance against 18 state-of-the-art RGB-D saliency models on nine benchmark datasets.
25. Xu Y, Yu X, Zhang J, Zhu L, Wang D. Weakly Supervised RGB-D Salient Object Detection With Prediction Consistency Training and Active Scribble Boosting. IEEE Trans Image Process 2022; 31:2148-2161. [PMID: 35196231] [DOI: 10.1109/tip.2022.3151999]
Abstract
RGB-D salient object detection (SOD) has attracted increasing attention as it produces more robust results in complex scenes than RGB SOD. However, state-of-the-art RGB-D SOD approaches rely heavily on large amounts of pixel-wise annotated data for training. Such densely labeled annotations are often labor-intensive and costly. To reduce the annotation burden, we investigate RGB-D SOD from a weakly supervised perspective. More specifically, we use annotator-friendly scribble annotations as supervision signals for model training. Since scribble annotations are much sparser than ground-truth masks, some critical object structure information might be neglected. To preserve such structure information, we explicitly exploit the complementary edge information from the two modalities (i.e., RGB and depth). Specifically, we leverage dual-modal edge guidance and introduce a new network architecture with a dual-edge detection module and a modality-aware feature fusion module. To exploit the information in unlabeled pixels, we introduce a prediction consistency training scheme that compares the predictions of two networks optimized with different strategies. Moreover, we develop an active scribble boosting strategy to provide extra supervision signals at negligible annotation cost, leading to significant SOD performance improvement. Extensive experiments on seven benchmarks validate the superiority of our proposed method. Remarkably, the proposed method with scribble annotations achieves performance competitive with fully supervised state-of-the-art methods.
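The prediction consistency scheme can be sketched as a supervised loss on scribbled pixels plus an agreement loss between two networks on all pixels. The toy networks, the perturbation-free setup, and the loss weight below are placeholders, not the paper's configuration.

```python
# Prediction consistency between two differently-optimized networks, the core
# of the training scheme above in sketch form.
import torch
import torch.nn as nn
import torch.nn.functional as F

net_a = nn.Sequential(nn.Conv2d(4, 16, 3, 1, 1), nn.ReLU(), nn.Conv2d(16, 1, 1))
net_b = nn.Sequential(nn.Conv2d(4, 16, 3, 1, 1), nn.ReLU(), nn.Conv2d(16, 1, 1))
opt = torch.optim.Adam(list(net_a.parameters()) + list(net_b.parameters()), lr=1e-4)

rgbd = torch.randn(2, 4, 64, 64)           # RGB + depth stacked as 4 channels
scribble = torch.full((2, 64, 64), 255)    # 255 = unlabeled
scribble[:, :4, :] = 1                     # tiny labeled region

pa, pb = net_a(rgbd).squeeze(1), net_b(rgbd).squeeze(1)

# Supervised term on scribbled pixels only.
mask = scribble != 255
sup = (F.binary_cross_entropy_with_logits(pa[mask], scribble[mask].float())
       + F.binary_cross_entropy_with_logits(pb[mask], scribble[mask].float()))

# Consistency term on all pixels: the two networks should agree, which
# propagates supervision into the unlabeled regions.
cons = F.mse_loss(torch.sigmoid(pa), torch.sigmoid(pb))

loss = sup + 0.5 * cons
opt.zero_grad(); loss.backward(); opt.step()
print(float(sup), float(cons))
```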
26. Zi L, Cong X. SASD: A Shape-Aware Saliency Object Detection Approach for RGB-D Images. Artif Intell 2022. [DOI: 10.1007/978-3-031-20497-5_15]
27. Optimized convolutional neural network architectures for efficient on-device vision-based object detection. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06830-w]
Abstract
Convolutional neural networks have pushed forward image analysis research and computer vision over the last decade, constituting a state-of-the-art approach in object detection today. The design of increasingly deeper and wider architectures has made it possible to achieve unprecedented levels of detection accuracy, albeit at the cost of both a dramatic computational burden and a large memory footprint. In such a context, cloud systems have become a mainstream technological solution due to their tremendous scalability, providing researchers and practitioners with virtually unlimited resources. However, these resources are typically made available as remote services, requiring communication over the network to be accessed, thus compromising the speed of response, availability, and security of the implemented solution. In view of these limitations, the on-device paradigm has emerged as a recent yet widely explored alternative, pursuing more compact and efficient networks to ultimately enable the execution of the derived models directly on resource-constrained client devices. This study provides an up-to-date review of the more relevant scientific research carried out in this vein, circumscribed to the object detection problem. In particular, the paper contributes to the field with a comprehensive architectural overview of both the existing lightweight object detection frameworks targeted to mobile and embedded devices, and the underlying convolutional neural networks that make up their internal structure. More specifically, it addresses the main structural-level strategies used for conceiving the various components of a detection pipeline (i.e., backbone, neck, and head), as well as the most salient techniques proposed for adapting such structures and the resulting architectures to more austere deployment environments. Finally, the study concludes with a discussion of the specific challenges and next steps to be taken to move toward a more convenient accuracy–speed trade-off.
28. Zhai Y, Fan DP, Yang J, Borji A, Shao L, Han J, Wang L. Bifurcated Backbone Strategy for RGB-D Salient Object Detection. IEEE Trans Image Process 2021; 30:8727-8742. [PMID: 34613915] [DOI: 10.1109/tip.2021.3116793]
Abstract
Multi-level feature fusion is a fundamental topic in computer vision. It has been exploited to detect, segment, and classify objects at various scales. When multi-level features meet multi-modal cues, choosing the optimal feature aggregation and multi-modal learning strategy becomes a thorny problem. In this paper, we leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel Bifurcated Backbone Strategy Network (BBS-Net). Our architecture is simple, efficient, and backbone-independent. In particular, we first propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS). Second, we introduce a depth-enhanced module (DEM) to excavate informative depth cues from the channel and spatial views. Then, RGB and depth modalities are fused in a complementary way. Extensive experiments show that BBS-Net significantly outperforms 18 state-of-the-art (SOTA) models on eight challenging datasets under five evaluation measures, demonstrating the superiority of our approach (~4% improvement in S-measure vs. the top-ranked model, DMRA). In addition, we provide a comprehensive analysis of the generalization ability of different RGB-D datasets and provide a powerful training set for future research. The complete algorithm, benchmark results, and post-processing toolbox are publicly available at https://github.com/zyjwuyan/BBS-Net.
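The DEM's "channel and spatial views" suggest a squeeze-and-excite-style channel gate followed by a spatial gate. The sketch below follows that reading with illustrative layer sizes; it is not the released BBS-Net code.

```python
# A depth-enhanced module in miniature: channel attention followed by
# spatial attention over depth features before fusion with RGB.
import torch
import torch.nn as nn

class DepthEnhancedModule(nn.Module):
    def __init__(self, ch=64, r=4):
        super().__init__()
        self.channel = nn.Sequential(              # squeeze-and-excite style
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch // r, 1), nn.ReLU(True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(              # 1-channel spatial gate
            nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, rgb, depth):
        d = depth * self.channel(depth)            # reweight informative channels
        d = d * self.spatial(d)                    # highlight informative locations
        return rgb + d                             # complementary fusion

dem = DepthEnhancedModule()
rgb, depth = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
print(dem(rgb, depth).shape)                       # torch.Size([2, 64, 32, 32])
```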
29. Zhou M, Cheng W, Huang H, Chen J. A Novel Approach to Automated 3D Spalling Defects Inspection in Railway Tunnel Linings Using Laser Intensity and Depth Information. Sensors (Basel) 2021; 21:5725. [PMID: 34502618] [PMCID: PMC8434528] [DOI: 10.3390/s21175725]
Abstract
The detection of concrete spalling is critical for tunnel inspectors to assess structural risks and guarantee the daily operation of a railway tunnel. However, traditional spalling detection methods mostly rely on visual inspection or manually taken camera images, which are inefficient and unreliable. In this study, an integrated approach based on laser intensity and depth features is proposed for the automated detection and quantification of concrete spalling. The Railway Tunnel Spalling Defects (RTSD) database, containing intensity images and depth images of the tunnel linings, is established via mobile laser scanning (MLS), and the Spalling Intensity Depurator Network (SIDNet) model is proposed for automatic extraction of concrete spalling features. The proposed model is trained, validated, and tested on the established RTSD dataset with impressive results. Comparison with several other spalling detection models shows that the proposed model performs better on various indicators such as MPA (0.985) and MIoU (0.925). The extra depth information obtained from MLS allows for accurate evaluation of the volume of detected spalling defects, which is beyond the reach of traditional methods. In addition, a triangulation mesh method is implemented to reconstruct the 3D tunnel lining model and visualize the 3D inspection results. As a result, a 3D inspection report can be output automatically, containing quantified spalling defect information along with the relevant spatial coordinates. The proposed approach has been applied to several railway tunnels in Yunnan province, China, and the experimental results have proved its validity and feasibility.
Affiliation(s)
- Wen Cheng (address shared with M.Z., H.H., and J.C.): Key Laboratory of Geotechnical and Underground Engineering, Department of Geotechnical Engineering, Tongji University, Siping Road 1239, Shanghai 200092, China
30.
Abstract
This paper reviews the current state of the art in artificial intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically machine learning (ML) algorithms, is provided, including convolutional neural networks (CNNs), generative adversarial networks (GANs), recurrent neural networks (RNNs), and deep reinforcement learning (DRL). We categorize creative applications into five groups, related to how AI technologies are used: (i) content creation, (ii) information analysis, (iii) content enhancement and post-production workflows, (iv) information extraction and enhancement, and (v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, ML-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of ML in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human-centric: where it is designed to augment, rather than replace, human creativity.
31. CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse. Int J Comput Vis 2021. [DOI: 10.1007/s11263-021-01452-0]
33. Bayoudh K, Knani R, Hamdaoui F, Mtibaa A. A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. Vis Comput 2021; 38:2939-2970. [PMID: 34131356] [PMCID: PMC8192112] [DOI: 10.1007/s00371-021-02166-7]
Abstract
The research progress in multimodal learning has grown rapidly over the last decade in several areas, especially in computer vision. The growing potential of multimodal data streams and deep learning algorithms has contributed to the increasing universality of deep multimodal learning. This involves the development of models capable of processing and analyzing the multimodal information uniformly. Unstructured real-world data can inherently take many forms, also known as modalities, often including visual and textual content. Extracting relevant patterns from this kind of data is still a motivating goal for researchers in deep learning. In this paper, we seek to improve the understanding of key concepts and algorithms of deep multimodal learning for the computer vision community by exploring how to generate deep models that consider the integration and combination of heterogeneous visual cues across sensory modalities. In particular, we summarize six perspectives from the current literature on deep multimodal learning, namely: multimodal data representation, multimodal fusion (i.e., both traditional and deep learning-based schemes), multitask learning, multimodal alignment, multimodal transfer learning, and zero-shot learning. We also survey current multimodal applications and present a collection of benchmark datasets for solving problems in various vision domains. Finally, we highlight the limitations and challenges of deep multimodal learning and provide insights and directions for future research.
Affiliation(s)
- Khaled Bayoudh: Electrical Department, National Engineering School of Monastir (ENIM), Laboratory of Electronics and Micro-electronics (LR99ES30), Faculty of Sciences of Monastir (FSM), University of Monastir, Monastir, Tunisia
- Raja Knani: Physics Department, Laboratory of Electronics and Micro-electronics (LR99ES30), Faculty of Sciences of Monastir (FSM), University of Monastir, Monastir, Tunisia
- Fayçal Hamdaoui: Electrical Department, National Engineering School of Monastir (ENIM), Laboratory of Control, Electrical Systems and Environment (LASEE), National Engineering School of Monastir, University of Monastir, Monastir, Tunisia
- Abdellatif Mtibaa: Electrical Department, National Engineering School of Monastir (ENIM), Laboratory of Electronics and Micro-electronics (LR99ES30), Faculty of Sciences of Monastir (FSM), University of Monastir, Monastir, Tunisia
34. Towards accurate RGB-D saliency detection with complementary attention and adaptive integration. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.125]
35. Semantic-Guided Attention Refinement Network for Salient Object Detection in Optical Remote Sensing Images. Remote Sens 2021. [DOI: 10.3390/rs13112163]
Abstract
Although remarkable progress has been made in salient object detection (SOD) for natural scene images (NSI), SOD for optical remote sensing images (RSI) still faces significant challenges due to varied spatial resolutions, cluttered backgrounds, and complex imaging conditions, mainly in two respects: (1) accurately locating salient objects and (2) capturing their subtle boundaries. This paper explores the inherent properties of multi-level features to develop a novel semantic-guided attention refinement network (SARNet) for SOD of RSI. Specifically, the proposed semantic-guided decoder (SGD) roughly but accurately locates multi-scale objects by aggregating multiple high-level features, and this global semantic information then guides the integration of subsequent features in a step-by-step feedback manner to make full use of deep multi-level features. Simultaneously, the proposed parallel attention fusion (PAF) module combines cross-level features and semantic-guided information to refine the object's boundary and gradually highlight the entire object area. Finally, the proposed network architecture is trained through an end-to-end fully supervised model. Quantitative and qualitative evaluations on two public RSI datasets and additional NSI datasets across five metrics show that our SARNet is superior to 14 state-of-the-art (SOTA) methods without any post-processing.
36. Pan C, Liu J, Yan WQ, Cao F, He W, Zhou Y. Salient Object Detection Based on Visual Perceptual Saturation and Two-Stream Hybrid Networks. IEEE Trans Image Process 2021; 30:4773-4787. [PMID: 33929959] [DOI: 10.1109/tip.2021.3074796]
Abstract
Inspired by the perceptual saturation of the human visual system, this paper proposes a two-stream hybrid network that simulates binocular vision for salient object detection (SOD). Each stream in our system consists of unsupervised and supervised methods forming a two-branch module, so as to model the interaction between human intuition and memory. The two-branch module processes visual information in parallel with bottom-up and top-down SOD, and outputs two initial saliency maps. A polyharmonic neural network with random weights (PNNRW) is then utilized to fuse the two branches' perceptions and refine the salient objects by learning online via multi-source cues. Depending on visual perceptual saturation, we can select the optimal superpixel parameter for the unsupervised branch, locate sampling regions for the PNNRW, and construct a positive feedback loop to drive perception toward saturation after the fusion. By comparing the binary outputs of the two streams, the pixel annotations of predicted objects with a high saturation degree can be taken as new training samples. The presented method thus constitutes a semi-supervised learning framework: the supervised branches only need to be pre-trained initially, after which the system can collect training samples with high confidence and train new models by itself. Extensive experiments show that the new framework can improve the performance of existing SOD methods and exceeds state-of-the-art methods on six popular benchmarks.
37. Wang X, Li S, Chen C, Hao A, Qin H. Depth quality-aware selective saliency fusion for RGB-D image salient object detection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.071]