1. Wang X, Wang S, Li J, Li M, Li J, Xu Y. Omnidirectional image super-resolution via position attention network. Neural Netw 2024;178:106464. PMID: 38968779. DOI: 10.1016/j.neunet.2024.106464.
Abstract
For convenient transmission, omnidirectional images (ODIs) usually follow the equirectangular projection (ERP) format and are low-resolution. To provide a better immersive experience, omnidirectional image super-resolution (ODISR) is essential. However, ERP ODIs suffer from severe geometric distortion and pixel stretching across latitudes, generating massive redundant information at high latitudes. This characteristic poses a huge challenge for traditional SR methods, which achieve only suboptimal ODISR performance. To address this issue, we propose a novel position attention network (PAN) for ODISR. Specifically, a two-branch structure is introduced, in which the basic enhancement branch (BE) performs coarse deep-feature enhancement on the extracted shallow features. Meanwhile, the position attention enhancement branch (PAE) builds a positional attention mechanism that dynamically adjusts the contribution of features at different latitudes of the ERP representation according to their positions and stretching degrees, which enhances the differentiated information, suppresses the redundant information, and modulates the deep features affected by spatial distortion. Subsequently, the features of the two branches are fused to achieve further refinement and adapt to the distortion characteristics of ODIs. After that, we exploit a long-term memory module (LM) that promotes information interaction and fusion between the branches to enhance the perception of the distortion, aggregating prior hierarchical features to keep a long-term memory and boost ODISR performance. Extensive results demonstrate the state-of-the-art performance and high efficiency of our PAN for ODISR.
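As a rough illustration of the latitude-aware gating idea described in this abstract, the minimal sketch below modulates ERP feature rows with a position-derived attention map. The cosine stretch prior, layer shapes, and module name are illustrative assumptions, not the exact PAN design.

```python
# Hypothetical sketch: latitude-conditioned feature modulation for ERP images.
# The cosine-stretch prior and layer sizes are assumptions, not the paper's PAN.
import math
import torch
import torch.nn as nn

class LatitudePositionAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Maps a per-row position encoding to per-channel attention weights.
        self.mlp = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        # Latitude of each ERP row in [-pi/2, pi/2]; cos(lat) approximates the
        # inverse of the horizontal pixel stretching at that row.
        lat = torch.linspace(-math.pi / 2, math.pi / 2, h, device=feat.device)
        stretch = torch.cos(lat).clamp(min=1e-3)           # (h,)
        pos = stretch.view(1, 1, h, 1).expand(b, 1, h, w)  # (b, 1, h, w)
        attn = self.mlp(pos)                               # per-position gate
        return feat * attn  # down-weights redundant high-latitude features

feat = torch.randn(2, 64, 128, 256)
out = LatitudePositionAttention(64)(feat)
print(out.shape)  # torch.Size([2, 64, 128, 256])
```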
Affiliation(s)
- Xin Wang: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, 518055, China; Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Shiqi Wang: Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Jinxing Li: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
- Mu Li: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
- Jinkai Li: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, 518055, China
- Yong Xu: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, 518055, China
2. Zhao C, Cai W, Hu C, Yuan Z. Cycle contrastive adversarial learning with structural consistency for unsupervised high-quality image deraining transformer. Neural Netw 2024;178:106428. PMID: 38901091. DOI: 10.1016/j.neunet.2024.106428.
Abstract
By avoiding the need for paired real-world data, recent unsupervised single image deraining (SID) methods have achieved notably acceptable deraining performance. However, previous methods usually fail to produce a high-quality rain-free image because they pay insufficient attention to semantic representation and image content, and therefore cannot completely separate the content from the rain layer. In this paper, we develop a novel cycle contrastive adversarial framework for unsupervised SID, which mainly consists of cycle contrastive learning (CCL) and location contrastive learning (LCL). Specifically, CCL achieves high-quality image reconstruction and rain-layer stripping by pulling similar features together while pushing dissimilar features apart in both the semantic and discriminant latent spaces. Meanwhile, LCL implicitly constrains the mutual information of the same location across different exemplars to preserve content information. In addition, inspired by the powerful Segment Anything Model (SAM), which can effectively extract widely applicable semantic structural details, we formulate a structural-consistency regularization to fine-tune our network using SAM. Apart from this, we introduce a vision transformer (ViT) into our network architecture to further improve performance. In our transformer-based GAN, to obtain a stronger representation, we propose a multi-layer channel compression attention module (MCCAM) to extract richer features. Equipped with the above techniques, our proposed unsupervised SID algorithm, called CCLformer, shows advantageous image deraining performance. Extensive experiments demonstrate both the superiority of our method and the effectiveness of each module in CCLformer. The code is available at https://github.com/zhihefang/CCLGAN.
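The generic building block behind such "pull similar, push dissimilar" objectives is an InfoNCE-style contrastive loss. The sketch below is a minimal, hedged version of that loss; the pairing strategy, temperature, and tensor shapes are assumptions, not the paper's exact CCL/LCL formulation.

```python
# Hypothetical sketch of an InfoNCE-style contrastive term; the feature
# extractors and pairing strategy used by CCLformer are not reproduced here.
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             negatives: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """anchor, positive: (b, d); negatives: (b, k, d)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_logit = (anchor * positive).sum(-1, keepdim=True) / tau       # (b, 1)
    neg_logits = torch.einsum('bd,bkd->bk', anchor, negatives) / tau  # (b, k)
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    # The positive pair sits at index 0; cross-entropy pulls it together
    # with the anchor while pushing the negatives apart.
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 8, 128))
print(loss.item())
```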
Affiliation(s)
- Chen Zhao: School of Artificial Intelligence, Nanjing Normal University, Nanjing, 210023, China
- Weiling Cai: School of Artificial Intelligence, Nanjing Normal University, Nanjing, 210023, China
- Chengwei Hu: School of Artificial Intelligence, Nanjing Normal University, Nanjing, 210023, China
- Zheng Yuan: School of Artificial Intelligence, Nanjing Normal University, Nanjing, 210023, China
3. Hsu WY, Chang WC. Wavelet Approximation-Aware Residual Network for Single Image Deraining. IEEE Trans Pattern Anal Mach Intell 2023;45:15979-15995. PMID: 37610914. DOI: 10.1109/tpami.2023.3307666.
Abstract
Great progress has been made on single image deraining with deep convolutional neural networks (CNNs). Most existing deep deraining methods use CNNs to learn a direct mapping from rainy images to clean rain-free images, and their architectures are becoming more and more complex. However, because rain mixes with object edges and the background, it is difficult to separate rain from objects and background, and the edge details of the image cannot be effectively recovered during reconstruction. To address this problem, we propose a novel wavelet approximation-aware residual network (WAAR), wherein rain is effectively removed from both low-frequency structures and high-frequency details at each level separately, especially in the low-frequency sub-images at each level. After the wavelet transform, we propose novel approximation-aware (AAM) and approximation-level blending (ALB) mechanisms to further help the low-frequency networks at each level recursively recover the structure and texture of low-frequency sub-images, while the high-frequency network can effectively eliminate rain streaks through block connection and achieve different degrees of edge-detail enhancement by adjusting hyperparameters. In addition, we introduce block connection to enrich the high-frequency details in the high-frequency network, which is favorable for capturing potential interdependencies between high- and low-frequency features. Experimental results indicate that the proposed WAAR outperforms state-of-the-art approaches on synthetic and real image datasets in reconstructing clean, rain-free images, recovering real and undistorted texture structures, and enhancing image edges. This shows the effectiveness of our method, especially on image edges and texture details.
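A minimal sketch of the wavelet split/merge skeleton that such methods build on is shown below, using the standard pywt 2D DWT: the approximation (low-frequency) and detail (high-frequency) sub-bands are processed by separate stages and then recombined by the inverse transform. The derain_low/derain_high stubs are placeholders, not the paper's residual networks.

```python
# Hypothetical sketch of a one-level wavelet derain skeleton; the two stubs
# stand in for the paper's low- and high-frequency networks.
import numpy as np
import pywt

def derain_low(cA: np.ndarray) -> np.ndarray:
    return cA  # placeholder for the low-frequency structure network

def derain_high(band: np.ndarray) -> np.ndarray:
    return band * 0.9  # placeholder for the high-frequency detail network

def wavelet_derain(img: np.ndarray, wavelet: str = "haar") -> np.ndarray:
    # One-level 2D DWT: approximation (low-freq) + three detail sub-bands.
    cA, (cH, cV, cD) = pywt.dwt2(img, wavelet)
    cA = derain_low(cA)
    cH, cV, cD = (derain_high(b) for b in (cH, cV, cD))
    # Inverse transform recombines the processed sub-bands.
    return pywt.idwt2((cA, (cH, cV, cD)), wavelet)

out = wavelet_derain(np.random.rand(128, 128))
print(out.shape)  # (128, 128)
```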
4. Yan T, Li M, Li B, Yang Y, Lau RWH. Rain Removal from Light Field Images with 4D Convolution and Multi-scale Gaussian Process. IEEE Trans Image Process 2023;PP:921-936. PMID: 37018668. DOI: 10.1109/tip.2023.3234692.
Abstract
Existing deraining methods mainly focus on a single input image. However, with only a single input image, it is extremely difficult to accurately detect and remove rain streaks and restore a rain-free image. In contrast, a light field image (LFI) embeds abundant 3D structure and texture information of the target scene by recording the direction and position of each incident ray via a plenoptic camera, which has become a popular device in the computer vision and graphics communities. However, making full use of the abundant information available in LFIs, such as the 2D array of sub-views and the disparity map of each sub-view, for effective rain removal remains a challenging problem. In this paper, we propose a novel network, 4D-MGP-SRRNet, for rain streak removal from LFIs. Our method takes all sub-views of a rainy LFI as input. To make full use of the LFI, we build the rain streak removal network from 4D convolutional layers so that all sub-views of the LFI are processed simultaneously. In the proposed network, a rain detection model, MGPDNet, with a novel Multi-scale Self-guided Gaussian Process (MSGP) module is proposed to detect high-resolution rain streaks from all sub-views of the input LFI at multiple scales. Semi-supervised learning is introduced so that MSGP can accurately detect rain streaks by training on both virtual-world and real-world rainy LFIs at multiple scales, with pseudo ground truths calculated for real-world rain streaks. We then feed all sub-views, after subtracting the predicted rain streaks, into a 4D convolution-based Depth Estimation Residual Network (DERNet) to estimate depth maps, which are later converted into fog maps. Finally, all sub-views, concatenated with the corresponding rain streaks and fog maps, are fed into a powerful rainy-LFI restoration model based on an adversarial recurrent neural network to progressively eliminate rain streaks and recover the rain-free LFI. Extensive quantitative and qualitative evaluations on both synthetic and real-world LFIs demonstrate the effectiveness of our proposed method.
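Since PyTorch has no native 4D convolution, one common way to sketch convolution over a light field tensor is a separable spatial-then-angular pair of 2D convolutions, as below. This factorization and all names are illustrative assumptions, not the paper's exact 4D operator.

```python
# Hypothetical sketch: a spatial-angular separable approximation of a 4D
# convolution over a light field tensor (b, c, u, v, h, w), where (u, v)
# index the sub-view grid. Not the paper's exact 4D layers.
import torch
import torch.nn as nn

class SeparableLFConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.spatial = nn.Conv2d(in_ch, out_ch, 3, padding=1)   # over (h, w)
        self.angular = nn.Conv2d(out_ch, out_ch, 3, padding=1)  # over (u, v)

    def forward(self, lf: torch.Tensor) -> torch.Tensor:
        b, c, u, v, h, w = lf.shape
        # Fold sub-view indices into the batch and convolve spatially.
        x = lf.permute(0, 2, 3, 1, 4, 5).reshape(b * u * v, c, h, w)
        x = self.spatial(x)
        co = x.shape[1]
        # Fold pixel positions into the batch and convolve angularly.
        x = x.reshape(b, u, v, co, h, w).permute(0, 4, 5, 3, 1, 2)
        x = x.reshape(b * h * w, co, u, v)
        x = self.angular(x)
        x = x.reshape(b, h, w, co, u, v).permute(0, 3, 4, 5, 1, 2)
        return x

lf = torch.randn(1, 3, 5, 5, 32, 32)  # 5x5 sub-views of 32x32 pixels
print(SeparableLFConv(3, 16)(lf).shape)  # torch.Size([1, 16, 5, 5, 32, 32])
```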
5. A Survey of Deep Learning-Based Image Restoration Methods for Enhancing Situational Awareness at Disaster Sites: The Cases of Rain, Snow and Haze. Sensors 2022;22(13):4707. PMID: 35808203. PMCID: PMC9269588. DOI: 10.3390/s22134707.
Abstract
This survey article is concerned with the emergence of vision-augmentation AI tools for enhancing the situational awareness of first responders (FRs) in rescue operations. More specifically, the article surveys three families of image restoration methods that serve vision augmentation under adverse weather conditions: (a) deraining; (b) desnowing; (c) dehazing. The contribution of this article is a survey of the recent literature on these three problem families, focusing on the use of deep learning (DL) models and on the requirements of their application in rescue operations. A faceted taxonomy of past and recent literature is introduced, covering DL architectures, loss functions, and datasets. Although there are multiple surveys on recovering images degraded by natural phenomena, the literature lacks a comprehensive survey focused explicitly on assisting FRs. This paper aims to fill this gap by presenting existing methods in the literature, assessing their suitability for FR applications, and providing insights for future research directions.