1
Chen Q, Wen W, Qin J. GlobalSR: Global context network for single image super-resolution via deformable convolution attention and fast Fourier convolution. Neural Netw 2024;180:106686. [PMID: 39260011] [DOI: 10.1016/j.neunet.2024.106686]
Abstract
Vision Transformers have achieved impressive performance in image super-resolution (SR). However, they suffer from low inference speed, mainly because of the quadratic complexity of multi-head self-attention (MHSA), which is the key to learning long-range dependencies. In contrast, most CNN-based methods neglect the important effect of global contextual information, resulting in inaccurate and blurred details. A method that combines the strengths of both Transformers and CNNs could achieve a better trade-off between image quality and inference speed. Based on this observation, we first hypothesize that the main factor affecting performance in Transformer-based SR models is the overall architecture design, not the specific MHSA component. To verify this, we conduct ablation studies in which MHSA is replaced with large-kernel convolutions, alongside other essential module replacements. Surprisingly, the derived models achieve competitive performance. We therefore extract a general architecture design, GlobalSR, that leaves the core modules of Transformer-based SR models (the blocks and domain embeddings) unspecified, together with three practical guidelines for designing a lightweight SR network that exploits image-level global contextual information to reconstruct SR images. Following the guidelines, we instantiate the blocks and domain embeddings of GlobalSR with a Deformable Convolution Attention Block (DCAB) and a Fast Fourier Convolution Domain Embedding (FCDE), respectively. The resulting instantiation, termed GlobalSR-DF, uses a deformable convolution attention (DCA) that extracts global contextual features at the block level via deformable convolution and a Hadamard product as the attention map, while the FCDE applies the fast Fourier transform to map spatial features into frequency space and then extracts image-level global information from it with convolutions.
Extensive experiments demonstrate that the GlobalSR design is the key to achieving a superior trade-off between SR quality and efficiency. Specifically, the proposed GlobalSR-DF outperforms state-of-the-art CNN-based and ViT-based SISR models on accuracy-speed trade-offs while producing sharp and natural details.
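The FCDE idea above — transform to frequency space, operate there, transform back — can be sketched in a few lines. This is an illustrative single-channel toy, not the paper's module: `freq_weight` stands in for learned convolution weights.

```python
import numpy as np

def fourier_global_mix(x, freq_weight):
    """Apply a pointwise filter in frequency space.

    Every frequency coefficient depends on all spatial positions, so
    even a pointwise multiplication here has an image-level (global)
    receptive field -- the idea behind fast Fourier convolution.
    """
    spec = np.fft.rfft2(x)                 # spatial -> frequency
    spec = spec * freq_weight              # pointwise "convolution"
    return np.fft.irfft2(spec, s=x.shape)  # frequency -> spatial

# Identity filter: multiplying every coefficient by 1 must
# reconstruct the input exactly (up to float error).
img = np.arange(16.0).reshape(4, 4)
out = fourier_global_mix(img, 1.0)
```

With learned per-frequency weights in place of the scalar, such a branch mixes information from the whole image at once, which a small spatial convolution cannot.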
Affiliation(s)
- Qiangpu Chen, School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510275, China
- Wushao Wen, School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510275, China
- Jinghui Qin, School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China
2
Li G, Cui Z, Li M, Han Y, Li T. Multi-attention fusion transformer for single-image super-resolution. Sci Rep 2024;14:10222. [PMID: 38702417] [PMCID: PMC11068767] [DOI: 10.1038/s41598-024-60579-5]
Abstract
Recently, Transformer-based methods have gained prominence in image super-resolution (SR) tasks, addressing the challenge of long-range dependence through cross-layer connectivity and local attention mechanisms. However, analysis of these networks using local attribution maps has revealed significant limitations in leveraging the spatial extent of the input information. To unlock the inherent potential of Transformers in image SR, we propose the Multi-Attention Fusion Transformer (MAFT), a novel model designed to integrate multiple attention mechanisms with the objective of expanding the number and range of pixels activated during image reconstruction, thereby enhancing the effective utilization of the input information space. At the core of our model lie the Multi-Attention Adaptive Integration Groups, which facilitate the transition from dense local attention to sparse global attention through Local Attention Aggregation and Global Attention Aggregation blocks with alternating connections, effectively broadening the network's receptive field. The effectiveness of the proposed algorithm has been validated through comprehensive quantitative and qualitative experiments on benchmark datasets. Compared to state-of-the-art methods (e.g., HAT), the proposed MAFT achieves a 0.09 dB gain on the Urban100 dataset for the ×4 SR task while containing 32.55% fewer parameters and 38.01% fewer FLOPs.
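The alternation between dense local and sparse global attention can be illustrated with plain scaled dot-product attention: a contiguous window of positions plays the local role, and a strided subset of positions the global role. The learned projections, aggregation blocks, and fusion of the actual MAFT are omitted — this is only a toy sketch on random features.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # scaled dot-product attention over whichever keys are supplied
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

# 8 token positions with 4-dim features (stand-ins for projected q/k/v)
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))

# dense *local* attention: the query at position 2 sees a contiguous window
local_out = attend(x[2:3], x[1:4], x[1:4])

# sparse *global* attention: the same query sees strided positions 0,2,4,6,
# reaching across the whole sequence at the same cost per query
global_out = attend(x[2:3], x[::2], x[::2])
```

Alternating the two key patterns across blocks is what lets the receptive field grow without paying for dense global attention everywhere.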
Affiliation(s)
- Guanxing Li, School of Physics and Electronics, Shandong Normal University, Jinan, Shandong, China
- Zhaotong Cui, School of Physics and Electronics, Shandong Normal University, Jinan, Shandong, China
- Meng Li, School of Physics and Electronics, Shandong Normal University, Jinan, Shandong, China
- Yu Han, School of Physics and Electronics, Shandong Normal University, Jinan, Shandong, China
- Tianping Li, School of Physics and Electronics, Shandong Normal University, Jinan, Shandong, China
3
Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang MH, Shao L. Learning enriched features for fast image restoration and enhancement. IEEE Trans Pattern Anal Mach Intell 2023;45:1934-1948. [PMID: 35417348] [DOI: 10.1109/tpami.2022.3167175]
Abstract
Given a degraded input image, image restoration aims to recover the missing high-quality image content. Numerous applications demand effective image restoration, e.g., computational photography, surveillance, autonomous vehicles, and remote sensing. Significant advances in image restoration have been made in recent years, dominated by convolutional neural networks (CNNs). The widely used CNN-based methods typically operate either on full-resolution or on progressively low-resolution representations. In the former case, spatial details are preserved but the contextual information cannot be precisely encoded; in the latter, generated outputs are semantically reliable but spatially less accurate. This paper presents a new architecture with the holistic goal of maintaining spatially precise high-resolution representations throughout the entire network while receiving complementary contextual information from the low-resolution representations. The core of our approach is a multi-scale residual block containing the following key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) a non-local attention mechanism for capturing contextual information, and (d) attention-based multi-scale feature aggregation. Our approach learns an enriched set of features that combines contextual information from multiple scales while simultaneously preserving the high-resolution spatial details. Extensive experiments on six real image benchmark datasets demonstrate that our method, named MIRNet-v2, achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement. The source code and pre-trained models are available at https://github.com/swz30/MIRNetv2.
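Element (b) above, information exchange across multi-resolution streams, reduces to resampling each stream to the other's resolution and fusing. A minimal two-stream sketch follows; simple addition stands in for the paper's attention-based aggregation, and box pooling / nearest-neighbour resampling for its learned resamplers.

```python
import numpy as np

def down2(x):
    """2x box (average-pool) downsampling of a 2-D feature map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(x):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def exchange(hi, lo):
    """One exchange step: each stream absorbs the other stream's
    features resampled to its own resolution, so the high-res path
    keeps spatial detail while gaining low-res context."""
    return hi + up2(lo), lo + down2(hi)

hi = np.ones((4, 4))    # high-resolution stream
lo = np.zeros((2, 2))   # half-resolution stream
new_hi, new_lo = exchange(hi, lo)
```

Stacking several such exchanges (with convolutions between them) gives every stream access to every scale.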
4
Xu S, Dutta V, He X, Matsumaru T. A Transformer-based model for super-resolution of anime image. Sensors (Basel) 2022;22:8126. [PMID: 36365830] [PMCID: PMC9657210] [DOI: 10.3390/s22218126]
Abstract
Image super-resolution (ISR) technology aims to enhance resolution and improve image quality. It is widely applied in real-world image-processing applications, especially medical imaging, but has seen relatively little use in anime image production. Furthermore, contemporary ISR tools are often based on convolutional neural networks (CNNs), while few methods attempt to use Transformers, which perform well in other advanced vision tasks. In this work, we propose an anime image super-resolution (AISR) method based on the Swin Transformer. The work was carried out in several stages. First, shallow feature extraction captures the input image's low-frequency information, which mainly approximates the spatial distribution of detail (the shallow feature). Next, deep feature extraction extracts the image's semantic information (the deep feature). Finally, image reconstruction combines the shallow and deep features, upsamples the feature maps, and performs sub-pixel convolution over many feature-map channels. The novelty of the proposal lies in enhancing the low-frequency information with a Gaussian filter and introducing different window sizes to replace the patch-merging operations in the Swin Transformer. A high-quality anime dataset was constructed to improve the model's robustness. We trained our model on this dataset and evaluated it on anime image super-resolution tasks at different magnifications (2×, 4×, 8×). The results were compared numerically and graphically with those of conventional CNN-based and Transformer-based methods, using the standard peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics.
The experiments and ablation study show that our proposal outperforms the others.
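The PSNR metric used in these comparisons follows directly from the mean squared error; a minimal reference implementation (assuming an 8-bit peak value of 255):

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-size images,
    given as flat sequences of pixel values."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

a = [10, 20, 30, 40]
b = [26, 36, 46, 56]   # every pixel off by 16 -> MSE = 256
```

Here `psnr(a, b)` is 10·log10(255²/256), roughly 24 dB; identical inputs give infinite PSNR.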
Affiliation(s)
- Shizhuo Xu, Graduate School of Information, Production and System, Waseda University, Kitakyushu 808-0135, Japan
- Vibekananda Dutta, Institute of Micromechanics and Photonics, Faculty of Mechatronics, Warsaw University of Technology, 00-661 Warszawa, Poland
- Xin He, Graduate School of Information, Production and System, Waseda University, Kitakyushu 808-0135, Japan
- Takafumi Matsumaru, Graduate School of Information, Production and System, Waseda University, Kitakyushu 808-0135, Japan
5
Tu Z, Li H, Xie W, Liu Y, Zhang S, Li B, Yuan J. Optical flow for video super-resolution: a survey. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10159-8]
6
Jagdale RH, Shah SK. V-channel magnification enabled by hybrid optimization algorithm: Enhancement of video super resolution. Gene Expr Patterns 2022;45:119264. [PMID: 35868521] [DOI: 10.1016/j.gep.2022.119264]
Abstract
Although video super-resolution is a very active area of research, it remains a difficult problem; in particular, motion blur and computational cost hinder enhancement. The goal of this research is therefore to present a new smart SR framework for video. To create high-resolution (HR) videos, frames are first converted from RGB to HSV and the V channel is enhanced. A higher-dimensional grid with enhanced pixel intensity is then created to produce enriched video frames. This paper introduces a three-step progression to enable this: motion estimation, cubic spline interpolation, and deblurring or sharpening. The cubic spline interpolation is improved during operation by carefully tuning its parameters. For this optimal tuning, a new hybrid technique dubbed Lion with Particle Swarm Velocity Update (LPSO-VU), which combines the principles of the Lion Algorithm (LA) and Particle Swarm Optimization (PSO), is presented. Finally, using the BRISQUE, SDME, and ESSIM metrics, the adequacy of the method is compared with other traditional models and its superiority is demonstrated. The analysis shows that, in terms of BRISQUE on video frame 1, the proposed LPSO-VU model is 16.6%, 25.56%, 26.2%, 26.2%, and 27.2% superior to previous systems such as PSO, GWO, WOA, ROA, MF-ROA, and LA.
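The PSO half of the hybrid optimizer can be sketched via the canonical velocity update; how LPSO-VU combines this with the Lion Algorithm is not specified in the abstract, so only the standard PSO rule is shown, with assumed inertia and acceleration coefficients.

```python
import random

def pso_velocity_update(v, x, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Canonical particle-swarm velocity update.

    v, x, pbest, gbest are equal-length parameter vectors -- here
    they would be the spline-tuning parameters being optimised.
    w is the inertia weight; c1/c2 weight the pull toward the
    particle's personal best and the swarm's global best.
    """
    return [
        w * vi
        + c1 * rng.random() * (pb - xi)   # cognitive (personal-best) pull
        + c2 * rng.random() * (gb - xi)   # social (global-best) pull
        for vi, xi, pb, gb in zip(v, x, pbest, gbest)
    ]

# When a particle already sits at both its personal and the global
# best, only the inertia term w*v remains.
v_new = pso_velocity_update([1.0, -2.0], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5])
```

The position update `x + v_new` then moves each particle, and the loop repeats until the chosen quality metric stops improving.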
Affiliation(s)
- Rohita H Jagdale, Assistant Professor (E&TC), Sinhgad College of Engineering, Vadgaon Budruk, Pune, Maharashtra 411041, India
- Sanjeevani K Shah, Professor & Head (PG-E&TC), Smt. Kashibai Navale College of Engineering, Ambegaon BK, Pune, Maharashtra 411041, India
7
Wang Z, Chen J, Hoi SCH. Deep learning for image super-resolution: A survey. IEEE Trans Pattern Anal Mach Intell 2021;43:3365-3387. [PMID: 32217470] [DOI: 10.1109/tpami.2020.2982166]
Abstract
Image super-resolution (SR) is an important class of image-processing techniques that enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress in image super-resolution using deep learning techniques. This article provides a comprehensive survey of recent advances in image super-resolution using deep learning approaches. In general, existing studies of SR techniques can be roughly grouped into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we cover other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude by highlighting several future directions and open issues that the community should further address.
8
9
Bashir SMA, Wang Y, Khan M, Niu Y. A comprehensive review of deep learning-based single image super-resolution. PeerJ Comput Sci 2021;7:e621. [PMID: 34322592] [PMCID: PMC8293932] [DOI: 10.7717/peerj-cs.621]
Abstract
Image super-resolution (SR) is one of the vital image-processing methods that improve the resolution of an image in the field of computer vision. In the last two decades, significant progress has been made in super-resolution, especially through deep learning methods. This article provides a detailed survey of recent progress in single-image super-resolution from the perspective of deep learning, while also covering the initial classical methods. The survey classifies image SR methods into four categories: classical methods, supervised learning-based methods, unsupervised learning-based methods, and domain-specific SR methods. We also introduce the SR problem to provide intuition about image quality metrics, available reference datasets, and SR challenges. Deep learning-based SR approaches are evaluated using a reference dataset. Reviewed state-of-the-art image SR methods include the enhanced deep SR network (EDSR), cycle-in-cycle GAN (CinCGAN), multiscale residual network (MSRN), meta residual dense network (Meta-RDN), recurrent back-projection network (RBPN), second-order attention network (SAN), SR feedback network (SRFBN), and the wavelet-based residual attention network (WRAN). Finally, the survey concludes with future directions, trends, and open problems in SR to be addressed by researchers.
Affiliation(s)
- Syed Muhammad Arsalan Bashir, School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi, China; Quality Assurance, Pakistan Space and Upper Atmosphere Research Commission, Karachi, Sindh, Pakistan
- Yi Wang, School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi, China
- Mahrukh Khan, Department of Computer Science, National University of Computer and Emerging Sciences, Karachi, Sindh, Pakistan
- Yilong Niu, School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, Shaanxi, China
10
Liu P, Zhang H, Cao Y, Liu S, Ren D, Zuo W. Learning cascaded convolutional networks for blind single image super-resolution. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.07.122]
11
Liu H, Cao F. Improved dual-scale residual network for image super-resolution. Neural Netw 2020;132:84-95. [PMID: 32861917] [DOI: 10.1016/j.neunet.2020.08.008]
Abstract
In recent years, convolutional neural networks have been successfully applied to single image super-resolution (SISR) tasks, making breakthrough progress in both accuracy and speed. In this work, an improved dual-scale residual network (IDSRN), which achieves promising reconstruction performance without excessive computation, is proposed for SISR. The proposed network extracts features through two independent parallel branches: a dual-scale feature extraction branch and a texture attention branch. The improved dual-scale residual block (IDSRB), combined with an active weighted mapping strategy, constitutes the dual-scale feature extraction branch, which aims to capture dual-scale features of the image. In the texture attention branch, an encoder-decoder network with a symmetric fully convolutional-deconvolutional structure acts as a feature selector to enhance the high-frequency details. The integration of the two branches achieves the goal of capturing dual-scale features with high-frequency information. Comparative experiments and extensive studies indicate that the proposed IDSRN matches the state-of-the-art approaches in terms of accuracy and efficiency.
Affiliation(s)
- Huan Liu, Department of Mathematics and Information Sciences, China Jiliang University, Hangzhou 310018, Zhejiang Province, PR China
- Feilong Cao, Department of Mathematics and Information Sciences, China Jiliang University, Hangzhou 310018, Zhejiang Province, PR China
12
Yoo JS, Kim JO. Noise-robust iterative back-projection. IEEE Trans Image Process 2019;29:1219-1232. [PMID: 31535993] [DOI: 10.1109/tip.2019.2940414]
Abstract
Noisy image super-resolution (SR) is a significant challenge because denoising introduces smoothness. Iterative back-projection (IBP) can help further enhance the reconstructed SR image, but no clean reference image is available. This paper proposes a novel back-projection algorithm for noisy image SR whose main goal is to pursue consistency between the LR and SR images. We aim to estimate the clean reconstruction error to be back-projected, using the noisy and denoised reconstruction errors. We formulate a new cost function in the principal component analysis (PCA) transform domain to estimate the clean reconstruction error. In the data term of the cost function, the noisy and denoised reconstruction errors are combined in a region-adaptive manner using texture probability. In addition, a sparsity constraint is incorporated into the regularization term, based on the Laplacian characteristics of the reconstruction error. Finally, we propose an eigenvector estimation method to minimize the effect of noise. The experimental results demonstrate that the proposed method performs back-projection in a more noise-robust manner than conventional IBP and works harmoniously with other SR methods as a post-processing step.
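Conventional IBP, the baseline this paper makes noise-robust, repeatedly back-projects the LR-domain reconstruction error into the SR estimate. A minimal 1-D sketch with box downsampling and nearest-neighbour upsampling follows; the paper's contribution (estimating a clean error in the PCA domain) is omitted.

```python
import numpy as np

def downsample(x):
    """Simple 2x box downsampling of a 1-D signal."""
    return x.reshape(-1, 2).mean(axis=1)

def upsample(x):
    """Nearest-neighbour 2x upsampling of a 1-D signal."""
    return np.repeat(x, 2)

def iterative_back_projection(lr, iters=5, step=1.0):
    """Classic IBP loop: push the LR-domain reconstruction error
    back into the SR estimate until downsampling the estimate
    reproduces the observed LR signal."""
    sr = np.zeros(lr.size * 2)                 # blank HR estimate
    for _ in range(iters):
        err = lr - downsample(sr)              # LR-domain error
        sr = sr + step * upsample(err)         # back-project it
    return sr

lr = np.array([1.0, 3.0])
sr = iterative_back_projection(lr)
```

At convergence `downsample(sr)` matches `lr`, which is exactly the LR/SR consistency the abstract pursues; with noisy `lr`, this naive loop back-projects the noise too, motivating the paper's clean-error estimate.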
13
14
15
SK-SVR: Sigmoid kernel support vector regression based in-scale single image super-resolution. Pattern Recognit Lett 2017. [DOI: 10.1016/j.patrec.2017.04.013]
16
Perceptual losses for real-time style transfer and super-resolution. Computer Vision – ECCV 2016, 2016. [DOI: 10.1007/978-3-319-46475-6_43]
17
Hsu CC, Kang LW, Lin CW. Temporally coherent superresolution of textured video via dynamic texture synthesis. IEEE Trans Image Process 2015;24:919-931. [PMID: 25576569] [DOI: 10.1109/tip.2014.2387416]
Abstract
This paper addresses the problem of hallucinating the missing high-resolution (HR) details of a low-resolution (LR) video while maintaining the temporal coherence of the reconstructed HR details using dynamic texture synthesis (DTS). Most existing multiframe-based video super-resolution (SR) methods suffer from limited reconstructed visual quality due to inaccurate subpixel motion estimation between frames in an LR video. To achieve high-quality reconstruction of HR details for an LR video, we propose a texture-synthesis (TS)-based video SR method in which a novel DTS scheme renders the reconstructed HR details in a temporally coherent way, effectively addressing the temporal incoherence caused by traditional TS-based image SR methods. To further reduce complexity, our method performs TS-based SR only on a set of key frames, while the HR details of the remaining non-key frames are predicted using bidirectional overlapped block motion compensation. After all frames are upscaled, the proposed DTS-SR is applied to maintain the temporal coherence of the HR video. Experimental results demonstrate that the proposed method achieves significant subjective and objective visual quality improvements over state-of-the-art video SR methods.
18
Karam LJ, Sadaka NG, Ferzli R, Ivanovski ZA. An efficient selective perceptual-based super-resolution estimator. IEEE Trans Image Process 2011;20:3470-3482. [PMID: 21672677] [DOI: 10.1109/tip.2011.2159324]
Abstract
In this paper, a selective perceptual-based (SELP) framework is presented to reduce the complexity of popular super-resolution (SR) algorithms while maintaining the desired quality of the enhanced images/video. A perceptual human visual system (HVS) model is proposed to compute local contrast sensitivity thresholds. The obtained thresholds are used to select which pixels are super-resolved, based on the perceived visibility of local edges. Processing only a set of perceptually significant pixels significantly reduces the computational complexity of SR algorithms without losing achievable visual quality. The proposed SELP framework is integrated into a maximum a posteriori (MAP)-based SR algorithm as well as a fast two-stage fusion-restoration SR estimator. Simulation results show a significant average reduction in computational complexity with comparable signal-to-noise ratio gains and visual quality.
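The pixel-selection step can be sketched as thresholding a local contrast measure. A real SELP system derives region-adaptive thresholds from the HVS contrast sensitivity model; this toy uses one fixed threshold and a simple 4-neighbour contrast proxy, both assumptions for illustration.

```python
import numpy as np

def select_pixels(img, threshold):
    """Mark pixels whose local contrast exceeds a visibility
    threshold; only those would be super-resolved, and the rest
    handled by cheap interpolation."""
    # local contrast proxy: max absolute difference to the
    # 4-neighbourhood (borders padded by replication)
    p = np.pad(img, 1, mode="edge")
    diffs = [np.abs(img - p[1:-1, :-2]),   # left neighbour
             np.abs(img - p[1:-1, 2:]),    # right neighbour
             np.abs(img - p[:-2, 1:-1]),   # upper neighbour
             np.abs(img - p[2:, 1:-1])]    # lower neighbour
    contrast = np.maximum.reduce(diffs)
    return contrast > threshold

flat = np.zeros((4, 4))                 # no edges: nothing selected
edge = flat.copy(); edge[:, 2:] = 100.0 # vertical step edge
mask = select_pixels(edge, 10.0)        # only the two edge columns
```

Only pixels adjacent to the step are flagged, so an SR algorithm restricted to the mask touches a fraction of the image.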
Affiliation(s)
- Lina J Karam, School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe, AZ 85287-5706, USA
19
Jurio A, Pagola M, Mesiar R, Beliakov G, Bustince H. Image magnification using interval information. IEEE Trans Image Process 2011;20:3112-3123. [PMID: 21632304] [DOI: 10.1109/tip.2011.2158227]
Abstract
In this paper, a simple and effective image-magnification algorithm based on intervals is proposed. A low-resolution image is magnified into a high-resolution image using a block-expanding method. The proposed method associates each pixel with an interval obtained by a weighted aggregation of the pixels in its neighborhood; from this interval, a linear K(α) operator yields the magnified image. Experimental results show that our algorithm provides a magnified image with better quality, in terms of peak signal-to-noise ratio (PSNR), than several existing methods.
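The K(α) mapping from interval to pixel value is simple enough to show directly. The interval construction below (plain min/max of the neighbourhood) is a simplification of the paper's weighted aggregation, used here only for illustration.

```python
def k_alpha(interval, alpha):
    """Linear K(alpha) operator: maps an interval [a, b] to the
    point a + alpha * (b - a), with alpha in [0, 1]."""
    a, b = interval
    return a + alpha * (b - a)

def magnify_block(neighborhood, scale=2, alpha=0.5):
    """Block-expanding magnification sketch: associate a pixel with
    the interval spanned by its neighbourhood, then fill the
    scale x scale output block with K(alpha) of that interval."""
    interval = (min(neighborhood), max(neighborhood))
    value = k_alpha(interval, alpha)
    return [[value] * scale for _ in range(scale)]

# neighbourhood values 8..12 -> interval (8, 12); alpha = 0.5
# picks the midpoint, so the 2x2 output block is filled with 10.0
block = magnify_block([8, 10, 12], scale=2, alpha=0.5)
```

Choosing α per image (or per region) is what lets the method trade between the darker and brighter ends of each local interval.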
Affiliation(s)
- Aranzazu Jurio, Departamento de Automatica y Computacion, Universidad Publica de Navarra, Pamplona, Spain