1
Ho QT, Duong MT, Lee S, Hong MC. EHNet: Efficient Hybrid Network with Dual Attention for Image Deblurring. Sensors (Basel) 2024; 24:6545. PMID: 39460026; PMCID: PMC11511264; DOI: 10.3390/s24206545.
Abstract
The motion of an object or of the camera platform blurs the acquired image, and this degradation is a major cause of poor-quality output from imaging sensors. Developing an efficient deep-learning-based image processing method to remove blur artifacts is therefore desirable. Deep learning has recently demonstrated significant efficacy in image deblurring, primarily through convolutional neural networks (CNNs) and Transformers. However, the limited receptive fields of CNNs restrict their ability to capture long-range structural dependencies, whereas Transformers excel at modeling these dependencies but are computationally expensive for high-resolution inputs and lack an appropriate inductive bias. To overcome these challenges, we propose an Efficient Hybrid Network (EHNet) that employs CNN encoders for local feature extraction and Transformer decoders with a dual-attention module to capture spatial and channel-wise dependencies. This synergy yields rich contextual information for high-quality image deblurring. Additionally, we introduce the Simple Feature-Embedding Module (SFEM), which replaces the pointwise and depthwise convolutions used to generate embedding features in the self-attention mechanism. This substantially reduces computational complexity and memory usage while maintaining overall performance. Finally, comprehensive experiments show that our compact model yields promising quantitative and qualitative deblurring results on various benchmark datasets.
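To make the dual-attention idea concrete, below is a minimal PyTorch sketch of a generic gate that reweights features channel-wise and then position-wise. It illustrates the general concept only; the module name, sizes, and structure are our assumptions, not EHNet's actual dual-attention module.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Generic spatial + channel attention gate (illustrative, not EHNet)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel attention: global pooling -> bottleneck MLP -> sigmoid gate.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: a 7x7 conv produces a per-position sigmoid map.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)     # channel-wise reweighting
        return x * self.spatial_gate(x)  # position-wise reweighting

feats = torch.randn(1, 32, 64, 64)
print(DualAttention(32)(feats).shape)  # torch.Size([1, 32, 64, 64])
```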
Affiliation(s)
- Quoc-Thien Ho
- Department of Information and Telecommunication Engineering, Soongsil University, Seoul 06978, Republic of Korea
- Minh-Thien Duong
- Department of Automatic Control, Ho Chi Minh City University of Technology and Education, Ho Chi Minh City 70000, Vietnam
- Seongsoo Lee
- Department of Intelligent Semiconductor, Soongsil University, Seoul 06978, Republic of Korea
- Min-Cheol Hong
- School of Electronic Engineering, Soongsil University, Seoul 06978, Republic of Korea
2
Kang Q, Lao Q, Gao J, Liu J, Yi H, Ma B, Zhang X, Li K. Deblurring masked image modeling for ultrasound image analysis. Med Image Anal 2024; 97:103256. PMID: 39047605; DOI: 10.1016/j.media.2024.103256.
Abstract
Recently, large pretrained vision foundation models based on masked image modeling (MIM) have attracted unprecedented attention and achieved remarkable performance across various tasks. However, MIM remains relatively unexplored for ultrasound imaging, and, most importantly, current MIM approaches fail to account for the gap between natural images and ultrasound, as well as for intrinsic characteristics of the ultrasound modality, such as its high noise-to-signal ratio. In this paper, motivated by this unique property of ultrasound, we propose a deblurring MIM approach specialized to ultrasound, which incorporates a deblurring task into the pretraining proxy task. The deblurring objective helps the pretraining recover the subtle details within ultrasound images that are vital for subsequent downstream analysis. Furthermore, we employ a multi-scale hierarchical encoder to extract both local and global contextual cues for improved performance, especially on pixel-wise tasks such as segmentation. We conduct extensive experiments involving 280,000 ultrasound images for pretraining and evaluate the downstream transfer performance of the pretrained model on various disease diagnoses (nodule, Hashimoto's thyroiditis) and task types (classification, segmentation). The results demonstrate the efficacy of the proposed deblurring MIM, which achieves state-of-the-art performance across a wide range of downstream tasks and datasets. Overall, our work highlights the potential of deblurring MIM for ultrasound image analysis, presenting an ultrasound-specific vision foundation model.
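The proxy task can be pictured as follows: blur the input, mask random patches, and train the network to reconstruct the sharp original, scoring the loss only on masked regions. The sketch below is a toy rendition under those assumptions; the blur operator, mask ratio, and stand-in model are ours, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def deblurring_mim_step(model, imgs, patch=16, mask_ratio=0.6):
    """One step of a deblurring MIM proxy task (illustrative sketch).

    Input  = blurred image with random patches dropped.
    Target = the original (sharp) image; loss on masked patches only.
    """
    b, c, h, w = imgs.shape
    blurred = F.avg_pool2d(imgs, 5, stride=1, padding=2)     # stand-in blur
    grid = (torch.rand(b, 1, h // patch, w // patch) < mask_ratio).float()
    mask = F.interpolate(grid, size=(h, w), mode="nearest")  # 1 = masked
    corrupted = blurred * (1 - mask)                         # drop masked patches
    recon = model(corrupted)
    loss = ((recon - imgs) ** 2 * mask).sum() / mask.sum().clamp(min=1)
    return loss

# A tiny conv net stands in for the paper's hierarchical encoder.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
loss = deblurring_mim_step(model, torch.rand(2, 1, 64, 64))
loss.backward()
```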
Affiliation(s)
- Qingbo Kang
- Department of Ultrasonography, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China
- Qicheng Lao
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China
- Jun Gao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; College of Computer Science, Sichuan University, Chengdu, Sichuan, 610041, China
- Jingyan Liu
- Department of Ultrasonography, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Huahui Yi
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Buyun Ma
- Department of Ultrasonography, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Xiaofan Zhang
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China; Shanghai Jiao Tong University, Shanghai, 200240, China
- Kang Li
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China
3
Ren F, Liu H, Wang H. A LiDAR-Camera Joint Calibration Algorithm Based on Deep Learning. Sensors (Basel) 2024; 24:6033. PMID: 39338778; PMCID: PMC11435776; DOI: 10.3390/s24186033.
Abstract
Multisensor (MS) data fusion is important for improving the stability of vehicle environmental perception systems, and MS joint calibration is a prerequisite for fusing multimodality sensors. Traditional calibration methods based on calibration boards require the manual extraction of many features and manual registration, resulting in a cumbersome calibration process and significant errors. We propose a joint calibration algorithm for a Light Detection and Ranging (LiDAR) sensor and a camera based on deep learning, without the need for special calibration objects. A network model automatically captures object features in the environment and completes the calibration by matching and evaluating those features. We construct a mathematical model for joint LiDAR-camera calibration and analyze the joint calibration process in detail. The network determines the parameters of the rotation and translation matrices, fixing the relative spatial positions of the two sensors and thereby completing the joint calibration. The network model consists of three parts: a feature extraction module, which extracts image features from color and depth images; a feature-matching module, which calculates the correlation between the two; and a feature aggregation module, which determines the calibration matrix parameters. The proposed algorithm was validated on the KITTI-odometry dataset and compared with other advanced algorithms. The experimental results show an average translation error of 0.26 cm and an average rotation error of 0.02°, lower than those of the other advanced algorithms.
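The quantity such a network regresses is the rigid transform (R, t) between the two sensors. The geometric model that a candidate calibration must satisfy — projecting LiDAR points through the extrinsics and the camera intrinsics into pixel coordinates — can be sketched as follows; the intrinsics and transform values are hypothetical, chosen only for illustration.

```python
import numpy as np

def project_lidar_to_image(points, R, t, K):
    """Project Nx3 LiDAR points into pixel coordinates using extrinsics
    (R, t) and camera intrinsics K. This is the geometric relation a learned
    LiDAR-camera calibration must satisfy (illustrative, not the paper's code)."""
    cam = points @ R.T + t            # LiDAR frame -> camera frame
    cam = cam[cam[:, 2] > 0]          # keep points in front of the camera
    uv = cam @ K.T                    # pinhole projection
    return uv[:, :2] / uv[:, 2:3]     # perspective divide -> (u, v)

# Hypothetical intrinsics and a small rotation/translation for illustration.
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(0.5)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.05, -0.02, 0.10])
pts = np.random.rand(100, 3) * [20, 5, 30]   # points ahead of the sensor
print(project_lidar_to_image(pts, R, t, K).shape)
```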
Affiliation(s)
- Fujie Ren
- College of Mechanical and Energy Engineering, Beijing University of Technology, Beijing 100124, China
- Haibin Liu
- College of Mechanical and Energy Engineering, Beijing University of Technology, Beijing 100124, China
- Huanjie Wang
- College of Mechanical and Energy Engineering, Beijing University of Technology, Beijing 100124, China
4
Yeh HH, Hsu BWY, Chou SY, Hsu TJ, Tseng VS, Lee CH. Deep Deblurring in Teledermatology: Deep Learning Models Restore the Accuracy of Blurry Images' Classification. Telemed J E Health 2024; 30:2477-2482. PMID: 38934135; DOI: 10.1089/tmj.2023.0703.
Abstract
Background: Blurry images in teledermatology and consultation increase diagnostic difficulty for both deep learning models and physicians. We aimed to determine the extent to which diagnostic accuracy is restored after blurry images are deblurred by deep learning models. Methods: We used 19,191 skin images from a public skin image dataset covering 23 skin disease categories, 54 images from a public dataset of blurry skin images, and 53 blurry dermatology consultation photos from a medical center to compare the diagnostic accuracy of trained diagnostic deep learning models and subjective sharpness between blurry and deblurred images. We evaluated five deblurring models, targeting motion blur, Gaussian blur, Bokeh blur, mixed slight blur, and mixed strong blur. Main Outcomes and Measures: Diagnostic accuracy was measured as the sensitivity and precision of correct model prediction of the skin disease category. Sharpness was rated by board-certified dermatologists on a 4-point scale, with 4 denoting the highest image clarity. Results: The sensitivity of diagnostic models dropped by 0.15 and 0.22 on slightly and strongly blurred images, respectively, and deblurring restored 0.14 and 0.17 for each group. Sharpness ratings perceived by dermatologists improved from 1.87 to 2.51 after deblurring. Activation maps showed that the focus of diagnostic models was compromised by blurriness but restored after deblurring. Conclusions: Deep learning models can restore the diagnostic accuracy of classification models on blurry images and increase the image sharpness perceived by dermatologists. Such models can be incorporated into teledermatology to aid the diagnosis of blurry images.
Affiliation(s)
- Hsu-Hang Yeh
- Department of Ophthalmology, National Taiwan University Hospital, Taipei, Taiwan
- Benny Wei-Yun Hsu
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Sheng-Yuan Chou
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Ting-Jung Hsu
- Department of Dermatology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
- Vincent S Tseng
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Chih-Hung Lee
- Department of Dermatology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
- Institute for Translational Research in Biomedicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
5
Lee TB, Heo YS. ABDGAN: Arbitrary Time Blur Decomposition Using Critic-Guided TripleGAN. Sensors (Basel) 2024; 24:4801. PMID: 39123847; PMCID: PMC11314794; DOI: 10.3390/s24154801.
Abstract
Recent studies have proposed methods for extracting latent sharp frames from a single blurred image. However, these methods still struggle to restore satisfactory images, and most are limited to decomposing a blurred image into sharp frames at a fixed frame rate. To address these problems, we present an Arbitrary Time Blur Decomposition Triple Generative Adversarial Network (ABDGAN) that restores sharp frames at flexible frame rates. Our framework plays a min-max game among a generator, a discriminator, and a time-code predictor. The generator serves as a time-conditional deblurring network, while the discriminator and the time-code predictor give the generator feedback on producing realistic, sharp images corresponding to a given time code. To provide adequate feedback, we propose a critic-guided (CG) loss computed through the collaboration of the discriminator and the time-code predictor. We also propose a pairwise order-consistency (POC) loss to ensure that each pixel in a predicted image consistently corresponds to the same ground-truth frame. Extensive experiments show that our method outperforms previously reported methods in both qualitative and quantitative evaluations. Compared to the best competitor, ABDGAN improves PSNR, SSIM, and LPIPS on the GoPro test set by 16.67%, 9.16%, and 36.61%, respectively. On the B-Aist++ test set, it improves PSNR, SSIM, and LPIPS by 6.99%, 2.38%, and 17.05%, respectively.
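The abstract does not spell out the conditioning mechanism, but a time-conditional generator can be sketched by broadcasting the scalar time code to a constant feature map and concatenating it to the blurred input. Treat the sketch below as one plausible scheme, not ABDGAN's actual architecture.

```python
import torch
import torch.nn as nn

class TimeConditionalDeblur(nn.Module):
    """Toy time-conditional generator: the scalar time code t in [0, 1] is
    broadcast to a constant channel and concatenated to the blurred input
    (a common conditioning scheme; the paper's mechanism may differ)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, blurred: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        b, _, h, w = blurred.shape
        t_map = t.view(b, 1, 1, 1).expand(b, 1, h, w)  # scalar -> channel
        return self.net(torch.cat([blurred, t_map], dim=1))

g = TimeConditionalDeblur()
blurred = torch.rand(2, 3, 64, 64)
frame_early = g(blurred, torch.tensor([0.1, 0.1]))  # near exposure start
frame_late = g(blurred, torch.tensor([0.9, 0.9]))   # near exposure end
```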
Affiliation(s)
- Tae Bok Lee
- Department of Artificial Intelligence, Ajou University, Suwon 16499, Republic of Korea
- Yong Seok Heo
- Department of Artificial Intelligence, Ajou University, Suwon 16499, Republic of Korea
- Department of Electrical and Computer Engineering, Ajou University, Suwon 16499, Republic of Korea
6
Gao M, Fessler JA, Chan HP. X-ray source motion blur modeling and deblurring with generative diffusion for digital breast tomosynthesis. Phys Med Biol 2024; 69:115003. PMID: 38640913; PMCID: PMC11103667; DOI: 10.1088/1361-6560/ad40f8.
Abstract
Objective. Digital breast tomosynthesis (DBT) has significantly improved the diagnosis of breast cancer due to its high sensitivity and specificity in detecting breast lesions compared to two-dimensional mammography. However, one of the primary challenges in DBT is the image blur resulting from x-ray source motion, particularly in DBT systems with a source in continuous-motion mode. This motion-induced blur can degrade the spatial resolution of DBT images, potentially affecting the visibility of subtle lesions such as microcalcifications.
Approach. We addressed this issue by deriving an analytical in-plane source blur kernel for DBT images based on imaging geometry and proposing a post-processing image deblurring method with a generative diffusion model as an image prior.
Main results. We showed that the source blur could be approximated by a shift-invariant kernel over the DBT slice at a given height above the detector, and we validated the accuracy of our blur kernel modeling through simulation. We also demonstrated the ability of the diffusion model to generate realistic DBT images. The proposed deblurring method successfully enhanced spatial resolution when applied to DBT images reconstructed with detector blur and correlated noise modeling.
Significance. Our study demonstrated the advantages of modeling imaging system components such as source motion blur for improving DBT image quality.
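The shift-invariant approximation can be illustrated directly: for a slice at height z above the detector, focal-spot travel d projects to an in-plane blur of width roughly d·z/(SDD−z), applied as a 1-D kernel along the source-motion direction. The box kernel and all numbers below are illustrative simplifications, not the paper's analytical derivation.

```python
import numpy as np
from scipy.ndimage import convolve1d

def source_blur_kernel(travel_mm, height_mm, sdd_mm, pixel_mm):
    """Box kernel approximating in-plane source-motion blur for a slice at a
    given height above the detector (simplified geometry, not the paper's
    exact kernel). Blur width = focal-spot travel magnified by z/(SDD - z)."""
    width_mm = travel_mm * height_mm / (sdd_mm - height_mm)
    n = max(1, int(round(width_mm / pixel_mm)))
    return np.ones(n) / n

# Hypothetical numbers: 1 mm focal-spot travel, slice 50 mm above the
# detector, 650 mm source-to-detector distance, 0.1 mm detector pixels.
kernel = source_blur_kernel(1.0, 50.0, 650.0, 0.1)
slice_img = np.random.rand(128, 128)
blurred = convolve1d(slice_img, kernel, axis=1)  # blur along source motion
print(kernel.size, blurred.shape)
```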
Affiliation(s)
- Mingjie Gao
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, United States of America
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, United States of America
- Jeffrey A Fessler
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, United States of America
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, United States of America
- Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, United States of America
7
Pang MM, Chen F, Xie M, Druckmann S, Clandinin TR, Yang HH. A recurrent neural circuit in Drosophila deblurs visual inputs. bioRxiv [Preprint] 2024:2024.04.19.590352. PMID: 38712245; PMCID: PMC11071408; DOI: 10.1101/2024.04.19.590352.
Abstract
A critical goal of vision is to detect changes in light intensity, even when these changes are blurred by the spatial resolution of the eye and the motion of the animal. Here we describe a recurrent neural circuit in Drosophila that compensates for blur and thereby selectively enhances the perceived contrast of moving edges. Using in vivo two-photon voltage imaging, we measured the temporal response properties of L1 and L2, two cell types that receive direct synaptic input from photoreceptors. These neurons have biphasic responses to brief flashes of light, a hallmark of cells that encode changes in stimulus intensity. However, the second phase was often much larger than the first, creating an unusual temporal filter. Genetic dissection revealed that recurrent neural circuitry strongly shapes the second phase of the response, informing the structure of a dynamical model. By applying this model to moving natural images, we demonstrate that rather than veridically representing stimulus changes, this temporal processing strategy systematically enhances them, amplifying and sharpening responses. Comparing the measured responses of L2 to model predictions across both artificial and natural stimuli revealed that L2 tunes its properties as the model predicts in order to deblur images. Since this strategy is tunable to behavioral context, generalizable to any time-varying sensory input, and implementable with a common circuit motif, we propose that it could be broadly used to selectively enhance sharp and salient changes.
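The computational essence — a biphasic temporal filter whose second lobe outweighs the first enhances intensity changes rather than reporting them veridically — can be demonstrated with a difference-of-exponentials filter. The time constants and amplitudes below are invented for illustration, not fitted to the recordings.

```python
import numpy as np

# Biphasic temporal filter: a fast positive lobe minus a slower, *larger*
# negative lobe, mimicking the enlarged second response phase measured in
# L1/L2 (all parameters here are illustrative, not fitted values).
t = np.arange(0, 0.5, 0.001)                      # seconds
fast = np.exp(-t / 0.02) / 0.02
slow = np.exp(-t / 0.06) / 0.06
kernel = fast - 1.3 * slow                        # second phase dominates
kernel /= np.abs(kernel).sum()

# A blurred step in light intensity: the filter output overshoots at the
# edge, i.e., the change is amplified and sharpened rather than copied.
stimulus = np.convolve(np.repeat([0.0, 1.0], 250), np.ones(50) / 50, "same")
response = np.convolve(stimulus, kernel, "full")[: stimulus.size]
print(f"stimulus max {stimulus.max():.2f}, response peak {response.max():.2f}")
```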
Affiliation(s)
- Michelle M. Pang
- Department of Neurobiology, Stanford University, Stanford, CA 94305, USA
- Feng Chen
- Department of Neurobiology, Stanford University, Stanford, CA 94305, USA
- Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
- Marjorie Xie
- Department of Neurobiology, Stanford University, Stanford, CA 94305, USA
- Current affiliation: School for the Future of Innovation in Society, Arizona State University, Tempe, AZ 85281, USA
- Shaul Druckmann
- Department of Neurobiology, Stanford University, Stanford, CA 94305, USA
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA
- Helen H. Yang
- Department of Neurobiology, Stanford University, Stanford, CA 94305, USA
- Current affiliation: Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Lead contact
8
Cui Y, Knoll A. Dual-domain strip attention for image restoration. Neural Netw 2024; 171:429-439. PMID: 38142482; DOI: 10.1016/j.neunet.2023.12.003.
Abstract
Image restoration aims to reconstruct a latent high-quality image from a degraded observation. Recently, the use of Transformers has significantly advanced state-of-the-art performance on various image restoration tasks, owing to their powerful ability to model long-range dependencies. However, the quadratic complexity of self-attention hinders practical application. Moreover, sufficiently leveraging the large spectral disparity between clean and degraded image pairs can also benefit image restoration. In this paper, we develop a dual-domain strip attention mechanism for image restoration that enhances representation learning and consists of spatial and frequency strip attention units. Specifically, the spatial strip attention unit harvests contextual information for each pixel from its adjacent locations in the same row or column, guided by weights learned via a simple convolutional branch. The frequency strip attention unit refines features in the spectral domain via frequency separation and modulation, implemented with simple pooling techniques. Furthermore, we apply different strip sizes to enhance multi-scale learning, which helps handle degradations of various sizes. By employing the dual-domain attention units in different directions, each pixel implicitly perceives information from an expanded region. Taken together, the proposed dual-domain strip attention network (DSANet) achieves state-of-the-art performance on 12 datasets across four image restoration tasks: image dehazing, image desnowing, image denoising, and image defocus deblurring. The code and models are available at https://github.com/c-yn/DSANet.
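The separation-and-modulation idea behind the frequency unit can be sketched generically: split features into a pooled low-frequency branch and its residual, then reweight each with learned gains. This is a simplified illustration of the concept under our own assumptions; the actual DSANet units are available in the authors' repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrequencyModulation(nn.Module):
    """Pooling-based frequency separation and modulation (generic sketch of
    the idea; see the authors' repository for the real DSANet units)."""
    def __init__(self, channels: int, pool: int = 8):
        super().__init__()
        self.pool = pool
        # Per-channel gains for the low- and high-frequency components.
        self.low_gain = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.high_gain = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        low = F.avg_pool2d(x, self.pool)               # low frequencies
        low = F.interpolate(low, size=x.shape[-2:], mode="bilinear",
                            align_corners=False)
        high = x - low                                 # residual = high freq.
        return self.low_gain * low + self.high_gain * high

x = torch.randn(1, 16, 64, 64)
print(FrequencyModulation(16)(x).shape)  # torch.Size([1, 16, 64, 64])
```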
Affiliation(s)
- Yuning Cui
- School of Computation, Information and Technology, Technical University of Munich, Munich, 85748, Germany
- Alois Knoll
- School of Computation, Information and Technology, Technical University of Munich, Munich, 85748, Germany
9
Meng D, Zhou Y, Bai J. 4-K-resolution minimalist optical system design based on deep learning. Appl Opt 2024; 63:917-926. PMID: 38437388; DOI: 10.1364/ao.510860.
Abstract
To simplify optical systems, we propose a high-resolution minimalist optical design method based on deep learning. Unlike most imaging-system design work, we couple optical design closely with image processing algorithms. For the optical design, we separately study the impact of different aberrations on computational imaging and then propose an aberration metric and a spatially micro-variant design method that better meet the needs of image recognition. For the image processing, we construct a dataset based on point-spread-function (PSF) imaging simulation and use a non-blind computational deblurring method to repair spatially variant aberrations. Finally, we achieve clear 4K (5184×3888) imaging using only two spherical lenses, with image quality similar to that of complex lenses on the market.
10
Fontbonne A, Trouvé-Peloux P, Champagnat F, Jobert G, Druart G. Embedded Processing for Extended Depth of Field Imaging Systems: From Infinite Impulse Response Wiener Filter to Learned Deconvolution. Sensors (Basel) 2023; 23:9462. PMID: 38067835; PMCID: PMC10708841; DOI: 10.3390/s23239462.
Abstract
Much of the state of the art seeks to increase the camera depth of field (DoF) via joint optimization of an optical component (typically a phase mask) and a digital processing step, using either an infinite deconvolution support or a neural network. This can be used either to see sharp objects from a greater distance or to reduce manufacturing costs thanks to looser tolerances on the sensor position. Here, we study the case of embedded processing with only one convolution with a finite kernel size. The finite impulse response (FIR) filter coefficients are learned or computed based on a Wiener filter paradigm, involving an optical model typical of codesigned systems for DoF extension and a scene power spectral density, which is either learned or modeled. We compare different FIR filters and present a method for dimensioning their sizes prior to joint optimization. We also show that, among the filters compared, the learning approach adapts easily to a database, while the other approaches are equally robust.
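The Wiener-paradigm FIR computation can be sketched as: form the full-support Wiener filter on an FFT grid from the optical transfer function, a modeled scene power spectral density, and a noise PSD, then crop its centered impulse response to the finite kernel support. The 1/f²-type scene PSD and all constants below are our assumptions, not the paper's calibrated models.

```python
import numpy as np

def fir_wiener(psf, n_fft=256, fir_size=15):
    """Finite-support approximation of the Wiener deconvolution filter.

    The full Wiener filter conj(H) * S / (|H|^2 * S + N) is built on an FFT
    grid (S, N = scene and noise power spectral densities), and its impulse
    response is cropped to `fir_size` taps. A simplified sketch only."""
    fy = np.fft.fftfreq(n_fft)[:, None]
    fx = np.fft.fftfreq(n_fft)[None, :]
    S = 1.0 / (fx**2 + fy**2 + 1e-4)           # modeled scene PSD (~1/f^2)
    N = 1e-2                                    # flat noise PSD (assumed)
    H = np.fft.fft2(psf, s=(n_fft, n_fft))
    W = np.conj(H) * S / (np.abs(H) ** 2 * S + N)
    w = np.fft.fftshift(np.real(np.fft.ifft2(W)))  # center impulse response
    c, r = n_fft // 2, fir_size // 2
    return w[c - r : c + r + 1, c - r : c + r + 1]  # cropped FIR taps

psf = np.ones((5, 5)) / 25.0                    # toy defocus-like PSF
print(fir_wiener(psf).shape)                    # (15, 15)
```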
Affiliation(s)
- Alice Fontbonne
- DOTA, ONERA, Université Paris Saclay, 91123 Palaiseau, France
11
Miranda-González AA, Rosales-Silva AJ, Mújica-Vargas D, Escamilla-Ambrosio PJ, Gallegos-Funes FJ, Vianney-Kinani JM, Velázquez-Lozada E, Pérez-Hernández LM, Lozano-Vázquez LV. Denoising Vanilla Autoencoder for RGB and GS Images with Gaussian Noise. Entropy (Basel) 2023; 25:1467. PMID: 37895588; PMCID: PMC10606544; DOI: 10.3390/e25101467.
Abstract
Noise suppression algorithms are used in various tasks such as computer vision, industrial inspection, and video surveillance. Robust image processing systems need to be fed with images close to the real scene; however, external factors sometimes alter the captured data, which translates into a loss of information. Procedures are therefore required to recover the information closest to the real scene. This work proposes a Denoising Vanilla Autoencoding (DVA) architecture, an unsupervised neural network for Gaussian denoising in color and grayscale images. The methodology improves on other state-of-the-art architectures in objective numerical results. Additionally, a validation set and a high-resolution noisy image set are used, revealing that our proposal outperforms other types of neural networks responsible for suppressing noise in images.
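A denoising vanilla autoencoder is compact enough to sketch in full: corrupt inputs with Gaussian noise and train the network to reproduce the clean image. Layer sizes and hyperparameters below are illustrative choices of ours, not the DVA's.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """Minimal denoising autoencoder (illustrative sizes, not the DVA's)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.rand(4, 3, 64, 64)
noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)  # Gaussian noise
loss = nn.functional.mse_loss(model(noisy), clean)  # target = clean image
loss.backward()
opt.step()
```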
Affiliation(s)
- Armando Adrián Miranda-González
- Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco, Sección de Estudios de Posgrado e Investigación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
- Alberto Jorge Rosales-Silva
- Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco, Sección de Estudios de Posgrado e Investigación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
- Dante Mújica-Vargas
- Departamento de Ciencias Computacionales, Tecnológico Nacional de México, Cuernavaca 62490, Mexico
- Francisco Javier Gallegos-Funes
- Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco, Sección de Estudios de Posgrado e Investigación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
- Jean Marie Vianney-Kinani
- Departamento de Ciencias Computacionales, Tecnológico Nacional de México, Cuernavaca 62490, Mexico
- Unidad Profesional Interdisciplinaria de Ingeniería Campus Hidalgo, Instituto Politécnico Nacional, Pachuca de Soto 42162, Mexico
- Erick Velázquez-Lozada
- Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco, Sección de Estudios de Posgrado e Investigación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
- Luis Manuel Pérez-Hernández
- Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco, Sección de Estudios de Posgrado e Investigación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
- Lucero Verónica Lozano-Vázquez
- Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco, Sección de Estudios de Posgrado e Investigación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
12
Li Z, Gao Z, Yi H, Fu Y, Chen B. Image Deblurring With Image Blurring. IEEE Trans Image Process 2023; 32:5595-5609. PMID: 37812541; DOI: 10.1109/tip.2023.3321515.
Abstract
Deep learning (DL) based methods for motion deblurring, taking advantage of large-scale datasets and sophisticated network structures, have reported promising results. However, two challenges remain: existing methods usually perform well on synthetic datasets but cannot handle complex real-world blur, and over- or under-estimation of the blur yields restored images that remain blurred or even exhibit unwanted distortion. We propose a motion deblurring framework that includes a Blur Space Disentangled Network (BSDNet) and a Hierarchical Scale-recurrent Deblurring Network (HSDNet) to address these issues. Specifically, we train an image blurring model to facilitate learning a better image deblurring model. First, BSDNet learns to separate blur features from blurry images, which is useful for blur transfer, dataset augmentation, and ultimately guiding the deblurring model. Second, to gradually recover sharp information in a coarse-to-fine manner, HSDNet uses the blur features acquired by BSDNet as a prior and decomposes the non-uniform deblurring task into subtasks. Moreover, the motion blur dataset created by BSDNet bridges the gap between training images and real blur. Extensive experiments on real-world blur datasets demonstrate that our method works effectively on complex scenarios, significantly outperforming many state-of-the-art approaches.
13
Evangelista D, Morotti E, Piccolomini EL, Nagy J. Ambiguity in Solving Imaging Inverse Problems with Deep-Learning-Based Operators. J Imaging 2023; 9:133. PMID: 37504810; PMCID: PMC10381581; DOI: 10.3390/jimaging9070133.
Abstract
In recent years, large convolutional neural networks have been widely used as tools for image deblurring because of their ability to restore images very precisely. However, image deblurring is mathematically modeled as an ill-posed inverse problem, and its solution is difficult to approximate when noise affects the data. Indeed, one limitation of neural networks for deblurring is their sensitivity to noise and other perturbations, which can lead to instability and poor reconstructions. In addition, networks do not necessarily take into account the numerical formulation of the underlying imaging problem when trained end-to-end. In this paper, we propose strategies to improve stability, without losing too much accuracy, when deblurring images with deep-learning-based methods. First, we suggest a very small neural architecture, which reduces the execution time for training, satisfies a green AI need, and does not extremely amplify noise in the computed image. Second, we introduce a unified framework in which a pre-processing step balances the lack of stability of the subsequent neural-network-based step. Two pre-processors are presented: the former implements a strong parameter-free denoiser, and the latter is a variational-model-based regularized formulation of the latent imaging problem. The framework is also characterized formally by mathematical analysis. Numerical experiments verify the accuracy and stability of the proposed approaches for image deblurring in the presence of unknown or unquantified noise; the results confirm that they improve network stability with respect to noise. In particular, the model-based framework represents the most reliable trade-off between visual precision and robustness.
Affiliation(s)
- Elena Morotti
- Department of Political and Social Sciences, University of Bologna, 40125 Bologna, Italy
- Elena Loli Piccolomini
- Department of Computer Science and Engineering, University of Bologna, 40126 Bologna, Italy
- James Nagy
- Department of Mathematics, Emory University, Atlanta, GA 30322, USA
14
Ali AM, Benjdira B, Koubaa A, El-Shafai W, Khan Z, Boulila W. Vision Transformers in Image Restoration: A Survey. Sensors (Basel) 2023; 23:2385. PMID: 36904589; PMCID: PMC10006889; DOI: 10.3390/s23052385.
Abstract
The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a long time, Convolutional Neural Networks (CNNs) predominated in most computer vision tasks; now, both CNNs and ViTs are efficient approaches with powerful capabilities for restoring a better version of a low-quality image. In this study, the efficiency of ViT in image restoration is studied extensively, and ViT architectures are classified for each image restoration task. Seven tasks are considered: image super-resolution, image denoising, general image enhancement, JPEG compression artifact reduction, image deblurring, removal of adverse weather conditions, and image dehazing. The outcomes, advantages, limitations, and possible areas of future research are detailed. Overall, incorporating ViT in new architectures for image restoration is becoming the rule, owing to advantages over CNNs such as better efficiency (especially when more data are fed to the network), robustness in feature extraction, and a feature learning approach that better captures the variations and characteristics of the input. Nevertheless, drawbacks exist, including the need for more data to show the benefits of ViT over CNNs, the increased computational cost due to the complexity of the self-attention block, a more challenging training process, and a lack of interpretability. These drawbacks represent future research directions for increasing the efficiency of ViT in the image restoration domain.
Affiliation(s)
- Anas M. Ali
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
- Bilel Benjdira
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- SE & ICT Laboratory, LR18ES44, ENICarthage, University of Carthage, Tunis 1054, Tunisia
- Anis Koubaa
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Walid El-Shafai
- Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
- Security Engineering Laboratory, Computer Science Department, Prince Sultan University, Riyadh 11586, Saudi Arabia
- Zahid Khan
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Wadii Boulila
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- RIADI Laboratory, University of Manouba, Manouba 2010, Tunisia
15
Liu X, Tang G. Color Image Restoration Using Sub-Image Based Low-Rank Tensor Completion. Sensors (Basel) 2023; 23:1706. PMID: 36772745; PMCID: PMC9919421; DOI: 10.3390/s23031706.
Abstract
Many restoration methods exploit the low-rank structure of high-dimensional image signals to recover corrupted images. These signals are usually represented by tensors, which preserve their inherent relationships. A tensor formed directly from a single image has a certain low-rank property, but not a strong one. To enhance the low-rank property, we propose a novel method called sub-image based low-rank tensor completion (SLRTC) for image restoration. We first sample a color image to obtain sub-images, and use these sub-images instead of the original single image to form a tensor; we then apply mode permutation to this tensor. Next, we exploit the tensor nuclear norm, defined via the tensor singular value decomposition (t-SVD), to build the low-rank completion model. Finally, we solve this model with the standard alternating direction method of multipliers (ADMM) algorithm using tensor singular value thresholding (t-SVT). Experimental results show that, compared with state-of-the-art tensor completion techniques, the proposed method provides superior results in terms of objective and subjective assessment.
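The workhorse inside such an ADMM solver is the t-SVT proximal operator: FFT along the third mode, soft-threshold the singular values of each frontal slice, and invert the FFT. Below is a bare-bones sketch of that operator alone, without the completion model or the ADMM wrapper.

```python
import numpy as np

def t_svt(X, tau):
    """Tensor singular value thresholding (t-SVT) under the t-SVD:
    FFT along mode 3, soft-threshold the singular values of every frontal
    slice in the Fourier domain, then invert the FFT. This is the proximal
    operator of the tensor nuclear norm used in low-rank completion models."""
    Xf = np.fft.fft(X, axis=2)
    Yf = np.empty_like(Xf)
    for k in range(X.shape[2]):
        U, s, Vh = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        s = np.maximum(s - tau, 0.0)        # soft-threshold the spectrum
        Yf[:, :, k] = (U * s) @ Vh
    return np.real(np.fft.ifft(Yf, axis=2))

# Toy use: shrink a random 32x32x3 "sub-image tensor" toward low tubal rank.
T = np.random.rand(32, 32, 3)
print(np.round(np.linalg.norm(T - t_svt(T, 0.5)), 3))
```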
Affiliation(s)
- Xiaohua Liu
- College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Guijin Tang
- Jiangsu Key Laboratory of Image Processing and Image Communication, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
16
Jothi Lakshmi S, Deepa P. Blind image deblurring using GLCM and negans obtuse mono proximate distance. The Imaging Science Journal 2023. DOI: 10.1080/13682199.2022.2161996.
Affiliation(s)
- S. Jothi Lakshmi
- Department of CSE, Akshaya College of Engineering and Technology, Coimbatore, India
- P. Deepa
- Department of ECE, Government College of Technology, Coimbatore, India
17
Zhang Z, Cheng Y, Suo J, Bian L, Dai Q. INFWIDE: Image and Feature Space Wiener Deconvolution Network for Non-Blind Image Deblurring in Low-Light Conditions. IEEE Trans Image Process 2023; 32:1390-1402. PMID: 37027543; DOI: 10.1109/tip.2023.3244417.
Abstract
In low-light environments, handheld photography suffers from severe camera shake under long exposure settings. Although existing deblurring algorithms have shown promising performance on well-exposed blurry images, they still cannot cope with low-light snapshots. Sophisticated noise and saturated regions are two dominating challenges in practical low-light deblurring: the former violates the Gaussian or Poisson assumption widely used in most existing algorithms and thus degrades their performance badly, while the latter introduces non-linearity into the classical convolution-based blurring model and makes the deblurring task even more challenging. In this work, we propose a novel non-blind deblurring method dubbed image and feature space Wiener deconvolution network (INFWIDE) to tackle these problems systematically. In terms of algorithm design, INFWIDE proposes a two-branch architecture that explicitly removes noise and hallucinates saturated regions in the image space, suppresses ringing artifacts in the feature space, and integrates the two complementary outputs with a subtle multi-scale fusion network for high-quality night photograph deblurring. For effective network training, we design a set of loss functions integrating a forward imaging model and backward reconstruction to form a closed-loop regularization that secures good convergence of the deep neural network. Further, to optimize INFWIDE's applicability in real low-light conditions, a physical-process-based low-light noise model is employed to synthesize realistic noisy night photographs for model training. Taking advantage of the traditional Wiener deconvolution algorithm's physically driven characteristics and the deep neural network's representation ability, INFWIDE can recover fine details while suppressing unpleasant artifacts during deblurring. Extensive experiments on synthetic and real data demonstrate the superior performance of the proposed approach.
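The physically driven starting point here, classical frequency-domain Wiener deconvolution, is easy to state in code; learned branches of the kind described then suppress the noise and ringing this step leaves behind. The noise-to-signal ratio below is a hand-set constant for illustration, not the paper's learned quantity.

```python
import numpy as np

def wiener_deconv(blurred, psf, nsr=1e-2):
    """Classical frequency-domain Wiener deconvolution: the first stage that
    INFWIDE-style methods refine with learned branches. `nsr` is an assumed
    noise-to-signal power ratio (a tuning constant in this sketch)."""
    H = np.fft.fft2(psf, s=blurred.shape)
    G = np.fft.fft2(blurred)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + nsr) * G
    return np.real(np.fft.ifft2(F_hat))

# Toy demo: blur with a 9-tap horizontal motion PSF, then invert it.
img = np.random.rand(128, 128)
psf = np.full((1, 9), 1.0 / 9.0)
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf, s=img.shape)))
restored = wiener_deconv(blurred, psf, nsr=1e-3)
print(np.abs(restored - img).mean())   # small residual error
```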
18
Jia T, Shi L, Wei C, Shi R, Liu B. Correction of motion artifact in CL based on MAFusNet. J Xray Sci Technol 2023; 31:393-407. PMID: 36710712; DOI: 10.3233/xst-221335.
Abstract
Computed laminography (CL) is one of the best methods for nondestructive testing of plate-like objects. If the object and the detector move continuously during scanning, the data acquisition efficiency of CL increases significantly; however, the projection images then contain motion artifacts. A multi-angle fusion network (MAFusNet) is presented to correct the motion artifacts of CL projection images, taking their specific properties into account. The multi-angle fusion module significantly increases the deblurring ability of MAFusNet by using data from neighboring projection images, while the feature fusion module lessens the information loss caused by data flow between the encoders. In contrast to conventional deblurring networks, MAFusNet is trained on synthetic datasets yet performs well on real data, demonstrating strong generalization. Ablation studies and comparisons with existing classical deblurring networks show that the multi-angle fusion approach substantially improves the correction of CL motion artifacts, and the synthetic training dataset also significantly lowers the training cost, effectively improving the quality and efficiency of CL imaging in industrial nondestructive testing.
Affiliation(s)
- Tong Jia
- Beijing Engineering Research Center of Radiographic Techniques and Equipment, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, China
- School of Nuclear Science and Technology, University of Chinese Academy of Sciences, Beijing, China
- Liu Shi
- Beijing Engineering Research Center of Radiographic Techniques and Equipment, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, China
- School of Nuclear Science and Technology, University of Chinese Academy of Sciences, Beijing, China
- Cunfeng Wei
- Beijing Engineering Research Center of Radiographic Techniques and Equipment, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, China
- School of Nuclear Science and Technology, University of Chinese Academy of Sciences, Beijing, China
- Rongjian Shi
- Beijing Engineering Research Center of Radiographic Techniques and Equipment, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, China
- Baodong Liu
- Beijing Engineering Research Center of Radiographic Techniques and Equipment, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, China
- School of Nuclear Science and Technology, University of Chinese Academy of Sciences, Beijing, China
19
Zhang Z, Li H, Lv G, Zhou H, Feng H, Xu Z, Li Q, Jiang T, Chen Y. Deep learning-based image reconstruction for photonic integrated interferometric imaging. Opt Express 2022; 30:41359-41373. PMID: 36366616; DOI: 10.1364/oe.469582.
Abstract
Photonic integrated interferometric imaging (PIII) is an emerging technique that uses far-field spatial coherence measurements to extract intensity information from a source to form an image. At present, a low sampling rate and noise disturbance are the main factors hindering the development of this technology. This paper implements a deep learning-based method to improve image quality. First, we propose a frequency-domain dataset generation method based on the imaging principles. Second, we present spatial-frequency dual-domain fusion networks (SFDF-Nets) for image reconstruction. We utilize normalized amplitude and phase to train the networks, which reduces the difficulty of training on complex-valued data. SFDF-Nets fuse multi-frame data captured by rotation sampling to increase the sampling rate, and they generate high-quality spatial images through dual-domain supervised learning and frequency-domain fusion. Furthermore, we propose an inverse fast Fourier transform loss (IFFT loss) for network training in the frequency domain. Extensive experiments show that our method improves PSNR and SSIM by 5.64 dB and 0.20, respectively. Our method effectively improves reconstructed image quality and opens a new dimension in interferometric imaging.
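An IFFT loss of the kind described can be written in a few lines: inverse-transform the predicted spectrum and penalize its distance to the ground-truth image. The sketch below is one plausible form; the exact distance metric and normalization used by SFDF-Nets may differ.

```python
import torch

def ifft_loss(pred_spectrum: torch.Tensor, gt_image: torch.Tensor) -> torch.Tensor:
    """Inverse-FFT loss: bring a predicted complex spectrum back to the
    spatial domain and penalize its L1 distance to the ground-truth image
    (an assumed form; the paper's exact weighting may differ)."""
    recon = torch.fft.ifft2(pred_spectrum).real
    return torch.nn.functional.l1_loss(recon, gt_image)

gt = torch.rand(2, 1, 64, 64)
pred = torch.fft.fft2(gt) + 0.01 * torch.randn(2, 1, 64, 64, dtype=torch.cfloat)
print(ifft_loss(pred, gt))  # small, nonzero loss from the added perturbation
```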
20
Slutsky M. Noise-Adaptive Non-Blind Image Deblurring. Sensors (Basel) 2022; 22:6923. PMID: 36146269; PMCID: PMC9503865; DOI: 10.3390/s22186923.
Abstract
This work addresses the problem of non-blind image deblurring for arbitrary input noise. The problem arises in the context of sensors with strong chromatic aberrations, as well as in standard cameras in low-light and high-speed scenarios. A short description of two common classical approaches to regularized image deconvolution is provided, and common issues arising in this context are described. It is shown how DNN-based enhancement of pre-deconvolved images can be improved by joint optimization of regularization parameters and network weights. Furthermore, a two-step approach to deblurring based on two DNNs is proposed, with the first network estimating deconvolution regularization parameters and the second performing image enhancement and residual artifact removal. For the first network, a novel RegParamNet architecture is introduced, and its performance is examined for both direct and indirect regularization parameter estimation. The system is shown to operate well for input noise spanning three orders of magnitude (0.01-10.0) and a wide spectrum of 1D and 2D Gaussian blur kernels, well outside the scope of most previously explored degrees of image blur and noise. The proposed method significantly outperforms several leading state-of-the-art approaches.
Affiliation(s)
- Michael Slutsky
- GM Technical Center Israel-R&D Lab, 13 Arie Shenkar St., Herzliya 4672513, Israel
21
Teixeira E, Araujo B, Costa V, Mafra S, Figueiredo F. Literature Review on Ship Localization, Classification, and Detection Methods Based on Optical Sensors and Neural Networks. Sensors (Basel) 2022; 22:6879. PMID: 36146228; PMCID: PMC9501387; DOI: 10.3390/s22186879.
Abstract
Object detection is a common application within the computer vision area. Its tasks include the classic challenges of object localization and classification, which make it a demanding problem. The technique is crucial for maritime applications, since situational awareness can bring various benefits to surveillance systems. The literature presents various models that improve automatic target recognition and tracking capabilities and that can be applied to, and leverage, maritime surveillance systems. Therefore, this paper reviews the available models focused on localization, classification, and detection; analyzes several works that apply the discussed models to the maritime surveillance scenario; and highlights the main opportunities and challenges, encouraging new research in this area.
22
Nasonov AV, Nasonova AA. Linear Blur Parameters Estimation Using a Convolutional Neural Network. Pattern Recognition and Image Analysis 2022. DOI: 10.1134/s1054661822030270.
23
Zhang K, Luo W, Yu Y, Ren W, Zhao F, Li C, Ma L, Liu W, Li H. Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior. Int J Comput Vis 2022. DOI: 10.1007/s11263-022-01620-w.