1. Qian S, Yang L, Xue Y, Li P. SIFusion: Lightweight infrared and visible image fusion based on semantic injection. PLoS One 2024;19:e0307236. PMID: 39504316; PMCID: PMC11540218; DOI: 10.1371/journal.pone.0307236.
Abstract
The objective of image fusion is to integrate complementary features from source images to better serve the needs of human and machine vision. However, existing image fusion algorithms predominantly focus on enhancing the visual appeal of the fused image for human perception, often neglecting its impact on subsequent high-level visual tasks, particularly the processing of semantic information. Moreover, fusion methods that do incorporate downstream tasks tend to be overly complex and computationally intensive, which hinders practical application. To address these issues, this paper proposes SIFusion, a lightweight infrared and visible image fusion method based on semantic injection. The method employs a semantic-aware branch to extract semantic feature information and then integrates these features into the fused features through a Semantic Injection Module (SIM) to meet the semantic requirements of high-level visual tasks. Furthermore, to reduce the complexity of the fusion network, the method introduces an Edge Convolution Block (ECB) based on structural re-parameterization to enhance the representational capacity of the encoder and decoder. Extensive experimental comparisons demonstrate that the proposed method performs excellently in terms of visual appeal and high-level semantics, providing satisfactory fusion results for subsequent high-level visual tasks even in challenging scenarios.
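To illustrate the branch-merging idea behind structural re-parameterization (the general technique, not the authors' ECB implementation), the sketch below collapses a learned 3x3 convolution and a fixed Sobel edge branch into a single equivalent kernel; the kernels and scaling are illustrative assumptions.

```python
# Minimal sketch of structural re-parameterization: two parallel linear
# convolution branches (a stand-in "learned" 3x3 kernel and a fixed Sobel edge
# branch) collapse into one equivalent 3x3 kernel at inference time.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((64, 64))

k_learned = rng.standard_normal((3, 3)) * 0.1          # stand-in for a trained 3x3 kernel
k_sobel = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=float) * 0.05    # scaled edge (Sobel) branch

# Training-time view: run both branches and sum their outputs.
multi_branch = (convolve2d(image, k_learned, mode="same")
                + convolve2d(image, k_sobel, mode="same"))

# Inference-time view: merge the kernels once, then run a single convolution.
k_merged = k_learned + k_sobel
single_branch = convolve2d(image, k_merged, mode="same")

assert np.allclose(multi_branch, single_branch)          # identical outputs, lower cost
```

Because convolution is linear, the merged kernel reproduces the multi-branch output exactly while removing the extra branch at inference, which is the lightweighting effect such blocks rely on.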
Affiliation(s)
- Song Qian: Faculty of Information Engineering, Xinjiang Institute of Technology, Aksu, China
- Liwei Yang: Faculty of Information Engineering, Xinjiang Institute of Technology, Aksu, China
- Yan Xue: Faculty of Information Engineering, Xinjiang Institute of Technology, Aksu, China
- Ping Li: Faculty of Information Engineering, Xinjiang Institute of Technology, Aksu, China
2. Zhao C, Yang P, Zhou F, Yue G, Wang S, Wu H, Chen G, Wang T, Lei B. MHW-GAN: Multidiscriminator Hierarchical Wavelet Generative Adversarial Network for Multimodal Image Fusion. IEEE Transactions on Neural Networks and Learning Systems 2024;35:13713-13727. PMID: 37432812; DOI: 10.1109/tnnls.2023.3271059.
Abstract
Image fusion technology aims to obtain a comprehensive image containing a specific target or detailed information by fusing data of different modalities. However, many deep learning-based algorithms consider edge texture information through loss functions instead of specifically constructing network modules. The influence of the middle-layer features is ignored, which leads to the loss of detailed information between layers. In this article, we propose a multidiscriminator hierarchical wavelet generative adversarial network (MHW-GAN) for multimodal image fusion. First, we construct a hierarchical wavelet fusion (HWF) module as the generator of MHW-GAN to fuse feature information at different levels and scales, which avoids information loss in the middle layers of different modalities. Second, we design an edge perception module (EPM) to integrate edge information from different modalities and avoid the loss of edge information. Third, we leverage the adversarial learning relationship between the generator and three discriminators to constrain the generation of fusion images. The generator aims to generate a fusion image that fools the three discriminators, while the three discriminators aim to distinguish the fusion image and the edge fusion image from the two source images and the joint edge image, respectively. The final fusion image contains both intensity information and structure information via adversarial learning. Experiments on four types of public and self-collected multimodal image datasets show that the proposed algorithm is superior to previous algorithms in terms of both subjective and objective evaluation.
3. Xie Y, Fei Z, Deng D, Meng L, Niu F, Sun J. MEEAFusion: Multi-Scale Edge Enhancement and Joint Attention Mechanism Based Infrared and Visible Image Fusion. Sensors (Basel) 2024;24:5860. PMID: 39275771; PMCID: PMC11397970; DOI: 10.3390/s24175860.
Abstract
Infrared and visible image fusion can integrate rich edge details and salient infrared targets, resulting in high-quality images suitable for advanced tasks. However, most available algorithms struggle to fully extract detailed features and overlook the interaction of complementary features across different modal images during the feature fusion process. To address this gap, this study presents a novel fusion method based on multi-scale edge enhancement and a joint attention mechanism (MEEAFusion). Initially, convolution kernels of varying scales were utilized to obtain shallow features with multiple receptive fields unique to the source image. Subsequently, a multi-scale gradient residual block (MGRB) was developed to capture the high-level semantic information and low-level edge texture information of the image, enhancing the representation of fine-grained features. Then, the complementary feature between infrared and visible images was defined, and a cross-transfer attention fusion block (CAFB) was devised with joint spatial attention and channel attention to refine the critical supplemental information. This allowed the network to obtain fused features that were rich in both common and complementary information, thus realizing feature interaction and pre-fusion. Lastly, the features were reconstructed to obtain the fused image. Extensive experiments on three benchmark datasets demonstrated that the MEEAFusion proposed in this research has considerable strengths in terms of rich texture details, significant infrared targets, and distinct edge contours, and it achieves superior fusion performance.
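As a rough sketch of the multi-receptive-field idea described above (kernel sizes, channel counts, and the activation are assumptions, not MEEAFusion's actual configuration), parallel convolutions of different sizes can extract shallow features that are then concatenated:

```python
# Illustrative multi-scale shallow feature extractor: parallel convolutions with
# different kernel sizes give multiple receptive fields, and their outputs are
# concatenated along the channel dimension.
import torch
import torch.nn as nn

class MultiScaleShallow(nn.Module):
    def __init__(self, in_ch=1, ch=16):
        super().__init__()
        self.b3 = nn.Conv2d(in_ch, ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, ch, kernel_size=5, padding=2)
        self.b7 = nn.Conv2d(in_ch, ch, kernel_size=7, padding=3)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        # Each branch sees the source image at a different receptive field.
        return self.act(torch.cat([self.b3(x), self.b5(x), self.b7(x)], dim=1))

ir = torch.rand(1, 1, 128, 128)        # dummy infrared image
feats = MultiScaleShallow()(ir)
print(feats.shape)                     # torch.Size([1, 48, 128, 128])
```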
Affiliation(s)
- Yingjiang Xie: Systems Engineering Institute, Academy of Military Sciences, PLA, Beijing 100166, China
- Zhennan Fei: Systems Engineering Institute, Academy of Military Sciences, PLA, Beijing 100166, China
- Da Deng: Systems Engineering Institute, Academy of Military Sciences, PLA, Beijing 100166, China
- Lingshuai Meng: Systems Engineering Institute, Academy of Military Sciences, PLA, Beijing 100166, China
- Fu Niu: Systems Engineering Institute, Academy of Military Sciences, PLA, Beijing 100166, China
- Jinggong Sun: Systems Engineering Institute, Academy of Military Sciences, PLA, Beijing 100166, China
4. Yan A, Gao S, Lu Z, Jin S, Chen J. Infrared and Harsh Light Visible Image Fusion Using an Environmental Light Perception Network. Entropy (Basel) 2024;26:696. PMID: 39202166; PMCID: PMC11353657; DOI: 10.3390/e26080696.
Abstract
The complementary combination of emphasized target objects in infrared images and rich texture details in visible images can effectively enhance the information entropy of fused images, thereby providing substantial assistance for downstream composite high-level vision tasks, such as nighttime vehicle intelligent driving. However, mainstream fusion algorithms lack specific research on the contradiction between the low information entropy and high pixel intensity of visible images under harsh light nighttime road environments. As a result, fusion algorithms that perform well in normal conditions can only produce low information entropy fusion images, similar in information distribution to the visible images, under harsh light interference. In response to these problems, we designed an image fusion network resilient to harsh light environment interference, incorporating entropy and information theory principles to enhance robustness and information retention. Specifically, an edge feature extraction module was designed to extract key edge features of salient targets to optimize fusion information entropy. Additionally, a harsh light environment-aware (HLEA) module was proposed to avoid the decrease in fusion image quality caused by the contradiction between low information entropy and high pixel intensity, based on the information distribution characteristics of harsh light visible images. Finally, an edge-guided hierarchical fusion (EGHF) module was designed to achieve robust feature fusion, minimizing irrelevant noise entropy and maximizing useful information entropy. Extensive experiments demonstrate that, compared to other advanced algorithms, the fusion results of the proposed method contain more useful information and have significant advantages in high-level vision tasks under harsh nighttime lighting conditions.
Affiliation(s)
- Aiyun Yan: College of Information Science and Engineering, Northeastern University, Shenyang 110167, China
- Shang Gao: College of Information Science and Engineering, Northeastern University, Shenyang 110167, China
- Zhenlin Lu: Beijing Microelectronics Technology Institute, Beijing 100076, China
- Shuowei Jin: College of Information Science and Engineering, Northeastern University, Shenyang 110167, China
- Jingrong Chen: College of Information Science and Engineering, Northeastern University, Shenyang 110167, China
5. Wu D, Wang Y, Wang H, Wang F, Gao G. DCFNet: Infrared and Visible Image Fusion Network Based on Discrete Wavelet Transform and Convolutional Neural Network. Sensors (Basel) 2024;24:4065. PMID: 39000844; PMCID: PMC11244297; DOI: 10.3390/s24134065.
Abstract
Aiming to address the issues of missing detailed information, the blurring of significant target information, and poor visual effects in current image fusion algorithms, this paper proposes an infrared and visible-light image fusion algorithm based on discrete wavelet transform and convolutional neural networks. Our backbone network is an autoencoder. A DWT layer is embedded in the encoder to optimize frequency-domain feature extraction and prevent information loss, and a bottleneck residual block and a coordinate attention mechanism are introduced to enhance the ability to capture and characterize the low- and high-frequency feature information; an IDWT layer is embedded in the decoder to achieve the feature reconstruction of the fused frequencies; the fusion strategy adopts the l1-norm fusion strategy to integrate the encoder's output frequency mapping features; a weighted loss containing pixel loss, gradient loss, and structural loss is constructed for optimizing network training. DWT decomposes the image into sub-bands at different scales, including low-frequency sub-bands and high-frequency sub-bands. The low-frequency sub-bands contain the structural information of the image, which corresponds to the important target information, while the high-frequency sub-bands contain the detail information, such as edge and texture information. Through IDWT, the low-frequency sub-bands that contain important target information are synthesized with the high-frequency sub-bands that enhance the details, ensuring that the important target information and texture details are clearly visible in the reconstructed image. The whole process is able to reconstruct the information of different frequency sub-bands back into the image non-destructively, so that the fused image appears natural and harmonious visually. Experimental results on public datasets show that the fusion algorithm performs well according to both subjective and objective evaluation criteria and that the fused image is clearer and contains more scene information, which verifies the effectiveness of the algorithm, and the results of the generalization experiments also show that our network has good generalization ability.
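The frequency-domain side of such a pipeline can be sketched with PyWavelets: a single-level DWT of each source, a simple fusion rule per sub-band, and an inverse DWT. The averaging and maximum-absolute rules below are classical stand-ins for the paper's learned encoder/decoder and l1-norm strategy.

```python
# Rough sketch of DWT-domain fusion: average the low-frequency (approximation)
# sub-band and keep the larger-magnitude coefficient in each high-frequency
# sub-band, then reconstruct with the inverse DWT.
import numpy as np
import pywt

def dwt_fuse(ir, vis, wavelet="haar"):
    ir_lo, ir_hi = pywt.dwt2(ir, wavelet)       # (cA, (cH, cV, cD))
    vis_lo, vis_hi = pywt.dwt2(vis, wavelet)

    fused_lo = 0.5 * (ir_lo + vis_lo)           # low frequency: structure / targets
    fused_hi = tuple(                            # high frequency: edges / texture
        np.where(np.abs(a) >= np.abs(b), a, b)
        for a, b in zip(ir_hi, vis_hi)
    )
    return pywt.idwt2((fused_lo, fused_hi), wavelet)

rng = np.random.default_rng(0)
ir = rng.random((128, 128))
vis = rng.random((128, 128))
print(dwt_fuse(ir, vis).shape)                   # (128, 128)
```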
Affiliation(s)
- Dan Wu: School of Electronic Engineering, Xi’an Shiyou University, Xi’an 710312, China
6. Xu L, Zou Q. Semantic-Aware Fusion Network Based on Super-Resolution. Sensors (Basel) 2024;24:3665. PMID: 38894455; PMCID: PMC11175180; DOI: 10.3390/s24113665.
Abstract
The aim of infrared and visible image fusion is to generate a fused image that not only contains salient targets and rich texture details, but also facilitates high-level vision tasks. However, due to the hardware limitations of digital cameras and other devices, existing datasets contain many low-resolution images, which often suffer from lost detail and structural information. At the same time, existing fusion algorithms focus too much on the visual quality of the fused images while ignoring the requirements of high-level vision tasks. To address these challenges, in this paper we unite a super-resolution network, a fusion network, and a segmentation network, and propose a super-resolution-based semantic-aware fusion network. First, we design a super-resolution network based on a multi-branch hybrid attention module (MHAM), which aims to enhance the quality and details of the source image, enabling the fusion network to integrate the features of the source image more accurately. Then, a comprehensive information extraction module (STDC) is designed in the fusion network to enhance the network's ability to extract finer-grained complementary information from the source image. Finally, the fusion network and segmentation network are jointly trained so that a semantic loss can guide semantic information back to the fusion network, which effectively improves the performance of the fused images on high-level vision tasks. Extensive experiments show that our method is more effective than other state-of-the-art image fusion methods. In particular, our fused images not only have excellent visual perception effects, but also help improve the performance of high-level vision tasks.
Affiliation(s)
- Lingfeng Xu: School of Microelectronics, Tianjin University, Tianjin 300072, China
- Qiang Zou: School of Microelectronics, Tianjin University, Tianjin 300072, China; Tianjin International Joint Research Center for Internet of Things, Tianjin 300072, China; Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, Tianjin University, Tianjin 300072, China
7. Yang D, Zhu N, Wang X, Li S. Image fusion using Y-net-based extractor and global-local discriminator. Heliyon 2024;10:e30798. PMID: 38784534; PMCID: PMC11112272; DOI: 10.1016/j.heliyon.2024.e30798.
Abstract
Although some deep learning-based image fusion approaches have realized promising results, how to extract information-rich features from different source images while preserving them in the fused image with fewer distortions remains a challenging issue. Here, we propose a GAN-based scheme with a multi-scale feature extractor and a global-local discriminator for infrared and visible image fusion. We use Y-Net as the backbone architecture to design the generator network, and introduce the residual dense block (RDblock) to yield more realistic fused images for infrared and visible images by learning discriminative multi-scale representations that are closer to the essence of the different modal images. During feature reconstruction, cross-modality shortcuts with contextual attention (CMSCA) are employed to selectively aggregate features at different scales and different levels to construct information-rich fused images with better visual effect. To improve the information content of the fused image, we not only constrain the structure and contrast information using the structural similarity index, but also evaluate the intensity and gradient similarities at both feature and image levels. Two global-local discriminators that combine a global GAN with PatchGAN in a unified architecture help uncover finer differences between the generated image and the reference images, which forces the generator to learn both the local radiation information and the pervasive global details in the two source images. It is worth mentioning that image fusion is achieved during the adversarial process without handcrafted fusion rules. Extensive assessments demonstrate that the reported fusion scheme achieves superior performance against state-of-the-art works in preserving meaningful information.
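A schematic of the global-local discriminator idea (channel counts and depths are assumptions, and this is not the authors' network) is to share a convolutional trunk and attach two heads: a PatchGAN-style head that scores local patches and a global head that scores the whole image.

```python
# Schematic global-local discriminator: the global head collapses the image to a
# single realism score, while the PatchGAN-style local head outputs a map of
# per-patch scores.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                         nn.LeakyReLU(0.2))

class GlobalLocalDiscriminator(nn.Module):
    def __init__(self, in_ch=1):
        super().__init__()
        self.shared = nn.Sequential(conv_block(in_ch, 32), conv_block(32, 64))
        self.local_head = nn.Conv2d(64, 1, 3, padding=1)          # per-patch scores
        self.global_head = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                         nn.Flatten(),
                                         nn.Linear(64, 1))        # whole-image score

    def forward(self, x):
        f = self.shared(x)
        return self.global_head(f), self.local_head(f)

d = GlobalLocalDiscriminator()
g_score, patch_scores = d(torch.rand(2, 1, 128, 128))
print(g_score.shape, patch_scores.shape)   # torch.Size([2, 1]) torch.Size([2, 1, 32, 32])
```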
Affiliation(s)
- Danqing Yang: School of Optoelectronic Engineering, Xidian University, Xi'an, 710071, China
- Naibo Zhu: Research Institute of System Engineering, PLA Academy of Military Science, Beijing, 100091, China
- Xiaorui Wang: School of Optoelectronic Engineering, Xidian University, Xi'an, 710071, China
- Shuang Li: Research Institute of System Engineering, PLA Academy of Military Science, Beijing, 100091, China
8. Zhong Y, Zhang S, Liu Z, Zhang X, Mo Z, Zhang Y, Hu H, Chen W, Qi L. Unsupervised Fusion of Misaligned PAT and MRI Images via Mutually Reinforcing Cross-Modality Image Generation and Registration. IEEE Transactions on Medical Imaging 2024;43:1702-1714. PMID: 38147426; DOI: 10.1109/tmi.2023.3347511.
Abstract
Photoacoustic tomography (PAT) and magnetic resonance imaging (MRI) are two advanced imaging techniques widely used in pre-clinical research. PAT has high optical contrast and deep imaging range but poor soft tissue contrast, whereas MRI provides excellent soft tissue information but poor temporal resolution. Despite recent advances in medical image fusion with pre-aligned multimodal data, PAT-MRI image fusion remains challenging due to misaligned images and spatial distortion. To address these issues, we propose an unsupervised multi-stage deep learning framework called PAMRFuse for misaligned PAT and MRI image fusion. PAMRFuse comprises a multimodal to unimodal registration network to accurately align the input PAT-MRI image pairs and a self-attentive fusion network that selects information-rich features for fusion. We employ an end-to-end mutually reinforcing mode in our registration network, which enables joint optimization of cross-modality image generation and registration. To the best of our knowledge, this is the first attempt at information fusion for misaligned PAT and MRI. Qualitative and quantitative experimental results show the excellent performance of our method in fusing PAT-MRI images of small animals captured from commercial imaging systems.
9. Hao CY, Chen YC, Ning FS, Chou TY, Chen MH. Using Sparse Parts in Fused Information to Enhance Performance in Latent Low-Rank Representation-Based Fusion of Visible and Infrared Images. Sensors (Basel) 2024;24:1514. PMID: 38475050; DOI: 10.3390/s24051514.
Abstract
Latent Low-Rank Representation (LatLRR) has emerged as a prominent approach for fusing visible and infrared images. In this approach, images are decomposed into three fundamental components: the base part, the salient part, and the sparse part, and the aim is to blend the base and salient features to reconstruct images accurately. However, existing methods often focus on combining the base and salient parts while neglecting the importance of the sparse component. In contrast, this study advocates the comprehensive inclusion of all three parts generated by LatLRR image decomposition in the image fusion process, a novel proposition introduced here. Moreover, the effective integration of Convolutional Neural Network (CNN) technology with LatLRR remains challenging, particularly after the inclusion of sparse parts. This study uses fusion strategies involving weighted average, summation, VGG19, and ResNet50 in various combinations to analyze the fusion performance following the introduction of sparse parts. The research findings show a significant enhancement in fusion performance achieved through the inclusion of sparse parts in the fusion process. The suggested fusion strategy employs deep learning techniques for fusing both the base parts and the sparse parts while utilizing a summation strategy for the fusion of the salient parts. The findings improve the performance of LatLRR-based methods and offer valuable insights for further enhancement, leading to advancements in the field of image fusion.
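A toy recombination of the three LatLRR parts under the simple strategies discussed above might look as follows; the LatLRR decomposition itself is assumed to be already computed, and a maximum-absolute rule stands in for the deep-feature (VGG19/ResNet50) fusion of the sparse parts.

```python
# Toy recombination of LatLRR-style parts: weighted average of base parts,
# summation of salient parts, and a max-abs rule standing in for the
# deep-feature fusion of sparse parts. Assumes the decomposition is already done.
import numpy as np

def fuse_parts(base_a, base_b, sal_a, sal_b, sparse_a, sparse_b, w=0.5):
    fused_base = w * base_a + (1.0 - w) * base_b          # weighted-average strategy
    fused_salient = sal_a + sal_b                          # summation strategy
    fused_sparse = np.where(np.abs(sparse_a) >= np.abs(sparse_b),
                            sparse_a, sparse_b)            # placeholder for deep-feature fusion
    return np.clip(fused_base + fused_salient + fused_sparse, 0.0, 1.0)

rng = np.random.default_rng(1)
parts = [rng.random((64, 64)) * s for s in (0.6, 0.6, 0.2, 0.2, 0.1, 0.1)]
fused = fuse_parts(*parts)
print(fused.min(), fused.max())
```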
Affiliation(s)
- Chen-Yu Hao: GIS Research Center, Feng Chia University, Taichung 40724, Taiwan
- Yao-Chung Chen: GIS Research Center, Feng Chia University, Taichung 40724, Taiwan
- Fang-Shii Ning: Department of Land Economics, National Chengchi University, Taipei 11605, Taiwan
- Tien-Yin Chou: GIS Research Center, Feng Chia University, Taichung 40724, Taiwan
- Mei-Hsin Chen: GIS Research Center, Feng Chia University, Taichung 40724, Taiwan
10. Liang L, Gao Z. SharDif: Sharing and Differential Learning for Image Fusion. Entropy (Basel) 2024;26:57. PMID: 38248182; PMCID: PMC10814104; DOI: 10.3390/e26010057.
Abstract
Image fusion is the generation of an informative image that contains complementary information from the original sensor images, such as texture details and attentional targets. Existing methods have designed a variety of feature extraction algorithms and fusion strategies to achieve image fusion. However, these methods ignore the extraction of common features in the original multi-source images. The view proposed in this paper is that image fusion should retain, as much as possible, the useful shared features and complementary differential features of the original multi-source images. Shared and differential learning methods for infrared and visible light image fusion are proposed. An encoder with shared weights is used to extract the shared common features contained in infrared and visible light images, and two other encoder blocks are used to extract the differential features of the infrared images and visible light images, respectively. Effective learning of shared and differential features is achieved through weight sharing and loss functions. Then, the fusion of shared features and differential features is achieved via a weighted fusion strategy based on an entropy-weighted attention mechanism. The experimental results demonstrate the effectiveness of the proposed model and its algorithm. Compared with state-of-the-art methods, the significant advantage of the proposed method is that it retains the structural information of the original image and has better fusion accuracy and visual perception effect.
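A simplified, image-level illustration of entropy weighting is shown below; the paper applies the idea to learned shared and differential features rather than raw pixels, so this is only a sketch of the weighting rule.

```python
# Simplified entropy-weighted fusion at the image level: each source contributes
# in proportion to its Shannon entropy.
import numpy as np

def shannon_entropy(img, bins=256):
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_weighted_fusion(ir, vis):
    h_ir, h_vis = shannon_entropy(ir), shannon_entropy(vis)
    w_ir = h_ir / (h_ir + h_vis)            # more informative source gets more weight
    return w_ir * ir + (1.0 - w_ir) * vis

rng = np.random.default_rng(2)
ir, vis = rng.random((64, 64)), rng.random((64, 64))
print(entropy_weighted_fusion(ir, vis).shape)   # (64, 64)
```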
Affiliation(s)
- Lei Liang: College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; Low Speed Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China
- Zhisheng Gao: School of Computer and Software Engineering, Xihua University, Chengdu 610039, China
11. Shiri I, Amini M, Yousefirizi F, Vafaei Sadr A, Hajianfar G, Salimi Y, Mansouri Z, Jenabi E, Maghsudi M, Mainta I, Becker M, Rahmim A, Zaidi H. Information fusion for fully automated segmentation of head and neck tumors from PET and CT images. Med Phys 2024;51:319-333. PMID: 37475591; DOI: 10.1002/mp.16615.
Abstract
BACKGROUND: PET/CT images combining anatomic and metabolic data provide complementary information that can improve clinical task performance. PET image segmentation algorithms exploiting the available multi-modal information are still lacking.
PURPOSE: Our study aimed to assess the performance of PET and CT image fusion for gross tumor volume (GTV) segmentation of head and neck cancers (HNCs) utilizing conventional, deep learning (DL), and output-level voting-based fusions.
METHODS: The current study is based on a total of 328 histologically confirmed HNCs from six different centers. The images were automatically cropped to a 200 × 200 head and neck region box, and CT and PET images were normalized for further processing. Eighteen conventional image-level fusions were implemented. In addition, a modified U2-Net architecture was used as the DL fusion model baseline. Three different input-, layer-, and decision-level information fusions were used. Simultaneous truth and performance level estimation (STAPLE) and majority voting were employed to merge the different segmentation outputs (from PET and from image-level and network-level fusions), that is, output-level information fusion (voting-based fusions). The networks were trained in a 2D manner with a batch size of 64. Twenty percent of the dataset, stratified by center (20% in each center), was used for final result reporting. Different standard segmentation metrics and conventional PET metrics, such as SUV, were calculated.
RESULTS: Among the single modalities, PET had a reasonable performance with a Dice score of 0.77 ± 0.09, while CT did not perform acceptably and reached a Dice score of only 0.38 ± 0.22. Conventional fusion algorithms obtained Dice scores in the range [0.76-0.81], with guided-filter-based context enhancement (GFCE) at the low end, and anisotropic diffusion and Karhunen-Loeve transform fusion (ADF), multi-resolution singular value decomposition (MSVD), and multi-level image decomposition based on latent low-rank representation (MDLatLRR) at the high end. All DL fusion models achieved Dice scores of 0.80. Output-level voting-based models outperformed all other models, achieving superior results with a Dice score of 0.84 for Majority_ImgFus, Majority_All, and Majority_Fast. A mean error of almost zero was achieved for all fusions using SUVpeak, SUVmean, and SUVmedian.
CONCLUSION: PET/CT information fusion adds significant value to segmentation tasks, considerably outperforming PET-only and CT-only methods. In addition, both conventional image-level and DL fusions achieve competitive results. Meanwhile, output-level voting-based fusion using majority voting of several algorithms results in statistically significant improvements in the segmentation of HNC.
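The output-level (voting-based) fusion and the Dice metric can be sketched as follows; the masks are random stand-ins for real segmentation outputs, and the voting rule is plain per-voxel majority.

```python
# Output-level (voting-based) fusion: several binary segmentation masks are merged
# by per-voxel majority vote, and agreement with a reference mask is measured with
# the Dice score.
import numpy as np

def majority_vote(masks):
    stack = np.stack(masks, axis=0).astype(np.uint8)
    return (stack.sum(axis=0) * 2 > stack.shape[0]).astype(np.uint8)

def dice(pred, ref, eps=1e-8):
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum() + eps)

rng = np.random.default_rng(3)
reference = rng.random((64, 64)) > 0.7
candidates = [np.logical_xor(reference, rng.random((64, 64)) > 0.9) for _ in range(5)]
fused = majority_vote(candidates)
print(round(dice(fused, reference), 3))
```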
Affiliation(s)
- Isaac Shiri: Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
- Mehdi Amini: Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
- Fereshteh Yousefirizi: Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, British Columbia, Canada
- Alireza Vafaei Sadr: Institute of Pathology, RWTH Aachen University Hospital, Aachen, Germany; Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, USA
- Ghasem Hajianfar: Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
- Yazdan Salimi: Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
- Zahra Mansouri: Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
- Elnaz Jenabi: Research Center for Nuclear Medicine, Shariati Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Mehdi Maghsudi: Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Ismini Mainta: Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
- Minerva Becker: Service of Radiology, Geneva University Hospital, Geneva, Switzerland
- Arman Rahmim: Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, British Columbia, Canada; Department of Radiology and Physics, University of British Columbia, Vancouver, Canada
- Habib Zaidi: Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland; Geneva University Neurocenter, Geneva University, Geneva, Switzerland; Department of Nuclear Medicine and Molecular Imaging, University of Groningen, University Medical Center Groningen, Groningen, Netherlands; Department of Nuclear Medicine, University of Southern Denmark, Odense, Denmark
12. Li D, Tian Y, Li J. SODFormer: Streaming Object Detection With Transformer Using Events and Frames. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:14020-14037. PMID: 37494161; DOI: 10.1109/tpami.2023.3298925.
Abstract
The DAVIS camera, which streams two complementary sensing modalities (asynchronous events and frames), has gradually been used to address major object detection challenges (e.g., fast motion blur and low light). However, how to effectively leverage rich temporal cues and fuse two heterogeneous visual streams remains a challenging endeavor. To address this challenge, we propose a novel streaming object detector with Transformer, namely SODFormer, which first integrates events and frames to continuously detect objects in an asynchronous manner. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) with over 1080.1k manual labels. Then, we design a spatiotemporal Transformer architecture to detect objects via an end-to-end sequence prediction problem, where the novel temporal Transformer module leverages rich temporal cues from the two visual streams to improve detection performance. Finally, an asynchronous attention-based fusion module is proposed to integrate the two heterogeneous sensing modalities and take complementary advantages from each end; it can be queried at any time to locate objects and break through the limited output frequency of synchronized frame-based fusion strategies. The results show that the proposed SODFormer outperforms four state-of-the-art methods and our eight baselines by a significant margin. We also show that our unifying framework works well even in cases where the conventional frame-based camera fails, e.g., high-speed motion and low-light conditions. Our dataset and code are available at https://github.com/dianzl/SODFormer.
13. Xu H, Yuan J, Ma J. MURF: Mutually Reinforcing Multi-Modal Image Registration and Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:12148-12166. PMID: 37285256; DOI: 10.1109/tpami.2023.3283682.
Abstract
Existing image fusion methods are typically limited to aligned source images and have to "tolerate" parallaxes when images are unaligned. Simultaneously, the large variances between different modalities pose a significant challenge for multi-modal image registration. This study proposes a novel method called MURF, where for the first time, image registration and fusion are mutually reinforced rather than being treated as separate issues. MURF leverages three modules: a shared information extraction module (SIEM), a multi-scale coarse registration module (MCRM), and a fine registration and fusion module (F2M). The registration is carried out in a coarse-to-fine manner. During coarse registration, SIEM first transforms the multi-modal images into mono-modal shared information to eliminate the modal variances. Then, MCRM progressively corrects the global rigid parallaxes. Subsequently, fine registration to repair local non-rigid offsets and image fusion are uniformly implemented in F2M. The fused image provides feedback to improve registration accuracy, and the improved registration result further improves the fusion result. For image fusion, rather than solely preserving the original source information as existing methods do, we attempt to incorporate texture enhancement into image fusion. We test on four types of multi-modal data (RGB-IR, RGB-NIR, PET-MRI, and CT-MRI). Extensive registration and fusion results validate the superiority and universality of MURF.
14. Li H, Xiao Y, Cheng C, Song X. SFPFusion: An Improved Vision Transformer Combining Super Feature Attention and Wavelet-Guided Pooling for Infrared and Visible Images Fusion. Sensors (Basel) 2023;23:7870. PMID: 37765927; PMCID: PMC10536945; DOI: 10.3390/s23187870.
Abstract
The infrared and visible image fusion task aims to generate a single image that preserves complementary features and reduces redundant information from different modalities. Although convolutional neural networks (CNNs) can effectively extract local features and obtain better fusion performance, the size of the receptive field limits its feature extraction ability. Thus, the Transformer architecture has gradually become mainstream to extract global features. However, current Transformer-based fusion methods ignore the enhancement of details, which is important to image fusion tasks and other downstream vision tasks. To this end, a new super feature attention mechanism and the wavelet-guided pooling operation are applied to the fusion network to form a novel fusion network, termed SFPFusion. Specifically, super feature attention is able to establish long-range dependencies of images and to fully extract global features. The extracted global features are processed by wavelet-guided pooling to fully extract multi-scale base information and to enhance the detail features. With the powerful representation ability, only simple fusion strategies are utilized to achieve better fusion performance. The superiority of our method compared with other state-of-the-art methods is demonstrated in qualitative and quantitative experiments on multiple image fusion benchmarks.
15. Dong L, Wang J, Zhao L, Zhang Y, Yang J. ICIF: Image fusion via information clustering and image features. PLoS One 2023;18:e0286024. PMID: 37531364; PMCID: PMC10396002; DOI: 10.1371/journal.pone.0286024.
Abstract
Image fusion technology is employed to integrate images collected by different types of sensors into the same image, generating high-definition images and extracting more comprehensive information. However, available techniques derive the features of each sensor's image separately, resulting in poorly correlated image features when different types of sensors are used during the fusion process. Relying on the fusion strategy alone to compensate for these feature differences is an important cause of the poor clarity of fusion results. Therefore, this paper proposes a fusion method via information clustering and image features (ICIF). First, the weighted median filter algorithm is adopted in the spatial domain to realize the clustering of images, using the texture features of the infrared image as the weight that influences the clustering results of the visible light image. Then, the image is decomposed into a base layer, a bright detail layer, and a dark detail layer, which improves the correlations between the layers after decomposition of the source image. Finally, the characteristics of the images collected by the sensors and the feature information between the image layers are used as the weight reference of the fusion strategy, and the fused images are reconstructed according to the principle of extended texture details. Experiments on public datasets demonstrate the superiority of the proposed strategy over state-of-the-art methods. The proposed ICIF also highlights salient targets and abundant details. Moreover, we generalize the proposed ICIF to fuse images from different sensors, e.g., medical images and multi-focus images.
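The three-layer decomposition can be sketched as below; a plain median filter is substituted for the paper's infrared-guided weighted median filter, so this only shows how the base, bright detail, and dark detail layers relate.

```python
# Simplified three-layer decomposition: a median filter provides the base layer,
# and the residual is split into bright and dark detail layers.
import numpy as np
from scipy.ndimage import median_filter

def decompose(img, size=7):
    base = median_filter(img, size=size)
    residual = img - base
    bright_detail = np.maximum(residual, 0.0)   # structures brighter than the base
    dark_detail = np.minimum(residual, 0.0)     # structures darker than the base
    return base, bright_detail, dark_detail

rng = np.random.default_rng(4)
vis = rng.random((64, 64))
base, bright, dark = decompose(vis)
print(np.allclose(base + bright + dark, vis))   # True: the decomposition is exact
```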
Affiliation(s)
- Linlu Dong: School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan, China
- Jun Wang: School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan, China
- Liangjun Zhao: School of Computer Science and Engineering, Sichuan University of Science and Engineering, Zigong, Sichuan, China; Sichuan Key Provincial Research Base of Intelligent Tourism, Sichuan, China
- Yun Zhang: School of Computer Science and Engineering, Sichuan University of Science and Engineering, Zigong, Sichuan, China
- Jie Yang: School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan, China
16. Li K, Qi M, Zhuang S, Liu Y, Gao J. Noise-aware infrared polarization image fusion based on salient prior with attention-guided filtering network. Optics Express 2023;31:25781-25796. PMID: 37710455; DOI: 10.1364/oe.492954.
Abstract
Infrared polarization image fusion integrates intensity and polarization information, producing a fused image that enhances visibility and captures crucial details. However, in complex environments, polarization imaging is susceptible to noise interference. Existing fusion methods typically use the infrared intensity (S0) and degree of linear polarization (DoLP) images for fusion but fail to consider the noise interference, leading to reduced performance. To cope with this problem, we propose a fusion method based on polarization salient prior, which extends DoLP by angle of polarization (AoP) and introduces polarization distance (PD) to obtain salient target features. Moreover, according to the distribution difference between S0 and DoLP features, we construct a fusion network based on attention-guided filtering, utilizing cross-attention to generate filter kernels for fusion. The quantitative and qualitative experimental results validate the effectiveness of our approach. Compared with other fusion methods, our method can effectively suppress noise interference and preserve salient target features.
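For reference, the standard Stokes-parameter relations behind S0, DoLP, and AoP for a 0/45/90/135 degree polarization sensor are sketched below; the polarization-distance (PD) proxy used here is an illustrative choice and may differ from the paper's definition.

```python
# Standard Stokes-parameter relations for a division-of-focal-plane sensor with
# 0/45/90/135 degree polarizer intensities. DoLP and AoP follow the usual
# definitions; the PD proxy is only illustrative.
import numpy as np

def polarization_maps(i0, i45, i90, i135, eps=1e-8):
    s0 = 0.5 * (i0 + i45 + i90 + i135)          # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)   # degree of linear polarization
    aop = 0.5 * np.arctan2(s2, s1)                    # angle of polarization, radians
    pd = np.abs(i0 - i90)                             # illustrative polarization-distance proxy
    return s0, dolp, aop, pd

rng = np.random.default_rng(5)
frames = [rng.random((64, 64)) for _ in range(4)]
s0, dolp, aop, pd = polarization_maps(*frames)
print(dolp.min() >= 0.0, aop.shape)
```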
17. Zhao X, Li M, Nie T, Han C, Huang L. An Innovative Approach for Removing Stripe Noise in Infrared Images. Sensors (Basel) 2023;23:6786. PMID: 37571569; PMCID: PMC10422565; DOI: 10.3390/s23156786.
Abstract
The non-uniformity of infrared detectors' readout circuits can lead to stripe noise in infrared images, which degrades their effective information and poses challenges for subsequent applications. Traditional denoising algorithms have limited effectiveness in maintaining effective information. This paper proposes a multi-level image decomposition method based on an improved LatLRR (MIDILatLRR). By utilizing the global low-rank structural characteristics of stripe noise, the noise and smooth information are decomposed into low-rank part images, and texture information is adaptively decomposed into several salient part images, thereby better preserving texture edge information in the image. Sparse terms are constructed according to the smoothness of the effective information in the final low-rank part of the image and the sparsity of the stripe noise direction. The stripe noise is modeled using multi-sparse constraint representation (MSCR), and the Alternating Direction Method of Multipliers (ADMM) is used for the calculation. Extensive experiments comparing the proposed algorithm with state-of-the-art algorithms on subjective judgments and objective indicators fully demonstrate its effectiveness and superiority.
Affiliation(s)
- Xiaohang Zhao: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Mingxuan Li: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Ting Nie: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Chengshan Han: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Liang Huang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
18. Zheng B, Xiang T, Lin M, Cheng S, Zhang P. Real-Time Semantics-Driven Infrared and Visible Image Fusion Network. Sensors (Basel) 2023;23:6113. PMID: 37447962; DOI: 10.3390/s23136113.
Abstract
This paper proposes a real-time semantics-driven infrared and visible image fusion framework (RSDFusion). A novel semantics-driven image fusion strategy is introduced to maximize the retention of significant information from the source images in the fused image. First, a semantically segmented image of the source image is obtained using a pre-trained semantic segmentation model. Second, masks of significant targets are obtained from the semantically segmented image, and these masks are used to separate the targets in the source and fused images. Finally, a local semantic loss on the separated targets is designed and combined with the overall structural similarity loss of the image to instruct the network to extract appropriate features for reconstructing the fused image. Experimental results show that the proposed RSDFusion outperforms other comparison methods in both subjective and objective evaluations on public datasets and that the main targets of the source images are better preserved in the fused image.
Affiliation(s)
- Binhao Zheng: School of Electronic Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
- Tieming Xiang: School of Electronic Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
- Minghuang Lin: School of Electronic Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
- Silin Cheng: School of Electronic Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
- Pengquan Zhang: School of Electronic Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
19. Liu H, Ma M, Wang M, Chen Z, Zhao Y. SCFusion: Infrared and Visible Fusion Based on Salient Compensation. Entropy (Basel) 2023;25:985. PMID: 37509931; PMCID: PMC10378341; DOI: 10.3390/e25070985.
Abstract
The aim of infrared and visible image fusion is to integrate the complementary information of the two modalities into high-quality fused images. However, many deep learning fusion algorithms have not considered the characteristics of infrared images in low-light scenes, leading to weak texture details, low contrast of infrared targets, and poor visual perception in existing methods. Therefore, in this paper, we propose a salient compensation-based fusion method that makes sufficient use of the characteristics of infrared and visible images to generate high-quality fused images under low-light conditions. First, we design a multi-scale edge gradient block (MEGB) in the texture mainstream to adequately extract the texture information of the dual input of infrared and visible images; in parallel, the salient tributary is pre-trained with a salient loss to obtain a saliency map based on the salient dense residual block (SRDB) and extract salient features, and this branch supplements the overall network training. We also propose a spatial bias module (SBM) to fuse global information with local information. Finally, extensive comparison experiments with existing methods show that our method has significant advantages in describing target features and global scenes; the effectiveness of each proposed module is demonstrated by ablation experiments. In addition, we verify on a semantic segmentation task that the proposed method facilitates high-level vision.
Affiliation(s)
- Haipeng Liu: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
- Meiyan Ma: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
- Meng Wang: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Province Key Laboratory of Computer, Kunming University of Science and Technology, Kunming 650500, China
- Zhaoyu Chen: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
- Yibo Zhao: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
20. Bai Y, Li L, Lu J, Zhang S, Chu N. A Novel Steganography Method for Infrared Image Based on Smooth Wavelet Transform and Convolutional Neural Network. Sensors (Basel) 2023;23:5360. PMID: 37420527; DOI: 10.3390/s23125360.
Abstract
Infrared images have been widely used in many research areas, such as target detection and scene monitoring, so the copyright protection of infrared images is very important. To accomplish the goal of image-copyright protection, a large number of image-steganography algorithms have been studied over the last two decades. Most existing image-steganography algorithms hide information based on the prediction error of pixels; consequently, reducing the prediction error of pixels is very important for steganography algorithms. In this paper, we propose a novel framework, SSCNNP, a Convolutional Neural-Network Predictor (CNNP) based on the Smooth Wavelet Transform (SWT) and Squeeze-and-Excitation (SE) attention for infrared image prediction, which combines a Convolutional Neural Network (CNN) with the SWT. First, the Super-Resolution Convolutional Neural Network (SRCNN) and the SWT are used to preprocess half of the input infrared image. Then, CNNP is applied to predict the other half of the infrared image; to improve its prediction accuracy, an attention mechanism is added to the proposed model. The experimental results demonstrate that the proposed algorithm reduces the prediction error of the pixels because it fully utilizes the features around each pixel in both the spatial and frequency domains. Moreover, the proposed model requires neither expensive equipment nor a large amount of storage space during training. Experimental results also show that the proposed algorithm achieves good performance in terms of imperceptibility and watermarking capacity compared with advanced steganography algorithms, improving the PSNR by 0.17 on average at the same watermark capacity.
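A generic illustration of where the pixel prediction error comes from (a checkerboard split with a four-neighbour average predictor, not the paper's SWT + CNNP predictor) is sketched below.

```python
# Generic illustration of pixel prediction for prediction-error steganography: the
# image is split into a checkerboard, and each "cross" pixel is predicted from the
# average of its four neighbours; the residual is the prediction error.
import numpy as np

def checkerboard_prediction_error(img):
    img = img.astype(float)
    padded = np.pad(img, 1, mode="edge")
    neighbour_mean = 0.25 * (padded[:-2, 1:-1] + padded[2:, 1:-1]
                             + padded[1:-1, :-2] + padded[1:-1, 2:])
    rows, cols = np.indices(img.shape)
    cross = (rows + cols) % 2 == 0                 # half the pixels are predicted
    return np.where(cross, img - neighbour_mean, 0.0)

rng = np.random.default_rng(6)
ir = rng.integers(0, 256, size=(64, 64))
err = checkerboard_prediction_error(ir)
print(float(np.abs(err).mean()))                   # smaller errors leave more room for embedding
```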
Affiliation(s)
- Yu Bai: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
- Li Li: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
- Jianfeng Lu: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
- Shanqing Zhang: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
- Ning Chu: Zhe-Jiang Shangfeng Special Blower Company Ltd., Shaoxing 312352, China
21. Chang Z, Feng Z, Yang S, Gao Q. AFT: Adaptive Fusion Transformer for Visible and Infrared Images. IEEE Transactions on Image Processing 2023;32:2077-2092. PMID: 37018097; DOI: 10.1109/tip.2023.3263113.
Abstract
In this paper, an Adaptive Fusion Transformer (AFT) is proposed for unsupervised pixel-level fusion of visible and infrared images. Different from the existing convolutional networks, transformer is adopted to model the relationship of multi-modality images and explore cross-modal interactions in AFT. The encoder of AFT uses a Multi-Head Self-attention (MSA) module and Feed Forward (FF) network for feature extraction. Then, a Multi-head Self-Fusion (MSF) module is designed for the adaptive perceptual fusion of the features. By sequentially stacking the MSF, MSA, and FF, a fusion decoder is constructed to gradually locate complementary features for recovering informative images. In addition, a structure-preserving loss is defined to enhance the visual quality of fused images. Extensive experiments are conducted on several datasets to compare our proposed AFT method with 21 popular approaches. The results show that AFT has state-of-the-art performance in both quantitative metrics and visual perception.
22. Zhang Y, Nie R, Cao J, Ma C. Self-Supervised Fusion for Multi-Modal Medical Images via Contrastive Auto-Encoding and Convolutional Information Exchange. IEEE Computational Intelligence Magazine 2023. DOI: 10.1109/MCI.2022.3223487.
23. Fusion of visible and infrared images using GE-WA model and VGG-19 network. Sci Rep 2023;13:190. PMID: 36604536; PMCID: PMC9814665; DOI: 10.1038/s41598-023-27391-z.
Abstract
To address the low computational efficiency of existing image fusion models and the false targets, blurred targets, and halo-occluded targets they produce, a novel fusion method for visible and infrared images using the GE-WA model and the VGG-19 network is proposed. First, the Laplacian is used to decompose the visible and infrared images into basic images and detail content. Next, a Gaussian estimation function is constructed, and a basic fusion scheme using the GE-WA model is designed to obtain a basic fusion image that eliminates the halo of the visible image. Then, the pre-trained VGG-19 network and a multi-layer fusion strategy are used to extract and fuse features of different depths from the visible and infrared images, yielding fused detail content at different depths. Finally, the fusion image is reconstructed from the fused basic image and detail content. The experiments show that the comprehensive evaluation FQ of the proposed method is better than that of other comparison methods, and that the method performs better in terms of image fusion speed, halo elimination in the visible image, and image fusion quality, making it more suitable for visible and infrared image fusion in complex environments.
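A two-scale split in the spirit of this pipeline can be sketched as follows; a Gaussian low-pass stands in for the Laplacian decomposition, a fixed weighted average replaces the GE-WA scheme, and the VGG-19 multi-layer detail fusion is reduced to a maximum-absolute rule.

```python
# Two-scale split standing in for the Laplacian decomposition: a Gaussian low-pass
# gives the basic image, the residual gives the detail content, and the basic parts
# are fused with a fixed weighted average while details use a max-abs rule.
import numpy as np
from scipy.ndimage import gaussian_filter

def two_scale_fuse(ir, vis, sigma=3.0, w_ir=0.5):
    ir_base, vis_base = gaussian_filter(ir, sigma), gaussian_filter(vis, sigma)
    ir_detail, vis_detail = ir - ir_base, vis - vis_base
    fused_base = w_ir * ir_base + (1.0 - w_ir) * vis_base
    fused_detail = np.where(np.abs(ir_detail) >= np.abs(vis_detail),
                            ir_detail, vis_detail)
    return fused_base + fused_detail

rng = np.random.default_rng(7)
ir, vis = rng.random((96, 96)), rng.random((96, 96))
print(two_scale_fuse(ir, vis).shape)    # (96, 96)
```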
24. Ma W, Wang K, Li J, Yang SX, Li J, Song L, Li Q. Infrared and Visible Image Fusion Technology and Application: A Review. Sensors (Basel) 2023;23:599. PMID: 36679396; PMCID: PMC9862268; DOI: 10.3390/s23020599.
Abstract
The images acquired by a single visible light sensor are very susceptible to light conditions, weather changes, and other factors, while the images acquired by a single infrared sensor generally have poor resolution, low contrast, low signal-to-noise ratio, and blurred visual effects. Fusing visible and infrared light avoids the disadvantages of the two single sensors and, by combining the advantages of both, significantly improves image quality. The fusion of infrared and visible images is widely used in agriculture, industry, medicine, and other fields. In this study, firstly, the architecture of mainstream infrared and visible image fusion technology and its applications was reviewed; secondly, the application status in robot vision, medical imaging, agricultural remote sensing, and industrial defect detection was discussed; thirdly, the evaluation indicators of the main image fusion methods were grouped into subjective and objective evaluations, the properties of current mainstream technologies were analyzed and compared, and the outlook for image fusion was assessed; finally, infrared and visible image fusion was summarized. The results show that the definition and efficiency of fused infrared and visible images have improved significantly. However, some problems remain, such as poor accuracy of the fused image and irretrievably lost pixels. There is a need to improve the adaptive design of traditional algorithm parameters and to combine innovation in fusion algorithms with optimization of neural networks, so as to further improve image fusion accuracy, reduce noise interference, and improve the real-time performance of the algorithms.
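Two of the objective evaluation indicators commonly used in such comparisons, information entropy of the fused image and mutual information with each source, can be computed from 8-bit histograms as follows; these are generic definitions rather than the review's specific protocol.

```python
# Generic objective fusion indicators: information entropy of the fused image and
# its mutual information with each source, estimated from 8-bit histograms.
import numpy as np

def entropy(img, bins=256):
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(a, b, bins=256):
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz])).sum())

rng = np.random.default_rng(8)
ir = rng.integers(0, 256, (64, 64)).astype(float)
vis = rng.integers(0, 256, (64, 64)).astype(float)
fused = 0.5 * (ir + vis)
print(entropy(fused), mutual_information(fused, ir) + mutual_information(fused, vis))
```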
Collapse
Affiliation(s)
- Weihong Ma
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
| | - Kun Wang
- School of Electrical Engineering, Chongqing University of Science & Technology, Chongqing 401331, China
| | - Jiawei Li
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
| | - Simon X. Yang
- Advanced Robotics and Intelligent Systems Laboratory, School of Engineering, University of Guelph, Guelph, ON N1G 2W1, Canada
| | - Junfei Li
- Advanced Robotics and Intelligent Systems Laboratory, School of Engineering, University of Guelph, Guelph, ON N1G 2W1, Canada
| | - Lepeng Song
- School of Electrical Engineering, Chongqing University of Science & Technology, Chongqing 401331, China
| | - Qifeng Li
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
| |
Collapse
|
25
|
Chen J, Ding J, Yu Y, Gong W. THFuse: An Infrared and Visible Image Fusion Network using Transformer and Hybrid Feature Extractor. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
|
26
|
Liu Y, Zhou D, Nie R, Hou R, Ding Z, Xia W, Li M. Green fluorescent protein and phase contrast image fusion via Spectral TV filter-based decomposition. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
|
27
|
Zou D, Yang B. Infrared and low-light visible image fusion based on hybrid multiscale decomposition and adaptive light adjustment. OPTICS AND LASERS IN ENGINEERING 2023; 160:107268. [DOI: 10.1016/j.optlaseng.2022.107268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/01/2023]
|
28
|
Wu X, Hong D, Chanussot J. UIU-Net: U-Net in U-Net for Infrared Small Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; PP:364-376. [PMID: 37015404 DOI: 10.1109/tip.2022.3228497] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Learning-based infrared small object detection methods currently rely heavily on classification backbone networks, which tends to cause loss of tiny objects and limited feature distinguishability as network depth increases. Furthermore, small objects in infrared images frequently appear both bright and dark, posing severe demands on obtaining precise object contrast information. For this reason, this paper proposes a simple and effective "U-Net in U-Net" framework, UIU-Net for short, to detect small objects in infrared images. As the name suggests, UIU-Net embeds a tiny U-Net into a larger U-Net backbone, enabling multi-level and multi-scale representation learning of objects. Moreover, UIU-Net can be trained from scratch, and the learned features can effectively enhance global and local contrast information. More specifically, the UIU-Net model is divided into two modules: the resolution-maintenance deep supervision (RM-DS) module and the interactive-cross attention (IC-A) module. RM-DS integrates Residual U-blocks into a deep supervision network to generate deep multi-scale resolution-maintenance features while learning global context information. Further, IC-A encodes the local context information between low-level details and high-level semantic features. Extensive experiments on two infrared single-frame image datasets, i.e., the SIRST and Synthetic datasets, show the effectiveness and superiority of the proposed UIU-Net compared with several state-of-the-art infrared small object detection methods. UIU-Net also generalizes well to video-sequence infrared small object datasets, e.g., the ATR ground/air video sequence dataset. The code for this work is openly available at https://github.com/danfenghong/IEEE_TIP_UIU-Net.
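As a rough illustration of the nesting idea only, the PyTorch sketch below embeds a one-level inner U-Net as the bottleneck of a small outer U-Net. It is not the published UIU-Net: the RM-DS deep-supervision and IC-A attention modules are omitted, and all layer widths and names are assumptions.

```python
# Minimal "U-Net in U-Net" sketch (PyTorch): a tiny inner U-Net is used as the
# bottleneck of a small outer U-Net. Illustrative only; not the released UIU-Net.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """Inner U-Net with a single down/up level."""
    def __init__(self, ch):
        super().__init__()
        self.enc = conv_block(ch, ch)
        self.down = nn.MaxPool2d(2)
        self.mid = conv_block(ch, ch)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = conv_block(2 * ch, ch)

    def forward(self, x):
        e = self.enc(x)
        m = self.up(self.mid(self.down(e)))
        return self.dec(torch.cat([e, m], dim=1))

class OuterUNet(nn.Module):
    """Outer U-Net whose bottleneck is the inner TinyUNet."""
    def __init__(self, in_ch=1, base=16, num_classes=1):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.down = nn.MaxPool2d(2)
        self.bottleneck = TinyUNet(base)          # the U-Net inside the U-Net
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = conv_block(2 * base, base)
        self.head = nn.Conv2d(base, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        b = self.up(self.bottleneck(self.down(e1)))
        return self.head(self.dec1(torch.cat([e1, b], dim=1)))

if __name__ == "__main__":
    net = OuterUNet()
    out = net(torch.randn(1, 1, 64, 64))   # dummy infrared patch
    print(out.shape)                        # torch.Size([1, 1, 64, 64])
```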
Collapse
|
29
|
Xu R, Liu G, Xie Y, Prasad BD, Qian Y, Xing M. Multiscale feature pyramid network based on activity level weight selection for infrared and visible image fusion. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA. A, OPTICS, IMAGE SCIENCE, AND VISION 2022; 39:2193-2204. [PMID: 36520734 DOI: 10.1364/josaa.468627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Accepted: 10/13/2022] [Indexed: 06/17/2023]
Abstract
At present, deep-learning-based infrared and visible image fusion methods extract insufficient source-image features, causing an imbalance of infrared and visible information in the fused images. To solve this problem, a multiscale feature pyramid network based on activity level weight selection (MFPN-AWS) with a complete downsampling-upsampling structure is proposed. The network consists of three parts: a downsampling convolutional network, an AWS fusion layer, and an upsampling convolutional network. First, multiscale deep features are extracted by the downsampling convolutional network, capturing rich information from the intermediate layers. Second, the AWS layer exploits a dual fusion strategy based on the l1-norm and global pooling to describe target saliency and texture detail, effectively balancing the multiscale infrared and visible features. Finally, the multiscale fused features are reconstructed by the upsampling convolutional network to obtain the fused image. Compared with nine state-of-the-art methods on the publicly available TNO and VIFB datasets, MFPN-AWS produces more natural and balanced fusion results, with better overall clarity and more salient targets, and achieves the best values on two metrics: mutual information and visual fidelity.
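The snippet below sketches an l1-norm activity-level weighting rule of the general kind the AWS layer is described as using, applied to two feature tensors (NumPy). The paper's encoder/decoder and its combination with global pooling are omitted; the window radius, softmax weighting, and tensor shapes are illustrative assumptions.

```python
# l1-norm activity-level weighting for fusing two (C, H, W) feature maps.
# Illustrative stand-in for an activity-level weight-selection rule; not the
# published MFPN-AWS implementation.
import numpy as np

def l1_activity(feat, radius=1):
    """Per-pixel activity: l1-norm over channels, box-averaged over a small window."""
    act = np.abs(feat).sum(axis=0)                     # (H, W)
    pad = np.pad(act, radius, mode="edge")
    out = np.zeros_like(act)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out += pad[radius + dy: radius + dy + act.shape[0],
                       radius + dx: radius + dx + act.shape[1]]
    return out / (2 * radius + 1) ** 2

def aws_fuse(feat_ir, feat_vis):
    """Softmax-weighted fusion of infrared and visible feature maps."""
    a_ir, a_vis = l1_activity(feat_ir), l1_activity(feat_vis)
    m = np.maximum(a_ir, a_vis)                        # numerical stability
    e_ir, e_vis = np.exp(a_ir - m), np.exp(a_vis - m)
    w_ir = e_ir / (e_ir + e_vis)
    return w_ir[None] * feat_ir + (1.0 - w_ir)[None] * feat_vis

if __name__ == "__main__":
    f_ir = np.random.randn(64, 32, 32).astype(np.float32)   # dummy features
    f_vis = np.random.randn(64, 32, 32).astype(np.float32)
    print(aws_fuse(f_ir, f_vis).shape)                       # (64, 32, 32)
```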
Collapse
|
30
|
Huo X, Deng Y, Shao K. Infrared and Visible Image Fusion with Significant Target Enhancement. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1633. [PMID: 36359722 PMCID: PMC9689360 DOI: 10.3390/e24111633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/14/2022] [Revised: 11/02/2022] [Accepted: 11/07/2022] [Indexed: 06/16/2023]
Abstract
Existing fusion rules focus on retaining detailed information from the source images, but because the thermal radiation information in infrared images is mainly characterized by pixel intensity, such rules are likely to reduce the saliency of the target in the fused image. To address this problem, we propose an infrared and visible image fusion model based on significant target enhancement, which aims to inject thermal targets from infrared images into visible images to enhance target saliency while retaining the important details of the visible images. First, the source image is decomposed with multi-level Gaussian curvature filtering to obtain background information with high spatial resolution. Second, the large-scale layers are fused using ResNet50 with weight maximization based on the average operator to improve detail retention. Finally, the base layers are fused by incorporating a new salient target detection method. Subjective and objective experimental results on the TNO and MSRS datasets demonstrate that our method achieves better results than other traditional and deep learning-based methods.
Collapse
Affiliation(s)
- Xing Huo
- School of Mathematics, Hefei University of Technology, Hefei 230009, China
| | - Yinping Deng
- School of Mathematics, Hefei University of Technology, Hefei 230009, China
| | - Kun Shao
- School of Software, Hefei University of Technology, Hefei 230009, China
| |
Collapse
|
31
|
Zhang L, Yang X, Wan Z, Cao D, Lin Y. A Real-Time FPGA Implementation of Infrared and Visible Image Fusion Using Guided Filter and Saliency Detection. SENSORS (BASEL, SWITZERLAND) 2022; 22:8487. [PMID: 36366184 PMCID: PMC9655019 DOI: 10.3390/s22218487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/06/2022] [Revised: 11/01/2022] [Accepted: 11/02/2022] [Indexed: 06/16/2023]
Abstract
Taking advantage of the functional complementarity between infrared and visible imaging sensors, pixel-level real-time fusion of infrared and visible images of different resolutions is a promising strategy for visual enhancement, with demonstrated potential in autonomous driving, military reconnaissance, video surveillance, etc. Great progress has been made in this field in recent years, but fusion speed and the quality of visual enhancement are still not satisfactory. Herein, we propose a multi-scale FPGA-based image fusion technique with substantially enhanced visual enhancement capability and fusion speed. Specifically, the source images are first decomposed into three distinct layers using a guided filter and saliency detection: the detail layer, the saliency layer, and the background layer. A fusion weight map for the saliency layer is then constructed using an attention mechanism. A weighted fusion strategy is used for the saliency-layer and detail-layer fusion, while a weighted-average strategy is used for the background-layer fusion, followed by image enhancement to improve the contrast of the fused image. Finally, a high-level synthesis tool is used to design the hardware circuit. The method was thoroughly tested on an XCZU15EG board; it not only effectively improves visual enhancement in glare and smoke environments, but also achieves fast real-time fusion at 55 FPS for infrared and visible images with a resolution of 640 × 470.
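A software-only sketch of the guided-filter decomposition step is shown below (Python/OpenCV); the saliency and attention weighting, the contrast enhancement, and the FPGA/HLS implementation are not reproduced, and the window radius and eps values are assumptions.

```python
# Classic gray-scale guided filter built from box filters, used here only to
# split an image into an edge-preserving base layer and a residual detail
# layer, as in the decomposition stage described above.
import cv2
import numpy as np

def box(img, r):
    return cv2.boxFilter(img, -1, (2 * r + 1, 2 * r + 1))

def guided_filter(I, p, r=8, eps=1e-3):
    """Guided filter: guidance I, input p, both float arrays scaled to [0, 1]."""
    mean_I, mean_p = box(I, r), box(p, r)
    corr_Ip, corr_II = box(I * p, r), box(I * I, r)
    cov_Ip = corr_Ip - mean_I * mean_p
    var_I = corr_II - mean_I * mean_I
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return box(a, r) * I + box(b, r)

def base_detail(img, r=8, eps=1e-3):
    """Self-guided filtering gives the base layer; the residual is the detail layer."""
    base = guided_filter(img, img, r, eps)
    return base, img - base

if __name__ == "__main__":
    ir = cv2.imread("ir.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    base, detail = base_detail(ir)                       # placeholder input path
    cv2.imwrite("ir_base.png", (base * 255).clip(0, 255).astype(np.uint8))
```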
Collapse
Affiliation(s)
- Ling Zhang
- The School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
| | - Xuefei Yang
- The School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
| | - Zhenlong Wan
- National Information Center of GACC, Beijing 100005, China
| | - Dingxin Cao
- The School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
| | - Yingcheng Lin
- The School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
| |
Collapse
|
32
|
Li B, Lu J, Liu Z, Shao Z, Li C, Du Y, Huang J. AEFusion: A multi-scale fusion network combining Axial attention and Entropy feature Aggregation for infrared and visible images. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
33
|
Ji J, Zhang Y, Hu Y, Li Y, Wang C, Lin Z, Huang F, Yao J. Fusion of Infrared and Visible Images Based on Three-Scale Decomposition and ResNet Feature Transfer. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1356. [PMID: 37420376 DOI: 10.3390/e24101356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/29/2022] [Revised: 09/21/2022] [Accepted: 09/23/2022] [Indexed: 07/09/2023]
Abstract
Image fusion technology can turn multiple single-modality images into more reliable and comprehensive data, which plays a key role in accurate target recognition and subsequent image processing. In view of the incomplete image decomposition, redundant extraction of infrared energy information, and incomplete feature extraction of visible images in existing algorithms, a fusion algorithm for infrared and visible images based on three-scale decomposition and ResNet feature transfer is proposed. In contrast to existing decomposition methods, the three-scale decomposition finely layers the source image through two successive decompositions. An optimized WLS method is then designed to fuse the energy layer, fully accounting for the infrared energy information and the visible detail information. In addition, a ResNet feature-transfer method is designed for detail-layer fusion, which can extract detailed information such as deeper contour structures. Finally, the structural layers are fused by a weighted-average strategy. Experimental results show that the proposed algorithm performs well in both visual effect and quantitative evaluation compared with five other methods.
Collapse
Affiliation(s)
- Jingyu Ji
- Department of UAV, Army Engineering University, Shijiazhuang 050003, China
| | - Yuhua Zhang
- Department of UAV, Army Engineering University, Shijiazhuang 050003, China
| | - Yongjiang Hu
- Department of UAV, Army Engineering University, Shijiazhuang 050003, China
| | - Yongke Li
- Department of UAV, Army Engineering University, Shijiazhuang 050003, China
| | - Changlong Wang
- Department of UAV, Army Engineering University, Shijiazhuang 050003, China
| | - Zhilong Lin
- Department of UAV, Army Engineering University, Shijiazhuang 050003, China
| | - Fuyu Huang
- Department of Electronic and Optical Engineering, Army Engineering University, Shijiazhuang 050003, China
| | - Jiangyi Yao
- Equipment Simulation Training Center, Army Engineering University, Shijiazhuang 050003, China
| |
Collapse
|
34
|
Chen X, Teng Z, Liu Y, Lu J, Bai L, Han J. Infrared-Visible Image Fusion Based on Semantic Guidance and Visual Perception. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1327. [PMID: 37420348 DOI: 10.3390/e24101327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Revised: 09/19/2022] [Accepted: 09/19/2022] [Indexed: 07/09/2023]
Abstract
Infrared-visible fusion has great potential for night-vision enhancement in intelligent vehicles. Fusion performance depends on fusion rules that balance target saliency and visual perception; however, most existing methods lack explicit and effective rules, which leads to poor contrast and poor target saliency. In this paper, we propose SGVPGAN, an adversarial framework for high-quality infrared-visible image fusion, which consists of an infrared-visible image fusion network based on Adversarial Semantic Guidance (ASG) and Adversarial Visual Perception (AVP) modules. Specifically, the ASG module transfers the semantics of the target and background to the fusion process for target highlighting. The AVP module analyzes the visual features from the global structure and local details of the visible and fused images and then guides the fusion network to adaptively generate a weight map of signal completion, so that the resulting fused images possess a natural and visible appearance. We construct a joint distribution function between the fused images and the corresponding semantics and use the discriminator to improve the fusion performance in terms of natural appearance and target saliency. Experimental results demonstrate that the proposed ASG and AVP modules effectively guide the image-fusion process by selectively preserving the details of the visible images and the salient target information of the infrared images. SGVPGAN exhibits significant improvements over other fusion methods.
Collapse
Affiliation(s)
- Xiaoyu Chen
- Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Zhijie Teng
- Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Yingqi Liu
- Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Jun Lu
- Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Lianfa Bai
- Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Jing Han
- Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing University of Science and Technology, Nanjing 210094, China
| |
Collapse
|
35
|
Jia W, Song Z, Li Z. Multi-scale Fusion of Stretched Infrared and Visible Images. SENSORS (BASEL, SWITZERLAND) 2022; 22:6660. [PMID: 36081118 PMCID: PMC9459838 DOI: 10.3390/s22176660] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/03/2022] [Revised: 08/24/2022] [Accepted: 08/31/2022] [Indexed: 06/15/2023]
Abstract
Infrared (IR) band sensors can capture digital images under challenging conditions such as haze, smoke, and fog, while visible (VIS) band sensors capture abundant texture information. It is therefore desirable to fuse IR and VIS images to generate a more informative image. In this paper, a novel multi-scale IR and VIS image fusion algorithm is proposed that integrates information from both images into the fused image while preserving the color of the VIS image. A content-adaptive gamma correction is first introduced to stretch the IR images using one of the simplest edge-preserving filters, which alleviates excessive luminance shifts and color distortions in the fused images. New contrast and exposedness measures are then introduced for the stretched IR and VIS images to obtain weight matrices that better match their characteristics. The IR image and the luminance component of the VIS image, in grayscale or RGB space, are fused using Gaussian and Laplacian pyramids. The RGB components of the VIS image are finally expanded to generate the fused image if necessary. Experimental comparisons with 10 state-of-the-art fusion algorithms demonstrate the effectiveness of the proposed algorithm in terms of computational cost and the quality of the fused images.
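The sketch below shows the Gaussian/Laplacian-pyramid fusion step in Python/OpenCV. The paper's content-adaptive gamma stretch and contrast/exposedness weight maps are replaced by a fixed 50/50 blend, and image sizes are assumed divisible by 2**levels; these are simplifying assumptions, not the authors' design.

```python
# Laplacian-pyramid fusion of two grayscale images: build per-image pyramids,
# blend level by level, then collapse the fused pyramid.
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    gp = [img.astype(np.float32)]
    for _ in range(levels):
        gp.append(cv2.pyrDown(gp[-1]))
    lp = [gp[i] - cv2.pyrUp(gp[i + 1], dstsize=gp[i].shape[1::-1])
          for i in range(levels)]
    lp.append(gp[-1])                      # coarsest Gaussian level
    return lp

def pyramid_fuse(ir, vis, levels=4, w=0.5):
    lp_ir = laplacian_pyramid(ir, levels)
    lp_vis = laplacian_pyramid(vis, levels)
    fused = [w * a + (1 - w) * b for a, b in zip(lp_ir, lp_vis)]
    # Collapse the fused pyramid from coarse to fine.
    out = fused[-1]
    for lvl in reversed(fused[:-1]):
        out = cv2.pyrUp(out, dstsize=lvl.shape[1::-1]) + lvl
    return np.clip(out, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    ir = cv2.imread("ir.png", cv2.IMREAD_GRAYSCALE)      # placeholder paths
    vis = cv2.imread("vis.png", cv2.IMREAD_GRAYSCALE)
    cv2.imwrite("fused.png", pyramid_fuse(ir, vis))
```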
Collapse
Affiliation(s)
- Weibin Jia
- School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, China
| | - Zhihuan Song
- School of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
| | - Zhengguo Li
- SRO Department of Institute for Infocomm Research, Singapore 138632, Singapore
| |
Collapse
|
36
|
Yin W, He K, Xu D, Luo Y, Gong J. Adaptive enhanced infrared and visible image fusion using hybrid decomposition and coupled dictionary. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07559-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/16/2022]
|
37
|
A robust infrared and visible image fusion framework via multi-receptive-field attention and color visual perception. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03952-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
38
|
Yan W, Wu Y, Du C, Xu G. An improved cross-subject spatial filter transfer method for SSVEP-based BCI. J Neural Eng 2022; 19. [PMID: 35850094 DOI: 10.1088/1741-2552/ac81ee] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Accepted: 07/18/2022] [Indexed: 11/11/2022]
Abstract
Steady-state visual evoked potential (SSVEP) training-based feature recognition algorithms use user training data to reduce the interference of spontaneous electroencephalogram (EEG) activity on the SSVEP response and thereby improve recognition accuracy. The data collection process can be tedious, increasing users' mental fatigue and seriously limiting the practicality of SSVEP-based brain-computer interface (BCI) systems. As an alternative, a cross-subject spatial filter transfer (CSSFT) method has been proposed that transfers a model from an existing user with a good SSVEP response to the test data of a new user. The CSSFT method uses superposition averages over multiple blocks of data as the transfer data. However, the amplitude and pattern of brain signals often differ significantly across trials. The goal of this study was to improve the superposition averaging used by CSSFT by proposing an Ensemble scheme based on ensemble learning and an Expansion scheme based on matrix expansion. Feature recognition performance was compared for CSSFT and the proposed improved CSSFT method on two public datasets. The results demonstrate that the improved CSSFT method significantly improves the recognition accuracy and information transfer rate of the existing method. This strategy avoids a tedious data collection process and promotes the practical application of BCI systems.
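Purely as a toy illustration of the two ways of forming transfer data mentioned above, the NumPy snippet below contrasts superposition-averaging EEG blocks with stacking them along time. The actual CSSFT spatial-filter estimation, Ensemble scheme, and Expansion scheme are not reproduced here; the array shapes and block count are invented.

```python
# Toy contrast between averaged transfer data and "expanded" transfer data.
import numpy as np

rng = np.random.default_rng(0)
n_blocks, n_channels, n_samples = 6, 9, 250
blocks = rng.standard_normal((n_blocks, n_channels, n_samples))  # fake EEG epochs

# Superposition averaging: one averaged template (channels x samples).
averaged_template = blocks.mean(axis=0)

# Expansion-style alternative: keep every block by concatenating along time,
# preserving trial-to-trial variation for the downstream filter estimation.
expanded_template = blocks.transpose(1, 0, 2).reshape(n_channels, -1)

print(averaged_template.shape)   # (9, 250)
print(expanded_template.shape)   # (9, 1500)
```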
Collapse
Affiliation(s)
- Wenqiang Yan
- School of Mechanical Engineering, Xi'an Jiaotong University, Xianning West Road, Xi'an, Shaanxi 710049, China
| | - Yongcheng Wu
- Xi'an Jiaotong University, Xianning West Road, Xi'an, Shaanxi 710049, China
| | - Chenghang Du
- Xi'an Jiaotong University, Xianning West Road, Xi'an, Shaanxi 710049, China
| | - Guanghua Xu
- Xi'an Jiaotong University, Xianning West Road, Xi'an, Shaanxi 710049, China
| |
Collapse
|
39
|
Fusion of infrared and visible images based on discrete cosine wavelet transform and high pass filter. Soft comput 2022. [DOI: 10.1007/s00500-022-07175-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
40
|
NOSMFuse: An infrared and visible image fusion approach based on norm optimization and slime mold architecture. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03591-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
41
|
Multi-Scale Mixed Attention Network for CT and MRI Image Fusion. ENTROPY 2022; 24:e24060843. [PMID: 35741563 PMCID: PMC9222659 DOI: 10.3390/e24060843] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/11/2022] [Revised: 06/12/2022] [Accepted: 06/16/2022] [Indexed: 01/27/2023]
Abstract
Recently, the rapid development of the Internet of Things has contributed to the emergence of telemedicine. However, online diagnosis requires doctors to analyze multiple multi-modal medical images, which is inconvenient and inefficient. Multi-modal medical image fusion has been proposed to solve this problem. Owing to their outstanding feature extraction and representation capabilities, convolutional neural networks (CNNs) have been widely used in medical image fusion. However, most existing CNN-based medical image fusion methods compute their weight maps with a simple weighted-average strategy, which weakens the quality of the fused images because of the influence of inessential information. In this paper, we propose a CNN-based CT and MRI image fusion method (MMAN) that adopts a visual-saliency-based strategy to preserve more useful information. Firstly, a multi-scale mixed attention block is designed to extract features; this block gathers more helpful information and refines the extracted features at both the channel and spatial levels. Then, a visual-saliency-based fusion strategy is used to fuse the feature maps. Finally, the fused image is obtained via reconstruction blocks. Compared to other state-of-the-art methods, our method preserves more textural detail, clearer edge information, and higher contrast.
Collapse
|
42
|
Kong W, Miao Q, Lei Y, Ren C. Guided filter random walk and improved spiking cortical model based image fusion method in NSST domain. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.11.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
43
|
Wavelet-based self-supervised learning for multi-scene image fusion. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07242-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
44
|
Abstract
Image fusion is one of the most rapidly evolving fields in image processing, and its applications have expanded widely across various domains. Within image fusion, methods based on multi-scale decomposition play an important role; however, they face difficult problems such as the risk of over-smoothing during decomposition, blurring of fusion results, and loss of details. To address these problems, this paper proposes a novel decomposition-based image fusion framework that overcomes noise, blurring, and loss of detail. Both the symmetry and the asymmetry between infrared and visible images are important research questions addressed in this paper. The experiments confirm that the fusion framework outperforms other methods in both subjective observation and objective evaluation.
Collapse
|
45
|
DDGANSE: Dual-Discriminator GAN with a Squeeze-and-Excitation Module for Infrared and Visible Image Fusion. PHOTONICS 2022. [DOI: 10.3390/photonics9030150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Infrared images provide clear contrast information to distinguish a target from the background under any lighting conditions, while visible images provide rich texture detail and are compatible with the human visual system. The fusion of a visible image and an infrared image therefore contains both comprehensive contrast information and texture detail. In this study, a novel approach for the fusion of infrared and visible images is proposed, based on a dual-discriminator generative adversarial network with a squeeze-and-excitation module (DDGANSE). Our approach establishes adversarial training between one generator and two discriminators. The goal of the generator is to generate images that are similar to the source images and contain the information of both the infrared and visible sources; the purpose of the two discriminators is to increase the similarity between the generated image and the infrared and visible images, respectively. We experimentally demonstrate that, with continued adversarial training, the images produced by DDGANSE retain the advantages of both infrared and visible images, with significant contrast information and rich texture detail. Finally, we compared the performance of the proposed method with previously reported techniques for fusing infrared and visible images using both quantitative and qualitative assessments. Our experiments on the TNO dataset demonstrate that the proposed method outperforms other similar methods reported in the literature on various performance metrics.
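For readers unfamiliar with the squeeze-and-excitation idea referenced above, a minimal SE block in PyTorch follows. It shows only the channel-reweighting mechanism; the generator, the two discriminators, and the adversarial training loop are not included, and the reduction ratio is an assumption.

```python
# Minimal squeeze-and-excitation block: global average pooling ("squeeze")
# followed by a small bottleneck MLP that produces per-channel gates ("excite").
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w        # reweight the feature channels

if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)   # dummy fused IR/VIS feature maps
    print(SEBlock(64)(feats).shape)      # torch.Size([2, 64, 32, 32])
```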
Collapse
|
46
|
Zhang C, Wang K, Tian J. Adaptive brightness fusion method for intraoperative near-infrared fluorescence and visible images. BIOMEDICAL OPTICS EXPRESS 2022; 13:1243-1260. [PMID: 35414996 PMCID: PMC8973195 DOI: 10.1364/boe.446176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/18/2021] [Revised: 01/27/2022] [Accepted: 01/27/2022] [Indexed: 06/14/2023]
Abstract
An adaptive brightness fusion method (ABFM) for near-infrared fluorescence imaging is proposed to adapt to different lighting conditions and make the equipment operation more convenient in clinical applications. The ABFM is designed based on the network structure of Attention Unet, which is an image segmentation technique. Experimental results show that ABFM has the function of adaptive brightness adjustment and has better fusion performance in terms of both perception and quantification. Generally, the proposed method can realize an adaptive brightness fusion of fluorescence and visible images to enhance the usability of fluorescence imaging technology during surgery.
Collapse
Affiliation(s)
- Chong Zhang
- Department of Big Data Management and Application, School of International Economics and Management, Beijing Technology and Business University, Beijing 100048, China
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
| | - Kun Wang
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jie Tian
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- BUAA-CCMU Advanced Innovation Center for Big Data-Based Precision Medicine, Beijing 100083, China
| |
Collapse
|
47
|
Infrared and Visible Image Fusion Based on Co-Occurrence Analysis Shearlet Transform. REMOTE SENSING 2022. [DOI: 10.3390/rs14020283] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
This study effectively combines the co-occurrence analysis shearlet transform (CAST), latent low-rank representation (LatLRR), and a regularization of zero-crossing counting in differences to fuse heterogeneous images. First, the source images are decomposed by the CAST method into base-layer and detail-layer sub-images. Secondly, for the base-layer components with larger-scale intensity variation, LatLRR, an effective method for extracting salient information from source images, is applied to generate a saliency map that adaptively weights the fusion of the base-layer images. Meanwhile, the regularization term of zero crossings in differences, a classic optimization technique, is used to construct the fusion of the detail-layer images. In this way, the gradient information concealed in the source images can be extracted as fully as possible, so the fused image contains more abundant edge information. Quantitative and qualitative analysis of experimental results on publicly available datasets demonstrates that, compared with other state-of-the-art algorithms, the proposed method better enhances contrast and achieves fusion results close to the source images.
Collapse
|
48
|
Combining Regional Energy and Intuitionistic Fuzzy Sets for Infrared and Visible Image Fusion. SENSORS 2021; 21:s21237813. [PMID: 34883816 PMCID: PMC8659942 DOI: 10.3390/s21237813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/01/2021] [Revised: 11/15/2021] [Accepted: 11/19/2021] [Indexed: 11/17/2022]
Abstract
To obtain more salient target information and more texture features, a new fusion method for infrared (IR) and visible (VIS) images combining regional energy (RE) and intuitionistic fuzzy sets (IFS) is proposed; the method proceeds in the following steps. Firstly, the IR and VIS images are decomposed into low- and high-frequency sub-bands by the non-subsampled shearlet transform (NSST). Secondly, an RE-based fusion rule is used to obtain a low-frequency pre-fusion image, which preserves the important target information in the resulting image. Based on the pre-fusion image, an IFS-based fusion rule is introduced to obtain the final low-frequency image, which transfers more important texture information to the result. Thirdly, the 'max-absolute' fusion rule is adopted to fuse the high-frequency sub-bands. Finally, the fused image is reconstructed by the inverse NSST. The TNO and RoadScene datasets are used to evaluate the proposed method. The simulation results demonstrate that the fused images of the proposed method have more salient targets, higher contrast, richer detail, and better local features. Qualitative and quantitative analysis shows that the presented method is superior to nine other advanced fusion methods.
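A simplified sketch of the low-frequency regional-energy rule and the high-frequency 'max-absolute' rule is given below, using a single-level Haar DWT (PyWavelets) as a stand-in for the NSST and omitting the intuitionistic-fuzzy-set refinement; the window size and wavelet choice are assumptions.

```python
# Regional-energy rule for the low-frequency band and max-absolute rule for the
# high-frequency bands, with a Haar DWT standing in for the NSST decomposition.
import numpy as np
import pywt
from scipy.ndimage import uniform_filter

def regional_energy(band, size=3):
    """Local sum of squared coefficients, approximated by a mean filter."""
    return uniform_filter(band ** 2, size=size)

def fuse_ir_vis(ir, vis, wavelet="haar"):
    ir = ir.astype(np.float64)
    vis = vis.astype(np.float64)
    cA_i, (cH_i, cV_i, cD_i) = pywt.dwt2(ir, wavelet)
    cA_v, (cH_v, cV_v, cD_v) = pywt.dwt2(vis, wavelet)
    # Low frequency: pick the coefficient with the larger regional energy.
    mask = regional_energy(cA_i) >= regional_energy(cA_v)
    cA_f = np.where(mask, cA_i, cA_v)
    # High frequency: max-absolute rule per sub-band.
    fuse_abs = lambda a, b: np.where(np.abs(a) >= np.abs(b), a, b)
    highs = tuple(fuse_abs(a, b) for a, b in
                  zip((cH_i, cV_i, cD_i), (cH_v, cV_v, cD_v)))
    return pywt.idwt2((cA_f, highs), wavelet)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir = rng.random((128, 128))          # dummy source images
    vis = rng.random((128, 128))
    print(fuse_ir_vis(ir, vis).shape)    # (128, 128)
```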
Collapse
|
49
|
Ciprián-Sánchez JF, Ochoa-Ruiz G, Gonzalez-Mendoza M, Rossi L. FIRe-GAN: a novel deep learning-based infrared-visible fusion method for wildfire imagery. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06691-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
50
|
|