201
Dou H, Huang Y, Huang Y, Yang X, Zhen C, Zhang Y, Xiong Y, Huang W, Ni D. Standard plane localization using denoising diffusion model with multi-scale guidance. Comput Methods Programs Biomed 2025; 261:108619. [PMID: 39919604] [DOI: 10.1016/j.cmpb.2025.108619]
Abstract
BACKGROUND AND OBJECTIVE Standard plane (SP) acquisition is a fundamental yet crucial step in routine ultrasound (US) examinations. Compared to 2D US, 3D US offers the advantage of capturing multiple SPs in a single scan and visualizing particular SPs (e.g., the coronal plane of the uterus). However, SP localization in 3D US is challenging due to the vast 3D search space, anatomical variability, and poor image quality. METHODS In this study, we present a probabilistic method based on the conditional denoising diffusion model for SP localization in 3D US. Specifically, we construct multi-scale guidance to provide the model with both global and local context. We improve the model's angular sensitivity by modifying the tangent-based plane representation with spherical coordinates. We also reveal the potential to simultaneously localize SPs and detect their abnormality without introducing extra parameters. RESULTS Extensive validations were performed on a large in-house dataset containing 837 patients across two organs with four SPs. The proposed method achieved average errors of less than 10° in angle and 1 mm in distance on the four investigated SPs. Furthermore, it obtained over 90% accuracy in detecting anomalies by simply thresholding the quantified uncertainty. CONCLUSIONS The results show that our proposed method significantly outperformed current state-of-the-art approaches on spatial and content metrics across four SPs in two organs, indicating its superiority and generalizability. Meanwhile, the investigated anomaly detection demonstrates the method's potential for application in clinical practice.
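As a rough illustration of the spherical-coordinate plane parameterization the abstract mentions (a minimal sketch under our own conventions, not the authors' code), a plane n·x = d can be encoded by the spherical angles of its unit normal plus the offset:

```python
import numpy as np

def plane_to_spherical(normal, distance):
    """Encode a 3D plane n.x = d by the spherical angles of its unit
    normal plus the signed offset d."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    theta = np.arccos(np.clip(n[2], -1.0, 1.0))  # polar angle from +z
    phi = np.arctan2(n[1], n[0])                 # azimuth in the xy-plane
    return theta, phi, distance

def spherical_to_plane(theta, phi, distance):
    """Invert the encoding back to (unit normal, offset)."""
    n = np.array([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])
    return n, distance

# A plane tilted 30 degrees from axial, 5 mm from the origin:
theta, phi, d = plane_to_spherical([0.0, 0.5, np.sqrt(3) / 2], 5.0)
```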
Affiliation(s)
- Haoran Dou
- School of Computer Science, University of Leeds, Leeds, UK; Department of Computer Science, University of Manchester, Manchester, UK
- Yuhao Huang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China; Medical Ultrasound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Yunzhi Huang
- Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
- Xin Yang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China; Medical Ultrasound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Chaojiong Zhen
- Department of Ultrasound, The First People's Hospital of Foshan, Foshan, China
- Yuanji Zhang
- Department of Computer Science, University of Manchester, Manchester, UK; National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China; Medical Ultrasound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Shenzhen RayShape Medical Technology Co., Ltd, Shenzhen, China
- Yi Xiong
- Department of Ultrasound, Shenzhen Luohu People's Hospital, Shenzhen, China
- Weijun Huang
- Department of Ultrasound, The First People's Hospital of Foshan, Foshan, China
- Dong Ni
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China; Medical Ultrasound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China; School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
202
Shin Y, Son G, Hwang D, Eo T. Ensemble and low-frequency mixing with diffusion models for accelerated MRI reconstruction. Med Image Anal 2025; 101:103477. [PMID: 39913965] [DOI: 10.1016/j.media.2025.103477]
Abstract
Magnetic resonance imaging (MRI) is an important imaging modality in medical diagnosis, providing comprehensive anatomical information with detailed tissue structures. However, the long scan time required to acquire high-quality MR images is a major challenge, especially in urgent clinical scenarios. Although diffusion models have achieved remarkable performance in accelerated MRI, several challenges remain. In particular, they suffer from long inference times due to the high number of iterations in the reverse diffusion process. Additionally, they occasionally create artifacts or 'hallucinate' tissues that do not exist in the original anatomy. To address these problems, we propose ELF-Diff, an ensemble and adaptive low-frequency mixing scheme on the diffusion model for accelerated MRI. The proposed method consists of three key components in the reverse diffusion step: (1) optimization based on unified data consistency; (2) low-frequency mixing; and (3) aggregation of multiple perturbations of the predicted images for the ensemble in each step. We evaluate ELF-Diff on two MRI datasets, FastMRI and SKM-TEA. ELF-Diff surpasses existing diffusion models for MRI reconstruction. Furthermore, extensive experiments, including a pathology detection subtask, further demonstrate the superior anatomical precision of our method. ELF-Diff outperforms existing state-of-the-art MRI reconstruction methods without being limited to specific undersampling patterns.
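The low-frequency mixing idea, in which a prediction keeps its high frequencies while low frequencies come from a data-consistent reference, can be sketched with a simple FFT mask (an illustrative sketch only; the cutoff and mixing details are our assumptions, not the paper's exact scheme):

```python
import numpy as np

def lowfreq_mix(pred, reference, cutoff=0.05):
    """Keep the prediction's high frequencies but take low frequencies
    (below `cutoff` cycles/pixel) from a data-consistent reference."""
    H, W = pred.shape
    ky, kx = np.meshgrid(np.fft.fftfreq(H), np.fft.fftfreq(W), indexing="ij")
    low = np.sqrt(ky**2 + kx**2) < cutoff
    mixed = np.where(low, np.fft.fft2(reference), np.fft.fft2(pred))
    return np.fft.ifft2(mixed).real
```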
Affiliation(s)
- Yejee Shin
- School of Electrical and Electronic Engineering, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea
- Geonhui Son
- School of Electrical and Electronic Engineering, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea
- Dosik Hwang
- School of Electrical and Electronic Engineering, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea; Department of Radiology, College of Dentistry, Yonsei University, Seoul 03722, Republic of Korea; Department of Oral and Maxillofacial Radiology, College of Dentistry, Yonsei University, Seoul 03722, Republic of Korea; Artificial Intelligence and Robotics Institute, Korea Institute of Science and Technology, 5, Hwarang-ro 14-gil, Seongbuk-gu, Seoul 02792, Republic of Korea
- Taejoon Eo
- School of Electrical and Electronic Engineering, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea; Probe Medical, Seoul 03777, Republic of Korea
203
Zhu M, Wang Z, Wang C, Zeng C, Zeng D, Ma J, Wang Y. VBVT-Net: VOI-Based VVBP-Tensor Network for High-Attenuation Artifact Suppression in Digital Breast Tomosynthesis Imaging. IEEE Trans Med Imaging 2025; 44:1953-1968. [PMID: 40030817] [DOI: 10.1109/tmi.2024.3522242]
Abstract
High-attenuation (HA) artifacts may obscure subtle lesions and lead to lesion over-estimation in digital breast tomosynthesis (DBT) imaging. High-attenuation artifact suppression (HAAS) is therefore vital for widespread clinical application of DBT. Conventional HAAS methods usually rely on the segmentation accuracy of HA objects and manual weighting schemes, without considering the geometry information in DBT reconstruction, and the global weighting strategy designed for HA artifacts may decrease resolution in low-contrast soft-tissue regions. Meanwhile, the view-by-view backprojection tensor (VVBP-Tensor) domain has recently been developed as an intermediary domain that preserves the lossless information of the projection domain and the structural details of the image domain. We therefore propose a VOI-Based VVBP-Tensor Network (VBVT-Net) for the HAAS task in DBT imaging, which learns a local implicit weighting strategy based on the analytical FDK reconstruction mechanism. Specifically, VBVT-Net incorporates a volume of interest (VOI) recognition sub-network and a HAAS sub-network. The VOI recognition sub-network automatically extracts all 4D VVBP-Tensor patches containing HA artifacts. The HAAS sub-network reduces HA artifacts in these 4D VVBP-Tensor patches by leveraging ray-trace backprojection features and extra neighborhood information. Results on four datasets demonstrate that VBVT-Net accurately detects HA regions, effectively reduces HA artifacts, and simultaneously preserves structures in soft-tissue background regions. VBVT-Net has good interpretability as a general variant of the weighted FDK algorithm and has the potential to be applied in next-generation DBT prototype systems.
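For context on the weighted-FDK reading of the VVBP-Tensor (our simplified interpretation, not the authors' network): summing the stack of single-view backprojections with uniform weights reproduces the analytical reconstruction, and a spatially varying weight map generalizes it:

```python
import numpy as np

def weighted_fdk(vvbp, weights=None):
    """vvbp: (n_views, H, W) stack of single-view backprojections.
    Uniform weights reproduce the plain analytical reconstruction;
    a learned, spatially varying weight map can suppress artifacts
    only where they occur."""
    if weights is None:
        weights = np.full(vvbp.shape, 1.0 / vvbp.shape[0])
    return (weights * vvbp).sum(axis=0)
```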
204
Zhang W, Yang D, Che H, Ran AR, Cheung CY, Chen H. Unpaired Optical Coherence Tomography Angiography Image Super-Resolution via Frequency-Aware Inverse-Consistency GAN. IEEE J Biomed Health Inform 2025; 29:2695-2705. [PMID: 40030303] [DOI: 10.1109/jbhi.2024.3506575]
Abstract
For optical coherence tomography angiography (OCTA) images, the limited scanning rate leads to a trade-off between field of view (FOV) and imaging resolution. Although larger-FOV images may reveal more parafoveal vascular lesions, their application is hampered by lower resolution. To increase resolution, previous works achieved satisfactory performance only by training on paired data, but real-world applications are limited by the challenge of collecting large-scale paired images; an unpaired approach is therefore highly desirable. Generative adversarial networks (GANs) have commonly been used in the unpaired setting, but they may struggle to accurately preserve fine-grained capillary details, which are critical biomarkers for OCTA. In this paper, we aim to preserve these details by leveraging frequency information, which represents details as high frequencies (hf) and coarse-grained features as low frequencies (lf). We propose a GAN-based unpaired super-resolution method for OCTA images that specifically emphasizes fine hf capillaries through a dual-path generator. To facilitate a precise spectrum in the reconstructed image, we also propose a frequency-aware adversarial loss for the discriminator and introduce a frequency-aware focal consistency loss for end-to-end optimization. We collected a paired dataset for evaluation and show that our method outperforms other state-of-the-art unpaired methods both quantitatively and visually.
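A common way to realize the hf/lf split described above is a blur-based decomposition; this minimal sketch (our assumption for illustration, not necessarily the paper's operator) uses a Gaussian low-pass:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def split_frequencies(img, sigma=2.0):
    """Decompose an image into coarse low-frequency content (lf) and the
    high-frequency residual (hf) that carries fine capillary detail."""
    img = img.astype(np.float64)
    lf = gaussian_filter(img, sigma)
    hf = img - lf
    return lf, hf

# A frequency-aware loss can then weight the hf residual more heavily,
# e.g. loss = l1(lf_fake, lf_real) + lambda_hf * l1(hf_fake, hf_real).
```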
205
Li Y, Liao YP, Wang J, Lu W, Zhang Y. Patient-specific MRI super-resolution via implicit neural representations and knowledge transfer. Phys Med Biol 2025; 70:075021. [PMID: 40064110] [DOI: 10.1088/1361-6560/adbed4]
Abstract
Objective. Magnetic resonance imaging (MRI) is a non-invasive imaging technique that provides high soft tissue contrast, playing a vital role in disease diagnosis and treatment planning. However, due to limitations in imaging hardware, scan time, and patient compliance, the resolution of MRI images is often insufficient. Super-resolution (SR) techniques can enhance MRI resolution, reveal more detailed anatomical information, and improve the identification of complex structures, while also reducing scan time and patient discomfort. However, traditional population-based models trained on large datasets may introduce artifacts or hallucinated structures, which compromise their reliability in clinical applications. Approach. To address these challenges, we propose a patient-specific knowledge transfer implicit neural representation (KT-INR) SR model. The KT-INR model integrates a dual-head INR with a pre-trained generative adversarial network (GAN) model trained on a large-scale dataset. Anatomical information from different MRI sequences of the same patient, combined with the SR mappings learned by the GAN model on a population-based dataset, is transferred as prior knowledge to the INR. This integration enhances both the performance and reliability of the SR model. Main results. We validated the effectiveness of the KT-INR model across three distinct clinical SR tasks on the brain tumor segmentation dataset. For task 1, KT-INR achieved an average structural similarity index, peak signal-to-noise ratio, and learned perceptual image patch similarity of 0.9813, 36.845, and 0.0186, respectively. In comparison, a state-of-the-art SR technique, ArSSR, attained average values of 0.9689, 33.4557, and 0.0309 for the same metrics. The experimental results demonstrate that KT-INR outperforms all other methods across all tasks and evaluation metrics, with particularly remarkable performance in resolving fine anatomical details. Significance. The KT-INR model significantly enhances the reliability of SR results, effectively addressing the hallucination effects commonly seen in traditional models. It provides a robust solution for patient-specific MRI SR.
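As background on the INR component (a generic sketch, not the dual-head KT-INR itself), an implicit neural representation is simply a network mapping continuous coordinates to intensities, which is what allows sampling at arbitrary resolution:

```python
import torch
import torch.nn as nn

class TinyINR(nn.Module):
    """Maps continuous (x, y, z) coordinates to intensity, so the volume
    can be queried at any resolution."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, coords):               # coords: (N, 3) in [-1, 1]
        return self.net(coords)

model = TinyINR()
coords = torch.rand(1024, 3) * 2 - 1         # arbitrary query locations
intensities = model(coords)                  # (1024, 1)
```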
Affiliation(s)
- Yunxiang Li
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, United States of America
- Yen-Peng Liao
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, United States of America
- Jing Wang
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, United States of America
- Weiguo Lu
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, United States of America
- You Zhang
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, United States of America
206
Li A, Zhang L, Liu Y, Zhu C. Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution. IEEE Trans Pattern Anal Mach Intell 2025; 47:3141-3158. [PMID: 40031155] [DOI: 10.1109/tpami.2025.3529927]
Abstract
Transformer-based methods have exhibited remarkable potential in single image super-resolution (SISR) by effectively extracting long-range dependencies. However, most of the current research in this area has prioritized the design of transformer blocks to capture global information, while overlooking the importance of incorporating high-frequency priors, which we believe could be beneficial. In our study, we conducted a series of experiments and found that transformer structures are more adept at capturing low-frequency information, but have limited capacity in constructing high-frequency representations when compared to their convolutional counterparts. Our proposed solution, the cross-refinement adaptive feature modulation transformer (CRAFT), integrates the strengths of both convolutional and transformer structures. It comprises three key components: the high-frequency enhancement residual block (HFERB) for extracting high-frequency information, the shift rectangle window attention block (SRWAB) for capturing global information, and the hybrid fusion block (HFB) for refining the global representation. To tackle the inherent intricacies of transformer structures, we introduce a frequency-guided post-training quantization (PTQ) method aimed at enhancing CRAFT's efficiency. This strategy incorporates adaptive dual clipping and boundary refinement. To further amplify the versatility of our proposed approach, we extend our PTQ strategy to function as a general quantization method for transformer-based SISR techniques. Our experimental findings showcase CRAFT's superiority over current state-of-the-art methods, both in full-precision and quantization scenarios. These results underscore the efficacy and universality of our PTQ strategy.
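For readers unfamiliar with PTQ, a generic uniform symmetric quantizer with a clipping ratio (a plain illustration; the paper's adaptive dual clipping and boundary refinement are more elaborate) looks like:

```python
import numpy as np

def quantize_dequantize(x, n_bits=8, clip_ratio=1.0):
    """Uniform symmetric PTQ with clipping: shrinking `clip_ratio`
    trades clipping error against rounding error."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = clip_ratio * np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

w = np.random.randn(64, 64).astype(np.float32)
w4 = quantize_dequantize(w, n_bits=4, clip_ratio=0.8)
```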
207
Solak M, Tören M, Asan B, Kaba E, Beyazal M, Çeliker FB. Generative Adversarial Network Based Contrast Enhancement: Synthetic Contrast Brain Magnetic Resonance Imaging. Acad Radiol 2025; 32:2220-2232. [PMID: 39694785] [DOI: 10.1016/j.acra.2024.11.021]
Abstract
RATIONALE AND OBJECTIVES Magnetic resonance imaging (MRI) is a vital tool for diagnosing neurological disorders, frequently utilising gadolinium-based contrast agents (GBCAs) to enhance resolution and specificity. However, GBCAs present certain risks, including side effects, increased costs, and repeated exposure. This study proposes an approach using generative adversarial networks (GANs) for virtual contrast enhancement in brain MRI, with the aim of reducing or eliminating GBCAs, minimising associated risks, and enhancing imaging efficiency while preserving diagnostic quality. MATERIAL AND METHODS 10,235 images were acquired on a 3.0 Tesla MRI scanner from 81 participants (54 females, 27 males; mean age 35 years, range 19-68 years). T1-weighted and contrast-enhanced images were obtained following administration of a standard dose of a GBCA. To generate synthetic contrast-enhanced T1-weighted images, a CycleGAN model, a variant of the GAN architecture, was trained on pre- and post-contrast images. The dataset was divided into three subsets: 80% for training, 10% for validation, and 10% for testing. TensorBoard was used to monitor training and guard against image deterioration, and the image processing and training procedures were optimised. In a visual Turing test, radiologists were presented with a non-contrast input image and asked to choose between the real contrast-enhanced image and the synthetic MR image generated by CycleGAN for that input. RESULTS The performance of the CycleGAN model was evaluated using a combination of quantitative and qualitative analyses. On the test set for the entire dataset, the mean square error (MSE) was 0.0038 and the structural similarity index (SSIM) was 0.58. Among the submodels, the most successful achieved an MSE of 0.0053 and an SSIM of 0.8. The qualitative evaluation was validated through the visual Turing test conducted by four radiologists with varying levels of clinical experience. CONCLUSION The findings of this study support the efficacy of the CycleGAN model in generating synthetic contrast-enhanced T1-weighted brain MR images. Both quantitative and qualitative evaluations demonstrated excellent performance, confirming the model's ability to produce realistic synthetic images. This method shows promise for potentially eliminating the need for intravenous contrast agents, thereby minimising the associated risks of their use.
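The reported MSE and SSIM can be reproduced for any image pair with standard tooling; a minimal sketch using scikit-image (stand-in arrays, not the study's data):

```python
import numpy as np
from skimage.metrics import mean_squared_error, structural_similarity

real = np.random.rand(256, 256)        # stand-ins for real/synthetic slices
synthetic = np.random.rand(256, 256)

mse = mean_squared_error(real, synthetic)
ssim = structural_similarity(real, synthetic, data_range=1.0)
print(f"MSE = {mse:.4f}, SSIM = {ssim:.2f}")
```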
Affiliation(s)
- Merve Solak
- Recep Tayyip Erdogan University, Department of Radiology, Rize, Turkey
- Murat Tören
- Recep Tayyip Erdogan University, Department of Electrical and Electronics Engineering, Rize, Turkey
- Berkutay Asan
- Recep Tayyip Erdogan University, Department of Electrical and Electronics Engineering, Rize, Turkey
- Esat Kaba
- Recep Tayyip Erdogan University, Department of Radiology, Rize, Turkey
- Mehmet Beyazal
- Recep Tayyip Erdogan University, Department of Radiology, Rize, Turkey
- Fatma Beyazal Çeliker
- Recep Tayyip Erdogan University, Department of Radiology, Rize, Turkey
208
Antunes J, Young T, Pittock D, Jacobs P, Nelson A, Piper J, Deshpande S. Assessing multiple MRI sequences in deep learning-based synthetic CT generation for MR-only radiation therapy of head and neck cancers. Radiother Oncol 2025; 205:110782. [PMID: 39929288] [DOI: 10.1016/j.radonc.2025.110782]
Abstract
PURPOSE This study investigated the effect of multiple magnetic resonance (MR) sequences on the quality of deep-learning-based synthetic computed tomography (sCT) generation in the head and neck region. MATERIALS AND METHODS 12 MR series (pre-contrast T1, post-contrast T1, and T2, each with 4 Dixon images) were collected from 26 patients with head and neck cancers. 14 unique deep-learning models using the U-Net framework were trained with multiple MR series as inputs to generate sCTs. Mean absolute error (MAE), Dice similarity coefficient (DSC), and gamma pass rates were used to compare sCTs to the actual CT across the different multi-channel MR-sCT models. RESULTS Using all available MR series yielded sCTs with the lowest pixel-wise error (MAE = 80.5 ± 9.9 HU), but increasing the number of channels also increased artificial tissue, which led to poorer auto-contouring and lower dosimetric accuracy. Models with T2 protocols generally resulted in poorer-quality sCTs. Pre-contrast T1 with all Dixon images was the best multi-channel MR-sCT model, consistently ranking high on all sCT quality measurements (average DSC across all structures = 80.0% ± 13.6%, global gamma pass rate = 97.9% ± 1.7% at a 2%/2 mm dose criterion and 20% of max dose threshold). CONCLUSIONS Deep-learning networks using all Dixon images from a pre-contrast T1 sequence as multi-channel inputs produced the most clinically viable sCTs. Our proposed method may enable MR-only radiotherapy planning in a clinical setting for head and neck cancers.
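Two of the reported quality measures are straightforward to compute; a minimal sketch (helper names are ours, and a body mask is assumed to be available) for MAE in HU and the Dice similarity coefficient:

```python
import numpy as np

def mae_hu(sct, ct, body_mask):
    """Mean absolute error in HU, restricted to the patient body."""
    return np.abs(sct[body_mask] - ct[body_mask]).mean()

def dice(a, b):
    """Dice similarity coefficient between two binary structures."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```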
Affiliation(s)
- Tony Young
- Liverpool and Macarthur Cancer Therapy Centres, Sydney, Australia; Ingham Institute, Sydney, Australia
- Paul Jacobs
- MIM Software Inc, Cleveland, OH, United States
- Jon Piper
- MIM Software Inc, Cleveland, OH, United States
- Shrikant Deshpande
- Ingham Institute, Sydney, Australia; South Western Sydney Clinical School, University of New South Wales, Sydney, Australia
209
Vyver GVD, Måsøy SE, Dalen H, Grenne BL, Holte E, Olaisen SH, Nyberg J, Østvik A, Løvstakken L, Smistad E. Regional Image Quality Scoring for 2-D Echocardiography Using Deep Learning. Ultrasound Med Biol 2025; 51:638-649. [PMID: 39864961] [DOI: 10.1016/j.ultrasmedbio.2024.12.008]
Abstract
OBJECTIVE To develop and compare methods that automatically estimate regional ultrasound image quality for echocardiography, separate from view correctness. METHODS Three methods for estimating image quality were developed: (i) a classic pixel-based metric, the generalized contrast-to-noise ratio (gCNR), computed on myocardial segments (region of interest) and the left ventricle lumen (background), extracted by a U-Net segmentation model; (ii) local image coherence, the average local coherence predicted at the pixel level from B-mode ultrasound images by a U-Net model; and (iii) a deep convolutional network, an end-to-end deep-learning model that predicts the quality of each region in the image directly. These methods were evaluated against manual regional quality annotations provided by three experienced cardiologists. RESULTS The results indicated poor performance of the gCNR metric, with a Spearman correlation to annotations of ρ = 0.24. The end-to-end learning model obtained the best result, ρ = 0.69, comparable to the inter-observer correlation, ρ = 0.63. Finally, the coherence-based method, with ρ = 0.58, outperformed the classical metric and was more generic than the end-to-end approach. CONCLUSION The deep convolutional network provided the most accurate regional quality prediction, while the coherence-based method offered a more generalizable solution. gCNR showed limited effectiveness in this study. The image quality prediction tool is available as an open-source Python library at https://github.com/GillesVanDeVyver/arqee.
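The gCNR used as method (i) is one minus the overlap of the ROI and background intensity distributions; a minimal NumPy sketch (our illustration, not the arqee implementation):

```python
import numpy as np

def gcnr(roi, background, bins=256):
    """gCNR = 1 - overlap of the two intensity distributions."""
    lo = min(roi.min(), background.min())
    hi = max(roi.max(), background.max())
    p, _ = np.histogram(roi, bins=bins, range=(lo, hi))
    q, _ = np.histogram(background, bins=bins, range=(lo, hi))
    return 1.0 - np.minimum(p / p.sum(), q / q.sum()).sum()
```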
Affiliation(s)
- Gilles Van De Vyver
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway
- Svein-Erik Måsøy
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway
- Håvard Dalen
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway; St. Olavs Hospital, Trondheim, Norway
- Bjørnar Leangen Grenne
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway; St. Olavs Hospital, Trondheim, Norway
- Espen Holte
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway; St. Olavs Hospital, Trondheim, Norway
- Sindre Hellum Olaisen
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway
- John Nyberg
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway
- Andreas Østvik
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway; Health Research, SINTEF, Trondheim, Norway
- Lasse Løvstakken
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway
- Erik Smistad
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway; Health Research, SINTEF, Trondheim, Norway
210
Afrakhteh S, Demi L. Mitigating high frame rate demands in shear wave elastography using radial basis function-based reconstruction: An experimental phantom study. Ultrasonics 2025; 148:107542. [PMID: 39674075] [DOI: 10.1016/j.ultras.2024.107542]
Abstract
BACKGROUND Shear wave elastography (SWE) quantifies tissue stiffness by assessing the speed of shear waves that propagate after excitation by an acoustic radiation force. SWE allows quantification of elastic tissue properties and serves as an adjunct to conventional ultrasound techniques, aiding tissue characterization. To capture the transient propagation of the shear wave, the ultrasound device must reach very high frame rates. METHODOLOGY In this paper, we aim to relax the high frame rate requirement of SWE imaging. To this end, we propose low frame rate SWE imaging followed by 2-dimensional (2D) radial basis function (RBF)-based interpolation. More specifically, the process acquires low frame rate data and then upsamples it temporally to a synthetic high frame rate by inserting UpS-1 image frames with missing values between each pair of successive frames (UpS: upsampling rate). Finally, we apply the proposed interpolation technique to reconstruct the missing values within the incomplete high frame rate data. RESULTS AND CONCLUSION Results on two experimental datasets indicate that the frame rate requirement of SWE imaging can be relaxed by a factor of 4 while keeping shear wave speed (SWS), group velocity, and phase velocity estimates closely aligned with the high frame rate SWE model, with errors below 3%. Furthermore, analysis of the structural similarity index (SSIM) and root mean squared error (RMSE) on the 2D-SWS maps highlights the efficacy of the technique in enhancing local SWS estimates, even at a downsampling (DS) factor of 4. For DS ≤ 4, the SSIM values between the 2D-SWS maps produced by the proposed technique and those generated from the original high frame rate data consistently remain above 0.94, and the RMSE values are below 0.37 m/s, indicating promising reconstruction of SWS values.
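The 2D RBF reconstruction step can be sketched with SciPy's RBFInterpolator (stand-in data; the kernel and neighborhood size are our assumptions, not the paper's settings), here upsampling time by the paper's factor of 4:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

n_x, n_t, ups = 64, 20, 4
v_low = np.random.rand(n_x, n_t)               # stand-in for low-rate SWE data

# Known samples live on a coarse time grid (every `ups`-th frame).
x, t = np.meshgrid(np.arange(n_x), np.arange(0, ups * n_t, ups), indexing="ij")
rbf = RBFInterpolator(np.column_stack([x.ravel(), t.ravel()]),
                      v_low.ravel(), kernel="thin_plate_spline", neighbors=50)

# Query the full high-rate grid, filling in the missing frames.
xq, tq = np.meshgrid(np.arange(n_x), np.arange(ups * n_t), indexing="ij")
v_high = rbf(np.column_stack([xq.ravel(), tq.ravel()])).reshape(n_x, ups * n_t)
```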
Affiliation(s)
- Sajjad Afrakhteh
- Department of Information Engineering and Computer Science, University of Trento, Italy
- Libertario Demi
- Department of Information Engineering and Computer Science, University of Trento, Italy
211
Weigand-Whittier J, Wendland M, Lam B, Velasquez M, Vandsburger MH. Ungated, plug-and-play preclinical cardiac CEST-MRI using radial FLASH with segmented saturation. Magn Reson Med 2025; 93:1793-1806. [PMID: 39607872] [PMCID: PMC11785487] [DOI: 10.1002/mrm.30382]
Abstract
PURPOSE Electrocardiogram (ECG)- and respiratory-gated preclinical cardiac CEST-MRI acquisitions are difficult because of variable saturation recovery with T1, RF interference in the ECG signal, and offset-to-offset variation in Z-magnetization and cardiac phase introduced by changes in cardiac frequency and trigger delays. METHODS The proposed method consists of segmented saturation modules with radial FLASH readouts and golden-angle progression. The segmented saturation blocks drive the system to steady state, and because center k-space is sampled repeatedly, steady-state saturation dominates contrast during gridding and reconstruction. Ten complete Z-spectra were acquired in healthy mice using both ECG- and respiratory-gated and ungated methods. Z-spectra were also acquired at multiple saturation B1 values to optimize for amide and Cr contrasts. RESULTS There was no significant difference between CEST contrasts (amide, Cr, magnetization transfer) calculated from images acquired using gated and ungated methods (p = 0.27, 0.11, 0.47). A saturation power of 1.8 μT provides optimal contrast amplitudes for both amide and total Cr contrast without significantly complicating CEST contrast quantification through water direct saturation, magnetization transfer, and RF spillover between the amide and Cr pools. Further, variability in CEST contrast measurements was significantly reduced using the ungated radial FLASH acquisition (p = 0.002 and 0.006 for amide and Cr, respectively). CONCLUSION This method enables CEST mapping in the murine myocardium without cardiac or respiratory gating. Quantitative CEST contrasts are consistent with those obtained using gated sequences, and per-contrast variance is significantly reduced. This approach makes preclinical cardiac CEST-MRI easily accessible, even for investigators without prior experience in cardiac imaging.
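Golden-angle progression means each successive radial spoke is rotated by about 111.246°; a small sketch of the angle schedule (illustrative only, not the authors' sequence code):

```python
import numpy as np

GOLDEN_ANGLE = 180.0 * (np.sqrt(5.0) - 1.0) / 2.0   # ~111.246 degrees

def spoke_angles(n_spokes):
    """Successive radial spokes rotate by the golden angle, giving
    near-uniform k-space coverage for any contiguous subset."""
    return np.mod(np.arange(n_spokes) * GOLDEN_ANGLE, 180.0)

print(spoke_angles(5))   # [0., 111.25, 42.49, 153.74, 84.98] (approx.)
```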
Affiliation(s)
- Michael Wendland
- Berkeley Preclinical Imaging Core, University of California Berkeley, Berkeley, CA, USA
- Bonnie Lam
- Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA
- Mark Velasquez
- Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA
212
Yang Z, Yang LT, Wang H, Zhao H, Liu D. Bayesian Nonnegative Tensor Completion With Automatic Rank Determination. IEEE Trans Image Process 2025; 34:2036-2051. [PMID: 40053614] [DOI: 10.1109/tip.2024.3459647]
Abstract
Nonnegative CANDECOMP/PARAFAC (CP) factorization of incomplete tensors is a powerful technique for finding meaningful and physically interpretable latent factor matrices to achieve nonnegative tensor completion. However, most existing nonnegative CP models rely on manually predefined tensor ranks, which introduces uncertainty and leads the models to overfit or underfit. Although CP models formulated within a probabilistic framework can estimate the rank better, they lack the ability to learn nonnegative factors from incomplete data. In addition, existing approaches tend to focus on point estimation and ignore uncertainty estimation. To address these issues within a unified framework, we propose a fully Bayesian treatment of nonnegative tensor completion with automatic rank determination. Benefiting from the Bayesian framework and hierarchical sparsity-inducing priors, the model can provide uncertainty estimates of nonnegative latent factors and effectively obtain low-rank structures from incomplete tensors. Additionally, the proposed model can mitigate problems of parameter selection and overfitting. For model learning, we develop two fully Bayesian inference methods for posterior estimation and propose a hybrid computing strategy that significantly reduces the time overhead for large-scale data. Extensive simulations on synthetic data demonstrate that our model can recover missing data with high precision and automatically estimate the CP rank from incomplete tensors. Moreover, results from real-world applications demonstrate that our model is superior to state-of-the-art methods in image and video inpainting. The code is available at https://github.com/zecanyang/BNTC.
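For orientation (a generic sketch, not the Bayesian model itself), CP completion reconstructs a tensor from factor matrices and fits them only on observed entries:

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors A (IxR), B (JxR), C (KxR)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

I, J, K, R = 20, 30, 40, 5
A, B, C = (np.abs(np.random.randn(n, R)) for n in (I, J, K))  # nonnegative
X_true = cp_reconstruct(A, B, C)
mask = np.random.rand(I, J, K) < 0.3          # 30% observed entries
X_obs = np.where(mask, X_true, 0.0)           # incomplete tensor
# A completion method fits factors to X_obs on `mask` only, then predicts
# the missing entries as cp_reconstruct(A_hat, B_hat, C_hat)[~mask].
```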
213
Beizaee F, Lodygensky GA, Adamson CL, Thompson DK, Cheong JLY, Spittle AJ, Anderson PJ, Desrosiers C, Dolz J. Harmonizing flows: Leveraging normalizing flows for unsupervised and source-free MRI harmonization. Med Image Anal 2025; 101:103483. [PMID: 39919411] [DOI: 10.1016/j.media.2025.103483]
Abstract
Lack of standardization and various intrinsic parameters of magnetic resonance (MR) image acquisition result in heterogeneous images across different sites and devices, which adversely affects the generalization of deep neural networks. To alleviate this issue, this work proposes a novel unsupervised harmonization framework that leverages normalizing flows to align MR images, thereby emulating the distribution of a source domain. The proposed strategy comprises three key steps. Initially, a normalizing flow network is trained to capture the distribution characteristics of the source domain. Then, we train a shallow harmonizer network to reconstruct images from the source domain via their augmented counterparts. Finally, during inference, the harmonizer network is updated to ensure that the output images conform to the learned source domain distribution, as modeled by the normalizing flow network. Our approach, which is unsupervised, source-free, and task-agnostic, is assessed in the context of both adult and neonatal cross-domain brain MRI segmentation, as well as neonatal brain age estimation, demonstrating its generalizability across tasks and population demographics. The results underscore its superior performance compared to existing methodologies. The code is available at https://github.com/farzad-bz/Harmonizing-Flows.
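The core quantity in the third step is the likelihood of an image under the source-domain flow; a toy sketch (a RealNVP-style coupling on feature vectors, entirely our simplification and not the paper's architecture) of the negative log-likelihood such an update would minimize:

```python
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """RealNVP-style coupling: transforms half the features conditioned
    on the other half, with a cheap log-determinant."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))  # -> scale, shift

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep scales bounded
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)

def source_nll(flow, x):
    """Negative log-likelihood under a standard-normal base; minimizing
    this pulls harmonized outputs toward the source distribution."""
    z, logdet = flow(x)
    log_pz = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
    return -(log_pz + logdet).mean()

flow = AffineCoupling(dim=16)
loss = source_nll(flow, torch.randn(8, 16))    # stand-in feature vectors
loss.backward()
```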
Affiliation(s)
- Farzad Beizaee
- LIVIA, ÉTS, Montreal, Quebec, Canada; ILLS, McGill - ETS - Mila - CNRS - Université Paris-Saclay - CentraleSupelec, Canada; CHU Sainte-Justine, University of Montreal, Montreal, Canada
- Gregory A Lodygensky
- CHU Sainte-Justine, University of Montreal, Montreal, Canada; Canadian Neonatal Brain Platform, Montreal, Canada
- Chris L Adamson
- Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Deanne K Thompson
- Murdoch Children's Research Institute, Parkville, Victoria, Australia; School of Psychological Sciences, Monash University, Clayton, Victoria, Australia; Department of Paediatrics, The University of Melbourne, Victoria, Australia
- Jeanie L Y Cheong
- Murdoch Children's Research Institute, Parkville, Victoria, Australia; Department of Paediatrics, The University of Melbourne, Victoria, Australia; The Royal Women's Hospital, Melbourne, Parkville, Victoria, Australia; Department of Obstetrics and Gynaecology, The University of Melbourne, Victoria, Australia
- Alicia J Spittle
- Murdoch Children's Research Institute, Parkville, Victoria, Australia; The Royal Women's Hospital, Melbourne, Parkville, Victoria, Australia; Department of Physiotherapy, The University of Melbourne, Victoria, Australia
- Peter J Anderson
- Murdoch Children's Research Institute, Parkville, Victoria, Australia; School of Psychological Sciences, Monash University, Clayton, Victoria, Australia
- Christian Desrosiers
- LIVIA, ÉTS, Montreal, Quebec, Canada; ILLS, McGill - ETS - Mila - CNRS - Université Paris-Saclay - CentraleSupelec, Canada
- Jose Dolz
- LIVIA, ÉTS, Montreal, Quebec, Canada; ILLS, McGill - ETS - Mila - CNRS - Université Paris-Saclay - CentraleSupelec, Canada
214
Meng B, Zhou J, Yang H, Liu J, Pu Y. DFCL: Dual-pathway fusion contrastive learning for blind single-image visible watermark removal. Neural Netw 2025; 184:107077. [PMID: 39793490] [DOI: 10.1016/j.neunet.2024.107077]
Abstract
Digital image watermarking is a prevalent method for image copyright protection. As watermark embedding techniques evolve, research in copyright protection has increasingly extended into watermark removal. Recent advancements in deep learning and generative technologies have led to the development of public watermark removal solutions, addressing issues such as plagiarized, illegal, or outdated watermarks while driving significant improvements in robust watermark embedding. Traditional image restoration often requires the manual selection of watermark mask regions, while common blind visible watermark removal techniques struggle with watermark detection accuracy and post-removal visual quality. To address these challenges, this paper introduces a dual-pathway fusion contrastive learning approach for blind single-image visible watermark removal. We conduct dual-pathway training of the image and gradient map, enhancing high-frequency feature acquisition and the accuracy of watermark spatial positioning through feature fusion. Additionally, contrastive learning ensures that the results closely resemble the original watermark-free images while distancing themselves from watermark content, resulting in improved background visual quality. Importantly, our blind watermark removal algorithm does not require additional watermark images or mask regions. Extensive experiments on three challenging benchmark datasets demonstrate the effectiveness of our approach in overcoming the limitations of existing methods.
Affiliation(s)
- Bin Meng
- School of Cyber Science and Engineering, Sichuan University, Chengdu, 610207, Sichuan, China
- Jiliu Zhou
- College of Computer Science, Sichuan University, Chengdu, 610065, Sichuan, China; School of Computer Science, Chengdu University of Information Technology, Chengdu, 610255, Sichuan, China
- Haoran Yang
- College of Electronics and Information Engineering, Sichuan University, Chengdu, 610065, Sichuan, China; School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore
- Jiayong Liu
- School of Cyber Science and Engineering, Sichuan University, Chengdu, 610207, Sichuan, China
- Yifei Pu
- College of Computer Science, Sichuan University, Chengdu, 610065, Sichuan, China
215
Wu M, Zhang L, Yap PT, Zhu H, Liu M. Disentangled latent energy-based style translation: An image-level structural MRI harmonization framework. Neural Netw 2025; 184:107039. [PMID: 39700825] [PMCID: PMC11802304] [DOI: 10.1016/j.neunet.2024.107039]
Abstract
Brain magnetic resonance imaging (MRI) has been extensively employed across clinical and research fields, but often exhibits sensitivity to site effects arising from non-biological variations such as differences in field strength and scanner vendors. Numerous retrospective MRI harmonization techniques have demonstrated encouraging outcomes in reducing site effects at the image level. However, existing methods generally suffer from high computational requirements and limited generalizability, restricting their applicability to unseen MRIs. In this paper, we design a novel disentangled latent energy-based style translation (DLEST) framework for unpaired image-level MRI harmonization, consisting of (a) site-invariant image generation (SIG), (b) site-specific style translation (SST), and (c) site-specific MRI synthesis (SMS). Specifically, SIG employs a latent autoencoder to encode MRIs into a low-dimensional latent space and reconstruct MRIs from latent codes. SST utilizes an energy-based model to capture the global latent distribution of a target domain and translate source latent codes toward the target domain, while SMS enables MRI synthesis with a target-specific style. By disentangling image generation and style translation in latent space, DLEST achieves efficient style translation. Our model was trained on T1-weighted MRIs from a public dataset (3,984 subjects across 58 acquisition sites/settings) and validated on an independent dataset (9 traveling subjects scanned at 11 sites/settings) in four tasks: histogram and feature visualization, site classification, brain tissue segmentation, and site-specific structural MRI synthesis. Qualitative and quantitative results demonstrate the superiority of our method over several state-of-the-art methods.
Affiliation(s)
- Mengqi Wu
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill and North Carolina State University, Chapel Hill, NC 27599, USA
- Lintao Zhang
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Pew-Thian Yap
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hongtu Zhu
- Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Mingxia Liu
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
216
Qu J, Wu X, Dong W, Cui J, Li Y. IR&ArF: Toward Deep Interpretable Arbitrary Resolution Fusion of Unregistered Hyperspectral and Multispectral Images. IEEE Trans Image Process 2025; 34:1934-1949. [PMID: 40126967] [DOI: 10.1109/tip.2025.3551531]
Abstract
The fusion of a hyperspectral image (HSI) and a multispectral image (MSI) is an effective means to mitigate the inherently low spatial resolution of HSI. However, existing fusion methods usually rigidly upgrade the spatial resolution of the HSI to that of the matching MSI under the ideal assumption that the multi-source images are accurately registered. In real scenes, where multi-source images are difficult to register perfectly and spatial resolution requirements vary dynamically, such fusion algorithms are difficult to deploy effectively. To this end, we construct the spatial-spectral consistent arbitrary scale observation model (S2cAsOM) to model the dependence between the unregistered HSI and MSI and the ideal arbitrary-resolution HSI. Furthermore, an optimization algorithm is designed to solve S2cAsOM, and a deep interpretable arbitrary resolution fusion network (IR&ArF) is proposed to simulate the optimization process, achieving model-data dual-driven arbitrary-resolution fusion of unregistered HSI and MSI. IR&ArF robustly removes the dependence of traditional fusion methods on registration accuracy and can flexibly cope with the dynamically varying spatial resolution requirements of diverse applications, improving the applicability of HSI fusion in real scenes. Extensive systematic experiments demonstrate the superiority and generalization of the proposed method. Source code is available at https://github.com/Jiahuiqu/IR-ArF.
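The observation model underlying such fusion (a generic degradation sketch under our assumptions; S2cAsOM additionally handles misregistration and arbitrary scales) relates the ideal HSI to both inputs:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

bands, H, W, msi_bands, ratio = 31, 64, 64, 4, 4
X = np.random.rand(bands, H, W)                    # ideal high-resolution HSI

# HSI branch: spatial blur followed by downsampling
# (high spectral, low spatial resolution).
Y_h = gaussian_filter(X, sigma=(0, 1.5, 1.5))[:, ::ratio, ::ratio]

# MSI branch: a spectral response matrix maps many bands to a few
# (high spatial, low spectral resolution).
R = np.abs(np.random.rand(msi_bands, bands))
R /= R.sum(axis=1, keepdims=True)
Y_m = np.einsum("cb,bhw->chw", R, X)
```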
217
Li L, Hou J, Liu W, Fang Y, Yan J. Diffusion-Based Facial Aesthetics Enhancement With 3D Structure Guidance. IEEE Trans Image Process 2025; 34:1879-1894. [PMID: 40117164] [DOI: 10.1109/tip.2025.3551077]
Abstract
Facial Aesthetics Enhancement (FAE) aims to improve facial attractiveness by adjusting the structure and appearance of a facial image while preserving its identity as much as possible. Most existing methods adopt deep feature-based or score-based guidance for generation models to conduct FAE. Although these methods achieve promising results, they may produce excessively beautified results with lower identity consistency or insufficiently improved facial attractiveness. To enhance facial aesthetics with less loss of identity, we propose Nearest Neighbor Structure Guidance based on Diffusion (NNSG-Diffusion), a diffusion-based FAE method that beautifies a 2D facial image with 3D structure guidance. Specifically, we extract FAE guidance from a nearest-neighbor reference face. To allow for less change to facial structure in the FAE process, a 3D face model is recovered by referring to both the matched 2D reference face and the 2D input face, so that depth and contour guidance can be extracted from the 3D face model. The depth and contour clues then provide effective guidance to Stable Diffusion with ControlNet for FAE. Extensive experiments demonstrate that our method is superior to previous relevant methods in enhancing facial aesthetics while preserving facial identity.
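Depth-conditioned generation of the kind described is available off the shelf in the diffusers library; a minimal sketch (model IDs, file names, and prompt are illustrative assumptions, not the paper's exact setup):

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

depth_map = Image.open("face_depth.png")   # depth rendered from the 3D face model

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

result = pipe("a natural portrait photo", image=depth_map,
              num_inference_steps=30).images[0]
```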
218
Wang J, Yang H, Liu Z, Chen H. SSDDPM: A single SAR image generation method based on denoising diffusion probabilistic model. Sci Rep 2025; 15:10867. [PMID: 40157974] [PMCID: PMC11954893] [DOI: 10.1038/s41598-025-95106-7]
Abstract
The limited availability of high-quality SAR images severely affects the accuracy and robustness of target detection, classification, and segmentation. To solve this problem, a novel image generation method based on a diffusion model is introduced that requires only one training sample to generate a realistic SAR image. We propose a single-scale architecture to avoid image noise accumulation. In addition, an attention module for the sampling layer in the generator is designed to improve feature extraction. Then, an information-guided attention module is proposed to suppress redundant information. Ship targets were selected as the research objects, and the proposed method was tested on an open-source dataset. We also built our own Sentinel-1 dataset to provide a more challenging benchmark. The experimental results show that our method outperforms the classical method SinGAN: the SIFID decreases from 4.80 × 10^(-4) to 1.66 × 10^(-7), the SSIM improves from 0.07 to 0.51, and the LPIPS decreases from 0.61 to 0.23. Compared with ExSinGAN, generation diversity increases by 27.35%.
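The SSIM and LPIPS figures quoted can be computed with standard packages; a minimal sketch with stand-in images (not the study's data):

```python
import numpy as np
import torch
import lpips
from skimage.metrics import structural_similarity

real = np.random.rand(256, 256).astype(np.float32)   # stand-ins for SAR chips
fake = np.random.rand(256, 256).astype(np.float32)

ssim = structural_similarity(real, fake, data_range=1.0)

# LPIPS expects (N, 3, H, W) tensors scaled to [-1, 1].
to_t = lambda a: torch.from_numpy(a * 2 - 1).view(1, 1, 256, 256).repeat(1, 3, 1, 1)
d = lpips.LPIPS(net="alex")(to_t(real), to_t(fake)).item()  # lower = more similar
```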
Affiliation(s)
- Jinyu Wang
- Space Engineering University, Beijing, 101416, China
- Haitao Yang
- Space Engineering University, Beijing, 101416, China
- Zhengjun Liu
- School of Physics, Harbin Institute of Technology, Harbin, 150001, China
- Hang Chen
- Space Engineering University, Beijing, 101416, China
219
Wan D, Jiang X, Yu Q. Blind HDR image quality assessment based on aggregating perception and inference features. Sci Rep 2025; 15:10808. [PMID: 40155481] [PMCID: PMC11953282] [DOI: 10.1038/s41598-025-94005-1]
Abstract
High Dynamic Range (HDR) images, with their expanded range of brightness and color, provide a far more realistic and immersive viewing experience compared to Low Dynamic Range (LDR) images. However, the significant increase in peak luminance and contrast inherent in HDR images often accentuates artifacts, thus limiting the effectiveness of traditional LDR-based image quality assessment (IQA) algorithms when applied to HDR content. To address this, we propose a novel blind IQA method tailored specifically for HDR images, which incorporates both the perception and inference processes of the human visual system (HVS). Our approach begins with multi-scale Retinex decomposition to generate reflectance maps with varying sensitivity, followed by the calculation of gradient similarities from these maps to model the perception process. Deep feature maps are then extracted from the last pooling layer of a pretrained VGG16 network to capture inference characteristics. These gradient similarity maps and deep feature maps are subsequently aggregated for quality prediction using support vector regression (SVR). Experimental results demonstrate that the proposed method achieves outstanding performance, outperforming other state-of-the-art HDR IQA metrics.
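The first stage, multi-scale Retinex decomposition, produces one reflectance map per Gaussian scale; a minimal sketch (scale choices are our assumptions, not the paper's settings):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_retinex(img, sigmas=(15, 80, 250)):
    """One reflectance map log(I) - log(blur(I)) per Gaussian scale,
    each with a different sensitivity."""
    img = img.astype(np.float64) + 1.0            # avoid log(0)
    return [np.log(img) - np.log(gaussian_filter(img, s)) for s in sigmas]

reflectances = multiscale_retinex(np.random.rand(256, 256) * 255)
```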
Affiliation(s)
- Donghui Wan
- State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, 100024, China; School of Electronic Information, Huzhou College, Huzhou, 313000, China
- Xiuhua Jiang
- State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, 100024, China
- Qiangguo Yu
- School of Electronic Information, Huzhou College, Huzhou, 313000, China
220
Incekara AH, Seker DZ. Enhancing Historical Aerial Photographs: A New Approach Based on Non-Reference Metric and Photo Interpretation Elements. Sensors (Basel) 2025; 25:2126. [PMID: 40218638] [PMCID: PMC11991374] [DOI: 10.3390/s25072126]
Abstract
Deep learning-based super-resolution (SR) is an effective state-of-the-art technique for enhancing low-resolution images. This study describes a hierarchical dataset structure for enhancing grayscale historical aerial photographs with a basic SR model and relates it to a no-reference image quality metric. The dataset was structured according to the hierarchy of photo interpretation elements. Images of bare land and forestry areas were treated as the primary category, containing tone and color elements; images of residential areas as the secondary category, containing shape and size elements; and images of farmland areas as the tertiary category, containing pattern elements. Instead of training on all images in all categories at once, which an SR model with a low number of parameters has difficulty handling, each category was trained separately. Each test image, containing features of all categories, was enhanced separately by the category-specific models, yielding three enhanced versions per test image. The resulting images were divided into equal parts of 5 × 5 pixels, and the final image was created by concatenating the parts judged to be of higher quality according to their Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) values. Subsequent comparative analyses based on visual interpretation and reference-based image quality metrics showed that this approach to dataset structuring positively impacted the results.
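The patch-selection step can be sketched as follows (the BRISQUE call is a placeholder; any real implementation, such as OpenCV contrib's quality module, can be plugged in, and lower scores mean better quality):

```python
import numpy as np

def brisque_score(patch):
    """Placeholder for a real BRISQUE implementation; lower is better."""
    return float(np.var(patch))  # stand-in only, NOT the actual metric

def mosaic_best_patches(candidates, size=5):
    """Per 5x5 block, keep the candidate enhancement judged best."""
    H, W = candidates[0].shape
    out = np.empty((H, W), dtype=candidates[0].dtype)
    for i in range(0, H, size):
        for j in range(0, W, size):
            blocks = [c[i:i + size, j:j + size] for c in candidates]
            out[i:i + size, j:j + size] = min(blocks, key=brisque_score)
    return out

enhanced = [np.random.rand(100, 100) for _ in range(3)]  # per-category outputs
final = mosaic_best_patches(enhanced)
```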
Affiliation(s)
- Abdullah Harun Incekara
- Department of Geomatics Engineering, Tokat Gaziosmanpasa University, Taslicitlik, Tokat 60150, Türkiye
- Dursun Zafer Seker
- Department of Geomatics Engineering, Istanbul Technical University, Maslak, Istanbul 34469, Türkiye
221
Wu S, Xu Z, Li R, Chen S, Zhang Y, Zhang X, Chen Z, Tai R. Enhanced Imaging in Scanning Transmission X-Ray Microscopy Assisted by Ptychography. Nanomaterials (Basel) 2025; 15:496. [PMID: 40214541] [PMCID: PMC11990249] [DOI: 10.3390/nano15070496]
Abstract
Scanning transmission X-ray microscopy (STXM) is a direct imaging technique with nanoscale resolution, but its resolution is limited by the spot size on the sample, i.e., by the manufacturing technique of the focusing element. As an emerging high-resolution X-ray imaging technique, ptychography utilizes highly redundant data from overlapping scans together with phase retrieval algorithms to simultaneously reconstruct a high-resolution sample image and a probe function. In this study, we designed an accurate reconstruction strategy to obtain the probe spot with vibration effects eliminated, and developed an image enhancement technique for STXM by combining the reconstructed probe with a deconvolution algorithm. This approach significantly improves the resolution of STXM imaging and can overcome the focal-spot limit on STXM resolution when the scanning step size is near or below the spot size, while requiring much less data processing time than ptychography. Both simulations and experiments show that the approach can be applied to STXM data at different energies and different scan steps using the same focal spot retrieved via ptychography.
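Once the probe is retrieved, the enhancement amounts to deconvolving the STXM map with the probe as the point spread function; a minimal sketch using Richardson-Lucy deconvolution (our choice of algorithm for illustration, with stand-in data):

```python
import numpy as np
from skimage.restoration import richardson_lucy

stxm = np.random.rand(128, 128)   # stand-in for a measured STXM map
probe = np.ones((7, 7))           # focal spot retrieved via ptychography
probe /= probe.sum()              # normalize so it acts as a PSF

deblurred = richardson_lucy(stxm, probe, 30, clip=False)  # 30 iterations
```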
Affiliation(s)
- Shuhan Wu: Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China; Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zijian Xu: Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China; Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Ruoru Li: Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China; Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China; School of Physical Science and Technology, Shanghai Tech University, Shanghai 201210, China
- Sheng Chen: Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China; Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yingling Zhang: Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China; Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xiangzhi Zhang: Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China; Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhenhua Chen: Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China; Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Renzhong Tai: Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China; Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Physical Science and Technology, Shanghai Tech University, Shanghai 201210, China
|
222
|
Tanaka M, Ajiki H, Horiuchi T. Analysis of Physical Features Affecting Glossiness and Roughness Alteration in Image Reproduction and Image Features for Their Recovery. J Imaging 2025; 11:95. [PMID: 40278011 PMCID: PMC12028133 DOI: 10.3390/jimaging11040095]
Abstract
Digital imaging can cause an object's appearance to be perceived differently from the real object. This study first confirmed that the glossiness and roughness of reproduced images are altered by directly comparing real objects and colorimetrically reproduced images (CRIs). Psychophysical experiments comparing real objects and modulated images were then performed, and the physical features that influence these alterations were analyzed. Furthermore, we analyzed the image features needed to recover the glossiness and roughness altered by image reproduction. In total, 67 samples belonging to 11 material categories, including metals and resins, were used as stimuli. Analysis of the physical surface roughness of the real objects showed that low skewness and high kurtosis of samples were associated with alterations in glossiness and roughness, respectively. These alterations can be recovered by modulating the contrast for glossiness and the angular second moment of the gray-level co-occurrence matrix for roughness, reproducing perceptually equivalent images. These results suggest that although the glossiness and roughness of real objects and their CRIs are perceived differently, reproducing perceptually equivalent glossiness and roughness may be facilitated by measuring the physical features of real objects and reflecting them in image features.
Affiliation(s)
- Midori Tanaka: Graduate School of Informatics, Chiba University, Yayoi-cho 1-33, Inage-ku, Chiba 263-8522, Japan
- Hideyuki Ajiki: Graduate School of Science and Engineering, Chiba University, Yayoi-cho 1-33, Inage-ku, Chiba 263-8522, Japan
- Takahiko Horiuchi: Graduate School of Informatics, Chiba University, Yayoi-cho 1-33, Inage-ku, Chiba 263-8522, Japan
|
223
|
Eckardt JN, Srivastava I, Wang Z, Winter S, Schmittmann T, Riechert S, Gediga MEH, Sulaiman AS, Schneider MMK, Schulze F, Thiede C, Sockel K, Kroschinsky F, Röllig C, Bornhäuser M, Wendt K, Middeke JM. Synthetic bone marrow images augment real samples in developing acute myeloid leukemia microscopy classification models. NPJ Digit Med 2025; 8:173. [PMID: 40118991 PMCID: PMC11928482 DOI: 10.1038/s41746-025-01563-9]
Abstract
High-quality image data are essential for training deep learning (DL) classifiers, yet data sharing is often limited by privacy concerns. We hypothesized that generative adversarial networks (GANs) could synthesize bone marrow smear (BMS) images suitable for classifier training. BMS from 1251 patients with acute myeloid leukemia (AML), 51 patients with acute promyelocytic leukemia (APL), and 236 stem cell donors were digitized, and synthetic images were generated using StyleGAN2-ADA. In a blinded visual Turing test, eight hematologists achieved only 63% accuracy in identifying synthetic images, confirming high image quality. DL classifiers trained on real data achieved AUROCs of 0.99 across AML, APL, and donor classifications, with performance remaining above 0.95 even when real training data were incrementally replaced with synthetic samples. Adding synthetic data to real training data offered performance gains for an exceptionally rare disease (APL). Our study demonstrates the usability of synthetic BMS data for training highly accurate image classifiers in microscopy.
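The substitution experiment can be pictured with a short sketch; the file-name stand-ins and the training/evaluation steps below are placeholders for illustration, not the study's pipeline:

```python
import random

def mixed_training_set(real, synthetic, synth_fraction, seed=0):
    """Replace a fraction of the real training images with synthetic ones."""
    rng = random.Random(seed)
    n_synth = int(len(real) * synth_fraction)
    keep = rng.sample(real, len(real) - n_synth)
    add = rng.sample(synthetic, min(n_synth, len(synthetic)))
    return keep + add

real = [f"real_{i}.png" for i in range(1000)]       # stand-in file lists
synth = [f"synth_{i}.png" for i in range(1000)]
for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    train_set = mixed_training_set(real, synth, frac)
    # train the classifier on train_set, then report AUROC on held-out real data
```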
Affiliation(s)
- Jan-Niklas Eckardt: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
- Ishan Srivastava: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
- Zizhe Wang: Chair of Software Technology, TUD Dresden University of Technology, Dresden, Germany
- Susann Winter: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Tim Schmittmann: Chair of Software Technology, TUD Dresden University of Technology, Dresden, Germany
- Sebastian Riechert: Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany; Chair of Software Technology, TUD Dresden University of Technology, Dresden, Germany
- Miriam Eva Helena Gediga: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Anas Shekh Sulaiman: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Martin M K Schneider: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Freya Schulze: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Christian Thiede: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Katja Sockel: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Frank Kroschinsky: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Christoph Röllig: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Martin Bornhäuser: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; German Cancer Consortium (DKTK), Partner Site Dresden, and German Cancer Research Center (DKFZ), Heidelberg, Germany; National Center for Tumor Diseases Dresden (NCT/UCC), Dresden, Germany
- Karsten Wendt: Chair of Software Technology, TUD Dresden University of Technology, Dresden, Germany
- Jan Moritz Middeke: Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
|
224
|
Zheng J, Wu YC, Cai X, Phan P, Gill M, Er EE, Zhao Z, Wang ZJ, Lee SSY. Correlative multiscale 3D imaging of mouse primary and metastatic tumors by sequential light sheet and confocal fluorescence microscopy. iScience 2025; 28:111934. [PMID: 40124485 PMCID: PMC11928867 DOI: 10.1016/j.isci.2025.111934]
Abstract
Three-dimensional (3D) optical microscopy permits in situ interrogation of the tumor microenvironment (TME) in volumetric tumors. Light sheet and confocal fluorescence microscopy are often used to achieve macroscopic and microscopic 3D images of tissues, respectively. Although each technique offers distinct fields of view (FOVs) and spatial resolution, combining the two to obtain correlative multiscale 3D images from the same tumor tissues has not yet been explored. We established a workflow that enables the tracking and 3D imaging of regions of interest (ROIs) within tumor tissues through sequential light sheet and confocal fluorescence microscopy. This approach allowed quantitative 3D spatial analysis of the immune response in the TME at multiple spatial scales and facilitated the direct localization of a metastatic lesion within a mouse brain. Our method offers an approach to correlative multiscale 3D optical microscopy with the potential to provide new insights into disease mechanisms and drug response.
Affiliation(s)
- Jingtian Zheng: Department of Pharmaceutical Sciences, University of Illinois Chicago, Chicago, IL, USA
- Yi-Chien Wu: Department of Pharmaceutical Sciences, University of Illinois Chicago, Chicago, IL, USA
- Xiaoying Cai: Department of Pharmaceutical Sciences, University of Illinois Chicago, Chicago, IL, USA
- Philana Phan: Department of Pharmaceutical Sciences, University of Illinois Chicago, Chicago, IL, USA
- Meghna Gill: Department of Pharmaceutical Sciences, University of Illinois Chicago, Chicago, IL, USA
- Ekrem Emrah Er: Department of Physiology and Biophysics, University of Illinois Chicago, Chicago, IL, USA; University of Illinois Cancer Center, University of Illinois Chicago, Chicago, IL, USA
- Zongmin Zhao: Department of Pharmaceutical Sciences, University of Illinois Chicago, Chicago, IL, USA; University of Illinois Cancer Center, University of Illinois Chicago, Chicago, IL, USA
- Zaijie J. Wang: Department of Pharmaceutical Sciences, University of Illinois Chicago, Chicago, IL, USA
- Steve Seung-Young Lee: Department of Pharmaceutical Sciences, University of Illinois Chicago, Chicago, IL, USA; University of Illinois Cancer Center, University of Illinois Chicago, Chicago, IL, USA
|
225
|
Chaudhary H, Garg P, Vishwakarma VP. Enhanced medical image watermarking using hybrid DWT-HMD-SVD and Arnold scrambling. Sci Rep 2025; 15:9710. [PMID: 40113945 PMCID: PMC11926163 DOI: 10.1038/s41598-025-94080-4]
Abstract
The protection of medical images against unauthorized access and tampering is paramount. This paper presents a robust watermarking framework that integrates the Discrete Wavelet Transform (DWT), Hessenberg Decomposition (HMD), Singular Value Decomposition (SVD), and Arnold scrambling to enhance the security of medical images. By applying the DWT to decompose the medical image into frequency subbands and embedding the watermark into the most significant subband, the proposed algorithm ensures minimal impact on image quality. HMD simplifies the subband matrix, while the SVD extracts and manipulates the essential features of the image. Arnold scrambling further secures the watermark image before embedding. Experimental results on various medical image datasets demonstrate the algorithm's effectiveness: it maintains imperceptibility, with a peak signal-to-noise ratio (PSNR) of up to 49 dB, and it is robust against common image processing attacks such as compression and noise addition, with an NC value above 0.9 against most attacks. The proposed scheme thus achieves a balance between imperceptibility and robustness, making it suitable for securing medical images in digital environments.
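A minimal sketch of this style of embedding is shown below (DWT of the host, Hessenberg reduction of the LL subband, additive embedding into its singular values, Arnold scrambling of the watermark). The choice of the LL subband, the strength alpha, and the size assumptions (square host, watermark covering the subband) are illustrative, not the paper's exact algorithm:

```python
import numpy as np
import pywt
from scipy.linalg import hessenberg

def arnold(img: np.ndarray, rounds: int = 10) -> np.ndarray:
    """Arnold cat-map scrambling of a square image."""
    n = img.shape[0]
    out = img.copy()
    for _ in range(rounds):
        scrambled = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                scrambled[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = scrambled
    return out

def embed(host: np.ndarray, watermark: np.ndarray, alpha: float = 0.05):
    LL, (LH, HL, HH) = pywt.dwt2(host.astype(float), "haar")
    H, Q = hessenberg(LL, calc_q=True)        # LL = Q @ H @ Q.T
    U, S, Vt = np.linalg.svd(H)
    Sw = np.linalg.svd(arnold(watermark).astype(float), compute_uv=False)
    S_marked = S + alpha * Sw[: S.size]       # embed into singular values
    H_marked = U @ np.diag(S_marked) @ Vt
    LL_marked = Q @ H_marked @ Q.T            # undo the similarity transform
    return pywt.idwt2((LL_marked, (LH, HL, HH)), "haar")
```

Extraction would run the same transforms on the watermarked image and invert the additive step using the stored singular values.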
Affiliation(s)
- Himanshi Chaudhary: University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, Sector 16-C, Dwarka, New Delhi, India; Department of Computer Science and Engineering, KIET Group of Institutions, Delhi-NCR, Ghaziabad, India
- Preeti Garg: Department of Computer Science and Engineering, KIET Group of Institutions, Delhi-NCR, Ghaziabad, India
- Virendra P Vishwakarma: University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, Sector 16-C, Dwarka, New Delhi, India
|
226
|
Zhang Y, Yao Z, Klöfkorn R, Ritschel T, Villanueva-Perez P. 4D-ONIX for reconstructing 3D movies from sparse X-ray projections via deep learning. COMMUNICATIONS ENGINEERING 2025; 4:54. [PMID: 40119014 PMCID: PMC11928503 DOI: 10.1038/s44172-025-00390-w]
Abstract
The X-ray flux from X-ray free-electron lasers and storage rings enables new spatiotemporal opportunities for studying in-situ and operando dynamics, even with single pulses. X-ray multi-projection imaging is a technique that provides volumetric information using single pulses while avoiding the centrifugal forces induced by conventional time-resolved 3D methods such as time-resolved tomography, and it can acquire 3D movies (4D) at least three orders of magnitude faster than existing techniques. However, reconstructing 4D information from highly sparse projections remains a challenge for current algorithms. Here we present 4D-ONIX, a deep-learning-based approach that reconstructs 3D movies from an extremely limited number of projections. It combines a computational physical model of X-ray interaction with matter and state-of-the-art deep learning methods. We demonstrate its ability to reconstruct high-quality 4D information by generalizing over multiple experiments with only two to three projections per timestamp, on simulations of water droplet collisions and on experimental data of additive manufacturing. Our results establish 4D-ONIX as an enabling tool for 4D analysis, offering high-quality image reconstruction for fast dynamics three orders of magnitude faster than tomography.
Affiliation(s)
- Yuhe Zhang: Synchrotron Radiation Research and NanoLund, Lund University, Lund, Sweden
- Zisheng Yao: Synchrotron Radiation Research and NanoLund, Lund University, Lund, Sweden
- Robert Klöfkorn: Center for Mathematical Sciences, Lund University, Lund, Sweden
|
227
|
Mahmoudi SS, Alishani MM, Emdadi M, Hosseiniyan Khatibi SM, Khodaei B, Ghaffari A, Oskui SD, Ghaffari S, Pirmoradi S. X-ray Coronary Angiogram images and SYNTAX score to develop Machine-Learning algorithms for CHD Diagnosis. Sci Data 2025; 12:471. [PMID: 40118960 PMCID: PMC11928481 DOI: 10.1038/s41597-025-04727-0]
Abstract
Coronary heart disease (CHD) is becoming a leading cause of death worldwide. To assess coronary artery narrowing or stenosis, doctors use coronary angiography, which is considered the gold-standard method. Interventional cardiologists rely on angiography to decide on the best course of treatment for CHD, such as revascularization with bypass surgery, coronary stents, or medication. However, angiography has some issues, including operator bias, inter-observer variability, and poor reproducibility. Automated interpretation of coronary angiography has yet to be developed, and these tasks can currently be performed only by highly specialized physicians. Developing automated angiogram interpretation and coronary artery stenosis estimation using artificial intelligence (AI) approaches requires a large dataset of X-ray angiography images that includes clinical information. We have collected 231 X-ray images of heart vessels, along with the necessary angiographic variables, including the SYNTAX score, to support the advancement of research on CHD-related machine learning and data mining algorithms. We hope that this dataset will ultimately contribute to advances in the clinical diagnosis of CHD.
Affiliation(s)
- Seyed Sajjad Mahmoudi: Department of Cardiology, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
- Mohammad Matin Alishani: Department of Computer Science, Faculty of Information Technology, Azarbaijan Shahid Madani University, Tabriz, Iran
- Manijeh Emdadi: Department of Computer Engineering, Abadan Branch, Islamic Azad University, Abadan, Iran
- Bahareh Khodaei: Clinical Research Development Unit of Tabriz Valiasr Hospital, Tabriz University of Medical Sciences, Tabriz, Iran
- Alireza Ghaffari: Faculty of Electrical and Computer Eng., University of Tabriz, Tabriz, Iran
- Shahram Dabiri Oskui: Clinical Research Development Unit of Tabriz Valiasr Hospital, Tabriz University of Medical Sciences, Tabriz, Iran
- Samad Ghaffari: Cardiovascular Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Saeed Pirmoradi: Clinical Research Development Unit of Tabriz Valiasr Hospital, Tabriz University of Medical Sciences, Tabriz, Iran
|
228
|
Xu C, Zhang F, Yang Z, Zhou Z, Zheng Y. A few-shot network intrusion detection method based on mutual centralized learning. Sci Rep 2025; 15:9848. [PMID: 40118883 PMCID: PMC11928629 DOI: 10.1038/s41598-025-93185-0]
Abstract
Deep learning has recently made significant advancements in intrusion detection. However, existing deep learning algorithms rely heavily on extensive training data, which presents challenges when dealing with few-shot network traffic and results in low detection performance. To address the few-shot detection challenge, we propose a few-shot network intrusion detection method based on mutual centralized learning (FS-MCL). The method utilizes dense features extracted by an encoder and associates each feature with a particle in a discrete space. This association allows each particle to randomly traverse the discrete feature space, establishing bidirectional associations between disjoint dense features. By measuring the expected visits of dense features in a Markov process, we can determine the probability that a query feature belongs to a support class. To address the scarcity of available few-shot datasets in intrusion detection, we also provide a visualization method that converts network traffic into image-like data, and we use traffic data from three public datasets to construct few-shot detection datasets for evaluating the proposed method. Experimental results demonstrate that the proposed method achieves excellent binary and multi-class classification performance, with an average detection rate of up to 99.84%.
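The traffic-to-image idea can be pictured with a short sketch; the 28 × 28 size, zero padding, and raw byte mapping are assumptions of this illustration rather than the paper's exact visualization method:

```python
import numpy as np

def flow_to_image(payload: bytes, side: int = 28) -> np.ndarray:
    """Map the first side*side bytes of a flow to a grayscale square,
    zero-padding short flows and truncating long ones."""
    buf = np.frombuffer(payload[: side * side], dtype=np.uint8)
    img = np.zeros(side * side, dtype=np.uint8)
    img[: buf.size] = buf
    return img.reshape(side, side)

sample = flow_to_image(b"\x16\x03\x01" * 300)   # image-like input for the encoder
```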
Affiliation(s)
- Congyuan Xu: College of Information Science and Engineering, Jiaxing University, Jiaxing, 314001, China; School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
- Fan Zhang: School of Cyber Science and Technology, Zhejiang University, Hangzhou, 310027, China
- Ziqi Yang: School of Cyber Science and Technology, Zhejiang University, Hangzhou, 310027, China
- Zhihao Zhou: School of Cyber Science and Technology, Zhejiang University, Hangzhou, 310027, China
- Yuqi Zheng: School of Cyber Science and Technology, Zhejiang University, Hangzhou, 310027, China
|
229
|
Jóźwik-Wabik P, Popowicz A. Fully convolutional neural networks for processing observational data from small remote solar telescopes. Sci Rep 2025; 15:9630. [PMID: 40113965 PMCID: PMC11926255 DOI: 10.1038/s41598-025-93808-6]
Abstract
Heliophysics phenomena on the Sun, such as radio bursts, can strongly affect satellites and ground-based electronic systems. Therefore, insight into the actual image of the Sun with good spatial and temporal resolution is crucial. In this paper, we explore the possibility of using fully convolutional networks (FCNs) to improve images acquired from remotely operated small solar telescopes, whose resolution is limited by the size of the lens aperture and by atmospheric turbulence. For this purpose, we use chromosphere data from the 50 mm small Hα Telescope of the Silesian University of Technology, acquired over many months under various atmospheric conditions. We compare the obtained results with those of raw-data processing by a state-of-the-art deterministic algorithm, multi-frame blind deconvolution (MFBD). We investigate the impact of the amount of data and the complexity of the FCNs on the quality of the results and their processing time. We show that FCNs are a very attractive alternative to MFBD: they are more energy efficient and obtain comparable results in orders-of-magnitude shorter time.
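As a rough picture of the model family being evaluated, here is a minimal fully convolutional, residual image-to-image network; the depth and channel counts are illustrative assumptions, not the architectures from the paper:

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self, channels: int = 32, depth: int = 5):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)    # predict a residual correction to the frame

restored = TinyFCN()(torch.randn(1, 1, 256, 256))  # any size: fully convolutional
```

Being fully convolutional, such a network processes a frame in a single forward pass, which is where the energy and runtime advantage over iterative MFBD comes from.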
Affiliation(s)
- Piotr Jóźwik-Wabik: Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
- Adam Popowicz: Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
|
230
|
Zhan Y, Yu Q, Liu J, Wang Z, Yang Z. Hyperspectral remote sensing image destriping via spectral-spatial factorization. Sci Rep 2025; 15:9317. [PMID: 40102298 PMCID: PMC11920236 DOI: 10.1038/s41598-025-94396-1]
Abstract
Hyperspectral images (HSIs) are gradually playing an important role in many fields because of their ability to capture spectral information. However, sensor response differences and other factors can introduce stripe noise into HSIs, which greatly degrades image quality. To address HSI destriping, a new iterative method based on spectral-spatial factorization is proposed. We first rearrange the HSI data into a two-dimensional matrix. The original noise-free HSI is then decomposed into a spectral information matrix and a spatial information matrix. The sparsity of the stripe noise, the group sparsity of the spatial information matrix, and the smoothness of the spectral information matrix are exploited to sufficiently remove stripe noise while effectively retaining the spectral information and spatial details of the original HSI. Numerical tests on simulated datasets show that our method achieves an average PSNR gain above 4 dB and better SSIM results. The proposed method also obtains good results on real datasets polluted by Gaussian noise and stripe noise.
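The decomposition can be illustrated with a simple alternating scheme that separates a low-rank spectral-spatial approximation from a sparse stripe term. The rank, threshold, and iteration count below are assumptions, and the paper's actual priors (group sparsity and spectral smoothness) are richer than this sketch:

```python
import numpy as np

def destripe(Y: np.ndarray, rank: int = 5, lam: float = 0.1, iters: int = 50):
    """Y: HSI rearranged as a (bands x pixels) matrix.
    Returns a low-rank estimate X and a sparse stripe estimate S."""
    S = np.zeros_like(Y)
    for _ in range(iters):
        # Spectral-spatial factorization: truncated SVD of the destriped data.
        U, sv, Vt = np.linalg.svd(Y - S, full_matrices=False)
        X = (U[:, :rank] * sv[:rank]) @ Vt[:rank]
        # Soft-threshold the residual so only sparse stripe energy remains.
        R = Y - X
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return X, S
```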
Affiliation(s)
- Yapeng Zhan: College of Science, National University of Defense Technology, Changsha, 410073, China
- Qi Yu: College of Science, National University of Defense Technology, Changsha, 410073, China
- Jiying Liu: College of Science, National University of Defense Technology, Changsha, 410073, China
- Zhengming Wang: College of Science, National University of Defense Technology, Changsha, 410073, China
- Zexi Yang: College of Science, National University of Defense Technology, Changsha, 410073, China
|
231
|
Damen SLC, van Lier ALHMW, Zachiu C, Raaymakers BW. Bowel tracking for MR-guided radiotherapy: simultaneous optimization of small bowel imaging and tracking. Phys Med Biol 2025; 70:075001. [PMID: 40020314 DOI: 10.1088/1361-6560/adbbac]
Abstract
Objective. The small bowel is one of the most radiosensitive organs-at-risk during radiotherapy in the pelvis, a problem further complicated by anatomical and physiological motion. Its accurate tracking therefore becomes particularly important during therapy delivery, to obtain better dose-toxicity relations and/or to perform safe adaptive treatments. The aim of this work is to simultaneously optimize the MR imaging sequence and the motion estimation solution towards improved small bowel tracking precision during radiotherapy delivery. Approach. An MRI sequence was optimized to adhere to the respiratory and peristaltic motion frequencies by assessing the performance of an image registration algorithm on data acquired from volunteers and patients. For tracking, three registration algorithms previously employed in the scope of image-guided radiotherapy were investigated and optimized. The optimized scan was acquired for 7.5 min in 18 patients and for 15 min in 10 volunteers on a 1.5 T MR-Linac (Unity, Elekta AB). The tracking precision was evaluated and validated by means of three quality assurance criteria: the Structural Similarity Index Measure (SSIM), Inverse Consistency (IC), and the Absolute Intensity Difference. Main results. The optimal sequence was a balanced Fast Field Echo that acquired a 3D volume of the abdomen with a dynamic scan time of 1.8 s. An optical flow algorithm performed best and was able to resolve most of the motion, as shown by mean IC values of <1 mm and a mean SSIM of >0.9 for the majority of cases. A strong positive correlation (p < 0.001) was found between registration performance and visceral fat percentage: a higher visceral fat percentage gave better registration due to better image contrast. Significance. A method for the simultaneous optimization of imaging and tracking was presented, which yielded an imaging and registration procedure for accurate small bowel tracking on the MR-Linac.
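Of the three QA criteria, inverse consistency is easy to state concretely: composing a registration's forward and backward displacement fields should return every pixel to its starting point, and the mean residual is reported in millimeters. A minimal 2D sketch, assuming dense displacement fields in pixel units and bilinear resampling:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def inverse_consistency(fwd: np.ndarray, bwd: np.ndarray) -> float:
    """fwd, bwd: displacement fields of shape (2, H, W)."""
    H, W = fwd.shape[1:]
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sample the backward field at the points reached by the forward field.
    coords = np.stack([yy + fwd[0], xx + fwd[1]])
    bwd_at_fwd = np.stack([
        map_coordinates(bwd[i], coords, order=1, mode="nearest")
        for i in range(2)
    ])
    residual = fwd + bwd_at_fwd          # zero for a perfectly consistent pair
    return float(np.mean(np.linalg.norm(residual, axis=0)))
```

Multiplying the pixel residual by the voxel spacing converts the value to millimeters, matching the <1 mm criterion above.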
Affiliation(s)
- S L C Damen: Department of Radiotherapy, UMC Utrecht, Utrecht, The Netherlands
- C Zachiu: Department of Radiotherapy, UMC Utrecht, Utrecht, The Netherlands
- B W Raaymakers: Department of Radiotherapy, UMC Utrecht, Utrecht, The Netherlands
|
232
|
Liu T, Lin Y, Luo X, Sun Y, Zhao H. VISTA Uncovers Missing Gene Expression and Spatial-induced Information for Spatial Transcriptomic Data Analysis. BIORXIV: THE PREPRINT SERVER FOR BIOLOGY 2025:2024.08.26.609718. [PMID: 40166134 PMCID: PMC11957009 DOI: 10.1101/2024.08.26.609718]
Abstract
Characterizing cell activities within a spatially resolved context is essential to enhance our understanding of spatially induced cellular states and features. While single-cell RNA-seq (scRNA-seq) offers comprehensive profiling of cells within a tissue, it fails to capture spatial context. Conversely, subcellular spatial transcriptomics (SST) technologies provide high-resolution spatial profiles of gene expression, yet their utility is constrained by the limited number of genes they can simultaneously profile. To address this limitation, we introduce VISTA, a novel approach designed to predict the expression levels of unobserved genes, specifically tailored for SST data. VISTA jointly models scRNA-seq data and SST data based on variational inference and geometric deep learning, and it incorporates uncertainty quantification. Using four SST datasets, we demonstrate VISTA's superior performance in imputation and in analyzing large-scale SST datasets with satisfactory time efficiency and memory consumption. The imputation provided by VISTA enables a multitude of downstream applications, including the detection of new spatially variable genes, the discovery of novel ligand-receptor interactions, the inference of spatial RNA velocity, the generation of spatial transcriptomics data under in-silico perturbation, and an improved decomposition of spatial and intrinsic variations.
Affiliation(s)
- Tianyu Liu: Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, 06511, CT, USA
- Yingxin Lin: Department of Biostatistics, Yale University, New Haven, 06511, CT, USA
- Xiao Luo: Department of Computer Science, University of California, Los Angeles, Los Angeles, 90095, CA, USA
- Yizhou Sun: Department of Computer Science, University of California, Los Angeles, Los Angeles, 90095, CA, USA
- Hongyu Zhao: Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, 06511, CT, USA; Department of Biostatistics, Yale University, New Haven, 06511, CT, USA
|
233
|
Billah M, Bermann M, Hollifield MK, Tsuruta S, Chen CY, Psota E, Holl J, Misztal I, Lourenco D. Review: Genomic selection in the era of phenotyping based on digital images. Animal 2025:101486. [PMID: 40222869 DOI: 10.1016/j.animal.2025.101486]
Abstract
Promoting sustainable breeding programs requires several measures, including genomic selection and continuous data recording. Digital phenotyping uses images, videos, and sensor data to continuously monitor animal activity and behaviors, such as feeding, walking, and distress, while also measuring production traits like average daily gain, loin depth, and backfat thickness. Coupled with machine learning techniques, any feature of interest can be extracted and used as a phenotype in genomic prediction models. Digital phenotyping can also help define novel phenotypes that are hard or expensive for humans to measure, and for already recorded traits it may add extra precision or lower phenotyping costs. One example is lameness in pigs, where digital phenotyping has allowed moving from a categorical scoring system to a continuous phenotypic scale, resulting in increased heritability and greater selection potential. Additionally, digital phenotyping offers an effective approach for generating large datasets on difficult-to-measure behavioral traits at any given time, enabling the quantification and understanding of their relationships with production traits, which may be recorded on a less frequent basis. One example is the strong negative genetic correlation between distance traveled and average daily gain in pigs. Conversely, despite improvements in computer vision, phenotype accuracy may not yet be maximized for some production or carcass traits. In this review, we discuss various image processing techniques used to prepare data for genomic evaluation models, followed by a brief description of object detection and segmentation methodology, including model selection and objective-specific modifications to state-of-the-art models. We then present real-life applications of digital phenotyping for various species and, finally, outline remaining challenges. Overall, digital phenotyping is a promising tool to increase rates of genetic gain, promote sustainable genomic selection, and lower phenotyping costs. We foresee a massive inclusion of digital phenotypes into breeding programs, making digital phenotyping a primary phenotyping tool.
Affiliation(s)
- M Billah: University of Georgia, Department of Animal and Dairy Science, Athens, GA 30602, USA
- M Bermann: University of Georgia, Department of Animal and Dairy Science, Athens, GA 30602, USA
- M K Hollifield: University of Georgia, Department of Animal and Dairy Science, Athens, GA 30602, USA
- S Tsuruta: University of Georgia, Department of Animal and Dairy Science, Athens, GA 30602, USA
- C Y Chen: Pig Improvement Company, Hendersonville, TN 37075, USA
- E Psota: Pig Improvement Company, Hendersonville, TN 37075, USA
- J Holl: Pig Improvement Company, Hendersonville, TN 37075, USA
- I Misztal: University of Georgia, Department of Animal and Dairy Science, Athens, GA 30602, USA
- D Lourenco: University of Georgia, Department of Animal and Dairy Science, Athens, GA 30602, USA
|
234
|
Kumar A, Bhattacharjee S, Kumar A, Jayakody DNK. Facial identity recognition using StyleGAN3 inversion and improved tiny YOLOv7 model. Sci Rep 2025; 15:9102. [PMID: 40097614 PMCID: PMC11914265 DOI: 10.1038/s41598-025-93096-0]
Abstract
Facial identity recognition is one of the challenging problems in computer vision. Facial identity comprises the attributes of a person's face, ranging from age and gender to hairstyle. Manipulating facial attributes such as gender, hairstyle, expression, and makeup changes the entire facial identity of a person, which is often exploited by law offenders to commit crimes. Leveraging deep learning-based approaches, this work proposes a one-step solution for facial attribute manipulation and detection, leading to facial identity recognition in both few-shot and traditional scenarios. As a first step towards facial identity recognition, we created the Facial Attribute Manipulation (FAM) Dataset, which consists of twenty unique identities with thirty-eight facial attributes generated by StyleGAN3 inversion; it contains 11,560 images richly annotated in YOLO format. To perform facial attribute and identity detection, we developed a Spatial Transformer Block (STB) and Squeeze-Excite Spatial Pyramid Pooling (SE-SPP)-based Tiny YOLOv7 model, proposed as FIR-Tiny YOLOv7 (Facial Identity Recognition-Tiny YOLOv7). The proposed model is an improved variant of the Tiny YOLOv7 model. For facial identity recognition, it achieved 10.0% higher mAP in the one-shot scenario, 30.4% higher mAP in the three-shot scenario, 15.3% higher mAP in the five-shot scenario, and 0.1% higher mAP in the traditional 70%-30% split scenario compared to the Tiny YOLOv7 model. These results are promising for general facial identity recognition under varying facial attribute manipulation.
Affiliation(s)
- Akhil Kumar: School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India
- Ambrish Kumar: School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India
- Dushantha Nalin K Jayakody: COPELABS, Lusófona University, Lisboa, Portugal; Center of Technology and Systems (UNINOVA-CTS) and Associated Lab of Intelligent Systems (LASI), 2829-516, Caparica, Portugal; CIET/DEEE, Faculty of Engineering, Sri Lanka Institute of Information Technology, 10115, Malabe, Sri Lanka
|
235
|
Jiang L, Zhou J, Wang M, Wang J, Wu Y, Wang J. 360° display of 3D objects by cylindrical holography. OPTICS LETTERS 2025; 50:1877-1880. [PMID: 40085582 DOI: 10.1364/ol.553152]
Abstract
Cylindrical computer-generated holography (CCGH) has garnered significant attention due to its ability to offer a 360° viewing zone. However, generating a CCGH capable of reconstructing multiple realistic three-dimensional (3D) objects remains challenging under limited computational resources. In this Letter, we introduce a novel (to the best of our knowledge) CCGH generation algorithm that can reconstruct 3D objects within a 360° viewing zone under limited computational resources. The algorithm generates the CCGH in segments by calculating the light field distribution of the 3D object on the tangent plane. Numerical simulations and optical experiments validate the effectiveness of the proposed algorithm, demonstrating its superiority in reconstructing realistic 3D objects within a complete 360° viewing zone. This work represents a significant advancement in the practical application of CCGH.
|
236
|
In H, Kweon J, Moon C. Squeeze-EnGAN: Memory Efficient and Unsupervised Low-Light Image Enhancement for Intelligent Vehicles. SENSORS (BASEL, SWITZERLAND) 2025; 25:1825. [PMID: 40292935 PMCID: PMC11945755 DOI: 10.3390/s25061825]
Abstract
Intelligent vehicles, such as autonomous cars, drones, and robots, rely on sensors to gather environmental information and respond accordingly. RGB cameras are commonly used due to their low cost and high resolution but are limited in low-light conditions. While employing LiDAR or specialized cameras can address this issue, these solutions often incur high costs. Deep learning-based low-light image enhancement (LLIE) methods offer an alternative, but existing models struggle to adapt to road scenes. Furthermore, most LLIE models rely on supervised training and are heavily constrained by the lack of paired low-light and normal-light datasets; obtaining paired datasets for driving scenes is especially challenging. To address these issues, this paper proposes Squeeze-EnGAN, a memory-efficient, GAN-based LLIE method capable of unsupervised learning without paired image datasets. Squeeze-EnGAN incorporates a fire module into a U-net architecture, substantially reducing the number of parameters and Multiply-Accumulate Operations (MACs) compared to its base model, EnlightenGAN. Squeeze-EnGAN also achieves real-time performance on devices like the Jetson Xavier (0.061 s). Notably, the enhanced images improve object detection performance over the original images, demonstrating the model's potential to aid high-level vision tasks in intelligent vehicles.
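The fire module referenced above comes from SqueezeNet: a 1 x 1 squeeze convolution followed by parallel 1 x 1 and 3 x 3 expansions. A minimal PyTorch sketch with illustrative channel counts (not the exact Squeeze-EnGAN configuration):

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, c_in: int, squeeze: int, expand: int):
        super().__init__()
        self.squeeze = nn.Conv2d(c_in, squeeze, kernel_size=1)  # bottleneck
        self.expand1 = nn.Conv2d(squeeze, expand, kernel_size=1)
        self.expand3 = nn.Conv2d(squeeze, expand, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.act(self.squeeze(x))
        # Concatenating cheap 1x1 and 3x3 expansions preserves the receptive
        # field at a fraction of the parameters of one wide 3x3 convolution.
        return torch.cat([self.act(self.expand1(s)),
                          self.act(self.expand3(s))], dim=1)

y = Fire(64, squeeze=16, expand=32)(torch.randn(1, 64, 128, 128))  # 64 channels out
```

Swapping such modules into a U-net generator is what drives the parameter and MAC reductions reported above.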
Affiliation(s)
- Haegyo In: Department of Smart Vehicle Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Juhum Kweon: Graduate School of Future Defense Technology Convergence, Konkuk University, Seoul 05029, Republic of Korea
- Changjoo Moon: Department of Smart Vehicle Engineering, Konkuk University, Seoul 05029, Republic of Korea
|
237
|
Shoura M, Walther DB, Nestor A. Unraveling other-race face perception with GAN-based image reconstruction. Behav Res Methods 2025; 57:115. [PMID: 40087201 DOI: 10.3758/s13428-025-02636-z]
Abstract
The other-race effect (ORE) is the disadvantage in recognizing faces of a race other than one's own. While its prevalence is well documented behaviorally, the representational basis of the ORE remains unclear. This study employs StyleGAN2, a deep learning technique for generating photorealistic images, to uncover face representations and investigate the representational basis of the ORE. To this end, we collected pairwise visual similarity ratings for same- and other-race faces from East Asian and White participants exhibiting robust levels of ORE. Leveraging the significant overlap in representational similarity between the GAN's latent space and perceptual representations in human participants, we designed an image reconstruction approach aiming to reveal internal face representations from behavioral similarity data. This methodology yielded hyper-realistic depictions of face percepts, with reconstruction accuracy well above chance, as well as an accuracy advantage for same-race over other-race reconstructions, mirroring the ORE in both populations. Further, a comparison of reconstructions across participant race revealed a novel age bias: other-race face reconstructions appeared younger than their same-race counterparts. Our work thus proposes a new approach to exploiting the utility of GANs in image reconstruction and provides new avenues for the study of the ORE.
Affiliation(s)
- Moaz Shoura: Department of Psychology at Scarborough, University of Toronto, 1265 Military Trail, Scarborough, ON, M1C 1A4, Canada
- Dirk B Walther: Department of Psychology, University of Toronto, Toronto, Canada
- Adrian Nestor: Department of Psychology at Scarborough, University of Toronto, 1265 Military Trail, Scarborough, ON, M1C 1A4, Canada
|
238
|
Ibrahim S, Selim S, Elattar M. Facilitating Radiograph Interpretation: Refined Generative Models for Precise Bone Suppression in Chest X-rays. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025. [PMID: 40082331 DOI: 10.1007/s10278-025-01461-2]
Abstract
Chest X-ray (CXR) is crucial for diagnosing lung diseases, especially lung nodules. Recent studies indicate that bones, such as ribs and clavicles, obscure 82 to 95% of undiagnosed lung cancers. The development of computer-aided detection (CAD) systems with automated bone suppression is vital to improve detection rates and support early clinical decision-making. Current bone suppression methods face challenges: they often depend on manual subtraction of bone-only images from CXRs, leading to inefficiency and poor generalization; there is significant information loss in data compression within deep convolutional end-to-end architectures; and a balance between model efficiency and accuracy has not been sufficiently achieved in existing research. We introduce a novel end-to-end architecture, the mask-guided model, to address these challenges. Leveraging the Pix2Pix framework, our model enhances computational efficiency by reducing parameter count by 92.5%. It features a rib mask-guided module with a mask encoder and cross-attention mechanism, which provides spatial constraints, reduces information loss during encoder compression, and preserves non-relevant areas. An ablation study evaluates the impact of various factors. The model undergoes initial training on digitally reconstructed radiographs (DRRs) derived from CT projections for bone suppression and is fine-tuned on the JSRT dataset to accelerate convergence. The mask-guided model surpasses previous state-of-the-art methods, showing superior bone suppression performance in terms of structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), and processing speed. It achieves an SSIM of 0.99 ± 0.002 and a PSNR of 36.14 ± 1.13 on the JSRT dataset. This study underscores the proposed model's effectiveness compared to existing methods, showcasing its capability to reduce model size and increase accuracy. This makes it well-suited for deployment in affordable, low-power hardware devices across various clinical settings.
Affiliation(s)
- Samar Ibrahim: Medical Imaging and Image Processing Research Group, Center for Informatics Science (CIS), Nile University, 26th of July Corridor, Sheikh Zayed City, Giza, 12588, Egypt
- Sahar Selim: Medical Imaging and Image Processing Research Group, Center for Informatics Science (CIS), Nile University, 26th of July Corridor, Sheikh Zayed City, Giza, 12588, Egypt; School of Information Technology and Computer Science, Nile University, 26th of July Corridor, Sheikh Zayed City, Giza, 12588, Egypt
- Mustafa Elattar: Medical Imaging and Image Processing Research Group, Center for Informatics Science (CIS), Nile University, 26th of July Corridor, Sheikh Zayed City, Giza, 12588, Egypt; School of Information Technology and Computer Science, Nile University, 26th of July Corridor, Sheikh Zayed City, Giza, 12588, Egypt
|
239
|
Wang J, Hao Y, Bai H, Yan L. Parallel attention recursive generalization transformer for image super-resolution. Sci Rep 2025; 15:8669. [PMID: 40082517 PMCID: PMC11906870 DOI: 10.1038/s41598-025-92377-y]
Abstract
Transformer architectures have demonstrated remarkable performance in image super-resolution (SR). However, existing Transformer-based models generally suffer from insufficient local feature modeling, weak feature representation capabilities, and suboptimal loss function design, especially when reconstructing high-resolution (HR) images, where the restoration of fine details is poor. To address these issues, we propose a novel SR model, the Parallel Attention Recursive Generalization Transformer (PARGT), which can effectively capture fine-grained interactions between local features of the image and other regions, resulting in clearer and more coherent generated details. Specifically, we introduce the Parallel Local Self-attention (PL-SA) module, which enhances local features by parallelizing the Shift Window Pixel Attention Module (SWPAM) and the Channel-Spatial Shuffle Attention Module (CSSAM). In addition, we introduce a new type of feed-forward network, the Spatial Fusion Convolution Feed-forward Network (SFCFFN), for multi-scale information fusion. Finally, we optimize the reconstruction of high-frequency details by incorporating a Stationary Wavelet Transform (SWT). Experimental results on several challenging benchmark datasets demonstrate the superiority of our PARGT over state-of-the-art image SR models, showcasing the effectiveness of combining a parallel attention mechanism with a multi-scale feed-forward network for SR tasks. The code will be available at https://github.com/hgzbn/PARGT.
Affiliation(s)
- Jing Wang: School of Computer Science, Hubei University of Technology, Wuhan, 430068, China; Key Laboratory of Green Intelligent Computing Network in Hubei Province, Wuhan, China
- Yuanyuan Hao: School of Computer Science, Hubei University of Technology, Wuhan, 430068, China; Key Laboratory of Green Intelligent Computing Network in Hubei Province, Wuhan, China
- Hongxing Bai: Hubei Galaxis Tongda Technology Co., Ltd., Wuhan, China
- Lingyu Yan: School of Computer Science, Hubei University of Technology, Wuhan, 430068, China; Key Laboratory of Green Intelligent Computing Network in Hubei Province, Wuhan, China
|
240
|
Jin Y, Sun Y, Liang J, Yan X, Hou Z, Zheng S, Chen Y, Chen X. A hybrid framework for curve estimation based low light image enhancement. Sci Rep 2025; 15:8611. [PMID: 40074857 PMCID: PMC11903685 DOI: 10.1038/s41598-025-92161-y]
Abstract
Images captured in low-light conditions often suffer from poor visibility and noise corruption. Low-light image enhancement (LLIE) aims to restore the brightness of under-exposed images. However, most previous LLIE solutions enhance low-light images via global mapping without considering the varied degradations of dark regions. Moreover, these methods rely on convolutional neural networks for training, which are limited in capturing long-range dependencies. To this end, we construct a hybrid framework dubbed HybLLIE that combines transformer and convolutional designs for the LLIE task. First, we propose a light-aware transformer (LAFormer) block that utilizes brightness representations to direct the modeling of valuable information in low-light regions; this is achieved by a learnable feature reassignment modulator that encourages inter-channel feature competition. Second, we introduce a SeqNeXt block, a ConvNet-based model that processes sequences of image patches, to capture local context. Third, we devise an efficient self-supervised mechanism to eliminate inappropriate features from the given under-exposed samples and employ high-order curves to brighten the low-light images. Extensive experiments demonstrate that our HybLLIE achieves performance comparable to 17 state-of-the-art methods on 7 representative datasets.
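The high-order curve adjustment in the third step can be made concrete with the iterative quadratic curve popularized by curve-estimation LLIE methods such as Zero-DCE; linking HybLLIE to this exact form is an assumption, and the per-pixel alpha maps below are placeholders for the network's predictions:

```python
import numpy as np

def apply_curves(img: np.ndarray, alphas: list) -> np.ndarray:
    """img in [0, 1]; each alpha map in [-1, 1] with the same shape as img."""
    out = img
    for a in alphas:
        out = out + a * out * (1.0 - out)   # one high-order curve iteration
    return np.clip(out, 0.0, 1.0)

low = np.random.rand(256, 256, 3) * 0.2                # a dark input
alphas = [np.full_like(low, 0.8) for _ in range(8)]    # assumed curve maps
bright = apply_curves(low, alphas)
```

Iterating the quadratic map yields a monotone, pixel-wise brightening curve whose order grows with the number of iterations.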
Affiliation(s)
- Yutao Jin: Tianjin University of Science and Technology, Tianjin, 300222, China
- Yue Sun: Witeyesz Co., Ltd., Shenzhen, 518131, Guangdong, China
- Jiabao Liang: Tianjin University of Science and Technology, Tianjin, 300222, China
- Xiaoning Yan: Witeyesz Co., Ltd., Shenzhen, 518131, Guangdong, China
- Zeyao Hou: Tianjin University of Science and Technology, Tianjin, 300222, China
- Yang Chen: Tianjin University of Science and Technology, Tianjin, 300222, China
- Xiaoyan Chen: Tianjin University of Science and Technology, Tianjin, 300222, China
|
241
|
Fan S, Li M, Huang C, Deng X, Li H. Metal artifacts correction based on a physics-informed nonlinear sinogram completion model. Phys Med Biol 2025; 70:065010. [PMID: 40010007 DOI: 10.1088/1361-6560/adbaad]
Abstract
Objective. Metal artifacts seriously deteriorate CT image quality. Current metal artifact reduction (MAR) methods suffer from insufficient correction or easily introduce secondary artifacts. To better suppress metal artifacts, we propose a sinogram completion approach that extracts and utilizes the useful information contained in the corrupted metal-trace projections. Approach. Our method mainly contains two stages: sinogram interpolation using an improved normalization technique for initial correction, and physics-informed nonlinear sinogram decomposition for further improvement. In the first stage, unlike the popular normalized metal artifact reduction method, we propose a more meaningful normalization scheme for the interpolation procedure. In the second stage, instead of performing a linear sinogram decomposition as in the physics-informed sinogram completion method, we introduce a nonlinear decomposition model that can accurately separate the sinogram into metal and non-metal contributions by better modeling the physical scanning process. The interpolated sinogram and the physics-informed correction compensate each other to reach optimal correction results. Main results. Experimental results on simulated and real data indicate that, in terms of both structure preservation and detail recovery, the proposed physics-informed nonlinear sinogram completion method achieves very competitive MAR performance compared to existing methods. Significance. To our knowledge, this is the first time a nonlinear sinogram decomposition model has been proposed in the literature for metal artifact correction. It may motivate further research exploring this idea for various sinogram processing tasks.
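The first stage builds on normalized interpolation, which can be sketched compactly: divide the measured sinogram by a prior sinogram so anatomy flattens out, interpolate across the metal trace, then re-multiply. The per-row linear interpolation below is an assumption of this illustration, not the paper's improved normalization scheme:

```python
import numpy as np

def normalized_interpolation(sino, prior_sino, metal_trace, eps=1e-6):
    """sino, prior_sino: (angles, detectors); metal_trace: boolean mask."""
    norm = sino / (prior_sino + eps)        # flatten anatomy before filling
    filled = norm.copy()
    for i in range(norm.shape[0]):
        row, mask = norm[i], metal_trace[i]
        if mask.any() and not mask.all():
            idx = np.arange(row.size)
            filled[i, mask] = np.interp(idx[mask], idx[~mask], row[~mask])
    return filled * (prior_sino + eps)      # undo the normalization
```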
Affiliation(s)
- Shuqiong Fan: School of Mathematical Sciences, Capital Normal University, Beijing 100048, People's Republic of China; Beijing Higher Institution Engineering Research Center of Testing and Imaging, Beijing 100048, People's Republic of China
- Mengfei Li: Division of Ionization Radiation, National Institute of Metrology, Beijing 100029, People's Republic of China
- Chuwen Huang: School of Mathematical Sciences, Capital Normal University, Beijing 100048, People's Republic of China; Beijing Higher Institution Engineering Research Center of Testing and Imaging, Beijing 100048, People's Republic of China
- Xiaojuan Deng: School of Computer Science, Beijing Information Science and Technology University, Beijing 102206, People's Republic of China
- Hongwei Li: School of Mathematical Sciences, Capital Normal University, Beijing 100048, People's Republic of China; Beijing Higher Institution Engineering Research Center of Testing and Imaging, Beijing 100048, People's Republic of China
|
242
|
Fan H, Jin C, Li M. AGASI: A Generative Adversarial Network-Based Approach to Strengthening Adversarial Image Steganography. ENTROPY (BASEL, SWITZERLAND) 2025; 27:282. [PMID: 40149206 PMCID: PMC11940965 DOI: 10.3390/e27030282]
Abstract
Steganography has been widely used in the field of image privacy protection. However, with the advancement of steganalysis techniques, deep learning-based models are now capable of accurately detecting modifications in stego-images, posing a significant threat to traditional steganography. To address this, we propose AGASI, a GAN-based approach for strengthening adversarial image steganography. This method employs an encoder as the generator, in conjunction with a discriminator, to form a generative adversarial network (GAN), thereby enhancing the robustness of stego-images against steganalysis tools. Additionally, the GAN framework reduces the gap between the original secret image and the extracted image, while the decoder effectively extracts the secret image from the stego-image, achieving the goal of image privacy protection. Experimental results demonstrate that the AGASI method not only ensures high-quality secret images but also effectively reduces the accuracy of neural network classifiers, inducing misclassifications and significantly increasing the embedding capacity of the steganography system. For instance, under a PGD attack at higher perturbation levels, the adversarial stego-images generated by the GAN maintain the quality of the secret image while achieving an 84.73% misclassification rate against neural network detection. Compared to images of the same visual quality, our method increased the misclassification rate by 23.31%.
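For readers unfamiliar with the setup, the following PyTorch sketch shows the generic encoder/decoder-plus-discriminator training pattern the abstract describes. The tiny networks, loss weights, and random tensors are illustrative stand-ins, not AGASI's architecture or training protocol.

# Minimal sketch: encoder hides a secret image in a cover image,
# a decoder recovers it, and a critic is trained adversarially.
import torch
import torch.nn as nn

def conv_net(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1), nn.Sigmoid())

encoder = conv_net(6, 3)      # cover + secret (RGB each) -> stego
decoder = conv_net(3, 3)      # stego -> recovered secret
critic  = nn.Sequential(conv_net(3, 1), nn.Flatten(), nn.LazyLinear(1))

bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()
opt_g = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), 1e-4)
opt_d = torch.optim.Adam(critic.parameters(), 1e-4)

cover, secret = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)

# Discriminator step: separate covers from stego-images.
stego = encoder(torch.cat([cover, secret], dim=1))
d_loss = bce(critic(cover), torch.ones(4, 1)) + \
         bce(critic(stego.detach()), torch.zeros(4, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the critic, keep the stego close to the cover,
# and make the recovered secret close to the original.
g_loss = bce(critic(stego), torch.ones(4, 1)) + \
         mse(stego, cover) + mse(decoder(stego), secret)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()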
Collapse
Affiliation(s)
- Haiju Fan
- College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China; (H.F.); (M.L.)
- Henan Provincial Key Laboratory of Educational Artificial Intelligence and Personalized Learning, Xinxiang 453007, China
| | - Changyuan Jin
- College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China; (H.F.); (M.L.)
- Henan Provincial Key Laboratory of Educational Artificial Intelligence and Personalized Learning, Xinxiang 453007, China
| | - Ming Li
- College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China; (H.F.); (M.L.)
- Henan Provincial Key Laboratory of Educational Artificial Intelligence and Personalized Learning, Xinxiang 453007, China
| |
Collapse
|
243
|
Cao B, Qi G, Zhao J, Zhu P, Hu Q, Gao X. RTF: Recursive TransFusion for Multi-Modal Image Synthesis. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2025; 34:1573-1587. [PMID: 40031796 DOI: 10.1109/tip.2025.3541877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Multi-modal image synthesis is crucial for obtaining complete modalities given the imaging restrictions encountered in practice. Current methods, primarily CNN-based models, find it challenging to extract global representations because of their local inductive bias, leading to structural deformation or color distortion in the synthesized images. Despite the significant global representation ability of transformers in capturing long-range dependencies, their huge parameter counts require considerable training data. Multi-modal synthesis based solely on one of the two structures makes it hard to extract comprehensive information from each modality with limited data. To tackle this dilemma, we propose a simple yet effective Recursive TransFusion (RTF) framework for multi-modal image synthesis. Specifically, we develop a TransFusion unit that integrates local knowledge extracted from each individual modality by connecting a CNN-based local representation block (LRB) and a transformer-based global fusion block (GFB) via a feature translating gate (FTG). Considering the numerous parameters introduced by the transformer, we further unfold the TransFusion unit repeatedly under a recursive constraint, forming the recursive TransFusion (RTF) architecture, which progressively extracts multi-modal information at different depths. Our RTF remarkably reduces network parameters while maintaining superior performance. Extensive experiments validate our superiority over competing methods on multiple benchmarks. The source code will be available at https://github.com/guoliangq/RTF.
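The parameter-saving idea, unfolding one unit recursively with shared weights, can be illustrated with a short PyTorch sketch. The block internals below are simple stand-ins for the LRB/GFB/FTG, not the paper's actual components.

# Sketch of recursion with weight sharing: one fusion unit is applied
# repeatedly, so effective depth grows without adding parameters.
import torch
import torch.nn as nn

class FusionUnit(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.local_branch = nn.Conv2d(dim, dim, 3, padding=1)   # CNN-style local block
        self.global_branch = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Conv2d(2 * dim, dim, 1)                  # gate combining both branches

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.local_branch(x)
        tokens = x.flatten(2).transpose(1, 2)                   # (b, h*w, c)
        glob, _ = self.global_branch(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return x + self.gate(torch.cat([local, glob], dim=1))

class RecursiveFusion(nn.Module):
    def __init__(self, dim, steps=4):
        super().__init__()
        self.unit, self.steps = FusionUnit(dim), steps

    def forward(self, x):
        for _ in range(self.steps):                             # same weights at every depth
            x = self.unit(x)
        return x

y = RecursiveFusion(dim=32)(torch.rand(2, 32, 16, 16))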
Collapse
|
244
|
Shoura M, Liang YZ, Sama MA, De A, Nestor A. Revealing the neural representations underlying other-race face perception. Front Hum Neurosci 2025; 19:1543840. [PMID: 40110535 PMCID: PMC11920127 DOI: 10.3389/fnhum.2025.1543840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2024] [Accepted: 02/17/2025] [Indexed: 03/22/2025] Open
Abstract
The other-race effect (ORE) refers to poorer recognition of faces of other races than one's own. This study investigates the neural and representational basis of the ORE in East Asian and White participants using behavioral measures, neural decoding, and image reconstruction based on electroencephalography (EEG) data. Our investigation identifies a reliable neural counterpart of the ORE, with reduced decoding accuracy for other-race faces, and relates this result to the higher density of other-race face representations in face space. We then characterize the temporal dynamics of the ORE and its prominence across individuals at the neural level. Importantly, we use a data-driven image reconstruction approach to reveal the visual biases underlying other-race face perception, including a tendency to perceive other-race faces as more typical, younger, and more expressive. These findings provide neural evidence for a classical account of the ORE invoking face-space compression for other-race faces. Further, they indicate that the ORE involves not only reduced identity information but also broader, systematic distortions in visual representation, with considerable cognitive and social implications.
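As a schematic of the neural decoding component, the sketch below runs cross-validated pairwise classification on synthetic "EEG" patterns; in the study's setting, lower pairwise accuracy for other-race faces would correspond to the reduced decoding reported above. The data and dimensions here are invented for illustration.

# Illustrative pairwise identity decoding with linear SVMs on
# synthetic feature patterns (stand-ins for EEG responses).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_features = 40, 128          # trials per identity, feature count

def pairwise_accuracy(pattern_a, pattern_b, noise):
    X = np.vstack([pattern_a + noise * rng.standard_normal((n_trials, n_features)),
                   pattern_b + noise * rng.standard_normal((n_trials, n_features))])
    y = np.repeat([0, 1], n_trials)
    return cross_val_score(LinearSVC(dual=False), X, y, cv=5).mean()

a, b = rng.standard_normal(n_features), rng.standard_normal(n_features)
print("decoding accuracy:", pairwise_accuracy(a, b, noise=4.0))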
Collapse
Affiliation(s)
- Moaz Shoura
- Department of Psychology at Scarborough, University of Toronto, Toronto, ON, Canada
| | - Yong Z Liang
- Department of Psychology at Scarborough, University of Toronto, Toronto, ON, Canada
| | - Marco A Sama
- Department of Psychology at Scarborough, University of Toronto, Toronto, ON, Canada
| | - Arijit De
- Department of Psychology at Scarborough, University of Toronto, Toronto, ON, Canada
| | - Adrian Nestor
- Department of Psychology at Scarborough, University of Toronto, Toronto, ON, Canada
| |
Collapse
|
245
|
Zhou Y, Liu Y, Shao Y, Chen J. Fine-tuning diffusion model to generate new kite designs for the revitalization and innovation of intangible cultural heritage. Sci Rep 2025; 15:7519. [PMID: 40032964 DOI: 10.1038/s41598-025-92225-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2024] [Accepted: 02/26/2025] [Indexed: 03/05/2025] Open
Abstract
Traditional kite creation often relies on the hand-painting of experienced artisans, which limits the revitalization and innovation of this intangible cultural heritage. This study proposes using an AI-based diffusion model to learn kite design and generate new kite patterns, thereby promoting the revitalization and innovation of kite-making craftsmanship. Specifically, to address the lack of training data, this study collected ancient kite drawings and physical kites to create a Traditional Kite Style Patterns Dataset. The study then introduces a novel loss function that incorporates auspicious themes into style and motif composition, and fine-tunes the diffusion model on the newly created dataset. The trained model can produce batches of kite designs from input text descriptions, incorporating specified auspicious themes, style patterns, and varied motif compositions, all of which are easily modifiable. Experiments demonstrate that the proposed AI-generated kite designs can substitute for traditional hand-painted creation. This approach highlights a new application of AI technology in kite creation. Additionally, the method can be applied to other areas of cultural heritage preservation, offering a new technical pathway for the revitalization and innovation of intangible cultural heritage and opening new directions for future research on the integration of AI and cultural heritage.
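The fine-tuning recipe, a standard denoising objective plus a theme-aware penalty, can be sketched as follows in PyTorch. Here unet and style_score are hypothetical placeholders; the paper's actual loss formulation is not reproduced.

# Schematic DDPM training step with an extra style/theme term folded
# into the standard noise-prediction loss.
import torch
import torch.nn.functional as F

def finetune_step(unet, style_score, x0, t, alphas_cumprod, lam=0.1):
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise          # forward diffusion q(x_t | x_0)
    pred = unet(xt, t)                                   # predict the injected noise
    denoise_loss = F.mse_loss(pred, noise)
    x0_hat = (xt - (1 - a).sqrt() * pred) / a.sqrt()     # one-step estimate of x_0
    return denoise_loss + lam * style_score(x0_hat)      # auspicious-theme/style penalty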
Collapse
Affiliation(s)
- Yaqin Zhou
- Technical College for the Deaf, Tianjin University of Technology, Tianjin, 300384, China
| | - Yu Liu
- School of Art and Design, Tianjin University of Technology, Tianjin, 300384, China
| | - Yuxin Shao
- School of Art and Design, Tianjin University of Technology, Tianjin, 300384, China
| | - Junming Chen
- School of Art and Design, Guangzhou University, Guangzhou, 510006, China.
- Faculty of Humanities and Arts, Macau University of Science and Technology, Taipa, 999078, China.
| |
Collapse
|
246
|
Ra H, Jee D, Han S, Lee SH, Kwon JW, Jung Y, Baek J. Prediction of short-term anatomic prognosis for central serous chorioretinopathy using a generative adversarial network. Graefes Arch Clin Exp Ophthalmol 2025:10.1007/s00417-025-06786-w. [PMID: 40032768 DOI: 10.1007/s00417-025-06786-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2024] [Revised: 01/31/2025] [Accepted: 02/19/2025] [Indexed: 03/05/2025] Open
Abstract
PURPOSE To train generative adversarial network (GAN) models to generate predictive optical coherence tomography (OCT) images of central serous chorioretinopathy (CSC) at 3 months after observation using multi-modal OCT images. METHODS Four hundred forty CSC eyes of 440 patients who underwent Cirrus OCT imaging were included. Baseline OCT B-scan images through the foveal center, en face choroid, and en face ellipsoid zone were collected from each patient. The datasets were divided into training and validation (n = 390) and test (n = 50) sets. The input images for each model comprised either the baseline B-scan alone or a combination of the en face choroid and ellipsoid zone images. Predictive post-treatment OCT B-scan images were generated using the GAN models and compared with the real 3-month images. RESULTS Of the 50 generated OCT images, 48, 47, and 48 were acceptable for UNIT, CycleGAN, and RegGAN, respectively. In comparison with the real 3-month images, the generated images showed sensitivity, specificity, and positive predictive values (PPV) for residual fluid in the ranges of 0.762-1.000, 0.483-0.724, and 0.583-0.704; for pigment epithelial detachment (PED), 0.917-1.000, 0.974-1.000, and 0.917-1.000; and for subretinal hyperreflective material (SHRM), 0.667-0.778, 0.925-0.950, and 0.700-0.750, respectively. RegGAN exhibited the highest values except for sensitivity. CONCLUSIONS GAN models could generate prognostic OCT images with good performance in predicting the presence of residual fluid, PED, and SHRM in CSC. Implementation of these models may help predict disease activity in CSC, facilitating the establishment of a proper treatment plan.
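The reported per-finding agreement metrics can be computed from boolean gradings of generated versus real images, as in this small sketch (the gradings below are synthetic, not the study's data).

# Sensitivity, specificity, and PPV for presence of a finding
# (e.g., residual fluid) in generated vs. real 3-month images.
import numpy as np

def sens_spec_ppv(pred, truth):
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = (pred & truth).sum(); fp = (pred & ~truth).sum()
    tn = (~pred & ~truth).sum(); fn = (~pred & truth).sum()
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp)

# e.g., residual fluid flagged in generated vs. real images for 10 eyes
sens, spec, ppv = sens_spec_ppv([1, 1, 0, 1, 0, 0, 1, 0, 1, 0],
                                [1, 1, 0, 0, 0, 0, 1, 0, 1, 1])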
Collapse
Affiliation(s)
- Ho Ra
- Department of Ophthalmology, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Bucheon, Gyeonggi-Do, Republic of Korea
- Department of Ophthalmology, The Catholic University of Korea, Seoul, Republic of Korea
| | - Donghyun Jee
- Department of Ophthalmology, The Catholic University of Korea, Seoul, Republic of Korea
- Department of Ophthalmology, St. Vincent Hospital, College of Medicine, The Catholic University of Korea, Suwon, Gyeonggi-Do, Republic of Korea
| | - Suyeon Han
- Department of Ophthalmology, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Bucheon, Gyeonggi-Do, Republic of Korea
| | - Seung-Hoon Lee
- Department of Ophthalmology, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Bucheon, Gyeonggi-Do, Republic of Korea
| | - Jin-Woo Kwon
- Department of Ophthalmology, The Catholic University of Korea, Seoul, Republic of Korea
- Department of Ophthalmology, St. Vincent Hospital, College of Medicine, The Catholic University of Korea, Suwon, Gyeonggi-Do, Republic of Korea
| | - Yunhea Jung
- Department of Ophthalmology, The Catholic University of Korea, Seoul, Republic of Korea
- Department of Ophthalmology, Yeoui-Do St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
| | - Jiwon Baek
- Department of Ophthalmology, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Bucheon, Gyeonggi-Do, Republic of Korea.
- Department of Ophthalmology, The Catholic University of Korea, Seoul, Republic of Korea.
| |
Collapse
|
247
|
Guo Z, Zhao Z. Hybrid attention structure preserving network for reconstruction of under-sampled OCT images. Sci Rep 2025; 15:7405. [PMID: 40032840 DOI: 10.1038/s41598-024-82812-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2024] [Accepted: 12/09/2024] [Indexed: 03/05/2025] Open
Abstract
Optical coherence tomography (OCT) is a non-invasive, high-resolution imaging technology that provides cross-sectional images of tissues. Dense acquisition of A-scans along the fast axis is required to obtain images of high digital resolution. However, dense acquisition increases the acquisition time, causing patient discomfort. In addition, the longer acquisition time may lead to motion artifacts, thereby reducing imaging quality. In this work, we propose a hybrid attention structure preserving network (HASPN) that super-resolves under-sampled OCT images to speed up acquisition. It utilizes adaptive dilated convolution-based channel attention (ADCCA) and enhanced spatial attention (ESA) to better capture the channel and spatial information of features. Moreover, convolutional neural networks (CNNs) are more sensitive to low-frequency than to high-frequency information, which may limit their performance in reconstructing fine structures. To address this problem, we introduce an additional textures & details branch that uses high-frequency decomposition images to better super-resolve retinal structures. The superiority of our method was demonstrated by qualitative and quantitative comparisons with mainstream methods. Furthermore, HASPN was applied to three out-of-distribution datasets, validating its strong generalization capability.
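The textures & details branch relies on a high-frequency decomposition of the input. A minimal version of such a decomposition, assuming a simple Gaussian low-pass (not necessarily the filter used in HASPN), looks like this:

# Split an image into a smooth base and a high-frequency residual;
# the residual carries the edges/texture a detail branch would use.
import numpy as np
from scipy.ndimage import gaussian_filter

def frequency_split(img, sigma=2.0):
    low = gaussian_filter(img, sigma)   # smooth base for the main branch
    high = img - low                    # fine structure for the detail branch
    return low, high

oct_bscan = np.random.rand(256, 256).astype(np.float32)
low, high = frequency_split(oct_bscan)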
Collapse
Affiliation(s)
- Zezhao Guo
- College of Information and Engineering, Hebei GEO University, Hebei, China
| | - Zhanfang Zhao
- College of Information and Engineering, Hebei GEO University, Hebei, China.
| |
Collapse
|
248
|
Hu T, Nan X, Zhou X, Shen Y, Zhou Q. A dual-stream feature decomposition network with weight transformation for multi-modality image fusion. Sci Rep 2025; 15:7467. [PMID: 40032937 DOI: 10.1038/s41598-025-92054-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2024] [Accepted: 02/25/2025] [Indexed: 03/05/2025] Open
Abstract
As an image enhancement technology, multi-modal image fusion aims primarily to retain the salient information of multi-source image pairs in a single image, generating imaging information that contains complementary features and can facilitate downstream visual tasks. However, dual-stream methods built on convolutional neural networks (CNNs) as backbones have limited receptive fields, whereas Transformer-based methods are time-consuming, and both lack the exploration of cross-domain information. This study proposes an innovative image fusion model designed for multi-modal images, encompassing infrared-visible image pairs and multi-source medical images. Our model leverages the strengths of both Transformers and CNNs to model various feature types effectively, addressing both short- and long-range learning as well as the extraction of low- and high-frequency features. First, our shared encoder is built on Transformers for long-range learning and includes an intra-modal feature extraction block, an inter-modal feature extraction block, and a novel feature alignment block that handles slight misalignments. Our private encoder for extracting low- and high-frequency features employs a CNN-based dual-stream architecture that includes a dual-domain selection mechanism and an invertible neural network. Second, we develop a cross-attention-based Swin Transformer block to explore cross-domain information; in particular, we embed a weight transformation into the Transformer block to enhance efficiency. Third, a unified loss function incorporating a dynamic weighting factor is formulated to capture the inherent commonalities of multi-modal images. A comprehensive qualitative and quantitative analysis of image fusion and object detection experiments demonstrates that the proposed method effectively preserves thermal targets and background texture details, surpassing state-of-the-art alternatives in achieving high-quality image fusion and improving performance in subsequent visual tasks.
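The cross-domain fusion step can be pictured as cross-attention in which one modality supplies queries and the other supplies keys and values. The PyTorch sketch below shows that pattern in a plain Transformer block, without the paper's Swin windowing or weight transformation; shapes and dimensions are illustrative.

# Minimal cross-attention block: tokens of one modality attend to
# tokens of the other, then pass through a small MLP.
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q, self.norm_kv = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, q_tokens, kv_tokens):
        attn_out, _ = self.attn(self.norm_q(q_tokens),
                                self.norm_kv(kv_tokens),
                                self.norm_kv(kv_tokens))
        x = q_tokens + attn_out              # fuse cross-modal context
        return x + self.mlp(x)

ir, vis = torch.rand(2, 196, 64), torch.rand(2, 196, 64)
fused = CrossAttentionBlock(dim=64)(ir, vis)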
Collapse
Affiliation(s)
- Tianqing Hu
- School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou, 450001, China
| | - Xiaofei Nan
- School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou, 450001, China
| | - Xiabing Zhou
- School of Computer Science and Technology, Soochow University, Suzhou, 215006, China.
| | - Yu Shen
- Medical Imaging Department, Henan Provincial People's Hospital, Zhengzhou, 450001, China
| | - Qinglei Zhou
- School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou, 450001, China
| |
Collapse
|
249
|
Guo J, Wang J, Wei R, Kang D, Dou Q, Liu YH. UC-NeRF: Uncertainty-Aware Conditional Neural Radiance Fields From Endoscopic Sparse Views. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:1284-1296. [PMID: 39531569 DOI: 10.1109/tmi.2024.3496558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2024]
Abstract
Visualizing surgical scenes is crucial for revealing internal anatomical structures during minimally invasive procedures. Novel view synthesis is a vital technique that offers geometry and appearance reconstruction, enhancing understanding, planning, and decision-making in surgical scenes. Despite the impressive achievements of Neural Radiance Fields (NeRF), their direct application to surgical scenes produces unsatisfying results due to two challenges: endoscopic sparse views and significant photometric inconsistencies. In this paper, we propose an uncertainty-aware conditional NeRF (UC-NeRF) for novel view synthesis that tackles the severe shape-radiance ambiguity arising from sparse surgical views. The core of UC-NeRF is to incorporate multi-view uncertainty estimation to condition the neural radiance field, adaptively modeling the severe photometric inconsistencies. Specifically, UC-NeRF first builds a consistency learner in the form of a multi-view stereo network to establish geometric correspondence from sparse views and to generate uncertainty estimates and feature priors. In neural rendering, we design a base-adaptive NeRF network that exploits the uncertainty estimates to explicitly handle the photometric inconsistencies. Furthermore, an uncertainty-guided geometry distillation is employed to enhance geometry learning. Experiments on the SCARED and Hamlyn datasets demonstrate superior performance in rendering appearance and geometry, consistently outperforming the current state-of-the-art approaches. Our code will be released at https://github.com/wrld/UC-NeRF.
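One simple way to let per-ray uncertainty modulate rendering supervision is a Gaussian negative log-likelihood, shown below as a hedged sketch; UC-NeRF's actual conditioning and loss design are more involved than this.

# Uncertainty-weighted photometric loss: rays with high predicted
# uncertainty contribute less, and a log-variance term keeps the
# network from inflating sigma to escape supervision.
import torch

def uncertainty_weighted_loss(rendered, target, sigma, eps=1e-6):
    var = sigma.clamp_min(eps) ** 2
    return (((rendered - target) ** 2) / (2 * var) + torch.log(var) / 2).mean()

rgb_pred, rgb_gt = torch.rand(1024, 3), torch.rand(1024, 3)
sigma = torch.rand(1024, 1) + 0.1        # per-ray uncertainty estimate
loss = uncertainty_weighted_loss(rgb_pred, rgb_gt, sigma)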
Collapse
|
250
|
Lee JC, Park HW, Kang YN. Feasibility study of structural similarity index for patient-specific quality assurance. J Appl Clin Med Phys 2025; 26:e14591. [PMID: 39625100 PMCID: PMC11905251 DOI: 10.1002/acm2.14591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2024] [Revised: 10/22/2024] [Accepted: 11/11/2024] [Indexed: 03/14/2025] Open
Abstract
BACKGROUND The traditional gamma evaluation method combines dose difference (DD) and distance-to-agreement (DTA) to assess the agreement between two dose distributions. However, while gamma evaluation can identify the location of errors, it does not provide information about their type. PURPOSE This study optimizes and applies the structural similarity (SSIM) index as a supplementary metric, alongside gamma evaluation, for the quality evaluation of radiation therapy plans. By addressing the limitations of gamma evaluation, it aims to establish clinically meaningful SSIM criteria that enhance the accuracy of patient-specific quality assurance (PSQA). METHODS We analyzed the relationship between the gamma passing rate (GPR) and the SSIM index with respect to distance and dose errors. For SSIM analysis corresponding to the 3%/2 mm gamma evaluation criteria, we introduced the concept of an SSIM passing rate (SPR). We determined a valid SSIM index threshold that met the gamma evaluation criteria and applied it. Evaluations of 40 fields measured with an electronic portal imaging device (EPID) were analyzed using the GPR and the applied SPR. RESULTS Distance errors significantly affected both the GPR and the SSIM index, whereas dose errors had some influence on the GPR but little impact on the SSIM index. The SPR was 100% for a distance error of 2 mm but began to decrease for distance errors of 3 mm or more. An optimal SSIM index threshold of 0.65 was established, at which the SPR fell below 100% when distance errors exceeded 2 mm. CONCLUSIONS This study demonstrates that the SSIM algorithm can be effectively applied to the quality evaluation of radiation therapy plans. The SPR can serve as a supplementary metric to gamma evaluation, offering more precise identification of distance errors. Future research should further validate the efficacy of the SSIM algorithm across a broader range of clinical cases.
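The SPR can be computed analogously to the GPR by thresholding a local SSIM map. The sketch below, using scikit-image on synthetic dose arrays, applies the 0.65 threshold reported in the study; the dose data and normalization are stand-ins.

# SSIM passing rate (SPR): fraction of pixels whose local SSIM meets
# the threshold, mirroring how GPR counts passing gamma points.
import numpy as np
from skimage.metrics import structural_similarity

measured = np.random.rand(256, 256).astype(np.float32)
planned = (measured + 0.01 * np.random.randn(256, 256)).astype(np.float32)

_, ssim_map = structural_similarity(planned, measured, full=True,
                                    data_range=1.0)
spr = 100.0 * (ssim_map >= 0.65).mean()
print(f"SPR: {spr:.1f}%")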
Collapse
Affiliation(s)
- Jae Choon Lee
- Department of Medical Physics, Kyonggi University, Suwon, South Korea
| | | | - Young Nam Kang
- Department of Radiation Oncology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seocho-gu, Seoul, South Korea
| |
Collapse
|