1
Mu X, Zhang H, Ma J, Zhang Z, Jiang L, Chen X, Jiang F. WP-VTON: A wrinkle-preserving virtual try-on network via clothing texture book. Neural Netw 2025; 189:107546. [PMID: 40359737 DOI: 10.1016/j.neunet.2025.107546]
Abstract
Virtual try-on technology seeks to seamlessly integrate an image of a specified garment onto the target person, generating a synthesized image that realistically depicts the person wearing the clothing. Existing methods based on generative adversarial networks (GANs) usually use human pose- and body parsing-based features to guide the warping of a flattened clothing item during generation. However, it is hard for these approaches to accurately capture the spatial characteristics of the warped clothing (e.g., wrinkles). In this research, we propose a Wrinkle-Preserving Virtual Try-On Network, named WP-VTON, to address the aforementioned issues in the virtual try-on task. Specifically, in the clothing warping stage, we incorporate normal features extracted from the spatial attributes of both the clothing and the human body to learn the clothing deformation caused by warping; in the try-on generation stage, we leverage a pre-trained StyleGAN, called the clothing texture book, to optimize the try-on image, with the aim of further improving the generation capability of WP-VTON with regard to texture details. Experimental results on public datasets demonstrate the effectiveness of our method, which outperforms state-of-the-art GAN-based virtual try-on models.
Affiliation(s)
- Xiangyu Mu
- Department of Computer Science, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, Guangdong Province, China
- Haijun Zhang
- Department of Computer Science, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, Guangdong Province, China
- Jianghong Ma
- Department of Computer Science, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, Guangdong Province, China; City University of Hong Kong, 999077, Hong Kong, China
- Zhao Zhang
- School of Computer and Information, Hefei University of Technology, Hefei, 230009, China
- Lin Jiang
- JIANF & ASSOCIATES, Shenzhen, 518057, China
- Xiao Chen
- JIANF & ASSOCIATES, Shenzhen, 518057, China
- Feng Jiang
- JIANF & ASSOCIATES, Shenzhen, 518057, China
2
Luo W, Zeng Z, Zhong Y. Enhancing image-based virtual try-on with Multi-Controlled Diffusion Models. Neural Netw 2025; 189:107552. [PMID: 40414151 DOI: 10.1016/j.neunet.2025.107552]
Abstract
Image-based virtual try-on technology digitally overlays clothing onto images of individuals, enabling users to preview how garments fit without physical trial, thus enhancing the online shopping experience. While current diffusion-based virtual try-on networks produce high-quality results, they struggle to accurately render garments with textual designs such as logos or prints, which are widely prevalent in the real world and often carry significant brand and cultural identity. To address this challenge, we introduce the Multi-Controlled Diffusion Models for Image-based Virtual Try-On (MCDM-VTON), a novel approach that synergistically incorporates global image features and local textual features extracted from garments to control the generation process. Specifically, we introduce an Optical Character Recognition (OCR) model to extract text-style textures from clothing, using the gathered information as text features. These features, fused with the inherent global image features by a cross-attention-based multimodal feature fusion module, jointly control the denoising process of the diffusion models. Moreover, by extracting text information from both the generated virtual try-on results and the original garment images with the OCR model, we devise a new content-style loss to supervise the training of the diffusion models, thereby reinforcing the generation of text-style textures. Extensive experiments demonstrate that MCDM-VTON significantly outperforms existing state-of-the-art methods in terms of text preservation and overall visual quality.
Affiliation(s)
- Weihao Luo
- Key Laboratory of Textile Science & Technology, Ministry of Education, College of Textiles, Donghua University, Shanghai, 201620, China
- Yueqi Zhong
- Key Laboratory of Textile Science & Technology, Ministry of Education, College of Textiles, Donghua University, Shanghai, 201620, China
3
Pryde MC, Rioux J, Cora AE, Volders D, Schmidt MH, Abdolell M, Bowen C, Beyea SD. Correlation of objective image quality metrics with radiologists' diagnostic confidence depends on the clinical task performed. J Med Imaging (Bellingham) 2025; 12:051803. [PMID: 40223906 PMCID: PMC11991859 DOI: 10.1117/1.jmi.12.5.051803]
Abstract
Purpose Objective image quality metrics (IQMs) are widely used as outcome measures to assess acquisition and reconstruction strategies for diagnostic images. For nonpathological magnetic resonance (MR) images, these IQMs correlate to varying degrees with expert radiologists' confidence scores of overall perceived diagnostic image quality. However, it is unclear whether IQMs also correlate with task-specific diagnostic image quality or expert radiologists' confidence in performing a specific diagnostic task, which calls into question their use as surrogates for radiologist opinion. Approach 0.5 T MR images from 16 stroke patients and two healthy volunteers were retrospectively undersampled (R = 1 to 7×) and reconstructed via compressed sensing. Three neuroradiologists reported the presence/absence of acute ischemic stroke (AIS) and assigned a Fazekas score describing the extent of chronic ischemic lesion burden. Neuroradiologists ranked their confidence in performing each task on a 1 to 5 Likert scale. Confidence scores were correlated with the noise quality measure, the visual information fidelity criterion, the feature similarity index, the root mean square error, and structural similarity (SSIM) via nonlinear regression modeling. Results Although acceleration alters image quality, neuroradiologists remain able to report pathology. All of the IQMs tested correlated to some degree with diagnostic confidence for assessing chronic ischemic lesion burden, but none correlated with diagnostic confidence in diagnosing the presence/absence of AIS, owing to consistent radiologist performance regardless of image degradation. Conclusions Accelerated images were helpful for understanding the ability of IQMs to assess task-specific diagnostic image quality in the context of chronic ischemic lesion burden, although not in the case of AIS diagnosis. These findings suggest that commonly used IQMs, such as the SSIM index, do not necessarily indicate an image's utility when performing certain diagnostic tasks.
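For readers who want to reproduce this style of analysis, here is a minimal sketch correlating an IQM with reader confidence; the images, Likert ratings, and the use of SSIM (scikit-image) with a Spearman correlation (SciPy) are illustrative stand-ins for the study's data and its nonlinear regression modeling.

```python
# Sketch: correlate an IQM (SSIM) with stand-in Likert confidence scores.
import numpy as np
from scipy.stats import spearmanr
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
reference = rng.random((64, 64))                  # stand-in for the R=1 image
noise_levels = (0.01, 0.03, 0.05, 0.10, 0.20)     # mimic rising acceleration
images = [reference + rng.normal(0, s, (64, 64)) for s in noise_levels]
confidence = [5, 5, 4, 3, 1]                      # stand-in Likert ratings

ssim_scores = [structural_similarity(reference, img,
                                     data_range=float(img.max() - img.min()))
               for img in images]
rho, p = spearmanr(ssim_scores, confidence)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```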
Affiliation(s)
- Michelle C. Pryde
- Dalhousie University, School of Biomedical Engineering, Halifax, Nova Scotia, Canada
- James Rioux
- Dalhousie University, School of Biomedical Engineering, Halifax, Nova Scotia, Canada
- Dalhousie University, Department of Diagnostic Radiology, Halifax, Nova Scotia, Canada
- Nova Scotia Health, Department of Diagnostic Imaging, Halifax, Nova Scotia, Canada
- Adela Elena Cora
- Dalhousie University, Department of Diagnostic Radiology, Halifax, Nova Scotia, Canada
- Nova Scotia Health, Department of Diagnostic Imaging, Halifax, Nova Scotia, Canada
- David Volders
- Dalhousie University, Department of Diagnostic Radiology, Halifax, Nova Scotia, Canada
- Nova Scotia Health, Department of Diagnostic Imaging, Halifax, Nova Scotia, Canada
- Matthias H. Schmidt
- Dalhousie University, Department of Diagnostic Radiology, Halifax, Nova Scotia, Canada
- Nova Scotia Health, Department of Diagnostic Imaging, Halifax, Nova Scotia, Canada
- Mohammed Abdolell
- Dalhousie University, Department of Diagnostic Radiology, Halifax, Nova Scotia, Canada
- Chris Bowen
- Dalhousie University, Department of Diagnostic Radiology, Halifax, Nova Scotia, Canada
- Nova Scotia Health, Department of Diagnostic Imaging, Halifax, Nova Scotia, Canada
- Steven D. Beyea
- Dalhousie University, School of Biomedical Engineering, Halifax, Nova Scotia, Canada
- Dalhousie University, Department of Diagnostic Radiology, Halifax, Nova Scotia, Canada
- Nova Scotia Health, Department of Diagnostic Imaging, Halifax, Nova Scotia, Canada
- IWK Health Centre, Halifax, Nova Scotia, Canada
4
Tang M, Jiang J, Zhang X, Zhou T, Zhang Y, Qiu B, Zhang L. Dynamic Multi-scale Feature Integration Network for unsupervised MR-CT synthesis. Neural Netw 2025; 189:107584. [PMID: 40424759 DOI: 10.1016/j.neunet.2025.107584]
Abstract
Unsupervised MR-CT synthesis presents a significant opportunity to reduce radiation exposure from CT scans and lower costs by eliminating the need for both MR and CT imaging. However, many existing unsupervised methods face limitations in capturing different anatomical structures due to their inability to model features with large receptive fields, and the receptive fields of different structures can vary. To address this challenge, we propose a novel Dynamic Multi-scale Feature Integration Network (DMFI-Net) tailored for unsupervised MR-CT synthesis. Our DMFI-Net dynamically adjusts its receptive field to extract multi-scale receptive field features, effectively capturing intricate anatomical details to enhance the synthesis performance. Specifically, we present a Global Context-enhanced Kernel Selection (GCKS) module, which intelligently modulates the receptive fields of convolutions for capturing fine-grained details essential to image transformation. By incorporating global cues, the module enriches multi-scale receptive features with comprehensive semantic information, which is crucial for synthesizing globally distributed target regions or organs. Additionally, a Scale Enhancement Module (SEM) is proposed to integrate features extracted with different scales, preserving richer spatial information. Furthermore, we present a scale-aware reconstruction branch to bolster the encoder's feature extraction capability and improve model generalization. This branch is capable of reconstructing downsampled input images that have undergone random masking, underscoring the model's robust feature extraction ability. Extensive experimental results on one private and two public MR-CT datasets demonstrate that our model significantly outperforms state-of-the-art MR-CT synthesis methods in both qualitative and quantitative evaluations. The implementation code will be released upon acceptance of this manuscript at https://github.com/taozh2017/DMFINet.
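The core GCKS idea, letting a global context cue choose among receptive fields, can be illustrated with a minimal selective-kernel block. The PyTorch sketch below is a generic stand-in under our own naming and sizing assumptions, not the paper's architecture.

```python
# Minimal sketch of attention-based kernel selection (GCKS-like idea):
# parallel convs with different receptive fields, mixed by weights
# predicted from a globally pooled context vector.
import torch
import torch.nn as nn

class KernelSelect(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2)
            for k in kernel_sizes
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # global context cue
            nn.Conv2d(channels, len(kernel_sizes), 1),
            nn.Softmax(dim=1),                        # one weight per branch
        )

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # B,K,C,H,W
        w = self.gate(x).unsqueeze(2)                               # B,K,1,1,1
        return (w * feats).sum(dim=1)

y = KernelSelect(16)(torch.randn(2, 16, 32, 32))
print(y.shape)  # torch.Size([2, 16, 32, 32])
```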
Affiliation(s)
- Meng Tang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Jiuming Jiang
- Department of Diagnostic Radiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Xue Zhang
- Department of Diagnostic Radiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Tao Zhou
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Yizhe Zhang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Bin Qiu
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Li Zhang
- Department of Diagnostic Radiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
5
Puebla G, Bowers JS. Visual reasoning in object-centric deep neural networks: A comparative cognition approach. Neural Netw 2025; 189:107582. [PMID: 40409010 DOI: 10.1016/j.neunet.2025.107582]
Abstract
Achieving visual reasoning is a long-term goal of artificial intelligence. In the last decade, several studies have applied deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of the generalization of the relations learned. However, in recent years, object-centric representation learning has been put forward as a way to achieve visual reasoning within the deep learning framework. Object-centric models attempt to model input scenes as compositions of objects and relations between them. To this end, these models use several kinds of attention mechanisms to segregate the individual objects in a scene from the background and from other objects. In this work, we tested relation learning and generalization in several object-centric models, as well as a ResNet-50 baseline. In contrast to previous research, which has focused heavily on the same-different task to assess relational reasoning in DNNs, we use a set of tasks with varying degrees of complexity derived from the comparative cognition literature. Our results show that object-centric models are able to segregate the different objects in a scene, even in many out-of-distribution cases. In our simpler tasks, this improves their capacity to learn and generalize visual relations in comparison to the ResNet-50 baseline. However, object-centric models still struggle in our more difficult tasks and conditions. We conclude that abstract visual reasoning remains an open challenge for DNNs, including object-centric models.
Affiliation(s)
- Guillermo Puebla
- Facultad de Administración y Economía, Universidad de Tarapacá, Arica 1000000, Chile
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, 12a Priory Road, Bristol BS8 1TU, UK
6
Yin S, Liu H. Driving scene image dehazing model based on multi-branch and multi-scale feature fusion. Neural Netw 2025; 188:107495. [PMID: 40252372 DOI: 10.1016/j.neunet.2025.107495]
Abstract
Image dehazing is critical for enhancing image quality in applications such as autonomous driving, surveillance, and remote sensing. This paper presents an innovative image dehazing model based on a multi-branch and multi-scale feature fusion network that leverages spatial and frequency information. The model features a multi-branch architecture that combines local and global features through depthwise separable convolutions and state space models, effectively capturing both detailed and comprehensive information to improve dehazing performance. Additionally, a specialized module integrates spatial and frequency domain information by utilizing convolutional layers and Fourier transforms, enabling comprehensive haze removal through the fusion of these two domains. A feature fusion mechanism incorporates channel attention and residual connections, dynamically adjusting the importance of different channel features while preserving the global structural information of the input image. Furthermore, this is the first model to combine Mamba and convolution layers for driving scene image dehazing, achieving global feature extraction with linear complexity. Each image is processed in only 0.030 s, with a frame rate of 32.41 FPS and a processing efficiency of 67.96 MPx/s, ensuring high efficiency suitable for real-time applications. Extensive experiments on real-world foggy driving scene datasets demonstrate the superior performance of the proposed method, providing reliable visual perception capabilities and significantly improving adaptability and robustness in complex environments.
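The spatial-frequency integration described above can be sketched as a small module that processes the Fourier spectrum with a pointwise convolution and fuses it with a spatial convolution branch. This is only a generic approximation under our own assumptions; module names and sizes are illustrative, not the paper's design.

```python
# Sketch of spatial + frequency-domain feature fusion for dehazing.
import torch
import torch.nn as nn

class SpatialFrequencyFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        # Complex spectrum handled as (real, imag) stacked on channels.
        self.freq = nn.Conv2d(2 * channels, 2 * channels, 1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        s = self.spatial(x)                          # spatial branch
        spec = torch.fft.rfft2(x, norm="ortho")      # frequency branch
        z = self.freq(torch.cat([spec.real, spec.imag], dim=1))
        real, imag = z.chunk(2, dim=1)
        f = torch.fft.irfft2(torch.complex(real, imag),
                             s=x.shape[-2:], norm="ortho")
        return self.fuse(torch.cat([s, f], dim=1))   # fuse both domains

out = SpatialFrequencyFusion(8)(torch.randn(1, 8, 64, 64))
print(out.shape)  # torch.Size([1, 8, 64, 64])
```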
Affiliation(s)
- Shi Yin
- Institute of Artificial Intelligence & Robotics (IAIR), Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic and Transportation Engineering, Central South University, Changsha 410075, Hunan, PR China
- Hui Liu
- Institute of Artificial Intelligence & Robotics (IAIR), Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic and Transportation Engineering, Central South University, Changsha 410075, Hunan, PR China
7
Wang J, Deng J, Liu D. Deep prior embedding method for Electrical Impedance Tomography. Neural Netw 2025; 188:107419. [PMID: 40184867 DOI: 10.1016/j.neunet.2025.107419]
Abstract
This paper presents a novel deep learning-based approach for Electrical Impedance Tomography (EIT) reconstruction that effectively integrates image priors to enhance reconstruction quality. Traditional neural network methods often rely on random initialization, which may not fully exploit available prior information. Our method addresses this by using image priors to guide the initialization of the neural network, allowing for a more informed starting point and better utilization of prior knowledge throughout the reconstruction process. We explore three different strategies for embedding prior information: non-prior embedding, implicit prior embedding, and full prior embedding. Through simulations and experimental studies, we demonstrate that the incorporation of accurate image priors significantly improves the fidelity of the reconstructed conductivity distribution. The method is robust across varying levels of noise in the measurement data, and the quality of the reconstruction is notably higher when the prior closely resembles the true distribution. This work highlights the importance of leveraging prior information in EIT and provides a framework that could be extended to other inverse problems where prior knowledge is available.
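The prior-embedding idea, initializing the network so that it already reproduces the prior image before fitting measurements, can be sketched as a toy two-stage procedure. Everything below (the coordinate MLP, the disc-shaped prior) is our own illustrative assumption; the EIT forward-model fine-tuning stage is only indicated, not implemented.

```python
# Sketch of "prior embedding": pre-train the network to output the prior
# conductivity image, then (stage 2, not shown) fit the EIT measurements.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),              # conductivity at each (x, y) point
)

# Hypothetical prior: a disc of higher conductivity on a background of 1.0.
xy = torch.cartesian_prod(torch.linspace(-1, 1, 32), torch.linspace(-1, 1, 32))
prior = 1.0 + 0.5 * ((xy ** 2).sum(1, keepdim=True) < 0.25).float()

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):            # stage 1: embed the prior
    loss = nn.functional.mse_loss(net(xy), prior)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"prior-embedding loss: {loss.item():.5f}")
```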
Affiliation(s)
- Junwu Wang
- School of Mathematical Sciences, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Jiansong Deng
- School of Mathematical Sciences, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Dong Liu
- CAS Key Laboratory of Microscale Magnetic Resonance, University of Science and Technology of China, Hefei, 230026, Anhui, China; Synergetic Innovation Center of Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, 230026, Anhui, China; School of Biomedical Engineering and Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, 215123, Jiangsu, China
8
Lin F, Gao S, Tang Y, Ma X, Murakami R, Zhang Z, Obayemi JD, Soboyejo WO, Zhang HK. Spectroscopic photoacoustic denoising framework using hybrid analytical and data-free learning method. Photoacoustics 2025; 44:100729. [PMID: 40416360 PMCID: PMC12098154 DOI: 10.1016/j.pacs.2025.100729]
Abstract
Spectroscopic photoacoustic (sPA) imaging uses multiple wavelengths to differentiate and quantify chromophores based on their unique optical absorption spectra. This technique has been widely applied in areas such as vascular mapping, tumor detection, and therapeutic monitoring. However, PA imaging is highly susceptible to noise, leading to a low signal-to-noise ratio (SNR) and compromised image quality. Furthermore, low SNR in spectral data adversely affects spectral unmixing outcomes, hindering accurate quantitative PA imaging. Traditional denoising techniques like frame averaging, though effective in improving SNR, can be impractical for dynamic imaging scenarios due to reduced frame rates. Advanced methods, including learning-based approaches and analytical algorithms, have demonstrated promise but often require extensive training data and parameter tuning. Moreover, whether these methods preserve spectral information is often unclear, limiting their adaptability for clinical usage. Additionally, training data is not always accessible for learning-based methods. In this work, we propose a Spectroscopic Photoacoustic Denoising (SPADE) framework using a hybrid analytical and data-free learning method. This framework integrates a data-free learning-based method with an efficient BM3D-based analytical approach, providing noise reduction while preserving spectral integrity and ensuring that functional information is maintained. The SPADE framework was validated through simulation, phantom, in vivo, and ex vivo studies. These studies demonstrated that SPADE improved image SNR by over 15 dB in high-noise cases and preserved spectral information (R > 0.8), outperforming conventional methods, especially in low-SNR conditions. SPADE presents a promising solution for preserving the accuracy of quantitative PA imaging in clinical applications where noise reduction and spectral preservation are critical.
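The two headline quantities, SNR gain in dB and the spectral correlation R, can be computed as in the NumPy sketch below. The signals are synthetic stand-ins for sPA data, and the snr_db helper is our own, not the authors' code.

```python
# Illustrative check of the quantities SPADE is evaluated on:
# SNR gain in dB and preservation of the absorption spectrum (R).
import numpy as np

rng = np.random.default_rng(1)
clean = np.sin(np.linspace(0, 3 * np.pi, 200)) + 2.0   # stand-in sPA signal
noisy = clean + rng.normal(0, 0.5, clean.shape)
denoised = clean + rng.normal(0, 0.05, clean.shape)    # pretend SPADE output

def snr_db(signal, estimate):
    noise = estimate - signal
    return 10 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

gain = snr_db(clean, denoised) - snr_db(clean, noisy)
r = np.corrcoef(clean, denoised)[0, 1]                 # spectral correlation
print(f"SNR gain: {gain:.1f} dB, R = {r:.3f}")
```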
Affiliation(s)
- Fangzhou Lin
- Department of Robotics Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
- Shang Gao
- Department of Robotics Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
- Yichuan Tang
- Department of Robotics Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
- Xihan Ma
- Department of Robotics Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
- Ryo Murakami
- Department of Robotics Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
- Ziming Zhang
- Department of Electrical & Computer Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
- John D. Obayemi
- Department of Biomedical Engineering, Gateway Park Life Sciences Center, Worcester Polytechnic Institute (WPI), 60 Prescott Street, Worcester, MA 01605, USA
- Department of Mechanical & Materials Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
- Winston O. Soboyejo
- Department of Mechanical & Materials Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
- Haichong K. Zhang
- Department of Robotics Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
- Department of Biomedical Engineering, Gateway Park Life Sciences Center, Worcester Polytechnic Institute (WPI), 60 Prescott Street, Worcester, MA 01605, USA
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
9
Yang S, Chen C, Liu J, Tang J, Wu G. FSDM: An efficient video super-resolution method based on Frames-Shift Diffusion Model. Neural Netw 2025; 188:107435. [PMID: 40187080 DOI: 10.1016/j.neunet.2025.107435]
Abstract
Video super-resolution (VSR) is a fundamental task aimed at enhancing video quality through intricate modeling techniques. Recent advancements in diffusion models have significantly enhanced image super-resolution processing capabilities. However, their integration into video super-resolution workflows remains constrained by the computational complexity of temporal fusion modules, which demand more computational resources than their image counterparts. To address this challenge, we propose a novel approach: a Frames-Shift Diffusion Model based on image diffusion models. Compared to directly training diffusion-based video super-resolution models, redesigning the diffusion process of image models without introducing complex temporal modules requires minimal training consumption. We incorporate temporal information into the image super-resolution diffusion model by using optical flow and perform multi-frame fusion. This model adapts the diffusion process to transition smoothly from image super-resolution to video super-resolution diffusion without additional weight parameters. As a result, the Frames-Shift Diffusion Model efficiently processes videos frame by frame while maintaining computational efficiency and achieving superior performance. It enhances perceptual quality and achieves performance comparable to other state-of-the-art diffusion-based VSR methods in PSNR and SSIM. This approach optimizes video super-resolution by simplifying the integration of temporal data, thus addressing key challenges in the field.
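The temporal step, using optical flow to align neighboring frames before fusion, can be approximated as below. This OpenCV sketch uses Farneback flow as a generic stand-in for whichever flow estimator the paper employs, and warp_to_current is a hypothetical helper of ours.

```python
# Sketch: warp the previous frame onto the current one so per-frame
# diffusion outputs can be fused temporally.
import cv2
import numpy as np

def warp_to_current(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """prev/curr: uint8 grayscale frames of equal size."""
    # Flow from curr to prev, so each curr pixel samples its prev location.
    flow = cv2.calcOpticalFlowFarneback(curr, prev, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(prev, map_x, map_y, cv2.INTER_LINEAR)

prev = np.random.randint(0, 255, (64, 64), np.uint8)
curr = np.roll(prev, 2, axis=1)                  # simulated small motion
aligned = warp_to_current(prev, curr)
print(aligned.shape)
```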
Affiliation(s)
- Shijie Yang
- State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, Jiangsu, China; School of Artificial Intelligence, Nanjing University, Nanjing, 210023, Jiangsu, China
- Chao Chen
- State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, Jiangsu, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, Jiangsu, China
- Jie Liu
- State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, Jiangsu, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, Jiangsu, China
- Jie Tang
- State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, Jiangsu, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, Jiangsu, China
- Gangshan Wu
- State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, Jiangsu, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, Jiangsu, China
10
López-Baldomero AB, Buzzelli M, Moronta-Montero F, Martínez-Domingo MÁ, Valero EM. Ink classification in historical documents using hyperspectral imaging and machine learning methods. Spectrochim Acta A Mol Biomol Spectrosc 2025; 335:125916. [PMID: 40049019 DOI: 10.1016/j.saa.2025.125916]
Abstract
Ink identification using only spectral reflectance information poses significant challenges due to material degradation, aging, and spectral overlap between ink classes. This study explores the use of hyperspectral imaging and machine learning techniques to classify three distinct types of inks: pure metallo-gallate, carbon-containing, and non-carbon-containing inks. Six supervised classification models, including five traditional algorithms (Support Vector Machines, K-Nearest Neighbors, Linear Discriminant Analysis, Random Forest, and Partial Least Squares Discriminant Analysis) and one Deep Learning-based model, were evaluated. The methodology integrates data fusion from different imaging systems, sample extraction, ground truth creation, and a post-processing step to increase uniformity. The evaluation was performed using both mock-up samples and historical documents, achieving micro-averaged accuracy above 90% for all models. The best performance was obtained using the DL-based model (98% F1-score), followed by the Support Vector Machine model. In the case-study documents, the traditional models performed better overall. This study highlights the potential of hyperspectral imaging combined with machine learning for non-invasive ink identification and mapping, even under challenging conditions, contributing to the conservation and analysis of historical manuscripts.
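A minimal sketch of the traditional-classifier branch, assuming per-pixel reflectance spectra as feature vectors: the synthetic spectra and the scikit-learn SVM pipeline below stand in for the study's data and tuned models, with labels 0-2 standing for the three ink classes named above.

```python
# Sketch: SVM classification of per-pixel hyperspectral reflectance.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n_bands = 120                                   # hyperspectral channels
centers = rng.random((3, n_bands))              # one mean spectrum per ink
X = np.vstack([c + rng.normal(0, 0.05, (200, n_bands)) for c in centers])
y = np.repeat([0, 1, 2], 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.3f}")
```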
Affiliation(s)
- Ana Belén López-Baldomero
- Department of Optics, University of Granada, Faculty of Sciences, Campus Fuentenueva, s/n, Granada, 18071, Spain
- Marco Buzzelli
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Viale Sarca, 336, Milan, 20126, Italy
- Francisco Moronta-Montero
- Department of Optics, University of Granada, Faculty of Sciences, Campus Fuentenueva, s/n, Granada, 18071, Spain
- Eva María Valero
- Department of Optics, University of Granada, Faculty of Sciences, Campus Fuentenueva, s/n, Granada, 18071, Spain
11
Guérendel C, Petrychenko L, Chupetlovska K, Bodalal Z, Beets-Tan RGH, Benson S. Generalizability, robustness, and correction bias of segmentations of thoracic organs at risk in CT images. Eur Radiol 2025; 35:4335-4346. [PMID: 39738559 DOI: 10.1007/s00330-024-11321-2]
Abstract
OBJECTIVE This study aims to assess and compare two state-of-the-art deep learning approaches for segmenting four thoracic organs at risk (OARs), namely the esophagus, trachea, heart, and aorta, in CT images in the context of radiotherapy planning. MATERIALS AND METHODS We compare a multi-organ segmentation approach and the fusion of multiple single-organ models, each dedicated to one OAR. All were trained using nnU-Net with the default parameters and the full-resolution configuration. We evaluate their robustness with adversarial perturbations and their generalizability on external datasets, and explore potential biases introduced by expert corrections compared to fully manual delineations. RESULTS The two approaches show excellent performance with an average Dice score of 0.928 for the multi-class setting and 0.930 when fusing the four single-organ models. Evaluation on external datasets and under common procedural adversarial noise demonstrates the good generalizability of these models. In addition, expert corrections of both models show significant bias toward the original automated segmentation. The average Dice score between the two corrections is 0.93, ranging from 0.88 for the trachea to 0.98 for the heart. CONCLUSION Both approaches demonstrate excellent performance and generalizability in segmenting four thoracic OARs, potentially improving efficiency in radiotherapy planning. However, the multi-organ setting proves advantageous for its efficiency, requiring less training time and fewer resources, making it a preferable choice for this task. Moreover, corrections of AI segmentation by clinicians may lead to biases in the results of AI approaches. A test set, manually annotated, should be used to assess the performance of such methods. KEY POINTS Question While manual delineation of thoracic organs at risk is labor-intensive, prone to errors, and time-consuming, evaluation of AI models performing this task lacks robustness. Findings The deep-learning model using the nnU-Net framework showed excellent performance, generalizability, and robustness in segmenting thoracic organs in CT, enhancing radiotherapy planning efficiency. Clinical relevance Automatic segmentation of thoracic organs at risk can save clinicians time without compromising the quality of the delineations, and extensive evaluation across diverse settings demonstrates the potential of integrating such models into clinical practice.
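The Dice similarity coefficient used throughout these comparisons can be computed as in this short sketch; the masks are synthetic and the dice helper is ours, applied per organ on binary masks.

```python
# Dice similarity coefficient for binary segmentation masks.
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

pred = np.zeros((32, 32), int); pred[8:24, 8:24] = 1   # model output
ref = np.zeros((32, 32), int); ref[10:24, 8:26] = 1    # reference contour
print(f"Dice = {dice(pred, ref):.3f}")
```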
Affiliation(s)
- Corentin Guérendel
- Department of Radiology, Antoni van Leeuwenhoek-The Netherlands Cancer Institute, Amsterdam, The Netherlands
- GROW-Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Liliana Petrychenko
- Department of Radiology, Antoni van Leeuwenhoek-The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Kalina Chupetlovska
- Department of Radiology, Antoni van Leeuwenhoek-The Netherlands Cancer Institute, Amsterdam, The Netherlands
- University Hospital St. Ivan Rilski, Sofia, Bulgaria
- Zuhir Bodalal
- Department of Radiology, Antoni van Leeuwenhoek-The Netherlands Cancer Institute, Amsterdam, The Netherlands
- GROW-Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Regina G H Beets-Tan
- Department of Radiology, Antoni van Leeuwenhoek-The Netherlands Cancer Institute, Amsterdam, The Netherlands
- GROW-Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Sean Benson
- Department of Radiology, Antoni van Leeuwenhoek-The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Department of Cardiology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands
12
Fortin M, Stirnberg R, Völzke Y, Lamalle L, Pracht E, Löwen D, Stöcker T, Goa PE. MPRAGElike: A novel approach to generate T1w images from multi-contrast gradient echo images for brain segmentation. Magn Reson Med 2025; 94:134-149. [PMID: 39902546 PMCID: PMC12021339 DOI: 10.1002/mrm.30453]
Abstract
PURPOSE Brain segmentation and multi-parameter mapping (MPM) are important steps in neurodegenerative disease characterization. However, acquiring both a high-resolution T1w sequence like MPRAGE (standard input to brain segmentation) and an MPM in the same neuroimaging protocol increases scan time and patient discomfort, making it difficult to combine both in clinical examinations. METHODS A novel approach to synthesize T1w images from MPM images, named MPRAGElike, is proposed and compared to the standard technique used to produce synthetic MPRAGE images (synMPRAGE). Twenty-three healthy subjects were scanned with the same imaging protocol at three different 7T sites using universal parallel transmit RF pulses. SNR, CNR, and automatic brain segmentation results from both MPRAGElike and synMPRAGE were compared against an acquired MPRAGE. RESULTS The proposed MPRAGElike technique produced higher SNR values than synMPRAGE for all regions evaluated while also having higher CNR values for subcortical structures. MPRAGE was still the image with the highest SNR values overall. For automatic brain segmentation, MPRAGElike outperformed synMPRAGE when compared to MPRAGE (median Dice Similarity Coefficient of 0.90 versus 0.29 and Average Asymmetric Surface Distance of 0.33 versus 2.93 mm, respectively), in addition to being simple, flexible, and considerably more robust to low image quality than synMPRAGE. CONCLUSION The MPRAGElike technique can provide a better and more reliable alternative to synMPRAGE as a substitute for MPRAGE, especially when automatic brain segmentation is of interest and scan time is limited.
Affiliation(s)
- Marc-Antoine Fortin
- Department of Physics, Norwegian University of Science and Technology, Trondheim, Trøndelag, Norway
- Yannik Völzke
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Laurent Lamalle
- GIGA-Cyclotron Research Centre-In Vivo Imaging, University of Liège, Liège, Belgium
- Eberhard Pracht
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Daniel Löwen
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Tony Stöcker
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Department of Physics and Astronomy, University of Bonn, Bonn, Germany
- Pål Erik Goa
- Department of Physics, Norwegian University of Science and Technology, Trondheim, Trøndelag, Norway
- Department of Radiology and Nuclear Medicine, St. Olavs Hospital HF, Trondheim, Norway
13
Adamson PM, Desai AD, Dominic J, Varma M, Bluethgen C, Wood JP, Syed AB, Boutin RD, Stevens KJ, Vasanawala S, Pauly JM, Gunel B, Chaudhari AS. Using deep feature distances for evaluating the perceptual quality of MR image reconstructions. Magn Reson Med 2025; 94:317-330. [PMID: 39921580 PMCID: PMC12021552 DOI: 10.1002/mrm.30437]
Abstract
PURPOSE Commonly used MR image quality (IQ) metrics have poor concordance with radiologist-perceived diagnostic IQ. Here, we develop and explore deep feature distances (DFDs), distances computed in a lower-dimensional feature space encoded by a convolutional neural network (CNN), as improved perceptual IQ metrics for MR image reconstruction. We further explore the impact of distribution shifts between images in the DFD CNN encoder training data and the IQ metric evaluation. METHODS We compare commonly used IQ metrics (PSNR and SSIM) to two "out-of-domain" DFDs with encoders trained on natural images, an "in-domain" DFD trained on MR images alone, and two domain-adjacent DFDs trained on large medical imaging datasets. We additionally compare these with several state-of-the-art but less commonly reported IQ metrics: visual information fidelity (VIF), the noise quality metric (NQM), and the high-frequency error norm (HFEN). IQ metric performance is assessed via correlations with five expert radiologist reader scores of perceived diagnostic IQ of various accelerated MR image reconstructions. We characterize the behavior of these IQ metrics under common distortions expected during image acquisition, including their sensitivity to acquisition noise. RESULTS All DFDs and HFEN correlate more strongly with radiologist-perceived diagnostic IQ than SSIM, PSNR, and other state-of-the-art metrics, with correlations being comparable to radiologist inter-reader variability. Surprisingly, out-of-domain DFDs perform comparably to in-domain and domain-adjacent DFDs. CONCLUSION A suite of IQ metrics, including DFDs and HFEN, should be used alongside commonly-reported IQ metrics for a more holistic evaluation of MR image reconstruction perceptual quality. We also observe that general vision encoders are capable of assessing visual IQ even for MR images.
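A DFD of the "out-of-domain" flavor can be sketched by comparing encoder activations of a reference and a reconstruction. The choice of a torchvision VGG16 encoder, the layer cut, and the MSE pooling below are our assumptions for illustration, not the paper's exact configuration.

```python
# Sketch of a deep feature distance with a natural-image encoder.
import torch
from torchvision.models import vgg16, VGG16_Weights

encoder = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()

def deep_feature_distance(ref: torch.Tensor, rec: torch.Tensor) -> float:
    """ref/rec: (1, 3, H, W) tensors in [0, 1]."""
    with torch.no_grad():
        f_ref, f_rec = encoder(ref), encoder(rec)
    return torch.nn.functional.mse_loss(f_ref, f_rec).item()

ref = torch.rand(1, 3, 128, 128)
rec = (ref + 0.05 * torch.randn_like(ref)).clamp(0, 1)  # pretend reconstruction
print(f"DFD: {deep_feature_distance(ref, rec):.4f}")
```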
Affiliation(s)
- Philip M. Adamson
- Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Arjun D. Desai
- Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Jeffrey Dominic
- Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Maya Varma
- Department of Computer Science, Stanford University, Stanford, California, USA
- Jeff P. Wood
- Austin Radiological Association, Austin, Texas, USA
- Ali B. Syed
- Department of Radiology, Stanford University, Stanford, California, USA
- Robert D. Boutin
- Department of Radiology, Stanford University, Stanford, California, USA
- John M. Pauly
- Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Beliz Gunel
- Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Akshay S. Chaudhari
- Department of Radiology, Stanford University, Stanford, California, USA
- Department of Biomedical Data Science, Stanford University, Stanford, California, USA
14
Kim G, Shin H, Eom M, Kim H, Chang JB, Yoon YG. Doubling multiplexed imaging capability via spatial expression pattern-guided protein pairing and computational unmixing. Commun Biol 2025; 8:928. [PMID: 40517167 DOI: 10.1038/s42003-025-08357-5]
Abstract
Three-dimensional multiplexed fluorescence imaging is an indispensable technique in neuroscience. For two-dimensional multiplexed imaging, cyclic immunofluorescence, which involves repeating staining, imaging, and signal removal over multiple cycles, has been widely used. However, the application of cyclic immunofluorescence to three dimensions poses challenges, as a single staining process can take more than 12 hours for thick specimens, and repeating this process for multiple cycles can be prohibitively long. Here, we propose SEPARATE (Spatial Expression PAttern-guided paiRing And unmixing of proTEins), a method that reduces the number of cycles by half by imaging two proteins using a single fluorophore. This is achieved by labeling two proteins with the same fluorophores and unmixing their signals based on their three-dimensional spatial expression patterns, using a neural network. We employ a feature extraction network to quantify the spatial distinction between proteins, with these quantified values, termed feature-based distances, used to identify protein pairs. We then validate the feature extraction network with ten proteins, showing a high correlation between spatial pattern distinction and signal unmixing performance. We finally demonstrate the volumetric multiplexed imaging of six proteins using three fluorophores, pairing them based on feature-based distances and unmixing their signals through protein separation networks.
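The pairing step can be illustrated as follows: given one feature embedding per protein, compute pairwise feature-based distances and greedily pair the most distinguishable proteins, so each shared fluorophore carries two maximally separable signals. The embeddings here are random stand-ins for the feature extraction network's output, and six proteins yield the three pairs mentioned above.

```python
# Sketch: greedy pairing by feature-based distance (even protein count).
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)
emb = rng.random((6, 32))                 # 6 proteins, 32-dim features
dist = cdist(emb, emb)                    # feature-based distances

pairs, free = [], set(range(len(emb)))
while free:
    i, j = max(((a, b) for a in free for b in free if a < b),
               key=lambda ab: dist[ab])   # most distinct remaining pair
    pairs.append((i, j))
    free -= {i, j}
print(pairs)                              # 3 pairs -> 3 fluorophores
```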
Affiliation(s)
- Gyuri Kim
- School of Electrical Engineering, KAIST, Daejeon, Republic of Korea
- Hyejin Shin
- Department of Materials Science and Engineering, KAIST, Daejeon, Republic of Korea
- Minho Eom
- School of Electrical Engineering, KAIST, Daejeon, Republic of Korea
- Hyunwoo Kim
- Department of Materials Science and Engineering, KAIST, Daejeon, Republic of Korea
- Jae-Byum Chang
- Department of Materials Science and Engineering, KAIST, Daejeon, Republic of Korea
- Young-Gyu Yoon
- School of Electrical Engineering, KAIST, Daejeon, Republic of Korea
- Department of Semiconductor System Engineering, KAIST, Daejeon, Republic of Korea
- KAIST Institute for Health Science and Technology, Daejeon, Republic of Korea
15
Marchetto E, Eichhorn H, Gallichan D, Schnabel JA, Ganz M. Agreement of image quality metrics with radiological evaluation in the presence of motion artifacts. MAGMA 2025. [PMID: 40493331 DOI: 10.1007/s10334-025-01266-y]
Abstract
OBJECTIVE Reliable image quality assessment is crucial for evaluating new motion correction methods for magnetic resonance imaging. We compare the performance of common reference-based and reference-free image quality metrics on unique datasets with real motion artifacts, and analyze the metrics' robustness to typical pre-processing techniques. MATERIALS AND METHODS We compared five reference-based and five reference-free metrics on brain data acquired with and without intentional motion (2D and 3D sequences). The metrics were recalculated seven times with varying pre-processing steps. Spearman correlation coefficients were computed to assess the relationship between image quality metrics and radiological evaluation. RESULTS All reference-based metrics showed strong correlation with observer assessments. Among reference-free metrics, Average Edge Strength offered the most promising results, consistently displaying stronger correlations across all sequences than the other reference-free metrics. The strongest correlation was achieved with percentile normalization and restricting the metric values to the skull-stripped brain region. In contrast, correlations were weaker when not applying any brain mask and using min-max or no normalization. DISCUSSION Reference-based metrics reliably correlate with radiological evaluation across different sequences and datasets. Pre-processing significantly influences correlation values. Future research should focus on refining pre-processing techniques and exploring approaches for automated image quality evaluation.
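The best-performing pre-processing reported above, percentile normalization restricted to a skull-stripped brain mask, can be sketched as below; the 2nd/98th percentile cutoffs are our own assumption for illustration.

```python
# Sketch: percentile normalization within a brain mask, applied
# before computing any image quality metric.
import numpy as np

def preprocess(img: np.ndarray, mask: np.ndarray) -> np.ndarray:
    vals = img[mask]
    lo, hi = np.percentile(vals, (2, 98))     # percentile normalization
    out = np.clip((img - lo) / (hi - lo), 0, 1)
    return np.where(mask, out, 0.0)           # zero outside the brain

img = np.random.rand(64, 64) * 1000           # stand-in MR slice
mask = np.zeros((64, 64), bool); mask[16:48, 16:48] = True
ready = preprocess(img, mask)                 # metrics are computed on this
print(ready.min(), ready.max())
```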
Affiliation(s)
- Elisa Marchetto
- Bernard and Irene Schwartz Center for Biomedical Imaging, Department of Radiology, NYU School of Medicine, New York, NY, USA
- Center for Advanced Imaging Innovation and Research (CAI2R), Department of Radiology, NYU School of Medicine, New York, NY, USA
- CUBRIC, School of Engineering, Cardiff University, Cardiff, UK
- Hannah Eichhorn
- Institute of Machine Learning in Biomedical Imaging, Helmholtz Munich, Neuherberg, Germany
- School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Julia A Schnabel
- Institute of Machine Learning in Biomedical Imaging, Helmholtz Munich, Neuherberg, Germany
- School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- Melanie Ganz
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Neurobiology Research Unit, Copenhagen University Hospital, Copenhagen, Denmark
16
Liu Y, Wang B, Xu X, Xu J. SADA: An advanced Spectral Attention Denoising Autoencoder for high-fidelity and efficient infrared spectral data generation. Spectrochim Acta A Mol Biomol Spectrosc 2025; 343:126336. [PMID: 40516311 DOI: 10.1016/j.saa.2025.126336]
Abstract
This study utilizes a Fourier transform infrared spectroscopy (FTIR)-based detection system to obtain and analyze the infrared spectra of cigarette smoke aerosols. To reduce the workload of spectral data acquisition and improve efficiency, we developed the Spectral Attention Denoising Autoencoder (SADA) model, which integrates an autoencoder (AE) architecture with a self-attention mechanism and incorporates a noise injection strategy. Compared to mainstream generative models, the SADA model performs better in generating accurate and high-fidelity spectra. To further validate the effectiveness of the generated spectra, we conducted classification experiments on hybrid datasets. By augmenting real spectral data with generated spectra, we observed significant improvements in classification accuracy across several mainstream classification models. Ablation experiments confirmed the critical roles of the self-attention mechanism and noise injection strategy in feature extraction and stable training. Additionally, the model exhibited excellent generalization capabilities across multiple public spectral datasets. The proposed SADA model not only alleviates the burden of spectral data acquisition but also provides an effective data augmentation strategy for spectral analysis tasks.
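A stripped-down stand-in for the SADA recipe, an autoencoder with a self-attention stage trained under injected noise, is sketched below in PyTorch. All sizes, the token layout, and the noise level are illustrative assumptions, not the published architecture.

```python
# Toy denoising autoencoder with self-attention and noise injection.
import torch
import torch.nn as nn

class TinySADA(nn.Module):
    def __init__(self, n_points=256, d=32):
        super().__init__()
        self.enc = nn.Linear(n_points, d * 8)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.dec = nn.Linear(d * 8, n_points)

    def forward(self, x):
        h = self.enc(x).view(x.size(0), 8, -1)   # 8 tokens of width d
        h, _ = self.attn(h, h, h)                # self-attention over tokens
        return self.dec(h.flatten(1))

model = TinySADA()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
spectra = torch.rand(16, 256)                    # stand-in IR spectra
for _ in range(5):                               # toy training loop
    noisy = spectra + 0.05 * torch.randn_like(spectra)  # noise injection
    loss = nn.functional.mse_loss(model(noisy), spectra)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final loss: {loss.item():.4f}")
```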
Affiliation(s)
- Yunzhao Liu
- College of Artificial Intelligence, Nankai University, Tianjin 300350, China
- Bin Wang
- College of Artificial Intelligence, Nankai University, Tianjin 300350, China
- Xiaoxuan Xu
- College of Artificial Intelligence, Nankai University, Tianjin 300350, China; Yunnan Research Institute, Nankai University, Kunming 650091, China
- Jing Xu
- College of Artificial Intelligence, Nankai University, Tianjin 300350, China
17
Zhang X, Huang C, Gui W. A multi-strategy improved crow search algorithm for multi-level thresholding image segmentation. Sci Rep 2025; 15:20033. [PMID: 40481186 PMCID: PMC12144277 DOI: 10.1038/s41598-025-94318-1]
Abstract
The standard crow search algorithm suffers from low convergence accuracy, insufficient stability, and susceptibility to local optima. To tackle these challenges, this paper proposes a novel multi-strategy improved crow search algorithm (MSICSA) specifically designed for multi-level image segmentation. The proposed approach incorporates three key enhancements: firstly, opposition-based learning (OBL) is utilized to improve the quality of initial solutions within MSICSA; secondly, an adaptive awareness probability mechanism is introduced to better balance the trade-off between exploration and exploitation; lastly, two differential mutation operators are developed to enhance global search capabilities, increase population diversity, and reduce the risk of converging on local optima. To validate the performance of the proposed algorithm, two sets of experiments are conducted. In the first set of experiments, CEC 2020 benchmark test functions are selected to compare the performance of MSICSA with other swarm intelligence optimization algorithms. In the second set of experiments, Otsu's method and fuzzy entropy are employed as objective functions for multi-level threshold segmentation of twelve grayscale images. The experimental results demonstrate that MSICSA outperforms seven comparison algorithms in terms of both convergence speed and segmentation quality.
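The OBL initialization can be sketched in a few lines: draw a random population of threshold vectors, form the "opposite" of each candidate, and keep the better half. The toy fitness below is a stand-in for the Otsu or fuzzy-entropy objectives used in the paper.

```python
# Sketch of opposition-based learning (OBL) initialization.
import numpy as np

def obl_init(fitness, n, dim, lo, hi, rng):
    pop = lo + (hi - lo) * rng.random((n, dim))   # random threshold vectors
    opp = lo + hi - pop                            # opposition points
    both = np.vstack([pop, opp])
    best = np.argsort([fitness(x) for x in both])[:n]  # keep the better half
    return both[best]

# Toy fitness: distance of sorted thresholds from some "ideal" gray levels.
ideal = np.array([60, 120, 180])
fit = lambda x: np.abs(np.sort(x) - ideal).sum()
pop = obl_init(fit, n=10, dim=3, lo=0, hi=255, rng=np.random.default_rng(4))
print(pop.round(1))
```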
Affiliation(s)
- Xiaoping Zhang
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Chengliang Huang
- Information Technology Management Department, Toronto Metropolitan University, Toronto, M5B 2K3, Canada
- Weixia Gui
- School of Big Data and Artificial Intelligence, Guangxi University of Finance and Economics, Nanning, 530004, China
- Guangxi Key Laboratory of Big Data in Finance and Economics, Guangxi University of Finance and Economics, Nanning, 530004, China
18
Lu W, Zhao H, Ma D, Jing P. LUFormer: A luminance-informed localized transformer with frequency augmentation for nighttime flare removal. Neural Netw 2025; 190:107660. [PMID: 40516379 DOI: 10.1016/j.neunet.2025.107660]
Abstract
Flare caused by unintended light scattering or reflection in night scenes significantly degrades image quality. Existing methods explore frequency factors and semantic priors but fail to comprehensively integrate all relevant information. To address this, we propose LUFormer, a luminance-informed Transformer network with localized frequency augmentation. Central to our approach are two key modules: the luminance-guided branch (LGB) and the dual domain hybrid attention (DDHA) unit. The LGB provides global brightness semantic priors, emphasizing the disruption of luminance distribution caused by flare. The DDHA improves deep flare representation in both the spatial and frequency domains. In the spatial domain, it broadens the receptive field through pixel rearrangement and cross-window dilation, while in the frequency domain, it emphasizes and amplifies low-frequency components via a compound attention mechanism. Our approach leverages the LGB, which globally guides semantic refinement, to construct a U-shaped progressive focusing framework. In this architecture, the DDHA locally augments multi-domain features across multiple scales. Extensive experiments on real-world benchmarks demonstrate that the proposed LUFormer outperforms state-of-the-art methods. The code is publicly available at: https://github.com/HeZhao0725/LUFormer.
Affiliation(s)
- Wei Lu
- School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- He Zhao
- School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Dubuke Ma
- University of Michigan, Ann Arbor, MI 48109, USA
- Peiguang Jing
- School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
19
Diaz N, Beniwal M, Marquez M, Guzman F, Jiang C, Liang J, Vera E. Single-mask sphere-packing with implicit neural representation reconstruction for ultrahigh-speed imaging. Opt Express 2025; 33:24027-24038. [PMID: 40515355 DOI: 10.1364/oe.561323]
Abstract
Single-shot, high-speed 2D optical imaging is essential for studying transient phenomena in various research fields. Among existing techniques, compressed optical-streaking ultra-high-speed photography (COSUP) uses a coded aperture and a galvanometer scanner to capture non-repeatable time-evolving events at the 1.5 million-frame-per-second level. However, the use of a randomly coded aperture complicates the reconstruction process and introduces artifacts in the recovered videos. In contrast, non-multiplexing coded apertures simplify the reconstruction algorithm, allowing the recovery of longer videos from a snapshot. In this work, we design a non-multiplexing coded aperture for COSUP by exploiting the properties of congruent sphere packing (SP), which enables uniform space-time sampling through the synergy between the galvanometer's linear scanning and the optimal SP encoding patterns. We also develop an implicit neural representation, which can be self-trained from a single measurement, that not only greatly reduces training time and eliminates the need for training datasets but also reconstructs far more ultra-high-speed frames from a single measurement. The advantages of the proposed encoding and reconstruction scheme are verified by simulations and experimental results on a COSUP system.
20
Wilson L, Ruget A, Halimi A, Hearn B, Leach J. Super-resolution depth imaging via processing of compact single-photon histogram parameters. Opt Express 2025; 33:23657-23667. [PMID: 40515327 DOI: 10.1364/oe.559801]
Abstract
Time-of-flight (ToF) imaging is widely used in consumer electronics for depth perception, with compact ToF sensors often representing their data as histograms of photon arrival times for each pixel. These histograms capture detailed temporal information that enables advanced computational techniques, such as super-resolution, to reconstruct high-resolution depth images even from low-resolution sensors by leveraging the full temporal structure of the data. However, transferring full histogram data is impractical for compact systems due to the large amount of data. To address this, microcontrollers extract a few key parameters-such as peak position, signal intensity, and noise level-greatly reducing data volume. While this approach performs well for low-resolution tasks like autofocus and obstacle detection, its potential for high-resolution depth imaging has not been fully explored. In this work, we demonstrate that these few extracted parameters are sufficient to reconstruct full high-resolution depth images. We propose a compact and data-efficient neural network that enhances the spatial resolution of a basic ToF sensor from 4 × 4 pixels to 32 × 32 pixels. By focusing on only 3 key parameters per pixel, compared with the 144 histogram bins the ToF sensor originally provides (a 48× reduction in data), our approach significantly reduces data requirements while maintaining performance similar to methods that rely on full histogram data, demonstrating the feasibility of efficient, high-quality depth reconstruction from only a few extracted parameters.
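A minimal PyTorch sketch of the kind of compact network described, mapping a 4 × 4 grid with 3 parameters per pixel to a 32 × 32 depth map. The layer sizes and the bilinear upsampling choice are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ToFUpsampler(nn.Module):
    """Sketch: 3 per-pixel histogram parameters (peak position, signal
    intensity, noise level) on a 4x4 grid -> 32x32 depth map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):          # x: (B, 3, 4, 4)
        return self.net(x)         # -> (B, 1, 32, 32)

depth = ToFUpsampler()(torch.rand(1, 3, 4, 4))
print(depth.shape)                 # torch.Size([1, 1, 32, 32])
```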
Collapse
|
21
|
Yagi S, Usui K, Ogawa K. Scatter and beam hardening effect corrections in pelvic region cone beam CT images using a convolutional neural network. Radiol Phys Technol 2025; 18:457-468. [PMID: 40183875 DOI: 10.1007/s12194-025-00896-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2024] [Revised: 03/03/2025] [Accepted: 03/06/2025] [Indexed: 04/05/2025]
Abstract
The aim of this study is to remove scattered photons and the beam-hardening effect in cone beam CT (CBCT) images, making the images usable for treatment planning. To this end, a convolutional neural network (CNN) was trained on distorted projection data containing scattered photons and the beam-hardening effect, with supervised target projection data calculated with monochromatic X-rays. The number of training projections was 17,280 with data augmentation, and the number of test projections was 540. The performance of the CNN was investigated in terms of the number of photons in the projection data used to train the network. Projection data of pelvic CBCT images (32 cases) were calculated with a Monte Carlo simulation at six count levels ranging from 0.5 to 3 million counts/pixel. Corrected images were evaluated using the peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM), and the sum of absolute differences (SAD). The simulation results showed that the CNN could effectively remove scattered photons and the beam-hardening effect, with significant improvements in PSNR, SSIM, and SAD. The number of photons in the training projection data also proved important for correction accuracy. Furthermore, a CNN model trained with projection data containing a sufficient number of photons performed well even when the input projection data contained few photons.
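The three evaluation metrics are standard and can be reproduced directly; a short sketch using scikit-image and NumPy (a plain re-implementation of the standard definitions, not the authors' code) follows.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_correction(corrected: np.ndarray, reference: np.ndarray):
    """Compute PSNR, SSIM, and SAD between a corrected image and a reference."""
    data_range = reference.max() - reference.min()
    psnr = peak_signal_noise_ratio(reference, corrected, data_range=data_range)
    ssim = structural_similarity(reference, corrected, data_range=data_range)
    sad = np.abs(reference - corrected).sum()   # sum of absolute differences
    return psnr, ssim, sad
```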
Collapse
Affiliation(s)
- Soya Yagi
- Department of Applied Informatics, Graduate School of Science and Engineering, Hosei University, 3-7-2 Kajinocho, Koganei, Tokyo, 184-0002, Japan
| | - Keisuke Usui
- Department of Radiological Technology, Faculty of Health Science, Juntendo University, 1-5-3 Yushima, Bunkyo-ku, Tokyo, 113-0034, Japan
| | - Koichi Ogawa
- Department of Applied Informatics, Faculty of Science and Engineering, Hosei University, 3-7-2 Kajinocho, Koganei, Tokyo, 184-0002, Japan.
| |
Collapse
|
22
|
Huang JJ, Liu T, Chen Z, Liu X, Wang M, Dragotti PL. A Lightweight Deep Exclusion Unfolding Network for Single Image Reflection Removal. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:4957-4973. [PMID: 40048344 DOI: 10.1109/tpami.2025.3548148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/09/2025]
Abstract
Single Image Reflection Removal (SIRR) is a canonical blind source separation problem and refers to the issue of separating a reflection-contaminated image into a transmission and a reflection image. The core challenge lies in minimizing the commonalities among different sources. Existing deep learning approaches either neglect the significance of feature interactions or rely on heuristically designed architectures. In this paper, we propose a novel Deep Exclusion unfolding Network (DExNet), a lightweight, interpretable, and effective network architecture for SIRR. DExNet is principally constructed by unfolding and parameterizing a simple iterative Sparse and Auxiliary Feature Update (i-SAFU) algorithm, which is specifically designed to solve a new model-based SIRR optimization formulation incorporating a general exclusion prior. This general exclusion prior enables the unfolded SAFU module to inherently identify and penalize commonalities between the transmission and reflection features, ensuring more accurate separation. The principled design of DExNet not only enhances its interpretability but also significantly improves its performance. Comprehensive experiments on four benchmark datasets demonstrate that DExNet achieves state-of-the-art visual and quantitative results while utilizing only approximately 8% of the parameters required by leading methods.
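The deep-unfolding idea, one network stage per iteration of a model-based algorithm with learned step operators and thresholds, can be sketched generically in PyTorch. This is not the paper's i-SAFU module; the exclusion coupling below is a simplified stand-in for the general exclusion prior, and all layer choices are assumptions.

```python
import torch
import torch.nn as nn

def soft_threshold(x: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    return torch.sign(x) * torch.clamp(x.abs() - theta, min=0.0)

class UnfoldedStage(nn.Module):
    """One generic unfolded iteration updating transmission/reflection
    features, with a learned exclusion weight penalizing shared content."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.step_t = nn.Conv2d(ch, ch, 3, padding=1)   # learned gradient step
        self.step_r = nn.Conv2d(ch, ch, 3, padding=1)
        self.theta = nn.Parameter(torch.tensor(0.01))   # learned threshold
        self.gamma = nn.Parameter(torch.tensor(0.1))    # exclusion weight

    def forward(self, t, r):
        t_new = soft_threshold(t - self.step_t(t) - self.gamma * r, self.theta)
        r_new = soft_threshold(r - self.step_r(r) - self.gamma * t, self.theta)
        return t_new, r_new

t, r = UnfoldedStage()(torch.rand(1, 32, 64, 64), torch.rand(1, 32, 64, 64))
```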
Collapse
|
23
|
Tian Y, Luo Z, Lu D, Liu C, Wildsoet C. Reconstruction of highly and extremely aberrated wavefront for ocular Shack-Hartmann sensor using multi-task Attention-UNet. Exp Eye Res 2025; 255:110394. [PMID: 40254120 DOI: 10.1016/j.exer.2025.110394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2025] [Revised: 04/16/2025] [Accepted: 04/17/2025] [Indexed: 04/22/2025]
Abstract
In certain ocular conditions, such as eyes with keratoconus or after corneal laser surgery, higher-order aberrations (HOAs) may be dramatically elevated. Accurately recording interpretable wavefronts in such highly aberrated eyes using a Shack-Hartmann sensor is challenging. While some studies have applied deep neural networks to Shack-Hartmann wavefront reconstruction, they have been limited to low-resolution, small-dynamic-range cases. In this study, we introduce a multi-task learning scheme for high-resolution, high-dynamic-range Shack-Hartmann wavefront reconstruction using a modified attention-UNet (HR-HDR-SHUNet), which outputs a wavefront map and Zernike coefficients simultaneously. HR-HDR-SHUNet was evaluated on three large datasets with different levels of HOAs (regularly, highly, and extremely aberrated), successfully reconstructing all aberrated wavefronts while achieving significantly higher accuracy than both traditional methods and other deep learning networks; it is also computationally more efficient than the latter.

Collapse
Affiliation(s)
- Yibin Tian
- College of Mechatronics and Control Engineering & State Key Laboratory of Radio Frequency Heterogenous Integration, Shenzhen University, Shenzhen, 518060, China
| | - Zipei Luo
- College of Mechatronics and Control Engineering & State Key Laboratory of Radio Frequency Heterogenous Integration, Shenzhen University, Shenzhen, 518060, China
| | - Dajiang Lu
- College of Mechatronics and Control Engineering & State Key Laboratory of Radio Frequency Heterogenous Integration, Shenzhen University, Shenzhen, 518060, China.
| | - Cheng Liu
- College of Mechatronics and Control Engineering & State Key Laboratory of Radio Frequency Heterogenous Integration, Shenzhen University, Shenzhen, 518060, China; Department of Optoelectric Information Science and Technology, School of Science, Jiangnan University, Wuxi, 214122, China
| | - Christine Wildsoet
- Wertheim School of Optometry and Vision Science, University of California, Berkeley, 94720, California, United States
| |
Collapse
|
24
|
Lian R, Li W, Hao J, Zhang Y, Jia F. Stereo Endoscopic Camera Pose Optimal Estimation by Structure Similarity Index Measure Integration. Int J Med Robot 2025; 21:e70078. [PMID: 40413787 DOI: 10.1002/rcs.70078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2024] [Revised: 04/24/2025] [Accepted: 05/16/2025] [Indexed: 05/27/2025]
Abstract
BACKGROUND Accurate endoscopic camera pose estimation is crucial for real-time AR navigation systems. While current methods primarily use depth and optical flow, they often ignore structural inconsistencies between images. METHODS Leveraging the RAFT framework, we process sequential stereo RGB pairs to extract optical flow and depth features for pose estimation. To address structural inconsistencies, we refine the weights for both 2D and 3D residuals by computing SSIM indices for the left and right views, as well as pre- and post-optical flow transformations. The SSIM metric is also used in the loss function. RESULTS Experiments on the StereoMIS dataset demonstrate our method's improved pose estimation accuracy compared to rigid SLAM methods, showing a lower accumulated trajectory error (ATE-RMSE: 18.5 mm). Additionally, ablation experiments achieved an 11.49% reduction in average error. CONCLUSION The pose estimation accuracy has been improved by incorporating SSIM. The code is available at: https://github.com/lianrq/pose-estimation-by-SSIM-Integration.
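A hedged sketch of the SSIM-weighting step in isolation: computing a local SSIM map between two views and using it as per-pixel residual weights. The RAFT-based pose optimizer it plugs into is not reproduced, and the function name is illustrative.

```python
import numpy as np
from skimage.metrics import structural_similarity

def ssim_residual_weights(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Per-pixel weights from a local SSIM map between two views.

    Structurally inconsistent pixels (low SSIM) receive low weight when the
    2D/3D residuals are aggregated; images assumed grayscale in [0, 1].
    """
    _, ssim_map = structural_similarity(img_a, img_b, full=True,
                                        data_range=1.0)
    return np.clip(ssim_map, 0.0, 1.0)   # 1 = consistent, 0 = inconsistent

weights = ssim_residual_weights(np.random.rand(64, 64), np.random.rand(64, 64))
```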
Collapse
Affiliation(s)
- Ruoqi Lian
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Wei Li
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Faculty of Data Science, City University of Macau, Macau, China
| | - Junchen Hao
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Software College, Northeastern University, Shenyang, China
| | - Yanfang Zhang
- Department of Interventional Radiology, Shenzhen People's Hospital, The Second Clinical Medical College, Jinan University, Shenzhen, China
| | - Fucang Jia
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- The Key Laboratory of Biomedical Imaging Science and System, Chinese Academy of Sciences, Shenzhen, China
| |
Collapse
|
25
|
Zotova D, Pinon N, Trombetta R, Bouet R, Jung J, Lartizien C. GAN-based synthetic FDG PET images from T1 brain MRI can serve to improve performance of deep unsupervised anomaly detection models. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2025; 265:108727. [PMID: 40187100 DOI: 10.1016/j.cmpb.2025.108727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/26/2024] [Revised: 02/13/2025] [Accepted: 03/14/2025] [Indexed: 04/07/2025]
Abstract
BACKGROUND AND OBJECTIVE Research in the cross-modal medical image translation domain has been very productive over the past few years in tackling the scarce availability of large curated multi-modality datasets, with promising performance from GAN-based architectures. However, only a few of these studies have assessed the task-based performance of the synthetic data, especially for training deep models. METHODS We design and compare different GAN-based frameworks for generating synthetic brain [18F]fluorodeoxyglucose (FDG) PET images from T1-weighted MRI data. We first perform standard qualitative and quantitative visual quality evaluation. Then, we further explore the impact of using these fake PET data to train a deep unsupervised anomaly detection (UAD) model designed to detect subtle epilepsy lesions in T1 MRI and FDG PET images. We introduce novel diagnostic task-oriented quality metrics for the synthetic FDG PET data tailored to our unsupervised detection task, then use these fake data to train a use-case UAD model combining deep representation learning based on siamese autoencoders with a one-class SVM (OC-SVM) density-support estimation model. This model is trained on normal subjects only and detects any deviation from the pattern of the normal population. We compare the detection performance of models trained on real T1 MR images of 35 normal subjects paired either with the 35 true PET images or with 35 synthetic PET images generated by the best-performing generative models. Performance analysis is conducted on 17 exams of epilepsy patients undergoing surgery. RESULTS The best-performing GAN-based models generate realistic fake PET images of control subjects, with SSIM and PSNR values around 0.9 and 23.8, respectively, and in distribution (ID) with regard to the true control dataset. The best UAD model trained on these synthetic normative PET data reaches 74% sensitivity. CONCLUSION Our results confirm that GAN-based models are the best suited for MR T1 to FDG PET translation, outperforming transformer and diffusion models. We also demonstrate the diagnostic value of these synthetic data for training UAD models, evaluated on clinical exams of epilepsy patients. Our code and the normative image dataset are available.
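The density-support estimation step can be sketched with scikit-learn's One-Class SVM on latent features. The siamese autoencoder producing the features is not shown, the arrays below are random placeholders, and the nu and kernel settings are assumptions rather than the paper's values.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Fit a One-Class SVM on latent features of normal subjects only, then
# score patient exams: deviations from the normal support are flagged.
features_normal = np.random.randn(35, 64)     # 35 controls, 64-D latents
features_patient = np.random.randn(17, 64)    # 17 epilepsy exams

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(features_normal)
scores = ocsvm.decision_function(features_patient)   # negative = anomalous
anomalous = scores < 0
```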
Collapse
Affiliation(s)
- Daria Zotova
- INSA Lyon, Université Claude Bernard Lyon 1, CNRS, Inserm, CREATIS UMR 5220, U1294, Lyon, F-69621, France
| | - Nicolas Pinon
- INSA Lyon, Université Claude Bernard Lyon 1, CNRS, Inserm, CREATIS UMR 5220, U1294, Lyon, F-69621, France
| | - Robin Trombetta
- INSA Lyon, Université Claude Bernard Lyon 1, CNRS, Inserm, CREATIS UMR 5220, U1294, Lyon, F-69621, France
| | - Romain Bouet
- Lyon Neuroscience Research Center, INSERM U1028, CNRS UMR5292, Univ Lyon 1, Bron, 69500, France
| | - Julien Jung
- Lyon Neuroscience Research Center, INSERM U1028, CNRS UMR5292, Univ Lyon 1, Bron, 69500, France
| | - Carole Lartizien
- INSA Lyon, Université Claude Bernard Lyon 1, CNRS, Inserm, CREATIS UMR 5220, U1294, Lyon, F-69621, France.
| |
Collapse
|
26
|
Peng K, Huang D, Chen Y. Retinal OCT image classification based on MGR-GAN. Med Biol Eng Comput 2025; 63:1749-1763. [PMID: 39862318 DOI: 10.1007/s11517-025-03286-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2024] [Accepted: 12/31/2024] [Indexed: 01/27/2025]
Abstract
Accurately classifying optical coherence tomography (OCT) images is essential for diagnosing and treating ophthalmic diseases. This paper introduces a novel generative adversarial network framework called MGR-GAN. The masked image modeling (MIM) method is integrated into the GAN model's generator, enhancing its ability to synthesize realistic images by reconstructing them from the unmasked patches. A ResNet-structured discriminator is employed to determine whether an image was produced by the generator. Through the adversarial game between generator and discriminator, the discriminator acquires high-level discriminant features, which are essential for precise OCT classification. Experimental results demonstrate that MGR-GAN achieves a classification accuracy of 98.4% on the original UCSD dataset. Because the trained generator can synthesize OCT images with high fidelity, and because the UCSD dataset suffers from category imbalance, the generated OCT images are leveraged to rebalance the dataset. After balancing the UCSD dataset, the classification accuracy further improves to 99%.
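A minimal PyTorch sketch of the masked image modeling step in isolation: randomly masking square patches so a generator can be trained to reconstruct the image from the visible ones. Patch size and mask ratio are illustrative assumptions; the reconstruction network itself is not shown.

```python
import torch

def random_patch_mask(imgs: torch.Tensor, patch: int = 16,
                      mask_ratio: float = 0.5) -> torch.Tensor:
    """Zero out random square patches, as in masked image modeling (MIM).

    imgs: (B, C, H, W) with H and W divisible by `patch`.
    """
    b, c, h, w = imgs.shape
    gh, gw = h // patch, w // patch
    keep = torch.rand(b, 1, gh, gw) > mask_ratio          # per-patch keep mask
    mask = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return imgs * mask

masked = random_patch_mask(torch.rand(4, 1, 256, 256))
```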
Collapse
Affiliation(s)
- Kun Peng
- School of Automation and Information Engineering, Sichuan University of Science & Engineering, Key Laboratory of Artificial Intelligence, Yibin, 644000, Sichuan, China
| | - Dan Huang
- School of Automation and Information Engineering, Sichuan University of Science & Engineering, Key Laboratory of Artificial Intelligence, Yibin, 644000, Sichuan, China.
| | - Yurong Chen
- School of Automation and Information Engineering, Sichuan University of Science & Engineering, Key Laboratory of Artificial Intelligence, Yibin, 644000, Sichuan, China
| |
Collapse
|
27
|
Zhang Y, Tian H, Wan M, Tang S, Ding Z, Huang W, Yang Y, Li W. High resolution photoacoustic vascular image reconstruction through the fast residual dense generative adversarial network. PHOTOACOUSTICS 2025; 43:100720. [PMID: 40241881 PMCID: PMC12000740 DOI: 10.1016/j.pacs.2025.100720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/20/2025] [Revised: 03/17/2025] [Accepted: 03/31/2025] [Indexed: 04/18/2025]
Abstract
Photoacoustic imaging is a powerful technique that provides high-resolution, deep tissue imaging. However, the time-intensive nature of photoacoustic microscopy (PAM) poses a significant challenge, especially when high-resolution images are required for real-time applications. In this study, we propose an optimized Fast Residual Dense Generative Adversarial Network (FRDGAN) for high-quality PAM reconstruction. On a mouse ear vasculature dataset, FRDGAN demonstrated superior performance in image quality, background noise suppression, and computational efficiency across multiple down-sampling scales (×4, ×8) compared with classical methods. Furthermore, in in vivo experiments on mouse cerebral vasculature, FRDGAN achieves improvements of 2.24 dB in peak signal-to-noise ratio and 0.0255 in structural similarity over SRGAN. Our FRDGAN method provides a promising solution for fast, high-quality PAM microvascular imaging in biomedical research.
Collapse
Affiliation(s)
- Yameng Zhang
- School of Computer Engineering, Nanjing Institute of Technology, Nanjing, Jiangsu 211167, China
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China
| | - Hua Tian
- School of Computer Engineering, Nanjing Institute of Technology, Nanjing, Jiangsu 211167, China
| | - Min Wan
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China
| | - Shihao Tang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China
| | - Ziyun Ding
- School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
| | - Wei Huang
- School of Computer Engineering, Nanjing Institute of Technology, Nanjing, Jiangsu 211167, China
| | - Yamin Yang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China
| | - Weitao Li
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China
| |
Collapse
|
28
|
He Y, Ruan D. An implicit neural deformable ray model for limited and sparse view-based spatiotemporal reconstruction. Med Phys 2025; 52:3959-3969. [PMID: 40038095 DOI: 10.1002/mp.17714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2024] [Revised: 01/30/2025] [Accepted: 01/30/2025] [Indexed: 03/06/2025] Open
Abstract
BACKGROUND Continuous spatiotemporal volumetric reconstruction is highly valuable, especially in radiation therapy, where tracking and calculating actual exposure during a treatment session is critical. This allows for accurate analysis of treatment outcomes, including patient response and toxicity in relation to delivered doses. However, continuous 4D imaging during radiotherapy is often unavailable due to radiation exposure concerns and hardware limitations. Most setups are limited to acquiring intermittent portal projections or images between treatment beams. PURPOSE This study addresses the challenge of spatiotemporal reconstruction from limited views by reconstructing patient-specific volumes from as few as 20 input views and continuous-time dynamic volumes from only two orthogonal x-ray projections. METHODS We introduce a novel implicit neural deformable ray (INDeR) model that uses a ray bundle coordinate system, embedding sparse view measurements into an implicit neural field. This method estimates real-time motion via efficient low-dimensional modulation, allowing for the deformation of ray bundles based on just two orthogonal x-ray projections. RESULTS The INDeR model demonstrates robust performance in image reconstruction and motion tracking, offering detailed visualization of structures like tumors and bronchial passages. With just 20 projection views, INDeR achieves a peak signal-to-noise ratio (PSNR) of 30.13 dB, outperforming methods such as FDK, PWLS-TV, and NAF by 13.93, 4.07, and 3.16 dB, respectively. When applied in real time, the model consistently delivers a PSNR higher than 27.41 dB using only two orthogonal projections. CONCLUSION The proposed INDeR framework successfully reconstructs continuous spatiotemporal representations from sparse views, achieving highly accurate reconstruction with as few as 20 projections and effective real-time tracking with two orthogonal views. This approach shows great potential for anatomical monitoring in radiation therapy.
Collapse
Affiliation(s)
- Yuanwei He
- Department of Radiation Oncology, University of California Los Angeles, Los Angeles, California, USA
| | - Dan Ruan
- Department of Radiation Oncology, University of California Los Angeles, Los Angeles, California, USA
- Department of Bioengineering, University of California Los Angeles, Los Angeles, California, USA
| |
Collapse
|
29
|
Yu B, Ozdemir S, Dong Y, Shao W, Pan T, Shi K, Gong K. Robust whole-body PET image denoising using 3D diffusion models: evaluation across various scanners, tracers, and dose levels. Eur J Nucl Med Mol Imaging 2025; 52:2549-2562. [PMID: 39912940 PMCID: PMC12119227 DOI: 10.1007/s00259-025-07122-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2024] [Accepted: 01/27/2025] [Indexed: 02/07/2025]
Abstract
PURPOSE Whole-body PET imaging plays an essential role in cancer diagnosis and treatment but suffers from low image quality. Traditional deep learning-based denoising methods work well for a specific acquisition but are less effective in handling diverse PET protocols. In this study, we proposed and validated a 3D Denoising Diffusion Probabilistic Model (3D DDPM) as a robust and universal solution for whole-body PET image denoising. METHODS The proposed 3D DDPM gradually injected noise into the images during the forward diffusion phase, allowing the model to learn to reconstruct the clean data during the reverse diffusion process. A 3D convolutional network was trained using high-quality data from the Biograph Vision Quadra PET/CT scanner to generate the score function, enabling the model to capture accurate PET distribution information extracted from the total-body datasets. The trained 3D DDPM was evaluated on datasets from four scanners, four tracer types, and six dose levels representing a broad spectrum of clinical scenarios. RESULTS The proposed 3D DDPM consistently outperformed 2D DDPM, 3D UNet, and 3D GAN, demonstrating its superior denoising performance across all tested conditions. Additionally, the model's uncertainty maps exhibited lower variance, reflecting its higher confidence in its outputs. CONCLUSIONS The proposed 3D DDPM can effectively handle various clinical settings, including variations in dose levels, scanners, and tracers, establishing it as a promising foundational model for PET image denoising. The trained 3D DDPM model of this work can be utilized off the shelf by researchers as a whole-body PET image denoising solution. The code and model are available at https://github.com/Miche11eU/PET-Image-Denoising-Using-3D-Diffusion-Model .
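The forward (noising) half of a DDPM, which the 3D model learns to invert during denoising, follows a closed form and can be sketched in a few lines of PyTorch. The linear beta schedule below is a common default, not necessarily the authors' setting.

```python
import torch

# Forward process: q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Draw x_t ~ q(x_t | x_0) for a 3D volume x0 of shape (B, 1, D, H, W)."""
    noise = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * noise

x0 = torch.rand(1, 1, 32, 64, 64)   # toy PET volume
xt = q_sample(x0, t=500)            # heavily noised sample
```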
Collapse
Affiliation(s)
- Boxiao Yu
- J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, Gainesville, FL, USA
| | - Savas Ozdemir
- Department of Radiology, University of Florida, Jacksonville, FL, USA
| | - Yafei Dong
- Yale PET Center, Yale School of Medicine, New Haven, CT, USA
| | - Wei Shao
- Department of Medicine, University of Florida, Gainesville, FL, USA
| | - Tinsu Pan
- Department of Imaging Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Kuangyu Shi
- Department of Nuclear Medicine, University of Bern, Bern, Switzerland
| | - Kuang Gong
- J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, Gainesville, FL, USA.
| |
Collapse
|
30
|
Koori N, Yamamoto S, Kamekawa H, Fuse H, Takahashi M, Miyakawa S, Sasaki K, Naruse R, Yasue K, Nosaka H, Takatsu Y, Saotome K, Kurata K. Comparison of image quality evaluation methods for magnetic resonance imaging using compressed sensing-sensitivity encoding (CS-SENSE). Radiol Phys Technol 2025; 18:597-605. [PMID: 40353935 DOI: 10.1007/s12194-025-00911-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2025] [Revised: 04/22/2025] [Accepted: 04/24/2025] [Indexed: 05/14/2025]
Abstract
This study aimed to compare quantitative image quality values with visual scores for images acquired using the CS-SENSE method. T1-weighted images (T1WI) and T2-weighted images (T2WI) were acquired using a phantom created with a 3D printer. Quantitative values (signal-to-noise ratio [SNR], contrast-to-noise ratio [CNR], structural similarity [SSIM], and scale-invariant feature transform [SIFT]) and a visual evaluation score (VES) were calculated from the acquired images, and the correlation coefficients among the quantitative values and the VES were computed. The analysis clarified how image quality evaluation methods differ between T1WI and T2WI images acquired with CS-SENSE. Variations in image quality, as reflected by the VES in T1WI and T2WI images obtained via the CS-SENSE method, can be quantitatively assessed. Specifically, CNR is effective for evaluating changes in T1WI, while SNR, CNR, and SIFT are suitable for assessing variations in T2WI.
Collapse
Affiliation(s)
- Norikazu Koori
- Department of Radiological Technology, Faculty of Medical Technology, Niigata University of Health and Welfare, 1398 Shimami-cho, Niigata city, Niigata, 950-3198, Japan.
| | - Shohei Yamamoto
- Department of Radiology, Tsuchiura Kyodo General Hospital, 4-1-1 Otsuno, Tsuchiura, Ibaraki, 300-0028, Japan
| | - Hiroki Kamekawa
- Department of Radiology, Komaki City Hospital, 1-20 Jyoubushi, Komaki, Aichi, 485-8520, Japan
| | - Hiraku Fuse
- Department of Radiological Sciences, Ibaraki Prefectural University of Health Sciences, 4669-2 Ami, Ibaraki, 300-0394, Japan
| | - Masato Takahashi
- Department of Radiological Sciences, Ibaraki Prefectural University of Health Sciences, 4669-2 Ami, Ibaraki, 300-0394, Japan
| | - Shin Miyakawa
- Department of Radiological Sciences, Ibaraki Prefectural University of Health Sciences, 4669-2 Ami, Ibaraki, 300-0394, Japan
| | - Kota Sasaki
- Department of Radiological Sciences, Ibaraki Prefectural University of Health Sciences, 4669-2 Ami, Ibaraki, 300-0394, Japan
| | - Reina Naruse
- Department of Radiology, Shinshu University School of Medicine, 3-1-1 Asahi, Matsumoto , Nagano, 390-8621, Japan
| | - Kenji Yasue
- Department of Radiological Sciences, Ibaraki Prefectural University of Health Sciences, 4669-2 Ami, Ibaraki, 300-0394, Japan
| | - Hiroki Nosaka
- Department of Radiological Sciences, Ibaraki Prefectural University of Health Sciences, 4669-2 Ami, Ibaraki, 300-0394, Japan
| | - Yasuo Takatsu
- Graduate School of Medical Science, Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake, Aichi, 470-1192, Japan
| | - Kosaku Saotome
- Department of Radiological Sciences, Ibaraki Prefectural University of Health Sciences, 4669-2 Ami, Ibaraki, 300-0394, Japan
| | - Kazuma Kurata
- Department of Radiology, Komaki City Hospital, 1-20 Jyoubushi, Komaki, Aichi, 485-8520, Japan
| |
Collapse
|
31
|
Whitbread L, Laurenz S, Palmer LJ, Jenkinson M, The Alzheimer's Disease Neuroimaging Initiative. Deep-Diffeomorphic Networks for Conditional Brain Templates. Hum Brain Mapp 2025; 46:e70229. [PMID: 40372124 PMCID: PMC12079767 DOI: 10.1002/hbm.70229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2024] [Revised: 03/06/2025] [Accepted: 04/20/2025] [Indexed: 05/16/2025] Open
Abstract
Deformable brain templates are an important tool in many neuroimaging analyses. Conditional templates (e.g., age-specific templates) have advantages over single population templates by enabling improved registration accuracy and capturing common processes in brain development and degeneration. Conventional methods require large, evenly spread cohorts to develop conditional templates, limiting their ability to create templates that could reflect richer combinations of clinical and demographic variables. More recent deep-learning methods, which can infer relationships in very high-dimensional spaces, open up the possibility of producing conditional templates that are jointly optimised for these richer sets of conditioning parameters. We have built on recent deep-learning template generation approaches using a diffeomorphic (topology-preserving) framework to create a purely geometric method of conditional template construction that learns diffeomorphisms between: (i) a global or group template and conditional templates, and (ii) conditional templates and individual brain scans. We evaluated our method, as well as other recent deep-learning approaches, on a data set of cognitively normal (CN) participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI), using age as the conditioning parameter of interest. We assessed the effectiveness of these networks at capturing age-dependent anatomical differences. Our results demonstrate that while the assessed deep-learning methods have a number of strengths, they require further refinement to capture morphological changes in ageing brains with an acceptable degree of accuracy. The volumetric output of our method, and other recent deep-learning approaches, across four brain structures (grey matter, white matter, the lateral ventricles and the hippocampus), was measured and showed that although each of the methods captured some changes well, each method was unable to accurately track changes in all of the volumes. However, as our method is purely geometric, it was able to produce T1-weighted conditional templates with high spatial fidelity and with consistent topology as age varies, making these conditional templates advantageous for spatial registrations. The use of diffeomorphisms in these deep-learning methods represents an important strength of these approaches, as they can produce conditional templates that can be explicitly linked, geometrically, across age as well as to fixed, unconditional templates or brain atlases. The use of deep learning in conditional template generation provides a framework for creating templates for more complex sets of conditioning parameters, such as pathologies and demographic variables, in order to facilitate a broader application of conditional brain templates in neuroimaging studies. This can aid researchers and clinicians in their understanding of how brain structure changes over time and under various interventions, with the ultimate goal of improving the calibration of treatments and interventions in personalised medicine. The code to implement our conditional brain template network is available at: github.com/lwhitbread/deep-diff.
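Diffeomorphic deep-learning frameworks of this kind typically integrate a stationary velocity field into a topology-preserving deformation by scaling and squaring. A PyTorch sketch of that standard integration step (a generic illustration, not the authors' exact implementation) follows.

```python
import torch
import torch.nn.functional as F

def warp(field: torch.Tensor, disp: torch.Tensor) -> torch.Tensor:
    """Warp `field` (B, 2, H, W) by displacement `disp` (B, 2, H, W), in pixels."""
    b, _, h, w = field.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1, 2, H, W)
    coords = base + disp
    # normalize to [-1, 1] for grid_sample (x first, then y)
    gx = 2 * coords[:, 0] / (w - 1) - 1
    gy = 2 * coords[:, 1] / (h - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)                       # (B, H, W, 2)
    return F.grid_sample(field, grid, align_corners=True)

def scaling_and_squaring(velocity: torch.Tensor, steps: int = 7) -> torch.Tensor:
    """Integrate a stationary velocity field into a diffeomorphic displacement
    by repeated self-composition (standard scaling-and-squaring)."""
    disp = velocity / (2 ** steps)
    for _ in range(steps):
        disp = disp + warp(disp, disp)
    return disp

disp = scaling_and_squaring(torch.zeros(1, 2, 64, 64))   # identity transform
```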
Collapse
Affiliation(s)
- Luke Whitbread
- Australian Institute for Machine Learning (AIML), The University of Adelaide, Adelaide, Australia
- South Australian Health and Medical Research Institute (SAHMRI), Adelaide, Australia
- School of Computer and Mathematical Sciences, The University of Adelaide, Adelaide, Australia
| | - Stephan Laurenz
- Australian Institute for Machine Learning (AIML), The University of Adelaide, Adelaide, Australia
- South Australian Health and Medical Research Institute (SAHMRI), Adelaide, Australia
- School of Computer and Mathematical Sciences, The University of Adelaide, Adelaide, Australia
| | - Lyle J. Palmer
- Australian Institute for Machine Learning (AIML), The University of Adelaide, Adelaide, Australia
- School of Public Health, The University of Adelaide, Adelaide, Australia
| | - Mark Jenkinson
- Australian Institute for Machine Learning (AIML), The University of Adelaide, Adelaide, Australia
- South Australian Health and Medical Research Institute (SAHMRI), Adelaide, Australia
- School of Computer and Mathematical Sciences, The University of Adelaide, Adelaide, Australia
| | | |
Collapse
|
32
|
Liu Q, Zhang W, Zhang Y, Han X, Lin Y, Li X, Chen K. DGEDDGAN: A dual-domain generator and edge-enhanced dual discriminator generative adversarial network for MRI reconstruction. Magn Reson Imaging 2025; 119:110381. [PMID: 40064245 DOI: 10.1016/j.mri.2025.110381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2024] [Revised: 01/08/2025] [Accepted: 03/05/2025] [Indexed: 03/14/2025]
Abstract
Magnetic resonance imaging (MRI), a critical clinical tool in medical imaging, requires long scan times to produce high-quality images. To accelerate MRI while reconstructing high-quality images with sharper edges and fewer aliasing artifacts, a novel dual-domain generator and edge-enhanced dual-discriminator generative adversarial network named DGEDDGAN is proposed for MRI reconstruction, in which one discriminator is responsible for holistic image reconstruction while the other enhances edge preservation. A dual-domain U-Net structure that cascades the frequency domain and the image domain is designed for the generator. Densely connected residual blocks replace the traditional U-Net convolution blocks to improve feature reuse while overcoming the vanishing-gradient problem. A coordinate attention mechanism in each skip connection effectively reduces the loss of spatial information and strengthens feature selection. Extensive experiments on two publicly available datasets, i.e., the IXI dataset and CC-359, demonstrate that the proposed method reconstructs high-quality MRI images with more edge detail and fewer artifacts, outperforming several state-of-the-art methods under various sampling rates and masks. Single-image reconstruction takes under 13 ms, meeting the demand for fast processing.
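For reference, a simplified coordinate-attention block of the kind used in the skip connections can be written compactly in PyTorch. The reduction ratio and pooling choices here are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Simplified coordinate attention: factorizes spatial attention into
    height-wise and width-wise components to preserve positional information."""
    def __init__(self, ch: int, reduction: int = 8):
        super().__init__()
        mid = max(ch // reduction, 4)
        self.conv1 = nn.Conv2d(ch, mid, 1)
        self.act = nn.ReLU()
        self.conv_h = nn.Conv2d(mid, ch, 1)
        self.conv_w = nn.Conv2d(mid, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                      # (B, C, H, 1)
        pool_w = x.mean(dim=2, keepdim=True).transpose(2, 3)      # (B, C, W, 1)
        y = self.act(self.conv1(torch.cat([pool_h, pool_w], dim=2)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                     # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))     # (B, C, 1, W)
        return x * a_h * a_w

out = CoordinateAttention(32)(torch.rand(1, 32, 64, 64))
```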
Collapse
Affiliation(s)
- Qiaohong Liu
- School of Medical Instruments, Shanghai University of Medicine and Health Sciences, Shanghai, China.
| | - Weikun Zhang
- School of Medical Instruments, Shanghai University of Medicine and Health Sciences, Shanghai, China; School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Yuting Zhang
- ToolSensing Technologies Co., Ltd AI Technology Research Group, Chengdu, China
| | - Xiaoxiang Han
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Yuanjie Lin
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Xinyu Li
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Keyan Chen
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| |
Collapse
|
33
|
Ito T, Hitomi K, Ljungberg M, Kawasaki S, Katayama Y, Kato A, Tsuchikame H, Suzuki K, Miyazaki K, Mogi R. Accuracy of a whole-body single-photon emission computed tomography with a thallium-bromide detector: Verification via Monte Carlo simulations. Med Phys 2025; 52:4079-4095. [PMID: 40017160 DOI: 10.1002/mp.17724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2024] [Revised: 01/17/2025] [Accepted: 02/12/2025] [Indexed: 03/01/2025] Open
Abstract
BACKGROUND Single-photon emission computed tomography (SPECT) devices equipped with cadmium-zinc-telluride (CZT) detectors achieve high contrast resolution because of their enhanced energy resolution. Recently, thallium bromide (TlBr) has gained attention as a detector material because of its high atomic number and density. PURPOSE This study evaluated the clinical applicability of a SPECT system equipped with TlBr detectors using Monte Carlo simulations, focusing on 99mTc and 177Lu imaging. METHODS This study used the Simulation of Imaging Nuclear Detectors Monte Carlo program to compare the imaging characteristics of a whole-body SPECT system equipped with TlBr detectors (T-SPECT) with those of a system equipped with CZT detectors (C-SPECT). The simulations were performed using a three-dimensional brain phantom and a National Electrical Manufacturers Association body phantom to evaluate 99mTc and 177Lu imaging. The simulation parameters were set accurately by comparison with actual measurements. RESULTS The T-SPECT system demonstrated improved energy resolution and higher detection efficiency than the C-SPECT system. In 99mTc imaging, T-SPECT demonstrated 1.71 times higher photopeak counts and improved contrast resolution. T-SPECT exhibited a significantly lower impact of hole tailing and higher energy resolution (4.50% for T-SPECT vs. 7.34% for C-SPECT). Furthermore, T-SPECT showed higher peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values, indicating better image quality. In 177Lu imaging, T-SPECT showed 2.76 times higher photopeak counts and improved energy resolution (3.94% for T-SPECT vs. 5.20% for C-SPECT). T-SPECT demonstrated a higher contrast recovery coefficient (CRC) and contrast-to-noise ratio (CNR) across all acquisition times, maintaining sufficient counts even with shorter acquisition times. Moreover, T-SPECT retained more low-frequency power in the power spectral density (PSD), indicating more accurate reproduction of internal image structure. CONCLUSIONS T-SPECT offers superior energy resolution and detection efficiency compared with C-SPECT, and can provide higher contrast resolution and sensitivity in clinical imaging with 99mTc and 177Lu. Furthermore, Monte Carlo simulation is confirmed to be a valuable guide for the development of T-SPECT.
Collapse
Affiliation(s)
- Toshimune Ito
- Department of Medical Radiological, Faculty of Medical Technology, Teikyo University, Tokyo, Japan
| | - Keitaro Hitomi
- Department of Quantum Science and Energy Engineering, Graduate School of Engineering, Tohoku University, Sendai, Japan
| | | | - Sousei Kawasaki
- Department of Radiology, Nippon Medical School Hospital, Tokyo, Japan
| | - Yuka Katayama
- Department of Radiological Technology, Showa University Hospital, Tokyo, Japan
| | - Akane Kato
- Department of Radiology, Institute of Science Tokyo Hospital, Tokyo, Japan
| | - Hirotatsu Tsuchikame
- Department of Radiology, Saiseikai Yokohamashi Tobu Hospital, Yokohama, Kanagawa, Japan
| | - Kentaro Suzuki
- Department of Radiological Technology, Toranomon Hospital, Tokyo, Japan
| | - Kyosuke Miyazaki
- Department of Radiology, Kawasaki Municipal Kawasaki Hospital, Kawasaki, Kanagawa, Japan
| | - Ritsushi Mogi
- Department of Medical Radiological, Faculty of Medical Technology, Teikyo University, Tokyo, Japan
| |
Collapse
|
34
|
Slioussarenko C, Baudin P, Marty B. A steady-state MR fingerprinting sequence optimization framework applied to the fast 3D quantification of fat fraction and water T1 in the thigh muscles. Magn Reson Med 2025; 93:2623-2639. [PMID: 40033965 PMCID: PMC11971504 DOI: 10.1002/mrm.30490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2024] [Revised: 01/20/2025] [Accepted: 02/18/2025] [Indexed: 03/05/2025]
Abstract
PURPOSE The aim of this study was to develop an optimization framework to shorten GRE-based MRF sequences while maintaining similar parameter estimation quality. METHODS An optimization framework accounting for the steady-state initial longitudinal magnetization and undersampling artifacts, and mitigating overfitting by drawing from a realistic numerical thigh phantom database, was developed and validated on numerical simulations and 10 healthy volunteers. RESULTS The sequences optimized with the proposed framework reduced the original sequence duration by 30% (8 s per repetition instead of 11.2 s) while showing improved accuracy (SSIM rising from 96% to 99% for the fat fraction (FF) and from 93% to 96% for water T1 (T1H2O) on numerical simulations) and precision, especially when compared with sequences optimized through other means. CONCLUSIONS The proposed framework paves the way for fast 3D quantification of FF and T1H2O in the skeletal muscle.
Collapse
Affiliation(s)
| | - Pierre‐Yves Baudin
- Neuromuscular Investigation Center, NMR Laboratory, Institute of Myology, Paris Cedex 13, France
| | - Benjamin Marty
- Neuromuscular Investigation Center, NMR Laboratory, Institute of Myology, Paris Cedex 13, France
| |
Collapse
|
35
|
Hussain J, Båth M, Ivarsson J. Generative adversarial networks in medical image reconstruction: A systematic literature review. Comput Biol Med 2025; 191:110094. [PMID: 40198987 DOI: 10.1016/j.compbiomed.2025.110094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2024] [Revised: 01/12/2025] [Accepted: 03/25/2025] [Indexed: 04/10/2025]
Abstract
PURPOSE Recent advancements in generative adversarial networks (GANs) have demonstrated substantial potential in medical image processing. Despite this progress, reconstructing images from incomplete data remains a challenge, impacting image quality. This systematic literature review explores the use of GANs in enhancing and reconstructing medical imaging data. METHOD A document survey of computing literature was conducted using the ACM Digital Library to identify relevant articles from journals and conference proceedings using keyword combinations, such as "generative adversarial networks or generative adversarial network," "medical image or medical imaging," and "image reconstruction." RESULTS Across the reviewed articles, there were 122 datasets used in 175 instances, 89 top metrics employed 335 times, 10 different tasks with a total count of 173, 31 distinct organs featured in 119 instances, and 18 modalities utilized in 121 instances, collectively depicting significant utilization of GANs in medical imaging. The adaptability and efficacy of GANs were showcased across diverse medical tasks, organs, and modalities, utilizing top public as well as private/synthetic datasets for disease diagnosis, including the identification of conditions like cancer in different anatomical regions. The study highlights GANs' increasing integration and adaptability across radiology modalities, showcasing their transformative impact on diagnostic techniques, including cross-modality tasks. The intricate interplay among network size, batch size, and loss-function refinement significantly impacts GAN performance, although challenges in training persist. CONCLUSIONS The study underscores GANs as dynamic tools shaping medical imaging, contributing significantly to image quality and training methodologies, and positioning them as substantial components driving medical advancement.
Collapse
Affiliation(s)
- Jabbar Hussain
- Dept. of Applied IT, University of Gothenburg, Forskningsgången 6, 417 56, Sweden.
| | - Magnus Båth
- Department of Medical Radiation Sciences, University of Gothenburg, Sweden
| | - Jonas Ivarsson
- Dept. of Applied IT, University of Gothenburg, Forskningsgången 6, 417 56, Sweden
| |
Collapse
|
36
|
Lu Y, Chen N. Editorial for "Incorporating Radiologist Knowledge Into MRI Quality Metrics for Machine Learning Using Rank-Based Ratings". J Magn Reson Imaging 2025; 61:2585-2586. [PMID: 39902710 DOI: 10.1002/jmri.29728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2024] [Revised: 01/17/2025] [Accepted: 01/21/2025] [Indexed: 02/06/2025] Open
Affiliation(s)
- Yao Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, China
| | - Ninghao Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| |
Collapse
|
37
|
Li Z, Zhang X, Li G, Peng J, Su X. Light scattering imaging modal expansion cytometry for label-free single-cell analysis with deep learning. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2025; 264:108726. [PMID: 40112688 DOI: 10.1016/j.cmpb.2025.108726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/14/2025] [Revised: 03/02/2025] [Accepted: 03/14/2025] [Indexed: 03/22/2025]
Abstract
BACKGROUND AND OBJECTIVE Single-cell imaging plays a key role in various fields, including drug development, disease diagnosis, and personalized medicine. To obtain multi-modal information from a single-cell image, especially for label-free cells, this study develops modal expansion cytometry for label-free single-cell analysis. METHODS The study utilizes a deep learning-based architecture to expand single-mode light scattering images into multi-modality images, including bright-field (non-fluorescent) and fluorescence images, for label-free single-cell analysis. By combining adversarial loss, L1 distance loss, and VGG perceptual loss, a new network optimization method is proposed. The effectiveness of this method is verified by experiments on simulated images, standard spheres of different sizes, and multiple cell types (such as cervical cancer and leukemia cells). Additionally, the capability of this method in single-cell analysis is assessed through multi-modal cell classification experiments, such as cervical cancer subtyping. RESULTS The approach was demonstrated using both cervical cancer cells and leukemia cells. The expanded bright-field and fluorescence images derived from the light scattering images align closely with those obtained through conventional microscopy, showing a contour ratio near 1 for both the whole cell and its nucleus. Using machine learning, the subtyping of cervical cancer cells achieved 92.85% accuracy with the modal expansion images, an improvement of nearly 20% over single-mode light scattering images. CONCLUSIONS This study demonstrates that light scattering imaging modal expansion cytometry with deep learning can expand single-mode light scattering images into artificial multimodal images of label-free single cells, which not only provides cell visualization but also aids cell classification, showing great potential in single-cell analysis applications such as cancer cell diagnosis.
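A hedged PyTorch sketch of the combined objective (adversarial + L1 + VGG perceptual): the loss weights are placeholders, a torchvision VGG16 (weights API of torchvision ≥ 0.13) stands in for the paper's perceptual network, and 3-channel inputs are assumed.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

# Frozen VGG16 feature extractor for the perceptual term (3-channel inputs).
vgg_feat = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_feat.parameters():
    p.requires_grad_(False)

bce = nn.BCEWithLogitsLoss()

def generator_loss(fake_logits, fake_img, real_img,
                   w_adv=1.0, w_l1=100.0, w_vgg=10.0):
    """Adversarial + L1 + VGG perceptual loss; weights are illustrative."""
    adv = bce(fake_logits, torch.ones_like(fake_logits))
    l1 = (fake_img - real_img).abs().mean()
    perc = ((vgg_feat(fake_img) - vgg_feat(real_img)) ** 2).mean()
    return w_adv * adv + w_l1 * l1 + w_vgg * perc
```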
Collapse
Affiliation(s)
- Zhi Li
- School of Integrated Circuits, Shandong University, Jinan 250101, China; Institute of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
| | - Xiaoyu Zhang
- Department of Hematology, Qilu Hospital of Shandong University, Jinan 250012, China
| | - Guosheng Li
- Department of Hematology, Qilu Hospital of Shandong University, Jinan 250012, China
| | - Jun Peng
- Department of Hematology, Qilu Hospital of Shandong University, Jinan 250012, China
| | - Xuantao Su
- School of Integrated Circuits, Shandong University, Jinan 250101, China.
| |
Collapse
|
38
|
Xu Q, Ma Y, Lu Z, Bi K. DP-ID: Interleaving and Denoising to Improve the Quality of DNA Storage Image. Interdiscip Sci 2025; 17:306-320. [PMID: 39578306 DOI: 10.1007/s12539-024-00671-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2024] [Revised: 10/22/2024] [Accepted: 10/23/2024] [Indexed: 11/24/2024]
Abstract
In the field of storing images in DNA, code tables and universal error correction codes can mitigate the effect of base errors to a certain extent. However, they prove ineffective against indels (insertion and deletion errors), resulting in a decline in information density and in the quality of the reconstructed image. This paper proposes a novel encoding and decoding method named DP-ID for storing images in DNA that improves information density and the quality of the reconstructed image. First, the image is compressed into bitstreams by a dynamic programming algorithm. Second, the resulting bitstreams are mapped to DNA and interleaved. The reconstructed image is obtained by applying median filtering to remove salt-and-pepper noise. Simulation results show that the image reconstructed by DP-ID at a 5% error rate is better than those reconstructed by other methods at a 1% error rate. This robustness to high error rates makes DP-ID compatible with the biological constraints left unsatisfied at high information density. Wet experiments show that DP-ID can reconstruct high-quality images at 5X sequencing depth. The high information density and low sequencing depth significantly reduce the cost of DNA storage, facilitating large-scale storage of images in DNA.
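Two of the ingredients, interleaving to break up error bursts and median filtering to remove residual salt-and-pepper noise, are standard and can be sketched with NumPy and SciPy. This is a generic illustration, not the DP-ID codec itself.

```python
import numpy as np
from scipy.ndimage import median_filter

def interleave(rows: np.ndarray) -> np.ndarray:
    """Simple row/column interleaving: a burst of consecutive errors in one
    transmitted row is spread one symbol each across many original rows."""
    return rows.T.copy()

def deinterleave(rows: np.ndarray) -> np.ndarray:
    return rows.T.copy()

# After decoding, residual indel-induced errors resemble isolated
# salt-and-pepper noise, which a small median filter removes.
noisy = np.random.rand(128, 128)
noisy[np.random.rand(128, 128) < 0.05] = 1.0      # salt
noisy[np.random.rand(128, 128) < 0.05] = 0.0      # pepper
clean = median_filter(noisy, size=3)
```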
Collapse
Affiliation(s)
- Qi Xu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
| | - Yitong Ma
- Monash University Joint Graduate School, Southeast University, Suzhou, 215123, China
| | - Zuhong Lu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
| | - Kun Bi
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China.
| |
Collapse
|
39
|
Wang S, Chu T, Wasi M, Guerra RM, Yuan X, Wang L. Prognostic assessment of osteolytic lesions and mechanical properties of bones bearing breast cancer using neural network and finite element analysis ☆. MECHANOBIOLOGY IN MEDICINE 2025; 3:100130. [PMID: 40395772 PMCID: PMC12067881 DOI: 10.1016/j.mbm.2025.100130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/14/2024] [Revised: 03/13/2025] [Accepted: 03/25/2025] [Indexed: 05/22/2025]
Abstract
The management of skeletal-related events (SREs), particularly the prevention of pathological fractures, is crucial for cancer patients. Current clinical assessment of fracture risk is mostly based on medical images, but incorporating sequential images into the assessment remains challenging. This study addressed this issue by leveraging a comprehensive dataset of 260 longitudinal micro-computed tomography (μCT) scans acquired in normal and breast cancer-bearing mice. A machine learning (ML) model based on a spatial-temporal neural network was built to forecast bone structures from previous μCT scans; the forecasts had an overall similarity coefficient (Dice) of 0.814 with the ground truths. Although the predicted lesion volumes (18.5% ± 15.3%) underestimated the ground truths (22.1% ± 14.8%) by ∼21%, the time course of lesion growth was better represented in the predicted images than in the preceding scans (10.8% ± 6.5%). Under virtual biomechanical testing using finite element analysis (FEA), the predicted bone structures recapitulated the load-carrying behaviors of the ground truth structures with a positive correlation (y = 0.863x) and a high coefficient of determination (R2 = 0.955). Interestingly, the compliances of the predicted and ground truth structures demonstrated nearly identical linear relationships with the lesion volumes. In summary, we have demonstrated that bone deterioration can be proficiently predicted using machine learning in our preclinical dataset, suggesting the importance of large longitudinal clinical imaging datasets for fracture risk assessment in cancer bone metastasis.
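The Dice similarity coefficient reported for the forecast structures is a standard overlap measure; a small NumPy sketch follows.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary volumes."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

print(dice(np.ones((8, 8, 8)), np.ones((8, 8, 8))))   # 1.0 for perfect overlap
```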
Collapse
Affiliation(s)
- Shubo Wang
- Department of Mechanical Engineering, University of Delaware, Newark, DE 19716, USA
| | - Tiankuo Chu
- Department of Mechanical Engineering, University of Delaware, Newark, DE 19716, USA
| | - Murtaza Wasi
- Department of Mechanical Engineering, University of Delaware, Newark, DE 19716, USA
| | - Rosa M. Guerra
- Department of Biomedical Engineering, University of Delaware, Newark, DE 19716, USA
| | - Xu Yuan
- Department of Computer Science, University of Delaware, Newark, DE 19716, USA
| | - Liyun Wang
- Department of Mechanical Engineering, University of Delaware, Newark, DE 19716, USA
| |
Collapse
|
40
|
Tang C, Eisenmenger LB, Rivera‐Rivera L, Huo E, Junn JC, Kuner AD, Oechtering TH, Peret A, Starekova J, Johnson KM. Incorporating Radiologist Knowledge Into MRI Quality Metrics for Machine Learning Using Rank-Based Ratings. J Magn Reson Imaging 2025; 61:2572-2584. [PMID: 39690114 PMCID: PMC12063763 DOI: 10.1002/jmri.29672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2024] [Revised: 11/18/2024] [Accepted: 11/19/2024] [Indexed: 12/19/2024] Open
Abstract
BACKGROUND Deep learning (DL) often requires an image quality metric; however, widely used metrics are not designed for medical images. PURPOSE To develop an image quality metric that is specific to MRI using radiologists' image rankings and DL models. STUDY TYPE Retrospective. POPULATION A total of 19,344 rankings on 2916 unique image pairs from the NYU fastMRI Initiative neuro database were used for training the neural network-based image quality metrics, with an 80%/20% training/validation split and fivefold cross-validation. FIELD STRENGTH/SEQUENCE 1.5 T and 3 T T1, T1 postcontrast, T2, and FLuid Attenuated Inversion Recovery (FLAIR). ASSESSMENT Synthetically corrupted image pairs were ranked by radiologists (N = 7), with a subset also scoring images using a Likert scale (N = 2). DL models were trained to match rankings using two architectures (EfficientNet and IQ-Net), with and without reference image subtraction, and compared to ranking based on mean squared error (MSE) and structural similarity (SSIM). The image quality assessing DL models were then evaluated as alternatives to MSE and SSIM as optimization targets for DL denoising and reconstruction. STATISTICAL TESTS Radiologists' agreement was assessed by a percentage metric and quadratic weighted Cohen's kappa. Ranking accuracies were compared using repeated measures analysis of variance. Reconstruction models trained with the IQ-Net score, MSE, and SSIM were compared by paired t test. P < 0.05 was considered significant. RESULTS Compared to direct Likert scoring, ranking produced a higher level of agreement between radiologists (70.4% vs. 25%). Image ranking was subjective, with a high level of intraobserver agreement (94.9% ± 2.4%) and lower interobserver agreement (61.47% ± 5.51%). IQ-Net and EfficientNet accurately predicted rankings with a reference image (75.2% ± 1.3% and 79.2% ± 1.7%). However, EfficientNet produced images with artifacts and high MSE when used in denoising tasks, while IQ-Net-optimized networks performed well for both denoising and reconstruction. DATA CONCLUSION Image quality networks can be trained from image rankings and used to optimize DL tasks. LEVEL OF EVIDENCE 3. TECHNICAL EFFICACY Stage 1.
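Learning a scalar quality score from pairwise rankings is commonly done with a margin ranking loss; the PyTorch sketch below uses a placeholder linear scorer rather than IQ-Net or EfficientNet, and the margin value is an assumption.

```python
import torch
import torch.nn as nn

# Placeholder scorer: maps a 64x64 image to a scalar quality score.
quality_net = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1))
rank_loss = nn.MarginRankingLoss(margin=0.1)
opt = torch.optim.Adam(quality_net.parameters(), lr=1e-4)

img_preferred = torch.rand(8, 1, 64, 64)   # image ranked higher by readers
img_other = torch.rand(8, 1, 64, 64)
target = torch.ones(8)                     # +1: first input should score higher

score_a = quality_net(img_preferred).squeeze(1)
score_b = quality_net(img_other).squeeze(1)
loss = rank_loss(score_a, score_b, target)
loss.backward()
opt.step()
```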
Collapse
Affiliation(s)
- Chenwei Tang
- Department of Medical Physics, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
| | - Laura B. Eisenmenger
- Department of Radiology, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
| | - Leonardo Rivera‐Rivera
- Department of Medical Physics, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
- Department of Medicine, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
| | - Eugene Huo
- Department of Radiology, University of California, San Francisco, California, USA
| | - Jacqueline C. Junn
- Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Anthony D. Kuner
- Department of Radiology, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
| | - Thekla H. Oechtering
- Department of Radiology, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
- Department of Radiology and Nuclear Medicine, Universität zu Lübeck, Lübeck, Germany
| | - Anthony Peret
- Department of Radiology, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
| | - Jitka Starekova
- Department of Radiology, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
| | - Kevin M. Johnson
- Department of Medical Physics, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
- Department of Radiology, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
| |
Collapse
|
41
|
Ye X, Ma X, Pan Z, Zhang Z, Guo H, Uğurbil K, Wu X. Denoising complex-valued diffusion MR images using a two-step, nonlocal principal component analysis approach. Magn Reson Med 2025; 93:2473-2487. [PMID: 40079233 PMCID: PMC11980993 DOI: 10.1002/mrm.30502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2024] [Revised: 01/17/2025] [Accepted: 02/25/2025] [Indexed: 03/14/2025]
Abstract
PURPOSE To propose a two-step, nonlocal principal component analysis (PCA) method and demonstrate its utility for denoising complex diffusion MR images with a few diffusion directions. METHODS A two-step denoising pipeline was implemented to ensure accurate patch selection even at high noise levels and was coupled with data preprocessing for g-factor normalization and phase stabilization before the data were denoised with a nonlocal PCA algorithm. At the heart of the proposed pipeline is a data-driven optimal shrinkage algorithm that manipulates the singular values so as to optimally estimate the noise-free signal. The denoising performance of our approach was evaluated using simulation and in vivo human data experiments, and the results were compared with those obtained with existing local PCA-based methods. RESULTS In both simulation and human data experiments, our approach substantially enhanced image quality relative to the noisy counterpart, yielding improved estimation of relevant diffusion tensor imaging metrics. It outperformed existing local PCA-based methods in reducing noise while preserving anatomic detail, and it also led to improved whole-brain tractography relative to the noisy counterpart. CONCLUSION The proposed denoising method improves image quality for diffusion MRI with a few diffusion directions and is expected to benefit many applications, especially those aiming to achieve high-quality parametric mapping using only a few image volumes.
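To make the singular-value manipulation concrete, here is a minimal sketch of PCA denoising over a stack of similar patches. The paper's data-driven optimal shrinkage is replaced by simple hard thresholding at a rough noise-matrix edge; the threshold formula, sizes, and data are illustrative assumptions only.

```python
import numpy as np

def denoise_patch_stack(patches: np.ndarray, sigma: float) -> np.ndarray:
    """Denoise a stack of similar patches (n_patches x patch_len) via PCA.

    Singular values below a noise-derived threshold are zeroed; this hard
    thresholding is a crude stand-in for the paper's optimal shrinkage.
    """
    mean = patches.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(patches - mean, full_matrices=False)
    n, p = patches.shape
    threshold = sigma * (np.sqrt(n) + np.sqrt(p))  # rough noise singular-value edge
    s_shrunk = np.where(s > threshold, s, 0.0)
    return u @ np.diag(s_shrunk) @ vt + mean

# Toy example: 50 noisy copies of a smooth 1D signal patch.
rng = np.random.default_rng(1)
signal = np.outer(np.ones(50), np.sin(np.linspace(0, np.pi, 64)))
noisy = signal + 0.3 * rng.standard_normal(signal.shape)
denoised = denoise_patch_stack(noisy, sigma=0.3)
print("residual RMS:", np.sqrt(np.mean((denoised - signal) ** 2)))
```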
Collapse
Affiliation(s)
- Xinyu Ye
- Center for Biomedical Imaging Research, School of Biomedical Engineering, Tsinghua University, Beijing, China
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
| | - Xiaodong Ma
- Department of Radiology and Imaging Sciences, University of Utah, Salt Lake City, Utah, United States
| | - Ziyi Pan
- Center for Biomedical Imaging Research, School of Biomedical Engineering, Tsinghua University, Beijing, China
| | - Zhe Zhang
- Tiantan Neuroimaging Center of Excellence, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
| | - Hua Guo
- Center for Biomedical Imaging Research, School of Biomedical Engineering, Tsinghua University, Beijing, China
| | - Kamil Uğurbil
- Center for Magnetic Resonance Research, Radiology, Medical School, University of Minnesota, Minneapolis, Minnesota
| | - Xiaoping Wu
- Center for Magnetic Resonance Research, Radiology, Medical School, University of Minnesota, Minneapolis, Minnesota
| |
Collapse
|
42
|
Liu CK, Chang HY, Huang HM. Dual-energy CT-based virtual monoenergetic imaging via unsupervised learning. Phys Eng Sci Med 2025:10.1007/s13246-025-01560-y. [PMID: 40448904 DOI: 10.1007/s13246-025-01560-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2024] [Accepted: 05/13/2025] [Indexed: 06/02/2025]
Abstract
Since its development, virtual monoenergetic imaging (VMI) derived from dual-energy computed tomography (DECT) has proven valuable in many clinical applications. However, DECT-based VMI shows increased noise at low keV levels. In this study, we propose an unsupervised learning method to generate VMI from DECT, meaning that no labeled training data (i.e., high-quality VMIs) are required. Specifically, DECT images were fed into a deep learning (DL)-based model expected to output VMI. Based on the theory that VMI obtained from image-space data is a linear combination of DECT images, we used the model output (i.e., the predicted VMI) to recalculate the DECT images. By minimizing the difference between the measured and recalculated DECT images, the DL-based model constrains itself to generate VMI from DECT images. We investigated whether the proposed DL-based method can improve the quality of VMIs. Experimental results obtained from patient data showed that the DL-based VMIs had better image quality than the conventional DECT-based VMIs. Moreover, the CT number differences between the DECT-based and DL-based VMIs were distributed within ±10 HU for bone and ±5 HU for brain, fat, and muscle. Except for bone, no statistically significant difference in CT number measurements was found between the DECT-based and DL-based VMIs (p > 0.01). Our preliminary results show that DL has the potential to generate high-quality VMIs directly from DECT in an unsupervised manner.
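The self-supervision hinges on the linear relation between image-space VMIs and the DECT pair. A minimal PyTorch sketch of such a training step follows; the mixing matrix, network, and tensor shapes are illustrative assumptions (the actual coefficients depend on scanner calibration and are not given in the abstract).

```python
import torch
import torch.nn as nn

# Assumed mixing coefficients: each DECT image (low/high energy) is taken to
# be a linear combination of VMIs at two keV levels. Values are made up.
MIX = torch.tensor([[0.7, 0.3],
                    [0.4, 0.6]])  # rows: DECT images, cols: VMI energies

class TinyVMINet(nn.Module):
    """Toy model mapping a 2-channel DECT pair to VMIs at 2 keV levels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

model = TinyVMINet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

dect = torch.randn(4, 2, 64, 64)   # measured low/high-energy images
vmi = model(dect)                  # predicted VMIs, shape (B, 2, H, W)
# Recalculate the DECT pair from the predicted VMIs and penalize the
# mismatch; no high-quality VMI labels are needed anywhere.
recalc = torch.einsum('de,behw->bdhw', MIX, vmi)
loss = nn.functional.mse_loss(recalc, dect)
loss.backward()
opt.step()
```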
Collapse
Affiliation(s)
- Chi-Kuang Liu
- Department of Medical Imaging, Changhua Christian Hospital, 135 Nanxiao St., Changhua, 500, Taiwan
| | - Hui-Yu Chang
- Institute of Medical Device and Imaging, College of Medicine, National Taiwan University, No.1, Sec. 1, Jen Ai Rd., Zhongzheng Dist., Taipei City, 100, Taiwan
| | - Hsuan-Ming Huang
- Institute of Medical Device and Imaging, College of Medicine, National Taiwan University, No.1, Sec. 1, Jen Ai Rd., Zhongzheng Dist., Taipei City, 100, Taiwan.
- Program for Precision Health and Intelligent Medicine, Graduate School of Advanced Technology, National Taiwan University, No.1, Sec. 1, Jen Ai Rd., Zhongzheng Dist., Taipei City, 100, Taiwan.
| |
Collapse
|
43
|
Eldeen S, Ramirez AFG, Keresteci B, Chang PD, Botvinick EL. Label-Free Prediction of Fluorescently Labeled Fibrin Networks. Biomater Res 2025; 29:0211. [PMID: 40438124 PMCID: PMC12117218 DOI: 10.34133/bmr.0211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2024] [Revised: 04/07/2025] [Accepted: 04/26/2025] [Indexed: 06/01/2025] Open
Abstract
While fluorescent labeling has been the standard for visualizing fibers within fibrillar scaffold models of the extracellular matrix (ECM), fluorescent dyes can compromise cell viability and photobleach prematurely. The intricate fibrillar composition of the ECM is crucial for its viscoelastic properties, which regulate intracellular signaling and provide structural support for cells. Naturally derived biomaterials such as fibrin and collagen replicate these fibrillar structures, but longitudinal confocal imaging of fibers using fluorescent dyes may impact cell function and photobleach the sample long before the experiment ends. An alternative technique, reflection confocal microscopy (RCM), provides high-resolution images of fibers. However, RCM is sensitive to fiber orientation relative to the optical axis, and consequently many fibers are not detected. We aim to recover these fibers. Here, we propose a deep learning tool for predicting fluorescently labeled optical sections from unlabeled image stacks. Specifically, our model is conditioned to reproduce fluorescent labeling using RCM images at three laser wavelengths and a single laser transmission image. The model is implemented using a fully convolutional image-to-image mapping architecture with a hybrid loss function that includes both low-dimensional statistical and high-dimensional structural components. Upon convergence, the proposed method accurately recovers three-dimensional fibrous architecture without substantial differences in fiber length or fiber count, although the predicted fibers were slightly wider than the original fluorescent labels (0.213 ± 0.009 μm). The model can be implemented on any commercial laser scanning microscope, enabling wide use in the study of ECM biology.
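The hybrid loss idea, combining an image-level statistical term with a pixel-wise structural term, can be sketched in a few lines. The exact components used in the paper are not reproduced here; the mean/std matching and L1 terms below, and the weighting alpha, are illustrative stand-ins.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(pred: torch.Tensor, target: torch.Tensor, alpha: float = 0.5):
    """Illustrative hybrid loss: a low-dimensional statistical term (matching
    intensity mean and std) plus a high-dimensional structural term (L1)."""
    stat = (pred.mean() - target.mean()).abs() + (pred.std() - target.std()).abs()
    struct = F.l1_loss(pred, target)
    return alpha * stat + (1 - alpha) * struct

pred = torch.rand(2, 1, 128, 128, requires_grad=True)
target = torch.rand(2, 1, 128, 128)
loss = hybrid_loss(pred, target)
loss.backward()
print(float(loss))
```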
Collapse
Affiliation(s)
- Sarah Eldeen
- Department of Mathematical, Computational, and Systems Biology, University of California, Irvine, Irvine, CA, USA
| | - Andres Felipe Guerrero Ramirez
- Department of Mathematical, Computational, and Systems Biology, University of California, Irvine, Irvine, CA, USA
- Department of Radiological Sciences and Computer Sciences, University of California, Irvine, Irvine, CA, USA
| | - Bora Keresteci
- Department of Biomedical Engineering, University of California, Irvine, Irvine, CA, USA
| | - Peter D. Chang
- Department of Radiological Sciences and Computer Sciences, University of California, Irvine, Irvine, CA, USA
| | - Elliot L. Botvinick
- Department of Biomedical Engineering, University of California, Irvine, Irvine, CA, USA
- Beckman Laser Institute and Medical Clinic, University of California, Irvine, Irvine, CA, USA
- Edwards Lifesciences Foundation Cardiovascular Innovation and Research Center, University of California, Irvine, Irvine, CA, USA
- Department of Surgery, University of California, Irvine, Irvine, CA, USA
| |
Collapse
|
44
|
Read ML, Hodgetts CJ, Lawrence AD, Evans CJ, Singh KD, Umla-Runge K, Graham KS. Multimodal MEG and Microstructure-MRI Investigations of the Human Hippocampal Scene Network. J Neurosci 2025; 45:e1700242025. [PMID: 40228895 PMCID: PMC12121706 DOI: 10.1523/jneurosci.1700-24.2025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2024] [Revised: 02/28/2025] [Accepted: 03/03/2025] [Indexed: 04/16/2025] Open
Abstract
Although several studies have demonstrated that perceptual discrimination of complex scenes relies on an extended hippocampal posteromedial system, we currently have limited insight into the specific functional and structural properties of this system in humans. Here, combining electrophysiological (magnetoencephalography) and advanced microstructural (multishell diffusion magnetic resonance imaging; quantitative magnetization transfer) imaging in healthy human adults (30 females/10 males), we show that both theta power modulation of the hippocampus and fiber restriction/hindrance (reflecting axon packing/myelination) of the fornix (a major input/output pathway of the hippocampus) were independently related to scene, but not face, perceptual discrimination accuracy. Conversely, microstructural features of the inferior longitudinal fasciculus (a long-range occipitoanterotemporal tract) correlated with face, but not scene, perceptual discrimination accuracy. Our results provide new mechanistic insight into the neurocognitive systems underpinning complex scene discrimination, providing novel support for the idea of multiple processing streams within the human medial temporal lobe.
Collapse
Affiliation(s)
- Marie-Lucie Read
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff CF24 4HQ, United Kingdom
| | - Carl J Hodgetts
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff CF24 4HQ, United Kingdom
- Department of Psychology, Royal Holloway, University of London, Surrey TW20 0EX, United Kingdom
| | - Andrew D Lawrence
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff CF24 4HQ, United Kingdom
- School of Philosophy, Psychology and Language Sciences, The University of Edinburgh, Edinburgh EH8 9JZ, United Kingdom
| | - C John Evans
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff CF24 4HQ, United Kingdom
| | - Krish D Singh
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff CF24 4HQ, United Kingdom
| | - Katja Umla-Runge
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff CF24 4HQ, United Kingdom
- School of Medicine, Cardiff University, Cardiff CF14 4XN, United Kingdom
| | - Kim S Graham
- School of Philosophy, Psychology and Language Sciences, The University of Edinburgh, Edinburgh EH8 9JZ, United Kingdom
| |
Collapse
|
45
|
Wyrzykowska M, Della Maggiora G, Deshpande N, Mokarian A, Yakimovich A. A Benchmark for Virus Infection Reporter Virtual Staining in Fluorescence and Brightfield Microscopy. Sci Data 2025; 12:886. [PMID: 40436865 PMCID: PMC12120016 DOI: 10.1038/s41597-025-05194-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2024] [Accepted: 05/13/2025] [Indexed: 06/01/2025] Open
Abstract
Detecting virus-infected cells in light microscopy requires a reporter signal, commonly achieved by immunohistochemistry or genetic engineering. While classification-based machine learning approaches to the detection of virus-infected cells have been proposed, their results lack the nuance of a continuous signal. Such a signal can be achieved by virtual staining. Yet, while this technique has been growing rapidly in importance, the virtual staining of virus-infected cells remains largely uncharted. In this work, we propose a benchmark and datasets to address this gap. We collate microscopy datasets containing a panel of viruses of diverse biology and reporters, obtained with a variety of magnifications and imaging modalities. Next, we explore the virus infection reporter virtual staining (VIRVS) task, employing U-Net and pix2pix architectures as prototypical regressive and generative models. Together, our work provides a comprehensive benchmark for VIRVS and defines a new challenge at the interface of data science and virology.
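The regressive half of the benchmark amounts to image-to-image regression from unlabeled channels to the reporter channel. A minimal encoder-decoder training step is sketched below as a stand-in for the benchmark's U-Net; channel counts, depth, and data are illustrative only.

```python
import torch
import torch.nn as nn

class TinyStainNet(nn.Module):
    """Toy encoder-decoder mapping input imaging channels to a reporter channel."""
    def __init__(self, in_ch: int = 3, out_ch: int = 1):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, out_ch, 2, stride=2),
        )
    def forward(self, x):
        return self.dec(self.enc(x))

model = TinyStainNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
inputs = torch.randn(2, 3, 256, 256)    # brightfield/fluorescence input stack
reporter = torch.randn(2, 1, 256, 256)  # infection-reporter target channel
loss = nn.functional.l1_loss(model(inputs), reporter)
loss.backward()
opt.step()
```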
Collapse
Affiliation(s)
- Maria Wyrzykowska
- Center for Advanced Systems Understanding (CASUS), Görlitz, Germany
- Helmholtz-Zentrum Dresden-Rossendorf e. V. (HZDR), Dresden, Germany
- Institute of Computer Science, University of Wrocław, Wrocław, Poland
| | - Gabriel Della Maggiora
- Center for Advanced Systems Understanding (CASUS), Görlitz, Germany
- Helmholtz-Zentrum Dresden-Rossendorf e. V. (HZDR), Dresden, Germany
- School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
| | - Nikita Deshpande
- Center for Advanced Systems Understanding (CASUS), Görlitz, Germany
- Helmholtz-Zentrum Dresden-Rossendorf e. V. (HZDR), Dresden, Germany
| | - Ashkan Mokarian
- Center for Advanced Systems Understanding (CASUS), Görlitz, Germany
- Helmholtz-Zentrum Dresden-Rossendorf e. V. (HZDR), Dresden, Germany
| | - Artur Yakimovich
- Center for Advanced Systems Understanding (CASUS), Görlitz, Germany.
- Helmholtz-Zentrum Dresden-Rossendorf e. V. (HZDR), Dresden, Germany.
- Institute of Computer Science, University of Wrocław, Wrocław, Poland.
| |
Collapse
|
46
|
Fang K, Zhang Q, Wan C, Lv P, Yuan C. Single view generalizable 3D reconstruction based on 3D Gaussian splatting. Sci Rep 2025; 15:18468. [PMID: 40425711 PMCID: PMC12117079 DOI: 10.1038/s41598-025-03200-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2025] [Accepted: 05/19/2025] [Indexed: 05/29/2025] Open
Abstract
3D Gaussian Splatting (3DGS) has become a significant research focus in recent years, particularly for 3D reconstruction and novel view synthesis under non-ideal conditions. Among these studies, tasks involving sparse input data have been further classified, the most challenging scenario being the reconstruction of 3D structures and the synthesis of novel views from a single input image. In this paper, we introduce SVG3D, a 3DGS-based method for generalizable 3D reconstruction from a single view. We use a state-of-the-art monocular depth estimator to obtain depth maps of the scenes. These depth maps, along with the original scene images, are fed into a U-Net, which predicts the parameters of a 3D Gaussian ellipsoid for each pixel. Unlike previous work, we do not stratify the predicted 3D Gaussian ellipsoids but allow the network to learn their positioning autonomously. This design enables accurate geometric representation when rendering from the target camera view, significantly improving novel view synthesis accuracy. We trained our model on the RealEstate10K dataset and performed quantitative and qualitative analyses on the test set. We compared single-view novel view synthesis methods across different 3D representation techniques, including methods based on Multi-Plane Image (MPI) representation, hybrid MPI and Neural Radiance Field representation, and the current state-of-the-art 3DGS-based methods. These comparisons substantiate the effectiveness and accuracy of our method. Additionally, to assess the generalizability of our network, we validated it on the NYU and KITTI datasets, and the results confirmed its robust cross-dataset generalization capability.
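A key step implicit in this pipeline is lifting the predicted depth map to candidate 3D positions (e.g., per-pixel Gaussian means) via standard pinhole unprojection. The sketch below shows only that step; the camera intrinsics and depth values are made up, and the paper's actual parameter prediction is not reproduced.

```python
import numpy as np

def unproject_depth(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Lift a depth map to per-pixel 3D points in camera coordinates.
    These points could serve as means of per-pixel 3D Gaussians."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)  # shape (H, W, 3)

depth = np.full((240, 320), 2.0)  # toy depth map: 2 m everywhere
points = unproject_depth(depth, fx=300.0, fy=300.0, cx=160.0, cy=120.0)
print(points.shape)  # (240, 320, 3)
```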
Collapse
Affiliation(s)
- Kun Fang
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
| | - Qinghui Zhang
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China.
| | - Chenxia Wan
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
| | - Pengtao Lv
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
| | - Cheng Yuan
- Henan Center for Fair Competition Review Affairs, Zhengzhou, 450008, China
| |
Collapse
|
47
|
Li H, Chien J, Gutchess A, Sekuler R. Visual short-term memory, culture, and image structure. Atten Percept Psychophys 2025:10.3758/s13414-025-03094-7. [PMID: 40426006 DOI: 10.3758/s13414-025-03094-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/07/2025] [Indexed: 05/29/2025]
Abstract
Cultural differences in cognition, including visual perception and long-term memory, may arise because typical visual environments differ across cultures, particularly in their spatial scale. Consequently, the influence of culture on cognitive processing depends on whether stimuli are presented at a large or small spatial scale. We tested North American and East Asian young adults to determine whether such cultural differences extend to short-term memory, testing for the first time whether spatial frequency information contributes to cross-cultural differences in memory. Test materials were images of natural and constructed scenes whose spatial structure was manipulated by low-pass filtering. Several seconds after briefly viewing a target scene, a subject saw three versions of that scene: the target itself and two variants whose low-pass filtering differed from the target's. From these three, the subject selected the image identical to the target. The two groups did not differ in overall recognition accuracy, but they did differ in the way they mistook nonmatching images for certain targets. Specifically, North American subjects made reliably fewer errors in matching images whose high-frequency content was intact, providing evidence that cultural differences in the prioritization of high spatial frequency information extend to short-term memory. Across both groups, subjects were highly accurate at recognizing images that retained all or most of their high spatial frequency content and were highly sensitive to different levels of spatial filtering. These findings show that visual memory has sufficient fidelity to support fine discrimination of variation in spatial frequency.
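The stimulus manipulation, low-pass filtering at a chosen spatial frequency cutoff, can be sketched as a frequency-domain mask. The cutoff and pixels-per-degree values below are illustrative; the study's exact filter parameters are not reproduced here.

```python
import numpy as np

def low_pass_filter(image: np.ndarray, cutoff_cpd: float, ppd: float) -> np.ndarray:
    """Zero out spatial frequencies above a cutoff given in cycles/degree.
    `ppd` converts from cycles/pixel to cycles/degree for a given display."""
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None] * ppd  # vertical frequencies, cycles/degree
    fx = np.fft.fftfreq(w)[None, :] * ppd  # horizontal frequencies, cycles/degree
    mask = np.sqrt(fx**2 + fy**2) <= cutoff_cpd
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))

rng = np.random.default_rng(2)
scene = rng.random((256, 256))                       # toy grayscale scene
blurred = low_pass_filter(scene, cutoff_cpd=2.0, ppd=40.0)
```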
Collapse
Affiliation(s)
- Huilin Li
- Department of Psychology, Brandeis University, 415 South Street, MS 062, Waltham, MA, 02454, USA
| | - Jessie Chien
- Department of Psychology, Brandeis University, 415 South Street, MS 062, Waltham, MA, 02454, USA
| | - Angela Gutchess
- Department of Psychology, Brandeis University, 415 South Street, MS 062, Waltham, MA, 02454, USA.
| | - Robert Sekuler
- Department of Psychology, Brandeis University, 415 South Street, MS 062, Waltham, MA, 02454, USA
| |
Collapse
|
48
|
Nimoh D, Acquah I, Wordui E. Rational design of alternative natural-based coupling media for diagnostic ultrasound imaging: a review. Biomed Phys Eng Express 2025; 11:042001. [PMID: 40359961 DOI: 10.1088/2057-1976/add7e2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2025] [Accepted: 05/13/2025] [Indexed: 05/15/2025]
Abstract
Ultrasound imaging is an indispensable diagnostic and screening tool in healthcare, renowned for its non-invasive nature, real-time visualization, and use of non-ionizing radiation. It plays a vital role in obstetrics and gynaecology by significantly reducing maternal mortality and enhancing patient care. In addition to its use in obstetrics, ultrasound is used to guide biopsies and to evaluate various diseases such as liver cirrhosis, thyroid disorders, and kidney stones. Point-of-care ultrasonography has proven to be increasingly beneficial in low-resource settings. However, the availability and cost of commercial ultrasound gels pose significant challenges. Alternative natural-based gels formulated from locally sourced materials have emerged as viable substitutes. This review critically examines alternative natural-based ultrasound gels, focusing on their physicochemical properties, formulation procedures, and the limitations associated with their use in diagnostic imaging. Furthermore, it presents a rational design approach that methodically selects the ingredients based on their properties and interactions to formulate these gels for imaging applications. This offers a promising pathway for indigenous manufacturers to develop gels that meet ideal performance criteria, ensuring better imaging outcomes and wider acceptability in clinical practice.
Collapse
Affiliation(s)
- Dennis Nimoh
- Biomedical Engineering Program, College of Engineering, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
| | - Isaac Acquah
- Biomedical Engineering Program, College of Engineering, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
| | - Elizabeth Wordui
- Radiological and Medical Sciences Research Institute, Ghana Atomic Energy Commission, Accra, Ghana
| |
Collapse
|
49
|
Wang Z, Zhou S, Zhang Y, Lin J, Lin J, Zhu M, Ng TK, Yang W, Wang G. Application of generative adversarial networks in the restoration of blurred optical coherence tomography images caused by optical media opacity in eyes. BMJ Open Ophthalmol 2025; 10:e001987. [PMID: 40425199 PMCID: PMC12107585 DOI: 10.1136/bmjophth-2024-001987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2024] [Accepted: 05/05/2025] [Indexed: 05/29/2025] Open
Abstract
PURPOSE To assess the application of generative adversarial networks (GANs) in restoring blurred optical coherence tomography (OCT) images caused by optical media opacity in eyes. METHODS In this cross-sectional study, a spectral-domain OCT (Zeiss Cirrus 5000, Germany) was used to scan the macula of 510 eyes from 272 Chinese subjects. Optical media opacity was simulated with an algorithm for the training set (420 normal eyes). Images for the three test sets came from 56 normal eyes before and after fitting a neutral density filter (NDF), 34 eyes before and after cataract surgery, and 90 eyes processed by the algorithm. A pix2pix GAN was trained with the training set and used to restore the blurred images in the test sets. The structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) were used to evaluate the performance of the GAN. RESULTS PSNR for the test sets before and after image restoration was 18.37±0.44 and 19.94±0.29 for the NDF set (p<0.01), 16.65±0.99 and 16.91±0.26 for the cataract set (p=0.68), and 18.33±0.55 and 20.83±0.41 for the algorithm-regenerated set (p<0.01), respectively. SSIM for the test sets before and after image restoration was 0.85±0.02 and 1.00±0.00 for the NDF set (p<0.01), 0.92±0.07 and 0.97±0.02 for the cataract set (p<0.01), and 0.86±0.02 and 0.99±0.01 for the algorithm-regenerated set (p<0.01), respectively. CONCLUSIONS GANs can be used to restore blurred OCT images caused by optical media opacity in eyes. Future studies are warranted to optimise this technique before its application in clinical practice.
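The PSNR/SSIM evaluation used here is standard and easy to reproduce with scikit-image; the sketch below uses synthetic arrays in place of the restored and reference OCT B-scans.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Compare a restored image against its clear-media reference, mirroring
# the paper's PSNR/SSIM comparison. Arrays here are synthetic placeholders.
rng = np.random.default_rng(3)
reference = rng.random((512, 512)).astype(np.float64)
restored = np.clip(reference + 0.05 * rng.standard_normal(reference.shape), 0, 1)

psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
ssim = structural_similarity(reference, restored, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")
```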
Collapse
Affiliation(s)
- Zhengfang Wang
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, China
- Qingyuan People's Hospital, Qingyuan, China
| | - Shuang Zhou
- School of Physics and Optoelectronic Engineering, Hainan University, Haikou, China
| | - Yeye Zhang
- School of Physics and Optoelectronic Engineering, Hainan University, Haikou, China
| | - Jianwei Lin
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, China
| | - Jinyan Lin
- Department of Physics, Shantou University, Shantou, China
| | - Ming Zhu
- School of Physics and Optoelectronic Engineering, Hainan University, Haikou, China
| | - Tsz Kin Ng
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, China
| | - Weifeng Yang
- School of Physics and Optoelectronic Engineering, Hainan University, Haikou, China
- Center for Theoretical Physics, Hainan University, Haikou, China
| | - Geng Wang
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, China
| |
Collapse
|
50
|
Hwang SH, Kim YJ, Cho JB, Kim KG, Nam DH. Digital image enhancement using deep learning algorithm in 3D heads-up vitreoretinal surgery. Sci Rep 2025; 15:18429. [PMID: 40419711 DOI: 10.1038/s41598-025-98801-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2024] [Accepted: 04/15/2025] [Indexed: 05/28/2025] Open
Abstract
This study aims to predict optimal imaging parameters using a deep learning algorithm in 3D heads-up vitreoretinal surgery and to assess its effectiveness in improving vitreoretinal surface visibility during surgery. To develop the deep learning algorithm, we utilized 212 manually optimized still images extracted from epiretinal membrane (ERM) surgical videos. These images were applied to a two-stage architecture combining a generative adversarial network (GAN) and a convolutional neural network (CNN). The algorithm's performance was evaluated using the peak signal-to-noise ratio (PSNR) and the structural similarity index map (SSIM), and the degree of surgical image enhancement was evaluated using sharpness, brightness, and contrast values. A survey was conducted to evaluate the intraoperative suitability of the optimized images. For an in vitro experiment, 121 anonymized high-resolution ERM fundus images were optimized on a 3D display using the algorithm. The PSNR and SSIM values were 34.59 ± 5.34 and 0.88 ± 0.08, respectively. The algorithm enhanced the sharpness, brightness, and contrast values of the surgical images. In the in vitro experiment, both the ERM size and the color contrast ratio increased significantly in the optimized fundus images. Both surgical and fundus images were digitally enhanced using the deep learning algorithm, which can potentially be applied to 3D heads-up vitreoretinal surgeries.
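Simple proxies for the reported enhancement metrics can be computed in a few lines; the variance-of-Laplacian sharpness, mean brightness, and RMS contrast below are common stand-ins, and the paper's exact definitions may differ.

```python
import numpy as np
from scipy.ndimage import laplace

def image_stats(img: np.ndarray) -> dict:
    """Illustrative sharpness/brightness/contrast proxies for a grayscale frame."""
    return {
        "sharpness": float(laplace(img.astype(np.float64)).var()),  # edge energy
        "brightness": float(img.mean()),                            # mean intensity
        "contrast": float(img.std()),                               # RMS contrast
    }

rng = np.random.default_rng(4)
frame = rng.random((480, 640))  # toy surgical video frame
print(image_stats(frame))
```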
Collapse
Affiliation(s)
- Sung Ha Hwang
- Department of Ophthalmology, Gachon University Gil Medical Center, 21 Namdong-daero 774-beon-gil, Namdong-gu, Incheon, 21565, Republic of Korea.
| | - Young Jae Kim
- Department of Pre-Medicine, Gachon University, Incheon, Korea
| | - Jae Bok Cho
- Medical Device R&D Center, Gachon University, Incheon, Korea
| | - Kwang Gi Kim
- Department of Biomedical Engineering, Gil Medical Center, College of Medicine, Gachon University, 21 Namdong-daero 774-beon-gil, Namdong-gu, Incheon, 21565, Republic of Korea.
| | - Dong Heun Nam
- Department of Ophthalmology, Gachon University Gil Medical Center, 21 Namdong-daero 774-beon-gil, Namdong-gu, Incheon, 21565, Republic of Korea.
| |
Collapse
|