151. Rogalski M, Arcab P, Wdowiak E, Picazo-Bueno JÁ, Micó V, Józwik M, Trusiak M. Hybrid Iterating-Averaging Low Photon Budget Gabor Holographic Microscopy. ACS Photonics 2025; 12:1771-1782. PMID: 40255508; PMCID: PMC12007103; DOI: 10.1021/acsphotonics.4c01863.
Abstract
Achieving high-contrast, label-free imaging with minimal impact on live cell culture behavior remains a primary challenge in quantitative phase imaging (QPI). By enabling imaging under low illumination intensities (low photon budget, LPB), it is possible to minimize cell photostimulation, phototoxicity, and photodamage while supporting long-term and high-speed observations. However, LPB imaging introduces significant difficulties in QPI due to high levels of camera shot noise and quantization noise. Digital in-line holographic microscopy (DIHM) is a QPI technique known for its robustness against LPB data. However, the simultaneous minimization of shot noise and of the twin-image perturbation inherent to DIHM remains a critical challenge. In this study, we present the iterative Gabor averaging (IGA) algorithm, a novel approach that integrates iterative phase retrieval with frame averaging to effectively suppress both twin-image disturbance and shot noise in multiframe DIHM. The IGA algorithm achieves this by leveraging an iterative process that reconstructs high-fidelity phase images while selectively averaging camera shot noise across frames. Our simulations demonstrate that IGA consistently outperforms conventional methods, achieving superior reconstruction accuracy, particularly under high-noise conditions. Experimental validations involving high-speed imaging of dynamic sperm cells and a static phase test target measured under low illumination further confirmed IGA's efficacy. The algorithm also proved successful for optically thin samples, which often yield low signal-to-noise holograms even at high photon budgets. These advancements make IGA a powerful tool for photostimulation-free, high-speed imaging of dynamic biological samples and enhance the ability to image samples with extremely low optical thickness, potentially transforming biomedical and environmental applications in low-light settings.
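To make the iterative-plus-averaging idea concrete, here is a minimal sketch of a Gabor-style multiframe reconstruction: angular-spectrum back-propagation alternated with an object-plane constraint, with shot noise suppressed by averaging the hologram stack. The propagation routine, the weak-object constraint, and all parameter names are illustrative assumptions, not the authors' exact IGA algorithm.

```python
import numpy as np

def angular_spectrum(field, z, wavelength, dx):
    """Free-space propagation of a complex field over distance z."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, dx)
    fy = np.fft.fftfreq(ny, dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2j * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))  # evanescent terms clamped
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(kz * z))

def iga_sketch(holograms, z, wavelength, dx, n_iter=20):
    """holograms: (n_frames, H, W) intensity stack of the same scene."""
    amp = np.sqrt(holograms.mean(axis=0))            # frame averaging suppresses shot noise
    field = amp.astype(complex)
    for _ in range(n_iter):
        obj = angular_spectrum(field, -z, wavelength, dx)                 # sensor -> object
        obj = np.minimum(np.abs(obj), 1.0) * np.exp(1j * np.angle(obj))   # weak-object constraint
        field = angular_spectrum(obj, z, wavelength, dx)                  # object -> sensor
        field = amp * np.exp(1j * np.angle(field))   # reimpose measured (averaged) amplitude
    return np.angle(angular_spectrum(field, -z, wavelength, dx))          # retrieved phase
```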
Affiliation(s)
- Mikolaj Rogalski
  - Warsaw University of Technology, Institute of Micromechanics and Photonics, 8 Sw. A. Boboli St., 02-525 Warsaw, Poland
- Piotr Arcab
  - Warsaw University of Technology, Institute of Micromechanics and Photonics, 8 Sw. A. Boboli St., 02-525 Warsaw, Poland
- Emilia Wdowiak
  - Warsaw University of Technology, Institute of Micromechanics and Photonics, 8 Sw. A. Boboli St., 02-525 Warsaw, Poland
- José Ángel Picazo-Bueno
  - Departamento de Óptica y Optometría y Ciencias de la Visión, Universidad de Valencia, C/Doctor Moliner 50, 46100 Burjassot, Spain
  - Biomedical Technology Center, University of Muenster, Mendelstr. 17, D-48149 Muenster, Germany
- Vicente Micó
  - Departamento de Óptica y Optometría y Ciencias de la Visión, Universidad de Valencia, C/Doctor Moliner 50, 46100 Burjassot, Spain
- Michal Józwik
  - Warsaw University of Technology, Institute of Micromechanics and Photonics, 8 Sw. A. Boboli St., 02-525 Warsaw, Poland
- Maciej Trusiak
  - Warsaw University of Technology, Institute of Micromechanics and Photonics, 8 Sw. A. Boboli St., 02-525 Warsaw, Poland
152. Yuan Q, Yang G, Lyu R. Aesthetic Judgment in Calligraphic Tracing: The Dominant Role of Dynamic Features. Behav Sci (Basel) 2025; 15:525. PMID: 40282149; PMCID: PMC12024151; DOI: 10.3390/bs15040525.
Abstract
Aesthetic judgment in visual arts has traditionally focused on static features, yet research suggests that dynamic features also shape aesthetic experience. This study examines the dominance of dynamic features in calligraphic tracing aesthetics. Using a custom-designed calligraphy acquisition system, we recorded calligraphy experts and novices imitating Chinese characters and presented their works in three formats: static result sequence video (s), pen-holding writing video (f), and brushstroke trajectory video (b). Participants then rated the stimuli on aesthetic dimensions. Results show that stimuli containing motion cues (f and b) received significantly higher ratings than static stimuli (s), confirming the positive role of dynamic features. Additionally, traced results maintained high structural similarity across writers, and the predictive power of static features for aesthetic scores was limited, confirming the weak influence of static features on the aesthetics of calligraphic tracing. In conclusion, this study reveals that dynamic features play a dominant role in aesthetic judgment within the context of calligraphic tracing. These findings contribute to aesthetic modeling, proposing that observers dynamically adjust the weighting of static and dynamic features based on aesthetic context to form aesthetic judgments, thereby offering a novel perspective for research on aesthetic cognition mechanisms.
Affiliation(s)
- Qian Yuan
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214125, China
- Guoying Yang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214125, China
- Ruimin Lyu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214125, China
  - Jiangsu Key University Laboratory of Software and Media Technology under Human-Computer Cooperation, Jiangnan University, Wuxi 214125, China
153. Huang Y, Zhu X, Yuan F, Shi J, U K, Qin J, Kong X, Peng Y. A Mamba U-Net Model for Reconstruction of Extremely Dark RGGB Images. Sensors (Basel) 2025; 25:2464. PMID: 40285153; PMCID: PMC12030951; DOI: 10.3390/s25082464.
Abstract
Currently, most images captured by high-pixel devices such as mobile phones, camcorders, and drones are in RGGB format. However, image quality in extremely dark scenes often needs improvement. Traditional methods for processing these dark RGGB images typically rely on end-to-end U-Net networks and their enhancement techniques, which require substantial resources and processing time. To tackle this issue, we first converted RGGB images into RGB three-channel images by subtracting the black level and applying linear interpolation. During the training stage, we leveraged the computational efficiency of the state-space model (SSM) and developed a Mamba U-Net end-to-end model to enhance the restoration of extremely dark RGGB images. We utilized the see-in-the-dark (SID) dataset for training, assessing the effectiveness of our approach. Experimental results indicate that our method significantly reduces resource consumption compared to existing single-step training and prior multi-step training techniques, while achieving improved peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) outcomes.
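As a rough sketch of the preprocessing step described above, the snippet below subtracts a black level from an RGGB Bayer mosaic, normalizes, and fills each color plane by crude local averaging as a stand-in for bilinear interpolation. The black/white levels and the 3x3 fill are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def rggb_to_rgb(raw, black_level=512, white_level=16383):
    """raw: (H, W) Bayer mosaic with even H, W; returns (H, W, 3) in [0, 1]."""
    raw = np.clip(raw.astype(np.float32) - black_level, 0, None)
    raw /= (white_level - black_level)
    h, w = raw.shape
    rgb = np.zeros((h, w, 3), dtype=np.float32)
    rgb[0::2, 0::2, 0] = raw[0::2, 0::2]   # R
    rgb[0::2, 1::2, 1] = raw[0::2, 1::2]   # G on red rows
    rgb[1::2, 0::2, 1] = raw[1::2, 0::2]   # G on blue rows
    rgb[1::2, 1::2, 2] = raw[1::2, 1::2]   # B
    sampled = np.zeros_like(rgb)           # mask of pixels actually measured
    sampled[0::2, 0::2, 0] = sampled[1::2, 1::2, 2] = 1.0
    sampled[0::2, 1::2, 1] = sampled[1::2, 0::2, 1] = 1.0
    for ch in range(3):                    # crude bilinear fill of missing samples
        plane, mask = rgb[..., ch], sampled[..., ch]
        pp = np.pad(plane, 1, mode="edge")
        pm = np.pad(mask, 1, mode="edge")
        num = sum(pp[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
        den = sum(pm[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
        rgb[..., ch] = np.where(mask > 0, plane, num / np.maximum(den, 1e-6))
    return rgb
```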
Affiliation(s)
- Yiyao Huang
  - Faculty of Innovation Engineering, Macau University of Science and Technology, Macau 999078, China
- Xiaobao Zhu
  - School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China
- Fenglian Yuan
  - School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China
- Jing Shi
  - Department of Mechanical & Materials Engineering, University of Cincinnati, Cincinnati, OH, USA
- Kintak U
  - Faculty of Innovation Engineering, Macau University of Science and Technology, Macau 999078, China
- Junshuo Qin
  - Faculty of Innovation Engineering, Macau University of Science and Technology, Macau 999078, China
- Xiangjie Kong
  - Faculty of Innovation Engineering, Macau University of Science and Technology, Macau 999078, China
- Yiran Peng
  - Faculty of Innovation Engineering, Macau University of Science and Technology, Macau 999078, China
154. Oraby S, Emran A, El-Saghir B, Mohsen S. Hybrid of DSR-GAN and CNN for Alzheimer disease detection based on MRI images. Sci Rep 2025; 15:12727. PMID: 40222973; PMCID: PMC11994760; DOI: 10.1038/s41598-025-94677-9.
Abstract
In this paper, we propose a deep super-resolution generative adversarial network (DSR-GAN) combined with a convolutional neural network (CNN) model designed to classify four stages of Alzheimer's disease (AD): Mild Dementia (MD), Moderate Dementia (MOD), Non-Demented (ND), and Very Mild Dementia (VMD). The proposed DSR-GAN is implemented using the PyTorch library and uses a dataset of 6,400 MRI images. A super-resolution (SR) technique is applied to enhance the clarity and detail of the images, allowing the DSR-GAN to refine particular image features. The CNN model undergoes hyperparameter optimization and incorporates data augmentation strategies to maximize its efficiency. The normalized error matrix and area under the ROC curve are used experimentally to evaluate the CNN's performance, which achieved a testing accuracy of 99.22%, an area under the ROC curve of 100%, and an error rate of 0.0516. The performance of the DSR-GAN is assessed using three different metrics: the achieved structural similarity index measure (SSIM) is 0.847, while the peak signal-to-noise ratio (PSNR) and multi-scale structural similarity index measure (MS-SSIM) are 29.30 dB and 96.39%, respectively. The combination of the DSR-GAN and CNN models provides a rapid and precise method to distinguish between various stages of Alzheimer's disease, potentially aiding professionals in the screening of AD cases.
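For readers who want to reproduce the image-quality evaluation described above, a small sketch using scikit-image follows; SSIM and PSNR are shown, while MS-SSIM would need an additional package (e.g., pytorch-msssim) and is omitted. The synthetic example data are placeholders.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def quality_report(reference, generated):
    """Both images: float arrays in [0, 1], same shape, grayscale."""
    ssim = structural_similarity(reference, generated, data_range=1.0)
    psnr = peak_signal_noise_ratio(reference, generated, data_range=1.0)
    return {"SSIM": ssim, "PSNR_dB": psnr}

# Example with synthetic data standing in for ground-truth and SR outputs:
rng = np.random.default_rng(0)
ref = rng.random((128, 128))
gen = np.clip(ref + 0.05 * rng.standard_normal(ref.shape), 0, 1)
print(quality_report(ref, gen))
```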
Affiliation(s)
- Sarah Oraby
  - Department of Electrical Engineering, Faculty of Engineering, Al-Azhar University, Cairo, Egypt
- Ahmed Emran
  - Department of Electrical Engineering, Faculty of Engineering, Al-Azhar University, Cairo, Egypt
  - Department of Artificial Intelligence Engineering, Faculty of Computer Science and Engineering, King Salman International University (KSIU), South Sinai, 46511, Egypt
- Basel El-Saghir
  - Department of Electrical Engineering, Faculty of Engineering, Al-Azhar University, Cairo, Egypt
- Saeed Mohsen
  - Department of Electronics and Communications Engineering, Al-Madinah Higher Institute for Engineering and Technology, Giza, 12947, Egypt
  - Department of Artificial Intelligence Engineering, Faculty of Computer Science and Engineering, King Salman International University (KSIU), South Sinai, 46511, Egypt
155. Diès A, Roussel H, Joachimowicz N. Application of Spectral Approach Combined with U-NETs for Quantitative Microwave Breast Imaging. Sensors (Basel) 2025; 25:2450. PMID: 40285140; PMCID: PMC12030878; DOI: 10.3390/s25082450.
Abstract
This study focuses on breast imaging. A spectral approach based on the Fourier diffraction theorem is combined with a pair of U-NETs to perform real-time quantitative human breast imaging. The U-NET pair is trained with induced-current spectra as input and dielectric-contrast spectra as output. A spectral database is constructed using combinations of anthropomorphic cavities. The weighted mean absolute percentage error (WMAPE) loss is combined with the Adam optimizer to perform optimization. Numerical results are presented to validate the proposed concept and to demonstrate the transformation brought about by the U-NETs.
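Below is a minimal sketch of a WMAPE-plus-Adam training step as described above, with a placeholder network standing in for the U-NET pair; the spectra are random stand-ins for the induced-current inputs and dielectric-contrast targets.

```python
import torch

def wmape_loss(pred, target, eps=1e-8):
    # Sum of absolute errors normalized by the sum of absolute targets.
    return torch.sum(torch.abs(pred - target)) / (torch.sum(torch.abs(target)) + eps)

# Placeholder model: the real work uses a pair of U-NETs on spectra.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 64))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

spectrum_in = torch.randn(32, 64)    # stand-in for induced-current spectra
spectrum_out = torch.randn(32, 64)   # stand-in for contrast spectra
loss = wmape_loss(model(spectrum_in), spectrum_out)
loss.backward()
optimizer.step()
```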
Affiliation(s)
- Ambroise Diès
  - Sorbonne Université, CNRS, Laboratoire de Génie Electrique et Electronique de Paris, 75252 Paris, France
- Hélène Roussel
  - Sorbonne Université, CNRS, Laboratoire de Génie Electrique et Electronique de Paris, 75252 Paris, France
- Nadine Joachimowicz
  - Sorbonne Université, CNRS, Laboratoire de Génie Electrique et Electronique de Paris, 75252 Paris, France
  - Université Paris Cité, F-75006 Paris, France
156. Yu S, Li Z, Gu J, Wang R, Liu X, Li L, Guo F, Ren Y. CWMS-GAN: A small-sample bearing fault diagnosis method based on continuous wavelet transform and multi-size kernel attention mechanism. PLoS One 2025; 20:e0319202. PMID: 40215467; PMCID: PMC11991733; DOI: 10.1371/journal.pone.0319202.
Abstract
In industrial production, obtaining sufficient bearing fault signals is often extremely difficult, leading to a significant degradation in the performance of traditional deep learning-based fault diagnosis models. Many recent studies have shown that data augmentation using generative adversarial networks (GANs) can effectively alleviate this problem. However, the quality of generated samples is closely related to the performance of fault diagnosis models. For this reason, this paper proposes a new GAN-based small-sample bearing fault diagnosis method. Specifically, this study proposes a continuous wavelet convolution strategy (CWCL) instead of the traditional convolution operation in GAN, which can additionally capture the signal's frequency-domain features. Meanwhile, this study designed a new multi-size kernel attention mechanism (MSKAM), which can extract the features of bearing vibration signals from different scales and adaptively select the features that are more important for the generation task to improve the accuracy and authenticity of the generated signals. In addition, the structural similarity index (SSIM) is adopted to quantitatively evaluate the quality of the generated signal by calculating the similarity between the generated signal and the real signal in both the time and frequency domains. Finally, we conducted extensive experiments on the CWRU and MFPT datasets and made a comprehensive comparison with existing small-sample bearing fault diagnosis methods, which verified the effectiveness of the proposed approach.
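The snippet below sketches an SSIM-based quality check of the kind described above for 1-D vibration signals: the time domain is compared by reshaping each signal into a frame matrix, and the frequency domain by comparing log-spectrograms. The frame length, FFT settings, and synthetic signals are illustrative assumptions.

```python
import numpy as np
from scipy.signal import spectrogram
from skimage.metrics import structural_similarity

def signal_ssim(real, fake, fs=12000, frame=64):
    """Compare 1-D vibration signals in the time and frequency domains."""
    n = (len(real) // frame) * frame
    r2d = real[:n].reshape(-1, frame)       # time domain viewed as a frame matrix
    f2d = fake[:n].reshape(-1, frame)
    t_ssim = structural_similarity(r2d, f2d, data_range=max(np.ptp(r2d), np.ptp(f2d)))
    _, _, S_r = spectrogram(real, fs=fs, nperseg=256)
    _, _, S_f = spectrogram(fake, fs=fs, nperseg=256)
    S_r, S_f = np.log1p(S_r), np.log1p(S_f)  # compress dynamic range
    f_ssim = structural_similarity(S_r, S_f, data_range=max(np.ptp(S_r), np.ptp(S_f)))
    return t_ssim, f_ssim

rng = np.random.default_rng(0)
real = np.sin(2 * np.pi * 157 * np.arange(4096) / 12000) + 0.1 * rng.standard_normal(4096)
fake = real + 0.05 * rng.standard_normal(4096)   # stand-in for a GAN-generated signal
print(signal_ssim(real, fake))
```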
Affiliation(s)
- Shun Yu
  - School of Systems and Computing, University of New South Wales, Canberra, Australia
- Zi Li
  - School of Systems and Computing, University of New South Wales, Canberra, Australia
- Jialin Gu
  - School of Systems and Computing, University of New South Wales, Canberra, Australia
- Runpu Wang
  - School of Systems and Computing, University of New South Wales, Canberra, Australia
- Xiaoyu Liu
  - Guanghua School of Management, Peking University, Beijing, China
- Lin Li
  - Faculty of Science and Engineering, Southern Cross University, Gold Coast, Australia
- Fusen Guo
  - School of Systems and Computing, University of New South Wales, Canberra, Australia
- Yuheng Ren
  - School of Business Economics, European Union University, Montreux, Switzerland
157. Vergara P, Wang Y, Srinivasan S, Dong Z, Feng Y, Koyanagi I, Kumar D, Chérasse Y, Naoi T, Sugaya Y, Sakurai T, Kano M, Shuman T, Cai D, Yanagisawa M, Sakaguchi M. A comprehensive suite for extracting neuron signals across multiple sessions in one-photon calcium imaging. Nat Commun 2025; 16:3443. PMID: 40216771; PMCID: PMC11992088; DOI: 10.1038/s41467-025-58817-z.
Abstract
We developed CaliAli, a comprehensive suite designed to extract neuronal signals from one-photon calcium imaging data collected across multiple sessions in freely moving mice. CaliAli incorporates information from blood vessels and neurons to correct inter-session misalignments, making it robust against non-rigid brain deformations even after substantial changes in the field of view across sessions. This also makes CaliAli robust against high neuron overlap and changes in the active neuron population across sessions. CaliAli performs computationally efficient signal extraction from concatenated video sessions, which enhances the detectability of weak calcium signals. Notably, CaliAli enhanced the spatial coding accuracy of extracted hippocampal CA1 neuron activity across sessions. An optogenetic tagging experiment showed that CaliAli enhanced neuronal trackability in the dentate gyrus across a time scale of weeks. Finally, dentate gyrus neurons tracked using CaliAli exhibited stable population activity for 99 days. Overall, CaliAli advances our capacity to understand the activity dynamics of neuronal ensembles over time, which is crucial for deciphering the complex neuronal substrates of natural animal behaviors.
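As a toy illustration of the alignment-then-concatenation idea above, the sketch below estimates a rigid shift between two sessions' summary projections (e.g., vessel/neuron images) with phase cross-correlation and applies it before concatenation. CaliAli itself corrects non-rigid deformation using vessel and neuron information; this rigid version is only a simplified stand-in.

```python
import numpy as np
from skimage.registration import phase_cross_correlation
from scipy.ndimage import shift as nd_shift

def align_sessions(proj_a, proj_b, video_b):
    """proj_*: 2-D summary images; video_b: (T, H, W) frames of session B."""
    drift, _, _ = phase_cross_correlation(proj_a, proj_b, upsample_factor=10)
    # Apply the same (dy, dx) shift to every frame of session B.
    return np.stack([nd_shift(frame, drift, order=1) for frame in video_b])

# Concatenating aligned sessions lets weak, recurring signals accumulate:
# video_cat = np.concatenate([video_a, aligned_b], axis=0)
```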
Grants
- JP21zf0127005, JP23wm0525003: Japan Agency for Medical Research and Development (AMED)
- 24H00894, 23H02784, 22H00469, 16H06280, 20H03552, 21H05674, 21F21080: MEXT | Japan Society for the Promotion of Science (JSPS)
- JPMJSP2124: MEXT | Japan Science and Technology Agency (JST)
- 24H00894, 21J11746, 23K19393, 24K18212, 16H06280: Japan Society for the Promotion of Science London (JSPS London)
- Takeda Science Foundation
- Uehara Memorial Foundation
- G-7 Scholarship Foundation
- The Mitsubishi Foundation
Affiliation(s)
- Pablo Vergara
  - International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, Japan
- Yuteng Wang
  - International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, Japan
  - Doctoral Program in Neuroscience, Degree Programs in Comprehensive Human Sciences, Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Sakthivel Srinivasan
  - International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, Japan
- Zhe Dong
  - Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, USA
- Yu Feng
  - Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, USA
- Iyo Koyanagi
  - International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, Japan
- Deependra Kumar
  - International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, Japan
- Yoan Chérasse
  - International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, Japan
- Toshie Naoi
  - International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, Japan
- Yuki Sugaya
  - Department of Neurophysiology, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0033, Japan
  - International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study (UTIAS), Tokyo, 113-0033, Japan
- Takeshi Sakurai
  - International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, Japan
- Masanobu Kano
  - Department of Neurophysiology, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0033, Japan
  - International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study (UTIAS), Tokyo, 113-0033, Japan
- Tristan Shuman
  - Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, USA
- Denise Cai
  - Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, USA
- Masashi Yanagisawa
  - International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, Japan
  - Doctoral Program in Neuroscience, Degree Programs in Comprehensive Human Sciences, Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
  - Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Masanori Sakaguchi
  - International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, Japan
  - Doctoral Program in Neuroscience, Degree Programs in Comprehensive Human Sciences, Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
  - Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
158. Angel JC, El Amraoui N, Gürsoy G. pC-SAC: A method for high-resolution 3D genome reconstruction from low-resolution Hi-C data. Nucleic Acids Res 2025; 53:gkaf289. PMID: 40226920; PMCID: PMC11995266; DOI: 10.1093/nar/gkaf289.
Abstract
The three-dimensional (3D) organization of the genome is crucial for gene regulation, with disruptions linked to various diseases. High-throughput Chromosome Conformation Capture (Hi-C) and related technologies have advanced our understanding of 3D genome organization by mapping interactions between distal genomic regions. However, capturing enhancer-promoter interactions at high resolution remains challenging due to the high sequencing depth required. We introduce pC-SAC (probabilistically Constrained Self-Avoiding Chromatin), a novel computational method for producing accurate high-resolution Hi-C matrices from low-resolution data. pC-SAC uses adaptive importance sampling with sequential Monte Carlo to generate ensembles of 3D chromatin chains that satisfy physical constraints derived from low-resolution Hi-C data. Our method achieves over 95% accuracy in reconstructing high-resolution chromatin maps and identifies novel interactions enriched with candidate cis-regulatory elements (cCREs) and expression quantitative trait loci (eQTLs). Benchmarking against state-of-the-art deep learning models demonstrates pC-SAC's performance in both short- and long-range interaction reconstruction. pC-SAC offers a cost-effective solution for enhancing the resolution of Hi-C data, thus enabling deeper insights into 3D genome organization and its role in gene regulation and disease. Our tool can be found at https://github.com/G2Lab/pCSAC.
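A toy sequential-Monte-Carlo sketch in the spirit of pC-SAC follows: self-avoiding chains are grown on a 3-D lattice, with each growth step weighted by a user-supplied score encoding Hi-C-derived contact constraints. The lattice moves, weighting scheme, and placeholder score are simplified illustrations, not the published algorithm.

```python
import numpy as np

MOVES = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]])

def grow_chain(n_beads, contact_score, rng):
    """Grow one self-avoiding chain; contact_score(candidate, chain) -> weight > 0."""
    chain = [np.zeros(3, dtype=int)]
    occupied = {(0, 0, 0)}
    log_weight = 0.0
    for _ in range(n_beads - 1):
        candidates = [chain[-1] + m for m in MOVES
                      if tuple(chain[-1] + m) not in occupied]  # self-avoidance
        if not candidates:
            return None, -np.inf                     # dead end; caller resamples
        scores = np.array([contact_score(c, chain) for c in candidates])
        probs = scores / scores.sum()
        idx = rng.choice(len(candidates), p=probs)
        log_weight += np.log(scores.sum() / len(MOVES))  # importance-sampling correction
        chain.append(candidates[idx])
        occupied.add(tuple(candidates[idx]))
    return np.array(chain), log_weight

rng = np.random.default_rng(1)
uniform = lambda cand, chain: 1.0   # placeholder score; real weights come from Hi-C
chain, logw = grow_chain(50, uniform, rng)
```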
Affiliation(s)
- J Carlos Angel
  - Department of Molecular Pharmacology and Therapeutics, Columbia University, New York, NY 10032, United States
  - New York Genome Center, New York, NY 10013, United States
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Gamze Gürsoy
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
  - New York Genome Center, New York, NY 10013, United States
  - Department of Computer Science, Columbia University, New York, NY 10027, United States
159. Kumari P, Van Marwick B, Kern J, Rädle M. A Multi-Modal Light Sheet Microscope for High-Resolution 3D Tomographic Imaging with Enhanced Raman Scattering and Computational Denoising. Sensors (Basel) 2025; 25:2386. PMID: 40285078; PMCID: PMC12031234; DOI: 10.3390/s25082386.
Abstract
Three-dimensional (3D) cellular models, such as spheroids, serve as pivotal systems for understanding complex biological phenomena in histology, oncology, and tissue engineering. In response to the growing need for advanced imaging capabilities, we present a novel multi-modal Raman light sheet microscope designed to capture elastic (Rayleigh) and inelastic (Raman) scattering, along with fluorescence signals, in a single platform. By leveraging a shorter excitation wavelength (532 nm) to boost Raman scattering efficiency and incorporating robust fluorescence suppression, the system achieves label-free, high-resolution tomographic imaging without the drawbacks commonly associated with near-infrared modalities. An accompanying Deep Image Prior (DIP) seamlessly integrates with the microscope to provide unsupervised denoising and resolution enhancement, preserving critical molecular details and minimizing extraneous artifacts. Altogether, this synergy of optical and computational strategies underscores the potential for in-depth, 3D imaging of biomolecular and structural features in complex specimens and sets the stage for future advancements in biomedical research, diagnostics, and therapeutics.
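To illustrate the unsupervised denoising component mentioned above, here is a condensed Deep Image Prior loop: a small convolutional network is fitted from a fixed random input to the noisy image, and early stopping acts as the regularizer. The architecture and iteration count are illustrative choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

def tiny_dip(noisy, n_steps=1500, lr=1e-3):
    """noisy: (1, C, H, W) float tensor in [0, 1]; returns the denoised estimate."""
    net = nn.Sequential(
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, noisy.shape[1], 3, padding=1), nn.Sigmoid(),
    )
    z = torch.randn(1, 32, noisy.shape[2], noisy.shape[3])  # fixed noise input
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(n_steps):          # stopping early keeps structure, rejects noise
        opt.zero_grad()
        loss = torch.mean((net(z) - noisy) ** 2)
        loss.backward()
        opt.step()
    return net(z).detach()
```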
Affiliation(s)
- Pooja Kumari
  - CeMOS Research and Transfer Center, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
- Björn Van Marwick
  - CeMOS Research and Transfer Center, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
- Johann Kern
  - Medical Faculty Mannheim, Heidelberg University, 68167 Mannheim, Germany
- Matthias Rädle
  - CeMOS Research and Transfer Center, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
160. He M, Wang R, Zhang M, Lv F, Wang Y, Zhou F, Bian X. SwinLightGAN: a study of low-light image enhancement algorithms using depth residuals and transformer techniques. Sci Rep 2025; 15:12151. PMID: 40204793; PMCID: PMC11982214; DOI: 10.1038/s41598-025-95329-8.
Abstract
Contemporary algorithms for enhancing images in low-light conditions prioritize improving brightness and contrast but often neglect image details. This study introduces the Swin Transformer-based Light-enhancing Generative Adversarial Network (SwinLightGAN), a novel generative adversarial network (GAN) that effectively enhances image details under low-light conditions. The network integrates a generator model based on a Residual Jumping U-shaped Network (U-Net) architecture for precise local detail extraction with an illumination network enhanced by Shifted Window Transformer (Swin Transformer) technology that captures multi-scale spatial features and global contexts. This combination produces high-quality images that resemble those taken in normal lighting conditions while retaining intricate details. Through adversarial training that employs discriminators operating at multiple scales and a blend of loss functions, SwinLightGAN sharpens the distinction between generated and authentic images, ensuring superior enhancement quality. Extensive experimental analysis on multiple unpaired datasets demonstrates SwinLightGAN's outstanding performance. The system achieves Naturalness Image Quality Evaluator (NIQE) scores ranging from 5.193 to 5.397, Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) scores from 28.879 to 32.040, and Patch-based Image Quality Evaluator (PIQE) scores from 38.280 to 44.479, highlighting its efficacy in delivering high-quality enhancements across diverse metrics.
Affiliation(s)
- Min He
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Rugang Wang
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Mingyang Zhang
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Feiyang Lv
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Yuanyuan Wang
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Feng Zhou
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Xuesheng Bian
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
161. Fang X, Chen P, Wang M, Wang S. Examining the role of compression in influencing AI-generated image authenticity. Sci Rep 2025; 15:12192. PMID: 40204816; PMCID: PMC11982568; DOI: 10.1038/s41598-025-91545-4.
Abstract
The rapid development of AI-generated content (AIGC) in recent years has narrowed the gap between the virtual and the real. Among AIGC, AI-generated images (AIGIs) are particularly significant, as their emergence has had a profound impact on education, art, virtual reality, and related fields. However, little research has investigated whether compression artifacts can influence the subjective authenticity of AIGIs. In this paper, we systematically study this problem by creating the first-ever AIGC image dataset for subjective evaluations of authenticity discrimination. The dataset contains 500 AIGIs and 500 natural images with a resolution of 768 × 768. The images are categorized into 5 major categories and 20 subcategories to study the performance of AIGIs on different contents. Subsequently, we introduce four degrees of compression distortion (QP = 22, 32, 42, 52) on all images using the standard Versatile Video Coding (VVC). Interestingly, as compression distortion increases, the accuracy of human observers in identifying AIGIs declines. The proposed study is expected to shed light on future research that aims to achieve a good balance between authenticity and visual quality.
Affiliation(s)
- Xiaohan Fang
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Peilin Chen
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Meng Wang
  - School of Data Science, Lingnan University, Hong Kong SAR, China
- Shiqi Wang
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
  - Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
162. Chen J, Ye S, Ouyang X, Zhuang J. WEDM: Wavelet-Enhanced Diffusion with Multi-Stage Frequency Learning for Underwater Image Enhancement. J Imaging 2025; 11:114. PMID: 40278030; PMCID: PMC12028025; DOI: 10.3390/jimaging11040114.
Abstract
Underwater image enhancement (UIE) is inherently challenging due to complex degradation effects such as light absorption and scattering, which result in color distortion and a loss of fine details. Most existing methods focus on spatial-domain processing, often neglecting the frequency-domain characteristics that are crucial for effectively restoring textures and edges. In this paper, we propose a novel UIE framework, the Wavelet-based Enhancement Diffusion Model (WEDM), which integrates frequency-domain decomposition with diffusion models. The WEDM consists of two main modules: the Wavelet Color Compensation Module (WCCM) for color correction in the LAB space using discrete wavelet transform, and the Wavelet Diffusion Module (WDM), which replaces traditional convolutions with wavelet-based operations to preserve multi-scale frequency features. By combining residual denoising diffusion with frequency-specific processing, the WEDM effectively reduces noise amplification and high-frequency blurring. Ablation studies further demonstrate the essential roles of the WCCM and WDM in improving color fidelity and texture details. Our framework offers a robust solution for underwater visual tasks, with promising applications in marine exploration and ecological monitoring.
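A rough sketch of the wavelet color-compensation idea follows: the image is moved to LAB space, each channel is decomposed with a single-level 2-D discrete wavelet transform, the approximation band of the color channels is rescaled, and the transform is inverted. The gain factors and the Haar wavelet are illustrative assumptions, not the published WCCM correction.

```python
import numpy as np
import pywt
from skimage import color

def wavelet_color_compensate(rgb, a_gain=1.1, b_gain=1.2):
    """rgb: (H, W, 3) float image in [0, 1]."""
    lab = color.rgb2lab(rgb)
    out = np.empty_like(lab)
    for ch, gain in zip(range(3), (1.0, a_gain, b_gain)):
        cA, (cH, cV, cD) = pywt.dwt2(lab[..., ch], "haar")
        cA = cA * gain                     # compensate color cast at low frequency
        rec = pywt.idwt2((cA, (cH, cV, cD)), "haar")
        out[..., ch] = rec[: lab.shape[0], : lab.shape[1]]  # trim odd-size padding
    return np.clip(color.lab2rgb(out), 0, 1)
```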
Affiliation(s)
- Junhao Chen
  - Faculty of Mechanical Engineering & Mechanics, Ningbo University, Ningbo 315211, China
- Sichao Ye
  - Ningbo Institute of Materials Technology & Engineering, Chinese Academy of Sciences, Ningbo 315201, China
- Xiong Ouyang
  - Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China
- Jiayan Zhuang
  - Ningbo Institute of Materials Technology & Engineering, Chinese Academy of Sciences, Ningbo 315201, China
163. Raghavendran G, Han B, Adekogbe F, Bai S, Lu B, Wu W, Zhang M, Meng YS. Deep learning assisted high-resolution microscopy image processing for phase segmentation in functional composite materials. J Microsc 2025. PMID: 40195694; DOI: 10.1111/jmi.13413.
Abstract
In the domain of battery research, the processing of high-resolution microscopy images is a challenging task, as it involves dealing with complex images and requires a prior understanding of the components involved. The utilisation of deep learning methodologies for image analysis has attracted considerable interest in recent years, with multiple investigations employing such techniques for image segmentation and analysis within the realm of battery research. However, the automated analysis of high-resolution microscopy images for detecting phases and components in composite materials is still an underexplored area. This work proposes a novel workflow for FFT-based segmentation, periodic component detection and phase segmentation from raw high-resolution Transmission Electron Microscopy (TEM) images using a trained U-Net segmentation model. The developed model can expedite the detection of components and their phase segmentation, diminishing the temporal and cognitive demands associated with scrutinising an extensive array of TEM images, thereby mitigating the potential for human errors. This approach presents a novel and efficient image analysis approach with broad applicability beyond the battery field and holds potential for application in other related domains characterised by phase and composition distribution, such as alloy production.
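The snippet below sketches the kind of FFT-based periodic-component detection such a workflow builds on: strong off-center peaks in the power spectrum (lattice periodicities) are masked and transformed back to real space to localize the periodic phase. The DC-exclusion radius and peak threshold are illustrative choices.

```python
import numpy as np

def periodic_component_map(img, peak_sigma=5.0, dc_radius=10):
    """img: (H, W) TEM image; returns a map that is bright where periodic."""
    F = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(F)
    cy, cx = np.array(power.shape) // 2
    yy, xx = np.ogrid[: power.shape[0], : power.shape[1]]
    r2 = (yy - cy) ** 2 + (xx - cx) ** 2
    power_off = np.where(r2 > dc_radius ** 2, power, 0)  # suppress the DC region
    thresh = power_off.mean() + peak_sigma * power_off.std()
    mask = power_off > thresh                            # Bragg-like spectral peaks
    filtered = np.fft.ifft2(np.fft.ifftshift(F * mask))  # band-pass around the peaks
    return np.abs(filtered)
```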
Affiliation(s)
- Ganesh Raghavendran
  - Aiiso Yufeng Li Family Department of Chemical and Nano Engineering, University of California San Diego, La Jolla, California, USA
- Bing Han
  - Aiiso Yufeng Li Family Department of Chemical and Nano Engineering, University of California San Diego, La Jolla, California, USA
  - Department of Materials Science and Engineering, University of California San Diego, La Jolla, California, USA
- Fortune Adekogbe
  - Department of Chemical and Petroleum Engineering, University of Lagos, Lagos, Nigeria
- Shuang Bai
  - Department of Materials Science and Engineering, University of California San Diego, La Jolla, California, USA
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois, USA
- Bingyu Lu
  - Aiiso Yufeng Li Family Department of Chemical and Nano Engineering, University of California San Diego, La Jolla, California, USA
- William Wu
  - Department of Computer Science, University of California San Diego, La Jolla, California, USA
- Minghao Zhang
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois, USA
- Ying Shirley Meng
  - Aiiso Yufeng Li Family Department of Chemical and Nano Engineering, University of California San Diego, La Jolla, California, USA
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois, USA
164. Niu Q, Wu K, Zhang J, Han Z, Liu L. SAD-Net: a full spectral self-attention detail enhancement network for single image dehazing. Sci Rep 2025; 15:11875. PMID: 40195442; PMCID: PMC11977212; DOI: 10.1038/s41598-025-92061-1.
Abstract
Single-image dehazing technology plays a significant role in video surveillance and intelligent transportation. However, existing dehazing methods using vanilla convolution extract features only in the spatial domain and lack the ability to capture multi-directional information. To address these issues, we design a new full spectral attention-based detail enhancement dehazing network, named SAD-Net. SAD-Net adopts a U-Net-like structure and integrates Spectral Detail Enhancement Convolution (SDEC) and Frequency-Guided Attention (FGA). SDEC combines the wavelet transform and difference convolution (DC) to enhance high-frequency features while preserving low-frequency information. FGA detects haze-induced discrepancies and fine-tunes feature modulation. Experimental results show that SAD-Net outperforms six other dehazing networks on the Dense-Haze, NH-Haze, RESIDE, and I-Haze datasets. Specifically, it increases the peak signal-to-noise ratio (PSNR) to 17.16 dB on the Dense-Haze dataset, surpassing current state-of-the-art (SOTA) methods. Additionally, SAD-Net achieves excellent dehazing performance on an external dataset without any prior training.
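As a sketch of the difference-convolution ingredient named above, the module below implements one common central-difference formulation: the usual convolution response minus a weighted response of the center pixel, which emphasizes high-frequency detail. Treating this particular variant as the one used in SDEC is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDiffConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.theta = theta  # blend between vanilla and difference responses

    def forward(self, x):
        vanilla = self.conv(x)
        # Difference term: each kernel's weights summed, applied to the center pixel.
        w_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        center = F.conv2d(x, w_sum)          # acts as a 1x1 conv with summed weights
        return vanilla - self.theta * center

x = torch.randn(1, 3, 64, 64)
print(CentralDiffConv2d(3, 16)(x).shape)     # torch.Size([1, 16, 64, 64])
```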
Affiliation(s)
- Qingjun Niu
  - Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, 201210, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Kun Wu
  - Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, 201210, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Jialu Zhang
  - Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, 201210, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhenqi Han
  - Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, 201210, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Lizhuang Liu
  - Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, 201210, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
165. Xu X, Luo W, Ren Z, Song X. Intelligent Detection and Recognition of Marine Plankton by Digital Holography and Deep Learning. Sensors (Basel) 2025; 25:2325. PMID: 40218838; PMCID: PMC11991423; DOI: 10.3390/s25072325.
Abstract
The detection, observation, recognition, and counting of marine plankton form the basis of marine ecological research. In recent years, digital holography has been widely applied to plankton detection and recognition. However, the recording and reconstruction of digital holography require a strictly controlled laboratory environment and time-consuming iterative computation, respectively, which impede its application in marine plankton imaging. In this paper, an intelligent method based on digital holography and deep learning algorithms is proposed to detect and recognize marine plankton (IDRMP). An accurate integrated A-Unet network is established under the principle of deep learning and trained on digital holograms recorded with publicly available plankton datasets. This method can reconstruct and recognize a variety of plankton organisms stably and efficiently from a single hologram, and a YOLOv5-based system interface is provided for end-to-end, single-frame plankton detection. The structural similarities of the images reconstructed by IDRMP are all higher than 0.97, and the average accuracy of the detection of four plankton species, namely Appendicularian, Chaetognath, Echinoderm, and Hydromedusae, reaches 91.0% with YOLOv5. In optical experiments, typical marine plankton collected from Weifang, China, are employed as samples. For randomly selected samples of Copepods, Tunicates, and Polychaetes, the results are satisfactory, and a batch-detection function is developed for the system. Our test and experiment results demonstrate that this method is efficient and accurate for the detection and recognition of numerous plankton within a certain volume of space after they are recorded by digital holography.
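The calling pattern for single-frame detection with YOLOv5, as used in the system interface described above, can be sketched via torch.hub; the plankton-specific weights file and input image named here are placeholder assumptions.

```python
import torch

# Stock COCO model, shown only to illustrate the calling pattern:
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
# A model fine-tuned on plankton classes would instead be loaded as, e.g.:
# model = torch.hub.load("ultralytics/yolov5", "custom", path="plankton.pt")

results = model("reconstructed_frame.png")   # path, URL, ndarray, or PIL image
results.print()                              # per-class counts and confidences
detections = results.pandas().xyxy[0]        # bounding boxes as a pandas DataFrame
```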
Affiliation(s)
- Xianfeng Xu
  - College of Science, China University of Petroleum (East China), Qingdao 266580, China
166. Jiao H, Sun W, Wang H, Wan X. Comprehensive Exploitation of Time- and Frequency-Domain Information for Bearing Fault Diagnosis on Imbalanced Datasets via Adaptive Wavelet-like Transform General Adversarial Network and Ensemble Learning. Sensors (Basel) 2025; 25:2328. PMID: 40218840; PMCID: PMC11991466; DOI: 10.3390/s25072328.
Abstract
The vibration signals of faulty bearings contain rich feature information in both the time and frequency domains. Effectively leveraging this information is crucial, especially when addressing imbalanced bearing fault datasets, as it can significantly enhance the performance of fault diagnosis models. However, existing GAN models and diagnostic methods do not fully exploit these domain-specific features. To overcome this limitation, a novel fault diagnosis method is proposed, based on the Adaptive Wavelet-Like Transform Generative Adversarial Network (AWLT-GAN) and ensemble learning. In the first stage, AWLT-GAN is used to balance the bearing fault dataset by integrating time- and frequency-domain feature information. AWLT-GAN embeds an adaptive wavelet-like transform neural network into the generator as an adaptive layer and employs a dual-discriminator architecture. This design allows the network to simultaneously learn fault characteristics from both domains within a single training session, enhancing the quality of the synthetic fault data. Next, an ensemble learning approach is applied, combining time- and frequency-domain models, with the final classification determined through a soft voting mechanism. Experimental results demonstrate that the vibration signals generated by AWLT-GAN effectively replicate the feature distribution of real data, confirming its high performance. The fault diagnosis model, developed using these high-quality synthetic samples, accurately captures fault characteristics embedded in both the time and frequency domains, resulting in enhanced diagnostic performance. The proposed approach not only addresses the imbalance in bearing fault datasets but also significantly improves diagnostic accuracy.
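A minimal sketch of the soft-voting stage described above: one classifier is trained on time-domain features and another on frequency-domain features, and their predicted class probabilities are averaged. The feature extraction and classifier choices are illustrative stand-ins, not the paper's models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_time = rng.standard_normal((200, 64))          # stand-in time-domain features
X_freq = np.abs(np.fft.rfft(X_time, axis=1))     # matching frequency-domain features
y = rng.integers(0, 4, 200)                      # four fault classes

clf_t = RandomForestClassifier(random_state=0).fit(X_time, y)
clf_f = LogisticRegression(max_iter=1000).fit(X_freq, y)

# Soft voting: average the class-probability outputs of the two domains.
proba = (clf_t.predict_proba(X_time) + clf_f.predict_proba(X_freq)) / 2
y_pred = proba.argmax(axis=1)
```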
Affiliation(s)
- Wenlei Sun
  - Intelligent Manufacturing Modern Industry College, Xinjiang University, Urumqi 830049, China
167. Luo F, Wu D, Pino LR, Ding W. A novel multimodel medical image fusion framework with edge enhancement and cross-scale transformer. Sci Rep 2025; 15:11657. PMID: 40185793; PMCID: PMC11971266; DOI: 10.1038/s41598-025-93616-y.
Abstract
Multimodal medical image fusion (MMIF) integrates complementary information from different imaging modalities to enhance image quality and remove redundant data, benefiting a variety of clinical applications such as tumor detection and organ delineation. However, existing MMIF methods often struggle to preserve sharp edges and maintain high contrast, both of which are critical for accurate diagnosis and treatment planning. To address these limitations, this paper proposes ECFusion, a novel MMIF framework that explicitly incorporates edge prior information and leverages a cross-scale transformer. First, an Edge-Augmented Module (EAM) employs the Sobel operator to extract edge features, thereby improving the representation and preservation of edge details. Second, a Cross-Scale Transformer Fusion Module (CSTF) with a Hierarchical Cross-Scale Embedding Layer (HCEL) captures multi-scale contextual information and enhances the global consistency of fused images. Additionally, a multi-path fusion strategy is introduced to disentangle deep and shallow features, mitigating feature loss during fusion. We conduct extensive experiments on the AANLIB dataset, evaluating CT-MRI, PET-MRI, and SPECT-MRI fusion tasks. Compared with state-of-the-art methods (U2Fusion, EMFusion, SwinFusion, and CDDFuse), ECFusion produces fused images with clearer edges and higher contrast. Quantitative results further highlight improvements in mutual information (MI), structural similarity (Qabf, SSIM), and visual perception (VIF, Qcb, Qcv).
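To make the edge-prior idea concrete, here is a small sketch of Sobel-based edge extraction of the kind the Edge-Augmented Module builds on: fixed Sobel kernels applied as a non-learnable convolution yield a gradient-magnitude map that can be concatenated with learned features. The wiring into the network is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.t()

def sobel_edges(x):
    """x: (N, 1, H, W) grayscale tensor; returns the gradient magnitude."""
    kx = SOBEL_X.view(1, 1, 3, 3)
    ky = SOBEL_Y.view(1, 1, 3, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

img = torch.rand(1, 1, 128, 128)
edges = sobel_edges(img)
features = torch.cat([img, edges], dim=1)   # edge prior injected alongside the image
```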
Affiliation(s)
- Fei Luo
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Daoqi Wu
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Luis Rojas Pino
  - School of Engineering, Architecture and Design, Universidad San Sebastián, Santiago, 8320000, Chile
- Weichao Ding
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
168. Wang M, Li Q, Liu H. Single-Character-Based Embedding Feature Aggregation Using Cross-Attention for Scene Text Super-Resolution. Sensors (Basel) 2025; 25:2228. PMID: 40218739; PMCID: PMC11991259; DOI: 10.3390/s25072228.
Abstract
In textual vision scenarios, super-resolution aims to enhance text quality and readability to facilitate downstream tasks. However, the ambiguity of character regions in complex backgrounds remains challenging to mitigate, particularly the interference between tightly connected characters. In this paper, we propose single-character-based embedding feature aggregation using cross-attention for scene text super-resolution (SCE-STISR) to solve this problem. First, a dynamic feature extraction mechanism adaptively captures shallow features by dynamically adjusting multi-scale feature weights based on spatial representations. During text-image interactions, a dual-level cross-attention mechanism comprehensively aggregates the cropped single-character features with the textual prior, while aligning semantic sequences and visual features. Finally, an adaptive normalized color correction operation is applied to mitigate color distortion caused by background interference. On the TextZoom benchmark, the text recognition accuracies of different recognizers are 53.6%, 60.9%, and 64.5%, improvements of 0.9-1.4% over the baseline TATT, with an optimal SSIM of 0.7951 and a PSNR of 21.84. Additionally, our approach improves accuracy by 0.2-2.2% over existing baselines on five text recognition datasets, validating the effectiveness of the model.
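Below is a minimal sketch of cross-attention between text-prior embeddings and visual features, the aggregation mechanism described above; the tensor shapes and the direction of attention (text queries over visual keys/values) are illustrative assumptions.

```python
import torch
import torch.nn as nn

d_model, n_heads = 256, 8
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

text_prior = torch.randn(2, 26, d_model)          # e.g., per-character embeddings
visual_feats = torch.randn(2, 16 * 64, d_model)   # flattened H*W image tokens

# Queries come from one modality; keys/values from the other.
fused, attn_weights = attn(query=text_prior, key=visual_feats, value=visual_feats)
print(fused.shape)                                # torch.Size([2, 26, 256])
```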
Affiliation(s)
- Meng Wang
  - School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
169. Dewey BE, Remedios SW, Sanjayan M, Rjeily NB, Lee AZ, Wyche C, Duncan S, Prince JL, Calabresi PA, Fitzgerald KC, Mowry EM. Super-Resolution in Clinically Available Spinal Cord MRIs Enables Automated Atrophy Analysis. AJNR Am J Neuroradiol 2025; 46:823-831. PMID: 39366765; PMCID: PMC11979833; DOI: 10.3174/ajnr.a8526.
Abstract
BACKGROUND AND PURPOSE Measurement of the mean upper cervical cord area (MUCCA) is an important biomarker in the study of neurodegeneration. However, dedicated high-resolution (HR) scans of the cervical spinal cord are rare in standard-of-care imaging due to timing and clinical usability. Most clinical cervical spinal cord imaging is sagittally acquired in 2D with thick slices and anisotropic voxels. As a solution, previous work describes HR T1-weighted brain imaging for measuring the upper cord area, but this is still not common in clinical care. MATERIALS AND METHODS We propose using a zero-shot super-resolution technique, synthetic multi-orientation resolution enhancement (SMORE), already validated in the brain, to enhance the resolution of 2D-acquired scans for upper cord area calculations. To incorporate super-resolution in spinal cord analysis, we validate SMORE against HR research imaging and in a real-world longitudinal data analysis. RESULTS Super-resolved (SR) images reconstructed by using SMORE showed significantly greater similarity to the ground truth than low-resolution (LR) images across all tested resolutions (P < .001 for all resolutions in peak signal-to-noise ratio [PSNR] and mean structural similarity [MSSIM]). MUCCA results from SR scans demonstrate excellent correlation with HR scans (r > 0.973 for all resolutions) compared with LR scans. Additionally, SR scans are consistent between resolutions (r > 0.969), an essential factor in longitudinal analysis. Compared with clinical outcomes such as walking speed or disease severity, MUCCA values from LR scans have significantly lower correlations than those from HR scans. SR results have no significant difference. In a longitudinal real-world data set, we show that these SR volumes can be used in conjunction with T1-weighted brain scans to show a significant rate of atrophy (-0.790, P = .020 versus -0.438, P = .301 with LR). CONCLUSIONS Super-resolution is a valuable tool for enabling large-scale studies of cord atrophy, as LR images acquired in clinical practice are common and available.
Affiliation(s)
- Blake E Dewey
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland
- Samuel W Remedios
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
- Muraleetharan Sanjayan
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland
- Nicole Bou Rjeily
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland
- Alexandra Zambriczki Lee
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland
- Chelsea Wyche
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland
- Safiya Duncan
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland
- Jerry L Prince
  - Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland
- Peter A Calabresi
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland
- Kathryn C Fitzgerald
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland
- Ellen M Mowry
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland
170. Raymond C, Yao J, Clifford B, Feiweier T, Oshima S, Telesca D, Zhong X, Meyer H, Everson RG, Salamon N, Cloughesy TF, Ellingson BM. Leveraging Physics-Based Synthetic MR Images and Deep Transfer Learning for Artifact Reduction in Echo-Planar Imaging. AJNR Am J Neuroradiol 2025; 46:733-741. PMID: 39947682; PMCID: PMC11979845; DOI: 10.3174/ajnr.a8566.
Abstract
BACKGROUND AND PURPOSE This study utilizes a physics-based approach to synthesize realistic MR artifacts and train a deep learning generative adversarial network (GAN) for artifact reduction on EPI, a crucial neuroimaging sequence with high acceleration that is notoriously susceptible to artifacts. MATERIALS AND METHODS A total of 4,573 anatomical MR sequences from 1,392 patients undergoing clinically indicated MRI of the brain were used to create a synthetic data set using physics-based, simulated artifacts commonly found in EPI. By using multiple MRI contrasts, we hypothesized the GAN would learn to correct common artifacts while preserving the inherent contrast information, even for contrasts the network has not been trained on. A modified Pix2PixGAN architecture with an Attention-R2UNet generator was used for the model. Three training strategies were employed: (1) an "all-in-one" model trained on all the artifacts at once; (2) a set of "single models", one for each artifact; and (3) a "stacked transfer learning" approach where a model is first trained on one artifact set, this learning is transferred to a new model, and the process is repeated for the next artifact set. Lastly, the "stacked transfer learning" model was tested on ADC maps from single-shot diffusion MRI data in N = 49 patients diagnosed with recurrent glioblastoma to compare visual quality and lesion measurements between the natively acquired images and AI-corrected images. RESULTS The "stacked transfer learning" approach had superior artifact reduction performance compared with the other approaches as measured by mean squared error (MSE = 0.0016), structural similarity index (SSIM = 0.92), multiscale SSIM (MS-SSIM = 0.92), peak signal-to-noise ratio (PSNR = 28.10), and Hausdorff distance (HAUS = 4.08 mm), suggesting that leveraging pre-trained knowledge and sequentially training on each artifact is the best approach for this application. In recurrent glioblastoma, significantly higher visual quality was observed in model-predicted images compared with native images, while quantitative measurements within the tumor regions remained consistent with non-corrected images. CONCLUSIONS The current study demonstrates the feasibility of using a physics-based method for synthesizing a large data set of images with realistic artifacts, and the effectiveness of utilizing this synthetic data set in a "stacked transfer learning" approach to training a GAN for reduction of EPI-based artifacts.
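A skeletal sketch of the "stacked transfer learning" strategy follows: train on one synthetic-artifact set, carry the weights into the next stage, and repeat per artifact. The model, data loaders, loss, and epoch counts are placeholders, not the study's training configuration.

```python
import copy
import torch
import torch.nn as nn

def train_stage(model, loader, n_epochs=5, lr=1e-4):
    """One stage: fit corrupted -> clean image pairs for a single artifact set."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(n_epochs):
        for corrupted, clean in loader:
            opt.zero_grad()
            loss = loss_fn(model(corrupted), clean)
            loss.backward()
            opt.step()
    return model

def stacked_transfer(base_model, artifact_loaders):
    model = base_model
    for loader in artifact_loaders:                        # one loader per artifact type
        model = train_stage(copy.deepcopy(model), loader)  # transfer weights forward
    return model
```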
Collapse
Affiliation(s)
- Catalina Raymond
- From the UCLA Brain Tumor Imaging Laboratory (C.R., J.Y., S.O., B.M.E.), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Department of Radiological Sciences (C.R., J.Y., S.O., X.Z., N.S., B.M.E), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
| | - Jingwen Yao
- From the UCLA Brain Tumor Imaging Laboratory (C.R., J.Y., S.O., B.M.E.), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Department of Radiological Sciences (C.R., J.Y., S.O., X.Z., N.S., B.M.E), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
| | - Bryan Clifford
- Siemens Medical Solutions USA, Inc. (B.C.), Los Angeles, CA
| | | | - Sonoko Oshima
- From the UCLA Brain Tumor Imaging Laboratory (C.R., J.Y., S.O., B.M.E.), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Department of Radiological Sciences (C.R., J.Y., S.O., X.Z., N.S., B.M.E), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
| | - Donatello Telesca
- Department of Biostatistics (D.T.), University of California, Los Angeles, Los Angeles, CA, USA
| | - Xiaodong Zhong
- Department of Radiological Sciences (C.R., J.Y., S.O., X.Z., N.S., B.M.E), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Department of Bioengineering (X.Z., B.M.E.), Henry Samueli School of Engineering and Applied Science, University of California, Los Angeles, Los Angeles, CA, USA
| | - Heiko Meyer
- Siemens Healthineers AG (T.F., H.M.), Erlangen, Germany
| | - Richard G Everson
- Department of Neurosurgery (R.G.E.), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
| | - Noriko Salamon
- Department of Radiological Sciences (C.R., J.Y., S.O., X.Z., N.S., B.M.E), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
| | - Timothy F Cloughesy
- Department of Neurology (T.F.C.), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
| | - Benjamin M Ellingson
- From the UCLA Brain Tumor Imaging Laboratory (C.R., J.Y., S.O., B.M.E.), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Department of Radiological Sciences (C.R., J.Y., S.O., X.Z., N.S., B.M.E), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Department of Psychiatry and Biobehavioral Sciences (B.M.E.), David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Department of Bioengineering (X.Z., B.M.E.), Henry Samueli School of Engineering and Applied Science, University of California, Los Angeles, Los Angeles, CA, USA
| |
Collapse
|
171
|
Li W, Hayashi Y, Oda M, Kitasaka T, Misawa K, Mori K. Enhanced self-supervised monocular depth estimation with self-attention and joint depth-pose loss for laparoscopic images. Int J Comput Assist Radiol Surg 2025; 20:775-785. [PMID: 40021577 PMCID: PMC12034601 DOI: 10.1007/s11548-025-03332-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2024] [Accepted: 02/03/2025] [Indexed: 03/03/2025]
Abstract
PURPOSE Depth estimation is a powerful tool for navigation in laparoscopic surgery. Previous methods utilize predicted depth maps and the relative poses of the camera to accomplish self-supervised depth estimation. However, the smooth surfaces of organs with textureless regions and the laparoscope's complex rotations make depth and pose estimation difficult in laparoscopic scenes. Therefore, we propose a novel and effective self-supervised monocular depth estimation method with self-attention-guided pose estimation and a joint depth-pose loss function for laparoscopic images. METHODS We extract feature maps and calculate the minimum re-projection error as a feature-metric loss, establishing constraints on feature maps with more meaningful representations. Moreover, we introduce a self-attention block in the pose estimation network to predict the rotations and translations of the relative poses. In addition, we minimize the difference between predicted relative poses as the pose loss. We combine all of these losses into a joint depth-pose loss. RESULTS The proposed method is extensively evaluated on the SCARED and Hamlyn datasets. Quantitative results show that combining all of the proposed components improves the absolute relative error of depth estimation by about 18.07% and 14.00% on the SCARED and Hamlyn datasets, respectively. The qualitative results show that the proposed method produces smooth depth maps with low error in various laparoscopic scenes. The proposed method also achieves a favourable trade-off between computational efficiency and performance. CONCLUSION This study considers the characteristics of laparoscopic datasets and presents a simple yet effective self-supervised monocular depth estimation method. We propose a joint depth-pose loss function based on the extracted features for depth estimation on laparoscopic images, guided by a self-attention block. The experimental results show that all of the proposed components contribute to the method's performance. Furthermore, the proposed method strikes an effective balance between computational efficiency and performance.
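A sketch of how the three named loss terms might be combined; tensor shapes, weights, and the exact form of the pose-consistency term are assumptions rather than the paper's formulation:

```python
import torch

def joint_depth_pose_loss(photo_err, feat_err, T_fwd, T_bwd,
                          w_feat=0.5, w_pose=0.1):
    """Combine a minimum per-pixel reprojection (photometric) error, a
    feature-metric error, and a pose-consistency penalty asking the forward
    and backward relative poses to be mutual inverses.
    photo_err, feat_err: (B, S, H, W) errors over S source views;
    T_fwd, T_bwd: (B, 4, 4) predicted relative camera poses."""
    photo = photo_err.min(dim=1).values.mean()  # min over source views
    feat = feat_err.min(dim=1).values.mean()
    eye = torch.eye(4, device=T_fwd.device).expand_as(T_fwd)
    pose = (T_fwd @ T_bwd - eye).abs().mean()   # round trip ~ identity
    return photo + w_feat * feat + w_pose * pose
```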
Collapse
Affiliation(s)
- Wenda Li
- Graduate School of Informatics, Nagoya University, Furou-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan.
| | - Yuichiro Hayashi
- Graduate School of Informatics, Nagoya University, Furou-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
| | - Masahiro Oda
- Graduate School of Informatics, Nagoya University, Furou-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
- Information Technology Center, Nagoya University, Furou-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
| | - Takayuki Kitasaka
- Faculty of Information Science, Aichi Institute of Technology, Toyota, Aichi, Japan
| | | | - Kensaku Mori
- Graduate School of Informatics, Nagoya University, Furou-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
- Information Technology Center, Nagoya University, Furou-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
- Research Center of Medical Bigdata, National Institute of Informatics, Tokyo, Japan
| |
Collapse
|
172
|
Yalcinbas MF, Ozturk C, Ozyurt O, Emir UE, Bagci U. Rosette Trajectory MRI Reconstruction with Vision Transformers. Tomography 2025; 11:41. [PMID: 40278708 PMCID: PMC12031261 DOI: 10.3390/tomography11040041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2025] [Revised: 03/12/2025] [Accepted: 03/14/2025] [Indexed: 04/26/2025] Open
Abstract
INTRODUCTION An efficient pipeline for rosette trajectory magnetic resonance imaging reconstruction is proposed, combining the inverse Fourier transform with a vision transformer (ViT) network enhanced with a convolutional layer. This method addresses the challenges of reconstructing high-quality images from non-Cartesian data by leveraging the ViT's ability to handle complex spatial dependencies without extensive preprocessing. MATERIALS AND METHODS The inverse fast Fourier transform provides a robust initial approximation, which is refined by the ViT network to produce high-fidelity images. RESULTS AND DISCUSSION This approach outperforms established deep learning techniques in normalized root-mean-squared error, peak signal-to-noise ratio, and entropy-based image quality scores; offers better runtime performance; and remains competitive on other metrics.
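The two-stage pipeline is compact to express in code. A sketch assuming the rosette k-space samples have already been gridded onto a Cartesian array (the gridding step and the ViT architecture are abstracted behind placeholders):

```python
import torch

def two_stage_reconstruction(gridded_kspace, refiner):
    """Stage 1: inverse FFT of (already gridded) k-space gives a robust
    initial approximation. Stage 2: a ViT-based image-to-image network
    refines the magnitude image. `refiner` is any module mapping
    (B, 1, H, W) -> (B, 1, H, W)."""
    init = torch.fft.ifft2(gridded_kspace, norm="ortho").abs()
    return refiner(init.unsqueeze(1))  # add a channel dim for the network
```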
Collapse
Affiliation(s)
| | - Cengizhan Ozturk
- Institute of Biomedical Engineering, Boğaziçi University, Istanbul 34684, Turkey;
- Center for Targeted Therapy Technologies (CT3), Boğaziçi University, Istanbul 34984, Turkey
| | - Onur Ozyurt
- Wolfson Brain Imaging Centre, Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2-0QQ, UK;
| | - Uzay E. Emir
- Department of Radiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA;
| | - Ulas Bagci
- Machine and Hybrid Intelligence Lab, Northwestern University, Chicago, IL 60611, USA;
| |
Collapse
|
173
|
Liao X, Wei X, Zhou M, Wong HS, Kwong S. Image Quality Assessment: Exploring Joint Degradation Effect of Deep Network Features via Kernel Representation Similarity Analysis. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:2799-2815. [PMID: 40031058 DOI: 10.1109/tpami.2025.3527004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Typically, deep network-based full-reference image quality assessment (FR-IQA) models compare deep features from reference and distorted images pairwise, overlooking correlations among features from the same source. We propose a dual-branch framework to capture the joint degradation effect among deep network features. The first branch uses kernel representation similarity analysis (KRSA), which compares feature self-similarity matrices via the mean absolute error (MAE). The second branch conducts pairwise comparisons via the MAE, and a training-free logarithmic summation of both branches derives the final score. Our approach contributes in three ways. First, integrating the KRSA with pairwise comparisons enhances the model's perceptual awareness. Second, our approach is adaptable to diverse network architectures. Third, our approach can guide perceptual image enhancement. Extensive experiments on 10 datasets validate our method's efficacy, demonstrating that perceptual deformation widely exists in diverse IQA scenarios and that measuring the joint degradation effect can discern appealing content deformations.
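A sketch of the dual-branch score described above; the linear kernel for the self-similarity matrices and the normalisation are illustrative assumptions:

```python
import torch

def dual_branch_score(feats_ref, feats_dist):
    """Branch 1 (KRSA idea): compare feature self-similarity (Gram)
    matrices via MAE. Branch 2: compare features pairwise via MAE.
    A training-free logarithmic summation fuses the two branches.
    feats_*: (N, C) deep features flattened over spatial positions."""
    def self_similarity(f):
        f = f - f.mean(dim=0, keepdim=True)
        g = f @ f.t()
        return g / (g.norm() + 1e-8)
    krsa = (self_similarity(feats_ref)
            - self_similarity(feats_dist)).abs().mean()  # joint degradation
    pairwise = (feats_ref - feats_dist).abs().mean()     # classic branch
    return torch.log1p(krsa) + torch.log1p(pairwise)
```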
Collapse
|
174
|
Wu H, Meng H, Li C, Liu X, Wen Z, Lee TY. Cartoon Animation Outpainting With Region-Guided Motion Inference. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:2086-2100. [PMID: 38502621 DOI: 10.1109/tvcg.2024.3379125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/21/2024]
Abstract
Cartoon animation video is a popular form of visual entertainment worldwide; however, many classic animations were produced in a 4:3 aspect ratio that is incompatible with modern widescreen displays. Existing methods such as cropping lead to information loss, while retargeting causes distortion. Animation companies still rely on manual labor to renovate classic cartoon animations, which is tedious and labor-intensive but can yield higher-quality videos. Conventional extrapolation or inpainting methods tailored for natural videos struggle with cartoon animations due to the lack of textures in anime, which hampers motion estimation of the objects. In this article, we propose a novel framework designed to automatically outpaint 4:3 anime to 16:9 via region-guided motion inference. Our core concept is to identify the motion correspondences between frames within a sequence in order to reconstruct missing pixels. Initially, we estimate optical flow guided by region information to address challenges posed by exaggerated movements and solid-color regions in cartoon animations. Subsequently, frames are stitched to produce a pre-filled guide frame, offering structural clues for the extension of optical flow maps. Finally, a voting and fusion scheme utilizes learned fusion weights to blend the aligned neighboring reference frames, resulting in the final outpainting frame. Extensive experiments confirm the superiority of our approach over existing methods.
Collapse
|
175
|
Pang M, Wang B, Ye M, Cheung YM, Zhou Y, Huang W, Wen B. Heterogeneous Prototype Learning From Contaminated Faces Across Domains via Disentangling Latent Factors. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:7169-7183. [PMID: 38691434 DOI: 10.1109/tnnls.2024.3393072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/03/2024]
Abstract
This article studies an emerging practical problem called heterogeneous prototype learning (HPL). Unlike the conventional heterogeneous face synthesis (HFS) problem that focuses on precisely translating a face image from a source domain to another target one without removing facial variations, HPL aims at learning the variation-free prototype of an image in the target domain while preserving the identity characteristics. HPL is a compounded problem involving two cross-coupled subproblems, that is, domain transfer and prototype learning (PL), thus making most of the existing HFS methods that simply transfer the domain style of images unsuitable for HPL. To tackle HPL, we advocate disentangling the prototype and domain factors in their respective latent feature spaces and then replacing the source domain with the target one for generating a new heterogeneous prototype. In doing so, the two subproblems in HPL can be solved jointly in a unified manner. Based on this, we propose a disentangled HPL framework, dubbed DisHPL, which is composed of one encoder-decoder generator and two discriminators. The generator and discriminators play adversarial games such that the generator embeds contaminated images into a prototype feature space only capturing identity information and a domain-specific feature space, while generating realistic-looking heterogeneous prototypes. Experiments on various heterogeneous datasets with diverse variations validate the superiority of DisHPL.
Collapse
|
176
|
Adler TJ, Nölke JH, Reinke A, Tizabi MD, Gruber S, Trofimova D, Ardizzone L, Jaeger PF, Buettner F, Köthe U, Maier-Hein L. Application-driven validation of posteriors in inverse problems. Med Image Anal 2025; 101:103474. [PMID: 39892221 DOI: 10.1016/j.media.2025.103474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2024] [Revised: 01/13/2025] [Accepted: 01/15/2025] [Indexed: 02/03/2025]
Abstract
Current deep learning-based solutions for image analysis tasks are commonly incapable of handling problems for which multiple plausible solutions exist. In response, posterior-based methods such as conditional diffusion models and invertible neural networks have emerged; however, their translation is hampered by a lack of research on adequate validation. In other words, the way progress is measured often does not reflect the needs of the driving practical application. Closing this gap in the literature, we present the first systematic framework for the application-driven validation of posterior-based methods in inverse problems. As a methodological novelty, it adopts key principles from the field of object detection validation, which has a long history of addressing the question of how to locate and match multiple object instances in an image. Treating modes as instances enables us to perform mode-centric validation, using well-interpretable metrics from the application perspective. We demonstrate the value of our framework through instantiations for a synthetic toy example and two medical vision use cases: pose estimation in surgery and imaging-based quantification of functional tissue parameters for diagnostics. Our framework offers key advantages over common approaches to posterior validation in all three examples and could thus revolutionize performance assessment in inverse problems.
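Treating posterior modes as detected instances reduces mode-centric validation to a matching problem. A sketch of one detection-style matcher under stated assumptions (greedy nearest-neighbour assignment and a hypothetical distance threshold tau; the framework's actual matching criteria may differ):

```python
import numpy as np

def match_modes(pred_modes, ref_modes, tau=0.1):
    """Greedily match predicted posterior modes to reference modes: each
    reference mode claims its nearest unmatched prediction within distance
    tau. Matched pairs can then feed precision/recall-style metrics;
    unmatched predictions act like false positives.
    pred_modes, ref_modes: arrays of shape (n, d)."""
    matches, used = [], set()
    if len(pred_modes) == 0:
        return matches
    for i, r in enumerate(ref_modes):
        dists = [np.linalg.norm(p - r) if j not in used else np.inf
                 for j, p in enumerate(pred_modes)]
        j = int(np.argmin(dists))
        if dists[j] <= tau:
            used.add(j)
            matches.append((j, i))
    return matches
```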
Collapse
Affiliation(s)
- Tim J Adler
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems (IMSY), Heidelberg, Germany
| | - Jan-Hinrich Nölke
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems (IMSY), Heidelberg, Germany; Faculty of Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany.
| | - Annika Reinke
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems (IMSY), Heidelberg, Germany; German Cancer Research Center (DKFZ) Heidelberg, HI Helmholtz Imaging, Heidelberg, Germany
| | - Minu Dietlinde Tizabi
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems (IMSY), Heidelberg, Germany; National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and University Medical Center Heidelberg, Heidelberg, Germany
| | - Sebastian Gruber
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems (IMSY), Heidelberg, Germany
| | - Dasha Trofimova
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems (IMSY), Heidelberg, Germany
| | - Lynton Ardizzone
- Visual Learning Lab, Interdisciplinary Center for Scientific Computing (IWR), Heidelberg, Germany
| | - Paul F Jaeger
- German Cancer Research Center (DKFZ) Heidelberg, HI Helmholtz Imaging, Heidelberg, Germany; German Cancer Research Center (DKFZ) Heidelberg, Interactive Machine Learning Group, Heidelberg, Germany
| | - Florian Buettner
- Department of Informatics, Goethe University Frankfurt, Frankfurt, Germany; Department of Medicine, Goethe University Frankfurt, Frankfurt, Germany; German Cancer Consortium (DKTK), partner site Frankfurt, a partnership between DKFZ and UCT Frankfurt-Marburg, Frankfurt, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Frankfurt Cancer Institute, Frankfurt, Germany
| | - Ullrich Köthe
- Visual Learning Lab, Interdisciplinary Center for Scientific Computing (IWR), Heidelberg, Germany
| | - Lena Maier-Hein
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems (IMSY), Heidelberg, Germany; Faculty of Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany; German Cancer Research Center (DKFZ) Heidelberg, HI Helmholtz Imaging, Heidelberg, Germany; National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and University Medical Center Heidelberg, Heidelberg, Germany.
| |
Collapse
|
177
|
Yue Z, Shi M. Enhancing space-time video super-resolution via spatial-temporal feature interaction. Neural Netw 2025; 184:107033. [PMID: 39705772 DOI: 10.1016/j.neunet.2024.107033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2023] [Revised: 08/05/2024] [Accepted: 12/06/2024] [Indexed: 12/23/2024]
Abstract
The target of space-time video super-resolution (STVSR) is to increase both the frame rate (also referred to as the temporal resolution) and the spatial resolution of a given video. Recent approaches solve STVSR using end-to-end deep neural networks. A popular solution is to first increase the frame rate of the video; then perform feature refinement among different frame features; and finally increase the spatial resolutions of these features. The temporal correlation among features of different frames is carefully exploited in this process. The spatial correlation among features of different resolutions, although equally important, has received much less attention. In this paper, we propose a spatial-temporal feature interaction network to enhance STVSR by exploiting both spatial and temporal correlations among features of different frames and spatial resolutions. Specifically, the spatial-temporal frame interpolation module is introduced to interpolate low- and high-resolution intermediate frame features simultaneously and interactively. The spatial-temporal local and global refinement modules are deployed afterwards to exploit the spatial-temporal correlation among different features for their refinement. Finally, a novel motion consistency loss is employed to enhance the motion continuity among reconstructed frames. We conduct experiments on three standard benchmarks, Vid4, Vimeo-90K and Adobe240, and the results demonstrate that our method outperforms state-of-the-art methods by a considerable margin. Our codes will be available at https://github.com/yuezijie/STINet-Space-time-Video-Super-resolution.
Collapse
Affiliation(s)
- Zijie Yue
- College of Electronic and Information Engineering, Tongji University, China
| | - Miaojing Shi
- College of Electronic and Information Engineering, Tongji University, China; Shanghai Institute of Intelligent Science and Technology, Tongji University, China.
| |
Collapse
|
178
|
Zhang Z, Zhang J, Mai W. VPT: Video portraits transformer for realistic talking face generation. Neural Netw 2025; 184:107122. [PMID: 39799718 DOI: 10.1016/j.neunet.2025.107122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2024] [Revised: 12/09/2024] [Accepted: 01/02/2025] [Indexed: 01/15/2025]
Abstract
Talking face generation is a promising approach in various domains, such as digital assistants, video editing, and virtual video conferences. Previous works on audio-driven talking faces focused primarily on the synchronization between audio and video. However, existing methods still have certain limitations in synthesizing photo-realistic video with high identity preservation, audiovisual synchronization, and facial details such as blink movements. To solve these problems, a novel talking face generation framework, termed video portraits transformer (VPT), with controllable blink movements is proposed and applied. It separates the process of video generation into two stages, i.e., an audio-to-landmark stage and a landmark-to-face stage. In the audio-to-landmark stage, a transformer encoder serves as the generator, predicting whole facial landmarks from the given audio and a continuous eye aspect ratio (EAR). During the landmark-to-face stage, a video-to-video (vid-to-vid) network is employed to transfer landmarks into realistic talking face videos. Moreover, to imitate real blink movements during inference, a transformer-based spontaneous blink generation module is devised to generate the EAR sequence. Extensive experiments demonstrate that the VPT method can produce photo-realistic videos of talking faces with natural blink movements, and that the spontaneous blink generation module can generate blink movements close to the real blink duration distribution and frequency.
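The eye aspect ratio (EAR) on which the blink module conditions generation has a standard definition from blink-detection work, computed from six eye landmarks:

```python
import numpy as np

def eye_aspect_ratio(p):
    """Standard eye aspect ratio (EAR) from six eye landmarks p1..p6 in
    the usual blink-detection ordering; EAR drops sharply when the eye
    closes, so a time series of EAR values encodes blinks.
    p: array of shape (6, 2)."""
    v1 = np.linalg.norm(p[1] - p[5])  # vertical span p2-p6
    v2 = np.linalg.norm(p[2] - p[4])  # vertical span p3-p5
    h = np.linalg.norm(p[0] - p[3])   # horizontal span p1-p4
    return (v1 + v2) / (2.0 * h)
```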
Collapse
Affiliation(s)
- Zhijun Zhang
- School of Automation Science and Engineering, South China University of Technology, China; Key Laboratory of Autonomous Systems and Network Control, Ministry of Education, China; Jiangxi Thousand Talents Plan, Nanchang University, China; College of Computer Science and Engineering, Jishou University, China; Guangdong Artificial Intelligence and Digital Economy Laboratory (Pazhou Lab), China; Shaanxi Provincial Key Laboratory of Industrial Automation, School of Mechanical Engineering, Shaanxi University of Technology, Hanzhong, China; School of Information Science and Engineering, Changsha Normal University, Changsha, China; School of Automation Science and Engineering, and also with the Institute of Artificial Intelligence and Automation, Guangdong University of Petrochemical Technology, Maoming, China; Key Laboratory of Large-Model Embodied-Intelligent Humanoid Robot (2024KSYS004), China; The Institute for Super Robotics (Huangpu), Guangzhou, China.
| | - Jian Zhang
- School of Automation Science and Engineering, South China University of Technology, China; The Institute for Super Robotics (Huangpu), Guangzhou, China.
| | - Weijian Mai
- School of Automation Science and Engineering, South China University of Technology, China.
| |
Collapse
|
179
|
Lin M, Liu J, Zhang C, Zhao Z, He C, Yu L. Non-Uniform Exposure Imaging via Neuromorphic Shutter Control. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:2770-2784. [PMID: 40031061 DOI: 10.1109/tpami.2025.3526280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
By leveraging the blur-noise trade-off, imaging with non-uniform exposures greatly extends image acquisition flexibility in harsh environments. However, the limitation of conventional cameras in perceiving intra-frame dynamic information prevents existing methods from being implemented in real-world frame acquisition for real-time adaptive camera shutter control. To address this challenge, we propose a novel Neuromorphic Shutter Control (NSC) system to avoid motion blur and alleviate instant noise, where the extremely low latency of events is leveraged to monitor the real-time motion and facilitate scene-adaptive exposure. Furthermore, to stabilize the inconsistent Signal-to-Noise Ratio (SNR) caused by the non-uniform exposure times, we propose an event-based image denoising network within a self-supervised learning paradigm, i.e., SEID, exploring the statistics of image noise and inter-frame motion information of events to obtain artificial supervision signals for high-quality imaging in real-world scenes. To illustrate the effectiveness of the proposed NSC, we implement it in hardware by building a hybrid-camera imaging prototype system, with which we collect a real-world dataset containing well-synchronized frames and events in diverse scenarios with different target scenes and motion patterns. Experiments on the synthetic and real-world datasets demonstrate the superiority of our method over state-of-the-art approaches.
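A toy shutter-control rule in the spirit of NSC, with the event rate standing in for the monitored motion; the reciprocal law and every constant here are assumptions for illustration, not the paper's controller:

```python
def adaptive_exposure(event_rate, base_exposure=0.01,
                      k=1.0, t_min=1e-4, t_max=0.1):
    """A high event rate signals fast motion, so shorten the exposure to
    avoid blur; a low rate permits a longer exposure to suppress noise.
    Units assumed: seconds for exposures, events/s for the rate."""
    t = base_exposure / (1.0 + k * event_rate)
    return min(max(t, t_min), t_max)
```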
Collapse
|
180
|
Zhong Y, Huang Y, Hu J, Zhang Y, Ji R. Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:2676-2692. [PMID: 40031001 DOI: 10.1109/tpami.2025.3528042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Post-training quantization (PTQ) for vision transformers (ViTs) has received increasing attention from both academic and industrial communities due to its minimal data needs and high time efficiency. However, many current methods fail to account for the complex interactions between quantized weights and activations, resulting in significant quantization errors and suboptimal performance. This paper presents ERQ, an innovative two-step PTQ method specifically crafted to reduce quantization errors arising from activation and weight quantization sequentially. The first step, Activation quantization error reduction (Aqer), first applies Reparameterization Initialization aimed at mitigating initial quantization errors in high-variance activations. Then, it further mitigates the errors by formulating a Ridge Regression problem, which updates the weights maintained at full-precision using a closed-form solution. The second step, Weight quantization error reduction (Wqer), first applies Dual Uniform Quantization to handle weights with numerous outliers, which arise from adjustments made during Reparameterization Initialization, thereby reducing initial weight quantization errors. Then, it employs an iterative approach to further tackle the errors. In each iteration, it adopts Rounding Refinement that uses an empirically derived, efficient proxy to refine the rounding directions of quantized weights, complemented by a Ridge Regression solver to reduce the errors. Comprehensive experimental results demonstrate ERQ's superior performance across various ViTs variants and tasks. For example, ERQ surpasses the state-of-the-art GPTQ by a notable 36.81% in accuracy for W3A4 ViT-S.
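The closed-form ridge regression update used in both steps is the textbook solver; how ERQ constructs the design matrix and targets is specific to the paper, so the formulation below is only indicative:

```python
import torch

def ridge_weight_update(X, E, lam=1e-2):
    """Solve min_dW ||X @ dW - E||^2 + lam * ||dW||^2 in closed form,
    where X holds layer inputs (n, d) and E the quantization-induced
    output error (n, m); dW is the full-precision weight correction."""
    d = X.shape[1]
    A = X.t() @ X + lam * torch.eye(d, dtype=X.dtype)
    return torch.linalg.solve(A, X.t() @ E)  # (d, m)
```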
Collapse
|
181
|
Sunaguchi N, Yuasa T, Shimao D, Huang Z, Ichihara S, Nishimura R, Iwakoshi A, Kim JK, Gupta R, Ando M. Superimposed Wavefront Imaging of Diffraction-enhanced X-rays: sparsity-aware CT reconstruction from limited-view projections. Int J Comput Assist Radiol Surg 2025; 20:653-663. [PMID: 39724204 PMCID: PMC12034596 DOI: 10.1007/s11548-024-03303-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2024] [Accepted: 11/29/2024] [Indexed: 12/28/2024]
Abstract
PURPOSE In this paper, we describe an algebraic reconstruction algorithm with total variation regularization (ART + TV) based on the Superimposed Wavefront Imaging of Diffraction-enhanced X-rays (SWIDeX) method to effectively reduce the number of projections required for differential phase-contrast CT reconstruction. METHODS SWIDeX is a technique that uses a Laue-case Si analyzer with a closely spaced scintillator to generate high-contrast second-derivative phase-contrast images of a subject. When the projections obtained by this technique are reconstructed, a Laplacian phase-contrast tomographic image with higher sparsity than the original physical distribution of the subject can be obtained. In the proposed method, the Laplacian image is first obtained by applying ART + TV, which exploits this higher sparsity and is therefore expected to require fewer projections, to the projections acquired by SWIDeX from a limited number of views. Then, by solving Poisson's equation for the Laplacian image, a tomographic image representing the refractive-index distribution is obtained. RESULTS Simulations and actual X-ray experiments were conducted to demonstrate the effectiveness of the proposed method in projection reduction. In the simulation, image quality was maintained even when the number of projections was reduced to about 1/10 of the originally required views, and in the actual experiment, biological tissue structure was maintained even when the number of projections was reduced to about 1/30. CONCLUSION SWIDeX can visualize the internal structures of biological tissues with very high contrast, and the proposed method will be useful for CT reconstruction from large projection data with a wide field of view and high spatial resolution.
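The final step, recovering the refractive-index map from the reconstructed Laplacian image, is a Poisson solve. A sketch using the FFT, assuming periodic boundaries and a 5-point discrete Laplacian (the paper's boundary handling may differ):

```python
import numpy as np

def solve_poisson_fft(laplacian_img):
    """Invert the Poisson equation in Fourier space to recover an image
    from its Laplacian. The DC (zero-frequency) term is undetermined and
    is set to zero, fixing the mean level arbitrarily."""
    ny, nx = laplacian_img.shape
    wy = 2 * np.pi * np.fft.fftfreq(ny)  # angular frequency per sample
    wx = 2 * np.pi * np.fft.fftfreq(nx)
    # Eigenvalues of the 5-point discrete Laplacian on a periodic grid.
    denom = 2 * np.cos(wy)[:, None] + 2 * np.cos(wx)[None, :] - 4.0
    F = np.fft.fft2(laplacian_img)
    denom[0, 0] = 1.0
    U = F / denom
    U[0, 0] = 0.0
    return np.real(np.fft.ifft2(U))
```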
Collapse
Affiliation(s)
- Naoki Sunaguchi
- Department of Radiological and Medical Laboratory Sciences, Nagoya University Graduate School of Medicine, Nagoya, Aichi, 461-8673, Japan.
| | - Tetsuya Yuasa
- Graduate School of Engineering and Science, Yamagata University, Yonezawa, Yamagata, 992-8510, Japan
| | - Daisuke Shimao
- Department of Radiological Sciences, International University of Health and Welfare, Otawara, Tochigi, 324-8501, Japan
| | - Zhuoran Huang
- Department of Radiological and Medical Laboratory Sciences, Nagoya University Graduate School of Medicine, Nagoya, Aichi, 461-8673, Japan
| | - Shu Ichihara
- Department of Pathology, NHO Nagoya Medical Center, Nagoya, Aichi, 460-0001, Japan
| | - Rieko Nishimura
- Department of Pathology, NHO Nagoya Medical Center, Nagoya, Aichi, 460-0001, Japan
| | - Akari Iwakoshi
- Department of Pathology, NHO Nagoya Medical Center, Nagoya, Aichi, 460-0001, Japan
| | - Jong-Ki Kim
- Biomedical Engineering and Radiology, School of Medicine, Catholic University of Daegu, Daegu, 705-034, Korea
| | - Rajiv Gupta
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, 02114, USA
| | - Masami Ando
- High Energy Accelerator Research Organization, Tsukuba, Ibaraki, 305-0801, Japan
| |
Collapse
|
182
|
Zhang J, Bell MAL. Overfit detection method for deep neural networks trained to beamform ultrasound images. ULTRASONICS 2025; 148:107562. [PMID: 39746284 PMCID: PMC11839378 DOI: 10.1016/j.ultras.2024.107562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/20/2024] [Revised: 12/18/2024] [Accepted: 12/20/2024] [Indexed: 01/04/2025]
Abstract
Deep neural networks (DNNs) have remarkable potential to reconstruct ultrasound images. However, this promise can suffer from overfitting to training data, which is typically detected via loss function monitoring during an otherwise time-consuming training process or via access to new sources of test data. We present a method to detect overfitting with associated evaluation approaches that only require knowledge of a network architecture and associated trained weights. Three types of artificial DNN inputs (i.e., zeros, ones, and Gaussian noise), unseen during DNN training, were input to three DNNs designed for ultrasound image formation, trained on multi-site data, and submitted to the Challenge on Ultrasound Beamforming with Deep Learning (CUBDL). Overfitting was detected using these artificial DNN inputs. Qualitative and quantitative comparisons of DNN-created images to ground truth images immediately revealed signs of overfitting (e.g., zeros input produced mean output values ≥0.08, ones input produced mean output values ≤0.07, with corresponding image-to-image normalized correlations ≤0.8). The proposed approach is promising to detect overfitting without requiring lengthy network retraining or the curation of additional test data. Potential applications include sanity checks during federated learning, as well as optimization, security, public policy, regulation creation, and benchmarking.
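The probing procedure itself is a few lines: feed the trained network inputs it has never seen during training and inspect the output statistics. A sketch with an illustrative input shape:

```python
import torch

@torch.no_grad()
def overfit_probe(net, shape=(1, 1, 128, 128)):
    """Feed a trained beamforming DNN the three artificial inputs used in
    the study (zeros, ones, Gaussian noise) and report mean output values,
    which can then be compared against empirical thresholds such as those
    quoted above (e.g., mean >= 0.08 for the zeros input)."""
    probes = {"zeros": torch.zeros(shape),
              "ones": torch.ones(shape),
              "noise": torch.randn(shape)}
    return {name: float(net(x).mean()) for name, x in probes.items()}
```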
Collapse
Affiliation(s)
- Jiaxin Zhang
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Muyinatu A Lediju Bell
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
| |
Collapse
|
183
|
Kawata N, Iwao Y, Matsuura Y, Higashide T, Okamoto T, Sekiguchi Y, Nagayoshi M, Takiguchi Y, Suzuki T, Haneishi H. Generation of short-term follow-up chest CT images using a latent diffusion model in COVID-19. Jpn J Radiol 2025; 43:622-633. [PMID: 39585556 PMCID: PMC11953082 DOI: 10.1007/s11604-024-01699-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2024] [Accepted: 11/02/2024] [Indexed: 11/26/2024]
Abstract
PURPOSE Despite a global decrease in the number of COVID-19 patients, early prediction of the clinical course for optimal patient care remains challenging. Recently, the usefulness of image generation for medical images has been investigated. This study aimed to generate short-term follow-up chest CT images using a latent diffusion model in patients with COVID-19. MATERIALS AND METHODS We retrospectively enrolled 505 patients with COVID-19 for whom the clinical parameters (patient background, clinical symptoms, and blood test results) upon admission were available and chest CT imaging was performed. Subject datasets (n = 505) were allocated for training (n = 403), and the remaining (n = 102) were reserved for evaluation. The images underwent variational autoencoder (VAE) encoding, resulting in latent vectors. The initial clinical parameters and radiomic features were formatted as inputs to a tabular-data encoder. The initial and follow-up latent vectors, together with the encoded initial tabular data, were used to train the diffusion model. The evaluation data were used to generate prognostic images. Then, the similarity between the prognostic images (generated images) and the follow-up images (real images) was evaluated by zero-mean normalized cross-correlation (ZNCC), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM). Visual assessment was also performed using a numerical rating scale. RESULTS Prognostic chest CT images were generated using the diffusion model. Image similarity showed reasonable values of 0.973 ± 0.028 for the ZNCC, 24.48 ± 3.46 for the PSNR, and 0.844 ± 0.075 for the SSIM. Visual evaluation of the images by two pulmonologists and one radiologist yielded a reasonable mean score. CONCLUSIONS The similarity and validity of predictive images of the course of COVID-19-associated pneumonia generated with a diffusion model were reasonable. The generation of prognostic images may have potential utility for early prediction of the clinical course in COVID-19-associated pneumonia and other respiratory diseases.
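Of the reported similarity metrics, ZNCC is the least standardised in imaging toolkits; its computation over an image pair is straightforward:

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalised cross-correlation between a generated
    prognostic image and the real follow-up image; 1.0 indicates perfect
    linear agreement."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))
```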
Collapse
Affiliation(s)
- Naoko Kawata
- Department of Respirology, Graduate School of Medicine, Chiba University, 1-8-1, Inohana, Chuo-Ku, Chiba-Shi, Chiba, 260-8677, Japan.
- Graduate School of Science and Engineering, Chiba University, Chiba, 263-8522, Japan.
| | - Yuma Iwao
- Center for Frontier Medical Engineering, Chiba University, 1-33, Yayoi-Cho, Inage-Ku, Chiba-Shi, Chiba, 263-8522, Japan
- Institute for Quantum Medical Science, National Institutes for Quantum Science and Technology, 4-9-1, Anagawa, Inage-Ku, Chiba-Shi, Chiba, 263-8555, Japan
| | - Yukiko Matsuura
- Department of Respiratory Medicine, Chiba Aoba Municipal Hospital, 1273-2 Aoba-Cho, Chuo-Ku, Chiba-Shi, Chiba, 260-0852, Japan
| | - Takashi Higashide
- Department of Radiology, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-Shi, Chiba, 260-8677, Japan
- Department of Radiology, Japanese Red Cross Narita Hospital, 90-1, Iida-Cho, Narita-Shi, Chiba, 286-8523, Japan
| | - Takayuki Okamoto
- Center for Frontier Medical Engineering, Chiba University, 1-33, Yayoi-Cho, Inage-Ku, Chiba-Shi, Chiba, 263-8522, Japan
| | - Yuki Sekiguchi
- Graduate School of Science and Engineering, Chiba University, Chiba, 263-8522, Japan
| | - Masaru Nagayoshi
- Department of Respiratory Medicine, Chiba Aoba Municipal Hospital, 1273-2 Aoba-Cho, Chuo-Ku, Chiba-Shi, Chiba, 260-0852, Japan
| | - Yasuo Takiguchi
- Department of Respiratory Medicine, Chiba Aoba Municipal Hospital, 1273-2 Aoba-Cho, Chuo-Ku, Chiba-Shi, Chiba, 260-0852, Japan
| | - Takuji Suzuki
- Department of Respirology, Graduate School of Medicine, Chiba University, 1-8-1, Inohana, Chuo-Ku, Chiba-Shi, Chiba, 260-8677, Japan
| | - Hideaki Haneishi
- Center for Frontier Medical Engineering, Chiba University, 1-33, Yayoi-Cho, Inage-Ku, Chiba-Shi, Chiba, 263-8522, Japan
| |
Collapse
|
184
|
Waida H, Yamazaki K, Tokuhisa A, Wada M, Wada Y. Investigating self-supervised image denoising with denaturation. Neural Netw 2025; 184:106966. [PMID: 39700824 DOI: 10.1016/j.neunet.2024.106966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2024] [Revised: 10/08/2024] [Accepted: 11/25/2024] [Indexed: 12/21/2024]
Abstract
Self-supervised learning for image denoising in the presence of denaturation of noisy data is a crucial approach in machine learning. However, theoretical understanding of the performance of approaches that use denatured data is lacking. To provide a better understanding, in this paper we analyze in depth a self-supervised denoising algorithm that uses denatured data, through theoretical analysis and numerical experiments. The theoretical analysis shows that the algorithm finds desired solutions to the optimization problem with the population risk, while the guarantee for the empirical risk depends on the hardness of the denoising task in terms of denaturation levels. We also conduct several experiments to investigate the performance of an extended algorithm in practice. The results indicate that training the algorithm with denatured images works, and that the empirical performance aligns with the theoretical results. These results suggest several directions for further improving self-supervised image denoising with denatured data.
Collapse
Affiliation(s)
- Hiroki Waida
- Department of Mathematical and Computing Science, Institute of Science Tokyo, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan
| | - Kimihiro Yamazaki
- Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa, 211-8588, Japan
| | - Atsushi Tokuhisa
- RIKEN Center for Computational Science, 7-1-26 Minatojima-minami-machi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
| | - Mutsuyo Wada
- Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa, 211-8588, Japan
| | - Yuichiro Wada
- Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa, 211-8588, Japan; RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building, 15th floor, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan.
| |
Collapse
|
185
|
Oh G, Kim S, Gu H, Yoon SH, Kim J, Kim S. FPANet: Frequency-based video demoiréing using frame-level post alignment. Neural Netw 2025; 184:107021. [PMID: 39733699 DOI: 10.1016/j.neunet.2024.107021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2024] [Revised: 10/07/2024] [Accepted: 12/03/2024] [Indexed: 12/31/2024]
Abstract
Moiré patterns, created by the interference between overlapping grid patterns in the pixel space, degrade the visual quality of images and videos. Removing such patterns (demoiréing) is therefore crucial, yet it remains a challenge due to the complexity of their sizes and distortions. Conventional methods mainly tackle this task by exploiting only the spatial domain of the input images, limiting their capability to remove large-scale moiré patterns. Therefore, this work proposes FPANet, an image-video demoiréing network that learns filters in both the frequency and spatial domains, improving restoration quality by removing moiré patterns of various sizes. To further enhance temporal consistency, our model takes multiple consecutive frames, learning to extract frame-invariant content features and outputting temporally consistent, higher-quality images. We demonstrate the effectiveness of our proposed method on a publicly available large-scale dataset, observing that ours outperforms state-of-the-art approaches in terms of image and video quality metrics and visual experience.
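A sketch of the frequency-domain half of such a design: transform features with an FFT, apply a learned mask, and invert. The `mask_net` predicting the filter is a hypothetical stand-in for FPANet's frequency blocks, not the paper's architecture:

```python
import torch

def frequency_filter_branch(x, mask_net):
    """FFT the features, scale the spectrum with a predicted real-valued
    mask, and transform back; large-scale moiré patterns become compact
    in the frequency domain, where they are easier to attenuate.
    x: (B, C, H, W); mask_net maps (B, 2C, H, W//2+1) -> (B, C, H, W//2+1)."""
    X = torch.fft.rfft2(x, norm="ortho")
    mask = mask_net(torch.cat([X.real, X.imag], dim=1))
    X = X * mask  # a real mask scales real and imaginary parts alike
    return torch.fft.irfft2(X, s=x.shape[-2:], norm="ortho")
```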
Collapse
Affiliation(s)
- Gyeongrok Oh
- Department of Artificial Intelligence, Korea University, South Korea
| | - Sungjune Kim
- Department of Artificial Intelligence, Korea University, South Korea
| | - Heon Gu
- LG Display Research Center, South Korea
| | - Sang Ho Yoon
- Graduate School of Culture Technology, KAIST, South Korea
| | - Jinkyu Kim
- Department of Computer Science and Engineering, Korea University, South Korea.
| | - Sangpil Kim
- Department of Artificial Intelligence, Korea University, South Korea.
| |
Collapse
|
186
|
Broad Z, Robinson AW, Wells J, Nicholls D, Moshtaghpour A, Kirkland AI, Browning ND. Compressive electron backscatter diffraction imaging. J Microsc 2025; 298:44-57. [PMID: 39797608 DOI: 10.1111/jmi.13379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2024] [Revised: 10/22/2024] [Accepted: 12/18/2024] [Indexed: 01/13/2025]
Abstract
Electron backscatter diffraction (EBSD) has developed over the last few decades into a valuable crystallographic characterisation method for a wide range of sample types. Despite these advances, issues such as the complexity of sample preparation, relatively slow acquisition, and damage in beam-sensitive samples still limit the quantity and quality of interpretable data that can be obtained. To mitigate these issues, here we propose a method based on the subsampling of probe positions and subsequent reconstruction of an incomplete data set. The missing probe locations (or pixels in the image) are recovered via an inpainting process using a dictionary-learning-based method called beta-process factor analysis (BPFA). To investigate the robustness of both our inpainting method and Hough-based indexing, we simulate subsampled and noisy EBSD data sets from a real fully sampled Ni-superalloy data set for different subsampling ratios of probe positions using both Gaussian and Poisson noise models. We find that zero-solution pixel detection (inpainting un-indexed pixels) enables higher-quality reconstructions to be obtained. Numerical tests confirm high-quality reconstruction of band contrast and inverse pole figure maps from only 10% of the probe positions, with the potential to reduce this to 5% if only inverse pole figure maps are needed. These results show the potential application of this method in EBSD, allowing faster analysis and extending the use of this technique to beam-sensitive materials.
Collapse
Affiliation(s)
- Zoë Broad
- Department of Mechanical, Materials and Aerospace Engineering, University of Liverpool, Liverpool, UK
| | | | | | | | - Amirafshar Moshtaghpour
- Correlated Imaging Group, Rosalind Franklin Institute, Harwell Science and Innovation Campus, Didcot, UK
| | - Angus I Kirkland
- Correlated Imaging Group, Rosalind Franklin Institute, Harwell Science and Innovation Campus, Didcot, UK
- Department of Materials, University of Oxford, Oxford, UK
| | - Nigel D Browning
- Department of Mechanical, Materials and Aerospace Engineering, University of Liverpool, Liverpool, UK
- SenseAI Innovations Ltd., Liverpool, UK
| |
Collapse
|
187
|
Sun Y, Wang L, Li G, Lin W, Wang L. A foundation model for enhancing magnetic resonance images and downstream segmentation, registration and diagnostic tasks. Nat Biomed Eng 2025; 9:521-538. [PMID: 39638876 DOI: 10.1038/s41551-024-01283-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Accepted: 10/17/2024] [Indexed: 12/07/2024]
Abstract
In structural magnetic resonance (MR) imaging, motion artefacts, low resolution, imaging noise and variability in acquisition protocols frequently degrade image quality and confound downstream analyses. Here we report a foundation model for the motion correction, resolution enhancement, denoising and harmonization of MR images. Specifically, we trained a tissue-classification neural network to predict tissue labels, which are then leveraged by a 'tissue-aware' enhancement network to generate high-quality MR images. We validated the model's effectiveness on a large and diverse dataset comprising 2,448 deliberately corrupted images and 10,963 images spanning a wide age range (from foetuses to elderly individuals) acquired using a variety of clinical scanners across 19 public datasets. The model consistently outperformed state-of-the-art algorithms in improving the quality of MR images, handling pathological brains with multiple sclerosis or gliomas, generating 7-T-like images from 3 T scans and harmonizing images acquired from different scanners. The high-quality, high-resolution and harmonized images generated by the model can be used to enhance the performance of models for tissue segmentation, registration, diagnosis and other downstream tasks.
Collapse
Affiliation(s)
- Yue Sun
- Developing Brain Computing Lab, Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill and North Carolina State University, Chapel Hill, NC, USA
| | - Limei Wang
- Developing Brain Computing Lab, Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill and North Carolina State University, Chapel Hill, NC, USA
| | - Gang Li
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Weili Lin
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Li Wang
- Developing Brain Computing Lab, Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
| |
Collapse
|
188
|
Weiser PJ, Langs G, Bogner W, Motyka S, Strasser B, Golland P, Singh N, Dietrich J, Uhlmann E, Batchelor T, Cahill D, Hoffmann M, Klauser A, Andronesi OC. Deep-ER: Deep Learning ECCENTRIC Reconstruction for fast high-resolution neurometabolic imaging. Neuroimage 2025; 309:121045. [PMID: 39894238 PMCID: PMC11952141 DOI: 10.1016/j.neuroimage.2025.121045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2024] [Revised: 01/16/2025] [Accepted: 01/22/2025] [Indexed: 02/04/2025] Open
Abstract
INTRODUCTION Altered neurometabolism is an important pathological mechanism in many neurological diseases and brain cancer, which can be mapped non-invasively by Magnetic Resonance Spectroscopic Imaging (MRSI). Advanced MRSI using non-Cartesian compressed-sensing acquisition enables fast high-resolution metabolic imaging but has lengthy reconstruction times that limit throughput and require expert user interaction. Here, we present a robust and efficient deep learning reconstruction embedded in a physical model within an end-to-end automated processing pipeline to obtain high-quality metabolic maps. METHODS Fast high-resolution whole-brain metabolic imaging was performed at 3.4 mm³ isotropic resolution with acquisition times between 4:11-9:21 min:s using the ECCENTRIC pulse sequence on a 7T MRI scanner. Data were acquired in a high-resolution phantom and 27 human participants, including 22 healthy volunteers and 5 glioma patients. A deep neural network using recurring interlaced convolutional layers with joint dual-space feature representation was developed for deep learning ECCENTRIC reconstruction (Deep-ER). 21 subjects were used for training and 6 subjects for testing. Deep-ER performance was compared to iterative compressed sensing Total Generalized Variation reconstruction using image and spectral quality metrics. RESULTS Deep-ER demonstrated 600-fold faster reconstruction than conventional methods, providing improved spatial-spectral quality and metabolite quantification with 12%-45% (P<0.05) higher signal-to-noise and 8%-50% (P<0.05) smaller Cramér-Rao lower bounds. Metabolic images clearly visualize glioma tumor heterogeneity and boundaries. Deep-ER generalizes reliably to unseen data. CONCLUSION Deep-ER provides efficient and robust reconstruction for sparse-sampled MRSI. The accelerated acquisition-reconstruction MRSI is compatible with a high-throughput imaging workflow. It is expected that this improved performance will facilitate basic and clinical MRSI applications for neuroscience and precision medicine.
Collapse
Affiliation(s)
- Paul J Weiser
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, USA; Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Computational Imaging Research Lab - Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Vienna, Austria.
| | - Georg Langs
- Computational Imaging Research Lab - Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Vienna, Austria
| | - Wolfgang Bogner
- High Field MR Center - Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, Vienna, Austria
| | - Stanislav Motyka
- Computational Imaging Research Lab - Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Vienna, Austria
| | - Bernhard Strasser
- High Field MR Center - Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, Vienna, Austria
| | - Polina Golland
- Computer Science and Artificial Intelligence Lab, MIT, Cambridge, MA, USA
| | - Nalini Singh
- Computer Science and Artificial Intelligence Lab, MIT, Cambridge, MA, USA
| | - Jorg Dietrich
- Pappas Center for Neuro-Oncology, Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
| | - Erik Uhlmann
- Department of Neurology, Beth-Israel Deaconess Medical Center, Boston, MA, USA
| | - Tracy Batchelor
- Department of Neurology, Brigham and Women's Hospital, Boston, MA, USA
| | - Daniel Cahill
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA, USA
| | - Malte Hoffmann
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, USA; Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | - Antoine Klauser
- Advanced Clinical Imaging Technology, Siemens Healthineers International AG, Lausanne, Switzerland; Center for Biomedical Imaging (CIBM), Geneva, Switzerland
| | - Ovidiu C Andronesi
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, USA; Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
| |
Collapse
|
189
|
Liang D, Yao Y, Ye M, Luo Q, Chu J. Automatic visual detection of activated sludge microorganisms based on microscopic phase contrast image optimisation and deep learning. J Microsc 2025; 298:58-73. [PMID: 39846854 DOI: 10.1111/jmi.13385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2024] [Revised: 01/02/2025] [Accepted: 01/13/2025] [Indexed: 01/24/2025]
Abstract
The types and quantities of microorganisms in activated sludge are directly related to the stability and efficiency of sewage treatment systems. This paper proposes a sludge microorganism detection method based on microscopic phase contrast image optimisation and deep learning. Firstly, a dataset containing eight types of microorganisms is constructed, and an augmentation strategy based on single- and multi-sample processing is designed to address the issues of sample deficiency and uneven distribution. Secondly, a phase contrast image quality optimisation algorithm based on fused variance is proposed, which can effectively improve the standard deviation, entropy, and detection performance. Thirdly, a lightweight YOLOv8n-SimAM model is designed, which introduces a SimAM attention module to suppress complex background interference and enhance attention to the target objects. The network is made lightweight using a detection head based on a multiscale information-fusion convolutional module. In addition, a new loss function, IW-IoU, is proposed to improve the generalisation ability and overall performance. Comparative and ablation experiments are conducted, demonstrating great potential for rapid and accurate detection of microbial targets. Compared to the baseline model, the proposed method improves the detection accuracy by 12.35% and increases the running speed by 37.9 frames per second while markedly reducing the model size.
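The SimAM module adopted above is parameter-free and has a compact published form; a sketch following that formulation:

```python
import torch

def simam(x, e_lambda=1e-4):
    """Parameter-free SimAM attention: a per-neuron energy term derived
    from the squared deviation to the channel mean is turned into a
    sigmoid gate that reweights the feature map. x: (B, C, H, W)."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n
    e_inv = d / (4 * (v + e_lambda)) + 0.5
    return x * torch.sigmoid(e_inv)
```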
Collapse
Affiliation(s)
- Dan Liang
- Ningbo Key Laboratory of Micro-Nano Motion and Intelligent Control, Ningbo University, Ningbo, PR China
- Part Rolling Key Laboratory of Zhejiang Province, Ningbo University, Ningbo, PR China
| | - Yuming Yao
- Ningbo Key Laboratory of Micro-Nano Motion and Intelligent Control, Ningbo University, Ningbo, PR China
| | - Minjie Ye
- Ningbo Key Laboratory of Micro-Nano Motion and Intelligent Control, Ningbo University, Ningbo, PR China
| | - Qinze Luo
- Ningbo Key Laboratory of Micro-Nano Motion and Intelligent Control, Ningbo University, Ningbo, PR China
| | - Jiale Chu
- Ningbo Key Laboratory of Micro-Nano Motion and Intelligent Control, Ningbo University, Ningbo, PR China
| |
Collapse
|
190
|
Zhao X, Du Y, Peng Y. DLPVI: Deep learning framework integrating projection, view-by-view backprojection, and image domains for high- and ultra-sparse-view CBCT reconstruction. Comput Med Imaging Graph 2025; 121:102508. [PMID: 39921927 DOI: 10.1016/j.compmedimag.2025.102508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2024] [Revised: 01/07/2025] [Accepted: 01/30/2025] [Indexed: 02/10/2025]
Abstract
This study proposes a deep learning framework, DLPVI, which integrates the projection, view-by-view backprojection (VVBP), and image domains to improve the quality of high-sparse-view and ultra-sparse-view cone-beam computed tomography (CBCT) images. DLPVI comprises a projection-domain sub-framework, a VVBP-domain sub-framework, and a Transformer-based image-domain model. First, full-view projections were restored from sparse-view projections via the projection-domain sub-framework, then filtered and backprojected view by view to generate VVBP raw data. Next, the VVBP raw data were processed by the VVBP-domain sub-framework to suppress residual noise and artifacts and produce CBCT axial images. Finally, the axial images were further refined by the image-domain model. DLPVI was trained, validated, and tested on CBCT data from 163, 30, and 30 real patients, respectively. Quantitative metrics including root-mean-square error (RMSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and feature similarity (FSIM) were calculated to evaluate performance. DLPVI was compared with 15 state-of-the-art (SOTA) methods (2 projection-domain models, 10 image-domain models, and 3 projection-image dual-domain frameworks) on 1/8 high-sparse-view and 1/16 ultra-sparse-view reconstruction tasks. Statistical analysis used the Kruskal-Wallis test followed by the post-hoc Dunn's test. Experimental results demonstrated that DLPVI outperformed all 15 SOTA methods on both tasks, with statistically significant improvements (p < 0.05, Kruskal-Wallis; p < 0.05/15, Dunn's test). The proposed DLPVI effectively improves the quality of high- and ultra-sparse-view CBCT images.
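The VVBP domain can be illustrated with a 2D parallel-beam analogue of the paper's cone-beam setting: instead of summing filtered backprojections over views, each view is filtered and backprojected separately, then stacked. A sketch using scikit-image follows; the geometry, view count, and phantom are illustrative assumptions.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, resize

# Build a sparse-view sinogram (2D parallel-beam stand-in for CBCT).
img = resize(shepp_logan_phantom(), (128, 128))
theta = np.linspace(0.0, 180.0, 23, endpoint=False)   # sparse view set
sino = radon(img, theta=theta)

# View-by-view backprojection: filter + backproject each view separately
# instead of summing, yielding an (H, W, n_views) tensor.
vvbp = np.stack(
    [iradon(sino[:, [i]], theta=[theta[i]], filter_name="ramp",
            output_size=128) for i in range(len(theta))],
    axis=-1,
)
fbp = vvbp.sum(axis=-1)       # summing over views recovers FBP up to a global scale
print(vvbp.shape, fbp.shape)  # (128, 128, 23) (128, 128)
```

A network operating on this tensor sees the per-view contributions before they are collapsed, which is the extra structure the VVBP-domain sub-framework exploits.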
Affiliation(s)
- Xuzhi Zhao: School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
- Yi Du: Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing, China; Institute of Medical Technology, Peking University Health Science Center, Beijing, China
- Yahui Peng: School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
191
Pan S, Chang CW, Tian Z, Wang T, Axente M, Shelton J, Liu T, Roper J, Yang X. Data-driven volumetric computed tomography image generation from surface structures using a patient-specific deep learning model. Int J Radiat Oncol Biol Phys 2025; 121:1349-1360. PMID: 39577474. DOI: 10.1016/j.ijrobp.2024.11.077.
Abstract
PURPOSE Optical surface imaging provides a radiation-dose-free and noninvasive approach for image guided radiation therapy, allowing continuous monitoring during treatment delivery. However, it falls short in cases where the correlation between body-surface motion and internal tumor motion is complex, limiting purely surface-based surrogates for tumor tracking. Relying solely on surface guided radiation therapy (SGRT) may therefore not ensure accurate intrafractional monitoring. This work aims to develop a data-driven framework that mitigates the limitations of SGRT in lung cancer radiation therapy by reconstructing volumetric computed tomography (CT) images from surface images. METHODS AND MATERIALS We conducted a retrospective analysis of 50 patients with lung cancer who underwent radiation therapy and had 10-phase 4-dimensional CT (4DCT) scans during treatment simulation. For each patient, 9 phases of the 4DCT images were used for patient-specific model training and validation, reserving 1 phase for testing. Our approach employed a surface-to-volume image synthesis framework, harnessing cycle-consistency generative adversarial networks to transform surface images into volumetric representations. The framework was further validated on an additional 6-patient cohort with resimulated 4DCT. RESULTS The proposed technique produced accurate volumetric CT images from the patient's body surface. Compared with the ground-truth CT images, the synthetically generated images exhibited a gross tumor volume center-of-mass difference of 1.72 ± 0.87 mm, an overall mean absolute error of 36.2 ± 7.0 HU, a structural similarity index measure of 0.94 ± 0.02, and a Dice coefficient of 0.81 ± 0.07. Furthermore, the robustness of the framework was found to be linked to respiratory motion. CONCLUSIONS The proposed approach offers a novel way to overcome the limitation of SGRT in lung cancer radiation therapy, potentially enabling real-time volumetric imaging during treatment delivery for accurate tumor tracking without radiation-induced risk. This data-driven framework addresses motion management in radiation therapy without requiring rigid first-principles modeling of organ motion.
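The cycle-consistency idea at the core of the surface-to-volume framework can be sketched as follows. The two toy linear "generators" and the weight `lam` stand in for the paper's 3D networks and are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the two mapping networks (surface -> volume, volume -> surface).
# Real generators would be 3D CNNs; shapes and names here are illustrative.
G_sv = nn.Linear(64, 64)   # surface -> volume
G_vs = nn.Linear(64, 64)   # volume  -> surface
l1 = nn.L1Loss()

def cycle_loss(surface: torch.Tensor, volume: torch.Tensor, lam: float = 10.0) -> torch.Tensor:
    """Cycle-consistency term of a CycleGAN-style objective: each sample,
    mapped to the other domain and back, should return to itself."""
    fwd = l1(G_vs(G_sv(surface)), surface)   # surface -> volume -> surface
    bwd = l1(G_sv(G_vs(volume)), volume)     # volume  -> surface -> volume
    return lam * (fwd + bwd)

loss = cycle_loss(torch.randn(2, 64), torch.randn(2, 64))
loss.backward()
```

In the full objective this term is combined with adversarial losses on each domain; the cycle term is what lets unpaired surface and volume data constrain one another.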
Affiliation(s)
- Shaoyan Pan: Departments of Radiation Oncology and Winship Cancer Institute, Atlanta, Georgia; Department of Biomedical Informatics, Emory University, Atlanta, Georgia
- Chih-Wei Chang: Departments of Radiation Oncology and Winship Cancer Institute, Atlanta, Georgia
- Zhen Tian: Department of Radiation & Cellular Oncology, University of Chicago, Chicago, Illinois
- Tonghe Wang: Department of Medical Physics, Memorial Sloan Kettering Cancer Center
- Marian Axente: Departments of Radiation Oncology and Winship Cancer Institute, Atlanta, Georgia
- Joseph Shelton: Departments of Radiation Oncology and Winship Cancer Institute, Atlanta, Georgia
- Tian Liu: Department of Radiation Oncology, Mount Sinai Medical Center, New York, New York
- Justin Roper: Departments of Radiation Oncology and Winship Cancer Institute, Atlanta, Georgia
- Xiaofeng Yang: Departments of Radiation Oncology and Winship Cancer Institute, Atlanta, Georgia; Department of Biomedical Informatics, Emory University, Atlanta, Georgia
192
Huang J, Tan T, Li X, Ye T, Wu Y. Multiple attention channels aggregated network for multimodal medical image fusion. Med Phys 2025; 52:2356-2374. PMID: 39729625. DOI: 10.1002/mp.17607.
Abstract
BACKGROUND In clinical practice, doctors usually need to synthesize several single-modality medical images for diagnosis, which is time-consuming and costly. Against this background, multimodal medical image fusion (MMIF) techniques have emerged to synthesize medical images of different modalities, providing a comprehensive and objective interpretation of the lesion. PURPOSE Although existing MMIF approaches have shown promising results, they often overlook multiscale feature diversity and attention interaction, both essential for superior visual outcomes; this oversight can diminish fusion performance. To bridge these gaps, we introduce the multiple attention channels aggregated network (MACAN), which emphasizes the integration of multiscale features through structured decomposition and attention interaction. METHODS Our method first decomposes the source images into three distinct groups of multiscale features by stacking different numbers of diverse branch blocks. To extract global and local information separately for each group, we designed convolutional and Transformer attention branches. These two branches make full use of channel and spatial attention mechanisms and achieve attention interaction, enabling the corresponding feature channels to capture local and global information and achieve effective inter-block feature aggregation. RESULTS For MRI-PET fusion, MACAN achieves average improvements of 24.48%, 27.65%, 19.24%, 27.32%, 18.51%, and 10.33% over the compared methods in terms of the Qcb, AG, SSIM, SF, Qabf, and VIF metrics, respectively. For MRI-SPECT fusion, MACAN outperforms the compared methods with average improvements of 29.13%, 26.43%, 18.20%, 27.71%, 16.79%, and 10.38% on the same metrics. Our method also performs well in segmentation experiments: for T2-T1ce fusion it achieves a Dice coefficient of 0.60 and a Hausdorff distance of 15.15, with comparable performance for Flair-T1ce fusion (Dice coefficient 0.60, Hausdorff distance 13.27). CONCLUSION The proposed MACAN effectively retains the complementary information of the source images. Evaluation through medical image fusion and segmentation experiments on public datasets demonstrated its superiority over state-of-the-art methods in both visual quality and objective metrics. Our code is available at https://github.com/JasonWong30/MACAN.
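The channel and spatial attention interaction described above can be illustrated with a CBAM-style cascade. This is a generic sketch of the mechanism, not the paper's exact convolutional/Transformer branches; the reduction ratio and kernel size are assumed.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style cascade of channel and spatial attention, one common way
    to realize channel/spatial 'attention interaction'."""
    def __init__(self, c: int, r: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        w = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3))))
        x = x * w.view(b, c, 1, 1)
        # Spatial attention from channel-wise mean/max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

print(ChannelSpatialAttention(32)(torch.randn(1, 32, 16, 16)).shape)  # (1, 32, 16, 16)
```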
Affiliation(s)
- Jingxue Huang: School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Tianshu Tan: School of Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
- Xiaosong Li: School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China; Guangdong Provincial Key Laboratory of Industrial Intelligent Inspection Technology, Foshan University, Foshan, China; Guangdong-HongKong-Macao Joint Laboratory for Intelligent Micro-Nano Optoelectronic Technology, Foshan University, Foshan, China
- Tao Ye: School of Mechanical Electronic and Information Engineering, China University of Mining and Technology, Beijing, China
- Yanxiong Wu: School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
193
Fan H, Li S, Shao C, Shen Y, Yao XR, Zhao Q. Enhanced Fourier single-pixel imaging via positive-negative dithering. Opt Lett 2025; 50:2247-2250. PMID: 40167692. DOI: 10.1364/ol.551685.
Abstract
Fourier single-pixel imaging (FSI) takes full advantage of the high modulation speed of digital micromirror devices by applying upsampling and spatial dithering to binarize grayscale Fourier patterns, thereby achieving efficient imaging. However, upsampling the patterns sacrifices spatial resolution. Here, we propose a binarization method for FSI that enhances reconstructed image quality without upsampling. The key is to apply spatial dithering along a serpentine path directly to both the positive and negative components of the Fourier patterns before binarization. By quantizing these components into {-1, 0, +1} values and subsequently mapping them to binary patterns, our method reduces quantization errors in Fourier coefficient acquisition. Both simulation and experimental results demonstrate that the method significantly improves imaging quality. It can also be applied to other types of single-pixel imaging that use positive-negative grayscale patterns.
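The positive-negative dithering step can be sketched as serpentine error-diffusion that quantizes a grayscale Fourier pattern in [-1, 1] to {-1, 0, +1}. Floyd-Steinberg weights are assumed here; the paper's exact diffusion kernel may differ.

```python
import numpy as np

def serpentine_dither_pm(pattern: np.ndarray) -> np.ndarray:
    """Error-diffusion dithering along a serpentine path, quantizing a
    grayscale pattern in [-1, 1] to the ternary set {-1, 0, +1}."""
    p = pattern.astype(np.float64).copy()
    h, w = p.shape
    q = np.zeros_like(p)
    for y in range(h):
        xs = range(w) if y % 2 == 0 else range(w - 1, -1, -1)
        sgn = 1 if y % 2 == 0 else -1          # scan direction flips each row
        for x in xs:
            q[y, x] = np.clip(np.round(p[y, x]), -1, 1)   # nearest of {-1, 0, +1}
            err = p[y, x] - q[y, x]
            if 0 <= x + sgn < w:
                p[y, x + sgn] += err * 7 / 16
            if y + 1 < h:
                p[y + 1, x] += err * 5 / 16
                if 0 <= x - sgn < w:
                    p[y + 1, x - sgn] += err * 3 / 16
                if 0 <= x + sgn < w:
                    p[y + 1, x + sgn] += err * 1 / 16
    return q

# A ternary pattern splits into two binary DMD patterns: P+ = (q == 1), P- = (q == -1).
xx, yy = np.meshgrid(np.arange(64), np.arange(64))
fourier = np.cos(2 * np.pi * (3 * xx + 2 * yy) / 64)
q = serpentine_dither_pm(fourier)
print(np.unique(q))   # values drawn from {-1, 0, 1}
```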
194
Xiao Y, Shen Y, Liao S, Yao B, Cai X, Zhang Y, Gao F. Limited-view photoacoustic imaging reconstruction via high-quality self-supervised neural representation. Photoacoustics 2025; 42:100685. PMID: 39931293. PMCID: PMC11808520. DOI: 10.1016/j.pacs.2025.100685.
Abstract
In practical applications within the human body, it is often challenging to fully surround the target tissue or organ with detectors, necessitating limited-view arrays and risking the loss of crucial information. Reconstructing photoacoustic sensor signals acquired in limited-view detection geometries has therefore become a focus of current research. In this study, we introduce a self-supervised network termed HIgh-quality Self-supervised neural representation (HIS), which tackles the inverse problem of photoacoustic imaging to reconstruct high-quality photoacoustic images from sensor data acquired under limited viewpoints. We regard the desired reconstructed photoacoustic image as an implicit continuous function over 2D image space, viewing the pixels of the image as sparse discrete samples. The objective of HIS is to learn this continuous function from limited observations using a fully connected neural network combined with Fourier-feature position encoding. By simply minimizing the error between the network's predicted sensor data and the actual sensor data, HIS is trained to represent the observed continuous model. The results indicate that the proposed HIS model offers superior image reconstruction quality compared with three commonly used photoacoustic image reconstruction methods.
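The core of HIS, a coordinate MLP with Fourier-feature position encoding, can be sketched as below. Layer sizes, the number of features, and the bandwidth `sigma` are assumptions; the photoacoustic forward operator is only indicated in a comment.

```python
import torch
import torch.nn as nn

class FourierFeatureMLP(nn.Module):
    """Implicit image representation: (x, y) -> pixel value, with random
    Fourier-feature encoding of the input coordinates (sketch only)."""
    def __init__(self, n_feat: int = 128, sigma: float = 10.0, hidden: int = 256):
        super().__init__()
        self.register_buffer("B", torch.randn(2, n_feat) * sigma)  # fixed random frequencies
        self.net = nn.Sequential(
            nn.Linear(2 * n_feat, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xy: torch.Tensor) -> torch.Tensor:
        proj = 2 * torch.pi * xy @ self.B
        feat = torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
        return self.net(feat)

# Training would minimize || A(model(grid)) - measured_sensor_data ||,
# where A is the known photoacoustic forward operator (not shown here).
model = FourierFeatureMLP()
coords = torch.rand(1024, 2)   # normalized (x, y) sample locations
print(model(coords).shape)     # torch.Size([1024, 1])
```

Because supervision comes only from the measured sensor data through the forward operator, no ground-truth images are needed, which is what makes the scheme self-supervised.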
Affiliation(s)
- Youshen Xiao: School of Information Science and Technology, ShanghaiTech University, No. 393 HuaXia Middle Road, Pudong New District, 201210, China
- Yuting Shen: School of Information Science and Technology, ShanghaiTech University, No. 393 HuaXia Middle Road, Pudong New District, 201210, China
- Sheng Liao: School of Information Science and Technology, ShanghaiTech University, No. 393 HuaXia Middle Road, Pudong New District, 201210, China
- Bowei Yao: School of Information Science and Technology, ShanghaiTech University, No. 393 HuaXia Middle Road, Pudong New District, 201210, China
- Xiran Cai: School of Information Science and Technology, ShanghaiTech University, No. 393 HuaXia Middle Road, Pudong New District, 201210, China
- Yuyao Zhang: School of Information Science and Technology, ShanghaiTech University, No. 393 HuaXia Middle Road, Pudong New District, 201210, China
- Fei Gao: School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, 230026, China; Hybrid Imaging System Laboratory, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, 215123, China; School of Engineering Science, University of Science and Technology of China, Hefei, Anhui, 230026, China
195
Xiao Y, Yang F, Deng Q, Ming Y, Tang L, Yue S, Li Z, Zhang B, Liang H, Huang J, Sun J. Comparison of conventional diffusion-weighted imaging and multiplexed sensitivity-encoding combined with deep learning-based reconstruction in breast magnetic resonance imaging. Magn Reson Imaging 2025; 117:110316. PMID: 39716684. DOI: 10.1016/j.mri.2024.110316.
Abstract
PURPOSE To evaluate the feasibility of multiplexed sensitivity-encoding (MUSE) with deep learning-based reconstruction (DLR) for breast imaging, in comparison with conventional diffusion-weighted imaging (DWI) and MUSE alone. METHODS This study used conventional single-shot DWI and MUSE data from female participants who underwent breast magnetic resonance imaging (MRI) from June to December 2023. The k-space data in MUSE were reconstructed with both conventional reconstruction and DLR. Two experienced radiologists performed quantitative analyses of the DWI, MUSE, and MUSE-DLR images, measuring the signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) of lesions and normal tissue, and qualitative analyses using a 5-point Likert scale to assess image quality. Inter-reader agreement was assessed with the intraclass correlation coefficient (ICC). Image scores, SNR, CNR, and apparent diffusion coefficient (ADC) measurements among the three sequences were compared using the Friedman test, with significance defined at P < 0.05. RESULTS In evaluations of images from 51 female participants across the three sequences, the two radiologists exhibited good agreement (ICC = 0.540-1.000, P < 0.05). MUSE-DLR showed significantly better SNR than MUSE (P < 0.001), while ADC values within lesions and tissues did not differ significantly among the three sequences (P = 0.924 and P = 0.636, respectively). In the subjective assessments, MUSE and MUSE-DLR scored significantly higher than conventional DWI for overall image quality, geometric distortion, and depiction of axillary lymph nodes (P < 0.001). CONCLUSION Compared with conventional DWI, MUSE-DLR yielded improved image quality at only a slightly longer acquisition time.
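The ROI-based quantitative analysis can be reproduced in a few lines of NumPy. The SNR/CNR definitions below are one common convention and are assumed here, since the abstract does not state the exact formulas.

```python
import numpy as np

def snr_cnr(img: np.ndarray, lesion: np.ndarray, tissue: np.ndarray, bg: np.ndarray):
    """ROI-based SNR/CNR (one common convention; definitions vary by paper):
    SNR = mean(lesion) / std(background),
    CNR = |mean(lesion) - mean(tissue)| / std(background).
    `lesion`, `tissue`, and `bg` are boolean ROI masks on `img`."""
    noise = img[bg].std()
    snr = img[lesion].mean() / noise
    cnr = abs(img[lesion].mean() - img[tissue].mean()) / noise
    return snr, cnr

# Toy example with three rectangular ROIs on a synthetic image.
img = np.random.rand(128, 128) + 1.0
m = np.zeros((128, 128), dtype=bool)
lesion, tissue, bg = m.copy(), m.copy(), m.copy()
lesion[40:50, 40:50] = tissue[80:90, 80:90] = bg[5:15, 5:15] = True
print(snr_cnr(img, lesion, tissue, bg))
```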
Affiliation(s)
- Yitian Xiao: Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Fan Yang: Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Qiao Deng: Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Yue Ming: West China School of Medicine, West China Hospital, Sichuan University, Chengdu, China
- Lu Tang: Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Shuting Yue: West China School of Medicine, West China Hospital, Sichuan University, Chengdu, China
- Zheng Li: West China School of Medicine, West China Hospital, Sichuan University, Chengdu, China
- Bo Zhang: GE HealthCare MR Research, Beijing, China
- Juan Huang: Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Jiayu Sun: Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
196
Xu H, Wang J, Feng Q, Zhang Y, Ning Z. Domain-specific information preservation for Alzheimer's disease diagnosis with incomplete multi-modality neuroimages. Med Image Anal 2025; 101:103448. PMID: 39798527. DOI: 10.1016/j.media.2024.103448.
Abstract
Although multi-modality neuroimages have advanced the early diagnosis of Alzheimer's disease (AD), the missing-modality issue still poses a unique challenge in clinical practice. Recent studies have tried to impute the missing data so as to utilize all available subjects for training robust multi-modality models. However, these studies may overlook the modality-specific information inherent in multi-modality data: different modalities possess distinct imaging characteristics and focus on different aspects of the disease. In this paper, we propose a domain-specific information preservation (DSIP) framework, consisting of a modality imputation stage and a status identification stage, for AD diagnosis with incomplete multi-modality neuroimages. In the first stage, a specificity-induced generative adversarial network (SIGAN) is developed to bridge the modality gap and capture modality-specific details for imputing high-quality neuroimages. In the second stage, a specificity-promoted diagnosis network (SPDN) is designed to promote inter-modality feature interaction and classifier robustness for accurate disease status identification. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art methods in both modality imputation and status identification tasks.
Affiliation(s)
- Haozhe Xu: School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China; Department of Radiotherapy, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Jian Wang: Department of Radiation Oncology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Qianjin Feng: School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Yu Zhang: School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Zhenyuan Ning: School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
197
Jiang M, Wang S, Chan KH, Sun Y, Xu Y, Zhang Z, Gao Q, Gao Z, Tong T, Chang HC, Tan T. Multimodal Cross Global Learnable Attention Network for MR images denoising with arbitrary modal missing. Comput Med Imaging Graph 2025; 121:102497. PMID: 39904265. DOI: 10.1016/j.compmedimag.2025.102497.
Abstract
Magnetic resonance imaging (MRI) generates medical images of multiple sequences, i.e., multiple modalities, with different contrasts. However, noise reduces the quality of MR images and can in turn affect the diagnosis of disease. Existing filtering, transform-domain, statistical, and convolutional neural network (CNN) methods mainly aim to denoise individual sequences without considering the relationships between different sequences. They cannot balance the extraction of high-dimensional and low-dimensional features in MR images, and they struggle to balance the preservation of image texture detail against denoising strength. To overcome these challenges, this work proposes a controllable Multimodal Cross-Global Learnable Attention Network (MMCGLANet) for MR image denoising with arbitrary missing modalities. Specifically, a shared-weight encoder extracts shallow image features, and convolutional long short-term memory (ConvLSTM) extracts the associated features between different frames within the same modality. A Cross-Global Learnable Attention Network (CGLANet) extracts and fuses image features both across modalities and within each modality. In addition, a sequence code labels missing modalities, which allows arbitrary missing modalities during model training, validation, and testing. Experimental results demonstrate that our method achieves good denoising results on several public and real MR image datasets.
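The sequence-code idea for arbitrary missing modalities can be sketched as a binary availability code plus zero-filled placeholders. The modality names and the zero-filling choice are assumptions, as the abstract does not detail the exact encoding.

```python
import torch

def code_and_fill(frames: dict, modalities=("T1", "T2", "FLAIR")):
    """Label available/missing modalities with a binary sequence code and
    zero-fill the absent ones (a minimal sketch of the labeling idea)."""
    shape = next(iter(frames.values())).shape
    code = torch.tensor([float(m in frames) for m in modalities])       # 1 = present
    stack = torch.stack([frames.get(m, torch.zeros(shape)) for m in modalities])
    return stack, code

frames = {"T1": torch.randn(1, 64, 64), "FLAIR": torch.randn(1, 64, 64)}  # T2 missing
x, code = code_and_fill(frames)
print(x.shape, code)   # torch.Size([3, 1, 64, 64]) tensor([1., 0., 1.])
```

Conditioning the network on `code` lets one model handle any subset of modalities at train and test time, rather than training a separate model per missing-modality pattern.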
Affiliation(s)
- Mingfu Jiang: Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao, 999078, Macao Special Administrative Region of China; College of Information Engineering, Xinyang Agriculture and Forestry University, No. 1 North Ring Road, Pingqiao District, Xinyang, 464000, Henan, China
- Shuai Wang: School of Cyberspace, Hangzhou Dianzi University, No. 65 Wen Yi Road, Hangzhou, 310018, Zhejiang, China
- Ka-Hou Chan: Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao, 999078, Macao Special Administrative Region of China
- Yue Sun: Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao, 999078, Macao Special Administrative Region of China
- Yi Xu: Shanghai Key Lab of Digital Media Processing and Transmission, Shanghai Jiao Tong University; MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Minhang District, Shanghai, 200030, China
- Zhuoneng Zhang: Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao, 999078, Macao Special Administrative Region of China
- Qinquan Gao: College of Physics and Information Engineering, Fuzhou University, No. 2 Wulongjiang Avenue, Fuzhou, 350108, Fujian, China
- Zhifan Gao: School of Biomedical Engineering, Sun Yat-sen University, No. 66 Gongchang Road, Guangming District, Shenzhen, 518107, Guangdong, China
- Tong Tong: College of Physics and Information Engineering, Fuzhou University, No. 2 Wulongjiang Avenue, Fuzhou, 350108, Fujian, China
- Hing-Chiu Chang: Department of Biomedical Engineering, Chinese University of Hong Kong, Sha Tin District, 999077, Hong Kong, China
- Tao Tan: Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao, 999078, Macao Special Administrative Region of China
198
Dou H, Huang Y, Huang Y, Yang X, Zhen C, Zhang Y, Xiong Y, Huang W, Ni D. Standard plane localization using denoising diffusion model with multi-scale guidance. Comput Methods Programs Biomed 2025; 261:108619. PMID: 39919604. DOI: 10.1016/j.cmpb.2025.108619.
Abstract
BACKGROUND AND OBJECTIVE Standard plane (SP) acquisition is a fundamental yet crucial step in routine ultrasound (US) examinations. Compared with 2D US, 3D US offers the advantage of capturing multiple SPs in a single scan and of visualizing particular SPs (e.g., the coronal plane of the uterus). However, SP localization in 3D US is challenging due to the vast 3D search space, anatomical variability, and poor image quality. METHODS In this study, we present a probabilistic method based on a conditional denoising diffusion model for SP localization in 3D US. Specifically, we construct multi-scale guidance to provide the model with both global and local context. We improve the model's angular sensitivity by reformulating the tangent-based plane representation in spherical coordinates. We also show the potential to simultaneously localize SPs and detect their abnormality without introducing extra parameters. RESULTS Extensive validation was performed on a large in-house dataset containing 837 patients across two organs with four SPs. The proposed method achieved average errors below 10° in angle and 1 mm in distance on the four investigated SPs. Furthermore, it obtained over 90% accuracy in detecting anomalies by simply thresholding the quantified uncertainty. CONCLUSIONS The results show that our method significantly outperformed current state-of-the-art approaches on spatial and content metrics across four SPs in two organs, indicating its superiority and generalizability. The investigated anomaly detection also demonstrates the method's potential for clinical practice.
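The spherical-coordinate plane representation and the reported angle/distance errors can be made concrete with a short NumPy sketch. The parameterization convention below is an assumption, not necessarily the paper's exact one.

```python
import numpy as np

def plane_from_spherical(theta: float, phi: float, d: float):
    """Plane parameterized by a unit normal in spherical coordinates
    (polar angle theta, azimuth phi) plus a signed distance d to the
    volume center, so the plane satisfies n . x = d."""
    n = np.array([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])
    return n, d

def angle_and_distance_error(p_est, p_gt):
    """The two evaluation metrics quoted above: angle between the two
    normals (degrees) and absolute difference of plane distances."""
    (n1, d1), (n2, d2) = p_est, p_gt
    ang = np.degrees(np.arccos(np.clip(abs(n1 @ n2), -1.0, 1.0)))
    return ang, abs(d1 - d2)

est = plane_from_spherical(0.52, 1.04, 3.0)   # predicted plane
gt = plane_from_spherical(0.50, 1.00, 2.5)    # ground-truth plane
print(angle_and_distance_error(est, gt))
```

Predicting angles rather than a raw normal vector keeps the normal on the unit sphere by construction, which is one plausible reason such a parameterization improves angular sensitivity.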
Affiliation(s)
- Haoran Dou: School of Computer Science, University of Leeds, Leeds, UK; Department of Computer Science, University of Manchester, Manchester, UK
- Yuhao Huang: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China; Medical Ultrasound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Yunzhi Huang: Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
- Xin Yang: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China; Medical Ultrasound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Chaojiong Zhen: Department of Ultrasound, The First People's Hospital of Foshan, Foshan, China
- Yuanji Zhang: Department of Computer Science, University of Manchester, Manchester, UK; National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China; Medical Ultrasound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Shenzhen RayShape Medical Technology Co., Ltd, Shenzhen, China
- Yi Xiong: Department of Ultrasound, Shenzhen Luohu People's Hospital, Shenzhen, China
- Weijun Huang: Department of Ultrasound, The First People's Hospital of Foshan, Foshan, China
- Dong Ni: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China; Medical Ultrasound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China; School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
199
Shin Y, Son G, Hwang D, Eo T. Ensemble and low-frequency mixing with diffusion models for accelerated MRI reconstruction. Med Image Anal 2025; 101:103477. PMID: 39913965. DOI: 10.1016/j.media.2025.103477.
Abstract
Magnetic resonance imaging (MRI) is an important imaging modality in medical diagnosis, providing comprehensive anatomical information with detailed tissue structures. However, the long scan time required to acquire high-quality MR images is a major challenge, especially in urgent clinical scenarios. Although diffusion models have achieved remarkable performance in accelerated MRI, several challenges remain. In particular, they suffer from long inference times due to the many iterations in the reverse diffusion process, and they occasionally create artifacts or 'hallucinate' tissue that does not exist in the original anatomy. To address these problems, we propose ensemble and adaptive low-frequency mixing on the diffusion model, namely ELF-Diff, for accelerated MRI. The proposed method consists of three key components in the reverse diffusion step: (1) optimization based on unified data consistency; (2) low-frequency mixing; and (3) aggregation of multiple perturbations of the predicted images for the ensemble at each step. We evaluate ELF-Diff on two MRI datasets, FastMRI and SKM-TEA. ELF-Diff surpasses existing diffusion models for MRI reconstruction. Furthermore, extensive experiments, including a pathology detection subtask, demonstrate the superior anatomical precision of our method. ELF-Diff outperforms existing state-of-the-art MRI reconstruction methods without being limited to specific undersampling patterns.
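The low-frequency mixing component can be sketched as replacing the central k-space of each predicted image with the measured low frequencies wherever they were sampled, keeping the model's high-frequency content. The mixing radius and single-coil FFT convention below are assumptions for illustration.

```python
import torch

def low_freq_mix(pred_img: torch.Tensor, meas_k: torch.Tensor,
                 mask: torch.Tensor, radius: int = 16) -> torch.Tensor:
    """Swap in measured low-frequency k-space at sampled locations.
    `meas_k` and `mask` are assumed fftshifted (DC at the center)."""
    k = torch.fft.fftshift(torch.fft.fft2(pred_img), dim=(-2, -1))
    h, w = k.shape[-2:]
    cy, cx = h // 2, w // 2
    lf = torch.zeros_like(mask)
    lf[..., cy - radius:cy + radius, cx - radius:cx + radius] = 1.0
    keep = lf * mask                       # sampled low-frequency locations
    k = k * (1 - keep) + meas_k * keep
    return torch.fft.ifft2(torch.fft.ifftshift(k, dim=(-2, -1))).real

pred = torch.randn(1, 256, 256)                                              # network prediction
meas_k = torch.fft.fftshift(torch.fft.fft2(torch.randn(1, 256, 256)), dim=(-2, -1))
mask = (torch.rand(1, 256, 256) < 0.25).float()                              # undersampling pattern
print(low_freq_mix(pred, meas_k, mask).shape)                                # torch.Size([1, 256, 256])
```

Anchoring the low frequencies to measured data is a simple guard against hallucinated coarse anatomy, since the overall contrast and structure of an MR image live mostly in the k-space center.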
Affiliation(s)
- Yejee Shin: School of Electrical and Electronic Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea
- Geonhui Son: School of Electrical and Electronic Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea
- Dosik Hwang: School of Electrical and Electronic Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea; Department of Radiology, College of Dentistry, Yonsei University, Seoul 03722, Republic of Korea; Department of Oral and Maxillofacial Radiology, College of Dentistry, Yonsei University, Seoul 03722, Republic of Korea; Artificial Intelligence and Robotics Institute, Korea Institute of Science and Technology, 5 Hwarang-ro 14-gil, Seongbuk-gu, Seoul 02792, Republic of Korea
- Taejoon Eo: School of Electrical and Electronic Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea; Probe Medical, Seoul 03777, Republic of Korea
200
Zhu M, Wang Z, Wang C, Zeng C, Zeng D, Ma J, Wang Y. VBVT-Net: VOI-based VVBP-Tensor network for high-attenuation artifact suppression in digital breast tomosynthesis imaging. IEEE Trans Med Imaging 2025; 44:1953-1968. PMID: 40030817. DOI: 10.1109/TMI.2024.3522242.
Abstract
High-attenuation (HA) artifacts may obscure subtle lesions and lead to lesion over-estimation in digital breast tomosynthesis (DBT) imaging, so high-attenuation artifact suppression (HAAS) is vital for widespread clinical application of DBT. Conventional HAAS methods usually rely on accurate segmentation of HA objects and manual weighting schemes, without considering the geometry information in DBT reconstruction, and a global weighting strategy designed for HA artifacts can decrease resolution in low-contrast soft-tissue regions. Meanwhile, the view-by-view backprojection tensor (VVBP-Tensor) domain has recently been developed as an intermediary domain that preserves the lossless information of the projection domain together with the structural details of the image domain. We therefore propose a VOI-based VVBP-Tensor network (VBVT-Net) for the HAAS task in DBT imaging, which learns a local implicit weighting strategy based on the analytical FDK reconstruction mechanism. Specifically, VBVT-Net incorporates a volume-of-interest (VOI) recognition sub-network and a HAAS sub-network. The VOI recognition sub-network automatically extracts all 4D VVBP-Tensor patches containing HA artifacts; the HAAS sub-network then reduces HA artifacts in these patches by leveraging ray-trace backprojection features and extra neighborhood information. Results on four datasets demonstrate that VBVT-Net accurately detects HA regions, effectively reduces HA artifacts, and simultaneously preserves structures in soft-tissue background regions. VBVT-Net is interpretable as a general variant of the weighted FDK algorithm and is a potential component of next-generation DBT prototype systems.
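The "weighted FDK" reading of VBVT-Net can be made concrete in a few lines: an ordinary FDK-style reconstruction sums the VVBP tensor uniformly over views, while the network in effect learns locally varying per-view weights. The shapes and random weights below are placeholders, not the paper's learned values.

```python
import numpy as np

def weighted_sum_over_views(vvbp: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Sum a VVBP tensor over its view axis with per-voxel, per-view
    weights; uniform weights reproduce the plain FDK-style summation."""
    assert vvbp.shape == weights.shape    # both (H, W, n_views)
    return (vvbp * weights).sum(axis=-1)

vvbp = np.random.rand(64, 64, 25)                                  # stand-in VVBP tensor
uniform = weighted_sum_over_views(vvbp, np.ones_like(vvbp))        # plain summation
local = weighted_sum_over_views(vvbp, np.random.rand(*vvbp.shape)) # locally weighted variant
print(uniform.shape, local.shape)   # (64, 64) (64, 64)
```

Restricting the learned weighting to VOI patches that actually contain HA artifacts is what lets the method suppress artifacts locally without degrading resolution in soft-tissue regions elsewhere.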