301
Zhu C, Lu G, He B, Xie R, Song L. Implicit-Explicit Integrated Representations for Multi-View Video Compression. IEEE Trans Image Process 2025; 34:1106-1118. [PMID: 40031726] [DOI: 10.1109/tip.2025.3536201]
Abstract
With the increasing consumption of 3D displays and virtual reality, multi-view video has become a promising format. However, its high resolution and multi-camera shooting result in a substantial increase in data volume, making storage and transmission a challenging task. To tackle these difficulties, we propose an implicit-explicit integrated representation for multi-view video compression. Specifically, we first use the explicit representation-based 2D video codec to encode one of the source views. Subsequently, we propose employing the implicit neural representation (INR)-based codec to encode the remaining views. The implicit codec takes the time and view index of multi-view video as coordinate input and generates the corresponding implicit reconstruction frames. To enhance the compressibility, we introduce a multi-level feature grid embedding and a fully convolutional architecture into the implicit codec. These components facilitate coordinate-feature and feature-RGB mapping, respectively. To further enhance the reconstruction quality from the INR codec, we leverage the high-quality reconstructed frames from the explicit codec to achieve inter-view compensation. Finally, the compensated results are fused with the implicit reconstructions from the INR to obtain the final reconstructed frames. Our proposed framework combines the strengths of both implicit neural representation and explicit 2D codec. Extensive experiments conducted on public datasets demonstrate that the proposed framework can achieve comparable or even superior performance to the latest multi-view video compression standard MIV and other INR-based schemes in terms of view compression and scene modeling. The source code can be found at https://github.com/zc-lynen/MV-IERV.
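To make the INR idea above concrete, here is a minimal PyTorch sketch of a coordinate-conditioned network that maps a normalized (time, view) pair to a reconstructed frame. The architecture, dimensions, and names are illustrative assumptions, not the released MV-IERV code (which additionally uses multi-level feature grids and a fully convolutional decoder).

```python
import torch
import torch.nn as nn

class CoordinateINR(nn.Module):
    """Toy INR: maps a normalized (time, view) coordinate to an RGB frame."""
    def __init__(self, hidden=64, frame_hw=(32, 32)):
        super().__init__()
        self.h, self.w = frame_hw
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, 3 * self.h * self.w),
        )

    def forward(self, t, v):
        coords = torch.stack([t, v], dim=-1)   # (N, 2) coordinate input
        return self.mlp(coords).view(-1, 3, self.h, self.w).sigmoid()

model = CoordinateINR()
frames = model(torch.tensor([0.0, 0.5]), torch.tensor([0.0, 1.0]))
print(frames.shape)  # torch.Size([2, 3, 32, 32])
```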
302
Yin M, Yang J. ILR-Net: Low-light image enhancement network based on the combination of iterative learning mechanism and Retinex theory. PLoS One 2025; 20:e0314541. [PMID: 39946342] [PMCID: PMC11825054] [DOI: 10.1371/journal.pone.0314541]
Abstract
Images captured at night or in low-light environments are often degraded by external factors such as noise and lighting. Existing image enhancement algorithms tend to focus overly on increasing brightness while neglecting color and detail features. This paper proposes a low-light image enhancement network based on a combination of an iterative learning mechanism and Retinex theory (ILR-Net) to enhance detail and color features simultaneously. Specifically, the network continuously learns local and global features of low-light images across different dimensions and receptive fields to achieve a clear and convergent illumination estimation. Meanwhile, denoising is applied to the reflectance component after Retinex decomposition to enhance the image's rich color features. Finally, the enhanced image is obtained by concatenating the features along the channel dimension. In the adaptive learning sub-network, a dilated convolution module, a U-Net feature extraction module, and an adaptive iterative learning module are designed. These modules respectively expand the network's receptive field to capture multi-dimensional features, extract the overall and edge details of the image, and adaptively enhance features at different stages of convergence. The Retinex decomposition sub-network focuses on denoising the reflectance component before and after decomposition to obtain a low-noise, clear reflectance component. Additionally, an efficient feature extraction module, global feature attention, is designed to address the problem of feature loss. Experiments were conducted on six common datasets and in real-world environments. The proposed method achieved PSNR and SSIM values of 23.7624 dB and 0.8653 on the LOL dataset, and 26.8252 dB and 0.7784 on the LOLv2-Real dataset, demonstrating clear advantages over other algorithms.
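For readers unfamiliar with the Retinex decomposition used above, a classical single-scale variant can be sketched in a few lines of NumPy/SciPy: illumination is approximated by a Gaussian-blurred copy of the image, and reflectance is recovered in the log domain. ILR-Net learns this decomposition rather than computing it in closed form; the snippet is only a conceptual baseline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(image, sigma=30.0, eps=1e-6):
    """Estimate illumination by Gaussian smoothing; reflectance in log domain."""
    image = image.astype(np.float64) + eps
    illumination = gaussian_filter(image, sigma=sigma) + eps
    log_reflectance = np.log(image) - np.log(illumination)
    return log_reflectance, illumination

low_light = np.random.rand(64, 64)  # stand-in for a low-light frame
reflectance, illumination = single_scale_retinex(low_light)
```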
Affiliation(s)
- Mohan Yin
- School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China
- Jianbai Yang
- School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China
303
Bercea CI, Wiestler B, Rueckert D, Schnabel JA. Evaluating normative representation learning in generative AI for robust anomaly detection in brain imaging. Nat Commun 2025; 16:1624. [PMID: 39948337] [PMCID: PMC11825664] [DOI: 10.1038/s41467-025-56321-y]
Abstract
Normative representation learning focuses on understanding the typical anatomical distributions from large datasets of medical scans of healthy individuals. Generative Artificial Intelligence (AI) leverages this property to synthesize images that accurately reflect these normative patterns. This capability allows such models to detect and correct anomalies in new, unseen pathological data without the need for expert labeling. Traditional evaluations often measure anomaly detection performance alone, overlooking the crucial role of normative learning. In our analysis, we introduce novel metrics specifically designed to evaluate this facet in AI models. We apply these metrics across various generative AI frameworks, including advanced diffusion models, and rigorously test them against complex and diverse brain pathologies. In addition, we conduct a large multi-reader study to compare these metrics to experts' evaluations. Our analysis demonstrates that models proficient in normative learning exhibit exceptional versatility, adeptly detecting a wide range of unseen medical conditions. Our code is available at https://github.com/compai-lab/2024-ncomms-bercea.git.
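The core mechanism, comparing an input scan against its model-generated "pseudo-healthy" reconstruction, can be sketched as a simple residual map; the arrays and threshold below are illustrative stand-ins, not the authors' proposed metrics.

```python
import numpy as np

def anomaly_map(scan, pseudo_healthy):
    """Residual between the scan and its normative (healthy-looking) reconstruction."""
    return np.abs(scan.astype(np.float32) - pseudo_healthy.astype(np.float32))

rng = np.random.default_rng(0)
scan = rng.random((128, 128), dtype=np.float32)                 # unseen pathological scan
recon = np.clip(scan + 0.05 * rng.standard_normal((128, 128)), 0, 1)  # model output stand-in
amap = anomaly_map(scan, recon.astype(np.float32))
candidates = amap > 0.1   # illustrative threshold on the residual
```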
Affiliation(s)
- Cosmin I Bercea
- Chair of Computational Imaging and AI in Medicine, Technical University of Munich (TUM), Munich, Germany
- Helmholtz AI and Helmholtz Center Munich, Munich, Germany
- Benedikt Wiestler
- Chair of AI for Image-Guided Diagnosis and Therapy, TUM School of Medicine and Health, Munich, Germany
- Munich Center for Machine Learning (MCML), Munich, Germany
- Daniel Rueckert
- Munich Center for Machine Learning (MCML), Munich, Germany
- Chair of AI in Healthcare and Medicine, Technical University of Munich (TUM) and TUM University Hospital, Munich, Germany
- Department of Computing, Imperial College London, London, UK
- Julia A Schnabel
- Chair of Computational Imaging and AI in Medicine, Technical University of Munich (TUM), Munich, Germany
- Helmholtz AI and Helmholtz Center Munich, Munich, Germany
- Munich Center for Machine Learning (MCML), Munich, Germany
- School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
304
Fu L, Chen Z, Duan Y, Cheng Z, Chen L, Yang Y, Zheng H, Liang D, Pang ZF, Hu Z. High-temporal-resolution dynamic PET imaging based on a kinetic-induced voxel filter. Phys Med Biol 2025; 70:045024. [PMID: 39943839] [DOI: 10.1088/1361-6560/adae4e]
Abstract
Objective. Dynamic positron emission tomography (dPET) is an important molecular imaging technology used for the clinical diagnosis, staging, and treatment of various human cancers. Higher temporal imaging resolutions are desired for the early stages of radioactive tracer metabolism. However, images reconstructed from raw data with shorter frame durations have lower image signal-to-noise ratios (SNRs) and unexpected spatial resolutions. Approach. To address these issues, this paper proposes a kinetic-induced voxel filtering technique for processing noisy and distorted dPET images. This method extracts the inherent motion information contained in the target PET image and effectively uses it to construct an image filter for each PET image frame. To ensure that the filtered image remains undistorted, we integrate and reorganize the information from each frame along the temporal dimension. In addition, our method applies repeated filtering operations to the image to produce optimal denoising results. Main results. The effectiveness of the proposed method is validated on both simulated and clinical dPET data, with quantitative evaluations of dynamic images and pharmacokinetic parameter maps calculated via the peak SNR and mean structural similarity index measure. Compared with state-of-the-art methods, our method achieves superior results in both qualitative and quantitative imaging scenarios. Significance. The method exhibits commendable performance and high interpretability and is demonstrated to be both effective and feasible for high-temporal-resolution dynamic PET imaging tasks.
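The quantitative evaluation mentioned here (peak SNR and mean structural similarity) is standard; a minimal scikit-image sketch with placeholder arrays standing in for reference and denoised frames:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

reference = np.random.rand(96, 96)                       # ground-truth frame
denoised = reference + 0.01 * np.random.randn(96, 96)    # denoiser output stand-in

psnr = peak_signal_noise_ratio(reference, denoised, data_range=1.0)
ssim = structural_similarity(reference, denoised, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```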
Affiliation(s)
- Liwen Fu
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, People's Republic of China
- College of Mathematics and Statistics, Henan University, Kaifeng 475004, People's Republic of China
- Zixiang Chen
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, People's Republic of China
- Yanhua Duan
- Department of PET/CT, The First Affiliated Hospital of Shandong First Medical University, Jinan 250014, Shandong, People's Republic of China
- Zhaoping Cheng
- Department of PET/CT, The First Affiliated Hospital of Shandong First Medical University, Jinan 250014, Shandong, People's Republic of China
- Lingxin Chen
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, People's Republic of China
- Yongfeng Yang
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, People's Republic of China
- Hairong Zheng
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, People's Republic of China
- Dong Liang
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, People's Republic of China
- Zhi-Feng Pang
- College of Mathematics and Statistics, Henan University, Kaifeng 475004, People's Republic of China
- Zhanli Hu
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, People's Republic of China
305
Kim YJ, Hwang SH, Kim KG, Nam DH. Automated Imaging of Cataract Surgery Using Artificial Intelligence. Diagnostics (Basel) 2025; 15:445. [PMID: 40002596] [PMCID: PMC11854092] [DOI: 10.3390/diagnostics15040445]
Abstract
Objectives: This study proposes a state-of-the-art technique for estimating a set of parameters to automatically display an optimized image on screen during cataract surgery. Methods: We constructed a two-stage architecture to estimate the parameters for realizing the optimized image. The Pix2Pix approach was first introduced to generate fake images that mimic the optimal image. This part can be considered a preliminary step; it uses training datasets comprising an original microscopy image as the input data and an image optimally tuned by ophthalmologists as the label data. The second part of the architecture was inspired by ensemble learning, in which two ResNet-50 models were trained in parallel using the fake images obtained in the previous step and the unprocessed images. Each set of features extracted by this ensemble-like scheme was exploited for regression of the optimal parameters. Results: The fidelity of our method was confirmed through relevant quantitative assessments (NMSE 121.052 ± 181.227, PSNR 29.887 ± 4.682, SSIM 0.965 ± 0.047). Conclusions: Surgeons confirmed that the objects to be highlighted on the screen during cataract surgery were faithfully visualized using the automatically estimated parameters.
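A hedged sketch of the second, ensemble-like stage as described, two ResNet-50 backbones processing the raw and Pix2Pix-generated frames in parallel, with their pooled features concatenated for parameter regression, might look as follows. The head size and number of output parameters are assumptions, not the authors' trained model.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class TwoStreamRegressor(nn.Module):
    def __init__(self, n_params=4):          # number of display parameters (assumed)
        super().__init__()
        self.raw_net = resnet50(weights=None)
        self.fake_net = resnet50(weights=None)
        self.raw_net.fc = nn.Identity()       # expose 2048-d pooled features
        self.fake_net.fc = nn.Identity()
        self.head = nn.Linear(2 * 2048, n_params)

    def forward(self, raw, fake):
        feats = torch.cat([self.raw_net(raw), self.fake_net(fake)], dim=1)
        return self.head(feats)

model = TwoStreamRegressor()
params = model(torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224))
```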
Affiliation(s)
- Young Jae Kim
- Gachon Biomedical & Convergence Institute, Gil Medical Center, Gachon University, Incheon 21565, Republic of Korea
- Sung Ha Hwang
- Department of Ophthalmology, Gil Medical Center, College of Medicine, Gachon University, Incheon 21565, Republic of Korea
- Kwang Gi Kim
- Department of Biomedical Engineering, Gil Medical Center, College of Medicine, Gachon University, Incheon 21565, Republic of Korea
- Dong Heun Nam
- Department of Ophthalmology, Gil Medical Center, College of Medicine, Gachon University, Incheon 21565, Republic of Korea
306
Jian BL, Chang HL, Chen CL. Enhanced U-Net with Multi-Module Integration for High-Exposure-Difference Image Restoration. Sensors (Basel) 2025; 25:1105. [PMID: 40006334] [PMCID: PMC11858959] [DOI: 10.3390/s25041105]
Abstract
Machine vision systems have become key sensing systems for unmanned aerial vehicles (UAVs). However, under different weather conditions, the lighting direction and the choice of exposure parameters often lead to insufficient or missing object features in images, which can cause various tasks to fail. As a result, images need to be restored to recover information that would otherwise be inaccessible in high-exposure-difference environments. Many applications require real-time, high-quality images; therefore, restoring images efficiently is also important for subsequent tasks. This study adopts supervised learning to address images captured under lighting discrepancies, using a U-Net as the main network architecture and adding suitable modules to its encoder and decoder, such as inception-like blocks, dual attention units, selective kernel feature fusion, and denoising blocks. In addition to an ablation study, we compared the quality of image light restoration with other network models on the BAID dataset and considered the overall number of trainable parameters to construct a lightweight, high-exposure-difference image restoration model. The performance of the proposed network was demonstrated by enhancing image detection and recognition.
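Of the listed modules, selective kernel feature fusion is perhaps the least self-explanatory. A compact PyTorch sketch in the spirit of selective-kernel attention is shown below: parallel branch features are fused by channel-attention weights derived from their sum. Channel sizes are illustrative, and this is not the paper's exact block.

```python
import torch
import torch.nn as nn

class SKFF(nn.Module):
    """Selective-kernel-style fusion of parallel branch features."""
    def __init__(self, channels, n_branches=2, reduction=8):
        super().__init__()
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
        )
        self.expand = nn.ModuleList(
            [nn.Conv2d(channels // reduction, channels, 1) for _ in range(n_branches)]
        )

    def forward(self, branches):
        summed = torch.stack(branches, dim=0).sum(dim=0)   # fuse branches additively
        z = self.squeeze(summed)                           # global channel descriptor
        weights = torch.softmax(torch.stack([e(z) for e in self.expand], 0), dim=0)
        return (torch.stack(branches, 0) * weights).sum(dim=0)

fuse = SKFF(channels=32)
out = fuse([torch.rand(1, 32, 64, 64), torch.rand(1, 32, 64, 64)])
```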
Affiliation(s)
- Bo-Lin Jian
- Department of Electrical Engineering, Chin-Yi University of Technology, Taichung 411030, Taiwan
- Hong-Li Chang
- Department of Aeronautics and Astronautics, National Cheng Kung University, Tainan 701401, Taiwan
- Chieh-Li Chen
- Department of Aeronautics and Astronautics, National Cheng Kung University, Tainan 701401, Taiwan
307
Tompkins CG, Todhunter LD, Gottmann H, Rettig C, Schmitt R, Wacker J, Piano S. Three-dimensional runout characterisation for rotationally symmetric components. Commun Eng 2025; 4:19. [PMID: 39939544] [PMCID: PMC11821993] [DOI: 10.1038/s44172-025-00354-0]
Abstract
Rotationally symmetric components (such as gears and axles) are ubiquitous in modern devices, and their precision manufacture is necessary to keep costs and manufacturing time down, as well as to reduce waste and possibly hazardous component failure. The manufacturing errors that affect the shape about the rotation axis are grouped together under the common term "runout". Here we present a potential updated standard for characterising the runout of rotationally symmetric machined parts in three dimensions, evaluated using virtual instrumentation, enabling an accurate characterisation of the three-dimensional (3D) surface deformation of a part from minimal surface information. For any 3D characterisation method to be widely adopted by the science, technology, engineering, and mathematics community, it must be fully compatible with previous methods and standards. As such, the proposed method produces a 3D runout vector based on four standard profile measurements. To evaluate the efficacy of the proposed runout method, a technique for evaluating the errors of commonly used virtual instruments has been developed. This evaluation technique produces a single-valued quantification of the deviation of the instrument outputs from the input parameters, decoupled from the errors of the instrument itself.
Affiliation(s)
- Christopher G Tompkins
- Manufacturing Metrology Team, Faculty of Engineering, University of Nottingham, Nottingham, NG8 1BB, UK
- Luke D Todhunter
- Manufacturing Metrology Team, Faculty of Engineering, University of Nottingham, Nottingham, NG8 1BB, UK
- Samanta Piano
- Manufacturing Metrology Team, Faculty of Engineering, University of Nottingham, Nottingham, NG8 1BB, UK
308
Fiszer J, Ciupek D, Malawski M, Pieciak T. Validation of ten federated learning strategies for multi-contrast image-to-image MRI data synthesis from heterogeneous sources. bioRxiv [Preprint] 2025:2025.02.09.637305. [PMID: 39990397] [PMCID: PMC11844418] [DOI: 10.1101/2025.02.09.637305]
Abstract
Deep learning (DL)-based image synthesis has recently gained enormous interest in medical imaging, allowing for the generation of multi-contrast data and, therefore, the recovery of missing samples from interrupted or artefact-distorted acquisitions. However, the accuracy of DL models relies heavily on the representativeness of the training datasets, which are naturally characterized by their distributions, experimental setups, and preprocessing schemes. These factors complicate generalizing DL models across multi-site heterogeneous datasets while maintaining the confidentiality of the data. One possible solution is to employ federated learning (FL), which enables the collaborative training of a DL model in a decentralized manner, requiring the involved sites to share only the characteristics of the models without transferring their sensitive medical data. This paper presents DL-based magnetic resonance (MR) data translation in an FL setting. We introduce a new aggregation strategy called FedBAdam that couples two state-of-the-art methods with complementary strengths by incorporating momentum into the aggregation scheme and skipping the batch-normalization layers. The work comprehensively validates 10 FL-based strategies for image-to-image multi-contrast MR translation, considering healthy and tumorous brain scans from five different institutions. Our study reveals that FedBAdam shows superior results in terms of mean squared error and structural similarity index over personalized methods, such as FedMRI, and standard FL-based aggregation techniques, such as FedAvg or FedProx, in a multi-site, multi-vendor heterogeneous environment. FedBAdam prevented overfitting of the model and gradually reached the optimal model parameters, exhibiting no oscillations.
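As a rough illustration of the kind of aggregation FedBAdam is described as performing, the sketch below averages client updates with server-side momentum while leaving normalization layers out of the aggregation. It is a hypothetical reconstruction from the abstract, not the authors' implementation, and the layer-name test is an assumption.

```python
import copy

def aggregate(global_state, client_states, momentum, lr=1.0, beta=0.9):
    """One FedAvg-style round with server momentum; norm layers stay local."""
    new_state = copy.deepcopy(global_state)
    for name, param in global_state.items():
        if "bn" in name or "norm" in name:
            continue  # batch-normalization parameters are skipped, as described
        avg = sum(cs[name] for cs in client_states) / len(client_states)
        delta = avg - param                       # aggregated client update
        momentum[name] = beta * momentum.get(name, 0.0) + (1.0 - beta) * delta
        new_state[name] = param + lr * momentum[name]
    return new_state

# toy round with scalar "parameters"
g = {"conv.weight": 1.0, "bn.weight": 0.5}
clients = [{"conv.weight": 1.4, "bn.weight": 0.3},
           {"conv.weight": 1.0, "bn.weight": 0.7}]
g = aggregate(g, clients, momentum={})
print(g)  # conv.weight moves toward the client average; bn.weight is untouched
```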
Affiliation(s)
- Jan Fiszer
- Sano Centre for Computational Medicine, Kraków, Poland
- AGH University of Science and Technology, Kraków, Poland
- Maciej Malawski
- Sano Centre for Computational Medicine, Kraków, Poland
- AGH University of Science and Technology, Kraków, Poland
- Tomasz Pieciak
- Laboratorio de Procesado de Imagen (LPI), ETSI Telecomunicación, Universidad de Valladolid, Valladolid, Spain
309
Veeramani N, Jayaraman P. A promising AI based super resolution image reconstruction technique for early diagnosis of skin cancer. Sci Rep 2025; 15:5084. [PMID: 39934265] [PMCID: PMC11814132] [DOI: 10.1038/s41598-025-89693-8]
Abstract
Skin cancer can be prevalent in people of any age group who are exposed to ultraviolet (UV) radiation. Among all types, melanoma is a notably severe kind of skin cancer that can be fatal. Melanoma is a malignant skin cancer arising from melanocytes and requires early detection. Typically, skin lesions are classified as either benign or malignant. However, some lesions do not show clear signs of cancer, making them suspicious. If unnoticed, these suspicious lesions can develop into severe melanoma, requiring invasive treatment later on. These intermediate or suspicious skin lesions are completely curable if diagnosed at an early stage. To tackle this, some researchers have sought to improve the image quality of infected lesions obtained from dermoscopy through image reconstruction techniques. Analyzing reconstructed super-resolution (SR) images allows early detection, fine feature extraction, and treatment planning. Despite advancements in machine learning, deep learning, and complex neural networks for enhancing skin lesion image quality, a key challenge remains unresolved: how can intricate textures be preserved while performing significant upscaling in medical image reconstruction? Thus, an artificial intelligence (AI)-based reconstruction algorithm is proposed to obtain the fine features of intermediate skin lesions from dermoscopic images for early diagnosis, serving as a non-invasive approach. In this research, a novel melanoma information improvised generative adversarial network (MELIIGAN) framework is proposed for the expedited diagnosis of intermediate skin lesions. We also designed a stacked residual block that handles larger scaling factors and the reconstruction of fine-grained details. Finally, a hybrid loss function with a total variation (TV) regularization term switches to the Charbonnier loss function, a robust substitute for the mean squared error loss function. On the benchmark dataset, the proposed method achieves a structural similarity index (SSIM) of 0.946 and a peak signal-to-noise ratio (PSNR) of 40.12 dB, evidently the highest texture information compared with other state-of-the-art methods.
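The Charbonnier penalty referenced here is sqrt(x^2 + eps^2), a smooth, robust substitute for MSE. A minimal PyTorch sketch of such a hybrid loss follows; the TV weight and epsilon are illustrative assumptions, not the paper's tuned values.

```python
import torch

def charbonnier_loss(pred, target, eps=1e-3):
    # smooth L1-like penalty: sqrt(diff^2 + eps^2), robust to outliers
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def tv_regularizer(img):
    # total variation: encourages piecewise-smooth reconstructions
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return dh + dw

pred, target = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
loss = charbonnier_loss(pred, target) + 1e-4 * tv_regularizer(pred)
```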
Affiliation(s)
- Nirmala Veeramani
- School of Computing, SASTRA University, Thirumalaisamudram, Thanjavur, 613401, Tamil Nadu, India
- Premaladha Jayaraman
- School of Computing, SASTRA University, Thirumalaisamudram, Thanjavur, 613401, Tamil Nadu, India
310
Arcano-Bea P, Rubiños M, García-Fischer A, Zayas-Gato F, Calvo-Rolle JL, Jove E. Defect Detection for Enhanced Traceability in Naval Construction. Sensors (Basel) 2025; 25:1077. [PMID: 40006305] [PMCID: PMC11859182] [DOI: 10.3390/s25041077]
Abstract
The digitalization of shipbuilding processes has become an important trend in modern naval construction, enabling more efficient design, assembly, and maintenance operations. A key aspect of this digital transformation is traceability, which ensures that every component and step in the shipbuilding process can be accurately tracked and managed. Traceability is critical for quality assurance, safety, and operational efficiency, especially when it comes to identifying and addressing defects that may arise during construction. In this context, defect traceability plays a key role, enabling manufacturers to track the origin, type, and evolution of issues throughout the production process, which is fundamental for maintaining structural integrity and preventing failures. In this paper, we focus on the detection of defects in minor and simple pre-assemblies, which are among the smallest components that form the building blocks of ship assemblies. These components are essential to the larger shipbuilding process, yet their defects can propagate and lead to more significant issues in the overall assembly if left unaddressed. For that reason, we propose an intelligent approach to defect detection in minor and simple pre-assembly pieces by applying unsupervised learning with convolutional autoencoders (CAEs). Specifically, we evaluate the performance of five different CAEs (BaseLineCAE, InceptionCAE, SkipCAE, ResNetCAE, and MVTecCAE) in detecting overshooting defects in these components. Our methodology focuses on automated defect identification, providing a scalable and efficient solution for quality control in the shipbuilding process.
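The underlying recipe, training an autoencoder on defect-free pieces and flagging inputs whose reconstruction error is high, can be sketched as follows. This is a toy architecture and threshold for illustration, not any of the five CAEs evaluated in the paper.

```python
import torch
import torch.nn as nn

class TinyCAE(nn.Module):
    """Minimal convolutional autoencoder for reconstruction-based defect scoring."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyCAE()
x = torch.rand(8, 1, 64, 64)                              # grayscale part images
score = ((model(x) - x) ** 2).mean(dim=(1, 2, 3))         # per-image anomaly score
is_defective = score > score.mean() + 2 * score.std()     # illustrative threshold
```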
Affiliation(s)
- Paula Arcano-Bea
- Department of Industrial Engineering, University of A Coruña, CTC, CITIC, 15403 Ferrol, Spain
- José Luis Calvo-Rolle
- Department of Industrial Engineering, University of A Coruña, CTC, CITIC, 15403 Ferrol, Spain
311
Zhou J, Mei L, Yu M, Ma X, Hou D, Yin Z, Liu X, Ding Y, Yang K, Xiao R, Yuan X, Weng Y, Long M, Hu T, Hou J, Xu Y, Tao L, Mei S, Shen H, Yalikun Y, Zhou F, Wang L, Wang D, Liu S, Lei C. Imaging flow cytometry with a real-time throughput beyond 1,000,000 events per second. Light Sci Appl 2025; 14:76. [PMID: 39924500] [PMCID: PMC11808109] [DOI: 10.1038/s41377-025-01754-9]
Abstract
Imaging flow cytometry (IFC) combines the imaging capabilities of microscopy with the high throughput of flow cytometry, offering a promising solution for high-precision and high-throughput cell analysis in fields such as biomedicine, green energy, and environmental monitoring. However, due to limitations in imaging framerate and real-time data processing, the real-time throughput of existing IFC systems has been restricted to approximately 1000-10,000 events per second (eps), which is insufficient for large-scale cell analysis. In this work, we demonstrate IFC with real-time throughput exceeding 1,000,000 eps by integrating optical time-stretch (OTS) imaging, microfluidic-based cell manipulation, and online image processing. Cells flowing at speeds up to 15 m/s are clearly imaged with a spatial resolution of 780 nm, and images of each individual cell are captured, stored, and analyzed. The capabilities and performance of our system are validated through the identification of malignancies in clinical colorectal samples. This work sets a new record for throughput in imaging flow cytometry, and we believe it has the potential to revolutionize cell analysis by enabling highly efficient, accurate, and intelligent measurement.
Affiliation(s)
- Jiehua Zhou
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Liye Mei
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- School of Computer Science, Hubei University of Technology, Wuhan, 430068, China
- Mingjie Yu
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Xiao Ma
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Dan Hou
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Zhuo Yin
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Xun Liu
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Division of Materials Science, Nara Institute of Science and Technology, Takayama-cho, 8916-5, Japan
- Yan Ding
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Kaining Yang
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Ruidong Xiao
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Xiandan Yuan
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- School of Science, Hubei University of Technology, Wuhan, 430068, China
- Yueyun Weng
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Mengping Long
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Department of Pathology, Peking University Cancer Hospital, Beijing, 100142, China
- Taobo Hu
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Department of Breast Surgery, Peking University People's Hospital, Beijing, 100044, China
- Jinxuan Hou
- Department of Thyroid and Breast Surgery, Zhongnan Hospital, Wuhan University, Wuhan, 430071, China
- Yu Xu
- Department of Radiation and Medical Oncology, Zhongnan Hospital, Wuhan University, Wuhan, 430071, China
- Liang Tao
- People's Hospital of Anshun City Guizhou Province, Anshun, 561000, China
- Sisi Mei
- People's Hospital of Anshun City Guizhou Province, Anshun, 561000, China
- Hui Shen
- Department of Hematology, Zhongnan Hospital, Wuhan University, Wuhan, 430071, China
- Yaxiaer Yalikun
- Division of Materials Science, Nara Institute of Science and Technology, Takayama-cho, 8916-5, Japan
- Fuling Zhou
- Department of Hematology, Zhongnan Hospital, Wuhan University, Wuhan, 430071, China
- Liang Wang
- National Engineering Laboratory for Next Generation Internet Access System, School of Optics and Electronic Information, Huazhong University of Science and Technology, Wuhan, 430074, China
- Du Wang
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Sheng Liu
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Cheng Lei
- The Institute of Technological Sciences, Wuhan University, Wuhan, 430072, China
- Suzhou Institute of Wuhan University, Suzhou, 215000, China
- Shenzhen Institute of Wuhan University, Shenzhen, 518057, China
312
Kalantari F, Faez K, Amindavar H, Nazari S. Improved image reconstruction from brain activity through automatic image captioning. Sci Rep 2025; 15:4907. [PMID: 39930076] [PMCID: PMC11811215] [DOI: 10.1038/s41598-025-89242-3]
Abstract
Significant progress has been made in the field of image reconstruction from functional magnetic resonance imaging (fMRI). Some investigations have reconstructed images from visual information decoded from brain signals, yielding insufficient accuracy and quality. Combining semantic information into the reconstruction has been recommended to improve performance; however, this approach still faces numerous difficulties. To address these problems, we propose an approach that combines semantically complex details with visual details for reconstruction. Our proposed method consists of two main modules: visual reconstruction and semantic reconstruction. In the visual reconstruction module, visual information is decoded from brain data using a decoder. This module employs a deep generator network (DGN) to produce images and utilizes a VGG19 network to extract visual features from the generated images. Image optimization is performed iteratively to minimize the error between features decoded from brain data and features extracted from the generated image. In the semantic reconstruction module, two models, BLIP and LDM, are employed. Using the BLIP model, we generate 10 captions for each training image. The semantic features extracted from the image captions, along with brain data obtained from training sessions, are used to train a decoder. The trained decoder is then used to decode semantic features from human brain activity. Finally, the reconstructed image from the visual reconstruction module is used as input to the LDM model, while the semantic features decoded from brain activity are provided as conditional input for semantic reconstruction. Including decoded semantic features improves reconstruction quality, as confirmed by our ablation study. Our strategy is superior both qualitatively and quantitatively to Shen et al.'s method, which uses a similar dataset. Our methodology achieved accuracies of 0.812 and 0.815 for the inception and contrastive language-image pre-training (CLIP) metrics, respectively, which are excellent for the quantitative evaluation of semantic content. We achieved an accuracy of 0.328 on the structural similarity index measure (SSIM), indicating superior performance as a low-level metric. Moreover, our proposed approach for semantic reconstruction of artificial shapes and imagined images achieved acceptable success, attaining accuracies of 0.566 and 0.627 based on the CLIP metric, and 0.671 and 0.565 based on the SSIM metric, respectively.
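The iterative optimization in the visual reconstruction module can be sketched as gradient descent on the image itself, so that the features extracted from it approach the features decoded from brain activity. The toy extractor below stands in for VGG19, and all shapes are illustrative.

```python
import torch

def feature_match(image, target_features, extractor, steps=200, lr=0.05):
    """Optimize the image so extractor(image) approaches the decoded features."""
    image = image.clone().requires_grad_(True)
    opt = torch.optim.Adam([image], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (extractor(image) - target_features).pow(2).mean()
        loss.backward()
        opt.step()
    return image.detach()

extractor = torch.nn.Conv2d(3, 8, 3, padding=1)   # stand-in for VGG19 features
decoded = torch.randn(1, 8, 32, 32)               # features decoded from fMRI (toy)
recon = feature_match(torch.rand(1, 3, 32, 32), decoded, extractor)
```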
Affiliation(s)
- Fatemeh Kalantari
- Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
- Karim Faez
- Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
- Hamidreza Amindavar
- Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
- Soheila Nazari
- Faculty of Electrical Engineering, Shahid Beheshti University, Tehran, Iran
313
Tripathi M, Kongprawechnon W, Kondo T. A Highly Robust Encoder-Decoder Network with Multi-Scale Feature Enhancement and Attention Gate for the Reduction of Mixed Gaussian and Salt-and-Pepper Noise in Digital Images. J Imaging 2025; 11:51. [PMID: 39997553] [PMCID: PMC11856137] [DOI: 10.3390/jimaging11020051]
Abstract
Image denoising is crucial for correcting distortions caused by environmental factors and technical limitations. We propose a novel and highly robust encoder-decoder network (HREDN) for effectively removing mixed salt-and-pepper and Gaussian noise from digital images. HREDN integrates a multi-scale feature enhancement block in the encoder, allowing the network to capture features at various scales and handle complex noise patterns more effectively. To mitigate information loss during encoding, skip connections transfer essential feature maps from the encoder to the decoder, preserving structural details. However, skip connections can also propagate redundant information. To address this, we incorporate attention gates within the skip connections, ensuring that only relevant features are passed to the decoding layers. We evaluate the robustness of the proposed method across facial, medical, and remote sensing domains. The experimental results demonstrate that HREDN excels in preserving edge details and structural features in denoised images, outperforming state-of-the-art techniques in both qualitative and quantitative measures. Statistical analysis further highlights the model's ability to effectively remove noise in diverse, complex scenarios with images of varying resolutions across multiple domains.
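A compact sketch of an additive attention gate of the kind used in skip connections, where decoder features gate the encoder features so that only salient activations pass through, is shown below (following the common additive attention-gate formulation; channel sizes are illustrative, not HREDN's exact configuration).

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, enc_ch, dec_ch, inter_ch):
        super().__init__()
        self.w_enc = nn.Conv2d(enc_ch, inter_ch, 1)
        self.w_dec = nn.Conv2d(dec_ch, inter_ch, 1)
        self.psi = nn.Conv2d(inter_ch, 1, 1)

    def forward(self, enc_feat, dec_feat):
        # additive attention coefficients in [0, 1]
        attn = torch.sigmoid(self.psi(torch.relu(self.w_enc(enc_feat) +
                                                 self.w_dec(dec_feat))))
        return enc_feat * attn   # suppress irrelevant skip-connection features

gate = AttentionGate(enc_ch=64, dec_ch=64, inter_ch=32)
skip = gate(torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32))
```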
Affiliation(s)
- Waree Kongprawechnon
- School of Information, Computer and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand; (M.T.); (T.K.)
314
Sidorov M, Birman R, Hadar O, Dvir A. Estimating QoE from Encrypted Video Conferencing Traffic. Sensors (Basel) 2025; 25:1009. [PMID: 40006242] [PMCID: PMC11858984] [DOI: 10.3390/s25041009]
Abstract
Traffic encryption is vital for internet security but complicates analytical applications like video delivery optimization or quality of experience (QoE) estimation, which often rely on clear text data. While many models address the problem of QoE prediction in video streaming, the video conferencing (VC) domain remains underexplored despite rising demand for these applications. Existing models often provide low-resolution predictions, categorizing QoE into broad classes such as "high" or "low", rather than providing precise, continuous predictions. Moreover, most models focus on clear-text rather than encrypted traffic. This paper addresses these challenges by analyzing a large dataset of Zoom sessions and training five classical machine learning (ML) models and two custom deep neural networks (DNNs) to predict three QoE indicators: frames per second (FPS), resolution (R), and the naturalness image quality evaluator (NIQE). The models achieve mean error rates of 8.27%, 7.56%, and 2.08% for FPS, R, and NIQE, respectively, using a 10-fold cross-validation technique. This approach advances QoE assessment for encrypted traffic in VC applications.
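The reported error rates come from 10-fold cross-validation over classical ML models; a minimal scikit-learn sketch with synthetic stand-ins for the encrypted-traffic features and the FPS target (the feature layout and model choice here are assumptions, not the paper's exact setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X = np.random.rand(500, 12)      # per-window encrypted-traffic statistics (toy)
y = 30 * np.random.rand(500)     # frames-per-second labels (toy)

model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=10, scoring="neg_mean_absolute_error")
print(f"10-fold MAE: {-scores.mean():.2f} FPS")
```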
Affiliation(s)
- Michael Sidorov
- School of Electrical and Computer Engineering, Ben Gurion University of the Negev, Be’er Sheba 8410501, Israel
- Raz Birman
- School of Electrical and Computer Engineering, Ben Gurion University of the Negev, Be’er Sheba 8410501, Israel
- Ofer Hadar
- School of Electrical and Computer Engineering, Ben Gurion University of the Negev, Be’er Sheba 8410501, Israel
- Amit Dvir
- Department of Computer and Software Engineering, Ariel University, Ariel 40700, Israel
315
Schwarz A, Hofmann C, Dickmann J, Simon A, Maier A, Wacker FK, Raatschen HJ, Gleitz S, Schmidbauer M. Free-Breathing Respiratory Triggered High-Pitch Lung CT: Insights From Phantom and Patient Scans. Invest Radiol 2025 (online ahead of print). [PMID: 39847727] [DOI: 10.1097/rli.0000000000001157]
Abstract
OBJECTIVE Respiratory motion can affect image quality and thus the diagnostic accuracy of CT images by masking or mimicking relevant lung pathologies. CT examinations are often performed during deep inspiration and breath-hold to achieve optimal image quality. However, this can be challenging for certain patient groups, such as children, the elderly, or sedated patients. The study aimed to validate a dedicated triggering algorithm for initiating respiratory-triggered high-pitch computed tomography (RT-HPCT) scans in end inspiration and end expiration in complex and irregular respiratory patterns using an anthropomorphic dynamic chest phantom. Additionally, a patient study was conducted to compare the image quality and lung expansion between RT-HPCT and standard HPCT. MATERIALS AND METHODS The study utilized an algorithm that processes the patient's breathing motion in real time to determine the appropriate time to initiate a scan. This algorithm was tested on a dynamic, tissue-equivalent chest motion phantom to replicate and simulate 3-dimensional target motion using 28 breathing motion patterns taken from patients with irregular breathing. To evaluate the performance on human patients, prospective RT-HPCT was performed in 18 free-breathing patients. As a reference, unenhanced HPCT of the chest was performed in 20 patients without respiratory triggering during free-breathing. The mean CTDI was 1.73 mGy ± 0.1 mGy for HPCT and 1.68 mGy ± 0.1 mGy for RT-HPCT. For phantom tests, the deviation from the target position of the phantom inlay was known, and image quality was approximated by evaluating stationary versus moving acquisitions. For patient scans, respiratory motion artifacts and inspiration depth were analyzed using expert knowledge of lung anatomy and automated lung volume estimation. Statistical analysis was performed to compare image quality and lung volumes between conventional HPCT and RT-HPCT. RESULTS In phantom scans, the average deviation from the desired excursion phase was 1.6 mm ± 4.7 mm, or 15% ± 24% of the phantom movement range. In patients, the overall image quality significantly improved with respiratory triggering compared with conventional HPCT (P < 0.001). Quantitative average lung volume was 4.0 L ± 1.1 L in the RT group and 3.6 L ± 1.0 L in the control group. CONCLUSIONS This study demonstrated the feasibility of using a patient-adaptive respiratory triggering algorithm for high-pitch lung CT in both phantoms and patients. Respiratory-triggered high-pitch CT scanning significantly reduces breathing artifacts compared with conventional non-triggered free-breathing scans.
Affiliation(s)
- Annette Schwarz
- From the Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany (A. Schwarz, A. Simon, A.M.); Siemens Healthineers AG, Forchheim, Germany (A. Schwarz, C.H., J.D., A. Simon); Institute for Diagnostic and Interventional Radiology, Hannover Medical School, Hannover, Germany (F.K.W., S.G., M.S.); and Institute for Radiology, Pediatric and Neuroradiology, Helios Hospital, Schwerin, Germany (H.-J.R.)
316
Baiz CR, Kanevche K, Kozuch J, Heberle J. Data-driven signal-to-noise enhancement in scattering near-field infrared microscopy. J Chem Phys 2025; 162:054201. [PMID: 39898567] [DOI: 10.1063/5.0247251]
Abstract
This study introduces a machine-learning approach to enhance signal-to-noise ratios in scattering-type scanning near-field optical microscopy (s-SNOM). While s-SNOM offers a high spatial resolution, its effectiveness is often hindered by low signal levels, particularly in weakly absorbing samples. To address these challenges, we utilize a data-driven "patch-based" machine learning reconstruction method, incorporating modern generative adversarial neural networks (CycleGANs) for denoising s-SNOM images. This method allows for flexible reconstruction of images of arbitrary sizes, a critical capability given the variable nature of scanned sample areas in point-scanning probe-based microscopies. The CycleGAN model is trained on unpaired sets of images captured at both rapid and extended acquisition times, thereby modeling instrument noise while preserving essential topographical and molecular information. The results show significant improvements in image quality, as indicated by higher structural similarity index and peak signal-to-noise ratio values, comparable to those obtained from images captured with four times the integration time. This method not only enhances image quality but also has the potential to reduce the overall data acquisition time, making high-resolution s-SNOM imaging more feasible for a wide range of biological and materials science applications.
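The "patch-based" reconstruction that lets a fixed-size denoiser handle arbitrarily sized scan areas can be sketched as tiling, denoising, and stitching. In the snippet below the identity lambda stands in for the trained CycleGAN generator; a real pipeline would also pad and blend overlapping tiles to avoid seams.

```python
import numpy as np

def denoise_patchwise(image, denoiser, patch=64):
    """Apply a fixed-size denoiser tile-by-tile over an arbitrarily sized image."""
    h, w = image.shape
    out = np.zeros_like(image)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tile = image[i:i + patch, j:j + patch]   # edge tiles may be smaller
            out[i:i + patch, j:j + patch] = denoiser(tile)
    return out

noisy = np.random.rand(160, 224)                     # non-square scan area
clean = denoise_patchwise(noisy, denoiser=lambda t: t)  # identity placeholder
```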
Affiliation(s)
- Carlos R Baiz
- Department of Chemistry, University of Texas at Austin, 105 E 24th St. A5300, Austin, Texas 78712, USA
- Fachbereich Physik, Experimentelle Molekulare Biophysik, Freie Universität Berlin, Berlin 14195, Germany
- Katerina Kanevche
- Fachbereich Physik, Experimentelle Molekulare Biophysik, Freie Universität Berlin, Berlin 14195, Germany
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, USA
- Jacek Kozuch
- Fachbereich Physik, Experimentelle Molekulare Biophysik, Freie Universität Berlin, Berlin 14195, Germany
- Joachim Heberle
- Fachbereich Physik, Experimentelle Molekulare Biophysik, Freie Universität Berlin, Berlin 14195, Germany
317
Tong Q, Wang L, Dai Q, Zheng C, Zhou F. Enhanced cloud removal via temporal U-Net and cloud cover evolution simulation. Sci Rep 2025; 15:4544. [PMID: 39915507] [PMCID: PMC11802727] [DOI: 10.1038/s41598-025-87296-x]
Abstract
Remote sensing images are indispensable for continuous environmental monitoring and Earth observations. However, cloud occlusion can severely degrade image quality, posing a significant challenge for the accurate extraction of ground information. Existing cloud removal techniques often suffer from incomplete cloud removal, artifacts, and color distortions. Owing to the scarcity of sequential data, the effective utilization of temporal information to enhance cloud removal performance poses a challenge. Therefore, we propose a cloud removal method based on cloud evolution simulation. This method is applicable to all paired cloud datasets, enabling the construction of cloud evolution time-series in the absence of actual temporal information. We embed temporal information from the sequence into the Temporal U-Net to achieve more accurate cloud predictions. We conducted extensive experiments on RICE and T-CLOUD datasets. The results demonstrate that our approach significantly improves the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) compared with existing methods.
Affiliation(s)
- Qingwei Tong
- College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, Yunnan, China
- Leiguang Wang
- College of Landscape Architecture and Horticulture, Southwest Forestry University, Kunming, Yunnan, China
- Key Laboratory of National Forestry and Grassland Administration on Forestry and Ecological Big Data, Southwest Forestry University, Kunming, China
- Qinling Dai
- College of Art and Design, Southwest Forestry University, Kunming, Yunnan, China
- Chen Zheng
- School of Mathematics and Statistics, Henan University, Kaifeng, China
- Henan Engineering Research Center for Artificial Intelligence Theory and Algorithms, Institute of Applied Mathematics, Henan University, Kaifeng, China
- Fangrong Zhou
- Joint Laboratory of Power Remote Sensing Technology (Electric Power Research Institute, Yunnan Power Grid Company Ltd., China Southern Power Grid), Kunming, Yunnan, China
318
Ghassel S, Jabbarpour A, Lang J, Moulton E, Klein R. The effect of resizing on the natural appearance of scintigraphic images: an image similarity analysis. Front Nucl Med 2025; 4:1505377. [PMID: 39981066] [PMCID: PMC11839826] [DOI: 10.3389/fnume.2024.1505377]
Abstract
Background and objective: This study aimed to assess the impact of upsampling and downsampling techniques on the noise characteristics and similarity metrics of scintigraphic images in nuclear medical imaging. Methods: A physical phantom study using dynamic imaging was used to generate reproducible static images of varying count statistics. Naïve upsampling and downsampling with linear interpolation were compared against alternative methods based on the preservation of Poisson count statistics and the principles of nuclear scintigraphic imaging, namely linear interpolation with a Poisson resampling correction (upsampling) and a sliding-window summation method (downsampling). For each resizing method, we computed the similarity of resized images to count-matched images acquired at the target grid size with the structural similarity index measure and the logarithm of the mean squared error. These image quality metrics were subsequently compared to those of two independent count-matched images at the target grid size (representing variance due to natural noise permutations) as a reference to establish an optimal resizing method. Results: Only upsampled images with the Poisson resampling correction after linear interpolation produced images that were similar to those acquired at the target grid size. For downsampling, both linear interpolation and sliding-window summation yielded similar outcomes for a reduction factor of 2. However, for a reduction factor of 4, only sliding-window summation resulted in image similarity metrics in agreement with those at the target grid size. Conclusions: The study underlines the importance of applying appropriate resizing techniques in nuclear medical imaging to produce realistic images at the target grid size.
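The two count-preserving operations the study favors can be sketched directly: sliding-window summation for downsampling (total counts preserved) and a Poisson redraw after interpolation for upsampling (restoring Poisson noise statistics). Nearest-neighbour expansion stands in for linear interpolation in this illustration.

```python
import numpy as np

def downsample_sum(img, factor=2):
    """Sum counts over factor x factor windows so total counts are preserved."""
    h, w = img.shape[0] // factor, img.shape[1] // factor
    img = img[:h * factor, :w * factor]
    return img.reshape(h, factor, w, factor).sum(axis=(1, 3))

def upsample_poisson(img, factor=2, seed=0):
    """Expand the image, rescale the mean, then redraw each pixel as Poisson."""
    rng = np.random.default_rng(seed)
    expanded = np.kron(img, np.ones((factor, factor))) / factor**2
    return rng.poisson(expanded)

counts = np.random.default_rng(1).poisson(20.0, size=(64, 64))
small = downsample_sum(counts)      # 32x32, same total counts
large = upsample_poisson(counts)    # 128x128, Poisson-like noise texture
```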
Affiliation(s)
- Siraj Ghassel
- Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
- Amir Jabbarpour
- Department of Physics, Carleton University, Ottawa, ON, Canada
- Jochen Lang
- Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
- Eric Moulton
- Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
- Jubilant DraxImage Inc., Kirkland, QC, Canada
- Ran Klein
- Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
- Department of Physics, Carleton University, Ottawa, ON, Canada
- Division of Nuclear Medicine and Molecular Imaging, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Department of Nuclear Medicine and Molecular Imaging, The Ottawa Hospital, Ottawa, ON, Canada
319
Romanin L, Prsa M, Roy CW, Sieber X, Yerly J, Milani B, Rutz T, Si-Mohamed S, Tenisch E, Piccini D, Stuber M. Exploring the limits of scan time reduction for ferumoxytol-enhanced whole-heart angiography in congenital heart disease patients. J Cardiovasc Magn Reson 2025; 27:101854. [PMID: 39920923] [PMCID: PMC11889962] [DOI: 10.1016/j.jocmr.2025.101854]
Abstract
BACKGROUND One major challenge in cardiovascular magnetic resonance is reducing scan times to be more compatible with clinical workflows. In 3D magnetic resonance imaging (MRI), strategies to shorten scan times mostly rely on ECG triggering or self-navigation for motion management but are affected by heart rate variability or respiratory drift. A similarity-driven multi-dimensional binning algorithm (SIMBA) was introduced for 3D whole-heart angiography from ferumoxytol-enhanced free-running MRI. This study explores the acceleration limits of SIMBA and its compressed-sensing extension, extra-dimensional motion-compensated (XD-MC) SIMBA, while preserving image quality. METHODS Data from 6-min free-running acquisitions of 30 congenital heart disease (CHD) patients were retrospectively undersampled to simulate 5-, 4-, 3-, 2-, and 1-min datasets. SIMBA and XD-MC-SIMBA reconstructions were applied, and the consistency of the data selection together with sharpness metrics was computed as a function of undersampling. Image quality was rated on a 5-point Likert scale. Shorter 3-min acquisitions were prospectively acquired in nine CHD patients. RESULTS SIMBA's motion state selection was consistent across undersampling levels, with only 2 of 30 cases showing completely different selections. Image quality metrics decreased with increased undersampling, with SIMBA scoring lower than XD-MC-SIMBA. The diagnostic quality was good, with lower scores for the 2- and 1-min datasets. Using XD-MC-SIMBA, 43% (31/72) of cases showed improved scores compared with SIMBA, and 58% (7/12) of 1-min datasets improved to good or excellent quality. CONCLUSIONS This study demonstrates that ferumoxytol-enhanced free-running MRI can be highly accelerated for 3D angiography in CHD. With the aid of compressed sensing, XD-MC-SIMBA supports acceleration down to 3 min or less.
Affiliation(s)
- Ludovica Romanin
- Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland; Advanced Clinical Imaging Technology, Siemens Healthineers International AG, Lausanne, Switzerland
- Milan Prsa
- Division of Pediatric Cardiology, Woman-Mother-Child Department, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Christopher W Roy
- Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Xavier Sieber
- Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Jérôme Yerly
- Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland; Center for Biomedical Imaging (CIBM), Lausanne, Switzerland
- Bastien Milani
- Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Tobias Rutz
- Service of Cardiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Salim Si-Mohamed
- Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland; University Lyon, INSA-Lyon, University Claude Bernard Lyon 1, UJM-Saint Etienne, CNRS, Inserm, CREATIS, Villeurbanne, France; Department of Radiology, Louis Pradel Hospital, Bron, France
- Estelle Tenisch
- Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Davide Piccini
- Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland; Advanced Clinical Imaging Technology, Siemens Healthineers International AG, Lausanne, Switzerland
- Matthias Stuber
- Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland; Center for Biomedical Imaging (CIBM), Lausanne, Switzerland
320
Pinchuk D, Chowdhury HMAM, Pandeya A, Oluwadare O. HiCForecast: dynamic network optical flow estimation algorithm for spatiotemporal Hi-C data forecasting. Bioinformatics 2025; 41:btaf030. [PMID: 39842868] [PMCID: PMC11793695] [DOI: 10.1093/bioinformatics/btaf030]
Abstract
MOTIVATION: The exploration of the 3D organization of DNA within the nucleus in relation to various stages of cellular development has led to experiments generating spatiotemporal Hi-C data. However, there is limited spatiotemporal Hi-C data for many organisms, impeding the study of 3D genome dynamics. To overcome this limitation and advance our understanding of genome organization, it is crucial to develop methods for forecasting Hi-C data at future time points from existing time-series Hi-C data. RESULTS: In this work, we designed a novel framework named HiCForecast, adopting a dynamic voxel flow algorithm to forecast future spatiotemporal Hi-C data. We evaluated how well our method generalizes across different species and systems, ensuring performance in homogeneous, heterogeneous, and general contexts. Using both computational and biological evaluation metrics, our results show that HiCForecast outperforms the current state-of-the-art algorithm, emerging as an efficient and powerful tool for forecasting future spatiotemporal Hi-C datasets. AVAILABILITY AND IMPLEMENTATION: HiCForecast is publicly available at https://github.com/OluwadareLab/HiCForecast.
Collapse
Affiliation(s)
- Dmitry Pinchuk
- Department of Computer Science, University of Wisconsin-Madison, Madison, WI 53706, United States
| | - H M A Mohit Chowdhury
- Department of Computer Science, University of Colorado, Colorado Springs, CO 80918, United States
| | - Abhishek Pandeya
- Department of Computer Science, University of Colorado, Colorado Springs, CO 80918, United States
| | - Oluwatosin Oluwadare
- Department of Computer Science, University of Colorado, Colorado Springs, CO 80918, United States
- Department of Biomedical Informatics, University of Colorado, Anschutz Medical Campus, Aurora, CO 80045, United States
| |
Collapse
|
321
|
Tamura Y, Utsumi Y, Miwa Y, Iwamura M, Kise K. Unsupervised monocular depth estimation with omnidirectional camera for 3D reconstruction of grape berries in the wild. PLoS One 2025; 20:e0317359. [PMID: 39899513 PMCID: PMC11790092 DOI: 10.1371/journal.pone.0317359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2024] [Accepted: 12/26/2024] [Indexed: 02/05/2025] Open
Abstract
Japanese table grapes are quite expensive because their production is highly labor-intensive. In particular, grape berry pruning is a labor-intensive task performed to produce grapes with desirable characteristics. Because it is considered difficult to master, it is desirable to assist new entrants by using information technology to show the recommended berries to cut. In this research, we aim to build a system that identifies which grape berries should be removed during the pruning process. To realize this, the 3D positions of individual grape berries need to be estimated. Our environment imposes a practical restriction: bunches hang from trellises at a height of about 1.6 meters in outdoor grape orchards. Depth sensors are difficult to use in such circumstances, so an omnidirectional camera with a wide field of view is preferred for the convenience of shooting videos. Obtaining 3D information of grape berries from videos is challenging because they have textureless surfaces, highly symmetric shapes, and crowded arrangements. For these reasons, conventional 3D reconstruction methods, which rely on matching local unique features, are difficult to apply. To satisfy the practical constraints of this task, we extend a deep learning-based unsupervised monocular depth estimation method to an omnidirectional camera and propose using it. Our experiments demonstrate the effectiveness of the proposed method for estimating the 3D positions of grape berries in the wild.
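The self-supervised objective underlying unsupervised monocular depth training can be sketched as the standard SSIM-plus-L1 photometric loss between a target frame and a source frame warped into the target view. The warping step and the paper's omnidirectional extension are omitted here; the names are illustrative.

```python
# Minimal sketch of the photometric loss used in unsupervised monocular
# depth training (generic perspective version; the paper's omnidirectional
# extension is not reproduced here).
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified single-scale SSIM distance over 3x3 neighborhoods."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_loss(target, source_warped, alpha=0.85):
    """Blend of SSIM and L1 between the target frame and the source frame
    warped into the target view using the predicted depth and pose."""
    l1 = (target - source_warped).abs()
    return (alpha * ssim(target, source_warped) + (1 - alpha) * l1).mean()
```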
Collapse
Affiliation(s)
- Yasuto Tamura
- College of Engineering, Osaka Prefecture University, Sakai, Japan
| | - Yuzuko Utsumi
- Graduate School of Informatics, Osaka Metropolitan University, Sakai, Japan
| | - Yuka Miwa
- Local Incorporated Administrative Agency Research Institute of Environment, Agriculture and Fisheries, Osaka Prefecture, Habikino, Japan
| | - Masakazu Iwamura
- Graduate School of Informatics, Osaka Metropolitan University, Sakai, Japan
| | - Koichi Kise
- Graduate School of Informatics, Osaka Metropolitan University, Sakai, Japan
| |
Collapse
|
322
|
Wang JK, Johnson BA, Chen Z, Zhang H, Szanto D, Woods B, Wall M, Kwon YH, Linton EF, Pouw A, Kupersmith MJ, Garvin MK, Kardon RH. Quantifying the spatial patterns of retinal ganglion cell loss and progression in optic neuropathy by applying a deep learning variational autoencoder approach to optical coherence tomography. FRONTIERS IN OPHTHALMOLOGY 2025; 4:1497848. [PMID: 39963427 PMCID: PMC11830743 DOI: 10.3389/fopht.2024.1497848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/18/2024] [Accepted: 12/16/2024] [Indexed: 02/20/2025]
Abstract
Introduction Glaucoma, optic neuritis (ON), and non-arteritic anterior ischemic optic neuropathy (NAION) produce distinct patterns of retinal ganglion cell (RGC) damage. We propose a booster Variational Autoencoder (bVAE) to capture spatial variations in RGC loss and generate latent space (LS) montage maps that visualize different degrees and spatial patterns of optic nerve bundle injury. Furthermore, the bVAE model is capable of tracking the spatial pattern of RGC thinning over time and classifying the underlying cause. Methods The bVAE model consists of an encoder, a display decoder, and a booster decoder. The encoder decomposes input ganglion cell layer (GCL) thickness maps into two display latent variables (dLVs) and eight booster latent variables (bLVs). The dLVs capture primary spatial patterns of RGC thinning, while the display decoder reconstructs the GCL map and creates the LS montage map. The bLVs add finer spatial details, improving reconstruction accuracy. XGBoost was used to analyze the dLVs and bLVs, estimating normal/abnormal GCL thinning and classifying diseases (glaucoma, ON, and NAION). A total of 10,701 OCT macular scans from 822 subjects were included in this study. Results Incorporating bLVs improved reconstruction accuracy, with the image-based root-mean-square error (RMSE) between input and reconstructed GCL thickness maps decreasing from 5.55 ± 2.29 µm (two dLVs only) to 4.02 ± 1.61 µm (two dLVs and eight bLVs). However, the image-based structural similarity index (SSIM) remained similar (0.91 ± 0.04), indicating that just two dLVs effectively capture the main GCL spatial patterns. For classification, the XGBoost model achieved an AUC of 0.98 for identifying abnormal spatial patterns of GCL thinning over time using the dLVs. Disease classification yielded AUCs of 0.95 for glaucoma, 0.84 for ON, and 0.93 for NAION, with bLVs further increasing the AUCs to 0.96 for glaucoma, 0.93 for ON, and 0.99 for NAION. Conclusion This study presents a novel approach to visualizing and quantifying GCL thinning patterns in optic neuropathies using the bVAE model. The combination of dLVs and bLVs enhances the model's ability to capture key spatial features and predict disease progression. Future work will focus on integrating additional image modalities to further refine the model's diagnostic capabilities.
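The display/booster latent split can be illustrated with a minimal PyTorch sketch: an encoder emits 2 display and 8 booster latent variables, a display decoder reconstructs from the dLVs alone, and a booster decoder refines the reconstruction using all ten. The fully connected design and layer sizes are assumptions, not the authors' architecture.

```python
# Illustrative sketch of the display/booster latent split described above.
import torch
import torch.nn as nn

class BoosterVAE(nn.Module):
    def __init__(self, in_dim=64 * 64, d_lv=2, b_lv=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, d_lv + b_lv)
        self.logvar = nn.Linear(256, d_lv + b_lv)
        self.display_dec = nn.Sequential(nn.Linear(d_lv, 256), nn.ReLU(),
                                         nn.Linear(256, in_dim))
        self.booster_dec = nn.Sequential(nn.Linear(d_lv + b_lv, 256), nn.ReLU(),
                                         nn.Linear(256, in_dim))
        self.d_lv = d_lv

    def forward(self, x):
        h = self.encoder(x.flatten(1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        coarse = self.display_dec(z[:, :self.d_lv])  # dLVs only: main pattern
        fine = self.booster_dec(z)                   # dLVs + bLVs: details
        return coarse, fine, mu, logvar
```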
Collapse
Affiliation(s)
- Jui-Kai Wang
- Center for the Prevention and Treatment of Visual Loss, Iowa City VA Health Care System, Iowa City, IA, United States
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States
| | - Brett A. Johnson
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States
| | - Zhi Chen
- Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA, United States
- Iowa Institute for Biomedical Imaging, University of Iowa, Iowa City, IA, United States
| | - Honghai Zhang
- Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA, United States
- Iowa Institute for Biomedical Imaging, University of Iowa, Iowa City, IA, United States
| | - David Szanto
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Brian Woods
- Department of Ophthalmology, University Hospital Galway, Galway, Ireland
- Department of Physics, School of Natural Sciences, University of Galway, Galway, Ireland
| | - Michael Wall
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States
| | - Young H. Kwon
- Center for the Prevention and Treatment of Visual Loss, Iowa City VA Health Care System, Iowa City, IA, United States
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States
| | - Edward F. Linton
- Center for the Prevention and Treatment of Visual Loss, Iowa City VA Health Care System, Iowa City, IA, United States
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States
| | - Andrew Pouw
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States
| | - Mark J. Kupersmith
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Department of Ophthalmology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Mona K. Garvin
- Center for the Prevention and Treatment of Visual Loss, Iowa City VA Health Care System, Iowa City, IA, United States
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States
- Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA, United States
- Iowa Institute for Biomedical Imaging, University of Iowa, Iowa City, IA, United States
| | - Randy H. Kardon
- Center for the Prevention and Treatment of Visual Loss, Iowa City VA Health Care System, Iowa City, IA, United States
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States
| |
Collapse
|
323
|
Ha JU, Kim HW, Cho M, Lee MC. Three-Dimensional Visualization Using Proportional Photon Estimation Under Photon-Starved Conditions. SENSORS (BASEL, SWITZERLAND) 2025; 25:893. [PMID: 39943532 PMCID: PMC11819839 DOI: 10.3390/s25030893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/14/2025] [Revised: 01/29/2025] [Accepted: 01/30/2025] [Indexed: 02/16/2025]
Abstract
In this paper, we propose a new method for three-dimensional (3D) visualization that proportionally estimates the number of photons in the background and the object under photon-starved conditions. Photon-counting integral imaging is one of the techniques for 3D image visualization under photon-starved conditions. However, conventional photon-counting integral imaging has the problem that random noise is generated in the background of the image because it estimates the same number of photons across the entire image. In contrast, our proposed method reduces this random noise by estimating a proportional number of photons for the background and the object. In addition, spatial overlapping is applied in regions where photons overlap to obtain enhanced 3D images. To demonstrate the feasibility of our proposed method, we conducted optical experiments and calculated performance metrics such as normalized cross-correlation, peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM). In terms of SSIM of the 3D visualization results, our proposed method achieves about 3.42 times the value of the conventional method. Therefore, our proposed method can obtain better 3D visualization of objects than conventional photon-counting integral imaging methods under photon-starved conditions.
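The underlying photon-counting imaging model lends itself to a short simulation: detected counts are Poisson-distributed with mean proportional to the normalized irradiance, and the proportional-estimation idea can be emulated by assigning different photon budgets to (assumed known) object and background regions. The sketch below is illustrative, not the authors' code.

```python
# Sketch of the photon-counting imaging model: the detected photon count per
# pixel is Poisson with mean proportional to the normalized irradiance.
import numpy as np

def photon_count(image, n_photons):
    """Simulate a photon-limited observation of a normalized image."""
    p = image / image.sum()            # per-pixel detection probability
    return np.random.poisson(n_photons * p)

def proportional_estimate(image, mask, n_obj=1e5, n_bg=1e3):
    """Allocate photons separately to object and background regions, then
    form the maximum-likelihood irradiance estimate (counts / budget)."""
    est = np.zeros_like(image, dtype=float)
    obj, bg = image * mask, image * (1 - mask)
    est[mask > 0] = photon_count(obj, n_obj)[mask > 0] / n_obj
    est[mask == 0] = photon_count(bg, n_bg)[mask == 0] / n_bg
    return est / est.max()
```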
Collapse
Affiliation(s)
- Jin-Ung Ha
- Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka-shi 820-8502, Japan; (J.-U.H.); (H.-W.K.)
| | - Hyun-Woo Kim
- Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka-shi 820-8502, Japan; (J.-U.H.); (H.-W.K.)
| | - Myungjin Cho
- School of ICT, Robotics, and Mechanical Engineering, Hankyong National University, IITC, 327 Chungang-ro, Anseong 17579, Republic of Korea
| | - Min-Chul Lee
- Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka-shi 820-8502, Japan; (J.-U.H.); (H.-W.K.)
| |
Collapse
|
324
|
Zhang M, Bai H, Shang W, Guo J, Li Y, Gao X. MDEformer: Mixed Difference Equation Inspired Transformer for Compressed Video Quality Enhancement. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:2410-2422. [PMID: 38285580 DOI: 10.1109/tnnls.2024.3354982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/31/2024]
Abstract
Deep learning methods have achieved impressive performance in compressed video quality enhancement tasks. However, these methods rely excessively on practical experience by manually designing the network structure and do not fully exploit the potential of the feature information contained in the video sequences, i.e., they do not take full advantage of the multiscale similarity of the compressed artifact information and do not seriously consider the impact of the partition boundaries in the compressed video on the overall video quality. In this article, we propose a novel Mixed Difference Equation inspired Transformer (MDEformer) for compressed video quality enhancement, which provides a relatively reliable principle to guide the network design and yields a new insight into the interpretable transformer. Specifically, drawing on the graphical concept of the mixed difference equation (MDE), we utilize multiple cross-layer cross-attention aggregation (CCA) modules to establish long-range dependencies between encoders and decoders of the transformer, where partition boundary smoothing (PBS) modules are inserted as feedforward networks. The CCA module can make full use of the multiscale similarity of compression artifacts to effectively remove compression artifacts and recover the texture and detail information of the frame. The PBS module leverages the sensitivity of smoothing convolution to partition boundaries to eliminate the impact of partition boundaries on the quality of the compressed video and improve its overall quality, while having little impact on non-boundary pixels. Extensive experiments on the MFQE 2.0 dataset demonstrate that the proposed MDEformer can eliminate compression artifacts to improve the quality of compressed video, and surpasses the state-of-the-art methods (SOTAs) in terms of both objective metrics and visual quality.
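The cross-layer cross-attention mechanism at the heart of the CCA module can be illustrated with a generic PyTorch block in which decoder tokens attend to encoder tokens; this shows only the attention pattern, not the authors' exact module.

```python
# Generic cross-attention block of the kind the CCA module builds on:
# decoder features attend to encoder features to establish long-range
# encoder-decoder dependencies.
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dec_tokens, enc_tokens):
        # dec_tokens: (B, Nd, C) queries; enc_tokens: (B, Ne, C) keys/values
        out, _ = self.attn(dec_tokens, enc_tokens, enc_tokens)
        return self.norm(dec_tokens + out)   # residual connection + norm
```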
Collapse
|
325
|
Wang Y, Wang D, Zhong L, Zhou Y, Wang Q, Chen W, Qi L. Cross-sectional imaging of speed-of-sound distribution using photoacoustic reversal beacons. PHOTOACOUSTICS 2025; 41:100666. [PMID: 39850092 PMCID: PMC11754137 DOI: 10.1016/j.pacs.2024.100666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/25/2024] [Revised: 10/28/2024] [Accepted: 11/07/2024] [Indexed: 01/25/2025]
Abstract
Photoacoustic tomography (PAT) enables non-invasive cross-sectional imaging of biological tissues, but it fails to map the spatial variation of speed-of-sound (SOS) within tissues. Because SOS is intimately linked to the density and elastic modulus of tissues, imaging the SOS distribution serves as a complementary modality to PAT. Moreover, an accurate SOS map can be leveraged to correct for PAT image degradation arising from acoustic heterogeneities. Herein, we propose a method for SOS imaging that uses scanned photoacoustic beacons excited by short laser pulses, combined with inversion-based reconstruction. Our method is based on photoacoustic reversal beacons (PRBs), which are small light-absorbing targets with strong photoacoustic contrast. We excite and scan a number of PRBs positioned at the periphery of the target, and the generated photoacoustic waves propagate through the target from various directions, thereby achieving spatial sampling of the internal SOS. By picking up the PRB signal with a graph-based dynamic programming algorithm, we formulate a linear inverse model for pixel-wise SOS reconstruction and solve it with an iterative optimization technique. We validate the feasibility of the proposed method through simulations, phantoms, and ex vivo biological tissue tests. Experimental results demonstrate that our approach can achieve accurate reconstruction of SOS distribution. Leveraging the obtained SOS map, we further demonstrate significantly enhanced PAT image reconstruction with acoustic correction.
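The pixel-wise linear inverse model can be sketched directly: each first-arrival time is a line integral of slowness (1/SOS) along a beacon-to-sensor ray, giving a sparse system t = A s solvable with damped LSQR. Straight rays and dense point sampling are simplifying assumptions here, not the paper's exact formulation.

```python
# Sketch of the linear inverse model for SOS imaging: t = A s, where A holds
# per-pixel path lengths and s is the slowness (1/SOS) image.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

def build_path_matrix(rays, grid_n, pixel_size, n_samples=200):
    """rays: list of (src, det) pairs, each an np.array([x, y]) in grid units."""
    A = lil_matrix((len(rays), grid_n * grid_n))
    for r, (src, det) in enumerate(rays):
        pts = src + np.linspace(0, 1, n_samples)[:, None] * (det - src)
        seg = np.linalg.norm(det - src) * pixel_size / n_samples
        for x, y in pts:
            i, j = int(y), int(x)
            if 0 <= i < grid_n and 0 <= j < grid_n:
                A[r, i * grid_n + j] += seg   # path length through this pixel
    return A.tocsr()

def reconstruct_sos(A, times, grid_n):
    slowness = lsqr(A, times, damp=1e-3)[0]   # Tikhonov-damped least squares
    return 1.0 / slowness.reshape(grid_n, grid_n).clip(min=1e-6)
```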
Collapse
Affiliation(s)
- Yang Wang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong 510515, China
| | - Danni Wang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong 510515, China
| | - Liting Zhong
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong 510515, China
| | - Yi Zhou
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong 510515, China
| | - Qing Wang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong 510515, China
| | - Wufan Chen
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong 510515, China
| | - Li Qi
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong 510515, China
| |
Collapse
|
326
|
Wang L, Meng YC, Qian Y. MSD-Net: Multi-scale dense convolutional neural network for photoacoustic image reconstruction with sparse data. PHOTOACOUSTICS 2025; 41:100679. [PMID: 39802237 PMCID: PMC11720879 DOI: 10.1016/j.pacs.2024.100679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/26/2024] [Revised: 11/20/2024] [Accepted: 12/10/2024] [Indexed: 01/16/2025]
Abstract
Photoacoustic imaging (PAI) is an emerging hybrid imaging technology that combines the advantages of optical and ultrasound imaging. Despite its excellent imaging capabilities, PAI still faces numerous challenges in clinical applications, particularly sparse spatial sampling and limited-view detection. These limitations often result in severe streak artifacts and blurring when standard methods are used to reconstruct images from incomplete data. In this work, we propose an improved convolutional neural network (CNN) architecture, called multi-scale dense UNet (MSD-Net), to correct artifacts in 2D photoacoustic tomography (PAT). MSD-Net exploits the advantages of multi-scale information fusion and dense connections to improve the performance of the CNN. Experimental validation with both simulated and in vivo datasets demonstrates that our method achieves better reconstructions with improved speed.
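The multi-scale fusion and dense-connection ingredients can be illustrated with a small PyTorch block that runs parallel dilated branches and fuses their concatenated outputs; this is an illustrative pattern, not the published MSD-Net architecture.

```python
# Illustrative multi-scale dense fusion block: parallel dilated branches
# capture several receptive-field scales, and a dense (concatenative)
# connection reuses the input features before a 1x1 fusion.
import torch
import torch.nn as nn

class MultiScaleDenseBlock(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)])
        self.fuse = nn.Conv2d(ch * 4, ch, 1)   # dense concat -> fuse

    def forward(self, x):
        feats = [x] + [torch.relu(b(x)) for b in self.branches]
        return x + self.fuse(torch.cat(feats, dim=1))   # residual output
```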
Collapse
Affiliation(s)
- Liangjie Wang
- Institute of Fiber Optics, Shanghai University, Shanghai 201800, China
- Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai University, Shanghai 200444, China
| | - Yi-Chao Meng
- Institute of Fiber Optics, Shanghai University, Shanghai 201800, China
- Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai University, Shanghai 200444, China
| | - Yiming Qian
- Institute of Fiber Optics, Shanghai University, Shanghai 201800, China
- Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai University, Shanghai 200444, China
| |
Collapse
|
327
|
Cui ZX, Cao C, Wang Y, Jia S, Cheng J, Liu X, Zheng H, Liang D, Zhu Y. SPIRiT-Diffusion: Self-Consistency Driven Diffusion Model for Accelerated MRI. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:1019-1031. [PMID: 39361455 DOI: 10.1109/tmi.2024.3473009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/05/2024]
Abstract
Diffusion models have emerged as a leading methodology for image generation and have proven successful in the realm of magnetic resonance imaging (MRI) reconstruction. However, existing reconstruction methods based on diffusion models are primarily formulated in the image domain, making the reconstruction quality susceptible to inaccuracies in coil sensitivity maps (CSMs). k-space interpolation methods can effectively address this issue, but conventional diffusion models are not readily applicable in k-space interpolation. To overcome this challenge, we introduce a novel approach called SPIRiT-Diffusion, which is a diffusion model for k-space interpolation inspired by the iterative self-consistent SPIRiT method. Specifically, we utilize the iterative solver of the self-consistent term (i.e., the k-space physical prior) in SPIRiT to formulate a novel stochastic differential equation (SDE) governing the diffusion process. Subsequently, k-space data can be interpolated by executing the diffusion process. This innovative approach highlights the optimization model's role in designing the SDE in diffusion models, enabling the diffusion process to align closely with the physics inherent in the optimization model, a concept referred to as model-driven diffusion. We evaluated the proposed SPIRiT-Diffusion method using a 3D joint intracranial and carotid vessel wall imaging dataset. The results convincingly demonstrate its superiority over image-domain reconstruction methods, achieving high reconstruction quality even at a substantial acceleration rate of 10. Our code is available at https://github.com/zhyjSIAT/SPIRiT-Diffusion.
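The general pattern of folding an optimization-model term into a reverse-time diffusion sampler can be sketched as a single Euler-Maruyama step with an added consistency gradient. The variance-exploding g(t) below follows the standard score-SDE convention; this is only the pattern, not the specific SPIRiT-Diffusion SDE.

```python
# Generic Euler-Maruyama step of a reverse-time diffusion sampler with an
# extra consistency-gradient term (`consistency_grad` is a placeholder for
# an optimization-model term such as a self-consistency residual gradient).
import torch

def reverse_step(x, t, dt, score_fn, consistency_grad, sigma=25.0, lam=1.0):
    """One reverse-time step (dt > 0 is the step size) for a variance-
    exploding SDE with g(t) = sigma**t * sqrt(2 * ln(sigma))."""
    g = sigma ** t * (2 * torch.log(torch.tensor(sigma))) ** 0.5
    update = (g ** 2) * (score_fn(x, t) + lam * consistency_grad(x)) * dt
    noise = g * (dt ** 0.5) * torch.randn_like(x)
    return x + update + noise
```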
Collapse
|
328
|
Qi X, Sun M, Wang Z, Liu J, Li Q, Zhao F, Zhang S, Shan C. Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network With Graph Representation Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:2182-2195. [PMID: 38113153 DOI: 10.1109/tnnls.2023.3341246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2023]
Abstract
Biphasic face photo-sketch synthesis has significant practical value in wide-ranging fields such as digital entertainment and law enforcement. Previous approaches directly generate the photo/sketch in a global view; they often suffer from the low quality of sketches and complex photograph variations, leading to unnatural and low-fidelity results. In this article, we propose a novel semantic-driven generative adversarial network to address the above issues, cooperating with graph representation learning. Considering that human faces have distinct spatial structures, we first inject class-wise semantic layouts into the generator to provide style-based spatial information for synthesized face photographs and sketches. In addition, to enhance the authenticity of details in generated faces, we construct two types of representational graphs via semantic parsing maps upon input faces, dubbed the intraclass semantic graph (IASG) and the interclass structure graph (IRSG). Specifically, the IASG effectively models the intraclass semantic correlations of each facial semantic component, thus producing realistic facial details. To make the generated faces more structurally coordinated, the IRSG models interclass structural relations among facial components by graph representation learning. To further enhance the perceptual quality of synthesized images, we present a biphasic interactive cycle training strategy that fully takes advantage of the multilevel feature consistency between the photograph and sketch. Extensive experiments demonstrate that our method outperforms the state-of-the-art competitors on the CUHK Face Sketch (CUFS) and CUHK Face Sketch FERET (CUFSF) datasets.
Collapse
|
329
|
Zhang T, Pang H, Wu Y, Xu J, Liu L, Li S, Xia S, Chen R, Liang Z, Qi S. BreathVisionNet: A pulmonary-function-guided CNN-transformer hybrid model for expiratory CT image synthesis. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2025; 259:108516. [PMID: 39571504 DOI: 10.1016/j.cmpb.2024.108516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/31/2024] [Revised: 10/15/2024] [Accepted: 11/13/2024] [Indexed: 12/11/2024]
Abstract
BACKGROUND AND OBJECTIVE Chronic obstructive pulmonary disease (COPD) is highly heterogeneous in its etiologies and clinical manifestations. Expiratory computed tomography (CT) can effectively assess air trapping, aiding in disease diagnosis. However, due to concerns about radiation exposure and cost, expiratory CT is not routinely performed. Recent work on synthesizing expiratory CT has primarily focused on imaging features while neglecting patient-specific pulmonary function. METHODS To address these issues, we developed a novel model named BreathVisionNet that incorporates pulmonary function data to guide the synthesis of expiratory CT from inspiratory CT. An architecture combining a convolutional neural network and transformer is introduced to leverage the irregular phenotypic distribution in COPD patients. By incorporating global information into the encoder, the model can better capture long-range and global contexts. The utilization of edge information and multi-view data further enhances the quality of the synthesized CT. Parametric response mapping (PRM) can be estimated from the synthesized expiratory CT and the inspiratory CT to quantify the COPD phenotypes of normal tissue, emphysema, and functional small airway disease (fSAD), including their percentages, spatial distributions, and voxel distribution maps. RESULTS BreathVisionNet outperforms other generative models in terms of synthesized image quality. It achieves a mean absolute error, normalized mean square error, structural similarity index and peak signal-to-noise ratio of 78.207 HU, 0.643, 0.847 and 25.828 dB, respectively. Comparing the predicted and real PRM, the Dice coefficient reaches 0.732 (emphysema) and 0.560 (fSAD). The mean difference between the true and predicted fSAD percentage is 4.42 for the development dataset (low radiation dose CT scans) and 9.05 for an independent external validation dataset (routine dose), indicating that the model generalizes well. A classifier trained on voxel distribution maps can achieve an accuracy of 0.891 in predicting the presence of COPD. CONCLUSIONS BreathVisionNet can accurately synthesize expiratory CT images from inspiratory CT and predict their voxel distribution. The estimated PRM can help to quantify the COPD phenotypes of normal tissue, emphysema, and fSAD. This capability provides additional insights into COPD diversity when only inspiratory CT images are available.
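PRM estimation from co-registered inspiratory and (here, synthesized) expiratory CT reduces to simple HU thresholding. The sketch below uses the commonly cited -950 HU inspiratory and -856 HU expiratory cut-offs, which should be checked against the paper before reuse.

```python
# Sketch of parametric response mapping (PRM) from co-registered inspiratory
# and expiratory CT, using the commonly cited -950 HU / -856 HU thresholds.
import numpy as np

def prm_labels(insp_hu, exp_hu, lung_mask):
    """Return a label map: 0 = normal, 1 = fSAD, 2 = emphysema.
    lung_mask is a boolean array; voxels outside the lungs are set to 0."""
    labels = np.zeros_like(insp_hu, dtype=np.uint8)
    emph = (insp_hu < -950) & (exp_hu < -856)
    fsad = (insp_hu >= -950) & (exp_hu < -856)
    labels[fsad] = 1
    labels[emph] = 2
    labels[~lung_mask] = 0
    return labels

def prm_percentages(labels, lung_mask):
    n = lung_mask.sum()
    return {"normal%": 100 * ((labels == 0) & lung_mask).sum() / n,
            "fSAD%": 100 * (labels == 1).sum() / n,
            "emphysema%": 100 * (labels == 2).sum() / n}
```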
Collapse
Affiliation(s)
- Tiande Zhang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China
| | - Haowen Pang
- School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing, China
| | - Yanan Wu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
| | - Jiaxuan Xu
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The National Center for Respiratory Medicine, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
| | - Lingkai Liu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China
| | - Shang Li
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China
| | - Shuyue Xia
- Department of Respiratory and Critical Care Medicine, Central Hospital Affiliated to Shenyang Medical College, Shenyang, China
| | - Rongchang Chen
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The National Center for Respiratory Medicine, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China; Hetao Institute of Guangzhou National Laboratory, Guangzhou, China
| | - Zhenyu Liang
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The National Center for Respiratory Medicine, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.
| | - Shouliang Qi
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China; Department of Respiratory and Critical Care Medicine, Central Hospital Affiliated to Shenyang Medical College, Shenyang, China.
| |
Collapse
|
330
|
Liu Z, Fang Y, Li C, Wu H, Liu Y, Shen D, Cui Z. Geometry-Aware Attenuation Learning for Sparse-View CBCT Reconstruction. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:1083-1097. [PMID: 39365719 DOI: 10.1109/tmi.2024.3473970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/06/2024]
Abstract
Cone Beam Computed Tomography (CBCT) plays a vital role in clinical imaging. Traditional methods typically require hundreds of 2D X-ray projections to reconstruct a high-quality 3D CBCT image, leading to considerable radiation exposure. This has led to growing interest in sparse-view CBCT reconstruction to reduce radiation doses. While recent advances, including deep learning and neural rendering algorithms, have made strides in this area, these methods either produce unsatisfactory results or suffer from the time inefficiency of individual optimization. In this paper, we introduce a novel geometry-aware encoder-decoder framework to solve this problem. Our framework starts by encoding multi-view 2D features from various 2D X-ray projections with a 2D CNN encoder. Leveraging the geometry of CBCT scanning, it then back-projects the multi-view 2D features into 3D space to form a comprehensive volumetric feature map, followed by a 3D CNN decoder to recover the 3D CBCT image. Importantly, our approach respects the geometric relationship between the 3D CBCT image and its 2D X-ray projections during the feature back-projection stage, and benefits from prior knowledge learned from the data population. This ensures its adaptability in dealing with extremely sparse view inputs without individual training, such as scenarios with only 5 or 10 X-ray projections. Extensive evaluations on two simulated datasets and one real-world dataset demonstrate the exceptional reconstruction quality and time efficiency of our method.
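Feature back-projection can be illustrated in a simplified parallel-beam setting: each voxel projects into every view and bilinearly samples that view's 2D feature map, and per-view features are averaged into a volumetric feature map. The paper's true cone-beam geometry is more involved; this shows only the pattern.

```python
# Simplified, parallel-beam sketch of geometry-aware feature back-projection.
import torch
import torch.nn.functional as F

def backproject_features(feats, angles, vol_n):
    """feats: (V, C, H, W) per-view 2D features; angles: (V,) tensor of
    view angles in radians. Returns (C, vol_n, vol_n, vol_n) features."""
    V, C = feats.shape[:2]
    lin = torch.linspace(-1, 1, vol_n)
    z, y, x = torch.meshgrid(lin, lin, lin, indexing="ij")
    vol = torch.zeros(C, vol_n, vol_n, vol_n)
    for v in range(V):
        # Parallel-beam detector coordinate of each voxel for this view.
        u = x * torch.cos(angles[v]) + y * torch.sin(angles[v])
        grid = torch.stack([u, z], dim=-1).reshape(1, -1, 1, 2)
        sampled = F.grid_sample(feats[v:v+1], grid, align_corners=True)
        vol += sampled.reshape(C, vol_n, vol_n, vol_n)
    return vol / V   # average features across views
```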
Collapse
|
331
|
Wang H, Lou R, Wang Y, Hao L, Wang Q, Li R, Su J, Liu S, Zhou X, Gao X, Hao Q, Chen Z, Xu Y, Wu C, Zheng Y, Guo Q, Bai L. Parallel gut-to-brain pathways orchestrate feeding behaviors. Nat Neurosci 2025; 28:320-335. [PMID: 39627537 DOI: 10.1038/s41593-024-01828-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2024] [Accepted: 10/29/2024] [Indexed: 02/08/2025]
Abstract
The caudal nucleus of the solitary tract (cNTS) in the brainstem serves as a hub for integrating interoceptive cues from diverse sensory pathways. However, the mechanisms by which cNTS neurons transform these signals into behaviors remain debated. We analyzed 18 cNTS-Cre mouse lines and cataloged the dynamics of nine cNTS cell types during feeding. We show that Th+ cNTS neurons encode esophageal mechanical distension and transient gulp size via vagal afferent inputs, providing quick feedback regulation of ingestion speed. By contrast, Gcg+ cNTS neurons monitor intestinal nutrients and cumulative ingested calories and have long-term effects on food satiation and preference. These nutritive signals are conveyed through a portal vein-spinal ascending pathway rather than vagal sensory neurons. Our findings underscore distinctions among cNTS subtypes marked by differences in temporal dynamics, sensory modalities, associated visceral organs and ascending sensory pathways, all of which contribute to specific functions in coordinated feeding regulation.
Collapse
Affiliation(s)
- Hongyun Wang
- Chinese Institute for Brain Research, Beijing, China
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Runxiang Lou
- Chinese Institute for Brain Research, Beijing, China
| | - Yunfeng Wang
- Chinese Institute for Brain Research, Beijing, China
| | - Liufang Hao
- Chinese Institute for Brain Research, Beijing, China
| | - Qiushi Wang
- Chinese Institute for Brain Research, Beijing, China
| | - Rui Li
- Chinese Institute for Brain Research, Beijing, China
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
| | - Jiayi Su
- Chinese Institute for Brain Research, Beijing, China
| | - Shuhan Liu
- Chinese Institute for Brain Research, Beijing, China
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
| | - Xiangyu Zhou
- Chinese Institute for Brain Research, Beijing, China
| | - Xinwei Gao
- Chinese Institute for Brain Research, Beijing, China
| | - Qianxi Hao
- Chinese Institute for Brain Research, Beijing, China
| | - Zihe Chen
- Chinese Institute for Brain Research, Beijing, China
| | - Yibo Xu
- Chinese Institute for Brain Research, Beijing, China
| | - Chongwei Wu
- Chinese Institute for Brain Research, Beijing, China
| | - Yang Zheng
- Chinese Institute for Brain Research, Beijing, China
| | - Qingchun Guo
- Chinese Institute for Brain Research, Beijing, China
- School of Biomedical Engineering, Capital Medical University, Beijing, China
| | - Ling Bai
- Chinese Institute for Brain Research, Beijing, China.
| |
Collapse
|
332
|
Catalán T, Courdurier M, Osses A, Fotaki A, Botnar R, Sahli-Costabal F, Prieto C. Unsupervised reconstruction of accelerated cardiac cine MRI using neural fields. Comput Biol Med 2025; 185:109467. [PMID: 39672009 DOI: 10.1016/j.compbiomed.2024.109467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2023] [Revised: 11/11/2024] [Accepted: 11/20/2024] [Indexed: 12/15/2024]
Abstract
BACKGROUND Cardiac cine MRI is the gold standard for cardiac functional assessment, but the inherently slow acquisition process creates the need for reconstruction approaches for accelerated undersampled acquisitions. Several regularization approaches that exploit spatial-temporal redundancy have been proposed to reconstruct undersampled cardiac cine MRI. More recently, methods based on supervised deep learning have also been proposed to further accelerate acquisition and reconstruction. However, these techniques usually rely on large training datasets, which are not always available and might introduce biases. METHODS In this work we propose NF-cMRI, an unsupervised approach based on implicit neural field representations for cardiac cine MRI. We evaluate our method on in-vivo undersampled golden-angle radial multi-coil acquisitions for undersampling factors of 13x, 17x and 26x. RESULTS The proposed method achieves excellent scores in sharpness and robustness to artifacts and spatial-temporal depiction comparable to or better than state-of-the-art conventional and unsupervised deep learning reconstruction techniques. CONCLUSIONS We have demonstrated NF-cMRI's potential for cardiac cine MRI reconstruction with highly undersampled data.
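The implicit neural field itself can be sketched as a coordinate MLP with Fourier features mapping (x, y, t) to a complex-valued image intensity; fitting it through a multi-coil NUFFT forward model, as NF-cMRI does, is omitted here, and the layer sizes are assumptions.

```python
# Minimal neural-field sketch: an MLP with Fourier features maps (x, y, t)
# coordinates to a (real, imaginary) intensity pair.
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    def __init__(self, in_dim=3, n_feats=64, scale=10.0):
        super().__init__()
        self.register_buffer("B", torch.randn(in_dim, n_feats) * scale)

    def forward(self, coords):                 # coords: (N, 3) in [-1, 1]
        proj = 2 * torch.pi * coords @ self.B
        return torch.cat([proj.sin(), proj.cos()], dim=-1)

class NeuralField(nn.Module):
    def __init__(self, n_feats=64, hidden=256):
        super().__init__()
        self.ff = FourierFeatures(n_feats=n_feats)
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_feats, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))              # real + imaginary intensity

    def forward(self, coords):
        return self.mlp(self.ff(coords))
```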
Collapse
Affiliation(s)
- Tabita Catalán
- Millennium Nucleus for Applied Control and Inverse Problems, Santiago, Chile; Millennium Institute for Intelligent Healthcare Engineering, Santiago, Chile
| | - Matías Courdurier
- Department of Mathematics, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Axel Osses
- Center for Mathematical Modeling and Department of Mathematical Engineering, Universidad de Chile, Santiago, Chile
| | - Anastasia Fotaki
- Department of Biomedical Engineering, School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom
| | - René Botnar
- Millennium Institute for Intelligent Healthcare Engineering, Santiago, Chile; Department of Biomedical Engineering, School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom; Institute for Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile; School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Francisco Sahli-Costabal
- Millennium Institute for Intelligent Healthcare Engineering, Santiago, Chile; Institute for Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile; School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile.
| | - Claudia Prieto
- Millennium Nucleus for Applied Control and Inverse Problems, Santiago, Chile; Millennium Institute for Intelligent Healthcare Engineering, Santiago, Chile; Department of Biomedical Engineering, School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom; School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
| |
Collapse
|
333
|
Aetesam H, Maji SK, Prasath VBS. Hyperspectral image restoration using noise gradient and dual priors under mixed noise conditions. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2025; 10:72-93. [DOI: 10.1049/cit2.12355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2023] [Accepted: 03/27/2024] [Indexed: 12/06/2024] Open
Abstract
Images obtained from hyperspectral sensors provide information about the target area that extends beyond the visible portions of the electromagnetic spectrum. However, due to sensor limitations and imperfections during the image acquisition and transmission phases, noise is introduced into the acquired image, which can have a negative impact on downstream analyses such as classification, target tracking, and spectral unmixing. Noise in hyperspectral images (HSI) is modelled as a combination of several sources, including Gaussian/impulse noise, stripes, and deadlines. An HSI restoration method for such a mixed noise model is proposed. First, a joint optimisation framework is proposed for recovering hyperspectral data corrupted by mixed Gaussian-impulse noise by estimating both the clean data as well as the sparse/impulse noise levels. Second, a hyper-Laplacian prior is used along both the spatial and spectral dimensions to express sparsity in clean image gradients. Third, to model the sparse nature of impulse noise, an ℓ1-norm over the impulse noise gradient is used. Because the proposed methodology employs two distinct priors, the authors refer to it as the hyperspectral dual prior (HySpDualP) denoiser. To the best of the authors' knowledge, this joint optimisation framework is the first attempt in this direction. To handle the non-smooth and non-convex nature of the general ℓp-norm-based regularisation term, a generalised shrinkage/thresholding (GST) solver is employed. Finally, an efficient split-Bregman approach is used to solve the resulting optimisation problem. Experimental results on synthetic data and real HSI datacubes obtained from hyperspectral sensors demonstrate that the proposed model outperforms state-of-the-art methods, both visually and in terms of various image quality assessment metrics.
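For the p = 1 case, the GST solver reduces to plain soft thresholding, applied inside each split-Bregman iteration; a minimal sketch follows, with the hyper-Laplacian (0 < p < 1) case noted in comments.

```python
# The p = 1 instance of generalized shrinkage/thresholding (GST) is plain
# soft thresholding, used inside each split-Bregman iteration to update the
# sparse-gradient auxiliary variables.
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1: shrink magnitudes by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# Inside split-Bregman, the gradient auxiliary variable d is updated as
#   d = soft_threshold(grad(u) + b, 1.0 / mu)
# followed by a quadratic solve for u and the Bregman update
#   b = b + grad(u) - d
# For 0 < p < 1 (the hyper-Laplacian prior), GST replaces this prox with a
# p-dependent shrinkage obtained by solving a scalar nonlinear equation.
```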
Collapse
Affiliation(s)
- Hazique Aetesam
- Computer Science and Engineering, Birla Institute of Technology Mesra, Bihar, India
| | - Suman Kumar Maji
- Computer Science and Engineering, Indian Institute of Technology Patna, Bihar, India
| | | |
Collapse
|
334
|
Yoon B, Hong S, Lee D. Directional Correspondence Based Cross-Source Point Cloud Registration for USV-AAV Cooperation in Lentic Environments. IEEE Robot Autom Lett 2025; 10:1601-1608. [DOI: 10.1109/lra.2024.3523232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/10/2025]
Affiliation(s)
- Byoungkwon Yoon
- Department of Mechanical Engineering, IAMD and IOER, Seoul National University, Seoul, Republic of Korea
| | - Seokhyun Hong
- Department of Mechanical Engineering, IAMD and IOER, Seoul National University, Seoul, Republic of Korea
| | - Dongjun Lee
- Department of Mechanical Engineering, IAMD and IOER, Seoul National University, Seoul, Republic of Korea
| |
Collapse
|
335
|
Vanherle B, Pippi V, Cascianelli S, Michiels N, Van Reeth F, Cucchiara R. VATr++: Choose Your Words Wisely for Handwritten Text Generation. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:934-948. [PMID: 39405139 DOI: 10.1109/tpami.2024.3481154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect: the impact of the input, both visual and textual, on HTG model training and its subsequent influence on performance. This work extends the VATr (Pippi et al. 2023) Styled-HTG approach by addressing the pre-processing and training issues that it faces, which are common to many HTG models. In particular, we propose generally applicable strategies for input preparation and training regularization that allow the model to achieve better performance and generalization capabilities. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research: the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.
Collapse
|
336
|
Zhang Z, Zhao S, Jin X, Xu M, Yang Y, Yan S, Wang M. Noise Self-Regression: A New Learning Paradigm to Enhance Low-Light Images Without Task-Related Data. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:1073-1088. [PMID: 39466857 DOI: 10.1109/tpami.2024.3487361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/30/2024]
Abstract
Deep learning-based low-light image enhancement (LLIE) is the task of leveraging deep neural networks to enhance image illumination while keeping the image content unchanged. From the perspective of training data, existing methods complete the LLIE task driven by one of the following three data types: paired data, unpaired data and zero-reference data. Each type of these data-driven methods has its own advantages; e.g., zero-reference data-based methods have very low requirements on training data and can meet human needs in many scenarios. In this paper, we leverage pure Gaussian noise to complete the LLIE task, which further reduces the requirements for training data in LLIE tasks and can serve as another alternative in practical use. Specifically, we propose Noise SElf-Regression (NoiSER), which, without access to any task-related data, simply learns a convolutional neural network equipped with an instance-normalization layer by taking a random noise image, sampled independently for each pixel, as both input and output for each training pair; the low-light image is then fed to the trained network to predict the normal-light image. Technically, an intuitive explanation for its effectiveness is as follows: 1) the self-regression reconstructs the contrast between adjacent pixels of the input image, 2) the instance-normalization layer may naturally remediate the overall magnitude/lighting of the input image, and 3) the per-pixel assumption enforces the output image to follow the well-known gray-world hypothesis (Buchsbaum, 1980) when the image size is big enough. Compared to current state-of-the-art LLIE methods with access to different task-related data, NoiSER is highly competitive in enhancement quality, yet with a much smaller model size, and much lower training and inference cost. In addition, the experiments also demonstrate that NoiSER has great potential in overexposure suppression and joint processing with other restoration tasks.
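The recipe in the abstract translates almost directly into code: train a small CNN with instance normalization to regress pure Gaussian noise onto itself, then run a low-light image through the trained network. The layer sizes and optimizer settings below are assumptions.

```python
# Sketch of the noise self-regression recipe: input == target == noise.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.InstanceNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.InstanceNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):                  # self-regression on pure noise
    noise = torch.randn(1, 3, 128, 128)   # fresh Gaussian noise image
    loss = nn.functional.mse_loss(net(noise), noise)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inference: given a (1, 3, H, W) low-light tensor in [0, 1]:
# enhanced = net(low_light).clamp(0, 1)
```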
Collapse
|
337
|
Atalık A, Chopra S, Sodickson DK. Accelerating multi-coil MR image reconstruction using weak supervision. MAGMA (NEW YORK, N.Y.) 2025; 38:37-51. [PMID: 39382814 PMCID: PMC12022963 DOI: 10.1007/s10334-024-01206-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/08/2024] [Revised: 09/05/2024] [Accepted: 09/17/2024] [Indexed: 10/10/2024]
Abstract
Deep-learning-based MR image reconstruction in settings where large fully sampled dataset collection is infeasible requires methods that effectively use both under-sampled and fully sampled datasets. This paper evaluates a weakly supervised, multi-coil, physics-guided approach to MR image reconstruction, leveraging both dataset types, to improve both the quality and robustness of reconstruction. A physics-guided end-to-end variational network (VarNet) is pretrained in a self-supervised manner using a 4× under-sampled dataset following the self-supervised learning via data undersampling (SSDU) methodology. The pre-trained weights are transferred to another VarNet, which is fine-tuned using a smaller, fully sampled dataset by optimizing multi-scale structural similarity (MS-SSIM) loss in image space. The proposed methodology is compared with fully self-supervised and fully supervised training. Reconstruction quality improvements in SSIM, PSNR, and NRMSE when abundant training data is available (the high-data regime), and enhanced robustness when training data is scarce (the low-data regime) are demonstrated using weak supervision for knee and brain MR image reconstructions at 8× and 10× acceleration, respectively. Multi-coil physics-guided MR image reconstruction using both under-sampled and fully sampled datasets is achievable with transfer learning and fine-tuning. This methodology can provide improved reconstruction quality in the high-data regime and improved robustness in the low-data regime at high acceleration rates.
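The SSDU-style pretraining hinges on partitioning the acquired k-space locations into two disjoint sets, one fed to the network's data-consistency blocks and one held out for the loss; a minimal sketch follows, with the split ratio as an assumption.

```python
# Sketch of the SSDU-style k-space split used in self-supervised pretraining.
import numpy as np

def ssdu_split(sampling_mask, loss_fraction=0.4, rng=None):
    """sampling_mask: boolean (ky, kx) array of acquired locations.
    Returns (train_mask, loss_mask), two disjoint boolean masks."""
    rng = rng or np.random.default_rng()
    acquired = np.flatnonzero(sampling_mask)
    n_loss = int(loss_fraction * acquired.size)
    loss_idx = rng.choice(acquired, size=n_loss, replace=False)
    loss_mask = np.zeros(sampling_mask.size, dtype=bool)
    loss_mask[loss_idx] = True
    loss_mask = loss_mask.reshape(sampling_mask.shape)
    train_mask = sampling_mask & ~loss_mask   # fed to the unrolled network
    return train_mask, loss_mask              # loss computed on loss_mask
```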
Collapse
Affiliation(s)
- Arda Atalık
- Center for Data Science, New York University, 60 Fifth Ave, New York, NY, 10011, USA.
- Bernard and Irene Schwartz Center for Biomedical Imaging, Department of Radiology, New York University Grossman School of Medicine, New York, NY, 10016, USA.
- Department of Radiology, Center for Advanced Imaging Innovation and Research (CAI2R), New York University Grossman School of Medicine, New York, NY, 10016, USA.
| | - Sumit Chopra
- Courant Institute of Mathematical Sciences, New York University, 60 Fifth Ave, New York, NY, 10011, USA
- Bernard and Irene Schwartz Center for Biomedical Imaging, Department of Radiology, New York University Grossman School of Medicine, New York, NY, 10016, USA
- Department of Radiology, Center for Advanced Imaging Innovation and Research (CAI2R), New York University Grossman School of Medicine, New York, NY, 10016, USA
| | - Daniel K Sodickson
- Center for Data Science, New York University, 60 Fifth Ave, New York, NY, 10011, USA
- Bernard and Irene Schwartz Center for Biomedical Imaging, Department of Radiology, New York University Grossman School of Medicine, New York, NY, 10016, USA
- Department of Radiology, Center for Advanced Imaging Innovation and Research (CAI2R), New York University Grossman School of Medicine, New York, NY, 10016, USA
| |
Collapse
|
338
|
Xu G, Zhao Z, Zhu Q, Zhu K, Zhang J, Wu D. Myelin water imaging of in vivo and ex vivo human brains using multi-echo gradient echo at 3 T and 7 T. Magn Reson Med 2025; 93:803-813. [PMID: 39370873 DOI: 10.1002/mrm.30310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2024] [Revised: 08/08/2024] [Accepted: 09/03/2024] [Indexed: 10/08/2024]
Abstract
PURPOSE To compare myelin water fraction (MWF) measurements between 3 T and 7 T and between in vivo and ex vivo human brains, and to investigate the relationship between multi-echo gradient-echo (mGRE)-based 3D MWF and myelin content using histological staining, which has not been validated in the human brain. METHODS In this study, we performed 3D mGRE-based MWF measurements on five ex vivo human brain hemispheres and five healthy volunteers at 3 T and 7 T with 1 mm isotropic resolution. The data were fitted with a T2*-based three-compartment complex-valued model to estimate MWF. We obtained myelin basic protein (MBP) staining from two tissue blocks and co-registered the MWF map and histology image for voxel-wise correlation between the two. RESULTS The MWF values measured at 7 T were overall higher than those at 3 T, but the data from the two field strengths demonstrated high correlations both in vivo (r = 0.88) and ex vivo (r = 0.83) across 19 white matter regions. Moreover, the MWF measurements showed good agreement between in vivo and ex vivo assessments at 3 T (r = 0.61) and 7 T (r = 0.54). Based on MBP staining, the MWF values exhibited strong positive correlations with myelin content at both 3 T (r = 0.68 and r = 0.78 for the two tissue blocks) and 7 T (r = 0.64 and r = 0.82 for the two tissue blocks). CONCLUSION The findings demonstrate that mGRE-based MWF mapping can be used to quantify myelin content in the human brain, despite the field-strength dependency of the measurements.
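A simplified, magnitude-only version of the three-compartment fit can be written with a standard nonlinear least-squares routine; the paper's model is complex-valued (with per-pool frequency offsets), and the initial values and bounds below are literature-informed assumptions.

```python
# Simplified magnitude-only three-pool T2* fit for MWF estimation.
import numpy as np
from scipy.optimize import curve_fit

def three_pool(te, a_my, a_ax, a_ex, t2_my, t2_ax, t2_ex):
    return (a_my * np.exp(-te / t2_my) +   # myelin water (short T2*)
            a_ax * np.exp(-te / t2_ax) +   # axonal water
            a_ex * np.exp(-te / t2_ex))    # extracellular water

def fit_mwf(te_ms, signal):
    p0 = [0.1, 0.6, 0.3, 10.0, 64.0, 48.0]   # amplitudes, T2* values in ms
    bounds = ([0] * 6, [np.inf] * 3 + [25.0, 150.0, 150.0])
    popt, _ = curve_fit(three_pool, te_ms, signal, p0=p0, bounds=bounds,
                        maxfev=5000)
    a_my, a_ax, a_ex = popt[:3]
    return a_my / (a_my + a_ax + a_ex)       # myelin water fraction
```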
Collapse
Affiliation(s)
- Guojun Xu
- Key Laboratory for Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
| | - Zhiyong Zhao
- Children's Hospital, National Clinical Research Center for Child Health, Zhejiang University School of Medicine, Hangzhou, China
| | - Qinfeng Zhu
- Key Laboratory for Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
| | - Keqing Zhu
- China Brain Bank and Department of Neurology in Second Affiliated Hospital, Key Laboratory of Medical Neurobiology of Zhejiang Province, and Department of Neurobiology, Zhejiang University School of Medicine, Hangzhou, China
- Department of Pathology, The First Affiliated Hospital and School of Medicine, Zhejiang University, Hangzhou, China
| | - Jing Zhang
- China Brain Bank and Department of Neurology in Second Affiliated Hospital, Key Laboratory of Medical Neurobiology of Zhejiang Province, and Department of Neurobiology, Zhejiang University School of Medicine, Hangzhou, China
- Department of Pathology, The First Affiliated Hospital and School of Medicine, Zhejiang University, Hangzhou, China
| | - Dan Wu
- Key Laboratory for Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
| |
Collapse
|
339
|
Dyken L, Usher W, Kumar S. Interactive Isosurface Visualization in Memory Constrained Environments Using Deep Learning and Speculative Raycasting. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:1582-1597. [PMID: 38941206 DOI: 10.1109/tvcg.2024.3420225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/30/2024]
Abstract
New web technologies have enabled the deployment of powerful GPU-based computational pipelines that run entirely in the web browser, opening a new frontier for accessible scientific visualization applications. However, these new capabilities do not address the memory constraints of lightweight end-user devices encountered when attempting to visualize the massive data sets produced by today's simulations and data acquisition systems. We propose a novel implicit isosurface rendering algorithm for interactive visualization of massive volumes within a small memory footprint. We achieve this by progressively traversing a wavefront of rays through the volume and decompressing blocks of the data on demand to perform implicit ray-isosurface intersections, displaying intermediate results after each pass. We improve the quality of these intermediate results using a pretrained deep neural network that reconstructs the output of early passes, allowing for interactivity with better approximations of the final image. To accelerate rendering and increase GPU utilization, we introduce speculative ray-block intersection into our algorithm, where additional blocks are traversed and intersected speculatively along rays to exploit additional parallelism in the workload. Our algorithm is able to trade off image quality to greatly decrease rendering time for interactive rendering even on lightweight devices. Our entire pipeline is run in parallel on the GPU to leverage the parallel computing power that is available even on lightweight end-user devices. We compare our algorithm to the state of the art in low-overhead isosurface extraction and demonstrate that it achieves substantial reductions in memory overhead and in the amount of data decompressed.
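The core implicit intersection test can be sketched as marching samples along a ray and detecting a sign change of (value - isovalue), refined by linear interpolation; the on-demand block decompression, multi-pass wavefront, and speculative traversal from the paper are omitted.

```python
# Sketch of implicit ray-isosurface intersection by sign-change detection.
import numpy as np

def ray_isosurface_hit(volume_fn, origin, direction, isovalue,
                       t_max=1.0, n_steps=256):
    """volume_fn(p) returns the scalar field at a 3D point p (np.array)."""
    ts = np.linspace(0.0, t_max, n_steps)
    prev_t = ts[0]
    prev_v = volume_fn(origin + ts[0] * direction) - isovalue
    for t in ts[1:]:
        v = volume_fn(origin + t * direction) - isovalue
        if prev_v * v < 0:                    # sign change: surface crossed
            alpha = prev_v / (prev_v - v)     # linear root estimate
            return origin + (prev_t + alpha * (t - prev_t)) * direction
        prev_t, prev_v = t, v
    return None                               # ray missed the isosurface
```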
Collapse
|
340
|
Lomoio U, Veltri P, Guzzi PH, Liò P. Design and use of a Denoising Convolutional Autoencoder for reconstructing electrocardiogram signals at super resolution. Artif Intell Med 2025; 160:103058. [PMID: 39742614 DOI: 10.1016/j.artmed.2024.103058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2024] [Revised: 12/06/2024] [Accepted: 12/15/2024] [Indexed: 01/03/2025]
Abstract
Electrocardiogram signals play a pivotal role in cardiovascular diagnostics, providing essential information on the heart's electrical activity. However, inherent noise and limited resolution can hinder an accurate interpretation of the recordings. This paper proposes an advanced Denoising Convolutional Autoencoder that processes electrocardiogram signals and generates super-resolution reconstructions, followed by an in-depth analysis of the enhanced signals. The autoencoder receives a signal window (of 5 s) sampled at 50 Hz (low resolution) as input and reconstructs a denoised super-resolution signal at 500 Hz. The proposed autoencoder is applied to publicly available datasets, demonstrating strong performance in reconstructing high-resolution signals from very low-resolution inputs sampled at 50 Hz. The results were then compared with the current state-of-the-art for electrocardiogram super-resolution, demonstrating the effectiveness of the proposed method. The method achieves a signal-to-noise ratio of 12.20 dB, a mean squared error of 0.0044, and a root mean squared error of 4.86%, significantly outperforming current state-of-the-art alternatives. This framework can effectively enhance hidden information within signals, aiding in the detection of heart-related diseases.
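As a rough illustration of such an architecture (layer sizes and kernel choices are assumptions; the paper's exact model may differ), a 1-D convolutional autoencoder that maps a 5 s, 50 Hz window (250 samples) to a 500 Hz reconstruction (2500 samples) could look like:
```python
import torch
import torch.nn as nn

class ECGSuperResAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(              # denoise + extract features
            nn.Conv1d(1, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),
        )
        self.decoder = nn.Sequential(              # upsample 250 -> 2500 (x10)
            nn.ConvTranspose1d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(32, 16, kernel_size=5, stride=5), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=9, padding=4),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ECGSuperResAE()
low_res = torch.randn(8, 1, 250)                   # batch of noisy 50 Hz windows
high_res = model(low_res)                          # -> (8, 1, 2500), i.e. 500 Hz
loss = nn.MSELoss()(high_res, torch.randn(8, 1, 2500))  # vs clean 500 Hz target
```
The two transposed convolutions multiply the temporal length by 2 and then 5, giving the overall tenfold upsampling from 50 Hz to 500 Hz.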
Collapse
Affiliation(s)
- Ugo Lomoio
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, Italy.
| | | | - Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, Italy.
| | - Pietro Liò
- Department of Computer Science and Technology, Cambridge University, Cambridge, United Kingdom.
| |
Collapse
|
341
|
Lu F, Zhou D, Chen H, Liu S, Ling X, Zhu L, Gong T, Sheng B, Liao X, Jin H, Li P, Feng DD. S2P-Matching: Self-Supervised Patch-Based Matching Using Transformer for Capsule Endoscopic Images Stitching. IEEE Trans Biomed Eng 2025; 72:540-551. [PMID: 39302789 DOI: 10.1109/tbme.2024.3462502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/22/2024]
Abstract
The Magnetically Controlled Capsule Endoscopy (MCCE) has a limited shooting range, resulting in capturing numerous fragmented images and an inability to precisely locate and examine the region of interest (ROI) as traditional endoscopy can. Addressing this issue, image stitching around the ROI can be employed to aid in the diagnosis of gastrointestinal (GI) tract conditions. However, MCCE images possess unique characteristics, such as weak texture, close-up shooting, and large-angle rotation, presenting challenges to current image-matching methods. In this context, a method named S2P-Matching is proposed for self-supervised patch-based matching in MCCE image stitching. The method involves augmenting the raw data by simulating the capsule endoscopic camera's behavior around the GI tract's ROI. Subsequently, an improved contrastive learning encoder is utilized to extract local features, represented as deep feature descriptors. This encoder comprises two branches that extract features at distinct scales, which are combined along the channel dimension without manual labeling. The data-driven descriptors are then input into a Transformer model to obtain patch-level matches by learning the globally consistent matching priors in the pseudo-ground-truth match pairs. Finally, the patch-level matching is refined and filtered to the pixel level. The experimental results on real-world MCCE images demonstrate that S2P-Matching provides enhanced accuracy in addressing challenging issues in the GI tract environment with image parallax. The performance improvement can reach up to 203 and 55.8% in terms of NCM (Number of Correct Matches) and SR (Success Rate), respectively. This approach is expected to facilitate the wide adoption of MCCE-based gastrointestinal screening.
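The paper's Transformer matcher is data-driven, but the descriptor-matching step it improves on can be illustrated with a standard baseline: mutual-nearest-neighbour matching of L2-normalised patch descriptors. The sketch below is that baseline only, with random vectors standing in for the learned descriptors.
```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """desc_a: (Na, d), desc_b: (Nb, d) unit-norm patch descriptors."""
    sim = desc_a @ desc_b.T                  # cosine similarity matrix
    ab = sim.argmax(axis=1)                  # best b for each a
    ba = sim.argmax(axis=0)                  # best a for each b
    # keep a pair only when the choice is mutual in both directions
    return [(i, ab[i]) for i in range(len(desc_a)) if ba[ab[i]] == i]

rng = np.random.default_rng(0)
d = rng.normal(size=(100, 128))
d /= np.linalg.norm(d, axis=1, keepdims=True)
noisy = d + 0.05 * rng.normal(size=d.shape)  # simulated second view's patches
noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
print(len(mutual_nn_matches(d, noisy)))      # most of the 100 pairs survive
```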
Collapse
|
342
|
Bian W, Jang A, Liu F. Multi-task magnetic resonance imaging reconstruction using meta-learning. Magn Reson Imaging 2025; 116:110278. [PMID: 39580007 PMCID: PMC11645196 DOI: 10.1016/j.mri.2024.110278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2024] [Revised: 08/30/2024] [Accepted: 11/13/2024] [Indexed: 11/25/2024]
Abstract
Using single-task deep learning methods to reconstruct Magnetic Resonance Imaging (MRI) data acquired with different imaging sequences is inherently challenging. The trained deep learning model typically lacks generalizability, and the dissimilarity among image datasets with different types of contrast leads to suboptimal learning performance. This paper proposes a meta-learning approach to efficiently learn image features from multiple MRI datasets. Our algorithm can perform multi-task learning to simultaneously reconstruct MRI images acquired using different imaging sequences with various image contrasts. We have developed a proximal gradient descent-inspired optimization method to learn image features across image and k-space domains, ensuring high-performance learning for every image contrast. Meanwhile, meta-learning, a "learning-to-learn" process, is incorporated into our framework to improve the learning of mutual features embedded in multiple image contrasts. The experimental results reveal that our proposed multi-task meta-learning approach surpasses state-of-the-art single-task learning methods at high acceleration rates. Our meta-learning consistently delivers accurate and detailed reconstructions, achieves the lowest pixel-wise errors, and significantly enhances qualitative performance across all tested acceleration rates. We have demonstrated the ability of our new meta-learning reconstruction method to successfully reconstruct highly undersampled k-space data from multiple MRI datasets simultaneously, outperforming other compelling reconstruction methods previously developed for single-task learning.
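For readers unfamiliar with the meta-learning ingredient, the sketch below shows a generic first-order MAML-style loop in PyTorch, with a plain MLP and synthetic linear "contrasts" standing in for the paper's unrolled proximal-gradient network and MRI data; it illustrates the learning-to-learn loop, not the authors' method.
```python
import torch
import torch.nn as nn
from torch.func import functional_call     # available in PyTorch >= 2.0

net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
meta_opt = torch.optim.Adam(net.parameters(), lr=1e-3)
inner_lr, loss_fn = 0.01, nn.MSELoss()

def make_task():
    """Stand-in for one MRI contrast: a task-specific linear mapping."""
    w = 0.1 * torch.randn(64, 64)
    xs, xq = torch.randn(16, 64), torch.randn(16, 64)
    return xs, xs @ w, xq, xq @ w          # support and query pairs

for step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):                     # tasks per meta-batch
        xs, ys, xq, yq = make_task()
        params = dict(net.named_parameters())
        for _ in range(3):                 # inner loop: adapt on the support set
            loss = loss_fn(functional_call(net, params, (xs,)), ys)
            grads = torch.autograd.grad(loss, list(params.values()))
            params = {k: p - inner_lr * g
                      for (k, p), g in zip(params.items(), grads)}
        # outer loss on the query set, evaluated with the adapted parameters;
        # backward() accumulates first-order meta-gradients into net
        loss_fn(functional_call(net, params, (xq,)), yq).backward()
    meta_opt.step()
```
The shared initialization learned this way is what lets one network adapt quickly to every contrast instead of overfitting a single task.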
Collapse
Affiliation(s)
- Wanyu Bian
- Harvard Medical School, Boston, MA 02115, USA; Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA 02129, USA
| | - Albert Jang
- Harvard Medical School, Boston, MA 02115, USA; Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA 02129, USA
| | - Fang Liu
- Harvard Medical School, Boston, MA 02115, USA; Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA 02129, USA.
| |
Collapse
|
343
|
Zhang B, Suo J, Dai Q. Event-Enhanced Snapshot Compressive Videography at 10K FPS. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:1266-1278. [PMID: 39527439 DOI: 10.1109/tpami.2024.3496788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2024]
Abstract
Video snapshot compressive imaging (SCI) encodes the target dynamic scene compactly into a snapshot and reconstructs its high-speed frame sequence afterward, greatly reducing the required data footprint and transmission bandwidth as well as enabling high-speed imaging with a low frame rate intensity camera. In implementation, high-speed dynamics are encoded via temporally varying patterns, and only frames at corresponding temporal intervals can be reconstructed, while the dynamics occurring between consecutive frames are lost. To unlock the potential of conventional snapshot compressive videography, we propose a novel hybrid "intensity + event" imaging scheme by incorporating an event camera into a video SCI setup. Our proposed system consists of a dual-path optical setup to record the coded intensity measurement and intermediate event signals simultaneously, which is compact and photon-efficient, collecting the half of the photons that conventional video SCI discards. Correspondingly, we developed a dual-branch Transformer utilizing the reciprocal relationship between the two data modes to decode dense video frames. Extensive experiments on both simulated and real-captured data demonstrate our superiority to state-of-the-art video SCI and video frame interpolation (VFI) methods. Benefiting from the new hybrid design leveraging both the intrinsic redundancy in videos and the unique features of event cameras, we achieve high-quality videography at 0.1 ms time intervals with a low-cost CMOS image sensor working at 24 FPS.
Collapse
|
344
|
Yu Y, Zhang P, Zhang K, Luo W, Li C. Multiprior Learning Via Neural Architecture Search for Blind Face Restoration. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:3057-3070. [PMID: 38090868 DOI: 10.1109/tnnls.2023.3339614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]
Abstract
Blind face restoration (BFR) aims to recover high-quality (HQ) face images from low-quality (LQ) ones and usually resorts to facial priors to improve restoration performance. However, current methods still suffer from two major difficulties: 1) how to derive a powerful network architecture without extensive hand tuning and 2) how to capture complementary information from multiple facial priors in one network to improve restoration performance. To this end, we propose a face restoration searching network (FRSNet) to adaptively search for a suitable feature extraction architecture within our specified search space, which can directly contribute to the restoration quality. On the basis of FRSNet, we further design our multiple facial prior searching network (MFPSNet) with a multiprior learning scheme. MFPSNet optimally extracts information from diverse facial priors and fuses the information into image features, ensuring that both external guidance and internal features are preserved. In this way, MFPSNet takes full advantage of semantic-level (parsing maps), geometric-level (facial heat maps), reference-level (facial dictionaries), and pixel-level (degraded images) information and, thus, generates faithful and realistic images. Quantitative and qualitative experiments show that MFPSNet performs favorably on both synthetic and real-world datasets against the state-of-the-art (SOTA) BFR methods. The codes are publicly available at: https://github.com/YYJ1anG/MFPSNet.
Collapse
|
345
|
Nishioka N, Shimizu Y, Kaneko Y, Shirai T, Suzuki A, Amemiya T, Ochi H, Bito Y, Takizawa M, Ikebe Y, Kameda H, Harada T, Fujima N, Kudo K. Accelerating FLAIR imaging via deep learning reconstruction: potential for evaluating white matter hyperintensities. Jpn J Radiol 2025; 43:200-209. [PMID: 39316286 PMCID: PMC11790734 DOI: 10.1007/s11604-024-01666-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2024] [Accepted: 09/16/2024] [Indexed: 09/25/2024]
Abstract
PURPOSE To evaluate deep learning-reconstructed (DLR)-fluid-attenuated inversion recovery (FLAIR) images generated from undersampled data, compare them with fully sampled and rapidly acquired FLAIR images, and assess their potential for white matter hyperintensity evaluation. MATERIALS AND METHODS We examined 30 patients with white matter hyperintensities, obtaining fully sampled FLAIR images (standard FLAIR, std-FLAIR). We created accelerated FLAIR (acc-FLAIR) images using one-third of the fully sampled data and applied deep learning to generate DLR-FLAIR images. Three neuroradiologists assessed the quality (amount of noise and gray/white matter contrast) of all three image types. The reproducibility of hyperintensities was evaluated by comparing a subset of 100 hyperintensities in acc-FLAIR and DLR-FLAIR images with those in the std-FLAIR images. Quantitatively, similarities and errors of the entire image and of the regions focused on white matter hyperintensities in acc-FLAIR and DLR-FLAIR images were measured against std-FLAIR images using structural similarity index measure (SSIM), regional SSIM, normalized root mean square error (NRMSE), and regional NRMSE values. RESULTS All three neuroradiologists evaluated DLR-FLAIR as having significantly less noise and higher image quality scores compared with std-FLAIR and acc-FLAIR (p < 0.001). All three neuroradiologists assigned significantly higher frontal lobe gray/white matter visibility scores to DLR-FLAIR than to acc-FLAIR (p < 0.001); two neuroradiologists assigned significantly higher scores to DLR-FLAIR than to std-FLAIR (p < 0.05). Regarding white matter hyperintensities, all three neuroradiologists significantly preferred DLR-FLAIR (p < 0.0001). DLR-FLAIR exhibited higher similarity to std-FLAIR in terms of visibility of the hyperintensities, with 97% of the hyperintensities rated as nearly identical or equivalent. Quantitatively, DLR-FLAIR demonstrated significantly higher SSIM and regional SSIM values than acc-FLAIR, with significantly lower NRMSE and regional NRMSE values (p < 0.0001). CONCLUSIONS DLR-FLAIR can reduce scan time and generate images of similar quality to std-FLAIR in patients with white matter hyperintensities. Therefore, DLR-FLAIR may serve as an effective method in traditional magnetic resonance imaging protocols.
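The whole-image and regional SSIM/NRMSE comparison described above can be reproduced with standard tooling; the sketch below uses scikit-image, with synthetic arrays and a hypothetical ROI box in place of the study's images and hyperintensity regions.
```python
import numpy as np
from skimage.metrics import structural_similarity, normalized_root_mse

rng = np.random.default_rng(0)
std_flair = rng.random((256, 256)).astype(np.float32)   # fully sampled reference
dlr_flair = std_flair + 0.01 * rng.standard_normal((256, 256)).astype(np.float32)

ssim = structural_similarity(std_flair, dlr_flair, data_range=1.0)
nrmse = normalized_root_mse(std_flair, dlr_flair)   # normalised by reference norm

# "Regional" variants: restrict both images to a box around a hyperintensity.
r0, r1, c0, c1 = 100, 140, 90, 130                  # hypothetical ROI bounds
ssim_roi = structural_similarity(std_flair[r0:r1, c0:c1],
                                 dlr_flair[r0:r1, c0:c1], data_range=1.0)
nrmse_roi = normalized_root_mse(std_flair[r0:r1, c0:c1],
                                dlr_flair[r0:r1, c0:c1])
print(f"SSIM {ssim:.4f} / {ssim_roi:.4f}, NRMSE {nrmse:.4f} / {nrmse_roi:.4f}")
```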
Collapse
Affiliation(s)
- Noriko Nishioka
- Department of Diagnostic and Interventional Radiology, Hokkaido University Hospital, Sapporo, Japan
- Department of Diagnostic Imaging, Faculty of Medicine and Graduate School of Medicine, Hokkaido University, Sapporo, Japan
| | - Yukie Shimizu
- Department of Diagnostic and Interventional Radiology, Hokkaido University Hospital, Sapporo, Japan.
- Department of Diagnostic Imaging, Faculty of Medicine and Graduate School of Medicine, Hokkaido University, Sapporo, Japan.
| | - Yukio Kaneko
- Medical Systems Research & Development Center, FUJIFILM Corporation, Tokyo, Japan
| | - Toru Shirai
- Medical Systems Research & Development Center, FUJIFILM Corporation, Tokyo, Japan
| | - Atsuro Suzuki
- Medical Systems Research & Development Center, FUJIFILM Corporation, Tokyo, Japan
| | - Tomoki Amemiya
- Medical Systems Research & Development Center, FUJIFILM Corporation, Tokyo, Japan
| | - Hisaaki Ochi
- Medical Systems Research & Development Center, FUJIFILM Corporation, Tokyo, Japan
| | - Yoshitaka Bito
- Department of Diagnostic Imaging, Faculty of Medicine and Graduate School of Medicine, Hokkaido University, Sapporo, Japan
- FUJIFILM Healthcare Corporation, Tokyo, Japan
| | | | - Yohei Ikebe
- Department of Diagnostic and Interventional Radiology, Hokkaido University Hospital, Sapporo, Japan
- Center for Cause of Death Investigation, Faculty of Medicine, Hokkaido University, Sapporo, Japan
| | - Hiroyuki Kameda
- Department of Diagnostic and Interventional Radiology, Hokkaido University Hospital, Sapporo, Japan
- Faculty of Dental Medicine, Department of Radiology, Hokkaido University, Sapporo, Japan
| | - Taisuke Harada
- Department of Diagnostic and Interventional Radiology, Hokkaido University Hospital, Sapporo, Japan
- Department of Diagnostic Imaging, Faculty of Medicine and Graduate School of Medicine, Hokkaido University, Sapporo, Japan
| | - Noriyuki Fujima
- Department of Diagnostic and Interventional Radiology, Hokkaido University Hospital, Sapporo, Japan
- Department of Diagnostic Imaging, Faculty of Medicine and Graduate School of Medicine, Hokkaido University, Sapporo, Japan
| | - Kohsuke Kudo
- Department of Diagnostic and Interventional Radiology, Hokkaido University Hospital, Sapporo, Japan
- Department of Diagnostic Imaging, Faculty of Medicine and Graduate School of Medicine, Hokkaido University, Sapporo, Japan
- Center for Cause of Death Investigation, Faculty of Medicine, Hokkaido University, Sapporo, Japan
- Division of Medical AI Education and Research, Hokkaido University Graduate School of Medicine, Sapporo, Japan
| |
Collapse
|
346
|
Yu Z, Bu T, Zhang Y, Jia S, Huang T, Liu JK. Robust Decoding of Rich Dynamical Visual Scenes With Retinal Spikes. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:3396-3409. [PMID: 38265909 DOI: 10.1109/tnnls.2024.3351120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/26/2024]
Abstract
Sensory information transmitted to the brain activates neurons to create a series of coping behaviors. Understanding the mechanisms of neural computation and reverse engineering the brain to build intelligent machines requires establishing a robust relationship between stimuli and neural responses. Neural decoding aims to reconstruct the original stimuli that trigger neural responses. With the recent upsurge of artificial intelligence, neural decoding provides an insightful perspective for designing novel algorithms for brain-machine interfaces. For humans, vision is the dominant contributor to the interaction between the external environment and the brain. In this study, using retinal neural spike data collected over multiple trials with visual stimuli of two movies with different levels of scene complexity, we used a neural network decoder to quantify the decoded visual stimuli with six different image quality assessment metrics, establishing a comprehensive evaluation of decoding. Through a detailed and systematic study of the effects of single versus multiple trials of data, different levels of noise in spikes, and blurred images, our results provide an in-depth investigation of decoding dynamical visual scenes using retinal spikes. These results offer insights into the neural coding of visual scenes and serve as a guideline for designing next-generation decoding algorithms for neuroprostheses and other brain-machine interface devices.
Collapse
|
347
|
Montesuma EF, Mboula FN, Souloumiac A. Recent Advances in Optimal Transport for Machine Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:1161-1180. [PMID: 39480719 DOI: 10.1109/tpami.2024.3489030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/02/2024]
Abstract
Recently, Optimal Transport has been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. This is rooted in its rich history and theory, and it has offered new solutions to different problems in machine learning, such as generative modeling and transfer learning. In this survey we explore contributions of Optimal Transport to Machine Learning over the period 2012-2023, focusing on four sub-fields of Machine Learning: supervised, unsupervised, transfer, and reinforcement learning. We further highlight recent developments in computational Optimal Transport and its extensions, such as partial, unbalanced, Gromov, and Neural Optimal Transport, and their interplay with Machine Learning practice.
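As a worked example of the computational side the survey covers, the following NumPy sketch implements entropic-regularised optimal transport via Sinkhorn iterations between two small point clouds; the regularisation strength and data are arbitrary illustration values.
```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.05, n_iter=500):
    """Entropic OT: transport plan between histograms a and b under `cost`."""
    K = np.exp(-cost / reg)                # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):                # alternating marginal-scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))                          # source point cloud
y = rng.normal(loc=2.0, size=(7, 2))                 # shifted target cloud
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared Euclidean costs
cost /= cost.max()                                   # normalise for stability
a, b = np.full(5, 1 / 5), np.full(7, 1 / 7)          # uniform weights

plan = sinkhorn(a, b, cost)
print(plan.sum(axis=1))                              # ~a: row marginals preserved
print((plan * cost).sum())                           # regularised transport cost
```
The entropic regulariser is what makes the plan differentiable and cheap to compute, which underpins much of the machine-learning usage the survey reviews.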
Collapse
|
348
|
Lai Z, Fu Y, Zhang J. Hyperspectral Image Super Resolution With Real Unaligned RGB Guidance. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:2999-3011. [PMID: 38236669 DOI: 10.1109/tnnls.2023.3340561] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Fusion-based hyperspectral image (HSI) super-resolution has become increasingly prevalent for its capability to integrate high-frequency spatial information from the paired high-resolution (HR) RGB reference (Ref-RGB) image. However, most of the existing methods either heavily rely on the accurate alignment between low-resolution (LR) HSIs and RGB images or can only deal with simulated unaligned RGB images generated by rigid geometric transformations, which weakens their effectiveness for real scenes. In this article, we explore the fusion-based HSI super-resolution with real Ref-RGB images that have both rigid and nonrigid misalignments. To properly address the limitations of existing methods for unaligned reference images, we propose an HSI fusion network (HSIFN) with heterogeneous feature extractions, multistage feature alignments, and attentive feature fusion. Specifically, our network first transforms the input HSI and RGB images into two sets of multiscale features with an HSI encoder and an RGB encoder, respectively. The features of Ref-RGB images are then processed by a multistage alignment module to explicitly align the features of Ref-RGB with the LR HSI. Finally, the aligned features of Ref-RGB are further adjusted by an adaptive attention module to focus more on discriminative regions before sending them to the fusion decoder to generate the reconstructed HR HSI. Additionally, we collect a real-world HSI fusion dataset, consisting of paired HSI and unaligned Ref-RGB, to support the evaluation of the proposed model for real scenes. Extensive experiments are conducted on both simulated and our real-world datasets, and the results show that our method achieves a clear improvement over existing single-image and fusion-based super-resolution methods in quantitative assessment as well as visual comparison. The code and dataset are publicly available at https://zeqiang-lai.github.io/HSI-RefSR/.
Collapse
|
349
|
Xie W, Gan M, Tan X, Li M, Yang W, Wang W. Efficient labeling for fine-tuning chest X-ray bone-suppression networks for pediatric patients. Med Phys 2025; 52:978-992. [PMID: 39546640 PMCID: PMC11788263 DOI: 10.1002/mp.17516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2024] [Revised: 10/07/2024] [Accepted: 10/25/2024] [Indexed: 11/17/2024] Open
Abstract
BACKGROUND Pneumonia, a major infectious cause of morbidity and mortality among children worldwide, is typically diagnosed using low-dose pediatric chest radiography (CXR). In pediatric CXR images, bone occlusion leads to a risk of missed diagnosis. Deep learning-based bone-suppression networks relying on training data have enabled considerable progress in bone suppression for adult CXR images; however, these networks have poor generalizability to pediatric CXR images because of the lack of labeled pediatric CXR images (i.e., bone images vs. soft-tissue images). Dual-energy subtraction imaging approaches are capable of producing labeled adult CXR images; however, their application is limited because they require specialized equipment, and they are infrequently employed in pediatric settings. Traditional image processing-based models can be used to label pediatric CXR images, but they are semiautomatic and have suboptimal performance. PURPOSE We developed an efficient labeling approach for fine-tuning pediatric CXR bone-suppression networks capable of automatically suppressing bone structures in CXR images for pediatric patients without the need for specialized equipment and technologist training. METHODS Three steps were employed to label pediatric CXR images and fine-tune pediatric bone-suppression networks: distance transform-based bone-edge detection, traditional image processing-based bone suppression, and fully automated pediatric bone suppression. In distance transform-based bone-edge detection, bone edges were automatically detected by predicting bone-edge distance-transform images, which were then used as inputs in traditional image processing. In this processing, pediatric CXR images were labeled by obtaining bone images through a series of traditional image processing techniques. Finally, the pediatric bone-suppression network was fine-tuned using the labeled pediatric CXR images. This network was initially pretrained on a public adult dataset comprising 240 adult CXR images (A240) and then fine-tuned and validated on 40 pediatric CXR images (P260_40labeled) from our customized dataset (named P260) through five-fold cross-validation; finally, the network was tested on 220 pediatric CXR images (P260_220unlabeled dataset). RESULTS The distance transform-based bone-edge detection network achieved a mean boundary distance of 1.029. Moreover, the traditional image processing-based bone-suppression model obtained bone images exhibiting a relative Weber contrast of 93.0%. Finally, the fully automated pediatric bone-suppression network achieved a relative mean absolute error of 3.38%, a peak signal-to-noise ratio of 35.5 dB, a structural similarity index measure of 98.1%, and a bone-suppression ratio of 90.1% on P260_40labeled. CONCLUSIONS The proposed fully automated pediatric bone-suppression network, together with the proposed distance transform-based bone-edge detection network, can automatically acquire bone and soft-tissue images solely from CXR images for pediatric patients and has the potential to help diagnose pneumonia in children.
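The distance-transform target at the heart of the first step can be illustrated with SciPy (the exact recipe here is an assumption): given a binary bone-edge map, each pixel's regression target is its Euclidean distance to the nearest edge, and edges are recovered from a predicted distance map by thresholding near zero.
```python
import numpy as np
from scipy.ndimage import distance_transform_edt

edge_map = np.zeros((128, 128), bool)
edge_map[40, 20:100] = True                # toy "rib edge"
edge_map[80, 30:110] = True

# distance_transform_edt gives each True pixel's distance to the nearest
# False pixel, so invert the edge map to measure distance *to* the edges.
dist_target = distance_transform_edt(~edge_map).astype(np.float32)

# At inference a network would predict `dist_target` from the CXR image;
# thresholding a (here noisy, simulated) prediction near zero recovers a
# thin band around the true edges.
pred = dist_target + 0.2 * np.random.default_rng(0).standard_normal((128, 128))
recovered = pred < 1.0
print(edge_map.sum(), "true edge pixels;", recovered.sum(), "recovered band pixels")
```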
Collapse
Affiliation(s)
- Weijie Xie
- Information and Data Centre, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Information and Data Centre, the Second Affiliated Hospital, School of Medicine, South China University of Technology, Guangzhou, China
| | - Mengkun Gan
- Information and Data Centre, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Information and Data Centre, the Second Affiliated Hospital, School of Medicine, South China University of Technology, Guangzhou, China
| | - Xiaocong Tan
- Information and Data Centre, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Information and Data Centre, the Second Affiliated Hospital, School of Medicine, South China University of Technology, Guangzhou, China
| | - Mujiao Li
- Information and Data Centre, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Information and Data Centre, the Second Affiliated Hospital, School of Medicine, South China University of Technology, Guangzhou, China
| | - Wei Yang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Medical Image Processing, School of Biomedical Engineering, Southern Medical University, Guangzhou, China
| | - Wenhui Wang
- Information and Data Centre, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Information and Data Centre, the Second Affiliated Hospital, School of Medicine, South China University of Technology, Guangzhou, China
| |
Collapse
|
350
|
Zong F, Zhu Z, Zhang J, Deng X, Li Z, Ye C, Liu Y. Attention-Based Q-Space Deep Learning Generalized for Accelerated Diffusion Magnetic Resonance Imaging. IEEE J Biomed Health Inform 2025; 29:1176-1188. [PMID: 39471111 DOI: 10.1109/jbhi.2024.3487755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/01/2024]
Abstract
Diffusion magnetic resonance imaging (dMRI) is a non-invasive method for capturing the microanatomical information of tissues by measuring the diffusion weighted signals along multiple directions, which is widely used in the quantification of microstructures. Obtaining microscopic parameters requires dense sampling in the q space, leading to significant time consumption. The most popular approach to accelerating dMRI acquisition is to undersample the q-space data, along with applying deep learning methods to reconstruct quantitative diffusion parameters. However, the reliance on a predetermined q-space sampling strategy often constrains traditional deep learning-based reconstructions. The present study proposed a novel deep learning model, named attention-based q-space deep learning (aqDL), to implement the reconstruction with variable q-space sampling strategies. The aqDL maps dMRI data from different scanning strategies onto a common feature space by using a series of Transformer encoders. The latent features are employed to reconstruct dMRI parameters via a multilayer perceptron. The performance of the aqDL model was assessed utilizing the Human Connectome Project datasets at varying undersampling numbers. To validate its generalizability, the model was further tested on two additional independent datasets. Our results showed that aqDL consistently achieves the highest reconstruction accuracy at various undersampling numbers, regardless of whether variable or predetermined q-space scanning strategies are employed. These findings suggest that aqDL has the potential to be used on general clinical dMRI datasets.
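A minimal sketch of the idea (layer sizes, the token embedding, and mean pooling are assumptions, not the published architecture): a Transformer encoder consumes one token per acquired q-space sample, so the same network handles different undersampling strategies.
```python
import torch
import torch.nn as nn

class QSpaceNet(nn.Module):
    def __init__(self, d_model=64, n_params=3):
        super().__init__()
        self.embed = nn.Linear(1 + 3 + 1, d_model)   # signal, direction, b-value
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.head = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(),
                                  nn.Linear(64, n_params))

    def forward(self, signal, bvec, bval):
        # one token per acquired q-space sample, so the count can vary freely
        tokens = torch.cat([signal[..., None], bvec, bval[..., None]], dim=-1)
        feats = self.encoder(self.embed(tokens))
        return self.head(feats.mean(dim=1))          # pool over samples

net = QSpaceNet()
for n_samples in (30, 60):                           # two sampling strategies
    s = torch.rand(2, n_samples)                     # normalised dMRI signals
    g = torch.randn(2, n_samples, 3)                 # gradient directions
    b = torch.rand(2, n_samples)                     # b-values (rescaled)
    print(net(s, g, b).shape)                        # -> (2, 3) parameter maps
```
Because attention is permutation-invariant over the sample tokens, the encoder accepts any number and ordering of q-space measurements, which is the property that lets one trained model cover variable sampling strategies.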
Collapse
|