1. Thibeault S, Roy-Beaudry M, Parent S, Kadoury S. Prediction of the upright articulated spine shape in the operating room using conditioned neural kernel fields. Med Image Anal 2025; 100:103400. PMID: 39622114. DOI: 10.1016/j.media.2024.103400.
Abstract
Anterior vertebral tethering (AVT) is a minimally invasive spine surgery technique that treats severe spine deformations while preserving lower back mobility. However, patient positioning and surgical strategy greatly influence postoperative results. Predicting the upright geometry of pediatric spines is needed to optimize patient positioning in the operating room (OR) and improve surgical outcomes, but it remains a complex task due to immature bone properties. We propose a framework, used in the OR, that predicts the upright spine geometry at the first visit following surgery in idiopathic scoliosis patients. The approach first creates a 3D model of the spine while the patient is on the operating table. For this, multiview Transformers that combine images from different viewpoints are used to generate the intraoperative pose. The postoperative upright shape is then predicted on the fly using implicit neural fields, which are trained on geometries from different time points and conditioned on surgical parameters. A Signed Distance Function for shape constellations handles the variability in spine appearance, capturing a disentangled latent domain of the articulation vectors, with separate encoding vectors representing articulation and shape parameters. A regularization criterion based on a pre-trained group-wise trajectory of spine transformations generates complete spine models. A training set of 652 patients with 3D models was used to train the model, which was tested on a distinct cohort of 83 surgical patients. The framework based on neural kernels predicted upright 3D geometries with a mean 3D error of 1.3±0.5 mm at landmark points and an IoU of 95.9% for vertebral shapes when compared with actual postoperative models, falling within the accepted margin of error of 2 mm.
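To make the conditioning idea concrete, the sketch below shows a minimal implicit field in PyTorch that maps a 3D query point plus two latent codes (articulation and shape) to a signed distance. The layer sizes, code dimensions, and activation choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConditionedSDF(nn.Module):
    """Minimal implicit field: maps a 3D point plus conditioning codes
    to a signed distance value (illustrative sizes, not the paper's)."""
    def __init__(self, articulation_dim=32, shape_dim=32, hidden=128):
        super().__init__()
        in_dim = 3 + articulation_dim + shape_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),  # signed distance to the vertebral surface
        )

    def forward(self, xyz, z_articulation, z_shape):
        # xyz: (N, 3); the two codes are broadcast to every query point
        codes = torch.cat([z_articulation, z_shape], dim=-1)
        codes = codes.expand(xyz.shape[0], -1)
        return self.net(torch.cat([xyz, codes], dim=-1))

# Query the field at sample points for one patient's latent codes
sdf = ConditionedSDF()
pts = torch.rand(1024, 3) * 2 - 1               # points in [-1, 1]^3
z_art, z_shape = torch.randn(1, 32), torch.randn(1, 32)
distances = sdf(pts, z_art, z_shape)            # (1024, 1)
```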
Affiliation(s)
- Stefan Parent
- Centre de Recherche du CHU Sainte-Justine, Montréal, QC, Canada
- Samuel Kadoury
- Centre de Recherche du CHU Sainte-Justine, Montréal, QC, Canada; Polytechnique Montréal, Montréal, QC, Canada.
2. Chen H, Emu Y, Gao J, Chen Z, Aburas A, Hu C. Retrospective motion correction for cardiac multi-parametric mapping with dictionary matching-based image synthesis and a low-rank constraint. Magn Reson Med 2025; 93:550-562. PMID: 39285623. DOI: 10.1002/mrm.30291.
Abstract
PURPOSE To develop a model-based motion correction (MoCo) method that does not need an analytical signal model to improve the quality of cardiac multi-parametric mapping. METHODS The proposed method constructs a hybrid loss that includes a dictionary-matching loss and a signal low-rankness loss, where the former registers the multi-contrast original images to a set of motion-free synthetic images and the latter forces the deformed images to be spatiotemporally coherent. We compared the proposed method with non-MoCo, a pairwise registration method (Pairwise-MI), and a groupwise registration method (pTVreg) via a free-breathing Multimapping dataset of 15 healthy subjects, both quantitatively and qualitatively. RESULTS The proposed method achieved the lowest contour tracking errors (epicardium: 2.00 ± 0.39 mm vs 4.93 ± 2.29 mm, 3.50 ± 1.26 mm, and 2.61 ± 1.00 mm, and endocardium: 1.84 ± 0.34 mm vs 4.93 ± 2.40 mm, 3.43 ± 1.27 mm, and 2.55 ± 1.09 mm for the proposed method, non-MoCo, Pairwise-MI, and pTVreg, respectively; all p < 0.01) and the lowest dictionary matching errors among all methods. The proposed method also achieved the highest scores on the visual quality of mapping (T1: 4.74 ± 0.33 vs 2.91 ± 0.82, 3.58 ± 0.87, and 3.97 ± 1.05, and T2: 4.48 ± 0.56 vs 2.59 ± 0.81, 3.56 ± 0.93, and 4.14 ± 0.80 for the proposed method, non-MoCo, Pairwise-MI, and pTVreg, respectively; all p < 0.01). Finally, the proposed method had similar T1 and T2 mean values and SDs relative to the breath-hold reference in nearly all myocardial segments, whereas all other methods led to significantly different T1 and T2 measures and increases of SDs in multiple segments. CONCLUSION The proposed method significantly improves the motion correction accuracy and mapping quality compared with non-MoCo and alternative image-based methods.
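The hybrid loss can be pictured as a data term plus a convex low-rank surrogate. Below is a hedged PyTorch sketch assuming the motion-free synthetic images are already available from dictionary matching; the nuclear norm stands in for the low-rankness term, and the weighting is an arbitrary placeholder.

```python
import torch

def hybrid_moco_loss(warped, synthetic, lam=0.1):
    """Sketch of a hybrid MoCo objective: a dictionary-matching term that
    pulls each deformed contrast image toward its motion-free synthetic
    counterpart, plus a nuclear-norm surrogate encouraging the stack of
    deformed images to be spatiotemporally low-rank.
    warped, synthetic: (T, H, W) stacks of T contrast images."""
    match = torch.mean((warped - synthetic) ** 2)
    casorati = warped.reshape(warped.shape[0], -1)   # T x (H*W) Casorati matrix
    nuclear = torch.linalg.svdvals(casorati).sum()   # convex low-rank surrogate
    return match + lam * nuclear

warped = torch.rand(8, 64, 64, requires_grad=True)
loss = hybrid_moco_loss(warped, torch.rand(8, 64, 64))
loss.backward()
```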
Affiliation(s)
- Haiyang Chen
- National Engineering Research Center of Advanced Magnetic Resonance Technologies for Diagnosis and Therapy, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Yixin Emu
- National Engineering Research Center of Advanced Magnetic Resonance Technologies for Diagnosis and Therapy, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Juan Gao
- National Engineering Research Center of Advanced Magnetic Resonance Technologies for Diagnosis and Therapy, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Zhuo Chen
- National Engineering Research Center of Advanced Magnetic Resonance Technologies for Diagnosis and Therapy, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Ahmed Aburas
- National Engineering Research Center of Advanced Magnetic Resonance Technologies for Diagnosis and Therapy, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Chenxi Hu
- National Engineering Research Center of Advanced Magnetic Resonance Technologies for Diagnosis and Therapy, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
3. Chen J, Liu Y, Wei S, Bian Z, Subramanian S, Carass A, Prince JL, Du Y. A survey on deep learning in medical image registration: New technologies, uncertainty, evaluation metrics, and beyond. Med Image Anal 2025; 100:103385. PMID: 39612808. PMCID: PMC11730935. DOI: 10.1016/j.media.2024.103385.
Abstract
Deep learning technologies have dramatically reshaped the field of medical image registration over the past decade. The initial developments, such as regression-based and U-Net-based networks, established the foundation for deep learning in image registration. Subsequent progress has been made in various aspects of deep learning-based registration, including similarity measures, deformation regularizations, network architectures, and uncertainty estimation. These advancements have not only enriched the field of image registration but have also facilitated its application in a wide range of tasks, including atlas construction, multi-atlas segmentation, motion estimation, and 2D-3D registration. In this paper, we present a comprehensive overview of the most recent advancements in deep learning-based image registration. We begin with a concise introduction to the core concepts of deep learning-based image registration. Then, we delve into innovative network architectures, loss functions specific to registration, and methods for estimating registration uncertainty. Additionally, this paper explores appropriate evaluation metrics for assessing the performance of deep learning models in registration tasks. Finally, we highlight the practical applications of these novel techniques in medical imaging and discuss the future prospects of deep learning-based image registration.
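As a concrete illustration of the core concept the survey starts from, the following minimal PyTorch sketch shows the standard unsupervised setup: a small network predicts a dense displacement field, a spatial transformer warps the moving image, and similarity plus smoothness terms are optimized end to end. It is a 2D toy under stated assumptions, not any specific published model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRegNet(nn.Module):
    """Toy 2D registration net: two stacked images in, a dense 2-channel
    displacement field out (a stand-in for the U-Net used in practice)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),
        )

    def forward(self, fixed, moving):
        return self.conv(torch.cat([fixed, moving], dim=1))

def warp(img, flow):
    """Spatial transformer: resample img with a displacement field given
    in normalized [-1, 1] coordinates; flow is (N, 2, H, W)."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
    grid = base + flow.permute(0, 2, 3, 1)
    return F.grid_sample(img, grid, align_corners=True)

fixed, moving = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
net = TinyRegNet()
flow = net(fixed, moving)
warped = warp(moving, flow)
# Unsupervised loss: image similarity plus displacement smoothness
loss = F.mse_loss(warped, fixed) + 0.1 * flow.diff(dim=-1).abs().mean()
loss.backward()
```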
Affiliation(s)
- Junyu Chen
- Department of Radiology and Radiological Science, Johns Hopkins School of Medicine, MD, USA.
- Yihao Liu
- Department of Electrical and Computer Engineering, Johns Hopkins University, MD, USA
- Shuwen Wei
- Department of Electrical and Computer Engineering, Johns Hopkins University, MD, USA
- Zhangxing Bian
- Department of Electrical and Computer Engineering, Johns Hopkins University, MD, USA
- Shalini Subramanian
- Department of Radiology and Radiological Science, Johns Hopkins School of Medicine, MD, USA
- Aaron Carass
- Department of Electrical and Computer Engineering, Johns Hopkins University, MD, USA
- Jerry L Prince
- Department of Electrical and Computer Engineering, Johns Hopkins University, MD, USA
- Yong Du
- Department of Radiology and Radiological Science, Johns Hopkins School of Medicine, MD, USA
4. Lin H, Song Y, Zhang Q. GMmorph: dynamic spatial matching registration model for 3D medical image based on gated Mamba. Phys Med Biol 2025; 70:035011. PMID: 39813811. DOI: 10.1088/1361-6560/adaacd.
Abstract
Objective. Deformable registration aims to achieve nonlinear alignment of image space by estimating a dense displacement field. It is commonly used as a preprocessing step in clinical and image analysis applications, such as surgical planning, diagnostic assistance, and surgical navigation. We aim to overcome the following challenges: deep learning-based registration methods often struggle with complex displacements and lack effective interaction between global and local feature information; they also neglect the spatial position matching process, leading to insufficient registration accuracy and reduced robustness when handling abnormal tissues. Approach. We propose a dual-branch interactive registration model architecture from the perspective of spatial matching. Implicit regularization is achieved through a consistency loss, enabling the network to balance high accuracy with a low folding rate. We introduce a dynamic matching module between the two registration branches, which generates learnable offsets based on all the tokens across the entire resolution range of the base branch features. Using trilinear interpolation, the model adjusts its feature expression range according to the learned offsets, capturing highly flexible positional differences. To facilitate the spatial matching process, we designed a gated Mamba layer to globally model pixel-level features by associating all voxel information, while a detail enhancement module, based on channel and spatial attention, enriches local feature details. Main results. Our study explores the model's performance in single-modal and multi-modal image registration, including normal brain, brain tumor, and lung images. We propose unsupervised and semi-supervised registration modes and conduct extensive validation experiments. The results demonstrate that the model achieves state-of-the-art performance across multiple datasets. Significance. By introducing a novel perspective of position matching, the model achieves precise registration of various types of medical data, offering significant clinical value in medical applications.
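The dynamic matching idea, predicting learnable offsets and resampling the other branch's features by interpolation, can be sketched as follows. This is a simplified 3D illustration of offset-based feature sampling with an arbitrary offset scale; the actual module operates on tokens and differs in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMatching3D(nn.Module):
    """Sketch of offset-based matching: predict a 3D offset at every voxel
    of the base feature map, then resample the other branch's features
    trilinearly at the offset positions (illustrative, not the paper's code)."""
    def __init__(self, channels):
        super().__init__()
        self.to_offset = nn.Conv3d(channels, 3, kernel_size=3, padding=1)

    def forward(self, base_feat, other_feat):
        n, _, d, h, w = base_feat.shape
        offset = torch.tanh(self.to_offset(base_feat))      # bounded offsets
        zs, ys, xs = torch.meshgrid(torch.linspace(-1, 1, d),
                                    torch.linspace(-1, 1, h),
                                    torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack([xs, ys, zs], dim=-1).expand(n, d, h, w, 3)
        grid = base + 0.1 * offset.permute(0, 2, 3, 4, 1)   # small learned shifts
        return F.grid_sample(other_feat, grid, align_corners=True)

feat_a, feat_b = torch.rand(1, 8, 16, 16, 16), torch.rand(1, 8, 16, 16, 16)
matched = DynamicMatching3D(8)(feat_a, feat_b)  # b's features, sampled where a looks
```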
Affiliation(s)
- Hao Lin
- School of Software, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, People's Republic of China
- Yonghong Song
- School of Software, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, People's Republic of China
- Qi Zhang
- School of Software, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, People's Republic of China
5. Kim JW, Khan AU, Banerjee I. Systematic Review of Hybrid Vision Transformer Architectures for Radiological Image Analysis. J Imaging Inform Med 2025. PMID: 39871042. DOI: 10.1007/s10278-024-01322-4.
Abstract
Vision transformers (ViTs) and convolutional neural networks (CNNs) each possess distinct strengths in medical imaging: ViTs excel at capturing long-range dependencies through self-attention, while CNNs are adept at extracting local features via spatial convolution filters. ViTs may struggle to capture detailed local spatial information, critical for tasks like anomaly detection in medical imaging, whereas shallow CNNs often fail to effectively abstract global context. This study aims to explore and evaluate hybrid architectures that integrate ViT and CNN to leverage their complementary strengths for enhanced performance in medical vision tasks, such as segmentation, classification, reconstruction, and prediction. Following the PRISMA guidelines, a systematic review was conducted of 34 articles published between 2020 and September 2024. These articles proposed novel hybrid ViT-CNN architectures specifically for medical imaging tasks in radiology. The review focused on analyzing architectural variations, merging strategies between ViT and CNN, innovative applications of ViT, and efficiency metrics including the number of parameters, inference time (GFlops), and performance benchmarks; the articles were benchmarked on these criteria and a ranked list derived. The review identified that integrating ViT and CNN can mitigate the limitations of each architecture, offering comprehensive solutions that combine global context understanding with precise local feature extraction. By synthesizing the current literature, this review defines the fundamental concepts of hybrid vision transformers and highlights emerging trends in the field. It provides a clear direction for future research aimed at optimizing the integration of ViT and CNN for effective use in medical imaging, contributing to advancements in diagnostic accuracy and image analysis.
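For readers new to the topic, a toy hybrid block helps fix the vocabulary: a CNN stem supplies local features, and a Transformer encoder adds global context over the resulting tokens. Published hybrids differ mainly in where and how the two streams are merged; the sketch below is only one possible arrangement.

```python
import torch
import torch.nn as nn

class HybridViTCNNBlock(nn.Module):
    """Toy hybrid block: a CNN stem extracts local features, the feature map
    is flattened into tokens for a Transformer encoder (global context), and
    the tokens are folded back into a feature map."""
    def __init__(self, in_ch=1, dim=32):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, dim, 3, stride=2, padding=1),
                                  nn.ReLU(),
                                  nn.Conv2d(dim, dim, 3, stride=2, padding=1))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        feat = self.stem(x)                       # local features (N, dim, H/4, W/4)
        n, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (N, HW, dim)
        tokens = self.encoder(tokens)             # global self-attention
        return tokens.transpose(1, 2).reshape(n, c, h, w)

y = HybridViTCNNBlock()(torch.rand(1, 1, 64, 64))   # (1, 32, 16, 16)
```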
Affiliation(s)
- Ji Woong Kim
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA
- Imon Banerjee
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA.
- Department of Radiology, Mayo Clinic, Phoenix, AZ, USA.
- Department of Artificial Intelligence and Informatics (AI&I), Mayo Clinic, Scottsdale, AZ, USA.
6. Comte V, Alenya M, Urru A, Recober J, Nakaki A, Crovetto F, Camara O, Gratacós E, Eixarch E, Crispi F, Piella G, Ceresa M, González Ballester MA. Deep cascaded registration and weakly-supervised segmentation of fetal brain MRI. Heliyon 2025; 11:e40148. PMID: 39816514. PMCID: PMC11732682. DOI: 10.1016/j.heliyon.2024.e40148.
Abstract
Deformable image registration is a cornerstone of many medical image analysis applications, particularly in the context of fetal brain magnetic resonance imaging (MRI), where precise registration is essential for studying the rapidly evolving fetal brain during pregnancy and potentially identifying neurodevelopmental abnormalities. While deep learning has become the leading approach for medical image registration, traditional convolutional neural networks (CNNs) often fall short in capturing fine image details due to their bias toward low spatial frequencies. To address this challenge, we introduce a deep learning registration framework comprising multiple cascaded convolutional networks. These networks predict a series of incremental deformation fields that transform the moving image at various spatial frequency levels, ensuring accurate alignment with the fixed image. This multi-resolution approach allows for a more accurate and detailed registration process, capturing both coarse and fine image structures. Our method outperforms existing state-of-the-art techniques, including other multi-resolution strategies, by a substantial margin. Furthermore, we integrate our registration method into a multi-atlas segmentation pipeline and showcase its competitive performance compared to nnU-Net, achieved using only a small subset of annotated images as atlases. This approach is particularly valuable in the context of fetal brain MRI, where annotated datasets are limited. Our pipeline for registration and multi-atlas segmentation is publicly available at https://github.com/ValBcn/CasReg.
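The cascading principle, where each stage predicts an incremental field applied to the output of the previous stage, can be summarized in a few lines. The sketch below is a 2D toy with identical stages; the paper's networks operate at different spatial frequency levels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(img, flow):
    # Resample img by a displacement field in normalized [-1, 1] coordinates
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
    return F.grid_sample(img, grid + flow.permute(0, 2, 3, 1),
                         align_corners=True)

class Stage(nn.Module):
    """One cascade stage: predicts an incremental displacement field."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 2, 3, padding=1))

    def forward(self, fixed, moving):
        return self.net(torch.cat([fixed, moving], dim=1))

# Coarse-to-fine: each stage sees the fixed image and the image warped so far
stages = nn.ModuleList([Stage() for _ in range(3)])
fixed, moving = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
warped = moving
for stage in stages:
    increment = stage(fixed, warped)
    warped = warp(warped, increment)   # apply the incremental deformation
```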
Affiliation(s)
- Valentin Comte
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- European Commission, Joint Research Centre (JRC), Geel, Belgium
- Mireia Alenya
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- Andrea Urru
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- Judith Recober
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- Ayako Nakaki
- BCNatal, Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), University of Barcelona, Barcelona, Spain
- Institut d’Investigacions Biomédiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Francesca Crovetto
- BCNatal, Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), University of Barcelona, Barcelona, Spain
- Oscar Camara
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- Eduard Gratacós
- BCNatal, Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), University of Barcelona, Barcelona, Spain
- Institut d’Investigacions Biomédiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Centre for Biomedical Research on Rare Diseases (CIBERER), Barcelona, Spain
- Elisenda Eixarch
- BCNatal, Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), University of Barcelona, Barcelona, Spain
- Institut d’Investigacions Biomédiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Centre for Biomedical Research on Rare Diseases (CIBERER), Barcelona, Spain
- Fatima Crispi
- BCNatal, Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), University of Barcelona, Barcelona, Spain
- Institut d’Investigacions Biomédiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Centre for Biomedical Research on Rare Diseases (CIBERER), Barcelona, Spain
- Gemma Piella
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- Mario Ceresa
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- European Commission, Joint Research Centre (JRC), Ispra, Italy
- Miguel A. González Ballester
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- ICREA, Barcelona, Spain
7. Duan T, Chen W, Ruan M, Zhang X, Shen S, Gu W. Unsupervised deep learning-based medical image registration: a survey. Phys Med Biol 2025; 70:02TR01. PMID: 39667278. DOI: 10.1088/1361-6560/ad9e69.
Abstract
In recent decades, medical image registration technology has undergone significant development, becoming one of the core technologies in medical image analysis. With the rise of deep learning, deep learning-based medical image registration methods have achieved revolutionary improvements in processing speed and automation, showing great potential, especially in unsupervised learning. This paper briefly introduces the core concepts of deep learning-based unsupervised image registration, followed by an in-depth discussion of innovative network architectures and a detailed review of these studies, highlighting their unique contributions. Additionally, this paper explores commonly used loss functions, datasets, and evaluation metrics. Finally, we discuss the main challenges faced by various categories and propose potential future research topics. This paper surveys the latest advancements in unsupervised deep neural network-based medical image registration methods, aiming to help active readers interested in this field gain a deep understanding of this exciting area.
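Two of the most commonly used ingredients in the surveyed unsupervised methods, an intensity-similarity term and a smoothness regularizer, look roughly as follows (a global NCC is shown for brevity; local windowed NCC is more common in practice).

```python
import torch

def ncc_loss(fixed, warped, eps=1e-8):
    """Global normalized cross-correlation similarity term."""
    f = fixed - fixed.mean()
    w = warped - warped.mean()
    ncc = (f * w).sum() / (f.norm() * w.norm() + eps)
    return 1 - ncc                       # 0 when perfectly correlated

def smoothness_loss(flow):
    """First-order diffusion regularizer on a (N, 2, H, W) displacement field."""
    dx = (flow[..., :, 1:] - flow[..., :, :-1]).pow(2).mean()
    dy = (flow[..., 1:, :] - flow[..., :-1, :]).pow(2).mean()
    return dx + dy

fixed, warped = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
flow = torch.zeros(1, 2, 64, 64, requires_grad=True)
total = ncc_loss(fixed, warped) + 0.1 * smoothness_loss(flow)
```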
Affiliation(s)
- Taisen Duan
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, People's Republic of China
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, People's Republic of China
- Wenkang Chen
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, People's Republic of China
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, People's Republic of China
- Meilin Ruan
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, People's Republic of China
- Xuejun Zhang
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, People's Republic of China
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, People's Republic of China
- Shaofei Shen
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, People's Republic of China
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, People's Republic of China
- Weiyu Gu
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, People's Republic of China
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, People's Republic of China
8. Wang Y, Feng Y, Zeng W. Non-Rigid Cycle Consistent Bidirectional Network with Transformer for Unsupervised Deformable Functional Magnetic Resonance Imaging Registration. Brain Sci 2025; 15:46. PMID: 39851414. PMCID: PMC11764259. DOI: 10.3390/brainsci15010046.
Abstract
BACKGROUND In neuroscience research on functional magnetic resonance imaging (fMRI), accurate inter-subject image registration is the basis for effective statistical analysis. Traditional fMRI registration methods are usually based on high-resolution structural MRI with clear anatomical features. However, registration based on structural information alone cannot achieve accurate functional consistency between subjects, since functional regions do not necessarily correspond to anatomical structures. In recent years, fMRI registration methods based on functional information have emerged, but they usually ignore the importance of structural MRI information. METHODS In this study, we propose a non-rigid cycle-consistent bidirectional network with a Transformer for unsupervised deformable functional MRI registration. The method achieves fMRI registration through structural MRI registration, with functional information introduced to improve registration performance. Specifically, we employ a bidirectional registration network that performs forward and reverse registration between image pairs, and we apply a Transformer in the registration network to establish remote spatial mappings between image voxels. Functional and structural information are integrated by introducing the local functional connectivity pattern: the local functional connectivity features of the whole brain are extracted as functional information. The proposed registration method was evaluated on real fMRI datasets, with qualitative and quantitative evaluations performed on the test dataset using relevant metrics. We performed group ICA analysis of brain functional networks after registration, and functional consistency was evaluated on the resulting t-maps. RESULTS Compared with non-learning-based methods (Affine, SyN) and learning-based methods (TransMorph-tiny, CycleMorph, VoxelMorph x2), our method improves the peak t-values of the t-maps on the DMN, VN, CEN, and SMN to 18.7, 16.5, 16.6, and 17.3, and the mean number of suprathreshold voxels (p < 0.05, t > 5.01) on the four networks to 2596.25; the average improvements in peak t-value over the five comparison methods are 23.79%, 12.74%, 12.27%, 7.32%, and 5.43%, respectively. CONCLUSIONS The experimental results show that the proposed registration method improves the structural and functional consistency between fMRIs with superior registration performance.
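The cycle-consistency idea in the bidirectional setup can be written compactly: warping an image forward and then backward should return the original. The sketch below is an image-space toy version; the paper also constrains the transformations themselves.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
    return F.grid_sample(img, grid + flow.permute(0, 2, 3, 1),
                         align_corners=True)

def cycle_consistency_loss(img_a, img_b, flow_ab, flow_ba):
    """Warp A toward B with the forward field, bring it back with the
    backward field, and penalize any departure from the original image
    (and symmetrically for B)."""
    a_cycle = warp(warp(img_a, flow_ab), flow_ba)
    b_cycle = warp(warp(img_b, flow_ba), flow_ab)
    return F.l1_loss(a_cycle, img_a) + F.l1_loss(b_cycle, img_b)

a, b = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
f_ab = torch.zeros(1, 2, 64, 64, requires_grad=True)
f_ba = torch.zeros(1, 2, 64, 64, requires_grad=True)
loss = cycle_consistency_loss(a, b, f_ab, f_ba)
```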
Affiliation(s)
- Weiming Zeng
- Lab of Digital Image and Intelligent Computation, College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China; (Y.W.); (Y.F.)
9. Zhong W, Ren X, Zhang H. Automatic X-ray teeth segmentation with grouped attention. Sci Rep 2025; 15:64. PMID: 39747360. PMCID: PMC11696191. DOI: 10.1038/s41598-024-84629-0.
Abstract
Detecting and segmenting teeth in X-ray images aids healthcare professionals in accurately determining the shape and growth trends of teeth. However, small dataset sizes due to patient privacy, high noise, and blurred boundaries between periodontal tissue and teeth pose challenges to models' transferability and generalizability, making them prone to overfitting. To address these issues, we propose a novel model, named the Grouped Attention and Cross-Layer Fusion Network (GCNet). GCNet effectively handles the numerous noise points and significant individual differences in the data, achieving stable and precise segmentation on small-scale datasets. The model comprises two core modules: the Grouped Global Attention (GGA) module and the Cross-Layer Fusion (CLF) module. The GGA module captures and groups texture and contour features, while the CLF module combines these features with deep semantic information to improve prediction. Experimental results on the Children's Dental Panoramic Radiographs dataset show that our model outperforms existing models such as GT-U-Net and Teeth U-Net, with a Dice coefficient of 0.9338, sensitivity of 0.9426, and specificity of 0.9821. The GCNet model also produces clearer segmentation boundaries than other models.
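The grouping idea behind GGA, splitting channels into groups that each attend globally over spatial positions, can be sketched as below. This illustrates channel-grouped attention in general, not the paper's exact module.

```python
import torch
import torch.nn as nn

class GroupedGlobalAttention(nn.Module):
    """Sketch of grouped attention: channels are split into G groups and
    each group runs its own global self-attention over spatial positions."""
    def __init__(self, channels, groups=4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.attn = nn.MultiheadAttention(channels // groups, num_heads=1,
                                          batch_first=True)

    def forward(self, x):                       # x: (N, C, H, W)
        n, c, h, w = x.shape
        g, cg = self.groups, c // self.groups
        tokens = x.view(n * g, cg, h * w).transpose(1, 2)  # (N*G, HW, C/G)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.transpose(1, 2).reshape(n, c, h, w)

x = torch.rand(2, 32, 16, 16)
y = GroupedGlobalAttention(32, groups=4)(x)     # same shape as input
```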
Affiliation(s)
- XiaoXiao Ren
- The University of New South Wales, Sydney, Australia
- HanWen Zhang
- The University of New South Wales, Sydney, Australia
10. Liu H, McKenzie E, Xu D, Xu Q, Chin RK, Ruan D, Sheng K. MUsculo-Skeleton-Aware (MUSA) deep learning for anatomically guided head-and-neck CT deformable registration. Med Image Anal 2025; 99:103351. PMID: 39388843. DOI: 10.1016/j.media.2024.103351.
Abstract
Deep-learning-based deformable image registration (DL-DIR) has demonstrated improved accuracy compared to time-consuming non-DL methods across various anatomical sites. However, DL-DIR is still challenging in heterogeneous tissue regions with large deformation. In fact, several state-of-the-art DL-DIR methods fail to capture the large, anatomically plausible deformation when tested on head-and-neck computed tomography (CT) images. These results allude to the possibility that such complex head-and-neck deformation may be beyond the capacity of a single network structure or a homogeneous smoothness regularization. To address the challenge of combined multi-scale musculoskeletal motion and soft tissue deformation in the head-and-neck region, we propose a MUsculo-Skeleton-Aware (MUSA) framework to anatomically guide DL-DIR by leveraging an explicit multiresolution strategy and inhomogeneous deformation constraints between the bony structures and soft tissue. The proposed method decomposes the complex deformation into a bulk posture change and a residual fine deformation, and it accommodates both inter- and intra-subject registration. Our results show that the MUSA framework can consistently improve registration accuracy and, more importantly, the plausibility of deformation for various network architectures. The code will be publicly available at https://github.com/HengjieLiu/DIR-MUSA.
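The bulk-plus-residual decomposition can be emulated by composing an affine (posture) grid with a dense residual field, as in this hedged 2D sketch; the affine matrix and residual values are placeholders, not the paper's parameterization.

```python
import torch
import torch.nn.functional as F

def bulk_plus_residual_warp(moving, theta, residual_flow):
    """Decomposed resampling: theta is a (N, 2, 3) affine matrix modeling
    the bulk posture change; residual_flow is a dense (N, 2, H, W) field
    for the remaining fine deformation (2D stand-in for the 3D case)."""
    affine_grid = F.affine_grid(theta, moving.shape, align_corners=True)
    grid = affine_grid + residual_flow.permute(0, 2, 3, 1)
    return F.grid_sample(moving, grid, align_corners=True)

moving = torch.rand(1, 1, 64, 64)
theta = torch.tensor([[[0.95, 0.05, 0.02],      # near-identity bulk transform
                       [-0.05, 0.95, -0.01]]])
residual = 0.01 * torch.randn(1, 2, 64, 64)     # small fine correction
warped = bulk_plus_residual_warp(moving, theta, residual)
```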
Affiliation(s)
- Hengjie Liu
- Physics and Biology in Medicine Graduate Program, University of California Los Angeles, Los Angeles, CA, USA; Department of Radiation Oncology, University of California Los Angeles, Los Angeles, CA, USA
- Elizabeth McKenzie
- Department of Radiation Oncology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Di Xu
- UCSF/UC Berkeley Graduate Program in Bioengineering, University of California San Francisco, San Francisco, CA, USA; Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Qifan Xu
- UCSF/UC Berkeley Graduate Program in Bioengineering, University of California San Francisco, San Francisco, CA, USA; Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Robert K Chin
- Department of Radiation Oncology, University of California Los Angeles, Los Angeles, CA, USA
- Dan Ruan
- Physics and Biology in Medicine Graduate Program, University of California Los Angeles, Los Angeles, CA, USA; Department of Radiation Oncology, University of California Los Angeles, Los Angeles, CA, USA
- Ke Sheng
- UCSF/UC Berkeley Graduate Program in Bioengineering, University of California San Francisco, San Francisco, CA, USA; Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA.
11. Chilaparasetti AN, Thai A, Gao P, Xu X, Gopi M. RegBoost: Enhancing mouse brain image registration using geometric priors and Laplacian interpolation. Neuroimage 2025; 305:120981. PMID: 39732220. DOI: 10.1016/j.neuroimage.2024.120981.
Abstract
We show in this work that incorporating geometric features and geometry processing algorithms into mouse brain image registration broadens the applicability of registration algorithms and improves the registration accuracy of existing methods. We refer to the preprocessing and postprocessing steps of our proposed framework as RegBoost. We develop a method to align the axes of 3D image stacks by detecting the central planes that pass symmetrically through the image volumes. We then find geometric contours by defining external and internal structures to facilitate image correspondences. We establish Dirichlet boundary conditions at these correspondences and find the displacement map throughout the volume using Laplacian interpolation. We discuss the challenges in our standalone framework and demonstrate how our new approaches can improve the results of existing image registration methods. We expect our new approach and algorithms to have critical applications in brain mapping projects.
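Laplacian interpolation with Dirichlet conditions has a very small core: known displacements are pinned at correspondence points and the field is made harmonic everywhere else. The toy solver below uses Jacobi iteration on one displacement component (np.roll gives crude periodic boundary handling; a sparse direct solve would be used at scale).

```python
import numpy as np

def laplacian_interpolate(shape, points, values, iters=2000):
    """Fill one displacement component over a 2D grid by solving Laplace's
    equation with Dirichlet conditions: known displacements at sparse
    correspondence points, harmonic (locally averaged) in between."""
    u = np.zeros(shape)
    fixed = np.zeros(shape, dtype=bool)
    for (r, c), v in zip(points, values):
        u[r, c], fixed[r, c] = v, True
    for _ in range(iters):
        avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                      np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u = np.where(fixed, u, avg)      # keep Dirichlet points pinned
    return u

# Two known correspondences; the field between them is smoothly interpolated
ux = laplacian_interpolate((32, 32), [(8, 8), (24, 24)], [1.0, -1.0])
```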
Affiliation(s)
- Andy Thai
- Department of Computer Science, University of California, Irvine, Irvine, CA 92617, USA.
- Pan Gao
- Department of Anatomy and Neurobiology, School of Medicine, University of California, Irvine, Irvine, CA 92617, USA.
- Xiangmin Xu
- Department of Computer Science, University of California, Irvine, Irvine, CA 92617, USA; Department of Anatomy and Neurobiology, School of Medicine, University of California, Irvine, Irvine, CA 92617, USA.
- M Gopi
- Department of Computer Science, University of California, Irvine, Irvine, CA 92617, USA.
12. Kim H, Lee M, Kim B, Shin YG, Chung M. Feature-centric registration of large deformed images using transformers and correlation distance. Comput Biol Med 2025; 184:109356. PMID: 39536389. DOI: 10.1016/j.compbiomed.2024.109356.
Abstract
In deformable medical image registration, both a robust backbone registration network and a suitable similarity metric are essential. This paper introduces a robust registration network combined with a feature-based loss function, specifically designed to handle large deformations and address the challenge of the absence of ground truth data. Tackling large deformations typically requires either expanding the receptive field or breaking down extensive deformations into smaller, more manageable ones. We address this challenge through two key network components: the coarse-to-fine estimation of the target displacement vector field (DVF) and the integration of the Transformer's feature attention mechanism. To further enhance registration performance, we propose a novel feature correlation-based distance metric that leverages the symmetric properties of the correlation matrix to efficiently exploit feature correlations. Additionally, by utilizing the features extracted directly from the registration network, we eliminate the need for additional feature extraction networks. Experimental results demonstrate that our feature correlation-based loss function is particularly effective in achieving accurate registration in the absence of ground truth data. Our method has proven successful in both mono-modality abdomen CT registration and brain MRI atlas registration, leading to improvements in Dice similarity coefficient and other evaluation metrics.
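One plausible reading of a correlation-based distance, shown purely as an illustration rather than the paper's definition, is to build the channel-wise correlation matrix between fixed and warped features and exploit the fact that it should be near-symmetric and strongly diagonal under good alignment.

```python
import torch

def correlation_distance(feat_fixed, feat_warped, eps=1e-8):
    """Hypothetical feature-correlation distance: normalize channels, form
    the C x C correlation matrix between the two feature maps, then reward
    a strong diagonal and penalize asymmetry."""
    n, c = feat_fixed.shape[:2]
    f = feat_fixed.reshape(n, c, -1)
    w = feat_warped.reshape(n, c, -1)
    f = (f - f.mean(-1, keepdim=True)) / (f.std(-1, keepdim=True) + eps)
    w = (w - w.mean(-1, keepdim=True)) / (w.std(-1, keepdim=True) + eps)
    corr = torch.bmm(f, w.transpose(1, 2)) / f.shape[-1]   # (N, C, C)
    asym = (corr - corr.transpose(1, 2)).pow(2).mean()      # symmetry penalty
    diag = 1 - corr.diagonal(dim1=1, dim2=2).mean()         # align same channels
    return diag + asym

loss = correlation_distance(torch.rand(1, 8, 32, 32), torch.rand(1, 8, 32, 32))
```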
Affiliation(s)
- Heeyeon Kim
- School of Software, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, 06978, Seoul, Republic of Korea
- Minkyung Lee
- Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, 08826, Seoul, Republic of Korea
- Bohyoung Kim
- Division of Biomedical Engineering, Hankuk University of Foreign Studies, 81 Oedae-ro, Mohyeon-myeon, Cheoin-gu, Yongin-si, 17035, Gyeonggi-do, Republic of Korea
- Yeong-Gil Shin
- Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, 08826, Seoul, Republic of Korea
- Minyoung Chung
- School of Software, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, 06978, Seoul, Republic of Korea.
13. Jiang P, Wu S, Qin W, Xie Y. Complex Large-Deformation Multimodality Image Registration Network for Image-Guided Radiotherapy of Cervical Cancer. Bioengineering (Basel) 2024; 11:1304. PMID: 39768121. PMCID: PMC11726759. DOI: 10.3390/bioengineering11121304.
Abstract
In recent years, image-guided brachytherapy has become an important treatment method for patients with locally advanced cervical cancer, and multi-modality image registration technology is a key step in this system. However, due to patient movement and other factors, the deformation between images of different modalities is discontinuous, which brings great difficulty to the registration of pelvic computed tomography (CT) and magnetic resonance (MR) images. In this paper, we propose a multimodality image registration network based on multistage transformation enhancement features (MTEF) to maintain the continuity of the deformation field. The model uses the wavelet transform to extract different components of the image and performs fusion and enhancement processing to form the input to the model. The model performs multiple registrations from local to global regions. We then propose a novel shared pyramid registration network that can accurately extract features from different modalities, optimizing the predicted deformation field through progressive refinement. To improve registration performance, we also propose a deep learning similarity measurement method combined with bistructural morphology: bistructural morphology is added to the model to train a pelvic-area registration evaluator, from which the model obtains loss-function parameters covering large deformations. The model was verified on actual clinical data from cervical cancer patients. In extensive experiments, our proposed model achieved the highest Dice similarity coefficient (DSC) among state-of-the-art registration methods; the DSC of the MTEF algorithm is 5.64% higher than that of the TransMorph algorithm. The method effectively integrates multi-modal image information and improves the accuracy of tumor localization, and it stands to benefit more cervical cancer patients.
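The wavelet-based input construction can be pictured as decomposing each modality into subbands and stacking them as channels. The sketch below uses a plain 2D Haar DWT from PyWavelets as a stand-in for the paper's fusion-and-enhancement preprocessing.

```python
import numpy as np
import pywt

def wavelet_channels(image, wavelet="haar"):
    """Decompose a 2D slice into approximation and detail subbands and
    stack them as input channels (a simple stand-in for the paper's
    fusion-and-enhancement preprocessing)."""
    cA, (cH, cV, cD) = pywt.dwt2(image, wavelet)
    return np.stack([cA, cH, cV, cD])          # (4, H/2, W/2)

ct_slice = np.random.rand(64, 64)
mr_slice = np.random.rand(64, 64)
# Concatenate the two modalities' subbands into one multi-channel input
model_input = np.concatenate([wavelet_channels(ct_slice),
                              wavelet_channels(mr_slice)])   # (8, 32, 32)
```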
Affiliation(s)
- Ping Jiang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; (P.J.); (S.W.); (W.Q.)
- University of Chinese Academy of Sciences, Beijing 100049, China
- Sijia Wu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; (P.J.); (S.W.); (W.Q.)
- University of Chinese Academy of Sciences, Beijing 100049, China
- Wenjian Qin
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; (P.J.); (S.W.); (W.Q.)
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yaoqin Xie
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; (P.J.); (S.W.); (W.Q.)
- University of Chinese Academy of Sciences, Beijing 100049, China
14. Wang Z, Wang H, Ni D, Xu M, Wang Y. Encoding matching criteria for cross-domain deformable image registration. Med Phys 2024. PMID: 39688347. DOI: 10.1002/mp.17565.
Abstract
BACKGROUND Most existing deep learning-based registration methods are trained on single-type images to address same-domain tasks, resulting in performance degradation when applied to new scenarios. Retraining a model for new scenarios requires extra time and data. Therefore, efficient and accurate solutions for cross-domain deformable registration are in demand. PURPOSE We argue that the tailor-made matching criteria in traditional registration methods are one of the main reasons they are applicable across different domains. Motivated by this, we devise a registration-oriented encoder to model the matching criteria of image features and structural features, which is beneficial for boosting registration accuracy and adaptability. METHODS Specifically, a general feature encoder (Encoder-G) is proposed to capture comprehensive medical image features, while a structural feature encoder (Encoder-S) is designed to encode the structural self-similarity into the global representation. Moreover, by updating Encoder-S using one-shot learning, our method can effectively adapt to different domains. The efficacy of our method is evaluated using MRI images from three different domains: brain images (training/testing: 870/90 pairs), abdomen images (training/testing: 1406/90 pairs), and cardiac images (training/testing: 64770/870 pairs). The comparison methods include a traditional method (SyN) and cutting-edge deep networks. The evaluation metrics are the Dice similarity coefficient (DSC) and average symmetric surface distance (ASSD). RESULTS In the single-domain task, our method attains an average DSC of 68.9%/65.2%/72.8% and an ASSD of 9.75/3.82/1.30 mm on abdomen/cardiac/brain images, outperforming the second-best comparison methods by large margins. In the cross-domain task, without one-shot optimization, our method outperforms other deep networks in five of six cross-domain scenarios and even surpasses the symmetric image normalization method (SyN) in two scenarios. With one-shot optimization, our method surpasses SyN in all six cross-domain scenarios. CONCLUSIONS Our method yields favorable results in the single-domain task while ensuring improved generalization and adaptation performance in the cross-domain task, showing its feasibility for challenging cross-domain registration applications. The code is publicly available at https://github.com/JuliusWang-7/EncoderReg.
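Structural self-similarity can be encoded by comparing each location's neighbourhood with shifted copies of itself, in the spirit of MIND/SSC descriptors. The sketch below is a generic 2D illustration of that family of features, not Encoder-S itself.

```python
import torch
import torch.nn.functional as F

def self_similarity_descriptor(img, sigma=0.5):
    """Simple structural self-similarity feature: for each pixel, compare a
    local patch with its four axial neighbours via patch-averaged SSD and
    turn the distances into a descriptor (illustrative only)."""
    shifts = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    feats = []
    for dy, dx in shifts:
        shifted = torch.roll(img, shifts=(dy, dx), dims=(-2, -1))
        ssd = F.avg_pool2d((img - shifted) ** 2, 3, stride=1, padding=1)
        feats.append(torch.exp(-ssd / sigma))   # high where locally self-similar
    return torch.cat(feats, dim=1)              # (N, 4, H, W) descriptor

desc = self_similarity_descriptor(torch.rand(1, 1, 64, 64))
```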
Affiliation(s)
- Zhuoyuan Wang
- Smart Medical Imaging, Learning and Engineering (SMILE) Lab, Medical UltraSound Image Computing (MUSIC) Lab, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Haiqiao Wang
- Smart Medical Imaging, Learning and Engineering (SMILE) Lab, Medical UltraSound Image Computing (MUSIC) Lab, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Dong Ni
- Smart Medical Imaging, Learning and Engineering (SMILE) Lab, Medical UltraSound Image Computing (MUSIC) Lab, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Ming Xu
- Department of Medical Ultrasound, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Yi Wang
- Smart Medical Imaging, Learning and Engineering (SMILE) Lab, Medical UltraSound Image Computing (MUSIC) Lab, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
15. Zhong W, Zhang H. EF-net: Accurate edge segmentation for segmenting COVID-19 lung infections from CT images. Heliyon 2024; 10:e40580. PMID: 39669151. PMCID: PMC11635652. DOI: 10.1016/j.heliyon.2024.e40580.
Abstract
Despite advances in modern medicine, including the use of computed tomography for detecting COVID-19, precise identification and segmentation of lesions remain a significant challenge owing to indistinct boundaries and the low contrast between infected and healthy lung tissue. This study introduces a novel model, the edge-based dual-parallel attention (EDA)-guided feature-filtering network (EF-Net), specifically designed to accurately segment the edges of COVID-19 lesions. The proposed model comprises two modules: an EDA module and a feature-filtering module (FFM). The EDA module efficiently extracts structural and textural features from low-level features, enabling precise identification of lesion boundaries. The FFM receives semantically rich features from the deep-level encoder and integrates them with the texture- and contour-rich features obtained from the EDA module. After filtering through the FFM's gating mechanism, the EDA features are fused with the deep-level features, yielding features rich in both semantic and textural information. Experiments demonstrate that our model outperforms existing models, including Inf_Net, GFNet, and BSNet, across various metrics, offering better and clearer segmentation results, particularly for lesion edges. Moreover, the model achieves superior performance on three datasets, with Dice coefficients of 98.1%, 97.3%, and 72.1%.
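The gating step of the FFM can be sketched as a learned sigmoid mask, computed from deep semantic features, that filters the edge/texture stream before fusion; the module below is an assumption-laden illustration of that pattern, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GatedFeatureFilter(nn.Module):
    """Sketch of FFM-style gating: a gate computed from deep semantic
    features decides how much of the edge/texture features to let through
    before the two streams are fused."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, edge_feat, deep_feat):
        filtered = self.gate(deep_feat) * edge_feat     # suppress noisy edges
        return self.fuse(torch.cat([filtered, deep_feat], dim=1))

edge, deep = torch.rand(1, 16, 32, 32), torch.rand(1, 16, 32, 32)
fused = GatedFeatureFilter(16)(edge, deep)
```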
16. Huang X, Zhang J, Tang K, Cheng X, Ye C, Wang L. Multilevel network for large deformation image registration based on feature consistency and flow normalization. Med Phys 2024; 51:8962-8978. PMID: 39302604. DOI: 10.1002/mp.17390.
Abstract
BACKGROUND Deformable image registration is an essential technique in medical image analysis that plays an important role in several clinical applications. Existing deep learning-based registration methods have achieved promising performance for registrations with small deformations, but large-deformation registration remains challenging due to the limits of image intensity-similarity-based objective functions. PURPOSE To achieve image registration with large-scale deformations, we propose a multilevel network architecture, FCNet, that gradually refines the registration results based on a semantic feature consistency constraint and a flow normalization (FN) strategy. METHODS At each level of FCNet, the architecture is mainly composed of a FeaExtractor, a FN module, and a spatial transformation module. The FeaExtractor consists of three parallel streams that extract the individual features of the fixed and moving images as well as their joint features. Using these features, an initial deformation field is estimated and then passed through a FN module that refines the field based on the difference map of the deformation fields between two adjacent levels. This allows FCNet to progressively improve registration performance. Finally, a spatial transformation module produces the warped image from the deformation field. Moreover, in addition to the image intensity-similarity-based objective function, a semantic-feature consistency constraint is introduced, which further promotes alignment by imposing similarity between the fixed and warped image features. To validate the effectiveness of the proposed method, we compared it with state-of-the-art methods on three datasets. In the EMPIRE10 dataset, 20, 3, and 7 fixed and moving 3D computed tomography (CT) image pairs were used for training, validation, and testing, respectively; in the IXI dataset, an atlas-to-individual registration task was performed, with 3D MR images of 408, 58, and 115 individuals used for training, validation, and testing, respectively; in the in-house dataset, a patient-to-atlas registration task was implemented, with 3D MR images of 94, 3, and 15 individuals as the training, validation, and testing sets, respectively. RESULTS The qualitative and quantitative comparisons demonstrate that the proposed method is beneficial for handling large-deformation image registration problems, with DSC and ASSD improved by at least 1.0% and 25.9% on the EMPIRE10 dataset. Ablation experiments also verified the effectiveness of the proposed feature combination strategy, feature consistency constraint, and FN module. CONCLUSIONS Our proposed FCNet enables multiscale registration from coarse to fine, surpassing existing SOTA registration methods and effectively handling long-range spatial relationships.
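The semantic-feature consistency constraint amounts to asking that fixed-image features and warped-moving-image features agree at every level; one hedged way to write it is a multi-level cosine penalty, as below.

```python
import torch
import torch.nn.functional as F

def feature_consistency_loss(fixed_feats, warped_feats):
    """Encourage high cosine similarity between fixed-image features and
    warped-moving-image features at every level of a feature pyramid
    (a hedged reading of the constraint, not the paper's exact form)."""
    loss = 0.0
    for f, w in zip(fixed_feats, warped_feats):
        cos = F.cosine_similarity(f.flatten(2), w.flatten(2), dim=1)
        loss = loss + (1 - cos).mean()
    return loss / len(fixed_feats)

feats_f = [torch.rand(1, 8, 32, 32), torch.rand(1, 16, 16, 16)]
feats_w = [torch.rand(1, 8, 32, 32), torch.rand(1, 16, 16, 16)]
loss = feature_consistency_loss(feats_f, feats_w)
```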
Affiliation(s)
- Xingyu Huang
- Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Jian Zhang
- Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Kun Tang
- Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Xinyu Cheng
- Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Chen Ye
- Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Lihui Wang
- Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
17. Jiang J, Choi CMS, Thor M, Deasy JO, Veeraraghavan H. Tumor aware recurrent inter-patient deformable image registration of computed tomography scans with lung cancer. Med Phys 2024. PMID: 39589333. DOI: 10.1002/mp.17536.
Abstract
BACKGROUND Voxel-based analysis (VBA) for population-level radiotherapy (RT) outcomes modeling requires topology-preserving inter-patient deformable image registration (DIR) that preserves tumors in moving images while avoiding unrealistic deformations due to tumors occurring in fixed images. PURPOSE We developed a tumor-aware recurrent registration (TRACER) deep learning (DL) method and evaluated its suitability for VBA. METHODS TRACER consists of encoder layers implemented with a stacked 3D convolutional long short-term memory network (3D-CLSTM), followed by decoder and spatial transform layers to compute a dense deformation vector field (DVF). Multiple CLSTM steps are used to compute a progressive sequence of deformations. Input conditioning was applied by including tumor segmentations with the 3D image pairs as input channels. Bidirectional tumor rigidity, image similarity, and deformation smoothness losses were used to optimize the network in an unsupervised manner. TRACER and multiple DL methods were trained with 204 3D computed tomography (CT) image pairs from patients with lung cancers (LC) and evaluated using (a) Dataset I (N = 308 pairs) with DL-segmented LCs, (b) Dataset II (N = 765 pairs) with manually delineated LCs, and (c) Dataset III with 42 LC patients treated with RT. RESULTS TRACER accurately aligned normal tissues. It best preserved tumors, indicated by the smallest tumor volume differences of 0.24%, 0.40%, and 0.13% and mean square errors in CT intensity of 0.005, 0.005, and 0.004, computed between original and resampled moving-image tumors for Datasets I, II, and III, respectively. It resulted in the smallest planned RT tumor dose differences, computed between original and resampled moving images, of 0.01 and 0.013 Gy when using a female and a male reference, respectively. CONCLUSIONS TRACER is a suitable method for inter-patient registration involving LC occurring in both fixed and moving images and is applicable to voxel-based analysis methods.
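A bidirectional tumor rigidity loss is richer than what fits here, but the basic mechanism, suppressing deformation gradients inside the tumor mask so the tumor is carried along without distortion, can be sketched as below (the sketch is translation-only; a full rigidity penalty would also permit rotation).

```python
import torch

def tumor_rigidity_loss(flow, tumor_mask):
    """Crude rigidity surrogate: penalize spatial variation of the
    displacement field inside the tumor mask so the tumor region moves
    as one piece. flow: (N, 2, H, W); tumor_mask: (N, 1, H, W) in {0, 1}."""
    dx = (flow[..., :, 1:] - flow[..., :, :-1]).abs()
    dy = (flow[..., 1:, :] - flow[..., :-1, :]).abs()
    mx = tumor_mask[..., :, 1:] * tumor_mask[..., :, :-1]
    my = tumor_mask[..., 1:, :] * tumor_mask[..., :-1, :]
    return (dx * mx).sum() / (mx.sum() + 1e-8) + \
           (dy * my).sum() / (my.sum() + 1e-8)

flow = torch.randn(1, 2, 64, 64, requires_grad=True)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 20:40, 20:40] = 1        # toy tumor region
loss = tumor_rigidity_loss(flow, mask)
```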
Affiliation(s)
- Jue Jiang
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Chloe Min Seo Choi
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Maria Thor
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Joseph O Deasy
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Harini Veeraraghavan
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
18. Sun H, Chen L, Li J, Yang Z, Zhu J, Wang Z, Ren G, Cai J, Zhao L. Synthesis of pseudo-PET/CT fusion images in radiotherapy based on a new transformer model. Med Phys 2024. PMID: 39569842. DOI: 10.1002/mp.17512.
Abstract
BACKGROUND PET/CT and planning CT are commonly used medical images in radiotherapy for esophageal and nasopharyngeal cancer. However, repeated scans expose patients to additional radiation dose and introduce registration errors, so this multimodal treatment approach stands to be further improved. PURPOSE A new Transformer model is proposed to obtain pseudo-PET/CT fusion images for esophageal and nasopharyngeal cancer radiotherapy. METHODS Data from 129 esophageal cancer cases and 141 nasopharyngeal cancer cases were retrospectively selected for training, validation, and testing. PET and CT images are used as input. Based on a Transformer model with a "focus-disperse" attention mechanism and multi-consistency loss constraints, the feature information in the two images is effectively captured, ultimately yielding pseudo-PET/CT fusion images with enhanced imaging of the tumor region. During the testing phase, the accuracy of the pseudo-PET/CT fusion images was verified anatomically and dosimetrically, and two prospective cases were selected for further dose verification. RESULTS For anatomical verification, the PET/CT fusion image obtained using a wavelet fusion algorithm, corrected by clinicians, served as the ground truth. The evaluation metrics between the pseudo-fused images from the proposed model and the ground truth, reported as mean (standard deviation), are a peak signal-to-noise ratio of 37.82 (1.57), structural similarity index of 95.23 (2.60), mean absolute error of 29.70 (2.49), and normalized root mean square error of 9.48 (0.32). These values outperform those of state-of-the-art deep learning comparison models. For dosimetric validation, based on a 3%/2 mm gamma analysis, the average passing rates in the global and tumor regions between the pseudo-fused images (with a PET/CT weight ratio of 2:8) and the planning CT images are 97.2% and 95.5%, respectively, superior to pseudo-PET/CT fusion images with other weight ratios. CONCLUSIONS The pseudo-PET/CT fusion images obtained with the proposed model hold promise as a new modality in radiotherapy for esophageal and nasopharyngeal cancer.
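Two of the reported anatomical-verification metrics, PSNR and SSIM, can be computed directly with scikit-image; the snippet below uses random arrays as stand-ins for real synthesized and reference images.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Compare a synthesized fusion image against its reference, as in the
# anatomical verification step (random arrays stand in for real images).
reference = np.random.rand(128, 128)
synthesized = reference + 0.05 * np.random.rand(128, 128)

data_range = synthesized.max() - synthesized.min()
psnr = peak_signal_noise_ratio(reference, synthesized, data_range=data_range)
ssim = structural_similarity(reference, synthesized, data_range=data_range)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```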
Collapse
Affiliation(s)
- Hongfei Sun
- Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
| | - Liting Chen
- Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
| | - Jie Li
- Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
| | - Zhi Yang
- Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
| | - Jiarui Zhu
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
| | - Zhongfei Wang
- Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
| | - Ge Ren
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
| | - Jing Cai
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
| | - Lina Zhao
- Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
| |
Collapse
|
19
|
Ren M, Xue P, Ji H, Zhang Z, Dong E. Pulmonary CT Registration Network Based on Deformable Cross Attention. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024:10.1007/s10278-024-01324-2. [PMID: 39528889 DOI: 10.1007/s10278-024-01324-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/20/2024] [Revised: 10/26/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
Current Transformer structures utilize the self-attention mechanism to model global contextual relevance within an image, which has influenced medical image registration. However, the use of Transformers for large-deformation lung CT registration remains relatively straightforward: these models focus only on single-image feature representation and neglect to employ attention to capture cross-image correspondence, which hinders further improvement in registration performance. To address these limitations, we propose a novel cascaded registration method, the Cascaded Swin Deformable Cross Attention Transformer-based U-shape structure (SD-CATU), to address the challenge of large deformations in lung CT registration. In SD-CATU, we introduce a Cross Attention-based Transformer (CAT) block that incorporates the Shifted Regions Multihead Cross-attention (SR-MCA) mechanism to flexibly exchange feature information and thus reduce computational complexity. In addition, a consistency constraint in the loss function ensures the preservation of topology and the inverse consistency of the transformations. Experiments with public lung datasets demonstrate that the cascaded SD-CATU outperforms current state-of-the-art registration methods (Dice similarity coefficient of 93.19% and target registration error of 0.98 mm). The results further highlight the potential for obtaining excellent registration accuracy while ensuring desirable smoothness and consistency in the deformed images.
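The core idea of cross-image attention, queries from one image attending to keys/values from the other, can be sketched as follows. This simplified block uses plain global multi-head cross-attention for clarity; the paper's SR-MCA additionally restricts attention to shifted local regions, and the dimensions here are arbitrary.

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    def __init__(self, dim=96, heads=4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, fixed_tokens, moving_tokens):
        # fixed_tokens, moving_tokens: (B, N, dim) feature tokens.
        q = self.norm_q(fixed_tokens)        # queries from the fixed image
        kv = self.norm_kv(moving_tokens)     # keys/values from the moving image
        out, _ = self.attn(q, kv, kv)
        return fixed_tokens + out            # residual connection
```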
Collapse
Affiliation(s)
- Meirong Ren
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
| | - Peng Xue
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
| | - Huizhong Ji
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
| | - Zhili Zhang
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
| | - Enqing Dong
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China.
| |
Collapse
|
20
|
Long L, Xue X, Xiao H. CCMNet: Cross-scale correlation-aware mapping network for 3D lung CT image registration. Comput Biol Med 2024; 182:109103. [PMID: 39244962 DOI: 10.1016/j.compbiomed.2024.109103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2024] [Revised: 08/04/2024] [Accepted: 09/01/2024] [Indexed: 09/10/2024]
Abstract
The lung is characterized by high elasticity and a complex structure, which implies that it can undergo complex deformations with substantial shape variability. Large-deformation estimation therefore poses significant challenges to lung image registration. The traditional U-Net architecture struggles to cover complex deformations due to its limited receptive field. Moreover, the relationship between voxels weakens as the number of downsampling steps increases, known as the long-range dependency issue. In this paper, we propose a novel multilevel registration framework which enhances the correspondence between voxels to improve the ability to estimate large deformations. Our approach consists of a convolutional neural network (CNN) with a two-stream registration structure and a cross-scale mapping attention (CSMA) mechanism. The former extracts robust features of image pairs within layers, while the latter establishes frequent connections between layers to maintain the correlation of the image pairs. This method fully utilizes context information at different scales to establish the mapping relationship between low-resolution and high-resolution feature maps. We achieved remarkable results on the DIRLAB (TRE 1.56 ± 1.60) and POPI (NCC 99.72%, SSIM 91.42%) datasets, demonstrating that this strategy can effectively address large-deformation issues, mitigate the long-range dependency problem, and ultimately achieve more robust lung CT image registration.
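One simple way to keep a connection between coarse and fine feature maps, in the spirit of cross-scale mapping, is to upsample the coarse map and let a learned gate decide how to mix it with the fine map. The gating design below is an assumption for illustration, not the paper's CSMA module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleFusion(nn.Module):
    def __init__(self, coarse_ch, fine_ch):
        super().__init__()
        self.proj = nn.Conv3d(coarse_ch, fine_ch, kernel_size=1)
        self.gate = nn.Conv3d(fine_ch * 2, fine_ch, kernel_size=1)

    def forward(self, coarse, fine):
        # Upsample the low-resolution map to the fine map's spatial size.
        up = F.interpolate(coarse, size=fine.shape[2:], mode='trilinear',
                           align_corners=False)
        up = self.proj(up)
        # Channel-wise gate mixing the two scales at every voxel.
        attn = torch.sigmoid(self.gate(torch.cat([up, fine], dim=1)))
        return fine * attn + up * (1 - attn)
```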
Collapse
Affiliation(s)
- Li Long
- School of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China
| | - Xufeng Xue
- School of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China
| | - Hanguang Xiao
- School of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China.
| |
Collapse
|
21
|
Bai X, Wang H, Qin Y, Han J, Yu N. SparseMorph: A weakly-supervised lightweight sparse transformer for mono- and multi-modal deformable image registration. Comput Biol Med 2024; 182:109205. [PMID: 39332116 DOI: 10.1016/j.compbiomed.2024.109205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2024] [Revised: 05/14/2024] [Accepted: 09/22/2024] [Indexed: 09/29/2024]
Abstract
PURPOSE Deformable image registration (DIR) is crucial for improving the precision of clinical diagnosis. Recent Transformer-based DIR methods have shown promising performance by capturing long-range dependencies. Nevertheless, these methods still grapple with high computational complexity. This work aims to enhance the performance of DIR in both computational efficiency and registration accuracy. METHODS We proposed a weakly-supervised lightweight Transformer model, named SparseMorph. To reduce computational complexity without compromising the representative feature-capture ability, we designed a sparse multi-head self-attention (SMHA) mechanism. To accumulate representative features while preserving high computational efficiency, we constructed a multi-branch multi-layer perception (MMLP) module. Additionally, we developed an anatomically-constrained weakly-supervised strategy to guide the alignment of regions-of-interest in mono- and multi-modal images. RESULTS We assessed SparseMorph in terms of registration accuracy and computational complexity. On the mono-modal brain datasets IXI and OASIS, SparseMorph outperforms the state-of-the-art method TransMatch with improvements of 3.2% and 2.9% in DSC scores, respectively. Moreover, on the multi-modal cardiac dataset MMWHS, SparseMorph shows DSC score improvements of 9.7% and 11.4% compared to TransMatch in MRI-to-CT and CT-to-MRI registration tasks, respectively. Notably, SparseMorph attains these performance advantages while utilizing 33.33% of the parameters of TransMatch. CONCLUSIONS The proposed weakly-supervised deformable image registration model, SparseMorph, demonstrates efficiency in both mono- and multi-modal registration tasks, exhibits superior performance compared to state-of-the-art algorithms, and establishes an effective DIR method for clinical applications.
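A common way to sparsify self-attention is to let each query attend only to its k highest-scoring keys. The sketch below illustrates that selection rule (it still materializes the full score matrix, so it shows the mechanism rather than the memory savings); SparseMorph's actual SMHA design may differ.

```python
import torch

def topk_sparse_attention(q, k, v, topk=16):
    # q, k, v: (B, N, dim); requires topk <= N.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, N, N)
    vals, idx = scores.topk(topk, dim=-1)
    # Mask out everything except each query's top-k keys.
    sparse = torch.full_like(scores, float('-inf'))
    sparse.scatter_(-1, idx, vals)
    attn = sparse.softmax(dim=-1)  # zeros outside the top-k support
    return attn @ v
```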
Collapse
Affiliation(s)
- Xinhao Bai
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, Tianjin, 300350, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Shenzhen, 518083, China
| | - Hongpeng Wang
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, Tianjin, 300350, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Shenzhen, 518083, China
| | - Yanding Qin
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, Tianjin, 300350, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Shenzhen, 518083, China
| | - Jianda Han
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, Tianjin, 300350, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Shenzhen, 518083, China
| | - Ningbo Yu
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, Tianjin, 300350, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Shenzhen, 518083, China.
| |
Collapse
|
22
|
Tan Z, Zhang L, Lv Y, Ma Y, Lu H. GroupMorph: Medical Image Registration via Grouping Network With Contextual Fusion. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:3807-3819. [PMID: 38739510 DOI: 10.1109/tmi.2024.3400603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2024]
Abstract
Pyramid-based deformation decomposition is a promising registration framework, which gradually decomposes the deformation field into multi-resolution subfields for precise registration. However, most pyramid-based methods directly produce one subfield per resolution level, which does not fully depict the spatial deformation. In this paper, we propose a novel registration model called GroupMorph. Different from typical pyramid-based methods, we adopt a grouping-combination strategy to predict the deformation field at each resolution. Specifically, we perform group-wise correlation calculation to measure the similarities of grouped features. After that, n groups of deformation subfields with different receptive fields are predicted in parallel. By composing these subfields, a deformation field with multi-receptive-field ranges is formed, which can effectively identify both large and small deformations. Meanwhile, a contextual fusion module is designed to fuse the contextual features and provide inter-group information to the field estimator of the next level. By leveraging the inter-group correspondence, the synergy among deformation subfields is enhanced. Extensive experiments on four public datasets demonstrate the effectiveness of GroupMorph. Code is available at https://github.com/TVayne/GroupMorph.
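Composing deformation subfields (rather than simply summing them) can be done by warping one displacement field with the other. The following is a minimal 3D sketch under the assumption that displacements are expressed in grid_sample's normalized [-1, 1] coordinates with (x, y, z) channel order; it is a generic composition utility, not GroupMorph's code.

```python
import torch
import torch.nn.functional as F

def make_grid(shape, device):
    # Normalized identity grid in [-1, 1], shape (1, D, H, W, 3).
    vectors = [torch.linspace(-1, 1, s, device=device) for s in shape]
    grid = torch.stack(torch.meshgrid(*vectors, indexing='ij'), dim=-1)
    return grid.unsqueeze(0).flip(-1)  # grid_sample expects (x, y, z) order

def compose(flow_a, flow_b, grid):
    # flow_a, flow_b: (B, D, H, W, 3) displacements in normalized coords.
    # Returns the displacement of "apply flow_b, then flow_a":
    # u(x) = u_a(x + u_b(x)) + u_b(x).
    warped_a = F.grid_sample(flow_a.permute(0, 4, 1, 2, 3),
                             grid + flow_b, align_corners=True)
    return warped_a.permute(0, 2, 3, 4, 1) + flow_b
```

Folding n subfields into one field is then a left fold of compose over the group, e.g. starting from a zero field and composing each predicted subfield in turn.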
Collapse
|
23
|
Liu Y, Chen J, Zuo L, Carass A, Prince JL. Vector field attention for deformable image registration. J Med Imaging (Bellingham) 2024; 11:064001. [PMID: 39513093 PMCID: PMC11540117 DOI: 10.1117/1.jmi.11.6.064001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2024] [Revised: 09/30/2024] [Accepted: 10/16/2024] [Indexed: 11/15/2024] Open
Abstract
Purpose Deformable image registration establishes non-linear spatial correspondences between fixed and moving images. Deep learning-based deformable registration methods have been widely studied in recent years due to their speed advantage over traditional algorithms as well as their better accuracy. Most existing deep learning-based methods require neural networks to encode location information in their feature maps and predict displacement or deformation fields through convolutional or fully connected layers from these high-dimensional feature maps. We present vector field attention (VFA), a novel framework that enhances the efficiency of the existing network design by enabling direct retrieval of location correspondences. Approach VFA uses neural networks to extract multi-resolution feature maps from the fixed and moving images and then retrieves pixel-level correspondences based on feature similarity. The retrieval is achieved with a novel attention module without the need for learnable parameters. VFA is trained end-to-end in either a supervised or unsupervised manner. Results We evaluated VFA for intra- and inter-modality registration and unsupervised and semi-supervised registration using public datasets as well as the Learn2Reg challenge. VFA demonstrated comparable or superior registration accuracy compared with several state-of-the-art methods. Conclusions VFA offers a novel approach to deformable image registration by directly retrieving spatial correspondences from feature maps, leading to improved performance in registration tasks. It holds potential for broader applications.
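The retrieval step described above, correspondence from feature similarity with no learnable parameters, can be sketched as a softmax over candidate offsets. This is a schematic rendering of the idea under assumed tensor shapes, not the authors' module.

```python
import torch
import torch.nn.functional as F

def retrieve_displacement(feat_fixed, feat_moving_window, offsets):
    # feat_fixed: (B, N, C) one feature per fixed-image location.
    # feat_moving_window: (B, N, K, C) K candidate moving-image features
    #   gathered from a search window around each location.
    # offsets: (K, 3) displacement associated with each candidate.
    scores = torch.einsum('bnc,bnkc->bnk', feat_fixed, feat_moving_window)
    weights = F.softmax(scores / feat_fixed.shape[-1] ** 0.5, dim=-1)
    # Predicted displacement = similarity-weighted average of offsets.
    return torch.einsum('bnk,kd->bnd', weights, offsets)  # (B, N, 3)
```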
Collapse
Affiliation(s)
- Yihao Liu
- Johns Hopkins University, Department of Electrical and Computer Engineering, Baltimore, Maryland, United States
| | - Junyu Chen
- Johns Hopkins School of Medicine, Department of Radiology and Radiological Science, Baltimore, Maryland, United States
| | - Lianrui Zuo
- Johns Hopkins University, Department of Electrical and Computer Engineering, Baltimore, Maryland, United States
- National Institute on Aging, National Institute of Health, Laboratory of Behavioral Neuroscience, Baltimore, Maryland, United States
| | - Aaron Carass
- Johns Hopkins University, Department of Electrical and Computer Engineering, Baltimore, Maryland, United States
| | - Jerry L. Prince
- Johns Hopkins University, Department of Electrical and Computer Engineering, Baltimore, Maryland, United States
| |
Collapse
|
24
|
Han T, Wu J, Sheng P, Li Y, Tao Z, Qu L. Deep coupled registration and segmentation of multimodal whole-brain images. Bioinformatics 2024; 40:btae606. [PMID: 39400311 PMCID: PMC11543610 DOI: 10.1093/bioinformatics/btae606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2024] [Revised: 08/07/2024] [Accepted: 10/11/2024] [Indexed: 10/15/2024] Open
Abstract
MOTIVATION Recent brain mapping efforts are producing large-scale whole-brain images using different imaging modalities. Accurate alignment and delineation of anatomical structures in these images are essential for numerous studies. These requirements are typically modeled as two distinct tasks: registration and segmentation. However, prevailing methods fail to fully explore and utilize the inherent correlation and complementarity between the two tasks. Furthermore, variations in brain anatomy, brightness, and texture pose another formidable challenge in designing multi-modal similarity metrics. A high-throughput approach capable of overcoming the bottleneck of multi-modal similarity metric design, while effectively leveraging the highly correlated and complementary nature of the two tasks, is highly desirable. RESULTS We introduce a deep learning framework for joint registration and segmentation of multi-modal brain images. Under this framework, the registration and segmentation tasks are deeply coupled and collaborate at two hierarchical layers. In the inner layer, we establish a strong feature-level coupling between the two tasks by learning a unified common latent feature representation. In the outer layer, we introduce a mutually supervised dual-branch network to decouple latent features and facilitate task-level collaboration between registration and segmentation. Since the latent features we designed are also modality-independent, the bottleneck of designing a multi-modal similarity metric is essentially addressed. Another merit offered by this framework is the interpretability of the latent features, which allows intuitive manipulation of feature learning, thereby further enhancing network training efficiency and the performance of both tasks. Extensive experiments conducted on both multi-modal and mono-modal datasets of mouse and human brains demonstrate the superiority of our method. AVAILABILITY AND IMPLEMENTATION The code is available at https://github.com/tingtingup/DCRS.
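Schematically, feature-level coupling means both tasks decode from one shared latent representation. The toy module below illustrates that layout only; layer sizes, depths, and heads are arbitrary assumptions and the paper's actual coupling is more elaborate.

```python
import torch
import torch.nn as nn

class SharedEncoderDualBranch(nn.Module):
    def __init__(self, in_ch=2, feat=16, n_labels=4):
        super().__init__()
        # Shared encoder: one latent representation for both tasks.
        self.encoder = nn.Sequential(
            nn.Conv3d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat, feat, 3, padding=1), nn.ReLU())
        self.reg_head = nn.Conv3d(feat, 3, 3, padding=1)        # flow field
        self.seg_head = nn.Conv3d(feat, n_labels, 3, padding=1)  # label logits

    def forward(self, fixed, moving):
        z = self.encoder(torch.cat([fixed, moving], dim=1))
        return self.reg_head(z), self.seg_head(z)
```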
Collapse
Affiliation(s)
- Tingting Han
- Ministry of Education Key Laboratory of Intelligent Computation and Signal Processing, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Electronics and Information Engineering, Anhui University, Hefei, Anhui, 230601, China
| | - Jun Wu
- Ministry of Education Key Laboratory of Intelligent Computation and Signal Processing, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Electronics and Information Engineering, Anhui University, Hefei, Anhui, 230601, China
| | - Pengpeng Sheng
- Ministry of Education Key Laboratory of Intelligent Computation and Signal Processing, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Electronics and Information Engineering, Anhui University, Hefei, Anhui, 230601, China
| | - Yuanyuan Li
- Ministry of Education Key Laboratory of Intelligent Computation and Signal Processing, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Electronics and Information Engineering, Anhui University, Hefei, Anhui, 230601, China
| | - ZaiYang Tao
- Ministry of Education Key Laboratory of Intelligent Computation and Signal Processing, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Electronics and Information Engineering, Anhui University, Hefei, Anhui, 230601, China
| | - Lei Qu
- Ministry of Education Key Laboratory of Intelligent Computation and Signal Processing, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Electronics and Information Engineering, Anhui University, Hefei, Anhui, 230601, China
- SEU-ALLEN Joint Center, Institute for Brain and Intelligence, Southeast University, Nanjing, Jiangsu, 210096, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 231299, China
- Hefei National Laboratory, University of Science and Technology of China, Hefei, 230094, China
| |
Collapse
|
25
|
Lombardo E, Velezmoro L, Marschner SN, Rabe M, Tejero C, Papadopoulou CI, Sui Z, Reiner M, Corradini S, Belka C, Kurz C, Riboldi M, Landry G. Patient-Specific Deep Learning Tracking Framework for Real-Time 2D Target Localization in Magnetic Resonance Imaging-Guided Radiation Therapy. Int J Radiat Oncol Biol Phys 2024:S0360-3016(24)03508-9. [PMID: 39461599 DOI: 10.1016/j.ijrobp.2024.10.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2023] [Revised: 09/20/2024] [Accepted: 10/11/2024] [Indexed: 10/29/2024]
Abstract
PURPOSE We propose a tumor tracking framework for 2D cine magnetic resonance imaging (MRI) based on a pair of deep learning (DL) models relying on patient-specific (PS) training. METHODS AND MATERIALS The chosen DL models are: (1) an image registration transformer and (2) an auto-segmentation convolutional neural network (CNN). We collected over 1,400,000 cine MRI frames from 219 patients treated on a 0.35 T MRI-linac, plus 7500 frames from an additional 35 patients that were manually labeled and subdivided into fine-tuning, validation, and testing sets. The transformer was first trained on the unlabeled data (without segmentations). We then continued training (with segmentations) either on the fine-tuning set or, for PS models, on 8 randomly selected frames from the first 5 seconds of each patient's cine MRI. The PS auto-segmentation CNN was trained from scratch with the same 8 frames for each patient, without pre-training. Furthermore, we implemented B-spline image registration as a conventional model, as well as different baselines. Output segmentations of all models were compared on the testing set using the Dice similarity coefficient, the 50% and 95% Hausdorff distance (HD50%/HD95%), and the root-mean-square error of the target centroid in the superior-inferior direction. RESULTS The PS transformer and CNN significantly outperformed all other models, achieving a median (interquartile range) Dice similarity coefficient of 0.92 (0.03)/0.90 (0.04), HD50% of 1.0 (0.1)/1.0 (0.4) mm, HD95% of 3.1 (1.9)/3.8 (2.0) mm, and root-mean-square error of the target centroid in the superior-inferior direction of 0.7 (0.4)/0.9 (1.0) mm on the testing set. Their inference time was about 36/8 ms per frame, and PS fine-tuning required 3 min for labeling and 8/4 min for training. The transformer was better than the CNN in 9/12 patients, the CNN better in 1/12 patients, and the 2 PS models achieved the same performance on the remaining 2/12 testing patients. CONCLUSIONS For targets in the thorax, abdomen, and pelvis, we found the 2 PS DL models to provide accurate real-time target localization during MRI-guided radiotherapy.
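The patient-specific step amounts to briefly re-training a pre-trained model on a handful of labeled frames from the patient's own cine MRI. A minimal sketch of such a loop is shown below; the optimizer, learning rate, step count, and loss choice are assumptions, not the paper's training recipe.

```python
import torch
import torch.nn.functional as F

def personalize(model, frames, masks, steps=200, lr=1e-4):
    # frames: (8, 1, H, W) float images from the first seconds of the
    # patient's cine MRI; masks: (8, 1, H, W) float binary target labels.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        logits = model(frames)  # predicted segmentation logits
        loss = F.binary_cross_entropy_with_logits(logits, masks)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```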
Collapse
Affiliation(s)
- Elia Lombardo
- Department of Radiation Oncology, LMU University Hospital, LMU Munich.
| | - Laura Velezmoro
- Department of Radiation Oncology, LMU University Hospital, LMU Munich
| | | | - Moritz Rabe
- Department of Radiation Oncology, LMU University Hospital, LMU Munich
| | - Claudia Tejero
- Department of Radiation Oncology, LMU University Hospital, LMU Munich
| | | | - Zhuojie Sui
- Department of Medical Physics, Faculty of Physics, LMU Munich
| | - Michael Reiner
- Department of Radiation Oncology, LMU University Hospital, LMU Munich
| | | | - Claus Belka
- Department of Radiation Oncology, LMU University Hospital, LMU Munich; German Cancer Consortium (DKTK), Partner Site Munich, a Partnership between DKFZ and LMU University Hospital Munich; Bavarian Cancer Research Center (BZKF), Partner Site Munich, Munich, Germany
| | - Christopher Kurz
- Department of Radiation Oncology, LMU University Hospital, LMU Munich
| | - Marco Riboldi
- Department of Medical Physics, Faculty of Physics, LMU Munich
| | - Guillaume Landry
- Department of Radiation Oncology, LMU University Hospital, LMU Munich
| |
Collapse
|
26
|
Li L, Lu Z, Jiang A, Sha G, Luo Z, Xie X, Ding X. Swin Transformer-based automatic delineation of the hippocampus by MRI in hippocampus-sparing whole-brain radiotherapy. Front Neurosci 2024; 18:1441791. [PMID: 39464425 PMCID: PMC11502472 DOI: 10.3389/fnins.2024.1441791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2024] [Accepted: 09/26/2024] [Indexed: 10/29/2024] Open
Abstract
Objective This study aims to develop and validate SwinHS, a deep learning-based automatic segmentation model designed for precise hippocampus delineation in patients receiving hippocampus-sparing whole-brain radiotherapy. By streamlining this process, we seek to significantly improve workflow efficiency for clinicians. Methods A total of 100 three-dimensional T1-weighted MR images were collected, with 70 patients allocated for training and 30 for testing. Manual delineation of the hippocampus was performed according to RTOG 0933 guidelines. The SwinHS model, which incorporates a 3D ELSA Transformer module and an sSE CNN decoder, was trained and tested on these datasets. To demonstrate the effectiveness of SwinHS, this study compared its segmentation performance with that of V-Net, U-Net, ResNet, and ViT. Evaluation metrics included the Dice similarity coefficient (DSC), Jaccard similarity coefficient (JSC), and Hausdorff distance (HD). Dosimetric evaluation compared radiotherapy plans generated using automatic segmentation (plan AD) versus manual hippocampus segmentation (plan MD). Results SwinHS outperformed the four advanced deep learning-based models, achieving an average DSC of 0.894, a JSC of 0.817, and an HD of 3.430 mm. Dosimetric evaluation revealed that both plan AD and plan MD met the treatment-plan constraints for the target volume (PTV). However, the hippocampal Dmax in plan AD was significantly greater than that in plan MD, approaching the 17 Gy constraint limit. Nonetheless, there were no significant differences in D100% or in maximum doses to other critical structures between the two plans. Conclusion Compared with manual delineation, SwinHS demonstrated superior segmentation performance and a significantly shorter delineation time. While plan AD met clinical requirements, caution should be exercised regarding hippocampal Dmax. SwinHS offers a promising tool to enhance workflow efficiency and facilitate hippocampal protection in radiotherapy planning for patients with brain metastases.
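For reference, the two overlap metrics reported above are straightforward to compute from binary masks; a minimal NumPy version is shown below (the Hausdorff distance needs surface extraction and is omitted here).

```python
import numpy as np

def dice_jaccard(pred, gt):
    # pred, gt: binary segmentation masks of equal shape.
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dsc = 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)      # Dice
    jsc = inter / (np.logical_or(pred, gt).sum() + 1e-8)    # Jaccard
    return dsc, jsc
```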
Collapse
Affiliation(s)
- Liang Li
- Department of Radiotherapy, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
| | - Zhennan Lu
- Department of Equipment, Affiliated Hospital of Nanjing University of Chinese Medicine (Jiangsu Province Hospital of Chinese Medicine), Nanjing, China
| | - Aijun Jiang
- Department of Radiotherapy, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
| | - Guanchen Sha
- Department of Radiation Oncology, Xuzhou Central Hospital, Xuzhou, China
| | - Zhaoyang Luo
- HaiChuang Future Medical Technology Co., Ltd., Zhejiang, China
| | - Xin Xie
- Department of Radiotherapy, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
| | - Xin Ding
- Department of Radiotherapy, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
| |
Collapse
|
27
|
Zeng B, Wang H, Tao X, Shi H, Joskowicz L, Chen X. A bidirectional framework for fracture simulation and deformation-based restoration prediction in pelvic fracture surgical planning. Med Image Anal 2024; 97:103267. [PMID: 39053167 DOI: 10.1016/j.media.2024.103267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2024] [Revised: 07/05/2024] [Accepted: 07/05/2024] [Indexed: 07/27/2024]
Abstract
Pelvic fracture is a severe trauma with life-threatening implications. Surgical reduction is essential for restoring the anatomical structure and functional integrity of the pelvis, requiring accurate preoperative planning. However, the complexity of pelvic fractures and limited data availability necessitate labor-intensive manual corrections in a clinical setting. We describe in this paper a novel bidirectional framework for automatic pelvic fracture surgical planning based on fracture simulation and structure restoration. Our fracture simulation method accounts for patient-specific pelvic structures, bone density information, and the randomness of fractures, enabling the generation of various types of fracture cases from healthy pelvises. Based on these features and on adversarial learning, we develop a novel structure restoration network to predict the deformation mapping in CT images before and after a fracture for the precise structural reconstruction of any fracture. Furthermore, a self-supervised strategy based on pelvic anatomical symmetry priors is developed to optimize the details of the restored pelvic structure. Finally, the restored pelvis is used as a template to generate a surgical reduction plan in which the fragments are repositioned in an efficient jigsaw-puzzle registration manner. Extensive experiments on simulated and clinical datasets, including scans with metal artifacts, show that our method achieves good accuracy and robustness: a mean SSIM of 90.7% for restorations, with translational errors of 2.88 mm and rotational errors of 3.18° for reductions on real datasets. Our method takes 52.9 s to complete the surgical planning in the phantom study, representing a significant acceleration compared to standard clinical workflows. Our method may facilitate effective surgical planning for pelvic fractures tailored to individual patients in clinical settings.
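The symmetry prior can be pictured as penalizing disagreement between the restored pelvis and its mirror image across the mid-sagittal plane. The sketch below is the simplest possible rendering of that idea, assuming the volume is already aligned so that the last axis is the left-right axis; the paper's strategy is more refined.

```python
import torch

def symmetry_loss(vol):
    # vol: (B, 1, D, H, W) restored volume, left-right axis last.
    mirrored = torch.flip(vol, dims=[-1])  # reflect across mid-sagittal plane
    return torch.mean((vol - mirrored) ** 2)
```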
Collapse
Affiliation(s)
- Bolun Zeng
- Institute of Biomedical Manufacturing and Life Quality Engineering, School of Mechanical Engineering, Shanghai Jiao Tong University, China
| | - Huixiang Wang
- Department of Orthopedics, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Xingguang Tao
- Department of Orthopedics, Qingpu Branch of Zhongshan Hospital Affiliated to Fudan University, Shanghai, China
| | - Haochen Shi
- Institute of Biomedical Manufacturing and Life Quality Engineering, School of Mechanical Engineering, Shanghai Jiao Tong University, China
| | - Leo Joskowicz
- School of Computer Science and Engineering and the Edmond and Lily Safra Center for Brain Sciences, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Xiaojun Chen
- Institute of Biomedical Manufacturing and Life Quality Engineering, School of Mechanical Engineering, Shanghai Jiao Tong University, China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200240, China.
| |
Collapse
|
28
|
Sha Q, Sun K, Jiang C, Xu M, Xue Z, Cao X, Shen D. Detail-preserving image warping by enforcing smooth image sampling. Neural Netw 2024; 178:106426. [PMID: 38878640 DOI: 10.1016/j.neunet.2024.106426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2024] [Revised: 04/14/2024] [Accepted: 06/01/2024] [Indexed: 08/13/2024]
Abstract
Multi-phase dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) registration makes a substantial contribution to medical image analysis. However, existing methods (e.g., VoxelMorph, CycleMorph) often encounter the problem of image-information misalignment in deformable registration tasks, posing challenges to practical application. To address this issue, we propose a novel smooth image sampling method that aligns complete organ information to realize detail-preserving image warping. In this paper, we clarify that the image-information mismatch is attributable to imbalanced sampling. A sampling frequency map, constructed by sampling frequency estimators, is then utilized to guide smooth sampling by reducing the spatial gradient of the map and the discrepancy between it and an all-ones matrix. In addition, our estimator determines the sampling frequency of a grid voxel in the moving image by aggregating the sum of interpolation weights from warped non-grid sampling points in its vicinity, and it vectorially constructs the sampling frequency map through projection and scattering. We evaluate the effectiveness of our approach through experiments on two in-house datasets. The results show that our method preserves nearly complete details with ideal registration accuracy compared with several state-of-the-art registration methods. Additionally, our method exhibits a statistically significant difference in the regularity of the registration field compared to other methods, at a significance level of p < 0.05. Our code will be released at https://github.com/QingRui-Sha/SFM.
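The sampling-frequency idea can be illustrated by counting how often each moving-image voxel is sampled by the warped grid and penalizing deviation from uniform (all-ones) sampling. For brevity the sketch below uses nearest-neighbor accumulation instead of the paper's interpolation-weight aggregation, and it omits the spatial-gradient term.

```python
import torch

def sampling_frequency_loss(sample_coords, image_shape):
    # sample_coords: (N, 3) voxel coordinates sampled from the moving image.
    D, H, W = image_shape
    idx = sample_coords.round().long()
    idx[:, 0].clamp_(0, D - 1)
    idx[:, 1].clamp_(0, H - 1)
    idx[:, 2].clamp_(0, W - 1)
    flat = idx[:, 0] * H * W + idx[:, 1] * W + idx[:, 2]
    # Accumulate one count per sample into the frequency map.
    freq = torch.zeros(D * H * W, device=sample_coords.device)
    freq.scatter_add_(0, flat, torch.ones_like(flat, dtype=freq.dtype))
    # Penalize deviation from the all-ones (perfectly balanced) map.
    return ((freq - 1.0) ** 2).mean()
```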
Collapse
Affiliation(s)
- Qingrui Sha
- School of Biomedical Engineering, ShanghaiTech, Shanghai, China.
| | - Kaicong Sun
- School of Biomedical Engineering, ShanghaiTech, Shanghai, China.
| | - Caiwen Jiang
- School of Biomedical Engineering, ShanghaiTech, Shanghai, China.
| | - Mingze Xu
- School of Science and Engineering, Chinese University of Hong Kong-Shenzhen, Guangdong, China.
| | - Zhong Xue
- Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China.
| | - Xiaohuan Cao
- Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China.
| | - Dinggang Shen
- School of Biomedical Engineering, ShanghaiTech, Shanghai, China; Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China; Shanghai Clinical Research and Trial Center, Shanghai, China.
| |
Collapse
|
29
|
Zheng F, Yin P, Liang K, Liu T, Wang Y, Hao W, Hao Q, Hong N. Comparison of Different Fusion Radiomics for Predicting Benign and Malignant Sacral Tumors: A Pilot Study. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:2415-2427. [PMID: 38717515 PMCID: PMC11522258 DOI: 10.1007/s10278-024-01134-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/07/2024] [Revised: 04/27/2024] [Accepted: 04/29/2024] [Indexed: 10/30/2024]
Abstract
Differentiating between benign and malignant sacral tumors is crucial for determining appropriate treatment options. This study aims to develop two benchmark fusion models and a deep learning radiomic nomogram (DLRN) capable of distinguishing between benign and malignant sacral tumors using multiple imaging modalities. We reviewed axial T2-weighted imaging (T2WI) and non-contrast computed tomography (NCCT) of 134 patients with pathologically confirmed sacral tumors. The two benchmark fusion models were developed using fused deep learning (DL) features and fused classical machine learning (CML) features from multiple imaging modalities, employing logistic regression, K-nearest neighbor classification, and extremely randomized trees. The two benchmark models exhibiting the most robust predictive performance were merged with clinical data to formulate the DLRN. Performance assessment involved computing the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, negative predictive value (NPV), and positive predictive value (PPV). The DL benchmark fusion model demonstrated superior performance compared to the CML fusion model. The DLRN, identified as the optimal model, exhibited the highest predictive performance, achieving an accuracy of 0.889 and an AUC of 0.961 on the test sets. Calibration curves were utilized to evaluate the predictive capability of the models, and decision curve analysis (DCA) was conducted to assess the clinical net benefit of the DLRN. The DLRN could serve as a practical predictive tool capable of distinguishing between benign and malignant sacral tumors, offering valuable information for risk counseling and aiding in clinical treatment decisions.
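As a minimal illustration of the fusion-and-classify pattern, the snippet below concatenates a deep-feature block and a handcrafted radiomic block and fits a logistic-regression classifier. The random placeholder features, split, and sizes are assumptions purely for demonstration; feature extraction itself is out of scope.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
dl_feats = rng.normal(size=(134, 32))    # placeholder deep features
cml_feats = rng.normal(size=(134, 20))   # placeholder radiomic features
y = np.tile([0, 1], 67)                  # benign (0) vs malignant (1)

X = np.hstack([dl_feats, cml_feats])     # feature-level fusion
clf = LogisticRegression(max_iter=1000).fit(X[:100], y[:100])
auc = roc_auc_score(y[100:], clf.predict_proba(X[100:])[:, 1])
print(f"held-out AUC: {auc:.3f}")
```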
Collapse
Affiliation(s)
- Fei Zheng
- Department of Radiology, Peking University People's Hospital, No. 11 Xizhimen South Street, Xicheng District, Beijing, 100044, People's Republic of China
| | - Ping Yin
- Department of Radiology, Peking University People's Hospital, No. 11 Xizhimen South Street, Xicheng District, Beijing, 100044, People's Republic of China
| | - Kewei Liang
- Intelligent Manufacturing Research Institute, Visual 3D Medical Science and Technology Development, Fengtai District, No. 186 South Fourth Ring Road West, Beijing, 100071, People's Republic of China
| | - Tao Liu
- Department of Radiology, Peking University People's Hospital, No. 11 Xizhimen South Street, Xicheng District, Beijing, 100044, People's Republic of China
| | - Yujian Wang
- Department of Radiology, Peking University People's Hospital, No. 11 Xizhimen South Street, Xicheng District, Beijing, 100044, People's Republic of China
| | - Wenhan Hao
- Department of Radiology, Peking University People's Hospital, No. 11 Xizhimen South Street, Xicheng District, Beijing, 100044, People's Republic of China
| | - Qi Hao
- Department of Radiology, Peking University People's Hospital, No. 11 Xizhimen South Street, Xicheng District, Beijing, 100044, People's Republic of China
| | - Nan Hong
- Department of Radiology, Peking University People's Hospital, No. 11 Xizhimen South Street, Xicheng District, Beijing, 100044, People's Republic of China.
| |
Collapse
|
30
|
Sun S, Han K, You C, Tang H, Kong D, Naushad J, Yan X, Ma H, Khosravi P, Duncan JS, Xie X. Medical image registration via neural fields. Med Image Anal 2024; 97:103249. [PMID: 38963972 DOI: 10.1016/j.media.2024.103249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2022] [Revised: 05/24/2024] [Accepted: 06/21/2024] [Indexed: 07/06/2024]
Abstract
Image registration is an essential step in many medical image analysis tasks. Traditional methods for image registration are primarily optimization-driven, finding the optimal deformations that maximize the similarity between two images. Recent learning-based methods, trained to directly predict transformations between two images, run much faster but suffer from performance deficiencies due to domain shift. Here we present a new neural-network-based image registration framework, called NIR (Neural Image Registration), which is based on optimization but utilizes deep neural networks to model deformations between image pairs. NIR represents the transformation between two images with a continuous function implemented via neural fields, receiving a 3D coordinate as input and outputting the corresponding deformation vector. NIR provides two ways of generating the deformation field: directly outputting a displacement vector field for general deformable registration, or outputting a velocity vector field that is integrated to derive the deformation field for diffeomorphic image registration. The optimal registration is discovered by updating the parameters of the neural field via stochastic mini-batch gradient descent. We describe several design choices that facilitate model optimization, including coordinate encoding, sinusoidal activation, coordinate sampling, and intensity sampling. NIR is evaluated on two 3D MR brain scan datasets, demonstrating highly competitive performance in terms of both registration accuracy and regularity. Compared to traditional optimization-based methods, our approach achieves better results in shorter computation times; it also shows robust performance on a cross-dataset registration task compared with pre-trained learning-based methods.
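The core ingredient, a coordinate-to-displacement network with sinusoidal activations, can be sketched as follows. Width, depth, and the omega0 frequency are illustrative choices (in the style of SIREN-type fields), not NIR's exact configuration.

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    def __init__(self, in_f, out_f, omega0=30.0):
        super().__init__()
        self.omega0 = omega0
        self.linear = nn.Linear(in_f, out_f)

    def forward(self, x):
        return torch.sin(self.omega0 * self.linear(x))

class NeuralField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            SineLayer(3, hidden),
            SineLayer(hidden, hidden),
            nn.Linear(hidden, 3))  # 3D displacement (or velocity) vector

    def forward(self, coords):
        # coords: (N, 3) sampled coordinates in normalized space.
        return self.net(coords)
```

Optimization then proceeds per image pair: sample coordinate batches, warp the moving image at those points, and minimize a similarity-plus-regularity loss with respect to the field's parameters.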
Collapse
Affiliation(s)
- Shanlin Sun
- University of California, Irvine, Irvine, CA 92697, USA.
| | - Kun Han
- University of California, Irvine, Irvine, CA 92697, USA.
| | - Chenyu You
- Yale University, New Haven, CT 06520, USA.
| | - Hao Tang
- University of California, Irvine, Irvine, CA 92697, USA.
| | - Deying Kong
- University of California, Irvine, Irvine, CA 92697, USA.
| | | | - Xiangyi Yan
- University of California, Irvine, Irvine, CA 92697, USA.
| | - Haoyu Ma
- University of California, Irvine, Irvine, CA 92697, USA.
| | - Pooya Khosravi
- University of California, Irvine, Irvine, CA 92697, USA.
| | | | - Xiaohui Xie
- University of California, Irvine, Irvine, CA 92697, USA.
| |
Collapse
|
31
|
Deng L, Lan Q, Yang X, Wang J, Huang S. DELR-Net: a network for 3D multimodal medical image registration in more lightweight application scenarios. Abdom Radiol (NY) 2024:10.1007/s00261-024-04602-3. [PMID: 39400589 DOI: 10.1007/s00261-024-04602-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2024] [Revised: 09/14/2024] [Accepted: 09/17/2024] [Indexed: 10/15/2024]
Abstract
PURPOSE 3D multimodal medical image deformable registration plays a significant role in medical image analysis and diagnosis. However, due to the substantial differences between images of different modalities, registration is challenging and requires high computational costs, and deep learning-based registration methods face these challenges as well. The primary aim of this paper is to design a 3D multimodal registration network that ensures high-quality registration results while reducing the number of parameters. METHODS This study designed a Dual-Encoder More Lightweight Registration Network (DELR-Net). DELR-Net is a low-complexity network that integrates Mamba and ConvNet components. The State Space Sequence Module and the Dynamic Large Kernel block are used as the main components of the dual encoders, while the Dynamic Feature Fusion block is the main component of the decoder. RESULTS This study conducted experiments on 3D brain MR images and abdominal MR and CT images. Compared to existing registration methods, DELR-Net achieved better registration results while maintaining a lower number of parameters. Additionally, generalization experiments on other modalities showed that DELR-Net has superior generalization capabilities. CONCLUSION DELR-Net substantially alleviates the limitations of 3D multimodal medical image deformable registration, achieving better registration performance with fewer parameters.
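Large-kernel blocks of the kind a "Dynamic Large Kernel" design builds on are typically kept cheap via depthwise convolution. The sketch below shows that parameter-efficient pattern; the dynamic weighting used in DELR-Net is omitted, and kernel size and activation are assumptions.

```python
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    def __init__(self, ch, k=13):
        super().__init__()
        # Depthwise large-kernel conv: wide receptive field, few parameters.
        self.dw = nn.Conv3d(ch, ch, k, padding=k // 2, groups=ch)
        self.pw = nn.Conv3d(ch, ch, 1)  # pointwise channel mixing
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.act(self.pw(self.dw(x)))  # residual connection
```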
Collapse
Affiliation(s)
- Liwei Deng
- Harbin University of Science and Technology, Harbin, China
| | - Qi Lan
- Harbin University of Science and Technology, Harbin, China
| | - Xin Yang
- Sun Yat-sen University Cancer Center, Guangzhou, China.
| | - Jing Wang
- South China Normal University, Guangzhou, China
| | - Sijuan Huang
- Sun Yat-sen University Cancer Center, Guangzhou, China.
| |
Collapse
|
32
|
Bhati D, Neha F, Amiruzzaman M. A Survey on Explainable Artificial Intelligence (XAI) Techniques for Visualizing Deep Learning Models in Medical Imaging. J Imaging 2024; 10:239. [PMID: 39452402 PMCID: PMC11508748 DOI: 10.3390/jimaging10100239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2024] [Revised: 09/14/2024] [Accepted: 09/21/2024] [Indexed: 10/26/2024] Open
Abstract
The combination of medical imaging and deep learning has significantly improved diagnostic and prognostic capabilities in the healthcare domain. Nevertheless, the inherent complexity of deep learning models poses challenges in understanding their decision-making processes. Interpretability and visualization techniques have emerged as crucial tools to unravel the black-box nature of these models, providing insights into their inner workings and enhancing trust in their predictions. This survey paper comprehensively examines various interpretation and visualization techniques applied to deep learning models in medical imaging. The paper reviews methodologies, discusses their applications, and evaluates their effectiveness in enhancing the interpretability, reliability, and clinical relevance of deep learning models in medical image analysis.
Collapse
Affiliation(s)
- Deepshikha Bhati
- Department of Computer Science, Kent State University, Kent, OH 44242, USA;
| | - Fnu Neha
- Department of Computer Science, Kent State University, Kent, OH 44242, USA;
| | - Md Amiruzzaman
- Department of Computer Science, West Chester University, West Chester, PA 19383, USA;
| |
Collapse
|
33
|
Opfer R, Krüger J, Buddenkotte T, Spies L, Behrendt F, Schippling S, Buchert R. BrainLossNet: a fast, accurate and robust method to estimate brain volume loss from longitudinal MRI. Int J Comput Assist Radiol Surg 2024; 19:1763-1771. [PMID: 38879844 PMCID: PMC11365843 DOI: 10.1007/s11548-024-03201-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2023] [Accepted: 05/27/2024] [Indexed: 09/02/2024]
Abstract
PURPOSE MRI-derived brain volume loss (BVL) is widely used as a neurodegeneration marker. SIENA is state-of-the-art for BVL measurement but is limited by long computation time. Here we propose "BrainLossNet", a convolutional neural network (CNN)-based method for BVL estimation. METHODS BrainLossNet uses CNN-based non-linear registration of baseline (BL)/follow-up (FU) 3D-T1w-MRI pairs. BVL is computed by non-linear registration of brain parenchyma masks segmented in the BL/FU scans. The BVL estimate is corrected for image distortions using the apparent volume change of the total intracranial volume. BrainLossNet was trained on 1525 BL/FU pairs from 83 scanners. Agreement between BrainLossNet and SIENA was assessed in 225 BL/FU pairs from 94 MS patients acquired with a single scanner and 268 BL/FU pairs from 52 scanners acquired for various indications. Robustness to short-term variability of 3D-T1w-MRI was compared in 354 BL/FU pairs from a single healthy man, acquired in the same session without repositioning on 116 scanners (Frequently-Traveling-Human-Phantom dataset, FTHP). RESULTS The processing time of BrainLossNet was 2-3 min. The median [interquartile range] of the SIENA-BrainLossNet BVL difference was 0.10% [-0.18%, 0.35%] in the MS dataset and 0.08% [-0.14%, 0.28%] in the various-indications dataset. The distribution of apparent BVL in the FTHP dataset was narrower with BrainLossNet (p = 0.036; 95th percentile: 0.20% vs 0.32%). CONCLUSION BrainLossNet on average provides the same BVL estimates as SIENA, but it is significantly more robust, probably due to its built-in distortion correction. A processing time of 2-3 min makes BrainLossNet suitable for clinical routine. This can pave the way for widespread clinical use of BVL estimation from intra-scanner BL/FU pairs.
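A worked example of the distortion-correction idea: the total intracranial volume (TIV) should be biologically stable, so its apparent change can be used to rescale the raw brain-volume change. The simple ratio version below is an assumption for illustration; BrainLossNet's exact correction may differ.

```python
def corrected_bvl(v_brain_bl, v_brain_fu, v_tiv_bl, v_tiv_fu):
    # Undo the apparent global scaling implied by the TIV change.
    scale = v_tiv_bl / v_tiv_fu
    v_fu_corr = v_brain_fu * scale
    # Brain volume loss in percent of the baseline brain volume.
    return 100.0 * (v_brain_bl - v_fu_corr) / v_brain_bl

# A 0.5% apparent TIV shrinkage is removed from a 1.0% raw brain change,
# leaving ~0.5% true BVL:
print(corrected_bvl(1200.0, 1188.0, 1500.0, 1492.5))
```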
Collapse
Affiliation(s)
| | | | - Thomas Buddenkotte
- Department of Diagnostic and Interventional Radiology and Nuclear Medicine, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246, Hamburg, Germany
| | | | - Finn Behrendt
- Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology, Hamburg, Germany
| | - Sven Schippling
- Multimodal Imaging in Neuroimmunological Diseases (MINDS), University of Zurich, Zurich, Switzerland
- Neuroscience and Rare Diseases (NRD), Roche Pharma Research and Early Development (pRED), Basel, Switzerland
| | - Ralph Buchert
- Department of Diagnostic and Interventional Radiology and Nuclear Medicine, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246, Hamburg, Germany.
| |
Collapse
|
34
|
Kim KM, Suh M, Selvam HSMS, Tan TH, Cheon GJ, Kang KW, Lee JS. Enhancing voxel-based dosimetry accuracy with an unsupervised deep learning approach for hybrid medical image registration. Med Phys 2024; 51:6432-6444. [PMID: 38772037 DOI: 10.1002/mp.17129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Revised: 03/27/2024] [Accepted: 05/04/2024] [Indexed: 05/23/2024] Open
Abstract
BACKGROUND Deformable registration is required to generate a time-integrated activity (TIA) map, which is essential for voxel-based dosimetry. Conventional iterative registration algorithms using anatomical images (e.g., computed tomography (CT)) can produce registration errors in functional images (e.g., single photon emission computed tomography (SPECT) or positron emission tomography (PET)). Various deep learning-based registration tools have been proposed, but studies specifically focused on the registration of serial hybrid images were not found. PURPOSE In this study, we introduce CoRX-NET, a novel unsupervised deep learning network designed for deformable registration of hybrid medical images. The CoRX-NET structure is based on the Swin Transformer (ST), allowing the representation of complex spatial connections in images, and its self-attention mechanism aids the effective exchange and integration of information across diverse image regions. To augment the fusion of SPECT and CT features, cross-stitch layers have been integrated into the network. METHODS Two different 177Lu DOTATATE SPECT/CT datasets were acquired at different medical centers: 22 sets from Seoul National University and 14 sets from Sunway Medical Centre were used for training/internal validation and external validation, respectively. The network takes a pair of SPECT/CT images (e.g., fixed and moving images) and generates a deformed SPECT/CT image. The performance of the network was compared with Elastix and TransMorph using the L1 loss and structural similarity index measure (SSIM) of CT, the SSIM of normalized SPECT, and the local normalized cross-correlation (LNCC) of SPECT as metrics. The voxel-wise root mean square errors (RMSE) of TIA were compared among the different methods. RESULTS The ablation study revealed that cross-stitch layers improved SPECT/CT registration performance, notably enhancing the SSIM (internal validation: 0.9614 vs. 0.9653, external validation: 0.9159 vs. 0.9189) and LNCC of normalized SPECT images (internal validation: 0.7512 vs. 0.7670, external validation: 0.8027 vs. 0.8027). CoRX-NET with the cross-stitch layer achieved superior performance metrics compared to Elastix and TransMorph, except for CT SSIM on the external dataset. When qualitatively analyzed for both internal and external validation cases, CoRX-NET consistently demonstrated superior SPECT registration results. In addition, CoRX-NET accomplished SPECT/CT image registration in less than 6 s, whereas Elastix required approximately 50 s using the same PC's CPU. When employing CoRX-NET, the voxel-wise RMSE values for TIA were approximately 27% lower for the kidney and 33% lower for the tumor compared to Elastix. CONCLUSION This study represents a major advancement toward precise SPECT/CT registration using an unsupervised deep learning network. It outperforms conventional methods like Elastix and TransMorph, reducing uncertainties in TIA maps for more accurate dose assessments.
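The cross-stitch unit referenced above is a classic two-stream mixing layer (Misra et al., 2016): a learned 2x2 matrix linearly blends the feature maps of the two modality streams. The initialization below is the usual identity-leaning choice and is an assumption, as is applying a single scalar mixing matrix rather than per-channel weights.

```python
import torch
import torch.nn as nn

class CrossStitch(nn.Module):
    def __init__(self):
        super().__init__()
        # Near-identity init: each stream mostly keeps its own features.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                [0.1, 0.9]]))

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: same-shaped feature maps from the two streams
        # (e.g., SPECT branch and CT branch).
        out_a = self.alpha[0, 0] * feat_a + self.alpha[0, 1] * feat_b
        out_b = self.alpha[1, 0] * feat_a + self.alpha[1, 1] * feat_b
        return out_a, out_b
```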
Collapse
Affiliation(s)
- Keon Min Kim
- Interdisciplinary Program in Bioengineering, Seoul National University Graduate School, Seoul, Republic of Korea
- Integrated Major in Innovative Medical Science, Seoul National University Graduate School, Seoul, Republic of Korea
| | - Minseok Suh
- Department of Nuclear Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Institute of Radiation Medicine, Medical Research Center, Seoul National University, Seoul, Republic of Korea
| | | | - Teik Hin Tan
- Nuclear Medicine Centre, Sunway Medical Centre, Subang Jaya, Selangor, Malaysia
| | - Gi Jeong Cheon
- Department of Nuclear Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea
- Cancer Research Institute & Institute on Aging, Seoul National University, Seoul, Republic of Korea
| | - Keon Wook Kang
- Department of Nuclear Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Institute of Radiation Medicine, Medical Research Center, Seoul National University, Seoul, Republic of Korea
- Bio-MAX Institute, Seoul National University, Seoul, Republic of Korea
| | - Jae Sung Lee
- Interdisciplinary Program in Bioengineering, Seoul National University Graduate School, Seoul, Republic of Korea
- Integrated Major in Innovative Medical Science, Seoul National University Graduate School, Seoul, Republic of Korea
- Department of Nuclear Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Institute of Radiation Medicine, Medical Research Center, Seoul National University, Seoul, Republic of Korea
| |
Collapse
|
35
|
Wang X, Zhang Z, Xu S, Luo X, Zhang B, Wu XJ. Contrastive learning based method for X-ray and CT registration under surgical equipment occlusion. Comput Biol Med 2024; 180:108946. [PMID: 39106676 DOI: 10.1016/j.compbiomed.2024.108946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Revised: 07/25/2024] [Accepted: 07/25/2024] [Indexed: 08/09/2024]
Abstract
Deep learning-based 3D/2D surgical navigation registration techniques have achieved excellent results. However, these methods are limited by occlusion from surgical equipment, resulting in poor accuracy. We designed a contrastive learning method that treats occluded and unoccluded X-rays as positive samples, maximizing the similarity between the positive samples and reducing interference from occlusion. The designed registration model features a Transformer with residual connections (ResTrans), which enhances long-sequence mapping capability; combined with the contrastive learning strategy, ResTrans can adaptively retrieve valid features over the global range to ensure performance in the presence of occlusion. Further, a learning-based region-of-interest (RoI) fine-tuning method is designed to refine the misalignment. We conducted experiments on occluded X-rays containing different surgical devices. The experimental results show that the mean target registration error (mTRE) of ResTrans is 3.25 mm with a running time of 1.59 s. Compared with state-of-the-art (SOTA) 3D/2D registration methods, our method offers better performance on occluded 3D/2D registration tasks.
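An InfoNCE-style objective is the standard way to express "occluded and unoccluded versions of the same X-ray form a positive pair, everything else in the batch is a negative". The sketch below follows that common formulation; the temperature and L2 normalization are conventional choices, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_occ, z_clean, tau=0.1):
    # z_occ, z_clean: (B, D) embeddings of occluded / unoccluded X-rays,
    # row i of each tensor coming from the same underlying image.
    z_occ = F.normalize(z_occ, dim=1)
    z_clean = F.normalize(z_clean, dim=1)
    logits = z_occ @ z_clean.t() / tau           # (B, B) similarity matrix
    labels = torch.arange(z_occ.shape[0], device=z_occ.device)
    return F.cross_entropy(logits, labels)       # diagonal = positive pairs
```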
Collapse
Affiliation(s)
- Xiyuan Wang
- School of Electronics and Information Engineering at University of Science and Technology Suzhou, Suzhou, 215009, China
| | - Zhancheng Zhang
- School of Electronics and Information Engineering at University of Science and Technology Suzhou, Suzhou, 215009, China.
| | - Shaokang Xu
- School of Electronics and Information Engineering at University of Science and Technology Suzhou, Suzhou, 215009, China; Shanghai Jirui Maestro Surgical Technology Co, Shanghai, 200000, China
| | - Xiaoqing Luo
- Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, School of Artificial Intelligence and Computer Science at Jiangnan University, Wuxi, 214122, China
| | - Baocheng Zhang
- Department of Orthopaedics, General Hospital of Central Theater Command of PLA, Wuhan, 430012, China
| | - Xiao-Jun Wu
- Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, School of Artificial Intelligence and Computer Science at Jiangnan University, Wuxi, 214122, China
| |
Collapse
|
36
|
Bai X, Wang H, Qin Y, Han J, Yu N. MatchMorph: A real-time pre- and intra-operative deformable image registration framework for MRI-guided surgery. Comput Biol Med 2024; 180:108948. [PMID: 39121681 DOI: 10.1016/j.compbiomed.2024.108948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2023] [Revised: 06/27/2024] [Accepted: 07/25/2024] [Indexed: 08/12/2024]
Abstract
PURPOSE Technological advancements in surgical robots compatible with magnetic resonance imaging (MRI) have created a pressing demand for real-time deformable image registration (DIR) of pre- and intra-operative MRI, but relevant methods are lacking. Challenges arise from dimensionality mismatch, resolution discrepancy, non-rigid deformation, and the requirement for real-time registration. METHODS In this paper, we propose a real-time DIR framework called MatchMorph, specifically designed for the registration of low-resolution local intraoperative MRI and high-resolution global preoperative MRI. First, a super-resolution network based on global inference is developed to enhance the resolution of the intraoperative MRI to that of the preoperative MRI, resolving the resolution discrepancy. Second, a fast matching algorithm is designed to identify the optimal position of the intraoperative MRI within the corresponding preoperative MRI to address the dimensionality mismatch. Further, a cross-attention-based dual-stream DIR network is constructed to estimate the deformation between pre- and intra-operative MRI in real time. RESULTS We conducted comprehensive experiments on the publicly available IXI and OASIS datasets to evaluate the performance of the proposed MatchMorph framework. Compared to the state-of-the-art (SOTA) network TransMorph, the designed dual-stream DIR network of MatchMorph achieved superior performance, with a 1.306 mm smaller HD and a 0.07 mm smaller ASD score on the IXI dataset. Furthermore, the MatchMorph framework demonstrates an inference speed of approximately 280 ms. CONCLUSIONS The qualitative and quantitative registration results obtained from high-resolution global preoperative MRI and simulated low-resolution local intraoperative MRI validated the effectiveness and efficiency of the proposed MatchMorph framework.
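A toy version of the matching step: slide the local volume over the global volume along one axis and pick the offset with the highest normalized cross-correlation. The real algorithm searches 3D positions efficiently; this brute-force 1D scan is for illustration only.

```python
import numpy as np

def best_offset_1d(global_vol, local_vol):
    # global_vol: (Z, H, W); local_vol: (d, H, W) with d <= Z.
    d = local_vol.shape[0]
    b = (local_vol - local_vol.mean()) / (local_vol.std() + 1e-8)
    scores = []
    for z in range(global_vol.shape[0] - d + 1):
        win = global_vol[z:z + d]
        a = (win - win.mean()) / (win.std() + 1e-8)
        scores.append(float((a * b).mean()))   # normalized cross-correlation
    return int(np.argmax(scores))              # best-matching slab offset
```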
Collapse
Affiliation(s)
- Xinhao Bai
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, Tianjin, 300350, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Shenzhen, 518083, China
| | - Hongpeng Wang
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, Tianjin, 300350, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Shenzhen, 518083, China
| | - Yanding Qin
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, Tianjin, 300350, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Shenzhen, 518083, China
| | - Jianda Han
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, Tianjin, 300350, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Shenzhen, 518083, China
| | - Ningbo Yu
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China; Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, Tianjin, 300350, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Shenzhen, 518083, China.
| |
Collapse
|
37
|
Peng K, Zhou D, Sun K, Wang J, Deng J, Gong S. ACSwinNet: A Deep Learning-Based Rigid Registration Method for Head-Neck CT-CBCT Images in Image-Guided Radiotherapy. SENSORS (BASEL, SWITZERLAND) 2024; 24:5447. [PMID: 39205140 PMCID: PMC11359988 DOI: 10.3390/s24165447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/06/2024] [Revised: 08/20/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
Accurate and precise rigid registration between head-neck computed tomography (CT) and cone-beam computed tomography (CBCT) images is crucial for correcting setup errors in image-guided radiotherapy (IGRT) for head and neck tumors. However, conventional registration methods that treat the head and neck as a single entity may not achieve the accuracy required for the head region, which is particularly sensitive to radiation in radiotherapy. We propose ACSwinNet, a deep learning-based method for head-neck CT-CBCT rigid registration that aims to enhance registration precision in the head region. Our approach integrates an anatomical constraint encoder with anatomical segmentations of tissues and organs to improve the accuracy of rigid registration in the head region. We also employ a Swin Transformer-based network for registration in cases with large initial misalignment, and a perceptual similarity metric network to address intensity discrepancies and artifacts between the CT and CBCT images. We validate the proposed method using a head-neck CT-CBCT dataset acquired from clinical patients. Compared with the conventional rigid registration method, our method exhibits a lower target registration error (TRE) for landmarks in the head region (reduced from 2.14 ± 0.45 mm to 1.82 ± 0.39 mm), a higher Dice similarity coefficient (DSC) (increased from 0.743 ± 0.051 to 0.755 ± 0.053), and a higher structural similarity index (increased from 0.854 ± 0.044 to 0.870 ± 0.043). Our proposed method effectively addresses the challenge of low registration accuracy in the head region, a limitation of conventional methods, and demonstrates significant potential for improving the accuracy of IGRT for head and neck tumors.
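One way to read the anatomical-constraint idea is as an extra segmentation-overlap term added to the intensity similarity when optimizing the transform. A hedged sketch follows, with an illustrative weighting `lam` and a plain MSE intensity term standing in for the paper's learned constraint encoder and perceptual metric network.

```python
import torch

def dice(a, b, eps=1e-6):
    # Soft Dice overlap between two (possibly soft) segmentation masks.
    inter = (a * b).sum()
    return (2 * inter + eps) / (a.sum() + b.sum() + eps)

def anatomically_constrained_loss(warped_cbct, ct, warped_seg, ct_seg, lam=0.5):
    # Intensity term plus an anatomical term that rewards overlap of
    # head-region structures after the rigid transform (illustrative
    # weighting; not the paper's exact similarity or constraint networks).
    intensity = torch.mean((warped_cbct - ct) ** 2)
    anatomical = 1.0 - dice(warped_seg, ct_seg)
    return intensity + lam * anatomical
```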
Collapse
Affiliation(s)
- Kuankuan Peng
- Digital Manufacturing Equipment and Technology Key National Laboratories, Huazhong University of Science and Technology, Wuhan 430074, China; (K.P.); (D.Z.); (K.S.); (J.D.)
- Huagong Manufacturing Equipment Digital National Engineering Center Co., Ltd., Wuhan 430074, China
| | - Danyu Zhou
- Digital Manufacturing Equipment and Technology Key National Laboratories, Huazhong University of Science and Technology, Wuhan 430074, China; (K.P.); (D.Z.); (K.S.); (J.D.)
| | - Kaiwen Sun
- Digital Manufacturing Equipment and Technology Key National Laboratories, Huazhong University of Science and Technology, Wuhan 430074, China; (K.P.); (D.Z.); (K.S.); (J.D.)
| | - Junfeng Wang
- Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Jianchun Deng
- Digital Manufacturing Equipment and Technology Key National Laboratories, Huazhong University of Science and Technology, Wuhan 430074, China; (K.P.); (D.Z.); (K.S.); (J.D.)
- Huagong Manufacturing Equipment Digital National Engineering Center Co., Ltd., Wuhan 430074, China
| | - Shihua Gong
- Digital Manufacturing Equipment and Technology Key National Laboratories, Huazhong University of Science and Technology, Wuhan 430074, China; (K.P.); (D.Z.); (K.S.); (J.D.)
- Huagong Manufacturing Equipment Digital National Engineering Center Co., Ltd., Wuhan 430074, China
| |
Collapse
|
38
|
Nie Q, Zhang X, Hu Y, Gong M, Liu J. Medical image registration and its application in retinal images: a review. Vis Comput Ind Biomed Art 2024; 7:21. [PMID: 39167337 PMCID: PMC11339199 DOI: 10.1186/s42492-024-00173-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2024] [Accepted: 07/31/2024] [Indexed: 08/23/2024] Open
Abstract
Medical image registration is vital for disease diagnosis and treatment through its ability to merge diverse information from images, which may be captured at different times, from different angles, or with different modalities. Although several surveys have reviewed the development of medical image registration, they have not systematically summarized the existing methods. To this end, a comprehensive review of these methods is provided from traditional and deep-learning-based perspectives, aiming to help readers quickly understand the development of medical image registration. In particular, we review recent advances in retinal image registration, which has not attracted much attention. In addition, current challenges in retinal image registration are discussed, and insights and prospects for future research are provided.
Collapse
Affiliation(s)
- Qiushi Nie
- Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Xiaoqing Zhang
- Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Center for High Performance Computing and Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Yan Hu
- Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Mingdao Gong
- Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Jiang Liu
- Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China.
- Singapore Eye Research Institute, Singapore, 169856, Singapore.
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China.
| |
Collapse
|
39
|
Zhu Z, Li Q, Wei Y, Song R. Hierarchical multi-level dynamic hyperparameter deformable image registration with convolutional neural network. Phys Med Biol 2024; 69:175007. [PMID: 39053510 DOI: 10.1088/1361-6560/ad67a6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2023] [Accepted: 07/25/2024] [Indexed: 07/27/2024]
Abstract
Objective. To enable the registration network to be trained only once, allowing fast regularization-hyperparameter selection during the inference phase, and to improve registration accuracy and deformation-field regularity. Approach. Hyperparameter tuning is an essential process for deep-learning deformable image registration (DLDIR). Most DLDIR methods perform a large number of independent experiments to select appropriate regularization hyperparameters, which is time- and resource-consuming. To address this issue, we propose a novel dynamic hyperparameter block comprising a distributed mapping network, dynamic convolution, an attention feature-extraction layer, and an instance normalization layer. The dynamic hyperparameter block encodes the input feature vectors and regularization hyperparameters into learnable feature variables and dynamic convolution parameters, respectively, which modulate the statistics of the high-dimensional layer features. In addition, the proposed method replaces the single-level residual blocks in LapIRN with a hierarchical multi-level architecture built on the dynamic hyperparameter block to improve registration performance. Main results. On the OASIS dataset, the proposed method reduced the percentage of voxels with |Jφ| ≤ 0 by 28.01% and 9.78% and improved the Dice similarity coefficient by 1.17% and 1.17%, compared with LapIRN and CIR, respectively. On the DIR-Lab dataset, the proposed method reduced the percentage of voxels with |Jφ| ≤ 0 by 10.00% and 5.70% and reduced the target registration error by 10.84% and 10.05%, compared with LapIRN and CIR, respectively. Significance. The proposed method can quickly produce the registration deformation field corresponding to an arbitrary hyperparameter value during the inference phase. Extensive experiments demonstrate that the proposed method reduces training time compared to DLDIR with fixed regularization hyperparameters while outperforming state-of-the-art registration methods in registration accuracy and deformation smoothness on the brain dataset OASIS and the lung dataset DIR-Lab.
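The core conditioning mechanism — making normalization statistics a function of the regularization hyperparameter so that one trained network covers many hyperparameter values at inference — can be sketched as a hyperparameter-conditioned instance norm. This is a hypothetical layer in the spirit of the dynamic hyperparameter block; the MLP design and sizes are assumptions.

```python
import torch
import torch.nn as nn

class HyperConditionedIN(nn.Module):
    """Instance norm whose affine parameters are predicted from the
    regularization hyperparameter lambda, so the same weights emulate
    many hyperparameter settings at inference time."""
    def __init__(self, channels, hyper_dim=1, hidden=64):
        super().__init__()
        self.norm = nn.InstanceNorm3d(channels, affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(hyper_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * channels),
        )

    def forward(self, x, lam):          # x: (B, C, D, H, W); lam: (B, hyper_dim)
        gamma, beta = self.mlp(lam).chunk(2, dim=-1)    # (B, C) each
        gamma = gamma[..., None, None, None]
        beta = beta[..., None, None, None]
        # (1 + gamma) keeps the layer near identity for small predictions.
        return self.norm(x) * (1 + gamma) + beta
```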
Collapse
Affiliation(s)
- Zhenyu Zhu
- School of Control Science and Engineering, Shandong University, Jinan, People's Republic of China
| | - Qianqian Li
- School and Hospital of Stomatology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China
| | - Ying Wei
- School of Control Science and Engineering, Shandong University, Jinan, People's Republic of China
- Shandong Research Institute of Industrial Technology, Jinan, People's Republic of China
| | - Rui Song
- School of Control Science and Engineering, Shandong University, Jinan, People's Republic of China
- Shandong Research Institute of Industrial Technology, Jinan, People's Republic of China
| |
Collapse
|
40
|
Cui X, Xu H, Liu J, Tian Z, Yang J. NCNet: Deformable medical image registration network based on neighborhood cross-attention combined with multi-resolution constraints. Biomed Phys Eng Express 2024; 10:055023. [PMID: 39084234 DOI: 10.1088/2057-1976/ad6992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2024] [Accepted: 07/31/2024] [Indexed: 08/02/2024]
Abstract
Objective. Existing registration networks based on cross-attention usually divide the image pairs to be registered into patches for input. The division and merging of a series of patches make it difficult to maintain the topology of the deformation field and reduce the interpretability of the network. Our goal is therefore to develop a new network architecture based on a cross-attention mechanism combined with a multi-resolution strategy to improve the accuracy and interpretability of medical image registration. Approach. We propose NCNet, a deformable image registration network based on neighborhood cross-attention combined with a multi-resolution strategy. The network mainly consists of a multi-resolution feature encoder, a multi-head neighborhood cross-attention module, and a registration decoder. The hierarchical feature-extraction capability of the encoder is improved by introducing large-kernel parallel convolution blocks; the cross-attention module based on neighborhood computation is used to reduce the impact on the topology of the deformation field, and double normalization is used to reduce its computational complexity. Main results. We performed atlas-based registration and inter-subject registration tasks on the public 3D brain magnetic resonance imaging datasets LPBA40 and IXI, respectively. Compared with the popular VoxelMorph method, our method improves the average DSC by 7.9% and 3.6% on LPBA40 and IXI; compared with the popular TransMorph method, it improves the average DSC by 4.9% and 1.3%. Significance. We demonstrated the advantages of neighborhood attention over window attention based on partitioned patches, and analyzed the impact of the pyramid feature encoder and double normalization on network performance. This makes a valuable contribution to the further development of medical image registration methods.
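A single-head, 2D toy version of neighborhood cross-attention may make the idea concrete: each query position attends only to a small k × k neighborhood of the other image's features, rather than to a partitioned window. The published module is 3D, multi-head, and doubly normalized, so this is only illustrative.

```python
import torch
import torch.nn.functional as F

def neighborhood_cross_attention(q_feat, kv_feat, k=3):
    """q_feat, kv_feat: (B, C, H, W). Each position in q_feat attends to the
    k x k neighborhood around the same position in kv_feat."""
    B, C, H, W = q_feat.shape
    pad = k // 2
    # Gather the k*k neighbors of every spatial position: (B, C*k*k, H*W).
    neigh = F.unfold(kv_feat, kernel_size=k, padding=pad)
    neigh = neigh.view(B, C, k * k, H * W)                # (B, C, k*k, HW)
    q = q_feat.view(B, C, 1, H * W)                       # (B, C, 1, HW)
    attn = (q * neigh).sum(1, keepdim=True) / C ** 0.5    # (B, 1, k*k, HW)
    attn = attn.softmax(dim=2)                            # over the neighborhood
    out = (attn * neigh).sum(2)                           # (B, C, HW)
    return out.view(B, C, H, W)
```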
Collapse
Affiliation(s)
- Xinxin Cui
- School of Medical Information Engineering, Gansu University of Traditional Chinese Medicine, Lanzhou, 730000, People's Republic of China
| | - Hao Xu
- School of Medical Information Engineering, Gansu University of Traditional Chinese Medicine, Lanzhou, 730000, People's Republic of China
| | - Jing Liu
- School of Medical Information Engineering, Gansu University of Traditional Chinese Medicine, Lanzhou, 730000, People's Republic of China
| | - Zhenyu Tian
- School of Medical Information Engineering, Gansu University of Traditional Chinese Medicine, Lanzhou, 730000, People's Republic of China
| | - Jianlan Yang
- School of Medical Information Engineering, Gansu University of Traditional Chinese Medicine, Lanzhou, 730000, People's Republic of China
- Orthopedic Traumatology Hospital, Quanzhou, Fujian, 362000, People's Republic of China
| |
Collapse
|
41
|
Ren X, Song H, Zhang Z, Yang T. MSRA-Net: multi-channel semantic-aware and residual attention mechanism network for unsupervised 3D image registration. Phys Med Biol 2024; 69:165011. [PMID: 39047770 DOI: 10.1088/1361-6560/ad6741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2023] [Accepted: 07/23/2024] [Indexed: 07/27/2024]
Abstract
Objective. Convolutional neural networks (CNNs) are developing rapidly in the field of medical image registration, and the U-Net architecture has further improved registration precision. However, U-Net may discard important information during the encoding and decoding steps, leading to a decline in accuracy. To solve this problem, a multi-channel semantic-aware and residual attention mechanism network (MSRA-Net) is proposed in this paper. Approach. Our proposed network achieves efficient information aggregation by extracting the features of different channels. First, a context-aware module (CAM) is designed to extract valuable contextual information, employing depth-wise separable convolution to alleviate the computational burden. Then, a new multi-channel semantic-aware module (MCSAM) is designed for more comprehensive fusion of up-sampling features. Additionally, a residual attention module is introduced in the up-sampling process to extract more semantic information and minimize information loss. Main results. This study uses the Dice score, average symmetric surface distance, and negative Jacobian determinant as evaluation metrics. The experimental results demonstrate that our proposed MSRA-Net achieves the highest accuracy compared to several state-of-the-art methods, and the highest Dice score across multiple datasets, indicating the superior generalization capability of our model. Significance. The proposed MSRA-Net offers a novel approach to improving medical image registration accuracy, with implications for various clinical applications. Our implementation is available at https://github.com/shy922/MSRA-Net.
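The depth-wise separable convolution mentioned for the context-aware module is a standard building block: a per-channel spatial convolution followed by a 1×1×1 pointwise convolution. A generic 3D version is sketched below; it is not the authors' exact layer.

```python
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """Per-channel (depthwise) 3D convolution followed by a pointwise 1x1x1
    convolution, trading a small accuracy cost for far fewer parameters and
    multiply-adds than a dense Conv3d."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```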
Collapse
Affiliation(s)
- Xiaozhen Ren
- Key Laboratory of Grain Information Processing and Control, Henan University of Technology, Ministry of Education, Zhengzhou 450001, People's Republic of China
- Henan Key Laboratory of Grain Photoelectric Detection and Control, Henan University of Technology, Zhengzhou 450001, People's Republic of China
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, People's Republic of China
| | - Haoyuan Song
- Key Laboratory of Grain Information Processing and Control, Henan University of Technology, Ministry of Education, Zhengzhou 450001, People's Republic of China
- Henan Key Laboratory of Grain Photoelectric Detection and Control, Henan University of Technology, Zhengzhou 450001, People's Republic of China
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, People's Republic of China
| | - Zihao Zhang
- Key Laboratory of Grain Information Processing and Control, Henan University of Technology, Ministry of Education, Zhengzhou 450001, People's Republic of China
- Henan Key Laboratory of Grain Photoelectric Detection and Control, Henan University of Technology, Zhengzhou 450001, People's Republic of China
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, People's Republic of China
| | - Tiejun Yang
- Key Laboratory of Grain Information Processing and Control, Henan University of Technology, Ministry of Education, Zhengzhou 450001, People's Republic of China
- Henan Key Laboratory of Grain Photoelectric Detection and Control, Henan University of Technology, Zhengzhou 450001, People's Republic of China
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, People's Republic of China
| |
Collapse
|
42
|
Hua Y, Xu K, Yang X. Variational image registration with learned prior using multi-stage VAEs. Comput Biol Med 2024; 178:108785. [PMID: 38925089 DOI: 10.1016/j.compbiomed.2024.108785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Revised: 05/16/2024] [Accepted: 06/15/2024] [Indexed: 06/28/2024]
Abstract
Variational Autoencoders (VAEs) are an efficient variational inference technique coupled with a generative network. Owing to the uncertainty estimates provided by variational inference, VAEs have been applied to medical image registration. However, a critical problem in VAEs is that a simple prior cannot provide suitable regularization, which leads to a mismatch between the variational posterior and the prior. An optimal prior can close the gap between the true and variational posteriors. In this paper, we propose a multi-stage VAE to learn the optimal prior, namely the aggregated posterior. A lightweight VAE is used to generate the aggregated posterior as a whole; this is an effective way to estimate the distribution of the high-dimensional aggregated posterior that commonly arises in VAE-based medical image registration. A factorized telescoping classifier is trained to estimate the density ratio between a simple given prior and the aggregated posterior, so that the KL divergence between the variational and aggregated posteriors can be calculated more accurately. We analyze the KL divergence and find that the finer the factorization, the smaller the KL divergence, although too fine a partition is not conducive to registration accuracy. Moreover, the diagonal hypothesis on the variational posterior's covariance ignores the relationships between latent variables in image registration. To address this issue, we learn a covariance matrix with low-rank structure that models correlations across the dimensions of the variational posterior. The covariance matrix is further used as a measure to reduce the uncertainty of deformation fields. Experimental results on four public medical image datasets demonstrate that our proposed method outperforms other methods in negative log-likelihood (NLL) and achieves better registration accuracy.
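The low-rank covariance idea — letting latent dimensions of the deformation code correlate instead of assuming a diagonal Gaussian — maps naturally onto a low-rank-plus-diagonal Gaussian posterior. A sketch using PyTorch's built-in distribution follows; the dimensions, rank, and initial values are illustrative, not the paper's settings.

```python
import torch
from torch.distributions import LowRankMultivariateNormal

latent_dim, rank, batch = 128, 8, 4
mu = torch.randn(batch, latent_dim)                       # predicted mean
cov_factor = torch.randn(batch, latent_dim, rank) * 0.1   # low-rank factor U
cov_diag = torch.ones(batch, latent_dim) * 0.5            # diagonal part D

# Covariance is U U^T + diag(D), so latent dimensions can correlate
# while sampling and log-densities stay cheap to compute.
q = LowRankMultivariateNormal(mu, cov_factor, cov_diag)
z = q.rsample()               # reparameterized sample, usable for backprop
log_qz = q.log_prob(z)        # enters the KL/ELBO terms of the VAE objective
```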
Collapse
Affiliation(s)
- Yong Hua
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, Guangdong, China
| | - Kangrong Xu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, Guangdong, China
| | - Xuan Yang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, Guangdong, China.
| |
Collapse
|
43
|
Yuan W, Cheng J, Gong Y, He L, Zhang J. MACG-Net: Multi-axis cross gating network for deformable medical image registration. Comput Biol Med 2024; 178:108673. [PMID: 38905891 DOI: 10.1016/j.compbiomed.2024.108673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2023] [Revised: 04/18/2024] [Accepted: 05/26/2024] [Indexed: 06/23/2024]
Abstract
Deformable image registration is a fundamental yet vital task for preoperative planning, intraoperative information fusion, disease diagnosis, and follow-up. It solves for the non-rigid deformation field that aligns an image pair. Recent approaches such as VoxelMorph and TransMorph compute features from a simple concatenation of the moving and fixed images, which often leads to weak alignment. Moreover, CNN-based or hybrid CNN-Transformer backbones have limited receptive fields and cannot capture long-range relations, while fully Transformer-based approaches are computationally expensive. In this paper, we propose a novel multi-axis cross gating network (MACG-Net) for deformable medical image registration that addresses these limitations. MACG-Net uses a dual-stream multi-axis feature fusion module to capture both long-range and local contextual relationships from the moving and fixed images. Cross gate blocks are integrated with the dual-stream backbone to account both for independent feature extraction within the moving-fixed image pair and for the relationship between features across the pair. We benchmark our method on several datasets, including 3D atlas-based brain MRI, inter-patient brain MRI, and 2D cardiac MRI. The results demonstrate that the proposed method achieves state-of-the-art performance. The source code has been released at https://github.com/Valeyards/MACG.
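A cross gate block can be pictured as two feature streams modulating each other through learned sigmoid gates. The hedged sketch below shows only that exchange; the published block also involves multi-axis attention, which is omitted here, and the layer shapes are assumptions.

```python
import torch
import torch.nn as nn

class CrossGateBlock(nn.Module):
    """Each stream is gated by a sigmoid mask computed from the *other*
    stream, so moving- and fixed-image features inform one another without
    being naively concatenated."""
    def __init__(self, channels):
        super().__init__()
        self.gate_from_moving = nn.Conv3d(channels, channels, kernel_size=1)
        self.gate_from_fixed = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, feat_moving, feat_fixed):
        g_m = torch.sigmoid(self.gate_from_fixed(feat_fixed))   # gates moving
        g_f = torch.sigmoid(self.gate_from_moving(feat_moving)) # gates fixed
        return feat_moving * g_m, feat_fixed * g_f
```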
Collapse
Affiliation(s)
- Wei Yuan
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Jun Cheng
- Institute for Infocomm Research, Agency for Science, Technology and Research, 138632, Singapore
| | - Yuhang Gong
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Ling He
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China.
| | - Jing Zhang
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| |
Collapse
|
44
|
Pham XL, Luu MH, van Walsum T, Mai HS, Klein S, Le NH, Chu DT. CMAN: Cascaded Multi-scale Spatial Channel Attention-guided Network for large 3D deformable registration of liver CT images. Med Image Anal 2024; 96:103212. [PMID: 38830326 DOI: 10.1016/j.media.2024.103212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2023] [Revised: 03/27/2024] [Accepted: 05/17/2024] [Indexed: 06/05/2024]
Abstract
Deformable image registration is an essential component of medical image analysis and plays an irreplaceable role in clinical practice. In recent years, deep learning-based registration methods have demonstrated significant improvements in convenience, robustness, and execution time compared to traditional algorithms. However, registering images with large displacements, such as those of the liver, remains underexplored and challenging. In this study, we present a novel convolutional neural network (CNN)-based unsupervised registration method, the Cascaded Multi-scale Spatial-Channel Attention-guided Network (CMAN), which addresses the challenge of large deformation fields using a double coarse-to-fine registration approach. The main contributions of CMAN include: (i) local coarse-to-fine registration in the base network, which generates the displacement field at each resolution and progressively propagates these local deformations as auxiliary information for the final deformation field; (ii) global coarse-to-fine registration, which stacks multiple base networks for sequential warping, thereby incorporating richer multi-layer contextual details into the final deformation field; (iii) integration of a spatial-channel attention module in the decoder stage, which better highlights important features and improves the quality of the feature maps. The proposed network was trained on two public datasets and evaluated on another public dataset as well as a private dataset across several experimental scenarios. We compared CMAN with four state-of-the-art CNN-based registration methods and two well-known traditional algorithms. The results show that the proposed double coarse-to-fine registration strategy outperforms other methods on most registration evaluation metrics. In conclusion, CMAN can effectively handle the large-deformation registration problem and shows potential for application in clinical practice. The source code is publicly available at https://github.com/LocPham263/CMAN.git.
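The cascading strategy amounts to composing warps: each base network refines the output of the previous one, so modest per-stage displacements accumulate into one large deformation. A generic sketch under stated assumptions — the `stages` networks and the normalized-coordinate flow convention are hypothetical, and the repeated resampling used here for brevity accumulates interpolation error (composing flows before a single resampling is the usual refinement).

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Warp a volume (B, 1, D, H, W) with a displacement field (B, 3, D, H, W)
    expressed in grid_sample's normalized [-1, 1] coordinates."""
    B = image.shape[0]
    identity = torch.eye(3, 4).unsqueeze(0).repeat(B, 1, 1)
    base = F.affine_grid(identity, image.shape, align_corners=True)
    grid = base.to(image.device) + flow.permute(0, 2, 3, 4, 1)
    return F.grid_sample(image, grid, align_corners=True)

def cascade_register(moving, fixed, stages):
    # Each stage sees the current warped image and predicts a residual
    # displacement field; sequential warping realizes the coarse-to-fine
    # accumulation of large deformations.
    warped = moving
    for stage in stages:
        flow = stage(warped, fixed)
        warped = warp(warped, flow)
    return warped
```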
Collapse
Affiliation(s)
- Xuan Loc Pham
- FET, VNU University of Engineering and Technology, Hanoi, Viet Nam; Diagnostic Image Analysis Group, Radboud UMC, Nijmegen, The Netherlands
| | - Manh Ha Luu
- FET, VNU University of Engineering and Technology, Hanoi, Viet Nam; AVITECH, VNU University of Engineering and Technology, Hanoi, Viet Nam; Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands.
| | - Theo van Walsum
- Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
| | - Hong Son Mai
- Department of Nuclear Medicine, Hospital 108, Hanoi, Viet Nam
| | - Stefan Klein
- Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
| | - Ngoc Ha Le
- Department of Nuclear Medicine, Hospital 108, Hanoi, Viet Nam
| | - Duc Trinh Chu
- FET, VNU University of Engineering and Technology, Hanoi, Viet Nam
| |
Collapse
|
45
|
Ghoul A, Pan J, Lingg A, Kubler J, Krumm P, Hammernik K, Rueckert D, Gatidis S, Kustner T. Attention-Aware Non-Rigid Image Registration for Accelerated MR Imaging. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:3013-3026. [PMID: 39088484 DOI: 10.1109/tmi.2024.3385024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/03/2024]
Abstract
Accurate motion estimation at high acceleration factors enables rapid motion-compensated reconstruction in Magnetic Resonance Imaging (MRI) without compromising the diagnostic image quality. In this work, we introduce an attention-aware deep learning-based framework that can perform non-rigid pairwise registration for fully sampled and accelerated MRI. We extract local visual representations to build similarity maps between the registered image pairs at multiple resolution levels and additionally leverage long-range contextual information using a transformer-based module to alleviate ambiguities in the presence of artifacts caused by undersampling. We combine local and global dependencies to perform simultaneous coarse and fine motion estimation. The proposed method was evaluated on in-house acquired fully sampled and accelerated data of 101 patients and 62 healthy subjects undergoing cardiac and thoracic MRI. The impact of motion estimation accuracy on the downstream task of motion-compensated reconstruction was analyzed. We demonstrate that our model derives reliable and consistent motion fields across different sampling trajectories (Cartesian and radial) and acceleration factors of up to 16x for cardiac motion and 30x for respiratory motion and achieves superior image quality in motion-compensated reconstruction qualitatively and quantitatively compared to conventional and recent deep learning-based approaches. The code is publicly available at https://github.com/lab-midas/GMARAFT.
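The similarity maps between registered image pairs at multiple resolution levels can be sketched as multi-scale cosine-similarity volumes between learned features. This is illustrative only; the paper pairs such local matching signals with a transformer-based module for global context, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def multiscale_similarity_maps(feat_a, feat_b, scales=(1, 2, 4)):
    """feat_a, feat_b: (B, C, D, H, W) feature volumes from the two images.
    Returns a per-voxel cosine-similarity map at each pooling scale, the kind
    of local matching signal that can drive coarse and fine motion estimates."""
    maps = []
    for s in scales:
        a = F.avg_pool3d(feat_a, s) if s > 1 else feat_a
        b = F.avg_pool3d(feat_b, s) if s > 1 else feat_b
        sim = F.cosine_similarity(a, b, dim=1, eps=1e-6)   # (B, D', H', W')
        maps.append(sim)
    return maps
```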
Collapse
|
46
|
Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024; 154:102900. [PMID: 38878555 PMCID: PMC11638972 DOI: 10.1016/j.artmed.2024.102900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Revised: 05/28/2024] [Accepted: 05/30/2024] [Indexed: 08/09/2024]
Abstract
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a deep learning architecture initially developed for general-purpose Natural Language Processing (NLP) tasks and subsequently adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. We also include articles that use the Transformer architecture to generate surgical instructions and to predict adverse outcomes after surgery under the umbrella of critical care. Across diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we discuss the benefits and limitations of using Transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Collapse
Affiliation(s)
- Subhash Nerella
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | | | - Jiaqing Zhang
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
| | - Miguel Contreras
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Scott Siegel
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Aysegul Bumin
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
| | - Brandon Silva
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
| | - Jessica Sena
- Department Of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Benjamin Shickel
- Department of Medicine, University of Florida, Gainesville, United States
| | - Azra Bihorac
- Department of Medicine, University of Florida, Gainesville, United States
| | - Kia Khezeli
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Parisa Rashidi
- Department of Biomedical Engineering, University of Florida, Gainesville, United States.
| |
Collapse
|
47
|
Hernandez M, Ramon Julvez U. Insights into traditional Large Deformation Diffeomorphic Metric Mapping and unsupervised deep-learning for diffeomorphic registration and their evaluation. Comput Biol Med 2024; 178:108761. [PMID: 38908357 DOI: 10.1016/j.compbiomed.2024.108761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2023] [Revised: 06/04/2024] [Accepted: 06/13/2024] [Indexed: 06/24/2024]
Abstract
This paper explores the connections between traditional Large Deformation Diffeomorphic Metric Mapping (LDDMM) methods and unsupervised deep-learning approaches for non-rigid registration, with particular emphasis on diffeomorphic registration. The study provides useful insights and establishes connections between the methods, thereby facilitating a profound understanding of the methodological landscape. The methods considered in our study are extensively evaluated on T1w MRI images using the traditional NIREP and Learn2Reg OASIS evaluation protocols, with a focus on fairness, to establish equitable benchmarks and facilitate informed comparisons. Through a comprehensive analysis of the results, we address key questions, including the intricate relationship between accuracy and transformation quality, the disentanglement of the influence of individual registration ingredients on performance, and the determination of benchmark methods and baselines. We offer valuable insights into the strengths and limitations of both traditional and deep-learning methods, shedding light on their comparative performance and guiding future advancements in the field.
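For orientation, the variational problem underlying LDDMM is usually written as follows (textbook form; the paper's notation may differ):

```latex
E(v) = \frac{1}{2}\int_0^1 \lVert v_t \rVert_V^2 \, dt
     + \frac{1}{\sigma^2}\,\lVert I_0 \circ \varphi_1^{-1} - I_1 \rVert_{L^2}^2,
\qquad \dot{\varphi}_t = v_t(\varphi_t), \quad \varphi_0 = \mathrm{id},
```

where the time-dependent velocity field $v_t$ generates the diffeomorphism $\varphi_1$ that warps the moving image $I_0$ onto the fixed image $I_1$; unsupervised deep-learning variants typically amortize the minimization of a closely related energy over a training set.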
Collapse
Affiliation(s)
- Monica Hernandez
- Computer Science Department, University of Zaragoza, Spain; Aragon Institute of Engineering Research, Spain.
| | - Ubaldo Ramon Julvez
- Computer Science Department, University of Zaragoza, Spain; Aragon Institute of Engineering Research, Spain
| |
Collapse
|
48
|
AlMohimeed A, Shehata M, El-Rashidy N, Mostafa S, Samy Talaat A, Saleh H. ViT-PSO-SVM: Cervical Cancer Predication Based on Integrating Vision Transformer with Particle Swarm Optimization and Support Vector Machine. Bioengineering (Basel) 2024; 11:729. [PMID: 39061811 PMCID: PMC11273508 DOI: 10.3390/bioengineering11070729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2024] [Revised: 07/10/2024] [Accepted: 07/11/2024] [Indexed: 07/28/2024] Open
Abstract
Cervical cancer (CCa) is the fourth most prevalent cancer affecting women worldwide, with increasing incidence and mortality rates; hence, early detection plays a crucial role in improving outcomes. Non-invasive imaging procedures with good diagnostic performance are desirable and have the potential to lessen the degree of intervention associated with the gold standard, biopsy. Recently, artificial intelligence-based diagnostic models such as Vision Transformers (ViT) have shown promising performance in image classification tasks, rivaling or surpassing traditional convolutional neural networks (CNNs). This paper studies the effect of applying a ViT to predict CCa using different benchmark image datasets. A newly developed approach (ViT-PSO-SVM) is presented for boosting the results of the ViT by integrating it with particle swarm optimization (PSO) and a support vector machine (SVM). First, the proposed framework extracts features with the Vision Transformer. Then, PSO is used to reduce the complexity of the extracted features and optimize the feature representation. Finally, the softmax classification layer is replaced with an SVM classification model to precisely predict CCa. The models are evaluated using two benchmark cervical cell image datasets, SipakMed and Herlev, under different classification scenarios: two, three, and five classes. The proposed approach achieved 99.112% accuracy and a 99.113% F1-score on SipakMed with two classes, and 97.778% accuracy and a 97.805% F1-score on Herlev with two classes, outperforming other Vision Transformers, CNN models, and pre-trained models. Finally, GradCAM is used as an explainable artificial intelligence (XAI) tool to visualize and understand the regions of a given image that are important for the model's prediction. The experimental results demonstrate the feasibility and efficacy of the developed ViT-PSO-SVM approach and hold the promise of providing a robust, reliable, accurate, and non-invasive diagnostic tool that will lead to improved healthcare outcomes worldwide.
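The ViT → PSO → SVM pipeline can be approximated in a few lines: treat ViT embeddings as a feature matrix, let a small particle swarm search binary feature masks, and score each mask with an SVM under cross-validation. This is a toy sketch with assumed data `X` (N × D embeddings) and labels `y`; the PSO hyperparameters are illustrative and the paper's optimizer setup may differ.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(mask, X, y):
    # Cross-validated SVM accuracy on the selected feature subset.
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()

def pso_feature_selection(X, y, n_particles=10, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    pos = rng.random((n_particles, D))        # continuous positions in [0, 1]
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p > 0.5, X, y) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard PSO update: inertia + cognitive + social terms.
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, 1)
        fit = np.array([fitness(p > 0.5, X, y) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest > 0.5                        # final selected-feature mask

# Usage: mask = pso_feature_selection(X, y); clf = SVC().fit(X[:, mask], y)
```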
Collapse
Affiliation(s)
- Abdulaziz AlMohimeed
- College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318, Saudi Arabia;
| | - Mohamed Shehata
- Bioengineering Department, Speed School of Engineering, University of Louisville, Louisville, KY 40292, USA
| | - Nora El-Rashidy
- Machine Learning and Information Retrieval Department, Faculty of Artificial Intelligence, Kafrelsheiksh University, Kafrelsheiksh 13518, Egypt;
| | - Sherif Mostafa
- Faculty of Computers and Artificial Intelligence, South Valley University, Hurghada 84511, Egypt;
| | - Amira Samy Talaat
- Computers and Systems Department, Electronics Research Institute, Cairo 12622, Egypt;
| | - Hager Saleh
- Faculty of Computers and Artificial Intelligence, South Valley University, Hurghada 84511, Egypt;
- Insight SFI Research Centre for Data Analytics, Galway University, H91 TK33 Galway, Ireland
- Research Development, Atlantic Technological University, Letterkenny, H91 AH5K Donegal, Ireland
| |
Collapse
|
49
|
Zhang J, Xie X, Cheng X, Li T, Zhong J, Hu X, Sun L, Yan H. Deep learning-based deformable image registration with bilateral pyramid to align pre-operative and follow-up magnetic resonance imaging (MRI) scans. Quant Imaging Med Surg 2024; 14:4779-4791. [PMID: 39022247 PMCID: PMC11250335 DOI: 10.21037/qims-23-1821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2023] [Accepted: 05/23/2024] [Indexed: 07/20/2024]
Abstract
Background The evaluation of brain tumor recurrence after surgery is based on comparing tumor regions on pre-operative and follow-up magnetic resonance imaging (MRI) scans in clinical practice. Accurate alignment of the MRI scans is important in this evaluation process. However, existing methods often fail to yield accurate alignment because of substantial appearance and shape changes in the tumor regions. This study aimed to reduce such misalignment by exploiting multimodal information and compensating for shape changes. Methods In this work, a deep learning-based deformable registration method using a bilateral pyramid to create multi-scale image features was developed. Moreover, morphology operations were employed to build correspondence between the surgical resection on the follow-up and pre-operative MRI scans. Results Compared with baseline methods, the proposed method achieved the lowest mean absolute error, 1.82 mm, on the public BraTS-Reg 2022 dataset. Conclusions The results suggest that the proposed method is potentially useful for evaluating tumor recurrence after surgery. We verified its ability to extract and integrate information from the second modality and examined the fine-scale representation of tumor recurrence. This approach can assist doctors in registering multi-sequence images of patients, observing lesions and surrounding areas, and guiding treatment plans.
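The morphology step that relates the resection cavity on follow-up MRI to the tumor region on pre-operative MRI can be sketched with simple binary dilation, producing a mask that the registration can treat specially (excluded from, or down-weighted in, the similarity term). The paper's exact operations are not reproduced here; the mask names and dilation radius are assumptions.

```python
import numpy as np
from scipy import ndimage

def resection_correspondence_mask(followup_cavity, preop_tumor, iters=2):
    """followup_cavity, preop_tumor: boolean 3D arrays in a shared space.
    Dilating both masks lets their rims overlap, yielding one region where
    intensity correspondence is unreliable after surgery."""
    cav = ndimage.binary_dilation(followup_cavity, iterations=iters)
    tum = ndimage.binary_dilation(preop_tumor, iterations=iters)
    return cav | tum   # region to mask out of the registration similarity
```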
Collapse
Affiliation(s)
- Jingjing Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Electrical Engineering and Automation, Anhui University, Hefei, China
| | - Xin Xie
- Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China
| | - Xuebin Cheng
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Electrical Engineering and Automation, Anhui University, Hefei, China
| | - Teng Li
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Electrical Engineering and Automation, Anhui University, Hefei, China
| | - Jinqin Zhong
- School of Internet, Anhui University, Hefei, China
| | - Xiaokun Hu
- Interventional Medicine Center, Affiliated Hospital of Qingdao University, Qingdao, China
| | - Lu Sun
- Department of Oncology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
| | - Hui Yan
- Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| |
Collapse
|
50
|
Chaudhary MFA, Gerard SE, Christensen GE, Cooper CB, Schroeder JD, Hoffman EA, Reinhardt JM. LungViT: Ensembling Cascade of Texture Sensitive Hierarchical Vision Transformers for Cross-Volume Chest CT Image-to-Image Translation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:2448-2465. [PMID: 38373126 PMCID: PMC11227912 DOI: 10.1109/tmi.2024.3367321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/21/2024]
Abstract
Chest computed tomography (CT) at inspiration is often complemented by an expiratory CT to identify peripheral airways disease. Additionally, co-registered inspiratory-expiratory volumes can be used to derive various markers of lung function. Expiratory CT scans, however, may not be acquired due to dose or scan-time considerations, or may be inadequate due to motion or insufficient exhale, leading to a missed opportunity to evaluate underlying small airways disease. Here, we propose LungViT, a generative adversarial learning approach using hierarchical vision transformers to translate inspiratory CT intensities into corresponding expiratory CT intensities. LungViT addresses several limitations of traditional generative models, including slicewise discontinuities, the limited size of generated volumes, and the inability to model texture transfer at the volumetric level. We propose a shifted-window hierarchical vision transformer architecture with squeeze-and-excitation decoder blocks for modeling dependencies between features, together with a multiview texture similarity distance metric for texture and style transfer in 3D. To incorporate global information into the training process and refine the output of our model, we use ensemble cascading. LungViT is able to generate large 3D volumes of size 320×320×320. We train and validate our model using a diverse cohort of 1500 subjects with varying disease severity. To assess model generalizability beyond development-set biases, we evaluate our model on an out-of-distribution external validation set of 200 subjects. Clinical validation on internal and external testing sets shows that synthetic volumes can be reliably adopted for deriving clinical endpoints of chronic obstructive pulmonary disease.
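The squeeze-and-excitation decoder blocks mentioned above follow a standard design: a global-average "squeeze" produces per-channel statistics, and a small bottleneck MLP "excitation" gates the channels. A generic 3D sketch follows; the reduction ratio is illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SEBlock3d(nn.Module):
    """Squeeze-and-excitation over 3D feature maps: channels are reweighted
    by gates computed from their global averages."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (B, C, D, H, W)
        w = x.mean(dim=(2, 3, 4))                # squeeze: (B, C)
        w = self.fc(w)[..., None, None, None]    # excitation gates, (B, C, 1, 1, 1)
        return x * w
```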
Collapse
|