1. Nernekli K, Persad AR, Hori YS, Yener U, Celtikci E, Sahin MC, Sozer A, Sozer B, Park DJ, Chang SD. Automatic Segmentation of Vestibular Schwannomas: A Systematic Review. World Neurosurg 2024; 188:35-44. [PMID: 38685346] [DOI: 10.1016/j.wneu.2024.04.145]
Abstract
BACKGROUND Vestibular schwannomas (VSs) are benign tumors often monitored over time, with measurement techniques for assessing growth rates subject to significant interobserver variability. Automatic segmentation of these tumors could provide a more reliable and efficient means of tracking their progression, especially given the irregular shape and growth patterns of VS. METHODS Various studies and segmentation techniques employing different convolutional neural network architectures and models, such as U-Net and convolutional-attention transformer segmentation, were analyzed. Models were evaluated based on their performance across diverse datasets, and challenges, including domain shift and data sharing, were scrutinized. RESULTS Automatic segmentation methods offer a promising alternative to conventional measurement techniques, with potential gains in precision and efficiency. However, these methods are not without challenges, notably the "domain shift" that occurs when models trained on specific datasets underperform when applied to different datasets. Techniques such as domain adaptation, domain generalization, and data diversity were discussed as potential solutions. CONCLUSIONS Accurate measurement of VS growth is a complex process, with volumetric analysis currently appearing more reliable than linear measurements. Automatic segmentation, despite its challenges, offers a promising avenue for future investigation. Robust, well-generalized models could improve the efficiency of tracking tumor growth, thereby augmenting clinical decision-making. Further work is needed to develop more robust models, address domain shift, and enable secure data sharing for wider applicability.
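The review's preference for volumetric analysis over linear measurement is easy to make concrete: once a binary segmentation mask exists, volume follows directly from the voxel count, whereas a linear diameter depends on which slice and axis the observer picks. A minimal sketch (the function names and the crude in-plane extent surrogate are illustrative, not taken from the review):

```python
import numpy as np

def tumor_volume_mm3(mask: np.ndarray, voxel_size_mm: tuple) -> float:
    """Volume of a binary mask: voxel count times the volume of one voxel."""
    return float(mask.sum()) * float(np.prod(voxel_size_mm))

def max_axial_extent_mm(mask: np.ndarray, voxel_size_mm: tuple) -> float:
    """Crude linear surrogate: largest in-plane bounding-box extent over
    axial slices; its value changes with slice choice and head orientation,
    which is one source of the interobserver variability noted above."""
    best = 0.0
    for axial_slice in mask:  # mask indexed as (z, y, x)
        ys, xs = np.nonzero(axial_slice)
        if ys.size:
            best = max(best,
                       (ys.max() - ys.min() + 1) * voxel_size_mm[1],
                       (xs.max() - xs.min() + 1) * voxel_size_mm[2])
    return best
```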
Affiliation(s)
- Kerem Nernekli: Department of Radiology, Stanford University School of Medicine, Stanford, California, USA
- Amit R Persad: Department of Neurosurgery, Stanford University School of Medicine, Stanford, California, USA
- Yusuke S Hori: Department of Neurosurgery, Stanford University School of Medicine, Stanford, California, USA
- Ulas Yener: Department of Neurosurgery, Stanford University School of Medicine, Stanford, California, USA
- Emrah Celtikci: Department of Neurosurgery, Gazi University, Ankara, Turkey
- Alperen Sozer: Department of Neurosurgery, Gazi University, Ankara, Turkey
- Batuhan Sozer: Department of Neurosurgery, Gazi University, Ankara, Turkey
- David J Park: Department of Neurosurgery, Stanford University School of Medicine, Stanford, California, USA
- Steven D Chang: Department of Neurosurgery, Stanford University School of Medicine, Stanford, California, USA
2. Kujawa A, Dorent R, Connor S, Thomson S, Ivory M, Vahedi A, Guilhem E, Wijethilake N, Bradford R, Kitchen N, Bisdas S, Ourselin S, Vercauteren T, Shapey J. Deep learning for automatic segmentation of vestibular schwannoma: a retrospective study from multi-center routine MRI. Front Comput Neurosci 2024; 18:1365727. [PMID: 38784680] [PMCID: PMC11111906] [DOI: 10.3389/fncom.2024.1365727]
Abstract
Automatic segmentation of vestibular schwannoma (VS) from routine clinical MRI has the potential to improve clinical workflow, facilitate treatment decisions, and assist patient management. Previous work demonstrated reliable automatic segmentation performance on datasets of standardized MRI images acquired for stereotactic surgery planning. However, diagnostic clinical datasets are generally more diverse and pose a larger challenge to automatic segmentation algorithms, especially when post-operative images are included. In this work, we show for the first time that automatic segmentation of VS on routine MRI datasets is also possible with high accuracy. We acquired and publicly released a curated multi-center routine clinical (MC-RC) dataset of 160 patients with a single sporadic VS. For each patient, up to three longitudinal MRI exams with contrast-enhanced T1-weighted (ceT1w) (n = 124) and T2-weighted (T2w) (n = 363) images were included and the VS was manually annotated. Segmentations were produced and verified in an iterative process: (1) initial segmentations by a specialized company; (2) review by one of three trained radiologists; and (3) validation by an expert team. Inter- and intra-observer reliability experiments were performed on a subset of the dataset. A state-of-the-art deep learning framework was used to train segmentation models for VS. Model performance was evaluated on an MC-RC hold-out testing set, another public VS dataset, and a partially public dataset. The generalizability and robustness of the VS deep learning segmentation models increased significantly when trained on the MC-RC dataset. Dice similarity coefficients (DSC) achieved by our model are comparable to those achieved by trained radiologists in the inter-observer experiment. On the MC-RC testing set, median DSCs were 86.2(9.5) for ceT1w, 89.4(7.0) for T2w, and 86.4(8.6) for combined ceT1w+T2w input images. On another public dataset acquired for Gamma Knife stereotactic radiosurgery, our model achieved median DSCs of 95.3(2.9), 92.8(3.8), and 95.5(3.3), respectively. In contrast, models trained on the Gamma Knife dataset did not generalize well, as illustrated by significant underperformance on the MC-RC routine MRI dataset, highlighting the importance of data variability in the development of robust VS segmentation models. The MC-RC dataset and all trained deep learning models were made available online.
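Since DSC is the headline metric here and throughout this listing, a minimal sketch of how it is computed for binary masks may be useful (this is the standard definition, not the paper's own code; the values above appear to be this quantity expressed as a percentage, with spread in parentheses):

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient: 2|A intersect B| / (|A| + |B|), in [0, 1]."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum() + eps)
```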
Affiliation(s)
- Aaron Kujawa: School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom
- Reuben Dorent: School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom
- Steve Connor: School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom; Department of Neuroradiology, King's College Hospital, London, United Kingdom; Department of Radiology, Guy's and St Thomas' Hospital, London, United Kingdom
- Suki Thomson: Department of Neuroradiology, King's College Hospital, London, United Kingdom
- Marina Ivory: School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom
- Ali Vahedi: Department of Neuroradiology, King's College Hospital, London, United Kingdom
- Emily Guilhem: Department of Neuroradiology, King's College Hospital, London, United Kingdom
- Navodini Wijethilake: School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom
- Robert Bradford: Queen Square Radiosurgery Centre (Gamma Knife), National Hospital for Neurology and Neurosurgery, London, United Kingdom; Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Neil Kitchen: Queen Square Radiosurgery Centre (Gamma Knife), National Hospital for Neurology and Neurosurgery, London, United Kingdom; Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Sotirios Bisdas: Department of Neuroradiology, National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Sebastien Ourselin: School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom
- Tom Vercauteren: School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom
- Jonathan Shapey: School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom; Department of Neurosurgery, King's College Hospital, London, United Kingdom
3. Hussain D, Al-Masni MA, Aslam M, Sadeghi-Niaraki A, Hussain J, Gu YH, Naqvi RA. Revolutionizing tumor detection and classification in multimodality imaging based on deep learning approaches: methods, applications and limitations. J Xray Sci Technol 2024:XST230429. [PMID: 38701131] [DOI: 10.3233/xst-230429]
Abstract
BACKGROUND The emergence of deep learning (DL) techniques has revolutionized tumor detection and classification in medical imaging, with multimodal medical imaging (MMI) gaining recognition for its precision in diagnosis, treatment, and progression tracking. OBJECTIVE This review comprehensively examines how DL methods are transforming tumor detection and classification across MMI modalities, aiming to provide insights into advancements, limitations, and key challenges for further progress. METHODS A systematic literature analysis identifies DL studies for tumor detection and classification, outlining methodologies including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and their variants. Integration of multimodality imaging enhances accuracy and robustness. RESULTS Recent advancements in DL-based MMI evaluation methods are surveyed, focusing on tumor detection and classification tasks. Various DL approaches, including CNNs, YOLO, Siamese Networks, Fusion-Based Models, Attention-Based Models, and Generative Adversarial Networks, are discussed, with emphasis on PET-MRI, PET-CT, and SPECT-CT. FUTURE DIRECTIONS The review outlines emerging trends and future directions in DL-based tumor analysis, aiming to guide researchers and clinicians toward more effective diagnosis and prognosis. Continued innovation and collaboration are stressed in this rapidly evolving domain. CONCLUSION Conclusions drawn from the literature analysis underscore the efficacy of DL approaches in tumor detection and classification, highlighting their potential to address challenges in MMI analysis and their implications for clinical practice.
Affiliation(s)
- Dildar Hussain: Department of Artificial Intelligence and Data Science, Sejong University, Seoul, Republic of Korea
- Mohammed A Al-Masni: Department of Artificial Intelligence and Data Science, Sejong University, Seoul, Republic of Korea
- Muhammad Aslam: Department of Artificial Intelligence and Data Science, Sejong University, Seoul, Republic of Korea
- Abolghasem Sadeghi-Niaraki: Department of Computer Science & Engineering and Convergence Engineering for Intelligent Drone, XR Research Center, Sejong University, Seoul, Republic of Korea
- Jamil Hussain: Department of Artificial Intelligence and Data Science, Sejong University, Seoul, Republic of Korea
- Yeong Hyeon Gu: Department of Artificial Intelligence and Data Science, Sejong University, Seoul, Republic of Korea
- Rizwan Ali Naqvi: Department of Intelligent Mechatronics Engineering, Sejong University, Seoul, Republic of Korea
4. Han L, Tan T, Zhang T, Huang Y, Wang X, Gao Y, Teuwen J, Mann R. Synthesis-based imaging-differentiation representation learning for multi-sequence 3D/4D MRI. Med Image Anal 2024; 92:103044. [PMID: 38043455] [DOI: 10.1016/j.media.2023.103044]
Abstract
Multi-sequence MRI can be necessary for reliable diagnosis in clinical practice due to the complementary information across sequences. However, redundant information exists across sequences, which interferes with mining efficient representations by learning-based models. To handle various clinical scenarios, we propose a sequence-to-sequence generation framework (Seq2Seq) for imaging-differentiation representation learning. In this study, we not only propose arbitrary 3D/4D sequence generation within one model to produce any specified target sequence, but also rank the importance of each sequence with a new metric that estimates the difficulty of a sequence being generated. Furthermore, we exploit the generation inability of the model to extract regions that contain unique information for each sequence. We conduct extensive experiments using three datasets, including a toy dataset of 20,000 simulated subjects, a brain MRI dataset of 1251 subjects, and a breast MRI dataset of 2101 subjects, to demonstrate that (1) top-ranking sequences can be used to replace complete sequences with non-inferior performance; (2) combining MRI with our imaging-differentiation map leads to better performance in clinical tasks such as glioblastoma MGMT promoter methylation status prediction and breast cancer pathological complete response status prediction. Our code is available at https://github.com/fiy2W/mri_seq2seq.
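The core idea, ranking sequences by how hard they are to synthesize from the others, can be sketched in a few lines. Here mean absolute reconstruction error stands in for the paper's difficulty metric, and `synthesize` is a placeholder for a trained Seq2Seq generator, so treat this as an illustration of the ranking logic only:

```python
import numpy as np

def rank_sequences_by_difficulty(volumes: dict, synthesize) -> list:
    """Rank MRI sequences so that the hardest-to-generate (and thus most
    informative) sequence comes first. `volumes` maps sequence name to a
    numpy array; `synthesize(sources, target_name)` is a stand-in for a
    trained generator predicting the target from the remaining sequences."""
    difficulty = {}
    for name, target in volumes.items():
        sources = {k: v for k, v in volumes.items() if k != name}
        prediction = synthesize(sources, target_name=name)
        difficulty[name] = float(np.mean(np.abs(prediction - target)))
    return sorted(difficulty, key=difficulty.get, reverse=True)
```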
Affiliation(s)
- Luyi Han: Department of Radiology and Nuclear Medicine, Radboud University Medical Centre, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands; Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Tao Tan: Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands; Faculty of Applied Sciences, Macao Polytechnic University, 999078, Macao Special Administrative Region of China
- Tianyu Zhang: Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre, P. Debyelaan 25, 6202 AZ, Maastricht, The Netherlands
- Yunzhi Huang: Institute for AI in Medicine, School of Automation, Nanjing University of Information Science and Technology, Nanjing, China
- Xin Wang: Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre, P. Debyelaan 25, 6202 AZ, Maastricht, The Netherlands
- Yuan Gao: Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre, P. Debyelaan 25, 6202 AZ, Maastricht, The Netherlands
- Jonas Teuwen: Department of Radiation Oncology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Ritse Mann: Department of Radiology and Nuclear Medicine, Radboud University Medical Centre, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands; Department of Radiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
5. Al-Dhamari I, Helal R, Abdelaziz T, Waldeck S, Paulus D. Automatic cochlear multimodal 3D image segmentation and analysis using atlas-model-based method. Cochlear Implants Int 2024; 25:46-58. [PMID: 37922404] [DOI: 10.1080/14670100.2023.2274199]
Abstract
OBJECTIVES To propose an automated, fast cochlear segmentation, length, and volume estimation method from clinical 3D multimodal images, with a potential role in the choice of cochlear implant type, surgery planning, and robotic surgery. METHODS Two datasets from different countries were used, comprising 219 clinical 3D images of the cochlea from three modalities: CT, CBCT, and MR. The datasets cover different ages, genders, and types of cochlear implants. We propose an atlas-model-based method for cochlear segmentation and measurement based on a high-resolution μCT model and the A-value. The method was evaluated using 3D landmarks located by two experts. RESULTS The average error was 0.61 ± 0.22 mm and the average time required to process an image was 5.21 ± 0.93 seconds (P<0.001). The volume of the cochlea ranged from 73.96 mm3 to 106.97 mm3; the cochlear length ranged from 36.69 to 45.91 mm at the lateral wall and from 29.12 to 39.05 mm at the organ of Corti. DISCUSSION The method produces nine automated measurements of the cochlea: the volumes of the scala tympani and scala vestibuli, the central lengths of the two scalae, the scala tympani lateral wall length, and the organ of Corti length, in addition to three measurements related to the A-value. CONCLUSION This automatic cochlear image segmentation and analysis method can help clinicians process multimodal cochlear images in approximately 5 seconds on a simple computer. The proposed method is publicly available as a free extension for the 3D Slicer software.
Affiliation(s)
- Ibraheem Al-Dhamari: Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin, Berlin, Germany
6. Andrade-Miranda G, Jaouen V, Tankyevych O, Cheze Le Rest C, Visvikis D, Conze PH. Multi-modal medical Transformers: A meta-analysis for medical image segmentation in oncology. Comput Med Imaging Graph 2023; 110:102308. [PMID: 37918328] [DOI: 10.1016/j.compmedimag.2023.102308]
Abstract
Multi-modal medical image segmentation is a crucial task in oncology that enables the precise localization and quantification of tumors. The aim of this work is to present a meta-analysis of the use of multi-modal medical Transformers for medical image segmentation in oncology, specifically focusing on multi-parametric MR brain tumor segmentation (BraTS2021) and head and neck tumor segmentation using PET-CT images (HECKTOR2021). The multi-modal medical Transformer architectures presented in this work exploit the idea of modality interaction schemes based on visio-linguistic representations: (i) single-stream, where modalities are jointly processed by one Transformer encoder, and (ii) multiple-stream, where the inputs are encoded separately before being jointly modeled. A total of fourteen multi-modal architectures are evaluated using different ranking strategies based on Dice similarity coefficient (DSC) and average symmetric surface distance (ASSD) metrics. In addition, cost indicators such as the number of trainable parameters and the number of multiply-accumulate operations (MACs) are reported. The results demonstrate that multi-path hybrid CNN-Transformer-based models improve segmentation accuracy when compared to traditional methods, but come at the cost of increased computation time and potentially larger model size.
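Of the two cost indicators mentioned, the trainable-parameter count is simple to reproduce; MAC counts, by contrast, are usually obtained with a profiling tool rather than by hand. A minimal PyTorch sketch (illustrative only; the meta-analysis does not publish this snippet):

```python
import torch.nn as nn

def trainable_params(model: nn.Module) -> int:
    """Count trainable parameters, one of the cost indicators used to
    compare the fourteen multi-modal architectures."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example: a toy two-layer segmentation head.
head = nn.Sequential(nn.Conv3d(32, 16, 3, padding=1), nn.Conv3d(16, 2, 1))
print(trainable_params(head))  # 13874 = (32*16*27 + 16) + (16*2 + 2)
```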
Affiliation(s)
- Vincent Jaouen: LaTIM UMR 1101, Inserm, Brest, France; IMT Atlantique, Brest, France
- Olena Tankyevych: LaTIM UMR 1101, Inserm, Brest, France; Nuclear Medicine, University Hospital of Poitiers, Poitiers, France
- Catherine Cheze Le Rest: LaTIM UMR 1101, Inserm, Brest, France; Nuclear Medicine, University Hospital of Poitiers, Poitiers, France
7. Wu J, Wang G, Gu R, Lu T, Chen Y, Zhu W, Vercauteren T, Ourselin S, Zhang S. UPL-SFDA: Uncertainty-Aware Pseudo Label Guided Source-Free Domain Adaptation for Medical Image Segmentation. IEEE Trans Med Imaging 2023; 42:3932-3943. [PMID: 37738202] [DOI: 10.1109/tmi.2023.3318364]
Abstract
Domain Adaptation (DA) is important for deep learning-based medical image segmentation models to deal with testing images from a new target domain. As the source-domain data are usually unavailable when a trained model is deployed at a new center, Source-Free Domain Adaptation (SFDA) is appealing for data- and annotation-efficient adaptation to the target domain. However, existing SFDA methods achieve limited performance due to a lack of sufficient supervision, since source-domain images are unavailable and target-domain images are unlabeled. We propose a novel Uncertainty-aware Pseudo Label guided (UPL) SFDA method for medical image segmentation. Specifically, we propose Target Domain Growing (TDG) to enhance the diversity of predictions in the target domain by duplicating the pre-trained model's prediction head multiple times with perturbations. The different predictions from these duplicated heads are used to obtain pseudo labels for unlabeled target-domain images, together with an uncertainty estimate that identifies reliable pseudo labels. We also propose a Twice Forward pass Supervision (TFS) strategy that uses reliable pseudo labels obtained in one forward pass to supervise predictions in the next forward pass. The adaptation is further regularized by a mean prediction-based entropy minimization term that encourages confident and consistent results across the different prediction heads. UPL-SFDA was validated on a multi-site heart MRI segmentation dataset, a cross-modality fetal brain segmentation dataset, and a 3D fetal tissue segmentation dataset. It improved the average Dice by 5.54, 5.01, and 6.89 percentage points for the three tasks compared with the baseline, respectively, and outperformed several state-of-the-art SFDA methods.
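The mean prediction-based entropy term is the most self-contained piece of this recipe. A hedged PyTorch sketch of one plausible formulation (the tensor layout and function name are assumptions, not the authors' code):

```python
import torch

def mean_prediction_entropy(head_probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """head_probs: (H, B, C, *spatial) softmax outputs of the H duplicated
    prediction heads. Average the heads first, then take the Shannon entropy
    of the mean; minimizing it pushes the heads toward confident, mutually
    consistent predictions."""
    mean_p = head_probs.mean(dim=0)                       # (B, C, *spatial)
    entropy = -(mean_p * torch.log(mean_p + eps)).sum(1)  # sum over classes
    return entropy.mean()                                 # average over pixels
```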
8. Dorent R, Haouchine N, Kogl F, Joutard S, Juvekar P, Torio E, Golby A, Ourselin S, Frisken S, Vercauteren T, Kapur T, Wells WM. Unified Brain MR-Ultrasound Synthesis using Multi-Modal Hierarchical Representations. Med Image Comput Comput Assist Interv (MICCAI) 2023; 2023:448-458. [PMID: 38655383] [PMCID: PMC7615858] [DOI: 10.1007/978-3-031-43999-5_43]
Abstract
We introduce MHVAE, a deep hierarchical variational autoencoder (VAE) that synthesizes missing images from various modalities. Extending multi-modal VAEs with a hierarchical latent structure, we introduce a probabilistic formulation for fusing multi-modal images in a common latent representation while having the flexibility to handle incomplete image sets as input. Moreover, adversarial learning is employed to generate sharper images. Extensive experiments are performed on the challenging problem of joint intra-operative ultrasound (iUS) and Magnetic Resonance (MR) synthesis. Our model outperformed multi-modal VAEs, conditional GANs, and the current state-of-the-art unified method (ResViT) for synthesizing missing images, demonstrating the advantage of using a hierarchical latent representation and a principled probabilistic fusion operation. Our code is publicly available.
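The "principled probabilistic fusion operation" that tolerates incomplete image sets is commonly realized in the multi-modal VAE literature as a product of Gaussian experts; whether MHVAE uses exactly this form is not stated in the abstract, so the sketch below is a generic illustration of the idea rather than the paper's method:

```python
import torch

def product_of_experts(mus, logvars):
    """Fuse per-modality Gaussian posteriors q(z|x_i) into one Gaussian.
    Modalities that are missing are simply left out of the input lists,
    which is what makes the fusion robust to incomplete image sets."""
    precisions = [torch.exp(-lv) for lv in logvars]      # 1 / sigma_i^2
    total_prec = torch.stack(precisions).sum(0) + 1.0    # + N(0, I) prior
    mu = torch.stack([m * p for m, p in zip(mus, precisions)]).sum(0) / total_prec
    return mu, -torch.log(total_prec)                    # fused mean, logvar
```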
Affiliation(s)
- Reuben Dorent: Harvard Medical School, Brigham and Women's Hospital, Boston, MA, USA
- Nazim Haouchine: Harvard Medical School, Brigham and Women's Hospital, Boston, MA, USA
- Fryderyk Kogl: Harvard Medical School, Brigham and Women's Hospital, Boston, MA, USA
- Parikshit Juvekar: Harvard Medical School, Brigham and Women's Hospital, Boston, MA, USA
- Erickson Torio: Harvard Medical School, Brigham and Women's Hospital, Boston, MA, USA
- Alexandra Golby: Harvard Medical School, Brigham and Women's Hospital, Boston, MA, USA
- Sarah Frisken: Harvard Medical School, Brigham and Women's Hospital, Boston, MA, USA
- Tina Kapur: Harvard Medical School, Brigham and Women's Hospital, Boston, MA, USA
- William M Wells: Harvard Medical School, Brigham and Women's Hospital, Boston, MA, USA; Massachusetts Institute of Technology, Cambridge, MA, USA
9. Liu H, Zhuang Y, Song E, Xu X, Ma G, Cetinkaya C, Hung CC. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations. Med Phys 2023; 50:5460-5478. [PMID: 36864700] [DOI: 10.1002/mp.16338]
Abstract
BACKGROUND Multi-modal learning is widely adopted to learn the latent complementary information between different modalities in multi-modal medical image segmentation tasks. Nevertheless, traditional multi-modal learning methods require spatially well-aligned and paired multi-modal images for supervised training, and therefore cannot leverage unpaired multi-modal images with spatial misalignment and modality discrepancy. For training accurate multi-modal segmentation networks using easily accessible and low-cost unpaired multi-modal images in clinical practice, unpaired multi-modal learning has received considerable attention recently. PURPOSE Existing unpaired multi-modal learning methods usually focus on the intensity distribution gap but ignore the problem of scale variation between different modalities. Moreover, these methods frequently employ shared convolutional kernels to capture patterns common to all modalities, but such kernels are typically inefficient at learning global contextual information. They also rely heavily on large numbers of labeled unpaired multi-modal scans for training, ignoring the practical scenario in which labeled data are limited. To solve these problems, we propose a modality-collaborative convolution and transformer hybrid network (MCTHNet) using semi-supervised learning for unpaired multi-modal segmentation with limited annotations, which not only collaboratively learns modality-specific and modality-invariant representations but can also automatically leverage extensive unlabeled scans to improve performance. METHODS We make three main contributions in the proposed method. First, to alleviate the intensity distribution gap and scale variation problems across modalities, we develop a modality-specific scale-aware convolution (MSSC) module that can adaptively adjust the receptive field sizes and feature normalization parameters according to the input. Second, we propose a modality-invariant vision transformer (MIViT) module as the shared bottleneck layer for all modalities, which implicitly incorporates convolution-like local operations with the global processing of transformers for learning generalizable modality-invariant representations. Third, we design a multi-modal cross pseudo supervision (MCPS) method for semi-supervised learning, which enforces consistency between the pseudo segmentation maps generated by two perturbed networks to acquire abundant annotation information from unlabeled unpaired multi-modal scans. RESULTS Extensive experiments were performed on two unpaired CT and MR segmentation datasets: a cardiac substructure dataset derived from the MMWHS-2017 dataset and an abdominal multi-organ dataset consisting of the BTCV and CHAOS datasets. Experimental results show that our proposed method significantly outperforms other existing state-of-the-art methods under various labeling ratios and achieves segmentation performance close to that of single-modal methods with fully labeled data while leveraging only a small portion of labeled data. Specifically, at a labeling ratio of 25%, our method achieves overall mean DSC values of 78.56% and 76.18% in cardiac and abdominal segmentation, respectively, improving the average DSC value of the two tasks by 12.84% compared to single-modal U-Net models. CONCLUSIONS Our proposed method is beneficial for reducing the annotation burden of unpaired multi-modal medical images in clinical applications.
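Cross pseudo supervision, the backbone of the semi-supervised MCPS component, is compact enough to sketch: two differently perturbed networks each treat the other's hard predictions as ground truth on unlabeled scans. A generic PyTorch illustration (the paper's multi-modal variant adds modality handling not shown here):

```python
import torch.nn.functional as F

def cross_pseudo_supervision_loss(logits_a, logits_b):
    """Each network is supervised by the argmax pseudo-labels of the other;
    detach() stops gradients from flowing through the pseudo-label branch."""
    pseudo_a = logits_a.argmax(dim=1).detach()
    pseudo_b = logits_b.argmax(dim=1).detach()
    return F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)
```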
Affiliation(s)
- Hong Liu: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Yuzhou Zhuang: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Enmin Song: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Xiangyang Xu: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Guangzhi Ma: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Coskun Cetinkaya: Center for Machine Vision and Security Research, Kennesaw State University, Kennesaw, Georgia, USA
- Chih-Cheng Hung: Center for Machine Vision and Security Research, Kennesaw State University, Kennesaw, Georgia, USA
10. Liu X, Prince JL, Xing F, Zhuo J, Reese T, Stone M, El Fakhri G, Woo J. Attentive continuous generative self-training for unsupervised domain adaptive medical image translation. Med Image Anal 2023; 88:102851. [PMID: 37329854] [PMCID: PMC10527936] [DOI: 10.1016/j.media.2023.102851]
Abstract
Self-training is an important class of unsupervised domain adaptation (UDA) approaches used to mitigate the problem of domain shift when applying knowledge learned from a labeled source domain to unlabeled and heterogeneous target domains. While self-training-based UDA has shown considerable promise on discriminative tasks, including classification and segmentation, through reliable pseudo-label filtering based on the maximum softmax probability, there is a paucity of prior work on self-training-based UDA for generative tasks, including image modality translation. To fill this gap, in this work, we seek to develop a generative self-training (GST) framework for domain adaptive image translation with continuous value prediction and regression objectives. Specifically, we quantify both aleatoric and epistemic uncertainties within our GST using variational Bayes learning to measure the reliability of synthesized data. We also introduce a self-attention scheme that de-emphasizes the background region to prevent it from dominating the training process. The adaptation is then carried out by an alternating optimization scheme with target domain supervision that focuses attention on the regions with reliable pseudo-labels. We evaluated our framework on two cross-scanner/center, inter-subject translation tasks, including tagged-to-cine magnetic resonance (MR) image translation and T1-weighted MR-to-fractional anisotropy translation. Extensive validations with unpaired target domain data showed that our GST yielded superior synthesis performance in comparison to adversarial training UDA methods.
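For context, the maximum-softmax-probability filtering that discriminative self-training relies on (and that this work generalizes to continuous regression outputs) looks roughly like this; the threshold value and function name are assumptions for illustration:

```python
import torch

def msp_pseudo_labels(logits: torch.Tensor, threshold: float = 0.9):
    """Keep only pixels whose maximum softmax probability clears a threshold;
    the returned mask marks pseudo-labels deemed reliable enough to train on.
    Regression outputs have no softmax, which is why the paper turns to
    variational uncertainty estimates instead."""
    probs = torch.softmax(logits, dim=1)
    confidence, pseudo = probs.max(dim=1)
    return pseudo, confidence > threshold
```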
Affiliation(s)
- Xiaofeng Liu: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Jerry L Prince: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Fangxu Xing: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Jiachen Zhuo: Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA
- Timothy Reese: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Maureen Stone: Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA
- Georges El Fakhri: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Jonghye Woo: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
11. Liu X, Prince JL, Xing F, Zhuo J, Reese T, Stone M, El Fakhri G, Woo J. Attentive Continuous Generative Self-training for Unsupervised Domain Adaptive Medical Image Translation. arXiv [Preprint] 2023:arXiv:2305.14589v1. [PMID: 37292465] [PMCID: PMC10246114]
Abstract
Self-training is an important class of unsupervised domain adaptation (UDA) approaches used to mitigate the problem of domain shift when applying knowledge learned from a labeled source domain to unlabeled and heterogeneous target domains. While self-training-based UDA has shown considerable promise on discriminative tasks, including classification and segmentation, through reliable pseudo-label filtering based on the maximum softmax probability, there is a paucity of prior work on self-training-based UDA for generative tasks, including image modality translation. To fill this gap, in this work, we seek to develop a generative self-training (GST) framework for domain adaptive image translation with continuous value prediction and regression objectives. Specifically, we quantify both aleatoric and epistemic uncertainties within our GST using variational Bayes learning to measure the reliability of synthesized data. We also introduce a self-attention scheme that de-emphasizes the background region to prevent it from dominating the training process. The adaptation is then carried out by an alternating optimization scheme with target domain supervision that focuses attention on the regions with reliable pseudo-labels. We evaluated our framework on two cross-scanner/center, inter-subject translation tasks, including tagged-to-cine magnetic resonance (MR) image translation and T1-weighted MR-to-fractional anisotropy translation. Extensive validations with unpaired target domain data showed that our GST yielded superior synthesis performance in comparison to adversarial training UDA methods.
Affiliation(s)
- Xiaofeng Liu: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Jerry L Prince: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Fangxu Xing: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Jiachen Zhuo: Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA
- Timothy Reese: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Maureen Stone: Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA
- Georges El Fakhri: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Jonghye Woo: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
12. Lu J, Öfverstedt J, Lindblad J, Sladoje N. Is image-to-image translation the panacea for multimodal image registration? A comparative study. PLoS One 2022; 17:e0276196. [PMID: 36441754] [PMCID: PMC9704666] [DOI: 10.1371/journal.pone.0276196]
Abstract
Despite current advancements in the field of biomedical image processing, propelled by the deep learning revolution, multimodal image registration, due to its several challenges, is still often performed manually by specialists. The recent success of image-to-image (I2I) translation in computer vision applications and its growing use in biomedical areas provide a tempting possibility of transforming the multimodal registration problem into a potentially easier monomodal one. We conduct an empirical study of the applicability of modern I2I translation methods for the task of rigid registration of multimodal biomedical and medical 2D and 3D images. We compare the performance of four Generative Adversarial Network (GAN)-based I2I translation methods and one contrastive representation learning method, subsequently combined with two representative monomodal registration methods, to judge the effectiveness of modality translation for multimodal image registration. We evaluate these method combinations on four publicly available multimodal (2D and 3D) datasets and compare with the performance of registration achieved by several well-known approaches acting directly on multimodal image data. Our results suggest that, although I2I translation may be helpful when the modalities to register are clearly correlated, registration of modalities that express distinctly different properties of the sample is not well handled by the I2I translation approach. The evaluated representation learning method, which aims to find abstract image-like representations of the information shared between the modalities, manages better, and so does the Mutual Information maximisation approach, acting directly on the original multimodal images. We share our complete experimental setup as open-source (https://github.com/MIDA-group/MultiRegEval), including method implementations, evaluation code, and all datasets, for further reproducing and benchmarking.
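The mutual information criterion that performs well here, acting directly on the multimodal images, can be estimated from a joint intensity histogram. A minimal sketch of the standard estimator (the bin count is an arbitrary assumption, not taken from the study's setup):

```python
import numpy as np

def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """MI between two images from their joint intensity histogram; rigid
    registration maximizes this quantity over candidate transforms."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)   # marginal of image a
    py = pxy.sum(axis=0)   # marginal of image b
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] *
                        np.log(pxy[nonzero] / np.outer(px, py)[nonzero])))
```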
Affiliation(s)
- Jiahao Lu: MIDA Group, Department of Information Technology, Uppsala University, Uppsala, Sweden; IMAGE Section, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Johan Öfverstedt: MIDA Group, Department of Information Technology, Uppsala University, Uppsala, Sweden
- Joakim Lindblad: MIDA Group, Department of Information Technology, Uppsala University, Uppsala, Sweden
- Nataša Sladoje: MIDA Group, Department of Information Technology, Uppsala University, Uppsala, Sweden