1. Tong L, Li T, Zhang Q, Zhang Q, Zhu R, Du W, Hu P. LiViT-Net: A U-Net-like, lightweight Transformer network for retinal vessel segmentation. Comput Struct Biotechnol J 2024; 24:213-224. [PMID: 38572168; PMCID: PMC10987887; DOI: 10.1016/j.csbj.2024.03.003]
Abstract
Precisely segmenting retinal vessels from images is critical for diagnosing various eye diseases, yet it remains challenging due to scale variation, complex anatomical patterns, low contrast, and limited training data. To address these challenges, we present LiViT-Net, a new U-Net-like, lightweight Transformer network for retinal vessel segmentation, with contributions spanning model architecture, loss function design, robustness, and real-time efficacy. By integrating MobileViT+ and a novel local representation in the encoder, our design emphasizes lightweight processing while capturing intricate image structures, enhancing vessel edge precision. A novel joint loss is designed, leveraging the characteristics of weighted cross-entropy and Dice loss to effectively guide the model through the task's challenges, such as foreground-background imbalance and intricate vascular structures. Exhaustive experiments were performed on three prominent retinal image databases. The results underscore the robustness and generalizability of LiViT-Net, which outperforms other methods in complex scenarios, especially in intricate environments with fine vessels or vessel edges. Importantly, LiViT-Net is optimized for efficiency and performs well on devices with constrained computational power. To demonstrate the model, a freely accessible and interactive website was established (https://hz-t3.matpool.com:28765?token=aQjYR4hqMI), offering real-time inference with no login requirements.
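The joint loss described in this abstract combines weighted cross-entropy with Dice loss. A minimal numpy sketch of such a combination follows; it is a generic illustration, not the paper's exact formulation, and the `pos_weight` and `alpha` knobs are illustrative assumptions:

```python
import numpy as np

def joint_wce_dice_loss(pred, target, pos_weight=2.0, smooth=1e-6, alpha=0.5):
    """Generic joint loss for binary segmentation: weighted cross-entropy
    (to counter foreground-background imbalance) plus soft Dice loss
    (to reward overlap with thin structures such as fine vessels).
    `pred` holds foreground probabilities; `target` holds 0/1 labels."""
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    # Weighted binary cross-entropy: foreground pixels scaled by pos_weight.
    wce = -np.mean(pos_weight * target * np.log(pred)
                   + (1 - target) * np.log(1 - pred))
    # Soft Dice loss: 1 minus the Dice coefficient of the probability map.
    intersection = np.sum(pred * target)
    dice = (2 * intersection + smooth) / (np.sum(pred) + np.sum(target) + smooth)
    return alpha * wce + (1 - alpha) * (1 - dice)

target = np.array([[0, 1], [1, 0]], dtype=float)
# A near-perfect prediction yields a much smaller loss than an inverted one.
loss_good = joint_wce_dice_loss(np.array([[0.05, 0.95], [0.9, 0.1]]), target)
loss_bad = joint_wce_dice_loss(np.array([[0.9, 0.1], [0.1, 0.9]]), target)
```

Weighting the two terms (here via `alpha`) is the usual way to balance pixel-wise calibration against region overlap.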
Affiliation(s)
- Le Tong
- The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Tianjiu Li
- The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Qian Zhang
- The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Qin Zhang
- Ophthalmology Department, Jing'an District Central Hospital, No. 259, Xikang Road, Shanghai, 200040, China
- Renchaoli Zhu
- The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Wei Du
- Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, No. 130 Meilong Road, Shanghai, 200237, China
- Pengwei Hu
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, 40-1 South Beijing Road, Urumqi, 830011, China
2. Garbaz A, Oukdach Y, Charfi S, El Ansari M, Koutti L, Salihoun M. MLFA-UNet: A multi-level feature assembly UNet for medical image segmentation. Methods 2024; 232:52-64. [PMID: 39481818; DOI: 10.1016/j.ymeth.2024.10.010]
Abstract
Medical image segmentation is crucial for accurate diagnosis and treatment in medical image analysis. Among the various methods employed, fully convolutional networks (FCNs) have emerged as a prominent approach for segmenting medical images. Notably, the U-Net architecture and its variants have gained widespread adoption in this domain. This paper introduces MLFA-UNet, an innovative architectural framework aimed at advancing medical image segmentation. MLFA-UNet adopts a U-shaped architecture and integrates two pivotal modules: multi-level feature assembly (MLFA) and multi-scale information attention (MSIA), complemented by a pixel-vanishing (PV) attention mechanism. These modules synergistically enhance the segmentation process, fostering both robustness and segmentation precision. MLFA operates within both the network encoder and decoder, facilitating the extraction of local information crucial for accurately segmenting lesions. Furthermore, the bottleneck MSIA module replaces stacking modules, thereby expanding the receptive field and augmenting feature diversity, fortified by the PV attention mechanism. Together, these mechanisms boost segmentation performance by effectively capturing both detailed local features and broader contextual information, enhancing both accuracy and resilience in identifying lesions. To assess the network's versatility, we evaluated MLFA-UNet across a range of medical image segmentation datasets encompassing diverse imaging modalities, including wireless capsule endoscopy (WCE), colonoscopy, and dermoscopic images. Our results consistently demonstrate that MLFA-UNet outperforms state-of-the-art algorithms, achieving Dice coefficients of 91.42%, 82.43%, 90.8%, and 88.68% on the MICCAI 2017 (Red Lesion), ISIC 2017, PH2, and CVC-ClinicDB datasets, respectively.
Affiliation(s)
- Anass Garbaz
- Laboratory of Computer Systems and Vision, Faculty of Science, Ibn Zohr University, Agadir, 80000, Morocco
- Yassine Oukdach
- Laboratory of Computer Systems and Vision, Faculty of Science, Ibn Zohr University, Agadir, 80000, Morocco
- Said Charfi
- Laboratory of Computer Systems and Vision, Faculty of Science, Ibn Zohr University, Agadir, 80000, Morocco
- Mohamed El Ansari
- Informatics and Applications Laboratory, Department of Computer Science, Faculty of Sciences, Moulay Ismail University, Meknes, 50000, Morocco
- Lahcen Koutti
- Laboratory of Computer Systems and Vision, Faculty of Science, Ibn Zohr University, Agadir, 80000, Morocco
- Mouna Salihoun
- Faculty of Medicine and Pharmacy, Mohammed V University, Rabat, 10100, Morocco
3. Zhu Z, Yu K, Qi G, Cong B, Li Y, Li Z, Gao X. Lightweight medical image segmentation network with multi-scale feature-guided fusion. Comput Biol Med 2024; 182:109204. [PMID: 39366296; DOI: 10.1016/j.compbiomed.2024.109204]
Abstract
In computer-aided medical diagnosis, it is crucial to adapt medical image segmentation to limited computing resources, and there is tremendous value in developing accurate, real-time vision processing models that require minimal computational resources. When building lightweight models, there is always a trade-off between computational cost and segmentation performance: performance often suffers when models must meet the computation, memory, or storage constraints of resource-limited scenarios. This remains an ongoing challenge. This paper proposes a lightweight network for medical image segmentation. It introduces a lightweight transformer, proposes a simplified core feature extraction network to capture more semantic information, and builds a multi-scale feature interaction guidance framework. The fusion module embedded in this framework is designed to address spatial and channel complexities. Through the multi-scale feature interaction guidance framework and fusion module, the proposed network achieves robust semantic information extraction from low-resolution feature maps and rich spatial information retrieval from high-resolution feature maps while ensuring segmentation performance. This significantly reduces the parameters needed to maintain deep features within the network, resulting in faster inference and reduced floating-point operations (FLOPs) and parameter counts. Experimental results on the ISIC2017 and ISIC2018 datasets confirm the effectiveness of the proposed network in medical image segmentation tasks. For instance, on the ISIC2017 dataset, the proposed network achieved a segmentation accuracy of 82.33% mIoU and a speed of 71.26 FPS on 256 × 256 images using a GeForce RTX 3090 GPU. Furthermore, the proposed network is extremely lightweight, containing only 0.524M parameters. The corresponding source codes are available at https://github.com/CurbUni/LMIS-lightweight-network.
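The mIoU figure quoted above is the mean intersection-over-union over classes. A small generic sketch of the metric (not the authors' evaluation code) is:

```python
import numpy as np

def mean_iou(pred, target, num_classes=2):
    """Mean intersection-over-union for integer label maps: per-class IoU
    averaged over the classes present in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip rather than count as 0
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
# class 0: intersection 1, union 2 -> 0.5; class 1: intersection 2, union 3
miou = mean_iou(pred, target)
```

Whether empty classes are skipped or scored as 0 varies between benchmarks, so it is worth checking the convention before comparing numbers across papers.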
Affiliation(s)
- Zhiqin Zhu
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Kun Yu
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Guanqiu Qi
- Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, NY, 14222, USA
- Baisen Cong
- Diagnostics Digital DH (Shanghai) Diagnostics Co., Ltd, a Danaher Company, Shanghai, 200335, China
- Yuanyuan Li
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Zexin Li
- College of International, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Xinbo Gao
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
4. Xu X, Bu Q, Xie J, Li H, Xu F, Li J. On-site burn severity assessment using smartphone-captured color burn wound images. Comput Biol Med 2024; 182:109171. [PMID: 39362001; DOI: 10.1016/j.compbiomed.2024.109171]
Abstract
Accurate assessment of burn severity is crucial for the management of burn injuries. Currently, clinicians mainly rely on visual inspection to assess burns, a practice marked by notable inter-observer discrepancies. In this study, we introduce an innovative analysis platform that uses color burn wound images for automatic burn severity assessment. To this end, we propose a novel joint-task deep learning model capable of simultaneously segmenting both burn regions and body parts, the two crucial components in calculating the percentage of total body surface area (%TBSA). An asymmetric attention mechanism is introduced, allowing attention guidance from the body part segmentation task to the burn region segmentation task. A user-friendly mobile application is developed to facilitate fast assessment of burn severity in clinical settings. The proposed framework was evaluated on a dataset comprising 1340 color burn wound images captured on-site in clinical settings. The average Dice coefficients for burn depth segmentation and body part segmentation are 85.12% and 85.36%, respectively. The R2 for %TBSA assessment is 0.9136. The source codes for the joint-task framework and the application are released on GitHub (https://github.com/xjtu-mia/BurnAnalysis). The proposed platform holds the potential to be widely used in clinical settings to facilitate fast and precise burn assessment.
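Given the two masks the joint-task model produces, %TBSA reduces to a pixel ratio. A hypothetical sketch of that final step (the paper derives both masks from its network; here they are plain boolean arrays, and `percent_tbsa` is an illustrative name):

```python
import numpy as np

def percent_tbsa(burn_mask, body_mask):
    """Illustrative %TBSA estimate from two binary masks: burned pixels
    within the body region divided by total body-surface pixels."""
    body_pixels = int(body_mask.sum())
    if body_pixels == 0:
        return 0.0  # no body detected; avoid division by zero
    burned = int(np.logical_and(burn_mask, body_mask).sum())
    return 100.0 * burned / body_pixels

body = np.ones((10, 10), dtype=bool)
burn = np.zeros((10, 10), dtype=bool)
burn[:2, :] = True  # 20 of 100 body pixels burned
tbsa = percent_tbsa(burn, body)  # -> 20.0
```

Restricting the burn mask to the body region keeps background false positives from inflating the estimate.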
Affiliation(s)
- Xiayu Xu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China; Bioinspired Engineering and Biomechanics Center (BEBC), Xi'an Jiaotong University, Xi'an, 710049, China
- Qilong Bu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China; Bioinspired Engineering and Biomechanics Center (BEBC), Xi'an Jiaotong University, Xi'an, 710049, China
- Jingmeng Xie
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China; Bioinspired Engineering and Biomechanics Center (BEBC), Xi'an Jiaotong University, Xi'an, 710049, China
- Hang Li
- Department of Burns and Plastic Surgery, Tangdu Hospital, The Air Force Military Medical University, Xi'an, 710038, Shaanxi, China
- Feng Xu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China; Bioinspired Engineering and Biomechanics Center (BEBC), Xi'an Jiaotong University, Xi'an, 710049, China
- Jing Li
- Department of Burns and Plastic Surgery, Tangdu Hospital, The Air Force Military Medical University, Xi'an, 710038, Shaanxi, China
5. Pang C, Lu X, Liu X, Zhang R, Lyu L. IIAM: Intra and Inter Attention With Mutual Consistency Learning Network for Medical Image Segmentation. IEEE J Biomed Health Inform 2024; 28:5971-5983. [PMID: 38985557; DOI: 10.1109/jbhi.2024.3426074]
Abstract
Medical image segmentation provides a reliable basis for diagnostic analysis and disease treatment by capturing the global and local features of the target region. To learn global features, convolutional neural networks are replaced with pure transformers, or transformer layers are stacked at the deepest layers of convolutional neural networks. Nevertheless, these approaches are deficient in exploring local-global cues at each scale and the interaction among consensual regions across multiple scales, hindering learning of the changes in size, shape, and position of target objects. To cope with these defects, we propose a novel Intra and Inter Attention with Mutual Consistency Learning Network (IIAM). Concretely, we design an intra attention module to aggregate CNN-based local features and transformer-based global information at each scale. In addition, to capture the interaction among consensual regions across multiple scales, we devise an inter attention module to explore the cross-scale dependency of the object and its surroundings. Moreover, to reduce the impact of blurred regions in medical images on the final segmentation results, we introduce multiple decoders to estimate the model uncertainty, adopting a mutual consistency learning strategy to minimize the output discrepancy during end-to-end training and weighting the outputs of the three decoders to form the final segmentation result. Extensive experiments on three benchmark datasets verify the efficacy of our method and demonstrate the superior performance of our model over state-of-the-art techniques.
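The multi-decoder consistency idea in this abstract can be sketched generically: penalize pairwise disagreement among decoder outputs and fuse them by averaging. This is a minimal illustration of the general strategy, not IIAM's actual loss or weighting scheme:

```python
import numpy as np

def mutual_consistency_loss(outputs):
    """Generic mutual consistency term: mean pairwise squared discrepancy
    among several decoder probability maps. Minimizing it pushes the
    decoders toward agreement; the fused prediction is their mean."""
    n = len(outputs)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            loss += np.mean((outputs[i] - outputs[j]) ** 2)
            pairs += 1
    fused = np.mean(outputs, axis=0)
    return loss / pairs, fused

# Three hypothetical decoder outputs for a two-pixel map.
d1 = np.array([[0.9, 0.1]])
d2 = np.array([[0.8, 0.2]])
d3 = np.array([[0.7, 0.3]])
mc, fused = mutual_consistency_loss([d1, d2, d3])
```

High pairwise discrepancy also serves as a cheap uncertainty signal: regions where decoders disagree are the blurred, ambiguous ones.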
6. Cao D, Zhang R, Zhang Y. MFLUnet: multi-scale fusion lightweight Unet for medical image segmentation. Biomed Opt Express 2024; 15:5574-5591. [PMID: 39421782; PMCID: PMC11482190; DOI: 10.1364/boe.529505]
Abstract
Recently, the use of point-of-care medical devices has been increasing; however, U-Net and many of its latest variants have numerous parameters, high computational complexity, and slow inference speed, making them unsuitable for deployment on point-of-care or mobile devices. To enable deployment in real medical environments, we propose the multi-scale fusion lightweight network (MFLUnet), a CNN-based lightweight medical image segmentation model. To improve the network's information extraction ability and utilization efficiency, we propose two modules, MSBDCB and the EF module, which enable the model to effectively extract local and global features and to integrate multi-scale and multi-stage information while maintaining low computational complexity. The proposed network is validated on three challenging medical image segmentation tasks: skin lesion segmentation, cell segmentation, and ultrasound image segmentation. The experimental results show that our network achieves excellent performance while consuming minimal computing resources. Ablation experiments confirm the effectiveness of the proposed encoder-decoder and skip connection modules. This study introduces a new method for medical image segmentation and promotes the application of medical image segmentation networks in real medical environments.
Affiliation(s)
- Dianlei Cao
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, Shandong 250014, China
- Rui Zhang
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, Shandong 250014, China
- Yunfeng Zhang
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, Shandong 250014, China
7. Xu R, Wang C, Zhang J, Xu S, Meng W, Zhang X. SkinFormer: Learning Statistical Texture Representation With Transformer for Skin Lesion Segmentation. IEEE J Biomed Health Inform 2024; 28:6008-6018. [PMID: 38913520; DOI: 10.1109/jbhi.2024.3417247]
Abstract
Accurate skin lesion segmentation from dermoscopic images is of great importance for skin cancer diagnosis. However, automatic segmentation of melanoma remains a challenging task because it is difficult to incorporate useful texture representations into the learning process. Texture representations are not only related to the local structural information learned by CNNs but also include the global statistical texture information of the input image. In this paper, we propose a transformer network (SkinFormer) that efficiently extracts and fuses statistical texture representations for skin lesion segmentation. Specifically, to quantify the statistical texture of input features, a Kurtosis-guided Statistical Counting Operator is designed. Building on this operator and the transformer's global attention mechanism, we propose the Statistical Texture Fusion Transformer and the Statistical Texture Enhance Transformer. The former fuses structural and statistical texture information, and the latter enhances the statistical texture of multi-scale features. Extensive experiments on three publicly available skin lesion datasets validate that SkinFormer outperforms other state-of-the-art methods, achieving a 93.2% Dice score on ISIC 2018. SkinFormer can readily be extended to segment 3D images in the future.
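The operator above is guided by kurtosis, the fourth standardized moment. The paper's counting operator is more involved; the sketch below shows only the underlying statistic, which is high for heavy-tailed feature distributions (a few extreme activations) and low for near-uniform ones:

```python
import numpy as np

def kurtosis(x):
    """Kurtosis (fourth standardized moment) of a flattened feature map,
    using the population standard deviation."""
    x = np.asarray(x, dtype=float).ravel()
    mu, sigma = x.mean(), x.std()
    return float(np.mean(((x - mu) / sigma) ** 4))

# A map with one extreme activation is heavy-tailed and scores higher
# than a near-uniform map.
flat = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 0.95])
spiky = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 10.0])
```

Note this is the Pearson definition; some libraries report excess kurtosis (Fisher), which subtracts 3.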
8. Wang H, Cao P, Yang J, Zaiane O. Narrowing the semantic gaps in U-Net with learnable skip connections: The case of medical image segmentation. Neural Netw 2024; 178:106546. [PMID: 39053196; DOI: 10.1016/j.neunet.2024.106546]
Abstract
Current state-of-the-art medical image segmentation techniques predominantly employ the encoder-decoder architecture. Despite its widespread use, this U-shaped framework exhibits limitations in effectively capturing multi-scale features through simple skip connections. In this study, we thoroughly analyze the potential weaknesses of skip connections across various segmentation tasks and identify two key aspects of semantic gaps crucial to consider: the semantic gap among multi-scale features in different encoding stages and the semantic gap between the encoder and the decoder. To bridge these semantic gaps, we introduce a novel segmentation framework, which incorporates a Dual Attention Transformer (DAT) module for capturing channel-wise and spatial-wise relationships, and a Decoder-guided Recalibration Attention module for fusing DAT tokens and decoder features. These modules establish a principle of learnable connections that resolves the semantic gaps, leading to a high-performance segmentation model for medical images. Furthermore, it provides a new paradigm for effectively incorporating the attention mechanism into the traditional convolution-based architecture. Comprehensive experimental results demonstrate that our model achieves consistent, significant gains and outperforms state-of-the-art methods with relatively fewer parameters. This study contributes to the advancement of medical image segmentation by offering a more effective and efficient framework for addressing the limitations of current encoder-decoder architectures. Code: https://github.com/McGregorWwww/UDTransNet.
Affiliation(s)
- Haonan Wang
- School of Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China
- Peng Cao
- School of Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China
- Jinzhu Yang
- School of Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China
9. Wang Z, Gu J, Zhou W, He Q, Zhao T, Guo J, Lu L, He T, Bu J. Neural Memory State Space Models for Medical Image Segmentation. Int J Neural Syst 2024:2450068. [PMID: 39343431; DOI: 10.1142/s0129065724500680]
Abstract
With the rapid advancement of deep learning, computer-aided diagnosis and treatment have become crucial in medicine. UNet is a widely used architecture for medical image segmentation, and various methods for improving UNet have been extensively explored. One popular approach is incorporating transformers, though their quadratic computational complexity poses challenges. Recently, State-Space Models (SSMs), exemplified by Mamba, have gained significant attention as a promising alternative due to their linear computational complexity. Another approach, neural memory Ordinary Differential Equations (nmODEs), exhibits similar principles and achieves good results. In this paper, we explore the respective strengths and weaknesses of nmODEs and SSMs and propose a novel architecture, the nmSSM decoder, which combines the advantages of both approaches. This architecture possesses powerful nonlinear representation capabilities while retaining the ability to preserve input and process global information. We construct nmSSM-UNet using the nmSSM decoder and conduct comprehensive experiments on the PH2, ISIC2018, and BU-COCO datasets to validate its effectiveness in medical image segmentation. The results demonstrate the promising application value of nmSSM-UNet. Additionally, we conducted ablation experiments to verify the effectiveness of our proposed improvements on SSMs and nmODEs.
Affiliation(s)
- Zhihua Wang
- College of Computer Science, Zhejiang University, Hangzhou, P. R. China; Zhejiang Provincial Key Laboratory of Service Robot, Hangzhou, Zhejiang Province, P. R. China
- Jingjun Gu
- College of Computer Science, Zhejiang University, Hangzhou, P. R. China; Zhejiang Provincial Key Laboratory of Service Robot, Hangzhou, Zhejiang Province, P. R. China
- Wang Zhou
- Department of Ultrasound, The First Affiliated Hospital of Anhui Medical University, Hefei, P. R. China
- Quansong He
- College of Computer Science, Sichuan University, Chengdu, P. R. China
- Tianli Zhao
- Department of Cardiovascular Surgery, The Second Xiangya Hospital, Central South University, Changsha, P. R. China
- Jialong Guo
- College of Computer Science, Zhejiang University, Hangzhou, P. R. China; Zhejiang Provincial Key Laboratory of Service Robot, Hangzhou, Zhejiang Province, P. R. China
- Li Lu
- Department of Ophthalmology, Eye Center, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, P. R. China
- Tao He
- College of Computer Science, Sichuan University, Chengdu, P. R. China
- Jiajun Bu
- College of Computer Science, Zhejiang University, Hangzhou, P. R. China; Zhejiang Provincial Key Laboratory of Service Robot, Hangzhou, Zhejiang Province, P. R. China
10. Lu Z, Tang K, Wu Y, Zhang X, An Z, Zhu X, Feng Q, Zhao Y. BreasTDLUSeg: A coarse-to-fine framework for segmentation of breast terminal duct lobular units on histopathological whole-slide images. Comput Med Imaging Graph 2024; 118:102432. [PMID: 39461144; DOI: 10.1016/j.compmedimag.2024.102432]
Abstract
Automatic segmentation of breast terminal duct lobular units (TDLUs) on histopathological whole-slide images (WSIs) is crucial for the quantitative evaluation of TDLUs in the diagnostic and prognostic analysis of breast cancer. However, TDLU segmentation remains a great challenge due to the units' highly heterogeneous sizes, structures, and morphologies, as well as their small areas on WSIs. In this study, we propose BreasTDLUSeg, an efficient coarse-to-fine two-stage framework based on multi-scale attention to achieve localization and precise segmentation of TDLUs on hematoxylin and eosin (H&E)-stained WSIs. BreasTDLUSeg consists of two networks: a superpatch-based patch-level classification network (SPPC-Net) and a patch-based pixel-level segmentation network (PPS-Net). SPPC-Net takes a superpatch as input and adopts a sub-region classification head to classify each patch within the superpatch as TDLU positive or negative. PPS-Net takes the TDLU-positive patches derived from SPPC-Net as input; it deploys a multi-scale CNN-Transformer as an encoder to learn enhanced multi-scale morphological representations and an upsampler to generate pixel-wise segmentation masks for the TDLU-positive patches. We also constructed two breast cancer TDLU datasets containing a total of 530 superpatch images with patch-level annotations and 2322 patch images with pixel-level annotations to enable the development of TDLU segmentation methods. Experiments on the two datasets demonstrate that BreasTDLUSeg outperforms other state-of-the-art methods with the highest Dice similarity coefficients of 79.97% and 92.93%, respectively. The proposed method shows great potential to assist pathologists in the pathological analysis of breast cancer. An open-source implementation of our approach can be found at https://github.com/Dian-kai/BreasTDLUSeg.
Affiliation(s)
- Zixiao Lu
- Department of Radiology, The Third Affiliated Hospital of Southern Medical University, Guangzhou, Guangdong, China
- Kai Tang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, China
- Yi Wu
- Wormpex AI Research, Bellevue, WA 98004, USA
- Xiaoxuan Zhang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, China
- Ziqi An
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, China
- Xiongfeng Zhu
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, China
- Qianjin Feng
- School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, China
- Yinghua Zhao
- Department of Radiology, The Third Affiliated Hospital of Southern Medical University, Guangzhou, Guangdong, China
11. Meng X, Yu C, Zhang Z, Zhang X, Wang M. TG-Net: Using text prompts for improved skin lesion segmentation. Comput Biol Med 2024; 179:108819. [PMID: 38964245; DOI: 10.1016/j.compbiomed.2024.108819]
Abstract
Automatic skin lesion segmentation is an efficient method for the early diagnosis of skin cancer, as it can minimize the missed detection rate and enable timely treatment of early skin cancer. However, significant variations in the texture, size, shape, and position of lesions, together with obscure boundaries in dermoscopy images, make it extremely challenging to accurately locate and segment lesions. To address these challenges, we propose a novel framework named TG-Net, which exploits textual diagnostic information to guide the segmentation of dermoscopic images. Specifically, TG-Net adopts a dual-stream encoder-decoder architecture. The dual-stream encoder comprises Res2Net for extracting image features and our proposed text attention (TA) block for extracting textual features. Through hierarchical guidance, textual features are embedded into the process of image feature extraction. Additionally, we devise a multi-level fusion (MLF) module to merge higher-level features and generate a global feature map as guidance for subsequent steps. In the decoding stage of the network, local features and the global feature map are utilized in three multi-scale reverse attention (MSRA) modules to produce the final segmentation results. We conduct extensive experiments on three publicly accessible datasets, namely ISIC 2017, HAM10000, and PH2. Experimental results demonstrate that TG-Net outperforms state-of-the-art methods, validating the reliability of our method. Source code is available at https://github.com/ukeLin/TG-Net.
Affiliation(s)
- Xiangfu Meng
- School of Electronics and Information Engineering, Liaoning Technical University, Huludao, China
- Chunlin Yu
- School of Electronics and Information Engineering, Liaoning Technical University, Huludao, China
- Zhichao Zhang
- School of Electronics and Information Engineering, Liaoning Technical University, Huludao, China
- Xiaoyan Zhang
- School of Electronics and Information Engineering, Liaoning Technical University, Huludao, China
- Meng Wang
- School of Electronics and Information Engineering, Liaoning Technical University, Huludao, China
|
12
|
Fan C, Zhu Z, Peng B, Xuan Z, Zhu X. EAAC-Net: An Efficient Adaptive Attention and Convolution Fusion Network for Skin Lesion Segmentation. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024:10.1007/s10278-024-01223-6. [PMID: 39147886 DOI: 10.1007/s10278-024-01223-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/11/2024] [Revised: 07/13/2024] [Accepted: 07/31/2024] [Indexed: 08/17/2024]
Abstract
Accurate segmentation of skin lesions in dermoscopic images is of key importance for quantitative analysis of melanoma. Although existing medical image segmentation methods significantly improve skin lesion segmentation, they still have limitations in extracting local features with global information, do not handle challenging lesions well, and usually have a large number of parameters and high computational complexity. To address these issues, this paper proposes an efficient adaptive attention and convolutional fusion network for skin lesion segmentation (EAAC-Net). We designed two parallel encoders, where the efficient adaptive attention feature extraction module (EAAM) adaptively establishes global spatial dependence and global channel dependence by constructing the adjacency matrix of the directed graph and can adaptively filter out the least relevant tokens at the coarse-grained region level, thus reducing the computational complexity of the self-attention mechanism. The efficient multiscale attention-based convolution module (EMA⋅C) utilizes multiscale attention for cross-space learning of local features extracted from the convolutional layer to enhance the representation of richly detailed local features. In addition, we designed a reverse attention feature fusion module (RAFM) to enhance the effective boundary information gradually. To validate the performance of our proposed network, we compared it with other methods on ISIC 2016, ISIC 2018, and PH2 public datasets, and the experimental results show that EAAC-Net has superior segmentation performance under commonly used evaluation metrics.
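The coarse-grained token filtering described for the EAAM can be sketched as keeping only the top-scoring tokens before self-attention, which shrinks the attention cost from O(N²) to O(k²). The relevance score below (an L2-norm proxy) is an illustrative stand-in for the paper's adaptive, directed-graph-based scoring.

```python
import numpy as np

def filter_tokens(tokens, keep_ratio=0.5):
    """Keep the most relevant tokens before self-attention.

    tokens: (N, d) token features; returns (k, d) with k = ceil(keep_ratio * N).
    Relevance is approximated by the L2 norm of each token; the real module
    scores tokens adaptively via the adjacency matrix of a directed graph.
    """
    scores = np.linalg.norm(tokens, axis=1)
    k = max(1, int(np.ceil(keep_ratio * len(tokens))))
    idx = np.argsort(scores)[::-1][:k]  # indices of the k highest-scoring tokens
    return tokens[np.sort(idx)]         # keep original token order

rng = np.random.default_rng(1)
toks = rng.standard_normal((64, 32))
kept = filter_tokens(toks, keep_ratio=0.25)
print(kept.shape)  # (16, 32)
```

With a quarter of the tokens retained, the subsequent self-attention operates on a 16x16 rather than a 64x64 weight matrix.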
Affiliation(s)
- Chao Fan
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, Henan, China
- Key Laboratory of Grain Information Processing and Control, Ministry of Education, Zhengzhou, Henan, China
- Zhentong Zhu
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou, Henan 450001, China
- Bincheng Peng
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou, Henan 450001, China
- Zhihui Xuan
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou, Henan 450001, China
- Xinru Zhu
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou, Henan 450001, China
|
13
|
Cao W, Guo J, You X, Liu Y, Li L, Cui W, Cao Y, Chen X, Zheng J. NeighborNet: Learning Intra- and Inter-Image Pixel Neighbor Representation for Breast Lesion Segmentation. IEEE J Biomed Health Inform 2024; 28:4761-4771. [PMID: 38743530 DOI: 10.1109/jbhi.2024.3400802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/16/2024]
Abstract
Breast lesion segmentation from ultrasound images is essential in computer-aided breast cancer diagnosis. To alleviate the problems of blurry lesion boundaries and irregular morphologies, common practices combine CNN and attention to integrate global and local information. However, previous methods use two independent modules to extract global and local features separately; such inflexible feature-wise integration ignores the semantic gap between them, resulting in representation redundancy or insufficiency and undesirable restrictions in clinical practice. Moreover, medical images are highly similar to each other due to the imaging methods and human tissues, but the global information captured by transformer-based methods in the medical domain is limited to individual images, and the semantic relations and common knowledge across images are largely ignored. To alleviate the above problems, from the neighbor view, this paper develops a pixel neighbor representation learning method (NeighborNet) to flexibly integrate global and local context within and across images for lesion morphology and boundary modeling. Concretely, we design two neighbor layers to investigate two properties (i.e., number and distribution) of neighbors. The neighbor number for each pixel is not fixed but determined by itself. The neighbor distribution is extended from one image to all images in the dataset. With these two properties, for each pixel at each feature level, the proposed NeighborNet can evolve into the transformer or degenerate into the CNN for adaptive context representation learning to cope with irregular lesion morphologies and blurry boundaries. The state-of-the-art performances on three ultrasound datasets prove the effectiveness of the proposed NeighborNet.
|
14
|
Cai L, Hou K, Zhou S. Intelligent skin lesion segmentation using deformable attention Transformer U-Net with bidirectional attention mechanism in skin cancer images. Skin Res Technol 2024; 30:e13783. [PMID: 39113617 PMCID: PMC11306920 DOI: 10.1111/srt.13783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2024] [Accepted: 05/20/2024] [Indexed: 08/11/2024]
Abstract
BACKGROUND In recent years, the increasing prevalence of skin cancers, particularly malignant melanoma, has become a major concern for public health. The development of accurate automated segmentation techniques for skin lesions holds immense potential in alleviating the burden on medical professionals and is of substantial clinical importance for the early identification and intervention of skin cancer. Nevertheless, the irregular shape, uneven color, and noise interference of skin lesions present significant challenges to precise segmentation. Therefore, it is crucial to develop a high-precision and intelligent skin lesion segmentation framework for clinical treatment. METHODS A precision-driven segmentation model for skin cancer images is proposed based on the Transformer U-Net, called BiADATU-Net, which integrates the deformable attention Transformer and bidirectional attention blocks into the U-Net. The encoder part utilizes a deformable attention Transformer with a dual attention block, allowing adaptive learning of global and local features. The decoder part incorporates specifically tailored scSE attention modules within the skip-connection layers to capture image-specific context information for strong feature fusion. Additionally, deformable convolution is aggregated into two different attention blocks to learn irregular lesion features for high-precision prediction. RESULTS A series of experiments were conducted on four skin cancer image datasets (i.e., ISIC2016, ISIC2017, ISIC2018, and PH2). The findings show that our model exhibits satisfactory segmentation performance, achieving an accuracy rate of over 96% on all datasets. CONCLUSION Our experimental results validate that the proposed BiADATU-Net achieves competitive performance compared to state-of-the-art methods. It is promising and valuable in the field of skin lesion segmentation.
Affiliation(s)
- Lili Cai
- School of Biomedical Engineering, Guangzhou Xinhua University, Guangzhou, China
- Keke Hou
- School of Health Sciences, Guangzhou Xinhua University, Guangzhou, China
- Su Zhou
- School of Biomedical Engineering, Guangzhou Xinhua University, Guangzhou, China
|
15
|
Huang B, Li H, Fujita H, Sun X, Fang Z, Wang H, Su B. G-MBRMD: Lightweight liver segmentation model based on guided teaching with multi-head boundary reconstruction mapping distillation. Comput Biol Med 2024; 178:108733. [PMID: 38897144 DOI: 10.1016/j.compbiomed.2024.108733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2024] [Revised: 05/03/2024] [Accepted: 06/08/2024] [Indexed: 06/21/2024]
Abstract
BACKGROUND AND OBJECTIVES Liver segmentation is pivotal for the quantitative analysis of liver cancer. Although current deep learning methods have garnered remarkable achievements for medical image segmentation, they come with high computational costs, significantly limiting their practical application in the medical field. Therefore, the development of an efficient and lightweight liver segmentation model becomes particularly important. METHODS In our paper, we propose a real-time, lightweight liver segmentation model named G-MBRMD. Specifically, we employ a Transformer-based complex model as the teacher and a convolution-based lightweight model as the student. By introducing the proposed multi-head mapping and boundary reconstruction strategies during the knowledge distillation process, our method effectively guides the student model to gradually comprehend and master the global boundary processing capabilities of the complex teacher model, significantly enhancing the student model's segmentation performance without adding any computational complexity. RESULTS On the LITS dataset, we conducted rigorous comparative and ablation experiments; four key metrics were used for evaluation: model size, inference speed, Dice coefficient, and HD95. Compared to other methods, our proposed model achieved an average Dice coefficient of 90.14±16.78%, with only 0.6 MB of memory and 0.095 s inference time for a single image on a standard CPU. Importantly, this approach improved the average Dice coefficient of the baseline student model by 1.64% without increasing computational complexity. CONCLUSION The results demonstrate that our method successfully unifies segmentation precision and lightweight design, greatly enhancing its potential for widespread application in practical settings.
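The two quality metrics reported above, Dice and HD95, can be computed for binary masks as in this brute-force sketch. The toy masks are illustrative, and distances here are taken over all foreground pixels; production implementations (e.g., MedPy) use surface/boundary voxels instead.

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice coefficient between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return (2 * inter + eps) / (pred.sum() + gt.sum() + eps)

def hd95(pred, gt):
    """95th-percentile symmetric Hausdorff distance between binary masks."""
    p = np.argwhere(pred)                                       # foreground coords
    g = np.argwhere(gt)
    d = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)  # pairwise distances
    # 95th percentile of directed nearest-neighbor distances, both directions.
    return max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95))

a = np.zeros((8, 8), bool); a[2:6, 2:6] = True   # prediction: 4x4 square
b = np.zeros((8, 8), bool); b[2:6, 3:7] = True   # ground truth: shifted by 1 px
print(round(dice(a, b), 3))  # 0.75
```

A one-pixel shift of a 4x4 square leaves a 4x3 overlap, hence Dice = 24/32 = 0.75 and an HD95 of one pixel.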
Affiliation(s)
- Bo Huang
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
- Hongxu Li
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
- Hamido Fujita
- Malaysia-Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, 54100 Kuala Lumpur, Malaysia; Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain; Iwate Prefectural University, Iwate, Japan
- Xiaoning Sun
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
- Hailing Wang
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
- Bo Su
- Central Laboratory, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, China
|
16
|
Mao K, Li R, Cheng J, Huang D, Song Z, Liu Z. PL-Net: progressive learning network for medical image segmentation. Front Bioeng Biotechnol 2024; 12:1414605. [PMID: 38994123 PMCID: PMC11236745 DOI: 10.3389/fbioe.2024.1414605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2024] [Accepted: 05/30/2024] [Indexed: 07/13/2024] Open
Abstract
In recent years, deep convolutional neural network-based segmentation methods have achieved state-of-the-art performance for many medical analysis tasks. However, most of these approaches rely on optimizing the U-Net structure or adding new functional modules, which overlooks the complementation and fusion of coarse-grained and fine-grained semantic information. To address these issues, we propose a 2D medical image segmentation framework called Progressive Learning Network (PL-Net), which comprises Internal Progressive Learning (IPL) and External Progressive Learning (EPL). PL-Net offers the following advantages: 1) IPL divides feature extraction into two steps, allowing for the mixing of different size receptive fields and capturing semantic information from coarse to fine granularity without introducing additional parameters; 2) EPL divides the training process into two stages to optimize parameters and facilitate the fusion of coarse-grained information in the first stage and fine-grained information in the second stage. We conducted comprehensive evaluations of our proposed method on five medical image segmentation datasets, and the experimental results demonstrate that PL-Net achieves competitive segmentation performance. It is worth noting that PL-Net does not introduce any additional learnable parameters compared to other U-Net variants.
Affiliation(s)
- Kunpeng Mao
- Chongqing City Management College, Chongqing, China
- Ruoyu Li
- College of Computer Science, Sichuan University, Chengdu, China
- Junlong Cheng
- College of Computer Science, Sichuan University, Chengdu, China
- Danmei Huang
- Chongqing City Management College, Chongqing, China
- Zhiping Song
- Chongqing University of Engineering, Chongqing, China
- ZeKui Liu
- Chongqing University of Engineering, Chongqing, China
|
17
|
Bougourzi F, Dornaika F, Distante C, Taleb-Ahmed A. D-TrAttUnet: Toward hybrid CNN-transformer architecture for generic and subtle segmentation in medical images. Comput Biol Med 2024; 176:108590. [PMID: 38763066 DOI: 10.1016/j.compbiomed.2024.108590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Revised: 04/16/2024] [Accepted: 05/09/2024] [Indexed: 05/21/2024]
Abstract
Over the past two decades, machine analysis of medical imaging has advanced rapidly, opening up significant potential for several important medical applications. As complicated diseases increase and the number of cases rises, the role of machine-based imaging analysis has become indispensable. It serves as both a tool and an assistant to medical experts, providing valuable insights and guidance. A particularly challenging task in this area is lesion segmentation, a task that is challenging even for experienced radiologists. The complexity of this task highlights the urgent need for robust machine learning approaches to support medical staff. In response, we present our novel solution: the D-TrAttUnet architecture. This framework is based on the observation that different diseases often target specific organs. Our architecture includes an encoder-decoder structure with a composite Transformer-CNN encoder and dual decoders. The encoder includes two paths: the Transformer path and the Encoders Fusion Module path. The Dual-Decoder configuration uses two identical decoders, each with attention gates. This allows the model to simultaneously segment lesions and organs and integrate their segmentation losses. To validate our approach, we performed evaluations on the Covid-19 and Bone Metastasis segmentation tasks. We also investigated the adaptability of the model by testing it without the second decoder in the segmentation of glands and nuclei. The results confirmed the superiority of our approach, especially in Covid-19 infections and the segmentation of bone metastases. In addition, the hybrid encoder showed exceptional performance in the segmentation of glands and nuclei, solidifying its role in modern medical image analysis.
Affiliation(s)
- Fares Bougourzi
- Junia, UMR 8520, CNRS, Centrale Lille, University of Polytechnique Hauts-de-France, 59000 Lille, France
- Fadi Dornaika
- University of the Basque Country UPV/EHU, San Sebastian, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
- Cosimo Distante
- Institute of Applied Sciences and Intelligent Systems, National Research Council of Italy, 73100 Lecce, Italy
- Abdelmalik Taleb-Ahmed
- Université Polytechnique Hauts-de-France, Université de Lille, CNRS, Valenciennes, 59313, Hauts-de-France, France
|
18
|
Huang K, Liao J, He J, Lai S, Peng Y, Deng Q, Wang H, Liu Y, Peng L, Bai Z, Yu N, Li Y, Jiang Z, Su J, Li J, Tang Y, Chen M, Lu L, Chen X, Yao J, Zhao S. A real-time augmented reality system integrated with artificial intelligence for skin tumor surgery: experimental study and case series. Int J Surg 2024; 110:3294-3306. [PMID: 38549223 PMCID: PMC11175769 DOI: 10.1097/js9.0000000000001371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Accepted: 03/11/2024] [Indexed: 06/15/2024]
Abstract
BACKGROUND Skin tumors affect many people worldwide, and surgery is the first treatment choice. Achieving precise preoperative planning and navigation of intraoperative sampling remains a problem and is excessively reliant on the experience of surgeons, especially for Mohs surgery for malignant tumors. MATERIALS AND METHODS To achieve precise preoperative planning and navigation of intraoperative sampling, we developed a real-time augmented reality (AR) surgical system integrated with artificial intelligence (AI) to enhance three functions: AI-assisted tumor boundary segmentation, surgical margin design, and navigation in intraoperative tissue sampling. Non-randomized controlled trials were conducted on manikin, tumor-simulated rabbits, and human volunteers in Hunan Engineering Research Center of Skin Health and Disease Laboratory to evaluate the surgical system. RESULTS The results showed that the accuracy of the benign and malignant tumor segmentation was 0.9556 and 0.9548, respectively, and the average AR navigation mapping error was 0.644 mm. The proposed surgical system was applied in 106 skin tumor surgeries, including intraoperative navigation of sampling in 16 Mohs surgery cases. Surgeons who have used this system highly recognize it. CONCLUSIONS The surgical system highlighted the potential to achieve accurate treatment of skin tumors and to fill the gap in global research on skin tumor surgery systems.
Affiliation(s)
- Kai Huang
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Tencent AI Lab, Shenzhen, People’s Republic of China
- Jun Liao
- Tencent AI Lab, Shenzhen, People’s Republic of China
- Jishuai He
- Tencent AI Lab, Shenzhen, People’s Republic of China
- Sicen Lai
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Yihao Peng
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Qian Deng
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Han Wang
- Tencent AI Lab, Shenzhen, People’s Republic of China
- Yuancheng Liu
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Lanyuan Peng
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Ziqi Bai
- Tencent AI Lab, Shenzhen, People’s Republic of China
- Nianzhou Yu
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Yixin Li
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Zixi Jiang
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Juan Su
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Jinmao Li
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Yan Tang
- Department of Dermatology
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Mingliang Chen
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Lixia Lu
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Xiang Chen
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
- Jianhua Yao
- Tencent AI Lab, Shenzhen, People’s Republic of China
- Shuang Zhao
- Department of Dermatology
- Hunan Key Laboratory of Skin Cancer and Psoriasis
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital
- Hunan Engineering Research Center of Skin Health and Disease, Central South University
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Hunan
|
19
|
Lin Q, Guo X, Feng B, Guo J, Ni S, Dong H. A novel multi-task learning network for skin lesion classification based on multi-modal clues and label-level fusion. Comput Biol Med 2024; 175:108549. [PMID: 38704901 DOI: 10.1016/j.compbiomed.2024.108549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Revised: 04/20/2024] [Accepted: 04/28/2024] [Indexed: 05/07/2024]
Abstract
In this paper, we propose a multi-task learning (MTL) network based on the label-level fusion of metadata and hand-crafted features, using unsupervised clustering to generate new clustering labels as an optimization goal. We propose an MTL module (MTLM) that incorporates an attention mechanism to enable the model to learn more integrated, variable information, along with a dynamic strategy to adjust the loss weights of different tasks and trade off the contributions of multiple branches. Instead of feature-level fusion, we propose label-level fusion and combine the results of our proposed MTLM with the results of the image classification network to achieve better lesion prediction on multiple dermatological datasets. We verify the effectiveness of the proposed model by quantitative and qualitative measures. The MTL network using multi-modal clues and label-level fusion yields a significant performance improvement for skin lesion classification.
Affiliation(s)
- Qifeng Lin
- College of Software, Jilin University, 2699 Qianjin Street, Changchun 130012, China
- Xiaoxin Guo
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, 2699 Qianjin Street, Changchun 130012, China; College of Computer Science and Technology, Jilin University, 2699 Qianjin Street, Changchun 130012, China
- Bo Feng
- College of Computer Science and Technology, Jilin University, 2699 Qianjin Street, Changchun 130012, China
- Juntong Guo
- College of Software, Jilin University, 2699 Qianjin Street, Changchun 130012, China
- Shuang Ni
- College of Software, Jilin University, 2699 Qianjin Street, Changchun 130012, China
- Hongliang Dong
- College of Computer Science and Technology, Jilin University, 2699 Qianjin Street, Changchun 130012, China
|
20
|
Kuang H, Wang Y, Liu J, Wang J, Cao Q, Hu B, Qiu W, Wang J. Hybrid CNN-Transformer Network With Circular Feature Interaction for Acute Ischemic Stroke Lesion Segmentation on Non-Contrast CT Scans. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:2303-2316. [PMID: 38319756 DOI: 10.1109/tmi.2024.3362879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/08/2024]
Abstract
Lesion segmentation is a fundamental step for the diagnosis of acute ischemic stroke (AIS). Non-contrast CT (NCCT) is still a mainstream imaging modality for AIS lesion measurement. However, AIS lesion segmentation on NCCT is challenging due to low contrast, noise, and artifacts. To achieve accurate AIS lesion segmentation on NCCT, this study proposes a hybrid convolutional neural network (CNN) and Transformer network with circular feature interaction and bilateral difference learning. It consists of parallel CNN and Transformer encoders, a circular feature interaction module, and a shared CNN decoder with a bilateral difference learning module. A new Transformer block is particularly designed to solve the weak inductive bias problem of the traditional Transformer. To effectively combine features from CNN and Transformer encoders, we first design a multi-level feature aggregation module to combine multi-scale features in each encoder and then propose a novel feature interaction module containing circular CNN-to-Transformer and Transformer-to-CNN interaction blocks. Besides, a bilateral difference learning module is proposed at the bottom level of the decoder to learn the different information between the ischemic and contralateral sides of the brain. The proposed method is evaluated on three AIS datasets: the public AISD, a private dataset, and an external dataset. Experimental results show that the proposed method achieves Dice scores of 61.39% and 46.74% on the AISD and the private dataset, respectively, outperforming 17 state-of-the-art segmentation methods. Besides, volumetric analysis of the segmented lesions and the external validation results imply that the proposed method has the potential to provide supporting information for AIS diagnosis.
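The bilateral difference idea, contrasting each side of the brain with its mirrored contralateral side, can be sketched as a flip-and-subtract on a feature map. This is a simplification of the paper's learned module: it assumes the midline sits at the vertical center of the map, and the toy feature values are illustrative.

```python
import numpy as np

def bilateral_difference(feat):
    """Difference between each hemisphere and its mirrored contralateral side.

    feat: (H, W) feature map with the brain midline at the vertical center.
    A large |difference| hints at a unilateral abnormality such as an AIS lesion.
    """
    mirrored = feat[:, ::-1]   # flip left-right across the assumed midline
    return feat - mirrored

f = np.zeros((4, 6))
f[1:3, 4:6] = 1.0              # abnormal intensity on one side only
d = bilateral_difference(f)
print(d[1, 4], d[1, 1])  # 1.0 on the lesion side, -1.0 at the mirrored location
```

Healthy, roughly symmetric tissue cancels out, so the difference map concentrates on unilateral signal, which the decoder can then learn from.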
|
21
|
Xu Z, Guo X, Wang J. Enhancing skin lesion segmentation with a fusion of convolutional neural networks and transformer models. Heliyon 2024; 10:e31395. [PMID: 38807881 PMCID: PMC11130697 DOI: 10.1016/j.heliyon.2024.e31395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Revised: 05/11/2024] [Accepted: 05/15/2024] [Indexed: 05/30/2024] Open
Abstract
Accurate segmentation is crucial in diagnosing and analyzing skin lesions. However, automatic segmentation of skin lesions is extremely challenging because of their variable sizes, uneven color distributions, irregular shapes, hair occlusions, and blurred boundaries. Owing to the limited range of convolutional networks receptive fields, shallow convolution cannot extract the global features of images and thus has limited segmentation performance. Because medical image datasets are small in scale, the use of excessively deep networks could cause overfitting and increase computational complexity. Although transformer networks can focus on extracting global information, they cannot extract sufficient local information and accurately segment detailed lesion features. In this study, we designed a dual-branch encoder that combines a convolution neural network (CNN) and a transformer. The CNN branch of the encoder comprises four layers, which learn the local features of images through layer-wise downsampling. The transformer branch also comprises four layers, enabling the learning of global image information through attention mechanisms. The feature fusion module in the network integrates local features and global information, emphasizes important channel features through the channel attention mechanism, and filters irrelevant feature expressions. The information exchange between the decoder and encoder is finally achieved through skip connections to supplement the information lost during the sampling process, thereby enhancing segmentation accuracy. The data used in this paper are from four public datasets, including images of melanoma, basal cell tumor, fibroma, and benign nevus. Because of the limited size of the image data, we enhanced them using methods such as random horizontal flipping, random vertical flipping, random brightness enhancement, random contrast enhancement, and rotation. 
The segmentation accuracy is evaluated through the intersection over union (IoU) and Dice indicators, reaching 87.7% and 93.21%, 82.05% and 89.19%, 86.81% and 92.72%, and 92.79% and 96.21% on the ISIC 2016, ISIC 2017, ISIC 2018, and PH2 datasets, respectively (code: https://github.com/hyjane/CCT-Net).
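The IoU and Dice indicators reported in this entry (and throughout this list) can be computed from binary masks as follows; this is a minimal NumPy sketch for the reader, not code from the paper's repository:

```python
import numpy as np

def iou_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter / union) if union else 1.0

def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient = 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return float(2 * inter / total) if total else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(iou_score(pred, gt))   # 2/4 = 0.5
print(dice_score(pred, gt))  # 2*2/(3+3) ≈ 0.667
```

Note that Dice is always at least as large as IoU on the same masks (Dice = 2·IoU/(1+IoU)), which matches the paired values quoted above.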
Collapse
Affiliation(s)
- Zhijian Xu
- School of Electronic Information Engineering, China West Normal University, No. 1 Shida Road, Nanchong, Sichuan, 637009, China
| | - Xingyue Guo
- School of Computer Science, China West Normal University, No. 1 Shida Road, Nanchong, Sichuan, 637009, China
| | - Juan Wang
- School of Computer Science, China West Normal University, No. 1 Shida Road, Nanchong, Sichuan, 637009, China
| |
Collapse
|
22
|
Huang Z, Zhao Y, Yu Z, Qin P, Han X, Wang M, Liu M, Gregersen H. BiU-net: A dual-branch structure based on two-stage fusion strategy for biomedical image segmentation. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 252:108235. [PMID: 38776830 DOI: 10.1016/j.cmpb.2024.108235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Revised: 04/28/2024] [Accepted: 05/17/2024] [Indexed: 05/25/2024]
Abstract
BACKGROUND AND OBJECTIVE Computer-based biomedical image segmentation plays a crucial role in planning of assisted diagnostics and therapy. However, due to the variable size and irregular shape of the segmentation target, it is still a challenge to construct an effective medical image segmentation structure. Recently, hybrid architectures based on convolutional neural networks (CNNs) and transformers were proposed. However, most current backbones directly replace one or all convolutional layers with transformer blocks, regardless of the semantic gap between features. Thus, how to sufficiently and effectively eliminate the semantic gap as well as combine the global and local information is a critical challenge. METHODS To address the challenge, we propose a novel structure, called BiU-Net, which integrates CNNs and transformers with a two-stage fusion strategy. In the first fusion stage, called Single-Scale Fusion (SSF) stage, the encoding layers of the CNNs and transformers are coupled, with both having the same feature map size. The SSF stage aims to reconstruct local features based on CNNs and long-range information based on transformers in each encoding block. In the second stage, Multi-Scale Fusion (MSF), BiU-Net interacts with multi-scale features from various encoding layers to eliminate the semantic gap between deep and shallow layers. Furthermore, a Context-Aware Block (CAB) is embedded in the bottleneck to reinforce multi-scale features in the decoder. RESULTS Experiments on four public datasets were conducted. On the BUSI dataset, our BiU-Net achieved 85.50 % on Dice coefficient (Dice), 76.73 % on intersection over union (IoU), and 97.23 % on accuracy (ACC). Compared to the state-of-the-art method, BiU-Net improves Dice by 1.17 %. For the Monuseg dataset, the proposed method attained the highest scores, reaching 80.27 % and 67.22 % for Dice and IoU. The BiU-Net achieves 95.33 % and 81.22 % Dice on the PH2 and DRIVE datasets. 
CONCLUSIONS The results of our experiments showed that BiU-Net transcends existing state-of-the-art methods on four publicly available biomedical datasets. Due to the powerful multi-scale feature extraction ability, our proposed BiU-Net is a versatile medical image segmentation framework for various types of medical images. The source code is released on (https://github.com/ZYLandy/BiU-Net).
Collapse
Affiliation(s)
- Zhiyong Huang
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China.
| | - Yunlan Zhao
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
| | - Zhi Yu
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
| | - Pinzhong Qin
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
| | - Xiao Han
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
| | - Mengyao Wang
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
| | - Man Liu
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
| | - Hans Gregersen
- California Medical Innovations Institute, San Diego 92121, California
| |
Collapse
|
23
|
Din S, Mourad O, Serpedin E. LSCS-Net: A lightweight skin cancer segmentation network with densely connected multi-rate atrous convolution. Comput Biol Med 2024; 173:108303. [PMID: 38547653 DOI: 10.1016/j.compbiomed.2024.108303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2023] [Revised: 01/18/2024] [Accepted: 03/12/2024] [Indexed: 04/17/2024]
Abstract
The rising occurrence and notable public health consequences of skin cancer, especially of the most challenging form known as melanoma, have created an urgent demand for more advanced approaches to disease management. The integration of modern computer vision methods into clinical procedures offers the potential for enhancing the detection of skin cancer. The UNet model has gained prominence as a valuable tool for this objective, continuously evolving to tackle the difficulties associated with the inherent diversity of dermatological images. These challenges stem from diverse medical origins and are further complicated by variations in lighting, patient characteristics, and hair density. In this work, we present an innovative end-to-end trainable network crafted for the segmentation of skin cancer. This network comprises an encoder-decoder architecture, a novel feature extraction block, and a densely connected multi-rate Atrous convolution block. We evaluated the performance of the proposed lightweight skin cancer segmentation network (LSCS-Net) on three widely used benchmark datasets for skin lesion segmentation: ISIC 2016, ISIC 2017, and ISIC 2018. The generalization capabilities of LSCS-Net are demonstrated by its excellent performance on breast cancer and thyroid nodule segmentation datasets. The empirical findings confirm that LSCS-Net attains state-of-the-art results, as demonstrated by a significantly elevated Jaccard index.
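The idea behind a densely connected multi-rate atrous (dilated) block can be illustrated numerically: dilation enlarges the effective receptive field of a small kernel without extra parameters, while dense connectivity concatenates each branch's output onto the input of the next. The dilation rates and channel counts below are illustrative assumptions, not the paper's configuration:

```python
def effective_kernel(k: int, d: int) -> int:
    """Effective spatial extent of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def dense_atrous_channels(in_ch: int, growth: int, rates: list) -> list:
    """Input channel count seen by each atrous branch when every branch's
    output (growth channels) is concatenated densely onto its input."""
    seen, ch = [], in_ch
    for _ in rates:
        seen.append(ch)
        ch += growth  # dense connectivity: outputs accumulate via concat
    return seen

rates = [1, 2, 4, 8]  # assumed multi-rate dilations
print([effective_kernel(3, d) for d in rates])  # [3, 5, 9, 17]
print(dense_atrous_channels(64, 32, rates))     # [64, 96, 128, 160]
```

A 3x3 kernel at dilation 8 thus covers a 17x17 region, which is how a lightweight network can still aggregate wide context.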
Collapse
Affiliation(s)
- Sadia Din
- Electrical and Computer Engineering Program, Texas A&M University, Doha, Qatar.
| | | | - Erchin Serpedin
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
| |
Collapse
|
24
|
Ma P, Wang G, Li T, Zhao H, Li Y, Wang H. STCS-Net: a medical image segmentation network that fully utilizes multi-scale information. BIOMEDICAL OPTICS EXPRESS 2024; 15:2811-2831. [PMID: 38855673 PMCID: PMC11161382 DOI: 10.1364/boe.517737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/05/2024] [Revised: 03/09/2024] [Accepted: 03/19/2024] [Indexed: 06/11/2024]
Abstract
In recent years, significant progress has been made in the field of medical image segmentation through the application of deep learning and neural networks. Numerous studies have focused on optimizing encoders to extract more comprehensive key information. However, the importance of decoders in directly influencing the final output of images cannot be overstated. The ability of decoders to effectively leverage diverse information and further refine crucial details is of paramount importance. This paper proposes a medical image segmentation architecture named STCS-Net. The designed decoder in STCS-Net facilitates multi-scale filtering and correction of information from the encoder, thereby enhancing the accuracy of extracting vital features. Additionally, an information enhancement module is introduced in skip connections to highlight essential features and improve the inter-layer information interaction capabilities. Comprehensive evaluations on the ISIC2016, ISIC2018, and Lung datasets validate the superiority of STCS-Net across different scenarios. Experimental results demonstrate the outstanding performance of STCS-Net on all three datasets. Comparative experiments highlight the advantages of our proposed network in terms of accuracy and parameter efficiency. Ablation studies confirm the effectiveness of the introduced decoder and skip connection module. This research introduces a novel approach to the field of medical image segmentation, providing new perspectives and solutions for future developments in medical image processing and analysis.
Collapse
Affiliation(s)
- Pengchong Ma
- College of Electronic And Information Engineering, Hebei University, Hebei 071002, China
- Hebei Key Laboratory of Precise Imaging of Inflammation Related Tumors, Hebei 071000, China
| | - Guanglei Wang
- College of Electronic And Information Engineering, Hebei University, Hebei 071002, China
- Hebei Key Laboratory of Precise Imaging of Inflammation Related Tumors, Hebei 071000, China
| | - Tong Li
- College of Electronic And Information Engineering, Hebei University, Hebei 071002, China
- Hebei Key Laboratory of Precise Imaging of Inflammation Related Tumors, Hebei 071000, China
| | - Haiyang Zhao
- College of Electronic And Information Engineering, Hebei University, Hebei 071002, China
| | - Yan Li
- College of Electronic And Information Engineering, Hebei University, Hebei 071002, China
| | - Hongrui Wang
- College of Electronic And Information Engineering, Hebei University, Hebei 071002, China
| |
Collapse
|
25
|
Zhi Y, Bie H, Wang J, Ren L. Masked autoencoders with generalizable self-distillation for skin lesion segmentation. Med Biol Eng Comput 2024:10.1007/s11517-024-03086-z. [PMID: 38653880 DOI: 10.1007/s11517-024-03086-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Accepted: 03/29/2024] [Indexed: 04/25/2024]
Abstract
In the field of skin lesion image segmentation, accurate identification and partitioning of diseased regions is of vital importance for in-depth analysis of skin cancer. Self-supervised learning, exemplified by the masked autoencoder (MAE), has emerged as a potent force in the medical imaging domain: it autonomously learns and extracts latent features from unlabeled data, thereby yielding pre-trained models that greatly assist downstream tasks. To encourage pre-trained models to more comprehensively learn the global structural and local detail information inherent in dermoscopy images, we introduce a Teacher-Student architecture, named TEDMAE, that incorporates a self-distillation mechanism; it learns holistic image feature information to improve the generalizable global knowledge learning of the student MAE model. To make the image features learned by the model suitable for unknown test images, two optimization strategies are proposed to enhance the generalizability of the global features learned by the teacher model, thereby improving the overall generalization capability of the pre-trained models: Exterior Conversion Augmentation (EC) utilizes random convolutional kernels and linear interpolation to effectively transform the input image into one with the same shape but altered intensities and textures, while Dynamic Feature Generation (DF) employs a nonlinear attention mechanism for feature merging, enhancing the expressive power of the features. Experimental results from three public skin disease datasets, ISIC2019, ISIC2017, and PH2, indicate that our proposed TEDMAE method outperforms several similar approaches. Specifically, TEDMAE demonstrated optimal segmentation and generalization performance on the ISIC2017 and PH2 datasets, with Dice scores reaching 82.1% and 91.2%, respectively. The best Jaccard values were 72.6% and 84.5%, while the optimal HD95 values were 13.0 and 8.9, respectively.
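The Exterior Conversion Augmentation described above (random convolutional kernels plus linear interpolation) can be sketched as follows. This is a plausible minimal reading of the idea, with kernel size, normalization, padding mode, and blend weight all being assumptions rather than the paper's exact settings:

```python
import numpy as np

def random_conv(img: np.ndarray, k: int = 3, rng=None) -> np.ndarray:
    """Convolve a 2-D image with a random k x k kernel ('same' output size)."""
    if rng is None:
        rng = np.random.default_rng(0)
    kernel = rng.normal(size=(k, k))
    kernel /= np.abs(kernel).sum()  # keep output intensities bounded
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = (padded[i:i + k, j:j + k] * kernel).sum()
    return out

def exterior_conversion(img: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Linear interpolation between the original and its random convolution:
    same shape, altered intensities and textures (alpha is an assumed weight)."""
    return alpha * img + (1 - alpha) * random_conv(img)

x = np.random.default_rng(1).random((8, 8))
y = exterior_conversion(x)
assert y.shape == x.shape
```

Because only intensities and textures change while shape is preserved, the ground-truth mask for a segmentation task remains valid for the augmented image.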
Collapse
Affiliation(s)
- Yichen Zhi
- Department of Intelligent Media Computing Center, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, People's Republic of China
| | - Hongxia Bie
- Department of Intelligent Media Computing Center, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, People's Republic of China.
| | - Jiali Wang
- Department of Intelligent Media Computing Center, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, People's Republic of China
| | - Lihan Ren
- Department of Intelligent Media Computing Center, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, People's Republic of China
| |
Collapse
|
26
|
Ding Y, Yi Z, Xiao J, Hu M, Guo Y, Liao Z, Wang Y. CTH-Net: A CNN and Transformer hybrid network for skin lesion segmentation. iScience 2024; 27:109442. [PMID: 38523786 PMCID: PMC10957498 DOI: 10.1016/j.isci.2024.109442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2023] [Revised: 01/25/2024] [Accepted: 03/04/2024] [Indexed: 03/26/2024] Open
Abstract
Automatically and accurately segmenting skin lesions can be challenging, due to factors such as low contrast and fuzzy boundaries. This paper proposes a hybrid encoder-decoder model (CTH-Net) based on convolutional neural network (CNN) and Transformer, capitalizing on the advantages of these approaches. We propose three modules for skin lesion segmentation and seamlessly connect them with carefully designed model architecture. Better segmentation performance is achieved by introducing SoftPool in the CNN branch and sandglass block in the bottleneck layer. Extensive experiments were conducted on four publicly accessible skin lesion datasets, ISIC 2016, ISIC 2017, ISIC 2018, and PH2 to confirm the efficacy and benefits of the proposed strategy. Experimental results show that the proposed CTH-Net provides better skin lesion segmentation performance in both quantitative and qualitative testing when compared with state-of-the-art approaches. We believe the CTH-Net design is inspiring and can be extended to other applications/frameworks.
Collapse
Affiliation(s)
- Yuhan Ding
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Zhenglin Yi
- Departments of Urology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Jiatong Xiao
- Departments of Urology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Minghui Hu
- Departments of Urology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Yu Guo
- Department of Burns and Plastic Surgery, Xiangya Hospital, Central South University, Changsha 410008, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Zhifang Liao
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yongjie Wang
- Department of Burns and Plastic Surgery, Xiangya Hospital, Central South University, Changsha 410008, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, China
| |
Collapse
|
27
|
Li B, Xu Y, Wang Y, Zhang B. DECTNet: Dual Encoder Network combined convolution and Transformer architecture for medical image segmentation. PLoS One 2024; 19:e0301019. [PMID: 38573957 PMCID: PMC10994332 DOI: 10.1371/journal.pone.0301019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Accepted: 03/09/2024] [Indexed: 04/06/2024] Open
Abstract
Automatic and accurate segmentation of medical images plays an essential role in disease diagnosis and treatment planning. Convolutional neural networks have achieved remarkable results in medical image segmentation in the past decade. Meanwhile, deep learning models based on the Transformer architecture have also succeeded tremendously in this domain. However, due to the ambiguity of medical image boundaries and the high complexity of physical organization structures, implementing effective structure extraction and accurate segmentation remains an open problem. In this paper, we propose a novel Dual Encoder Network named DECTNet to alleviate this problem. Specifically, DECTNet embraces four components: a convolution-based encoder, a Transformer-based encoder, a feature fusion decoder, and a deep supervision module. The convolutional structure encoder can extract fine spatial contextual details in images. Meanwhile, the Transformer structure encoder is designed using a hierarchical Swin Transformer architecture to model global contextual information. The novel feature fusion decoder integrates the multi-scale representations from the two encoders and selects features that focus on segmentation tasks through a channel attention mechanism. Further, a deep supervision module is used to accelerate the convergence of the proposed method. Extensive experiments demonstrate that, compared to seven other models, the proposed method achieves state-of-the-art results on four segmentation tasks: skin lesion segmentation, polyp segmentation, COVID-19 lesion segmentation, and MRI cardiac segmentation.
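Several entries in this list (this one, and the CCT-Net entry above) fuse CNN and Transformer features and then re-weight channels with a channel attention mechanism. A squeeze-and-excitation-style sketch of that gating step is shown below; concatenation as the fusion operator and the random projection matrices are placeholders for learned components, not the papers' exact designs:

```python
import numpy as np

def channel_attention(feat: np.ndarray, reduction: int = 4, rng=None) -> np.ndarray:
    """SE-style channel gating on a (C, H, W) feature map: global average
    pool -> bottleneck MLP -> sigmoid -> per-channel rescaling.
    The two projection matrices stand in for learned weights."""
    if rng is None:
        rng = np.random.default_rng(0)
    C = feat.shape[0]
    squeeze = feat.mean(axis=(1, 2))           # global average pool -> (C,)
    w1 = rng.normal(size=(C // reduction, C))
    w2 = rng.normal(size=(C, C // reduction))
    hidden = np.maximum(w1 @ squeeze, 0)       # ReLU bottleneck
    gate = 1 / (1 + np.exp(-(w2 @ hidden)))    # sigmoid per-channel weights
    return feat * gate[:, None, None]

# Hypothetical fusion: concatenate CNN-branch and Transformer-branch features
fused = np.concatenate([np.ones((4, 8, 8)), np.zeros((4, 8, 8))], axis=0)
out = channel_attention(fused)
assert out.shape == (8, 8, 8)
```

The gate lies strictly in (0, 1), so informative channels are kept near full strength while irrelevant feature expressions are attenuated rather than hard-masked.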
Collapse
Affiliation(s)
- Boliang Li
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Yaming Xu
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Yan Wang
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Bo Zhang
- Sergeant Schools of Army Academy of Armored Forces, Changchun, Jilin, China
| |
Collapse
|
28
|
Wang Y, Yang Z, Liu X, Li Z, Wu C, Wang Y, Jin K, Chen D, Jia G, Chen X, Ye J, Huang X. PGKD-Net: Prior-guided and Knowledge Diffusive Network for Choroid Segmentation. Artif Intell Med 2024; 150:102837. [PMID: 38553151 DOI: 10.1016/j.artmed.2024.102837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Revised: 03/01/2024] [Accepted: 03/03/2024] [Indexed: 04/02/2024]
Abstract
The thickness of the choroid is considered to be an important indicator for clinical diagnosis. Therefore, accurate choroid segmentation in retinal OCT images is crucial for monitoring various ophthalmic diseases. However, this is still challenging due to the blurry boundaries and interference from other lesions. To address these issues, we propose a novel prior-guided and knowledge diffusive network (PGKD-Net) to fully utilize retinal structural information to highlight choroidal region features and boost segmentation performance. Specifically, it is composed of two parts: a Prior-mask Guided Network (PG-Net) for coarse segmentation and a Knowledge Diffusive Network (KD-Net) for fine segmentation. In addition, we design two novel feature enhancement modules, Multi-Scale Context Aggregation (MSCA) and Multi-Level Feature Fusion (MLFF). The MSCA module captures the long-distance dependencies between features from different receptive fields and improves the model's ability to learn global context. The MLFF module integrates the cascaded context knowledge learned from PG-Net to benefit fine-level segmentation. Comprehensive experiments are conducted to evaluate the performance of the proposed PGKD-Net. Experimental results show that our proposed method achieves superior segmentation accuracy over other state-of-the-art methods. Our code is made publicly available at: https://github.com/yzh-hdu/choroid-segmentation.
Collapse
Affiliation(s)
- Yaqi Wang
- College of Media Engineering, Communication University of Zhejiang, Hangzhou, China.
| | - Zehua Yang
- Hangzhou Dianzi University, Hangzhou, China.
| | - Xindi Liu
- Department of Ophthalmology, School of Medicine, The Second Affiliated Hospital of Zhejiang University, Hangzhou, China.
| | - Zhi Li
- Hangzhou Dianzi University, Hangzhou, China.
| | - Chengyu Wu
- Department of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, China.
| | - Yizhen Wang
- Hangzhou Dianzi University, Hangzhou, China.
| | - Kai Jin
- Department of Ophthalmology, School of Medicine, The Second Affiliated Hospital of Zhejiang University, Hangzhou, China.
| | - Dechao Chen
- Hangzhou Dianzi University, Hangzhou, China.
| | | | | | - Juan Ye
- Department of Ophthalmology, School of Medicine, The Second Affiliated Hospital of Zhejiang University, Hangzhou, China.
| | | |
Collapse
|
29
|
Li Y, Tian T, Hu J, Yuan C. SUTrans-NET: a hybrid transformer approach to skin lesion segmentation. PeerJ Comput Sci 2024; 10:e1935. [PMID: 38660200 PMCID: PMC11042008 DOI: 10.7717/peerj-cs.1935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Accepted: 02/18/2024] [Indexed: 04/26/2024]
Abstract
Melanoma is a malignant skin tumor that threatens human life and health. Early detection is essential for effective treatment. However, the low contrast between melanoma lesions and normal skin and the irregularity in size and shape make skin lesions difficult to detect with the naked eye in the early stages, making the task of skin lesion segmentation challenging. Traditional encoder-decoders built as U-shaped networks from convolutional neural networks (CNNs) have limitations in establishing long-term dependencies and global contextual connections, while the Transformer architecture is limited in its application to small medical datasets. To address these issues, we propose a new skin lesion segmentation network, SUTrans-NET, which combines a CNN and a Transformer in parallel to form a dual encoder, where both the CNN and Transformer branches perform dynamic interactive fusion of image information in each layer. At the same time, we introduce our designed multi-grouping module SpatialGroupAttention (SGA) to complement the spatial and texture information of the Transformer branch, and utilize the Focus idea of YOLOv5 to construct the Patch Embedding module in the Transformer to prevent the loss of pixel accuracy. In addition, we design a decoder with full-scale information fusion capability to fully fuse shallow and deep features at different stages of the encoder. The effectiveness of our method is demonstrated on the ISIC 2016, ISIC 2017, ISIC 2018 and PH2 datasets, and its advantages over existing methods are verified.
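The YOLOv5 "Focus" idea referenced above is a space-to-depth slicing: each 2x2 spatial block is moved into the channel dimension, so downsampling discards no pixels. A minimal sketch (the function name is illustrative; SUTrans-NET's actual patch-embedding details are not reproduced here):

```python
import numpy as np

def focus_slice(x: np.ndarray) -> np.ndarray:
    """YOLOv5-style Focus: move 2x2 spatial blocks into channels.
    (C, H, W) -> (4C, H/2, W/2); H and W are assumed even."""
    return np.concatenate([
        x[:, 0::2, 0::2],  # top-left pixel of each 2x2 block
        x[:, 1::2, 0::2],  # bottom-left
        x[:, 0::2, 1::2],  # top-right
        x[:, 1::2, 1::2],  # bottom-right
    ], axis=0)

x = np.arange(2 * 4 * 4).reshape(2, 4, 4).astype(float)
y = focus_slice(x)
print(y.shape)  # (8, 2, 2)
```

Because the mapping is a pure rearrangement, every input value survives into the output, which is why it "prevents the loss of pixel accuracy" compared with strided patchification.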
Collapse
Affiliation(s)
- Yaqin Li
- School of Mathematics and Computer Science, Wuhan Polytechnic University School, Wuhan, Hubei, China
| | - Tonghe Tian
- School of Mathematics and Computer Science, Wuhan Polytechnic University School, Wuhan, Hubei, China
| | - Jing Hu
- School of Mathematics and Computer Science, Wuhan Polytechnic University School, Wuhan, Hubei, China
| | - Cao Yuan
- School of Mathematics and Computer Science, Wuhan Polytechnic University School, Wuhan, Hubei, China
| |
Collapse
|
30
|
Fu B, Peng Y, He J, Tian C, Sun X, Wang R. HmsU-Net: A hybrid multi-scale U-net based on a CNN and transformer for medical image segmentation. Comput Biol Med 2024; 170:108013. [PMID: 38271837 DOI: 10.1016/j.compbiomed.2024.108013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Revised: 12/26/2023] [Accepted: 01/18/2024] [Indexed: 01/27/2024]
Abstract
Accurate medical image segmentation is of great significance for subsequent diagnosis and analysis. The acquisition of multi-scale information plays an important role in segmenting regions of interest of different sizes. With the emergence of Transformers, numerous networks adopted hybrid structures incorporating Transformers and CNNs to learn multi-scale information. However, the majority of research has focused on the design and composition of CNN and Transformer structures, neglecting the inconsistencies in feature learning between Transformer and CNN. This oversight has resulted in the hybrid network's performance not being fully realized. In this work, we proposed a novel hybrid multi-scale segmentation network named HmsU-Net, which effectively fused multi-scale features. Specifically, HmsU-Net employed a parallel design incorporating both CNN and Transformer architectures. To address the inconsistency in feature learning between CNN and Transformer within the same stage, we proposed the multi-scale feature fusion module. For feature fusion across different stages, we introduced the cross-attention module. Comprehensive experiments conducted on various datasets demonstrate that our approach surpasses current state-of-the-art methods.
Collapse
Affiliation(s)
- Bangkang Fu
- Medical College, Guizhou University, Guizhou 550000, China; Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China
| | - Yunsong Peng
- Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China
| | - Junjie He
- Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China
| | - Chong Tian
- Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China
| | - Xinhuan Sun
- Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China
| | - Rongpin Wang
- Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China.
| |
Collapse
|
31
|
He Y, Yi Y, Zheng C, Kong J. BGF-Net: Boundary guided filter network for medical image segmentation. Comput Biol Med 2024; 171:108184. [PMID: 38417386 DOI: 10.1016/j.compbiomed.2024.108184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2023] [Revised: 02/12/2024] [Accepted: 02/18/2024] [Indexed: 03/01/2024]
Abstract
How to fuse low-level and high-level features effectively is crucial to improving the accuracy of medical image segmentation. Most CNN-based segmentation models on this topic usually adopt attention mechanisms to achieve the fusion of different level features, but they have not effectively utilized the guidance information of high-level features, which is often highly beneficial for improving the performance of the segmentation model, to guide the extraction of low-level features. To address this problem, we design multiple guided modules and develop a boundary-guided filter network (BGF-Net) to obtain more accurate medical image segmentation. To the best of our knowledge, this is the first time that boundary guided information is introduced into the medical image segmentation task. Specifically, we first propose a simple yet effective channel boundary guided module to make the segmentation model pay more attention to the relevant channel weights. We further design a novel spatial boundary guided module to complement the channel boundary guided module and attend to the most important spatial positions. Finally, we propose a boundary guided filter to preserve the structural information from the previous feature map and guide the model to learn more important feature information. Moreover, we conduct extensive experiments on skin lesion, polyp, and gland segmentation datasets including ISIC 2016, CVC-EndoSceneStil and GlaS to test the proposed BGF-Net. The experimental results demonstrate that BGF-Net performs better than other state-of-the-art methods.
Collapse
Affiliation(s)
- Yanlin He
- College of Information Sciences and Technology, Northeast Normal University, Changchun, 130117, China
| | - Yugen Yi
- School of Software, Jiangxi Normal University, Nanchang, 330022, China
| | - Caixia Zheng
- College of Information Sciences and Technology, Northeast Normal University, Changchun, 130117, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China.
| | - Jun Kong
- Institute for Intelligent Elderly Care, Changchun Humanities and Sciences College, Changchun, 130117, China.
| |
Collapse
|
32
|
Manh V, Jia X, Xue W, Xu W, Mei Z, Dong Y, Zhou J, Huang R, Ni D. An efficient framework for lesion segmentation in ultrasound images using global adversarial learning and region-invariant loss. Comput Biol Med 2024; 171:108137. [PMID: 38447499 DOI: 10.1016/j.compbiomed.2024.108137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Revised: 01/16/2024] [Accepted: 02/12/2024] [Indexed: 03/08/2024]
Abstract
Lesion segmentation in ultrasound images is an essential yet challenging step for the early evaluation and diagnosis of cancers. In recent years, many automatic CNN-based methods have been proposed to assist this task. However, most modern approaches often fail to capture long-range dependencies and prior information, making it difficult to identify lesions with unfixed shapes, sizes, locations, and textures. To address this, we present a novel lesion segmentation framework that guides the model to learn the global information about lesion characteristics and invariant features (e.g., morphological features) of lesions to improve the segmentation in ultrasound images. Specifically, the segmentation model is guided to learn the characteristics of lesions from the global maps using an adversarial learning scheme with a self-attention-based discriminator. We argue that under such a lesion characteristics-based guidance mechanism, the segmentation model gets more clues about the boundaries, shapes, sizes, and positions of lesions and can produce reliable predictions. In addition, as ultrasound lesions have different textures, we embed this prior knowledge into a novel region-invariant loss to constrain the model to focus on invariant features for robust segmentation. We demonstrate our method on one in-house breast ultrasound (BUS) dataset and two public datasets (i.e., breast lesion (BUS B) and thyroid nodule from TNSCUI2020). Experimental results show that our method is specifically suitable for lesion segmentation in ultrasound images and can outperform state-of-the-art approaches with Dice of 0.931, 0.906, and 0.876, respectively. The proposed method demonstrates that it can provide more important information about the characteristics of lesions for lesion segmentation in ultrasound images, especially for lesions with irregular shapes and small sizes. It can assist current lesion segmentation models to better suit clinical needs.
Affiliation(s)
- Van Manh
- Medical Ultrasound Image Computing (MUSIC) lab, School of Biomedical Engineering, Shenzhen University, Shenzhen, 518060, China
- Xiaohong Jia
- Department of Ultrasound Medicine, Ruijin Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, 200240, China
- Wufeng Xue
- Medical Ultrasound Image Computing (MUSIC) lab, School of Biomedical Engineering, Shenzhen University, Shenzhen, 518060, China
- Wenwen Xu
- Department of Ultrasound Medicine, Ruijin Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, 200240, China
- Zihan Mei
- Department of Ultrasound Medicine, Ruijin Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, 200240, China
- Yijie Dong
- Department of Ultrasound Medicine, Ruijin Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, 200240, China
- Jianqiao Zhou
- Department of Ultrasound Medicine, Ruijin Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, 200240, China
- Ruobing Huang
- Medical Ultrasound Image Computing (MUSIC) lab, School of Biomedical Engineering, Shenzhen University, Shenzhen, 518060, China
- Dong Ni
- Medical Ultrasound Image Computing (MUSIC) lab, School of Biomedical Engineering, Shenzhen University, Shenzhen, 518060, China
33
Moon JH, Choi G, Kim YH, Kim WY. PCTC-Net: A Crack Segmentation Network with Parallel Dual Encoder Network Fusing Pre-Conv-Based Transformers and Convolutional Neural Networks. Sensors (Basel) 2024; 24:1467. PMID: 38475003. DOI: 10.3390/s24051467.
Abstract
Cracks are common defects that occur on the surfaces of objects and structures. Crack detection is a critical maintenance task that traditionally requires manual labor. Large-scale manual inspections are expensive. Research has been conducted to replace expensive human labor with cheaper computing resources. Recently, crack segmentation based on convolutional neural networks (CNNs) and transformers has been actively investigated for local and global information. However, the transformer is data-intensive owing to its weak inductive bias. Existing labeled datasets for crack segmentation are relatively small. Additionally, a limited amount of fine-grained crack data is available. To address this data-intensive problem, we propose a parallel dual encoder network fusing Pre-Conv-based Transformers and convolutional neural networks (PCTC-Net). The Pre-Conv module automatically optimizes each color channel with a small spatial kernel before the input of the transformer. The proposed model, PCTC-Net, was tested with the DeepCrack, Crack500, and Crackseg9k datasets. The experimental results showed that our model achieved higher generalization performance, stability, and F1 scores than the SOTA model DTrC-Net.
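The Pre-Conv module is described as optimizing each color channel with a small spatial kernel before the transformer input. A per-channel (depthwise) 3×3 convolution captures the general idea; this sketch is an illustration under our own assumptions (kernel size, zero padding, and all names are ours, not taken from PCTC-Net):

```python
import numpy as np

def depthwise_conv3x3(image, kernels):
    """Apply one 3x3 kernel per channel (depthwise), zero-padded, stride 1.

    image:   (H, W, C) array
    kernels: (C, 3, 3) array, one small spatial kernel per colour channel
    """
    h, w, c = image.shape
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(image, dtype=float)
    for ch in range(c):
        k = kernels[ch]
        for i in range(h):
            for j in range(w):
                out[i, j, ch] = np.sum(padded[i:i+3, j:j+3, ch] * k)
    return out

# Identity kernels (1 at the centre) must leave each channel unchanged.
img = np.random.rand(8, 8, 3)
ident = np.zeros((3, 3, 3))
ident[:, 1, 1] = 1.0
assert np.allclose(depthwise_conv3x3(img, ident), img)
```

Because each channel has its own learnable kernel, such a module adds very few parameters while giving the transformer branch locally smoothed, channel-adapted input.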
Affiliation(s)
- Ji-Hwan Moon
- Department of Artificial Intelligence Engineering, Chosun University, Gwangju 61452, Republic of Korea
- Gyuho Choi
- Department of Artificial Intelligence Engineering, Chosun University, Gwangju 61452, Republic of Korea
- Yu-Hwan Kim
- Department of Computer Engineering, Chosun University, Gwangju 61452, Republic of Korea
- Won-Yeol Kim
- Department of Artificial Intelligence Engineering, Chosun University, Gwangju 61452, Republic of Korea
34
Xin C, Liu Z, Ma Y, Wang D, Zhang J, Li L, Zhou Q, Xu S, Zhang Y. Transformer guided self-adaptive network for multi-scale skin lesion image segmentation. Comput Biol Med 2024; 169:107846. PMID: 38184865. DOI: 10.1016/j.compbiomed.2023.107846.
Abstract
BACKGROUND In recent years, skin lesions have become a major public health concern, and the diagnosis and management of skin lesions depend heavily on correct segmentation of the lesions. Traditional convolutional neural networks (CNNs) have demonstrated promising results in skin lesion segmentation, but they are limited in their ability to capture distant connections and intricate features. In addition, current medical image segmentation algorithms rarely consider the distribution of different categories across different regions of the image, nor the spatial relationship between pixels. OBJECTIVES This study proposes a self-adaptive position-aware skin lesion segmentation model, SapFormer, to capture global context and fine-grained detail, model spatial relationships more effectively, and adapt to different positional characteristics. SapFormer is a multi-scale dynamic position-aware structure designed to provide a more flexible representation of the relationships between skin lesion characteristics and lesion distribution. Additionally, it increases skin lesion segmentation accuracy and decreases incorrect segmentation of non-lesion areas. INNOVATIONS SapFormer uses multiple hybrid transformers for multi-scale feature encoding of skin images and multi-scale positional feature sensing of the encoded features with a transformer decoder, obtaining fine-grained features of the lesion area and optimizing the regional feature distribution. The self-adaptive feature framework, built upon the transformer decoder module, dynamically and automatically generates parameterizations with learnable properties at different positions, derived from the multi-scale encoding characteristics of the input image. Simultaneously, a cross-attention network optimizes the features of the current region according to the features of other regions, aiming to increase skin lesion segmentation accuracy.
MAIN RESULTS The ISIC-2016, ISIC-2017, and ISIC-2018 skin lesion datasets are used as the basis for the experiments. On these datasets, the proposed model achieves accuracy values of 97.9%, 94.3%, and 95.7%, IoU values of 93.2%, 86.4%, and 89.4%, and DSC values of 96.4%, 92.6%, and 94.3%, respectively. All three metrics surpass the performance of the majority of state-of-the-art (SOTA) models. SapFormer's metrics on these datasets demonstrate that it can precisely segment skin lesions. Notably, our approach exhibits remarkable noise resistance in non-lesion areas while conducting finer-grained regional feature extraction on the skin lesion image. CONCLUSIONS In conclusion, integrating a transformer-guided position-aware network into semantic skin lesion segmentation yields a notable performance boost. The ability of the proposed network to capture spatial relationships and fine-grained details proves beneficial for effective skin lesion segmentation. By enhancing lesion localization, feature extraction, quantitative analysis, and classification accuracy, the proposed segmentation model improves the diagnostic efficiency of skin lesion analysis on dermoscopic images. It assists dermatologists in making more accurate and efficient diagnoses, ultimately leading to better patient care and outcomes. This research paves the way for advances in diagnosing and treating skin lesions, promoting better understanding and decision-making in the clinical setting.
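The cross-attention step described in the abstract — refining one region's features according to other regions' features — follows the standard scaled dot-product form. A generic NumPy sketch (dimensions, names, and random weights are illustrative; SapFormer's exact design is not specified in the abstract):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, wq, wk, wv):
    """Refine one region's features using features from other regions.

    queries: (Nq, D) features of the current region
    context: (Nc, D) features of the other regions
    wq/wk/wv: (D, D) projection matrices
    """
    q = queries @ wq                       # project current-region features
    k = context @ wk                       # project other-region keys
    v = context @ wv                       # project other-region values
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product similarity
    return softmax(scores, axis=-1) @ v    # weighted sum over the context

rng = np.random.default_rng(0)
d = 8
cur = rng.normal(size=(5, d))              # 5 tokens in the current region
other = rng.normal(size=(7, d))            # 7 tokens from other regions
w = [rng.normal(size=(d, d)) for _ in range(3)]
out = cross_attention(cur, other, *w)
assert out.shape == (5, d)
```

The output keeps the current region's token count but mixes in context from elsewhere in the image, which is what lets one region's prediction be corrected by the appearance of others.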
Affiliation(s)
- Chao Xin
- The First Affiliated Hospital of Ningbo University, Ningbo, 315211, China
- Zhifang Liu
- The First Affiliated Hospital of Ningbo University, Ningbo, 315211, China
- Yizhao Ma
- The First Affiliated Hospital of Ningbo University, Ningbo, 315211, China
- Dianchen Wang
- The First Affiliated Hospital of Ningbo University, Ningbo, 315211, China
- Jing Zhang
- The First Affiliated Hospital of Ningbo University, Ningbo, 315211, China
- Lingzhi Li
- The First Affiliated Hospital of Ningbo University, Ningbo, 315211, China
- Qiongyan Zhou
- The First Affiliated Hospital of Ningbo University, Ningbo, 315211, China
- Suling Xu
- The First Affiliated Hospital of Ningbo University, Ningbo, 315211, China
35
Zhang D, Fan X, Kang X, Tian S, Xiao G, Yu L, Wu W. Class key feature extraction and fusion for 2D medical image segmentation. Med Phys 2024; 51:1263-1276. PMID: 37552522. DOI: 10.1002/mp.16636.
Abstract
BACKGROUND The size variation, complex semantic environment and high similarity in medical images often prevent deep learning models from achieving good performance. PURPOSE To overcome these problems and improve the model segmentation performance and generalizability. METHODS We propose the key class feature reconstruction module (KCRM), which ranks channel weights and selects key features (KFs) that contribute more to the segmentation results for each class. Meanwhile, KCRM reconstructs all local features to establish the dependence relationship from local features to KFs. In addition, we propose the spatial gating module (SGM), which employs KFs to generate two spatial maps to suppress irrelevant regions, strengthening the ability to locate semantic objects. Finally, we enable the model to adapt to size variations by diversifying the receptive field. RESULTS We integrate these modules into class key feature extraction and fusion network (CKFFNet) and validate its performance on three public medical datasets: CHAOS, UW-Madison, and ISIC2017. The experimental results show that our method achieves better segmentation results and generalizability than those of mainstream methods. CONCLUSION Through quantitative and qualitative research, the proposed module improves the segmentation results and enhances the model generalizability, making it suitable for application and expansion.
Affiliation(s)
- Dezhi Zhang
- Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Clinical Research Center For Dermatologic Diseases, Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China
- Xin Fan
- College of Software, Xinjiang University, Urumqi, Xinjiang, China
- Xiaojing Kang
- Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Clinical Research Center For Dermatologic Diseases, Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China
- Shengwei Tian
- College of Software, Xinjiang University, Urumqi, Xinjiang, China
- Key Laboratory of Software Engineering Technology, College of Software, Xinjiang University, Urumqi, China
- Guangli Xiao
- College of Software, Xinjiang University, Urumqi, Xinjiang, China
- Long Yu
- College of Network Center, Xinjiang University, Urumqi, China
- Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Weidong Wu
- Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Clinical Research Center For Dermatologic Diseases, Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China
36
Song E, Zhan B, Liu H. Combining external-latent attention for medical image segmentation. Neural Netw 2024; 170:468-477. PMID: 38039684. DOI: 10.1016/j.neunet.2023.10.046.
Abstract
The attention mechanism offers a new entry point for improving the performance of medical image segmentation. How to reasonably assign weights is a key element of the attention mechanism, and the currently popular schemes include global squeezing and non-local information interaction using the self-attention (SA) operation. However, these approaches over-focus on external features and fail to exploit latent features. The global squeezing approach crudely represents the richness of contextual information by the global mean or maximum value, while non-local information interaction focuses on the similarity of external features between different regions. Both ignore the fact that contextual information is expressed more through latent features such as frequency changes within the data. To tackle the above problems and make proper use of attention mechanisms in medical image segmentation, we propose an external-latent attention collaborative guided image segmentation network, named TransGuider. This network consists of three key components: 1) a latent attention module that uses an improved entropy quantification method to accurately explore and locate the distribution of latent contextual information; 2) an external self-attention module using sparse representation, which preserves external global contextual information while reducing computational overhead by selecting representative feature description maps for the SA operation; and 3) a multi-attention collaborative module that guides the network to continuously focus on the region of interest, refining the segmentation mask. Our experimental results on several benchmark medical image segmentation datasets show that TransGuider outperforms state-of-the-art methods, and extensive ablation experiments demonstrate the effectiveness of the proposed components. Our code will be available at https://github.com/chasingone/TransGuider.
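The latent attention module is said to rely on an "improved entropy quantification method"; the abstract does not give the formula, but plain Shannon entropy over a local intensity histogram illustrates the baseline idea that a patch's information content can be scored numerically (the bin count and names are our assumptions, not TransGuider's method):

```python
import numpy as np

def patch_entropy(patch, bins=8):
    """Shannon entropy (in bits) of the intensity distribution in one patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins: 0*log(0) := 0
    return float(-np.sum(p * np.log2(p)))

# A constant patch carries no information; an even spread is maximal.
flat = np.full((4, 4), 0.5)
assert patch_entropy(flat) == 0.0
varied = np.linspace(0.0, 0.999, 16).reshape(4, 4)
assert abs(patch_entropy(varied) - 3.0) < 1e-9   # 16 values even over 8 bins
```

Regions scoring high on such a measure are textured or boundary-rich, which is exactly where a segmentation network benefits from extra attention.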
Affiliation(s)
- Enmin Song
- School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, China
- Bangcheng Zhan
- School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, China
- Hong Liu
- School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, China
37
Hu B, Zhou P, Yu H, Dai Y, Wang M, Tan S, Sun Y. LeaNet: Lightweight U-shaped architecture for high-performance skin cancer image segmentation. Comput Biol Med 2024; 169:107919. PMID: 38176212. DOI: 10.1016/j.compbiomed.2024.107919.
Abstract
Skin cancer diagnosis often relies on image segmentation as a crucial aid, and high-performance segmentation can lower misdiagnosis risks. Many medical devices have limited computing power for deploying image segmentation algorithms, yet existing high-performance segmentation algorithms primarily rely on computationally intensive large models, making it challenging to meet the lightweight deployment requirements of such devices. State-of-the-art lightweight models cannot capture both local and global feature information of lesion edges due to their model structures, resulting in pixel loss at lesion edges. To tackle this problem, we propose LeaNet, a novel U-shaped network for high-performance yet lightweight skin cancer image segmentation. Specifically, LeaNet employs multiple attention blocks in a lightweight symmetric U-shaped design. Each block contains a dilated efficient channel attention (DECA) module for global and local contour information and an inverted external attention (IEA) module to improve information correlation between data samples. Additionally, LeaNet uses an attention bridge (AB) module to connect the left and right sides of the U-shaped architecture, thereby enhancing the model's multi-level feature extraction capability. We tested our model on the ISIC2017 and ISIC2018 datasets. Compared with large models like ResUNet, LeaNet improved the ACC, SEN, and SPEC metrics by 1.09%, 2.58%, and 1.6%, respectively, while reducing the model's parameter count and computational complexity by 570x and 1182x. Compared with lightweight models like MALUNet, LeaNet achieved improvements of 2.07%, 4.26%, and 3.11% in ACC, SEN, and SPEC, respectively, while reducing the parameter count and computational complexity by 1.54x and 1.04x.
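DECA's internals are not given in the abstract, but it plausibly builds on efficient channel attention (ECA), which gates channels via a 1-D convolution over globally pooled channel descriptors — a design whose parameter count is tiny, consistent with LeaNet's lightweight goal. A generic ECA-style sketch (the kernel weights and names are illustrative, not LeaNet's):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def eca_attention(feat, kernel):
    """ECA-style channel gating on a (C, H, W) feature map.

    1. Global average pool -> one descriptor per channel.
    2. 1-D conv across neighbouring channels (local cross-channel interaction).
    3. Sigmoid gate rescales each channel of the input.
    """
    c = feat.shape[0]
    desc = feat.mean(axis=(1, 2))                  # (C,) pooled descriptors
    k = len(kernel)
    pad = k // 2
    padded = np.pad(desc, pad)
    conv = np.array([np.dot(padded[i:i + k], kernel) for i in range(c)])
    gate = sigmoid(conv)                           # (C,) values in (0, 1)
    return feat * gate[:, None, None]

feat = np.random.rand(4, 8, 8)
out = eca_attention(feat, np.array([0.2, 0.6, 0.2]))
assert out.shape == feat.shape
assert np.all(out <= feat)   # gates in (0,1) can only shrink non-negative input
```

Only the 1-D kernel is learned, so the whole attention block costs a handful of parameters regardless of channel count.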
Affiliation(s)
- Binbin Hu
- College of Electronic and Information, Southwest Minzu University, Chengdu, 610225, China; Key Laboratory of Electronic Information Engineering, Southwest Minzu University, Chengdu, 610225, China
- Pan Zhou
- College of Electronic and Information, Southwest Minzu University, Chengdu, 610225, China; Key Laboratory of Electronic Information Engineering, Southwest Minzu University, Chengdu, 610225, China
- Hongfang Yu
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Yueyue Dai
- School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, 430074, China
- Ming Wang
- Department of Chemistry, National University of Singapore, Singapore, 117543, Singapore
- Shengbo Tan
- College of Electronic and Information, Southwest Minzu University, Chengdu, 610225, China; Key Laboratory of Electronic Information Engineering, Southwest Minzu University, Chengdu, 610225, China
- Ying Sun
- College of Electronic and Information, Southwest Minzu University, Chengdu, 610225, China; Key Laboratory of Electronic Information Engineering, Southwest Minzu University, Chengdu, 610225, China
38
Dai W, Liu R, Wu T, Wang M, Yin J, Liu J. Deeply Supervised Skin Lesions Diagnosis With Stage and Branch Attention. IEEE J Biomed Health Inform 2024; 28:719-729. PMID: 37624725. DOI: 10.1109/jbhi.2023.3308697.
Abstract
Accurate and unbiased examination of skin lesions is critical for the early diagnosis and treatment of skin diseases. Visual features of skin lesions vary significantly because the images are collected from patients with different lesion colours and morphologies using dissimilar imaging equipment. Recent studies have reported that ensembled convolutional neural networks (CNNs) are effective for classifying such images for early diagnosis of skin disorders. However, the practical use of these ensembled CNNs is limited, as they are heavyweight and inadequate for processing contextual information. Although lightweight networks (e.g., MobileNetV3 and EfficientNet) achieve parameter reduction for implementing deep neural networks on mobile devices, their insufficient depth of feature representation restricts performance. To address these limitations, we develop a new lightweight and effective neural network, namely HierAttn. HierAttn applies a novel deep supervision strategy to learn local and global features using multi-stage and multi-branch attention mechanisms with only one training loss. The efficacy of HierAttn was evaluated using the dermoscopy image dataset ISIC2019 and the smartphone photo dataset PAD-UFES-20 (PAD2020). The experimental results show that HierAttn achieves the best accuracy and area under the curve (AUC) among state-of-the-art lightweight networks.
39
Zhang N, Yu L, Zhang D, Wu W, Tian S, Kang X, Li M. CT-Net: Asymmetric compound branch Transformer for medical image segmentation. Neural Netw 2024; 170:298-311. PMID: 38006733. DOI: 10.1016/j.neunet.2023.11.034.
Abstract
The Transformer architecture has been widely applied in image segmentation due to its powerful ability to capture long-range dependencies. However, its ability to capture local features is relatively weak, and it requires a large amount of training data. Medical image segmentation tasks, on the other hand, demand strong local features and often involve small datasets, so existing Transformer networks show a significant decrease in performance when applied directly to this task. To address these issues, we have designed a new medical image segmentation architecture called CT-Net. It effectively extracts local and global representations using an asymmetric asynchronous branch parallel structure, while reducing unnecessary computational cost. In addition, we propose a high-density information fusion strategy that efficiently fuses the features of the two branches using a fusion module of only 0.05M parameters. This strategy ensures high portability and provides the conditions for directly applying transfer learning to solve dataset dependency issues. Finally, we have designed a parameter-adjustable multi-perceptive loss function for this architecture to optimize the training process from both pixel-level and global perspectives. We have tested this network on 5 different tasks with 9 datasets; compared to SwinUNet, CT-Net improves the IoU by 7.3% and 1.8% on the GlaS and MoNuSeg datasets, respectively, and improves the average DSC on the Synapse dataset by 3.5%.
Affiliation(s)
- Ning Zhang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Long Yu
- College of Information Science and Engineering, Xinjiang University, Urumqi 830000, China; College of Network Center, Xinjiang University, Urumqi 830000, China
- Dezhi Zhang
- People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang University, China
- Weidong Wu
- People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang University, China
- Shengwei Tian
- College of Software, Xinjiang University, Urumqi 830000, China
- Xiaojing Kang
- People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang University, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
40
Huang HY, Nguyen HT, Lin TL, Saenprasarn P, Liu PH, Wang HC. Identification of Skin Lesions by Snapshot Hyperspectral Imaging. Cancers (Basel) 2024; 16:217. PMID: 38201644. PMCID: PMC10778186. DOI: 10.3390/cancers16010217.
Abstract
This study pioneers the application of artificial intelligence (AI) and hyperspectral imaging (HSI) in the diagnosis of skin cancer lesions, particularly focusing on Mycosis fungoides (MF) and its differentiation from psoriasis (PsO) and atopic dermatitis (AD). By utilizing a comprehensive dataset of 1659 skin images, including cases of MF, PsO, AD, and normal skin, a novel multi-frame AI algorithm was used for computer-aided diagnosis. The automatic segmentation and classification of skin lesions were further explored using advanced techniques, such as U-Net Attention models and XGBoost algorithms, transforming images from the color space to the spectral domain. The potential of AI and HSI in dermatological diagnostics was underscored, offering a noninvasive, efficient, and accurate alternative to traditional methods. The findings are particularly crucial for early-stage invasive lesion detection in MF, showcasing the model's robust performance in segmenting and classifying lesions and its superior predictive accuracy validated through k-fold cross-validation. The model attained its optimal performance with a k-fold cross-validation value of 7, achieving a sensitivity of 90.72%, a specificity of 96.76%, an F1-score of 90.08%, and an ROC-AUC of 0.9351. This study marks a substantial advancement in dermatological diagnostics, thereby contributing significantly to the early and precise identification of skin malignancies and inflammatory conditions.
Affiliation(s)
- Hung-Yi Huang
- Department of Dermatology, Ditmanson Medical Foundation Chiayi Christian Hospital, Chia Yi City 60002, Taiwan
- Hong-Thai Nguyen
- Department of Mechanical Engineering, National Chung Cheng University, 168, University Rd., Min Hsiung, Chia Yi City 62102, Taiwan
- Teng-Li Lin
- Department of Dermatology, Dalin Tzu Chi General Hospital, No. 2, Min-Sheng Rd., Dalin Town, Chia Yi City 62247, Taiwan
- Penchun Saenprasarn
- School of Nursing, Shinawatra University, 99 Moo 10, Bangtoey, Samkhok, Pathum Thani 12160, Thailand
- Ping-Hung Liu
- Division of General Surgery, Department of Surgery, Kaohsiung Armed Forces General Hospital, 2, Zhongzheng 1st Rd., Lingya District, Kaohsiung City 80284, Taiwan
- Hsiang-Chen Wang
- Department of Mechanical Engineering, National Chung Cheng University, 168, University Rd., Min Hsiung, Chia Yi City 62102, Taiwan
- Director of Technology Development, Hitspectra Intelligent Technology Co., Ltd., Kaohsiung City 80661, Taiwan
41
Li Y, Yan B, Hou J, Bai B, Huang X, Xu C, Fang L. UNet based on dynamic convolution decomposition and triplet attention. Sci Rep 2024; 14:271. PMID: 38168684. PMCID: PMC10761743. DOI: 10.1038/s41598-023-50989-2.
Abstract
The robustness and generalization of medical image segmentation models are challenged by the differences between disease types, image types, and cases. Deep learning based semantic segmentation methods have provided state-of-the-art performance in the last few years. One deep learning architecture, U-Net, has become the most popular in medical image segmentation. Despite outstanding overall performance in segmenting medical images, it still suffers from limited feature expression ability and inaccurate segmentation. To this end, we propose DTA-UNet, based on Dynamic Convolution Decomposition (DCD) and Triplet Attention (TA). Firstly, the model, with Attention U-Net as the baseline network, uses DCD to replace all conventional convolutions in the encoding-decoding process to enhance its feature extraction capability. Secondly, we combine TA with the Attention Gate (AG) in the skip connections to highlight lesion regions by removing redundant information in both the spatial and channel dimensions. The proposed model is tested on two public datasets and an actual clinical dataset: the COVID-SemiSeg dataset, the ISIC 2018 dataset, and a cooperative hospital's stroke segmentation dataset. Ablation experiments on the clinical stroke segmentation dataset show the effectiveness of DCD and TA, with only a 0.7628M increase in the number of parameters compared to the baseline model. The proposed DTA-UNet is further evaluated on the three datasets of different image types to verify its universality. Extensive experimental results show superior performance on different segmentation metrics compared to eight state-of-the-art methods. The GitHub URL of our code is https://github.com/shuaihou1234/DTA-UNet.
Affiliation(s)
- Yang Li
- Academy for Advanced Interdisciplinary Studies, Northeast Normal University, Changchun, 130024, Jilin, China
- Shanghai Zhangjiang Institute of Mathematics, Shanghai, 201203, China
- Bobo Yan
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, 130012, Jilin, China
- Pazhou Lab, Guangzhou, China
- Jianxin Hou
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, 130012, Jilin, China
- Bingyang Bai
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, 130012, Jilin, China
- Xiaoyu Huang
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, 130012, Jilin, China
- Canfei Xu
- The Third Hospital of Jilin University, Changchun, 130117, Jilin, China
- Limei Fang
- Encephalopathy Center, The Third Affiliated Hospital of Changchun University of Chinese Medicine, Changchun, 130117, Jilin, China
42
Zhu W, Tian J, Chen M, Chen L, Chen J. MSS-UNet: A Multi-Spatial-Shift MLP-based UNet for skin lesion segmentation. Comput Biol Med 2024; 168:107719. PMID: 38007976. DOI: 10.1016/j.compbiomed.2023.107719.
Abstract
Multilayer perceptron (MLP) networks have become a popular alternative to convolutional neural networks and transformers because they use fewer parameters. However, existing MLP-based models improve performance by increasing model depth, which adds computational complexity when processing local image features. To meet this challenge, we propose MSS-UNet, a lightweight convolutional neural network (CNN) and MLP model for the automated segmentation of skin lesions from dermoscopic images. Specifically, MSS-UNet first uses a convolutional module to extract local information, which is essential for precisely segmenting the skin lesion. We propose an efficient double-spatial-shift MLP module, named DSS-MLP, which enhances the vanilla MLP by enabling communication between different spatial locations through double spatial shifts. We also propose a module named MSSEA, with multiple spatial shifts of different strides and lighter external attention, to enlarge the local receptive field and capture the boundary continuity of skin lesions. We extensively evaluated MSS-UNet on the ISIC 2017, ISIC 2018, and PH2 skin lesion datasets. On the three datasets, the method achieves IoU metrics of 85.01%±0.65, 83.65%±1.05, and 92.71%±1.03, with a parameter size and computational complexity of 0.33M and 15.98G, respectively, outperforming most state-of-the-art methods. The code is publicly available at https://github.com/AirZWH/MSS-UNet.
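The spatial-shift idea behind DSS-MLP can be illustrated by the basic single-shift operation: split the channels into groups and displace each group one pixel along a different axis, so that a following per-pixel channel MLP mixes information from neighbouring pixels. A sketch with four groups and stride 1 (MSS-UNet's actual group counts, strides, and the "double" shift are not specified here; names are ours):

```python
import numpy as np

def spatial_shift(x):
    """Shift four channel groups of an (H, W, C) map down/up/right/left by 1.

    After the shift, a plain per-pixel channel MLP sees features from the
    four axial neighbours, giving it a local receptive field for free.
    """
    h, w, c = x.shape
    g = c // 4
    out = np.zeros_like(x)                       # borders are zero-padded
    out[1:, :, :g]      = x[:-1, :, :g]          # group 0: shift down
    out[:-1, :, g:2*g]  = x[1:, :, g:2*g]        # group 1: shift up
    out[:, 1:, 2*g:3*g] = x[:, :-1, 2*g:3*g]     # group 2: shift right
    out[:, :-1, 3*g:]   = x[:, 1:, 3*g:]         # group 3: shift left
    return out

x = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)
y = spatial_shift(x)
assert y[2, 1, 0] == x[1, 1, 0]   # group 0 moved down one row
assert y[2, 1, 3] == x[2, 2, 3]   # group 3 pulled from the right neighbour
```

The shift itself is parameter-free, which is how spatial-shift MLPs keep parameter counts as low as the 0.33M reported above.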
Affiliation(s)
- Wenhao Zhu
- Computer School, University of South China, Hengyang, China
- Jiya Tian
- School of Information Engineering, Xinjiang Institute of Technology, Aksu, China
- Mingzhi Chen
- School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Lingna Chen
- Computer School, University of South China, Hengyang, China
- Junxi Chen
- Affiliated Nanhua Hospital, University of South China, Hengyang, China
43
Wang Y, Yu X, Yang Y, Zhang X, Zhang Y, Zhang L, Feng R, Xue J. A multi-branched semantic segmentation network based on twisted information sharing pattern for medical images. Comput Methods Programs Biomed 2024; 243:107914. PMID: 37992569. DOI: 10.1016/j.cmpb.2023.107914.
Abstract
BACKGROUND Semantic segmentation plays an indispensable role in clinical diagnosis support, intelligent surgical assistance, personalized treatment planning, and drug development, making it a core area of research in smart healthcare. However, the main challenge in medical image semantic segmentation lies in the accuracy bottleneck, primarily due to the low interactivity of feature information and the lack of deep exploration of local features during feature fusion. METHODS To address this issue, a novel approach called the Twisted Information-sharing Pattern for Multi-branched Network (TP-MNet) is proposed. This architecture transfers features mutually among neighboring branches at the next level, breaking the barrier of semantic isolation and achieving semantic fusion. Additionally, performing secondary feature mining during the transfer process effectively enhances detection accuracy. Building upon the twisted-pattern transmission in the encoding and decoding stages, enhanced and refined feature fusion modules have been developed; these capture key lesion features by acquiring semantic information from a broader context. RESULTS TP-MNet was extensively and objectively validated on 5 medical datasets and compared with 21 other semantic segmentation models using 7 metrics. Metric analysis, image comparisons, process examination, and ablation tests convincingly demonstrated the superiority of TP-MNet. Further investigations explored the limitations of TP-MNet, clarifying the practical utility of the Twisted Information-sharing Pattern. CONCLUSIONS TP-MNet adopts the Twisted Information-sharing Pattern, leading to a substantial improvement in semantic fusion and directly contributing to enhanced segmentation performance on medical images.
Additionally, this semantic broadcasting mode not only underscores the importance of semantic fusion but also highlights a pivotal direction for the advancement of multi-branched architectures.
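As a rough illustration only (the abstract does not specify the transfer rule, so the mixing scheme below is a hypothetical simplification of ours), a "twisted" sharing step in which each branch receives part of its neighbors' features at the next level might look like:

```python
import numpy as np

def twisted_share(branches, mix=0.5):
    """One hypothetical twisted information-sharing step.

    branches: list of same-shaped feature maps, one per branch. Each
    branch keeps (1 - mix) of its own features and receives the average
    of its two neighboring branches' features, so no branch stays
    semantically isolated from the others.
    """
    n = len(branches)
    out = []
    for i, b in enumerate(branches):
        left, right = branches[(i - 1) % n], branches[(i + 1) % n]
        out.append((1 - mix) * b + mix * 0.5 * (left + right))
    return out
```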
Affiliation(s)
- Yuefei Wang
- College of Computer Science, Chengdu University, 2025 Chengluo Rd., Chengdu, Sichuan 610106, China
- Xi Yu
- Stirling College, Chengdu University, 2025 Chengluo Rd., Chengdu, Sichuan 610106, China
- Yixi Yang
- Institute of Cancer Biology and Drug Discovery, Chengdu University, 2025 Chengluo Rd., Chengdu, Sichuan 610106, China
- Xiang Zhang
- College of Computer Science, Chengdu University, 2025 Chengluo Rd., Chengdu, Sichuan 610106, China
- Yutong Zhang
- College of Computer Science, Chengdu University, 2025 Chengluo Rd., Chengdu, Sichuan 610106, China
- Li Zhang
- College of Computer Science, Chengdu University, 2025 Chengluo Rd., Chengdu, Sichuan 610106, China
- Ronghui Feng
- Stirling College, Chengdu University, 2025 Chengluo Rd., Chengdu, Sichuan 610106, China
- Jiajing Xue
- Stirling College, Chengdu University, 2025 Chengluo Rd., Chengdu, Sichuan 610106, China

44
Hu H, Zhang J, Yang T, Hu Q, Yu Y, Huang Q. PATrans: Pixel-Adaptive Transformer for edge segmentation of cervical nuclei on small-scale datasets. Comput Biol Med 2024; 168:107823. [PMID: 38061155 DOI: 10.1016/j.compbiomed.2023.107823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2023] [Revised: 11/22/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024]
Abstract
Transformer has shown excellent performance in various visual tasks, making its application in medicine an inevitable trend. Nevertheless, simply applying a transformer to small-scale cervical nuclei datasets results in disastrous performance: scarce nuclei pixels cannot compensate for the absence of the intrinsic inductive biases inherent to CNNs, making it difficult for the transformer to model local visual structures and handle scale variations. Thus, we propose a Pixel-Adaptive Transformer (PATrans) to improve the segmentation of nuclei edges on small datasets through adaptive pixel tuning. Specifically, to mitigate the information loss that results from mapping different patches into similar latent representations, the Consecutive Pixel Patch (CPP) embeds rich multi-scale context into isolated image patches. In this way, it provides intrinsic scale invariance for 1D input sequences to maintain semantic consistency, allowing PATrans to establish long-range dependencies quickly. Furthermore, because existing handcrafted attention is agnostic to widely varying pixel distributions, the Pixel Adaptive Transformer Block (PATB) models the relationships between different pixels across the entire feature map in a data-dependent manner, guided by the important regions. By collaboratively learning local features and global dependencies, PATrans can adaptively reduce the interference of irrelevant pixels. Extensive experiments demonstrate the superiority of our model on three datasets (ours, ISBI, and Herlev).
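The idea of embedding multi-scale context into each patch token can be sketched in NumPy. The grid stride, the two window sizes, and the zero padding below are illustrative choices of ours, not the paper's CPP specification:

```python
import numpy as np

def multiscale_patch_embed(img, stride=4, sizes=(4, 8)):
    """Embed each patch together with larger co-centered context windows.

    img: (H, W). For every stride x stride grid location we concatenate
    flattened crops at several scales (zero-padded at the borders), so
    each 1-D token carries multi-scale context instead of the content
    of an isolated patch alone.
    """
    h, w = img.shape
    pad = max(sizes)
    padded = np.pad(img, pad)  # zero padding so large windows stay in bounds
    tokens = []
    for i in range(0, h, stride):
        for j in range(0, w, stride):
            ci, cj = i + pad + stride // 2, j + pad + stride // 2
            feats = []
            for s in sizes:
                crop = padded[ci - s // 2: ci + (s + 1) // 2,
                              cj - s // 2: cj + (s + 1) // 2]
                feats.append(crop.ravel())
            tokens.append(np.concatenate(feats))
    return np.stack(tokens)  # (num_patches, sum of s*s over sizes)
```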
Affiliation(s)
- Hexuan Hu
- Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, 211100, PR China; College of Computer and Information, Hohai University, Nanjing, 211100, PR China
- Jianyu Zhang
- Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, 211100, PR China; College of Computer and Information, Hohai University, Nanjing, 211100, PR China
- Tianjin Yang
- Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, 211100, PR China; College of Computer and Information, Hohai University, Nanjing, 211100, PR China
- Qiang Hu
- Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, 211100, PR China; College of Computer and Information, Hohai University, Nanjing, 211100, PR China
- Yufeng Yu
- Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, 211100, PR China; College of Computer and Information, Hohai University, Nanjing, 211100, PR China
- Qian Huang
- Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, 211100, PR China; College of Computer and Information, Hohai University, Nanjing, 211100, PR China

45
Shao J, Luan S, Ding Y, Xue X, Zhu B, Wei W. Attention Connect Network for Liver Tumor Segmentation from CT and MRI Images. Technol Cancer Res Treat 2024; 23:15330338231219366. [PMID: 38179668 PMCID: PMC10771068 DOI: 10.1177/15330338231219366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2023] [Revised: 10/18/2023] [Accepted: 11/21/2023] [Indexed: 01/06/2024] Open
Abstract
Introduction: Currently, the incidence of liver cancer is on the rise annually. Precise identification of liver tumors is crucial for clinicians to strategize the treatment and combat liver cancer. Thus far, liver tumor contours have been derived through labor-intensive and subjective manual labeling. Computers have gained widespread application in the realm of liver tumor segmentation. Nonetheless, liver tumor segmentation remains a formidable challenge owing to the diverse range of volumes, shapes, and image intensities encountered. Methods: In this article, we introduce an innovative solution called the attention connect network (AC-Net) designed for automated liver tumor segmentation. Building upon the U-shaped network architecture, our approach incorporates 2 critical attention modules: the axial attention module (AAM) and the vision transformer module (VTM), which replace conventional skip-connections to seamlessly integrate spatial features. The AAM facilitates feature fusion by computing axial attention across feature maps, while the VTM operates on the lowest resolution feature maps, employing multihead self-attention, and reshaping the output into a feature map for subsequent concatenation. Furthermore, we employ a specialized loss function tailored to our approach. Our methodology begins with pretraining AC-Net using the LiTS2017 dataset and subsequently fine-tunes it using computed tomography (CT) and magnetic resonance imaging (MRI) data sourced from Hubei Cancer Hospital. Results: The performance metrics for AC-Net on CT data are as follows: dice similarity coefficient (DSC) of 0.90, Jaccard coefficient (JC) of 0.82, recall of 0.92, average symmetric surface distance (ASSD) of 4.59, Hausdorff distance (HD) of 11.96, and precision of 0.89. For AC-Net on MRI data, the metrics are DSC of 0.80, JC of 0.70, recall of 0.82, ASSD of 7.58, HD of 30.26, and precision of 0.84. 
Conclusion: The comparative experiments highlight that AC-Net exhibits exceptional tumor recognition accuracy when tested on the Hubei Cancer Hospital dataset, demonstrating highly competitive performance for practical clinical applications. Furthermore, the ablation experiments provide conclusive evidence of the efficacy of each module proposed in this article. For those interested, the code for this research article can be accessed at the following GitHub repository: https://github.com/killian-zero/py_tumor-segmentation.git.
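The axial attention used in the AAM is a standard construction: full self-attention is run along one spatial axis at a time. A single-head NumPy sketch (the projection matrices and the summation of the two axes are our simplifications):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, wq, wk, wv):
    """Self-attention restricted to one spatial axis at a time.

    x: (H, W, C). Height-axis attention treats each column as an
    independent sequence of length H; width-axis attention does the
    same for each row. Running both costs O(H*W*(H+W)) instead of the
    O((H*W)^2) of full 2-D attention.
    """
    def attend(seq):  # seq: (N, L, C), a batch of 1-D sequences
        q, k, v = seq @ wq, seq @ wk, seq @ wv
        a = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))
        return a @ v

    along_h = attend(x.transpose(1, 0, 2)).transpose(1, 0, 2)  # columns
    along_w = attend(x)                                        # rows
    return along_h + along_w
```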
Affiliation(s)
- Jiakang Shao
- School of Integrated Circuits, Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Shunyao Luan
- School of Integrated Circuits, Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Yi Ding
- Department of Radiation Oncology, Hubei Cancer Hospital, TongJi Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xudong Xue
- Department of Radiation Oncology, Hubei Cancer Hospital, TongJi Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Benpeng Zhu
- School of Integrated Circuits, Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Wei Wei
- Department of Radiation Oncology, Hubei Cancer Hospital, TongJi Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China

46
Jiang Z, Wu Y, Huang L, Gu M. FDB-Net: Fusion double branch network combining CNN and transformer for medical image segmentation. JOURNAL OF X-RAY SCIENCE AND TECHNOLOGY 2024; 32:931-951. [PMID: 38848160 DOI: 10.3233/xst-230413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2024]
Abstract
BACKGROUND The rapid development of deep learning techniques has greatly improved the performance of medical image segmentation, and segmentation networks based on convolutional neural networks and Transformers have been widely used in this field. However, due to the restricted receptive field of the convolution operation and the limited ability of the Transformer self-attention mechanism to extract fine local information, neural networks with a purely convolutional or purely Transformer backbone still perform poorly in medical image segmentation. METHODS In this paper, we propose FDB-Net (Fusion Double Branch Network), a double-branch medical image segmentation network combining CNN and Transformer. Using a CNN containing gnConv blocks and a Transformer containing Varied-Size Window Attention (VWA) blocks as the feature extraction backbones, the dual-path encoder gives the network a global receptive field as well as access to local detail features of the target. We also propose a new feature fusion module (Deep Feature Fusion, DFF), which fuses features from the two structurally different encoders during encoding, ensuring the effective fusion of global and local image information. CONCLUSION Our model achieves advanced results on all three typical medical image segmentation tasks, fully validating the effectiveness of FDB-Net.
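One common way to merge a CNN branch with a Transformer branch, as a hedged sketch only (the gating scheme below is a generic choice of ours, not the paper's DFF), is a learned per-position gate over the two feature streams:

```python
import numpy as np

def deep_feature_fusion(f_cnn, f_trans, w_gate):
    """Hypothetical gated fusion of two encoder branches.

    f_cnn, f_trans: (H, W, C) feature maps from the CNN and Transformer
    paths. A sigmoid gate computed from the concatenated features weighs
    the local (CNN) against the global (Transformer) contribution per
    position and channel. w_gate: (2C, C) projection.
    """
    z = np.concatenate([f_cnn, f_trans], axis=-1) @ w_gate
    g = 1.0 / (1.0 + np.exp(-z))  # gate in (0, 1)
    return g * f_cnn + (1.0 - g) * f_trans
```

With zero gate weights the fusion degenerates to the plain average of the two branches, which makes the behavior easy to sanity-check.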
Affiliation(s)
- Zhongchuan Jiang
- State Key Laboratory of Public Big Data, Guiyang, China
- College of Computer Science and Technology, Guizhou University, Guiyang, China
- Yun Wu
- State Key Laboratory of Public Big Data, Guiyang, China
- College of Computer Science and Technology, Guizhou University, Guiyang, China
- Lei Huang
- State Key Laboratory of Public Big Data, Guiyang, China
- College of Computer Science and Technology, Guizhou University, Guiyang, China
- Maohua Gu
- State Key Laboratory of Public Big Data, Guiyang, China
- College of Computer Science and Technology, Guizhou University, Guiyang, China

47
Chen L, Li J, Zou Y, Wang T. ETU-Net: edge enhancement-guided U-Net with transformer for skin lesion segmentation. Phys Med Biol 2023; 69:015001. [PMID: 38131313 DOI: 10.1088/1361-6560/ad13d2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2023] [Accepted: 12/08/2023] [Indexed: 12/23/2023]
Abstract
Objective. Convolutional neural network (CNN)-based deep learning algorithms have been widely used in recent years for automatic skin lesion segmentation. However, the limited receptive fields of convolutional architectures hinder their ability to model dependencies between different image ranges. The transformer, which excels at capturing long-range dependencies, is often employed in conjunction with a CNN to extract both global and local information from images; however, this approach cannot accurately segment skin lesions with blurred boundaries. To overcome this difficulty, we propose ETU-Net. Approach. ETU-Net, a novel multi-scale architecture, combines edge enhancement, CNN, and transformer. We introduce the concept of edge detection operators into difference convolution, resulting in the edge enhanced convolution block (EC block) and the local transformer block (LT block), which emphasize edge features. To capture the semantic information contained in local features, we propose the multi-scale local attention block (MLA block), which utilizes convolutions with different kernel sizes. Furthermore, to address the boundary uncertainty caused by patch division in the transformer, we introduce a novel global transformer block (GT block), which allows each patch to gather full-size feature information. Main results. Extensive experimental results on three publicly available skin datasets (PH2, ISIC-2017, and ISIC-2018) demonstrate that ETU-Net outperforms state-of-the-art hybrid CNN-Transformer methods in segmentation performance. Moreover, ETU-Net exhibits excellent generalization in practical segmentation applications on dermatoscopy images contributed by the Wuxi No.2 People's Hospital. Significance. We propose ETU-Net, a novel multi-scale U-Net model guided by edge enhancement, which addresses the challenges posed by complex lesion shapes and ambiguous boundaries in skin lesion segmentation.
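The paper folds edge-detection operators into difference convolution; as a rough NumPy approximation of the idea (the additive Sobel blending and the `alpha` weight are hypothetical simplifications of ours, not the EC block itself), a convolution response can be boosted by fixed edge-operator responses:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D cross-correlation on a single-channel image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def edge_enhanced_conv(img, kernel, alpha=1.0):
    """Blend a (learned) 3x3 kernel response with fixed Sobel edge
    responses, so lesion boundaries are emphasized in the output."""
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    sobel_y = sobel_x.T
    edge = np.abs(conv2d(img, sobel_x)) + np.abs(conv2d(img, sobel_y))
    return conv2d(img, kernel) + alpha * edge
```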
Affiliation(s)
- Lifang Chen
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, People's Republic of China
- Jiawei Li
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, People's Republic of China
- Yunmin Zou
- Department of Dermatology, Wuxi No.2 People's Hospital, Wuxi, People's Republic of China
- Tao Wang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, People's Republic of China

48
Gao C, Cheng J, Yang Z, Chen Y, Zhu M. SCA-Former: transformer-like network based on stream-cross attention for medical image segmentation. Phys Med Biol 2023; 68:245008. [PMID: 37802056 DOI: 10.1088/1361-6560/ad00fe] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2023] [Accepted: 10/06/2023] [Indexed: 10/08/2023]
Abstract
Objective. Deep convolutional neural networks (CNNs) have been widely applied in medical image analysis and have achieved satisfactory performance. While most CNN-based methods exhibit strong feature representation capabilities, their limited receptive fields make it difficult to encode long-range interaction information. The Transformer has been proposed to alleviate this issue, but at the cost of a greatly enlarged model size, which may inhibit its adoption. Approach. To account for strong long-range interaction modeling and small model size simultaneously, we propose a Transformer-like, block-based U-shaped network for medical image segmentation, dubbed SCA-Former. We further propose a novel stream-cross attention (SCA) module that drives the network to balance local and global representations by extracting multi-scale, interactive features along the spatial and channel dimensions. SCA effectively extracts channel, multi-scale spatial, and long-range information for a more comprehensive feature representation. Main results. Experimental results demonstrate that SCA-Former outperforms current state-of-the-art (SOTA) methods on three public datasets: GLAS, ISIC 2017, and LUNG. Significance. This work presents a promising method for enhancing the feature representation of convolutional neural networks and improving segmentation performance.
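A loose sketch of attention along both the channel and spatial dimensions (our own simplification in the squeeze-and-excitation style, not the paper's SCA module): a channel stream and a spatial stream each rescale the input, and the two re-weighted streams are summed:

```python
import numpy as np

def stream_cross_attention(x):
    """Two-stream attention sketch over a (H, W, C) feature map.

    The channel stream pools over space and produces softmax channel
    weights; the spatial stream pools over channels and produces a
    sigmoid spatial map. Each stream rescales x, and the results are
    summed into one re-weighted feature map.
    """
    ch = x.mean(axis=(0, 1))                  # (C,) channel descriptor
    ch_att = np.exp(ch) / np.exp(ch).sum()    # softmax channel weights
    sp = x.mean(axis=2)                       # (H, W) spatial descriptor
    sp_att = 1.0 / (1.0 + np.exp(-sp))        # sigmoid spatial weights
    return x * ch_att[None, None, :] + x * sp_att[:, :, None]
```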
Affiliation(s)
- Chengrui Gao
- School of Computer Science, Sichuan University, Chengdu, People's Republic of China
- Vision Computing Lab, Sichuan University, Chengdu, People's Republic of China
- Junlong Cheng
- School of Computer Science, Sichuan University, Chengdu, People's Republic of China
- Vision Computing Lab, Sichuan University, Chengdu, People's Republic of China
- Ziyuan Yang
- School of Computer Science, Sichuan University, Chengdu, People's Republic of China
- Yingyu Chen
- School of Computer Science, Sichuan University, Chengdu, People's Republic of China
- Min Zhu
- School of Computer Science, Sichuan University, Chengdu, People's Republic of China
- Vision Computing Lab, Sichuan University, Chengdu, People's Republic of China

49
Wang Z, Lyu J, Tang X. autoSMIM: Automatic Superpixel-Based Masked Image Modeling for Skin Lesion Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:3501-3511. [PMID: 37379178 DOI: 10.1109/tmi.2023.3290700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/30/2023]
Abstract
Skin lesion segmentation from dermoscopic images plays a vital role in early diagnoses and prognoses of various skin diseases. However, it is a challenging task due to the large variability of skin lesions and their blurry boundaries. Moreover, most existing skin lesion datasets are designed for disease classification, with relatively fewer segmentation labels having been provided. To address these issues, we propose a novel automatic superpixel-based masked image modeling method, named autoSMIM, in a self-supervised setting for skin lesion segmentation. It explores implicit image features from abundant unlabeled dermoscopic images. autoSMIM begins with restoring an input image with randomly masked superpixels. The policy of generating and masking superpixels is then updated via a novel proxy task through Bayesian Optimization. The optimal policy is subsequently used for training a new masked image modeling model. Finally, we finetune such a model on the downstream skin lesion segmentation task. Extensive experiments are conducted on three skin lesion segmentation datasets, including ISIC 2016, ISIC 2017, and ISIC 2018. Ablation studies demonstrate the effectiveness of superpixel-based masked image modeling and establish the adaptability of autoSMIM. Comparisons with state-of-the-art methods show the superiority of our proposed autoSMIM. The source code is available at https://github.com/Wzhjerry/autoSMIM.
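The core pretext operation, masking whole superpixels rather than square patches, is easy to sketch. In autoSMIM the superpixel generation and masking policy are tuned via Bayesian optimization; here any integer segmentation map works, and the zero fill and the `mask_ratio` default are illustrative assumptions:

```python
import numpy as np

def mask_superpixels(img, labels, mask_ratio=0.3, rng=None):
    """Mask a random subset of superpixels for masked image modeling.

    img: (H, W) image; labels: (H, W) integer superpixel map. A random
    mask_ratio fraction of the superpixel ids is chosen, and every
    pixel belonging to a chosen superpixel is zeroed. A restoration
    model would then be trained to reconstruct the masked regions.
    """
    rng = rng or np.random.default_rng(0)
    ids = np.unique(labels)
    n_mask = max(1, int(len(ids) * mask_ratio))
    masked_ids = rng.choice(ids, size=n_mask, replace=False)
    mask = np.isin(labels, masked_ids)
    out = img.copy()
    out[mask] = 0.0
    return out, mask
```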
50
Jiang X, Zhu Y, Liu Y, Wang N, Yi L. MC-DC: An MLP-CNN Based Dual-path Complementary Network for Medical Image Segmentation. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 242:107846. [PMID: 37806121 DOI: 10.1016/j.cmpb.2023.107846] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Revised: 10/03/2023] [Accepted: 10/04/2023] [Indexed: 10/10/2023]
Abstract
BACKGROUND Fusing the CNN and Transformer in the encoder has recently achieved outstanding performance in medical image segmentation. However, two obvious limitations require addressing: (1) The utilization of Transformer leads to heavy parameters, and its intricate structure demands ample data and resources for training, and (2) most previous research had predominantly focused on enhancing the performance of the feature encoder, with little emphasis placed on the design of the feature decoder. METHODS To this end, we propose a novel MLP-CNN based dual-path complementary (MC-DC) network for medical image segmentation, which replaces the complex Transformer with a cost-effective Multi-Layer Perceptron (MLP). Specifically, a dual-path complementary (DPC) module is designed to effectively fuse multi-level features from MLP and CNN. To respectively reconstruct global and local information, the dual-path decoder is proposed which is mainly composed of cross-scale global feature fusion (CS-GF) module and cross-scale local feature fusion (CS-LF) module. Moreover, we leverage a simple and efficient segmentation mask feature fusion (SMFF) module to merge the segmentation outcomes generated by the dual-path decoder. RESULTS Comprehensive experiments were performed on three typical medical image segmentation tasks. For skin lesions segmentation, our MC-DC network achieved 91.69% Dice and 9.52mm ASSD on the ISIC2018 dataset. In addition, the 91.6% Dice and 94.4% Dice were respectively obtained on the Kvasir-SEG dataset and CVC-ClinicDB dataset for polyp segmentation. Moreover, we also conducted experiments on the private COVID-DS36 dataset for lung lesion segmentation. Our MC-DC has achieved 87.6% [87.1%, 88.1%], and 92.3% [91.8%, 92.7%] on ground-glass opacity, interstitial infiltration, and lung consolidation, respectively. 
CONCLUSIONS The experimental results indicate that the proposed MC-DC network exhibits exceptional generalization capability and surpasses other state-of-the-art methods, delivering higher accuracy at lower computational complexity.
Affiliation(s)
- Xiaoben Jiang
- School of Information Science and Technology, East China University of Science and Technology, Shanghai, 200237, China
- Yu Zhu
- School of Information Science and Technology, East China University of Science and Technology, Shanghai, 200237, China
- Yatong Liu
- School of Information Science and Technology, East China University of Science and Technology, Shanghai, 200237, China
- Nan Wang
- School of Information Science and Technology, East China University of Science and Technology, Shanghai, 200237, China
- Lei Yi
- Department of Burn, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China