1. Ovi TB, Bashree N, Nyeem H, Wahed MA. FocusU2Net: Pioneering dual attention with gated U-Net for colonoscopic polyp segmentation. Comput Biol Med 2025; 186:109617. PMID: 39793349. DOI: 10.1016/j.compbiomed.2024.109617.
Abstract
The detection and excision of colorectal polyps, precursors to colorectal cancer (CRC), can improve survival rates by up to 90%. Automated polyp segmentation in colonoscopy images expedites diagnosis and aids in the precise identification of adenomatous polyps, thus mitigating the burden of manual image analysis. This study introduces FocusU2Net, an innovative bi-level nested U-structure integrated with a dual-attention mechanism. The model integrates Focus Gate (FG) modules for spatial and channel-wise attention and Residual U-blocks (RSU) with multi-scale receptive fields for capturing diverse contextual information. Comprehensive evaluations on five benchmark datasets (Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS-Larib, and EndoScene) demonstrate Dice score improvements of 3.14% to 43.59% over state-of-the-art models, with an 85% success rate in cross-dataset validations, significantly surpassing prior competing models with sub-5% success rates. The model combines high segmentation accuracy with computational efficiency, featuring 46.64 million parameters, 78.09 GFLOPs, and 39.02 GMACs, making it suitable for real-time applications. Enhanced with explainable AI techniques, FocusU2Net provides clear insights into its decision-making process, improving interpretability. This combination of high performance, efficiency, and transparency positions FocusU2Net as a powerful, scalable solution for automated polyp segmentation in clinical practice, advancing medical image analysis and computer-aided diagnosis.
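The Dice score reported here measures the overlap between a predicted mask and the ground-truth mask. A minimal sketch of the metric (the function name and example arrays are illustrative, not taken from the paper):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # 2*|A∩B| / (|A| + |B|); eps guards against empty masks.
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Overlap of 2 pixels against mask sizes 3 and 2 gives 2*2/(3+2) = 0.8.
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 0]])
score = dice_score(a, b)
```

A Dice of 1.0 means perfect overlap; disjoint masks score near 0, which is why the metric is a standard benchmark for segmentation quality.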
Affiliation(s)
- Tareque Bashar Ovi, Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka 1216, Bangladesh
- Nomaiya Bashree, Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka 1216, Bangladesh
- Hussain Nyeem, Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka 1216, Bangladesh
- Md Abdul Wahed, Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka 1216, Bangladesh

2. Fu J, Ouyang A, Yang J, Yang D, Ge G, Jin H, He B. SMDFnet: Saliency multiscale dense fusion network for MRI and CT image fusion. Comput Biol Med 2025; 185:109577. PMID: 39709865. DOI: 10.1016/j.compbiomed.2024.109577.
Abstract
MRI-CT image fusion combines magnetic resonance imaging (MRI) and computed tomography (CT) to provide more comprehensive and accurate image information, and can play an important role in medical diagnosis and surgical planning. However, current MRI-CT fusion faces several issues: artifacts present in both MRI and CT images may degrade the quality and accuracy of the fused result, and existing fusion strategies are complex and prone to losing a large amount of information, so further research and improvement are needed. This article proposes a saliency multi-scale dense fusion network for MRI and CT image fusion to address these issues. The proposed method first uses a pretrained network to extract deep features from MRI and CT images, which helps overcome the noise and artifacts introduced by training feature extractors directly and enhances the saliency information in the original images. A multi-scale dense network then further enhances the extracted pretrained features and performs the fusion, and multiple loss functions are used to optimize the network and improve fusion quality. Experimental results show that the proposed method outperforms the reference methods on objective indicators while retaining more salient information.
Affiliation(s)
- Jun Fu, School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563006, China
- Aijia Ouyang, School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563006, China
- Jie Yang, School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563006, China
- Daoping Yang, School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563006, China
- Gengyu Ge, School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563006, China
- Hongxu Jin, School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563006, China
- Baiqing He, Gandong University, Fuzhou, Jiangxi 344000, China

3. Du X, Xu X, Chen J, Zhang X, Li L, Liu H, Li S. UM-Net: Rethinking ICGNet for polyp segmentation with uncertainty modeling. Med Image Anal 2025; 99:103347. PMID: 39316997. DOI: 10.1016/j.media.2024.103347.
Abstract
Automatic segmentation of polyps from colonoscopy images plays a critical role in the early diagnosis and treatment of colorectal cancer. Nevertheless, some bottlenecks still exist. In our previous work, we mainly focused on polyps with intra-class inconsistency and low contrast, using ICGNet to address them. Because of differences in equipment and in the specific locations and properties of polyps, the color distribution of the collected images is inconsistent. ICGNet was designed primarily around reverse-contour guide information and local-global context information, ignoring this inconsistent color distribution, which leads to overfitting and makes it difficult for the model to focus only on beneficial image content. In addition, a trustworthy segmentation model should not only produce high-precision results but also provide a measure of uncertainty to accompany its predictions so that physicians can make informed decisions. However, ICGNet only gives the segmentation result and lacks an uncertainty measure. To address these bottlenecks, we extend the original ICGNet into a comprehensive and effective network (UM-Net) with two main contributions whose practical value is demonstrated experimentally. First, we employ a color transfer operation to weaken the relationship between color and polyps, making the model attend more to the shape of the polyps. Second, we provide uncertainty estimates to represent the reliability of the segmentation results and use variance to rectify the uncertainty. Our improved method is evaluated on five polyp datasets and shows competitive results compared with other advanced methods in both learning ability and generalization capability. The source code is available at https://github.com/dxqllp/UM-Net.
Affiliation(s)
- Xiuquan Du, Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China
- Xuebin Xu, School of Computer Science and Technology, Anhui University, Hefei, China
- Jiajia Chen, School of Computer Science and Technology, Anhui University, Hefei, China
- Xuejun Zhang, School of Computer Science and Technology, Anhui University, Hefei, China
- Lei Li, Department of Neurology, Shuyang Affiliated Hospital of Nanjing University of Traditional Chinese Medicine, Suqian, China
- Heng Liu, Department of Gastroenterology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Shuo Li, Department of Biomedical Engineering, Case Western Reserve University, Cleveland, USA

4. Xu Z, Rittscher J, Ali S. SSL-CPCD: Self-Supervised Learning With Composite Pretext-Class Discrimination for Improved Generalisability in Endoscopic Image Analysis. IEEE Trans Med Imaging 2024; 43:4105-4119. PMID: 38857149. DOI: 10.1109/tmi.2024.3411933.
Abstract
Data-driven methods have shown tremendous progress in medical image analysis. In this context, deep learning-based supervised methods are widely popular. However, they require a large amount of training data and generalise poorly to unseen datasets, which hinders clinical translation. Endoscopic imaging data are characterised by large inter- and intra-patient variability, which makes it more challenging for these models to learn representative features for downstream tasks. Thus, despite the publicly available datasets and the datasets that can be generated within hospitals, most supervised models still underperform. While self-supervised learning has addressed this problem to some extent in natural scene data, a considerable performance gap remains in the medical image domain. In this paper, we propose to explore patch-level instance-group discrimination and penalisation of inter-class variation using an additive angular margin within the cosine similarity metric. Our novel approach enables models to learn to cluster similar representations, thereby improving their ability to separate different classes. Our results demonstrate significant improvement on all metrics over state-of-the-art (SOTA) methods on test sets from the same and diverse datasets. We evaluated our approach for classification, detection, and segmentation. SSL-CPCD attains a notable Top-1 accuracy of 79.77% in ulcerative colitis classification, an 88.62% mean average precision (mAP) for detection, and an 82.32% Dice similarity coefficient for polyp segmentation. These represent improvements of over 4%, 2%, and 3%, respectively, compared to the baseline architectures. We demonstrate that our method generalises better than all SOTA methods to unseen datasets, reporting over 7% improvement.
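An additive angular margin within a cosine similarity metric can be sketched as below. This is a generic ArcFace-style formulation under assumed margin and scale values, not the authors' exact SSL-CPCD implementation:

```python
import numpy as np

def additive_angular_margin_logits(embeddings, class_weights, labels,
                                   margin=0.5, scale=30.0):
    """Cosine-similarity logits with an additive angular margin applied to
    each sample's ground-truth class (ArcFace-style)."""
    # Normalise so every dot product is a cosine similarity.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = e @ w.T                                        # shape (N, C)
    theta = np.arccos(np.clip(cos, -1 + 1e-7, 1 - 1e-7))
    one_hot = np.zeros_like(cos, dtype=bool)
    one_hot[np.arange(len(labels)), labels] = True
    # Widen the angle only for the true class, then rescale all logits.
    return scale * np.where(one_hot, np.cos(theta + margin), cos)
```

Because cos(theta + margin) < cos(theta), the true-class logit is deliberately handicapped during training, forcing tighter intra-class clusters and larger inter-class separation.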

5. Lijin P, Ullah M, Vats A, Cheikh FA, Santhosh Kumar G, Nair MS. PolySegNet: improving polyp segmentation through swin transformer and vision transformer fusion. Biomed Eng Lett 2024; 14:1421-1431. PMID: 39465118. PMCID: PMC11502643. DOI: 10.1007/s13534-024-00415-x.
Abstract
Colorectal cancer ranks as the second most prevalent cancer worldwide, with a high mortality rate. Colonoscopy stands as the preferred procedure for diagnosing colorectal cancer. Detecting polyps at an early stage is critical for effective prevention and diagnosis. However, challenges in colonoscopic procedures often lead medical practitioners to seek support from alternative techniques for timely polyp identification. Polyp segmentation emerges as a promising approach to identify polyps in colonoscopy images. In this paper, we propose an advanced method, PolySegNet, that leverages both the Vision Transformer and the Swin Transformer, coupled with a Convolutional Neural Network (CNN) decoder. The fusion of these models facilitates a comprehensive analysis of the various modules in our proposed architecture. To assess the performance of PolySegNet, we evaluate it on three colonoscopy datasets, a combined dataset, and their augmented versions. The experimental results demonstrate that PolySegNet achieves competitive polyp segmentation accuracy and efficacy, with a mean Dice score of 0.92 and a mean Intersection over Union (IoU) of 0.86. These metrics highlight the superior performance of PolySegNet in accurately delineating polyp boundaries compared to existing methods. PolySegNet has shown great promise in accurately and efficiently segmenting polyps in medical images and could be the foundation for a new class of transformer-based segmentation models in medical image analysis.
Affiliation(s)
- P. Lijin, Artificial Intelligence and Computer Vision Lab, Department of Computer Science, Cochin University of Science and Technology, Kochi, Kerala 682022, India
- Mohib Ullah, Norwegian University of Science and Technology, Teknologivegen 22, 2815 Gjøvik, Norway
- Anuja Vats, Norwegian University of Science and Technology, Teknologivegen 22, 2815 Gjøvik, Norway
- Faouzi Alaya Cheikh, Norwegian Colour and Visual Computing Lab, Norwegian University of Science and Technology, Teknologivegen 22, 2815 Gjøvik, Norway
- G. Santhosh Kumar, Artificial Intelligence and Computer Vision Lab, Department of Computer Science, Cochin University of Science and Technology, Kochi, Kerala 682022, India
- Madhu S. Nair, Artificial Intelligence and Computer Vision Lab, Department of Computer Science, Cochin University of Science and Technology, Kochi, Kerala 682022, India

6. Wei X, Sun J, Su P, Wan H, Ning Z. BCL-Former: Localized Transformer Fusion with Balanced Constraint for polyp image segmentation. Comput Biol Med 2024; 182:109182. PMID: 39341109. DOI: 10.1016/j.compbiomed.2024.109182.
Abstract
Polyp segmentation remains challenging for two reasons: (a) the size and shape of colon polyps are variable and diverse; (b) the distinction between polyps and mucosa is not obvious. To solve these two challenging problems and enhance the generalization ability of the segmentation method, we propose BCL-Former, a Localized Transformer Fusion with Balanced Constraint, for polyp segmentation. In BCL-Former, the Strip Local Enhancement (SLE) module is proposed to capture enhanced local features. The Progressive Feature Fusion (PFF) module is presented to make feature aggregation smoother and to eliminate the difference between high-level and low-level features. Moreover, the Tversky-based Appropriate Constrained Loss (TacLoss) is proposed to achieve balance and constraint between true positives and false negatives, improving the ability to generalize across datasets. Extensive experiments are conducted on four benchmark datasets. Results show that our proposed method achieves state-of-the-art performance in both segmentation precision and generalization ability. The proposed method is also 5%-8% faster than the benchmark method in training and inference. The code is available at: https://github.com/sjc-lbj/BCL-Former.
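A Tversky-based loss of the kind TacLoss builds on weights false positives and false negatives asymmetrically. A minimal NumPy sketch of the underlying Tversky loss (the alpha/beta values are illustrative; the paper's exact constraint terms are not reproduced here):

```python
import numpy as np

def tversky_loss(pred: np.ndarray, target: np.ndarray,
                 alpha: float = 0.3, beta: float = 0.7, eps: float = 1e-7) -> float:
    """Tversky loss over (soft) binary masks: alpha weights false positives,
    beta weights false negatives; beta > alpha penalises missed pixels more."""
    tp = float((pred * target).sum())
    fp = float((pred * (1.0 - target)).sum())
    fn = float(((1.0 - pred) * target).sum())
    tversky_index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky_index
```

With beta above alpha, a missed polyp pixel (false negative) costs more than a spurious one (false positive), which matches the clinical preference for not overlooking polyps.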
Affiliation(s)
- Xin Wei, School of Software, Nanchang University, 235 East Nanjing Road, Nanchang 330047, China
- Jiacheng Sun, School of Software, Nanchang University, 235 East Nanjing Road, Nanchang 330047, China
- Pengxiang Su, School of Software, Nanchang University, 235 East Nanjing Road, Nanchang 330047, China
- Huan Wan, School of Computer Information Engineering, Jiangxi Normal University, 99 Ziyang Avenue, Nanchang 330022, China
- Zhitao Ning, School of Software, Nanchang University, 235 East Nanjing Road, Nanchang 330047, China

7. Xue H, Yonggang L, Min L, Lin L. A lighter hybrid feature fusion framework for polyp segmentation. Sci Rep 2024; 14:23179. PMID: 39369043. PMCID: PMC11455952. DOI: 10.1038/s41598-024-72763-8.
Abstract
Colonoscopy is widely recognized as the most effective method for detecting colon polyps, which is crucial for early screening of colorectal cancer. Polyp identification and segmentation in colonoscopy images require specialized medical knowledge and are often labor-intensive and expensive. Deep learning provides an intelligent and efficient approach to polyp segmentation. However, the variability in polyp size and the heterogeneity of polyp boundaries and interiors pose challenges for accurate segmentation. Transformer-based methods have become a mainstream trend for polyp segmentation, but they tend to overlook local details due to the inherent characteristics of the Transformer, leading to inferior results. Moreover, the computational burden of self-attention mechanisms hinders the practical application of these models. To address these issues, we propose a novel CNN-Transformer hybrid model for polyp segmentation (CTHP). CTHP combines the strengths of the CNN, which excels at modeling local information, and the Transformer, which excels at modeling global semantics, to enhance segmentation accuracy. We decompose the self-attention computation over the entire feature map into the width and height directions, significantly improving computational efficiency. Additionally, we design a new information propagation module and introduce additional positional bias coefficients during the attention computation, which reduces the dispersal of information introduced by deep and mixed feature fusion in the Transformer. Extensive experimental results demonstrate that our proposed model achieves state-of-the-art performance on multiple benchmark datasets for polyp segmentation. Furthermore, cross-domain generalization experiments show that our model exhibits excellent generalization performance.
Affiliation(s)
- He Xue, Department of Anesthesia Surgery, The Affiliated Huaian No.1 People's Hospital of Nanjing Medical University, Huai'an 223300, China
- Luo Yonggang, Department of Cardiothoracic Surgery, The Affiliated Huaian No.1 People's Hospital of Nanjing Medical University, Huai'an 223300, China
- Liu Min, Department of Laboratory Medicine, The Affiliated Huaian No.1 People's Hospital of Nanjing Medical University, Huai'an 223300, China
- Li Lin, Department of Anesthesia Surgery, The Affiliated Huaian No.1 People's Hospital of Nanjing Medical University, Huai'an 223300, China

8. Li H, Hussin N, He D, Geng Z, Li S. Design of image segmentation model based on residual connection and feature fusion. PLoS One 2024; 19:e0309434. PMID: 39361568. PMCID: PMC11449362. DOI: 10.1371/journal.pone.0309434.
Abstract
With the development of deep learning technology, convolutional neural networks have made great progress in the field of image segmentation. However, for complex scenes and multi-scale target images, existing techniques still cannot achieve effective image segmentation. In view of this, an image segmentation model based on residual connection and feature fusion is proposed. The model makes comprehensive use of the deep feature extraction ability of residual connections and the multi-scale feature integration ability of feature fusion. To address background complexity and information loss in traditional image segmentation, experiments were carried out on two publicly available datasets. The results showed that on the ISPRS Vaihingen dataset and the Caltech-UCSD Birds-200 dataset, when the model completed the 56th and 84th iterations, respectively, the average accuracy of FRes-MFDNN was the highest, at 97.89% and 98.24%, respectively. On the same datasets, at runtimes of 0.20 s and 0.26 s, the F1 value of the FRes-MFDNN method was the largest, approaching 100%. FRes-MFDNN segmented four images in the ISPRS Vaihingen dataset, with segmentation accuracies of 91.44%, 92.12%, 94.02%, and 91.41% for images 1-4, respectively. In practical applications, the MSRF-Net, LBN-AA-SPN, ARG-Otsu, and FRes-MFDNN methods were used to segment unlabeled bird images. The results showed that FRes-MFDNN preserved details more completely, and its overall effect was significantly better than that of the other three models. Meanwhile, in ordinary scene images, despite a certain degree of noise and occlusion, the model still accurately recognized and segmented the main bird subjects. These results show that, compared with traditional models, FRes-MFDNN segmentation significantly improves the completeness, detail, and spatial continuity of pixels, making it more suitable for complex scenes.
Affiliation(s)
- Hong Li, School of Information Engineering, Pingdingshan University, Pingdingshan, China; Faculty of Engineering, Built Environment and Information Technology, SEGi University, Kota Damansara, Malaysia
- Norriza Hussin, Faculty of Engineering, Built Environment and Information Technology, SEGi University, Kota Damansara, Malaysia
- Dandan He, School of Information Engineering, Pingdingshan University, Pingdingshan, China; Faculty of Engineering, Built Environment and Information Technology, SEGi University, Kota Damansara, Malaysia
- Zexun Geng, School of Information Engineering, Pingdingshan University, Pingdingshan, China
- Shengpu Li, School of Information Engineering, Pingdingshan University, Pingdingshan, China

9. Lee JH, Ku E, Chung YS, Kim YJ, Kim KG. Intraoperative detection of parathyroid glands using artificial intelligence: optimizing medical image training with data augmentation methods. Surg Endosc 2024; 38:5732-5745. PMID: 39138679. PMCID: PMC11458679. DOI: 10.1007/s00464-024-11115-z.
Abstract
BACKGROUND: Postoperative hypoparathyroidism is a major complication of thyroidectomy, occurring when the parathyroid glands are inadvertently damaged during surgery. Although intraoperative images are rarely used to train artificial intelligence (AI) because of their complex nature, AI may be trained to detect parathyroid glands intraoperatively with the help of various augmentation methods. The purpose of this study was to train an effective AI model to detect parathyroid glands during thyroidectomy. METHODS: Video clips of the parathyroid gland were collected during thyroid lobectomy procedures. Confirmed parathyroid images were used to train three types of datasets according to augmentation status: baseline, geometric transformation, and generative adversarial network-based image inpainting. The primary outcome was the average precision of the AI in detecting parathyroid glands. RESULTS: 152 fine-needle aspiration-confirmed parathyroid gland images were acquired from 150 patients who underwent unilateral lobectomy. The average precision of the AI model in detecting parathyroid glands based on the baseline data was 77%. This performance was enhanced by applying both the geometric transformation and image inpainting augmentation methods, with the geometric transformation dataset showing a higher average precision (79%) than the image inpainting model (78.6%). When the model was subjected to external validation using a completely different thyroidectomy approach, the image inpainting method was more effective (46%) than both the geometric transformation (37%) and baseline (33%) methods. CONCLUSION: This AI model was found to be an effective and generalizable tool for intraoperative identification of parathyroid glands during thyroidectomy, especially when aided by appropriate augmentation methods. Additional studies comparing model performance with surgeon identification, however, are needed to assess the true clinical relevance of this AI model.
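Geometric-transformation augmentation of the kind compared here applies the same spatial transform to an image and its annotation so the two stay aligned. A minimal sketch with a hypothetical helper, not the study's actual pipeline:

```python
import numpy as np

def geometric_augment(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Apply the same random flips and 90-degree rotation to an image and its
    mask so the annotation stays registered with the pixels."""
    if rng.random() < 0.5:                      # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                      # vertical flip
        image, mask = image[::-1, :], mask[::-1, :]
    k = rng.integers(0, 4)                      # rotate by k * 90 degrees
    return np.rot90(image, k), np.rot90(mask, k)
```

Because every transform is applied jointly, the augmented mask remains a valid label for the augmented image, multiplying the effective training set without new annotation effort.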
Affiliation(s)
- Joon-Hyop Lee, Division of Endocrine Surgery, Department of Surgery, Samsung Medical Center, 81 Irwon-ro, Gangnam-gu, Seoul, Korea
- EunKyung Ku, Department of Digital Media, The Catholic University of Korea, 43 Jibong-ro, Wonmi-gu, Bucheon, Gyeonggi 14662, Korea
- Yoo Seung Chung, Division of Endocrine Surgery, Department of Surgery, Gachon University College of Medicine, Gil Medical Center, Incheon, Korea
- Young Jae Kim, Department of Biomedical Engineering, College of Medicine, Gachon University, Gil Medical Center, 38-13 Dokjeom-ro 3Beon-gil, Namdong-gu, Incheon 21565, Korea
- Kwang Gi Kim, Department of Biomedical Engineering, College of Medicine, Gachon University, Gil Medical Center, 38-13 Dokjeom-ro 3Beon-gil, Namdong-gu, Incheon 21565, Korea

10. Theocharopoulos C, Davakis S, Ziogas DC, Theocharopoulos A, Foteinou D, Mylonakis A, Katsaros I, Gogas H, Charalabopoulos A. Deep Learning for Image Analysis in the Diagnosis and Management of Esophageal Cancer. Cancers (Basel) 2024; 16:3285. PMID: 39409906. PMCID: PMC11475041. DOI: 10.3390/cancers16193285.
Abstract
Esophageal cancer has a dismal prognosis and necessitates a multimodal and multidisciplinary approach from diagnosis to treatment. High-definition white-light endoscopy and histopathological confirmation remain the gold standard for the definitive diagnosis of premalignant and malignant lesions. Artificial intelligence using deep learning (DL) methods for image analysis constitutes a promising adjunct for the clinical endoscopist that could effectively decrease Barrett's esophagus (BE) overdiagnosis and unnecessary surveillance, while also assisting in the timely detection of dysplastic BE and esophageal cancer. A plethora of studies published during the last five years have consistently reported highly accurate DL algorithms with performance comparable or superior to that of endoscopists. Recent efforts aim to expand DL utilization into further aspects of esophageal neoplasia management, including histologic diagnosis, segmentation of gross tumor volume, pretreatment prediction and post-treatment evaluation of patient response to systemic therapy, and operative guidance during minimally invasive esophagectomy. Our manuscript serves as an introduction to the growing literature on DL applications for image analysis in the management of esophageal neoplasia, concisely presenting all currently published studies. We also aim to guide the clinician through basic functional principles, evaluation metrics, and limitations of DL for image recognition, to facilitate comprehension and critical evaluation of the presented studies.
Affiliation(s)
- Spyridon Davakis, First Department of Surgery, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Dimitrios C. Ziogas, First Department of Medicine, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Achilleas Theocharopoulos, Department of Electrical and Computer Engineering, National Technical University of Athens, 10682 Athens, Greece
- Dimitra Foteinou, First Department of Medicine, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Adam Mylonakis, First Department of Surgery, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Ioannis Katsaros, First Department of Surgery, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Helen Gogas, First Department of Medicine, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Alexandros Charalabopoulos, First Department of Surgery, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece

11. Wang Z, Liu M, Jiang J, Qu X. Colorectal polyp segmentation with denoising diffusion probabilistic models. Comput Biol Med 2024; 180:108981. PMID: 39146839. DOI: 10.1016/j.compbiomed.2024.108981.
Abstract
Early detection of polyps is essential to decrease colorectal cancer (CRC) incidence. Therefore, developing an efficient and accurate polyp segmentation technique is crucial for clinical CRC prevention. In this paper, we propose an end-to-end training approach for polyp segmentation that employs a diffusion model. The images are treated as priors, and segmentation is formulated as a mask generation process. In the sampling process, multiple predictions are generated for each input image using the trained model, and significant performance enhancements are achieved through a majority-vote strategy. Four public datasets and one in-house dataset are used to train and test the model. The proposed method achieves mDice scores of 0.934 and 0.967 on the Kvasir-SEG and CVC-ClinicDB datasets, respectively. Furthermore, cross-validation is applied to test the generalization of the proposed model, and to the best of our knowledge the proposed method outperforms previous state-of-the-art (SOTA) models. The proposed method also significantly improves segmentation accuracy and has strong generalization capability.
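The majority-vote fusion of multiple sampled masks can be sketched as follows (the function name and example masks are illustrative, not from the paper):

```python
import numpy as np

def majority_vote(masks):
    """Fuse several sampled binary masks: a pixel is kept as foreground
    when more than half of the samples predict it as foreground."""
    votes = np.stack(masks).astype(np.int32).sum(axis=0)
    return (votes > len(masks) / 2).astype(np.uint8)

# Three hypothetical diffusion samples for one image; the spurious
# bottom-left pixel appears in only one sample and is voted out.
m1 = np.array([[1, 1], [0, 0]])
m2 = np.array([[1, 0], [0, 0]])
m3 = np.array([[1, 1], [1, 0]])
fused = majority_vote([m1, m2, m3])   # [[1, 1], [0, 0]]
```

Averaging out the stochasticity of individual diffusion samples in this way is what yields the performance gain the abstract describes.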
Affiliation(s)
- Zenan Wang, Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China
- Ming Liu, Hunan Key Laboratory of Nonferrous Resources and Geological Hazard Exploration, Changsha, China
- Jue Jiang, Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY, United States
- Xiaolei Qu, School of Instrumentation and Optoelectronics Engineering, Beihang University, Beijing, China

12. Hussain MS, Asgher U, Nisar S, Socha V, Shaukat A, Wang J, Feng T, Paracha RZ, Khan MA. Enhanced accuracy with Segmentation of Colorectal Polyp using NanoNetB, and Conditional Random Field Test-Time Augmentation. Front Robot AI 2024; 11:1387491. PMID: 39184863. PMCID: PMC11341306. DOI: 10.3389/frobt.2024.1387491.
Abstract
Colonoscopy is a reliable diagnostic method for detecting colorectal polyps early and preventing colorectal cancer. Current examination techniques face the significant challenge of high miss rates, resulting in numerous undetected polyps and irregularities. Automated and real-time segmentation methods can help endoscopists delineate the shape and location of polyps in colonoscopy images, facilitating timely diagnosis and intervention. Factors such as varied shapes, the small size of polyps, and their close resemblance to surrounding tissue make this task challenging. Furthermore, high-definition image quality and reliance on the operator make real-time, accurate endoscopic image segmentation even harder. Deep learning models used for polyp segmentation, designed to capture diverse patterns, are becoming progressively more complex, which poses challenges for real-time medical operations. In clinical settings, deploying automated methods requires accurate, lightweight models with minimal latency that integrate seamlessly with endoscopic hardware. To address these challenges, this study proposes a novel, lightweight, and more generalized Enhanced NanoNet model, an improved version of NanoNet based on NanoNetB, for real-time and precise colonoscopy image segmentation. The proposed model enhances NanoNetB's overall prediction scheme by applying data augmentation, Conditional Random Fields (CRF), and Test-Time Augmentation (TTA). Six publicly available datasets are used for thorough evaluation, generalizability assessment, and validation of the improvements: Kvasir-SEG, Endotect Challenge 2020, Kvasir-Instrument, CVC-ClinicDB, CVC-ColonDB, and CVC-300.
Through extensive experimentation on the Kvasir-SEG dataset, the model achieves an mIoU of 0.8188 and a Dice coefficient of 0.8060 with only 132,049 parameters and minimal computational resources. A thorough cross-dataset evaluation was performed to assess the generalization capability of the proposed Enhanced NanoNet model across various publicly available polyp datasets for potential real-world applications. The results show that using CRF and TTA enhances performance both within the same dataset and across diverse datasets with a model size of just 132,049 parameters. The proposed method also yields improved results in detecting smaller and sessile (flat) polyps, which are significant contributors to high miss rates.
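For reference, the two metrics reported here, the Dice coefficient and IoU, can be computed from binary masks as below. This is a generic sketch, not the paper's evaluation code; the smoothing constant and averaging conventions vary between papers:

```python
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Dice = 2|A∩B| / (|A|+|B|); IoU = |A∩B| / |A∪B| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

# Toy 2x2 prediction overlapping the ground truth in one pixel
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [1, 0]])
dice, iou = dice_iou(pred, gt)
print(round(dice, 3), round(iou, 3))  # 0.5 0.333
```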
Affiliation(s)
- Muhammad Sajjad Hussain
- Department of Computer Science, Sir Syed (CASE) Institute of Technology, Islamabad, Pakistan
- Umer Asgher
- Laboratory of Human Factors and Automation in Aviation, Department of Air Transport, Faculty of Transportation Sciences, Czech Technical University in Prague (CTU), Prague, Czechia
- School of Interdisciplinary Engineering and Sciences (SINES), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Sajid Nisar
- Department of Mechanical and Electrical Systems Engineering, Faculty of Engineering, Kyoto University of Advanced Science, Kyoto, Japan
- Vladimir Socha
- Laboratory of Human Factors and Automation in Aviation, Department of Air Transport, Faculty of Transportation Sciences, Czech Technical University in Prague (CTU), Prague, Czechia
- Department of Information and Communication Technology in Medicine, Faculty of Biomedical Engineering, Czech Technical University in Prague, Prague, Czechia
- Arslan Shaukat
- Department of Computer and Software Engineering, College of Electrical and Mechanical Engineering (CoEME), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Jinhui Wang
- Institute for Brain Research and Rehabilitation, South China Normal University, Guangzhou, China
- Tian Feng
- Department of Physical Education, Physical Education College of Zhengzhou University, Zhengzhou, China
- Rehan Zafar Paracha
- School of Interdisciplinary Engineering and Sciences (SINES), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Muhammad Ali Khan
- Department of Mechanical Engineering, College of Electrical and Mechanical Engineering (CoEME), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- School of Mechanical and Manufacturing Engineering (SMME), National University of Sciences and Technology (NUST), Islamabad, Pakistan
13
Chang Q, Ahmad D, Toth J, Bascom R, Higgins WE. ESFPNet: Efficient Stage-Wise Feature Pyramid on Mix Transformer for Deep Learning-Based Cancer Analysis in Endoscopic Video. J Imaging 2024; 10:191. [PMID: 39194980 DOI: 10.3390/jimaging10080191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2024] [Revised: 07/19/2024] [Accepted: 08/01/2024] [Indexed: 08/29/2024] Open
Abstract
For patients at risk of developing either lung cancer or colorectal cancer, the identification of suspect lesions in endoscopic video is an important procedure. The physician performs an endoscopic exam by navigating an endoscope through the organ of interest, be it the lungs or intestinal tract, and performs a visual inspection of the endoscopic video stream to identify lesions. Unfortunately, this entails a tedious, error-prone search over a lengthy video sequence. We propose a deep learning architecture that enables the real-time detection and segmentation of lesion regions from endoscopic video, with our experiments focused on autofluorescence bronchoscopy (AFB) for the lungs and colonoscopy for the intestinal tract. Our architecture, dubbed ESFPNet, draws on a pretrained Mix Transformer (MiT) encoder and a decoder structure that incorporates a new Efficient Stage-Wise Feature Pyramid (ESFP) to promote accurate lesion segmentation. In comparison to existing deep learning models, the ESFPNet model gave superior lesion segmentation performance for an AFB dataset. It also produced superior segmentation results for three widely used public colonoscopy databases and nearly the best results for two other public colonoscopy databases. In addition, the lightweight ESFPNet architecture requires fewer model parameters and less computation than other competing models, enabling the real-time analysis of input video frames. Overall, these studies point to the combined superior analysis performance and architectural efficiency of the ESFPNet for endoscopic video analysis. Lastly, additional experiments with the public colonoscopy databases demonstrate the learning ability and generalizability of ESFPNet, implying that the model could be effective for region segmentation in other domains.
Affiliation(s)
- Qi Chang
- School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA
- Danish Ahmad
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
- Jennifer Toth
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
- Rebecca Bascom
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
- William E Higgins
- School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA
14
Li G, Zou C, Jiang G, Jiang D, Yun J, Zhao G, Cheng Y. Multi-View Fusion Network-Based Gesture Recognition Using sEMG Data. IEEE J Biomed Health Inform 2024; 28:4432-4443. [PMID: 37339021 DOI: 10.1109/jbhi.2023.3287979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/22/2023]
Abstract
Surface electromyography (sEMG) signals have been widely used in rehabilitation medicine over the past decades because of their non-invasive, convenient, and informative nature, especially in human action recognition, which has developed rapidly. However, research on sparse sEMG in multi-view fusion has made less progress than that on high-density sEMG signals, and enriching sparse sEMG feature information requires a method that can effectively reduce the loss of feature-signal information in the channel dimension. In this article, a novel IMSE (Inception-MaxPooling-Squeeze-Excitation) network module is proposed to reduce the loss of feature information during deep learning. Multiple feature encoders are then constructed to enrich the information of sparse sEMG feature maps, based on a multi-core parallel processing method in multi-view fusion networks, with the Swin Transformer (SwT) used as the classification backbone. By comparing the feature-fusion effects of different decision layers of the multi-view fusion network, it is found experimentally that fusing at the decision layer better improves the network's classification performance. On NinaPro DB1, the proposed network achieves 93.96% average accuracy in gesture classification with feature maps obtained in a 300 ms time window, and the maximum variation in individual action-recognition rates is less than 11.2%. The results show that the proposed multi-view learning framework reduces individual differences and augments channel feature information, providing a useful reference for non-dense biosignal pattern recognition.
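The Squeeze-Excitation component named in the IMSE module is a standard channel-attention pattern: pool each channel to a scalar, pass the channel vector through a small bottleneck, and rescale the channels by the resulting gates. A minimal NumPy sketch of that idea (not the authors' implementation; the weights are randomly initialized and the reduction ratio is a hypothetical choice):

```python
import numpy as np

def squeeze_excitation(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Channel attention: squeeze (global average pool), excite (two FC layers), rescale.

    x: feature map of shape (C, H, W); w1: (C//r, C); w2: (C, C//r).
    """
    s = x.mean(axis=(1, 2))                # squeeze: one scalar per channel, shape (C,)
    z = np.maximum(w1 @ s, 0.0)            # bottleneck FC + ReLU, shape (C//r,)
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # expansion FC + sigmoid gates in (0, 1), shape (C,)
    return x * a[:, None, None]            # reweight each channel by its gate

rng = np.random.default_rng(0)
C, r = 8, 4                                # hypothetical channel count and reduction ratio
x = rng.standard_normal((C, 5, 5))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = squeeze_excitation(x, w1, w2)
print(y.shape)  # same shape as x: channels are reweighted, not resized
```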
15
Huang X, Wang L, Jiang S, Xu L. DHAFormer: Dual-channel hybrid attention network with transformer for polyp segmentation. PLoS One 2024; 19:e0306596. [PMID: 38985710 PMCID: PMC11236112 DOI: 10.1371/journal.pone.0306596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2023] [Accepted: 06/17/2024] [Indexed: 07/12/2024] Open
Abstract
The accurate early diagnosis of colorectal cancer significantly relies on the precise segmentation of polyps in medical images. Current convolution-based and transformer-based segmentation methods show promise but still struggle with the varied sizes and shapes of polyps and the often low contrast between polyps and their background. This research introduces an innovative approach to confronting the aforementioned challenges by proposing a Dual-Channel Hybrid Attention Network with Transformer (DHAFormer). Our proposed framework features a multi-scale channel fusion module, which excels at recognizing polyps across a spectrum of sizes and shapes. Additionally, the framework's dual-channel hybrid attention mechanism is innovatively conceived to reduce background interference and improve the foreground representation of polyp features by integrating local and global information. The DHAFormer demonstrates significant improvements in the task of polyp segmentation compared to currently established methodologies.
Affiliation(s)
- Xuejie Huang
- School of Computer Science and Technology, Xinjiang University, Urumqi, China
- Liejun Wang
- School of Computer Science and Technology, Xinjiang University, Urumqi, China
- Shaochen Jiang
- School of Computer Science and Technology, Xinjiang University, Urumqi, China
- Lianghui Xu
- School of Computer Science and Technology, Xinjiang University, Urumqi, China
16
Yang L, Gu Y, Bian G, Liu Y. MSDE-Net: A Multi-Scale Dual-Encoding Network for Surgical Instrument Segmentation. IEEE J Biomed Health Inform 2024; 28:4072-4083. [PMID: 38117619 DOI: 10.1109/jbhi.2023.3344716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2023]
Abstract
Minimally invasive surgery, which relies on surgical robots and microscopes, demands precise image segmentation to ensure safe and efficient procedures. Nevertheless, achieving accurate segmentation of surgical instruments remains challenging due to the complexity of the surgical environment. To tackle this issue, this paper introduces a novel multiscale dual-encoding segmentation network, termed MSDE-Net, designed to automatically and precisely segment surgical instruments. The proposed MSDE-Net leverages a dual-branch encoder comprising a convolutional neural network (CNN) branch and a transformer branch to effectively extract both local and global features. Moreover, an attention fusion block (AFB) is introduced to ensure effective information complementarity between the dual-branch encoding paths. Additionally, a multilayer context fusion block (MCF) is proposed to enhance the network's capacity to simultaneously extract global and local features. Finally, to extend the scope of global feature information under larger receptive fields, a multi-receptive field fusion (MRF) block is incorporated. Through comprehensive experimental evaluations on two publicly available datasets for surgical instrument segmentation, the proposed MSDE-Net demonstrates superior performance compared to existing methods.
17
Wang Z, Liu Z, Yu J, Gao Y, Liu M. Multi-scale nested UNet with transformer for colorectal polyp segmentation. J Appl Clin Med Phys 2024; 25:e14351. [PMID: 38551396 PMCID: PMC11163511 DOI: 10.1002/acm2.14351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Revised: 02/13/2024] [Accepted: 02/19/2024] [Indexed: 06/11/2024] Open
Abstract
BACKGROUND Polyp detection and localization are essential tasks in colonoscopy. U-shaped convolutional neural networks have achieved remarkable segmentation performance on biomedical images, but their lack of long-range dependency modeling limits their receptive fields. PURPOSE Our goal was to develop and test a novel architecture for polyp segmentation that combines learning of local information with long-range dependency modeling. METHODS A novel architecture was developed that integrates a transformer into a multi-scale nested UNet structure for polyp segmentation. The proposed network takes advantage of both CNNs and transformers to extract distinct feature information. The transformer layer is embedded between the encoder and decoder of a U-shaped network to learn explicit global context and long-range semantic information. To address the challenge of varying polyp sizes, an MSFF unit is proposed to fuse features at multiple resolutions. RESULTS Four public datasets and one in-house dataset were used to train and test the model. An ablation study was also conducted to verify each component of the model. On the Kvasir-SEG and CVC-ClinicDB datasets, the proposed model achieved mean Dice scores of 0.942 and 0.950, respectively, more accurate than the other methods. To assess the generalization of the different methods, two cross-dataset validations were performed, in which the proposed model achieved the highest mean Dice score. The results demonstrate that the proposed network has powerful learning and generalization capability, significantly improving segmentation accuracy and outperforming state-of-the-art methods. CONCLUSIONS The proposed model produced more accurate polyp segmentation than current methods on four public datasets and one in-house dataset. Its ability to segment polyps of different sizes shows potential for clinical application.
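The multi-resolution fusion idea behind an MSFF-style unit commonly amounts to resizing lower-resolution feature maps to a common spatial size and concatenating them along the channel axis. A minimal sketch under that assumption (nearest-neighbor upsampling with integer scale factors; not the paper's exact unit):

```python
import numpy as np

def fuse_multiscale(feats):
    """Upsample each (C_i, H_i, W_i) feature map to the largest H, W
    by nearest-neighbor repetition, then concatenate along channels."""
    H = max(f.shape[1] for f in feats)
    W = max(f.shape[2] for f in feats)
    up = []
    for f in feats:
        ry, rx = H // f.shape[1], W // f.shape[2]        # assumes integer scale factors
        up.append(np.repeat(np.repeat(f, ry, axis=1), rx, axis=2))
    return np.concatenate(up, axis=0)                    # shape (sum C_i, H, W)

# Hypothetical decoder outputs at three scales
f1 = np.zeros((4, 16, 16))
f2 = np.ones((8, 8, 8))
f3 = np.ones((16, 4, 4))
fused = fuse_multiscale([f1, f2, f3])
print(fused.shape)  # (28, 16, 16)
```

In a real network the concatenation would typically be followed by a learned convolution to mix the fused channels.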
Affiliation(s)
- Zenan Wang
- Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China
- Zhen Liu
- Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China
- Jianfeng Yu
- Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China
- Yingxin Gao
- Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China
- Ming Liu
- Hunan Key Laboratory of Nonferrous Resources and Geological Hazard Exploration, Changsha, China
18
Zhan F, Wang W, Chen Q, Guo Y, He L, Wang L. Three-Direction Fusion for Accurate Volumetric Liver and Tumor Segmentation. IEEE J Biomed Health Inform 2024; 28:2175-2186. [PMID: 38109246 DOI: 10.1109/jbhi.2023.3344392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2023]
Abstract
Biomedical image segmentation of organs, tissues and lesions has gained increasing attention in clinical treatment planning and navigation, which involves the exploration of two-dimensional (2D) and three-dimensional (3D) contexts in the biomedical image. Compared to 2D methods, 3D methods pay more attention to inter-slice correlations, which offer additional spatial information for image segmentation. An organ or tumor has a 3D structure that can be observed from three directions. Previous studies focus only on the vertical axis, limiting the understanding of the relationship between a tumor and its surrounding tissues. Important information can also be obtained from sagittal and coronal axes. Therefore, spatial information of organs and tumors can be obtained from three directions, i.e. the sagittal, coronal and vertical axes, to understand better the invasion depth of tumor and its relationship with the surrounding tissues. Moreover, the edges of organs and tumors in biomedical image may be blurred. To address these problems, we propose a three-direction fusion volumetric segmentation (TFVS) model for segmenting 3D biomedical images from three perspectives in sagittal, coronal and transverse planes, respectively. We use the dataset of the liver task provided by the Medical Segmentation Decathlon challenge to train our model. The TFVS method demonstrates a competitive performance on the 3D-IRCADB dataset. In addition, the t-test and Wilcoxon signed-rank test are also performed to show the statistical significance of the improvement by the proposed method as compared with the baseline methods. The proposed method is expected to be beneficial in guiding and facilitating clinical diagnosis and treatment.
19
Li G, Jin D, Zheng Y, Cui J, Gai W, Qi M. A generic plug & play diffusion-based denosing module for medical image segmentation. Neural Netw 2024; 172:106096. [PMID: 38194885 DOI: 10.1016/j.neunet.2024.106096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Revised: 12/11/2023] [Accepted: 12/31/2023] [Indexed: 01/11/2024]
Abstract
Medical image segmentation is challenging because of small dataset sizes and because images often contain noise and artifacts. In recent years, diffusion models have proven very effective in image generation and have been widely used in computer vision. This paper presents a new feature map denoising module (FMD) based on the diffusion model for feature refinement, which is plug-and-play, allowing flexible integration into popular segmentation networks for seamless end-to-end training. We evaluate the performance of the FMD module on four models, UNet, UNeXt, TransUNet, and IB-TransUNet, by conducting experiments on four datasets. The experimental analysis shows that adding the FMD module significantly improves model performance; in particular, for small lesion areas and minor organs, it yields more accurate segmentation results than the original models.
Affiliation(s)
- Guangju Li
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Dehu Jin
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Yuanjie Zheng
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Jia Cui
- School of Design, South China University of Technology, Guangzhou, China
- Wei Gai
- School of Software, Shandong University, Jinan, Shandong, China
- Meng Qi
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
20
Shu X, Wang J, Zhang A, Shi J, Wu XJ. CSCA U-Net: A channel and space compound attention CNN for medical image segmentation. Artif Intell Med 2024; 150:102800. [PMID: 38553146 DOI: 10.1016/j.artmed.2024.102800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2023] [Revised: 12/10/2023] [Accepted: 02/03/2024] [Indexed: 04/02/2024]
Abstract
Image segmentation is one of the vital steps in medical image analysis. A large number of methods based on convolutional neural networks have emerged that can extract abstract features from multi-modality medical images, learn valuable information that is difficult for humans to recognize, and obtain more reliable results than traditional image segmentation approaches. U-Net, owing to its simple structure and excellent performance, is widely used in medical image segmentation. In this paper, to further improve the performance of U-Net, we propose a channel and space compound attention (CSCA) convolutional neural network, abbreviated CSCA U-Net, which increases the network depth and employs a double squeeze-and-excitation (DSE) block in the bottleneck layer to enhance feature extraction and obtain more high-level semantic features. The characteristics of the proposed method are three-fold: (1) a channel and space compound attention (CSCA) block, (2) cross-layer feature fusion (CLFF), and (3) deep supervision (DS). Extensive experiments on several available medical image datasets, including Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS, CVC-T, the 2018 Data Science Bowl (2018 DSB), ISIC 2018, and JSUAH-Cerebellum, show that CSCA U-Net achieves competitive results and significantly improves generalization performance. The code and trained models are available at https://github.com/xiaolanshu/CSCA-U-Net.
Affiliation(s)
- Xin Shu
- School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, 212100, Jiangsu, China; Development and Related Diseases of Women and Children Key Laboratory of Sichuan Province, Chengdu, 610041, Sichuan, China
- Jiashu Wang
- School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, 212100, Jiangsu, China
- Aoping Zhang
- School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, 212100, Jiangsu, China
- Jinlong Shi
- School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, 212100, Jiangsu, China
- Xiao-Jun Wu
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214122, Jiangsu, China
21
Zheng J, Yan Y, Zhao L, Pan X. CGMA-Net: Cross-Level Guidance and Multi-Scale Aggregation Network for Polyp Segmentation. IEEE J Biomed Health Inform 2024; 28:1424-1435. [PMID: 38127598 DOI: 10.1109/jbhi.2023.3345479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2023]
Abstract
Colonoscopy is considered the best method for preventing and controlling colorectal cancer, which has extremely high mortality and morbidity rates. Automated polyp segmentation of colonoscopy images is of great importance, since manual polyp segmentation requires considerable time from experienced specialists. However, due to the high similarity between polyps and mucosa, together with the complex morphological features of colonic polyps, the performance of automatic polyp segmentation remains unsatisfactory. Accordingly, we propose a Cross-level Guidance and Multi-scale Aggregation network (CGMA-Net) to improve performance. Specifically, three modules, Cross-level Feature Guidance (CFG), Multi-scale Aggregation Decoder (MAD), and Details Refinement (DR), are individually proposed and synergistically assembled. In CFG, we generate spatial attention maps from the higher-level features and multiply them with the lower-level features, highlighting the region of interest and suppressing background information. In MAD, we use multiple dilated convolutions of different sizes in parallel to capture long-range dependencies between features. In DR, an asynchronous convolution is used along with an attention mechanism to enhance both local details and global information. The proposed CGMA-Net is evaluated on two benchmark datasets, CVC-ClinicDB and Kvasir-SEG; the results demonstrate that our method not only delivers state-of-the-art performance but also uses relatively few parameters. Concretely, we achieve Dice Similarity Coefficients (DSC) of 91.85% and 95.73% on Kvasir-SEG and CVC-ClinicDB, respectively. Model generalization is also assessed, resulting in DSC scores of 86.25% and 86.97% on the two datasets, respectively.
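The CFG step described in this abstract, deriving a spatial attention map from higher-level features and multiplying it with lower-level ones, can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the channel-mean attention, sigmoid gate, and nearest-neighbor upsampling are assumptions:

```python
import numpy as np

def cross_level_gate(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Gate low-level features (C, H, W) with a spatial attention map
    derived from higher-level features (C2, H/2, W/2)."""
    attn = 1.0 / (1.0 + np.exp(-high.mean(axis=0)))          # sigmoid over channel mean: (H/2, W/2)
    attn = np.repeat(np.repeat(attn, 2, axis=0), 2, axis=1)  # nearest-neighbor upsample to (H, W)
    return low * attn[None, :, :]                            # suppress low-attention regions

low = np.ones((4, 8, 8))     # hypothetical low-level feature map
high = np.zeros((8, 4, 4))   # hypothetical high-level feature map
out = cross_level_gate(low, high)
print(out.shape)  # (4, 8, 8); with zero high-level input the gate is uniformly 0.5
```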
22
Ao Y, Shi W, Ji B, Miao Y, He W, Jiang Z. MS-TCNet: An effective Transformer-CNN combined network using multi-scale feature learning for 3D medical image segmentation. Comput Biol Med 2024; 170:108057. [PMID: 38301516 DOI: 10.1016/j.compbiomed.2024.108057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2023] [Revised: 12/31/2023] [Accepted: 01/26/2024] [Indexed: 02/03/2024]
Abstract
Medical image segmentation is a fundamental research problem in medical image processing. Recently, Transformers have achieved highly competitive performance in computer vision, and many methods combining Transformers with convolutional neural networks (CNNs) have emerged for segmenting medical images. However, these methods cannot effectively capture the multi-scale features in medical images, even though the texture and contextual information embedded in multi-scale features is extremely beneficial for segmentation. To alleviate this limitation, we propose a novel Transformer-CNN combined network using multi-scale feature learning for three-dimensional (3D) medical image segmentation, called MS-TCNet. The proposed model uses a shunted Transformer and a CNN to construct an encoder and pyramid decoder, allowing feature learning at six different scale levels, and captures multi-scale features with refinement at each level. Additionally, we propose a novel lightweight multi-scale feature fusion (MSFF) module that fully fuses the different-scale semantic features generated by the pyramid decoder for each segmentation class, resulting in more accurate segmentation output. We conducted experiments on three widely used 3D medical image segmentation datasets. The experimental results indicate that our method outperforms state-of-the-art medical image segmentation methods, suggesting its effectiveness, robustness, and superiority. Meanwhile, our model has fewer parameters and lower computational complexity than conventional 3D segmentation networks. The results confirm that the model is capable of effective multi-scale feature learning and that the learned multi-scale features are useful for improving segmentation performance. Our code is open-sourced at https://github.com/AustinYuAo/MS-TCNet.
Affiliation(s)
- Yu Ao
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China
- Weili Shi
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China; Zhongshan Institute of Changchun University of Science and Technology, Zhongshan, 528437, China
- Bai Ji
- Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, 130061, China
- Yu Miao
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China; Zhongshan Institute of Changchun University of Science and Technology, Zhongshan, 528437, China
- Wei He
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China; Zhongshan Institute of Changchun University of Science and Technology, Zhongshan, 528437, China
- Zhengang Jiang
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China; Zhongshan Institute of Changchun University of Science and Technology, Zhongshan, 528437, China
23
Gangrade S, Sharma PC, Sharma AK, Singh YP. Modified DeeplabV3+ with multi-level context attention mechanism for colonoscopy polyp segmentation. Comput Biol Med 2024; 170:108096. [PMID: 38320340 DOI: 10.1016/j.compbiomed.2024.108096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Revised: 01/31/2024] [Accepted: 02/01/2024] [Indexed: 02/08/2024]
Abstract
The development of automated methods for analyzing medical images of colon cancer is a major research field. A colonoscopy is a medical procedure that enables a doctor to look for abnormalities such as polyps, cancer, or inflammatory tissue inside the colon and rectum. Colorectal disease falls under the category of gastrointestinal illnesses, which claim the lives of almost two million people worldwide. Video endoscopy is an advanced medical imaging approach for diagnosing gastrointestinal disorders such as inflammatory bowel disease, ulcerative colitis, esophagitis, and polyps. Medical video endoscopy generates numerous images that must be reviewed by specialists. The difficulty of manual diagnosis has sparked research into computer-aided techniques that can quickly and reliably assess all generated images. The proposed methodology establishes a framework for diagnosing diseases in colonoscopy images. Endoscopists can lower the risk of polyps progressing to cancer by using more accurate computer-assisted polyp detection and segmentation during colonoscopies. With the aim of creating a model that can automatically distinguish polyps in images, we present a modified DeeplabV3+ model to carry out segmentation tasks effectively and efficiently. The framework's encoder uses a pre-trained dilated convolutional residual network for optimal feature-map resolution. The robustness of the modified model is tested against state-of-the-art segmentation approaches. In this work, we employed two publicly available datasets, CVC-ClinicDB and Kvasir-SEG, and obtained Dice similarity coefficients of 0.97 and 0.95, respectively. The results show that the modified DeeplabV3+ model improves segmentation efficiency and effectiveness in both software and hardware with only minor changes.
Affiliation(s)
- Shweta Gangrade
- School of Information Technology, Manipal University Jaipur, Jaipur, Rajasthan, India; School of Computer Science and Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India
- Prakash Chandra Sharma
- School of Information Technology, Manipal University Jaipur, Jaipur, Rajasthan, India; School of Computer Science and Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India
- Akhilesh Kumar Sharma
- School of Information Technology, Manipal University Jaipur, Jaipur, Rajasthan, India; School of Computer Science and Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India
- Yadvendra Pratap Singh
- School of Information Technology, Manipal University Jaipur, Jaipur, Rajasthan, India; School of Computer Science and Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India
24
Zhang Y, Yu M, Tong C, Zhao Y, Han J. CA-UNet Segmentation Makes a Good Ischemic Stroke Risk Prediction. Interdiscip Sci 2024; 16:58-72. [PMID: 37626263 DOI: 10.1007/s12539-023-00583-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 08/27/2023]
Abstract
Stroke remains the world's second leading cause of death and its third leading cause of death and disability combined. Ischemic stroke is a type of stroke for which early detection and treatment are key to prevention. However, owing to privacy protection constraints and labeling difficulties, there are only a few studies on the intelligent automatic diagnosis of stroke or ischemic stroke, and their results are unsatisfactory. Therefore, we collect data and propose a 3D carotid Computed Tomography Angiography (CTA) image segmentation model called CA-UNet for fully automated extraction of the carotid arteries. We explore the number of down-sampling stages appropriate for carotid segmentation and design a multi-scale loss function to counter the loss of detailed features during down-sampling. Moreover, based on CA-UNet, we propose an ischemic stroke risk prediction model that predicts patients' risk from their 3D CTA images, electronic medical records, and medical history. We have validated the efficacy of both the segmentation model and the prediction model through comparative experiments. Our method can provide reliable diagnoses and results that benefit patients and medical professionals.
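The multi-scale loss is described only at a high level. One common realization, sketched below under stated assumptions (binary cross-entropy as the per-scale loss, nearest-neighbour downsampling of the ground truth, halving weights), sums a pixel-wise loss over decoder outputs at successive resolutions:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy, averaged over the map."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

def downsample2x(mask):
    """Nearest-neighbour 2x downsampling of a 2D map."""
    return mask[::2, ::2]

def multiscale_loss(preds, target, weights=(1.0, 0.5, 0.25)):
    """Weighted sum of per-scale losses; `preds` holds decoder outputs,
    finest resolution first, and the ground truth is downsampled to
    match each coarser stage."""
    total, gt = 0.0, target
    for w, p in zip(weights, preds):
        total += w * bce(p, gt)
        gt = downsample2x(gt)
    return total
```

A perfect set of predictions yields a near-zero loss, while a completely wrong one is heavily penalized at every scale.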
Affiliation(s)
- Yuqi Zhang: School of Computer Science and Engineering, Beihang University, Beijing, China; State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Mengbo Yu: School of Computer Science and Engineering, Beihang University, Beijing, China; State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Chao Tong: School of Computer Science and Engineering, Beihang University, Beijing, China; State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Yanqing Zhao: Department of Interventional Radiology and Vascular Surgery, Peking University Third Hospital, Beijing, China
- Jintao Han: Department of Interventional Radiology and Vascular Surgery, Peking University Third Hospital, Beijing, China
25
Wang Z, Yu L, Tian S, Huo X. CRMEFNet: A coupled refinement, multiscale exploration and fusion network for medical image segmentation. Comput Biol Med 2024; 171:108202. [PMID: 38402839 DOI: 10.1016/j.compbiomed.2024.108202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2023] [Revised: 12/22/2023] [Accepted: 02/18/2024] [Indexed: 02/27/2024]
Abstract
Accurate segmentation of target areas in medical images, such as lesions, is essential for disease diagnosis and clinical analysis. In recent years, deep learning methods have been intensively researched and have generated significant progress in medical image segmentation tasks. However, most existing methods have limitations in modeling multilevel feature representations and in identifying complex textured pixels at contrasting boundaries. This paper proposes a novel coupled refinement, multiscale exploration and fusion network (CRMEFNet) for medical image segmentation, which explores the optimization and fusion of multiscale features to address the abovementioned limitations. CRMEFNet consists of three main innovations: a coupled refinement module (CRM), a multiscale exploration and fusion module (MEFM), and a cascaded progressive decoder (CPD). The CRM decouples features into low-frequency body features and high-frequency edge features, and performs targeted optimization of both to enhance intraclass uniformity and interclass differentiation. The MEFM performs a two-stage exploration and fusion of multiscale features using our proposed multiscale aggregation attention mechanism, which explores the differentiated information within cross-level features and enhances the contextual connections between them to achieve adaptive feature fusion. Compared to existing complex decoders, the CPD decoder (consisting of the CRM and MEFM) can perform fine-grained pixel recognition while retaining complete semantic location information, with a simple design and excellent performance. Experimental results from five medical image segmentation tasks, ten datasets, and twelve comparison models demonstrate the state-of-the-art performance, interpretability, flexibility, and versatility of CRMEFNet.
Affiliation(s)
- Zhi Wang: College of Software, Xinjiang University, Urumqi, 830000, China; Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China
- Long Yu: College of Network Center, Xinjiang University, Urumqi, 830000, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, 830000, China
- Shengwei Tian: College of Software, Xinjiang University, Urumqi, 830000, China; Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China
- Xiangzuo Huo: Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, 830000, China
26
Song E, Zhan B, Liu H. Combining external-latent attention for medical image segmentation. Neural Netw 2024; 170:468-477. [PMID: 38039684 DOI: 10.1016/j.neunet.2023.10.046] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Revised: 10/19/2023] [Accepted: 10/29/2023] [Indexed: 12/03/2023]
Abstract
The attention mechanism offers a new entry point for improving the performance of medical image segmentation. How to assign weights reasonably is a key element of the attention mechanism, and currently popular schemes include global squeezing and non-local information interaction using the self-attention (SA) operation. However, these approaches over-focus on external features and under-exploit latent features: the global squeezing approach crudely represents the richness of contextual information by a global mean or maximum value, while non-local information interaction focuses on the similarity of external features between different regions. Both ignore the fact that contextual information is expressed more through latent features, such as frequency changes within the data. To tackle the above problems and make proper use of attention mechanisms in medical image segmentation, we propose an external-latent attention collaboratively guided image segmentation network, named TransGuider. This network consists of three key components: 1) a latent attention module that uses an improved entropy quantification method to accurately explore and locate the distribution of latent contextual information; 2) an external self-attention module using sparse representation, which preserves external global contextual information while reducing computational overhead by selecting representative feature description maps for the SA operation; and 3) a multi-attention collaborative module to guide the network to continuously focus on the region of interest, refining the segmentation mask. Experimental results on several benchmark medical image segmentation datasets show that TransGuider outperforms state-of-the-art methods, and extensive ablation experiments demonstrate the effectiveness of the proposed components. Our code will be available at https://github.com/chasingone/TransGuider.
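The "improved entropy quantification method" is not specified in the abstract. As a labeled stand-in, the standard binary Shannon entropy below shows how per-pixel uncertainty can be quantified from a probability map (maximal where p = 0.5); the paper's actual method presumably differs:

```python
import numpy as np

def local_entropy(prob, eps=1e-7):
    """Per-pixel Shannon entropy of a two-class probability map: one
    simple way to quantify where uncertain, information-rich latent
    context concentrates (0 for confident pixels, 1 bit at p = 0.5)."""
    p = np.clip(prob, eps, 1 - eps)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
```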
Affiliation(s)
- Enmin Song: School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, China
- Bangcheng Zhan: School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, China
- Hong Liu: School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, China
27
Li G, Jin D, Yu Q, Zheng Y, Qi M. MultiIB-TransUNet: Transformer with multiple information bottleneck blocks for CT and ultrasound image segmentation. Med Phys 2024; 51:1178-1189. [PMID: 37528654 DOI: 10.1002/mp.16662] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2023] [Revised: 06/07/2023] [Accepted: 07/19/2023] [Indexed: 08/03/2023] Open
Abstract
BACKGROUND Accurate medical image segmentation is crucial for disease diagnosis and surgical planning. Transformer networks offer a promising alternative for medical image segmentation as they can learn global features through self-attention mechanisms. To further enhance performance, many researchers have incorporated more Transformer layers into their models. However, this approach often increases the model parameters significantly, causing a potential rise in complexity. Moreover, medical image segmentation datasets usually have few samples, which increases the risk of model overfitting. PURPOSE This paper aims to design a medical image segmentation model that has fewer parameters and can effectively alleviate overfitting. METHODS We design a MultiIB-Transformer structure consisting of a single Transformer layer and multiple information bottleneck (IB) blocks. The Transformer layer captures long-distance spatial relationships to extract global feature information, while the IB blocks compress noise and improve model robustness. The advantage of this structure is that only one Transformer layer is needed to achieve state-of-the-art (SOTA) performance, significantly reducing the number of model parameters. In addition, we designed a new skip connection structure: with only two 1×1 convolutions, the high-resolution feature map can effectively carry both semantic and spatial information, thereby alleviating the semantic gap. RESULTS On the Breast UltraSound Images (BUSI) dataset, the proposed model achieves IoU and F1 scores of 67.75 and 87.78. On the Synapse multi-organ segmentation dataset, it achieves a parameter count (Param) of 22.30, a Hausdorff Distance (HD) of 20.04, and a Dice Similarity Coefficient (DSC) of 81.83. CONCLUSIONS Our proposed model (MultiIB-TransUNet) achieved superior results with fewer parameters compared to other models.
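The abstract states that the new skip connection needs only two 1×1 convolutions. A 1×1 convolution is simply a per-pixel linear map over channels, so one plausible wiring can be sketched in NumPy; the exact placement of the two projections is an assumption, not the paper's verified design:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: per-pixel linear projection over channels.
    x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)."""
    return x @ w

def skip_fuse(encoder_feat, decoder_feat, w1, w2):
    """Hypothetical two-1x1-conv skip connection: project the encoder
    map into the decoder's channel space, add the decoder map, then
    project once more, so the high-resolution map carries both spatial
    and semantic information."""
    return conv1x1(conv1x1(encoder_feat, w1) + decoder_feat, w2)
```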
Affiliation(s)
- Guangju Li: School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Dehu Jin: School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Qi Yu: School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Yuanjie Zheng: School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Meng Qi: School of Information Science and Engineering, Shandong Normal University, Jinan, China
28
Seo H, Lee S, Yun S, Leem S, So S, Han DH. RenseNet: A Deep Learning Network Incorporating Residual and Dense Blocks with Edge Conservative Module to Improve Small-Lesion Classification and Model Interpretation. Cancers (Basel) 2024; 16:570. [PMID: 38339320 PMCID: PMC10854971 DOI: 10.3390/cancers16030570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 01/16/2024] [Accepted: 01/27/2024] [Indexed: 02/12/2024] Open
Abstract
Deep learning has become an essential tool in medical image analysis owing to its remarkable performance. Target classification and model interpretability are key applications of deep learning in medical image analysis, and hence many deep learning-based algorithms have emerged. Many of these algorithms include pooling operations, a type of subsampling used to enlarge the receptive field. However, from a signal processing standpoint, pooling degrades image details, which particularly affects small objects in an image. Therefore, in this study, we designed a Rense block and an edge conservative module to effectively manipulate previous feature information in the feed-forward learning process. Specifically, the Rense block, an optimal design that incorporates the skip connections of residual and dense blocks, was demonstrated through mathematical analysis. Furthermore, we avoid blurring of the features caused by the pooling operation through a compensation path in the edge conservative module. Two independent CT datasets of kidney stones and lung tumors, in which small lesions are often present in the images, were used to verify the proposed RenseNet. The classification results and explanation heatmaps show that RenseNet provides the best inference and interpretation compared to current state-of-the-art methods. The proposed RenseNet can contribute significantly to efficient diagnosis and treatment because it is effective for small lesions that might otherwise be misclassified or misinterpreted.
Affiliation(s)
- Hyunseok Seo: Bionics Research Center, Biomedical Research Division, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Seokjun Lee: Bionics Research Center, Biomedical Research Division, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Sojin Yun: Bionics Research Center, Biomedical Research Division, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Saebom Leem: Bionics Research Center, Biomedical Research Division, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Seohee So: Bionics Research Center, Biomedical Research Division, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Deok Hyun Han: Department of Urology, Samsung Medical Center (SMC), Seoul 06351, Republic of Korea
29
Zhang W, Lu F, Su H, Hu Y. Dual-branch multi-information aggregation network with transformer and convolution for polyp segmentation. Comput Biol Med 2024; 168:107760. [PMID: 38064849 DOI: 10.1016/j.compbiomed.2023.107760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2023] [Revised: 10/21/2023] [Accepted: 11/21/2023] [Indexed: 01/10/2024]
Abstract
Computer-Aided Diagnosis (CAD) for polyp detection offers one of the most notable showcases of deep learning: the accuracy of polyp segmentation is now surpassing human experts. A critical step in such CAD pipelines is segmenting colorectal polyps from colonoscopy images. Despite the remarkable successes of recent deep learning works, much improvement is still anticipated for challenging cases: motion blur and light reflection can introduce significant noise into the image, and polyps of the same type vary in size, color, and texture. To address these challenges, this paper proposes a novel dual-branch multi-information aggregation network (DBMIA-Net) for polyp segmentation, which can accurately, reliably, and efficiently segment a variety of colorectal polyps. Specifically, a dual-branch encoder with transformer and convolutional neural network (CNN) branches extracts polyp features, and two multi-information aggregation modules, a global information aggregation (GIA) module and an edge information aggregation (EIA) module, are applied in the decoder to fuse multi-scale features adaptively. In addition, to enhance the representation learning of latent channel feature associations, this paper also proposes a novel adaptive channel graph convolution (ACGC). To validate the effectiveness and advantages of the proposed network, we compare it with several state-of-the-art (SOTA) methods on five public datasets. Experimental results consistently demonstrate that DBMIA-Net obtains significantly superior segmentation performance across six widely used evaluation metrics. In particular, it achieves 94.12% mean Dice on the CVC-ClinicDB dataset, a 4.22% improvement over the previous state-of-the-art method PraNet. Compared with SOTA algorithms, DBMIA-Net has better fitting ability and stronger generalization ability.
Affiliation(s)
- Wenyu Zhang: School of Information Science and Engineering, Lanzhou University, China
- Fuxiang Lu: School of Information Science and Engineering, Lanzhou University, China
- Hongjing Su: School of Information Science and Engineering, Lanzhou University, China
- Yawen Hu: School of Information Science and Engineering, Lanzhou University, China
30
Jain S, Atale R, Gupta A, Mishra U, Seal A, Ojha A, Jaworek-Korjakowska J, Krejcar O. CoInNet: A Convolution-Involution Network With a Novel Statistical Attention for Automatic Polyp Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:3987-4000. [PMID: 37768798 DOI: 10.1109/tmi.2023.3320151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/30/2023]
Abstract
Polyps are very common abnormalities in the human gastrointestinal tract. Their early diagnosis may help reduce the risk of colorectal cancer. Vision-based computer-aided diagnostic systems automatically identify polyp regions to assist surgeons in their removal. Due to their varying shape, color, size, texture, and unclear boundaries, polyp segmentation in images is a challenging problem. Existing deep learning segmentation models mostly rely on convolutional neural networks, which have certain limitations in learning the diversity of visual patterns at different spatial locations and fail to capture inter-feature dependencies. Vision transformer models have also been deployed for polyp segmentation due to their powerful global feature extraction capabilities, but they too are supplemented by convolution layers for learning contextual local information. In the present paper, a polyp segmentation model, CoInNet, is proposed with a novel feature extraction mechanism that leverages the strengths of convolution and involution operations and learns to highlight polyp regions in images by considering the relationships between different feature maps through a statistical feature attention unit. To further aid the network in learning polyp boundaries, an anomaly boundary approximation module is introduced that uses recursively fed feature fusion to refine segmentation results. Remarkably, even tiny polyps occupying only 0.01% of an image area can be precisely segmented by CoInNet. This is crucial for clinical applications, as small polyps can be easily overlooked even in manual examination owing to the voluminous size of wireless capsule endoscopy videos. CoInNet outperforms thirteen state-of-the-art methods on five benchmark polyp segmentation datasets.
31
Xia Y, Yun H, Liu Y, Luan J, Li M. MGCBFormer: The multiscale grid-prior and class-inter boundary-aware transformer for polyp segmentation. Comput Biol Med 2023; 167:107600. [PMID: 37931522 DOI: 10.1016/j.compbiomed.2023.107600] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Revised: 09/23/2023] [Accepted: 10/17/2023] [Indexed: 11/08/2023]
Abstract
Polyp segmentation technology based on deep learning can help doctors diagnose polyps in the intestinal wall, precursors of colorectal cancer, better and faster. Mainstream polyp segmentation methods are fully supervised. These methods cannot sufficiently utilize expensive, precious pixel-level labels, and strengthening feature expression merely by adopting a more powerful backbone network, rather than fully mining the available polyp target information, is a misguided direction. To address this situation, the multiscale grid-prior and class-inter boundary-aware transformer (MGCBFormer) is proposed. MGCBFormer is composed of highly interpretable components: 1) the multiscale grid-prior and nested channel attention block (MGNAB) for seeking the optimal feature expression, 2) the class-inter boundary-aware block (CBB) for focusing on the foreground boundary and fully inhibiting the background boundary by combining the boundary preprocessing strategy, and 3) reasonable deep supervision branches and noise filters called the global double-axis association coupler (GDAC). Extensive experiments on five public polyp datasets (Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, CVC-300, and ETIS-LaribPolypDB), comparing against twelve polyp segmentation methods, demonstrate the superior predictive performance and generalization ability of MGCBFormer over the state-of-the-art polyp segmentation methods.
Affiliation(s)
- Yang Xia: School of Electronic Information Engineering, Changchun University, Changchun, 130022, China
- Haijiao Yun: School of Electronic Information Engineering, Changchun University, Changchun, 130022, China
- Yanjun Liu: School of Electronic Information Engineering, Changchun University, Changchun, 130022, China
- Jinyang Luan: School of Electronic Information Engineering, Changchun University, Changchun, 130022, China
- Mingjing Li: School of Electronic Information Engineering, Changchun University, Changchun, 130022, China
32
Liu W, Li Z, Li C, Gao H. ECTransNet: An Automatic Polyp Segmentation Network Based on Multi-scale Edge Complementary. J Digit Imaging 2023; 36:2427-2440. [PMID: 37491542 PMCID: PMC10584793 DOI: 10.1007/s10278-023-00885-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2023] [Revised: 07/13/2023] [Accepted: 07/14/2023] [Indexed: 07/27/2023] Open
Abstract
Colonoscopy is acknowledged as the foremost technique for detecting polyps and facilitating early screening and prevention of colorectal cancer. In clinical settings, the segmentation of polyps from colonoscopy images holds paramount importance, as it furnishes critical diagnostic and surgical information. Nevertheless, precise segmentation of colon polyp images remains challenging owing to the varied sizes and morphological features of colon polyps and the indistinct boundary between polyps and mucosa. In this study, we present a novel network architecture named ECTransNet to address these challenges. Specifically, we propose an edge complementary module that effectively fuses the differences between features at multiple resolutions. This enables the network to exchange features across different levels, resulting in a substantial improvement in the edge fineness of the polyp segmentation. Additionally, we utilize a feature aggregation decoder that leverages residual blocks to adaptively fuse high-order to low-order features. This strategy restores local edges in low-order features while preserving the spatial information of targets in high-order features, ultimately enhancing segmentation accuracy. Extensive experiments demonstrate that ECTransNet outperforms most state-of-the-art approaches on five publicly available datasets. Specifically, our method achieved mDice scores of 0.901 and 0.923 on the Kvasir-SEG and CVC-ClinicDB datasets, respectively. On the Endoscene, CVC-ColonDB, and ETIS datasets, we obtained mDice scores of 0.907, 0.766, and 0.728, respectively.
Affiliation(s)
- Weikang Liu: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Zhigang Li: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Chunyang Li: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Hongyan Gao: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
33
Wang Z, Gao F, Yu L, Tian S. UACENet: Uncertain area attention and cross‐image context extraction network for polyp segmentation. INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY 2023; 33:1973-1987. [DOI: 10.1002/ima.22906] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/10/2022] [Accepted: 04/23/2023] [Indexed: 12/09/2024]
Abstract
Accurately segmenting polyps from colonoscopy images is essential for early screening and diagnosis of colorectal cancer. In recent years, with the encoder-decoder architecture, many advanced methods have been applied to this task and have achieved significant improvements. However, accurate segmentation of polyps remains challenging due to the irregular shape and size of polyps, the low contrast between polyp and background in some images, and environmental influences such as illumination and mucus. To tackle these challenges, we propose a novel uncertain area attention and cross-image context extraction network for accurate polyp segmentation, which consists of the uncertain area attention module (UAAM), the cross-image context extraction module (CCEM), and the adaptive fusion module (AFM). UAAM is guided by the output prediction of the adjacent decoding layer and focuses on the difficult boundary region without neglecting the background and foreground, so that more edge details and uncertain information can be captured. CCEM innovatively captures the multi-scale global context within an image and the implicit contextual information between multiple images, fusing them to enhance the extraction of global location information. AFM fuses the local detail information extracted by UAAM and the global location information extracted by CCEM with the decoding layer features, applying multiple fusions and adaptive attention to enhance the feature representation. Our method is extensively evaluated on four public datasets and generally achieves state-of-the-art performance compared to other advanced methods.
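The exact form of the uncertain area attention is not given in the abstract. A common formulation, sketched here as an assumption, derives per-pixel uncertainty from the adjacent decoder's sigmoid prediction (maximal at p = 0.5) and uses it to reweight features without suppressing confident foreground or background:

```python
import numpy as np

def uncertainty_map(prob):
    """Per-pixel uncertainty from a sigmoid prediction: 0 where the
    model is confident (p near 0 or 1), 1 where p is near 0.5."""
    return 1.0 - np.abs(2.0 * prob - 1.0)

def uncertain_area_attention(feat, prob):
    """Hypothetical sketch: scale a feature map so that pixels the
    previous decoding layer is unsure about receive more attention;
    the weight stays in [1, 2], so confident regions are kept, not
    zeroed out."""
    att = 1.0 + uncertainty_map(prob)[..., None]
    return feat * att
```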
Affiliation(s)
- Zhi Wang: College of Software Engineering, Xinjiang University, Urumqi, China
- Feng Gao: Department of Gastroenterology, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, China
- Long Yu: College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Shengwei Tian: College of Software Engineering, Xinjiang University, Urumqi, China
34
Kuang H, Wang Y, Liang Y, Liu J, Wang J. BEA-Net: Body and Edge Aware Network With Multi-Scale Short-Term Concatenation for Medical Image Segmentation. IEEE J Biomed Health Inform 2023; 27:4828-4839. [PMID: 37578920 DOI: 10.1109/jbhi.2023.3304662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/16/2023]
Abstract
Medical image segmentation is indispensable for the diagnosis and prognosis of many diseases. To improve segmentation performance, this study proposes a new 2D body- and edge-aware network with multi-scale short-term concatenation for medical image segmentation. Multi-scale short-term concatenation modules, which concatenate successive convolution layers with different receptive fields, are proposed for capturing multi-scale representations with fewer parameters. Body generation modules, which adjust features using weight maps computed with enlarged receptive fields, and edge generation modules, which apply multi-scale convolutions with Sobel kernels for edge detection, are proposed to separately learn body and edge features from the convolutional features in the decoders, making the proposed network body and edge aware. Based on these modules, we design parallel body and edge decoders whose outputs are fused to achieve the final segmentation. In addition, deep supervision from the body and edge decoders is applied to ensure the effectiveness of the generated body and edge features and to further improve the final segmentation. The proposed method is trained and evaluated on six public medical image segmentation datasets to show its effectiveness and generality. Experimental results show that the proposed method achieves a better average Dice similarity coefficient and 95% Hausdorff distance than several benchmarks on all datasets used. Ablation studies validate the effectiveness of the proposed multi-scale representation learning modules, body and edge generation modules, and deep supervision.
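The edge generation modules rely on Sobel kernels, a standard gradient operator. A self-contained NumPy sketch of Sobel edge-magnitude extraction (the module's multi-scale wiring is omitted):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, kernel):
    """Plain 'valid' 2D cross-correlation for a 3x3 kernel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kernel)
    return out

def sobel_edges(img):
    """Edge-magnitude map from horizontal and vertical Sobel responses."""
    gx = conv2d_valid(img, SOBEL_X)
    gy = conv2d_valid(img, SOBEL_Y)
    return np.hypot(gx, gy)
```

A flat image produces a zero edge map, while a vertical intensity step yields strong responses along the step.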
35
Xu Z, Zhang X, Zhang H, Liu Y, Zhan Y, Lukasiewicz T. EFPN: Effective medical image detection using feature pyramid fusion enhancement. Comput Biol Med 2023; 163:107149. [PMID: 37348265 DOI: 10.1016/j.compbiomed.2023.107149] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Revised: 05/15/2023] [Accepted: 06/07/2023] [Indexed: 06/24/2023]
Abstract
Feature pyramid networks (FPNs) are widely used in existing deep detection models to help them exploit multi-scale features. However, FPN-based deep detection models face two multi-scale feature fusion problems in medical image detection tasks: insufficient fusion of multi-scale features and equal weighting of features at different scales. In this work, we therefore propose a new enhanced backbone, EFPN, to overcome these problems and help existing FPN-based detection models achieve much better medical image detection performance. We first introduce an additional top-down pyramid to help the detection network fuse deeper multi-scale information; then, a scale enhancement module is developed that uses kernels of different sizes to generate more diverse multi-scale features. Finally, we propose a feature fusion attention module to estimate and assign different importance weights to features with different depths and scales. Extensive experiments are conducted on two public lesion detection datasets of different medical image modalities (X-ray and MRI). On the mAP and mR evaluation metrics, EFPN-based Faster R-CNNs improve by 1.55% and 4.3% on the PenD (X-ray) dataset, and by 2.74% and 3.1% on the BraTs (MRI) dataset, respectively, achieving much better performance than the state-of-the-art baselines in medical image detection tasks. The proposed three improvements are all essential and effective for EFPN's superior performance; moreover, besides Faster R-CNNs, EFPN can be easily applied to other deep models to significantly enhance their performance in medical image detection tasks.
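The additional top-down pyramid extends the standard FPN top-down pathway, in which deeper (coarser) maps are repeatedly upsampled and added into finer lateral maps. A minimal NumPy sketch of that baseline pathway (the EFPN-specific scale enhancement and attention modules are omitted):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of an (H, W, C) map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def top_down_fuse(laterals):
    """FPN-style top-down pathway: start from the deepest (coarsest)
    lateral map and repeatedly upsample-and-add into the finer ones.
    `laterals` is ordered finest first; the result keeps that order."""
    fused = [laterals[-1]]
    for lat in reversed(laterals[:-1]):
        fused.append(lat + upsample2x(fused[-1]))
    return fused[::-1]
```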
Affiliation(s)
- Zhenghua Xu: State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, China
- Xudong Zhang: State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, China
- Hexiang Zhang: State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, China
- Yunxin Liu: State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, China
- Yuefu Zhan: Department of Radiology, Hainan Women and Children's Medical Center, Haikou, China
- Thomas Lukasiewicz: Institute of Logic and Computation, TU Wien, Vienna, Austria; Department of Computer Science, University of Oxford, Oxford, United Kingdom
36. Xu B, Fan Y, Liu J, Zhang G, Wang Z, Li Z, Guo W, Tang X. CHSNet: Automatic lesion segmentation network guided by CT image features for acute cerebral hemorrhage. Comput Biol Med 2023; 164:107334. PMID: 37573720. DOI: 10.1016/j.compbiomed.2023.107334.
Abstract
Stroke is a cerebrovascular disease that can lead to severe sequelae, such as hemiplegia and intellectual impairment, with a mortality rate of up to 40%. In this paper, we propose an automatic segmentation network (CHSNet) that segments lesions in cranial CT images based on the characteristics of acute cerebral hemorrhage images, such as high density, multiple scales, and variable location, and we realize three-dimensional (3D) visualization and localization of the cranial lesions once segmentation is complete. To enhance the feature representation of high-density regions and capture multi-scale and contextual information on the target location, we construct a convolutional neural network with an encoder-decoder backbone, a Res-RCL module, Atrous Spatial Pyramid Pooling, and Attention Gates. We collected images from 203 patients with acute cerebral hemorrhage, constructed a dataset containing 5998 cranial CT slices, and conducted comparative and ablation experiments on it to verify the effectiveness of our model. Our model achieved the best results on both test sets of differing segmentation difficulty: test1: Dice = 0.918, IoU = 0.853, ASD = 0.476, RVE = 0.113; test2: Dice = 0.716, IoU = 0.604, ASD = 5.402, RVE = 1.079. Based on the segmentation results, we achieved 3D visualization and localization of hemorrhage in CT images of stroke patients. The study has important implications for computer-aided clinical diagnosis.
Affiliation(s)
- Bohao Xu: School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China
- Yingwei Fan: School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China
- Jingming Liu: Emergency Department, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
- Guobin Zhang: Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
- Zhiping Wang: Department of Radiology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100050, China
- Zhili Li: BECHOICE (Beijing) Science and Technology Development Ltd., Beijing, 100050, China
- Wei Guo: Emergency Department, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
- Xiaoying Tang: School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China
37. Liu W, Li Z, Xia J, Li C. MCSF-Net: a multi-scale channel spatial fusion network for real-time polyp segmentation. Phys Med Biol 2023; 68:175041. PMID: 37582393. DOI: 10.1088/1361-6560/acf090.
Abstract
Colorectal cancer is a globally prevalent cancer type that necessitates prompt screening, and colonoscopy is the established diagnostic technique for identifying colorectal polyps. However, missed-polyp rates remain a concern. Detecting polyps early, while they are still precancerous, is vital for minimizing cancer-related mortality and economic impact. In the clinical setting, precise segmentation of polyps from colonoscopy images can provide valuable diagnostic and surgical information. Recent advances in computer-aided diagnostic systems, specifically those based on deep learning techniques, have shown promise in improving the detection rates of missed polyps, thereby assisting gastroenterologists in polyp identification. In the present investigation, we introduce MCSF-Net, a real-time automatic segmentation framework built on a multi-scale channel spatial fusion network. The proposed architecture leverages a multi-scale fusion module in conjunction with spatial and channel attention mechanisms to effectively amalgamate high-dimensional multi-scale features. Additionally, a feature complementation module extracts boundary cues from low-dimensional features, enhancing the representation of low-level features while keeping computational complexity to a minimum. Furthermore, we incorporate shape blocks to provide better supervision for precise identification of polyp boundary features. Our extensive evaluation of MCSF-Net on five publicly available benchmark datasets reveals that it outperforms several existing state-of-the-art approaches across different evaluation metrics, while running at an impressive ∼45 FPS, a notable advantage in terms of scalability and real-time segmentation.
Affiliation(s)
- Weikang Liu: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, 114051, People's Republic of China
- Zhigang Li: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, 114051, People's Republic of China
- Jiaao Xia: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, 114051, People's Republic of China
- Chunyang Li: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, 114051, People's Republic of China
38. Fang J, Lv N, Li J, Zhang H, Wen J, Yang W, Wu J, Wen Z. Decoupled learning for brain image registration. Front Neurosci 2023; 17:1246769. PMID: 37694117. PMCID: PMC10485259. DOI: 10.3389/fnins.2023.1246769.
Abstract
Image registration is an important component of medical image processing and intelligent analysis, and its accuracy greatly affects subsequent processing and analysis. This paper focuses on deep learning-based brain image registration and proposes unsupervised deep learning methods based on model decoupling and regularization learning. Specifically, we first decompose the highly ill-conditioned inverse problem of brain image registration into two simpler sub-problems to reduce model complexity. Two lightweight neural networks are then constructed to approximate the solutions of the two sub-problems, and an alternating-iteration training strategy is used to solve the problem. The performance of the algorithms utilizing model decoupling is evaluated through experiments on brain MRI images from the LPBA40 dataset. The experimental results demonstrate the superiority of the proposed algorithm over conventional learning methods on brain image registration tasks.
Affiliation(s)
- Jinwu Fang: Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, Shanghai, China; China Academy of Information and Communication Technology, Beijing, China; Industrial Internet Innovation Center (Shanghai) Co., Ltd., Shanghai, China
- Na Lv: School of Health and Social Care, Shanghai Urban Construction Vocational College, Shanghai, China
- Jia Li: Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, Shanghai, China
- Hao Zhang: Department of Mathematics, School of Science, Shanghai University, Shanghai, China
- Jiayuan Wen: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wan Yang: Department of Mathematics, School of Science, Shanghai University, Shanghai, China
- Jingfei Wu: School of Economics, Shanghai University, Shanghai, China
- Zhijie Wen: Department of Mathematics, School of Science, Shanghai University, Shanghai, China
|
39
|
Gong X, Zhang S. An Analysis of Plant Diseases Identification Based on Deep Learning Methods. THE PLANT PATHOLOGY JOURNAL 2023; 39:319-334. [PMID: 37550979 PMCID: PMC10412967 DOI: 10.5423/ppj.oa.02.2023.0034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/02/2023] [Revised: 05/25/2023] [Accepted: 06/12/2023] [Indexed: 08/09/2023]
Abstract
Plant disease is an important factor affecting crop yield. With their many types and complex conditions, plant diseases cause serious economic losses and constrain modern agriculture, so rapid, accurate, and early identification of crop diseases is of great significance. Recent developments in deep learning, especially convolutional neural networks (CNNs), have shown impressive performance in plant disease classification. However, most existing datasets for plant disease classification feature a uniform background rather than a real field environment. In addition, classification can only yield the category of a single disease and cannot locate multiple different diseases, which limits practical application. Object detection methods based on CNNs can overcome these shortcomings and have broad application prospects. In this study, an annotated apple leaf disease dataset captured in a real field environment was first constructed to compensate for the lack of existing datasets. The Faster R-CNN and YOLOv3 architectures were then trained to detect apple leaf diseases on this dataset. Finally, comparative experiments were conducted and a variety of evaluation indicators were analyzed. The experimental results demonstrate that deep learning algorithms represented by YOLOv3 and Faster R-CNN are feasible for plant disease detection, each with its own strengths and weaknesses.
Affiliation(s)
- Xulu Gong: College of Agricultural Engineering, Shanxi Agricultural University, Jinzhong 030801, China; School of Software, Shanxi Agricultural University, Jinzhong 030801, China
- Shujuan Zhang: College of Agricultural Engineering, Shanxi Agricultural University, Jinzhong 030801, China
40. Chong Y, Xie N, Liu X, Pan S. P-TransUNet: an improved parallel network for medical image segmentation. BMC Bioinformatics 2023; 24:285. PMID: 37464322. DOI: 10.1186/s12859-023-05409-7.
Abstract
Deep learning-based medical image segmentation has made great progress over the past decades. Scholars have proposed many novel transformer-based segmentation networks to address the difficulty convolutional neural networks (CNNs) have in building long-range dependencies and global context connections. However, these methods usually replace CNN-based blocks with improved transformer-based structures, which sacrifices local feature-extraction ability and requires huge amounts of training data. Moreover, these methods pay little attention to edge information, which is essential in medical image segmentation. To address these problems, we propose a new network structure, called P-TransUNet. It combines the designed efficient P-Transformer with a fusion module, which extract distance-aware long-range dependencies and local information respectively and produce fused features. Besides, we introduce an edge loss into training to focus the network's attention on the edges of the lesion area and improve segmentation performance. Extensive experiments across four medical image segmentation tasks demonstrate the effectiveness of P-TransUNet and show that our network outperforms other state-of-the-art methods.
Affiliation(s)
- Yanwen Chong: The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
- Ningdi Xie: The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
- Xin Liu: The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
- Shaoming Pan: The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
41. Jin Q, Hou H, Zhang G, Li Z. FEGNet: A Feedback Enhancement Gate Network for Automatic Polyp Segmentation. IEEE J Biomed Health Inform 2023; 27:3420-3430. PMID: 37126617. DOI: 10.1109/jbhi.2023.3272168.
Abstract
Regular colonoscopy is an effective way to prevent colorectal cancer by detecting colorectal polyps, and automatic polyp segmentation significantly aids clinicians in precisely locating polyp areas for further diagnosis. However, polyp segmentation is a challenging problem, since polyps appear in a variety of shapes, sizes, and textures, and tend to have ambiguous boundaries. In this paper, we propose a U-shaped model named Feedback Enhancement Gate Network (FEGNet) for accurate polyp segmentation that overcomes these difficulties. Specifically, for the high-level features, we design a novel Recurrent Gate Module (RGM) based on a feedback mechanism, which can refine attention maps without any additional parameters. The RGM consists of a Feature Aggregation Attention Gate (FAAG), which aggregates context and feedback information, and a Multi-Scale Module (MSM), which captures the multi-scale information critical for the segmentation task. In addition, we propose a straightforward but effective edge-extraction module that detects polyp boundaries from low-level features and is used to guide the training of early features. In our experiments, quantitative and qualitative evaluations show that the proposed FEGNet achieves the best polyp segmentation results among state-of-the-art models on five colonoscopy datasets.
42. Srivastava A, Jha D, Keles E, Aydogan B, Abazeed M, Bagci U. An Efficient Multi-Scale Fusion Network for 3D Organs at Risk (OARs) Segmentation. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. PMID: 38082949. DOI: 10.1109/embc40787.2023.10340307.
Abstract
Accurate segmentation of organs at risk (OARs) is a precursor to optimizing radiation therapy planning. Existing deep learning-based multi-scale fusion architectures have demonstrated a tremendous capacity for 2D medical image segmentation; the key to their success is aggregating global context while maintaining high-resolution representations. However, when translated to 3D segmentation problems, existing multi-scale fusion architectures may underperform due to their heavy computational overhead and substantial data requirements. To address this issue, we propose a new OAR segmentation framework, called OARFocalFuseNet, which fuses multi-scale features and employs focal modulation to capture global-local context across multiple scales. Each resolution stream is enriched with features from other resolution scales, and multi-scale information is aggregated to model diverse contextual ranges, further boosting the feature representations. Comprehensive comparisons in our experimental setup, covering OAR segmentation as well as multi-organ segmentation, show that the proposed OARFocalFuseNet outperforms recent state-of-the-art methods on the publicly available OpenKBP dataset and on Synapse multi-organ segmentation. Both proposed methods (3D-MSF and OARFocalFuseNet) showed promising performance on standard evaluation metrics. Our best-performing method (OARFocalFuseNet) obtained a Dice coefficient of 0.7995 and a Hausdorff distance of 5.1435 on OpenKBP, and a Dice coefficient of 0.8137 on the Synapse multi-organ segmentation dataset. Our code is available at https://github.com/NoviceMAn-prog/OARFocalFuse.
43. Khan MS, Ali S, Lee YR, Park SY, Tak WY, Jung SK. Cell Nuclei Segmentation With Dynamic Token-Based Attention Network. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. PMID: 38083030. DOI: 10.1109/embc40787.2023.10340818.
Abstract
Cell nuclei segmentation is crucial for analyzing cell structure in tasks such as cell identification and classification for treating various diseases. Several convolutional neural network-based architectures have been proposed for segmenting cell nuclei. Although these methods show superior performance, they struggle to predict reliable masks on biomedical image data. This paper proposes a novel Dynamic Token-based Attention Network (DTA-Net). Combining convolutional neural networks (CNNs) with a vision transformer (ViT) allows us to capture detailed spatial information from images efficiently by encoding local and global features. The Dynamic Token-based Attention (DTA) module calculates attention maps while keeping the overall computational and training costs minimal. For the nuclei segmentation task on the 2018 Data Science Bowl dataset, our proposed method outperformed SOTA networks with the highest Dice similarity coefficient (DSC) of 93.02% and Intersection over Union (IoU) of 87.91%, without using image pre- or post-processing techniques. The results show that high-quality segmentation masks can be obtained by configuring a ViT in the most straightforward manner. Clinical relevance: in this work, the segmentation of cell nuclei in microscopy images is carried out automatically, irrespective of their appearance, density, magnification, illumination, and modality.
44. Dumitru RG, Peteleaza D, Craciun C. Using DUCK-Net for polyp image segmentation. Sci Rep 2023; 13:9803. PMID: 37328572. PMCID: PMC10276013. DOI: 10.1038/s41598-023-36940-5.
Abstract
This paper presents a novel supervised convolutional neural network architecture, "DUCK-Net", capable of effectively learning and generalizing from small amounts of medical images to perform accurate segmentation tasks. Our model utilizes an encoder-decoder structure with a residual downsampling mechanism and a custom convolutional block to capture and process image information at multiple resolutions in the encoder segment. We employ data augmentation techniques to enrich the training set, thus increasing our model's performance. While our architecture is versatile and applicable to various segmentation tasks, in this study we demonstrate its capabilities specifically for polyp segmentation in colonoscopy images. We evaluate the performance of our method on several popular benchmark datasets for polyp segmentation, Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, and ETIS-LARIBPOLYPDB, showing that it achieves state-of-the-art results in terms of mean Dice coefficient, Jaccard index, Precision, Recall, and Accuracy. Our approach demonstrates strong generalization capabilities, achieving excellent performance even with limited training data.
45. An EffcientNet-encoder U-Net Joint Residual Refinement Module with Tversky–Kahneman Baroni–Urbani–Buser loss for biomedical image Segmentation. Biomed Signal Process Control 2023. DOI: 10.1016/j.bspc.2023.104631.
46. Yang L, Fan C, Lin H, Qiu Y. Rema-Net: An efficient multi-attention convolutional neural network for rapid skin lesion segmentation. Comput Biol Med 2023; 159:106952. PMID: 37084639. DOI: 10.1016/j.compbiomed.2023.106952.
Abstract
For clinical treatment, accurate segmentation of lesions from dermoscopic images is extremely valuable. Convolutional neural networks (such as U-Net and its numerous variants) have become the main methods for skin lesion segmentation in recent years. However, because these methods frequently have large numbers of parameters and complicated structures, resulting in high hardware requirements and long training times, they are difficult to use effectively for fast training and segmentation tasks. For this reason, we propose an efficient multi-attention convolutional neural network (Rema-Net) for rapid skin lesion segmentation. The down-sampling module of the network uses only a convolutional layer and a pooling layer, with spatial attention added to enhance useful features. We also designed skip-connections between the down-sampling and up-sampling parts of the network, applying a reverse attention operation on the skip-connections to strengthen the network's segmentation performance. We conducted extensive experiments on five publicly available datasets, ISIC-2016, ISIC-2017, ISIC-2018, PH2, and HAM10000, to validate the effectiveness of our method. The results show that the proposed method reduces the number of parameters by nearly 40% compared with U-Net, that its segmentation metrics are significantly better than those of some previous methods, and that its predictions are closer to the real lesion.
Affiliation(s)
- Litao Yang: School of Information Science and Engineering, Henan University of Technology, Zhengzhou City, Henan Province, 450001, China
- Chao Fan: School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou City, Henan Province, 450001, China; Key Laboratory of Grain Information Processing and Control, Ministry of Education, Zhengzhou City, Henan Province, 450001, China
- Hao Lin: School of Information Science and Engineering, Henan University of Technology, Zhengzhou City, Henan Province, 450001, China
- Yingying Qiu: School of Information Science and Engineering, Henan University of Technology, Zhengzhou City, Henan Province, 450001, China
47. Zhan B, Song E, Liu H. FSA-Net: Rethinking the attention mechanisms in medical image segmentation from releasing global suppressed information. Comput Biol Med 2023; 161:106932. PMID: 37230013. DOI: 10.1016/j.compbiomed.2023.106932.
Abstract
Attention mechanism-based medical image segmentation methods have developed rapidly in recent years. For attention mechanisms, it is crucial to accurately capture the distribution weights of the effective features contained in the data. To accomplish this, most attention mechanisms use a global squeezing approach. However, this leads to over-focusing on the globally most salient effective features of the region of interest while suppressing secondary salient ones, so that some fine-grained features are discarded outright. To address this issue, we propose a multiple-local-perception method to aggregate global effective features, and design a fine-grained medical image segmentation network, named FSA-Net. This network consists of two key components: (1) novel Separable Attention Mechanisms, which replace global squeezing with local squeezing to release the suppressed secondary salient effective features; and (2) a Multi-Attention Aggregator (MAA), which fuses multi-level attention to efficiently aggregate task-relevant semantic information. We conduct extensive experimental evaluations on six publicly available medical image segmentation datasets: MoNuSeg, COVID-19-CT100, GlaS, CVC-ClinicDB, ISIC2018, and DRIVE. Experimental results show that FSA-Net outperforms state-of-the-art methods in medical image segmentation.
Affiliation(s)
- Bangcheng Zhan: School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Enmin Song: School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Hong Liu: School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
48. Baskaran D, Nagamani Y, Merugula S, Premnath SP. MSRFNet for skin lesion segmentation and deep learning with hybrid optimization for skin cancer detection. Imaging Sci J 2023. DOI: 10.1080/13682199.2023.2187518.
49. Ahmed MR, Ashrafi AF, Ahmed RU, Shatabda S, Islam AKMM, Islam S. DoubleU-NetPlus: a novel attention and context-guided dual U-Net with multi-scale residual feature fusion network for semantic segmentation of medical images. Neural Comput Appl 2023. DOI: 10.1007/s00521-023-08493-1.
50. Pan S, Liu X, Xie N, Chong Y. EG-TransUNet: a transformer-based U-Net with enhanced and guided models for biomedical image segmentation. BMC Bioinformatics 2023; 24:85. PMID: 36882688. PMCID: PMC9989586. DOI: 10.1186/s12859-023-05196-1.
Abstract
Although various methods based on convolutional neural networks have improved biomedical image segmentation enough to meet the precision requirements of medical imaging segmentation tasks, deep learning-based medical image segmentation still faces two problems: (1) difficulty extracting discriminative features of the lesion region during encoding, due to its variable size and shape; and (2) difficulty effectively fusing spatial and semantic information of the lesion region during decoding, due to redundant information and the semantic gap. In this paper, we use attention-based Transformers in both the encoder and decoder stages to improve feature discrimination at the level of spatial detail and semantic location through multihead self-attention. To this end, we propose an architecture called EG-TransUNet, comprising three transformer-improved modules: a progressive enhancement module, channel spatial attention, and semantic guidance attention. The proposed EG-TransUNet architecture allows us to capture object variability with improved results on different biomedical datasets. EG-TransUNet outperformed other methods on two popular colonoscopy datasets (Kvasir-SEG and CVC-ClinicDB), achieving 93.44% and 95.26% mDice, respectively. Extensive experiments and visualization results demonstrate that our method advances performance on five medical segmentation datasets with better generalization ability.
Affiliation(s)
- Shaoming Pan: The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
- Xin Liu: The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
- Ningdi Xie: The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
- Yanwen Chong: The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China