1.
Wei X, Sun J, Su P, Wan H, Ning Z. BCL-Former: Localized Transformer Fusion with Balanced Constraint for polyp image segmentation. Comput Biol Med 2024; 182:109182. [PMID: 39341109] [DOI: 10.1016/j.compbiomed.2024.109182] [Received: 03/12/2024] [Revised: 09/18/2024] [Accepted: 09/19/2024] [Indexed: 09/30/2024]
Abstract
Polyp segmentation remains challenging for two reasons: (a) the size and shape of colon polyps are variable and diverse; (b) the distinction between polyps and mucosa is not obvious. To solve these two problems and enhance the generalization ability of the segmentation method, we propose the Localized Transformer Fusion with Balanced Constraint (BCL-Former) for polyp segmentation. In BCL-Former, the Strip Local Enhancement module (SLE module) is proposed to capture enhanced local features. The Progressive Feature Fusion module (PFF module) is presented to make feature aggregation smoother and to eliminate the difference between high-level and low-level features. Moreover, the Tversky-based Appropriate Constrained Loss (TacLoss) is proposed to achieve balance and constraint between true positives and false negatives, improving the ability to generalize across datasets. Extensive experiments are conducted on four benchmark datasets. Results show that our proposed method achieves state-of-the-art performance in both segmentation precision and generalization ability. The proposed method is also 5%-8% faster than the benchmark method in training and inference. The code is available at: https://github.com/sjc-lbj/BCL-Former.
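TacLoss builds on the Tversky index; the abstract does not give the exact balancing constraint, but the underlying Tversky loss that trades off false positives against false negatives can be sketched as follows (the `alpha`/`beta` weights are illustrative assumptions, not the paper's values):

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Tversky loss: alpha weights false positives, beta weights false negatives."""
    pred = pred.astype(float).ravel()
    target = target.astype(float).ravel()
    tp = np.sum(pred * target)            # (soft) true positives
    fp = np.sum(pred * (1.0 - target))    # false positives
    fn = np.sum((1.0 - pred) * target)    # false negatives
    tversky = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky

# A perfect prediction drives the loss to zero
print(tversky_loss(np.ones((4, 4)), np.ones((4, 4))))  # → 0.0
```

Setting `beta > alpha` penalizes false negatives more heavily, which is the usual motivation for Tversky-style losses in lesion segmentation.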
Affiliation(s)
- Xin Wei
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
- Jiacheng Sun
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
- Pengxiang Su
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
- Huan Wan
- School of Computer Information Engineering, Jiangxi Normal University, 99 Ziyang Avenue, Nanchang, 330022, China
- Zhitao Ning
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
2.
Bobowicz M, Badocha M, Gwozdziewicz K, Rygusik M, Kalinowska P, Szurowska E, Dziubich T. Segmentation-based BI-RADS ensemble classification of breast tumours in ultrasound images. Int J Med Inform 2024; 189:105522. [PMID: 38852288] [DOI: 10.1016/j.ijmedinf.2024.105522] [Received: 03/13/2024] [Revised: 05/19/2024] [Accepted: 06/05/2024] [Indexed: 06/11/2024]
Abstract
BACKGROUND The development of computer-aided diagnosis systems in breast cancer imaging is growing exponentially. Since 2016, 81 papers have described the automated segmentation of breast lesions in ultrasound images using artificial intelligence. However, only two papers have dealt with complex BI-RADS classifications. PURPOSE This study addresses the automatic classification of breast lesions into binary classes (benign vs. malignant) and multiple BI-RADS classes based on a single ultrasonographic image. Achieving this task should reduce the subjectivity of an individual operator's assessment. MATERIALS AND METHODS Automatic image segmentation methods (PraNet, CaraNet and FCBFormer) adapted to the specific segmentation task were investigated using the U-Net model as a reference. A new classification method was developed using an ensemble of selected segmentation approaches. All experiments were performed on the publicly available BUS B, OASBUD and BUSI datasets and a private dataset. RESULTS FCBFormer achieved the best outcomes for the segmentation task, with intersection-over-union values of 0.81, 0.80 and 0.73 and Dice values of 0.89, 0.87 and 0.82 for the BUS B, BUSI and OASBUD datasets, respectively. Through a series of experiments, we determined that adding an extra 30-pixel margin to the segmentation mask counteracts potential errors introduced by the segmentation algorithm. An ensemble of the full-image classifier, bounding-box classifier and masked-image classifier was the most accurate for binary classification, with the best accuracy (ACC; 0.908), F1 (0.846) and area under the receiver operating characteristic curve (AUROC; 0.871) on BUS B, and ACC (0.982), F1 (0.984) and AUROC (0.998) on the UCC BUS dataset, outperforming each classifier used separately. It was also the most effective for BI-RADS classification, with ACC of 0.953, F1 of 0.920 and AUROC of 0.986 on UCC BUS. Hard voting was the most effective method for dichotomous classification, while soft voting was employed for the multi-class BI-RADS classification. CONCLUSIONS The proposed classification approach, built on an ensemble of segmentation and classification methods, proved more accurate than most published results for binary and multi-class BI-RADS classifications.
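The 30-pixel safety margin around the predicted mask can be implemented as a clipped bounding-box expansion. This is a minimal sketch of the idea, not the authors' actual code; the function name is illustrative:

```python
import numpy as np

def mask_to_expanded_bbox(mask, margin=30):
    """Bounding box of a binary mask, grown by `margin` pixels on every side
    and clipped to the image bounds."""
    ys, xs = np.nonzero(mask)
    y0 = max(int(ys.min()) - margin, 0)
    x0 = max(int(xs.min()) - margin, 0)
    y1 = min(int(ys.max()) + margin + 1, mask.shape[0])
    x1 = min(int(xs.max()) + margin + 1, mask.shape[1])
    return y0, y1, x0, x1  # crop with image[y0:y1, x0:x1]

mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:60, 40:60] = 1
print(mask_to_expanded_bbox(mask))  # → (10, 90, 10, 90)
```

The clipping matters near image borders, where a naive expansion would index outside the array.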
Affiliation(s)
- Maciej Bobowicz
- 2nd Department of Radiology, Medical University of Gdansk, 17 Smoluchowskiego Str., Gdansk 80-214, Poland
- Mikołaj Badocha
- 2nd Department of Radiology, Medical University of Gdansk, 17 Smoluchowskiego Str., Gdansk 80-214, Poland
- Katarzyna Gwozdziewicz
- 2nd Department of Radiology, Medical University of Gdansk, 17 Smoluchowskiego Str., Gdansk 80-214, Poland
- Marlena Rygusik
- 2nd Department of Radiology, Medical University of Gdansk, 17 Smoluchowskiego Str., Gdansk 80-214, Poland
- Paulina Kalinowska
- Department of Thoracic Radiology, Karolinska University Hospital, Anna Steckséns g 41, Solna 17176, Sweden
- Edyta Szurowska
- 2nd Department of Radiology, Medical University of Gdansk, 17 Smoluchowskiego Str., Gdansk 80-214, Poland
- Tomasz Dziubich
- Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, 11/12 G. Narutowicza Str., Gdańsk 80-233, Poland
3.
Tang S, Ran H, Yang S, Wang Z, Li W, Li H, Meng Z. A frequency selection network for medical image segmentation. Heliyon 2024; 10:e35698. [PMID: 39220902] [PMCID: PMC11365330] [DOI: 10.1016/j.heliyon.2024.e35698] [Received: 05/07/2024] [Revised: 07/18/2024] [Accepted: 08/01/2024] [Indexed: 09/04/2024]
Abstract
Existing medical image segmentation methods may consider feature extraction and information processing only in the spatial domain, may lack an explicit design for the interaction between frequency and spatial information, or may ignore the semantic gaps between shallow and deep features, leading to inaccurate segmentation results. Therefore, in this paper we propose a novel frequency selection segmentation network (FSSN), which achieves more accurate lesion segmentation by fusing local spatial features with global frequency information, designing better feature interactions, and suppressing low-correlation frequency components to mitigate semantic gaps. Firstly, we propose a global-local feature aggregation module (GLAM) that simultaneously captures multi-scale local features in the spatial domain and exploits global frequency information in the frequency domain, achieving a complementary fusion of local detail features and global frequency information. Secondly, we propose a feature filter module (FFM) that mitigates semantic gaps during cross-level feature fusion and enables FSSN to discriminatively determine which frequency information should be preserved for accurate lesion segmentation. Finally, to make better use of local information, especially the boundary of the lesion region, we employ deformable convolution (DC) to extract pertinent features in the local range, enabling FSSN to focus more effectively on relevant image content. Extensive experiments on two public benchmark datasets show that, compared with representative medical image segmentation methods, FSSN obtains more accurate lesion segmentation results in terms of both objective evaluation indicators and subjective visual effects, with fewer parameters and lower computational complexity.
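The "frequency selection" idea — keeping some frequency components of a feature map and suppressing the rest — can be illustrated with a hard FFT low-pass filter. FSSN learns which components to preserve; the fixed `keep_ratio` and rectangular mask below are illustrative assumptions:

```python
import numpy as np

def lowpass_select(feat, keep_ratio=0.25):
    """Keep only the lowest-frequency components of a 2D feature map."""
    f = np.fft.fftshift(np.fft.fft2(feat))       # centre the spectrum
    h, w = feat.shape
    rh = max(1, int(h * keep_ratio / 2))
    rw = max(1, int(w * keep_ratio / 2))
    mask = np.zeros((h, w), dtype=bool)
    mask[h // 2 - rh:h // 2 + rh, w // 2 - rw:w // 2 + rw] = True
    # zero out high-frequency components, then transform back
    return np.fft.ifft2(np.fft.ifftshift(np.where(mask, f, 0))).real
```

A constant (pure-DC) map passes through unchanged, while high-frequency noise is attenuated — the same selectivity the FFM applies adaptively per feature.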
Affiliation(s)
- Shu Tang
- Chongqing University of Posts and Telecommunications, No. 2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Haiheng Ran
- Chongqing University of Posts and Telecommunications, No. 2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Shuli Yang
- Chongqing University of Posts and Telecommunications, No. 2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Zhaoxia Wang
- Chongqing Emergency Medical Center, Chongqing University Central Hospital, School of Medicine, Chongqing University, Chongqing, China
- Wei Li
- Children's Hospital of Chongqing Medical University, China
- Haorong Li
- Chongqing University of Posts and Telecommunications, No. 2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Zihao Meng
- Chongqing University of Posts and Telecommunications, No. 2 Road of Chongwen, Nanan District, 400000, Chongqing, China
4.
Chang Q, Ahmad D, Toth J, Bascom R, Higgins WE. ESFPNet: Efficient Stage-Wise Feature Pyramid on Mix Transformer for Deep Learning-Based Cancer Analysis in Endoscopic Video. J Imaging 2024; 10:191. [PMID: 39194980] [DOI: 10.3390/jimaging10080191] [Received: 06/20/2024] [Revised: 07/19/2024] [Accepted: 08/01/2024] [Indexed: 08/29/2024]
Abstract
For patients at risk of developing either lung cancer or colorectal cancer, the identification of suspect lesions in endoscopic video is an important procedure. The physician performs an endoscopic exam by navigating an endoscope through the organ of interest, be it the lungs or intestinal tract, and performs a visual inspection of the endoscopic video stream to identify lesions. Unfortunately, this entails a tedious, error-prone search over a lengthy video sequence. We propose a deep learning architecture that enables the real-time detection and segmentation of lesion regions from endoscopic video, with our experiments focused on autofluorescence bronchoscopy (AFB) for the lungs and colonoscopy for the intestinal tract. Our architecture, dubbed ESFPNet, draws on a pretrained Mix Transformer (MiT) encoder and a decoder structure that incorporates a new Efficient Stage-Wise Feature Pyramid (ESFP) to promote accurate lesion segmentation. In comparison to existing deep learning models, the ESFPNet model gave superior lesion segmentation performance for an AFB dataset. It also produced superior segmentation results for three widely used public colonoscopy databases and nearly the best results for two other public colonoscopy databases. In addition, the lightweight ESFPNet architecture requires fewer model parameters and less computation than other competing models, enabling the real-time analysis of input video frames. Overall, these studies point to the combined superior analysis performance and architectural efficiency of the ESFPNet for endoscopic video analysis. Lastly, additional experiments with the public colonoscopy databases demonstrate the learning ability and generalizability of ESFPNet, implying that the model could be effective for region segmentation in other domains.
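The stage-wise decoding idea — fusing encoder stages progressively from deep to shallow — can be sketched as follows. Single-channel maps and nearest-neighbour upsampling keep the example minimal; the real ESFP decoder uses learned linear layers per stage:

```python
import numpy as np

def stagewise_fuse(stages):
    """Fuse feature maps stage by stage, deepest (lowest-resolution) first.
    Each stage doubles the spatial resolution of the previous one."""
    fused = stages[0]
    for feat in stages[1:]:
        up = fused.repeat(2, axis=0).repeat(2, axis=1)  # nearest 2x upsample
        fused = (up + feat) / 2.0                       # merge with next stage
    return fused

stages = [np.ones((2, 2)), np.ones((4, 4)), np.ones((8, 8))]
print(stagewise_fuse(stages).shape)  # → (8, 8)
```

The output inherits the finest stage's resolution while carrying information aggregated from every deeper stage, which is what makes the decoder both accurate and cheap.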
Affiliation(s)
- Qi Chang
- School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA
- Danish Ahmad
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
- Jennifer Toth
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
- Rebecca Bascom
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
- William E Higgins
- School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA
5.
Xu Z, Miao Y, Chen G, Liu S, Chen H. GLGFormer: Global Local Guidance Network for Mucosal Lesion Segmentation in Gastrointestinal Endoscopy Images. J Imaging Inform Med 2024:10.1007/s10278-024-01162-2. [PMID: 38940891] [DOI: 10.1007/s10278-024-01162-2] [Received: 01/07/2024] [Revised: 05/05/2024] [Accepted: 06/03/2024] [Indexed: 06/29/2024]
Abstract
Automatic mucosal lesion segmentation is a critical component in computer-aided clinical support systems for endoscopic image analysis. Image segmentation networks currently rely mainly on convolutional neural networks (CNNs) and Transformers, which have demonstrated strong performance in various applications. However, they cannot cope with blurred lesion boundaries and lesions of different scales in gastrointestinal endoscopy images. To address these challenges, we propose a new Transformer-based network, named GLGFormer, for the task of mucosal lesion segmentation. Specifically, we design the global guidance module to guide single-scale features patch-wise, enabling them to incorporate global information from the global map without information loss. Furthermore, a partial decoder is employed to fuse these enhanced single-scale features, achieving single-scale to multi-scale enhancement. Additionally, the local guidance module is designed to refocus attention on the neighboring patch, thus enhancing local features and refining lesion boundary segmentation. We conduct experiments on a private atrophic gastritis segmentation dataset and four public gastrointestinal polyp segmentation datasets. Compared to the current lesion segmentation networks, our proposed GLGFormer demonstrates outstanding learning and generalization capabilities. On the public dataset ClinicDB, GLGFormer achieved a mean intersection over union (mIoU) of 91.0% and a mean dice coefficient (mDice) of 95.0%. On the private dataset Gastritis-Seg, GLGFormer achieved an mIoU of 90.6% and an mDice of 94.6%.
Affiliation(s)
- Zhiyang Xu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, School of Information and Control Engineering, Advanced Robotics Research Center, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, P. R. China
- Yanzi Miao
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, School of Information and Control Engineering, Advanced Robotics Research Center, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, P. R. China
- Guangxia Chen
- Department of Gastroenterology, Xuzhou Municipal Hospital Affiliated to Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
- Shiyu Liu
- Department of Gastroenterology, Xuzhou Municipal Hospital Affiliated to Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
- Hu Chen
- The First Clinical Medical School of Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
6.
Su D, Luo J, Fei C. An Efficient and Rapid Medical Image Segmentation Network. IEEE J Biomed Health Inform 2024; 28:2979-2990. [PMID: 38457317] [DOI: 10.1109/jbhi.2024.3374780] [Indexed: 03/10/2024]
Abstract
Accurate medical image segmentation is an essential part of the medical image analysis process that provides detailed quantitative metrics. In recent years, extensions of classical networks such as UNet have achieved state-of-the-art performance on medical image segmentation tasks. However, the high model complexity of these networks limits their applicability to devices with constrained computational resources. To alleviate this problem, we propose a shallow hierarchical Transformer for medical image segmentation, called SHFormer. By decreasing the number of transformer blocks utilized, the model complexity of SHFormer can be reduced to an acceptable level. To improve the learned attention while keeping the structure lightweight, we propose a spatial-channel connection module. This module separately learns attention in the spatial and channel dimensions of the feature while interconnecting them to produce more focused attention. To keep the decoder lightweight, the MLP-D module is proposed to progressively fuse multi-scale features in which channels are aligned using Multi-Layer Perceptron (MLP) and spatial information is fused by convolutional blocks. We first validated the performance of SHFormer on the ISIC-2018 dataset. Compared to the latest network, SHFormer exhibits comparable performance with 15 times fewer parameters, 30 times lower computational complexity and 5 times higher inference efficiency. To test the generalizability of SHFormer, we introduced the polyp dataset for additional testing. SHFormer achieves comparable segmentation accuracy to the latest network while having lower computational overhead.
7.
Bai PR, Song XF, Liu QY, Liu JH, Cheng J, Xiu XN, Ren YD, Wang CJ. [Automatic detection method of intracranial aneurysms on maximum intensity projection images based on SE-CaraNet]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (Journal of Biomedical Engineering) 2024; 41:228-236. [PMID: 38686402] [PMCID: PMC11058495] [DOI: 10.7507/1001-5515.202301008] [Received: 01/05/2023] [Revised: 01/13/2024] [Indexed: 05/02/2024]
Abstract
Conventional maximum intensity projection (MIP) images tend to ignore some morphological features in the detection of intracranial aneurysms, resulting in missed and false detections. To solve this problem, a new method for intracranial aneurysm detection based on omni-directional MIP images is proposed in this paper. Firstly, the three-dimensional magnetic resonance angiography (MRA) images were projected by maximum intensity in all directions to obtain the MIP images. Then, candidate intracranial aneurysm regions were pre-located by matched filtering. Finally, the Squeeze-and-Excitation (SE) module was used to improve the CaraNet model, and the improved model was applied to the candidate locations in the omni-directional MIP images to determine whether an intracranial aneurysm was present. In this paper, 245 cases were collected to test the proposed method. The results showed that the accuracy and specificity of the proposed method reached 93.75% and 93.86%, respectively, significantly improving the detection performance for intracranial aneurysms in MIP images.
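The Squeeze-and-Excitation recalibration used to improve CaraNet can be sketched in a few lines; the two weight matrices stand in for the block's learned fully connected layers:

```python
import numpy as np

def squeeze_excitation(feat, w1, w2):
    """SE block: global average pool, two FC layers, sigmoid channel gates.
    feat: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    z = feat.mean(axis=(1, 2))                  # squeeze: per-channel descriptor
    s = np.maximum(w1 @ z, 0.0)                 # excitation: FC + ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))     # FC + sigmoid -> channel weights
    return feat * gates[:, None, None]          # recalibrate each channel

# With zero weights every gate is sigmoid(0) = 0.5
feat = np.ones((4, 2, 2))
out = squeeze_excitation(feat, np.zeros((2, 4)), np.zeros((4, 2)))
print(out[0, 0, 0])  # → 0.5
```

Trained gates learn to amplify informative channels and suppress the rest, which is what helps the detector distinguish aneurysm candidates from vessel background.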
Affiliation(s)
- Peirui Bai (白培瑞)
- School of Electronic Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, P. R. China
- Xuefeng Song (宋雪峰)
- School of Electronic Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, P. R. China
- Qingyi Liu (刘庆一)
- School of Electronic Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, P. R. China
- Jiahui Liu (刘佳慧)
- School of Electronic Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, P. R. China
- Jin Cheng (成锦)
- School of Electronic Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, P. R. China
- Xiaona Xiu (修晓娜)
- School of Electronic Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, P. R. China
- Yande Ren (任延德)
- School of Electronic Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, P. R. China
- Chengjian Wang (王成健)
- School of Electronic Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, P. R. China
8.
Dai H, Xie W, Xia E. SK-Unet++: An improved Unet++ network with adaptive receptive fields for automatic segmentation of ultrasound thyroid nodule images. Med Phys 2024; 51:1798-1811. [PMID: 37606374] [DOI: 10.1002/mp.16672] [Received: 02/15/2023] [Revised: 07/26/2023] [Accepted: 07/27/2023] [Indexed: 08/23/2023]
Abstract
BACKGROUND The quality of segmentation of thyroid nodules in ultrasound images is a crucial factor in preventing the cancerization of thyroid nodules. However, the existing standards for the ultrasound imaging of cancerous nodules have limitations, and changes in the echo pattern of thyroid nodules pose challenges in accurately segmenting nodules, which can affect the diagnostic results of medical professionals. PURPOSE The aim of this study is to address the challenges to segmentation accuracy posed by noise, low contrast, morphological scale variations, and blurred edges of thyroid nodules in ultrasound images, and to improve the accuracy of ultrasound-based thyroid nodule segmentation, thereby aiding the clinical diagnosis of thyroid nodules. METHOD In this study, the dataset of thyroid ultrasound images was obtained from Hunan Provincial People's Hospital and consists of 3572 samples used for the training, validation, and testing of the model at a ratio of 8:1:1. A novel SK-Unet++ network was used to enhance the segmentation accuracy of thyroid nodules. SK-Unet++ is a deep learning architecture that adds adaptive receptive fields, based on selective kernel (SK) attention mechanisms, to the Unet++ network. The convolution blocks of the original UNet++ encoder were replaced with finer SK convolution blocks. First, multiple skip connections were incorporated so that information from earlier layers of the network can bypass certain layers and propagate directly to subsequent layers; the feature maps at corresponding locations are fused along the channel dimension, enhancing segmentation accuracy. Second, adaptive receptive fields were added to better capture multiscale spatial features by dynamically adjusting the receptive field. The assessment metrics included the Dice similarity coefficient (Dsc), accuracy (Acc), precision (Pre), recall (Re), and Hausdorff distance, and all comparison experiments used paired t-tests to assess whether statistically significant performance differences existed (p < 0.05); to address the multiple-comparison problem, false discovery rate (FDR) correction was applied after the tests. RESULTS The segmentation model had an Acc of 80.6%, Dsc of 84.7%, Pre of 77.5%, Re of 71.7%, and an average Hausdorff distance of 15.80 mm. Ablation results demonstrated that each module in the network contributes to the improved performance (p < 0.05) and determined the best combination of parameters. A comparison with other state-of-the-art methods showed that SK-Unet++ significantly outperforms them in segmentation performance (p < 0.05), with a more accurate segmentation contour. Additionally, the adaptive weights of the SK module were monitored during training, and the resulting change curves demonstrated their convergence. CONCLUSION Our proposed method demonstrates favorable performance in the segmentation of ultrasound images of thyroid nodules. The results confirm that SK-Unet++ is a feasible and effective method for the automatic segmentation of thyroid nodules in ultrasound images. The high accuracy achieved by our method can facilitate efficient screening of patients with thyroid nodules, ultimately reducing the workload of clinicians and radiologists.
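The selective-kernel mechanism behind the adaptive receptive fields softly weights parallel branches with different kernel sizes via a per-channel softmax. A minimal sketch follows; in the real SK block the attention logits come from learned FC layers rather than being passed in:

```python
import numpy as np

def selective_kernel_fuse(branches, logits):
    """branches: list of (C, H, W) maps from kernels of different sizes;
    logits: (len(branches), C) per-channel attention scores."""
    a = np.exp(logits - logits.max(axis=0, keepdims=True))
    a = a / a.sum(axis=0, keepdims=True)            # softmax across branches
    return sum(w[:, None, None] * b for w, b in zip(a, branches))

# Equal logits reduce to a plain average of the branches
b3 = np.full((2, 4, 4), 1.0)   # e.g. 3x3-kernel branch
b5 = np.full((2, 4, 4), 3.0)   # e.g. 5x5-kernel branch
out = selective_kernel_fuse([b3, b5], np.zeros((2, 2)))
print(out[0, 0, 0])  # → 2.0
```

Learning the logits per channel lets each channel pick an effective receptive field, which is the "adaptive" part the abstract describes.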
Affiliation(s)
- Hong Dai
- Department of Ultrasound Medicine, Hunan Provincial People's Hospital, Changsha, China
- Wufei Xie
- School of Automation, Central South University, Changsha, China
- E Xia
- School of Automation, Central South University, Changsha, China
9.
Wang Z, Yu L, Tian S, Huo X. CRMEFNet: A coupled refinement, multiscale exploration and fusion network for medical image segmentation. Comput Biol Med 2024; 171:108202. [PMID: 38402839] [DOI: 10.1016/j.compbiomed.2024.108202] [Received: 07/09/2023] [Revised: 12/22/2023] [Accepted: 02/18/2024] [Indexed: 02/27/2024]
Abstract
Accurate segmentation of target areas in medical images, such as lesions, is essential for disease diagnosis and clinical analysis. In recent years, deep learning methods have been intensively researched and have generated significant progress in medical image segmentation tasks. However, most existing methods have limitations in modeling multilevel feature representations and in identifying complex textured pixels at contrasting boundaries. This paper proposes a novel coupled refinement, multiscale exploration and fusion network (CRMEFNet) for medical image segmentation, which explores the optimization and fusion of multiscale features to address the abovementioned limitations. CRMEFNet consists of three main innovations: a coupled refinement module (CRM), a multiscale exploration and fusion module (MEFM), and a cascaded progressive decoder (CPD). The CRM decouples features into low-frequency body features and high-frequency edge features and performs targeted optimization of both to enhance the intraclass uniformity and interclass differentiation of features. The MEFM performs a two-stage exploration and fusion of multiscale features using our proposed multiscale aggregation attention mechanism, which explores the differentiated information within cross-level features and enhances the contextual connections between features to achieve adaptive feature fusion. Compared to existing complex decoders, the CPD decoder (consisting of the CRM and MEFM) can perform fine-grained pixel recognition while retaining complete semantic location information, and it has a simple design and excellent performance. Experimental results from five medical image segmentation tasks, ten datasets and twelve comparison models demonstrate the state-of-the-art performance, interpretability, flexibility and versatility of CRMEFNet.
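The CRM's decoupling into a low-frequency "body" and a high-frequency "edge" component can be illustrated with a box blur and its residual. The real module uses learned filtering; the fixed box kernel here is an illustrative stand-in:

```python
import numpy as np

def decouple_body_edge(feat, k=3):
    """Split a 2D map into a smooth body (k x k box blur) and an edge residual.
    By construction, body + edge reconstructs the input exactly."""
    pad = k // 2
    padded = np.pad(feat, pad, mode='edge')
    body = np.zeros(feat.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            body += padded[i:i + feat.shape[0], j:j + feat.shape[1]]
    body /= k * k
    return body, feat - body
```

The body carries region-level content (intraclass uniformity), while the residual concentrates boundary detail (interclass differentiation), so each can be optimized with a targeted objective.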
Affiliation(s)
- Zhi Wang
- College of Software, Xinjiang University, Urumqi, 830000, China; Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China
- Long Yu
- College of Network Center, Xinjiang University, Urumqi, 830000, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, 830000, China
- Shengwei Tian
- College of Software, Xinjiang University, Urumqi, 830000, China; Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China
- Xiangzuo Huo
- Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, 830000, China
10.
Li W, Huang Z, Li F, Zhao Y, Zhang H. CIFG-Net: Cross-level information fusion and guidance network for Polyp Segmentation. Comput Biol Med 2024; 169:107931. [PMID: 38181608] [DOI: 10.1016/j.compbiomed.2024.107931] [Received: 08/19/2023] [Revised: 12/03/2023] [Accepted: 01/01/2024] [Indexed: 01/07/2024]
Abstract
Colorectal cancer is a common malignant tumor of the digestive tract. Most colorectal cancers arise from colorectal polyp lesions, so timely detection and removal of colorectal polyps can substantially reduce the incidence of colorectal cancer. Accurate polyp segmentation can provide important polyp information that aids in the early diagnosis and treatment of colorectal cancer. However, polyps of the same type can vary in texture, color, and even size. Furthermore, some polyps are similar in color to the surrounding healthy tissue, which makes the boundary between the polyp and the surrounding area unclear. To overcome the issues of inaccurate polyp localization and unclear boundary segmentation, we propose a polyp segmentation network based on cross-level information fusion and guidance. We use a Transformer encoder to extract a more robust feature representation. In addition, to refine the processing of feature information from the encoder, we propose the edge feature processing module (EFPM) and the cross-level information processing module (CIPM). EFPM focuses on the boundary information in polyp features; after processing each feature, it obtains clear and accurate polyp boundary features, which mitigates unclear boundary segmentation. CIPM aggregates and processes the multi-scale features transmitted by the encoder layers and addresses inaccurate polyp localization by using multi-level features to obtain the location information of polyps. To make better use of the processed features, we also propose an information guidance module (IGM) that integrates the outputs of EFPM and CIPM to obtain accurate localization and segmentation of polyps. Experiments on five public polyp datasets using six metrics demonstrate that the proposed network is more robust and segments more accurately. Compared with other advanced algorithms, CIFG-Net has superior performance. Code is available at: https://github.com/zspnb/CIFG-Net.
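A classical stand-in for the boundary information that EFPM emphasizes is a gradient-magnitude map. The learned module is of course richer, but the intent — flat regions suppressed, boundaries highlighted — can be shown with a Sobel filter:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient-magnitude map highlighting boundaries in a 2D image."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(img, 1, mode='edge')
    gx = np.zeros(img.shape, dtype=float)
    gy = np.zeros(img.shape, dtype=float)
    for i in range(3):
        for j in range(3):
            win = padded[i:i + img.shape[0], j:j + img.shape[1]]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    return np.hypot(gx, gy)  # flat regions -> 0, boundaries -> large
```

On a polyp feature map this kind of response is what separates "boundary pixels" (strong gradient) from interior pixels, the distinction EFPM is designed to sharpen.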
Affiliation(s)
- Weisheng Li
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China.
- Zhaopeng Huang
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
- Feiyan Li
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
- Yinghui Zhao
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
- Hongchuan Zhang
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
11.
Ahamed MF, Syfullah MK, Sarkar O, Islam MT, Nahiduzzaman M, Islam MR, Khandakar A, Ayari MA, Chowdhury MEH. IRv2-Net: A Deep Learning Framework for Enhanced Polyp Segmentation Performance Integrating InceptionResNetV2 and UNet Architecture with Test Time Augmentation Techniques. Sensors (Basel) 2023; 23:7724. [PMID: 37765780] [PMCID: PMC10534485] [DOI: 10.3390/s23187724] [Received: 07/10/2023] [Revised: 08/25/2023] [Accepted: 08/30/2023] [Indexed: 09/29/2023]
Abstract
Colorectal polyps in the colon or rectum are precancerous growths that can lead to a more severe disease called colorectal cancer. Accurate segmentation of polyps using medical imaging data is essential for effective diagnosis. However, manual segmentation by endoscopists can be time-consuming, error-prone, and expensive, leading to a high rate of missed anomalies. To solve this problem, an automated diagnostic system based on deep learning algorithms is proposed to find polyps. The proposed IRv2-Net model is developed using the UNet architecture with a pre-trained InceptionResNetV2 encoder to extract rich features from the input samples. The Test Time Augmentation (TTA) technique, which utilizes the characteristics of the original, horizontal, and vertical flips, is used to gain precise boundary information and multi-scale image features. The performance of numerous state-of-the-art (SOTA) models is compared using several metrics such as accuracy, Dice Similarity Coefficient (DSC), Intersection over Union (IoU), precision, and recall. The proposed model is tested on the Kvasir-SEG and CVC-ClinicDB datasets, demonstrating superior performance in handling unseen real-time data. It achieves the highest area coverage under the Receiver Operating Characteristic (ROC-AUC) and Precision-Recall (AUC-PR) curves. The model exhibits excellent qualitative testing outcomes across different types of polyps, including larger, smaller, over-saturated, sessile, and flat polyps, within the same dataset and across different datasets. Our approach can significantly reduce the number of missed anomalies. Lastly, a graphical interface is developed for producing the mask in real time. The findings of this study have potential applications in clinical colonoscopy procedures and can serve as a basis for further research and development.
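The flip-based Test Time Augmentation described above can be sketched generically (this is not the authors' implementation): each flipped prediction is flipped back before averaging, so all predictions align with the original image. `model` is a hypothetical stand-in that maps a 2D image (list of lists) to a probability map of the same shape.

```python
def hflip(img):
    """Mirror a 2D image left-right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror a 2D image top-bottom."""
    return img[::-1]

def tta_predict(model, img):
    """Average predictions over the original, horizontal, and vertical flips.
    Each flipped prediction is un-flipped so all maps align pixel-wise."""
    preds = [
        model(img),                # original
        hflip(model(hflip(img))),  # predict on h-flip, flip prediction back
        vflip(model(vflip(img))),  # predict on v-flip, flip prediction back
    ]
    h, w = len(img), len(img[0])
    return [[sum(p[i][j] for p in preds) / len(preds) for j in range(w)]
            for i in range(h)]
```

With a flip-invariant model the averaged map equals the single prediction; in practice the three predictions differ slightly and averaging smooths boundary estimates.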
Affiliation(s)
- Md. Faysal Ahamed, Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
- Md. Khalid Syfullah, Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
- Ovi Sarkar, Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
- Md. Tohidul Islam, Department of Information & Communication Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh
- Md. Nahiduzzaman, Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh; Department of Electrical Engineering, Qatar University, Doha 2713, Qatar
- Md. Rabiul Islam, Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
- Amith Khandakar, Department of Electrical Engineering, Qatar University, Doha 2713, Qatar
- Mohamed Arselene Ayari, Department of Civil and Environmental Engineering, Qatar University, Doha 2713, Qatar; Technology Innovation and Engineering Education Unit (TIEE), Qatar University, Doha 2713, Qatar
12
Liang L, He A, Zhu C, Sheng X. [Colorectal polyp segmentation method based on fusion of transformer and cross-level phase awareness]. SHENG WU YI XUE GONG CHENG XUE ZA ZHI = JOURNAL OF BIOMEDICAL ENGINEERING = SHENGWU YIXUE GONGCHENGXUE ZAZHI 2023; 40:234-243. [PMID: 37139753 PMCID: PMC10162923 DOI: 10.7507/1001-5515.202211067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Received: 11/23/2022] [Revised: 03/01/2023] [Indexed: 05/05/2023]
Abstract
To address the issues of spatial inductive bias and the lack of an effective representation of global contextual information in colon polyp image segmentation, which lead to the loss of edge details and mis-segmentation of lesion areas, a colon polyp segmentation method that combines Transformer and cross-level phase awareness is proposed. First, from the perspective of global feature transformation, a hierarchical Transformer encoder extracts semantic information and spatial details of lesion areas layer by layer. Second, a phase-aware fusion module (PAFM) was designed to capture cross-level interaction information and effectively aggregate multi-scale contextual information. Third, a position-oriented functional module (POF) was designed to effectively integrate global and local feature information, fill in semantic gaps, and suppress background noise. Fourth, a residual axis reverse attention module (RA-IA) was used to improve the network's ability to recognize edge pixels. The proposed method was experimentally tested on the public datasets CVC-ClinicDB, Kvasir, CVC-ColonDB, and ETIS, achieving Dice similarity coefficients of 94.04%, 92.04%, 80.78%, and 76.80%, respectively, and mean intersection over union of 89.31%, 86.81%, 73.55%, and 69.10%, respectively. The experimental results show that the proposed method can effectively segment colon polyp images, providing a new window for the diagnosis of colon polyps.
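The Dice similarity coefficient and intersection over union reported above can be computed as follows. This is a minimal, generic sketch for flat binary masks, not code from the paper; `dice_iou` is an illustrative name.

```python
def dice_iou(pred, gt):
    """Dice similarity coefficient and IoU for flat binary masks (0/1 lists).
    Dice = 2|P ∩ G| / (|P| + |G|); IoU = |P ∩ G| / |P ∪ G|."""
    inter = sum(p and g for p, g in zip(pred, gt))  # overlapping positives
    psum, gsum = sum(pred), sum(gt)
    union = psum + gsum - inter
    dice = 2 * inter / (psum + gsum) if (psum + gsum) else 1.0
    iou = inter / union if union else 1.0           # empty masks: perfect match
    return dice, iou
```

Note the two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), which is why papers usually report both moving in the same direction across datasets.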
Affiliation(s)
- Liming Liang, School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, P. R. China
- Anjun He, School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, P. R. China
- Chenkun Zhu, School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, P. R. China
- Xiaoqi Sheng, School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, P. R. China
13
Lee H, Yoo J. Fast Attention CNN for Fine-Grained Crack Segmentation. SENSORS (BASEL, SWITZERLAND) 2023; 23:2244. [PMID: 36850841 PMCID: PMC9962498 DOI: 10.3390/s23042244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/26/2023] [Revised: 02/06/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
Deep learning-based computer vision algorithms, especially image segmentation, have been successfully applied to pixel-level crack detection. Prediction accuracy relies heavily on the detection of fine-grained cracks and the removal of crack-like noise. We propose a fast encoder-decoder network with scaling attention. We focus on low-level feature maps by minimizing encoder-decoder pairs and adopting an Atrous Spatial Pyramid Pooling (ASPP) layer to improve the detection accuracy of tiny cracks. Another challenge is the reduction of crack-like noise, for which we introduce a novel scaling attention, AG+, to suppress irrelevant regions. However, crack-like noise such as grooving is difficult to remove with improved segmentation networks alone. In this study, a crack dataset is generated; it contains 11,226 sets of images and masks, which are effective for detecting detailed tiny cracks and removing non-semantic objects. Our model is evaluated on the generated dataset and compared with state-of-the-art segmentation networks. We use the mean Dice coefficient (mDice) and mean Intersection over Union (mIoU) to compare performance, and FLOPs to compare computational complexity. The experimental results show that our model improves the detection accuracy of fine-grained cracks and reduces the computational cost dramatically. The mDice score of the proposed model is close to the best score, with only a 1.2% difference but two times fewer FLOPs.
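The scaling-attention idea of suppressing irrelevant regions can be sketched in its simplest form: a sigmoid maps an attention score to a coefficient in [0, 1] that scales the corresponding feature, so strongly negative scores drive noise-like responses toward zero. This is a hand-rolled illustration of the gating principle, not the AG+ module itself.

```python
import math

def sigmoid(x):
    """Squash a real-valued score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

def attention_gate(features, scores):
    """Scale each feature by a sigmoid attention coefficient.
    High scores pass features through nearly unchanged; strongly negative
    scores (crack-like noise, in this setting) suppress them toward zero."""
    return [f * sigmoid(s) for f, s in zip(features, scores)]
```

In a real network the scores are themselves produced by learned convolutions over the feature maps; here they are supplied directly to keep the gating behavior visible.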
Affiliation(s)
- Hyunnam Lee, Incheon International Airport Corporation, Incheon 22382, Republic of Korea
- Juhan Yoo, Department of Computer, Semyung University, Jecheon 02468, Republic of Korea