1. Wang Z, Li T, Liu M, Jiang J, Liu X. DCATNet: polyp segmentation with deformable convolution and contextual-aware attention network. BMC Med Imaging 2025; 25:120. [PMID: 40229681] [DOI: 10.1186/s12880-025-01661-w]
Abstract
Polyp segmentation is crucial in computer-aided diagnosis but remains challenging due to the complexity of medical images and anatomical variations. Current state-of-the-art methods struggle with accurate polyp segmentation because of variability in size, shape, and texture; these factors make boundary detection difficult, often resulting in incomplete or inaccurate segmentation. To address these challenges, we propose DCATNet, a novel deep learning architecture specifically designed for polyp segmentation. DCATNet is a U-shaped network that combines ResNetV2-50 as an encoder for capturing local features with a Transformer for modeling long-range dependencies. It integrates three key components: the Geometry Attention Module (GAM), the Contextual Attention Gate (CAG), and the Multi-scale Feature Extraction (MSFE) block. We evaluated DCATNet on five public datasets. On Kvasir-SEG and CVC-ClinicDB, the model achieved mean Dice scores of 0.9351 and 0.9444, respectively, outperforming previous state-of-the-art (SOTA) methods. Cross-validation further demonstrated its superior generalization capability, and ablation studies confirmed the effectiveness of each component: integrating GAM, CAG, and MSFE improves feature representation and fusion, leading to precise and reliable segmentation. These findings underscore DCATNet's potential for clinical application and its applicability to a wide range of medical image segmentation tasks.
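For orientation, the mean Dice score reported throughout these entries is the standard overlap metric between a predicted and a reference mask; a minimal numpy sketch of that generic metric (illustrative only, not the authors' evaluation code):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy 4x4 masks: 3 foreground pixels each, 2 of them overlapping.
pred = np.zeros((4, 4), dtype=np.uint8)
target = np.zeros((4, 4), dtype=np.uint8)
pred[0, 0:3] = 1      # pixels (0,0), (0,1), (0,2)
target[0, 1:4] = 1    # pixels (0,1), (0,2), (0,3)
print(round(dice_score(pred, target), 4))  # 2*2/(3+3) -> 0.6667
```

A per-dataset "mean Dice" such as the 0.9351 above is this quantity averaged over all test images.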
Affiliation(s)
- Zenan Wang: Department of Gastroenterology, Beijing Chaoyang Hospital, The Third Clinical Medical College of Capital Medical University, Beijing, China
- Tianshu Li: Department of Gastroenterology, Beijing Chaoyang Hospital, The Third Clinical Medical College of Capital Medical University, Beijing, China
- Ming Liu: Hunan Key Laboratory of Nonferrous Resources and Geological Hazard Exploration, Changsha, China
- Jue Jiang: Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
- Xinjuan Liu: Department of Gastroenterology, Beijing Chaoyang Hospital, The Third Clinical Medical College of Capital Medical University, Beijing, China
2. Kumar A, Aravind N, Gillani T, Kumar D. Artificial intelligence breakthrough in diagnosis, treatment, and prevention of colorectal cancer – A comprehensive review. Biomed Signal Process Control 2025; 101:107205. [DOI: 10.1016/j.bspc.2024.107205]
3. Ke X, Chen G, Liu H, Guo W. MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation. Comput Biol Med 2025; 186:109601. [PMID: 39740513] [DOI: 10.1016/j.compbiomed.2024.109601]
Abstract
Accurate polyp segmentation is crucial for early diagnosis and treatment of colorectal cancer. The task is challenging for three main reasons: (i) model overfitting and weak generalization due to the multi-center distribution of data; (ii) inter-class ambiguity caused by motion blur and overexposure to endoscopic light; and (iii) intra-class inconsistency caused by the variety of morphologies and sizes of the same type of polyp. To address these challenges, we propose a new high-precision polyp segmentation framework, MEFA-Net, which consists of three modules: the plug-and-play Mask Enhancement Module (MEG), the Separable Path Attention Enhancement Module (SPAE), and the Dynamic Global Attention Pool Module (DGAP). First, the MEG module regionally masks the high-energy regions of the environment and polyps, guiding the model to rely on only a small amount of information to distinguish polyps from background features, preventing the model from overfitting to environmental information, and improving robustness. This module also effectively counteracts the "dark corner phenomenon" in the dataset, further improving generalization. Next, the SPAE module alleviates the inter-class ambiguity problem by strengthening feature expression. Then, the DGAP module addresses the intra-class inconsistency problem by extracting invariance to scale, shape, and position. Finally, we propose a new evaluation metric, MultiColoScore, for comprehensively evaluating segmentation performance on five datasets from different domains. We evaluated the method quantitatively and qualitatively on five datasets using four metrics. Experimental results show that MEFA-Net significantly improves the accuracy of polyp segmentation and outperforms current state-of-the-art algorithms. Code is available at https://github.com/847001315/MEFA-Net.
Affiliation(s)
- Xiao Ke: Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Guanhong Chen: Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Hao Liu: Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Wenzhong Guo: Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
4. Yue G, Zhang L, Du J, Zhou T, Zhou W, Lin W. Subjective and Objective Quality Assessment of Colonoscopy Videos. IEEE Trans Med Imaging 2025; 44:841-854. [PMID: 39283779] [DOI: 10.1109/tmi.2024.3461737]
Abstract
Captured colonoscopy videos usually suffer from multiple real-world distortions, such as motion blur, low brightness, abnormal exposure, and object occlusion, which impede visual interpretation. However, existing works mainly investigate the impact of synthesized distortions, which differ greatly from real-world ones. This research carries out an in-depth study of colonoscopy Video Quality Assessment (VQA), establishing both subjective and objective solutions. First, we collect 1,000 colonoscopy videos with typical visual quality degradation conditions encountered in practice and construct a multi-attribute VQA database. The quality of each video is annotated through subjective experiments along five distortion attributes (temporal-spatial visibility, brightness, specular reflection, stability, and utility), as well as from an overall perspective. Second, we propose a Distortion Attribute Reasoning Network (DARNet) for automatic VQA. DARNet includes two streams that extract features related to spatial and temporal distortions, respectively. It adaptively aggregates the attribute-related features through a multi-attribute association module to predict a quality score for each distortion attribute. Motivated by the observation that rating behaviors differ across attributes, a behavior-guided reasoning module is further used to fuse the attribute-aware features, yielding the overall quality. Experimental results on the constructed database show that DARNet correlates well with subjective ratings and is superior to nine state-of-the-art methods.
5. Linu Babu P, Jana S. Gastrointestinal tract disease detection via deep learning based Duo-Feature Optimized Hexa-Classification model. Biomed Signal Process Control 2025; 100:106994. [DOI: 10.1016/j.bspc.2024.106994]
6. Chu J, Liu W, Tian Q, Lu W. PFPRNet: A Phase-Wise Feature Pyramid With Retention Network for Polyp Segmentation. IEEE J Biomed Health Inform 2025; 29:1137-1150. [PMID: 40030242] [DOI: 10.1109/jbhi.2024.3500026]
Abstract
Early detection of colonic polyps is crucial for the prevention and diagnosis of colorectal cancer. Deep learning-based polyp segmentation methods have become mainstream and achieved remarkable results, but acquiring large amounts of labeled data is time-consuming and labor-intensive, and the presence of numerous similar wrinkles in polyp images also hampers model prediction performance. In this paper, we propose Phase-wise Feature Pyramid with Retention Network (PFPRNet), which leverages a pre-trained Transformer-based encoder to obtain multi-scale feature maps. A phase-wise feature pyramid with retention decoder is designed to gradually integrate global features into local features and guide the model's attention toward key regions. Additionally, our custom Enhance Perception module captures image information from a broader perspective. Finally, we introduce an innovative Low-layer Retention module as a more efficient alternative to the Transformer for global attention modeling. Evaluation on several widely used polyp segmentation datasets demonstrates that the proposed method has strong learning ability and generalization capability, and outperforms state-of-the-art approaches.
7. Raju ASN, Venkatesh K, Gatla RK, Konakalla EP, Eid MM, Titova N, Ghoneim SSM, Ghaly RNR. Colorectal cancer detection with enhanced precision using a hybrid supervised and unsupervised learning approach. Sci Rep 2025; 15:3180. [PMID: 39863646] [PMCID: PMC11763007] [DOI: 10.1038/s41598-025-86590-y]
Abstract
This work introduces a hybrid ensemble framework for the detection and segmentation of colorectal cancer that combines supervised classification with unsupervised clustering to produce more interpretable and accurate diagnostic results. The method integrates several components: the CNN models ADa-22 and AD-22, transformer networks, and an SVM classifier. Experiments use the CVC-ClinicDB dataset, containing 1650 colonoscopy images labeled as polyp or non-polyp. The best-performing ensemble, AD-22 + Transformer + SVM, achieved an AUC of 0.99, a training accuracy of 99.50%, and a testing accuracy of 99.00%, with per-class accuracies of 97.50% (polyps) and 99.30% (non-polyps) and recalls of 97.80% (polyps) and 98.90% (non-polyps), performing well on both cancerous and healthy regions. The framework combines K-means clustering with bounding-box visualization to improve segmentation, yielding a silhouette score of 0.73 for the best cluster configuration, and addresses feature-interpretation challenges in medical imaging to localize and segment malignant regions precisely. Hyperparameter optimization over learning rates and dropout rates balances performance and generalization while effectively suppressing overfitting. The hybrid scheme remedies deficiencies of previous approaches by combining effective CNN-based feature extraction, Transformer attention mechanisms, and the fine decision boundary of the support vector machine, further refined by unsupervised clustering for clearer visualization. This holistic framework thus improves classification and segmentation results, generates understandable outcomes for more rigorous benchmarking of colorectal cancer detection, and moves closer to clinical feasibility.
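The K-means-plus-silhouette step reported in entry 7 can be illustrated with a minimal, self-contained numpy sketch; the toy data and plain k-means below are illustrative stand-ins, not the authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated toy feature clusters standing in for polyp vs.
# background regions (illustrative data only, not the paper's features).
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(4.0, 0.3, size=(50, 2))])

def kmeans(X, k, iters=20):
    # Deterministic init: k evenly spaced samples (avoids empty clusters here).
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

def silhouette(X, labels):
    # Mean silhouette coefficient: s = (b - a) / max(a, b) per sample,
    # where a = mean intra-cluster distance, b = mean distance to the
    # nearest other cluster.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    n, scores = len(X), []
    for i in range(n):
        own = labels == labels[i]
        a = D[i, own & (np.arange(n) != i)].mean()
        b = min(D[i, labels == j].mean()
                for j in np.unique(labels) if j != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

labels = kmeans(X, k=2)
print(round(silhouette(X, labels), 2))  # near 1 for well-separated blobs
```

A score around 0.73, as the paper reports, sits between these extremes: clusters that are distinguishable but with some overlap.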
Affiliation(s)
- Akella S Narasimha Raju: Department of Computer Science and Engineering (Data Science), Institute of Aeronautical Engineering, Dundigul, Hyderabad, Telangana, 500043, India
- K Venkatesh: Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamilnadu, 603203, India
- Ranjith Kumar Gatla: Department of Computer Science and Engineering (Data Science), Institute of Aeronautical Engineering, Dundigul, Hyderabad, Telangana, 500043, India
- Eswara Prasad Konakalla: Department of Physics and Electronics, B.V.Raju College, Bhimavaram, Garagaparru Road, Kovvada, Andhra Pradesh, 534202, India
- Marwa M Eid: College of Applied Medical Science, Taif University, 21944, Taif, Saudi Arabia
- Nataliia Titova: Biomedical Engineering Department, National University Odesa Polytechnic, Odesa, 65044, Ukraine
- Sherif S M Ghoneim: Department of Electrical Engineering, College of Engineering, Taif University, 21944, Taif, Saudi Arabia
- Ramy N R Ghaly: Ministry of Higher Education, Mataria Technical College, Cairo, 11718, Egypt; Chitkara Centre for Research and Development, Chitkara University, Solan, Himachal Pradesh, 174103, India
8. Oukdach Y, Garbaz A, Kerkaou Z, Ansari ME, Koutti L, Ouafdi AFE, Salihoun M. InCoLoTransNet: An Involution-Convolution and Locality Attention-Aware Transformer for Precise Colorectal Polyp Segmentation in GI Images. J Imaging Inform Med 2025. [PMID: 39825142] [DOI: 10.1007/s10278-025-01389-7]
Abstract
Gastrointestinal (GI) disease examination presents significant challenges to doctors due to the intricate structure of the human digestive system. Colonoscopy and wireless capsule endoscopy are the most commonly used tools for GI examination. However, the large amount of data generated by these technologies requires expert intervention for disease identification, making manual analysis very time-consuming. The development of a computer-assisted system is therefore highly desirable to support clinical decision-making in a low-cost and effective way. In this paper, we introduce InCoLoTransNet, a novel framework designed for polyp segmentation. The study is based on a transformer and a convolution-involution neural network, following the encoder-decoder architecture. We employ a vision transformer in the encoder to focus on global context, while the decoder uses a convolution-involution collaboration for resampling polyp features. Involution enhances the model's ability to adaptively capture spatial and contextual information, while convolution focuses on local information, leading to more accurate feature extraction. The essential features captured by the transformer encoder are passed to the decoder through two skip-connection pathways: a CBAM module refines the features and passes them to the convolution block, leveraging attention mechanisms to emphasize relevant information, while locality self-attention passes essential features to the involution block, reinforcing the model's ability to capture global features in polyp regions. Experiments were conducted on five public datasets: CVC-ClinicDB, CVC-ColonDB, Kvasir-SEG, Etis-LaribPolypDB, and CVC-300. Compared with 15 state-of-the-art polyp segmentation methods, InCoLoTransNet achieves the best results, with a mean Dice score of 93% and a mean intersection over union of 90% on CVC-ColonDB. InCoLoTransNet also distinguishes itself in generalization performance, achieving high mean Dice coefficient and mean intersection over union on unseen datasets: 85% and 79% on CVC-ColonDB, 91% and 87% on CVC-300, and 79% and 70% on Etis-LaribPolypDB, respectively.
Affiliation(s)
- Yassine Oukdach: LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Anass Garbaz: LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Zakaria Kerkaou: LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Mohamed El Ansari: Informatics and Applications Laboratory, Department of Computer Sciences, Faculty of Science, Moulay Ismail University, B.P 11201, Meknès, 52000, Morocco
- Lahcen Koutti: LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Ahmed Fouad El Ouafdi: LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Mouna Salihoun: Faculty of Medicine and Pharmacy of Rabat, Mohammed V University of Rabat, Rabat, 10000, Morocco
9. Wei X, Sun J, Su P, Wan H, Ning Z. BCL-Former: Localized Transformer Fusion with Balanced Constraint for polyp image segmentation. Comput Biol Med 2024; 182:109182. [PMID: 39341109] [DOI: 10.1016/j.compbiomed.2024.109182]
Abstract
Polyp segmentation remains challenging for two reasons: (a) the size and shape of colon polyps are variable and diverse; and (b) the distinction between polyps and mucosa is not obvious. To solve these two problems and enhance the generalization ability of segmentation methods, we propose Localized Transformer Fusion with Balanced Constraint (BCL-Former) for polyp segmentation. In BCL-Former, a Strip Local Enhancement (SLE) module is proposed to capture enhanced local features, and a Progressive Feature Fusion (PFF) module makes feature aggregation smoother and eliminates the gap between high-level and low-level features. Moreover, a Tversky-based Appropriate Constrained Loss (TacLoss) is proposed to balance and constrain true positives against false negatives, improving cross-dataset generalization. Extensive experiments on four benchmark datasets show that the proposed method achieves state-of-the-art performance in both segmentation precision and generalization ability, and is 5%-8% faster than the benchmark method in training and inference. The code is available at: https://github.com/sjc-lbj/BCL-Former.
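The Tversky index underlying TacLoss trades off false positives against false negatives with weights α and β. A generic numpy sketch of the standard Tversky loss only (the paper's full TacLoss adds its own constraints, which are not reproduced here):

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-7):
    """1 - Tversky index. alpha weights false positives, beta false negatives;
    beta > alpha penalizes missed polyp pixels more (alpha = beta = 0.5
    recovers the Dice loss)."""
    pred = pred.astype(float).ravel()    # soft probabilities in [0, 1]
    target = target.astype(float).ravel()
    tp = (pred * target).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

target = np.array([[0, 1, 1, 0]])
miss = np.array([[0, 1, 0, 0]])    # one false negative
extra = np.array([[0, 1, 1, 1]])   # one false positive
# With beta > alpha, missing a polyp pixel costs more than over-predicting:
print(tversky_loss(miss, target) > tversky_loss(extra, target))  # True
```

This asymmetry is why Tversky-style losses are popular for polyps, where false negatives are clinically worse than false positives.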
Affiliation(s)
- Xin Wei: School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
- Jiacheng Sun: School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
- Pengxiang Su: School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
- Huan Wan: School of Computer Information Engineering, Jiangxi Normal University, 99 Ziyang Avenue, Nanchang, 330022, China
- Zhitao Ning: School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
10. Tan J, Yuan J, Fu X, Bai Y. Colonoscopy polyp classification via enhanced scattering wavelet Convolutional Neural Network. PLoS One 2024; 19:e0302800. [PMID: 39392783] [PMCID: PMC11469526] [DOI: 10.1371/journal.pone.0302800]
Abstract
Colorectal cancer (CRC) has one of the highest death rates among common cancers. Colonoscopy, the best screening method for CRC, has been shown to lower disease risk, and computer-aided polyp classification is therefore applied to help identify colorectal cancer. Visually categorizing polyps is difficult, however, because different polyps appear under different lighting conditions. Unlike previous works, this article presents the Enhanced Scattering Wavelet Convolutional Neural Network (ESWCNN), a polyp classification technique that combines a Convolutional Neural Network (CNN) with the Scattering Wavelet Transform (SWT) to improve classification performance. The method concatenates learnable image filters and wavelet filters on each input channel: the scattering wavelet filters extract common spectral features at various scales and orientations, while the learnable filters capture spatial features that wavelet filters may miss. A network architecture for ESWCNN is designed on these principles and trained and tested on colonoscopy datasets (two public and one private). An n-fold cross-validation experiment achieved a classification accuracy of 96.4% for three classes (adenoma, hyperplastic, serrated) and 94.8% for two-class polyp classification (positive and negative). In the three-class setting, correct classification rates were 96.2% for adenomas, 98.71% for hyperplastic polyps, and 97.9% for serrated polyps. In the two-class experiment, the method reached an average sensitivity of 96.7% with 93.1% specificity. Furthermore, we compare our model with state-of-the-art general classification models and commonly used CNNs: six end-to-end CNN-based models were trained on two datasets of video sequences. The experimental results demonstrate that ESWCNN classifies polyps with higher accuracy and efficacy than state-of-the-art CNN models. These findings can provide guidance for future research in polyp classification.
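The sensitivity and specificity figures quoted for the two-class experiment follow the usual confusion-matrix definitions; a small illustrative sketch with toy labels (not the paper's data):

```python
import numpy as np

def sens_spec(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), the recall on positive (polyp) cases;
    specificity = TN/(TN+FP), the recall on negative cases."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    return tp / (tp + fn), tn / (tn + fp)

# Toy labels: 1 = polyp frame, 0 = negative frame.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]   # one miss, one false alarm
sens, spec = sens_spec(y_true, y_pred)
print(sens, spec)  # 0.75 0.75
```

In screening, sensitivity is usually the figure of merit, since a missed polyp (false negative) is the costlier error.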
Affiliation(s)
- Jun Tan: School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong, China; Guangdong Province Key Laboratory of Computational Science, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Jiamin Yuan: Health Construction Administration Center, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, Guangdong, China; The Second Affiliated Hospital of Guangzhou University of Traditional Chinese Medicine (TCM), Guangzhou, Guangdong, China
- Xiaoyong Fu: School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Yilin Bai: School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong, China; China Southern Airlines, Guangzhou, Guangdong, China
11. Guo X, Xu L, Li S, Xu M, Chu Y, Jiang Q. Cascade-EC Network: Recognition of Gastrointestinal Multiple Lesions Based on EfficientNet and CA_stm_Retinanet. J Imaging Inform Med 2024; 37:1-11. [PMID: 38587768] [PMCID: PMC11522239] [DOI: 10.1007/s10278-024-01096-9]
Abstract
Capsule endoscopy (CE) is non-invasive and painless during gastrointestinal examination. However, it increases the image-review workload for clinicians, making examinations prone to missed and incorrect diagnoses. Current research has primarily concentrated on binary classifiers, multi-class classifiers targeting fewer than four abnormality types, detectors limited to a specific segment of the digestive tract, and segmenters for a single type of anomaly. Because of intra-class variation, creating a unified scheme for detecting multiple gastrointestinal diseases is particularly challenging. This study designs a cascade neural network, Cascade-EC, that automatically identifies and localizes four types of gastrointestinal lesions in CE images: angiectasis, bleeding, erosion, and polyp. Cascade-EC consists of EfficientNet for image classification and CA_stm_Retinanet for lesion detection and localization. As the first layer, EfficientNet classifies CE images; as the second layer, CA_stm_Retinanet detects and localizes targets in the classified images. CA_stm_Retinanet adopts the general RetinaNet architecture; its feature extraction module is the CA_stm_Backbone, a stack of CA_stm blocks that adopt the split-transform-merge strategy and introduce coordinate attention. The dataset, from Shanghai East Hospital, was collected with PillCam SB3 and AnKon capsule endoscopes and contains 7936 images of 317 patients from 2017 to 2021. On the test set, Cascade-EC achieved an average precision of 94.55%, an average recall of 90.60%, and an average F1 score of 92.26% in the multi-lesion classification task, and a mean mAP@0.5 of 85.88% for detecting the four disease types. The experimental results show that, compared with a single object detection network, Cascade-EC performs better and can effectively assist clinicians in classifying and detecting multiple lesions in CE images.
Affiliation(s)
- Xudong Guo: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Lei Xu: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Shengnan Li: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Meidong Xu: Endoscopy Center, Department of Gastroenterology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, China
- Yuan Chu: Endoscopy Center, Department of Gastroenterology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, China
- Qinfen Jiang: Department of Information Management, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, China
12. Meng L, Li Y, Duan W. Three-stage polyp segmentation network based on reverse attention feature purification with Pyramid Vision Transformer. Comput Biol Med 2024; 179:108930. [PMID: 39067285] [DOI: 10.1016/j.compbiomed.2024.108930]
Abstract
Colorectal polyps are potential precursors of colorectal cancer, and automating polyp segmentation helps physicians accurately identify potential polyp regions, reducing misdiagnoses and missed diagnoses. However, existing models often fall short in accurately segmenting polyps because polyp regions closely resemble surrounding tissue in color, texture, and shape. To address this challenge, this study proposes a novel three-stage polyp segmentation network, Reverse Attention Feature Purification with Pyramid Vision Transformer (RAFPNet), which adopts an iterative feedback UNet architecture to refine polyp saliency maps for precise segmentation. First, a Multi-Scale Feature Aggregation (MSFA) module generates preliminary polyp saliency maps. Next, a Reverse Attention Feature Purification (RAFP) module suppresses low-level surrounding-tissue features while enhancing high-level semantic polyp information based on the preliminary saliency maps. Finally, the UNet architecture further refines the feature maps in a coarse-to-fine manner. Extensive experiments on five widely used polyp segmentation datasets and three video polyp segmentation datasets demonstrate the superior performance of RAFPNet over state-of-the-art models across multiple evaluation metrics.
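Reverse attention, as used in modules like RAFP, is commonly implemented by weighting features with the complement of a coarse saliency map, so later stages focus on regions the coarse prediction has not yet resolved (often boundaries). A generic numpy sketch of that standard formulation, not the RAFPNet code:

```python
import numpy as np

def reverse_attention(features, saliency):
    """Weight C x H x W features by (1 - sigmoid(saliency)): regions the
    coarse saliency map is already confident about are suppressed, steering
    the next refinement stage toward unresolved regions."""
    rev = 1.0 - 1.0 / (1.0 + np.exp(-saliency))   # 1 - sigmoid
    return features * rev[None, :, :]              # broadcast over channels

feat = np.ones((8, 4, 4))        # toy C x H x W feature block
sal = np.full((4, 4), -6.0)      # coarse map: mostly "background" logits
sal[1:3, 1:3] = 6.0              # confident polyp core
out = reverse_attention(feat, sal)
# Confident-core responses are damped, uncertain surroundings pass through:
print(out[0, 2, 2] < 0.01, out[0, 0, 0] > 0.99)  # True True
```

The "purification" step then typically adds the re-weighted features back to the saliency prediction so each iteration sharpens the boundary.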
Affiliation(s)
- Lingbing Meng: School of Computer and Software Engineering, Anhui Institute of Information Technology, China
- Yuting Li: School of Computer and Software Engineering, Anhui Institute of Information Technology, China
- Weiwei Duan: School of Computer and Software Engineering, Anhui Institute of Information Technology, China
13. Wang Z, Liu M, Jiang J, Qu X. Colorectal polyp segmentation with denoising diffusion probabilistic models. Comput Biol Med 2024; 180:108981. [PMID: 39146839] [DOI: 10.1016/j.compbiomed.2024.108981]
Abstract
Early detection of polyps is essential to decrease colorectal cancer (CRC) incidence, so developing an efficient and accurate polyp segmentation technique is crucial for clinical CRC prevention. In this paper, we propose an end-to-end training approach for polyp segmentation that employs a diffusion model: images are treated as priors, and segmentation is formulated as a mask-generation process. In the sampling stage, multiple predictions are generated for each input image using the trained model, and a majority-vote strategy yields significant performance enhancements. Four public datasets and one in-house dataset are used to train and test the model. The proposed method achieves mDice scores of 0.934 and 0.967 on Kvasir-SEG and CVC-ClinicDB, respectively. Furthermore, cross-validation is applied to test generalization, and to the best of our knowledge the proposed method outperforms previous state-of-the-art (SOTA) models, significantly improving segmentation accuracy with strong generalization capability.
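Because a diffusion model's sampling is stochastic, each run yields a slightly different mask; the majority-vote strategy keeps a pixel only if most samples agree. A generic sketch with hypothetical masks (not the paper's outputs):

```python
import numpy as np

def majority_vote(masks):
    """Pixelwise majority over N sampled binary masks (each H x W):
    a pixel is foreground iff more than half the samples mark it."""
    stack = np.stack(masks).astype(int)
    return (stack.sum(axis=0) * 2 > len(stack)).astype(np.uint8)

# Three hypothetical samples from a stochastic segmenter for one image:
m1 = np.array([[1, 1, 0, 0]])
m2 = np.array([[1, 0, 0, 1]])
m3 = np.array([[1, 1, 0, 0]])
print(majority_vote([m1, m2, m3]))  # [[1 1 0 0]]
```

Averaging out sampling noise this way is what drives the "significant performance enhancements" the abstract attributes to the vote.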
Affiliation(s)
- Zenan Wang
- Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China.
- Ming Liu
- Hunan Key Laboratory of Nonferrous Resources and Geological Hazard Exploration, Changsha, China
- Jue Jiang
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY, United States
- Xiaolei Qu
- School of Instrumentation and Optoelectronics Engineering, Beihang University, Beijing, China
14
Huang T, Shi J, Li J, Wang J, Du J, Shi J. Involution Transformer Based U-Net for Landmark Detection in Ultrasound Images for Diagnosis of Infantile DDH. IEEE J Biomed Health Inform 2024; 28:4797-4809. [PMID: 38630567 DOI: 10.1109/jbhi.2024.3390241] [Indexed: 04/19/2024]
Abstract
B-mode ultrasound-based computer-aided diagnosis (CAD) has demonstrated its effectiveness for diagnosing Developmental Dysplasia of the Hip (DDH) in infants by applying Graf's method through landmark detection in hip ultrasound images. However, exploiting the information around these landmarks to enrich feature representation remains necessary for improving detection performance. To this end, a novel Involution Transformer based U-Net (IT-UNet) is proposed for hip landmark detection. IT-UNet integrates the efficient involution operation into the Transformer to form an Involution Transformer Module (ITM), which consists of an involution attention block and a squeeze-and-excitation involution block. The ITM captures both spatially related information and long-range dependencies from hip ultrasound images to effectively improve feature representation. Moreover, an Involution Downsampling Block (IDB), which combines involution and convolution for downsampling, is developed to alleviate feature loss in the encoder modules. Experimental results on two DDH ultrasound datasets indicate that the proposed IT-UNet achieves the best landmark detection performance, indicating its potential for clinical application.
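The involution operation that the ITM builds on can be sketched in a minimal single-group form; the per-pixel kernel generator below is a stand-in for the small learned network used in practice, and all names are illustrative.

```python
import numpy as np

def involution(x: np.ndarray, kernel_gen, k: int = 3) -> np.ndarray:
    """Single-group involution on a (C, H, W) feature map: a k*k kernel is
    generated from each pixel's own feature vector, then applied over that
    pixel's k*k spatial neighborhood and shared across all channels."""
    c, h, w = x.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            kern = kernel_gen(x[:, i, j]).reshape(k, k)  # (k*k,) -> (k, k)
            patch = xp[:, i:i + k, j:j + k]              # (C, k, k) neighborhood
            out[:, i, j] = (patch * kern).sum(axis=(1, 2))
    return out

# Sanity check: a generator that always emits a one-hot kernel at the
# center position makes involution the identity map.
center_one_hot = lambda feat: np.eye(1, 9, 4).ravel()
x = np.arange(12, dtype=float).reshape(1, 3, 4)
y = involution(x, center_one_hot)  # equals x
```

Unlike a convolution, whose kernel is shared across positions but distinct per channel, involution inverts this: kernels are position-specific and channel-shared, which is what lets it adapt to local spatial structure cheaply.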
15
Yu L, Min W, Wang S. Boundary-Aware Gradient Operator Network for Medical Image Segmentation. IEEE J Biomed Health Inform 2024; 28:4711-4723. [PMID: 38776204 DOI: 10.1109/jbhi.2024.3404273] [Indexed: 05/24/2024]
Abstract
Medical image segmentation is a crucial task in computer-aided diagnosis. Although convolutional neural networks (CNNs) have made significant progress in medical image segmentation, their convolution kernels are optimized from random initialization without explicitly encoding gradient information, leading to a lack of specificity for certain features, such as blurred boundaries. Furthermore, the frequently applied down-sampling operation loses fine structural features in shallow layers. We therefore propose a boundary-aware gradient operator network (BG-Net) for medical image segmentation, in which a gradient convolution (GConv) module and a boundary-aware mechanism (BAM) module are developed to model image boundary features and the long-range dependencies between channels. The GConv module transforms a gradient operator into a convolutional operation that extracts gradient features such as image boundaries and textures, fully utilizing the limited input to capture boundary-representing features. The BAM increases the amount of global contextual information while suppressing invalid information by focusing on feature dependencies and the weight ratios between channels, improving the boundary perception ability of BG-Net. Finally, a multi-modal fusion mechanism effectively fuses lightweight gradient convolution features and U-shaped branch features into multilevel features, enabling global dependencies and low-level spatial details to be captured at shallow depth. We conduct extensive experiments on eight datasets that broadly cover medical images to evaluate the proposed BG-Net. The experimental results demonstrate that BG-Net outperforms state-of-the-art methods, particularly those focused on boundary segmentation.
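The idea of casting a gradient operator as a fixed convolution, as GConv does, can be illustrated with Sobel kernels (a common choice of gradient operator; the exact operators BG-Net uses are not specified in this abstract):

```python
import numpy as np

# Sobel kernels encode horizontal/vertical gradients as fixed convolution weights.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Plain 'valid' 2-D correlation (no padding)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    gx = conv2d_valid(img, SOBEL_X)
    gy = conv2d_valid(img, SOBEL_Y)
    return np.hypot(gx, gy)

# A vertical step edge produces a strong, uniform gradient response.
edge = np.array([[0.0, 0.0, 1.0, 1.0]] * 3)
mag = gradient_magnitude(edge)  # -> [[4.0, 4.0]]
```

Because the kernel weights are fixed rather than learned from random initialization, the response is boundary-specific by construction, which is the property the GConv module exploits.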
16
Huang L, Wu Y. ACU-TransNet: Attention and convolution-augmented UNet-transformer network for polyp segmentation. J Xray Sci Technol 2024; 32:1449-1464. [PMID: 39422983 DOI: 10.3233/xst-240076] [Indexed: 10/19/2024]
Abstract
BACKGROUND UNet has achieved great success in medical image segmentation. However, due to the inherent locality of convolution operations, UNet is deficient in capturing the global features and long-range dependencies of polyps, resulting in less accurate recognition of polyps with complex morphologies and backgrounds. Transformers, with their sequential operations, are better at perceiving global features but lack low-level details, leading to limited localization ability. If the advantages of both architectures can be effectively combined, the accuracy of polyp segmentation can be further improved. METHODS In this paper, we propose an attention and convolution-augmented UNet-Transformer network (ACU-TransNet) for polyp segmentation. The network is composed of a comprehensive-attention UNet and a Transformer head, sequentially connected by a bridge layer. On the one hand, the comprehensive-attention UNet enhances specific feature extraction through deformable convolution and channel attention in the first encoder layer and achieves more accurate shape extraction through spatial and channel attention in the decoder. On the other hand, the Transformer head supplements fine-grained information through convolutional attention and acquires hierarchical global characteristics from the feature maps. RESULTS ACU-TransNet comprehensively learns dataset features and enhances colonoscopy interpretability for polyp detection. CONCLUSION Experimental results on the CVC-ClinicDB and Kvasir-SEG datasets demonstrate that ACU-TransNet outperforms existing state-of-the-art methods, showcasing its robustness.
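The channel attention used throughout this architecture follows the familiar squeeze-and-excitation pattern, which can be sketched as follows (the weight matrices are stand-ins for learned parameters, and the function name is illustrative):

```python
import numpy as np

def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation style channel attention on a (C, H, W) feature map."""
    z = x.mean(axis=(1, 2))                     # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ z, 0.0)            # excitation MLP with ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate per channel -> (C,)
    return x * s[:, None, None]                 # reweight channels

# With zero excitation weights the gate is sigmoid(0) = 0.5 for every channel,
# so each channel is simply halved -- a quick correctness check.
x = np.ones((2, 3, 3))
out = channel_attention(x, np.eye(2), np.zeros((2, 2)))  # -> 0.5 everywhere
```

The gate vector `s` depends only on per-channel global statistics, which is why channel attention is cheap to add at both the encoder and decoder stages.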
Collapse
Affiliation(s)
- Lei Huang
- State Key Laboratory of Public Big Data, Guizhou University, Guiyang, China
- College of Computer Science and Technology, Guizhou University, Guiyang, China
- Yun Wu
- State Key Laboratory of Public Big Data, Guizhou University, Guiyang, China
- College of Computer Science and Technology, Guizhou University, Guiyang, China