1
Wang L, Wan J, Meng X, Chen B, Shao W. MCH-PAN: gastrointestinal polyp detection model integrating multi-scale feature information. Sci Rep 2024; 14:23382. [PMID: 39379452] [PMCID: PMC11461898] [DOI: 10.1038/s41598-024-74609-9]
Abstract
The rise of object detection models has brought new breakthroughs to the development of clinical decision support systems. However, in the field of gastrointestinal polyp detection, challenges remain, such as uncertainty in polyp identification and inadequate handling of polyp scale variations. To address these challenges, this paper proposes a novel gastrointestinal polyp object detection model that can automatically identify polyp regions in gastrointestinal images and label them accurately. In its design, the model integrates multi-channel information to enhance the expressiveness and robustness of channel features, thereby better coping with the complexity of polyp structures. At the same time, a hierarchical structure is constructed to improve the model's adaptability to multi-scale targets, effectively addressing the large scale variations of polyps. Furthermore, a channel attention mechanism is designed to improve the accuracy of target positioning and reduce diagnostic uncertainty. By integrating these strategies, the proposed gastrointestinal polyp object detection model can achieve accurate polyp detection, providing clinicians with a reliable and valuable reference. Experimental results show that the model exhibits superior performance in gastrointestinal polyp detection, which can help improve the diagnosis of digestive system diseases and provides a useful reference for related research fields.
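The abstract names a channel attention mechanism but does not specify it; as a rough illustration of the general technique, a squeeze-and-excitation-style block can be sketched as follows (PyTorch; the module name, reduction ratio, and shapes are assumptions, not the MCH-PAN implementation):

```python
# A minimal squeeze-and-excitation-style channel attention block.
# Illustrative sketch of the generic technique, not the authors' model.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze: global spatial context
        self.fc = nn.Sequential(                      # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # reweight feature channels

feats = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(feats).shape)              # torch.Size([2, 64, 32, 32])
```

Multiplying the input by learned per-channel weights is what lets such a block emphasize informative channels and suppress noisy ones.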
Affiliation(s)
- Ling Wang
- Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, 223003, China.
- Jingjing Wan
- Department of Gastroenterology, The Second People's Hospital of Huai'an, The Affiliated Huai'an Hospital of Xuzhou Medical University, Huaian, 223002, China.
- Xianchun Meng
- Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, 223003, China
- Bolun Chen
- Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, 223003, China
- Wei Shao
- Nanjing University of Aeronautics and Astronautics Shenzhen Research Institute, Shenzhen, 518038, China.
2
Cai L, Chen L, Huang J, Wang Y, Zhang Y. Know your orientation: A viewpoint-aware framework for polyp segmentation. Med Image Anal 2024; 97:103288. [PMID: 39096844] [DOI: 10.1016/j.media.2024.103288]
Abstract
Automatic polyp segmentation in endoscopic images is critical for the early diagnosis of colorectal cancer. Despite the availability of powerful segmentation models, two challenges still impede the accuracy of polyp segmentation algorithms. Firstly, during a colonoscopy, physicians frequently adjust the orientation of the colonoscope tip to capture underlying lesions, resulting in viewpoint changes in the colonoscopy images. These variations increase the diversity of polyp visual appearance, posing a challenge for learning robust polyp features. Secondly, polyps often exhibit properties similar to the surrounding tissues, leading to indistinct polyp boundaries. To address these problems, we propose a viewpoint-aware framework named VANet for precise polyp segmentation. In VANet, polyps are emphasized as a discriminative feature and thus can be localized by class activation maps in a viewpoint classification process. With these polyp locations, we design a viewpoint-aware Transformer (VAFormer) to alleviate the erosion of attention by the surrounding tissues, thereby inducing better polyp representations. Additionally, to enhance the polyp boundary perception of the network, we develop a boundary-aware Transformer (BAFormer) to encourage self-attention towards uncertain regions. As a consequence, the combination of the two modules is capable of calibrating predictions and significantly improving polyp segmentation performance. Extensive experiments on seven public datasets across six metrics demonstrate the state-of-the-art results of our method, and VANet can handle colonoscopy images in real-world scenarios effectively. The source code is available at https://github.com/1024803482/Viewpoint-Aware-Network.
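VANet localizes polyps from class activation maps produced by its viewpoint classifier; the textbook CAM computation that underlies this idea looks roughly like the sketch below (a generic illustration under assumed shapes, not the VANet code):

```python
# Class activation map (CAM): weight the last conv features by the classifier
# weights of one class to get a rough localization heatmap. Generic sketch.
import torch
import torch.nn.functional as F

def class_activation_map(feats: torch.Tensor, fc_weight: torch.Tensor, cls: int) -> torch.Tensor:
    """feats: (B, C, H, W) last conv features; fc_weight: (num_classes, C)."""
    cam = torch.einsum("bchw,c->bhw", feats, fc_weight[cls])    # weighted channel sum
    cam = F.relu(cam)
    cam = cam - cam.amin(dim=(1, 2), keepdim=True)
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)   # normalize to [0, 1]

feats = torch.randn(2, 64, 14, 14)
w = torch.randn(5, 64)    # e.g., 5 hypothetical viewpoint classes
print(class_activation_map(feats, w, cls=3).shape)  # torch.Size([2, 14, 14])
```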
Affiliation(s)
- Linghan Cai
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China; Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China.
- Lijiang Chen
- Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China
- Jianhao Huang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
- Yifeng Wang
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
- Yongbing Zhang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China.
3
Manan MA, Feng J, Yaqub M, Ahmed S, Imran SMA, Chuhan IS, Khan HA. Multi-scale and multi-path cascaded convolutional network for semantic segmentation of colorectal polyps. Alexandria Engineering Journal 2024; 105:341-359. [DOI: 10.1016/j.aej.2024.06.095]
4
Xu W, Xu R, Wang C, Li X, Xu S, Guo L. PSTNet: Enhanced Polyp Segmentation With Multi-Scale Alignment and Frequency Domain Integration. IEEE J Biomed Health Inform 2024; 28:6042-6053. [PMID: 38954569] [DOI: 10.1109/jbhi.2024.3421550]
Abstract
Accurate segmentation of colorectal polyps in colonoscopy images is crucial for effective diagnosis and management of colorectal cancer (CRC). However, current deep learning-based methods primarily rely on fusing RGB information across multiple scales, leading to limitations in accurately identifying polyps due to restricted RGB domain information and challenges in feature misalignment during multi-scale aggregation. To address these limitations, we propose the Polyp Segmentation Network with Shunted Transformer (PSTNet), a novel approach that integrates both RGB and frequency domain cues present in the images. PSTNet comprises three key modules: the Frequency Characterization Attention Module (FCAM) for extracting frequency cues and capturing polyp characteristics, the Feature Supplementary Alignment Module (FSAM) for aligning semantic information and reducing misalignment noise, and the Cross Perception localization Module (CPM) for synergizing frequency cues with high-level semantics to achieve efficient polyp segmentation. Extensive experiments on challenging datasets demonstrate PSTNet's significant improvement in polyp segmentation accuracy across various metrics, consistently outperforming state-of-the-art methods. The integration of frequency domain cues and the novel architectural design of PSTNet contribute to advancing computer-assisted polyp segmentation, facilitating more accurate diagnosis and management of CRC.
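The frequency-domain cues the abstract describes can be illustrated with a plain 2-D FFT over feature maps; this is a generic sketch of the idea behind a frequency branch such as FCAM, under assumed tensor shapes, not the published module:

```python
# Extract frequency-domain cues from spatial features with a 2-D FFT.
# Generic illustration of a frequency branch, not PSTNet's FCAM.
import torch

def frequency_cues(x: torch.Tensor) -> torch.Tensor:
    """x: (B, C, H, W) spatial features -> (B, C, H, W) log-magnitude spectrum."""
    spec = torch.fft.fft2(x, norm="ortho")            # complex spectrum per channel
    spec = torch.fft.fftshift(spec, dim=(-2, -1))     # move low frequencies to center
    return torch.log1p(spec.abs())                    # stable magnitude representation

x = torch.randn(1, 3, 64, 64)
print(frequency_cues(x).shape)  # torch.Size([1, 3, 64, 64])
```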
5
Dai D, Dong C, Yan Q, Sun Y, Zhang C, Li Z, Xu S. I2U-Net: A dual-path U-Net with rich information interaction for medical image segmentation. Med Image Anal 2024; 97:103241. [PMID: 38897032] [DOI: 10.1016/j.media.2024.103241]
Abstract
Although U-shape networks have achieved remarkable performance in many medical image segmentation tasks, they rarely model the sequential relationship of hierarchical layers. This weakness makes it difficult for the current layer to effectively utilize the historical information of the previous layer, leading to unsatisfactory segmentation results for lesions with blurred boundaries and irregular shapes. To solve this problem, we propose a novel dual-path U-Net, dubbed I2U-Net. The newly proposed network encourages historical information re-usage and re-exploration through rich information interaction among the dual paths, allowing deep layers to learn more comprehensive features that contain both low-level detail description and high-level semantic abstraction. Specifically, we introduce a multi-functional information interaction module (MFII), which can model cross-path, cross-layer, and cross-path-and-layer information interactions via a unified design, making the proposed I2U-Net behave similarly to an unfolded RNN and enjoy its advantage of modeling time-sequence information. Besides, to further selectively and sensitively integrate the information extracted by the encoders of the dual paths, we propose a holistic information fusion and augmentation module (HIFA), which can efficiently bridge the encoder and the decoder. Extensive experiments on four challenging tasks, including skin lesion, polyp, brain tumor, and abdominal multi-organ segmentation, consistently show that the proposed I2U-Net has superior performance and generalization ability over other state-of-the-art methods. The code is available at https://github.com/duweidai/I2U-Net.
Affiliation(s)
- Duwei Dai
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China; Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
- Caixia Dong
- Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
- Qingsen Yan
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Yongheng Sun
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China
- Chunyan Zhang
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
- Zongfang Li
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China; Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
- Songhua Xu
- Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
6
Paderno A, Bedi N, Rau A, Holsinger CF. Computer Vision and Videomics in Otolaryngology-Head and Neck Surgery: Bridging the Gap Between Clinical Needs and the Promise of Artificial Intelligence. Otolaryngol Clin North Am 2024; 57:703-718. [PMID: 38981809] [DOI: 10.1016/j.otc.2024.05.005]
Abstract
This article discusses the role of computer vision in otolaryngology, particularly through endoscopy and surgery. It covers recent applications of artificial intelligence (AI) in nonradiologic imaging within otolaryngology, noting the benefits and challenges, such as improving diagnostic accuracy and optimizing therapeutic outcomes, while also pointing out the necessity for enhanced data curation and standardized research methodologies to advance clinical applications. Technical aspects are also covered, providing a detailed view of the progression from manual feature extraction to more complex AI models, including convolutional neural networks and vision transformers and their potential application in clinical settings.
Affiliation(s)
- Alberto Paderno
- IRCCS Humanitas Research Hospital, via Manzoni 56, Rozzano, Milan 20089, Italy; Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan 20072, Italy.
- Nikita Bedi
- Division of Head and Neck Surgery, Department of Otolaryngology, Stanford University, Palo Alto, CA, USA
- Anita Rau
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
7
Oukdach Y, Garbaz A, Kerkaou Z, El Ansari M, Koutti L, El Ouafdi AF, Salihoun M. UViT-Seg: An Efficient ViT and U-Net-Based Framework for Accurate Colorectal Polyp Segmentation in Colonoscopy and WCE Images. Journal of Imaging Informatics in Medicine 2024; 37:2354-2374. [PMID: 38671336] [DOI: 10.1007/s10278-024-01124-8]
Abstract
Colorectal cancer (CRC) stands out as one of the most prevalent cancers worldwide. The accurate localization of colorectal polyps in endoscopy images is pivotal for timely detection and removal, contributing significantly to CRC prevention. The manual analysis of images generated by gastrointestinal screening technologies poses a tedious task for doctors. Therefore, computer vision-assisted cancer detection could serve as an efficient tool for polyp segmentation. Numerous efforts have been dedicated to automating polyp localization, with the majority of studies relying on convolutional neural networks (CNNs) to learn features from polyp images. Despite their success in polyp segmentation tasks, CNNs exhibit significant limitations in precisely determining polyp location and shape due to their sole reliance on learning local features from images. Since gastrointestinal images manifest significant variation in their features, encompassing both high- and low-level ones, a framework that can learn both types of polyp features is desirable. This paper introduces UViT-Seg, a framework designed for polyp segmentation in gastrointestinal images. Operating on an encoder-decoder architecture, UViT-Seg employs two distinct feature extraction methods. A vision transformer in the encoder section captures long-range semantic information, while a CNN module, integrating squeeze-excitation and dual attention mechanisms, captures low-level features, focusing on critical image regions. Experimental evaluations conducted on five public datasets, including CVC clinic, ColonDB, Kvasir-SEG, ETIS LaribDB, and Kvasir Capsule-SEG, demonstrate UViT-Seg's effectiveness in polyp localization. To confirm its generalization performance, the model is tested on datasets not used in training. Benchmarking against common segmentation methods and state-of-the-art polyp segmentation approaches, the proposed model yields promising results. For instance, it achieves a mean Dice coefficient of 0.915 and a mean intersection over union of 0.902 on the CVC Colon dataset. Furthermore, UViT-Seg has the advantage of being efficient, requiring fewer computational resources for both training and testing. This feature positions it as an optimal choice for real-world deployment scenarios.
Affiliation(s)
- Yassine Oukdach
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco.
- Anass Garbaz
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Zakaria Kerkaou
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Mohamed El Ansari
- Informatics and Applications Laboratory, Department of Computer Sciences, Faculty of Science, Moulay Ismail University, B.P 11201, Meknès, 52000, Morocco
- Lahcen Koutti
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Ahmed Fouad El Ouafdi
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Mouna Salihoun
- Faculty of Medicine and Pharmacy of Rabat, Mohammed V University of Rabat, Rabat, 10000, Morocco
8
Tudela Y, Majó M, de la Fuente N, Galdran A, Krenzer A, Puppe F, Yamlahi A, Tran TN, Matuszewski BJ, Fitzgerald K, Bian C, Pan J, Liu S, Fernández-Esparrach G, Histace A, Bernal J. A complete benchmark for polyp detection, segmentation and classification in colonoscopy images. Front Oncol 2024; 14:1417862. [PMID: 39381041] [PMCID: PMC11458519] [DOI: 10.3389/fonc.2024.1417862]
Abstract
Introduction: Colorectal cancer (CRC) is one of the main causes of cancer death worldwide. Early detection and diagnosis of its precursor lesion, the polyp, is key to reducing its mortality and improving procedure efficiency. During the last two decades, several computational methods have been proposed to assist clinicians in detection, segmentation and classification tasks, but the lack of a common public validation framework makes it difficult to determine which of them is ready to be deployed in the exploration room.
Methods: This study presents a complete validation framework, and we compare several methodologies for each of the polyp characterization tasks.
Results: Results show that the majority of the approaches are able to provide good performance for the detection and segmentation tasks, but that there is room for improvement regarding polyp classification.
Discussion: While studies show promising results in assisting with polyp detection and segmentation tasks, further research should be done on the classification task to obtain results reliable enough to assist clinicians during the procedure. The presented framework provides a standardized method for evaluating and comparing different approaches, which could facilitate the identification of clinically ready assistance methods.
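Benchmarks like this one typically score segmentation with the Dice coefficient and intersection over union; a minimal reference implementation for binary masks (illustrative only, not the framework's evaluation code) is:

```python
# Dice and IoU for binary polyp masks; generic reference implementation.
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """pred, gt: boolean masks of identical shape."""
    inter = np.logical_and(pred, gt).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

pred = np.zeros((4, 4), bool); pred[1:3, 1:3] = True
gt = np.zeros((4, 4), bool);   gt[1:4, 1:4] = True
print(dice_iou(pred, gt))  # (~0.615, ~0.444)
```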
Affiliation(s)
- Yael Tudela
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
- Mireia Majó
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
- Neil de la Fuente
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
- Adrian Galdran
- Department of Information and Communication Technologies, SymBioSys Research Group, BCNMedTech, Barcelona, Spain
- Adrian Krenzer
- Artificial Intelligence and Knowledge Systems, Institute for Computer Science, Julius-Maximilians University of Würzburg, Würzburg, Germany
- Frank Puppe
- Artificial Intelligence and Knowledge Systems, Institute for Computer Science, Julius-Maximilians University of Würzburg, Würzburg, Germany
- Amine Yamlahi
- Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Thuy Nuong Tran
- Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Bogdan J. Matuszewski
- Computer Vision and Machine Learning (CVML) Research Group, University of Central Lancashire (UCLan), Preston, United Kingdom
- Kerr Fitzgerald
- Computer Vision and Machine Learning (CVML) Research Group, University of Central Lancashire (UCLan), Preston, United Kingdom
- Cheng Bian
- Hebei University of Technology, Baoding, China
- Shijie Liu
- Hebei University of Technology, Baoding, China
- Aymeric Histace
- ETIS UMR 8051, École Nationale Supérieure de l'Électronique et de ses Applications (ENSEA), Centre national de la recherche scientifique (CNRS), CY Paris Cergy University, Cergy, France
- Jorge Bernal
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
9
Du X, Xu X, Chen J, Zhang X, Li L, Liu H, Li S. UM-Net: Rethinking ICGNet for polyp segmentation with uncertainty modeling. Med Image Anal 2024; 99:103347. [PMID: 39316997] [DOI: 10.1016/j.media.2024.103347]
Abstract
Automatic segmentation of polyps from colonoscopy images plays a critical role in the early diagnosis and treatment of colorectal cancer. Nevertheless, some bottlenecks still exist. In our previous work, we mainly focused on polyps with intra-class inconsistency and low contrast, using ICGNet to address them. Due to differences in equipment and the specific locations and properties of polyps, the color distribution of the collected images is inconsistent. ICGNet was designed primarily around reverse-contour guide information and local-global context information, ignoring this inconsistent color distribution, which leads to overfitting problems and makes it difficult to focus only on beneficial image content. In addition, a trustworthy segmentation model should not only produce high-precision results but also provide a measure of uncertainty to accompany its predictions so that physicians can make informed decisions. However, ICGNet only gives the segmentation result and lacks an uncertainty measure. To cope with these bottlenecks, we further extend the original ICGNet to a comprehensive and effective network (UM-Net) with two main contributions whose substantial practical value has been demonstrated experimentally. Firstly, we employ a color transfer operation to weaken the relationship between color and polyps, making the model more concerned with the shape of the polyps. Secondly, we provide uncertainty to represent the reliability of the segmentation results and use variance to rectify uncertainty. Our improved method is evaluated on five polyp datasets, showing competitive results compared to other advanced methods in both learning ability and generalization capability. The source code is available at https://github.com/dxqllp/UM-Net.
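The color transfer operation is not detailed in the abstract; a common Reinhard-style statistics-matching variant, done directly in RGB for brevity (the original Reinhard formulation works in Lab space), can be sketched as follows. This is a generic illustration, not the UM-Net code:

```python
# Match per-channel mean/std of a source frame to a reference frame.
# Generic color-transfer sketch; assumes float RGB images in [0, 1].
import numpy as np

def color_transfer(src: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """src, ref: (H, W, 3) float arrays; returns src recolored toward ref."""
    out = np.empty_like(src)
    for c in range(3):
        s_mu, s_sd = src[..., c].mean(), src[..., c].std() + 1e-8
        r_mu, r_sd = ref[..., c].mean(), ref[..., c].std()
        out[..., c] = (src[..., c] - s_mu) / s_sd * r_sd + r_mu
    return np.clip(out, 0.0, 1.0)
```

Randomizing the reference frame during training is one way such an operation can decorrelate polyp appearance from color, pushing the model toward shape cues.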
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China
- Xuebin Xu
- School of Computer Science and Technology, Anhui University, Hefei, China
- Jiajia Chen
- School of Computer Science and Technology, Anhui University, Hefei, China
- Xuejun Zhang
- School of Computer Science and Technology, Anhui University, Hefei, China
- Lei Li
- Department of Neurology, Shuyang Affiliated Hospital of Nanjing University of Traditional Chinese Medicine, Suqian, China.
- Heng Liu
- Department of Gastroenterology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Shuo Li
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, USA
10
Kusters CHJ, Jaspers TJM, Boers TGW, Jong MR, Jukema JB, Fockens KN, de Groof AJ, Bergman JJ, van der Sommen F, De With PHN. Will Transformers change gastrointestinal endoscopic image analysis? A comparative analysis between CNNs and Transformers, in terms of performance, robustness and generalization. Med Image Anal 2024; 99:103348. [PMID: 39298861] [DOI: 10.1016/j.media.2024.103348]
Abstract
Gastrointestinal endoscopic image analysis presents significant challenges, such as considerable variations in quality due to the challenging in-body imaging environment, the often-subtle nature of abnormalities with low interobserver agreement, and the need for real-time processing. These challenges pose strong requirements on the performance, generalization, robustness and complexity of deep learning-based techniques in such safety-critical applications. While Convolutional Neural Networks (CNNs) have been the go-to architecture for endoscopic image analysis, recent successes of the Transformer architecture in computer vision raise the possibility of updating this conclusion. To this end, we evaluate and compare clinically relevant performance, generalization and robustness of state-of-the-art CNNs and Transformers for neoplasia detection in Barrett's esophagus. We have trained and validated several top-performing CNNs and Transformers on a total of 10,208 images (2,079 patients), and tested on a total of 7,118 images (998 patients) across multiple test sets, including a high-quality test set, two internal and two external generalization test sets, and a robustness test set. Furthermore, to expand the scope of the study, we have conducted the performance and robustness comparisons for colonic polyp segmentation (Kvasir-SEG) and angiodysplasia detection (Giana). The results obtained for featured models across a wide range of training set sizes demonstrate that Transformers achieve comparable performance as CNNs on various applications, show comparable or slightly improved generalization capabilities and offer equally strong resilience and robustness against common image corruptions and perturbations. These findings confirm the viability of the Transformer architecture, particularly suited to the dynamic nature of endoscopic video analysis, characterized by fluctuating image quality, appearance and equipment configurations in transition from hospital to hospital. The code is made publicly available at: https://github.com/BONS-AI-VCA-AMC/Endoscopy-CNNs-vs-Transformers.
Affiliation(s)
- Carolus H J Kusters
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Tim J M Jaspers
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Tim G W Boers
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Martijn R Jong
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Jelmer B Jukema
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Kiki N Fockens
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Albert J de Groof
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Jacques J Bergman
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Fons van der Sommen
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Peter H N De With
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
11
Meng L, Li Y, Duan W. Three-stage polyp segmentation network based on reverse attention feature purification with Pyramid Vision Transformer. Comput Biol Med 2024; 179:108930. [PMID: 39067285] [DOI: 10.1016/j.compbiomed.2024.108930]
Abstract
Colorectal polyps serve as potential precursors of colorectal cancer and automating polyp segmentation aids physicians in accurately identifying potential polyp regions, thereby reducing misdiagnoses and missed diagnoses. However, existing models often fall short in accurately segmenting polyps due to the high degree of similarity between polyp regions and surrounding tissue in terms of color, texture, and shape. To address this challenge, this study proposes a novel three-stage polyp segmentation network, named Reverse Attention Feature Purification with Pyramid Vision Transformer (RAFPNet), which adopts an iterative feedback UNet architecture to refine polyp saliency maps for precise segmentation. Initially, a Multi-Scale Feature Aggregation (MSFA) module is introduced to generate preliminary polyp saliency maps. Subsequently, a Reverse Attention Feature Purification (RAFP) module is devised to effectively suppress low-level surrounding tissue features while enhancing high-level semantic polyp information based on the preliminary saliency maps. Finally, the UNet architecture is leveraged to further refine the feature maps in a coarse-to-fine approach. Extensive experiments conducted on five widely used polyp segmentation datasets and three video polyp segmentation datasets demonstrate the superior performance of RAFPNet over state-of-the-art models across multiple evaluation metrics.
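Reverse attention, which the RAFP module builds on, is commonly implemented by inverting a sigmoid saliency map so that features in regions missed by the coarse prediction are emphasized; a minimal sketch of that generic mechanism (assumed names and shapes, not the RAFPNet code) is:

```python
# Reverse attention: invert a coarse saliency map so the network attends to
# regions not yet covered by the prediction. Generic sketch of the mechanism.
import torch

def reverse_attention(feats: torch.Tensor, saliency_logits: torch.Tensor) -> torch.Tensor:
    """feats: (B, C, H, W); saliency_logits: (B, 1, H, W) coarse polyp map."""
    attn = 1.0 - torch.sigmoid(saliency_logits)   # emphasize currently-missed regions
    return feats * attn                            # broadcast over channels

f = torch.randn(2, 32, 48, 48)
s = torch.randn(2, 1, 48, 48)
print(reverse_attention(f, s).shape)  # torch.Size([2, 32, 48, 48])
```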
Affiliation(s)
- Lingbing Meng
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China
- Yuting Li
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China
- Weiwei Duan
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China.
12
Arsa DMS, Ilyas T, Park SH, Chua L, Kim H. Efficient multi-stage feedback attention for diverse lesion in cancer image segmentation. Comput Med Imaging Graph 2024; 116:102417. [PMID: 39067303] [DOI: 10.1016/j.compmedimag.2024.102417]
Abstract
In the domain of Computer-Aided Diagnosis (CAD) systems, the accurate identification of cancer lesions is paramount, given the life-threatening nature of cancer and the complexities inherent in its manifestation. This task is particularly arduous due to the often vague boundaries of cancerous regions, compounded by the presence of noise and the heterogeneity in the appearance of lesions, making precise segmentation a critical yet challenging endeavor. This study introduces an innovative iterative feedback mechanism tailored for the nuanced detection of cancer lesions in a variety of medical imaging modalities, offering a refining phase to adjust detection results. The core of our approach is the elimination of the need for an initial segmentation mask, a common limitation in iterative-based segmentation methods. Instead, we utilize a novel system where the feedback for refining segmentation is derived directly from the encoder-decoder architecture of our neural network model. This shift allows for more dynamic and accurate lesion identification. To further enhance the accuracy of our CAD system, we employ a multi-scale feedback attention mechanism to guide and refine the predicted mask over subsequent iterations. In parallel, we introduce a sophisticated weighted feedback loss function. This function synergistically combines global and iteration-specific loss considerations, thereby refining parameter estimation and improving the overall precision of the segmentation. We conducted comprehensive experiments across three distinct categories of medical imaging: colonoscopy, ultrasonography, and dermoscopic images. The experimental results demonstrate that our method not only competes favorably with but also surpasses current state-of-the-art methods in various scenarios, including both standard and challenging out-of-domain tasks. This evidences the robustness and versatility of our approach in accurately identifying cancer lesions across a spectrum of medical imaging contexts. Our source code can be found at https://github.com/dewamsa/EfficientFeedbackNetwork.
Affiliation(s)
- Dewa Made Sri Arsa
- Division of Electronics and Information Engineering, Jeonbuk National University, Republic of Korea; Department of Information Technology, Universitas Udayana, Indonesia; Core Research Institute of Intelligent Robots, Jeonbuk National University, Republic of Korea.
- Talha Ilyas
- Division of Electronics and Information Engineering, Jeonbuk National University, Republic of Korea; Core Research Institute of Intelligent Robots, Jeonbuk National University, Republic of Korea.
- Seok-Hwan Park
- Division of Electronic Engineering, Jeonbuk National University, Republic of Korea.
- Leon Chua
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA.
- Hyongsuk Kim
- Core Research Institute of Intelligent Robots, Jeonbuk National University, Republic of Korea.
13
Tang S, Ran H, Yang S, Wang Z, Li W, Li H, Meng Z. A frequency selection network for medical image segmentation. Heliyon 2024; 10:e35698. [PMID: 39220902] [PMCID: PMC11365330] [DOI: 10.1016/j.heliyon.2024.e35698]
Abstract
Existing medical image segmentation methods may consider feature extraction and information processing only in the spatial domain, lack the design of interactions between frequency information and spatial information, or ignore the semantic gaps between shallow and deep features, leading to inaccurate segmentation results. Therefore, in this paper, we propose a novel frequency selection segmentation network (FSSN), which achieves more accurate lesion segmentation by fusing local spatial features with global frequency information, designing better feature interactions, and suppressing low-correlation frequency components to mitigate semantic gaps. Firstly, we propose a global-local feature aggregation module (GLAM) that simultaneously captures multi-scale local features in the spatial domain and exploits global frequency information in the frequency domain, achieving complementary fusion of local detail features and global frequency information. Secondly, we propose a feature filter module (FFM) to mitigate semantic gaps during cross-level feature fusion, making FSSN discriminatively determine which frequency information should be preserved for accurate lesion segmentation. Finally, to make better use of local information, especially the boundary of the lesion region, we employ deformable convolution (DC) to extract pertinent features in the local range, so that FSSN can focus better on relevant image content. Extensive experiments on two public benchmark datasets show that, compared with representative medical image segmentation methods, FSSN obtains more accurate lesion segmentation results in terms of both objective evaluation indicators and subjective visual effects, with fewer parameters and lower computational complexity.
Affiliation(s)
- Shu Tang
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Haiheng Ran
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Shuli Yang
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Zhaoxia Wang
- Chongqing Emergency Medical Center, Chongqing University Central Hospital, School of Medicine, Chongqing University, Chongqing, China
- Wei Li
- Children’s Hospital of Chongqing Medical University, China
- Haorong Li
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Zihao Meng
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
14
Chang Q, Ahmad D, Toth J, Bascom R, Higgins WE. ESFPNet: Efficient Stage-Wise Feature Pyramid on Mix Transformer for Deep Learning-Based Cancer Analysis in Endoscopic Video. J Imaging 2024; 10:191. [PMID: 39194980] [DOI: 10.3390/jimaging10080191]
Abstract
For patients at risk of developing either lung cancer or colorectal cancer, the identification of suspect lesions in endoscopic video is an important procedure. The physician performs an endoscopic exam by navigating an endoscope through the organ of interest, be it the lungs or intestinal tract, and performs a visual inspection of the endoscopic video stream to identify lesions. Unfortunately, this entails a tedious, error-prone search over a lengthy video sequence. We propose a deep learning architecture that enables the real-time detection and segmentation of lesion regions from endoscopic video, with our experiments focused on autofluorescence bronchoscopy (AFB) for the lungs and colonoscopy for the intestinal tract. Our architecture, dubbed ESFPNet, draws on a pretrained Mix Transformer (MiT) encoder and a decoder structure that incorporates a new Efficient Stage-Wise Feature Pyramid (ESFP) to promote accurate lesion segmentation. In comparison to existing deep learning models, the ESFPNet model gave superior lesion segmentation performance for an AFB dataset. It also produced superior segmentation results for three widely used public colonoscopy databases and nearly the best results for two other public colonoscopy databases. In addition, the lightweight ESFPNet architecture requires fewer model parameters and less computation than other competing models, enabling the real-time analysis of input video frames. Overall, these studies point to the combined superior analysis performance and architectural efficiency of the ESFPNet for endoscopic video analysis. Lastly, additional experiments with the public colonoscopy databases demonstrate the learning ability and generalizability of ESFPNet, implying that the model could be effective for region segmentation in other domains.
Affiliation(s)
- Qi Chang
- School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA
- Danish Ahmad
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
- Jennifer Toth
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
- Rebecca Bascom
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
- William E Higgins
- School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA
15
Jiang Y, Zhang Z, Hu Y, Li G, Wan X, Wu S, Cui S, Huang S, Li Z. ECC-PolypDet: Enhanced CenterNet With Contrastive Learning for Automatic Polyp Detection. IEEE J Biomed Health Inform 2024; 28:4785-4796. [PMID: 37983159] [DOI: 10.1109/jbhi.2023.3334240]
Abstract
Accurate polyp detection is critical for early colorectal cancer diagnosis. Although remarkable progress has been achieved in recent years, the complex colon environment and concealed polyps with unclear boundaries still pose severe challenges in this area. Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases. In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training & end-to-end inference framework that leverages images and bounding box annotations to train a general model and fine-tune it based on the inference score to obtain a final robust model. Specifically, we conduct Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps. Moreover, to enhance the recognition of small polyps, we design the Semantic Flow-guided Feature Pyramid Network (SFFPN) to aggregate multi-scale features and the Heatmap Propagation (HP) module to boost the model's attention on polyp targets. In the fine-tuning stage, we introduce the IoU-guided Sample Re-weighting (ISR) mechanism to prioritize hard samples by adaptively adjusting the loss weight for each sample during fine-tuning. Extensive experiments on six large-scale colonoscopy datasets demonstrate the superiority of our model compared with previous state-of-the-art detectors.
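The Box-assisted Contrastive Learning idea of separating foreground polyps from backgrounds can be illustrated with a generic InfoNCE-style loss over pooled foreground and background embeddings; the sketch below is an assumption-laden stand-in, not the published BCL formulation:

```python
# Pull foreground (polyp) embeddings toward their prototype and push them away
# from background embeddings. Generic InfoNCE-style sketch, not the paper's BCL.
import torch
import torch.nn.functional as F

def fg_bg_contrastive(fg: torch.Tensor, bg: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """fg: (Nf, D) foreground embeddings; bg: (Nb, D) background embeddings."""
    fg = F.normalize(fg, dim=1)
    bg = F.normalize(bg, dim=1)
    proto = F.normalize(fg.mean(dim=0, keepdim=True), dim=1)   # foreground prototype
    pos = fg @ proto.t() / tau                                 # (Nf, 1) similarity to prototype
    neg = fg @ bg.t() / tau                                    # (Nf, Nb) similarity to background
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(fg.size(0), dtype=torch.long)         # positive is index 0
    return F.cross_entropy(logits, labels)

print(fg_bg_contrastive(torch.randn(8, 128), torch.randn(32, 128)))
```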
16
Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024; 154:102900. [PMID: 38878555] [DOI: 10.1016/j.artmed.2024.102900]
Abstract
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks, and it has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we also include articles that used the transformer architecture for generating surgical instructions and predicting adverse outcomes after surgery under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Affiliation(s)
- Subhash Nerella
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Jiaqing Zhang
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
- Miguel Contreras
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Scott Siegel
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Aysegul Bumin
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Brandon Silva
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Jessica Sena
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Benjamin Shickel
- Department of Medicine, University of Florida, Gainesville, United States
- Azra Bihorac
- Department of Medicine, University of Florida, Gainesville, United States
- Kia Khezeli
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Parisa Rashidi
- Department of Biomedical Engineering, University of Florida, Gainesville, United States.
17
Liu S, Lin Y, Liu D. FreqSNet: a multiaxial integration of frequency and spatial domains for medical image segmentation. Phys Med Biol 2024; 69:145011. [PMID: 38959911] [DOI: 10.1088/1361-6560/ad5ef3]
Abstract
Objective: In recent years, convolutional neural networks, which typically focus on extracting spatial domain features, have shown limitations in learning global contextual information. However, the frequency domain can offer a global perspective that spatial domain methods often struggle to capture. To address this limitation, we propose FreqSNet, which leverages both frequency and spatial features for medical image segmentation.
Approach: To begin, we propose a frequency-space representation aggregation block (FSRAB) to replace conventional convolutions. FSRAB contains three frequency domain branches to capture global frequency information along different axial combinations, while a convolutional branch is designed to interact information across channels in local spatial features. Secondly, the multiplex expansion attention block extracts long-range dependency information using dilated convolutional blocks, while suppressing irrelevant information via attention mechanisms. Finally, the introduced Feature Integration Block enhances feature representation by integrating semantic features that fuse spatial and channel positional information.
Main results: We validated our method on 5 public datasets, including BUSI, CVC-ClinicDB, CVC-ColonDB, ISIC-2018, and Luna16. On these datasets, our method achieved Intersection over Union (IoU) scores of 75.46%, 87.81%, 79.08%, 84.04%, and 96.99%, and Hausdorff distance values of 22.22 mm, 13.20 mm, 13.08 mm, 13.51 mm, and 5.22 mm, respectively. Compared to other state-of-the-art methods, our FreqSNet achieves better segmentation results.
Significance: Our method can effectively combine frequency domain information with spatial domain features, enhancing the segmentation performance and generalization capability in medical image segmentation tasks.
Affiliation(s)
- Shangwang Liu
- The School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, People's Republic of China
- Engineering Lab of Intelligence Business and Internet of Things, Henan Normal University, Xinxiang 453007, People's Republic of China
- Yinghai Lin
- The School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, People's Republic of China
- Engineering Lab of Intelligence Business and Internet of Things, Henan Normal University, Xinxiang 453007, People's Republic of China
- Danyang Liu
- The School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, People's Republic of China
- Engineering Lab of Intelligence Business and Internet of Things, Henan Normal University, Xinxiang 453007, People's Republic of China
18
Huang C, Shi Y, Zhang B, Lyu K. Uncertainty-aware prototypical learning for anomaly detection in medical images. Neural Netw 2024; 175:106284. [PMID: 38593560] [DOI: 10.1016/j.neunet.2024.106284]
Abstract
Anomalous object detection (AOD) in medical images aims to recognize anomalous lesions and is crucial for the early clinical diagnosis of various cancers. However, it is a difficult task for two reasons: (1) the diversity of anomalous lesions and (2) the ambiguity of the boundary between anomalous lesions and their normal surroundings. Unlike existing single-modality AOD models based on deterministic mapping, we constructed a probabilistic and deterministic AOD model. Specifically, we designed an uncertainty-aware prototype learning framework, which considers the diversity and ambiguity of anomalous lesions. A prototypical learning transformer (Pformer) is established to extract and store the prototype features of different anomalous lesions. Moreover, a Bayesian neural uncertainty quantizer, a probabilistic model, is designed to model the distributions over the outputs of the model to measure the uncertainty of the model's detection results for each pixel. Essentially, the uncertainty of the model's anomaly detection result for a pixel can reflect the anomalous ambiguity of this pixel. Furthermore, an uncertainty-guided reasoning transformer (Uformer) is devised to employ this anomalous ambiguity, encouraging the proposed model to focus on pixels with high uncertainty. Notably, prototypical representations stored in Pformer are also utilized in anomaly reasoning, enabling the model to perceive the diversity of anomalous objects. Extensive experiments on five benchmark datasets demonstrate the superiority of our proposed method. The source code will be available at github.com/umchaohuang/UPformer.
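A standard, lightweight proxy for the per-pixel uncertainty described above is Monte Carlo dropout: run several stochastic forward passes and read the variance as uncertainty. The sketch below illustrates that generic recipe, not the paper's Bayesian neural uncertainty quantizer:

```python
# Pixel-wise predictive uncertainty via Monte Carlo dropout. Generic sketch;
# the toy network and pass count are assumptions for illustration.
import torch
import torch.nn as nn

def mc_dropout_uncertainty(model: nn.Module, x: torch.Tensor, passes: int = 10):
    model.train()                     # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(x)) for _ in range(passes)])
    mean = probs.mean(dim=0)          # per-pixel predicted probability
    var = probs.var(dim=0)            # per-pixel uncertainty estimate
    return mean, var

net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.Dropout2d(0.5), nn.Conv2d(8, 1, 1))
mean, var = mc_dropout_uncertainty(net, torch.randn(1, 3, 32, 32))
print(mean.shape, var.shape)  # torch.Size([1, 1, 32, 32]) twice
```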
Affiliation(s)
- Chao Huang
- PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, 519000, Macao Special Administrative Region of China; Shenzhen Campus of Sun Yat-sen University, School of Cyber Science and Technology, Shenzhen, 518107, China
- Yushu Shi
- Shenzhen Campus of Sun Yat-sen University, School of Cyber Science and Technology, Shenzhen, 518107, China
- Bob Zhang
- PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, 519000, Macao Special Administrative Region of China.
- Ke Lyu
- School of Engineering Sciences, University of the Chinese Academy of Sciences, Beijing, 100049, China; Pengcheng Laboratory, Shenzhen, 518055, China
19
Wan L, Chen Z, Xiao Y, Zhao J, Feng W, Fu H. Iterative feedback-based models for image and video polyp segmentation. Comput Biol Med 2024; 177:108569. [PMID: 38781640] [DOI: 10.1016/j.compbiomed.2024.108569]
Abstract
Accurate segmentation of polyps in colonoscopy images has gained significant attention in recent years, given its crucial role in automated colorectal cancer diagnosis. Many existing deep learning-based methods follow a one-stage processing pipeline, often involving feature fusion across different levels or utilizing boundary-related attention mechanisms. Drawing on the success of applying Iterative Feedback Units (IFU) in image polyp segmentation, this paper proposes FlowICBNet by extending the IFU to the domain of video polyp segmentation. By harnessing the unique capabilities of IFU to propagate and refine past segmentation results, our method proves effective in mitigating challenges linked to the inherent limitations of endoscopic imaging, notably the presence of frequent camera shake and frame defocusing. Furthermore, in FlowICBNet, we introduce two pivotal modules: Reference Frame Selection (RFS) and Flow Guided Warping (FGW). These modules play a crucial role in filtering and selecting the most suitable historical reference frames for the task at hand. The experimental results on a large video polyp segmentation dataset demonstrate that our method can significantly outperform state-of-the-art methods by notable margins achieving an average metrics improvement of 7.5% on SUN-SEG-Easy and 7.4% on SUN-SEG-Hard. Our code is available at https://github.com/eraserNut/ICBNet.
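Flow-guided warping of a past frame's prediction, as in the FGW module, is typically realized with a sampling grid and torch.nn.functional.grid_sample; a minimal sketch (flow assumed in pixel units; names and conventions are assumptions, not the authors' code) is:

```python
# Warp a previous frame's segmentation toward the current frame using a
# dense flow field. Generic flow-warping sketch via grid_sample.
import torch
import torch.nn.functional as F

def warp_with_flow(mask: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """mask: (B, 1, H, W); flow: (B, 2, H, W) displacement in pixels (x, y)."""
    b, _, h, w = mask.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1, 2, H, W)
    coords = base + flow                                       # absolute sample positions
    coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0          # normalize x to [-1, 1]
    coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0          # normalize y to [-1, 1]
    grid = coords.permute(0, 2, 3, 1)                          # (B, H, W, 2)
    return F.grid_sample(mask, grid, align_corners=True)

m = torch.rand(1, 1, 64, 64)
f = torch.zeros(1, 2, 64, 64)  # zero flow: identity warp
print(torch.allclose(warp_with_flow(m, f), m, atol=1e-5))  # True
```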
Affiliation(s)
- Liang Wan
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Zhihao Chen
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Yefan Xiao
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Junting Zhao
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Wei Feng
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Huazhu Fu
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore, 138632, Republic of Singapore.
20
Xu Z, Miao Y, Chen G, Liu S, Chen H. GLGFormer: Global Local Guidance Network for Mucosal Lesion Segmentation in Gastrointestinal Endoscopy Images. Journal of Imaging Informatics in Medicine 2024. [PMID: 38940891] [DOI: 10.1007/s10278-024-01162-2]
Abstract
Automatic mucosal lesion segmentation is a critical component in computer-aided clinical support systems for endoscopic image analysis. Image segmentation networks currently rely mainly on convolutional neural networks (CNNs) and Transformers, which have demonstrated strong performance in various applications. However, they cannot cope with blurred lesion boundaries and lesions of different scales in gastrointestinal endoscopy images. To address these challenges, we propose a new Transformer-based network, named GLGFormer, for the task of mucosal lesion segmentation. Specifically, we design the global guidance module to guide single-scale features patch-wise, enabling them to incorporate global information from the global map without information loss. Furthermore, a partial decoder is employed to fuse these enhanced single-scale features, achieving single-scale to multi-scale enhancement. Additionally, the local guidance module is designed to refocus attention on the neighboring patch, thus enhancing local features and refining lesion boundary segmentation. We conduct experiments on a private atrophic gastritis segmentation dataset and four public gastrointestinal polyp segmentation datasets. Compared to the current lesion segmentation networks, our proposed GLGFormer demonstrates outstanding learning and generalization capabilities. On the public dataset ClinicDB, GLGFormer achieved a mean intersection over union (mIoU) of 91.0% and a mean dice coefficient (mDice) of 95.0%. On the private dataset Gastritis-Seg, GLGFormer achieved an mIoU of 90.6% and an mDice of 94.6%.
Affiliation(s)
- Zhiyang Xu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, School of Information and Control Engineering, Advanced Robotics Research Center, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, P. R. China
- Yanzi Miao
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, School of Information and Control Engineering, Advanced Robotics Research Center, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, P. R. China.
- Guangxia Chen
- Department of Gastroenterology, Xuzhou Municipal Hospital Affiliated to Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
- Shiyu Liu
- Department of Gastroenterology, Xuzhou Municipal Hospital Affiliated to Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
- Hu Chen
- The First Clinical Medical School of Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
Collapse
|
21
|
Ji Z, Li X, Liu J, Chen R, Liao Q, Lyu T, Zhao L. LightCF-Net: A Lightweight Long-Range Context Fusion Network for Real-Time Polyp Segmentation. Bioengineering (Basel) 2024; 11:545. [PMID: 38927781 PMCID: PMC11201063 DOI: 10.3390/bioengineering11060545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2024] [Revised: 05/22/2024] [Accepted: 05/24/2024] [Indexed: 06/28/2024] Open
Abstract
Automatically segmenting polyps from colonoscopy videos is crucial for developing computer-assisted diagnostic systems for colorectal cancer. Existing automatic polyp segmentation methods often struggle to fulfill the real-time demands of clinical applications due to their substantial parameter count and computational load, especially those based on Transformer architectures. To tackle these challenges, a novel lightweight long-range context fusion network, named LightCF-Net, is proposed in this paper. This network attempts to model long-range spatial dependencies while maintaining real-time performance, to better distinguish polyps from background noise and thus improve segmentation accuracy. A novel Fusion Attention Encoder (FAEncoder) is designed in the proposed network, which integrates Large Kernel Attention (LKA) and channel attention mechanisms to extract deep representational features of polyps and unearth long-range dependencies. Furthermore, a newly designed Visual Attention Mamba module (VAM) is added to the skip connections, modeling long-range context dependencies in the encoder-extracted features and reducing background noise interference through the attention mechanism. Finally, a Pyramid Split Attention module (PSA) is used in the bottleneck layer to extract richer multi-scale contextual features. The proposed method was thoroughly evaluated on four renowned polyp segmentation datasets: Kvasir-SEG, CVC-ClinicDB, BKAI-IGH, and ETIS. Experimental findings demonstrate that the proposed method delivers higher segmentation accuracy in less time, consistently outperforming the most advanced lightweight polyp segmentation networks.
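Large Kernel Attention, which the FAEncoder builds on, is commonly decomposed into a depthwise convolution, a dilated depthwise convolution, and a pointwise convolution whose output gates the input. The following is a minimal sketch of that published LKA recipe; it is only a plausible stand-in for the FAEncoder internals, whose exact design is described in the paper.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Decomposed large-kernel attention in the spirit of LKA: a 5x5
    depthwise conv, a 7x7 depthwise dilated conv (dilation 3, effective
    receptive field 19x19), and a 1x1 conv; the result multiplicatively
    gates the input. Illustrative sketch, not LightCF-Net's exact module."""
    def __init__(self, dim: int):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return attn * x  # attention map gates the input features
```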
Affiliation(s)
- Zhanlin Ji: Hebei Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China; College of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China
- Xiaoyu Li: Hebei Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China
- Jianuo Liu: Hebei Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China
- Rui Chen: Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China
- Qinping Liao: Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China
- Tao Lyu: Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China
- Li Zhao: Beijing National Research Center for Information Science and Technology, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China

22
Biffi C, Antonelli G, Bernhofer S, Hassan C, Hirata D, Iwatate M, Maieron A, Salvagnini P, Cherubini A. REAL-Colon: A dataset for developing real-world AI applications in colonoscopy. Sci Data 2024; 11:539. [PMID: 38796533 PMCID: PMC11127922 DOI: 10.1038/s41597-024-03359-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Accepted: 05/10/2024] [Indexed: 05/28/2024] Open
Abstract
Detection and diagnosis of colon polyps are key to preventing colorectal cancer. Recent evidence suggests that AI-based computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems can enhance endoscopists' performance and boost colonoscopy effectiveness. However, most available public datasets primarily consist of still images or video clips, often at a down-sampled resolution, and do not accurately represent real-world colonoscopy procedures. We introduce the REAL-Colon (Real-world multi-center Endoscopy Annotated video Library) dataset: a compilation of 2.7 M native video frames from sixty full-resolution, real-world colonoscopy recordings across multiple centers. The dataset contains 350k bounding-box annotations, each created under the supervision of expert gastroenterologists. Comprehensive patient clinical data, colonoscopy acquisition information, and polyp histopathological information are also included in each video. With its unprecedented size, quality, and heterogeneity, the REAL-Colon dataset is a unique resource for researchers and developers aiming to advance AI research in colonoscopy. Its openness and transparency facilitate rigorous and reproducible research, fostering the development and benchmarking of more accurate and reliable colonoscopy-related algorithms and models.
Affiliation(s)
- Carlo Biffi: Cosmo Intelligent Medical Devices, Dublin, Ireland
- Giulio Antonelli: Gastroenterology and Digestive Endoscopy Unit, Ospedale dei Castelli (N.O.C.), Rome, Italy
- Sebastian Bernhofer: Karl Landsteiner University of Health Sciences, Krems, Austria; Department of Internal Medicine 2, University Hospital St. Pölten, St. Pölten, Austria
- Cesare Hassan: Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Italy; Endoscopy Unit, Humanitas Clinical and Research Center IRCCS, Rozzano, Italy
- Daizen Hirata: Gastrointestinal Center, Sano Hospital, Hyogo, Japan
- Mineo Iwatate: Gastrointestinal Center, Sano Hospital, Hyogo, Japan
- Andreas Maieron: Karl Landsteiner University of Health Sciences, Krems, Austria; Department of Internal Medicine 2, University Hospital St. Pölten, St. Pölten, Austria
- Andrea Cherubini: Cosmo Intelligent Medical Devices, Dublin, Ireland; Milan Center for Neuroscience, University of Milano-Bicocca, Milano, Italy

23
Han G, Guo W, Zhang H, Jin J, Gan X, Zhao X. Sample self-selection using dual teacher networks for pathological image classification with noisy labels. Comput Biol Med 2024; 174:108489. [PMID: 38640633 DOI: 10.1016/j.compbiomed.2024.108489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Revised: 04/02/2024] [Accepted: 04/15/2024] [Indexed: 04/21/2024]
Abstract
Deep neural networks (DNNs) enable advanced image processing but depend on large quantities of high-quality labeled data, and the presence of noisy data significantly degrades DNN model performance. In the medical field, where model accuracy is crucial and labels for pathological images are scarce and expensive to obtain, the need to handle noisy data is even more urgent. Deep networks exhibit a memorization effect: they tend to fit clean labels first, so early stopping is highly effective in managing learning with noisy labels. Previous research has often concentrated on developing robust loss functions or implementing training constraints to mitigate the impact of noisy labels; however, such approaches have frequently resulted in underfitting. Rather than preventing late-stage training from being affected by noisy labels, we propose using knowledge distillation to slow the learning process of the target network. In this paper, we introduce a data sample self-selection strategy based on early stopping to filter out most of the noisy data. Additionally, we employ a distillation training method with dual teacher networks to ensure the steady learning of the student network. The experimental results show that our method outperforms current state-of-the-art methods for handling noisy labels on both synthetic and real-world noisy datasets. In particular, on the real-world pathological image dataset Chaoyang, the highest classification accuracy increased by 2.39%. Our method leverages the model's predictions over the training history to select cleaner subsets of the data and retrains on them, significantly mitigating the impact of noisy labels on model performance.
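The dual-teacher distillation objective can be sketched as a weighted sum of cross-entropy on the selected (presumed clean) samples and a KL term towards the averaged teacher predictions. The weighting and temperature below are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def dual_teacher_distillation_loss(student_logits, t1_logits, t2_logits,
                                   labels, temperature=4.0, alpha=0.5):
    """Hypothetical sketch: cross-entropy on self-selected samples plus
    KL distillation towards the average of two teachers' soft outputs."""
    ce = F.cross_entropy(student_logits, labels)
    # Average the two teachers' temperature-softened predictions.
    t_soft = (F.softmax(t1_logits / temperature, dim=1)
              + F.softmax(t2_logits / temperature, dim=1)) / 2.0
    s_log_soft = F.log_softmax(student_logits / temperature, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(s_log_soft, t_soft, reduction="batchmean") * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kd
```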
Affiliation(s)
- Gang Han: School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China; School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, China
- Wenping Guo: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, China
- Haibo Zhang: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, China
- Jie Jin: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, China
- Xingli Gan: School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
- Xiaoming Zhao: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, China

24
Daneshpajooh V, Ahmad D, Toth J, Bascom R, Higgins WE. Automatic lesion detection for narrow-band imaging bronchoscopy. J Med Imaging (Bellingham) 2024; 11:036002. [PMID: 38827776 PMCID: PMC11138083 DOI: 10.1117/1.jmi.11.3.036002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Revised: 04/04/2024] [Accepted: 05/14/2024] [Indexed: 06/05/2024] Open
Abstract
Purpose: Early detection of cancer is crucial for lung cancer patients, as it determines disease prognosis. Lung cancer typically starts as bronchial lesions along the airway walls. Recent research has indicated that narrow-band imaging (NBI) bronchoscopy enables more effective bronchial lesion detection than other bronchoscopic modalities. Unfortunately, NBI video can be hard to interpret because physicians currently are forced to perform a time-consuming subjective visual search to detect bronchial lesions in a long airway-exam video. As a result, NBI bronchoscopy is not regularly used in practice. To alleviate this problem, we propose an automatic two-stage real-time method for bronchial lesion detection in NBI video and perform a first-of-its-kind pilot study of the method using NBI airway exam video collected at our institution.
Approach: Given a patient's NBI video, the first method stage entails a deep-learning-based object detection network coupled with a multiframe abnormality measure to locate candidate lesions on each video frame. The second method stage then draws upon a Siamese network and a Kalman filter to track candidate lesions over multiple frames to arrive at final lesion decisions.
Results: Tests drawing on 23 patient NBI airway exam videos indicate that the method can process an incoming video stream at a real-time frame rate, thereby making the method viable for real-time inspection during a live bronchoscopic airway exam. Furthermore, our studies showed a 93% sensitivity and 86% specificity for lesion detection; this compares favorably to a sensitivity and specificity of 80% and 84% achieved over a series of recent pooled clinical studies using the current time-consuming subjective clinical approach.
Conclusion: The method shows potential for robust lesion detection in NBI video at a real-time frame rate. Therefore, it could help enable more common use of NBI bronchoscopy for bronchial lesion detection.
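The Kalman-filter tracking in the second stage can be illustrated with a toy constant-velocity filter over a lesion's bounding-box center; the paper's actual state model and noise settings are not reproduced here.

```python
import numpy as np

class ConstantVelocityKalman:
    """Toy constant-velocity Kalman filter for tracking a lesion's
    bounding-box center across frames. State: [x, y, vx, vy].
    Illustrative sketch only; parameters are assumptions."""
    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])      # initial state
        self.P = np.eye(4)                          # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt  # motion model
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = q * np.eye(4)                      # process noise
        self.R = r * np.eye(2)                      # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                           # predicted center

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```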
Affiliation(s)
- Vahid Daneshpajooh: The Pennsylvania State University, School of Electrical Engineering and Computer Science, University Park, Pennsylvania, United States
- Danish Ahmad: The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania, United States
- Jennifer Toth: The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania, United States
- Rebecca Bascom: The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania, United States
- William E. Higgins: The Pennsylvania State University, School of Electrical Engineering and Computer Science, University Park, Pennsylvania, United States

25
Su D, Luo J, Fei C. An Efficient and Rapid Medical Image Segmentation Network. IEEE J Biomed Health Inform 2024; 28:2979-2990. [PMID: 38457317 DOI: 10.1109/jbhi.2024.3374780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/10/2024]
Abstract
Accurate medical image segmentation is an essential part of the medical image analysis process that provides detailed quantitative metrics. In recent years, extensions of classical networks such as UNet have achieved state-of-the-art performance on medical image segmentation tasks. However, the high model complexity of these networks limits their applicability to devices with constrained computational resources. To alleviate this problem, we propose a shallow hierarchical Transformer for medical image segmentation, called SHFormer. By decreasing the number of transformer blocks utilized, the model complexity of SHFormer can be reduced to an acceptable level. To improve the learned attention while keeping the structure lightweight, we propose a spatial-channel connection module. This module separately learns attention in the spatial and channel dimensions of the feature while interconnecting them to produce more focused attention. To keep the decoder lightweight, the MLP-D module is proposed to progressively fuse multi-scale features in which channels are aligned using Multi-Layer Perceptron (MLP) and spatial information is fused by convolutional blocks. We first validated the performance of SHFormer on the ISIC-2018 dataset. Compared to the latest network, SHFormer exhibits comparable performance with 15 times fewer parameters, 30 times lower computational complexity and 5 times higher inference efficiency. To test the generalizability of SHFormer, we introduced the polyp dataset for additional testing. SHFormer achieves comparable segmentation accuracy to the latest network while having lower computational overhead.
26
Jaspers TJM, Boers TGW, Kusters CHJ, Jong MR, Jukema JB, de Groof AJ, Bergman JJ, de With PHN, van der Sommen F. Robustness evaluation of deep neural networks for endoscopic image analysis: Insights and strategies. Med Image Anal 2024; 94:103157. [PMID: 38574544 DOI: 10.1016/j.media.2024.103157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Revised: 03/19/2024] [Accepted: 03/21/2024] [Indexed: 04/06/2024]
Abstract
Computer-aided detection and diagnosis systems (CADe/CADx) in endoscopy are commonly trained using high-quality imagery, which is not representative of the heterogeneous input typically encountered in clinical practice. In endoscopy, image quality heavily relies on both the skills and experience of the endoscopist and the specifications of the system used for screening. Factors such as poor illumination, motion blur, and specific post-processing settings can significantly alter the quality and general appearance of these images. This so-called domain gap between the data used for developing the system and the data it encounters after deployment, and its impact on the performance of deep neural network (DNN)-based endoscopic CAD systems, remains largely unexplored. As many such systems, e.g. for polyp detection, are already being rolled out in clinical practice, this poses severe patient risks, particularly in community hospitals, where both the imaging equipment and operator experience are subject to considerable variation. Therefore, this study aims to evaluate the impact of this domain gap on the clinical performance of CADe/CADx for various endoscopic applications. For this, we leverage two publicly available data sets (KVASIR-SEG and GIANA) and two in-house data sets. We investigate the performance of commonly used DNN architectures under synthetic, clinically calibrated image degradations and on a prospectively collected dataset including 342 endoscopic images of lower subjective quality. Additionally, we assess the influence of DNN architecture and complexity, data augmentation, and pretraining techniques on robustness. The results reveal a considerable decline in performance of 11.6% (±1.5) compared to the reference, within the clinically calibrated boundaries of image degradations. Nevertheless, employing more advanced DNN architectures and self-supervised in-domain pre-training mitigates this drop to 7.7% (±2.03). Additionally, these enhancements yield the highest performance on the manually collected test set including images of lower subjective quality. By comprehensively assessing the robustness of popular DNN architectures and training strategies across multiple datasets, this study provides valuable insights into their performance and limitations for endoscopic applications. The findings highlight the importance of including robustness evaluation when developing DNNs for endoscopy applications and propose strategies to mitigate performance loss.
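Synthetic, clinically inspired degradations of the kind used in such robustness studies can be approximated in a few lines; the blur length, gamma, and noise level below are illustrative, not the paper's calibrated values.

```python
import torch
import torch.nn.functional as F

def degrade(img: torch.Tensor, blur_len: int = 9, gamma: float = 1.8,
            noise_std: float = 0.02) -> torch.Tensor:
    """Apply simple synthetic degradations to a (B, C, H, W) image in [0, 1]:
    horizontal motion blur, under-exposure via a gamma shift, and sensor
    noise. Parameters are assumptions for illustration only."""
    c = img.shape[1]
    # Depthwise horizontal motion-blur kernel (one row of uniform weights).
    kernel = torch.zeros(c, 1, 1, blur_len, device=img.device)
    kernel[:, :, 0, :] = 1.0 / blur_len
    blurred = F.conv2d(img, kernel, padding=(0, blur_len // 2), groups=c)
    dimmed = blurred.clamp(0, 1) ** gamma          # simulate poor illumination
    noisy = dimmed + noise_std * torch.randn_like(dimmed)
    return noisy.clamp(0, 1)

# Usage: eval_batch = degrade(clean_batch); comparing model scores on clean
# versus degraded batches gives a crude domain-gap robustness estimate.
```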
Affiliation(s)
- Tim J M Jaspers: Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Tim G W Boers: Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Carolus H J Kusters: Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Martijn R Jong: Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands
- Jelmer B Jukema: Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands
- Albert J de Groof: Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands
- Jacques J Bergman: Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands
- Peter H N de With: Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Fons van der Sommen: Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands

27
Zhang K, Hu D, Li X, Wang X, Hu X, Wang C, Yang J, Rao N. BFE-Net: bilateral fusion enhanced network for gastrointestinal polyp segmentation. BIOMEDICAL OPTICS EXPRESS 2024; 15:2977-2999. [PMID: 38855696 PMCID: PMC11161362 DOI: 10.1364/boe.522441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2024] [Revised: 03/17/2024] [Accepted: 03/17/2024] [Indexed: 06/11/2024]
Abstract
Accurate segmentation of polyp regions in gastrointestinal endoscopic images is pivotal for diagnosis and treatment. Despite advancements, challenges persist, such as accurately segmenting small polyps and maintaining accuracy when polyps resemble surrounding tissues. Recent studies show the effectiveness of the pyramid vision transformer (PVT) in capturing global context, yet it may lack detailed information. Conversely, U-Net excels in semantic extraction. Hence, we propose the bilateral fusion enhanced network (BFE-Net) to address these challenges. Our model integrates U-Net and PVT features via a deep feature enhancement fusion module (FEF) and an attention decoder module (AD). Experimental results demonstrate significant improvements, validating our model's effectiveness across various datasets and modalities and promising advancements in gastrointestinal polyp diagnosis and treatment.
Affiliation(s)
- Kaixuan Zhang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dingcan Hu: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiang Li: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiaotong Wang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiaoming Hu: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Chunyang Wang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jinlin Yang: Digestive Endoscopic Center of West China Hospital, Sichuan University, Chengdu 610017, China
- Nini Rao: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China

28
Li H, Liu D, Zeng Y, Liu S, Gan T, Rao N, Yang J, Zeng B. Single-Image-Based Deep Learning for Segmentation of Early Esophageal Cancer Lesions. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2024; 33:2676-2688. [PMID: 38530733 DOI: 10.1109/tip.2024.3379902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/28/2024]
Abstract
Accurate segmentation of lesions is crucial for the diagnosis and treatment of early esophageal cancer (EEC). However, neither traditional nor deep learning-based methods to date can meet clinical requirements, with the mean Dice score (the most important metric in medical image analysis) hardly exceeding 0.75. In this paper, we present a novel deep learning approach for segmenting EEC lesions. Our method stands out for its uniqueness, as it relies solely on a single input image from a patient, forming the so-called "You-Only-Have-One" (YOHO) framework. On one hand, this "one-image-one-network" learning ensures complete patient privacy, as it does not use any images from other patients as training data. On the other hand, it avoids nearly all generalization-related problems, since each trained network is applied only to the same input image itself. In particular, we can push the training towards "over-fitting" as much as possible to increase segmentation accuracy. Our technical contributions include an interaction with clinical doctors to utilize their expertise, a geometry-based data augmentation over a single lesion image to generate the training dataset (the biggest novelty), and an edge-enhanced UNet. We have evaluated YOHO over an EEC dataset collected by ourselves and achieved a mean Dice score of 0.888, which is much higher than existing deep-learning methods, thus representing a significant advance toward clinical applications. The code and dataset are available at: https://github.com/lhaippp/YOHO.
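The geometry-based single-image augmentation at the heart of YOHO can be sketched by applying a shared random geometric transform to the image and its mask so the pair stays aligned; the transform ranges and sample count below are assumptions, not the paper's settings.

```python
import torch
import torchvision.transforms as T

# Geometry-only augmentation over one annotated image, in the spirit of
# YOHO's one-image training set. Stacking image and mask before the
# transform keeps them spatially aligned. Ranges are illustrative.
aug = T.Compose([
    T.RandomAffine(degrees=30, translate=(0.1, 0.1), scale=(0.8, 1.2)),
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
])

def make_training_set(image: torch.Tensor, mask: torch.Tensor, n: int = 200):
    """image: (3, H, W) in [0, 1]; mask: (1, H, W) binary float."""
    pairs = []
    for _ in range(n):
        stacked = torch.cat([image, mask], dim=0)   # (4, H, W)
        out = aug(stacked)
        # Re-binarize the mask channel after interpolation.
        pairs.append((out[:3], (out[3:] > 0.5).float()))
    return pairs
```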
29
Li B, Xu Y, Wang Y, Zhang B. DECTNet: Dual Encoder Network combined convolution and Transformer architecture for medical image segmentation. PLoS One 2024; 19:e0301019. [PMID: 38573957 PMCID: PMC10994332 DOI: 10.1371/journal.pone.0301019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Accepted: 03/09/2024] [Indexed: 04/06/2024] Open
Abstract
Automatic and accurate segmentation of medical images plays an essential role in disease diagnosis and treatment planning. Convolutional neural networks have achieved remarkable results in medical image segmentation over the past decade, and deep learning models based on the Transformer architecture have also succeeded tremendously in this domain. However, due to ambiguous boundaries in medical images and the high complexity of anatomical structures, effective structure extraction and accurate segmentation remain open problems. In this paper, we propose a novel Dual Encoder Network named DECTNet to alleviate this problem. Specifically, DECTNet comprises four components: a convolution-based encoder, a Transformer-based encoder, a feature fusion decoder, and a deep supervision module. The convolutional encoder extracts fine spatial contextual details in images, while the Transformer encoder, designed using a hierarchical Swin Transformer architecture, models global contextual information. The novel feature fusion decoder integrates the multi-scale representations from the two encoders and selects features relevant to the segmentation task via a channel attention mechanism. Further, a deep supervision module is used to accelerate the convergence of the proposed method. Extensive experiments demonstrate that, compared to seven other models, the proposed method achieves state-of-the-art results on four segmentation tasks: skin lesion segmentation, polyp segmentation, Covid-19 lesion segmentation, and MRI cardiac segmentation.
Affiliation(s)
- Boliang Li: Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yaming Xu: Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yan Wang: Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Bo Zhang: Sergeant Schools of Army Academy of Armored Forces, Changchun, Jilin, China

30
Goceri E. Polyp Segmentation Using a Hybrid Vision Transformer and a Hybrid Loss Function. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:851-863. [PMID: 38343250 PMCID: PMC11031515 DOI: 10.1007/s10278-023-00954-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Revised: 09/16/2023] [Accepted: 10/02/2023] [Indexed: 04/20/2024]
Abstract
Accurate and early detection of precursor adenomatous polyps and their removal at an early stage can significantly decrease mortality and the occurrence of the disease, since most colorectal cancers evolve from adenomatous polyps. However, accurate detection and segmentation of polyps by doctors is difficult, mainly due to these factors: (i) the quality of polyp screening with colonoscopy depends on the imaging quality and the experience of the doctors; (ii) visual inspection by doctors is time-consuming, burdensome, and tiring; (iii) prolonged visual inspections can lead to polyps being missed even when the physician is experienced. To overcome these problems, computer-aided methods have been proposed; however, they have some disadvantages or limitations. Therefore, in this work, a new architecture based on residual transformer layers has been designed and used for polyp segmentation. The proposed segmentation utilizes both high-level semantic features and low-level spatial features. Also, a novel hybrid loss function is proposed. The loss function, designed with focal Tversky loss, binary cross-entropy, and the Jaccard index, reduces image-wise and pixel-wise differences and improves regional consistency. Experiments have indicated the effectiveness of the proposed approach in terms of Dice similarity (0.9048), recall (0.9041), precision (0.9057), and F2 score (0.8993). Comparisons with state-of-the-art methods have shown its better performance.
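A hybrid loss of this kind (focal Tversky + binary cross-entropy + Jaccard) is straightforward to write down; the equal weighting and the alpha/beta/gamma values below are assumptions, since the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    """Sketch of a hybrid segmentation loss: focal Tversky + BCE + Jaccard.
    logits, target: (B, 1, H, W); target is a binary float mask."""
    prob = torch.sigmoid(logits)
    tp = (prob * target).sum(dim=(1, 2, 3))
    fp = (prob * (1 - target)).sum(dim=(1, 2, 3))
    fn = ((1 - prob) * target).sum(dim=(1, 2, 3))
    # Tversky index weighs false negatives (alpha) vs false positives (beta).
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    focal_tversky = ((1 - tversky) ** gamma).mean()
    bce = F.binary_cross_entropy_with_logits(logits, target)
    # Soft Jaccard (IoU) term.
    union = (prob + target - prob * target).sum(dim=(1, 2, 3))
    jaccard = 1 - ((tp + eps) / (union + eps)).mean()
    return focal_tversky + bce + jaccard
```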
31
Li F, Huang Z, Zhou L, Chen Y, Tang S, Ding P, Peng H, Chu Y. Improved dual-aggregation polyp segmentation network combining a pyramid vision transformer with a fully convolutional network. BIOMEDICAL OPTICS EXPRESS 2024; 15:2590-2621. [PMID: 38633077 PMCID: PMC11019695 DOI: 10.1364/boe.510908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Revised: 02/26/2024] [Accepted: 03/08/2024] [Indexed: 04/19/2024]
Abstract
Automatic and precise polyp segmentation in colonoscopy images is highly valuable for early diagnosis and surgery of colorectal cancer. Nevertheless, it still poses a major challenge due to variations in polyp size, intricate morphological characteristics, and the indistinct demarcation between polyps and mucosa. To alleviate these challenges, we propose an improved dual-aggregation polyp segmentation network, dubbed Dua-PSNet, for automatic and accurate full-size polyp prediction, combining a transformer branch and a fully convolutional network (FCN) branch in parallel. Concretely, in the transformer branch, we adopt the B3 variant of pyramid vision transformer v2 (PVTv2-B3) as an image encoder for capturing multi-scale global features and modeling long-distance interdependencies between them, while designing an innovative multi-stage feature aggregation decoder (MFAD) to highlight critical local feature details and effectively integrate them into the global features. In the decoder, an adaptive feature aggregation (AFA) block is constructed to fuse high-level feature representations of different scales generated by the PVTv2-B3 encoder in a stepwise adaptive manner, refining global semantic information, while a ResidualBlock module is devised to mine detailed boundary cues hidden in low-level features. With the assistance of a selective global-to-local fusion head (SGLFH) module, the resulting boundary details are selectively aggregated with these global semantic features, strengthening the hierarchical features to cope with scale variations of polyps. The FCN branch, embedded with the designed ResidualBlock module, is used to encourage extraction of highly merged fine features to match the outputs of the transformer branch into full-size segmentation maps. In this way, both branches reciprocally influence and complement each other, enhancing the discrimination capability of polyp features and enabling more accurate prediction of full-size segmentation maps. Extensive experiments on five challenging polyp segmentation benchmarks demonstrate that the proposed Dua-PSNet has powerful learning and generalization ability and advances state-of-the-art segmentation performance among existing cutting-edge methods. These results show that Dua-PSNet has great potential as a practical solution for polyp segmentation tasks in which wide variations of data typically occur.
Affiliation(s)
- Feng Li: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Zetao Huang: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Lu Zhou: Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China
- Yuyang Chen: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Shiqing Tang: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Pengchao Ding: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Haixia Peng: Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China
- Yimin Chu: Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China

32
Du H, Wang J, Liu M, Wang Y, Meijering E. SwinPA-Net: Swin Transformer-Based Multiscale Feature Pyramid Aggregation Network for Medical Image Segmentation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:5355-5366. [PMID: 36121961 DOI: 10.1109/tnnls.2022.3204090] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
The precise segmentation of medical images is one of the key challenges in pathology research and clinical practice. However, many medical image segmentation tasks face problems such as large differences between different types of lesions and similarity in shape and color between lesions and surrounding tissues, which seriously limit improvements in segmentation accuracy. In this article, a novel method called Swin Pyramid Aggregation network (SwinPA-Net) is proposed by combining two designed modules with the Swin Transformer to learn more powerful and robust features. The two modules, a dense multiplicative connection (DMC) module and a local pyramid attention (LPA) module, aggregate the multiscale context information of medical images. The DMC module cascades multiscale semantic feature information through dense multiplicative feature fusion, which minimizes the interference of shallow background noise to improve feature expression and addresses the problem of excessive variation in lesion size and type. The LPA module guides the network to focus on the region of interest by merging global attention and local attention, which helps address the similarity between lesions and surrounding tissues. The proposed network is evaluated on two public benchmark datasets for the polyp segmentation task and the skin lesion segmentation task, as well as a clinical private dataset for the laparoscopic image segmentation task. Compared with existing state-of-the-art (SOTA) methods, SwinPA-Net achieves the most advanced performance, outperforming the second-best method on the mean Dice score by 1.68%, 0.8%, and 1.2% on the three tasks, respectively.
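The multiplicative fusion idea behind the DMC module, where deeper semantic maps gate shallower ones so that near-zero background activations suppress shallow noise, can be sketched as follows; the channel counts and the assumption of three feature levels are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMultiplicativeFusion(nn.Module):
    """Sketch of multiplicative multi-scale fusion in the spirit of DMC:
    deeper maps are upsampled and multiplied into shallower ones.
    Assumes three feature levels with a shared channel count."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(channels, channels, 1) for _ in range(3))

    def forward(self, feats):
        # feats: list of 3 tensors (B, C, H_i, W_i), deepest (smallest) first.
        fused = self.proj[0](feats[0])
        for proj, f in zip(self.proj[1:], feats[1:]):
            up = F.interpolate(fused, size=f.shape[-2:], mode="bilinear",
                               align_corners=False)
            fused = proj(f) * up   # multiplicative gating by deeper semantics
        return fused
```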
33
Yin X, Zeng J, Hou T, Tang C, Gan C, Jain DK, García S. RSAFormer: A method of polyp segmentation with region self-attention transformer. Comput Biol Med 2024; 172:108268. [PMID: 38493598 DOI: 10.1016/j.compbiomed.2024.108268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 03/07/2024] [Accepted: 03/07/2024] [Indexed: 03/19/2024]
Abstract
Colonoscopy is of great importance for early screening and clinical diagnosis of colon cancer, yet fine segmentation of polyps remains a challenging task. Existing state-of-the-art models still have limited segmentation ability because the boundaries between normal tissue and polyps are unclear and the two are highly similar. To deal with this problem, we propose a region self-attention enhancement network (RSAFormer) with a transformer encoder to capture more robust features. Unlike other methods, RSAFormer uniquely employs a dual decoder structure to generate various feature maps; contrasted with traditional methods that typically employ a single decoder, this offers more flexibility and detail in feature extraction. RSAFormer also introduces a region self-attention enhancement module (RSA) to acquire more accurate feature information and foster a stronger interplay between low-level and high-level features. This module enhances uncertain areas, signified by regional context, to extract more precise boundary information. Extensive experiments were conducted on five prevalent polyp datasets to demonstrate RSAFormer's proficiency. It achieves 92.2% and 83.5% mean Dice on Kvasir and ETIS, respectively, outperforming most state-of-the-art models.
Affiliation(s)
- Xuehui Yin: School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Jun Zeng: School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Tianxiao Hou: School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Chao Tang: School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Chenquan Gan: School of Cyber Security and Information Law, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Deepak Kumar Jain: Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian 116024, China; Symbiosis Institute of Technology, Symbiosis International University, Pune 412115, India
- Salvador García: Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, University of Granada, Granada 18071, Spain

34
Yang C, Zhang Z. PFD-Net: Pyramid Fourier Deformable Network for medical image segmentation. Comput Biol Med 2024; 172:108302. [PMID: 38503092 DOI: 10.1016/j.compbiomed.2024.108302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Revised: 02/26/2024] [Accepted: 03/12/2024] [Indexed: 03/21/2024]
Abstract
Medical image segmentation is crucial for accurately locating lesion regions and assisting doctors in diagnosis. However, most existing methods fail to effectively utilize both local details and global semantic information, so they cannot capture fine-grained content such as small targets and irregular boundaries. To address this issue, we propose a novel Pyramid Fourier Deformable Network (PFD-Net) for medical image segmentation, which leverages the strengths of CNNs and Transformers. PFD-Net first utilizes a PVTv2-based Transformer as the primary encoder to capture global information and further enhances both local and global feature representations with the Fast Fourier Convolution Residual (FFCR) module. Moreover, PFD-Net proposes the Dilated Deformable Refinement (DDR) module to enhance the model's capacity to comprehend the global semantic structures of shape-diverse targets and their irregular boundaries. Lastly, a Cross-Level Fusion Block with deformable convolution (CLFB) is proposed to combine the decoded feature maps from the final DDR module with local features from the CNN auxiliary encoder branch, improving the network's ability to perceive targets resembling the surrounding structures. Extensive experiments were conducted on nine public medical image datasets covering five types of segmentation tasks: polyp, abdominal, cardiac, gland cell, and nucleus segmentation. The qualitative and quantitative results demonstrate that PFD-Net outperforms existing state-of-the-art methods across evaluation metrics, achieving the highest mDice of 0.826 on the most challenging dataset (ETIS), a 1.8% improvement over the previous best-performing HSNet and a 3.6% improvement over the next-best PVT-CASCADE. Codes are available at https://github.com/ChaorongYang/PFD-Net.
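A Fourier-convolution residual block of the kind FFCR builds on can be sketched by convolving in the spectrum, which gives every output position a receptive field over the whole image; this follows the generic fast Fourier convolution recipe and only approximates the paper's module.

```python
import torch
import torch.nn as nn

class SpectralResidualBlock(nn.Module):
    """Sketch of a Fourier-convolution residual block: 1x1 convolutions are
    applied to the real FFT of the feature map (real and imaginary parts
    stacked as channels), then the result is transformed back and added
    residually. Illustrative approximation of an FFCR-style module."""
    def __init__(self, dim: int):
        super().__init__()
        self.spectral = nn.Sequential(
            nn.Conv2d(2 * dim, 2 * dim, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * dim, 2 * dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")            # (B, C, H, W//2+1)
        z = torch.cat([spec.real, spec.imag], dim=1)       # (B, 2C, H, W//2+1)
        z = self.spectral(z)
        real, imag = z.chunk(2, dim=1)
        out = torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")
        return x + out                                     # residual connection
```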
Affiliation(s)
- Chaorong Yang: College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China; Hebei Provincial Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang 050024, China
- Zhaohui Zhang: College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China; Hebei Provincial Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang 050024, China

35
Zhang Y, Yang G, Gong C, Zhang J, Wang S, Wang Y. Polyp segmentation with interference filtering and dynamic uncertainty mining. Phys Med Biol 2024; 69:075016. [PMID: 38382099 DOI: 10.1088/1361-6560/ad2b94] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Accepted: 02/21/2024] [Indexed: 02/23/2024]
Abstract
Objective: Accurate polyp segmentation from colonoscopy images plays a crucial role in the early diagnosis and treatment of colorectal cancer. However, existing polyp segmentation methods are inevitably affected by various image noises, such as reflections, motion blur, and feces, which significantly affect the performance and generalization of the model. In addition, coupled with ambiguous boundaries between polyps and surrounding tissue, i.e. small inter-class differences, accurate polyp segmentation remains a challenging problem.
Approach: To address these issues, we propose a novel two-stage polyp segmentation method that leverages a preprocessing sub-network (Pre-Net) and a dynamic uncertainty mining network (DUMNet) to improve the accuracy of polyp segmentation. Pre-Net identifies and filters out interference regions before feeding the colonoscopy images to the polyp segmentation network DUMNet. Considering the confusing polyp boundaries, DUMNet employs the uncertainty mining module (UMM) to dynamically focus on foreground, background, and uncertain regions based on different pixel confidences. UMM helps to mine and enhance more detailed context, leading to coarse-to-fine polyp segmentation and precise localization of polyp regions.
Main results: We conduct experiments on five popular polyp segmentation benchmarks: ETIS, CVC-ClinicDB, CVC-ColonDB, EndoScene, and Kvasir. Our method achieves state-of-the-art performance. Furthermore, the proposed Pre-Net has strong portability and can improve the accuracy of existing polyp segmentation models.
Significance: The proposed method improves polyp segmentation performance by eliminating interference and mining uncertain regions. This aids doctors in making precise diagnoses and reduces the risk of colorectal cancer. Our code will be released at https://github.com/zyh5119232/DUMNet.
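The uncertainty-mining step can be illustrated by partitioning a probability map into foreground, background, and uncertain regions by per-pixel confidence; the fixed thresholds below are a simplification of the paper's dynamic behavior.

```python
import torch

def partition_regions(prob: torch.Tensor, lo: float = 0.3, hi: float = 0.7):
    """Split a sigmoid probability map (B, 1, H, W) into foreground,
    background, and uncertain masks. Thresholds are illustrative; the
    paper adjusts its focus dynamically during training."""
    fg = (prob >= hi).float()
    bg = (prob <= lo).float()
    uncertain = 1.0 - fg - bg          # pixels of ambiguous confidence
    return fg, bg, uncertain

# A refinement stage could then up-weight the uncertain band, e.g.:
#   loss = (per_pixel_loss * (1.0 + 2.0 * uncertain)).mean()
```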
Affiliation(s)
- Yunhua Zhang: Northeastern University, Shenyang 110819, People's Republic of China; DUT Artificial Intelligence Institute, Dalian 116024, People's Republic of China
- Gang Yang: Northeastern University, Shenyang 110819, People's Republic of China
- Congjin Gong: Northeastern University, Shenyang 110819, People's Republic of China
- Jianhao Zhang: Northeastern University, Shenyang 110819, People's Republic of China
- Shuo Wang: Northeastern University, Shenyang 110819, People's Republic of China
- Yutao Wang: Northeastern University, Shenyang 110819, People's Republic of China

36
Xu C, Fan K, Mo W, Cao X, Jiao K. Dual ensemble system for polyp segmentation with submodels adaptive selection ensemble. Sci Rep 2024; 14:6152. [PMID: 38485963 PMCID: PMC10940608 DOI: 10.1038/s41598-024-56264-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Accepted: 03/04/2024] [Indexed: 03/18/2024] Open
Abstract
Colonoscopy is one of the main methods to detect colon polyps, and its detection is widely used to prevent and diagnose colon cancer. With the rapid development of computer vision, deep learning-based semantic segmentation methods for colon polyps have been widely researched. However, the accuracy and stability of some methods in colon polyp segmentation tasks leave room for further improvement. In addition, the issue of selecting appropriate sub-models in ensemble learning for the colon polyp segmentation task still needs to be explored. To solve the above problems, we first exploit multiple complementary high-level semantic features through the Multi-Head Control Ensemble. Then, to solve the sub-model selection problem in training, we propose the SDBH-PSO Ensemble for sub-model selection and optimization of ensemble weights for different datasets. The experiments were conducted on the public datasets CVC-ClinicDB, Kvasir, CVC-ColonDB, ETIS-LaribPolypDB and PolypGen. The results show that the DET-Former, constructed from the Multi-Head Control Ensemble and the SDBH-PSO Ensemble, consistently provides improved accuracy across different datasets. Among them, the Multi-Head Control Ensemble demonstrated superior feature fusion capability, and the SDBH-PSO Ensemble demonstrated excellent sub-model selection capability. The sub-model selection capabilities of the SDBH-PSO Ensemble will continue to have significant reference value and practical utility as deep learning networks evolve.
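Once ensemble weights are chosen, the final fusion of sub-model outputs reduces to a weighted average of probability maps; in the paper the weights come from PSO-based optimization, while the sketch below simply takes them as given.

```python
import torch

def weighted_ensemble(prob_maps, weights):
    """Fuse sub-model probability maps (each (B, 1, H, W), already
    sigmoid-activated) with normalized ensemble weights. Illustrative;
    the paper optimizes the weights per dataset with PSO."""
    w = torch.tensor(weights, dtype=torch.float32)
    w = w / w.sum()                      # normalize so output stays in [0, 1]
    stacked = torch.stack(prob_maps)     # (N, B, 1, H, W)
    return (w.view(-1, 1, 1, 1, 1) * stacked).sum(dim=0)
```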
Affiliation(s)
- Cun Xu: Guilin University of Electronic Technology, Guilin, 541000, China
- Kefeng Fan: China Electronics Standardization Institute, Beijing, 100007, China
- Wei Mo: Guilin University of Electronic Technology, Guilin, 541000, China
- Xuguang Cao: Guilin University of Electronic Technology, Guilin, 541000, China
- Kaijie Jiao: Guilin University of Electronic Technology, Guilin, 541000, China

37
Zhang Y, Shen Z, Jiao R. Segment anything model for medical image segmentation: Current applications and future directions. Comput Biol Med 2024; 171:108238. [PMID: 38422961 DOI: 10.1016/j.compbiomed.2024.108238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2023] [Revised: 02/06/2024] [Accepted: 02/25/2024] [Indexed: 03/02/2024]
Abstract
Due to the inherent flexibility of prompting, foundation models have emerged as the predominant force in the fields of natural language processing and computer vision. The recent introduction of the Segment Anything Model (SAM) signifies a noteworthy expansion of the prompt-driven paradigm into the domain of image segmentation, thereby introducing a plethora of previously unexplored capabilities. However, the viability of its application to medical image segmentation remains uncertain, given the substantial distinctions between natural and medical images. In this work, we provide a comprehensive overview of recent endeavors aimed at extending the efficacy of SAM to medical image segmentation tasks, encompassing both empirical benchmarking and methodological adaptations. Additionally, we explore potential avenues for future research directions in SAM's role within medical image segmentation. While direct application of SAM to medical image segmentation does not yield satisfactory performance on multi-modal and multi-target medical datasets so far, numerous insights gleaned from these efforts serve as valuable guidance for shaping the trajectory of foundational models in the realm of medical image analysis. To support ongoing research endeavors, we maintain an active repository that contains an up-to-date paper list and a succinct summary of open-source projects at https://github.com/YichiZhang98/SAM4MIS.
Affiliation(s)
- Yichi Zhang: School of Data Science, Fudan University, Shanghai, China
- Zhenrong Shen: School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Rushi Jiao: School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China

38
Wang H, Hu T, Zhang Y, Zhang H, Qi Y, Wang L, Ma J, Du M. Unveiling camouflaged and partially occluded colorectal polyps: Introducing CPSNet for accurate colon polyp segmentation. Comput Biol Med 2024; 171:108186. [PMID: 38394804 DOI: 10.1016/j.compbiomed.2024.108186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2023] [Revised: 02/02/2024] [Accepted: 02/18/2024] [Indexed: 02/25/2024]
Abstract
BACKGROUND: Segmenting colorectal polyps presents a significant challenge due to the diverse variations in their size, shape, texture, and intricate backgrounds. Particularly demanding are the so-called "camouflaged" polyps, which are partially concealed by surrounding tissues or fluids, adding complexity to their detection.
METHODS: We present CPSNet, an innovative model designed for camouflaged polyp segmentation. CPSNet incorporates three key modules: the Deep Multi-Scale-Feature Fusion Module, the Camouflaged Object Detection Module, and the Multi-Scale Feature Enhancement Module. These modules work collaboratively to improve the segmentation process, enhancing both robustness and accuracy.
RESULTS: Our experiments confirm the effectiveness of CPSNet. When compared to state-of-the-art methods in colon polyp segmentation, CPSNet consistently outperforms the competition. Particularly noteworthy is its performance on the ETIS-LaribPolypDB dataset, where CPSNet achieved a remarkable 2.3% increase in the Dice coefficient compared to the Polyp-PVT model.
CONCLUSION: CPSNet marks a significant advancement in the field of colorectal polyp segmentation. Its innovative approach, encompassing multi-scale feature fusion, camouflaged object detection, and feature enhancement, holds considerable promise for clinical applications.
Affiliation(s)
- Huafeng Wang: School of Information Technology, North China University of Technology, Beijing 100041, China
- Tianyu Hu: School of Information Technology, North China University of Technology, Beijing 100041, China
- Yanan Zhang: School of Information Technology, North China University of Technology, Beijing 100041, China
- Haodu Zhang: School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510335, China
- Yong Qi: School of Information Technology, North China University of Technology, Beijing 100041, China
- Longzhen Wang: Department of Gastroenterology, Second People's Hospital, Changzhi, Shanxi 046000, China
- Jianhua Ma: School of Biomedical Engineering, Southern Medical University, Guangzhou 510335, China
- Minghua Du: Department of Emergency, PLA General Hospital, Beijing 100853, China

39
Mozaffari J, Amirkhani A, Shokouhi SB. ColonGen: an efficient polyp segmentation system for generalization improvement using a new comprehensive dataset. Phys Eng Sci Med 2024; 47:309-325. [PMID: 38224384 DOI: 10.1007/s13246-023-01368-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 12/06/2023] [Indexed: 01/16/2024]
Abstract
Colorectal cancer (CRC) is one of the most common causes of cancer-related deaths. While polyp detection is important for diagnosing CRC, high miss rates for polyps have been reported during colonoscopy. Most deep learning methods extract features from images using convolutional neural networks (CNNs). In recent years, vision transformer (ViT) models have been employed for image processing and have been successful in image segmentation. Image processing can be improved by combining transformer models, which extract spatial location information, with CNNs, which aggregate local information. Despite this, recent research shows limited effectiveness in increasing data diversity and generalization accuracy. This paper investigates the generalization proficiency of polyp image segmentation based on transformer architectures and proposes a novel approach using two different ViT architectures. This allows the model to learn representations from different perspectives, which can then be combined to create a richer feature representation. Additionally, a more universal and comprehensive dataset has been derived from the datasets presented in related research, which can be used to improve generalization. We first evaluated the generalization of our proposed model using three distinct training-testing scenarios. Our experimental results demonstrate that our ColonGen-V1 outperforms other state-of-the-art methods in all scenarios. As a next step, we used the comprehensive dataset to improve the performance of the model on in- and out-of-domain data. The results show that our ColonGen-V2 outperforms state-of-the-art studies by 5.1%, 1.3%, and 1.1% on the ETIS-Larib, Kvasir-Seg, and CVC-ColonDB datasets, respectively. The inclusive dataset and the model introduced in this paper are available to the public at https://github.com/javadmozaffari/Polyp_segmentation.
Affiliation(s)
- Javad Mozaffari: School of Electrical Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
- Abdollah Amirkhani: School of Automotive Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
- Shahriar B Shokouhi: School of Electrical Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran

40
Shao D, Yang H, Liu C, Ma L. AFANet: Adaptive feature aggregation for polyp segmentation. Med Eng Phys 2024; 125:104118. [PMID: 38508807 DOI: 10.1016/j.medengphy.2024.104118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2023] [Revised: 01/15/2024] [Accepted: 02/13/2024] [Indexed: 03/22/2024]
Abstract
Deep learning-based polyp segmentation methods are superior in terms of both speed and accuracy. They are essential for the early detection and treatment of colorectal cancer and have the potential to greatly reduce the disease's overall prevalence. However, due to the various forms and sizes of polyps, as well as the blurring of boundaries between the polyp region and the surrounding mucus, most existing algorithms are unable to provide highly accurate colorectal polyp segmentation. To overcome these obstacles, we propose an adaptive feature aggregation network (AFANet). It contains two main modules: the Multi-modal Balancing Attention Module (MMBA) and the Global Context Module (GCM). The MMBA extracts improved local characteristics for inference by integrating local contextual information while attending to three regions: foreground, background, and boundary. The GCM takes global information from the top of the encoder and passes it to the decoder layers to further exploit global contextual feature information in the pathological image. Comprehensive experimental validation on two benchmark datasets, Kvasir-SEG and CVC-ClinicDB, achieves Dice scores of 92.11% and 94.76% and mIoU of 91.07% and 94.54%, respectively. The experimental results demonstrate that the strategy outperforms other cutting-edge approaches.
Collapse
Affiliation(s)
- Dangguo Shao
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
| | - Haiqiong Yang
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
| | - Cuiyin Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China.
| | - Lei Ma
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
| |
Collapse
|
41
|
He Y, Yi Y, Zheng C, Kong J. BGF-Net: Boundary guided filter network for medical image segmentation. Comput Biol Med 2024; 171:108184. [PMID: 38417386 DOI: 10.1016/j.compbiomed.2024.108184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2023] [Revised: 02/12/2024] [Accepted: 02/18/2024] [Indexed: 03/01/2024]
Abstract
Fusing low-level and high-level features effectively is crucial to improving the accuracy of medical image segmentation. Most CNN-based segmentation models adopt attention mechanisms to fuse features from different levels, but they have not effectively utilized the guidance information in high-level features, which is often highly beneficial to segmentation performance, to steer the extraction of low-level features. To address this problem, we design multiple guided modules and develop a boundary-guided filter network (BGF-Net) for more accurate medical image segmentation. To the best of our knowledge, this is the first time boundary-guided information has been introduced into the medical image segmentation task. Specifically, we first propose a simple yet effective channel boundary guided module that makes the segmentation model pay more attention to the relevant channel weights. We further design a novel spatial boundary guided module that complements the channel boundary guided module and is aware of the most important spatial positions. Finally, we propose a boundary guided filter to preserve the structural information from the previous feature map and guide the model to learn more important feature information. Moreover, we conduct extensive experiments on skin lesion, polyp, and gland segmentation datasets, including ISIC 2016, CVC-EndoSceneStill, and GlaS, to test the proposed BGF-Net. The experimental results demonstrate that BGF-Net outperforms other state-of-the-art methods.
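The following sketch shows how a boundary map might steer both channel and spatial attention, in the spirit of the two guided modules described above; the concrete wiring and the module name are assumptions, not the published BGF-Net blocks:

```python
import torch
import torch.nn as nn

class BoundaryGuidedAttention(nn.Module):
    """Channel weights come from boundary-masked pooling; spatial weights come
    from the boundary map joined with the features (illustrative design)."""
    def __init__(self, c, r=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(inplace=True),
                                 nn.Linear(c // r, c), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(c + 1, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, feat, boundary):
        # channel guidance: pool features where the boundary map is active
        w = (feat * boundary).mean(dim=(2, 3))        # (B, C)
        feat = feat * self.mlp(w)[:, :, None, None]
        # spatial guidance: boundary map helps pick the most important positions
        s = self.spatial(torch.cat([feat, boundary], dim=1))
        return feat * s

bga = BoundaryGuidedAttention(64)
y = bga(torch.randn(2, 64, 56, 56), torch.rand(2, 1, 56, 56))  # -> (2, 64, 56, 56)
```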
Collapse
Affiliation(s)
- Yanlin He
- College of Information Sciences and Technology, Northeast Normal University, Changchun, 130117, China
| | - Yugen Yi
- School of Software, Jiangxi Normal University, Nanchang, 330022, China
| | - Caixia Zheng
- College of Information Sciences and Technology, Northeast Normal University, Changchun, 130117, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China.
| | - Jun Kong
- Institute for Intelligent Elderly Care, Changchun Humanities and Sciences College, Changchun, 130117, China.
| |
Collapse
|
42
|
Soo JMP, Koh FHX. Detection of sessile serrated adenoma using artificial intelligence-enhanced endoscopy: an Asian perspective. ANZ J Surg 2024; 94:362-365. [PMID: 38149749 DOI: 10.1111/ans.18785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Accepted: 11/04/2023] [Indexed: 12/28/2023]
Abstract
BACKGROUND As the serrated pathway has gained prominence as an alternative colorectal carcinogenesis pathway, sessile serrated adenomas or polyps (SSA/P) have been highlighted as lesions to rule out during colonoscopy. However, these lesions are morphologically difficult to detect on endoscopy and can be mistaken for hyperplastic polyps due to similar endoscopic features. Given their rapid progression and potential for malignant transformation, interval cancer is a likely consequence of undetected or overlooked SSA/P. Real-time artificial intelligence (AI)-assisted colonoscopy via a computer-assisted detection (CADe) system is an increasingly useful tool for improving the adenoma detection rate by providing a second eye during the procedure. In this article, we provide a video-based guide illustrating the detection of SSA/P during AI-assisted colonoscopy. METHODS Consultant-grade endoscopists used a real-time AI-assisted colonoscopy device, as part of a larger prospective study, to detect suspicious lesions, which were later histopathologically confirmed to be SSA/P. RESULTS All lesions were picked up by the CADe, which highlighted suspicious polyps to the clinician with a real-time green box. Three SSA/P of varying morphology are described with reference to classical SSA/P features and compared with the features of the hyperplastic polyp found in our study. All three SSA/P observed are consistent with the JNET classification (Type 1). CONCLUSION CADe is a highly useful aid to clinicians in detecting SSA/P during endoscopy, but for effective detection it must be complemented by good endoscopic skill and bowel preparation, together with biopsy and subsequent accurate histological diagnosis.
Collapse
Affiliation(s)
- Joycelyn Mun-Peng Soo
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Frederick Hong-Xiang Koh
- Colorectal Service, Department of General Surgery, Sengkang General Hospital, SingHealth Services, Singapore, Singapore
| |
Collapse
|
43
|
Jia X, Shen Y, Yang J, Song R, Zhang W, Meng MQH, Liao JC, Xing L. PolypMixNet: Enhancing semi-supervised polyp segmentation with polyp-aware augmentation. Comput Biol Med 2024; 170:108006. [PMID: 38325216 DOI: 10.1016/j.compbiomed.2024.108006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 12/29/2023] [Accepted: 01/13/2024] [Indexed: 02/09/2024]
Abstract
BACKGROUND AI-assisted polyp segmentation in colonoscopy plays a crucial role in enabling prompt diagnosis and treatment of colorectal cancer. However, the lack of sufficient annotated data poses a significant challenge for supervised learning approaches. Existing semi-supervised learning methods also suffer from performance degradation, mainly due to task-specific characteristics such as class imbalance in polyp segmentation. PURPOSE To develop an effective semi-supervised learning framework for accurate polyp segmentation in colonoscopy that addresses the challenges of limited annotated data and class imbalance. METHODS We propose PolypMixNet, a semi-supervised framework for colorectal polyp segmentation that utilizes novel augmentation techniques and a Mean Teacher architecture to improve model performance. PolypMixNet introduces the polyp-aware mixup (PolypMix) algorithm and incorporates dual-level consistency regularization. PolypMix addresses the class imbalance in colonoscopy datasets and enhances the diversity of the training data. By performing a polyp-aware mixup on unlabeled samples, it generates mixed images with polyp context along with their artificial labels. A polyp-directed soft pseudo-labeling (PDSPL) mechanism is proposed to generate high-quality pseudo labels and eliminate the dilution of lesion features caused by mixup operations. To ensure consistency during training, we introduce the PolypMix prediction consistency (PMPC) loss and the PolypMix attention consistency (PMAC) loss, enforcing consistency at both the image and feature levels. Code is available at https://github.com/YChienHung/PolypMix. RESULTS PolypMixNet was evaluated on four public colonoscopy datasets, achieving 88.97% Dice and 88.85% mIoU on the benchmark Kvasir-SEG dataset. In scenarios where the labeled training data is limited to 15%, PolypMixNet outperforms state-of-the-art semi-supervised approaches with a 2.88-point improvement in Dice, and it reaches performance comparable to its fully supervised counterpart. Additionally, extensive ablation studies validate the effectiveness of each module and highlight the superiority of our proposed approach. CONCLUSION PolypMixNet effectively addresses the challenges posed by limited annotated data and unbalanced class distributions in polyp segmentation. By leveraging unlabeled data and incorporating novel augmentation and consistency regularization techniques, our method achieves state-of-the-art performance. We believe the insights and contributions presented in this work will pave the way for further advancements in semi-supervised polyp segmentation and inspire future research in the medical imaging domain.
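As an illustration of what a polyp-aware mixup can look like, the sketch below blends an unlabeled frame with a labeled one while letting the labeled image dominate inside the polyp mask, so lesion pixels are not diluted. This is my own simplified reading of the idea, not the published PolypMix algorithm:

```python
import torch

def polyp_aware_mixup(x_unlab, x_lab, mask_lab, lam=0.5):
    """x_*: (B,3,H,W) images; mask_lab: (B,1,H,W) binary polyp mask; lam: mixing weight.
    Background pixels mix normally; inside the polyp mask the labeled image wins."""
    mix_w = lam * (1 - mask_lab) + mask_lab           # weight given to the labeled image
    x_mix = mix_w * x_lab + (1 - mix_w) * x_unlab
    y_mix = mask_lab.clone()                          # artificial label: polyp region kept intact
    return x_mix, y_mix

x_mix, y_mix = polyp_aware_mixup(torch.rand(2, 3, 224, 224),
                                 torch.rand(2, 3, 224, 224),
                                 (torch.rand(2, 1, 224, 224) > 0.9).float())
```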
Collapse
Affiliation(s)
- Xiao Jia
- School of Control Science and Engineering, Shandong University, Jinan, China.
| | - Yutian Shen
- Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China.
| | - Jianhong Yang
- School of Control Science and Engineering, Shandong University, Jinan, China.
| | - Ran Song
- School of Control Science and Engineering, Shandong University, Jinan, China.
| | - Wei Zhang
- School of Control Science and Engineering, Shandong University, Jinan, China.
| | - Max Q-H Meng
- Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China.
| | - Joseph C Liao
- Department of Urology, Stanford University, Stanford, 94305, CA, USA; VA Palo Alto Health Care System, Palo Alto, 94304, CA, USA.
| | - Lei Xing
- Department of Radiation Oncology, Stanford University, Stanford, 94305, CA, USA.
| |
Collapse
|
44
|
Wang Z, Yu L, Tian S, Huo X. CRMEFNet: A coupled refinement, multiscale exploration and fusion network for medical image segmentation. Comput Biol Med 2024; 171:108202. [PMID: 38402839 DOI: 10.1016/j.compbiomed.2024.108202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2023] [Revised: 12/22/2023] [Accepted: 02/18/2024] [Indexed: 02/27/2024]
Abstract
Accurate segmentation of target areas in medical images, such as lesions, is essential for disease diagnosis and clinical analysis. In recent years, deep learning methods have been intensively researched and have generated significant progress in medical image segmentation tasks. However, most existing methods have limitations in modeling multilevel feature representations and in identifying complex textured pixels at contrasting boundaries. This paper proposes a novel coupled refinement and multiscale exploration and fusion network (CRMEFNet) for medical image segmentation, which optimizes and fuses multiscale features to address the abovementioned limitations. CRMEFNet consists of three main innovations: a coupled refinement module (CRM), a multiscale exploration and fusion module (MEFM), and a cascaded progressive decoder (CPD). The CRM decouples features into low-frequency body features and high-frequency edge features, and performs targeted optimization of both to enhance the intraclass uniformity and interclass differentiation of features. The MEFM performs a two-stage exploration and fusion of multiscale features using our proposed multiscale aggregation attention mechanism, which explores the differentiated information within cross-level features and enhances the contextual connections between them to achieve adaptive feature fusion. Compared to existing complex decoders, the CPD decoder (consisting of the CRM and MEFM) can perform fine-grained pixel recognition while retaining complete semantic location information, and it has a simple design with excellent performance. Experimental results from five medical image segmentation tasks, ten datasets, and twelve comparison models demonstrate the state-of-the-art performance, interpretability, flexibility, and versatility of our CRMEFNet.
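A minimal sketch of the body/edge decoupling idea behind the CRM, under assumed details (the blur-based frequency split and the refinement convolutions are illustrative choices, not the paper's exact operators):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledRefinement(nn.Module):
    """Splits a feature map into a smoothed low-frequency 'body' stream and a
    residual high-frequency 'edge' stream, refining each separately."""
    def __init__(self, c):
        super().__init__()
        self.body_ref = nn.Conv2d(c, c, 3, padding=1)
        self.edge_ref = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, feat):
        # low-frequency body: heavy blur via downsample + upsample
        body = F.interpolate(F.avg_pool2d(feat, 4), size=feat.shape[2:],
                             mode="bilinear", align_corners=False)
        edge = feat - body                            # high-frequency residual
        return self.body_ref(body) + self.edge_ref(edge)

out = CoupledRefinement(64)(torch.randn(2, 64, 56, 56))  # -> (2, 64, 56, 56)
```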
Collapse
Affiliation(s)
- Zhi Wang
- College of Software, Xinjiang University, Urumqi, 830000, China; Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China
| | - Long Yu
- College of Network Center, Xinjiang University, Urumqi, 830000, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, 830000, China.
| | - Shengwei Tian
- College of Software, Xinjiang University, Urumqi, 830000, China; Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China
| | - Xiangzuo Huo
- Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, 830000, China
| |
Collapse
|
45
|
Zhu PC, Wan JJ, Shao W, Meng XC, Chen BL. Colorectal image analysis for polyp diagnosis. Front Comput Neurosci 2024; 18:1356447. [PMID: 38404511 PMCID: PMC10884282 DOI: 10.3389/fncom.2024.1356447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Accepted: 01/05/2024] [Indexed: 02/27/2024] Open
Abstract
Colorectal polyps are an important early manifestation of colorectal cancer, and their detection is significant for its prevention. Although timely detection and removal of colorectal polyps can reduce their chance of becoming cancerous, most existing methods ignore the uncertainties and localization problems of polyps, degrading detection performance. To address these problems, in this paper we propose a novel colorectal image analysis method for polyp diagnosis via PAM-Net. Specifically, a parallel attention module is designed to enhance the analysis of colorectal polyp images and reduce uncertainty in polyp identification. In addition, our method introduces the GWD loss to enhance the accuracy of polyp diagnosis from the perspective of polyp localization. Extensive experimental results demonstrate the effectiveness of the proposed method compared with SOTA baselines. This study improves polyp detection accuracy and contributes to polyp detection in clinical medicine.
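The sketch below shows one plausible reading of a "parallel attention" block, with channel and spatial attention running side by side and merged by addition; the exact PAM-Net wiring is not specified here, so treat every detail as an assumption:

```python
import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    """Channel attention (squeeze-excite style) and spatial attention run in
    parallel on the same input and are fused by summation (illustrative)."""
    def __init__(self, c, r=4):
        super().__init__()
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c // r, 1),
                                nn.ReLU(inplace=True), nn.Conv2d(c // r, c, 1),
                                nn.Sigmoid())
        self.sa = nn.Sequential(nn.Conv2d(c, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        return x * self.ca(x) + x * self.sa(x)        # two branches fused by addition

y = ParallelAttention(64)(torch.randn(2, 64, 56, 56))  # -> (2, 64, 56, 56)
```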
Collapse
Affiliation(s)
- Peng-Cheng Zhu
- Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, China
| | - Jing-Jing Wan
- Department of Gastroenterology, The Second People's Hospital of Huai'an, The Affiliated Huai'an Hospital of Xuzhou Medical University, Huaian, Jiangsu, China
| | - Wei Shao
- Nanjing University of Aeronautics and Astronautics Shenzhen Research Institute, Shenzhen, China
| | - Xian-Chun Meng
- Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, China
| | - Bo-Lun Chen
- Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, China
- Department of Physics, University of Fribourg, Fribourg, Switzerland
| |
Collapse
|
46
|
Yue G, Zhuo G, Yan W, Zhou T, Tang C, Yang P, Wang T. Boundary uncertainty aware network for automated polyp segmentation. Neural Netw 2024; 170:390-404. [PMID: 38029720 DOI: 10.1016/j.neunet.2023.11.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2023] [Revised: 07/15/2023] [Accepted: 11/22/2023] [Indexed: 12/01/2023]
Abstract
Recently, leveraging deep neural networks for automated colorectal polyp segmentation has emerged as a hot topic due to its advantages in avoiding the limitations of visual inspection, e.g., overwork and subjectivity. However, most existing methods do not pay enough attention to the uncertain areas of colonoscopy images and often provide unsatisfactory segmentation performance. In this paper, we propose a novel boundary uncertainty aware network (BUNet) for precise and robust colorectal polyp segmentation. Specifically, considering that polyps vary greatly in size and shape, we first adopt a pyramid vision transformer encoder to learn multi-scale feature representations. Then, a simple yet effective boundary exploration module (BEM) is proposed to explore boundary cues from the low-level features. To make the network focus on ambiguous areas where the prediction score is biased toward neither the foreground nor the background, we further introduce a boundary uncertainty aware module (BUM) that explores error-prone regions from the high-level features with the assistance of the boundary cues provided by the BEM. Through top-down hybrid deep supervision, our BUNet implements coarse-to-fine polyp segmentation and precisely localizes polyp regions. Extensive experiments on five public datasets show that BUNet is superior to thirteen competing methods in terms of both effectiveness and generalization ability.
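The uncertainty cue this kind of module builds on can be written in a few lines: pixels whose score sits near 0.5 belong confidently to neither class, so 1 - |2p - 1| peaks exactly there. The helper below is a generic illustration, not BUNet's BUM itself:

```python
import torch

def boundary_uncertainty(logits):
    """logits: (B,1,H,W) raw predictions; returns an uncertainty map in [0,1],
    equal to 1 at p = 0.5 and 0 where p is 0 or 1."""
    p = torch.sigmoid(logits)
    return 1.0 - torch.abs(2.0 * p - 1.0)

u = boundary_uncertainty(torch.randn(2, 1, 352, 352))
# u can then re-weight high-level features so attention concentrates on error-prone regions.
```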
Collapse
Affiliation(s)
- Guanghui Yue
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
| | - Guibin Zhuo
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
| | - Weiqing Yan
- School of Computer and Control Engineering, Yantai University, Yantai 264005, China
| | - Tianwei Zhou
- College of Management, Shenzhen University, Shenzhen 518060, China.
| | - Chang Tang
- School of Computer Science, China University of Geosciences, Wuhan 430074, China
| | - Peng Yang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
| | - Tianfu Wang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
| |
Collapse
|
47
|
Yin Y, Luo S, Zhou J, Kang L, Chen CYC. LDCNet: Lightweight dynamic convolution network for laparoscopic procedures image segmentation. Neural Netw 2024; 170:441-452. [PMID: 38039682 DOI: 10.1016/j.neunet.2023.11.055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Revised: 11/07/2023] [Accepted: 11/24/2023] [Indexed: 12/03/2023]
Abstract
Medical image segmentation is fundamental for modern healthcare systems, especially for reducing surgical risk and for treatment planning. Transanal total mesorectal excision (TaTME) has emerged as a recent focal point in laparoscopic research, representing a pivotal modality in the therapeutic arsenal for the treatment of colorectal cancers. Real-time instance segmentation of surgical imagery during TaTME procedures can serve as an invaluable tool in assisting surgeons, ultimately reducing surgical risks. The dynamic variations in the size and shape of anatomical structures within intraoperative images pose a formidable challenge, rendering precise instance segmentation of TaTME images a task of considerable complexity. Deep learning has demonstrated its efficacy in medical image segmentation. However, existing models struggle to achieve satisfactory accuracy while maintaining manageable computational complexity on TaTME data. To address this conundrum, we propose a lightweight dynamic convolution network (LDCNet) that matches the segmentation performance of state-of-the-art (SOTA) medical image segmentation networks while running at the speed of a lightweight convolutional neural network. Experimental results demonstrate the promising performance of LDCNet, which consistently exceeds previous SOTA approaches. Code is available at github.com/yinyiyang416/LDCNet.
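As background for the core operation, the sketch below implements a generic dynamic convolution, where several parallel kernels are mixed by input-dependent attention weights; it follows the general dynamic-convolution recipe rather than LDCNet's specific block:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """K candidate kernels are blended per sample by attention weights computed
    from the input, then applied as a single convolution (generic sketch)."""
    def __init__(self, c_in, c_out, k=3, n_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_kernels, c_out, c_in, k, k) * 0.02)
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(c_in, n_kernels))
        self.pad = k // 2

    def forward(self, x):
        a = torch.softmax(self.attn(x), dim=1)        # (B, K) per-sample kernel weights
        w = torch.einsum("bk,koihw->boihw", a, self.weight)  # blended kernel per sample
        b, c = x.shape[:2]
        # run all samples at once as a grouped convolution over the batch
        out = F.conv2d(x.reshape(1, b * c, *x.shape[2:]),
                       w.reshape(-1, c, *w.shape[3:]), groups=b, padding=self.pad)
        return out.reshape(b, -1, *out.shape[2:])

y = DynamicConv2d(16, 32)(torch.randn(2, 16, 64, 64))  # -> (2, 32, 64, 64)
```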
Collapse
Affiliation(s)
- Yiyang Yin
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, Guangdong, China
| | - Shuangling Luo
- Department of General Surgery (Colorectal Surgery), The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, Guangdong, China; Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, Department of Colorectal Surgery, Guangzhou, 510655, Guangdong, China; The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, Guangdong, China
| | - Jun Zhou
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, Guangdong, China
| | - Liang Kang
- Department of General Surgery (Colorectal Surgery), The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, Guangdong, China; Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, Department of Colorectal Surgery, Guangzhou, 510655, Guangdong, China; The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, Guangdong, China.
| | - Calvin Yu-Chian Chen
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, Guangdong, China; AI for Science (AI4S) - Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, 518055, Guangdong, China; School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, 518055, Guangdong, China; Department of Medical Research, China Medical University Hospital, Taichung, 40447, Guangdong, Taiwan; Department of Bioinformatics and Medical Engineering, Asia University, Taichung, 41354, Taiwan.
| |
Collapse
|
48
|
Zhang Y, Zhou T, Tao Y, Wang S, Wu Y, Liu B, Gu P, Chen Q, Chen DZ. TestFit: A plug-and-play one-pass test time method for medical image segmentation. Med Image Anal 2024; 92:103069. [PMID: 38154382 DOI: 10.1016/j.media.2023.103069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2023] [Revised: 10/16/2023] [Accepted: 12/19/2023] [Indexed: 12/30/2023]
Abstract
Deep learning (DL) based methods have been extensively studied for medical image segmentation, mostly emphasizing the design and training of DL networks. Only a few attempts have been made to develop methods for applying DL models at test time. In this paper, we study whether a given off-the-shelf segmentation network can be stably improved on the fly during test time in an online processing-and-learning fashion. We propose a new online test-time method, called TestFit, that improves the results of a given off-the-shelf DL segmentation model at test time by actively fitting the test data distribution. TestFit first creates a supplementary network (SuppNet) from the given trained off-the-shelf segmentation network (the original network is referred to as OGNet) and applies SuppNet together with OGNet for test-time inference. OGNet keeps its hypothesis derived from the original training set to prevent the model from collapsing, while SuppNet seeks to fit the test data distribution. Segmentation results and supervision signals (for updating SuppNet) are generated by combining the outputs of OGNet and SuppNet on the fly. TestFit needs only one pass over each test sample, the same as the traditional test pipeline, and requires no training-time preparation. Since each model update sees only one test sample and no manual annotation, we develop a series of technical treatments to improve the stability and effectiveness of our proposed online test-time training method. TestFit works in a plug-and-play fashion, requires minimal hyper-parameter tuning, and is easy to use in practice. Experiments on a large collection of 2D and 3D datasets demonstrate the capability of our TestFit method.
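A schematic, heavily simplified version of such a one-pass test-time loop is sketched below. The fusion rule, pseudo-label threshold, loss, and learning rate are all assumptions for illustration; only the overall structure (frozen OGNet, trainable SuppNet copy, supervision generated on the fly) follows the description above:

```python
import copy
import torch
import torch.nn.functional as F

def testfit_stream(ognet, test_loader, lr=1e-4):
    """ognet: trained binary-segmentation network; test_loader yields image batches."""
    ognet.eval()
    suppnet = copy.deepcopy(ognet)                    # supplementary network, kept trainable
    suppnet.train()
    opt = torch.optim.Adam(suppnet.parameters(), lr=lr)
    results = []
    for x in test_loader:                             # each sample is seen exactly once
        with torch.no_grad():
            p_og = torch.sigmoid(ognet(x))            # frozen hypothesis: prevents collapse
        p_sup = torch.sigmoid(suppnet(x))
        p = 0.5 * (p_og + p_sup.detach())             # fused prediction is the output
        pseudo = (p > 0.5).float()                    # supervision signal generated on the fly
        loss = F.binary_cross_entropy(p_sup, pseudo)  # SuppNet fits the test distribution
        opt.zero_grad()
        loss.backward()
        opt.step()
        results.append(p)
    return results
```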
Collapse
Affiliation(s)
- Yizhe Zhang
- Nanjing University of Science and Technology, Jiangsu 210094, China.
| | - Tao Zhou
- Nanjing University of Science and Technology, Jiangsu 210094, China
| | - Yuhui Tao
- Nanjing University of Science and Technology, Jiangsu 210094, China
| | - Shuo Wang
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China
| | - Ye Wu
- Nanjing University of Science and Technology, Jiangsu 210094, China
| | - Benyuan Liu
- University of Massachusetts Lowell, MA 01854, USA
| | | | - Qiang Chen
- Nanjing University of Science and Technology, Jiangsu 210094, China
| | | |
Collapse
|
49
|
Li W, Huang Z, Li F, Zhao Y, Zhang H. CIFG-Net: Cross-level information fusion and guidance network for Polyp Segmentation. Comput Biol Med 2024; 169:107931. [PMID: 38181608 DOI: 10.1016/j.compbiomed.2024.107931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2023] [Revised: 12/03/2023] [Accepted: 01/01/2024] [Indexed: 01/07/2024]
Abstract
Colorectal cancer is a common malignant tumor of the digestive tract. Most colorectal cancers arise from colorectal polyp lesions. Timely detection and removal of colorectal polyps can substantially reduce the incidence of colorectal cancer. Accurate polyp segmentation can provide important polyp information that aids in the early diagnosis and treatment of colorectal cancer. However, polyps of the same type can vary in texture, color, and even size. Furthermore, some polyps are similar in color to the surrounding healthy tissue, which makes the boundary between the polyp and the surrounding area unclear. To overcome the issues of inaccurate polyp localization and unclear boundary segmentation, we propose a polyp segmentation network based on cross-level information fusion and guidance. We use a Transformer encoder to extract a more robust feature representation. In addition, to refine the processing of feature information from the encoder, we propose the edge feature processing module (EFPM) and the cross-level information processing module (CIPM). EFPM focuses on the boundary information in polyp features: after processing each feature, it obtains clear and accurate polyp boundary features, which mitigates unclear boundary segmentation. CIPM aggregates and processes multi-scale features transmitted by the encoder layers and addresses inaccurate polyp localization by using multi-level features to obtain polyp location information. To better use the processed features to optimize segmentation, we also propose an information guidance module (IGM) that integrates the processed features of EFPM and CIPM to obtain accurate localization and segmentation of polyps. Experiments on five public polyp datasets using six metrics demonstrate that the proposed network is more robust and segments more accurately. Compared with other advanced algorithms, CIFG-Net achieves superior performance. Code available at: https://github.com/zspnb/CIFG-Net.
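As a generic illustration of cross-level aggregation of the kind CIPM performs, the sketch below resizes multi-level encoder features to a common resolution and fuses them, so deep features contribute location information and shallow features contribute detail; channel sizes and wiring are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLevelFusion(nn.Module):
    """Projects each encoder level to a common width, upsamples everything to
    the shallowest resolution, and fuses by concatenation + convolution."""
    def __init__(self, chans=(64, 128, 320), c_out=64):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, c_out, 1) for c in chans])
        self.fuse = nn.Conv2d(c_out * len(chans), c_out, 3, padding=1)

    def forward(self, feats):                         # feats ordered shallow -> deep
        h, w = feats[0].shape[2:]
        ups = [F.interpolate(p(f), size=(h, w), mode="bilinear", align_corners=False)
               for p, f in zip(self.proj, feats)]
        return self.fuse(torch.cat(ups, dim=1))       # multi-level location + detail

feats = [torch.randn(1, 64, 88, 88), torch.randn(1, 128, 44, 44),
         torch.randn(1, 320, 22, 22)]
out = CrossLevelFusion()(feats)                       # -> (1, 64, 88, 88)
```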
Collapse
Affiliation(s)
- Weisheng Li
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China.
| | - Zhaopeng Huang
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Feiyan Li
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Yinghui Zhao
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Hongchuan Zhang
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
| |
Collapse
|
50
|
Tian C, Zhang Z, Gao X, Zhou H, Ran R, Jiao Z. An Implicit-Explicit Prototypical Alignment Framework for Semi-Supervised Medical Image Segmentation. IEEE J Biomed Health Inform 2024; 28:929-940. [PMID: 37930923 DOI: 10.1109/jbhi.2023.3330667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2023]
Abstract
Semi-supervised learning methods have been explored to mitigate the scarcity of pixel-level annotation in medical image segmentation tasks. Consistency learning, the mainstream method in semi-supervised training, suffers from low efficiency and poor stability due to inaccurate supervision and insufficient feature representation. Prototypical learning is one potential and plausible way to handle this problem due to the nature of feature aggregation in prototype calculation. However, previous works have not fully studied how to enhance supervision quality and feature representation using prototypical learning under the semi-supervised condition. To address this issue, we propose an implicit-explicit prototypical alignment (IEPAlign) framework to foster semi-supervised consistency training. Specifically, we develop an implicit prototype alignment method based on multiple dynamic prototypes computed on the fly. We then design a multiple-prediction voting strategy for reliable unlabeled mask generation and prototype calculation to improve supervision quality. Afterward, to boost the intra-class consistency and inter-class separability of pixel-wise features in semi-supervised segmentation, we construct a region-aware hierarchical prototype alignment, which transmits information from labeled to unlabeled data and from certain to uncertain regions. We evaluate IEPAlign on three medical image segmentation tasks. Extensive experimental results demonstrate that the proposed method outperforms other popular semi-supervised segmentation methods and achieves performance comparable to fully supervised training methods.
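The prototype step that such frameworks share can be illustrated compactly: class prototypes come from masked average pooling of features, and pixels are then scored by cosine similarity to each prototype. The helpers below are a generic sketch under assumed shapes, not the IEPAlign implementation:

```python
import torch
import torch.nn.functional as F

def masked_prototypes(feat, mask):
    """feat: (B,C,H,W) features; mask: (B,K,H,W) one-hot class masks -> (K,C) prototypes."""
    num = torch.einsum("bchw,bkhw->kc", feat, mask)   # sum of features per class
    den = mask.sum(dim=(0, 2, 3)).clamp(min=1e-6)[:, None]
    return num / den                                  # per-class mean feature (prototype)

def prototype_logits(feat, protos, tau=0.1):
    """Scores each pixel by cosine similarity to each prototype -> (B,K,H,W) logits."""
    f = F.normalize(feat, dim=1)
    p = F.normalize(protos, dim=1)
    return torch.einsum("bchw,kc->bkhw", f, p) / tau

feat = torch.randn(2, 64, 56, 56)
mask = F.one_hot(torch.randint(0, 2, (2, 56, 56)), 2).permute(0, 3, 1, 2).float()
logits = prototype_logits(feat, masked_prototypes(feat, mask))  # -> (2, 2, 56, 56)
```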
Collapse
|