1
Al Qurri A, Almekkawy M. Improved UNet with Attention for Medical Image Segmentation. Sensors (Basel) 2023; 23:8589. [PMID: 37896682] [PMCID: PMC10611347] [DOI: 10.3390/s23208589]
Abstract
Medical image segmentation is crucial for medical image processing and the development of computer-aided diagnostics. In recent years, deep Convolutional Neural Networks (CNNs) have been widely adopted for medical image segmentation and have achieved significant success. UNet, which is based on CNNs, is the mainstream method used for medical image segmentation. However, its performance suffers because it cannot capture long-range dependencies. Transformers, initially designed for Natural Language Processing (NLP) and sequence-to-sequence applications, have demonstrated the ability to capture long-range dependencies, but their ability to acquire local information is limited. Hybrid CNN-Transformer architectures, such as TransUNet, have been proposed to benefit from the Transformer's long-range dependencies and the CNN's low-level details. Nevertheless, automatic medical image segmentation remains challenging owing to factors such as blurred boundaries, low-contrast tissue environments, and, in the context of ultrasound, speckle noise and attenuation. In this paper, we propose a new model that combines the strengths of both CNNs and Transformers, with architectural improvements designed to enrich the feature representation captured by the skip connections and the decoder. To this end, we devised a new attention module called Three-Level Attention (TLA), composed of an Attention Gate (AG), channel attention, and a spatial normalization mechanism. The AG preserves structural information, whereas channel attention models the interdependencies between channels. Spatial normalization employs the spatial coefficient of the Transformer to improve spatial attention, akin to TransNorm. To further improve the skip connections and reduce the semantic gap, the connections between the encoder and decoder were redesigned in a manner similar to the UNet++ dense connections. Moreover, deep supervision using a side-output channel was introduced, analogous to BASNet, which was originally proposed for saliency prediction. Two datasets from different modalities, a CT scan dataset and an ultrasound dataset, were used to evaluate the proposed architecture. The experimental results show that our model consistently improves the prediction performance of the UNet across datasets.
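As a rough illustration of two of the three mechanisms named above, the sketch below implements a generic additive Attention Gate and a squeeze-and-excitation-style channel attention block in PyTorch. It is a minimal sketch under stated assumptions, not the authors' TLA module; all module names, channel sizes, and the toy tensors are illustrative.

```python
# Hedged sketch: generic attention gate + channel attention, not the paper's TLA code.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Gate skip-connection features x with a decoder gating signal g (additive attention)."""
    def __init__(self, in_ch, gate_ch, inter_ch):
        super().__init__()
        self.theta_x = nn.Conv2d(in_ch, inter_ch, kernel_size=1)
        self.phi_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, x, g):
        # For simplicity x and g share spatial size here; a real UNet would first resample g.
        attn = torch.sigmoid(self.psi(torch.relu(self.theta_x(x) + self.phi_g(g))))
        return x * attn  # per-pixel gating of the skip features

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel reweighting."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # global average pool -> channel weights
        return x * w.view(x.size(0), -1, 1, 1)   # rescale each channel map

# Toy usage: gate a 64-channel skip map with a 128-channel decoder feature.
x = torch.randn(1, 64, 32, 32)
g = torch.randn(1, 128, 32, 32)
out = ChannelAttention(64)(AttentionGate(64, 128, 32)(x, g))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```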
2
Mirikharaji Z, Abhishek K, Bissoto A, Barata C, Avila S, Valle E, Celebi ME, Hamarneh G. A survey on deep learning for skin lesion segmentation. Med Image Anal 2023; 88:102863. [PMID: 37343323] [DOI: 10.1016/j.media.2023.102863]
Abstract
Skin cancer is a major public health problem that could benefit from computer-aided diagnosis to reduce the burden of this common disease. Skin lesion segmentation from images is an important step toward achieving this goal. However, the presence of natural and artificial artifacts (e.g., hair and air bubbles), intrinsic factors (e.g., lesion shape and contrast), and variations in image acquisition conditions make skin lesion segmentation a challenging task. Recently, various researchers have explored the applicability of deep learning models to skin lesion segmentation. In this survey, we cross-examine 177 research papers that deal with deep learning-based segmentation of skin lesions. We analyze these works along several dimensions, including input data (datasets, preprocessing, and synthetic data generation), model design (architecture, modules, and losses), and evaluation aspects (data annotation requirements and segmentation performance). We discuss these dimensions both from the viewpoint of select seminal works and from a systematic viewpoint, examining how those choices have influenced current trends and how their limitations should be addressed. To facilitate comparisons, we summarize all examined works in a comprehensive table as well as an interactive table available online.
Affiliation(s)
- Zahra Mirikharaji
- Medical Image Analysis Lab, School of Computing Science, Simon Fraser University, Burnaby V5A 1S6, Canada
- Kumar Abhishek
- Medical Image Analysis Lab, School of Computing Science, Simon Fraser University, Burnaby V5A 1S6, Canada
- Alceu Bissoto
- RECOD.ai Lab, Institute of Computing, University of Campinas, Av. Albert Einstein 1251, Campinas 13083-852, Brazil
- Catarina Barata
- Institute for Systems and Robotics, Instituto Superior Técnico, Avenida Rovisco Pais, Lisbon 1049-001, Portugal
- Sandra Avila
- RECOD.ai Lab, Institute of Computing, University of Campinas, Av. Albert Einstein 1251, Campinas 13083-852, Brazil
- Eduardo Valle
- RECOD.ai Lab, School of Electrical and Computing Engineering, University of Campinas, Av. Albert Einstein 400, Campinas 13083-952, Brazil
- M Emre Celebi
- Department of Computer Science and Engineering, University of Central Arkansas, 201 Donaghey Ave., Conway, AR 72035, USA
- Ghassan Hamarneh
- Medical Image Analysis Lab, School of Computing Science, Simon Fraser University, Burnaby V5A 1S6, Canada
3
Hasan MK, Ahamad MA, Yap CH, Yang G. A survey, review, and future trends of skin lesion segmentation and classification. Comput Biol Med 2023; 155:106624. [PMID: 36774890] [DOI: 10.1016/j.compbiomed.2023.106624]
Abstract
The Computer-aided Diagnosis or Detection (CAD) approach to skin lesion analysis is an emerging field of research that has the potential to alleviate the burden and cost of skin cancer screening. Researchers have recently shown increasing interest in developing such CAD systems, with the intention of providing dermatologists with a user-friendly tool that reduces the challenges associated with manual inspection. This article provides a comprehensive literature survey and review of a total of 594 publications (356 on skin lesion segmentation and 238 on skin lesion classification) published between 2011 and 2022. These articles are analyzed and summarized in several ways to convey vital information on the development of CAD systems, covering: relevant and essential definitions and theories; input data (dataset utilization, preprocessing, augmentation, and handling of class imbalance); method configuration (techniques, architectures, module frameworks, and losses); training tactics (hyperparameter settings); and evaluation criteria. We also investigate a variety of performance-enhancing approaches, including ensembling and post-processing, and discuss these dimensions to reveal current trends based on utilization frequencies. In addition, we highlight the primary difficulties of evaluating skin lesion segmentation and classification systems on minimal datasets, as well as potential solutions to these difficulties. Findings, recommendations, and trends are disclosed to inform future research on developing an automated and robust CAD system for skin lesion analysis.
Affiliation(s)
- Md Kamrul Hasan
- Department of Bioengineering, Imperial College London, UK; Department of Electrical and Electronic Engineering (EEE), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh
- Md Asif Ahamad
- Department of Electrical and Electronic Engineering (EEE), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh
- Choon Hwai Yap
- Department of Bioengineering, Imperial College London, UK
- Guang Yang
- National Heart and Lung Institute, Imperial College London, UK; Cardiovascular Research Centre, Royal Brompton Hospital, UK
4
Jiang Y, Dong J, Cheng T, Zhang Y, Lin X, Liang J. iU-Net: a hybrid structured network with a novel feature fusion approach for medical image segmentation. BioData Min 2023; 16:5. [PMID: 36805687] [PMCID: PMC9942350] [DOI: 10.1186/s13040-023-00320-6]
Abstract
In recent years, convolutional neural networks (CNNs) have made great achievements in the field of medical image segmentation, especially fully convolutional networks based on U-shaped structures and skip connections. However, constrained by the inherent locality of convolution, CNN-based methods usually exhibit limitations in modeling long-range dependencies and cannot extract large amounts of global contextual information, which deprives neural networks of the ability to adapt to different visual modalities. In this paper, we propose our own model, called iU-Net because its structure closely resembles the combination of i and U. iU-Net is a multiple encoder-decoder structure combining the Swin Transformer and a CNN. We use a hierarchical Swin Transformer structure with shifted windows as the primary encoder and convolution as the secondary encoder to complement the contextual information extracted by the primary encoder. To sufficiently fuse the feature information extracted by the multiple encoders, we design a feature fusion module (W-FFM) based on wave function representation. In addition, a three-branch upsampling method (Tri-Upsample) was developed to replace the patch-expanding operation in the Swin Transformer, which effectively avoids the checkerboard artifacts caused by patch expanding. On the skin lesion segmentation task, the segmentation performance of iU-Net is optimal, with Dice and IoU reaching 90.12% and 83.06%, respectively. To verify the generalization of iU-Net, we used the model trained on the ISIC2018 dataset to test on the PH2 dataset, achieving 93.80% Dice and 88.74% IoU. On the lung field segmentation task, iU-Net achieved optimal results in IoU and Precision, reaching 98.54% and 94.35%, respectively. Extensive experiments demonstrate the segmentation performance and generalization ability of iU-Net.
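The checkerboard artifacts attributed above to patch expanding are a well-known side effect of learned transposed-convolution upsampling. The sketch below contrasts a plain transposed-convolution upsampler with the common interpolate-then-convolve remedy; it is a generic PyTorch illustration, not the paper's Tri-Upsample module, and the channel sizes are assumptions.

```python
# Hedged sketch: two 2x upsamplers, one prone to checkerboard artifacts and one that avoids them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedUp(nn.Module):
    """Learned 2x upsampling whose uneven kernel overlap can produce checkerboard artifacts."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(x)

class ResizeConvUp(nn.Module):
    """Bilinear resize followed by a 3x3 conv: uniform support, no checkerboard pattern."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.conv(x)

feat = torch.randn(1, 96, 28, 28)           # e.g. a Transformer-stage feature map (assumed size)
print(TransposedUp(96, 48)(feat).shape)     # torch.Size([1, 48, 56, 56])
print(ResizeConvUp(96, 48)(feat).shape)     # torch.Size([1, 48, 56, 56])
```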
Affiliation(s)
- Yun Jiang
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
- Jinkun Dong
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
- Tongtong Cheng
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
- Yuan Zhang
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
- Xin Lin
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
- Jing Liang
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
5
Zhang Z, Jiang Y, Qiao H, Wang M, Yan W, Chen J. SIL-Net: A Semi-Isotropic L-shaped network for dermoscopic image segmentation. Comput Biol Med 2022; 150:106146. [PMID: 36228460] [DOI: 10.1016/j.compbiomed.2022.106146]
Abstract
BACKGROUND: Dermoscopic image segmentation using deep learning algorithms is a critical technology for skin cancer detection and therapy. It is a spatially equivariant task and relies heavily on Convolutional Neural Networks (CNNs), which lose effective features during cascaded down-sampling and up-sampling. Recently, isotropic vision architectures have emerged that eliminate the cascade procedures of CNNs and demonstrate superior performance; however, they cannot be used for segmentation directly. Based on these observations, this research explores an efficient architecture that preserves the advantages of the isotropic design while remaining suitable for clinical dermoscopic diagnosis. METHODS: We introduce a novel Semi-Isotropic L-shaped network (SIL-Net) for dermoscopic image segmentation. First, we propose a Patch Embedding Weak Correlation (PEWC) module to address the lack of interaction between adjacent patches in the standard patch embedding process. Second, a plug-and-play, zero-parameter Residual Spatial Mirror Information (RSMI) path is proposed to supplement effective features during up-sampling and refine lesion boundaries. Third, to further reconstruct deep features and obtain refined lesion regions, an up-sampling module based on Depthwise Separable Transposed Convolution (DSTC) is designed. RESULTS: The proposed architecture obtains state-of-the-art performance on the dermoscopy benchmark datasets ISIC-2017, ISIC-2018, and PH2, with Dice coefficients (DICE) of 89.63%, 93.47%, and 95.11% and Mean Intersection over Union (MIoU) values of 82.02%, 88.21%, and 90.81%, respectively. Furthermore, the robustness and generalizability of the method have been demonstrated through additional experiments on standard intestinal polyp datasets (CVC-ClinicDB and Kvasir-SEG). CONCLUSION: Our findings demonstrate that SIL-Net not only has great potential for precise segmentation of the lesion region but also exhibits strong generalizability and robustness, indicating that it meets the requirements for clinical diagnosis. Notably, the method shows state-of-the-art performance on all five datasets, highlighting the effectiveness of the semi-isotropic design mechanism.
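For readers unfamiliar with the DSTC idea, the sketch below shows a generic depthwise-separable transposed-convolution upsampler (a per-channel transposed convolution followed by a 1x1 pointwise convolution). It is a hedged approximation of the general technique, not the SIL-Net implementation; layer sizes are illustrative.

```python
# Hedged sketch: depthwise-separable transposed-convolution upsampling, not the paper's DSTC code.
import torch
import torch.nn as nn

class DepthwiseSeparableTransposeUp(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Depthwise step: one 2x-upsampling transposed conv per input channel (groups=in_ch).
        self.depthwise = nn.ConvTranspose2d(in_ch, in_ch, kernel_size=2,
                                            stride=2, groups=in_ch)
        # Pointwise step: 1x1 conv mixes channels and sets the output width.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

x = torch.randn(1, 128, 16, 16)                              # assumed decoder feature map
print(DepthwiseSeparableTransposeUp(128, 64)(x).shape)       # torch.Size([1, 64, 32, 32])
```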
Affiliation(s)
- Zequn Zhang
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
- Yun Jiang
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
- Hao Qiao
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
- Meiqi Wang
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
- Wei Yan
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
- Jie Chen
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
6
Li S, Wang H, Xiao Y, Zhang M, Yu N, Zeng A, Wang X. A Workflow for Computer-Aided Evaluation of Keloid Based on Laser Speckle Contrast Imaging and Deep Learning. J Pers Med 2022; 12:981. [PMID: 35743764] [PMCID: PMC9224605] [DOI: 10.3390/jpm12060981]
Abstract
A keloid results from abnormal wound healing, and keloids differ in blood perfusion and growth state among patients. Active monitoring and treatment of actively growing keloids at the initial stage can effectively inhibit keloid enlargement and has important medical and aesthetic implications. Laser speckle contrast imaging (LSCI) has been developed to measure keloid blood perfusion, which is strongly related to severity and prognosis. However, the LSCI-based method requires manual annotation and evaluation of the keloid, which is time-consuming. Although many studies have designed deep-learning networks for the detection and classification of skin lesions, assessing keloid growth status remains challenging, especially with small samples. This retrospective study included 150 untreated keloid patients and the intensity and blood perfusion images obtained from LSCI. A newly proposed workflow based on a cascaded vision transformer architecture achieved a Dice coefficient of 0.895 for keloid segmentation (a 2% improvement), an error of 8.6 ± 5.4 perfusion units and a relative error of 7.8% ± 6.6% for blood perfusion calculation, and an accuracy of 0.927 for growth-state prediction (a 1.4% improvement over the baseline).
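For reference, the Dice coefficient reported above is typically computed on binary masks as twice the overlap divided by the sum of the mask areas. The snippet below is a small NumPy sketch of that metric on synthetic masks, not the authors' evaluation code.

```python
# Hedged sketch: standard Dice coefficient on binary masks, computed on toy data.
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2*|P & T| / (|P| + |T|) for binary masks of equal shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Two overlapping 32x32 squares inside a 64x64 image.
pred = np.zeros((64, 64), dtype=np.uint8); pred[16:48, 16:48] = 1
gt   = np.zeros((64, 64), dtype=np.uint8); gt[20:52, 20:52] = 1
print(round(dice_coefficient(pred, gt), 3))  # ~0.766 for these synthetic masks
```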
Affiliation(s)
- Shuo Li
- Department of Plastic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
- He Wang
- Department of Neurological Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
- Yiding Xiao
- Department of Plastic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
- Mingzi Zhang
- Department of Plastic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
- Nanze Yu
- Department of Plastic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
- Ang Zeng
- Department of Plastic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
- Xiaojun Wang
- Department of Plastic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
7
Gu R, Wang L, Zhang L. DE-Net: A deep edge network with boundary information for automatic skin lesion segmentation. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.10.017]
8
Gastrointestinal Disease Classification in Endoscopic Images Using Attention-Guided Convolutional Neural Networks. Applied Sciences (Basel) 2021. [DOI: 10.3390/app112311136]
Abstract
Gastrointestinal (GI) diseases constitute a leading problem in the human digestive system. Consequently, several studies have explored the automatic classification of GI diseases as a means of minimizing the burden on clinicians and improving patient outcomes, for both diagnostic and treatment purposes. The challenge in using deep learning (DL) approaches, specifically convolutional neural networks (CNNs), is that spatial information is not fully utilized due to the inherent mechanism of CNNs. This paper proposes using spatial factors to improve classification performance. Specifically, we propose a deep CNN-based spatial attention mechanism for the classification of GI diseases, implemented with encoder-decoder layers. To overcome the data imbalance problem, we adopt data-augmentation techniques. A total of 12,147 multi-site, multi-disease GI images, drawn from publicly available and private sources, were used to validate the proposed approach. Furthermore, five-fold cross-validation was adopted to minimize inconsistencies in intra- and inter-class variability and to ensure robust assessment of the results. Compared with other state-of-the-art models in terms of mean accuracy (ResNet50 = 90.28, GoogLeNet = 91.38, DenseNets = 91.60, and baseline = 92.84), our model demonstrated better outcomes (Precision = 92.8, Recall = 92.7, F1-score = 92.8, and Accuracy = 93.19). We also applied t-distributed stochastic neighbor embedding (t-SNE) and confusion-matrix analysis for better visualization and performance validation. Overall, the results show that the attention mechanism improved the automatic classification of multi-site GI disease images. Building on the proposed method and addressing the previous limitations, we plan to pursue clinical validation and further improvements in automatic classification accuracy in future work.
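As a rough sketch of how a spatial attention mechanism can be bolted onto a CNN classifier, the code below learns a per-pixel mask from channel-wise average and max summaries and rescales the backbone features before pooling and classification. It is a generic PyTorch illustration, not the paper's encoder-decoder attention network; the backbone channel count and class count are assumptions.

```python
# Hedged sketch: generic spatial attention over CNN features for classification,
# not the paper's exact architecture.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Summarize channels by average and max, then learn a per-pixel mask.
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask

class AttentionClassifier(nn.Module):
    def __init__(self, backbone_out_ch=512, num_classes=8):
        super().__init__()
        self.attn = SpatialAttention()
        self.head = nn.Linear(backbone_out_ch, num_classes)

    def forward(self, feats):                      # feats: a CNN backbone's final feature map
        feats = self.attn(feats)
        pooled = feats.mean(dim=(2, 3))            # global average pooling
        return self.head(pooled)

feats = torch.randn(4, 512, 7, 7)                 # assumed backbone output shape
print(AttentionClassifier()(feats).shape)         # torch.Size([4, 8])
```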