1.
Yan S, Yang B, Chen A. A differential network with multiple gated reverse attention for medical image segmentation. Sci Rep 2024;14:20274. [PMID: 39217265] [PMCID: PMC11365968] [DOI: 10.1038/s41598-024-71194-9]
Abstract
The UNet architecture has achieved great success in medical image segmentation. However, such models still encounter several challenges. One is the loss of pixel-level information caused by repeated down-sampling; in addition, the addition or concatenation used in the decoder can generate redundant information. These limitations impair localization, weaken the complementarity of features at different levels, and can lead to blurred boundaries. Differential features can effectively compensate for these shortcomings and significantly enhance segmentation performance. We therefore propose MGRAD-UNet (multi-gated reverse attention multi-scale differential UNet), built on UNet. A multi-scale differential decoder generates abundant differential features at both the pixel level and the structure level. These features, which serve as gate signals, are transmitted to the gate controller and forwarded to the other differential decoder. To enhance the focus on important regions, this second differential decoder is equipped with reverse attention. The features obtained by the two differential decoders are differenced a second time, and the resulting differential feature is sent back to the controller as a control signal, then transmitted to the encoder for learning the differential feature produced by the two differential decoders. The core design of MGRAD-UNet lies in extracting comprehensive and accurate features by caching overall differential features and applying multi-scale differential processing, enabling iterative learning from diverse information. We evaluate MGRAD-UNet against state-of-the-art (SOTA) methods on two public datasets. Our method surpasses its competitors and provides a new approach to UNet design.
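The gating mechanism described above relies on reverse attention, a technique that steers the network toward the regions the current prediction is least confident about. The following is a minimal, hedged sketch of a generic reverse-attention gate in PyTorch; it illustrates the general idea only and is not the authors' MGRAD-UNet implementation. The class name, channel count, and single-conv saliency map are assumptions made for illustration.

```python
# Minimal sketch of a generic reverse-attention gate (PyTorch).
# Illustrative only: NOT the authors' MGRAD-UNet code; names and
# shapes are assumptions.
import torch
import torch.nn as nn

class ReverseAttentionGate(nn.Module):
    """Suppress already-confident foreground so the decoder re-focuses
    on uncertain regions, which often lie along object boundaries."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv collapses the feature map to a single-channel saliency map
        self.to_map = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) decoder feature map
        saliency = torch.sigmoid(self.to_map(feat))  # (B, 1, H, W)
        reverse = 1.0 - saliency                     # attend to the complement
        return feat * reverse                        # gated features

# Usage: y = ReverseAttentionGate(64)(torch.randn(2, 64, 32, 32))
```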
Affiliation(s)
- Shun Yan: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, Zhejiang, China
- Benquan Yang: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, Zhejiang, China
- Aihua Chen: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, Zhejiang, China
2.
Guo R, Zhang R, Zhou H, Xie T, Peng Y, Chen X, Yu G, Wan F, Li L, Zhang Y, Liu R. CTDUNet: A Multimodal CNN-Transformer Dual U-Shaped Network with Coordinate Space Attention for Camellia oleifera Pests and Diseases Segmentation in Complex Environments. Plants (Basel) 2024;13:2274. [PMID: 39204710] [PMCID: PMC11359422] [DOI: 10.3390/plants13162274]
Abstract
Camellia oleifera is a crop of high economic value, yet it is particularly susceptible to diseases and pests that significantly reduce its yield and quality. Consequently, precise segmentation and classification of diseased Camellia leaves are vital for managing pests and diseases effectively. Deep learning exhibits significant advantages in the segmentation of plant diseases and pests, particularly in complex image processing and automated feature extraction. However, when employing single-modal models to segment Camellia oleifera diseases, three critical challenges arise: (A) lesions may closely resemble the colors of the complex background; (B) small sections of diseased leaves overlap; (C) multiple diseases may be present on a single leaf. These factors considerably hinder segmentation accuracy. We propose a novel multimodal model, the CNN-Transformer Dual U-shaped Network (CTDUNet), to integrate image and text information. The model first utilizes text data to compensate for the shortcomings of single-modal image features, enhancing its ability to distinguish lesions from environmental characteristics even when they closely resemble one another. Additionally, we introduce Coordinate Space Attention (CSA), which focuses on the positional relationships between targets, thereby improving the segmentation of overlapping leaf edges. Furthermore, cross-attention (CA) is employed to align image and text features effectively, preserving local information and enhancing the perception and differentiation of the various diseases. CTDUNet was evaluated on a self-made multimodal dataset and compared against several models, including DeeplabV3+, UNet, PSPNet, Segformer, HrNet, and Language meets Vision Transformer (LViT). The experimental results demonstrate that CTDUNet achieved a mean Intersection over Union (mIoU) of 86.14%, surpassing both the multimodal models and the best single-modal model by 3.91% and 5.84%, respectively. CTDUNet also delivers well-balanced performance across classes in the multi-class segmentation of Camellia oleifera diseases and pests. These results indicate that fusing image and text multimodal information is an effective approach to Camellia disease segmentation, achieving outstanding performance.
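The cross-attention (CA) alignment step lends itself to a compact illustration. Below is a hedged sketch of image-text cross-attention in PyTorch, one plausible reading of such a module: image tokens act as queries and text embeddings as keys and values, with a residual connection preserving local image information. It is not the authors' CTDUNet code, and the dimensions and class name are assumptions.

```python
# Hedged sketch of image-text cross-attention; not the CTDUNet implementation.
import torch
import torch.nn as nn

class ImageTextCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor):
        # img_tokens: (B, N_img, dim) flattened image patches (queries)
        # txt_tokens: (B, N_txt, dim) text embeddings (keys/values)
        fused, _ = self.attn(query=img_tokens, key=txt_tokens, value=txt_tokens)
        return img_tokens + fused  # residual keeps local image information

# Usage: out = ImageTextCrossAttention(256)(torch.randn(2, 196, 256),
#                                           torch.randn(2, 16, 256))
```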
Affiliation(s)
- Ruitian Guo: School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China
- Ruopeng Zhang: School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China
- Hao Zhou: School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China
- Tunjun Xie: School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China
- Yuting Peng: School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China
- Xili Chen: School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China
- Guo Yu: School of Business, Central South University of Forestry and Technology, Changsha 410004, China
- Fangying Wan: School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China
- Lin Li: School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China
- Yongzhong Zhang: School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China
- Ruifeng Liu: School of Forestry, Central South University of Forestry and Technology, Changsha 410004, China
3.
Pramanik P, Roy A, Cuevas E, Perez-Cisneros M, Sarkar R. DAU-Net: Dual attention-aided U-Net for segmenting tumor in breast ultrasound images. PLoS One 2024;19:e0303670. [PMID: 38820462] [PMCID: PMC11142567] [DOI: 10.1371/journal.pone.0303670]
Abstract
Breast cancer remains a critical global concern, underscoring the urgent need for early detection and accurate diagnosis to improve survival rates among women. Recent developments in deep learning have shown promising potential for computer-aided detection (CAD) systems to address this challenge. In this study, a novel deep-learning-based segmentation method is designed to detect tumors in breast ultrasound images. Our proposed approach combines two powerful attention mechanisms, the novel Positional Convolutional Block Attention Module (PCBAM) and Shifted Window Attention (SWA), integrated into a Residual U-Net model. PCBAM enhances the Convolutional Block Attention Module (CBAM) by incorporating a Positional Attention Module (PAM), improving the contextual information captured by CBAM and strengthening the model's ability to capture spatial relationships within local features. Additionally, we employ SWA within the bottleneck layer of the Residual U-Net to further enhance performance. To evaluate our approach, we perform experiments on two widely used breast ultrasound datasets, and the results demonstrate its capability to detect tumors accurately. Our approach achieves state-of-the-art performance in segmenting the breast tumor region, with Dice scores of 74.23% and 78.58% on the BUSI and UDIAT datasets, respectively, showcasing its potential for precise tumor detection. By leveraging the power of deep learning and integrating innovative attention mechanisms, our study contributes to the ongoing efforts to improve breast cancer detection and ultimately enhance women's survival rates. The source code of our work can be found here: https://github.com/AyushRoy2001/DAUNet.
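Since PCBAM is built on top of CBAM, a sketch of the standard CBAM block helps locate where the extension sits. The PyTorch snippet below implements the well-known channel-then-spatial attention of CBAM; the positional attention that PCBAM adds is not reproduced here, and the reduction ratio and layer names are assumptions.

```python
# Standard CBAM-style channel + spatial attention (sketch). PCBAM's
# positional module is NOT included; hyperparameters are assumptions.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite channels
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        # Spatial attention: 7x7 conv over pooled channel statistics
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=(2, 3), keepdim=True)   # (B, C, 1, 1)
        mx = x.amax(dim=(2, 3), keepdim=True)
        x = x * torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

# Usage: y = CBAM(64)(torch.randn(2, 64, 32, 32))
```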
Affiliation(s)
- Payel Pramanik: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Ayush Roy: Department of Electrical Engineering, Jadavpur University, Kolkata, India
- Erik Cuevas: Departamento de Electrónica, Universidad de Guadalajara, Guadalajara, Mexico
- Marco Perez-Cisneros: División de Tecnologías Para La Integración Ciber-Humana, Universidad de Guadalajara, Guadalajara, Mexico
- Ram Sarkar: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
4.
Dao TTP, Huynh TL, Pham MK, Le TN, Nguyen TC, Nguyen QT, Tran BA, Van BN, Ha CC, Tran MT. Improving Laryngoscopy Image Analysis Through Integration of Global Information and Local Features in VoFoCD Dataset. J Imaging Inform Med 2024. [PMID: 38809338] [DOI: 10.1007/s10278-024-01068-z]
Abstract
The diagnosis and treatment of vocal fold disorders rely heavily on laryngoscopy. A comprehensive vocal fold diagnosis requires accurate identification of crucial anatomical structures and potential lesions during laryngoscopic observation. However, existing approaches have yet to explore joint optimization of the decision-making process, covering object detection and image classification simultaneously. In this study, we provide a new dataset, VoFoCD, with 1724 laryngology images designed explicitly for object detection and image classification in laryngoscopy. Images in the VoFoCD dataset are categorized into four classes and comprise six glottic object types. Moreover, we propose a novel Multitask Efficient trAnsformer network for Laryngoscopy (MEAL) to classify vocal fold images and detect glottic landmarks and lesions. To facilitate interpretability for clinicians, MEAL provides attention maps that visualize important learned regions, yielding explainable artificial intelligence results that support clinical decision-making. We also analyze the model's effectiveness in simulated clinical scenarios where camera shake occurs during laryngoscopy. The proposed model demonstrates outstanding performance on our VoFoCD dataset: the image classification accuracy and the mean average precision at an intersection-over-union threshold of 0.5 (mAP50) for object detection are 0.951 and 0.874, respectively. Our MEAL method integrates global knowledge, encompassing general laryngoscopy image classification, into local features, which refer to distinct anatomical regions of the vocal fold, particularly abnormal regions including benign and malignant lesions. Our contribution can effectively aid laryngologists in visually identifying benign or malignant vocal fold lesions and in classifying images during laryngeal endoscopy.
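The joint optimization of classification and detection rests on a shared-backbone, multi-head pattern. The sketch below shows that pattern in its simplest form: one encoder feeding an image-level classification head and a dense per-cell detection-style head. It is a generic illustration, not the MEAL architecture; real detection heads (anchor-based or DETR-style) are considerably more elaborate, and all layer sizes here are assumptions.

```python
# Generic multitask pattern (shared encoder, two heads); NOT the MEAL model.
import torch
import torch.nn as nn

class MultitaskNet(nn.Module):
    def __init__(self, num_classes: int = 4, num_obj_types: int = 6):
        super().__init__()
        self.backbone = nn.Sequential(             # shared feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.cls_head = nn.Sequential(             # image-level classification
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )
        # Per-cell detection-style output: objectness + classes + 4 box offsets
        self.det_head = nn.Conv2d(64, 1 + num_obj_types + 4, kernel_size=1)

    def forward(self, x: torch.Tensor):
        f = self.backbone(x)
        return self.cls_head(f), self.det_head(f)

# Usage: cls_logits, det_map = MultitaskNet()(torch.randn(1, 3, 224, 224))
```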
Affiliation(s)
- Thao Thi Phuong Dao: University of Science, Ho Chi Minh City, Vietnam; John von Neumann Institute, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam; Department of Otolaryngology, Thong Nhat Hospital, Tan Binh District, Ho Chi Minh City, Vietnam
- Tuan-Luc Huynh: University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
- Trung-Nghia Le: University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
- Tan-Cong Nguyen: University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam; University of Social Sciences and Humanities, Ho Chi Minh City, Vietnam
- Quang-Thuc Nguyen: University of Science, Ho Chi Minh City, Vietnam; John von Neumann Institute, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
- Bich Anh Tran: Otorhinolaryngology Department, Cho Ray Hospital, District 5, Ho Chi Minh City, Vietnam
- Boi Ngoc Van: Department of Otolaryngology, Vinmec Central Park International Hospital, Binh Thanh District, Ho Chi Minh City, Vietnam
- Chanh Cong Ha: Department of Otolaryngology, District 7 Hospital, District 7, Ho Chi Minh City, Vietnam
- Minh-Triet Tran: University of Science, Ho Chi Minh City, Vietnam; John von Neumann Institute, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
5.
Fernández-Vigo JI, Macarro-Merino A, De Moura-Ramos JJ, Alvarez-Rodriguez L, Burgos-Blasco B, Novo-Bujan J, Ortega-Hortas M, Fernández-Vigo JÁ. Comparative study of the glistening between four intraocular lens models assessed by OCT and deep learning. J Cataract Refract Surg 2024;50:37-42. [PMID: 37702457] [DOI: 10.1097/j.jcrs.0000000000001316]
Abstract
PURPOSE: To evaluate glistening in four models of intraocular lenses (IOLs) using optical coherence tomography (OCT) and deep learning (DL).
SETTING: Centro Internacional de Oftalmología Avanzada (Madrid, Spain).
DESIGN: Cross-sectional study.
METHODS: 325 eyes were assessed for the presence and severity of glistening in four IOL models: ReSTOR+3 SN6AD1 (n = 41), SN60WF (n = 110), PanOptix TFNT (n = 128), and Vivity DFT015 (n = 46). Glistening was analyzed on OCT by identifying hyperreflective foci (HRF) in the central area of the IOL. Manual quantification and an original DL-based quantification algorithm designed for this purpose were applied.
RESULTS: Glistening of any grade was detected in 22 (53.7%) ReSTOR SN6AD1, 44 (40%) SN60WF, 49 (38.3%) PanOptix TFNT, and 4 (8.7%) Vivity DFT015 IOLs. Comparing the IOL types, global glistening measured as total HRF was 17.3 ± 25.9 for the ReSTOR+3, 9.3 ± 15.7 for the SN60WF, 6.9 ± 10.5 for the PanOptix, and 1.2 ± 2.6 for the Vivity (P < .05). Agreement between manual and DL-based quantification was excellent (≥0.829).
CONCLUSIONS: Glistening severity in different IOL models can be quantified, classified, and compared in a simple and objective manner from OCT images with a DL algorithm. In the comparative study, the Vivity presented the lowest glistening severity.
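To make the HRF-counting task concrete, the snippet below shows a deliberately simple classical baseline: threshold the OCT image and count bright connected components. This is explicitly not the paper's deep-learning algorithm, only an illustration of what quantifying hyperreflective foci means operationally; the threshold value and the normalization assumption are arbitrary.

```python
# Classical thresholding baseline for counting hyperreflective foci (HRF)
# in an OCT B-scan. NOT the paper's DL method; values are illustrative.
import numpy as np
from scipy import ndimage

def count_hrf(oct_image: np.ndarray, threshold: float = 0.8) -> int:
    """oct_image: 2-D array normalized to [0, 1]. Returns the number of
    bright connected blobs, a crude proxy for glistening severity."""
    bright = oct_image > threshold       # binarize hyperreflective pixels
    _, n_blobs = ndimage.label(bright)   # 4-connected components by default
    return n_blobs

# Usage: print(count_hrf(np.random.rand(256, 512)))
```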
Affiliation(s)
- Centro Internacional de Oftalmología Avanzada, Madrid, Spain (J.I. Fernández-Vigo, Macarro-Merino, J.Á. Fernández-Vigo)
- Department of Ophthalmology, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria (IdISSC), Madrid, Spain (J.I. Fernández-Vigo, Burgos-Blasco)
- Department of Computer Science, Centro de Investigación CITIC, Universidade da Coruña, A Coruña, Spain (De Moura-Ramos, Alvarez-Rodriguez, Novo-Bujan, Ortega-Hortas)
- Department of Computer Science, VARPA Research Group, Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, A Coruña, Spain (De Moura-Ramos, Alvarez-Rodriguez, Novo-Bujan, Ortega-Hortas)
- Department of Ophthalmology, Universidad de Extremadura, Badajoz, Spain (J.Á. Fernández-Vigo)
6.
Elizar E, Zulkifley MA, Muharar R, Zaman MHM, Mustaza SM. A Review on Multiscale-Deep-Learning Applications. Sensors (Basel) 2022;22:7384. [PMID: 36236483] [PMCID: PMC9573412] [DOI: 10.3390/s22197384]
Abstract
In general, most existing convolutional neural network (CNN)-based deep-learning models suffer from spatial-information loss and inadequate feature representation. This is due to their inability to capture multiscale-context information and the exclusion of semantic information during pooling operations. In the early layers of a CNN, the network encodes simple semantic representations, such as edges and corners, while in the later layers it encodes more complex semantic features, such as complex geometric shapes. Theoretically, it is better for a CNN to extract features from different levels of semantic representation, because tasks such as classification and segmentation work better when both simple and complex feature maps are utilized. Hence, it is also crucial to embed multiscale capability throughout the network so that the various scales of the features can be optimally captured to represent the intended task. Multiscale representation enables the network to fuse low-level and high-level features from a restricted receptive field and thereby enhance model performance. The main novelty of this review is its comprehensive taxonomy of multiscale-deep-learning methods, detailing several architectures and their strengths as implemented in existing works. Predominantly, multiscale approaches in deep-learning networks fall into two categories: multiscale feature learning and multiscale feature fusion. Multiscale feature learning derives feature maps by examining kernels over several sizes to collect a larger range of relevant features and predict the input images' spatial mapping. Multiscale feature fusion uses features with different resolutions to find patterns over short and long distances without a deep network. Several examples of these techniques are also discussed according to their applications in satellite imagery, medical imaging, agriculture, and industrial and manufacturing systems.
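The "examining kernels over several sizes" idea maps directly onto an Inception-style block: parallel convolutions with different kernel sizes whose outputs are fused by channel concatenation. The sketch below shows that pattern as a generic illustration of multiscale feature learning and fusion; the branch widths and kernel sizes are arbitrary assumptions.

```python
# Generic multiscale feature learning + fusion block (Inception-style sketch).
# Branch widths and kernel sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultiscaleBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch_per_branch: int = 16):
        super().__init__()
        # Parallel convolutions at several kernel sizes; matched padding
        # keeps every branch at the same spatial resolution
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch_per_branch, k, padding=k // 2)
            for k in (1, 3, 5, 7)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multiscale feature fusion by channel concatenation
        return torch.cat([branch(x) for branch in self.branches], dim=1)

# Usage: y = MultiscaleBlock(3)(torch.randn(1, 3, 64, 64))  # (1, 64, 64, 64)
```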
Affiliation(s)
- Elizar Elizar: Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia; Department of Electrical and Computer Engineering, Faculty of Engineering, Universitas Syiah Kuala, Kopelma Darussalam 23111, Indonesia
- Mohd Asyraf Zulkifley: Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Rusdha Muharar: Department of Electrical and Computer Engineering, Faculty of Engineering, Universitas Syiah Kuala, Kopelma Darussalam 23111, Indonesia
- Mohd Hairi Mohd Zaman: Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Seri Mastura Mustaza: Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
7.
Zhang H, Zhong X, Li Z, Chen Y, Zhu Z, Lv J, Li C, Zhou Y, Li G. TiM-Net: Transformer in M-Net for Retinal Vessel Segmentation. J Healthc Eng 2022;2022:9016401. [PMID: 35859930] [PMCID: PMC9293566] [DOI: 10.1155/2022/9016401]
Abstract
The retinal image is a crucial window for clinical observation of cardiovascular, cerebrovascular, and other correlated diseases, and retinal vessel segmentation is of great benefit to clinical diagnosis. Recently, the convolutional neural network (CNN) has become a dominant method in retinal vessel segmentation, especially U-shaped CNN models. However, the conventional encoder in a CNN is vulnerable to noise interference, and the long-range relationships in fundus images have not been fully utilized. In this paper, we propose a novel model called Transformer in M-Net (TiM-Net), based on M-Net, diverse attention mechanisms, and weighted side-output layers, to perform retinal vessel segmentation effectively. First, to alleviate the effects of noise, a dual-attention mechanism based on channel and spatial attention is designed. Then the self-attention mechanism of the Transformer is introduced into the skip connections to re-encode features and model long-range relationships explicitly. Finally, a weighted SideOut layer is proposed for better utilization of the features from each side layer. Extensive experiments on three public datasets show the effectiveness and robustness of TiM-Net compared with state-of-the-art baselines. Both quantitative and qualitative results confirm its clinical practicality. Moreover, variants of TiM-Net also achieve competitive performance, demonstrating its scalability and generalization ability. The code of our model is available at https://github.com/ZX-ECJTU/TiM-Net.
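The Transformer-in-skip-connection idea can be sketched compactly: flatten the skip feature map into tokens, apply self-attention, and fold the result back into an image-shaped tensor. The snippet below is a hedged sketch in that spirit, not the released TiM-Net code (see the authors' repository linked above for the actual implementation); the class name and head count are assumptions.

```python
# Hedged sketch: self-attention re-encoding of a U-shaped skip connection.
# NOT the released TiM-Net code; names and sizes are assumptions.
import torch
import torch.nn as nn

class SkipSelfAttention(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, skip: torch.Tensor) -> torch.Tensor:
        b, c, h, w = skip.shape
        tokens = skip.flatten(2).transpose(1, 2)         # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)  # long-range mixing
        tokens = self.norm(tokens + attended)            # residual + norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Usage: out = SkipSelfAttention(64)(torch.randn(2, 64, 32, 32))
```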
Affiliation(s)
- Hongbin Zhang: School of Software, East China Jiaotong University, Nanchang, China
- Xiang Zhong: School of Software, East China Jiaotong University, Nanchang, China
- Zhijie Li: School of Software, East China Jiaotong University, Nanchang, China
- Yanan Chen: School of International, East China Jiaotong University, Nanchang, China
- Zhiliang Zhu: School of Software, East China Jiaotong University, Nanchang, China
- Jingqin Lv: School of Software, East China Jiaotong University, Nanchang, China
- Chuanxiu Li: School of Information Engineering, East China Jiaotong University, Nanchang, China
- Ying Zhou: Medical School, Nanchang University, Nanchang, China
- Guangli Li: School of Information Engineering, East China Jiaotong University, Nanchang, China