1. Yang T, Lu X, Yang L, Yang M, Chen J, Zhao H. Application of MRI image segmentation algorithm for brain tumors based on improved YOLO. Front Neurosci 2025;18:1510175. PMID: 39840016; PMCID: PMC11747661; DOI: 10.3389/fnins.2024.1510175.
Abstract
Objective: To assist rapid clinical identification of brain tumor types while achieving segmentation detection, this study investigates the feasibility of applying the deep learning YOLOv5s algorithm to the segmentation of brain tumor magnetic resonance images and optimizes and upgrades the model on this basis.

Methods: This study used two public Kaggle datasets of meningioma and glioma magnetic resonance images. Dataset 1 contains 3,223 images and Dataset 2 contains 216 images. From Dataset 1, 3,000 images were randomly selected and the cancerous regions were annotated with the LabelImg tool; these images were then divided into training and validation sets in a 7:3 ratio. The remaining 223 images and Dataset 2 were used as the internal and external test sets, respectively, to evaluate the model's segmentation performance. A series of optimizations was made to the original YOLOv5 algorithm by introducing Atrous Spatial Pyramid Pooling (ASPP), the Convolutional Block Attention Module (CBAM), and Coordinate Attention (CA) for structural improvement, resulting in several optimized versions: YOLOv5s-ASPP, YOLOv5s-CBAM, YOLOv5s-CA, YOLOv5s-ASPP-CBAM, and YOLOv5s-ASPP-CA. The training and validation sets were input into the original YOLOv5s model, the five optimized models, and the YOLOv8s model for 100 rounds of iterative training. The best weight file of the model with the best evaluation indices among the six trained models was used for the final test on the test set.

Results: After iterative training, all seven models could segment and recognize brain tumor magnetic resonance images. Their precision rates on the validation set were 92.5, 93.5, 91.2, 91.8, 89.6, 90.8, and 93.1%, respectively; the corresponding recall rates were 84, 85.3, 85.4, 84.7, 87.3, 85.4, and 91.9%. When the best weight file of the best-performing model among the six trained models was tested on the test set, the improved model showed significantly enhanced image segmentation ability compared with the original model.

Conclusion: Compared with the original YOLOv5s model, the improved YOLOv5s-ASPP model among the five improved models significantly enhanced the segmentation of brain tumor magnetic resonance images, which can help assist clinical diagnosis and treatment planning.
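For readers unfamiliar with the modules named above, the sketch below shows a generic Atrous Spatial Pyramid Pooling block in PyTorch of the kind that could be inserted into a YOLOv5s backbone or neck. It is an illustrative sketch only: the channel widths, dilation rates, and activation are assumptions, not the configuration used by Yang et al.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convolutions capture
    multi-scale context, then a 1x1 convolution fuses the branches."""
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.SiLU(inplace=True),
            )
            for d in dilations
        ])
        # Global-context branch: pooled features broadcast back to the map size.
        self.global_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.SiLU(inplace=True),
        )
        self.project = nn.Conv2d(out_ch * (len(dilations) + 1), out_ch, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        g = self.global_pool(x)
        g = nn.functional.interpolate(g, size=x.shape[-2:], mode="nearest")
        return self.project(torch.cat(feats + [g], dim=1))

# Example: a feature map from a hypothetical backbone stage.
y = ASPP(256, 256)(torch.randn(1, 256, 20, 20))  # -> torch.Size([1, 256, 20, 20])
```

Parallel dilated convolutions enlarge the receptive field at several rates without downsampling, which is one reason ASPP is a common choice when targets such as tumors vary widely in size.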
Affiliation(s)
- Tao Yang
- The First Clinical Medical College, The Affiliated People’s Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
- Xueqi Lu
- School of Biomedical Engineering, Southern Medical University, Guangzhou, China
- Lanlan Yang
- The First Clinical Medical College, The Affiliated People’s Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
- Miyang Yang
- The First Clinical Medical College, The Affiliated People’s Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
- Jinghui Chen
- The First Clinical Medical College, The Affiliated People’s Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
- Hongjia Zhao
- The Affiliated People’s Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, China
2. Hettihewa K, Kobchaisawat T, Tanpowpong N, Chalidabhongse TH. MANet: a multi-attention network for automatic liver tumor segmentation in computed tomography (CT) imaging. Sci Rep 2023;13:20098. PMID: 37973987; PMCID: PMC10654423; DOI: 10.1038/s41598-023-46580-4.
Abstract
Automatic liver tumor segmentation is a critically important application for liver tumor diagnosis and treatment planning. However, it remains a highly challenging task due to the heterogeneity of tumor shape and intensity variation. Automatic liver tumor segmentation can help establish a diagnostic standard that provides relevant radiological information to all levels of expertise. Recently, deep convolutional neural networks have demonstrated superiority in feature extraction and learning for medical image segmentation. However, multi-layer dense feature stacks make such models inconsistent in imitating the visual attention and awareness of radiological expertise for the tumor recognition and segmentation task. To bridge that gap in visual attention capability, attention mechanisms have been developed for better feature selection. In this paper, we propose a novel network named Multi-Attention Network (MANet), a fusion of attention mechanisms that learns to highlight important features while suppressing irrelevant ones for the tumor segmentation task. The proposed deep learning network follows U-Net as its basic architecture. Moreover, a residual mechanism is implemented in the encoder. The convolutional block attention module is split into channel attention and spatial attention modules, which are implemented in the encoder and decoder of the proposed architecture. The attention mechanism from Attention U-Net is integrated to extract low-level features and combine them with high-level ones. The developed architecture is trained and evaluated on the publicly available MICCAI 2017 Liver Tumor Segmentation dataset and the 3DIRCADb dataset under various evaluation metrics. MANet demonstrated promising results compared with state-of-the-art methods with a comparatively small parameter overhead.
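The abstract mentions integrating the attention mechanism of Attention U-Net to gate low-level encoder features with high-level ones; no implementation is given in this listing, so the following is a minimal additive attention-gate sketch in PyTorch. The channel sizes are hypothetical and the module is only a generic stand-in for the authors' design.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate (Attention U-Net style): a decoder gating
    signal suppresses irrelevant regions in the skip-connection features."""
    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.theta = nn.Conv2d(skip_ch, inter_ch, kernel_size=1, bias=False)
        self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1, bias=False)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, skip, gate):
        # gate is assumed to be resized to the spatial size of skip beforehand.
        attn = torch.sigmoid(self.psi(torch.relu(self.theta(skip) + self.phi(gate))))
        return skip * attn  # re-weighted low-level features

x_skip = torch.randn(1, 64, 128, 128)   # encoder (low-level) features
g = torch.randn(1, 128, 128, 128)       # decoder gating signal, already resized
gated = AttentionGate(64, 128, 32)(x_skip, g)  # -> torch.Size([1, 64, 128, 128])
```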
Affiliation(s)
- Kasun Hettihewa
- Perceptual Intelligent Computing Laboratory, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
- Natthaporn Tanpowpong
- Department of Radiology, Faculty of Medicine, Chulalongkorn University, Bangkok, 10330, Thailand
- Thanarat H Chalidabhongse
- Perceptual Intelligent Computing Laboratory, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
- Applied Digital Technology in Medicine (ATM) Research Group, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
3. Yang H, Xie L, Pan H, Li C, Wang Z, Zhong J. Multimodal Attention Dynamic Fusion Network for Facial Micro-Expression Recognition. Entropy (Basel) 2023;25:1246. PMID: 37761545; PMCID: PMC10528512; DOI: 10.3390/e25091246.
Abstract
The emotional changes in facial micro-expressions are combinations of action units. Researchers have shown that action units can be used as additional auxiliary data to improve facial micro-expression recognition. Most existing works attempt to fuse image features and action unit information, but they ignore the impact of action units on the facial image feature extraction process. Therefore, this paper proposes a local detail feature enhancement model based on a multimodal attention dynamic fusion network (MADFN) for micro-expression recognition. This method uses a masked autoencoder based on learnable class tokens to remove local areas with low emotional expressiveness in micro-expression images. We then utilize an action unit dynamic fusion module to fuse action unit representations and improve the representation ability of image features. The performance of the proposed model is evaluated and verified on the SMIC, CASME II, and SAMM datasets and their combined 3DB-Combined dataset. The experimental results demonstrate that the proposed model achieves competitive performance, with accuracy rates of 81.71%, 82.11%, and 77.21% on the SMIC, CASME II, and SAMM datasets, respectively, showing that the MADFN model can help improve the discrimination of emotional features in facial images.
Affiliation(s)
- Hongling Yang
- Department of Computer Science, Changzhi University, Changzhi 046011, China
- Lun Xie
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Hang Pan
- Department of Computer Science, Changzhi University, Changzhi 046011, China
- Chiqin Li
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Zhiliang Wang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Jialiang Zhong
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
4. Cîrneanu AL, Popescu D, Iordache D. New Trends in Emotion Recognition Using Image Analysis by Neural Networks, A Systematic Review. Sensors (Basel) 2023;23:7092. PMID: 37631629; PMCID: PMC10458371; DOI: 10.3390/s23167092.
Abstract
Facial emotion recognition (FER) is a computer vision task aimed at detecting and classifying human emotional expressions. FER systems are currently used in a wide range of applications in areas such as education, healthcare, and public safety; therefore, detection and recognition accuracy is very important. Like any computer vision task based on image analysis, FER solutions are also suitable for integration with artificial intelligence solutions represented by different neural network varieties, especially deep neural networks, which have shown great potential in recent years due to their feature extraction capabilities and computational efficiency over large datasets. In this context, this paper reviews the latest developments in the FER area, with a focus on recent neural network models that implement specific facial image analysis algorithms to detect and recognize facial emotions. The paper's scope is to present, from historical and conceptual perspectives, the evolution of the neural network architectures that have produced significant results in the FER area. It emphasizes convolutional neural network (CNN)-based architectures over other neural network architectures, such as recurrent neural networks or generative adversarial networks, highlighting the key elements and performance of each architecture and the advantages and limitations of the models proposed in the analyzed papers. Additionally, the paper presents the datasets currently available for emotion recognition from facial expressions and micro-expressions. The use of FER systems in various domains such as healthcare, education, security, and the social IoT is also highlighted. Finally, open issues and possible future developments in the FER area are identified.
Affiliation(s)
- Andrada-Livia Cîrneanu
- Faculty of Automatic Control and Computers, University Politehnica of Bucharest, 060042 Bucharest, Romania
- Dan Popescu
- Faculty of Automatic Control and Computers, University Politehnica of Bucharest, 060042 Bucharest, Romania
- Dragoș Iordache
- The National Institute for Research & Development in Informatics-ICI Bucharest, 011455 Bucharest, Romania
5. Pan H, Yang H, Xie L, Wang Z. Multi-scale fusion visual attention network for facial micro-expression recognition. Front Neurosci 2023;17:1216181. PMID: 37575295; PMCID: PMC10412924; DOI: 10.3389/fnins.2023.1216181.
Abstract
Introduction: Micro-expressions are facial muscle movements that occur when people attempt to hide their genuine emotions. To address the challenge of the low intensity of micro-expressions, recent studies have attempted to localize the facial regions where muscle movement occurs. However, this ignores the feature redundancy caused by inaccurate localization of the regions of interest.

Methods: This paper proposes a novel multi-scale fusion visual attention network (MFVAN) that learns multi-scale local attention weights to mask regions of redundant features. Specifically, the model extracts multi-scale features of the apex frame in micro-expression video clips with convolutional neural networks. The attention mechanism focuses on the weights of local region features in the multi-scale feature maps. We then mask redundant regions in the multi-scale features and fuse local features with high attention weights for micro-expression recognition. Self-supervision and transfer learning reduce the influence of individual identity attributes and increase the robustness of the multi-scale feature maps. Finally, the multi-scale classification loss, the mask loss, and the identity-attribute-removal loss are combined to optimize the model.

Results: The proposed MFVAN method achieves state-of-the-art performance on the SMIC, CASME II, SAMM, and 3DB-Combined datasets. The experimental results show that focusing on local regions at multiple scales contributes to micro-expression recognition.

Discussion: The proposed MFVAN model is the first to combine image generation with visual attention mechanisms to address the combined challenge of individual identity attribute interference and low-intensity facial muscle movements. Meanwhile, the MFVAN model reveals the impact of individual attributes on the localization of local ROIs. The experimental results show that a multi-scale fusion visual attention network contributes to micro-expression recognition.
Affiliation(s)
- Hang Pan
- Department of Computer Science, Changzhi University, Changzhi, China
- Hongling Yang
- Department of Computer Science, Changzhi University, Changzhi, China
- Lun Xie
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Zhiliang Wang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
6. Yuan H, Jin T, Ye X. Modification and Evaluation of Attention-Based Deep Neural Network for Structural Crack Detection. Sensors (Basel) 2023;23:6295. PMID: 37514590; PMCID: PMC10386673; DOI: 10.3390/s23146295.
Abstract
Cracks are one of the safety-evaluation indicators for structures, providing a maintenance basis for the health and safety of structures in service. Most structural inspections rely on visual observation, while bridges rely on traditional methods such as bridge inspection vehicles, which are inefficient and pose safety risks. To alleviate the low efficiency and high cost of structural health monitoring, deep learning, as a new technology, is increasingly being applied to crack detection and recognition. Focusing on this, the current paper proposes an improved model based on the attention mechanism and the U-Net network for crack-identification research. First, the training results of the two original models, U-Net and lrassp, were compared experimentally; U-Net performed better than lrassp on all indicators. Therefore, the U-Net network was improved with the attention mechanism. Experiments with the improved network showed that the proposed ECA-UNet increased the Intersection over Union (IoU) and recall indicators by 0.016 and 0.131, respectively, compared with the original U-Net. In practical large-scale structural crack recognition, the proposed model showed better recognition performance than the other two models, accurately identifying cracks with almost no errors caused by noise, demonstrating a stronger capacity for crack recognition.
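ECA-UNet presumably refers to the Efficient Channel Attention block added to U-Net; since no code accompanies this listing, here is a minimal PyTorch sketch of a standard ECA block, with the 1-D kernel size chosen arbitrarily rather than taken from the paper.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: a 1-D convolution over the pooled channel
    descriptor yields per-channel weights without dimensionality reduction."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)

    def forward(self, x):                      # x: (N, C, H, W)
        y = x.mean(dim=(2, 3))                 # global average pooling -> (N, C)
        y = self.conv(y.unsqueeze(1))          # treat channels as a sequence -> (N, 1, C)
        w = torch.sigmoid(y).transpose(1, 2).unsqueeze(-1)  # (N, C, 1, 1)
        return x * w                           # channel-wise re-weighting

features = torch.randn(2, 64, 56, 56)
out = ECA(k_size=3)(features)                  # same shape, channels re-weighted
```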
Affiliation(s)
- Hangming Yuan
- Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Tao Jin
- Department of Civil Engineering, Zhejiang University, Hangzhou 310058, China
- Xiaowei Ye
- Department of Civil Engineering, Zhejiang University, Hangzhou 310058, China
7. Zhou H, Huang S, Li J, Wang SJ. Dual-ATME: Dual-Branch Attention Network for Micro-Expression Recognition. Entropy (Basel) 2023;25:460. PMID: 36981348; PMCID: PMC10048169; DOI: 10.3390/e25030460.
Abstract
Micro-expression recognition (MER) is challenging due to the difficulty of capturing the instantaneous and subtle motion changes of micro-expressions (MEs). Early works based on hand-crafted features extracted from prior knowledge showed some promising results, but they have recently been replaced by deep learning methods based on the attention mechanism. However, with limited ME sample sizes, the features extracted by these methods lack discriminative ME representations, leading to MER performance that has yet to be improved. This paper proposes the Dual-branch Attention Network (Dual-ATME) for MER to address the problem of single-scale features representing MEs ineffectively. Specifically, Dual-ATME consists of two components: Hand-crafted Attention Region Selection (HARS) and Automated Attention Region Selection (AARS). HARS uses prior knowledge to manually extract features from regions of interest (ROIs), while AARS is based on attention mechanisms and extracts hidden information from data automatically. Finally, through similarity comparison and feature fusion, the dual-scale features can be used to learn ME representations effectively. Experiments on spontaneous ME datasets (including CASME II, SAMM, and SMIC) and their composite dataset MEGC2019-CD showed that Dual-ATME achieves better, or more competitive, performance than state-of-the-art MER methods.
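The abstract states that the hand-crafted (HARS) and automated (AARS) branch features are combined through similarity comparison and feature fusion but gives no detail, so the sketch below shows one plausible reading in PyTorch: a cosine-similarity alignment term plus concatenation before the classifier. The class name, feature dimensions, and loss formulation are assumptions, not the published Dual-ATME design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchFusion(nn.Module):
    """Fuse a hand-crafted ROI branch with an automatically attended branch."""
    def __init__(self, feat_dim: int = 256, num_classes: int = 3):
        super().__init__()
        self.classifier = nn.Linear(feat_dim * 2, num_classes)

    def forward(self, f_hars, f_aars):
        # Similarity comparison: encourage the two views to agree.
        sim = F.cosine_similarity(f_hars, f_aars, dim=1).mean()
        fused = torch.cat([f_hars, f_aars], dim=1)   # feature fusion
        logits = self.classifier(fused)
        return logits, 1.0 - sim                     # (prediction, alignment loss term)

f_roi = torch.randn(8, 256)      # features from the hand-crafted ROI branch
f_att = torch.randn(8, 256)      # features from the attention branch
logits, align_loss = DualBranchFusion()(f_roi, f_att)
```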
Affiliation(s)
- Haoliang Zhou
- School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China
- Key Laboratory of Behavior Sciences, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
- Shucheng Huang
- School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China
- Jingting Li
- Key Laboratory of Behavior Sciences, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
- Department of Psychology, University of the Chinese Academy of Sciences, Beijing 100049, China
- Su-Jing Wang
- Key Laboratory of Behavior Sciences, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
- Department of Psychology, University of the Chinese Academy of Sciences, Beijing 100049, China
8. Temporal augmented contrastive learning for micro-expression recognition. Pattern Recognit Lett 2023. DOI: 10.1016/j.patrec.2023.02.003.
9. A Survey of Micro-expression Recognition Methods Based on LBP, Optical Flow and Deep Learning. Neural Process Lett 2023. DOI: 10.1007/s11063-022-11123-x.
10. Deep learning-based microexpression recognition: a survey. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-07157-w.
11. Motion magnification multi-feature relation network for facial microexpression recognition. Complex Intell Syst 2022. DOI: 10.1007/s40747-022-00680-2.
Abstract
Microexpressions cannot be observed easily due to their short duration and small expression range. These properties pose considerable challenges for the recognition of microexpressions. Video motion magnification techniques help us see small motions previously invisible to the naked eye. This study aimed to enhance microexpression features with the help of motion amplification technology, and a motion magnification multi-feature relation network (MMFRN) combining video motion amplification and two feature relation modules was proposed. The spatial features are magnified during spatial feature extraction and used for classification. In addition, a ResNet50 network is transferred to extract global features and improve feature comprehensiveness. The magnification of the features is controlled through the amplification-factor hyperparameter α; the effects of different magnification factors on the results are compared and the best one is selected. The experiments verified that the magnification network can resolve the misclassification problem caused by the one-to-one correspondence between microexpressions and facial action coding units. On the CASME II dataset, MMFRN outperforms traditional recognition methods and other neural networks.
12. Sun X, Guo W, Shen J. Toward attention-based learning to predict the risk of brain degeneration with multimodal medical data. Front Neurosci 2022;16:1043626. PMID: 36741058; PMCID: PMC9889549; DOI: 10.3389/fnins.2022.1043626.
Abstract
Introduction: Brain degeneration is commonly caused by chronic diseases such as Alzheimer's disease (AD) and diabetes mellitus (DM). Risk prediction of brain degeneration aims to forecast a patient's disease progression in the near future based on their historical health records, which benefits accurate clinical diagnosis and early prevention of disease. Current risk predictions of brain degeneration mainly rely on single-modality medical data, such as electronic health records (EHR) or magnetic resonance imaging (MRI). However, leveraging only EHR or MRI data is insufficient for pertinent and accurate prediction because each modality captures only part of the relevant information (e.g., pixel or volume information from image data, or clinical context from non-image data).

Methods: Several deep learning-based methods have used multimodal data to predict the risk of specified diseases. However, most of them simply integrate the modalities in an early, intermediate, or late fusion structure and do not account for intra-modal and inter-modal dependencies, the lack of which leads to sub-optimal prediction performance. Thus, we propose an encoder-decoder framework for better risk prediction of brain degeneration using MRI and EHR. The encoder module, a key component, focuses on feature extraction of the input data and integrates intra-modal and inter-modal dependencies with spatial-temporal attention and cross-attention mechanisms. The corresponding decoder module parses the features from the encoder: a disease-oriented module extracts the most relevant disease representation features, and a multi-head attention module followed by a fully connected layer produces the predicted results.

Results: As different types of AD and DM influence the nature and severity of brain degeneration, we evaluate the proposed method for three-class prediction of AD and three-class prediction of DM. The proposed method with integrated MRI and EHR data achieves an accuracy of 0.859 and 0.899 for the risk prediction of AD and DM, respectively.

Discussion: The prediction performance is significantly better than the benchmarks, including MRI-only, EHR-only, and state-of-the-art multimodal fusion methods.
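The encoder described above couples MRI and EHR features with cross-attention; as an illustration of that idea only, the sketch below lets EHR tokens attend over MRI patch embeddings using PyTorch's nn.MultiheadAttention. The token counts and embedding size are invented for the example and do not reflect the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """EHR tokens query MRI tokens: inter-modal dependencies are captured by
    letting one modality attend over the other."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, ehr_tokens, mri_tokens):
        # query = EHR, key/value = MRI; the output has one token per EHR token.
        fused, _ = self.attn(ehr_tokens, mri_tokens, mri_tokens)
        return self.norm(ehr_tokens + fused)   # residual connection + normalization

mri = torch.randn(2, 196, 128)   # e.g. 196 patch embeddings from an MRI encoder
ehr = torch.randn(2, 32, 128)    # e.g. 32 embedded clinical variables
out = CrossModalAttention()(ehr, mri)   # -> torch.Size([2, 32, 128])
```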
Affiliation(s)
- Xiaofei Sun
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong SAR, China
- Weiwei Guo
- EchoX Technology Limited, Hong Kong, Hong Kong SAR, China
- Jing Shen
- Department of Radiology, Affiliated Zhongshan Hospital of Dalian University, Dalian, Liaoning, China
13. Luan S, Xue X, Ding Y, Wei W, Zhu B. Adaptive Attention Convolutional Neural Network for Liver Tumor Segmentation. Front Oncol 2021;11:680807. PMID: 34434891; PMCID: PMC8381250; DOI: 10.3389/fonc.2021.680807.
Abstract
Purpose: Accurate segmentation of the liver and liver tumors is critical for radiotherapy. Liver tumor segmentation, however, remains a difficult problem in medical image processing because of factors such as the complex and variable location, size, and shape of liver tumors, the low contrast between tumors and normal tissue, and blurred or difficult-to-define lesion boundaries. In this paper, we proposed a neural network (S-Net) that incorporates attention mechanisms for end-to-end segmentation of liver tumors from CT images.

Methods: First, this study adopted a classical encoding-decoding structure to realize end-to-end segmentation. Next, we introduced an attention mechanism between the contraction path and the expansion path so that the network could encode a longer range of semantic information in the local features and find the correspondence between different channels. Then, we introduced long skip connections between the layers of the contraction path and the expansion path, so that the semantic information extracted in both paths could be fused. Finally, a morphological closing operation was applied to remove narrow interruptions and long, thin gaps, which eliminated small cavities and produced a noise-reduction effect.

Results: We used the MICCAI 2017 Liver Tumor Segmentation (LiTS) challenge dataset, the 3DIRCADb dataset, and a Hubei Cancer Hospital dataset with physicians' manual contours to test the network architecture. We calculated the Dice Global (DG) score, Dice per Case (DC) score, volumetric overlap error (VOE), average symmetric surface distance (ASSD), and root mean square error (RMSE) to evaluate the accuracy of the architecture for liver tumor segmentation. The segmentation DG for tumors was 0.7555, DC was 0.613, VOE was 0.413, ASSD was 1.186, and RMSE was 1.804. For small tumors, DG was 0.3246 and DC was 0.3082; for large tumors, DG was 0.7819 and DC was 0.7632.

Conclusion: S-Net obtained more semantic information through the introduction of an attention mechanism and long skip connections. Experimental results showed that this method effectively improved tumor recognition in CT images and could be applied to assist doctors in clinical treatment.
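The Dice Global and Dice per Case scores reported above both derive from the standard Dice coefficient, Dice = 2|A ∩ B| / (|A| + |B|). The snippet below illustrates how the two variants are typically computed from binary masks; it is a generic reference implementation, not the authors' evaluation code.

```python
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks of any shape."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).sum().item()
    return (2.0 * intersection + eps) / (pred.sum().item() + target.sum().item() + eps)

# Dice per Case averages the score over scans; Dice Global pools all voxels first.
cases_pred = [torch.randint(0, 2, (64, 64, 32)) for _ in range(3)]
cases_true = [torch.randint(0, 2, (64, 64, 32)) for _ in range(3)]
dice_per_case = sum(dice_coefficient(p, t) for p, t in zip(cases_pred, cases_true)) / 3
dice_global = dice_coefficient(torch.cat([p.flatten() for p in cases_pred]),
                               torch.cat([t.flatten() for t in cases_true]))
```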
Affiliation(s)
- Shunyao Luan
- Department of Optoelectronic Engineering, Huazhong University of Science and Technology, Wuhan, China
- Xudong Xue
- Oncology Radiotherapy Department, Hubei Cancer Hospital, Wuhan, China
- Yi Ding
- Oncology Radiotherapy Department, Hubei Cancer Hospital, Wuhan, China
- Wei Wei
- Oncology Radiotherapy Department, Hubei Cancer Hospital, Wuhan, China
- Benpeng Zhu
- Department of Optoelectronic Engineering, Huazhong University of Science and Technology, Wuhan, China
14. A New CBAM-P-Net Model for Few-Shot Forest Species Classification Using Airborne Hyperspectral Images. Remote Sensing 2021. DOI: 10.3390/rs13071269.
Abstract
High-precision automatic identification and mapping of forest tree species composition is an important part of forest resource survey and monitoring. Airborne hyperspectral images contain rich spectral and spatial information, which makes high-precision classification and mapping of forest tree species possible. Few-shot learning, as an application of deep learning, has become an effective method of image classification, and the prototypical network (P-Net) is a simple and practical deep learning network with significant advantages in solving few-shot classification problems. Considering the high band correlation and large data volume of airborne hyperspectral images, fully extracting effective features while filtering or reducing redundant ones is the key to improving the classification accuracy of P-Net and to obtaining a high-precision forest tree species classification model from limited samples. In this research, we embedded the convolutional block attention module (CBAM) between the convolution blocks of P-Net to construct CBAM-P-Net, a method that improves the feature extraction efficiency of P-Net, although it makes the network more complex and increases the computational cost to a certain extent. The results show that the channel-first combination strategy for CBAM greatly improves the feature extraction efficiency of the model. Across different sample windows, CBAM-P-Net achieves an average increase of 1.17% in testing overall accuracy (OA) and 0.0129 in the kappa coefficient (Kappa). The optimal classification window is 17 × 17, where OA reaches 97.28% and Kappa reaches 0.97, an increase of 1.95% and 0.0214 over P-Net, respectively, with just 49 s of additional training time. Therefore, using a suitable sample window and applying the proposed CBAM-P-Net to classify airborne hyperspectral images can achieve high-precision classification and mapping of forest tree species.
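CBAM applies channel attention followed by spatial attention, which matches the channel-first strategy mentioned above. The sketch below is a generic PyTorch CBAM block that could, in principle, sit between the convolution blocks of a prototypical-network embedding; the reduction ratio and spatial kernel size are assumptions rather than the settings used in the paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention, then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):                                 # x: (N, C, H, W)
        n, c, _, _ = x.shape
        # Channel attention first: shared MLP over average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(n, c, 1, 1)
        # Spatial attention: convolve the channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

block_out = CBAM(64)(torch.randn(4, 64, 17, 17))          # e.g. a 17 x 17 sample window
```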