1
Hou G, Chen H, Ma Y, Jiang M, Hua C, Jiang C, Niu R. An occluded cherry tomato recognition model based on improved YOLOv7. Front Plant Sci 2023; 14:1260808. [PMID: 37929164 PMCID: PMC10625446 DOI: 10.3389/fpls.2023.1260808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 10/02/2023] [Indexed: 11/07/2023]
Abstract
Occlusion of cherry tomatoes in natural environments is one of the most critical factors limiting the picking accuracy of cherry tomato picking robots. To recognize occluded cherry tomatoes accurately and efficiently using deep convolutional neural networks, a new occluded cherry tomato recognition model, DSP-YOLOv7-CA, is proposed. First, images of cherry tomatoes with different degrees of occlusion are acquired, four occlusion areas and four occlusion methods are defined, and a cherry tomato dataset (TOSL) is constructed. Then, based on YOLOv7, the convolution module on the original residual edges is replaced with null residual edges, depthwise separable convolutional layers are added, and skip connections are added to reuse feature information. Next, a depthwise separable convolutional layer is added to the SPPF module, which has fewer parameters, and the result replaces the original SPPCSPC module to address the loss of small-target information caused by the different pooled residual layers. Finally, a coordinate attention (CA) layer is introduced at the critical position of the enhanced feature extraction network to strengthen attention to occluded cherry tomatoes. The experimental results show that the DSP-YOLOv7-CA model outperforms other target detection models, with a mean average precision (mAP) of 98.86%, and that the model parameter size is reduced from 37.62 MB to 33.71 MB. The model performs better in actual detection of cherry tomatoes with less than 95% occlusion. Relatively average results were obtained for cherry tomatoes with an occlusion level higher than 95%, but such cherry tomatoes were not targeted for picking. The DSP-YOLOv7-CA model can accurately recognize occluded cherry tomatoes in the natural environment, providing an effective solution for accurate picking by cherry tomato picking robots.
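The parameter saving that motivates depthwise separable convolutions can be illustrated with a small weight count. This is a minimal sketch, not the authors' implementation: the function names and the layer sizes (256 input channels, 256 output channels, 3 x 3 kernel) are assumptions chosen only for illustration, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen for illustration only.
c_in, c_out, k = 256, 256, 3
standard = conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / standard   # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))
```

For these assumed sizes the separable layer needs roughly 11.5% of the weights of the standard layer, which is the general reason such layers reduce model size; the actual reduction reported in the abstract depends on where the layers are placed in the network.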
Affiliation(s)
- Guangyu Hou
  - Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
  - Science Island Branch, University of Science and Technology of China, Hefei, China
- Haihua Chen
  - Institute of Computer Science, Chinese Academy of Sciences, Beijing, China
- Yike Ma
  - Institute of Computer Science, Chinese Academy of Sciences, Beijing, China
- Mingkun Jiang
  - Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
  - Science Island Branch, University of Science and Technology of China, Hefei, China
- Chen Hua
  - Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
  - Science Island Branch, University of Science and Technology of China, Hefei, China
- Chunmao Jiang
  - Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
  - Science Island Branch, University of Science and Technology of China, Hefei, China
- Runxin Niu
  - Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
2
Zeng Y, Zeng P, Shen S, Liang W, Li J, Zhao Z, Zhang K, Shen C. DCTR U-Net: automatic segmentation algorithm for medical images of nasopharyngeal cancer in the context of deep learning. Front Oncol 2023; 13:1190075. [PMID: 37546396 PMCID: PMC10402756 DOI: 10.3389/fonc.2023.1190075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 05/30/2023] [Indexed: 08/08/2023]
Abstract
Nasopharyngeal carcinoma (NPC) is a malignant tumor that occurs in the wall of the nasopharyngeal cavity and is prevalent in Southern China, Southeast Asia, North Africa, and the Middle East. According to studies, NPC is one of the most common malignant tumors in Hainan, China, and it has the highest incidence rate among otorhinolaryngological malignancies. We propose a new deep learning network model to improve the segmentation accuracy of the target region of nasopharyngeal cancer. Our model is based on U-Net, to which we add a Dilated Convolution Module, a Transformer Module, and a Residual Module. The new model can effectively address the restricted receptive field of convolutions and achieve global and local multi-scale feature fusion. In our experiments, the proposed network was trained and validated using 10-fold cross-validation on the records of 300 clinical patients. The results were evaluated using the Dice similarity coefficient (DSC) and the average symmetric surface distance (ASSD). The DSC and ASSD values are 0.852 and 0.544 mm, respectively. With the effective combination of the Dilated Convolution Module, Transformer Module, and Residual Module, we significantly improved the segmentation performance of the target region of the NPC.
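The way dilated convolution enlarges the receptive field can be checked with simple arithmetic. The sketch below is illustrative only and does not reproduce the paper's module: it assumes stride-1 layers, and the kernel sizes and dilation rates (1, 2, 4) are hypothetical values picked for the example. A k x k kernel with dilation d covers k + (k - 1)(d - 1) positions per axis.

```python
def effective_kernel(k, dilation):
    """Span covered along one axis by a kernel of size k with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf


# Three 3 x 3 layers without dilation versus with hypothetical dilations 1, 2, 4.
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])

print(plain, dilated)
```

Both stacks use the same number of weights, but under these assumptions the dilated stack sees 15 positions per axis instead of 7.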
Affiliation(s)
- Yan Zeng
  - State Key Laboratory of Marine Resource Utilization in South China Sea, School of Information and Communication Engineering, Hainan University, Haikou, China
  - Personnel Department, Hainan Medical University, Haikou, China
- PengHui Zeng
  - State Key Laboratory of Marine Resource Utilization in South China Sea, School of Information and Communication Engineering, Hainan University, Haikou, China
- ShaoDong Shen
  - State Key Laboratory of Marine Resource Utilization in South China Sea, School of Information and Communication Engineering, Hainan University, Haikou, China
- Wei Liang
  - State Key Laboratory of Marine Resource Utilization in South China Sea, School of Information and Communication Engineering, Hainan University, Haikou, China
- Jun Li
  - State Key Laboratory of Marine Resource Utilization in South China Sea, School of Information and Communication Engineering, Hainan University, Haikou, China
- Zhe Zhao
  - State Key Laboratory of Marine Resource Utilization in South China Sea, School of Information and Communication Engineering, Hainan University, Haikou, China
- Kun Zhang
  - State Key Laboratory of Marine Resource Utilization in South China Sea, School of Information and Communication Engineering, Hainan University, Haikou, China
  - School of Information Science and Technology, Hainan Normal University, Haikou, China
- Chong Shen
  - State Key Laboratory of Marine Resource Utilization in South China Sea, School of Information and Communication Engineering, Hainan University, Haikou, China
3
Wang X, Tang L, Zheng Q, Yang X, Lu Z. IRDC-Net: An Inception Network with a Residual Module and Dilated Convolution for Sign Language Recognition Based on Surface Electromyography. Sensors (Basel) 2023; 23:5775. [PMID: 37447625 DOI: 10.3390/s23135775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 06/06/2023] [Accepted: 06/15/2023] [Indexed: 07/15/2023]
Abstract
Deaf and hearing-impaired people always face communication barriers. Non-invasive surface electromyography (sEMG) sensor-based sign language recognition (SLR) technology can help them to better integrate into social life. Since the traditional tandem convolutional neural network (CNN) structure used in most CNN-based studies inadequately captures the features of the input data, we propose a novel inception architecture with a residual module and dilated convolution (IRDC-net) to enlarge the receptive fields and enrich the feature maps, applying it to SLR tasks for the first time. This work first transformed the time-domain signal into the time-frequency domain using the discrete Fourier transform. Second, an IRDC-net was constructed to recognize ten Chinese sign language signs. Third, the tandem CNN networks VGG-net and ResNet-18 were compared with our proposed parallel structure network, IRDC-net. Finally, the public dataset Ninapro DB1 was used to verify the generalization performance of the IRDC-net. The results showed that after transforming the time-domain sEMG signal into the time-frequency domain, the classification accuracy (acc) increased from 84.29% to 91.70% when using the IRDC-net on our sign language dataset. Furthermore, for the time-frequency information of the public dataset Ninapro DB1, the classification accuracy reached 89.82%; this value is higher than that achieved in other recent studies. As such, our findings contribute to research into SLR tasks and to improving deaf and hearing-impaired people's daily lives.
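One common way to obtain a time-frequency representation from a time-domain signal is to apply the discrete Fourier transform to successive frames. The sketch below is an assumption about how such a step could look, not the preprocessing used in the paper: the frame length, hop size, rectangular window, and the synthetic sine input are all hypothetical, and a naive O(n^2) DFT is used to keep the example dependency-free.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame (naive DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len, hop):
    """Frame-by-frame DFT magnitudes: rows are time frames, columns are frequency bins."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [dft_magnitudes(signal[s:s + frame_len]) for s in starts]


# Synthetic stand-in for one sEMG channel: a sine with 4 cycles per 32-sample frame.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]
spec = spectrogram(signal, frame_len=32, hop=16)

# Index of the strongest frequency bin in each frame.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(len(spec), len(spec[0]), peak_bins)
```

The resulting 2D array (frames by frequency bins) is the kind of input a 2D convolutional network can consume; a real pipeline would more likely use an FFT routine and a tapered window.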
Affiliation(s)
- Xiangrui Wang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Lu Tang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Qibin Zheng
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Xilin Yang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Zhiyuan Lu
  - School of Rehabilitation Science and Engineering, University of Health and Rehabilitation Sciences, Qingdao 266072, China
4
Huang A, Jiang L, Zhang J, Wang Q. Attention-VGG16-UNet: a novel deep learning approach for automatic segmentation of the median nerve in ultrasound images. Quant Imaging Med Surg 2022; 12:3138-3150. [PMID: 35655843 PMCID: PMC9131343 DOI: 10.21037/qims-21-1074] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/03/2021] [Accepted: 03/07/2022] [Indexed: 10/15/2023]
Abstract
BACKGROUND: Ultrasonography, an imaging technique that can show the anatomical section of nerves and surrounding tissues, is one of the most effective imaging methods for diagnosing nerve diseases. However, segmenting the median nerve in two-dimensional (2D) ultrasound images is challenging because the nerve is small and inconspicuous, image contrast is low, and the images are noisy. This study aimed to apply deep learning approaches to improve the accuracy of automatic segmentation of the median nerve in ultrasound images. METHODS: We proposed an improved network called VGG16-UNet, which incorporates a contracting path and an expanding path. The contracting path is the VGG16 model with the 3 fully connected layers removed. The architecture of the expanding path resembles the upsampling path of U-Net. Moreover, attention mechanisms and/or residual modules were added to U-Net and VGG16-UNet, yielding Attention-UNet (A-UNet), Summation-UNet (S-UNet), Attention-Summation-UNet (AS-UNet), Attention-VGG16-UNet (A-VGG16-UNet), Summation-VGG16-UNet (S-VGG16-UNet), and Attention-Summation-VGG16-UNet (AS-VGG16-UNet). Each model was trained on a dataset of 910 median nerve images from 19 participants and tested on 207 frames from a new image sequence. Model performance was evaluated with the Dice similarity coefficient (Dice), Jaccard similarity coefficient (Jaccard), Precision, and Recall. Based on the best segmentation results, we reconstructed a 3D median nerve image using the volume rendering method in the Visualization Toolkit (VTK) to assist in clinical nerve diagnosis. RESULTS: Paired t-tests showed significant differences (P<0.01) in the metric values of the different models. AS-UNet ranked first among the U-Net models. The VGG16-UNet and its variants performed better than the corresponding U-Net models. Furthermore, models with the attention mechanism outperformed those with the residual module, whether based on U-Net or VGG16-UNet. The A-VGG16-UNet achieved the best performance (Dice = 0.904±0.035, Jaccard = 0.826±0.057, Precision = 0.905±0.061, and Recall = 0.909±0.061). Finally, we applied the trained A-VGG16-UNet to segment the median nerve in the image sequence, then reconstructed and visualized the 3D image of the median nerve. CONCLUSIONS: This study demonstrates that the attention mechanism and residual module improve deep learning models for segmenting ultrasound images. The proposed VGG16-UNet-based models performed better than U-Net-based models. With segmentation, a 3D median nerve image can be reconstructed and can provide a visual reference for nerve diagnosis.
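The four overlap metrics named in the abstract can be computed from the counts of true positives, false positives, and false negatives between a predicted and a reference binary mask. The following is a minimal sketch using the standard definitions; the function name and the tiny flattened masks are hypothetical, and the sketch assumes both masks contain at least one foreground pixel so that no denominator is zero.

```python
def overlap_metrics(pred, truth):
    """Dice, Jaccard, precision, and recall for two flattened binary masks."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p and t)        # predicted and present
    fp = sum(1 for p, t in pairs if p and not t)    # predicted but absent
    fn = sum(1 for p, t in pairs if not p and t)    # missed foreground
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Hypothetical 8-pixel masks, for illustration only.
pred = [0, 1, 1, 1, 0, 0, 1, 0]
truth = [0, 1, 1, 0, 0, 1, 1, 0]
metrics = overlap_metrics(pred, truth)
print(metrics)
```

For a single pair of masks, Dice and Jaccard are tied by J = D / (2 - D), so they rank segmentations identically; averages over many images, as reported in studies, need not satisfy this identity exactly.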
Affiliation(s)
- Aiyue Huang
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Medical Image Processing, Guangzhou, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, China
- Li Jiang
  - Department of Rehabilitation, The Sixth Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Jiangshan Zhang
  - Department of Rehabilitation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Qing Wang
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Medical Image Processing, Guangzhou, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, China
5
Cao T, Wang G, Ren L, Li Y, Wang H. Brain tumor magnetic resonance image segmentation by a multiscale contextual attention module combined with a deep residual UNet (MCA-ResUNet). Phys Med Biol 2022; 67. [PMID: 35294935 DOI: 10.1088/1361-6560/ac5e5c] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/01/2021] [Accepted: 03/16/2022] [Indexed: 11/12/2022]
Abstract
Background and Objective. Automatic segmentation of the brain tumor area in MRI is a key step in the diagnosis and treatment of brain tumors. In recent years, improved networks based on the UNet encoder-decoder structure have been widely used in brain tumor segmentation. However, because of repeated convolution and pooling operations, some spatial context information in existing networks becomes discontinuous or is lost entirely, which reduces segmentation accuracy. The method proposed in this paper therefore aims to alleviate the lack of spatial context information and improve the accuracy of the model. Approach. This paper proposes a context attention module (multiscale contextual attention) to capture and filter high-level features with spatial context information, which addresses the loss of context information during feature extraction. A channel attention mechanism is introduced into the decoding structure to fuse high-level and low-level features. The standard convolution block in the encoding and decoding structure is replaced by a pre-activated residual block to optimize network training and improve network performance. Results. This paper uses two public data sets (BraTS 2017 and BraTS 2019) to evaluate and verify the proposed method. Experimental results show that the proposed method can effectively alleviate the lack of spatial context information, and that its segmentation performance is better than that of other existing methods. Significance. The method improves the segmentation performance of the model. It will assist doctors in making accurate diagnoses and provide a reference for tumor resection. As a result, the proposed method will reduce patients' surgical risk and the postoperative recurrence rate.
Affiliation(s)
- Tianyi Cao
  - College of Electronic and Information Engineering, Hebei University, Baoding, Hebei 071002, People's Republic of China
- Guanglei Wang
  - College of Electronic and Information Engineering, Hebei University, Baoding, Hebei 071002, People's Republic of China
- Lili Ren
  - The Affiliated Hospital of Hebei University, Baoding, Hebei 071002, People's Republic of China
- Yan Li
  - College of Electronic and Information Engineering, Hebei University, Baoding, Hebei 071002, People's Republic of China
- Hongrui Wang
  - Hebei University, Baoding, Hebei 071002, People's Republic of China
6
Li H, Luo H, Liu Y. Paraspinal Muscle Segmentation Based on Deep Neural Network. Sensors (Basel) 2019; 19:E2650. [PMID: 31212736 PMCID: PMC6630766 DOI: 10.3390/s19122650] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/07/2019] [Revised: 06/03/2019] [Accepted: 06/07/2019] [Indexed: 12/23/2022]
Abstract
The accurate segmentation of the paraspinal muscle in Magnetic Resonance (MR) images is a critical step in the automated analysis of lumbar diseases such as chronic low back pain, disc herniation, and lumbar spinal stenosis. However, the automatic segmentation of the multifidus and erector spinae has not yet been achieved because of three challenges: (1) the muscle boundary is unclear; (2) the gray histogram distribution of the target overlaps with the background; (3) the intra- and inter-patient shape is variable. We propose to tackle the automatic segmentation of paravertebral muscles using a deformed U-net consisting of two main modules: the residual module and the feature pyramid attention (FPA) module. The residual module can propagate the gradient directly while preserving the details of the image, making the model easier to train. The FPA module fuses different scales of context information and provides useful salient features for high-level feature maps. In this paper, 120 cases were used for experiments, which were provided and labeled by the spine surgery department of Shengjing Hospital of China Medical University. The experimental results show that the model can achieve higher predictive capability. The Dice coefficient of the multifidus is as high as 0.949, and the Hausdorff distance is 4.62 mm. The Dice coefficient of the erector spinae is 0.913, and the Hausdorff distance is 7.89 mm. This work will contribute to the development of an automatic measurement system for paraspinal muscles, which is of great significance for the treatment of spinal diseases.
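The Hausdorff distance reported alongside the Dice coefficient measures the worst-case boundary disagreement between two segmentations. The sketch below uses the standard symmetric definition on two small point sets; the function names and coordinates are hypothetical, the brute-force search is only suitable for small inputs, and converting the result to millimetres would require the pixel spacing, which is not modeled here.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical boundary points (pixel coordinates), for illustration only.
contour_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
contour_b = [(0, 0), (0, 1), (1, 0), (4, 1)]

print(directed_hausdorff(contour_a, contour_b))
print(directed_hausdorff(contour_b, contour_a))
print(hausdorff(contour_a, contour_b))
```

The two directed distances differ because a single outlying point in one contour dominates the maximum, which is why the Hausdorff distance complements overlap measures such as Dice that are less sensitive to isolated boundary errors.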
Affiliation(s)
- Haixing Li
  - Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
  - Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Science, Shenyang 110016, China
  - The Key Lab of Image Understanding and Computer Vision, Liaoning province, Shenyang 110016, China
- Haibo Luo
  - Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
  - Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China
  - Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Science, Shenyang 110016, China
  - The Key Lab of Image Understanding and Computer Vision, Liaoning province, Shenyang 110016, China
- Yunpeng Liu
  - Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
  - Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China
  - Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Science, Shenyang 110016, China
  - The Key Lab of Image Understanding and Computer Vision, Liaoning province, Shenyang 110016, China