101
Du Y, Liu X, Yi Y, Wei K. Optimizing Road Safety: Advancements in Lightweight YOLOv8 Models and GhostC2f Design for Real-Time Distracted Driving Detection. Sensors (Basel) 2023; 23:8844. [PMID: 37960543 PMCID: PMC10649436 DOI: 10.3390/s23218844]
Abstract
The rapid detection of distracted driving behaviors is crucial for enhancing road safety and preventing traffic accidents. Compared with the traditional methods of distracted-driving-behavior detection, the YOLOv8 model has been proven to possess powerful capabilities, enabling it to perceive global information more swiftly. Currently, the successful application of GhostConv in edge computing and embedded systems further validates the advantages of lightweight design in real-time detection using large models. Effectively integrating lightweight strategies into YOLOv8 models and reducing their impact on model performance has become a focal point in the field of real-time distracted driving detection based on deep learning. Inspired by GhostConv, this paper presents an innovative GhostC2f design, aiming to integrate the idea of linear transformation to generate more feature maps without additional computation into YOLOv8 for real-time distracted-driving-detection tasks. The goal is to reduce model parameters and computational load. Additionally, enhancements have been made to the path aggregation network (PAN) to amplify multi-level feature fusion and contextual information propagation. Furthermore, simple attention mechanisms (SimAMs) are introduced to perform self-normalization on each feature map, emphasizing feature maps with valuable information and suppressing redundant information interference in complex backgrounds. Lastly, the nine distinct distracted driving types in the publicly available SFDDD dataset were expanded to 14 categories, and nighttime scenarios were introduced. The results indicate a 5.1% improvement in model accuracy, with model weight size and computational load reduced by 36.7% and 34.6%, respectively. 
During 30 real-vehicle tests, detection accuracy reached 91.9% in daylight and 90.3% at night, confirming the model's strong performance in detecting distracted driving on the road and its contribution to reducing accident risk.
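Of the components above, the SimAM step is simple enough to sketch concretely: it weights every activation by a closed-form energy term and adds no learnable parameters. A minimal NumPy sketch following the published SimAM formulation (this is not the authors' code, and the regularization constant `lam` is an assumed hyperparameter):

```python
import numpy as np

def simam(x, lam=1e-4):
    """SimAM attention sketch for a feature tensor x of shape (N, C, H, W)."""
    n = x.shape[2] * x.shape[3] - 1
    # squared deviation of each activation from its per-channel spatial mean
    d = (x - x.mean(axis=(2, 3), keepdims=True)) ** 2
    # per-channel variance, normalized by n as in the SimAM paper
    v = d.sum(axis=(2, 3), keepdims=True) / n
    # inverse energy: more distinctive activations receive larger weights
    e_inv = d / (4 * (v + lam)) + 0.5
    return x * (1 / (1 + np.exp(-e_inv)))  # sigmoid gating
```

Because `e_inv` is at least 0.5, every activation is scaled by a factor between sigmoid(0.5) and 1, so the map is emphasized rather than zeroed out.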
Affiliation(s)
- Yingjie Du
- School of Automotive and Transportation, Tianjin University of Technology and Education, Tianjin 300222, China; (X.L.); (Y.Y.); (K.W.)
102
Liu F, Zhu X, Feng P, Zeng L. Anomaly Detection via Progressive Reconstruction and Hierarchical Feature Fusion. Sensors (Basel) 2023; 23:8750. [PMID: 37960450 PMCID: PMC10647205 DOI: 10.3390/s23218750]
Abstract
The main challenges in reconstruction-based anomaly detection are the collapse of the generalization gap as networks' fitting capability improves, and overfitting to simulated defects. To overcome these challenges, we propose a new method called PRFF-AD, which utilizes progressive reconstruction and hierarchical feature fusion. It consists of a reconstructive sub-network and a discriminative sub-network. The former achieves anomaly-free reconstruction while maintaining nominal patterns, and the latter locates defects based on pre- and post-reconstruction information. Given defective samples, we find that a progressive reconstruction approach yields higher-quality reconstructions without compromising the generalization-gap assumption. Meanwhile, to alleviate the network's overfitting to synthetic defects and address reconstruction errors, we fuse hierarchical features as guidance for discriminating defects. Moreover, with the help of an attention mechanism, the network achieves higher classification and localization accuracy. In addition, we construct a large dataset of packaged chips, named GTanoIC, with 1750 real non-defective samples and 470 real defective samples, and provide their pixel-level annotations. Evaluation results demonstrate that our method outperforms other reconstruction-based methods on two challenging datasets: MVTec AD and GTanoIC.
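The core signal in any reconstruction-based detector of this kind is the disagreement between the input and its anomaly-free reconstruction; PRFF-AD refines that signal with a learned discriminative sub-network and fused hierarchical features. A toy NumPy sketch of the bare pre-/post-reconstruction comparison (illustrative only, not the PRFF-AD architecture):

```python
import numpy as np

def anomaly_map(image, reconstruction):
    """Per-pixel squared reconstruction error, normalized to [0, 1]."""
    err = (image.astype(float) - reconstruction.astype(float)) ** 2
    return err / (err.max() + 1e-9)

def detect(image, reconstruction, threshold=0.5):
    """Binary defect mask: pixels the reconstruction failed to reproduce."""
    return anomaly_map(image, reconstruction) > threshold
```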
Affiliation(s)
- Long Zeng
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; (F.L.); (X.Z.); (P.F.)
103
Li G, Fu M, Sun M, Liu X, Zheng B. A Facial Feature and Lip Movement Enhanced Audio-Visual Speech Separation Model. Sensors (Basel) 2023; 23:8770. [PMID: 37960477 PMCID: PMC10647675 DOI: 10.3390/s23218770]
Abstract
The cocktail party problem can be addressed more effectively by leveraging the speaker's visual and audio information. This paper proposes a method to improve audio separation using two visual cues: facial features and lip movement. Firstly, residual connections are introduced in the audio separation module to extract detailed features. Secondly, because the video stream contains information beyond the face that correlates only minimally with the audio, an attention mechanism is employed in the face module to focus on crucial information. Then, the loss function incorporates audio-visual similarity to fully exploit the relationship between the audio and visual streams. Experimental results on the public VoxCeleb2 dataset show that the proposed model significantly improves SDR, PESQ, and STOI, including a 4 dB improvement in SDR.
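The SDR figure quoted above measures separation quality in decibels. As a reference point, a plain (non-scale-invariant) SDR can be computed as follows; this is a generic sketch of the metric, not the evaluation code used in the paper:

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    distortion = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(distortion ** 2) + eps))
```

A 4 dB gain in this quantity corresponds to cutting the residual distortion power by a factor of about 2.5.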
Affiliation(s)
- Guizhu Li
- College of Electronic Engineering, Ocean University of China, Qingdao 266100, China
- Min Fu
- College of Electronic Engineering, Ocean University of China, Qingdao 266100, China
- Sanya Oceanography Institution, Ocean University of China, Sanya 572024, China
- Mengnan Sun
- College of Electronic Engineering, Ocean University of China, Qingdao 266100, China
- Xuefeng Liu
- College of Automation and Electronic Engineering, Qingdao University of Science and Technology, Qingdao 266061, China
- Bing Zheng
- College of Electronic Engineering, Ocean University of China, Qingdao 266100, China
- Sanya Oceanography Institution, Ocean University of China, Sanya 572024, China
104
Chung WH, Gu YH, Yoo SJ. CHP Engine Anomaly Detection Based on Parallel CNN-LSTM with Residual Blocks and Attention. Sensors (Basel) 2023; 23:8746. [PMID: 37960445 PMCID: PMC10650369 DOI: 10.3390/s23218746]
Abstract
The extreme operating environment of the combined heat and power (CHP) engine is likely to cause anomalies and defects, which can lead to engine failure; thus, detecting engine anomalies is essential. In this study, we propose a parallel convolutional neural network-long short-term memory (CNN-LSTM) residual blocks attention (PCLRA) anomaly detection model that uses engine sensor data. To our knowledge, this is the first time parallel CNN-LSTM-based networks have been used for CHP engine anomaly detection. In PCLRA, spatiotemporal features are extracted via CNN and LSTM branches in parallel, and the information loss is compensated using residual blocks and an attention mechanism. The performance of PCLRA is compared with various hybrid models across 15 cases. First, the performances of serial and parallel models are compared; we then evaluate the contributions of the residual blocks and attention mechanism to the CNN-LSTM hybrid model. The results indicate that PCLRA achieves the best performance, with a macro F1 score (mean ± standard deviation) of 0.951 ± 0.033, an anomaly F1 score of 0.903 ± 0.064, and an accuracy of 0.999 ± 0.002. We expect that the energy efficiency and safety of CHP engines can be improved by applying the PCLRA anomaly detection model.
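The macro F1 score reported above averages per-class F1, so rare anomaly classes weigh as much as the dominant normal class. A small NumPy sketch of the metric (a generic definition, not the authors' evaluation script):

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))
```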
Affiliation(s)
- Won Hee Chung
- Artificial Intelligence Department, Sejong University, Seoul 05006, Republic of Korea;
- Yeong Hyeon Gu
- Artificial Intelligence Department, Sejong University, Seoul 05006, Republic of Korea
- Seong Joon Yoo
- Computer Science and Engineering Department, Sejong University, Seoul 05006, Republic of Korea
105
Tian Y, Zhang Z, Zhao B, Liu L, Liu X, Feng Y, Tian J, Kou D. Coarse-to-fine prior-guided attention network for multi-structure segmentation on dental panoramic radiographs. Phys Med Biol 2023; 68:215010. [PMID: 37816372 DOI: 10.1088/1361-6560/ad0218]
Abstract
Objective. Accurate segmentation of various anatomical structures from dental panoramic radiographs is essential for diagnosis and treatment planning in digital dentistry. In this paper, we propose a novel deep learning-based method for accurate and fully automatic segmentation of the maxillary sinus, mandibular condyle, mandibular nerve, alveolar bone and teeth on panoramic radiographs. Approach. A two-stage coarse-to-fine prior-guided segmentation framework is proposed to segment multiple structures on dental panoramic radiographs. In the coarse stage, a multi-label segmentation network generates the coarse segmentation mask, and in the fine-tuning stage, a prior-guided attention network with an encoder-decoder architecture precisely predicts the mask of each anatomical structure. First, a prior-guided edge fusion module is incorporated at the input of each convolution level of the encoder path to generate edge-enhanced image feature maps. Second, a prior-guided spatial attention module guides the network to extract relevant spatial features from foreground regions by combining the prior information with a spatial attention mechanism. Finally, a prior-guided hybrid attention module is integrated at the bottleneck of the network to explore global context from both spatial and category perspectives. Main results. We evaluated the segmentation performance of our method on a testing dataset of 150 panoramic radiographs collected from real-world clinical scenarios. Our method achieves more accurate segmentation than state-of-the-art methods, and visualization analysis shows that the network attends to the regions of interest identified by clinical doctors. The average Jaccard scores are 87.91%, 85.25%, 63.94%, 93.46% and 88.96% for the maxillary sinus, mandibular condyle, mandibular nerve, alveolar bone and teeth, respectively. Significance. The proposed method accurately segments multiple structures on panoramic radiographs and has the potential to become part of an automatic pathology diagnosis pipeline for dental panoramic radiographs.
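The Jaccard scores reported in the Main results compare each predicted mask against its reference mask. A minimal sketch of the metric for binary masks (generic, not the authors' code):

```python
import numpy as np

def jaccard(pred, target):
    """Jaccard index (intersection over union) between two boolean masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:          # both masks empty: treat as perfect agreement
        return 1.0
    inter = np.logical_and(pred, target).sum()
    return inter / union
```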
Affiliation(s)
- Yuan Tian
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Zhejia Zhang
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Bailiang Zhao
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Lichao Liu
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Xiaolin Liu
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Yang Feng
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Jie Tian
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Dazhi Kou
- Shanghai Supercomputer Center. No. 585 Guoshoujing Road, Pudong New District, Shanghai, People's Republic of China
106
Cao R, Ning L, Zhou C, Wei P, Ding Y, Tan D, Zheng C. CFANet: Context Feature Fusion and Attention Mechanism Based Network for Small Target Segmentation in Medical Images. Sensors (Basel) 2023; 23:8739. [PMID: 37960438 PMCID: PMC10650041 DOI: 10.3390/s23218739]
Abstract
Medical image segmentation plays a crucial role in clinical diagnosis, treatment planning, and disease monitoring. Automatic segmentation methods based on deep learning have developed rapidly, with results comparable to clinical experts for large objects, but segmentation accuracy for small objects remains unsatisfactory, largely because current methods struggle to extract multi-scale features from medical images. In this paper, we propose CFANet, a context feature fusion and attention mechanism based network for small target segmentation in medical images. CFANet builds on the U-Net encoder-decoder structure and incorporates two key modules, context feature fusion (CFF) and effective channel spatial attention (ECSA), to improve segmentation performance. The CFF module utilizes contextual information from different scales to enhance the representation of small targets; by fusing multi-scale features, the network captures the local and global contextual cues that are critical for accurate segmentation. The ECSA module further enhances the network's ability to capture long-range dependencies by incorporating attention at the spatial and channel levels, allowing the network to focus on information-rich regions while suppressing irrelevant or noisy features. Extensive experiments on four challenging medical image datasets, ADAM, LUNA16, Thoracic OAR, and WORD, show that CFANet outperforms state-of-the-art methods in segmentation accuracy and robustness. The proposed method achieves excellent performance in segmenting small targets in medical images, demonstrating its potential in various clinical applications.
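The idea behind combined channel and spatial attention of the kind ECSA uses can be illustrated with a weight-free toy: gate channels by their pooled response, then gate spatial positions by the cross-channel mean. This is only an illustrative sketch; the actual ECSA module is learned and structured differently:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def channel_spatial_gate(x):
    """Toy channel-then-spatial gating for a feature map x of shape (C, H, W)."""
    # channel gate from global average pooling
    cw = sigmoid(x.mean(axis=(1, 2)))          # (C,)
    x = x * cw[:, None, None]
    # spatial gate from the mean response across channels
    sw = sigmoid(x.mean(axis=0))               # (H, W)
    return x * sw[None, :, :]
```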
Affiliation(s)
- Ruifen Cao
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei 230601, China; (R.C.); (L.N.)
- Long Ning
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei 230601, China; (R.C.); (L.N.)
- Chao Zhou
- Institute of Energy, Hefei Comprehensive National Science Center, Hefei 230031, China;
- Pijing Wei
- Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China;
- Yun Ding
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China;
- Dayu Tan
- Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China;
- Chunhou Zheng
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China;
107
Zhang X, He L, Chen J, Wang B, Wang Y, Zhou Y. Multi attention Mechanism 3D Object Detection Algorithm Based on RGB and LiDAR Fusion for Intelligent Driving. Sensors (Basel) 2023; 23:8732. [PMID: 37960432 PMCID: PMC10649988 DOI: 10.3390/s23218732]
Abstract
This paper proposes a multimodal fusion 3D object detection algorithm based on attention mechanisms to improve 3D detection performance. The algorithm utilizes point cloud data and camera information. For image feature extraction, a ResNet50 + FPN architecture extracts features at four levels. Point cloud feature extraction employs the voxel method and an FCN to extract point and voxel features. Image and point cloud features are fused through regional point fusion and voxel fusion. After information fusion, the Coordinate and SimAM attention mechanisms extract deep fusion features. The algorithm's performance is evaluated on the DAIR-V2X dataset. Compared with the Part-A2 algorithm, the proposed algorithm improves the mAP value by 7.9% in the BEV view and 7.8% in the 3D view at IoU = 0.5 (cars) and IoU = 0.25 (pedestrians and cyclists); compared with the SECOND algorithm, it improves the mAP value by 5.4% in the BEV view and 4.3% in the 3D view at IoU = 0.7 (cars) and IoU = 0.5 (pedestrians and cyclists).
Affiliation(s)
- Lei He
- State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China; (X.Z.); (J.C.); (B.W.); (Y.W.); (Y.Z.)
108
Zhang H, Hu Y, Yan M. Thermal Image Super-Resolution Based on Lightweight Dynamic Attention Network for Infrared Sensors. Sensors (Basel) 2023; 23:8717. [PMID: 37960417 PMCID: PMC10648050 DOI: 10.3390/s23218717]
Abstract
Infrared sensors capture the infrared radiation emitted by objects to form thermal images. They penetrate smoke and fog reliably and are widely used in security monitoring, the military, and other fields. However, civilian infrared detectors have much lower resolution than megapixel RGB camera sensors. In this paper, we propose a dynamic attention mechanism-based thermal image super-resolution network for infrared sensors. Specifically, dynamic attention modules adaptively reweight the outputs of the attention and non-attention branches according to features at different depths of the network. The attention branch, which consists of channel- and pixel-wise attention blocks, extracts the most informative features, while the non-attention branch serves as a supplement that captures the remaining, otherwise ignored features. The dynamic weights block operates with a 1D convolution instead of a full multi-layer perceptron on the globally average-pooled features, reducing parameters and enhancing information interaction between channels; the same structure is adopted in the channel attention block. Qualitative and quantitative results on three testing datasets demonstrate that the proposed network restores high-frequency details better than competing methods while increasing the resolution of thermal images. Its lightweight structure and low computing cost also allow practical deployment on edge devices, effectively improving the imaging quality of infrared sensors.
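The dynamic weights block described above replaces the usual MLP of squeeze-and-excitation with a 1D convolution across the pooled channel descriptors, in the spirit of ECA-style attention. A NumPy sketch with a fixed kernel standing in for the learned one (the kernel values are assumptions for illustration, not the paper's parameters):

```python
import numpy as np

def conv1d_channel_attention(x, kernel=None):
    """x: (C, H, W). Gate channels via a 1-D conv over the pooled channel vector."""
    pooled = x.mean(axis=(1, 2))                      # (C,) global average pooling
    if kernel is None:
        kernel = np.array([0.25, 0.5, 0.25])          # stand-in for a learned k=3 kernel
    mixed = np.convolve(pooled, kernel, mode="same")  # local cross-channel interaction
    weights = 1 / (1 + np.exp(-mixed))                # sigmoid
    return x * weights[:, None, None]
```

Compared with a full MLP, the 1D convolution needs only `k` parameters regardless of the channel count, which is why it suits a lightweight network.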
Affiliation(s)
- Yueli Hu
- School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China; (H.Z.); (M.Y.)
109
Meng W, Yuan Y. SGN-YOLO: Detecting Wood Defects with Improved YOLOv5 Based on Semi-Global Network. Sensors (Basel) 2023; 23:8705. [PMID: 37960405 PMCID: PMC10649724 DOI: 10.3390/s23218705]
Abstract
Object detection for wood defects involves using bounding boxes to label defects in surface images of wood, a crucial step before wood products are further processed. Due to the small size and diverse shapes of wood defects, most previous object detection models cannot filter out critical features effectively and consequently struggle to generate adequate contextual information to detect defects accurately. In this paper, we propose a YOLOv5 model based on a Semi-Global Network (SGN) to detect wood defects. Firstly, a lightweight SGN is introduced into the backbone to model global context, which improves accuracy while reducing network complexity; secondly, the backbone is embedded with the Extended Efficient Layer Aggregation Network (E-ELAN), which continuously enhances the learning ability of the network; and finally, the Efficient Intersection over Union (EIoU) loss is used to address slow convergence and inaccurate regression results. Experimental results on public wood defect datasets demonstrate that our approach outperforms existing detection models: the mAP value was 86.4%, a 3.1% improvement over the baseline network, a 7.1% improvement over SSD, and a 13.6% improvement over Faster R-CNN. These results show the effectiveness of the proposed methodology.
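The EIoU loss mentioned above augments the IoU term with penalties on center distance and on width/height differences, each normalized by the smallest enclosing box. A NumPy sketch of the standard EIoU formulation (a generic reconstruction, not the authors' code):

```python
import numpy as np

def eiou_loss(box_a, box_b, eps=1e-9):
    """EIoU loss sketch for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection and IoU
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + eps)
    # smallest enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    # squared center distance
    rho2 = ((ax1 + ax2 - bx1 - bx2) / 2) ** 2 + ((ay1 + ay2 - by1 - by2) / 2) ** 2
    # squared width and height differences
    dw2 = ((ax2 - ax1) - (bx2 - bx1)) ** 2
    dh2 = ((ay2 - ay1) - (by2 - by1)) ** 2
    return 1 - iou + rho2 / c2 + dw2 / (cw ** 2 + eps) + dh2 / (ch ** 2 + eps)
```

The extra penalties keep the gradient informative even when boxes do not overlap, which is the claimed cure for slow convergence.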
Affiliation(s)
- Wei Meng
- College of Information, Beijing Forestry University, Beijing 100083, China
- Engineering Research Center for Forestry-Oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China;
- Yilin Yuan
- College of Information, Beijing Forestry University, Beijing 100083, China
- Engineering Research Center for Forestry-Oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China;
110
Dong Y, Li X, Yang Y, Wang M, Gao B. A Synthesizing Semantic Characteristics Lung Nodules Classification Method Based on 3D Convolutional Neural Network. Bioengineering (Basel) 2023; 10:1245. [PMID: 38002369 PMCID: PMC10669569 DOI: 10.3390/bioengineering10111245]
Abstract
Early detection is crucial for the survival and recovery of lung cancer patients. Computer-aided diagnosis (CAD) systems can assist in the early diagnosis of lung cancer by providing decision support. While deep learning methods are increasingly applied to CAD tasks, these models lack interpretability. In this paper, we propose a convolutional neural network model that combines semantic characteristics (SCCNN) to predict whether a given pulmonary nodule is malignant. The model synthesizes the advantages of multi-view, multi-task and attention modules to fully simulate the actual diagnostic process of radiologists. Three-dimensional multi-view samples of lung nodules are extracted by a spatial sampling method. Meanwhile, semantic characteristics commonly used in radiology reports serve as an auxiliary task and help explain the model's predictions. An attention module introduced at the feature fusion stage improves the classification of lung nodules as benign or malignant. Experimental results on the LIDC-IDRI (Lung Image Database Consortium and Image Database Resource Initiative) dataset show that the method achieves 95.45% accuracy and an area under the ROC (receiver operating characteristic) curve of 97.26%. The proposed method thus not only classifies nodules as benign or malignant on par with standard 3D CNN approaches but can also intuitively explain how the model makes predictions, which can assist clinical diagnosis.
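The ROC curve area quoted above equals the probability that a randomly chosen malignant nodule scores higher than a randomly chosen benign one (the Mann-Whitney U formulation). A generic sketch of that computation (not the authors' evaluation code):

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as P(score_pos > score_neg); ties count as 0.5."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```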
Affiliation(s)
- Xiaoqin Li
- Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, China; (Y.D.); (Y.Y.); (M.W.); (B.G.)
111
Pan K, Hu H, Gu P. WD-YOLO: A More Accurate YOLO for Defect Detection in Weld X-ray Images. Sensors (Basel) 2023; 23:8677. [PMID: 37960377 PMCID: PMC10649023 DOI: 10.3390/s23218677]
Abstract
X-ray imaging is an important industrial non-destructive testing method. However, the contrast of some weld seam images is low, and the shapes and sizes of defects vary greatly, which makes detecting defects in weld seams very difficult. In this paper, we propose a gray value curve enhancement (GCE) module and a model specifically designed for weld defect detection, namely WD-YOLO. The GCE module improves image contrast to make detection easier. WD-YOLO adopts feature pyramid and path aggregation designs; in particular, we propose the NeXt backbone for extracting and fusing image features. In the YOLO head, we add a dual attention mechanism to enable the model to better distinguish between foreground and background areas. Experimental results show that our model achieves a satisfactory balance between detection speed and accuracy, reaching 92.6% mAP@0.5 at 98 frames per second.
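The GCE module remaps weld-seam gray values along a curve to raise contrast before detection. A fixed percentile stretch illustrates the kind of remapping involved (an illustrative stand-in, not the learned GCE curve; the percentile cut-offs are assumed):

```python
import numpy as np

def percentile_stretch(img, low_pct=2.0, high_pct=98.0):
    """Map the [low, high] percentile gray range onto the full 0-255 range."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    stretched = np.clip((img.astype(float) - lo) / (hi - lo + 1e-9), 0.0, 1.0)
    return (stretched * 255.0).astype(np.uint8)
```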
Affiliation(s)
- Haiyang Hu
- School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China; (K.P.); (P.G.)
112
Jin Z, Xing Z, Wang Y, Fang S, Gao X, Dong X. Research on Emotion Recognition Method of Cerebral Blood Oxygen Signal Based on CNN-Transformer Network. Sensors (Basel) 2023; 23:8643. [PMID: 37896736 PMCID: PMC10611153 DOI: 10.3390/s23208643]
Abstract
In recent years, research on emotion recognition has become increasingly popular, but there are few studies on emotion recognition based on cerebral blood oxygen signals. Because the electroencephalogram (EEG) is easily disturbed by eye movement and is not very portable, this study uses a more comfortable and convenient functional near-infrared spectroscopy (fNIRS) system to record brain signals from participants while they watch three different types of video clips. During the experiment, changes in cerebral blood oxygen concentration across 8 channels of the prefrontal cortex were collected and analyzed. We processed and segmented the collected data and used multiple classifiers to identify the three emotional states of joy, neutrality, and sadness. Since the classification accuracy of a convolutional neural network (CNN) was not significantly superior to that of the XGBoost algorithm in this study, this paper proposes a CNN-Transformer network tailored to the characteristics of time series data to improve ternary emotion classification accuracy. The network first uses convolution operations to extract channel features from the multi-channel time series; the features and the output of the fully connected layer are then fed into a Transformer structure, whose multi-head attention mechanism attends to information from different channel domains and better captures spatial relationships. Experimental results show that the CNN-Transformer network achieves 86.7% classification accuracy for ternary emotions, about 5% higher than the CNN alone, which provides some help for other emotion recognition research based on time series data such as fNIRS.
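The multi-head attention stage described above lets each head attend over the time axis of a different slice of the channel features. A weight-free NumPy sketch of scaled dot-product multi-head attention (the learned Q, K, V projections are omitted for brevity; in the real network they are linear layers):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, n_heads=2):
    """x: (T, d) time-series features; d must be divisible by n_heads."""
    T, d = x.shape
    dh = d // n_heads
    heads = []
    for h in range(n_heads):
        q = k = v = x[:, h * dh:(h + 1) * dh]   # learned projections omitted
        attn = softmax(q @ k.T / np.sqrt(dh))   # (T, T) attention weights per head
        heads.append(attn @ v)
    return np.concatenate(heads, axis=-1)       # back to (T, d)
```

Each output timestep is a convex combination of timesteps in its head's slice, so the output stays within the range of the input features.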
Affiliation(s)
- Xiangmei Dong
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; (Z.J.); (Z.X.); (Y.W.); (S.F.); (X.G.)
113
Duan H, Wang H, Chen Y, Liu F, Tao L. EAMNet: an Alzheimer's disease prediction model based on representation learning. Phys Med Biol 2023; 68:215005. [PMID: 37774713 DOI: 10.1088/1361-6560/acfec8]
Abstract
Objective. Brain 18F-FDG PET images indicate the metabolic status of brain lesions and offer predictive potential for Alzheimer's disease (AD). However, the complexity of extracting relevant lesion features and dealing with extraneous information in PET images poses challenges for accurate prediction. Approach. To address these issues, we propose an innovative solution called the efficient adaptive multiscale network (EAMNet) for predicting potential patient populations from positron emission tomography (PET) image slices, enabling effective intervention and treatment. Firstly, we introduce an efficient convolutional strategy to enhance the receptive field of PET images during the feature learning process, avoiding excessive extraction of fine tissue features by deep-level networks while reducing the model's computational complexity. Secondly, we construct a channel attention module that enables the prediction model to adaptively allocate weights between different channels, compensating for the impact of spatial noise in PET images on classification. Finally, we use skip connections to merge features from different-scale lesion information. Main results. Through visualization analysis, our network aligns with regions of interest identified by clinical doctors. Experimental evaluations conducted on the ADNI (Alzheimer's Disease Neuroimaging Initiative) dataset demonstrate the outstanding classification performance of the proposed method. The accuracy rates for AD versus NC (normal controls), AD versus MCI (mild cognitive impairment), MCI versus NC, and AD versus MCI versus NC classifications reach 97.66%, 96.32%, 95.23%, and 95.68%, respectively. Significance. The proposed method surpasses advanced algorithms in the field, providing a hopeful advancement in accurately predicting and classifying Alzheimer's disease using 18F-FDG PET images.
The source code has been uploaded to https://github.com/Haoliang-D-AHU/EAMNet/tree/master.
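The channel attention module described above follows the general squeeze-and-excitation pattern; a minimal plain-Python sketch (all function names and weight shapes hypothetical, not taken from the EAMNet code) illustrates how per-channel weights can be derived from global average pooling and a sigmoid gate:

```python
import math

def channel_attention(feature_maps, w1, w2):
    """SE-style channel attention sketch (hypothetical, not the EAMNet code).

    feature_maps: list of C channels, each a 2D list (H x W).
    w1, w2: C x C weight matrices for the two fully connected layers.
    Returns the reweighted feature maps.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]
    # Excitation: linear map + ReLU, then linear map + sigmoid gating.
    hidden = [max(0.0, sum(w1[i][j] * desc[j] for j in range(len(desc))))
              for i in range(len(desc))]
    gates = [1.0 / (1.0 + math.exp(-sum(w2[i][j] * hidden[j]
                                        for j in range(len(hidden)))))
             for i in range(len(hidden))]
    # Reweight each channel by its gate, suppressing noisy channels.
    return [[[v * gates[c] for v in row] for row in feature_maps[c]]
            for c in range(len(feature_maps))]
```

Channels whose pooled descriptor maps to a small gate are attenuated, which is how such a module can compensate for spatially noisy channels.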
Affiliation(s)
- Haoliang Duan
- Anhui Provincial International Joint Research Center for Advanced Technology in Medical Imaging, Anhui University, Hefei, People's Republic of China
- School of Computer Science and Technology, Anhui University, Hefei, People's Republic of China
- Huabin Wang
- Anhui Provincial International Joint Research Center for Advanced Technology in Medical Imaging, Anhui University, Hefei, People's Republic of China
- School of Computer Science and Technology, Anhui University, Hefei, People's Republic of China
- Yonglin Chen
- Anhui Provincial International Joint Research Center for Advanced Technology in Medical Imaging, Anhui University, Hefei, People's Republic of China
- School of Computer Science and Technology, Anhui University, Hefei, People's Republic of China
- Fei Liu
- Anhui Provincial International Joint Research Center for Advanced Technology in Medical Imaging, Anhui University, Hefei, People's Republic of China
- School of Computer Science and Technology, Anhui University, Hefei, People's Republic of China
- Liang Tao
- Anhui Provincial International Joint Research Center for Advanced Technology in Medical Imaging, Anhui University, Hefei, People's Republic of China
- School of Computer Science and Technology, Anhui University, Hefei, People's Republic of China
114
Liu S, Zhou F, Tang S, Hu X, Wang C, Wang T. Dynamic Semi-Supervised Federated Learning Fault Diagnosis Method Based on an Attention Mechanism. Entropy (Basel) 2023; 25:1470. [PMID: 37895591 PMCID: PMC10606357 DOI: 10.3390/e25101470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/20/2023] [Revised: 08/31/2023] [Accepted: 10/11/2023] [Indexed: 10/29/2023]
Abstract
In cases where a client suffers from completely unlabeled data, unsupervised learning has difficulty achieving an accurate fault diagnosis. Semi-supervised federated learning, which enables interaction between a labeled client and an unlabeled client, has been developed to overcome this difficulty. However, existing semi-supervised federated learning methods may lead to a negative transfer problem because they fail to filter out unreliable model information from the unlabeled client. Therefore, in this study, a dynamic semi-supervised federated learning fault diagnosis method with an attention mechanism (SSFL-ATT) is proposed to prevent the federation model from experiencing negative transfer. A federation strategy driven by an attention mechanism was designed to filter out the unreliable information hidden in the local model. SSFL-ATT can ensure the federation model's performance as well as render the unlabeled client capable of fault classification. In cases where there is an unlabeled client, compared to existing semi-supervised federated learning methods, SSFL-ATT achieves fault diagnosis accuracy gains of 9.06% and 12.53% on verification datasets provided by Case Western Reserve University and Shanghai Maritime University, respectively.
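The attention-driven federation strategy can be sketched in plain Python: score each client's update by its closeness to the global model, softmax the scores, and aggregate with the resulting weights. This is a generic sketch under assumed names and a simple distance-based score, not the paper's exact rule:

```python
import math

def attention_aggregate(global_params, client_params):
    """Attention-weighted federated averaging sketch (hypothetical scoring;
    SSFL-ATT's exact attention rule is not reproduced here).

    global_params: flattened global model parameters (list of floats).
    client_params: list of per-client parameter lists.
    Clients whose update is far from the global model receive a small
    weight, suppressing unreliable information from unlabeled clients.
    """
    # Score each client by negative Euclidean distance to the global model.
    scores = [-math.sqrt(sum((c - g) ** 2
                             for c, g in zip(params, global_params)))
              for params in client_params]
    # Softmax turns scores into attention weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Aggregate: attention-weighted average of client parameters.
    return [sum(w * params[i] for w, params in zip(weights, client_params))
            for i in range(len(global_params))]
```

With identical clients this reduces to plain federated averaging; an outlier client is almost entirely ignored.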
Affiliation(s)
- Funa Zhou
- School of Logistic Engineering, Shanghai Maritime University, Shanghai 201306, China; (S.L.); (X.H.); (C.W.); (T.W.)
- Shanjie Tang
- School of Logistic Engineering, Shanghai Maritime University, Shanghai 201306, China; (S.L.); (X.H.); (C.W.); (T.W.)
115
Alshahrani H, Sharma G, Anand V, Gupta S, Sulaiman A, Elmagzoub MA, Reshan MSA, Shaikh A, Azar AT. An Intelligent Attention-Based Transfer Learning Model for Accurate Differentiation of Bone Marrow Stains to Diagnose Hematological Disorder. Life (Basel) 2023; 13:2091. [PMID: 37895472 PMCID: PMC10607952 DOI: 10.3390/life13102091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 10/17/2023] [Accepted: 10/19/2023] [Indexed: 10/29/2023] Open
Abstract
Bone marrow (BM) is an essential part of the hematopoietic system, which generates all of the body's blood cells and maintains the body's overall health and immune system. The classification of bone marrow cells is pivotal in both clinical and research settings because many hematological diseases, such as leukemia, myelodysplastic syndromes, and anemias, are diagnosed based on specific abnormalities in the number, type, or morphology of bone marrow cells. A robust deep-learning algorithm is therefore needed to classify bone marrow cells and keep a close check on them. This study proposes a framework for categorizing bone marrow cells into seven classes. In the proposed framework, five transfer learning models (DenseNet121, EfficientNetB5, ResNet50, Xception, and MobileNetV2) are applied to the bone marrow dataset to classify the cells into seven classes. The best-performing DenseNet121 model was fine-tuned by adding one batch-normalization layer, one dropout layer, and two dense layers. The fine-tuned DenseNet121 model was optimized using several optimizers, such as AdaGrad, AdaDelta, Adamax, RMSprop, and SGD, along with different batch sizes of 16, 32, 64, and 128. The fine-tuned DenseNet121 model was then integrated with an attention mechanism to improve its performance by allowing the model to focus on the most relevant features or regions of the image, which can be particularly beneficial in medical imaging, where certain regions might carry critical diagnostic information. The proposed fine-tuned and integrated DenseNet121 achieved the highest accuracy, with a training success rate of 99.97% and a testing success rate of 97.01%. The key hyperparameters, such as batch size, number of epochs, and choice of optimizer, were all considered when optimizing these pre-trained models to select the best model. This study will help medical researchers classify BM cells effectively to support the diagnosis of diseases such as leukemia.
Affiliation(s)
- Hani Alshahrani
- Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 66462, Saudi Arabia; (H.A.); (A.S.)
- Gunjan Sharma
- Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura 140401, India; (G.S.); (V.A.); (S.G.)
- Vatsala Anand
- Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura 140401, India; (G.S.); (V.A.); (S.G.)
- Sheifali Gupta
- Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura 140401, India; (G.S.); (V.A.); (S.G.)
- Adel Sulaiman
- Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 66462, Saudi Arabia; (H.A.); (A.S.)
- M. A. Elmagzoub
- Department of Network and Communication Engineering, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
- Mana Saleh Al Reshan
- Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran 66462, Saudi Arabia; (M.S.A.R.); (A.S.)
- Asadullah Shaikh
- Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran 66462, Saudi Arabia; (M.S.A.R.); (A.S.)
- Ahmad Taher Azar
- College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
- Automated Systems and Soft Computing Lab (ASSCL), Prince Sultan University, Riyadh 11586, Saudi Arabia
116
Wang X, Lu R, Bi H, Li Y. An Infrared Small Target Detection Method Based on Attention Mechanism. Sensors (Basel) 2023; 23:8608. [PMID: 37896701 PMCID: PMC10610862 DOI: 10.3390/s23208608] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Revised: 08/22/2023] [Accepted: 08/25/2023] [Indexed: 10/29/2023]
Abstract
The human visual attention system plays an important role in infrared target recognition because it can quickly and accurately recognize infrared small targets and has good scene adaptability. This paper proposes an infrared small target detection method based on an attention mechanism, which consists of three modules: a bottom-up passive attention module, a top-down active attention module, and decision feedback equalization. In the top-down active attention module, given the Gaussian characteristics of infrared small targets, Gaussian shape features derived from prior knowledge and experience are used for feature extraction, and a quaternion cosine transform is performed to achieve multi-dimensional fusion of the Gaussian shape features, thereby achieving complementary fusion of multi-dimensional feature information. In the bottom-up passive attention module, considering that differences in contrast and motion between the target and the background easily attract attention, an optimized fast local contrast algorithm and improved circular pipeline filtering are adopted to find candidate target regions. Meanwhile, a multi-scale Laplacian of Gaussian filter is adopted to estimate the optimal size of the infrared small target. The fast local contrast algorithm, based on box-filter acceleration and structure optimization, is employed to extract local contrast features, and candidate target regions are obtained using an adaptive threshold. In addition, the mean gray level, target size, Gaussian consistency, and a circular region constraint are used in pipeline filtering to extract motion regions, effectively reducing the false-alarm rate. Finally, decision feedback equalization is adopted to obtain real targets.
Experiments were conducted on real infrared images involving complex backgrounds with sea, sky, and ground clutter, and the experimental results indicate that the proposed method achieves better detection performance than conventional baseline methods such as RLCM, ILCM, PQFT, MPCM, and ADMD. Mathematical proofs are also provided to validate the proposed method.
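The local contrast idea underlying the candidate-region step can be sketched in plain Python. This is a basic LCM-style measure, a simplification for illustration only, not the paper's box-filter-accelerated algorithm:

```python
def local_contrast(image, r=1):
    """Basic local-contrast map sketch (LCM-style; a simplification, not
    the paper's accelerated variant).

    image: 2D list of gray values. For each valid pixel, the squared
    maximum of the center cell is divided by the largest mean among the
    eight surrounding cells, so a small bright target on a flat or
    cluttered background scores high.
    """
    h, w = len(image), len(image[0])
    off = 2 * r + 1  # distance between adjacent cell centers

    def cell(cy, cx):
        vals = [image[y][x] for y in range(cy - r, cy + r + 1)
                for x in range(cx - r, cx + r + 1)]
        return max(vals), sum(vals) / len(vals)

    out = [[0.0] * w for _ in range(h)]
    for y in range(off + r, h - off - r):
        for x in range(off + r, w - off - r):
            center_max, _ = cell(y, x)
            neigh_means = [cell(y + dy * off, x + dx * off)[1]
                           for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                           if (dy, dx) != (0, 0)]
            out[y][x] = center_max ** 2 / max(max(neigh_means), 1e-9)
    return out
```

Thresholding the resulting map (adaptively, in the paper's pipeline) yields candidate target regions.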
Affiliation(s)
- Xiaotian Wang
- Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, China; (X.W.)
- School of Information and Communication Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Ruitao Lu
- Department of Missile Engineering, Rocket Force University of Engineering, Xi’an 710025, China
- Haixia Bi
- Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, China; (X.W.)
- Yuhai Li
- National Key Laboratory of Electromagnetic Space Security, Tianjin 300308, China
117
Tong L, Qian Y, Peng L, Wang C, Hou ZG. A learnable EEG channel selection method for MI-BCI using efficient channel attention. Front Neurosci 2023; 17:1276067. [PMID: 37928726 PMCID: PMC10622956 DOI: 10.3389/fnins.2023.1276067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2023] [Accepted: 10/05/2023] [Indexed: 11/07/2023] Open
Abstract
Introduction: In electroencephalography (EEG)-based motor imagery brain-computer interface (MI-BCI) tasks, a large number of electrodes is commonly used, consuming considerable computational resources. Therefore, channel selection is crucial for reducing this cost while ensuring classification accuracy. Methods: This paper proposes a channel selection method that integrates the efficient channel attention (ECA) module with a convolutional neural network (CNN). During model training, the ECA module automatically assigns channel weights by evaluating each channel's relative importance for BCI classification accuracy. A ranking of EEG channel importance can then be established, from which an appropriate number of channels is selected to form a channel subset. In this paper, the ECA module is embedded into a commonly used network for MI, and comparative experiments are conducted on the BCI Competition IV dataset 2a. Results and discussion: The proposed method achieved an average accuracy of 75.76% with all 22 channels and 69.52% with eight channels in a four-class classification task, outperforming other state-of-the-art EEG channel selection methods. The results demonstrate that the proposed method provides an effective channel selection approach for EEG-based MI-BCIs.
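The ECA-based ranking step can be sketched in plain Python: a small 1D convolution slides over per-channel descriptors so each weight depends only on a few neighboring channels, a sigmoid maps the result to (0, 1), and channels are ranked by weight. All values and names here are hypothetical, not the paper's trained model:

```python
import math

def eca_channel_weights(channel_desc, kernel):
    """ECA-style channel weighting sketch (hypothetical; not the trained
    network from the paper). A 1D convolution over the per-channel
    descriptors keeps the interaction local and cheap, then a sigmoid
    squashes each result into an attention weight."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(channel_desc) + [0.0] * pad
    conv = [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(channel_desc))]
    return [1.0 / (1.0 + math.exp(-c)) for c in conv]

def select_channels(channel_desc, kernel, n):
    """Rank channels by attention weight and keep the top-n indices,
    forming the reduced EEG channel subset."""
    w = eca_channel_weights(channel_desc, kernel)
    return sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:n]
```

In the paper's setting, the descriptors would come from the trained network; here an identity kernel simply ranks channels by their descriptor value.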
Affiliation(s)
- Lina Tong
- China University of Mining and Technology-Beijing, Beijing, China
- Yihui Qian
- China University of Mining and Technology-Beijing, Beijing, China
- Liang Peng
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Chen Wang
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Zeng-Guang Hou
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Chinese Academy of Sciences (CAS) Center for Excellence in Brain Science and Intelligence Technology, Beijing, China
118
Liu D, Zhang D, Wang L, Wang J. Semantic segmentation of autonomous driving scenes based on multi-scale adaptive attention mechanism. Front Neurosci 2023; 17:1291674. [PMID: 37928734 PMCID: PMC10620498 DOI: 10.3389/fnins.2023.1291674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2023] [Accepted: 10/06/2023] [Indexed: 11/07/2023] Open
Abstract
Introduction: Semantic segmentation is a crucial visual representation learning task for autonomous driving systems, as it enables the perception of surrounding objects and road conditions to ensure safe and efficient navigation. Methods: In this paper, we present a novel semantic segmentation approach for autonomous driving scenes using a multi-scale adaptive attention mechanism (MSAAM). The proposed method addresses the challenges associated with complex driving environments, including large-scale variations, occlusions, and diverse object appearances. Our MSAAM integrates features at multiple scales and adaptively selects the most relevant ones for precise segmentation. We introduce a novel attention module that incorporates spatial, channel-wise, and scale-wise attention mechanisms to effectively enhance the discriminative power of features. Results: On key objectives of the Cityscapes dataset, the model achieves ClassAvg: 81.13 and mIoU: 71.46. On comprehensive evaluation metrics, it achieves AUROC: 98.79, AP: 68.46, and FPR95: 5.72. In terms of computational cost, it requires GFLOPs: 2117.01 with an inference time of 61.06 ms. All results are superior to those of the compared models. Discussion: The proposed method achieves superior performance compared to state-of-the-art techniques on several benchmark datasets, demonstrating its efficacy in addressing the challenges of autonomous driving scene understanding.
Affiliation(s)
- Danping Liu
- School of Advanced Manufacturing Engineering, Hefei University, Hefei, China
- Dong Zhang
- State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun, China
- Lei Wang
- School of Advanced Manufacturing Engineering, Hefei University, Hefei, China
- Jun Wang
- School of Advanced Manufacturing Engineering, Hefei University, Hefei, China
119
Cheng H, Li H. Identification of apple leaf disease via novel attention mechanism based convolutional neural network. Front Plant Sci 2023; 14:1274231. [PMID: 37920720 PMCID: PMC10619150 DOI: 10.3389/fpls.2023.1274231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Accepted: 09/19/2023] [Indexed: 11/04/2023]
Abstract
Introduction: The identification of apple leaf diseases is crucial for apple production. Methods: To assist farmers in promptly recognizing leaf diseases in apple trees, we propose a novel attention mechanism. Building upon this mechanism and MobileNet v3, we introduce a new deep learning network. Results and discussion: Applying this network to our carefully curated dataset, we achieved an impressive accuracy of 98.7% in identifying apple leaf diseases, surpassing similar models such as EfficientNet-B0, ResNet-34, and DenseNet-121. Furthermore, the precision, recall, and F1-score of our model also outperform these models, while retaining the MobileNet network's advantages of fewer parameters and lower computational consumption. Therefore, our model holds promise for similar application scenarios and has broad prospects.
Affiliation(s)
- Heming Li
- School of Intelligence Engineering, Shandong Management University, Jinan, China
120
Jing B, Duan P, Chen L, Du Y. EM-YOLO: An X-ray Prohibited-Item-Detection Method Based on Edge and Material Information Fusion. Sensors (Basel) 2023; 23:8555. [PMID: 37896647 PMCID: PMC10610966 DOI: 10.3390/s23208555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 10/29/2023]
Abstract
X-ray imaging is commonly used to detect objects in security inspections. X-ray security images exhibit strong texture and RGB features as well as background clutter and object overlap, which makes X-ray imaging very different from other real-world imaging methods. To better detect prohibited items in security X-ray images with these characteristics, we propose EM-YOLOv7, which is composed of both an edge feature extractor (EFE) and a material feature extractor (MFE). We use the Soft-WIoU NMS method to solve the problem of object overlap, and the CBAM attention mechanism is added to the backbone to better extract features. According to the results of several experiments on the SIXray dataset, our EM-YOLOv7 method completes prohibited-item-detection tasks during security inspection with detection accuracy 4% and 0.9% higher than that of YOLOv5 and YOLOv7, respectively, as well as other SOTA models.
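The soft NMS idea used to handle overlapping items can be sketched in plain Python. This is the classic Gaussian Soft-NMS formulation with plain IoU; the paper's Soft-WIoU variant is not reproduced here:

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, thresh=0.001):
    """Gaussian Soft-NMS sketch (classic formulation, not the paper's
    Soft-WIoU NMS). Instead of deleting boxes that overlap the current
    best, their scores are decayed, which helps preserve genuinely
    overlapping objects such as stacked items in X-ray images."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < thresh:
            break
        keep.append(boxes[best])
        b, _ = boxes.pop(best), scores.pop(best)
        # Decay remaining scores by a Gaussian of their overlap with b.
        scores = [s * math.exp(-iou(b, box) ** 2 / sigma)
                  for s, box in zip(scores, boxes)]
    return keep
```

A heavily overlapping lower-scored box is decayed below the threshold rather than hard-suppressed at a fixed IoU cutoff.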
Affiliation(s)
- Bing Jing
- School of Information and Network Security, People’s Public Security University of China, Beijing 102206, China
- Pianzhang Duan
- School of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China
- Lu Chen
- School of Vehicle and Mobility, Tsinghua University, Beijing 100190, China
- Yanhui Du
- School of Information and Network Security, People’s Public Security University of China, Beijing 102206, China
121
Li X, Fang L, Zhang L, Cao P. An Interactive Framework of Cross-Lingual NLU for In-Vehicle Dialogue. Sensors (Basel) 2023; 23:8501. [PMID: 37896594 PMCID: PMC10611118 DOI: 10.3390/s23208501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Revised: 09/26/2023] [Accepted: 10/10/2023] [Indexed: 10/29/2023]
Abstract
As globalization accelerates, the linguistic diversity and semantic complexity of in-vehicle communication are increasing. To meet the needs of speakers of different languages, this paper proposes an interactive attention-based contrastive learning framework (IABCL) for the field of in-vehicle dialogue, aiming to effectively enhance cross-lingual natural language understanding (NLU). The proposed framework addresses the challenges of cross-lingual interaction in in-vehicle dialogue systems and provides an effective solution. IABCL is based on contrastive learning and an attention mechanism. First, contrastive learning is applied in the encoder stage: positive and negative samples allow the model to learn different linguistic expressions of similar meanings, improving its cross-lingual learning ability. Second, the attention mechanism is applied in the decoder stage: by relating slots and intents to each other, the model learns the relationship between the two, improving natural language understanding within languages of the same language family. In addition, this paper constructed a multilingual in-vehicle dialogue (MIvD) dataset for experimental evaluation to demonstrate the effectiveness and accuracy of the IABCL framework in cross-lingual dialogue. Compared with the latest model, IABCL improves by 2.42% in intent, 1.43% in slot, and 2.67% overall.
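The encoder-stage contrastive objective can be sketched with an InfoNCE-style loss in plain Python. This is a generic formulation under assumed inputs (embedding vectors), not IABCL's exact objective:

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-9)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss sketch (hypothetical; not IABCL's
    exact objective). The anchor (e.g., an utterance embedding) is pulled
    toward the positive (the same meaning expressed in another language)
    and pushed away from the negatives."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    # Numerically stable -log softmax of the positive's logit.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)
```

The loss is near zero when the anchor matches its positive and large when it matches a negative instead, which is the signal that aligns cross-lingual expressions of the same meaning.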
Affiliation(s)
- Pei Cao
- School of Artificial Intelligence and Big Data, Hefei University, Hefei 230061, China; (X.L.); (L.F.); (L.Z.)
122
Liu J, Lei X, Ji C, Pan Y. Fragment-pair based drug molecule solubility prediction through attention mechanism. Front Pharmacol 2023; 14:1255181. [PMID: 37881183 PMCID: PMC10595153 DOI: 10.3389/fphar.2023.1255181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2023] [Accepted: 09/26/2023] [Indexed: 10/27/2023] Open
Abstract
The purpose of drug discovery is to identify new drugs, and the solubility of drug molecules is an important physicochemical property in medicinal chemistry that plays a crucial role in drug discovery. In solubility prediction, high-precision computational methods can significantly reduce the experimental costs and time associated with drug development; therefore, artificial intelligence technologies have been widely used for solubility prediction. This study utilized the attention mechanism in a deep learning model to consider the atomic-level features of molecules and used gated recurrent neural networks to aggregate vectors between layers. It also utilized molecular fragmentation to divide the complete molecule into pairs of fragments, extracted characteristics from each fragment pair, and finally fused the characteristics to predict the solubility of drug molecules. We compared and evaluated our method against five existing models using two performance evaluation indicators, demonstrating that our method has better performance and greater robustness.
Affiliation(s)
- Jianping Liu
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Chunyan Ji
- Computer Science Department, BNU-HKBU United International College, Zhuhai, China
- Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Shenzhen, China
123
Liu J, Wang X. Tomato disease object detection method combining prior knowledge attention mechanism and multiscale features. Front Plant Sci 2023; 14:1255119. [PMID: 37877077 PMCID: PMC10590886 DOI: 10.3389/fpls.2023.1255119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/10/2023] [Accepted: 09/21/2023] [Indexed: 10/26/2023]
Abstract
To address the challenge of insufficient accuracy in tomato disease object detection caused by dense target distributions, large-scale variations, and poor feature information of small objects in complex backgrounds, this study proposes a tomato disease object detection method that integrates a prior knowledge attention mechanism and multi-scale features (PKAMMF). Firstly, the visual features of tomato disease images are fused with prior knowledge through the prior knowledge attention mechanism to obtain enhanced visual features corresponding to tomato diseases. Secondly, a new feature fusion layer is constructed in the Neck section to reduce feature loss. Furthermore, a specialized prediction layer designed to improve the model's ability to detect small targets is incorporated. Finally, a new loss function known as A-SIOU (Adaptive Structured IoU) is employed to optimize the model's bounding box regression performance. Experimental results on the self-built tomato disease dataset demonstrate the effectiveness of the proposed approach: it achieves a mean average precision (mAP) of 91.96%, a 3.86% improvement over baseline methods. The results show significant improvements in the detection performance of multi-scale tomato disease objects.
Affiliation(s)
- Jun Liu
- Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang, China
- Xuewei Wang
- Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang, China
124
Tian Y, Tian X. A New Lunar Dome Detection Method Based on Improved YOLOv7. Sensors (Basel) 2023; 23:8304. [PMID: 37837134 PMCID: PMC10575308 DOI: 10.3390/s23198304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/13/2023] [Revised: 09/30/2023] [Accepted: 10/04/2023] [Indexed: 10/15/2023]
Abstract
Volcanism is an important geological evolutionary process on the Moon, and the study of lunar volcanic features is of great significance and value for better understanding the Moon's geological evolution. Lunar domes are one of the Moon's essential volcanic features. However, existing lunar dome detection methods are still traditional manual or semiautomatic identification approaches that require extensive prior knowledge and involve a complex identification process. Therefore, this paper proposes an automatic detection method based on an improved YOLOv7 for lunar dome detection. First, a new lunar dome dataset was created from digital elevation model (DEM) data, and the effective squeeze and excitation (ESE) attention module was added to the backbone and neck sections to reduce information loss in the feature map and enhance network expressiveness. Then, a new SPPCSPC-RFE module was proposed by adding the receptive field enhancement (RFE) module to the neck section, which can adapt to dome feature maps of different shapes and sizes. Finally, the bounding box regression loss function was changed from complete IoU (CIoU) to wise IoU (WIoU), which improved the model's dome detection performance. Furthermore, this study combined several data enhancement strategies to improve the robustness of the network. To evaluate the performance of the proposed model, we conducted several experiments using the dome dataset developed in this study. The experimental results indicate that the improved method outperforms related methods with a mean average precision (mAP@0.5) of 88.7%, precision (P) of 85.6%, and recall (R) of 86.4%. This study provides an effective solution for lunar dome detection.
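For reference, the CIoU loss that the paper replaces can be sketched in plain Python. This is the generic CIoU formulation, not the authors' code, and the WIoU replacement is deliberately not reproduced here:

```python
import math

def ciou_loss(a, b):
    """Complete-IoU (CIoU) loss sketch — the baseline loss the paper swaps
    out for WIoU. Boxes are (x1, y1, x2, y2). CIoU adds a normalized
    center-distance term and an aspect-ratio consistency term to the
    plain IoU loss."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Squared distance between box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
       + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest enclosing box.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (math.atan((a[2] - a[0]) / (a[3] - a[1]))
                              - math.atan((b[2] - b[0]) / (b[3] - b[1]))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + d2 / (c2 + 1e-9) + alpha * v
```

The loss vanishes for perfectly matched boxes and grows with both distance and shape mismatch, which is the behavior WIoU then reweights.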
Affiliation(s)
- Xiaolin Tian
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa 999078, Macau
125
Xie T, Yin M, Zhu X, Sun J, Meng C, Bei S. A Fast and Robust Lane Detection via Online Re-Parameterization and Hybrid Attention. Sensors (Basel) 2023; 23:8285. [PMID: 37837115 PMCID: PMC10575396 DOI: 10.3390/s23198285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/21/2023] [Revised: 09/22/2023] [Accepted: 10/05/2023] [Indexed: 10/15/2023]
Abstract
Lane detection is a vital component of intelligent driving systems, offering indispensable functionality to keep the vehicle within its designated lane, thereby reducing the risk of lane departure. However, the complexity of the traffic environment, coupled with the rapid movement of vehicles, creates many challenges for detection tasks. Current lane detection methods suffer from issues such as low feature extraction capability, poor real-time detection, and inadequate robustness. Addressing these issues, this paper proposes a lane detection algorithm that combines an online re-parameterization ResNet with a hybrid attention mechanism. Firstly, we replaced standard convolution with online re-parameterization convolution, simplifying the convolutional operations during the inference phase and subsequently reducing the detection time. In an effort to enhance the performance of the model, a hybrid attention module is incorporated to enhance the ability to focus on elongated targets. Finally, a row anchor lane detection method is introduced to analyze the existence and location of lane lines row by row in the image and output the predicted lane positions. The experimental outcomes illustrate that the model achieves F1 scores of 96.84% and 75.60% on the publicly available TuSimple and CULane lane datasets, respectively. Moreover, the inference speed reaches a notable 304 frames per second (FPS). The overall performance outperforms other detection models and fulfills the requirements of real-time responsiveness and robustness for lane detection tasks.
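The row-anchor formulation mentioned above reduces lane detection to a per-row classification over grid cells; a minimal plain-Python decoding sketch (shapes and grid size hypothetical, not the paper's model) shows how class predictions become lane positions:

```python
def decode_row_anchors(row_logits, img_width):
    """Row-anchor decoding sketch (hypothetical shapes, not the paper's
    network). row_logits: one list of scores per image row over g + 1
    classes — g horizontal grid cells plus a final 'no lane' class.
    Returns, per row, the x-coordinate of the predicted lane point,
    or None where the lane is absent."""
    points = []
    for logits in row_logits:
        g = len(logits) - 1  # last index means 'no lane in this row'
        best = max(range(len(logits)), key=logits.__getitem__)
        if best == g:
            points.append(None)
        else:
            # Map the winning cell to the center of its column span.
            points.append((best + 0.5) * img_width / g)
    return points
```

Because each row needs only one argmax over a small grid, this decoding is what makes row-anchor methods fast enough for the real-time frame rates reported above.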
Affiliation(s)
- Mingfeng Yin
- School of Automobile and Traffic Engineering, Jiangsu University of Technology, Changzhou 213001, China; (T.X.); (X.Z.); (J.S.); (C.M.); (S.B.)
126
Bai T, Zhou S, Pang Y, Luo J, Wang H, Du Y. An image caption model based on attention mechanism and deep reinforcement learning. Front Neurosci 2023; 17:1270850. [PMID: 37869519 PMCID: PMC10585027 DOI: 10.3389/fnins.2023.1270850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2023] [Accepted: 09/04/2023] [Indexed: 10/24/2023] Open
Abstract
Image captioning technology aims to convert visual features of images, extracted by computers, into meaningful semantic information, so that computers can generate text descriptions that resemble human perception and support tasks such as image classification, retrieval, and analysis. In recent years, image captioning performance has been significantly enhanced by the introduction of the encoder-decoder architecture from machine translation and the use of deep neural networks. However, several challenges persist in this domain. This paper therefore proposes a novel method to address the loss of visual information and the lack of dynamic adjustment to the input image during decoding. We introduce a guided decoding network that establishes a connection between the encoding and decoding parts; through this connection, encoding information guides the decoding process and enables automatic adjustment of the decoding information. In addition, a Dense Convolutional Network (DenseNet) and Multiple Instance Learning (MIL) are adopted in the image encoder, and a Nested Long Short-Term Memory (NLSTM) network is used as the decoder, enhancing the extraction and parsing of image information during encoding and decoding. To further improve performance, the model incorporates an attention mechanism to focus on details and a double-layer decoding structure, which yields more detailed descriptions and richer semantic information. Furthermore, Deep Reinforcement Learning (DRL) is employed to train the model by directly optimizing the same set of evaluation metrics, resolving the inconsistency between training and evaluation criteria.
Finally, the model is trained and tested on the MS COCO and Flickr30k datasets, and the results show improvements over commonly used models on evaluation metrics such as BLEU, METEOR, and CIDEr.
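The core attention step described above — re-weighting encoder features against the current decoder state at each word — can be sketched as a generic additive-attention computation. This is a minimal NumPy illustration, not the paper's implementation; all names, weights, and shapes are made up for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(decoder_state, encoder_feats, W_s, W_f, v):
    """Score each encoder region against the current decoder state
    (additive attention) and return the attention-weighted context."""
    # encoder_feats: (n_regions, d); decoder_state: (d,)
    scores = np.tanh(encoder_feats @ W_f + decoder_state @ W_s) @ v  # (n_regions,)
    weights = softmax(scores)          # how much each region matters right now
    context = weights @ encoder_feats  # weighted sum of region features
    return context, weights

rng = np.random.default_rng(0)
d, n = 8, 5
context, weights = additive_attention(
    rng.normal(size=d), rng.normal(size=(n, d)),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d))
```

At each decoding step the context vector is fed to the decoder alongside the previous word, so different image regions drive different words.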
Affiliation(s)
- Tong Bai
  - School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China
- Sen Zhou
  - Chongqing Academy of Metrology and Quality Inspection, Chongqing, China
- Yu Pang
  - School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China
- Jiasai Luo
  - School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China
- Huiqian Wang
  - School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China
- Ya Du
  - Department of Peripheral Vascular (Wound Repair), Chongqing Hospital of Traditional Chinese Medicine, Chongqing, China
127
Shen L, Wang Q, Zhang Y, Qin F, Jin H, Zhao W. DSKCA-UNet: Dynamic selective kernel channel attention for medical image segmentation. Medicine (Baltimore) 2023; 102:e35328. [PMID: 37773842 PMCID: PMC10545043 DOI: 10.1097/md.0000000000035328]
Abstract
U-Net has attained immense popularity owing to its performance in medical image segmentation; however, it cannot explicitly model long-range dependencies. By contrast, the transformer can effectively capture long-range dependencies by leveraging the self-attention (SA) in its encoder. Although SA can find correlations within the original data, its quadratic computational complexity can retard the processing of high-dimensional data such as medical images. SA is further limited in that correlations between samples are overlooked, leaving considerable scope for improvement. To this end, based on Swin-UNet, we introduce a dynamic selective attention mechanism over the convolution kernels: the weight of each kernel is calculated and the results are fused dynamically. This attention mechanism permits each neuron to adaptively modify its receptive field size in response to multiscale input information. A local cross-channel interaction strategy without dimensionality reduction is also introduced, which effectively eliminates the effect of downscaling on learning channel attention; through suitable cross-channel interactions, model complexity can be significantly reduced while maintaining performance. Subsequently, the global interaction between encoder features is used to extract more fine-grained features. Simultaneously, a mixed loss function combining weighted cross-entropy loss and Dice loss is used to alleviate category imbalance and achieve better results when sample numbers are unbalanced. We evaluated the proposed method on abdominal multiorgan and cardiac segmentation datasets, achieving a Dice similarity coefficient of 80.30% and a 95% Hausdorff distance of 14.55 on the Synapse dataset, and a Dice similarity coefficient of 90.80% on the ACDC dataset.
The experimental results show that the proposed method has good generalization ability and robustness, making it a powerful tool for medical image segmentation.
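The "local cross-channel interaction without dimensionality reduction" idea can be illustrated with a small sketch: each channel's gate is computed from a fixed-size neighborhood of per-channel descriptors via a 1-D convolution, ECA-style, so no bottleneck layer shrinks the channel dimension. This is a hedged NumPy illustration with made-up uniform kernel weights, not the DSKCA-UNet code.

```python
import numpy as np

def eca_channel_attention(feat, k=3):
    """Channel attention via local cross-channel interaction: a 1-D
    convolution of size k over per-channel descriptors, with no
    dimensionality reduction. Kernel weights here are illustrative."""
    # feat: (C, H, W)
    c = feat.shape[0]
    desc = feat.mean(axis=(1, 2))            # global average pooling -> (C,)
    kernel = np.full(k, 1.0 / k)             # shared 1-D conv weights (toy)
    padded = np.pad(desc, k // 2, mode="edge")
    mixed = np.convolve(padded, kernel, mode="valid")[:c]
    gates = 1.0 / (1.0 + np.exp(-mixed))     # sigmoid gate per channel
    return feat * gates[:, None, None]

x = np.random.default_rng(1).normal(size=(4, 6, 6))
y = eca_channel_attention(x)
```

Because the conv slides over neighboring channels only, the parameter count stays at k regardless of C, which is the complexity saving the abstract refers to.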
Affiliation(s)
- Longfeng Shen
  - Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
  - Anhui Big-Data Research Center on University Management, Huaibei, China
- Qiong Wang
  - Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, China
  - Anhui Big-Data Research Center on University Management, Huaibei, China
- Yingjie Zhang
  - Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, China
  - Anhui Big-Data Research Center on University Management, Huaibei, China
- Fenglan Qin
  - Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, China
  - Anhui Big-Data Research Center on University Management, Huaibei, China
- Hengjun Jin
  - People’s Hospital of Huaibei City, Huaibei, China
- Wei Zhao
  - People’s Hospital of Huaibei City, Huaibei, China
128
Quan Z, Wu B, Luo L. An Image Stereo Matching Algorithm with Multi-Spectral Attention Mechanism. Sensors (Basel) 2023; 23:8179. [PMID: 37837009 PMCID: PMC10574877 DOI: 10.3390/s23198179]
Abstract
With the advancement of artificial intelligence and computer hardware, stereo matching algorithms have been widely researched and applied in image processing. In scenarios such as robot navigation and autonomous driving, stereo matching algorithms help robots acquire depth information about the surrounding environment, improving their capacity for autonomous navigation. In this paper, we address the low matching accuracy of stereo matching algorithms in specular regions of images and propose a multi-attention-based stereo matching algorithm called MANet. The proposed algorithm embeds a multi-spectral attention module into the residual feature-extraction network of the PSMNet algorithm; it uses different 2D discrete cosine transforms to extract frequency-specific feature information, providing rich and effective features for matching-cost computation. The pyramid pooling module incorporates a coordinate attention mechanism, which maintains direction-aware long-range dependencies and captures more positional information during pooling, enhancing the network's representational capacity. MANet was evaluated on three major benchmark datasets, SceneFlow, KITTI2015, and KITTI2012, and compared with related algorithms. Experimental results demonstrated that MANet achieves higher disparity-prediction accuracy and stronger robustness against specular reflections, enabling more accurate disparity prediction in specular regions.
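The multi-spectral idea — using different 2D DCT bases as channel descriptors instead of plain average pooling — can be sketched as follows. This is an illustrative NumPy fragment in the spirit of multi-spectral channel attention; the frequency assignment and function names are assumptions, not the MANet code.

```python
import numpy as np

def dct_basis(h, w, u, v):
    """2-D DCT-II basis for frequency index (u, v); (0, 0) is the all-ones
    (DC) basis, so projecting onto it recovers average pooling up to scale."""
    ys = np.cos(np.pi * u * (2 * np.arange(h) + 1) / (2 * h))
    xs = np.cos(np.pi * v * (2 * np.arange(w) + 1) / (2 * w))
    return np.outer(ys, xs)

def multispectral_descriptors(feat, freqs):
    """One scalar descriptor per channel: project the channel onto its
    assigned DCT frequency instead of simply averaging it."""
    c, h, w = feat.shape
    return np.array([(feat[i] * dct_basis(h, w, *freqs[i % len(freqs)])).sum()
                     for i in range(c)])

x = np.random.default_rng(4).normal(size=(3, 5, 5))
dc = multispectral_descriptors(x, [(0, 0)])              # DC component only
mixed = multispectral_descriptors(x, [(0, 0), (0, 1)])   # alternate frequencies
```

Assigning higher-frequency bases to some channels lets the attention weights react to texture, not just mean intensity, which is what makes the descriptors "frequency-specific".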
Affiliation(s)
- Zhenhua Quan
  - Institute of Electronic Engineering, China Academy of Engineering Physics, Mianyang 621900, China
  - School of Information Engineering, Southwest University of Science and Technology, Mianyang 621000, China
- Bin Wu
  - School of Information Engineering, Southwest University of Science and Technology, Mianyang 621000, China
- Liang Luo
  - School of Information Engineering, Southwest University of Science and Technology, Mianyang 621000, China
129
Ma J, Yuan G, Guo C, Gang X, Zheng M. SW-UNet: a U-Net fusing sliding window transformer block with CNN for segmentation of lung nodules. Front Med (Lausanne) 2023; 10:1273441. [PMID: 37841008 PMCID: PMC10569032 DOI: 10.3389/fmed.2023.1273441]
Abstract
Medical images are information carriers that visually reflect and record the anatomical structure of the human body, and they play an important role in clinical diagnosis, teaching, and research. Modern medicine has become increasingly inseparable from the intelligent processing of medical images. In recent years, there have been growing efforts to apply deep learning to medical image segmentation, making it worthwhile to explore simple and efficient deep learning algorithms for the task. In this paper, we investigate the segmentation of lung nodule images. We address the above-mentioned problems of medical image segmentation algorithms and study both medical image fusion based on a hybrid channel-spatial attention mechanism and medical image segmentation with a hybrid architecture of Convolutional Neural Networks (CNNs) and Vision Transformers. To address the difficulty such algorithms have in capturing long-range feature dependencies, this paper proposes SW-UNet, a medical image segmentation model based on a hybrid CNN and Vision Transformer (ViT) framework. The self-attention mechanism and sliding-window design of the ViT are used to capture global feature associations and overcome the receptive field limitation that convolutional operations inherit from their inductive bias. At the same time, a widened self-attention vector is used to streamline the number of modules and compress the model size, suiting the small size of typical medical datasets, on which larger models easily overfit. Experiments on the LUNA16 lung nodule dataset validate the algorithm and show that the proposed network achieves efficient medical image segmentation at a lightweight scale. In addition, to validate the transferability of the model, we performed additional validation on other tumor datasets with desirable results.
Our research addresses the need for improved medical image segmentation algorithms. By introducing the SW-UNet model, which combines a CNN and ViT, we capture long-range feature dependencies and overcome the receptive field limitations of traditional convolutional operations. This approach enhances the efficiency of medical image segmentation while maintaining scalability and adaptability to small medical datasets. The positive outcomes on various tumor datasets underline the transferability and broad applicability of the proposed model in medical image analysis.
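The window-based design above keeps self-attention affordable by restricting it to small tiles of the feature map. A minimal sketch of that partition-and-attend step, in plain NumPy with illustrative shapes (not the SW-UNet implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(feat, win=2):
    """Split an (H, W, C) feature map into non-overlapping win x win windows
    and run plain self-attention inside each window, so the cost grows with
    the window size rather than the whole image."""
    h, w, c = feat.shape
    out = np.empty_like(feat)
    for i in range(0, h, win):
        for j in range(0, w, win):
            tokens = feat[i:i + win, j:j + win].reshape(-1, c)  # win*win tokens
            attn = softmax(tokens @ tokens.T / np.sqrt(c))      # token affinities
            out[i:i + win, j:j + win] = (attn @ tokens).reshape(win, win, c)
    return out

x = np.random.default_rng(2).normal(size=(4, 4, 3))
y = window_self_attention(x)
```

Sliding (shifting) the window grid between successive layers, as the abstract describes, lets information cross window borders while keeping each attention computation local.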
Affiliation(s)
- Jiajun Ma
  - Shenhua Hollysys Information Technology Co., Ltd., Beijing, China
- Gang Yuan
  - The First Affiliated Hospital of Dalian Medical University, Dalian, China
- Chenhua Guo
  - School of Software, North University of China, Taiyuan, China
- Minting Zheng
  - The First Affiliated Hospital of Dalian Medical University, Dalian, China
130
Liu B, Ge R, Zhu Y, Zhang B, Zhang X, Bao Y. IDAF: Iterative Dual-Scale Attentional Fusion Network for Automatic Modulation Recognition. Sensors (Basel) 2023; 23:8134. [PMID: 37836964 PMCID: PMC10575420 DOI: 10.3390/s23198134]
Abstract
Recently, deep learning models have been widely applied to modulation recognition and have become a hot topic due to their excellent end-to-end learning capabilities. However, current methods are mostly based on uni-modal inputs, which suffer from incomplete information and local optimization. To complement the advantages of different modalities, we focus on multi-modal fusion and introduce an iterative dual-scale attentional fusion (iDAF) method to integrate multimodal data. First, two feature maps with different receptive field sizes are constructed using local and global embedding layers. Second, the feature inputs are fed iteratively into the iterative dual-channel attention module (iDCAM), whose two branches capture the details of high-level features and the global weights of each modal channel, respectively. iDAF not only extracts the recognition characteristics of each specific domain but also combines the strengths of the different modalities into a richer view. iDAF achieves a recognition accuracy of 93.5% at 10 dB and 0.6232 over the full signal-to-noise ratio (SNR) range. Comparative experiments and ablation studies demonstrate its effectiveness and superiority.
Affiliation(s)
- Bohan Liu
  - Institute of Systems Engineering, Academy of Military Science of the People’s Liberation Army, Beijing 100083, China
- Ruixing Ge
  - Institute of Systems Engineering, Academy of Military Science of the People’s Liberation Army, Beijing 100083, China
- Yuxuan Zhu
  - Institute of Systems Engineering, Academy of Military Science of the People’s Liberation Army, Beijing 100083, China
- Bolin Zhang
  - National Key Laboratory of Science and Technology on Communication, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xiaokai Zhang
  - College of Communications and Engineering, Army Engineering University of PLA, Nanjing 210007, China
- Yanfei Bao
  - Institute of Systems Engineering, Academy of Military Science of the People’s Liberation Army, Beijing 100083, China
131
Zhang D, Chen C, Tan F, Qian B, Li W, He X, Lei S. Multi-view and multi-scale behavior recognition algorithm based on attention mechanism. Front Neurorobot 2023; 17:1276208. [PMID: 37822532 PMCID: PMC10562555 DOI: 10.3389/fnbot.2023.1276208]
Abstract
Human behavior recognition plays a crucial role in smart education, offering a nuanced understanding of teaching and learning dynamics by revealing the behaviors of both teachers and students. In this study, to address the needs of teaching behavior analysis in smart education, we first constructed a teaching behavior analysis dataset called EuClass. EuClass contains 13 teacher/student behavior categories and provides multi-view, multi-scale video data for research and practical applications in teacher/student behavior recognition. We also provide a teaching behavior analysis network containing an attention-based network and an intra-class differential representation learning module. The attention mechanism uses a two-level attention module encompassing the spatial and channel dimensions, and the intra-class differential representation learning module utilizes a unified loss function to reduce the distance between features. Experiments conducted on the EuClass dataset and IsoGD, a widely used action/gesture recognition dataset, demonstrate the effectiveness of our method against current state-of-the-art methods, with recognition accuracy increased by 1-2% on average.
Affiliation(s)
- Di Zhang
  - Department of Telecommunications, Xi'an Jiaotong University, Xi'an, China
  - School of Information Engineering, Xi'an Eurasia University, Xi'an, China
- Chen Chen
  - School of Information Engineering, Xi'an Eurasia University, Xi'an, China
- Fa Tan
  - School of Information Engineering, Xi'an Eurasia University, Xi'an, China
- Beibei Qian
  - School of Information Engineering, Xi'an Eurasia University, Xi'an, China
- Wei Li
  - School of Information Engineering, Xi'an Eurasia University, Xi'an, China
- Xuan He
  - School of Information Engineering, Xi'an Eurasia University, Xi'an, China
- Susan Lei
  - School of Information Engineering, Xi'an Eurasia University, Xi'an, China
132
Jiang M, Zhang L, Wang X, Li S, Jiao Y. 6D Object Pose Estimation Based on Cross-Modality Feature Fusion. Sensors (Basel) 2023; 23:8088. [PMID: 37836919 PMCID: PMC10575350 DOI: 10.3390/s23198088]
Abstract
6D pose estimation using RGBD images plays a pivotal role in robotics applications. At present, after obtaining RGB and depth modality information, most methods simply concatenate the two without considering their interactions, which leads to low 6D pose-estimation accuracy under occlusion and illumination changes. To solve this problem, we propose a new method for fusing RGB and depth modality features that uses the individual information within each RGBD modality and fully integrates cross-modality interactive information. Specifically, we transform depth images into point clouds and apply the PointNet++ network to extract point cloud features; RGB image features are extracted by CNNs, with attention mechanisms added to obtain context information within the single modality. We then propose a cross-modality feature fusion module (CFFM) to obtain cross-modality information and introduce a feature contribution weight training module (CWTM) to allocate the contributions of the two modalities to the target task. Finally, the 6D object pose is estimated from the fused cross-modality feature. Enabling information interactions within and between modalities maximizes their integration, and weighing each modality's contribution enhances the overall robustness of the model. Our experiments indicate that our method reaches an average accuracy of 96.9% on the LineMOD dataset using the ADD(-S) metric, while on the YCB-Video dataset it reaches 94.7% using the ADD-S AUC metric and 96.5% using the ADD-S (<2 cm) score.
Affiliation(s)
- Xiaohua Wang
  - School of Electronic Information, Xi’an Polytechnic University, Xi’an 710048, China
133
Wang S, Wang T, Wang S, Fang Z, Huang J, Zhou Z. MLAM: Multi-Layer Attention Module for Radar Extrapolation Based on Spatiotemporal Sequence Neural Network. Sensors (Basel) 2023; 23:8065. [PMID: 37836895 PMCID: PMC10575230 DOI: 10.3390/s23198065]
Abstract
Precipitation nowcasting is mainly achieved by radar echo extrapolation. Because of the temporal nature of the task, convolutional recurrent neural networks (ConvRNNs) have been used to solve it. Most ConvRNNs have been proven to perform far better than traditional optical flow methods, but they still have serious shortcomings: they lack differentiation in predicting echoes of different intensities, which leads to the omission of responses from high-intensity regions, and because they struggle to capture long-term feature dependencies across multiple echo maps, extrapolation quality declines sharply over time. This paper proposes an embedded multi-layer attention module (MLAM) to address these shortcomings. Specifically, an MLAM enhances attention to critical regions in echo images and improves the processing of long-term spatiotemporal features through the interaction between input and memory features at the current moment. Comprehensive experiments were conducted on the radar dataset HKO-7 provided by the Hong Kong Observatory and the radar dataset HMB provided by the Hunan Meteorological Bureau. Experiments show that ConvRNNs embedded with MLAMs achieve more advanced results than standard ConvRNNs.
Affiliation(s)
- Shengchun Wang
  - College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
- Tianyang Wang
  - College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
- Sihong Wang
  - College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
- Zixiong Fang
  - College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
- Jingui Huang
  - College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
- Zuxi Zhou
  - College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
134
Wu X, Wang G, Shen N. Research on obstacle avoidance optimization and path planning of autonomous vehicles based on attention mechanism combined with multimodal information decision-making thoughts of robots. Front Neurorobot 2023; 17:1269447. [PMID: 37811356 PMCID: PMC10556461 DOI: 10.3389/fnbot.2023.1269447]
Abstract
With the development of machine perception and multimodal information decision-making techniques, autonomous driving technology has become a crucial area of advancement in the transportation industry, and the optimization of vehicle navigation, path planning, and obstacle avoidance is of paramount importance. In this study, we explore the use of attention mechanisms in an end-to-end architecture for optimizing obstacle avoidance and path planning in autonomous vehicles. We position our research within the broader context of robotics, emphasizing the fusion of information and decision-making capabilities. The introduction of attention mechanisms enables vehicles to perceive the environment more accurately by focusing on important information and making informed decisions in complex scenarios. By feeding multimodal information, such as images and LiDAR data, into the attention module, the system can automatically learn and weigh crucial environmental features, placing greater emphasis on key information during obstacle avoidance decisions. Additionally, we leverage the end-to-end architecture and draw on classical theories and algorithms from robotics to enhance the perception and decision-making abilities of autonomous vehicles. We further address path-planning optimization with attention mechanisms: the navigation task is cast as a sequential decision-making problem, and LSTM (Long Short-Term Memory) models handle dynamic navigation in varying environments. By applying attention to weigh key points along the navigation path, the vehicle can flexibly select the optimal route and dynamically adjust it based on real-time conditions. Finally, we conducted extensive experimental evaluations and software experiments with the proposed end-to-end architecture on real road datasets.
The method effectively avoids obstacles, adheres to traffic rules, and achieves stable, safe, and efficient autonomous driving in diverse road scenarios. This research provides an effective solution for optimizing obstacle avoidance and path planning in autonomous driving and contributes to the advancement and practical application of multimodal information fusion in navigation, localization, and human-robot interaction.
Affiliation(s)
- Xuejin Wu
  - College of Transport and Communications, Shanghai Maritime University, Shanghai, China
- Guangming Wang
  - School of Management, Wuhan University of Technology, Wuhan, China
  - School of Politics and Public Administration, Zhengzhou University, Zhengzhou, China
- Nachuan Shen
  - Chinese Academy of Fiscal Science, Beijing, China
135
Zhao Z, Xue X, Mariam I, Zhou X. Integrating Target and Shadow Features for SAR Target Recognition. Sensors (Basel) 2023; 23:8031. [PMID: 37836861 PMCID: PMC10575260 DOI: 10.3390/s23198031]
Abstract
A synthetic aperture radar (SAR) sensor often produces a shadow paired with each target because of its slant-viewing imaging geometry. As a result, shadows in SAR images can provide critical discriminative features for classifiers, such as target contours and relative positions. However, shadows possess properties that differ from targets, such as low intensity and sensitivity to depression angle, making it challenging to extract deep features from shadows directly using convolutional neural networks (CNNs). In this paper, we propose a new SAR image-classification framework that uses target and shadow information comprehensively. First, we design a SAR image segmentation method to extract target regions and shadow masks. Second, based on SAR projection geometry, we propose a data-augmentation method to compensate for the geometric distortion of shadows caused by differences in depression angle. Finally, we introduce a feature-enhancement module (FEM) based on depthwise separable convolution (DSC) and the convolutional block attention module (CBAM), enabling deep networks to fuse target and shadow features adaptively. Experimental results on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that, even when using only target and shadow information, published deep-learning models still achieve state-of-the-art performance after embedding the FEM.
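The CBAM component used in the FEM applies a channel gate and then a spatial gate to a feature map. A simplified, parameter-free NumPy sketch of those two stages follows; the real module learns small MLP and convolution weights, which are omitted here, so this is an illustration of the mechanism rather than the paper's module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(feat):
    """Convolutional block attention, simplified: a channel gate built from
    pooled channel descriptors, then a spatial gate built from
    channel-pooled maps (learned weights omitted for clarity)."""
    # channel attention: combine average- and max-pooled channel descriptors
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    feat = feat * sigmoid(avg + mx)[:, None, None]
    # spatial attention: combine average- and max-pooled maps across channels
    s = sigmoid(feat.mean(axis=0) + feat.max(axis=0))
    return feat * s[None, :, :]

x = np.random.default_rng(5).normal(size=(3, 4, 4))
y = cbam(x)
```

Because both gates lie in (0, 1), the module rescales rather than rewrites features, which is what lets it emphasize shadow regions without discarding target responses.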
Affiliation(s)
- Xiaorong Xue
  - School of Electronics and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
136
Gan Y, Liu W, Xu G, Yan C, Zou G. DMFDDI: deep multimodal fusion for drug-drug interaction prediction. Brief Bioinform 2023; 24:bbad397. [PMID: 37930025 DOI: 10.1093/bib/bbad397]
Abstract
Drug combination therapy has gradually become a promising treatment strategy for complex or co-existing diseases. As drug-drug interactions (DDIs) may cause unexpected adverse drug reactions, DDI prediction is an important task in pharmacology and clinical applications. Recently, researchers have proposed several deep learning methods to predict DDIs. However, these methods mainly exploit the chemical or biological features of drugs, which is insufficient and limits the performance of DDI prediction. Here, we propose DMFDDI, a deep multimodal feature fusion framework for DDI prediction that fuses the drug molecular graph, the DDI network, and biochemical similarity features of drugs. To fully extract drug molecular structure, we introduce an attention-gated graph neural network that captures the global features of the molecular graph and the local features of each atom. A sparse graph convolution network is introduced to learn the topological structure of the DDI network. In the multimodal feature fusion module, an attention mechanism is used to fuse the different features efficiently. To validate the performance of DMFDDI, we compared it with 10 state-of-the-art methods; the comparison demonstrates that DMFDDI achieves better DDI-prediction performance. DMFDDI is implemented in Python using PyTorch and is freely available at https://github.com/DHUDEBLab/DMFDDI.git.
Affiliation(s)
- Yanglan Gan
  - School of Computer Science and Technology, Donghua University, 2999 North Renmin Road, 201600, Shanghai, China
- Wenxiao Liu
  - School of Computer Science and Technology, Donghua University, 2999 North Renmin Road, 201600, Shanghai, China
- Guangwei Xu
  - School of Computer Science and Technology, Donghua University, 2999 North Renmin Road, 201600, Shanghai, China
- Cairong Yan
  - School of Computer Science and Technology, Donghua University, 2999 North Renmin Road, 201600, Shanghai, China
- Guobing Zou
  - School of Computer Engineering and Science, Shanghai University, 99 Shangda Road, 200444, Shanghai, China
137
He Y, Wang X, Yang Z, Xue L, Chen Y, Ji J, Wan F, Mukhopadhyay SC, Men L, Tong MCF, Li G, Chen S. Classification of attention deficit/hyperactivity disorder based on EEG signals using a EEG-Transformer model. J Neural Eng 2023; 20:056013. [PMID: 37683665 DOI: 10.1088/1741-2552/acf7f5]
Abstract
Objective. Attention-deficit/hyperactivity disorder (ADHD) is the most common neurodevelopmental disorder in adolescents and can seriously impair attention, cognitive processes, and learning ability. Currently, clinicians diagnose patients primarily through the subjective assessments of the Diagnostic and Statistical Manual of Mental Disorders-5, which can lead to delayed diagnosis of ADHD and even misdiagnosis due to low diagnostic efficiency and a lack of well-trained diagnostic experts. Deep learning of electroencephalogram (EEG) signals recorded from ADHD patients could provide an objective and accurate method to assist physicians in clinical diagnosis. Approach. This paper proposes the EEG-Transformer deep learning model, which builds on the attention mechanism of the original Transformer and performs feature extraction and signal classification tailored to the characteristics of EEG signals. A comprehensive comparison was made between the proposed model and three existing convolutional neural network models. Main results. The proposed EEG-Transformer achieved an average accuracy of 95.85% and an average AUC of 0.9926 with the fastest convergence, outperforming the other three models. The function and relationship of each module were studied through ablation experiments, and the best-performing configuration was identified through optimization experiments. Significance. The EEG-Transformer model can serve as an auxiliary tool for the clinical diagnosis of ADHD and at the same time provides a base model for transfer learning in EEG signal classification.
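The attention mechanism at the heart of such a Transformer treats the EEG recording as a sequence of feature vectors (e.g. one per time window) and lets every window attend to every other. A minimal single-head scaled dot-product sketch in NumPy, with illustrative dimensions and random weights standing in for learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over a sequence of
    feature vectors (one row per EEG time window)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (T, T) attention map
    return scores @ v                                 # context for each window

rng = np.random.default_rng(3)
T, d = 6, 4  # illustrative: 6 time windows, 4 features each
out = self_attention(rng.normal(size=(T, d)), rng.normal(size=(d, d)),
                     rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

Stacking such layers and pooling the outputs into a classification head is the usual route from attention maps to a diagnosis label.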
Affiliation(s)
- Yuchao He
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, People's Republic of China
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen, Guangdong 518055, People's Republic of China
- Xin Wang
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, People's Republic of China
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen, Guangdong 518055, People's Republic of China
- Zijian Yang
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, People's Republic of China
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen, Guangdong 518055, People's Republic of China
- Lingbin Xue
- Department of Otorhinolaryngology, Head and Neck Surgery, The Chinese University of Hong Kong, Hong Kong Special Administrative Region of China, People's Republic of China
- Yuming Chen
- School of Psychology, Shenzhen University, Shenzhen 518060, People's Republic of China
- Junyu Ji
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, People's Republic of China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong 518055, People's Republic of China
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen, Guangdong 518055, People's Republic of China
- Feng Wan
- Faculty of Science and Technology, University of Macau, Macau 999078, People's Republic of China
- Lina Men
- Department of Neonatology, Shenzhen Children's Hospital, Shenzhen 518034, People's Republic of China
- Michael Chi Fai Tong
- Department of Otorhinolaryngology, Head and Neck Surgery, The Chinese University of Hong Kong, Hong Kong Special Administrative Region of China, People's Republic of China
- Guanglin Li
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, People's Republic of China
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen, Guangdong 518055, People's Republic of China
- Shixiong Chen
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, People's Republic of China
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen, Guangdong 518055, People's Republic of China
138
Liu R, Wang Z, Qiu J, Wang X. Assigning channel weights using an attention mechanism: an EEG interpolation algorithm. Front Neurosci 2023; 17:1251677. PMID: 37811329; PMCID: PMC10552919; DOI: 10.3389/fnins.2023.1251677. Received 07/02/2023; accepted 09/06/2023.
Abstract
During the acquisition of electroencephalographic (EEG) signals, various factors can corrupt the data and leave one or more bad channels. Bad-channel interpolation uses data from the good channels to reconstruct a bad channel, preserving the original dimensions of the data for subsequent analysis. Mainstream interpolation algorithms assign channel weights based on the physical distance between electrodes and do not account for the effect of physiological factors on the EEG signal. The algorithm proposed in this study instead uses an attention mechanism to allocate channel weights (AMACW). The model learns the correlations among channels from good-channel data, and interpolation assigns weights based on these learned correlations without requiring electrode location information, overcoming the limitation that traditional methods cannot interpolate bad channels at unknown locations. To avoid an overly concentrated weight distribution when generating data, we designed a channel-masking (CM) scheme that spreads the attention and lets the model draw on data from multiple channels. We evaluated the reconstruction performance of the model on EEG data with one to five bad channels. With EEGLAB's interpolation method as a performance reference, tests showed that the AMACW models can effectively reconstruct bad channels.
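The learned-correlation idea can be illustrated without a training loop: estimate inter-channel correlations on a clean calibration segment, softmax them into attention weights, and reconstruct the bad channel as a weighted sum of the good channels. This NumPy sketch is a simplified stand-in for the trained AMACW model (names and the temperature parameter are ours); like the paper's method, it uses no electrode positions:

```python
import numpy as np

def interpolate_bad_channel(calib, signal, bad_idx, temp=1.0):
    """Reconstruct one bad channel as an attention-weighted sum of the
    good channels. Weights come from inter-channel correlations on a
    calibration segment where all channels were good.
    calib:  (n_ch, t_calib) clean calibration data
    signal: (n_ch, t) recording whose row bad_idx is corrupted"""
    corr = np.corrcoef(calib)[bad_idx]   # similarity to each channel
    corr[bad_idx] = -np.inf              # mask the bad channel itself
    w = np.exp(corr / temp)
    w /= w.sum()                         # softmax over good channels
    return w @ signal                    # (t,) reconstructed trace

rng = np.random.default_rng(1)
t = 500
base = np.sin(np.linspace(0, 20, t))
calib = base + 0.05 * rng.standard_normal((6, t))  # six correlated channels
signal = calib.copy()
truth = signal[2].copy()
signal[2] = rng.standard_normal(t)                 # corrupt channel 2
rec = interpolate_bad_channel(calib, signal, bad_idx=2)
```

Because channel 2 receives zero weight, the reconstruction is built entirely from the remaining channels and closely tracks the true underlying trace.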
Affiliation(s)
- Zaijun Wang
- Key Laboratory of Flight Techniques and Flight Safety Research Base, Civil Aviation Flight University of China, Guanghan, China
139
Ashurov A, Chelloug SA, Tselykh A, Muthanna MSA, Muthanna A, Al-Gaashani MSAM. Improved Breast Cancer Classification through Combining Transfer Learning and Attention Mechanism. Life (Basel) 2023; 13:1945. PMID: 37763348; PMCID: PMC10532552; DOI: 10.3390/life13091945. Received 08/11/2023; revised 09/16/2023; accepted 09/17/2023.
Abstract
Breast cancer, a leading cause of female mortality worldwide, poses a significant health challenge. Recent advancements in deep learning techniques have revolutionized breast cancer pathology by enabling accurate image classification. Various imaging methods, such as mammography, CT, MRI, ultrasound, and biopsies, aid in breast cancer detection, and computer-assisted pathological image classification is of paramount importance for diagnosis. This study introduces a novel approach to breast cancer histopathological image classification. It leverages modified pre-trained CNN models and attention mechanisms to enhance interpretability and robustness, emphasizing localized features and enabling accurate discrimination of complex cases. Our method applies transfer learning with deep CNN models (Xception, VGG16, ResNet50, MobileNet, and DenseNet121) augmented with the convolutional block attention module (CBAM): the pre-trained models are fine-tuned, and CBAM modules are incorporated at the end of each. The models are compared with state-of-the-art breast cancer diagnosis approaches and evaluated on accuracy, precision, recall, and F1 score, with confusion matrices used to assess and visualize their performance. The test accuracy rates for the attention mechanism (AM) using the Xception model on the "BreakHis" breast cancer dataset are encouraging at 99.2% and 99.5%, and the test accuracy for DenseNet121 with AMs is 99.6%. The proposed approaches also outperformed previous approaches examined in the related studies.
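CBAM's channel-attention half, the part appended to pre-trained backbones here, can be sketched in NumPy: average- and max-pool each channel, push both pooled vectors through a shared two-layer MLP, and rescale the channels by the resulting sigmoid gate (shapes and random weights below are illustrative; the real module sits inside a trained CNN):

```python
import numpy as np

def channel_attention(fmap, w1, w2):
    """CBAM-style channel attention on a (C, H, W) feature map:
    avg- and max-pool per channel, run both through a shared MLP,
    sum, squash with a sigmoid, and rescale the channels."""
    avg = fmap.mean(axis=(1, 2))                  # (C,)
    mx = fmap.max(axis=(1, 2))                    # (C,)
    mlp = lambda v: np.maximum(v @ w1, 0) @ w2    # ReLU hidden layer
    gate = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # (C,) in (0, 1)
    return fmap * gate[:, None, None]

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 5, 5))   # C=8 channels
w1 = rng.standard_normal((8, 4)) * 0.1  # reduction ratio 2
w2 = rng.standard_normal((4, 8)) * 0.1
out = channel_attention(fmap, w1, w2)   # same shape as fmap
```

Because the gate lies in (0, 1), the module can only attenuate channels, never amplify them, which is what makes it a cheap drop-in refinement after a frozen backbone.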
Affiliation(s)
- Asadulla Ashurov
- School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
- Samia Allaoua Chelloug
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Alexey Tselykh
- Institute of Computer Technologies and Information Security, Southern Federal University, Taganrog 347922, Russia
- Mohammed Saleh Ali Muthanna
- Institute of Computer Technologies and Information Security, Southern Federal University, Taganrog 347922, Russia
- Ammar Muthanna
- RUDN University, 6 Miklukho-Maklaya Street, Moscow 117198, Russia
- Mehdhar S. A. M. Al-Gaashani
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
140
Zhang C, Yang Z, Xue B, Zhuo H, Liao L, Yang X, Zhu Z. Perceiving like a Bat: Hierarchical 3D Geometric-Semantic Scene Understanding Inspired by a Biomimetic Mechanism. Biomimetics (Basel) 2023; 8:436. PMID: 37754187; PMCID: PMC10526479; DOI: 10.3390/biomimetics8050436. Received 08/05/2023; revised 09/05/2023; accepted 09/13/2023.
Abstract
Geometric-semantic scene understanding is a spatial intelligence capability that is essential for robots to perceive and navigate the world. However, understanding a natural scene remains challenging for robots because of restricted sensors and time-varying situations. In contrast, humans and animals are able to form a complex neuromorphic concept of the scene they move in. This neuromorphic concept captures geometric and semantic aspects of the scenario and reconstructs the scene at multiple levels of abstraction. This article seeks to reduce the gap between robot and animal perception by proposing an ingenious scene-understanding approach that seamlessly captures geometric and semantic aspects in an unexplored environment. We propose two types of biologically inspired environment perception methods, i.e., a set of elaborate biomimetic sensors and a brain-inspired parsing algorithm for scene understanding, that enable robots to perceive their surroundings like bats. Our evaluations show that the proposed scene-understanding system achieves competitive performance in image semantic segmentation and volumetric-semantic scene reconstruction. Moreover, to verify the practicability of our proposed scene-understanding method, we also conducted real-world geometric-semantic scene reconstruction in an indoor environment with our self-developed drone.
Affiliation(s)
- Zhong Yang
- College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
141
Zhang Q, Shu J, Chen C, Teng Z, Gu Z, Li F, Kan J. Optimization of pneumonia CT classification model using RepVGG and spatial attention features. Front Med (Lausanne) 2023; 10:1233724. PMID: 37795420; PMCID: PMC10546926; DOI: 10.3389/fmed.2023.1233724. Received 06/12/2023; accepted 09/05/2023.
Abstract
Introduction Pneumonia is a common and widespread infectious disease that seriously affects the life and health of patients. In recent years especially, the outbreak of COVID-19 caused a sharp rise in the number of confirmed cases, so early detection and treatment of pneumonia are very important. However, the uneven gray-level distribution and structural intricacy of pneumonia images substantially impair classification accuracy. In the task of classifying COVID-19 against other pneumonias, the classes share many commonalities, so even a small inter-class gap carries a risk of prediction deviation, and it is difficult to achieve high accuracy by directly applying current network models. Methods Consequently, an optimization method for COVID-19 CT classification based on RepVGG was proposed. It comprises two essential modules, a feature-extraction backbone and a spatial attention block, which allow it to extract spatial attention features while retaining the benefits of RepVGG. Results The model's inference time is significantly reduced, and it shows better learning ability than RepVGG on both the training and validation sets. Compared with the existing advanced network models VGG-16, ResNet-50, GoogleNet, ViT, AlexNet, MobileViT, ConvNeXt, ShuffleNet, and RepVGG_b0, our model demonstrated the best performance on most indicators. In testing, it achieved an accuracy of 0.951, an F1 score of 0.952, and a Youden index of 0.902. Discussion Overall, multiple experiments on the large SARS-CoV-2 CT-scan dataset reveal that this method outperforms most baseline models in the classification and screening of COVID-19 CT and offers significant reference value. It also outperformed other networks with residual structures in the inspection experiment.
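The general recipe behind a spatial attention block (as in CBAM's spatial half, which designs like this typically follow; the paper's exact block may differ) is to pool across channels, convolve the pooled maps, and reweight spatial locations with a sigmoid. A NumPy sketch with illustrative shapes and random weights:

```python
import numpy as np

def spatial_attention(fmap, conv_w):
    """Spatial attention for a (C, H, W) feature map: stack channel-wise
    average and max maps, convolve with a (2, k, k) kernel (zero
    padding, stride 1), sigmoid, and reweight every location."""
    c, h, w = fmap.shape
    pooled = np.stack([fmap.mean(0), fmap.max(0)])  # (2, H, W)
    k = conv_w.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))
    att = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            att[i, j] = np.sum(padded[:, i:i + k, j:j + k] * conv_w)
    att = 1.0 / (1.0 + np.exp(-att))                # (H, W) in (0, 1)
    return fmap * att[None]

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 6, 6))
conv_w = rng.standard_normal((2, 3, 3)) * 0.1
out = spatial_attention(fmap, conv_w)               # same shape as fmap
```

The gate is shared across channels, so the block highlights *where* in the slice the network should look, complementing the backbone's channel features.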
Affiliation(s)
- Jianhua Shu
- School of Medical Information Engineering, Anhui University of Chinese Medicine, Hefei, China
- Junling Kan
- School of Medical Information Engineering, Anhui University of Chinese Medicine, Hefei, China
142
Xiao D, Zhu F, Jiang J, Niu X. Leveraging natural cognitive systems in conjunction with ResNet50-BiGRU model and attention mechanism for enhanced medical image analysis and sports injury prediction. Front Neurosci 2023; 17:1273931. PMID: 37795185; PMCID: PMC10546033; DOI: 10.3389/fnins.2023.1273931. Received 08/07/2023; accepted 08/28/2023.
Abstract
Introduction In this study, we explore the potential benefits of integrating natural cognitive systems (medical professionals' expertise) and artificial cognitive systems (deep learning models) in the realms of medical image analysis and sports-injury prediction. We focus on analyzing medical images of athletes to gain valuable insights into their health status. Methods To synergize the strengths of both natural and artificial cognitive systems, we employ the ResNet50-BiGRU model and introduce an attention mechanism, with the goal of enhancing medical-image feature extraction and sports-injury prediction. This integrated approach aims to achieve precise identification of anomalies in medical images, particularly those related to muscle or bone damage. Results We evaluate the effectiveness of our method on four medical image datasets pertaining to skeletal and muscle injuries, using performance indicators such as the Peak Signal-to-Noise Ratio and Structural Similarity Index, which confirm the robustness of our approach in sports-injury analysis. Discussion Existing systems for medical image analysis and sports-injury prediction often struggle to identify subtle anomalies and provide precise injury-risk assessments, underscoring the need for a more integrated and comprehensive approach. Our research contributes an effective deep learning-driven method that harnesses both natural and artificial cognitive systems: by combining human expertise with advanced machine learning techniques, it offers a comprehensive understanding of athletes' health status, with potential implications for injury prevention, diagnostic accuracy, and personalized treatment plans, ultimately promoting better overall health and performance outcomes.
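The BiGRU half of such a model summarizes a sequence of CNN feature vectors by running a GRU in both directions and concatenating the final hidden states. A from-scratch NumPy sketch of that pattern (random weights and illustrative sizes, not the trained ResNet50-BiGRU):

```python
import numpy as np

def gru_step(x, h, p):
    """One GRU step; p holds input weights (wz, wr, wn) and hidden
    weights (uz, ur, un) for the update, reset, and candidate gates."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(x @ p['wz'] + h @ p['uz'])             # update gate
    r = sig(x @ p['wr'] + h @ p['ur'])             # reset gate
    n = np.tanh(x @ p['wn'] + (r * h) @ p['un'])   # candidate state
    return (1 - z) * n + z * h

def bigru(seq, p_fwd, p_bwd, hidden):
    """Run a GRU forward and backward over seq (T, d) and concatenate
    the two final hidden states into one summary vector."""
    hf = np.zeros(hidden)
    hb = np.zeros(hidden)
    for t in range(len(seq)):
        hf = gru_step(seq[t], hf, p_fwd)
        hb = gru_step(seq[len(seq) - 1 - t], hb, p_bwd)
    return np.concatenate([hf, hb])                # (2 * hidden,)

rng = np.random.default_rng(0)
d, hidden, T = 5, 4, 10
def make_params():
    p = {k: rng.standard_normal((d, hidden)) * 0.1 for k in ('wz', 'wr', 'wn')}
    p.update({k: rng.standard_normal((hidden, hidden)) * 0.1
              for k in ('uz', 'ur', 'un')})
    return p
seq = rng.standard_normal((T, d))                  # CNN feature sequence
out = bigru(seq, make_params(), make_params(), hidden)  # shape (8,)
```

In the full pipeline, `seq` would be the per-slice (or per-region) ResNet50 features, and `out` would feed the prediction head.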
Affiliation(s)
- Duo Xiao
- Ministry of Culture, Sports and Labor, Jiangxi Gannan Health Vocational College, Ganzhou, Jiangxi, China
- Fei Zhu
- Ministry of Culture, Sports and Labor, Jiangxi Gannan Health Vocational College, Ganzhou, Jiangxi, China
- Jian Jiang
- Gannan University of Science and Technology, Ganzhou, Jiangxi, China
- Xiaoqiang Niu
- Ministry of Culture, Sports and Labor, Jiangxi Gannan Health Vocational College, Ganzhou, Jiangxi, China
143
Huang P, Wang Q, Chen H, Lu G. Gas Sensor Array Fault Diagnosis Based on Multi-Dimensional Fusion, an Attention Mechanism, and Multi-Task Learning. Sensors (Basel) 2023; 23:7836. PMID: 37765891; PMCID: PMC10535611; DOI: 10.3390/s23187836. Received 08/01/2023; revised 09/03/2023; accepted 09/11/2023.
Abstract
With the development of gas sensor arrays and computational technology, machine olfactory systems have been widely used in environmental monitoring, medical diagnosis, and other fields. The reliable and stable operation of gas-sensing systems depends heavily on the accuracy of the sensors' outputs, so accurate gas-sensor-array fault diagnosis is essential for monitoring the working status of sensor arrays and ensuring the normal operation of the whole system. Existing methods extract features from a single dimension and require separate models to be trained for each diagnosis task, which limits diagnostic accuracy and efficiency. To address these limitations, this study developed MAM-Net, a novel fault-diagnosis network based on multi-dimensional feature fusion, an attention mechanism, and multi-task learning, and applied it to gas sensor arrays. First, feature-fusion models extract deep, comprehensive features from the original data in multiple dimensions: a residual network equipped with convolutional block attention modules handles the two-dimensional signals and a Bi-LSTM network handles the one-dimensional signals, capturing spatial and temporal features simultaneously. Subsequently, a concatenation layer built by feature stitching integrates the fault details of the different dimensions so that useful information is not ignored. Finally, a multi-task learning module performs parallel learning of the sensor fault-diagnosis tasks to effectively improve diagnostic capability. Experimental results on gas sensor datasets with different amounts of data, balanced and unbalanced classes, and different experimental settings show that the proposed framework outperforms the other available methods and demonstrates good recognition accuracy and robustness.
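The multi-task pattern described above, one fused feature vector scored by several parallel heads, can be sketched as follows (the task names and dimensions are hypothetical, not MAM-Net's actual heads):

```python
import numpy as np

def multitask_forward(feat_2d, feat_1d, heads):
    """Concatenate the 2-D (spatial) and 1-D (temporal) branch features,
    then score every diagnosis task with its own linear head.
    heads: {task_name: (W, b)}."""
    fused = np.concatenate([feat_2d, feat_1d])      # stitching layer
    return {task: fused @ W + b for task, (W, b) in heads.items()}

rng = np.random.default_rng(0)
feat_2d = rng.standard_normal(6)    # e.g. ResNet+CBAM branch output
feat_1d = rng.standard_normal(4)    # e.g. Bi-LSTM branch output
heads = {                           # hypothetical tasks / class counts
    'fault_type': (rng.standard_normal((10, 3)), np.zeros(3)),
    'fault_location': (rng.standard_normal((10, 2)), np.zeros(2)),
}
logits = multitask_forward(feat_2d, feat_1d, heads)
```

During training, each head contributes its own loss term, so the shared trunk learns features useful for all diagnosis tasks at once.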
Affiliation(s)
- Qingfeng Wang
- State Key Laboratory of Integrated Optoelectronics, College of Electronic Science and Engineering, Jilin University, Changchun 130012, China
144
Wang A, Meng Q, Wang M. Spectrum Sensing Method Based on Residual Dense Network and Attention. Sensors (Basel) 2023; 23:7791. PMID: 37765847; PMCID: PMC10534694; DOI: 10.3390/s23187791. Received 08/02/2023; revised 09/06/2023; accepted 09/08/2023.
Abstract
Traditional CNN spectrum-sensing methods suffer from vanishing gradients and limited feature-extraction capability in deep network structures, and deep networks are also prone to degradation. To address these problems, this paper proposes a collaborative spectrum-sensing method based on a Residual Dense Network and attention mechanisms. The method stacks and normalizes the time-domain information of the signal, constructs a two-dimensional matrix, and maps it to a grayscale image. The grayscale images are divided into training and testing sets; the training set is used to train the neural network to extract deep features, and the test set is then fed into the trained network for spectrum sensing. Experimental results show that, at low signal-to-noise ratios, the proposed method achieves superior sensing performance compared with traditional collaborative spectrum-sensing methods.
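The signal-to-image preprocessing described here (stack, normalize, map to grayscale) is straightforward to sketch; the row count and min-max scaling choices below are ours:

```python
import numpy as np

def signal_to_grayscale(x, rows):
    """Stack a 1-D received signal into a (rows, cols) matrix, min-max
    normalize, and map to 8-bit grayscale for the CNN."""
    cols = len(x) // rows
    m = np.asarray(x[:rows * cols], dtype=float).reshape(rows, cols)
    m = (m - m.min()) / (m.max() - m.min() + 1e-12)  # to [0, 1]
    return np.round(m * 255).astype(np.uint8)        # to [0, 255]

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)        # simulated received samples
img = signal_to_grayscale(x, rows=32)  # 32 x 32 grayscale "image"
```

Each such image becomes one training or test sample for the Residual Dense Network.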
Affiliation(s)
- Anyi Wang
- School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
- Qifeng Meng
- School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
- Mingbo Wang
- School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
145
Zhao S, Bai Z, Meng L, Han G, Duan E. Pose Estimation and Behavior Classification of Jinling White Duck Based on Improved HRNet. Animals (Basel) 2023; 13:2878. PMID: 37760278; PMCID: PMC10525901; DOI: 10.3390/ani13182878. Received 07/19/2023; revised 09/03/2023; accepted 09/05/2023.
Abstract
In breeding ducks, obtaining pose information is vital for perceiving their physiological health, ensuring welfare in breeding, and monitoring environmental comfort. This paper proposes a pose estimation method that combines HRNet and CBAM to achieve automatic, accurate detection of ducks in multiple poses. Through comparison, HRNet-32 is identified as the optimal backbone for duck pose estimation. On this basis, multiple CBAM modules are densely embedded into the HRNet-32 network to obtain an HRNet-32-CBAM pose estimation model, realizing accurate detection and association of eight keypoints across six different behaviors. Furthermore, the model's generalization ability is tested under different illumination conditions, and its comprehensive detection abilities are evaluated on Cherry Valley ducklings of 12 and 24 days of age. The model is also compared with mainstream pose estimation methods to reveal its advantages and disadvantages, and its real-time performance is tested on images of 256 × 256, 512 × 512, and 728 × 728 pixels. The experimental results indicate that the proposed method achieves an average precision (AP) of 0.943 on the duck pose estimation dataset, generalizes well, and can estimate ducks' poses in real time across different ages, breeds, and farming modes. This study provides a technical reference and basis for the intelligent farming of poultry.
Affiliation(s)
- Shida Zhao
- Institute of Agricultural Facilities and Equipment, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Key Laboratory of Protected Agriculture Engineering in the Middle and Lower Reaches of Yangtze River, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
- Zongchun Bai
- Institute of Agricultural Facilities and Equipment, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Key Laboratory of Protected Agriculture Engineering in the Middle and Lower Reaches of Yangtze River, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
- Lili Meng
- Institute of Agricultural Facilities and Equipment, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Key Laboratory of Protected Agriculture Engineering in the Middle and Lower Reaches of Yangtze River, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
- School of Civil Engineering, Engineering Campus, Universiti Sains Malaysia, Nibong Tebal 14300, Malaysia
- Guofeng Han
- Institute of Agricultural Facilities and Equipment, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Key Laboratory of Protected Agriculture Engineering in the Middle and Lower Reaches of Yangtze River, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
- Enze Duan
- Institute of Agricultural Facilities and Equipment, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Key Laboratory of Protected Agriculture Engineering in the Middle and Lower Reaches of Yangtze River, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
146
Li W, Jia M, Yang C, Lin Z, Yu Y, Zhang W. SPA-UNet: A liver tumor segmentation network based on fused multi-scale features. Open Life Sci 2023; 18:20220685. PMID: 37724113; PMCID: PMC10505346; DOI: 10.1515/biol-2022-0685. Received 05/05/2023; revised 06/26/2023; accepted 07/24/2023.
Abstract
Liver tumor segmentation is a critical part of the diagnosis and treatment of liver cancer. While U-shaped convolutional neural networks (UNets) have made significant strides in medical image segmentation, challenges remain in accurately segmenting tumor boundaries and detecting small tumors, resulting in low segmentation accuracy. To improve the segmentation accuracy of liver tumors, this work proposes space pyramid attention (SPA)-UNet, a novel image segmentation network with an encoder-decoder architecture. SPA-UNet consists of four modules: (1) a spatial pyramid convolution block (SPCB), which extracts multi-scale features by fusing three sets of dilated convolutions with different rates; (2) a spatial pyramid pooling block (SPPB), which performs downsampling to reduce image size; (3) an upsample module, which integrates dense positional and semantic information; and (4) a residual attention block (RA-Block), which enables precise tumor localization. The encoder incorporates 5 SPCBs and 4 SPPBs to capture contextual information; the decoder consists of the upsample module and RA-Block, and a segmentation head finally outputs segmented images of the liver and liver tumors. Experiments on the liver tumor segmentation dataset demonstrate that SPA-UNet surpasses the traditional UNet model, achieving improvements of 1.0% and 2.0% in intersection over union for the liver and tumors, respectively, along with recall-rate increases of 1.2% and 1.8%. These advancements provide a dependable foundation for liver cancer diagnosis and treatment.
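The SPCB's key ingredient, fusing parallel dilated convolutions at several rates to widen the receptive field, can be sketched for a single channel (the summation fusion and the rates below are our assumptions; the paper's block uses learned per-branch filters):

```python
import numpy as np

def dilated_conv2d(img, kernel, rate):
    """Single-channel 2-D convolution with dilation `rate`; zero
    padding keeps the output the same size as the input."""
    k = kernel.shape[0]
    span = (k - 1) * rate            # receptive-field extent minus one
    p = span // 2
    padded = np.pad(img, p)
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + span + 1:rate, j:j + span + 1:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def spatial_pyramid(img, kernel, rates=(1, 2, 4)):
    """Fuse parallel dilated convolutions at several rates by summing
    them, enlarging the receptive field without extra parameters."""
    return sum(dilated_conv2d(img, kernel, r) for r in rates)

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3)) * 0.1
fused = spatial_pyramid(img, kernel)   # same spatial size as img
```

With a 3 × 3 kernel, rates 1, 2, and 4 give effective receptive fields of 3, 5, and 9 pixels, so the fused output mixes fine boundary detail with wider context.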
Affiliation(s)
- Weikun Li
- School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi, 541000, China
- Maoning Jia
- School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi, 541000, China
- Chen Yang
- School of Business, Guilin University of Electronic Technology, Guilin, Guangxi, 541000, China
- Zhenyuan Lin
- School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi, 541000, China
- Yuekang Yu
- School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi, 541000, China
- Wenhui Zhang
- School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi, 541000, China
147
Cheng Z, Li Y. Improved YOLOv7 Algorithm for Detecting Bone Marrow Cells. Sensors (Basel) 2023; 23:7640. PMID: 37688095; PMCID: PMC10490824; DOI: 10.3390/s23177640. Received 07/31/2023; revised 08/29/2023; accepted 08/31/2023.
Abstract
The detection and classification of bone marrow (BM) cells is a critical cornerstone of hematology diagnosis. However, because of the low accuracy caused by scarce BM-cell data samples, subtle differences between classes, and small target sizes, pathologists still need to perform thousands of manual identifications daily. To address these issues, we propose an improved BM-cell-detection algorithm, called YOLOv7-CTA. First, to enhance the model's sensitivity to fine-grained features, we design a new module called CoTLAN in the backbone network, enabling the model to perform long-range modeling of target feature information. Then, to complement the CoTLAN module and direct more attention to the features in the regions to be detected, we integrate the coordinate attention (CoordAtt) module between the CoTLAN modules, improving the model's attention to small-target features. Finally, we cluster the target boxes of the BM-cell dataset with K-means++ to generate more suitable anchor boxes, which accelerates the convergence of the improved model. In addition, to address the imbalance between positive and negative samples in BM-cell images, we replace the multi-class cross-entropy loss with the focal loss. Experimental results demonstrate that the best mean average precision (mAP) of the proposed model reaches 88.6%, an improvement of 12.9%, 8.3%, and 6.7% over the Faster R-CNN, YOLOv5l, and YOLOv7 models, respectively. This verifies the effectiveness and superiority of the YOLOv7-CTA model in BM-cell-detection tasks.
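The focal loss substituted for cross entropy here down-weights easy examples by a factor of (1 − p_t)^γ, so scarce, hard cell classes dominate the gradient. A binary NumPy sketch (the detector uses the multi-class form; α = 0.25 and γ = 2 are the common defaults, not values from the paper):

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss over predicted probabilities and 0/1 labels:
    -alpha_t * (1 - p_t)^gamma * log(p_t), averaged over samples."""
    p_t = np.where(targets == 1, probs, 1 - probs)   # prob of true class
    a_t = np.where(targets == 1, alpha, 1 - alpha)   # class weighting
    return np.mean(-a_t * (1 - p_t) ** gamma * np.log(p_t + 1e-12))

easy = focal_loss(np.array([0.9]), np.array([1]))   # confident, correct
hard = focal_loss(np.array([0.1]), np.array([1]))   # confident, wrong
```

An already well-classified positive (p_t = 0.9) is scaled down by (0.1)² = 0.01, while a badly missed one (p_t = 0.1) keeps most of its loss, which is exactly the rebalancing effect exploited for rare cell classes.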
Affiliation(s)
- Yuanyuan Li
- School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan 430205, China
148
Zheng S, Huang X, Chen J, Lyu Z, Zheng J, Huang J, Gao H, Liu S, Sun L. UR-Net: An Integrated ResUNet and Attention Based Image Enhancement and Classification Network for Stain-Free White Blood Cells. Sensors (Basel) 2023; 23:7605. PMID: 37688058; PMCID: PMC10490639; DOI: 10.3390/s23177605. Received 07/17/2023; revised 08/08/2023; accepted 08/29/2023.
Abstract
The differential count of white blood cells (WBCs) can effectively provide disease information for patients. Existing stained-microscopy WBC classification usually requires complex sample-preparation steps and is easily affected by external conditions such as illumination, while the inconspicuous nuclei of stain-free WBCs pose great challenges of their own. As such, image enhancement, as a preprocessing step for image classification, is essential for improving the image quality of stain-free WBCs. However, traditional and existing convolutional neural network (CNN)-based image enhancement techniques are typically designed as standalone modules aimed at improving perceptual quality for humans, without considering their impact on downstream computer vision tasks such as classification. This work therefore proposes a novel model, UR-Net, which consists of an image enhancement network framed by ResUNet with an attention mechanism and a ResNet classification network. The enhancement model is integrated into the classification model for joint training to improve classification performance on stain-free WBCs. The experimental results demonstrate that, compared with models without image enhancement and with previous enhancement-plus-classification models, our proposed model achieved the best classification accuracy of 83.34% on our stain-free WBC dataset.
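Joint training of the enhancement and classification stages means optimizing one combined objective. A minimal sketch of such a loss (the MSE term, the cross-entropy term, and the weighting `lam` are our assumptions for illustration; the paper's exact objective may differ):

```python
import numpy as np

def joint_loss(enhanced, target_img, logits, label, lam=0.5):
    """Combined objective: pixel-wise MSE on the enhanced image plus
    cross-entropy on the class logits, weighted by lam."""
    mse = np.mean((enhanced - target_img) ** 2)      # enhancement term
    z = logits - logits.max()                        # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[label]                           # classification term
    return lam * mse + ce

rng = np.random.default_rng(0)
target = rng.random((4, 4))                  # toy reference image
logits = np.array([2.0, 0.5, -1.0])          # toy class scores
l_match = joint_loss(target, target, logits, label=0)        # perfect enhancement
l_off = joint_loss(target + 0.5, target, logits, label=0)    # degraded enhancement
```

Because both terms backpropagate through the enhancement network, it learns to produce images that help the classifier rather than merely look cleaner.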
Affiliation(s)
- Sikai Zheng
- Ministry of Education Key Laboratory of RF Circuits and Systems, Hangzhou Dianzi University, Hangzhou 310018, China
- Xiwei Huang
- Ministry of Education Key Laboratory of RF Circuits and Systems, Hangzhou Dianzi University, Hangzhou 310018, China
- Jin Chen
- Ministry of Education Key Laboratory of RF Circuits and Systems, Hangzhou Dianzi University, Hangzhou 310018, China
- Zefei Lyu
- Ministry of Education Key Laboratory of RF Circuits and Systems, Hangzhou Dianzi University, Hangzhou 310018, China
- Jingwen Zheng
- Ministry of Education Key Laboratory of RF Circuits and Systems, Hangzhou Dianzi University, Hangzhou 310018, China
- Jiye Huang
- Ministry of Education Key Laboratory of RF Circuits and Systems, Hangzhou Dianzi University, Hangzhou 310018, China
- Haijun Gao
- Ministry of Education Key Laboratory of RF Circuits and Systems, Hangzhou Dianzi University, Hangzhou 310018, China
- Shan Liu
- Sichuan Provincial Key Laboratory for Human Disease Gene Study, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu 610072, China
- Lingling Sun
- Ministry of Education Key Laboratory of RF Circuits and Systems, Hangzhou Dianzi University, Hangzhou 310018, China
|
149
|
Liu H, Zhuang Y, Song E, Xu X, Ma G, Cetinkaya C, Hung CC. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations. Med Phys 2023; 50:5460-5478. [PMID: 36864700 DOI: 10.1002/mp.16338] [Received: 11/28/2022] [Revised: 02/07/2023] [Accepted: 02/22/2023] [Indexed: 03/04/2023]
Abstract
BACKGROUND Multi-modal learning is widely adopted to learn latent complementary information between different modalities in multi-modal medical image segmentation tasks. Nevertheless, traditional multi-modal learning methods require spatially well-aligned, paired multi-modal images for supervised training and cannot leverage unpaired multi-modal images with spatial misalignment and modality discrepancy. To train accurate multi-modal segmentation networks using easily accessible, low-cost unpaired multi-modal images in clinical practice, unpaired multi-modal learning has recently received considerable attention. PURPOSE Existing unpaired multi-modal learning methods usually focus on the intensity distribution gap but ignore the scale variation problem between modalities. Moreover, existing methods frequently employ shared convolutional kernels to capture patterns common to all modalities, but these are typically inefficient at learning global contextual information. Furthermore, existing methods rely heavily on a large number of labeled unpaired multi-modal scans for training, ignoring the practical scenario in which labeled data are limited. To solve these problems, we propose a modality-collaborative convolution and transformer hybrid network (MCTHNet) that uses semi-supervised learning for unpaired multi-modal segmentation with limited annotations; it not only collaboratively learns modality-specific and modality-invariant representations but can also automatically leverage extensive unlabeled scans to improve performance. METHODS We make three main contributions. First, to alleviate the intensity distribution gap and scale variation problems across modalities, we develop a modality-specific scale-aware convolution (MSSC) module that adaptively adjusts its receptive field sizes and feature normalization parameters according to the input.
Second, we propose a modality-invariant vision transformer (MIViT) module as the shared bottleneck layer for all modalities, which implicitly incorporates convolution-like local operations with the global processing of transformers to learn generalizable modality-invariant representations. Third, we design a multi-modal cross pseudo supervision (MCPS) method for semi-supervised learning, which enforces consistency between the pseudo segmentation maps generated by two perturbed networks to acquire abundant annotation information from unlabeled unpaired multi-modal scans. RESULTS Extensive experiments were performed on two unpaired CT and MR segmentation datasets: a cardiac substructure dataset derived from the MMWHS-2017 dataset and an abdominal multi-organ dataset consisting of the BTCV and CHAOS datasets. The results show that our proposed method significantly outperforms existing state-of-the-art methods under various labeling ratios and achieves segmentation performance close to that of single-modal methods trained with fully labeled data while leveraging only a small portion of labeled data. Specifically, at a labeling ratio of 25%, our method achieves overall mean DSC values of 78.56% and 76.18% in cardiac and abdominal segmentation, respectively, improving the average DSC value across the two tasks by 12.84% compared to single-modal U-Net models. CONCLUSIONS Our proposed method is beneficial for reducing the annotation burden of unpaired multi-modal medical images in clinical applications.
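The cross pseudo supervision idea in the abstract can be sketched as a consistency loss on unlabeled data: each of two perturbed networks is supervised by the hard pseudo-labels of the other. The sketch below operates on a batch of class logits rather than full segmentation maps, and the function names and hard-argmax pseudo-labeling are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_pseudo_loss(logits_a, logits_b):
    """Cross pseudo supervision on an unlabeled batch: network A is trained
    against the hard pseudo-labels of network B, and vice versa, pushing the
    two perturbed networks toward consistent predictions."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    pseudo_a = pa.argmax(axis=-1)  # hard pseudo-labels from network A
    pseudo_b = pb.argmax(axis=-1)  # hard pseudo-labels from network B
    n = logits_a.shape[0]
    # Cross-entropy of each network against the sibling's pseudo-labels.
    loss_a = -np.log(pa[np.arange(n), pseudo_b] + 1e-12).mean()
    loss_b = -np.log(pb[np.arange(n), pseudo_a] + 1e-12).mean()
    return loss_a + loss_b
```

The loss is near zero when both networks agree confidently and grows when their pseudo segmentation maps disagree, which is how unlabeled scans contribute a training signal.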
Affiliation(s)
- Hong Liu
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Yuzhou Zhuang
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Enmin Song
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Xiangyang Xu
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Guangzhi Ma
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Coskun Cetinkaya
- Center for Machine Vision and Security Research, Kennesaw State University, Kennesaw, Georgia, USA
- Chih-Cheng Hung
- Center for Machine Vision and Security Research, Kennesaw State University, Kennesaw, Georgia, USA
|
150
|
Zhou P, Zhang Y, Li Z, Pang K, Zhao D. Protein Complex Identification Based on Heterogeneous Protein Information Network. J Comput Biol 2023; 30:985-998. [PMID: 37669441 DOI: 10.1089/cmb.2023.0081] [Indexed: 09/07/2023]
Abstract
Protein complexes are the foundation of all cellular activities, and accurately identifying them is crucial for studying cellular systems. The efficient discovery of protein complexes is a focus of research in bioinformatics. Most existing methods for protein complex identification are based on the structure of the protein-protein interaction (PPI) network, whereas some attempt to integrate biological information to enhance the features of the protein network for complex identification. Existing protein complex identification methods are unable to fully integrate network topology information with biological attribute information; most are based on homogeneous networks and cannot distinguish the importance of different attributes and protein nodes. To address these issues, a GO attribute Heterogeneous Attention network Embedding (GHAE) method based on heterogeneous protein information networks is proposed. First, GHAE incorporates Gene Ontology (GO) information into the PPI network, constructing a heterogeneous protein information network. Then, GHAE uses a dual attention mechanism and a heterogeneous graph convolutional representation learning method to learn protein features and identify protein complexes. The experimental results show that building heterogeneous protein information networks can fully integrate valuable biological information, and that the heterogeneous graph embedding learning method can simultaneously mine protein features and GO attributes, thereby improving the performance of protein complex identification.
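The dual attention mechanism described above, node-level attention over a node's neighbors followed by semantic-level attention over relation types, can be sketched as follows. This is an illustrative simplification under stated assumptions: dot-product scoring stands in for learned attention parameters, `beta` is a fixed semantic weight vector rather than a learned one, and the two relation types are PPI neighbors and GO-attribute neighbors.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def node_attention(target, neighbors):
    """Node-level attention: score each neighbor against the target node
    (dot product here), softmax the scores, and return the attention-weighted
    sum of neighbor features."""
    scores = np.array([target @ nb for nb in neighbors])
    alphas = softmax(scores)
    return (alphas[:, None] * np.stack(neighbors)).sum(axis=0)

def dual_attention_embedding(protein, ppi_neighbors, go_neighbors, beta):
    """Semantic-level attention: aggregate PPI neighbors and GO-attribute
    neighbors separately, then fuse the two views with softmax weights from
    beta, letting the model weigh the importance of each relation type."""
    z_ppi = node_attention(protein, ppi_neighbors)
    z_go = node_attention(protein, go_neighbors)
    w = softmax(np.asarray(beta, dtype=float))
    return w[0] * z_ppi + w[1] * z_go
```

The two softmax stages are what let the model distinguish the importance of individual neighbor nodes and of whole relation types, which homogeneous-network methods cannot do.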
Affiliation(s)
- Peixuan Zhou
- School of Information Science and Technology, Dalian Maritime University, Dalian, China
- Yijia Zhang
- School of Information Science and Technology, Dalian Maritime University, Dalian, China
- Zeqian Li
- School of Information Science and Technology, Dalian Maritime University, Dalian, China
- Kuo Pang
- School of Information Science and Technology, Dalian Maritime University, Dalian, China
- Di Zhao
- School of Computer Science and Engineering, Dalian Minzu University, Dalian, China
|