1
Zhang X, Li B, Qi G. A multi-featured expression recognition model incorporating attention mechanism and object detection structure for psychological problem diagnosis. Physiol Behav 2024; 280:114561. PMID: 38641188. DOI: 10.1016/j.physbeh.2024.114561. Received 03/15/2024; revised 04/15/2024; accepted 04/17/2024.
Abstract
Facial expression is the main cue for judging a person's emotional state and psychological condition, and predicting changes in facial expressions can effectively assess mental health, helping to avert serious psychological or psychiatric disorders caused by early negligence. From a computer vision perspective, most researchers have focused on facial expression analysis, and in some cases body posture is also considered. However, performance remains limited under unconstrained natural conditions, so more information must be brought into human emotion analysis. In this paper, we design an Adaptive Multi-End Fusion Attention Mechanism, built on a deep learning framework, for extracting human body information from expressions, postures, and the surrounding environment. We add it to an object detection model to obtain the required information from different regions of the human body and face, together with features of different sizes, and we use fusion networks for feature fusion and classification. Different test methods confirm that this fusion approach to expression recognition and prediction is feasible. The model achieves an average accuracy of 34.51% on the Emotic contextual expression recognition dataset.
Affiliation(s)
- Bingyi Li
- Dalian Minzu University, Liaoning Province, China
- Guobin Qi
- Dalian Minzu University, Liaoning Province, China
2
Cheng K. Prediction of emotion distribution of images based on weighted K-nearest neighbor-attention mechanism. Front Comput Neurosci 2024; 18:1350916. PMID: 38694951. PMCID: PMC11061417. DOI: 10.3389/fncom.2024.1350916. Received 12/06/2023; accepted 03/28/2024. Open access.
Abstract
Existing methods for classifying image emotions often overlook the subjective impact that emotions evoke in observers, focusing primarily on emotion categories. This approach falls short of practical needs because it neglects the nuanced emotional responses captured within an image. This study proposes a novel approach that employs a weighted K-nearest neighbor algorithm to predict the discrete distribution of emotion in abstract paintings. Initially, emotional features are extracted from the images and assigned varying K-values. Subsequently, an encoder-decoder architecture is used to derive sentiment features from abstract paintings, augmented by a pre-trained model to improve the classification model's generalization and convergence speed. By incorporating a blank attention mechanism into the decoder and integrating it with the encoder's output sequence, the semantics of abstract painting images are learned, facilitating precise and sensible emotional understanding. Experimental results demonstrate that the classification algorithm, using the attention mechanism, achieves a higher accuracy of 80.7% than current methods. This approach addresses the intricate challenge of discerning emotions in abstract paintings and underscores the importance of considering subjective emotional responses in image classification. Integrating techniques such as the weighted K-nearest neighbor algorithm and attention mechanisms holds promise for improving the comprehension and classification of emotional content in visual art.
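As a rough illustration of the distribution-prediction idea (not the paper's exact model), a weighted K-nearest-neighbor estimate can average the label distributions of the closest training samples, weighting each neighbor by inverse distance. The features and label distributions below are invented toy data:

```python
import math

def predict_emotion_distribution(query, features, distributions, k=5, eps=1e-8):
    """Weighted k-nearest-neighbor estimate of an emotion distribution:
    average the label distributions of the k closest training samples,
    weighting each neighbor by the inverse of its distance."""
    dists = [math.dist(query, f) for f in features]
    nearest = sorted(range(len(features)), key=lambda i: dists[i])[:k]
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(weights)
    n_classes = len(distributions[0])
    pred = [0.0] * n_classes
    for w, i in zip(weights, nearest):
        for c in range(n_classes):
            pred[c] += (w / total) * distributions[i][c]
    return pred

# toy example: four training images, three emotion categories
feats = [[0.0], [1.0], [2.0], [10.0]]
label_dists = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.0, 0.1, 0.9]]
pred = predict_emotion_distribution([0.5], feats, label_dists, k=2)
```

Because the weights are normalized, the prediction is itself a valid probability distribution, a convex combination of the neighbors' distributions.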
Affiliation(s)
- Kai Cheng
- School of Artificial Intelligence, Xidian University, Xi'an, China
3
Huang Q, Li M, Agustin D, Li L, Jha M. A Novel CNN Model for Classification of Chinese Historical Calligraphy Styles in Regular Script Font. Sensors (Basel) 2023; 24:197. PMID: 38203059. PMCID: PMC10781260. DOI: 10.3390/s24010197. Received 11/22/2023; revised 12/21/2023; accepted 12/25/2023.
Abstract
Chinese calligraphy, revered globally for its therapeutic and mindfulness benefits, encompasses styles such as regular (Kai Shu), running (Xing Shu), official (Li Shu), and cursive (Cao Shu) scripts. Beginners often start with the regular script before advancing to more intricate styles such as cursive. Each style, marked by unique historical contributions, requires learners to discern distinct nuances. The integration of AI into calligraphy analysis, collection, recognition, and classification is pivotal. This study introduces an innovative convolutional neural network (CNN) architecture, pioneering the application of CNNs to the classification of Chinese calligraphy. Focusing on the four principal calligraphy styles of the Tang dynasty (690-907 A.D.), this research spotlights the era when the traditional regular script font (Kai Shu) was refined. A comprehensive dataset of 8282 samples from the era's calligraphers, representing the zenith of the regular style, was compiled for CNN training and testing. The model distinguishes personal styles for classification and outperforms existing networks. Achieving 89.5-96.2% accuracy in calligraphy classification, our approach underscores the significance of CNNs in categorizing both font and artistic styles. This research paves the way for advanced studies of Chinese calligraphy and its cultural implications.
Affiliation(s)
- Qing Huang
- School of Education and the Arts, Central Queensland University, Rockhampton, QLD 4701, Australia
- Michael Li
- School of Engineering and Technology, Central Queensland University, Rockhampton, QLD 4701, Australia
- Dan Agustin
- Centre of Railway Engineering, School of Engineering and Technology, Central Queensland University, Rockhampton, QLD 4701, Australia
- Lily Li
- School of Engineering and Technology, Central Queensland University, Rockhampton, QLD 4701, Australia
- Meena Jha
- School of Engineering and Technology, Central Queensland University, Rockhampton, QLD 4701, Australia
4
Chen K, Wu Z, Huang J, Su Y. Self-Attention Mechanism-Based Head Pose Estimation Network with Fusion of Point Cloud and Image Features. Sensors (Basel) 2023; 23:9894. PMID: 38139739. PMCID: PMC10747419. DOI: 10.3390/s23249894. Received 11/15/2023; revised 12/09/2023; accepted 12/14/2023.
Abstract
Head pose estimation serves various applications, such as gaze estimation, fatigue driving detection, and virtual reality. Nonetheless, achieving precise and efficient predictions remains challenging owing to the reliance on a single data source. This study therefore introduces a multimodal feature fusion technique to raise head pose estimation accuracy. The proposed method amalgamates data from diverse sources, including RGB and depth images, to construct a comprehensive three-dimensional representation of the head, commonly referred to as a point cloud. The noteworthy innovations of this method include a residual multilayer perceptron structure within PointNet, designed to tackle gradient-related challenges, along with spatial self-attention mechanisms aimed at noise reduction. The enhanced PointNet and ResNet networks are used to extract features from the point clouds and images, and these features are then fused. Furthermore, a scoring module strengthens robustness, particularly in scenarios involving facial occlusion, by preserving features from the highest-scoring point cloud. Finally, a prediction module combines classification and regression methodologies to estimate head poses accurately. The proposed method improves the accuracy and robustness of head pose estimation, especially when the face is obstructed. These advancements are substantiated by experiments on the BIWI dataset, demonstrating the superiority of this method over existing techniques.
Affiliation(s)
- Kui Chen
- College of Civil Engineering, Hefei University of Technology, Hefei 230009, China
- Zhaofu Wu
- College of Civil Engineering, Hefei University of Technology, Hefei 230009, China
- Jianwei Huang
- College of Civil Engineering, Hefei University of Technology, Hefei 230009, China
- Key Laboratory of Geospatial Technology for the Middle and Lower Yellow River Regions, Henan University, Ministry of Education, Kaifeng 475004, China
- Yiming Su
- College of Civil Engineering, Hefei University of Technology, Hefei 230009, China
5
Feng S, Wang Y, Gong J, Li X, Li S. A fine-grained recognition technique for identifying Chinese food images. Heliyon 2023; 9:e21565. PMID: 38027727. PMCID: PMC10661202. DOI: 10.1016/j.heliyon.2023.e21565. Received 07/13/2023; revised 10/19/2023; accepted 10/24/2023. Open access.
Abstract
As a crucial area of research in computer vision, food recognition technology has become a core technology in many food-related fields closely tied to our healthy lives, such as unmanned restaurants and food nutrition analysis. Obtaining accurate classification results is the most important task in food recognition. Food classification is a fine-grained recognition process: it involves extracting features from a group of objects with similar appearances and accurately classifying them into different categories. In such a setting, the network must not only take in the overall image but also capture the subtle details within it. In addition, since Chinese food images have unique texture features, the model needs to extract texture information from the image; however, existing CNN methods have not focused on or processed this information. To classify food as accurately as possible, this paper introduces the Laplacian pyramid into the convolution layer and proposes a bilinear network that can perceive both image texture features and multi-scale features (LMB-Net). The proposed model was evaluated on a public dataset, and the results demonstrate that LMB-Net achieves state-of-the-art classification performance.
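The Laplacian pyramid mentioned above decomposes an image into band-pass texture layers plus a coarse low-pass residual. A minimal sketch, using a crude 2x2 box-filter downsample and nearest-neighbor upsample as stand-ins for the usual Gaussian filtering (an assumption, not LMB-Net's actual filters), might look like:

```python
def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (a crude low-pass filter)."""
    h, w = len(img), len(img[0])
    return [[(img[2*i][2*j] + img[2*i][2*j+1] +
              img[2*i+1][2*j] + img[2*i+1][2*j+1]) / 4.0
             for j in range(w // 2)] for i in range(h // 2)]

def upsample(img):
    """Double each dimension by nearest-neighbor repetition."""
    return [[img[i // 2][j // 2] for j in range(2 * len(img[0]))]
            for i in range(2 * len(img))]

def laplacian_pyramid(img, levels):
    """Return [L0, ..., L_{n-1}, G_n]: band-pass (texture) layers plus the
    coarsest low-pass residual. With these simple filters, summing each
    level with the upsampled coarser reconstruction recovers img exactly."""
    pyramid, current = [], img
    for _ in range(levels):
        smaller = downsample(current)
        up = upsample(smaller)
        pyramid.append([[current[i][j] - up[i][j] for j in range(len(current[0]))]
                        for i in range(len(current))])
        current = smaller
    pyramid.append(current)
    return pyramid

# tiny 4x4 "image" of floats
img = [[float((i * 4 + j) % 7) for j in range(4)] for i in range(4)]
pyr = laplacian_pyramid(img, 2)
```

The high-frequency layers `L0, L1, ...` carry exactly the fine texture detail that a plain CNN stem tends to smooth away, which is the motivation for feeding them into the convolution layers.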
Affiliation(s)
- Shuo Feng
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
- Yangang Wang
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
- Jianhong Gong
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
- Xiang Li
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
- Shangxuan Li
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
6
Hao N, Ruan S, Song Y, Chen J, Tian L. The establishment of a precise intelligent evaluation system for sports events: Diving. Heliyon 2023; 9:e21361. PMID: 37920483. PMCID: PMC10618775. DOI: 10.1016/j.heliyon.2023.e21361. Received 12/26/2022; revised 10/18/2023; accepted 10/19/2023. Open access.
Abstract
Introducing action quality assessment technology into sports events to achieve precise, intelligent evaluation can greatly enhance the objectivity and effectiveness of competition results. Taking diving as the specific application background, this study proposes a novel Multi-granularity Extraction Approach for Temporal-spatial features in judge scoring prediction (MEAT) under the conditions of action quality assessment. On one hand, it uses a dual-modal Inflated 3D ConvNet to extract the temporal and spatial features of each modality of the diving video in parallel at the video granularity, merging them into a global feature. On the other hand, the human body pose is modeled, and the simulated athlete's three-dimensional splash state is taken as a local feature at the object granularity. Finally, the global and local features are concatenated into the fully connected layer, and a heuristic method inspired by competition rules, using label distribution learning, outputs the probability distribution of the average score of all referees. The maximum-probability score is selected and multiplied by the difficulty coefficient to obtain the final diving score. Comprehensive experiments comparing Spearman's rank correlation (SRC) results against existing methods on the UNIV-Dive dataset show that this framework offers a clear accuracy advantage and lays a foundation for the practical deployment of the technology.
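The final scoring rule described above (take the maximum-probability average judge score and multiply it by the difficulty coefficient) can be sketched directly; the score bins, probabilities, and difficulty value below are invented for illustration:

```python
def final_diving_score(score_bins, probs, difficulty):
    """Pick the most probable average judge score from the predicted
    distribution and scale it by the dive's difficulty coefficient."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return score_bins[best] * difficulty

# hypothetical discretized average-score bins and a predicted distribution
bins = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0]
probs = [0.02, 0.05, 0.10, 0.18, 0.40, 0.20, 0.05]
score = final_diving_score(bins, probs, difficulty=3.2)
```

Here the mode of the distribution is 8.0, so the final score is 8.0 times the difficulty coefficient.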
Affiliation(s)
- Ning Hao
- School of Civil Engineering, Southeast University, Nanjing, 211189, China
- Sihan Ruan
- School of Civil Engineering, Southeast University, Nanjing, 211189, China
- School of Engineering, RMIT University, Melbourne, 3001, Australia
- Yiheng Song
- School of Civil Engineering, Southeast University, Nanjing, 211189, China
- School of Engineering, The University of Tokyo, Tokyo, 113-8654, Japan
- Jiashun Chen
- School of Computer Science and Engineering, Southeast University, Nanjing, 211189, China
- Longgang Tian
- School of Civil Engineering, Southeast University, Nanjing, 211189, China
7
An Q, Xu Y, Yu J, Tang M, Liu T, Xu F. Research on Safety Helmet Detection Algorithm Based on Improved YOLOv5s. Sensors (Basel) 2023; 23:5824. PMID: 37447673. PMCID: PMC10346515. DOI: 10.3390/s23135824. Received 05/11/2023; revised 06/19/2023; accepted 06/20/2023.
Abstract
Safety helmets are essential in various indoor and outdoor workplaces, such as metallurgical high-temperature operations and high-rise building construction, to avoid injuries and ensure safety in production. However, manual supervision is costly, prone to lapses in enforcement, and subject to interference from other human factors; moreover, detection of small target objects frequently lacks precision. Improving helmet detection algorithms can address these issues and is a promising approach. In this study, we propose a modified version of YOLOv5s, a lightweight deep-learning object detection network. The proposed model extends YOLOv5s and enhances its performance by recalculating the prediction frames, clustering with the IoU metric, and modifying the anchor frames with the K-means++ method. The global attention mechanism (GAM) and the convolutional block attention module (CBAM) were added to the YOLOv5s backbone and neck networks. By minimizing the loss of feature information and enhancing the representation of global interactions, these attention mechanisms strengthen the network's capacity for feature extraction. Furthermore, the CBAM is integrated into the CSP module to improve target feature extraction while minimizing computation. To significantly increase the efficiency and precision of prediction box regression, the proposed model additionally adopts the recent SIoU (SCYLLA-IoU) loss as the bounding box loss function. Finally, knowledge distillation is leveraged to make the improved YOLOv5s model lightweight, reducing its computational workload and improving detection speed to meet the needs of real-time monitoring. The experimental results demonstrate that the proposed model outperforms the original YOLOv5s in precision, recall, and mean average precision (mAP), and can more effectively identify helmet use in low-light conditions and at various distances.
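Anchor clustering with an IoU metric, as used above, can be sketched as k-means++ over (width, height) boxes with 1 - IoU as the distance; this is a generic illustration of the technique, not the authors' implementation, and the box sizes below are invented:

```python
import random

def iou_wh(a, b):
    """IoU of two boxes given as (w, h), both anchored at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeanspp_anchors(boxes, k, iters=20, seed=0):
    """Cluster (w, h) boxes using 1 - IoU as distance, with k-means++ seeding."""
    rng = random.Random(seed)
    centroids = [rng.choice(boxes)]
    while len(centroids) < k:                    # k-means++ seeding: sample ∝ d²
        d2 = [min((1 - iou_wh(b, c)) ** 2 for c in centroids) for b in boxes]
        r, acc, idx = rng.random() * sum(d2), 0.0, 0
        for i, d in enumerate(d2):
            acc += d
            if acc >= r:
                idx = i
                break
        centroids.append(boxes[idx])
    for _ in range(iters):                       # standard Lloyd updates
        clusters = [[] for _ in range(k)]
        for b in boxes:
            j = max(range(k), key=lambda c: iou_wh(b, centroids[c]))
            clusters[j].append(b)
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = (sum(b[0] for b in cl) / len(cl),
                                sum(b[1] for b in cl) / len(cl))
    return centroids

# invented ground-truth box sizes: a small group and a large group
boxes = [(10, 12), (11, 13), (50, 60), (55, 58), (30, 28)]
anchors = kmeanspp_anchors(boxes, k=2)
```

The IoU distance makes clustering scale-aware in the way detection cares about, unlike plain Euclidean distance on (w, h).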
Affiliation(s)
- Qing An
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
- Yingjian Xu
- School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430079, China
- Jun Yu
- USTC iFLYTEK Co., Ltd., Hefei 230088, China
- Miao Tang
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
- Tingting Liu
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
- Feihong Xu
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
8
Tian J, He G. Research on the construction of a collaborative ability evaluation system for the joint graduation design of new engineering specialty groups based on digital technology. Heliyon 2023; 9:e16855. PMID: 37332918. PMCID: PMC10275788. DOI: 10.1016/j.heliyon.2023.e16855. Received 12/26/2022; revised 05/24/2023; accepted 05/31/2023. Open access.
Abstract
Research on constructing a collaborative ability evaluation system for the joint graduation design of new engineering specialty groups based on digital technology holds great practical relevance. Based on a comprehensive analysis of the current state of joint graduation design among college graduates in China and elsewhere and of existing collaborative ability evaluation systems, and combined with the talent training program of the joint graduation design, this paper adopts the Delphi method and the analytic hierarchy process (AHP) to establish a hierarchical structure model of the collaborative ability evaluation system for joint graduation design. In this system, collaborative abilities in cognition, behavior, and emergency management serve as the criterion-level evaluation indices, while collaborative abilities regarding targets, knowledge, relationships, software, workflow, organization, culture, learning, and conflict serve as the index-level evaluation indices. Pairwise comparison judgment matrices of the evaluation indices are constructed at both the criterion level and the index level. By calculating the maximum eigenvalue and the corresponding eigenvector of each judgment matrix, the weights of the evaluation indices are obtained, and the indices are ranked. Finally, the related research content is evaluated. The results show that the key indicators to consider in the collaborative ability evaluation system for joint graduation design are easy to determine, and they provide a theoretical reference for reforming graduation design teaching for new engineering specialty groups.
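The AHP weight computation described here (the principal eigenvector of a pairwise comparison judgment matrix, with the maximum eigenvalue yielding a consistency index) can be sketched via power iteration. The 3x3 judgment matrix below is an invented, perfectly consistent example, not the paper's data:

```python
def ahp_weights(M, iters=100):
    """Priority weights of an AHP pairwise-comparison matrix: the normalized
    principal eigenvector of M, found by power iteration. Also returns the
    maximum eigenvalue and the consistency index CI = (λ_max - n)/(n - 1)."""
    n = len(M)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iters):
        v = [sum(M[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(v)              # since w sums to 1, sum(Mw) approximates λ_max
        w = [x / lam for x in v]  # renormalize so the weights sum to 1
    ci = (lam - n) / (n - 1)
    return w, lam, ci

# hypothetical reciprocal judgment matrix for three criteria
# (cognition vs behavior vs emergency-management collaboration)
M = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
weights, lam_max, ci = ahp_weights(M)
```

For this consistent matrix the weights come out proportional to 4:2:1 and CI is zero; in practice a CI-based consistency ratio above about 0.1 signals that the judgments should be revised.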
9
Chen S, Yang N. STMP-Net: A Spatiotemporal Prediction Network Integrating Motion Perception. Sensors (Basel) 2023; 23:5133. PMID: 37299860. DOI: 10.3390/s23115133. Received 05/04/2023; revised 05/22/2023; accepted 05/24/2023.
Abstract
This article proposes a video prediction network called STMP-Net that addresses the problem of the inability of Recurrent Neural Networks (RNNs) to fully extract spatiotemporal information and motion change features during video prediction. STMP-Net combines spatiotemporal memory and motion perception to make more accurate predictions. Firstly, a spatiotemporal attention fusion unit (STAFU) is proposed as the basic module of the prediction network, which learns and transfers spatiotemporal features in both horizontal and vertical directions based on spatiotemporal feature information and contextual attention mechanism. Additionally, a contextual attention mechanism is introduced in the hidden state to focus attention on more important details and improve the capture of detailed features, thus greatly reducing the computational load of the network. Secondly, a motion gradient highway unit (MGHU) is proposed by combining motion perception modules and adding them between adjacent layers, which can adaptively learn the important information of input features and fuse motion change features to significantly improve the predictive performance of the model. Finally, a high-speed channel is provided between layers to quickly transmit important features and alleviate the gradient vanishing problem caused by back-propagation. The experimental results show that compared with mainstream video prediction networks, the proposed method can achieve better prediction results in long-term video prediction, especially in motion scenes.
Affiliation(s)
- Suting Chen
- School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Ning Yang
- School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
10
Qiu Y, Wang S, Zhang S, Xu J. A novel gated dual convolutional neural network model with autoregressive method and attention mechanism for probabilistic load forecasting. Appl Intell 2023. DOI: 10.1007/s10489-023-04589-2.
11
Wang Y, Zhou W, Zhou J. 2DHeadPose: A simple and effective annotation method for the head pose in RGB images and its dataset. Neural Netw 2023; 160:50-62. PMID: 36621170. DOI: 10.1016/j.neunet.2022.12.021. Received 07/13/2022; revised 11/16/2022; accepted 12/28/2022.
Abstract
Head pose estimation is one of the essential tasks in computer vision; it predicts the Euler angles of the head in an image. In recent years, CNN-based methods for head pose estimation have achieved excellent performance. Their training relies on RGB images with labeled facial landmarks or on depth images from RGBD cameras. However, labeling facial landmarks is difficult for large-angle head poses in RGB images, and RGBD cameras are unsuitable for outdoor scenes. We propose a simple and effective annotation method for head pose in RGB images. This novel method uses a 3D virtual human head to simulate the head pose in the RGB image, and the Euler angles are calculated from the change in coordinates of the 3D virtual head. Using this annotation method, we create the 2DHeadPose dataset, which contains a rich set of attributes, dimensions, and angles. Finally, we propose Gaussian label smoothing to suppress annotation noise and reflect inter-class relationships, and we establish a baseline approach that uses it. Experiments demonstrate that our annotation method, dataset, and Gaussian label smoothing are very effective, and our baseline surpasses most current state-of-the-art methods. The annotation tool, dataset, and source code are publicly available at https://github.com/youngnuaa/2DHeadPose.
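Gaussian label smoothing for a continuous pose angle can be illustrated by replacing a one-hot bin label with a normalized Gaussian over the angle bins, so neighboring bins share probability mass in proportion to their angular closeness. The 3-degree bin spacing and sigma below are arbitrary choices for illustration, not the paper's settings:

```python
import math

def gaussian_soft_label(angle, bins, sigma=3.0):
    """Soft classification target for a continuous head-pose angle: a
    normalized Gaussian over the angle bins, centered on the ground truth."""
    weights = [math.exp(-0.5 * ((b - angle) / sigma) ** 2) for b in bins]
    total = sum(weights)
    return [w / total for w in weights]

bins = list(range(-90, 93, 3))          # yaw bins every 3 degrees, -90..90
target = gaussian_soft_label(17.0, bins, sigma=3.0)
```

Training against this soft target penalizes a prediction of 21 degrees less than one of 60 degrees, encoding the inter-class ordering that a one-hot label throws away.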
Affiliation(s)
- Yang Wang
- College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Wanlin Zhou
- College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Jiakai Zhou
- College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
12
Cross-language font style transfer. Appl Intell 2023. DOI: 10.1007/s10489-022-04375-6.
Abstract
In this paper, we propose a cross-language font style transfer system that can synthesize a new font after observing only a few samples from another language. Automatic font synthesis is a challenging task that has attracted much research interest. Most previous works addressed this problem by transferring the style of a given subset to the content of unseen characters; however, they considered font style transfer only within the same language. In many cases, we need to learn a font style from one language and then apply it to others, which existing methods struggle to accomplish because of the abstraction of style and the differences between languages. To address this problem, we designed the network with a multi-level attention structure to capture both local and global features of the font style. To validate the generative ability of our model, we constructed an experimental dataset of 847 fonts, each containing English and Chinese characters in the same style. Results show that users preferred the images generated by our model 80.3% of the time over those from state-of-the-art models.
13
Zhang J, Liu K, Yang X, Ju H, Xu S. Multi-label learning with Relief-based label-specific feature selection. Appl Intell 2023. DOI: 10.1007/s10489-022-04350-1.
14
Modeling opponent learning in multiagent repeated games. Appl Intell 2022. DOI: 10.1007/s10489-022-04249-x.
Abstract
Multiagent reinforcement learning (MARL) has been used extensively in game environments. One of the main challenges in MARL is that the environment of the agent system is dynamic and the other agents are also updating their strategies. Therefore, modeling the opponents' learning process and adopting specific strategies to shape that learning is an effective way to obtain better training results. Previous studies such as DRON, LOLA, and SOS approximated the opponent's learning process and demonstrated effective applications. However, these studies modeled only transient changes in opponent strategies and lacked stability in improving equilibrium efficiency. In this article, we design the MOL (modeling opponent learning) method based on the Stackelberg game. We use best response theory to approximate the opponents' preferences for different actions and to explore stable equilibria with higher rewards. We find that MOL achieves better results in several games with classical structures (the Prisoner's Dilemma, the Stackelberg Leader game, and Stag Hunt with three players) and in randomly generated bimatrix games. MOL performs well in competitive games against different opponents and converges to stable points that score above the Nash equilibrium in repeated game environments. These results may provide a reference for defining equilibrium in multiagent reinforcement learning systems and contribute to the design of learning objectives in MARL that avoid locally disadvantageous equilibria and improve overall efficiency.
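The best-response computation that such methods build on can be illustrated for a bimatrix game: given the opponent's mixed strategy, pick the action with maximum expected payoff. The Prisoner's Dilemma payoffs below are the usual textbook values, not taken from the paper:

```python
def best_response(payoff, opp_strategy):
    """Row player's best pure response to the opponent's mixed strategy.
    payoff[i][j] is the row player's payoff for (row action i, column action j)."""
    expected = [sum(p * q for p, q in zip(row, opp_strategy)) for row in payoff]
    return max(range(len(payoff)), key=lambda i: expected[i])

# Prisoner's Dilemma row payoffs: action 0 = cooperate, action 1 = defect
pd = [[3, 0],
      [5, 1]]
br = best_response(pd, [0.5, 0.5])   # defect strictly dominates here
```

In the Prisoner's Dilemma, defection is the best response to every opponent strategy, which is exactly why shaping the opponent's learning, rather than myopically best-responding, is needed to reach the higher-reward cooperative outcome.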
15
An Q, Wang H, Chen X. EPSDNet: Efficient Campus Parking Space Detection via Convolutional Neural Networks and Vehicle Image Recognition for Intelligent Human-Computer Interactions. Sensors (Basel) 2022; 22:9835. PMID: 36560202. PMCID: PMC9781189. DOI: 10.3390/s22249835. Received 11/16/2022; revised 12/06/2022; accepted 12/12/2022.
Abstract
The parking problem, caused by a low parking space utilization ratio, has always plagued drivers. In this work, we propose an intelligent detection method based on deep learning. First, we constructed a TensorFlow deep learning platform for detecting vehicles. Second, the optimal time interval for extracting video stream images was determined according to the time needed to judge whether a space is free and the length of time a vehicle stays from arrival to departure. Finally, the parking space order and number were obtained with the data layering method and the TimSort algorithm, and parking space vacancy was judged via the indirect Monte Carlo method. To improve the detection accuracy between vehicles and parking spaces, the distance between vehicles in the training dataset was greater than that of the vehicles observed during detection. A case study verified the reliability of the parking space order and number and of the vacancy judgments.
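A simple Monte Carlo vacancy check, offered here as a generic stand-in since the paper's "indirect Monte Carlo method" is not detailed, samples random points inside a parking space and measures the fraction covered by detected vehicle boxes. The coordinates and the 0.3 threshold below are invented:

```python
import random

def occupancy_ratio(space, vehicles, samples=2000, seed=0):
    """Monte Carlo estimate of the fraction of a parking space covered by
    detected vehicles: sample uniform points in the space, count how many
    fall inside any detection box. Boxes are (x1, y1, x2, y2)."""
    rng = random.Random(seed)
    x1, y1, x2, y2 = space
    hits = 0
    for _ in range(samples):
        px = rng.uniform(x1, x2)
        py = rng.uniform(y1, y2)
        if any(vx1 <= px <= vx2 and vy1 <= py <= vy2
               for vx1, vy1, vx2, vy2 in vehicles):
            hits += 1
    return hits / samples

space = (0, 0, 10, 20)
vehicles = [(0, 0, 10, 10)]              # one detection covering half the space
ratio = occupancy_ratio(space, vehicles)
vacant = ratio < 0.3                     # the threshold is an arbitrary choice
```

Sampling sidesteps the geometry of computing exact polygon overlaps, at the cost of a small, controllable estimation error.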
Affiliation(s)
- Qing An
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
- Haojun Wang
- School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430070, China
- Xijiang Chen
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
16
An Q, Wu S, Shi R, Wang H, Yu J, Li Z. Intelligent Detection of Hazardous Goods Vehicles and Determination of Risk Grade Based on Deep Learning. Sensors (Basel) 2022; 22:7123. PMID: 36236221. PMCID: PMC9571748. DOI: 10.3390/s22197123. Received 07/15/2022; revised 08/26/2022; accepted 08/26/2022.
Abstract
Deep learning is now widely applied in object detection, and some scholars have applied it to vehicle detection. In this paper, the deep learning EfficientDet model is analyzed, and its advantages for detecting hazardous goods vehicles are identified. An adaptive training model is built by optimizing the training process, and the trained model is used to detect hazardous goods vehicles. The detection results are compared with Cascade R-CNN and CenterNet, showing that the proposed method is superior to the other two in both computational complexity and detection accuracy, and that it is suitable for detecting hazardous goods vehicles in different scenarios. We then compile statistics on the number of hazardous goods vehicles detected at different times and places, and the risk grade of each location is determined from these statistics. Finally, a case study shows that the proposed method can detect hazardous goods vehicles and determine the risk level of different places.
Affiliation(s)
- Qing An
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
- Shisong Wu
- China Railway Wuhan Survey and Design Institute Co., Ltd., Building E5, Optics Valley Software Park, No. 1, Guanshan Avenue, Donghu High-Tech Zone, Wuhan 430050, China
- Ruizhe Shi
- School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430070, China
- Correspondence:
- Haojun Wang
- School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430070, China
- Jun Yu
- USTC iFLYTEK Co., Ltd., Hefei 230088, China
- Zhifeng Li
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
17
Hu M, Wei Y, Li M, Yao H, Deng W, Tong M, Liu Q. Bimodal Learning Engagement Recognition from Videos in the Classroom. Sensors (Basel) 2022; 22:s22165932. PMID: 36015693; PMCID: PMC9415674; DOI: 10.3390/s22165932.
Abstract
Engagement plays an essential role in the learning process. Recognizing learning engagement in the classroom helps us understand students' learning states and optimize the teaching and study processes. Traditional recognition methods such as self-report and teacher observation are too time-consuming and obtrusive to meet the needs of large-scale classrooms. With the development of big data analysis and artificial intelligence, applying intelligent methods such as deep learning to recognize learning engagement has become a research hotspot in education. In this paper, based on non-invasive classroom videos, a multi-cue classroom learning engagement database was first constructed. Then, we introduced the power IoU loss function into You Only Look Once version 5 (YOLOv5) to detect students, obtaining a precision of 95.4%. Finally, we designed a bimodal learning engagement recognition method based on ResNet50 and CoAtNet, which obtained an accuracy of 93.94% using a KNN classifier. The experimental results confirm that the proposed method outperforms most state-of-the-art techniques.
Affiliation(s)
- Meijia Hu
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
- Huanggang High School of Hubei Province, Huanggang 438000, China
- Yantao Wei
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
- Mengsiying Li
- School of Management, Wuhan College, Wuhan 430212, China
- Huang Yao
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
- Wei Deng
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
- Mingwen Tong
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
- Qingtang Liu
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
18
Wang H, Xie X, Zhou L. Transform networks for cooperative multi-agent deep reinforcement learning. Appl Intell 2022. DOI: 10.1007/s10489-022-03924-3.
19
Event-Related Potentials during Verbal Recognition of Naturalistic Neutral-to-Emotional Dynamic Facial Expressions. Appl Sci (Basel) 2022. DOI: 10.3390/app12157782.
Abstract
Event-related potentials during facial emotion recognition have been studied for more than twenty years, and there has recently been growing interest in the use of naturalistic stimuli. This research therefore aimed to study event-related potentials (ERP) during the recognition of dynamic neutral-to-emotional facial expressions, which are more ecologically valid than static faces. We recorded the ERP of 112 participants who watched 144 dynamic morphs depicting a gradual change from a neutral expression to a basic emotional expression (anger, disgust, fear, happiness, sadness or surprise) and labelled those emotions verbally. We observed typical ERP components, including the N170, P2, EPN and LPP. Participants with lower accuracy exhibited a larger posterior P2, while participants with faster correct responses exhibited larger P2 and LPP amplitudes. A classification analysis based on the amplitudes of the posterior P2 and LPP predicted which participants recognise emotions quickly, with an accuracy of 76%. These results extend previous findings on the electroencephalographic correlates of facial emotion recognition.
20
Requirements Engineering for Internet of Things (IoT) Software Systems Development: A Systematic Mapping Study. Appl Sci (Basel) 2022. DOI: 10.3390/app12157582.
Abstract
The Internet of Things (IoT) paradigm is growing, affecting human life and aiming to solve real-world problems in areas such as education, healthcare, smart homes, and intelligent transportation. However, developing IoT systems is more complicated than developing traditional software systems, especially with respect to requirements engineering (RE). RE is not applied frequently to IoT systems because of their broad scope, such as the variety of user needs, which makes these systems difficult to construct. In this sense, RE for IoT-based systems has not been well explored by the research community, and well-planned proposals to improve the quality of such systems are lacking. In this work, we present a comprehensive and inclusive review of RE for IoT-based systems. To accomplish this, a systematic mapping study (SMS) is presented to evaluate the existing literature. SMS is a methodology originally used in medical research that has recently been adopted in software engineering (SE) to sort and organize research publications, track progress, and identify research gaps. In this article, we classify the existing publications on RE proposals for IoT software systems and review their implications for future research. This makes it possible to establish lines of research aimed at improving the quality of future IoT systems.
21
The Application of Adaptive Tolerance and Serialized Facial Feature Extraction to Automatic Attendance Systems. Electronics 2022. DOI: 10.3390/electronics11142278.
Abstract
The aim of this study was to develop a real-time automatic attendance system (AAS) based on Internet of Things (IoT) technology and facial recognition. A Raspberry Pi camera built into a Raspberry Pi 3B is used to transfer facial images to a cloud server. Face detection and recognition libraries are implemented on this cloud server, which can thus handle all the processes involved in automatically recording student attendance. In addition, this study proposes the application of data serialization and an adaptive tolerance for the Euclidean distance. The facial features encountered are processed using data serialization before being saved in an SQLite database; such serialized data can easily be written to and then read back from the database. When comparing the facial features already stored in the SQLite database with newly captured facial features, the proposed adaptive tolerance improves the performance of Euclidean-distance-based facial recognition. The results show that the proposed AAS can recognize multiple faces and record attendance automatically. It can help detect students who attempt to skip classes without their teachers' knowledge, and it resolves both the problem of absent students being unintentionally marked present and the problem of proxies.
22
An Improved Tiered Head Pose Estimation Network with Self-Adjust Loss Function. Entropy 2022; 24:e24070974. PMID: 35885197; PMCID: PMC9320982; DOI: 10.3390/e24070974.
Abstract
As an important task in computer vision, head pose estimation has been widely applied in both academia and industry. However, two challenges remain in the field: (1) even for the same task (e.g., tiredness detection), existing algorithms usually treat the estimation of the three angles (roll, yaw, and pitch) as separate facets, disregarding their interplay as well as their differences, and thus share the same parameters across all layers; and (2) discontinuity in angle estimation reduces accuracy. To solve these two problems, a THESL-Net (tiered head pose estimation with self-adjust loss network) model is proposed in this study. First, stepped estimation using distinct network layers is proposed, allowing greater freedom during angle estimation. Furthermore, the causes of the discontinuity in angle estimation are identified: not only labeling the dataset with quaternions or Euler angles, but also loss functions that simply add the classification and regression losses. A self-adjustment constraint is therefore applied to the loss function, making angle estimation more consistent. Finally, to examine the influence of different angle ranges on the proposed model, experiments are conducted on three popular public benchmark datasets, BIWI, AFLW2000, and UPNA, demonstrating that the proposed model outperforms state-of-the-art approaches.
23
Exploring the Effects of Caputo Fractional Derivative in Spiking Neural Network Training. Electronics 2022. DOI: 10.3390/electronics11142114.
Abstract
Fractional calculus is an emerging topic in artificial neural network training, especially when gradient-based methods are used. This paper brings the idea of fractional derivatives to spiking neural network training using Caputo derivative-based gradient calculation. We conduct an extensive investigation of performance improvements via a case study of small-scale networks using derivative orders in the unit interval. Using particle swarm optimization, we provide an example of treating the derivative order as an optimizable hyperparameter to find viable values for it. Using multiple benchmark datasets, we empirically show that there is no single generally optimal derivative order; rather, this value is data-dependent. However, statistics show that a range of derivative orders can be determined in which the Caputo derivative outperforms first-order gradient descent with high confidence. Improvements in convergence speed and training time are also examined and explained by reformulating Caputo derivative-based training as an adaptive weight normalization technique.
24
Li X, Wei Y, Wang C, Hu Q, Liu C. Contextual-wise discriminative feature extraction and robust network learning for subcortical structure segmentation. Appl Intell 2022. DOI: 10.1007/s10489-022-03848-y.
25
Chen X, Yang R, Guo C, Zhang Q. FOC winding defect detection based on improved texture features and low-rank representation model. Appl Opt 2022; 61:5599-5607. PMID: 36255787; DOI: 10.1364/ao.453251.
Abstract
The detection of winding defects in fiber-optic coils (FOCs) plays an important role in quality control during FOC production. To overcome the poor performance and low reliability of existing methods, this paper provides a solution for FOC winding defect detection based on low-rank representation (LRR). First, we design a feature matrix to represent the image. The LRR model is then employed to formulate defect detection as a problem of low-rank and sparse matrix decomposition, and Laplacian regularization is introduced as a smoothness constraint to enlarge the distance between defect regions and the low-rank background. Experiments on a real dataset show that the proposed method achieves the highest detection accuracy and the lowest false alarm rate compared with other methods, verifying its effectiveness.
26
A cascaded spatiotemporal attention network for dynamic facial expression recognition. Appl Intell 2022. DOI: 10.1007/s10489-022-03781-0.
27
Hu K, Jin J, Zheng F, Weng L, Ding Y. Overview of behavior recognition based on deep learning. Artif Intell Rev 2022. DOI: 10.1007/s10462-022-10210-8.
28
Security Risk Intelligent Assessment of Power Distribution Internet of Things via Entropy-Weight Method and Cloud Model. Sensors (Basel) 2022; 22:s22134663. PMID: 35808160; PMCID: PMC9268771; DOI: 10.3390/s22134663.
Abstract
The current power distribution Internet of Things (PDIoT) lacks security protection terminals and techniques. Its network security has a large exposure surface that can be attacked from multiple paths, and PDIoT terminals suffer from many security vulnerabilities and weak protection capabilities. It is therefore crucial to assess the security of PDIoT scientifically; however, traditional security assessment methods are relatively subjective and ambiguous. To address these problems, we propose to assess the security risk of PDIoT using the entropy-weight method and cloud model theory. We first analyze the security risk factors of PDIoT systems and establish a three-layer security evaluation index system comprising a perception layer, a network layer, and an application layer, with three first-level indicators and sixteen second-level indicators. The entropy-weight method is then used to optimize the weight of each index, and cloud model theory is employed to calculate the affiliation degree and eigenvalue of each evaluation index. Based on a comprehensive analysis of all evaluation indexes, the security level of PDIoT is obtained. Taking the PDIoT of the Meizhou Power Supply Bureau of Guangdong Power Grid as an empirical test case, the evaluation results are consistent with the actual situation, which shows that the proposed method is effective and feasible.
29
Intelligent Scheduling Methodology for UAV Swarm Remote Sensing in Distributed Photovoltaic Array Maintenance. Sensors (Basel) 2022; 22:s22124467. PMID: 35746248; PMCID: PMC9229532; DOI: 10.3390/s22124467.
Abstract
In recent years, unmanned aerial vehicle (UAV) remote sensing technology has been widely used in the planning, design and maintenance of urban distributed photovoltaic arrays (UDPA). However, existing studies rarely address the UAV swarm scheduling problem that arises when remote sensing is applied to UDPA maintenance. In this study, a novel scheduling model and algorithm for UAV swarm remote sensing in UDPA maintenance are developed. First, the UAV swarm scheduling tasks are formulated as a large-scale global optimization (LSGO) problem in which the constraints are defined as penalty functions. Second, an adaptive multiple variable-grouping optimization strategy, including adaptive random grouping, UAV grouping and task grouping, is developed. Finally, a novel evolutionary algorithm, cooperatively coevolving particle swarm optimization with adaptive multiple variable-grouping and context-vector crossover/mutation strategies (CCPSO-mg-cvcm), is developed to optimize the aforementioned scheduling model effectively. The case study shows that CCPSO-mg-cvcm significantly outperforms existing algorithms, and that UAV swarm remote sensing in large-scale UDPA maintenance can be optimally scheduled by the developed methodology.
30
Multi-Output Sequential Deep Learning Model for Athlete Force Prediction on a Treadmill Using 3D Markers. Appl Sci (Basel) 2022. DOI: 10.3390/app12115424.
Abstract
Reliable and innovative methods for estimating forces are critical to biomechanical sports research; with them, athletes can improve their performance and technique and reduce the likelihood of fractures and other injuries. Throughout this project, we therefore investigated the use of video in biomechanics. We propose an RNN trained on a biomechanical dataset of regular runners that includes both kinematics and kinetics, allowing continuous-variable predictions across the body to be analyzed and interpreted. The dataset marks different anatomical and reflective points (96 in total, 32 per dimension) that allow the prediction of forces (N) in three dimensions (Fx, Fy, Fz), measured on a treadmill with a force plate at different velocities (2.5 m/s, 3.5 m/s, 4.5 m/s). To obtain the best model, a grid search was run over combinations of layer types (Simple, GRU, LSTM), loss functions (MAE, MSE, MSLE), and sampling techniques (down-sampling, up-sampling); the best-performing model (LSTM, MSE, down-sampling) achieved an average coefficient of determination of 0.68, rising to 0.92 when Fz was excluded.
31
Salient and consensus representation learning based incomplete multiview clustering. Appl Intell 2022. DOI: 10.1007/s10489-022-03530-3.
32
Optimal Query Expansion Based on Hybrid Group Mean Enhanced Chimp Optimization Using Iterative Deep Learning. Electronics 2022. DOI: 10.3390/electronics11101556.
Abstract
The internet is awash with uncertain information, which necessitates natural language processing and soft computing techniques to extract relevant documents. Relevant results are retrieved using query expansion, which in the existing literature is mainly formulated using machine learning or deep learning. This paper presents a hybrid group mean-based optimizer-enhanced chimp optimization (GMBO-ECO) algorithm for pseudo-relevance-based query expansion, whereby the actual queries are expanded with related keywords. The hybrid GMBO-ECO algorithm mainly expands the query with terms that have a strong interrelationship with the actual query. To generate word embeddings, the Word2Vec paradigm is used, which learns word associations from large text corpora. The useful context in the text is identified using an improved iterative deep learning framework that determines the user's intent for the current web search; this step reduces word mismatch and improves query retrieval performance. Weak terms are eliminated, and the candidate terms for optimal query expansion are refined via an Okapi measure and cosine similarity. The proposed methodology has been compared to state-of-the-art methods with and without query expansion, and the proposed optimal query expansion technique shows a substantial improvement, with a normalized discounted cumulative gain of 0.87, a mean average precision of 0.35, and a mean reciprocal rank of 0.95. The experimental results show the efficiency of the proposed methodology in retrieving appropriate responses for information retrieval; its most common application is in search engines.
33
Zhang Y, Chen W. Decision-level information fusion powered human pose estimation. Appl Intell 2022. DOI: 10.1007/s10489-022-03623-z.
34
BenSignNet: Bengali Sign Language Alphabet Recognition Using Concatenated Segmentation and Convolutional Neural Network. Appl Sci (Basel) 2022. DOI: 10.3390/app12083933.
Abstract
Sign language recognition is one of the most challenging applications in machine learning and human-computer interaction. Many researchers have developed classification models for different sign languages such as English, Arabic, Japanese, and Bengali; however, little research has addressed generalization performance across different datasets. Most prior work achieved satisfactory performance on a small dataset, and such models may fail to replicate that performance on different and larger datasets. In this context, this paper proposes a novel method for recognizing Bengali sign language (BSL) alphabets that overcomes the generalization issue. The proposed method has been evaluated on three benchmark datasets: '38 BdSL', 'KU-BdSL', and 'Ishara-Lipi'. Three steps are followed: segmentation, augmentation, and convolutional neural network (CNN)-based classification. First, a concatenated segmentation approach combining YCbCr, HSV and the watershed algorithm was designed to accurately identify gesture signs. Second, seven image augmentation techniques were selected to increase the training data size without changing the semantic meaning. Finally, the CNN-based model, BenSignNet, was applied to extract features and perform classification. The model achieved accuracies of 94.00%, 99.60%, and 99.60% on the 38 BdSL, KU-BdSL, and Ishara-Lipi datasets, respectively. Experimental findings confirm that the proposed method achieves a higher recognition rate than conventional methods and generalizes across all datasets in the BSL domain.
35
An Q, Chen X, Zhang J, Shi R, Yang Y, Huang W. A Robust Fire Detection Model via Convolution Neural Networks for Intelligent Robot Vision Sensing. Sensors (Basel) 2022; 22:s22082929. PMID: 35458913; PMCID: PMC9025736; DOI: 10.3390/s22082929.
Abstract
Accurate fire identification can help to control fires. Traditional fire detection methods are mainly based on temperature or smoke detectors, which are susceptible to damage or interference from the outside environment. Meanwhile, most current deep learning methods discriminate poorly between dynamic fires and have lower detection precision when a fire changes. Therefore, we propose a dynamic convolution YOLOv5 fire detection method using video sequences. Our method first uses the K-means++ algorithm to optimize anchor box clustering, which significantly reduces the classification error rate. Then, dynamic convolution is introduced into the convolution layers of YOLOv5. Finally, the network heads of YOLOv5's neck and head are pruned to improve detection speed. Experimental results verify that the proposed dynamic convolution YOLOv5 fire detection method outperforms YOLOv5 in recall, precision and F1-score. In particular, compared with three other deep learning methods, the precision of the proposed algorithm is improved by 13.7%, 10.8% and 6.1%, respectively, while the F1-score is improved by 15.8%, 12% and 3.8%, respectively. The method described in this paper is applicable not only to short-range indoor fire identification but also to long-range outdoor fire detection.
Affiliation(s)
- Qing An
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
- Xijiang Chen
- School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430079, China
- Junqian Zhang
- School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430079, China
- Ruizhe Shi
- School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430079, China
- Yuanjun Yang
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
- Wei Huang
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan 430223, China
36
Zheng B, Zhu Y, Shi Q, Yang D, Shao Y, Xu T. MA-Net: Mutex attention network for COVID-19 diagnosis on CT images. Appl Intell 2022; 52:18115-18130. PMID: 35431458; PMCID: PMC8994185; DOI: 10.1007/s10489-022-03431-5.
Abstract
COVID-19 is an infectious pneumonia caused by 2019-nCoV, and the numbers of newly confirmed cases and deaths remain high. RT-PCR is the gold standard for COVID-19 diagnosis, but computed tomography (CT) imaging is an important auxiliary diagnostic tool. In this paper, a deep learning network, the mutex attention network (MA-Net), is proposed for COVID-19 auxiliary diagnosis on CT images. Using positive and negative samples as mutex inputs, the proposed network combines a mutex attention block (MAB) and a fusion attention block (FAB) for the diagnosis of COVID-19. The MAB uses the distance between mutex inputs as a weight to make features more distinguishable, yielding preferable diagnostic results, while the FAB fuses features to obtain more representative ones. In particular, an adaptive-weight multi-loss function is proposed for better performance. The accuracy, specificity and sensitivity reached 98.17%, 97.25% and 98.79%, respectively, on COVID-19 dataset-A, provided by the Affiliated Medical College of Qingdao University. State-of-the-art results were also achieved on three other public COVID-19 datasets. The results show that, compared with other methods, the proposed network provides effective auxiliary information for the diagnosis of COVID-19 on CT images.
Affiliation(s)
- BingBing Zheng
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Yu Zhu
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Shanghai Engineering Research Center of Internet of Things for Respiratory Medicine, Shanghai 200237, China
- Qin Shi
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Dawei Yang
- Shanghai Engineering Research Center of Internet of Things for Respiratory Medicine, Shanghai 200237, China
- Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Yanmei Shao
- Department of Pulmonary and Critical Care Medicine, the Affiliated Hospital of Qingdao University, Qingdao, Shandong 266000, China
- Tao Xu
- Department of Pulmonary and Critical Care Medicine, the Affiliated Hospital of Qingdao University, Qingdao, Shandong 266000, China
37
Priyanka NS, Lal S, Nalini J, Reddy CS, Dell’Acqua F. DIResUNet: Architecture for multiclass semantic segmentation of high resolution remote sensing imagery data. Appl Intell 2022. DOI: 10.1007/s10489-022-03310-z.
38
Bottom-up improved multistage temporal convolutional network for action segmentation. Appl Intell 2022. DOI: 10.1007/s10489-022-03382-x.
39
Golwalkar R, Mehendale N. Masked-face recognition using deep metric learning and FaceMaskNet-21. Appl Intell 2022; 52:13268-13279. PMID: 35233149; PMCID: PMC8874736; DOI: 10.1007/s10489-021-03150-3.
Abstract
The coronavirus disease 2019 (COVID-19) has made it mandatory for people all over the world to wear facial masks to prevent the spread of the virus. Conventional face recognition systems used for security have become ineffective in this situation, since a face mask covers most of the important facial features, such as the nose and mouth, making it very difficult to recognize the person. We propose a system that uses deep metric learning and our own FaceMaskNet-21 deep learning network to produce 128-d encodings that support face recognition from static images, live video streams, and static video files. We achieved a testing accuracy of 88.92% with an execution time of less than 10 ms. The system's ability to perform masked face recognition in real time makes it suitable for recognizing people in CCTV footage in places such as malls, banks, and ATMs. Owing to its fast performance, it can be used for attendance in schools and colleges, as well as in banks and other high-security zones to grant access only to authorized persons without asking them to remove their masks.
40
Yi XL, Hua R, Fu Y, Zheng DL, Wang ZY. RNIC: A retrospect network for image captioning. Soft Comput 2022. DOI: 10.1007/s00500-021-06622-3.
41
42
Xu L, Diao Z, Wei Y. Non-linear target trajectory prediction for robust visual tracking. Appl Intell 2021. DOI: 10.1007/s10489-021-02829-x.