1. Salau AO, Tamiru NK, Abeje BT. Derived Amharic alphabet sign language recognition using machine learning methods. Heliyon 2024;10:e38265. [PMID: 39386773] [PMCID: PMC11462330] [DOI: 10.1016/j.heliyon.2024.e38265]
Abstract
Hearing-impaired people use sign language to communicate with those who have no hearing disability, so communicating with them is difficult without a signer or knowledge of sign language. Technologies that understand sign language are therefore required to bridge the communication gap between those who have hearing impairments and those who do not. Ethiopian Amharic alphabet sign language (EAMASL) differs from the sign languages of other countries because Amharic, spoken in Ethiopia, has a large number of complex alphabets. To date, only a few studies on EAMASL have been conducted in Ethiopia, and previous works covered only the basic and a few derived Amharic alphabet signs. To address this gap, this paper proposes machine learning techniques that use Support Vector Machine (SVM) classification with Convolutional Neural Network (CNN) features, Histogram of Oriented Gradients (HOG) features, and their hybrid to recognize the remaining derived Amharic alphabet signs. Because CNNs handle rotation and translation of signs well, and HOG works well for low-quality data under strong illumination variation and with small amounts of training data, the two were combined for feature extraction. In addition to SVM, CNN (Softmax) was used as a classifier for the normalized hybrid features. The SVM model achieved accuracies of 89.02%, 95.42%, 97.40%, and 93.61% with CNN, HOG, normalized hybrid, and non-normalized hybrid feature vectors, respectively, under 10-fold cross-validation. With the normalized hybrid features, the other classifier, CNN (Softmax), produced 93.55% accuracy.
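The following sketch illustrates the kind of pipeline the abstract describes: HOG descriptors and CNN features are concatenated, normalized, and fed to an SVM evaluated with 10-fold cross-validation. It is not the authors' code; the CNN layout, HOG parameters, and the toy data are illustrative assumptions.

```python
# Hybrid HOG + CNN features fed to an SVM; a minimal sketch, not the authors' code.
import numpy as np
from skimage.feature import hog
from sklearn.preprocessing import normalize
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from tensorflow import keras

def build_cnn_extractor(input_shape=(64, 64, 1), feat_dim=128):
    """Small CNN whose last dense layer is used as a learned feature extractor."""
    inp = keras.Input(shape=input_shape)
    x = keras.layers.Conv2D(32, 3, activation="relu")(inp)
    x = keras.layers.MaxPooling2D()(x)
    x = keras.layers.Conv2D(64, 3, activation="relu")(x)
    x = keras.layers.GlobalAveragePooling2D()(x)
    feats = keras.layers.Dense(feat_dim, activation="relu", name="feats")(x)
    return keras.Model(inp, feats)

def hybrid_features(images, cnn):
    """Concatenate HOG descriptors with CNN features for each grayscale image."""
    hog_feats = np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                              cells_per_block=(2, 2)) for img in images])
    cnn_feats = cnn.predict(images[..., np.newaxis], verbose=0)
    return normalize(np.hstack([hog_feats, cnn_feats]))  # L2-normalized hybrid vectors

if __name__ == "__main__":
    # Toy stand-in for the sign-image dataset: 40 random 64x64 grayscale images, 4 classes.
    rng = np.random.default_rng(0)
    images = rng.random((40, 64, 64)).astype("float32")
    labels = np.repeat(np.arange(4), 10)
    X = hybrid_features(images, build_cnn_extractor())
    scores = cross_val_score(SVC(kernel="rbf"), X, labels, cv=10)
    print("10-fold accuracy:", scores.mean())
```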
Affiliation(s)
- Ayodeji Olalekan Salau
- Department of Electrical/Electronics and Computer Engineering, Afe Babalola University, Ado-Ekiti, Nigeria
- Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamil Nadu, India
- Nigus Kefyalew Tamiru
- Department of Electrical and Computer Engineering, Debre Markos University, Debre Markos, Ethiopia
- Bekalu Tadele Abeje
- Bahir Dar Institute of Technology, Department of Computer Science, Bahir Dar, Amhara, Ethiopia
- Department of Information Technology, Haramaya University, Dire Dawa, Ethiopia
2. Deepika D, Rekha G. A hybrid capsule attention-based convolutional bi-GRU method for multi-class mental task classification based brain-computer interface. Comput Methods Biomech Biomed Engin 2024:1-17. [PMID: 39397592] [DOI: 10.1080/10255842.2024.2410221]
Abstract
Electroencephalography analysis is critical for brain-computer interface research. The primary goal of a brain-computer interface is to establish communication between impaired people and others via brain signals. The classification of multi-level mental activities with a brain-computer interface has recently become more difficult, which affects classification accuracy. Several deep learning-based techniques have nevertheless attempted to identify mental tasks from multidimensional data. This study introduces a hybrid capsule attention-based convolutional bidirectional gated recurrent unit model as a hybrid deep learning technique for multi-class mental task categorization. Initially, the acquired electroencephalography data are pre-processed with a digital low-pass Butterworth filter and a discrete wavelet transform to remove disturbances. The spectrally adaptive common spatial pattern is used to extract characteristics from the pre-processed electroencephalography data. The retrieved features are then fed into the proposed classification model, which extracts deep features and classifies the mental tasks. To improve classification results, the model's parameters are fine-tuned using a dung beetle optimization approach. Finally, the proposed classifier is assessed on several types of mental task classification using the provided dataset. The simulation results are compared with existing state-of-the-art techniques in terms of accuracy, precision, recall, etc. The accuracy obtained using the proposed approach is 97.87%, which is higher than that of the other existing methods.
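A minimal sketch of the pre-processing stage named in the abstract (a digital low-pass Butterworth filter followed by DWT denoising); the cut-off frequency, wavelet, and decomposition level are illustrative assumptions, not values from the paper.

```python
# Butterworth low-pass filtering plus discrete-wavelet-transform denoising of one EEG channel.
import numpy as np
import pywt
from scipy.signal import butter, filtfilt

def butter_lowpass(signal, cutoff_hz=40.0, fs=256.0, order=4):
    b, a = butter(order, cutoff_hz / (fs / 2.0), btype="low")
    return filtfilt(b, a, signal)              # zero-phase low-pass filtering

def dwt_denoise(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise estimate from finest scale
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))      # universal threshold
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

if __name__ == "__main__":
    fs = 256.0
    t = np.arange(0, 4, 1 / fs)
    raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)  # toy EEG channel
    clean = dwt_denoise(butter_lowpass(raw, fs=fs))
    print(raw.shape, clean.shape)
```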
Affiliation(s)
- D Deepika
- Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Hyderabad, Telangana, 500075, India
- Department of Computer Science and Engineering, Mahatma Gandhi Institute of Technology, Hyderabad, Telangana, 500075, India
- G Rekha
- Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Hyderabad, Telangana, 500075, India
3. Zhang J, Bu X, Wang Y, Dong H, Zhang Y, Wu H. Sign language recognition based on dual-path background erasure convolutional neural network. Sci Rep 2024;14:11360. [PMID: 38762676] [PMCID: PMC11102471] [DOI: 10.1038/s41598-024-62008-z]
Abstract
Sign language is an important way to provide expressive information to people with hearing and speaking disabilities, so sign language recognition has always been an important research topic. However, many current sign language recognition systems require complex deep models and rely on expensive sensors, which limits their application scenarios. To address this issue, this study proposes a lightweight, computer-vision-based, dual-path background erasing deep convolutional neural network (DPCNN) model for sign language recognition. The DPCNN consists of two paths: one learns the overall features, while the other learns the background features. The background features are gradually subtracted from the overall features to obtain an effective representation of the hand. These features are then flattened into a one-dimensional vector and passed through a fully connected layer with 128 output units; a fully connected layer with 24 output units serves as the output layer. On the ASL Finger Spelling dataset, the total accuracy and Macro-F1 score of the proposed method are 99.52% and 0.997, respectively. More importantly, the proposed method can be applied to small terminals, thereby broadening the application scenarios of sign language recognition. Experimental comparison shows that the proposed dual-path background erasure network model has better generalization ability.
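A hedged Keras sketch of the dual-path idea: two identical convolutional paths see the same frame, the background path's features are subtracted from the overall path's features, and the result is flattened through a 128-unit dense layer into a 24-way softmax, as described above. The convolutional layer sizes are illustrative assumptions.

```python
# Dual-path "background erasure" CNN sketch; only the 128-unit dense layer and the
# 24-way output follow the abstract, the rest is illustrative.
from tensorflow import keras
from tensorflow.keras import layers

def conv_path(x, name):
    for i, filters in enumerate((16, 32, 64)):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu",
                          name=f"{name}_conv{i}")(x)
        x = layers.MaxPooling2D(name=f"{name}_pool{i}")(x)
    return x

inputs = keras.Input(shape=(64, 64, 3))
overall = conv_path(inputs, "overall")            # overall (hand + background) features
background = conv_path(inputs, "background")      # background-only features
hand = layers.Subtract(name="erase")([overall, background])   # background erasure
x = layers.Flatten()(hand)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(24, activation="softmax")(x)           # 24 finger-spelling classes

model = keras.Model(inputs, outputs, name="dpcnn_sketch")
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```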
Affiliation(s)
- Junming Zhang
- School of Computer and Artificial Intelligence, Huanghuai University, Zhumadian, 463000, Henan Province, China
- Key Laboratory of Intelligent Lighting, Henan Province, Zhumadian, 463000, China
| | - Xiaolong Bu
- School of Computer and Artificial Intelligence, Huanghuai University, Zhumadian, 463000, Henan Province, China
- Key Laboratory of Intelligent Lighting, Henan Province, Zhumadian, 463000, China
| | - Yushuai Wang
- School of Computer and Artificial Intelligence, Huanghuai University, Zhumadian, 463000, Henan Province, China
- Key Laboratory of Intelligent Lighting, Henan Province, Zhumadian, 463000, China
- School of Computer Science, Zhongyuan University of Technology, Xinzheng, 450007, Henan, China
| | - Hao Dong
- School of Computer and Artificial Intelligence, Huanghuai University, Zhumadian, 463000, Henan Province, China
- Key Laboratory of Intelligent Lighting, Henan Province, Zhumadian, 463000, China
- School of Computer Science, Zhongyuan University of Technology, Xinzheng, 450007, Henan, China
| | - Yu Zhang
- School of Computer and Artificial Intelligence, Huanghuai University, Zhumadian, 463000, Henan Province, China
- Key Laboratory of Intelligent Lighting, Henan Province, Zhumadian, 463000, China
| | - Haitao Wu
- School of Computer and Artificial Intelligence, Huanghuai University, Zhumadian, 463000, Henan Province, China.
- Key Laboratory of Intelligent Lighting, Henan Province, Zhumadian, 463000, China.
| |
4. Kakizaki M, Miah ASM, Hirooka K, Shin J. Dynamic Japanese Sign Language Recognition Throw Hand Pose Estimation Using Effective Feature Extraction and Classification Approach. Sensors (Basel) 2024;24:826. [PMID: 38339542] [PMCID: PMC10857289] [DOI: 10.3390/s24030826]
Abstract
Japanese Sign Language (JSL) is vital for communication in Japan's deaf and hard-of-hearing community. However, likely because the JSL alphabet comprises 46 patterns that mix static and dynamic signs, the dynamic signs have been excluded from most studies. Few researchers have worked on dynamic JSL alphabet recognition, and their reported accuracy is unsatisfactory. We propose a dynamic JSL recognition system that uses effective feature extraction and feature selection to overcome these challenges. The procedure follows hand pose estimation, effective feature extraction, and machine learning techniques. We collected a video dataset capturing JSL gestures with standard RGB cameras and employed MediaPipe for hand pose estimation. Four types of features are proposed; their significance is that the same feature generation method can be used regardless of the number of frames or whether the signs are dynamic or static. A Random Forest (RF)-based feature selection approach was employed to select the most informative features. Finally, the reduced features were fed into a kernel-based Support Vector Machine (SVM) for classification. Evaluations on our newly created dynamic Japanese sign language alphabet dataset and on the LSA64 dynamic dataset yielded recognition accuracies of 97.20% and 98.40%, respectively. This approach not only addresses the complexities of JSL but also holds the potential to bridge communication gaps for the deaf and hard-of-hearing and has broader implications for sign language recognition systems globally.
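A minimal sketch of the described pipeline (MediaPipe hand pose estimation, a frame-count-independent feature, RF-based feature selection, kernel SVM). The single distance feature shown is only an illustration of the idea, not the paper's four feature types.

```python
# Hand pose -> feature -> RF-based selection -> SVM, as a hedged sketch.
import numpy as np
import cv2
import mediapipe as mp
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

mp_hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)

def frame_landmarks(bgr_frame):
    """Return a (21, 3) array of hand landmarks, or None if no hand is found."""
    result = mp_hands.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm])

def video_feature(frames):
    """Frame-count-independent feature: mean distance of each landmark to the wrist."""
    per_frame = []
    for f in frames:
        lm = frame_landmarks(f)
        if lm is not None:
            per_frame.append(np.linalg.norm(lm - lm[0], axis=1))  # distances to wrist joint
    return np.mean(per_frame, axis=0) if per_frame else np.zeros(21)

# RF-based feature selection feeding a kernel SVM, as named in the abstract.
classifier = make_pipeline(
    SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0)),
    SVC(kernel="rbf"),
)
# Usage: classifier.fit(np.stack([video_feature(v) for v in train_videos]), train_labels)
```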
Affiliation(s)
- Jungpil Shin
- School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan; (M.K.); (A.S.M.M.); (K.H.)
5. Pathan RK, Biswas M, Yasmin S, Khandaker MU, Salman M, Youssef AAF. Sign language recognition using the fusion of image and hand landmarks through multi-headed convolutional neural network. Sci Rep 2023;13:16975. [PMID: 37813932] [PMCID: PMC10562485] [DOI: 10.1038/s41598-023-43852-x]
Abstract
Sign language recognition is a breakthrough for communication with the deaf-mute community and has been a critical research topic for years. Although some previous studies have successfully recognized sign language, they require many costly instruments, including sensors, devices, and high-end processing power. Such drawbacks can be overcome by employing artificial-intelligence-based techniques. Since, in this modern era of advanced mobile technology, using a camera to take video or images is much easier, this study demonstrates a cost-effective technique to detect American Sign Language (ASL) using an image dataset. Here, the "Finger Spelling, A" dataset has been used, with 24 letters (excluding j and z, as they involve motion). The main reason for using this dataset is that its images have complex backgrounds with different environments and scene colors. Two layers of image processing have been used: in the first layer, images are processed as a whole for training, and in the second layer, the hand landmarks are extracted. A multi-headed convolutional neural network (CNN) model has been proposed for these two layers and tested with 30% of the dataset. To avoid overfitting, data augmentation and dynamic learning rate reduction have been used. With the proposed model, 98.981% test accuracy has been achieved. It is expected that this study may help to develop an efficient human-machine communication system for the deaf-mute community.
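A hedged sketch of a two-input ("multi-headed") CNN of the kind described: one head sees the whole image, the other the extracted hand-landmark vector, and their features are fused before a 24-way output. Layer sizes are illustrative, and the learning-rate callback stands in for the "dynamic learning rate reduction" mentioned above.

```python
# Two-headed image + hand-landmark fusion model; dimensions are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

image_in = keras.Input(shape=(96, 96, 3), name="image")
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

landmark_in = keras.Input(shape=(21 * 3,), name="hand_landmarks")    # 21 (x, y, z) points
y = layers.Dense(64, activation="relu")(landmark_in)

fused = layers.concatenate([x, y])
fused = layers.Dropout(0.3)(fused)                        # regularization against overfitting
out = layers.Dense(24, activation="softmax")(fused)       # 24 static ASL letters (no j, z)

model = keras.Model([image_in, landmark_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Learning-rate reduction callback of the kind mentioned in the abstract.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3)
```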
Affiliation(s)
- Refat Khan Pathan
- Department of Computing and Information Systems, School of Engineering and Technology, Sunway University, 47500, Bandar Sunway, Selangor, Malaysia
- Munmun Biswas
- Department of Computer Science and Engineering, BGC Trust University Bangladesh, Chittagong, 4381, Bangladesh
- Suraiya Yasmin
- Department of Computer and Information Science, Graduate School of Engineering, Tokyo University of Agriculture and Technology, Koganei, Tokyo, 184-0012, Japan
- Mayeen Uddin Khandaker
- Centre for Applied Physics and Radiation Technologies, School of Engineering and Technology, Sunway University, 47500, Bandar Sunway, Selangor, Malaysia
- Faculty of Graduate Studies, Daffodil International University, Daffodil Smart City, Birulia, Savar, Dhaka, 1216, Bangladesh
- Mohammad Salman
- College of Engineering and Technology, American University of the Middle East, Egaila, Kuwait
- Ahmed A F Youssef
- College of Engineering and Technology, American University of the Middle East, Egaila, Kuwait
6. Tan CK, Lim KM, Chang RKY, Lee CP, Alqahtani A. HGR-ViT: Hand Gesture Recognition with Vision Transformer. Sensors (Basel) 2023;23:5555. [PMID: 37420722] [DOI: 10.3390/s23125555]
Abstract
Hand gesture recognition (HGR) is a crucial area of research that enhances communication by overcoming language barriers and facilitating human-computer interaction. Although previous works in HGR have employed deep neural networks, they fail to encode the orientation and position of the hand in the image. To address this issue, this paper proposes HGR-ViT, a Vision Transformer (ViT) model with an attention mechanism for hand gesture recognition. A hand gesture image is first split into fixed-size patches. Positional embeddings are added to the patch embeddings to form learnable vectors that capture the positional information of the hand patches. The resulting sequence of vectors is then fed to a standard Transformer encoder to obtain the hand gesture representation, and a multilayer perceptron head on the encoder output classifies the hand gesture into the correct class. The proposed HGR-ViT obtains accuracies of 99.98%, 99.36%, and 99.85% on the American Sign Language (ASL) dataset, the ASL with Digits dataset, and the National University of Singapore (NUS) hand gesture dataset, respectively.
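A compact, hedged sketch of the ViT recipe described above (split into fixed-size patches, add positional embeddings, pass through a Transformer encoder, classify with an MLP head). Patch size, embedding width, and the single encoder block are illustrative assumptions.

```python
# ViT-style classifier sketch; the paper stacks several encoder blocks, only one is shown.
from tensorflow import keras
from tensorflow.keras import layers

IMG, PATCH, DIM, HEADS, CLASSES = 72, 6, 64, 4, 26
NUM_PATCHES = (IMG // PATCH) ** 2

class PositionalEmbedding(layers.Layer):
    """Adds a learnable positional embedding to every patch token."""
    def build(self, input_shape):
        self.pos = self.add_weight(shape=(1, NUM_PATCHES, DIM), initializer="zeros",
                                   trainable=True, name="pos")
    def call(self, x):
        return x + self.pos

inputs = keras.Input(shape=(IMG, IMG, 3))
x = layers.Conv2D(DIM, PATCH, strides=PATCH)(inputs)      # split image into fixed-size patches
x = layers.Reshape((NUM_PATCHES, DIM))(x)                  # sequence of patch embeddings
x = PositionalEmbedding()(x)

# One standard Transformer encoder block.
attn = layers.MultiHeadAttention(num_heads=HEADS, key_dim=DIM // HEADS)(x, x)
x = layers.LayerNormalization()(x + attn)
mlp = layers.Dense(DIM * 2, activation="gelu")(x)
mlp = layers.Dense(DIM)(mlp)
x = layers.LayerNormalization()(x + mlp)

x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(CLASSES, activation="softmax")(x)   # MLP head over gesture classes
model = keras.Model(inputs, outputs, name="hgr_vit_sketch")
model.summary()
```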
Affiliation(s)
- Chun Keat Tan
- Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia
- Kian Ming Lim
- Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia
- Roy Kwang Yang Chang
- Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia
- Chin Poo Lee
- Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia
- Ali Alqahtani
- Department of Computer Science, King Khalid University, Abha 61421, Saudi Arabia
- Center for Artificial Intelligence (CAI), King Khalid University, Abha 61421, Saudi Arabia
7. Eunice J, J A, Sei Y, Hemanth DJ. Sign2Pose: A Pose-Based Approach for Gloss Prediction Using a Transformer Model. Sensors (Basel) 2023;23:2853. [PMID: 36905057] [PMCID: PMC10007493] [DOI: 10.3390/s23052853]
Abstract
Word-level sign language recognition (WSLR) is the backbone for continuous sign language recognition (CSLR), which infers glosses from sign videos. Finding the relevant gloss in a sign sequence and detecting explicit gloss boundaries in sign videos is a persistent challenge. In this paper, we propose a systematic approach for gloss prediction in WSLR using the Sign2Pose gloss prediction transformer model. The primary goal of this work is to enhance gloss prediction accuracy in WSLR with reduced time and computational overhead. The proposed approach uses hand-crafted features rather than automated feature extraction, which is computationally expensive and less accurate. A modified key frame extraction technique is proposed that uses histogram difference and Euclidean distance metrics to select informative frames and drop redundant ones. To enhance the model's generalization ability, pose vector augmentation using perspective transformation along with joint angle rotation is performed. Further, for normalization, we employed YOLOv3 (You Only Look Once) to detect the signing space and track the signers' hand gestures in the frames. On the WLASL datasets, the proposed model achieved top-1 recognition accuracies of 80.9% on WLASL100 and 64.21% on WLASL300, surpassing state-of-the-art approaches. The integration of key frame extraction, augmentation, and pose estimation improved the performance of the proposed gloss prediction model by increasing its precision in locating minor variations in body posture. We observed that introducing YOLOv3 improved gloss prediction accuracy and helped prevent model overfitting. Overall, the proposed model showed a 17% performance improvement on the WLASL100 dataset.
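A hedged sketch of the key-frame selection idea only: a frame is kept when its colour-histogram difference or pixel-wise Euclidean distance from the last kept frame exceeds a threshold. The thresholds and the input file name are illustrative assumptions.

```python
# Key-frame selection by histogram difference and Euclidean distance; a sketch, not the paper's code.
import cv2
import numpy as np

def frame_histogram(frame, bins=32):
    hist = cv2.calcHist([frame], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    return cv2.normalize(hist, hist).flatten()

def select_key_frames(frames, hist_thresh=0.25, dist_thresh=20.0):
    keep = [frames[0]]
    for frame in frames[1:]:
        hist_diff = np.linalg.norm(frame_histogram(frame) - frame_histogram(keep[-1]))
        pixel_dist = np.linalg.norm(frame.astype(np.float32)
                                    - keep[-1].astype(np.float32)) / np.sqrt(frame.size)
        if hist_diff > hist_thresh or pixel_dist > dist_thresh:
            keep.append(frame)            # enough change: keep as a key frame
    return keep

if __name__ == "__main__":
    cap = cv2.VideoCapture("sign_clip.mp4")     # hypothetical input video
    frames = []
    ok, f = cap.read()
    while ok:
        frames.append(f)
        ok, f = cap.read()
    cap.release()
    if frames:
        print(f"kept {len(select_key_frames(frames))} of {len(frames)} frames")
```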
Affiliation(s)
- Jennifer Eunice
- Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore 641114, India
- Andrew J
- Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Yuichi Sei
- Department of Informatics, The University of Electro-Communications, Tokyo 182-8585, Japan
- D. Jude Hemanth
- Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore 641114, India
8. Xia K, Lu W, Fan H, Zhao Q. A Sign Language Recognition System Applied to Deaf-Mute Medical Consultation. Sensors (Basel) 2022;22:9107. [PMID: 36501809] [PMCID: PMC9739223] [DOI: 10.3390/s22239107]
Abstract
It is an objective reality that deaf-mute people have difficulty seeking medical treatment. Due to the lack of sign language interpreters, most hospitals in China currently do not have the ability to interpret sign language. Normal medical treatment is a luxury for deaf people. In this paper, we propose a sign language recognition system: Heart-Speaker. Heart-Speaker is applied to a deaf-mute consultation scenario. The system provides a low-cost solution for the difficult problem of treating deaf-mute patients. The doctor only needs to point the Heart-Speaker at the deaf patient and the system automatically captures the sign language movements and translates the sign language semantics. When a doctor issues a diagnosis or asks a patient a question, the system displays the corresponding sign language video and subtitles to meet the needs of two-way communication between doctors and patients. The system uses the MobileNet-YOLOv3 model to recognize sign language. It meets the needs of running on embedded terminals and provides favorable recognition accuracy. We performed experiments to verify the accuracy of the measurements. The experimental results show that the accuracy rate of Heart-Speaker in recognizing sign language can reach 90.77%.
Affiliation(s)
- Weiwei Lu
- Correspondence: Tel.: +86-13671637275
9. Watanobe Y, Rahman MM, Amin MFI, Kabir R. Identifying algorithm in program code based on structural features using CNN classification model. Appl Intell 2022. [DOI: 10.1007/s10489-022-04078-y]
Abstract
In software, an algorithm is a well-organized sequence of actions that provides the optimal way to complete a task. Algorithmic thinking is also essential to break down a problem and conceptualize a solution in steps. The proper selection of an algorithm is pivotal to improving computational performance and software productivity, as well as to programming education. That is, determining a suitable algorithm from given code is widely relevant in software engineering and programming education. However, both humans and machines find it difficult to identify algorithms from code without any meta-information. This study proposes a program code classification model that uses a convolutional neural network (CNN) to classify codes by algorithm. First, program codes are transformed into a sequence of structural features (SFs). Second, the SFs are transformed into a one-hot binary matrix using several procedures. Third, different structures and hyperparameters of the CNN model are fine-tuned to identify the best model for the code classification task. To do so, 61,614 real-world program codes implementing different types of algorithms, collected from an online judge system, are used to train, validate, and evaluate the model. Finally, the experimental results show that the proposed model can identify algorithms and classify program codes with high accuracy. The average precision, recall, and F-measure scores of the best CNN model are 95.65%, 95.85%, and 95.70%, respectively, indicating that it outperforms other baseline models.
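A hedged sketch of the encoding step described above: a program's sequence of structural features (SFs) is turned into a fixed-length one-hot binary matrix that a 1D CNN can classify. The SF vocabulary, sequence length, and network layout are illustrative, not the paper's.

```python
# SF sequence -> one-hot matrix -> CNN classifier, as a sketch under stated assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

SF_VOCAB = ["for", "while", "if", "array_access", "function_call", "return", "assignment"]
SF_INDEX = {sf: i for i, sf in enumerate(SF_VOCAB)}
MAX_LEN = 200                       # pad / truncate every SF sequence to this length

def one_hot_matrix(sf_sequence):
    """(MAX_LEN, |vocab|) binary matrix, one row per structural-feature token."""
    m = np.zeros((MAX_LEN, len(SF_VOCAB)), dtype=np.float32)
    for row, sf in enumerate(sf_sequence[:MAX_LEN]):
        m[row, SF_INDEX[sf]] = 1.0
    return m

def build_classifier(num_algorithms=10):
    model = keras.Sequential([
        keras.Input(shape=(MAX_LEN, len(SF_VOCAB))),
        layers.Conv1D(64, 5, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 5, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(num_algorithms, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage: X = np.stack([one_hot_matrix(seq) for seq in sf_sequences]); build_classifier().fit(X, y)
```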
10. Amangeldy N, Kudubayeva S, Kassymova A, Karipzhanova A, Razakhova B, Kuralov S. Sign Language Recognition Method Based on Palm Definition Model and Multiple Classification. Sensors (Basel) 2022;22:6621. [PMID: 36081076] [PMCID: PMC9460639] [DOI: 10.3390/s22176621]
Abstract
Technologies for pattern recognition are used in various fields. One of the most relevant and important directions is the use of pattern recognition technology, such as gesture recognition, in socially significant tasks: the development of real-time automatic sign language interpretation systems. More than 5% of the world's population (about 430 million people, including 34 million children) are deaf-mute and not always able to use the services of a live sign language interpreter. Almost 80% of people with disabling hearing loss live in low- and middle-income countries. The development of low-cost automatic sign language interpretation systems, without expensive sensors or special cameras, would improve the lives of people with disabilities and contribute to their unhindered integration into society. To this end, in order to find an optimal solution to the problem, this article analyzes suitable gesture recognition methods in the context of their use in automatic gesture recognition systems, to determine the most suitable ones. From this analysis, an algorithm based on a palm definition model and linear models for recognizing the shapes of the numbers and letters of Kazakh sign language is proposed. The advantage of the proposed algorithm is that it fully recognizes 41 of the 42 letters of the Kazakh sign alphabet; until now, only the Russian letters in the Kazakh alphabet had been recognized. In addition, a unified function has been integrated into our system to configure the frame depth map mode, which has improved recognition performance and can be used to create a multimodal database of gesture-word video data for the gesture recognition system.
Affiliation(s)
- Nurzada Amangeldy
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan
- Saule Kudubayeva
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan
- Akmaral Kassymova
- Institute of Economics, Information Technologies and Professional Education, Zhangir Khan West Kazakhstan Agrarian-Technical University, Uralsk 090000, Kazakhstan
- Ardak Karipzhanova
- Department of Information and Technical Sciences, Faculty of Information Technologies and Economics, Kazakh Humanitarian Law Innovative University, East Kazakhstan Region, Semey 701400, Kazakhstan
- Bibigul Razakhova
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan
- Serikbay Kuralov
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan
11. Backhand-Approach-Based American Sign Language Words Recognition Using Spatial-Temporal Body Parts and Hand Relationship Patterns. Sensors (Basel) 2022;22:4554. [PMID: 35746330] [PMCID: PMC9228298] [DOI: 10.3390/s22124554]
Abstract
Most existing methods focus mainly on extracting shape-based, rotation-based, and motion-based features, usually neglecting the relationship between the hands and body parts, which can provide significant information for addressing the problem of similar sign words in the backhand approach. Therefore, this paper proposes four feature-based models. The first consists of the spatial-temporal body part and hand relationship patterns, which are the main feature; the second of the spatial-temporal finger joint angle patterns; the third of the spatial-temporal 3D hand motion trajectory patterns; and the fourth of the spatial-temporal double-hand relationship patterns. A two-layer bidirectional long short-term memory method is then used as the classifier to deal with the time-independent data. The performance of the method was evaluated and compared with existing works on 26 ASL letters, giving an accuracy and F1-score of 97.34% and 97.36%, respectively. The method was further evaluated on 40 double-hand ASL words and achieved an accuracy and F1-score of 98.52% and 98.54%, respectively. The results demonstrate that the proposed method outperforms the existing works under consideration. In an analysis of 72 new ASL words, including single- and double-hand words from 10 participants, the accuracy and F1-score were approximately 96.99% and 97.00%, respectively.
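A minimal sketch of the classifier stage only: a two-layer bidirectional LSTM over per-frame feature vectors, as named in the abstract. The sequence length, feature dimension, and class count are illustrative assumptions.

```python
# Two-layer Bi-LSTM classifier over spatial-temporal feature sequences; a sketch, not the paper's model.
from tensorflow import keras
from tensorflow.keras import layers

NUM_FRAMES, FEAT_DIM, NUM_WORDS = 60, 48, 40   # e.g. 40 double-hand ASL words

model = keras.Sequential([
    keras.Input(shape=(NUM_FRAMES, FEAT_DIM)),                       # one feature vector per frame
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),    # first Bi-LSTM layer
    layers.Bidirectional(layers.LSTM(64)),                           # second Bi-LSTM layer
    layers.Dense(NUM_WORDS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```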
12. Chen M, Xu L, Liu Y, Yu M, Li Y, Ye TT. An All-Fabric Tactile-Sensing Keypad with Uni-Modal and Ultrafast Response/Recovery Time for Smart Clothing Applications. ACS Appl Mater Interfaces 2022;14:24946-24954. [PMID: 35593079] [DOI: 10.1021/acsami.2c04246]
Abstract
Keypads constructed from fabric materials are ideal input devices for smart clothing applications. However, multi-modal reaction problems have to be addressed before they can be of practical use on apparel, i.e., fabric-based keypads need to distinguish legitimate fingertip actions from illegitimate deformations and stresses caused by human movements. In this paper, we propose using humidity sensors functionalized from graphene oxide (GO)-coated polyester fibers to construct e-textile keypads. Because the moisture level in the proximity of human fingertips is much higher (over 70%) than at other parts of the human body, humidity sensing has many advantages over other tactility mechanisms. Experiments have demonstrated that the GO-functionalized fabric keypad has stable uni-modal tactility only to fingertip touches and is not sensitive to deformation, pressure, temperature variation, or other ambient interferences. With biasing and sensing circuits, the keypad exhibits a quick response and recovery time (around 0.1 s), comparable to mechanical keyboards. To demonstrate its application in smart clothing, the keypad was sewn onto a sweater, and embroidered conductive yarns were used to control an MP3 player in the pocket.
Affiliation(s)
- Mingxun Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Lulu Xu
- Department of Materials, The University of Manchester, Manchester M13 9PL, United Kingdom
- Yulong Liu
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Mengxia Yu
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Department of Electrical and Computer Engineering, National University of Singapore, 117583 Singapore
- Yi Li
- Department of Materials, The University of Manchester, Manchester M13 9PL, United Kingdom
- Terry Tao Ye
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen 518055, China
13. BenSignNet: Bengali Sign Language Alphabet Recognition Using Concatenated Segmentation and Convolutional Neural Network. Appl Sci (Basel) 2022. [DOI: 10.3390/app12083933]
Abstract
Sign language recognition is one of the most challenging applications in machine learning and human-computer interaction. Many researchers have developed classification models for different sign languages such as English, Arabic, Japanese, and Bengali; however, no significant research has been done on generalization performance across different datasets. Most research has achieved satisfactory performance with a small dataset, and these models may fail to replicate the same performance on different and larger datasets. In this context, this paper proposes a novel method for recognizing Bengali sign language (BSL) alphabets that overcomes the generalization issue. The proposed method is evaluated on three benchmark datasets: '38 BdSL', 'KU-BdSL', and 'Ishara-Lipi'. Three steps are followed to achieve the goal: segmentation, augmentation, and convolutional neural network (CNN)-based classification. First, a concatenated segmentation approach using YCbCr, HSV, and the watershed algorithm was designed to accurately identify gesture signs. Second, seven image augmentation techniques were selected to increase the training data size without changing the semantic meaning. Finally, the CNN-based model, BenSignNet, was applied to extract features and perform classification. The model achieved accuracies of 94.00%, 99.60%, and 99.60% on the BdSL Alphabet, KU-BdSL, and Ishara-Lipi datasets, respectively. Experimental findings confirm that the proposed method achieves a higher recognition rate than conventional methods and generalizes across all datasets in the BSL domain.
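A hedged OpenCV sketch of the concatenated segmentation step: skin masks built in the YCbCr and HSV colour spaces are combined and refined with the watershed algorithm. The skin-colour thresholds and the way the masks are combined are illustrative assumptions, not the paper's values.

```python
# YCbCr + HSV skin masks refined by watershed; a sketch under stated assumptions.
import cv2
import numpy as np

def segment_hand(bgr):
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask_ycrcb = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    mask_hsv = cv2.inRange(hsv, (0, 30, 60), (20, 150, 255))
    mask = cv2.bitwise_and(mask_ycrcb, mask_hsv)           # combined ("concatenated") skin mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    # Watershed refinement: derive sure background / foreground markers from the mask, then flood.
    sure_bg = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=3)
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(sure_bg, sure_fg)
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1
    markers[unknown == 255] = 0
    markers = cv2.watershed(bgr, markers)
    return np.where(markers > 1, 255, 0).astype(np.uint8)   # hand mask

if __name__ == "__main__":
    img = cv2.imread("bdsl_sign.jpg")      # hypothetical input image
    if img is not None:
        cv2.imwrite("hand_mask.png", segment_hand(img))
```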
14. Real-Time Hand Gesture Recognition Based on Deep Learning YOLOv3 Model. Appl Sci (Basel) 2021. [DOI: 10.3390/app11094164]
Abstract
Using gestures can help people with certain disabilities communicate with other people. This paper proposes a lightweight model based on YOLO (You Only Look Once) v3 and the DarkNet-53 convolutional neural network for gesture recognition without additional preprocessing, image filtering, or image enhancement. The proposed model achieved high accuracy even in a complex environment and successfully detected gestures even in low-resolution picture mode. The model was evaluated on a labeled dataset of hand gestures in both Pascal VOC and YOLO formats. By extracting features from the hand, the proposed YOLOv3-based model recognized hand gestures with an accuracy, precision, recall, and F1-score of 97.68%, 94.88%, 98.66%, and 96.70%, respectively. We further compared our model with the Single Shot Detector (SSD) and Visual Geometry Group (VGG16) models, which achieved accuracies between 82 and 85%. The trained model can be used for real-time detection of both static hand images and dynamic gestures recorded on video.
15. Hand Gesture Recognition Based on Auto-Landmark Localization and Reweighted Genetic Algorithm for Healthcare Muscle Activities. Sustainability 2021. [DOI: 10.3390/su13052961]
Abstract
Due to the constantly increasing demand for automatic landmark localization in hand gesture recognition, there is a need for a more sustainable, intelligent, and reliable hand gesture recognition system. The main purpose of this study was to develop an accurate hand gesture recognition system capable of error-free auto-landmark localization of any gesture detectable in an RGB image. In this paper, we propose a system based on landmark extraction from RGB images regardless of the environment. The extraction of gestures is performed via two methods, namely, fused and directional image methods; the fused method produced greater gesture recognition accuracy. In the proposed system, hand gesture recognition (HGR) is performed via several methods: (1) HGR via point-based features, which consist of (i) distance features, (ii) angular features, and (iii) geometric features; and (2) HGR via full-hand features, which are composed of (i) SONG mesh geometry and (ii) an active model. To optimize these features, we applied gray wolf optimization, and a reweighted genetic algorithm was then used for classification and gesture recognition. Experiments were performed on five challenging datasets: Sign Word, Dexter1, Dexter + Object, STB, and NYU. The experimental results showed that auto-landmark localization with the proposed feature extraction technique is an efficient approach to developing a robust HGR system. The classification results of the reweighted genetic algorithm were compared with an artificial neural network (ANN) and a decision tree. The developed system plays a significant role in healthcare muscle exercise.
16. Multi-Stroke Thai Finger-Spelling Sign Language Recognition System with Deep Learning. Symmetry (Basel) 2021. [DOI: 10.3390/sym13020262]
Abstract
Sign language is a language for the hearing impaired that the general public commonly does not understand; a sign language recognition system therefore acts as an intermediary between the two sides. As a communication tool, a multi-stroke Thai finger-spelling sign language (TFSL) recognition system based on deep learning was developed in this study. This research uses a vision-based technique on a complex background, with semantic segmentation performed by dilated convolution for hand segmentation, hand strokes separated using optical flow, and feature learning and classification performed with a convolutional neural network (CNN). We compared five CNN structures. The first format set the number of filters to 64 and the filter size to 3 × 3 with 7 layers; the second used 128 filters of size 3 × 3 with 7 layers; the third used an ascending number of filters over 7 layers, all with a 3 × 3 filter size; the fourth used an ascending number of filters and small filter sizes over 7 layers; the final format was a structure based on AlexNet. The resulting average accuracies were 88.83%, 87.97%, 89.91%, 90.43%, and 92.03%, respectively. We implemented the AlexNet-based CNN structure to create models for the multi-stroke TFSL recognition system. The experiment used isolated videos of 42 Thai alphabets, divided into three categories of one, two, and three strokes. The results showed an average accuracy of 88.00% for one stroke, 85.42% for two strokes, and 75.00% for three strokes.
17. Smoke Object Segmentation and the Dynamic Growth Feature Model for Video-Based Smoke Detection Systems. Symmetry (Basel) 2020. [DOI: 10.3390/sym12071075]
Abstract
This article concerns smoke detection in the early stages of a fire. With a computer-aided system, efficient early detection of smoke may prevent a massive fire incident. Without accounting for multiple moving objects in the background and without smoke particle analysis (i.e., pattern recognition), smoke detection models show suboptimal performance. To address this, this paper proposes a hybrid smoke segmentation approach and an efficient symmetrical simulation model of dynamic smoke that extracts a smoke growth feature from the temporal frames of a video. In this model, smoke is segmented from multiple moving objects on a complex background using a Gaussian Mixture Model (GMM) and HSV (hue-saturation-value) color segmentation to identify candidate smoke and non-smoke regions in the preprocessing stage. The preprocessed temporal frames containing moving smoke are then analyzed by the dynamic smoke growth analysis and the spatial-temporal frame energy feature extraction model. In the dynamic smoke growth analysis, the temporal frames are divided into blocks, and smoke growth representations are computed from the corresponding blocks. Finally, a classifier is trained on the extracted features to classify and detect smoke using a Radial Basis Function (RBF) non-linear Gaussian kernel-based binary Support Vector Machine (SVM). Multi-conditional video clips are used to validate the proposed smoke detection model. The experimental results suggest that the proposed model outperforms state-of-the-art algorithms.
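A hedged sketch of the preprocessing stage only: moving regions from a GMM background subtractor are intersected with an HSV mask for greyish, low-saturation pixels to obtain candidate smoke regions. The HSV bounds and file name are illustrative assumptions.

```python
# GMM background subtraction + HSV colour gating for candidate smoke regions; a sketch only.
import cv2
import numpy as np

gmm = cv2.createBackgroundSubtractorMOG2(history=300, detectShadows=False)

def candidate_smoke_mask(bgr_frame):
    motion = gmm.apply(bgr_frame)                            # GMM foreground (moving objects)
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    greyish = cv2.inRange(hsv, (0, 0, 80), (180, 60, 255))   # low saturation, mid-to-high value
    mask = cv2.bitwise_and(motion, greyish)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

if __name__ == "__main__":
    cap = cv2.VideoCapture("smoke_clip.mp4")                 # hypothetical test video
    ok, frame = cap.read()
    while ok:
        mask = candidate_smoke_mask(frame)
        # block-wise growth features and the RBF-SVM classifier would follow here
        ok, frame = cap.read()
    cap.release()
```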
18.
Abstract
Nowadays, gesture-based technology is revolutionizing the world and our lifestyles, and users are comfortable with technology that caters to their needs, for example, in communication, information security, and the convenience of day-to-day operations. In this context, hand movement information provides an alternative way for users to interact with people, machines, or robots. Therefore, this paper presents a character input system using a virtual keyboard based on the analysis of hand movements. We analyzed accelerometer, gyroscope, and electromyography (EMG) signals of movement activity. Noise was removed from the input signals with a wavelet denoising technique before extracting potential features. The envelope spectrum is used to analyze the accelerometer and gyroscope signals, and the cepstrum is used for the EMG signal. A support vector machine (SVM) is then used to train on and detect the signals to perform character input. To validate the proposed model, signal information was obtained from the predefined gestures "double-tap", "hold-fist", "wave-left", "wave-right", and "spread-finger", performed by different respondents for different input actions such as "input a character", "change character", "delete a character", "line break", and "space character". The experimental results show superior hand gesture recognition and character input accuracy compared to state-of-the-art systems.
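A hedged sketch of the two analysis transforms named above: the envelope spectrum (for the accelerometer and gyroscope) and the cepstrum (for EMG). Wavelet denoising would precede these steps, and the resulting feature vector would be fed to the SVM; all parameter choices are illustrative.

```python
# Envelope spectrum and real cepstrum of one sensor channel; a sketch, not the paper's code.
import numpy as np
from scipy.signal import hilbert

def envelope_spectrum(x):
    envelope = np.abs(hilbert(x))                    # amplitude envelope of the motion signal
    return np.abs(np.fft.rfft(envelope - envelope.mean()))

def real_cepstrum(x):
    spectrum = np.abs(np.fft.fft(x)) + 1e-12
    return np.real(np.fft.ifft(np.log(spectrum)))    # cepstral coefficients of the EMG signal

if __name__ == "__main__":
    t = np.linspace(0, 1, 1000)
    accel = (np.sin(2 * np.pi * 5 * t) * (1 + 0.3 * np.sin(2 * np.pi * 1 * t))
             + 0.1 * np.random.randn(t.size))        # toy accelerometer channel
    feats = np.concatenate([envelope_spectrum(accel)[:50], real_cepstrum(accel)[:50]])
    print(feats.shape)   # feature vector that would be fed to the SVM classifier
```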
19. A Dynamic Gesture Recognition Interface for Smart Home Control based on Croatian Sign Language. Appl Sci (Basel) 2020. [DOI: 10.3390/app10072300]
Abstract
Deaf and hard-of-hearing people face many challenges in everyday life. Their communication is based on the use of a sign language, and the ability of the cultural and social environment to fully understand that language determines whether or not it will be accessible to them. Technology is a key factor with the potential to provide solutions for achieving higher accessibility and thereby improving the quality of life of deaf and hard-of-hearing people. In this paper, we introduce a smart home automation system specifically designed to provide real-time sign language recognition. The contribution of this paper comprises several elements. A novel hierarchical architecture is presented, including resource- and time-aware modules: a wake-up module and a high-performance sign recognition module based on the Conv3D network. To achieve high-performance classification, multi-modal fusion of the RGB and depth modalities was used with temporal alignment. A small Croatian sign language database containing 25 different language signs for use in a smart home environment was then created in collaboration with the deaf community. The system was deployed on an Nvidia Jetson TX2 embedded system with a StereoLabs ZED M stereo camera for online testing. The obtained results demonstrate that the proposed practical solution is a viable approach for real-time smart home control.