1
Razzaq MA, Hussain J, Bang J, Hua CH, Satti FA, Rehman UU, Bilal HSM, Kim ST, Lee S. A Hybrid Multimodal Emotion Recognition Framework for UX Evaluation Using Generalized Mixture Functions. Sensors (Basel) 2023;23(9):4373. [PMID: 37177574] [PMCID: PMC10181635] [DOI: 10.3390/s23094373] [Received: 02/08/2023] [Revised: 04/03/2023] [Accepted: 04/26/2023]
Abstract
Multimodal emotion recognition has gained much traction in affective computing, human-computer interaction (HCI), artificial intelligence (AI), and user experience (UX) research. There is growing demand to automate the analysis of user emotion for HCI, AI, and UX evaluation applications that provide affective services. Emotions are increasingly inferred from video, audio, text, or physiological signals, which has led to systems that process emotions from multiple modalities, usually combined through ensemble-based schemes with static weights. Owing to limitations such as missing modality data, inter-class variations, and intra-class similarities, an effective weighting scheme is required to improve discrimination between modalities. This article accounts for the differing importance of the modalities and assigns them dynamic weights through a more efficient combination process based on generalized mixture (GM) functions. We therefore present a hybrid multimodal emotion recognition (H-MMER) framework that uses a multi-view learning approach for unimodal emotion recognition and performs feature-level and decision-level fusion using GM functions. In an experimental study, we evaluated the ability of the proposed framework to model four emotional states (Happiness, Neutral, Sadness, and Anger) and found that most of them can be modeled with significantly high accuracy using GM functions. The experiments show that the proposed framework models emotional states with an average accuracy of 98.19%, a significant performance gain over traditional approaches. The overall evaluation results indicate that we can identify emotional states with high accuracy and increase the robustness of the emotion classification required for UX measurement.
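As a rough illustration of the decision-level fusion the abstract describes, the sketch below combines per-modality class probabilities with input-dependent weights. The confidence-based weighting and the missing-modality handling are illustrative assumptions, not the paper's actual GM-function family:

```python
def fuse_decisions(modal_probs):
    """Fuse per-modality class-probability vectors with dynamic weights.

    Each modality's weight is its own confidence (max probability),
    normalised across the modalities that are present -- a simple
    stand-in for the paper's generalized mixture (GM) functions.
    Missing modalities are passed as None and skipped, which is one
    way to tolerate missing-modality data.
    """
    present = [p for p in modal_probs if p is not None]
    if not present:
        raise ValueError("no modality produced a prediction")
    weights = [max(p) for p in present]       # dynamic, input-dependent
    total = sum(weights)
    weights = [w / total for w in weights]    # normalise to sum to 1
    n_classes = len(present[0])
    return [sum(w * p[c] for w, p in zip(weights, present))
            for c in range(n_classes)]
```

Because the weights sum to one and each input vector sums to one, the fused vector is again a valid probability distribution.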
Affiliation(s)
- Muhammad Asif Razzaq
  - Department of Computer Science, Fatima Jinnah Women University, Rawalpindi 46000, Pakistan
  - Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
- Jamil Hussain
  - Department of Data Science, Sejong University, Seoul 30019, Republic of Korea
- Jaehun Bang
  - Hanwha Corporation/Momentum, Hanwha Building, 86 Cheonggyecheon-ro, Jung-gu, Seoul 04541, Republic of Korea
- Cam-Hao Hua
  - Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
- Fahad Ahmed Satti
  - Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
  - Department of Computing, School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
- Ubaid Ur Rehman
  - Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
  - Department of Computing, School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
- Hafiz Syed Muhammad Bilal
  - Department of Computing, School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
- Seong Tae Kim
  - Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
- Sungyoung Lee
  - Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
2
Lalitha S, Gupta D. Investigation of automatic mixed-lingual affective state recognition system for diverse Indian languages. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-189868]
Abstract
Automatic recognition of human affective state from speech has been a research focus for more than two decades. Today, in multi-lingual regions such as India and Europe, people communicate in many languages, yet most existing work recognizes affect from databases that each contain recordings in a single language. There is thus great demand for affective systems that serve mixed-language scenarios. This work therefore focuses on an effective methodology for recognizing human affective state from speech samples in a mixed-language framework. A unique combination of cepstral and bi-spectral speech features, classified with a random forest (RF), is applied to the task. The work is the first of its kind, and the proposed approach is validated and found effective on a self-recorded database comprising speech samples from eleven diverse Indian languages. Six affective states are considered: angry, fear, sad, neutral, surprise, and happy. Three affective models are investigated. The experimental results demonstrate that the proposed feature combination, together with data augmentation, enhances affect recognition.
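A minimal sketch of one cepstral feature of the kind the abstract mentions (the abstract does not specify the exact cepstral variant, nor the bi-spectral computation): the real cepstrum, IDFT(log|DFT(x)|), computed here with a naive O(n^2) DFT for a single speech frame:

```python
import cmath
import math

def real_cepstrum(frame):
    """Real cepstrum of one frame: inverse DFT of the log magnitude
    spectrum. Naive O(n^2) transforms for clarity; a real feature
    pipeline would use an FFT and frame windowing."""
    n = len(frame)
    # Forward DFT
    spec = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]
    # Log magnitude (small epsilon guards log(0))
    log_mag = [math.log(abs(s) + 1e-12) for s in spec]
    # Inverse DFT; the cepstrum of a real signal is real, so keep .real
    return [(sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]
```

A handy sanity check: scaling the waveform by a constant shifts only the zeroth cepstral coefficient (by the log of the scale), since the log magnitude gains a constant across all bins.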
Affiliation(s)
- S. Lalitha
  - Department of Electronics and Communication Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India
- Deepa Gupta
  - Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India
3
Noh KJ, Jeong CY, Lim J, Chung S, Kim G, Lim JM, Jeong H. Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets. Sensors (Basel) 2021;21(5):1579. [PMID: 33668254] [PMCID: PMC7956608] [DOI: 10.3390/s21051579] [Received: 01/07/2021] [Revised: 02/15/2021] [Accepted: 02/21/2021]
Abstract
Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To deploy SER models in real-world applications, some key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of SER models to unseen target domains. This study proposes a multi-path and group-loss-based network (MPGLN) for SER that supports multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a feature extractor transferred from the pre-trained VGG-like audio classification model (VGGish), and it learns simultaneously from multiple losses according to the association of emotion labels in the discrete and dimensional models. To evaluate MPGLN SER on multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), comprising KESDy18 and KESDy19, is constructed, and the English-language Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. The evaluations of multi-domain adaptation and domain generalization show F1-score improvements of 3.7% and 3.5%, respectively, over a baseline SER model that uses only the temporal feature generator. We show that MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
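The simultaneous learning over discrete and dimensional emotion labels can be sketched as a weighted sum of a classification loss and a regression loss. The exact form and the balancing weight `alpha` below are assumptions for illustration; the paper's group-loss construction may differ:

```python
import math

def combined_emotion_loss(logits, class_target, va_pred, va_target, alpha=0.5):
    """Sketch of a multi-loss objective: cross-entropy over discrete
    emotion classes plus mean squared error over dimensional
    (valence/arousal) targets. alpha is an assumed balancing weight."""
    # Numerically stable softmax cross-entropy on the discrete labels
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    ce = -math.log(exps[class_target] / sum(exps))
    # MSE on the dimensional (continuous) labels
    mse = sum((p - t) ** 2 for p, t in zip(va_pred, va_target)) / len(va_pred)
    return alpha * ce + (1 - alpha) * mse
```

Training against both label views at once is what lets a single encoder serve datasets annotated in either scheme.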
4
Sanchez-Comas A, Synnes K, Hallberg J. Hardware for Recognition of Human Activities: A Review of Smart Home and AAL Related Technologies. Sensors (Basel) 2020;20(15):4227. [PMID: 32751345] [PMCID: PMC7435866] [DOI: 10.3390/s20154227] [Received: 06/29/2020] [Revised: 07/19/2020] [Accepted: 07/20/2020]
Abstract
Activity recognition (AR) for ambient assisted living (AAL) and smart homes (SH) has become a subject of great interest. Promising a better quality of life, AR applied in contexts such as health, security, and energy consumption can lead to solutions that reach even the people most in need. This study was motivated by the observation that the development, deployment, and technology level of AR solutions transferred to society and industry depend not only on software but also on the hardware devices used. This paper identifies contributions on hardware use for activity recognition through a literature review of the Web of Science (WoS) database. The review found four dominant groups of technologies used for AR in SH and AAL (smartphones, wearables, video, and electronic components) and two emerging technologies (Wi-Fi and assistive robots); many of these technologies overlap across research works. Through bibliometric network analysis, the review identified gaps and potential new combinations of technologies for advancing this emerging field. It also relates the use of these six technologies to applications in health conditions, health care, emotion recognition, occupancy, mobility, posture recognition, localization, fall detection, and generic activity recognition. The results can serve as a road map that allows readers to execute approachable projects, deploy applications in different socioeconomic contexts, and establish networks with the community involved in this topic. The analysis shows that the field accepts that specific goals cannot be achieved with a single hardware technology but can be with joint solutions, and this paper shows how such technologies work together.
Affiliation(s)
- Andres Sanchez-Comas
  - Department of Productivity and Innovation, Universidad de la Costa, Barranquilla 080 002, Colombia
- Kåre Synnes
  - Department of Computer Science, Electrical and Space Engineering, Luleå Tekniska Universitet, 971 87 Luleå, Sweden
- Josef Hallberg
  - Department of Computer Science, Electrical and Space Engineering, Luleå Tekniska Universitet, 971 87 Luleå, Sweden
5
A Robust Framework for Self-Care Problem Identification for Children with Disability. Symmetry (Basel) 2019;11(1):89. [DOI: 10.3390/sym11010089]
Abstract
Recently, a standard dataset named SCADI (Self-Care Activities Dataset), based on the International Classification of Functioning, Disability, and Health for Children and Youth framework, was introduced for identifying self-care problems of children with physical and motor disabilities. This is an important and challenging topic because of its usefulness in medical diagnosis. This study proposes a robust framework using a sampling technique and extreme gradient boosting (FSX) to improve prediction performance on the SCADI dataset. The proposed framework first converts the original dataset to a new dataset with fewer dimensions, then balances this new dataset using oversampling techniques with different ratios, and finally applies extreme gradient boosting to diagnose the problems. Experiments on prediction performance and feature importance were conducted to show the effectiveness of FSX and to analyse the results. They show that FSX with the Synthetic Minority Over-sampling Technique (SMOTE) as its oversampling module outperforms an ANN (artificial neural network)-based approach, a support vector machine (SVM), and a random forest on the SCADI dataset. The overall accuracy of the proposed framework reaches 85.4%, which is high enough for self-care problem classification in medical diagnosis.
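The oversampling step can be illustrated with a minimal pure-Python SMOTE sketch: each synthetic sample is a random interpolation between a minority-class point and one of its k nearest neighbours. Function names and parameters here are illustrative, not the paper's implementation:

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: synthesize n_new samples by interpolating
    between a randomly chosen minority point and one of its k nearest
    minority-class neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

Because each synthetic point lies on a segment between two existing minority samples, it stays inside the minority class's bounding box, which is what makes this a conservative way to rebalance the classes before boosting.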