1
Sato W. Advancements in Sensors and Analyses for Emotion Sensing. Sensors (Basel) 2024; 24:4166. [PMID: 39000945 PMCID: PMC11244073 DOI: 10.3390/s24134166]
Abstract
Exploring the objective signals associated with subjective emotional states has practical significance [...].
Affiliation(s)
- Wataru Sato
- Psychological Process Team, Guardian Robot Project, RIKEN, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
2
Elyoseph Z, Refoua E, Asraf K, Lvovsky M, Shimoni Y, Hadar-Shoval D. Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study. JMIR Ment Health 2024; 11:e54369. [PMID: 38319707 PMCID: PMC10879976 DOI: 10.2196/54369]
Abstract
BACKGROUND: Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted.
OBJECTIVE: The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their textual-based mentalizing abilities.
METHODS: The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard.
RESULTS: ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the model or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent.
CONCLUSIONS: ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy.
Affiliation(s)
- Zohar Elyoseph
- Department of Educational Psychology, The Center for Psychobiological Research, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Imperial College London, London, United Kingdom
- Elad Refoua
- Department of Psychology, Bar-Ilan University, Ramat Gan, Israel
- Kfir Asraf
- Department of Psychology, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Maya Lvovsky
- Department of Psychology, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Yoav Shimoni
- Boston Children's Hospital, Boston, MA, United States
- Dorit Hadar-Shoval
- Department of Psychology, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
3
Mejia-Escobar C, Cazorla M, Martinez-Martin E. Towards a Better Performance in Facial Expression Recognition: A Data-Centric Approach. Computational Intelligence and Neuroscience 2023; 2023:1394882. [PMID: 37954097 PMCID: PMC10637848 DOI: 10.1155/2023/1394882]
Abstract
Facial expression is the best evidence of our emotions. Its automatic detection and recognition are key for robotics, medicine, healthcare, education, psychology, sociology, marketing, security, entertainment, and many other areas. Experiments in laboratory environments achieve high performance, but recognition in real-world scenarios remains challenging. Deep learning techniques based on convolutional neural networks (CNNs) have shown great potential. Most research is exclusively model-centric, searching for better algorithms to improve recognition; however, progress remains insufficient. Despite datasets being the main resource for automatic learning, few works focus on improving their quality. We propose a novel data-centric method to tackle misclassification, a problem commonly encountered in facial image datasets. The strategy is to progressively refine the dataset through successive trainings of a fixed CNN model. Each training uses only the facial images that the previous training predicted correctly, allowing the model to capture more distinctive features of each facial expression class. After the last training, the model automatically reclassifies the whole dataset. Unlike other similar work, our method avoids modifying, deleting, or augmenting facial images. Experimental results on three representative datasets proved the effectiveness of the proposed method, improving validation accuracy by 20.45%, 14.47%, and 39.66% for FER2013, NHFI, and AffectNet, respectively. The recognition rates on the reclassified versions of these datasets are 86.71%, 70.44%, and 89.17%, constituting state-of-the-art performance.
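The refinement loop described in this abstract is easy to prototype. Below is a runnable toy sketch of that loop, with scikit-learn's LogisticRegression standing in for the paper's fixed CNN; the function name, the number of rounds, and the synthetic data are illustrative assumptions, not the authors' code.

```python
# A runnable toy sketch of the data-centric refinement loop described above,
# with LogisticRegression standing in for the fixed CNN architecture.
import numpy as np
from sklearn.linear_model import LogisticRegression

def refine_dataset(X, y, rounds=3):
    """Iteratively retrain on correctly predicted samples, then relabel all."""
    keep = np.ones(len(X), dtype=bool)
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(X[keep], y[keep])                  # train on the retained subset
        correct = model.predict(X[keep]) == y[keep]  # correct predictions only
        keep[np.flatnonzero(keep)] = correct         # shrink the training pool
    return model.predict(X)                          # reclassify the whole dataset

# toy usage with a few simulated mislabeled samples
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] > 0).astype(int)
y[:20] = 1 - y[:20]                                  # inject label noise
print(refine_dataset(X, y)[:10])
```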
Affiliation(s)
- Miguel Cazorla
- Institute for Computer Research, University of Alicante, P.O. Box 99, 03080 Alicante, Spain
- Ester Martinez-Martin
- Institute for Computer Research, University of Alicante, P.O. Box 99, 03080 Alicante, Spain
4
Li Y, Huang J, Lu S, Zhang Z, Lu G. Cross-Domain Facial Expression Recognition via Contrastive Warm up and Complexity-Aware Self-Training. IEEE Transactions on Image Processing 2023; 32:5438-5450. [PMID: 37773906 DOI: 10.1109/tip.2023.3318955]
Abstract
Unsupervised cross-domain Facial Expression Recognition (FER) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Existing methods strive to reduce the discrepancy between the source and target domains but cannot effectively explore the abundant semantic information of the target domain due to the absence of target labels. To this end, we propose a novel framework via Contrastive Warm up and Complexity-aware Self-Training (CWCST), which jointly facilitates source knowledge transfer and target semantic learning. Specifically, we formulate a contrastive warm up strategy via features, momentum features, and learnable category centers to concurrently learn discriminative representations and narrow the domain gap, which benefits domain adaptation by generating more accurate target pseudo labels. Moreover, to deal with the inevitable noise in pseudo labels, we develop complexity-aware self-training with a label selection module based on prediction entropy, which iteratively generates pseudo labels and adaptively chooses the reliable ones for training, ultimately yielding effective target semantic exploration. Furthermore, by jointly using the two components, our framework can effectively utilize source knowledge and target semantic information through source-target co-training. In addition, our framework can be easily incorporated into other baselines with consistent performance improvements. Extensive experimental results on seven databases show the superior performance of the proposed method against various baselines.
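The label selection module described above is built on prediction entropy, which is straightforward to sketch. The snippet below shows a minimal entropy-based selection rule over softmax outputs; the threshold value is an illustrative assumption, since the paper's exact rule is not quoted in the abstract.

```python
# Sketch of entropy-based pseudo-label selection for self-training:
# keep only low-entropy (confident) predictions as training targets.
import numpy as np

def select_reliable(probs, threshold=0.5):
    """probs: (N, C) softmax outputs on unlabeled target samples.
    Returns pseudo-labels and a boolean mask of reliable ones."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    pseudo = probs.argmax(axis=1)
    return pseudo, entropy < threshold

probs = np.array([[0.95, 0.03, 0.02],    # confident -> kept
                  [0.40, 0.35, 0.25]])   # ambiguous -> rejected
labels, mask = select_reliable(probs)
print(labels, mask)                       # [0 0] [ True False]
```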
5
Wang JZ, Zhao S, Wu C, Adams RB, Newman MG, Shafir T, Tsachor R. Unlocking the Emotional World of Visual Media: An Overview of the Science, Research, and Impact of Understanding Emotion. Proceedings of the IEEE 2023; 111:1236-1286. [PMID: 37859667 PMCID: PMC10586271 DOI: 10.1109/jproc.2023.3273517]
Abstract
The emergence of artificial emotional intelligence technology is revolutionizing the fields of computers and robotics, allowing for a new level of communication and understanding of human behavior that was once thought impossible. While recent advancements in deep learning have transformed the field of computer vision, automated understanding of evoked or expressed emotions in visual media remains in its infancy. This foundering stems from the absence of a universally accepted definition of "emotion," coupled with the inherently subjective nature of emotions and their intricate nuances. In this article, we provide a comprehensive, multidisciplinary overview of the field of emotion analysis in visual media, drawing on insights from psychology, engineering, and the arts. We begin by exploring the psychological foundations of emotion and the computational principles that underpin the understanding of emotions from images and videos. We then review the latest research and systems within the field, accentuating the most promising approaches. We also discuss the current technological challenges and limitations of emotion analysis, underscoring the necessity for continued investigation and innovation. We contend that this represents a "Holy Grail" research problem in computing and delineate pivotal directions for future inquiry. Finally, we examine the ethical ramifications of emotion-understanding technologies and contemplate their potential societal impacts. Overall, this article endeavors to equip readers with a deeper understanding of the domain of emotion analysis in visual media and to inspire further research and development in this captivating and rapidly evolving field.
Affiliation(s)
- James Z Wang
- College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802 USA
- Sicheng Zhao
- Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing 100084, China
- Chenyan Wu
- College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802 USA
- Reginald B Adams
- Department of Psychology, The Pennsylvania State University, University Park, PA 16802 USA
- Michelle G Newman
- Department of Psychology, The Pennsylvania State University, University Park, PA 16802 USA
- Tal Shafir
- Emily Sagol Creative Arts Therapies Research Center, University of Haifa, Haifa 3498838, Israel
- Rachelle Tsachor
- School of Theatre and Music, University of Illinois at Chicago, Chicago, IL 60607 USA
6
Ma Y, Shen J, Zhao Z, Liang H, Tan Y, Liu Z, Qian K, Yang M, Hu B. What Can Facial Movements Reveal? Depression Recognition and Analysis Based on Optical Flow Using Bayesian Networks. IEEE Trans Neural Syst Rehabil Eng 2023; 31:3459-3468. [PMID: 37581961 DOI: 10.1109/tnsre.2023.3305351]
Abstract
Recent evidence has demonstrated that facial expressions can be a valid and important cue for depression recognition. Although much work has been done on automatic depression recognition, it remains challenging to explore the inherent nuances of facial expressions that might reveal underlying differences between depressed patients and healthy subjects under different stimuli. There is a lack of an unobtrusive system that monitors depressive patients' mental states in various free-living scenarios, so this paper steps towards building a classification model in which data collection, feature extraction, depression recognition, and facial action analysis are conducted to infer differences in facial movements between depressive patients and healthy subjects. In this study, we first present a scheme for dividing facial regions of interest to extract optical flow features of facial expressions for depression recognition. We then propose facial movement coefficients based on the discrete wavelet transform. Specifically, Bayesian Networks are learned over Pearson correlation coefficients of these wavelet-domain coefficients, which allows the movements of different facial regions to be analyzed. We evaluate our method on a clinically validated dataset of 30 depressed patients and 30 healthy control subjects; the experiments achieved accuracy of 81.7% and recall of 96.7%, outperforming the other features used for comparison. Most importantly, the Bayesian Networks built on the coefficients under different stimuli may reveal facial action patterns of depressed subjects, which have the potential to assist the automatic diagnosis of depression.
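The wavelet-plus-correlation feature construction can be illustrated in a few lines. The toy sketch below applies a one-level Haar DWT to two synthetic per-region motion signals and computes their Pearson correlation, the quantity the Bayesian networks are learned over; the Haar wavelet and the synthetic signals are stand-in assumptions, not the paper's settings.

```python
# Toy sketch: per-region motion signals -> one-level Haar DWT ->
# Pearson correlation between regions (input to the Bayesian network).
import numpy as np

def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    s = np.asarray(signal, dtype=float)
    s = s[: len(s) // 2 * 2]                      # force even length
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)
    return approx, detail

rng = np.random.default_rng(1)
t = np.linspace(0, 4 * np.pi, 256)
eyebrow = np.sin(t) + 0.1 * rng.normal(size=t.size)       # region 1 signal
mouth = np.sin(t + 0.3) + 0.1 * rng.normal(size=t.size)   # region 2 signal

a1, _ = haar_dwt(eyebrow)
a2, _ = haar_dwt(mouth)
r = np.corrcoef(a1, a2)[0, 1]                    # Pearson coefficient
print(f"region correlation: {r:.2f}")
```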
7
Klingner CM, Guntinas-Lichius O. Facial expression and emotion. Laryngorhinootologie 2023; 102:S115-S125. [PMID: 37130535 PMCID: PMC10171334 DOI: 10.1055/a-2003-5687]
Abstract
Human facial expressions are unique in their ability to express our emotions and communicate them to others. The facial expression of basic emotions is very similar across different cultures and also has many features in common with other mammals. This suggests a common genetic origin of the association between facial expressions and emotion. However, recent studies also show cultural influences and differences. The recognition of emotions from facial expressions, as well as the process of expressing one's emotions facially, occurs within an extremely complex cerebral network. Due to the complexity of the cerebral processing system, a variety of neurological and psychiatric disorders can significantly disrupt the coupling of facial expressions and emotions. Wearing masks also limits our ability to convey and recognize emotions through facial expressions. Facial expressions, however, can convey not only "real" emotions but also acted ones; they thus open up the possibility of faking socially desired expressions and of consciously feigning emotions. These pretenses are mostly imperfect, however, and can be accompanied by short-lived facial movements that indicate the emotions actually present (microexpressions). Microexpressions are of very short duration and often barely perceptible to humans, but they are an ideal application area for computer-aided analysis. The automatic identification of microexpressions has not only received scientific attention in recent years; its use is also being tested in security-related areas. This article summarizes the current state of knowledge on facial expressions and emotions.
Affiliation(s)
- Carsten M Klingner
- Hans Berger Department of Neurology, Jena University Hospital, Germany
- Biomagnetic Center, Jena University Hospital, Germany
8
SoftClusterMix: learning soft boundaries for empirical risk minimization. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08338-x]
9
Liu P, Lin Y, Meng Z, Lu L, Deng W, Zhou JT, Yang Y. Point Adversarial Self-Mining: A Simple Method for Facial Expression Recognition. IEEE Transactions on Cybernetics 2022; 52:12649-12660. [PMID: 34197333 DOI: 10.1109/tcyb.2021.3085744]
Abstract
In this article, we propose a simple yet effective approach, called point adversarial self-mining (PASM), to improve recognition accuracy in facial expression recognition (FER). Unlike previous works that focus on designing specific architectures or loss functions, PASM boosts network capability by simulating human learning processes: providing updated learning materials and guidance from more capable teachers. Specifically, to generate new learning materials, PASM leverages a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task, generating harder learning samples to refine the network. The searched position is highly adaptive, since it considers both the statistical information of each sample and the teacher network's capability. Besides receiving new learning materials, the student network also receives guidance from the teacher network. After the student network finishes training, it changes roles and acts as a teacher, generating new learning materials and providing stronger guidance to train a better student network. The adaptive learning-material generation and teacher/student update can be conducted more than once, improving the network capability iteratively. Extensive experimental results validate the efficacy of our method over existing state-of-the-art methods for FER.
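The core of the mining step, finding the point whose perturbation most changes the teacher's output, can be approximated with a simple occlusion search. The sketch below is a conceptual stand-in only: an exhaustive patch search replaces the paper's point adversarial attack, and teacher_confidence is a hypothetical scoring function, not the authors' network.

```python
# Conceptual sketch: find the patch whose occlusion most reduces the
# teacher's confidence, then keep the occluded image as a harder sample.
import numpy as np

def teacher_confidence(img):
    """Stand-in teacher: confidence tracks mean intensity of the center."""
    h, w = img.shape
    return float(img[h // 4: 3 * h // 4, w // 4: 3 * w // 4].mean())

def mine_hard_sample(img, patch=8):
    h, w = img.shape
    base = teacher_confidence(img)
    best_drop, best_xy = -np.inf, (0, 0)
    for y in range(0, h - patch, patch):
        for x in range(0, w - patch, patch):
            occluded = img.copy()
            occluded[y:y + patch, x:x + patch] = 0.0
            drop = base - teacher_confidence(occluded)
            if drop > best_drop:
                best_drop, best_xy = drop, (y, x)
    hard = img.copy()
    y, x = best_xy
    hard[y:y + patch, x:x + patch] = 0.0   # occlude the most informative point
    return hard, best_xy

face = np.random.default_rng(2).random((48, 48))
_, pos = mine_hard_sample(face)
print("most informative patch at", pos)
```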
10
Pan H, Xie L, Wang Z. Spatio-temporal convolutional emotional attention network for spotting macro- and micro-expression intervals in long video sequences. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.09.008]
11
Zhao S, Yao X, Yang J, Jia G, Ding G, Chua TS, Schuller BW, Keutzer K. Affective Image Content Analysis: Two Decades Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:6729-6751. [PMID: 34214034 DOI: 10.1109/tpami.2021.3094362]
Abstract
Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we will comprehensively review the development of AICA in the recent two decades, especially focusing on the state-of-the-art methods with respect to three main challenges - the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and description of available datasets for performing evaluation with quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches on (1) emotion feature extraction, including both handcrafted and deep features, (2) learning methods on dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels, and (3) AICA based applications. Finally, we discuss some challenges and promising research directions in the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
12
Franěk M, Petružálek J, Šefara D. Facial Expressions and Self-Reported Emotions When Viewing Nature Images. International Journal of Environmental Research and Public Health 2022; 19:10588. [PMID: 36078304 PMCID: PMC9518385 DOI: 10.3390/ijerph191710588]
Abstract
Many studies have demonstrated that exposure to simulated natural scenes has positive effects on emotions and reduces stress. In the present study, we investigated emotional facial expressions while participants viewed images of various types of natural environments. Both automated facial expression analysis with iMotions' AFFDEX 8.1 software (iMotions, Copenhagen, Denmark) and self-reported emotions were analyzed. Attractive and unattractive natural images were used, representing either open or closed natural environments. The goal was to better understand the actual features and characteristics of natural scenes that could positively affect emotional states and to evaluate face-reading technology for measuring such effects. It was predicted that attractive natural scenes would evoke significantly higher levels of positive emotions than unattractive scenes. The registered emotional facial expressions were generally weak while participants observed the images, with the facial expression of joy significantly stronger than the other registered emotions. Contrary to predictions, there was no difference in facial emotions between attractive and unattractive scenes. However, the self-reported emotions evoked by the images showed significantly larger differences between specific image categories, in accordance with the predictions. The discrepancy between registered emotional facial expressions and self-reported emotions suggests that participants more likely described the images in terms of common stereotypes linked to the beauty of natural environments. This might be an important finding for further methodological considerations.
13
Devi B, Preetha MMSJ. An Innovative Facial Emotion Recognition Model Enabled by Optimal Feature Selection Using Firefly Plus Jaya Algorithm. International Journal of Swarm Intelligence Research 2022. [DOI: 10.4018/ijsir.304399]
Abstract
This paper intends to develop an intelligent facial emotion recognition model comprising four major processes: (a) face detection, (b) feature extraction, (c) optimal feature selection, and (d) classification. In the face detection stage, the human face is detected using the Viola-Jones method. The detected face image is then subjected to feature extraction via (a) LBP, (b) DWT, and (c) GLCM. Because the resulting feature vector is large, it is essential to choose the most relevant features from the extracted set. The optimally chosen features are classified using a neural network (NN), whose output indicates the type of emotion: normal, disgust, fear, anger, smile, surprise, or sadness. As a novelty, this work enhances the classification accuracy of facial emotions by selecting the optimal features as well as optimizing the weights of the NN. Both tasks are accomplished by hybridizing the firefly (FF) and Jaya (JA) algorithms, together referred to as MF-JFF. The output of the NN is the recognized facial emotion, and the whole model is referred to as MF-JFF-NN.
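Of the three hand-crafted descriptors named above, LBP is the simplest to illustrate. Below is a minimal, runnable sketch of a basic 3x3 LBP histogram over a grayscale patch; the DWT and GLCM features the paper also uses are omitted, and the neighborhood ordering is one common convention, not necessarily the paper's.

```python
# Minimal 3x3 LBP histogram descriptor for a grayscale patch.
import numpy as np

def lbp_histogram(gray):
    """gray: 2-D integer array. Returns a normalized 256-bin LBP histogram."""
    c = gray[1:-1, 1:-1]
    neighbors = [gray[0:-2, 0:-2], gray[0:-2, 1:-1], gray[0:-2, 2:],
                 gray[1:-1, 2:],   gray[2:, 2:],     gray[2:, 1:-1],
                 gray[2:, 0:-2],   gray[1:-1, 0:-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbors):          # clockwise from top-left
        codes |= (n >= c).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()                     # normalized descriptor

patch = np.random.default_rng(3).integers(0, 256, (32, 32))
print(lbp_histogram(patch)[:8])
```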
Affiliation(s)
- Bhagyashri Devi
- Department of ECE, Noorul Islam Centre for Higher Education, India
14
Facial Emotion Expressions in Human–Robot Interaction: A Survey. Int J Soc Robot 2022. [DOI: 10.1007/s12369-022-00867-0]
Abstract
Facial expressions are an ideal means of communicating one's emotions or intentions to others. This overview focuses on human facial expression recognition as well as robotic facial expression generation. For human facial expression recognition, both recognition on predefined datasets and recognition in real time are covered. For robotic facial expression generation, hand-coded and automated methods are covered, i.e., the facial expressions of a robot are generated by moving its features (eyes, mouth) either by hand-coding or automatically using machine learning techniques. There are already plenty of studies that achieve high accuracy for expression recognition on predefined datasets, but accuracy for facial expression recognition in real time is comparatively lower. In the case of expression generation in robots, while most robots are capable of making basic facial expressions, there are not many studies that enable them to do so automatically. This overview discusses state-of-the-art research on facial emotion expressions during human–robot interaction, leading to several possible directions for future research.
15
Methods for Facial Expression Recognition with Applications in Challenging Situations. Computational Intelligence and Neuroscience 2022; 2022:9261438. [PMID: 35665283 PMCID: PMC9159845 DOI: 10.1155/2022/9261438]
Abstract
In the last few years, a great deal of interesting research has been carried out on automatic facial emotion recognition (FER). FER has been used in a number of ways to improve human-machine interaction, including human-centered computing and the new trend of emotional artificial intelligence (EAI). Researchers in the EAI field aim to make computers better at predicting and analyzing the facial expressions and behavior of humans under different scenarios and cases. Deep learning has had the greatest influence on this field, since neural networks have evolved significantly in recent years and, accordingly, different architectures are being developed to solve increasingly difficult problems. This article addresses the latest advances in computational-intelligence-based automated emotion recognition using recent deep learning models. We show that deep-learning-based FER models and architecture-related methods, such as databases, can work well together to deliver highly accurate results.
16
Pons G, Masip D. Multitask, Multilabel, and Multidomain Learning With Convolutional Networks for Emotion Recognition. IEEE Transactions on Cybernetics 2022; 52:4764-4771. [PMID: 33306479 DOI: 10.1109/tcyb.2020.3036935]
Abstract
Automated emotion recognition in the wild from facial images remains a challenging problem. Although recent advances in deep learning have brought a significant breakthrough to this topic, strong changes in pose, orientation, and point of view severely harm current approaches. In addition, the acquisition of labeled datasets is costly, and current state-of-the-art deep learning algorithms cannot model all the aforementioned difficulties. In this article, we propose applying a multitask learning loss function to share a common feature representation with other related tasks. In particular, we show that emotion recognition benefits from jointly learning a model with a detector of facial action units (collective muscle movements). The proposed loss function addresses the problem of learning multiple tasks with heterogeneously labeled data, improving previous multitask approaches. We validate the proposal using three datasets acquired in noncontrolled environments and an application to predicting compound facial emotion expressions.
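A common way to train on heterogeneously labeled data, as the abstract describes, is to mask each task's loss wherever its labels are missing. The PyTorch sketch below is a hedged illustration of that idea; the head design, loss weighting, and masking scheme are assumptions, not the paper's exact formulation.

```python
# Multitask loss over heterogeneous labels: each sample may carry an
# emotion label, AU labels, or both; each head is penalized only where
# labels exist.
import torch
import torch.nn.functional as F

def multitask_loss(emo_logits, au_logits, emo_target, au_target, au_mask, w=1.0):
    """emo_target: (N,) with -1 where the emotion label is missing.
    au_target/au_mask: (N, K) AU labels in {0,1} and a 0/1 validity mask."""
    emo_loss = F.cross_entropy(emo_logits, emo_target, ignore_index=-1)
    au_bce = F.binary_cross_entropy_with_logits(au_logits, au_target,
                                                reduction="none")
    au_loss = (au_bce * au_mask).sum() / au_mask.sum().clamp(min=1)
    return emo_loss + w * au_loss

N, C, K = 4, 7, 12
loss = multitask_loss(torch.randn(N, C), torch.randn(N, K),
                      torch.tensor([0, 3, -1, 5]),           # one missing label
                      torch.randint(0, 2, (N, K)).float(),
                      torch.randint(0, 2, (N, K)).float())
print(loss.item())
```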
17
Qu X, Zou Z, Su X, Zhou P, Wei W, Wen S, Wu D. Attend to Where and When: Cascaded Attention Network for Facial Expression Recognition. IEEE Transactions on Emerging Topics in Computational Intelligence 2022. [DOI: 10.1109/tetci.2021.3070713]
Affiliation(s)
- Xiaoye Qu
- Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Hubei, China
- Zhikang Zou
- Department of Computer Vision Technology, Baidu Inc., Beijing, China
- Xinxing Su
- Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Hubei, China
- Pan Zhou
- Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Hubei, China
- Wei Wei
- School of Computer Science and Engineering, Huazhong University of Science and Technology, Hubei, China
- Shiping Wen
- Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, New South Wales, Australia
- Dapeng Wu
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA
18
Marceddu AC, Pugliese L, Sini J, Espinosa GR, Amel Solouki M, Chiavassa P, Giusto E, Montrucchio B, Violante M, De Pace F. A Novel Redundant Validation IoT System for Affective Learning Based on Facial Expressions and Biological Signals. Sensors (Basel) 2022; 22:2773. [PMID: 35408387 PMCID: PMC9003217 DOI: 10.3390/s22072773]
Abstract
Teaching requires understanding the class's reaction in order to evaluate the effectiveness of the teaching methodology. This is easy to achieve in small classrooms but can be challenging in classes of 50 or more students. This paper proposes a novel Internet of Things (IoT) system to aid teachers in their work, based on the redundant use of non-invasive techniques such as facial expression recognition and physiological data analysis. Facial expression recognition is performed using a Convolutional Neural Network (CNN), while physiological data are obtained via photoplethysmography (PPG). Drawing on Russell's model, we grouped the most important of Ekman's facial expressions recognized by the CNN into active and passive. Operations such as thresholding and windowing were then performed to make it possible to compare and analyze the results from both sources. Using a window size of 100 samples, both sources detected a level of attention of about 55.5% in the in-presence lecture tests. Comparing results from in-presence and pre-recorded remote lectures shows that, thanks to validation with physiological data, facial expressions alone appear useful for determining students' level of attention in in-presence lectures.
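The thresholding-and-windowing step is simple to prototype. The sketch below maps frame-wise expression labels to an active/passive stream and averages it over non-overlapping 100-sample windows, yielding an attention level comparable to the PPG channel; the particular active/passive grouping is an illustrative assumption, since the abstract does not list it.

```python
# Toy sketch: frame-wise expressions -> active(1)/passive(0) stream ->
# mean over 100-sample windows as a per-window attention level.
import numpy as np

ACTIVE = {"happiness", "surprise", "anger"}        # assumed "active" grouping

def attention_level(frame_labels, window=100):
    active = np.array([1.0 if lab in ACTIVE else 0.0 for lab in frame_labels])
    n = len(active) // window * window             # drop the incomplete tail
    windows = active[:n].reshape(-1, window)       # non-overlapping windows
    return windows.mean(axis=1)                    # fraction of active frames

rng = np.random.default_rng(4)
labels = rng.choice(["happiness", "neutral", "sadness", "surprise"], size=500)
print(attention_level(labels))                     # one value per window
```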
Affiliation(s)
- Antonio Costantino Marceddu
- Department of Control and Computer Engineering, Politecnico di Torino, 10129 Turin, Italy
- Luigi Pugliese
- Department of Control and Computer Engineering, Politecnico di Torino, 10129 Turin, Italy
- Jacopo Sini
- Department of Control and Computer Engineering, Politecnico di Torino, 10129 Turin, Italy
- Gustavo Ramirez Espinosa
- Department of Control and Computer Engineering, Politecnico di Torino, 10129 Turin, Italy
- Electronics Department, Engineering School, Pontificia Universidad Javeriana, Bogota 1301, Colombia
- Mohammadreza Amel Solouki
- Department of Control and Computer Engineering, Politecnico di Torino, 10129 Turin, Italy
- Pietro Chiavassa
- Department of Control and Computer Engineering, Politecnico di Torino, 10129 Turin, Italy
- Edoardo Giusto
- Department of Control and Computer Engineering, Politecnico di Torino, 10129 Turin, Italy
- Bartolomeo Montrucchio
- Department of Control and Computer Engineering, Politecnico di Torino, 10129 Turin, Italy
- Massimo Violante
- Department of Control and Computer Engineering, Politecnico di Torino, 10129 Turin, Italy
- Francesco De Pace
- Institute of Visual Computing and Human-Centered Technology, Vienna University of Technology (TU Wien), 1040 Vienna, Austria
19
Wang S, Ding H, Peng G. Dual Learning for Facial Action Unit Detection Under Nonfull Annotation. IEEE Transactions on Cybernetics 2022; 52:2225-2237. [PMID: 32881700 DOI: 10.1109/tcyb.2020.3003502]
Abstract
Most methods for facial action unit (AU) recognition typically require training images that are fully AU-labeled, yet manual AU annotation is time-intensive. To alleviate this, we propose a novel dual learning framework and apply it to AU detection under two scenarios: semisupervised AU detection with partially AU-labeled and fully expression-labeled samples, and weakly supervised AU detection with fully expression-labeled samples alone. We leverage two forms of auxiliary information. The first is the probabilistic duality between the AU detection task and its dual task, in this case face synthesis given AU labels. We also take advantage of the dependencies among multiple AUs, between expression and AUs, and between facial features and AUs. Specifically, the proposed method consists of a classifier, an image generator, and a discriminator. The classifier and generator yield face-AU-expression tuples, which are forced to converge to the ground-truth distribution. This joint distribution encodes three kinds of inherent dependencies: 1) the dependencies among multiple AUs; 2) the dependencies between expression and AUs; and 3) the dependencies between facial features and AUs. We reconstruct the inputted face and AU labels and introduce two reconstruction losses. In the semisupervised scenario, a supervised loss is also incorporated into the full objective for AU-labeled samples. In the weakly supervised scenario, we generate pseudo paired data according to domain knowledge about expression and AUs. Semisupervised and weakly supervised experiments on three widely used datasets demonstrate the superiority of the proposed method over current works for AU detection and facial synthesis.
20
Monteith S, Glenn T, Geddes J, Whybrow PC, Bauer M. Commercial Use of Emotion Artificial Intelligence (AI): Implications for Psychiatry. Curr Psychiatry Rep 2022; 24:203-211. [PMID: 35212918 DOI: 10.1007/s11920-022-01330-7]
Abstract
PURPOSE OF REVIEW: Emotion artificial intelligence (AI) is technology for emotion detection and recognition. Emotion AI is expanding rapidly in commercial and government settings outside of medicine and will increasingly become a routine part of daily life. The goal of this narrative review is to increase awareness both of the widespread use of emotion AI, and of the concerns with commercial use of emotion AI in relation to people with mental illness.
RECENT FINDINGS: This paper discusses emotion AI fundamentals, a general overview of commercial emotion AI outside of medicine, and examples of the use of emotion AI in employee hiring and workplace monitoring. The successful re-integration of patients with mental illness into society must recognize the increasing commercial use of emotion AI. There are concerns that commercial use of emotion AI will increase stigma and discrimination and have negative consequences in daily life for people with mental illness. Commercial emotion AI algorithm predictions about mental illness should not be treated as medical fact.
Affiliation(s)
- Scott Monteith
- Michigan State University College of Human Medicine, Traverse City Campus, 1400 Medical Campus Drive, Traverse City, MI, 49684, USA.
- Tasha Glenn
- ChronoRecord Association, Fullerton, CA, USA
- John Geddes
- Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, UK
- Peter C Whybrow
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles (UCLA), Los Angeles, CA, USA
- Michael Bauer
- Department of Psychiatry and Psychotherapy, University Hospital Carl Gustav Carus Medical Faculty, Technische Universität Dresden, Dresden, Germany
21
Churamani N, Barros P, Gunes H, Wermter S. Affect-Driven Learning of Robot Behaviour for Collaborative Human-Robot Interactions. Front Robot AI 2022; 9:717193. [PMID: 35265672 PMCID: PMC8898942 DOI: 10.3389/frobt.2022.717193]
Abstract
Collaborative interactions require social robots to share the users’ perspective on the interactions and adapt to the dynamics of their affective behaviour. Yet, current approaches for affective behaviour generation in robots focus on instantaneous perception to generate a one-to-one mapping between observed human expressions and static robot actions. In this paper, we propose a novel framework for affect-driven behaviour generation in social robots. The framework consists of (i) a hybrid neural model for evaluating facial expressions and speech of the users, forming intrinsic affective representations in the robot, (ii) an Affective Core, that employs self-organising neural models to embed behavioural traits like patience and emotional actuation that modulate the robot’s affective appraisal, and (iii) a Reinforcement Learning model that uses the robot’s appraisal to learn interaction behaviour. We investigate the effect of modelling different affective core dispositions on the affective appraisal and use this affective appraisal as the motivation to generate robot behaviours. For evaluation, we conduct a user study (n = 31) where the NICO robot acts as a proposer in the Ultimatum Game. The effect of the robot’s affective core on its negotiation strategy is witnessed by participants, who rank a patient robot with high emotional actuation higher on persistence, while an impatient robot with low emotional actuation is rated higher on its generosity and altruistic behaviour.
Affiliation(s)
- Nikhil Churamani
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Pablo Barros
- Cognitive Architecture for Collaborative Technologies (CONTACT) Unit, Istituto Italiano di Tecnologia, Genova, Italy
- Hatice Gunes
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Stefan Wermter
- Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany
22
Deep Neural Network Approach for Pose, Illumination, and Occlusion Invariant Driver Emotion Detection. International Journal of Environmental Research and Public Health 2022; 19:2352. [PMID: 35206540 PMCID: PMC8871818 DOI: 10.3390/ijerph19042352]
Abstract
Monitoring drivers' emotions is a key aspect of designing advanced driver assistance systems (ADAS) for intelligent vehicles. To ensure safety and track the possibility of road accidents, emotional monitoring plays a key role in assessing the mental status of the driver while driving. However, pose variations, illumination conditions, and occlusions are factors that hinder the proper detection of driver emotions. To overcome these challenges, two novel approaches using machine learning methods and deep neural networks are proposed to monitor drivers' expressions under varying poses, illuminations, and occlusions. We obtained remarkable accuracies of 93.41%, 83.68%, 98.47%, and 98.18% on the CK+, FER 2013, KDEF, and KMU-FED datasets, respectively, for the first approach, and improved accuracies of 96.15%, 84.58%, 99.18%, and 99.09% on the same datasets for the second approach, compared with existing state-of-the-art methods.
23
Bai W, Quan C, Luo ZW. Data-driven Dimensional Expression Generation via Encapsulated Variational Auto-Encoders. Cognit Comput 2022. [DOI: 10.1007/s12559-021-09973-z]
Abstract
Concerning facial expression generation, recent advances in generative models, relying on sheer volume of training data, allow high-quality generation of facial expressions free of the laborious facial expression annotation procedure. However, these generative processes have limited relevance to the psychologically conceptualized dimensional plane, i.e., the Arousal-Valence two-dimensional plane, resulting in the generation of psychologically uninterpretable facial expressions. In this research, we therefore present a novel generative model that learns psychologically compatible (low-dimensional) representations of facial expressions, permitting the generation of facial expressions along the psychologically conceptualized Arousal-Valence dimensions. To generate Arousal-Valence-compatible facial expressions, we resort to a novel form of data-driven generative model, the encapsulated variational auto-encoder (EVAE), which consists of two connected variational auto-encoders. The two variational auto-encoders in our EVAE model are concatenated through a tunable continuous hyper-parameter, which bounds the learning of the EVAE. Since this tunable hyper-parameter, along with the linearly sampled inputs, largely determines the process of generating facial expressions, we hypothesize a correspondence between continuous scales on the hyper-parameter and sampled inputs, on the one hand, and the psychologically conceptualized Arousal-Valence dimensions, on the other. For empirical validation, two publicly released facial expression datasets, the Frey faces and FERG-DB datasets, were employed to evaluate the dimensional generative performance of the proposed EVAE. Across both datasets, the facial expressions generated along our two hypothesized continuous scales were observed to be consistent with the psychologically conceptualized Arousal-Valence dimensions, demonstrating the feasibility of deriving a data-driven Arousal-Valence plane for affective computing. Despite its embryonic stage, this research may shed light on the prospect of continuous, dimensional affective computing.
24
Birnbaum ML, Abrami A, Heisig S, Ali A, Arenare E, Agurto C, Lu N, Kane JM, Cecchi G. Acoustic and Facial Features From Clinical Interviews for Machine Learning-Based Psychiatric Diagnosis: Algorithm Development. JMIR Ment Health 2022; 9:e24699. [PMID: 35072648 PMCID: PMC8822433 DOI: 10.2196/24699]
Abstract
BACKGROUND: In contrast to all other areas of medicine, psychiatry is still nearly entirely reliant on subjective assessments such as patient self-report and clinical observation. The lack of objective information on which to base clinical decisions can contribute to reduced quality of care. Behavioral health clinicians need objective and reliable patient data to support effective targeted interventions.
OBJECTIVE: We aimed to investigate whether reliable inferences (psychiatric signs, symptoms, and diagnoses) can be extracted from audiovisual patterns in recorded evaluation interviews of participants with schizophrenia spectrum disorders and bipolar disorder.
METHODS: We obtained audiovisual data from 89 participants (mean age 25.3 years; male: 48/89, 53.9%; female: 41/89, 46.1%): individuals with schizophrenia spectrum disorders (n=41), individuals with bipolar disorder (n=21), and healthy volunteers (n=27). We developed machine learning models based on acoustic and facial movement features extracted from participant interviews to predict diagnoses and detect clinician-coded neuropsychiatric symptoms, and we assessed model performance using area under the receiver operating characteristic curve (AUROC) in 5-fold cross-validation.
RESULTS: The model successfully differentiated between schizophrenia spectrum disorders and bipolar disorder (AUROC 0.73) when aggregating face and voice features. Facial action units including the cheek-raising muscle (AUROC 0.64) and chin-raising muscle (AUROC 0.74) provided the strongest signal for men. Vocal features, such as energy in the frequency band 1 to 4 kHz (AUROC 0.80) and spectral harmonicity (AUROC 0.78), provided the strongest signal for women. The lip corner-pulling muscle signal discriminated between diagnoses for both men (AUROC 0.61) and women (AUROC 0.62). Several psychiatric signs and symptoms were successfully inferred: blunted affect (AUROC 0.81), avolition (AUROC 0.72), lack of vocal inflection (AUROC 0.71), asociality (AUROC 0.63), and worthlessness (AUROC 0.61).
CONCLUSIONS: This study represents an advancement in efforts to capitalize on digital data to improve diagnostic assessment and supports the development of a new generation of innovative clinical tools employing acoustic and facial data analysis.
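The evaluation protocol in METHODS (AUROC under 5-fold cross-validation) maps directly onto standard tooling. The sketch below shows it with scikit-learn on synthetic stand-in features; the classifier choice and data are illustrative assumptions, as the paper's acoustic and facial feature extraction is not reproduced here.

```python
# Minimal sketch of AUROC under 5-fold cross-validation with synthetic
# stand-ins for the acoustic/facial features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(89, 20))          # 89 participants, 20 toy features
y = rng.integers(0, 2, size=89)        # toy binary diagnosis label
X[y == 1] += 0.8                       # inject some class separability

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring="roc_auc")
print(f"AUROC: {scores.mean():.2f} +/- {scores.std():.2f}")
```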
Affiliation(s)
- Michael L Birnbaum
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States; The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States; The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, United States
- Avner Abrami
- Computational Biology Center, IBM Research, Yorktown Heights, NY, United States
- Stephen Heisig
- Icahn School of Medicine at Mount Sinai, New York City, NY, United States
- Asra Ali
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States; The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
- Elizabeth Arenare
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States; The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
- Carla Agurto
- Computational Biology Center, IBM Research, Yorktown Heights, NY, United States
- Nathaniel Lu
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States; The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
- John M Kane
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States; The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States; The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, United States
- Guillermo Cecchi
- Computational Biology Center, IBM Research, Yorktown Heights, NY, United States
25
Fully automated age-weighted expression classification using real and apparent age. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-021-01044-1]
26
Wang X, He J, Jin Z, Yang M, Wang Y, Qu H. M2Lens: Visualizing and Explaining Multimodal Models for Sentiment Analysis. IEEE Transactions on Visualization and Computer Graphics 2022; 28:802-812. [PMID: 34587037 DOI: 10.1109/tvcg.2021.3114794]
Abstract
Multimodal sentiment analysis aims to recognize people's attitudes from multiple communication channels such as verbal content (i.e., text), voice, and facial expressions. It has become a vibrant and important research topic in natural language processing. Much research focuses on modeling the complex intra- and inter-modal interactions between different communication channels. However, current multimodal models with strong performance are often deep-learning-based techniques and work like black boxes. It is not clear how models utilize multimodal information for sentiment predictions. Despite recent advances in techniques for enhancing the explainability of machine learning models, they often target unimodal scenarios (e.g., images, sentences), and little research has been done on explaining multimodal models. In this paper, we present an interactive visual analytics system, M2Lens, to visualize and explain multimodal models for sentiment analysis. M2Lens provides explanations on intra- and inter-modal interactions at the global, subset, and local levels. Specifically, it summarizes the influence of three typical interaction types (i.e., dominance, complement, and conflict) on the model predictions. Moreover, M2Lens identifies frequent and influential multimodal features and supports the multi-faceted exploration of model behaviors from language, acoustic, and visual modalities. Through two case studies and expert interviews, we demonstrate our system can help users gain deep insights into the multimodal models for sentiment analysis.
27
Abstract
Human emotion recognition is an active research area in artificial intelligence and has made substantial progress over the past few years. Many recent works focus mainly on facial regions to infer human affect, while the surrounding context information is not effectively utilized. In this paper, we propose a new deep network that recognizes human emotions using a novel global-local attention mechanism. Our network is designed to extract features from both the facial and context regions independently and then learn them together using an attention module. In this way, both facial and contextual information is used to infer human emotions, enhancing the discrimination of the classifier. Intensive experiments show that our method surpasses current state-of-the-art methods on recent emotion datasets by a fair margin. Qualitatively, our global-local attention module can extract more meaningful attention maps than previous methods. The source code and trained model of our network are available at https://github.com/minhnhatvt/glamor-net.
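A hedged sketch of the fusion idea described above: separate encoders for the face crop and the surrounding context, with a learned attention weight deciding how much context to mix into the prediction. Layer sizes, the gating form, and the 512-dimensional input features are illustrative assumptions, not the released architecture.

```python
# Toy global-local fusion: face and context encoders plus a learned
# attention gate over the context representation.
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    def __init__(self, dim=128, n_classes=7):
        super().__init__()
        self.face_enc = nn.Sequential(nn.Linear(512, dim), nn.ReLU())
        self.ctx_enc = nn.Sequential(nn.Linear(512, dim), nn.ReLU())
        self.gate = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())
        self.head = nn.Linear(dim, n_classes)

    def forward(self, face_feat, ctx_feat):
        f, c = self.face_enc(face_feat), self.ctx_enc(ctx_feat)
        alpha = self.gate(torch.cat([f, c], dim=-1))   # attention on context
        return self.head(f + alpha * c)                # fused representation

model = GlobalLocalFusion()
logits = model(torch.randn(2, 512), torch.randn(2, 512))
print(logits.shape)   # torch.Size([2, 7])
```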
28
Zhang H, Su W, Yu J, Wang Z. Identity–Expression Dual Branch Network for Facial Expression Recognition. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2020.3034807]
29
Automatic Recognition of Macaque Facial Expressions for Detection of Affective States. eNeuro 2021; 8:ENEURO.0117-21.2021. [PMID: 34799408 PMCID: PMC8664380 DOI: 10.1523/eneuro.0117-21.2021]
Abstract
Internal affective states produce external manifestations such as facial expressions. In humans, the Facial Action Coding System (FACS) is widely used to objectively quantify the elemental facial action units (AUs) that build complex facial expressions. A similar system has been developed for macaque monkeys, the Macaque FACS (MaqFACS); yet, unlike the human counterpart, which has already been partially replaced by automatic algorithms, this system still requires labor-intensive manual coding. Here, we developed and implemented the first prototype for automatic MaqFACS coding. We applied the approach to the analysis of behavioral and neural data recorded from freely interacting macaque monkeys. The method achieved high performance in the recognition of six dominant AUs, generalizing between conspecific individuals (Macaca mulatta) and even between species (Macaca fascicularis). The study lays the foundation for fully automated detection of facial expressions in animals, which is crucial for investigating the neural substrates of social and affective states.
Collapse
|
30
|
Song Z. Facial Expression Emotion Recognition Model Integrating Philosophy and Machine Learning Theory. Front Psychol 2021; 12:759485. [PMID: 34646223 PMCID: PMC8503687 DOI: 10.3389/fpsyg.2021.759485] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2021] [Accepted: 09/06/2021] [Indexed: 01/07/2023] Open
Abstract
Facial expression is an intuitive reflection of a person's mental state; it carries rich emotional information and is one of the most important channels of interpersonal communication, with applications in many fields, including psychology. The wisdom of Zeng Guofan, a celebrated figure in ancient China, touches on facial emotion recognition: his book Bing Jian summarizes eight methods for judging people, especially for choosing the right one, such as "look at the eyes and nose for evil and righteousness, the lips for truth and falsehood; the temperament for success and fame, the spirit for wealth and fortune; the fingers and claws for ideas, the hamstrings for setback; if you want to know his consecution, you can focus on what he has said." A person's personality, mind, goodness, and badness are said to be shown by the face. However, due to the complexity and variability of human facial expression features, traditional facial expression recognition technology suffers from insufficient feature extraction and susceptibility to external environmental influences. This article therefore proposes a novel feature-fusion dual-channel expression recognition algorithm based on machine learning theory and philosophical thinking. Features extracted with a plain convolutional neural network (CNN) tend to miss subtle changes in facial expressions, so the first channel of the proposed algorithm takes Gabor features of the region of interest (ROI) as input: to make full use of the detailed features of the active facial expression area, that area is first segmented from the original face image, and the Gabor transform is used to extract emotion features from it, focusing on a detailed description of the local region. The second channel is an efficient channel attention network based on depthwise separable convolution, which improves the linear bottleneck structure, reduces network complexity, and prevents overfitting through an efficient attention module that combines the depth of the feature map with spatial information. The model focuses on extracting important features, improves emotion recognition accuracy, and outperforms competing methods on the FER2013 dataset.
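The first channel's Gabor step can be sketched as follows; the filter-bank parameters and pooled statistics are illustrative assumptions, not the paper's exact settings:

```python
# Sketch of Gabor feature extraction from a facial ROI (parameters are
# illustrative assumptions, not the paper's configuration).
import cv2
import numpy as np

def gabor_features(roi, ksize=31, sigma=4.0, lambd=10.0, gamma=0.5):
    # roi: grayscale uint8 patch cropped around an expressive region (eyes, mouth)
    feats = []
    for theta in np.arange(0, np.pi, np.pi / 8):   # 8 orientations
        kern = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma)
        resp = cv2.filter2D(roi, cv2.CV_32F, kern)
        feats.extend([resp.mean(), resp.std()])    # simple pooled statistics
    return np.array(feats)

roi = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in for a mouth crop
print(gabor_features(roi).shape)  # (16,)
```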
Collapse
Affiliation(s)
- Zhenjie Song
- School of Humanities and Social Sciences, Xi'an Jiaotong University, Xi'an, China
| |
Collapse
|
31
|
Vijaya Lakshmi A, Mohanaiah P. WOA-TLBO: Whale optimization algorithm with Teaching-learning-based optimization for global optimization and facial emotion recognition. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107623] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
32
|
Banire B, Al Thani D, Qaraqe M, Mansoor B. Face-Based Attention Recognition Model for Children with Autism Spectrum Disorder. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2021; 5:420-445. [PMID: 35415454 PMCID: PMC8982782 DOI: 10.1007/s41666-021-00101-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2020] [Revised: 04/12/2021] [Accepted: 06/10/2021] [Indexed: 11/25/2022]
Abstract
Attention recognition plays a vital role in providing learning support for children with autism spectrum disorders (ASD). The unobtrusiveness of face-tracking techniques makes it possible to build automatic systems to detect and classify attentional behaviors. However, constructing such systems is a challenging task due to the complexity of attentional behavior in ASD. This paper proposes a face-based attention recognition model using two methods. The first is based on geometric feature transformation using a support vector machine (SVM) classifier, and the second is based on transforming time-domain spatial features into 2D spatial images using a convolutional neural network (CNN) approach. We conducted an experimental study on different attentional tasks with 46 children (ASD n=20, typically developing children n=26) and explored the limits of the face-based attention recognition model across participant and task differences. Our results show that geometric feature transformation with an SVM classifier outperforms the CNN approach. Also, attention detection generalizes better within typically developing children than within ASD groups, and within low-attention tasks than within high-attention tasks. This paper provides a basis for future face-based attention recognition for real-time learning and clinical attention interventions.
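A minimal sketch of the first method's pipeline, geometric features fed to an SVM; the landmark features and labels below are toy placeholders:

```python
# Illustrative sketch: geometric (landmark-based) features classified by an SVM.
# Feature extraction is replaced by random placeholders here.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 68 * 2))   # e.g. 68 flattened facial landmarks per frame
y = rng.integers(0, 2, size=200)     # 1 = attentive, 0 = inattentive (toy labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```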
Collapse
Affiliation(s)
- Bilikis Banire
- Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
| | - Dena Al Thani
- Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
| | - Marwa Qaraqe
- Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
| | - Bilal Mansoor
- Mechanical Engineering Program, Texas A & M University at Doha, Qatar, Doha, Qatar
| |
Collapse
|
33
|
Ong DC, Wu Z, Zhi-Xuan T, Reddan M, Kahhale I, Mattek A, Zaki J. Modeling emotion in complex stories: the Stanford Emotional Narratives Dataset. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING 2021; 12:579-594. [PMID: 34484569 PMCID: PMC8414991 DOI: 10.1109/taffc.2019.2955949] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Human emotions unfold over time, and affective computing research must prioritize capturing this crucial component of real-world affect. Modeling dynamic emotional stimuli requires solving the twin challenges of time-series modeling and of collecting high-quality time-series datasets. We begin by assessing the state of the art in time-series emotion recognition, reviewing contemporary time-series approaches in affective computing, including discriminative and generative models. We then introduce the first version of the Stanford Emotional Narratives Dataset (SENDv1): a set of rich, multimodal videos of self-paced, unscripted emotional narratives, annotated for emotional valence over time. The complex narratives and naturalistic expressions in this dataset provide a challenging test for contemporary time-series emotion recognition models. We demonstrate several baseline and state-of-the-art modeling approaches on the SEND, including a Long Short-Term Memory model and a multimodal Variational Recurrent Neural Network, which perform comparably to the human benchmark. We end by discussing the implications for future research in time-series affective computing.
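A minimal sketch of an LSTM baseline of the kind evaluated on SEND-style data; dimensions and setup are illustrative assumptions:

```python
# Illustrative LSTM valence regressor over multimodal time-series features.
# Dimensions and training setup are toy assumptions, not the paper's.
import torch
import torch.nn as nn

class ValenceLSTM(nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (B, T, in_dim) features per time step
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)  # (B, T) valence estimate per time step

model = ValenceLSTM(in_dim=32)
pred = model(torch.randn(4, 100, 32))      # 4 narratives, 100 time steps each
loss = nn.MSELoss()(pred, torch.zeros(4, 100))
print(pred.shape, float(loss))
```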
Collapse
Affiliation(s)
- Desmond C Ong
- Department of Information Systems and Analytics, National University of Singapore, and with the A*STAR Artificial Intelligence Initiative, Agency for Science, Technology and Research, Singapore
| | - Zhengxuan Wu
- Department of Management Science and Engineering, Stanford University
| | - Tan Zhi-Xuan
- Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, and with the A*STAR Artificial Intelligence Initiative
| | | | | | | | - Jamil Zaki
- Department of Psychology, Stanford University
| |
Collapse
|
34
|
DC-EDN: densely connected encoder-decoder network with reinforced depthwise convolution for face alignment. APPL INTELL 2021. [DOI: 10.1007/s10489-020-01940-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
35
|
Deep transfer learning in human–robot interaction for cognitive and physical rehabilitation purposes. Pattern Anal Appl 2021. [DOI: 10.1007/s10044-021-00988-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
36
|
Hassan T, Seus D, Wollenberg J, Weitz K, Kunz M, Lautenbacher S, Garbas JU, Schmid U. Automatic Detection of Pain from Facial Expressions: A Survey. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:1815-1831. [PMID: 31825861 DOI: 10.1109/tpami.2019.2958341] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Pain sensation is essential for survival, since it draws attention to physical threat to the body. Pain assessment is usually done through self-reports. However, self-assessment of pain is not available in the case of noncommunicative patients, and therefore, observer reports should be relied upon. Observer reports of pain could be prone to errors due to subjective biases of observers. Moreover, continuous monitoring by humans is impractical. Therefore, automatic pain detection technology could be deployed to assist human caregivers and complement their service, thereby improving the quality of pain management, especially for noncommunicative patients. Facial expressions are a reliable indicator of pain, and are used in all observer-based pain assessment tools. Following the advancements in automatic facial expression analysis, computer vision researchers have tried to use this technology for developing approaches for automatically detecting pain from facial expressions. This paper surveys the literature published in this field over the past decade, categorizes it, and identifies future research directions. The survey covers the pain datasets used in the reviewed literature, the learning tasks targeted by the approaches, the features extracted from images and image sequences to represent pain-related information, and finally, the machine learning methods used.
Collapse
|
37
|
Kossaifi J, Walecki R, Panagakis Y, Shen J, Schmitt M, Ringeval F, Han J, Pandit V, Toisoul A, Schuller B, Star K, Hajiyev E, Pantic M. SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:1022-1040. [PMID: 31581074 DOI: 10.1109/tpami.2019.2944808] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever, as digital devices are increasingly becoming an indispensable part of our lives. Accurately annotated real-world data are the crux of devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2,000 minutes of audio-visual data of 398 people coming from six cultures, 50 percent female, and uniformly spanning the age range of 18 to 65 years old. Subjects were recorded in two different contexts: while watching adverts and while discussing adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, and continuously valued valence, arousal, liking, agreement, and prototypic examples of (dis)liking. This database aims to be an extremely valuable resource for researchers in affective computing and automatic human sensing and is expected to push forward research in human behaviour analysis, including cultural studies. Along with the database, we provide extensive baseline experiments for automatic FAU detection and automatic valence, arousal, and (dis)liking intensity estimation.
Collapse
|
38
|
Kola DGR, Samayamantula SK. Facial expression recognition using singular values and wavelet‐based LGC‐HD operator. IET BIOMETRICS 2021. [DOI: 10.1049/bme2.12012] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
|
39
|
Abstract
Facial expression recognition has seen rapid development in recent years due to its wide range of applications, such as human-computer interaction, health care, and social robots. Although significant progress has been made in this field, it is still challenging to recognize facial expressions under occlusions and large head poses. To address these issues, this paper presents a cascade regression-based face frontalization (CRFF) method, which aims to immediately reconstruct a clean, frontal, and expression-aware face from an in-the-wild facial image. In the first stage, a frontal facial shape is predicted by a cascade regression model that learns the pairwise spatial relation between a non-frontal face shape and its frontal counterpart. Unlike most existing shape prediction methods, which use single-step regression, the cascade model is a multi-step regressor that gradually aligns the non-frontal shape to its frontal view. We employ several different regressors and make an ensemble decision to boost prediction performance. For facial texture reconstruction, active appearance model instantiation is employed to warp the input face to the predicted frontal shape and generate a clean face. To remove occlusions, we train this generative model on manually selected clean-face sets, which ensures a clean face as output regardless of whether the input face involves occlusions. Unlike existing face reconstruction methods, which are computationally expensive, the proposed method works in real time, so it is suitable for dynamic analysis of facial expression. Experimental validation shows that the ensembled cascade model improves frontal shape prediction accuracy by an average of 5% and that the proposed method achieves superior performance on both static and dynamic recognition of facial expressions over state-of-the-art approaches. The results demonstrate that the proposed method achieves expression-preserving frontalization and de-occlusion and improves facial expression recognition performance.
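The cascade idea can be sketched with plain linear stages, each regressing a residual update toward the frontal target; the data and regressor choice below are toy stand-ins (the paper ensembles several regressor types):

```python
# Toy sketch of cascade regression: each stage learns a residual update that
# moves the current shape estimate toward the frontal target. Ridge regression
# stands in for the paper's ensemble of regressors; data are synthetic.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_nonfrontal = rng.normal(size=(500, 136))   # flattened non-frontal landmark shapes
Y_frontal = X_nonfrontal + rng.normal(0.1, 0.05, size=(500, 136))  # toy frontal targets

stages, shape = [], X_nonfrontal.copy()
for _ in range(3):                                         # three cascade steps
    reg = Ridge(alpha=1.0).fit(shape, Y_frontal - shape)   # learn residual to target
    stages.append(reg)
    shape = shape + reg.predict(shape)                     # gradually align the shape

print("mean residual after cascade:", np.abs(shape - Y_frontal).mean())
```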
Collapse
|
40
|
Sepas-Moghaddam A, Etemad A, Pereira F, Correia PL. CapsField: Light Field-Based Face and Expression Recognition in the Wild Using Capsule Routing. IEEE TRANSACTIONS ON IMAGE PROCESSING: A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:2627-2642. [PMID: 33523811 DOI: 10.1109/tip.2021.3054476] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Light field (LF) cameras provide rich spatio-angular visual representations by sensing the visual scene from multiple perspectives, and have recently emerged as a promising technology to boost the performance of human-machine systems such as biometrics and affective computing. Despite the significant success of LF representations for constrained facial image analysis, this technology has never been used for face and expression recognition in the wild. In this context, this paper proposes a new deep face and expression recognition solution, called CapsField, based on a convolutional neural network and an additional capsule network that uses dynamic routing to learn hierarchical relations between capsules. CapsField extracts spatial features from facial images and learns the angular part-whole relations for a selected set of 2D sub-aperture images rendered from each LF image. To analyze the performance of the proposed solution in the wild, the first in-the-wild LF face dataset, along with a complementary constrained face dataset captured earlier from the same subjects, has been collected and made available. A subset of the in-the-wild dataset contains facial images with different expressions, annotated for use in facial expression recognition tests. An extensive performance assessment on the new datasets, covering the proposed and relevant prior solutions, shows that CapsField achieves superior performance on both face and expression recognition tasks compared to the state of the art.
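A minimal sketch of Sabour-style routing-by-agreement, the kind of dynamic routing CapsField uses to learn part-whole relations; shapes below are toy values:

```python
# Illustrative dynamic routing (routing-by-agreement) between capsule layers.
# Shapes and iteration count are toy assumptions, not CapsField's configuration.
import torch
import torch.nn.functional as F

def squash(s, dim=-1):
    # Shrinks short vectors toward zero, keeps long vectors near unit length.
    n2 = (s ** 2).sum(dim, keepdim=True)
    return (n2 / (1 + n2)) * s / torch.sqrt(n2 + 1e-8)

def route(u_hat, iters=3):
    # u_hat: (B, in_caps, out_caps, out_dim) predictions from lower capsules
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits
    for _ in range(iters):
        c = F.softmax(b, dim=2).unsqueeze(-1)               # coupling coefficients
        v = squash((c * u_hat).sum(dim=1))                  # (B, out_caps, out_dim)
        b = b + (u_hat * v.unsqueeze(1)).sum(-1)            # agreement update
    return v

print(route(torch.randn(2, 32, 10, 16)).shape)  # torch.Size([2, 10, 16])
```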
Collapse
|
41
|
Toisoul A, Kossaifi J, Bulat A, Tzimiropoulos G, Pantic M. Estimation of continuous valence and arousal levels from faces in naturalistic conditions. NAT MACH INTELL 2021. [DOI: 10.1038/s42256-020-00280-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
42
|
Kawulok M, Nalepa J, Kawulok J, Smolka B. Dynamics of facial actions for assessing smile genuineness. PLoS One 2021; 16:e0244647. [PMID: 33400708 PMCID: PMC7785114 DOI: 10.1371/journal.pone.0244647] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Accepted: 12/14/2020] [Indexed: 11/19/2022] Open
Abstract
Applying computer vision techniques to distinguish between spontaneous and posed smiles is an active research topic in affective computing. Although many works have been published on this problem and a couple of excellent benchmark databases have been created, the existing state-of-the-art approaches do not exploit the action units defined within the Facial Action Coding System, which has become a standard in facial expression analysis. In this work, we explore the possibility of extracting discriminative features directly from the dynamics of facial action units to differentiate between genuine and posed smiles. We report the results of an experimental study showing that the proposed features offer performance competitive with those based on facial landmark analysis and on textural descriptors extracted from spatio-temporal blocks. We make these features publicly available for the UvA-NEMO and BBC databases, which will allow other researchers to further improve the classification scores while preserving the interpretability attributed to the use of facial action units. Moreover, we have developed a new technique for identifying the smile phases, which is robust against noise and allows for continuous analysis of facial videos.
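As an illustration of features drawn from action-unit dynamics, the sketch below computes simple onset/apex descriptors from a single AU intensity signal; the statistics are editorial examples, not the paper's exact feature set:

```python
# Toy sketch: dynamic descriptors from an action-unit intensity signal
# (e.g. AU12, lip corner puller). The chosen statistics are illustrative.
import numpy as np

def au_dynamics(signal, fps=25.0):
    v = np.gradient(signal) * fps                   # intensity change rate
    onset_speed = v[: np.argmax(signal) + 1].max()  # fastest rise before the apex
    apex_duration = (signal > 0.8 * signal.max()).sum() / fps
    return {"max_intensity": float(signal.max()),
            "onset_speed": float(onset_speed),
            "apex_duration_s": float(apex_duration)}

t = np.linspace(0, 3, 75)
smile = np.clip(np.sin(np.pi * t / 3), 0, None)     # synthetic smooth, posed-like smile
print(au_dynamics(smile))
```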
Collapse
Affiliation(s)
- Michal Kawulok
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
| | - Jakub Nalepa
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
| | - Jolanta Kawulok
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
| | - Bogdan Smolka
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
| |
Collapse
|
43
|
Xia Y, Yu H, Wang X, Jian M, Wang FY. Relation-Aware Facial Expression Recognition. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2021.3100131] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
44
|
Nanda A, Im W, Choi KS, Yang HS. Combined center dispersion loss function for deep facial expression recognition. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2020.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
45
|
Wang X, Fairhurst MC, Canuto AM. Improving multi-view facial expression recognition through two novel texture-based feature representations. INTELL DATA ANAL 2020. [DOI: 10.3233/ida-194798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Although several automatic systems have been proposed to address facial expression recognition, the majority still fail to cope with the requirements of many practical application scenarios. In this paper, head pose variation, one of the most influential and common issues raised when applying automatic facial expression recognition in practice, is comprehensively explored and investigated. To this end, two novel texture feature representations are proposed for implementing multi-view facial expression recognition systems in practical environments. These representations combine block-based techniques with Local Ternary Pattern-based features, providing a more informative and efficient representation of the facial images. In addition, an in-house multi-view facial expression database has been designed and collected to allow a detailed study of the effect of out-of-plane pose angles on the performance of a multi-view facial expression recognition system. Along with the proposed in-house dataset, the proposed system is tested on two well-known facial expression databases, the CK+ and BU-3DFE datasets. The obtained results show that the proposed system outperforms current state-of-the-art 2D facial expression systems in the presence of pose variations.
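For reference, a Local Ternary Pattern code for one pixel neighbourhood can be computed as below; the proposed representations combine such codes block-wise, and the threshold is an illustrative choice:

```python
# Sketch of a Local Ternary Pattern (LTP) code for one 8-pixel neighbourhood;
# the threshold t is an illustrative choice, not the paper's setting.
import numpy as np

def ltp_codes(center, neighbors, t=5):
    # Ternary code: +1 if neighbor > center + t, -1 if < center - t, else 0.
    tern = np.where(neighbors > center + t, 1,
                    np.where(neighbors < center - t, -1, 0))
    # Standard trick: split into an "upper" and "lower" binary pattern.
    upper = (tern == 1).astype(int)
    lower = (tern == -1).astype(int)
    to_int = lambda bits: int("".join(map(str, bits)), 2)
    return to_int(upper), to_int(lower)

print(ltp_codes(120, np.array([130, 118, 90, 121, 140, 122, 119, 110])))  # (136, 33)
```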
Collapse
Affiliation(s)
- Xuejian Wang
- School of Engineering and Digital Arts, Jennison Building, University of Kent, UK
| | - Michael C. Fairhurst
- School of Engineering and Digital Arts, Jennison Building, University of Kent, UK
| | - Anne M.P. Canuto
- Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, Natal, RN, Brazil
| |
Collapse
|
46
|
Zhi R, Hu X, Wang C, Liu S. Development of a direct mapping model between hedonic rating and facial responses by dynamic facial expression representation. Food Res Int 2020; 137:109411. [PMID: 33233098 DOI: 10.1016/j.foodres.2020.109411] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2019] [Revised: 06/02/2020] [Accepted: 06/03/2020] [Indexed: 11/17/2022]
Abstract
Consumer tests are among the most important activities in product development. Growing evidence indicates that consumer emotions in real life are mostly driven by unconscious mechanisms, and implicit measurements are regarded as beneficial by an increasing number of sensory and consumer scientists. Nonverbal channels such as facial expression analysis supplement declarative methods and yield very insightful results. Until now, facial expression analysis for identifying consumer acceptance has been limited to investigating the relationship between hedonic rating and facial expression descriptors, such as facial coding systems (FACS or MAX), discrete facial expressions (i.e., happiness, sadness, surprise, fear, anger, and disgust), and affective dimensional models (valence and activation). In this study, we attempt to develop a direct mapping model between hedonic rating and the facial responses evoked by various taste stimuli. Basic taste solutions (sourness, sweetness, bitterness, umami, and saltiness) at six levels, together with five types of juice, are used as stimuli. Firstly, the hedonic rating categories are defined based on the nine-point hedonic scale, with a coarse-to-fine division of scale levels along the two directions of like and dislike. Secondly, a facial dynamic optical-flow method is employed to analyze the characteristics of the subjects' facial responses evoked by the taste stimuli, and a genetic algorithm is applied to select the facial regions that contribute most to hedonic rating identification. The results indicate that texture changes in the eye area, wrinkles at the nasal root, and the mouth area effectively reflect the facial reaction corresponding to hedonic rating. The research shows that it is feasible to establish a direct mapping model between hedonic rating and facial responses: the hedonic rating can be predicted through automatic facial reading technology, without an extra transformation from predefined emotional models. To our knowledge, this is the first attempt to predict hedonic rating directly from facial expressions, a complex problem owing to its many influencing factors.
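The dense optical-flow step can be sketched with OpenCV's Farneback flow as a stand-in for the paper's facial dynamic optical-flow method; the frames and region coordinates below are placeholders:

```python
# Sketch of dense optical flow between consecutive face frames, with motion
# magnitude pooled over a candidate facial region. Farneback flow is a
# stand-in; frames and region coordinates are placeholder assumptions.
import cv2
import numpy as np

prev = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # frame at time t
curr = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # frame at time t+1

flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
mouth_motion = mag[80:120, 30:100].mean()   # pooled motion over a "mouth" region
print(mouth_motion)
```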
Collapse
Affiliation(s)
- Ruicong Zhi
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, PR China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, PR China.
| | - Xin Hu
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, PR China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, PR China
| | - Chenyang Wang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, PR China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, PR China
| | - Shuai Liu
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, PR China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, PR China
| |
Collapse
|
47
|
Kołakowska A, Szwoch W, Szwoch M. A Review of Emotion Recognition Methods Based on Data Acquired via Smartphone Sensors. SENSORS (BASEL, SWITZERLAND) 2020; 20:E6367. [PMID: 33171646 PMCID: PMC7664622 DOI: 10.3390/s20216367] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/30/2020] [Revised: 10/29/2020] [Accepted: 11/05/2020] [Indexed: 01/18/2023]
Abstract
In recent years, emotion recognition algorithms have achieved high efficiency, allowing the development of various affective and affect-aware applications. This advancement has taken place mainly in the environment of personal computers, which offer the appropriate hardware and sufficient power to process complex data from video, audio, and other channels. However, the increase in the computing and communication capabilities of smartphones, the variety of their built-in sensors, and the availability of cloud computing services have made them an environment in which the task of recognising emotions can be performed at least as effectively. This is possible, and particularly important, because smartphones and other mobile devices have become the main computing devices used by most people. This article provides a systematic overview of publications from the last 10 years on emotion recognition methods using smartphone sensors. The characteristics of the most important sensors in this respect are presented, together with the methods applied to extract informative features from the data read from these input channels. Finally, the various machine learning approaches implemented to recognise emotional states are described.
Collapse
Affiliation(s)
- Agata Kołakowska
- Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, 80-233 Gdansk, Poland; (W.S.); (M.S.)
| | | | | |
Collapse
|
48
|
Jain DK, Zhang Z, Huang K. Multi angle optimal pattern-based deep learning for automatic facial expression recognition. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2017.06.025] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
49
|
Abstract
This paper connects two large research areas, namely sentiment analysis and human-robot interaction. Emotion analysis, as a subfield of sentiment analysis, explores text data and, based on the characteristics of the text and generally accepted emotional models, evaluates which emotion it expresses. The analysis of emotions in human-robot interaction aims to evaluate the emotional state of the human and, on this basis, to decide how the robot should adapt its behavior. There are several approaches and algorithms for detecting emotions in text data; we apply a dictionary approach combined with machine learning algorithms. Because labeling emotions is ambiguous and subjective, more than one emotion could be assigned to a sentence, so we were dealing with a multi-label problem. Based on an overview of the problem, we performed experiments with Naive Bayes, Support Vector Machine, and Neural Network classifiers. The classification results were subsequently used in human-robot experiments. Despite the lower accuracy of emotion classification, we demonstrated the importance of the robot expressing emotion gestures matched to the words being spoken.
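A minimal multi-label setup of the kind described, with a one-vs-rest linear SVM per emotion label; the toy sentences and labels are illustrative, and the paper's dictionary component is omitted:

```python
# Illustrative multi-label emotion classification: each sentence may carry
# several emotion labels; one linear SVM is trained per label (one-vs-rest).
# Sentences and labels are toy examples; the dictionary step is omitted.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = ["I am so happy to see you", "This is terrible and it scares me",
         "What a wonderful surprise", "I feel sad and alone"]
labels = [{"joy"}, {"fear", "sadness"}, {"joy", "surprise"}, {"sadness"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)                     # binary indicator matrix
clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
clf.fit(texts, Y)
print(dict(zip(mlb.classes_, clf.predict(["you frighten me"])[0])))
```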
Collapse
|
50
|
|