1
Bian Y, Küster D, Liu H, Krumhuber EG. Understanding Naturalistic Facial Expressions with Deep Learning and Multimodal Large Language Models. Sensors (Basel) 2023; 24:126. [PMID: 38202988] [PMCID: PMC10781259] [DOI: 10.3390/s24010126]
Abstract
This paper provides a comprehensive overview of affective computing systems for facial expression recognition (FER) research in naturalistic contexts. The first section presents an updated account of user-friendly FER toolboxes incorporating state-of-the-art deep learning models and elaborates on their neural architectures, datasets, and performance across domains. These sophisticated FER toolboxes can robustly address a variety of challenges encountered in the wild, such as variations in illumination and head pose, which may otherwise impact recognition accuracy. The second section discusses multimodal large language models (MLLMs) and their potential applications in affective science. MLLMs exhibit human-level capabilities for FER and enable the quantification of various contextual variables to provide context-aware emotion inferences. These advancements have the potential to revolutionize current methodological approaches for studying the contextual influences on emotions, leading to the development of contextualized emotion models.
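The context-aware inference the abstract describes can be approximated with a few lines of client code. A minimal sketch follows, assuming an OpenAI-style vision-capable chat model; the model name, prompt wording, and helper function are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of context-aware emotion inference with an MLLM.
# Model name, prompt, and helper are assumptions for illustration only.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def infer_emotion(image_path: str, context: str) -> str:
    """Ask a vision-capable chat model for a context-aware emotion inference."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Context: {context}\n"
                         "Given this context, what emotion is the person "
                         "in the image most likely experiencing, and why?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# The same facial configuration can be read differently depending on context:
print(infer_emotion("frame_012.jpg", "She just crossed a marathon finish line."))
```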
Affiliation(s)
- Yifan Bian
- Department of Experimental Psychology, University College London, London WC1H 0AP, UK
- Dennis Küster
- Department of Mathematics and Computer Science, University of Bremen, 28359 Bremen, Germany
- Hui Liu
- Department of Mathematics and Computer Science, University of Bremen, 28359 Bremen, Germany
- Eva G. Krumhuber
- Department of Experimental Psychology, University College London, London WC1H 0AP, UK
2
Onal Ertugrul I, Ahn YA, Bilalpur M, Messinger DS, Speltz ML, Cohn JF. Infant AFAR: Automated facial action recognition in infants. Behav Res Methods 2023; 55:1024-1035. [PMID: 35538295] [PMCID: PMC9646921] [DOI: 10.3758/s13428-022-01863-y]
Abstract
Automated detection of facial action units in infants is challenging. Infant faces have different proportions, less texture, fewer wrinkles and furrows, and unique facial actions relative to adults. For these and related reasons, action unit (AU) detectors that are trained on adult faces may generalize poorly to infant faces. To train and test AU detectors for infant faces, we trained convolutional neural networks (CNNs) on adult video databases and fine-tuned these networks on two large, manually annotated infant video databases that differ in context, head pose, illumination, video resolution, and infant age. AUs were those central to the expression of positive and negative emotion. AU detectors trained on infants greatly outperformed ones trained previously on adults. Training AU detectors across infant databases afforded greater robustness to between-database differences than did training database-specific AU detectors, and outperformed the previous state of the art in infant AU detection. The resulting AU detection system, which we refer to as Infant AFAR (Automated Facial Action Recognition), is available to the research community for further testing and applications in infant emotion, social interaction, and related topics.
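The transfer pattern described here (pretrain on adult faces, then fine-tune on infant data) is straightforward to express in code. Below is a minimal sketch assuming a PyTorch pipeline; the ResNet backbone with ImageNet weights stands in for the adult-face pretraining, and the AU count and hyperparameters are illustrative assumptions.

```python
# Sketch: fine-tune a pretrained CNN for multi-label infant AU detection.
# ImageNet weights stand in for adult-face pretraining; the backbone,
# AU count, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import models

NUM_AUS = 9  # assumed number of AUs tied to positive/negative infant affect

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_AUS)  # new multi-label AU head

# Freeze early layers so only high-level features adapt to infant faces.
for name, param in model.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

criterion = nn.BCEWithLogitsLoss()  # each AU is an independent binary label
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

def finetune_step(frames: torch.Tensor, au_labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of annotated infant video frames."""
    optimizer.zero_grad()
    loss = criterion(model(frames), au_labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```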
3
Namba S, Sato W, Matsui H. Spatio-Temporal Properties of Amused, Embarrassed, and Pained Smiles. J Nonverbal Behav 2022. [DOI: 10.1007/s10919-022-00404-7]
Abstract
Smiles are universal but nuanced facial expressions that are most frequently used in face-to-face communications, typically indicating amusement but sometimes conveying negative emotions such as embarrassment and pain. Although previous studies have suggested that spatial and temporal properties could differ among these various types of smiles, no study has thoroughly analyzed these properties. This study aimed to clarify the spatiotemporal properties of smiles conveying amusement, embarrassment, and pain using a spontaneous facial behavior database. The results regarding spatial patterns revealed that pained smiles showed less eye constriction and more overall facial tension than amused smiles; no spatial differences were identified between embarrassed and amused smiles. Regarding temporal properties, embarrassed and pained smiles remained in a state of higher facial tension than amused smiles. Moreover, embarrassed smiles showed a more gradual change from tension states to the smile state than amused smiles, and pained smiles had lower probabilities of staying in or transitioning to the smile state compared to amused smiles. By comparing the spatiotemporal properties of these three smile types, this study revealed that the probability of transitioning between discrete states could help distinguish amused, embarrassed, and pained smiles.
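The state-transition analysis the abstract relies on amounts to estimating a Markov transition matrix from frame-by-frame state labels. A minimal sketch follows; the state inventory and example sequence are assumptions for illustration, not the paper's actual coding scheme. Comparing such matrices across smile types is what lets, for example, a lower probability of staying in the smile state flag a pained smile.

```python
# Sketch: transition probabilities between discrete facial states.
# The state names and the example sequence are illustrative assumptions.
import numpy as np

STATES = ["neutral", "tension", "smile"]
IDX = {s: i for i, s in enumerate(STATES)}

def transition_matrix(sequence: list[str]) -> np.ndarray:
    """Row-normalized counts of state-to-state transitions."""
    counts = np.zeros((len(STATES), len(STATES)))
    for cur, nxt in zip(sequence, sequence[1:]):
        counts[IDX[cur], IDX[nxt]] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

# Frame-by-frame labels for one hypothetical smile episode:
frames = ["neutral", "tension", "tension", "smile", "smile", "smile", "neutral"]
P = transition_matrix(frames)
print(P[IDX["smile"], IDX["smile"]])  # probability of staying in the smile state
```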
4
Guerdelli H, Ferrari C, Barhoumi W, Ghazouani H, Berretti S. Macro- and Micro-Expressions Facial Datasets: A Survey. Sensors (Basel) 2022; 22:1524. [PMID: 35214430] [PMCID: PMC8879817] [DOI: 10.3390/s22041524]
Abstract
Automatic facial expression recognition is essential for many potential applications. A clear overview of the datasets investigated within the framework of facial expression recognition is therefore of paramount importance in designing and evaluating effective solutions, notably for training neural networks. In this survey, we review more than eighty facial expression datasets, covering both macro- and micro-expressions. The study focuses mostly on spontaneous and in-the-wild datasets, in line with the research trend toward contexts where expressions are shown spontaneously and in real settings. We also provide instances of potential applications of the surveyed datasets, while highlighting their pros and cons. This survey can help researchers better understand the characteristics of the existing datasets, thus facilitating the choice of the data that best suits the particular context of their application.
Collapse
Affiliation(s)
- Hajer Guerdelli
- Research Team on Intelligent Systems in Imaging and Artificial Vision (SIIVA), LR16ES06 Laboratoire de Recherche en Informatique, Modélisation et Traitement de l’Information et de la Connaissance (LIMTIC), Institut Supérieur d’Informatique d’El Manar, Université de Tunis El Manar, Tunis 1068, Tunisia
- Media Integration and Communication Center, University of Florence, 50121 Firenze, Italy
- Claudio Ferrari
- Department of Engineering and Architecture, University of Parma, 43121 Parma, Italy
- Walid Barhoumi
- Research Team on Intelligent Systems in Imaging and Artificial Vision (SIIVA), LR16ES06 Laboratoire de Recherche en Informatique, Modélisation et Traitement de l’Information et de la Connaissance (LIMTIC), Institut Supérieur d’Informatique d’El Manar, Université de Tunis El Manar, Tunis 1068, Tunisia
- Ecole Nationale d’Ingénieurs de Carthage, Université de Carthage, Carthage 1054, Tunisia
- Haythem Ghazouani
- Research Team on Intelligent Systems in Imaging and Artificial Vision (SIIVA), LR16ES06 Laboratoire de Recherche en Informatique, Modélisation et Traitement de l’Information et de la Connaissance (LIMTIC), Institut Supérieur d’Informatique d’El Manar, Université de Tunis El Manar, Tunis 1068, Tunisia
- Ecole Nationale d’Ingénieurs de Carthage, Université de Carthage, Carthage 1054, Tunisia
- Stefano Berretti
- Media Integration and Communication Center, University of Florence, 50121 Firenze, Italy
- Correspondence: ; Tel.: +39-216-96202969
5
Elhamdadi H, Canavan S, Rosen P. AffectiveTDA: Using Topological Data Analysis to Improve Analysis and Explainability in Affective Computing. IEEE Trans Vis Comput Graph 2022; 28:769-779. [PMID: 34587031] [DOI: 10.1109/tvcg.2021.3114784]
Abstract
We present an approach utilizing Topological Data Analysis (TDA) to study the structure of face poses used in affective computing, i.e., the process of recognizing human emotion. The approach performs a conditional comparison of different emotions, both with and without respect to time, using multiple topological distance metrics, dimension reduction techniques, and face subsections (e.g., eyes, nose, mouth). The results confirm that our topology-based approach captures known patterns, distinctions between emotions, and distinctions between individuals, which is an important step towards more robust and explainable emotion recognition by machines.
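A minimal sketch of the underlying idea, assuming 2D facial landmarks as input: compute a persistence diagram per landmark configuration and compare configurations with a topological distance. The ripser/persim calls are standard, but using them this way is an illustrative stand-in for the paper's pipeline, not a reimplementation of it.

```python
# Sketch: compare the topology of two facial landmark configurations.
# Random points stand in for real 68-point landmark sets.
import numpy as np
from ripser import ripser
from persim import bottleneck

def persistence_diagram(landmarks: np.ndarray) -> np.ndarray:
    """H0 persistence diagram of an (n_points, 2) landmark cloud."""
    dgm = ripser(landmarks, maxdim=0)["dgms"][0]
    return dgm[np.isfinite(dgm[:, 1])]  # drop the infinite H0 bar

rng = np.random.default_rng(0)
smile_landmarks = rng.normal(size=(68, 2))    # placeholder for a real frame
neutral_landmarks = rng.normal(size=(68, 2))  # placeholder for a real frame

d = bottleneck(persistence_diagram(smile_landmarks),
               persistence_diagram(neutral_landmarks))
print(f"bottleneck distance between configurations: {d:.3f}")
```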
6
Namba S, Sato W, Osumi M, Shimokawa K. Assessing Automated Facial Action Unit Detection Systems for Analyzing Cross-Domain Facial Expression Databases. Sensors (Basel) 2021; 21:4222. [PMID: 34203007] [PMCID: PMC8235167] [DOI: 10.3390/s21124222]
Abstract
In the field of affective computing, accurate automatic detection of facial movements is an important issue, and great progress has already been made. However, a systematic evaluation of these systems on dynamic facial databases remains an unmet need. This study compared the performance of three systems (FaceReader, OpenFace, AFARtoolbox) that detect facial movements corresponding to action units (AUs) derived from the Facial Action Coding System. All three systems detected the presence of AUs in the dynamic facial database at a level above chance. Moreover, OpenFace and AFAR yielded higher area under the receiver operating characteristic curve (AUC) values than FaceReader. In addition, several confusion biases between facial components (e.g., AU12 and AU14) were observed for each automated AU detection system, and the static mode was superior to the dynamic mode for analyzing the posed facial database. These findings characterize the prediction patterns of each system and provide guidance for research on facial expressions.
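The system comparison reduces to scoring each detector's frame-level AU outputs against manual FACS codes, e.g., with AUC. A minimal sketch, assuming per-frame scores have already been extracted from each tool; the arrays below are random placeholders.

```python
# Sketch: compare automated AU detectors against manual FACS codes via AUC.
# Random arrays are placeholders for per-frame outputs of FaceReader,
# OpenFace, and AFAR on the same manually annotated video frames.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
au12_truth = rng.integers(0, 2, size=200)  # manual FACS codes (0 = absent)

detector_scores = {
    "FaceReader": rng.random(200),  # placeholder AU12 confidence scores
    "OpenFace": rng.random(200),
    "AFAR": rng.random(200),
}

for name, scores in detector_scores.items():
    print(f"{name}: AUC = {roc_auc_score(au12_truth, scores):.3f}")
```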
Affiliation(s)
- Shushi Namba
- Psychological Process Team, BZP, Robotics Project, RIKEN, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 6190288, Japan
- Wataru Sato
- Psychological Process Team, BZP, Robotics Project, RIKEN, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 6190288, Japan
- Masaki Osumi
- KOHINATA Limited Liability Company, 2-7-3, Tateba, Naniwa-ku, Osaka 5560020, Japan
- Koh Shimokawa
- KOHINATA Limited Liability Company, 2-7-3, Tateba, Naniwa-ku, Osaka 5560020, Japan
7
Niinuma K, Onal Ertugrul I, Cohn JF, Jeni LA. Systematic Evaluation of Design Choices for Deep Facial Action Coding Across Pose. Front Comput Sci 2021. [DOI: 10.3389/fcomp.2021.636094]
Abstract
The performance of automated facial expression coding is improving steadily. Advances in deep learning techniques have been key to this success. While the advantage of modern deep learning techniques is clear, the contribution of critical design choices remains largely unknown, especially for facial action unit occurrence and intensity across pose. Using the Facial Expression Recognition and Analysis 2017 (FERA 2017) database, which provides a common protocol to evaluate robustness to pose variation, we systematically evaluated design choices in pre-training, feature alignment, model size selection, and optimizer details. Informed by the findings, we developed an architecture that exceeds the state of the art on FERA 2017. The architecture achieved a 3.5% increase in F1 score for occurrence detection and a 5.8% increase in intraclass correlation (ICC) for intensity estimation. To evaluate the generalizability of the architecture to unseen poses and new dataset domains, we performed experiments across pose in FERA 2017 and across domains in the Denver Intensity of Spontaneous Facial Action (DISFA) database and the UNBC Pain Archive.
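The two metrics named here are easy to pin down in code: F1 for binary AU occurrence and an intraclass correlation for AU intensity. A minimal sketch with placeholder data follows; that the relevant ICC variant is ICC(3,1) is an assumption for illustration.

```python
# Sketch: the evaluation metrics above. F1 scores binary AU occurrence;
# ICC(3,1) (assumed variant) scores agreement on ordinal AU intensity.
# The arrays are random placeholders for detector output vs. ground truth.
import numpy as np
from sklearn.metrics import f1_score

def icc_3_1(x: np.ndarray, y: np.ndarray) -> float:
    """Two-way mixed, single-measure, consistency ICC(3,1) for two raters."""
    data = np.stack([x, y], axis=1).astype(float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((data - grand) ** 2).sum()
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

rng = np.random.default_rng(0)
occ_true = rng.integers(0, 2, 100)                      # AU present/absent
occ_pred = np.where(rng.random(100) < 0.8, occ_true, 1 - occ_true)
print("F1 :", round(f1_score(occ_true, occ_pred), 3))

int_true = rng.integers(0, 6, 100)                      # intensity, 0-5 scale
int_pred = np.clip(int_true + rng.integers(-1, 2, 100), 0, 5)
print("ICC:", round(icc_3_1(int_true, int_pred), 3))
```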
8
Niinuma K, Ertugrul IO, Cohn JF, Jeni LA. Synthetic Expressions are Better Than Real for Learning to Detect Facial Actions. IEEE Winter Conference on Applications of Computer Vision (WACV) 2021:1247-1256. [PMID: 38250021] [PMCID: PMC10798354] [DOI: 10.1109/wacv48630.2021.00129]
Abstract
Critical obstacles in training classifiers to detect facial actions are the limited sizes of annotated video databases and the relatively low frequencies of occurrence of many actions. To address these problems, we propose an approach that makes use of facial expression generation. Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and then trains a GAN-based network to synthesize novel images with facial action units of interest. To evaluate this approach, a deep neural network was trained on two separate datasets: One network was trained on video of synthesized facial expressions generated from FERA17; the other network was trained on unaltered video from the same database. Both networks used the same train and validation partitions and were tested on the test partition of actual video from FERA17. The network trained on synthesized facial expressions outperformed the one trained on actual facial expressions and surpassed current state-of-the-art approaches.
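The core experimental contrast is easy to sketch: train the same detector architecture once on synthetic frames and once on real frames, then evaluate both on the same real test partition. Below is a minimal, self-contained sketch in which random tensors stand in for the FERA17-derived data; the tiny CNN, data shapes, and hyperparameters are assumptions for illustration.

```python
# Sketch: train-on-synthetic vs. train-on-real comparison for AU detection.
# Random tensors stand in for synthesized and actual FERA17 video frames;
# the tiny CNN and hyperparameters are assumptions for illustration.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import f1_score

NUM_AUS = 10

def placeholder_data(n: int) -> TensorDataset:
    return TensorDataset(torch.randn(n, 3, 64, 64),
                         torch.randint(0, 2, (n, NUM_AUS)))

def build_detector() -> nn.Module:
    return nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(8, NUM_AUS))

def train(model: nn.Module, loader: DataLoader, epochs: int = 2) -> nn.Module:
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for frames, aus in loader:
            opt.zero_grad()
            loss_fn(model(frames), aus.float()).backward()
            opt.step()
    return model

@torch.no_grad()
def macro_f1(model: nn.Module, loader: DataLoader) -> float:
    preds, labels = [], []
    for frames, aus in loader:
        preds.append((torch.sigmoid(model(frames)) > 0.5).int())
        labels.append(aus.int())
    return f1_score(torch.cat(labels).numpy(),
                    torch.cat(preds).numpy(), average="macro")

# Same architecture and same real test partition; only training data differs.
real_test = DataLoader(placeholder_data(128), batch_size=32)
for source in ("synthetic", "real"):
    model = train(build_detector(), DataLoader(placeholder_data(256),
                                               batch_size=32, shuffle=True))
    print(source, "training -> test F1 =", round(macro_f1(model, real_test), 3))
```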