51. Sariyanidi E, Zampella CJ, Bartley KG, Herrington JD, Satterthwaite TD, Schultz RT, Tunc B. Discovering Synchronized Subsets of Sequences: A Large Scale Solution. Proc IEEE Conf Comput Vis Pattern Recognit 2020:9490-9499. [PMID: 32968342] [DOI: 10.1109/cvpr42600.2020.00951]
Abstract
Finding the largest subset of sequences (i.e., time series) that are correlated above a certain threshold, within large datasets, is of significant interest for computer vision and pattern recognition problems across domains, including behavior analysis, computational biology, neuroscience, and finance. Maximal clique algorithms can be used to solve this problem, but they are not scalable. We present an approximate, but highly efficient and scalable, method that represents the search space as a union of sets called ϵ-expanded clusters, one of which is theoretically guaranteed to contain the largest subset of synchronized sequences. The method finds synchronized sets by fitting a Euclidean ball on ϵ-expanded clusters, using Jung's theorem. We validate the method on data from the three distinct domains of facial behavior analysis, finance, and neuroscience, where we respectively discover the synchrony among pixels of face videos, stock market item prices, and dynamic brain connectivity data. Experiments show that our method produces results comparable to, but up to 300 times faster than, maximal clique algorithms, with speed gains increasing exponentially with the number of input sequences.
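The reduction at the heart of this approach can be sketched in a few lines: for z-normalized sequences, a pairwise correlation threshold becomes a Euclidean distance threshold, and Jung's theorem bounds the radius of the ball that must enclose a synchronized set. A minimal sketch, with illustrative function names that are not from the authors' code:

```python
import numpy as np

def znormalize(X):
    """Make each row zero-mean and unit-norm, so the Pearson
    correlation of two rows equals their dot product."""
    X = X - X.mean(axis=1, keepdims=True)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def corr_to_dist(rho):
    """For unit-norm, zero-mean rows, corr(x, y) = 1 - ||x - y||^2 / 2,
    so corr >= rho is equivalent to ||x - y|| <= sqrt(2 * (1 - rho))."""
    return np.sqrt(2.0 * (1.0 - rho))

def jung_radius(diameter, n):
    """Jung's theorem: any set of diameter d in R^n is contained in a
    closed ball of radius d * sqrt(n / (2 * (n + 1)))."""
    return diameter * np.sqrt(n / (2.0 * (n + 1.0)))

def is_synchronized(X, idx, rho):
    """Check that every pair in the candidate subset idx correlates >= rho."""
    Z = znormalize(X[idx])
    C = Z @ Z.T                                  # pairwise correlations
    off_diag = C[~np.eye(len(idx), dtype=bool)]  # ignore self-correlations
    return bool((off_diag >= rho).all())
```

Per the abstract, the method fits such a Euclidean ball over each ϵ-expanded cluster rather than enumerating maximal cliques.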
Affiliation(s)
- Keith G Bartley: Center for Autism Research, Children's Hospital of Philadelphia
- John D Herrington: Center for Autism Research, Children's Hospital of Philadelphia; University of Pennsylvania
- Robert T Schultz: Center for Autism Research, Children's Hospital of Philadelphia; University of Pennsylvania
- Birkan Tunc: Center for Autism Research, Children's Hospital of Philadelphia; University of Pennsylvania
52. Sariyanidi E, Zampella CJ, Schultz RT, Tunc B. Can Facial Pose and Expression Be Separated with Weak Perspective Camera? Proc IEEE Conf Comput Vis Pattern Recognit 2020:7171-7180. [PMID: 32921968] [DOI: 10.1109/cvpr42600.2020.00720]
Abstract
Separating facial pose and expression within images requires a camera model for 3D-to-2D mapping. The weak perspective (WP) camera has been the most popular choice; it is the default, if not the only option, in state-of-the-art facial analysis methods and software. WP camera is justified by the supposition that its errors are negligible when the subjects are relatively far from the camera, yet this claim has never been tested despite nearly 20 years of research. This paper critically examines the suitability of WP camera for separating facial pose and expression. First, we theoretically show that WP causes pose-expression ambiguity, as it leads to estimation of spurious expressions. Next, we experimentally quantify the magnitude of spurious expressions. Finally, we test whether spurious expressions have detrimental effects on a common facial analysis application, namely Action Unit (AU) detection. Contrary to conventional wisdom, we find that severe pose-expression ambiguity exists even when subjects are not close to the camera, leading to large false positive rates in AU detection. We also demonstrate that the magnitude and characteristics of spurious expressions depend on the point distribution model used to model the expressions. Our results suggest that common assumptions about WP need to be revisited in facial expression modeling, and that facial analysis software should encourage and facilitate the use of the true camera model whenever possible.
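For readers unfamiliar with the camera models at issue, here is a minimal contrast between full (pinhole) perspective and weak perspective projection; the point values are toy numbers, not from the paper:

```python
import numpy as np

def project_perspective(P, f):
    """Pinhole projection: x = f * X / Z, y = f * Y / Z (per-point depth)."""
    return f * P[:, :2] / P[:, 2:3]

def project_weak_perspective(P, f):
    """Weak perspective: all points share one reference depth Z0, so the
    3D-to-2D mapping collapses to a uniform scaling s = f / Z0."""
    s = f / P[:, 2].mean()
    return s * P[:, :2]

# Toy 3D face points (meters) in camera coordinates, roughly 0.6 m away
P = np.array([[0.03, 0.02, 0.62], [-0.03, 0.02, 0.62], [0.00, -0.04, 0.55]])
f = 800.0  # focal length in pixels (illustrative)
residual = project_perspective(P, f) - project_weak_perspective(P, f)
# A 3D fitting pipeline that assumes WP must absorb this residual somewhere;
# per the paper, it leaks into the expression parameters as spurious expression.
print(np.abs(residual).max())
```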
Affiliation(s)
- Robert T Schultz: Center for Autism Research, Children's Hospital of Philadelphia; University of Pennsylvania
- Birkan Tunc: Center for Autism Research, Children's Hospital of Philadelphia; University of Pennsylvania
53. Carletti V, Greco A, Percannella G, Vento M. Age from Faces in the Deep Learning Revolution. IEEE Trans Pattern Anal Mach Intell 2020; 42:2113-2132. [PMID: 30990174] [DOI: 10.1109/tpami.2019.2910522]
Abstract
Face analysis includes a variety of specific problems, such as face detection, person identification, and gender and ethnicity recognition, to name only the most common; in the last two decades, significant research effort has been devoted to the challenging task of age estimation from faces, as witnessed by the high number of published papers. The explosion of the deep learning paradigm, which has driven a spectacular increase in performance, is in the public eye; consequently, the number of approaches based on deep learning is growing impressively, and age estimation is no exception. The results obtained have recently been surveyed for almost all of the specific face analysis problems; the only exception is age estimation, whose last survey dates back to 2010 and does not include any deep learning based approach to the problem. This paper provides an analysis of the deep methods proposed in the last six years. These are analysed from several points of view: the network architecture together with the learning procedure, the datasets used, data preprocessing and augmentation, and the exploitation of additional data on gender, race, and facial expression. The review is completed by a discussion of the results obtained on public datasets and of the impact of different design aspects on system performance, together with still-open issues.
54. Löffler-Stastka H, Wong G. Learning and competence development via clinical cases – what elements should be investigated to best train good medical doctors? World J Meta-Anal 2020; 8:178-189. [DOI: 10.13105/wjma.v8.i3.178]
Abstract
In European higher education, the application of information technology, concentration on learning processes, consistent implementation, transfer learning, case-based learning, and autonomous learning have been extensively studied in the last decade. Educational sciences based on neuroscientific findings use brain-based learning and teaching, including integrated thematic instruction and emotion theory. Elements essential to this strategy, such as theory and methods for learning, competencies, attitudes, social reality, and a metadiscourse, are described herein. Research on learning: Research on learning tends to focus on declarative knowledge, associative learning with conditional stimuli, and procedural knowledge with polythematic/crosslinking thinking. Research on competencies: In research on competencies (e.g., for clinical reasoning and decision-making), intuitive and analytical components are studied. As repeated presentation and exercising of clinical cases is crucial for an efficient learning process, the implementation of interactive scenarios with affectively involving didactics is considered. For competence development, observational methods, questionnaires/item sets, or factors have to be targeted and empirically validated. Attitudes and social reality: Clinical decision-making, identification processes and attitudes (the "hidden curriculum"), as well as secondary socialization processes (integration of social norms and values, preparation for role acquisition, occupational role), are studied via process research, conceptual research, and observational methods. With respect to research on social reality, conscious and unconscious bargaining processes have to be taken into account. Methodology: Findings from neuroscience (memory, neuronal and molecular biology) and computer science (neurocircuits) are integrated into observational process research (e.g., the affective-cognitive interface, identification processes); conceptual research is added and studied at the meta-level, including discussion of research paradigms. This discussion provides ongoing feedback to projects in a hermeneutic circle.
Affiliation(s)
- Henriette Löffler-Stastka: Department of Psychoanalysis and Psychotherapy, and Postgraduate Unit, Teaching Center, Medical University of Vienna, Vienna 1090, Austria
- Guoruey Wong: Faculté de Médecine, Université de Montréal, Montréal H3T 1J4, Quebec, Canada
55. Fei Z, Yang E, Li DDU, Butler S, Ijomah W, Li X, Zhou H. Deep convolution network based emotion analysis towards mental health care. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.034]
56. Dupré D, Krumhuber EG, Küster D, McKeown GJ. A performance comparison of eight commercially available automatic classifiers for facial affect recognition. PLoS One 2020; 15:e0231968. [PMID: 32330178] [PMCID: PMC7182192] [DOI: 10.1371/journal.pone.0231968]
Abstract
In the wake of rapid advances in automatic affect analysis, commercial automatic classifiers for facial affect recognition have attracted considerable attention in recent years. While several options now exist to analyze dynamic video data, less is known about the relative performance of these classifiers, in particular when facial expressions are spontaneous rather than posed. In the present work, we tested eight out-of-the-box automatic classifiers, and compared their emotion recognition performance to that of human observers. A total of 937 videos were sampled from two large databases that conveyed the basic six emotions (happiness, sadness, anger, fear, surprise, and disgust) either in posed (BU-4DFE) or spontaneous (UT-Dallas) form. Results revealed a recognition advantage for human observers over automatic classification. Among the eight classifiers, there was considerable variance in recognition accuracy ranging from 48% to 62%. Subsequent analyses per type of expression revealed that performance by the two best performing classifiers approximated those of human observers, suggesting high agreement for posed expressions. However, classification accuracy was consistently lower (although above chance level) for spontaneous affective behavior. The findings indicate potential shortcomings of existing out-of-the-box classifiers for measuring emotions, and highlight the need for more spontaneous facial databases that can act as a benchmark in the training and testing of automatic emotion recognition systems. We further discuss some limitations of analyzing facial expressions that have been recorded in controlled environments.
Affiliation(s)
- Damien Dupré: Business School, Dublin City University, Dublin, Republic of Ireland
- Eva G. Krumhuber: Department of Experimental Psychology, University College London, London, England, United Kingdom
- Dennis Küster: Department of Mathematics and Computer Science, University of Bremen, Bremen, Germany; Department of Psychology and Methods, Jacobs University Bremen, Bremen, Germany
- Gary J. McKeown: Department of Psychology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom
57. Payal P, Goyani MM. A comprehensive study on face recognition: methods and challenges. The Imaging Science Journal 2020. [DOI: 10.1080/13682199.2020.1738741]
Affiliation(s)
- Parekh Payal: Department of Computer Engineering, GEC, Modasa, India
58. Park S, Lee K, Lim JA, Ko H, Kim T, Lee JI, Kim H, Han SJ, Kim JS, Park S, Lee JY, Lee EC. Differences in Facial Expressions between Spontaneous and Posed Smiles: Automated Method by Action Units and Three-Dimensional Facial Landmarks. Sensors (Basel) 2020; 20:E1199. [PMID: 32098261] [PMCID: PMC7070510] [DOI: 10.3390/s20041199]
Abstract
Research on emotion recognition from facial expressions has found evidence of different muscle movements between genuine and posed smiles. To further confirm the discrete movement intensities of each facial segment, we explored differences in facial expressions between spontaneous and posed smiles with three-dimensional facial landmarks. Advanced machine analysis was adopted to measure changes in the dynamics of 68 segmented facial regions. A total of 57 normal adults (19 men, 38 women) who displayed adequate posed and spontaneous facial expressions for happiness were included in the analyses. The results indicate that spontaneous smiles have higher intensities in the upper face than in the lower face. Posed smiles, on the other hand, showed higher intensities in the lower part of the face. Furthermore, the 3D facial landmark technique revealed that the left eyebrow displayed stronger intensity during spontaneous smiles than the right eyebrow. These findings suggest a potential application of landmark-based emotion recognition: spontaneous smiles can be distinguished from posed smiles by measuring the relative intensities of the upper and lower face, with a focus on left-sided asymmetry in the upper region.
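A sketch of the kind of upper/lower-face intensity comparison the abstract describes, assuming the common 68-point landmark scheme; the paper's exact segmentation and intensity measure may differ:

```python
import numpy as np

# Common 68-point convention: 17-26 eyebrows, 36-47 eyes (upper face);
# 48-67 mouth region (lower face). These index sets are an assumption.
UPPER = list(range(17, 27)) + list(range(36, 48))
LOWER = list(range(48, 68))

def per_landmark_intensity(seq):
    """Mean frame-to-frame displacement of each tracked 3D landmark.
    seq: array of shape (T, 68, 3), one tracked smile sequence."""
    step = np.linalg.norm(np.diff(seq, axis=0), axis=2)  # (T-1, 68)
    return step.mean(axis=0)                             # per-landmark intensity

def upper_to_lower_ratio(seq):
    inten = per_landmark_intensity(seq)
    return inten[UPPER].mean() / inten[LOWER].mean()

# Per the abstract, this ratio should tend to be larger for spontaneous
# smiles (upper-face dominant) than for posed smiles (lower-face dominant).
```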
Affiliation(s)
- Seho Park: Interdisciplinary Program in Cognitive Science, Seoul National University, Seoul 08826, Korea; Dental Research Institute, Seoul National University School of Dentistry, Seoul 08826, Korea; Department of Psychiatry, Seoul National University College of Medicine & SMG-SNU Boramae Medical Center, Seoul 03080, Korea
- Kunyoung Lee: Department of Computer Science, Sangmyung University, Seoul 03016, Korea
- Jae-A Lim: Department of Psychiatry, Seoul National University College of Medicine & SMG-SNU Boramae Medical Center, Seoul 03080, Korea
- Hyunwoong Ko: Interdisciplinary Program in Cognitive Science, Seoul National University, Seoul 08826, Korea; Dental Research Institute, Seoul National University School of Dentistry, Seoul 08826, Korea; Department of Psychiatry, Seoul National University College of Medicine & SMG-SNU Boramae Medical Center, Seoul 03080, Korea
- Taehoon Kim: Seoul National University College of Medicine, Seoul 03080, Korea
- Jung-In Lee: Seoul National University College of Medicine, Seoul 03080, Korea
- Hakrim Kim: Seoul National University College of Medicine, Seoul 03080, Korea
- Seong-Jae Han: Seoul National University College of Medicine, Seoul 03080, Korea
- Jeong-Shim Kim: Department of Psychiatry, Seoul National University College of Medicine & SMG-SNU Boramae Medical Center, Seoul 03080, Korea
- Soowon Park: Department of Education, Sejong University, Seoul 05006, Korea
- Jun-Young Lee: Department of Psychiatry, Seoul National University College of Medicine & SMG-SNU Boramae Medical Center, Seoul 03080, Korea
- Eui Chul Lee: Department of Human Centered Artificial Intelligence, Sangmyung University, Seoul 03016, Korea
59.
Abstract
Facial emotion recognition is a crucial task for human-computer interaction, autonomous vehicles, and a multitude of multimedia applications. In this paper, we propose a modular framework for recognizing human facial emotions. The framework consists of two machine learning algorithms (for detection and classification) that can be trained offline for real-time applications. Initially, we detect faces in the images using AdaBoost cascade classifiers. We then extract neighborhood difference features (NDF), which represent the features of a face based on localized appearance information. The NDF models different patterns based on the relationships between neighboring regions themselves, instead of considering only intensity information. The study focuses on the seven most important facial expressions, those used most extensively in day-to-day life. However, due to the modular design of the framework, it can be extended to classify any number N of facial expressions. For facial expression classification, we train a random forest classifier with a latent emotional state that handles mis- and false detections. Additionally, the proposed method is independent of gender and facial skin color for emotion recognition. Moreover, due to the intrinsic design of NDF, the proposed method is illumination and orientation invariant. We evaluate our method on different benchmark datasets and compare it with five reference methods. In terms of accuracy, the proposed method gives 13% and 24% better results than the reference methods on the static facial expressions in the wild (SFEW) and real-world affective faces (RAF) datasets, respectively.
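A rough sketch of the pipeline the abstract outlines: cascade face detection, neighborhood-difference-style features, and a random forest. The ndf_features body is a guess at the idea of region-to-region differences; the published NDF operator may differ:

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def ndf_features(face, grid=8, cell=8):
    """Differences between mean intensities of neighboring cells, so the
    descriptor encodes relations between regions rather than raw intensity."""
    face = cv2.resize(face, (grid * cell, grid * cell))
    means = face.reshape(grid, cell, grid, cell).mean(axis=(1, 3))
    dx = np.diff(means, axis=1).ravel()  # horizontal neighbor differences
    dy = np.diff(means, axis=0).ravel()  # vertical neighbor differences
    return np.concatenate([dx, dy])

def face_descriptors(img_gray):
    """Detect faces with the AdaBoost cascade, then describe each crop."""
    for (x, y, w, h) in cascade.detectMultiScale(img_gray, 1.1, 5):
        yield ndf_features(img_gray[y:y + h, x:x + w])

# Training, given descriptors X (n_samples, n_features) and labels y:
# clf = RandomForestClassifier(n_estimators=200).fit(X, y)
```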
60. Sun Y, Ayaz H, Akansu AN. Multimodal Affective State Assessment Using fNIRS + EEG and Spontaneous Facial Expression. Brain Sci 2020; 10:E85. [PMID: 32041316] [PMCID: PMC7071625] [DOI: 10.3390/brainsci10020085]
Abstract
Human facial expressions are regarded as a vital indicator of one's emotion and intention, and even reveal the state of health and wellbeing. Emotional states have been associated with information processing within and between subcortical and cortical areas of the brain, including the amygdala and prefrontal cortex. In this study, we evaluated the relationship between spontaneous human facial affective expressions and multi-modal brain activity measured via non-invasive and wearable sensors: functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG) signals. The affective states of twelve male participants, detected via fNIRS, EEG, and spontaneous facial expressions, were investigated in response to both image-content stimuli and video-content stimuli. We propose a method to jointly evaluate fNIRS and EEG signals for affective state detection (emotional valence as positive or negative). Experimental results reveal a strong correlation between spontaneous facial affective expressions and the perceived emotional valence. Moreover, the affective states were estimated from the fNIRS, EEG, and fNIRS + EEG brain activity measurements. We show that the proposed EEG + fNIRS hybrid method outperforms fNIRS-only and EEG-only approaches. Our findings indicate that dynamic (video-content based) stimuli trigger a larger affective response than static (image-content based) stimuli. These findings also suggest the joint utilization of facial expressions and wearable neuroimaging (fNIRS and EEG) for improved emotion analysis and affective brain-computer interface applications.
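The modality comparison reported above (EEG-only vs. fNIRS-only vs. hybrid) corresponds to a simple feature-level fusion baseline. A sketch with synthetic placeholder arrays, since the actual features and fusion scheme are the paper's own:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
eeg = rng.normal(size=(120, 32))    # placeholder EEG features per trial
fnirs = rng.normal(size=(120, 16))  # placeholder fNIRS features per trial
valence = rng.integers(0, 2, 120)   # placeholder binary valence labels

for name, X in [("EEG only", eeg),
                ("fNIRS only", fnirs),
                ("EEG + fNIRS", np.hstack([eeg, fnirs]))]:
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X, valence, cv=5).mean()
    print(f"{name}: {acc:.2f}")  # the paper reports hybrid > single modality
```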
Affiliation(s)
- Yanjia Sun: Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Hasan Ayaz: School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA 19104, USA; Department of Psychology, College of Arts and Sciences, Drexel University, Philadelphia, PA 19104, USA; Department of Family and Community Health, University of Pennsylvania, Philadelphia, PA 19104, USA; Center for Injury Research and Prevention, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Ali N. Akansu: Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA
62. Happy SL, Dantcheva A, Bremond F. A Weakly Supervised learning technique for classifying facial expressions. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.08.025]
63.
Abstract
In this paper, we address the problem of synthesizing continuous variations in the appearance of makeup by taking a linear combination of examples. Makeup usually shows a vague boundary and does not form a clear shape, which makes this problem distinct from existing image interpolation problems. We approach it as an interpolation between semi-transparent image layers and tackle it by presenting new parametrization schemes for the color and the shape separately, in order to achieve an effective interpolation. For the color parametrization, our main idea is based on the observation of the symmetric relation between the color and transparency of makeup; we provide an optimization framework for extracting a representative palette of colors associated with transparency values, which enables us to easily set up the color correspondence among multiple makeup samples. For the shape parametrization, we exploit a polar coordinate system that creates the in-between shapes effectively, without ghosting artifacts.
64. 3D Approaches and Challenges in Facial Expression Recognition Algorithms—A Literature Review. Applied Sciences (Basel) 2019. [DOI: 10.3390/app9183904]
Abstract
In recent years, facial expression analysis and recognition (FER) has emerged as an active research topic with applications in several areas, including the human-computer interaction domain. Solutions based on 2D models are not entirely satisfactory for real-world applications, as they present problems with pose variations and illumination related to the nature of the data. Thanks to technological development, 3D facial data, both still images and video sequences, have become increasingly used to improve the accuracy of FER systems. Despite the advances in 3D algorithms, these solutions still have drawbacks that make pure three-dimensional techniques convenient only for a set of specific applications; a viable solution to overcome such limitations is adopting a multimodal 2D+3D analysis. In this paper, we analyze the limits and strengths of traditional and deep-learning FER techniques, intending to provide the research community with an overview of the results obtained and of what to expect in the near future. Furthermore, we describe in detail the databases most used to address the problem of facial expressions and emotions, highlighting the results obtained by the various authors. The different techniques used are compared, and some conclusions are drawn concerning the best recognition rates achieved.
65. Deligianni F, Guo Y, Yang GZ. From Emotions to Mood Disorders: A Survey on Gait Analysis Methodology. IEEE J Biomed Health Inform 2019; 23:2302-2316. [PMID: 31502995] [DOI: 10.1109/jbhi.2019.2938111]
Abstract
Mood disorders affect more than 300 million people worldwide and can cause devastating consequences. Elderly people and patients with neurological conditions are particularly susceptible to depression. Gait and body movements can be affected by mood disorders, and thus they can be used as a surrogate sign, as well as an objective index for pervasive monitoring of emotion and mood disorders in daily life. Here we review evidence that demonstrates the relationship between gait, emotions and mood disorders, highlighting the potential of a multimodal approach that couples gait data with physiological signals and home-based monitoring for early detection and management of mood disorders. This could enhance self-awareness, enable the development of objective biomarkers that identify high risk subjects and promote subject-specific treatment.
66. Meng Z, Han S, Liu P, Tong Y. Improving Speech Related Facial Action Unit Recognition by Audiovisual Information Fusion. IEEE Trans Cybern 2019; 49:3293-3306. [PMID: 29994138] [DOI: 10.1109/tcyb.2018.2840090]
Abstract
It is challenging to recognize facial action unit (AU) from spontaneous facial displays, especially when they are accompanied by speech. The major reason is that the information is extracted from a single source, i.e., the visual channel, in the current practice. However, facial activity is highly correlated with voice in natural human communications. Instead of solely improving visual observations, this paper presents a novel audiovisual fusion framework, which makes the best use of visual and acoustic cues in recognizing speech-related facial AUs. In particular, a dynamic Bayesian network is employed to explicitly model the semantic and dynamic physiological relationships between AUs and phonemes as well as measurement uncertainty. Experiments on a pilot audiovisual AU-coded database have demonstrated that the proposed framework significantly outperforms the state-of-the-art visual-based methods in terms of recognizing speech-related AUs, especially for those AUs whose visual observations are impaired during speech, and more importantly is also superior to audio-based methods and feature-level fusion methods, which employ low-level audio features, by explicitly modeling and exploiting physiological relationships between AUs and phonemes.
67. A Survey on Deep Learning in Image Polarity Detection: Balancing Generalization Performances and Computational Costs. Electronics 2019. [DOI: 10.3390/electronics8070783]
Abstract
Deep convolutional neural networks (CNNs) provide an effective tool to extract complex information from images. In the area of image polarity detection, CNNs are customarily utilized in combination with transfer learning techniques to tackle a major problem: the unavailability of large sets of labeled data. Thus, polarity predictors in general exploit a pre-trained CNN as the feature extractor that in turn feeds a classification unit. While the latter unit is trained from scratch, the pre-trained CNN is subject to fine-tuning. As a result, the specific CNN architecture employed as the feature extractor strongly affects the overall performance of the model. This paper analyses state-of-the-art literature on image polarity detection and identifies the most reliable CNN architectures. Moreover, the paper provides an experimental protocol that should allow assessing the role played by the baseline architecture in the polarity detection task. Performance is evaluated in terms of both generalization abilities and computational complexity. The latter attribute becomes critical as polarity predictors, in the era of social networks, might need to be updated within hours or even minutes. In this regard, the paper gives practical hints on the advantages and disadvantages of the examined architectures both in terms of generalization and computational cost.
68. Maremmani C, Monastero R, Orlandi G, Salvadori S, Pieroni A, Baschi R, Pecori A, Dolciotti C, Berchina G, Rovini E, Cuddemi F, Cavallo F. Objective assessment of blinking and facial expressions in Parkinson's disease using a vertical electro-oculogram and facial surface electromyography. Physiol Meas 2019; 40:065005. [PMID: 31018181] [DOI: 10.1088/1361-6579/ab1c05]
Abstract
OBJECTIVE: Hypomimia is a common and early symptom of Parkinson's disease (PD), which reduces the ability of PD patients to manifest emotions. Currently, it is evaluated visually by the neurologist during neurological examinations for PD diagnosis, as described in task 3.2 of the Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS). Since such an evaluation is semi-quantitative and affected by inter-rater variability, this paper aims to measure physiological parameters related to eye blink and facial expressions, extracted from a vertical electro-oculogram (VEOG) and facial surface electromyography (fsEMG), to differentiate PD patients from healthy control subjects (HCs).
APPROACH: The spontaneous eye blink rate per minute (sEBR), its maximum amplitude (BMP), and facial cutaneous muscle activity were measured in 24 PD patients and 24 HCs while the subjects looked at a visual tester composed of three main parts: static vision, dynamic vision, and silent reading. Specificity and sensitivity were calculated for each parameter.
MAIN RESULTS: The VEOG and fsEMG allowed the identification of several parameters related to eye blink and facial expressions (i.e., sEBR, BMP, and frontal and peribuccal muscle activity) able to distinguish between PD patients and HCs with high sensitivity and specificity.
SIGNIFICANCE: The combination of parameters related to eye blink and facial expressions can discriminate with high accuracy between PD patients and HCs, resulting in a useful tool to support the neurologist in the objective assessment of hypomimia and in improving PD diagnosis.
Affiliation(s)
- Carlo Maremmani: Unità Operativa di Neurologia, Laboratorio Congiunto di Neuro-Biorobotica, Ospedale delle Apuane, Azienda USL Toscana Nord Ovest, Massa, Italy
69. Visual and Thermal Image Processing for Facial Specific Landmark Detection to Infer Emotions in a Child-Robot Interaction. Sensors (Basel) 2019. [PMID: 31248004] [PMCID: PMC6650968] [DOI: 10.3390/s19132844]
Abstract
Child-Robot Interaction (CRI) has become increasingly addressed in research and applications. This work proposes a system for emotion recognition in children, recording facial images with both visual (RGB: red, green, blue) and Infrared Thermal Imaging (IRTI) cameras. For this purpose, the Viola-Jones algorithm is used on the color images to detect facial regions of interest (ROIs), which are transferred to the thermal camera plane by multiplying by a homography matrix obtained through the calibration process of the camera system. As a novelty, we propose to compute the error probability for each ROI located over the thermal images, using a reference frame manually marked by a trained expert, in order to choose the ROI best placed according to the expert criteria. This selected ROI is then used to relocate the other ROIs, increasing concordance with the manual reference annotations. Afterwards, further methods for feature extraction, dimensionality reduction through Principal Component Analysis (PCA), and pattern classification by Linear Discriminant Analysis (LDA) are applied to infer emotions. The results show that our approach to ROI location tracks facial landmarks with significantly lower errors than the traditional Viola-Jones algorithm. These ROIs have proven relevant for the recognition of five emotions, specifically disgust, fear, happiness, sadness, and surprise, with our recognition system based on PCA and LDA achieving mean accuracy (ACC) and Kappa values of 85.75% and 81.84%, respectively. As a second stage, the proposed recognition system was trained with a dataset of thermal images collected from 28 typically developing children, in order to infer one of the five basic emotions during a child-robot interaction. The results show that our system can be integrated into a social robot to infer child emotions during a child-robot interaction.
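The ROI transfer step described above is standard planar-homography machinery. A sketch with illustrative calibration points; the real correspondences come from the authors' camera calibration procedure:

```python
import cv2
import numpy as np

# Corresponding points seen by both cameras during calibration
# (e.g., corners of a heated calibration target). Values are illustrative.
pts_rgb = np.float32([[100, 80], [520, 90], [510, 400], [110, 390]])
pts_thermal = np.float32([[60, 50], [300, 55], [295, 230], [65, 225]])
H, _ = cv2.findHomography(pts_rgb, pts_thermal)

def transfer_roi(corners_rgb, H):
    """Map an ROI detected in the RGB image (e.g., by Viola-Jones)
    onto the thermal image plane."""
    pts = np.float32(corners_rgb).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

face_roi = [(220, 140), (380, 140), (380, 330), (220, 330)]
print(transfer_roi(face_roi, H))  # ROI corners in thermal coordinates
```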
70. Gunes H, Celiktutan O, Sariyanidi E. Live human-robot interactive public demonstrations with automatic emotion and personality prediction. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180026. [PMID: 30853000] [PMCID: PMC6452249] [DOI: 10.1098/rstb.2018.0026]
Abstract
Communication with humans is a multi-faceted phenomenon where the emotions, personality and non-verbal behaviours, as well as the verbal behaviours, play a significant role, and human-robot interaction (HRI) technologies should respect this complexity to achieve efficient and seamless communication. In this paper, we describe the design and execution of five public demonstrations made with two HRI systems that aimed at automatically sensing and analysing human participants' non-verbal behaviour and predicting their facial action units, facial expressions and personality in real time while they interacted with a small humanoid robot. We describe an overview of the challenges faced together with the lessons learned from those demonstrations in order to better inform the science and engineering fields to design and build better robots with more purposeful interaction capabilities. This article is part of the theme issue 'From social brains to social robots: applying neurocognitive insights to human-robot interaction'.
Affiliation(s)
- Hatice Gunes: Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, UK
- Oya Celiktutan: Centre for Robotics Research, Department of Informatics, King's College London, London WC2R 2LS, UK
71. Yang J, Wang X, Han S, Wang J, Park DS, Wang Y. Improved Real-Time Facial Expression Recognition Based on a Novel Balanced and Symmetric Local Gradient Coding. Sensors (Basel) 2019. [PMID: 31013582] [PMCID: PMC6514715] [DOI: 10.3390/s19081899]
Abstract
In the field of Facial Expression Recognition (FER), traditional local texture coding methods have low computational complexity while providing a robust solution with respect to occlusion, illumination, and other factors. However, there is still a need to improve the accuracy of these methods while maintaining their real-time nature and low computational complexity. In this paper, we propose a feature-based FER system with a novel local texture coding operator, named central symmetric local gradient coding (CS-LGC), to enhance the performance of real-time systems. It uses four different directional gradients on 5 × 5 grids, with the gradients computed in a center-symmetric way. The averages of the gradients are used to reduce sensitivity to noise. These characteristics make the features extracted by the CS-LGC operator symmetric, providing better generalization capability than existing local gradient coding (LGC) variants. The proposed system further transforms the extracted features into an eigen-space using principal component analysis (PCA) for better representation and less computation; it estimates the intended classes by training an extreme learning machine. The recognition rate on the JAFFE database is 95.24%, and that on the CK+ database is 98.33%. The results show that the system has advantages over existing local texture coding methods.
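A sketch of a center-symmetric gradient code on a 5 × 5 grid, following the abstract's description (four directions, center-symmetric differences, averaged for noise robustness); the published CS-LGC definition may differ in detail:

```python
import numpy as np

DIRS = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, two diagonals

def cs_lgc_code(block):
    """4-bit code for one 5x5 block: for each direction, average the
    center-symmetric differences at radii 1 and 2, then threshold at 0."""
    c = 2  # center of the 5x5 block
    code = 0
    for bit, (dy, dx) in enumerate(DIRS):
        g = np.mean([block[c + r * dy, c + r * dx]
                     - block[c - r * dy, c - r * dx] for r in (1, 2)])
        code |= int(g >= 0) << bit
    return code

def cs_lgc_histogram(img):
    """Descriptor: histogram of the 16 possible codes over the image."""
    h, w = img.shape
    codes = [cs_lgc_code(img[i:i + 5, j:j + 5].astype(float))
             for i in range(h - 4) for j in range(w - 4)]
    return np.bincount(codes, minlength=16)
```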
Affiliation(s)
- Jucheng Yang: College of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin 300457, China
- Xiaojing Wang: College of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin 300457, China
- Shujie Han: College of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin 300457, China
- Jie Wang: College of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin 300457, China
- Dong Sun Park: College of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin 300457, China; Department of Electronic and Information Engineering, Chonbuk National University, Jeonbuk 561-756, Korea
- Yuan Wang: College of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin 300457, China
72. A Review on Automatic Facial Expression Recognition Systems Assisted by Multimodal Sensor Data. Sensors (Basel) 2019. [PMID: 31003522] [PMCID: PMC6514576] [DOI: 10.3390/s19081863]
Abstract
Facial Expression Recognition (FER) can be widely applied in various research areas, such as mental disease diagnosis and human social/physiological interaction detection. With emerging advances in hardware and sensors, FER systems have been developed to support real-world application scenes instead of laboratory environments. Although laboratory-controlled FER systems achieve very high accuracy, around 97%, the transfer from the laboratory to real-world applications faces a great barrier of very low accuracy, approximately 50%. In this survey, we comprehensively discuss three significant challenges in unconstrained real-world environments, namely illumination variation, head pose, and subject-dependence, which may not be resolved by analysing images/videos alone in the FER system. We focus on sensors that may provide extra information and help FER systems detect emotion in both static images and video sequences. We introduce three categories of sensors that may help improve the accuracy and reliability of an expression recognition system by tackling the challenges mentioned above. The first group is detailed-face sensors, which detect small dynamic changes of a facial component, such as eye-trackers, which may help differentiate background noise from facial features. The second is non-visual sensors, such as audio, depth, and EEG sensors, which provide extra information in addition to the visual dimension and improve recognition reliability, for example, in situations with illumination variation and position shifts. The last is target-focused sensors, such as infrared thermal sensors, which can help FER systems filter out useless visual content and may resist illumination variation. We also discuss methods of fusing the different inputs obtained from multimodal sensors in an emotion system. We comparatively review the most prominent multimodal emotional expression recognition approaches and point out their advantages and limitations. We briefly introduce the benchmark datasets related to FER systems for each category of sensors and extend our survey to open challenges and issues. Meanwhile, we design a framework for an expression recognition system that uses multimodal sensor data (provided by the three categories of sensors) to provide complete information about emotions and to assist pure face image/video analysis. We theoretically analyse the feasibility and achievability of our new expression recognition system, especially for use in the wild, and point out future directions for designing an efficient emotional expression recognition system.
73. Jain DK, Shamsolmoali P, Sehdev P. Extended deep neural network for facial emotion recognition. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.01.008]
74. Barman A, Dutta P. Facial expression recognition using distance and texture signature relevant features. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.01.011]
75. Liu Y, Yu M, Yu Y, Yin M. Facial expression recognition based on weighted adaptive symmetric CBP-TOP. J Intell Fuzzy Syst 2019. [DOI: 10.3233/jifs-18696]
Affiliation(s)
- Yi Liu: School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, PR China; School of Artificial Intelligence, Hebei University of Technology, Tianjin, PR China
- Ming Yu: School of Artificial Intelligence, Hebei University of Technology, Tianjin, PR China
- Yang Yu: School of Artificial Intelligence, Hebei University of Technology, Tianjin, PR China
- Mingyue Yin: Hebei branch of China Life Insurance Company Limited, Shijiazhuang, PR China
76. Deep Affect Prediction in-the-Wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond. Int J Comput Vis 2019. [DOI: 10.1007/s11263-019-01158-4]
77. Chu WS, De la Torre F, Cohn JF. Learning Facial Action Units with Spatiotemporal Cues and Multi-label Sampling. Image Vis Comput 2019; 81:1-14. [PMID: 30524157] [PMCID: PMC6277040] [DOI: 10.1016/j.imavis.2018.10.002]
Abstract
Facial action units (AUs) may be represented spatially, temporally, and in terms of their correlation. Previous research focuses on one or another of these aspects or addresses them disjointly. We propose a hybrid network architecture that jointly models spatial and temporal representations and their correlation. In particular, we use a Convolutional Neural Network (CNN) to learn spatial representations and a Long Short-Term Memory (LSTM) network to model temporal dependencies among them. The outputs of the CNNs and LSTMs are aggregated into a fusion network to produce per-frame predictions of multiple AUs. The hybrid network was compared to previous state-of-the-art approaches on two large FACS-coded video databases, GFT and BP4D, with over 400,000 AU-coded frames of spontaneous facial behavior in varied social contexts. Relative to standard multi-label CNNs and feature-based state-of-the-art approaches, the hybrid system reduced person-specific biases and obtained increased accuracy for AU detection. To address class imbalance within and between batches during network training, we introduce multi-label sampling strategies that further increase accuracy when AUs are relatively sparse. Finally, we provide visualizations of the learned AU models, which, to the best of our knowledge, reveal for the first time how machines see AUs.
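The hybrid architecture described here (CNN features per frame, an LSTM over the sequence, and fusion of both streams into per-frame multi-label AU logits) can be sketched compactly; the layer sizes below are illustrative, not the authors' configuration:

```python
import torch
import torch.nn as nn

class HybridAUNet(nn.Module):
    def __init__(self, n_aus=12, feat_dim=256, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(              # spatial representation per frame
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, feat_dim), nn.ReLU())
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)  # temporal deps
        self.fusion = nn.Linear(feat_dim + hidden, n_aus)        # fuse both streams

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        B, T = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(B, T, -1)
        temporal, _ = self.lstm(feats)
        return self.fusion(torch.cat([feats, temporal], dim=-1))

# Per-frame multi-label AU prediction trains with a per-AU sigmoid loss:
# loss = nn.BCEWithLogitsLoss()(model(clips), au_targets)  # targets: (B, T, n_aus)
```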
Affiliation(s)
- Wen-Sheng Chu: Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
- Jeffrey F Cohn: Department of Psychology, University of Pittsburgh, Pittsburgh, USA
78. Wang Y, Dantcheva A, Broutart JC, Robert P, Bremond F, Bilinski P. Comparing Methods for Assessment of Facial Dynamics in Patients with Major Neurocognitive Disorders. Lecture Notes in Computer Science 2019. [DOI: 10.1007/978-3-030-11024-6_10]
79. The Extended Multidimensional Neo-Fuzzy System and Its Fast Learning in Pattern Recognition Tasks. Data 2018. [DOI: 10.3390/data3040063]
Abstract
Methods of machine learning and data mining are becoming the cornerstone of information technologies, with real-time image and video recognition methods getting more and more attention. While computational system architectures are getting larger and more complex, their learning methods call for changes, as training datasets often reach tens or hundreds of thousands of samples, increasing the learning time of such systems. It is possible to reduce computational costs by tuning the system structure so that fast, high-accuracy learning algorithms can be applied. This paper proposes a system based on extended multidimensional neo-fuzzy units and a learning algorithm designed for data stream processing tasks. The proposed learning algorithm, based on an information entropy criterion, significantly improves the system's approximation capabilities. Experiments have confirmed the efficiency of the proposed system in solving real-time video stream recognition tasks.
80. Boccignone G, Conte D, Cuculo V, D'Amelio A, Grossi G, Lanzarotti R. Deep Construction of an Affective Latent Space via Multimodal Enactment. IEEE Trans Cogn Dev Syst 2018. [DOI: 10.1109/tcds.2017.2788820]
81. Del Coco M, Leo M, Carcagni P, Fama F, Spadaro L, Ruta L, Pioggia G, Distante C. Study of Mechanisms of Social Interaction Stimulation in Autism Spectrum Disorder by Assisted Humanoid Robot. IEEE Trans Cogn Dev Syst 2018. [DOI: 10.1109/tcds.2017.2783684]
82. Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning. Int J Comput Vis 2018. [DOI: 10.1007/s11263-018-1131-1]
85. Cohn JF, Okun MS, Jeni LA, Ertugrul IO, Borton D, Malone D, Goodman WK. Automated Affect Detection in Deep Brain Stimulation for Obsessive-Compulsive Disorder: A Pilot Study. Proc ACM Int Conf Multimodal Interact (ICMI) 2018; 2018:40-44. [PMID: 30511050] [PMCID: PMC6271416] [DOI: 10.1145/3242969.3243023]
Abstract
Automated measurement of affective behavior in psychopathology has been limited primarily to screening and diagnosis. While useful, clinicians more often are concerned with whether patients are improving in response to treatment. Are symptoms abating, is affect becoming more positive, are unanticipated side effects emerging? When treatment includes neural implants, need for objective, repeatable biometrics tied to neurophysiology becomes especially pressing. We used automated face analysis to assess treatment response to deep brain stimulation (DBS) in two patients with intractable obsessive-compulsive disorder (OCD). One was assessed intraoperatively following implantation and activation of the DBS device. The other was assessed three months post-implantation. Both were assessed during DBS on and off conditions. Positive and negative valence were quantified using a CNN trained on normative data of 160 non-OCD participants. Thus, a secondary goal was domain transfer of the classifiers. In both contexts, DBS-on resulted in marked positive affect. In response to DBS-off, affect flattened in both contexts and alternated with increased negative affect in the outpatient setting. Mean AUC for domain transfer was 0.87. These findings suggest that parametric variation of DBS is strongly related to affective behavior and may introduce vulnerability for negative affect in the event that DBS is discontinued.
86. An Adaptive Ensemble Approach to Ambient Intelligence Assisted People Search. Applied System Innovation 2018. [DOI: 10.3390/asi1030033]
Abstract
Some machine learning algorithms have shown a better overall recognition rate for facial recognition than humans, provided that the models are trained with massive image databases of human faces. However, it is still a challenge to use existing algorithms to perform localized people search tasks where the recognition must be done in real time and where only a small face database is accessible. A localized people search is essential for enabling robot-human interactions. In this article, we propose a novel adaptive ensemble approach that improves facial recognition rates while maintaining low computational costs, by combining lightweight local binary classifiers with global pre-trained binary classifiers. In this approach, the robot is placed in an ambient intelligence environment that makes it aware of local context changes. Our method addresses the extreme imbalance of false positive results that arises in local dataset classification. Furthermore, it reduces the errors caused by affine deformation in face frontalization and by poor camera focus. Our approach shows a higher recognition rate than a pre-trained global classifier on a benchmark database across various image resolutions, and demonstrates good efficacy in real-time tasks.
87. Zhi R, Zamzmi GZD, Goldgof D, Ashmeade T, Sun Y. Automatic Infants' Pain Assessment by Dynamic Facial Representation: Effects of Profile View, Gestational Age, Gender, and Race. J Clin Med 2018; 7:E173. [PMID: 29997313] [PMCID: PMC6069472] [DOI: 10.3390/jcm7070173]
Abstract
Infants' early exposure to painful procedures can have negative short- and long-term effects on cognitive, neurological, and brain development. However, infants cannot express their subjective pain experience, as they do not yet communicate in any language. Facial expression is the most specific pain indicator and has been effectively employed for automatic pain recognition. In this paper, a dynamic pain facial expression representation and fusion scheme for automatic pain assessment in infants is proposed, combining temporal appearance facial features and temporal geometric facial features. We investigate the effects of various factors that influence pain reactivity in infants, such as the individual variables of gestational age, gender, and race. Different automatic infant pain assessment models are constructed depending on these influence factors as well as on the facial profile view, which affect the models' ability to recognize pain. It can be concluded that profile-based infant pain assessment is feasible, as its performance is almost as good as that of the whole face. Moreover, gestational age is the most influential factor for pain assessment, and it is necessary to construct specific models depending on it. This is mainly because of a lack of behavioral communication ability in infants with low gestational age, due to limited neurological development. To the best of our knowledge, this is the first study of infant pain recognition to highlight profile facial views and various individual variables.
Affiliation(s)
- Ruicong Zhi: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
- Dmitry Goldgof: Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA
- Terri Ashmeade: College of Medicine Pediatrics, University of South Florida, Tampa, FL 33620, USA
- Yu Sun: Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA
88. Oh YH, See J, Le Ngo AC, Phan RCW, Baskaran VM. A Survey of Automatic Facial Micro-Expression Analysis: Databases, Methods, and Challenges. Front Psychol 2018; 9:1128. [PMID: 30042706] [PMCID: PMC6049018] [DOI: 10.3389/fpsyg.2018.01128]
Abstract
Over the last few years, automatic facial micro-expression analysis has garnered increasing attention from experts across different disciplines because of its potential applications in various fields such as clinical diagnosis, forensic investigation and security systems. Advances in computer algorithms and video acquisition technology have rendered machine analysis of facial micro-expressions possible today, in contrast to decades ago when it was primarily the domain of psychiatrists where analysis was largely manual. Indeed, although the study of facial micro-expressions is a well-established field in psychology, it is still relatively new from the computational perspective with many interesting problems. In this survey, we present a comprehensive review of state-of-the-art databases and methods for micro-expressions spotting and recognition. Individual stages involved in the automation of these tasks are also described and reviewed at length. In addition, we also deliberate on the challenges and future directions in this growing field of automatic facial micro-expression analysis.
Affiliation(s)
- Yee-Hui Oh: Faculty of Engineering, Multimedia University, Cyberjaya, Malaysia
- John See: Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Malaysia
- Anh Cat Le Ngo: School of Psychology, University of Nottingham, Nottingham, United Kingdom
- Raphael C-W Phan: Faculty of Engineering, Multimedia University, Cyberjaya, Malaysia; Research Institute for Digital Security, Multimedia University, Cyberjaya, Malaysia
- Vishnu M Baskaran: School of Information Technology, Monash University Malaysia, Bandar Sunway, Malaysia
89
Cabada RZ, Estrada MLB, Hernández FG, Bustillos RO, Reyes-García CA. An affective and Web 3.0-based learning environment for a programming language. TELEMATICS AND INFORMATICS 2018. [DOI: 10.1016/j.tele.2017.03.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
90
Zhao K, Chu WS, Martinez AM. Learning Facial Action Units from Web Images with Scalable Weakly Supervised Clustering. PROCEEDINGS. IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2018; 2018:2090-2099. [PMID: 31244515 PMCID: PMC6594709 DOI: 10.1109/cvpr.2018.00223] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
We present a scalable weakly supervised clustering approach to learn facial action units (AUs) from large, freely available collections of web images. Unlike most existing methods (e.g., CNNs) that rely on fully annotated data, our method exploits web images with inaccurate annotations. Specifically, we derive a weakly supervised spectral algorithm that learns an embedding space coupling image appearance and semantics. The algorithm has an efficient gradient update and, with a stochastic extension, scales up to large quantities of images. In the learned embedding space, we adopt rank-order clustering to identify groups of visually and semantically similar images, and re-annotate these groups for training AU classifiers. Evaluation on the 1-million-image EmotioNet dataset demonstrates the effectiveness of our approach: (1) our learned annotations reach on average 91.3% agreement with human annotations on 7 common AUs, (2) classifiers trained with re-annotated images perform comparably to, and sometimes better than, their supervised CNN-based counterparts, and (3) our method offers intuitive outlier/noise pruning instead of forcing an annotation onto every image. Code is available.
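For intuition, here is a hedged sketch of the re-annotation step: cluster images in a learned embedding space, then relabel each cluster by the majority of its noisy web annotations. Agglomerative clustering stands in for the paper's rank-order clustering, and the threshold is a placeholder.

```python
# Hedged sketch: majority-vote re-annotation of clusters in an embedding space.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def reannotate(embeddings, noisy_labels, distance_threshold=1.0):
    """embeddings: (N, D) learned image embeddings;
    noisy_labels: (N,) non-negative integer AU labels scraped from the web."""
    clusterer = AgglomerativeClustering(n_clusters=None,
                                        distance_threshold=distance_threshold)
    cluster_ids = clusterer.fit_predict(embeddings)
    clean = noisy_labels.copy()
    for c in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == c)
        # the cluster's majority label overrides individual noisy labels
        clean[members] = np.bincount(noisy_labels[members]).argmax()
    return clean
```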
Affiliation(s)
- Kaili Zhao
- School of Comm. and Info. Engineering, Beijing University of Posts and Telecom
- Aleix M Martinez
- Dept. of Electrical and Computer Engineering, The Ohio State University
91
Ertugrul IO, Jeni LA, Cohn JF. FACSCaps: Pose-Independent Facial Action Coding with Capsules. CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS. IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. WORKSHOPS 2018; 2018:2211-2220. [PMID: 30944768 PMCID: PMC6443417 DOI: 10.1109/cvprw.2018.00287] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Most automated facial expression analysis methods treat the face as a 2D object, flat like a sheet of paper. That works well provided images are frontal or nearly so. In real-world conditions, moderate to large head rotation is common, and expression recognition performance degrades. Multi-view Convolutional Neural Networks (CNNs) have been proposed to increase robustness to pose, but they require larger models and may generalize poorly to views not included in the training set. We propose the FACSCaps architecture to handle multi-view and multi-label facial action unit (AU) detection within a single model that can generalize to novel views. Additionally, FACSCaps's ability to synthesize faces offers insight into what the model has learned. FACSCaps models video frames using matrix capsules, in which hierarchical pose relationships between face parts are built into the internal representations. The model is trained by jointly optimizing a multi-label loss and the reconstruction accuracy. FACSCaps was evaluated on the FERA 2017 facial expression dataset, which includes spontaneous facial expressions over a wide range of head orientations. FACSCaps outperformed both state-of-the-art CNNs and their temporal extensions.
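As a minimal sketch of the joint objective described above (not the paper's implementation), the following combines a multi-label AU loss with a face-reconstruction term; the weighting coefficient `alpha` is our assumption, borrowed from the convention in capsule networks, and the capsule architecture itself is omitted.

```python
# Hedged sketch of a joint multi-label + reconstruction training objective.
import torch
import torch.nn.functional as F

def facscaps_style_loss(au_logits, au_targets, recon, frames, alpha=0.0005):
    """au_logits: (B, n_aus) raw scores; au_targets: (B, n_aus) floats in {0, 1};
    recon, frames: (B, C, H, W) reconstructed and input images."""
    multi_label = F.binary_cross_entropy_with_logits(au_logits, au_targets)
    reconstruction = F.mse_loss(recon, frames)
    # the reconstruction term acts as a regularizer on the representation
    return multi_label + alpha * reconstruction
```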
Affiliation(s)
- László A Jeni
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Jeffrey F Cohn
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
92
Uncertainty Flow Facilitates Zero-Shot Multi-Label Learning in Affective Facial Analysis. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8020300] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
93
Li B, Yang F. An Across-Target Study on Visual Attentions in Facial Expression Recognition. Interdiscip Sci 2018; 10:367-374. [PMID: 29383565 DOI: 10.1007/s12539-018-0281-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2017] [Revised: 01/09/2018] [Accepted: 01/11/2018] [Indexed: 11/27/2022]
Abstract
As a simulation of human expression recognition, studies of automatic expression recognition aim to draw useful insight from close, accurate observation of human expression processing via advanced devices. Eye-trackers are the most commonly used such devices, designed specifically to capture eye-movement data. However, due to discrepancies between target faces, across-target analysis has been limited in these studies, which greatly reduces the chance of finding latent eye-behavior patterns. By exploiting correspondences between targets, this study performs an across-target analysis to explore attention patterns in expression recognition. Fixations from different targets are mapped onto a synthetic face to generate an across-target fixation map, then tokenized with areas of interest (AOIs), measured in receiver operating characteristic (ROC) space, modeled by linear regression, and compared using Pearson's correlation. The resulting averaged correlation values lie in the range (0.60, 0.86), indicating significant similarity between subjects when recognizing the same expression classes.
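The sketch below illustrates the AOI tokenization and subject-pair comparison under stated assumptions: the AOI boxes are placeholder coordinates on the synthetic face, and the per-AOI hit-rate profile is our simplification of the paper's pipeline (which also involves ROC measures and regression).

```python
# Hedged sketch: AOI hit-rate profiles compared with Pearson's correlation.
import numpy as np
from scipy.stats import pearsonr

# illustrative AOI boxes (x0, y0, x1, y1) on a common synthetic face
AOIS = {"left_eye": (20, 35, 40, 55), "right_eye": (60, 35, 80, 55),
        "nose": (40, 55, 60, 75), "mouth": (35, 80, 65, 100)}

def aoi_profile(fixations):
    """fixations: (N, 2) array of (x, y) fixation points mapped onto the face."""
    hits = np.array([
        np.sum((fixations[:, 0] >= x0) & (fixations[:, 0] < x1) &
               (fixations[:, 1] >= y0) & (fixations[:, 1] < y1))
        for x0, y0, x1, y1 in AOIS.values()], dtype=float)
    return hits / max(len(fixations), 1)

def subject_similarity(fix_a, fix_b):
    """Pearson correlation between two subjects' AOI hit-rate profiles."""
    r, _ = pearsonr(aoi_profile(fix_a), aoi_profile(fix_b))
    return r
```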
Affiliation(s)
- Baomin Li
- Faculty of Education, East China Normal University, Shanghai, 200062, China
- Fenglei Yang
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
94

95
Oral-Motor and Lexical Diversity During Naturalistic Conversations in Adults with Autism Spectrum Disorder. PROCEEDINGS OF THE CONFERENCE. ASSOCIATION FOR COMPUTATIONAL LINGUISTICS. NORTH AMERICAN CHAPTER. MEETING 2018; 2018:147-157. [PMID: 33073267 DOI: 10.18653/v1/w18-0616] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by impaired social communication and the presence of restricted, repetitive patterns of behaviors and interests. Prior research suggests that restricted patterns of behavior in ASD may be cross-domain phenomena that are evident in a variety of modalities. Computational studies of language in ASD provide support for the existence of an underlying dimension of restriction that emerges during a conversation. Similar evidence exists for restricted patterns of facial movement. Using tools from computational linguistics, computer vision, and information theory, this study tests whether cognitive-motor restriction can be detected across multiple behavioral domains in adults with ASD during a naturalistic conversation. Our methods identify restricted behavioral patterns, as measured by entropy in word use and mouth movement. Results suggest that adults with ASD produce significantly less diverse mouth movements and words than neurotypical adults, with an increased reliance on repeated patterns in both domains. The diversity values of the two domains are not significantly correlated, suggesting that they provide complementary information.
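A minimal sketch of the diversity measure follows: Shannon entropy over the empirical distribution of tokens, applied identically to words and to quantized mouth-movement states (the quantization of movements into discrete codes is assumed here). Lower entropy indicates more repetitive behavior.

```python
# Hedged sketch: Shannon entropy as a diversity measure over token sequences.
import math
from collections import Counter

def shannon_entropy(tokens):
    """tokens: a non-empty sequence of hashable symbols
    (words, or discrete mouth-movement codes)."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# usage sketch, with hypothetical inputs:
# word_diversity = shannon_entropy(transcript.split())
# mouth_diversity = shannon_entropy(quantized_mouth_states)
```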
96
Costa-Abreu MD, Bezerra GS. FAMOS: a framework for investigating the use of face features to identify spontaneous emotions. Pattern Anal Appl 2017. [DOI: 10.1007/s10044-017-0675-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
97

98
Hammal Z, Chu WS, Cohn JF, Heike C, Speltz ML. Automatic Action Unit Detection in Infants Using Convolutional Neural Network. INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION AND WORKSHOPS : [PROCEEDINGS]. ACII (CONFERENCE) 2017; 2017:216-221. [PMID: 29862131 PMCID: PMC5976252 DOI: 10.1109/acii.2017.8273603] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Action unit detection in infants presents unique challenges relative to adults. Jaw contour is less distinct, facial texture is reduced, and rapid and unusual facial movements are common. To detect facial action units in the spontaneous behavior of infants, we propose a multi-label Convolutional Neural Network (CNN). Eighty-six infants were recorded during tasks intended to elicit enjoyment and frustration. Using an extension of FACS for infants (Baby FACS), over 230,000 frames were manually coded for ground truth. To control for chance agreement, inter-observer agreement between Baby FACS coders was quantified using free-marginal kappa. Kappa coefficients ranged from 0.79 to 0.93, which represents high agreement. The multi-label CNN achieved comparable agreement with manual coding, with kappa ranging from 0.69 to 0.93. Importantly, CNN-based AU detection revealed the same pattern of change in infant expressiveness between tasks. While further research is needed, these findings suggest that automatic AU detection in infants is a viable alternative to manual coding of infant facial expression.
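For reference, free-marginal kappa fixes chance agreement at 1/k for k categories instead of estimating it from the coders' marginals; a minimal sketch for two coders is shown below (the function name and two-coder restriction are our simplifications).

```python
# Hedged sketch: free-marginal (Brennan-Prediger style) kappa for two coders.
import numpy as np

def free_marginal_kappa(codes_a, codes_b, k=2):
    """codes_a, codes_b: (N,) integer codes from two coders; k categories.
    For AU occurrence coding, k=2 (AU present / absent)."""
    po = np.mean(np.asarray(codes_a) == np.asarray(codes_b))  # observed agreement
    pe = 1.0 / k                                              # fixed chance agreement
    return (po - pe) / (1.0 - pe)
```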
Affiliation(s)
- Zakia Hammal
- Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
- Wen-Sheng Chu
- Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
- Jeffrey F Cohn
- Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
- Department of Psychology, University of Pittsburgh, Pittsburgh, USA
99
Oyedotun OK, Demisse G, Shabayek AER, Aouada D, Ottersten B. Facial Expression Recognition via Joint Deep Learning of RGB-Depth Map Latent Representations. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW) 2017. [DOI: 10.1109/iccvw.2017.374] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/02/2023]
100
Sun Z, Hu ZP, Wang M, Bai F, Sun B. Robust Facial Expression Recognition with Low-Rank Sparse Error Dictionary Based Probabilistic Collaborative Representation Classification. INT J ARTIF INTELL T 2017. [DOI: 10.1142/s0218213017500178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
The performance of facial expression recognition (FER) degrades due to factors such as individual differences and Gaussian random noise. Prior feature extraction methods such as Local Binary Patterns (LBP) and Gabor filters require explicit expression components, which are often unavailable and difficult to obtain. To make FER more robust, we propose a novel approach based on a low-rank sparse error dictionary (LRSE) that mitigates the side effects caused by the problems above. Query samples are then represented and classified by a probabilistic collaborative representation based classifier (ProCRC), which computes the likelihood that a query sample belongs to the collaborative subspace of each class. The final classification selects the class with the maximum probability. The proposed approach, which couples ProCRC with LRSE features (LRSE ProCRC), achieves higher average accuracies on several databases (79.39% on KDEF, 89.54% on CAS-PEAL, 84.45% on CK+, etc.). In addition, our method yields state-of-the-art classification results across feature extraction methods, numbers of training samples, Gaussian noise variances, and classification methods on benchmark databases.
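To make the classification step concrete, here is a hedged sketch of plain collaborative representation classification, the core mechanism underlying ProCRC: code a query over the pooled training dictionary with ridge regularization, then assign the class whose atoms best explain it. The probabilistic weighting of the full ProCRC and the LRSE features are omitted.

```python
# Hedged sketch: collaborative representation classification (CRC).
import numpy as np

def crc_classify(X, labels, y, lam=1e-3):
    """X: (d, n) column-stacked training samples; labels: (n,) class ids;
    y: (d,) query sample. Returns the predicted class id."""
    n = X.shape[1]
    # ridge-regularized code of the query over all training samples jointly
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        # reconstruction residual using only class-c atoms and coefficients
        residuals[c] = np.linalg.norm(y - X[:, mask] @ alpha[mask])
    return min(residuals, key=residuals.get)
```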
Affiliation(s)
- Zhe Sun
- Department of Information Science and Engineering, Yanshan University, Hebei Street, Qinhuangdao, Hebei Province, China
- Zheng-Ping Hu
- Department of Information Science and Engineering, Yanshan University, Hebei Street, Qinhuangdao, Hebei Province, China
- Meng Wang
- Department of Information Science and Engineering, Taishan University, Dongyue Street, Tai’an, Shandong Province, China
- Fan Bai
- Department of Information Science and Engineering, Yanshan University, Hebei Street, Qinhuangdao, Hebei Province, China
- Bo Sun
- Department of Physics Science and Technology, Hebei University, Wusi Street, Baoding, Hebei Province, China