1. Onal Ertugrul I, Ahn YA, Bilalpur M, Messinger DS, Speltz ML, Cohn JF. Infant AFAR: Automated facial action recognition in infants. Behav Res Methods 2023;55:1024-1035. PMID: 35538295; PMCID: PMC9646921; DOI: 10.3758/s13428-022-01863-y.
Abstract
Automated detection of facial action units in infants is challenging. Infant faces have different proportions, less texture, fewer wrinkles and furrows, and unique facial actions relative to adults. For these and related reasons, action unit (AU) detectors trained on adult faces may generalize poorly to infant faces. To train and test AU detectors for infant faces, we trained convolutional neural networks (CNNs) on adult video databases and fine-tuned these networks on two large, manually annotated infant video databases that differ in context, head pose, illumination, video resolution, and infant age. The AUs were those central to the expression of positive and negative emotion. AU detectors trained on infants greatly outperformed ones trained previously on adults. Training AU detectors across infant databases afforded greater robustness to between-database differences than did training database-specific AU detectors, and it outperformed the previous state of the art in infant AU detection. The resulting AU detection system, which we refer to as Infant AFAR (Automated Facial Action Recognition), is available to the research community for further testing and for applications in infant emotion, social interaction, and related topics.
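For readers who want to reproduce the general transfer recipe, the sketch below fine-tunes a generic ImageNet-pretrained CNN for multi-label AU detection. The backbone, frozen layers, AU count, and hyperparameters are illustrative assumptions rather than the paper's configuration; in the actual pipeline, weights pretrained on adult AU databases would take the place of the ImageNet ones.

```python
# Sketch: fine-tune a pretrained CNN for multi-label AU detection.
# Backbone, frozen layers, AU set, and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision import models

N_AUS = 6  # e.g., AUs central to positive/negative affect (assumption)

# Generic pretrained backbone, standing in for a network pretrained
# on adult AU databases; replace the head for N_AUS outputs.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, N_AUS)

# Freeze early layers; fine-tune the rest on infant data.
for name, param in backbone.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1")):
        param.requires_grad = False

criterion = nn.BCEWithLogitsLoss()  # AUs can co-occur -> multi-label loss
optimizer = torch.optim.Adam(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-4)

def train_step(faces, au_labels):
    """faces: (B, 3, 224, 224) aligned crops; au_labels: (B, N_AUS) in {0,1}."""
    optimizer.zero_grad()
    loss = criterion(backbone(faces), au_labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy call with random tensors, just to show the shapes involved.
faces = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 2, (4, N_AUS))
print("toy loss:", train_step(faces, labels))
```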
2. Prkachin KM, Hammal Z. Computer mediated automatic detection of pain-related behavior: prospect, progress, perils. Front Pain Res 2022;2:788606. PMID: 35174358; PMCID: PMC8846566; DOI: 10.3389/fpain.2021.788606.
Abstract
Pain is often characterized as a fundamentally subjective phenomenon; however, all pain assessment reduces the experience to observables, with attendant strengths and limitations. Most evidence about pain derives from observations of pain-related behavior. There has been considerable progress in articulating the properties of behavioral indices of pain, especially, but not exclusively, those based on facial expression. An abundant literature shows that a limited subset of facial actions, with homologs in several non-human species, encodes pain intensity across the lifespan. Unfortunately, acquiring such measures remains prohibitively impractical in many settings because it is laborious and requires trained human observers. The advent of the field of affective computing, which applies computer vision and machine learning (CVML) techniques to the recognition of behavior, raised the prospect that advanced technology might overcome some of the constraints limiting behavioral pain assessment in clinical and research settings. Studies have shown that it is indeed possible, through CVML, to develop systems that track facial expressions of pain. There has since been an explosion of research testing models for automated pain assessment. More recently, researchers have explored the feasibility of multimodal measurement of pain-related behaviors. Commercial products that purport to enable automatic, real-time measurement of pain expression have also appeared. Though progress has been made, the field remains in its infancy, and there is a risk of overpromising on what can be delivered. Insufficient adherence to conventional principles for developing valid measures and for drawing appropriate generalizations to identifiable populations could lead to scientifically dubious and clinically risky claims. There is a particular need for databases containing samples from the various settings in which pain may or may not occur, meticulously annotated according to standards that would permit sharing, subject to international privacy standards. Researchers and users also need to be sensitive to the limitations of the technology (e.g., the potential reification of biases that are irrelevant to the assessment of pain) and to its potentially problematic social implications.
Affiliations
- Kenneth M Prkachin: Department of Psychology, University of Northern British Columbia, Prince George, BC, Canada
- Zakia Hammal: The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, United States
3. Ertugrul IO, Cohn JF, Jeni LA, Zhang Z, Yin L, Ji Q. Crossing Domains for AU Coding: Perspectives, Approaches, and Measures. IEEE Trans Biom Behav Identity Sci 2020;2:158-171. PMID: 32377637; PMCID: PMC7202467; DOI: 10.1109/tbiom.2020.2977225.
Abstract
Facial action unit (AU) detectors have performed well when trained and tested within the same domain. How well do AU detectors transfer to domains in which they have not been trained? We review the literature on cross-domain transfer and conduct experiments to address limitations of prior research. We evaluate generalizability in four publicly available databases: EB+ (an expanded version of BP4D+), Sayette GFT, DISFA, and UNBC Shoulder Pain (SP). The databases differ in observational scenarios, context, participant diversity, range of head pose, video resolution, and AU base rates. In most cases, performance decreased with change in domain, often to below the threshold needed for behavioral research, although exceptions were noted. Deep and shallow approaches generally performed similarly, with average results slightly better for the deep model than for the shallow one. Occlusion sensitivity maps revealed that local specificity was greater for AU detection within than across domains. The findings suggest that more varied domains and deep learning approaches may be better suited for generalizability, and they point to the need for more attention to characteristics that vary between domains. Until further improvement is realized, caution is warranted when applying AU classifiers from one domain to another.
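The evaluation protocol can be condensed into a few lines. The sketch below is a minimal stand-in, assuming placeholder features, synthetic data, and a logistic-regression detector rather than the paper's actual models; it trains one detector per domain and reports within- and cross-domain F1.

```python
# Minimal within- vs. cross-domain evaluation sketch for one AU.
# Features, classifier, and data are placeholders; a real protocol
# would also enforce subject-independent splits.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def evaluate(domains):
    """domains: dict name -> (X, y); X is (n, d) features, y is (n,) AU labels."""
    fitted, held_out, results = {}, {}, {}
    for name, (X, y) in domains.items():
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        fitted[name] = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        held_out[name] = (X_te, y_te)
    for train_name, clf in fitted.items():
        for test_name, (X_te, y_te) in held_out.items():
            kind = "within" if train_name == test_name else "cross"
            results[(train_name, test_name, kind)] = f1_score(
                y_te, clf.predict(X_te))
    return results

# Toy stand-ins for databases that differ in feature distribution.
rng = np.random.default_rng(0)
def fake_domain(shift):
    X = rng.normal(shift, 1.0, (400, 16))
    y = (X[:, 0] > shift).astype(int)  # AU tied to one feature
    return X, y

for key, f1 in evaluate({"EB+": fake_domain(0.0),
                         "GFT": fake_domain(1.0)}).items():
    print(key, round(f1, 3))
```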
Affiliations
- Jeffrey F Cohn: Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- László A Jeni: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Zheng Zhang: Department of Computer Science, State University of New York at Binghamton, USA
- Lijun Yin: Department of Computer Science, State University of New York at Binghamton, USA
- Qiang Ji: Rensselaer Polytechnic Institute, Troy, NY, USA
4. Martinez AM. The promises and perils of automated facial action coding in studying children's emotions. Dev Psychol 2019;55:1965-1981. PMID: 31464498; DOI: 10.1037/dev0000728.
Abstract
Computer vision algorithms have made tremendous advances in recent years. We now have algorithms that can detect and recognize objects, faces, and even facial actions in still images and video sequences. This is wonderful news for researchers who need to code facial articulations in large data sets of images and videos, because this task is time-consuming and can be completed only by expert coders, making it very expensive. The availability of computer algorithms that can automatically code facial actions in extremely large data sets also opens the door to studies in psychology and neuroscience that were not previously possible, for example, studying the development of the production of facial expressions from infancy to adulthood within and across cultures. Unfortunately, there is a lack of methodological understanding of how these algorithms should and should not be used, and of how to select the most appropriate algorithm for each study. This article aims to address this gap in the literature. Specifically, we present several methodologies for use in hypothesis-based and exploratory studies, explain how to select the computer algorithms that best fit the requirements of an experimental design, and detail how to evaluate whether the automatic annotations provided by existing algorithms are trustworthy. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
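As a concrete instance of the trustworthiness check described above, the sketch below compares automated occurrence codes with expert human codes for a single AU. The metrics shown are common validity checks rather than the article's prescribed procedure.

```python
# Sketch: are automated AU annotations trustworthy? Compare them
# frame-by-frame against expert human FACS codes. Metric choices
# here are illustrative, not prescriptive.
import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score

def agreement_report(human, machine):
    """human, machine: (n_frames,) binary occurrence codes for one AU."""
    return {
        "f1": f1_score(human, machine),
        "kappa": cohen_kappa_score(human, machine),  # chance-corrected
        "base_rate": float(np.mean(human)),  # low base rates inflate accuracy
    }

human = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 0])
machine = np.array([0, 1, 1, 1, 0, 0, 0, 0, 1, 0])
print(agreement_report(human, machine))
```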
5. Ertugrul IO, Cohn JF, Jeni LA, Zhang Z, Yin L, Ji Q. Cross-domain AU Detection: Domains, Learning Approaches, and Measures. Proc IEEE Int Conf Autom Face Gesture Recognit 2019;2019. PMID: 31749665; PMCID: PMC6867108; DOI: 10.1109/fg.2019.8756543.
Abstract
Facial action unit (AU) detectors have performed well when trained and tested within the same domain. Do AU detectors transfer to new domains in which they have not been trained? To answer this question, we review literature on cross-domain transfer and conduct experiments to address limitations of prior research. We evaluate both deep and shallow approaches to AU detection (CNN and SVM, respectively) in two large, well-annotated, publicly available databases, Expanded BP4D+ and GFT. The databases differ in observational scenarios, participant characteristics, range of head pose, video resolution, and AU base rates. For both approaches and databases, performance decreased with change in domain, often to below the threshold needed for behavioral research. Decreases were not uniform, however. They were more pronounced for GFT than for Expanded BP4D+ and for shallow relative to deep learning. These findings suggest that more varied domains and deep learning approaches may be better suited for promoting generalizability. Until further improvement is realized, caution is warranted when applying AU classifiers from one domain to another.
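As one illustration of what the "shallow" side of this comparison looks like, the sketch below pairs hand-crafted HOG appearance features with a linear SVM. The descriptor settings and toy data are assumptions; the paper describes its actual features, training sets, and protocol.

```python
# Sketch of a shallow AU detector: HOG features + linear SVM.
# Descriptor parameters and data are illustrative assumptions.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(face_crops):
    """face_crops: (n, 64, 64) grayscale, registered face images."""
    return np.array([hog(img, orientations=8, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in face_crops])

rng = np.random.default_rng(0)
X = hog_features(rng.random((100, 64, 64)))
y = rng.integers(0, 2, 100)  # toy AU occurrence labels
clf = LinearSVC(C=1.0).fit(X, y)
print("toy training accuracy:", (clf.predict(X) == y).mean())
```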
Affiliations
- Jeffrey F Cohn: Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- László A Jeni: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Zheng Zhang: Department of Computer Science, State University of New York at Binghamton, USA
- Lijun Yin: Department of Computer Science, State University of New York at Binghamton, USA
- Qiang Ji: Rensselaer Polytechnic Institute, Troy, NY, USA
6. Eleftheriadis S, Rudovic O, Deisenroth MP, Pantic M. Gaussian Process Domain Experts for Modeling of Facial Affect. IEEE Trans Image Process 2017;26:4697-4711. PMID: 28678708; DOI: 10.1109/tip.2017.2721114.
Abstract
Most existing models for facial behavior analysis rely on generic classifiers, which fail to generalize well to previously unseen data. This is because of inherent differences between source (training) and target (test) data, mainly caused by variation in subjects' facial morphology, camera views, and so on. All of these factors amount to different contexts in which target and source data are recorded, and they may adversely affect the performance of models learned solely from source data. In this paper, we exploit the notion of domain adaptation and propose a data-efficient approach to adapting already learned classifiers to new, unseen contexts. Specifically, we build upon the probabilistic framework of Gaussian processes (GPs) and introduce domain-specific GP experts (e.g., for each subject). Model adaptation is facilitated in a probabilistic fashion by conditioning the target expert on the predictions from multiple source experts. We further exploit the predictive variance of each expert to define an optimal weighting during inference. We evaluate the proposed model on three publicly available data sets for multi-class (MultiPIE) and multi-label (DISFA, FERA2015) facial expression analysis by performing adaptation over two contextual factors: "where" (view) and "who" (subject). In our experiments, the proposed approach consistently outperforms (1) both source and target classifiers, while using a small number of target examples during adaptation, and (2) related state-of-the-art approaches for supervised domain adaptation.
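A heavily simplified rendition of the variance-weighted fusion idea is sketched below: independent per-subject GP experts are combined by weighting each prediction by its inverse predictive variance. The paper's model additionally conditions a target expert on the source experts; that step is omitted here, and all data are synthetic.

```python
# Sketch: fuse per-subject GP experts by inverse predictive variance.
# A simplified stand-in for the paper's conditioning scheme.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# One GP expert per source subject (toy 1-D features -> AU intensity).
experts = []
for shift in (0.0, 1.0, 2.0):
    X = rng.normal(shift, 1.0, (30, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
    experts.append(GaussianProcessRegressor(kernel=RBF()).fit(X, y))

def fused_prediction(X_target):
    """Weight each expert, per point, by its inverse predictive variance."""
    mus, sigmas = zip(*(e.predict(X_target, return_std=True)
                        for e in experts))
    mus, sigmas = np.array(mus), np.array(sigmas)
    w = 1.0 / np.maximum(sigmas, 1e-6) ** 2
    w /= w.sum(axis=0)  # normalize weights across experts
    return (w * mus).sum(axis=0)

print(fused_prediction(rng.normal(1.0, 1.0, (5, 1))))
```

The design point this illustrates is that an expert's predictive variance grows far from its own training context, so uncertain experts are automatically down-weighted for a new subject or view.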
7. Girard JM, Chu WS, Jeni LA, Cohn JF, De la Torre F, Sayette MA. Sayette Group Formation Task (GFT) Spontaneous Facial Expression Database. Proc IEEE Int Conf Autom Face Gesture Recognit 2017;2017:581-588. PMID: 29606916; PMCID: PMC5876025; DOI: 10.1109/fg.2017.144.
Abstract
Despite the important role that facial expressions play in interpersonal communication and our knowledge that interpersonal behavior is influenced by social context, no currently available facial expression database includes multiple interacting participants. The Sayette Group Formation Task (GFT) database addresses the need for well-annotated video of multiple participants during unscripted interactions. The database includes 172,800 video frames from 96 participants in 32 three-person groups. To aid in the development of automated facial expression analysis systems, GFT includes expert annotations of FACS occurrence and intensity, facial landmark tracking, and baseline results for linear SVM, deep learning, active patch learning, and personalized classification. Baseline performance is quantified and compared using identical partitioning and a variety of metrics (including means and confidence intervals). The highest performance scores were found for the deep learning and active patch learning methods. Learn more at http://osf.io/7wcyz.
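For reference, a mean with a bootstrap confidence interval, one of the metric conventions mentioned above, can be computed as in the sketch below. The resampling unit and the toy detector are assumptions; frame-level resampling ignores within-subject dependence, which a real analysis should account for.

```python
# Sketch: F1 of an AU detector with a bootstrap confidence interval.
# Frame-level resampling is a simplifying assumption (frames within a
# subject are not independent).
import numpy as np
from sklearn.metrics import f1_score

def f1_with_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample frames with replacement
        stats.append(f1_score(y_true[idx], y_pred[idx], zero_division=0))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return f1_score(y_true, y_pred), (lo, hi)

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 500)
y_pred = np.where(rng.random(500) < 0.8, y_true, 1 - y_true)  # noisy detector
print(f1_with_ci(y_true, y_pred))
```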
Affiliations
- Jeffrey M Girard: Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260
- Wen-Sheng Chu: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- László A Jeni: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Jeffrey F Cohn: Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260; Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Michael A Sayette: Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260
8. Chu WS, De la Torre F, Cohn JF, Messinger DS. A Branch-and-Bound Framework for Unsupervised Common Event Discovery. Int J Comput Vis 2017;123:372-391. PMID: 28943718; DOI: 10.1007/s11263-017-0989-7.
Abstract
Event discovery aims to discover a temporal segment of interest, such as human behavior, actions, or activities. Most approaches to event discovery within or between time series use supervised learning. This becomes problematic when relevant event labels are unknown or difficult to detect, or when not all possible combinations of events have been anticipated. To overcome these problems, this paper explores Common Event Discovery (CED), a new problem that aims to discover common events of variable-length segments in an unsupervised manner. A potential solution to CED is searching over all possible pairs of segments, which would incur a prohibitive quartic cost. In this paper, we propose an efficient branch-and-bound (B&B) framework that avoids exhaustive search while guaranteeing a globally optimal solution. To this end, we derive novel bounding functions for various commonality measures and provide extensions to multiple commonality discovery and accelerated search. The B&B framework takes as input any multidimensional signal that can be quantified into histograms. A generalization of the framework can readily be applied to discover events at the same or different times (synchrony and event commonality, respectively). We also consider extensions to video search and supervised event detection. The effectiveness of the B&B framework is evaluated on motion capture of deliberate behavior and on video of spontaneous facial behavior in diverse interpersonal contexts: interviews, small groups of young adults, and parent-infant face-to-face interaction.
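The bounding functions derived in the paper are specific to its commonality measures; as a toy illustration of the same best-first pruning pattern, the sketch below searches all variable-length segment pairs of two label sequences for the pair with maximal normalized-histogram intersection, using a simple per-bin-maximum upper bound to prune candidate sets.

```python
# Toy branch-and-bound for common event discovery: find the segment
# pair (one per sequence) whose normalized label histograms have the
# largest intersection. The per-bin-maximum bound is a simple
# admissible stand-in for the paper's derived bounding functions.
import heapq
import itertools
import numpy as np

def segment_histograms(seq, lmin, lmax, n_labels):
    segs, hists = [], []
    for i, j in itertools.combinations(range(len(seq) + 1), 2):
        if lmin <= j - i <= lmax:
            segs.append((i, j))
            hists.append(np.bincount(seq[i:j], minlength=n_labels) / (j - i))
    return segs, np.array(hists)

def bnb_common_event(x, y, lmin=3, lmax=8, n_labels=4):
    sx, hx = segment_histograms(x, lmin, lmax, n_labels)
    sy, hy = segment_histograms(y, lmin, lmax, n_labels)

    def bound(ix, iy):
        # For every pair in ix x iy: intersection <= per-bin min of maxima.
        return np.minimum(hx[ix[0]:ix[1]].max(axis=0),
                          hy[iy[0]:iy[1]].max(axis=0)).sum()

    root = ((0, len(sx)), (0, len(sy)))
    heap = [(-bound(*root), *root)]
    while heap:
        neg, ix, iy = heapq.heappop(heap)
        if ix[1] - ix[0] == 1 and iy[1] - iy[0] == 1:
            # Bound is exact for singleton pairs, so the first one wins.
            return -neg, sx[ix[0]], sy[iy[0]]
        # Branch: split the larger candidate set in half.
        if ix[1] - ix[0] >= iy[1] - iy[0]:
            mid = (ix[0] + ix[1]) // 2
            children = [((ix[0], mid), iy), ((mid, ix[1]), iy)]
        else:
            mid = (iy[0] + iy[1]) // 2
            children = [(ix, (iy[0], mid)), (ix, (mid, iy[1]))]
        for a, b in children:
            heapq.heappush(heap, (-bound(a, b), a, b))

x = np.array([0, 0, 1, 2, 2, 3, 1, 1, 0, 2, 2, 3])
y = np.array([3, 2, 2, 1, 0, 0, 1, 2, 2, 3, 0, 1])
print(bnb_common_event(x, y))
```

Best-first order plus an admissible (never underestimating) bound guarantees that the first fully resolved pair popped from the queue is globally optimal, which is the essence of the B&B guarantee the paper relies on.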
Affiliations
- Jeffrey F Cohn: Robotics Institute, Carnegie Mellon University, USA; Department of Psychology, University of Pittsburgh, USA
9. De la Torre F, Cohn JF. Confidence Preserving Machine for Facial Action Unit Detection. IEEE Trans Image Process 2016;25:4753-4767. PMID: 27479964; PMCID: PMC5272912; DOI: 10.1109/tip.2016.2594486.
Abstract
Facial action unit (AU) detection from video has been a long-standing problem in automated facial expression analysis. While progress has been made, accurate detection of facial AUs remains challenging due to ubiquitous sources of error, such as inter-personal variability, pose, and low-intensity AUs. In this paper, we refer to samples causing such errors as hard samples and to the remaining samples as easy samples. To address learning with hard samples, we propose the Confidence Preserving Machine (CPM), a novel two-stage learning framework that combines multiple classifiers following an "easy-to-hard" strategy. During the training stage, CPM learns two confident classifiers. Each classifier focuses on separating easy samples of one class from all else, and thus preserves confidence in predicting each class. During the test stage, the confident classifiers provide "virtual labels" for easy test samples. Given the virtual labels, we propose a quasi-semi-supervised (QSS) learning strategy to learn a person-specific classifier. The QSS strategy employs a spatio-temporal smoothness constraint that encourages similar predictions for samples within a spatio-temporal neighborhood. To further improve detection performance, we introduce two CPM extensions: iterative CPM, which iteratively augments training samples to train the confident classifiers, and kernel CPM, which kernelizes the original CPM model to promote nonlinearity. Experiments on four spontaneous data sets (GFT, BP4D, DISFA, and RU-FACS) illustrate the benefits of the proposed CPM models over baseline methods and over state-of-the-art semi-supervised and transfer learning methods.
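Stripped of the spatio-temporal smoothness term and the iterative and kernel extensions, the easy-to-hard idea can be sketched as below. Where the paper learns two one-vs-all confident classifiers, a single probabilistic classifier with symmetric confidence thresholds stands in here; its confident predictions become virtual labels, on which a person-specific classifier is trained. Thresholds, models, and data are illustrative assumptions.

```python
# Simplified sketch of the CPM "easy-to-hard" idea: confident
# predictions on a new subject become virtual labels, and a
# person-specific classifier is trained on them. The paper's two
# confident classifiers, QSS smoothness term, and iterative/kernel
# variants are omitted.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, (300, 10))
y_train = (X_train[:, 0] + 0.3 * rng.normal(size=300) > 0).astype(int)

# Stage 1: a generic classifier serves as the source of confidence.
generic = LogisticRegression().fit(X_train, y_train)

# New (test) subject, with a subject-specific shift the generic model
# never saw during training.
X_test = rng.normal(0.5, 1, (200, 10))

# Easy samples = confident predictions; these become "virtual labels".
proba = generic.predict_proba(X_test)[:, 1]
easy = (proba > 0.8) | (proba < 0.2)
virtual_labels = (proba[easy] > 0.5).astype(int)

# Stage 2: person-specific classifier trained only on easy samples,
# then applied to the remaining hard ones.
personal = LogisticRegression().fit(X_test[easy], virtual_labels)
hard_predictions = personal.predict(X_test[~easy])
print(f"{easy.sum()} easy samples -> {len(hard_predictions)} hard predictions")
```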