1. Ma W, Xiao Z, Wu Y, Zhang X, Zheng D, Lei X, Han C. Face Blindness in Children and Current Interventions. Behav Sci (Basel) 2023; 13:676. [PMID: 37622816] [PMCID: PMC10451769] [DOI: 10.3390/bs13080676]
Abstract
Children with prosopagnosia, also known as face blindness, struggle to recognize the faces of acquaintances, which can have a negative impact on their social interactions and overall functioning. This paper reviews existing research on interventions for children with prosopagnosia, including compensatory and remedial strategies, and provides a summary and comparison of their effectiveness. Despite the availability of these interventions, however, their effectiveness remains limited and constrained by various factors. The lack of a widely accepted treatment for children with prosopagnosia emphasizes the need for further research to improve intervention strategies. Finally, three future research directions are proposed to improve interventions for prosopagnosia: ecological approaches, the social challenges faced by children, and new potential intervention methods.
Affiliation(s)
- Weina Ma: Jing Hengyi School of Education, Hangzhou Normal University, Hangzhou 311121, China; Zhejiang Philosophy and Social Science Laboratory for Research in Early Development and Childcare, Hangzhou Normal University, Hangzhou 311121, China
- Zeyu Xiao: Jing Hengyi School of Education, Hangzhou Normal University, Hangzhou 311121, China
- Yannan Wu: Jing Hengyi School of Education, Hangzhou Normal University, Hangzhou 311121, China
- Xiaoxian Zhang: Jing Hengyi School of Education, Hangzhou Normal University, Hangzhou 311121, China; Zhejiang Philosophy and Social Science Laboratory for Research in Early Development and Childcare, Hangzhou Normal University, Hangzhou 311121, China
- Dongwen Zheng: Jing Hengyi School of Education, Hangzhou Normal University, Hangzhou 311121, China
- Xue Lei: School of Business Administration, Zhejiang University of Finance and Economics, Hangzhou 310018, China
- Chengyang Han: Jing Hengyi School of Education, Hangzhou Normal University, Hangzhou 311121, China; Zhejiang Philosophy and Social Science Laboratory for Research in Early Development and Childcare, Hangzhou Normal University, Hangzhou 311121, China
2. Eyes versus Eyebrows: A Comprehensive Evaluation Using the Multiscale Analysis and Curvature-Based Combination Methods in Partial Face Recognition. Algorithms 2022. [DOI: 10.3390/a15060208]
Abstract
This work aimed to determine which of two facial regions, the eyes or the eyebrows, is more discriminative as a periocular biometric feature in a partial face recognition system. We propose multiscale analysis methods combined with curvature-based methods; the goal of this combination is to capture the details of these features at finer scales and to characterize them in depth using curvature. Eye and eyebrow images cropped from four 2D face image datasets were evaluated. Recognition performance was measured with nearest neighbor and support vector machine classifiers. Our proposed method successfully produced richer details at finer scales, yielding high recognition performance. The highest accuracies for eye and eyebrow images, respectively, were 76.04% and 98.61% on the limited dataset and 96.88% and 93.22% on the larger dataset. Moreover, comparing our results with other works, we achieved similarly high accuracy using only eye and eyebrow images.
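A minimal sketch of the general approach described above, not the authors' implementation (the scales and the use of the mean curvature of the intensity surface are assumptions): compute curvature maps of an eye or eyebrow crop at several Gaussian scales, concatenate them into a descriptor, and classify with a support vector machine.

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from sklearn.svm import SVC

    def curvature_map(img):
        # Mean curvature of the image intensity surface, from first and second derivatives.
        iy, ix = np.gradient(img.astype(np.float64))
        iyy, _ = np.gradient(iy)
        ixy, ixx = np.gradient(ix)
        num = (1 + ix**2) * iyy - 2 * ix * iy * ixy + (1 + iy**2) * ixx
        return num / (2 * (1 + ix**2 + iy**2) ** 1.5)

    def multiscale_curvature_descriptor(crop, sigmas=(1, 2, 4)):
        # Curvature computed at several Gaussian scales, concatenated into one vector.
        return np.concatenate([curvature_map(gaussian_filter(crop, s)).ravel()
                               for s in sigmas])

    # Usage sketch: X = np.stack([multiscale_curvature_descriptor(c) for c in crops])
    # SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)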
3. Aguileta AA, Brena RF, Molino-Minero-Re E, Galván-Tejada CE. Facial Expression Recognition from Multi-Perspective Visual Inputs and Soft Voting. Sensors 2022; 22:4206. [PMID: 35684825] [PMCID: PMC9185323] [DOI: 10.3390/s22114206]
Abstract
Automatic identification of human facial expressions has many potential applications in today's connected world, from mental health monitoring to feedback for on-screen content or shop windows, and sign-language prosodic identification. In this work we use visual information as input, namely a dataset of face points delivered by a Kinect device. Most recent work on facial expression recognition uses machine learning techniques, favoring a modular, data-driven path of development over human-invented ad hoc rules. In this paper, we present a machine learning based method for automatic facial expression recognition that leverages the information fusion architecture techniques from our previous work together with soft voting. Our approach shows an average prediction performance clearly above the best state-of-the-art results for the dataset considered. These results provide further evidence of the usefulness of information fusion architectures over the default ML approach of feature aggregation.
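Soft voting is a standard ensemble rule: each base classifier outputs class probabilities, and the averaged probabilities decide the label. A minimal scikit-learn sketch (the base models and the flattened Kinect face-point features are placeholder assumptions, not the authors' exact pipeline):

    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    # X: rows of flattened Kinect face-point coordinates; y: expression labels.
    ensemble = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=200)),
                    ("svm", SVC(probability=True))],  # probabilities needed for soft voting
        voting="soft")
    # ensemble.fit(X_train, y_train)
    # ensemble.predict(X_test)  # argmax of the averaged class probabilities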
Affiliation(s)
- Antonio A. Aguileta: Facultad de Matemáticas, Universidad Autónoma de Yucatán, Mérida 97110, Mexico
- Ramón F. Brena (corresponding author): School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey 64849, Mexico; Departamento de Computación y Diseño, Instituto Tecnológico de Sonora, Ciudad Obregón 85000, Mexico
- Erik Molino-Minero-Re: Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas—Unidad Yucatán, Universidad Nacional Autónoma de México, Sierra Papacal, Yucatán 97302, Mexico
- Carlos E. Galván-Tejada: Unidad Académica de Ingeniería Eléctrica y Comunicaciones, Universidad Autónoma de Zacatecas, Zacatecas 98000, Mexico
4. Lamprinou N, Nikolikos N, Psarakis EZ. Groupwise Image Alignment via Self Quotient Images. Sensors (Basel) 2020; 20:2325. [PMID: 32325922] [PMCID: PMC7219661] [DOI: 10.3390/s20082325]
Abstract
Compared with pairwise registration, groupwise registration can handle a large population of images simultaneously and in an unbiased way. In this work we improve upon state-of-the-art pixel-level, Least-Squares (LS)-based groupwise image registration methods. Specifically, the registration technique is adapted through the use of Self Quotient Images (SQI) so that it can solve the groupwise registration of photometrically distorted, partially occluded, and unimodal as well as multimodal images. Moreover, the proposed groupwise technique is linear in the cardinality of the image set, so it can solve the problem on large image sets with low complexity. Applying the proposed technique to a series of experiments on the groupwise registration of photometrically and geometrically distorted, partially occluded faces, as well as unimodal and multimodal magnetic resonance image sets, and comparing it against the Lucas-Kanade Entropy (LKE) algorithm, the obtained results look very promising in terms of alignment quality, measured by the mean Peak Signal-to-Noise Ratio (mPSNR) and mean Structural Similarity (mSSIM), and in terms of computational cost.
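The Self Quotient Image is conventionally the pixel-wise ratio of an image to a smoothed version of itself; since illumination varies slowly, it appears in both numerator and denominator and is largely divided out. A minimal sketch (an isotropic Gaussian smoother is an assumption; SQI variants use anisotropic or weighted filters):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def self_quotient_image(img, sigma=3.0, eps=1e-6):
        # Q = I / smooth(I); eps guards against division by zero in dark regions.
        img = img.astype(np.float64)
        return img / (gaussian_filter(img, sigma) + eps)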
Affiliation(s)
- N. Lamprinou, N. Nikolikos, Emmanouil Z. Psarakis: Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece
5. Ismail HA, Hashim IA, Abd BH. A Survey on Linguistic Interpretation of Facial Expressions and Technologies. 2019 2nd International Conference on Engineering Technology and its Applications (IICETA) 2019. [DOI: 10.1109/iiceta47481.2019.9012983]
6. Enhanced feature fusion through irrelevant redundancy elimination in intra-class and extra-class discriminative correlation analysis. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.01.029]
7.
8. Zhao R, Wang Y, Martinez AM. A Simple, Fast and Highly-Accurate Algorithm to Recover 3D Shape from 2D Landmarks on a Single Image. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018; 40:3059-3066. [PMID: 29990100] [PMCID: PMC6262843] [DOI: 10.1109/tpami.2017.2772922]
Abstract
Three-dimensional shape reconstruction from 2D landmark points on a single image is a hallmark of human vision, but it has proven difficult for computer vision algorithms. We define a feed-forward deep neural network algorithm that can reconstruct 3D shapes from 2D landmark points almost perfectly (i.e., with extremely small reconstruction errors), even when these 2D landmarks come from a single image. Our experimental results show an improvement of up to two-fold over state-of-the-art computer vision algorithms; the 3D shape reconstruction error (measured as the Procrustes distance between the reconstructed shape and the ground truth) of human faces is , cars is .0022, human bodies is .022, and highly deformable flags is .0004. Our algorithm was also a top performer at the 2016 3D Face Alignment in the Wild Challenge (held in conjunction with the European Conference on Computer Vision, ECCV), which required the reconstruction of 3D face shape from a single image. The derived algorithm can be trained in a couple of hours, and testing runs at more than 1,000 frames/s on an i7 desktop. We also present an innovative data augmentation approach that allows us to train the system efficiently with a small number of samples. The system is also robust to noise (e.g., imprecise landmark points) and missing data (e.g., occluded or undetected landmark points).
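A minimal sketch of a feed-forward regressor of this kind (the layer sizes, landmark count, and loss are assumptions, not the published architecture): the network maps the 2N coordinates of N detected 2D landmarks to the 3N coordinates of the corresponding 3D shape.

    import torch
    import torch.nn as nn

    N = 68  # assumed number of landmarks
    model = nn.Sequential(
        nn.Linear(2 * N, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, 3 * N))  # output: flattened 3D shape
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    def train_step(landmarks_2d, shapes_3d):
        # landmarks_2d: (B, 2N); shapes_3d: (B, 3N), e.g. Procrustes-aligned targets.
        opt.zero_grad()
        loss = loss_fn(model(landmarks_2d), shapes_3d)
        loss.backward()
        opt.step()
        return loss.item()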
9.
10. Wen Z, Hou Z, Jiao L. Discriminative Dictionary Learning With Two-Level Low Rank and Group Sparse Decomposition for Image Classification. IEEE Transactions on Cybernetics 2017; 47:3758-3771. [PMID: 27390198] [DOI: 10.1109/tcyb.2016.2581861]
Abstract
The discriminative dictionary learning (DDL) framework has been widely used in image classification; it aims to learn class-specific feature vectors as well as a representative dictionary from a set of labeled training samples. However, interclass similarities and intraclass variances among input samples and learned features generally weaken the representability of the dictionary and the discrimination of the feature vectors, degrading classification performance. How to represent them explicitly therefore becomes an important issue. In this paper, we present a novel DDL framework with a two-level low-rank and group-sparse decomposition model. In the first level, we learn a class-shared dictionary and several class-specific dictionaries, where a low-rank and a group-sparse regularization are, respectively, imposed on the corresponding feature matrices. In the second level, each class-specific feature matrix is further decomposed into a low-rank and a sparse matrix so that intraclass variances can be separated out, concentrating the corresponding feature vectors. Extensive experimental results demonstrate the effectiveness of our model. Compared with other state-of-the-art methods on several popular image databases, our model achieves competitive or better classification accuracy.
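In LaTeX, a plausible form of the two-level model sketched above (the paper's exact regularizers and weights may differ): the first level penalizes the class-shared feature matrix X_0 with a nuclear norm (low rank) and each class-specific X_c with a group-sparse norm; the second level splits each X_c into low-rank plus sparse parts to separate intraclass variances.

    \min_{D,\,X}\; \|Y - D X\|_F^2
      + \lambda_1 \|X_0\|_*                      % class-shared: low rank
      + \lambda_2 \sum_{c=1}^{C} \|X_c\|_{2,1}   % class-specific: group sparse
    \quad \text{s.t.}\quad X_c = L_c + S_c,\;\;
      \|L_c\|_* \le \tau_1,\;\; \|S_c\|_1 \le \tau_2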
11. Wen Z, Hou B, Jiao L. Discriminative Nonlinear Analysis Operator Learning: When Cosparse Model Meets Image Classification. IEEE Transactions on Image Processing 2017; 26:3449-3462. [PMID: 28475057] [DOI: 10.1109/tip.2017.2700761]
Abstract
Linear synthesis model-based dictionary learning frameworks have achieved remarkable performance in image classification over the last decade. As a generative feature model, however, this approach suffers from some intrinsic deficiencies. In this paper, we propose a novel parametric nonlinear analysis cosparse model (NACM) with which a unique feature vector can be extracted much more efficiently. Additionally, we show that NACM is capable of simultaneously learning a task-adapted feature transformation and a regularization that encode our preferences, domain prior knowledge, and task-oriented supervised information into the features. The proposed NACM is applied to the classification task as a discriminative feature model and yields a novel discriminative nonlinear analysis operator learning framework (DNAOL). Theoretical analysis and experimental performance clearly demonstrate that DNAOL not only achieves better or at least competitive classification accuracies relative to state-of-the-art algorithms, but also dramatically reduces time complexity in both the training and testing phases.
12.
13. Du S, Martinez AM. Compound facial expressions of emotion: from basic research to clinical applications. Dialogues in Clinical Neuroscience 2016. [PMID: 26869845] [PMCID: PMC4734882] [DOI: 10.31887/dcns.2015.17.4/sdu]
Abstract
Emotions are sometimes revealed through facial expressions. When these natural facial articulations involve the contraction of the same muscle groups in people of distinct cultural upbringings, this is taken as evidence of a biological origin of these emotions. While past research identified facial expressions associated with a single internally felt category (e.g., the facial expression of happiness when we feel joyful), we have recently studied facial expressions observed when people experience compound emotions (e.g., the facial expression of happy surprise when we feel joyful in a surprised way, as, for example, at a surprise birthday party). Our research has identified 17 compound expressions consistently produced across cultures, suggesting that the number of facial expressions of emotion of biological origin is much larger than previously believed. The present paper provides an overview of these findings and shows evidence supporting the view that spontaneous expressions are produced using the same facial articulations previously identified in laboratory experiments. We also discuss the implications of our results for the study of psychopathologies, and consider several open research questions.
Affiliation(s)
- Shichuan Du: LENA Research Foundation, Boulder, Colorado, USA
14.
15. Wimmer L, Bellingrath S, von Stockhausen L. Cognitive Effects of Mindfulness Training: Results of a Pilot Study Based on a Theory Driven Approach. Front Psychol 2016; 7:1037. [PMID: 27462287] [PMCID: PMC4940413] [DOI: 10.3389/fpsyg.2016.01037]
Abstract
The present paper reports a pilot study that tested the cognitive effects of mindfulness practice in a theory-driven approach. Thirty-four fifth graders received either mindfulness training based on the mindfulness-based stress reduction approach (experimental group), concentration training (active control group), or no treatment (passive control group). Based on the operational definition of mindfulness by Bishop et al. (2004), effects on sustained attention, cognitive flexibility, cognitive inhibition, and data-driven as opposed to schema-based information processing were predicted. These abilities were assessed in a pre-post design by means of a vigilance test, a reversible-figures test, the Wisconsin Card Sorting Test, a Stroop test, a visual search task, and a recognition task of prototypical faces. Results suggest that the mindfulness training specifically improved cognitive inhibition and data-driven information processing.
Affiliation(s)
- Lena Wimmer: Department of Psychology, University of Duisburg-Essen, Essen, Germany
16.
17. Fan X, Wang H, Luo Z, Li Y, Hu W, Luo D. Fiducial facial point extraction using a novel projective invariant. IEEE Transactions on Image Processing 2015; 24:1164-1177. [PMID: 25594969] [DOI: 10.1109/tip.2015.2390976]
Abstract
Automatic extraction of fiducial facial points is one of the key steps in face tracking, recognition, and animation. Great facial variations, especially pose or viewpoint changes, typically degrade the performance of classical methods. Recent learning- or regression-based approaches rely heavily on the availability of a training set that covers facial variations as widely as possible. In this paper, we introduce and extend a novel projective invariant, named the characteristic number (CN), which unifies the collinearity, cross ratio, and geometrical characteristics given by more (6) points. We derive strong shape priors from CN statistics on a moderate-sized set (515) of frontal upright faces in order to characterize the intrinsic geometries shared by human faces. We combine these shape priors with simple appearance-based constraints, e.g., texture, edges, and corners, in a quadratic optimization. Thereafter, the solution to facial point extraction can be found by standard gradient descent. The inclusion of these shape priors confers robustness to pose changes owing to their invariance to projective transformations. Extensive experiments on the Labeled Faces in the Wild, Labeled Face Parts in the Wild, and Helen databases, and on cross-set faces with various changes, demonstrate the effectiveness of the CN-based shape priors compared with the state of the art.
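The classical cross ratio, which the characteristic number generalizes to more points, is the prototypical projective invariant: for four collinear points it is unchanged by any projective transformation. A small numeric check (the six-point CN itself follows the paper's definition, which is not reproduced here):

    import numpy as np

    def cross_ratio(a, b, c, d):
        # (AC * BD) / (BC * AD) for four collinear 2D points.
        dist = lambda p, q: np.linalg.norm(np.asarray(p, float) - np.asarray(q, float))
        return (dist(a, c) * dist(b, d)) / (dist(b, c) * dist(a, d))

    xs = [0.0, 1.0, 2.0, 4.0]                # four points on a line
    proj = lambda x: (2 * x + 1) / (x + 3)   # a 1D projective transformation
    before = cross_ratio(*[(x, 0.0) for x in xs])
    after = cross_ratio(*[(proj(x), 0.0) for x in xs])
    print(before, after)  # both 1.5, up to floating-point error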
18. Sun Y, Liu Q, Tang J, Tao D. Learning discriminative dictionary for group sparse representation. IEEE Transactions on Image Processing 2014; 23:3816-3828. [PMID: 24956370] [DOI: 10.1109/tip.2014.2331760]
Abstract
In recent years, sparse representation has been widely used in object recognition applications, and how to learn the dictionary is a key issue. A popular method is to use the l1 norm as the sparsity measure of the representation coefficients for dictionary learning. However, the l1 norm treats each atom in the dictionary independently, so the learned dictionary cannot capture the multi-subspace structural information of the data well. In addition, the learned subdictionary for each class usually shares some common atoms, which weakens the discriminative ability of each subdictionary's reconstruction error. This paper presents a new dictionary learning model to improve sparse representation for image classification, which aims to learn a class-specific subdictionary for each class and a common subdictionary shared by all classes. The model is composed of a discriminative fidelity, a weighted group sparse constraint, and a subdictionary incoherence term. The discriminative fidelity encourages each class-specific subdictionary to sparsely represent the samples in the corresponding class. The weighted group sparse constraint aims at capturing the structural information of the data. The subdictionary incoherence term makes all subdictionaries as independent as possible. Because the common subdictionary represents features shared by all classes, only the reconstruction error of each class-specific subdictionary is used for classification. Extensive experiments conducted on several public image databases demonstrate the power of the proposed method compared with the state of the art.
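A plausible form of the three-term objective described above (notation assumed; the paper's exact formulation may differ), with a common subdictionary D_0, class-specific subdictionaries D_c, a discriminative fidelity, a weighted group-sparse constraint, and a subdictionary incoherence term:

    \min_{\{D_c\},\,\{X_c\}} \sum_{c=1}^{C} \|Y_c - D_0 X_c^{0} - D_c X_c^{c}\|_F^2
      + \lambda \sum_{c}\sum_{g} w_g \|X_{c,g}\|_2
      + \eta \sum_{i \neq j} \|D_i^{\top} D_j\|_F^2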
19. Song J, Jia L, Wang W, Ying H. Robust nose tip localization based on two-stage subclass discriminant analysis. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2013.02.055]
20.
Abstract
Understanding the different categories of facial expressions of emotion that humans regularly use is essential to gain insights into human cognition and affect, as well as for the design of computational models and perceptual interfaces. Past research on facial expressions of emotion focused on the study of six basic categories: happiness, surprise, anger, sadness, fear, and disgust. However, many more facial expressions of emotion exist and are used regularly by humans. This paper describes an important group of expressions, which we call compound emotion categories. Compound emotions are those that can be constructed by combining basic component categories to create new ones. For instance, happily surprised and angrily surprised are two distinct compound emotion categories. The present work defines 21 distinct emotion categories. Sample images of their facial expressions were collected from 230 human subjects. A Facial Action Coding System analysis shows that the production of these 21 categories is different but consistent with the subordinate categories they represent (e.g., a happily surprised expression combines muscle movements observed in happiness and surprise). We show that these differences are sufficient to distinguish between the 21 defined categories. We then use a computational model of face perception to demonstrate that most of these categories are also visually discriminable from one another.
21. Discriminant features and temporal structure of nonmanuals in American Sign Language. PLoS One 2014; 9:e86268. [PMID: 24516528] [PMCID: PMC3916328] [DOI: 10.1371/journal.pone.0086268]
Abstract
To fully define the grammar of American Sign Language (ASL), a linguistic model of its nonmanuals needs to be constructed. While significant progress has been made in understanding the features defining ASL manuals, after years of research much still needs to be done to uncover the discriminant nonmanual components. The major barrier to achieving this goal is the difficulty of correlating facial features and linguistic features, especially since these correlations may be temporally defined. For example, a facial feature (e.g., head moves down) occurring at the end of the movement of another facial feature (e.g., brows move up) may specify a Hypothetical conditional, but only if this time relationship is maintained. In other instances, the single occurrence of a movement (e.g., brows move up) can be indicative of the same grammatical construction. In the present paper, we introduce a linguistic-computational approach to carry out this analysis efficiently. First, a linguistic model of the face is used to manually annotate a very large set of 2,347 videos of ASL nonmanuals (including tens of thousands of frames). Second, a computational approach is used to determine which features of the linguistic model are more informative of the grammatical rules under study. We used the proposed approach to study five types of sentences (Hypothetical conditionals, Yes/no questions, Wh-questions, Wh-questions postposed, and Assertions) plus their polarities (positive and negative). Our results verify several components of the standard model of ASL nonmanuals and, most importantly, identify several previously unreported features and their temporal relationships. Notably, our results uncovered a complex interaction between head position and mouth shape. These findings define some temporal structures of ASL nonmanuals not previously detected by other approaches.
22. Benitez-Quiroz CF, Rivera S, Gotardo PF, Martinez AM. Salient and Non-Salient Fiducial Detection using a Probabilistic Graphical Model. Pattern Recognition 2014; 47. [PMID: 24187386] [PMCID: PMC3810992] [DOI: 10.1016/j.patcog.2013.06.013]
Abstract
Deformable shape detection is an important problem in computer vision and pattern recognition. However, standard detectors are typically limited to locating only a few salient landmarks, such as landmarks near edges or areas of high contrast, often conveying insufficient shape information. This paper presents a novel statistical pattern recognition approach to locate a dense set of salient and non-salient landmarks in images of a deformable object. We exploit the fact that several object classes exhibit a homogeneous structure, such that each landmark position provides some information about the positions of the other landmarks. In our model, the relationship between all pairs of landmarks is naturally encoded as a probabilistic graph. Dense landmark detections are then obtained with a new sampling algorithm that, given a set of candidate detections, selects the most likely positions so as to maximize the probability of the graph. Our experimental results demonstrate accurate, dense landmark detections within and across different databases.
23. Martinez B, Valstar MF, Binefa X, Pantic M. Local evidence aggregation for regression-based facial point detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2013; 35:1149-1163. [PMID: 23520256] [DOI: 10.1109/tpami.2012.205]
Abstract
We propose a new algorithm to detect facial points in frontal and near-frontal face images. It combines a regression-based approach with a probabilistic graphical model-based face shape model that restricts the search to anthropomorphically consistent regions. While most regression-based approaches perform a sequential approximation of the target location, our algorithm detects the target location by aggregating the estimates obtained from stochastically selected local appearance information into a single robust prediction. The underlying assumption is that by aggregating the different estimates, their errors will cancel out as long as the regressor inputs are uncorrelated. Once this new perspective is adopted, the problem is reformulated as how to optimally select the test locations over which the regressors are evaluated. We propose to extend the regression-based model to provide a quality measure of each prediction, and use the shape model to restrict and correct the sampling region. Our approach combines the low computational cost typical of regression-based approaches with the robustness of exhaustive-search approaches. The proposed algorithm was tested on over 7,500 images from five databases. Results showed significant improvement over the current state of the art.
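The underlying assumption, that uncorrelated errors cancel under aggregation, is easy to check numerically. A toy sketch (not the paper's regressors; it only illustrates the aggregation principle, here with a median):

    import numpy as np

    rng = np.random.default_rng(0)
    target = np.array([120.0, 85.0])             # true facial point location
    # 64 estimates from hypothetical local regressors with uncorrelated errors.
    estimates = target + rng.normal(0.0, 6.0, size=(64, 2))
    single = estimates[0]                        # a sequential method's one estimate
    aggregated = np.median(estimates, axis=0)    # robust aggregate of all estimates
    print(np.linalg.norm(single - target))       # error of several pixels
    print(np.linalg.norm(aggregated - target))   # much closer to the target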
Affiliation(s)
- Brais Martinez: Department of Computing, Imperial College London, London, United Kingdom
24. Martinez A, Du S. A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives. Journal of Machine Learning Research 2012; 13:1589-1608. [PMID: 23950695] [PMCID: PMC3742375]
Abstract
In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify facial expressions of emotion: the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of C classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprise face are perceived as either happy or surprise but not something in between. While the continuous model has a more difficult time justifying this latter finding, the categorical model is not as good at explaining how expressions are recognized at different intensities or modes. Most importantly, both models have problems explaining how one can recognize combinations of emotion categories such as happily surprised versus angrily surprised versus surprise. To resolve these issues, over the past several years we have worked on a revised model that justifies the results reported in the cognitive science and neuroscience literature. This model consists of C distinct continuous spaces. Multiple (compound) emotion categories can be recognized by linearly combining these C face spaces. The dimensions of these spaces are shown to be mostly configural. According to this model, the major task for the classification of facial expressions of emotion is precise, detailed detection of facial landmarks rather than recognition. We provide an overview of the literature justifying the model, show how the resulting model can be employed to build algorithms for the recognition of facial expressions of emotion, and propose research directions for machine learning and computer vision researchers to keep pushing the state of the art in these areas. We also discuss how the model can aid in studies of human perception, social interactions, and disorders.
25.
Abstract
We propose an approach to the detection of highly deformable shapes in images via manifold learning with regression. Our method does not require that shape key points be defined at high-contrast image regions, nor do we need an initial estimate of the shape. We only require sufficient representative training data and a rough initial estimate of the object position and scale. We demonstrate the method for face shape learning and provide a comparison to the nonlinear Active Appearance Model. Our method is extremely accurate, to nearly pixel precision, and is capable of accurately detecting the shape of faces undergoing extreme expression changes. The technique is robust to occlusions such as glasses and gives reasonable results for extremely degraded image resolutions.
26. Mishra AK, Aloimonos Y, Cheong LF, Kassim AA. Active visual segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2012; 34:639-653. [PMID: 22383341] [DOI: 10.1109/tpami.2011.171]
Abstract
Attention is an integral part of the human visual system and has been widely studied in the visual attention literature. The human eyes fixate at important locations in the scene, and every fixation point lies inside a particular region of arbitrary shape and size, which can either be an entire object or a part of it. Using that fixation point as an identification marker on the object, we propose a method to segment the object of interest by finding the "optimal" closed contour around the fixation point in the polar space, avoiding the perennial problem of scale in the Cartesian space. The proposed segmentation process is carried out in two separate steps: First, all visual cues are combined to generate the probabilistic boundary edge map of the scene; second, in this edge map, the "optimal" closed contour around a given fixation point is found. Having two separate steps also makes it possible to establish a simple feedback between the mid-level cue (regions) and the low-level visual cues (edges). In fact, we propose a segmentation refinement process based on such a feedback process. Finally, our experiments show the promise of the proposed method as an automatic segmentation framework for a general purpose visual system.
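A simplified sketch of the second step (the paper's energy and optimization are richer, and this version does not enforce that the contour closes): warp a probabilistic boundary map into polar coordinates around the fixation point, then pick one radius per angle by dynamic programming with a smoothness limit on radius jumps.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def polar_contour(boundary_map, fixation, n_theta=180, n_r=100, jump=2):
        fy, fx = fixation
        thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
        radii = np.arange(1, n_r + 1, dtype=float)
        ys = fy + radii[None, :] * np.sin(thetas[:, None])
        xs = fx + radii[None, :] * np.cos(thetas[:, None])
        polar = map_coordinates(boundary_map, np.vstack([ys.ravel(), xs.ravel()]),
                                order=1, mode='nearest').reshape(n_theta, n_r)
        cost = 1.0 - polar                    # low cost where boundary probability is high
        acc = cost.copy()
        back = np.zeros((n_theta, n_r), dtype=int)
        for t in range(1, n_theta):           # dynamic programming over angles
            for r in range(n_r):
                lo, hi = max(0, r - jump), min(n_r, r + jump + 1)
                k = lo + int(np.argmin(acc[t - 1, lo:hi]))
                back[t, r] = k
                acc[t, r] = cost[t, r] + acc[t - 1, k]
        r = int(np.argmin(acc[-1]))
        rs = [r]
        for t in range(n_theta - 1, 0, -1):   # backtrack the optimal path
            r = back[t, r]
            rs.append(r)
        return thetas, radii[np.array(rs[::-1])]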
Affiliation(s)
- Ajay K. Mishra: Department of Computer Science, University of Maryland, College Park, MD 20742, USA
27. Serrano Á, Martín de Diego I, Conde C, Cabello E. Analysis of variance of Gabor filter banks parameters for optimal face recognition. Pattern Recognition Letters 2011. [DOI: 10.1016/j.patrec.2011.09.013]
28. Ong EJ, Bowden R. Robust Facial Feature Tracking Using Shape-Constrained Multiresolution-Selected Linear Predictors. IEEE Transactions on Pattern Analysis and Machine Intelligence 2011; 33:1844-1859. [PMID: 21135441] [DOI: 10.1109/tpami.2010.205]
Abstract
This paper proposes a learned, data-driven approach for accurate, real-time tracking of facial features using only intensity information. The task of automatic facial feature tracking is nontrivial since the face is a highly deformable object with large textural variations and motion in certain regions. Existing works attempt to address these problems by either limiting themselves to tracking feature points with strong and unique visual cues (e.g., mouth and eye corners) or by incorporating a priori information that needs to be manually designed (e.g., selecting points for a shape model). The framework proposed here largely avoids the need for such restrictions by automatically identifying the optimal visual support required for tracking a single facial feature point. This automatic identification of the visual context required for tracking allows the proposed method to potentially track any point on the face. Tracking is achieved via linear predictors, which provide a fast and effective method for mapping pixel intensities into tracked feature position displacements. Building upon the simplicity and strengths of linear predictors, a more robust biased linear predictor is introduced. Multiple linear predictors are then grouped into a rigid flock to further increase robustness. To improve tracking accuracy, a novel probabilistic selection method is used to identify relevant visual areas for tracking a feature point. These selected flocks are then combined into a hierarchical multiresolution LP model. Finally, we also exploit a simple shape constraint for correcting the occasional tracking failure of a minority of feature points. Experimental results show that this method performs more robustly and accurately than AAMs, with minimal training examples, on example sequences that range from SD quality to YouTube quality. Additionally, an analysis of the visual support consistency across different subjects is provided.
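At its core, a linear predictor here is a matrix, learned by least squares, that maps a vector of sampled pixel intensities to a feature-point displacement. A minimal ridge-regression sketch (the support sampling, biasing, flocking, multiresolution selection, and shape constraint from the paper are omitted):

    import numpy as np

    def train_linear_predictor(supports, displacements, reg=1e-3):
        # supports: (M, D) intensity vectors sampled around perturbed point positions
        # displacements: (M, 2) known offsets back to the true feature point
        X = np.hstack([supports, np.ones((supports.shape[0], 1))])  # append bias
        A = X.T @ X + reg * np.eye(X.shape[1])
        return np.linalg.solve(A, X.T @ displacements)              # (D+1, 2) matrix

    def predict_displacement(P, support):
        # Maps one intensity vector to an estimated (dx, dy) update of the point.
        return np.append(support, 1.0) @ P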
29. Martinez AM. Deciphering the Face. Proceedings, IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2011; 2011:7-12. [PMID: 25264420] [DOI: 10.1109/cvprw.2011.5981690]
Abstract
We argue that robust computer vision algorithms for face analysis and recognition should be based on configural and shape features. In this model, the most important task to be solved by computer vision researchers is the accurate detection of facial features, rather than recognition. We base our arguments on recent results in cognitive science and neuroscience. In particular, we show that different facial expressions of emotion have diverse uses in human behavior/cognition and that a facial expression may be associated with multiple emotional categories. These two results contradict the continuous models in cognitive science, the limbic assumption in neuroscience, and the multidimensional approaches typically employed in computer vision. Thus, we propose an alternative hybrid continuous-categorical approach to the perception of facial expressions and show that configural and shape features are most important for the recognition of emotional constructs by humans. We illustrate how these image cues can be successfully exploited by computer vision algorithms. Throughout the paper, we discuss the implications of these results for applications in face recognition and human-computer interaction.
30. Neth D, Martinez AM. A computational shape-based model of anger and sadness justifies a configural representation of faces. Vision Res 2010; 50:1693-1711. [PMID: 20510267] [DOI: 10.1016/j.visres.2010.05.024]
Abstract
Research suggests that configural cues (second-order relations) play a major role in the representation and classification of face images, making faces a "special" class of objects, since object recognition seems to use different encoding mechanisms. It is less clear, however, how this representation emerges and whether it is also used in the recognition of facial expressions of emotion. In this paper, we show how configural cues emerge naturally from a classical analysis of shape in the recognition of anger and sadness. In particular, our results suggest that at least two of the dimensions of the computational (cognitive) space of facial expressions of emotion correspond to pure configural changes. The first of these dimensions measures the distance between the eyebrows and the mouth, while the second is concerned with the height-width ratio of the face. Under the proposed model, becoming a face "expert" means moving from the generic shape representation to one based on configural cues. These results suggest that the recognition of facial expressions of emotion shares this expertise property with other face-processing tasks.
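Both configural dimensions are directly computable from facial landmarks. A toy sketch (the landmark naming scheme is an illustrative assumption, not the paper's measurement protocol):

    import numpy as np

    def configural_features(lm):
        # lm: dict mapping landmark names to (x, y) coordinates.
        brows = (np.asarray(lm["left_brow"]) + np.asarray(lm["right_brow"])) / 2.0
        brow_mouth = np.linalg.norm(brows - np.asarray(lm["mouth_center"]))
        width = abs(lm["face_right"][0] - lm["face_left"][0])
        height = abs(lm["chin"][1] - lm["forehead"][1])
        return brow_mouth, height / width   # the two configural dimensions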
Affiliation(s)
- Donald Neth: The Ohio State University, Columbus, OH 43215, United States