1
Lian H, Lu C, Li S, Zhao Y, Tang C, Zong Y. A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face. Entropy (Basel) 2023; 25:1440. [PMID: 37895561; PMCID: PMC10606253; DOI: 10.3390/e25101440]
Abstract
Multimodal emotion recognition (MER) refers to the identification and understanding of human emotional states by combining different signals, including, but not limited to, text, speech, and facial cues. MER plays a crucial role in the human-computer interaction (HCI) domain. With the recent progress of deep learning technologies and the increasing availability of multimodal datasets, the MER domain has witnessed considerable development, resulting in numerous significant research breakthroughs. However, thorough and focused reviews of these deep learning-based MER achievements remain scarce. This survey aims to bridge this gap by providing a comprehensive overview of recent advances in deep learning-based MER. For an orderly exposition, the paper first presents a meticulous analysis of current multimodal datasets, emphasizing their advantages and constraints. Subsequently, we thoroughly scrutinize diverse methods for multimodal emotional feature extraction, highlighting the merits and demerits of each. Moreover, we perform an exhaustive analysis of various MER algorithms, with particular focus on model-agnostic fusion methods (including early fusion, late fusion, and hybrid fusion) and fusion based on intermediate layers of deep models (encompassing simple concatenation fusion, utterance-level interaction fusion, and fine-grained interaction fusion). We assess the strengths and weaknesses of these fusion strategies to guide researchers in selecting the most suitable techniques for their studies. In summary, this survey aims to provide a thorough and insightful review of deep learning-based MER and is intended as a valuable guide for researchers furthering the evolution of this dynamic and impactful field.
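The model-agnostic fusion strategies this survey contrasts (early vs. late fusion) can be sketched in a few lines. This is a minimal NumPy illustration, not the survey's method: the feature dimensions, random linear scorers, and simple decision averaging are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-modality embeddings for one utterance (dimensions are arbitrary).
speech = rng.normal(size=16)   # acoustic embedding
text   = rng.normal(size=8)    # sentence embedding
face   = rng.normal(size=12)   # visual embedding
n_classes = 4

def scores(x, w):
    """Linear class scores followed by a softmax."""
    z = w @ x
    e = np.exp(z - z.max())
    return e / e.sum()

# Early fusion: concatenate raw features, then one joint classifier.
w_joint = rng.normal(size=(n_classes, 16 + 8 + 12))
p_early = scores(np.concatenate([speech, text, face]), w_joint)

# Late fusion: one classifier per modality, then average the decisions.
w_s, w_t, w_f = (rng.normal(size=(n_classes, d)) for d in (16, 8, 12))
p_late = (scores(speech, w_s) + scores(text, w_t) + scores(face, w_f)) / 3

print(p_early.argmax(), p_late.argmax())  # predicted emotion indices
```

Hybrid fusion, also covered by the survey, would combine both routes, e.g. by averaging `p_early` and `p_late`.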
Affiliation(s)
- Hailun Lian
- Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing 210000, China
- School of Information Science and Engineering, Southeast University, Nanjing 210000, China
- Cheng Lu
- Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing 210000, China
- School of Biological Science and Medical Engineering, Southeast University, Nanjing 210000, China
- Sunan Li
- Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing 210000, China
- School of Information Science and Engineering, Southeast University, Nanjing 210000, China
- Yan Zhao
- Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing 210000, China
- School of Information Science and Engineering, Southeast University, Nanjing 210000, China
- Chuangao Tang
- Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing 210000, China
- School of Biological Science and Medical Engineering, Southeast University, Nanjing 210000, China
- Yuan Zong
- Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing 210000, China
- School of Biological Science and Medical Engineering, Southeast University, Nanjing 210000, China
2
Su J, Zhu J, Song T, Chang H. Subject-Independent EEG Emotion Recognition Based on Genetically Optimized Projection Dictionary Pair Learning. Brain Sci 2023; 13:977. [PMID: 37508909; PMCID: PMC10377713; DOI: 10.3390/brainsci13070977]
Abstract
One of the primary challenges in electroencephalogram (EEG) emotion recognition lies in developing models that generalize well to new, unseen subjects, given the significant variability of EEG signals across individuals. To address the issue of subject-specific features, a suitable approach is projection dictionary learning, which enables the identification of emotion-relevant features shared across subjects. To accomplish pattern representation and discrimination for subject-independent EEG emotion recognition, we utilized the fast and efficient projection dictionary pair learning (PDPL) technique. PDPL jointly uses a synthesis dictionary and an analysis dictionary to enhance feature representation. Additionally, to optimize the hyperparameters of PDPL, which are otherwise set by experience, we applied a genetic algorithm (GA) to obtain the optimal solution for the model. We validated the effectiveness of our algorithm using leave-one-subject-out cross-validation on three EEG emotion databases: SEED, MPED, and GAMEEMO. Our approach outperformed traditional machine learning methods, achieving an average accuracy of 69.89% on SEED, 24.11% on MPED, 64.34% on the two-class GAMEEMO, and 49.01% on the four-class GAMEEMO. These results highlight the potential of subject-independent EEG emotion recognition algorithms in the development of intelligent systems capable of recognizing and responding to human emotions in real-world scenarios.
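The leave-one-subject-out protocol used in this evaluation can be sketched directly: every subject serves once as the unseen test set. This is a minimal NumPy sketch with a nearest-centroid classifier standing in for PDPL (the data, feature counts, and stand-in classifier are all assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy EEG features: 3 subjects x 20 trials x 5 features, 2 emotion classes.
subjects = np.repeat([0, 1, 2], 20)
y = np.tile([0, 1], 30)
X = rng.normal(size=(60, 5)) + y[:, None]  # class-dependent shift

def nearest_centroid(X_tr, y_tr, X_te):
    """Stand-in classifier: assign each test trial to the closest class mean."""
    cents = np.stack([X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)])
    d = ((X_te[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# Leave-one-subject-out: train on all other subjects, test on the held-out one.
accs = []
for s in np.unique(subjects):
    test = subjects == s
    pred = nearest_centroid(X[~test], y[~test], X[test])
    accs.append((pred == y[test]).mean())

print(f"mean LOSO accuracy: {np.mean(accs):.2f}")
```

The point of the protocol is that test-subject data never influence training, mirroring the subject-independent setting the abstract targets.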
Affiliation(s)
- Jipu Su
- School of Information Science and Engineering, Southeast University, Nanjing 210096, China
- Jie Zhu
- School of Information Science and Engineering, Southeast University, Nanjing 210096, China
- Tiecheng Song
- School of Information Science and Engineering, Southeast University, Nanjing 210096, China
- Hongli Chang
- School of Information Science and Engineering, Southeast University, Nanjing 210096, China
3
Wei P, Ke Y, Ong YS, Ma Z. Adaptive Transfer Kernel Learning for Transfer Gaussian Process Regression. IEEE Trans Pattern Anal Mach Intell 2023; 45:7142-7156. [PMID: 37145953; DOI: 10.1109/tpami.2022.3219121]
Abstract
Transfer regression is a practical and challenging problem with important applications in various domains, such as engineering design and localization. Capturing the relatedness of different domains is the key to adaptive knowledge transfer. In this paper, we investigate an effective way of explicitly modelling domain relatedness through a transfer kernel, a transfer-specified kernel that considers domain information in the covariance calculation. Specifically, we first give the formal definition of a transfer kernel and introduce three basic general forms that well cover existing related works. To cope with the limitations of the basic forms in handling complex real-world data, we further propose two advanced forms. Corresponding instantiations of the two forms are developed, namely Trkαβ and Trkω, based on multiple kernel learning and neural networks, respectively. For each instantiation, we present a condition under which positive semi-definiteness is guaranteed and a semantic interpretation of the learned domain relatedness is available. Moreover, the condition can be easily used in the learning of TrGPαβ and TrGPω, the Gaussian process models equipped with the transfer kernels Trkαβ and Trkω, respectively. Extensive empirical studies show the effectiveness of TrGPαβ and TrGPω on domain relatedness modelling and transfer adaptiveness.
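The simplest of the basic transfer-kernel forms mentioned above scales cross-domain covariance entries by a similarity coefficient. A minimal NumPy sketch of that idea (the RBF base kernel, the value of the coefficient `lam`, and the toy data are assumptions, not the paper's advanced instantiations); for a PSD base kernel and |lam| ≤ 1 the result stays PSD via the Schur product theorem:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Squared-exponential base kernel between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def transfer_kernel(X, domains, lam=0.6, gamma=0.5):
    """k((x,d),(x',d')) = rbf(x,x') within a domain, lam*rbf(x,x') across."""
    K = rbf(X, X, gamma)
    cross = domains[:, None] != domains[None, :]
    return np.where(cross, lam * K, K)

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))
dom = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 4 source points, 4 target points
K = transfer_kernel(X, dom)
print(np.linalg.eigvalsh(K).min() >= -1e-9)  # PSD check passes
```

A larger `lam` expresses stronger source-target relatedness; `lam = 0` reduces to independent per-domain GPs.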
4
Xu T, Dang W, Wang J, Zhou Y. DAGAM: a domain adversarial graph attention model for subject-independent EEG-based emotion recognition. J Neural Eng 2023; 20:016022. [PMID: 36548989; DOI: 10.1088/1741-2552/acae06]
Abstract
Objective. Due to individual differences in electroencephalogram (EEG) signals, a model built with the subject-dependent technique from one person's data is inaccurate when applied to another person for emotion recognition; the subject-dependent approach may therefore generalize poorly compared to the subject-independent approach. However, existing studies have not fully exploited EEG's topology, nor have they solved the problem caused by the difference in data distribution between the source and target domains. Approach. To eliminate individual differences in EEG signals, this paper proposes the domain adversarial graph attention model, a novel EEG-based emotion recognition model. The basic idea is to generate a graph using biological topology to model multichannel EEG signals. Graph theory can topologically describe and analyze EEG channel relationships and mutual dependencies. Then, unlike other graph convolutional networks, self-attention pooling is used to extract salient EEG features from the graph, effectively improving performance. Finally, following graph pooling, a graph-based domain adversarial model is used to identify and handle EEG variation across subjects, achieving good generalizability efficiently. Main results. We conduct extensive evaluations on two benchmark datasets (SEED and SEED IV) and obtain state-of-the-art results in subject-independent emotion recognition. Our model boosts SEED accuracy to 92.59% (a 4.06% improvement) with the lowest standard deviation (STD) of 3.21% (a 2.46% decrease), and SEED IV accuracy to 80.74% (a 6.90% improvement) with the lowest STD of 4.14% (a 3.88% decrease). The computational complexity is drastically reduced in comparison to similar efforts (33 times lower). Significance. We have developed a model that significantly reduces computation time while maintaining accuracy, making EEG-based emotion decoding more practical and generalizable.
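The self-attention pooling step described above, which keeps only the most salient EEG channels in the graph, can be sketched as follows. This follows the generic self-attention graph-pooling recipe (score nodes, keep the top-k, gate survivors by their scores) and is an assumption for illustration, not the paper's exact layer; the projection vector `p` would normally be learned.

```python
import numpy as np

def self_attention_pool(H, A, p, keep_ratio=0.5):
    """Score nodes via a projection over graph-smoothed features,
    keep the top-k nodes, and gate the survivors by their scores."""
    A_hat = A + np.eye(len(A))                      # add self-loops
    deg = A_hat.sum(1)
    A_norm = A_hat / np.sqrt(np.outer(deg, deg))    # symmetric normalisation
    node_scores = np.tanh(A_norm @ H @ p)           # one score per node
    k = max(1, int(len(H) * keep_ratio))
    idx = np.argsort(node_scores)[-k:]              # most salient nodes
    return H[idx] * node_scores[idx, None], A[np.ix_(idx, idx)], idx

rng = np.random.default_rng(3)
H = rng.normal(size=(6, 4))            # 6 EEG channels, 4 features each
A = (rng.random((6, 6)) > 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T                            # symmetric adjacency, no self-loops
H2, A2, kept = self_attention_pool(H, A, p=rng.normal(size=4))
print(H2.shape, A2.shape)              # pooled graph keeps 3 of 6 channels
```

The pooled graph and features then feed the domain adversarial branch of the model.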
Affiliation(s)
- Tao Xu
- Northwestern Polytechnical University, School of Software, Xi'an, People's Republic of China
- Wang Dang
- Northwestern Polytechnical University, School of Software, Xi'an, People's Republic of China
- Jiabao Wang
- Northwestern Polytechnical University, School of Software, Xi'an, People's Republic of China
- Yun Zhou
- Shaanxi Normal University, Faculty of Education, Xi'an, People's Republic of China
5
Chen T, Pu T, Wu H, Xie Y, Liu L, Lin L. Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning. IEEE Trans Pattern Anal Mach Intell 2022; 44:9887-9903. [PMID: 34847019; DOI: 10.1109/tpami.2021.3131222]
Abstract
Facial expression recognition (FER) has received significant attention and made notable progress in the past decade, but data inconsistencies among different FER datasets greatly hinder the generalization of models learned on one dataset to another. Recently, a series of cross-domain FER (CD-FER) algorithms have been developed to address this issue. Although each claims superior performance, comprehensive and fair comparisons are lacking due to inconsistent choices of source/target datasets and feature extractors. In this work, we first construct a unified CD-FER evaluation benchmark, in which we re-implement the well-performing CD-FER and recently published general domain adaptation algorithms and ensure that all of them adopt the same source/target datasets and feature extractors for fair CD-FER evaluation. Based on this analysis, we find that most current state-of-the-art algorithms use adversarial learning to learn holistic domain-invariant features that mitigate domain shift. However, these algorithms ignore local features, which are more transferable across datasets and carry more detailed content for fine-grained adaptation. Therefore, we develop a novel adversarial graph representation adaptation (AGRA) framework that integrates graph representation propagation with adversarial learning to realize effective cross-domain holistic-local feature co-adaptation. Specifically, our framework first builds two graphs to correlate holistic and local regions within each domain and across different domains, respectively. Then, it extracts holistic-local features from the input image and uses learnable per-class statistical distributions to initialize the corresponding graph nodes. Finally, two stacked graph convolutional networks (GCNs) propagate holistic-local features within each domain to explore their interaction, and across domains for holistic-local feature co-adaptation. In this way, the AGRA framework adaptively learns fine-grained domain-invariant features and thus facilitates cross-domain expression recognition. We conduct extensive and fair comparisons on the unified evaluation benchmark and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
6
Valliani AA, Gulamali FF, Kwon YJ, Martini ML, Wang C, Kondziolka D, Chen VJ, Wang W, Costa AB, Oermann EK. Deploying deep learning models on unseen medical imaging using adversarial domain adaptation. PLoS One 2022; 17:e0273262. [PMID: 36240135; PMCID: PMC9565422; DOI: 10.1371/journal.pone.0273262]
Abstract
The fundamental challenge in machine learning is ensuring that trained models generalize well to unseen data. We developed a general technique for ameliorating the effect of dataset shift using generative adversarial networks (GANs) on a dataset of 149,298 handwritten digits and a dataset of 868,549 chest radiographs obtained from four academic medical centers. Efficacy was assessed by comparing the area under the curve (AUC) pre- and post-adaptation. On the digit recognition task, the baseline CNN achieved an average internal test AUC of 99.87% (95% CI, 99.87-99.87%), which decreased to an average external test AUC of 91.85% (95% CI, 91.82-91.88%), with an average salvage of 35% from baseline upon adaptation. On the lung pathology classification task, the baseline CNN achieved an average internal test AUC of 78.07% (95% CI, 77.97-78.17%) and an average external test AUC of 71.43% (95% CI, 71.32-71.60%), with a salvage of 25% from baseline upon adaptation. Adversarial domain adaptation leads to improved model performance on radiographic data derived from multiple out-of-sample healthcare populations. This work can be applied to other medical imaging domains to help shape the deployment toolkit of machine learning in medicine.
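Adversarial domain adaptation of this kind typically rests on a gradient reversal mechanism: the forward pass is the identity, while the backward pass flips the sign of the domain-classifier gradient, pushing the feature extractor to confuse the domain classifier. A minimal sketch of that mechanism with manual gradients (the class name, `lam` value, and toy gradient are illustrative assumptions, not this paper's implementation):

```python
import numpy as np

class GradReverse:
    """Identity on the forward pass; multiplies the incoming gradient
    by -lam on the backward pass (the gradient reversal layer idea)."""
    def __init__(self, lam=0.5):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_out):
        return -self.lam * grad_out

grl = GradReverse(lam=0.5)
# Gradient flowing back from the domain classifier into the features:
g = np.array([0.2, -0.4])
# After reversal, the feature extractor is updated to *increase* the
# domain classifier's loss, i.e. to make features domain-invariant.
print(grl.backward(g))  # [-0.1  0.2]
```

In a full pipeline this sits between the shared feature extractor and the domain discriminator, while the task head receives the unreversed gradient.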
Affiliation(s)
- Aly A. Valliani
- Department of Neurosurgery, Mount Sinai Health System, New York, NY, United States of America
- Faris F. Gulamali
- Department of Neurosurgery, Mount Sinai Health System, New York, NY, United States of America
- Young Joon Kwon
- Department of Neurosurgery, Mount Sinai Health System, New York, NY, United States of America
- Michael L. Martini
- Department of Neurosurgery, Mount Sinai Health System, New York, NY, United States of America
- Chiatse Wang
- Data Science Degree Program, National Taiwan University, Taipei, Taiwan
- Douglas Kondziolka
- Department of Neurosurgery, New York University Langone Medical Center, New York, NY, United States of America
- Department of Radiation Oncology, New York University Langone Medical Center, New York, NY, United States of America
- Viola J. Chen
- Oncology Early Development, Merck Co., Inc, Kenilworth, NJ, United States of America
- Weichung Wang
- Data Science Degree Program, National Taiwan University, Taipei, Taiwan
- Institute of Applied Mathematical Sciences, National Taiwan University, Taipei, Taiwan
- Eric K. Oermann
- Department of Neurosurgery, New York University Langone Medical Center, New York, NY, United States of America
- Department of Radiology, New York University Langone Medical Center, New York, NY, United States of America
7
Advances in computer–human interaction for detecting facial expression using dual tree multi band wavelet transform and Gaussian mixture model. Neural Comput Appl 2022. [DOI: 10.1007/s00521-020-05037-9]
8
Mitigating domain mismatch in face recognition using style matching. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.02.009]
9
AU-Guided Unsupervised Domain-Adaptive Facial Expression Recognition. Appl Sci (Basel) 2022. [DOI: 10.3390/app12094366]
Abstract
Domain diversities, including inconsistent annotation and varied image collection conditions, inevitably exist among different facial expression recognition (FER) datasets, posing an evident challenge for adapting FER models trained on one dataset to another. Recent works mainly focus on domain-invariant deep feature learning with adversarial learning mechanisms, ignoring the sibling facial action unit (AU) detection task, which has seen great progress. Considering that AUs objectively determine facial expressions, this paper proposes an AU-guided unsupervised domain-adaptive FER (AdaFER) framework to relieve the annotation bias between different FER datasets. In AdaFER, we first leverage an advanced model for AU detection on both the source and the target domain. Then, we compare the AU results to perform AU-guided annotating, i.e., target faces that share the same AUs as source faces inherit the labels from the source domain. Meanwhile, to achieve domain-invariant compact features, we utilize AU-guided triplet training, which randomly collects anchor-positive-negative triplets on both domains with AUs. We conduct extensive experiments on several popular benchmarks and show that AdaFER achieves state-of-the-art results on all of them.
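The AU-guided annotating step described above, where target faces inherit labels from source faces with the same AU set, reduces to a simple lookup. A minimal sketch (the AU numbers and emotion labels below are made up for illustration, not from the paper):

```python
# Map each source face's set of activated AUs to its emotion label.
source = [({1, 2, 26}, "surprise"), ({6, 12}, "happy"), ({4, 15}, "sad")]
au_to_label = {frozenset(aus): lab for aus, lab in source}

def au_guided_label(target_aus):
    """Target faces whose AU set appears in the source inherit its label;
    otherwise they stay unlabeled (None) in this pseudo-labelling pass."""
    return au_to_label.get(frozenset(target_aus))

print(au_guided_label({12, 6}))  # happy
print(au_guided_label({9, 17}))  # None -> no matching source AU set
```

The resulting pseudo-labels then supervise the target domain during the triplet training stage.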
10
Sima Y, Yi J, Chen A, Jin Z. Automatic expression recognition of face image sequence based on key-frame generation and differential emotion feature. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.108029]
11
Hierarchical Attention-Based Multimodal Fusion Network for Video Emotion Recognition. Comput Intell Neurosci 2021; 2021:5585041. [PMID: 34616444; PMCID: PMC8487826; DOI: 10.1155/2021/5585041]
Abstract
Context, such as scenes and objects, plays an important role in video emotion recognition, and recognition accuracy can be further improved when context information is incorporated. Although previous research has considered context information, the emotional clues contained in different images may differ, which is often ignored. To address the emotion differences between modalities and between images, this paper proposes a hierarchical attention-based multimodal fusion network for video emotion recognition, consisting of a multimodal feature extraction module and a multimodal feature fusion module. The feature extraction module has three subnetworks that extract features from facial, scene, and global images. Each subnetwork consists of two branches: the first extracts the features of a modality, and the other generates an emotion score for each image. The features and emotion scores of all images in a modality are aggregated to generate that modality's emotion feature. The fusion module takes the multimodal features as input and generates an emotion score for each modality. Finally, the features and emotion scores of the modalities are aggregated to produce the final emotion representation of the video. Experimental results show that the proposed method is effective on the emotion recognition dataset.
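The aggregation rule described above, where per-image emotion scores weight the combination of per-image features within a modality, can be sketched as follows. Softmax weighting is an assumption consistent with standard attention, and the frame counts and dimensions are illustrative:

```python
import numpy as np

def aggregate(features, scores):
    """Softmax the per-image emotion scores, then take the weighted sum of
    the per-image features, yielding one feature vector for the modality."""
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return w @ features

rng = np.random.default_rng(4)
face_feats = rng.normal(size=(5, 8))   # 5 frames, 8-dim facial features
face_scores = rng.normal(size=5)       # one emotion score per frame
modal_feat = aggregate(face_feats, face_scores)
print(modal_feat.shape)  # (8,)
```

Applying the same rule once more over the per-modality features and scores gives the second, modality-level stage of the hierarchy.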
12
Wang Y, Xia Z, Deng J, Xie X, Gong M, Ma X. TLGP: a flexible transfer learning algorithm for gene prioritization based on heterogeneous source domain. BMC Bioinformatics 2021; 22:274. [PMID: 34433414; PMCID: PMC8386056; DOI: 10.1186/s12859-021-04190-9]
Abstract
BACKGROUND Gene prioritization (gene ranking) aims to obtain the centrality of genes, which is critical for cancer diagnosis and therapy since key genes correspond to biomarkers or drug targets. Great efforts have been devoted to the gene ranking problem by exploring the similarity between candidate and known disease-causing genes. However, when the number of known disease-causing genes is limited, these methods are largely inapplicable due to their low accuracy. In practice, the number of known disease-causing genes, particularly for rare cancers, is quite limited. Therefore, there is a critical need to design effective and efficient gene ranking algorithms that work with limited prior disease-causing genes. RESULTS In this study, we propose a transfer learning-based algorithm for gene prioritization (TLGP) in a target cancer (target domain) without disease-causing genes by transferring knowledge from other cancers (source domain). The underlying assumption is that knowledge shared by similar cancers improves the accuracy of gene prioritization. Specifically, TLGP first quantifies the similarity between the target and source domains by calculating an affinity matrix for genes. Then, TLGP automatically learns a fusion network for the target cancer by fusing the affinity matrix, pathogenic genes, and genomic data of the source cancers. Finally, genes in the target cancer are prioritized. The experimental results indicate that the learnt fusion network is more reliable than a gene co-expression network, implying that transferring knowledge from other cancers improves the accuracy of network construction. Moreover, TLGP outperforms state-of-the-art approaches in accuracy, improving by at least 5%. CONCLUSION The proposed model and method provide an effective and efficient strategy for gene ranking by integrating genomic data from various cancers.
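The affinity-matrix step described above can be illustrated with a correlation-based gene affinity followed by a single score-propagation pass. This is a generic network-ranking sketch under stated assumptions (random toy expression data, gene 0 as the only known disease gene, one propagation step), not TLGP's fusion procedure:

```python
import numpy as np

rng = np.random.default_rng(5)
expr = rng.normal(size=(6, 30))       # toy expression: 6 genes x 30 samples

# Affinity between genes: absolute Pearson correlation, zero self-affinity.
A = np.abs(np.corrcoef(expr))
np.fill_diagonal(A, 0.0)

# Seed with one known disease-causing gene (gene 0 here, an assumption),
# then score candidates by one propagation step over the affinity network.
seed = np.zeros(6)
seed[0] = 1.0
W = A / A.sum(axis=1, keepdims=True)  # row-normalised transition matrix
scores = W @ seed
print(np.argsort(-scores)[:3])        # top-ranked candidate genes
```

Genes most strongly connected to the seed receive the highest scores, which is the intuition behind similarity-based prioritization that TLGP extends across cancers.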
Affiliation(s)
- Yan Wang
- School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
- Department of Library, Xidian University, South TaiBai Road, Xi’an, China
- Zuheng Xia
- School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
- Jingjing Deng
- Department of Computer Science, Swansea University, Bay, UK
- Xianghua Xie
- Department of Computer Science, Swansea University, Bay, UK
- Maoguo Gong
- School of Electronic Engineering, Xidian University, South TaiBai Road, Xi’an, China
- Xiaoke Ma
- School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
13
Wei P, Sagarna R, Ke Y, Ong YS. Practical Multisource Transfer Regression With Source-Target Similarity Captures. IEEE Trans Neural Netw Learn Syst 2021; 32:3498-3509. [PMID: 32784144; DOI: 10.1109/tnnls.2020.3012457]
Abstract
A key challenge in many applications of multisource transfer learning is to explicitly capture the diverse source-target similarities. In this article, we are concerned with stretching the set of practical approaches based on Gaussian process (GP) models to solve multisource transfer regression problems. Precisely, we first investigate the feasibility and performance of a family of transfer covariance functions that represent the pairwise similarity of each source and the target domain. We theoretically show that using such a transfer covariance function for general GP modeling can only capture the same similarity coefficient for all the sources, and thus may result in unsatisfactory transfer performance. This outcome, together with the scalability issues of a single-GP-based approach, leads us to propose TCMSStack, an integrated framework incorporating a separate transfer covariance function for each source and stacking. Contrary to typical stacking approaches, TCMSStack learns the source-target similarity in each base GP model by considering the dependencies of the other sources along the process. We introduce two instances of the proposed TCMSStack. Extensive experiments on one synthetic and two real-world data sets, with learning settings of up to 11 sources for the latter, demonstrate the effectiveness of our approach.
14
Li Y, Fu B, Li F, Shi G, Zheng W. A novel transferability attention neural network model for EEG emotion recognition. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.048]
15
Dan Y, Tao J, Fu J, Zhou D. Possibilistic Clustering-Promoting Semi-Supervised Learning for EEG-Based Emotion Recognition. Front Neurosci 2021; 15:690044. [PMID: 34276295; PMCID: PMC8281971; DOI: 10.3389/fnins.2021.690044]
Abstract
The goal of the latest brain-computer interfaces is to perform accurate emotion recognition by customizing their recognizers to each subject. In machine learning, graph-based semi-supervised learning (GSSL) has attracted increasing attention due to its intuitive formulation and good learning performance for emotion recognition. However, existing GSSL methods are not robust enough to noisy or outlier electroencephalogram (EEG) data, since each individual subject may present noisy or outlier EEG patterns in the same scenario. To address this problem, we propose a possibilistic clustering-promoting semi-supervised learning method for EEG-based emotion recognition. Specifically, it constrains each instance to have the same label membership value as its local weighted mean, improving the reliability of the recognition method. In addition, a fuzzy entropy regularization term is introduced into the objective function, and the generalization ability of the membership function is enhanced by increasing the amount of sample discrimination information, which improves the robustness of the method to noise and outliers. Extensive experimental results on three real datasets (DEAP, SEED, and SEED-IV) show that the proposed method improves the reliability and robustness of EEG-based emotion recognition.
Affiliation(s)
- Yufang Dan
- Institute of Artificial Intelligence Application, Ningbo Polytechnic, Ningbo, China
- Jianwen Tao
- Institute of Artificial Intelligence Application, Ningbo Polytechnic, Ningbo, China
- Jianjing Fu
- School of Media Engineering, Communication University of Zhejiang, Hangzhou, China
- Di Zhou
- Dazhou Industrial Technological Institute of Intelligent Manufacturing, Sichuan University of Arts and Science, Dazhou, China
16
Li Y, Wang L, Zheng W, Zong Y, Qi L, Cui Z, Zhang T, Song T. A Novel Bi-Hemispheric Discrepancy Model for EEG Emotion Recognition. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2020.2999337]
17
Cross-Database Micro-Expression Recognition Exploiting Intradomain Structure. J Healthc Eng 2021. [DOI: 10.1155/2021/5511509]
Abstract
Micro-expressions are unconscious, faint, short-lived expressions that appear on the face; they enable a more accurate understanding of a person's psychological state and emotions. Micro-expression recognition is therefore particularly important in psychotherapy and clinical diagnosis, and it has been widely studied over the past decades. In practical applications, the training and testing samples come from different databases, so their feature distributions differ to a large extent, causing a drastic performance decrease in traditional micro-expression recognition methods. Moreover, most existing cross-database micro-expression recognition methods require extensive model selection or hyperparameter tuning to select the best results, which consumes a large amount of time and labor. In this paper, we overcome this problem by exploiting the intradomain structure: nonparametric transfer features are learned through intradomain alignment, while a classifier is learned through intradomain programming. To evaluate the performance, extensive cross-database experiments were conducted on the CASME II and SMIC databases. The comparison of results shows that this method achieves promising recognition accuracy with high computational efficiency.
18
Tao J, Dan Y. Multi-Source Co-adaptation for EEG-Based Emotion Recognition by Mining Correlation Information. Front Neurosci 2021; 15:677106. [PMID: 34054422; PMCID: PMC8155359; DOI: 10.3389/fnins.2021.677106]
Abstract
Since each individual subject may present completely different electroencephalogram (EEG) patterns from other subjects, existing subject-independent emotion classifiers trained on cross-subject or cross-dataset samples generally fail to achieve sound accuracy. In this scenario, the domain adaptation technique can be employed to address the problem; it has recently received extensive attention due to its effectiveness in cross-distribution learning. Focusing on cross-subject and cross-dataset automated emotion recognition from EEG features, we propose in this article a robust multi-source co-adaptation framework that mines diverse correlation information (MACI) among domains and features, with an ℓ2,1-norm as well as correlation-metric regularization. Specifically, by minimizing the statistical and semantic distribution differences between source and target domains, multiple subject-invariant classifiers can be learned together in a joint framework, which allows MACI to use relevant knowledge from multiple sources by exploiting the developed correlation metric function. Comprehensive experimental evidence on the DEAP and SEED datasets verifies the better performance of MACI in EEG-based emotion recognition.
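As a small aside on the regularizer, the ℓ2,1 norm and its proximal (row-shrinkage) step can be sketched as follows. This is generic textbook machinery for joint feature selection, not the MACI implementation itself.

```python
import numpy as np

def l21_norm(W):
    """ℓ2,1 norm of W: the sum of the ℓ2 norms of its rows.
    Penalizing it drives entire rows (features) toward zero jointly."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

def prox_l21(W, tau):
    """Proximal operator of tau * ||W||_{2,1}: shrink each row's ℓ2 length
    by tau, zeroing rows whose length falls below tau."""
    norms = np.sqrt((W ** 2).sum(axis=1, keepdims=True))
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale
```

Rows that survive the shrinkage correspond to features shared across the co-adapted classifiers; rows set to zero are discarded for every source at once.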
|
19
|
Fu Y, Ruan Q, Luo Z, An G, Jin Y. Orthogonal tucker decomposition using factor priors for 2D+3D facial expression recognition. IET BIOMETRICS 2021. [DOI: 10.1049/bme2.12035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Affiliation(s)
- Yunfang Fu
- Institute of Information Science Beijing Jiaotong University Beijing China
- School of Computer Science & Engineering Shijiazhuang University Shijiazhuang China
- Beijing Key Laboratory of Information Science and Network Technology Beijing China
- Qiuqi Ruan
- Institute of Information Science Beijing Jiaotong University Beijing China
- Beijing Key Laboratory of Information Science and Network Technology Beijing China
- Ziyan Luo
- Department of Mathematics Beijing Jiaotong University Beijing China
- Gaoyun An
- Institute of Information Science Beijing Jiaotong University Beijing China
- Beijing Key Laboratory of Information Science and Network Technology Beijing China
- Yi Jin
- Institute of Information Science Beijing Jiaotong University Beijing China
- Beijing Key Laboratory of Information Science and Network Technology Beijing China
|
20
|
Murphy CP, Kerekes JP. Physics-guided neural network for predicting chemical signatures. APPLIED OPTICS 2021; 60:3176-3181. [PMID: 33983216 DOI: 10.1364/ao.420688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/29/2021] [Accepted: 03/15/2021] [Indexed: 06/12/2023]
Abstract
Achieving high classification accuracy on trace chemical residues in active spectroscopic sensing is challenging due to the limited amount of training data available to the classifier. Such classifiers often rely on physics-based models for generating training data, though these models are not always accurate when compared to measured data. To overcome this challenge, we developed a physics-guided neural network (PGNN) for predicting chemical reflectance over a set of parameterized inputs that is more accurate than the state-of-the-art physics-based signature model for chemical residues. After training the PGNN, we use it to generate a library of predicted spectra for training a classifier, and we compare the classification accuracy when using this PGNN library versus a library generated by the physics-based model. Using the PGNN, the average classification accuracy on real chemical reflectance data increases from 0.623 to 0.813, including data from chemicals not included in the PGNN training set.
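One common physics-guided pattern, not necessarily the authors' exact architecture, is to learn only the residual that the physics model gets wrong and add it back to the physics prediction. A minimal sketch with an illustrative stand-in physics model and a ridge-regressed correction:

```python
import numpy as np

def physics_model(params):
    """Stand-in for a physics-based signature model (illustrative only):
    a coarse analytic reflectance curve scaled by the first parameter."""
    wl = np.linspace(0.0, 1.0, 50)
    return np.outer(params[:, 0], np.sin(2 * np.pi * wl))

def fit_residual_correction(params, measured, lam=1e-3):
    """Ridge-regress the physics model's residual on the input parameters.
    The hybrid prediction is physics + learned residual, so the network
    (here a linear map for brevity) only has to model what physics misses."""
    R = measured - physics_model(params)                 # what the physics misses
    X = np.hstack([params, np.ones((len(params), 1))])   # add a bias column
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ R)
    return lambda p: physics_model(p) + np.hstack([p, np.ones((len(p), 1))]) @ W
```

The same residual-learning scheme carries over when the linear map is replaced by a small neural network trained on measured spectra.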
|
21
|
Zong Y, Zheng W, Cui Z, Zhao G, Hu B. Toward Bridging Microexpressions From Different Domains. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:5047-5060. [PMID: 31180877 DOI: 10.1109/tcyb.2019.2914512] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Recently, microexpression recognition has attracted much attention from researchers due to its challenges and valuable applications. However, most existing methods are evaluated and tested on a single database, which raises the question of whether they remain effective when the training and testing samples belong to different domains, for example, different microexpression databases. In this case, a large feature distribution difference may exist between the training (source) and testing (target) samples, making microexpression recognition tasks more difficult. To solve this challenging problem, i.e., cross-domain microexpression recognition, we propose in this paper an effective method consisting of an auxiliary set selection model (ASSM) and a transductive transfer regression model (TTRM). The ASSM is designed to automatically select an optimal set of samples from the target domain to serve as the auxiliary set, which is used for subsequent TTRM training. The TTRM, in turn, aims at bridging the feature distribution gap between the source and target domains by learning a joint regression model with the source domain samples and the auxiliary set selected from the target domain. We evaluate the proposed TTRM plus ASSM through extensive cross-domain microexpression recognition experiments on the SMIC and CASME II databases. Compared with recent state-of-the-art domain adaptation methods, our proposed method performs more satisfactorily on cross-domain microexpression recognition tasks.
|
22
|
Chen S, Han L, Liu X, He Z, Yang X. Subspace Distribution Adaptation Frameworks for Domain Adaptation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:5204-5218. [PMID: 31995505 DOI: 10.1109/tnnls.2020.2964790] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Domain adaptation tries to adapt a model trained from a source domain to a different but related target domain. Currently, prevailing methods for domain adaptation rely on either instance reweighting or feature transformation. Unfortunately, instance reweighting has difficulty in estimating the sample weights as the dimension increases, whereas feature transformation sometimes fails to make the transformed source and target distributions similar when the cross-domain discrepancy is large. In order to overcome the shortcomings of both methodologies, in this article, we model the unsupervised domain adaptation problem under the generalized covariate shift assumption and adapt the source distribution to the target distribution in a subspace by applying a distribution adaptation function. Accordingly, we propose two frameworks: Bregman-divergence-embedded structural risk minimization (BSRM) and joint structural risk minimization (JSRM). In the proposed frameworks, the subspace distribution adaptation function and the target prediction model are jointly learned. Under certain instantiations, convex optimization problems are derived from both frameworks. Experimental results on the synthetic and real-world text and image data sets show that the proposed methods outperform the state-of-the-art domain adaptation techniques with statistical significance.
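To make the "Bregman-divergence-embedded" idea concrete, a generic Bregman divergence can be written directly from its definition; the generators below (squared norm and negative entropy) recover squared Euclidean distance and KL divergence, respectively. This is textbook material, not the BSRM/JSRM code.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>,
    the family of divergences used to compare adapted distributions."""
    return phi(p) - phi(q) - grad_phi(q) @ (p - q)

# phi(x) = ||x||^2 gives the squared Euclidean distance
sq = lambda x: float(x @ x)
sq_grad = lambda x: 2.0 * x

# phi(x) = sum x log x (negative entropy) gives the KL divergence
kl_phi = lambda x: float(np.sum(x * np.log(x)))
kl_grad = lambda x: np.log(x) + 1.0
```

Embedding such a divergence into the risk term lets the frameworks penalize the gap between the adapted source distribution and the target distribution inside one joint objective.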
|
23
|
Otberdout N, Kacem A, Daoudi M, Ballihi L, Berretti S. Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:3892-3905. [PMID: 31725395 DOI: 10.1109/tnnls.2019.2947244] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this article, we propose a new approach for facial expression recognition (FER) using deep covariance descriptors. The solution is based on the idea of encoding local and global deep convolutional neural network (DCNN) features extracted from still images, in compact local and global covariance descriptors. The space geometry of the covariance matrices is that of symmetric positive definite (SPD) matrices. By conducting the classification of static facial expressions using a support vector machine (SVM) with a valid Gaussian kernel on the SPD manifold, we show that deep covariance descriptors are more effective than the standard classification with fully connected layers and softmax. Besides, we propose a completely new and original solution to model the temporal dynamic of facial expressions as deep trajectories on the SPD manifold. As an extension of the classification pipeline of covariance descriptors, we apply SVM with valid positive definite kernels derived from global alignment for deep covariance trajectories classification. By performing extensive experiments on the Oulu-CASIA, CK+, static facial expression in the wild (SFEW), and acted facial expressions in the wild (AFEW) data sets, we show that both the proposed static and dynamic approaches achieve the state-of-the-art performance for FER outperforming many recent approaches.
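The core pipeline of covariance descriptors plus a valid Gaussian kernel on SPD matrices can be sketched with the log-Euclidean metric, which is one standard way to obtain a positive-definite Gaussian kernel on the SPD manifold; the paper's exact kernel construction may differ.

```python
import numpy as np

def covariance_descriptor(F, eps=1e-6):
    """Encode a set of local DCNN feature vectors (n x d) as a d x d
    covariance matrix, regularized to stay positive definite."""
    C = np.cov(F, rowvar=False)
    return C + eps * np.eye(C.shape[0])

def log_euclidean(C):
    """Matrix logarithm via eigendecomposition: maps an SPD matrix to a flat
    space where Euclidean operations (and Gaussian kernels) are valid."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def spd_gaussian_kernel(C1, C2, gamma=0.1):
    """Gaussian kernel on SPD matrices under the log-Euclidean distance;
    this choice is known to yield a positive-definite kernel for SVMs."""
    d = np.linalg.norm(log_euclidean(C1) - log_euclidean(C2), 'fro')
    return np.exp(-gamma * d ** 2)
```

A kernel SVM fed with `spd_gaussian_kernel` Gram matrices then classifies the covariance descriptors directly, instead of flattening them through fully connected layers.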
|
24
|
Lee MK, Kim DH, Song BC. Visual Scene-Aware Hybrid and Multi-Modal Feature Aggregation for Facial Expression Recognition. SENSORS (BASEL, SWITZERLAND) 2020; 20:E5184. [PMID: 32932939 PMCID: PMC7571042 DOI: 10.3390/s20185184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/19/2020] [Revised: 09/07/2020] [Accepted: 09/09/2020] [Indexed: 11/20/2022]
Abstract
Facial expression recognition (FER) technology has made considerable progress with the rapid development of deep learning. However, conventional FER techniques are mainly designed and trained on videos acquired in a controlled environment, so they may not operate robustly on videos acquired in the wild, which suffer from varying illumination and head poses. In order to solve this problem and improve the ultimate performance of FER, this paper proposes a new architecture that extends a state-of-the-art FER scheme with a multi-modal neural network that can effectively fuse image and landmark information. To this end, we propose three methods. First, to maximize the performance of the recurrent neural network (RNN) in the previous scheme, we propose a frame substitution module that replaces the latent features of less important frames with those of important frames based on inter-frame correlation. Second, we propose a method for extracting facial landmark features based on the correlation between frames. Third, we propose a new multi-modal fusion method that fuses video and facial landmark information at the feature level: by applying attention, derived from the characteristics of each modality, to that modality's features, a novel fusion is achieved. Experimental results show that the proposed method provides remarkable performance, with 51.4% accuracy on the wild AFEW dataset, 98.5% accuracy on the CK+ dataset and 81.9% accuracy on the MMI dataset, outperforming state-of-the-art networks.
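The attention-weighted feature-level fusion can be illustrated with a hedged sketch; the per-modality projections standing in for learned parameters, and the assumption that both modalities share one embedding dimension, are ours, not the paper's.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_fuse(video_feat, landmark_feat, w_v, w_l):
    """Score each modality with its own (here given, normally learned)
    projection, turn the scores into attention weights, and take the
    weighted sum of the two feature vectors."""
    scores = np.array([video_feat @ w_v, landmark_feat @ w_l])
    a = softmax(scores)
    return a[0] * video_feat + a[1] * landmark_feat
```

When one modality is unreliable (e.g. landmarks under occlusion), its score drops and the fused feature leans on the other modality, which is the intuition behind modality-wise attention.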
Affiliation(s)
- Byung Cheol Song
- Department of Electronic Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon 22212, Korea; (M.K.L.); (D.H.K.)
|
25
|
Wu H, Yan Y, Ng MK, Wu Q. Domain-attention Conditional Wasserstein Distance for Multi-source Domain Adaptation. ACM T INTEL SYST TEC 2020. [DOI: 10.1145/3391229] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
Multi-source domain adaptation has received considerable attention due to its effectiveness in leveraging knowledge from multiple related sources with different distributions to enhance learning performance. One of the fundamental challenges in multi-source domain adaptation is how to determine the amount of knowledge transferred from each source domain to the target domain. To address this issue, we propose a new algorithm, called Domain-attention Conditional Wasserstein Distance (DCWD), to learn transferred weights for evaluating the relatedness between the source and target domains. In DCWD, we design a new conditional Wasserstein distance objective function that takes label information into consideration when measuring the distance between a given source domain and the target domain. We also develop an attention scheme to compute the transferred weights of the different source domains based on their conditional Wasserstein distances to the target domain. The transferred weights can then be used to reweight the source data according to their importance for knowledge transfer. We conduct comprehensive experiments on several real-world data sets, and the results demonstrate the effectiveness and efficiency of the proposed method.
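A hedged sketch of the weighting idea: compute a label-conditioned 1-D Wasserstein distance for each source, then softmax the negated distances into transfer weights. In the unsupervised setting the target labels would be pseudo-labels; the per-dimension averaging and the temperature are illustrative choices, not the paper's exact objective.

```python
import numpy as np

def w1_empirical(a, b):
    """1-D empirical Wasserstein-1 distance between equal-size samples:
    the mean absolute difference of the sorted values."""
    return np.abs(np.sort(a) - np.sort(b)).mean()

def conditional_w1(Xs, ys, Xt, yt):
    """Label-conditioned distance: average the per-class, per-dimension
    1-D W1 distances between a source domain and the target."""
    classes = np.intersect1d(np.unique(ys), np.unique(yt))
    d = 0.0
    for c in classes:
        A, B = Xs[ys == c], Xt[yt == c]
        n = min(len(A), len(B))
        d += np.mean([w1_empirical(A[:n, j], B[:n, j]) for j in range(Xs.shape[1])])
    return d / len(classes)

def attention_weights(distances, temperature=1.0):
    """Closer sources (smaller conditional distance) get larger weights."""
    z = -np.asarray(distances) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()
```

Each source's samples are then reweighted by its attention weight before training, so distant sources contribute little to the target model.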
Affiliation(s)
- Hanrui Wu
- South China University of Technology, Guangzhou, China
- Yuguang Yan
- The University of Hong Kong, Hong Kong, China
- Qingyao Wu
- South China University of Technology, Guangzhou, China
|
26
|
Zhang F, Zhang T, Mao Q, Xu C. A Unified Deep Model for Joint Facial Expression Recognition, Face Synthesis, and Face Alignment. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 29:6574-6589. [PMID: 32396088 DOI: 10.1109/tip.2020.2991549] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Facial expression recognition, face synthesis, and face alignment are three coherently related tasks that can be solved in a joint framework. To achieve this goal, in this paper we propose a novel end-to-end deep learning model that jointly exploits the expression code, geometry code and generated data for simultaneous pose-invariant facial expression recognition, face image synthesis, and face alignment. The proposed deep model enjoys several merits. First, to the best of our knowledge, this is the first work to address these three tasks jointly in a unified deep model so that they complement and enhance each other. Second, the proposed model can effectively disentangle the global and local identity representation from different expression and geometry codes. As a result, it can automatically generate facial images with different expressions under arbitrary geometry codes. Third, the three tasks can further boost each other's performance via our model. Extensive experimental results on three standard benchmarks demonstrate that the proposed deep model performs favorably against state-of-the-art methods on all three tasks.
|
27
|
Ertugrul IO, Cohn JF, Jeni LA, Zhang Z, Yin L, Ji Q. Crossing Domains for AU Coding: Perspectives, Approaches, and Measures. IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE 2020; 2:158-171. [PMID: 32377637 PMCID: PMC7202467 DOI: 10.1109/tbiom.2020.2977225] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Facial action unit (AU) detectors have performed well when trained and tested within the same domain. How well do AU detectors transfer to domains in which they have not been trained? We review the literature on cross-domain transfer and conduct experiments to address limitations of prior research. We evaluate generalizability on four publicly available databases: EB+ (an expanded version of BP4D+), Sayette GFT, DISFA, and UNBC Shoulder Pain (SP). The databases differ in observational scenarios, context, participant diversity, range of head pose, video resolution, and AU base rates. In most cases, performance decreased with a change in domain, often to below the threshold needed for behavioral research. However, exceptions were noted. Deep and shallow approaches generally performed similarly, and average results were slightly better for the deep model than for the shallow one. Occlusion sensitivity maps revealed that local specificity was greater for AU detection within domains than across them. The findings suggest that more varied domains and deep learning approaches may be better suited for generalizability, and they point to the need for more attention to characteristics that vary between domains. Until further improvement is realized, caution is warranted when applying AU classifiers from one domain to another.
Affiliation(s)
- Jeffrey F Cohn
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- László A Jeni
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Zheng Zhang
- Department of Computer Science, State University of New York at Binghamton, USA
- Lijun Yin
- Department of Computer Science, State University of New York at Binghamton, USA
- Qiang Ji
- Rensselaer Polytechnic Institute, Troy, NY, USA
|
28
|
Wu H, Yan Y, Ye Y, Ng MK, Wu Q. Geometric Knowledge Embedding for unsupervised domain adaptation. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.105155] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
29
|
Zhang F, Zhang T, Mao Q, Xu C. Geometry Guided Pose-invariant Facial Expression Recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 29:4445-4460. [PMID: 32070956 DOI: 10.1109/tip.2020.2972114] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Driven by recent advances in human-centered computing, Facial Expression Recognition (FER) has attracted significant attention in many applications. However, most conventional approaches either perform face frontalization on a non-frontal facial image or learn a separate classifier for each pose. Different from existing methods, this paper proposes an end-to-end deep learning model that allows simultaneous facial image synthesis and pose-invariant facial expression recognition by exploiting the shape geometry of the face image. The proposed model is based on a generative adversarial network (GAN) and enjoys several merits. First, given an input face and a target pose and expression designated by a set of facial landmarks, an identity-preserving face can be generated, guided by the target pose and expression. Second, the identity representation is explicitly disentangled from both expression and pose variations through the shape geometry delivered by the facial landmarks. Third, our model can automatically generate face images with different expressions and poses in a continuous way to enlarge and enrich the training set for the FER task. Our approach is demonstrated to perform well when compared with state-of-the-art algorithms on both controlled and in-the-wild benchmark datasets, including Multi-PIE, BU-3DFE, and SFEW.
|
30
|
|
31
|
Yi J, Chen A, Cai Z, Sima Y, Zhou M, Wu X. Facial expression recognition of intercepted video sequences based on feature point movement trend and feature block texture variation. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105540] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
32
|
Doborjeh M, Kasabov N, Doborjeh Z, Enayatollahi R, Tu E, Gandomi AH. Personalised modelling with spiking neural networks integrating temporal and static information. Neural Netw 2019; 119:162-177. [PMID: 31446235 DOI: 10.1016/j.neunet.2019.07.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Revised: 07/19/2019] [Accepted: 07/25/2019] [Indexed: 10/26/2022]
Abstract
This paper proposes a new personalised prognostic/diagnostic system that supports classification, prediction and pattern recognition when both static and dynamic/spatiotemporal features are present in a dataset. The system is based on a proposed clustering method (named d2WKNN) for optimal selection of the samples neighbouring an individual, with respect to the integration of both static (vector-based) and temporal individual data. The samples most relevant to an individual are selected to train a Personalised Spiking Neural Network (PSNN) that learns from sets of streaming data to capture the space and time association patterns. The generated time-dependent patterns resulted in a higher classification/prediction accuracy (80% to 93%) when compared with global modelling and conventional methods. In addition, the PSNN models support interpretability by creating a personalised profile of an individual. This contributes to a better understanding of the interactions between features; an end-user can therefore comprehend which interactions in the model have led to a certain decision (outcome). The proposed PSNN model is an analytical tool applicable to several real-life health applications where different data domains describe a person's health condition. The system was applied to two case studies: (1) classification of spatiotemporal neuroimaging data for the investigation of individual response to treatment and (2) prediction of the risk of stroke with respect to temporal environmental data. For both datasets, besides the temporal data, static health data were also available. The hyper-parameters of the proposed system, including the PSNN models and the d2WKNN clustering parameters, are optimised for each individual.
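The neighbour-selection step can be sketched as a distance that blends a static (vector) term with a temporal (series) term before a k-nearest selection. The blend, the normalization, and the weight `alpha` are illustrative assumptions, not the exact d2WKNN formulation.

```python
import numpy as np

def combined_distance(static_q, temporal_q, static_db, temporal_db, alpha=0.5):
    """Blend a static vector distance with a temporal series distance for a
    query individual against a database; alpha weights the static part."""
    d_static = np.linalg.norm(static_db - static_q, axis=1)
    # simple temporal term: mean absolute difference of aligned series
    d_temporal = np.abs(temporal_db - temporal_q).mean(axis=(1, 2))
    norm = lambda d: d / (d.max() + 1e-12)  # make the two terms comparable
    return alpha * norm(d_static) + (1 - alpha) * norm(d_temporal)

def select_neighbours(dist, k):
    """Indices of the k most relevant database samples, i.e. the subset
    used to train the personalised model for this individual."""
    return np.argsort(dist)[:k]
```

The selected subset then trains the individual's model, so each person gets a classifier fitted only to their most similar peers.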
Affiliation(s)
- Maryam Doborjeh
- Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland, New Zealand; Computer Science Department, Auckland University of Technology, New Zealand
- Nikola Kasabov
- Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland, New Zealand; Computer Science Department, Auckland University of Technology, New Zealand
- Zohreh Doborjeh
- Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland, New Zealand
- Reza Enayatollahi
- BioDesign Lab, School of Engineering, Computer & Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand
- Enmei Tu
- School of Electronics, Information & Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Amir H Gandomi
- Faculty of Engineering & Information Technology, University of Technology, Sydney, Ultimo, NSW 2007, Australia; School of Business, Stevens Institute of Technology, Hoboken, NJ 07030, USA
|
33
|
Barrett LF, Adolphs R, Marsella S, Martinez A, Pollak SD. Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements. Psychol Sci Public Interest 2019; 20:1-68. [PMID: 31313636 PMCID: PMC6640856 DOI: 10.1177/1529100619832930] [Citation(s) in RCA: 384] [Impact Index Per Article: 76.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
It is commonly assumed that a person's emotional state can be readily inferred from his or her facial movements, typically called emotional expressions or facial expressions. This assumption influences legal judgments, policy decisions, national security protocols, and educational practices; guides the diagnosis and treatment of psychiatric illness, as well as the development of commercial applications; and pervades everyday social interactions as well as research in other scientific fields such as artificial intelligence, neuroscience, and computer vision. In this article, we survey examples of this widespread assumption, which we refer to as the common view, and we then examine the scientific evidence that tests this view, focusing on the six most popular emotion categories used by consumers of emotion research: anger, disgust, fear, happiness, sadness, and surprise. The available scientific evidence suggests that people do sometimes smile when happy, frown when sad, scowl when angry, and so on, as proposed by the common view, more than what would be expected by chance. Yet how people communicate anger, disgust, fear, happiness, sadness, and surprise varies substantially across cultures, situations, and even across people within a single situation. Furthermore, similar configurations of facial movements variably express instances of more than one emotion category. In fact, a given configuration of facial movements, such as a scowl, often communicates something other than an emotional state. Scientists agree that facial movements convey a range of information and are important for social communication, emotional or otherwise. But our review suggests an urgent need for research that examines how people actually move their faces to express emotions and other social information in the variety of contexts that make up everyday life, as well as careful study of the mechanisms by which people perceive instances of emotion in one another. We make specific research recommendations that will yield a more valid picture of how people move their faces to express emotions and how they infer emotional meaning from facial movements in situations of everyday life. This research is crucial to provide consumers of emotion research with the translational information they require.
Affiliation(s)
- Lisa Feldman Barrett
- Northeastern University, Department of Psychology, Boston, MA
- Massachusetts General Hospital, Department of Psychiatry and the Athinoula A. Martinos Center for Biomedical Imaging, Charlestown, MA
- Harvard Medical School, Department of Psychiatry, Boston, MA
- Ralph Adolphs
- California Institute of Technology, Departments of Psychology, Neuroscience, and Biology, Pasadena, CA
- Stacy Marsella
- Northeastern University, Department of Psychology, Boston, MA
- Northeastern University, College of Computer and Information Science, Boston, MA
- University of Glasgow, Glasgow, Scotland
- Aleix Martinez
- The Ohio State University, Department of Electrical and Computer Engineering, and Center for Cognitive and Brain Sciences, Columbus, OH
- Seth D. Pollak
- University of Wisconsin - Madison, Department of Psychology, Madison, WI
|
34
|
Learning Domain-Independent Deep Representations by Mutual Information Minimization. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2019; 2019:9414539. [PMID: 31316558 PMCID: PMC6604496 DOI: 10.1155/2019/9414539] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/04/2019] [Revised: 05/01/2019] [Accepted: 05/21/2019] [Indexed: 11/17/2022]
Abstract
Domain transfer learning aims to learn common data representations from a source domain and a target domain so that the source domain data can help the classification of the target domain. Conventional transfer representation learning imposes the distributions of source and target domain representations to be similar, which heavily relies on the characterization of the domain distributions and the distribution-matching criteria. In this paper, we propose a novel framework for domain transfer representation learning. Our motive is to make the learned representations of data points independent from the domains to which they belong. In other words, from an optimal cross-domain representation of a data point, it is difficult to tell which domain it is from. In this way, the learned representations can be generalized to different domains. To measure the dependency between the representations and the domains to which the data points belong, we propose to use the mutual information between the representations and the domain-belonging indicators. By minimizing this mutual information, we learn representations which are independent from domains. We build a classwise deep convolutional network model as a representation model and maximize the margin of each data point of the corresponding class, defined over the intraclass and interclass neighborhood. To learn the parameters of the model, we construct a unified minimization problem where the margins are maximized while the representation-domain mutual information is minimized. In this way, we learn representations which are not only discriminative but also independent from domains. An iterative algorithm based on the Adam optimization method is proposed to solve the minimization and learn the classwise deep model parameters and the cross-domain representations simultaneously. Extensive experiments over benchmark datasets show its effectiveness and advantage over existing domain transfer learning methods.
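The quantity being minimized can be illustrated with a simple plug-in (histogram) estimate of the mutual information between a 1-D representation and the domain indicator. The paper optimizes this inside a network via an iterative algorithm; the sketch below covers only the measurement side, with binning as an illustrative choice.

```python
import numpy as np

def mutual_information(rep_1d, domain, bins=10):
    """Histogram estimate of I(representation; domain) for a 1-D projection.
    Zero means the representation carries no information about which
    domain a sample came from -- the target of the minimization."""
    edges = np.histogram_bin_edges(rep_1d, bins=bins)
    r = np.clip(np.digitize(rep_1d, edges[1:-1]), 0, bins - 1)
    mi = 0.0
    for rv in np.unique(r):
        for dv in np.unique(domain):
            pxy = np.mean((r == rv) & (domain == dv))   # joint probability
            px, py = np.mean(r == rv), np.mean(domain == dv)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi
```

A representation that perfectly encodes a balanced binary domain label yields I = log 2 nats, while a domain-independent one yields 0, which is why driving this term down removes domain information from the features.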
|
35
|
Ertugrul IO, Cohn JF, Jeni LA, Zhang Z, Yin L, Ji Q. Cross-domain AU Detection: Domains, Learning Approaches, and Measures. PROCEEDINGS OF THE ... INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION. IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION 2019; 2019:10.1109/FG.2019.8756543. [PMID: 31749665 PMCID: PMC6867108 DOI: 10.1109/fg.2019.8756543] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Facial action unit (AU) detectors have performed well when trained and tested within the same domain. Do AU detectors transfer to new domains in which they have not been trained? To answer this question, we review literature on cross-domain transfer and conduct experiments to address limitations of prior research. We evaluate both deep and shallow approaches to AU detection (CNN and SVM, respectively) in two large, well-annotated, publicly available databases, Expanded BP4D+ and GFT. The databases differ in observational scenarios, participant characteristics, range of head pose, video resolution, and AU base rates. For both approaches and databases, performance decreased with change in domain, often to below the threshold needed for behavioral research. Decreases were not uniform, however. They were more pronounced for GFT than for Expanded BP4D+ and for shallow relative to deep learning. These findings suggest that more varied domains and deep learning approaches may be better suited for promoting generalizability. Until further improvement is realized, caution is warranted when applying AU classifiers from one domain to another.
Affiliation(s)
- Jeffrey F Cohn
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- László A Jeni
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Zheng Zhang
- Department of Computer Science, State University of New York at Binghamton, USA
- Lijun Yin
- Department of Computer Science, State University of New York at Binghamton, USA
- Qiang Ji
- Rensselaer Polytechnic Institute, Troy, NY, USA
|
36
|
|
37
|
|
38
|
Yan K, Zheng W, Cui Z, Zong Y, Zhang T, Tang C. Unsupervised facial expression recognition using domain adaptation based dictionary learning approach. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.07.003]
39
Lu H, Shen C, Cao Z, Xiao Y, van den Hengel A. An Embarrassingly Simple Approach to Visual Domain Adaptation. IEEE Transactions on Image Processing 2018; 27:3403-3417. [PMID: 29671743] [DOI: 10.1109/tip.2018.2819503]
Abstract
We show that it is possible to achieve high-quality domain adaptation without explicit adaptation. The nature of the classification problem means that when samples from the same class in different domains are sufficiently close, and samples from differing classes are separated by large enough margins, there is a high probability that each will be classified correctly. Inspired by this, we propose an embarrassingly simple yet effective approach to domain adaptation in which only the class means are used to learn class-specific linear projections. Learning these projections is naturally cast into a linear-discriminant-analysis-like framework, which gives an efficient, closed-form solution. Furthermore, to enable the application of this approach to unsupervised learning, an iterative validation strategy is developed to infer target labels. Extensive experiments on cross-domain visual recognition demonstrate that, even with the simplest formulation, our approach outperforms existing non-deep adaptation methods and exhibits classification performance comparable with that of modern deep adaptation methods. An analysis of potential issues affecting the practical application of the method is also provided, including robustness, convergence, and the impact of small sample sizes.
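The class-mean idea in this abstract can be illustrated with a toy sketch. This is not the paper's formulation: a plain nearest-class-mean classifier stands in for the learned LDA-like projections, and the function name `nearest_mean_pseudolabel` and parameter `n_iter` are our inventions. Target samples are pseudo-labeled by their nearest source-class mean, and the means are then re-estimated, loosely mirroring the iterative validation strategy the abstract describes.

```python
import numpy as np

def nearest_mean_pseudolabel(Xs, ys, Xt, n_iter=10):
    """Toy mean-based unsupervised adaptation: label target samples by
    nearest source-class mean, then iteratively refresh the class means
    using the pseudo-labeled target data."""
    classes = np.unique(ys)
    means = np.stack([Xs[ys == c].mean(axis=0) for c in classes])
    for _ in range(n_iter):
        # pseudo-label each target sample by its nearest class mean
        d = np.linalg.norm(Xt[:, None, :] - means[None, :, :], axis=2)
        yt = classes[d.argmin(axis=1)]
        # re-estimate means from source + pseudo-labeled target samples
        means = np.stack([
            np.vstack([Xs[ys == c], Xt[yt == c]]).mean(axis=0)
            if (yt == c).any() else Xs[ys == c].mean(axis=0)
            for c in classes
        ])
    return yt
```

Under a modest covariate shift the pseudo-labels stabilize after a few iterations; the real method replaces the raw feature space with learned class-specific projections.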
40
Ertugrul IO, Jeni LA, Cohn JF. FACSCaps: Pose-Independent Facial Action Coding with Capsules. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018; 2211-2220. [PMID: 30944768] [PMCID: PMC6443417] [DOI: 10.1109/cvprw.2018.00287]
Abstract
Most automated facial expression analysis methods treat the face as a 2D object, flat like a sheet of paper. That works well provided images are frontal or nearly so. In real-world conditions, moderate to large head rotation is common, and performance in recognizing expressions degrades. Multi-view Convolutional Neural Networks (CNNs) have been proposed to increase robustness to pose, but they require larger model sizes and may generalize poorly across views that are not included in the training set. We propose the FACSCaps architecture to handle multi-view and multi-label facial action unit (AU) detection within a single model that can generalize to novel views. Additionally, FACSCaps's ability to synthesize faces enables insights into what is learned by the model. FACSCaps models video frames using matrix capsules, in which hierarchical pose relationships between face parts are built into internal representations. The model is trained by jointly optimizing a multi-label loss and the reconstruction accuracy. FACSCaps was evaluated using the FERA 2017 facial expression dataset, which includes spontaneous facial expressions in a wide range of head orientations. FACSCaps outperformed both state-of-the-art CNNs and their temporal extensions.
Affiliation(s)
- László A Jeni
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Jeffrey F Cohn
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
41
Bina RW, Langevin JP. Closed Loop Deep Brain Stimulation for PTSD, Addiction, and Disorders of Affective Facial Interpretation: Review and Discussion of Potential Biomarkers and Stimulation Paradigms. Front Neurosci 2018; 12:300. [PMID: 29780303] [PMCID: PMC5945819] [DOI: 10.3389/fnins.2018.00300]
Abstract
The treatment of psychiatric diseases with Deep Brain Stimulation (DBS) is becoming more of a reality as studies expand the indications and targets for therapy. Opinions on the initial failures of DBS trials for some psychiatric diseases point to a certain lack of finesse in using an Open Loop DBS (OLDBS) system in these dynamic, cyclical pathologies. OLDBS delivers monomorphic input into dysfunctional brain circuits, with modulation of that input via human interface at discrete time points and no interim modulation or adaptation to the changing circuit dynamics. Closed Loop DBS (CLDBS) promises dynamic, intrinsic circuit modulation based on individual physiologic biomarkers of dysfunction. Discussed here are several psychiatric diseases which may be amenable to CLDBS paradigms, as the neurophysiologic dysfunction is stochastic rather than static. Post-Traumatic Stress Disorder (PTSD) has several peripheral and central physiologic and neurologic changes preceding stereotyped hyper-activation behavioral responses. Biomarkers for CLDBS potentially include skin conductance changes indicating changes in the sympathetic nervous system, changes in serum and central neurotransmitter concentrations, and limbic circuit activation. Chemical dependency and addiction have been demonstrated to improve with both ablation and DBS of the Nucleus Accumbens, and as a serendipitous side effect of movement disorder treatment. Potential peripheral biomarkers are similar to those proposed for PTSD, with possible use of environmental and geolocation-based cues, peripheral signs of physiologic arousal, and individual changes in central circuit patterns. Non-substance addiction disorders have also been serendipitously treated in patients receiving OLDBS for movement disorders. As more is learned about these behavioral addictions, DBS targets and effectors will be identified.
Finally, we discuss the use of facial recognition software to modulate activation of inappropriate responses in psychiatric diseases in which misinterpretation of social cues features prominently. These include Autism Spectrum Disorder, PTSD, and Schizophrenia, all of which share the common feature of dysfunctional interpretation of facial affective cues. Technological advances and improvements in circuit-based, individual-specific, real-time adaptable modulation point toward functional neurosurgical treatments for heretofore treatment-resistant behavioral diseases.
Affiliation(s)
- Robert W Bina
- Division of Neurosurgery, Banner University Medical Center, Tucson, AZ, United States
- Jean-Philippe Langevin
- Neurosurgery Service, VA Greater Los Angeles Healthcare System, Los Angeles, CA, United States
- Department of Neurosurgery, University of California, Los Angeles, Los Angeles, CA, United States
42
Tao J, Zhou D, Zhu B. Robust Latent Regression with discriminative regularization by leveraging auxiliary knowledge. Neural Netw 2018; 101:79-93. [DOI: 10.1016/j.neunet.2018.02.004]
43
Zong Y, Zheng W, Huang X, Shi J, Cui Z, Zhao G. Domain Regeneration for Cross-Database Micro-Expression Recognition. IEEE Transactions on Image Processing 2018; 27:2484-2498. [PMID: 29994602] [DOI: 10.1109/tip.2018.2797479]
Abstract
Recently, micro-expression recognition has attracted considerable attention from researchers due to its potential value in many practical applications, e.g., lie detection. In this paper, we investigate an interesting and challenging problem in micro-expression recognition, i.e., cross-database micro-expression recognition, in which the training and testing samples come from different micro-expression databases. Under this problem setting, the consistent feature distribution between training and testing samples that holds in conventional micro-expression recognition is seriously violated, and hence the performance of most current well-performing methods may drop sharply. To overcome this, we propose a simple yet effective framework called Domain Regeneration (DR). The DR framework aims to learn a domain regenerator that regenerates the micro-expression samples from the source and target databases such that they abide by the same or similar feature distributions. Thus, we are able to use a classifier learned on the labeled source micro-expression samples to predict the labels of the unlabeled target samples. To evaluate the proposed DR framework, we conduct extensive cross-database micro-expression recognition experiments based on the SMIC and CASME II databases. Experimental results show that, compared with recent state-of-the-art cross-database emotion recognition methods, the proposed DR framework achieves more promising performance.
44
Zeng J, Shan S, Chen X. Facial Expression Recognition with Inconsistently Annotated Datasets. Computer Vision – ECCV 2018. [DOI: 10.1007/978-3-030-01261-8_14]
45
Hammal Z, Chu WS, Cohn JF, Heike C, Speltz ML. Automatic Action Unit Detection in Infants Using Convolutional Neural Network. International Conference on Affective Computing and Intelligent Interaction (ACII) 2017; 216-221. [PMID: 29862131] [PMCID: PMC5976252] [DOI: 10.1109/acii.2017.8273603]
Abstract
Action unit detection in infants presents unique challenges relative to adults. Jaw contour is less distinct, facial texture is reduced, and rapid and unusual facial movements are common. To detect facial action units in spontaneous behavior of infants, we propose a multi-label Convolutional Neural Network (CNN). Eighty-six infants were recorded during tasks intended to elicit enjoyment and frustration. Using an extension of FACS for infants (Baby FACS), over 230,000 frames were manually coded for ground truth. To control for chance agreement, inter-observer agreement between Baby FACS coders was quantified using free-margin kappa. Kappa coefficients ranged from 0.79 to 0.93, which represents high agreement. The multi-label CNN achieved agreement comparable with manual coding, with kappa ranging from 0.69 to 0.93. Importantly, CNN-based AU detection revealed the same between-task change in infant expressiveness as manual coding. While further research is needed, these findings suggest that automatic AU detection in infants is a viable alternative to manual coding of infant facial expression.
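The free-margin kappa used in this abstract to control for chance agreement has a simple closed form: with k possible categories, chance agreement is fixed at 1/k, so kappa = (Po − 1/k) / (1 − 1/k), where Po is the observed proportion of agreement. A minimal sketch (the function name is ours, not from the paper):

```python
def free_marginal_kappa(ratings_a, ratings_b, n_categories=2):
    """Brennan-Prediger free-marginal kappa: chance agreement is fixed
    at 1/k for k categories, so kappa = (Po - 1/k) / (1 - 1/k)."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    # observed proportion of items on which the two coders agree
    po = sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)
    pe = 1.0 / n_categories  # fixed chance-agreement rate
    return (po - pe) / (1.0 - pe)
```

Perfect agreement gives kappa = 1; agreement at the chance rate gives kappa = 0, which is why values of 0.79-0.93 count as high agreement.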
Affiliation(s)
- Zakia Hammal
- Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
- Wen-Sheng Chu
- Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
- Jeffrey F Cohn
- Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
- Department of Psychology, University of Pittsburgh, Pittsburgh, USA
46
Lee SH, Kang J, Lee S. Enhanced particle-filtering framework for vessel segmentation and tracking. Comput Methods Programs Biomed 2017; 148:99-112. [PMID: 28774443] [DOI: 10.1016/j.cmpb.2017.06.017]
Abstract
BACKGROUND AND OBJECTIVES: A robust vessel segmentation and tracking method based on a particle-filtering framework is proposed to cope with increasing demand for a method that can detect and track vessel anomalies.
METHODS: We apply the level set method to segment the vessel boundary and a particle filter to track the position and shape variations of the vessel boundary between two adjacent slices. To enhance segmentation and tracking performance, the importance density of the particle filter is localized by estimating the translation of an object's boundary. In addition, to minimize problems related to degeneracy and sample impoverishment in the particle filter, a newly proposed weighting policy is investigated.
RESULTS: Compared to conventional methods, the proposed algorithm demonstrates better segmentation and tracking performance. Moreover, the proposed stringent weighting policy tends to suppress degeneracy and sample impoverishment, and higher tracking accuracy can be obtained.
CONCLUSIONS: The proposed method is expected to be applied to highly valuable applications for more accurate three-dimensional vessel tracking and rendering.
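The degeneracy and sample impoverishment mentioned in the abstract are commonly diagnosed with the effective sample size and mitigated by low-variance resampling. Below is a generic sketch of these standard particle-filter ingredients, not the paper's specific weighting policy; the function names are ours.

```python
import numpy as np

def effective_sample_size(w):
    """N_eff = 1 / sum(w_i^2) for normalized weights: a standard
    degeneracy diagnostic (resample when N_eff falls below a threshold)."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(w, rng=None):
    """Systematic resampling: one uniform offset, then evenly spaced
    points through the weight CDF -- a low-variance scheme that helps
    limit sample impoverishment."""
    rng = np.random.default_rng(rng)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n  # evenly spaced in [0, 1)
    return np.searchsorted(np.cumsum(w), positions)  # particle indices
```

A typical loop checks `effective_sample_size(weights)` after each update and calls `systematic_resample` only when it drops below, say, half the particle count.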
Affiliation(s)
- Sang-Hoon Lee
- Department of Electrical and Electronic Engineering, Yonsei University, Seoul, 120-749, Republic of Korea
- Jiwoo Kang
- Department of Electrical and Electronic Engineering, Yonsei University, Seoul, 120-749, Republic of Korea
- Sanghoon Lee
- Department of Electrical and Electronic Engineering, Yonsei University, Seoul, 120-749, Republic of Korea
47
Girard JM, Chu WS, Jeni LA, Cohn JF, De la Torre F, Sayette MA. Sayette Group Formation Task (GFT) Spontaneous Facial Expression Database. IEEE International Conference on Automatic Face & Gesture Recognition (FG) 2017; 581-588. [PMID: 29606916] [PMCID: PMC5876025] [DOI: 10.1109/fg.2017.144]
Abstract
Despite the important role that facial expressions play in interpersonal communication and our knowledge that interpersonal behavior is influenced by social context, no currently available facial expression database includes multiple interacting participants. The Sayette Group Formation Task (GFT) database addresses the need for well-annotated video of multiple participants during unscripted interactions. The database includes 172,800 video frames from 96 participants in 32 three-person groups. To aid in the development of automated facial expression analysis systems, GFT includes expert annotations of FACS occurrence and intensity, facial landmark tracking, and baseline results for linear SVM, deep learning, active patch learning, and personalized classification. Baseline performance is quantified and compared using identical partitioning and a variety of metrics (including means and confidence intervals). The highest performance scores were found for the deep learning and active patch learning methods. Learn more at http://osf.io/7wcyz.
Affiliation(s)
- Jeffrey M Girard
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260
- Wen-Sheng Chu
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- László A Jeni
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Jeffrey F Cohn
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Michael A Sayette
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260
48
Chu WS, De la Torre F, Cohn JF, Messinger DS. A Branch-and-Bound Framework for Unsupervised Common Event Discovery. Int J Comput Vis 2017; 123:372-391. [PMID: 28943718] [DOI: 10.1007/s11263-017-0989-7]
Abstract
Event discovery aims to discover a temporal segment of interest, such as human behavior, actions or activities. Most approaches to event discovery within or between time series use supervised learning. This becomes problematic when some relevant event labels are unknown, are difficult to detect, or not all possible combinations of events have been anticipated. To overcome these problems, this paper explores Common Event Discovery (CED), a new problem that aims to discover common events of variable-length segments in an unsupervised manner. A potential solution to CED is searching over all possible pairs of segments, which would incur a prohibitive quartic cost. In this paper, we propose an efficient branch-and-bound (B&B) framework that avoids exhaustive search while guaranteeing a globally optimal solution. To this end, we derive novel bounding functions for various commonality measures and provide extensions to multiple commonality discovery and accelerated search. The B&B framework takes as input any multidimensional signal that can be quantified into histograms. A generalization of the framework can be readily applied to discover events at the same or different times (synchrony and event commonality, respectively). We consider extensions to video search and supervised event detection. The effectiveness of the B&B framework is evaluated in motion capture of deliberate behavior and in video of spontaneous facial behavior in diverse interpersonal contexts: interviews, small groups of young adults, and parent-infant face-to-face interaction.
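The best-first branch-and-bound pattern described in this abstract (expand the node with the largest upper bound, prune once the bound cannot beat the incumbent) can be sketched generically. This skeleton is ours, not the paper's CED-specific bounding functions; it returns the global optimum whenever `bound` never underestimates the best value reachable beneath a node.

```python
import heapq

def branch_and_bound(root, bound, value, children):
    """Generic best-first branch-and-bound. `bound(node)` is an upper
    bound on any solution under `node`; `value(node)` is the node's own
    objective (or None if it is not a complete solution); `children(node)`
    yields subproblems. Avoids exhaustive search while keeping the
    global-optimality guarantee."""
    best_val, best_node = float("-inf"), None
    counter = 0  # tie-breaker so the heap never compares nodes directly
    heap = [(-bound(root), counter, root)]
    while heap:
        neg_b, _, node = heapq.heappop(heap)
        if -neg_b <= best_val:
            break  # no remaining node can beat the incumbent: done
        v = value(node)
        if v is not None and v > best_val:
            best_val, best_node = v, node
        for child in children(node):
            b = bound(child)
            if b > best_val:  # prune branches that cannot improve
                counter += 1
                heapq.heappush(heap, (-b, counter, child))
    return best_val, best_node
```

In the paper's setting, nodes would be pairs of candidate segments and `bound` a commonality-specific upper bound; here any admissible bound yields the same prune-without-missing-the-optimum behavior.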
Affiliation(s)
- Jeffrey F Cohn
- Robotics Institute, Carnegie Mellon University, USA
- Department of Psychology, University of Pittsburgh, USA