1. Lambiase PD, Rossi A, Rossi S. A Two-Tier GAN Architecture for Conditioned Expressions Synthesis on Categorical Emotions. Int J Soc Robot 2023. DOI: 10.1007/s12369-023-00973-7
Abstract
Emotions are an effective communication mode during human–human and human–robot interactions. However, while humans can easily understand other people's emotions and show their own through natural facial expressions, robot-simulated emotions remain an open challenge, partly due to a lack of naturalness and variety in the possible expressions. In this direction, we present a two-tier Generative Adversarial Network (GAN) architecture that generates facial expressions starting from categorical emotions (e.g. joy, sadness, etc.) to obtain a variety of synthesised expressions for each emotion. The proposed approach combines the key features of Conditional Generative Adversarial Networks (CGAN) and GANimation, overcoming their limitations by allowing fine-grained modelling of facial expressions and generating a wide range of expressions for each class (i.e., discrete emotion). The architecture comprises two modules: one generates a synthetic Action Unit (AU, a coding mechanism representing facial muscles and their activation) vector conditioned on a given emotion, and the other applies an AU vector to a given image. The overall model can modify an image of a human face, modelling the facial expression to show a specific discrete emotion. Qualitative and quantitative measurements have been performed to evaluate the ability of the network to generate a variety of expressions consistent with the conditioned emotion. We also collected people's responses on the quality and legibility of the produced expressions, shown both on images and on a social robot.
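Purely to illustrate the two-tier data flow this abstract describes (categorical emotion → AU vector → modified face image), here is a minimal numpy sketch. The untrained random linear maps stand in for the paper's trained CGAN-style and GANimation-style generators; the emotion list, `N_AU`, and `NOISE_DIM` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "disgust"]
N_AU = 17        # number of Action Units modelled (toy choice)
NOISE_DIM = 8    # noise input gives expression variety per emotion

# Tier 1 (toy stand-in for the conditioned generator): map
# [noise, emotion one-hot] to an AU activation vector in [0, 1].
W1 = rng.standard_normal((N_AU, NOISE_DIM + len(EMOTIONS)))

def generate_au_vector(emotion):
    onehot = np.zeros(len(EMOTIONS))
    onehot[EMOTIONS.index(emotion)] = 1.0
    z = rng.standard_normal(NOISE_DIM)  # fresh noise -> a different expression each call
    return 1.0 / (1.0 + np.exp(-(W1 @ np.concatenate([z, onehot]))))  # sigmoid

# Tier 2 (toy stand-in for the GANimation-style module): modify an image
# conditioned on the AU vector; here just a global brightness modulation.
def apply_au(image, au):
    gain = 1.0 + 0.1 * (au.mean() - 0.5)
    return np.clip(image * gain, 0.0, 1.0)

face = rng.random((64, 64))        # grayscale face image in [0, 1]
au = generate_au_vector("joy")     # tier 1: emotion -> AU vector
out = apply_au(face, au)           # tier 2: AU vector -> modified image
```

The key design point the sketch mirrors is the decoupling: tier 1 can be resampled many times per emotion to get expression variety, while tier 2 only ever sees AU vectors, never emotion labels.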
2. Hu Z, Zhang Y, Li Q, Lv C. Human–Machine Telecollaboration Accelerates the Safe Deployment of Large-Scale Autonomous Robots During the COVID-19 Pandemic. Front Robot AI 2022; 9:853828. PMID: 35494540; PMCID: PMC9043527; DOI: 10.3389/frobt.2022.853828
Affiliation(s)
- Zhongxu Hu, School of Mechanical and Aerospace Engineering, Nanyang Technological University, Nanyang, Singapore
- Yiran Zhang, School of Mechanical and Aerospace Engineering, Nanyang Technological University, Nanyang, Singapore
- Qinghua Li, Autonomous Driving Lab, Alibaba DAMO Academy, Hangzhou, China
- Chen Lv, School of Mechanical and Aerospace Engineering, Nanyang Technological University, Nanyang, Singapore
- *Correspondence: Chen Lv
3. Combining CNN and LSTM for activity of daily living recognition with a 3D matrix skeleton representation. Intel Serv Robot 2021. DOI: 10.1007/s11370-021-00358-7
Abstract
In socially assistive robotics, human activity recognition plays a central role whenever the robot must adapt its behaviour to the human's. In this paper, we present an activity recognition approach for activities of daily living based on deep learning and skeleton data. In the literature, ad hoc feature extraction/selection algorithms combined with supervised classification methods have been deployed, reaching excellent classification performance. Here, we propose a deep learning approach combining a CNN and an LSTM: the CNN learns the spatial dependencies correlating the limbs in a 3D grid skeleton representation, while the LSTM learns the temporal dependencies across instances with a periodic pattern, working on raw data and thus without requiring an explicit feature extraction step. The models are proposed for real-time activity recognition and are tested on the CAD-60 dataset. Results show that the proposed model outperforms an LSTM-only model thanks to the automatic extraction of the limbs' correlations. "New Person" results show that the CNN-LSTM model achieves 95.4% precision and 94.4% recall, while the "Have Seen" results are 96.1% precision and 94.7% recall.
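To make the CNN-then-LSTM pattern this abstract describes concrete, the following numpy sketch runs a per-frame convolution over a small skeleton grid and feeds the flattened spatial features through an LSTM cell across time. All dimensions (grid size, kernel count, hidden size, 12 classes) and the random, untrained weights are invented here; this is the general architecture shape, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3(frame, kernels):
    """Valid 2D cross-correlation of an (H, W) grid with K (3, 3) kernels, then ReLU."""
    H, W = frame.shape
    K = kernels.shape[0]
    out = np.empty((K, H - 2, W - 2))
    for k in range(K):
        for i in range(H - 2):
            for j in range(W - 2):
                out[k, i, j] = np.sum(frame[i:i + 3, j:j + 3] * kernels[k])
    return np.maximum(out, 0.0)

def lstm_step(x, h, c, W):
    """One LSTM step; W maps the concatenated [x, h] to the four gate pre-activations."""
    z = W @ np.concatenate([x, h])
    n = h.size
    i, f, g, o = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c + sig(i) * np.tanh(g)   # update cell state
    h = sig(o) * np.tanh(c)                # emit hidden state
    return h, c

# Toy, untrained dimensions: an 8-frame clip of 5x5 skeleton grids,
# 4 conv kernels, 16 hidden units, 12 activity classes.
T, GRID, K, HID, CLS = 8, 5, 4, 16, 12
kernels = 0.1 * rng.standard_normal((K, 3, 3))
feat_dim = K * (GRID - 2) * (GRID - 2)
W_lstm = 0.1 * rng.standard_normal((4 * HID, feat_dim + HID))
W_out = 0.1 * rng.standard_normal((CLS, HID))

h, c = np.zeros(HID), np.zeros(HID)
for _ in range(T):
    frame = rng.standard_normal((GRID, GRID))  # stand-in for one skeleton-matrix frame
    x = conv3x3(frame, kernels).ravel()        # spatial limb correlations (CNN part)
    h, c = lstm_step(x, h, c, W_lstm)          # temporal dependencies (LSTM part)

logits = W_out @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # softmax over activity classes
pred = int(probs.argmax())                     # predicted activity index
```

The point of the combination, as the abstract argues, is that the convolution stage replaces hand-crafted feature extraction: spatial limb correlations are learned per frame, and only then does the recurrent stage model their evolution over time.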