1. Wang J, Shen Y, Zhao J, Wang X, Chen Z, Han T, Huang Y, Wang Y, Zhao W, Wen W, Zhou X, Xu Y. Algorithmic and sensor-based research on Chinese children's and adolescents' screen use behavior and light environment. Front Public Health 2024; 12:1352759. [PMID: 38454995; PMCID: PMC10917963; DOI: 10.3389/fpubh.2024.1352759]
Abstract
Background: Myopia poses a global health concern and is influenced by both genetic and environmental factors. The incidence of myopia tends to increase during infectious outbreaks, such as the COVID-19 pandemic. This study examined the screen-time behaviors of Chinese children and adolescents and investigated the efficacy of artificial intelligence (AI)-based alerts in modifying screen-time practices.
Methods: A cross-sectional analysis was performed using data from 6,716 children and adolescents with AI-enhanced tablets that monitored and recorded their behavior and environmental light during screen time.
Results: The median daily screen time of all participants was 58.82 min. Among all age groups, elementary-school students had the longest median daily screen time, which was 87.25 min and exceeded 4 h per week. Children younger than 2 years engaged with tablets for a median of 41.84 min per day. Learning accounted for 54.88% of participants' screen time, and 51.03% (3,390/6,643) of the participants used tablets for 1 h at an average distance <50 cm. The distance and posture alarms were triggered 807,355 and 509,199 times, respectively. In the study, 70.65% of the participants used the tablet under an illuminance of <300 lux during the day and 61.11% under an illuminance of <100 lux at night. The ambient light of 85.19% of the participants exceeded a color temperature of 4,000 K at night. Most incorrect viewing habits (65.49% in viewing distance; 86.48% in viewing posture) were rectified swiftly following AI notifications (all p < 0.05).
Conclusion: Young children are increasingly using digital screens, with school-age children and adolescents showing longer screen time than preschoolers. The study highlighted inadequate lighting conditions during screen use. AI alerts proved effective in prompting users to correct their screen-related behavior promptly.
Affiliation(s)
- Jifang Wang
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Department of Nursing, Eye & ENT Hospital, Fudan University, Shanghai, China
- Yang Shen
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Jing Zhao
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Xiaoying Wang
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Zhi Chen
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Tian Han
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Yangyi Huang
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Yuliang Wang
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Wuxiao Zhao
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Center for Optometry and Visual Science, Guangxi Academy of Medical Sciences, Nanning, China
- Wen Wen
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Xingtao Zhou
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
- Ye Xu
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Research Center of Ophthalmology and Optometry, Shanghai, China
- Shanghai Engineering Research Center of Laser and Autostereoscopic 3D for Vision Care, Shanghai, China
2. Li N, Ross R. Invoking and identifying task-oriented interlocutor confusion in human-robot interaction. Front Robot AI 2023; 10:1244381. [PMID: 38054199; PMCID: PMC10694506; DOI: 10.3389/frobt.2023.1244381]
Abstract
Successful conversational interaction with a social robot requires not only an assessment of a user's contribution to an interaction, but also awareness of their emotional and attitudinal states as the interaction unfolds. To this end, our research aims to systematically trigger, and then interpret, human behaviors in order to track different states of potential user confusion in interaction, so that systems can be primed to adjust their policies when users enter confusion states. In this paper, we present a detailed human-robot interaction study to prompt, investigate, and eventually detect confusion states in users. The study employs a Wizard-of-Oz (WoZ) style design with a Pepper robot to prompt confusion states for task-oriented dialogues in a well-defined manner. The data collected from 81 participants include audio and visual data, from both the robot's perspective and the environment, as well as participant survey data. From these data, we evaluated the correlations of induced confusion conditions with multimodal data, including eye gaze estimation, head pose estimation, facial emotion detection, silence duration, and user speech analysis, including emotion and pitch analysis. The analysis shows significant differences in participants' behaviors in states of confusion based on these signals, as well as a strong correlation between confusion conditions and participants' own self-reported confusion scores. The paper establishes strong correlations between confusion levels and these observable features, and lays the groundwork for a more complete social- and affect-oriented strategy for task-oriented human-robot interaction. The contributions of this paper include the methodology applied, the dataset, and our systematic analysis.
Affiliation(s)
- Na Li
- School of Computer Science, Technological University Dublin, Ireland
3. Zhong R, He L, Wang H, Yuan L, Li K, Liu Z. Attention-Guided Huber Loss for Head Pose Estimation Based on Improved Capsule Network. Entropy (Basel) 2023; 25:1024. [PMID: 37509971; PMCID: PMC10378512; DOI: 10.3390/e25071024]
Abstract
Head pose estimation is an important technology for analyzing human behavior and has been widely researched and applied in areas such as human-computer interaction and fatigue detection. However, traditional head pose estimation networks easily lose spatial structure information, particularly in complex scenarios where occlusions and multiple object detections are common, resulting in low accuracy. To address these issues, we propose a head pose estimation model based on the residual network and the capsule network. First, a deep residual network is used to extract features from three stages, capturing spatial structure information at different levels, and a global attention block is employed to enhance the spatial weight of feature extraction. To effectively avoid the loss of spatial structure information, the features are encoded and transmitted to the output using an improved capsule network, whose generalization ability is enhanced through a self-attention routing mechanism. To improve the robustness of the model, we optimize the Huber loss, which is applied here to head pose estimation for the first time. Finally, experiments are conducted on three popular public datasets, 300W-LP, AFLW2000, and BIWI. The results demonstrate that the proposed method achieves state-of-the-art results, particularly in scenarios with occlusions.
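The Huber loss referred to in this abstract is a standard robust-regression loss: quadratic for small errors and linear for large ones. The paper's attention-guided variant is not reproduced here; as an illustrative sketch of the plain form only:

```python
def huber_loss(error, delta=1.0):
    """Standard Huber loss: quadratic for |error| <= delta, linear beyond it,
    which makes regression less sensitive to outlier pose angles."""
    abs_err = abs(error)
    if abs_err <= delta:
        return 0.5 * error ** 2
    return delta * (abs_err - 0.5 * delta)
```

For pose regression, `error` would be the difference between a predicted and a ground-truth angle; `delta` controls where the loss switches from quadratic to linear.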
Affiliation(s)
- Runhao Zhong
- School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China
- Li He
- School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China
- Hongwei Wang
- School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China
- Liang Yuan
- School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China
- School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Kexin Li
- School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China
- Zhening Liu
- School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China
4. Xu H, Zhang J, Sun H, Qi M, Kong J. Analyzing students' attention by gaze tracking and object detection in classroom teaching. Data Technologies and Applications 2023. [DOI: 10.1108/dta-09-2021-0236]
Abstract
Purpose: Attention is one of the most important factors affecting the academic performance of students. Effectively analyzing students' attention in class can promote teachers' precise teaching and students' personalized learning. To intelligently analyze students' attention in the classroom from the first-person perspective, this paper proposes a fusion model based on gaze tracking and object detection. In particular, the proposed attention analysis model does not depend on any smart equipment.
Design/methodology/approach: Given a first-person view video of students' learning, the authors first estimate the gazing point by using the deep space–time neural network. Second, single shot multi-box detector and fast segmentation convolutional neural network are comparatively adopted to accurately detect the objects in the video. Third, they predict the gazing objects by combining the results of gazing point estimation and object detection. Finally, the personalized attention of students is analyzed based on the predicted gazing objects and the measurable eye movement criteria.
Findings: A large number of experiments are carried out on a public database and a new dataset built in a real classroom. The experimental results show that the proposed model not only accurately tracks the students' gazing trajectory and effectively analyzes the fluctuation of attention of the individual student and all students, but also provides a valuable reference to evaluate the learning process of students.
Originality/value: The contributions of this paper can be summarized as follows. The analysis of students' attention plays an important role in improving teaching quality and student achievement, yet there is little research on how to automatically and intelligently analyze students' attention. To alleviate this problem, this paper focuses on analyzing students' attention by gaze tracking and object detection in classroom teaching, which is significant for practical application in the field of education. The authors propose an effective, intelligent fusion model based on the deep neural network, which mainly includes the gazing point module and the object detection module, to analyze students' attention in classroom teaching without relying on any smart wearable device. They introduce the attention mechanism into the gazing point module to improve the performance of gazing point detection and perform comparison experiments on the public dataset to prove that the gazing point module can achieve better performance. They associate the eye movement criteria with visual gaze to obtain quantifiable objective data for students' attention analysis, which can provide a valuable basis to evaluate the learning process of students, provide useful learning information for both parents and teachers, and support the development of individualized teaching. They built a new database containing the first-person view videos of 11 subjects in a real classroom and employ it to evaluate the effectiveness and feasibility of the proposed model.
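The fusion step described above, combining a gazing-point estimate with detected objects to predict what a student is looking at, reduces in its simplest form to a point-in-box test. This is an illustrative sketch, not the authors' implementation; the function name and data layout are assumptions:

```python
def gazed_object(gaze_point, detections):
    """Return the label of the first detected object whose bounding box
    contains the estimated gaze point, or None if the gaze falls on no object.
    `detections` is a list of (label, (x1, y1, x2, y2)) boxes in pixel coords."""
    gx, gy = gaze_point
    for label, (x1, y1, x2, y2) in detections:
        if x1 <= gx <= x2 and y1 <= gy <= y2:
            return label
    return None
```

Running this per frame over a gaze trajectory yields the sequence of gazed objects from which attention fluctuation can then be summarized.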
5. Eyvazpour R, Shoaran M, Karimian G. Hardware implementation of SLAM algorithms: a survey on implementation approaches and platforms. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10310-5]
6. Vankayalapati HD, Kuchibhotla S, Chadalavada MSK, Dargar SK, Anne KR, Kyandoghere K. A Novel Zernike Moment-Based Real-Time Head Pose and Gaze Estimation Framework for Accuracy-Sensitive Applications. Sensors (Basel) 2022; 22:8449. [PMID: 36366147; PMCID: PMC9658879; DOI: 10.3390/s22218449]
Abstract
A real-time head pose and gaze estimation (HPGE) algorithm has excellent potential for technological advancements in both human-machine and human-robot interactions. For example, in accuracy-sensitive applications such as Driver Assistance Systems (DAS), HPGE plays a crucial role in avoiding accidents and road hazards. In this paper, the authors propose a new hybrid framework for improved estimation that combines the appearance-based and geometric-based conventional methods to extract local and global features. The Zernike moments algorithm is used to extract rotation-, scale-, and illumination-invariant features, after which conventional discriminant algorithms classify the head poses and gaze direction. Furthermore, experiments were performed on standard datasets and real-time images to analyze the accuracy of the proposed algorithm. The proposed framework immediately estimated the range of direction changes under different illumination conditions. We obtained an accuracy of ~85%; the average response times were 21.52 ms and 7.483 ms for estimating head pose and gaze, respectively, independent of illumination, background, and occlusion. The proposed method is promising for the future development of a robust system that remains invariant even under blurring conditions and thus achieves much more significant performance enhancement.
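The rotation-invariant features mentioned in this abstract come from Zernike moments, whose radial component has a standard closed form. As an illustrative sketch (not the authors' code), the radial polynomial R_n^m can be computed as:

```python
from math import factorial

def zernike_radial(n, m, rho):
    """Radial component R_n^m(rho) of the Zernike polynomial, for 0 <= rho <= 1.
    Returns 0 when n - |m| is odd, per the standard definition. The magnitudes
    of Zernike moments built from these polynomials are rotation invariant."""
    m = abs(m)
    if (n - m) % 2:
        return 0.0
    return sum(
        (-1) ** k * factorial(n - k)
        / (factorial(k) * factorial((n + m) // 2 - k) * factorial((n - m) // 2 - k))
        * rho ** (n - 2 * k)
        for k in range((n - m) // 2 + 1)
    )
```

For example, R_2^0(rho) = 2·rho² − 1, so `zernike_radial(2, 0, 0.5)` evaluates to −0.5.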
Affiliation(s)
- Hima Deepthi Vankayalapati
- Department of Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Krishnankovil 626126, India
- Swarna Kuchibhotla
- Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram 522302, India
- Mohan Sai Kumar Chadalavada
- Department of Electronics and Communication Engineering, VelTech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai 600062, India
- Shashi Kant Dargar
- Department of Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Krishnankovil 626126, India
- Koteswara Rao Anne
- Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnankovil 626126, India
- Kyamakya Kyandoghere
- Institute for Smart Systems Technologies, University Klagenfurt, 9020 Klagenfurt am Wörthersee, Austria
7. Thai C, Tran V, Bui M, Nguyen D, Ninh H, Tran H. Real-time masked face classification and head pose estimation for RGB facial image via knowledge distillation. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.10.074]
8. Hammadi Y, Grondin F, Ferland F, Lebel K. Evaluation of Various State of the Art Head Pose Estimation Algorithms for Clinical Scenarios. Sensors (Basel) 2022; 22:6850. [PMID: 36146199; PMCID: PMC9502716; DOI: 10.3390/s22186850]
Abstract
Head pose assessment can reveal important clinical information on human motor control. Quantitative assessments have the potential to objectively evaluate head pose and movement specifics in order to monitor the progression of a disease or the effectiveness of a treatment. Optoelectronic camera-based motion-capture systems, recognized as a gold standard in clinical biomechanics, have been proposed for head pose estimation. However, these systems require markers to be positioned on the person's face, which is impractical for everyday clinical practice. Furthermore, the limited access to this type of equipment and the emerging trend to assess mobility in natural environments support the development of algorithms capable of estimating head orientation using off-the-shelf sensors, such as RGB cameras. Although artificial vision is a popular field of research, limited validation of human pose estimation based on image recognition suitable for clinical applications has been performed. This paper first provides a brief review of available head pose estimation algorithms in the literature. Current state-of-the-art head pose algorithms designed to capture the facial geometry from videos, OpenFace 2.0, MediaPipe, and 3DDFA_V2, are then further evaluated and compared. Accuracy is assessed by comparing each approach to a baseline measured with an optoelectronic camera-based motion-capture system. Results reveal a mean error lower than or equal to 5.6° for 3DDFA_V2 depending on the plane of movement, while the mean error reaches 14.1° and 11.0° for OpenFace 2.0 and MediaPipe, respectively. This demonstrates the superiority of the 3DDFA_V2 algorithm in estimating head pose in different directions of motion, and suggests that this algorithm can be used in clinical scenarios.
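The mean errors reported here are angular, so any comparable evaluation must account for wraparound when differencing angles. As a hedged sketch (the paper's exact protocol is not reproduced), a wrap-aware mean absolute error over a sequence of Euler angles can be computed as:

```python
def angular_mae(predicted, reference):
    """Mean absolute error between two equal-length lists of angles in degrees,
    wrapping each difference into [-180, 180) so that e.g. 350 deg vs 10 deg
    counts as a 20 deg error rather than 340 deg."""
    assert len(predicted) == len(reference) and predicted
    errors = [abs(((p - r + 180.0) % 360.0) - 180.0)
              for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)
```

Computing this separately per rotation axis (roll, yaw, pitch) matches the per-plane reporting style used in the abstract.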
Affiliation(s)
- Yassine Hammadi
- Department of Electrical and Computer Engineering, Faculty of Engineering, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Research Center on Aging, Sherbrooke, QC J1H 4C4, Canada
- François Grondin
- Department of Electrical and Computer Engineering, Faculty of Engineering, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Research Center on Aging, Sherbrooke, QC J1H 4C4, Canada
- Interdisciplinary Institute for Technological Innovation (3IT), Université de Sherbrooke, Sherbrooke, QC J1K 0A5, Canada
- François Ferland
- Department of Electrical and Computer Engineering, Faculty of Engineering, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Interdisciplinary Institute for Technological Innovation (3IT), Université de Sherbrooke, Sherbrooke, QC J1K 0A5, Canada
- Karina Lebel
- Department of Electrical and Computer Engineering, Faculty of Engineering, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Research Center on Aging, Sherbrooke, QC J1H 4C4, Canada
9. An improved hand gesture recognition system using keypoints and hand bounding boxes. Array 2022. [DOI: 10.1016/j.array.2022.100251]
10. Ahmad MI, Refik R. “No Chit Chat!” A Warning From a Physical Versus Virtual Robot Invigilator: Which Matters Most? Front Robot AI 2022; 9:908013. [PMID: 35937616; PMCID: PMC9355029; DOI: 10.3389/frobt.2022.908013]
Abstract
Past work has not considered social robots as proctors or monitors to prevent cheating or maintain discipline in the context of exam invigilation with adults. Further, the invigilation role of a robot presented in two different embodiments (physical vs. virtual) has not been investigated. We demonstrate a system that enables a robot (physical and virtual) to act as an invigilator, and deploy an exam setup in which two participants complete a programming task. We conducted two studies (an online video-based survey and an in-person evaluation) to understand participants' perceptions of the invigilator robot presented in two different embodiments. Additionally, we investigated whether participants showed cheating behaviours in one condition more than the other. The findings showed that participants' ratings did not differ significantly between embodiments. Further, participants were more talkative in the virtual robot condition than in the physical robot condition. These findings are promising and call for further research into the invigilation role of social robots in more subtle and complex exam-like settings.
11. An Improved Tiered Head Pose Estimation Network with Self-Adjust Loss Function. Entropy 2022; 24:e24070974. [PMID: 35885197; PMCID: PMC9320982; DOI: 10.3390/e24070974]
Abstract
As an important task in computer vision, head pose estimation has been widely applied in both academia and industry. However, there remain two challenges in the field of head pose estimation: (1) even for the same task (e.g., tiredness detection), existing algorithms usually treat the estimation of the three angles (i.e., roll, yaw, and pitch) as separate facets, which disregards their interplay as well as their differences, and thus share the same parameters for all layers; and (2) the discontinuity in angle estimation definitely reduces accuracy. To solve these two problems, a THESL-Net (tiered head pose estimation with self-adjust loss network) model is proposed in this study. First, an idea of stepped estimation using distinct network layers is proposed, allowing greater freedom during angle estimation. Furthermore, the reasons for the discontinuity in angle estimation are revealed, including not only labeling the dataset with quaternions or Euler angles, but also loss functions that simply add the classification and regression losses. Subsequently, a self-adjustment constraint is applied to the loss function, making the angle estimation more consistent. Finally, to examine the influence of different angle ranges on the proposed model, experiments are conducted on three popular public benchmark datasets, BIWI, AFLW2000, and UPNA, demonstrating that the proposed model outperforms state-of-the-art approaches.
12. Zeng D, Wu Z, Ding C, Ren Z, Yang Q, Xie S. Labeled-Robust Regression: Simultaneous Data Recovery and Classification. IEEE Transactions on Cybernetics 2022; 52:5026-5039. [PMID: 33151887; DOI: 10.1109/tcyb.2020.3026101]
Abstract
Rank minimization is widely used to extract low-dimensional subspaces. As a convex relaxation of the rank minimization, the problem of nuclear norm minimization has been attracting widespread attention. However, the standard nuclear norm minimization usually results in overcompression of data in all subspaces and eliminates the discrimination information between different categories of data. To overcome these drawbacks, in this article, we introduce the label information into the nuclear norm minimization problem and propose a labeled-robust principal component analysis (L-RPCA) to realize nuclear norm minimization on multisubspace data. Compared with the standard nuclear norm minimization, our method can effectively utilize the discriminant information in multisubspace rank minimization and avoid excessive elimination of local information and multisubspace characteristics of the data. Then, an effective labeled-robust regression (L-RR) method is proposed to simultaneously recover the data and labels of the observed data. Experiments on real datasets show that our proposed methods are superior to other state-of-the-art methods.
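Nuclear norm minimization problems of the kind described above are typically solved by repeatedly applying singular value thresholding, the proximal operator of the nuclear norm. As an illustrative sketch only, not the authors' L-RPCA/L-RR algorithm:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: shrink each singular value of X by tau
    (clipping at zero). This is the proximal operator of tau * nuclear norm,
    the basic step in nuclear-norm-minimization solvers such as RPCA."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

Because small singular values are zeroed outright, iterating this step drives the estimate toward a low-rank matrix; the over-compression the authors criticize comes from applying the same shrinkage across all subspaces.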
13. Zohary E, Harari D, Ullman S, Ben-Zion I, Doron R, Attias S, Porat Y, Sklar AY, Mckyton A. Gaze following requires early visual experience. Proc Natl Acad Sci U S A 2022; 119:e2117184119. [PMID: 35549552; PMCID: PMC9171757; DOI: 10.1073/pnas.2117184119]
Abstract
Gaze understanding—a suggested precursor for understanding others’ intentions—requires recovery of gaze direction from the observed person's head and eye position. This challenging computation is naturally acquired at infancy without explicit external guidance, but can it be learned later if vision is extremely poor throughout early childhood? We addressed this question by studying gaze following in Ethiopian patients with early bilateral congenital cataracts diagnosed and treated by us only at late childhood. This sight restoration provided a unique opportunity to directly address basic issues on the roles of “nature” and “nurture” in development, as it caused a selective perturbation to the natural process, eliminating some gaze-direction cues while leaving others still available. Following surgery, the patients’ visual acuity typically improved substantially, allowing discrimination of pupil position in the eye. Yet, the patients failed to show eye gaze-following effects and fixated less than controls on the eyes—two spontaneous behaviors typically seen in controls. Our model for unsupervised learning of gaze direction explains how head-based gaze following can develop under severe image blur, resembling preoperative conditions. It also suggests why, despite acquiring sufficient resolution to extract eye position, automatic eye gaze following is not established after surgery due to lack of detailed early visual experience. We suggest that visual skills acquired in infancy in an unsupervised manner will be difficult or impossible to acquire when internal guidance is no longer available, even when sufficient image resolution for the task is restored. This creates fundamental barriers to spontaneous vision recovery following prolonged deprivation in early age.
Affiliation(s)
- Ehud Zohary
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Daniel Harari
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Shimon Ullman
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Itay Ben-Zion
- Department of Ophthalmology, Padeh Medical Center, Poriya 15208, Israel
- Ravid Doron
- Department of Optometry and Vision Science, Hadassah Academic College, Jerusalem 91010, Israel
- Sara Attias
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Yuval Porat
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
| | - Asael Y. Sklar
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
| | - Ayelet Mckyton
- Neurology Department, Hadassah Medical Organization and Faculty of Medicine, Jerusalem 91120, Israel
14
Face Image Analysis Using Machine Learning: A Survey on Recent Trends and Applications. ELECTRONICS 2022. [DOI: 10.3390/electronics11081210] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
Human face image analysis using machine learning is an important element of computer vision. The human face conveys information such as age, gender, identity, emotion, race, and attractiveness to both humans and computer systems. Over the last ten years, face analysis methods using machine learning have received immense attention due to their diverse applications in various tasks. Although many methods have been reported, face image analysis still represents a complicated challenge, particularly for images obtained under ’in the wild’ conditions. This survey paper presents a comprehensive review of methods in both controlled and uncontrolled conditions. Our work illustrates the merits and demerits of each previously proposed method, starting from seminal works on face image analysis and ending with the latest ideas exploiting deep learning frameworks. We compare the performance of previous methods on standard datasets and also present some promising future directions on the topic.
15
Geng X, Qian X, Huo Z, Zhang Y. Head Pose Estimation Based on Multivariate Label Distribution. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:1974-1991. [PMID: 33031033 DOI: 10.1109/tpami.2020.3029585] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Accurate ground-truth pose is essential to the training of most existing head pose estimation methods. However, in many cases, the "ground truth" pose is obtained in rather subjective ways, such as asking subjects to stare at different markers on a wall. Thus, it is preferable to use soft labels rather than explicit hard labels to indicate the pose of a face image. This paper proposes to associate a multivariate label distribution (MLD) with each image. An MLD covers a neighborhood around the original pose. Labeling images with MLDs not only alleviates the problem of inaccurate pose labels but also boosts the number of training examples associated with each pose without actually increasing the total number of training examples. Four algorithms are proposed to learn from MLDs. Furthermore, a hierarchical extension of MLD, named hierarchical multivariate label distribution (HMLD), is proposed to handle fine-grained head pose estimation. Experimental results show that the MLD-based methods perform significantly better than the compared state-of-the-art head pose estimation algorithms. Moreover, the MLD-based methods are much more robust against label noise in the training set than the compared baseline methods.
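The soft-label idea can be illustrated with a short sketch (our own assumption of a plausible construction, not the authors' exact formulation): an MLD is a discretized 2D Gaussian over (yaw, pitch) bins centered on the nominal pose, so neighboring poses receive nonzero label mass.

```python
import numpy as np

def multivariate_label_distribution(yaw, pitch, bins, sigma=15.0):
    """Discretized 2D Gaussian over (yaw, pitch) pose bins, centered on
    the nominal label and normalized so the soft labels sum to 1."""
    yy, pp = np.meshgrid(bins, bins, indexing="ij")
    d2 = (yy - yaw) ** 2 + (pp - pitch) ** 2
    dist = np.exp(-d2 / (2.0 * sigma ** 2))
    return dist / dist.sum()

bins = np.arange(-90, 91, 15)              # pose bins in 15-degree steps
mld = multivariate_label_distribution(30.0, -15.0, bins)
# The mass peaks at the (30, -15) bin and decays over neighboring poses.
```

Training against such a distribution (e.g., with a KL-divergence loss) is what lets each image also supervise the bins around its nominal pose.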
16
Real-Time Gender Recognition for Juvenile and Adult Faces. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1503188. [PMID: 35341170 PMCID: PMC8947889 DOI: 10.1155/2022/1503188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/23/2021] [Revised: 12/09/2021] [Accepted: 01/18/2022] [Indexed: 11/18/2022]
Abstract
Facial gender recognition is a crucial research topic due to its wide range of use cases, including demographic gender surveys, visitor profile identification, targeted advertisement, access control, security, and CCTV surveillance. In these real-time applications, a face can be oriented at any angle to the camera axis, and the person can belong to any age group, including juveniles. A child’s face has immature craniofacial feature points in texture and edges compared to an adult face, making gender recognition from a child’s face very hard. Real-world faces captured in unconstrained environments further complicate correct gender prediction due to orientation. These factors reduce the accuracy of the existing state-of-the-art models for real-time facial gender prediction. This paper presents a novel approach to facial gender recognition for juvenile, adult, and unconstrained, arbitrarily oriented faces. In the proposed model, the progressive calibration network (PCN) detects rotation-invariant faces. A Gabor filter is then applied to extract distinctive edge and texture features from the detected face. The Gabor filter is invariant to illumination but produces texture and edge features with redundant coefficients in large dimensions. These drawbacks of redundancy and high dimensionality are resolved by the proposed meanDWT feature-optimization method, which improves the system’s accuracy, model size, and computation time. The proposed feature-engineering model is evaluated with different classifiers, including Naïve Bayes, Logistic Regression, and SVM with linear and RBF kernels. Its results are compared with state-of-the-art techniques, and a detailed experimental analysis is presented to support the argument. We also review conventional and deep learning approaches to facial gender recognition, with their pros and cons, on the datasets available for the task.
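As a rough sketch of the Gabor feature stage (kernel parameters are our own illustrative choices, and the per-orientation mean below is only a crude stand-in for the paper's meanDWT reduction):

```python
import numpy as np

def gabor_kernel(ksize=11, sigma=3.0, theta=0.0, lam=6.0):
    """Real part of a Gabor kernel: a Gaussian-windowed cosine grating
    oriented at angle `theta` (radians) with wavelength `lam` (pixels)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_features(patch, orientations=4):
    """Mean absolute filter response per orientation, via FFT convolution."""
    feats = []
    for k in range(orientations):
        kern = gabor_kernel(theta=k * np.pi / orientations)
        resp = np.fft.ifft2(np.fft.fft2(patch) * np.fft.fft2(kern, patch.shape))
        feats.append(np.abs(resp).mean())
    return np.array(feats)

patch = np.zeros((32, 32))
patch[:, ::4] = 1.0            # vertical stripes: intensity varies along x
f = gabor_features(patch)      # strongest response at theta = 0
```

A bank like this, over several orientations and scales, yields the large redundant feature vector that the paper's dimensionality-reduction step then compresses.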
17
Detecting Groups and Estimating F-Formations for Social Human–Robot Interactions. MULTIMODAL TECHNOLOGIES AND INTERACTION 2022. [DOI: 10.3390/mti6030018] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
The ability of a robot to detect and join groups of people is of increasing importance in social contexts and for collaboration between teams of humans and robots. In this paper, we propose a framework, autonomous group interactions for robots (AGIR), that endows a robot with the ability to detect such groups while following the principles of F-formations. Using on-board sensors, this method accommodates a wide spectrum of robot systems, ranging from autonomous service robots to telepresence robots. The presented framework detects individuals, estimates their position and orientation, detects groups, determines their F-formations, and is able to suggest a position for the robot to enter the social group. For evaluation, two simulation scenes were developed based on standard real-world datasets. The first scene contains 20 virtual agents (VAs) interacting in 7 groups of varying sizes and 3 different formations. The second scene contains 36 VAs positioned in 13 groups of varying sizes and 6 different formations. A model of a Pepper robot is placed in both simulated scenes at randomly generated positions. The robot's ability to estimate orientation, detect groups, and estimate F-formations at various locations is used to validate the approach. The results show high accuracy within each simulated scenario and demonstrate that the framework can operate from an egocentric view with a robot in real time.
18
Sei M, Utsumi A, Yamazoe H, Lee JH. Personalized face-pose estimation network using incrementally updated face shape parameters. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02888-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
19
A Study on the Teaching Design of a Hybrid Civics Course Based on the Improved Attention Mechanism. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12031243] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
As an important vehicle for moral education, the moral indicators of civics and political science textbooks are naturally among the most important criteria for revising those textbooks. However, the textbook text dataset suffers from excessive textual information, ambiguous features, and unbalanced sample distributions. To address these problems, this paper combines a novel data-augmentation method with word-vector-based classification. For the problem of unbalanced sample sizes, it proposes a network model based on the attention mechanism that combines the ideas of SMOTE and EDA: a self-built stop-word list and a synonym forest are used for synonym queries to oversample the minority categories, and sentence order and intra-sentence word order are randomly shuffled to build a balanced dataset. The experimental results show that the data-augmentation method effectively improves model performance, yielding a clear gain in the F1-measure. The model incorporating the attention mechanism generalizes better than the one without it and holds a significant advantage over the reference models in other settings. Compared with the original text classifier, the proposed scheme effectively improves the evaluation quality and reliability of teaching design for a civics course.
20
Robot System Assistant (RoSA): Towards Intuitive Multi-Modal and Multi-Device Human-Robot Interaction. SENSORS 2022; 22:s22030923. [PMID: 35161671 PMCID: PMC8838571 DOI: 10.3390/s22030923] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/23/2021] [Revised: 01/18/2022] [Accepted: 01/19/2022] [Indexed: 01/09/2023]
Abstract
This paper presents an implementation of RoSA, a Robot System Assistant, for safe and intuitive human-machine interaction. The interaction modalities were chosen based on a prior Wizard of Oz study, which revealed a strong preference for speech and pointing gestures. Building on these findings, we design and implement a new multi-modal system for contactless human-machine interaction based on speech, facial, and gesture recognition. We evaluate the proposed system in an extensive study with multiple subjects to examine user experience and interaction efficiency. The results show that our method achieves usability scores similar to the entirely human-remote-controlled robot interaction of our Wizard of Oz study. Furthermore, the framework’s implementation is based on the Robot Operating System (ROS), providing modularity and extensibility for our multi-device and multi-user method.
21
Yuan G, Wang Y, Yan H, Fu X. Self-calibrated driver gaze estimation via gaze pattern learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107630] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
22
Barra P, Distasi R, Pero C, Ricciardi S, Tucci M. Gradient boosting regression for faster Partitioned Iterated Function Systems‐based head pose estimation. IET BIOMETRICS 2021. [DOI: 10.1049/bme2.12061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Affiliation(s)
- Paola Barra: Department of Computer Science, Sapienza University of Rome, Rome, Italy
- Riccardo Distasi: Department of Computer Science, University of Salerno, Salerno, Italy
- Chiara Pero: Department of Computer Science, University of Salerno, Salerno, Italy
- Stefano Ricciardi: Department of Biosciences and Territory, University of Molise, Pesche, Italy
- Maurizio Tucci: Department of Computer Science, University of Salerno, Salerno, Italy
23
Malek S, Rossi S. Head pose estimation using facial-landmarks classification for children rehabilitation games. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.11.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
24
Berral-Soler R, Madrid-Cuevas FJ, Muñoz-Salinas R, Marín-Jiménez MJ. RealHePoNet: a robust single-stage ConvNet for head pose estimation in the wild. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05511-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
25
Pardoe HR, Martin SP, Zhao Y, George A, Yuan H, Zhou J, Liu W, Devinsky O. Estimation of in-scanner head pose changes during structural MRI using a convolutional neural network trained on eye tracker video. Magn Reson Imaging 2021; 81:101-108. [PMID: 34147591 DOI: 10.1016/j.mri.2021.06.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2020] [Revised: 05/06/2021] [Accepted: 06/15/2021] [Indexed: 10/21/2022]
Abstract
INTRODUCTION In-scanner head motion is a common cause of reduced image quality in neuroimaging, and causes systematic brain-wide changes in cortical thickness and volumetric estimates derived from structural MRI scans. There are few widely available methods for measuring head motion during structural MRI. Here, we train a deep learning predictive model to estimate changes in head pose using video obtained from an in-scanner eye tracker during an EPI-BOLD acquisition in which participants undertook deliberate in-scanner head movements. The predictive model was used to estimate head pose changes during structural MRI scans, and these estimates were correlated with cortical thickness and subcortical volume estimates. METHODS 21 healthy controls (age 32 ± 13 years, 11 female) were studied. Participants carried out a series of stereotyped, prompted in-scanner head motions during acquisition of an EPI-BOLD sequence with simultaneous recording of eye tracker video. Motion-affected and motion-free whole-brain T1-weighted MRI were also obtained. Image coregistration was used to estimate changes in head pose over the duration of the EPI-BOLD scan, and these estimates were used to train a predictive model to estimate head pose changes from the video data. Model performance was quantified using the coefficient of determination (R2). We evaluated the utility of our technique by assessing the relationship between video-based head pose changes during structural MRI and (i) vertex-wise cortical thickness and (ii) subcortical volume estimates. RESULTS Video-based head pose estimates were significantly correlated with ground-truth head pose changes estimated from EPI-BOLD imaging in a hold-out dataset. We observed a general brain-wide reduction in cortical thickness with increased head motion, with some isolated regions showing increased cortical thickness estimates with increased motion. Subcortical volumes were generally reduced in motion-affected scans. CONCLUSIONS We trained a predictive model to estimate changes in head pose during structural MRI scans using in-scanner eye tracker video. The method is independent of individual image acquisition parameters and does not require markers to be fixed to the patient, suggesting it may be well suited to clinical imaging and research environments. Head pose changes estimated using our approach can be used as covariates in morphometric image analyses to improve the neurobiological validity of structural imaging studies of brain development and disease.
Affiliation(s)
- Heath R Pardoe: Comprehensive Epilepsy Center, Department of Neurology, NYU Grossman School of Medicine, New York, USA
- Samantha P Martin: Comprehensive Epilepsy Center, Department of Neurology, NYU Grossman School of Medicine, New York, USA
- Allan George: Comprehensive Epilepsy Center, Department of Neurology, NYU Grossman School of Medicine, New York, USA
- Hui Yuan: Fordham University, New York, USA
- Wei Liu: Fordham University, New York, USA
- Orrin Devinsky: Comprehensive Epilepsy Center, Department of Neurology, NYU Grossman School of Medicine, New York, USA
26
Gullapalli AR, Anderson NE, Yerramsetty R, Harenski CL, Kiehl KA. Quantifying the psychopathic stare: Automated assessment of head motion is related to antisocial traits in forensic interviews. JOURNAL OF RESEARCH IN PERSONALITY 2021. [DOI: 10.1016/j.jrp.2021.104093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
27
Pataky TC, Yagi M, Ichihashi N, Cox PG. Landmark-free, parametric hypothesis tests regarding two-dimensional contour shapes using coherent point drift registration and statistical parametric mapping. PeerJ Comput Sci 2021; 7:e542. [PMID: 34084938 PMCID: PMC8157043 DOI: 10.7717/peerj-cs.542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2020] [Accepted: 04/22/2021] [Indexed: 06/12/2023]
Abstract
This paper proposes a computational framework for automated, landmark-free hypothesis testing of 2D contour shapes (i.e., shape outlines), and implements one realization of that framework. The proposed framework consists of point set registration, point correspondence determination, and parametric full-shape hypothesis testing. The results are calculated quickly (<2 s), yield morphologically rich detail in an easy-to-understand visualization, and are complemented by parametrically (or nonparametrically) calculated probability values. These probability values represent the likelihood that, in the absence of a true shape effect, smooth, random Gaussian shape changes would yield an effect as large as the observed one. The proposed framework nevertheless possesses a number of limitations, including sensitivity to algorithm parameters. As a number of algorithms and algorithm parameters could be substituted at each stage in the proposed data processing chain, sensitivity analysis would be necessary for robust statistical conclusions. In this paper, the proposed technique is applied to nine public datasets using a two-sample design, and an ANCOVA design is then applied to a synthetic dataset to demonstrate how the proposed method generalizes to the family of classical hypothesis tests. Extension to the analysis of 3D shapes is discussed.
Affiliation(s)
- Todd C. Pataky: Department of Human Health Sciences, Kyoto University, Kyoto, Japan
- Masahide Yagi: Department of Human Health Sciences, Kyoto University, Kyoto, Japan
- Philip G. Cox: Department of Archaeology, University of York, York, United Kingdom; Hull York Medical School, University of York, York, United Kingdom
28
Liu T, Wang J, Yang B, Wang X. NGDNet: Nonuniform Gaussian-label distribution learning for infrared head pose estimation and on-task behavior understanding in the classroom. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.090] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
29
Liu H, Nie H, Zhang Z, Li YF. Anisotropic angle distribution learning for head pose estimation and attention understanding in human-computer interaction. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.068] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
30
How frontal is a face? Quantitative estimation of face pose based on CNN and geometric projection. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05167-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
31
Liu L, Ke Z, Huo J, Chen J. Head Pose Estimation through Keypoints Matching between Reconstructed 3D Face Model and 2D Image. SENSORS (BASEL, SWITZERLAND) 2021; 21:1841. [PMID: 33800750 PMCID: PMC7961623 DOI: 10.3390/s21051841] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/12/2021] [Revised: 02/26/2021] [Accepted: 03/02/2021] [Indexed: 11/27/2022]
Abstract
Mainstream methods treat head pose estimation as a supervised classification/regression problem, whose performance heavily depends on the accuracy of the ground-truth labels of the training data. However, it is rather difficult to obtain accurate head pose labels in practice, due to the lack of effective equipment and reasonable approaches for head pose labeling. In this paper, we propose a method that does not need to be trained with head pose labels, but instead matches keypoints between a reconstructed 3D face model and the 2D input image, for head pose estimation. The proposed head pose estimation method consists of two components: 3D face reconstruction and 3D-2D keypoint matching. At the 3D face reconstruction phase, a personalized 3D face model is reconstructed from the input head image using convolutional neural networks, which are jointly optimized by an asymmetric Euclidean loss and a keypoint loss. At the 3D-2D keypoint matching phase, an iterative optimization algorithm is proposed to efficiently match the keypoints between the reconstructed 3D face model and the 2D input image under the constraint of perspective transformation. The proposed method is extensively evaluated on five widely used head pose estimation datasets, including Pointing'04, BIWI, AFLW2000, Multi-PIE, and Pandora. The experimental results demonstrate that the proposed method achieves excellent cross-dataset performance and surpasses most of the existing state-of-the-art approaches, with average MAEs of 4.78° on Pointing'04, 6.83° on BIWI, 7.05° on AFLW2000, 5.47° on Multi-PIE, and 5.06° on Pandora, although the model of the proposed method is not trained on any of these five datasets.
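A minimal sketch of the perspective constraint underlying such 3D-2D matching (our own simplified pinhole model; the paper's loss and iterative optimizer are not reproduced here): project the model keypoints under a candidate pose and score the reprojection error against the detected 2D keypoints.

```python
import numpy as np

def euler_to_rotation(yaw, pitch, roll):
    """Rotation matrix from Euler angles (radians), Z-Y-X convention."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def project(points3d, rotation, translation, focal=500.0):
    """Pinhole perspective projection of Nx3 model keypoints to Nx2 pixels."""
    cam = points3d @ rotation.T + translation
    return focal * cam[:, :2] / cam[:, 2:3]

def reprojection_error(points3d, points2d, rotation, translation):
    """Mean Euclidean distance between projected and detected keypoints."""
    proj = project(points3d, rotation, translation)
    return np.mean(np.linalg.norm(proj - points2d, axis=1))

# With the true pose, the reprojection error vanishes; an iterative pose
# search would minimize this quantity over (yaw, pitch, roll, translation).
model = np.array([[0.0, 0.0, 0.0], [30.0, 0.0, 0.0], [0.0, 30.0, 0.0]])
true_R = euler_to_rotation(0.2, 0.1, 0.0)
t = np.array([0.0, 0.0, 500.0])
observed = project(model, true_R, t)
```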
Affiliation(s)
- Leyuan Liu: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China
- Zeran Ke: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
- Jiao Huo: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
- Jingying Chen: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China
32
Bisogni C, Nappi M, Pero C, Ricciardi S. FASHE: A FrActal Based Strategy for Head Pose Estimation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:3192-3203. [PMID: 33617454 DOI: 10.1109/tip.2021.3059409] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Head pose estimation (HPE) is a topic central to many relevant research fields and characterized by a wide application range. In particular, HPE performed on a single RGB frame is particularly suitable for best-frame-selection problems. This explains a growing interest witnessed by a large number of contributions, most of which exploit deep learning architectures and require extensive training sessions to achieve accuracy and robustness in estimating head rotations on three axes. However, methods alternative to machine learning approaches can achieve similar if not better performance. In this regard, we present FASHE, an approach based on partitioned iterated function systems (PIFS) that represents auto-similarities within a face image through a contractive affine function, transforming the domain blocks (extracted only once from a single frontal reference image) into a good approximation of the range blocks into which the target image has been partitioned. Pose estimation is achieved by finding the closest match between the fractal code of the target image and a reference array by means of Hamming distance. The results of our experiments exceed the state of the art on both the BIWI and Pointing'04 datasets and approach those of the best-performing methods on the challenging AFLW2000 database. In addition, the application to the GOTCHA video dataset demonstrates that FASHE successfully operates in the wild.
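The final matching step can be sketched as a nearest-neighbor search under Hamming distance (the binary codes below are toy values of ours, not actual PIFS fractal codes):

```python
import numpy as np

def closest_pose_by_hamming(target_code, reference_codes):
    """Return (index, distance) of the reference code with minimum
    Hamming distance to the target's binary code."""
    dists = [int(np.sum(target_code != ref)) for ref in reference_codes]
    best = int(np.argmin(dists))
    return best, dists[best]

# Each row stands in for the fractal code of one reference pose.
refs = np.array([[0, 1, 1, 0],
                 [1, 1, 0, 0],
                 [0, 1, 0, 0]])
idx, d = closest_pose_by_hamming(np.array([0, 1, 0, 1]), refs)
# The nearest code is refs[2], at Hamming distance 1.
```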
33
Abbas A, Yadav V, Smith E, Ramjas E, Rutter SB, Benavidez C, Koesmahargyo V, Zhang L, Guan L, Rosenfield P, Perez-Rodriguez M, Galatzer-Levy IR. Computer Vision-Based Assessment of Motor Functioning in Schizophrenia: Use of Smartphones for Remote Measurement of Schizophrenia Symptomatology. Digit Biomark 2021; 5:29-36. [PMID: 33615120 DOI: 10.1159/000512383] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2020] [Accepted: 10/14/2020] [Indexed: 11/19/2022] Open
Abstract
Introduction Motor abnormalities have been shown to be a distinct component of schizophrenia symptomatology. However, objective and scalable methods for assessing motor functioning in schizophrenia are lacking. Advancements in machine learning-based digital tools have enabled automated and remote "digital phenotyping" of disease symptomatology. Here, we assess the performance of a computer vision-based assessment of motor functioning as a characteristic of schizophrenia using video data collected remotely through smartphones. Methods Eighteen patients with schizophrenia and 9 healthy controls were asked to participate remotely in smartphone-based assessments daily for 14 days. Video recorded by the smartphone's front-facing camera during these assessments was used to quantify the Euclidean distance of head movement between frames through a pretrained computer vision model. We assessed the ability of head movement measurements to distinguish patients from healthy controls, as well as their relationship to schizophrenia symptom severity as measured by traditional clinical scores. Results The rate of head movement differed significantly between participants with schizophrenia (1.48 mm/frame) and those without (2.50 mm/frame; p = 0.01), and logistic regression demonstrated that head movement was a significant predictor of schizophrenia diagnosis (p = 0.02). Linear regression between head movement and clinical scores showed that head movement has a negative relationship with schizophrenia symptom severity (p = 0.04), primarily with negative symptoms of schizophrenia. Conclusions Remote, smartphone-based assessments captured meaningful visual behavior for computer vision-based objective measurement of head movement. The acquired head movement measurements were able to accurately classify schizophrenia diagnosis and quantify symptom severity in patients with schizophrenia.
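The head-movement metric reported here (mean Euclidean displacement between consecutive frames) can be sketched directly; the coordinates below are illustrative, and the pretrained landmark model that would produce them is not shown:

```python
import numpy as np

def mean_head_displacement(head_positions):
    """Mean Euclidean distance moved between consecutive frames, given
    an Nx2 (or Nx3) array of per-frame head coordinates in mm."""
    steps = np.diff(head_positions, axis=0)     # frame-to-frame deltas
    return float(np.linalg.norm(steps, axis=1).mean())

track = np.array([[0.0, 0.0],
                  [3.0, 4.0],
                  [3.0, 4.0]])
mean_head_displacement(track)   # (5.0 + 0.0) / 2 = 2.5 mm/frame
```

A per-participant value like this is what the study then fed into logistic and linear regressions against diagnosis and symptom scores.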
Affiliation(s)
- Emma Smith: Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Elizabeth Ramjas: Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Sarah B Rutter: Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Li Zhang: AiCure, LLC, New York, New York, USA
- Lei Guan: AiCure, LLC, New York, New York, USA
- Paul Rosenfield: Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Isaac R Galatzer-Levy: AiCure, LLC, New York, New York, USA; Psychiatry, New York University School of Medicine, New York, New York, USA
34

35
Driver Distraction Detection Method Based on Continuous Head Pose Estimation. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2020. [DOI: 10.1155/2020/9606908] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Because the detection of driver distraction is a pressing issue, this study chooses the driver's head pose as the evaluation parameter for driving distraction and proposes a head-pose-based driver distraction detection method. The effects of single regression and of classification combined with regression are compared in terms of accuracy, and four classical networks are improved and trained using the 300W-LP and AFLW datasets. HPE_Resnet50, the most accurate, is selected as the head pose estimator and applied to the ten-category distracted driving dataset SF3D to obtain 20,000 sets of head pose data. The differences between classes are discussed qualitatively and quantitatively. Analysis of variance shows a statistically significant difference in head pose between safe driving and all kinds of distracted driving at the 95% and 90% confidence levels, and the poses of the various driving actions are distributed within specific Euler angle ranges, which provides a characteristic basis for the design of subsequent recognition methods. In addition, exploiting the continuity of human movement, this paper also analyzes 90 drivers' videos frame by frame for differences in head pose between safe and distracted driving. By calculating spatial distances and sample statistics, the results provide the reference point, spatial range, and threshold of safe driving under this driving condition. Experimental results show that the average error of HPE_Resnet50 on AFLW2000 is 6.17° and that there is an average difference of 12.4° to 54.9° in Euler angles between safe driving and the nine kinds of distracted driving in SF3D.
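The pose-space distances underlying such comparisons can be sketched as Euclidean distances between (yaw, pitch, roll) triples; the poses below are illustrative values of ours, not SF3D statistics:

```python
import numpy as np

def euler_distance(pose_a, pose_b):
    """Euclidean distance between two (yaw, pitch, roll) poses in degrees;
    a simple stand-in for the paper's spatial-distance comparison."""
    return float(np.linalg.norm(np.asarray(pose_a) - np.asarray(pose_b)))

safe = (0.0, -5.0, 0.0)          # illustrative safe-driving reference pose
texting = (-35.0, -40.0, 10.0)   # illustrative distracted pose
euler_distance(safe, texting)    # about 50.5 degrees from the reference
```

Thresholding such a distance against a safe-driving reference region is one way a downstream detector could flag distracted frames.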
36
Agarwala R, Leube A, Wahl S. Utilizing minicomputer technology for low-cost photorefraction: a feasibility study. BIOMEDICAL OPTICS EXPRESS 2020; 11:6108-6121. [PMID: 33282478 PMCID: PMC7687974 DOI: 10.1364/boe.400720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/22/2020] [Revised: 09/03/2020] [Accepted: 09/13/2020] [Indexed: 06/12/2023]
Abstract
Eccentric photorefraction is an objective technique for determining the refractive errors of the eye. To address the rising prevalence of visual impairment, especially in rural areas, a minicomputer-based low-cost infrared photorefractor was developed from off-the-shelf hardware components. Clinical validation revealed that the developed infrared photorefractor has a linear working range between +4.0 D and -6.0 D at 50 cm. Further, measurements of astigmatism in the human eye showed an absolute cylinder error of 0.3 D and a high correlation for axis assessment. In conclusion, feasibility was shown for a low-cost, portable, low-power, stand-alone device that objectively determines refractive errors, showing potential for screening applications. The developed photorefractor creates a new avenue for telemedicine in ophthalmic measurements.
Affiliation(s)
- Rajat Agarwala
- Institute for Ophthalmic Research, Eberhard Karls University Tuebingen, Elfriede-Aulhorn-Str. 7, Tuebingen, 72076, Germany
- Alexander Leube
- Institute for Ophthalmic Research, Eberhard Karls University Tuebingen, Elfriede-Aulhorn-Str. 7, Tuebingen, 72076, Germany
- Carl Zeiss Vision International GmbH, Turnstr. 27, Aalen, 73430, Germany
- Siegfried Wahl
- Institute for Ophthalmic Research, Eberhard Karls University Tuebingen, Elfriede-Aulhorn-Str. 7, Tuebingen, 72076, Germany
- Carl Zeiss Vision International GmbH, Turnstr. 27, Aalen, 73430, Germany
|
37
|
Orlandi S, Hotze F, Lim D, Estrada SG, Muir D, Friesen HA, Chau T. Customized Access Technology for Children using Head Movement Recognition. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY 2020; 2020:1783-1786. [PMID: 33018344 DOI: 10.1109/embc44109.2020.9175747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Children with cerebral palsy and complex communication needs face limitations in their access technology (AT) usage. Speech recognition software and conventional ATs (e.g., mechanical switches) can be insufficient for those with speech impairment and limited control of voluntary motion. Automatic recognition of head movements represents a promising pathway. Previous studies have shown the robustness of head pose estimation algorithms on adult participants, but further research is needed to use these methods with children. An algorithm for head movement recognition was implemented and evaluated on videos recorded in a naturalistic environment when children were playing a videogame. A face-tracking algorithm was used to detect the main facial landmarks. Head poses were then estimated using the Pose from Orthography and Scaling with Iterations (POSIT) algorithm and three head movements were classified through Hidden Markov Models (HMMs). Preliminary classification results obtained from the analysis of videos of five typically developing children showed an accuracy of up to 95.6% in predicting head movements.
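The final stage of this pipeline, HMM-based movement classification, can be sketched with the discrete-HMM forward algorithm: one model per head movement, with the observation sequence assigned to the model of highest likelihood. The model parameters below are toy placeholders, not the study's trained HMMs:

```python
def forward_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(obs | model) for a discrete-observation HMM.

    start[s]   : initial probability of state s
    trans[p][s]: transition probability from state p to state s
    emit[s][o] : probability that state s emits symbol o
    """
    alpha = [start[s] * emit[s][obs[0]] for s in range(len(start))]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(len(start))) * emit[s][o]
            for s in range(len(start))
        ]
    return sum(alpha)

def classify(obs, models):
    """Assign obs to the movement class whose HMM scores it highest."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))
```

In the cited setup, each candidate head movement (e.g., nod, shake) would have its own HMM trained on pose sequences from the POSIT stage.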
|
38
|
Tan C, Ceballos G, Kasabov N, Puthanmadam Subramaniyam N. FusionSense: Emotion Classification Using Feature Fusion of Multimodal Data and Deep Learning in a Brain-Inspired Spiking Neural Network. SENSORS (BASEL, SWITZERLAND) 2020; 20:E5328. [PMID: 32957655 PMCID: PMC7571195 DOI: 10.3390/s20185328] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/18/2020] [Revised: 09/04/2020] [Accepted: 09/11/2020] [Indexed: 01/22/2023]
Abstract
Using multimodal signals to solve the problem of emotion recognition is one of the emerging trends in affective computing. Several studies have utilized state-of-the-art deep learning methods and combined physiological signals, such as the electrocardiogram (ECG), electroencephalogram (EEG), and skin temperature, along with facial expressions, voice, and posture, to classify emotions. Spiking neural networks (SNNs) represent the third generation of neural networks and employ biologically plausible models of neurons. SNNs have been shown to handle spatio-temporal data, which is essentially the nature of the data encountered in the emotion recognition problem, in an efficient manner. In this work, for the first time, we propose the application of SNNs to solve the emotion recognition problem with a multimodal dataset. Specifically, we use the NeuCube framework, which employs an evolving SNN architecture to classify emotional valence, and evaluate the performance of our approach on the MAHNOB-HCI dataset. The multimodal data used in our work consist of facial expressions along with physiological signals such as ECG, skin temperature, skin conductance, respiration signal, mouth length, and pupil size. We perform classification under the Leave-One-Subject-Out (LOSO) cross-validation mode. Our results show that the proposed approach achieves an accuracy of 73.15% for classifying binary valence when applying feature-level fusion, which is comparable to other deep learning methods. We achieve this accuracy even without using EEG, which other deep learning methods have relied on to achieve this level of accuracy. In conclusion, we have demonstrated that the SNN can be successfully used for solving the emotion recognition problem with multimodal data, and we provide directions for future research utilizing SNNs for affective computing.
In addition to its good accuracy, the SNN recognition system is incrementally trainable on new data in an adaptive way and requires only one pass of training, which makes it suitable for practical and online applications. These features are not manifested in other methods for this problem.
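Feature-level fusion, as used here, amounts to concatenating the per-modality feature vectors into one input vector before classification. A minimal sketch (the modality names and values are illustrative, not taken from MAHNOB-HCI):

```python
def feature_level_fusion(modalities: dict) -> list:
    """Concatenate per-modality feature vectors into a single fused vector.

    Iterating in sorted key order keeps the fused layout and dimensionality
    stable across samples, which the downstream classifier requires.
    """
    fused = []
    for name in sorted(modalities):
        fused.extend(modalities[name])
    return fused
```

The classifier (in the cited work, the evolving SNN) is then trained on these fused vectors rather than on each modality separately.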
Affiliation(s)
- Clarence Tan
- Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1010, New Zealand
- Gerardo Ceballos
- School of Electrical Engineering, University of Los Andes, Merida 5101, Venezuela
- Nikola Kasabov
- Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1010, New Zealand
- Narayan Puthanmadam Subramaniyam
- Faculty of Medicine and Health Technology and BioMediTech Institute, Tampere University, 33520 Tampere, Finland
- Department of Neuroscience and Biomedical Engineering, School of Science, Aalto University, 02150 Espoo, Finland
|
39
|
Learning from discrete Gaussian label distribution and spatial channel-aware residual attention for head pose estimation. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
40
|
When I Look into Your Eyes: A Survey on Computer Vision Contributions for Human Gaze Estimation and Tracking. SENSORS 2020; 20:s20133739. [PMID: 32635375 PMCID: PMC7374327 DOI: 10.3390/s20133739] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/19/2020] [Revised: 06/18/2020] [Accepted: 06/30/2020] [Indexed: 11/16/2022]
Abstract
The automatic detection of eye positions, their temporal consistency, and their mapping into a line of sight in the real world (to find where a person is looking) is reported in the scientific literature as gaze tracking. This has become a very hot topic in the field of computer vision during the last decades, with a surprising and continuously growing number of application fields. A very long journey has been made from the first pioneering works, and this continuous search for more accurate solutions has been further boosted in the last decade, when deep neural networks revolutionized the whole machine learning area, gaze tracking included. In this arena, it is increasingly useful to find guidance through survey/review articles collecting the most relevant works, laying out the clear pros and cons of existing techniques, and introducing a precise taxonomy. Such manuscripts allow researchers and technicians to choose the best way to move towards their application or scientific goals. In the literature, there exist holistic and technology-specific survey documents (even if not up to date), but, unfortunately, there is no overview discussing how the great advancements in computer vision have impacted gaze tracking. Thus, this work represents an attempt to fill this gap, also introducing a wider point of view that leads to a new taxonomy (extending the consolidated ones) by considering gaze tracking as a more exhaustive task that aims at estimating the gaze target from different perspectives: from the eye of the beholder (first-person view), from an external camera framing the beholder, from a third-person view looking at the scene in which the beholder is placed, and from an external view independent of the beholder.
|
41
|
Stirling L, Kelty-Stephen D, Fineman R, Jones MLH, Daniel Park BK, Reed MP, Parham J, Choi HJ. Static, Dynamic, and Cognitive Fit of Exosystems for the Human Operator. HUMAN FACTORS 2020; 62:424-440. [PMID: 32004106 DOI: 10.1177/0018720819896898] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
OBJECTIVE To define static, dynamic, and cognitive fit and their interactions as they pertain to exosystems and to document open research needs in using these fit characteristics to inform exosystem design. BACKGROUND Initial exosystem sizing and fit evaluations are currently based on scalar anthropometric dimensions and subjective assessments. As fit depends on ongoing interactions related to task setting and user, attempts to tailor equipment have limitations when optimizing for this limited fit definition. METHOD A targeted literature review was conducted to inform a conceptual framework defining three characteristics of exosystem fit: static, dynamic, and cognitive. Details are provided on the importance of differentiating fit characteristics for developing exosystems. RESULTS Static fit considers alignment between human and equipment and requires understanding anthropometric characteristics of target users and geometric equipment features. Dynamic fit assesses how the human and equipment move and interact with each other, with a focus on the relative alignment between the two systems. Cognitive fit considers the stages of human-information processing, including somatosensation, executive function, and motor selection. Human cognitive capabilities should remain available to process task- and stimulus-related information in the presence of an exosystem. Dynamic and cognitive fit are operationalized in a task-specific manner, while static fit can be considered for predefined postures. CONCLUSION A deeper understanding of how an exosystem fits an individual is needed to ensure good human-system performance. Development of methods for evaluating different fit characteristics is necessary. APPLICATION Methods are presented to inform exosystem evaluation across physical and cognitive characteristics.
Affiliation(s)
- Richard Fineman
- Harvard-MIT Health Science and Technology Program, Cambridge, MA, USA
- Monica L H Jones
- University of Michigan Transportation Research Institute, Ann Arbor, USA
- Matthew P Reed
- University of Michigan Transportation Research Institute, Ann Arbor, USA
- Joseph Parham
- U.S. Army Combat Capabilities Development Command Soldier Center, Natick, MA, USA
- Hyeg Joo Choi
- U.S. Army Combat Capabilities Development Command Soldier Center, Natick, MA, USA
|
42
|
Wang S, Li J, Yang P, Gao T, Bowers AR, Luo G. Towards Wide Range Tracking of Head Scanning Movement in Driving. INT J PATTERN RECOGN 2020; 34. [PMID: 34267412 DOI: 10.1142/s0218001420500330] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Gaining environmental awareness through lateral head scanning (yaw rotations) is important for driving safety, especially when approaching intersections. Therefore, head scanning movements could be an important behavioral metric for driving safety research and driving risk mitigation systems. Tracking head scanning movements with a single in-car camera is preferred hardware-wise, but it is very challenging to track the head over almost a 180° range. In this paper we investigate two state-of-the-art methods, a multi-loss deep residual learning method with 50 layers (multi-loss ResNet-50) and an ORB feature-based simultaneous localization and mapping method (ORB-SLAM). While deep learning methods have been extensively studied for head pose detection, this is the first study in which SLAM has been employed to innovatively track head scanning over a very wide range. Our laboratory experimental results showed that ORB-SLAM was more accurate than multi-loss ResNet-50, which often failed when many facial features were not in view. In contrast, ORB-SLAM was able to continue tracking, as it does not rely on particular facial features. Testing with real driving videos demonstrated the feasibility of using ORB-SLAM for tracking large lateral head scans in naturalistic video data.
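A hedged sketch of the two building blocks implied here: extracting lateral yaw from an estimated head rotation matrix (assuming a Z-Y-X Euler convention, which may differ from the trackers' internal conventions) and counting scan excursions beyond a yaw threshold. The 20° threshold is an illustrative assumption:

```python
import math

def yaw_from_rotation(R):
    """Extract yaw (lateral head rotation, degrees) from a 3x3 rotation
    matrix, assuming a Z-Y-X Euler decomposition."""
    return math.degrees(math.atan2(R[1][0], R[0][0]))

def count_scans(yaws, threshold=20.0):
    """Count lateral head scans: excursions of |yaw| beyond the threshold.

    A new scan is counted when yaw first exceeds the threshold; the counter
    re-arms once yaw returns inside it.
    """
    scans, in_scan = 0, False
    for y in yaws:
        if abs(y) >= threshold and not in_scan:
            scans += 1
            in_scan = True
        elif abs(y) < threshold:
            in_scan = False
    return scans
```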
Affiliation(s)
- Shuhang Wang
- Schepens Eye Research Institute of Massachusetts Eye and Ear, Boston, MA, USA; Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
- Jianfeng Li
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 1A1, Canada
- Pengshuai Yang
- Department of Automation, Tsinghua University, Beijing, 100084, China
- Tianxiao Gao
- Institute of Digital Media, Peking University, Beijing, 100871, China
- Alex R Bowers
- Schepens Eye Research Institute of Massachusetts Eye and Ear, Boston, MA, USA; Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
- Gang Luo
- Schepens Eye Research Institute of Massachusetts Eye and Ear, Boston, MA, USA; Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
|
43
|
Borghi G, Fabbri M, Vezzani R, Calderara S, Cucchiara R. Face-from-Depth for Head Pose Estimation on Depth Images. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:596-609. [PMID: 30530311 DOI: 10.1109/tpami.2018.2885472] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Depth cameras allow reliable solutions for people monitoring and behavior understanding to be set up, especially when unstable or poor illumination renders common RGB sensors unusable. Therefore, we propose a complete framework for the estimation of head and shoulder pose based on depth images only. A head detection and localization module is also included, in order to develop a complete end-to-end system. The core element of the framework is a Convolutional Neural Network, called POSEidon+, that receives three types of images as input and provides the 3D angles of the pose as output. Moreover, a Face-from-Depth component based on a Deterministic Conditional GAN model is able to hallucinate a face from the corresponding depth image. We empirically demonstrate that this positively impacts system performance. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Experimental results show that our method outperforms several recent state-of-the-art works based on both intensity and depth input data, running in real time at more than 30 frames per second.
|
44
|
Khan K, Attique M, Khan RU, Syed I, Chung TS. A Multi-Task Framework for Facial Attributes Classification through End-to-End Face Parsing and Deep Convolutional Neural Networks. SENSORS (BASEL, SWITZERLAND) 2020; 20:E328. [PMID: 31935996 PMCID: PMC7014093 DOI: 10.3390/s20020328] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/13/2019] [Revised: 12/29/2019] [Accepted: 12/30/2019] [Indexed: 11/17/2022]
Abstract
Human face image analysis is an active research area within computer vision. In this paper we propose a framework for face image analysis, addressing three challenging problems of race, age, and gender recognition through face parsing. We manually labeled face images for training an end-to-end face parsing model through Deep Convolutional Neural Networks. The deep learning-based segmentation model parses a face image into seven dense classes. We use a probabilistic classification method and create probability maps for each face class. The probability maps are used as feature descriptors. We trained another Convolutional Neural Network model by extracting features from the probability maps of the corresponding class for each demographic task (race, age, and gender). We perform extensive experiments on state-of-the-art datasets and obtain much better results than those previously reported.
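The descriptor step described here, turning a per-pixel class-probability map into a fixed-length feature vector, can be sketched as channel-wise averaging. This is a simplified stand-in for the paper's descriptors, not its exact construction:

```python
def probability_map_descriptor(prob_map):
    """Collapse a per-pixel class-probability map (H x W x C nested lists)
    into a C-dimensional descriptor by averaging each class channel.

    Averaging yields a fixed-length vector regardless of image size, so it
    can feed a conventional classifier.
    """
    h, w = len(prob_map), len(prob_map[0])
    c = len(prob_map[0][0])
    desc = [0.0] * c
    for row in prob_map:
        for pixel in row:
            for k, p in enumerate(pixel):
                desc[k] += p
    return [v / (h * w) for v in desc]
```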
Affiliation(s)
- Khalil Khan
- Department of Electrical Engineering, University of Azad Jammu and Kashmir, Muzaffarabad 13100, Pakistan
- Intelligent Analytics Group (IAG), College of Computer, Qassim University, Al-Mulida 51431, Saudi Arabia
- Rehan Ullah Khan
- Department of Information Technology, College of Computer, Qassim University, Al-Mulida 51431, Saudi Arabia
- Intelligent Analytics Group (IAG), College of Computer, Qassim University, Al-Mulida 51431, Saudi Arabia
- Ikram Syed
- Department of Computer Science, The Superior College, Lahore 54000, Pakistan
- Tae-Sun Chung
- Department of Computer Engineering, Ajou University, Ajou 16499, Korea
|
45
|
Singh J, Modi N. Use of information modelling techniques to understand research trends in eye gaze estimation methods: An automated review. Heliyon 2019; 5:e03033. [PMID: 31890964 PMCID: PMC6928306 DOI: 10.1016/j.heliyon.2019.e03033] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2019] [Revised: 10/22/2019] [Accepted: 12/10/2019] [Indexed: 10/31/2022] Open
Abstract
Eye gaze tracking has been used to study the influence of visual stimuli on consumer behavior and attentional processes. Eye gaze tracking techniques have made substantial contributions in advertisement design, human-computer interaction, virtual reality, and disease diagnosis. Eye gaze estimation is considered critical for the prediction of human attention, and hence indispensable for better understanding human activities. In this paper, Latent Semantic Analysis is used to develop an information model for identifying emerging research trends within eye gaze estimation techniques. An exhaustive collection of 423 titles and abstracts of research papers published during 2005-2018 was used. Five major research areas and ten research trends were classified based upon this study.
Affiliation(s)
- Jaiteg Singh
- Department of Computer Applications, Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, 140401, India
- Nandini Modi
- Department of Computer Science and Engineering, Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, 140401, India
|
46
|
|
47
|
Martinikorena I, Larumbe-Bergera A, Ariz M, Porta S, Cabeza R, Villanueva A. Low cost gaze estimation: knowledge-based solutions. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:2328-2343. [PMID: 31634835 DOI: 10.1109/tip.2019.2946452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Eye tracking technology in low-resolution scenarios is not a completely solved issue to date. The possibility of using eye tracking in a mobile gadget is a challenging objective that would permit spreading this technology to unexplored fields. In this paper, a knowledge-based approach is presented to solve gaze estimation in low-resolution settings. Understanding the high-resolution paradigm makes it possible to propose alternative models for gaze estimation. In this manner, three models are presented: a geometrical model, an interpolation model, and a compound model, as solutions for gaze estimation in remote low-resolution systems. Since this work considers head position essential to improving gaze accuracy, a method for head pose estimation is also proposed. The methods are validated in an optimal framework, the I2Head database, which combines head and gaze data. The experimental validation of the models demonstrates their sensitivity to image processing inaccuracies, critical in the case of the geometrical model. Static and extreme-movement scenarios are analyzed, showing the higher robustness of the compound and geometrical models in the presence of user displacement. Accuracy values of about 3° have been obtained, increasing to values close to 5° in extreme displacement settings, results fully comparable with the state-of-the-art.
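Of the three model families, the interpolation model is the simplest to illustrate: a regression fitted on calibration points maps an eye feature to a screen coordinate. A one-dimensional least-squares sketch (real systems typically fit 2D polynomial mappings of pupil-glint vectors):

```python
def fit_linear(xs, ys):
    """Least-squares line fit y = a*x + b: the simplest instance of the
    interpolation-based gaze model family, mapping an eye feature (x) to a
    screen coordinate (y) from calibration samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx
```

After calibration, gaze is predicted for a new eye-feature value as `a * x + b`; the geometrical and compound models replace this fitted mapping with an explicit eye/camera geometry.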
|
48
|
Derkach D, Ruiz A, Sukno FM. Tensor Decomposition and Non-linear Manifold Modeling for 3D Head Pose Estimation. Int J Comput Vis 2019. [DOI: 10.1007/s11263-019-01208-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
49
|
3D Approaches and Challenges in Facial Expression Recognition Algorithms—A Literature Review. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9183904] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Abstract
In recent years, facial expression analysis and recognition (FER) have emerged as an active research topic with applications in several different areas, including the human-computer interaction domain. Solutions based on 2D models are not entirely satisfactory for real-world applications, as they present problems with pose variations and illumination related to the nature of the data. Thanks to technological development, 3D facial data, both still images and video sequences, have become increasingly used to improve the accuracy of FER systems. Despite the advance in 3D algorithms, these solutions still have some drawbacks that make purely three-dimensional techniques convenient only for a set of specific applications; a viable solution to overcome such limitations is adopting a multimodal 2D+3D analysis. In this paper, we analyze the limits and strengths of traditional and deep-learning FER techniques, intending to provide the research community with an overview of the results obtained and a look to the near future. Furthermore, we describe in detail the databases most used to address the problem of facial expressions and emotions, highlighting the results obtained by the various authors. The different techniques used are compared, and some conclusions are drawn concerning the best recognition rates achieved.
|
50
|
Khan K, Attique M, Syed I, Sarwar G, Irfan MA, Khan RU. A Unified Framework for Head Pose, Age and Gender Classification through End-to-End Face Segmentation. ENTROPY 2019; 21:e21070647. [PMID: 33267361 PMCID: PMC7515140 DOI: 10.3390/e21070647] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/02/2019] [Revised: 06/23/2019] [Accepted: 06/24/2019] [Indexed: 11/16/2022]
Abstract
Accurate face segmentation strongly benefits the human face image analysis problem. In this paper we propose a unified framework for face image analysis through end-to-end semantic face segmentation. The proposed framework contains a set of stacked components for face understanding, which includes head pose estimation, age classification, and gender recognition. A manually labeled face dataset is used for training the Conditional Random Fields (CRFs) based segmentation model. A multi-class face segmentation framework developed through CRFs segments a facial image into six parts. A probabilistic classification strategy is used, and probability maps are generated for each class. The probability maps are used as feature descriptors, and a Random Decision Forest (RDF) classifier is modeled for each task (head pose, age, and gender). We assess the performance of the proposed framework on several datasets and report better results than those previously published.
Affiliation(s)
- Khalil Khan
- Department of Electrical Engineering, University of Azad Jammu and Kashmir, Muzaffarabad 13100, Pakistan
- Correspondence: (K.K.); (M.A.)
- Muhammad Attique
- Department of Software Engineering, Sejong University, Seoul 05006, Korea
- Correspondence: (K.K.); (M.A.)
- Ikram Syed
- Department of Software Engineering, University of Azad Jammu and Kashmir, Muzaffarabad 13100, Pakistan
- Ghulam Sarwar
- Department of Software Engineering, University of Azad Jammu and Kashmir, Muzaffarabad 13100, Pakistan
- Muhammad Abeer Irfan
- Dipartimento di Elettronica e Telecomunicazioni (DET), Politecnico di Torino, 10156 Torino, Italy
- Rehan Ullah Khan
- IT Department, College of Computer, Qassim University, Al-Mulida 51431, Saudi Arabia
|