1. Akbari A, Awais M, Fatemifar S, Khalid SS, Kittler J. RAgE: Robust Age Estimation Through Subject Anchoring With Consistency Regularisation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024;46:1603-1617. PMID: 35767502. DOI: 10.1109/tpami.2022.3187079.
Abstract
Modern facial age estimation systems can achieve high accuracy when training and test datasets are identically distributed and captured under similar conditions. However, domain shifts in data, encountered in practice, lead to a sharp drop in the accuracy of most existing age estimation algorithms. In this article, we propose a novel method, namely RAgE, to improve the robustness and reduce the uncertainty of age estimates by leveraging unlabelled data through a subject anchoring strategy and a novel consistency regularisation term. First, we propose a similarity-preserving pseudo-labelling algorithm by which the model generates pseudo-labels for a cohort of unlabelled images belonging to the same subject, while taking into account the similarity among age labels. To improve the robustness of the system, a consistency regularisation term is then used to encourage the model to produce invariant outputs for the images in the cohort with respect to an anchor image. The noise-tolerant property of the proposed consistency regularisation term effectively mitigates the so-called confirmation bias caused by incorrect pseudo-labels. Experiments on multiple benchmark ageing datasets demonstrate substantial improvements over state-of-the-art methods and robustness to confounding external factors, including the subject's head pose, illumination variation and facial expression.
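The anchor-based consistency idea above can be sketched roughly as follows. This is a minimal illustration assuming a squared-error penalty between the anchor's softmax output and those of the cohort images; the function names and the penalty form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(anchor_logits, cohort_logits):
    """Penalise disagreement between the anchor image's predicted age
    distribution (shape (K,)) and those of the other images of the same
    subject (shape (N, K))."""
    p_anchor = softmax(anchor_logits)
    p_cohort = softmax(cohort_logits)
    return float(np.mean((p_cohort - p_anchor) ** 2))
```

During training, such a term would be added to the supervised loss on pseudo-labelled cohort images, pushing the network towards outputs that are invariant to pose and illumination changes within a subject.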
2. Alhameed M, Jeribi F, Elnaim BME, Hossain MA, Abdelhag ME. Pandemic disease detection through wireless communication using infrared image based on deep learning. Mathematical Biosciences and Engineering 2023;20:1083-1105. PMID: 36650803. DOI: 10.3934/mbe.2023050.
Abstract
Rapid diagnostic testing for diseases such as COVID-19 is a significant challenge. The routine virus test is the reverse transcriptase-polymerase chain reaction (RT-PCR). However, this test takes a long time to complete because it follows a serial testing method, and it has a high false-negative ratio (FNR). Moreover, RT-PCR test kits are often in short supply. Therefore, alternative procedures for a quick and accurate diagnosis of patients are urgently needed to deal with such pandemics. Infrared imaging is self-sufficient for detecting these diseases by measuring temperature at the initial stage. CT scans and other pathological tests are valuable for evaluating a patient with a suspected pandemic infection, but a patient's radiological findings may not be identifiable initially. Therefore, we include an Artificial Intelligence (AI) algorithm-based Machine Intelligence (MI) system in this proposal to combine CT scan findings with all other tests, symptoms, and history to quickly diagnose a patient showing symptoms of current and future pandemic diseases. Initially, the system collects information from an infrared camera imaging the patient's facial regions to measure temperature, keeps it as a record, and performs further actions. We divide the face into eight classes and twelve regions for temperature measurement, and a database named patient-info-mask is maintained. While collecting sample data, we incorporate a wireless network using a cloudlet server to make processing more accessible with minimal infrastructure. The system uses deep learning approaches; we propose convolutional neural networks (CNNs) to cross-verify the collected data. For better results, we incorporate ten-fold cross-validation into the synthesis method, making our estimation more accurate and efficient. We achieve 3.29% greater accuracy by incorporating the decision-tree-level synthesis method and the ten-fold validation method, which demonstrates the robustness of our proposed method.
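The ten-fold validation step mentioned above can be sketched as a generic data split; the function name is hypothetical and the paper's exact pipeline is not reproduced here.

```python
import numpy as np

def ten_fold_indices(n_samples, seed=0):
    """Shuffle sample indices and split them into ten folds; each fold
    serves once as the held-out verification set while the remaining
    nine folds are used for training."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), 10)
```

Each of the ten train/verify rotations produces one accuracy figure, and the ten figures are averaged to give a more stable estimate than a single split.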
Affiliation(s)
- Fathe Jeribi, College of CS & IT, Jazan University, Jazan, Saudi Arabia
3. Akbari A, Awais M, Fatemifar S, Kittler J. Deep Order-Preserving Learning With Adaptive Optimal Transport Distance. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:313-328. PMID: 35254972. DOI: 10.1109/tpami.2022.3156885.
Abstract
We consider a framework that takes into account the relative importance (ordinality) of object labels in the process of learning a label predictor function. The commonly used loss functions are not well matched to this problem, as they fail to capture natural correlations between the labels and the corresponding data. We propose to incorporate such correlations into our learning algorithm using an optimal transport formulation. Our approach is to learn the ground metric, which partly determines the optimal transport distance, by leveraging ordinality as a general form of side information in its formulation. Based on this idea, we develop a novel loss function for training deep neural networks. A highly efficient alternating learning method is then devised to optimise the ground metric and the deep model alternately in an end-to-end manner. This scheme allows us to adaptively adjust the shape of the ground metric, and consequently the shape of the loss function, for each application. We back up our approach with theoretical analysis and verify the performance of the proposed scheme on two learning tasks, namely chronological age estimation from the face and image aesthetic assessment. Numerical results on several benchmark datasets demonstrate the superiority of the proposed algorithm.
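To make the optimal transport machinery concrete, here is a rough sketch of an entropy-regularised transport distance under an ordinal, age-style ground metric. This uses a standard Sinkhorn iteration with a fixed |i-j|^p cost, not the paper's adaptive metric learning scheme; all names and parameter values are illustrative.

```python
import numpy as np

def ordinal_ground_metric(n_labels, p=1.0):
    # Cost of moving probability mass between labels i and j: |i - j|^p,
    # so transporting mass between distant labels is more expensive.
    idx = np.arange(n_labels)
    return np.abs(idx[:, None] - idx[None, :]) ** p

def sinkhorn_distance(p, q, C, reg=0.1, n_iter=200):
    """Entropy-regularised optimal transport cost between two label
    histograms p and q under ground metric C (Sinkhorn-Knopp scaling)."""
    K = np.exp(-C / reg)
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    plan = u[:, None] * K * v[None, :]   # approximate transport plan
    return float((plan * C).sum())
```

With one-hot histograms the transport cost reduces to the ground-metric distance between the two labels, which is exactly the ordinality-aware behaviour the abstract describes.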
4. Ganel T, Sofer C, Goodale MA. Biases in human perception of facial age are present and more exaggerated in current AI technology. Sci Rep 2022;12:22519. PMID: 36581653. PMCID: PMC9800363. DOI: 10.1038/s41598-022-27009-w.
Abstract
Our estimates of a person's age from their facial appearance suffer from several well-known biases and inaccuracies. Typically, for example, we tend to overestimate the age of smiling faces compared to those with a neutral expression, and the accuracy of our estimates decreases for older faces. The growing interest in age estimation using artificial intelligence (AI) technology raises the question of how AI compares to human performance and whether it suffers from the same biases. Here, we compared human performance with that of a large sample of the most prominent AI technology available today. The results showed that AI is even less accurate and more biased than human observers when judging a person's age, even though the overall pattern of errors and biases is similar. Thus, AI overestimated the age of smiling faces even more than human observers did. In addition, AI showed a sharper decrease in accuracy for faces of older adults compared to faces of younger age groups, for smiling compared to neutral faces, and for female compared to male faces. These results suggest that our estimates of age from faces are largely driven by particular visual cues, rather than high-level preconceptions. Moreover, the pattern of errors and biases we observed could provide some insights for the design of more effective AI technology for age estimation from faces.
Affiliation(s)
- Tzvi Ganel, Department of Psychology, Ben-Gurion University of the Negev, 8410500 Beer-Sheva, Israel
- Carmel Sofer, Department of Cognitive and Brain Sciences, and Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, 8410500 Beer-Sheva, Israel
- Melvyn A. Goodale, The Western Institute for Neuroscience, The University of Western Ontario, London, ON N6A 5B7, Canada
5. Akbari A, Awais M, Fatemifar S, Khalid SS, Kittler J. A Novel Ground Metric for Optimal Transport-Based Chronological Age Estimation. IEEE Transactions on Cybernetics 2022;52:9986-9999. PMID: 34133311. DOI: 10.1109/tcyb.2021.3083245.
Abstract
Label distribution learning (LDL) is the state-of-the-art approach to a number of real-world applications, such as chronological age estimation from a face image, where there is an inherent similarity among adjacent age labels. LDL takes this semantic similarity into account by assigning a label distribution to each instance. The well-known Kullback-Leibler (KL) divergence is the widely used loss function for the LDL framework. However, the KL divergence does not fully and effectively capture the semantic similarity among age labels, leading to suboptimal performance. In this article, we propose a novel loss function based on optimal transport theory for LDL-based age estimation. A ground metric function plays an important role in the optimal transport formulation; it should be carefully determined based on the underlying geometric structure of the label space of the application in hand. The label space in the age estimation problem has a specific geometric structure: closer ages have stronger inherent semantic relationships. Inspired by this, we devise a novel ground metric function, which enables the loss function to increase the influence of highly correlated ages, thus exploiting the semantic similarity among ages more effectively than existing loss functions. We then use the proposed loss function, namely the γ-Wasserstein loss, for training a deep neural network (DNN). This leads to a notoriously computationally expensive and nonconvex optimisation problem. Following the standard methodology, we reformulate the optimisation as a convex problem and then use an efficient iterative algorithm to update the parameters of the DNN. Extensive experiments on age estimation with different benchmark datasets validate the effectiveness of the proposed method, which consistently outperforms state-of-the-art approaches.
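The label-distribution encoding that LDL relies on can be sketched as follows. The Gaussian form and the σ value are common illustrative choices in the LDL literature, not necessarily those of this paper.

```python
import numpy as np

def age_label_distribution(true_age, ages, sigma=2.0):
    """Encode a scalar age label as a discrete Gaussian over the label
    space, so that neighbouring ages receive a share of the probability
    mass proportional to their semantic similarity."""
    d = np.exp(-((ages - true_age) ** 2) / (2.0 * sigma ** 2))
    return d / d.sum()
```

A loss such as KL divergence or an optimal transport distance is then computed between this target distribution and the network's predicted distribution, rather than against a one-hot label.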
6. Zengin RS, Sezer V. Super-k: A piecewise linear classifier based on Voronoi tessellations. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.01.072.
7. Akbari A, Awais M, Feng ZH, Farooq A, Kittler J. Distribution Cognisant Loss for Cross-Database Facial Age Estimation With Sensitivity Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022;44:1869-1887. PMID: 33026982. DOI: 10.1109/tpami.2020.3029486.
Abstract
Existing facial age estimation studies have mostly focused on intra-database protocols that assume training and test images are captured under similar conditions. This is rarely valid in practical applications, where we typically encounter training and test sets with different characteristics. In this article, we deal with such situations, namely subject-exclusive cross-database age estimation. We formulate the age estimation problem within the distribution learning framework, where the age labels are encoded as a probability distribution. To improve cross-database age estimation performance, we propose a new loss function that provides a more robust measure of the difference between ground-truth and predicted distributions. The desirable properties of the proposed loss function are theoretically analysed and compared with state-of-the-art approaches. In addition, we compile a new balanced large-scale age estimation database. Finally, we introduce a novel evaluation protocol, called the subject-exclusive cross-database age estimation protocol, which provides meaningful information about a method's generalisation capability. The experimental results demonstrate that the proposed approach outperforms state-of-the-art age estimation methods under both intra-database and subject-exclusive cross-database evaluation protocols. We also provide a comparative sensitivity analysis of various algorithms to identify trends and issues inherent in their performance. This analysis raises some open problems for the community to consider when designing a robust age estimation system.
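For context, the KL-divergence loss that distribution-learning baselines commonly use (and that the robust loss described above is designed to improve upon; the paper's own loss is not reproduced here) can be written as a small sketch. The clipping constant is an illustrative numerical guard.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between a ground-truth label distribution p and a
    predicted distribution q; values are clipped away from zero so the
    logarithm stays finite."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))
```

Note that KL divergence is asymmetric and unbounded as q's mass vanishes where p has mass, which is one reason distribution-learning work explores more robust alternatives.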
8. Hossain MA, Assiri B. Facial expression recognition based on active region of interest using deep learning and parallelism. PeerJ Comput Sci 2022;8:e894. PMID: 35494822. PMCID: PMC9044208. DOI: 10.7717/peerj-cs.894.
Abstract
Automatic facial expression recognition has become an emergent topic during the last few decades. It is a challenging problem that impacts many fields such as virtual reality, security surveillance, driver safety, homeland security, human-computer interaction and medical applications. A remarkable cost-efficiency can be achieved by considering only some areas of a face, termed Active Regions of Interest (AROIs). This work proposes a facial expression recognition framework that investigates five types of facial expressions: neutral, happiness, fear, surprise and disgust. First, a pose estimation method is incorporated, along with an approach to rotate the face to a normalised pose. Second, the whole face image is segmented into four classes and eight regions. Third, only four AROIs are identified from the segmented regions: the nose tip, right eye, left eye and lips. Fourth, an info-image-data-mask database is maintained for classification and used to store records of images; this database is built from all the images obtained after applying a ten-fold cross-validation technique using a Convolutional Neural Network, and correlations of variances and standard deviations are computed from the identified images. To minimise the processing time required for both training and testing, a parallelism technique is introduced, in which each AROI is classified individually and all classifiers run in parallel. Fifth, a decision-tree-level synthesis-based framework is proposed to coordinate the results of the parallel classifiers, which helps to improve recognition accuracy. Finally, experiments on both independent and synthesized databases are used to evaluate the performance of the proposed technique. By incorporating the proposed synthesis method, we obtain 94.499%, 95.439% and 98.26% accuracy on the CK+ image sets and 92.463%, 93.318% and 94.423% on the JAFFE image sets, for an overall recognition accuracy of 95.27%. Introducing the decision-level synthesis method yields 2.8% higher accuracy, and incorporating parallelism speeds up processing threefold. These results demonstrate the robustness of the proposed scheme.
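The parallel per-region classification and decision-level fusion described above can be sketched roughly as follows. The stand-in region classifier and the majority-vote fusion rule are illustrative assumptions; the paper's synthesis operates at the decision-tree level.

```python
import numpy as np
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

LABELS = ["neutral", "happiness", "fear", "surprise", "disgust"]

def classify_region(region_scores):
    # Stand-in for a per-AROI CNN: pick the highest-scoring expression.
    return LABELS[int(np.argmax(region_scores))]

def fused_prediction(aroi_scores):
    """Classify each Active Region of Interest in parallel and fuse the
    per-region decisions by majority vote."""
    with ThreadPoolExecutor() as pool:
        votes = list(pool.map(classify_region, aroi_scores))
    return Counter(votes).most_common(1)[0][0]
```

Because the four AROI classifiers are independent, running them concurrently is what yields the roughly threefold speed-up reported above.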
Affiliation(s)
- Mohammad Alamgir Hossain, Department of Computer Science, College of Computer Science & Information Technology, Jazan University, Jazan, Kingdom of Saudi Arabia
- Basem Assiri, Department of Computer Science, College of Computer Science & Information Technology, Jazan University, Jazan, Kingdom of Saudi Arabia
9. Alonso‐Fernandez F, Hernandez‐Diaz K, Ramis S, Perales FJ, Bigun J. Facial masks and soft‐biometrics: Leveraging face recognition CNNs for age and gender prediction on mobile ocular images. IET Biometrics 2021. DOI: 10.1049/bme2.12046.
Affiliation(s)
- Silvia Ramis, Computer Graphics and Vision and AI Group, University of Balearic Islands, Spain
- Josef Bigun, School of Information Technology, Halmstad University, Sweden
10. Effective training of convolutional neural networks for age estimation based on knowledge distillation. Neural Comput Appl 2021. DOI: 10.1007/s00521-021-05981-0.
Abstract
Age estimation from face images can be profitably employed in several applications, ranging from digital signage to social robotics, from business intelligence to access control. Only in recent years has the advent of deep learning allowed for the design of extremely accurate methods based on convolutional neural networks (CNNs) that achieve remarkable performance in various face analysis tasks. However, these networks are not always applicable in real scenarios, due to both time and resource constraints that the most accurate approaches often do not meet. Moreover, for age estimation, there is a lack of a large and reliably annotated dataset for training deep neural networks. Within this context, we propose in this paper an effective training procedure of CNNs for age estimation based on knowledge distillation, which allows smaller and simpler "student" models to be trained to match the predictions of a larger "teacher" model. We experimentally show that such student models are able to almost reach the performance of the teacher, obtaining high accuracy over the LFW+, LAP 2016 and Adience datasets, while being up to 15 times faster. Furthermore, we evaluate the performance of the student models in the presence of image corruptions, and we demonstrate that some of them are even more resilient to these corruptions than the teacher model.
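The soft-target distillation objective described above can be sketched as the standard temperature-scaled cross-entropy between teacher and student outputs; the temperature value and function names here are illustrative assumptions, not necessarily those of the paper.

```python
import numpy as np

def softened_probs(logits, temperature):
    # Softmax of temperature-scaled logits; higher T flattens the output,
    # exposing the teacher's "dark knowledge" about non-target classes.
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy of the student's softened predictions against the
    teacher's softened predictions: the soft-target term that lets a
    small student mimic a large teacher."""
    p_t = softened_probs(teacher_logits, temperature)
    p_s = softened_probs(student_logits, temperature)
    return float(-np.sum(p_t * np.log(p_s + 1e-12)))
```

In practice this term is usually combined with an ordinary supervised loss on the hard labels, weighted against each other.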
11. Deeply Learned Classifiers for Age and Gender Predictions of Unfiltered Faces. ScientificWorldJournal 2020;2020:1289408. PMID: 32395084. PMCID: PMC7201854. DOI: 10.1155/2020/1289408.
Abstract
Age and gender prediction of unfiltered faces classifies unconstrained real-world facial images into predefined age and gender groups. Significant improvements have been made in this research area due to its usefulness in intelligent real-world applications. However, traditional methods on the unfiltered benchmarks are unable to handle the large degrees of variation in those unconstrained images. More recently, methods based on Convolutional Neural Networks (CNNs) have been extensively used for the classification task due to their excellent performance in facial analysis. In this work, we propose a novel end-to-end CNN approach to achieve robust age group and gender classification of unfiltered real-world faces. The two-level CNN architecture comprises feature extraction and classification: the feature extraction stage extracts features corresponding to age and gender, while the classification stage assigns the face images to the correct age group and gender. In particular, we address the large variations in unfiltered real-world faces with a robust image preprocessing algorithm that prepares and processes those faces before they are fed into the CNN model. Technically, our network is pretrained on IMDb-WIKI with noisy labels, then fine-tuned on MORPH-II and finally on the training set of the OIU-Adience (original) dataset. The experimental results, when analysed for classification accuracy on the same OIU-Adience benchmark, show that our model obtains state-of-the-art performance in both age group and gender classification, improving over the best-reported results by 16.6% (exact accuracy) and 3.2% (one-off accuracy) for age group classification, with a further improvement of 3.0% (exact accuracy) for gender classification.
12. Fu Y, Wu X, Li X, Pan Z, Luo D. Semantic Neighborhood-Aware Deep Facial Expression Recognition. IEEE Transactions on Image Processing 2020;29:6535-6548. PMID: 32386155. DOI: 10.1109/tip.2020.2991510.
Abstract
Unlike many other attributes, facial expression can change in a continuous way; therefore, a slight semantic change in the input should lead only to a small fluctuation in the output. This consistency is important. However, current Facial Expression Recognition (FER) datasets may suffer from extreme class imbalance, as well as a lack of data and excessive amounts of noise, hindering this consistency and degrading test-time performance. In this paper, we consider not only the prediction accuracy on sample points but also their neighbourhood smoothness, focusing on the stability of the output with respect to slight semantic perturbations of the input. A novel method is proposed to formulate semantic perturbations and select unreliable samples during training, reducing their harmful effect. Experiments show the effectiveness of the proposed method, and state-of-the-art results are reported, getting 30% closer to an upper limit than the state-of-the-art methods on AffectNet, currently the largest in-the-wild FER database.
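The idea of flagging unreliable samples by output instability under semantic perturbation can be sketched as follows; the L2 shift measure and the threshold are illustrative assumptions, not the paper's exact selection criterion.

```python
import numpy as np

def unreliable_mask(probs_clean, probs_perturbed, tau=0.3):
    """Mark samples whose predicted expression distribution moves more
    than tau (in L2 norm) under a slight semantic perturbation of the
    input; such samples can be down-weighted or excluded in training."""
    shift = np.linalg.norm(probs_perturbed - probs_clean, axis=1)
    return shift > tau
```

Samples with large prediction shifts are the ones most likely to be noisy or mislabelled, so suppressing them during training limits their bad effect on the learned decision boundary.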