1. van Dyck LE, Gruber WR. Modeling Biological Face Recognition with Deep Convolutional Neural Networks. J Cogn Neurosci 2023; 35:1521-1537. PMID: 37584587; DOI: 10.1162/jocn_a_02040.
Abstract
Deep convolutional neural networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground, and recent efforts have started to transfer this achievement to research on biological face recognition. In this regard, face detection can be investigated by comparing face-selective biological neurons and brain areas to artificial neurons and model layers. Similarly, face identification can be examined by comparing in vivo and in silico multidimensional "face spaces." In this review, we summarize the first studies that use DCNNs to model biological face recognition. On the basis of a broad spectrum of behavioral and computational evidence, we conclude that DCNNs are useful models that closely resemble the general hierarchical organization of face recognition in the ventral visual pathway and the core face network. In two exemplary spotlights, we emphasize the unique scientific contributions of these models. First, studies on face detection in DCNNs indicate that elementary face selectivity emerges automatically through feedforward processing even in the absence of visual experience. Second, studies on face identification in DCNNs suggest that identity-specific experience and generative mechanisms facilitate this particular challenge. Taken together, as this novel modeling approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), it may be suited to inform long-standing debates on the substrates of biological face recognition.
2. Schwartz E, Alreja A, Richardson RM, Ghuman A, Anzellotti S. Intracranial Electroencephalography and Deep Neural Networks Reveal Shared Substrates for Representations of Face Identity and Expressions. J Neurosci 2023; 43:4291-4303. PMID: 37142430; PMCID: PMC10255163; DOI: 10.1523/jneurosci.1277-22.2023.
Abstract
According to a classical view of face perception (Bruce and Young, 1986; Haxby et al., 2000), face identity and facial expression recognition are performed by separate neural substrates (ventral and lateral temporal face-selective regions, respectively). However, recent studies challenge this view, showing that expression valence can also be decoded from ventral regions (Skerry and Saxe, 2014; Li et al., 2019), and identity from lateral regions (Anzellotti and Caramazza, 2017). These findings could be reconciled with the classical view if regions specialized for one task (either identity or expression) contain a small amount of information for the other task (that enables above-chance decoding). In this case, we would expect representations in lateral regions to be more similar to representations in deep convolutional neural networks (DCNNs) trained to recognize facial expression than to representations in DCNNs trained to recognize face identity (the converse should hold for ventral regions). We tested this hypothesis by analyzing neural responses to faces varying in identity and expression. Representational dissimilarity matrices (RDMs) computed from human intracranial recordings (n = 11 adults; 7 females) were compared with RDMs from DCNNs trained to label either identity or expression. We found that RDMs from DCNNs trained to recognize identity correlated with intracranial recordings more strongly in all regions tested, even in regions classically hypothesized to be specialized for expression. These results deviate from the classical view, suggesting that face-selective ventral and lateral regions contribute to the representation of both identity and expression.

SIGNIFICANCE STATEMENT
Previous work proposed that separate brain regions are specialized for the recognition of face identity and facial expression. However, identity and expression recognition mechanisms might share common brain regions instead. We tested these alternatives using deep neural networks and intracranial recordings from face-selective brain regions. Deep neural networks trained to recognize identity and networks trained to recognize expression learned representations that correlate with neural recordings. Identity-trained representations correlated with intracranial recordings more strongly in all regions tested, including regions hypothesized to be expression specialized in the classical hypothesis. These findings support the view that identity and expression recognition rely on common brain regions. This discovery may require reevaluation of the roles that the ventral and lateral neural pathways play in processing socially relevant stimuli.
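The RDM comparison described above follows standard representational similarity analysis. A minimal sketch with synthetic data (the array sizes, variable names, and noise are illustrative stand-ins, not taken from the study):

```python
# Minimal RSA sketch: compare a "neural" RDM with a "model" RDM.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stim = 20                                # faces varying in identity and expression
neural = rng.normal(size=(n_stim, 64))     # stand-in for electrode response patterns
model = rng.normal(size=(n_stim, 128))     # stand-in for DCNN layer activations

# RDM = pairwise correlation distance between stimulus response patterns;
# pdist returns the condensed upper triangle directly, which is all RSA needs.
rdm_neural = pdist(neural, metric="correlation")
rdm_model = pdist(model, metric="correlation")

# Rank correlation between the two RDMs serves as the model-brain similarity score.
rho, p = spearmanr(rdm_neural, rdm_model)
print(round(rho, 3))
```

In the study's logic, this score is computed once per region for an identity-trained network and once for an expression-trained network, and the two scores are compared.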
Affiliation(s)
- Emily Schwartz
- Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467
- Arish Alreja
- Center for the Neural Basis of Cognition, Carnegie Mellon University/University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Neurological Surgery, University of Pittsburgh Medical Center Presbyterian, Pittsburgh, Pennsylvania 15213
- R Mark Richardson
- Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts 02114
- Harvard Medical School, Boston, Massachusetts 02115
- Avniel Ghuman
- Center for the Neural Basis of Cognition, Carnegie Mellon University/University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Department of Neurological Surgery, University of Pittsburgh Medical Center Presbyterian, Pittsburgh, Pennsylvania 15213
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
- Stefano Anzellotti
- Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467
3. O’Toole AJ, Hu Y. First impressions from faces in the real world: Commentary on Sutherland and Young (2022). Br J Psychol 2023; 114:508-510. PMID: 36519182; PMCID: PMC10443674; DOI: 10.1111/bjop.12621.
Abstract
The study of first impressions from faces now emphasizes the need to understand trait inferences made to naturalistic face images (British Journal of Psychology, 113, 2022, 1056). Face recognition algorithms based on deep convolutional neural networks simultaneously represent invariant, changeable and environmental variables in face images. Therefore, we suggest them as a comprehensive 'face space' model of first impressions of naturalistic faces. We also suggest that to understand trait inferences in the real world, a logical next step is to consider trait inferences made to whole people (faces and bodies). On the role of cultural contributions to trait perception, we think it is important for the field to begin to consider the way in which trait inferences motivate (or not) behaviour in independent and interdependent cultures.
Affiliation(s)
- Ying Hu
- State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
4. Kanwisher N, Khosla M, Dobs K. Using artificial neural networks to ask 'why' questions of minds and brains. Trends Neurosci 2023; 46:240-254. PMID: 36658072; DOI: 10.1016/j.tins.2022.12.008.
Abstract
Neuroscientists have long characterized the properties and functions of the nervous system, and are increasingly succeeding in answering how brains perform the tasks they do. But the question of 'why' brains work the way they do is asked less often. The new ability to optimize artificial neural networks (ANNs) for performance on human-like tasks now enables us to approach these 'why' questions by asking when the properties of networks optimized for a given task mirror the behavioral and neural characteristics of humans performing the same task. Here we highlight the recent success of this strategy in explaining why the visual and auditory systems work the way they do, at both behavioral and neural levels.
Affiliation(s)
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Meenakshi Khosla
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Katharina Dobs
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University, Giessen, Germany.
5. Schwartz E, O’Nell K, Saxe R, Anzellotti S. Challenging the Classical View: Recognition of Identity and Expression as Integrated Processes. Brain Sci 2023; 13:296. PMID: 36831839; PMCID: PMC9954353; DOI: 10.3390/brainsci13020296.
Abstract
Recent neuroimaging evidence challenges the classical view that face identity and facial expression are processed by segregated neural pathways, showing that information about identity and expression are encoded within common brain regions. This article tests the hypothesis that integrated representations of identity and expression arise spontaneously within deep neural networks. A subset of the CelebA dataset is used to train a deep convolutional neural network (DCNN) to label face identity (chance = 0.06%, accuracy = 26.5%), and the FER2013 dataset is used to train a DCNN to label facial expression (chance = 14.2%, accuracy = 63.5%). The identity-trained and expression-trained networks each successfully transfer to labeling both face identity and facial expression on the Karolinska Directed Emotional Faces dataset. This study demonstrates that DCNNs trained to recognize face identity and DCNNs trained to recognize facial expression spontaneously develop representations of facial expression and face identity, respectively. Furthermore, a congruence coefficient analysis reveals that features distinguishing between identities and features distinguishing between expressions become increasingly orthogonal from layer to layer, suggesting that deep neural networks disentangle representational subspaces corresponding to different sources.
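The congruence coefficient analysis mentioned above can be sketched as follows. The axes here are random stand-ins for identity- and expression-discriminating directions (e.g., mean-difference vectors between classes), and the layer width is an assumed value, not one from the study:

```python
# Sketch of Tucker's congruence coefficient between two feature axes,
# the kind of measure used to ask whether identity- and
# expression-discriminating directions become orthogonal.
import numpy as np

def congruence(x, y):
    """Tucker's congruence coefficient: cosine of the angle between two axes."""
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

rng = np.random.default_rng(0)
d = 256  # hypothetical layer width
identity_axis = rng.normal(size=d)    # stand-in: direction separating two identities
expression_axis = rng.normal(size=d)  # stand-in: direction separating two expressions

c = congruence(identity_axis, expression_axis)
print(round(c, 3))  # values near 0 indicate near-orthogonal directions
```

Applied layer by layer, a coefficient that drifts toward 0 with depth is what "increasingly orthogonal from layer to layer" quantifies.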
Affiliation(s)
- Emily Schwartz
- Department of Psychology and Neuroscience, Boston College, Boston, MA 02467, USA
- Kathryn O’Nell
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA
- Rebecca Saxe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Stefano Anzellotti
- Department of Psychology and Neuroscience, Boston College, Boston, MA 02467, USA
6. Laurence S, Baker KA, Proietti VM, Mondloch CJ. What happens to our representation of identity as familiar faces age? Evidence from priming and identity aftereffects. Br J Psychol 2022; 113:677-695. PMID: 35277854; PMCID: PMC9544931; DOI: 10.1111/bjop.12560.
Abstract
Matching identity in images of unfamiliar faces is error prone, but we can easily recognize highly variable images of familiar faces - even images taken decades apart. Recent theoretical development based on computational modelling can account for how we recognize extremely variable instances of the same identity. We provide complementary behavioural data by examining older adults' representation of older celebrities who were also famous when young. In Experiment 1, participants completed a long-lag repetition priming task in which primes and test stimuli were the same age or different ages. In Experiment 2, participants completed an identity aftereffects task in which the adapting stimulus was an old or young photograph of one celebrity and the test stimulus was a morph between the adapting identity and a different celebrity; the adapting stimulus was the same age as the test stimulus on some trials (e.g., both old) or a different age (e.g., adapter young, test stimulus old). The magnitudes of priming and identity aftereffects were not influenced by whether the prime and adapting stimulus were the same age as or a different age than the test face. Collectively, our findings suggest that humans have one common mental representation for a familiar face (e.g., Paul McCartney) that incorporates visual changes across decades, rather than multiple age-specific representations. These findings make novel predictions for state-of-the-art algorithms (e.g., Deep Convolutional Neural Networks).
Affiliation(s)
- Sarah Laurence
- School of Psychology & Counselling, Open University, Milton Keynes, UK
- Kristen A. Baker
- Department of Psychology, Brock University, St. Catharines, Ontario, Canada
- Catherine J. Mondloch
- Department of Psychology, Brock University, St. Catharines, Ontario, Canada
7. Zhou L, Yang A, Meng M, Zhou K. Emerged human-like facial expression representation in a deep convolutional neural network. Sci Adv 2022; 8:eabj4383. PMID: 35319988; PMCID: PMC8942361; DOI: 10.1126/sciadv.abj4383.
Abstract
Recent studies found that deep convolutional neural networks (DCNNs) trained to recognize facial identities spontaneously learned features that support facial expression recognition, and vice versa. Here, we showed that the self-emerged expression-selective units in a VGG-Face trained for facial identification were tuned to distinct basic expressions and, importantly, exhibited hallmarks of human expression recognition (i.e., facial expression confusion and categorical perception). We then investigated whether the emergence of expression-selective units is attributable to face-specific experience or to domain-general processing by conducting the same analysis on a VGG-16 trained for object classification and on an untrained VGG-Face without any visual experience, both sharing the same architecture as the pretrained VGG-Face. Although similar expression-selective units were found in both DCNNs, they did not exhibit reliable human-like characteristics of facial expression perception. Together, these findings reveal the necessity of domain-specific visual experience with face identity for the development of facial expression perception, highlighting the contribution of nurture to the formation of human-like facial expression perception.
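A unit-selectivity screen of the kind described above can be sketched as follows. The activations are synthetic, and the contrast-style selectivity index is one common choice, not necessarily the exact measure used in the study:

```python
# Sketch: find expression-selective units by comparing each unit's mean
# response to its preferred expression against the remaining expressions.
import numpy as np

rng = np.random.default_rng(0)
n_expr, n_images, n_units = 6, 40, 100
acts = rng.random(size=(n_expr, n_images, n_units))  # nonnegative "activations"
acts[2, :, 7] += 2.0  # plant one expression-selective unit for illustration

mean_resp = acts.mean(axis=1)           # mean response per (expression, unit)
pref = mean_resp.argmax(axis=0)         # each unit's preferred expression
best = mean_resp.max(axis=0)
rest = (mean_resp.sum(axis=0) - best) / (n_expr - 1)
selectivity = (best - rest) / (best + rest)  # 0 = unselective, toward 1 = selective

print(int(pref[7]))  # the planted unit prefers the boosted expression
```

Units passing a selectivity threshold would then be probed for human-like hallmarks such as expression confusion and categorical perception.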
Affiliation(s)
- Liqin Zhou
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing 100875, China
- Anmin Yang
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing 100875, China
- Ming Meng
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou 510631, China
- Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou 510631, China
- Ke Zhou
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing 100875, China
8.
Abstract
Deep learning models currently achieve human levels of performance on real-world face recognition tasks. We review scientific progress in understanding human face processing using computational approaches based on deep learning. This review is organized around three fundamental advances. First, deep networks trained for face identification generate a representation that retains structured information about the face (e.g., identity, demographics, appearance, social traits, expression) and the input image (e.g., viewpoint, illumination). This forces us to rethink the universe of possible solutions to the problem of inverse optics in vision. Second, deep learning models indicate that high-level visual representations of faces cannot be understood in terms of interpretable features. This has implications for understanding neural tuning and population coding in the high-level visual cortex. Third, learning in deep networks is a multistep process that forces theoretical consideration of diverse categories of learning that can overlap, accumulate over time, and interact. Diverse learning types are needed to model the development of human face processing skills, cross-race effects, and familiarity with individual faces.
Affiliation(s)
- Alice J O'Toole
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas 75080, USA
- Carlos D Castillo
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
9. Parde CJ, Colón YI, Hill MQ, Castillo CD, Dhar P, O'Toole AJ. Closing the gap between single-unit and neural population codes: Insights from deep learning in face recognition. J Vis 2021; 21:15. PMID: 34379084; PMCID: PMC8363775; DOI: 10.1167/jov.21.8.15.
Abstract
Single-unit responses and population codes differ in the "read-out" information they provide about high-level visual representations. Diverging local and global read-outs can be difficult to reconcile with in vivo methods. To bridge this gap, we studied the relationship between single-unit and ensemble codes for identity, gender, and viewpoint, using a deep convolutional neural network (DCNN) trained for face recognition. Analogous to the primate visual system, DCNNs develop representations that generalize over image variation, while retaining subject (e.g., gender) and image (e.g., viewpoint) information. At the unit level, we measured the number of single units needed to predict attributes (identity, gender, viewpoint) and the predictive value of individual units for each attribute. Identification was remarkably accurate using random samples of only 3% of the network's output units, and all units had substantial identity-predicting power. Cross-unit responses were minimally correlated, indicating that single units code non-redundant identity cues. Gender and viewpoint classification required large-scale pooling of units; individual units had weak predictive power. At the ensemble level, principal component analysis of face representations showed that identity, gender, and viewpoint separated into high-dimensional subspaces, ordered by explained variance. Unit-based directions in the representational space were compared with the directions associated with the attributes. Identity, gender, and viewpoint contributed to all individual unit responses, undercutting a neural tuning analogy. Instead, single-unit responses carry superimposed, distributed codes for face identity, gender, and viewpoint. This undermines confidence in the interpretation of neural representations from unit response profiles for both DCNNs and, by analogy, high-level vision.
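The unit-sampling result can be illustrated with a toy read-out. The nearest-centroid classifier, feature sizes, and noise level below are assumptions chosen for illustration, not the paper's protocol:

```python
# Sketch: identify faces from a random ~3% subset of "units" with a
# nearest-centroid read-out, using synthetic stand-ins for DCNN descriptors.
import numpy as np

rng = np.random.default_rng(1)
n_id, n_units = 50, 512
centroids = rng.normal(size=(n_id, n_units))                 # one template per identity
probes = centroids + 0.3 * rng.normal(size=centroids.shape)  # noisy probe images

# Draw a random subsample of roughly 3% of the units.
subset = rng.choice(n_units, size=int(0.03 * n_units), replace=False)

# Classify each probe as the identity with the nearest centroid
# in the subsampled unit space.
d = ((probes[:, None, subset] - centroids[None, :, subset]) ** 2).sum(-1)
pred = d.argmin(axis=1)
accuracy = (pred == np.arange(n_id)).mean()
print(accuracy)
```

If identity information is distributed redundantly across all units, as the paper reports, identification from such small random subsets remains far above chance.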
Affiliation(s)
- Connor J Parde
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Y Ivette Colón
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Matthew Q Hill
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Carlos D Castillo
- University of Maryland Institute of Advanced Computer Studies, University of Maryland, College Park, MD, USA
- Prithviraj Dhar
- University of Maryland Institute of Advanced Computer Studies, University of Maryland, College Park, MD, USA
- Alice J O'Toole
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA