1
Farzmahdi A, Zarco W, Freiwald WA, Kriegeskorte N, Golan T. Emergence of brain-like mirror-symmetric viewpoint tuning in convolutional neural networks. eLife 2024; 13:e90256. PMID: 38661128; PMCID: PMC11142642; DOI: 10.7554/elife.90256
Abstract
Primates can recognize objects despite 3D geometric variations such as in-depth rotations. The computational mechanisms that give rise to such invariances are yet to be fully understood. A curious case of partial invariance occurs in the macaque face-patch AL and in fully connected layers of deep convolutional networks in which neurons respond similarly to mirror-symmetric views (e.g. left and right profiles). Why does this tuning develop? Here, we propose a simple learning-driven explanation for mirror-symmetric viewpoint tuning. We show that mirror-symmetric viewpoint tuning for faces emerges in the fully connected layers of convolutional deep neural networks trained on object recognition tasks, even when the training dataset does not include faces. First, using 3D objects rendered from multiple views as test stimuli, we demonstrate that mirror-symmetric viewpoint tuning in convolutional neural network models is not unique to faces: it emerges for multiple object categories with bilateral symmetry. Second, we show why this invariance emerges in the models. Learning to discriminate among bilaterally symmetric object categories induces reflection-equivariant intermediate representations. AL-like mirror-symmetric tuning is achieved when such equivariant responses are spatially pooled by downstream units with sufficiently large receptive fields. These results explain how mirror-symmetric viewpoint tuning can emerge in neural networks, providing a theory of how they might emerge in the primate brain. Our theory predicts that mirror-symmetric viewpoint tuning can emerge as a consequence of exposure to bilaterally symmetric objects beyond the category of faces, and that it can generalize beyond previously experienced object categories.
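The pooling argument summarized above can be illustrated in a few lines of Python. The sketch below is not the authors' code: it substitutes a hand-crafted reflection-equivariant feature (absolute image gradients) for learned convolutional features and uses random pixel data, but it shows why spatial pooling of reflection-equivariant maps yields identical responses to a view and its mirror image.
```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))      # stand-in for a rendered object view
mirror = img[:, ::-1]           # left-right reflection of the same view

def equivariant_features(x):
    """Toy reflection-equivariant feature maps: reflecting the input only
    reflects the maps (|horizontal gradient|, |vertical gradient|)."""
    return np.abs(np.diff(x, axis=1)), np.abs(np.diff(x, axis=0))

def global_pool(feature_maps):
    """Downstream unit whose receptive field covers the whole map."""
    return np.array([m.mean() for m in feature_maps])

v_view = global_pool(equivariant_features(img))
v_mirror = global_pool(equivariant_features(mirror))
print(np.allclose(v_view, v_mirror))   # True: pooled responses are mirror-invariant
```
The same cancellation holds for any reflection-equivariant feature bank, which is the core of the explanation offered in the paper.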
Affiliation(s)
- Amirhossein Farzmahdi
- Laboratory of Neural Systems, The Rockefeller University, New York, United States
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Islamic Republic of Iran
- Wilbert Zarco
- Laboratory of Neural Systems, The Rockefeller University, New York, United States
- Winrich A Freiwald
- Laboratory of Neural Systems, The Rockefeller University, New York, United States
- The Center for Brains, Minds & Machines, Cambridge, United States
- Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Department of Psychology, Columbia University, New York, United States
- Department of Neuroscience, Columbia University, New York, United States
- Department of Electrical Engineering, Columbia University, New York, United States
- Tal Golan
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
2
Farzmahdi A, Zarco W, Freiwald W, Kriegeskorte N, Golan T. Emergence of brain-like mirror-symmetric viewpoint tuning in convolutional neural networks. bioRxiv: The Preprint Server for Biology 2023:2023.01.05.522909. PMID: 36711779; PMCID: PMC9881894; DOI: 10.1101/2023.01.05.522909
Abstract
Primates can recognize objects despite 3D geometric variations such as in-depth rotations. The computational mechanisms that give rise to such invariances are yet to be fully understood. A curious case of partial invariance occurs in the macaque face-patch AL and in fully connected layers of deep convolutional networks in which neurons respond similarly to mirror-symmetric views (e.g., left and right profiles). Why does this tuning develop? Here, we propose a simple learning-driven explanation for mirror-symmetric viewpoint tuning. We show that mirror-symmetric viewpoint tuning for faces emerges in the fully connected layers of convolutional deep neural networks trained on object recognition tasks, even when the training dataset does not include faces. First, using 3D objects rendered from multiple views as test stimuli, we demonstrate that mirror-symmetric viewpoint tuning in convolutional neural network models is not unique to faces: it emerges for multiple object categories with bilateral symmetry. Second, we show why this invariance emerges in the models. Learning to discriminate among bilaterally symmetric object categories induces reflection-equivariant intermediate representations. AL-like mirror-symmetric tuning is achieved when such equivariant responses are spatially pooled by downstream units with sufficiently large receptive fields. These results explain how mirror-symmetric viewpoint tuning can emerge in neural networks, providing a theory of how they might emerge in the primate brain. Our theory predicts that mirror-symmetric viewpoint tuning can emerge as a consequence of exposure to bilaterally symmetric objects beyond the category of faces, and that it can generalize beyond previously experienced object categories.
3
Gert AL, Ehinger BV, Timm S, Kietzmann TC, König P. WildLab: A naturalistic free viewing experiment reveals previously unknown electroencephalography signatures of face processing. Eur J Neurosci 2022; 56:6022-6038. PMID: 36113866; DOI: 10.1111/ejn.15824
Abstract
Neural mechanisms of face perception are predominantly studied in well-controlled experimental settings that involve random stimulus sequences and fixed eye positions. Although powerful, the employed paradigms are far from what constitutes natural vision. Here, we demonstrate the feasibility of ecologically more valid experimental paradigms using natural viewing behaviour, by combining a free viewing paradigm on natural scenes, free of photographer bias, with advanced data processing techniques that correct for overlap effects and co-varying non-linear dependencies of multiple eye movement parameters. We validate this approach by replicating classic N170 effects in neural responses, triggered by fixation onsets (fixation event-related potentials [fERPs]). Importantly, besides finding a strong correlation between both experiments, our more natural stimulus paradigm yielded smaller variability between subjects than the classic setup. Moving beyond classic temporal and spatial effect locations, our experiment furthermore revealed previously unknown signatures of face processing: This includes category-specific modulation of the event-related potential (ERP)'s amplitude even before fixation onset, as well as adaptation effects across subsequent fixations depending on their history.
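One ingredient mentioned here, correcting fixation-locked responses for temporal overlap, can be sketched with a simple time-expanded regression. This is a simplified stand-in for the study's actual pipeline (which additionally models non-linear eye-movement covariates); all signals below are simulated.
```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, win = 5000, 60                     # continuous "EEG" length, response window
kernel = np.sin(np.linspace(0, np.pi, win))   # true fixation-evoked response
onsets = np.sort(rng.choice(n_samples - win, size=300, replace=False))  # densely overlapping fixations

eeg = rng.normal(0, 0.5, n_samples)
for t in onsets:
    eeg[t:t + win] += kernel                  # overlapping evoked responses add up

# Time-expanded design matrix: one column per post-fixation latency
X = np.zeros((n_samples, win))
for t in onsets:
    for lag in range(win):
        X[t + lag, lag] += 1.0

ferp_deconv, *_ = np.linalg.lstsq(X, eeg, rcond=None)          # overlap-corrected fERP
ferp_naive = np.mean([eeg[t:t + win] for t in onsets], axis=0)  # plain averaging, biased by overlap
print("r(deconvolved, truth) =", round(np.corrcoef(ferp_deconv, kernel)[0, 1], 3))
print("r(naive average, truth) =", round(np.corrcoef(ferp_naive, kernel)[0, 1], 3))
```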
Affiliation(s)
- Anna L Gert
- Institute of Cognitive Science, Osnabrück University, Osnabrück, Germany
- Benedikt V Ehinger
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Stuttgart Center for Simulation Science, University of Stuttgart, Stuttgart, Germany
- Silja Timm
- Institute of Cognitive Science, Osnabrück University, Osnabrück, Germany
- Tim C Kietzmann
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- MRC Cognition and Brain Sciences Unit, Cambridge University, Cambridge, UK
- Peter König
- Institute of Cognitive Science, Osnabrück University, Osnabrück, Germany
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
4
Spoerer CJ, Kietzmann TC, Mehrer J, Charest I, Kriegeskorte N. Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision. PLoS Comput Biol 2020; 16:e1008215. PMID: 33006992; PMCID: PMC7556458; DOI: 10.1371/journal.pcbi.1008215
Abstract
Deep feedforward neural network models of vision dominate in both computational neuroscience and engineering. The primate visual system, by contrast, contains abundant recurrent connections. Recurrent signal flow enables recycling of limited computational resources over time, and so might boost the performance of a physically finite brain or model. Here we show: (1) Recurrent convolutional neural network models outperform feedforward convolutional models matched in their number of parameters in large-scale visual recognition tasks on natural images. (2) Setting a confidence threshold, at which recurrent computations terminate and a decision is made, enables flexible trading of speed for accuracy. At a given confidence threshold, the model expends more time and energy on images that are harder to recognise, without requiring additional parameters for deeper computations. (3) The recurrent model's reaction time for an image predicts the human reaction time for the same image better than several parameter-matched and state-of-the-art feedforward models. (4) Across confidence thresholds, the recurrent model emulates the behaviour of feedforward control models in that it achieves the same accuracy at approximately the same computational cost (mean number of floating-point operations). However, the recurrent model can be run longer (higher confidence threshold) and then outperforms parameter-matched feedforward comparison models. These results suggest that recurrent connectivity, a hallmark of biological visual systems, may be essential for understanding the accuracy, flexibility, and dynamics of human visual recognition.
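A minimal sketch of the confidence-threshold mechanism described here, using a toy evidence-accumulation process in place of the published recurrent network (all numbers are illustrative):
```python
import numpy as np

rng = np.random.default_rng(2)
n_classes, max_steps = 10, 16

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def run_trial(signal, threshold):
    """Accumulate noisy evidence over recurrent timesteps; stop at the confidence threshold."""
    target = rng.integers(n_classes)
    evidence = np.zeros(n_classes)
    for step in range(1, max_steps + 1):
        sample = rng.normal(0, 1, n_classes)
        sample[target] += signal            # weak per-step evidence for the true class
        evidence += sample
        if softmax(evidence).max() >= threshold:
            break                           # decision made: terminate computation early
    return step, evidence.argmax() == target

for thr in (0.6, 0.9, 0.99):
    steps, correct = zip(*(run_trial(1.0, thr) for _ in range(2000)))
    print(f"threshold {thr}: mean steps {np.mean(steps):.1f}, accuracy {np.mean(correct):.2f}")
```
Raising the threshold spends more timesteps in exchange for higher accuracy, the speed-accuracy trade-off the model exploits.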
Affiliation(s)
- Courtney J. Spoerer
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Tim C. Kietzmann
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Johannes Mehrer
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Ian Charest
- School of Psychology and Centre for Human Brain Health, University of Birmingham, United Kingdom
- Nikolaus Kriegeskorte
- Department of Psychology, Department of Neuroscience, Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
5
Bayet L, Zinszer BD, Reilly E, Cataldo JK, Pruitt Z, Cichy RM, Nelson CA, Aslin RN. Temporal dynamics of visual representations in the infant brain. Dev Cogn Neurosci 2020; 45:100860. PMID: 32932205; PMCID: PMC7498752; DOI: 10.1016/j.dcn.2020.100860
Abstract
Tools from computational neuroscience have facilitated the investigation of the neural correlates of mental representations. However, access to the representational content of neural activations early in life has remained limited. We asked whether patterns of neural activity elicited by complex visual stimuli (animals, human body) could be decoded from EEG data gathered from 12-15-month-old infants and adult controls. We assessed pairwise classification accuracy at each time-point after stimulus onset, for individual infants and adults. Classification accuracies rose above chance in both groups, within 500 ms. In contrast to adults, neural representations in infants were not linearly separable across visual domains. Representations were similar within, but not across, age groups. These findings suggest a developmental reorganization of visual representations between the second year of life and adulthood and provide a promising proof-of-concept for the feasibility of decoding EEG data within-subject to assess how the infant brain dynamically represents visual objects.
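The time-resolved pairwise decoding used here follows a standard recipe: train and test a linear classifier on the sensor pattern at each time point. A minimal sketch on simulated data (the array shapes, injected effect, and classifier choice are illustrative, not the study's exact settings):
```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(3)
n_trials, n_channels, n_times = 120, 64, 200
X = rng.normal(size=(n_trials, n_channels, n_times))   # trials x channels x time
y = rng.integers(2, size=n_trials)                      # two visual categories
X[y == 1, :, 100:140] += 0.3                            # injected category signal after "stimulus onset"

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accuracy = np.array([
    cross_val_score(LinearDiscriminantAnalysis(), X[:, :, t], y, cv=cv).mean()
    for t in range(n_times)
])
print("peak pairwise decoding accuracy:", round(accuracy.max(), 2), "at sample", int(accuracy.argmax()))
```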
Affiliation(s)
- Laurie Bayet
- Department of Psychology, American University, Washington, DC, 20016, USA
- Center for Neuroscience and Behavior, American University, Washington, DC, 20016, USA
- Benjamin D Zinszer
- Department of Psychology, Princeton University, Princeton, NJ, 08540, USA
- Department of Linguistics and Cognitive Science, University of Delaware, Newark, DE, 19716, USA
- Emily Reilly
- Boston Children's Hospital, Boston, MA, 02115, USA
- Zoe Pruitt
- Department of Brain and Cognitive Science, University of Rochester, Rochester, NY, 14627, USA
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, 14195, Berlin, Germany
- Charles A Nelson
- Boston Children's Hospital, Boston, MA, 02115, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Graduate School of Education, Harvard, Cambridge, MA, 02138, USA
- Richard N Aslin
- Haskins Laboratories, 300 George Street, New Haven, CT, 06511, USA
- Psychological Sciences Department, University of Connecticut, Storrs, CT, 06269, USA
- Department of Psychology, Yale University, New Haven, CT, 06511, USA
- Yale Child Study Center, School of Medicine, New Haven, CT, 06519, USA
6
Quantifying the effect of viewpoint changes on sensitivity to face identity. Vision Res 2019; 165:1-12. DOI: 10.1016/j.visres.2019.09.006
7
Jaworska K, Yi F, Ince RAA, van Rijsbergen NJ, Schyns PG, Rousselet GA. Healthy aging delays the neural processing of face features relevant for behavior by 40 ms. Hum Brain Mapp 2019; 41:1212-1225. PMID: 31782861; PMCID: PMC7268067; DOI: 10.1002/hbm.24869
Abstract
Fast and accurate face processing is critical for everyday social interactions, but it declines and becomes delayed with age, as measured by both neural and behavioral responses. Here, we addressed the critical challenge of understanding how aging changes neural information processing mechanisms to delay behavior. Young (20-36 years) and older (60-86 years) adults performed the basic social interaction task of detecting a face versus noise while we recorded their electroencephalogram (EEG). In each participant, using a new information theoretic framework we reconstructed the features supporting face detection behavior, and also where, when and how EEG activity represents them. We found that occipital-temporal pathway activity dynamically represents the eyes of the face images for behavior ~170 ms poststimulus, with a 40 ms delay in older adults that underlies their 200 ms behavioral deficit of slower reaction times. Our results therefore demonstrate how aging can change neural information processing mechanisms that underlie behavioral slow down.
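As a toy illustration of the information-theoretic logic (the published framework is far richer, linking stimulus features, EEG, and behavior), mutual information can quantify how much a single EEG measurement tells us about whether a face feature was visible on a given trial; everything below is simulated:
```python
import numpy as np

rng = np.random.default_rng(4)
n_trials = 2000
eye_visible = rng.integers(2, size=n_trials)               # was the eye region revealed on this trial?
voltage = rng.normal(0, 1, n_trials) + 0.5 * eye_visible   # EEG amplitude carrying some feature information

def mutual_information(x_binary, y_continuous, n_bins=8):
    """Plug-in MI estimate (bits) between a binary variable and a binned continuous one."""
    edges = np.quantile(y_continuous, np.linspace(0, 1, n_bins + 1)[1:-1])
    y_binned = np.digitize(y_continuous, edges)
    joint = np.histogram2d(x_binary, y_binned, bins=(2, n_bins))[0]
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])).sum())

print("MI(eye visibility; EEG amplitude) =", round(mutual_information(eye_visible, voltage), 3), "bits")
```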
Affiliation(s)
- Katarzyna Jaworska
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Fei Yi
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Robin A A Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Philippe G Schyns
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
8
Kampermann L, Wilming N, Alink A, Büchel C, Onat S. Fixation-pattern similarity analysis reveals adaptive changes in face-viewing strategies following aversive learning. eLife 2019; 8:e44111. PMID: 31635690; PMCID: PMC6805121; DOI: 10.7554/elife.44111
Abstract
Animals can effortlessly adapt their behavior by generalizing from past aversive experiences, allowing them to avoid harm in novel situations. We studied how visual information was sampled by eye movements during this process, called fear generalization, using faces organized along a circular two-dimensional perceptual continuum. During learning, one face was conditioned to predict a harmful event, whereas the most dissimilar face stayed neutral. This introduced an adversity gradient along one specific dimension, while the other, unspecific dimension was defined solely by perceptual similarity. Aversive learning changed scanning patterns selectively along the adversity-related dimension, but not the orthogonal dimension. This effect was mainly located within the eye region of faces. Our results provide evidence for adaptive changes in face-viewing strategies following aversive learning, compatible with the view that these changes serve to sample information in a way that supports discriminating safe from adverse faces for better threat prediction.
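The fixation-pattern similarity analysis named in the title reduces, at its core, to comparing smoothed fixation-density maps across conditions. A minimal sketch on invented fixations (the map size, smoothing, and density comparison below are illustrative choices, not the study's exact parameters):
```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(5)

def density_map(fix_x, fix_y, size=50, sigma=2.0):
    """Smoothed 2D fixation-density map from normalized fixation coordinates."""
    counts, _, _ = np.histogram2d(fix_x, fix_y, bins=size, range=[[0, 1], [0, 1]])
    return gaussian_filter(counts, sigma)

# invented fixations for two faces on the perceptual continuum
map_a = density_map(rng.beta(2, 5, 300), rng.random(300))   # scanning biased toward one side
map_b = density_map(rng.beta(5, 2, 300), rng.random(300))   # scanning biased toward the other side

similarity = np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1]
print("fixation-pattern similarity (Pearson r):", round(similarity, 3))
```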
Affiliation(s)
- Lea Kampermann
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Niklas Wilming
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Arjen Alink
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Christian Büchel
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Selim Onat
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
9
Abstract
How do we learn what we know about others? Answering this question requires understanding the perceptual mechanisms with which we recognize individuals and their actions, and the processes by which the resulting perceptual representations lead to inferences about people's mental states and traits. This review discusses recent behavioral, neural, and computational studies that have contributed to this broad research program, encompassing both social perception and social cognition.
Affiliation(s)
- Stefano Anzellotti
- Department of Psychology, Boston College, Boston, Massachusetts 02467, USA
- Liane L Young
- Department of Psychology, Boston College, Boston, Massachusetts 02467, USA
10
Shehzad Z, McCarthy G. Perceptual and Semantic Phases of Face Identification Processing: A Multivariate Electroencephalography Study. J Cogn Neurosci 2019; 31:1827-1839. PMID: 31368824; DOI: 10.1162/jocn_a_01453
Abstract
Rapid identification of a familiar face requires an image-invariant representation of person identity. A varying sample of familiar faces is necessary to disentangle image-level from person-level processing. We investigated the time course of face identity processing using a multivariate electroencephalography analysis. Participants saw ambient exemplars of celebrity faces that differed in pose, lighting, hairstyle, and so forth. A name prime preceded a face on half of the trials to preactivate person-specific information, whereas a neutral prime was used on the remaining half. This manipulation helped dissociate perceptual- and semantic-based identification. Two time intervals within the post-face onset electroencephalography epoch were sensitive to person identity. The early perceptual phase spanned 110-228 msec and was not modulated by the name prime. The late semantic phase spanned 252-1000 msec and was sensitive to person knowledge activated by the name prime. Within this late phase, the identity response occurred earlier in time (300-600 msec) for the name prime with a scalp topography similar to the FN400 ERP. This may reflect a matching of the person primed in memory with the face on the screen. Following a neutral prime, the identity response occurred later in time (500-800 msec) with a scalp topography similar to the P600f ERP. This may reflect activation of semantic knowledge associated with the identity. Our results suggest that processing of identity begins early (110 msec), with some tolerance to image-level variations, and then progresses in stages sensitive to perceptual and then to semantic features.
11
Rajaei K, Mohsenzadeh Y, Ebrahimpour R, Khaligh-Razavi SM. Beyond core object recognition: Recurrent processes account for object recognition under occlusion. PLoS Comput Biol 2019; 15:e1007001. PMID: 31091234; PMCID: PMC6538196; DOI: 10.1371/journal.pcbi.1007001
Abstract
Core object recognition, the ability to rapidly recognize objects despite variations in their appearance, is largely solved through the feedforward processing of visual information. Deep neural networks have been shown to achieve human-level performance in these tasks and to explain primate brain representations. On the other hand, object recognition under more challenging conditions (i.e. beyond the core recognition problem) is less characterized. One such example is object recognition under occlusion. It is unclear to what extent feedforward and recurrent processes contribute to object recognition under occlusion. Furthermore, we do not know whether conventional deep neural networks, such as AlexNet, which were shown to be successful in solving core object recognition, can perform similarly well on problems that go beyond core recognition. Here, we characterize the neural dynamics of object recognition under occlusion, using magnetoencephalography (MEG), while participants were presented with images of objects at various levels of occlusion. We provide evidence from multivariate analysis of MEG data, behavioral data, and computational modelling, demonstrating an essential role for recurrent processes in object recognition under occlusion. Furthermore, the computational model with local recurrent connections used here suggests a mechanistic explanation of how the human brain might be solving this problem.

In recent years, deep-learning-based computer vision algorithms have been able to achieve human-level performance in several object recognition tasks. This has also contributed to our understanding of how our brain may be solving these recognition tasks. However, object recognition under more challenging conditions, such as occlusion, is less characterized. The temporal dynamics of object recognition under occlusion are largely unknown in the human brain. Furthermore, we do not know if the previously successful deep-learning algorithms can similarly achieve human-level performance in these more challenging object recognition tasks. By linking brain data with behavior and computational modeling, we characterized the temporal dynamics of object recognition under occlusion and proposed a computational mechanism that explains both the behavioral and the neural data in humans. This provides a plausible mechanistic explanation for how our brain might solve object recognition under more challenging conditions.
Affiliation(s)
- Karim Rajaei
- School of Cognitive Sciences (SCS), Institute for Research in Fundamental Sciences (IPM), Niavaran, Tehran, Iran
- Yalda Mohsenzadeh
- Computer Science and AI Lab (CSAIL), MIT, Cambridge, Massachusetts, United States of America
- Reza Ebrahimpour
- School of Cognitive Sciences (SCS), Institute for Research in Fundamental Sciences (IPM), Niavaran, Tehran, Iran
- Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran
- Seyed-Mahdi Khaligh-Razavi
- Computer Science and AI Lab (CSAIL), MIT, Cambridge, Massachusetts, United States of America
- Department of Brain and Cognitive Sciences, Cell Science Research Center, Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran
12
Sama MA, Nestor A, Cant JS. Independence of viewpoint and identity in face ensemble processing. J Vis 2019; 19:2. DOI: 10.1167/19.5.2
Affiliation(s)
- Marco A. Sama
- Department of Psychology, University of Toronto Scarborough, Toronto, Canada
- Adrian Nestor
- Department of Psychology, University of Toronto Scarborough, Toronto, Canada
- Jonathan S. Cant
- Department of Psychology, University of Toronto Scarborough, Toronto, Canada
13
Dobs K, Isik L, Pantazis D, Kanwisher N. How face perception unfolds over time. Nat Commun 2019; 10:1258. PMID: 30890707; PMCID: PMC6425020; DOI: 10.1038/s41467-019-09239-1
Abstract
Within a fraction of a second of viewing a face, we have already determined its gender, age and identity. A full understanding of this remarkable feat will require a characterization of the computational steps it entails, along with the representations extracted at each. Here, we used magnetoencephalography (MEG) to measure the time course of neural responses to faces, thereby addressing two fundamental questions about how face processing unfolds over time. First, using representational similarity analysis, we found that facial gender and age information emerged before identity information, suggesting a coarse-to-fine processing of face dimensions. Second, identity and gender representations of familiar faces were enhanced very early on, suggesting that the behavioral benefit for familiar faces results from tuning of early feed-forward processing mechanisms. These findings start to reveal the time course of face processing in humans, and provide powerful new constraints on computational theories of face perception.
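The representational similarity analysis behind these results can be sketched compactly: compute a neural representational dissimilarity matrix (RDM) from the MEG pattern at every time point and correlate it with model RDMs for the dimensions of interest. The data and model RDMs below are random or arbitrary stand-ins, not the study's stimuli or models:
```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(6)
n_conditions, n_sensors, n_times = 16, 306, 100
patterns = rng.normal(size=(n_conditions, n_sensors, n_times))        # condition-average MEG patterns

gender = (np.arange(n_conditions) % 2).astype(float).reshape(-1, 1)   # alternating hypothetical genders
model_rdms = {
    "identity": pdist(rng.normal(size=(n_conditions, 4))),            # hypothetical identity model RDM
    "gender": pdist(gender),                                          # hypothetical gender model RDM
}

for name, model in model_rdms.items():
    rho = [spearmanr(pdist(patterns[:, :, t], metric="correlation"), model)[0]
           for t in range(n_times)]
    print(f"{name}: peak model correlation {max(rho):.3f} at sample {int(np.argmax(rho))}")
```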
Affiliation(s)
- Katharina Dobs
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- McGovern Institute of Brain Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- The Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Leyla Isik
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- McGovern Institute of Brain Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Dimitrios Pantazis
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- McGovern Institute of Brain Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- McGovern Institute of Brain Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
14
Symmetrical Viewpoint Representations in Face-Selective Regions Convey an Advantage in the Perception and Recognition of Faces. J Neurosci 2019; 39:3741-3751. PMID: 30842248; DOI: 10.1523/jneurosci.1977-18.2019
Abstract
Learning new identities is crucial for effective social interaction. A critical aspect of this process is the integration of different images from the same face into a view-invariant representation that can be used for recognition. The representation of symmetrical viewpoints has been proposed to be a key computational step in achieving view-invariance. The aim of this study was to determine whether the representation of symmetrical viewpoints in face-selective regions is directly linked to the perception and recognition of face identity. In Experiment 1, we measured fMRI responses while male and female human participants viewed images of real faces from different viewpoints (-90, -45, 0, 45, and 90° from full-face view). Within the face regions, patterns of neural response to symmetrical views (-45 and 45° or -90 and 90°) were more similar than responses to nonsymmetrical views in the fusiform face area and superior temporal sulcus, but not in the occipital face area. In Experiment 2, participants made perceptual similarity judgements to pairs of face images. Images with symmetrical viewpoints were reported as being more similar than nonsymmetric views. In Experiment 3, we asked whether symmetrical views also convey an advantage when learning new faces. We found that recognition was best when participants were tested with novel face images that were symmetrical to the learning viewpoint. Critically, the pattern of perceptual similarity and recognition across different viewpoints predicted the pattern of neural response in face-selective regions. Together, our results provide support for the functional value of symmetry as an intermediate step in generating view-invariant representations.

SIGNIFICANCE STATEMENT: The recognition of identity from faces is crucial for successful social interactions. A critical step in this process is the integration of different views into a unified, view-invariant representation. The representation of symmetrical views (e.g., left profile and right profile) has been proposed as an important intermediate step in computing view-invariant representations. We found view symmetric representations were specific to some face-selective regions, but not others. We also show that these neural representations influence the perception of faces. Symmetric views were perceived to be more similar and were recognized more accurately than nonsymmetric views. Moreover, the perception and recognition of faces at different viewpoints predicted patterns of response in those face regions with view symmetric representations.
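The key fMRI comparison, higher pattern similarity for mirror-symmetric than non-symmetric viewpoints, can be illustrated with a toy simulation in which symmetric views share a response component (all patterns below are synthetic, not the study's data):
```python
import numpy as np

rng = np.random.default_rng(7)
views = [-90, -45, 0, 45, 90]
n_voxels = 200
shared = {abs(v): rng.normal(size=n_voxels) for v in views}                 # component shared by mirror views
patterns = {v: shared[abs(v)] + 0.8 * rng.normal(size=n_voxels) for v in views}

def r(a, b):
    return np.corrcoef(patterns[a], patterns[b])[0, 1]

symmetric = np.mean([r(-45, 45), r(-90, 90)])
nonsymmetric = np.mean([r(-90, 45), r(-45, 90), r(-90, 0), r(90, 0)])
print("mean pattern correlation, symmetric pairs:", round(symmetric, 2),
      "| non-symmetric pairs:", round(nonsymmetric, 2))
```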
15
Grill-Spector K, Weiner KS, Gomez J, Stigliani A, Natu VS. The functional neuroanatomy of face perception: from brain measurements to deep neural networks. Interface Focus 2018; 8:20180013. PMID: 29951193; PMCID: PMC6015811; DOI: 10.1098/rsfs.2018.0013
Abstract
A central goal in neuroscience is to understand how processing within the ventral visual stream enables rapid and robust perception and recognition. Recent neuroscientific discoveries have significantly advanced understanding of the function, structure and computations along the ventral visual stream that serve as the infrastructure supporting this behaviour. In parallel, significant advances in computational models, such as hierarchical deep neural networks (DNNs), have brought machine performance to a level that is commensurate with human performance. Here, we propose a new framework using the ventral face network as a model system to illustrate how increasing the neural accuracy of present DNNs may allow researchers to test the computational benefits of the functional architecture of the human brain. Thus, the review (i) considers specific neural implementational features of the ventral face network, (ii) describes similarities and differences between the functional architecture of the brain and DNNs, and (iii) provides a hypothesis for the computational value of implementational features within the brain that may improve DNN performance. Importantly, this new framework promotes the incorporation of neuroscientific findings into DNNs in order to test the computational benefits of fundamental organizational features of the visual system.
Affiliation(s)
- Kalanit Grill-Spector
- Department of Psychology, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Stanford Neurosciences Institute, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Kevin S. Weiner
- Department of Psychology, University of California Berkeley, Berkeley, CA 94720, USA
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA 94720, USA
- Jesse Gomez
- Stanford Neurosciences Program, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Anthony Stigliani
- Department of Psychology, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Vaidehi S. Natu
- Department of Psychology, School of Medicine, Stanford University, Stanford, CA 94305, USA
16
Khaligh-Razavi SM, Cichy RM, Pantazis D, Oliva A. Tracking the Spatiotemporal Neural Dynamics of Real-world Object Size and Animacy in the Human Brain. J Cogn Neurosci 2018; 30:1559-1576. PMID: 29877767; DOI: 10.1162/jocn_a_01290
Abstract
Animacy and real-world size are properties that describe any object and thus bring basic order into our perception of the visual world. Here, we investigated how the human brain processes real-world size and animacy. For this, we applied representational similarity to fMRI and MEG data to yield a view of brain activity with high spatial and temporal resolutions, respectively. Analysis of fMRI data revealed that a distributed and partly overlapping set of cortical regions extending from occipital to ventral and medial temporal cortex represented animacy and real-world size. Within this set, parahippocampal cortex stood out as the region representing animacy and size stronger than most other regions. Further analysis of the detailed representational format revealed differences among regions involved in processing animacy. Analysis of MEG data revealed overlapping temporal dynamics of animacy and real-world size processing starting at around 150 msec and provided the first neuromagnetic signature of real-world object size processing. Finally, to investigate the neural dynamics of size and animacy processing simultaneously in space and time, we combined MEG and fMRI with a novel extension of MEG-fMRI fusion by representational similarity. This analysis revealed partly overlapping and distributed spatiotemporal dynamics, with parahippocampal cortex singled out as a region that represented size and animacy persistently when other regions did not. Furthermore, the analysis highlighted the role of early visual cortex in representing real-world size. A control analysis revealed that the neural dynamics of processing animacy and size were distinct from the neural dynamics of processing low-level visual features. Together, our results provide a detailed spatiotemporal view of animacy and size processing in the human brain.
17
Guggenmos M, Sterzer P, Cichy RM. Multivariate pattern analysis for MEG: A comparison of dissimilarity measures. Neuroimage 2018; 173:434-447. DOI: 10.1016/j.neuroimage.2018.02.044
18
Ramírez FM. Orientation Encoding and Viewpoint Invariance in Face Recognition: Inferring Neural Properties from Large-Scale Signals. Neuroscientist 2018; 24:582-608. PMID: 29855217; DOI: 10.1177/1073858418769554
Abstract
Viewpoint-invariant face recognition is thought to be subserved by a distributed network of occipitotemporal face-selective areas that, except for the human anterior temporal lobe, have been shown to also contain face-orientation information. This review begins by highlighting the importance of bilateral symmetry for viewpoint-invariant recognition and face-orientation perception. Then, monkey electrophysiological evidence is surveyed describing key tuning properties of face-selective neurons, including neurons bimodally tuned to mirror-symmetric face views, followed by studies combining functional magnetic resonance imaging (fMRI) and multivariate pattern analyses to probe the representation of face-orientation and identity information in humans. Altogether, neuroimaging studies suggest that face identity is gradually disentangled from face-orientation information along the ventral visual processing stream. The evidence seems to diverge, however, regarding the prevalent form of tuning of neural populations in human face-selective areas. In this context, caveats possibly leading to erroneous inferences regarding mirror-symmetric coding are exposed, including the need to distinguish angular from Euclidean distances when interpreting multivariate pattern analyses. On this basis, this review argues that evidence from the fusiform face area is best explained by a view-sensitive code reflecting head angular disparity, consistent with a role of this area in face-orientation perception. Finally, the review stresses the importance of explicit models relating neural properties to large-scale signals.
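The caveat about angular versus Euclidean distances has a simple worked illustration: a uniform gain change leaves correlation (angular) distance untouched while changing Euclidean distance, so the two metrics can license different conclusions about mirror-symmetric coding. The patterns below are invented:
```python
import numpy as np

rng = np.random.default_rng(9)
a = rng.normal(size=100)             # response pattern for one face view
b = a + rng.normal(0, 0.5, 100)      # pattern for a nearby view
b_scaled = 3.0 * b                   # same pattern with higher overall gain

def corr_distance(x, y):
    return 1.0 - np.corrcoef(x, y)[0, 1]

print("correlation distance:", round(corr_distance(a, b), 3), "vs scaled:", round(corr_distance(a, b_scaled), 3))
print("Euclidean distance:", round(np.linalg.norm(a - b), 2), "vs scaled:", round(np.linalg.norm(a - b_scaled), 2))
```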
Affiliation(s)
- Fernando M Ramírez
- Bernstein Center for Computational Neuroscience, Charité Universitätsmedizin Berlin, Berlin, Germany
19
Kuo PC, Chen YS, Chen LF. Manifold decoding for neural representations of face viewpoint and gaze direction using magnetoencephalographic data. Hum Brain Mapp 2018; 39:2191-2209. PMID: 29430792; DOI: 10.1002/hbm.23998
Abstract
The main challenge in decoding neural representations lies in linking neural activity to representational content or abstract concepts. The transformation from a neural-based to a low-dimensional representation may hold the key to encoding perceptual processes in the human brain. In this study, we developed a novel model by which to represent two changeable features of faces: face viewpoint and gaze direction. These features are embedded in spatiotemporal brain activity derived from magnetoencephalographic data. Our decoding results demonstrate that face viewpoint and gaze direction can be represented by manifold structures constructed from brain responses in the bilateral occipital face area and right superior temporal sulcus, respectively. Our results also show that the superposition of brain activity in the manifold space reveals the viewpoints of faces as well as directions of gazes as perceived by the subject. The proposed manifold representation model provides a novel opportunity to gain further insight into the processing of information in the human brain.
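A minimal sketch of the manifold idea: embed sensor patterns into a low-dimensional space with a standard manifold-learning algorithm and read the viewpoint off the embedding. This uses Isomap and simulated MEG-like data as stand-ins for the paper's specific decoding model:
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
n_trials, n_sensors = 300, 100
viewpoint = rng.uniform(-90, 90, n_trials)                  # face rotation in degrees
latent = np.c_[np.sin(np.deg2rad(viewpoint)), np.cos(np.deg2rad(viewpoint))]
mixing = rng.normal(size=(2, n_sensors))
patterns = latent @ mixing + rng.normal(0, 0.3, (n_trials, n_sensors))

embedding = Isomap(n_components=2, n_neighbors=10).fit_transform(patterns)
r2 = cross_val_score(LinearRegression(), embedding, viewpoint, cv=5, scoring="r2")
print("viewpoint read out from 2-D manifold coordinates, mean R^2:", round(r2.mean(), 2))
```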
Affiliation(s)
- Po-Chih Kuo
- Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
- Yong-Sheng Chen
- Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Biomedical Engineering, National Chiao Tung University, Hsinchu, Taiwan
- Li-Fen Chen
- Institute of Brain Science, National Yang-Ming University, Taipei, Taiwan
- Integrated Brain Research Unit, Department of Medical Research, Taipei Veterans General Hospital, Taipei, Taiwan