1. Li W, Li J, Chu C, Cao D, Shi W, Zhang Y, Jiang T. Common Sequential Organization of Face Processing in the Human Brain and Convolutional Neural Networks. Neuroscience 2024;541:1-13. PMID: 38266906. DOI: 10.1016/j.neuroscience.2024.01.015.
Abstract
Face processing includes two crucial processing levels - face detection and face recognition. However, it remains unclear how the human brain organizes the two processing levels sequentially. While some studies found that faces are recognized as fast as they are detected, others have reported that faces are detected first, followed by recognition. We discriminated the two processing levels on a fine time scale by combining human intracranial EEG (two females, three males, and three subjects without reported sex information) and representational similarity analysis. Our results demonstrate that the human brain exhibits a "detection-first, recognition-later" pattern during face processing. In addition, we used convolutional neural networks to test the hypothesis that the sequential organization of the two face processing levels in the brain reflects computational optimization. Our findings showed that networks trained on face recognition also exhibited the "detection-first, recognition-later" pattern. Moreover, this sequential organization developed gradually during network training and was observed only for correctly predicted images. These findings collectively support a computational-optimization account of why the brain organizes the two processing levels in this way.
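The core of the representational similarity analysis used in studies like this can be sketched in a few lines. The data below are synthetic toy stand-ins; the electrode and stimulus counts and the `rdm`/`rsa_score` helpers are illustrative assumptions, not the authors' code:

```python
import numpy as np

def rdm(responses):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the response patterns (rows) for every pair of stimuli."""
    return 1.0 - np.corrcoef(responses)

def rsa_score(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    a, b = rdm_a[iu], rdm_b[iu]
    # rank-transform, then Pearson on the ranks = Spearman correlation
    ar = np.argsort(np.argsort(a)).astype(float)
    br = np.argsort(np.argsort(b)).astype(float)
    ar -= ar.mean()
    br -= br.mean()
    return float(ar @ br / np.sqrt((ar @ ar) * (br @ br)))

rng = np.random.default_rng(0)
brain = rng.normal(size=(20, 64))   # 20 stimuli x 64 "electrodes" (toy iEEG)
model = brain + rng.normal(scale=0.5, size=brain.shape)  # correlated CNN layer
score = rsa_score(rdm(brain), rdm(model))  # high when geometries match
```

Computing such scores at successive time points for detection-level versus recognition-level model representations is what lets the latency of the two processing levels be compared.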
Affiliation(s)
- Wenlu Li: Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- Jin Li: School of Psychology, Capital Normal University, Beijing 100048, China
- Congying Chu: Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Dan Cao: Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Weiyang Shi: Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Yu Zhang: Research Center for Augmented Intelligence, Zhejiang Lab, Hangzhou 311100, China
- Tianzi Jiang: Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Research Center for Augmented Intelligence, Zhejiang Lab, Hangzhou 311100, China; Xiaoxiang Institute for Brain Health and Yongzhou Central Hospital, Yongzhou 425000, Hunan Province, China
2. Wheatley T, Thornton MA, Stolk A, Chang LJ. The Emerging Science of Interacting Minds. Perspect Psychol Sci 2024;19:355-373. PMID: 38096443. PMCID: PMC10932833. DOI: 10.1177/17456916231200177.
Abstract
For over a century, psychology has focused on uncovering mental processes of a single individual. However, humans rarely navigate the world in isolation. The most important determinants of successful development, mental health, and our individual traits and preferences arise from interacting with other individuals. Social interaction underpins who we are, how we think, and how we behave. Here we discuss the key methodological challenges that have limited progress in establishing a robust science of how minds interact and the new tools that are beginning to overcome these challenges. A deep understanding of the human mind requires studying the context within which it originates and exists: social interaction.
Affiliation(s)
- Thalia Wheatley: Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA; Santa Fe Institute
- Mark A. Thornton: Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA
- Arjen Stolk: Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA
- Luke J. Chang: Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA
3. Faghel-Soubeyrand S, Ramon M, Bamps E, Zoia M, Woodhams J, Richoz AR, Caldara R, Gosselin F, Charest I. Decoding face recognition abilities in the human brain. PNAS Nexus 2024;3:pgae095. PMID: 38516275. PMCID: PMC10957238. DOI: 10.1093/pnasnexus/pgae095.
Abstract
Why are some individuals better at recognizing faces? Uncovering the neural mechanisms supporting face recognition ability has proven elusive. To tackle this challenge, we used a multimodal data-driven approach combining neuroimaging, computational modeling, and behavioral tests. We recorded the high-density electroencephalographic brain activity of individuals with extraordinary face recognition abilities-super-recognizers-and typical recognizers in response to diverse visual stimuli. Using multivariate pattern analyses, we decoded face recognition abilities from 1 s of brain activity with up to 80% accuracy. To better understand the mechanisms subtending this decoding, we compared representations in the brains of our participants with those in artificial neural network models of vision and semantics, as well as with those involved in human judgments of shape and meaning similarity. Compared to typical recognizers, we found stronger associations between early brain representations of super-recognizers and midlevel representations of vision models as well as shape similarity judgments. Moreover, we found stronger associations between late brain representations of super-recognizers and representations of the artificial semantic model as well as meaning similarity judgments. Overall, these results indicate that important individual variations in brain processing, including neural computations extending beyond purely visual processes, support differences in face recognition abilities. They provide the first empirical evidence for an association between semantic computations and face recognition abilities. We believe that such multimodal data-driven approaches will likely play a critical role in further revealing the complex nature of idiosyncratic face recognition in the human brain.
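A minimal stand-in for the multivariate pattern decoding described above, using a cross-validated nearest-centroid classifier on synthetic "EEG" patterns. The data shapes, helper name, and classifier choice are illustrative assumptions, not the study's pipeline:

```python
import numpy as np

def nearest_centroid_cv(X, y, n_folds=5):
    """Cross-validated decoding accuracy with a nearest-centroid classifier,
    a minimal stand-in for multivariate pattern analysis."""
    rng = np.random.default_rng(1)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    correct = 0
    for test in folds:
        train = np.setdiff1d(idx, test)
        c0 = X[train][y[train] == 0].mean(axis=0)  # class-0 centroid
        c1 = X[train][y[train] == 1].mean(axis=0)  # class-1 centroid
        for i in test:
            pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
            correct += pred == y[i]
    return correct / len(y)

rng = np.random.default_rng(0)
# toy "EEG patterns": 40 trials x 128 channels; the two groups' patterns
# differ by a small shift across channels
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 128)) + y[:, None] * 0.5
acc = nearest_centroid_cv(X, y)  # above 0.5 (chance) when groups separate
```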
Affiliation(s)
- Simon Faghel-Soubeyrand: Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, UK; Département de psychologie, Université de Montréal, Montréal, Québec H2V 2S9, Canada
- Meike Ramon: Institute of Psychology, University of Lausanne, Lausanne CH-1015, Switzerland
- Eva Bamps: Center for Contextual Psychiatry, Department of Neurosciences, KU Leuven, Leuven ON5, Belgium
- Matteo Zoia: Department for Biomedical Research, University of Bern, Bern 3008, Switzerland
- Jessica Woodhams: Département de psychologie, Université de Montréal, Montréal, Québec H2V 2S9, Canada; School of Psychology, University of Birmingham, Hills Building, Edgbaston Park Rd, Birmingham B15 2TT, UK
- Roberto Caldara: Département de psychologie, Université de Fribourg, Fribourg CH-1700, Switzerland
- Frédéric Gosselin: Département de psychologie, Université de Montréal, Montréal, Québec H2V 2S9, Canada
- Ian Charest: Département de psychologie, Université de Montréal, Montréal, Québec H2V 2S9, Canada
4. Shoham A, Grosbard ID, Patashnik O, Cohen-Or D, Yovel G. Using deep neural networks to disentangle visual and semantic information in human perception and memory. Nat Hum Behav 2024. PMID: 38332339. DOI: 10.1038/s41562-024-01816-9.
Abstract
Mental representations of familiar categories are composed of visual and semantic information. Disentangling the contributions of visual and semantic information in humans is challenging because they are intermixed in mental representations. Deep neural networks that are trained either on images or on text or by pairing images and text enable us now to disentangle human mental representations into their visual, visual-semantic and semantic components. Here we used these deep neural networks to uncover the content of human mental representations of familiar faces and objects when they are viewed or recalled from memory. The results show a larger visual than semantic contribution when images are viewed and a reversed pattern when they are recalled. We further reveal a previously unknown unique contribution of an integrated visual-semantic representation in both perception and memory. We propose a new framework in which visual and semantic information contribute independently and interactively to mental representations in perception and memory.
Affiliation(s)
- Adva Shoham: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Idan Daniel Grosbard: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel; Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel; The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Or Patashnik: The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Daniel Cohen-Or: The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Galit Yovel: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel; Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
5. Cao R, Wang J, Brunner P, Willie JT, Li X, Rutishauser U, Brandmeir NJ, Wang S. Neural mechanisms of face familiarity and learning in the human amygdala and hippocampus. Cell Rep 2024;43:113520. PMID: 38151023. PMCID: PMC10834150. DOI: 10.1016/j.celrep.2023.113520.
Abstract
Recognizing familiar faces and learning new faces play an important role in social cognition. However, the underlying neural computational mechanisms remain unclear. Here, we record from single neurons in the human amygdala and hippocampus and find a greater neuronal representational distance between pairs of familiar faces than unfamiliar faces, suggesting that neural representations for familiar faces are more distinct. Representational distance increases with exposures to the same identity, suggesting that neural face representations are sharpened with learning and familiarization. Furthermore, representational distance is positively correlated with visual dissimilarity between faces, and exposure to visually similar faces increases representational distance, thus sharpening neural representations. Finally, we construct a computational model that demonstrates an increase in the representational distance of artificial units with training. Together, our results suggest that the neuronal population geometry, quantified by the representational distance, encodes face familiarity, similarity, and learning, forming the basis of face recognition and memory.
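The representational-distance measure can be illustrated with toy population vectors. All data below are synthetic, and `mean_pairwise_distance` is an illustrative helper, not the study's code:

```python
import numpy as np

def mean_pairwise_distance(responses):
    """Mean Euclidean distance between all pairs of population response
    vectors (rows): larger means more distinct neural codes."""
    n = len(responses)
    dists = [np.linalg.norm(responses[i] - responses[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
# toy 50-neuron population codes for 8 identities; "familiar" codes are
# modeled as sharpened (more separated) than "unfamiliar" ones, mirroring
# the pattern the study reports
unfamiliar = rng.normal(scale=0.5, size=(8, 50))
familiar = rng.normal(scale=1.5, size=(8, 50))
d_fam = mean_pairwise_distance(familiar)
d_unf = mean_pairwise_distance(unfamiliar)
```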
Affiliation(s)
- Runnan Cao: Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA; Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
- Jinge Wang: Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
- Peter Brunner: Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO 63110, USA
- Jon T Willie: Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO 63110, USA
- Xin Li: Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
- Ueli Rutishauser: Departments of Neurosurgery and Neurology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Shuo Wang: Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA; Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA; Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO 63110, USA
6. Yovel G, Abudarham N. Why psychologists should embrace rather than abandon DNNs. Behav Brain Sci 2023;46:e414. PMID: 38054326. DOI: 10.1017/s0140525x2300167x.
Abstract
Deep neural networks (DNNs) are powerful computational models, which generate complex, high-level representations that were missing in previous models of human cognition. By studying these high-level representations, psychologists can now gain new insights into the nature and origin of human high-level vision, which was not possible with traditional handcrafted models. Abandoning DNNs would be a huge oversight for psychological sciences.
Affiliation(s)
- Galit Yovel: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel (https://people.socsci.tau.ac.il/mu/galityovel/); Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Naphtali Abudarham: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
7. Wang A, Sliwinska MW, Watson DM, Smith S, Andrews TJ. Distinct patterns of neural response to faces from different races in humans and deep networks. Soc Cogn Affect Neurosci 2023;18:nsad059. PMID: 37837305. PMCID: PMC10634630. DOI: 10.1093/scan/nsad059.
Abstract
Social categories such as the race or ethnicity of an individual are typically conveyed by the visual appearance of the face. The aim of this study was to explore how these differences in facial appearance are represented in human and artificial neural networks. First, we compared the similarity of faces from different races using a neural network trained to discriminate identity. We found that the differences between races were most evident in the fully connected layers of the network. Although these layers were also able to predict behavioural judgements of face identity from human participants, performance was biased toward White faces. Next, we measured the neural response in face-selective regions of the human brain to faces from different races in Asian and White participants. We found distinct patterns of response to faces from different races in face-selective regions. We also found that the spatial pattern of response was more consistent across participants for own-race compared to other-race faces. Together, these findings show that faces from different races elicit different patterns of response in human and artificial neural networks. These differences may underlie the ability to make categorical judgements and explain the behavioural advantage for the recognition of own-race faces.
Affiliation(s)
- Ao Wang: Department of Psychology, University of York, York YO10 5DD, UK; Department of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- Magdalena W Sliwinska: Department of Psychology, University of York, York YO10 5DD, UK; School of Psychology, Liverpool John Moores University, Liverpool L2 2QP, UK
- David M Watson: Department of Psychology, University of York, York YO10 5DD, UK
- Sam Smith: Department of Psychology, University of York, York YO10 5DD, UK
8. Liu K, Chen CY, Wang LS, Jo H, Kung CC. Is increased activation in the fusiform face area to Greebles a result of appropriate expertise training or caused by Greebles' face likeness? Front Neurosci 2023;17:1224721. PMID: 37916181. PMCID: PMC10616304. DOI: 10.3389/fnins.2023.1224721.
Abstract
Background: In 2011, Brants et al. trained eight individuals to become Greeble experts and found neuronal inversion effects [NIEs; i.e., higher fusiform face area (FFA) activity for upright rather than inverted Greebles]. These effects were also found for faces, both before and after training. Claiming to have replicated the seminal Greeble training study by Gauthier and colleagues in 1999, Brants et al. interpreted these results as evidence that participants viewed Greebles as faces throughout training, contrary to the original argument that subjects become Greeble experts only after training. However, Brants et al.'s claim presents two issues. First, their behavioral training results did not replicate those of Gauthier and Tarr conducted in 1997 and 1998, raising concerns about whether the right training regime had been adopted. Second, both a literature review and a meta-analysis of NIEs in the FFA suggest its impotency as an index of face(-like) processing. Objectives: To empirically evaluate these issues, the present study compared the two training paradigms documented by Gauthier and colleagues in 1997 and 1998 and examined their impact on the brain. Methods: Sixteen NCKU undergraduate and graduate students (nine female) were recruited. Sixty Greeble exemplars were categorized by two genders, five families, and six individual levels. The participants were randomly divided into two groups (one trained on Greeble classification at all three levels and the other on gender- and individual-level classification). Several fMRI tasks were administered at various time points: before training (1st), during training (2nd), and typically no less than 24 h after reaching the expertise criterion (3rd). Results: The ROI analysis showed significant increases in FFA activity for Greebles and clear neural "adaptation," both only in the Gauthier97 group and only after training, reflecting clear modulation by extensive experience following an "appropriate" training regime. In both groups, no clear NIEs were found for faces or Greebles, in line with the review of extant studies bearing on this comparison. Conclusion: Collectively, these results invalidate the assumptions behind Brants et al.'s findings.
Affiliation(s)
- Kuo Liu: School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China; Department of Psychology, National Cheng Kung University, Tainan, Taiwan
- Chiu-Yueh Chen: Department of Psychology, National Cheng Kung University, Tainan, Taiwan; Brain & Cognition, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Le-Si Wang: Institute of Creative Industries Design, National Cheng Kung University, Tainan, Taiwan
- Hanshin Jo: Department of Psychology, National Cheng Kung University, Tainan, Taiwan; Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan
- Chun-Chia Kung: Department of Psychology, National Cheng Kung University, Tainan, Taiwan; Mind Research and Imaging (MRI) Center, National Cheng Kung University, Tainan, Taiwan
9. van Dyck LE, Gruber WR. Modeling Biological Face Recognition with Deep Convolutional Neural Networks. J Cogn Neurosci 2023;35:1521-1537. PMID: 37584587. DOI: 10.1162/jocn_a_02040.
Abstract
Deep convolutional neural networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground, and recent efforts have started to transfer this achievement to research on biological face recognition. In this regard, face detection can be investigated by comparing face-selective biological neurons and brain areas to artificial neurons and model layers. Similarly, face identification can be examined by comparing in vivo and in silico multidimensional "face spaces." In this review, we summarize the first studies that use DCNNs to model biological face recognition. On the basis of a broad spectrum of behavioral and computational evidence, we conclude that DCNNs are useful models that closely resemble the general hierarchical organization of face recognition in the ventral visual pathway and the core face network. In two exemplary spotlights, we emphasize the unique scientific contributions of these models. First, studies on face detection in DCNNs indicate that elementary face selectivity emerges automatically through feedforward processing even in the absence of visual experience. Second, studies on face identification in DCNNs suggest that identity-specific experience and generative mechanisms facilitate this particular challenge. Taken together, as this novel modeling approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), it may be suited to inform long-standing debates on the substrates of biological face recognition.
10.
Abstract
Deep neural networks (DNNs) are machine learning algorithms that have revolutionized computer vision due to their remarkable successes in tasks like object classification and segmentation. The success of DNNs as computer vision algorithms has led to the suggestion that DNNs may also be good models of human visual perception. In this article, we review evidence regarding current DNNs as adequate behavioral models of human core object recognition. To this end, we argue that it is important to distinguish between statistical tools and computational models and to understand model quality as a multidimensional concept in which clarity about modeling goals is key. Reviewing a large number of psychophysical and computational explorations of core object recognition performance in humans and DNNs, we argue that DNNs are highly valuable scientific tools but that, as of today, DNNs should only be regarded as promising-but not yet adequate-computational models of human core object recognition behavior. On the way, we dispel several myths surrounding DNNs in vision science.
Affiliation(s)
- Felix A Wichmann: Neural Information Processing Group, University of Tübingen, Tübingen, Germany
11. Vinken K, Prince JS, Konkle T, Livingstone MS. The neural code for "face cells" is not face-specific. Sci Adv 2023;9:eadg1736. PMID: 37647400. PMCID: PMC10468123. DOI: 10.1126/sciadv.adg1736.
Abstract
Face cells are neurons that respond more to faces than to non-face objects. They are found in clusters in the inferotemporal cortex, thought to process faces specifically, and, hence, studied using faces almost exclusively. Analyzing neural responses in and around macaque face patches to hundreds of objects, we found graded response profiles for non-face objects that predicted the degree of face selectivity and provided information on face-cell tuning beyond that from actual faces. This relationship between non-face and face responses was not predicted by color and simple shape properties but by information encoded in deep neural networks trained on general objects rather than face classification. These findings contradict the long-standing assumption that face versus non-face selectivity emerges from face-specific features and challenge the practice of focusing on only the most effective stimulus. They provide evidence instead that category-selective neurons are best understood by their tuning directions in a domain-general object space.
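At its core, predicting face selectivity from graded non-face responses is a linear-readout question. A hedged sketch on synthetic data (all names, shapes, and the simple least-squares readout are hypothetical illustrations, not the authors' analysis):

```python
import numpy as np

def fit_linear_readout(X, y):
    """Least-squares readout predicting a selectivity index from response
    profiles; returns weights plus an intercept term."""
    Xb = np.column_stack([X, np.ones(len(X))])  # add intercept column
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
nonface = rng.normal(size=(100, 30))     # 100 "cells" x 30 non-face objects
w_true = rng.normal(size=30)             # hidden tuning direction
face_selectivity = nonface @ w_true + rng.normal(scale=0.5, size=100)
coef = fit_linear_readout(nonface, face_selectivity)
pred = np.column_stack([nonface, np.ones(100)]) @ coef
r = float(np.corrcoef(pred, face_selectivity)[0, 1])  # readout quality
```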
Affiliation(s)
- Kasper Vinken: Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Jacob S. Prince: Department of Psychology, Harvard University, Cambridge, MA 02478, USA
- Talia Konkle: Department of Psychology, Harvard University, Cambridge, MA 02478, USA
12. Wen J, Zhang H, Wu Z, Wang Q, Yu H, Sun W, Liang B, He C, Xiong K, Pan Y, Zhang Y, Liu Z. All-optical spiking neural network and optical spike-time-dependent plasticity based on the self-pulsing effect within a micro-ring resonator. Appl Opt 2023;62:5459-5466. PMID: 37706863. DOI: 10.1364/ao.493466.
Abstract
In this paper, we proposed an all-optical version of photonic spiking neurons and spike-time-dependent plasticity (STDP) based on the nonlinear optical effects within a micro-ring resonator. In this system, the self-pulsing effect was exploited to implement threshold control, and the equivalent pulse energy required for spiking, calculated by multiplying the input pulse power amplitude with its duration, was about 14.1 pJ. The positive performance of the neurons in the excitability and cascadability tests validated the feasibility of this scheme. Furthermore, two simulations were performed to demonstrate that such an all-optical spiking neural network incorporated with STDP could run stably on a stochastic topology. The essence of such an all-optical spiking neural network is a nonlinear spiking dynamical system that combines the advantages of photonics and spiking neural networks (SNNs), promising access to the high speed and lower consumption inherent to optical systems.
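For reference, the canonical computational STDP window that such optical schemes emulate can be written in a few lines. The parameter values are illustrative; this is the standard exponential rule, not the authors' optical implementation:

```python
import math

def stdp_dw(dt_ms, a_plus=0.1, a_minus=0.12, tau_ms=20.0):
    """Canonical exponential STDP window: potentiation when the presynaptic
    spike leads the postsynaptic spike (dt_ms > 0), depression otherwise."""
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_ms)
    return -a_minus * math.exp(dt_ms / tau_ms)

dw_pot = stdp_dw(5.0)    # pre leads post by 5 ms -> weight increase
dw_dep = stdp_dw(-5.0)   # post leads pre by 5 ms -> weight decrease
```

Spike pairs with large timing differences fall in the tails of both exponentials, so they barely change the weight, which is what makes the rule timing-selective.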
13. Parde CJ, Strehle VE, Banerjee V, Hu Y, Cavazos JG, Castillo CD, O'Toole AJ. Twin Identification over Viewpoint Change: A Deep Convolutional Neural Network Surpasses Humans. ACM Trans Appl Percept 2023;20:10. PMID: 39131580. PMCID: PMC11315461. DOI: 10.1145/3609224.
Abstract
Deep convolutional neural networks (DCNNs) have achieved human-level accuracy in face identification (Phillips et al., 2018), though it is unclear how accurately they discriminate highly-similar faces. Here, humans and a DCNN performed a challenging face-identity matching task that included identical twins. Participants (N = 87) viewed pairs of face images of three types: same-identity, general imposters (different identities from similar demographic groups), and twin imposters (identical twin siblings). The task was to determine whether the pairs showed the same person or different people. Identity comparisons were tested in three viewpoint-disparity conditions: frontal to frontal, frontal to 45° profile, and frontal to 90°profile. Accuracy for discriminating matched-identity pairs from twin-imposter pairs and general-imposter pairs was assessed in each viewpoint-disparity condition. Humans were more accurate for general-imposter pairs than twin-imposter pairs, and accuracy declined with increased viewpoint disparity between the images in a pair. A DCNN trained for face identification (Ranjan et al., 2018) was tested on the same image pairs presented to humans. Machine performance mirrored the pattern of human accuracy, but with performance at or above all humans in all but one condition. Human and machine similarity scores were compared across all image-pair types. This item-level analysis showed that human and machine similarity ratings correlated significantly in six of nine image-pair types [range r = 0.38 to r = 0.63], suggesting general accord between the perception of face similarity by humans and the DCNN. These findings also contribute to our understanding of DCNN performance for discriminating high-resemblance faces, demonstrate that the DCNN performs at a level at or above humans, and suggest a degree of parity between the features used by humans and the DCNN.
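The item-level analysis boils down to correlating human and machine similarity scores across image pairs. A sketch with synthetic ratings (the names and the noise model are assumptions for illustration):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two score vectors."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

rng = np.random.default_rng(0)
machine = rng.uniform(-1.0, 1.0, size=60)  # toy DCNN similarity scores
human = 0.5 * machine + rng.normal(scale=0.4, size=60)  # noisy agreement
r = pearson_r(human, machine)  # item-level accord between human and machine
```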
Affiliation(s)
- Connor J Parde: School of Behavioral and Brain Sciences, The University of Texas at Dallas, USA
- Virginia E Strehle: School of Behavioral and Brain Sciences, The University of Texas at Dallas, USA
- Vivekjyoti Banerjee: University of Maryland Institute of Advanced Computer Studies, University of Maryland, USA
- Ying Hu: School of Behavioral and Brain Sciences, The University of Texas at Dallas, USA
- Alice J O'Toole: School of Behavioral and Brain Sciences, The University of Texas at Dallas, USA
14. Baker KA, Stabile VJ, Mondloch CJ. Stable individual differences in unfamiliar face identification: Evidence from simultaneous and sequential matching tasks. Cognition 2023;232:105333. PMID: 36508992. DOI: 10.1016/j.cognition.2022.105333.
Abstract
Matching identity in images of unfamiliar faces is difficult: Images of the same person can look different and images of different people can look similar. Recent studies have capitalized on individual differences in the ability to distinguish match (same ID) vs. mismatch (different IDs) face pairs to inform models of face recognition. We addressed two significant gaps in the literature by examining the stability of individual differences in both sensitivity to identity and response bias. In Study 1, 210 participants completed a battery of four tasks in each of two sessions separated by one week. Tasks varied in protocol (same/different, lineup, sorting) and stimulus characteristics (low vs. high within-person variability in appearance). In Study 2, 148 participants completed a battery of three tasks in a single session. Stimuli were presented simultaneously on some trials and sequentially on others, introducing short-term memory demands. Principal components analysis revealed two components that were stable across time and tasks: sensitivity to identity and bias. Analyses of response times suggest that individual differences in bias reflect decision-making processes. We discuss the implications of our findings in applied settings and for models of face recognition.
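The principal components analysis step can be sketched with a toy participants x tasks score matrix built from two latent traits, mirroring the sensitivity/bias structure the study reports (all numbers synthetic; the helper is illustrative, not the authors' analysis):

```python
import numpy as np

def principal_components(scores, k=2):
    """First k principal components (rows of vt) and their explained-variance
    ratios, via SVD of the column-centered participants x tasks matrix."""
    centered = scores - scores.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return vt[:k], explained[:k]

rng = np.random.default_rng(0)
n = 200
sensitivity = rng.normal(size=n)  # latent trait 1: sensitivity to identity
bias = rng.normal(size=n)         # latent trait 2: response bias
# six toy task measures, three loading on each latent trait plus noise
scores = np.column_stack(
    [sensitivity + rng.normal(scale=0.3, size=n) for _ in range(3)]
    + [bias + rng.normal(scale=0.3, size=n) for _ in range(3)]
)
components, explained = principal_components(scores)
```

When the battery really is driven by two stable traits, the first two components absorb most of the variance, which is the pattern interpreted as stable sensitivity and bias factors.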
15
Liao C, Sawayama M, Xiao B. Unsupervised learning reveals interpretable latent representations for translucency perception. PLoS Comput Biol 2023; 19:e1010878. [PMID: 36753520] [PMCID: PMC9942964] [DOI: 10.1371/journal.pcbi.1010878]
Abstract
Humans constantly assess the appearance of materials to plan actions, such as stepping on icy roads without slipping. Visual inference of materials is important but challenging because a given material can appear dramatically different in various scenes. This problem especially stands out for translucent materials, whose appearance strongly depends on lighting, geometry, and viewpoint. Despite this, humans can still distinguish between different materials, and it remains unsolved how to systematically discover visual features pertinent to material inference from natural images. Here, we develop an unsupervised style-based image generation model to identify perceptually relevant dimensions for translucent material appearances from photographs. We find our model, with its layer-wise latent representation, can synthesize images of diverse and realistic materials. Importantly, without supervision, human-understandable scene attributes, including the object's shape, material, and body color, spontaneously emerge in the model's layer-wise latent space in a scale-specific manner. By embedding an image into the learned latent space, we can manipulate specific layers' latent code to modify the appearance of the object in the image. Specifically, we find that manipulating the early layers (coarse spatial scale) transforms the object's shape, while manipulating the later layers (fine spatial scale) modifies its body color. The middle layers of the latent space selectively encode translucency features, and manipulating these layers coherently modifies the translucency appearance without changing the object's shape or body color. Moreover, we find the middle layers of the latent space can successfully predict human translucency ratings, suggesting that translucent impressions are established in mid-to-low spatial scale features. This layer-wise latent representation allows us to systematically discover perceptually relevant image features for human translucency perception. Together, our findings reveal that learning the scale-specific statistical structure of natural images might be crucial for humans to efficiently represent material properties across contexts.
Affiliation(s)
- Chenxi Liao
- Department of Neuroscience, American University, Washington, D.C., District of Columbia, United States of America
- Masataka Sawayama
- Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
- Bei Xiao
- Department of Computer Science, American University, Washington, D.C., District of Columbia, United States of America
16
Jinsi O, Henderson MM, Tarr MJ. Early experience with low-pass filtered images facilitates visual category learning in a neural network model. PLoS One 2023; 18:e0280145. [PMID: 36608003] [PMCID: PMC9821476] [DOI: 10.1371/journal.pone.0280145]
Abstract
Humans are born with very low contrast sensitivity, meaning that inputs to the infant visual system are both blurry and low contrast. Is this solely a byproduct of maturational processes or is there a functional advantage for beginning life with poor visual acuity? We addressed the impact of poor vision during early learning by exploring whether reduced visual acuity facilitated the acquisition of basic-level categories in a convolutional neural network model (CNN), as well as whether any such benefit transferred to subordinate-level category learning. Using the ecoset dataset to simulate basic-level category learning, we manipulated model training curricula along three dimensions: presence of blurred inputs early in training, rate of blur reduction over time, and grayscale versus color inputs. First, a training regime where blur was initially high and was gradually reduced over time, as in human development, improved basic-level categorization performance in a CNN relative to a regime in which non-blurred inputs were used throughout training. Second, when basic-level models were fine-tuned on a task including both basic-level and subordinate-level categories (using the ImageNet dataset), models initially trained with blurred inputs showed a greater performance benefit as compared to models trained exclusively on non-blurred inputs, suggesting that the benefit of blurring generalized from basic-level to subordinate-level categorization. Third, analogous to the low sensitivity to color that infants experience during the first 4-6 months of development, these advantages were observed only when grayscale images were used as inputs. We conclude that poor visual acuity in human newborns may confer functional advantages, including, as demonstrated here, more rapid and accurate acquisition of visual object categories at multiple levels.
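The curriculum described, high blur early in training that is gradually reduced to zero, can be sketched with a simple annealing schedule and a crude blur stand-in. The linear schedule shape, parameter names, and the box-blur substitute for a true Gaussian are illustrative assumptions, not the paper's implementation:

```python
def blur_sigma(epoch, ramp_epochs, sigma_start=4.0):
    """Linearly anneal blur strength from sigma_start down to 0
    over the first ramp_epochs epochs, then train on sharp images."""
    if epoch >= ramp_epochs:
        return 0.0
    return sigma_start * (1.0 - epoch / ramp_epochs)

def box_blur_row(pixels, radius):
    """Crude stand-in for Gaussian blur: moving average over a 1-D row."""
    if radius <= 0:
        return list(pixels)
    out, n = [], len(pixels)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(pixels[lo:hi]) / (hi - lo))
    return out

# Blur schedule for a hypothetical 30-epoch ramp within a 100-epoch run:
sigmas = [blur_sigma(e, ramp_epochs=30) for e in range(100)]
```

In an actual training loop, each input image would be blurred with the sigma for the current epoch before being fed to the network, so early epochs see only coarse structure.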
Affiliation(s)
- Omisa Jinsi
- Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Margaret M. Henderson
- Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Department of Machine Learning, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Michael J. Tarr
- Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Department of Machine Learning, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
17
Zhao S, Wang Y, Tian K. Using AAEHS-Net as an Attention-Based Auxiliary Extraction and Hybrid Subsampled Network for Semantic Segmentation. Comput Intell Neurosci 2022; 2022:1536976. [PMID: 36275973] [PMCID: PMC9586756] [DOI: 10.1155/2022/1536976]
Abstract
Semantic segmentation based on deep learning has advanced remarkably in recent years. However, because shallow features are often neglected, segmentation accuracy has remained limited. To address this issue, this study proposes a semantic segmentation network, the attention-based auxiliary extraction and hybrid subsampled network (AAEHS-Net). The network uses a complementary and enhanced extraction module (CEEM) to capture both deep information and shallow features, which improves the model's edge segmentation. A hybrid subsampled module (HSM) is introduced to reduce the loss of features. Meanwhile, a global max pool and global avg pool module (GAGM) is designed as an attention module that enhances features carrying global and important information while maintaining feature continuity. The proposed AAEHS-Net is evaluated on three datasets: the aerial drone image dataset, the Massachusetts roads dataset, and the Massachusetts buildings dataset. On these datasets, AAEHS-Net reaches accuracies of 90.12%, 96.23%, and 95.15%, exceeding U-Net by 1.15%, 0.88%, and 2.1%, respectively. It also obtains the best values on all evaluation metrics across the three datasets compared with currently popular algorithms.
Affiliation(s)
- Shan Zhao
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Yibo Wang
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Kaiwen Tian
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China
18
Guo Q, Wang Z, Fan D, Wu H. Multi-face detection and alignment using multiple kernels. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108808]
19
Behrmann M, Avidan G. Face perception: computational insights from phylogeny. Trends Cogn Sci 2022; 26:350-363. [PMID: 35232662] [DOI: 10.1016/j.tics.2022.01.006]
Abstract
Studies of face perception in primates elucidate the psychological and neural mechanisms that support this critical and complex ability. Recent progress in characterizing face perception across species, for example in insects and reptiles, has highlighted the ubiquity over phylogeny of this key ability for social interactions and survival. Here, we review the competence in face perception across species and the types of computation that support this behavior. We conclude that the computational complexity of face perception evinced by a species is not related to phylogenetic status and is, instead, largely a product of environmental context and social and adaptive pressures. Integrating findings across evolutionary data permits the derivation of computational principles that shed further light on primate face perception.
Affiliation(s)
- Marlene Behrmann
- Department of Psychology and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA.
- Galia Avidan
- Department of Psychology, Ben Gurion University of the Negev, Beer Sheva, Israel
20
Parde CJ, Colón YI, Hill MQ, Castillo CD, Dhar P, O'Toole AJ. Closing the gap between single-unit and neural population codes: Insights from deep learning in face recognition. J Vis 2021; 21:15. [PMID: 34379084] [PMCID: PMC8363775] [DOI: 10.1167/jov.21.8.15]
Abstract
Single-unit responses and population codes differ in the "read-out" information they provide about high-level visual representations. Diverging local and global read-outs can be difficult to reconcile with in vivo methods. To bridge this gap, we studied the relationship between single-unit and ensemble codes for identity, gender, and viewpoint, using a deep convolutional neural network (DCNN) trained for face recognition. Analogous to the primate visual system, DCNNs develop representations that generalize over image variation, while retaining subject (e.g., gender) and image (e.g., viewpoint) information. At the unit level, we measured the number of single units needed to predict attributes (identity, gender, viewpoint) and the predictive value of individual units for each attribute. Identification was remarkably accurate using random samples of only 3% of the network's output units, and all units had substantial identity-predicting power. Cross-unit responses were minimally correlated, indicating that single units code non-redundant identity cues. Gender and viewpoint classification required large-scale pooling of units; individual units had weak predictive power. At the ensemble level, principal component analysis of face representations showed that identity, gender, and viewpoint separated into high-dimensional subspaces, ordered by explained variance. Unit-based directions in the representational space were compared with the directions associated with the attributes. Identity, gender, and viewpoint contributed to all individual unit responses, undercutting a neural tuning analogy. Instead, single-unit responses carry superimposed, distributed codes for face identity, gender, and viewpoint. This undermines confidence in the interpretation of neural representations from unit response profiles for both DCNNs and, by analogy, high-level vision.
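The unit-sampling result, near-ceiling identification from small random subsets of output units, can be illustrated with a toy nearest-neighbor matcher restricted to a random 3% of feature dimensions. The synthetic descriptors and the "identity = most similar gallery vector by cosine similarity" rule are assumptions for illustration, not the authors' pipeline:

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def identify(probe, gallery, dims):
    """Match a probe descriptor to the most similar gallery identity,
    using only the feature dimensions listed in `dims`."""
    sub = lambda vec: [vec[i] for i in dims]
    scores = {ident: cosine(sub(probe), sub(vec)) for ident, vec in gallery.items()}
    return max(scores, key=scores.get)

random.seed(0)
D = 512                                   # descriptor length (illustrative)
gallery = {i: [random.gauss(0, 1) for _ in range(D)] for i in range(20)}
# Probes are noisy copies of gallery descriptors (same identity, new "image").
probes = {i: [x + random.gauss(0, 0.3) for x in v] for i, v in gallery.items()}
dims = random.sample(range(D), k=round(0.03 * D))    # a random 3% of units
accuracy = sum(identify(p, gallery, dims) == i for i, p in probes.items()) / len(probes)
```

Because identity information is distributed across units rather than localized, even a small random slice of dimensions supports accurate matching in this toy setup, mirroring the paper's observation at the scale of a real DCNN.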
Affiliation(s)
- Connor J Parde
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Y Ivette Colón
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Matthew Q Hill
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Carlos D Castillo
- University of Maryland Institute of Advanced Computer Studies, University of Maryland, College Park, MD, USA
- Prithviraj Dhar
- University of Maryland Institute of Advanced Computer Studies, University of Maryland, College Park, MD, USA
- Alice J O'Toole
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA