1. Dima DC, Janarthanan S, Culham JC, Mohsenzadeh Y. Shared representations of human actions across vision and language. Neuropsychologia 2024; 202:108962. PMID: 39047974; DOI: 10.1016/j.neuropsychologia.2024.108962.
Abstract
Humans can recognize and communicate about many actions performed by others. How are actions organized in the mind, and is this organization shared across vision and language? We collected similarity judgments of human actions depicted through naturalistic videos and sentences, and tested four models of action categorization, defining actions at different levels of abstraction ranging from specific (action verb) to broad (action target: whether an action is directed towards an object, another person, or the self). The similarity judgments reflected a shared organization of action representations across videos and sentences, determined mainly by the target of actions, even after accounting for other semantic features. Furthermore, language model embeddings predicted the behavioral similarity of action videos and sentences, and captured information about the target of actions alongside unique semantic information. Together, our results show that action concepts are similarly organized in the mind across vision and language, and that this organization reflects socially relevant goals.
Affiliation(s)
- Diana C Dima: Dept of Computer Science, Western University, London, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Jody C Culham: Dept of Psychology, Western University, London, Ontario, Canada
- Yalda Mohsenzadeh: Dept of Computer Science, Western University, London, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
2. Kasahara S, Kumasaki N, Shimizu K. Investigating the impact of motion visual synchrony on self face recognition using real time morphing. Sci Rep 2024; 14:13090. PMID: 38849381; PMCID: PMC11161490; DOI: 10.1038/s41598-024-63233-2.
Abstract
Face recognition is a crucial aspect of self-image and social interactions. Previous studies have focused on static images to explore the boundary of self-face recognition. Our research, however, investigates the dynamics of face recognition in contexts involving motor-visual synchrony. We first validated our morphing face metrics for self-face recognition. We then conducted an experiment using state-of-the-art video processing techniques for real-time face identity morphing during facial movement. We examined self-face recognition boundaries under three conditions: synchronous, asynchronous, and static facial movements. Our findings revealed that participants recognized a narrower self-face boundary with moving facial images compared to static ones, with no significant differences between synchronous and asynchronous movements. The direction of morphing consistently biased the recognized self-face boundary. These results suggest that while motor information of the face is vital for self-face recognition, it does not rely on movement synchronization, and the sense of agency over facial movements does not affect facial identity judgment. Our methodology offers a new approach to exploring the 'self-face boundary in action', allowing for an independent examination of motion and identity.
Collapse
Affiliation(s)
- Shunichi Kasahara: Sony Computer Science Laboratories, Inc., Tokyo, 141-0022, Japan; Okinawa Institute of Science and Technology Graduate University, Okinawa, 904-0412, Japan
- Nanako Kumasaki: Sony Computer Science Laboratories, Inc., Tokyo, 141-0022, Japan
- Kye Shimizu: Sony Computer Science Laboratories, Inc., Tokyo, 141-0022, Japan
3. Caplette L, Turk-Browne NB. Computational reconstruction of mental representations using human behavior. Nat Commun 2024; 15:4183. PMID: 38760341; PMCID: PMC11101448; DOI: 10.1038/s41467-024-48114-6.
Abstract
Revealing how the mind represents information is a longstanding goal of cognitive science. However, there is currently no framework for reconstructing the broad range of mental representations that humans possess. Here, we ask participants to indicate what they perceive in images made of random visual features in a deep neural network. We then infer associations between the semantic features of their responses and the visual features of the images. This allows us to reconstruct the mental representations of multiple visual concepts, both those supplied by participants and other concepts extrapolated from the same semantic space. We validate these reconstructions in separate participants and further generalize our approach to predict behavior for new stimuli and in a new task. Finally, we reconstruct the mental representations of individual observers and of a neural network. This framework enables a large-scale investigation of conceptual representations.
Collapse
Affiliation(s)
- Nicholas B Turk-Browne: Department of Psychology, Yale University, New Haven, CT, USA; Wu Tsai Institute, Yale University, New Haven, CT, USA
4. Garlichs A, Blank H. Prediction error processing and sharpening of expected information across the face-processing hierarchy. Nat Commun 2024; 15:3407. PMID: 38649694; PMCID: PMC11035707; DOI: 10.1038/s41467-024-47749-9.
Abstract
The perception and neural processing of sensory information are strongly influenced by prior expectations. The integration of prior and sensory information can manifest through distinct underlying mechanisms: focusing on unexpected input, denoted as prediction error (PE) processing, or amplifying anticipated information via sharpened representation. In this study, we employed computational modeling using deep neural networks combined with representational similarity analyses of fMRI data to investigate these two processes during face perception. Participants were cued to see face images, some generated by morphing two faces, leading to ambiguity in face identity. We show that expected faces were identified faster and perception of ambiguous faces was shifted towards priors. Multivariate analyses uncovered evidence for PE processing across and beyond the face-processing hierarchy from the occipital face area (OFA), via the fusiform face area, to the anterior temporal lobe, and suggest sharpened representations in the OFA. Our findings support the proposition that the brain represents faces grounded in prior expectations.
Collapse
Affiliation(s)
- Annika Garlichs: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246, Hamburg, Germany
- Helen Blank: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246, Hamburg, Germany
5. Shoham A, Grosbard ID, Patashnik O, Cohen-Or D, Yovel G. Using deep neural networks to disentangle visual and semantic information in human perception and memory. Nat Hum Behav 2024. PMID: 38332339; DOI: 10.1038/s41562-024-01816-9.
Abstract
Mental representations of familiar categories are composed of visual and semantic information. Disentangling the contributions of visual and semantic information in humans is challenging because they are intermixed in mental representations. Deep neural networks trained on images, on text, or on paired images and text now enable us to disentangle human mental representations into their visual, visual-semantic and semantic components. Here we used these deep neural networks to uncover the content of human mental representations of familiar faces and objects when they are viewed or recalled from memory. The results show a larger visual than semantic contribution when images are viewed and a reversed pattern when they are recalled. We further reveal a previously unknown unique contribution of an integrated visual-semantic representation in both perception and memory. We propose a new framework in which visual and semantic information contribute independently and interactively to mental representations in perception and memory.
Collapse
Affiliation(s)
- Adva Shoham: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Idan Daniel Grosbard: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel; Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel; The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Or Patashnik: The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Daniel Cohen-Or: The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Galit Yovel: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel; Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
6. Pham TD, Holmes SB, Patel M, Coulthard P. Features and networks of the mandible on computed tomography. R Soc Open Sci 2024; 11:231166. PMID: 38234434; PMCID: PMC10791540; DOI: 10.1098/rsos.231166.
Abstract
The mandible, or lower jaw, is the largest and hardest bone in the human facial skeleton. Fractures of the mandible are reported to be a common facial trauma in emergency medicine, and insights into mandibular morphology in different facial types can be helpful for trauma treatment. Furthermore, features of the mandible play an important role in forensics and anthropology for identifying gender and individuals. Thus, discovering hidden information in the mandible can benefit interdisciplinary research. Here, for the first time, artificial intelligence-based nonlinear dynamics and network analysis are used to discover dissimilar and similar radiographic features of mandibles between male and female subjects. Using a public dataset of 10 computed tomography scans of mandibles, the results suggest a difference in the distribution of spatial autocorrelation between genders, uniqueness in network topologies among individuals, and shared values in recurrence quantification.
Collapse
Affiliation(s)
- Tuan D. Pham: Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Turner Street, London E1 2AD, UK
- Simon B. Holmes: Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Turner Street, London E1 2AD, UK
- Mangala Patel: Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Turner Street, London E1 2AD, UK
- Paul Coulthard: Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Turner Street, London E1 2AD, UK
7. Andrews TJ, Rogers D, Mileva M, Watson DM, Wang A, Burton AM. A narrow band of image dimensions is critical for face recognition. Vision Res 2023; 212:108297. PMID: 37527594; DOI: 10.1016/j.visres.2023.108297.
Abstract
A key challenge in human and computer face recognition is to differentiate information that is diagnostic for identity from other sources of image variation. Here, we used a combined computational and behavioural approach to reveal critical image dimensions for face recognition. Behavioural data were collected using a sorting and matching task with unfamiliar faces and a recognition task with familiar faces. Principal components analysis was used to reveal the dimensions across which the shape and texture of faces in these tasks varied. We then asked which image dimensions were able to predict behavioural performance across these tasks. We found that the ability to predict behavioural responses in the unfamiliar face tasks increased when the early PCA dimensions (i.e. those accounting for most variance) of shape and texture were removed from the analysis. Image similarity also predicted the output of a computer model of face recognition, but again only when the early image dimensions were removed from the analysis. Finally, we found that recognition of familiar faces increased when the early image dimensions were removed, decreased when intermediate dimensions were removed, but then returned to baseline recognition when only later dimensions were removed. Together, these findings suggest that early image dimensions reflect ambient changes, such as changes in viewpoint or lighting, that do not contribute to face recognition. However, there is a narrow band of image dimensions for shape and texture that are critical for the recognition of identity in humans and computer models of face recognition.
Collapse
Affiliation(s)
- Daniel Rogers: Department of Psychology, University of York, York YO10 5DD, UK
- Mila Mileva: Department of Psychology, University of York, York YO10 5DD, UK
- David M Watson: Department of Psychology, University of York, York YO10 5DD, UK
- Ao Wang: Department of Psychology, University of York, York YO10 5DD, UK
- A Mike Burton: Department of Psychology, University of York, York YO10 5DD, UK
8. Tieo S, Dezeure J, Cryer A, Lepou P, Charpentier MJ, Renoult JP. Social and sexual consequences of facial femininity in a non-human primate. iScience 2023; 26:107901. PMID: 37766996; PMCID: PMC10520438; DOI: 10.1016/j.isci.2023.107901.
Abstract
In humans, femininity shapes women's interactions with both genders, but its influence in animals remains unknown. Using 10 years of data on a wild primate, we developed an artificial intelligence-based method to estimate facial femininity from naturalistic portraits. Our method explains up to 30% of the variance in perceived femininity in humans, competing with classical methods that use standardized pictures taken under laboratory conditions. We then showed that femininity estimated for 95 female mandrills significantly correlated with various socio-sexual behaviors. Unexpectedly, less feminine female mandrills were approached and aggressed more frequently by both sexes and received more male copulations, suggesting a positive valuation of masculinity attributes rather than a perception bias. This study contributes to understanding the role of femininity in animal sociality and offers a framework for non-invasive research on visual communication in behavioral ecology.
Collapse
Affiliation(s)
- Sonia Tieo: CEFE, University Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Jules Dezeure: Projet Mandrillus, Fondation Lékédi pour la Biodiversité, Bakoumba BP 52, Gabon
- Anna Cryer: Projet Mandrillus, Fondation Lékédi pour la Biodiversité, Bakoumba BP 52, Gabon
- Pascal Lepou: Projet Mandrillus, Fondation Lékédi pour la Biodiversité, Bakoumba BP 52, Gabon
- Marie J.E. Charpentier: Institut des Sciences de l'Evolution de Montpellier (ISEM), UMR5554 - University of Montpellier/CNRS/IRD/EPHE, Place Eugène Bataillon, 34095 Montpellier Cedex 5, France
9. van Dyck LE, Gruber WR. Modeling biological face recognition with deep convolutional neural networks. J Cogn Neurosci 2023; 35:1521-1537. PMID: 37584587; DOI: 10.1162/jocn_a_02040.
Abstract
Deep convolutional neural networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground, and recent efforts have started to transfer this achievement to research on biological face recognition. In this regard, face detection can be investigated by comparing face-selective biological neurons and brain areas to artificial neurons and model layers. Similarly, face identification can be examined by comparing in vivo and in silico multidimensional "face spaces." In this review, we summarize the first studies that use DCNNs to model biological face recognition. On the basis of a broad spectrum of behavioral and computational evidence, we conclude that DCNNs are useful models that closely resemble the general hierarchical organization of face recognition in the ventral visual pathway and the core face network. In two exemplary spotlights, we emphasize the unique scientific contributions of these models. First, studies on face detection in DCNNs indicate that elementary face selectivity emerges automatically through feedforward processing even in the absence of visual experience. Second, studies on face identification in DCNNs suggest that identity-specific experience and generative mechanisms facilitate this particular challenge. Taken together, as this novel modeling approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), it may be suited to inform long-standing debates on the substrates of biological face recognition.
10. Dobs K, Yuan J, Martinez J, Kanwisher N. Behavioral signatures of face perception emerge in deep neural networks optimized for face recognition. Proc Natl Acad Sci U S A 2023; 120:e2220642120. PMID: 37523537; PMCID: PMC10410721; DOI: 10.1073/pnas.2220642120.
Abstract
Human face recognition is highly accurate and exhibits a number of distinctive and well-documented behavioral "signatures" such as the use of a characteristic representational space, the disproportionate performance cost when stimuli are presented upside down, and the drop in accuracy for faces from races the participant is less familiar with. These and other phenomena have long been taken as evidence that face recognition is "special". But why does human face perception exhibit these properties in the first place? Here, we use deep convolutional neural networks (CNNs) to test the hypothesis that all of these signatures of human face perception result from optimization for the task of face recognition. Indeed, as predicted by this hypothesis, these phenomena are all found in CNNs trained on face recognition, but not in CNNs trained on object recognition, even when additionally trained to detect faces while matching the amount of face experience. To test whether these signatures are in principle specific to faces, we optimized a CNN on car discrimination and tested it on upright and inverted car images. As we found for face perception, the car-trained network showed a drop in performance for inverted vs. upright cars. Similarly, CNNs trained on inverted faces produced an inverted face inversion effect. These findings show that the behavioral signatures of human face perception reflect and are well explained as the result of optimization for the task of face recognition, and that the nature of the computations underlying this task may not be so special after all.
Collapse
Affiliation(s)
- Katharina Dobs: Department of Psychology, Justus Liebig University Giessen, 35394 Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, 35302 Marburg, Germany; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Joanne Yuan: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Julio Martinez: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139; Department of Psychology, Stanford University, Stanford, CA 94305
- Nancy Kanwisher: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
11. Deen B, Schwiedrzik CM, Sliwa J, Freiwald WA. Specialized networks for social cognition in the primate brain. Annu Rev Neurosci 2023; 46:381-401. PMID: 37428602; PMCID: PMC11115357; DOI: 10.1146/annurev-neuro-102522-121410.
Abstract
Primates have evolved diverse cognitive capabilities to navigate their complex social world. To understand how the brain implements critical social cognitive abilities, we describe functional specialization in the domains of face processing, social interaction understanding, and mental state attribution. Systems for face processing are specialized from the level of single cells to populations of neurons within brain regions to hierarchically organized networks that extract and represent abstract social information. Such functional specialization is not confined to the sensorimotor periphery but appears to be a pervasive theme of primate brain organization all the way to the apex regions of cortical hierarchies. Circuits processing social information are juxtaposed with parallel systems involved in processing nonsocial information, suggesting common computations applied to different domains. The emerging picture of the neural basis of social cognition is a set of distinct but interacting subnetworks involved in component processes such as face perception and social reasoning, traversing large parts of the primate brain.
Collapse
Affiliation(s)
- Ben Deen: Psychology Department & Tulane Brain Institute, Tulane University, New Orleans, Louisiana, USA
- Caspar M Schwiedrzik: Neural Circuits and Cognition Lab, European Neuroscience Institute Göttingen, a joint initiative of the University Medical Center Göttingen and the Max Planck Society; Perception and Plasticity Group, German Primate Center, Leibniz Institute for Primate Research; and Leibniz-Science Campus Primate Cognition, Göttingen, Germany
- Julia Sliwa: Sorbonne Université, Institut du Cerveau, ICM, Inserm, CNRS, APHP, Hôpital de la Pitié Salpêtrière, Paris, France
- Winrich A Freiwald: Laboratory of Neural Systems and The Price Family Center for the Social Brain, The Rockefeller University, New York, NY, USA; The Center for Brains, Minds and Machines, Cambridge, Massachusetts, USA
12. Understanding How Cells Probe the World: A Preliminary Step towards Modeling Cell Behavior? Int J Mol Sci 2023; 24:2266. PMID: 36768586; PMCID: PMC9916635; DOI: 10.3390/ijms24032266.
Abstract
Cell biologists have long aimed at quantitatively modeling cell function. Recently, the outstanding progress of high-throughput measurement methods and data processing tools has made this a realistic goal. The aim of this paper is twofold: First, to suggest that, while much progress has been made in modeling cell states and transitions, current accounts of the environmental cues driving these transitions remain insufficient. There is a need for an integrated view of the biochemical, topographical and mechanical information that cells process to make decisions. It might be rewarding in the near future to try to connect cell environmental cues to physiologically relevant outcomes rather than modeling relationships between these cues and internal signaling networks. The second aim of this paper is to review exogenous signals that are sensed by living cells and significantly influence fate decisions. Indeed, in addition to the composition of the surrounding medium, cells are highly sensitive to the properties of neighboring surfaces, including the spatial organization of anchored molecules and substrate mechanical and topographical properties. These properties should thus be included in models of cell behavior. It is also suggested that attempts at cell modeling could strongly benefit from two research lines: (i) trying to decipher the way cells encode the information they retrieve from analyzing their environment, and (ii) developing more standardized means of assessing the quality of proposed models, as was done in other research domains such as protein structure prediction.