1
Peng Y, Gong X, Lu H, Fang F. Human Visual Pathways for Action Recognition versus Deep Convolutional Neural Networks: Representation Correspondence in Late but Not Early Layers. J Cogn Neurosci 2024; 36:2458-2480. PMID: 39106158. DOI: 10.1162/jocn_a_02233.
Abstract
Deep convolutional neural networks (DCNNs) have attained human-level performance in object categorization and exhibit representation alignment between network layers and brain regions. Does such alignment naturally extend to visual tasks beyond recognizing objects in static images? In this study, we expanded the exploration to the recognition of human actions from videos and assessed the representational capabilities and alignment of two-stream DCNNs in comparison with brain regions situated along the ventral and dorsal pathways. Using decoding analysis and representational similarity analysis, we show that DCNN models do not exhibit hierarchical representation alignment with the human brain across visual regions when processing action videos. Instead, later layers of the DCNN models demonstrate greater representational similarity to the human visual cortex. These findings held for two display formats: photorealistic avatars with full-body information and simplified stimuli in the point-light display. The discrepancies in representation alignment suggest fundamental differences in how DCNNs and the human brain represent dynamic, action-related visual information.
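The alignment method named in this abstract, representational similarity analysis, can be illustrated with a minimal sketch: build a representational dissimilarity matrix (RDM) from a DCNN layer's activations and another from a brain region's response patterns, then rank-correlate the two. The arrays below are random placeholders, and the distance metric and array sizes are illustrative assumptions rather than the study's actual choices.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_videos = 20  # one representation per action video

# Hypothetical feature matrices (rows = videos); real inputs would be
# DCNN layer activations and fMRI voxel patterns for the same videos.
layer_acts = rng.standard_normal((n_videos, 512))
roi_patterns = rng.standard_normal((n_videos, 200))

def rdm(features):
    """Condensed representational dissimilarity matrix:
    1 - Pearson r for every pair of videos."""
    return pdist(features, metric="correlation")

# Representational alignment: rank-correlate the two RDMs.
rho, p = spearmanr(rdm(layer_acts), rdm(roi_patterns))
print(f"layer-ROI alignment: Spearman rho = {rho:.3f}, p = {p:.3g}")
```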
Affiliation(s)
- Yujia Peng
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China
- Institute for Artificial Intelligence, Peking University, Beijing, People's Republic of China
- National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence, Beijing, China
- Department of Psychology, University of California, Los Angeles
- Xizi Gong
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China
- Hongjing Lu
- Department of Psychology, University of California, Los Angeles
- Department of Statistics, University of California, Los Angeles
- Fang Fang
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China
- IDG/McGovern Institute for Brain Research, Peking University, Beijing, People's Republic of China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, People's Republic of China
- Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, People's Republic of China
2
Peng Y, Burling JM, Todorova GK, Neary C, Pollick FE, Lu H. Patterns of saliency and semantic features distinguish gaze of expert and novice viewers of surveillance footage. Psychon Bull Rev 2024; 31:1745-1758. PMID: 38273144. PMCID: PMC11358171. DOI: 10.3758/s13423-024-02454-y.
Abstract
When viewing the actions of others, we not only see patterns of body movements but also "see" people's intentions and social relations. Experienced forensic examiners, namely closed-circuit television (CCTV) operators, have been shown to outperform novices in identifying and predicting hostile intentions from surveillance footage. However, it remains largely unknown what visual content CCTV operators actively attend to, and whether they develop strategies for active information seeking that differ from those of novices. Here, we conducted a computational analysis of gaze-centered stimuli derived from the eye movements of experienced CCTV operators and novices viewing the same surveillance footage. Low-level image features were extracted with a visual saliency model, whereas object-level semantic features were extracted from gaze-centered regions with a deep convolutional neural network (DCNN), AlexNet. We found that the looking behavior of CCTV operators differs from that of novices in actively attending to visual content with distinct patterns of saliency and semantic features. Expertise in selectively exploiting informative features at different levels of the visual hierarchy may play an important role in facilitating the efficient detection of social relationships between agents and the prediction of harmful intentions.
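The object-level feature step described above can be sketched as cropping a patch around the current gaze position and reading out AlexNet activations for it. The frame, gaze coordinates, patch size, and layer choice below are illustrative assumptions, and the saliency-model branch of the analysis is omitted.

```python
import torch
from torchvision import models
import torchvision.transforms.functional as TF

alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()

frame = torch.rand(3, 480, 640)        # placeholder video frame (C, H, W)
gaze_x, gaze_y, half = 320, 240, 112   # hypothetical gaze point; 224x224 patch

# Crop the gaze-centered region and normalize it with the ImageNet
# statistics AlexNet was trained on.
patch = TF.crop(frame, top=gaze_y - half, left=gaze_x - half,
                height=2 * half, width=2 * half)
patch = TF.normalize(patch, mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225])

# Object-level semantic features from the convolutional stack.
with torch.no_grad():
    feats = alexnet.features(patch.unsqueeze(0))
print(feats.shape)  # torch.Size([1, 256, 6, 6])
```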
Affiliation(s)
- Yujia Peng
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, 100871, China
- Institute for Artificial Intelligence, Peking University, Beijing, China
- National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence, Beijing, China
- Department of Psychology, University of California, Los Angeles, CA, USA
- Joseph M Burling
- Department of Psychology, University of California, Los Angeles, CA, USA
- Greta K Todorova
- School of Psychology and Neuroscience, University of Glasgow, Glasgow, UK
- Catherine Neary
- School of Health and Social Wellbeing, The University of the West of England, Bristol, UK
- Frank E Pollick
- School of Psychology and Neuroscience, University of Glasgow, Glasgow, UK
- Hongjing Lu
- Department of Psychology, University of California, Los Angeles, CA, USA
- Department of Statistics, University of California, Los Angeles, CA, USA
3
Ge Y, Yu Y, Huang S, Huang X, Wang L, Jiang Y. Life motion signals bias the perception of apparent motion direction. Br J Psychol 2024; 115:115-128. PMID: 37623746. DOI: 10.1111/bjop.12680.
Abstract
Walking direction conveyed by biological motion (BM) cues, to which humans are highly sensitive from birth, can elicit involuntary shifts of attention that enhance the detection of static targets. Here, we demonstrated that this intrinsic sensitivity to walking direction can also modulate the perceived direction of simultaneously presented dynamic stimuli. The perceived direction of apparent motion was biased towards the walking direction even though observers had been informed in advance that the walking direction of the BM cues did not predict the apparent motion direction. In particular, rightward BM cues had an advantage over leftward BM cues in altering the perception of motion direction. Intriguingly, this perceptual bias disappeared when the BM cues were shown inverted, or when the critical biological characteristics were removed from them. Critically, both the perceptual direction bias and the rightward advantage persisted even when only local BM cues were presented without any global configuration. Furthermore, the rightward advantage was specific to social cues (i.e., BM): it vanished when non-social cues (i.e., arrows) were used. Taken together, these findings support the existence of a specific processing mechanism for life motion signals and shed new light on their influence in a dynamic environment.
Affiliation(s)
- Yiping Ge
- State Key Laboratory of Brain and Cognitive Science, CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Yiwen Yu
- State Key Laboratory of Brain and Cognitive Science, CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Suqi Huang
- State Key Laboratory of Brain and Cognitive Science, CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Xinyi Huang
- State Key Laboratory of Brain and Cognitive Science, CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Li Wang
- State Key Laboratory of Brain and Cognitive Science, CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Yi Jiang
- State Key Laboratory of Brain and Cognitive Science, CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
4
Ziccarelli S, Errante A, Fogassi L. Decoding point-light displays and fully visible hand grasping actions within the action observation network. Hum Brain Mapp 2022; 43:4293-4309. PMID: 35611407. PMCID: PMC9435013. DOI: 10.1002/hbm.25954.
Abstract
Action observation typically recruits visual areas as well as dorsal and ventral sectors of the parietal and premotor cortex. This network has been collectively termed the extended action observation network (eAON). Within this network, the elaboration of the kinematic aspects of biological motion is crucial. Previous studies investigated these aspects by presenting subjects with point-light display (PLD) videos of whole-body movements, showing the recruitment of some eAON areas. However, studies focusing on cortical activation during the observation of PLD grasping actions are lacking. In the present functional magnetic resonance imaging (fMRI) study, we assessed the activation of the eAON in healthy participants during the observation of both PLD and fully visible hand grasping actions, excluding confounding effects due to low-level visual features, motion, and context. Results showed that observation of PLD grasping stimuli elicited bilateral activation of the eAON. Region-of-interest analyses of visual and sensorimotor areas showed no significant differences in signal intensity between the PLD and fully visible conditions, indicating that both evoked a similar motor resonance mechanism. Multivoxel pattern analysis (MVPA) revealed significant decoding of the PLD and fully visible grasping observation conditions in occipital, parietal, and premotor areas belonging to the eAON. The data show that the kinematic features conveyed by PLD stimuli are sufficient to elicit a complete action representation, suggesting that within the eAON these features can be disentangled from those that usually characterize fully visible actions. PLD stimuli could thus be useful for assessing which areas are recruited for action recognition, imitation, and motor learning when only kinematic cues are available.
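The MVPA step named in the abstract amounts to cross-validated classification of the observation condition (PLD vs. fully visible) from multi-voxel response patterns. A minimal sketch with a linear classifier follows; the data are random placeholders, and the study's classifier and cross-validation scheme may differ.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_voxels = 80, 150

# Placeholder data: one voxel pattern per observation trial.
X = rng.standard_normal((n_trials, n_voxels))
y = np.repeat([0, 1], n_trials // 2)   # 0 = PLD, 1 = fully visible

# Cross-validated decoding accuracy; above-chance scores indicate that
# the ROI's pattern distinguishes the two observation conditions.
decoder = make_pipeline(StandardScaler(), LinearSVC())
scores = cross_val_score(decoder, X, y, cv=5)
print(f"mean decoding accuracy: {scores.mean():.2f} (chance = 0.50)")
```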
Affiliation(s)
- Antonino Errante
- Department of Medicine and Surgery, University of Parma, Parma, Italy
- Leonardo Fogassi
- Department of Medicine and Surgery, University of Parma, Parma, Italy
5
Heinke D, Leonardis A, Leek EC. What do deep neural networks tell us about biological vision? Vision Res 2022; 198:108069. PMID: 35561463. DOI: 10.1016/j.visres.2022.108069.
Affiliation(s)
- Dietmar Heinke
- School of Psychology, University of Birmingham, United Kingdom
- Ales Leonardis
- School of Computer Science, University of Birmingham, United Kingdom
- E Charles Leek
- Department of Psychology, University of Liverpool, United Kingdom