1. Peng Y, Gong X, Lu H, Fang F. Human Visual Pathways for Action Recognition versus Deep Convolutional Neural Networks: Representation Correspondence in Late but Not Early Layers. J Cogn Neurosci 2024; 36:2458-2480. PMID: 39106158. DOI: 10.1162/jocn_a_02233.
Abstract
Deep convolutional neural networks (DCNNs) have attained human-level performance in object categorization and exhibited representation alignment between network layers and brain regions. Does such representation alignment naturally extend to visual tasks beyond recognizing objects in static images? In this study, we expanded the exploration to the recognition of human actions from videos and assessed the representation capabilities and alignment of two-stream DCNNs in comparison with brain regions situated along the ventral and dorsal pathways. Using decoding analysis and representational similarity analysis, we show that DCNN models do not exhibit hierarchical representation alignment with the human brain across visual regions when processing action videos. Instead, later layers of the DCNN models demonstrate greater representational similarity to the human visual cortex. These findings held for two display formats: photorealistic avatars with full-body information and simplified stimuli in the point-light display. The discrepancies in representation alignment suggest fundamental differences in how DCNNs and the human brain represent dynamic visual information related to actions.
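The representational similarity analysis (RSA) at the heart of this comparison can be summarized compactly. Below is a minimal RSA sketch, assuming precomputed DCNN-layer activations and fMRI response patterns for the same action videos; all shapes and variable names are illustrative stand-ins, not the authors' pipeline.

```python
# Minimal RSA sketch: compare the representational geometry of one DCNN layer
# with one fMRI region of interest (ROI). Stand-in random data throughout.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Condensed representational dissimilarity matrix (1 - Pearson r)."""
    return pdist(responses, metric="correlation")

def rsa_score(layer_acts, roi_patterns):
    """Spearman correlation between the model RDM and the brain RDM."""
    rho, _ = spearmanr(rdm(layer_acts), rdm(roi_patterns))
    return rho

rng = np.random.default_rng(0)
layer_acts = rng.normal(size=(50, 4096))    # 50 action videos x layer units
roi_patterns = rng.normal(size=(50, 800))   # 50 action videos x ROI voxels
print(f"model-brain RSA: {rsa_score(layer_acts, roi_patterns):.3f}")
```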
Affiliation(s)
- Yujia Peng
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China
- Institute for Artificial Intelligence, Peking University, Beijing, People's Republic of China
- National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence, Beijing, China
- Department of Psychology, University of California, Los Angeles
- Xizi Gong
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China
- Hongjing Lu
- Department of Psychology, University of California, Los Angeles
- Department of Statistics, University of California, Los Angeles
- Fang Fang
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China
- IDG/McGovern Institute for Brain Research, Peking University, Beijing, People's Republic of China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, People's Republic of China
- Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, People's Republic of China
2. Conwell C, Prince JS, Kay KN, Alvarez GA, Konkle T. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines. Nat Commun 2024; 15:9383. PMID: 39477923. PMCID: PMC11526138. DOI: 10.1038/s41467-024-53147-y.
Abstract
The rapid release of high-performing computer vision models offers new potential to study the impact of different inductive biases on the emergent brain alignment of learned representations. Here, we perform controlled comparisons among a curated set of 224 diverse models to test the impact of specific model properties on visual brain predictivity, a process requiring over 1.8 billion regressions and 50.3 thousand representational similarity analyses. We find that models with qualitatively different architectures (e.g. CNNs versus Transformers) and task objectives (e.g. purely visual contrastive learning versus vision-language alignment) achieve near-equivalent brain predictivity when other factors are held constant. Instead, variation across visual training diets yields the largest, most consistent effect on brain predictivity. Many models achieve similarly high brain predictivity despite clear variation in their underlying representations, suggesting that standard methods used to link models to brains may be too flexible. Broadly, these findings challenge common assumptions about the factors underlying emergent brain alignment, and outline how we can leverage controlled model comparison to probe the common computational principles underlying biological and artificial visual systems.
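The "brain predictivity" metric computed here at scale is, in essence, a cross-validated regression from model features to brain responses. A schematic version, assuming a generic ridge-regression variant and random stand-in data (the study's actual regression and RSA pipelines differ in detail):

```python
# Sketch of a voxelwise encoding analysis: cross-validated ridge regression
# from model features to brain responses, scored by held-out correlation.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def brain_predictivity(features, responses, alphas=np.logspace(-1, 5, 7)):
    """Mean cross-validated Pearson r between predicted and held-out voxels."""
    scores = []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(features):
        model = RidgeCV(alphas=alphas).fit(features[train], responses[train])
        pred = model.predict(features[test])
        r = [np.corrcoef(pred[:, v], responses[test, v])[0, 1]
             for v in range(responses.shape[1])]
        scores.append(np.nanmean(r))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 512))                    # model features for 200 images
Y = X @ rng.normal(size=(512, 100)) * 0.1 + rng.normal(size=(200, 100))  # voxels
print(f"predictivity: {brain_predictivity(X, Y):.3f}")
```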
Affiliation(s)
- Colin Conwell
- Department of Psychology, Harvard University, Cambridge, MA, USA.
- Jacob S Prince
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Kendrick N Kay
- Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN, USA
- George A Alvarez
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Talia Konkle
- Department of Psychology, Harvard University, Cambridge, MA, USA.
- Center for Brain Science, Harvard University, Cambridge, MA, USA.
- Kempner Institute for Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA.
3. Pavuluri A, Kohn A. The representational geometry for naturalistic textures in macaque V1 and V2. bioRxiv [Preprint] 2024:2024.10.18.619102. PMID: 39484570. PMCID: PMC11526966. DOI: 10.1101/2024.10.18.619102.
Abstract
Our understanding of visual cortical processing has relied primarily on studying the selectivity of individual neurons in different areas. A complementary approach is to study how the representational geometry of neuronal populations differs across areas. Though the geometry is derived from individual neuronal selectivity, it can reveal encoding strategies difficult to infer from single neuron responses. In addition, recent theoretical work has begun to relate distinct functional objectives to different representational geometries. To understand how the representational geometry changes across stages of processing, we measured neuronal population responses in primary visual cortex (V1) and area V2 of macaque monkeys to an ensemble of synthetic, naturalistic textures. Responses were lower dimensional in V2 than V1, and there was a better alignment of V2 population responses to different textures. The representational geometry in V2 afforded better discriminability between out-of-sample textures. We performed complementary analyses of standard convolutional network models, which did not replicate the representational geometry of cortex. We conclude that there is a shift in the representational geometry between V1 and V2, with the V2 representation exhibiting features of a low-dimensional, systematic encoding of different textures and of different instantiations of each texture. Our results suggest that comparisons of representational geometry can reveal important transformations that occur across successive stages of visual processing.
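The dimensionality comparison can be illustrated with the participation ratio, one common estimator of the effective dimensionality of a population response (offered as a sketch of the idea, not necessarily the authors' exact estimator):

```python
# Participation ratio of the response covariance eigenspectrum: a standard
# summary of how many dimensions a neural population effectively uses.
import numpy as np

def participation_ratio(responses):
    """responses: n_stimuli x n_neurons. Returns (sum lambda)^2 / sum lambda^2."""
    centered = responses - responses.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(centered.T))
    eigvals = np.clip(eigvals, 0, None)  # guard against tiny negative values
    return float(eigvals.sum() ** 2 / (eigvals ** 2).sum())

rng = np.random.default_rng(2)
v1_like = rng.normal(size=(100, 60))                        # broad spectrum
v2_like = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 60))  # low-rank
print(participation_ratio(v1_like))  # higher: closer to full dimensionality
print(participation_ratio(v2_like))  # lower: compressed representation
```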
4. Susan S. Neuroscientific insights about computer vision models: a concise review. Biol Cybern 2024. PMID: 39382577. DOI: 10.1007/s00422-024-00998-9.
Abstract
The development of biologically inspired computational models has been a focus of study ever since the artificial neuron was introduced by McCulloch and Pitts in 1943. However, a scrutiny of the literature reveals that most attempts to replicate the highly efficient and complex biological visual system have been futile or have met with limited success. Recent state-of-the-art computer vision models, such as pre-trained deep neural networks and vision transformers, may not be biologically inspired per se. Nevertheless, certain aspects of biological vision remain embedded, knowingly or unknowingly, in the architecture and functioning of these models. This paper explores several principles related to visual neuroscience and the biological visual pathway that resonate, in some manner, in the architectural design and functioning of contemporary computer vision models. The findings of this survey can provide useful insights for building future bio-inspired computer vision models. The survey is conducted from a historical perspective, tracing the biological connections of computer vision models from the basic artificial neuron to modern technologies such as deep convolutional neural networks (CNNs) and spiking neural networks (SNNs). One highlight of the survey is a discussion of biologically plausible neural networks and bio-inspired unsupervised learning mechanisms adapted for computer vision tasks in recent times.
Affiliation(s)
- Seba Susan
- Department of Information Technology, Delhi Technological University, Delhi, India.
5. Cocuzza CV, Sanchez-Romero R, Ito T, Mill RD, Keane BP, Cole MW. Distributed network flows generate localized category selectivity in human visual cortex. PLoS Comput Biol 2024; 20:e1012507. PMID: 39436929. PMCID: PMC11530028. DOI: 10.1371/journal.pcbi.1012507.
Abstract
A central goal of neuroscience is to understand how function-relevant brain activations are generated. Here we test the hypothesis that function-relevant brain activations are generated primarily by distributed network flows. We focused on visual processing in human cortex, given the long-standing literature supporting the functional relevance of brain activations in visual cortex regions exhibiting visual category selectivity. We began by using fMRI data from N = 352 human participants to identify category-specific responses in visual cortex for images of faces, places, body parts, and tools. We then systematically tested the hypothesis that distributed network flows can generate these localized visual category-selective responses. This was accomplished using a recently developed approach for simulating, in a highly empirically constrained manner, the generation of task-evoked brain activations by modeling activity flowing over intrinsic brain connections. We next tested refinements to our hypothesis, focusing on how stimulus-driven network interactions initialized in V1 generate downstream visual category selectivity. We found evidence that network flows directly from V1 were sufficient for generating visual category selectivity, but that additional, globally distributed (whole-cortex) network flows increased category selectivity further. Using null network architectures we also found that each region's unique intrinsic "connectivity fingerprint" was key to the generation of category selectivity. These results generalized across regions associated with all four visual categories tested (bodies, faces, places, and tools), and provide evidence that the human brain's intrinsic network organization plays a prominent role in the generation of functionally relevant, localized responses.
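The activity flow approach referenced here predicts each held-out region's task activation as the connectivity-weighted sum of activations in all other regions. A schematic version with random stand-in data:

```python
# Activity flow mapping, in schematic form: predict a region's activation from
# the other regions' activations weighted by intrinsic connectivity.
import numpy as np

def activity_flow(activations, connectivity):
    """activations: n_tasks x n_regions; connectivity: n_regions x n_regions
    (self-connections ignored). Returns predicted n_tasks x n_regions array."""
    n_regions = activations.shape[1]
    predicted = np.zeros_like(activations)
    for j in range(n_regions):
        others = np.delete(np.arange(n_regions), j)
        predicted[:, j] = activations[:, others] @ connectivity[others, j]
    return predicted

rng = np.random.default_rng(3)
fc = rng.normal(scale=0.1, size=(360, 360))   # intrinsic connectivity estimate
act = rng.normal(size=(4, 360))               # activations for 4 categories
pred = activity_flow(act, fc)
print(np.corrcoef(pred.ravel(), act.ravel())[0, 1])  # prediction accuracy
```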
Affiliation(s)
- Carrisa V. Cocuzza
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, New Jersey, United States of America
- Behavioral and Neural Sciences PhD Program, Rutgers University, Newark, New Jersey, United States of America
- Department of Psychology, Yale University, New Haven, Connecticut, United States of America
- Department of Psychiatry, Brain Health Institute, Rutgers University, Piscataway, New Jersey, United States of America
- Ruben Sanchez-Romero
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, New Jersey, United States of America
- Takuya Ito
- Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, United States of America
- Ravi D. Mill
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, New Jersey, United States of America
- Brian P. Keane
- Department of Psychiatry and Neuroscience, University of Rochester Medical Center, Rochester, New York, United States of America
- Center for Visual Science, University of Rochester, Rochester, New York, United States of America
- Department of Brain and Cognitive Science, University of Rochester, Rochester, New York, United States of America
- Michael W. Cole
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, New Jersey, United States of America
6. Petilli MA, Rodio FM, Günther F, Marelli M. Visual search and real-image similarity: An empirical assessment through the lens of deep learning. Psychon Bull Rev 2024. PMID: 39327401. DOI: 10.3758/s13423-024-02583-4.
Abstract
The ability to predict how efficiently a person finds an object in the environment is a crucial goal of attention research. Central to this issue are the similarity principles initially proposed by Duncan and Humphreys, which outline how the similarity between target and distractor objects (TD) and the similarity between distractor objects themselves (DD) affect search efficiency. However, these principles lack direct quantitative support from an ecological perspective, being a summary approximation of a wide range of lab-based results that generalise poorly to real-world scenarios. This study exploits deep convolutional neural networks to predict human search efficiency from computational estimates of similarity between objects populating, potentially, any visual scene. Our results provide ecological evidence supporting the similarity principles: search performance continuously varies across tasks and conditions and improves with decreasing TD similarity and increasing DD similarity. Furthermore, our results reveal a crucial dissociation: TD and DD similarities mainly operate at two distinct layers of the network, DD similarity at the intermediate layers of coarse object features and TD similarity at the final layers of complex features used for classification. This suggests that these different similarities exert their major effects at two distinct perceptual levels and demonstrates our methodology's potential to offer insights into the depth of visual processing on which search relies. By combining computational techniques with visual search principles, this approach aligns with modern trends in other research areas and fulfils longstanding demands for more ecologically valid research in the field of visual search.
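The two similarity estimates driving the analysis, target-distractor (TD) and distractor-distractor (DD) similarity, can be computed from CNN activations at any chosen layer. A sketch, with feature extraction abstracted away and illustrative shapes:

```python
# TD and DD similarity from layer activations, per Duncan & Humphreys' logic:
# easier search is predicted by low TD similarity and high DD similarity.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def td_dd_similarity(target_feat, distractor_feats):
    """target_feat: (d,); distractor_feats: n_distractors x d.
    Returns (mean TD similarity, mean pairwise DD similarity)."""
    td = np.mean([cosine(target_feat, d) for d in distractor_feats])
    n = len(distractor_feats)
    dd = np.mean([cosine(distractor_feats[i], distractor_feats[j])
                  for i in range(n) for j in range(i + 1, n)])
    return td, dd

rng = np.random.default_rng(4)
td, dd = td_dd_similarity(rng.normal(size=256), rng.normal(size=(8, 256)))
print(f"TD={td:.3f}, DD={dd:.3f}")
```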
Affiliation(s)
- Marco A Petilli
- Department of Psychology, University of Milano-Bicocca, Milano, Italy.
- Francesca M Rodio
- Institute for Advanced Studies, IUSS, Pavia, Italy
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Fritz Günther
- Department of Psychology, Humboldt University at Berlin, Berlin, Germany
- Marco Marelli
- Department of Psychology, University of Milano-Bicocca, Milano, Italy
- NeuroMI, Milan Center for Neuroscience, Milan, Italy
7. Walbrin J, Sossounov N, Mahdiani M, Vaz I, Almeida J. Fine-grained knowledge about manipulable objects is well-predicted by contrastive language image pre-training. iScience 2024; 27:110297. PMID: 39040066. PMCID: PMC11261149. DOI: 10.1016/j.isci.2024.110297.
Abstract
Object recognition is an important ability that relies on distinguishing between similar objects (e.g., deciding which utensil(s) to use at different stages of meal preparation). Recent work describes the fine-grained organization of knowledge about manipulable objects via the study of the constituent dimensions that are most relevant to human behavior, for example, vision-, manipulation-, and function-based properties. A logical extension of this work concerns whether these dimensions are uniquely human or can be approximated by deep learning. Here, we show that behavioral dimensions are generally well-predicted by CLIP-ViT, a multimodal network trained on a large and diverse set of image-text pairs. Moreover, this model outperforms comparison networks pre-trained on smaller, image-only datasets. These results demonstrate the impressive capacity of CLIP-ViT to approximate fine-grained object knowledge. We discuss the possible sources of this benefit relative to other models (e.g., multimodal vs. image-only pre-training, dataset size, architecture).
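The prediction setup can be sketched as follows: embed the object images with CLIP-ViT and regress a behavioral dimension onto the embeddings with cross-validation. This uses the Hugging Face CLIP interface; the model name, placeholder variables, and data loading are assumptions for illustration, not the paper's code.

```python
# CLIP-ViT image embeddings regressed onto one behavioral dimension.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    """Return CLIP image embeddings for a list of image file paths."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        return model.get_image_features(**inputs).numpy()

# `object_images` and `manipulation_ratings` are hypothetical placeholders
# for the object photos and one behavioral dimension (e.g., manipulation).
# X = embed(object_images)
# r2 = cross_val_score(RidgeCV(), X, manipulation_ratings, cv=5, scoring="r2")
# print(r2.mean())
```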
Affiliation(s)
- Jon Walbrin
- Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- Nikita Sossounov
- Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- Igor Vaz
- Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- Jorge Almeida
- Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
8. Miao HY, Tong F. Convolutional neural network models applied to neuronal responses in macaque V1 reveal limited nonlinear processing. J Vis 2024; 24:1. PMID: 38829629. PMCID: PMC11156204. DOI: 10.1167/jov.24.6.1.
Abstract
Computational models of the primary visual cortex (V1) have suggested that V1 neurons behave like Gabor filters followed by simple nonlinearities. However, recent work employing convolutional neural network (CNN) models has suggested that V1 relies on far more nonlinear computations than previously thought. Specifically, unit responses in an intermediate layer of VGG-19 were found to best predict macaque V1 responses to thousands of natural and synthetic images. Here, we evaluated the hypothesis that the poor performance of lower layer units in VGG-19 might be attributable to their small receptive field size rather than to their lack of complexity per se. We compared VGG-19 with AlexNet, which has much larger receptive fields in its lower layers. Whereas the best-performing layer of VGG-19 occurred after seven nonlinear steps, the first convolutional layer of AlexNet best predicted V1 responses. Although the predictive accuracy of VGG-19 was somewhat better than that of standard AlexNet, we found that a modified version of AlexNet could match the performance of VGG-19 after only a few nonlinear computations. Control analyses revealed that decreasing the size of the input images caused the best-performing layer of VGG-19 to shift to a lower layer, consistent with the hypothesis that the relationship between image size and receptive field size can strongly affect model performance. We conducted additional analyses using a Gabor pyramid model to test for nonlinear contributions of normalization and contrast saturation. Overall, our findings suggest that the feedforward responses of V1 neurons can be well explained by assuming only a few nonlinear processing stages.
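The classical baseline at issue, a Gabor filter followed by a simple nonlinearity, can be written in a few lines (an energy-model variant with quadrature-pair filters; parameters are illustrative):

```python
# Gabor filter plus simple nonlinearity: the classical V1 model against which
# the CNN layers are compared. Quadrature-pair "energy" variant.
import numpy as np

def gabor(size=31, wavelength=8.0, theta=0.0, sigma=5.0, phase=0.0):
    """2D Gabor filter: oriented sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength + phase)

def energy_response(image, **kw):
    """Sum of squared even/odd filter outputs, then a sqrt nonlinearity."""
    even = np.sum(image * gabor(phase=0.0, **kw))
    odd = np.sum(image * gabor(phase=np.pi / 2, **kw))
    return float(np.sqrt(even**2 + odd**2))

rng = np.random.default_rng(5)
patch = rng.normal(size=(31, 31))  # stand-in for a 31x31 image patch
print(energy_response(patch, wavelength=8.0, theta=np.pi / 4))
```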
Affiliation(s)
- Hui-Yuan Miao
- Department of Psychology, Vanderbilt University, Nashville, TN, USA
- Frank Tong
- Department of Psychology, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA
9. Xie H, Song C, Jian L, Guo Y, Li M, Luo J, Li Q, Tan T. A deep learning-based radiomics model for predicting lymph node status from lung adenocarcinoma. BMC Med Imaging 2024; 24:121. PMID: 38789936. PMCID: PMC11127329. DOI: 10.1186/s12880-024-01300-w.
Abstract
OBJECTIVES At present, there are many limitations in the evaluation of lymph node metastasis in lung adenocarcinoma, and a safe and accurate method to predict lymph node metastasis in lung cancer is needed. In this study, radiomics was used to predict the lymph node status of lung adenocarcinoma patients based on contrast-enhanced CT. METHODS A total of 503 cases that fulfilled the analysis requirements were gathered from two distinct hospitals. Among these, 287 patients exhibited lymph node metastasis (LNM+) while 216 patients were confirmed to be without lymph node metastasis (LNM-). Using both traditional and deep learning methods, 22,318 features were extracted from the segmented images of each patient's enhanced CT. The Spearman test and the least absolute shrinkage and selection operator (LASSO) were then used to reduce the dimensionality of the feature data, allowing the analysis to focus on the most pertinent features. Finally, a classification model of lung adenocarcinoma lymph node metastasis was constructed with a machine learning algorithm and evaluated using accuracy, AUC, specificity, precision, recall, and F1 score. RESULTS Incorporating the selected set of features, the extreme gradient boosting method (XGBoost) effectively distinguished lymph node status in patients with lung adenocarcinoma. On the external test set, the prediction model achieved an accuracy of 0.765, AUC of 0.845, specificity of 0.705, precision of 0.784, recall of 0.811, and F1 score of 0.797. Moreover, the decision curve analysis, calibration curve, and confusion matrix on the external test set all indicated the stability and accuracy of the model. CONCLUSIONS Leveraging enhanced CT images, our study introduces a noninvasive classification model based on the extreme gradient boosting method. This approach exhibits remarkable precision in identifying the lymph node status of lung adenocarcinoma patients, offering a safe and accurate alternative to invasive procedures. By providing clinicians with a reliable tool for diagnosing and assessing disease progression, our method holds the potential to significantly improve patient outcomes and enhance the overall quality of clinical practice.
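The stated pipeline, feature selection followed by an XGBoost classifier, can be sketched as below. LASSO-based selection stands in for the paper's Spearman-plus-LASSO reduction, and all data are random stand-ins, so this is a schematic rather than the authors' code:

```python
# Feature selection + XGBoost classification of lymph node status, schematic.
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(503, 2000))  # radiomics + deep features (illustrative)
y = (X[:, :5].sum(axis=1) + rng.normal(size=503) > 0).astype(int)  # LNM+/-

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
selector = SelectFromModel(LassoCV(cv=5)).fit(X_tr, y_tr)  # shrink 2000 -> few
clf = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
clf.fit(selector.transform(X_tr), y_tr)

prob = clf.predict_proba(selector.transform(X_te))[:, 1]
print(f"ACC: {accuracy_score(y_te, prob > 0.5):.3f}  "
      f"AUC: {roc_auc_score(y_te, prob):.3f}")
```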
Affiliation(s)
- Hui Xie
- Department of Radiation Oncology, Affiliated Hospital (Clinical College) of Xiangnan University, Chenzhou, Hunan province, 423000, People's Republic of China
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, People's Republic of China
- Chaoling Song
- School of Medical Imaging, Laboratory Science and Rehabilitation, Xiangnan University, Chenzhou, Hunan province, 423000, People's Republic of China
- Lei Jian
- School of Medical Imaging, Laboratory Science and Rehabilitation, Xiangnan University, Chenzhou, Hunan province, 423000, People's Republic of China
- Yeang Guo
- School of Medical Imaging, Laboratory Science and Rehabilitation, Xiangnan University, Chenzhou, Hunan province, 423000, People's Republic of China
- Mei Li
- School of Medical Imaging, Laboratory Science and Rehabilitation, Xiangnan University, Chenzhou, Hunan province, 423000, People's Republic of China
- Jiang Luo
- School of Medical Imaging, Laboratory Science and Rehabilitation, Xiangnan University, Chenzhou, Hunan province, 423000, People's Republic of China
- Qing Li
- Department of Radiation Oncology, Affiliated Hospital (Clinical College) of Xiangnan University, Chenzhou, Hunan province, 423000, People's Republic of China
- Tao Tan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, People's Republic of China.
- Department of Radiology and Nuclear Medicine, Radboud University Medical Centre, Nijmegen, Netherlands.
10. Morales-Torres R, Wing EA, Deng L, Davis SW, Cabeza R. Visual Recognition Memory of Scenes Is Driven by Categorical, Not Sensory, Visual Representations. J Neurosci 2024; 44:e1479232024. PMID: 38569925. PMCID: PMC11112637. DOI: 10.1523/jneurosci.1479-23.2024.
Abstract
When we perceive a scene, our brain processes various types of visual information simultaneously, ranging from sensory features, such as line orientations and colors, to categorical features, such as objects and their arrangements. Whereas the role of sensory and categorical visual representations in predicting subsequent memory has been studied using isolated objects, their impact on memory for complex scenes remains largely unknown. To address this gap, we conducted an fMRI study in which female and male participants encoded pictures of familiar scenes (e.g., an airport picture) and later recalled them, while rating the vividness of their visual recall. Outside the scanner, participants had to distinguish each seen scene from three similar lures (e.g., three airport pictures). We modeled the sensory and categorical visual features of multiple scenes using both early and late layers of a deep convolutional neural network. Then, we applied representational similarity analysis to determine which brain regions represented stimuli in accordance with the sensory and categorical models. We found that categorical, but not sensory, representations predicted subsequent memory. In line with this result, only for the categorical model did the average recognition performance of each scene correlate positively with the average visual dissimilarity between the item in question and its respective lures. These results strongly suggest that even in memory tests that ostensibly rely solely on visual cues (such as forced-choice visual recognition with similar distractors), memory decisions for scenes may be primarily influenced by categorical rather than sensory representations.
Affiliation(s)
- Erik A Wing
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario M6A 2E1, Canada
- Lifu Deng
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
- Simon W Davis
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
- Department of Neurology, Duke University School of Medicine, Durham, North Carolina 27708
- Roberto Cabeza
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
11. Caplette L, Turk-Browne NB. Computational reconstruction of mental representations using human behavior. Nat Commun 2024; 15:4183. PMID: 38760341. PMCID: PMC11101448. DOI: 10.1038/s41467-024-48114-6.
Abstract
Revealing how the mind represents information is a longstanding goal of cognitive science. However, there is currently no framework for reconstructing the broad range of mental representations that humans possess. Here, we ask participants to indicate what they perceive in images made of random visual features in a deep neural network. We then infer associations between the semantic features of their responses and the visual features of the images. This allows us to reconstruct the mental representations of multiple visual concepts, both those supplied by participants and other concepts extrapolated from the same semantic space. We validate these reconstructions in separate participants and further generalize our approach to predict behavior for new stimuli and in a new task. Finally, we reconstruct the mental representations of individual observers and of a neural network. This framework enables a large-scale investigation of conceptual representations.
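One way to picture the inference step is as weighted averaging: each random-feature image's visual features are weighted by how semantically close the participant's response is to a target concept. A loose sketch under that assumption (all names hypothetical, not the authors' method in detail):

```python
# Reverse-correlation-style reconstruction of a concept's visual template.
import numpy as np

def reconstruct(visual_feats, response_embs, concept_emb):
    """visual_feats: n_trials x d_visual (DNN features of shown images);
    response_embs: n_trials x d_semantic (embeddings of verbal responses);
    concept_emb: (d_semantic,). Returns a d_visual feature template."""
    sims = response_embs @ concept_emb          # semantic match per trial
    weights = sims / np.abs(sims).sum()         # normalized trial weights
    return weights @ visual_feats               # weighted-average template

rng = np.random.default_rng(11)
template = reconstruct(rng.normal(size=(1000, 512)),   # visual features
                       rng.normal(size=(1000, 300)),   # response embeddings
                       rng.normal(size=300))           # target concept
print(template.shape)  # could be decoded back to an image by a generator
```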
Affiliation(s)
- Nicholas B Turk-Browne
- Department of Psychology, Yale University, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
12. Revsine C, Gonzalez-Castillo J, Merriam EP, Bandettini PA, Ramírez FM. A Unifying Model for Discordant and Concordant Results in Human Neuroimaging Studies of Facial Viewpoint Selectivity. J Neurosci 2024; 44:e0296232024. PMID: 38438256. PMCID: PMC11044116. DOI: 10.1523/jneurosci.0296-23.2024.
Abstract
Recognizing faces regardless of their viewpoint is critical for social interactions. Traditional theories hold that view-selective early visual representations gradually become tolerant to viewpoint changes along the ventral visual hierarchy. Newer theories, based on single-neuron monkey electrophysiological recordings, suggest a three-stage architecture including an intermediate face-selective patch abruptly achieving invariance to mirror-symmetric face views. Human studies combining neuroimaging and multivariate pattern analysis (MVPA) have provided convergent evidence of view selectivity in early visual areas. However, contradictory conclusions have been reached concerning the existence in humans of a mirror-symmetric representation like that observed in macaques. We believe these contradictions arise from low-level stimulus confounds and data analysis choices. To probe for low-level confounds, we analyzed images from two face databases. Analyses of image luminance and contrast revealed biases across face views described by even polynomials, i.e., mirror-symmetric functions. To explain major trends across neuroimaging studies, we constructed a network model incorporating three constraints: cortical magnification, convergent feedforward projections, and interhemispheric connections. Given the identified low-level biases, we show that a gradual increase of interhemispheric connections across network layers is sufficient to replicate view-tuning in early processing stages and mirror-symmetry in later stages. Data analysis decisions, namely the pattern dissimilarity measure and data recentering, accounted for the inconsistent observation of mirror-symmetry across prior studies. Pattern analyses of human fMRI data (of either sex) revealed biases compatible with our model. The model provides a unifying explanation of MVPA studies of viewpoint selectivity and suggests that observations of mirror-symmetry originate from ineffectively normalized signal imbalances across different face views.
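The low-level bias analysis reduces to fitting polynomials to an image statistic as a function of viewpoint and asking whether even (mirror-symmetric) terms dominate. A toy version with synthetic luminance values:

```python
# Fit a degree-4 polynomial to mean luminance across face viewpoints and
# compare the magnitudes of even vs. odd coefficients. Synthetic data.
import numpy as np

views = np.array([-90, -60, -30, 0, 30, 60, 90], dtype=float)  # degrees
luminance = 0.5 + 1e-5 * views**2 + np.random.default_rng(7).normal(0, 0.005, 7)

coeffs = np.polynomial.polynomial.polyfit(views, luminance, deg=4)
even, odd = coeffs[[0, 2, 4]], coeffs[[1, 3]]
print("even-term magnitude:", np.abs(even).sum())
print("odd-term magnitude: ", np.abs(odd).sum())  # small -> mirror-symmetric bias
```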
Affiliation(s)
- Cambria Revsine
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892
- Department of Psychology, University of Chicago, Chicago, Illinois 60637
- Javier Gonzalez-Castillo
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892
- Elisha P Merriam
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892
- Peter A Bandettini
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892
- Functional MRI Core Facility, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892
- Fernando M Ramírez
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892
13. Garlichs A, Blank H. Prediction error processing and sharpening of expected information across the face-processing hierarchy. Nat Commun 2024; 15:3407. PMID: 38649694. PMCID: PMC11035707. DOI: 10.1038/s41467-024-47749-9.
Abstract
The perception and neural processing of sensory information are strongly influenced by prior expectations. The integration of prior and sensory information can manifest through distinct underlying mechanisms: focusing on unexpected input, denoted as prediction error (PE) processing, or amplifying anticipated information via sharpened representation. In this study, we employed computational modeling using deep neural networks combined with representational similarity analyses of fMRI data to investigate these two processes during face perception. Participants were cued to see face images, some generated by morphing two faces, leading to ambiguity in face identity. We show that expected faces were identified faster and perception of ambiguous faces was shifted towards priors. Multivariate analyses uncovered evidence for PE processing across and beyond the face-processing hierarchy from the occipital face area (OFA), via the fusiform face area, to the anterior temporal lobe, and suggest sharpened representations in the OFA. Our findings support the proposition that the brain represents faces grounded in prior expectations.
Affiliation(s)
- Annika Garlichs
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246, Hamburg, Germany.
- Helen Blank
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246, Hamburg, Germany.
14. Jiang C, Chen Z, Wolfe JM. Toward viewing behavior for aerial scene categorization. Cogn Res Princ Implic 2024; 9:17. PMID: 38530617. PMCID: PMC10965882. DOI: 10.1186/s41235-024-00541-1.
Abstract
Previous work has demonstrated similarities and differences between aerial and terrestrial image viewing. Aerial scene categorization, a pivotal visual processing task for gathering geoinformation, heavily depends on rotation-invariant information. Aerial image-centered research has revealed effects of low-level features on performance of various aerial image interpretation tasks. However, there are fewer studies of viewing behavior for aerial scene categorization and of higher-level factors that might influence that categorization. In this paper, experienced subjects' eye movements were recorded while they were asked to categorize aerial scenes. A typical viewing center bias was observed. Eye movement patterns varied among categories. We explored the relationship of nine image statistics to observers' eye movements. Results showed that if the images were less homogeneous, and/or if they contained fewer or no salient diagnostic objects, viewing behavior became more exploratory. Higher- and object-level image statistics were predictive at both the image and scene category levels. Scanpaths were generally organized and small differences in scanpath randomness could be roughly captured by critical object saliency. Participants tended to fixate on critical objects. Image statistics included in this study showed rotational invariance. The results supported our hypothesis that the availability of diagnostic objects strongly influences eye movements in this task. In addition, this study provides supporting evidence for Loschky et al.'s (Journal of Vision, 15(6), 11, 2015) speculation that aerial scenes are categorized on the basis of image parts and individual objects. The findings were discussed in relation to theories of scene perception and their implications for automation development.
Affiliation(s)
- Chenxi Jiang
- School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, Hubei, China
- Zhenzhong Chen
- School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, Hubei, China.
- Hubei Luojia Laboratory, Wuhan, Hubei, China.
- Jeremy M Wolfe
- Harvard Medical School, Boston, MA, USA
- Brigham & Women's Hospital, Boston, MA, USA
15. Noda T, Aschauer DF, Chambers AR, Seiler JPH, Rumpel S. Representational maps in the brain: concepts, approaches, and applications. Front Cell Neurosci 2024; 18:1366200. PMID: 38584779. PMCID: PMC10995314. DOI: 10.3389/fncel.2024.1366200.
Abstract
Neural systems have evolved to process sensory stimuli in a way that allows for efficient and adaptive behavior in a complex environment. Recent technological advances enable us to investigate sensory processing in animal models by simultaneously recording the activity of large populations of neurons with single-cell resolution, yielding high-dimensional datasets. In this review, we discuss concepts and approaches for assessing the population-level representation of sensory stimuli in the form of a representational map. In such a map, not only are the identities of stimuli distinctly represented, but their relational similarity is also mapped onto the space of neuronal activity. We highlight example studies in which the structure of representational maps in the brain are estimated from recordings in humans as well as animals and compare their methodological approaches. Finally, we integrate these aspects and provide an outlook for how the concept of representational maps could be applied to various fields in basic and clinical neuroscience.
Affiliation(s)
- Takahiro Noda
- Institute of Physiology, Focus Program Translational Neurosciences, University Medical Center, Johannes Gutenberg University-Mainz, Mainz, Germany
- Dominik F. Aschauer
- Institute of Physiology, Focus Program Translational Neurosciences, University Medical Center, Johannes Gutenberg University-Mainz, Mainz, Germany
- Anna R. Chambers
- Department of Otolaryngology – Head and Neck Surgery, Harvard Medical School, Boston, MA, United States
- Eaton Peabody Laboratories, Massachusetts Eye and Ear Infirmary, Boston, MA, United States
- Johannes P.-H. Seiler
- Institute of Physiology, Focus Program Translational Neurosciences, University Medical Center, Johannes Gutenberg University-Mainz, Mainz, Germany
- Simon Rumpel
- Institute of Physiology, Focus Program Translational Neurosciences, University Medical Center, Johannes Gutenberg University-Mainz, Mainz, Germany
16. Wang H, Liu Q, Gui D, Liu Y, Feng X, Qu J, Zhao J, Wei G. Automatedly identify dryland threatened species at large scale by using deep learning. Sci Total Environ 2024; 917:170375. PMID: 38280598. DOI: 10.1016/j.scitotenv.2024.170375.
Abstract
Dryland biodiversity is decreasing at an alarming rate. Advanced intelligent tools are urgently needed to rapidly, automatedly, and precisely detect dryland threatened species on a large scale for biological conservation. Here, we explored the performance of three deep convolutional neural networks (Deeplabv3+, Unet, and Pspnet) for the intelligent recognition of rare species based on high-resolution (0.3 m) images taken by an unmanned aerial vehicle (UAV). We focused on a threatened species, Populus euphratica, in the Tarim River Basin (China), where the population declined severely in the 1970s and restoration has been carried out since 2000. The testing results showed that Unet outperforms Deeplabv3+ and Pspnet when training samples are scarce, while Deeplabv3+ performs best as the dataset grows. Overall, with 80 training samples, Deeplabv3+ had the best performance for Populus euphratica identification, with a mean pixel accuracy (MPA) between 87.31 % and 90.2 %, on average 3.74 % and 11.29 % higher than Unet and Pspnet, respectively. Deeplabv3+ can accurately detect the boundaries of Populus euphratica even in areas of dense vegetation, with lower identification uncertainty for each pixel than the other models. This study developed a high-resolution, UAV-imagery-based identification framework using deep learning for large-scale regions. This approach can accurately capture variation in dryland threatened species, especially in inaccessible areas, thereby fostering rapid and efficient conservation actions.
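The reported metric, mean pixel accuracy (MPA), is per-class pixel accuracy averaged over classes. A minimal numpy version, independent of any particular segmentation library:

```python
# Mean pixel accuracy (MPA) for semantic segmentation, with stand-in maps.
import numpy as np

def mean_pixel_accuracy(pred, target, n_classes):
    """pred, target: integer label maps of identical shape."""
    accs = []
    for c in range(n_classes):
        mask = target == c
        if mask.any():
            accs.append((pred[mask] == c).mean())
    return float(np.mean(accs))

rng = np.random.default_rng(8)
target = rng.integers(0, 2, size=(256, 256))   # background vs. P. euphratica
pred = np.where(rng.random((256, 256)) < 0.9, target, 1 - target)  # 90% correct
print(f"MPA: {mean_pixel_accuracy(pred, target, n_classes=2):.3f}")
```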
Affiliation(s)
- Haolin Wang
- State Key Laboratory of Desert and Oasis Ecology, Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; College of Mathematics and System Sciences, Xinjiang University, Urumqi 830017, China
- Qi Liu
- State Key Laboratory of Desert and Oasis Ecology, Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; Cele National Station of Observation & Research for Desert Grassland Ecosystem in Xinjiang, Cele 848300, China.
- Dongwei Gui
- State Key Laboratory of Desert and Oasis Ecology, Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; Cele National Station of Observation & Research for Desert Grassland Ecosystem in Xinjiang, Cele 848300, China; University of Chinese Academy of Sciences, Beijing 101408, China
- Yunfei Liu
- State Key Laboratory of Desert and Oasis Ecology, Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; Cele National Station of Observation & Research for Desert Grassland Ecosystem in Xinjiang, Cele 848300, China
- Xinlong Feng
- College of Mathematics and System Sciences, Xinjiang University, Urumqi 830017, China
- Jia Qu
- State Key Laboratory of Desert and Oasis Ecology, Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; College of Mathematics and System Sciences, Xinjiang University, Urumqi 830017, China
- Jianping Zhao
- College of Mathematics and System Sciences, Xinjiang University, Urumqi 830017, China
- Guanghui Wei
- Xinjiang Tarim River Basin Management Bureau, Korla 841000, China
17. Li W, Li J, Chu C, Cao D, Shi W, Zhang Y, Jiang T. Common Sequential Organization of Face Processing in the Human Brain and Convolutional Neural Networks. Neuroscience 2024; 541:1-13. PMID: 38266906. DOI: 10.1016/j.neuroscience.2024.01.015.
Abstract
Face processing includes two crucial processing levels: face detection and face recognition. However, it remains unclear how human brains organize the two processing levels sequentially. While some studies found that faces are recognized as fast as they are detected, others have reported that faces are detected first, followed by recognition. We discriminated the two processing levels on a fine time scale by combining human intracranial EEG (two females, three males, and three subjects without reported sex information) and representational similarity analysis. Our results demonstrate that the human brain exhibits a "detection-first, recognition-later" pattern during face processing. In addition, we used convolutional neural networks to test the hypothesis that the sequential organization of the two face processing levels in the brain reflects computational optimization. Our findings showed that networks trained on face recognition also exhibited the "detection-first, recognition-later" pattern. Moreover, this sequential organization developed gradually during the training of the networks and was observed only for correctly predicted images. These findings collectively support a computational account of why the brain organizes the two levels in this way.
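Ordering detection and recognition on a fine time scale amounts to running RSA at each time point and comparing the latencies of the two model RDMs. A sketch with random stand-in iEEG data:

```python
# Time-resolved RSA: correlate the neural RDM at each time bin with a model
# RDM (e.g., face vs. non-face for detection; identity for recognition).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_timecourse(ieeg, model_rdm):
    """ieeg: n_stimuli x n_channels x n_times; model_rdm: condensed RDM."""
    return np.array([
        spearmanr(pdist(ieeg[:, :, t], metric="correlation"), model_rdm)[0]
        for t in range(ieeg.shape[2])
    ])

rng = np.random.default_rng(10)
ieeg = rng.normal(size=(40, 64, 100))          # 40 stimuli, 64 contacts, 100 bins
detection_rdm = pdist(rng.integers(0, 2, (40, 1)).astype(float))  # face vs. not
curve = rsa_timecourse(ieeg, detection_rdm)
print("peak latency bin:", int(curve.argmax()))  # compare with an identity model
```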
Affiliation(s)
- Wenlu Li
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- Jin Li
- School of Psychology, Capital Normal University, Beijing 100048, China
- Congying Chu
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Dan Cao
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Weiyang Shi
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Yu Zhang
- Research Center for Augmented Intelligence, Zhejiang Lab, Hangzhou 311100, China
- Tianzi Jiang
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Research Center for Augmented Intelligence, Zhejiang Lab, Hangzhou 311100, China; Xiaoxiang Institute for Brain Health and Yongzhou Central Hospital, Yongzhou 425000, Hunan Province, China.
18. Jang H, Tong F. Improved modeling of human vision by incorporating robustness to blur in convolutional neural networks. Nat Commun 2024; 15:1989. PMID: 38443349. PMCID: PMC10915141. DOI: 10.1038/s41467-024-45679-0.
Abstract
Whenever a visual scene is cast onto the retina, much of it will appear degraded due to poor resolution in the periphery; moreover, optical defocus can cause blur in central vision. However, the pervasiveness of blurry or degraded input is typically overlooked in the training of convolutional neural networks (CNNs). We hypothesized that the absence of blurry training inputs may cause CNNs to rely excessively on high spatial frequency information for object recognition, thereby causing systematic deviations from biological vision. We evaluated this hypothesis by comparing standard CNNs with CNNs trained on a combination of clear and blurry images. We show that blur-trained CNNs outperform standard CNNs at predicting neural responses to objects across a variety of viewing conditions. Moreover, blur-trained CNNs acquire increased sensitivity to shape information and greater robustness to multiple forms of visual noise, leading to improved correspondence with human perception. Our results provide multi-faceted neurocomputational evidence that blurry visual experiences may be critical for conferring robustness to biological visual systems.
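The training manipulation can be expressed as a data-augmentation step: mix clear and Gaussian-blurred images during CNN training. A torchvision sketch with illustrative blur parameters (not the authors' exact settings):

```python
# Blur-augmented training pipeline: half of the images pass through a strong
# Gaussian blur, the rest stay clear. Parameters here are illustrative.
from torchvision import transforms

blur_training_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    # Apply blur to ~50% of training images, leave the rest unmodified.
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=21, sigma=(1.0, 8.0))], p=0.5),
    transforms.ToTensor(),
])

# Usage: pass as the `transform` of an ImageFolder/ImageNet dataset and train
# a standard CNN; the augmentation alone implements the clear+blurry regimen.
```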
Affiliation(s)
- Hojin Jang
- Department of Psychology, Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA.
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea.
- Frank Tong
- Department of Psychology, Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA.
19. Nara S, Kaiser D. Integrative processing in artificial and biological vision predicts the perceived beauty of natural images. Sci Adv 2024; 10:eadi9294. PMID: 38427730. PMCID: PMC10906925. DOI: 10.1126/sciadv.adi9294.
Abstract
Previous research shows that the beauty of natural images is already determined during perceptual analysis. However, it is unclear which perceptual computations give rise to the perception of beauty. Here, we tested whether perceived beauty is predicted by spatial integration across an image, a perceptual computation that reduces processing demands by aggregating image parts into more efficient representations of the whole. We quantified integrative processing in an artificial deep neural network model, where the degree of integration was determined by the amount of deviation between activations for the whole image and its constituent parts. This quantification of integration predicted beauty ratings for natural images across four studies with different stimuli and designs. In a complementary functional magnetic resonance imaging study, we show that integrative processing in human visual cortex similarly predicts perceived beauty. Together, our results establish integration as a computational principle that facilitates perceptual analysis and thereby mediates the perception of beauty.
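The integration measure, the deviation between a network's response to a whole image and its responses to the image's parts, can be sketched as follows; the part split and the distance used here are assumptions for illustration:

```python
# Integration score: how much the whole-image activation deviates from the
# aggregate of part activations. Higher = more integrative processing.
import numpy as np

def integration_score(whole_act, part_acts):
    """whole_act: (d,) layer activation for the full image; part_acts: list of
    (d,) activations for the image's parts. Returns 1 - correlation."""
    parts_mean = np.mean(part_acts, axis=0)
    r = np.corrcoef(whole_act, parts_mean)[0, 1]
    return float(1.0 - r)

# `get_activations` would be a hypothetical feature extractor for one layer;
# halves = [img[:, :w // 2], img[:, w // 2:]] would give left/right parts.
rng = np.random.default_rng(9)
whole = rng.normal(size=512)
parts = [whole + rng.normal(scale=0.3, size=512) for _ in range(2)]
print(f"integration: {integration_score(whole, parts):.3f}")
```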
Affiliation(s)
- Sanjeev Nara
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, Gießen, Germany
- Daniel Kaiser
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, Gießen, Germany
- Center for Mind, Brain and Behavior (CMBB), Philipps-University Marburg and Justus Liebig University Gießen, Marburg, Germany
20. Huang S, Howard CM, Hovhannisyan M, Ritchey M, Cabeza R, Davis SW. Hippocampal Functions Modulate Transfer-Appropriate Cortical Representations Supporting Subsequent Memory. J Neurosci 2024; 44:e1135232023. PMID: 38050089. PMCID: PMC10851689. DOI: 10.1523/jneurosci.1135-23.2023.
Abstract
The hippocampus plays a central role as a coordinate system or index of information stored in neocortical loci. Nonetheless, it remains unclear how hippocampal processes integrate with cortical information to facilitate successful memory encoding. Thus, the goal of the current study was to identify specific hippocampal-cortical interactions that support object encoding. We collected fMRI data while 19 human participants (7 female and 12 male) encoded images of real-world objects and tested their memory for object concepts and image exemplars (i.e., conceptual and perceptual memory). Representational similarity analysis revealed robust representations of visual and semantic information in canonical visual (e.g., occipital cortex) and semantic (e.g., angular gyrus) regions in the cortex, but not in the hippocampus. Critically, hippocampal functions modulated the mnemonic impact of cortical representations that are most pertinent to future memory demands, or transfer-appropriate representations. Subsequent perceptual memory was best predicted by the strength of visual representations in ventromedial occipital cortex in coordination with hippocampal activity and pattern information during encoding. In parallel, subsequent conceptual memory was best predicted by the strength of semantic representations in left inferior frontal gyrus and angular gyrus in coordination with either hippocampal activity or semantic representational strength during encoding. We found no evidence for transfer-incongruent hippocampal-cortical interactions supporting subsequent memory (i.e., no hippocampal interactions with cortical visual/semantic representations supported conceptual/perceptual memory). Collectively, these results suggest that diverse hippocampal functions flexibly modulate cortical representations of object properties to satisfy distinct future memory demands. Significance Statement: The hippocampus is theorized to index pieces of information stored throughout the cortex to support episodic memory. Yet how hippocampal processes integrate with cortical representation of stimulus information remains unclear. Using fMRI, we examined various forms of hippocampal-cortical interactions during object encoding in relation to subsequent performance on conceptual and perceptual memory tests. Our results revealed novel hippocampal-cortical interactions that utilize semantic and visual representations in transfer-appropriate manners: conceptual memory supported by hippocampal modulation of frontoparietal semantic representations, and perceptual memory supported by hippocampal modulation of occipital visual representations. These findings provide important insights into the neural mechanisms underlying the formation of information-rich episodic memory and underscore the value of studying the flexible interplay between brain regions for complex cognition.
Affiliation(s)
- Shenyang Huang
- Department of Psychology & Neuroscience, Duke University, Durham 27708, North Carolina
- Cortney M Howard
- Department of Psychology & Neuroscience, Duke University, Durham 27708, North Carolina
- Maureen Ritchey
- Department of Psychology, Boston College, 02467 Massachusetts
- Roberto Cabeza
- Department of Psychology & Neuroscience, Duke University, Durham 27708, North Carolina
- Simon W Davis
- Department of Psychology & Neuroscience, Duke University, Durham 27708, North Carolina
- Department of Neurology, Duke University School of Medicine, Durham 27708, North Carolina
21. Tuckute G, Feather J, Boebinger D, McDermott JH. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLoS Biol 2023; 21:e3002366. PMID: 38091351. PMCID: PMC10718467. DOI: 10.1371/journal.pbio.3002366.
Abstract
Models that predict brain responses to stimuli provide one measure of understanding of a sensory system and have many potential applications in science and engineering. Deep artificial neural networks have emerged as the leading such predictive models of the visual system but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models and, thus, how to further improve models in this domain. We evaluated model-brain correspondence for publicly available audio neural network models along with in-house models trained on 4 different tasks. Most tested models outpredicted standard spectrotemporal filter-bank models of auditory cortex and exhibited systematic model-brain correspondence: Middle stages best predicted primary auditory cortex, while deep stages best predicted non-primary cortex. However, some state-of-the-art models produced substantially worse brain predictions. Models trained to recognize speech in background noise produced better brain predictions than models trained to recognize speech in quiet, potentially because hearing in noise imposes constraints on biological auditory representations. The training task influenced the prediction quality for specific cortical tuning properties, with best overall predictions resulting from models trained on multiple tasks. The results generally support the promise of deep neural networks as models of audition, though they also indicate that current models do not explain auditory cortical responses in their entirety.
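The stage-wise evaluation logic in this abstract (predict a brain region's responses from each model stage's activations, then compare stages) can be sketched as follows; the stage names, dimensions, and data are synthetic stand-ins, not the paper's models or fMRI data:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_stims = 150
voxels = rng.standard_normal((n_stims, 50))   # fMRI responses in one auditory region
stages = {f"stage_{i}": rng.standard_normal((n_stims, 256)) for i in range(6)}

# Cross-validated prediction accuracy of each model stage for this region;
# the best-predicting stage indexes the model-brain correspondence.
for name, acts in stages.items():
    score = cross_val_score(RidgeCV(alphas=np.logspace(-2, 4, 7)),
                            acts, voxels, cv=5, scoring="r2").mean()
    print(name, round(score, 3))
```

Repeating this per region and plotting the best-predicting stage against the cortical hierarchy is one way to visualize the middle-stage/primary and deep-stage/non-primary pattern reported above.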
Affiliation(s)
- Greta Tuckute: Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, USA; Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, USA
- Jenelle Feather: Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, USA; Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, USA
- Dana Boebinger: Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, USA; Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, USA; Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, USA; University of Rochester Medical Center, Rochester, New York, USA
- Josh H. McDermott: Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, USA; Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, USA; Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, USA

22
Pham TQ, Matsui T, Chikazoe J. Evaluation of the Hierarchical Correspondence between the Human Brain and Artificial Neural Networks: A Review. BIOLOGY 2023; 12:1330. [PMID: 37887040 PMCID: PMC10604784 DOI: 10.3390/biology12101330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Revised: 09/22/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023]
Abstract
Artificial neural networks (ANNs) that are heavily inspired by the human brain now achieve human-level performance across multiple task domains. ANNs have thus drawn attention in neuroscience, raising the possibility of providing a framework for understanding the information encoded in the human brain. However, the correspondence between ANNs and the brain cannot be measured directly. They differ in outputs and substrates; neurons vastly outnumber their ANN analogs (i.e., nodes); and the key algorithm responsible for most modern ANN training (i.e., backpropagation) is likely absent from the brain. Neuroscientists have thus taken a variety of approaches to examine the similarity between the brain and ANNs at multiple levels of their information hierarchy. This review provides an overview of the currently available approaches and their limitations for evaluating brain-ANN correspondence.
Affiliation(s)
- Teppei Matsui: Graduate School of Brain Science, Doshisha University, Kyoto 610-0321, Japan

23
Miao HY, Tong F. Convolutional neural network models of neuronal responses in macaque V1 reveal limited non-linear processing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.26.554952. [PMID: 37693397 PMCID: PMC10491131 DOI: 10.1101/2023.08.26.554952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/12/2023]
Abstract
Computational models of the primary visual cortex (V1) have suggested that V1 neurons behave like Gabor filters followed by simple non-linearities. However, recent work employing convolutional neural network (CNN) models has suggested that V1 relies on far more non-linear computations than previously thought. Specifically, unit responses in an intermediate layer of VGG-19 were found to best predict macaque V1 responses to thousands of natural and synthetic images. Here, we evaluated the hypothesis that the poor performance of lower-layer units in VGG-19 might be attributable to their small receptive field size rather than to their lack of complexity per se. We compared VGG-19 with AlexNet, which has much larger receptive fields in its lower layers. Whereas the best-performing layer of VGG-19 occurred after seven non-linear steps, the first convolutional layer of AlexNet best predicted V1 responses. Although VGG-19's predictive accuracy was somewhat better than standard AlexNet, we found that a modified version of AlexNet could match VGG-19's performance after only a few non-linear computations. Control analyses revealed that decreasing the size of the input images caused the best-performing layer of VGG-19 to shift to a lower layer, consistent with the hypothesis that the relationship between image size and receptive field size can strongly affect model performance. We conducted additional analyses using a Gabor pyramid model to test for non-linear contributions of normalization and contrast saturation. Overall, our findings suggest that the feedforward responses of V1 neurons can be well explained by assuming only a few non-linear processing stages.
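The Gabor pyramid baseline mentioned at the end of this abstract can be sketched as below: oriented Gabor filters, quadrature-pair (complex-cell) energy, and divisive normalization. The filter parameters, the single-position dot-product responses, and the normalization constant are illustrative choices, not the paper's exact model:

```python
import numpy as np

def gabor(size, sf, theta, phase):
    xs = np.linspace(-1, 1, size)
    x, y = np.meshgrid(xs, xs)
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * 0.3**2))
    return envelope * np.cos(2 * np.pi * sf * xr + phase)

rng = np.random.default_rng(2)
image = rng.standard_normal((64, 64))             # stand-in for a stimulus patch
thetas = np.linspace(0, np.pi, 8, endpoint=False)

# Complex-cell energy: squared responses of an even/odd quadrature filter pair,
# computed here at a single position rather than by full convolution.
energies = []
for th in thetas:
    even = np.sum(image * gabor(64, sf=4, theta=th, phase=0))
    odd = np.sum(image * gabor(64, sf=4, theta=th, phase=np.pi / 2))
    energies.append(even**2 + odd**2)
energies = np.array(energies)

# Divisive normalization across the orientation pool (sigma is a free parameter).
sigma = 1.0
normalized = energies / (sigma**2 + energies.sum())
print(normalized.round(4))
```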
Affiliation(s)
- Hui-Yuan Miao: Department of Psychology, Vanderbilt University, Nashville, TN 37240, USA
- Frank Tong: Department of Psychology, Vanderbilt University, Nashville, TN 37240, USA; Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN 37240, USA

24
Wang C, Yan H, Huang W, Sheng W, Wang Y, Fan YS, Liu T, Zou T, Li R, Chen H. Neural encoding with unsupervised spiking convolutional neural network. Commun Biol 2023; 6:880. [PMID: 37640808 PMCID: PMC10462614 DOI: 10.1038/s42003-023-05257-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Accepted: 08/18/2023] [Indexed: 08/31/2023] Open
Abstract
Accurately predicting the brain responses to various stimuli poses a significant challenge in neuroscience. Despite recent breakthroughs in neural encoding using convolutional neural networks (CNNs) in fMRI studies, there remain critical gaps between the computational rules of traditional artificial neurons and real biological neurons. To address this issue, a spiking CNN (SCNN)-based framework is presented in this study to achieve neural encoding in a more biologically plausible manner. The framework utilizes an unsupervised SCNN to extract visual features of image stimuli and employs a receptive field-based regression algorithm to predict fMRI responses from the SCNN features. Experimental results on handwritten characters, handwritten digits and natural images demonstrate that the proposed approach can achieve remarkably good encoding performance and can be utilized for "brain reading" tasks such as image reconstruction and identification. This work suggests that SNNs can serve as a promising tool for neural encoding.
Affiliation(s)
- Chong Wang: The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 611731, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; MOE Key Lab for Neuroinformation and High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hongmei Yan: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; MOE Key Lab for Neuroinformation and High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Huang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; MOE Key Lab for Neuroinformation and High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Sheng: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; MOE Key Lab for Neuroinformation and High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yuting Wang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; MOE Key Lab for Neuroinformation and High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yun-Shuang Fan: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; MOE Key Lab for Neuroinformation and High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 610054, China
- Tao Liu: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Ting Zou: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Rong Li: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; MOE Key Lab for Neuroinformation and High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 610054, China
- Huafu Chen: The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 611731, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; MOE Key Lab for Neuroinformation and High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 610054, China

25
Schütt HH, Kipnis AD, Diedrichsen J, Kriegeskorte N. Statistical inference on representational geometries. eLife 2023; 12:e82566. [PMID: 37610302 PMCID: PMC10446828 DOI: 10.7554/elife.82566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2022] [Accepted: 08/07/2023] [Indexed: 08/24/2023] Open
Abstract
Neuroscience has recently made much progress, expanding the complexity of both neural activity measurements and brain-computational models. However, we lack robust methods for connecting theory and experiment by evaluating our new big models with our new big data. Here, we introduce new inference methods enabling researchers to evaluate and compare models based on the accuracy of their predictions of representational geometries: A good model should accurately predict the distances among the neural population representations (e.g. of a set of stimuli). Our inference methods combine novel 2-factor extensions of crossvalidation (to prevent overfitting to either subjects or conditions from inflating our estimates of model accuracy) and bootstrapping (to enable inferential model comparison with simultaneous generalization to both new subjects and new conditions). We validate the inference methods on data where the ground-truth model is known, by simulating data with deep neural networks and by resampling of calcium-imaging and functional MRI data. Results demonstrate that the methods are valid and conclusions generalize correctly. These data analysis methods are available in an open-source Python toolbox (rsatoolbox.readthedocs.io).
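The core 2-factor idea (resampling subjects and conditions simultaneously so that conclusions generalize to both) can be illustrated with a toy bootstrap. The published rsatoolbox implementation handles many details this sketch ignores (e.g., bias from duplicated conditions), and all data here are synthetic:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_sub, n_cond = 12, 20
subject_rdms = rng.random((n_sub, n_cond, n_cond))  # per-subject condition x condition RDMs
model_rdm = rng.random((n_cond, n_cond))            # candidate model's predicted RDM

def model_accuracy(sub_idx, cond_idx):
    data = subject_rdms[np.ix_(sub_idx, cond_idx, cond_idx)].mean(axis=0)
    model = model_rdm[np.ix_(cond_idx, cond_idx)]
    iu = np.triu_indices(len(cond_idx), k=1)
    return spearmanr(data[iu], model[iu])[0]

boot = []
for _ in range(1000):
    subs = rng.integers(0, n_sub, n_sub)     # resample subjects with replacement
    conds = rng.integers(0, n_cond, n_cond)  # and conditions, in the same draw
    boot.append(model_accuracy(subs, conds))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% CI generalizing to new subjects AND conditions: [{lo:.3f}, {hi:.3f}]")
```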
Affiliation(s)
- Heiko H Schütt: Zuckerman Institute, Columbia University, New York, United States

26
Jang H, Tong F. Improved modeling of human vision by incorporating robustness to blur in convolutional neural networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.29.551089. [PMID: 37577646 PMCID: PMC10418076 DOI: 10.1101/2023.07.29.551089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/15/2023]
Abstract
Whenever a visual scene is cast onto the retina, much of it will appear degraded due to poor resolution in the periphery; moreover, optical defocus can cause blur in central vision. However, the pervasiveness of blurry or degraded input is typically overlooked in the training of convolutional neural networks (CNNs). We hypothesized that the absence of blurry training inputs may cause CNNs to rely excessively on high spatial frequency information for object recognition, thereby causing systematic deviations from biological vision. We evaluated this hypothesis by comparing standard CNNs with CNNs trained on a combination of clear and blurry images. We show that blur-trained CNNs outperform standard CNNs at predicting neural responses to objects across a variety of viewing conditions. Moreover, blur-trained CNNs acquire increased sensitivity to shape information and greater robustness to multiple forms of visual noise, leading to improved correspondence with human perception. Our results provide novel neurocomputational evidence that blurry visual experiences are very important for conferring robustness to biological visual systems.
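One plausible way to operationalize "a combination of clear and blurry images" in a standard training pipeline is a probabilistic blur transform. This sketch assumes a torchvision pipeline; the blur probability and strength are illustrative, not the paper's training recipe:

```python
from PIL import Image
from torchvision import transforms

blur_train = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    # Blur roughly half of the training images, with a random blur strength.
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=21, sigma=(0.1, 8.0))], p=0.5),
    transforms.ToTensor(),
])

img = Image.new("RGB", (300, 300))  # dummy image standing in for a dataset sample
x = blur_train(img)
print(x.shape)  # torch.Size([3, 224, 224])
```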
Affiliation(s)
- Hojin Jang: Department of Psychology and Vanderbilt Vision Research Center, Vanderbilt University
- Frank Tong: Department of Psychology and Vanderbilt Vision Research Center, Vanderbilt University

27
Schwartz E, Alreja A, Richardson RM, Ghuman A, Anzellotti S. Intracranial Electroencephalography and Deep Neural Networks Reveal Shared Substrates for Representations of Face Identity and Expressions. J Neurosci 2023; 43:4291-4303. [PMID: 37142430 PMCID: PMC10255163 DOI: 10.1523/jneurosci.1277-22.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2022] [Revised: 03/25/2023] [Accepted: 04/17/2023] [Indexed: 05/06/2023] Open
Abstract
According to a classical view of face perception (Bruce and Young, 1986; Haxby et al., 2000), face identity and facial expression recognition are performed by separate neural substrates (ventral and lateral temporal face-selective regions, respectively). However, recent studies challenge this view, showing that expression valence can also be decoded from ventral regions (Skerry and Saxe, 2014; Li et al., 2019), and identity from lateral regions (Anzellotti and Caramazza, 2017). These findings could be reconciled with the classical view if regions specialized for one task (either identity or expression) contain a small amount of information for the other task (that enables above-chance decoding). In this case, we would expect representations in lateral regions to be more similar to representations in deep convolutional neural networks (DCNNs) trained to recognize facial expression than to representations in DCNNs trained to recognize face identity (the converse should hold for ventral regions). We tested this hypothesis by analyzing neural responses to faces varying in identity and expression. Representational dissimilarity matrices (RDMs) computed from human intracranial recordings (n = 11 adults; 7 females) were compared with RDMs from DCNNs trained to label either identity or expression. We found that RDMs from DCNNs trained to recognize identity correlated with intracranial recordings more strongly in all regions tested, even in regions classically hypothesized to be specialized for expression. These results deviate from the classical view, suggesting that face-selective ventral and lateral regions contribute to the representation of both identity and expression.
SIGNIFICANCE STATEMENT: Previous work proposed that separate brain regions are specialized for the recognition of face identity and facial expression. However, identity and expression recognition mechanisms might share common brain regions instead. We tested these alternatives using deep neural networks and intracranial recordings from face-selective brain regions. Deep neural networks trained to recognize identity and networks trained to recognize expression learned representations that correlate with neural recordings. Identity-trained representations correlated with intracranial recordings more strongly in all regions tested, including regions hypothesized to be expression specialized in the classical hypothesis. These findings support the view that identity and expression recognition rely on common brain regions. This discovery may require reevaluation of the roles that the ventral and lateral neural pathways play in processing socially relevant stimuli.
Affiliation(s)
- Emily Schwartz: Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467
- Arish Alreja: Center for the Neural Basis of Cognition, Carnegie Mellon University/University of Pittsburgh, Pittsburgh, Pennsylvania 15213; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; Department of Neurological Surgery, University of Pittsburgh Medical Center Presbyterian, Pittsburgh, Pennsylvania 15213
- R Mark Richardson: Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts 02114; Harvard Medical School, Boston, Massachusetts 02115
- Avniel Ghuman: Center for the Neural Basis of Cognition, Carnegie Mellon University/University of Pittsburgh, Pittsburgh, Pennsylvania 15213; Department of Neurological Surgery, University of Pittsburgh Medical Center Presbyterian, Pittsburgh, Pennsylvania 15213; Center for Neuroscience, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
- Stefano Anzellotti: Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467

28
Mocz V, Jeong SK, Chun M, Xu Y. Multiple visual objects are represented differently in the human brain and convolutional neural networks. Sci Rep 2023; 13:9088. [PMID: 37277406 PMCID: PMC10241785 DOI: 10.1038/s41598-023-36029-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2023] [Accepted: 05/27/2023] [Indexed: 06/07/2023] Open
Abstract
Objects in the real world usually appear with other objects. To form object representations independent of whether or not other objects are encoded concurrently, in the primate brain, responses to an object pair are well approximated by the average responses to each constituent object shown alone. This is found at the single unit level in the slope of response amplitudes of macaque IT neurons to paired and single objects, and at the population level in fMRI voxel response patterns in human ventral object processing regions (e.g., LO). Here, we compare how the human brain and convolutional neural networks (CNNs) represent paired objects. In human LO, we show that averaging exists in both single fMRI voxels and voxel population responses. However, in the higher layers of five CNNs pretrained for object classification varying in architecture, depth and recurrent processing, slope distribution across units and, consequently, averaging at the population level both deviated significantly from the brain data. Object representations thus interact with each other in CNNs when objects are shown together and differ from when objects are shown individually. Such distortions could significantly limit CNNs' ability to generalize object representations formed in different contexts.
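The averaging test at the heart of this work can be sketched directly: regress responses to object pairs on the mean of the two single-object responses and inspect the slope. The data below are synthetic and constructed to average by design, so the slope comes out near 1:

```python
import numpy as np

rng = np.random.default_rng(4)
n_pairs, n_units = 80, 100
resp_a = rng.random((n_pairs, n_units))   # responses to object A shown alone
resp_b = rng.random((n_pairs, n_units))   # responses to object B shown alone
resp_pair = 0.5 * (resp_a + resp_b) + 0.05 * rng.standard_normal((n_pairs, n_units))

slopes = []
for u in range(n_units):
    x = 0.5 * (resp_a[:, u] + resp_b[:, u])   # average-of-parts prediction
    y = resp_pair[:, u]                       # observed pair response
    slopes.append(np.polyfit(x, y, 1)[0])
print("mean slope vs. average-of-parts prediction:", np.mean(slopes).round(3))
# ~1.0 indicates averaging (the brain-like pattern); ~2.0 would indicate summation.
```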
Affiliation(s)
- Viola Mocz: Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT 06520, USA
- Su Keun Jeong: Department of Psychology, Chungbuk National University, Cheongju, South Korea
- Marvin Chun: Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT 06520, USA; Department of Neuroscience, Yale School of Medicine, New Haven, CT 06520, USA
- Yaoda Xu: Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT 06520, USA

29
Taylor J, Xu Y. Comparing the Dominance of Color and Form Information across the Human Ventral Visual Pathway and Convolutional Neural Networks. J Cogn Neurosci 2023; 35:816-840. [PMID: 36877074 PMCID: PMC11283826 DOI: 10.1162/jocn_a_01979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/07/2023]
Abstract
Color and form information can be decoded in every region of the human ventral visual hierarchy, and at every layer of many convolutional neural networks (CNNs) trained to recognize objects, but how does the coding strength of these features vary over processing? Here, we characterize for these features both their absolute coding strength-how strongly each feature is represented independent of the other feature-and their relative coding strength-how strongly each feature is encoded relative to the other, which could constrain how well a feature can be read out by downstream regions across variation in the other feature. To quantify relative coding strength, we define a measure called the form dominance index that compares the relative influence of color and form on the representational geometry at each processing stage. We analyze brain and CNN responses to stimuli varying based on color and either a simple form feature, orientation, or a more complex form feature, curvature. We find that while the brain and CNNs largely differ in how the absolute coding strength of color and form vary over processing, comparing them in terms of their relative emphasis of these features reveals a striking similarity: For both the brain and for CNNs trained for object recognition (but not for untrained CNNs), orientation information is increasingly de-emphasized, and curvature information is increasingly emphasized, relative to color information over processing, with corresponding processing stages showing largely similar values of the form dominance index.
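The paper defines the form dominance index precisely; the formalization below is only one plausible reading (a normalized difference of model-RDM correlations) and should not be taken as the authors' exact measure:

```python
import numpy as np
from scipy.stats import spearmanr

def form_dominance_index(neural_rdm, form_rdm, color_rdm):
    # Positive values: form dominates the representational geometry; negative: color.
    iu = np.triu_indices(neural_rdm.shape[0], k=1)
    r_form = spearmanr(neural_rdm[iu], form_rdm[iu])[0]
    r_color = spearmanr(neural_rdm[iu], color_rdm[iu])[0]
    return (r_form - r_color) / (abs(r_form) + abs(r_color))

rng = np.random.default_rng(5)
n = 24                                 # stimuli varying in color and form
form_rdm = rng.random((n, n))          # toy model RDMs (left unsymmetrized here
color_rdm = rng.random((n, n))         # for brevity; real RDMs are symmetric)
neural_rdm = 0.7 * form_rdm + 0.3 * color_rdm   # a form-dominated toy stage
print(round(form_dominance_index(neural_rdm, form_rdm, color_rdm), 3))
```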
30
Sun L, Zhu J, Tan J, Li X, Li R, Deng H, Zhang X, Liu B, Zhu X. Deep learning-assisted automated sewage pipe defect detection for urban water environment management. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 882:163562. [PMID: 37084915 DOI: 10.1016/j.scitotenv.2023.163562] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 04/13/2023] [Accepted: 04/13/2023] [Indexed: 05/03/2023]
Abstract
A healthy sewage pipe system plays a significant role in urban water management by collecting and transporting wastewater and stormwater, and its condition can be assessed with hydraulic models. However, sewage pipe defects have been observed frequently in recent years during regular pipe maintenance, according to interior videos of underground pipes captured by closed-circuit television (CCTV) robots. In such cases, a hydraulic model constructed on the assumption of healthy pipes deviates substantially from real hydraulic performance and may even become unusable, which can result in unanticipated damage such as blockage collapse or stormwater overflows. Quick defect evaluation and defect quantification are preconditions for risk assessment and model calibration in urban water management, but pipe defect assessment still largely relies on technicians reviewing the CCTV videos/images. An automated sewage pipe defect detection system is needed to identify pipe issues in time to rehabilitate or renew sewage pipes, and the rapid development of deep learning, especially over the past five years, provides an opportunity to build such a system on image recognition. Given the initial success of deep learning applied to CCTV interpretation, this review (i) integrates the methodological framework of automated sewage pipe defect detection, including data acquisition, image pre-processing, feature extraction, model construction, and evaluation metrics; (ii) discusses the state-of-the-art performance of deep learning in pipe defect classification, localization, and severity rating (e.g., up to ~96% accuracy and 140 FPS processing speed); and (iii) proposes risk assessment and model calibration for urban water management that account for pipe defects. The review thus introduces a practical, application-oriented methodology covering defect data acquisition by CCTV, model construction by deep learning, and model application, and provides references for further improving the accuracy and generalization ability of urban water management models in practice.
Affiliation(s)
- Lianpeng Sun: School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China; Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, Sun Yat-sen University, Guangzhou 510275, China
- Jinjun Zhu: School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Jinxin Tan: School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Xianfeng Li: School of Computer Science and Engineering, Macau University of Science and Technology, Macau
- Ruohong Li: School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China; Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, Sun Yat-sen University, Guangzhou 510275, China
- Huanzhong Deng: School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Xinyang Zhang: School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Bingyou Liu: School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Xinzhe Zhu: School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China; Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, Sun Yat-sen University, Guangzhou 510275, China

31
Taylor J, Kriegeskorte N. TorchLens: A Python package for extracting and visualizing hidden activations of PyTorch models. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.16.532916. [PMID: 36993311 PMCID: PMC10055035 DOI: 10.1101/2023.03.16.532916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 04/13/2023]
Abstract
Deep neural network models (DNNs) are essential to modern AI and provide powerful models of information processing in biological neural networks. Researchers in both neuroscience and engineering are pursuing a better understanding of the internal representations and operations that undergird the successes and failures of DNNs. Neuroscientists additionally evaluate DNNs as models of brain computation by comparing their internal representations to those found in brains. It is therefore essential to have a method to easily and exhaustively extract and characterize the results of the internal operations of any DNN. Many models are implemented in PyTorch, the leading framework for building DNN models. Here we introduce TorchLens, a new open-source Python package for extracting and characterizing hidden-layer activations in PyTorch models. Uniquely among existing approaches to this problem, TorchLens has the following features: (1) it exhaustively extracts the results of all intermediate operations, not just those associated with PyTorch module objects, yielding a full record of every step in the model's computational graph, (2) it provides an intuitive visualization of the model's complete computational graph along with metadata about each computational step in a model's forward pass for further analysis, (3) it contains a built-in validation procedure to algorithmically verify the accuracy of all saved hidden-layer activations, and (4) the approach it uses can be automatically applied to any PyTorch model with no modifications, including models with conditional (if-then) logic in their forward pass, recurrent models, branching models where layer outputs are fed into multiple subsequent layers in parallel, and models with internally generated tensors (e.g., injections of noise). Furthermore, using TorchLens requires minimal additional code, making it easy to incorporate into existing pipelines for model development and analysis, and useful as a pedagogical aid when teaching deep learning concepts. We hope this contribution will help researchers in AI and neuroscience understand the internal representations of DNNs.
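A usage sketch, assuming torchlens is installed and that log_forward_pass is its main entry point as described; treat the exact signature and keyword names as assumptions and consult the package documentation:

```python
import torch
import torchvision.models as models
import torchlens as tl  # assumed import name for the TorchLens package

model = models.alexnet(weights=None)
x = torch.rand(1, 3, 224, 224)

# Log every intermediate operation of the forward pass (signature assumed).
model_history = tl.log_forward_pass(model, x, vis_opt="none")
print(model_history)  # summary of all logged computational steps
```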
Affiliation(s)
- JohnMark Taylor: Zuckerman Mind Brain Behavior Institute, Columbia University (10027)

32
Mocz V, Jeong SK, Chun M, Xu Y. Representing Multiple Visual Objects in the Human Brain and Convolutional Neural Networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.28.530472. [PMID: 36909506 PMCID: PMC10002658 DOI: 10.1101/2023.02.28.530472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2023]
Abstract
Objects in the real world often appear with other objects. To recover the identity of an object whether or not other objects are encoded concurrently, in primate object-processing regions, neural responses to an object pair have been shown to be well approximated by the average responses to each constituent object shown alone, indicating the whole is equal to the average of its parts. This is present at the single unit level in the slope of response amplitudes of macaque IT neurons to paired and single objects, and at the population level in response patterns of fMRI voxels in human ventral object processing regions (e.g., LO). Here we show that averaging exists in both single fMRI voxels and voxel population responses in human LO, with better averaging in single voxels leading to better averaging in fMRI response patterns, demonstrating a close correspondence of averaging at the fMRI unit and population levels. To understand if a similar averaging mechanism exists in convolutional neural networks (CNNs) pretrained for object classification, we examined five CNNs with varying architecture, depth and the presence/absence of recurrent processing. We observed averaging at the CNN unit level but rarely at the population level, with CNN unit response distributions in most cases not resembling human LO or macaque IT responses. The whole is thus not equal to the average of its parts in CNNs, potentially rendering the individual objects in a pair less accessible in CNNs during visual processing than they are in the human brain.
Affiliation(s)
- Viola Mocz: Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, CT 06520, USA
- Su Keun Jeong: Department of Psychology, Chungbuk National University, South Korea
- Marvin Chun: Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, CT 06520, USA; Department of Neuroscience, Yale School of Medicine, New Haven, CT 06520, USA
- Yaoda Xu: Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, CT 06520, USA

33
Revsine C, Gonzalez-Castillo J, Merriam EP, Bandettini PA, Ramírez FM. A unifying model for discordant and concordant results in human neuroimaging studies of facial viewpoint selectivity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.08.527219. [PMID: 36945636 PMCID: PMC10028835 DOI: 10.1101/2023.02.08.527219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/11/2023]
Abstract
Our ability to recognize faces regardless of viewpoint is a key property of the primate visual system. Traditional theories hold that facial viewpoint is represented by view-selective mechanisms at early visual processing stages and that representations become increasingly tolerant to viewpoint changes in higher-level visual areas. Newer theories, based on single-neuron monkey electrophysiological recordings, suggest an additional intermediate processing stage invariant to mirror-symmetric face views. Consistent with traditional theories, human studies combining neuroimaging and multivariate pattern analysis (MVPA) methods have provided evidence of view-selectivity in early visual cortex. However, contradictory results have been reported in higher-level visual areas concerning the existence in humans of mirror-symmetrically tuned representations. We believe these results reflect low-level stimulus confounds and data analysis choices. To probe for low-level confounds, we analyzed images from two popular face databases. Analyses of mean image luminance and contrast revealed biases across face views described by even polynomials, i.e., mirror-symmetric biases. To explain major trends across human neuroimaging studies of viewpoint selectivity, we constructed a network model that incorporates three biological constraints: cortical magnification, convergent feedforward projections, and interhemispheric connections. Given the identified low-level biases, we show that a gradual increase of interhemispheric connections across network layers is sufficient to replicate findings of mirror-symmetry in high-level processing stages, as well as view-tuning in early processing stages. Data analysis decisions (pattern dissimilarity measure and data recentering) accounted for the variable observation of mirror-symmetry in late processing stages. The model provides a unifying explanation of MVPA studies of viewpoint selectivity. We also show how common analysis choices can lead to erroneous conclusions.
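The mechanism the model leans on (interhemispheric pooling of view-tuned units yielding mirror-symmetric tuning) can be demonstrated in a few lines. The Gaussian tuning curves and the simple summation rule below are toy choices, not the paper's network:

```python
import numpy as np

views = np.array([-90, -45, 0, 45, 90])  # face viewpoints in degrees

def view_tuned_population(view, centers=np.linspace(-90, 90, 13), width=30.0):
    # Gaussian view-tuned units, as in early processing stages.
    return np.exp(-(view - centers) ** 2 / (2 * width**2))

early = np.stack([view_tuned_population(v) for v in views])
# Interhemispheric pooling: each unit is summed with its mirror-tuned partner.
late = np.stack([view_tuned_population(v) + view_tuned_population(-v) for v in views])

def rdm(acts):
    return np.round(1 - np.corrcoef(acts), 2)

print("early stage (view-selective):\n", rdm(early))
print("late stage (-45 and +45 deg rows now identical):\n", rdm(late))
```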
Affiliation(s)
- Cambria Revsine: Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD; Department of Psychology, University of Chicago, Chicago, IL
- Javier Gonzalez-Castillo: Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
- Elisha P Merriam: Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
- Peter A Bandettini: Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD; Functional MRI Core, National Institutes of Health, Bethesda, MD
- Fernando M Ramírez: Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD; Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD

34
Barry DN, Love BC. A neural network account of memory replay and knowledge consolidation. Cereb Cortex 2022; 33:83-95. [PMID: 35213689 PMCID: PMC9758580 DOI: 10.1093/cercor/bhac054] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2021] [Revised: 01/25/2022] [Accepted: 01/26/2022] [Indexed: 11/15/2022] Open
Abstract
Replay can consolidate memories through offline neural reactivation related to past experiences. Category knowledge is learned across multiple experiences, and its subsequent generalization is promoted by consolidation and replay during rest and sleep. However, aspects of replay are difficult to determine from neuroimaging studies. We provided insights into category knowledge replay by simulating these processes in a neural network which approximated the roles of the human ventral visual stream and hippocampus. Generative replay, akin to imagining new category instances, facilitated generalization to new experiences. Consolidation-related replay may therefore help to prepare us for the future as much as remember the past. Generative replay was more effective in later network layers functionally similar to the lateral occipital cortex than layers corresponding to early visual cortex, drawing a distinction between neural replay and its relevance to consolidation. Category replay was most beneficial for newly acquired knowledge, suggesting replay helps us adapt to changes in our environment. Finally, we present a novel mechanism for the observation that the brain selectively consolidates weaker information, namely a reinforcement learning process in which categories were replayed according to their contribution to network performance. This reinforces the idea of consolidation-related replay as an active rather than passive process.
Affiliation(s)
- Daniel N Barry: Department of Experimental Psychology, University College London, 26 Bedford Way, London WC1H 0AP, UK
- Bradley C Love: Department of Experimental Psychology, University College London, 26 Bedford Way, London WC1H 0AP, UK; The Alan Turing Institute, 96 Euston Road, London NW1 2DB, UK

35
Ayzenberg V, Behrmann M. Does the brain's ventral visual pathway compute object shape? Trends Cogn Sci 2022; 26:1119-1132. [PMID: 36272937 DOI: 10.1016/j.tics.2022.09.019] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Revised: 09/22/2022] [Accepted: 09/26/2022] [Indexed: 11/11/2022]
Abstract
A rich behavioral literature has shown that human object recognition is supported by a representation of shape that is tolerant to variations in an object's appearance. Such 'global' shape representations are achieved by describing objects via the spatial arrangement of their local features, or structure, rather than by the appearance of the features themselves. However, accumulating evidence suggests that the ventral visual pathway - the primary substrate underlying object recognition - may not represent global shape. Instead, ventral representations may be better described as a basis set of local image features. We suggest that this evidence forces a reevaluation of the role of the ventral pathway in object perception and posits a broader network for shape perception that encompasses contributions from the dorsal pathway.
Affiliation(s)
- Vladislav Ayzenberg: Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Marlene Behrmann: Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Ophthalmology, University of Pittsburgh, Pittsburgh, PA 15260, USA

36
Lee J, Jo J, Lee B, Lee JH, Yoon S. Brain-inspired Predictive Coding Improves the Performance of Machine Challenging Tasks. Front Comput Neurosci 2022; 16:1062678. [PMID: 36465966 PMCID: PMC9709416 DOI: 10.3389/fncom.2022.1062678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2022] [Accepted: 10/28/2022] [Indexed: 09/19/2023] Open
Abstract
Backpropagation has been regarded as the most favorable algorithm for training artificial neural networks. However, it has been criticized for its biological implausibility because its learning mechanism contradicts how the human brain learns. Although backpropagation has achieved super-human performance in various machine learning applications, it often shows limited performance on specific tasks. We collectively refer to such tasks as machine-challenging tasks (MCTs) and aim to investigate methods to enhance machine learning for them. Specifically, we start with a natural question: can a learning mechanism that mimics the human brain improve MCT performance? We hypothesized that a learning mechanism replicating the human brain is effective for tasks where machine intelligence struggles. We performed multiple experiments on specific types of MCTs with room for improvement using predictive coding, a more biologically plausible learning algorithm than backpropagation. This study treats incremental learning, long-tailed recognition, and few-shot recognition as representative MCTs. In extensive experiments, predictive coding robustly outperformed backpropagation-trained networks on the MCTs. We demonstrated that predictive coding-based incremental learning alleviates catastrophic forgetting; that predictive coding-based learning mitigates the classification bias in long-tailed recognition; and that a network trained with predictive coding can correctly predict targets from few samples. We analyzed the experimental results by drawing analogies between the properties of predictive coding networks and those of the human brain, and discuss the potential of predictive coding networks in general machine learning.
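A minimal sketch of the predictive coding learning scheme referenced here, in the classic Rao-Ballard style with a single latent layer: inference iteratively settles the latent activity to reduce the prediction error, then the weights learn from the settled error with a local, Hebbian-like rule rather than backpropagation. Learning rates, dimensions, and the single-sample training loop are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
d_in, d_hidden = 20, 10
W = 0.1 * rng.standard_normal((d_in, d_hidden))  # generative (top-down) weights
x = rng.standard_normal(d_in)                    # a single training input

lr_r, lr_w = 0.1, 0.01
for epoch in range(200):
    r = np.zeros(d_hidden)
    for _ in range(50):             # inference: settle the latent activity r
        error = x - W @ r           # prediction error at the input layer
        r += lr_r * (W.T @ error)   # move r to reduce the error
    error = x - W @ r
    W += lr_w * np.outer(error, r)  # local weight update from the settled error

print("final reconstruction error:", round(float(np.linalg.norm(x - W @ r)), 3))
```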
Affiliation(s)
- Jangho Lee: Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
- Jeonghee Jo: Institute of New Media and Communications, Seoul National University, Seoul, South Korea
- Byounghwa Lee: CybreBrain Research Section, Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
- Jung-Hoon Lee: CybreBrain Research Section, Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
- Sungroh Yoon: Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea; Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea

37
Mocz V, Vaziri-Pashkam M, Chun M, Xu Y. Predicting Identity-Preserving Object Transformations in Human Posterior Parietal Cortex and Convolutional Neural Networks. J Cogn Neurosci 2022; 34:2406-2435. [PMID: 36122358 PMCID: PMC9988239 DOI: 10.1162/jocn_a_01916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Previous research shows that, within human occipito-temporal cortex (OTC), we can use a general linear mapping function to link visual object responses across nonidentity feature changes, including Euclidean features (e.g., position and size) and non-Euclidean features (e.g., image statistics and spatial frequency). Although the learned mapping is capable of predicting responses of objects not included in training, these predictions are better for categories included than those not included in training. These findings demonstrate a near-orthogonal representation of object identity and nonidentity features throughout human OTC. Here, we extended these findings to examine the mapping across both Euclidean and non-Euclidean feature changes in human posterior parietal cortex (PPC), including functionally defined regions in inferior and superior intraparietal sulcus. We additionally examined responses in five convolutional neural networks (CNNs) pretrained with object classification, as CNNs are considered as the current best model of the primate ventral visual system. We separately compared results from PPC and CNNs with those of OTC. We found that a linear mapping function could successfully link object responses in different states of nonidentity transformations in human PPC and CNNs for both Euclidean and non-Euclidean features. Overall, we found that object identity and nonidentity features are represented in a near-orthogonal, rather than complete-orthogonal, manner in PPC and CNNs, just like they do in OTC. Meanwhile, some differences existed among OTC, PPC, and CNNs. These results demonstrate the similarities and differences in how visual object information across an identity-preserving image transformation may be represented in OTC, PPC, and CNNs.
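The linear mapping logic can be sketched as follows: fit a matrix that maps responses across a feature transformation using training objects, then evaluate the mapping on held-out objects. All data are synthetic, with a linear relation built in so the generalization succeeds by construction:

```python
import numpy as np

rng = np.random.default_rng(7)
n_objects, n_units = 100, 50
shared = rng.standard_normal((n_objects, n_units))          # latent object identity code
true_map = rng.standard_normal((n_units, n_units)) / np.sqrt(n_units)
state_a = shared + 0.1 * rng.standard_normal((n_objects, n_units))      # e.g., small size
state_b = shared @ true_map + 0.1 * rng.standard_normal((n_objects, n_units))  # e.g., large

train, test = np.arange(70), np.arange(70, 100)
M, *_ = np.linalg.lstsq(state_a[train], state_b[train], rcond=None)

# Generalization: predict state-B responses of objects never used for fitting.
pred = state_a[test] @ M
rs = [np.corrcoef(pred[i], state_b[test][i])[0, 1] for i in range(len(test))]
print("mean held-out object prediction r:", np.round(np.mean(rs), 3))
```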
38
Xu Y, Vaziri-Pashkam M. Understanding transformation tolerant visual object representations in the human brain and convolutional neural networks. Neuroimage 2022; 263:119635. [PMID: 36116617 PMCID: PMC11283825 DOI: 10.1016/j.neuroimage.2022.119635] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Revised: 09/12/2022] [Accepted: 09/14/2022] [Indexed: 11/16/2022] Open
Abstract
Forming transformation-tolerant object representations is critical to high-level primate vision. Despite its significance, many details of tolerance in the human brain remain unknown. Likewise, despite the ability of convolutional neural networks (CNNs) to exhibit human-like object categorization performance, whether CNNs form tolerance similar to that of the human brain is unknown. Here we provide the first comprehensive documentation and comparison of three tolerance measures in the human brain and CNNs. We measured fMRI responses from human ventral visual areas to real-world objects across both Euclidean and non-Euclidean feature changes. In single fMRI voxels in higher visual areas, we observed robust object response rank-order preservation across feature changes. This is indicative of functional smoothness in tolerance at the fMRI meso-scale level that has never been reported before. At the voxel population level, we found highly consistent object representational structure across feature changes towards the end of ventral processing. Rank-order preservation, consistency, and a third tolerance measure, cross-decoding success (i.e., a linear classifier's ability to generalize performance across feature changes) showed an overall tight coupling. These tolerance measures were in general lower for Euclidean than non-Euclidean feature changes in lower visual areas, but increased over the course of ventral processing for all feature changes. These characteristics of tolerance, however, were absent in eight CNNs pretrained with ImageNet images with varying network architecture, depth, the presence/absence of recurrent processing, or whether a network was pretrained with the original or stylized ImageNet images that encouraged shape processing. CNNs do not appear to develop the same kind of tolerance as the human brain over the course of visual processing.
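Of the three tolerance measures, cross-decoding is the most self-contained to sketch: train an object classifier on responses measured in one feature state and test it in another. Everything below is synthetic and illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n_objects, n_reps, n_vox = 8, 12, 100
prototypes = rng.standard_normal((n_objects, n_vox))

def responses(state_shift):
    # Same object identities, measured under a feature change (e.g., a size change).
    X = np.repeat(prototypes, n_reps, axis=0) + state_shift
    X += 0.5 * rng.standard_normal(X.shape)
    y = np.repeat(np.arange(n_objects), n_reps)
    return X, y

X_a, y_a = responses(np.zeros(n_vox))
X_b, y_b = responses(0.5 * rng.standard_normal(n_vox))

clf = LogisticRegression(max_iter=1000).fit(X_a, y_a)
print("within-state accuracy:", round(clf.score(X_a, y_a), 2))
print("cross-state (tolerance) accuracy:", round(clf.score(X_b, y_b), 2))
```

The gap between within-state and cross-state accuracy is the cross-decoding cost; the abstract's rank-order and consistency measures apply analogous logic at the single-unit and population levels.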
Affiliation(s)
- Yaoda Xu: Psychology Department, Yale University, New Haven, CT 06520, USA

39
Utsumi A. A test of indirect grounding of abstract concepts using multimodal distributional semantics. Front Psychol 2022; 13:906181. [PMID: 36267060 PMCID: PMC9577286 DOI: 10.3389/fpsyg.2022.906181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022] Open
Abstract
How are abstract concepts grounded in perceptual experiences for shaping human conceptual knowledge? Recent studies on abstract concepts emphasizing the role of language have argued that abstract concepts are grounded indirectly in perceptual experiences and language (or words) functions as a bridge between abstract concepts and perceptual experiences. However, this “indirect grounding” view remains largely speculative and has hardly been supported directly by empirical evidence. In this paper, therefore, we test the indirect grounding view by means of multimodal distributional semantics, in which the meaning of a word (i.e., a concept) is represented as the combination of textual and visual vectors. The newly devised multimodal distributional semantic model incorporates the indirect grounding view by computing the visual vector of an abstract word through the visual vectors of concrete words semantically related to that abstract word. An evaluation experiment is conducted in which conceptual representation is predicted from multimodal vectors using a multilayer feed-forward neural network. The analysis of prediction performance demonstrates that the indirect grounding model achieves significantly better performance in predicting human conceptual representation of abstract words than other models that mimic competing views on abstract concepts, especially than the direct grounding model in which the visual vectors of abstract words are computed directly from the images of abstract concepts. This result lends some plausibility to the indirect grounding view as a cognitive mechanism of grounding abstract concepts.
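The indirect-grounding computation has a compact form: the visual vector of an abstract word is a similarity-weighted average of the visual vectors of its semantically closest concrete words. The sketch below uses random vectors and an assumed top-k weighting scheme, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(9)
n_concrete, d_text, d_vis = 50, 300, 512
text_concrete = rng.standard_normal((n_concrete, d_text))  # textual vectors, concrete words
vis_concrete = rng.standard_normal((n_concrete, d_vis))    # visual vectors, concrete words
text_abstract = rng.standard_normal(d_text)                # textual vector, an abstract word

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Indirect grounding: weight the k most textually similar concrete words and
# average their visual vectors (k and the weighting are assumptions).
sims = np.array([cosine(text_abstract, t) for t in text_concrete])
top = np.argsort(sims)[-10:]
weights = sims[top] / sims[top].sum()
vis_abstract = weights @ vis_concrete[top]

# The multimodal representation combines textual and inferred visual vectors.
multimodal = np.concatenate([text_abstract, vis_abstract])
print(multimodal.shape)  # (812,)
```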
40
Janini D, Hamblin C, Deza A, Konkle T. General object-based features account for letter perception. PLoS Comput Biol 2022; 18:e1010522. [PMID: 36155642 PMCID: PMC9536565 DOI: 10.1371/journal.pcbi.1010522] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Revised: 10/06/2022] [Accepted: 08/29/2022] [Indexed: 11/30/2022] Open
Abstract
After years of experience, humans become experts at perceiving letters. Is this visual capacity attained by learning specialized letter features, or by reusing general visual features previously learned in service of object categorization? To explore this question, we first measured the perceptual similarity of letters in two behavioral tasks, visual search and letter categorization. Then, we trained deep convolutional neural networks on either 26-way letter categorization or 1000-way object categorization, as a way to operationalize possible specialized letter features and general object-based features, respectively. We found that the general object-based features more robustly correlated with the perceptual similarity of letters. We then operationalized additional forms of experience-dependent letter specialization by altering object-trained networks with varied forms of letter training; however, none of these forms of letter specialization improved the match to human behavior. Thus, our findings reveal that it is not necessary to appeal to specialized letter representations to account for perceptual similarity of letters. Instead, we argue that it is more likely that the perception of letters depends on domain-general visual features. For over a century, scientists have conducted behavioral experiments to investigate how the visual system recognizes letters, but it has proven difficult to propose a model of the feature space underlying this capacity. Here we leveraged recent advances in machine learning to model a wide variety of features ranging from specialized letter features to general object-based features. Across two large-scale behavioral experiments we find that general object-based features account well for letter perception, and that adding letter specialization did not improve the correspondence to human behavior. It is plausible that the ability to recognize letters largely relies on general visual features unaltered by letter learning.
Affiliation(s)
- Daniel Janini: Department of Psychology, Harvard University, Cambridge, Massachusetts, USA
- Chris Hamblin: Department of Psychology, Harvard University, Cambridge, Massachusetts, USA
- Arturo Deza: Department of Psychology, Harvard University, Cambridge, Massachusetts, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Talia Konkle: Department of Psychology, Harvard University, Cambridge, Massachusetts, USA

41
Tang K, Chin M, Chun M, Xu Y. The contribution of object identity and configuration to scene representation in convolutional neural networks. PLoS One 2022; 17:e0270667. [PMID: 35763531 PMCID: PMC9239439 DOI: 10.1371/journal.pone.0270667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2022] [Accepted: 06/14/2022] [Indexed: 11/23/2022] Open
Abstract
Scene perception involves extracting the identities of the objects comprising a scene in conjunction with their configuration (the spatial layout of the objects in the scene). How object identity and configuration information is weighted during scene processing, and how this weighting evolves over the course of scene processing, however, is not fully understood. Recent developments in convolutional neural networks (CNNs) have demonstrated their aptitude at scene processing tasks and identified correlations between processing in CNNs and in the human brain. Here we examined four CNN architectures (Alexnet, Resnet18, Resnet50, Densenet161) and their sensitivity to changes in object and configuration information over the course of scene processing. Despite differences among the four CNN architectures, across all CNNs, we observed a common pattern in the CNN's response to object identity and configuration changes. Each CNN demonstrated greater sensitivity to configuration changes in early stages of processing and stronger sensitivity to object identity changes in later stages. This pattern persists regardless of the spatial structure present in the image background, the accuracy of the CNN in classifying the scene, and even the task used to train the CNN. Importantly, CNNs' sensitivity to a configuration change is not the same as their sensitivity to any type of position change, such as that induced by a uniform translation of the objects without a configuration change. These results provide one of the first documentations of how object identity and configuration information are weighted in CNNs during scene processing.
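One simple way to operationalize layer-wise sensitivity to identity versus configuration changes is one minus the mean correlation between a layer's responses to original and altered scenes. The sketch below fakes the activations and builds in the reported early/late pattern purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(10)
n_scenes, n_units = 40, 300

def sensitivity(act_orig, act_changed):
    # Higher values mean the change perturbs this layer's responses more.
    rs = [np.corrcoef(act_orig[i], act_changed[i])[0, 1] for i in range(len(act_orig))]
    return 1 - np.mean(rs)

for layer, (cfg_noise, id_noise) in {"early": (0.8, 0.3), "late": (0.3, 0.8)}.items():
    act = rng.standard_normal((n_scenes, n_units))
    act_cfg = act + cfg_noise * rng.standard_normal(act.shape)  # configuration change
    act_id = act + id_noise * rng.standard_normal(act.shape)    # object identity change
    print(layer, "config:", round(sensitivity(act, act_cfg), 2),
          "identity:", round(sensitivity(act, act_id), 2))
```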
Affiliation(s)
- Kevin Tang, Department of Psychology, Yale University, New Haven, CT, United States of America
- Matthew Chin, Department of Psychology, Yale University, New Haven, CT, United States of America
- Marvin Chun, Department of Psychology, Yale University, New Haven, CT, United States of America
- Yaoda Xu, Department of Psychology, Yale University, New Haven, CT, United States of America
|
42
|
Prediction of H-type hypertension based on pulse-taking and inquiry diagnosis. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
43
|
On the Reliability of CNNs in Clinical Practice: A Computer-Aided Diagnosis System Case Study. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12073269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Leukocyte classification is essential to assess leukocyte number and status, since these cells are the body's first defence against infection and disease. Automating the process can reduce the laborious manual review and diagnosis performed by operators, and has been the subject of study for at least two decades. Most computer-aided systems exploit convolutional neural networks for classification without any intermediate step before producing a prediction. This work explores the current limitations of deep learning-based methods applied to medical blood smear data, taking leukocyte analysis oriented towards leukaemia prediction as a case study. In particular, we aim to demonstrate that a single classification step can lead to incorrect predictions or, worse, to correct predictions obtained from the wrong image cues. By generating new synthetic leukocyte data, we show that including a fine-grained method, such as detection or segmentation, before classification is essential for the network to base its decision on the appropriate information from individual white blood cells. The effectiveness of this approach is thoroughly analysed and quantified through a series of experiments on a public data set of blood smears taken under a microscope. Experimental results show that residual networks perform statistically better in this scenario, even though they can make correct predictions from incorrect information.
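A minimal sketch of the two-stage idea argued for above, assuming a crude threshold-based detector in place of a real detection or segmentation network: each white blood cell is isolated first, so the classifier only ever sees information from one cell at a time. The detector and classifier interfaces are illustrative assumptions, not the paper's actual models.

```python
import numpy as np
from scipy import ndimage

def detect_leukocytes(smear, threshold=0.5):
    """Crude stand-in detector: threshold + connected components -> bounding boxes."""
    labeled, _ = ndimage.label(smear > threshold)
    return ndimage.find_objects(labeled)  # list of (slice_y, slice_x) boxes

def classify_cells(smear, classifier):
    """Fine-grained step: classify each detected cell crop individually,
    so predictions cannot be driven by context outside the cell."""
    return [classifier(smear[box]) for box in detect_leukocytes(smear)]

# Hypothetical usage with a dummy classifier on a random stand-in image:
dummy = lambda crop: "lymphocyte" if crop.mean() > 0.6 else "other"
print(classify_cells(np.random.rand(128, 128), dummy))
```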
|
44
|
Han K, Joung JF, Han M, Sung W, Kang YN. Locoregional Recurrence Prediction Using a Deep Neural Network of Radiological and Radiotherapy Images. J Pers Med 2022; 12:jpm12020143. [PMID: 35207631 PMCID: PMC8875706 DOI: 10.3390/jpm12020143] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Revised: 01/08/2022] [Accepted: 01/10/2022] [Indexed: 02/04/2023] Open
Abstract
Radiation therapy (RT) is an important and potentially curative modality for head and neck squamous cell carcinoma (HNSCC). Locoregional recurrence (LR) of HNSCC after RT ranges from 15% to 50%, depending on the primary site and stage, and the 5-year survival rate of patients with LR is low. To identify high-risk patients who might develop LR, a deep learning model for predicting LR needs to be established. In this work, 157 patients with HNSCC who underwent RT were analyzed. Based on the National Cancer Institute's multi-institutional TCIA data set containing FDG-PET/CT/dose images, a 3D deep learning model was proposed to predict LR without time-consuming segmentation or feature extraction. Our model achieved an average area under the curve (AUC) of 0.856. Adding clinical factors to the model improved the AUC to an average of 0.892, with the highest AUC reaching 0.974. The 3D deep learning model can thus perform individualized risk quantification of LR in patients with HNSCC without time-consuming tumor segmentation.
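To illustrate the kind of architecture described above, here is a minimal PyTorch sketch of a small 3D CNN over stacked FDG-PET/CT/dose volumes, with optional clinical covariates concatenated before the output. Layer sizes and names are illustrative assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class RecurrenceNet3D(nn.Module):
    def __init__(self, n_clinical=0):
        super().__init__()
        self.conv = nn.Sequential(                 # input: (B, 3, D, H, W)
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),               # global pooling, no segmentation needed
        )
        self.head = nn.Linear(32 + n_clinical, 1)  # logit for locoregional recurrence

    def forward(self, volume, clinical=None):
        x = self.conv(volume).flatten(start_dim=1)
        if clinical is not None:
            x = torch.cat([x, clinical], dim=1)    # append clinical factors
        return torch.sigmoid(self.head(x))

# Hypothetical usage: PET/CT/dose as 3 channels of a 64^3 volume, 4 clinical factors.
model = RecurrenceNet3D(n_clinical=4)
p = model(torch.rand(2, 3, 64, 64, 64), torch.rand(2, 4))
print(p.shape)  # torch.Size([2, 1])
```

The global average pooling is one plausible way to avoid tumor segmentation, since it summarizes the whole volume rather than a delineated region.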
Affiliation(s)
- Kyumin Han, Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea; Advanced Institute for Radiation Fusion Medical Technology, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Joonyoung Francis Joung, Department of Chemistry and Research, Institute for Natural Science, Korea University, Seoul 02841, Korea
- Minhi Han, Department of Chemistry and Research, Institute for Natural Science, Korea University, Seoul 02841, Korea
- Wonmo Sung, Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea; Department of Biomedical Engineering, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea (Correspondence)
- Young-nam Kang, Advanced Institute for Radiation Fusion Medical Technology, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea; Department of Radiation Oncology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea (Correspondence)
|
45
|
Thompson JAF. Noise increases the correspondence between artificial and human vision. PLoS Biol 2021; 19:e3001477. [PMID: 34890404 PMCID: PMC8664186 DOI: 10.1371/journal.pbio.3001477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
This Primer explores the implications of a recent PLOS Biology study, arguing that noise robustness, a property of human vision that standard computer vision models fail to mimic, provides an opportunity to probe the neural mechanisms underlying visual object recognition and to refine computational models of the ventral visual stream.
Affiliation(s)
- Jessica A. F. Thompson, Human Information Processing Lab, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
|
46
|
Jang H, McCormack D, Tong F. Noise-trained deep neural networks effectively predict human vision and its neural responses to challenging images. PLoS Biol 2021; 19:e3001418. [PMID: 34882676 PMCID: PMC8659651 DOI: 10.1371/journal.pbio.3001418] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2021] [Accepted: 09/20/2021] [Indexed: 11/18/2022] Open
Abstract
Deep neural networks (DNNs) for object classification have been argued to provide the most promising model of the visual system, accompanied by claims that they have attained or even surpassed human-level performance. Here, we evaluated whether DNNs provide a viable model of human vision when tested with challenging noisy images of objects, sometimes presented at the very limits of visibility. We show that popular state-of-the-art DNNs perform in a qualitatively different manner from humans: they are unusually susceptible to spatially uncorrelated white noise and less impaired by spatially correlated noise. We implemented a noise-training procedure to determine whether noise-trained DNNs exhibit more robust responses that better match human behavioral and neural performance. We found that noise-trained DNNs provide a better qualitative match to human performance; moreover, they reliably predict human recognition thresholds on an image-by-image basis. Functional neuroimaging revealed that noise-trained DNNs provide a better correspondence to the pattern-specific neural representations found in both early visual areas and high-level object areas. A layer-specific analysis of the DNNs indicated that noise training led to broad-ranging modifications throughout the network, with greater benefits of noise robustness accruing in progressively higher layers. Our findings demonstrate that noise-trained DNNs provide a viable model to account for human behavioral and neural responses to objects in challenging noisy viewing conditions. Further, they suggest that robustness to noise may be acquired through a process of visual learning.

In short, unlike human observers, deep neural networks fail to recognize objects in severe visual noise; this study develops noise-trained networks and shows that these networks better predict human performance and neural responses in the visual cortex to challenging noisy object images.
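The following is a minimal sketch of the noise-training augmentation described above, assuming Gaussian pixel noise for the spatially uncorrelated case and low-pass-filtered Gaussian noise for the spatially correlated case. The noise levels, blur kernel, and rescaling are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torchvision.transforms.functional as TF

def add_noise(img, sigma=0.3, correlated=False):
    """img: (C, H, W) tensor in [0, 1]. Returns a noisy copy clipped to [0, 1]."""
    noise = sigma * torch.randn_like(img)  # spatially uncorrelated (white) noise
    if correlated:
        # Blurring white noise introduces spatial correlations, yielding the kind
        # of structured noise that humans tolerate relatively well. The factor of 4
        # is a rough rescaling to compensate for the variance lost to blurring.
        noise = TF.gaussian_blur(noise, kernel_size=9) * 4
    return (img + noise).clamp(0, 1)

# During training, one might present clean and noisy views of each image:
img = torch.rand(3, 224, 224)
batch = torch.stack([img, add_noise(img), add_noise(img, correlated=True)])
```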
Affiliation(s)
- Hojin Jang, Psychology Department and Vanderbilt Vision Research Center, Vanderbilt University, Nashville, Tennessee, United States of America (Correspondence)
- Devin McCormack, Psychology Department and Vanderbilt Vision Research Center, Vanderbilt University, Nashville, Tennessee, United States of America
- Frank Tong, Psychology Department and Vanderbilt Vision Research Center, Vanderbilt University, Nashville, Tennessee, United States of America (Correspondence)
|
47
|
Lonnqvist B, Bornet A, Doerig A, Herzog MH. A comparative biology approach to DNN modeling of vision: A focus on differences, not similarities. J Vis 2021; 21:17. [PMID: 34551062 PMCID: PMC8475290 DOI: 10.1167/jov.21.10.17] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Accepted: 08/26/2021] [Indexed: 11/24/2022] Open
Abstract
Deep neural networks (DNNs) have revolutionized computer science and are now widely used for neuroscientific research. A heated debate has ensued about the usefulness of DNNs as neuroscientific models of the human visual system; the debate centers on the extent to which certain shortcomings of DNNs are real failures and the extent to which they are redeemable. Here, we argue that the main problem is that we often do not understand which human functions need to be modeled and, thus, what counts as a falsification. Hence, there is not only a problem on the DNN side but also one on the brain side (i.e., with the explanandum, the thing to be explained). For example, should DNNs reproduce illusions? We posit that we can make better use of DNNs by adopting a comparative biology approach, focusing on the differences, rather than the similarities, between DNNs and humans, to improve our understanding of visual information processing in general.
Affiliation(s)
- Ben Lonnqvist, Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Alban Bornet, Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Adrien Doerig, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- Michael H Herzog, Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
|
48
|
Goetschalckx L, Andonian A, Wagemans J. Generative adversarial networks unlock new methods for cognitive science. Trends Cogn Sci 2021; 25:788-801. [PMID: 34364792 DOI: 10.1016/j.tics.2021.06.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Revised: 06/22/2021] [Accepted: 06/22/2021] [Indexed: 11/16/2022]
Abstract
Generative adversarial networks (GANs) enable computers to learn complex data distributions and to sample from them. When applied to the visual domain, this allows artificial yet photorealistic images to be synthesized. Their success at this very challenging task has triggered an explosion of research within the field of artificial intelligence (AI), yielding various new GAN findings and applications. After explaining the core principles behind GANs and reviewing recent GAN innovations, we illustrate how they can be applied to tackle thorny theoretical and methodological problems in cognitive science. We focus on how GANs can reveal hidden structure in internal representations and how they offer a valuable new compromise in the trade-off between experimental control and ecological validity.
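As a sketch of the experimental-control idea above: interpolating through a GAN's latent space yields a smooth, parametrically controlled continuum of photorealistic stimuli. The `generator` here is an assumed pretrained latents-to-images model (e.g., any BigGAN or StyleGAN wrapper); the interface is an illustrative assumption, not specific to the reviewed methods.

```python
import torch

def latent_continuum(generator, z_start, z_end, steps=8):
    """Linearly interpolate between two latent vectors and decode each point,
    giving photorealistic stimuli that vary along one controlled axis."""
    alphas = torch.linspace(0, 1, steps).view(-1, 1)
    zs = (1 - alphas) * z_start + alphas * z_end  # (steps, latent_dim)
    with torch.no_grad():
        return generator(zs)                      # (steps, C, H, W) images
```

Such continua are one way to get graded, controllable manipulations while keeping the stimuli far more naturalistic than hand-crafted parametric shapes.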
Affiliation(s)
- Lore Goetschalckx, Department of Brain and Cognition, KU Leuven, 3000 Leuven, Belgium; Carney Institute for Brain Science, Department of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, RI 02912, USA
- Alex Andonian, Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT, Cambridge, MA 02139, USA
- Johan Wagemans, Department of Brain and Cognition, KU Leuven, 3000 Leuven, Belgium
|
49
|
Examining the Coding Strength of Object Identity and Nonidentity Features in Human Occipito-Temporal Cortex and Convolutional Neural Networks. J Neurosci 2021; 41:4234-4252. [PMID: 33789916 DOI: 10.1523/jneurosci.1993-20.2021] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2020] [Revised: 03/12/2021] [Accepted: 03/15/2021] [Indexed: 12/17/2022] Open
Abstract
A visual object is characterized by multiple visual features, including its identity, position, and size. Despite the usefulness of identity and nonidentity features in vision and their joint coding throughout the primate ventral visual processing pathway, they have so far been studied relatively independently. Here, in both female and male human participants, the coding of identity and nonidentity features was examined together across the human ventral visual pathway. The nonidentity features tested included two Euclidean features (position and size) and two non-Euclidean features (the image statistics and the spatial frequency (SF) content of an image). Overall, identity representation increased and nonidentity feature representation decreased along the ventral visual pathway, with identity outweighing the non-Euclidean but not the Euclidean features at higher levels of visual processing. In 14 convolutional neural networks (CNNs) pretrained for object categorization, varying in architecture, depth, and the presence or absence of recurrent processing, nonidentity feature representation showed an initial large increase from early to mid-stages of processing, followed by a decrease at later stages, unlike the brain responses. Additionally, from lower to higher levels of visual processing, position became more underrepresented, and image statistics and SF became more overrepresented, relative to identity in CNNs compared with the human brain. Similar results were obtained in a CNN trained with stylized images that emphasized shape representations. Overall, by measuring the coding strength of object identity and nonidentity features together, our approach provides a new tool for characterizing feature coding in the human brain and the correspondence between the brain and CNNs.

SIGNIFICANCE STATEMENT: This study examined the coding strength of object identity and four types of nonidentity features along the human ventral visual processing pathway and compared brain responses with those of 14 convolutional neural networks (CNNs) pretrained to perform object categorization. Overall, identity representation increased and nonidentity feature representation decreased along the ventral visual pathway, with some notable differences among the different nonidentity features. CNNs differed from the brain in a number of aspects in their representations of identity and nonidentity features over the course of visual processing. Our approach provides a new tool for characterizing feature coding in the human brain and the correspondence between the brain and CNNs.
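To make the notion of coding strength concrete, here is a minimal sketch in which a feature's coding strength in a layer (or brain region) is operationalized as cross-validated decoding accuracy from activation patterns. The data are random stand-ins and the decoder choice is an illustrative assumption, not the authors' exact analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def coding_strength(activations, labels):
    """Mean 5-fold cross-validated decoding accuracy of `labels` from
    `activations` (n_samples x n_units); higher = stronger coding."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, activations, labels, cv=5).mean()

# Hypothetical comparison at one layer: identity coding vs. position coding.
rng = np.random.default_rng(1)
acts = rng.normal(size=(200, 64))       # stand-in layer activation patterns
identity = rng.integers(0, 5, 200)      # stand-in object identity labels
position = rng.integers(0, 4, 200)      # stand-in position labels
print(coding_strength(acts, identity), coding_strength(acts, position))
```

Tracking these accuracies across layers (or cortical regions) is one way to reproduce the rise-and-fall profiles the abstract describes.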
|