1
Mukherjee K, Rogers TT. Using drawings and deep neural networks to characterize the building blocks of human visual similarity. Mem Cognit 2024. PMID: 38814385. DOI: 10.3758/s13421-024-01580-1.
Abstract
Early in life and without special training, human beings discern resemblance between abstract visual stimuli, such as drawings, and the real-world objects they represent. We used this capacity for visual abstraction as a tool for evaluating deep neural networks (DNNs) as models of human visual perception. Contrasting five contemporary DNNs, we evaluated how well each explains human similarity judgments among line drawings of recognizable and novel objects. For object sketches, human judgments were dominated by semantic category information; DNN representations contributed little additional information. In contrast, DNN-derived features explained significant unique variance in the perceived similarity of abstract drawings. In both cases, a vision transformer trained to blend representations of images and their natural language descriptions showed the greatest ability to explain human perceptual similarity, an observation consistent with contemporary views of semantic representation and processing in the human mind and brain. Together, the results suggest that the building blocks of visual similarity may arise within systems that learn to use visual information, not for specific classification, but in service of generating semantic representations of objects.
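The analysis logic here, relating a model's pairwise similarities to human similarity judgments, can be sketched RSA-style. This is a toy illustration, not the paper's code: the embeddings, "human" ratings, and the cosine/Spearman choices are all assumptions for the sketch.

```python
# Toy RSA-style comparison: which of two hypothetical feature spaces better
# explains human pairwise similarity judgments over four drawings?
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / ((sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))

def rank(xs):
    # simple ranking; this toy example is constructed to have no ties
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = float(pos)
    return r

def spearman(xs, ys):
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical embeddings for 4 drawings under two candidate models,
# and one "human" similarity rating per unordered pair of drawings.
model_a = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.25, 0.9]]
model_b = [[0.5, 0.5], [0.1, 0.9], [0.9, 0.05], [0.4, 0.6]]
human = [0.9, 0.2, 0.1, 0.3, 0.25, 0.8]

pairs = list(combinations(range(4), 2))
sim_a = [cosine(model_a[i], model_a[j]) for i, j in pairs]
sim_b = [cosine(model_b[i], model_b[j]) for i, j in pairs]

rho_a = spearman(sim_a, human)  # how well model A's similarities track humans'
rho_b = spearman(sim_b, human)
print(f"model A rho={rho_a:.2f}  model B rho={rho_b:.2f}")
```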
Affiliation(s)
- Kushin Mukherjee
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA.
- Timothy T Rogers
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA.
2
George A, Yohannan DG. Windows into spatial cognition: Mechanisms by which gesture-based instruction improve anatomy learning. Anat Sci Educ 2024;17:462-467. PMID: 38351605. DOI: 10.1002/ase.2399.
Abstract
The ability to create efficient "mental models" or representations of anatomical structures is crucial for achieving competence in most areas of anatomy. Gesture-based teaching has been recognized to lighten cognitive load and to allow superior mental-model creation compared with non-gestural teaching practices. This commentary explores the cognitive basis and possible mechanisms behind this advantage, such as (1) reducing visual working memory load, (2) allowing parallel and sequential development of internal representations, and (3) facilitating preferential feature extraction and improved organization of spatial information. We also highlight how the information-transfer limitations of the gestural medium, interestingly, unveil features and organizational motifs preserved in the expert's mental schemas for particular anatomical structures. The universal and innate use of gestures in communication, their visual nature, and their ability to break down complex spatial information into sequential steps all add to the immense potential of hand gestures as a subtle yet powerful teaching tool. As pedagogical practices in the anatomical sciences continue to evolve, largely towards technology-enhanced teaching using perceptually richer media, the unique advantages of gesture-based teaching need to be reemphasized.
Affiliation(s)
- Asish George
- Royal Liverpool and Broadgreen University Hospitals NHS Trust, Liverpool, UK
3
Karimi-Rouzbahani H, Woolgar A, Henson R, Nili H. Caveats and Nuances of Model-Based and Model-Free Representational Connectivity Analysis. Front Neurosci 2022;16:755988. PMID: 35360178. PMCID: PMC8960982. DOI: 10.3389/fnins.2022.755988.
Abstract
Brain connectivity analyses have conventionally relied on statistical relationships between one-dimensional summaries of activation in different brain areas. However, summarizing the activation pattern within each area as a single dimension ignores potential statistical dependencies between the areas' multi-dimensional activity patterns. Representational Connectivity Analysis (RCA) quantifies the relationship between multi-dimensional patterns of activity without reducing the dimensionality of the data. We consider two variants of RCA. In model-free RCA, the goal is to quantify the shared information between two brain regions. In model-based RCA, one tests whether two regions share information about a specific aspect of the stimuli/task, as defined by a model. Because the approach is new, the potential caveats of model-free and model-based RCA are still understudied. We first explain how model-based RCA detects connectivity through the lens of models, and then present three scenarios in which model-based and model-free RCA give discrepant results, complicating the interpretation of functional connectivity: complex intermediate models, common patterns across regions, and transformation of representational structure across brain regions. In each case, we suggest potential ways to mitigate the difficulties caused by inconsistent results. The article is accompanied by scripts (https://osf.io/3nxfa/) that reproduce the results. By shedding light on these understudied aspects of RCA, this work should allow researchers to use the method more effectively.
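The model-free vs. model-based distinction can be sketched minimally (the authors' actual scripts are at the OSF link above): model-free RCA correlates two regions' representational dissimilarity matrices (RDMs) directly, while model-based RCA asks whether both regions' RDMs track a model-defined RDM. The patterns, dissimilarity measure, and min-based summary below are toy assumptions.

```python
# Toy sketch of model-free vs. model-based representational connectivity.
from itertools import combinations

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def rdm(patterns):
    # upper-triangle pairwise dissimilarities across conditions
    return [euclidean(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

# Toy multi-voxel patterns for 4 conditions in two regions, plus a model RDM
# encoding a category split between conditions {0,1} and {2,3}.
region1 = [[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [1.1, 1.0]]
region2 = [[0.2, 0.0], [0.0, 0.2], [0.9, 1.0], [1.0, 0.9]]
model_rdm = [0.0, 1.0, 1.0, 1.0, 1.0, 0.0]

r1, r2 = rdm(region1), rdm(region2)
model_free = pearson(r1, r2)              # shared structure, model-agnostic
model_based = min(pearson(r1, model_rdm),  # both regions must carry the
                  pearson(r2, model_rdm))  # model-defined information
print(f"model-free={model_free:.2f}  model-based={model_based:.2f}")
```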
Affiliation(s)
- Hamid Karimi-Rouzbahani
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Department of Computing, Macquarie University, Sydney, NSW, Australia
- Alexandra Woolgar
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Richard Henson
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
- Hamed Nili
- Department of Excellence for Neural Information Processing, Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
4
A survey of brain network analysis by electroencephalographic signals. Cogn Neurodyn 2022;16:17-41. PMID: 35126769. PMCID: PMC8807775. DOI: 10.1007/s11571-021-09689-8.
Abstract
Brain network analysis is an efficient tool for exploring human brain diseases, as it can differentiate alterations between comparative networks. These alterations may reflect time, mental state, task, individual differences, and so forth; they determine the segregation and integration of functional networks and lead to network reorganization (or reconfiguration) that extends the neuroplasticity of the brain. Exploring the related brain networks is therefore of interest and may provide roadmaps for brain research and clinical diagnosis. Recent electroencephalogram (EEG) studies have revealed properties of brain networks and diseases (or disorders) within and between subjects, and have provided instructive and promising methods. This review summarizes the algorithms that have been used to construct functional or effective networks on the scalp and cerebral cortex, surveys EEG network analyses that unveil cognitive functions and neural disorders in humans, explores the relationship between brain science and artificial intelligence, in which each field may fuel the other and accelerate their advances, and closes by discussing some innovations and future challenges.
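One common construction this review covers, a correlation-based functional network, can be sketched as follows: correlate channel time series pairwise, threshold into an adjacency matrix, and read off a simple graph measure. The signals, threshold, and channel count here are illustrative assumptions, not any particular study's pipeline.

```python
# Toy functional-network construction from 4 simulated "EEG" channels.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

# Channels 0-1 share one rhythm; channels 2-3 share another.
t = range(64)
channels = [
    [math.sin(2 * math.pi * k / 10) for k in t],
    [math.sin(2 * math.pi * k / 10 + 0.3) for k in t],
    [math.sin(2 * math.pi * k / 25) for k in t],
    [math.sin(2 * math.pi * k / 25 + 0.2) for k in t],
]

n = len(channels)
threshold = 0.5  # arbitrary edge threshold for this sketch
adjacency = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        if abs(pearson(channels[i], channels[j])) > threshold:
            adjacency[i][j] = adjacency[j][i] = 1

degree = [sum(row) for row in adjacency]  # a basic graph-theoretic measure
print("degree per channel:", degree)
```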
6
Examining the Coding Strength of Object Identity and Nonidentity Features in Human Occipito-Temporal Cortex and Convolutional Neural Networks. J Neurosci 2021;41:4234-4252. PMID: 33789916. DOI: 10.1523/jneurosci.1993-20.2021.
Abstract
A visual object is characterized by multiple visual features, including its identity, position and size. Despite the usefulness of identity and nonidentity features in vision and their joint coding throughout the primate ventral visual processing pathway, they have so far been studied relatively independently. Here in both female and male human participants, the coding of identity and nonidentity features was examined together across the human ventral visual pathway. The nonidentity features tested included two Euclidean features (position and size) and two non-Euclidean features (image statistics and spatial frequency (SF) content of an image). Overall, identity representation increased and nonidentity feature representation decreased along the ventral visual pathway, with identity outweighing the non-Euclidean but not the Euclidean features at higher levels of visual processing. In 14 convolutional neural networks (CNNs) pretrained for object categorization with varying architecture, depth, and with/without recurrent processing, nonidentity feature representation showed an initial large increase from early to mid-stage of processing, followed by a decrease at later stages of processing, different from brain responses. Additionally, from lower to higher levels of visual processing, position became more underrepresented and image statistics and SF became more overrepresented compared with identity in CNNs than in the human brain. Similar results were obtained in a CNN trained with stylized images that emphasized shape representations. 
Overall, by measuring the coding strength of object identity and nonidentity features together, our approach provides a new tool for characterizing feature coding in the human brain and the correspondence between the brain and CNNs.

SIGNIFICANCE STATEMENT This study examined the coding strength of object identity and four types of nonidentity features along the human ventral visual processing pathway and compared brain responses with those of 14 convolutional neural networks (CNNs) pretrained to perform object categorization. Overall, identity representation increased and nonidentity feature representation decreased along the ventral visual pathway, with some notable differences among the different nonidentity features. CNNs differed from the brain in a number of aspects in their representations of identity and nonidentity features over the course of visual processing. Our approach provides a new tool for characterizing feature coding in the human brain and the correspondence between the brain and CNNs.
7
Karimi-Rouzbahani H, Ramezani F, Woolgar A, Rich A, Ghodrati M. Perceptual difficulty modulates the direction of information flow in familiar face recognition. Neuroimage 2021;233:117896. PMID: 33667671. PMCID: PMC7614447. DOI: 10.1016/j.neuroimage.2021.117896.
Abstract
Humans are fast and accurate when they recognize familiar faces. Previous neurophysiological studies have shown enhanced representations for the dichotomy of familiar vs. unfamiliar faces. As familiarity is a spectrum, however, any neural correlate should reflect graded representations for more vs. less familiar faces along that spectrum. By systematically varying familiarity across stimuli, we demonstrate such a neural familiarity spectrum using electroencephalography. We then evaluated the spatiotemporal dynamics of familiar face recognition across the brain. Specifically, we developed a novel informational connectivity method to test whether peri-frontal brain areas contribute to familiar face recognition. Results showed that feed-forward flow dominated for the most familiar faces, whereas top-down flow dominated only when sensory evidence was insufficient to support face recognition. These results demonstrate that perceptual difficulty and the level of familiarity influence the neural representation of familiar faces and the degree to which peri-frontal neural networks contribute to familiar face recognition.
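The direction-of-flow logic can be illustrated, in highly simplified form, with lagged correlations between two regions' "information time courses": if region A predicts region B at a positive lag better than the reverse, flow is read as A-to-B. The paper's actual method is a Granger-style informational connectivity analysis; the time courses and lag below are toy assumptions.

```python
# Toy direction-of-information-flow test via lagged correlation.
def cross_corr(a, b, lag):
    # correlation of a[t] with b[t + lag]
    xs, ys = a[:len(a) - lag], b[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sx = sum((u - mx) ** 2 for u in xs) ** 0.5
    sy = sum((v - my) ** 2 for v in ys) ** 0.5
    return cov / (sx * sy)

# Toy data: peri-occipital information rises first; the peri-frontal
# time course copies it 3 time steps later (pure feed-forward case).
occipital = [0, 0, 1, 3, 6, 8, 9, 8, 6, 3, 1, 0, 0, 0, 0, 0, 0, 0]
frontal   = [0, 0, 0, 0, 0, 1, 3, 6, 8, 9, 8, 6, 3, 1, 0, 0, 0, 0]

lag = 3
feedforward = cross_corr(occipital, frontal, lag)  # occipital leads frontal
feedback = cross_corr(frontal, occipital, lag)     # frontal leads occipital
direction = "feed-forward" if feedforward > feedback else "top-down"
print(direction, round(feedforward, 2), round(feedback, 2))
```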
Affiliation(s)
- Hamid Karimi-Rouzbahani
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, United Kingdom; Perception in Action Research Centre and Department of Cognitive Science, Macquarie University, Australia
- Farzad Ramezani
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Iran
- Alexandra Woolgar
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, United Kingdom; Perception in Action Research Centre and Department of Cognitive Science, Macquarie University, Australia
- Anina Rich
- Perception in Action Research Centre and Department of Cognitive Science, Macquarie University, Australia
- Masoud Ghodrati
- Neuroscience Program, Biomedicine Discovery Institute, Monash University, Australia
8
Kiyokawa H, Tashiro T, Yamauchi Y, Nagai T. Spatial Frequency Effective for Increasing Perceived Glossiness by Contrast Enhancement. Front Psychol 2021;12:625135. PMID: 33613400. PMCID: PMC7892470. DOI: 10.3389/fpsyg.2021.625135.
Abstract
It has been suggested that luminance edges in retinal images are potential cues for glossiness perception, particularly when the perception relies on low-luminance specular regions. However, a previous study has shown only statistical correlations between luminance edges and perceived glossiness, not their causal relations. Additionally, although specular components should be embedded at various spatial frequencies depending on the micro-roughness of the object surface, it is not well understood which spatial frequencies are essential for glossiness perception on objects with different micro-roughness. To address these issues, we examined the impact of sub-band contrast enhancement on perceived glossiness under two stimulus conditions: a Full condition, in which the stimulus had natural specular components, and a Dark condition, in which it had specular components only in dark regions. Object images with various degrees of surface roughness were generated as stimuli, and their contrast was increased in various spatial-frequency sub-bands. The results indicate that enhancing sub-band contrast can significantly increase perceived glossiness, as expected. Furthermore, the effectiveness of each spatial frequency band depends on the surface roughness in the Full condition, whereas the effective spatial frequencies are constant at a middle spatial frequency regardless of surface roughness in the Dark condition. These results suggest that, for glossiness perception, our visual system depends on specular-related information embedded in high spatial frequency components but may change its dependency on spatial frequency based on the surface luminance to be judged.
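The core manipulation, boosting contrast within a spatial-frequency sub-band, can be sketched in one dimension: split a luminance profile into a low-frequency component and a high-frequency residual, amplify the residual, and recombine. The profile, window size, and gain below are illustrative assumptions, not the study's stimuli.

```python
# Toy one-dimensional analogue of sub-band contrast enhancement.
import math

def moving_average(xs, w):
    # centered moving average with a shrinking window at the edges
    half = w // 2
    out = []
    for i in range(len(xs)):
        lo, hi = max(0, i - half), min(len(xs), i + half + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out

def band_rms(xs):
    # RMS contrast of a signal around its mean
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# Toy luminance profile: coarse shading plus fine, specular-like ripples.
n = 128
luminance = [0.5 + 0.3 * math.sin(2 * math.pi * i / 64)
             + 0.05 * math.sin(2 * math.pi * i / 8) for i in range(n)]

low = moving_average(luminance, 15)             # low-SF component
high = [x - l for x, l in zip(luminance, low)]  # high-SF residual
gain = 2.0                                      # sub-band contrast boost
enhanced = [l + gain * h for l, h in zip(low, high)]

residual_after = [e - l for e, l in zip(enhanced, low)]
print(f"high-band RMS contrast: {band_rms(high):.4f} -> "
      f"{band_rms(residual_after):.4f}")
```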
Affiliation(s)
- Hiroaki Kiyokawa
- Department of Electrical Engineering and Informatics, Yamagata University, Yamagata, Japan; Japan Society for the Promotion of Science, Tokyo, Japan
- Tomonori Tashiro
- Department of Informatics and Electronics, Yamagata University, Yamagata, Japan
- Yasuki Yamauchi
- Department of Informatics and Electronics, Yamagata University, Yamagata, Japan
- Takehiro Nagai
- Department of Information and Communications Engineering, Tokyo Institute of Technology, Yokohama, Japan
9
Han Y, Roig G, Geiger G, Poggio T. Scale and translation-invariance for novel objects in human vision. Sci Rep 2020;10:1411. PMID: 31996698. PMCID: PMC6989457. DOI: 10.1038/s41598-019-57261-6.
Abstract
Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object. The range of translation-invariance is limited, depending on the size and position of presented objects. To understand the underlying brain computation associated with the invariance properties, we compared experimental data with computational modeling results. Our results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons' receptive field sizes and sampling density that change with eccentricity. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy that differs in some key aspects from current deep learning architectures, being more data efficient and relying more critically on eye-movements.
Affiliation(s)
- Yena Han
- Center for Brains, Minds and Machines, MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, United States of America
- Gemma Roig
- Center for Brains, Minds and Machines, MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, United States of America
- Computer Science Department, Goethe University Frankfurt, Frankfurt am Main, Germany
- Gad Geiger
- Center for Brains, Minds and Machines, MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, United States of America
- Tomaso Poggio
- Center for Brains, Minds and Machines, MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, United States of America
10
Spatiotemporal analysis of category and target-related information processing in the brain during object detection. Behav Brain Res 2019;362:224-239. DOI: 10.1016/j.bbr.2019.01.025.
11
Parisi GI, Tani J, Weber C, Wermter S. Lifelong Learning of Spatiotemporal Representations With Dual-Memory Recurrent Self-Organization. Front Neurorobot 2018;12:78. PMID: 30546302. PMCID: PMC6279894. DOI: 10.3389/fnbot.2018.00078.
Abstract
Artificial autonomous agents and robots interacting in complex environments are required to continually acquire and fine-tune knowledge over sustained periods of time. The ability to learn from continuous streams of information is referred to as lifelong learning and represents a long-standing challenge for neural network models due to catastrophic forgetting in which novel sensory experience interferes with existing representations and leads to abrupt decreases in the performance on previously acquired knowledge. Computational models of lifelong learning typically alleviate catastrophic forgetting in experimental scenarios with given datasets of static images and limited complexity, thereby differing significantly from the conditions artificial agents are exposed to. In more natural settings, sequential information may become progressively available over time and access to previous experience may be restricted. Therefore, specialized neural network mechanisms are required that adapt to novel sequential experience while preventing disruptive interference with existing representations. In this paper, we propose a dual-memory self-organizing architecture for lifelong learning scenarios. The architecture comprises two growing recurrent networks with the complementary tasks of learning object instances (episodic memory) and categories (semantic memory). Both growing networks can expand in response to novel sensory experience: the episodic memory learns fine-grained spatiotemporal representations of object instances in an unsupervised fashion while the semantic memory uses task-relevant signals to regulate structural plasticity levels and develop more compact representations from episodic experience. For the consolidation of knowledge in the absence of external sensory input, the episodic memory periodically replays trajectories of neural reactivations. 
We evaluate the proposed model on the CORe50 benchmark dataset for continuous object recognition, showing that we significantly outperform current methods of lifelong learning in three different incremental learning scenarios.
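The grow-on-novelty behavior of such self-organizing memories can be sketched minimally: when an input is far from every stored prototype, add a new node; otherwise nudge the best-matching node toward the input. This is not the paper's growing-network implementation; the threshold, learning rate, and input stream are toy assumptions.

```python
# Minimal "grow-when-required"-style episodic memory sketch.
def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def present(prototypes, x, threshold=0.5, lr=0.1):
    if not prototypes:
        prototypes.append(list(x))
        return
    best = min(prototypes, key=lambda p: dist(p, x))
    if dist(best, x) > threshold:
        prototypes.append(list(x))    # novelty: grow a new node
    else:
        for k in range(len(best)):    # familiarity: adapt the existing node
            best[k] += lr * (x[k] - best[k])

memory = []
stream = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.05, 0.05), (1.1, 0.9)]
for x in stream:
    present(memory, x)

# Two clusters in the stream should yield two prototypes.
print(f"{len(memory)} prototypes:",
      [[round(v, 2) for v in p] for p in memory])
```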
Affiliation(s)
- German I. Parisi
- Knowledge Technology, Department of Informatics, Universität Hamburg, Hamburg, Germany
- Jun Tani
- Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
- Cornelius Weber
- Knowledge Technology, Department of Informatics, Universität Hamburg, Hamburg, Germany
- Stefan Wermter
- Knowledge Technology, Department of Informatics, Universität Hamburg, Hamburg, Germany
12
Karimi-Rouzbahani H. Three-stage processing of category and variation information by entangled interactive mechanisms of peri-occipital and peri-frontal cortices. Sci Rep 2018;8:12213. PMID: 30111859. PMCID: PMC6093927. DOI: 10.1038/s41598-018-30601-8.
Abstract
Object recognition has been a central question in human vision research. The general consensus is that the ventral and dorsal visual streams are the major processing pathways for objects' category and variation processing. This view overlooks mounting evidence supporting a role for peri-frontal areas in category processing. Many aspects of visual processing in peri-frontal areas remain unexplored, including whether these areas play a role only during active recognition and whether they interact with lower visual areas or process information independently. To address these questions, subjects were presented with a set of variation-controlled object images while their EEG was recorded. Considerable amounts of category and variation information were decodable from occipital, parietal, temporal, and prefrontal electrodes. Using information-selectivity indices, phase, and Granger causality analyses, three processing stages were identified showing distinct directions of information transaction between peri-frontal and peri-occipital areas, suggesting their parallel yet interactive roles in visual processing. A brain-plausible model supported the possibility of interactive mechanisms in peri-occipital and peri-frontal areas. These findings, while promoting the role of prefrontal areas in object recognition, extend their contribution beyond active recognition, in which peri-frontal to peri-occipital pathways are activated by higher cognitive processes, to general sensory-driven object and variation processing.
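The decoding step behind statements like "category information was decodable from electrodes" can be sketched with a nearest-centroid classifier over multi-electrode patterns. This is a generic illustration of the logic, not the study's classifier; the electrode patterns and categories are toy assumptions.

```python
# Toy category decoding from multi-"electrode" patterns.
def centroid(patterns):
    n = len(patterns)
    return [sum(p[k] for p in patterns) / n for k in range(len(patterns[0]))]

def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Training trials: 2 categories x 3 trials of 3-electrode patterns.
train = {
    "animal": [[1.0, 0.2, 0.1], [0.9, 0.3, 0.0], [1.1, 0.1, 0.2]],
    "car":    [[0.1, 0.9, 1.0], [0.2, 1.0, 0.9], [0.0, 1.1, 1.1]],
}
centroids = {label: centroid(trials) for label, trials in train.items()}

def decode(pattern):
    # assign the label of the nearest class centroid
    return min(centroids, key=lambda label: dist(centroids[label], pattern))

test_trials = [([1.0, 0.25, 0.1], "animal"), ([0.1, 1.0, 1.0], "car")]
accuracy = sum(decode(p) == y for p, y in test_trials) / len(test_trials)
print("decoding accuracy:", accuracy)
```

Above-chance accuracy on held-out trials is what licenses the claim that the patterns carry category information.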
Affiliation(s)
- Hamid Karimi-Rouzbahani
- Department of Electrical Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran.
- Perception in Action Research Centre & Department of Cognitive Science, Faculty of Human Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- ARC Centre of Excellence in Cognition and Its Disorders, Macquarie University, Sydney, NSW 2109, Australia.