1
Petilli MA, Rodio FM, Günther F, Marelli M. Visual search and real-image similarity: An empirical assessment through the lens of deep learning. Psychon Bull Rev 2024. PMID: 39327401; DOI: 10.3758/s13423-024-02583-4.
Abstract
The ability to predict how efficiently a person finds an object in the environment is a crucial goal of attention research. Central to this issue are the similarity principles initially proposed by Duncan and Humphreys, which describe how the similarity between target and distractor objects (TD) and between the distractor objects themselves (DD) affects search efficiency. However, these principles lack direct quantitative support from an ecological perspective: they are a summary approximation of a wide range of lab-based results that generalise poorly to real-world scenarios. This study exploits deep convolutional neural networks to predict human search efficiency from computational estimates of similarity between the objects populating, potentially, any visual scene. Our results provide ecological evidence supporting the similarity principles: search performance varies continuously across tasks and conditions and improves with decreasing TD similarity and increasing DD similarity. Furthermore, our results reveal a crucial dissociation: TD and DD similarities operate mainly at two distinct layers of the network, DD similarity at the intermediate layers of coarse object features and TD similarity at the final layers of complex features used for classification. This suggests that the two similarities exert their major effects at two distinct perceptual levels and demonstrates our methodology's potential to offer insights into the depth of visual processing on which search relies. By combining computational techniques with visual search principles, this approach aligns with modern trends in other research areas and fulfils longstanding demands for more ecologically valid research in the field of visual search.
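The similarity estimates at the heart of this approach are easy to prototype: take layer activations for the target and each distractor, then summarize TD similarity (target vs. distractors) and DD similarity (among distractors) with pairwise cosine similarities. A minimal sketch, with random vectors standing in for the paper's actual CNN activations and stimuli:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def search_similarities(target_feat, distractor_feats):
    """TD: mean target-distractor similarity.
    DD: mean pairwise similarity among distractors."""
    td = np.mean([cosine_sim(target_feat, d) for d in distractor_feats])
    n = len(distractor_feats)
    dd = np.mean([cosine_sim(distractor_feats[i], distractor_feats[j])
                  for i in range(n) for j in range(i + 1, n)])
    return td, dd

# Placeholder features; in practice these would be flattened CNN-layer
# activations for one target image and several distractor images.
rng = np.random.default_rng(0)
target = rng.standard_normal(4096)
distractors = rng.standard_normal((7, 4096))
td, dd = search_similarities(target, distractors)
print(f"TD similarity: {td:.3f}, DD similarity: {dd:.3f}")
# The principles predict that search efficiency improves as TD decreases
# and DD increases, e.g., efficiency ~ b0 - b1*td + b2*dd.
```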
Affiliation(s)
- Marco A Petilli: Department of Psychology, University of Milano-Bicocca, Milano, Italy
- Francesca M Rodio: Institute for Advanced Studies, IUSS, Pavia, Italy; Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Fritz Günther: Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
- Marco Marelli: Department of Psychology, University of Milano-Bicocca, Milano, Italy; NeuroMI, Milan Center for Neuroscience, Milan, Italy
2
Raman R, Bognár A, Nejad GG, Taubert N, Giese M, Vogels R. Bodies in motion: Unraveling the distinct roles of motion and shape in dynamic body responses in the temporal cortex. Cell Rep 2023; 42:113438. PMID: 37995183; PMCID: PMC10783614; DOI: 10.1016/j.celrep.2023.113438.
Abstract
The temporal cortex represents social stimuli, including bodies. We examine and compare the contributions of dynamic and static features to the single-unit responses to moving monkey bodies in and between a patch in the anterior dorsal bank of the superior temporal sulcus (dorsal patch [DP]) and patches in the anterior inferotemporal cortex (ventral patch [VP]), using fMRI guidance in macaques. The response to dynamics varies within both regions, being higher in DP. The dynamic body selectivity of VP neurons correlates with static features derived from convolutional neural networks and motion. DP neurons' dynamic body selectivity is not predicted by static features but is dominated by motion. Whereas these data support the dominance of motion in the newly proposed "dynamic social perception" stream, they challenge the traditional view that distinguishes DP and VP processing in terms of motion versus static features, underscoring the role of inferotemporal neurons in representing body dynamics.
Affiliation(s)
- Rajani Raman: Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium
- Anna Bognár: Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium
- Ghazaleh Ghamkhari Nejad: Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium
- Nick Taubert: Hertie Institute for Clinical Brain Research and Center for Integrative Neuroscience, University Clinic Tuebingen, 72074 Tuebingen, Germany
- Martin Giese: Hertie Institute for Clinical Brain Research and Center for Integrative Neuroscience, University Clinic Tuebingen, 72074 Tuebingen, Germany
- Rufin Vogels: Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium
3
Schnell AE, Leemans M, Vinken K, Op de Beeck H. A computationally informed comparison between the strategies of rodents and humans in visual object recognition. eLife 2023; 12:RP87719. PMID: 38079481; PMCID: PMC10712954; DOI: 10.7554/elife.87719.
Abstract
Many species are able to recognize objects, but it has proven difficult to pinpoint and compare how different species solve this task. Recent research has suggested combining computational and animal modelling to obtain a more systematic understanding of task complexity and to compare strategies between species. In this study, we created a large multidimensional stimulus set and designed a visual discrimination task partially based upon modelling with a convolutional deep neural network (CNN). Experiments included rats (N = 11; 1,115 daily sessions in total across all rats) and humans (N = 45). Each species was able to master the task and generalize to a variety of new images. Nevertheless, rats and humans showed very little convergence in terms of which object pairs were associated with high and low performance, suggesting the use of different strategies. There was an interaction between species and whether stimulus pairs favoured early or late processing in a CNN. A direct comparison with CNN representations and visual feature analyses revealed that rat performance was best captured by late convolutional layers and partially by visual features such as brightness and pixel-level similarity, whereas human performance related more to the higher fully connected layers. These findings highlight the additional value of using a computational approach for the design of object recognition tasks. Overall, this computationally informed investigation of object recognition behaviour reveals a strong discrepancy in strategies between rodent and human vision.
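The reported species-by-layer dissociation boils down to correlating per-pair CNN discriminability at different depths with per-pair behavioural accuracy in each species. A hedged sketch with placeholder scores; the study's actual stimuli, network, and pipeline are not reproduced here:

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder per-pair scores: CNN discriminability of each stimulus
# pair at two depths, and each species' accuracy on the same pairs.
rng = np.random.default_rng(1)
n_pairs = 100
disc = {"late_conv": rng.random(n_pairs),
        "fully_connected": rng.random(n_pairs)}
accuracy = {"rat": rng.random(n_pairs), "human": rng.random(n_pairs)}

# The paper's dissociation corresponds to rat accuracy tracking the
# late convolutional scores and human accuracy the fully connected ones.
for species, acc in accuracy.items():
    for layer, d in disc.items():
        rho, _ = spearmanr(acc, d)
        print(f"{species:5s} vs {layer:15s}: rho = {rho:+.2f}")
```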
Affiliation(s)
- Maarten Leemans: Department of Brain and Cognition & Leuven Brain Institute, Leuven, Belgium
- Kasper Vinken: Department of Neurobiology, Harvard Medical School, Boston, United States
- Hans Op de Beeck: Department of Brain and Cognition & Leuven Brain Institute, Leuven, Belgium
4
Vinken K, Prince JS, Konkle T, Livingstone MS. The neural code for "face cells" is not face-specific. Sci Adv 2023; 9:eadg1736. PMID: 37647400; PMCID: PMC10468123; DOI: 10.1126/sciadv.adg1736.
Abstract
Face cells are neurons that respond more to faces than to non-face objects. They are found in clusters in the inferotemporal cortex, thought to process faces specifically, and, hence, studied using faces almost exclusively. Analyzing neural responses in and around macaque face patches to hundreds of objects, we found graded response profiles for non-face objects that predicted the degree of face selectivity and provided information on face-cell tuning beyond that from actual faces. This relationship between non-face and face responses was not predicted by color and simple shape properties but by information encoded in deep neural networks trained on general objects rather than face classification. These findings contradict the long-standing assumption that face versus non-face selectivity emerges from face-specific features and challenge the practice of focusing on only the most effective stimulus. They provide evidence instead that category-selective neurons are best understood by their tuning directions in a domain-general object space.
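The central analysis can be illustrated compactly: estimate a cell's tuning direction in a deep-net feature space from its responses to non-face objects alone, then test whether that direction predicts responses to faces. A minimal sketch with synthetic features and a synthetic linear neuron; the ridge fit and feature dimensions are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic deep-net features (rows = images) and a synthetic neuron
# whose response is a noisy linear readout of that feature space.
rng = np.random.default_rng(2)
feats_nonface = rng.standard_normal((300, 512))
feats_face = rng.standard_normal((40, 512))
tuning_axis = rng.standard_normal(512)
resp_nonface = feats_nonface @ tuning_axis + rng.standard_normal(300)
resp_face = feats_face @ tuning_axis + rng.standard_normal(40)

# Fit the tuning direction from NON-FACE images only...
model = Ridge(alpha=10.0).fit(feats_nonface, resp_nonface)
# ...then predict the same cell's responses to held-out faces.
pred_face = model.predict(feats_face)
r = np.corrcoef(pred_face, resp_face)[0, 1]
print(f"face responses predicted from non-face tuning: r = {r:.2f}")
```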
Affiliation(s)
- Kasper Vinken: Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Jacob S. Prince: Department of Psychology, Harvard University, Cambridge, MA 02478, USA
- Talia Konkle: Department of Psychology, Harvard University, Cambridge, MA 02478, USA
5
Abstract
Models of object recognition have mostly focused upon the hierarchical processing of objects from local edges up to more complex shape features. An alternative strategy that might be involved in pattern recognition centres around coarse-level contrast features. In humans and monkeys, the use of such features is best documented in the domain of face perception. Given prior suggestions that rodents might generally rely upon contrast features for object recognition, we hypothesized that they would pick up the typical contrast features relevant for face detection. We trained rats in a face-nonface categorization task with stimuli previously used in computer vision and tested for generalization with new, unseen stimuli, including manipulations of the presence and strength of a range of contrast features previously identified as relevant for face detection. Although overall generalization performance was low, it was significantly modulated by contrast features. A model taking into account the summed strength of contrast features predicted the variation in accuracy across stimuli. Finally, using deep neural networks, we further investigated and quantified the performance and representations of the animals. The findings suggest that rat behaviour in visual pattern recognition tasks is partially explained by contrast feature processing.
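A contrast-feature model of this kind scores an image by the signed strength of the ordinal luminance relations that characterize faces (for example, eyes darker than forehead), in the spirit of ratio-template face detection. A sketch with hypothetical region coordinates and relations; the paper's actual feature set differs:

```python
import numpy as np

# Hypothetical face regions as (row, col) slices of a 64x64 grayscale
# image; each relation lists the region expected to be brighter first.
REGION = {
    "forehead": (slice(5, 15), slice(16, 48)),
    "left_eye": (slice(20, 28), slice(14, 28)),
    "right_eye": (slice(20, 28), slice(36, 50)),
    "nose": (slice(30, 44), slice(26, 38)),
    "mouth": (slice(48, 56), slice(20, 44)),
}
RELATIONS = [("forehead", "left_eye"), ("forehead", "right_eye"),
             ("nose", "left_eye"), ("nose", "right_eye"),
             ("nose", "mouth")]

def contrast_feature_strength(img):
    """Summed signed strength of the expected contrast relations."""
    mean = {name: img[reg].mean() for name, reg in REGION.items()}
    return sum(mean[a] - mean[b] for a, b in RELATIONS)

rng = np.random.default_rng(3)
img = rng.random((64, 64))  # placeholder for a face/nonface stimulus
print(f"summed contrast-feature strength: {contrast_feature_strength(img):.3f}")
# A model of this form predicts per-stimulus accuracy from the summed
# strength across the identified contrast features.
```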
6
Zafirova Y, Cui D, Raman R, Vogels R. Keep the head in the right place: Face-body interactions in inferior temporal cortex. Neuroimage 2022; 264:119676. PMID: 36216293; DOI: 10.1016/j.neuroimage.2022.119676.
Abstract
In primates, faces and bodies activate distinct regions in the inferior temporal (IT) cortex and are typically studied separately. Yet, primates interact with whole agents, not with random concatenations of faces and bodies. Despite its social importance, how faces and bodies interact in IT is still poorly understood. Here, we addressed this gap by measuring fMRI activations to whole agents and to unnatural face-body configurations in which the head was mislocated with respect to the body, and examined how these relate to the sum of the activations to the corresponding faces and bodies. First, we mapped patches in the IT of awake macaques that were activated more by images of whole monkeys than by objects and found that these mostly overlapped with body and face patches. In a second fMRI experiment, we obtained no evidence for superadditive responses in these "monkey patches": the activation to the monkeys was less than or equal to the summed face-body activations. However, monkey patches in the anterior IT were activated more by natural than by unnatural configurations, and the stronger activations to natural configurations could not be explained by the summed face-body activations. These univariate results were supported by regression analyses in which we modeled the activations to both configurations as a weighted linear combination of the activations to the faces and bodies, showing higher regression coefficients for the natural than for the unnatural configurations. Deeper layers of trained convolutional neural networks also contained units that responded more to natural than to unnatural monkey configurations; unlike the monkey fMRI patches, these units showed substantial superadditive responses to the natural configurations. Our data suggest configuration-sensitive face-body interactions in anterior IT, adding to the evidence for integrated face-body processing in the primate ventral visual stream, and open the way for mechanistic studies using single-unit recordings in these patches.
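Both tests reported here are simple to express: a superadditivity contrast (whole-agent response versus the sum of face and body responses) and a regression modelling each configuration's response as a weighted combination of the face and body responses. A sketch on synthetic activations chosen to mimic the reported pattern (no superadditivity, higher weights for natural configurations):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder fMRI responses per voxel to four conditions.
rng = np.random.default_rng(4)
n = 200
face, body = rng.random(n), rng.random(n)
natural = 0.9 * face + 0.8 * body + 0.05 * rng.random(n)    # whole agent
unnatural = 0.6 * face + 0.6 * body + 0.05 * rng.random(n)  # head mislocated

# Superadditivity test: does the whole-agent response exceed the sum of
# the face and body responses? (Here it does not, by construction.)
print("mean superadditivity (natural):", np.mean(natural - (face + body)))

# Regression test: model each configuration as a weighted combination of
# face and body responses; higher weights for the natural configuration
# indicate configuration-sensitive integration.
X = np.column_stack([face, body])
for name, y in [("natural", natural), ("unnatural", unnatural)]:
    w = LinearRegression().fit(X, y).coef_
    print(f"{name:9s}: face weight = {w[0]:.2f}, body weight = {w[1]:.2f}")
```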
Affiliation(s)
- Yordanka Zafirova: Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
- Ding Cui: Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
- Rajani Raman: Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
- Rufin Vogels: Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
7
Singer JJD, Seeliger K, Kietzmann TC, Hebart MN. From photos to sketches - how humans and deep neural networks process objects across different levels of visual abstraction. J Vis 2022; 22:4. PMID: 35129578; PMCID: PMC8822363; DOI: 10.1167/jov.22.2.4.
Abstract
Line drawings convey meaning with just a few strokes. Despite strong simplifications, humans can recognize objects depicted in such abstracted images without effort. To what degree do deep convolutional neural networks (CNNs) mirror this human ability to generalize to abstracted object images? While CNNs trained on natural images have been shown to exhibit poor classification performance on drawings, other work has demonstrated highly similar latent representations in the networks for abstracted and natural images. Here, we address these seemingly conflicting findings by analyzing the activation patterns of a CNN trained on natural images across a set of photographs, drawings, and sketches of the same objects and comparing them to human behavior. We find a highly similar representational structure across levels of visual abstraction in early and intermediate layers of the network. This similarity, however, does not translate to later stages in the network, resulting in low classification performance for drawings and sketches. We identified that texture bias in CNNs contributes to the dissimilar representational structure in late layers and the poor performance on drawings. Finally, by fine-tuning late network layers with object drawings, we show that performance can be largely restored, demonstrating the general utility of features learned on natural images in early and intermediate layers for the recognition of drawings. In conclusion, generalization to abstracted images, such as drawings, seems to be an emergent property of CNNs trained on natural images, which is, however, suppressed by domain-related biases that arise during later processing stages in the network.
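The layer-wise comparison is a standard cross-domain representational similarity analysis: build a representational dissimilarity matrix (RDM) per depiction style from one layer's activations, then correlate the RDMs. A sketch with synthetic activations constructed so that, as in the paper, photo and sketch representations align early and diverge late:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """Condensed RDM: pairwise correlation distances between objects."""
    return pdist(features, metric="correlation")

# Placeholder layer activations for the same objects rendered as photos
# and as sketches (rows = objects, columns = units in one layer).
rng = np.random.default_rng(5)
n_obj = 48
shared = rng.standard_normal((n_obj, 256))
photos_early = shared + 0.3 * rng.standard_normal((n_obj, 256))
sketch_early = shared + 0.3 * rng.standard_normal((n_obj, 256))
photos_late = shared + 2.0 * rng.standard_normal((n_obj, 256))
sketch_late = 2.0 * rng.standard_normal((n_obj, 256))  # texture-driven

# Cross-domain RSA: similar RDMs in early layers, diverging late.
for name, a, b in [("early", photos_early, sketch_early),
                   ("late", photos_late, sketch_late)]:
    rho, _ = spearmanr(rdm(a), rdm(b))
    print(f"{name} layer photo-sketch RSA: rho = {rho:.2f}")
```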
Affiliation(s)
- Johannes J D Singer: Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; Department of Psychology, Ludwig Maximilian University, Munich, Germany
- Katja Seeliger: Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Tim C Kietzmann: Donders Institute for Brain, Cognition and Behavior, Nijmegen, The Netherlands
- Martin N Hebart: Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
8
Acoustic Signal Classification Using Symmetrized Dot Pattern and Convolutional Neural Network. Machines 2022. DOI: 10.3390/machines10020090.
Abstract
The classification of sound signals can be applied to the fault diagnosis of mechanical systems, such as vehicles. Traditional sound classification technology mainly uses the time-frequency domain characteristics of signals as the basis for identification. This study proposes a technique for visualizing sound signals and uses artificial neural networks as the basis for signal classification. The feature extraction method converts a time-domain signal into a symmetrized dot pattern in polar coordinates, presenting it in the form of a snowflake. To verify the feasibility of this method for classifying different noise signatures, the experimental work is divided into two parts: the identification of traditional engine vehicle noise and of electric motor noise. In the sound measurements, we first use a microphone and data acquisition system to record the noise of different vehicles under the same operating conditions, or the operating noise of different electric motors. We then convert the time-domain signals into symmetrized dot patterns, establish an acoustic symmetrized dot pattern database, and use a convolutional neural network to identify vehicle types. To achieve better identification, the analysis also examines the effect of the time delay coefficient and the weighting coefficient on image identification. The experimental results show that the method can be effectively applied to the classification of traditional engine and electric vehicles, effectively achieving the purpose of sound signal classification.
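In the standard symmetrized dot pattern construction, each normalized sample sets a radius, the sample a fixed lag ahead sets an angle scaled by a weighting (gain) coefficient, and the resulting points are mirrored into six sectors to produce the snowflake image. A sketch of that mapping; the lag and gain values here are illustrative, and the paper tunes them empirically:

```python
import numpy as np

def symmetrized_dot_pattern(x, lag=1, gain=30.0, sectors=6):
    """Map a 1-D signal to SDP polar coordinates (radius, angle in deg).

    lag  : time-delay coefficient (samples between radius and angle)
    gain : angular weighting coefficient in degrees (petal spread)
    """
    x = np.asarray(x, dtype=float)
    r = (x - x.min()) / (x.max() - x.min() + 1e-12)  # normalized radius
    ri, rlag = r[:-lag], r[lag:]
    pts = []
    for k in range(sectors):                          # six-fold symmetry
        theta = 360.0 * k / sectors
        pts.append(np.column_stack([ri, theta + gain * rlag]))  # mirror +
        pts.append(np.column_stack([ri, theta - gain * rlag]))  # mirror -
    return np.vstack(pts)  # rasterize to an image before feeding the CNN

# Example: a 1 kHz tone sampled at 16 kHz stands in for a vehicle recording.
t = np.arange(0, 0.1, 1 / 16000)
signal = np.sin(2 * np.pi * 1000 * t)
print(symmetrized_dot_pattern(signal, lag=3, gain=36.0).shape)
```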
9
Vinken K, Op de Beeck H. Using deep neural networks to evaluate object vision tasks in rats. PLoS Comput Biol 2021; 17:e1008714. PMID: 33651793; PMCID: PMC7954349; DOI: 10.1371/journal.pcbi.1008714.
Abstract
In the last two decades, rodents have been on the rise as a dominant model for visual neuroscience. This is particularly true for earlier levels of information processing, but a number of studies have suggested that higher levels of processing, such as invariant object recognition, also occur in rodents. Here we provide a quantitative and comprehensive assessment of this claim by comparing a wide range of rodent behavioral and neural data with convolutional deep neural networks. These networks have been shown to capture hallmark properties of information processing in primates through a succession of convolutional and fully connected layers. We find that performance on rodent object vision tasks can be captured using low- to mid-level convolutional layers only, without any convincing evidence for the need of higher layers known to simulate complex object recognition in primates. Our approach also reveals surprising insights into assumptions made previously, for example, that the best-performing animals would be the ones using the most abstract representations, which we show to likely be incorrect. Our findings suggest a road ahead for further studies aiming to quantify and establish the richness of representations underlying information processing in animal models at large.
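The layer-sufficiency argument can be prototyped as a decoding analysis: train a linear readout on activations from each depth and ask how early in the hierarchy the task becomes solvable. A sketch with placeholder activations; real ones would come from a pretrained CNN applied to the task stimuli:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder activations from three depths of a CNN for a two-category
# rodent-style discrimination task (rows = stimuli). Signal strengths
# are illustrative.
rng = np.random.default_rng(6)
n, labels = 200, np.repeat([0, 1], 100)
layers = {
    "low_conv": rng.standard_normal((n, 64)) + labels[:, None] * 0.8,
    "mid_conv": rng.standard_normal((n, 128)) + labels[:, None] * 1.0,
    "fully_connected": rng.standard_normal((n, 256)) + labels[:, None] * 0.1,
}

# If low/mid layers already support high readout accuracy, the task does
# not require the complex representations of the deepest layers.
for name, feats in layers.items():
    acc = cross_val_score(LogisticRegression(max_iter=1000), feats,
                          labels, cv=5).mean()
    print(f"{name:16s}: decoding accuracy = {acc:.2f}")
```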
Affiliation(s)
- Kasper Vinken: Department of Ophthalmology, Children's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America; Laboratory for Neuro- and Psychophysiology, KU Leuven, Leuven, Belgium
- Hans Op de Beeck: Department of Brain and Cognition & Leuven Brain Institute, KU Leuven, Leuven, Belgium
10
Vinken K, Boix X, Kreiman G. Incorporating intrinsic suppression in deep neural networks captures dynamics of adaptation in neurophysiology and perception. Sci Adv 2020; 6:eabd4205. PMID: 33055170; PMCID: PMC7556832; DOI: 10.1126/sciadv.abd4205.
Abstract
Adaptation is a fundamental property of sensory systems that can change subjective experiences in the context of recent information. Adaptation has been postulated to arise from recurrent circuit mechanisms or as a consequence of neuronally intrinsic suppression. However, it is unclear whether intrinsic suppression by itself can account for effects beyond reduced responses. Here, we test the hypothesis that complex adaptation phenomena can emerge from intrinsic suppression cascading through a feedforward model of visual processing. A deep convolutional neural network with intrinsic suppression captured neural signatures of adaptation including novelty detection, enhancement, and tuning curve shifts, while producing aftereffects consistent with human perception. When adaptation was trained in a task where repeated input affects recognition performance, an intrinsic mechanism generalized better than a recurrent neural network. Our results demonstrate that feedforward propagation of intrinsic suppression changes the functional state of the network, reproducing key neurophysiological and perceptual properties of adaptation.
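The mechanism's core is a unit whose suppression state accumulates with its own activity and decays exponentially, so repeated inputs yield shrinking responses while novel inputs recover. A single-unit sketch with illustrative constants; the paper implements this across the layers of a deep network:

```python
def run_with_suppression(inputs, w, alpha=0.9, beta=0.7):
    """Feedforward unit with intrinsic suppression.

    A hidden suppression state s accumulates with each response and
    decays with factor alpha; it is subtracted before the rectification,
    so a repeated input yields progressively weaker responses.
    """
    s, responses = 0.0, []
    for x in inputs:
        r = max(0.0, w * x - s)   # response reduced by current state
        s = alpha * s + beta * r  # state integrates recent activity
        responses.append(r)
    return responses

# Repetition suppression followed by novelty: the response shrinks over
# five repeats, then recovers for a new, stronger input.
stim = [1.0] * 5 + [2.0]
print([f"{r:.2f}" for r in run_with_suppression(stim, w=1.0)])
```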
Affiliation(s)
- K Vinken: Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Brains, Minds and Machines, Cambridge, MA 02139, USA; Laboratory for Neuro- and Psychophysiology, Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium
- X Boix: Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Brains, Minds and Machines, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA
- G Kreiman: Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Brains, Minds and Machines, Cambridge, MA 02139, USA
11
Zeman AA, Ritchie JB, Bracci S, Op de Beeck H. Orthogonal Representations of Object Shape and Category in Deep Convolutional Neural Networks and Human Visual Cortex. Sci Rep 2020; 10:2453. PMID: 32051467; PMCID: PMC7016009; DOI: 10.1038/s41598-020-59175-0.
Abstract
Deep convolutional neural networks (CNNs) are gaining traction as the benchmark model of visual object recognition, with performance now surpassing humans. While CNNs can accurately assign one image to potentially thousands of categories, network performance could be the result of layers that are tuned to represent the visual shape of objects rather than object category, since the two are often confounded in natural images. Using two stimulus sets that explicitly dissociate shape from category, we correlated these two types of information with each layer of multiple CNNs. We also compared CNN output with fMRI activation along the human ventral visual stream by correlating artificial with neural representations. We find that CNNs encode category information independently from shape, peaking at the final fully connected layer in all tested CNN architectures. Comparing CNNs with fMRI brain data, early visual cortex (V1) and early layers of CNNs encode shape information, whereas anterior ventral temporal cortex encodes category information, which correlates best with the final layer of CNNs. The interaction between shape and category found along the human ventral visual pathway is echoed in multiple deep networks. Our results suggest that CNNs represent category information independently from shape, much like the human visual system.
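Operationally, the analysis correlates each layer's representational dissimilarity matrix (RDM) with two orthogonal model RDMs, one for shape and one for category. A sketch on a balanced 2 x 2 design with placeholder activations standing in for real CNN layers:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Balanced design: 2 categories x 2 shape types, 4 exemplars each, so
# shape and category labels are orthogonal across the 16 stimuli.
cat = np.repeat([0, 1], 8)
shape = np.tile(np.repeat([0, 1], 4), 2)
rdm_cat = pdist(cat[:, None], "hamming")      # 0 = same category
rdm_shape = pdist(shape[:, None], "hamming")  # 0 = same shape

# Placeholder activations for one layer; this synthetic layer codes
# category more strongly than shape, as reported for final CNN layers.
rng = np.random.default_rng(7)
acts = (rng.standard_normal((16, 100))
        + 1.5 * cat[:, None] + 0.3 * shape[:, None])
rdm_layer = pdist(acts, "correlation")

# Correlate the layer RDM with each orthogonal model RDM.
rho_cat, _ = spearmanr(rdm_layer, rdm_cat)
rho_shape, _ = spearmanr(rdm_layer, rdm_shape)
print(f"category rho = {rho_cat:.2f}, shape rho = {rho_shape:.2f}")
```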
Affiliation(s)
- Astrid A Zeman: Department of Brain and Cognition & Leuven Brain Institute, KU Leuven, Leuven, Belgium
- J Brendan Ritchie: Department of Brain and Cognition & Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Stefania Bracci: Department of Brain and Cognition & Leuven Brain Institute, KU Leuven, Leuven, Belgium; Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy
- Hans Op de Beeck: Department of Brain and Cognition & Leuven Brain Institute, KU Leuven, Leuven, Belgium
12
Pruszynski JA, Zylberberg J. The language of the brain: real-world neural population codes. Curr Opin Neurobiol 2019; 58:30-36. PMID: 31326721; DOI: 10.1016/j.conb.2019.06.005.
Affiliation(s)
- J Andrew Pruszynski: Department of Physiology and Pharmacology, Western University, London, ON, Canada; Department of Psychology, Western University, London, ON, Canada; Robarts Research Institute, London, ON, Canada
- Joel Zylberberg: Center for Vision Research, York University, Toronto, ON, Canada; Department of Physics and Astronomy, York University, Toronto, ON, Canada; Canadian Institute for Advanced Research, Toronto, ON, Canada