1. Jacob G, Pramod RT, Katti H, Arun SP. Qualitative similarities and differences in visual object representations between brains and deep networks. Nat Commun 2021; 12:1872. PMID: 33767141; PMCID: PMC7994307; DOI: 10.1038/s41467-021-22078-3.
Abstract
Deep neural networks have revolutionized computer vision, and their object representations across layers match coarsely with visual cortical areas in the brain. However, whether these representations exhibit qualitative patterns seen in human perception or brain representations remains unresolved. Here, we recast well-known perceptual and neural phenomena in terms of distance comparisons, and ask whether they are present in feedforward deep neural networks trained for object recognition. Some phenomena were present in randomly initialized networks, such as the global advantage effect, sparseness, and relative size. Many others were present after object recognition training, such as the Thatcher effect, mirror confusion, Weber’s law, relative size, multiple object normalization and correlated sparseness. Yet other phenomena were absent in trained networks, such as 3D shape processing, surface invariance, occlusion, natural parts and the global advantage. These findings indicate sufficient conditions for the emergence of these phenomena in brains and deep networks, and offer clues to the properties that could be incorporated to improve deep networks. Deep neural networks are widely considered to be good models for biological vision. Here, we describe several qualitative similarities and differences in object representations between brains and deep networks that elucidate when deep networks can be considered good models for biological vision and how they can be improved.
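The paper's approach of recasting perceptual phenomena as distance comparisons over network activations is easy to illustrate. Below is a minimal sketch for one such phenomenon, mirror confusion, where the prediction is that an image lies closer to its mirror image than to a different object in the representation. The single random projection layer and the random images are stand-ins of our own, not the networks or stimuli used in the study.

```python
# Minimal sketch of recasting a perceptual phenomenon (here, mirror confusion)
# as a distance comparison over network activations. The random single-layer
# "network" and the random images are stand-ins, not the architectures or
# stimuli used in the paper.
import numpy as np

rng = np.random.default_rng(0)

def features(img, W):
    """One random projection layer with a ReLU, as a stand-in feature map."""
    return np.maximum(W @ img.ravel(), 0.0)

n_pix, n_units = 32 * 32, 512
W = rng.standard_normal((n_units, n_pix)) / np.sqrt(n_pix)

img_a = rng.random((32, 32))           # object A
img_b = rng.random((32, 32))           # a different object B
img_a_mirror = np.flip(img_a, axis=1)  # lateral mirror image of A

f = lambda img: features(img, W)
d_mirror = np.linalg.norm(f(img_a) - f(img_a_mirror))  # A vs. its mirror
d_other = np.linalg.norm(f(img_a) - f(img_b))          # A vs. another object

# Mirror confusion would predict d_mirror < d_other: mirror pairs lie
# closer in the representation than pairs of distinct objects.
print(f"d(A, mirror A) = {d_mirror:.3f}, d(A, B) = {d_other:.3f}")
```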
Affiliations
- Georgin Jacob: Centre for Neuroscience, Indian Institute of Science, Bangalore, India; Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India
- R T Pramod: Centre for Neuroscience, Indian Institute of Science, Bangalore, India; Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India
- Harish Katti: Centre for Neuroscience, Indian Institute of Science, Bangalore, India
- S P Arun: Centre for Neuroscience, Indian Institute of Science, Bangalore, India
2. Srinath R, Emonds A, Wang Q, Lempel AA, Dunn-Weiss E, Connor CE, Nielsen KJ. Early emergence of solid shape coding in natural and deep network vision. Curr Biol 2020; 31:51-65.e5. PMID: 33096039; DOI: 10.1016/j.cub.2020.09.076.
Abstract
Area V4 is the first object-specific processing stage in the ventral visual pathway, just as area MT is the first motion-specific processing stage in the dorsal pathway. For almost 50 years, coding of object shape in V4 has been studied and conceived in terms of flat pattern processing, given its early position in the transformation of 2D visual images. Here, however, in awake monkey recording experiments, we found that roughly half of V4 neurons were more strongly tuned and responsive to solid, 3D shape-in-depth, as conveyed by shading, specularity, reflection, refraction, or disparity cues in images. Using 2-photon functional microscopy, we found that flat- and solid-preferring neurons were segregated into separate modules across the surface of area V4. These findings should impact early shape-processing theories and models, which have focused on 2D pattern processing. In fact, our analyses of early object processing in AlexNet, a standard visual deep network, revealed a similar distribution of sensitivities to flat and solid shape in layer 3. Early processing of solid shape, in parallel with flat shape, could represent a computational advantage discovered by both primate brain evolution and deep-network training.
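The AlexNet analysis mentioned above can be sketched in a few lines. The code below extracts activations through the third convolutional layer of torchvision's AlexNet and computes a simple per-unit flat-versus-solid preference index. The random stimuli are placeholders for matched renderings of flat and shaded 3D shapes, and the preference index is our assumption about one reasonable formulation, not necessarily the paper's measure.

```python
# Sketch of probing early AlexNet units for flat vs. solid shape sensitivity,
# in the spirit of the layer-3 analysis described above. Stimuli here are
# random stand-ins; real tests would use matched flat and shaded renderings.
import torch
import torchvision.models as models

model = models.alexnet(weights="IMAGENET1K_V1").eval()
conv3 = model.features[:8]  # through the ReLU after the third conv layer

flat_imgs = torch.rand(8, 3, 224, 224)   # placeholder "flat shape" images
solid_imgs = torch.rand(8, 3, 224, 224)  # placeholder "solid shape" images

with torch.no_grad():
    r_flat = conv3(flat_imgs).flatten(1).mean(0)    # mean unit response, flat
    r_solid = conv3(solid_imgs).flatten(1).mean(0)  # mean unit response, solid

# A simple per-unit preference index in [-1, 1]: positive = solid-preferring.
pref = (r_solid - r_flat) / (r_solid + r_flat + 1e-8)
print("fraction of solid-preferring units:", (pref > 0).float().mean().item())
```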
Affiliations
- Ramanujan Srinath: Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD 21218, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Alexandriya Emonds: Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Qingyang Wang: Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD 21218, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Augusto A Lempel: Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD 21218, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Erika Dunn-Weiss: Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD 21218, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Charles E Connor: Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD 21218, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Kristina J Nielsen: Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD 21218, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
3. Multiplicative mixing of object identity and image attributes in single inferior temporal neurons. Proc Natl Acad Sci U S A 2018; 115:E3276-E3285. PMID: 29559530; PMCID: PMC5889630; DOI: 10.1073/pnas.1714287115.
Abstract
Vision is a challenging problem because the same object can produce a variety of images on the retina, mixing signals related to its identity with signals related to its viewing attributes, such as size, position, rotation, etc. Precisely how the brain separates these signals to form an efficient representation is unknown. Here, we show that single neurons in high-level visual cortex encode object identity and attributes multiplicatively, and that doing so allows for better decoding of each signal. Object recognition is challenging because the same object can produce vastly different images, mixing signals related to its identity with signals due to its image attributes, such as size, position, rotation, etc. Previous studies have shown that both signals are present in high-level visual areas, but precisely how they are combined has remained unclear. One possibility is that neurons might encode identity and attribute signals multiplicatively so that each can be efficiently decoded without interference from the other. Here, we show that, in high-level visual cortex, responses of single neurons can be explained better as a product rather than a sum of tuning for object identity and tuning for image attributes. This subtle effect in single neurons produced substantially better decoding of object identity and image attributes in the neural population as a whole. This property was absent both in low-level vision models and in deep neural networks. It was also unique to invariances: when tested with two-part objects, neural responses were explained better as a sum than as a product of part tuning. Taken together, our results indicate that signals requiring separate decoding, such as object identity and image attributes, are combined multiplicatively in IT neurons, whereas signals that require integration (such as parts in an object) are combined additively.
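The central model comparison, product versus sum of identity and attribute tuning, can be made concrete with a small sketch. Here a simulated response matrix over (object, attribute) pairs stands in for recorded firing rates; fitting the additive model with row and column means and the multiplicative model with a rank-1 SVD approximation are our assumptions about one reasonable way to fit each model, not necessarily the paper's procedure.

```python
# Compare additive vs. multiplicative models of a neuron's responses to
# (object identity, image attribute) pairs, on a simulated response matrix.
import numpy as np

rng = np.random.default_rng(1)
n_obj, n_attr = 8, 5

# Simulate a multiplicatively mixing neuron: response = identity tuning
# x attribute tuning, plus noise.
obj_tuning = rng.random(n_obj)
attr_tuning = rng.random(n_attr)
R = np.outer(obj_tuning, attr_tuning) + 0.02 * rng.standard_normal((n_obj, n_attr))

def r2(pred, data):
    """Coefficient of determination of a model prediction."""
    ss_res = np.sum((data - pred) ** 2)
    ss_tot = np.sum((data - data.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Additive model: r_ij ~ a_i + b_j. The least-squares fit is
# row mean + column mean - grand mean (two-way additive ANOVA fit).
pred_sum = R.mean(axis=1, keepdims=True) + R.mean(axis=0, keepdims=True) - R.mean()

# Multiplicative model: r_ij ~ u_i * v_j, i.e. the best rank-1 approximation.
U, S, Vt = np.linalg.svd(R)
pred_prod = S[0] * np.outer(U[:, 0], Vt[0, :])

print(f"additive R^2 = {r2(pred_sum, R):.3f}, "
      f"multiplicative R^2 = {r2(pred_prod, R):.3f}")
```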
4. Pramod RT, Arun SP. Symmetric objects become special in perception because of generic computations in neurons. Psychol Sci 2017; 29:95-109. PMID: 29219748; PMCID: PMC5772447; DOI: 10.1177/0956797617729808.
Abstract
Symmetry is a salient visual property: It is easy to detect and influences perceptual phenomena from segmentation to recognition. Yet researchers know little about its neural basis. Using recordings from single neurons in monkey IT cortex, we asked whether symmetry, being an emergent property, induces nonlinear interactions between object parts. Remarkably, we found no such deviation: Whole-object responses were always the sum of responses to the object’s parts, regardless of symmetry. The only defining characteristic of symmetric objects was that they were more distinctive compared with asymmetric objects. This was a consequence of neurons preferring the same part across locations within an object. Just as mixing diverse paints produces a homogeneous overall color, adding heterogeneous parts within an asymmetric object renders it indistinct. In contrast, adding identical parts within a symmetric object renders it distinct. This distinctiveness systematically predicted human symmetry judgments, and it explains many previous observations about symmetry perception. Thus, symmetry becomes special in perception despite being driven by generic computations at the level of single neurons.
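The part-sum account above can be illustrated with a toy population model: whole-object responses are the sum of part responses, and each neuron prefers the same part at both locations within an object. Under those assumptions, distinctiveness (mean distance to other objects) comes out higher for symmetric objects, whose parts are identical, than for asymmetric ones. The random part tunings below are stand-ins of our own for recorded IT selectivity.

```python
# Toy version of the part-sum model: whole-object response = sum of part
# responses, with identical part tuning at both locations in an object.
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_parts = 200, 10
part_tuning = rng.standard_normal((n_neurons, n_parts))  # same at both locations

def object_response(p, q):
    """Response to a two-part object = sum of the two part responses."""
    return part_tuning[:, p] + part_tuning[:, q]

sym = [object_response(p, p) for p in range(n_parts)]   # identical parts
asym = [object_response(p, q) for p in range(n_parts)
        for q in range(n_parts) if p != q]              # different parts

def mean_distinctiveness(objs):
    """Mean pairwise distance between population responses to objects."""
    objs = np.stack(objs)
    d = [np.linalg.norm(objs[i] - objs[j])
         for i in range(len(objs)) for j in range(i + 1, len(objs))]
    return float(np.mean(d))

# Symmetric objects come out more distinctive than asymmetric ones.
print("symmetric distinctiveness: ", mean_distinctiveness(sym))
print("asymmetric distinctiveness:", mean_distinctiveness(asym))
```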
Affiliations
- R T Pramod: Centre for Neuroscience and Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India
- S P Arun: Centre for Neuroscience and Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India
5. Ratan Murty NA, Arun SP. Effect of silhouetting and inversion on view invariance in the monkey inferotemporal cortex. J Neurophysiol 2017; 118:353-362. PMID: 28381484; PMCID: PMC5501916; DOI: 10.1152/jn.00008.2017.
Abstract
We effortlessly recognize objects across changes in viewpoint, but we know relatively little about the features that underlie viewpoint invariance in the brain. Here, we set out to characterize how viewpoint invariance in monkey inferior temporal (IT) neurons is influenced by two image manipulations: silhouetting and inversion. Reducing an object into its silhouette removes internal detail, so this would reveal how much viewpoint invariance depends on the external contours. Inverting an object retains but rearranges features, so this would reveal how much viewpoint invariance depends on the arrangement and orientation of features. Our main findings are 1) view invariance was weakened by silhouetting but not by inversion; 2) view invariance was stronger in neurons that generalized across silhouetting and inversion; 3) neuronal responses to natural objects matched early with those of silhouettes and only later with those of inverted objects, indicative of coarse-to-fine processing; and 4) the impact of silhouetting and inversion depended on object structure. Taken together, our results elucidate the underlying features and dynamics of view-invariant object representations in the brain. NEW & NOTEWORTHY We easily recognize objects across changes in viewpoint, but the underlying features are unknown. Here, we show that view invariance in the monkey inferotemporal cortex is driven mainly by external object contours and is not specialized for object orientation. We also find that the responses to natural objects match those of their silhouettes early in the response, and those of inverted versions later in the response, indicative of a coarse-to-fine processing sequence in the brain.
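One common way to quantify view invariance in studies like this is to correlate each neuron's tuning across objects between two views; the sketch below computes that index on simulated data. The response matrices are stand-ins for recorded firing rates, and the exact index used in the paper may differ.

```python
# Sketch of a per-neuron view-invariance index: correlate each neuron's
# object tuning at view 1 with its tuning at view 2, then average. The
# simulated responses are stand-ins for recorded firing rates.
import numpy as np

rng = np.random.default_rng(3)
n_neurons, n_objects = 100, 20

# Simulate partially view-invariant tuning: view-2 responses share a
# component with view-1 responses, plus view-specific variation.
tuning_view1 = rng.standard_normal((n_neurons, n_objects))
tuning_view2 = 0.7 * tuning_view1 + 0.3 * rng.standard_normal((n_neurons, n_objects))

def view_invariance(r1, r2):
    """Mean across neurons of the object-tuning correlation between views."""
    corrs = [np.corrcoef(r1[i], r2[i])[0, 1] for i in range(r1.shape[0])]
    return float(np.mean(corrs))

print("view-invariance index:", view_invariance(tuning_view1, tuning_view2))
# Repeating this with silhouetted images (internal detail removed) versus
# inverted images would test which manipulation weakens the index.
```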
Affiliations
- S P Arun: Centre for Neuroscience, Indian Institute of Science, Bangalore, India
6. A balanced comparison of object invariances in monkey IT neurons. eNeuro 2017; 4:ENEURO.0333-16.2017. PMID: 28413827; PMCID: PMC5390242; DOI: 10.1523/ENEURO.0333-16.2017.
Abstract
Our ability to recognize objects across variations in size, position, or rotation is based on invariant object representations in higher visual cortex. However, we know little about how these invariances are related. Are some invariances harder than others? Do some invariances arise faster than others? These comparisons can be made only after equating image changes across transformations. Here, we targeted invariant neural representations in the monkey inferotemporal (IT) cortex using object images with balanced changes in size, position, and rotation. Across the recorded population, IT neurons generalized both more strongly and more quickly across size and position than across rotations in the image plane or in depth. We obtained a similar ordering of invariances in deep neural networks but not in low-level visual representations. Thus, invariant neural representations dynamically evolve in a temporal order reflective of their underlying computational complexity.
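The "balanced" logic of equating image change across transformations before comparing invariances can be sketched as follows. The image, the transformations (a circular shift standing in for a position change, a 90-degree rotation standing in for an image-plane rotation), and the pixel-level dissimilarity measure are all simplifications of our own, not the paper's stimuli or metric.

```python
# Sketch: equate the low-level image change produced by two transformations,
# so that any later difference in neural generalization is not trivially
# explained by pixel-level differences.
import numpy as np

rng = np.random.default_rng(4)
img = rng.random((64, 64))

def pixel_change(a, b):
    """Normalized pixel-level dissimilarity between two images."""
    return np.linalg.norm(a - b) / np.linalg.norm(a)

def shift(image, dx):
    return np.roll(image, dx, axis=1)  # crude stand-in for a position change

def rotate90(image, k):
    return np.rot90(image, k)          # crude stand-in for a rotation

# Step 1: measure image change for candidate magnitudes of each transform.
shift_changes = {dx: pixel_change(img, shift(img, dx)) for dx in (2, 4, 8, 16)}
rot_change = pixel_change(img, rotate90(img, 1))

# Step 2: pick the shift whose image change best matches the rotation's,
# so the two transformed images are equated at the pixel level.
matched_dx = min(shift_changes, key=lambda dx: abs(shift_changes[dx] - rot_change))
print(f"rotation change = {rot_change:.3f}, "
      f"matched shift dx = {matched_dx} (change = {shift_changes[matched_dx]:.3f})")
# Step 3 (not shown): compare neural/model responses to the original vs.
# each equated transform; stronger correlation = stronger invariance.
```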