1. Hollis J, Humphreys GW, Allen PM. Intermediate, Wholistic Shape Representation in Object Recognition: A Pre-Attentive Stage of Processing? Front Hum Neurosci 2021;15:761174. [PMID: 35002652] [PMCID: PMC8735852] [DOI: 10.3389/fnhum.2021.761174]
Abstract
Evidence is presented for intermediate, wholistic visual representations of objects and non-objects that are computed online and independent of visual attention. Short-term visual priming was examined between visually similar shapes, with targets either falling at the (valid) location cued by primes or at another (invalid) location. Object decision latencies were facilitated when the overall shapes of the stimuli were similar, irrespective of whether the location of the prime was valid or invalid, with the effects being equally large for object and non-object targets. In addition, the effects were based on the overall outlines of the stimuli and low spatial frequency components, not on local parts. In conclusion, wholistic shape representations based on outline form are rapidly computed online during object recognition. Moreover, activation of common wholistic shape representations primes the processing of subsequent objects and non-objects, irrespective of whether they appear at attended or unattended locations. Rapid derivation of wholistic form provides a key intermediate stage of object recognition.
Affiliation(s)
- Jarrod Hollis, Vision and Hearing Sciences Research Centre, Anglia Ruskin University, Cambridge, United Kingdom
- Glyn W. Humphreys, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Peter M. Allen, Vision and Hearing Sciences Research Centre, Anglia Ruskin University, Cambridge, United Kingdom
2. Rolls ET. Learning Invariant Object and Spatial View Representations in the Brain Using Slow Unsupervised Learning. Front Comput Neurosci 2021;15:686239. [PMID: 34366818] [PMCID: PMC8335547] [DOI: 10.3389/fncom.2021.686239]
Abstract
First, neurophysiological evidence for the learning of invariant representations in the inferior temporal visual cortex is described. This includes object and face representations with invariance for position, size, lighting, view and morphological transforms in the temporal lobe visual cortex; global object motion in the cortex in the superior temporal sulcus; and spatial view representations in the hippocampus that are invariant with respect to eye position, head direction, and place. Second, computational mechanisms that enable the brain to learn these invariant representations are proposed. For the ventral visual system, one key adaptation is the use of information available in the statistics of the environment in slow unsupervised learning to learn transform-invariant representations of objects. This contrasts with deep supervised learning in artificial neural networks, which uses training with thousands of exemplars forced into different categories by neuronal teachers. Similar slow learning principles apply to the learning of global object motion in the dorsal visual system leading to the cortex in the superior temporal sulcus. The learning rule that has been explored in VisNet is an associative rule with a short-term memory trace. The feed-forward architecture has four stages, with convergence from stage to stage. This type of slow learning is implemented in the brain in hierarchically organized competitive neuronal networks with convergence from stage to stage, with only 4-5 stages in the hierarchy. Slow learning is also shown to help the learning of coordinate transforms using gain modulation in the dorsal visual system extending into the parietal cortex and retrosplenial cortex. Representations are learned that are in allocentric spatial view coordinates of locations in the world and that are independent of eye position, head direction, and the place where the individual is located. This enables hippocampal spatial view cells to use idiothetic (self-motion) signals for navigation when the view details are obscured for short periods.
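The associative rule with a short-term memory trace mentioned above can be sketched in a few lines. The following is a minimal illustration only, not the published VisNet code; the parameter values (eta, alpha) and array shapes are assumptions for exposition:

```python
import numpy as np

def trace_rule_step(w, x, y, y_trace, eta=0.8, alpha=0.01):
    """One update of an associative rule with a short-term memory trace.

    w:       (n_out, n_in) weight matrix of one competitive layer
    x:       (n_in,)  presynaptic firing rates for the current stimulus
    y:       (n_out,) postsynaptic firing rates for the current stimulus
    y_trace: (n_out,) decaying trace carried over from earlier presentations
    eta, alpha: trace persistence and learning rate (illustrative values)
    """
    # Mix current firing into the exponentially decaying trace, so that views
    # of the same object seen close together in time share a common signal.
    y_trace = (1.0 - eta) * y + eta * y_trace
    # Hebb-like update driven by the trace rather than instantaneous firing.
    w = w + alpha * np.outer(y_trace, x)
    # Renormalize weight vectors, as in competitive networks.
    w = w / np.linalg.norm(w, axis=1, keepdims=True)
    return w, y_trace
```

Presenting successive transforms of one object in temporal sequence lets the slowly decaying trace bind them onto the same output neurons, which is the core of the slow-learning account.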
Affiliation(s)
- Edmund T Rolls, Oxford Centre for Computational Neuroscience, Oxford, United Kingdom; Department of Computer Science, University of Warwick, Coventry, United Kingdom
3. Hafri A, Firestone C. The Perception of Relations. Trends Cogn Sci 2021;25:475-492. [PMID: 33812770] [DOI: 10.1016/j.tics.2021.01.006]
Abstract
The world contains not only objects and features (red apples, glass bowls, wooden tables), but also relations holding between them (apples contained in bowls, bowls supported by tables). Representations of these relations are often developmentally precocious and linguistically privileged; but how does the mind extract them in the first place? Although relations themselves cast no light onto our eyes, a growing body of work suggests that even very sophisticated relations display key signatures of automatic visual processing. Across physical, eventive, and social domains, relations such as support, fit, cause, chase, and even socially interact are extracted rapidly, are impossible to ignore, and influence other perceptual processes. Sophisticated and structured relations are not only judged and understood, but also seen - revealing surprisingly rich content in visual perception itself.
Affiliation(s)
- Alon Hafri, Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Chaz Firestone, Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Philosophy, Johns Hopkins University, Baltimore, MD 21218, USA
4. Using eye-tracking to parse object recognition: Priming activates primarily a parts-based but also a late-emerging features-based representation. Atten Percept Psychophys 2020;82:3096-3111. [DOI: 10.3758/s13414-020-02040-z]
5. Xiong C, Ceja CR, Ludwig CJH, Franconeri S. Biased Average Position Estimates in Line and Bar Graphs: Underestimation, Overestimation, and Perceptual Pull. IEEE Trans Vis Comput Graph 2020;26:301-310. [PMID: 31425112] [DOI: 10.1109/tvcg.2019.2934400]
Abstract
In visual depictions of data, position (i.e., the vertical height of a line or a bar) is believed to be the most precise way to encode information compared to other encodings (e.g., hue). Not only are other encodings less precise than position, but they can also be prone to systematic biases (e.g., color category boundaries can distort perceived differences between hues). By comparison, position's high level of precision may seem to protect it from such biases. In contrast, across three empirical studies, we show that while position may be a precise form of data encoding, it can also produce systematic biases in how values are visually encoded, at least for reports of average position across a short delay. In displays with a single line or a single set of bars, reports of average positions were significantly biased, such that line positions were underestimated and bar positions were overestimated. In displays with multiple data series (i.e., multiple lines and/or sets of bars), this systematic bias still persisted. We also observed an effect of "perceptual pull", where the average position estimate for each series was 'pulled' toward the other. These findings suggest that, although position may still be the most precise form of visual data encoding, it can also be systematically biased.
6. Rolls ET, Mills WPC. Non-accidental properties, metric invariance, and encoding by neurons in a model of ventral stream visual object recognition, VisNet. Neurobiol Learn Mem 2018;152:20-31. [PMID: 29723671] [DOI: 10.1016/j.nlm.2018.04.017]
Abstract
When objects transform into different views, some properties are maintained, such as whether the edges are convex or concave, and these non-accidental properties are likely to be important in view-invariant object recognition. The metric properties, such as the degree of curvature, may change with different views, and are less likely to be useful in object recognition. It is shown that in a model of invariant visual object recognition in the ventral visual stream, VisNet, non-accidental properties are encoded much more than metric properties by neurons. Moreover, it is shown how, with temporal trace rule training in VisNet, non-accidental properties of objects become encoded by neurons, and how metric properties are treated invariantly. We also show how VisNet can generalize between different objects if they have the same non-accidental property, because the metric properties are likely to overlap. VisNet is a 4-layer unsupervised model of visual object recognition trained by competitive learning that utilizes a temporal trace learning rule to implement the learning of invariance using views that occur close together in time. A second crucial property of this model of object recognition is whether, when neurons in the level corresponding to the inferior temporal visual cortex respond selectively to objects, neurons in the intermediate layers can respond to combinations of features that may be parts of two or more objects. In an investigation using the four sides of a square presented in every possible combination, it was shown that even though different layer 4 neurons are tuned to encode each feature or feature combination orthogonally, neurons in the intermediate layers can respond to features or feature combinations present in several objects. This property is an important part of the way in which high capacity can be achieved in the four-layer ventral visual cortical pathway. These findings concerning non-accidental properties and the use of neurons in intermediate layers of the hierarchy help to emphasise fundamental underlying principles of the computations that may be implemented in the ventral cortical visual stream used in object recognition.
Affiliation(s)
- Edmund T Rolls, Oxford Centre for Computational Neuroscience, Oxford, UK. http://www.oxcns.org
- W Patrick C Mills, Department of Computer Science, University of Warwick, Coventry, UK
7. An applet for the Gabor similarity scaling of the differences between complex stimuli. Atten Percept Psychophys 2017;78:2298-2306. [PMID: 27557818] [DOI: 10.3758/s13414-016-1191-7]
Abstract
It is widely accepted that after the first cortical visual area, V1, a series of stages achieves a representation of complex shapes, such as faces and objects, so that they can be understood and recognized. A major challenge for the study of complex shape perception has been the lack of a principled basis for scaling of the physical differences between stimuli so that their similarity can be specified, unconfounded by early-stage differences. Without the specification of such similarities, it is difficult to make sound inferences about the contributions of later stages to neural activity or psychophysical performance. A Web-based app is described that is based on the Malsburg Gabor-jet model (Lades et al., 1993), which allows easy specification of the V1 similarity of pairs of stimuli, no matter how intricate. The model predicts the psychophysical discriminability of metrically varying faces and complex blobs almost perfectly (Yue, Biederman, Mangini, von der Malsburg, & Amir, 2012), and serves as the input stage of a large family of contemporary neurocomputational models of vision.
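The logic of the Gabor-jet computation can be sketched roughly as follows. This is a sketch under assumed filter parameters (filter sizes, scales, orientations, and grid spacing are illustrative), not the published applet's exact filter bank:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor filter; parameter values here are illustrative."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.exp(2j * np.pi * rot / wavelength)

def jet_magnitudes(image, scales=(4, 8, 16), n_orient=8, grid_step=16):
    """Concatenate Gabor response magnitudes sampled on a grid ('jets')."""
    jets = []
    for wl in scales:
        for k in range(n_orient):
            kern = gabor_kernel(31, wl, k * np.pi / n_orient, wl / 2)
            resp = np.abs(convolve2d(image, kern, mode="same"))
            jets.append(resp[::grid_step, ::grid_step].ravel())
    return np.concatenate(jets)

def gabor_similarity(img_a, img_b):
    """V1-stage similarity of two images as jet-magnitude correlation."""
    a, b = jet_magnitudes(img_a), jet_magnitudes(img_b)
    return np.corrcoef(a, b)[0, 1]
```

Because the jets are computed identically for both images, differences in the resulting similarity score reflect early-stage physical differences, which is what allows later-stage contributions to be isolated.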
8. Margalit E, Biederman I, Tjan BS, Shah MP. What Is Actually Affected by the Scrambling of Objects When Localizing the Lateral Occipital Complex? J Cogn Neurosci 2017;29:1595-1604. [DOI: 10.1162/jocn_a_01144]
Abstract
The lateral occipital complex (LOC), the cortical region critical for shape perception, is localized with fMRI by its greater BOLD activity when viewing intact objects compared with their scrambled versions (resembling texture). Despite hundreds of studies investigating LOC, what the LOC localizer accomplishes—beyond distinguishing shape from texture—has never been resolved. By independently scattering the intact parts of objects, the axis structure defining the relations between parts was no longer defined. This led to a diminished BOLD response, despite the increase in the number of independent entities (the parts) produced by the scattering, thus indicating that LOC specifies interpart relations, in addition to specifying the shape of the parts themselves. LOC's sensitivity to relations is not confined to those between parts but is also readily apparent between objects, rendering it—and not subsequent “place” areas—as the critical region for the representation of scenes. Moreover, that these effects are witnessed with novel as well as familiar intact objects and scenes suggests that the relations are computed on the fly, rather than being retrieved from memory.
9. Lovett A, Franconeri SL. Topological Relations Between Objects Are Categorically Coded. Psychol Sci 2017;28:1408-1418. [PMID: 28783447] [DOI: 10.1177/0956797617709814]
Abstract
How do individuals compare images (for example, two graphs or diagrams) to identify differences between them? We argue that categorical relations between objects play a critical role. These relations divide continuous space into discrete categories, such as "above" and "below," or "containing" and "overlapping," which are remembered and compared more easily than precise metric values. These relations should lead to categorical perception, such that viewers find it easier to notice a change that crosses a category boundary (one object is now above, rather than below, another, or now contains, rather than overlaps with, another) than a change of equal magnitude that does not cross a boundary. We tested the influence of a set of topological categorical relations from the cognitive-modeling literature. In a visual same/different comparison task, viewers more accurately noticed changes that crossed relational category boundaries, compared with changes that did not cross these boundaries. The results highlight the potential of systematic exploration of the boundaries of between-object relational categories.
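The idea that continuous space collapses into a few discrete relational categories can be made concrete with a toy sketch. The category names and the bounding-box representation below are assumptions for illustration, not the authors' exact coding scheme:

```python
def topological_relation(a, b):
    """Categorize the relation between two axis-aligned boxes (x0, y0, x1, y1).

    Continuously varying geometry maps onto a handful of discrete
    categories, mirroring the categorical-coding account.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "disjoint"
    if ax0 >= bx0 and ay0 >= by0 and ax1 <= bx1 and ay1 <= by1:
        return "contained_in"
    if bx0 >= ax0 and by0 >= ay0 and bx1 <= ax1 and by1 <= ay1:
        return "contains"
    return "overlapping"

# A small displacement that crosses a category boundary should be easier
# to notice than an equally large displacement that does not:
print(topological_relation((0, 0, 2, 2), (1, 1, 3, 3)))  # overlapping
print(topological_relation((0, 0, 2, 2), (3, 1, 5, 3)))  # disjoint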
10. Kubilius J, Sleurs C, Wagemans J. Sensitivity to Nonaccidental Configurations of Two-Line Stimuli. Iperception 2017;8:2041669517699628. [PMID: 28491272] [PMCID: PMC5405893] [DOI: 10.1177/2041669517699628]
Abstract
According to Recognition-By-Components theory, object recognition relies on a specific subset of three-dimensional shapes called geons. In particular, these configurations constitute a powerful cue to three-dimensional object reconstruction because their two-dimensional projection remains viewpoint-invariant. While a large body of literature has demonstrated sensitivity to changes in these so-called nonaccidental configurations, it remains unclear what information is used in establishing such sensitivity. In this study, we explored the possibility that nonaccidental configurations can already be inferred from the basic constituents of objects, namely, their edges. We constructed a set of stimuli composed of two lines corresponding to various nonaccidental properties and configurations underlying the distinction between geons, including collinearity, alignment, curvature of contours, curvature of configuration axis, expansion, cotermination, and junction type. Using a simple visual search paradigm, we demonstrated that participants were faster at detecting targets that differed from distractors in a nonaccidental property than in a metric property. We also found that only some but not all of the observed sensitivity could have resulted from simple low-level properties of our stimuli. Given that such sensitivity emerged from a configuration of only two lines, our results support the view that nonaccidental configurations could be encoded throughout the visual processing hierarchy even in the absence of object context.
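To make the nonaccidental-versus-metric contrast concrete, a two-line configuration can be parameterized so that one manipulation (breaking cotermination) changes a nonaccidental property while another (changing the angle) is purely metric. The coordinates and parameter values in this sketch are illustrative assumptions, not the study's stimuli:

```python
import numpy as np

def two_lines(angle_deg=60.0, gap=0.0):
    """Endpoints of a two-line configuration (values are illustrative).

    One segment is fixed; the second meets it at `angle_deg`. With gap == 0
    the lines coterminate (a nonaccidental property); gap > 0 breaks
    cotermination. Changing angle_deg alone is a metric change.
    """
    a = np.array([[0.0, 0.0], [1.0, 0.0]])  # fixed horizontal segment
    theta = np.deg2rad(angle_deg)
    start = np.array([1.0 + gap, 0.0])      # offsetting the start breaks cotermination
    b = np.stack([start, start + np.array([np.cos(theta), np.sin(theta)])])
    return a, b

target = two_lines(angle_deg=60, gap=0.0)             # coterminating junction
na_distractor = two_lines(angle_deg=60, gap=0.2)      # nonaccidental difference
metric_distractor = two_lines(angle_deg=40, gap=0.0)  # metric difference
```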
11. Kubilius J, Bracci S, Op de Beeck HP. Deep Neural Networks as a Computational Model for Human Shape Sensitivity. PLoS Comput Biol 2016;12:e1004896. [PMID: 27124699] [PMCID: PMC4849740] [DOI: 10.1371/journal.pcbi.1004896]
Abstract
Theories of object recognition agree that shape is of primordial importance, but there is no consensus about how shape might be represented, and so far attempts to implement a model of shape perception that would work with realistic stimuli have largely failed. Recent studies suggest that state-of-the-art convolutional ‘deep’ neural networks (DNNs) capture important aspects of human object perception. We hypothesized that these successes might be partially related to a human-like representation of object shape. Here we demonstrate that sensitivity for shape features, characteristic to human and primate vision, emerges in DNNs when trained for generic object recognition from natural photographs. We show that these models explain human shape judgments for several benchmark behavioral and neural stimulus sets on which earlier models mostly failed. In particular, although never explicitly trained for such stimuli, DNNs develop acute sensitivity to minute variations in shape and to non-accidental properties that have long been implicated to form the basis for object recognition. Even more strikingly, when tested with a challenging stimulus set in which shape and category membership are dissociated, the most complex model architectures capture human shape sensitivity as well as some aspects of the category structure that emerges from human judgments. As a whole, these results indicate that convolutional neural networks not only learn physically correct representations of object categories but also develop perceptually accurate representational spaces of shapes. An even more complete model of human object representations might be in sight by training deep architectures for multiple tasks, which is so characteristic in human development.

Author Summary
Shape plays an important role in object recognition. Despite years of research, no models of vision could account for shape understanding as found in human vision of natural images. Given recent successes of deep neural networks (DNNs) in object recognition, we hypothesized that DNNs might in fact learn to capture perceptually salient shape dimensions. Using a variety of stimulus sets, we demonstrate here that the output layers of several DNNs develop representations that relate closely to human perceptual shape judgments. Surprisingly, such sensitivity to shape develops in these models even though they were never explicitly trained for shape processing. Moreover, we show that these models also represent categorical object similarity that follows human semantic judgments, albeit to a lesser extent. Taken together, our results bring forward the exciting idea that DNNs capture not only objective dimensions of stimuli, such as their category, but also their subjective, or perceptual, aspects, such as shape and semantic similarity as judged by humans.
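Comparisons of this kind are commonly implemented as representational similarity analysis. Below is a minimal sketch in which the DNN activation matrix and the human dissimilarity matrix are assumed placeholder inputs, not the paper's data:

```python
import numpy as np
from scipy.stats import spearmanr

def rsa_correlation(features, human_dissim):
    """Compare a DNN layer's representational geometry with human judgments.

    features:     (n_stimuli, n_units) activation matrix from some layer
    human_dissim: (n_stimuli, n_stimuli) matrix of judged shape dissimilarities
    """
    # Model dissimilarity: 1 - Pearson correlation between activation patterns.
    model_dissim = 1.0 - np.corrcoef(features)
    # Compare only the upper triangles of the two dissimilarity matrices.
    iu = np.triu_indices(len(human_dissim), k=1)
    rho, _ = spearmanr(model_dissim[iu], human_dissim[iu])
    return rho
```

A high rank correlation would indicate that stimuli judged similar in shape by humans also evoke similar activation patterns in that layer.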
Affiliation(s)
- Jonas Kubilius, Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Stefania Bracci, Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Hans P. Op de Beeck, Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
12. Object grouping based on real-world regularities facilitates perception by reducing competitive interactions in visual cortex. Proc Natl Acad Sci U S A 2014;111:11217-22. [PMID: 25024190] [DOI: 10.1073/pnas.1400559111]
Abstract
In virtually every real-life situation humans are confronted with complex and cluttered visual environments that contain a multitude of objects. Because of the limited capacity of the visual system, objects compete for neural representation and cognitive processing resources. Previous work has shown that such attentional competition is partly object based, such that competition among elements is reduced when these elements perceptually group into an object based on low-level cues. Here, using functional MRI (fMRI) and behavioral measures, we show that the attentional benefit of grouping extends to higher-level grouping based on the relative position of objects as experienced in the real world. An fMRI study designed to measure competitive interactions among objects in human visual cortex revealed reduced neural competition between objects when these were presented in commonly experienced configurations, such as a lamp above a table, relative to the same objects presented in other configurations. In behavioral visual search studies, we then related this reduced neural competition to improved target detection when distracter objects were shown in regular configurations. Control studies showed that low-level grouping could not account for these results. We interpret these findings as reflecting the grouping of objects based on higher-level spatial-relational knowledge acquired through a lifetime of seeing objects in specific configurations. This interobject grouping effectively reduces the number of objects that compete for representation and thereby contributes to the efficiency of real-world perception.
13.
Abstract
The dissociation of a figure from its background is an essential feat of visual perception, as it allows us to detect, recognize, and interact with shapes and objects in our environment. In order to understand how the human brain gives rise to the perception of figures, we here review experiments that explore the links between activity in visual cortex and performance of perceptual tasks related to figure perception. We organize our review according to a proposed model that attempts to contextualize figure processing within the more general framework of object processing in the brain. Overall, the current literature provides us with individual linking hypotheses as to cortical regions that are necessary for particular tasks related to figure perception. Attempts to reach a more complete understanding of how the brain instantiates figure and object perception, however, will have to consider the temporal interaction between the many regions involved, the details of which may vary widely across different tasks.