1
Dvoeglazova M, Sawada T. A role of rectangularity in perceiving a 3D shape of an object. Vision Res 2024; 221:108433. PMID: 38772272; DOI: 10.1016/j.visres.2024.108433.
Abstract
Rectangularity and perpendicularity of contours are important properties of 3D shape for the visual system, which can use them as a priori constraints for perceiving shape veridically. The present article provides a comprehensive review of prior studies of the perception of rectangularity and perpendicularity and discusses their effects on 3D shape perception from both theoretical and empirical approaches. It has been shown that the visual system is biased to perceive a rectangular 3D shape from a 2D image. We thought that this bias might be attributable to the likelihood of a rectangular interpretation, but this hypothesis is not supported by the results of our psychophysical experiment. Note that the perception of a rectangular shape cannot be explained solely on the basis of geometry: a rectangular shape is perceived even from an image that is inconsistent with a rectangular interpretation. To address this issue, we developed a computational model that can recover a rectangular shape from an image of a parallelepiped. The model allows the recovered shape to be slightly inconsistent with the image so that the recovered shape satisfies the a priori constraints of maximum compactness and minimal surface area. This model captures some of the phenomena associated with the perception of rectangular shape that were reported in prior studies. This finding suggests that rectangularity contributes to shape perception in combination with some additional constraints.
Affiliation(s)
- Tadamasa Sawada
- School of Psychology, HSE University, Moscow, Russia; Akian College of Science and Engineering, American University of Armenia, Yerevan, Armenia; Department of Psychology, Russian-Armenian (Slavonic) University, Yerevan, Armenia; European University of Armenia, Yerevan, Armenia
2
Leemans M, Damiano C, Wagemans J. Finding the meaning in meaning maps: Quantifying the roles of semantic and non-semantic scene information in guiding visual attention. Cognition 2024; 247:105788. PMID: 38579638; DOI: 10.1016/j.cognition.2024.105788.
Abstract
In real-world vision, people prioritise the most informative scene regions via eye-movements. According to the cognitive guidance theory of visual attention, viewers allocate visual attention to those parts of the scene that are expected to be the most informative. The expected information of a scene region is coded in the semantic distribution of that scene. Meaning maps have been proposed to capture the spatial distribution of local scene semantics in order to test cognitive guidance theories of attention. Notwithstanding the success of meaning maps, the reason for their success has been contested. This has led to at least two possible explanations for the success of meaning maps in predicting visual attention. On the one hand, meaning maps might measure scene semantics. On the other hand, meaning maps might measure scene features, overlapping with, but distinct from, scene semantics. This study aims to disentangle these two sources of information by considering both conceptual information and non-semantic scene entropy simultaneously. We found that both semantic and non-semantic information is captured by meaning maps, but scene entropy accounted for more unique variance in the success of meaning maps than conceptual information. Additionally, some explained variance was unaccounted for by either source of information. Thus, although meaning maps may index some aspect of semantic information, their success seems to be better explained by non-semantic information. We conclude that meaning maps may not yet be a good tool to test cognitive guidance theories of attention in general, since they capture non-semantic aspects of local semantic density and only a small portion of conceptual information. Rather, we suggest that researchers should better define the exact aspect of cognitive guidance theories they wish to test and then use the tool that best captures that desired semantic information. 
As it stands, the semantic information contained in meaning maps seems too ambiguous to draw strong conclusions about how and when semantic information guides visual attention.
Affiliation(s)
- Maarten Leemans
- Laboratory of Experimental Psychology, Department of Brain and Cognition, University of Leuven (KU Leuven), Belgium.
- Claudia Damiano
- Laboratory of Experimental Psychology, Department of Brain and Cognition, University of Leuven (KU Leuven), Belgium
- Johan Wagemans
- Laboratory of Experimental Psychology, Department of Brain and Cognition, University of Leuven (KU Leuven), Belgium
3
Watier N. Measures of angularity in digital images. Behav Res Methods 2024. PMID: 38689153; DOI: 10.3758/s13428-024-02412-5.
Abstract
In light of the growing interest in studying the affective and aesthetic attributes of curvature, the present paper describes four digital image processing techniques that can be used to objectively discriminate between angular and curvilinear stimuli. MATLAB scripts for each of the techniques accompany the paper. Three studies are then reported that evaluate the efficacy of five metrics, derived from the four techniques, at quantifying the degree of angularity depicted in an image. Images of simple polygons (Study 1), artistic drawings of everyday objects (Study 2), and real-world objects, typefaces, and abstract patterns (Study 3) were analyzed. Logistic regression models were used to determine the relative importance of the metrics at distinguishing between angular and curvilinear items. With one exception, all of the metrics were capable of distinguishing between angular and curvilinear items at a level above chance, but some metrics were better at doing so than others, and their discriminative capacity was influenced by the characteristics of the image. The strengths and limitations of the metrics are discussed, as well as some practical recommendations.
Affiliation(s)
- Nicholas Watier
- Department of Psychology, Brandon University, 270 - 18th St, Brandon, MB, R7A 6A9, Canada.
4
Morgenstern Y, Storrs KR, Schmidt F, Hartmann F, Tiedemann H, Wagemans J, Fleming RW. High-level aftereffects reveal the role of statistical features in visual shape encoding. Curr Biol 2024; 34:1098-1106.e5. PMID: 38218184; PMCID: PMC10931819; DOI: 10.1016/j.cub.2023.12.039.
Abstract
Visual shape perception is central to many everyday tasks, from object recognition to grasping and handling tools.1,2,3,4,5,6,7,8,9,10 Yet how shape is encoded in the visual system remains poorly understood. Here, we probed shape representations using visual aftereffects, perceptual distortions that occur following extended exposure to a stimulus.11,12,13,14,15,16,17 Such effects are thought to be caused by adaptation in neural populations that encode both simple, low-level stimulus characteristics17,18,19,20 and more abstract, high-level object features.21,22,23 To tease these two contributions apart, we used machine-learning methods to synthesize novel shapes in a multidimensional shape space, derived from a large database of natural shapes.24 Stimuli were carefully selected such that low-level and high-level adaptation models made distinct predictions about the shapes that observers would perceive following adaptation. We found that adaptation along vector trajectories in the high-level shape space predicted shape aftereffects better than simple low-level processes. Our findings reveal the central role of high-level statistical features in the visual representation of shape. The findings also hint that human vision is attuned to the distribution of shapes experienced in the natural environment.
Affiliation(s)
- Yaniv Morgenstern
- Erasmus University Rotterdam, Department of Psychology, Burgemeester Oudlaan 50, 3062PA Rotterdam, the Netherlands; University of Leuven (KU Leuven), Brain and Cognition, Tiensestraat 102, 3000 Leuven, Belgium.
- Katherine R Storrs
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany; University of Auckland, School of Psychology, 23 Symonds Street, Auckland 1010, New Zealand
- Filipp Schmidt
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany; University of Marburg and Justus Liebig University Giessen, Center for Mind, Brain and Behavior (CMBB), Hans-Meerwein-Str. 6, 35032 Marburg, Germany
- Frieder Hartmann
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany
- Henning Tiedemann
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany
- Johan Wagemans
- University of Leuven (KU Leuven), Brain and Cognition, Tiensestraat 102, 3000 Leuven, Belgium
- Roland W Fleming
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany; University of Marburg and Justus Liebig University Giessen, Center for Mind, Brain and Behavior (CMBB), Hans-Meerwein-Str. 6, 35032 Marburg, Germany
5
Han S, Rezanejad M, Walther DB. Memorability of line drawings of scenes: the role of contour properties. Mem Cognit 2023. PMID: 37903987; DOI: 10.3758/s13421-023-01478-4.
Abstract
Why are some images more likely to be remembered than others? Previous work focused on the influence of global, low-level visual features as well as image content on memorability. To better understand the role of local, shape-based contours, we here investigate the memorability of photographs and line drawings of scenes. We find that the memorability of photographs and line drawings of the same scenes is correlated. We quantitatively measure the role of contour properties and their spatial relationships for scene memorability using a Random Forest analysis. To determine whether this relationship is merely correlational or if manipulating these contour properties causes images to be remembered better or worse, we split each line drawing into two half-images, one with high and the other with low predicted memorability according to the trained Random Forest model. In a new memorability experiment, we find that the half-images predicted to be more memorable were indeed remembered better, confirming a causal role of shape-based contour features, and, in particular, T junctions in scene memorability. We performed a categorization experiment on half-images to test for differential access to scene content. We found that half-images predicted to be more memorable were categorized more accurately. However, categorization accuracy for individual images was not correlated with their memorability. These results demonstrate that we can measure the contributions of individual contour properties to scene memorability and verify their causal involvement with targeted image manipulations, thereby bridging the gap between low-level features and scene semantics in our understanding of memorability.
Affiliation(s)
- Seohee Han
- Department of Psychology, University of Toronto, 100 St. George Street, Toronto, Canada.
- Morteza Rezanejad
- Department of Psychology, University of Toronto, 100 St. George Street, Toronto, Canada
- Dirk B Walther
- Department of Psychology, University of Toronto, 100 St. George Street, Toronto, Canada
6
Farzanfar D, Walther DB. Changing What You Like: Modifying Contour Properties Shifts Aesthetic Valuations of Scenes. Psychol Sci 2023; 34:1101-1120. PMID: 37669066; DOI: 10.1177/09567976231190546.
Abstract
To what extent do aesthetic experiences arise from the human ability to perceive and extract meaning from visual features? Ordinary scenes, such as a beach sunset, can elicit a sense of beauty in most observers. Although it appears that aesthetic responses can be shared among humans, little is known about the cognitive mechanisms that underlie this phenomenon. We developed a contour model of aesthetics that assigns values to visual properties in scenes, allowing us to predict aesthetic responses in adults from around the world. Through a series of experiments, we manipulate contours to increase or decrease aesthetic value while preserving scene semantic identity. Contour manipulations directly shift subjective aesthetic judgments. This provides the first experimental evidence for a causal relationship between contour properties and aesthetic valuation. Our findings support the notion that visual regularities underlie the human capacity to derive pleasure from visual information.
7
Kang J, Park S. Combined representation of visual features in the scene-selective cortex. bioRxiv [Preprint] 2023. PMID: 37546776; PMCID: PMC10402097; DOI: 10.1101/2023.07.24.550280.
Abstract
Visual features of separable dimensions like color and shape conjoin to represent an integrated entity. We investigated how visual features bind to form a complex visual scene. Specifically, we focused on features important for visually guided navigation: direction and distance. Previous work has shown that directions and distances of navigable paths are coded in the occipital place area (OPA). Using functional magnetic resonance imaging (fMRI), we tested how separate features are concurrently represented in the OPA. Participants saw eight different types of scenes, four of which had one path and the other four two paths. In single-path scenes, path direction was either to the left or to the right. In double-path scenes, both directions were present. Each path contained a glass wall located either near or far, changing the navigational distance. To test how the OPA represents paths in terms of direction and distance features, we took three approaches. First, the independent-features approach examined whether the OPA codes directions and distances independently in single-path scenes. Second, the integrated-features approach explored how directions and distances are integrated into path units, as compared to pooled features, using double-path scenes. Finally, the integrated-paths approach asked how separate paths are combined into a scene. Using multi-voxel pattern similarity analysis, we found that the OPA's representations of single-path scenes were similar to those of other single-path scenes with either the same direction or the same distance. Representations of double-path scenes were similar to the combination of their two constituent single paths, as combined units of direction and distance rather than a pooled representation of all features. These results show that the OPA combines the two features to form path units, which are then used to build multiple-path scenes.
Altogether, these results suggest that visually guided navigation may be supported by the OPA that automatically and efficiently combines multiple features relevant for navigation and represent a navigation file.
Affiliation(s)
- Jisu Kang
- Department of Psychology, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea
- Soojin Park
- Department of Psychology, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea
8
Henderson MM, Tarr MJ, Wehbe L. A Texture Statistics Encoding Model Reveals Hierarchical Feature Selectivity across Human Visual Cortex. J Neurosci 2023; 43:4144-4161. PMID: 37127366; PMCID: PMC10255092; DOI: 10.1523/jneurosci.1822-22.2023.
Abstract
Midlevel features, such as contour and texture, provide a computational link between low- and high-level visual representations. Although the nature of midlevel representations in the brain is not fully understood, past work has suggested a texture statistics model, called the P-S model (Portilla and Simoncelli, 2000), is a candidate for predicting neural responses in areas V1-V4 as well as human behavioral data. However, it is not currently known how well this model accounts for the responses of higher visual cortex to natural scene images. To examine this, we constructed single-voxel encoding models based on P-S statistics and fit the models to fMRI data from human subjects (both sexes) from the Natural Scenes Dataset (Allen et al., 2022). We demonstrate that the texture statistics encoding model can predict the held-out responses of individual voxels in early retinotopic areas and higher-level category-selective areas. The ability of the model to reliably predict signal in higher visual cortex suggests that the representation of texture statistics features is widespread throughout the brain. Furthermore, using variance partitioning analyses, we identify which features are most uniquely predictive of brain responses and show that the contributions of higher-order texture features increase from early areas to higher areas on the ventral and lateral surfaces. We also demonstrate that patterns of sensitivity to texture statistics can be used to recover broad organizational axes within visual cortex, including dimensions that capture semantic image content. These results provide a key step forward in characterizing how midlevel feature representations emerge hierarchically across the visual system.

SIGNIFICANCE STATEMENT: Intermediate visual features, like texture, play an important role in cortical computations and may contribute to tasks like object and scene recognition.
Here, we used a texture model proposed in past work to construct encoding models that predict the responses of neural populations in human visual cortex (measured with fMRI) to natural scene stimuli. We show that responses of neural populations at multiple levels of the visual system can be predicted by this model, and that the model is able to reveal an increase in the complexity of feature representations from early retinotopic cortex to higher areas of ventral and lateral visual cortex. These results support the idea that texture-like representations may play a broad underlying role in visual processing.
Affiliation(s)
- Margaret M Henderson
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Psychology
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Michael J Tarr
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Psychology
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Leila Wehbe
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Psychology
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
9
The Spatiotemporal Neural Dynamics of Object Recognition for Natural Images and Line Drawings. J Neurosci 2023; 43:484-500. PMID: 36535769; PMCID: PMC9864561; DOI: 10.1523/jneurosci.1546-22.2022.
Abstract
Drawings offer a simple and efficient way to communicate meaning. While line drawings capture only coarsely how objects look in reality, we still perceive them as resembling real-world objects. Previous work has shown that this perceived similarity is mirrored by shared neural representations for drawings and natural images, which suggests that similar mechanisms underlie the recognition of both. However, other work has proposed that representations of drawings and natural images become similar only after substantial processing has taken place, suggesting distinct mechanisms. To arbitrate between those alternatives, we measured brain responses resolved in space and time using fMRI and MEG, respectively, while human participants (female and male) viewed images of objects depicted as photographs, line drawings, or sketch-like drawings. Using multivariate decoding, we demonstrate that object category information emerged similarly fast and across overlapping regions in occipital, ventral-temporal, and posterior parietal cortex for all types of depiction, yet with smaller effects at higher levels of visual abstraction. In addition, cross-decoding between depiction types revealed strong generalization of object category information from early processing stages on. Finally, by combining fMRI and MEG data using representational similarity analysis, we found that visual information traversed similar processing stages for all types of depiction, yet with an overall stronger representation for photographs. Together, our results demonstrate broad commonalities in the neural dynamics of object recognition across types of depiction, thus providing clear evidence for shared neural mechanisms underlying recognition of natural object images and abstract drawings.

SIGNIFICANCE STATEMENT: When we see a line drawing, we effortlessly recognize it as an object in the world despite its simple and abstract style.
Here we asked to what extent this correspondence in perception is reflected in the brain. To answer this question, we measured how neural processing of objects depicted as photographs and line drawings with varying levels of detail (from natural images to abstract line drawings) evolves over space and time. We find broad commonalities in the spatiotemporal dynamics and the neural representations underlying the perception of photographs and even abstract drawings. These results indicate a shared basic mechanism supporting recognition of drawings and natural images.
10
Schlegelmilch K, Wertz AE. Visual segmentation of complex naturalistic structures in an infant eye-tracking search task. PLoS One 2022; 17:e0266158. PMID: 35363809; PMCID: PMC8975119; DOI: 10.1371/journal.pone.0266158.
Abstract
An infant’s everyday visual environment is composed of a complex array of entities, some of which are well integrated into their surroundings. Although infants are already sensitive to some categories in their first year of life, it is not clear which visual information supports their detection of meaningful elements within naturalistic scenes. Here we investigated the impact of image characteristics on 8-month-olds’ search performance using a gaze contingent eye-tracking search task. Infants had to detect a target patch on a background image. The stimuli consisted of images taken from three categories: vegetation, non-living natural elements (e.g., stones), and manmade artifacts, for which we also assessed target background differences in lower- and higher-level visual properties. Our results showed that larger target-background differences in the statistical properties scaling invariance and entropy, and also stimulus backgrounds including low pictorial depth, predicted better detection performance. Furthermore, category membership only affected search performance if supported by luminance contrast. Data from an adult comparison group also indicated that infants’ search performance relied more on lower-order visual properties than adults. Taken together, these results suggest that infants use a combination of property- and category-related information to parse complex visual stimuli.
Affiliation(s)
- Karola Schlegelmilch
- Max Planck Research Group Naturalistic Social Cognition, Max Planck Institute for Human Development, Berlin, Germany
- Annie E. Wertz
- Max Planck Research Group Naturalistic Social Cognition, Max Planck Institute for Human Development, Berlin, Germany
11
Three cortical scene systems and their development. Trends Cogn Sci 2022; 26:117-127. PMID: 34857468; PMCID: PMC8770598; DOI: 10.1016/j.tics.2021.11.002.
Abstract
Since the discovery of three scene-selective regions in the human brain, a central assumption has been that all three regions directly support navigation. We propose instead that cortical scene processing regions support three distinct computational goals (and one not for navigation at all): (i) The parahippocampal place area supports scene categorization, which involves recognizing the kind of place we are in; (ii) the occipital place area supports visually guided navigation, which involves finding our way through the immediately visible environment, avoiding boundaries and obstacles; and (iii) the retrosplenial complex supports map-based navigation, which involves finding our way from a specific place to some distant, out-of-sight place. We further hypothesize that these systems develop along different timelines, with both navigation systems developing slower than the scene categorization system.
12
Son G, Walther DB, Mack ML. Scene wheels: Measuring perception and memory of real-world scenes with a continuous stimulus space. Behav Res Methods 2022; 54:444-456. PMID: 34244986; DOI: 10.3758/s13428-021-01630-5.
Abstract
Precisely characterizing mental representations of visual experiences requires careful control of experimental stimuli. Recent work leveraging such stimulus control has led to important insights; however, these findings are constrained to simple visual properties like color and line orientation. There remains a critical methodological barrier to characterizing perceptual and mnemonic representations of realistic visual experiences. Here, we introduce a novel method to systematically control visual properties of natural scene stimuli. Using generative adversarial networks (GANs), a state-of-the-art deep learning technique for creating highly realistic synthetic images, we generated scene wheels in which continuously changing visual properties smoothly transition between meaningful realistic scenes. To validate the efficacy of scene wheels, we conducted two behavioral experiments that assess perceptual and mnemonic representations attained from the scene wheels. In the perceptual validation experiment, we tested whether the continuous transition of scene images along the wheel is reflected in human perceptual similarity judgment. The perceived similarity of the scene images correspondingly decreased as distances between the images increase on the wheel. In the memory experiment, participants reconstructed to-be-remembered scenes from the scene wheels. Reconstruction errors for these scenes resemble error distributions observed in prior studies using simple stimulus properties. Importantly, perceptual similarity judgment and memory precision varied systematically with scene wheel radius. These findings suggest our novel approach offers a window into the mental representations of naturalistic visual experiences.
Affiliation(s)
- Gaeun Son
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada.
- Dirk B Walther
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
- Michael L Mack
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
13
Sawada T, Mendoza Arvizu A, Farshchi M, Kiba A. Navigation in Contour-Drawn Scenes Using Augmented Reality. Iperception 2022; 13:20416695221074707. PMID: 35126990; PMCID: PMC8808034; DOI: 10.1177/20416695221074707.
Abstract
The visual system can recover 3D information from many different types of visual information, e.g., contour-drawings. How well can people navigate in a real dynamic environment with contour-drawings? This question was addressed by developing an AR-device that could show a contour-drawing of a real scene in an immersive manner and by conducting an observational field study in which the two authors navigated in real environments wearing this AR-device. The navigation with contour-drawings was difficult in natural scenes but easy in urban scenes. This suggests that the visual information from natural and urban environments is sufficiently different and our visual system can accommodate to this difference of the visual information in different environments.
Affiliation(s)
- Tadamasa Sawada
- School of Psychology, HSE University, Moscow, Russian Federation
- Maddex Farshchi
- School of Psychology, HSE University, Moscow, Russian Federation
- Alexandra Kiba
- School of Psychology, HSE University, Moscow, Russian Federation
14
Abstract
We often take people’s ability to understand and produce line drawings for granted. But where should we draw lines, and why? We address psychological principles that underlie efficient representations of complex information in line drawings. First, 58 participants with varying degrees of artistic experience produced multiple drawings of a small set of scenes by tracing contours on a digital tablet. Second, 37 independent observers ranked the drawings by how representative they are of the original photograph. Matching contours between drawings of the same scene revealed that the most consistently drawn contours tend to be drawn earlier. We then generated half-images with the most- versus least-consistently drawn contours and asked 25 observers to categorize the quickly presented scenes. Observers performed significantly better for the most consistent than for the least consistent half-images. The most consistently drawn contours were more likely to depict occlusion boundaries, whereas the least consistently drawn contours frequently depicted surface normals.
Affiliation(s)
- Heping Sheng
- School of Medicine, Boston University, Boston, MA, United States of America
- John Wilder
- Department of Psychology, University of Toronto, Toronto, Canada
- Dirk B. Walther
- Department of Psychology, University of Toronto, Toronto, Canada
15
Contour features predict valence and threat judgements in scenes. Sci Rep 2021; 11:19405. [PMID: 34593933 PMCID: PMC8484627 DOI: 10.1038/s41598-021-99044-y] [Received: 06/16/2021] [Accepted: 09/13/2021]
Abstract
Quickly scanning an environment to determine relative threat is an essential part of survival. Scene gist extracted rapidly from the environment may help people detect threats. Here, we probed the link between emotional judgements and features of visual scenes. We first extracted curvature, length, and orientation statistics of all images in the International Affective Picture System image set and related them to emotional valence scores. Images containing angular contours were rated as negative, and images containing long contours as positive. We then composed new abstract line drawings with specific combinations of length, angularity, and orientation values and asked participants to rate them as positive or negative, and as safe or threatening. Smooth, long, horizontal contour scenes were rated as positive/safe, while short angular contour scenes were rated as negative/threatening. Our work shows that particular combinations of image features help people make judgements about potential threat in the environment.
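The statistics-extraction step described above can be illustrated with a small sketch (Python/NumPy). This is not the authors' code: the polyline input format, the turning-angle measure of angularity, and the example contours are hypothetical stand-ins for the curvature, length, and orientation statistics they computed.

```python
import numpy as np

def contour_stats(polyline):
    """Length, mean orientation, and angularity of one contour,
    given as an (n, 2) array of x,y points."""
    d = np.diff(polyline, axis=0)                # segment vectors
    seg_len = np.hypot(d[:, 0], d[:, 1])
    length = seg_len.sum()
    orient = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 180  # undirected
    # Angularity: mean absolute turning angle between successive segments,
    # wrapped into (-180, 180] so direction reversals are handled correctly.
    ang = np.degrees(np.arctan2(d[:, 1], d[:, 0]))
    turn = np.abs((np.diff(ang) + 180) % 360 - 180)
    angularity = turn.mean() if len(turn) else 0.0
    return length, orient.mean(), angularity

# A nearly straight ("smooth") contour versus a zigzag ("angular") one.
smooth = np.array([[0, 0], [1, 0.1], [2, 0.15], [3, 0.18]])
angular = np.array([[0, 0], [1, 1], [2, 0], [3, 1]])
print(contour_stats(smooth)[2], contour_stats(angular)[2])
```

On this toy input, the zigzag contour gets a much larger angularity value than the nearly straight one, which is the contrast the valence analysis relies on.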
16
Hayes TR, Henderson JM. Deep saliency models learn low-, mid-, and high-level features to predict scene attention. Sci Rep 2021; 11:18434. [PMID: 34531484 PMCID: PMC8445969 DOI: 10.1038/s41598-021-97879-z] [Received: 04/13/2021] [Accepted: 08/31/2021]
Abstract
Deep saliency models represent the current state-of-the-art for predicting where humans look in real-world scenes. However, for deep saliency models to inform cognitive theories of attention, we need to know how deep saliency models prioritize different scene features to predict where people look. Here we open the black box of three prominent deep saliency models (MSI-Net, DeepGaze II, and SAM-ResNet) using an approach that models the association between attention, deep saliency model output, and low-, mid-, and high-level scene features. Specifically, we measured the association between each deep saliency model and low-level image saliency, mid-level contour symmetry and junctions, and high-level meaning by applying a mixed effects modeling approach to a large eye movement dataset. We found that all three deep saliency models were most strongly associated with high-level and low-level features, but exhibited qualitatively different feature weightings and interaction patterns. These findings suggest that prominent deep saliency models are primarily learning image features associated with high-level scene meaning and low-level image saliency and highlight the importance of moving beyond simply benchmarking performance.
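The logic of the association analysis can be sketched in simplified form. The authors fit mixed-effects models; the ordinary-least-squares version below, with simulated data and made-up predictor names, only illustrates the idea of regressing attention on low-, mid-, and high-level predictors and comparing their weights.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300  # hypothetical scene regions

# Hypothetical standardized predictors for each region.
low = rng.normal(size=n)      # low-level image saliency
mid = rng.normal(size=n)      # contour symmetry / junctions
high = rng.normal(size=n)     # rated scene meaning

# Simulated "attention", built mostly from low- and high-level features,
# mirroring the qualitative pattern reported in the abstract.
attention = 0.6 * high + 0.5 * low + 0.1 * mid + rng.normal(scale=0.3, size=n)

# OLS fit of attention ~ intercept + low + mid + high.
X = np.column_stack([np.ones(n), low, mid, high])
beta, *_ = np.linalg.lstsq(X, attention, rcond=None)
for name, b in zip(["intercept", "low", "mid", "high"], beta):
    print(f"{name}: {b:+.2f}")
```

A faithful version would add random effects (e.g., per scene or per subject), which this fixed-effects sketch omits.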
Affiliation(s)
- Taylor R Hayes
- Center for Mind and Brain, University of California, Davis, 95618, USA
- John M Henderson
- Center for Mind and Brain, University of California, Davis, 95618, USA
- Department of Psychology, University of California, Davis, 95616, USA
17
Dreneva A, Shvarts A, Chumachenko D, Krichevets A. Extrafoveal Processing in Categorical Search for Geometric Shapes: General Tendencies and Individual Variations. Cogn Sci 2021; 45:e13025. [PMID: 34379345 PMCID: PMC8459262 DOI: 10.1111/cogs.13025] [Received: 09/12/2020] [Revised: 06/10/2021] [Accepted: 06/27/2021]
Abstract
The paper addresses the capabilities and limitations of extrafoveal processing during a categorical visual search. Previous research has established that a target can be identified from the very first saccade, or even without any saccade, suggesting that extrafoveal perception is necessarily involved. However, the limits in complexity of the information that can be processed extrafoveally are still not clear. We performed four experiments with a gradual increase in stimulus complexity to determine the role of extrafoveal processing in searching for a categorically defined geometric shape. The series of experiments demonstrated a significant role of extrafoveal processing while searching for simple two-dimensional shapes and its gradual decrease in a condition with more complicated three-dimensional shapes. The factors of objects' spatial orientation and distractor homogeneity significantly influenced both reaction time and the number of saccades required to identify a categorically defined target. An analysis of the individual p-value distributions revealed pronounced individual differences in the use of extrafoveal analysis and allowed examination of the performance of each particular participant. The condition with the forced prohibition of eye movements enabled us to investigate the efficacy of covert attention with complicated shapes. Our results indicate that both foveal and extrafoveal processing are simultaneously involved during a categorical search, and that the specificity of their interaction is determined by the spatial orientation of objects, the type of distractors, the prohibition of overt attention, and individual characteristics of the participants.
Affiliation(s)
- Anna Dreneva
- Faculty of Psychology, Lomonosov Moscow State University
- Anna Shvarts
- Freudenthal Institute, Faculty of Science, Utrecht University
18
Dvoeglazova M, Koshmanova E, Sawada T. Visual sensitivity to parallel configurations of contours compared with sensitivity to other configurations. Vision Res 2021; 188:149-161. [PMID: 34333200 DOI: 10.1016/j.visres.2021.07.006] [Received: 03/18/2021] [Revised: 05/11/2021] [Accepted: 07/09/2021]
Abstract
People can perceive 3D information from contour drawings and some types of configurations of contours in such drawings are important for 3D perception. We know that our visual system is sensitive to these configurations. Koshmanova & Sawada (2019, Vision Research, 154, 97-104) showed that the sensitivity is higher to a parallel configuration of contours than to a perpendicular configuration of contours. In this study, two psychophysical experiments were conducted that compared the sensitivity to a parallel configuration with the sensitivity to two other configurations. In Experiment 1, orientation thresholds were measured with parallel and converging configurations composed of three contours. In Experiment 2, orientation thresholds of configurations composed of two contours were measured with parallel, collinear, and perpendicular configurations. The results of Experiment 1 showed that the visual system is more sensitive to parallel configurations than to converging configurations. The results of Experiment 2 showed that the sensitivity to the parallel configuration is analogous to the sensitivity to the collinear configuration, and it is higher than the sensitivity to the perpendicular configuration. The role that the parallel configuration plays in the 3D perception of contour-drawings is discussed.
19
Tharmaratnam V, Patel M, Lowe MX, Cant JS. Shared cognitive mechanisms involved in the processing of scene texture and scene shape. J Vis 2021; 21:11. [PMID: 34269793 PMCID: PMC8297417 DOI: 10.1167/jov.21.7.11]
Abstract
Recent research has demonstrated that the parahippocampal place area represents both the shape and texture features of scenes, with the importance of each feature varying according to perceived scene category. Namely, shape features are predominantly more diagnostic for the processing of artificial human-made scenes, while shape and texture are equally diagnostic in natural scene processing. However, to date little is known regarding the degree of interactivity or independence observed in the processing of these scene features. Furthermore, manipulating the scope of visual attention (i.e., globally vs. locally) when processing ensembles of multiple objects (stimuli that share a functional neuroanatomical link with scenes) has been shown to affect their cognitive visual representation. It remains unknown whether manipulating the scope of attention impacts scene processing in a similar manner. Using the well-established Garner speeded-classification behavioral paradigm, we investigated the influence of both feature diagnosticity and the scope of visual attention on potential interactivity or independence in the shape and texture processing of artificial human-made scenes. The results revealed asymmetric interference between scene shape and texture processing, with the more diagnostic feature (i.e., shape) interfering with the less diagnostic feature (i.e., texture), but not vice versa. Furthermore, this interference was attenuated and enhanced with more local and global visual processing strategies, respectively. These findings suggest that scene shape and texture processing are mediated by shared cognitive mechanisms and that, although these representations are governed primarily via feature diagnosticity, they can nevertheless be influenced by the scope of visual attention.
Affiliation(s)
- Matthew X Lowe
- Graduate Program in Psychology, University of Toronto, Toronto, ON, Canada
- Jonathan S Cant
- Graduate Program in Psychology, University of Toronto, Toronto, ON, Canada; Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
20
Melnik N, Coates DR, Sayim B. Geometrically restricted image descriptors: A method to capture the appearance of shape. J Vis 2021; 21:14. [PMID: 33688921 PMCID: PMC7961119 DOI: 10.1167/jov.21.3.14]
Abstract
Shape perception varies depending on many factors. For example, presenting a stimulus in the periphery often yields a different appearance compared with its foveal presentation. However, how exactly shape appearance is altered under different conditions remains elusive. One reason for this is that studies typically measure identification performance, leaving details about target appearance unknown. The lack of appearance-based methods and general challenges to quantify appearance complicate the investigation of shape appearance. Here, we introduce Geometrically Restricted Image Descriptors (GRIDs), a method to investigate the appearance of shapes. Stimuli in the GRID paradigm are shapes consisting of distinct line elements placed on a grid by connecting grid nodes. Each line is treated as a discrete target. Observers are asked to capture target appearance by placing lines on a freely viewed response grid. We used GRIDs to investigate the appearance of letters and letter-like shapes. Targets were presented at 10° eccentricity in the right visual field. Gaze-contingent stimulus presentation was used to prevent eye movements to the target. The data were analyzed by quantifying the differences between targets and responses with regard to overall accuracy, element discriminability, and several distinct error types. Our results show how shape appearance can be captured by GRIDs, and how a fine-grained analysis of stimulus parts provides quantifications of appearance typically not available in standard measures of performance. We propose that GRIDs are an effective tool to investigate the appearance of shapes.
Affiliation(s)
- Natalia Melnik
- Institute of Psychology, University of Bern, Bern, Switzerland
- Daniel R Coates
- Institute of Psychology, University of Bern, Bern, Switzerland; College of Optometry, University of Houston, Houston, Texas, USA
- Bilge Sayim
- Institute of Psychology, University of Bern, Bern, Switzerland; Univ. Lille, CNRS, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Lille, France; http://www.appearancelab.org/
21
Farshchi M, Kiba A, Sawada T. Seeing our 3D world while only viewing contour-drawings. PLoS One 2021; 16:e0242581. [PMID: 33481778 PMCID: PMC7822326 DOI: 10.1371/journal.pone.0242581] [Received: 06/26/2020] [Accepted: 11/04/2020]
Abstract
Artists can represent a 3D object by using only contours in a 2D drawing. Prior studies have shown that people can use such drawings to perceive 3D shapes reliably, but it is not clear how useful this kind of contour information actually is in a real dynamical scene in which people interact with objects. To address this issue, we developed an Augmented Reality (AR) device that can show a participant a contour-drawing or a grayscale-image of a real dynamical scene in an immersive manner. We compared the performance of people in a variety of run-of-the-mill tasks with both contour-drawings and grayscale-images under natural viewing conditions in three behavioral experiments. The results of these experiments showed that people could perform almost equally well with both types of images. This contour information may be sufficient to provide the basis for our visual system to obtain much of the 3D information needed for successful visuomotor interactions in our everyday life.
Affiliation(s)
- Maddex Farshchi
- School of Psychology, National Research University Higher School of Economics, Moscow, Russia
- Alexandra Kiba
- School of Psychology, National Research University Higher School of Economics, Moscow, Russia
- Tadamasa Sawada
- School of Psychology, National Research University Higher School of Economics, Moscow, Russia
22
Vision at a glance: The role of attention in processing object-to-object categorical relations. Atten Percept Psychophys 2020; 82:671-688. [PMID: 31907840 DOI: 10.3758/s13414-019-01940-z]
Abstract
When viewing a scene at a glance, the visual and categorical relations between objects in the scene are extracted rapidly. In the present study, the involvement of spatial attention in the processing of such relations was investigated. Participants performed a category detection task (e.g., "Is there an animal?") on briefly flashed object pairs. In one condition, visual attention spanned both stimuli, and in another, attention was focused on a single object while its counterpart object served as a task-irrelevant distractor. The results showed that when participants attended to both objects, a categorical relation effect was obtained (Exp. 1). Namely, latencies were shorter for objects from the same category than for those from different superordinate categories (e.g., clothes, vehicles), even if categories were not prioritized by the task demands. Focusing attention on only one of two stimuli, however, largely eliminated this effect (Exp. 2). Some relational processing was seen when categories were narrowed to the basic level and were highly distinct from each other (Exp. 3), implying that categorical relational processing necessitates attention, unless the unattended input is highly predictable. Critically, when a prioritized (to-be-detected) object category, positioned in a distractor's location, differed from an attended object, a robust distraction effect was consistently observed, regardless of category homogeneity and/or of response conflict factors (Exp. 4). This finding suggests that object relations that involve stimuli that are highly relevant to the task settings may survive attentional deprivation at the distractor location. The involvement of spatial attention in object-to-object categorical processing is most critical in situations that include wide categories that are irrelevant to one's current goals.
23
Disentangling the Independent Contributions of Visual and Conceptual Features to the Spatiotemporal Dynamics of Scene Categorization. J Neurosci 2020; 40:5283-5299. [PMID: 32467356 DOI: 10.1523/jneurosci.2088-19.2020] [Received: 08/28/2019] [Revised: 04/18/2020] [Accepted: 04/23/2020]
Abstract
Human scene categorization is characterized by its remarkable speed. While many visual and conceptual features have been linked to this ability, significant correlations exist between feature spaces, impeding our ability to determine their relative contributions to scene categorization. Here, we used a whitening transformation to decorrelate a variety of visual and conceptual features and assess the time course of their unique contributions to scene categorization. Participants (both sexes) viewed 2250 full-color scene images drawn from 30 different scene categories while having their brain activity measured through 256-channel EEG. We examined the variance explained at each electrode and time point of visual event-related potential (vERP) data from nine different whitened encoding models. These ranged from low-level features obtained from filter outputs to high-level conceptual features requiring human annotation. The amount of category information in the vERPs was assessed through multivariate decoding methods. Behavioral similarity measures were obtained in separate crowdsourced experiments. We found that all nine models together contributed 78% of the variance of human scene similarity assessments and were within the noise ceiling of the vERP data. Low-level models explained earlier vERP variability (88 ms after image onset), whereas high-level models explained later variance (169 ms). Critically, only high-level models shared vERP variability with behavior. Together, these results suggest that scene categorization is primarily a high-level process, but reliant on previously extracted low-level features.
SIGNIFICANCE STATEMENT: In a single fixation, we glean enough information to describe a general scene category. Many types of features are associated with scene categories, ranging from low-level properties, such as colors and contours, to high-level properties, such as objects and attributes. Because these properties are correlated, it is difficult to understand each property's unique contributions to scene categorization. This work uses a whitening transformation to remove the correlations between features and examines the extent to which each feature contributes to visual event-related potentials over time. We found that low-level visual features contributed first but were not correlated with categorization behavior. High-level features followed 80 ms later, providing key insights into how the brain makes sense of a complex visual world.
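The decorrelation step can be sketched with a standard ZCA whitening transform (NumPy). This is a generic illustration, not the authors' pipeline; the feature matrix here is random data with an artificially induced correlation.

```python
import numpy as np

def zca_whiten(X, eps=1e-8):
    """ZCA-whiten a feature matrix X (n_samples x n_features).
    After whitening, the empirical covariance is ~identity, so the
    feature dimensions are decorrelated with unit variance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # symmetric whitening matrix
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))      # 500 "images", 4 "feature spaces"
X[:, 1] += 0.9 * X[:, 0]           # induce correlation between two features
Xw = zca_whiten(X)
print(np.allclose(np.cov(Xw.T), np.eye(4), atol=1e-2))
```

ZCA is one of several valid whitening choices; any matrix W with W^T W equal to the inverse covariance decorrelates the features, and the choice affects only how interpretable each whitened dimension remains.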
24
Effects of Spatial Frequency Filtering Choices on the Perception of Filtered Images. Vision (Basel) 2020; 4:vision4020029. [PMID: 32466442 PMCID: PMC7355859 DOI: 10.3390/vision4020029] [Received: 02/29/2020] [Revised: 05/13/2020] [Accepted: 05/22/2020]
Abstract
The early visual system is composed of spatial frequency-tuned channels that break an image into its individual frequency components. Therefore, researchers commonly filter images for spatial frequencies to arrive at conclusions about the differential importance of high versus low spatial frequency image content. Here, we show how simple decisions about the filtering of the images, and how they are displayed on the screen, can result in drastically different behavioral outcomes. We show that jointly normalizing the contrast of the stimuli is critical in order to draw accurate conclusions about the influence of the different spatial frequencies, as images of the real world naturally have higher contrast energy at low than at high spatial frequencies. Furthermore, the specific choice of filter shape can result in contradictory results about whether high or low spatial frequencies are more useful for understanding image content. Finally, we show that the manner in which the high spatial frequency content is displayed on the screen influences how recognizable an image is. Previous findings that make claims about the visual system's use of certain spatial frequency bands should be revisited, especially if their methods sections do not make clear what filtering choices were made.
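The two methodological points above, filter shape and joint contrast normalization, can be sketched as follows (NumPy only). The Gaussian filter, the cutoff value, and the RMS-matching rule are illustrative choices, not the paper's exact procedure, and the input is random noise standing in for a photograph.

```python
import numpy as np

def filter_sf(img, cutoff, mode="low"):
    """Isotropic Gaussian spatial-frequency filter applied in the Fourier
    domain. `cutoff` is the radial frequency (cycles/image) at which the
    Gaussian falls to ~61% of its peak."""
    h, w = img.shape
    fy = np.fft.fftfreq(h) * h                   # cycles per image, vertical
    fx = np.fft.fftfreq(w) * w                   # cycles per image, horizontal
    r = np.sqrt(fx[None, :] ** 2 + fy[:, None] ** 2)
    lp = np.exp(-(r ** 2) / (2 * cutoff ** 2))
    mask = lp if mode == "low" else 1.0 - lp     # complementary high-pass
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))

def match_rms_contrast(img, target_rms):
    """Rescale an image so its RMS contrast (std) equals target_rms."""
    z = img - img.mean()
    return z * (target_rms / (z.std() + 1e-12)) + img.mean()

rng = np.random.default_rng(1)
img = rng.random((128, 128))
low = filter_sf(img, cutoff=8, mode="low")
high = filter_sf(img, cutoff=8, mode="high")
# Joint normalization: give both bands the same RMS contrast so that low SF
# content does not dominate simply by carrying more contrast energy.
rms = min(low.std(), high.std())
low_n, high_n = match_rms_contrast(low, rms), match_rms_contrast(high, rms)
print(float(low_n.std()), float(high_n.std()))
```

Swapping the smooth Gaussian for a sharp ideal (brick-wall) mask is exactly the kind of filter-shape decision the paper shows can flip behavioral conclusions.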
25
Smith ME, Loschky LC. The influence of sequential predictions on scene-gist recognition. J Vis 2020; 19:14. [PMID: 31622473 DOI: 10.1167/19.12.14]
Abstract
Past research suggests that recognizing scene gist, a viewer's holistic semantic representation of a scene acquired within a single eye fixation, involves purely feed-forward mechanisms. We investigated whether expectations can influence scene categorization. To do this, we embedded target scenes in more ecologically valid, first-person-viewpoint image sequences, along spatiotemporally connected routes (e.g., an office to a parking lot). We manipulated the sequences' spatiotemporal coherence by presenting them either coherently or in random order. Participants identified the category of one target scene in a 10-scene-image rapid serial visual presentation. Categorization accuracy was greater for targets in coherent sequences. Accuracy was also greater for targets with more visually similar primes. In Experiment 2, we investigated whether targets in coherent sequences were more predictable and whether predictable images were identified more accurately in Experiment 1 after accounting for the effect of prime-to-target visual similarity. To do this, we removed targets and had participants predict the category of the missing scene. Images were more accurately predicted in coherent sequences, and both image predictability and prime-to-target visual similarity independently contributed to performance in Experiment 1. To test whether prediction-based facilitation effects were solely due to response bias, participants performed a two-alternative forced-choice task in which they indicated whether the target was an intact or a phase-randomized scene. Critically, predictability of the target category was irrelevant to this task. Nevertheless, results showed that sensitivity, but not response bias, was greater for targets in coherent sequences. Predictions made prior to viewing a scene facilitate scene-gist recognition.
Affiliation(s)
- Maverick E Smith
- Department of Psychological Sciences, Kansas State University, Manhattan, KS, USA
- Lester C Loschky
- Department of Psychological Sciences, Kansas State University, Manhattan, KS, USA
26
Harel A, Mzozoyana MW, Al Zoubi H, Nador JD, Noesen BT, Lowe MX, Cant JS. Artificially-generated scenes demonstrate the importance of global scene properties for scene perception. Neuropsychologia 2020; 141:107434. [PMID: 32179102 DOI: 10.1016/j.neuropsychologia.2020.107434] [Received: 06/10/2019] [Revised: 03/04/2020] [Accepted: 03/09/2020]
Abstract
Recent electrophysiological research highlights the significance of global scene properties (GSPs) for scene perception. However, since real-world scenes span a range of low-level stimulus properties and high-level contextual semantics, GSP effects may also reflect additional processing of such non-global factors. We examined this question by asking whether Event-Related Potentials (ERPs) to GSPs would still be observed when specific low- and high-level scene properties are absent from the scene. We presented participants with computer-based artificially-manipulated scenes varying in two GSPs (spatial expanse and naturalness) which minimized other sources of scene information (color and semantic object detail). We found that the peak amplitude of the P2 component was sensitive to the spatial expanse and naturalness of the artificially-generated scenes: P2 amplitude was higher for closed than for open scenes, and for manmade than for natural scenes. A control experiment showed that the effect of Naturalness on the P2 is not driven by local texture information, while earlier effects of naturalness, expressed as a modulation of the P1 and N1 amplitudes, are sensitive to texture information. Our results demonstrate that GSPs are processed robustly around 220 ms and that the P2 can be used as an index of global scene perception.
Affiliation(s)
- Assaf Harel
- Department of Psychology, Wright State University, Dayton, OH, USA
- Mavuso W Mzozoyana
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Hamada Al Zoubi
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Jeffrey D Nador
- Department of Psychology, Wright State University, Dayton, OH, USA
- Birken T Noesen
- Department of Psychology, Wright State University, Dayton, OH, USA
- Matthew X Lowe
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jonathan S Cant
- Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
27
Dillon MR, Persichetti AS, Spelke ES, Dilks DD. Places in the Brain: Bridging Layout and Object Geometry in Scene-Selective Cortex. Cereb Cortex 2019. [PMID: 28633321 DOI: 10.1093/cercor/bhx139]
Abstract
Diverse animal species primarily rely on sense (left-right) and egocentric distance (proximal-distal) when navigating the environment. Recent neuroimaging studies with human adults show that this information is represented in 2 scene-selective cortical regions, the occipital place area (OPA) and the retrosplenial complex (RSC), but not in a third scene-selective region, the parahippocampal place area (PPA). What geometric properties, then, does the PPA represent, and what is its role in scene processing? Here we hypothesize that the PPA represents relative length and angle, the geometric properties classically associated with object recognition, but only in the context of large extended surfaces that compose the layout of a scene. Using functional magnetic resonance imaging adaptation, we found that the PPA is indeed sensitive to relative length and angle changes in pictures of scenes, but not in pictures of objects that reliably elicited responses to the same geometric changes in object-selective cortical regions. Moreover, we found that the OPA is also sensitive to such changes, while the RSC is tolerant of such changes. Thus, the geometric information typically associated with object recognition is also used during some aspects of scene processing. These findings provide evidence that scene-selective cortex differentially represents the geometric properties guiding navigation versus scene categorization.
Affiliation(s)
- Moira R Dillon
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Daniel D Dilks
- Department of Psychology, Emory University, Atlanta, GA, USA
28
Shafer-Skelton A, Brady TF. Scene layout priming relies primarily on low-level features rather than scene layout. J Vis 2019; 19:14. [PMID: 30677124 DOI: 10.1167/19.1.14]
Abstract
The ability to perceive and remember the spatial layout of a scene is critical to understanding the visual world, both for navigation and for other complex tasks that depend upon the structure of the current environment. However, surprisingly little work has investigated how and when scene layout information is maintained in memory. One prominent line of work investigating this issue is a scene-priming paradigm (e.g., Sanocki & Epstein, 1997), in which different types of previews are presented to participants shortly before they judge which of two regions of a scene is closer in depth to the viewer. Experiments using this paradigm have been widely cited as evidence that scene layout information is stored across brief delays and have been used to investigate the structure of the representations underlying memory for scene layout. In the present experiments, we better characterize these scene-priming effects. We find that a large amount of visual detail rather than the presence of depth information is necessary for the priming effect; that participants show a preview benefit for a judgment completely unrelated to the scene itself; and that preview benefits are susceptible to masking and quickly decay. Together, these results suggest that "scene priming" effects do not isolate scene layout information in memory, and that they may arise from low-level visual information held in sensory memory. This broadens the range of interpretations of scene priming effects and suggests that other paradigms may need to be developed to selectively investigate how we represent scene layout information in memory.
Affiliation(s)
- Timothy F Brady
- Department of Psychology, University of California, San Diego, CA, USA
29
Wallis TS, Funke CM, Ecker AS, Gatys LA, Wichmann FA, Bethge M. Image content is more important than Bouma's Law for scene metamers. eLife 2019; 8:42512. [PMID: 31038458 PMCID: PMC6491040 DOI: 10.7554/elife.42512] [Received: 10/03/2018] [Accepted: 03/09/2019]
Abstract
We subjectively perceive our visual field with high fidelity, yet peripheral distortions can go unnoticed and peripheral objects can be difficult to identify (crowding). Prior work showed that humans could not discriminate images synthesised to match the responses of a mid-level ventral visual stream model when information was averaged in receptive fields with a scaling of about half their retinal eccentricity. This result implicated ventral visual area V2, approximated ‘Bouma’s Law’ of crowding, and has subsequently been interpreted as a link between crowding zones, receptive field scaling, and our perceptual experience. However, this experiment never assessed natural images. We find that humans can easily discriminate real and model-generated images at V2 scaling, requiring scales at least as small as V1 receptive fields to generate metamers. We speculate that explaining why scenes look as they do may require incorporating segmentation and global organisational constraints in addition to local pooling.

As you read this digest, your eyes move to follow the lines of text. But now try to hold your eyes in one position, while reading the text on either side and below: it soon becomes clear that peripheral vision is not as good as we tend to assume. It is not possible to read text far away from the center of your line of vision, but you can see ‘something’ out of the corner of your eye. You can see that there is text there, even if you cannot read it, and you can see where your screen or page ends. So how does the brain generate peripheral vision, and why does it differ from what you see when you look straight ahead? One idea is that the visual system averages information over areas of the peripheral visual field. This gives rise to texture-like patterns, as opposed to images made up of fine details. Imagine looking at an expanse of foliage, gravel or fur, for example. Your eyes cannot make out the individual leaves, pebbles or hairs.
Instead, you perceive an overall pattern in the form of a texture. Our peripheral vision may also consist of such textures, created when the brain averages information over areas of space. Wallis, Funke et al. have now tested this idea using an existing computer model that averages visual input in this way. By giving the model a series of photographs to process, Wallis, Funke et al. obtained images that should in theory simulate peripheral vision. If the model mimics the mechanisms that generate peripheral vision, then healthy volunteers should be unable to distinguish the processed images from the original photographs. But in fact, the participants could easily discriminate the two sets of images. This suggests that the visual system does not solely use textures to represent information in the peripheral visual field. Wallis, Funke et al. propose that other factors, such as how the visual system separates and groups objects, may instead determine what we see in our peripheral vision. This knowledge could ultimately benefit patients with eye diseases such as macular degeneration, a condition that causes loss of vision in the center of the visual field and forces patients to rely on their peripheral vision.
Affiliation(s)
- Thomas S. A. Wallis
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany
- Christina M Funke
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany
- Alexander S Ecker
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States; Institute for Theoretical Physics, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Leon A Gatys
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Felix A Wichmann
- Neural Information Processing Group, Faculty of Science, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Matthias Bethge
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States; Institute for Theoretical Physics, Eberhard Karls Universität Tübingen, Tübingen, Germany; Max Planck Institute for Biological Cybernetics, Tübingen, Germany

30
Abstract
Our research has previously shown that scene categories can be predicted from observers' eye movements when they view photographs of real-world scenes. The time course of category predictions reveals the differential influences of bottom-up and top-down information. Here we used these known differences to determine to what extent image features at different representational levels contribute toward guiding gaze in a category-specific manner. Participants viewed grayscale photographs and line drawings of real-world scenes while their gaze was tracked. Scene categories could be predicted from fixation density at all times over a 2-s time course in both photographs and line drawings. We replicated the shape of the prediction curve found previously, with an initial steep decrease in prediction accuracy from 300 to 500 ms, representing the contribution of bottom-up information, followed by a steady increase, representing top-down knowledge of category-specific information. We then computed the low-level features (luminance contrasts and orientation statistics), mid-level features (local symmetry and contour junctions), and Deep Gaze II output from the images, and used that information as a reference in our category predictions in order to assess their respective contributions to category-specific guidance of gaze. We observed that, as expected, low-level salience contributes mostly to the initial bottom-up peak of gaze guidance. Conversely, the mid-level features that describe scene structure (i.e., local symmetry and junctions) split their contributions between bottom-up and top-down attentional guidance, with symmetry contributing to both bottom-up and top-down guidance, while junctions play a more prominent role in the top-down guidance of gaze.
31
Zhu Z, Rao C, Bai S, Latecki LJ. Training convolutional neural network from multi-domain contour images for 3D shape retrieval. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2017.08.028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
32
Greene MR. The information content of scene categories. PSYCHOLOGY OF LEARNING AND MOTIVATION 2019. [DOI: 10.1016/bs.plm.2019.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
33
Robin J, Rai Y, Valli M, Olsen RK. Category specificity in the medial temporal lobe: A systematic review. Hippocampus 2018; 29:313-339. [PMID: 30155943 DOI: 10.1002/hipo.23024] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2018] [Revised: 08/03/2018] [Accepted: 08/07/2018] [Indexed: 01/30/2023]
Abstract
Theoretical accounts of medial temporal lobe (MTL) function ascribe different functions to subregions of the MTL including perirhinal, entorhinal, parahippocampal cortices, and the hippocampus. Some have suggested that the functional roles of these subregions vary in terms of their category specificity, showing preferential coding for certain stimulus types, but the evidence for this functional organization is mixed. In this systematic review, we evaluate existing evidence for regional specialization in the MTL for three categories of visual stimuli: faces, objects, and scenes. We review and synthesize across univariate and multivariate neuroimaging studies, as well as neuropsychological studies of cases with lesions to the MTL. Neuroimaging evidence suggests that faces activate the perirhinal cortex, entorhinal cortex, and the anterior hippocampus, while scenes engage the parahippocampal cortex and both the anterior and posterior hippocampus, depending on the contrast condition. There is some evidence for object-related activity in anterior MTL regions when compared to scenes, and in posterior MTL regions when compared to faces, suggesting that aspects of object representations may share similarities with face and scene representations. While neuroimaging evidence suggests some hippocampal specialization for faces and scenes, neuropsychological evidence shows that hippocampal damage leads to impairments in scene memory and perception, but does not entail equivalent impairments for faces in cases where the perirhinal cortex remains intact. Regional specialization based on stimulus categories has implications for understanding the mechanisms of MTL subregions, and highlights the need for the development of theoretical models of MTL function that can accommodate the differential patterns of specificity observed in the MTL.
Affiliation(s)
- Jessica Robin
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario, Canada
- Yeshith Rai
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario, Canada
- Mikaeel Valli
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario, Canada; Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
- Rosanna K Olsen
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario, Canada; Department of Psychology, University of Toronto, Toronto, Ontario, Canada

34
Wilder J, Rezanejad M, Dickinson S, Siddiqi K, Jepson A, Walther DB. Local contour symmetry facilitates scene categorization. Cognition 2018; 182:307-317. [PMID: 30415132 DOI: 10.1016/j.cognition.2018.09.014] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2018] [Revised: 09/20/2018] [Accepted: 09/22/2018] [Indexed: 10/27/2022]
Abstract
People are able to rapidly categorize briefly flashed images of real-world environments, even when they are reduced to line drawings. This setting allows for the study of time-limited perceptual grouping processes in the human visual system that are applicable to line drawings. Previous work (Wilder, Dickinson, Jepson, & Walther, 2018) showed that standard local features of individual contours, or junctions between contours, do not account for this rapid classification ability but, rather, the relative placement of these contours appeared to be important. Here we provide strong support for this observation by demonstrating that local ribbon symmetry between neighboring pairs of contours facilitates the categorization of complex real-world environments. To this end, we introduce a novel computational approach, based on the medial axis transform, for measuring the degree of local ribbon symmetry in a line drawing. We use this measure to separate the contour pixels for a given scene into the most ribbon symmetric half and the least ribbon symmetric half. We then show human observers the resulting half-images in a rapid-categorization experiment. Our results demonstrate that local ribbon symmetry facilitates the categorization of complex real-world environments. This is the first study of the role of local symmetry in inter-contour grouping for human scene classification. We conclude that local ribbon symmetry appears to play an important role in jump-starting the grouping of image content into meaningful units, even in flashed presentations.
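The abstract above measures ribbon symmetry via the medial axis transform. As a rough illustration only, and not the authors' algorithm, a toy score in the same spirit can quantify how smoothly the half-width of the "ribbon" between two contours varies along their length; all function and variable names below are hypothetical:

```python
import numpy as np

def ribbon_symmetry_score(contour_a, contour_b):
    """Toy ribbon-symmetry score for two polylines sampled with the same
    number of points (N x 2 arrays). Pairs points by index, takes the
    half-width of the ribbon at each pair, and returns the mean absolute
    change in half-width along the axis. Lower = more ribbon-symmetric
    (the width varies smoothly)."""
    a = np.asarray(contour_a, dtype=float)
    b = np.asarray(contour_b, dtype=float)
    radii = np.linalg.norm(a - b, axis=1) / 2.0  # half-width at each pair
    return float(np.mean(np.abs(np.diff(radii))))

# Two parallel contours: constant width, maximally ribbon-symmetric.
t = np.linspace(0.0, 1.0, 50)
parallel_top = np.stack([t, np.ones_like(t)], axis=1)
parallel_bot = np.stack([t, -np.ones_like(t)], axis=1)

# A wobbly counterpart: the ribbon width changes erratically along the axis.
rng = np.random.default_rng(0)
wobbly_top = parallel_top + np.stack(
    [np.zeros_like(t), rng.normal(0, 0.3, t.size)], axis=1)

assert ribbon_symmetry_score(parallel_top, parallel_bot) < \
       ribbon_symmetry_score(wobbly_top, parallel_bot)
```

A full implementation would compute the medial axis of arbitrary contour pairs rather than relying on index-matched polylines, but the intuition is the same: ribbon symmetry is high when the inter-contour width is locally stable.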
35
Dissociable Neural Systems for Recognizing Places and Navigating through Them. J Neurosci 2018; 38:10295-10304. [PMID: 30348675 DOI: 10.1523/jneurosci.1200-18.2018] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2018] [Revised: 09/19/2018] [Accepted: 09/24/2018] [Indexed: 02/05/2023] Open
Abstract
When entering an environment, we can use the present visual information from the scene to either recognize the kind of place it is (e.g., a kitchen or a bedroom) or navigate through it. Here we directly test the hypothesis that these two processes, what we call "scene categorization" and "visually-guided navigation", are supported by dissociable neural systems. Specifically, we manipulated task demands by asking human participants (male and female) to perform a scene categorization, visually-guided navigation, and baseline task on images of scenes, and measured both the average univariate responses and multivariate spatial pattern of responses within two scene-selective cortical regions, the parahippocampal place area (PPA) and occipital place area (OPA), hypothesized to be separably involved in scene categorization and visually-guided navigation, respectively. As predicted, in the univariate analysis, PPA responded significantly more during the categorization task than during both the navigation and baseline tasks, whereas OPA showed the complete opposite pattern. Similarly, in the multivariate analysis, a linear support vector machine achieved above-chance classification for the categorization task, but not the navigation task in PPA. By contrast, above-chance classification was achieved for both the navigation and categorization tasks in OPA. However, above-chance classification for both tasks was also found in early visual cortex and hence not specific to OPA, suggesting that the spatial patterns of responses in OPA are merely inherited from early vision, and thus may be epiphenomenal to behavior. Together, these results are evidence for dissociable neural systems involved in recognizing places and navigating through them.

SIGNIFICANCE STATEMENT: It has been nearly three decades since Goodale and Milner demonstrated that recognizing objects and manipulating them involve distinct neural processes.
Today we show the same is true of our interactions with our environment: recognizing places and navigating through them are neurally dissociable. More specifically, we found that a scene-selective region, the parahippocampal place area, is active when participants are asked to categorize a scene, but not when asked to imagine navigating through it, whereas another scene-selective region, the occipital place area, shows the exact opposite pattern. This double dissociation is evidence for dissociable neural systems within scene processing, similar to the bifurcation of object processing described by Goodale and Milner (1992).
36
Wilder J, Dickinson S, Jepson A, Walther DB. Spatial relationships between contours impact rapid scene classification. J Vis 2018; 18:1. [DOI: 10.1167/18.8.1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Affiliation(s)
- John Wilder
- University of Toronto, Toronto, Ontario, Canada
- http://www.cs.toronto.edu/~jdwilder/
- Sven Dickinson
- University of Toronto, Toronto, Ontario, Canada
- http://www.cs.toronto.edu/~sven/
- Allan Jepson
- University of Toronto, Toronto, Ontario, Canada
- http://www.cs.toronto.edu/~jepson/
- Dirk B. Walther
- University of Toronto, Toronto, Ontario, Canada
- http://bwlab.utoronto.ca/dirk-bernhardt-walther/

37
O'Connell TP, Sederberg PB, Walther DB. Representational differences between line drawings and photographs of natural scenes: A dissociation between multi-voxel pattern analysis and repetition suppression. Neuropsychologia 2018; 117:513-519. [PMID: 29936121 DOI: 10.1016/j.neuropsychologia.2018.06.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2018] [Revised: 06/14/2018] [Accepted: 06/17/2018] [Indexed: 11/18/2022]
Abstract
Distributed representations of scene categories are consistent between color photographs (CPs) and line drawings (LDs) in the parahippocampal place area (PPA) and the retrosplenial cortex (RSC), as shown using multi-voxel pattern analysis (MVPA). Here, we used repetition suppression (RS) to further investigate the degree of representational convergence between CPs and LDs of natural scenes. MVPA and RS can capture different aspects of visual representations, and RS may prove useful in elucidating important differences in the representations of CPs and LDs of natural scenes. We performed an event-related fMRI experiment, including image-repetitions either within-type (i.e., CP to CP or LD to LD) or between-types (CP to LD, LD to CP). We found significant RS for within-type repetitions in PPA, RSC and the occipital place area (OPA), but did not observe RS for between-types repetitions. By contrast, scene categories were decodable from activity patterns evoked by both CPs and LDs using SVM classification for both within-type decoding and between-types cross-decoding. We conclude that there are representational differences between CPs and LDs in scene-selective cortex despite a category-level correspondence.
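The within-type versus between-types decoding logic in this abstract can be sketched with synthetic data: train a linear SVM on simulated "photograph" voxel patterns and test it on simulated "line drawing" patterns that share only a category signal. This is a schematic of the general MVPA cross-decoding approach, not the study's pipeline; all data and names below are made up.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_voxels, n_per_class = 100, 40

# Each scene category has a voxel pattern shared across image types;
# each image type (CP vs. LD) adds its own pattern on top, plus noise.
category_means = rng.normal(0, 1, (2, n_voxels))
type_offsets = rng.normal(0, 1, (2, n_voxels))

def simulate(image_type):
    X, y = [], []
    for cat in (0, 1):
        trials = (category_means[cat] + type_offsets[image_type]
                  + rng.normal(0, 1.0, (n_per_class, n_voxels)))
        X.append(trials)
        y += [cat] * n_per_class
    return np.vstack(X), np.array(y)

X_cp, y_cp = simulate(0)  # simulated color-photograph patterns
X_ld, y_ld = simulate(1)  # simulated line-drawing patterns

# Train on one image type, test on the other ("between-types" decoding).
clf = LinearSVC(max_iter=10000).fit(X_cp, y_cp)
cross_acc = clf.score(X_ld, y_ld)
print(f"between-types decoding accuracy: {cross_acc:.2f}")
```

Because the category signal here is shared across image types, cross-decoding succeeds even though each type also carries its own idiosyncratic pattern; the study's point is that repetition suppression can fail to transfer even when such cross-decoding succeeds.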
Affiliation(s)
- Thomas P O'Connell
- Department of Psychology, Yale University, Box 208205, New Haven, CT 06520-8205, USA
- Per B Sederberg
- Department of Psychology, University of Virginia, Charlottesville, VA, USA
- Dirk B Walther
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada

38
Bonner MF, Epstein RA. Computational mechanisms underlying cortical responses to the affordance properties of visual scenes. PLoS Comput Biol 2018; 14:e1006111. [PMID: 29684011 PMCID: PMC5933806 DOI: 10.1371/journal.pcbi.1006111] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2017] [Revised: 05/03/2018] [Accepted: 03/31/2018] [Indexed: 11/24/2022] Open
Abstract
Biologically inspired deep convolutional neural networks (CNNs), trained for computer vision tasks, have been found to predict cortical responses with remarkable accuracy. However, the internal operations of these models remain poorly understood, and the factors that account for their success are unknown. Here we develop a set of techniques for using CNNs to gain insights into the computational mechanisms underlying cortical responses. We focused on responses in the occipital place area (OPA), a scene-selective region of dorsal occipitoparietal cortex. In a previous study, we showed that fMRI activation patterns in the OPA contain information about the navigational affordances of scenes; that is, information about where one can and cannot move within the immediate environment. We hypothesized that this affordance information could be extracted using a set of purely feedforward computations. To test this idea, we examined a deep CNN with a feedforward architecture that had been previously trained for scene classification. We found that responses in the CNN to scene images were highly predictive of fMRI responses in the OPA. Moreover the CNN accounted for the portion of OPA variance relating to the navigational affordances of scenes. The CNN could thus serve as an image-computable candidate model of affordance-related responses in the OPA. We then ran a series of in silico experiments on this model to gain insights into its internal operations. These analyses showed that the computation of affordance-related features relied heavily on visual information at high-spatial frequencies and cardinal orientations, both of which have previously been identified as low-level stimulus preferences of scene-selective visual cortex. These computations also exhibited a strong preference for information in the lower visual field, which is consistent with known retinotopic biases in the OPA. 
Visualizations of feature selectivity within the CNN suggested that affordance-based responses encoded features that define the layout of the spatial environment, such as boundary-defining junctions and large extended surfaces. Together, these results map the sensory functions of the OPA onto a fully quantitative model that provides insights into its visual computations. More broadly, they advance integrative techniques for understanding visual cortex across multiple levels of analysis: from the identification of cortical sensory functions to the modeling of their underlying algorithms.

How does visual cortex compute behaviorally relevant properties of the local environment from sensory inputs? For decades, computational models have been able to explain only the earliest stages of biological vision, but recent advances in deep neural networks have yielded a breakthrough in the modeling of high-level visual cortex. However, these models are not explicitly designed for testing neurobiological theories, and, like the brain itself, their internal operations remain poorly understood. We examined a deep neural network for insights into the cortical representation of navigational affordances in visual scenes. In doing so, we developed a set of high-throughput techniques and statistical tools that are broadly useful for relating the internal operations of neural networks with the information processes of the brain. Our findings demonstrate that a deep neural network with purely feedforward computations can account for the processing of navigational layout in high-level visual cortex. We next performed a series of experiments and visualization analyses on this neural network. These analyses characterized a set of stimulus input features that may be critical for computing navigationally related cortical representations, and they identified a set of high-level, complex scene features that may serve as a basis set for the cortical coding of navigational layout.
These findings suggest a computational mechanism through which high-level visual cortex might encode the spatial structure of the local navigational environment, and they demonstrate an experimental approach for leveraging the power of deep neural networks to understand the visual computations of the brain.
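The core encoding-model move described above, predicting voxel responses from CNN-layer features, reduces to a cross-validated regularized regression. The sketch below illustrates that logic on synthetic data; it is not the authors' pipeline, and every quantity in it is simulated.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_images, n_features = 200, 50

# Simulated CNN-layer activations for each scene image.
features = rng.normal(0, 1, (n_images, n_features))

# Simulated voxel response: a noisy linear readout of those features,
# standing in for affordance-related activity in a region like the OPA.
true_weights = rng.normal(0, 1, n_features)
voxel = features @ true_weights + rng.normal(0, 2.0, n_images)

# Cross-validated fit of the encoding model: how much voxel variance
# do the features predict on held-out images?
r2 = cross_val_score(Ridge(alpha=1.0), features, voxel,
                     cv=5, scoring="r2").mean()
print(f"cross-validated R^2: {r2:.2f}")
```

In the real setting, `features` would come from a trained network's layer activations and `voxel` from fMRI data, with the held-out R² quantifying how predictive the model layer is of the cortical response.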
Affiliation(s)
- Michael F. Bonner
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States of America
- Russell A. Epstein
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States of America

39
Berman D, Golomb JD, Walther DB. Scene content is predominantly conveyed by high spatial frequencies in scene-selective visual cortex. PLoS One 2017; 12:e0189828. [PMID: 29272283 PMCID: PMC5741213 DOI: 10.1371/journal.pone.0189828] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2017] [Accepted: 12/01/2017] [Indexed: 11/19/2022] Open
Abstract
In complex real-world scenes, image content is conveyed by a large collection of intertwined visual features. The visual system disentangles these features in order to extract information about image content. Here, we investigate the role of one integral component: the content of spatial frequencies in an image. Specifically, we measure the amount of image content carried by low versus high spatial frequencies for the representation of real-world scenes in scene-selective regions of human visual cortex. To this end, we attempted to decode scene categories from the brain activity patterns of participants viewing scene images that contained the full spatial frequency spectrum, only low spatial frequencies, or only high spatial frequencies, all carefully controlled for contrast and luminance. Contrary to the findings from numerous behavioral studies and computational models that have highlighted how low spatial frequencies preferentially encode image content, decoding of scene categories from the scene-selective brain regions, including the parahippocampal place area (PPA), was significantly more accurate for high than low spatial frequency images. In fact, decoding accuracy was just as high for high spatial frequency images as for images containing the full spatial frequency spectrum in scene-selective areas PPA, RSC, OPA and object selective area LOC. We also found an interesting dissociation between the posterior and anterior subdivisions of PPA: categories were decodable from both high and low spatial frequency scenes in posterior PPA but only from high spatial frequency scenes in anterior PPA; and spatial frequency was explicitly decodable from posterior but not anterior PPA. Our results are consistent with recent findings that line drawings, which consist almost entirely of high spatial frequencies, elicit a neural representation of scene categories that is equivalent to that of full-spectrum color photographs. 
Collectively, these findings demonstrate the importance of high spatial frequencies for conveying the content of complex real-world scenes.
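The low- versus high-spatial-frequency stimulus manipulation described above can be approximated with a Gaussian low-pass filter in the Fourier domain. The sketch below is an illustrative recipe, not the study's exact filtering procedure, and the cutoff value is arbitrary:

```python
import numpy as np

def split_spatial_frequencies(image, sigma=0.1):
    """Split a grayscale image (2D array) into low-SF and high-SF parts
    using a Gaussian low-pass filter in the Fourier domain. `sigma` is
    the filter width in cycles per pixel."""
    f = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx**2 + fy**2)          # spatial frequency of each bin
    lowpass = np.exp(-(radius**2) / (2 * sigma**2))
    low = np.real(np.fft.ifft2(f * lowpass))
    high = image - low                       # residual high-SF content
    return low, high

# The two bands sum back to the original image by construction.
img = np.random.default_rng(3).random((64, 64))
low, high = split_spatial_frequencies(img)
assert np.allclose(low + high, img)
```

Stimuli built this way would still need the luminance and contrast matching that the abstract describes before being compared in an experiment.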
Affiliation(s)
- Daniel Berman
- Department of Psychology, The Ohio State University, Columbus, Ohio, United States of America
- Julie D. Golomb
- Department of Psychology, The Ohio State University, Columbus, Ohio, United States of America
- Dirk B. Walther
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada

40
Robin J, Lowe MX, Pishdadian S, Rivest J, Cant JS, Moscovitch M. Selective scene perception deficits in a case of topographical disorientation. Cortex 2017; 92:70-80. [DOI: 10.1016/j.cortex.2017.03.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2016] [Revised: 11/22/2016] [Accepted: 03/20/2017] [Indexed: 11/28/2022]
41
Kubilius J, Sleurs C, Wagemans J. Sensitivity to Nonaccidental Configurations of Two-Line Stimuli. Iperception 2017; 8:2041669517699628. [PMID: 28491272 PMCID: PMC5405893 DOI: 10.1177/2041669517699628] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
According to Recognition-By-Components theory, object recognition relies on a specific subset of three-dimensional shapes called geons. In particular, these configurations constitute a powerful cue to three-dimensional object reconstruction because their two-dimensional projection remains viewpoint-invariant. While a large body of literature has demonstrated sensitivity to changes in these so-called nonaccidental configurations, it remains unclear what information is used in establishing such sensitivity. In this study, we explored the possibility that nonaccidental configurations can already be inferred from the basic constituents of objects, namely, their edges. We constructed a set of stimuli composed of two lines corresponding to various nonaccidental properties and configurations underlying the distinction between geons, including collinearity, alignment, curvature of contours, curvature of configuration axis, expansion, cotermination, and junction type. Using a simple visual search paradigm, we demonstrated that participants were faster at detecting targets that differed from distractors in a nonaccidental property than in a metric property. We also found that only some but not all of the observed sensitivity could have resulted from simple low-level properties of our stimuli. Given that such sensitivity emerged from a configuration of only two lines, our results support the view that nonaccidental configurations could be encoded throughout the visual processing hierarchy even in the absence of object context.
42
Making Sense of Real-World Scenes. Trends Cogn Sci 2016; 20:843-856. [PMID: 27769727 DOI: 10.1016/j.tics.2016.09.003] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2016] [Revised: 09/06/2016] [Accepted: 09/06/2016] [Indexed: 11/23/2022]
Abstract
To interact with the world, we have to make sense of the continuous sensory input conveying information about our environment. A recent surge of studies has investigated the processes enabling scene understanding, using increasingly complex stimuli and sophisticated analyses to highlight the visual features and brain regions involved. However, there are two major challenges to producing a comprehensive framework for scene understanding. First, scene perception is highly dynamic, subserving multiple behavioral goals. Second, a multitude of different visual properties co-occur across scenes and may be correlated or independent. We synthesize the recent literature and argue that for a complete view of scene understanding, it is necessary to account for both differing observer goals and the contribution of diverse scene properties.
43
Ferrara K, Park S. Neural representation of scene boundaries. Neuropsychologia 2016; 89:180-190. [PMID: 27181883 DOI: 10.1016/j.neuropsychologia.2016.05.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2016] [Revised: 05/02/2016] [Accepted: 05/11/2016] [Indexed: 10/21/2022]
Abstract
Three-dimensional environmental boundaries fundamentally define the limits of a given space. A body of research employing a variety of methods points to their importance as cues in navigation. However, little is known about the nature of the representation of scene boundaries by high-level scene cortices in the human brain (namely, the parahippocampal place area (PPA) and retrosplenial complex (RSC)). Here we use univariate and multivoxel pattern analysis to study classification performance for artificial scene images that vary in degree of vertical boundary structure (a flat 2D boundary, a very slight addition of 3D boundary, or full walls). Our findings present evidence that there are distinct neural components for representing two different aspects of boundaries: 1) acute sensitivity to the presence of grounded 3D vertical structure, represented by the PPA, and 2) whether a boundary introduces a significant impediment to the viewer's potential navigation within a space, represented by RSC.
Affiliation(s)
- Katrina Ferrara
- Department of Cognitive Science, Johns Hopkins University, United States
- Soojin Park
- Department of Cognitive Science, Johns Hopkins University, United States

44
Kubilius J, Bracci S, Op de Beeck HP. Deep Neural Networks as a Computational Model for Human Shape Sensitivity. PLoS Comput Biol 2016; 12:e1004896. [PMID: 27124699 PMCID: PMC4849740 DOI: 10.1371/journal.pcbi.1004896] [Citation(s) in RCA: 131] [Impact Index Per Article: 16.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2015] [Accepted: 03/30/2016] [Indexed: 11/19/2022] Open
Abstract
Theories of object recognition agree that shape is of primordial importance, but there is no consensus about how shape might be represented, and so far attempts to implement a model of shape perception that would work with realistic stimuli have largely failed. Recent studies suggest that state-of-the-art convolutional ‘deep’ neural networks (DNNs) capture important aspects of human object perception. We hypothesized that these successes might be partially related to a human-like representation of object shape. Here we demonstrate that sensitivity for shape features, characteristic to human and primate vision, emerges in DNNs when trained for generic object recognition from natural photographs. We show that these models explain human shape judgments for several benchmark behavioral and neural stimulus sets on which earlier models mostly failed. In particular, although never explicitly trained for such stimuli, DNNs develop acute sensitivity to minute variations in shape and to non-accidental properties that have long been implicated to form the basis for object recognition. Even more strikingly, when tested with a challenging stimulus set in which shape and category membership are dissociated, the most complex model architectures capture human shape sensitivity as well as some aspects of the category structure that emerges from human judgments. As a whole, these results indicate that convolutional neural networks not only learn physically correct representations of object categories but also develop perceptually accurate representational spaces of shapes. An even more complete model of human object representations might be in sight by training deep architectures for multiple tasks, which is so characteristic in human development.

Shape plays an important role in object recognition. Despite years of research, no models of vision could account for shape understanding as found in human vision of natural images. Given recent successes of deep neural networks (DNNs) in object recognition, we hypothesized that DNNs might in fact learn to capture perceptually salient shape dimensions. Using a variety of stimulus sets, we demonstrate here that the output layers of several DNNs develop representations that relate closely to human perceptual shape judgments. Surprisingly, such sensitivity to shape develops in these models even though they were never explicitly trained for shape processing. Moreover, we show that these models also represent categorical object similarity that follows human semantic judgments, albeit to a lesser extent. Taken together, our results bring forward the exciting idea that DNNs capture not only objective dimensions of stimuli, such as their category, but also their subjective, or perceptual, aspects, such as shape and semantic similarity as judged by humans.
Affiliation(s)
- Jonas Kubilius: Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Stefania Bracci: Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Hans P. Op de Beeck: Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium

45
Choo H, Walther DB. Contour junctions underlie neural representations of scene categories in high-level human visual cortex. Neuroimage 2016; 135:32-44. [PMID: 27118087 DOI: 10.1016/j.neuroimage.2016.04.021] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2015] [Revised: 03/16/2016] [Accepted: 04/08/2016] [Indexed: 10/21/2022] Open
Abstract
Humans efficiently grasp complex visual environments, making highly consistent judgments of entry-level category despite their high variability in visual appearance. How does the human brain arrive at the invariant neural representations underlying categorization of real-world environments? We here show that the neural representation of visual environments in scene-selective human visual cortex relies on statistics of contour junctions, which provide cues for the three-dimensional arrangement of surfaces in a scene. We manipulated line drawings of real-world environments such that statistics of contour orientations or junctions were disrupted. Manipulated and intact line drawings were presented to participants in an fMRI experiment. Scene categories were decoded from neural activity patterns in the parahippocampal place area (PPA), the occipital place area (OPA) and other visual brain regions. Disruption of junctions but not orientations led to a drastic decrease in decoding accuracy in the PPA and OPA, indicating the reliance of these areas on intact junction statistics. Accuracy of decoding from early visual cortex, on the other hand, was unaffected by either image manipulation. We further show that the correlation of error patterns between decoding from the scene-selective brain areas and behavioral experiments is contingent on intact contour junctions. Finally, a searchlight analysis exposes the reliance of visually active brain regions on different sets of contour properties. Statistics of contour length and curvature dominate neural representations of scene categories in early visual areas and contour junctions in high-level scene-selective brain regions.
Affiliation(s)
- Heeyoung Choo: Department of Psychology, University of Toronto, 100 St. George Street, Toronto, ON M5S 3G3, Canada
- Dirk B Walther: Department of Psychology, University of Toronto, 100 St. George Street, Toronto, ON M5S 3G3, Canada

46
Bryan PB, Julian JB, Epstein RA. Rectilinear Edge Selectivity Is Insufficient to Explain the Category Selectivity of the Parahippocampal Place Area. Front Hum Neurosci 2016; 10:137. [PMID: 27064591 PMCID: PMC4811863 DOI: 10.3389/fnhum.2016.00137] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2016] [Accepted: 03/15/2016] [Indexed: 11/30/2022] Open
Abstract
The parahippocampal place area (PPA) is one of several brain regions that respond more strongly to scenes than to non-scene items such as objects and faces. The mechanism underlying this scene-preferential response remains unclear. One possibility is that the PPA is tuned to low-level stimulus features that are found more often in scenes than in less-preferred stimuli. Supporting this view, Nasr et al. (2014) recently observed that some of the stimuli that are known to strongly activate the PPA contain a large number of rectilinear edges. They further demonstrated that PPA response is modulated by rectilinearity for a range of non-scene images. Motivated by these results, we tested whether rectilinearity suffices to explain PPA selectivity for scenes. In the first experiment, we replicated the previous finding of modulation by rectilinearity in the PPA for arrays of 2-d shapes. However, two further experiments failed to find a rectilinearity effect for faces or scenes: high-rectilinearity faces and scenes did not activate the PPA any more strongly than low-rectilinearity faces and scenes. Moreover, the categorical advantage for scenes vs. faces was maintained in the PPA and two other scene-selective regions—the retrosplenial complex (RSC) and occipital place area (OPA)—when rectilinearity was matched between stimulus sets. We conclude that selectivity for scenes in the PPA cannot be explained by a preference for low-level rectilinear edges.
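Nasr et al.'s rectilinearity measure was built from banks of angle-tuned filters applied to images; as a rough illustration of the underlying quantity, the sketch below assumes junction angles have already been extracted from a line drawing and simply scores the fraction that fall near 90 degrees. The function name, tolerance, and example angle lists are all hypothetical, not the published model.

```python
import numpy as np

def rectilinearity_index(junction_angles_deg, tol=10.0):
    """Fraction of contour-junction angles within `tol` degrees of 90.

    A simplified proxy for the kind of rectilinearity measure used by
    Nasr et al. (2014); their model operated on filter responses rather
    than on pre-extracted junction angles.
    """
    angles = np.asarray(junction_angles_deg, dtype=float)
    return float(np.mean(np.abs(angles - 90.0) <= tol))

# A carpentered, scene-like stimulus dominated by right-angle junctions...
urban = [90, 88, 91, 90, 45, 92, 89]
# ...versus a curved, organic stimulus with few right angles.
organic = [30, 150, 60, 120, 91, 20]

print(rectilinearity_index(urban))    # high
print(rectilinearity_index(organic))  # low
```

Stimulus sets matched on an index like this are what allow the study's key test: whether scene selectivity survives once rectilinearity is equated between scenes and faces.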
Affiliation(s)
- Peter B Bryan: Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Joshua B Julian: Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Russell A Epstein: Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA

47
Frane A. A Call for Considering Color Vision Deficiency When Creating Graphics for Psychology Reports. The Journal of General Psychology 2015; 142:194-211. [PMID: 26273941 DOI: 10.1080/00221309.2015.1063475] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
Although color vision deficiency (CVD) is fairly common, it is often not adequately considered when data is presented in color graphics. This study found that CVD tends to be mentioned neither in the author guidelines of psychology journals nor in the standard publication manuals of the field (e.g., the publication manuals of the American Psychological Association and the American Medical Association). To illustrate the relevance of this problem, a panel of scholars with CVD was used to evaluate the color figures in three respected psychological science journals. Results suggested that a substantial proportion of those figures were needlessly confusing for viewers with CVD and could have been easily improved through simple adjustments. Based on prior literature and on feedback from the panelists, recommendations are made for improving the accessibility of graphics in psychology reports.
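One simple screening check in the spirit of the recommendations above: collapse the red-green axis of a palette and see whether the colors remain distinguishable. The collapsing rule below is a crude illustrative heuristic, not a validated simulation of protanopia or deuteranopia, and the palettes are invented examples.

```python
import itertools

def redgreen_collapsed(rgb):
    """Crudely mimic loss of the red-green distinction by replacing the
    R and G components with their mean. Illustrative heuristic only; a
    real check should use a validated CVD simulation."""
    r, g, b = rgb
    m = (r + g) / 2.0
    return (m, m, b)

def min_pairwise_distance(palette):
    """Smallest Euclidean distance between any two palette colors."""
    return min(
        sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5
        for c1, c2 in itertools.combinations(palette, 2)
    )

# A palette that relies on a pure red-vs-green distinction...
risky = [(200, 40, 40), (40, 200, 40), (40, 40, 200)]
# ...and one that also varies in lightness and blue.
safer = [(230, 160, 0), (0, 90, 180), (80, 80, 80)]

for name, pal in [("risky", risky), ("safer", safer)]:
    collapsed = [redgreen_collapsed(c) for c in pal]
    print(name, round(min_pairwise_distance(collapsed), 1))
```

The "risky" palette collapses to two identical colors, which is exactly the kind of needless confusability the panelists in this study reported.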
48
Aminoff EM, Toneva M, Shrivastava A, Chen X, Misra I, Gupta A, Tarr MJ. Applying artificial vision models to human scene understanding. Front Comput Neurosci 2015; 9:8. [PMID: 25698964 PMCID: PMC4316773 DOI: 10.3389/fncom.2015.00008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2014] [Accepted: 01/15/2015] [Indexed: 12/01/2022] Open
Abstract
How do we understand the complex patterns of neural responses that underlie scene understanding? Studies of the network of brain regions held to be scene-selective—the parahippocampal/lingual region (PPA), the retrosplenial complex (RSC), and the occipital place area (TOS)—have typically focused on single visual dimensions (e.g., size), rather than the high-dimensional feature space in which scenes are likely to be neurally represented. Here we leverage well-specified artificial vision systems to explicate a more complex understanding of how scenes are encoded in this functional network. We correlated similarity matrices within three different scene-spaces arising from: (1) BOLD activity in scene-selective brain regions; (2) behaviorally measured judgments of visually-perceived scene similarity; and (3) several different computer vision models. These correlations revealed: (1) models that relied on mid- and high-level scene attributes showed the highest correlations with the patterns of neural activity within the scene-selective network; (2) NEIL and SUN—the models that best accounted for the patterns obtained from PPA and TOS—were different from the GIST model that best accounted for the pattern obtained from RSC; (3) the best-performing models outperformed behaviorally measured judgments of scene similarity in accounting for neural data. One computer vision method—NEIL (“Never-Ending-Image-Learner”), which incorporates visual features learned as statistical regularities across web-scale numbers of scenes—showed significant correlations with neural activity in all three scene-selective regions and was one of the two models best able to account for variance in the PPA and TOS. We suggest that these results are a promising first step in explicating more fine-grained models of neural scene understanding, including developing a clearer picture of the division of labor among the components of the functional scene-selective brain network.
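The similarity-matrix correlation approach described here can be sketched compactly. The feature matrices below are simulated stand-ins for the neural, behavioral, and computer-vision scene spaces (none of the study's actual data), and rank (Spearman) correlation of the matrices' upper triangles is one common way to compare such spaces.

```python
import numpy as np

rng = np.random.default_rng(1)

def similarity_matrix(features):
    """Pairwise correlation similarity between scene feature vectors
    (rows are scenes, columns are feature dimensions)."""
    return np.corrcoef(features)

def upper_triangle(m):
    """Unique off-diagonal entries of a symmetric similarity matrix."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]

def spearman(x, y):
    """Spearman correlation via Pearson correlation of ranks
    (no tie handling; fine for continuous simulated data)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical feature vectors for 12 scenes: a "model" space, a
# "neural" space that partly shares its structure, and an unrelated one.
n_scenes, n_dims = 12, 30
model_feats = rng.normal(0, 1, (n_scenes, n_dims))
neural_feats = model_feats + rng.normal(0, 0.5, (n_scenes, n_dims))
unrelated = rng.normal(0, 1, (n_scenes, n_dims))

sim_model = similarity_matrix(model_feats)
sim_neural = similarity_matrix(neural_feats)
sim_unrel = similarity_matrix(unrelated)

r_related = spearman(upper_triangle(sim_model), upper_triangle(sim_neural))
r_baseline = spearman(upper_triangle(sim_model), upper_triangle(sim_unrel))
print(f"model vs neural: {r_related:.2f}, model vs unrelated: {r_baseline:.2f}")
```

Comparing such correlations across candidate models is how the study ranks NEIL, SUN, and GIST against each scene-selective region.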
Affiliation(s)
- Elissa M Aminoff: Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA
- Mariya Toneva: Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA
- Xinlei Chen: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Ishan Misra: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Abhinav Gupta: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Michael J Tarr: Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA