1
Dvoeglazova M, Sawada T. A role of rectangularity in perceiving a 3D shape of an object. Vision Res 2024; 221:108433. PMID: 38772272; DOI: 10.1016/j.visres.2024.108433.
Abstract
Rectangularity and perpendicularity of contours are important properties of 3D shape for the visual system, which can use them as a priori constraints for perceiving shape veridically. The present article provides a comprehensive review of prior studies of the perception of rectangularity and perpendicularity and discusses their effects on 3D shape perception from both theoretical and empirical approaches. It has been shown that the visual system is biased to perceive a rectangular 3D shape from a 2D image. We thought that this bias might be attributable to the likelihood of a rectangular interpretation, but this hypothesis is not supported by the results of our psychophysical experiment. Note that the perception of a rectangular shape cannot be explained solely on the basis of geometry: a rectangular shape is perceived even from an image that is inconsistent with a rectangular interpretation. To address this issue, we developed a computational model that can recover a rectangular shape from an image of a parallelepiped. The model allows the recovered shape to be slightly inconsistent with the image so that the recovered shape satisfies the a priori constraints of maximum compactness and minimal surface area. This model captures some of the phenomena associated with the perception of rectangular shape that were reported in prior studies. This finding suggests that rectangularity contributes to shape perception in combination with some additional constraints.
Affiliation(s)
- Tadamasa Sawada
- School of Psychology, HSE University, Moscow, Russia; Akian College of Science and Engineering, American University of Armenia, Yerevan, Armenia; Department of Psychology, Russian-Armenian (Slavonic) University, Yerevan, Armenia; European University of Armenia, Yerevan, Armenia
2
Leemans M, Damiano C, Wagemans J. Finding the meaning in meaning maps: Quantifying the roles of semantic and non-semantic scene information in guiding visual attention. Cognition 2024; 247:105788. PMID: 38579638; DOI: 10.1016/j.cognition.2024.105788.
Abstract
In real-world vision, people prioritise the most informative scene regions via eye-movements. According to the cognitive guidance theory of visual attention, viewers allocate visual attention to those parts of the scene that are expected to be the most informative. The expected information of a scene region is coded in the semantic distribution of that scene. Meaning maps have been proposed to capture the spatial distribution of local scene semantics in order to test cognitive guidance theories of attention. Notwithstanding the success of meaning maps, the reason for their success has been contested. This has led to at least two possible explanations for the success of meaning maps in predicting visual attention. On the one hand, meaning maps might measure scene semantics. On the other hand, meaning maps might measure scene features, overlapping with, but distinct from, scene semantics. This study aims to disentangle these two sources of information by considering both conceptual information and non-semantic scene entropy simultaneously. We found that both semantic and non-semantic information is captured by meaning maps, but scene entropy accounted for more unique variance in the success of meaning maps than conceptual information. Additionally, some explained variance was unaccounted for by either source of information. Thus, although meaning maps may index some aspect of semantic information, their success seems to be better explained by non-semantic information. We conclude that meaning maps may not yet be a good tool to test cognitive guidance theories of attention in general, since they capture non-semantic aspects of local semantic density and only a small portion of conceptual information. Rather, we suggest that researchers should better define the exact aspect of cognitive guidance theories they wish to test and then use the tool that best captures that desired semantic information. 
As it stands, the semantic information contained in meaning maps seems too ambiguous to draw strong conclusions about how and when semantic information guides visual attention.
Affiliation(s)
- Maarten Leemans
- Laboratory of Experimental Psychology, Department of Brain and Cognition, University of Leuven (KU Leuven), Belgium.
- Claudia Damiano
- Laboratory of Experimental Psychology, Department of Brain and Cognition, University of Leuven (KU Leuven), Belgium
- Johan Wagemans
- Laboratory of Experimental Psychology, Department of Brain and Cognition, University of Leuven (KU Leuven), Belgium
3
Watier N. Measures of angularity in digital images. Behav Res Methods 2024. PMID: 38689153; DOI: 10.3758/s13428-024-02412-5.
Abstract
In light of the growing interest in studying the affective and aesthetic attributes of curvature, the present paper describes four digital image processing techniques that can be used to objectively discriminate between angular and curvilinear stimuli. MATLAB scripts for each of the techniques accompany the paper. Three studies are then reported that evaluate the efficacy of five metrics, derived from the four techniques, at quantifying the degree of angularity depicted in an image. Images of simple polygons (Study 1), artistic drawings of everyday objects (Study 2), and real-world objects, typefaces, and abstract patterns (Study 3) were analyzed. Logistic regression models were used to determine the relative importance of the metrics at distinguishing between angular and curvilinear items. With one exception, all of the metrics were capable of distinguishing between angular and curvilinear items at a level above chance, but some metrics were better at doing so than others, and their discriminative capacity was influenced by the characteristics of the image. The strengths and limitations of the metrics are discussed, as well as some practical recommendations.
Affiliation(s)
- Nicholas Watier
- Department of Psychology, Brandon University, 270 - 18th St, Brandon, MB, R7A 6A9, Canada.
4
Morgenstern Y, Storrs KR, Schmidt F, Hartmann F, Tiedemann H, Wagemans J, Fleming RW. High-level aftereffects reveal the role of statistical features in visual shape encoding. Curr Biol 2024; 34:1098-1106.e5. PMID: 38218184; PMCID: PMC10931819; DOI: 10.1016/j.cub.2023.12.039.
Abstract
Visual shape perception is central to many everyday tasks, from object recognition to grasping and handling tools.1,2,3,4,5,6,7,8,9,10 Yet how shape is encoded in the visual system remains poorly understood. Here, we probed shape representations using visual aftereffects, perceptual distortions that occur following extended exposure to a stimulus.11,12,13,14,15,16,17 Such effects are thought to be caused by adaptation in neural populations that encode both simple, low-level stimulus characteristics17,18,19,20 and more abstract, high-level object features.21,22,23 To tease these two contributions apart, we used machine-learning methods to synthesize novel shapes in a multidimensional shape space, derived from a large database of natural shapes.24 Stimuli were carefully selected such that low-level and high-level adaptation models made distinct predictions about the shapes that observers would perceive following adaptation. We found that adaptation along vector trajectories in the high-level shape space predicted shape aftereffects better than simple low-level processes. Our findings reveal the central role of high-level statistical features in the visual representation of shape. The findings also hint that human vision is attuned to the distribution of shapes experienced in the natural environment.
Affiliation(s)
- Yaniv Morgenstern
- Erasmus University Rotterdam, Department of Psychology, Burgemeester Oudlaan 50, 3062PA Rotterdam, the Netherlands; University of Leuven (KU Leuven), Brain and Cognition, Tiensestraat 102, 3000 Leuven, Belgium.
- Katherine R Storrs
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany; University of Auckland, School of Psychology, 23 Symonds Street, Auckland 1010, New Zealand
- Filipp Schmidt
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany; University of Marburg and Justus Liebig University Giessen, Center for Mind, Brain and Behavior (CMBB), Hans-Meerwein-Str. 6, 35032 Marburg, Germany
- Frieder Hartmann
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany
- Henning Tiedemann
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany
- Johan Wagemans
- University of Leuven (KU Leuven), Brain and Cognition, Tiensestraat 102, 3000 Leuven, Belgium
- Roland W Fleming
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany; University of Marburg and Justus Liebig University Giessen, Center for Mind, Brain and Behavior (CMBB), Hans-Meerwein-Str. 6, 35032 Marburg, Germany
5
Han S, Rezanejad M, Walther DB. Memorability of line drawings of scenes: the role of contour properties. Mem Cognit 2023. PMID: 37903987; DOI: 10.3758/s13421-023-01478-4.
Abstract
Why are some images more likely to be remembered than others? Previous work focused on the influence of global, low-level visual features as well as image content on memorability. To better understand the role of local, shape-based contours, we here investigate the memorability of photographs and line drawings of scenes. We find that the memorability of photographs and line drawings of the same scenes is correlated. We quantitatively measure the role of contour properties and their spatial relationships for scene memorability using a Random Forest analysis. To determine whether this relationship is merely correlational or if manipulating these contour properties causes images to be remembered better or worse, we split each line drawing into two half-images, one with high and the other with low predicted memorability according to the trained Random Forest model. In a new memorability experiment, we find that the half-images predicted to be more memorable were indeed remembered better, confirming a causal role of shape-based contour features, and, in particular, T junctions in scene memorability. We performed a categorization experiment on half-images to test for differential access to scene content. We found that half-images predicted to be more memorable were categorized more accurately. However, categorization accuracy for individual images was not correlated with their memorability. These results demonstrate that we can measure the contributions of individual contour properties to scene memorability and verify their causal involvement with targeted image manipulations, thereby bridging the gap between low-level features and scene semantics in our understanding of memorability.
Affiliation(s)
- Seohee Han
- Department of Psychology, University of Toronto, 100 St. George Street, Toronto, Canada.
- Morteza Rezanejad
- Department of Psychology, University of Toronto, 100 St. George Street, Toronto, Canada
- Dirk B Walther
- Department of Psychology, University of Toronto, 100 St. George Street, Toronto, Canada
6
Farzanfar D, Walther DB. Changing What You Like: Modifying Contour Properties Shifts Aesthetic Valuations of Scenes. Psychol Sci 2023; 34:1101-1120. PMID: 37669066; DOI: 10.1177/09567976231190546.
Abstract
To what extent do aesthetic experiences arise from the human ability to perceive and extract meaning from visual features? Ordinary scenes, such as a beach sunset, can elicit a sense of beauty in most observers. Although it appears that aesthetic responses can be shared among humans, little is known about the cognitive mechanisms that underlie this phenomenon. We developed a contour model of aesthetics that assigns values to visual properties in scenes, allowing us to predict aesthetic responses in adults from around the world. Through a series of experiments, we manipulate contours to increase or decrease aesthetic value while preserving scene semantic identity. Contour manipulations directly shift subjective aesthetic judgments. This provides the first experimental evidence for a causal relationship between contour properties and aesthetic valuation. Our findings support the notion that visual regularities underlie the human capacity to derive pleasure from visual information.
7
Kang J, Park S. Combined representation of visual features in the scene-selective cortex. bioRxiv [Preprint] 2023. PMID: 37546776; PMCID: PMC10402097; DOI: 10.1101/2023.07.24.550280.
Abstract
Visual features of separable dimensions like color and shape conjoin to represent an integrated entity. We investigated how visual features bind to form a complex visual scene. Specifically, we focused on features important for visually guided navigation: direction and distance. Previous work has shown that directions and distances of navigable paths are coded in the occipital place area (OPA). Using functional magnetic resonance imaging (fMRI), we tested how separate features are concurrently represented in the OPA. Participants saw eight different types of scenes, four of which had one path and the other four two paths. In single-path scenes, path direction was either to the left or to the right. In double-path scenes, both directions were present. Each path contained a glass wall located either near or far, changing the navigational distance. To test how the OPA represents paths in terms of direction and distance features, we took three approaches. First, the independent-features approach examined whether the OPA codes directions and distances independently in single-path scenes. Second, the integrated-features approach explored how directions and distances are integrated into path units, as compared to pooled features, using double-path scenes. Finally, the integrated-paths approach asked how separate paths are combined into a scene. Using multi-voxel pattern similarity analysis, we found that the OPA's representations of single-path scenes were similar to those of other single-path scenes with either the same direction or the same distance. Representations of double-path scenes were similar to the combination of their two constituent single paths, as combined units of direction and distance rather than a pooled representation of all features. These results show that the OPA combines the two features to form path units, which are then used to build multiple-path scenes.
Altogether, these results suggest that visually guided navigation may be supported by the OPA that automatically and efficiently combines multiple features relevant for navigation and represent a navigation file.
Affiliation(s)
- Jisu Kang
- Department of Psychology, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea
- Soojin Park
- Department of Psychology, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea
8
Henderson MM, Tarr MJ, Wehbe L. A Texture Statistics Encoding Model Reveals Hierarchical Feature Selectivity across Human Visual Cortex. J Neurosci 2023; 43:4144-4161. PMID: 37127366; PMCID: PMC10255092; DOI: 10.1523/jneurosci.1822-22.2023.
Abstract
Midlevel features, such as contour and texture, provide a computational link between low- and high-level visual representations. Although the nature of midlevel representations in the brain is not fully understood, past work has suggested a texture statistics model, called the P-S model (Portilla and Simoncelli, 2000), is a candidate for predicting neural responses in areas V1-V4 as well as human behavioral data. However, it is not currently known how well this model accounts for the responses of higher visual cortex to natural scene images. To examine this, we constructed single-voxel encoding models based on P-S statistics and fit the models to fMRI data from human subjects (both sexes) from the Natural Scenes Dataset (Allen et al., 2022). We demonstrate that the texture statistics encoding model can predict the held-out responses of individual voxels in early retinotopic areas and higher-level category-selective areas. The ability of the model to reliably predict signal in higher visual cortex suggests that the representation of texture statistics features is widespread throughout the brain. Furthermore, using variance partitioning analyses, we identify which features are most uniquely predictive of brain responses and show that the contributions of higher-order texture features increase from early areas to higher areas on the ventral and lateral surfaces. We also demonstrate that patterns of sensitivity to texture statistics can be used to recover broad organizational axes within visual cortex, including dimensions that capture semantic image content. These results provide a key step forward in characterizing how midlevel feature representations emerge hierarchically across the visual system.

SIGNIFICANCE STATEMENT: Intermediate visual features, like texture, play an important role in cortical computations and may contribute to tasks like object and scene recognition.
Here, we used a texture model proposed in past work to construct encoding models that predict the responses of neural populations in human visual cortex (measured with fMRI) to natural scene stimuli. We show that responses of neural populations at multiple levels of the visual system can be predicted by this model, and that the model is able to reveal an increase in the complexity of feature representations from early retinotopic cortex to higher areas of ventral and lateral visual cortex. These results support the idea that texture-like representations may play a broad underlying role in visual processing.
Affiliation(s)
- Margaret M Henderson
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Psychology
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Michael J Tarr
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Psychology
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Leila Wehbe
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Psychology
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
9
The Spatiotemporal Neural Dynamics of Object Recognition for Natural Images and Line Drawings. J Neurosci 2023; 43:484-500. PMID: 36535769; PMCID: PMC9864561; DOI: 10.1523/jneurosci.1546-22.2022.
Abstract
Drawings offer a simple and efficient way to communicate meaning. While line drawings capture only coarsely how objects look in reality, we still perceive them as resembling real-world objects. Previous work has shown that this perceived similarity is mirrored by shared neural representations for drawings and natural images, which suggests that similar mechanisms underlie the recognition of both. However, other work has proposed that representations of drawings and natural images become similar only after substantial processing has taken place, suggesting distinct mechanisms. To arbitrate between those alternatives, we measured brain responses resolved in space and time using fMRI and MEG, respectively, while human participants (female and male) viewed images of objects depicted as photographs, line drawings, or sketch-like drawings. Using multivariate decoding, we demonstrate that object category information emerged similarly fast and across overlapping regions in occipital, ventral-temporal, and posterior parietal cortex for all types of depiction, yet with smaller effects at higher levels of visual abstraction. In addition, cross-decoding between depiction types revealed strong generalization of object category information from early processing stages on. Finally, by combining fMRI and MEG data using representational similarity analysis, we found that visual information traversed similar processing stages for all types of depiction, yet with an overall stronger representation for photographs. Together, our results demonstrate broad commonalities in the neural dynamics of object recognition across types of depiction, thus providing clear evidence for shared neural mechanisms underlying recognition of natural object images and abstract drawings.

SIGNIFICANCE STATEMENT: When we see a line drawing, we effortlessly recognize it as an object in the world despite its simple and abstract style.
Here we asked to what extent this correspondence in perception is reflected in the brain. To answer this question, we measured how neural processing of objects depicted as photographs and line drawings with varying levels of detail (from natural images to abstract line drawings) evolves over space and time. We find broad commonalities in the spatiotemporal dynamics and the neural representations underlying the perception of photographs and even abstract drawings. These results indicate a shared basic mechanism supporting recognition of drawings and natural images.
10
Schlegelmilch K, Wertz AE. Visual segmentation of complex naturalistic structures in an infant eye-tracking search task. PLoS One 2022; 17:e0266158. PMID: 35363809; PMCID: PMC8975119; DOI: 10.1371/journal.pone.0266158.
Abstract
An infant’s everyday visual environment is composed of a complex array of entities, some of which are well integrated into their surroundings. Although infants are already sensitive to some categories in their first year of life, it is not clear which visual information supports their detection of meaningful elements within naturalistic scenes. Here we investigated the impact of image characteristics on 8-month-olds’ search performance using a gaze contingent eye-tracking search task. Infants had to detect a target patch on a background image. The stimuli consisted of images taken from three categories: vegetation, non-living natural elements (e.g., stones), and manmade artifacts, for which we also assessed target background differences in lower- and higher-level visual properties. Our results showed that larger target-background differences in the statistical properties scaling invariance and entropy, and also stimulus backgrounds including low pictorial depth, predicted better detection performance. Furthermore, category membership only affected search performance if supported by luminance contrast. Data from an adult comparison group also indicated that infants’ search performance relied more on lower-order visual properties than adults. Taken together, these results suggest that infants use a combination of property- and category-related information to parse complex visual stimuli.
Affiliation(s)
- Karola Schlegelmilch
- Max Planck Research Group Naturalistic Social Cognition, Max Planck Institute for Human Development, Berlin, Germany
- Annie E. Wertz
- Max Planck Research Group Naturalistic Social Cognition, Max Planck Institute for Human Development, Berlin, Germany
11
Three cortical scene systems and their development. Trends Cogn Sci 2022; 26:117-127. PMID: 34857468; PMCID: PMC8770598; DOI: 10.1016/j.tics.2021.11.002.
Abstract
Since the discovery of three scene-selective regions in the human brain, a central assumption has been that all three regions directly support navigation. We propose instead that cortical scene processing regions support three distinct computational goals (and one not for navigation at all): (i) The parahippocampal place area supports scene categorization, which involves recognizing the kind of place we are in; (ii) the occipital place area supports visually guided navigation, which involves finding our way through the immediately visible environment, avoiding boundaries and obstacles; and (iii) the retrosplenial complex supports map-based navigation, which involves finding our way from a specific place to some distant, out-of-sight place. We further hypothesize that these systems develop along different timelines, with both navigation systems developing slower than the scene categorization system.
12
Son G, Walther DB, Mack ML. Scene wheels: Measuring perception and memory of real-world scenes with a continuous stimulus space. Behav Res Methods 2022; 54:444-456. PMID: 34244986; DOI: 10.3758/s13428-021-01630-5.
Abstract
Precisely characterizing mental representations of visual experiences requires careful control of experimental stimuli. Recent work leveraging such stimulus control has led to important insights; however, these findings are constrained to simple visual properties like color and line orientation. There remains a critical methodological barrier to characterizing perceptual and mnemonic representations of realistic visual experiences. Here, we introduce a novel method to systematically control visual properties of natural scene stimuli. Using generative adversarial networks (GANs), a state-of-the-art deep learning technique for creating highly realistic synthetic images, we generated scene wheels in which continuously changing visual properties smoothly transition between meaningful realistic scenes. To validate the efficacy of scene wheels, we conducted two behavioral experiments that assess perceptual and mnemonic representations attained from the scene wheels. In the perceptual validation experiment, we tested whether the continuous transition of scene images along the wheel is reflected in human perceptual similarity judgment. The perceived similarity of the scene images correspondingly decreased as distances between the images increase on the wheel. In the memory experiment, participants reconstructed to-be-remembered scenes from the scene wheels. Reconstruction errors for these scenes resemble error distributions observed in prior studies using simple stimulus properties. Importantly, perceptual similarity judgment and memory precision varied systematically with scene wheel radius. These findings suggest our novel approach offers a window into the mental representations of naturalistic visual experiences.
Affiliation(s)
- Gaeun Son
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada.
- Dirk B Walther
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
- Michael L Mack
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
13
Sawada T, Mendoza Arvizu A, Farshchi M, Kiba A. Navigation in Contour-Drawn Scenes Using Augmented Reality. Iperception 2022; 13:20416695221074707. PMID: 35126990; PMCID: PMC8808034; DOI: 10.1177/20416695221074707.
Abstract
The visual system can recover 3D information from many different types of visual information, e.g., contour-drawings. How well can people navigate in a real dynamic environment with contour-drawings? This question was addressed by developing an AR-device that could show a contour-drawing of a real scene in an immersive manner and by conducting an observational field study in which the two authors navigated in real environments wearing this AR-device. The navigation with contour-drawings was difficult in natural scenes but easy in urban scenes. This suggests that the visual information from natural and urban environments is sufficiently different and our visual system can accommodate to this difference of the visual information in different environments.
Affiliation(s)
- Tadamasa Sawada
- School of Psychology, HSE University, Moscow, Russian Federation
- Maddex Farshchi
- School of Psychology, HSE University, Moscow, Russian Federation
- Alexandra Kiba
- School of Psychology, HSE University, Moscow, Russian Federation
14
Abstract
We often take people’s ability to understand and produce line drawings for granted. But where should we draw lines, and why? We address psychological principles that underlie efficient representations of complex information in line drawings. First, 58 participants with varying degrees of artistic experience produced multiple drawings of a small set of scenes by tracing contours on a digital tablet. Second, 37 independent observers ranked the drawings by how representative they are of the original photograph. Matching contours between drawings of the same scene revealed that the most consistently drawn contours tend to be drawn earlier. We then generated half-images with the most- versus least-consistently drawn contours and asked 25 observers to categorize the quickly presented scenes. Observers performed significantly better for the most consistent than for the least consistent half-images. The most consistently drawn contours were more likely to depict occlusion boundaries, whereas the least consistently drawn contours frequently depicted surface normals.
Affiliation(s)
- Heping Sheng
- School of Medicine, Boston University, Boston, MA, United States of America
- John Wilder
- Department of Psychology, University of Toronto, Toronto, Canada
- Dirk B. Walther
- Department of Psychology, University of Toronto, Toronto, Canada
15
Contour features predict valence and threat judgements in scenes. Sci Rep 2021; 11:19405. [PMID: 34593933 PMCID: PMC8484627 DOI: 10.1038/s41598-021-99044-y] [Received: 06/16/2021] [Accepted: 09/13/2021]
Abstract
Quickly scanning an environment to determine relative threat is an essential part of survival. Scene gist extracted rapidly from the environment may help people detect threats. Here, we probed the link between emotional judgements and features of visual scenes. We first extracted curvature, length, and orientation statistics of all images in the International Affective Picture System image set and related them to emotional valence scores. Images containing angular contours were rated as negative, and images containing long contours as positive. We then composed new abstract line drawings with specific combinations of length, angularity, and orientation values and asked participants to rate them as positive or negative, and as safe or threatening. Smooth, long, horizontal contour scenes were rated as positive/safe, while short angular contour scenes were rated as negative/threatening. Our work shows that particular combinations of image features help people make judgements about potential threat in the environment.
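The statistics-extraction step described above can be illustrated with a small sketch (Python/NumPy). This is not the authors' code: the polyline input format, the turning-angle measure of angularity, and the example contours are hypothetical stand-ins for the curvature, length, and orientation statistics they computed.

```python
import numpy as np

def contour_stats(polyline):
    """Length, mean orientation, and angularity of one contour,
    given as an (n, 2) array of x,y points."""
    d = np.diff(polyline, axis=0)                # segment vectors
    seg_len = np.hypot(d[:, 0], d[:, 1])
    length = seg_len.sum()
    orient = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 180  # undirected
    # Angularity: mean absolute turning angle between successive segments,
    # wrapped into (-180, 180] so direction reversals are handled correctly.
    ang = np.degrees(np.arctan2(d[:, 1], d[:, 0]))
    turn = np.abs((np.diff(ang) + 180) % 360 - 180)
    angularity = turn.mean() if len(turn) else 0.0
    return length, orient.mean(), angularity

# A nearly straight ("smooth") contour versus a zigzag ("angular") one.
smooth = np.array([[0, 0], [1, 0.1], [2, 0.15], [3, 0.18]])
angular = np.array([[0, 0], [1, 1], [2, 0], [3, 1]])
print(contour_stats(smooth)[2], contour_stats(angular)[2])
```

On this toy input, the zigzag contour gets a much larger angularity value than the nearly straight one, which is the contrast the valence analysis relies on.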
16
Hayes TR, Henderson JM. Deep saliency models learn low-, mid-, and high-level features to predict scene attention. Sci Rep 2021; 11:18434. [PMID: 34531484 PMCID: PMC8445969 DOI: 10.1038/s41598-021-97879-z] [Received: 04/13/2021] [Accepted: 08/31/2021]
Abstract
Deep saliency models represent the current state-of-the-art for predicting where humans look in real-world scenes. However, for deep saliency models to inform cognitive theories of attention, we need to know how deep saliency models prioritize different scene features to predict where people look. Here we open the black box of three prominent deep saliency models (MSI-Net, DeepGaze II, and SAM-ResNet) using an approach that models the association between attention, deep saliency model output, and low-, mid-, and high-level scene features. Specifically, we measured the association between each deep saliency model and low-level image saliency, mid-level contour symmetry and junctions, and high-level meaning by applying a mixed effects modeling approach to a large eye movement dataset. We found that all three deep saliency models were most strongly associated with high-level and low-level features, but exhibited qualitatively different feature weightings and interaction patterns. These findings suggest that prominent deep saliency models are primarily learning image features associated with high-level scene meaning and low-level image saliency and highlight the importance of moving beyond simply benchmarking performance.
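The logic of the association analysis can be sketched in simplified form. The authors fit mixed-effects models; the ordinary-least-squares version below, with simulated data and made-up predictor names, only illustrates the idea of regressing attention on low-, mid-, and high-level predictors and comparing their weights.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300  # hypothetical scene regions

# Hypothetical standardized predictors for each region.
low = rng.normal(size=n)      # low-level image saliency
mid = rng.normal(size=n)      # contour symmetry / junctions
high = rng.normal(size=n)     # rated scene meaning

# Simulated "attention", built mostly from low- and high-level features,
# mirroring the qualitative pattern reported in the abstract.
attention = 0.6 * high + 0.5 * low + 0.1 * mid + rng.normal(scale=0.3, size=n)

# OLS fit of attention ~ intercept + low + mid + high.
X = np.column_stack([np.ones(n), low, mid, high])
beta, *_ = np.linalg.lstsq(X, attention, rcond=None)
for name, b in zip(["intercept", "low", "mid", "high"], beta):
    print(f"{name}: {b:+.2f}")
```

A faithful version would add random effects (e.g., per scene or per subject), which this fixed-effects sketch omits.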
Affiliation(s)
- Taylor R Hayes
- Center for Mind and Brain, University of California, Davis, 95618, USA
- John M Henderson
- Center for Mind and Brain, University of California, Davis, 95618, USA
- Department of Psychology, University of California, Davis, 95616, USA
17
Dreneva A, Shvarts A, Chumachenko D, Krichevets A. Extrafoveal Processing in Categorical Search for Geometric Shapes: General Tendencies and Individual Variations. Cogn Sci 2021; 45:e13025. [PMID: 34379345 PMCID: PMC8459262 DOI: 10.1111/cogs.13025] [Received: 09/12/2020] [Revised: 06/10/2021] [Accepted: 06/27/2021]
Abstract
The paper addresses the capabilities and limitations of extrafoveal processing during a categorical visual search. Previous research has established that a target can be identified from the very first saccade, or even without any saccade, suggesting that extrafoveal perception is necessarily involved. However, the limits in complexity of the information that can be processed extrafoveally are still not clear. We performed four experiments with a gradual increase in stimulus complexity to determine the role of extrafoveal processing in searching for a categorically defined geometric shape. The series of experiments demonstrated a significant role of extrafoveal processing while searching for simple two-dimensional shapes and its gradual decrease in a condition with more complicated three-dimensional shapes. The factors of objects' spatial orientation and distractor homogeneity significantly influenced both reaction time and the number of saccades required to identify a categorically defined target. An analysis of the individual p-value distributions revealed pronounced individual differences in the use of extrafoveal analysis and allowed examination of the performance of each particular participant. The condition with the forced prohibition of eye movements enabled us to investigate the efficacy of covert attention with complicated shapes. Our results indicate that both foveal and extrafoveal processing are simultaneously involved during a categorical search, and that the specificity of their interaction is determined by the spatial orientation of objects, the type of distractors, the prohibition of overt attention, and individual characteristics of the participants.
Affiliation(s)
- Anna Dreneva
- Faculty of Psychology, Lomonosov Moscow State University
- Anna Shvarts
- Freudenthal Institute, Faculty of Science, Utrecht University
18
Dvoeglazova M, Koshmanova E, Sawada T. Visual sensitivity to parallel configurations of contours compared with sensitivity to other configurations. Vision Res 2021; 188:149-161. [PMID: 34333200 DOI: 10.1016/j.visres.2021.07.006] [Received: 03/18/2021] [Revised: 05/11/2021] [Accepted: 07/09/2021]
Abstract
People can perceive 3D information from contour drawings and some types of configurations of contours in such drawings are important for 3D perception. We know that our visual system is sensitive to these configurations. Koshmanova & Sawada (2019, Vision Research, 154, 97-104) showed that the sensitivity is higher to a parallel configuration of contours than to a perpendicular configuration of contours. In this study, two psychophysical experiments were conducted that compared the sensitivity to a parallel configuration with the sensitivity to two other configurations. In Experiment 1, orientation thresholds were measured with parallel and converging configurations composed of three contours. In Experiment 2, orientation thresholds of configurations composed of two contours were measured with parallel, collinear, and perpendicular configurations. The results of Experiment 1 showed that the visual system is more sensitive to parallel configurations than to converging configurations. The results of Experiment 2 showed that the sensitivity to the parallel configuration is analogous to the sensitivity to the collinear configuration, and it is higher than the sensitivity to the perpendicular configuration. The role that the parallel configuration plays in the 3D perception of contour-drawings is discussed.
19
Tharmaratnam V, Patel M, Lowe MX, Cant JS. Shared cognitive mechanisms involved in the processing of scene texture and scene shape. J Vis 2021; 21:11. [PMID: 34269793 PMCID: PMC8297417 DOI: 10.1167/jov.21.7.11]
Abstract
Recent research has demonstrated that the parahippocampal place area represents both the shape and texture features of scenes, with the importance of each feature varying according to perceived scene category. Namely, shape features are predominantly more diagnostic for the processing of artificial human-made scenes, while shape and texture are equally diagnostic in natural scene processing. However, to date little is known regarding the degree of interactivity or independence observed in the processing of these scene features. Furthermore, manipulating the scope of visual attention (i.e., globally vs. locally) when processing ensembles of multiple objects (stimuli that share a functional neuroanatomical link with scenes) has been shown to affect their cognitive visual representation. It remains unknown whether manipulating the scope of attention impacts scene processing in a similar manner. Using the well-established Garner speeded-classification behavioral paradigm, we investigated the influence of both feature diagnosticity and the scope of visual attention on potential interactivity or independence in the shape and texture processing of artificial human-made scenes. The results revealed asymmetric interference between scene shape and texture processing, with the more diagnostic feature (i.e., shape) interfering with the less diagnostic feature (i.e., texture), but not vice versa. Furthermore, this interference was attenuated and enhanced with more local and global visual processing strategies, respectively. These findings suggest that scene shape and texture processing are mediated by shared cognitive mechanisms and that, although these representations are governed primarily via feature diagnosticity, they can nevertheless be influenced by the scope of visual attention.
Affiliation(s)
- Matthew X Lowe
- Graduate Program in Psychology, University of Toronto, Toronto, ON, Canada
- Jonathan S Cant
- Graduate Program in Psychology, University of Toronto, Toronto, ON, Canada; Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
20
Melnik N, Coates DR, Sayim B. Geometrically restricted image descriptors: A method to capture the appearance of shape. J Vis 2021; 21:14. [PMID: 33688921 PMCID: PMC7961119 DOI: 10.1167/jov.21.3.14]
Abstract
Shape perception varies depending on many factors. For example, presenting a stimulus in the periphery often yields a different appearance compared with its foveal presentation. However, how exactly shape appearance is altered under different conditions remains elusive. One reason for this is that studies typically measure identification performance, leaving details about target appearance unknown. The lack of appearance-based methods and general challenges to quantify appearance complicate the investigation of shape appearance. Here, we introduce Geometrically Restricted Image Descriptors (GRIDs), a method to investigate the appearance of shapes. Stimuli in the GRID paradigm are shapes consisting of distinct line elements placed on a grid by connecting grid nodes. Each line is treated as a discrete target. Observers are asked to capture target appearance by placing lines on a freely viewed response grid. We used GRIDs to investigate the appearance of letters and letter-like shapes. Targets were presented at 10° eccentricity in the right visual field. Gaze-contingent stimulus presentation was used to prevent eye movements to the target. The data were analyzed by quantifying the differences between targets and responses with regard to overall accuracy, element discriminability, and several distinct error types. Our results show how shape appearance can be captured by GRIDs, and how a fine-grained analysis of stimulus parts provides quantifications of appearance typically not available in standard measures of performance. We propose that GRIDs are an effective tool to investigate the appearance of shapes.
Affiliation(s)
- Natalia Melnik
- Institute of Psychology, University of Bern, Bern, Switzerland
- Daniel R Coates
- Institute of Psychology, University of Bern, Bern, Switzerland; College of Optometry, University of Houston, Houston, Texas, USA
- Bilge Sayim
- Institute of Psychology, University of Bern, Bern, Switzerland; Univ. Lille, CNRS, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Lille, France; http://www.appearancelab.org/
21
Farshchi M, Kiba A, Sawada T. Seeing our 3D world while only viewing contour-drawings. PLoS One 2021; 16:e0242581. [PMID: 33481778 PMCID: PMC7822326 DOI: 10.1371/journal.pone.0242581] [Received: 06/26/2020] [Accepted: 11/04/2020]
Abstract
Artists can represent a 3D object by using only contours in a 2D drawing. Prior studies have shown that people can use such drawings to perceive 3D shapes reliably, but it is not clear how useful this kind of contour information actually is in a real dynamical scene in which people interact with objects. To address this issue, we developed an Augmented Reality (AR) device that can show a participant a contour-drawing or a grayscale-image of a real dynamical scene in an immersive manner. We compared the performance of people in a variety of run-of-the-mill tasks with both contour-drawings and grayscale-images under natural viewing conditions in three behavioral experiments. The results of these experiments showed that people could perform almost equally well with both types of images. This contour information may be sufficient to provide the basis for our visual system to obtain much of the 3D information needed for successful visuomotor interactions in our everyday life.
Affiliation(s)
- Maddex Farshchi
- School of Psychology, National Research University Higher School of Economics, Moscow, Russia
- Alexandra Kiba
- School of Psychology, National Research University Higher School of Economics, Moscow, Russia
- Tadamasa Sawada
- School of Psychology, National Research University Higher School of Economics, Moscow, Russia
22
Vision at a glance: The role of attention in processing object-to-object categorical relations. Atten Percept Psychophys 2020; 82:671-688. [PMID: 31907840 DOI: 10.3758/s13414-019-01940-z]
Abstract
When viewing a scene at a glance, the visual and categorical relations between objects in the scene are extracted rapidly. In the present study, the involvement of spatial attention in the processing of such relations was investigated. Participants performed a category detection task (e.g., "Is there an animal?") on briefly flashed object pairs. In one condition, visual attention spanned both stimuli, and in another, attention was focused on a single object while its counterpart object served as a task-irrelevant distractor. The results showed that when participants attended to both objects, a categorical relation effect was obtained (Exp. 1). Namely, latencies were shorter for objects from the same category than for those from different superordinate categories (e.g., clothes, vehicles), even if categories were not prioritized by the task demands. Focusing attention on only one of two stimuli, however, largely eliminated this effect (Exp. 2). Some relational processing was seen when categories were narrowed to the basic level and were highly distinct from each other (Exp. 3), implying that categorical relational processing necessitates attention, unless the unattended input is highly predictable. Critically, when a prioritized (to-be-detected) object category, positioned in a distractor's location, differed from an attended object, a robust distraction effect was consistently observed, regardless of category homogeneity and/or of response conflict factors (Exp. 4). This finding suggests that object relations that involve stimuli that are highly relevant to the task settings may survive attentional deprivation at the distractor location. The involvement of spatial attention in object-to-object categorical processing is most critical in situations that include wide categories that are irrelevant to one's current goals.
23
Disentangling the Independent Contributions of Visual and Conceptual Features to the Spatiotemporal Dynamics of Scene Categorization. J Neurosci 2020; 40:5283-5299. [PMID: 32467356 DOI: 10.1523/jneurosci.2088-19.2020] [Received: 08/28/2019] [Revised: 04/18/2020] [Accepted: 04/23/2020]
Abstract
Human scene categorization is characterized by its remarkable speed. While many visual and conceptual features have been linked to this ability, significant correlations exist between feature spaces, impeding our ability to determine their relative contributions to scene categorization. Here, we used a whitening transformation to decorrelate a variety of visual and conceptual features and assess the time course of their unique contributions to scene categorization. Participants (both sexes) viewed 2250 full-color scene images drawn from 30 different scene categories while having their brain activity measured through 256-channel EEG. We examined the variance explained at each electrode and time point of visual event-related potential (vERP) data from nine different whitened encoding models. These ranged from low-level features obtained from filter outputs to high-level conceptual features requiring human annotation. The amount of category information in the vERPs was assessed through multivariate decoding methods. Behavioral similarity measures were obtained in separate crowdsourced experiments. We found that all nine models together contributed 78% of the variance of human scene similarity assessments and were within the noise ceiling of the vERP data. Low-level models explained earlier vERP variability (88 ms after image onset), whereas high-level models explained later variance (169 ms). Critically, only high-level models shared vERP variability with behavior. Together, these results suggest that scene categorization is primarily a high-level process, but reliant on previously extracted low-level features.
SIGNIFICANCE STATEMENT: In a single fixation, we glean enough information to describe a general scene category. Many types of features are associated with scene categories, ranging from low-level properties, such as colors and contours, to high-level properties, such as objects and attributes. Because these properties are correlated, it is difficult to understand each property's unique contributions to scene categorization. This work uses a whitening transformation to remove the correlations between features and examines the extent to which each feature contributes to visual event-related potentials over time. We found that low-level visual features contributed first but were not correlated with categorization behavior. High-level features followed 80 ms later, providing key insights into how the brain makes sense of a complex visual world.
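The decorrelation step can be sketched with a standard ZCA whitening transform (NumPy). This is a generic illustration, not the authors' pipeline; the feature matrix here is random data with an artificially induced correlation.

```python
import numpy as np

def zca_whiten(X, eps=1e-8):
    """ZCA-whiten a feature matrix X (n_samples x n_features).
    After whitening, the empirical covariance is ~identity, so the
    feature dimensions are decorrelated with unit variance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # symmetric whitening matrix
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))      # 500 "images", 4 "feature spaces"
X[:, 1] += 0.9 * X[:, 0]           # induce correlation between two features
Xw = zca_whiten(X)
print(np.allclose(np.cov(Xw.T), np.eye(4), atol=1e-2))
```

ZCA is one of several valid whitening choices; any matrix W with W^T W equal to the inverse covariance decorrelates the features, and the choice affects only how interpretable each whitened dimension remains.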
24
Effects of Spatial Frequency Filtering Choices on the Perception of Filtered Images. Vision (Basel) 2020; 4:vision4020029. [PMID: 32466442 PMCID: PMC7355859 DOI: 10.3390/vision4020029] [Received: 02/29/2020] [Revised: 05/13/2020] [Accepted: 05/22/2020]
Abstract
The early visual system is composed of spatial frequency-tuned channels that break an image into its individual frequency components. Therefore, researchers commonly filter images for spatial frequencies to arrive at conclusions about the differential importance of high versus low spatial frequency image content. Here, we show how simple decisions about the filtering of the images, and how they are displayed on the screen, can result in drastically different behavioral outcomes. We show that jointly normalizing the contrast of the stimuli is critical in order to draw accurate conclusions about the influence of the different spatial frequencies, as images of the real world naturally have higher contrast energy at low than at high spatial frequencies. Furthermore, the specific choice of filter shape can result in contradictory results about whether high or low spatial frequencies are more useful for understanding image content. Finally, we show that the manner in which the high spatial frequency content is displayed on the screen influences how recognizable an image is. Previous findings that make claims about the visual system's use of certain spatial frequency bands should be revisited, especially if their methods sections do not make clear what filtering choices were made.
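The two methodological points above, filter shape and joint contrast normalization, can be sketched as follows (NumPy only). The Gaussian filter, the cutoff value, and the RMS-matching rule are illustrative choices, not the paper's exact procedure, and the input is random noise standing in for a photograph.

```python
import numpy as np

def filter_sf(img, cutoff, mode="low"):
    """Isotropic Gaussian spatial-frequency filter applied in the Fourier
    domain. `cutoff` is the radial frequency (cycles/image) at which the
    Gaussian falls to ~61% of its peak."""
    h, w = img.shape
    fy = np.fft.fftfreq(h) * h                   # cycles per image, vertical
    fx = np.fft.fftfreq(w) * w                   # cycles per image, horizontal
    r = np.sqrt(fx[None, :] ** 2 + fy[:, None] ** 2)
    lp = np.exp(-(r ** 2) / (2 * cutoff ** 2))
    mask = lp if mode == "low" else 1.0 - lp     # complementary high-pass
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))

def match_rms_contrast(img, target_rms):
    """Rescale an image so its RMS contrast (std) equals target_rms."""
    z = img - img.mean()
    return z * (target_rms / (z.std() + 1e-12)) + img.mean()

rng = np.random.default_rng(1)
img = rng.random((128, 128))
low = filter_sf(img, cutoff=8, mode="low")
high = filter_sf(img, cutoff=8, mode="high")
# Joint normalization: give both bands the same RMS contrast so that low SF
# content does not dominate simply by carrying more contrast energy.
rms = min(low.std(), high.std())
low_n, high_n = match_rms_contrast(low, rms), match_rms_contrast(high, rms)
print(float(low_n.std()), float(high_n.std()))
```

Swapping the smooth Gaussian for a sharp ideal (brick-wall) mask is exactly the kind of filter-shape decision the paper shows can flip behavioral conclusions.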
25
Smith ME, Loschky LC. The influence of sequential predictions on scene-gist recognition. J Vis 2020; 19:14. [PMID: 31622473 DOI: 10.1167/19.12.14]
Abstract
Past research suggests that recognizing scene gist, a viewer's holistic semantic representation of a scene acquired within a single eye fixation, involves purely feed-forward mechanisms. We investigated whether expectations can influence scene categorization. To do this, we embedded target scenes in more ecologically valid, first-person-viewpoint image sequences, along spatiotemporally connected routes (e.g., an office to a parking lot). We manipulated the sequences' spatiotemporal coherence by presenting them either coherently or in random order. Participants identified the category of one target scene in a 10-scene-image rapid serial visual presentation. Categorization accuracy was greater for targets in coherent sequences. Accuracy was also greater for targets with more visually similar primes. In Experiment 2, we investigated whether targets in coherent sequences were more predictable and whether predictable images were identified more accurately in Experiment 1 after accounting for the effect of prime-to-target visual similarity. To do this, we removed targets and had participants predict the category of the missing scene. Images were more accurately predicted in coherent sequences, and both image predictability and prime-to-target visual similarity independently contributed to performance in Experiment 1. To test whether prediction-based facilitation effects were solely due to response bias, participants performed a two-alternative forced-choice task in which they indicated whether the target was an intact or a phase-randomized scene. Critically, predictability of the target category was irrelevant to this task. Nevertheless, results showed that sensitivity, but not response bias, was greater for targets in coherent sequences. Predictions made prior to viewing a scene facilitate scene-gist recognition.
Affiliation(s)
- Maverick E Smith
- Department of Psychological Sciences, Kansas State University, Manhattan, KS, USA
- Lester C Loschky
- Department of Psychological Sciences, Kansas State University, Manhattan, KS, USA
26
Harel A, Mzozoyana MW, Al Zoubi H, Nador JD, Noesen BT, Lowe MX, Cant JS. Artificially-generated scenes demonstrate the importance of global scene properties for scene perception. Neuropsychologia 2020; 141:107434. [PMID: 32179102 DOI: 10.1016/j.neuropsychologia.2020.107434] [Received: 06/10/2019] [Revised: 03/04/2020] [Accepted: 03/09/2020]
Abstract
Recent electrophysiological research highlights the significance of global scene properties (GSPs) for scene perception. However, since real-world scenes span a range of low-level stimulus properties and high-level contextual semantics, GSP effects may also reflect additional processing of such non-global factors. We examined this question by asking whether Event-Related Potentials (ERPs) to GSPs would still be observed when specific low- and high-level scene properties are absent from the scene. We presented participants with computer-based artificially-manipulated scenes varying in two GSPs (spatial expanse and naturalness) which minimized other sources of scene information (color and semantic object detail). We found that the peak amplitude of the P2 component was sensitive to the spatial expanse and naturalness of the artificially-generated scenes: P2 amplitude was higher for closed than for open scenes, and for manmade than for natural scenes. A control experiment showed that the effect of Naturalness on the P2 is not driven by local texture information, while earlier effects of naturalness, expressed as a modulation of the P1 and N1 amplitudes, are sensitive to texture information. Our results demonstrate that GSPs are processed robustly around 220 ms and that the P2 can be used as an index of global scene perception.
Affiliation(s)
- Assaf Harel
- Department of Psychology, Wright State University, Dayton, OH, USA
- Mavuso W Mzozoyana
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Hamada Al Zoubi
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Jeffrey D Nador
- Department of Psychology, Wright State University, Dayton, OH, USA
- Birken T Noesen
- Department of Psychology, Wright State University, Dayton, OH, USA
- Matthew X Lowe
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jonathan S Cant
- Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
27
Dillon MR, Persichetti AS, Spelke ES, Dilks DD. Places in the Brain: Bridging Layout and Object Geometry in Scene-Selective Cortex. Cereb Cortex 2019. [PMID: 28633321 DOI: 10.1093/cercor/bhx139]
Abstract
Diverse animal species primarily rely on sense (left-right) and egocentric distance (proximal-distal) when navigating the environment. Recent neuroimaging studies with human adults show that this information is represented in 2 scene-selective cortical regions, the occipital place area (OPA) and the retrosplenial complex (RSC), but not in a third scene-selective region, the parahippocampal place area (PPA). What geometric properties, then, does the PPA represent, and what is its role in scene processing? Here we hypothesize that the PPA represents relative length and angle, the geometric properties classically associated with object recognition, but only in the context of large extended surfaces that compose the layout of a scene. Using functional magnetic resonance imaging adaptation, we found that the PPA is indeed sensitive to relative length and angle changes in pictures of scenes, but not in pictures of objects that reliably elicited responses to the same geometric changes in object-selective cortical regions. Moreover, we found that the OPA is also sensitive to such changes, while the RSC is tolerant of such changes. Thus, the geometric information typically associated with object recognition is also used during some aspects of scene processing. These findings provide evidence that scene-selective cortex differentially represents the geometric properties guiding navigation versus scene categorization.
Affiliation(s)
- Moira R Dillon
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Daniel D Dilks
- Department of Psychology, Emory University, Atlanta, GA, USA
28
Shafer-Skelton A, Brady TF. Scene layout priming relies primarily on low-level features rather than scene layout. J Vis 2019; 19:14. [PMID: 30677124 DOI: 10.1167/19.1.14]
Abstract
The ability to perceive and remember the spatial layout of a scene is critical to understanding the visual world, both for navigation and for other complex tasks that depend upon the structure of the current environment. However, surprisingly little work has investigated how and when scene layout information is maintained in memory. One prominent line of work investigating this issue is a scene-priming paradigm (e.g., Sanocki & Epstein, 1997), in which different types of previews are presented to participants shortly before they judge which of two regions of a scene is closer in depth to the viewer. Experiments using this paradigm have been widely cited as evidence that scene layout information is stored across brief delays and have been used to investigate the structure of the representations underlying memory for scene layout. In the present experiments, we better characterize these scene-priming effects. We find that a large amount of visual detail rather than the presence of depth information is necessary for the priming effect; that participants show a preview benefit for a judgment completely unrelated to the scene itself; and that preview benefits are susceptible to masking and quickly decay. Together, these results suggest that "scene priming" effects do not isolate scene layout information in memory, and that they may arise from low-level visual information held in sensory memory. This broadens the range of interpretations of scene priming effects and suggests that other paradigms may need to be developed to selectively investigate how we represent scene layout information in memory.
Affiliation(s)
- Timothy F Brady
- Department of Psychology, University of California, San Diego, CA, USA
29
Wallis TS, Funke CM, Ecker AS, Gatys LA, Wichmann FA, Bethge M. Image content is more important than Bouma's Law for scene metamers. eLife 2019; 8:42512. [PMID: 31038458 PMCID: PMC6491040 DOI: 10.7554/elife.42512] [Received: 10/03/2018] [Accepted: 03/09/2019]
Abstract
We subjectively perceive our visual field with high fidelity, yet peripheral distortions can go unnoticed and peripheral objects can be difficult to identify (crowding). Prior work showed that humans could not discriminate images synthesised to match the responses of a mid-level ventral visual stream model when information was averaged in receptive fields with a scaling of about half their retinal eccentricity. This result implicated ventral visual area V2, approximated ‘Bouma’s Law’ of crowding, and has subsequently been interpreted as a link between crowding zones, receptive field scaling, and our perceptual experience. However, this experiment never assessed natural images. We find that humans can easily discriminate real and model-generated images at V2 scaling, requiring scales at least as small as V1 receptive fields to generate metamers. We speculate that explaining why scenes look as they do may require incorporating segmentation and global organisational constraints in addition to local pooling.

As you read this digest, your eyes move to follow the lines of text. But now try to hold your eyes in one position, while reading the text on either side and below: it soon becomes clear that peripheral vision is not as good as we tend to assume. It is not possible to read text far away from the center of your line of vision, but you can see ‘something’ out of the corner of your eye. You can see that there is text there, even if you cannot read it, and you can see where your screen or page ends. So how does the brain generate peripheral vision, and why does it differ from what you see when you look straight ahead? One idea is that the visual system averages information over areas of the peripheral visual field. This gives rise to texture-like patterns, as opposed to images made up of fine details. Imagine looking at an expanse of foliage, gravel or fur, for example. Your eyes cannot make out the individual leaves, pebbles or hairs.
Instead, you perceive an overall pattern in the form of a texture. Our peripheral vision may also consist of such textures, created when the brain averages information over areas of space. Wallis, Funke et al. have now tested this idea using an existing computer model that averages visual input in this way. By giving the model a series of photographs to process, Wallis, Funke et al. obtained images that should in theory simulate peripheral vision. If the model mimics the mechanisms that generate peripheral vision, then healthy volunteers should be unable to distinguish the processed images from the original photographs. But in fact, the participants could easily discriminate the two sets of images. This suggests that the visual system does not solely use textures to represent information in the peripheral visual field. Wallis, Funke et al. propose that other factors, such as how the visual system separates and groups objects, may instead determine what we see in our peripheral vision. This knowledge could ultimately benefit patients with eye diseases such as macular degeneration, a condition that causes loss of vision in the center of the visual field and forces patients to rely on their peripheral vision.
Affiliation(s)
- Thomas S. A. Wallis
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany
- Christina M Funke
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany
- Alexander S Ecker
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States; Institute for Theoretical Physics, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Leon A Gatys
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Felix A Wichmann
- Neural Information Processing Group, Faculty of Science, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Matthias Bethge
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States; Institute for Theoretical Physics, Eberhard Karls Universität Tübingen, Tübingen, Germany; Max Planck Institute for Biological Cybernetics, Tübingen, Germany

30
Abstract
Our research has previously shown that scene categories can be predicted from observers' eye movements when they view photographs of real-world scenes. The time course of category predictions reveals the differential influences of bottom-up and top-down information. Here we used these known differences to determine to what extent image features at different representational levels contribute toward guiding gaze in a category-specific manner. Participants viewed grayscale photographs and line drawings of real-world scenes while their gaze was tracked. Scene categories could be predicted from fixation density at all times over a 2-s time course in both photographs and line drawings. We replicated the shape of the prediction curve found previously, with an initial steep decrease in prediction accuracy from 300 to 500 ms, representing the contribution of bottom-up information, followed by a steady increase, representing top-down knowledge of category-specific information. We then computed the low-level features (luminance contrasts and orientation statistics), mid-level features (local symmetry and contour junctions), and Deep Gaze II output from the images, and used that information as a reference in our category predictions in order to assess their respective contributions to category-specific guidance of gaze. We observed that, as expected, low-level salience contributes mostly to the initial bottom-up peak of gaze guidance. Conversely, the mid-level features that describe scene structure (i.e., local symmetry and junctions) split their contributions between bottom-up and top-down attentional guidance, with symmetry contributing to both bottom-up and top-down guidance, while junctions play a more prominent role in the top-down guidance of gaze.
31
Zhu Z, Rao C, Bai S, Latecki LJ. Training convolutional neural network from multi-domain contour images for 3D shape retrieval. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2017.08.028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
32
Greene MR. The information content of scene categories. PSYCHOLOGY OF LEARNING AND MOTIVATION 2019. [DOI: 10.1016/bs.plm.2019.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
33
Robin J, Rai Y, Valli M, Olsen RK. Category specificity in the medial temporal lobe: A systematic review. Hippocampus 2018; 29:313-339. [PMID: 30155943 DOI: 10.1002/hipo.23024] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2018] [Revised: 08/03/2018] [Accepted: 08/07/2018] [Indexed: 01/30/2023]
Abstract
Theoretical accounts of medial temporal lobe (MTL) function ascribe different functions to subregions of the MTL including perirhinal, entorhinal, parahippocampal cortices, and the hippocampus. Some have suggested that the functional roles of these subregions vary in terms of their category specificity, showing preferential coding for certain stimulus types, but the evidence for this functional organization is mixed. In this systematic review, we evaluate existing evidence for regional specialization in the MTL for three categories of visual stimuli: faces, objects, and scenes. We review and synthesize across univariate and multivariate neuroimaging studies, as well as neuropsychological studies of cases with lesions to the MTL. Neuroimaging evidence suggests that faces activate the perirhinal cortex, entorhinal cortex, and the anterior hippocampus, while scenes engage the parahippocampal cortex and both the anterior and posterior hippocampus, depending on the contrast condition. There is some evidence for object-related activity in anterior MTL regions when compared to scenes, and in posterior MTL regions when compared to faces, suggesting that aspects of object representations may share similarities with face and scene representations. While neuroimaging evidence suggests some hippocampal specialization for faces and scenes, neuropsychological evidence shows that hippocampal damage leads to impairments in scene memory and perception, but does not entail equivalent impairments for faces in cases where the perirhinal cortex remains intact. Regional specialization based on stimulus categories has implications for understanding the mechanisms of MTL subregions, and highlights the need for the development of theoretical models of MTL function that can accommodate the differential patterns of specificity observed in the MTL.
Affiliation(s)
- Jessica Robin
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario, Canada
- Yeshith Rai
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario, Canada
- Mikaeel Valli
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario, Canada; Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
- Rosanna K Olsen
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario, Canada; Department of Psychology, University of Toronto, Toronto, Ontario, Canada

34
Wilder J, Rezanejad M, Dickinson S, Siddiqi K, Jepson A, Walther DB. Local contour symmetry facilitates scene categorization. Cognition 2018; 182:307-317. [PMID: 30415132 DOI: 10.1016/j.cognition.2018.09.014] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2018] [Revised: 09/20/2018] [Accepted: 09/22/2018] [Indexed: 10/27/2022]
Abstract
People are able to rapidly categorize briefly flashed images of real-world environments, even when they are reduced to line drawings. This setting allows for the study of time-limited perceptual grouping processes in the human visual system that are applicable to line drawings. Previous work (Wilder, Dickinson, Jepson, & Walther, 2018) showed that standard local features of individual contours, or junctions between contours, do not account for this rapid classification ability but, rather, the relative placement of these contours appeared to be important. Here we provide strong support for this observation by demonstrating that local ribbon symmetry between neighboring pairs of contours facilitates the categorization of complex real-world environments. To this end, we introduce a novel computational approach, based on the medial axis transform, for measuring the degree of local ribbon symmetry in a line drawing. We use this measure to separate the contour pixels for a given scene into the most ribbon symmetric half and the least ribbon symmetric half. We then show human observers the resulting half-images in a rapid-categorization experiment. Our results demonstrate that local ribbon symmetry facilitates the categorization of complex real-world environments. This is the first study of the role of local symmetry in inter-contour grouping for human scene classification. We conclude that local ribbon symmetry appears to play an important role in jump-starting the grouping of image content into meaningful units, even in flashed presentations.
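The abstract above measures ribbon symmetry via the medial axis transform. As a rough illustration only, and not the authors' algorithm, a toy score in the same spirit can quantify how smoothly the half-width of the "ribbon" between two contours varies along their length; all function and variable names below are hypothetical:

```python
import numpy as np

def ribbon_symmetry_score(contour_a, contour_b):
    """Toy ribbon-symmetry score for two polylines sampled with the same
    number of points (N x 2 arrays). Pairs points by index, takes the
    half-width of the ribbon at each pair, and returns the mean absolute
    change in half-width along the axis. Lower = more ribbon-symmetric
    (the width varies smoothly)."""
    a = np.asarray(contour_a, dtype=float)
    b = np.asarray(contour_b, dtype=float)
    radii = np.linalg.norm(a - b, axis=1) / 2.0  # half-width at each pair
    return float(np.mean(np.abs(np.diff(radii))))

# Two parallel contours: constant width, maximally ribbon-symmetric.
t = np.linspace(0.0, 1.0, 50)
parallel_top = np.stack([t, np.ones_like(t)], axis=1)
parallel_bot = np.stack([t, -np.ones_like(t)], axis=1)

# A wobbly counterpart: the ribbon width changes erratically along the axis.
rng = np.random.default_rng(0)
wobbly_top = parallel_top + np.stack(
    [np.zeros_like(t), rng.normal(0, 0.3, t.size)], axis=1)

assert ribbon_symmetry_score(parallel_top, parallel_bot) < \
       ribbon_symmetry_score(wobbly_top, parallel_bot)
```

A full implementation would compute the medial axis of arbitrary contour pairs rather than relying on index-matched polylines, but the intuition is the same: ribbon symmetry is high when the inter-contour width is locally stable.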
35
Dissociable Neural Systems for Recognizing Places and Navigating through Them. J Neurosci 2018; 38:10295-10304. [PMID: 30348675 DOI: 10.1523/jneurosci.1200-18.2018] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2018] [Revised: 09/19/2018] [Accepted: 09/24/2018] [Indexed: 02/05/2023] Open
Abstract
When entering an environment, we can use the present visual information from the scene to either recognize the kind of place it is (e.g., a kitchen or a bedroom) or navigate through it. Here we directly test the hypothesis that these two processes, what we call "scene categorization" and "visually-guided navigation", are supported by dissociable neural systems. Specifically, we manipulated task demands by asking human participants (male and female) to perform a scene categorization, visually-guided navigation, and baseline task on images of scenes, and measured both the average univariate responses and multivariate spatial pattern of responses within two scene-selective cortical regions, the parahippocampal place area (PPA) and occipital place area (OPA), hypothesized to be separably involved in scene categorization and visually-guided navigation, respectively. As predicted, in the univariate analysis, PPA responded significantly more during the categorization task than during both the navigation and baseline tasks, whereas OPA showed the complete opposite pattern. Similarly, in the multivariate analysis, a linear support vector machine achieved above-chance classification for the categorization task, but not the navigation task in PPA. By contrast, above-chance classification was achieved for both the navigation and categorization tasks in OPA. However, above-chance classification for both tasks was also found in early visual cortex and hence not specific to OPA, suggesting that the spatial patterns of responses in OPA are merely inherited from early vision, and thus may be epiphenomenal to behavior. Together, these results are evidence for dissociable neural systems involved in recognizing places and navigating through them.

SIGNIFICANCE STATEMENT: It has been nearly three decades since Goodale and Milner demonstrated that recognizing objects and manipulating them involve distinct neural processes.
Today we show the same is true of our interactions with our environment: recognizing places and navigating through them are neurally dissociable. More specifically, we found that a scene-selective region, the parahippocampal place area, is active when participants are asked to categorize a scene, but not when asked to imagine navigating through it, whereas another scene-selective region, the occipital place area, shows the exact opposite pattern. This double dissociation is evidence for dissociable neural systems within scene processing, similar to the bifurcation of object processing described by Goodale and Milner (1992).
36
Wilder J, Dickinson S, Jepson A, Walther DB. Spatial relationships between contours impact rapid scene classification. J Vis 2018; 18:1. [DOI: 10.1167/18.8.1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Affiliation(s)
- John Wilder
- University of Toronto, Toronto, Ontario, Canada
- http://www.cs.toronto.edu/~jdwilder/
- Sven Dickinson
- University of Toronto, Toronto, Ontario, Canada
- http://www.cs.toronto.edu/~sven/
- Allan Jepson
- University of Toronto, Toronto, Ontario, Canada
- http://www.cs.toronto.edu/~jepson/
- Dirk B. Walther
- University of Toronto, Toronto, Ontario, Canada
- http://bwlab.utoronto.ca/dirk-bernhardt-walther/

37
O'Connell TP, Sederberg PB, Walther DB. Representational differences between line drawings and photographs of natural scenes: A dissociation between multi-voxel pattern analysis and repetition suppression. Neuropsychologia 2018; 117:513-519. [PMID: 29936121 DOI: 10.1016/j.neuropsychologia.2018.06.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2018] [Revised: 06/14/2018] [Accepted: 06/17/2018] [Indexed: 11/18/2022]
Abstract
Distributed representations of scene categories are consistent between color photographs (CPs) and line drawings (LDs) in the parahippocampal place area (PPA) and the retrosplenial cortex (RSC), as shown using multi-voxel pattern analysis (MVPA). Here, we used repetition suppression (RS) to further investigate the degree of representational convergence between CPs and LDs of natural scenes. MVPA and RS can capture different aspects of visual representations, and RS may prove useful in elucidating important differences in the representations of CPs and LDs of natural scenes. We performed an event-related fMRI experiment, including image-repetitions either within-type (i.e., CP to CP or LD to LD) or between-types (CP to LD, LD to CP). We found significant RS for within-type repetitions in PPA, RSC and the occipital place area (OPA), but did not observe RS for between-types repetitions. By contrast, scene categories were decodable from activity patterns evoked by both CPs and LDs using SVM classification for both within-type decoding and between-types cross-decoding. We conclude that there are representational differences between CPs and LDs in scene-selective cortex despite a category-level correspondence.
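The within-type versus between-types decoding logic in this abstract can be sketched with synthetic data: train a linear SVM on simulated "photograph" voxel patterns and test it on simulated "line drawing" patterns that share only a category signal. This is a schematic of the general MVPA cross-decoding approach, not the study's pipeline; all data and names below are made up.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_voxels, n_per_class = 100, 40

# Each scene category has a voxel pattern shared across image types;
# each image type (CP vs. LD) adds its own pattern on top, plus noise.
category_means = rng.normal(0, 1, (2, n_voxels))
type_offsets = rng.normal(0, 1, (2, n_voxels))

def simulate(image_type):
    X, y = [], []
    for cat in (0, 1):
        trials = (category_means[cat] + type_offsets[image_type]
                  + rng.normal(0, 1.0, (n_per_class, n_voxels)))
        X.append(trials)
        y += [cat] * n_per_class
    return np.vstack(X), np.array(y)

X_cp, y_cp = simulate(0)  # simulated color-photograph patterns
X_ld, y_ld = simulate(1)  # simulated line-drawing patterns

# Train on one image type, test on the other ("between-types" decoding).
clf = LinearSVC(max_iter=10000).fit(X_cp, y_cp)
cross_acc = clf.score(X_ld, y_ld)
print(f"between-types decoding accuracy: {cross_acc:.2f}")
```

Because the category signal here is shared across image types, cross-decoding succeeds even though each type also carries its own idiosyncratic pattern; the study's point is that repetition suppression can fail to transfer even when such cross-decoding succeeds.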
Affiliation(s)
- Thomas P O'Connell
- Department of Psychology, Yale University, Box 208205, New Haven, CT 06520-8205, USA
- Per B Sederberg
- Department of Psychology, University of Virginia, Charlottesville, VA, USA
- Dirk B Walther
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada

38
Bonner MF, Epstein RA. Computational mechanisms underlying cortical responses to the affordance properties of visual scenes. PLoS Comput Biol 2018; 14:e1006111. [PMID: 29684011 PMCID: PMC5933806 DOI: 10.1371/journal.pcbi.1006111] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2017] [Revised: 05/03/2018] [Accepted: 03/31/2018] [Indexed: 11/24/2022] Open
Abstract
Biologically inspired deep convolutional neural networks (CNNs), trained for computer vision tasks, have been found to predict cortical responses with remarkable accuracy. However, the internal operations of these models remain poorly understood, and the factors that account for their success are unknown. Here we develop a set of techniques for using CNNs to gain insights into the computational mechanisms underlying cortical responses. We focused on responses in the occipital place area (OPA), a scene-selective region of dorsal occipitoparietal cortex. In a previous study, we showed that fMRI activation patterns in the OPA contain information about the navigational affordances of scenes; that is, information about where one can and cannot move within the immediate environment. We hypothesized that this affordance information could be extracted using a set of purely feedforward computations. To test this idea, we examined a deep CNN with a feedforward architecture that had been previously trained for scene classification. We found that responses in the CNN to scene images were highly predictive of fMRI responses in the OPA. Moreover the CNN accounted for the portion of OPA variance relating to the navigational affordances of scenes. The CNN could thus serve as an image-computable candidate model of affordance-related responses in the OPA. We then ran a series of in silico experiments on this model to gain insights into its internal operations. These analyses showed that the computation of affordance-related features relied heavily on visual information at high-spatial frequencies and cardinal orientations, both of which have previously been identified as low-level stimulus preferences of scene-selective visual cortex. These computations also exhibited a strong preference for information in the lower visual field, which is consistent with known retinotopic biases in the OPA. 
Visualizations of feature selectivity within the CNN suggested that affordance-based responses encoded features that define the layout of the spatial environment, such as boundary-defining junctions and large extended surfaces. Together, these results map the sensory functions of the OPA onto a fully quantitative model that provides insights into its visual computations. More broadly, they advance integrative techniques for understanding visual cortex across multiple levels of analysis: from the identification of cortical sensory functions to the modeling of their underlying algorithms.

How does visual cortex compute behaviorally relevant properties of the local environment from sensory inputs? For decades, computational models have been able to explain only the earliest stages of biological vision, but recent advances in deep neural networks have yielded a breakthrough in the modeling of high-level visual cortex. However, these models are not explicitly designed for testing neurobiological theories, and, like the brain itself, their internal operations remain poorly understood. We examined a deep neural network for insights into the cortical representation of navigational affordances in visual scenes. In doing so, we developed a set of high-throughput techniques and statistical tools that are broadly useful for relating the internal operations of neural networks with the information processes of the brain. Our findings demonstrate that a deep neural network with purely feedforward computations can account for the processing of navigational layout in high-level visual cortex. We next performed a series of experiments and visualization analyses on this neural network. These analyses characterized a set of stimulus input features that may be critical for computing navigationally related cortical representations, and they identified a set of high-level, complex scene features that may serve as a basis set for the cortical coding of navigational layout.
These findings suggest a computational mechanism through which high-level visual cortex might encode the spatial structure of the local navigational environment, and they demonstrate an experimental approach for leveraging the power of deep neural networks to understand the visual computations of the brain.
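The core encoding-model move described above, predicting voxel responses from CNN-layer features, reduces to a cross-validated regularized regression. The sketch below illustrates that logic on synthetic data; it is not the authors' pipeline, and every quantity in it is simulated.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_images, n_features = 200, 50

# Simulated CNN-layer activations for each scene image.
features = rng.normal(0, 1, (n_images, n_features))

# Simulated voxel response: a noisy linear readout of those features,
# standing in for affordance-related activity in a region like the OPA.
true_weights = rng.normal(0, 1, n_features)
voxel = features @ true_weights + rng.normal(0, 2.0, n_images)

# Cross-validated fit of the encoding model: how much voxel variance
# do the features predict on held-out images?
r2 = cross_val_score(Ridge(alpha=1.0), features, voxel,
                     cv=5, scoring="r2").mean()
print(f"cross-validated R^2: {r2:.2f}")
```

In the real setting, `features` would come from a trained network's layer activations and `voxel` from fMRI data, with the held-out R² quantifying how predictive the model layer is of the cortical response.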
Affiliation(s)
- Michael F. Bonner
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States of America
- Russell A. Epstein
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States of America

39
Berman D, Golomb JD, Walther DB. Scene content is predominantly conveyed by high spatial frequencies in scene-selective visual cortex. PLoS One 2017; 12:e0189828. [PMID: 29272283 PMCID: PMC5741213 DOI: 10.1371/journal.pone.0189828] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2017] [Accepted: 12/01/2017] [Indexed: 11/19/2022] Open
Abstract
In complex real-world scenes, image content is conveyed by a large collection of intertwined visual features. The visual system disentangles these features in order to extract information about image content. Here, we investigate the role of one integral component: the content of spatial frequencies in an image. Specifically, we measure the amount of image content carried by low versus high spatial frequencies for the representation of real-world scenes in scene-selective regions of human visual cortex. To this end, we attempted to decode scene categories from the brain activity patterns of participants viewing scene images that contained the full spatial frequency spectrum, only low spatial frequencies, or only high spatial frequencies, all carefully controlled for contrast and luminance. Contrary to the findings from numerous behavioral studies and computational models that have highlighted how low spatial frequencies preferentially encode image content, decoding of scene categories from the scene-selective brain regions, including the parahippocampal place area (PPA), was significantly more accurate for high than low spatial frequency images. In fact, decoding accuracy was just as high for high spatial frequency images as for images containing the full spatial frequency spectrum in scene-selective areas PPA, RSC, OPA and object selective area LOC. We also found an interesting dissociation between the posterior and anterior subdivisions of PPA: categories were decodable from both high and low spatial frequency scenes in posterior PPA but only from high spatial frequency scenes in anterior PPA; and spatial frequency was explicitly decodable from posterior but not anterior PPA. Our results are consistent with recent findings that line drawings, which consist almost entirely of high spatial frequencies, elicit a neural representation of scene categories that is equivalent to that of full-spectrum color photographs. 
Collectively, these findings demonstrate the importance of high spatial frequencies for conveying the content of complex real-world scenes.
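The low- versus high-spatial-frequency stimulus manipulation described above can be approximated with a Gaussian low-pass filter in the Fourier domain. The sketch below is an illustrative recipe, not the study's exact filtering procedure, and the cutoff value is arbitrary:

```python
import numpy as np

def split_spatial_frequencies(image, sigma=0.1):
    """Split a grayscale image (2D array) into low-SF and high-SF parts
    using a Gaussian low-pass filter in the Fourier domain. `sigma` is
    the filter width in cycles per pixel."""
    f = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx**2 + fy**2)          # spatial frequency of each bin
    lowpass = np.exp(-(radius**2) / (2 * sigma**2))
    low = np.real(np.fft.ifft2(f * lowpass))
    high = image - low                       # residual high-SF content
    return low, high

# The two bands sum back to the original image by construction.
img = np.random.default_rng(3).random((64, 64))
low, high = split_spatial_frequencies(img)
assert np.allclose(low + high, img)
```

Stimuli built this way would still need the luminance and contrast matching that the abstract describes before being compared in an experiment.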
Affiliation(s)
- Daniel Berman
- Department of Psychology, The Ohio State University, Columbus, Ohio, United States of America
- Julie D. Golomb
- Department of Psychology, The Ohio State University, Columbus, Ohio, United States of America
- Dirk B. Walther
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada

40
Robin J, Lowe MX, Pishdadian S, Rivest J, Cant JS, Moscovitch M. Selective scene perception deficits in a case of topographical disorientation. Cortex 2017; 92:70-80. [DOI: 10.1016/j.cortex.2017.03.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2016] [Revised: 11/22/2016] [Accepted: 03/20/2017] [Indexed: 11/28/2022]
41
Kubilius J, Sleurs C, Wagemans J. Sensitivity to Nonaccidental Configurations of Two-Line Stimuli. Iperception 2017; 8:2041669517699628. [PMID: 28491272 PMCID: PMC5405893 DOI: 10.1177/2041669517699628] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
According to Recognition-By-Components theory, object recognition relies on a specific subset of three-dimensional shapes called geons. In particular, these configurations constitute a powerful cue to three-dimensional object reconstruction because their two-dimensional projection remains viewpoint-invariant. While a large body of literature has demonstrated sensitivity to changes in these so-called nonaccidental configurations, it remains unclear what information is used in establishing such sensitivity. In this study, we explored the possibility that nonaccidental configurations can already be inferred from the basic constituents of objects, namely, their edges. We constructed a set of stimuli composed of two lines corresponding to various nonaccidental properties and configurations underlying the distinction between geons, including collinearity, alignment, curvature of contours, curvature of configuration axis, expansion, cotermination, and junction type. Using a simple visual search paradigm, we demonstrated that participants were faster at detecting targets that differed from distractors in a nonaccidental property than in a metric property. We also found that only some but not all of the observed sensitivity could have resulted from simple low-level properties of our stimuli. Given that such sensitivity emerged from a configuration of only two lines, our results support the view that nonaccidental configurations could be encoded throughout the visual processing hierarchy even in the absence of object context.
42
Making Sense of Real-World Scenes. Trends Cogn Sci 2016; 20:843-856. [PMID: 27769727 DOI: 10.1016/j.tics.2016.09.003] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2016] [Revised: 09/06/2016] [Accepted: 09/06/2016] [Indexed: 11/23/2022]
Abstract
To interact with the world, we have to make sense of the continuous sensory input conveying information about our environment. A recent surge of studies has investigated the processes enabling scene understanding, using increasingly complex stimuli and sophisticated analyses to highlight the visual features and brain regions involved. However, there are two major challenges to producing a comprehensive framework for scene understanding. First, scene perception is highly dynamic, subserving multiple behavioral goals. Second, a multitude of different visual properties co-occur across scenes and may be correlated or independent. We synthesize the recent literature and argue that for a complete view of scene understanding, it is necessary to account for both differing observer goals and the contribution of diverse scene properties.
43
Ferrara K, Park S. Neural representation of scene boundaries. Neuropsychologia 2016; 89:180-190. [PMID: 27181883 DOI: 10.1016/j.neuropsychologia.2016.05.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2016] [Revised: 05/02/2016] [Accepted: 05/11/2016] [Indexed: 10/21/2022]
Abstract
Three-dimensional environmental boundaries fundamentally define the limits of a given space. A body of research employing a variety of methods points to their importance as cues in navigation. However, little is known about the nature of the representation of scene boundaries by high-level scene cortices in the human brain (namely, the parahippocampal place area (PPA) and retrosplenial complex (RSC)). Here we use univariate and multivoxel pattern analysis to study classification performance for artificial scene images that vary in degree of vertical boundary structure (a flat 2D boundary, a very slight addition of 3D boundary, or full walls). Our findings present evidence that there are distinct neural components for representing two different aspects of boundaries: 1) acute sensitivity to the presence of grounded 3D vertical structure, represented by the PPA, and 2) whether a boundary introduces a significant impediment to the viewer's potential navigation within a space, represented by RSC.
Affiliation(s)
- Katrina Ferrara
- Department of Cognitive Science, Johns Hopkins University, United States
- Soojin Park
- Department of Cognitive Science, Johns Hopkins University, United States

44
Kubilius J, Bracci S, Op de Beeck HP. Deep Neural Networks as a Computational Model for Human Shape Sensitivity. PLoS Comput Biol 2016; 12:e1004896. [PMID: 27124699 PMCID: PMC4849740 DOI: 10.1371/journal.pcbi.1004896] [Citation(s) in RCA: 131] [Impact Index Per Article: 16.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2015] [Accepted: 03/30/2016] [Indexed: 11/19/2022] Open
Abstract
Theories of object recognition agree that shape is of primordial importance, but there is no consensus about how shape might be represented, and so far attempts to implement a model of shape perception that would work with realistic stimuli have largely failed. Recent studies suggest that state-of-the-art convolutional ‘deep’ neural networks (DNNs) capture important aspects of human object perception. We hypothesized that these successes might be partially related to a human-like representation of object shape. Here we demonstrate that sensitivity for shape features, characteristic to human and primate vision, emerges in DNNs when trained for generic object recognition from natural photographs. We show that these models explain human shape judgments for several benchmark behavioral and neural stimulus sets on which earlier models mostly failed. In particular, although never explicitly trained for such stimuli, DNNs develop acute sensitivity to minute variations in shape and to non-accidental properties that have long been implicated to form the basis for object recognition. Even more strikingly, when tested with a challenging stimulus set in which shape and category membership are dissociated, the most complex model architectures capture human shape sensitivity as well as some aspects of the category structure that emerges from human judgments. As a whole, these results indicate that convolutional neural networks not only learn physically correct representations of object categories but also develop perceptually accurate representational spaces of shapes. An even more complete model of human object representations might be in sight by training deep architectures for multiple tasks, which is so characteristic in human development.

Shape plays an important role in object recognition. Despite years of research, no models of vision could account for shape understanding as found in human vision of natural images. Given recent successes of deep neural networks (DNNs) in object recognition, we hypothesized that DNNs might in fact learn to capture perceptually salient shape dimensions. Using a variety of stimulus sets, we demonstrate here that the output layers of several DNNs develop representations that relate closely to human perceptual shape judgments. Surprisingly, such sensitivity to shape develops in these models even though they were never explicitly trained for shape processing. Moreover, we show that these models also represent categorical object similarity that follows human semantic judgments, albeit to a lesser extent. Taken together, our results bring forward the exciting idea that DNNs capture not only objective dimensions of stimuli, such as their category, but also their subjective, or perceptual, aspects, such as shape and semantic similarity as judged by humans.
Affiliation(s)
- Jonas Kubilius: Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Stefania Bracci: Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Hans P. Op de Beeck: Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium

45
Choo H, Walther DB. Contour junctions underlie neural representations of scene categories in high-level human visual cortex. Neuroimage 2016; 135:32-44. [PMID: 27118087 DOI: 10.1016/j.neuroimage.2016.04.021] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2015] [Revised: 03/16/2016] [Accepted: 04/08/2016] [Indexed: 10/21/2022] Open
Abstract
Humans efficiently grasp complex visual environments, making highly consistent judgments of entry-level category despite their high variability in visual appearance. How does the human brain arrive at the invariant neural representations underlying categorization of real-world environments? We here show that the neural representation of visual environments in scene-selective human visual cortex relies on statistics of contour junctions, which provide cues for the three-dimensional arrangement of surfaces in a scene. We manipulated line drawings of real-world environments such that statistics of contour orientations or junctions were disrupted. Manipulated and intact line drawings were presented to participants in an fMRI experiment. Scene categories were decoded from neural activity patterns in the parahippocampal place area (PPA), the occipital place area (OPA) and other visual brain regions. Disruption of junctions but not orientations led to a drastic decrease in decoding accuracy in the PPA and OPA, indicating the reliance of these areas on intact junction statistics. Accuracy of decoding from early visual cortex, on the other hand, was unaffected by either image manipulation. We further show that the correlation of error patterns between decoding from the scene-selective brain areas and behavioral experiments is contingent on intact contour junctions. Finally, a searchlight analysis exposes the reliance of visually active brain regions on different sets of contour properties. Statistics of contour length and curvature dominate neural representations of scene categories in early visual areas and contour junctions in high-level scene-selective brain regions.
Affiliation(s)
- Heeyoung Choo: Department of Psychology, University of Toronto, 100 St. George Street, Toronto, ON M5S 3G3, Canada
- Dirk B Walther: Department of Psychology, University of Toronto, 100 St. George Street, Toronto, ON M5S 3G3, Canada

46
Bryan PB, Julian JB, Epstein RA. Rectilinear Edge Selectivity Is Insufficient to Explain the Category Selectivity of the Parahippocampal Place Area. Front Hum Neurosci 2016; 10:137. [PMID: 27064591 PMCID: PMC4811863 DOI: 10.3389/fnhum.2016.00137] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2016] [Accepted: 03/15/2016] [Indexed: 11/30/2022] Open
Abstract
The parahippocampal place area (PPA) is one of several brain regions that respond more strongly to scenes than to non-scene items such as objects and faces. The mechanism underlying this scene-preferential response remains unclear. One possibility is that the PPA is tuned to low-level stimulus features that are found more often in scenes than in less-preferred stimuli. Supporting this view, Nasr et al. (2014) recently observed that some of the stimuli that are known to strongly activate the PPA contain a large number of rectilinear edges. They further demonstrated that PPA response is modulated by rectilinearity for a range of non-scene images. Motivated by these results, we tested whether rectilinearity suffices to explain PPA selectivity for scenes. In the first experiment, we replicated the previous finding of modulation by rectilinearity in the PPA for arrays of 2-d shapes. However, two further experiments failed to find a rectilinearity effect for faces or scenes: high-rectilinearity faces and scenes did not activate the PPA any more strongly than low-rectilinearity faces and scenes. Moreover, the categorical advantage for scenes vs. faces was maintained in the PPA and two other scene-selective regions—the retrosplenial complex (RSC) and occipital place area (OPA)—when rectilinearity was matched between stimulus sets. We conclude that selectivity for scenes in the PPA cannot be explained by a preference for low-level rectilinear edges.
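Nasr et al.'s rectilinearity measure was built from banks of angle-tuned filters applied to images; as a rough illustration of the underlying quantity, the sketch below assumes junction angles have already been extracted from a line drawing and simply scores the fraction that fall near 90 degrees. The function name, tolerance, and example angle lists are all hypothetical, not the published model.

```python
import numpy as np

def rectilinearity_index(junction_angles_deg, tol=10.0):
    """Fraction of contour-junction angles within `tol` degrees of 90.

    A simplified proxy for the kind of rectilinearity measure used by
    Nasr et al. (2014); their model operated on filter responses rather
    than on pre-extracted junction angles.
    """
    angles = np.asarray(junction_angles_deg, dtype=float)
    return float(np.mean(np.abs(angles - 90.0) <= tol))

# A carpentered, scene-like stimulus dominated by right-angle junctions...
urban = [90, 88, 91, 90, 45, 92, 89]
# ...versus a curved, organic stimulus with few right angles.
organic = [30, 150, 60, 120, 91, 20]

print(rectilinearity_index(urban))    # high
print(rectilinearity_index(organic))  # low
```

Stimulus sets matched on an index like this are what allow the study's key test: whether scene selectivity survives once rectilinearity is equated between scenes and faces.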
Affiliation(s)
- Peter B Bryan: Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Joshua B Julian: Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Russell A Epstein: Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA

47
Frane A. A Call for Considering Color Vision Deficiency When Creating Graphics for Psychology Reports. The Journal of General Psychology 2015; 142:194-211. [PMID: 26273941 DOI: 10.1080/00221309.2015.1063475] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
Although color vision deficiency (CVD) is fairly common, it is often not adequately considered when data is presented in color graphics. This study found that CVD tends to be mentioned neither in the author guidelines of psychology journals nor in the standard publication manuals of the field (e.g., the publication manuals of the American Psychological Association and the American Medical Association). To illustrate the relevance of this problem, a panel of scholars with CVD was used to evaluate the color figures in three respected psychological science journals. Results suggested that a substantial proportion of those figures were needlessly confusing for viewers with CVD and could have been easily improved through simple adjustments. Based on prior literature and on feedback from the panelists, recommendations are made for improving the accessibility of graphics in psychology reports.
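One simple screening check in the spirit of the recommendations above: collapse the red-green axis of a palette and see whether the colors remain distinguishable. The collapsing rule below is a crude illustrative heuristic, not a validated simulation of protanopia or deuteranopia, and the palettes are invented examples.

```python
import itertools

def redgreen_collapsed(rgb):
    """Crudely mimic loss of the red-green distinction by replacing the
    R and G components with their mean. Illustrative heuristic only; a
    real check should use a validated CVD simulation."""
    r, g, b = rgb
    m = (r + g) / 2.0
    return (m, m, b)

def min_pairwise_distance(palette):
    """Smallest Euclidean distance between any two palette colors."""
    return min(
        sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5
        for c1, c2 in itertools.combinations(palette, 2)
    )

# A palette that relies on a pure red-vs-green distinction...
risky = [(200, 40, 40), (40, 200, 40), (40, 40, 200)]
# ...and one that also varies in lightness and blue.
safer = [(230, 160, 0), (0, 90, 180), (80, 80, 80)]

for name, pal in [("risky", risky), ("safer", safer)]:
    collapsed = [redgreen_collapsed(c) for c in pal]
    print(name, round(min_pairwise_distance(collapsed), 1))
```

The "risky" palette collapses to two identical colors, which is exactly the kind of needless confusability the panelists in this study reported.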
48
Aminoff EM, Toneva M, Shrivastava A, Chen X, Misra I, Gupta A, Tarr MJ. Applying artificial vision models to human scene understanding. Front Comput Neurosci 2015; 9:8. [PMID: 25698964 PMCID: PMC4316773 DOI: 10.3389/fncom.2015.00008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2014] [Accepted: 01/15/2015] [Indexed: 12/01/2022] Open
Abstract
How do we understand the complex patterns of neural responses that underlie scene understanding? Studies of the network of brain regions held to be scene-selective—the parahippocampal/lingual region (PPA), the retrosplenial complex (RSC), and the occipital place area (TOS)—have typically focused on single visual dimensions (e.g., size), rather than the high-dimensional feature space in which scenes are likely to be neurally represented. Here we leverage well-specified artificial vision systems to explicate a more complex understanding of how scenes are encoded in this functional network. We correlated similarity matrices within three different scene-spaces arising from: (1) BOLD activity in scene-selective brain regions; (2) behaviorally measured judgments of visually-perceived scene similarity; and (3) several different computer vision models. These correlations revealed: (1) models that relied on mid- and high-level scene attributes showed the highest correlations with the patterns of neural activity within the scene-selective network; (2) NEIL and SUN—the models that best accounted for the patterns obtained from PPA and TOS—were different from the GIST model that best accounted for the pattern obtained from RSC; (3) the best-performing models outperformed behaviorally measured judgments of scene similarity in accounting for neural data. One computer vision method—NEIL (“Never-Ending-Image-Learner”), which incorporates visual features learned as statistical regularities across web-scale numbers of scenes—showed significant correlations with neural activity in all three scene-selective regions and was one of the two models best able to account for variance in the PPA and TOS. We suggest that these results are a promising first step in explicating more fine-grained models of neural scene understanding, including developing a clearer picture of the division of labor among the components of the functional scene-selective brain network.
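The similarity-matrix correlation approach described here can be sketched compactly. The feature matrices below are simulated stand-ins for the neural, behavioral, and computer-vision scene spaces (none of the study's actual data), and rank (Spearman) correlation of the matrices' upper triangles is one common way to compare such spaces.

```python
import numpy as np

rng = np.random.default_rng(1)

def similarity_matrix(features):
    """Pairwise correlation similarity between scene feature vectors
    (rows are scenes, columns are feature dimensions)."""
    return np.corrcoef(features)

def upper_triangle(m):
    """Unique off-diagonal entries of a symmetric similarity matrix."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]

def spearman(x, y):
    """Spearman correlation via Pearson correlation of ranks
    (no tie handling; fine for continuous simulated data)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical feature vectors for 12 scenes: a "model" space, a
# "neural" space that partly shares its structure, and an unrelated one.
n_scenes, n_dims = 12, 30
model_feats = rng.normal(0, 1, (n_scenes, n_dims))
neural_feats = model_feats + rng.normal(0, 0.5, (n_scenes, n_dims))
unrelated = rng.normal(0, 1, (n_scenes, n_dims))

sim_model = similarity_matrix(model_feats)
sim_neural = similarity_matrix(neural_feats)
sim_unrel = similarity_matrix(unrelated)

r_related = spearman(upper_triangle(sim_model), upper_triangle(sim_neural))
r_baseline = spearman(upper_triangle(sim_model), upper_triangle(sim_unrel))
print(f"model vs neural: {r_related:.2f}, model vs unrelated: {r_baseline:.2f}")
```

Comparing such correlations across candidate models is how the study ranks NEIL, SUN, and GIST against each scene-selective region.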
Affiliation(s)
- Elissa M Aminoff: Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA
- Mariya Toneva: Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA
- Xinlei Chen: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Ishan Misra: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Abhinav Gupta: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Michael J Tarr: Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA