1. Ritchie JB, Montesinos S, Carter MJ. What Is a Visual Stream? J Cogn Neurosci 2024;36:2627-2638. PMID: 38820554; PMCID: PMC11602008; DOI: 10.1162/jocn_a_02191.
Abstract
The dual stream model of the human and non-human primate visual systems remains Leslie Ungerleider's (1946-2020) most indelible contribution to visual neuroscience. In this model, a dorsal "where" stream specialized for visuospatial representation extends through occipitoparietal cortex, whereas a ventral "what" stream specialized for representing object qualities extends through occipitotemporal cortex. Over time, this model underwent a number of revisions and expansions. In one of her last scientific contributions, Leslie proposed a third visual stream specialized for representing dynamic signals related to social perception. This alteration invites the question: What is a visual stream, and how are different visual streams individuated? In this article, we first consider and reject a simple answer to this question based on a common idealizing visualization of the model, which conflicts with the complexities of the visual system that the model was intended to capture. Next, we propose a taxonomic answer that takes inspiration from the philosophy of science and Leslie's body of work, which distinguishes between neural mechanisms, pathways, and streams. In this taxonomy, visual streams are superordinate to pathways and mechanisms and provide individuation conditions for determining whether collections of cortical connections delineate different visual streams. Given this characterization, we suggest that the proposed third visual stream does not yet meet these conditions, although the tripartite model still suggests important revisions to how we think about the organization of the human and non-human primate visual systems.
2. Ritchie JB, Andrews ST, Vaziri-Pashkam M, Baker CI. Graspable foods and tools elicit similar responses in visual cortex. Cereb Cortex 2024;34:bhae383. PMID: 39319569; DOI: 10.1093/cercor/bhae383.
Abstract
The extrastriatal visual cortex is known to exhibit distinct response profiles to complex stimuli of varying ecological importance (e.g. faces, scenes, and tools). Although food is primarily distinguished from other objects by its edibility, not its appearance, recent evidence suggests that there is also food selectivity in human visual cortex. Food is also associated with a common behavior, eating, and food consumption typically involves the manipulation of food, often with the hands. In this context, food items share many properties with tools: they are graspable objects that we manipulate in self-directed and stereotyped forms of action. Thus, food items may be preferentially represented in extrastriatal visual cortex in part because of these shared affordance properties, rather than because they reflect a wholly distinct kind of category. We conducted functional MRI and behavioral experiments to test this hypothesis. We found that graspable food items and tools were judged to be similar in their action-related properties and that the location, magnitude, and patterns of neural responses for images of graspable food items were similar in profile to the responses for tool stimuli. Our findings suggest that food selectivity may reflect the behavioral affordances of food items rather than a distinct form of category selectivity.
Affiliation(s)
- John Brendan Ritchie
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, 10 Center Drive, Bethesda, MD 20892, United States
- Spencer T Andrews
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, 10 Center Drive, Bethesda, MD 20892, United States
- Harvard Law School, Harvard University, 1585 Massachusetts Ave, Cambridge, MA 02138, United States
- Maryam Vaziri-Pashkam
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, 10 Center Drive, Bethesda, MD 20892, United States
- Department of Psychological and Brain Sciences, University of Delaware, 434 Wolf Hall, Newark, DE 19716, United States
- Chris I Baker
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, 10 Center Drive, Bethesda, MD 20892, United States
3. Nah JC, Malcolm GL, Shomstein S. Task-irrelevant semantic relationship between objects and scene influence attentional allocation. Sci Rep 2024;14:13175. PMID: 38849398; PMCID: PMC11161465; DOI: 10.1038/s41598-024-62867-6.
Abstract
Recent behavioral evidence suggests that the semantic relationships between isolated objects can influence attentional allocation, with highly semantically related objects showing an increase in processing efficiency. This semantic influence is present even when it is task-irrelevant (i.e., when semantic information is not central to the task). However, given that objects exist within larger contexts, i.e., scenes, it is critical to understand whether the semantic relationship between a scene and its objects continuously influences attention. Here, we investigated the influence of task-irrelevant scene semantic properties on attentional allocation and the degree to which semantic relationships between scenes and objects interact. Results suggest that task-irrelevant associations between scenes and objects continuously influence attention and that this influence is directly predicted by the perceived strength of semantic associations.
Affiliation(s)
- Sarah Shomstein
- Department of Psychological and Brain Sciences, The George Washington University, Washington, DC, USA
4. Wegner-Clemens K, Malcolm GL, Shomstein S. Predicting attentional allocation in real-world environments: The need to investigate crossmodal semantic guidance. Wiley Interdiscip Rev Cogn Sci 2024;15:e1675. PMID: 38243393; DOI: 10.1002/wcs.1675.
Abstract
Real-world environments are multisensory, meaningful, and highly complex. To parse these environments efficiently, a subset of this information must be selected both within and across modalities. However, the bulk of attention research has been conducted within individual sensory modalities, with a particular focus on vision. Visual attention research has made great strides, with over a century of research methodically identifying the underlying mechanisms that allow us to select critical visual information. Spatial attention, attention to features, and object-based attention have all been studied extensively. More recently, research has established semantics (meaning) as a key component of attentional allocation in real-world scenes, with the meaning of an item or environment affecting visual attentional selection. However, a full understanding of how semantic information modulates real-world attention requires studying more than vision in isolation. The world provides semantic information across all senses, but with this extra information comes greater complexity. Here, we summarize research on visual attention (including semantic-based visual attention) and crossmodal attention, and argue for the importance of studying crossmodal semantic guidance of attention. This article is categorized under: Psychology > Attention; Psychology > Perception and Psychophysics.
Affiliation(s)
- Kira Wegner-Clemens
- Psychological and Brain Sciences, George Washington University, Washington, DC, USA
- Sarah Shomstein
- Psychological and Brain Sciences, George Washington University, Washington, DC, USA
5. Westebbe L, Liang Y, Blaser E. The Accuracy and Precision of Memory for Natural Scenes: A Walk in the Park. Open Mind (Camb) 2024;8:131-147. PMID: 38435706; PMCID: PMC10898787; DOI: 10.1162/opmi_a_00122.
Abstract
It is challenging to quantify the accuracy and precision of scene memory because it is unclear what 'space' scenes occupy (how can we quantify error when misremembering a natural scene?). To address this, we exploited the ecologically valid, metric space in which scenes occur and are represented: routes. In a delayed estimation task, participants briefly saw a target scene drawn from a video of an outdoor 'route loop', then used a continuous report wheel of the route to pinpoint the scene. Accuracy was high and unbiased, indicating there was no net boundary extension/contraction. Interestingly, precision was higher for routes that were more self-similar (as characterized by the half-life, in meters, of a route's Multiscale Structural Similarity index), consistent with previous work finding a 'similarity advantage' where memory precision is regulated according to task demands. Overall, scenes were remembered to within a few meters of their actual location.
Affiliation(s)
- Leo Westebbe
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
- Yibiao Liang
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
- Erik Blaser
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
6. Ritchie JB, Andrews S, Vaziri-Pashkam M, Baker CI. Graspable foods and tools elicit similar responses in visual cortex. bioRxiv [Preprint] 2024:2024.02.20.581258. PMID: 38529495; PMCID: PMC10962699; DOI: 10.1101/2024.02.20.581258.
Abstract
Extrastriatal visual cortex is known to exhibit distinct response profiles to complex stimuli of varying ecological importance (e.g., faces, scenes, and tools). The dominant interpretation of these effects is that they reflect activation of distinct "category-selective" brain regions specialized to represent these and other stimulus categories. We sought to explore an alternative perspective: that the response to these stimuli is determined less by whether they form distinct categories, and more by their relevance to different forms of natural behavior. In this regard, food is an interesting test case, since it is primarily distinguished from other objects by its edibility, not its appearance, and there is evidence of food-selectivity in human visual cortex. Food is also associated with a common behavior, eating, and food consumption typically involves the manipulation of food, often with the hands. In this context, food items share many properties with tools: they are graspable objects that we manipulate in self-directed and stereotyped forms of action. Thus, food items may be preferentially represented in extrastriatal visual cortex in part because of these shared affordance properties, rather than because they reflect a wholly distinct kind of category. We conducted fMRI and behavioral experiments to test this hypothesis. We found that behaviorally graspable food items and tools were judged to be similar in their action-related properties, and that the location, magnitude, and patterns of neural responses for images of graspable food items were similar in profile to the responses for tool stimuli. Our findings suggest that food-selectivity may reflect the behavioral affordances of food items rather than a distinct form of category-selectivity.
Affiliation(s)
- J. Brendan Ritchie
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, MD, USA
- Spencer Andrews
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, MD, USA
- Maryam Vaziri-Pashkam
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, MD, USA
- Department of Psychological and Brain Sciences, University of Delaware, Newark, DE, USA
- Christopher I. Baker
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, MD, USA
7. Zhou Z, Geng JJ. Learned associations serve as target proxies during difficult but not easy visual search. Cognition 2024;242:105648. PMID: 37897882; DOI: 10.1016/j.cognition.2023.105648.
Abstract
The target template contains information in memory that is used to guide attention during visual search and is typically thought of as containing features of the actual target object. However, when targets are hard to find, it is advantageous to use other information in the visual environment that is predictive of the target's location to help guide attention. The purpose of these studies was to test if newly learned associations between face and scene category images lead observers to use scene information as a proxy for the face target. Our results showed that scene information was used as a proxy for the target to guide attention but only when the target face was difficult to discriminate from the distractor face; when the faces were easy to distinguish, attention was no longer guided by the scene unless the scene was presented earlier. The results suggest that attention is flexibly guided by both target features as well as features of objects that are predictive of the target location. The degree to which each contributes to guiding attention depends on the efficiency with which that information can be used to decode the location of the target in the current moment. The results contribute to the view that attentional guidance is highly flexible in its use of information to rapidly locate the target.
Affiliation(s)
- Zhiheng Zhou
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA 95618, USA
- Joy J Geng
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA 95618, USA; Department of Psychology, University of California, One Shields Ave, Davis, CA 95616, USA
8. Peelen MV, Berlot E, de Lange FP. Predictive processing of scenes and objects. Nat Rev Psychol 2024;3:13-26. PMID: 38989004; PMCID: PMC7616164; DOI: 10.1038/s44159-023-00254-0.
Abstract
Real-world visual input consists of rich scenes that are meaningfully composed of multiple objects which interact in complex, but predictable, ways. Despite this complexity, we recognize scenes, and objects within these scenes, from a brief glance at an image. In this review, we synthesize recent behavioral and neural findings that elucidate the mechanisms underlying this impressive ability. First, we review evidence that visual object and scene processing is partly implemented in parallel, allowing for a rapid initial gist of both objects and scenes concurrently. Next, we discuss recent evidence for bidirectional interactions between object and scene processing, with scene information modulating the visual processing of objects, and object information modulating the visual processing of scenes. Finally, we review evidence that objects also combine with each other to form object constellations, modulating the processing of individual objects within the object pathway. Altogether, these findings can be understood by conceptualizing object and scene perception as the outcome of a joint probabilistic inference, in which "best guesses" about objects act as priors for scene perception and vice versa, in order to concurrently optimize visual inference of objects and scenes.
Affiliation(s)
- Marius V Peelen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Eva Berlot
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Floris P de Lange
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
9. Ullman S, Assif L, Strugatski A, Vatashsky BZ, Levi H, Netanyahu A, Yaari A. Human-like scene interpretation by a guided counterstream processing. Proc Natl Acad Sci U S A 2023;120:e2211179120. PMID: 37769256; PMCID: PMC10556630; DOI: 10.1073/pnas.2211179120.
Abstract
In modeling vision, there has been remarkable progress in recognizing a range of scene components, but the problem of analyzing full scenes, an ultimate goal of visual perception, is still largely open. To deal with complete scenes, recent work focused on the training of models for extracting the full graph-like structure of a scene. In contrast with scene graphs, humans' scene perception focuses on selected structures in the scene, starting with a limited interpretation and evolving sequentially in a goal-directed manner [G. L. Malcolm, I. I. A. Groen, C. I. Baker, Trends. Cogn. Sci. 20, 843-856 (2016)]. Guidance is crucial throughout scene interpretation since the extraction of full scene representation is often infeasible. Here, we present a model that performs human-like guided scene interpretation, using an iterative bottom-up, top-down processing, in a "counterstream" structure motivated by cortical circuitry. The process proceeds by the sequential application of top-down instructions that guide the interpretation process. The results show how scene structures of interest to the viewer are extracted by an automatically selected sequence of top-down instructions. The model shows two further benefits. One is an inherent capability to deal well with the problem of combinatorial generalization-generalizing broadly to unseen scene configurations, which is limited in current network models [B. Lake, M. Baroni, 35th International Conference on Machine Learning, ICML 2018 (2018)]. The second is the ability to combine visual with nonvisual information at each cycle of the interpretation process, which is a key aspect for modeling human perception as well as advancing AI vision systems.
Affiliation(s)
- Shimon Ullman
- Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel
- Liav Assif
- Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel
- Alona Strugatski
- Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel
- Ben-Zion Vatashsky
- Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel
- Hila Levi
- Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel
- Aviv Netanyahu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
- Adam Yaari
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
10. Steel A, Garcia BD, Goyal K, Mynick A, Robertson CE. Scene Perception and Visuospatial Memory Converge at the Anterior Edge of Visually Responsive Cortex. J Neurosci 2023;43:5723-5737. PMID: 37474310; PMCID: PMC10401646; DOI: 10.1523/jneurosci.2043-22.2023.
Abstract
To fluidly engage with the world, our brains must simultaneously represent both the scene in front of us and our memory of the immediate surrounding environment (i.e., local visuospatial context). How does the brain's functional architecture enable sensory and mnemonic representations to closely interface while also avoiding sensory-mnemonic interference? Here, we asked this question using first-person, head-mounted virtual reality and fMRI. Using virtual reality, human participants of both sexes learned a set of immersive, real-world visuospatial environments in which we systematically manipulated the extent of visuospatial context associated with a scene image in memory across three learning conditions, spanning from a single field of view (FOV) to a city street. We used individualized, within-subject fMRI to determine which brain areas support memory of the visuospatial context associated with a scene during recall (Experiment 1) and recognition (Experiment 2). Across the whole brain, activity in three patches of cortex was modulated by the amount of known visuospatial context, each located immediately anterior to one of the three scene perception areas of high-level visual cortex. Individual subject analyses revealed that these anterior patches corresponded to three functionally defined place memory areas, which selectively respond when visually recalling personally familiar places. In addition to showing activity levels that were modulated by the amount of visuospatial context, multivariate analyses showed that these anterior areas represented the identity of the specific environment being recalled. Together, these results suggest a convergence zone for scene perception and memory of the local visuospatial context at the anterior edge of high-level visual cortex.
Significance Statement: As we move through the world, the visual scene around us is integrated with our memory of the wider visuospatial context. Here, we sought to understand how the functional architecture of the brain enables coexisting representations of the current visual scene and memory of the surrounding environment. Using a combination of immersive virtual reality and fMRI, we show that memory of visuospatial context outside the current FOV is represented in a distinct set of brain areas immediately anterior and adjacent to the perceptually oriented scene-selective areas of high-level visual cortex. This functional architecture would allow efficient interaction between immediately adjacent mnemonic and perceptual areas while also minimizing interference between mnemonic and perceptual representations.
Affiliation(s)
- Adam Steel
- Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire 03755
- Brenda D Garcia
- Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire 03755
- Kala Goyal
- Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire 03755
- Anna Mynick
- Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire 03755
- Caroline E Robertson
- Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire 03755
11. Schmid AC, Barla P, Doerschner K. Material category of visual objects computed from specular image structure. Nat Hum Behav 2023. PMID: 37386108; PMCID: PMC10365995; DOI: 10.1038/s41562-023-01601-0.
Abstract
Recognizing materials and their properties visually is vital for successful interactions with our environment, from avoiding slippery floors to handling fragile objects. Yet there is no simple mapping of retinal image intensities to physical properties. Here, we investigated what image information drives material perception by collecting human psychophysical judgements about complex glossy objects. Variations in specular image structure-produced either by manipulating reflectance properties or visual features directly-caused categorical shifts in material appearance, suggesting that specular reflections provide diagnostic information about a wide range of material classes. Perceived material category appeared to mediate cues for surface gloss, providing evidence against a purely feedforward view of neural processing. Our results suggest that the image structure that triggers our perception of surface gloss plays a direct role in visual categorization, and that the perception and neural processing of stimulus properties should be studied in the context of recognition, not in isolation.
Affiliation(s)
- Alexandra C Schmid
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany
- Katja Doerschner
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany
12. Coggan DD, Tong F. Spikiness and animacy as potential organizing principles of human ventral visual cortex. Cereb Cortex 2023;33:8194-8217. PMID: 36958809; PMCID: PMC10321104; DOI: 10.1093/cercor/bhad108.
Abstract
Considerable research has been devoted to understanding the fundamental organizing principles of the ventral visual pathway. A recent study revealed a series of 3-4 topographical maps arranged along the macaque inferotemporal (IT) cortex. The maps articulated a two-dimensional space based on the spikiness and animacy of visual objects, with "inanimate-spiky" and "inanimate-stubby" regions of the maps constituting two previously unidentified cortical networks. The goal of our study was to determine whether a similar functional organization might exist in human IT. To address this question, we presented the same object stimuli and images from "classic" object categories (bodies, faces, houses) to humans while recording fMRI activity at 7 Tesla. Contrasts designed to reveal the spikiness-animacy object space evoked extensive significant activation across human IT. However, unlike the macaque, we did not observe a clear sequence of complete maps, and selectivity for the spikiness-animacy space was deeply and mutually entangled with category-selectivity. Instead, we observed multiple new stimulus preferences in category-selective regions, including functional sub-structure related to object spikiness in scene-selective cortex. Taken together, these findings highlight spikiness as a promising organizing principle of human IT and provide new insights into the role of category-selective regions in visual object processing.
Affiliation(s)
- David D Coggan
- Department of Psychology, Vanderbilt University, 111 21st Ave S, Nashville, TN 37240, United States
- Frank Tong
- Department of Psychology, Vanderbilt University, 111 21st Ave S, Nashville, TN 37240, United States
13. Wiesmann SL, Võ MLH. Disentangling diagnostic object properties for human scene categorization. Sci Rep 2023;13:5912. PMID: 37041222; PMCID: PMC10090043; DOI: 10.1038/s41598-023-32385-y.
Abstract
It usually only takes a single glance to categorize our environment into different scene categories (e.g. a kitchen or a highway). Object information has been suggested to play a crucial role in this process, and some proposals even claim that the recognition of a single object can be sufficient to categorize the scene around it. Here, we tested this claim in four behavioural experiments by having participants categorize real-world scene photographs that were reduced to a single, cut-out object. We show that single objects can indeed be sufficient for correct scene categorization and that scene category information can be extracted within 50 ms of object presentation. Furthermore, we identified object frequency and specificity for the target scene category as the most important object properties for human scene categorization. Interestingly, despite the statistical definition of specificity and frequency, human ratings of these properties were better predictors of scene categorization behaviour than more objective statistics derived from databases of labelled real-world images. Taken together, our findings support a central role of object information during human scene categorization, showing that single objects can be indicative of a scene category if they are assumed to frequently and exclusively occur in a certain environment.
Affiliation(s)
- Sandro L Wiesmann
- Department of Psychology, Johann Wolfgang Goethe-Universität, Theodor-W.-Adorno-Platz 6, 60323 Frankfurt am Main, Germany
- Melissa L-H Võ
- Department of Psychology, Johann Wolfgang Goethe-Universität, Theodor-W.-Adorno-Platz 6, 60323 Frankfurt am Main, Germany
14. Cheng A, Chen Z, Dilks DD. A stimulus-driven approach reveals vertical luminance gradient as a stimulus feature that drives human cortical scene selectivity. Neuroimage 2023;269:119935. PMID: 36764369; PMCID: PMC10044493; DOI: 10.1016/j.neuroimage.2023.119935.
Abstract
Human neuroimaging studies have revealed a dedicated cortical system for visual scene processing. But what is a "scene"? Here, we use a stimulus-driven approach to identify a stimulus feature that selectively drives cortical scene processing. Specifically, using fMRI data from BOLD5000, we examined the images that elicited the greatest response in the cortical scene processing system, and found that there is a common "vertical luminance gradient" (VLG), with the top half of a scene image brighter than the bottom half; moreover, across the entire set of images, VLG systematically increases with the neural response in the scene-selective regions (Study 1). Thus, we hypothesized that VLG is a stimulus feature that selectively engages cortical scene processing, and directly tested the role of VLG in driving cortical scene selectivity using tightly controlled VLG stimuli (Study 2). Consistent with our hypothesis, we found that the scene-selective cortical regions-but not an object-selective region or early visual cortex-responded significantly more to images of VLG over control stimuli with minimal VLG. Interestingly, such selectivity was also found for images with an "inverted" VLG, resembling the luminance gradient in night scenes. Finally, we also tested the behavioral relevance of VLG for visual scene recognition (Study 3); we found that participants even categorized tightly controlled stimuli of both upright and inverted VLG to be a place more than an object, indicating that VLG is also used for behavioral scene recognition. Taken together, these results reveal that VLG is a stimulus feature that selectively engages cortical scene processing, and provide evidence for a recent proposal that visual scenes can be characterized by a set of common and unique visual features.
Affiliation(s)
- Annie Cheng: Department of Psychology, Emory University, Atlanta, GA, USA; Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Zirui Chen: Department of Psychology, Emory University, Atlanta, GA, USA; Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA
- Daniel D Dilks: Department of Psychology, Emory University, Atlanta, GA, USA

15
Anderson MD, Elder JH, Graf EW, Adams WJ. The time-course of real-world scene perception: Spatial and semantic processing. iScience 2022; 25:105633. [PMID: 36505927 PMCID: PMC9732406 DOI: 10.1016/j.isci.2022.105633]
Abstract
Real-world scene perception unfolds remarkably quickly, yet the underlying visual processes are poorly understood. Space-centered theory maintains that a scene's spatial structure (e.g., openness, mean depth) can be rapidly recovered from low-level image statistics. In turn, the statistical relationship between a scene's spatial properties and semantic content allows for semantic identity to be inferred from its layout. We tested this theory by investigating (1) the temporal dynamics of spatial and semantic perception in real-world scenes, and (2) dependencies between spatial and semantic judgments. Participants viewed backward-masked images for 13.3 to 106.7 ms, and identified the semantic (e.g., beach, road) or spatial structure (e.g., open, closed-off) category. We found no temporal precedence of spatial discrimination relative to semantic discrimination. Computational analyses further suggest that, instead of using spatial layout to infer semantic categories, humans exploit semantic information to discriminate spatial structure categories. These findings challenge traditional 'bottom-up' views of scene perception.
Affiliation(s)
- Matt D. Anderson: Centre for Perception and Cognition, Psychology, University of Southampton, Southampton, UK
- James H. Elder: Centre for Vision Research, Department of Psychology, Department of Electrical Engineering and Computer Science, York University, Toronto, Canada
- Erich W. Graf: Centre for Perception and Cognition, Psychology, University of Southampton, Southampton, UK
- Wendy J. Adams: Centre for Perception and Cognition, Psychology, University of Southampton, Southampton, UK

16
Nuthmann A, Thibaut M, Tran THC, Boucart M. Impact of neovascular age-related macular degeneration on eye-movement control during scene viewing: Viewing biases and guidance by visual salience. Vision Res 2022; 201:108105. [PMID: 36081228 DOI: 10.1016/j.visres.2022.108105]
Abstract
Human vision requires us to analyze the visual periphery to decide where to fixate next. In the present study, we investigated this process in people with age-related macular degeneration (AMD). In particular, we examined viewing biases and the extent to which visual salience guides fixation selection during free-viewing of naturalistic scenes. We used an approach combining generalized linear mixed modeling (GLMM) with a-priori scene parcellation. This method allows one to investigate group differences in terms of scene coverage and observers' well-known tendency to look at the center of scene images. Moreover, it allows for testing whether image salience influences fixation probability above and beyond what can be accounted for by the central bias. Compared with age-matched normally sighted control subjects (and young subjects), AMD patients' viewing behavior was less exploratory, with a stronger central fixation bias. All three subject groups showed a salience effect on fixation selection-higher-salience scene patches were more likely to be fixated. Importantly, the salience effect for the AMD group was of similar size as the salience effect for the control group, suggesting that guidance by visual salience was still intact. The variances for by-subject random effects in the GLMM indicated substantial individual differences. A separate model exclusively considered the AMD data and included fixation stability as a covariate, with the results suggesting that reduced fixation stability was associated with a reduced impact of visual salience on fixation selection.
Affiliation(s)
- Antje Nuthmann: Institute of Psychology, University of Kiel, Kiel, Germany
- Miguel Thibaut: University of Lille, Lille Neuroscience & Cognition, INSERM, Lille, France
- Thi Ha Chau Tran: University of Lille, Lille Neuroscience & Cognition, INSERM, Lille, France; Ophthalmology Department, Lille Catholic Hospital, Catholic University of Lille, Lille, France
- Muriel Boucart: University of Lille, Lille Neuroscience & Cognition, INSERM, Lille, France

17
Hayes TR, Henderson JM. Scene inversion reveals distinct patterns of attention to semantically interpreted and uninterpreted features. Cognition 2022; 229:105231. [DOI: 10.1016/j.cognition.2022.105231]
18
Sabra Z, Alawieh A, Bonilha L, Naselaris T, AuYong N. Modulation of Spectral Representation and Connectivity Patterns in Response to Visual Narrative in the Human Brain. Front Hum Neurosci 2022; 16:886938. [PMID: 36277048 PMCID: PMC9582122 DOI: 10.3389/fnhum.2022.886938]
Abstract
The regional brain networks and the underlying neurophysiological mechanisms subserving the cognition of visual narrative in humans have largely been studied with non-invasive brain recording. In this study, we specifically investigated how regional and cross-regional cortical activities support visual narrative interpretation, using intracranial stereotactic electroencephalogram recordings from thirteen human subjects (6 females and 7 males). Widely distributed recording sites across the brain were sampled while subjects were explicitly instructed to observe images from fables presented in “sequential” order, and a set of images drawn from multiple fables presented in “scrambled” order. Broadband activity mainly within the frontal and temporal lobes was found to encode whether a presented image is part of a visual narrative (sequential) or a random image set (scrambled). Moreover, the temporal lobe exhibited strong activation in response to visual narratives, while the frontal lobe was more engaged when contextually novel stimuli were presented. We also investigated the dynamics of interregional interactions between visual narratives and contextually novel series of images. Interestingly, interregional connectivity was also altered between sequential and scrambled sequences. Together, these results suggest that both changes in regional neuronal activity and cross-regional interactions subserve visual narrative and contextual novelty processing.
Affiliation(s)
- Zahraa Sabra: Department of Neurosurgery, Emory University, Atlanta, GA, United States
- Ali Alawieh: Department of Neurosurgery, Emory University, Atlanta, GA, United States
- Leonardo Bonilha: Department of Neurology, Medical University of South Carolina, Charleston, SC, United States
- Thomas Naselaris: Department of Neuroscience, University of Minnesota, Minneapolis, MN, United States
- Nicholas AuYong (correspondence): Department of Neurosurgery, Emory University, Atlanta, GA, United States

19
Broers N, Bainbridge WA, Michel R, Balestrieri E, Busch NA. The extent and specificity of visual exploration determines the formation of recollected memories in complex scenes. J Vis 2022; 22:9. [PMID: 36227616 PMCID: PMC9583750 DOI: 10.1167/jov.22.11.9]
Abstract
Our visual memories of complex scenes often appear as robust, detailed records of the past. Several studies have demonstrated that active exploration with eye movements improves recognition memory for scenes, but it is unclear whether this improvement is due to stronger feelings of familiarity or more detailed recollection. We related the extent and specificity of fixation patterns at encoding and retrieval to different recognition decisions in an incidental memory paradigm. After incidental encoding of 240 real-world scene photographs, participants (N = 44) answered a surprise memory test by reporting whether an image was new, remembered (indicating recollection), or just known to be old (indicating familiarity). To assess the specificity of their visual memories, we devised a novel report procedure in which participants selected the scene region that they specifically recollected, that appeared most familiar, or that was particularly new to them. At encoding, when considering the entire scene, subsequently recollected compared to familiar or forgotten scenes showed a larger number of fixations that were more broadly distributed, suggesting that more extensive visual exploration determines stronger and more detailed memories. However, when considering only the memory-relevant image areas, fixations were more dense and more clustered for subsequently recollected compared to subsequently familiar scenes. At retrieval, the extent of visual exploration was more restricted for recollected compared to new or forgotten scenes, with a smaller number of fixations. Importantly, fixation density and clustering were greater in memory-relevant areas for recollected versus familiar or falsely recognized images. Our findings suggest that more extensive visual exploration across the entire scene, with a subset of more focal and dense fixations in specific image areas, leads to increased potential for recollecting specific image aspects.
Affiliation(s)
- Nico Broers: Institute of Psychology, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- René Michel: Institute of Psychology, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Elio Balestrieri: Institute of Psychology, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Niko A Busch: Institute of Psychology, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany

20
Helbing J, Draschkow D, Võ ML-H. Auxiliary Scene-Context Information Provided by Anchor Objects Guides Attention and Locomotion in Natural Search Behavior. Psychol Sci 2022; 33:1463-1476. [PMID: 35942922 DOI: 10.1177/09567976221091838]
Abstract
Successful adaptive behavior requires efficient attentional and locomotive systems. Previous research has thoroughly investigated how we achieve this efficiency during natural behavior by exploiting prior knowledge related to targets of our actions (e.g., attending to metallic targets when looking for a pot) and to the environmental context (e.g., looking for the pot in the kitchen). Less is known about whether and how individual nontarget components of the environment support natural behavior. In our immersive virtual reality task, 24 adult participants searched for objects in naturalistic scenes in which we manipulated the presence and arrangement of large, static objects that anchor predictions about targets (e.g., the sink provides a prediction for the location of the soap). Our results show that gaze and body movements in this naturalistic setting are strongly guided by these anchors. These findings demonstrate that objects auxiliary to the target are incorporated into the representations guiding attention and locomotion.
Affiliation(s)
- Jason Helbing: Scene Grammar Lab, Department of Psychology, Goethe University Frankfurt
- Dejan Draschkow: Brain and Cognition Laboratory, Department of Experimental Psychology, University of Oxford; Oxford Centre for Human Brain Activity, Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, University of Oxford
- Melissa L-H Võ: Scene Grammar Lab, Department of Psychology, Goethe University Frankfurt

21
|
Cappello EM, Lettieri G, Malizia AP, d'Arcangelo S, Handjaras G, Lattanzi N, Ricciardi E, Cecchetti L. The Contribution of Shape Features and Demographic Variables to Disembedding Abilities. Front Psychol 2022; 13:798871. [PMID: 35422741 PMCID: PMC9004388 DOI: 10.3389/fpsyg.2022.798871]
Abstract
Humans naturally perceive visual patterns in a global manner and are remarkably capable of extracting object shapes based on properties such as proximity, closure, symmetry, and good continuation. Notwithstanding the role of these properties in perceptual grouping, studies highlighted differences in disembedding performance across individuals, which are summarized by the field dependence dimension. Evidence suggests that age and educational attainment explain part of this variability, whereas the role of sex is still highly debated. Also, which stimulus features primarily influence inter-individual variations in perceptual grouping has still to be fully determined. Building upon these premises, we assessed the role of age, education level, and sex on performance at the Leuven Embedded Figure Test—a proxy of disembedding abilities—in 391 cisgender individuals. We also investigated to what extent shape symmetry, closure, complexity, and continuation relate to task accuracy. Overall, target asymmetry, closure, and good continuation with the embedding context increase task difficulty. Simpler shapes are more difficult to detect than those with more lines, yet context complexity impairs the recognition of complex targets (i.e., those with 6 lines or more) to a greater extent. Concerning demographic data, we confirm that age and educational attainment are significantly associated with disembedding abilities and reveal a perceptual advantage in males. In summary, our study further highlights the role of shape properties in disembedding performance and unveils sex differences not reported so far.
Affiliation(s)
- Elisa Morgana Cappello: Social and Affective Neuroscience (SANe) group, MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Giada Lettieri: Social and Affective Neuroscience (SANe) group, MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Sonia d'Arcangelo: Intesa Sanpaolo Innovation Center SpA, Neuroscience Lab, Torino, Italy
- Giacomo Handjaras: Social and Affective Neuroscience (SANe) group, MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Nicola Lattanzi: Laboratory for the Analysis of CompleX Economic Systems, IMT School for Advanced Studies Lucca, Lucca, Italy
- Emiliano Ricciardi: Molecular Mind Laboratory, IMT School for Advanced Studies Lucca, Lucca, Italy
- Luca Cecchetti: Social and Affective Neuroscience (SANe) group, MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy

22
|
Hayes TR, Henderson JM. Meaning maps detect the removal of local semantic scene content but deep saliency models do not. Atten Percept Psychophys 2022; 84:647-654. [PMID: 35138579 PMCID: PMC11128357 DOI: 10.3758/s13414-021-02395-x]
Abstract
Meaning mapping uses human raters to estimate different semantic features in scenes, and has been a useful tool in demonstrating the important role semantics play in guiding attention. However, recent work has argued that meaning maps do not capture semantic content, but like deep learning models of scene attention, represent only semantically-neutral image features. In the present study, we directly tested this hypothesis using a diffeomorphic image transformation that is designed to remove the meaning of an image region while preserving its image features. Specifically, we tested whether meaning maps and three state-of-the-art deep learning models were sensitive to the loss of semantic content in this critical diffeomorphed scene region. The results were clear: meaning maps generated by human raters showed a large decrease in the diffeomorphed scene regions, while all three deep saliency models showed a moderate increase in the diffeomorphed scene regions. These results demonstrate that meaning maps reflect local semantic content in scenes while deep saliency models do something else. We conclude the meaning mapping approach is an effective tool for estimating semantic content in scenes.
Affiliation(s)
- Taylor R Hayes: Center for Mind and Brain, University of California, Davis, CA, USA
- John M Henderson: Center for Mind and Brain, University of California, Davis, CA, USA; Department of Psychology, University of California, Davis, CA, USA

23
|
Anami BS, Sagarnal CV. Influence of Different Activation Functions on Deep Learning Models in Indoor Scene Images Classification. Pattern Recognition and Image Analysis 2022. [DOI: 10.1134/s1054661821040039]
24
Konkle T, Alvarez GA. A self-supervised domain-general learning framework for human ventral stream representation. Nat Commun 2022; 13:491. [PMID: 35078981 PMCID: PMC8789817 DOI: 10.1038/s41467-022-28091-4]
Abstract
Anterior regions of the ventral visual stream encode substantial information about object categories. Are top-down category-level forces critical for arriving at this representation, or can this representation be formed purely through domain-general learning of natural image structure? Here we present a fully self-supervised model which learns to represent individual images, rather than categories, such that views of the same image are embedded nearby in a low-dimensional feature space, distinctly from other recently encountered views. We find that category information implicitly emerges in the local similarity structure of this feature space. Further, these models learn hierarchical features which capture the structure of brain responses across the human ventral visual stream, on par with category-supervised models. These results provide computational support for a domain-general framework guiding the formation of visual representation, where the proximate goal is not explicitly about category information, but is instead to learn unique, compressed descriptions of the visual world.
Affiliation(s)
- Talia Konkle: Department of Psychology & Center for Brain Science, Harvard University, Cambridge, MA, USA
- George A Alvarez: Department of Psychology & Center for Brain Science, Harvard University, Cambridge, MA, USA

25
|
Harel A, Nador JD, Bonner MF, Epstein RA. Early Electrophysiological Markers of Navigational Affordances in Scenes. J Cogn Neurosci 2021; 34:397-410. [PMID: 35015877 DOI: 10.1162/jocn_a_01810]
Abstract
Scene perception and spatial navigation are interdependent cognitive functions, and there is increasing evidence that cortical areas that process perceptual scene properties also carry information about the potential for navigation in the environment (navigational affordances). However, the temporal stages by which visual information is transformed into navigationally relevant information are not yet known. We hypothesized that navigational affordances are encoded during perceptual processing and therefore should modulate early visually evoked ERPs, especially the scene-selective P2 component. To test this idea, we recorded ERPs from participants while they passively viewed computer-generated room scenes matched in visual complexity. By simply changing the number of doors (no doors, 1 door, 2 doors, 3 doors), we were able to systematically vary the number of pathways that afford movement in the local environment, while keeping the overall size and shape of the environment constant. We found that rooms with no doors evoked a higher P2 response than rooms with three doors, consistent with prior research reporting higher P2 amplitude to closed relative to open scenes. Moreover, we found P2 amplitude scaled linearly with the number of doors in the scenes. Navigability effects on the ERP waveform were also observed in a multivariate analysis, which showed significant decoding of the number of doors and their location at earlier time windows. Together, our results suggest that navigational affordances are represented in the early stages of scene perception. This complements research showing that the occipital place area automatically encodes the structure of navigable space and strengthens the link between scene perception and navigation.
26
|
Groen IIA, Dekker TM, Knapen T, Silson EH. Visuospatial coding as ubiquitous scaffolding for human cognition. Trends Cogn Sci 2021; 26:81-96. [PMID: 34799253 DOI: 10.1016/j.tics.2021.10.011]
Abstract
For more than 100 years we have known that the visual field is mapped onto the surface of visual cortex, imposing an inherently spatial reference frame on visual information processing. Recent studies highlight visuospatial coding not only throughout visual cortex, but also brain areas not typically considered visual. Such widespread access to visuospatial coding raises important questions about its role in wider cognitive functioning. Here, we synthesise these recent developments and propose that visuospatial coding scaffolds human cognition by providing a reference frame through which neural computations interface with environmental statistics and task demands via perception-action loops.
Affiliation(s)
- Iris I A Groen: Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
- Tessa M Dekker: Institute of Ophthalmology, University College London, London, UK
- Tomas Knapen: Behavioral and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Spinoza Centre for NeuroImaging, Royal Dutch Academy of Sciences, Amsterdam, The Netherlands
- Edward H Silson: Department of Psychology, School of Philosophy, Psychology & Language Sciences, University of Edinburgh, Edinburgh, UK

27
|
Wurm MF, Caramazza A. Two 'what' pathways for action and object recognition. Trends Cogn Sci 2021; 26:103-116. [PMID: 34702661 DOI: 10.1016/j.tics.2021.10.003]
Abstract
The ventral visual stream is conceived as a pathway for object recognition. However, we also recognize the actions an object can be involved in. Here, we show that action recognition critically depends on a pathway in lateral occipitotemporal cortex, partially overlapping and topographically aligned with object representations that are precursors for action recognition. By contrast, object features that are more relevant for object recognition, such as color and texture, are typically found in ventral occipitotemporal cortex. We argue that occipitotemporal cortex contains similarly organized lateral and ventral 'what' pathways for action and object recognition, respectively. This account explains a number of observed phenomena, such as the duplication of object domains and the specific representational profiles in lateral and ventral cortex.
Affiliation(s)
- Moritz F Wurm: Center for Mind/Brain Sciences - CIMeC, University of Trento, Corso Bettini 31, 38068 Rovereto, Italy
- Alfonso Caramazza: Center for Mind/Brain Sciences - CIMeC, University of Trento, Corso Bettini 31, 38068 Rovereto, Italy; Department of Psychology, Harvard University, 33 Kirkland St, Cambridge, MA 02138, USA

28
|
Miranda MD. The trace in the technique: Forensic science and the Connoisseur's gaze. Forensic Sci Int Synerg 2021; 3:100203. [PMID: 34632356 PMCID: PMC8493590 DOI: 10.1016/j.fsisyn.2021.100203]
Abstract
Both scientific art investigations and forensic investigations rely on observation, inferential reasoning, and analytical techniques to answer questions concerning identification, source, and activity. The forensic scientist and the art connoisseur evaluate the whole—a crime scene or work of art, respectively—and draw meaning from the often-overlooked details, or traces, contained therein. This manuscript considers the correlations between art connoisseurship and forensic science, first by outlining the history of connoisseurship, focusing on the detection and evaluation of traces through patient observation, reasoning, and comparison based on methods established by Giovanni Morelli in the nineteenth century. This article then explores connoisseurship and forensic science within the historical sciences framework, based on the process in which observable traces can be ordered to provide a reconstruction of unobservable past events. Finally, this article asserts that art can be used to shape and refine the scientist's practiced eye, thereby improving trace detection and interpretation in investigations.
Affiliation(s)
- Michelle D Miranda: Farmingdale State College, The State University of New York, United States

29
|
Çelik E, Keles U, Kiremitçi İ, Gallant JL, Çukur T. Cortical networks of dynamic scene category representation in the human brain. Cortex 2021; 143:127-147. [PMID: 34411847 DOI: 10.1016/j.cortex.2021.07.008]
Abstract
Humans have an impressive ability to rapidly process global information in natural scenes to infer their category. Yet, it remains unclear whether and how scene categories observed dynamically in the natural world are represented in cerebral cortex beyond a few canonical scene-selective areas. To address this question, here we examined the representation of dynamic visual scenes by recording whole-brain blood oxygenation level-dependent (BOLD) responses while subjects viewed natural movies. We fit voxelwise encoding models to estimate tuning for scene categories that reflect statistical ensembles of objects and actions in the natural world. We find that this scene-category model explains a significant portion of the response variance broadly across cerebral cortex. Cluster analysis of scene-category tuning profiles across cortex reveals nine spatially-segregated networks of brain regions consistently across subjects. These networks show heterogeneous tuning for a diverse set of dynamic scene categories related to navigation, human activity, social interaction, civilization, natural environment, non-human animals, motion-energy, and texture, suggesting that the organization of scene category representation is quite complex.
Affiliation(s)
- Emin Çelik: Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara, Turkey
- Umit Keles: National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara, Turkey; Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
- İbrahim Kiremitçi: Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara, Turkey
- Jack L Gallant: Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA; Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA; Department of Psychology, University of California, Berkeley, CA, USA
- Tolga Çukur: Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara, Turkey; Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey

30
|
Ruiz-Rizzo AL, Pruitt PJ, Finke K, Müller HJ, Damoiseaux JS. Lower-Resolution Retrieval of Scenes in Older Adults With Subjective Cognitive Decline. Arch Clin Neuropsychol 2021; 37:408-422. [PMID: 34342647 PMCID: PMC8865194 DOI: 10.1093/arclin/acab061]
Abstract
Objective: Scenes with more perceptual detail can help detect subtle memory deficits better than scenes with less detail. Here, we investigated whether older adults with subjective cognitive decline (SCD) show less brain activation and more memory deficits to scenes with more (vs. scenes with less) perceptual detail compared to controls (CON). Method: In 37 healthy older adults (SCD: 16), we measured blood oxygenation level-dependent functional magnetic resonance imaging during encoding and behavioral performance during retrieval. Results: During encoding, higher activation to scenes with more (vs. less) perceptual detail in the parahippocampal place area predicted better memory performance in SCD and CON. During retrieval, superior performance for new scenes with more (vs. less) perceptual detail was significantly more pronounced in CON than in SCD. Conclusions: Together, these results suggest a present, but attenuated, benefit from perceptual detail for memory retrieval in SCD. Memory complaints in SCD might, thus, refer to a decreased availability of perceptual detail of previously encoded stimuli.
Affiliation(s)
- Adriana L Ruiz-Rizzo
- Department of Psychology, General and Experimental Psychology Unit, Ludwig-Maximilians-Universität München, Munich, Germany
- Patrick J Pruitt
- Institute of Gerontology, Wayne State University, Detroit, MI, USA
- Kathrin Finke
- Department of Psychology, General and Experimental Psychology Unit, Ludwig-Maximilians-Universität München, Munich, Germany; Hans-Berger Department of Neurology, Jena University Hospital, Jena, Germany
- Hermann J Müller
- Department of Psychology, General and Experimental Psychology Unit, Ludwig-Maximilians-Universität München, Munich, Germany
- Jessica S Damoiseaux
- Institute of Gerontology, Wayne State University, Detroit, MI, USA; Department of Psychology, Wayne State University, Detroit, MI, USA
31
Tharmaratnam V, Patel M, Lowe MX, Cant JS. Shared cognitive mechanisms involved in the processing of scene texture and scene shape. J Vis 2021; 21:11. PMID: 34269793. PMCID: PMC8297417. DOI: 10.1167/jov.21.7.11.
Abstract
Recent research has demonstrated that the parahippocampal place area represents both the shape and texture features of scenes, with the importance of each feature varying according to perceived scene category. Namely, shape features are predominantly more diagnostic for the processing of artificial, human-made scenes, while shape and texture are equally diagnostic in natural scene processing. However, to date little is known regarding the degree of interactivity or independence observed in the processing of these scene features. Furthermore, manipulating the scope of visual attention (i.e., globally vs. locally) when processing ensembles of multiple objects—stimuli that share a functional neuroanatomical link with scenes—has been shown to affect their cognitive visual representation. It remains unknown whether manipulating the scope of attention impacts scene processing in a similar manner. Using the well-established Garner speeded-classification behavioral paradigm, we investigated the influence of both feature diagnosticity and the scope of visual attention on potential interactivity or independence in the shape and texture processing of artificial, human-made scenes. The results revealed asymmetric interference between scene shape and texture processing, with the more diagnostic feature (i.e., shape) interfering with the less diagnostic feature (i.e., texture), but not vice versa. Furthermore, this interference was attenuated with more local, and enhanced with more global, visual processing strategies. These findings suggest that scene shape and texture processing are mediated by shared cognitive mechanisms and that, although these representations are governed primarily by feature diagnosticity, they can nevertheless be influenced by the scope of visual attention.
Affiliation(s)
- Matthew X Lowe
- Graduate Program in Psychology, University of Toronto, Toronto, ON, Canada
- Jonathan S Cant
- Graduate Program in Psychology, University of Toronto, Toronto, ON, Canada; Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
32
Abstract
Every aspect of vision, from the opsin proteins to the eyes and the ways that they serve animal behavior, is incredibly diverse. It is only with an evolutionary perspective that this diversity can be understood and fully appreciated. In this review, I describe and explain the diversity at each level and try to convey an understanding of how the origin of the first opsin some 800 million years ago could initiate the avalanche that produced the astonishing diversity of eyes and vision that we see today. Despite the diversity, many types of photoreceptors, eyes, and visual roles have evolved multiple times independently in different animals, revealing a pattern of eye evolution strictly guided by functional constraints and driven by the evolution of gradually more demanding behaviors. I conclude the review by introducing a novel distinction between active and passive vision that points to uncharted territories in vision research.
Affiliation(s)
- Dan-E Nilsson
- Lund Vision Group, Department of Biology, Lund University, 22362 Lund, Sweden
33
Abstract
Categorization performance is a popular metric of scene recognition and understanding in behavioral and computational research. However, categorical constructs and their labels can be somewhat arbitrary. Derived from exhaustive vocabularies of place names (e.g., Deng et al., 2009), or the judgements of small groups of researchers (e.g., Fei-Fei, Iyer, Koch, & Perona, 2007), these categories may not correspond with human-preferred taxonomies. Here, we propose clustering by increasing the Rand index via coordinate ascent (CIRCA): an unsupervised, data-driven clustering method for deriving ground-truth scene categories. In Experiment 1, human participants organized 80 stereoscopic images of outdoor scenes from the Southampton-York Natural Scenes (SYNS) dataset (Adams et al., 2016) into discrete categories. In separate tasks, images were grouped according to i) semantic content, ii) three-dimensional spatial structure, or iii) two-dimensional image appearance. Participants provided text labels for each group. Using the CIRCA method, we determined the most representative category structure and then derived category labels for each task/dimension. In Experiment 2, we found that these categories generalized well to a larger set of SYNS images and to new observers. In Experiment 3, we tested the relationship between our category systems and the spatial envelope model (Oliva & Torralba, 2001). Finally, in Experiment 4, we validated CIRCA on a larger, independent dataset of same-different category judgements. The derived category systems outperformed the SUN taxonomy (Xiao, Hays, Ehinger, Oliva, & Torralba, 2010) and an alternative clustering method (Greene, 2019). In summary, we believe this novel categorization method can be applied to a wide range of datasets to derive optimal categorical groupings and labels from psychophysical judgements of stimulus similarity.
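The Rand index that CIRCA maximizes is a standard agreement score between two partitions of the same items: the fraction of item pairs on which the two groupings agree (grouped together in both, or apart in both). A minimal illustrative sketch of the metric itself (not the authors' CIRCA implementation, and the example labelings are invented):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Plain Rand index between two labelings of the same items.

    For every pair of items, the groupings agree if the pair is placed
    in the same cluster in both labelings, or in different clusters in
    both. The index is the fraction of agreeing pairs (1.0 = identical
    partitions up to relabeling).
    """
    assert len(labels_a) == len(labels_b)
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)

# Two hypothetical observers sorting four scenes: identical partitions
# (cluster names differ, structure is the same) score 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0
```

Because the index compares pair structure rather than labels, it is insensitive to how clusters are named, which is what makes it usable for comparing category systems produced by different participants.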
Affiliation(s)
- Matt D Anderson
- Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
- Erich W Graf
- Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
- James H Elder
- Centre for Vision Research, Department of Psychology, Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, Canada
- Krista A Ehinger
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Wendy J Adams
- Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
34
Abstract
The eye movement analysis with hidden Markov models (EMHMM) method provides quantitative measures of individual differences in eye-movement patterns. However, it is limited to tasks where stimuli have the same feature layout (e.g., faces). Here we proposed to combine EMHMM with the data mining technique co-clustering to discover participant groups with consistent eye-movement patterns across stimuli for tasks involving stimuli with different feature layouts. Through applying this method to eye movements in scene perception, we discovered explorative (switching between the foreground and background information or different regions of interest) and focused (mainly looking at the foreground with less switching) eye-movement patterns among Asian participants. Higher similarity to the explorative pattern predicted better foreground object recognition performance, whereas higher similarity to the focused pattern was associated with better feature integration in the flanker task. These results have important implications for using eye tracking as a window into individual differences in cognitive abilities and styles. Thus, EMHMM with co-clustering provides quantitative assessments of eye-movement patterns across stimuli and tasks. It can be applied to many other real-life visual tasks, making a significant impact on the use of eye tracking to study cognitive behavior across disciplines.
35
Kristjánsson Á, Draschkow D. Keeping it real: Looking beyond capacity limits in visual cognition. Atten Percept Psychophys 2021; 83:1375-1390. PMID: 33791942. PMCID: PMC8084831. DOI: 10.3758/s13414-021-02256-7.
Abstract
Research within visual cognition has made tremendous strides in uncovering the basic operating characteristics of the visual system by reducing the complexity of natural vision to artificial but well-controlled experimental tasks and stimuli. This reductionist approach has, for example, been used to assess the basic limitations of visual attention, visual working memory (VWM) capacity, and the fidelity of visual long-term memory (VLTM). The assessment of these limits is usually made in a pure sense, irrespective of goals, actions, and priors. While it is important to map out the bottlenecks our visual system faces, we focus here on selected examples of how such limitations can be overcome. Recent findings suggest that during more natural tasks, capacity may be higher than reductionist research suggests and that separable systems subserve different actions, such as reaching and looking, which might provide important insights about how pure attentional or memory limitations could be circumvented. We also review evidence suggesting that the closer we get to naturalistic behavior, the more we encounter implicit learning mechanisms that operate "for free" and "on the fly." These mechanisms provide a surprisingly rich visual experience, which can support capacity-limited systems. We speculate whether natural tasks may yield different estimates of the limitations of VWM, VLTM, and attention, and propose that capacity measurements should also pass the real-world test within naturalistic frameworks. Our review highlights various approaches for this and suggests that our understanding of visual cognition will benefit from incorporating the complexities of real-world cognition in experimental approaches.
Affiliation(s)
- Árni Kristjánsson
- School of Health Sciences, University of Iceland, Reykjavík, Iceland.
- School of Psychology, National Research University Higher School of Economics, Moscow, Russia.
- Dejan Draschkow
- Oxford Centre for Human Brain Activity, Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, University of Oxford, Oxford, UK.
36
Nuthmann A, Clayden AC, Fisher RB. The effect of target salience and size in visual search within naturalistic scenes under degraded vision. J Vis 2021; 21:2. PMID: 33792616. PMCID: PMC8024777. DOI: 10.1167/jov.21.4.2.
Abstract
We address two questions concerning eye guidance during visual search in naturalistic scenes. First, search has been described as a task in which visual salience is unimportant. Here, we revisit this question by using a letter-in-scene search task that minimizes any confounding effects that may arise from scene guidance. Second, we investigate how important the different regions of the visual field are for different subprocesses of search (target localization, verification). In Experiment 1, we manipulated both the salience (low vs. high) and the size (small vs. large) of the target letter (a "T"), and we implemented a foveal scotoma (radius: 1°) in half of the trials. In Experiment 2, observers searched for high- and low-salience targets either with full vision or with a central or peripheral scotoma (radius: 2.5°). In both experiments, we found main effects of salience with better performance for high-salience targets. In Experiment 1, search was faster for large than for small targets, and high salience helped more for small targets. When searching with a foveal scotoma, performance was relatively unimpaired regardless of the target's salience and size. In Experiment 2, both visual-field manipulations led to search time costs, but the peripheral scotoma was much more detrimental than the central scotoma. Peripheral vision proved to be important for target localization, and central vision for target verification. Salience affected eye movement guidance to the target in both central and peripheral vision. Collectively, the results lend support to search models that incorporate salience for predicting eye-movement behavior.
Affiliation(s)
- Antje Nuthmann
- Institute of Psychology, University of Kiel, Germany; Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK; http://orcid.org/0000-0003-3338-3434
- Adam C Clayden
- School of Engineering, Arts, Science and Technology, University of Suffolk, UK; Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK
37
Leroy A, Spotorno S, Faure S. Emotional scene processing in children and adolescents with attention deficit/hyperactivity disorder: a systematic review. Eur Child Adolesc Psychiatry 2021; 30:331-346. PMID: 32034554. DOI: 10.1007/s00787-020-01480-0.
Abstract
Impairments in emotional information processing are frequently reported in attention deficit hyperactivity disorder (ADHD) at a voluntary, explicit level (e.g., emotion recognition) and at an involuntary, implicit level (e.g., emotional interference). Most previous studies have used faces with emotional expressions, rarely examining other important sources of information that usually co-occur with faces in our everyday experience. Here, we examined how the emotional content of an entire visual scene depicting real-world environments and situations is processed in ADHD. Following the PRISMA guidelines, we systematically reviewed in PubMed, SCOPUS and ScienceDirect empirical studies published in English until March 2019 about the processing of visual scenes, with or without emotional content, in children and adolescents with ADHD. We included 17 studies among the 154 initially identified. Fifteen used scenes with emotional content (which was task-relevant in seven studies and irrelevant in eight) and two used scenes without emotional content. Even though the interpretation of the results differed according to each study's theoretical model of emotions and the presence of comorbidity, differences in scene information processing between ADHD and typically developing children and adolescents were reported in all but one study. Children and adolescents with ADHD show difficulties in the processing of emotional information conveyed by visual scenes, which may stem from a stronger bottom-up impact of emotional stimuli in ADHD, increasing the emotional experience, and from core deficits of the disorder, decreasing the overall processing of the scene.
Affiliation(s)
- Anaïs Leroy
- Laboratoire D'Anthropologie Et de Psychologie Cliniques, Cognitives Et Sociales (LAPCOS), MSHS Sud Est, Université Côte D'Azur, Pôle Universitaire Saint Jean D'Angely, 24 avenue des Diables Bleus, 06357, Nice Cédex 4, France; CERTA, Reference Centre for Learning Disabilities, Fondation Lenval, Hôpitaux Pédiatriques de Nice CHU-Lenval, Nice, France
- Sylvane Faure
- Laboratoire D'Anthropologie Et de Psychologie Cliniques, Cognitives Et Sociales (LAPCOS), MSHS Sud Est, Université Côte D'Azur, Pôle Universitaire Saint Jean D'Angely, 24 avenue des Diables Bleus, 06357, Nice Cédex 4, France
38
Leroy A, Spotorno S, Faure S. Traitements sémantiques et émotionnels des scènes visuelles complexes : une synthèse critique de l'état actuel des connaissances [Semantic and emotional processing of complex visual scenes: a critical synthesis of the current state of knowledge]. Annee Psychologique 2021. DOI: 10.3917/anpsy1.211.0101.
39
Global and local interference effects in ensemble encoding are best explained by interactions between summary representations of the mean and the range. Atten Percept Psychophys 2021; 83:1106-1128. PMID: 33506350. PMCID: PMC8049940. DOI: 10.3758/s13414-020-02224-7.
Abstract
Through ensemble encoding, the visual system compresses redundant statistical properties from multiple items into a single summary metric (e.g., average size). Numerous studies have shown that global summary information is extracted quickly, does not require access to single-item representations, and often interferes with reports of single items from the set. Yet a thorough understanding of ensemble processing would benefit from a more extensive investigation at the local level. Thus, the purpose of this study was to provide a more critical inspection of global-local processing in ensemble perception. Taking inspiration from Navon (Cognitive Psychology, 9(3), 353-383, 1977), we employed a novel paradigm that independently manipulates the degree of interference at the global (mean) or local (single item) level of the ensemble. Initial results were consistent with reciprocal interference between global and local ensemble processing. However, further testing revealed that local interference effects were better explained by interference from another summary statistic, the range of the set. Furthermore, participants were unable to disambiguate single items from the ensemble display from other items that were within the ensemble range but, critically, were not actually present in the ensemble. Thus, it appears that local item values are likely inferred based on their relationship to higher-order summary statistics such as the range and the mean. These results conflict with claims that local information is captured alongside global information in summary representations: in such studies, successful identification of set members was not compared with misidentification of items that fell within the range but were nevertheless not presented within the set.
40
Võ MLH. The meaning and structure of scenes. Vision Res 2021; 181:10-20. PMID: 33429218. DOI: 10.1016/j.visres.2020.11.003.
Abstract
We live in a rich, three-dimensional world with complex arrangements of meaningful objects. For decades, however, theories of visual attention and perception have been based on findings generated from lines and color patches. While these theories have been indispensable for our field, the time has come to move on from this rather impoverished view of the world and (at least try to) get closer to the real thing. After all, our visual environment consists of objects that we not only look at, but constantly interact with. Incorporating the meaning and structure of scenes, i.e., their "grammar", allows us to easily understand objects and scenes we have never encountered before. Studying this grammar provides us with the fascinating opportunity to gain new insights into the complex workings of attention, perception, and cognition. In this review, I will discuss how the meaning and the complex, yet predictive structure of real-world scenes influence attention allocation, search, and object identification.
Affiliation(s)
- Melissa Le-Hoa Võ
- Department of Psychology, Johann Wolfgang-Goethe-Universität, Frankfurt, Germany. https://www.scenegrammarlab.com/
41
David E, Beitner J, Võ MLH. Effects of Transient Loss of Vision on Head and Eye Movements during Visual Search in a Virtual Environment. Brain Sci 2020; 10:E841. PMID: 33198116. PMCID: PMC7696943. DOI: 10.3390/brainsci10110841.
Abstract
Central and peripheral fields of view extract information of different quality and serve different roles during visual tasks. Past research has studied this dichotomy on-screen in conditions remote from natural situations where the scene would be omnidirectional and the entire field of view could be of use. In this study, we had participants look for objects in simulated everyday rooms in virtual reality. By implementing a gaze-contingent protocol we masked central or peripheral vision (masks of 6 deg. of radius) during trials. We analyzed the impact of vision loss on visuo-motor variables related to fixation (duration) and saccades (amplitude and relative directions). An important novelty is that we segregated eye, head and general gaze movements in our analyses. Additionally, we studied these measures after separating trials into two search phases (scanning and verification). Our results generally replicate past on-screen findings and shed light on the roles of eye and head movements. We showed that the scanning phase is dominated by short fixations and long saccades to explore, and the verification phase by long fixations and short saccades to analyze. One finding indicates that eye movements are strongly driven by visual stimulation, while head movements serve a higher behavioral goal of exploring omnidirectional scenes. Moreover, losing central vision has a smaller impact than reported on-screen, hinting at the importance of peripheral scene processing for visual search with an extended field of view. Our findings provide more information concerning how knowledge gathered on-screen may transfer to more natural conditions, and attest to the experimental usefulness of eye tracking in virtual reality.
Affiliation(s)
- Erwan David
- Scene Grammar Lab, Department of Psychology, Theodor-W.-Adorno-Platz 6, Johann Wolfgang-Goethe-Universität, 60323 Frankfurt, Germany; (J.B.); (M.L.-H.V.)
42
Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations. Cognition 2020; 206:104465. PMID: 33096374. DOI: 10.1016/j.cognition.2020.104465.
Abstract
Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic information across an image, have recently been proposed to support the hypothesis that meaning rather than image features guides human gaze. MMs have the potential to be an important tool far beyond eye-movement research. Here, we examine central assumptions underlying MMs. First, we compared the performance of MMs in predicting fixations to saliency models, showing that DeepGaze II - a deep neural network trained to predict fixations based on high-level features rather than meaning - outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge central assumptions underlying the use of MMs to measure the distribution of meaning in images.
43
Henderson JM, Goold JE, Choi W, Hayes TR. Neural Correlates of Fixated Low- and High-level Scene Properties during Active Scene Viewing. J Cogn Neurosci 2020; 32:2013-2023. PMID: 32573384. PMCID: PMC11164273. DOI: 10.1162/jocn_a_01599.
Abstract
During real-world scene perception, viewers actively direct their attention through a scene in a controlled sequence of eye fixations. During each fixation, local scene properties are attended, analyzed, and interpreted. What is the relationship between fixated scene properties and neural activity in the visual cortex? Participants inspected photographs of real-world scenes in an MRI scanner while their eye movements were recorded. Fixation-related fMRI was used to measure activation as a function of lower- and higher-level scene properties at fixation, operationalized as edge density and meaning maps, respectively. We found that edge density at fixation was most associated with activation in early visual areas, whereas semantic content at fixation was most associated with activation along the ventral visual stream including core object and scene-selective areas (lateral occipital complex, parahippocampal place area, occipital place area, and retrosplenial cortex). The observed activation from semantic content was not accounted for by differences in edge density. The results are consistent with active vision models in which fixation gates detailed visual analysis for fixated scene regions, and this gating influences both lower and higher levels of scene analysis.
Affiliation(s)
- Wonil Choi
- Gwangju Institute of Science and Technology
44
Rigby SN, Jakobson LS, Pearson PM, Stoesz BM. Alexithymia and the Evaluation of Emotionally Valenced Scenes. Front Psychol 2020; 11:1820. PMID: 32793083. PMCID: PMC7394003. DOI: 10.3389/fpsyg.2020.01820.
Abstract
Alexithymia is a personality trait characterized by difficulties identifying and describing feelings (DIF and DDF) and an externally oriented thinking (EOT) style. The primary aim of the present study was to investigate links between alexithymia and the evaluation of emotional scenes. We also investigated whether viewers' evaluations of emotional scenes were better predicted by specific alexithymic traits or by individual differences in sensory processing sensitivity (SPS). Participants (N = 106) completed measures of alexithymia and SPS along with a task requiring speeded judgments of the pleasantness of 120 moderately arousing scenes. We did not replicate laterality effects previously described with the scene perception task. Compared to those with weak alexithymic traits, individuals with moderate-to-strong alexithymic traits were less likely to classify positively valenced scenes as pleasant and were less likely to classify scenes with (vs. without) implied motion (IM) in a way that was consistent with normative scene valence ratings. In addition, regression analyses confirmed that reporting strong EOT and a tendency to be easily overwhelmed by busy sensory environments negatively predicted classification accuracy for positive scenes, and that both DDF and EOT negatively predicted classification accuracy for scenes depicting IM. These findings highlight the importance of accounting for stimulus characteristics and individual differences in specific traits associated with alexithymia and SPS when investigating the processing of emotional stimuli. Learning more about the links between these individual difference variables may have significant clinical implications, given that alexithymia is an important, transdiagnostic risk factor for a wide range of psychopathologies.
Affiliation(s)
- Sarah N Rigby
- Department of Psychology, University of Manitoba, Winnipeg, MB, Canada
- Lorna S Jakobson
- Department of Psychology, University of Manitoba, Winnipeg, MB, Canada
- Pauline M Pearson
- Department of Psychology, University of Manitoba, Winnipeg, MB, Canada; Department of Psychology, University of Winnipeg, Winnipeg, MB, Canada
- Brenda M Stoesz
- Department of Psychology, University of Manitoba, Winnipeg, MB, Canada; Centre for the Advancement of Teaching and Learning, University of Manitoba, Winnipeg, MB, Canada
45
Oba K, Sugiura M, Hanawa S, Suzuki M, Jeong H, Kotozaki Y, Sasaki Y, Kikuchi T, Nozawa T, Nakagawa S, Kawashima R. Differential roles of amygdala and posterior superior temporal sulcus in social scene understanding. Soc Neurosci 2020; 15:516-529. PMID: 32692950. DOI: 10.1080/17470919.2020.1793811.
Abstract
Neuropsychology and neuroimaging studies provide distinct views on the key neural underpinnings of social scene understanding (SSU): the amygdala and multimodal neocortical areas such as the posterior superior temporal sulcus (pSTS), respectively. This apparent incongruity may stem from the difference in the assumed cognitive processes of the situation-response association and the integrative or creative processing of social information. To examine the neural correlates of different SSU types using functional magnetic resonance imaging (fMRI), we devised a clothing-recommendation task with three types of client standpoint. Situation-response association was induced by a situation-congruent standpoint (ecological SSU), whereas integrative and creative processing of social information was elicited by a lack of a standpoint and a situation-incongruent standpoint (perceptual and elaborative SSUs, respectively). Activation characteristic of the ecological SSU was identified in the right amygdala, while that of the perceptual SSU and the elaborative SSU was identified in the right pSTS and left middle temporal gyrus (MTG), respectively. Thus, the current results provide evidence for the conceptual and neural distinction of the three types of SSU, with the basic ecological SSU supported by a limbic structure, while sophisticated integrative or creative SSUs are supported in humans by multimodal association cortices.
Affiliation(s)
- Kentaro Oba
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
- Motoaki Sugiura
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan; International Research Institute of Disaster Science, Tohoku University, Sendai, Japan; Smart-Ageing Research Center, Tohoku University, Sendai, Japan
- Sugiko Hanawa
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
- Mizue Suzuki
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
- Hyeonjeong Jeong
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan; Graduate School of International Cultural Studies, Tohoku University, Sendai, Japan
- Yuka Kotozaki
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
- Yukako Sasaki
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
- Tatsuo Kikuchi
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
- Takayuki Nozawa
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
- Seishu Nakagawa
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan; Division of Psychiatry, Tohoku Medical and Pharmaceutical University, Sendai, Japan
- Ryuta Kawashima
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan; Smart-Ageing Research Center, Tohoku University, Sendai, Japan
| |
Collapse
|
46
|
Seijdel N, Jahfari S, Groen IIA, Scholte HS. Low-level image statistics in natural scenes influence perceptual decision-making. Sci Rep 2020; 10:10573. [PMID: 32601499 PMCID: PMC7324621 DOI: 10.1038/s41598-020-67661-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2020] [Accepted: 06/08/2020] [Indexed: 11/10/2022] Open
Abstract
A fundamental component of interacting with our environment is the gathering and interpretation of sensory information. When investigating how perceptual information influences decision-making, most researchers have relied on manipulated or unnatural information as perceptual input, resulting in findings that may not generalize to real-world scenes. Unlike simplified, artificial stimuli, real-world scenes contain low-level regularities that are informative about structural complexity, which the brain could exploit. In this study, participants performed an animal detection task on low, medium, or high complexity scenes, as determined by two biologically plausible natural scene statistics: contrast energy (CE) and spatial coherence (SC). In experiment 1, stimuli were sampled such that CE and SC both influenced scene complexity. Diffusion modelling showed that the speed of information processing was affected by low-level scene complexity. Experiments 2a/b refined these observations by showing that isolated manipulation of SC resulted in weaker but comparable effects, with an additional change in response boundary, whereas manipulation of CE alone had no effect. Overall, performance was best for scenes with intermediate complexity. Our systematic definition quantifies how natural scene complexity interacts with decision-making. We speculate that CE and SC serve as indicators for adjusting perceptual decision-making based on the complexity of the input.
Affiliation(s)
- Noor Seijdel
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Brain and Cognition (ABC) Center, University of Amsterdam, Amsterdam, The Netherlands
- Sara Jahfari
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands; Spinoza Centre for Neuroimaging, Royal Netherlands Academy of Arts and Sciences (KNAW), Amsterdam, The Netherlands
- Iris I A Groen
- Department of Psychology, New York University, New York, USA
- H Steven Scholte
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Brain and Cognition (ABC) Center, University of Amsterdam, Amsterdam, The Netherlands
|
47
|
Disentangling the Independent Contributions of Visual and Conceptual Features to the Spatiotemporal Dynamics of Scene Categorization. J Neurosci 2020; 40:5283-5299. [PMID: 32467356 DOI: 10.1523/jneurosci.2088-19.2020] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2019] [Revised: 04/18/2020] [Accepted: 04/23/2020] [Indexed: 11/21/2022] Open
Abstract
Human scene categorization is characterized by its remarkable speed. While many visual and conceptual features have been linked to this ability, significant correlations exist between feature spaces, impeding our ability to determine their relative contributions to scene categorization. Here, we used a whitening transformation to decorrelate a variety of visual and conceptual features and assess the time course of their unique contributions to scene categorization. Participants (both sexes) viewed 2250 full-color scene images drawn from 30 different scene categories while having their brain activity measured through 256-channel EEG. We examined the variance explained at each electrode and time point of visual event-related potential (vERP) data from nine different whitened encoding models. These ranged from low-level features obtained from filter outputs to high-level conceptual features requiring human annotation. The amount of category information in the vERPs was assessed through multivariate decoding methods. Behavioral similarity measures were obtained in separate crowdsourced experiments. We found that all nine models together contributed 78% of the variance of human scene similarity assessments and were within the noise ceiling of the vERP data. Low-level models explained earlier vERP variability (88 ms after image onset), whereas high-level models explained later variance (169 ms). Critically, only high-level models shared vERP variability with behavior. Together, these results suggest that scene categorization is primarily a high-level process, but reliant on previously extracted low-level features.SIGNIFICANCE STATEMENT In a single fixation, we glean enough information to describe a general scene category. Many types of features are associated with scene categories, ranging from low-level properties, such as colors and contours, to high-level properties, such as objects and attributes. Because these properties are correlated, it is difficult to understand each property's unique contributions to scene categorization. This work uses a whitening transformation to remove the correlations between features and examines the extent to which each feature contributes to visual event-related potentials over time. We found that low-level visual features contributed first but were not correlated with categorization behavior. High-level features followed 80 ms later, providing key insights into how the brain makes sense of a complex visual world.
|
48
|
Backhaus D, Engbert R, Rothkegel LOM, Trukenbrod HA. Task-dependence in scene perception: Head unrestrained viewing using mobile eye-tracking. J Vis 2020; 20:3. [PMID: 32392286 PMCID: PMC7409614 DOI: 10.1167/jov.20.5.3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2019] [Accepted: 12/15/2019] [Indexed: 11/24/2022] Open
Abstract
Real-world scene perception is typically studied in the laboratory using static picture viewing with restrained head position. Consequently, the transfer of results obtained in this paradigm to real-world scenarios has been questioned. The advancement of mobile eye-trackers and the progress in image processing, however, permit a more natural experimental setup that, at the same time, maintains the high experimental control of the standard laboratory setting. We investigated eye movements while participants were standing in front of a projector screen and explored images under four specific task instructions. Eye movements were recorded with a mobile eye-tracking device, and raw gaze data were transformed from head-centered into image-centered coordinates. We observed differences between tasks in temporal and spatial eye-movement parameters and found that the bias to fixate images near the center differed between tasks. Our results demonstrate that current mobile eye-tracking technology and a highly controlled design support the study of fine-scaled task dependencies in an experimental setting that permits more natural viewing behavior than the static picture viewing paradigm.
Affiliation(s)
- Daniel Backhaus
- Experimental and Biological Psychology, University of Potsdam, Potsdam, Germany
- Ralf Engbert
- Experimental and Biological Psychology, University of Potsdam, Potsdam, Germany
- Hans A. Trukenbrod
- Experimental and Biological Psychology, University of Potsdam, Potsdam, Germany
|
49
|
Valdés-Sosa M, Ontivero-Ortega M, Iglesias-Fuster J, Lage-Castellanos A, Gong J, Luo C, Castro-Laguardia AM, Bobes MA, Marinazzo D, Yao D. Objects seen as scenes: Neural circuitry for attending whole or parts. Neuroimage 2020; 210:116526. [DOI: 10.1016/j.neuroimage.2020.116526] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2019] [Revised: 12/10/2019] [Accepted: 01/06/2020] [Indexed: 01/03/2023] Open
|
50
|
Harel A, Mzozoyana MW, Al Zoubi H, Nador JD, Noesen BT, Lowe MX, Cant JS. Artificially-generated scenes demonstrate the importance of global scene properties for scene perception. Neuropsychologia 2020; 141:107434. [PMID: 32179102 DOI: 10.1016/j.neuropsychologia.2020.107434] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2019] [Revised: 03/04/2020] [Accepted: 03/09/2020] [Indexed: 10/24/2022]
Abstract
Recent electrophysiological research highlights the significance of global scene properties (GSPs) for scene perception. However, since real-world scenes span a range of low-level stimulus properties and high-level contextual semantics, GSP effects may also reflect additional processing of such non-global factors. We examined this question by asking whether Event-Related Potentials (ERPs) to GSPs would still be observed when specific low- and high-level scene properties are absent from the scene. We presented participants with computer-based, artificially-manipulated scenes varying in two GSPs (spatial expanse and naturalness) while minimizing other sources of scene information (color and semantic object detail). We found that the peak amplitude of the P2 component was sensitive to the spatial expanse and naturalness of the artificially-generated scenes: P2 amplitude was higher in response to closed than open scenes, and to manmade than natural scenes. A control experiment showed that the effect of naturalness on the P2 is not driven by local texture information, while earlier effects of naturalness, expressed as a modulation of the P1 and N1 amplitudes, are sensitive to texture information. Our results demonstrate that GSPs are processed robustly around 220 ms and that the P2 can be used as an index of global scene perception.
Affiliation(s)
- Assaf Harel
- Department of Psychology, Wright State University, Dayton, OH, USA
- Mavuso W Mzozoyana
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Hamada Al Zoubi
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Jeffrey D Nador
- Department of Psychology, Wright State University, Dayton, OH, USA
- Birken T Noesen
- Department of Psychology, Wright State University, Dayton, OH, USA
- Matthew X Lowe
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jonathan S Cant
- Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
|