1. Pandya S, Nicholls VI, Krugliak A, Davis SW, Clarke A. Context and semantic object properties interact to support recognition memory. Q J Exp Psychol (Hove) 2024:17470218241283028. [PMID: 39238183] [DOI: 10.1177/17470218241283028]
Abstract
We have a great capacity to remember a large number of items, yet memory is selective. While multiple factors dictate why we remember some things and not others, it is increasingly acknowledged that some objects are more memorable than others. Recent studies show that semantically distinctive objects are better remembered, as are objects located in expected scene contexts. However, we know little about how object semantics and context interact to facilitate memory. Here we test the intriguing hypothesis that these factors have complementary benefits for memory. Participants rated the congruency of object-scene pairs, followed by a surprise memory test. We show that object memory is best predicted by semantic familiarity when an object-scene pairing is congruent, whereas when object-scene pairings are incongruent, semantic statistics have an especially prominent impact. This demonstrates that both the item and its schematic relationship to the environment interact to shape what we will and will not remember.
Affiliation(s)
- Shirley Pandya
- Department of Psychology, University of Cambridge, Cambridge, UK
- Department of Psychology, University of Miami, Coral Gables, FL, USA
- Simon W Davis
- Department of Neurology, Duke University, Durham, NC, USA
- Alex Clarke
- Department of Psychology, University of Cambridge, Cambridge, UK
- Department of Psychology, University of Warwick, UK
2. Kallmayer A, Võ MLH. Anchor objects drive realism while diagnostic objects drive categorization in GAN generated scenes. Commun Psychol 2024; 2:68. [PMID: 39242968] [PMCID: PMC11332195] [DOI: 10.1038/s44271-024-00119-z]
Abstract
Our visual surroundings are highly complex. Despite this, we understand and navigate them effortlessly. This requires transforming incoming sensory information into representations that not only span low- to high-level visual features (e.g., edges, object parts, objects), but likely also reflect co-occurrence statistics of objects in real-world scenes. Here, so-called anchor objects are defined as being highly predictive of the location and identity of frequently co-occurring (usually smaller) objects, derived from object clustering statistics in real-world scenes, while so-called diagnostic objects are predictive of the larger semantic context (i.e., scene category). Across two studies (N1 = 50, N2 = 44), we investigate which of these properties underlie scene understanding across two dimensions, realism and categorisation, using scenes generated from Generative Adversarial Networks (GANs) which naturally vary along these dimensions. We show that anchor objects, and mainly high-level features extracted from a range of pre-trained deep neural networks (DNNs), drove realism both at first glance and after initial processing. Categorisation performance was mainly determined by diagnostic objects, regardless of realism, at first glance and after initial processing. Our results are a testament to the visual system's ability to pick up on reliable, category-specific sources of information that are flexible towards disturbances across the visual feature-hierarchy.
Affiliation(s)
- Aylin Kallmayer
- Goethe University Frankfurt, Department of Psychology, Frankfurt am Main, Germany.
- Melissa L-H Võ
- Goethe University Frankfurt, Department of Psychology, Frankfurt am Main, Germany
3. Baltaretu BR, Schuetz I, Võ MLH, Fiehler K. Scene semantics affects allocentric spatial coding for action in naturalistic (virtual) environments. Sci Rep 2024; 14:15549. [PMID: 38969745] [PMCID: PMC11226608] [DOI: 10.1038/s41598-024-66428-9]
Abstract
Interacting with objects in our environment requires determining their locations, often with respect to surrounding objects (i.e., allocentrically). According to the scene grammar framework, these usually small, local objects are movable within a scene and represent the lowest level of a scene's hierarchy. How do higher hierarchical levels of scene grammar influence allocentric coding for memory-guided actions? Here, we focused on the effect of large, immovable objects (anchors) on the encoding of local object positions. In a virtual reality study, participants (n = 30) viewed one of four possible scenes (two kitchens or two bathrooms), with two anchors connected by a shelf, onto which were presented three local objects congruent with one anchor (Encoding). The scene was re-presented (Test) with (1) the local objects missing and (2) one of the anchors shifted (Shift) or not (No shift). Participants then saw a floating local object (the target), which they grabbed and placed back on the shelf in its remembered position (Response). Eye-tracking data revealed that both local objects and anchors were fixated, with a preference for local objects. Additionally, anchors guided allocentric coding of local objects, despite being task-irrelevant. Overall, anchors implicitly influence spatial coding of local object locations for memory-guided actions within naturalistic (virtual) environments.
Affiliation(s)
- Bianca R Baltaretu
- Department of Experimental Psychology, Justus Liebig University Giessen, Otto-Behaghel-Strasse 10F, 35394, Giessen, Hesse, Germany.
- Immo Schuetz
- Department of Experimental Psychology, Justus Liebig University Giessen, Otto-Behaghel-Strasse 10F, 35394, Giessen, Hesse, Germany
- Melissa L-H Võ
- Department of Psychology, Goethe University Frankfurt, 60323, Frankfurt am Main, Hesse, Germany
- Katja Fiehler
- Department of Experimental Psychology, Justus Liebig University Giessen, Otto-Behaghel-Strasse 10F, 35394, Giessen, Hesse, Germany
4. Hafri A, Bonner MF, Landau B, Firestone C. A Phone in a Basket Looks Like a Knife in a Cup: Role-Filler Independence in Visual Processing. Open Mind (Camb) 2024; 8:766-794. [PMID: 38957507] [PMCID: PMC11219067] [DOI: 10.1162/opmi_a_00146]
Abstract
When a piece of fruit is in a bowl, and the bowl is on a table, we appreciate not only the individual objects and their features, but also the relations containment and support, which abstract away from the particular objects involved. Independent representation of roles (e.g., containers vs. supporters) and "fillers" of those roles (e.g., bowls vs. cups, tables vs. chairs) is a core principle of language and higher-level reasoning. But does such role-filler independence also arise in automatic visual processing? Here, we show that it does, by exploring a surprising error that such independence can produce. In four experiments, participants saw a stream of images containing different objects arranged in force-dynamic relations: for example, a phone contained in a basket, a marker resting on a garbage can, or a knife sitting in a cup. Participants had to respond to a single target image (e.g., a phone in a basket) within a stream of distractors presented under time constraints. Surprisingly, even though participants completed this task quickly and accurately, they false-alarmed more often to images matching the target's relational category than to those that did not, even when those images involved completely different objects. In other words, participants searching for a phone in a basket were more likely to mistakenly respond to a knife in a cup than to a marker on a garbage can. Follow-up experiments ruled out strategic responses and also controlled for various confounding image features. We suggest that visual processing represents relations abstractly, in ways that separate roles from fillers.
Affiliation(s)
- Alon Hafri
- Department of Linguistics and Cognitive Science, University of Delaware
- Department of Cognitive Science, Johns Hopkins University
- Department of Psychological and Brain Sciences, Johns Hopkins University
- Barbara Landau
- Department of Cognitive Science, Johns Hopkins University
- Chaz Firestone
- Department of Cognitive Science, Johns Hopkins University
- Department of Psychological and Brain Sciences, Johns Hopkins University
5. Stecher R, Kaiser D. Representations of imaginary scenes and their properties in cortical alpha activity. Sci Rep 2024; 14:12796. [PMID: 38834699] [DOI: 10.1038/s41598-024-63320-4]
Abstract
Imagining natural scenes enables us to engage with a myriad of simulated environments. How do our brains generate such complex mental images? Recent research suggests that cortical alpha activity carries information about individual objects during visual imagery. However, it remains unclear whether more complex imagined contents, such as natural scenes, are similarly represented in alpha activity. Here, we answer this question by decoding the contents of imagined scenes from rhythmic cortical activity patterns. In an EEG experiment, participants imagined natural scenes based on detailed written descriptions, which conveyed four complementary scene properties: openness, naturalness, clutter level and brightness. By conducting classification analyses on EEG power patterns across neural frequencies, we were able to decode both individual imagined scenes and their properties from the alpha band, showing that the contents of complex visual images are also represented in alpha rhythms. A cross-classification analysis between alpha power patterns during the imagery task and during a perception task, in which participants were presented with images of the described scenes, showed that scene representations in the alpha band are partly shared between imagery and late stages of perception. This suggests that alpha activity mediates the top-down re-activation of scene-related visual contents during imagery.
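The decoding logic summarized in this abstract can be sketched in miniature. Everything below is an illustrative assumption, not the authors' pipeline: synthetic per-channel "alpha power" vectors for two hypothetical scene classes, classified with a simple nearest-centroid decoder.

```python
import math
import random

def make_trial(centroid, rng, noise=0.3):
    """Simulate one trial's alpha-band power pattern (one value per channel)."""
    return [mu + rng.gauss(0.0, noise) for mu in centroid]

def fit_nearest_centroid(trials, labels):
    """Average the power patterns per class to obtain class centroids."""
    groups = {}
    for x, y in zip(trials, labels):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(col) for col in zip(*xs)] for y, xs in groups.items()}

def classify(cents, x):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(cents, key=lambda y: math.dist(cents[y], x))

rng = random.Random(0)
# Two hypothetical imagined-scene classes with distinct alpha topographies.
A = [1.0, 0.2, 0.8, 0.1]
B = [0.2, 1.0, 0.1, 0.9]
train = [make_trial(A, rng) for _ in range(40)] + [make_trial(B, rng) for _ in range(40)]
labels = ["open"] * 40 + ["closed"] * 40
cents = fit_nearest_centroid(train, labels)

test = [make_trial(A, rng) for _ in range(20)] + [make_trial(B, rng) for _ in range(20)]
truth = ["open"] * 20 + ["closed"] * 20
acc = sum(classify(cents, x) == t for x, t in zip(test, truth)) / len(test)
print(round(acc, 2))  # well above the 0.5 chance level with this separation
```

The same train-on-one-condition, test-on-another structure (fit centroids on imagery trials, classify perception trials) is the essence of the cross-classification analysis the abstract describes.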
Affiliation(s)
- Rico Stecher
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, 35392, Gießen, Germany.
- Daniel Kaiser
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, 35392, Gießen, Germany
- Center for Mind, Brain and Behavior (CMBB), Philipps-University Marburg and Justus Liebig University Gießen, 35032, Marburg, Germany
6. Leemans M, Damiano C, Wagemans J. Finding the meaning in meaning maps: Quantifying the roles of semantic and non-semantic scene information in guiding visual attention. Cognition 2024; 247:105788. [PMID: 38579638] [DOI: 10.1016/j.cognition.2024.105788]
Abstract
In real-world vision, people prioritise the most informative scene regions via eye-movements. According to the cognitive guidance theory of visual attention, viewers allocate visual attention to those parts of the scene that are expected to be the most informative. The expected information of a scene region is coded in the semantic distribution of that scene. Meaning maps have been proposed to capture the spatial distribution of local scene semantics in order to test cognitive guidance theories of attention. Notwithstanding the success of meaning maps, the reason for their success has been contested. This has led to at least two possible explanations for the success of meaning maps in predicting visual attention. On the one hand, meaning maps might measure scene semantics. On the other hand, meaning maps might measure scene features, overlapping with, but distinct from, scene semantics. This study aims to disentangle these two sources of information by considering both conceptual information and non-semantic scene entropy simultaneously. We found that both semantic and non-semantic information is captured by meaning maps, but scene entropy accounted for more unique variance in the success of meaning maps than conceptual information. Additionally, some explained variance was unaccounted for by either source of information. Thus, although meaning maps may index some aspect of semantic information, their success seems to be better explained by non-semantic information. We conclude that meaning maps may not yet be a good tool to test cognitive guidance theories of attention in general, since they capture non-semantic aspects of local semantic density and only a small portion of conceptual information. Rather, we suggest that researchers should better define the exact aspect of cognitive guidance theories they wish to test and then use the tool that best captures that desired semantic information. As it stands, the semantic information contained in meaning maps seems too ambiguous to draw strong conclusions about how and when semantic information guides visual attention.
Affiliation(s)
- Maarten Leemans
- Laboratory of Experimental Psychology, Department of Brain and Cognition, University of Leuven (KU Leuven), Belgium.
- Claudia Damiano
- Laboratory of Experimental Psychology, Department of Brain and Cognition, University of Leuven (KU Leuven), Belgium
- Johan Wagemans
- Laboratory of Experimental Psychology, Department of Brain and Cognition, University of Leuven (KU Leuven), Belgium
7. Koolen R, Krahmer E. Realistic About Reference Production: Testing the Effects of Domain Size and Saturation. Cogn Sci 2024; 48:e13473. [PMID: 38924126] [DOI: 10.1111/cogs.13473]
Abstract
Experiments on visually grounded, definite reference production often manipulate simple visual scenes in the form of grids filled with objects, for example, to test how speakers are affected by the number of objects that are visible. Regarding the latter, it was found that speech onset times increase along with domain size, at least when speakers refer to nonsalient target objects that do not pop out of the visual domain. This finding suggests that even in the case of many distractors, speakers perform object-by-object scans of the visual scene. The current study investigates whether this systematic processing strategy can be explained by the simplified nature of the scenes that were used, and whether different strategies can be identified for photo-realistic visual scenes. In doing so, we conducted a preregistered experiment that manipulated domain size and saturation; replicated the measures of speech onset times; and recorded eye movements to measure speakers' viewing strategies more directly. Using controlled photo-realistic scenes, we find (1) that speech onset times increase linearly as more distractors are present; (2) that larger domains elicit relatively fewer fixation switches back and forth between the target and its distractors, mainly before speech onset; and (3) that speakers fixate the target relatively less often in larger domains, mainly after speech onset. We conclude that careful object-by-object scans remain the dominant strategy in our photo-realistic scenes, to a limited extent combined with low-level saliency mechanisms. A relevant direction for future research would be to employ less controlled photo-realistic stimuli that do allow for interpretation based on context.
Affiliation(s)
- Ruud Koolen
- Department of Cognition and Communication, Tilburg University
- Emiel Krahmer
- Department of Cognition and Communication, Tilburg University
8. Damiano C, Leemans M, Wagemans J. Exploring the Semantic-Inconsistency Effect in Scenes Using a Continuous Measure of Linguistic-Semantic Similarity. Psychol Sci 2024; 35:623-634. [PMID: 38652604] [DOI: 10.1177/09567976241238217]
Abstract
Viewers use contextual information to visually explore complex scenes. Object recognition is facilitated by exploiting object-scene relations (which objects are expected in a given scene) and object-object relations (which objects are expected because of the occurrence of other objects). Semantically inconsistent objects deviate from these expectations, so they tend to capture viewers' attention (the semantic-inconsistency effect). Some objects fit the identity of a scene more or less than others, yet semantic inconsistencies have hitherto been operationalized as binary (consistent vs. inconsistent). In an eye-tracking experiment (N = 21 adults), we study the semantic-inconsistency effect in a continuous manner by using the linguistic-semantic similarity of an object to the scene category and to other objects in the scene. We found that both highly consistent and highly inconsistent objects are viewed more than other objects (U-shaped relationship), revealing that the (in)consistency effect is more than a simple binary classification.
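The continuous measure this abstract describes rests on linguistic-semantic similarity, conventionally computed as the cosine between word-embedding vectors. The sketch below is a toy illustration only: the 3-d "embeddings", object names, and the averaging scheme are assumptions, whereas real studies would use pretrained word vectors with hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def object_scene_consistency(obj_vec, scene_vec, other_vecs):
    """Continuous consistency score: similarity of the object to the scene
    category (object-scene relation), averaged with its mean similarity to
    the other objects in the scene (object-object relation)."""
    to_scene = cosine(obj_vec, scene_vec)
    to_objects = sum(cosine(obj_vec, o) for o in other_vecs) / len(other_vecs)
    return (to_scene + to_objects) / 2

# Toy 3-d "embeddings" (hypothetical values).
kitchen = [0.9, 0.1, 0.0]
toaster = [0.8, 0.2, 0.1]    # consistent with a kitchen
surfboard = [0.0, 0.1, 0.9]  # inconsistent with a kitchen
others = [[0.7, 0.3, 0.0], [0.9, 0.0, 0.2]]
print(object_scene_consistency(toaster, kitchen, others) >
      object_scene_consistency(surfboard, kitchen, others))  # True under these toy vectors
```

Because the score is a real number rather than a consistent/inconsistent label, it can be entered as a continuous predictor of viewing time, which is what allows the U-shaped relationship to emerge.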
Affiliation(s)
- Claudia Damiano
- Department of Psychology, University of Toronto
- Laboratory of Experimental Psychology, Department of Brain and Cognition, KU Leuven
- Maarten Leemans
- Laboratory of Experimental Psychology, Department of Brain and Cognition, KU Leuven
- Johan Wagemans
- Laboratory of Experimental Psychology, Department of Brain and Cognition, KU Leuven
9. Djambazovska S, Zafer A, Ramezanpour H, Kreiman G, Kar K. The Impact of Scene Context on Visual Object Recognition: Comparing Humans, Monkeys, and Computational Models. bioRxiv [Preprint] 2024:2024.05.27.596127. [PMID: 38854011] [PMCID: PMC11160639] [DOI: 10.1101/2024.05.27.596127]
Abstract
During natural vision, we rarely see objects in isolation but rather embedded in rich and complex contexts. Understanding how the brain recognizes objects in natural scenes by integrating contextual information remains a key challenge. To elucidate neural mechanisms compatible with human visual processing, we need an animal model that behaves similarly to humans, so that inferred neural mechanisms can provide hypotheses relevant to the human brain. Here we assessed whether rhesus macaques could model human context-driven object recognition by quantifying visual object identification abilities across variations in the amount, quality, and congruency of contextual cues. Behavioral metrics revealed strikingly similar context-dependent patterns between humans and monkeys. However, neural responses in the inferior temporal (IT) cortex of monkeys that were never explicitly trained to discriminate objects in context, as well as current artificial neural network models, could only partially explain this cross-species correspondence. The shared behavioral variance unexplained by context-naive neural data or computational models highlights fundamental knowledge gaps. Our findings demonstrate an intriguing alignment of human and monkey visual object processing that defies full explanation by either brain activity in a key visual region or state-of-the-art models.
Affiliation(s)
- Sara Djambazovska
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
- Children’s Hospital, Harvard Medical School, MA, USA
- Anaa Zafer
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
- Hamidreza Ramezanpour
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
- Kohitij Kar
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
10. Lande KJ. Compositionality in perception: A framework. Wiley Interdiscip Rev Cogn Sci 2024:e1691. [PMID: 38807187] [DOI: 10.1002/wcs.1691]
Abstract
Perception involves the processing of content or information about the world. In what form is this content represented? I argue that perception is widely compositional. The perceptual system represents many stimulus features (including shape, orientation, and motion) in terms of combinations of other features (such as shape parts, slant and tilt, common and residual motion vectors). But compositionality can take a variety of forms. The ways in which perceptual representations compose are markedly different from the ways in which sentences or thoughts are thought to be composed. I suggest that the thesis that perception is compositional is not itself a concrete hypothesis with specific predictions; rather it affords a productive framework for developing and evaluating specific empirical hypotheses about the form and content of perceptual representations. The question is not just whether perception is compositional, but how. Answering this latter question can provide fundamental insights into perception. This article is categorized under: Philosophy > Representation; Philosophy > Foundations of Cognitive Science; Psychology > Perception and Psychophysics.
Affiliation(s)
- Kevin J Lande
- Department of Philosophy and Centre for Vision Research, York University, Toronto, Canada
11. Ferreira F, Barker M. Perceptual Clauses as Units of Production in Visual Descriptions. Top Cogn Sci 2024. [PMID: 38781450] [DOI: 10.1111/tops.12738]
Abstract
Describing our visual environments is challenging because although an enormous amount of information is simultaneously available to the visual system, the language channel must impose a linear order on that information. Moreover, the production system is at least moderately incremental, meaning that it interleaves planning and speaking processes. Here, we address how the operations of these two cognitive systems are coordinated given their different characteristics. We propose the concept of a perceptual clause, defined as an interface representation that allows the visual and linguistic systems to exchange information. The perceptual clause serves as the input to the language formulator, which translates the representation into a linguistic sequence. Perceptual clauses capture speakers' ability to describe visual scenes coherently while at the same time taking advantage of the incremental abilities of the language production system.
Affiliation(s)
- Madison Barker
- Department of Psychology, University of California, Davis
12. Walter K, Freeman M, Bex P. Quantifying task-related gaze. Atten Percept Psychophys 2024; 86:1318-1329. [PMID: 38594445] [PMCID: PMC11093728] [DOI: 10.3758/s13414-024-02883-w]
Abstract
Competing theories attempt to explain what guides eye movements when exploring natural scenes: bottom-up image salience and top-down semantic salience. In one study, we apply language-based analyses to quantify the well-known observation that task influences gaze in natural scenes. Subjects viewed ten scenes as if they were performing one of two tasks. We found that the semantic similarity between the task and the labels of objects in the scenes captured the task-dependence of gaze (t(39) = 13.083; p < 0.001). In another study, we examined whether image salience or semantic salience better predicts gaze during a search task, and whether viewing strategies are affected by searching for targets of high or low semantic relevance to the scene. Subjects searched 100 scenes for a high- or low-relevance object. We found that image salience becomes a worse predictor of gaze across successive fixations, while semantic salience remains a consistent predictor (χ2(1, N = 40) = 75.148, p < .001). Furthermore, we found that semantic salience decreased as object relevance decreased (t(39) = 2.304; p = .027). These results suggest that semantic salience is a useful predictor of gaze during task-related scene viewing, and that even in target-absent trials, gaze is modulated by the relevance of a search target to the scene in which it might be located.
Affiliation(s)
- Kerri Walter
- Department of Psychology, Northeastern University, Boston, MA, USA.
- Michelle Freeman
- Department of Psychology, Northeastern University, Boston, MA, USA
- Peter Bex
- Department of Psychology, Northeastern University, Boston, MA, USA
13. Beitner J, Helbing J, David EJ, Võ MLH. Using a flashlight-contingent window paradigm to investigate visual search and object memory in virtual reality and on computer screens. Sci Rep 2024; 14:8596. [PMID: 38615047] [PMCID: PMC11379806] [DOI: 10.1038/s41598-024-58941-8]
Abstract
A popular technique to modulate visual input during search is to use gaze-contingent windows. However, these are often rather discomforting, providing the impression of visual impairment. To counteract this, we asked participants in this study to search through illuminated as well as dark three-dimensional scenes using a more naturalistic flashlight with which they could illuminate the rooms. In a surprise incidental memory task, we tested the identities and locations of objects encountered during search. Importantly, we tested this study design in both immersive virtual reality (VR; Experiment 1) and on a desktop-computer screen (Experiment 2). As hypothesized, searching with a flashlight increased search difficulty and memory usage during search. We found a memory benefit for identities of distractors in the flashlight condition in VR but not in the computer screen experiment. Surprisingly, location memory was comparable across search conditions despite the enormous difference in visual input. Subtle differences across experiments only appeared in VR after accounting for previous recognition performance, hinting at a benefit of flashlight search in VR. Our findings highlight that removing visual information does not necessarily impair location memory, and that screen experiments using virtual environments can elicit the same major effects as VR setups.
Affiliation(s)
- Julia Beitner
- Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany.
- Jason Helbing
- Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany
- Erwan Joël David
- Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany
- LIUM, Le Mans Université, Le Mans, France
- Melissa Lê-Hoa Võ
- Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany
14. Andrade MÂ, Cipriano M, Raposo A. ObScene database: Semantic congruency norms for 898 pairs of object-scene pictures. Behav Res Methods 2024; 56:3058-3071. [PMID: 37488464] [PMCID: PMC11133025] [DOI: 10.3758/s13428-023-02181-7]
Abstract
Research on the interaction between object and scene processing has a long history in the fields of perception and visual memory. Most databases have established norms for pictures where the object is embedded in the scene. In this study, we provide a diverse and controlled stimulus set comprising real-world pictures of 375 objects (e.g., suitcase), 245 scenes (e.g., airport), and 898 object-scene pairs (e.g., suitcase-airport), with object and scene presented separately. Our goal was twofold. First, to create a database of object and scene pictures, normed for the same variables to have comparable measures for both types of pictures. Second, to acquire normative data for the semantic relationships between objects and scenes presented separately, which offers more flexibility in the use of the pictures and allows disentangling the processing of the object and its context (the scene). Across three experiments, participants evaluated each object or scene picture on name agreement, familiarity, and visual complexity, and rated object-scene pairs on semantic congruency. A total of 125 septuplets of one scene and six objects (three congruent, three incongruent), and 120 triplets of one object and two scenes (in congruent and incongruent pairings) were built. In future studies, these objects and scenes can be used separately or combined, while controlling for their key features. Additionally, as object-scene pairs received semantic congruency ratings along the entire scale, researchers may select among a wide range of congruency values. ObScene is a comprehensive and ecologically valid database, useful for psychology and neuroscience studies of visual object and scene processing.
Affiliation(s)
- Miguel Ângelo Andrade
- Research Center for Psychological Science, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013, Lisboa, Portugal.
- Margarida Cipriano
- Research Center for Psychological Science, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013, Lisboa, Portugal
- Ana Raposo
- Research Center for Psychological Science, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013, Lisboa, Portugal
15. Westebbe L, Liang Y, Blaser E. The Accuracy and Precision of Memory for Natural Scenes: A Walk in the Park. Open Mind (Camb) 2024; 8:131-147. [PMID: 38435706] [PMCID: PMC10898787] [DOI: 10.1162/opmi_a_00122]
Abstract
It is challenging to quantify the accuracy and precision of scene memory because it is unclear what 'space' scenes occupy (how can we quantify error when misremembering a natural scene?). To address this, we exploited the ecologically valid, metric space in which scenes occur and are represented: routes. In a delayed estimation task, participants briefly saw a target scene drawn from a video of an outdoor 'route loop', then used a continuous report wheel of the route to pinpoint the scene. Accuracy was high and unbiased, indicating there was no net boundary extension/contraction. Interestingly, precision was higher for routes that were more self-similar (as characterized by the half-life, in meters, of a route's Multiscale Structural Similarity index), consistent with previous work finding a 'similarity advantage' where memory precision is regulated according to task demands. Overall, scenes were remembered to within a few meters of their actual location.
Collapse
Affiliation(s)
- Leo Westebbe
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
| | - Yibiao Liang
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
| | - Erik Blaser
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
| |
Collapse
|
16
|
Servais A, Barbeau EJ, Bastin C. Contextual novelty detection and novelty-related memory enhancement in amnestic mild cognitive impairment. Cortex 2024; 172:72-85. [PMID: 38237229 DOI: 10.1016/j.cortex.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2023] [Revised: 10/23/2023] [Accepted: 12/03/2023] [Indexed: 03/09/2024]
Abstract
INTRODUCTION Though novelty processing plays a critical role in memory function, little is known about how it influences learning in memory-impaired populations, such as amnestic Mild Cognitive Impairment (aMCI). METHODS 21 aMCI patients and 22 age- and education-matched healthy older participants performed two tasks: (i) an oddball paradigm assessing novelty preference (longer viewing times for novel than for familiar stimuli), in which fractals were either often repeated (60% of stimuli), less frequently repeated (20%), or novel (each presented once); and (ii) a Von Restorff paradigm assessing novelty-related effects on memory. Participants studied 22 lists of 10 words. Eighteen lists contained an isolated word that differed from the others in a distinctive aspect, here font size (90-point, 120-point, or 150-point, against 60-point for non-isolated words). The remaining four were control lists without isolated words. After studying each list, participants freely recalled as many words as possible. RESULTS For the oddball task, a group-by-stimulus-type ANOVA on median viewing times revealed a significant effect of stimulus type, but not of group: both groups spent more time on novel stimuli. For the Von Restorff task, both aMCI patients and healthy controls recalled the isolated words (presented in 120-point or 150-point, but not 90-point) better than the others (excluding primacy and recency effects). The novelty-related memory benefit (gain factor) was computed as the difference between the recall scores for isolated and other words. A group-by-font-size ANOVA on gain factors revealed no group effect and no interaction, suggesting that aMCI patients benefited from novelty like controls. CONCLUSION Novelty preference and the boosting effect of isolation-related novelty on subsequent recall seem preserved despite impaired episodic memory in aMCI patients. This is discussed in light of contemporary divergent theories regarding the relationship between novelty and memory, as either independent or parts of a continuum.
Collapse
Affiliation(s)
- Anaïs Servais
- Centre de recherche Cerveau et Cognition, CNRS UMR5549, CHU Purpan, Toulouse, France
| | - Emmanuel J Barbeau
- Centre de recherche Cerveau et Cognition, CNRS UMR5549, CHU Purpan, Toulouse, France
| | | |
Collapse
|
17
|
Walter K, Manley CE, Bex PJ, Merabet LB. Visual search patterns during exploration of naturalistic scenes are driven by saliency cues in individuals with cerebral visual impairment. Sci Rep 2024; 14:3074. [PMID: 38321069 PMCID: PMC10847433 DOI: 10.1038/s41598-024-53642-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2023] [Accepted: 02/03/2024] [Indexed: 02/08/2024] Open
Abstract
We investigated the relative influence of image salience and image semantics during visual search of naturalistic scenes, comparing performance in individuals with cerebral visual impairment (CVI) and controls with neurotypical development. Participants searched for a prompted target presented as either an image or a text cue. Success rate and reaction time were collected, and gaze behavior was recorded with an eye tracker. A receiver operating characteristic (ROC) analysis compared the distribution of individual gaze landings against predictions of image salience (using Graph-Based Visual Saliency) and image semantics (using Global Vectors for Word Representations combined with Linguistic Analysis of Semantic Salience) models. CVI participants were less likely to find the target and slower when they did. Their visual search behavior was also associated with a larger visual search area and a greater number of fixations. ROC scores were lower in CVI than in controls for both model predictions. Furthermore, search strategies in the CVI group were not affected by cue type, although search times and accuracy showed a significant correlation with verbal IQ scores for text-cued searches. These results suggest that visual search patterns in CVI are driven mainly by image salience and provide further characterization of the higher-order processing deficits observed in this population.
Collapse
Affiliation(s)
- Kerri Walter
- Translational Vision Lab, Department of Psychology, Northeastern University, Boston, MA, USA
| | - Claire E Manley
- The Laboratory for Visual Neuroplasticity, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, 20 Staniford Street, Boston, MA, 02114, USA
| | - Peter J Bex
- Translational Vision Lab, Department of Psychology, Northeastern University, Boston, MA, USA
| | - Lotfi B Merabet
- The Laboratory for Visual Neuroplasticity, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, 20 Staniford Street, Boston, MA, 02114, USA.
| |
Collapse
|
18
|
Mikhailova A, Lightfoot S, Santos-Victor J, Coco MI. Differential effects of intrinsic properties of natural scenes and interference mechanisms on recognition processes in long-term visual memory. Cogn Process 2024; 25:173-187. [PMID: 37831320 DOI: 10.1007/s10339-023-01164-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2022] [Accepted: 09/20/2023] [Indexed: 10/14/2023]
Abstract
Humans display remarkable long-term visual memory (LTVM). Even though images may be intrinsically memorable, the fidelity of their visual representations, and consequently the likelihood of successfully retrieving them, hinges on their similarity when concurrently held in LTVM. It is still unclear whether the effects of intrinsic image features (perceptual and semantic) are mediated by interference mechanisms generated at encoding or during retrieval, and how these factors impinge on recognition processes. In the current study, 32 participants studied a stream of 120 natural scenes from 8 semantic categories, which varied in frequency (4, 8, 16, or 32 exemplars per category) to generate different levels of category interference, in preparation for a recognition test. They were then asked to indicate which of two images, presented side by side (i.e., two-alternative forced choice), they remembered. The two images belonged to the same semantic category but varied in their perceptual similarity (similar or dissimilar). Participants also expressed their confidence (sure/not sure) in their recognition response, enabling us to tap into their metacognitive efficacy (meta-d'). Additionally, we extracted the activation of perceptual and semantic features in images (i.e., their informational richness) through deep neural network modelling and examined their impact on recognition processes. Corroborating previous literature, we found that category interference and perceptual similarity negatively impact recognition processes, as well as response times and metacognitive efficacy. Moreover, semantically rich images were less likely to be remembered, an effect that trumped the positive memorability boost coming from perceptual information. Critically, we did not observe any significant interaction between intrinsic image features and interference generated either at encoding or during retrieval. All in all, our study calls for a more integrative understanding of the representational dynamics during encoding and recognition that enable us to form, maintain, and access visual information.
Collapse
Affiliation(s)
- Anastasiia Mikhailova
- Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal.
| | | | - José Santos-Victor
- Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
| | - Moreno I Coco
- Sapienza, University of Rome, Rome, Italy.
- I.R.C.C.S. Santa Lucia, Fondazione Santa Lucia, Roma, Italy.
| |
Collapse
|
19
|
Coderre EL, Cohn N. Individual differences in the neural dynamics of visual narrative comprehension: The effects of proficiency and age of acquisition. Psychon Bull Rev 2024; 31:89-103. [PMID: 37578688 PMCID: PMC10866750 DOI: 10.3758/s13423-023-02334-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/30/2023] [Indexed: 08/15/2023]
Abstract
Understanding visual narrative sequences, as found in comics, is known to recruit similar cognitive mechanisms to verbal language. As measured by event-related potentials (ERPs), these manifest as initial negativities (N400, LAN) and subsequent positivities (P600). While these components are thought to index discrete processing stages, they differentially arise across participants for any given stimulus. In language contexts, proficiency modulates brain responses, with smaller N400 effects and larger P600 effects appearing with increasing proficiency. In visual narratives, recent work has also emphasized the role of proficiency in neural response patterns. We thus explored whether individual differences in proficiency modulate neural responses to visual narrative sequencing in similar ways as in language. We combined ERP data from 12 studies examining semantic and/or grammatical processing of visual narrative sequences. Using linear mixed effects modeling, we demonstrate differential effects of visual language proficiency and "age of acquisition" on N400 and P600 responses. Our results align with those reported in language contexts, providing further evidence for the similarity of linguistic and visual narrative processing, and emphasize the role of both proficiency and age of acquisition in visual narrative comprehension.
Collapse
Affiliation(s)
- Emily L Coderre
- Department of Communication Sciences and Disorders, University of Vermont, 489 Main St, Burlington, VT, 05405, USA
| | - Neil Cohn
- Department of Communication and Cognition, Tilburg School of Humanities and Digital Sciences, Tilburg Center for Cognition and Communication (TiCC), Tilburg University, Tilburg, The Netherlands.
| |
Collapse
|
20
|
Atilla F, Klomberg B, Cardoso B, Cohn N. Background check: cross-cultural differences in the spatial context of comic scenes. MULTIMODAL COMMUNICATION 2023; 12:179-189. [PMID: 38144414 PMCID: PMC10740350 DOI: 10.1515/mc-2023-0027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/08/2023] [Accepted: 10/16/2023] [Indexed: 12/26/2023]
Abstract
Cognitive research points towards cultural differences in the way people perceive and express scenes. Whereas people from Western cultures focus more on focal objects, those from East Asia have been shown to focus on the surrounding context. This paper examines whether these cultural differences are expressed in complex multimodal media such as comics. We compared annotated panels across comics from six countries to examine how backgrounds convey contextual information of scenes in explicit or implicit ways. Compared to Western comics from the United States and Spain, East Asian comics from Japan and China expressed the context of scenes more implicitly. In addition, Nigerian comics moderately emulated American comics in background use, while Russian comics emulated Japanese manga, consistent with their visual styles. The six countries grouped together based on whether they employed more explicit strategies such as detailed, depicted backgrounds, or implicit strategies such as leaving the background empty. These cultural differences in background use can be attributed to both cognitive patterns of attention and comics' graphic styles. Altogether, this study provides support for cultural differences in attention manifesting in visual narratives, and elucidates how spatial relationships are depicted in visual narratives across cultures.
Collapse
Affiliation(s)
- Fred Atilla
- Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
| | - Bien Klomberg
- Department of Communication and Cognition, Tilburg University, Tilburg, The Netherlands
| | - Bruno Cardoso
- Department of Communication and Cognition, Tilburg University, Tilburg, The Netherlands
| | - Neil Cohn
- Department of Communication and Cognition, Tilburg University, Tilburg, The Netherlands
| |
Collapse
|
21
|
Fooken J, Baltaretu BR, Barany DA, Diaz G, Semrau JA, Singh T, Crawford JD. Perceptual-Cognitive Integration for Goal-Directed Action in Naturalistic Environments. J Neurosci 2023; 43:7511-7522. [PMID: 37940592 PMCID: PMC10634571 DOI: 10.1523/jneurosci.1373-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2023] [Revised: 08/15/2023] [Accepted: 08/18/2023] [Indexed: 11/10/2023] Open
Abstract
Real-world actions require one to simultaneously perceive, think, and act on the surrounding world, requiring the integration of (bottom-up) sensory information and (top-down) cognitive and motor signals. Studying these processes involves the intellectual challenge of cutting across traditional neuroscience silos, and the technical challenge of recording data in uncontrolled natural environments. However, recent advances in techniques, such as neuroimaging, virtual reality, and motion tracking, allow one to address these issues in naturalistic environments for both healthy participants and clinical populations. In this review, we survey six topics in which naturalistic approaches have advanced both our fundamental understanding of brain function and how neurologic deficits influence goal-directed, coordinated action in naturalistic environments. The first part conveys fundamental neuroscience mechanisms related to visuospatial coding for action, adaptive eye-hand coordination, and visuomotor integration for manual interception. The second part discusses applications of such knowledge to neurologic deficits, specifically, steering in the presence of cortical blindness, impact of stroke on visual-proprioceptive integration, and impact of visual search and working memory deficits. This translational approach-extending knowledge from lab to rehab-provides new insights into the complex interplay between perceptual, motor, and cognitive control in naturalistic tasks that are relevant for both basic and clinical research.
Collapse
Affiliation(s)
- Jolande Fooken
- Centre for Neuroscience, Queen's University, Kingston, Ontario K7L3N6, Canada
| | - Bianca R Baltaretu
- Department of Psychology, Justus Liebig University, Giessen, 35394, Germany
| | - Deborah A Barany
- Department of Kinesiology, University of Georgia, and Augusta University/University of Georgia Medical Partnership, Athens, Georgia 30602
| | - Gabriel Diaz
- Center for Imaging Science, Rochester Institute of Technology, Rochester, New York 14623
| | - Jennifer A Semrau
- Department of Kinesiology and Applied Physiology, University of Delaware, Newark, Delaware 19713
| | - Tarkeshwar Singh
- Department of Kinesiology, Pennsylvania State University, University Park, Pennsylvania 16802
| | - J Douglas Crawford
- Centre for Integrative and Applied Neuroscience, York University, Toronto, Ontario M3J 1P3, Canada
| |
Collapse
|
22
|
Mahr JB, Schacter DL. A language of episodic thought? Behav Brain Sci 2023; 46:e283. [PMID: 37766653 DOI: 10.1017/s0140525x2300198x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/29/2023]
Abstract
We propose that episodic thought (i.e., episodic memory and imagination) is a domain where the language-of-thought hypothesis (LoTH) could be fruitfully applied. On the one hand, LoTH could explain the structure of what is encoded into and retrieved from long-term memory. On the other, LoTH can help make sense of how episodic contents come to play such a large variety of different cognitive roles after they have been retrieved.
Collapse
Affiliation(s)
- Johannes B Mahr
- Department of Psychology, Harvard University, Cambridge, MA, USA ;
| | | |
Collapse
|
23
|
Kallmayer A, Võ MLH, Draschkow D. Viewpoint dependence and scene context effects generalize to depth rotated three-dimensional objects. J Vis 2023; 23:9. [PMID: 37707802 PMCID: PMC10506680 DOI: 10.1167/jov.23.10.9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2022] [Accepted: 08/17/2023] [Indexed: 09/15/2023] Open
Abstract
Viewpoint effects on object recognition interact with object-scene consistency effects. While recognition of objects seen from "noncanonical" viewpoints (e.g., a cup from below) is typically impeded compared to processing of objects seen from canonical viewpoints (e.g., the string side of a guitar), this effect is reduced by meaningful scene context information. In the present study, we investigated whether these findings, established using photographic images, generalize to strongly noncanonical orientations of three-dimensional (3D) object models. Using 3D models allowed us to probe a broad range of viewpoints and empirically establish viewpoints with very strong noncanonical and canonical orientations. In Experiment 1, we presented 3D models of objects from six different viewpoints (0°, 60°, 120°, 180°, 240°, 300°) in color (1a) and grayscale (1b) in a sequential matching task. Viewpoint had a significant effect on accuracy and response times. Based on the viewpoint effect in Experiments 1a and 1b, we could empirically determine the most canonical and noncanonical viewpoints from our set to use in Experiment 2. In Experiment 2, participants again performed a sequential matching task, but now the objects were paired with scene backgrounds that were either consistent (e.g., a cup in the kitchen) or inconsistent (e.g., a guitar in the bathroom) with the object. Viewpoint interacted significantly with scene consistency in that object recognition was less affected by viewpoint when consistent scene information was provided, compared to inconsistent information. Our results show that scene context supports object recognition even for extremely noncanonical orientations of depth-rotated 3D objects. This supports the important role object-scene processing plays in object constancy, especially under conditions of high uncertainty.
Collapse
Affiliation(s)
- Aylin Kallmayer
- Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany
| | - Melissa L-H Võ
- Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany
| | - Dejan Draschkow
- Department of Experimental Psychology, University of Oxford, Oxford, UK
- Oxford Centre for Human Brain Activity, Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, University of Oxford, Oxford, UK
| |
Collapse
|
24
|
Klever L, Islam J, Võ MLH, Billino J. Aging attenuates the memory advantage for unexpected objects in real-world scenes. Heliyon 2023; 9:e20241. [PMID: 37809883 PMCID: PMC10560015 DOI: 10.1016/j.heliyon.2023.e20241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2022] [Revised: 09/14/2023] [Accepted: 09/14/2023] [Indexed: 10/10/2023] Open
Abstract
Across the adult lifespan, memory processes are subject to pronounced changes. Prior knowledge and expectations might critically shape functional differences; however, corresponding findings have remained ambiguous so far. Here, we chose a tailored approach to scrutinize how schema (in-)congruencies affect older and younger adults' memory for objects embedded in real-world scenes, a scenario close to everyday memory demands. A sample of 23 older (52-81 years) and 23 younger adults (18-38 years) freely viewed 60 photographs of scenes in which target objects were included that were either congruent or incongruent with the given context information. After a delay, recognition performance for those objects was determined. In addition, recognized objects had to be matched to the scene context in which they were previously presented. While we found schema violations beneficial for object recognition across age groups, the advantage was significantly less pronounced in older adults. We moreover observed an age-related congruency bias for matching objects to their original scene context. Our findings support a critical role of predictive processes in age-related memory differences and indicate enhanced weighting of predictions with age. We suggest that recent predictive processing theories provide a particularly useful framework to elaborate on age-related functional vulnerabilities as well as stability.
Collapse
Affiliation(s)
- Lena Klever
- Experimental Psychology, Justus Liebig University Giessen, Germany
- Center for Mind, Brain, And Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany
| | - Jasmin Islam
- Experimental Psychology, Justus Liebig University Giessen, Germany
| | - Melissa Le-Hoa Võ
- Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany
| | - Jutta Billino
- Experimental Psychology, Justus Liebig University Giessen, Germany
- Center for Mind, Brain, And Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany
| |
Collapse
|
25
|
Tachmatzidou O, Vatakis A. Attention and schema violations of real world scenes differentially modulate time perception. Sci Rep 2023; 13:10002. [PMID: 37340029 DOI: 10.1038/s41598-023-37030-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Accepted: 06/14/2023] [Indexed: 06/22/2023] Open
Abstract
In the real world, object arrangement follows a number of rules. Some rules pertain to the spatial relations between objects and scenes (i.e., syntactic rules), and others to their contextual relations (i.e., semantic rules). Research has shown that violations of semantic rules influence interval timing, with the duration of scenes containing such violations being overestimated compared to scenes with no violations. However, no study has yet investigated whether semantic and syntactic violations affect timing in the same way. Furthermore, it is unclear whether the effect of scene violations on timing is due to attentional or other cognitive accounts. Using an oddball paradigm and real-world scenes with or without semantic and syntactic violations, we conducted two experiments on whether time dilation would be obtained in the presence of either type of scene violation and on the role of attention in any such effect. Our results from Experiment 1 showed that time dilation indeed occurred in the presence of syntactic violations, while time compression was observed for semantic violations. In Experiment 2, we further investigated whether these estimations were driven by attentional accounts by manipulating the contrast of the target objects. The results showed that increased contrast led to duration overestimation for both semantic and syntactic oddballs. Together, our results indicate that scene violations differentially affect timing due to differences in violation processing and, moreover, that their effect on timing seems sensitive to attentional manipulations such as target contrast.
Collapse
Affiliation(s)
- Ourania Tachmatzidou
- Multisensory and Temporal Processing Laboratory (MultiTimeLab), Department of Psychology, Panteion University of Social and Political Sciences, 136 Syngrou Ave., 17671, Athens, Greece
| | - Argiro Vatakis
- Multisensory and Temporal Processing Laboratory (MultiTimeLab), Department of Psychology, Panteion University of Social and Political Sciences, 136 Syngrou Ave., 17671, Athens, Greece.
| |
Collapse
|
26
|
Han NX, Eckstein MP. Head and body cues guide eye movements and facilitate target search in real-world videos. J Vis 2023; 23:5. [PMID: 37294703 PMCID: PMC10259675 DOI: 10.1167/jov.23.6.5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2022] [Accepted: 05/03/2023] [Indexed: 06/11/2023] Open
Abstract
Static gaze cues presented in central vision produce shifts of covert attention and eye movements in observers, and benefit perceptual performance in the detection of simple targets. Less is known about how dynamic gazer behaviors involving head and body motion influence search eye movements and performance in perceptual tasks in real-world scenes. Participants searched for a target person (yes/no task, 50% presence) while watching videos of one to three gazers looking at a designated person (50% valid gaze cue, i.e., looking at the target). To assess the contributions of different body parts, we digitally erased parts of the gazers in the videos to create three conditions: floating heads (only head movements), headless bodies (only lower body movements), and a baseline condition with intact head and body. We show that valid dynamic gaze cues guided participants' eye movements (up to 3 fixations) closer to the target, sped the time to foveate the target, reduced fixations to the gazers, and improved target detection. The effect of gaze cues in guiding eye movements to the search target was smallest when the gazer's head was removed from the videos. To assess the inherent information about gaze goal location in each condition, we collected perceptual judgments estimating gaze goals from a separate group of observers with unlimited time. These judgments showed larger estimation errors when the gazer's head was removed, suggesting that the reduced eye movement guidance from lower-body cueing reflects observers' difficulty extracting gaze information without the head present. Together, the study extends previous work by evaluating the impact of dynamic gazer behaviors on search with videos of real-world cluttered scenes.
Collapse
Affiliation(s)
- Nicole X Han
- Department of Psychological and Brain Sciences, Institute for Collaborative Biotechnologies, University of California, Santa Barbara, CA, USA
| | - Miguel P Eckstein
- Department of Psychological and Brain Sciences, Institute for Collaborative Biotechnologies, University of California, Santa Barbara, CA, USA
| |
Collapse
|
27
|
Wiesmann SL, Võ MLH. Disentangling diagnostic object properties for human scene categorization. Sci Rep 2023; 13:5912. [PMID: 37041222 PMCID: PMC10090043 DOI: 10.1038/s41598-023-32385-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Accepted: 03/27/2023] [Indexed: 04/13/2023] Open
Abstract
It usually only takes a single glance to categorize our environment into different scene categories (e.g. a kitchen or a highway). Object information has been suggested to play a crucial role in this process, and some proposals even claim that the recognition of a single object can be sufficient to categorize the scene around it. Here, we tested this claim in four behavioural experiments by having participants categorize real-world scene photographs that were reduced to a single, cut-out object. We show that single objects can indeed be sufficient for correct scene categorization and that scene category information can be extracted within 50 ms of object presentation. Furthermore, we identified object frequency and specificity for the target scene category as the most important object properties for human scene categorization. Interestingly, despite the statistical definition of specificity and frequency, human ratings of these properties were better predictors of scene categorization behaviour than more objective statistics derived from databases of labelled real-world images. Taken together, our findings support a central role of object information during human scene categorization, showing that single objects can be indicative of a scene category if they are assumed to frequently and exclusively occur in a certain environment.
Collapse
Affiliation(s)
- Sandro L Wiesmann
- Department of Psychology, Johann Wolfgang Goethe-Universität, Theodor-W.-Adorno-Platz 6, 60323, Frankfurt Am Main, Germany.
| | - Melissa L-H Võ
- Department of Psychology, Johann Wolfgang Goethe-Universität, Theodor-W.-Adorno-Platz 6, 60323, Frankfurt Am Main, Germany
| |
Collapse
|
28
|
Schüz S, Gatt A, Zarrieß S. Rethinking symbolic and visual context in Referring Expression Generation. Front Artif Intell 2023; 6:1067125. [PMID: 37026020 PMCID: PMC10072327 DOI: 10.3389/frai.2023.1067125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2022] [Accepted: 02/28/2023] [Indexed: 03/31/2023] Open
Abstract
Situational context is crucial for linguistic reference to visible objects, since the same description can refer unambiguously to an object in one context but be ambiguous or misleading in others. This also applies to Referring Expression Generation (REG), where the production of identifying descriptions is always dependent on a given context. Research in REG has long represented visual domains throughsymbolicinformation about objects and their properties, to determine identifying sets of target features during content determination. In recent years, research invisual REGhas turned to neural modeling and recasted the REG task as an inherently multimodal problem, looking at more natural settings such as generating descriptions for objects in photographs. Characterizing the precise ways in which context influences generation is challenging in both paradigms, as context is notoriously lacking precise definitions and categorization. In multimodal settings, however, these problems are further exacerbated by the increased complexity and low-level representation of perceptual inputs. The main goal of this article is to provide a systematic review of the types and functions of visual context across various approaches to REG so far and to argue for integrating and extending different perspectives on visual context that currently co-exist in research on REG. By analyzing the ways in which symbolic REG integrates context in rule-based approaches, we derive a set of categories of contextual integration, including the distinction betweenpositiveandnegative semantic forcesexerted by context during reference generation. Using this as a framework, we show that so far existing work in visual REG has considered only some of the ways in which visual context can facilitate end-to-end reference generation. 
Connecting with preceding research in related areas, as possible directions for future research, we highlight some additional ways in which contextual integration can be incorporated into REG and other multimodal generation tasks.
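The rule-based content-determination step that the abstract describes is classically realized by incremental attribute selection in the style of Dale and Reiter's Incremental Algorithm. The following is a minimal sketch, assuming dict-based object representations; the domain, property names, and preference order are invented for illustration:

```python
def incremental_reg(target, distractors, preference_order):
    """Select a distinguishing set of (attribute, value) pairs for `target`.

    Attributes are tried in a fixed preference order; a property is kept
    only if it rules out at least one remaining distractor (a "negative
    semantic force" in the article's terms).
    """
    description = {}
    remaining = list(distractors)
    for attr in preference_order:
        value = target.get(attr)
        if value is None:
            continue
        ruled_out = [d for d in remaining if d.get(attr) != value]
        if ruled_out:
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
        if not remaining:
            break
    # An empty `remaining` means the description uniquely identifies the target.
    return description, remaining

# Hypothetical domain: refer to a red bowl among two other objects.
target = {"type": "bowl", "colour": "red", "size": "small"}
distractors = [
    {"type": "cup", "colour": "red", "size": "small"},
    {"type": "bowl", "colour": "blue", "size": "large"},
]
desc, left = incremental_reg(target, distractors, ["type", "colour", "size"])
```

Here `desc` ends up as `{"type": "bowl", "colour": "red"}`: "type" excludes the cup, "colour" excludes the blue bowl, and "size" is never needed, mirroring how symbolic REG uses context to decide which properties are distinguishing.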
Collapse
Affiliation(s)
- Simeon Schüz
- Faculty of Linguistics and Literary Studies, Bielefeld University, Bielefeld, Germany
- *Correspondence: Simeon Schüz
| | - Albert Gatt
- Natural Language Processing Group, Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands
| | - Sina Zarrieß
- Faculty of Linguistics and Literary Studies, Bielefeld University, Bielefeld, Germany
| |
Collapse
|
29
|
Quilty-Dunn J, Porot N, Mandelbaum E. The best game in town: The reemergence of the language-of-thought hypothesis across the cognitive sciences. Behav Brain Sci 2022; 46:e261. [PMID: 36471543 DOI: 10.1017/s0140525x22002849] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Mental representations remain the central posits of psychology after many decades of scrutiny. However, there is no consensus about the representational format(s) of biological cognition. This paper provides a survey of evidence from computational cognitive psychology, perceptual psychology, developmental psychology, comparative psychology, and social psychology, and concludes that one type of format that routinely crops up is the language-of-thought (LoT). We outline six core properties of LoTs: (i) discrete constituents; (ii) role-filler independence; (iii) predicate-argument structure; (iv) logical operators; (v) inferential promiscuity; and (vi) abstract content. These properties cluster together throughout cognitive science. Bayesian computational modeling, compositional features of object perception, complex infant and animal reasoning, and automatic, intuitive cognition in adults all implicate LoT-like structures. Instead of regarding LoT as a relic of the previous century, researchers in cognitive science and philosophy-of-mind must take seriously the explanatory breadth of LoT-based architectures. We grant that the mind may harbor many formats and architectures, including iconic and associative structures as well as deep-neural-network-like architectures. However, as computational/representational approaches to the mind continue to advance, classical compositional symbolic structures - that is, LoTs - only prove more flexible and well-supported over time.
Collapse
Affiliation(s)
- Jake Quilty-Dunn
- Department of Philosophy and Philosophy-Neuroscience-Psychology Program, Washington University in St. Louis, St. Louis, MO, USA. , sites.google.com/site/jakequiltydunn/
| | - Nicolas Porot
- Africa Institute for Research in Economics and Social Sciences, Mohammed VI Polytechnic University, Rabat, Morocco. , nicolasporot.com
| | - Eric Mandelbaum
- Departments of Philosophy and Psychology, The Graduate Center & Baruch College, CUNY, New York, NY, USA. , ericmandelbaum.com
| |
Collapse
|
30
|
Walter K, Bex P. Low-level factors increase gaze-guidance under cognitive load: A comparison of image-salience and semantic-salience models. PLoS One 2022; 17:e0277691. [PMID: 36441789 PMCID: PMC9704686 DOI: 10.1371/journal.pone.0277691] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2022] [Accepted: 11/01/2022] [Indexed: 11/29/2022] Open
Abstract
Growing evidence links eye movements and cognitive functioning; however, there is debate concerning which image content is fixated in natural scenes. Competing approaches have argued that low-level/feedforward and high-level/feedback factors contribute to gaze-guidance. We used one low-level model (Graph Based Visual Salience, GBVS) and a novel language-based high-level model (Global Vectors for Word Representation, GloVe) to predict gaze locations in a natural image search task, and we examined how fixated locations during this task vary under increasing levels of cognitive load. Participants (N = 30) freely viewed a series of 100 natural scenes for 10 seconds each. Between scenes, subjects identified a target object from the scene a specified number of trials (N) back among three distractor objects of the same type but from alternate scenes. The N-back was adaptive: N increased following two correct trials and decreased following one incorrect trial. Receiver operating characteristic (ROC) analysis of gaze locations showed that as cognitive load increased, there was a significant increase in prediction power for GBVS, but not for GloVe. Similarly, the area under the ROC did not differ significantly between the minimum and maximum N-back achieved across subjects for GloVe (t(29) = -1.062, p = 0.297), while GBVS showed an upward, though non-significant, trend (t(29) = -1.975, p = 0.058). A permutation analysis showed that gaze locations were correlated with GBVS, indicating that salient features were more likely to be fixated. However, gaze locations were anti-correlated with GloVe, indicating that objects with low semantic consistency with the scene were more likely to be fixated. These results suggest that fixations are drawn towards salient low-level image features and this bias increases with cognitive load. Additionally, there is a bias towards fixating improbable objects that does not vary under increasing levels of cognitive load.
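The adaptive load manipulation described above (N increases after two correct trials, decreases after one incorrect trial) amounts to a small staircase tracker. A sketch of that rule, with the class and parameter names being our own rather than the authors':

```python
class AdaptiveNBack:
    """2-correct-up / 1-incorrect-down staircase for the N-back load."""

    def __init__(self, start_n=1, min_n=1):
        self.n = start_n
        self.min_n = min_n
        self.correct_streak = 0

    def update(self, correct):
        """Record one trial outcome and return the N for the next trial."""
        if correct:
            self.correct_streak += 1
            if self.correct_streak == 2:     # two correct in a row -> harder
                self.n += 1
                self.correct_streak = 0
        else:                                # a single error -> easier
            self.n = max(self.min_n, self.n - 1)
            self.correct_streak = 0
        return self.n

stair = AdaptiveNBack()
levels = [stair.update(c) for c in [True, True, True, True, False]]
```

With that outcome sequence, `levels` is `[1, 2, 2, 3, 2]`: load rises after each pair of correct trials and drops immediately after the error, which is how the paradigm titrates difficulty to each subject.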
Collapse
Affiliation(s)
- Kerri Walter
- Psychology Department, Northeastern University, Boston, MA, United States of America
| | - Peter Bex
- Psychology Department, Northeastern University, Boston, MA, United States of America
| |
Collapse
|
31
|
Turini J, Võ MLH. Hierarchical organization of objects in scenes is reflected in mental representations of objects. Sci Rep 2022; 12:20068. [PMID: 36418411 PMCID: PMC9684142 DOI: 10.1038/s41598-022-24505-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Accepted: 11/16/2022] [Indexed: 11/25/2022] Open
Abstract
The arrangement of objects in scenes follows certain rules ("Scene Grammar"), which we exploit to perceive and interact efficiently with our environment. We have proposed that Scene Grammar is hierarchically organized: scenes are divided into clusters of objects ("phrases", e.g., the sink phrase); within every phrase, one object ("anchor", e.g., the sink) holds strong predictions about the identity and position of other objects ("local objects", e.g., a toothbrush). To investigate whether this hierarchy is reflected in the mental representations of objects, we collected pairwise similarity judgments for everyday object pictures and for the corresponding words. Similarity judgments were stronger not only for object pairs appearing in the same scene, but also for object pairs appearing within the same phrase of a scene as opposed to different phrases of the same scene. Moreover, object pairs with the same status in the scenes (i.e., both anchors or both local objects) were judged as more similar than pairs of different status. Comparing effects between pictures and words, we found a similar, significant impact of scene hierarchy on the organization of the mental representation of objects, independent of stimulus modality. We conclude that the hierarchical structure of the visual environment is incorporated into abstract, domain-general mental representations of the world.
Collapse
Affiliation(s)
- Jacopo Turini
- Scene Grammar Lab, Department of Psychology and Sports Sciences, Goethe University, Frankfurt am Main, Germany.
- Scene Grammar Lab, Institut Für Psychologie, PEG, Room 5.G105, Theodor-W.-Adorno Platz 6, 60323, Frankfurt am Main, Germany.
| | - Melissa Le-Hoa Võ
- Scene Grammar Lab, Department of Psychology and Sports Sciences, Goethe University, Frankfurt am Main, Germany
| |
Collapse
|
32
|
Lukashova-Sanz O, Agarwala R, Wahl S. Context matters during pick-and-place in VR: Impact on search and transport phases. Front Psychol 2022; 13:881269. [PMID: 36160516 PMCID: PMC9493493 DOI: 10.3389/fpsyg.2022.881269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2022] [Accepted: 08/19/2022] [Indexed: 11/13/2022] Open
Abstract
When considering external assistive systems for people with motor impairments, gaze has been shown to be a powerful tool, as it anticipates motor actions and is promising for understanding an individual's intentions even before the action. Until now, the vast majority of studies investigating coordinated eye and hand movements in grasping tasks have focused on the manipulation of single objects without placing them in a meaningful scene, so very little is known about the impact of scene context on how we manipulate objects in an interactive task. The present study investigated how scene context affects human object manipulation in a pick-and-place task in a realistic scenario implemented in VR. During the experiment, participants were instructed to find the target object in a room, pick it up, and transport it to a predefined final location. The impact of the scene context on different stages of the task was then examined using head and hand movement, as well as eye tracking. As the main result, the scene context had a significant effect on the search and transport phases, but not on the reach phase of the task. The present work provides insights into the development of potential intention-predicting support systems, revealing the dynamics of pick-and-place behavior once the task is realized in a realistic, context-rich scenario.
Collapse
Affiliation(s)
- Olga Lukashova-Sanz
- Zeiss Vision Science Lab, Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Carl Zeiss Vision International Gesellschaft mit beschränkter Haftung (GmbH), Aalen, Germany
| | - Rajat Agarwala
- Zeiss Vision Science Lab, Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
| | - Siegfried Wahl
- Zeiss Vision Science Lab, Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Carl Zeiss Vision International Gesellschaft mit beschränkter Haftung (GmbH), Aalen, Germany
| |
Collapse
|
33
|
Helbing J, Draschkow D, L-H Võ M. Auxiliary Scene-Context Information Provided by Anchor Objects Guides Attention and Locomotion in Natural Search Behavior. Psychol Sci 2022; 33:1463-1476. [PMID: 35942922 DOI: 10.1177/09567976221091838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Successful adaptive behavior requires efficient attentional and locomotive systems. Previous research has thoroughly investigated how we achieve this efficiency during natural behavior by exploiting prior knowledge related to targets of our actions (e.g., attending to metallic targets when looking for a pot) and to the environmental context (e.g., looking for the pot in the kitchen). Less is known about whether and how individual nontarget components of the environment support natural behavior. In our immersive virtual reality task, 24 adult participants searched for objects in naturalistic scenes in which we manipulated the presence and arrangement of large, static objects that anchor predictions about targets (e.g., the sink provides a prediction for the location of the soap). Our results show that gaze and body movements in this naturalistic setting are strongly guided by these anchors. These findings demonstrate that objects auxiliary to the target are incorporated into the representations guiding attention and locomotion.
Collapse
Affiliation(s)
- Jason Helbing
- Scene Grammar Lab, Department of Psychology, Goethe University Frankfurt
| | - Dejan Draschkow
- Brain and Cognition Laboratory, Department of Experimental Psychology, University of Oxford
- Oxford Centre for Human Brain Activity, Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, University of Oxford
| | - Melissa L-H Võ
- Scene Grammar Lab, Department of Psychology, Goethe University Frankfurt
| |
Collapse
|
34
|
D'Innocenzo G, Della Sala S, Coco MI. Similar mechanisms of temporary bindings for identity and location of objects in healthy ageing: an eye-tracking study with naturalistic scenes. Sci Rep 2022; 12:11163. [PMID: 35778449 PMCID: PMC9249875 DOI: 10.1038/s41598-022-13559-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2022] [Accepted: 05/25/2022] [Indexed: 11/25/2022] Open
Abstract
The ability to maintain visual working memory (VWM) associations about the identity and location of objects has at times been found to decrease with age. To date, however, this age-related difficulty was mostly observed in artificial visual contexts (e.g., object arrays), and so it is unclear whether, and in which ways, it may manifest in naturalistic contexts. In this eye-tracking study, 26 younger and 24 healthy older adults were asked to detect changes to a critical object situated in a photographic scene (192 in total): in its identity (the object becomes a different object but maintains the same position), its location (the object only changes position), or both (the object changes in location and identity). Aging was associated with lower change-detection performance. A change in identity was harder to detect than a location change, and performance was best when both features changed, especially in younger adults. Eye movements displayed minor differences between age groups (e.g., shorter saccades in older adults) but were similarly modulated by the type of change. Latencies to the first fixation were longer, and the amplitude of incoming saccades was larger, when the critical object changed in location. Once fixated, the target object was inspected for longer when it only changed in identity compared to location. Visually salient objects were fixated earlier, but saliency did not affect any other eye-movement measures considered, nor did it interact with the type of change. Our findings suggest that even though aging results in lower performance, it does not selectively disrupt temporary bindings of object identity, location, or their association in VWM, and they highlight the importance of using naturalistic contexts to discriminate the cognitive processes that decline with aging from those that are spared.
Collapse
Affiliation(s)
- Giorgia D'Innocenzo
- Centro de Investigação em Ciência Psicológica (CICPSI), Faculdade de Psicologia, Universidade de Lisboa, Lisbon, Portugal.
| | - Sergio Della Sala
- Human Cognitive Neuroscience, Department of Psychology, University of Edinburgh, Edinburgh, UK
| | - Moreno I Coco
- Centro de Investigação em Ciência Psicológica (CICPSI), Faculdade de Psicologia, Universidade de Lisboa, Lisbon, Portugal
- Department of Psychology, "Sapienza" University of Rome, Rome, Italy
- IRCCS Santa Lucia, Rome, Italy
| |
Collapse
|
35
|
Abstract
Increasing research has revealed that uninformative spatial sounds facilitate the early processing of visual stimuli. This study examined the crossmodal interactions of semantically congruent stimuli by assessing whether the presentation of event-related characteristic sounds facilitated or interfered with the visual search for corresponding event scenes in pictures. The search array consisted of four images: one target and three non-target pictures. Auditory stimuli were presented to participants in synchronization with picture onset using three types of sounds: a sound congruent with the target, a sound congruent with a distractor, or a control sound. The control sound varied across six experiments, alternating between a sound unrelated to the search stimuli, white noise, and no sound. Participants were required to swiftly localize the target position while ignoring the sound presentation. Localization responses were faster when a sound semantically related to the target was played and slower when the sound was semantically related to a distractor picture. When the distractor-congruent sound was used, participants incorrectly localized the distractor position more often than at the chance level. These findings were replicated when the experiments ruled out the possibility that participants would learn picture-sound pairs during the visual tasks (i.e., the possibility of brief training during the experiments). Overall, event-related crossmodal interactions occur based on semantic representations, and audiovisual associations may develop as a result of long-term experiences rather than brief training in a laboratory.
Collapse
|
36
|
Nuthmann A, Canas-Bajo T. Visual search in naturalistic scenes from foveal to peripheral vision: A comparison between dynamic and static displays. J Vis 2022; 22:10. [PMID: 35044436 PMCID: PMC8802022 DOI: 10.1167/jov.22.1.10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2021] [Accepted: 12/03/2021] [Indexed: 11/24/2022] Open
Abstract
How important foveal, parafoveal, and peripheral vision are depends on the task. For object search and letter search in static images of real-world scenes, peripheral vision is crucial for efficient search guidance, whereas foveal vision is relatively unimportant. Extending this research, we used gaze-contingent Blindspots and Spotlights to investigate visual search in complex dynamic and static naturalistic scenes. In Experiment 1, we used dynamic scenes only, whereas in Experiments 2 and 3, we directly compared dynamic and static scenes. Each scene contained a static, contextually irrelevant target (i.e., a gray annulus). Scene motion was not predictive of target location. For dynamic scenes, the search-time results from all three experiments converge on the novel finding that neither foveal nor central vision was necessary to attain normal search proficiency. Since motion is known to attract attention and gaze, we explored whether guidance to the target was equally efficient in dynamic as compared to static scenes. We found that the very first saccade was guided by motion in the scene. This was not the case for subsequent saccades made during the scanning epoch, representing the actual search process. Thus, effects of task-irrelevant motion were fast-acting and short-lived. Furthermore, when motion was potentially present (Spotlights) or absent (Blindspots) in foveal or central vision only, we observed differences in verification times for dynamic and static scenes (Experiment 2). When using scenes with greater visual complexity and more motion (Experiment 3), however, the differences between dynamic and static scenes were much reduced.
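The gaze-contingent Blindspot and Spotlight manipulations above both reduce to a per-frame decision: mask a region if it falls inside (Blindspot) or outside (Spotlight) a gaze-centered window of fixed eccentricity. A minimal sketch of that decision, with illustrative display parameters (pixels per cm, viewing distance) that are not the study's actual setup:

```python
import math

def visual_angle_deg(pixel_dist, px_per_cm, viewing_cm):
    """Visual angle (degrees) subtended by an on-screen distance."""
    size_cm = pixel_dist / px_per_cm
    return math.degrees(2 * math.atan(size_cm / (2 * viewing_cm)))

def is_masked(point, gaze, radius_deg, mode, px_per_cm=40.0, viewing_cm=70.0):
    """Blindspot masks inside the gaze-centered window; Spotlight masks outside."""
    dist_px = math.dist(point, gaze)
    eccentricity = visual_angle_deg(dist_px, px_per_cm, viewing_cm)
    inside = eccentricity <= radius_deg
    return inside if mode == "blindspot" else not inside

# At fixation a Blindspot hides content; far in the periphery it does not.
center = is_masked((960, 540), (960, 540), 5.0, "blindspot")
periphery = is_masked((1800, 540), (960, 540), 5.0, "blindspot")
```

In a real experiment this test would run on every frame against the eye tracker's latest gaze sample, which is what makes the manipulation gaze-contingent.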
Collapse
Affiliation(s)
- Antje Nuthmann
- Institute of Psychology, Kiel University, Kiel, Germany
- Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, UK
- http://orcid.org/0000-0003-3338-3434
| | - Teresa Canas-Bajo
- Vision Science Graduate Group, University of California, Berkeley, Berkeley, CA, USA
- Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, UK
| |
Collapse
|
37
|
Hessels RS, Benjamins JS, van Doorn AJ, Koenderink JJ, Hooge ITC. Perception of the Potential for Interaction in Social Scenes. Iperception 2021; 12:20416695211040237. [PMID: 34589197 PMCID: PMC8474344 DOI: 10.1177/20416695211040237] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2021] [Accepted: 07/30/2021] [Indexed: 11/15/2022] Open
Abstract
In urban environments, humans often encounter other people who may engage them in interaction. How do humans perceive such invitations to interact at a glance? We briefly presented participants with pictures of actors carrying out one of 11 behaviors (e.g., waving or looking at a phone) at four camera-actor distances. Participants were asked to describe what they might do in such a situation, how they decided, and what stood out most in the photograph. In addition, participants rated how likely they deemed interaction to take place. Participants formulated clear responses about how they might act. We show convincingly that what participants would do depended on the depicted behavior, but not on the camera-actor distance. Ratings of the likelihood of interaction depended on both the depicted behavior and the camera-actor distance. We conclude that humans perceive the "gist" of photographs and that various aspects of the actor, action, and context depicted in photographs are subjectively available at a glance. Our conclusions are discussed in the context of scene perception, social robotics, and intercultural differences.
Collapse
Affiliation(s)
- Roy S. Hessels
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, the Netherlands
| | - Jeroen S. Benjamins
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, the Netherlands
| | - Andrea J. van Doorn
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, the Netherlands
| | - Jan J. Koenderink
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, the Netherlands
| | - Ignace T. C. Hooge
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, the Netherlands
| |
Collapse
|
38
|
Training detection of camouflaged targets in natural scenes: Backgrounds and targets both matter. Acta Psychol (Amst) 2021; 219:103394. [PMID: 34390930 DOI: 10.1016/j.actpsy.2021.103394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2020] [Revised: 08/04/2021] [Accepted: 08/09/2021] [Indexed: 11/22/2022] Open
Abstract
As target-background similarity increases, search performance declines, but this pattern can be attenuated with training. In the present study we (1) characterized training and transfer effects in visual search for camouflaged targets in naturalistic scenes, (2) evaluated whether transfer effects are preserved 3 months after training, (3) tested the suitability of the perceptual learning hypothesis (i.e., using learned scene statistics to aid camouflaged target detection) for explaining camouflage search improvements over training, and (4) provided guidance for camouflage detection training in practice. Participants were assigned to one of three training groups: adaptive camouflage (difficulty varied by performance), massed camouflage (difficulty increased over time), or an active control (no camouflage), and trained over 14 sessions. Additional sessions measured transfer (immediately post training) and retention of training benefits (10 days and 3 months post training). Both the adaptive and massed training groups showed improved camouflaged target detection up to 3 months following training, relative to the control. These benefits were observed only with backgrounds and targets that were similar to those experienced during training and are broadly consistent with the perceptual learning hypothesis. In practice, training interventions should utilize stimuli similar to the operational environment in which detection is expected to occur.
Collapse
|
39
|
David EJ, Beitner J, Võ MLH. The importance of peripheral vision when searching 3D real-world scenes: A gaze-contingent study in virtual reality. J Vis 2021; 21:3. [PMID: 34251433 PMCID: PMC8287039 DOI: 10.1167/jov.21.7.3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Accepted: 05/09/2021] [Indexed: 11/24/2022] Open
Abstract
Visual search in natural scenes is a complex task relying on peripheral vision to detect potential targets and central vision to verify them. The segregation of the visual fields has been established primarily by on-screen experiments. We conducted a gaze-contingent experiment in virtual reality in order to test how the perceived roles of central and peripheral vision translate to more natural settings. The use of everyday scenes in virtual reality allowed us to study visual attention by implementing a fairly ecological protocol that cannot be implemented in the real world. Central or peripheral vision was masked during visual search, with target objects selected according to scene semantic rules. Analyzing the resulting search behavior, we found that target objects that were not spatially constrained to a probable location within the scene impacted search measures negatively. Our results diverge from on-screen studies in that search performance was only slightly affected by central vision loss. In particular, a central mask did not impact verification times when the target was grammatically constrained to an anchor object. Our findings demonstrate that the role of central vision (up to 6 degrees of eccentricity) in identifying objects in natural scenes seems to be minor, while the role of peripheral preprocessing of targets in immersive real-world searches may have been underestimated by on-screen experiments.
Collapse
Affiliation(s)
- Erwan Joël David
- Department of Psychology, Goethe-Universität, Frankfurt, Germany
| | - Julia Beitner
- Department of Psychology, Goethe-Universität, Frankfurt, Germany
| | | |
Collapse
|
40
|
Rehrig GL, Cheng M, McMahan BC, Shome R. Why are the batteries in the microwave?: Use of semantic information under uncertainty in a search task. COGNITIVE RESEARCH-PRINCIPLES AND IMPLICATIONS 2021; 6:32. [PMID: 33855644 PMCID: PMC8046897 DOI: 10.1186/s41235-021-00294-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/18/2020] [Accepted: 03/23/2021] [Indexed: 11/10/2022]
Abstract
A major problem in human cognition is to understand how newly acquired information and long-standing beliefs about the environment combine to make decisions and plan behaviors. Over-dependence on long-standing beliefs may be a significant source of suboptimal decision-making in unusual circumstances. While the contribution of long-standing beliefs about the environment to search in real-world scenes is well-studied, less is known about how new evidence informs search decisions, and it is unclear whether the two sources of information are used together optimally to guide search. The present study expanded on the literature on semantic guidance in visual search by modeling a Bayesian ideal observer's use of long-standing semantic beliefs and recent experience in an active search task. The ability to adjust expectations to the task environment was simulated using the Bayesian ideal observer, and subjects' performance was compared to ideal observers that depended on prior knowledge and recent experience to varying degrees. Target locations were either congruent with scene semantics, incongruent with what would be expected from scene semantics, or random. Half of the subjects were able to learn to search for the target in incongruent locations over repeated experimental sessions when it was optimal to do so. These results suggest that searchers can learn to prioritize recent experience over knowledge of scenes in a near-optimal fashion when it is beneficial to do so, as long as the evidence from recent experience was learnable.
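The ideal observer's weighting of long-standing semantic beliefs against recent experience can be illustrated as a discrete Bayesian update over candidate target locations. The priors, counts, and evidence-weighting parameter below are invented for illustration, not the paper's fitted values:

```python
def update_location_beliefs(prior, counts, weight=1.0):
    """Combine a semantic prior over candidate locations with recent finds.

    prior  : semantic-belief probability for each location
    counts : how often the target was recently found at each location
    weight : trust in recent experience (0.0 = rely on the prior alone)
    """
    likelihood = [(c + 1.0) ** weight for c in counts]   # add-one smoothing
    unnorm = [p * lik for p, lik in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# A scene-congruent location is favoured a priori, but the target keeps
# turning up at the incongruent one (the batteries are in the microwave).
prior = [0.7, 0.2, 0.1]            # congruent, incongruent, random
counts = [0, 8, 0]                 # recent target finds per location
posterior = update_location_beliefs(prior, counts)
```

With these numbers the posterior shifts most of its mass to the incongruent location, mirroring how the abstract's near-optimal searchers learned to prioritize recent experience over scene knowledge; setting `weight=0.0` recovers a searcher who relies on semantic priors alone.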
Collapse
Affiliation(s)
- Gwendolyn L Rehrig
- Department of Psychology, University of California, Davis, CA, 95616, USA.
| | - Michelle Cheng
- School of Social Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Brian C McMahan
- Department of Computer Science, Rutgers University-New Brunswick, New Brunswick, USA
| | - Rahul Shome
- Department of Computer Science, Rice University, Houston, USA
| |
Collapse
|