1
Peacock CE, Singh P, Hayes TR, Rehrig G, Henderson JM. Searching for meaning: Local scene semantics guide attention during natural visual search in scenes. Q J Exp Psychol (Hove) 2023; 76:632-648. [PMID: 35510885 PMCID: PMC11132926 DOI: 10.1177/17470218221101334] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
Models of visual search in scenes include image salience as a source of attentional guidance. However, because scene meaning is correlated with image salience, it could be that the salience predictor in these models is driven by meaning. To test this proposal, we generated meaning maps that represented the spatial distribution of semantic informativeness in scenes, and salience maps which represented the spatial distribution of conspicuous image features and tested their influence on fixation densities from two object search tasks in real-world scenes. The results showed that meaning accounted for significantly greater variance in fixation densities than image salience, both overall and in early attention across both studies. Here, meaning explained 58% and 63% of the theoretical ceiling of variance in attention across both studies, respectively. Furthermore, both studies demonstrated that fast initial saccades were not more likely to be directed to higher salience regions than slower initial saccades, and initial saccades of all latencies were directed to regions containing higher meaning than salience. Together, these results demonstrated that even though meaning was task-neutral, the visual system still selected meaningful over salient scene regions for attention during search.
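The comparison described in this abstract, how much variance in fixation density a meaning map explains versus a salience map, can be illustrated with a small sketch. It assumes that "variance accounted for" is summarized as the squared linear correlation between two maps of equal size; the abstract does not state the exact analysis, and the function name `r_squared` and the 2x2 toy maps are hypothetical.

```python
def r_squared(map_a, map_b):
    """Squared Pearson correlation between two equally sized 2D maps."""
    # Flatten both maps so each cell becomes one paired observation.
    a = [value for row in map_a for value in row]
    b = [value for row in map_b for value in row]
    if len(a) != len(b):
        raise ValueError("maps must contain the same number of cells")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant map has no variance to explain")
    return (cov * cov) / (var_a * var_b)


# Toy data (hypothetical values, not taken from the study).
fixation_density = [[0.0, 1.0], [2.0, 3.0]]
meaning_map = [[0.0, 2.0], [4.0, 6.0]]
salience_map = [[3.0, 0.0], [1.0, 2.0]]

meaning_r2 = r_squared(fixation_density, meaning_map)
salience_r2 = r_squared(fixation_density, salience_map)
```

In this toy case the meaning map shares far more variance with the fixation density than the salience map does, which mirrors the direction of the reported result but says nothing about its size.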
Affiliation(s)
- Candace E Peacock
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Department of Psychology, University of California, Davis, Davis, CA, USA
- Praveena Singh
- Center for Neuroscience, University of California, Davis, Davis, CA, USA
- Taylor R Hayes
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Gwendolyn Rehrig
- Department of Psychology, University of California, Davis, Davis, CA, USA
- John M Henderson
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Department of Psychology, University of California, Davis, Davis, CA, USA
2
Effect of Target Semantic Consistency in Different Sequence Positions and Processing Modes on T2 Recognition: Integration and Suppression Based on Cross-Modal Processing. Brain Sci 2023; 13:brainsci13020340. [PMID: 36831882 PMCID: PMC9954507 DOI: 10.3390/brainsci13020340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Revised: 02/09/2023] [Accepted: 02/14/2023] [Indexed: 02/19/2023] Open
Abstract
In the rapid serial visual presentation (RSVP) paradigm, sound affects participants' recognition of targets. Although many studies have shown that sound improves cross-modal processing, researchers have not yet explored the effects of sound semantic information with respect to different locations and processing modalities after removing sound saliency. In this study, the RSVP paradigm was used to investigate the difference between attention under conditions of consistent and inconsistent semantics with the target (Experiment 1), as well as the difference between top-down (Experiment 2) and bottom-up processing (Experiment 3) for sounds with consistent semantics with target 2 (T2) at different sequence locations after removing sound saliency. The results showed that cross-modal processing significantly improved attentional blink (AB). The early or lagged appearance of sounds consistent with T2 did not affect participants' judgments in the exogenous attentional modality. However, visual target judgments were improved with endogenous attention. The sequential location of sounds consistent with T2 influenced the judgment of auditory and visual congruency. The results illustrate the effects of sound semantic information in different locations and processing modalities.
3
Yu X, Lau E. The Binding Problem 2.0: Beyond Perceptual Features. Cogn Sci 2023; 47:e13244. [PMID: 36744750 DOI: 10.1111/cogs.13244] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/28/2022] [Revised: 12/22/2022] [Accepted: 01/04/2023] [Indexed: 02/07/2023]
Abstract
The "binding problem" has been a central question in vision science for some 30 years: When encoding multiple objects or maintaining them in working memory, how are we able to represent the correspondence between a specific feature and its corresponding object correctly? In this letter we argue that the boundaries of this research program in fact extend far beyond vision, and we call for coordinated pursuit across the broader cognitive science community of this central question for cognition, which we dub "Binding Problem 2.0".
Affiliation(s)
- Xinchi Yu
- Program of Neuroscience and Cognitive Science, University of Maryland; Department of Linguistics, University of Maryland
- Ellen Lau
- Program of Neuroscience and Cognitive Science, University of Maryland; Department of Linguistics, University of Maryland
4
Hayes TR, Henderson JM. Scene inversion reveals distinct patterns of attention to semantically interpreted and uninterpreted features. Cognition 2022; 229:105231. [DOI: 10.1016/j.cognition.2022.105231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Revised: 07/19/2022] [Accepted: 07/20/2022] [Indexed: 11/03/2022]
5
Zhang H, Pan JS. Visual search as an embodied process: The effects of perspective change and external reference on search performance. J Vis 2022; 22:13. [PMID: 36107125 PMCID: PMC9483234 DOI: 10.1167/jov.22.10.13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
Traditional visual search tasks in the laboratories typically involve looking for targets in 2D displays with exemplar views of objects. In real life, visual search commonly entails 3D objects in 3D spaces with nonperpendicular viewing and relative motions between observers and search array items, both of which lead to transformations of objects’ projected images in lawful but unpredicted ways. Furthermore, observers often do not have to memorize a target before searching, but may refer to it while searching, for example, holding a picture of someone while looking for them from a crowd. Extending the traditional visual search task, in this study, we investigated the effects of image transformation as a result of perspective change yielded by discrete viewing angle change (Experiment 1) or continuous rotation of the search array (Experiment 2) and of having external references on visual search performance. Results showed that when searching from 3D objects with a non-zero viewing angle, performance was similar to searching from 2D exemplar views of objects; when searching for 3D targets from rotating arrays in virtual reality, performance was similar to searching from stationary arrays. In general, discrete or continuous perspective change did not affect the search outcomes in terms of accuracy, response time, and self-rated confidence, or the search process in terms of eye movement patterns. Therefore, visual search does not require the exact match of retinal images. Additionally, being able to see the target during the search improved search accuracy and observers’ confidence. It increased search time because, as revealed by the eye movements, observers actively checked back on the reference target. Thus, visual search is an embodied process that involves real-time information exchange between the observers and the environment.
Affiliation(s)
- Huiyuan Zhang
- Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Jing Samantha Pan
- Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Social Cognitive Neuroscience and Mental Health, Guangzhou, China
6
Thiessen A, Brown J, Basinger M. Examining the Visual Attention Patterns and Identification Accuracy of Adults With Aphasia for Grids and Visual Scene Displays. American Journal of Speech-Language Pathology 2022; 31:1979-1991. [PMID: 35858268 DOI: 10.1044/2022_ajslp-21-00248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
PURPOSE: We compared the degree of cognitive processing needed by people with aphasia to identify themes depicted in grids and visual scene displays (VSDs). We also compared the accuracy of theme identification for both display types. METHOD: Eye-tracking technology was employed to measure the visual processing patterns of 21 adults with aphasia when interpreting themes presented through grids and VSDs. Additionally, we assessed theme identification accuracy by having participants select themes from four choices after viewing each display. RESULTS: Participants more rapidly identified VSDs than grid displays, and VSDs required fewer visual fixations to process than grids. No significant differences were noted between grids and VSDs for theme identification accuracy; however, results indicate a ceiling effect for the variable, as participant accuracy levels were nearly 100% for both display conditions. CONCLUSIONS: Results from this study add to a growing body of evidence supporting the use of VSDs for adults with aphasia. Both display types were accurately identified; however, VSDs were processed more efficiently than grids indicating that both display types may prove effective for people with aphasia; however, VSDs may require less cognitive effort to effectively use than grid displays.
Affiliation(s)
- Amber Thiessen
- Department of Communication Sciences and Disorders, University of Houston, TX
- Jessica Brown
- Department of Speech, Language, and Hearing Sciences, The University of Arizona, Tucson
- Melanie Basinger
- Department of Speech, Language, and Hearing Sciences, The University of Arizona, Tucson
7
Rehrig G, Barker M, Peacock CE, Hayes TR, Henderson JM, Ferreira F. Look at what I can do: Object affordances guide visual attention while speakers describe potential actions. Atten Percept Psychophys 2022; 84:1583-1610. [PMID: 35484443 PMCID: PMC9246959 DOI: 10.3758/s13414-022-02467-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/23/2022] [Indexed: 11/08/2022]
Abstract
As we act on the world around us, our eyes seek out objects we plan to interact with. A growing body of evidence suggests that overt visual attention selects objects in the environment that could be interacted with, even when the task precludes physical interaction. In previous work, objects that afford grasping interactions influenced attention when static scenes depicted reachable spaces, and attention was otherwise better explained by general informativeness. Because grasping is but one of many object interactions, previous work may have downplayed the influence of object affordances on attention. The current study investigated the relationship between overt visual attention and object affordances versus broadly construed semantic information in scenes as speakers describe or memorize scenes. In addition to meaning and grasp maps, which capture informativeness and grasping object affordances in scenes, respectively, we introduce interact maps, which capture affordances more broadly. In a mixed-effects analysis of 5 eyetracking experiments, we found that meaning predicted fixated locations in a general description task and during scene memorization. Grasp maps marginally predicted fixated locations during action description for scenes that depicted reachable spaces only. Interact maps predicted fixated regions in description experiments alone. Our findings suggest observers allocate attention to scene regions that could be readily interacted with when talking about the scene, while general informativeness preferentially guides attention when the task does not encourage careful consideration of objects in the scene. The current study suggests that the influence of object affordances on visual attention in scenes is mediated by task demands.
Affiliation(s)
- Gwendolyn Rehrig
- Department of Psychology, University of California, Davis, Davis, CA, 95616, USA.
- Madison Barker
- Department of Psychology, University of California, Davis, Davis, CA, 95616, USA
- Candace E Peacock
- Department of Psychology and Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Taylor R Hayes
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- John M Henderson
- Department of Psychology and Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Fernanda Ferreira
- Department of Psychology, University of California, Davis, Davis, CA, 95616, USA
8
O’Reilly RC, Ranganath C, Russin JL. The Structure of Systematicity in the Brain. Current Directions in Psychological Science 2022; 31:124-130. [PMID: 35785023 PMCID: PMC9246245 DOI: 10.1177/09637214211049233] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/15/2022]
Abstract
A hallmark of human intelligence is the ability to adapt to new situations, by applying learned rules to new content (systematicity) and thereby enabling an open-ended number of inferences and actions (generativity). Here, we propose that the human brain accomplishes these feats through pathways in the parietal cortex that encode the abstract structure of space, events, and tasks, and pathways in the temporal cortex that encode information about specific people, places, and things (content). Recent neural network models show how the separation of structure and content might emerge through a combination of architectural biases and learning, and these networks show dramatic improvements in the ability to capture systematic, generative behavior. We close by considering how the hippocampal formation may form integrative memories that enable rapid learning of new structure and content representations.
Affiliation(s)
- Charan Ranganath
- Department of Psychology
- Center for Neuroscience, University of California, Davis
- Jacob L. Russin
- Department of Psychology
- Center for Neuroscience, University of California, Davis
9
The impact of semantic matching on the additive effects of object-based attentional selection. Current Psychology 2022. [DOI: 10.1007/s12144-022-02990-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
10
Peacock CE, Cronin DA, Hayes TR, Henderson JM. Meaning and expected surfaces combine to guide attention during visual search in scenes. J Vis 2021; 21:1. [PMID: 34609475 PMCID: PMC8496418 DOI: 10.1167/jov.21.11.1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/05/2021] [Accepted: 09/02/2021] [Indexed: 11/24/2022] Open
Abstract
How do spatial constraints and meaningful scene regions interact to control overt attention during visual search for objects in real-world scenes? To answer this question, we combined novel surface maps of the likely locations of target objects with maps of the spatial distribution of scene semantic content. The surface maps captured likely target surfaces as continuous probabilities. Meaning was represented by meaning maps highlighting the distribution of semantic content in local scene regions. Attention was indexed by eye movements during the search for target objects that varied in the likelihood they would appear on specific surfaces. The interaction between surface maps and meaning maps was analyzed to test whether fixations were directed to meaningful scene regions on target-related surfaces. Overall, meaningful scene regions were more likely to be fixated if they appeared on target-related surfaces than if they appeared on target-unrelated surfaces. These findings suggest that the visual system prioritizes meaningful scene regions on target-related surfaces during visual search in scenes.
Affiliation(s)
- Candace E Peacock
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Department of Psychology, University of California, Davis, Davis, CA, USA
- Deborah A Cronin
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Taylor R Hayes
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- John M Henderson
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Department of Psychology, University of California, Davis, Davis, CA, USA
11
Scene meaningfulness guides eye movements even during mind-wandering. Atten Percept Psychophys 2021; 84:1130-1150. [PMID: 34553314 DOI: 10.3758/s13414-021-02370-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/17/2021] [Indexed: 11/08/2022]
Abstract
During scene viewing, semantic information in the scene has been shown to play a dominant role in guiding fixations compared to visual salience (e.g., Henderson & Hayes, 2017). However, scene viewing is sometimes disrupted by cognitive processes unrelated to the scene. For example, viewers sometimes engage in mind-wandering, or having thoughts unrelated to the current task. How do meaning and visual salience account for fixation allocation when the viewer is mind-wandering, and does it differ from when the viewer is on-task? We asked participants to study a series of real-world scenes in preparation for a later memory test. Thought probes occasionally occurred after a subset of scenes to assess whether participants were on-task or mind-wandering. We used salience maps (Graph-Based Visual Saliency; Harel, Koch, & Perona, 2007) and meaning maps (Henderson & Hayes, 2017) to represent the distribution of visual salience and semantic richness in the scene, respectively. Because visual salience and meaning were represented similarly, we could directly compare how well they predicted fixation allocation. Our results indicate that fixations prioritized meaningful over visually salient regions in the scene during mind-wandering just as during attentive viewing. These results held across the entire viewing time. A re-analysis of an independent study (Krasich, Huffman, Faber, & Brockmole, 2020, Journal of Vision, 20(9), 10) showed similar results. Therefore, viewers appear to prioritize meaningful regions over visually salient regions in real-world scenes even during mind-wandering.
12
Gronau N. To Grasp the World at a Glance: The Role of Attention in Visual and Semantic Associative Processing. J Imaging 2021; 7:jimaging7090191. [PMID: 34564117 PMCID: PMC8470651 DOI: 10.3390/jimaging7090191] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/30/2021] [Revised: 08/30/2021] [Accepted: 09/15/2021] [Indexed: 11/16/2022] Open
Abstract
Associative relations among words, concepts and percepts are the core building blocks of high-level cognition. When viewing the world ‘at a glance’, the associative relations between objects in a scene, or between an object and its visual background, are extracted rapidly. The extent to which such relational processing requires attentional capacity, however, has been heavily disputed over the years. In the present manuscript, I review studies investigating scene–object and object–object associative processing. I then present a series of studies in which I assessed the necessity of spatial attention to various types of visual–semantic relations within a scene. Importantly, in all studies, the spatial and temporal aspects of visual attention were tightly controlled in an attempt to minimize unintentional attention shifts from ‘attended’ to ‘unattended’ regions. Pairs of stimuli—either objects, scenes or a scene and an object—were briefly presented on each trial, while participants were asked to detect a pre-defined target category (e.g., an animal, a nonsense shape). Response times (RTs) to the target detection task were registered when visual attention spanned both stimuli in a pair vs. when attention was focused on only one of two stimuli. Among non-prioritized stimuli that were not defined as to-be-detected targets, findings consistently demonstrated rapid associative processing when stimuli were fully attended, i.e., shorter RTs to associated than unassociated pairs. Focusing attention on a single stimulus only, however, largely impaired this relational processing. Notably, prioritized targets continued to affect performance even when positioned at an unattended location, and their associative relations with the attended items were well processed and analyzed. 
Our findings portray an important dissociation between unattended task-irrelevant and task-relevant items: while the former require spatial attentional resources in order to be linked to stimuli positioned inside the attentional focus, the latter may influence high-level recognition and associative processes via feature-based attentional mechanisms that are largely independent of spatial attention.
Affiliation(s)
- Nurit Gronau
- Department of Psychology and Department of Cognitive Science Studies, The Open University of Israel, Raanana 4353701, Israel
13
Hayes TR, Henderson JM. Deep saliency models learn low-, mid-, and high-level features to predict scene attention. Sci Rep 2021; 11:18434. [PMID: 34531484 PMCID: PMC8445969 DOI: 10.1038/s41598-021-97879-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/13/2021] [Accepted: 08/31/2021] [Indexed: 02/08/2023] Open
Abstract
Deep saliency models represent the current state-of-the-art for predicting where humans look in real-world scenes. However, for deep saliency models to inform cognitive theories of attention, we need to know how deep saliency models prioritize different scene features to predict where people look. Here we open the black box of three prominent deep saliency models (MSI-Net, DeepGaze II, and SAM-ResNet) using an approach that models the association between attention, deep saliency model output, and low-, mid-, and high-level scene features. Specifically, we measured the association between each deep saliency model and low-level image saliency, mid-level contour symmetry and junctions, and high-level meaning by applying a mixed effects modeling approach to a large eye movement dataset. We found that all three deep saliency models were most strongly associated with high-level and low-level features, but exhibited qualitatively different feature weightings and interaction patterns. These findings suggest that prominent deep saliency models are primarily learning image features associated with high-level scene meaning and low-level image saliency and highlight the importance of moving beyond simply benchmarking performance.
Affiliation(s)
- Taylor R Hayes
- Center for Mind and Brain, University of California, Davis, 95618, USA.
- John M Henderson
- Center for Mind and Brain, University of California, Davis, 95618, USA
- Department of Psychology, University of California, Davis, 95616, USA
14
Smith ME, Loschky LC, Bailey HR. Knowledge guides attention to goal-relevant information in older adults. Cognitive Research: Principles and Implications 2021; 6:56. [PMID: 34406505 PMCID: PMC8374018 DOI: 10.1186/s41235-021-00321-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/27/2020] [Accepted: 07/31/2021] [Indexed: 11/18/2022]
Abstract
How does viewers’ knowledge guide their attention while they watch everyday events, how does it affect their memory, and does it change with age? Older adults have diminished episodic memory for everyday events, but intact semantic knowledge. Indeed, research suggests that older adults may rely on their semantic memory to offset impairments in episodic memory, and when relevant knowledge is lacking, older adults’ memory can suffer. Yet, the mechanism by which prior knowledge guides attentional selection when watching dynamic activity is unclear. To address this, we studied the influence of knowledge on attention and memory for everyday events in young and older adults by tracking their eyes while they watched videos. The videos depicted activities that older adults perform more frequently than young adults (balancing a checkbook, planting flowers) or activities that young adults perform more frequently than older adults (installing a printer, setting up a video game). Participants completed free recall, recognition, and order memory tests after each video. We found age-related memory deficits when older adults had little knowledge of the activities, but memory did not differ between age groups when older adults had relevant knowledge and experience with the activities. Critically, results showed that knowledge influenced where viewers fixated when watching the videos. Older adults fixated less goal-relevant information compared to young adults when watching young adult activities, but they fixated goal-relevant information similarly to young adults, when watching more older adult activities. Finally, results showed that fixating goal-relevant information predicted free recall of the everyday activities for both age groups. Thus, older adults may use relevant knowledge to more effectively infer the goals of actors, which guides their attention to goal-relevant actions, thus improving their episodic memory for everyday activities.
Affiliation(s)
- Maverick E Smith
- Department of Psychological Sciences, Kansas State University, 471 Bluemont Hall, 1100 Mid-campus Dr., Manhattan, KS, 66506, USA.
- Lester C Loschky
- Department of Psychological Sciences, Kansas State University, 471 Bluemont Hall, 1100 Mid-campus Dr., Manhattan, KS, 66506, USA
- Heather R Bailey
- Department of Psychological Sciences, Kansas State University, 471 Bluemont Hall, 1100 Mid-campus Dr., Manhattan, KS, 66506, USA
15
Hayes TR, Henderson JM. Looking for Semantic Similarity: What a Vector-Space Model of Semantics Can Tell Us About Attention in Real-World Scenes. Psychol Sci 2021; 32:1262-1270. [PMID: 34252325 PMCID: PMC8726595 DOI: 10.1177/0956797621994768] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/29/2020] [Accepted: 12/23/2020] [Indexed: 11/15/2022] Open
Abstract
The visual world contains more information than we can perceive and understand in any given moment. Therefore, we must prioritize important scene regions for detailed analysis. Semantic knowledge gained through experience is theorized to play a central role in determining attentional priority in real-world scenes but is poorly understood. Here, we examined the relationship between object semantics and attention by combining a vector-space model of semantics with eye movements in scenes. In this approach, the vector-space semantic model served as the basis for a concept map, an index of the spatial distribution of the semantic similarity of objects across a given scene. The results showed a strong positive relationship between the semantic similarity of a scene region and viewers' focus of attention; specifically, greater attention was given to more semantically related scene regions. We conclude that object semantics play a critical role in guiding attention through real-world scenes.
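The idea of scoring scene objects by semantic similarity in a vector space can be sketched as follows. This is only an illustration under stated assumptions: it uses cosine similarity and scores each object by its mean similarity to the other objects in the same scene, whereas the abstract does not specify the similarity measure or how the concept map is built. The function names and the two-dimensional toy vectors are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_similarity(vectors):
    """Map each object label to its mean cosine similarity with all other objects."""
    if len(vectors) < 2:
        raise ValueError("need at least two objects to compare")
    scores = {}
    for label, vec in vectors.items():
        others = [cosine(vec, other)
                  for other_label, other in vectors.items()
                  if other_label != label]
        scores[label] = sum(others) / len(others)
    return scores


# Toy embeddings (hypothetical; real vector-space models use many more dimensions).
scene_objects = {
    "stove": [1.0, 0.0],
    "pot": [1.0, 0.0],
    "surfboard": [0.0, 1.0],
}
object_scores = mean_similarity(scene_objects)
```

In this toy scene the stove and pot receive higher scores than the surfboard. Painting each object's score onto the image region it occupies would give a rough spatial map of semantic relatedness.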
Affiliation(s)
- John M. Henderson
- Center for Mind and Brain, University of California, Davis
- Department of Psychology, University of California, Davis
16
Does task-irrelevant music affect gaze allocation during real-world scene viewing? Psychon Bull Rev 2021; 28:1944-1960. [PMID: 34159530 DOI: 10.3758/s13423-021-01947-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/02/2021] [Indexed: 11/08/2022]
Abstract
Gaze control manifests from a dynamic integration of visual and auditory information, with sound providing important cues for how a viewer should behave. Some past research suggests that music, even if entirely irrelevant to the current task demands, may also sway the timing and frequency of fixations. The current work sought to further assess this idea as well as investigate whether task-irrelevant music could also impact how gaze is spatially allocated. In preparation for a later memory test, participants studied pictures of urban scenes in silence or while simultaneously listening to one of two types of music. Eye tracking was recorded, and nine gaze behaviors were measured to characterize the temporal and spatial aspects of gaze control. Findings showed that while these gaze behaviors changed over the course of viewing, music had no impact. Participants in the music conditions, however, did show better memory performance than those who studied in silence. These findings are discussed within theories of multimodal gaze control.
17
Henderson JM, Goold JE, Choi W, Hayes TR. Neural Correlates of Fixated Low- and High-level Scene Properties during Active Scene Viewing. J Cogn Neurosci 2020; 32:2013-2023. [PMID: 32573384 PMCID: PMC11164273 DOI: 10.1162/jocn_a_01599] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/04/2022]
Abstract
During real-world scene perception, viewers actively direct their attention through a scene in a controlled sequence of eye fixations. During each fixation, local scene properties are attended, analyzed, and interpreted. What is the relationship between fixated scene properties and neural activity in the visual cortex? Participants inspected photographs of real-world scenes in an MRI scanner while their eye movements were recorded. Fixation-related fMRI was used to measure activation as a function of lower- and higher-level scene properties at fixation, operationalized as edge density and meaning maps, respectively. We found that edge density at fixation was most associated with activation in early visual areas, whereas semantic content at fixation was most associated with activation along the ventral visual stream including core object and scene-selective areas (lateral occipital complex, parahippocampal place area, occipital place area, and retrosplenial cortex). The observed activation from semantic content was not accounted for by differences in edge density. The results are consistent with active vision models in which fixation gates detailed visual analysis for fixated scene regions, and this gating influences both lower and higher levels of scene analysis.
Affiliation(s)
- Wonil Choi
- Gwangju Institute of Science and Technology
|
18
|
Krasich K, Huffman G, Faber M, Brockmole JR. Where the eyes wander: The relationship between mind wandering and fixation allocation to visually salient and semantically informative static scene content. J Vis 2020; 20:10. [PMID: 32926071 PMCID: PMC7490225 DOI: 10.1167/jov.20.9.10] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022] Open
Abstract
Vision is crucial for many everyday activities, but the mind is not always focused on what the eyes see. Mind wandering occurs frequently and is associated with attenuated visual and cognitive processing of external information. Corresponding changes in gaze behavior—namely, fewer, longer, and more dispersed fixations—suggest a shift in how the visual system samples external information. Using three computational models of visual salience and two innovative approaches for measuring semantic informativeness, the current work assessed whether these changes reflect how the visual system prioritizes visually salient and semantically informative scene content, two major determinants in most theoretical frameworks and computational models of gaze control. Findings showed that, in a static scene viewing task, fixations were allocated to scene content that was more visually salient 10 seconds prior to probe-caught, self-reported mind wandering compared to self-reported attentive viewing. The relationship between mind wandering and semantic content was more equivocal, with weaker evidence that fixations are more likely to fall on locally informative scene regions. This indicates that the visual system is still able to discriminate visually salient and semantically informative scene content during mind wandering and may fixate on such information more frequently than during attentive viewing. Theoretical implications are discussed in light of these findings.
Affiliation(s)
- Kristina Krasich
- Department of Psychology, University of Notre Dame, Notre Dame, IN, USA
- Greg Huffman
- Department of Psychology, University of Notre Dame, Notre Dame, IN, USA; present address: Leidos, Reston, VA, USA
- Myrthe Faber
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- James R Brockmole
- Department of Psychology, University of Notre Dame, Notre Dame, IN, USA
|
19
|
Rehrig G, Peacock CE, Hayes TR, Henderson JM, Ferreira F. Where the action could be: Speakers look at graspable objects and meaningful scene regions when describing potential actions. J Exp Psychol Learn Mem Cogn 2020; 46:1659-1681. [PMID: 32271065 PMCID: PMC7483632 DOI: 10.1037/xlm0000837] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
Abstract
The world is visually complex, yet we can efficiently describe it by extracting the information that is most relevant to convey. How do the properties of real-world scenes help us decide where to look and what to say? Image salience has been the dominant explanation for what drives visual attention and production as we describe displays, but new evidence shows scene meaning predicts attention better than image salience. Here we investigated the relevance of one aspect of meaning, graspability (the grasping interactions objects in the scene afford), given that affordances have been implicated in both visual and linguistic processing. We quantified image salience, meaning, and graspability for real-world scenes. In 3 eyetracking experiments, native English speakers described possible actions that could be carried out in a scene. We hypothesized that graspability would preferentially guide attention due to its task-relevance. In 2 experiments using stimuli from a previous study, meaning explained visual attention better than graspability or salience did, and graspability explained attention better than salience. In a third experiment we quantified image salience, meaning, graspability, and reach-weighted graspability for scenes that depicted reachable spaces containing graspable objects. Graspability and meaning explained attention equally well in the third experiment, and both explained attention better than salience. We conclude that speakers use object graspability to allocate attention to plan descriptions when scenes depict graspable objects within reach, and otherwise rely more on general meaning. The results shed light on what aspects of meaning guide attention during scene viewing in language production tasks. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
|
20
|
Peacock CE, Hayes TR, Henderson JM. Center Bias Does Not Account for the Advantage of Meaning Over Salience in Attentional Guidance During Scene Viewing. Front Psychol 2020; 11:1877. [PMID: 32849101 PMCID: PMC7399206 DOI: 10.3389/fpsyg.2020.01877] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/30/2020] [Accepted: 07/07/2020] [Indexed: 11/23/2022] Open
Abstract
Studies assessing the relationship between high-level meaning and low-level image salience on real-world attention have shown that meaning better predicts eye movements than image salience. However, it is not yet clear whether the advantage of meaning over salience is a general phenomenon or whether it is related to center bias: the tendency for viewers to fixate scene centers. Previous meaning mapping studies have shown meaning predicts eye movements beyond center bias whereas saliency does not. However, these past findings were correlational or post hoc in nature. Therefore, to causally test whether meaning predicts eye movements beyond center bias, we used an established paradigm to reduce center bias in free viewing: moving the initial fixation position away from the center and delaying the first saccade. We compared the ability of meaning maps and image salience maps to account for the spatial distribution of fixations with reduced center bias. We found that meaning continued to explain both overall and early attention significantly better than image salience even when center bias was reduced by manipulation. In addition, although both meaning and image salience capture scene-specific information, image salience is driven by significantly greater scene-independent center bias in viewing than meaning. In total, the present findings indicate that the strong association of attention with meaning is not due to center bias.
Affiliation(s)
- Candace E. Peacock
- Center for Mind and Brain, University of California, Davis, Davis, CA, United States
- Department of Psychology, University of California, Davis, Davis, CA, United States
- Taylor R. Hayes
- Center for Mind and Brain, University of California, Davis, Davis, CA, United States
- John M. Henderson
- Center for Mind and Brain, University of California, Davis, Davis, CA, United States
- Department of Psychology, University of California, Davis, Davis, CA, United States
|
21
|
Ishrat M, Abrol P. Image complexity analysis with scanpath identification using remote gaze estimation model. Multimedia Tools and Applications 2020; 79:24393-24412. [PMID: 32837248 PMCID: PMC7305931 DOI: 10.1007/s11042-020-09117-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2018] [Revised: 05/20/2020] [Accepted: 05/27/2020] [Indexed: 06/11/2023]
Abstract
Analysis of gaze points has been a vital tool for understanding varied human behavioral patterns and the underlying psychological processing. Gaze points are generally analyzed in terms of two events, fixations and saccades, that are collectively termed a scanpath. A scanpath could potentially establish a correlation between visual scenery and human cognitive tendencies. Scanpaths have been analyzed in different domains, including visual perception, usability, memory, visual search, and low-level attributes like color, illumination, and edges in an image. Visual search is one prominent area that examines the scanpaths of subjects while a target object is searched for in a given set of images. Visual search explores behavioral tendencies of subjects with respect to image complexity. The complexity of an image is governed by the spatial, frequency, and color information present in the image. Scanpath-based image complexity analysis determines human visual behavior that could lead to the development of interactive and intelligent systems. There are several sophisticated eye tracking devices and associated algorithms for recording and classifying scanpaths. However, in the present scenario, when the chances of viral infection (COVID-19) from known and unknown sources are high, it is very important that contactless methods and models be designed. In addition, even though these devices acquire and process eye movement data with fair accuracy, they are intrusive and costly. The objective of the current research is to establish the complexity of a given set of images while target objects are searched for and to present an analysis of gaze search patterns. To achieve these objectives, a remote gaze estimation and analysis model has been proposed for scanpath identification and analysis. The model is an alternative option for gaze point tracking and scanpath analysis that is non-intrusive and low cost.
The gaze points are tracked remotely, in contrast to the sophisticated wearable eye tracking devices available on the market. The model employs easily available software and hardware devices. In the current work, complexity is derived from analysis of fixation and saccade gaze points. Based on the results generated by the proposed model, the influence of external stimuli on subjects is studied. The set of images chosen acts as external stimuli for the subjects during visual search. In order to statistically analyze scanpaths for different subjects, certain scanpath parameters have been identified. The model maps and classifies eye movement gaze points into fixations and saccades and generates data for the identified parameters. For eye detection and subsequent iris detection, the Viola-Jones and circular Hough transform (CHT) algorithms have been used. Identification by dispersion threshold (I-DT) is implemented for scanpath identification. The algorithms are customized for better iris and scanpath detection. Algorithms are developed for gaze screen mapping and classification of fixations and saccades. The experimentation has been carried out on different subjects. Variations during visual search have been observed and analyzed. The present model requires no contact of the human subject with any equipment, including eye tracking devices, screens, or computing devices.
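The abstract above names dispersion-threshold identification (I-DT) as the step that separates fixations from saccades. The following is a minimal sketch of the textbook form of that algorithm, not the authors' customized version, which the abstract does not describe. The function names, the pixel units, and the parameters `max_dispersion` and `min_samples` are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) gaze sample in screen pixels


def dispersion(points: List[Point]) -> float:
    """Dispersion of a window: (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(
    samples: List[Point], max_dispersion: float, min_samples: int
) -> List[Tuple[int, int, float, float]]:
    """Group gaze samples into fixations with a dispersion threshold.

    Returns (start_index, end_index, centroid_x, centroid_y) per fixation,
    with end_index inclusive. Samples that fall in no fixation are treated
    as saccade samples.
    """
    fixations = []
    start = 0
    n = len(samples)
    while start + min_samples <= n:
        end = start + min_samples  # exclusive end of the candidate window
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the points stay tightly clustered.
            while end < n and dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            window = samples[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((start, end - 1, cx, cy))
            start = end  # continue after the fixation
        else:
            start += 1  # drop the first sample and try again
    return fixations
```

With evenly sampled data, `min_samples` stands in for a minimum fixation duration (duration divided by the sampling interval); a real recording would need that conversion and a threshold chosen for the tracker's noise level.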
Affiliation(s)
- Mohsina Ishrat
- Department of Computer Science & IT, University of Jammu (J&K), Jammu, India
- Pawanesh Abrol
- Department of Computer Science & IT, University of Jammu (J&K), Jammu, India
|
22
|
When scenes speak louder than words: Verbal encoding does not mediate the relationship between scene meaning and visual attention. Mem Cognit 2020; 48:1181-1195. [PMID: 32430889 DOI: 10.3758/s13421-020-01050-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
The complexity of the visual world requires that we constrain visual attention and prioritize some regions of the scene for attention over others. The current study investigated whether verbal encoding processes influence how attention is allocated in scenes. Specifically, we asked whether the advantage of scene meaning over image salience in attentional guidance is modulated by verbal encoding, given that we often use language to process information. In two experiments, 60 subjects studied scenes (N1 = 30 and N2 = 60) for 12 s each in preparation for a scene-recognition task. Half of the time, subjects engaged in a secondary articulatory suppression task concurrent with scene viewing. Meaning and saliency maps were quantified for each of the experimental scenes. In both experiments, we found that meaning explained more of the variance in visual attention than image salience did, particularly when we controlled for the overlap between meaning and salience, with and without the suppression task. Based on these results, verbal encoding processes do not appear to modulate the relationship between scene meaning and visual attention. Our findings suggest that semantic information in the scene steers the attentional ship, consistent with cognitive guidance theory.
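The abstract above compares how much variance in attention meaning and salience each explain, including after controlling for their overlap. One common way to express that comparison is a squared linear correlation alongside a squared semipartial correlation. The sketch below illustrates those two quantities only; it is an assumption about the general technique, not the authors' analysis pipeline, and the function names and flattened-map inputs are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    cov = sum(x * y for x, y in zip(da, db))
    return cov / sqrt(sum(x * x for x in da) * sum(y * y for y in db))


def squared_linear(attention: Sequence[float], predictor: Sequence[float]) -> float:
    """Total share of variance in attention explained by one predictor map."""
    return pearson_r(attention, predictor) ** 2


def squared_semipartial(
    attention: Sequence[float], predictor: Sequence[float], control: Sequence[float]
) -> float:
    """Unique share of variance explained by `predictor` once its linear
    overlap with `control` has been removed from the predictor."""
    r_ap = pearson_r(attention, predictor)
    r_ac = pearson_r(attention, control)
    r_pc = pearson_r(predictor, control)
    sr = (r_ap - r_ac * r_pc) / sqrt(1.0 - r_pc ** 2)
    return sr ** 2
```

Each argument here is a map flattened to a list of per-pixel (or per-region) values. Calling `squared_semipartial(attention, meaning, salience)` and `squared_semipartial(attention, salience, meaning)` gives the unique contribution of each predictor; the semipartial value is undefined when the two predictor maps are perfectly correlated.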
|
23
|
Henderson JM. Meaning and attention in scenes. Psychology of Learning and Motivation 2020. [DOI: 10.1016/bs.plm.2020.08.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|