1
Jeong J, Cho YS. Object-based suppression in target search but not in distractor inhibition. Atten Percept Psychophys 2024. PMID: 38839715. DOI: 10.3758/s13414-024-02905-7.
Abstract
The present study investigated how object representations affect attentional priority in distractor inhibition and target search when the statistical regularities of the singleton-distractor location are biased. A color singleton distractor appeared more frequently at one of six stimulus locations, the 'high-probability location,' to induce location-based suppression. Critically, three objects were presented, each pairing two adjacent stimuli in the target display, either through added background contours (Experiment 1) or through perceptual grouping (Experiments 2 and 3). The results revealed that attentional capture by singleton distractors was hardly modulated by the objects. In contrast, target selection was impeded at the location within the object containing the high-probability location, compared with an equidistant location in a different object. This object-based suppression of target selection was evident when object-related features were part of the task-relevant feature set. These findings suggest that task-irrelevant objects modulate attentional suppression and that different features determine attentional priority in distractor inhibition and in target search.
Affiliation(s)
- Jiyoon Jeong
- School of Psychology, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Korea
- Yang Seok Cho
- School of Psychology, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Korea
2
Walper D, Bendixen A, Grimm S, Schubö A, Einhäuser W. Attention deployment in natural scenes: Higher-order scene statistics rather than semantics modulate the N2pc component. J Vis 2024;24:7. PMID: 38848099. PMCID: PMC11166226. DOI: 10.1167/jov.24.6.7.
Abstract
Which properties of a natural scene affect visual search? We consider the alternative hypotheses that low-level statistics, higher-level statistics, semantics, or layout affect search difficulty in natural scenes. Across three experiments (n = 20 each), we used four different backgrounds that preserve distinct scene properties: (a) natural scenes (all experiments); (b) 1/f noise (pink noise, which preserves only low-level statistics and was used in Experiments 1 and 2); (c) textures that preserve low-level and higher-level statistics but not semantics or layout (Experiments 2 and 3); and (d) inverted (upside-down) scenes that preserve statistics and semantics but not layout (Experiment 2). We included "split scenes" that contained different backgrounds left and right of the midline (Experiment 1, natural/noise; Experiment 3, natural/texture). Participants searched for a Gabor patch that occurred at one of six locations (all experiments). Reaction times were faster for targets on noise and slower on inverted images, compared to natural scenes and textures. The N2pc component of the event-related potential, a marker of attentional selection, had a shorter latency and a higher amplitude for targets in noise than for all other backgrounds. The background contralateral to the target had an effect similar to that on the target side: noise led to faster reactions and shorter N2pc latencies than natural scenes, although we observed no difference in N2pc amplitude. There were no interactions between the target side and the non-target side. Together, this shows that, at least when searching for simple targets without semantic content of their own, natural scenes are more effective distractors than noise, and that this effect results from higher-order statistics rather than from semantics or layout.
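The pink-noise backgrounds used here have an amplitude spectrum falling off as 1/f. A minimal sketch (not the authors' stimulus code) of how such an image can be synthesized:

```python
import numpy as np

def pink_noise_image(size=512, seed=0):
    """Synthesize a grayscale image whose amplitude spectrum falls off as 1/f."""
    rng = np.random.default_rng(seed)
    fx = np.fft.fftfreq(size)[:, None]
    fy = np.fft.fftfreq(size)[None, :]
    f = np.hypot(fx, fy)
    f[0, 0] = 1.0  # avoid division by zero at the DC component
    # 1/f amplitudes with random phases, back-transformed to the image domain.
    spectrum = (1.0 / f) * np.exp(1j * rng.uniform(0, 2 * np.pi, (size, size)))
    img = np.real(np.fft.ifft2(spectrum))
    return (img - img.min()) / (img.max() - img.min())  # scale to [0, 1]
```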
Affiliation(s)
- Daniel Walper
- Physics of Cognition Group, Chemnitz University of Technology, Chemnitz, Germany
- Alexandra Bendixen
- Cognitive Systems Lab, Chemnitz University of Technology, Chemnitz, Germany
- https://www.tu-chemnitz.de/physik/SFKS/index.html.en
- Sabine Grimm
- Physics of Cognition Group, Chemnitz University of Technology, Chemnitz, Germany
- Cognitive Systems Lab, Chemnitz University of Technology, Chemnitz, Germany
- Anna Schubö
- Cognitive Neuroscience of Perception & Action, Philipps University Marburg, Marburg, Germany
- https://www.uni-marburg.de/en/fb04/team-schuboe
- Wolfgang Einhäuser
- Physics of Cognition Group, Chemnitz University of Technology, Chemnitz, Germany
- https://www.tu-chemnitz.de/physik/PHKP/index.html.en
3
Del Campo VL, Morán JFO, Cagigal VM, Martín JM, Pagador JB, Hornero R. The use of the eye-fixation-related potential to investigate visual perception in professional domains with high attentional demand: a literature review. Neurol Sci 2024;45:1849-1860. PMID: 38157102. DOI: 10.1007/s10072-023-07275-w.
Abstract
INTRODUCTION: Visual attention is a cognitive skill related to visual perception and neural activity and is moderated by expertise in time-constrained professional domains (e.g., aviation, driving, sport, surgery). However, the contributions of perceptual and neural processes to performance have been studied separately in the literature. DEVELOPMENT: We argue for integrating visual and neural signals to offer a more complete picture of the visual attention deployed by professionals of different skill levels when performing free-viewing tasks. Specifically, we propose jointly analyzing data related to the quiet eye and the P300 component as a novel signal-processing approach for evaluating professionals' visual attention. CONCLUSION: This review highlights the advantages of combining portable eye trackers with electroencephalogram systems as a promising technique for better understanding early cognitive components related to attentional processes. The eye-fixation-related potentials method may clarify the cognitive mechanisms participants employ in natural settings, revealing which visual information is of interest to participants and distinguishing the neural bases of visual attention to targets versus non-targets during free-viewing experiments.
Affiliation(s)
- Vicente Luis Del Campo
- Laboratorio de Aprendizaje y Control Motor, Facultad de Ciencias del Deporte, Universidad de Extremadura, Avda. de La Universidad, S/N, 10003, Cáceres, Spain
- Víctor Martínez Cagigal
- Grupo de Ingeniería Biomédica, Universidad de Valladolid, E.T.S.I. Telecomunicación, Paseo Belén 15, 47011, Valladolid, Spain
- Centro de Investigación Biomédica en Red - Bioingeniería, Biomateriales y Biomedicina (CIBER-BBN), E.T.S.I. Telecomunicación, Paseo Belén 15, 47011, Valladolid, Spain
- Jesús Morenas Martín
- Laboratorio de Aprendizaje y Control Motor, Facultad de Ciencias del Deporte, Universidad de Extremadura, Avda. de La Universidad, S/N, 10003, Cáceres, Spain
- J Blas Pagador
- Centro de Cirugía de Mínima Invasión Jesús Usón, Ctra. N-521, Km. 41,8, 10071, Cáceres, Spain
- Roberto Hornero
- Grupo de Ingeniería Biomédica, Universidad de Valladolid, E.T.S.I. Telecomunicación, Paseo Belén 15, 47011, Valladolid, Spain
- Centro de Investigación Biomédica en Red - Bioingeniería, Biomateriales y Biomedicina (CIBER-BBN), E.T.S.I. Telecomunicación, Paseo Belén 15, 47011, Valladolid, Spain
4
Walter K, Freeman M, Bex P. Quantifying task-related gaze. Atten Percept Psychophys 2024;86:1318-1329. PMID: 38594445. PMCID: PMC11093728. DOI: 10.3758/s13414-024-02883-w.
Abstract
Competing theories attempt to explain what guides eye movements when exploring natural scenes: bottom-up image salience and top-down semantic salience. In one study, we applied language-based analyses to quantify the well-known observation that task influences gaze in natural scenes. Subjects viewed ten scenes as if they were performing one of two tasks. We found that the semantic similarity between the task and the labels of objects in the scenes captured the task-dependence of gaze (t(39) = 13.083; p < .001). In another study, we examined whether image salience or semantic salience better predicts gaze during a search task, and whether viewing strategies are affected by searching for targets of high or low semantic relevance to the scene. Subjects searched 100 scenes for a high- or low-relevance object. We found that image salience became a worse predictor of gaze across successive fixations, while semantic salience remained a consistent predictor (χ²(1, N = 40) = 75.148, p < .001). Furthermore, we found that semantic salience decreased as object relevance decreased (t(39) = 2.304; p = .027). These results suggest that semantic salience is a useful predictor of gaze during task-related scene viewing and that, even in target-absent trials, gaze is modulated by the relevance of a search target to the scene in which it might be located.
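The semantic-similarity measure at the heart of this analysis can be sketched as a cosine similarity between word vectors; the three-dimensional vectors below are made-up stand-ins for real GloVe embeddings, and the paper's actual pipeline differs in scale and detail:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings; in practice these come from pretrained GloVe.
vectors = {
    "cooking": np.array([0.9, 0.1, 0.3]),
    "pan":     np.array([0.8, 0.2, 0.4]),
    "sofa":    np.array([0.1, 0.9, 0.2]),
}

task = vectors["cooking"]
for label in ("pan", "sofa"):
    print(label, round(cosine(task, vectors[label]), 3))  # "pan" scores higher
```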
Affiliation(s)
- Kerri Walter
- Department of Psychology, Northeastern University, Boston, MA, USA
- Michelle Freeman
- Department of Psychology, Northeastern University, Boston, MA, USA
- Peter Bex
- Department of Psychology, Northeastern University, Boston, MA, USA
5
Walter K, Manley CE, Bex PJ, Merabet LB. Visual search patterns during exploration of naturalistic scenes are driven by saliency cues in individuals with cerebral visual impairment. Sci Rep 2024;14:3074. PMID: 38321069. PMCID: PMC10847433. DOI: 10.1038/s41598-024-53642-8.
Abstract
We investigated the relative influence of image salience and image semantics during visual search of naturalistic scenes, comparing performance of individuals with cerebral visual impairment (CVI) and controls with neurotypical development. Participants searched for a prompted target presented as either an image or a text cue. Success rate and reaction time were collected, and gaze behavior was recorded with an eye tracker. A receiver operating characteristic (ROC) analysis compared the distribution of individual gaze landings with predictions of image salience (using Graph-Based Visual Saliency) and image semantics (using Global Vectors for Word Representation combined with Linguistic Analysis of Semantic Salience) models. CVI participants were less likely to find the target and slower when they did. Their visual search behavior was also associated with a larger search area and a greater number of fixations. ROC scores were lower in the CVI group than in controls for both model predictions. Furthermore, search strategies in the CVI group were not affected by cue type, although search times and accuracy correlated significantly with verbal IQ scores for text-cued searches. These results suggest that visual search patterns in CVI are driven mainly by image salience and further characterize the higher-order processing deficits observed in this population.
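The ROC analysis can be sketched as follows: prediction-map values at fixated pixels are treated as positives and values at randomly sampled control pixels as negatives. This is a simplified stand-in for the paper's procedure; the GBVS and GloVe/LASS maps themselves are assumed given:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fixation_auc(pred_map, fix_xy, n_neg=1000, seed=0):
    """AUC for discriminating fixated from random locations via a prediction map."""
    rng = np.random.default_rng(seed)
    pos = pred_map[fix_xy[:, 1], fix_xy[:, 0]]  # map values at (x, y) fixations
    neg = pred_map[rng.integers(0, pred_map.shape[0], n_neg),
                   rng.integers(0, pred_map.shape[1], n_neg)]
    labels = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    return roc_auc_score(labels, np.r_[pos, neg])
```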
Affiliation(s)
- Kerri Walter
- Translational Vision Lab, Department of Psychology, Northeastern University, Boston, MA, USA
- Claire E Manley
- The Laboratory for Visual Neuroplasticity, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, 20 Staniford Street, Boston, MA, 02114, USA
- Peter J Bex
- Translational Vision Lab, Department of Psychology, Northeastern University, Boston, MA, USA
- Lotfi B Merabet
- The Laboratory for Visual Neuroplasticity, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, 20 Staniford Street, Boston, MA, 02114, USA
6
Uejima T, Mancinelli E, Niebur E, Etienne-Cummings R. The influence of stereopsis on visual saliency in a proto-object based model of selective attention. Vision Res 2023;212:108304. PMID: 37542763. PMCID: PMC10592191. DOI: 10.1016/j.visres.2023.108304.
Abstract
Some animals, including humans, use stereoscopic vision, which reconstructs spatial information about the environment from the disparity between images captured by two eyes at separate, adjacent locations. Like other sensory information, this stereoscopic information is expected to influence attentional selection. We developed a biologically plausible model of binocular vision to study its effect on bottom-up visual attention, i.e., visual saliency. In our model, the scene is organized in terms of proto-objects on which attention acts, rather than unbound sets of elementary features. We show that taking stereoscopic information into account yields a statistically significant improvement in the model's prediction of human eye movements.
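As a rough sketch of the disparity signal such a binocular model builds on (not the authors' implementation; file names are hypothetical), a dense disparity map can be estimated from a rectified stereo pair with OpenCV's block matcher:

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical stereo pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching over a rectified pair; larger disparities mean nearer surfaces.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)
```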
Affiliation(s)
- Takeshi Uejima
- The Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Elena Mancinelli
- The Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Ernst Niebur
- The Solomon Snyder Department of Neuroscience and the Zanvyl Krieger Mind/Brain Institute, The Johns Hopkins University, Baltimore, MD, USA
- Ralph Etienne-Cummings
- The Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
7
Roth N, Rolfs M, Hellwich O, Obermayer K. Objects guide human gaze behavior in dynamic real-world scenes. PLoS Comput Biol 2023;19:e1011512. PMID: 37883331. PMCID: PMC10602265. DOI: 10.1371/journal.pcbi.1011512.
Abstract
The complexity of natural scenes makes it challenging to experimentally study the mechanisms behind human gaze behavior when viewing dynamic environments. Historically, eye movements were believed to be driven primarily by space-based attention towards locations with salient features. Increasing evidence suggests, however, that visual attention does not select locations with high saliency but operates on attentional units given by the objects in the scene. We present a new computational framework to investigate the importance of objects for attentional guidance. This framework is designed to simulate realistic scanpaths for dynamic real-world scenes, including saccade timing and smooth pursuit behavior. Individual model components are based on psychophysically uncovered mechanisms of visual attention and saccadic decision-making. All mechanisms are implemented in a modular fashion with a small number of well-interpretable parameters. To systematically analyze the importance of objects in guiding gaze behavior, we implemented five different models within this framework: two purely spatial models, one based on low-level and one on high-level saliency; two object-based models, one incorporating low-level saliency for each object and one using no saliency information; and a mixed model with object-based attention and selection but space-based inhibition of return. We optimized each model's parameters to reproduce the saccade amplitude and fixation duration distributions of human scanpaths using evolutionary algorithms. We compared model performance with respect to spatial and temporal fixation behavior, including the proportion of fixations exploring the background, as well as detecting, inspecting, and returning to objects. A model with object-based attention and inhibition, which uses saliency information to prioritize between objects for saccadic selection, leads to scanpath statistics with the highest similarity to the human data. This demonstrates that scanpath models benefit from object-based attention and selection, suggesting that object-level attentional units play an important role in guiding attentional processing.
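The evolutionary fitting can be sketched as minimizing a fitness score that compares simulated and human scanpath statistics, here via two-sample Kolmogorov-Smirnov distances; `simulate_scanpaths` is a hypothetical stand-in for the authors' full model:

```python
from scipy.stats import ks_2samp

def fitness(params, human_amps, human_durs, simulate_scanpaths):
    """Lower is better: distance between simulated and human scanpath statistics."""
    sim_amps, sim_durs = simulate_scanpaths(params)  # hypothetical model call
    return (ks_2samp(sim_amps, human_amps).statistic
            + ks_2samp(sim_durs, human_durs).statistic)
```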
Affiliation(s)
- Nicolas Roth
- Cluster of Excellence Science of Intelligence, Technische Universität Berlin, Germany
- Institute of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Germany
- Martin Rolfs
- Cluster of Excellence Science of Intelligence, Technische Universität Berlin, Germany
- Department of Psychology, Humboldt-Universität zu Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
- Olaf Hellwich
- Cluster of Excellence Science of Intelligence, Technische Universität Berlin, Germany
- Institute of Computer Engineering and Microelectronics, Technische Universität Berlin, Germany
- Klaus Obermayer
- Cluster of Excellence Science of Intelligence, Technische Universität Berlin, Germany
- Institute of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
8
Peacock CE, Hall EH, Henderson JM. Objects are selected for attention based upon meaning during passive scene viewing. Psychon Bull Rev 2023;30:1874-1886. PMID: 37095319. PMCID: PMC11164276. DOI: 10.3758/s13423-023-02286-2.
Abstract
While object meaning has been demonstrated to guide attention during active scene viewing, and object salience guides attention during passive viewing, it is unknown whether object meaning predicts attention in passive viewing tasks and whether attention during passive viewing relates more strongly to meaning or to salience. To answer these questions, we used a mixed-modeling approach in which we computed the average meaning and physical salience of objects in scenes while statistically controlling for object size and eccentricity. Using eye-movement data from aesthetic-judgment and memorization tasks, we then tested whether fixations are more likely to land on high-meaning than on low-meaning objects while controlling for object salience, size, and eccentricity. The results demonstrated that fixations are more likely to be directed to high-meaning objects regardless of these other factors. Further analyses revealed that fixation durations were positively associated with object meaning irrespective of the other object properties. Overall, these findings provide the first evidence that, during passive scene viewing, objects are selected for attention based in part on their meaning.
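The mixed-modeling logic can be sketched as below. Note that statsmodels' MixedLM is a linear (not logistic) mixed model, so this is a linear-probability approximation on synthetic data, not the authors' analysis:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400  # synthetic objects nested in 40 scenes
df = pd.DataFrame({
    "scene": rng.integers(0, 40, n),
    "meaning": rng.random(n),
    "salience": rng.random(n),
    "size": rng.random(n),
    "eccentricity": rng.random(n),
})
df["fixated"] = (df["meaning"] + 0.3 * rng.standard_normal(n) > 0.5).astype(float)

# Fixation ~ meaning, controlling for salience, size, and eccentricity,
# with random intercepts per scene.
model = smf.mixedlm("fixated ~ meaning + salience + size + eccentricity",
                    data=df, groups=df["scene"])
print(model.fit().summary())
```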
Affiliation(s)
- Candace E Peacock
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
- Department of Psychology, University of California, Davis, CA, USA
- Elizabeth H Hall
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
- Department of Psychology, University of California, Davis, CA, USA
- John M Henderson
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
- Department of Psychology, University of California, Davis, CA, USA
9
Linka M, Sensoy Ö, Karimpur H, Schwarzer G, de Haas B. Free viewing biases for complex scenes in preschoolers and adults. Sci Rep 2023;13:11803. PMID: 37479760. PMCID: PMC10362043. DOI: 10.1038/s41598-023-38854-8.
Abstract
Adult gaze behaviour towards naturalistic scenes is highly biased towards semantic object classes. Little is known about the ontogenetic development of these biases or about group-level differences in gaze behaviour between adults and preschoolers. Here, we let preschoolers (n = 34, age 5 years) and adults (n = 42, age 18-59 years) freely view 40 complex scenes containing objects with different semantic attributes and compared their fixation behaviour. Results show that preschool children allocate a significantly smaller proportion of dwell time and first fixations to Text and instead fixate Faces, Touched objects, Hands, and Bodies more. A predictive model of object fixations controlling for a range of potential confounds suggests that most of these differences can be explained by drastically reduced text salience in preschoolers and that this effect is independent of low-level salience. These findings are in line with a developmental attentional antagonism between text and body parts (touched objects and hands in particular), which resonates with recent findings regarding 'cortical recycling'. We discuss this and other potential mechanisms driving salience differences between children and adults.
Affiliation(s)
- Marcel Linka
- Department of Experimental Psychology, Justus Liebig University Giessen, 35394, Giessen, Germany
- Özlem Sensoy
- Department of Developmental Psychology, Justus Liebig University Giessen, 35394, Giessen, Germany
- Harun Karimpur
- Department of Experimental Psychology, Justus Liebig University Giessen, 35394, Giessen, Germany
- Gudrun Schwarzer
- Department of Developmental Psychology, Justus Liebig University Giessen, 35394, Giessen, Germany
- Benjamin de Haas
- Department of Experimental Psychology, Justus Liebig University Giessen, 35394, Giessen, Germany
10
Peacock CE, Singh P, Hayes TR, Rehrig G, Henderson JM. Searching for meaning: Local scene semantics guide attention during natural visual search in scenes. Q J Exp Psychol (Hove) 2023;76:632-648. PMID: 35510885. PMCID: PMC11132926. DOI: 10.1177/17470218221101334.
Abstract
Models of visual search in scenes include image salience as a source of attentional guidance. However, because scene meaning is correlated with image salience, the salience predictor in these models could in fact be driven by meaning. To test this proposal, we generated meaning maps that represented the spatial distribution of semantic informativeness in scenes and salience maps that represented the spatial distribution of conspicuous image features, and we tested their influence on fixation densities from two object-search tasks in real-world scenes. The results showed that meaning accounted for significantly greater variance in fixation densities than image salience, both overall and in early attention, across both studies. Meaning explained 58% and 63% of the theoretical ceiling of variance in attention in the two studies, respectively. Furthermore, both studies demonstrated that fast initial saccades were not more likely to be directed to higher-salience regions than slower initial saccades, and that initial saccades of all latencies were directed to regions containing higher meaning than salience. Together, these results demonstrate that even though meaning was task-neutral, the visual system still selected meaningful over salient scene regions for attention during search.
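The variance-partitioning logic can be sketched as a squared semipartial correlation: regress salience out of meaning, then correlate the residual with the fixation-density map. Toy flattened maps stand in for the real ones; this is not the authors' analysis code:

```python
import numpy as np

def unique_r2(fixations, meaning, salience):
    """Variance in fixation density uniquely explained by meaning beyond salience."""
    slope, intercept = np.polyfit(salience, meaning, 1)
    residual = meaning - (slope * salience + intercept)  # meaning, salience removed
    r = np.corrcoef(residual, fixations)[0, 1]
    return r ** 2
```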
Affiliation(s)
- Candace E Peacock
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Department of Psychology, University of California, Davis, Davis, CA, USA
- Praveena Singh
- Center for Neuroscience, University of California, Davis, Davis, CA, USA
- Taylor R Hayes
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Gwendolyn Rehrig
- Department of Psychology, University of California, Davis, Davis, CA, USA
- John M Henderson
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Department of Psychology, University of California, Davis, Davis, CA, USA
11
Gaze estimation in videoconferencing settings. Computers in Human Behavior 2023. DOI: 10.1016/j.chb.2022.107517.
12
Walter K, Bex P. Low-level factors increase gaze-guidance under cognitive load: A comparison of image-salience and semantic-salience models. PLoS One 2022;17:e0277691. PMID: 36441789. PMCID: PMC9704686. DOI: 10.1371/journal.pone.0277691.
Abstract
Growing evidence links eye movements and cognitive functioning; however, there is debate concerning what image content is fixated in natural scenes. Competing approaches have argued that low-level/feedforward and high-level/feedback factors contribute to gaze guidance. We used one low-level model (Graph-Based Visual Salience, GBVS) and a novel language-based high-level model (Global Vectors for Word Representation, GloVe) to predict gaze locations in a natural-image search task, and we examined how fixated locations during this task vary under increasing levels of cognitive load. Participants (N = 30) freely viewed a series of 100 natural scenes for 10 seconds each. Between scenes, subjects identified a target object from the scene a specified number of trials (N) back, among three distracter objects of the same type but from alternate scenes. The N-back was adaptive: N increased following two correct trials and decreased following one incorrect trial. Receiver operating characteristic (ROC) analysis of gaze locations showed that as cognitive load increased, there was a significant increase in prediction power for GBVS, but not for GloVe. Similarly, there was no significant difference in the area under the ROC between the minimum and maximum N-back achieved across subjects for GloVe (t(29) = -1.062, p = .297), while GBVS showed a consistent upward trend that did not reach significance (t(29) = -1.975, p = .058). A permutation analysis showed that gaze locations were correlated with GBVS, indicating that salient features were more likely to be fixated. However, gaze locations were anti-correlated with GloVe, indicating that objects with low semantic consistency with the scene were more likely to be fixated. These results suggest that fixations are drawn towards salient low-level image features and that this bias increases with cognitive load. Additionally, there is a bias towards fixating improbable objects that does not vary with cognitive load.
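The adaptive N-back rule described above amounts to a two-up/one-down staircase; a minimal sketch, with state carried as (n, streak) between trials:

```python
def update_nback(n, correct, streak):
    """Raise N after two consecutive correct responses; lower it after an error."""
    if correct:
        streak += 1
        if streak == 2:
            return n + 1, 0     # harder, reset streak
        return n, streak
    return max(1, n - 1), 0     # easier, reset streak
```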
Affiliation(s)
- Kerri Walter
- Psychology Department, Northeastern University, Boston, MA, United States of America
- Peter Bex
- Psychology Department, Northeastern University, Boston, MA, United States of America
13
Thrun MC. Identification of Explainable Structures in Data with a Human-in-the-Loop. Künstliche Intelligenz 2022. DOI: 10.1007/s13218-022-00782-6.
Abstract
Explainable AIs (XAIs) often do not provide relevant or understandable explanations for a domain-specific human-in-the-loop (HIL). In addition, internally used metrics have biases that might not match existing structures in the data. The habilitation thesis presents an alternative solution approach by deriving explanations from high-dimensional structures in the data rather than from predetermined classifications. Typically, the detection of such density- or distance-based structures in data has so far entailed the challenges of choosing appropriate algorithms and their parameters, which confronts the HIL with a considerable number of complex decision-making options. Central steps of the solution approach are a parameter-free methodology for the estimation and visualization of probability density functions (PDFs), followed by a hypothesis for selecting an appropriate distance metric independent of the data context, in combination with projection-based clustering (PBC). PBC allows for subsequent interactive identification of separable structures in the data. Hence, the HIL does not need deep knowledge of the underlying algorithms to identify structures in data. The complete data-driven XAI approach involving the HIL is based on a decision tree guided by distance-based structures in data (DSD). This data-driven XAI shows initial success in its application to multivariate time series and non-sequential high-dimensional data. It generates meaningful and relevant explanations that are evaluated by Grice's maxims.
14
15
Chakraborty S, Samaras D, Zelinsky GJ. Weighting the factors affecting attention guidance during free viewing and visual search: The unexpected role of object recognition uncertainty. J Vis 2022;22:13. PMID: 35323870. PMCID: PMC8963662. DOI: 10.1167/jov.22.4.13.
Abstract
The factors determining how attention is allocated during visual tasks have been studied for decades, but few studies have attempted to model the weighting of several of these factors within and across tasks to better understand their relative contributions. Here we consider the roles of saliency, center bias, target features, and object-recognition uncertainty in predicting the first nine changes in fixation made during free-viewing and visual-search tasks in the OSIE and COCO-Search18 datasets, respectively. We focus on the last and least familiar of these factors by proposing a new method of quantifying uncertainty in an image, one based on object recognition. We hypothesize that the greater the number of object categories competing for an object proposal, the greater the uncertainty of how that object should be recognized and, hence, the greater the need for attention to resolve this uncertainty. As expected, we found that target features best predicted target-present search, with their dominance obscuring the use of other features. Unexpectedly, we found that target features were only weakly used during target-absent search. We also found that object-recognition uncertainty outperformed an unsupervised saliency model in predicting free-viewing fixations, although saliency was slightly more predictive of search. We conclude that uncertainty in object recognition, a measure that is image-computable and highly interpretable, is better than bottom-up saliency in predicting attention during free viewing.
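One concrete way to operationalize such recognition uncertainty (a sketch consistent with the description above, not necessarily the paper's exact formulation) is the entropy of the class probabilities competing for an object proposal:

```python
import numpy as np

def recognition_uncertainty(class_scores):
    """Shannon entropy of softmaxed detector scores for one object proposal."""
    p = np.exp(class_scores - np.max(class_scores))
    p /= p.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))

print(recognition_uncertainty(np.array([2.0, 1.9, 1.8])))  # ambiguous: high entropy
print(recognition_uncertainty(np.array([5.0, 0.1, 0.0])))  # confident: low entropy
```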
Affiliation(s)
- Dimitris Samaras
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Gregory J Zelinsky
- Department of Psychology, Stony Brook University, Stony Brook, NY, USA
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
16
Review of Visual Saliency Prediction: Development Process from Neurobiological Basis to Deep Models. Applied Sciences (Basel) 2021. DOI: 10.3390/app12010309.
Abstract
The human attention mechanism can be understood and simulated by closely relating the saliency-prediction task to neuroscience and psychology. Furthermore, saliency prediction is widely used in computer vision and interdisciplinary subjects. In recent years, with the rapid development of deep learning, deep models have achieved impressive results in saliency prediction. Deep learning models can automatically learn features, thus overcoming many drawbacks of the classic models, such as handcrafted features and restrictive task settings. Nevertheless, deep models still have limitations, for example in tasks involving multi-modality and semantic understanding. This study summarizes the relevant achievements in the field of saliency prediction, including the early neurological and psychological mechanisms and the guiding role of classic models, followed by the development process and data comparison of classic and deep saliency-prediction models. It also discusses the relationship between models and human vision, the factors that cause semantic gaps, the influence of attention in cognitive research, the limitations of saliency models, and emerging applications, in order to guide follow-up work.
17
Thrun MC, Pape F, Ultsch A. Conventional displays of structures in data compared with interactive projection-based clustering (IPBC). International Journal of Data Science and Analytics 2021. DOI: 10.1007/s41060-021-00264-2.
Abstract
Clustering is an important task in knowledge discovery, with the goal of identifying structures of similar data points in a dataset. Here, the focus lies on methods that use a human-in-the-loop, i.e., that incorporate user decisions into the clustering process through 2D and 3D displays of the structures in the data. Some of these interactive approaches fall into the category of visual analytics and emphasize the power of such displays to identify structures interactively in various types of datasets or to verify the results of clustering algorithms. This work presents a new method called interactive projection-based clustering (IPBC). IPBC is an open-source and parameter-free method using a human-in-the-loop for an interactive 2.5D display and identification of structures in data, based on the user's choice of a dimensionality-reduction method. The IPBC approach is systematically compared with accessible visual analytics methods for the display and identification of cluster structures using twelve clustering benchmark datasets and one additional natural dataset. Qualitative comparison of 2D, 2.5D and 3D displays of structures and empirical evaluation of the identified cluster structures show that IPBC outperforms comparable methods. Additionally, IPBC assists in identifying structures previously unknown to domain experts in an application.
18
Pomaranski KI, Hayes TR, Kwon MK, Henderson JM, Oakes LM. Developmental changes in natural scene viewing in infancy. Dev Psychol 2021;57:1025-1041. PMID: 34435820. PMCID: PMC8406411. DOI: 10.1037/dev0001020.
Abstract
We extend decades of research on infants' visual processing by examining their eye gaze during viewing of natural scenes. We examined the eye movements of a racially diverse group of 4- to 12-month-old infants (N = 54; 27 boys; 24 infants were White and not Hispanic, 30 infants were African American, Asian American, mixed race and/or Hispanic) as they viewed images selected from the MIT Saliency Benchmark Project. In general, across this age range infants' fixation distributions became more consistent and more adult-like, suggesting that infants' fixations in natural scenes become increasingly systematic. Evaluation of infants' fixation patterns against saliency maps generated by different models of physical salience revealed that although the correlations between infants' fixations and saliency increased over this age range, the amount of variance accounted for by salience actually decreased. At the youngest age, the variance accounted for by salience was very similar to the consistency between infants' fixations, suggesting that the systematicity of the youngest infants' fixations was explained by their attention to physically salient regions. By 12 months, in contrast, the consistency between infants was greater than the variance accounted for by salience, suggesting that the systematicity of older infants' fixations reflected more than attention to physically salient regions. Together these results show that infants' fixations during natural-scene viewing become more systematic and predictable, and that this predictability is due to their attention to features other than physical salience.
19
Walter K, Bex P. Cognitive load influences oculomotor behavior in natural scenes. Sci Rep 2021;11:12405. PMID: 34117336. PMCID: PMC8196072. DOI: 10.1038/s41598-021-91845-5.
Abstract
Cognitive neuroscience researchers have identified relationships between cognitive load and eye-movement behavior that are consistent with oculomotor biomarkers for neurological disorders. We developed an adaptive visual search paradigm that manipulates task difficulty and examined the effect of cognitive load on oculomotor behavior in healthy young adults. Participants (N = 30) free-viewed a sequence of 100 natural scenes for 10 s each while their eye movements were recorded. After each image, participants completed a four-alternative forced-choice task in which they selected a target object from one of the previously viewed scenes, among three distracters of the same object type but from alternate scenes. Following two correct responses, the target object was selected from an image increasingly farther back (N-back) in the image stream; following an incorrect response, N decreased by 1. N-back thus quantifies and individualizes cognitive load. The results show that response latencies increased as N-back increased, and pupil diameter increased with N-back before decreasing at very high N-back. These findings are consistent with previous studies and confirm that this paradigm actively engaged working memory and successfully adapted task difficulty to individual subjects' skill levels. We hypothesized that oculomotor behavior would covary with cognitive load. We found that as cognitive load increased, there was a significant decrease in the number of fixations and saccades. Furthermore, the total duration of saccades decreased with the number of events, while the total duration of fixations remained constant, suggesting that as cognitive load increased, subjects made fewer, longer fixations. These results suggest that cognitive load can be tracked with an adaptive visual search task and that oculomotor strategies are affected by greater cognitive demand in healthy adults.
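Fixation and saccade counts like those analyzed here are commonly derived from raw gaze samples with a velocity threshold (I-VT); a minimal sketch with assumed units and parameters, not the authors' processing pipeline:

```python
import numpy as np

def count_events(x, y, hz=1000.0, thresh=30.0):
    """Count fixations and saccades from gaze traces in degrees, sampled at `hz`.

    Samples whose velocity exceeds `thresh` (deg/s) are labeled saccadic;
    contiguous runs of same-labeled samples are counted as single events.
    """
    velocity = np.hypot(np.diff(x), np.diff(y)) * hz
    is_saccade = velocity > thresh
    transitions = np.diff(is_saccade.astype(int))
    n_saccades = int(is_saccade[0]) + int(np.sum(transitions == 1))
    n_fixations = int(not is_saccade[0]) + int(np.sum(transitions == -1))
    return n_fixations, n_saccades
```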
Affiliation(s)
- Kerri Walter
- Psychology Department, Northeastern University, Boston, 02115, USA
- Peter Bex
- Psychology Department, Northeastern University, Boston, 02115, USA
20
Drewes J, Feder S, Einhäuser W. Gaze During Locomotion in Virtual Reality and the Real World. Front Neurosci 2021;15:656913. PMID: 34108857. PMCID: PMC8180583. DOI: 10.3389/fnins.2021.656913.
Abstract
How vision guides gaze in realistic settings has been researched for decades. Human gaze behavior is typically measured in laboratory settings that are well controlled but feature-reduced and movement-constrained, in sharp contrast to real-life gaze control, which combines eye, head, and body movements. Previous real-world research has shown environmental factors such as terrain difficulty to affect gaze; however, real-world settings are difficult to control or replicate. Virtual reality (VR) offers the experimental control of a laboratory, yet approximates the freedom and visual complexity of the real world (RW). We measured gaze data in 8 healthy young adults during walking in the RW and simulated locomotion in VR. Participants walked along a pre-defined path inside an office building, which included different terrains such as long corridors and flights of stairs. In VR, participants followed the same path in a detailed virtual reconstruction of the building. We devised a novel hybrid control strategy for movement in VR: participants did not physically translate; forward movements were controlled by a hand-held device, whereas rotational movements were executed physically and transferred to the VR. We found significant effects of terrain type (flat corridor, staircase up, and staircase down) on gaze direction, on the spatial spread of gaze direction, and on the angular distribution of gaze-direction changes. The factor world (RW and VR) affected the angular distribution of gaze-direction changes, saccade frequency, and head-centered vertical gaze direction. The latter effect vanished when referencing gaze to a world-fixed coordinate system, and was likely due to specifics of headset placement, which cannot confound any other analyzed measure. Importantly, we did not observe a significant interaction between the factors world and terrain for any of the tested measures. This indicates that differences between terrain types are not modulated by the world. The overall dwell time on navigational markers did not differ between worlds. The similar dependence of gaze behavior on terrain in the RW and in VR indicates that our VR captures real-world constraints remarkably well. High-fidelity VR combined with naturalistic movement control therefore has the potential to narrow the gap between the experimental control of a lab and ecologically valid settings.
Affiliation(s)
- Jan Drewes
- Institute of Brain and Psychological Sciences, Sichuan Normal University, Chengdu, China
- Physics of Cognition Group, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
- Sascha Feder
- Cognitive Systems Lab, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
- Wolfgang Einhäuser
- Physics of Cognition Group, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
21
Nuthmann A, Clayden AC, Fisher RB. The effect of target salience and size in visual search within naturalistic scenes under degraded vision. J Vis 2021;21:2. PMID: 33792616. PMCID: PMC8024777. DOI: 10.1167/jov.21.4.2.
Abstract
We address two questions concerning eye guidance during visual search in naturalistic scenes. First, search has been described as a task in which visual salience is unimportant. Here, we revisit this question by using a letter-in-scene search task that minimizes any confounding effects that may arise from scene guidance. Second, we investigate how important the different regions of the visual field are for different subprocesses of search (target localization, verification). In Experiment 1, we manipulated both the salience (low vs. high) and the size (small vs. large) of the target letter (a "T"), and we implemented a foveal scotoma (radius: 1°) in half of the trials. In Experiment 2, observers searched for high- and low-salience targets either with full vision or with a central or peripheral scotoma (radius: 2.5°). In both experiments, we found main effects of salience, with better performance for high-salience targets. In Experiment 1, search was faster for large than for small targets, and high salience helped more for small targets. When searching with a foveal scotoma, performance was relatively unimpaired regardless of the target's salience and size. In Experiment 2, both visual-field manipulations led to search time costs, but the peripheral scotoma was much more detrimental than the central scotoma. Peripheral vision proved to be important for target localization, and central vision for target verification. Salience affected eye-movement guidance to the target in both central and peripheral vision. Collectively, the results lend support to search models that incorporate salience for predicting eye-movement behavior.
Affiliation(s)
- Antje Nuthmann
- Institute of Psychology, University of Kiel, Germany
- Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK
- http://orcid.org/0000-0003-3338-3434
- Adam C Clayden
- School of Engineering, Arts, Science and Technology, University of Suffolk, UK
- Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK
22
The interplay between gaze and consistency in scene viewing: Evidence from visual search by young and older adults. Atten Percept Psychophys 2021;83:1954-1970. PMID: 33748905. PMCID: PMC8213592. DOI: 10.3758/s13414-021-02242-z.
Abstract
Searching for an object in a complex scene is influenced by high-level factors such as how strongly the item would be expected in that setting (semantic consistency). There is also evidence that a person gazing at an object directs our attention towards it. However, little previous research has examined how we integrate top-down cues such as semantic consistency and gaze when directing attention in search for an object. In addition, separate lines of evidence suggest that older adults may be more influenced by semantic factors and less by gaze cues than their younger counterparts, but this has not previously been investigated in an integrated task. In the current study we analysed the eye movements of 34 younger and 30 older adults as they searched for a target object in complex visual scenes. Younger adults were influenced by semantic consistency in their attention to objects but were more influenced by gaze cues. In contrast, older adults were more guided by semantic consistency in directing their attention and showed less influence of gaze cues. These age differences in the use of high-level cues were apparent early in processing (time to first fixation and probability of immediate fixation) but not in later processing (total time looking at objects and time to make a response). Overall, this pattern of findings indicates that people are influenced by both social cues and prior expectations when processing a complex scene, and that the relative importance of these factors depends on age.
23
Borji A. Saliency Prediction in the Deep Learning Era: Successes and Limitations. IEEE Trans Pattern Anal Mach Intell 2021;43:679-700. PMID: 31425064. DOI: 10.1109/tpami.2019.2935715.
Abstract
Visual saliency models have enjoyed a big leap in performance in recent years, thanks to advances in deep learning and large-scale annotated data. Despite enormous effort and huge breakthroughs, however, models still fall short of reaching human-level accuracy. In this work, I explore the landscape of the field, emphasizing new deep saliency models, benchmarks, and datasets. A large number of image and video saliency models are reviewed and compared over two image benchmarks and two large-scale video datasets. Further, I identify factors that contribute to the gap between models and humans and discuss the remaining issues that need to be addressed to build the next generation of more powerful saliency models. Specific questions that are addressed include: in what ways current models fail, how to remedy them, what can be learned from cognitive studies of attention, how explicit saliency judgments relate to fixations, how to conduct fair model comparison, and what are the emerging applications of saliency models.
24
Võ MLH. The meaning and structure of scenes. Vision Res 2021;181:10-20. PMID: 33429218. DOI: 10.1016/j.visres.2020.11.003.
Abstract
We live in a rich, three-dimensional world with complex arrangements of meaningful objects. For decades, however, theories of visual attention and perception have been based on findings generated from lines and color patches. While these theories have been indispensable for our field, the time has come to move on from this rather impoverished view of the world and (at least try to) get closer to the real thing. After all, our visual environment consists of objects that we not only look at, but constantly interact with. Having incorporated the meaning and structure of scenes, i.e., their "grammar", we can easily understand objects and scenes we have never encountered before. Studying this grammar provides us with the fascinating opportunity to gain new insights into the complex workings of attention, perception, and cognition. In this review, I discuss how the meaning and the complex yet predictive structure of real-world scenes influence attention allocation, search, and object identification.
Affiliation(s)
- Melissa Le-Hoa Võ
- Department of Psychology, Johann Wolfgang Goethe-Universität, Frankfurt, Germany
- https://www.scenegrammarlab.com/
25
Salience-based object prioritization during active viewing of naturalistic scenes in young and older adults. Sci Rep 2020;10:22057. PMID: 33328485. PMCID: PMC7745017. DOI: 10.1038/s41598-020-78203-7.
Abstract
Whether fixation selection in real-world scenes is guided by image salience or by objects has been a matter of scientific debate. To contrast the two views, we compared effects of location-based and object-based visual salience in young and older (65+ years) adults. Generalized linear mixed models were used to assess the unique contribution of salience to fixation selection in scenes. When analysing fixation guidance without recourse to objects, visual salience predicted whether image patches were fixated or not. This effect was reduced for the elderly, replicating an earlier finding. When using objects as the unit of analysis, we found that highly salient objects were more frequently selected for fixation than objects with low visual salience. Interestingly, this effect was larger for older adults. We also analysed where viewers fixate within objects once they are selected. A preferred viewing location close to the centre of the object was found for both age groups. The results support the view that objects are important units of saccadic selection. Reconciling the salience view with the object view, we suggest that visual salience contributes to prioritization among objects. Moreover, the data point towards an increasing relevance of object-bound information with increasing age.
26
Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations. Cognition 2020;206:104465. PMID: 33096374. DOI: 10.1016/j.cognition.2020.104465.
Abstract
Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic information across an image, have recently been proposed in support of the hypothesis that meaning rather than image features guides human gaze. MMs have the potential to be an important tool far beyond eye-movement research. Here, we examine central assumptions underlying MMs. First, we compared the performance of MMs in predicting fixations to that of saliency models, showing that DeepGaze II, a deep neural network trained to predict fixations based on high-level features rather than meaning, outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge central assumptions underlying the use of MMs to measure the distribution of meaning in images.
27
Uejima T, Niebur E, Etienne-Cummings R. Proto-Object Based Saliency Model With Texture Detection Channel. Front Comput Neurosci 2020;14:541581. PMID: 33071766. PMCID: PMC7541834. DOI: 10.3389/fncom.2020.541581.
Abstract
The amount of visual information projected from the retina to the brain exceeds the information-processing capacity of the latter. Attention therefore functions as a filter that highlights important information at multiple stages of the visual pathway for further, more detailed analysis. Among other functions, this determines where to fixate, since only the fovea allows for high-resolution imaging. Visual saliency modeling, i.e., understanding how the brain selects important information to analyze further and to determine where to fixate next, is an important research topic in computational neuroscience and computer vision. Most existing bottom-up saliency models use low-level features such as intensity and color, while some models employ high-level features, like faces. However, little consideration has been given to mid-level features, such as texture, in visual saliency models. In this paper, we extend a biologically plausible proto-object based saliency model by adding simple texture channels that employ nonlinear operations mimicking the processing performed by primate visual cortex. The extended model shows statistically significantly improved performance in predicting human fixations compared to the previous model. Comparing the performance of our model with others on publicly available benchmarking datasets, we find that our biologically plausible model matches the performance of other models, even though those were designed entirely for maximal performance with little regard to biological realism.
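A texture channel in this spirit can be sketched as oriented Gabor energy pooled locally; the squaring nonlinearity and pooling echo standard models of cortical complex cells, but the parameters here are assumptions rather than the paper's:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.filters import gabor

def texture_energy(img, frequency=0.1, n_orient=4, pool_sigma=8.0):
    """Sum of locally pooled quadrature Gabor energy over several orientations."""
    energy = np.zeros(img.shape, dtype=float)
    for k in range(n_orient):
        real, imag = gabor(img, frequency=frequency, theta=k * np.pi / n_orient)
        energy += gaussian_filter(real ** 2 + imag ** 2, pool_sigma)
    return energy
```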
Collapse
Affiliation(s)
- Takeshi Uejima
- The Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, United States
| | - Ernst Niebur
- The Solomon Snyder Department of Neuroscience and the Zanvyl Krieger Mind/Brain Institute, The Johns Hopkins University, Baltimore, MD, United States
| | - Ralph Etienne-Cummings
- The Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, United States
| |
Collapse
|
28
|
Coco MI, Nuthmann A, Dimigen O. Fixation-related Brain Potentials during Semantic Integration of Object–Scene Information. J Cogn Neurosci 2020; 32:571-589. [DOI: 10.1162/jocn_a_01504] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
In vision science, a particularly controversial topic is whether and how quickly the semantic information about objects is available outside foveal vision. Here, we aimed at contributing to this debate by coregistering eye movements and EEG while participants viewed photographs of indoor scenes that contained a semantically consistent or inconsistent target object. Linear deconvolution modeling was used to analyze the ERPs evoked by scene onset as well as the fixation-related potentials (FRPs) elicited by the fixation on the target object (t) and by the preceding fixation (t − 1). Object–scene consistency did not influence the probability of immediate target fixation or the ERP evoked by scene onset, which suggests that object–scene semantics was not accessed immediately. However, during the subsequent scene exploration, inconsistent objects were prioritized over consistent objects in extrafoveal vision (i.e., looked at earlier) and were more effortful to process in foveal vision (i.e., looked at longer). In FRPs, we demonstrate a fixation-related N300/N400 effect, whereby inconsistent objects elicit a larger frontocentral negativity than consistent objects. In line with the behavioral findings, this effect was already seen in FRPs aligned to the pretarget fixation t − 1 and persisted throughout fixation t, indicating that the extraction of object semantics can already begin in extrafoveal vision. Taken together, the results emphasize the usefulness of combined EEG/eye movement recordings for understanding the mechanisms of object–scene integration during natural viewing.
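The core of linear deconvolution can be sketched in a few lines: each event type contributes a time-expanded set of stick-function regressors, and temporally overlapping responses are separated by least squares. Everything below (sampling rate, window, waveforms) is synthetic and only illustrates the principle, not the authors' analysis code.

```python
# Toy linear deconvolution of overlapping responses, one EEG channel.
import numpy as np

fs, n, win = 100, 2000, 50                  # Hz, samples, response window
rng = np.random.default_rng(2)
onsets_a = rng.choice(n - win, 30, replace=False)  # e.g., scene onsets
onsets_b = rng.choice(n - win, 30, replace=False)  # e.g., target fixations

X = np.zeros((n, 2 * win))                  # time-expanded design matrix
for t in onsets_a:
    X[t:t + win, :win] += np.eye(win)
for t in onsets_b:
    X[t:t + win, win:] += np.eye(win)

true_a = np.sin(np.linspace(0, np.pi, win))           # fake ERP shape
true_b = -0.5 * np.sin(np.linspace(0, 2 * np.pi, win))
eeg = X @ np.r_[true_a, true_b] + 0.1 * rng.standard_normal(n)

beta, *_ = np.linalg.lstsq(X, eeg, rcond=None)
erp_a, frp_b = beta[:win], beta[win:]       # overlap-corrected responses
```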
Collapse
Affiliation(s)
- Moreno I. Coco
- The University of East London
- CICPSI, Faculdade de Psicologia, Universidade de Lisboa
| | | | | |
Collapse
|
29
|
Functional Imaging of Visuospatial Attention in Complex and Naturalistic Conditions. Curr Top Behav Neurosci 2020. [PMID: 30547430 DOI: 10.1007/7854_2018_73] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/30/2023]
Abstract
One of the ultimate goals of cognitive neuroscience is to understand how the brain works in the real world. Functional imaging with naturalistic stimuli provides us with the opportunity to study the brain in situations similar to everyday life. This includes the processing of complex stimuli that can trigger many types of signals, related both to the physical characteristics of the external input and to the internal knowledge that we have about natural objects and environments. In this chapter, I will first outline different types of stimuli that have been used in naturalistic imaging studies. These include static pictures, short video clips, full-length movies, and virtual reality, each with specific advantages and disadvantages. Next, I will turn to the main issue of visual-spatial orienting in naturalistic conditions and its neural substrates. I will discuss different classes of internal signals, related to objects, scene structure, and long-term memory. All of these, together with external signals about stimulus salience, have been found to modulate the activity and the connectivity of the frontoparietal attention networks. I will conclude by pointing out some promising future directions for functional imaging with naturalistic stimuli. Although this field of research is still in its early days, I expect that it will play a major role in bridging the gap between standard laboratory paradigms and mechanisms of brain functioning in the real world.
Collapse
|
30
|
Schütt HH, Rothkegel LOM, Trukenbrod HA, Engbert R, Wichmann FA. Disentangling bottom-up versus top-down and low-level versus high-level influences on eye movements over time. J Vis 2019; 19:1. [PMID: 30821809 DOI: 10.1167/19.3.1] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Bottom-up and top-down as well as low-level and high-level factors influence where we fixate when viewing natural scenes. However, the importance of each of these factors and how they interact remains a matter of debate. Here, we disentangle these factors by analyzing their influence over time. For this purpose, we develop a saliency model that is based on the internal representation of a recent early spatial vision model to measure the low-level, bottom-up factor. To measure the influence of high-level, bottom-up features, we use a recent deep neural network-based saliency model. To account for top-down influences, we evaluate the models on two large data sets with different tasks: first, a memorization task and, second, a search task. Our results lend support to a separation of visual scene exploration into three phases: the first saccade, an initial guided exploration characterized by a gradual broadening of the fixation density, and a steady state that is reached after roughly 10 fixations. Saccade-target selection during the initial exploration and in the steady state is related to similar areas of interest, which are better predicted when including high-level features. In the search data set, fixation locations are determined predominantly by top-down processes. In contrast, the first fixation follows a different fixation density and contains a strong central fixation bias. Nonetheless, first fixations are guided strongly by image properties, and as early as 200 ms after image onset, fixations are better predicted by high-level information. We conclude that any low-level, bottom-up factors are mainly limited to the generation of the first saccade. All saccades are better explained when high-level features are considered, and later, this high-level, bottom-up control can be overruled by top-down influences.
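A minimal sketch of this kind of time-course analysis: score each fixation on a z-scored prediction map (normalized scanpath saliency, NSS) and aggregate by ordinal fixation index. The map and fixations below are random placeholders, not the paper's models or data.

```python
# NSS as a function of fixation index, with synthetic stand-in data.
import numpy as np

rng = np.random.default_rng(3)
pred = rng.random((64, 64))                     # placeholder prediction map
pred_z = (pred - pred.mean()) / pred.std()      # z-scored map for NSS

fixations = [(trial, i, rng.integers(64), rng.integers(64))
             for trial in range(20) for i in range(12)]

nss_by_index = {}
for _, idx, y, x in fixations:
    nss_by_index.setdefault(idx, []).append(pred_z[y, x])

for idx in sorted(nss_by_index):
    print(f"fixation {idx + 1}: mean NSS = {np.mean(nss_by_index[idx]):.3f}")
```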
Collapse
Affiliation(s)
- Heiko H Schütt
- Neural Information Processing Group, Universität Tübingen, Tübingen, Germany; Experimental and Biological Psychology, University of Potsdam, Potsdam, Germany
| | - Lars O M Rothkegel
- Experimental and Biological Psychology, University of Potsdam, Potsdam, Germany
| | - Hans A Trukenbrod
- Experimental and Biological Psychology, University of Potsdam, Potsdam, Germany
| | - Ralf Engbert
- Experimental and Biological Psychology and Research Focus Cognitive Sciences, University of Potsdam, Potsdam, Germany
| | - Felix A Wichmann
- Neural Information Processing Group, Universität Tübingen, Tübingen, Germany
| |
Collapse
|
31
|
Renswoude DR, Visser I, Raijmakers MEJ, Tsang T, Johnson SP. Real‐world scene perception in infants: What factors guide attention allocation? INFANCY 2019; 24:693-717. [DOI: 10.1111/infa.12308] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2018] [Revised: 04/10/2019] [Accepted: 06/03/2019] [Indexed: 11/30/2022]
Affiliation(s)
- Daan R. Renswoude
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Research Priority Area YIELD, Amsterdam, The Netherlands
| | - Ingmar Visser
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Research Priority Area YIELD, Amsterdam, The Netherlands
- Amsterdam Brain and Cognition Center, Amsterdam, The Netherlands
| | - Maartje E. J. Raijmakers
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Research Priority Area YIELD, Amsterdam, The Netherlands
- Amsterdam Brain and Cognition Center, Amsterdam, The Netherlands
- Educational Studies & Learn!, Free University, Amsterdam, The Netherlands
| | - Tawny Tsang
- Department of Psychology, University of California, Los Angeles, California
| | - Scott P. Johnson
- Department of Psychology, University of California, Los Angeles, California
| |
Collapse
|
32
|
Williams CC, Castelhano MS. The Changing Landscape: High-Level Influences on Eye Movement Guidance in Scenes. Vision (Basel) 2019; 3:E33. [PMID: 31735834 PMCID: PMC6802790 DOI: 10.3390/vision3030033] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2019] [Revised: 06/20/2019] [Accepted: 06/24/2019] [Indexed: 11/16/2022] Open
Abstract
The use of eye movements to explore scene processing has exploded over the last decade. Eye movements provide distinct advantages when examining scene processing because they are both fast and spatially measurable. By using eye movements, researchers have investigated many questions about scene processing. Our review will focus on research performed in the last decade examining: (1) attention and eye movements; (2) where you look; (3) influence of task; (4) memory and scene representations; and (5) dynamic scenes and eye movements. Although typically addressed as separate issues, we argue that these distinctions are now holding back research progress. Instead, it is time to examine the intersections of these seemingly separate influences and examine the intersectionality of how these influences interact to more completely understand what eye movements can tell us about scene processing.
Collapse
Affiliation(s)
- Carrick C. Williams
- Department of Psychology, California State University San Marcos, San Marcos, CA 92069, USA
| | | |
Collapse
|
33
|
de Haas B, Iakovidis AL, Schwarzkopf DS, Gegenfurtner KR. Individual differences in visual salience vary along semantic dimensions. Proc Natl Acad Sci U S A 2019; 116:11687-11692. [PMID: 31138705 PMCID: PMC6576124 DOI: 10.1073/pnas.1820553116] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
What determines where we look? Theories of attentional guidance hold that image features and task demands govern fixation behavior, while differences between observers are interpreted as a "noise-ceiling" that strictly limits predictability of fixations. However, recent twin studies suggest a genetic basis of gaze-trace similarity for a given stimulus. This leads to the question of how individuals differ in their gaze behavior and what may explain these differences. Here, we investigated the fixations of >100 human adults freely viewing a large set of complex scenes containing thousands of semantically annotated objects. We found systematic individual differences in fixation frequencies along six semantic stimulus dimensions. These differences were large (>twofold) and highly stable across images and time. Surprisingly, they also held for first fixations directed toward each image, commonly interpreted as "bottom-up" visual salience. Their perceptual relevance was documented by a correlation between individual face salience and face recognition skills. The set of reliable individual salience dimensions and their covariance pattern replicated across samples from three different countries, suggesting they reflect fundamental biological mechanisms of attention. Our findings show stable individual differences in salience along a set of fundamental semantic dimensions and that these differences have meaningful perceptual implications. Visual salience reflects features of the observer as well as the image.
Collapse
Affiliation(s)
- Benjamin de Haas
- Department of Psychology, Justus Liebig Universität, 35394 Giessen, Germany
- Experimental Psychology, University College London, WC1H 0AP London, United Kingdom
| | - Alexios L Iakovidis
- Experimental Psychology, University College London, WC1H 0AP London, United Kingdom
| | - D Samuel Schwarzkopf
- Experimental Psychology, University College London, WC1H 0AP London, United Kingdom
- School of Optometry & Vision Science, University of Auckland, 1142 Auckland, New Zealand
| | | |
Collapse
|
34
|
Nuthmann A, de Groot F, Huettig F, Olivers CNL. Extrafoveal attentional capture by object semantics. PLoS One 2019; 14:e0217051. [PMID: 31120948 PMCID: PMC6532879 DOI: 10.1371/journal.pone.0217051] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2018] [Accepted: 05/03/2019] [Indexed: 11/19/2022] Open
Abstract
There is ongoing debate on whether object meaning can be processed outside foveal vision, making semantics available for attentional guidance. Much of the debate has centred on whether objects that do not fit within an overall scene draw attention, in complex displays that are often difficult to control. Here, we revisited the question by reanalysing data from three experiments that used displays consisting of standalone objects from a carefully controlled stimulus set. Observers searched for a target object, as per auditory instruction. On the critical trials, the displays contained no target but objects that were semantically related to the target, visually related, or unrelated. Analyses using (generalized) linear mixed-effects models showed that, although visually related objects attracted most attention, semantically related objects were also fixated earlier in time than unrelated objects. Moreover, semantic matches affected the very first saccade in the display. The amplitudes of saccades that first entered semantically related objects were larger than 5° on average, confirming that object semantics is available outside foveal vision. Finally, there was no semantic capture of attention for the same objects when observers did not actively look for the target, confirming that it was not stimulus-driven. We discuss the implications for existing models of visual cognition.
Collapse
Affiliation(s)
- Antje Nuthmann
- Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Institute of Psychology, University of Kiel, Kiel, Germany
| | - Floor de Groot
- Department of Experimental and Applied Psychology & Institute for Brain and Behaviour, Vrije Universiteit, Amsterdam, The Netherlands
| | - Falk Huettig
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Centre for Language Studies, Radboud University, Nijmegen, The Netherlands
| | - Christian N. L. Olivers
- Department of Experimental and Applied Psychology & Institute for Brain and Behaviour, Vrije Universiteit, Amsterdam, The Netherlands
| |
Collapse
|
35
|
Nardo D, De Luca M, Rotondaro F, Spanò B, Bozzali M, Doricchi F, Paolucci S, Macaluso E. Left hemispatial neglect and overt orienting in naturalistic conditions: Role of high-level and stimulus-driven signals. Cortex 2019; 113:329-346. [PMID: 30735844 DOI: 10.1016/j.cortex.2018.12.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2018] [Revised: 11/08/2018] [Accepted: 12/27/2018] [Indexed: 11/29/2022]
Abstract
Deficits of visuospatial orienting in brain-damaged patients affected by hemispatial neglect have been extensively investigated. Nonetheless, spontaneous spatial orienting in naturalistic conditions is still poorly understood. Here, we investigated the role played by top-down and stimulus-driven signals in overt spatial orienting of neglect patients during free-viewing of short videos portraying everyday life situations. In Experiment 1, we assessed orienting when meaningful visual events competed on the left and right side of space, and tested whether sensory salience on the two sides biased orienting. In Experiment 2, we examined whether the spatial alignment of visual and auditory signals modulates orienting. The results of Experiment 1 showed that in neglect patients severe deficits in contralesional orienting were restricted to viewing conditions with bilateral visual events competing for attentional capture. In contrast, orienting towards the contralesional side was largely spared when the videos contained a single event on the left side. In neglect patients the processing of stimulus-driven salience was relatively spared and helped orienting towards the left side when multiple events were present. Experiment 2 showed that sounds spatially aligned with visual events on the left side improved orienting towards the otherwise neglected hemispace. Anatomical scans indicated that neglect patients had suffered grey and white matter damage primarily in the ventral frontoparietal cortex. This suggests that the improvement of contralesional orienting associated with visual salience and audiovisual spatial alignment may be due to processing in the relatively intact dorsal frontoparietal areas. Our data show that in naturalistic environments, the presence of multiple meaningful events is a major determinant of spatial orienting deficits in neglect patients, whereas the salience of visual signals and the spatial alignment between auditory and visual signals can counteract spatial orienting deficits. These results open new perspectives for developing novel rehabilitation strategies based on the use of naturalistic stimuli.
Collapse
Affiliation(s)
- Davide Nardo
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy; MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK.
| | - Maria De Luca
- Neuropsychology Unit, Santa Lucia Foundation, Rome, Italy
| | - Francesca Rotondaro
- Neuropsychology Unit, Santa Lucia Foundation, Rome, Italy; Department of Psychology, Sapienza University, Rome, Italy
| | - Barbara Spanò
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy
| | - Marco Bozzali
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy; Department of Neuroscience, Brighton and Sussex Medical School, University of Sussex, East Sussex, UK
| | - Fabrizio Doricchi
- Neuropsychology Unit, Santa Lucia Foundation, Rome, Italy; Department of Psychology, Sapienza University, Rome, Italy
| | | | - Emiliano Macaluso
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy; ImpAct Team, Lyon Neuroscience Research Center, Lyon, France
| |
Collapse
|
36
|
Abstract
This chapter reviews the literature on the development of visual-spatial attention. A brief overview of brain mechanisms of visual perception is provided, followed by a discussion of neural maturation in the prenatal period, infancy, and childhood. This is followed by sections on gaze control, eye movement systems, and orienting. The chapter concludes with a consideration of the development of space, objects, and scenes. Visual-spatial attention reflects an intricate set of motor, perceptual, and cognitive systems that work jointly and all develop in tandem.
Collapse
|
37
|
van Renswoude DR, van den Berg L, Raijmakers ME, Visser I. Infants’ center bias in free viewing of real-world scenes. Vision Res 2019; 154:44-53. [DOI: 10.1016/j.visres.2018.10.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2018] [Revised: 09/29/2018] [Accepted: 10/10/2018] [Indexed: 11/26/2022]
|
38
|
Nuthmann A, Einhäuser W, Schütz I. How Well Can Saliency Models Predict Fixation Selection in Scenes Beyond Central Bias? A New Approach to Model Evaluation Using Generalized Linear Mixed Models. Front Hum Neurosci 2017; 11:491. [PMID: 29163092 PMCID: PMC5671469 DOI: 10.3389/fnhum.2017.00491] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2017] [Accepted: 09/26/2017] [Indexed: 11/21/2022] Open
Abstract
Since the turn of the millennium, a large number of computational models of visual salience have been put forward. How best to evaluate a given model's ability to predict where human observers fixate in images of real-world scenes remains an open research question. Assessing the role of spatial biases is a challenging issue; this is particularly true when we consider the tendency for high-salience items to appear in the image center, combined with a tendency to look straight ahead (“central bias”). This problem is further exacerbated in the context of model comparisons, because some—but not all—models implicitly or explicitly incorporate a center preference to improve performance. To address this and other issues, we propose to combine a-priori parcellation of scenes with generalized linear mixed models (GLMM), building upon previous work. With this method, we can explicitly model the central bias of fixation by including a central-bias predictor in the GLMM. A second predictor captures how well the saliency model predicts human fixations, above and beyond the central bias. By-subject and by-item random effects account for individual differences and differences across scene items, respectively. Moreover, we can directly assess whether a given saliency model performs significantly better than others. In this article, we describe the data processing steps required by our analysis approach. In addition, we demonstrate the GLMM analyses by evaluating the performance of different saliency models on a new eye-tracking corpus. To facilitate the application of our method, we make the open-source Python toolbox “GridFix” available.
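A stripped-down, fixed-effects version of this analysis might look as follows: predict whether a grid cell was fixated from its distance to the image center plus a saliency score. The full method adds by-subject and by-item random effects (the GLMM) and uses GridFix for the parcellation; all data and coefficients below are invented for illustration.

```python
# Fixed-effects sketch of the grid-parcellation analysis (no random
# effects). Synthetic data: fixation probability depends on saliency
# and on distance from the image center (the central bias).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
rows = []
for img in range(30):
    for gy in range(6):
        for gx in range(8):
            dist = np.hypot(gy - 2.5, gx - 3.5)   # central-bias predictor
            sal = rng.random()                     # saliency-model score
            p = 1 / (1 + np.exp(-(1.5 * sal - 0.6 * dist)))
            rows.append(dict(image=img, dist_center=dist, saliency=sal,
                             fixated=int(rng.random() < p)))

df = pd.DataFrame(rows)
fit = smf.logit("fixated ~ dist_center + saliency", data=df).fit(disp=0)
print(fit.params)   # does saliency predict fixation beyond central bias?
```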
Collapse
Affiliation(s)
- Antje Nuthmann
- Department of Psychology, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, United Kingdom; Perception and Cognition Group, Institute of Psychology, University of Kiel, Kiel, Germany
| | - Wolfgang Einhäuser
- Physics of Cognition Group, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
| | - Immo Schütz
- Physics of Cognition Group, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
| |
Collapse
|
39
|
Schomaker J, Rau EM, Einhäuser W, Wittmann BC. Motivational Objects in Natural Scenes (MONS): A Database of >800 Objects. Front Psychol 2017; 8:1669. [PMID: 29033870 PMCID: PMC5626981 DOI: 10.3389/fpsyg.2017.01669] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2016] [Accepted: 09/11/2017] [Indexed: 11/29/2022] Open
Abstract
In daily life, we are surrounded by objects with pre-existing motivational associations. However, these are rarely controlled for in experiments with natural stimuli. Research on natural stimuli would therefore benefit from stimuli with well-defined motivational properties; in turn, such stimuli also open new paths in research on motivation. Here we introduce a database of Motivational Objects in Natural Scenes (MONS). The database consists of 107 scenes. Each scene contains 2 to 7 objects placed at approximately equal distance from the scene center. Each scene was photographed in three versions, with one object (“critical object”) replaced to vary the overall motivational value of the scene (appetitive, aversive, and neutral) while maintaining high visual similarity between the three versions. Ratings on motivation, valence, arousal and recognizability were obtained using internet-based questionnaires. Since the main objective was to provide stimuli of well-defined motivational value, three motivation scales were used: (1) Desire to own the object; (2) Approach/Avoid; (3) Desire to interact with the object. Three sets of ratings were obtained in independent sets of observers: for all 805 objects presented on a neutral background, for 321 critical objects presented in their scene context, and for the entire scenes. On the basis of the motivational ratings, objects were subdivided into aversive, neutral, and appetitive categories. The MONS database will provide a standardized basis for future studies on motivational value under realistic conditions.
Collapse
Affiliation(s)
- Judith Schomaker
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany
| | - Elias M Rau
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany
| | - Wolfgang Einhäuser
- Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany; Department of Neurophysics, Philipps University of Marburg, Marburg, Germany
| | - Bianca C Wittmann
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany
| |
Collapse
|
40
|
Onuma T, Penwannakul Y, Fuchimoto J, Sakai N. The effect of order of dwells on the first dwell gaze bias for eventually chosen items. PLoS One 2017; 12:e0181641. [PMID: 28723947 PMCID: PMC5517065 DOI: 10.1371/journal.pone.0181641] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2016] [Accepted: 07/05/2017] [Indexed: 11/19/2022] Open
Abstract
The relationship between choice and eye movement has gained marked interest. The gaze bias effect, i.e., the tendency to look longer at items that are eventually chosen, has been shown to occur in the first dwell (the initial cohesive run of fixations on an item). In the two-alternative forced-choice (2AFC) paradigm, participants look at one of the items first (defined as the first look; FL) and then move to look at the other item (second look; SL). This study investigated how the order in which the chosen items were looked at modulates the first dwell gaze bias effect. Participants were asked to report their preferences and to make perceptual 2AFC decisions about human faces (Experiment 1) and daily consumer products (Experiment 2) while their eye movements were recorded. The results showed that the first dwell gaze bias was found only when the eventually chosen item was looked at after the other one; the chosen item was looked at for longer than the not-chosen item in the SL, but not in the FL. These results indicate that participants actively allocate more time to looking at a subsequently chosen item only after they have perceived both items, in the SL. Therefore, selective encoding seems to occur in the early comparison stage of visual decision making, and not in the initial encoding stage. These findings provide insight into the relationship between choice and eye movement.
Collapse
Affiliation(s)
- Takuya Onuma
- Department of Psychology, Graduate School of Arts and Letters, Tohoku University, Sendai, Miyagi, Japan
- Division for Interdisciplinary Advanced Research and Education, Tohoku University, Sendai, Miyagi, Japan
| | - Yuwadee Penwannakul
- Department of Psychology, Graduate School of Arts and Letters, Tohoku University, Sendai, Miyagi, Japan
| | - Jun Fuchimoto
- Department of Psychology, Graduate School of Arts and Letters, Tohoku University, Sendai, Miyagi, Japan
| | - Nobuyuki Sakai
- Department of Psychology, Graduate School of Arts and Letters, Tohoku University, Sendai, Miyagi, Japan
| |
Collapse
|
41
|
Ito J, Yamane Y, Suzuki M, Maldonado P, Fujita I, Tamura H, Grün S. Switch from ambient to focal processing mode explains the dynamics of free viewing eye movements. Sci Rep 2017; 7:1082. [PMID: 28439075 PMCID: PMC5430715 DOI: 10.1038/s41598-017-01076-w] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2016] [Accepted: 03/22/2017] [Indexed: 11/21/2022] Open
Abstract
Previous studies have reported that humans employ ambient and focal modes of visual exploration while they freely view natural scenes. These two modes have been characterized based on eye movement parameters such as saccade amplitude and fixation duration, but not by any visual features of the viewed scenes. Here we propose a new characterization of eye movements during free viewing based on how eyes are moved from and to objects in a visual scene. We applied this characterization to data obtained from freely-viewing macaque monkeys. We show that the analysis based on this characterization gives a direct indication of a behavioral shift from ambient to focal processing mode along the course of free viewing exploration. We further propose a stochastic model of saccade sequence generation incorporating a switch between the two processing modes, which quantitatively reproduces the behavioral features observed in the data.
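One plausible minimal form of such a model is sketched below: a latent state switches once from an 'ambient' to a 'focal' mode, each with its own saccade-amplitude and fixation-duration distributions. All parameter values are illustrative assumptions, not the fitted values from the paper.

```python
# Two-mode stochastic saccade generator with a one-way mode switch.
import numpy as np

rng = np.random.default_rng(5)
p_switch = 0.08                   # per-saccade probability ambient -> focal
mode = "ambient"
amplitudes, durations = [], []

for _ in range(60):
    if mode == "ambient" and rng.random() < p_switch:
        mode = "focal"
    if mode == "ambient":
        amplitudes.append(rng.gamma(shape=4.0, scale=2.0))  # large saccades
        durations.append(rng.gamma(shape=2.0, scale=80.0))  # short fixations
    else:
        amplitudes.append(rng.gamma(shape=2.0, scale=1.0))  # small saccades
        durations.append(rng.gamma(shape=3.0, scale=120.0)) # long fixations

print(f"mean amplitude, first 10 saccades: {np.mean(amplitudes[:10]):.1f}; "
      f"last 10: {np.mean(amplitudes[-10:]):.1f}")
```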
Collapse
Affiliation(s)
- Junji Ito
- Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA BRAIN Institute I, Jülich Research Centre, Jülich, Germany.
| | - Yukako Yamane
- Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Center for Information and Neural Networks, Osaka University and National Institute of Information and Communications Technology, Osaka, Japan
| | - Mika Suzuki
- Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
| | - Pedro Maldonado
- BNI, CENEM and Programa de Fisiología y Biofísica, ICBM, Facultad de Medicina, Universidad de Chile, Santiago, Chile
| | - Ichiro Fujita
- Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Center for Information and Neural Networks, Osaka University and National Institute of Information and Communications Technology, Osaka, Japan
| | - Hiroshi Tamura
- Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Center for Information and Neural Networks, Osaka University and National Institute of Information and Communications Technology, Osaka, Japan
| | - Sonja Grün
- Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA BRAIN Institute I, Jülich Research Centre, Jülich, Germany
- Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Theoretical Systems Neurobiology, RWTH Aachen University, Aachen, Germany
| |
Collapse
|
42
|
Schomaker J, Walper D, Wittmann BC, Einhäuser W. Attention in natural scenes: Affective-motivational factors guide gaze independently of visual salience. Vision Res 2017; 133:161-175. [PMID: 28279712 DOI: 10.1016/j.visres.2017.02.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2016] [Revised: 02/13/2017] [Accepted: 02/22/2017] [Indexed: 11/30/2022]
Abstract
In addition to low-level stimulus characteristics and current goals, our previous experience with stimuli can also guide attentional deployment. It remains unclear, however, whether such effects act independently or interact in guiding attention. In the current study, we presented natural scenes including everyday objects that differed in affective-motivational impact. In the first, free-viewing experiment, we presented visually matched triads of scenes in which one critical object was replaced; the replacement objects varied mainly in motivational value, but also in valence and arousal, as confirmed by ratings from a large set of observers. Treating motivation as a categorical factor, we found that it affected gaze. A linear-effect model showed that arousal, valence, and motivation predicted fixations above and beyond visual characteristics such as object size, eccentricity, or visual salience. In a second experiment, we investigated whether the effects of emotion and motivation could be modulated by visual salience. In a medium-salience condition, we presented the same unmodified scenes as in the first experiment. In a high-salience condition, we retained the saturation of the critical object and decreased the saturation of the background; in a low-salience condition, we desaturated the critical object while retaining the original saturation of the background. We found that highly salient objects guided gaze, but we still found additional additive effects of arousal, valence, and motivation, confirming that higher-level factors can also guide attention, as measured by fixations towards objects in natural scenes.
Collapse
Affiliation(s)
- Judith Schomaker
- Justus Liebig University Giessen, Department of Psychology and Sports Science, Germany.
| | - Daniel Walper
- Chemnitz University of Technology, Institute of Physics, Physics of Cognition, Germany
| | - Bianca C Wittmann
- Justus Liebig University Giessen, Department of Psychology and Sports Science, Germany
| | - Wolfgang Einhäuser
- Chemnitz University of Technology, Institute of Physics, Physics of Cognition, Germany
| |
Collapse
|
43
|
Abstract
How do we find what we are looking for? Fundamental limits on visual processing mean that even when the desired target is in our field of view, we often need to search, because it is impossible to recognize everything at once. Searching involves directing attention to objects that might be the target. This deployment of attention is not random. It is guided to the most promising items and locations by five factors discussed here: Bottom-up salience, top-down feature guidance, scene structure and meaning, the previous history of search over time scales from msec to years, and the relative value of the targets and distractors. Modern theories of search need to specify how all five factors combine to shape search behavior. An understanding of the rules of guidance can be used to improve the accuracy and efficiency of socially-important search tasks, from security screening to medical image perception.
Collapse
|
44
|
Abstract
Sounds in everyday life seldom appear in isolation. Both humans and machines are constantly flooded with a cacophony of sounds that need to be sorted through and scoured for relevant information, a phenomenon referred to as the 'cocktail party problem'. A key component in parsing acoustic scenes is the role of attention, which mediates perception and behaviour by focusing both sensory and cognitive resources on pertinent information in the stimulus space. The current article provides a review of modelling studies of auditory attention. The review highlights how the term attention refers to a multitude of behavioural and cognitive processes that can shape sensory processing. Attention can be modulated by 'bottom-up' sensory-driven factors, as well as 'top-down' task-specific goals, expectations and learned schemas. Essentially, it acts as a selection process or processes that focus both sensory and cognitive resources on the most relevant events in the soundscape, with relevance being dictated by the stimulus itself (e.g. a loud explosion) or by a task at hand (e.g. listen to announcements in a busy airport). Recent computational models of auditory attention provide key insights into its role in facilitating perception in cluttered auditory scenes. This article is part of the themed issue 'Auditory and visual scene analysis'.
Collapse
Affiliation(s)
- Emine Merve Kaya
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N Charles Street, Barton Hall, Baltimore, MD 21218, USA
| | - Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N Charles Street, Barton Hall, Baltimore, MD 21218, USA
| |
Collapse
|
45
|
A cognitive architecture account of the visual local advantage phenomenon in autism spectrum disorders. Vision Res 2016; 126:278-290. [DOI: 10.1016/j.visres.2015.04.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2014] [Revised: 02/17/2015] [Accepted: 04/14/2015] [Indexed: 11/24/2022]
|
46
|
Nardo D, Console P, Reverberi C, Macaluso E. Competition between Visual Events Modulates the Influence of Salience during Free-Viewing of Naturalistic Videos. Front Hum Neurosci 2016; 10:320. [PMID: 27445760 PMCID: PMC4923118 DOI: 10.3389/fnhum.2016.00320] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2016] [Accepted: 06/13/2016] [Indexed: 11/13/2022] Open
Abstract
In daily life the brain is exposed to a large amount of external signals that compete for processing resources. The attentional system can select relevant information based on many possible combinations of goal-directed and stimulus-driven control signals. Here, we investigate the behavioral and physiological effects of competition between distinctive visual events during free-viewing of naturalistic videos. Nineteen healthy subjects underwent functional magnetic resonance imaging (fMRI) while viewing short video-clips of everyday life situations, without any explicit goal-directed task. Each video contained either a single semantically-relevant event on the left or right side (Lat-trials), or multiple distinctive events in both hemifields (Multi-trials). For each video, we computed a salience index to quantify the lateralization bias due to stimulus-driven signals, and a gaze index (based on eye-tracking data) to quantify the efficacy of the stimuli in capturing attention to either side. Behaviorally, our results showed that stimulus-driven salience influenced spatial orienting only in presence of multiple competing events (Multi-trials). fMRI results showed that the processing of competing events engaged the ventral attention network, including the right temporoparietal junction (R TPJ) and the right inferior frontal cortex. Salience was found to modulate activity in the visual cortex, but only in the presence of competing events; while the orienting efficacy of Multi-trials affected activity in both the visual cortex and posterior parietal cortex (PPC). We conclude that in presence of multiple competing events, the ventral attention system detects semantically-relevant events, while regions of the dorsal system make use of saliency signals to select relevant locations and guide spatial orienting.
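The two trial-level indices can be illustrated with a short sketch. The specific formulas below (simple left/right contrast ratios ranging from -1, fully left, to +1, fully right) are an assumption about one reasonable implementation, computed on random stand-in data.

```python
# Trial-level lateralization indices: salience index from a saliency map,
# gaze index from eye-tracking samples. Synthetic placeholder data.
import numpy as np

rng = np.random.default_rng(6)
sal_map = rng.random((72, 128))              # frame-averaged saliency map
left = sal_map[:, :64].sum()
right = sal_map[:, 64:].sum()
salience_index = (right - left) / (right + left)

gaze_x = rng.normal(80, 20, size=500)        # horizontal gaze samples (px)
gaze_index = 2 * np.mean(gaze_x > 64) - 1    # proportion right, rescaled

print(f"salience index: {salience_index:+.2f}, gaze index: {gaze_index:+.2f}")
```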
Collapse
Affiliation(s)
- Davide Nardo
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy; Institute of Cognitive Neuroscience, University College London, London, UK
| | - Paola Console
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy
| | - Carlo Reverberi
- Department of Psychology, University of Milano-Bicocca, Milan, Italy; NeuroMi-Milan Center for Neuroscience, University of Milano-Bicocca, Milan, Italy
| | - Emiliano Macaluso
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy; ImpAct Team, Lyon Neuroscience Research Center, Lyon, France
| |
Collapse
|
47
|
Borji A, Tanner J. Reconciling Saliency and Object Center-Bias Hypotheses in Explaining Free-Viewing Fixations. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:1214-1226. [PMID: 26452292 DOI: 10.1109/tnnls.2015.2480683] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Predicting where people look in natural scenes has attracted a lot of interest in computer vision and computational neuroscience over the past two decades. Two seemingly contrasting categories of cues have been proposed to influence where people look: 1) low-level image saliency and 2) high-level semantic information. Our first contribution is to take a detailed look at these cues to confirm the hypothesis, proposed by Henderson and by Nuthmann and Henderson, that observers tend to look at the center of objects. We analyzed fixation data from 17 observers free-viewing 60 object-annotated images containing various types of objects. Images contained different types of scenes, such as natural scenes, line drawings, and 3-D rendered scenes. Our second contribution is to propose a simple combined model of low-level saliency and object center bias that significantly outperforms each individual component on our data, as well as on the Object and Semantic Images and Eye-tracking data set by Xu et al. The results reconcile the saliency and object center-bias hypotheses and highlight that both types of cues are important in guiding fixations. Our work opens new directions for understanding the strategies that humans use in observing scenes and objects, and demonstrates the construction of combined models of low-level saliency and high-level object-based information.
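A minimal sketch of such a combined model, under the assumption of a simple weighted sum: Gaussians placed at annotated object centers form a center-bias map, which is mixed with a low-level saliency map. Object locations, the Gaussian width, and the weight `alpha` are placeholders.

```python
# Combine a low-level saliency map with an object center-bias map.
import numpy as np

h, w = 64, 64
rng = np.random.default_rng(7)
saliency = rng.random((h, w))                    # stand-in saliency map
object_centers = [(16, 20), (40, 45), (50, 10)]  # (y, x) annotations

yy, xx = np.mgrid[0:h, 0:w]
center_map = np.zeros((h, w))
for cy, cx in object_centers:
    center_map += np.exp(-((yy - cy)**2 + (xx - cx)**2) / (2 * 5.0**2))

def norm(m):
    return (m - m.min()) / (m.max() - m.min())

alpha = 0.5                                      # mixing weight (free param)
combined = alpha * norm(saliency) + (1 - alpha) * norm(center_map)
```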
Collapse
|
48
|
Hu B, Kane-Jackson R, Niebur E. A proto-object based saliency model in three-dimensional space. Vision Res 2016; 119:42-9. [PMID: 26739278 DOI: 10.1016/j.visres.2015.12.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2015] [Revised: 12/16/2015] [Accepted: 12/20/2015] [Indexed: 10/22/2022]
Abstract
Most models of visual saliency operate on two-dimensional images, using elementary image features such as intensity, color, or orientation. The human visual system, however, needs to function in complex three-dimensional environments, where depth information is often available and may be used to guide the bottom-up attentional selection process. In this report we extend a model of proto-object based saliency to include depth information and evaluate its performance on three separate three-dimensional eye tracking datasets. Our results show that the additional depth information provides a small, but statistically significant, improvement in the model's ability to predict perceptual saliency (eye fixations) in natural scenes. The computational mechanisms of our model have direct neural correlates, and our results provide further evidence that proto-objects help to establish perceptual organization of the scene.
Collapse
Affiliation(s)
- Brian Hu
- Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD 21218, United States; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States.
| | - Ralinkae Kane-Jackson
- Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD 21218, United States.
| | - Ernst Niebur
- Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD 21218, United States; Department of Neuroscience, Johns Hopkins University, Baltimore, MD 21218, United States.
| |
Collapse
|
49
|
Nuthmann A, Einhäuser W. A new approach to modeling the influence of image features on fixation selection in scenes. Ann N Y Acad Sci 2015; 1339:82-96. [PMID: 25752239 PMCID: PMC4402003 DOI: 10.1111/nyas.12705] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
Abstract
Which image characteristics predict where people fixate when memorizing natural images? To answer this question, we introduce a new analysis approach that combines a novel scene-patch analysis with generalized linear mixed models (GLMMs). Our method allows for (1) directly describing the relationship between continuous feature value and fixation probability, and (2) assessing each feature's unique contribution to fixation selection. To demonstrate this method, we estimated the relative contribution of various image features to fixation selection: luminance and luminance contrast (low-level features); edge density (a mid-level feature); visual clutter and image segmentation to approximate local object density in the scene (higher-level features). An additional predictor captured the central bias of fixation. The GLMM results revealed that edge density, clutter, and the number of homogenous segments in a patch can independently predict whether image patches are fixated or not. Importantly, neither luminance nor contrast had an independent effect above and beyond what could be accounted for by the other predictors. Since the parcellation of the scene and the selection of features can be tailored to the specific research question, our approach allows for assessing the interplay of various factors relevant for fixation selection in scenes in a powerful and flexible manner.
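The scene-patch preprocessing implied here can be sketched as follows: tile the image into a grid and compute per-patch predictors that the GLMM would then relate to fixation probability. Grid size, the edge threshold, and the feature set are illustrative choices, not the authors' exact pipeline.

```python
# Per-patch feature extraction for a patch-based fixation analysis.
import numpy as np
from scipy.ndimage import sobel

rng = np.random.default_rng(8)
img = rng.random((120, 160))                 # placeholder grayscale scene
edges = np.hypot(sobel(img, axis=0), sobel(img, axis=1)) > 1.0

rows, ph, pw = [], 30, 40                    # 4 x 4 grid of patches
for i in range(0, 120, ph):
    for j in range(0, 160, pw):
        patch = img[i:i + ph, j:j + pw]
        rows.append(dict(luminance=patch.mean(),
                         contrast=patch.std(),   # RMS contrast
                         edge_density=edges[i:i + ph, j:j + pw].mean()))
# 'rows' would be merged with per-patch fixation counts and passed to a GLMM.
```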
Collapse
Affiliation(s)
- Antje Nuthmann
- Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, United Kingdom
| | | |
Collapse
|
50
|
Dowiasch S, Marx S, Einhäuser W, Bremmer F. Effects of aging on eye movements in the real world. Front Hum Neurosci 2015; 9:46. [PMID: 25713524 PMCID: PMC4322726 DOI: 10.3389/fnhum.2015.00046] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2014] [Accepted: 01/17/2015] [Indexed: 11/17/2022] Open
Abstract
The effects of aging on eye movements are well studied in the laboratory. Increased saccade latencies or decreased smooth-pursuit gain are well-established findings. The question remains whether these findings are influenced by the rather untypical environment of a laboratory; that is, whether or not they transfer to the real world. We measured 34 healthy participants between the ages of 25 and 85 during two everyday tasks in the real world: (I) walking down a hallway with free gaze, (II) visually tracking an earth-fixed object while walking straight ahead. Eye movements were recorded with a mobile lightweight eye tracker, the EyeSeeCam (ESC). We find that age significantly influences saccade parameters. With increasing age, saccade frequency, amplitude, peak velocity, and mean velocity are reduced, and the velocity/amplitude distribution as well as the velocity profile become less skewed. In contrast to laboratory results on smooth pursuit, we did not find a significant effect of age on tracking eye movements in the real world. Taken together, age-related eye-movement changes as measured in the laboratory only partly resemble those in the real world. It is conceivable that in the real world additional sensory cues, such as head-movement or vestibular signals, may partially compensate for age-related effects, which, according to this view, would be specific to early motion processing. In any case, our results highlight the importance of validity for natural situations when studying the impact of aging on real-life performance.
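For context, saccade parameters like those reported here are commonly extracted with a velocity-threshold algorithm; the sketch below shows that generic approach on a synthetic one-dimensional gaze trace. It is not the EyeSeeCam's detection algorithm, and the 30 deg/s threshold is an assumed value.

```python
# Generic velocity-threshold saccade detection on a synthetic gaze trace.
import numpy as np

fs = 250.0                                   # sampling rate (Hz)
rng = np.random.default_rng(9)
gaze = np.cumsum(rng.normal(0, 0.01, 2500))  # slow drift (deg)
for onset in (300, 900, 1700):
    gaze[onset:] += 6.0                      # inject three step-like saccades

vel = np.abs(np.gradient(gaze) * fs)         # velocity in deg/s
is_sacc = vel > 30.0                         # velocity threshold

# group contiguous supra-threshold samples into saccade events
starts = np.flatnonzero(is_sacc & ~np.r_[False, is_sacc[:-1]])
ends = np.flatnonzero(is_sacc & ~np.r_[is_sacc[1:], False])
for s, e in zip(starts, ends):
    print(f"amplitude {abs(gaze[e] - gaze[s]):.1f} deg, "
          f"peak velocity {vel[s:e + 1].max():.0f} deg/s")
```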
Collapse
Affiliation(s)
- Stefan Dowiasch
- Department of Neurophysics, Philipps-University Marburg, Marburg, Germany
| | - Svenja Marx
- Department of Neurophysics, Philipps-University Marburg, Marburg, Germany
| | - Wolfgang Einhäuser
- Department of Neurophysics, Philipps-University Marburg, Marburg, Germany
| | - Frank Bremmer
- Department of Neurophysics, Philipps-University Marburg, Marburg, Germany
| |
Collapse
|