1
Xu ZJ, Lleras A, Gong ZG, Buetti S. Top-down instructions influence the attentional weight on color and shape dimensions during bidimensional search. Sci Rep 2024; 14:31376. PMID: 39732851. DOI: 10.1038/s41598-024-82866-x.
Abstract
Efficient searches are guided by target-distractor distinctiveness: the greater the distinctiveness, the faster the search. Previous research showed that when the target and distractors differ along both color and shape dimensions (i.e., bidimensional search), distinctiveness along the individual dimensions combines collinearly to guide the search, following a city-block metric. This result was found when participants expected the target and distractors to differ along both dimensions. In the present study, we used an instruction manipulation to investigate how bidimensional search varies in response to different top-down instructions. Using unidimensional search performance observed in Experiment 1, we predicted bidimensional search performance under three conditions: when participants were instructed to attend to color (Experiment 2), shape (Experiment 3), or both (Experiment 4). Results showed that instructions influenced how distinctiveness along color and shape combines to guide attention: when instructed to search for a target color, participants allocated more attentional weight to the color dimension (and less weight to the shape dimension) than when instructed to search for a target shape. Our study presents a novel technique for quantifying how top-down instructions change the attentional weighting of different features during bidimensional visual search.
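The city-block rule the abstract refers to can be written out directly: unidimensional distinctiveness values combine as a weighted sum, with the weights shifting under instruction. A minimal sketch, assuming illustrative weights and distinctiveness values rather than the paper's fitted parameters:

```python
import numpy as np

# Unidimensional target-distractor distinctiveness estimates
# (hypothetical values; the paper derives these from Experiment 1).
d_color, d_shape = 2.0, 1.2

def bidimensional_distinctiveness(d_color, d_shape, w_color=0.5):
    """City-block (L1) combination: weighted sum of the two dimensions.

    w_color is the attentional weight on color; the shape weight is its
    complement. Top-down instructions shift w_color up or down.
    """
    w_shape = 1.0 - w_color
    return w_color * d_color + w_shape * d_shape

# "Attend to color" instructions raise w_color; "attend to shape" lowers it.
for w in (0.7, 0.5, 0.3):
    d = bidimensional_distinctiveness(d_color, d_shape, w_color=w)
    print(f"w_color={w:.1f} -> combined distinctiveness D = {d:.2f}")
```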
Affiliation(s)
- Zoe Jing Xu
- Psychology Department, University of Illinois at Urbana-Champaign, Champaign, United States
- Alejandro Lleras
- Psychology Department, University of Illinois at Urbana-Champaign, Champaign, United States
- Zixu Gavin Gong
- Psychology Department, University of Illinois at Urbana-Champaign, Champaign, United States
- Simona Buetti
- Psychology Department, University of Illinois at Urbana-Champaign, Champaign, United States
2
Conroy C, Nanjappa R, McPeek RM. Inhibitory tagging both speeds and strengthens saccade target selection in the superior colliculus during visual search. J Neurophysiol 2024; 131:548-555. PMID: 38292000. PMCID: PMC11305629. DOI: 10.1152/jn.00355.2023.
Abstract
It has been suggested that, during difficult visual search tasks involving time pressure and multiple saccades, inhibitory tagging helps to facilitate efficient saccade target selection by reducing responses to objects in the scene once they have been searched and rejected. The superior colliculus (SC) is a midbrain structure involved in target selection, and recent findings suggest an influence of inhibitory tagging on SC activity. Precisely how, and by how much, inhibitory tagging influences target selection by SC neurons, however, is unclear. The purpose of this study, therefore, was to characterize and quantify the influence of inhibitory tagging on target selection in the SC. Rhesus monkeys performed a visual search task involving time pressure and multiple saccades. Early in the fixation period between saccades in the context of this task, a subset of SC neurons reliably discriminated the stimulus selected as the next saccade goal, consistent with a role in target selection. Discrimination occurred earlier and was more robust, however, when unselected stimuli in the search array had been previously fixated on the same trial. This indicates that inhibitory tagging both speeds and strengthens saccade target selection in the SC during multisaccade search. The results provide constraints on models of target selection based on SC activity.

NEW & NOTEWORTHY: An important aspect of efficient behavior during difficult, time-limited visual search tasks is the efficient selection of sequential saccade targets. Inhibitory tagging, i.e., a reduction of neural activity associated with previously fixated objects, may help to facilitate such efficient selection by modulating the selection process in the superior colliculus (SC). In this study, we characterized and quantified this modulation and found that, indeed, inhibitory tagging both speeds and strengthens target selection in the SC.
Affiliation(s)
- Christopher Conroy
- Department of Biological and Vision Sciences, State University of New York College of Optometry, New York City, New York, United States
- Rakesh Nanjappa
- Department of Biological and Vision Sciences, State University of New York College of Optometry, New York City, New York, United States
- Department of Optometry, School of Medical & Allied Sciences, G.D. Goenka University, Gurugram, India
- Robert M McPeek
- Department of Biological and Vision Sciences, State University of New York College of Optometry, New York City, New York, United States
3
Conroy C, Nanjappa R, McPeek RM. Inhibitory tagging both speeds and strengthens saccade target selection in the superior colliculus during visual search. bioRxiv [Preprint] 2024:2023.09.20.558470. PMID: 37781596. PMCID: PMC10541111. DOI: 10.1101/2023.09.20.558470.
Abstract
It has been suggested that, during difficult visual search tasks involving time pressure and multiple saccades, inhibitory tagging helps to facilitate efficient saccade target selection by reducing responses to objects in the scene once they have been searched and rejected. The superior colliculus (SC) is a midbrain structure involved in target selection, and recent findings suggest an influence of inhibitory tagging on SC activity. Precisely how, and by how much, inhibitory tagging influences target selection by SC neurons, however, is unclear. The purpose of this study, therefore, was to characterize and quantify the influence of inhibitory tagging on target selection in the SC. Rhesus monkeys performed a visual search task involving time pressure and multiple saccades. Early in the fixation period between saccades, a subset of SC neurons reliably discriminated the stimulus selected as the next saccade goal, consistent with a role in target selection. Discrimination occurred earlier and was more robust, however, when unselected stimuli in the search array had been previously fixated on the same trial. This indicates that inhibitory tagging both speeds and strengthens saccade target selection in the SC during multisaccade search. The results provide constraints on models of target selection based on SC activity.
Affiliation(s)
- Christopher Conroy
- Department of Biological and Vision Sciences, SUNY College of Optometry, New York, NY, 10036, USA
- Rakesh Nanjappa
- Department of Biological and Vision Sciences, SUNY College of Optometry, New York, NY, 10036, USA
- Department of Optometry, School of Medical & Allied Sciences, G.D. Goenka University, Gurugram, 122103, India
- Robert M McPeek
- Department of Biological and Vision Sciences, SUNY College of Optometry, New York, NY, 10036, USA
4
Kümmerer M, Bethge M. Predicting Visual Fixations. Annu Rev Vis Sci 2023; 9:269-291. PMID: 37419107. DOI: 10.1146/annurev-vision-120822-072528.
Abstract
As we navigate and behave in the world, we are constantly deciding, a few times per second, where to look next. The outcomes of these decisions in response to visual input are comparatively easy to measure as trajectories of eye movements, offering insight into many unconscious and conscious visual and cognitive processes. In this article, we review recent advances in predicting where we look. We focus on evaluating and comparing models: How can we consistently measure how well models predict eye movements, and how can we judge the contribution of different mechanisms? Probabilistic models facilitate a unified approach to fixation prediction that allows us to use explained information gain to compare different models across different settings, such as static and video saliency, as well as scanpath prediction. We review how the large variety of saliency maps and scanpath models can be translated into this unifying framework, how much different factors contribute, and how we can select the most informative examples for model comparison. We conclude that the universal scale of information gain offers a powerful tool for the inspection of candidate mechanisms and experimental design that helps us understand the continual decision-making process that determines where we look.
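The unifying measure discussed here, information gain, is the average log-likelihood advantage a probabilistic fixation model has over a baseline model, in bits per fixation. A minimal sketch of that computation (the density maps and fixations below are randomly generated placeholders):

```python
import numpy as np

def information_gain(p_model, p_baseline, fixations):
    """Average log2-likelihood advantage of a model over a baseline,
    in bits per fixation. p_model and p_baseline are density maps that
    each sum to 1; fixations is a list of (row, col) indices."""
    ll_model = np.mean([np.log2(p_model[r, c]) for r, c in fixations])
    ll_base = np.mean([np.log2(p_baseline[r, c]) for r, c in fixations])
    return ll_model - ll_base

rng = np.random.default_rng(0)
h, w = 32, 32
p_model = rng.random((h, w)); p_model /= p_model.sum()
p_baseline = np.full((h, w), 1.0 / (h * w))  # uniform baseline model
fixations = [(rng.integers(h), rng.integers(w)) for _ in range(50)]
print(f"IG = {information_gain(p_model, p_baseline, fixations):.3f} bits/fixation")
```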
Affiliation(s)
- Matthias Bethge
- Tübingen AI Center, University of Tübingen, Tübingen, Germany
5
Sawant Y, Kundu JN, Radhakrishnan VB, Sridharan D. A Midbrain Inspired Recurrent Neural Network Model for Robust Change Detection. J Neurosci 2022; 42:8262-8283. PMID: 36123120. PMCID: PMC9653281. DOI: 10.1523/jneurosci.0164-22.2022.
Abstract
We present a biologically inspired recurrent neural network (RNN) that efficiently detects changes in natural images. The model features sparse, topographic connectivity (st-RNN), closely modeled on the circuit architecture of a "midbrain attention network." We deployed the st-RNN in a challenging change blindness task, in which changes must be detected in a discontinuous sequence of images. Compared with a conventional RNN, the st-RNN learned 9x faster and achieved state-of-the-art performance with 15x fewer connections. An analysis of low-dimensional dynamics revealed putative circuit mechanisms, including a critical role for a global inhibitory (GI) motif, for successful change detection. The model reproduced key experimental phenomena, including midbrain neurons' sensitivity to dynamic stimuli, neural signatures of stimulus competition, as well as hallmark behavioral effects of midbrain microstimulation. Finally, the model accurately predicted human gaze fixations in a change blindness experiment, surpassing state-of-the-art saliency-based methods. The st-RNN provides a novel deep learning model for linking neural computations underlying change detection with psychophysical mechanisms.

SIGNIFICANCE STATEMENT: For adaptive survival, our brains must be able to accurately and rapidly detect changing aspects of our visual world. We present a novel deep learning model, a sparse, topographic recurrent neural network (st-RNN), that mimics the neuroanatomy of an evolutionarily conserved "midbrain attention network." The st-RNN achieved robust change detection in challenging change blindness tasks, outperforming conventional RNN architectures. The model also reproduced hallmark experimental phenomena, both neural and behavioral, reported in seminal midbrain studies. Lastly, the st-RNN outperformed state-of-the-art models at predicting human gaze fixations in a laboratory change blindness experiment. Our deep learning model may provide important clues about key mechanisms by which the brain efficiently detects changes.
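The "sparse, topographic connectivity" of the st-RNN can be illustrated with a recurrent weight mask in which each unit of a two-dimensional sheet connects only to its grid neighbors; this is our own toy construction, not the published architecture:

```python
import numpy as np

def topographic_mask(grid=8, radius=1):
    """Boolean recurrent-connectivity mask for a grid x grid sheet of
    units: unit i may connect to unit j only if they lie within
    `radius` of each other on the grid (local, topographic wiring)."""
    coords = np.array([(r, c) for r in range(grid) for c in range(grid)])
    # Chebyshev distance between every pair of grid positions.
    d = np.abs(coords[:, None, :] - coords[None, :, :]).max(-1)
    return d <= radius

mask = topographic_mask()
n = mask.shape[0]
print(f"{mask.sum()} of {n * n} recurrent connections kept "
      f"({100 * mask.sum() / (n * n):.1f}%)")  # sparse vs. fully connected
```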
Affiliation(s)
- Yash Sawant
- Centre for Neuroscience, Indian Institute of Science, Bangalore 560012, India
- Jogendra Nath Kundu
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
- Devarajan Sridharan
- Centre for Neuroscience, Indian Institute of Science, Bangalore 560012, India
- Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India
6
Zemliak V, MacInnes WJ. The Spatial Leaky Competing Accumulator Model. Front Comput Sci 2022. DOI: 10.3389/fcomp.2022.866029.
Abstract
The Leaky Competing Accumulator (LCA) model of Usher and McClelland can simulate the time course of perceptual decision making between an arbitrary number of stimuli. Reaction times, such as saccadic latencies, produce a typical distribution that is skewed toward longer latencies, and accumulator models have shown excellent fit to these distributions. We propose a new implementation called the Spatial Leaky Competing Accumulator (SLCA), which can be used to predict the timing of subsequent fixation durations during a visual task. The SLCA uses a pre-existing saliency map as input and represents accumulation neurons as a two-dimensional grid to generate predictions in visual space. The SLCA builds on several biologically motivated parameters: leakage, recurrent self-excitation, randomness and non-linearity; we also test two implementations of lateral inhibition. Global lateral inhibition, as implemented in the original model of Usher and McClelland, is applied across all competing neurons, while a local implementation allows inhibition only between immediate neighbors. We trained versions of the SLCA with both global and local lateral inhibition using a genetic algorithm, and compared their performance in simulating the human fixation latency distribution in a foraging task. Although both implementations were able to produce a positively skewed latency distribution, only the local SLCA was able to match the human data distribution from the foraging task. We discuss our model's potential for models of salience and priority, and its benefits compared to other models such as the leaky integrate-and-fire network.
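The underlying accumulator dynamics of Usher and McClelland's LCA, which both SLCA variants inherit, can be sketched in a few lines. The parameter values below are illustrative rather than the fitted ones, and the inhibition shown is the global form; the local variant would restrict the inhibition sum to grid neighbors:

```python
import numpy as np

def lca_trial(inputs, k=0.2, beta=0.3, dt=0.01, sigma=0.1,
              threshold=1.0, max_steps=5000, rng=None):
    """Leaky Competing Accumulator with global lateral inhibition:
    dx_i = (rho_i - k*x_i - beta * sum_{j != i} x_j) dt + noise,
    with activations clipped at zero (the non-linearity).
    Returns (winning accumulator index, steps to threshold)."""
    rng = rng or np.random.default_rng()
    x = np.zeros(len(inputs))
    for step in range(1, max_steps + 1):
        inhibition = beta * (x.sum() - x)  # global: everyone inhibits everyone else
        noise = sigma * np.sqrt(dt) * rng.standard_normal(len(x))
        x = np.maximum(0.0, x + (inputs - k * x - inhibition) * dt + noise)
        if x.max() >= threshold:
            return int(x.argmax()), step
    return -1, max_steps

rng = np.random.default_rng(1)
latencies = [lca_trial(np.array([1.0, 0.8, 0.8]), rng=rng)[1] for _ in range(200)]
print(f"median latency: {np.median(latencies):.0f} steps")  # distribution is right-skewed
```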
7
Kümmerer M, Bethge M, Wallis TSA. DeepGaze III: Modeling free-viewing human scanpaths with deep learning. J Vis 2022; 22:7. PMID: 35472130. PMCID: PMC9055565. DOI: 10.1167/jov.22.5.7.
Abstract
Humans typically move their eyes in “scanpaths” of fixations linked by saccades. Here we present DeepGaze III, a new model that predicts the spatial location of consecutive fixations in a free-viewing scanpath over static images. DeepGaze III is a deep learning–based model that combines image information with information about the previous fixation history to predict where a participant might fixate next. As a high-capacity and flexible model, DeepGaze III captures many relevant patterns in the human scanpath data, setting a new state of the art in the MIT300 dataset and thereby providing insight into how much information in scanpaths across observers exists in the first place. We use this insight to assess the importance of mechanisms implemented in simpler, interpretable models for fixation selection. Due to its architecture, DeepGaze III allows us to disentangle several factors that play an important role in fixation selection, such as the interplay of scene content and scanpath history. The modular nature of DeepGaze III allows us to conduct ablation studies, which show that scene content has a stronger effect on fixation selection than previous scanpath history in our main dataset. In addition, we can use the model to identify scenes for which the relative importance of these sources of information differs most. These data-driven insights would be difficult to accomplish with simpler models that do not have the computational capacity to capture such patterns, demonstrating an example of how deep learning advances can be used to contribute to scientific understanding.
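As a toy illustration of the general idea of combining scene content with scanpath history (a deliberate simplification, not the DeepGaze III network itself), one can modulate an image-based prior with a kernel centered on recent fixations:

```python
import numpy as np

def next_fixation_density(scene_prior, history, sigma=3.0):
    """Toy scanpath model: combine an image-computable prior with a
    Gaussian bias around the most recent fixations, then renormalize
    to a probability map over next-fixation locations."""
    h, w = scene_prior.shape
    rr, cc = np.mgrid[0:h, 0:w]
    history_map = np.zeros((h, w))
    for (r, c) in history[-2:]:  # weight the last two fixations most
        history_map += np.exp(-((rr - r) ** 2 + (cc - c) ** 2) / (2 * sigma ** 2))
    density = scene_prior * (1.0 + history_map)
    return density / density.sum()

rng = np.random.default_rng(2)
prior = rng.random((24, 24)); prior /= prior.sum()
p = next_fixation_density(prior, history=[(5, 5), (12, 18)])
print("most likely next fixation:", np.unravel_index(p.argmax(), p.shape))
```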
Affiliation(s)
- Thomas S A Wallis
- Technical University of Darmstadt, Institute of Psychology and Centre for Cognitive Science, Darmstadt, Germany
8
Chakraborty S, Samaras D, Zelinsky GJ. Weighting the factors affecting attention guidance during free viewing and visual search: The unexpected role of object recognition uncertainty. J Vis 2022; 22:13. PMID: 35323870. PMCID: PMC8963662. DOI: 10.1167/jov.22.4.13.
Abstract
The factors determining how attention is allocated during visual tasks have been studied for decades, but few studies have attempted to model the weighting of several of these factors within and across tasks to better understand their relative contributions. Here we consider the roles of saliency, center bias, target features, and object recognition uncertainty in predicting the first nine changes in fixation made during free viewing and visual search tasks in the OSIE and COCO-Search18 datasets, respectively. We focus on the latter-most and least familiar of these factors by proposing a new method of quantifying uncertainty in an image, one based on object recognition. We hypothesize that the greater the number of object categories competing for an object proposal, the greater the uncertainty of how that object should be recognized and, hence, the greater the need for attention to resolve this uncertainty. As expected, we found that target features best predicted target-present search, with their dominance obscuring the use of other features. Unexpectedly, we found that target features were only weakly used during target-absent search. We also found that object recognition uncertainty outperformed an unsupervised saliency model in predicting free-viewing fixations, although saliency was slightly more predictive of search. We conclude that uncertainty in object recognition, a measure that is image computable and highly interpretable, is better than bottom-up saliency in predicting attention during free viewing.
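One natural reading of the proposed measure, and the one sketched below on hypothetical classifier outputs, is the entropy of the category posterior for an object proposal: the more categories compete, the higher the uncertainty.

```python
import numpy as np

def recognition_uncertainty(category_probs):
    """Shannon entropy (bits) of a category posterior for one object
    proposal: high when many categories compete for the proposal,
    near zero when a single category dominates."""
    p = np.asarray(category_probs, dtype=float)
    p = p[p > 0]  # ignore zero-probability categories
    return float(-(p * np.log2(p)).sum())

print(recognition_uncertainty([0.97, 0.01, 0.01, 0.01]))  # ~0.24 bits: confident
print(recognition_uncertainty([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: maximal competition
```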
Affiliation(s)
- Dimitris Samaras
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Gregory J Zelinsky
- Department of Psychology, Stony Brook University, Stony Brook, NY, USA
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
9
Pedziwiatr MA, Kümmerer M, Wallis TSA, Bethge M, Teufel C. Semantic object-scene inconsistencies affect eye movements, but not in the way predicted by contextualized meaning maps. J Vis 2022; 22:9. PMID: 35171232. PMCID: PMC8857618. DOI: 10.1167/jov.22.2.9.
Abstract
Semantic information is important in eye movement control. An important semantic influence on gaze guidance relates to object-scene relationships: objects that are semantically inconsistent with the scene attract more fixations than consistent objects. One interpretation of this effect is that fixations are driven toward inconsistent objects because they are semantically more informative. We tested this explanation using contextualized meaning maps, a method based on crowd-sourced ratings to quantify the spatial distribution of context-sensitive “meaning” in images. In Experiment 1, we compared gaze data and contextualized meaning maps for images in which object-scene consistency was manipulated. Observers fixated more on inconsistent versus consistent objects. However, contextualized meaning maps did not assign higher meaning to image regions that contained semantic inconsistencies. In Experiment 2, a large number of raters evaluated image regions that were deliberately selected for their content and expected meaningfulness. The results suggest that the same scene locations were experienced as slightly less meaningful when they contained inconsistent compared to consistent objects. In summary, we demonstrated that, in the context of our rating task, semantically inconsistent objects are experienced as less meaningful than their consistent counterparts, and that contextualized meaning maps do not capture prototypical influences of image meaning on gaze guidance.
Affiliation(s)
- Marek A Pedziwiatr
- Cardiff University, Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff, UK
- Queen Mary University of London, Department of Biological and Experimental Psychology, London, UK
- Thomas S A Wallis
- Technical University of Darmstadt, Institute for Psychology and Centre for Cognitive Science, Darmstadt, Germany
- Christoph Teufel
- Cardiff University, Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff, UK
10
Avberšek LK, Zeman A, Op de Beeck H. Training for object recognition with increasing spatial frequency: A comparison of deep learning with human vision. J Vis 2021; 21:14. PMID: 34533580. PMCID: PMC8458991. DOI: 10.1167/jov.21.10.14.
Abstract
The ontogenetic development of human vision and the real-time neural processing of visual input exhibit a striking similarity—a sensitivity toward spatial frequencies that progresses in a coarse-to-fine manner. During early human development, sensitivity for higher spatial frequencies increases with age. In adulthood, when humans receive new visual input, low spatial frequencies are typically processed first before subsequent processing of higher spatial frequencies. We investigated to what extent this coarse-to-fine progression might impact visual representations in artificial vision and compared this to adult human representations. We simulated the coarse-to-fine progression of image processing in deep convolutional neural networks (CNNs) by gradually increasing spatial frequency information during training. We compared CNN performance after standard and coarse-to-fine training with a wide range of datasets from behavioral and neuroimaging experiments. In contrast to humans, CNNs that are trained using the standard protocol are very insensitive to low spatial frequency information, showing very poor performance in being able to classify such object images. By training CNNs using our coarse-to-fine method, we improved the classification accuracy of CNNs from 0% to 32% on low-pass-filtered images taken from the ImageNet dataset. The coarse-to-fine training also made the CNNs more sensitive to low spatial frequencies in hybrid images with conflicting information in different frequency bands. When comparing differently trained networks on images containing full spatial frequency information, we saw no representational differences. Overall, this integration of computational, neural, and behavioral findings shows the relevance of the exposure to and processing of inputs with variation in spatial frequency content for some aspects of high-level object representations.
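The coarse-to-fine protocol amounts to a spatial-frequency curriculum: heavily low-pass-filtered images early in training, progressively sharper images later. A minimal sketch with an assumed linear blur schedule (the exact schedule and filter parameters here are illustrative, not the authors' settings):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def coarse_to_fine_batch(images, epoch, n_epochs, sigma_max=8.0):
    """Low-pass filter a batch with a Gaussian whose width shrinks
    linearly over training, so spatial-frequency content increases
    from coarse to fine. images: (N, H, W) grayscale array."""
    sigma = sigma_max * (1.0 - epoch / (n_epochs - 1))  # sigma_max -> 0
    if sigma <= 0:
        return images  # full-bandwidth images at the end of training
    return np.stack([gaussian_filter(im, sigma) for im in images])

rng = np.random.default_rng(3)
batch = rng.random((4, 64, 64))
for epoch in (0, 5, 9):
    filtered = coarse_to_fine_batch(batch, epoch, n_epochs=10)
    print(f"epoch {epoch}: blur sigma = {8.0 * (1 - epoch / 9):.1f}, "
          f"pixel std = {filtered.std():.3f}")  # variance rises as detail returns
```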
Affiliation(s)
- Lev Kiar Avberšek
- Department of Brain and Cognition, Leuven Brain Institute, Faculty of Psychology & Educational Sciences, KU Leuven, Leuven, Belgium
- Department of Psychology, Faculty of Arts, University of Ljubljana, Ljubljana, Slovenia
- Astrid Zeman
- Department of Brain and Cognition, Leuven Brain Institute, Faculty of Psychology & Educational Sciences, KU Leuven, Leuven, Belgium
- Hans Op de Beeck
- Department of Brain and Cognition, Leuven Brain Institute, Faculty of Psychology & Educational Sciences, KU Leuven, Leuven, Belgium
11
Jagatap A, Purokayastha S, Jain H, Sridharan D. Neurally-constrained modeling of human gaze strategies in a change blindness task. PLoS Comput Biol 2021; 17:e1009322. PMID: 34428201. PMCID: PMC8478260. DOI: 10.1371/journal.pcbi.1009322.
Abstract
Despite possessing the capacity for selective attention, we often fail to notice the obvious. We investigated participants’ (n = 39) failures to detect salient changes in a change blindness experiment. Surprisingly, change detection success varied by over two-fold across participants. These variations could not be readily explained by differences in scan paths or fixated visual features. Yet, two simple gaze metrics, the mean duration of fixations and the variance of saccade amplitudes, systematically predicted change detection success. We explored the mechanistic underpinnings of these results with a neurally-constrained model based on the Bayesian framework of sequential probability ratio testing, with a posterior odds-ratio rule for shifting gaze. The model’s gaze strategies and success rates closely mimicked human data. Moreover, the model outperformed a state-of-the-art deep neural network (DeepGaze II) at predicting human gaze patterns in this change blindness task. Our mechanistic model reveals putative rational observer search strategies for change detection during change blindness, with critical real-world implications.

Our brain has the remarkable capacity to pay attention, selectively, to important objects in the world around us. Yet, sometimes, we fail spectacularly to notice even the most salient events. We tested this phenomenon in the laboratory with a change-blindness experiment, by having participants freely scan and detect changes across discontinuous image pairs. Participants varied widely in their ability to detect these changes. Surprisingly, two low-level gaze metrics, fixation durations and saccade amplitudes, strongly predicted success in this task. We present a novel, computational model of eye movements, incorporating neural constraints on stimulus encoding, that links these gaze metrics with change detection success. Our model is relevant for a mechanistic understanding of human gaze strategies in dynamic visual environments.
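The decision rule at the heart of the model, sequential evidence accumulation with a posterior-odds criterion for either reporting a change or shifting gaze, can be sketched as a generic SPRT on noisy samples at the current fixation; the paper's neural encoding constraints are omitted here:

```python
import numpy as np

def fixation_sprt(samples, prior_odds=1.0, upper=19.0, lower=1 / 19.0):
    """Accumulate log-likelihood ratios for 'change here' vs. 'no change'
    from noisy samples at the current fixation. Report a change if the
    posterior odds exceed `upper`; shift gaze if they fall below `lower`."""
    log_odds = np.log(prior_odds)
    for t, s in enumerate(samples, 1):
        # Likelihood ratio for s under change (mean 1) vs. no change
        # (mean 0) with unit-variance Gaussian noise: log LR = s - 0.5.
        log_odds += s - 0.5
        if log_odds >= np.log(upper):
            return "report change", t
        if log_odds <= np.log(lower):
            return "shift gaze", t
    return "undecided", len(samples)

rng = np.random.default_rng(4)
print(fixation_sprt(rng.normal(1.0, 1.0, 100)))  # fixating a changed region
print(fixation_sprt(rng.normal(0.0, 1.0, 100)))  # fixating an unchanged region
```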
Affiliation(s)
- Akshay Jagatap
- Centre for Neuroscience, Indian Institute of Science, Bangalore, India
- Hritik Jain
- Centre for Neuroscience, Indian Institute of Science, Bangalore, India
- Devarajan Sridharan
- Centre for Neuroscience, Indian Institute of Science, Bangalore, India
- Computer Science and Automation, Indian Institute of Science, Bangalore, India
12
Nuthmann A, Clayden AC, Fisher RB. The effect of target salience and size in visual search within naturalistic scenes under degraded vision. J Vis 2021; 21:2. PMID: 33792616. PMCID: PMC8024777. DOI: 10.1167/jov.21.4.2.
Abstract
We address two questions concerning eye guidance during visual search in naturalistic scenes. First, search has been described as a task in which visual salience is unimportant. Here, we revisit this question by using a letter-in-scene search task that minimizes any confounding effects that may arise from scene guidance. Second, we investigate how important the different regions of the visual field are for different subprocesses of search (target localization, verification). In Experiment 1, we manipulated both the salience (low vs. high) and the size (small vs. large) of the target letter (a "T"), and we implemented a foveal scotoma (radius: 1°) in half of the trials. In Experiment 2, observers searched for high- and low-salience targets either with full vision or with a central or peripheral scotoma (radius: 2.5°). In both experiments, we found main effects of salience with better performance for high-salience targets. In Experiment 1, search was faster for large than for small targets, and high salience helped more for small targets. When searching with a foveal scotoma, performance was relatively unimpaired regardless of the target's salience and size. In Experiment 2, both visual-field manipulations led to search time costs, but the peripheral scotoma was much more detrimental than the central scotoma. Peripheral vision proved to be important for target localization, and central vision for target verification. Salience affected eye movement guidance to the target in both central and peripheral vision. Collectively, the results lend support for search models that incorporate salience for predicting eye-movement behavior.
Affiliation(s)
- Antje Nuthmann
- Institute of Psychology, University of Kiel, Germany
- Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK
- http://orcid.org/0000-0003-3338-3434
- Adam C Clayden
- School of Engineering, Arts, Science and Technology, University of Suffolk, UK
- Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK
13
Abstract
Feature Integration Theory (FIT) set out the groundwork for much of the work in visual cognition since its publication. One of the most important legacies of this theory has been the emphasis on feature-specific processing. Nowadays, visual features are thought of as a sort of currency of visual attention (e.g., features can be attended, processing of attended features is enhanced), and attended features are thought to guide attention towards likely targets in a scene. Here we propose an alternative theory - the Target Contrast Signal Theory - based on the idea that when we search for a specific target, it is not the target-specific features that guide our attention towards the target; rather, what determines behavior is the result of an active comparison between the target template in mind and every element present in the scene. This comparison occurs in parallel and is aimed at rejecting from consideration items that peripheral vision can confidently reject as being non-targets. The speed at which each item is evaluated is determined by the overall contrast between that item and the target template. We present computational simulations to demonstrate the workings of the theory as well as eye-movement data that support core predictions of the theory. The theory is discussed in the context of FIT and other important theories of visual search.
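The theory's core computation, parallel rejection of items at rates set by their contrast with the target template, can be sketched as follows. Because the trial ends when the slowest rejection finishes, mean search time grows roughly logarithmically with the number of distractors; the parameters below are illustrative, not the authors' simulation code:

```python
import numpy as np

def search_rt(contrasts, base_rt=450.0, scale=120.0, rng=None):
    """Parallel rejection: each distractor's rejection time is an
    exponential draw whose mean shrinks as its contrast with the
    target template grows. The trial RT is set by the slowest
    rejection, which grows ~logarithmically with set size."""
    rng = rng or np.random.default_rng()
    rejection_times = rng.exponential(scale / np.asarray(contrasts))
    return base_rt + rejection_times.max()

rng = np.random.default_rng(5)
for n in (4, 8, 16, 32):
    rts = [search_rt(np.full(n, 2.0), rng=rng) for _ in range(2000)]
    print(f"{n:2d} distractors: mean RT ~ {np.mean(rts):.0f} ms")
```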
14
Changing perspectives on goal-directed attention control: The past, present, and future of modeling fixations during visual search. Psychology of Learning and Motivation 2020. DOI: 10.1016/bs.plm.2020.08.001.
15
Krasovskaya S, MacInnes WJ. Salience Models: A Computational Cognitive Neuroscience Review. Vision (Basel) 2019; 3:E56. PMID: 31735857. PMCID: PMC6969943. DOI: 10.3390/vision3040056.
Abstract
The seminal model by Laurent Itti and Christof Koch demonstrated that we can compute the entire flow of visual processing from input to resulting fixations. Despite many replications and follow-ups, few have matched the impact of the original model, so what made this model so groundbreaking? We have selected five key contributions that distinguish the original salience model by Itti and Koch; namely, its contribution to our theoretical, neural, and computational understanding of visual processing, as well as the spatial and temporal predictions for fixation distributions. During the last 20 years, advances in the field have brought up various techniques and approaches to salience modelling, many of which tried to improve or add to the initial Itti and Koch model. One of the most recent trends has been to adopt the computational power of deep learning neural networks; however, this has also shifted their primary focus to spatial classification. We present a review of recent approaches to modelling salience, starting from direct variations of the Itti and Koch salience model to sophisticated deep-learning architectures, and discuss the models from the point of view of their contribution to computational cognitive neuroscience.
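The core of the Itti and Koch model is center-surround contrast, computed per feature channel at several scales and summed into a single salience map. A compact sketch using a difference-of-Gaussians operator in place of the full image pyramid (our simplification, not the original implementation):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def salience_map(feature_maps, scales=(2, 4, 8)):
    """Itti-and-Koch-style conspicuity: for each feature channel,
    take center-surround differences (difference of Gaussians) at
    several scales, rectify, and sum everything into one map."""
    total = np.zeros_like(feature_maps[0])
    for fm in feature_maps:  # e.g. intensity, red-green, blue-yellow channels
        for s in scales:
            cs = gaussian_filter(fm, s) - gaussian_filter(fm, 2 * s)
            total += np.maximum(cs, 0.0)  # keep center-exceeds-surround responses
    return total / total.max()

rng = np.random.default_rng(6)
intensity = rng.random((64, 64))
intensity[30:34, 30:34] += 2.0  # a locally conspicuous patch
smap = salience_map([intensity])
print("peak salience at:", np.unravel_index(smap.argmax(), smap.shape))
```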
Affiliation(s)
- Sofia Krasovskaya
- Vision Modelling Laboratory, Faculty of Social Science, National Research University Higher School of Economics, 101000 Moscow, Russia
- School of Psychology, National Research University Higher School of Economics, 101000 Moscow, Russia
- W. Joseph MacInnes
- Vision Modelling Laboratory, Faculty of Social Science, National Research University Higher School of Economics, 101000 Moscow, Russia
- School of Psychology, National Research University Higher School of Economics, 101000 Moscow, Russia
16
Võ MLH, Boettcher SEP, Draschkow D. Reading scenes: how scene grammar guides attention and aids perception in real-world environments. Curr Opin Psychol 2019; 29:205-210. DOI: 10.1016/j.copsyc.2019.03.009.
17
Lowe KA, Reppert TR, Schall JD. Selective Influence and Sequential Operations: A Research Strategy for Visual Search. Visual Cognition 2019; 27:387-415. PMID: 32982561. PMCID: PMC7518653. DOI: 10.1080/13506285.2019.1659896.
Abstract
We discuss the problem of elucidating mechanisms of visual search. We begin by considering the history, logic, and methods of relating behavioral or cognitive processes with neural processes. We then survey briefly the cognitive neurophysiology of visual search and essential aspects of the neural circuitry supporting this capacity. We introduce conceptually and empirically a powerful but underutilized experimental approach to dissect the cognitive processes supporting performance of a visual search task with factorial manipulations of singleton-distractor identifiability and stimulus-response cue discriminability. We show that systems factorial technology can distinguish processing architectures from the performance of macaque monkeys. This demonstration offers new opportunities to distinguish neural mechanisms through selective manipulation of visual encoding, search selection, rule encoding, and stimulus-response mapping.
Affiliation(s)
- Kaleb A Lowe
- Department of Psychology, Vanderbilt University, Center for Integrative and Cognitive Neuroscience, Vanderbilt Vision Research Center
- Thomas R Reppert
- Department of Psychology, Vanderbilt University, Center for Integrative and Cognitive Neuroscience, Vanderbilt Vision Research Center
- Jeffrey D Schall
- Department of Psychology, Vanderbilt University, Center for Integrative and Cognitive Neuroscience, Vanderbilt Vision Research Center
18
Yu CP, Liu H, Samaras D, Zelinsky GJ. Modelling attention control using a convolutional neural network designed after the ventral visual pathway. Visual Cognition 2019. DOI: 10.1080/13506285.2019.1661927.
Affiliation(s)
- Chen-Ping Yu
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Huidong Liu
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Dimitrios Samaras
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Gregory J. Zelinsky
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Department of Psychology, Stony Brook University, Stony Brook, NY, USA
19
Alexander RG, Nahvi RJ, Zelinsky GJ. Specifying the precision of guiding features for visual search. J Exp Psychol Hum Percept Perform 2019; 45:1248-1264. PMID: 31219282. PMCID: PMC6706321. DOI: 10.1037/xhp0000668.
Abstract
Visual search is the task of finding things with uncertain locations. Despite decades of research, the features that guide visual search remain poorly specified, especially in realistic contexts. This study tested the role of two features, shape and orientation, both in the presence and absence of hue information. We conducted five experiments to describe preview-target mismatch effects: decreases in performance caused by differences between the image of the target as it appears in the preview and as it appears in the actual search display. These mismatch effects provide direct measures of feature importance, with larger performance decrements expected for more important features. Contrary to previous conclusions, our data suggest that shape and orientation only guide visual search when color is not available. By varying the probability of mismatch in each feature dimension, we also show that these patterns of feature guidance do not change with the probability that the previewed feature will be invalid. We conclude that the target representations used to guide visual search are much less precise than previously believed, with participants encoding and using color and little else.
20
Albrengues C, Lavigne F, Aguilar C, Castet E, Vitu F. Linguistic processes do not beat visuo-motor constraints, but they modulate where the eyes move regardless of word boundaries: Evidence against top-down word-based eye-movement control during reading. PLoS One 2019; 14:e0219666. PMID: 31329614. PMCID: PMC6645505. DOI: 10.1371/journal.pone.0219666.
Abstract
Where readers move their eyes, while proceeding forward along lines of text, has long been assumed to be determined in a top-down word-based manner. According to this classical view, readers of alphabetic languages would invariably program their saccades towards the center of peripheral target words, as selected based on the (expected) needs of ongoing (word-identification) processing, and the variability in within-word landing positions would exclusively result from systematic and random errors. Here we put this predominant hypothesis to a strong test by estimating the respective influences of language-related variables (word frequency and word predictability) and lower-level visuo-motor factors (word length and saccadic launch-site distance to the beginning of words) on both word-skipping likelihood and within-word landing positions. Our eye-movement data were collected while forty participants read 316 pairs of sentences, that differed only by one word, the prime; this was either semantically related or unrelated to a following test word of variable frequency and length. We found that low-level visuo-motor variables largely predominated in determining which word would be fixated next, and where in a word the eye would land. In comparison, language-related variables only had tiny influences. Yet, linguistic variables affected both the likelihood of word skipping and within-word initial landing positions, all depending on the words’ length and how far on average the eye landed from the word boundaries, provided the word could benefit from peripheral preview. These findings provide a strong case against the predominant word-based account of eye-movement guidance during reading, by showing that saccades are primarily driven by low-level visuo-motor processes, regardless of word boundaries, while being overall subject to subtle, one-off, language-based modulations. Our results also suggest that overall distributions of saccades’ landing positions, instead of truncated within-word landing-site distributions, should be used for a better understanding of eye-movement guidance during reading.
Affiliation(s)
- Claire Albrengues
- Université Côte d’Azur, CNRS, BCL (Bases, Corpus, Langage), Nice, France
- Frédéric Lavigne
- Université Côte d’Azur, CNRS, BCL (Bases, Corpus, Langage), Nice, France
- Eric Castet
- Aix-Marseille Univ, CNRS, LPC (Laboratoire de Psychologie Cognitive), Fédération de Recherche 3C, Marseille, France
- Françoise Vitu
- Aix-Marseille Univ, CNRS, LPC (Laboratoire de Psychologie Cognitive), Fédération de Recherche 3C, Marseille, France
21
Abstract
After being exposed to visual input in the first year of life, the brain experiences subtle but massive changes that are apparently crucial for communicative/emotional and social human development. The lack of this input could explain the very high prevalence of autism in children with total congenital blindness. The present theory postulates that the superior colliculus (SC) is the key structure for such changes, for several reasons: it dominates visual behavior during the first months of life; it is ready at birth for complex visual tasks; it has a significant influence on several hemispheric regions; it is the main brain hub that permanently integrates visual and non-visual, external and internal information (bottom-up and top-down, respectively); and it possesses the enigmatic ability to take non-conscious decisions about where to focus attention. It is also a sentinel that triggers the subcortical mechanisms which drive social motivation to follow faces from birth and to react automatically to emotional stimuli. Through indirect connections it also simultaneously activates several cortical structures necessary to develop social cognition and to accomplish the multiattentional task required for conscious social interaction in real-life settings. Genetic or non-genetic prenatal or early postnatal factors could disrupt SC functions, resulting in autism. The timing of postnatal biological disruption matches the timing of clinical autism manifestations. Astonishing coincidences between etiologies, clinical manifestations, and cognitive and pathogenic autism theories on one side and SC functions on the other are disclosed in this review. Although the visual system dependent on the SC is usually considered an accessory to the canonical LGN pathway, its imprinting gives the brain qualitatively specific functions not supplied by any other brain structure.
Affiliation(s)
- Rubin Jure
- Centro Privado de Neurología y Neuropsicología Infanto Juvenil WERNICKE, Córdoba, Argentina
22
Shiferaw B, Downey L, Crewther D. A review of gaze entropy as a measure of visual scanning efficiency. Neurosci Biobehav Rev 2019; 96:353-366. DOI: 10.1016/j.neubiorev.2018.12.007.
23
Berga D, Fdez-Vidal XR, Otazu X, Leborán V, Pardo XM. Psychophysical evaluation of individual low-level feature influences on visual attention. Vision Res 2018; 154:60-79. PMID: 30408434. DOI: 10.1016/j.visres.2018.10.006.
Abstract
In this study we analyze eye movement behavior elicited by low-level feature distinctiveness, using a dataset of synthetically generated image patterns. The design of the visual stimuli was inspired by those used in previous psychophysical experiments, namely free-viewing and visual-search tasks, providing a total of 15 types of stimuli divided according to the task and the feature to be analyzed. Our interest is to analyze the influence of low-level feature contrast between a salient region and the remaining distractors, reporting fixation localization characteristics and the reaction time of landing inside the salient region. Eye-tracking data were collected from 34 participants during the viewing of a 230-image dataset. Results show that saliency is predominantly and distinctively influenced by: 1. feature type, 2. feature contrast, 3. temporality of fixations, 4. task difficulty and 5. center bias. This experimentation proposes a new psychophysical basis for saliency model evaluation using synthetic images.
Affiliation(s)
- David Berga
- Computer Vision Center, Universitat Autonoma de Barcelona, Spain
- Computer Science Department, Universitat Autonoma de Barcelona, Spain
- Xosé R Fdez-Vidal
- Centro de Investigacion en Tecnoloxias da Informacion, Universidade Santiago de Compostela, Spain
- Xavier Otazu
- Computer Vision Center, Universitat Autonoma de Barcelona, Spain
- Computer Science Department, Universitat Autonoma de Barcelona, Spain
- Víctor Leborán
- Centro de Investigacion en Tecnoloxias da Informacion, Universidade Santiago de Compostela, Spain
- Xosé M Pardo
- Centro de Investigacion en Tecnoloxias da Informacion, Universidade Santiago de Compostela, Spain
24
Alexander RG, Zelinsky GJ. Occluded information is restored at preview but not during visual search. J Vis 2018; 18:4. PMID: 30347091. PMCID: PMC6181188. DOI: 10.1167/18.11.4.
Abstract
Objects often appear with some amount of occlusion. We fill in missing information using local shape features even before attending to those objects, a process called amodal completion. Here we explore the possibility that knowledge about common realistic objects can be used to "restore" missing information even in cases where amodal completion is not expected. We systematically varied whether visual search targets were occluded or not, both at preview and in search displays. Button-press responses were longest when the preview was unoccluded and the target was occluded in the search display. This pattern is consistent with a target-verification process that uses the features visible at preview but does not restore missing information in the search display. However, visual search guidance was weakest whenever the target was occluded in the search display, regardless of whether it was occluded at preview. This pattern suggests that information missing during the preview was restored and used to guide search, thereby resulting in a feature mismatch and poor guidance. If this process were preattentive, as with amodal completion, we should have found roughly equivalent search guidance across all conditions because the target would always be unoccluded or restored, resulting in no mismatch. We conclude that realistic objects are restored behind occluders during search target preview, even in situations not prone to amodal completion, and this restoration does not occur preattentively during search.
Affiliation(s)
- Gregory J Zelinsky
- Department of Psychology, Stony Brook University, Stony Brook, NY, USA
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
25
Saccadic inhibition interrupts ongoing oculomotor activity to enable the rapid deployment of alternate movement plans. Sci Rep 2018; 8:14163. PMID: 30242249. PMCID: PMC6155112. DOI: 10.1038/s41598-018-32224-5.
Abstract
Diverse psychophysical and neurophysiological results show that oculomotor networks are continuously active, such that plans for making the next eye movement are always ongoing. So, when new visual information arrives unexpectedly, how are those plans affected? At what point can the new information start guiding an eye movement, and how? Here, based on modeling and simulation results, we make two observations that are relevant to these questions. First, we note that many experiments, including those investigating the phenomenon known as "saccadic inhibition", are consistent with the idea that sudden-onset stimuli briefly interrupt the gradual rise in neural activity associated with the preparation of an impending saccade. And second, we show that this stimulus-driven interruption is functionally adaptive, but only if perception is fast. In that case, putting on hold an ongoing saccade plan toward location A allows the oculomotor system to initiate a concurrent, alternative plan toward location B (where a stimulus just appeared), deliberate (briefly) on the priority of each target, and determine which plan should continue. Based on physiological data, we estimate that the advantage of this strategy, relative to one in which any plan once initiated must be completed, is of several tens of milliseconds per saccade.
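The strategy described here, briefly pausing an ongoing rise-to-threshold plan so that a competing plan toward the new stimulus can start and the two can be compared, can be sketched with two linear accumulators (parameter values are illustrative only, not estimates from the paper):

```python
def interrupted_race(rate_a=1.0, rate_b=4.0, onset=150, pause=40,
                     threshold=200.0, dt=1.0):
    """Plan A ramps toward threshold; at `onset` a new stimulus briefly
    halts A (saccadic inhibition) while plan B starts. Whichever plan
    reaches threshold first determines the saccade goal and latency."""
    a = b = 0.0
    t = 0.0
    while True:
        t += dt
        paused = onset <= t < onset + pause  # transient interruption of plan A
        if not paused:
            a += rate_a * dt
        if t >= onset:
            b += rate_b * dt
        if a >= threshold:
            return "A", t
        if b >= threshold:
            return "B", t

print(interrupted_race())             # high-priority B wins despite its late onset
print(interrupted_race(rate_b=0.6))   # A resumes and wins when B is low priority
```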
26
Wollenberg L, Deubel H, Szinte M. Visual attention is not deployed at the endpoint of averaging saccades. PLoS Biol 2018; 16:e2006548. PMID: 29939986. PMCID: PMC6034887. DOI: 10.1371/journal.pbio.2006548.
Abstract
The premotor theory of attention postulates that spatial attention arises from the activation of saccade areas and that the deployment of attention is the consequence of motor programming. Yet attentional and oculomotor processes have been shown to be dissociable at the neuronal level in covert attention tasks. To investigate a potential dissociation at the behavioral level, we instructed human participants to move their eyes (saccade) towards 1 of 2 nearby, competing saccade targets. The spatial distribution of visual attention was determined using oriented visual stimuli presented either at the target locations, between them, or at several other equidistant locations. Results demonstrate that accurate saccades towards one of the targets were associated with presaccadic enhancement of visual sensitivity at the respective saccade endpoint compared to the nonsaccaded target location. In contrast, averaging saccades, landing between the 2 targets, were not associated with attentional facilitation at the saccade endpoint. Rather, attention before averaging saccades was equally deployed at the 2 target locations. Taken together, our results reveal that visual attention is not obligatorily coupled to the endpoint of a subsequent saccade. Rather, our results suggest that the oculomotor program depends on the state of attentional selection before saccade onset and that saccade averaging arises from unresolved attentional selection.

The premotor theory of attention postulates that spatial visual attention is a consequence of the brain activity that controls eye movement. Indeed, attention and eye movement share overlapping brain networks, and attention is deployed at the target of an eye movement (saccade) even before the eyes start to move. But is attention always deployed at the endpoint of saccades? Here, we measured visual attention before accurate saccades and before saccades that landed in between 2 targets (averaging saccades). While accurate saccades were associated with a selective enhancement of visual sensitivity at their endpoint, no such enhancement was found at the endpoint of averaging saccades. Rather, visual sensitivity was evenly distributed across the 2 saccade targets, suggesting that saccade averaging arises from unresolved attentional selection. Overall, our results reveal that attention is not always coupled to the endpoint of saccades, arguing against a simplistic view of the premotor theory of attention at the behavioral level. Instead, we propose that saccadic responses depend on the state of attentional selection at saccade onset.
Affiliation(s)
- Luca Wollenberg
- Allgemeine und Experimentelle Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
- Graduate School of Systemic Neurosciences, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Heiner Deubel
- Allgemeine und Experimentelle Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
- Martin Szinte
- Allgemeine und Experimentelle Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
- Department of Cognitive Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
27
Kümmerer M, Wallis TSA, Bethge M. Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics. In: Computer Vision – ECCV 2018. DOI: 10.1007/978-3-030-01270-0_47.