201
Ramkumar P, Fernandes H, Kording K, Segraves M. Modeling peripheral visual acuity enables discovery of gaze strategies at multiple time scales during natural scene search. J Vis 2015; 15:19. PMID: 25814545; PMCID: PMC4374760; DOI: 10.1167/15.3.19.
Abstract
Like humans, monkeys make saccades nearly three times a second. To understand the factors guiding this frequent decision, computational models of vision attempt to predict fixation locations using bottom-up visual features and top-down goals. How do the relative influences of these factors evolve over multiple time scales? Here we analyzed visual features at fixations using a retinal transform that provides realistic visual acuity by suitably degrading visual information in the periphery. In a task in which monkeys searched for a Gabor target in natural scenes, we characterized the relative importance of bottom-up and task-relevant influences by decoding fixated from nonfixated image patches based on visual features. At fast time scales, we found that search strategies can vary over the course of a single trial, with locations of higher saliency, target similarity, edge energy, and orientedness fixated later in the trial. At slow time scales, we found that search strategies can be refined over several weeks of practice, and the influence of target orientation was significant only in the latter of two search tasks. Critically, these results were not observed without applying the retinal transform. Our results suggest that saccade-guidance strategies become apparent only when models take into account the degraded visual representation in the periphery.
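A retinal transform of the kind described can be approximated in a few lines. The sketch below is not the authors' implementation: it simply blends a sharp image with a box-blurred copy, weighting the blurred copy more at larger eccentricities; the `half_res_ecc` falloff parameter is a hypothetical choice.

```python
import numpy as np

def retinal_transform(img, fix_r, fix_c, half_res_ecc=8.0):
    """Crude foveation: keep the image sharp at the fixation point and
    increasingly blurred with eccentricity (distance from fixation)."""
    # Heavy blur via repeated 3x3 box filtering (edge-padded).
    blurred = img.astype(float)
    for _ in range(5):
        p = np.pad(blurred, 1, mode="edge")
        blurred = sum(p[i:i + img.shape[0], j:j + img.shape[1]]
                      for i in range(3) for j in range(3)) / 9.0
    rows, cols = np.indices(img.shape)
    ecc = np.hypot(rows - fix_r, cols - fix_c)
    acuity = 1.0 / (1.0 + ecc / half_res_ecc)  # 1 at the fovea, falls off outward
    return acuity * img + (1.0 - acuity) * blurred
```

Applied to a high-frequency pattern, the output stays crisp at the fixated pixel and washes out toward the corners, which is the property the decoding analysis relies on.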
Affiliation(s)
- Pavan Ramkumar
- Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA
- Department of Neurobiology, Northwestern University, Evanston, IL, USA
- Hugo Fernandes
- Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
- Konrad Kording
- Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA
- Mark Segraves
- Department of Neurobiology, Northwestern University, Evanston, IL, USA
202
What you see is what you expect: rapid scene understanding benefits from prior experience. Atten Percept Psychophys 2015; 77:1239-51. DOI: 10.3758/s13414-015-0859-8.
203
Detection of bird nests during mechanical weeding by incremental background modeling and visual saliency. Sensors 2015; 15:5096-111. PMID: 25738766; PMCID: PMC4435188; DOI: 10.3390/s150305096.
Abstract
Mechanical weeding is an important tool in organic farming. Its use in conventional agriculture is also increasing, due to public demand to lower pesticide use and a growing number of pesticide-resistant weeds. Ground-nesting birds are highly susceptible to farming operations like mechanical weeding, which may destroy nests and reduce the survival of chicks and incubating females. This problem has received little attention in agricultural engineering, but as the number of machines increases, the destruction of nests will affect various species. It is therefore necessary to explore and develop new technology to avoid these negative ethical consequences. This paper presents a vision-based approach to automated ground-nest detection. The algorithm is based on the fusion of visual saliency, which mimics human attention, and incremental background modeling, which enables foreground detection with moving cameras. The algorithm achieves a good detection rate, detecting 28 of 30 nests at an average distance of 3.8 m, with a true positive rate of 0.75.
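The fusion the abstract describes can be sketched as follows. The running-average background update, the thresholds, and the equal weighting are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Incremental (running-average) background model: bg slowly tracks
    the soil/crop appearance as the camera moves."""
    return (1 - alpha) * bg + alpha * frame

def detect(frame, bg, saliency, fg_thresh=0.2, w=0.5):
    """Fuse foreground evidence |frame - bg| with a precomputed saliency
    map; a location is flagged when the combined score passes 0.5."""
    foreground = np.abs(frame - bg) > fg_thresh
    score = w * foreground + (1 - w) * saliency
    return score > 0.5
```

In use, `update_background` runs every frame while `detect` flags regions that are both novel against the background and visually salient, mirroring the two cues the paper combines.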
204
Nuthmann A, Einhäuser W. A new approach to modeling the influence of image features on fixation selection in scenes. Ann N Y Acad Sci 2015; 1339:82-96. PMID: 25752239; PMCID: PMC4402003; DOI: 10.1111/nyas.12705.
Abstract
Which image characteristics predict where people fixate when memorizing natural images? To answer this question, we introduce a new analysis approach that combines a novel scene-patch analysis with generalized linear mixed models (GLMMs). Our method allows for (1) directly describing the relationship between continuous feature value and fixation probability, and (2) assessing each feature's unique contribution to fixation selection. To demonstrate this method, we estimated the relative contribution of various image features to fixation selection: luminance and luminance contrast (low-level features); edge density (a mid-level feature); and visual clutter and image segmentation to approximate local object density in the scene (higher-level features). An additional predictor captured the central bias of fixation. The GLMM results revealed that edge density, clutter, and the number of homogeneous segments in a patch can independently predict whether image patches are fixated or not. Importantly, neither luminance nor contrast had an independent effect above and beyond what could be accounted for by the other predictors. Since the parcellation of the scene and the selection of features can be tailored to the specific research question, our approach allows for assessing the interplay of various factors relevant for fixation selection in scenes in a powerful and flexible manner.
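As a simplified illustration of the patch-level analysis (dropping the random-effects terms that make the authors' model a GLMM), one can fit a plain logistic regression of fixated versus nonfixated patches on per-patch feature values:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fixed-effects-only sketch: logistic regression of fixated (1) vs
    nonfixated (0) patches on features such as edge density or clutter.
    A full GLMM would add random intercepts per subject and per scene."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted fixation probability
        w += lr * X.T @ (y - p) / len(y)       # gradient ascent on log-likelihood
    return w
```

The fitted slope directly describes how fixation probability rises with the continuous feature value, which is point (1) of the method above.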
Affiliation(s)
- Antje Nuthmann
- Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, United Kingdom
205
206
Attentional Scene-Exploration and Object Discovery in Image and RGB-D Data. Künstliche Intelligenz 2015. DOI: 10.1007/s13218-014-0337-9.
207
Simola J, Le Fevre K, Torniainen J, Baccino T. Affective processing in natural scene viewing: Valence and arousal interactions in eye-fixation-related potentials. Neuroimage 2015; 106:21-33. DOI: 10.1016/j.neuroimage.2014.11.030.
208
Overt attention in natural scenes: Objects dominate features. Vision Res 2015; 107:36-48. DOI: 10.1016/j.visres.2014.11.006.
209
210
Odic D, Halberda J. Eye movements reveal distinct encoding patterns for number and cumulative surface area in random dot arrays. J Vis 2015; 15:5. PMID: 26575191; PMCID: PMC4654224; DOI: 10.1167/15.15.5.
Abstract
Humans can quickly and intuitively represent the number of objects in a scene using visual evidence through the Approximate Number System (ANS). But the computations that support the encoding of visual number (the transformation from the retinal input into ANS representations) remain controversial. Two types of number-encoding theories have been proposed: those arguing that number is encoded through a dedicated enumeration computation, and those arguing that visual number is inferred from non-number-specific visual features, such as surface area, density, convex hull, etc. Here, we attempt to adjudicate between these two theories by testing participants on both a number and a cumulative area task while also tracking their eye movements. We hypothesize that if approximate number and surface area depend on distinct encoding computations, saccadic signatures should be distinct for the two tasks, even if the visual stimuli are identical. Consistent with this hypothesis, we find that discriminating number versus cumulative area modulates both where participants look (i.e., participants spend more time looking at the more numerous set in the number task and the larger set in the cumulative area task) and how participants look (i.e., cumulative area encoding shows fewer, longer saccades, while number encoding shows many short saccades and many switches between targets). We further identify several saccadic signatures that are associated with task difficulty and with correct versus incorrect trials for both dimensions. These results suggest distinct encoding algorithms for number and cumulative area extraction, and thereby distinct representations of these dimensions.
211
Kubilius J, Wagemans J, Op de Beeck HP. A conceptual framework of computations in mid-level vision. Front Comput Neurosci 2014; 8:158. PMID: 25566044; PMCID: PMC4264474; DOI: 10.3389/fncom.2014.00158.
Abstract
If a picture is worth a thousand words, as the English idiom goes, what should those words (or, rather, descriptors) capture? What format of image representation would be sufficiently rich if we were to reconstruct the essence of images from their descriptors? In this paper, we set out to develop a conceptual framework that would be: (i) biologically plausible, in order to provide a better mechanistic understanding of our visual system; (ii) sufficiently robust to apply in practice to realistic images; and (iii) able to tap into the underlying structure of our visual world. We bring forward three key ideas. First, we argue that surface-based representations are constructed based on feature inference from the input in the intermediate processing layers of the visual system. Such representations are computed in a largely pre-semantic (prior to categorization) and pre-attentive manner using multiple cues (orientation, color, polarity, variation in orientation, and so on), and explicitly retain configural relations between features. The constructed surfaces may be partially overlapping to compensate for occlusions and are ordered in depth (figure-ground organization). Second, we propose that such intermediate representations could be formed by a hierarchical computation of similarity between features in local image patches and pooling of highly similar units, and re-estimated via recurrent loops according to the task demands. Finally, we suggest using datasets composed of realistically rendered artificial objects and surfaces in order to better understand a model's behavior and its limitations.
Affiliation(s)
- Jonas Kubilius
- Laboratory of Biological Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Laboratory of Experimental Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Johan Wagemans
- Laboratory of Experimental Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Hans P. Op de Beeck
- Laboratory of Biological Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
212
Gao F, Zhang Y, Wang J, Sun J, Yang E, Hussain A. Visual Attention Model Based Vehicle Target Detection in Synthetic Aperture Radar Images: A Novel Approach. Cognit Comput 2014. DOI: 10.1007/s12559-014-9312-x.
213
Fernandes HL, Stevenson IH, Phillips AN, Segraves MA, Kording KP. Saliency and saccade encoding in the frontal eye field during natural scene search. Cereb Cortex 2014; 24:3232-45. PMID: 23863686; PMCID: PMC4240184; DOI: 10.1093/cercor/bht179.
Abstract
The frontal eye field (FEF) plays a central role in saccade selection and execution. Using artificial stimuli, many studies have shown that the activity of neurons in the FEF is affected by both visually salient stimuli in a neuron's receptive field and upcoming saccades in a certain direction. However, the extent to which visual and motor information is represented in the FEF in the context of the cluttered natural scenes we encounter during everyday life has not been explored. Here, we model the activities of neurons in the FEF, recorded while monkeys were searching natural scenes, using both visual and saccade information. We compare the contribution of bottom-up visual saliency (based on low-level features such as brightness, orientation, and color) and saccade direction. We find that, while saliency is correlated with the activities of some neurons, this relationship is ultimately driven by activities related to movement. Although bottom-up visual saliency contributes to the choice of saccade targets, it does not appear that FEF neurons actively encode the kind of saliency posited by popular saliency map theories. Instead, our results emphasize the FEF's role in the stages of saccade planning directly related to movement generation.
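The encoding comparison described above can be sketched as a Poisson GLM of per-fixation spike counts on saliency and saccade-direction covariates. This is a generic stand-in, not the authors' exact model; the covariate construction via cos/sin direction tuning is an assumption.

```python
import numpy as np

def design(saliency, saccade_dir):
    """Covariates per fixation: patch saliency plus direction tuning
    captured by cos/sin of the upcoming saccade direction (radians)."""
    return np.column_stack([saliency, np.cos(saccade_dir), np.sin(saccade_dir)])

def fit_poisson_glm(X, y, lr=0.05, steps=3000):
    """Poisson regression of spike counts y on covariates X by gradient
    ascent on the log-likelihood (log-link, clipped for stability)."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        rate = np.exp(np.clip(X @ w, -10, 10))
        w += lr * X.T @ (y - rate) / len(y)
    return w
```

Comparing the fitted saliency weight against the direction-tuning weights is one way to ask, as the paper does, whether apparent saliency coding survives once movement covariates are in the model.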
Affiliation(s)
- Hugo L. Fernandes
- Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL 60611, USA
- PDBC, Instituto Gulbenkian de Ciência, 2780 Oeiras, Portugal
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, 2780 Oeiras, Portugal
- Ian H. Stevenson
- Redwood Center for Theoretical Neuroscience, University of California, Berkeley, CA 94720, USA
- Adam N. Phillips
- Tamagawa University, Brain Science Institute, Machida 194-8610, Japan
- Department of Neurobiology, Northwestern University, Evanston, IL 60208, USA
- Mark A. Segraves
- Department of Neurobiology, Northwestern University, Evanston, IL 60208, USA
- Konrad P. Kording
- Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL 60611, USA
- Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL 60208, USA
- Department of Physiology, Northwestern University, Chicago, IL 60611, USA
214
Caroux L, Le Bigot L, Vibert N. Impairment of shooting performance by background complexity and motion. Exp Psychol 2014; 62:98-109. PMID: 25384639; DOI: 10.1027/1618-3169/a000277.
Abstract
In many visual displays such as virtual environments, human tasks involve objects superimposed on both complex and moving backgrounds. However, most studies investigated the influence of background complexity or background motion in isolation. Two experiments were designed to investigate the joint influences of background complexity and lateral motion on a simple shooting task typical of video games. Participants had to perform the task on the moving and static versions of backgrounds of three levels of complexity, while their eye movements were recorded. The backgrounds displayed either an abstract (Experiment 1) or a naturalistic (Experiment 2) virtual environment. The results showed that performance was impaired by background motion in both experiments. The effects of motion and complexity were additive for the abstract background and multiplicative for the naturalistic background. Eye movement recordings showed that performance impairments reflected at least in part the impact of the background visual features on gaze control.
Affiliation(s)
- Loïc Caroux
- Centre de Recherches sur la Cognition et l'Apprentissage (CeRCA), UMR 7295, University of Poitiers, University of Tours, CNRS, France; INRIA Bordeaux Sud-Ouest, Talence, France
- Ludovic Le Bigot
- Centre de Recherches sur la Cognition et l'Apprentissage (CeRCA), UMR 7295, University of Poitiers, University of Tours, CNRS, France
- Nicolas Vibert
- Centre de Recherches sur la Cognition et l'Apprentissage (CeRCA), UMR 7295, University of Poitiers, University of Tours, CNRS, France
215
Zhong M, Zhao X, Zou XC, Wang JZ, Wang W. Markov chain based computational visual attention model that learns from eye tracking data. Pattern Recognit Lett 2014. DOI: 10.1016/j.patrec.2014.06.002.
216
217
Mohr J, Park JH, Obermayer K. A computer vision system for rapid search inspired by surface-based attention mechanisms from human perception. Neural Netw 2014; 60:182-93. PMID: 25241349; DOI: 10.1016/j.neunet.2014.08.010.
Abstract
Humans are highly efficient at visual search tasks by focusing selective attention on a small but relevant region of a visual scene. Recent results from biological vision suggest that surfaces of distinct physical objects form the basic units of this attentional process. The aim of this paper is to demonstrate how such surface-based attention mechanisms can speed up a computer vision system for visual search. The system uses fast perceptual grouping of depth cues to represent the visual world at the level of surfaces. This representation is stored in short-term memory and updated over time. A top-down guided attention mechanism sequentially selects one of the surfaces for detailed inspection by a recognition module. We show that the proposed attention framework requires little computational overhead (about 11 ms), but enables the system to operate in real-time and leads to a substantial increase in search efficiency.
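The top-down selection step over surfaces in short-term memory can be sketched as below. The `Surface` fields and the depth-preferring scoring function in the usage are illustrative assumptions, not the paper's data structures.

```python
from dataclasses import dataclass

@dataclass
class Surface:
    sid: int            # surface identity in short-term memory
    depth: float        # mean depth of the perceptually grouped surface
    inspected: bool = False

def next_surface(memory, goal_score):
    """Top-down guided attention: pick the highest-scoring surface not
    yet inspected, mark it, and hand it to the recognition module."""
    candidates = [s for s in memory if not s.inspected]
    if not candidates:
        return None
    best = max(candidates, key=goal_score)
    best.inspected = True
    return best
```

Because selection operates over a handful of surfaces rather than every pixel, this loop is what keeps the per-frame attention overhead small, consistent with the ~11 ms figure reported above.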
Affiliation(s)
- Johannes Mohr
- Department for Electrical Engineering and Computer Science, Technische Universität Berlin, MAR 5-6, Marchstr. 23, D-10587 Berlin, Germany
- Jong-Han Park
- Department for Electrical Engineering and Computer Science, Technische Universität Berlin, MAR 5-6, Marchstr. 23, D-10587 Berlin, Germany
- Klaus Obermayer
- Department for Electrical Engineering and Computer Science, Technische Universität Berlin, MAR 5-6, Marchstr. 23, D-10587 Berlin, Germany
218
Han S, Vasconcelos N. Object recognition with hierarchical discriminant saliency networks. Front Comput Neurosci 2014; 8:109. PMID: 25249971; PMCID: PMC4158795; DOI: 10.3389/fncom.2014.00109.
Abstract
The benefits of integrating attention and object recognition are investigated. While attention is frequently modeled as a pre-processor for recognition, we investigate the hypothesis that attention is an intrinsic component of recognition and vice-versa. This hypothesis is tested with a recognition model, the hierarchical discriminant saliency network (HDSN), whose layers are top-down saliency detectors, tuned for a visual class according to the principles of discriminant saliency. As a model of neural computation, the HDSN has two possible implementations. In a biologically plausible implementation, all layers comply with the standard neurophysiological model of visual cortex, with sub-layers of simple and complex units that implement a combination of filtering, divisive normalization, pooling, and non-linearities. In a convolutional neural network implementation, all layers are convolutional and implement a combination of filtering, rectification, and pooling. The rectification is performed with a parametric extension of the now popular rectified linear units (ReLUs), whose parameters can be tuned for the detection of target object classes. This enables a number of functional enhancements over neural network models that lack a connection to saliency, including optimal feature denoising mechanisms for recognition, modulation of saliency responses by the discriminant power of the underlying features, and the ability to detect both feature presence and absence. In either implementation, each layer has a precise statistical interpretation, and all parameters are tuned by statistical learning. Each saliency detection layer learns more discriminant saliency templates than its predecessors and higher layers have larger pooling fields. This enables the HDSN to simultaneously achieve high selectivity to target object classes and invariance. 
The performance of the network in saliency and object recognition tasks is compared to those of models from the biological and computer vision literatures. This demonstrates benefits for all the functional enhancements of the HDSN, the class tuning inherent to discriminant saliency, and saliency layers based on templates of increasing target selectivity and invariance. Altogether, these experiments suggest that there are non-trivial benefits in integrating attention and recognition.
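The parametric rectification mentioned above can be illustrated with a thresholded linear unit, where a tunable parameter shifts the cutoff so that weak, non-discriminant responses are suppressed. The exact parameterization used in the HDSN may differ; this is a minimal sketch.

```python
import numpy as np

def param_relu(x, theta=0.0):
    """ReLU with a learnable threshold theta: responses below theta are
    zeroed (theta = 0 recovers the standard rectified linear unit)."""
    return np.maximum(x - theta, 0.0)
```

Tuning `theta` per feature channel is one way such a unit could implement the feature-denoising behavior described, passing only responses that exceed a class-dependent level.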
Affiliation(s)
- Sunhyoung Han
- Analytics Department, ID Analytics, San Diego, CA, USA
- Nuno Vasconcelos
- Statistical and Visual Computing Lab, Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
219
Marfil R, Palomino AJ, Bandera A. Combining segmentation and attention: a new foveal attention model. Front Comput Neurosci 2014; 8:96. PMID: 25177289; PMCID: PMC4132578; DOI: 10.3389/fncom.2014.00096.
Abstract
Artificial vision systems cannot process all the information they receive from the world in real time, because doing so is highly expensive and inefficient in terms of computational cost. Inspired by biological perception systems, artificial attention models seek to select only the relevant parts of the scene. In human vision, it is also well established that these units of attention are not merely spatial but closely related to perceptual objects (proto-objects). This implies a strong bidirectional relationship between segmentation and attention processes. While the segmentation process is responsible for extracting the proto-objects from the scene, attention can guide segmentation, giving rise to the concept of foveal attention. When the focus of attention is deployed from one visual unit to another, the rest of the scene is perceived, but at a lower resolution than the focused object. The result is a multi-resolution visual perception in which the fovea, a dimple on the central retina, provides the highest-resolution vision. In this paper, a bottom-up foveal attention model is presented. In this model, the input image is a foveal image represented using a Cartesian Foveal Geometry (CFG), which encodes the field of view of the sensor as a fovea (placed at the focus of attention) surrounded by a set of concentric rings of decreasing resolution. Multi-resolution perceptual segmentation is then performed by building a foveal polygon using the Bounded Irregular Pyramid (BIP). Bottom-up attention is enclosed in the same structure, allowing the fovea to be set over the most salient image proto-object. Saliency is computed as a linear combination of multiple low-level features such as color and intensity contrast, symmetry, orientation, and roundness. Results obtained from natural images show that the combination of hierarchical foveal segmentation and saliency estimation performs well in terms of accuracy and speed.
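The saliency computation described above, a linear combination of normalized low-level feature maps followed by selection of the maximum, can be sketched as follows (equal weights are assumed here; the paper's weighting may differ):

```python
import numpy as np

def saliency(feature_maps, weights=None):
    """Linear combination of min-max-normalized conspicuity maps
    (e.g. color/intensity contrast, symmetry, orientation, roundness)."""
    maps = []
    for m in feature_maps:
        span = m.max() - m.min()
        maps.append((m - m.min()) / span if span > 0 else np.zeros_like(m))
    if weights is None:
        weights = np.ones(len(maps)) / len(maps)  # equal weighting assumed
    return sum(w * m for w, m in zip(weights, maps))

def focus_of_attention(sal):
    """The fovea is set over the most salient location."""
    return np.unravel_index(np.argmax(sal), sal.shape)
```

The returned peak location is where the CFG fovea would be placed before the next segmentation pass.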
Affiliation(s)
- Rebeca Marfil
- ISIS Group, Department of Electronic Technology, University of Málaga, Málaga, Spain
- Antonio J Palomino
- ISIS Group, Department of Electronic Technology, University of Málaga, Málaga, Spain
- Antonio Bandera
- ISIS Group, Department of Electronic Technology, University of Málaga, Málaga, Spain
220
Coco MI, Keller F. The interaction of visual and linguistic saliency during syntactic ambiguity resolution. Q J Exp Psychol (Hove) 2014; 68:46-74. PMID: 25176109; DOI: 10.1080/17470218.2014.936475.
Abstract
Psycholinguistic research using the visual world paradigm has shown that the processing of sentences is constrained by the visual context in which they occur. Recently, there has been growing interest in the interactions observed when both language and vision provide relevant information during sentence processing. In three visual world experiments on syntactic ambiguity resolution, we investigate how visual and linguistic information influence the interpretation of ambiguous sentences. We hypothesize that (1) visual and linguistic information both constrain which interpretation is pursued by the sentence processor, and (2) the two types of information act upon the interpretation of the sentence at different points during processing. In Experiment 1, we show that visual saliency is utilized to anticipate the upcoming arguments of a verb. In Experiment 2, we operationalize linguistic saliency using intonational breaks and demonstrate that these give prominence to linguistic referents. These results confirm prediction (1). In Experiment 3, we manipulate visual and linguistic saliency together and find that both types of information are used, but at different points in the sentence, to incrementally update its current interpretation. This finding is consistent with prediction (2). Overall, our results suggest an adaptive processing architecture in which different types of information are used when they become available, optimizing different aspects of situated language processing.
Affiliation(s)
- Moreno I Coco
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, UK
221
Hu C, Wang Q, Fu G, Quinn PC, Lee K. Both children and adults scan faces of own and other races differently. Vision Res 2014; 102:1-10. PMID: 24929225; PMCID: PMC4152410; DOI: 10.1016/j.visres.2014.05.010.
Abstract
Extensive behavioral and neural evidence suggests that processing of own-race faces differs from that of other-race faces in both adults and infants. However, little research has examined whether and how children scan faces of own and other races differently for face recognition. In this eye-tracking study, Chinese children aged from 4 to 7 years and Chinese adults were asked to remember Chinese and Caucasian faces. None of the participants had any direct contact with foreign individuals. Multi-method analyses of eye-tracking data revealed that regardless of age group, proportional fixation duration on the eyes of Chinese faces was significantly lower than that on the eyes of Caucasian faces, whereas proportional fixation duration on the nose and mouth of Chinese faces was significantly higher than that on the nose and mouth of Caucasian faces. In addition, the amplitude of saccades on Chinese faces was significantly lower than that on Caucasian faces, potentially reflecting finer-grained processing for own-race faces. Moreover, adults' fixation duration/saccade numbers on the whole faces, proportional fixation percentage on the nose, proportional number of saccades between AOIs, and accuracy in recognizing faces were higher than those of children. These results together demonstrate that an abundance of visual experience with own-race faces and a lack of it with other-race faces may result in differential facial scanning in both children and adults. Furthermore, the increased experience of processing faces may result in a more holistic and advanced scanning strategy in Chinese adults.
Affiliation(s)
- Chao Hu
- Department of Psychology, Zhejiang Normal University, Jinhua, China; Applied Psychology & Human Development Department, University of Toronto, Toronto, Canada
- Qiandong Wang
- Department of Psychology, Zhejiang Normal University, Jinhua, China
- Genyue Fu
- Department of Psychology, Zhejiang Normal University, Jinhua, China; Hangzhou Teachers College for Infant Children, Zhejiang Normal University, Hangzhou, Zhejiang, China
- Paul C Quinn
- Department of Psychology, University of Delaware, Newark, DE, USA
- Kang Lee
- Department of Psychology, Zhejiang Normal University, Jinhua, China; Applied Psychology & Human Development Department, University of Toronto, Toronto, Canada
222
Haji-Abolhassani A, Clark JJ. An inverse Yarbus process: predicting observers' task from eye movement patterns. Vision Res 2014; 103:127-42. PMID: 25175112; DOI: 10.1016/j.visres.2014.08.014.
Abstract
In this paper we develop a probabilistic method to infer the visual task of a viewer given measured eye movement trajectories. The method is based on hidden Markov models (HMMs), employing a first-order Markov process to predict the coordinates of fixations given the task. The prediction confidence of each task-dependent model is used in a Bayesian inference formulation, whereby the task with the maximum a posteriori (MAP) probability is selected. We applied this technique to a challenging dataset consisting of eye movement trajectories obtained from subjects viewing monochrome images of real scenes while tasked with answering questions about the scenes. The results show that the HMM approach, combined with a clustering technique, can be a reliable way to infer the visual task from eye movement data.
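A stripped-down version of the MAP inference can be sketched with independent Gaussian fixation models per task. The paper fits first-order HMMs; the transition structure is omitted here, so this is a zeroth-order simplification, not the authors' model.

```python
import numpy as np

def log_gauss(x, mu, var):
    """Per-fixation log-density under a diagonal Gaussian."""
    return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var), axis=-1)

def infer_task(fixations, task_models, log_prior=None):
    """MAP task selection: score the fixation sequence under each
    task-dependent model and return the task with the highest
    posterior. task_models maps task name -> (mu, var)."""
    tasks = list(task_models)
    if log_prior is None:
        log_prior = {t: 0.0 for t in tasks}  # uniform prior over tasks
    scores = {t: log_prior[t] + np.sum(log_gauss(fixations, *task_models[t]))
              for t in tasks}
    return max(scores, key=scores.get)
```

Replacing the Gaussian likelihood with a forward-algorithm HMM likelihood recovers the full first-order scheme while the Bayesian selection step stays the same.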
Affiliation(s)
- Amin Haji-Abolhassani
- Centre for Intelligent Machines, Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec H3A 0E9, Canada.
- James J Clark
- Centre for Intelligent Machines, Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec H3A 0E9, Canada.
223
Abstract
Spatial priority maps are real-time representations of the behavioral salience of locations in the visual field, resulting from the combined influence of stimulus driven activity and top-down signals related to the current goals of the individual. They arbitrate which of a number of (potential) targets in the visual scene will win the competition for attentional resources. As a result, deployment of visual attention to a specific spatial location is determined by the current peak of activation (corresponding to the highest behavioral salience) across the map. Here we report a behavioral study performed on healthy human volunteers, where we demonstrate that spatial priority maps can be shaped via reward-based learning, reflecting long-lasting alterations (biases) in the behavioral salience of specific spatial locations. These biases exert an especially strong influence on performance under conditions where multiple potential targets compete for selection, conferring competitive advantage to targets presented in spatial locations associated with greater reward during learning relative to targets presented in locations associated with lesser reward. Such acquired biases of spatial attention are persistent, are nonstrategic in nature, and generalize across stimuli and task contexts. These results suggest that reward-based attentional learning can induce plastic changes in spatial priority maps, endowing these representations with the "intelligent" capacity to learn from experience.
224
Learning Complementary Saliency Priors for Foreground Object Segmentation in Complex Scenes. Int J Comput Vis 2014. [DOI: 10.1007/s11263-014-0737-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 10/25/2022]
225
Hierarchical Geometry Verification via Maximum Entropy Saliency in Image Retrieval. Entropy 2014. [DOI: 10.3390/e16073848] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/17/2022]
226
Target features and target-distractor relation are both primed in visual search. Atten Percept Psychophys 2014; 76:682-94. [PMID: 24415176 DOI: 10.3758/s13414-013-0611-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
Intertrial priming in visual search is the finding that repeating target and distractor features from one trial to the next speeds up search, relative to when these features change. Recently, Becker (2008) reported evidence that it is not so much the repetition of absolute feature values that causes priming, but repetition of the relation between target and distractors. For example, in search for a unique size, the size of the search elements may change from trial to trial, but this does not hurt performance as long as the target remains consistently larger (or smaller) than the distractors. Becker (2008) concluded that such findings are difficult to reconcile with existing theory. Here, we replicate the findings in the dimensions of size, color, and luminance and show that these effects are not due to the magnitude of feature changes or to search strategies, as may be induced by blocking versus mixing different types of intertrial changes experienced by observers. However, we show that repeating a feature from one trial to the next does convey a benefit above and beyond repeating the target-distractor relation. We argue that both effects can be readily accounted for within current models of visual search. Priming of relations results when one assumes the existence of cardinal feature channels, as do most models of visual search. Additional priming of specific values results when one assumes broadly distributed, overlapping feature channels.
227
Calvo MG, Beltrán D, Fernández-Martín A. Processing of facial expressions in peripheral vision: Neurophysiological evidence. Biol Psychol 2014; 100:60-70. [DOI: 10.1016/j.biopsycho.2014.05.007] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 01/31/2013] [Revised: 05/20/2014] [Accepted: 05/20/2014] [Indexed: 11/25/2022]
228
Attentional modulation and selection – an integrated approach. PLoS One 2014; 9:e99681. [PMID: 24963827 PMCID: PMC4070899 DOI: 10.1371/journal.pone.0099681] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 12/15/2013] [Accepted: 05/19/2014] [Indexed: 11/19/2022]
Abstract
Various models of the neural mechanisms of attentional modulation in the visual cortex have been proposed. In general, these models assume that an ‘attention’ parameter is provided separately. Its value as well as the selection of neuron(s) to which it applies are assumed, but its source and the selection mechanism are unspecified. Here we show how the Selective Tuning model of visual attention can account for the modulation of the firing rate at the single neuron level, and for the temporal pattern of attentional modulations in the visual cortex, in a self-contained formulation that simultaneously determines the stimulus elements to be attended while modulating the relevant neural processes.
229
Population coding of affect across stimuli, modalities and individuals. Nat Neurosci 2014; 17:1114-22. [PMID: 24952643 PMCID: PMC4317366 DOI: 10.1038/nn.3749] [Citation(s) in RCA: 160] [Impact Index Per Article: 16.0] [Received: 01/19/2014] [Accepted: 05/23/2014] [Indexed: 11/08/2022]
Abstract
It remains unclear how the brain represents external objective sensory events alongside our internal subjective impressions of them, that is, affect. Representational mapping of population activity evoked by complex scenes and basic tastes in humans revealed a neural code supporting a continuous axis of pleasant-to-unpleasant valence. This valence code was distinct from low-level physical and high-level object properties. Although ventral temporal and anterior insular cortices supported valence codes specific to vision and taste, both the medial and lateral orbitofrontal cortices (OFC) maintained a valence code independent of sensory origin. Furthermore, only the OFC code could classify experienced affect across participants. The entire valence spectrum was thus represented as a collective pattern in regional neural activity, in both sensory-specific and abstract codes, whereby the subjective quality of affect can be objectively quantified across stimuli, modalities and people.
230
Yu CP, Samaras D, Zelinsky GJ. Modeling visual clutter perception using proto-object segmentation. J Vis 2014; 14:4. [PMID: 24904121 PMCID: PMC4528410 DOI: 10.1167/14.7.4] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 12/14/2013] [Accepted: 02/23/2014] [Indexed: 11/24/2022]
Abstract
We introduce the proto-object model of visual clutter perception. This unsupervised model segments an image into superpixels, then merges neighboring superpixels that share a common color cluster to obtain proto-objects, defined here as spatially extended regions of coherent features. Clutter is estimated by simply counting the number of proto-objects. We tested this model using 90 images of realistic scenes that were ranked by observers from least to most cluttered. Comparing this behaviorally obtained ranking to a ranking based on the model clutter estimates, we found a significant correlation between the two (Spearman's ρ = 0.814, p < 0.001). We also found that the proto-object model was highly robust to changes in its parameters and was generalizable to unseen images. We compared the proto-object model to six other models of clutter perception and demonstrated that it outperformed each, in some cases dramatically. Importantly, we also showed that the proto-object model was a better predictor of clutter perception than an actual count of the number of objects in the scenes, suggesting that the set size of a scene may be better described by proto-objects than objects. We conclude that the success of the proto-object model is due in part to its use of an intermediate level of visual representation, one between features and objects, and that this is evidence for the potential importance of a proto-object representation in many common visual percepts and tasks.
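The counting idea is easy to prototype. The sketch below is a simplified stand-in for the published model: instead of SLIC superpixels merged by shared color clusters, it quantizes colors and counts connected same-color regions above a minimum size (the function name, bin count, and size threshold are illustrative assumptions, not the paper's parameters).

```python
import numpy as np
from scipy import ndimage

def clutter_estimate(img, n_bins=4, min_size=20):
    """Count proto-object-like regions: connected components of
    identically color-quantized pixels, ignoring tiny fragments."""
    q = (img.astype(float) / 256.0 * n_bins).astype(int)  # per-channel bins
    combined = q[..., 0] * n_bins ** 2 + q[..., 1] * n_bins + q[..., 2]
    count = 0
    for value in np.unique(combined):
        labels, n = ndimage.label(combined == value)      # connected regions
        sizes = np.bincount(labels.ravel())[1:]           # skip background label 0
        count += int(np.sum(sizes >= min_size))
    return count

# Synthetic scene: uniform background plus two distinct colored blobs
img = np.zeros((64, 64, 3), dtype=np.uint8)
img[10:30, 10:30] = (255, 0, 0)
img[40:60, 35:60] = (0, 255, 0)
print(clutter_estimate(img))   # background + two blobs -> 3
```

Replacing the quantize-and-label step with a proper superpixel segmentation and color clustering would recover something closer to the published pipeline.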
Affiliation(s)
- Chen-Ping Yu
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Dimitris Samaras
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Gregory J. Zelinsky
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Department of Psychology, Stony Brook University, Stony Brook, NY, USA
231
232
Hu W, Hu R, Xie N, Ling H, Maybank S. Image classification using multiscale information fusion based on saliency driven nonlinear diffusion filtering. IEEE Trans Image Process 2014; 23:1513-1526. [PMID: 24569440 DOI: 10.1109/tip.2014.2303639] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
In this paper, we propose saliency driven image multiscale nonlinear diffusion filtering. The resulting scale space in general preserves or even enhances semantically important structures such as edges, lines, or flow-like structures in the foreground, and inhibits and smoothes clutter in the background. The image is classified using multiscale information fusion based on the original image, the image at the final scale at which the diffusion process converges, and the image at a midscale. Our algorithm emphasizes the foreground features, which are important for image classification. The background image regions, whether considered as contexts of the foreground or noise to the foreground, can be globally handled by fusing information from different scales. Experimental tests of the effectiveness of the multiscale space for image classification are conducted on three publicly available datasets: 1) the PASCAL 2005 dataset; 2) the Oxford 102 flowers dataset; and 3) the Oxford 17 flowers dataset, achieving high classification rates.
233
Raffone A, Srinivasan N, van Leeuwen C. The interplay of attention and consciousness in visual search, attentional blink and working memory consolidation. Philos Trans R Soc Lond B Biol Sci 2014; 369:20130215. [PMID: 24639586 DOI: 10.1098/rstb.2013.0215] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 11/12/2022]
Abstract
Despite the acknowledged relationship between consciousness and attention, theories of the two have mostly been developed separately. Moreover, these theories have independently attempted to explain phenomena in which both are likely to interact, such as the attentional blink (AB) and working memory (WM) consolidation. Here, we make an effort to bridge the gap between, on the one hand, a theory of consciousness based on the notion of global workspace (GW) and, on the other, a synthesis of theories of visual attention. We offer a theory of attention and consciousness (TAC) that provides a unified neurocognitive account of several phenomena associated with visual search, AB and WM consolidation. TAC assumes multiple processing stages between early visual representation and conscious access, and extends the dynamics of the global neuronal workspace model to a visual attentional workspace (VAW). The VAW is controlled by executive routers, higher-order representations of executive operations in the GW, without the need for explicit saliency or priority maps. TAC leads to newly proposed mechanisms for illusory conjunctions, AB, inattentional blindness and WM capacity, and suggests neural correlates of phenomenal consciousness. Finally, the theory reconciles the all-or-none and graded perspectives on conscious representation.
Affiliation(s)
- Antonino Raffone
- Department of Psychology, 'Sapienza' University of Rome, Via dei Marsi, 78, 00185 Rome, Italy
234
Abstract
Saliency models have been frequently used to predict eye movements made during image viewing without a specified task (free viewing). Use of a single image set to systematically compare free viewing to other tasks has never been performed. We investigated the effect of task differences on the ability of three models of saliency to predict the performance of humans viewing a novel database of 800 natural images. We introduced a novel task where 100 observers made explicit perceptual judgments about the most salient image region. Other groups of observers performed a free viewing task, saliency search task, or cued object search task. Behavior on the popular free viewing task was not best predicted by standard saliency models. Instead, the models most accurately predicted the explicit saliency selections and eye movements made while performing saliency judgments. Observers' fixations varied similarly across images for the saliency and free viewing tasks, suggesting that these two tasks are related. The variability of observers' eye movements was modulated by the task (lowest for the object search task and greatest for the free viewing and saliency search tasks) as well as the clutter content of the images. Eye movement variability in saliency search and free viewing might be also limited by inherent variation of what observers consider salient. Our results contribute to understanding the tasks and behavioral measures for which saliency models are best suited as predictors of human behavior, the relationship across various perceptual tasks, and the factors contributing to observer variability in fixational eye movements.
Affiliation(s)
- Kathryn Koehler
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA
- Fei Guo
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA
- Sheng Zhang
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA
- Miguel P. Eckstein
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA
235
236
Alsam A, Sharma P. Robust metric for the evaluation of visual saliency algorithms. J Opt Soc Am A Opt Image Sci Vis 2014; 31:532-540. [PMID: 24690651 DOI: 10.1364/josaa.31.000532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
In this paper, we analyzed eye fixation data obtained from 15 observers and 1003 images. When studying the eigen-decomposition of the correlation matrix constructed based on the fixation data of one observer viewing all images, it was observed that 23% of the data can be accounted for by one eigenvector. This finding implies a repeated viewing pattern that is independent of image content. Examination of this pattern revealed that it was highly correlated with the center region of the image. The presence of a repeated viewing pattern raised the following question: can we use the statistical information contained in the first eigenvector to filter out the fixations that were part of the pattern from those that are image feature dependent? To answer this question we designed a robust AUC metric that uses statistical analysis to better judge the goodness of the different saliency algorithms.
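The eigen-analysis step is straightforward to reproduce in outline. In this sketch (synthetic data; the center-bias strength and array sizes are invented, so the exact variance share will not match the paper's 23%), a shared center-bias component across images makes the first eigenvector of the across-image correlation matrix account for a large fraction of the data:

```python
import numpy as np

rng = np.random.default_rng(0)

# One (flattened) fixation map per image: a shared center-bias pattern
# plus image-specific variation.
n_images, n_pix = 200, 400
center_bias = np.exp(-((np.arange(n_pix) - n_pix / 2) ** 2) / (2 * 50.0 ** 2))
maps = center_bias + 0.8 * rng.random((n_images, n_pix))

corr = np.corrcoef(maps)                   # image-by-image correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigenvalues, descending
share = eigvals[0] / eigvals.sum()         # fraction captured by 1st eigenvector
print(f"first eigenvector accounts for {share:.0%} of the variance")
```

Projecting the shared component out of the data, as the authors propose, would then leave the image-feature-dependent fixations for scoring saliency models.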
237
Kim K, Lin KH, Walther DB, Hasegawa-Johnson MA, Huang TS. Automatic detection of auditory salience with optimized linear filters derived from human annotation. Pattern Recognit Lett 2014. [DOI: 10.1016/j.patrec.2013.11.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 10/26/2022]
238
Amso D, Haas S, Tenenbaum E, Markant J, Sheinkopf SJ. Bottom-up attention orienting in young children with autism. J Autism Dev Disord 2014; 44:664-73. [PMID: 23996226 PMCID: PMC4089391 DOI: 10.1007/s10803-013-1925-5] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]
Abstract
We examined the impact of simultaneous bottom-up visual influences and meaningful social stimuli on attention orienting in young children with autism spectrum disorders (ASDs). Relative to typically developing age- and sex-matched participants, children with ASDs were more influenced by bottom-up visual scene information regardless of whether social stimuli and bottom-up scene properties were congruent or competing. This initial reliance on bottom-up strategies correlated with severity of social impairment as well as receptive language impairments. These data provide support for the idea that there is enhanced reliance on bottom-up attention strategies in ASDs, and that this may have a negative impact on social and language development.
Affiliation(s)
- Dima Amso
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, 190 Thayer St., Box 1821, Providence, RI 02912, USA
239
The hard-won benefits of familiarity in visual search: naturally familiar brand logos are found faster. Atten Percept Psychophys 2014; 76:914-30. [DOI: 10.3758/s13414-014-0623-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
240
Boccignone G, Ferraro M. Ecological sampling of gaze shifts. IEEE Trans Cybern 2014; 44:266-279. [PMID: 23757548 DOI: 10.1109/tcyb.2013.2253460] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 06/02/2023]
Abstract
Visual attention guides our gaze to relevant parts of the viewed scene, yet the moment-to-moment relocation of gaze can differ among observers even when the same locations are taken into account. Surprisingly, this variability of eye movements has so far been overlooked by the great majority of computational models of visual attention. In this paper we present the ecological sampling model, a stochastic model of eye guidance that explains such variability. The gaze shift mechanism is conceived as an active random sampling that the foraging eye carries out upon the visual landscape, under the constraints set by the observable features and the global complexity of the landscape. Drawing on results reported in the foraging literature, the actual gaze relocation is driven by a stochastic differential equation whose noise source is sampled from a mixture of α-stable distributions. The sampling strategy proposed here thus mimics a fundamental property of the eye guidance mechanism: where we choose to look next at any given moment in time is not completely deterministic, but neither is it completely random. To show that the model yields gaze shift motor behaviors whose statistics are similar to those displayed by human observers, we compare simulation outputs with those obtained from eye-tracked subjects viewing complex dynamic scenes.
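The noise source described here, a mixture of α-stable distributions, can be sampled directly with `scipy.stats.levy_stable`. The sketch below draws gaze-shift amplitudes from a two-component mixture; all weights, characteristic exponents, and scales are illustrative assumptions, not the values fitted in the paper.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(1)

def sample_gaze_shifts(n, weights=(0.7, 0.3), alphas=(1.9, 1.2),
                       scales=(0.5, 3.0)):
    """Draw gaze-shift amplitudes from a two-component mixture of symmetric
    alpha-stable distributions: a light-tailed component for small local
    relocations and a heavy-tailed one for occasional long saccades."""
    comp = rng.choice(len(weights), size=n, p=list(weights))  # pick component
    shifts = np.empty(n)
    for k, (a, s) in enumerate(zip(alphas, scales)):
        idx = comp == k
        shifts[idx] = levy_stable.rvs(a, 0.0, scale=s, size=int(idx.sum()),
                                      random_state=rng)
    return np.abs(shifts)                                     # amplitudes

amplitudes = sample_gaze_shifts(500)
```

Fitting the mixture weights and exponents to recorded gaze data, as done in the paper, would be an additional estimation step on top of this sampler.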
241
Endres I, Hoiem D. Category-independent object proposals with diverse ranking. IEEE Trans Pattern Anal Mach Intell 2014; 36:222-234. [PMID: 24356345 DOI: 10.1109/tpami.2013.122] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 06/03/2023]
Abstract
We propose a category-independent method to produce a bag of regions and rank them, such that top-ranked regions are likely to be good segmentations of different objects. Our key objectives are completeness and diversity: Every object should have at least one good proposed region, and a diverse set should be top-ranked. Our approach is to generate a set of segmentations by performing graph cuts based on a seed region and a learned affinity function. Then, the regions are ranked using structured learning based on various cues. Our experiments on the Berkeley Segmentation Data Set and Pascal VOC 2011 demonstrate our ability to find most objects within a small bag of proposed regions.
Affiliation(s)
- Ian Endres
- University of Illinois at Urbana-Champaign, Urbana
- Derek Hoiem
- University of Illinois at Urbana-Champaign, Urbana
242
Amso D, Haas S, Markant J. An eye tracking investigation of developmental change in bottom-up attention orienting to faces in cluttered natural scenes. PLoS One 2014; 9:e85701. [PMID: 24465653 PMCID: PMC3899069 DOI: 10.1371/journal.pone.0085701] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 09/05/2013] [Accepted: 12/05/2013] [Indexed: 01/13/2023]
Abstract
This study examined the contribution of visual salience to bottom-up attention orienting to faces in cluttered natural scenes across development. We eye-tracked participants from 4 months to 24 years of age as they freely viewed 16 natural scenes, all of which contained faces. In half, the face was also the winner-take-all salient area in the display, as determined by the MATLAB SaliencyToolbox. In the other half, a random location was the winner-take-all salient area and the face was visually non-salient. We found that the proportion of attended faces, in the first second of scene viewing, improved after the first year. Visually salient faces attracted bottom-up attention orienting more than non-salient faces reliably and robustly only after infancy. Preliminary data indicate that this shift to use of visual salience to guide bottom-up attention orienting after infancy may be a function of stabilization of visual skills. Moreover, sociodemographic factors, including number of siblings in the home and family income, were agents of developmental change in orienting to faces in cluttered natural scenes in infancy.
Affiliation(s)
- Dima Amso
- Department of Cognitive, Linguistic, & Psychological Sciences, Brown University, Providence, Rhode Island, United States of America
- Sara Haas
- Department of Cognitive, Linguistic, & Psychological Sciences, Brown University, Providence, Rhode Island, United States of America
- Julie Markant
- Department of Cognitive, Linguistic, & Psychological Sciences, Brown University, Providence, Rhode Island, United States of America
243
244
Xiao WS, Quinn PC, Pascalis O, Lee K. Own- and other-race face scanning in infants: implications for perceptual narrowing. Dev Psychobiol 2014; 56:262-73. [PMID: 24415549 DOI: 10.1002/dev.21196] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 01/29/2013] [Accepted: 12/12/2013] [Indexed: 11/10/2022]
Abstract
The present study investigated how 6- and 9-month-old Caucasian infants scan Caucasian and Chinese dynamic faces using eye-tracking methodology. Analyses of looking times revealed that with increased age, infants decreased their looking time to other-race noses, while maintaining their looking time for own-race noses. From 6 to 9 months, infants increased their looking time for the eyes of both races of faces. Analyses of scan paths showed that infants were no more likely to shift their fixation between the eyes of own-race faces than other-race faces. Similarity between participants' scan paths suggested that facial information was collected more efficiently for own- versus other-race faces at 9 months of age. Combined with previous eye-tracking studies of infants' face scanning (Liu et al. [2011] Journal of Experimental Child Psychology, 108, 180-189; Wheeler et al. [2011] PLoS ONE, 6, e18621. doi: 10.1371/journal.pone.0018621; Xiao et al. [2013] International Journal of Behavioral Development, 37, 100-105), the findings are interpreted in the context of perceptual narrowing and suggest differential contributions of visual experience, facial physiognomy, and culture in accounting for similarity and difference in infants' scanning of own- and other-race faces.
Affiliation(s)
- Wen S Xiao
- Institute of Child Study, University of Toronto, 45 Walmer Road, Toronto, Ontario, Canada, M5R 2X2
245
Merkel C, Stoppel CM, Hillyard SA, Heinze HJ, Hopf JM, Schoenfeld MA. Spatio-temporal Patterns of Brain Activity Distinguish Strategies of Multiple-object Tracking. J Cogn Neurosci 2014; 26:28-40. [DOI: 10.1162/jocn_a_00455] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]
Abstract
Human observers can readily track up to four independently moving items simultaneously, even in the presence of moving distractors. Here we combined EEG and magnetoencephalography recordings to investigate the neural processes underlying this remarkable capability. Participants were instructed to track four of eight independently moving items for 3 sec. When the movement ceased a probe stimulus consisting of four items with a higher luminance was presented. The location of the probe items could correspond fully, partly, or not at all with the tracked items. Participants reported whether the probe items fully matched the tracked items or not. About half of the participants showed slower RTs and higher error rates with increasing correspondence between tracked items and the probe. The other half, however, showed faster RTs and lower error rates when the probe fully matched the tracked items. This latter behavioral pattern was associated with enhanced probe-evoked neural activity that was localized to the lateral occipital cortex in the time range 170–210 msec. This enhanced response in the object-selective lateral occipital cortex suggested that these participants performed the tracking task by visualizing the overall shape configuration defined by the vertices of the tracked items, thereby producing a behavioral advantage on full-match trials. In a later time range (270–310 msec) probe-evoked neural activity increased monotonically as a function of decreasing target–probe correspondence in all participants. This later modulation, localized to superior parietal cortex, was proposed to reflect the degree of mismatch between the probe and the automatically formed visual STM representation of the tracked items.
Affiliation(s)
- Steven A. Hillyard
- Leibniz Institute for Neurobiology, Magdeburg
- University of California, San Diego
- Hans-Jochen Heinze
- Otto-von-Guericke University, Magdeburg
- Leibniz Institute for Neurobiology, Magdeburg
- Jens-Max Hopf
- Otto-von-Guericke University, Magdeburg
- Leibniz Institute for Neurobiology, Magdeburg
- Mircea Ariel Schoenfeld
- Otto-von-Guericke University, Magdeburg
- Leibniz Institute for Neurobiology, Magdeburg
- Kliniken Schmieder, Allensbach
246
Russell AF, Mihalaş S, von der Heydt R, Niebur E, Etienne-Cummings R. A model of proto-object based saliency. Vision Res 2014; 94:1-15. [PMID: 24184601 PMCID: PMC3902215 DOI: 10.1016/j.visres.2013.10.005] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 03/22/2013] [Revised: 08/06/2013] [Accepted: 10/04/2013] [Indexed: 10/26/2022]
Abstract
Organisms use the process of selective attention to optimally allocate their computational resources to the instantaneously most relevant subsets of a visual scene, ensuring that they can parse the scene in real time. Many models of bottom-up attentional selection assume that elementary image features, like intensity, color and orientation, attract attention. Gestalt psychologists, however, argue that humans perceive whole objects before they analyze individual features. This is supported by recent psychophysical studies showing that objects predict eye fixations better than features. In this report we present a neurally inspired algorithm of object-based, bottom-up attention. The model rivals the performance of state-of-the-art, biologically implausible feature-based algorithms (and outperforms biologically plausible feature-based algorithms) in its ability to predict perceptual saliency (eye fixations and subjective interest points) in natural scenes. The model achieves this by computing saliency as a function of proto-objects that establish the perceptual organization of the scene. All computational mechanisms of the algorithm have direct neural correlates, and our results provide evidence for the interface theory of attention.
Affiliation(s)
- Alexander F Russell
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
- Stefan Mihalaş
- Department of Neuroscience, Johns Hopkins University, Baltimore, MD 21218, United States; Zanvyl-Krieger Mind Brain Institute, Johns Hopkins University, Baltimore, MD 21218, United States
- Rudiger von der Heydt
- Department of Neuroscience, Johns Hopkins University, Baltimore, MD 21218, United States; Zanvyl-Krieger Mind Brain Institute, Johns Hopkins University, Baltimore, MD 21218, United States
- Ernst Niebur
- Department of Neuroscience, Johns Hopkins University, Baltimore, MD 21218, United States; Zanvyl-Krieger Mind Brain Institute, Johns Hopkins University, Baltimore, MD 21218, United States
- Ralph Etienne-Cummings
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
247
Clarke ADF, Coco MI, Keller F. The impact of attentional, linguistic, and visual features during object naming. Front Psychol 2013; 4:927. [PMID: 24379792 PMCID: PMC3861867 DOI: 10.3389/fpsyg.2013.00927] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 02/20/2013] [Accepted: 11/23/2013] [Indexed: 11/14/2022]
Abstract
Object detection and identification are fundamental to human vision, and there is mounting evidence that objects guide the allocation of visual attention. However, the role of objects in tasks involving multiple modalities is less clear. To address this question, we investigate object naming, a task in which participants have to verbally identify objects they see in photorealistic scenes. We report an eye-tracking study that investigates which features (attentional, visual, and linguistic) influence object naming. We find that the amount of visual attention directed toward an object, its position and saliency, along with linguistic factors such as word frequency, animacy, and semantic proximity, significantly influence whether the object will be named or not. We then ask how features from different modalities are combined during naming, and find significant interactions between saliency and position, saliency and linguistic features, and attention and position. We conclude that when the cognitive system performs tasks such as object naming, it uses input from one modality to constrain or enhance the processing of other modalities, rather than processing each input modality independently.
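The kind of analysis described here, predicting a binary outcome (named vs. not named) from predictors plus their interactions, can be sketched with a plain logistic regression. This is an illustrative simplification with synthetic data and hypothetical predictor names (the study itself used richer statistical models and real eye-tracking features).

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Minimal logistic regression fit by batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted P(named)
        w -= lr * X.T @ (p - y) / len(y)          # gradient step
    return w

# hypothetical predictors: saliency, position (centrality), and their
# interaction, as in the saliency-by-position effect reported above
rng = np.random.default_rng(0)
n = 200
sal = rng.uniform(size=n)
pos = rng.uniform(size=n)
X = np.column_stack([np.ones(n), sal, pos, sal * pos])
true_logit = -1.0 + 2.0 * sal + 1.0 * pos + 1.5 * sal * pos
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

w = fit_logistic(X, y)
p_hat = 1.0 / (1.0 + np.exp(-X @ w))
accuracy = np.mean((p_hat > 0.5) == (y == 1))
```

Including the `sal * pos` column is what lets the model express an interaction: the effect of saliency on naming then depends on position rather than being additive.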
Affiliation(s)
- Alasdair D. F. Clarke
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
- Moreno I. Coco
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
- Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
- Frank Keller
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
248
Peschel AO, Orquin JL. A review of the findings and theories on surface size effects on visual attention. Front Psychol 2013; 4:902. [PMID: 24367343 PMCID: PMC3856423 DOI: 10.3389/fpsyg.2013.00902] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2013] [Accepted: 11/14/2013] [Indexed: 11/13/2022] Open
Abstract
That surface size has an impact on attention has been well-known in advertising research for almost a century; however, theoretical accounts of this effect have been sparse. To address this issue, we review studies on surface size effects on eye movements in this paper. While most studies find that large objects are more likely to be fixated, receive more fixations, and are fixated faster than small objects, a comprehensive explanation of this effect is still lacking. To bridge the theoretical gap, we relate the findings from this review to three theories of surface size effects suggested in the literature: a linear model based on the assumption of random fixations (Lohse, 1997), a theory of surface size as visual saliency (Pieters et al., 2007), and a theory based on competition for attention (CA; Janiszewski, 1998). We furthermore suggest a fourth model - demand for attention - which we derive from the theory of CA by revising the underlying model assumptions. In order to test the models against each other, we reanalyze data from an eye tracking study investigating surface size and saliency effects on attention. The reanalysis revealed little support for the first three theories while the demand for attention model showed a much better alignment with the data. We conclude that surface size effects may best be explained as an increase in object signal strength which depends on object size, number of objects in the visual scene, and object distance to the center of the scene. Our findings suggest that advertisers should take into account how objects in the visual scene interact in order to optimize attention to, for instance, brands and logos.
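The random-fixation account mentioned above can be made concrete with a small sketch: if each fixation lands on an object with probability proportional to its surface area, the chance an object is fixated at least once grows with both its area and the number of fixations. This is a hypothetical simplification in the spirit of such a model, not Lohse's exact formulation.

```python
def random_fixation_probability(areas, n_fixations):
    """Probability that each object receives at least one of n independent
    fixations, where each fixation picks an object in proportion to its
    surface area (area-proportional random-fixation assumption)."""
    total = sum(areas)
    return [1.0 - (1.0 - a / total) ** n_fixations for a in areas]

# the middle object is twice the size of the others, so it is more
# likely to be fixated at least once over 5 fixations
p = random_fixation_probability([1.0, 2.0, 1.0], n_fixations=5)
```

A purely area-driven model like this is exactly what the review finds insufficient: it cannot capture the reported dependence on the number of competing objects' interactions or distance to scene center.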
Affiliation(s)
- Anne O Peschel
- MAPP Centre for Research on Customer Relations in the Food Sector, Department of Business Administration, Aarhus University, Aarhus, Denmark
- Jacob L Orquin
- MAPP Centre for Research on Customer Relations in the Food Sector, Department of Business Administration, Aarhus University, Aarhus, Denmark
249
250
Das A, Diu M, Mathew N, Scharfenberger C, Servos J, Wong A, Zelek JS, Clausi DA, Waslander SL. Mapping, Planning, and Sample Detection Strategies for Autonomous Exploration. J FIELD ROBOT 2013. [DOI: 10.1002/rob.21490] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Affiliation(s)
- Arun Das
- Mechanical and Mechatronics Engineering; University of Waterloo; Waterloo ON Canada N2L 3G1
- Michael Diu
- Electrical and Computer Engineering; University of Waterloo; Waterloo ON Canada N2L 3G1
- Neil Mathew
- Mechanical and Mechatronics Engineering; University of Waterloo; Waterloo ON Canada N2L 3G1
- James Servos
- Mechanical and Mechatronics Engineering; University of Waterloo; Waterloo ON Canada N2L 3G1
- Andy Wong
- Mechanical and Mechatronics Engineering; University of Waterloo; Waterloo ON Canada N2L 3G1
- John S. Zelek
- Systems Design Engineering; University of Waterloo; Waterloo ON Canada N2L 3G1
- David A. Clausi
- Systems Design Engineering; University of Waterloo; Waterloo ON Canada N2L 3G1
- Steven L. Waslander
- Mechanical and Mechatronics Engineering; University of Waterloo; Waterloo ON Canada N2L 3G1