1
Ullman S, Assif L, Strugatski A, Vatashsky BZ, Levi H, Netanyahu A, Yaari A. Human-like scene interpretation by a guided counterstream processing. Proc Natl Acad Sci U S A 2023; 120:e2211179120. [PMID: 37769256] [PMCID: PMC10556630] [DOI: 10.1073/pnas.2211179120]
Abstract
In modeling vision, there has been remarkable progress in recognizing a range of scene components, but the problem of analyzing full scenes, an ultimate goal of visual perception, is still largely open. To deal with complete scenes, recent work has focused on training models to extract the full graph-like structure of a scene. In contrast with scene graphs, humans' scene perception focuses on selected structures in the scene, starting with a limited interpretation and evolving sequentially in a goal-directed manner [G. L. Malcolm, I. I. A. Groen, C. I. Baker, Trends Cogn. Sci. 20, 843-856 (2016)]. Guidance is crucial throughout scene interpretation since the extraction of a full scene representation is often infeasible. Here, we present a model that performs human-like guided scene interpretation, using iterative bottom-up, top-down processing in a "counterstream" structure motivated by cortical circuitry. The process proceeds by the sequential application of top-down instructions that guide the interpretation process. The results show how scene structures of interest to the viewer are extracted by an automatically selected sequence of top-down instructions. The model shows two further benefits. One is an inherent capability to deal well with the problem of combinatorial generalization, that is, generalizing broadly to unseen scene configurations, which is limited in current network models [B. Lake, M. Baroni, 35th International Conference on Machine Learning, ICML 2018 (2018)]. The second is the ability to combine visual with nonvisual information at each cycle of the interpretation process, which is a key aspect for modeling human perception as well as advancing AI vision systems.
Affiliation(s)
- Shimon Ullman: Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel
- Liav Assif: Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel
- Alona Strugatski: Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel
- Ben-Zion Vatashsky: Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel
- Hila Levi: Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel
- Aviv Netanyahu: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
- Adam Yaari: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
2
Segraves MA. Using Natural Scenes to Enhance our Understanding of the Cerebral Cortex's Role in Visual Search. Annu Rev Vis Sci 2023; 9:435-454. [PMID: 37164028] [DOI: 10.1146/annurev-vision-100720-124033]
Abstract
Using natural scenes is an approach to studying the visual and eye movement systems that approximates how these systems function in everyday life. This review examines the results from behavioral and neurophysiological studies using natural scene viewing in humans and monkeys. The use of natural scenes for the study of cerebral cortical activity is relatively new and presents challenges for data analysis. Methods and results from the use of natural scenes for the study of the visual and eye movement cortex are presented, with emphasis on new insights that this method provides, enhancing what is known about these cortical regions from the use of conventional methods.
Affiliation(s)
- Mark A Segraves: Department of Neurobiology, Northwestern University, Evanston, Illinois, USA
3
Moskowitz JB, Fooken J, Castelhano MS, Gallivan JP, Flanagan JR. Visual search for reach targets in actionable space is influenced by movement costs imposed by obstacles. J Vis 2023; 23:4. [PMID: 37289172] [PMCID: PMC10257340] [DOI: 10.1167/jov.23.6.4]
Abstract
Real world search tasks often involve action on a target object once it has been located. However, few studies have examined whether movement-related costs associated with acting on located objects influence visual search. Here, using a task in which participants reached to a target object after locating it, we examined whether people take into account obstacles that increase movement-related costs for some regions of the reachable search space but not others. In each trial, a set of 36 objects (4 targets and 32 distractors) were displayed on a vertical screen and participants moved a cursor to a target after locating it. Participants had to fixate on an object to determine whether it was a target or distractor. A rectangular obstacle, of varying length, location, and orientation, was briefly displayed at the start of the trial. Participants controlled the cursor by moving the handle of a robotic manipulandum in a horizontal plane. The handle applied forces to simulate contact between the cursor and the unseen obstacle. We found that search, measured using eye movements, was biased to regions of the search space that could be reached without moving around the obstacle. This result suggests that when deciding where to search, people can incorporate the physical structure of the environment so as to reduce the movement-related cost of subsequently acting on the located target.
Affiliation(s)
- Joshua B Moskowitz: Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada; Department of Psychology, Queen's University, Kingston, Ontario, Canada
- Jolande Fooken: Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Monica S Castelhano: Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada; Department of Psychology, Queen's University, Kingston, Ontario, Canada
- Jason P Gallivan: Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada; Department of Psychology, Queen's University, Kingston, Ontario, Canada; Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario, Canada
- J Randall Flanagan: Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada; Department of Psychology, Queen's University, Kingston, Ontario, Canada
4
Yang Y, Mo L, Lio G, Huang Y, Perret T, Sirigu A, Duhamel JR. Assessing the allocation of attention during visual search using digit-tracking, a calibration-free alternative to eye tracking. Sci Rep 2023; 13:2376. [PMID: 36759694] [PMCID: PMC9911646] [DOI: 10.1038/s41598-023-29133-7]
Abstract
Digit-tracking, a simple, calibration-free technique, has proven to be a good alternative to eye tracking in vision science. Participants view stimuli superimposed by Gaussian blur on a touchscreen interface and slide a finger across the display to locally sharpen an area the size of the foveal region just at the finger's position. Finger movements are recorded as an indicator of eye movements and attentional focus. Because of its simplicity and portability, this system has many potential applications in basic and applied research. Here we used digit-tracking to investigate visual search and replicated several known effects observed using different types of search arrays. Exploration patterns measured with digit-tracking during visual search of natural scenes were comparable to those previously reported for eye-tracking and constrained by similar saliency. Therefore, our results provide further evidence for the validity and relevance of digit-tracking for basic and applied research on vision and attention.
Affiliation(s)
- Yidong Yang: Key Laboratory of Brain, Cognition and Education, Ministry of Education, South China Normal University, Guangzhou 510631, China; Institute of Cognitive Sciences Marc Jeannerod, CNRS UMR 5229, 69675 Bron, France
- Lei Mo: Key Laboratory of Brain, Cognition and Education, Ministry of Education, South China Normal University, Guangzhou 510631, China
- Guillaume Lio: IMind Center of Excellence for Autism, Le Vinatier Hospital, Bron, France
- Yulong Huang: Key Laboratory of Brain, Cognition and Education, Ministry of Education, South China Normal University, Guangzhou 510631, China; Institute of Cognitive Sciences Marc Jeannerod, CNRS UMR 5229, 69675 Bron, France
- Thomas Perret: Institute of Cognitive Sciences Marc Jeannerod, CNRS UMR 5229, 69675 Bron, France
- Angela Sirigu: Institute of Cognitive Sciences Marc Jeannerod, CNRS UMR 5229, 69675 Bron, France; IMind Center of Excellence for Autism, Le Vinatier Hospital, Bron, France
- Jean-René Duhamel: Institute of Cognitive Sciences Marc Jeannerod, CNRS UMR 5229, 69675 Bron, France
5
Chen S, Jiang M, Yang J, Zhao Q. Attention in Reasoning: Dataset, Analysis, and Modeling. IEEE Trans Pattern Anal Mach Intell 2022; 44:7310-7326. [PMID: 34550881] [DOI: 10.1109/tpami.2021.3114582]
Abstract
While attention has been an increasingly popular component in deep neural networks to both interpret and boost the performance of models, little work has examined how attention progresses to accomplish a task and whether it is reasonable. In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes. We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling a quantitative measurement of attention that considers the reasoning process. We then collect human eye-tracking and answer correctness data, and analyze various machine and human attention mechanisms on their reasoning capability and how they impact task performance. To improve the attention and reasoning ability of visual question answering models, we propose to supervise the learning of attention progressively along the reasoning process and to differentiate the correct and incorrect attention patterns. We demonstrate the effectiveness of the proposed framework in analyzing and modeling attention with better reasoning capability and task performance. The code and data are available at https://github.com/szzexpoi/AiR.
6
Ktistakis E, Skaramagkas V, Manousos D, Tachos NS, Tripoliti E, Fotiadis DI, Tsiknakis M. COLET: A dataset for COgnitive workLoad estimation based on eye-tracking. Comput Methods Programs Biomed 2022; 224:106989. [PMID: 35870415] [DOI: 10.1016/j.cmpb.2022.106989]
Abstract
BACKGROUND AND OBJECTIVE: Cognitive workload is an important component in performance psychology, ergonomics, and human factors. Publicly available datasets are scarce, making it difficult to establish new approaches and comparative studies. In this work, COLET, a COgnitive workLoad estimation dataset based on Eye-Tracking, is presented.
METHODS: Forty-seven (47) individuals' eye movements were monitored as they solved puzzles involving visual search activities of varying complexity and duration. The participants' cognitive workload level was evaluated with the subjective NASA-TLX test, and this score was used as an annotation of the activity. Extensive data analysis was performed to derive eye and gaze features from the low-level recorded eye metrics, and a range of machine learning models was evaluated and tested for estimating the cognitive workload level.
RESULTS: The activities induced four different levels of cognitive workload. Multitasking and time pressure induced a higher level of cognitive workload than single tasking and the absence of time pressure. Multitasking had a significant effect on 17 eye features, while time pressure had a significant effect on 7 eye features. Both binary and multi-class identification attempts were performed by testing a variety of well-known classifiers, yielding encouraging results for cognitive workload estimation, with up to 88% correct predictions between low and high cognitive workload.
CONCLUSIONS: Machine learning analysis demonstrated potential in discriminating cognitive workload levels using only eye-tracking characteristics. The proposed dataset includes a much higher sample size and a wider spectrum of eye and gaze metrics than other similar datasets, allowing for the examination of their relations with various cognitive states.
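The evaluation pipeline described above (derive eye and gaze features per activity, label each activity with a NASA-TLX-based workload level, then test standard classifiers) can be sketched as follows. This is a minimal illustration with synthetic placeholder data; the feature count, sample size, and classifier choice are assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of binary workload classification from eye-tracking features,
# in the spirit of the COLET evaluation (not the authors' actual code or features).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Placeholder data: rows = activities, columns = derived eye/gaze features
# (e.g., mean fixation duration, saccade amplitude, pupil diameter, blink rate).
X = rng.normal(size=(188, 12))      # 47 participants x 4 activities (illustrative only)
y = rng.integers(0, 2, size=188)    # 0 = low workload, 1 = high workload (NASA-TLX split, assumed)

clf = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=200, random_state=0))
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.2f}")  # COLET reports up to ~0.88 with real features
```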
Affiliation(s)
- Emmanouil Ktistakis: Institute of Computer Science, Foundation for Research and Technology Hellas (FORTH), GR-700 13 Heraklion, Greece; Laboratory of Optics and Vision, School of Medicine, University of Crete, GR-710 03 Heraklion, Greece
- Vasileios Skaramagkas: Institute of Computer Science, Foundation for Research and Technology Hellas (FORTH), GR-700 13 Heraklion, Greece; Dept. of Electrical and Computer Engineering, Hellenic Mediterranean University, GR-710 04 Heraklion, Crete, Greece
- Dimitris Manousos: Institute of Computer Science, Foundation for Research and Technology Hellas (FORTH), GR-700 13 Heraklion, Greece
- Nikolaos S Tachos: Biomedical Research Institute, FORTH, GR-451 10 Ioannina, Greece; Dept. of Materials Science and Engineering, Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, GR-451 10 Ioannina, Greece
- Evanthia Tripoliti: Dept. of Materials Science and Engineering, Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, GR-451 10 Ioannina, Greece
- Dimitrios I Fotiadis: Biomedical Research Institute, FORTH, GR-451 10 Ioannina, Greece; Dept. of Materials Science and Engineering, Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, GR-451 10 Ioannina, Greece
- Manolis Tsiknakis: Institute of Computer Science, Foundation for Research and Technology Hellas (FORTH), GR-700 13 Heraklion, Greece; Dept. of Electrical and Computer Engineering, Hellenic Mediterranean University, GR-710 04 Heraklion, Crete, Greece
7
Chakraborty S, Samaras D, Zelinsky GJ. Weighting the factors affecting attention guidance during free viewing and visual search: The unexpected role of object recognition uncertainty. J Vis 2022; 22:13. [PMID: 35323870] [PMCID: PMC8963662] [DOI: 10.1167/jov.22.4.13]
Abstract
The factors determining how attention is allocated during visual tasks have been studied for decades, but few studies have attempted to model the weighting of several of these factors within and across tasks to better understand their relative contributions. Here we consider the roles of saliency, center bias, target features, and object recognition uncertainty in predicting the first nine changes in fixation made during free viewing and visual search tasks in the OSIE and COCO-Search18 datasets, respectively. We focus on the latter-most and least familiar of these factors by proposing a new method of quantifying uncertainty in an image, one based on object recognition. We hypothesize that the greater the number of object categories competing for an object proposal, the greater the uncertainty of how that object should be recognized and, hence, the greater the need for attention to resolve this uncertainty. As expected, we found that target features best predicted target-present search, with their dominance obscuring the use of other features. Unexpectedly, we found that target features were only weakly used during target-absent search. We also found that object recognition uncertainty outperformed an unsupervised saliency model in predicting free-viewing fixations, although saliency was slightly more predictive of search. We conclude that uncertainty in object recognition, a measure that is image computable and highly interpretable, is better than bottom-up saliency in predicting attention during free viewing.
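The core intuition, that the more object categories compete for an object proposal, the more uncertain its recognition, can be illustrated with a simple entropy score over class probabilities. The sketch below is an assumed, generic formulation for illustration only; the paper's exact uncertainty measure may differ.

```python
# Minimal sketch of recognition-uncertainty scoring for object proposals:
# higher entropy over candidate categories = more categories competing = more uncertainty.
import numpy as np

def recognition_uncertainty(class_probs: np.ndarray) -> float:
    """Shannon entropy (in nats) of a proposal's class-probability vector."""
    p = np.clip(class_probs, 1e-12, 1.0)
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())

confident = np.array([0.90, 0.05, 0.03, 0.02])   # one category dominates
ambiguous = np.array([0.30, 0.28, 0.22, 0.20])   # several categories compete

print(recognition_uncertainty(confident))  # ~0.43
print(recognition_uncertainty(ambiguous))  # ~1.37 -> higher priority for attention
```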
Affiliation(s)
- Dimitris Samaras: Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Gregory J Zelinsky: Department of Psychology, Stony Brook University, Stony Brook, NY, USA; Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
8
Lu T, Tang M, Guo Y, Zhou C, Zhao Q, You X. Effect of video game experience on the simulated flight task: the role of attention and spatial orientation. Australian Journal of Psychology 2022. [DOI: 10.1080/00049530.2021.2007736]
Affiliation(s)
- Tianjiao Lu: Student Mental Health Education Center, Northwestern Polytechnical University, Xi'an, Shaanxi, China
- Menghan Tang: Shaanxi Key Laboratory of Behavior and Cognitive Neuroscience, The Institute of Psychology, Shaanxi Normal University, Xi'an, Shaanxi, China
- Yu Guo: Shaanxi Key Laboratory of Behavior and Cognitive Neuroscience, The Institute of Psychology, Shaanxi Normal University, Xi'an, Shaanxi, China
- Chenchen Zhou: Shaanxi Key Laboratory of Behavior and Cognitive Neuroscience, The Institute of Psychology, Shaanxi Normal University, Xi'an, Shaanxi, China
- Qingxian Zhao: Shaanxi Key Laboratory of Behavior and Cognitive Neuroscience, The Institute of Psychology, Shaanxi Normal University, Xi'an, Shaanxi, China
- Xuqun You: Shaanxi Key Laboratory of Behavior and Cognitive Neuroscience, The Institute of Psychology, Shaanxi Normal University, Xi'an, Shaanxi, China
9
Review of Visual Saliency Prediction: Development Process from Neurobiological Basis to Deep Models. Applied Sciences (Basel) 2021. [DOI: 10.3390/app12010309]
Abstract
The human attention mechanism can be understood and simulated by closely associating the saliency prediction task to neuroscience and psychology. Furthermore, saliency prediction is widely used in computer vision and interdisciplinary subjects. In recent years, with the rapid development of deep learning, deep models have made amazing achievements in saliency prediction. Deep learning models can automatically learn features, thus solving many drawbacks of the classic models, such as handcrafted features and task settings, among others. Nevertheless, the deep models still have some limitations, for example in tasks involving multi-modality and semantic understanding. This study focuses on summarizing the relevant achievements in the field of saliency prediction, including the early neurological and psychological mechanisms and the guiding role of classic models, followed by the development process and data comparison of classic and deep saliency prediction models. This study also discusses the relationship between the model and human vision, as well as the factors that cause the semantic gaps, the influences of attention in cognitive research, the limitations of the saliency model, and the emerging applications, to provide new saliency predictions for follow-up work and the necessary help and advice.
10
Frey M, Nau M, Doeller CF. Magnetic resonance-based eye tracking using deep neural networks. Nat Neurosci 2021; 24:1772-1779. [PMID: 34750593] [PMCID: PMC10097595] [DOI: 10.1038/s41593-021-00947-w]
Abstract
Viewing behavior provides a window into many central aspects of human cognition and health, and it is an important variable of interest or confound in many functional magnetic resonance imaging (fMRI) studies. To make eye tracking freely and widely available for MRI research, we developed DeepMReye, a convolutional neural network (CNN) that decodes gaze position from the magnetic resonance signal of the eyeballs. It performs cameraless eye tracking at subimaging temporal resolution in held-out participants with little training data and across a broad range of scanning protocols. Critically, it works even in existing datasets and when the eyes are closed. Decoded eye movements explain network-wide brain activity also in regions not associated with oculomotor function. This work emphasizes the importance of eye tracking for the interpretation of fMRI results and provides an open source software solution that is widely applicable in research and clinical settings.
Affiliation(s)
- Markus Frey: Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Matthias Nau: Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Christian F Doeller: Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; Institute of Psychology, Leipzig University, Leipzig, Germany
11
Mattingly S, Hardesty E, Chovanec K, Cobos ME, Garcia J, Grizzle M, Huerta A, Ohtake J, Romero-Alvarez D, Gonzalez VH. Differences Between Attached and Detached Cadaveric Prosections on Students' Identification Ability During Practical Examinations. Anat Sci Educ 2021; 14:808-815. [PMID: 33037784] [DOI: 10.1002/ase.2023]
Abstract
Cadaveric prosections are effective learning tools in anatomy education. They range from a fully dissected, sometimes plastinated, complete cadaver (in situ prosections), to a single, carefully dissected structure detached from a cadaver (ex situ prosections). While most research has focused on the advantages and disadvantages of dissection versus prosection, limited information is available on the instructional efficacy of different prosection types. This contribution explored potential differences between in situ and ex situ prosections regarding the ability of undergraduate students to identify anatomical structures. To determine if students were able to recognize the same anatomical structure on both in situ and ex situ prosections, or on either one individually, six structures were tagged on both prosection types as part of three course summative examinations. The majority of students (61%-68%) fell into one of the two categories: those that recognized or failed to recognize the same structure on both in situ and ex situ prosections. The percentage of students who recognized a selected structure on only one type of prosection was small (1.6%-31.6%), but skewed in favor of ex situ prosections (P ≤ 0.01). These results suggest that overall students' identification ability was due to knowledge differences, not the spatial or contextual challenges posed by each type of prosection. They also suggest that the relative difficulty of either prosection type depends on the nature of the anatomical structure. Thus, one type of prosection might be more appropriate for teaching some structures, and therefore the use of both types is recommended.
Affiliation(s)
- Spencer Mattingly: Department of Ecology and Evolutionary Biology, College of Liberal Arts and Sciences, University of Kansas, Lawrence, Kansas
- Elizabeth Hardesty: Department of Clinical, Health and Applied Sciences, College of Human Sciences and Humanities, University of Houston-Clear Lake, Houston, Texas
- Kevin Chovanec: Department of Ecology and Evolutionary Biology, College of Liberal Arts and Sciences, University of Kansas, Lawrence, Kansas
- Marlon E Cobos: Department of Ecology and Evolutionary Biology, College of Liberal Arts and Sciences, University of Kansas, Lawrence, Kansas
- Meghan Grizzle: Department of Geospatial Information System Technology, University of Wyoming, Laramie, Wyoming
- Amanda Huerta: School of Nursing, University of Kansas Medical Center, Kansas City, Kansas
- Jesse Ohtake: Department of Physical Therapy and Rehabilitation Science, School of Health Professions, University of Kansas, Kansas City, Kansas
- Daniel Romero-Alvarez: Department of Ecology and Evolutionary Biology, College of Liberal Arts and Sciences, University of Kansas, Lawrence, Kansas
- Victor H Gonzalez: Department of Ecology and Evolutionary Biology, College of Liberal Arts and Sciences, University of Kansas, Lawrence, Kansas; Undergraduate Biology Program, College of Liberal Arts and Sciences, University of Kansas, Lawrence, Kansas
12
Peacock CE, Cronin DA, Hayes TR, Henderson JM. Meaning and expected surfaces combine to guide attention during visual search in scenes. J Vis 2021; 21:1. [PMID: 34609475] [PMCID: PMC8496418] [DOI: 10.1167/jov.21.11.1]
Abstract
How do spatial constraints and meaningful scene regions interact to control overt attention during visual search for objects in real-world scenes? To answer this question, we combined novel surface maps of the likely locations of target objects with maps of the spatial distribution of scene semantic content. The surface maps captured likely target surfaces as continuous probabilities. Meaning was represented by meaning maps highlighting the distribution of semantic content in local scene regions. Attention was indexed by eye movements during the search for target objects that varied in the likelihood they would appear on specific surfaces. The interaction between surface maps and meaning maps was analyzed to test whether fixations were directed to meaningful scene regions on target-related surfaces. Overall, meaningful scene regions were more likely to be fixated if they appeared on target-related surfaces than if they appeared on target-unrelated surfaces. These findings suggest that the visual system prioritizes meaningful scene regions on target-related surfaces during visual search in scenes.
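One way to picture the combination being tested here is as two spatial maps whose interaction yields a fixation-priority map. The sketch below uses random placeholder maps and an elementwise product purely for illustration; the combination rule, the map values, and the target-surface example are assumptions, not the paper's actual analysis.

```python
# Illustrative sketch: combining a meaning map with a target-surface map into a
# single priority map. The elementwise product is one plausible rule, assumed here.
import numpy as np

rng = np.random.default_rng(1)
h, w = 64, 96

meaning_map = rng.random((h, w))        # semantic density of local scene regions (placeholder)
surface_map = np.zeros((h, w))          # likelihood that the target occurs on each surface
surface_map[40:, :] = 0.8               # e.g., counter-top region for a "mug" target (assumed)
surface_map[:40, :] = 0.1               # wall/ceiling region: unlikely target surface

priority = meaning_map * surface_map    # meaningful regions on target-related surfaces win
peak_y, peak_x = np.unravel_index(priority.argmax(), priority.shape)
print(f"Predicted first fixation near (x={peak_x}, y={peak_y})")
```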
Affiliation(s)
- Candace E Peacock: Center for Mind and Brain, University of California, Davis, Davis, CA, USA; Department of Psychology, University of California, Davis, Davis, CA, USA
- Deborah A Cronin: Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Taylor R Hayes: Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- John M Henderson: Center for Mind and Brain, University of California, Davis, Davis, CA, USA; Department of Psychology, University of California, Davis, Davis, CA, USA
13
Nan Z, Jiang J, Gao X, Zhou S, Zuo W, Wei P, Zheng N. Predicting Task-Driven Attention via Integrating Bottom-Up Stimulus and Top-Down Guidance. IEEE Trans Image Process 2021; 30:8293-8305. [PMID: 34559654] [DOI: 10.1109/tip.2021.3113799]
Abstract
Task-free attention has gained intensive interest in the computer vision community while relatively few works focus on task-driven attention (TDAttention). Thus this paper handles the problem of TDAttention prediction in daily scenarios where a human is doing a task. Motivated by the cognition mechanism that human attention allocation is jointly controlled by the top-down guidance and bottom-up stimulus, this paper proposes a cognitively-explanatory deep neural network model to predict TDAttention. Given an image sequence, bottom-up features, such as human pose and motion, are firstly extracted. At the same time, the coarse-grained task information and fine-grained task information are embedded as a top-down feature. The bottom-up features are then fused with the top-down feature to guide the model to predict TDAttention. Two public datasets are re-annotated to make them qualified for TDAttention prediction, and our model is widely compared with other models on the two datasets. In addition, some ablation studies are conducted to evaluate the individual modules in our model. Experiment results demonstrate the effectiveness of our model.
14
Saliency-Based Gaze Visualization for Eye Movement Analysis. Sensors 2021; 21:5178. [PMID: 34372413] [PMCID: PMC8348507] [DOI: 10.3390/s21155178]
Abstract
Gaze movement and visual stimuli have been utilized to analyze human visual attention intuitively. Gaze behavior studies mainly show statistical analyses of eye movements and human visual attention. During these analyses, eye movement data and the saliency map are presented to the analysts as separate views or merged views. However, the analysts become frustrated when they need to memorize all of the separate views or when the eye movements obscure the saliency map in the merged views. Therefore, it is not easy to analyze how visual stimuli affect gaze movements since existing techniques focus excessively on the eye movement data. In this paper, we propose a novel visualization technique for analyzing gaze behavior using saliency features as visual clues to express the visual attention of an observer. The visual clues that represent visual attention are analyzed to reveal which saliency features are prominent for the visual stimulus analysis. We visualize the gaze data with the saliency features to interpret the visual attention. We analyze the gaze behavior with the proposed visualization to evaluate that our approach to embedding saliency features within the visualization supports us to understand the visual attention of an observer.
15
Hu Z, Bulling A, Li S, Wang G. FixationNet: Forecasting Eye Fixations in Task-Oriented Virtual Environments. IEEE Trans Vis Comput Graph 2021; 27:2681-2690. [PMID: 33750707] [DOI: 10.1109/tvcg.2021.3067779]
Abstract
Human visual attention in immersive virtual reality (VR) is key for many important applications, such as content design, gaze-contingent rendering, or gaze-based interaction. However, prior works typically focused on free-viewing conditions that have limited relevance for practical applications. We first collect eye tracking data of 27 participants performing a visual search task in four immersive VR environments. Based on this dataset, we provide a comprehensive analysis of the collected data and reveal correlations between users' eye fixations and other factors, i.e. users' historical gaze positions, task-related objects, saliency information of the VR content, and users' head rotation velocities. Based on this analysis, we propose FixationNet - a novel learning-based model to forecast users' eye fixations in the near future in VR. We evaluate the performance of our model for free-viewing and task-oriented settings and show that it outperforms the state of the art by a large margin of 19.8% (from a mean error of 2.93° to 2.35°) in free-viewing and of 15.1% (from 2.05° to 1.74°) in task-oriented situations. As such, our work provides new insights into task-oriented attention in virtual environments and guides future work on this important topic in VR research.
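The quoted percentage gains follow directly from the reported mean angular errors; a quick worked check:

```python
# Worked check of the relative improvements quoted above (mean angular error in degrees).
def relative_improvement(baseline_deg: float, model_deg: float) -> float:
    return (baseline_deg - model_deg) / baseline_deg * 100.0

print(f"Free viewing:  {relative_improvement(2.93, 2.35):.1f}%")  # ~19.8%
print(f"Task-oriented: {relative_improvement(2.05, 1.74):.1f}%")  # ~15.1%
```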
16
Zelinsky GJ, Chen Y, Ahn S, Adeli H, Yang Z, Huang L, Samaras D, Hoai M. Predicting Goal-directed Attention Control Using Inverse-Reinforcement Learning. Neurons, Behavior, Data Analysis, and Theory 2021. [PMID: 34164631] [PMCID: PMC8218820] [DOI: 10.51628/001c.22322]
Abstract
Understanding how goals control behavior is a question ripe for interrogation by new methods from machine learning. These methods require large and labeled datasets to train models. To annotate a large-scale image dataset with observed search fixations, we collected 16,184 fixations from people searching for either microwaves or clocks in a dataset of 4,366 images (MS-COCO). We then used this behaviorally-annotated dataset and the machine learning method of inverse-reinforcement learning (IRL) to learn target-specific reward functions and policies for these two target goals. Finally, we used these learned policies to predict the fixations of 60 new behavioral searchers (clock = 30, microwave = 30) in a disjoint test dataset of kitchen scenes depicting both a microwave and a clock (thus controlling for differences in low-level image contrast). We found that the IRL model predicted behavioral search efficiency and fixation-density maps using multiple metrics. Moreover, reward maps from the IRL model revealed target-specific patterns that suggest, not just attention guidance by target features, but also guidance by scene context (e.g., fixations along walls in the search of clocks). Using machine learning and the psychologically meaningful principle of reward, it is possible to learn the visual features used in goal-directed attention control.
Affiliation(s)
- Gregory J. Zelinsky: Department of Psychology, Stony Brook University, Stony Brook, NY, 11794, USA; Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
- Yupei Chen: Department of Psychology, Stony Brook University, Stony Brook, NY, 11794, USA
- Seoyoung Ahn: Department of Psychology, Stony Brook University, Stony Brook, NY, 11794, USA
- Hossein Adeli: Department of Psychology, Stony Brook University, Stony Brook, NY, 11794, USA
- Zhibo Yang: Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
- Lihan Huang: Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
- Dimitrios Samaras: Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
- Minh Hoai: Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
17
Gibson BC, Heinrich M, Mullins TS, Yu AB, Hansberger JT, Clark VP. Baseline Differences in Anxiety Affect Attention and tDCS-Mediated Learning. Front Hum Neurosci 2021; 15:541369. [PMID: 33746721] [PMCID: PMC7965943] [DOI: 10.3389/fnhum.2021.541369]
Abstract
Variable responses to transcranial direct current stimulation (tDCS) protocols across individuals are widely reported, but the reasons behind this variation are unclear. This includes tDCS protocols meant to improve attention. Attentional control is impacted by top-down and bottom-up processes, and this relationship is affected by state characteristics such as anxiety. According to Attentional Control Theory, anxiety biases attention towards bottom-up and stimulus-driven processing. The goal of this study was to explore the extent to which differences in state anxiety and related measures affect visual attention and category learning, both with and without the influence of tDCS. Using discovery learning, participants were trained to classify pictures of European streets into two categories while receiving 30 min of 2.0 mA anodal, cathodal, or sham tDCS over the rVLPFC. The pictures were classifiable according to two separate rules, one stimulus and one hypothesis-driven. The Remote Associates Test (RAT), Profile of Mood States, and Attention Networks Task (ANT) were used to understand the effects of individual differences at baseline on subsequent tDCS-mediated learning. Multinomial logistic regression was fit to predict rule learning based on the baseline measures, with subjects classified according to whether they used the stimulus-driven or hypothesis-driven rule to classify the pictures. The overall model showed a classification accuracy of 74.1%. The type of tDCS stimulation applied, attentional orienting score, and self-reported mood were significant predictors of different categories of rule learning. These results indicate that anxiety can influence the quality of subjects' attention at the onset of the task and that these attentional differences can influence tDCS-mediated category learning during the rapid assessment of visual scenes. These findings have implications for understanding the complex interactions that give rise to the variability in response to tDCS.
Affiliation(s)
- Benjamin C. Gibson: Department of Psychology, Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM, United States; The Mind Research Network of the Lovelace Biomedical Research Institute, University of New Mexico, Albuquerque, NM, United States
- Melissa Heinrich: Department of Psychology, Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM, United States
- Teagan S. Mullins: Department of Psychology, Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM, United States
- Alfred B. Yu: DEVCOM Army Research Laboratory, Human Research and Engineering Directorate, Aberdeen Proving Ground, MD, United States
- Jeffrey T. Hansberger: DEVCOM Army Research Laboratory, Human Research and Engineering Directorate, Aberdeen Proving Ground, MD, United States
- Vincent P. Clark: Department of Psychology, Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM, United States; The Mind Research Network of the Lovelace Biomedical Research Institute, University of New Mexico, Albuquerque, NM, United States
18
Tatler BW. Searching in CCTV: effects of organisation in the multiplex. Cogn Res Princ Implic 2021; 6:11. [PMID: 33599890] [PMCID: PMC7892658] [DOI: 10.1186/s41235-021-00277-2]
Abstract
CCTV plays a prominent role in public security, health and safety. Monitoring large arrays of CCTV camera feeds is a visually and cognitively demanding task. Arranging the scenes by geographical proximity in the surveilled environment has been recommended to reduce this demand, but empirical tests of this method have failed to find any benefit. The present study tests an alternative method for arranging scenes, based on psychological principles from literature on visual search and scene perception: grouping scenes by semantic similarity. Searching for a particular scene in the array-a common task in reactive and proactive surveillance-was faster when scenes were arranged by semantic category. This effect was found only when scenes were separated by gaps for participants who were not made aware that scenes in the multiplex were grouped by semantics (Experiment 1), but irrespective of whether scenes were separated by gaps or not for participants who were made aware of this grouping (Experiment 2). When target frequency varied between scene categories-mirroring unequal distributions of crime over space-the benefit of organising scenes by semantic category was enhanced for scenes in the most frequently searched-for category, without any statistical evidence for a cost when searching for rarely searched-for categories (Experiment 3). The findings extend current understanding of the role of within-scene semantics in visual search, to encompass between-scene semantic relationships. Furthermore, the findings suggest that arranging scenes in the CCTV control room by semantic category is likely to assist operators in finding specific scenes during surveillance.
Affiliation(s)
- Benjamin W Tatler: School of Psychology, University of Aberdeen, Aberdeen, AB24 3FX, Scotland, UK
19
Bennett CR, Bex PJ, Merabet LB. Assessing visual search performance using a novel dynamic naturalistic scene. J Vis 2021; 21:5. [PMID: 33427871] [PMCID: PMC7804579] [DOI: 10.1167/jov.21.1.5]
Abstract
Daily activities require the constant searching and tracking of visual targets in dynamic and complex scenes. Classic work assessing visual search performance has been dominated by the use of simple geometric shapes, patterns, and static backgrounds. Recently, there has been a shift toward investigating visual search in more naturalistic dynamic scenes using virtual reality (VR)-based paradigms. In this direction, we have developed a first-person perspective VR environment combined with eye tracking for the capture of a variety of objective measures. Participants were instructed to search for a preselected human target walking in a crowded hallway setting. Performance was quantified based on saccade and smooth pursuit ocular motor behavior. To assess the effect of task difficulty, we manipulated factors of the visual scene, including crowd density (i.e., number of surrounding distractors) and the presence of environmental clutter. In general, results showed a pattern of worsening performance with increasing crowd density. In contrast, the presence of visual clutter had no effect. These results demonstrate how visual search performance can be investigated using VR-based naturalistic dynamic scenes and with high behavioral relevance. This engaging platform may also have utility in assessing visual search in a variety of clinical populations of interest.
Affiliation(s)
- Christopher R Bennett: The Laboratory for Visual Neuroplasticity, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
- Peter J Bex: Translational Vision Lab, Department of Psychology, Northeastern University, Boston, MA, USA
- Lotfi B Merabet: The Laboratory for Visual Neuroplasticity, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
20
Eghdam R, Ebrahimpour R, Zabbah I, Zabbah S. Inherent Importance of Early Visual Features in Attraction of Human Attention. Comput Intell Neurosci 2020; 2020:3496432. [PMID: 33488689] [PMCID: PMC7803287] [DOI: 10.1155/2020/3496432]
Abstract
Local contrasts attract human attention to different areas of an image. Studies have shown that orientation, color, and intensity are basic visual features whose contrasts attract our attention. Since these features are in different modalities, their contributions to the attraction of human attention are not easily comparable. In this study, we investigated the importance of these three features in the attraction of human attention in synthetic and natural images. Choosing 100% detectable contrast in each modality, we studied the competition between the different features. Psychophysics results showed that, although single features can be detected easily in all trials, when features were presented simultaneously in a stimulus, orientation always attracted subjects' attention. In addition, computational results showed that the orientation feature map is more informative about the pattern of human saccades in natural images. Finally, using optimization algorithms, we quantified the impact of each feature map in the construction of the final saliency map.
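The last step mentioned above, combining the three feature maps into a final saliency map with fitted weights, can be pictured as a simple weighted sum. The maps and weight values below are placeholders chosen for illustration; the study fits its weights to human saccade data with optimization algorithms, which is not reproduced here.

```python
# Simplified sketch of building a saliency map as a weighted sum of normalized
# orientation, color, and intensity feature maps (placeholder data and weights).
import numpy as np

rng = np.random.default_rng(2)
h, w = 60, 80
feature_maps = {
    "orientation": rng.random((h, w)),
    "color": rng.random((h, w)),
    "intensity": rng.random((h, w)),
}

def combine(maps: dict, weights: dict) -> np.ndarray:
    """Normalize each feature map to [0, 1] and take a weighted sum."""
    saliency = np.zeros((h, w))
    for name, m in maps.items():
        m_norm = (m - m.min()) / (m.max() - m.min() + 1e-12)
        saliency += weights[name] * m_norm
    return saliency / sum(weights.values())

# Illustrative weights only: the larger orientation weight mirrors the paper's finding
# that orientation contrast dominated, not its actual fitted values.
weights = {"orientation": 0.6, "color": 0.25, "intensity": 0.15}
saliency_map = combine(feature_maps, weights)
print(saliency_map.shape, float(saliency_map.max()))
```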
Affiliation(s)
- Reza Eghdam: Faculty of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran; School of Cognitive Sciences (SCS), Institute for Research in Fundamental Sciences (IPM), Niavaran, Tehran, Iran
- Reza Ebrahimpour: Faculty of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran; School of Cognitive Sciences (SCS), Institute for Research in Fundamental Sciences (IPM), Niavaran, Tehran, Iran
- Iman Zabbah: Department of Computer, Torbat-e-Heydariyeh Branch, Islamic Azad University, Torbat-e-Heydariyeh, Iran
- Sajjad Zabbah: School of Cognitive Sciences (SCS), Institute for Research in Fundamental Sciences (IPM), Niavaran, Tehran, Iran
|
21
Analyzing Walkability Through Biometrics: Insights Into Sustainable Transportation Through the Use of Eye-Tracking Emulation Software. J Phys Act Health 2020; 17:1153-1161. [PMID: 33035992] [DOI: 10.1123/jpah.2020-0127]
Abstract
BACKGROUND: Understanding more about the unseen side of our responses to visual stimuli offers a powerful new tool for transportation planning. Traditional transportation planning tends to focus on the mobility of vehicles rather than on opportunities to encourage sustainable transport modes, like walking.
METHODS: Using eye-tracking emulation software, this study measured the unconscious visual responses people have to designs and layouts in new built environments, focusing on what makes streets most walkable.
RESULTS: The study found key differences between the way the brain takes in conventional automobile-oriented residential developments versus new urbanist layouts, with the former lacking key fixation points.
CONCLUSION: These differences help explain why new urbanist layouts promote walking effortlessly while conventional automobile-oriented residential developments do not.
22
Abstract
In visual search tasks, observers look for targets among distractors. In the lab, this often takes the form of multiple searches for a simple shape that may or may not be present among other items scattered at random on a computer screen (e.g., Find a red T among other letters that are either black or red.). In the real world, observers may search for multiple classes of target in complex scenes that occur only once (e.g., As I emerge from the subway, can I find lunch, my friend, and a street sign in the scene before me?). This article reviews work on how search is guided intelligently. I ask how serial and parallel processes collaborate in visual search, describe the distinction between search templates in working memory and target templates in long-term memory, and consider how searches are terminated.
Affiliation(s)
- Jeremy M. Wolfe: Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts 02115, USA; Department of Radiology, Harvard Medical School, Boston, Massachusetts 02115, USA; Visual Attention Lab, Brigham & Women's Hospital, Cambridge, Massachusetts 02139, USA
23
Dadwhal YS, Kumar S, Sardana HK. Supervised framework for top-down color interest point detection. SN Applied Sciences 2020. [DOI: 10.1007/s42452-020-3189-y]
24
Biologically Inspired Visual System Architecture for Object Recognition in Autonomous Systems. Algorithms 2020. [DOI: 10.3390/a13070167]
Abstract
Computer vision is currently one of the most exciting and rapidly evolving fields of science, which affects numerous industries. Research and development breakthroughs, mainly in the field of convolutional neural networks (CNNs), opened the way to unprecedented sensitivity and precision in object detection and recognition tasks. Nevertheless, the findings in recent years on the sensitivity of neural networks to additive noise, light conditions, and to the wholeness of the training dataset, indicate that this technology still lacks the robustness needed for the autonomous robotic industry. In an attempt to bring computer vision algorithms closer to the capabilities of a human operator, the mechanisms of the human visual system was analyzed in this work. Recent studies show that the mechanisms behind the recognition process in the human brain include continuous generation of predictions based on prior knowledge of the world. These predictions enable rapid generation of contextual hypotheses that bias the outcome of the recognition process. This mechanism is especially advantageous in situations of uncertainty, when visual input is ambiguous. In addition, the human visual system continuously updates its knowledge about the world based on the gaps between its prediction and the visual feedback. CNNs are feed forward in nature and lack such top-down contextual attenuation mechanisms. As a result, although they process massive amounts of visual information during their operation, the information is not transformed into knowledge that can be used to generate contextual predictions and improve their performance. In this work, an architecture was designed that aims to integrate the concepts behind the top-down prediction and learning processes of the human visual system with the state-of-the-art bottom-up object recognition models, e.g., deep CNNs. The work focuses on two mechanisms of the human visual system: anticipation-driven perception and reinforcement-driven learning. Imitating these top-down mechanisms, together with the state-of-the-art bottom-up feed-forward algorithms, resulted in an accurate, robust, and continuously improving target recognition model.
25
Yang Z, Huang L, Chen Y, Wei Z, Ahn S, Zelinsky G, Samaras D, Hoai M. Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2020; 2020:190-199. [PMID: 34163124] [PMCID: PMC8218821] [DOI: 10.1109/cvpr42600.2020.00027]
Abstract
Human gaze behavior prediction is important for behavioral vision and for computer vision applications. Most models mainly focus on predicting free-viewing behavior using saliency maps, but do not generalize to goal-directed behavior, such as when a person searches for a visual target object. We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search. We modeled the viewer's internal belief states as dynamic contextual belief maps of object locations. These maps were learned and then used to predict behavioral scanpaths for multiple target categories. To train and evaluate our IRL model we created COCO-Search18, which is now the largest dataset of high-quality search fixations in existence. COCO-Search18 has 10 participants searching for each of 18 target-object categories in 6202 images, making about 300,000 goal-directed fixations. When trained and evaluated on COCO-Search18, the IRL model outperformed baseline models in predicting search fixation scanpaths, both in terms of similarity to human search behavior and search efficiency. Finally, reward maps recovered by the IRL model reveal distinctive target-dependent patterns of object prioritization, which we interpret as a learned object context.
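To make the role of a learned reward map concrete, the toy sketch below rolls out a fixation scanpath greedily from a reward map with inhibition of return. This is an assumed downstream use for illustration only; it is not the IRL training procedure or the authors' scanpath-generation method.

```python
# Toy sketch: rolling out a fixation scanpath from a (stand-in) target-specific
# reward map by repeatedly moving to the most rewarding location and suppressing
# already-visited regions (inhibition of return).
import numpy as np

def rollout_scanpath(reward_map: np.ndarray, n_fixations: int = 6, radius: int = 5):
    reward = reward_map.copy()
    h, w = reward.shape
    yy, xx = np.mgrid[0:h, 0:w]
    path = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(reward.argmax(), reward.shape)
        path.append((int(x), int(y)))
        reward[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = -np.inf  # inhibition of return
    return path

rng = np.random.default_rng(3)
toy_reward = rng.random((40, 60))   # placeholder for a learned, target-specific reward map
print(rollout_scanpath(toy_reward))
```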
26
Mahdi A, Qin J, Crosby G. DeepFeat: A Bottom-Up and Top-Down Saliency Model Based on Deep Features of Convolutional Neural Networks. IEEE Trans Cogn Dev Syst 2020. [DOI: 10.1109/tcds.2019.2894561]
28
Cronin DA, Hall EH, Goold JE, Hayes TR, Henderson JM. Eye Movements in Real-World Scene Photographs: General Characteristics and Effects of Viewing Task. Front Psychol 2020; 10:2915. [PMID: 32010016] [PMCID: PMC6971407] [DOI: 10.3389/fpsyg.2019.02915]
Abstract
The present study examines eye movement behavior in real-world scenes with a large (N = 100) sample. We report baseline measures of eye movement behavior in our sample, including mean fixation duration, saccade amplitude, and initial saccade latency. We also characterize how eye movement behaviors change over the course of a 12 s trial. These baseline measures will be of use to future work studying eye movement behavior in scenes in a variety of literatures. We also examine effects of viewing task on when and where the eyes move in real-world scenes: participants engaged in a memorization and an aesthetic judgment task while viewing 100 scenes. While we find no difference at the mean-level between the two tasks, temporal- and distribution-level analyses reveal significant task-driven differences in eye movement behavior.
Collapse
Affiliation(s)
- Deborah A. Cronin
- Center for Mind and Brain, University of California, Davis, Davis, CA, United States
| | - Elizabeth H. Hall
- Center for Mind and Brain, University of California, Davis, Davis, CA, United States
- Department of Psychology, University of California, Davis, Davis, CA, United States
| | - Jessica E. Goold
- Center for Mind and Brain, University of California, Davis, Davis, CA, United States
| | - Taylor R. Hayes
- Center for Mind and Brain, University of California, Davis, Davis, CA, United States
| | - John M. Henderson
- Center for Mind and Brain, University of California, Davis, Davis, CA, United States
- Department of Psychology, University of California, Davis, Davis, CA, United States
| |
Collapse
|
29
|
Changing perspectives on goal-directed attention control: The past, present, and future of modeling fixations during visual search. PSYCHOLOGY OF LEARNING AND MOTIVATION 2020. [DOI: 10.1016/bs.plm.2020.08.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
|
30
|
|
31
|
Wolfe JM, Utochkin IS. What is a preattentive feature? Curr Opin Psychol 2019; 29:19-26. [PMID: 30472539 PMCID: PMC6513732 DOI: 10.1016/j.copsyc.2018.11.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2018] [Revised: 11/01/2018] [Accepted: 11/08/2018] [Indexed: 11/30/2022]
Abstract
The concept of a preattentive feature has been central to vision and attention research for about half a century. A preattentive feature is a feature that guides attention in visual search and that cannot be decomposed into simpler features. While that definition seems straightforward, there is no simple diagnostic test that infallibly identifies a preattentive feature. This paper briefly reviews the criteria that have been proposed and illustrates some of the difficulties of definition.
Collapse
Affiliation(s)
- Jeremy M Wolfe
- Corresponding author. Visual Attention Lab, Department of Surgery, Brigham & Women's Hospital, and Departments of Ophthalmology and Radiology, Harvard Medical School, 64 Sidney St., Suite 170, Cambridge, MA 02139-4170, USA
| | - Igor S Utochkin
- National Research University Higher School of Economics, 101000, Armyansky per. 4, Moscow, Russian Federation
| |
Collapse
|
32
|
Zhang D, Zakir A. Top–Down Saliency Detection Based on Deep-Learned Features. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS 2019. [DOI: 10.1142/s1469026819500093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
How to localize objects in images accurately and efficiently is a challenging problem in computer vision. In this paper, a novel top–down fine-grained salient object detection method based on deep-learned features is proposed, which can detect in the input image the same object that appears in the query image. The query image and its three subsampled images are used as top–down cues to guide saliency detection. We adapt a convolutional neural network (CNN), using the fast VGG network (VGG-f) pre-trained on ImageNet and re-trained on the Pascal VOC 2012 dataset. An experiment on the FiFA dataset demonstrates that the proposed method can localize the salient region and find the specific object (e.g., a human face) given as the query. Experiments on the David1 and Face1 sequences show that the proposed algorithm effectively handles many challenging factors, including illumination change, shape deformation, scale change, and partial occlusion.
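A minimal sketch of the general idea (scoring each location of a deep convolutional feature map by its similarity to the query's pooled feature) is shown below. Using torchvision's VGG16 as a stand-in for VGG-f, random weights, and cosine similarity are assumptions made for a self-contained example, not the authors' configuration.

    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16

    # stand-in backbone: convolutional part of VGG16 with random weights
    # (in practice pretrained weights would be loaded; the paper uses VGG-f)
    backbone = vgg16(weights=None).features.eval()

    def topdown_saliency(image, query):
        """Cosine similarity between the query's pooled deep feature and
        each spatial position of the image's deep feature map."""
        with torch.no_grad():
            img_feat = backbone(image)                     # (1, C, H', W')
            qry_feat = backbone(query).mean(dim=(2, 3))    # (1, C) global average pool
        img_feat = F.normalize(img_feat, dim=1)
        qry_feat = F.normalize(qry_feat, dim=1)
        saliency = (img_feat * qry_feat[:, :, None, None]).sum(dim=1)  # (1, H', W')
        return saliency

    if __name__ == "__main__":
        image = torch.randn(1, 3, 224, 224)   # placeholder input image
        query = torch.randn(1, 3, 64, 64)     # placeholder query crop
        print(topdown_saliency(image, query).shape)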
Collapse
Affiliation(s)
- Duzhen Zhang
- School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, Jiangsu, P. R. China
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, P. R. China
| | - Ali Zakir
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, P. R. China
| |
Collapse
|
33
|
Abstract
When searching real-world scenes, human attention is guided by knowledge of the plausible size of the target object (if an object is six feet tall, it isn't your cat). Computer algorithms typically do not do this, but perhaps they should.
Collapse
Affiliation(s)
- Jeremy M Wolfe
- Professor of Ophthalmology & Radiology, Harvard Medical School, and Visual Attention Laboratory, Department of Surgery, Brigham & Women's Hospital, 64 Sidney Street Suite 170, Cambridge, MA 02139-4170, USA.
| |
Collapse
|
34
|
Feature-based guidance of attention by visual working memory is applied independently of remembered object location. Atten Percept Psychophys 2019; 82:98-108. [PMID: 31140137 DOI: 10.3758/s13414-019-01759-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Visual working memory (VWM) has been implicated both in the online representation of object tokens (in the object-file framework) and in the top-down guidance of attention during visual search, implementing a feature template. It is well established that object representations in VWM are structured by location, with access to the content of VWM modulated by position consistency. In the present study, we examined whether this property generalizes to the guidance of attention. Specifically, in two experiments, we probed whether the guidance of spatial attention from features in VWM is modulated by the position of the object from which these features were encoded. Participants remembered an object with an incidental color. Items in a subsequent search array could match either the color of the remembered object, the location, or both. Robust benefits of color match (when the matching item was the target) and costs (when the matching item was a distractor) were observed. Critically, the magnitude of neither effect was influenced by spatial correspondence. The results demonstrate that features in VWM influence attentional priority maps in a manner that does not necessarily inherit the spatial structure of the object representations in which those features are maintained.
Collapse
|
35
|
Ohmatsu S, Takamura Y, Fujii S, Tanaka K, Morioka S, Kawashima N. Visual search pattern during free viewing of horizontally flipped images in patients with unilateral spatial neglect. Cortex 2019; 113:83-95. [PMID: 30620921 DOI: 10.1016/j.cortex.2018.11.029] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2018] [Revised: 07/12/2018] [Accepted: 11/30/2018] [Indexed: 10/27/2022]
Abstract
Eye tracking is an effective tool for identifying behavioural aspects of unilateral spatial neglect (USN), which is a common neurological syndrome that develops after a right hemisphere lesion. Here, we attempted to elucidate how the neglect symptom affects the symmetry of the gaze pattern, by performing an analysis of gaze distribution during the free viewing of a pair of horizontally flipped images. Based on their Behavioural Inattention Test (BIT) scores, 41 patients with right-hemisphere damage were classified into those with USN (n = 27) and those without USN (right hemisphere damaged - RHD; n = 14). Eye movement was recorded while the patients viewed six pairs of horizontally flipped images on a computer display. A pair of flipped images has both similar and consistent elements, as well as a reversed spatial location of objects (right-left). We calculated the gaze distribution, extent of gaze shift, total gaze distance, and gaze velocity in each direction. Our results demonstrated a significantly larger rightward gaze shift in the USN group, which showed a significant correlation with the BIT score. More importantly, the extent of gaze shift and total gaze distance were similarly modulated by the contents of the displayed images in both the USN and RHD groups. Our findings suggest that analyses of gaze distribution during the free viewing of a pair of horizontally flipped images have the potential to precisely reveal neglect behaviour, and our results provide important implications for rehabilitation.
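A toy sketch of one of the simpler measures mentioned above, the horizontal (rightward) gaze shift averaged over an original/flipped image pair, is given below; the sample values, units, and averaging scheme are illustrative assumptions rather than the authors' analysis code.

    import numpy as np

    def horizontal_gaze_shift(gaze_x, screen_width):
        """Mean horizontal gaze position relative to the screen centre
        (positive = rightward shift), as a fraction of screen width."""
        return (np.mean(gaze_x) - screen_width / 2) / screen_width

    screen_width = 1280
    gaze_original = np.array([900, 950, 870, 1010, 880])   # toy samples, original image
    gaze_flipped  = np.array([860, 940, 910, 990, 900])    # toy samples, mirrored image

    # averaging over the flipped pair controls for image-driven left/right asymmetries
    shift = (horizontal_gaze_shift(gaze_original, screen_width)
             + horizontal_gaze_shift(gaze_flipped, screen_width)) / 2
    print("mean rightward gaze shift (fraction of screen width):", round(shift, 3))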
Collapse
Affiliation(s)
- Satoko Ohmatsu
- Graduate School of Health Sciences, Kio University, Nara, Japan; Department of Rehabilitation for the Movement Functions, Research Institute, National Rehabilitation Center for Persons with Disabilities, Saitama, Japan
| | - Yusaku Takamura
- Graduate School of Health Sciences, Kio University, Nara, Japan; Murata Hospital, Osaka, Japan
| | - Shintaro Fujii
- Graduate School of Health Sciences, Kio University, Nara, Japan; Nishiyamato Rehabilitation Hospital, Nara, Japan
| | - Kohei Tanaka
- Shizuoka Rehabilitation Hospital, Shizuoka, Japan
| | - Shu Morioka
- Graduate School of Health Sciences, Kio University, Nara, Japan; Neurorehabilitation Research Center, Kio University, Nara, Japan
| | - Noritaka Kawashima
- Department of Rehabilitation for the Movement Functions, Research Institute, National Rehabilitation Center for Persons with Disabilities, Saitama, Japan; Neurorehabilitation Research Center, Kio University, Nara, Japan.
| |
Collapse
|
36
|
Litchfield D, Donovan T. Expecting the initial glimpse: prior target knowledge activation or repeated search does not eliminate scene preview search benefits. JOURNAL OF COGNITIVE PSYCHOLOGY 2019. [DOI: 10.1080/20445911.2018.1555163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Affiliation(s)
| | - Tim Donovan
- Medical & Sport Sciences, University of Cumbria, Carlisle, UK
| |
Collapse
|
37
|
Learning to Perform Visual Tasks from Human Demonstrations. PATTERN RECOGNITION AND IMAGE ANALYSIS 2019. [DOI: 10.1007/978-3-030-31321-0_30] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
38
|
Nakano T, Miyazaki Y. Blink synchronization is an indicator of interest while viewing videos. Int J Psychophysiol 2018; 135:1-11. [PMID: 30428333 DOI: 10.1016/j.ijpsycho.2018.10.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2018] [Revised: 10/22/2018] [Accepted: 10/25/2018] [Indexed: 10/27/2022]
Abstract
The temporal pattern of spontaneous blinks changes greatly depending on an individual's internal cognitive state. For instance, when several individuals watch the same video, blinks can be synchronized at attentional breakpoints. The present study examined whether the degree of this blink synchronization reflects the viewer's level of interest while watching various video clips. In the first experiment, participants interested in soccer, shogi (Japanese chess), or a specific musical group watched a video clip related to each category and rated their interest level after viewing. Results revealed that blink synchronization increased with the rated interest level for the soccer and shogi video clips. Moreover, while blink synchronization increased when soccer and music-group fans viewed their preferred video clips, synchronization decreased when they viewed videos from the other categories; this was not the case for the shogi fans. In contrast, blink rates did not correlate with interest in the video content but instead varied with the number of shot transitions. In the second experiment, participants viewed a video in which a professional salesperson described several products for a few minutes each. When participants reported an interest in a product, their blinks were synchronized to the salesperson's blinks; when they were uninterested, blink synchronization did not occur. These results suggest that blink synchronization could be used as an involuntary index of a person's interest.
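The authors' exact synchronization measure is not reproduced here, but one simple way to quantify blink synchrony across viewers is sketched below: bin each participant's blink onsets into a time series and correlate it with the average series of the remaining participants. The bin width, leave-one-out scheme, and toy data are assumptions for illustration.

    import numpy as np

    def blink_sync_index(blink_onsets_per_subject, duration_s, bin_s=0.5):
        """Leave-one-out blink synchrony: correlate each subject's binned
        blink time series with the mean series of all other subjects."""
        n_bins = int(np.ceil(duration_s / bin_s))
        edges = np.arange(n_bins + 1) * bin_s
        series = np.array([np.histogram(onsets, bins=edges)[0]
                           for onsets in blink_onsets_per_subject], dtype=float)
        scores = []
        for i in range(len(series)):
            others = np.delete(series, i, axis=0).mean(axis=0)
            if series[i].std() == 0 or others.std() == 0:
                continue   # skip degenerate (constant) series
            scores.append(np.corrcoef(series[i], others)[0, 1])
        return float(np.mean(scores))

    rng = np.random.default_rng(1)
    # toy data: 5 viewers whose blinks cluster loosely around shared breakpoints
    breakpoints = np.array([10.0, 25.0, 42.0])
    blinks = [np.sort(np.concatenate([breakpoints + rng.normal(0, 0.4, 3),
                                      rng.uniform(0, 60, 4)]))
              for _ in range(5)]
    print("blink synchrony:", round(blink_sync_index(blinks, duration_s=60), 3))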
Collapse
Affiliation(s)
- Tamami Nakano
- Graduate School of Frontiers Bioscience, Osaka University, Osaka, Japan; Graduate School of Medicine, Osaka University, Osaka, Japan; JST PRESTO, Japan.
| | - Yuta Miyazaki
- Graduate School of Information, Osaka University, Osaka, Japan
| |
Collapse
|
39
|
Mahdi A, Su M, Schlesinger M, Qin J. A Comparison Study of Saliency Models for Fixation Prediction on Infants and Adults. IEEE Trans Cogn Dev Syst 2018. [DOI: 10.1109/tcds.2017.2696439] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
40
|
Rahman IMH, Hollitt C, Zhang M. Feature Map Quality Score Estimation Through Regression. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:1793-1808. [PMID: 29346095 DOI: 10.1109/tip.2017.2785623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Understanding the visual quality of a feature map plays a significant role in many active vision applications. Previous works mostly rely on object-level features, such as compactness, to estimate the quality score of a feature map. However, compactness is computed on feature maps produced by salient object detection techniques, where the maps tend to be compact. As a result, the compactness feature fails when the feature maps are blurry (e.g., fixation maps). In this paper, we regard the process of estimating the quality score of feature maps, specifically fixation maps, as a regression problem. After extracting several local, global, geometric, and positional characteristic features from a feature map, a model is learned using a random forest regressor to estimate the quality score of any unseen feature map. Our model is specifically tailored to estimate the quality of three types of maps: bottom-up, target, and contextual feature maps. These maps are produced for a large benchmark fixation data set of more than 900 challenging outdoor images. We demonstrate that our approach provides an accurate estimate of the quality of these feature maps when compared against ground-truth data. In addition, we show that our proposed approach is useful in feature map integration for predicting human fixation. Instead of naively integrating all three feature maps when predicting human fixation, our proposed approach dynamically selects the feature map with the highest estimated quality score on an individual image basis, thereby improving fixation prediction accuracy.
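A minimal sketch of this regression framing: compute a handful of global and positional descriptors for each feature map and fit a random forest regressor to predict a quality score. The particular descriptors, the synthetic maps, and the placeholder target scores are assumptions, not the authors' feature set or benchmark data.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def map_features(fmap):
        """A few simple global/positional descriptors of a 2-D feature map."""
        ys, xs = np.indices(fmap.shape)
        total = fmap.sum() + 1e-9
        cy = (ys * fmap).sum() / total / fmap.shape[0]   # normalized centroid y
        cx = (xs * fmap).sum() / total / fmap.shape[1]   # normalized centroid x
        return [fmap.mean(), fmap.std(), fmap.max(), cy, cx]

    rng = np.random.default_rng(0)
    maps = [rng.random((32, 32)) ** p for p in rng.uniform(1, 8, size=200)]
    X = np.array([map_features(m) for m in maps])
    y = rng.random(200)   # placeholder quality scores (e.g., agreement with ground truth)

    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    print("predicted quality of first map:", round(model.predict(X[:1])[0], 3))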
Collapse
|
41
|
The right look for the job: decoding cognitive processes involved in the task from spatial eye-movement patterns. PSYCHOLOGICAL RESEARCH 2018; 84:245-258. [PMID: 29464316 DOI: 10.1007/s00426-018-0996-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2017] [Accepted: 02/19/2018] [Indexed: 10/18/2022]
Abstract
The aim of the study was not only to demonstrate whether eye-movement-based task decoding was possible but also to investigate whether eye-movement patterns can be used to identify the cognitive processes behind the tasks. We compared eye-movement patterns elicited under different task conditions, with tasks differing systematically with regard to the types of cognitive processes involved in solving them. We used four tasks, differing along two dimensions: spatial (global vs. local) processing (Navon, Cognit Psychol, 9(3):353-383, 1977) and semantic (deep vs. shallow) processing (Craik and Lockhart, J Verbal Learn Verbal Behav, 11(6):671-684, 1972). We used eye-movement patterns obtained from two time periods: the fixation cross preceding the target stimulus and the target stimulus itself. We found significant effects of both spatial and semantic processing, but in the case of the latter, the effect might be an artefact of insufficient task control. We found above-chance task classification accuracy for both time periods: 51.4% for the period of stimulus presentation and 34.8% for the period of fixation cross presentation. Therefore, we show that the task can be decoded to some extent from preparatory eye movements made before the stimulus is displayed. This suggests that anticipatory eye movements reflect the visual scanning strategy employed for the task at hand. Finally, this study also demonstrates that decoding is possible even from very scant eye-movement data, similar to Coco and Keller, J Vis 14(3):11-11 (2014). This means that task decoding is not limited to tasks that naturally take longer to perform and yield multi-second eye-movement recordings.
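A minimal sketch of eye-movement-based task decoding in the spirit of this study: summarize each trial with a few eye-movement features and estimate classification accuracy with cross-validation. The synthetic features, injected signal, and choice of classifier are illustrative assumptions, not the study's actual pipeline.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_trials = 200
    # placeholder trial features: e.g. mean fixation duration, fixation count,
    # mean saccade amplitude, gaze dispersion
    X = rng.normal(size=(n_trials, 4))
    y = rng.integers(0, 4, size=n_trials)          # 4 tasks (global/local x deep/shallow)
    X[np.arange(n_trials), y] += 1.0               # inject a weak task-dependent signal

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)
    print("decoding accuracy: %.2f (chance = 0.25)" % scores.mean())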
Collapse
|
42
|
Hutson JP, Smith TJ, Magliano JP, Loschky LC. What is the role of the film viewer? The effects of narrative comprehension and viewing task on gaze control in film. COGNITIVE RESEARCH-PRINCIPLES AND IMPLICATIONS 2017; 2:46. [PMID: 29214207 PMCID: PMC5698392 DOI: 10.1186/s41235-017-0080-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/29/2016] [Accepted: 10/04/2017] [Indexed: 11/23/2022]
Abstract
Film is ubiquitous, but the processes that guide viewers’ attention while viewing film narratives are poorly understood. In fact, many film theorists and practitioners disagree on whether the film stimulus (bottom-up) or the viewer (top-down) is more important in determining how we watch movies. Reading research has shown a strong connection between eye movements and comprehension, and scene perception studies have shown strong effects of viewing tasks on eye movements, but such idiosyncratic top-down control of gaze in film would be anathema to the universal control mainstream filmmakers typically aim for. Thus, in two experiments we tested whether the eye movements and comprehension relationship similarly held in a classic film example, the famous opening scene of Orson Welles’ Touch of Evil (Welles & Zugsmith, Touch of Evil, 1958). Comprehension differences were compared with more volitionally controlled task-based effects on eye movements. To investigate the effects of comprehension on eye movements during film viewing, we manipulated viewers’ comprehension by starting participants at different points in a film, and then tracked their eyes. Overall, the manipulation created large differences in comprehension, but only produced modest differences in eye movements. To amplify top-down effects on eye movements, a task manipulation was designed to prioritize peripheral scene features: a map task. This task manipulation created large differences in eye movements when compared to participants freely viewing the clip for comprehension. Thus, to allow for strong, volitional top-down control of eye movements in film, task manipulations need to make features that are important to narrative comprehension irrelevant to the viewing task. The evidence provided by this experimental case study suggests that filmmakers’ belief in their ability to create systematic gaze behavior across viewers is confirmed, but that this does not indicate universally similar comprehension of the film narrative.
Collapse
Affiliation(s)
- John P Hutson
- Department of Psychological Sciences, Kansas State University, 492 Bluemont Hall, 1100 Mid-campus Dr, Manhattan, KS 66506 USA
| | - Tim J Smith
- Department of Psychological Sciences, Birkbeck, University of London, Malet St, London, WC1E 7HX UK
| | - Joseph P Magliano
- Department of Psychology, Northern Illinois University, 361 Psychology-Computer Science Building, DeKalb, IL 60115 USA
| | - Lester C Loschky
- Department of Psychological Sciences, Kansas State University, 492 Bluemont Hall, 1100 Mid-campus Dr, Manhattan, KS 66506 USA
| |
Collapse
|
43
|
When is it time to move to the next map? Optimal foraging in guided visual search. Atten Percept Psychophys 2017; 78:2135-51. [PMID: 27192994 DOI: 10.3758/s13414-016-1128-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Suppose that you are looking for visual targets in a set of images, each containing an unknown number of targets. How do you perform that search, and how do you decide when to move from the current image to the next? Optimal foraging theory predicts that foragers should leave the current image when the expected value from staying falls below the expected value from leaving. Here, we describe how to apply these models to more complex tasks, like search for objects in natural scenes where people have prior beliefs about the number and locations of targets in each image, and search is guided by target features and scene context. We model these factors in a guided search task and predict the optimal time to quit search. The data come from a satellite image search task. Participants searched for small gas stations in large satellite images. We model quitting times with a Bayesian model that incorporates prior beliefs about the number of targets in each map, average search efficiency (guidance), and actual search history in the image. Clicks deploying local magnification were used as surrogates for deployments of attention and, thus, for time. Leaving times (measured in mouse clicks) were well-predicted by the model. People terminated search when their expected rate of target collection fell to the average rate for the task. Apparently, people follow a rate-optimizing strategy in this task and use both their prior knowledge and search history in the image to decide when to quit searching.
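The rate-optimizing quitting rule can be sketched very compactly: leave the current image as soon as the instantaneous expected yield of the next click drops below the task-wide average rate. The simple geometric decay of the expected yield below is an illustrative assumption standing in for the paper's Bayesian model.

    def quit_time(prior_targets, guidance, average_rate, max_clicks=50):
        """Simulate a simple foraging rule: after each click, the expected number
        of undiscovered targets shrinks; quit when the instantaneous expected
        yield per click drops below the task's average rate."""
        remaining = prior_targets
        for click in range(1, max_clicks + 1):
            # expected yield of the next click: the share of remaining targets
            # that a single (guided) click is expected to uncover
            instantaneous_rate = guidance * remaining
            if instantaneous_rate < average_rate:
                return click                   # leave this image now
            remaining -= instantaneous_rate    # account for expected finds so far
        return max_clicks

    # toy example: ~3 expected targets, each click uncovers 20% of what remains,
    # and the task-wide average rate is 0.25 targets per click
    print("leave after click:", quit_time(prior_targets=3.0, guidance=0.2, average_rate=0.25))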
Collapse
|
44
|
Abstract
Humans are remarkably capable of finding desired objects in the world, despite the scale and complexity of naturalistic environments. Broadly, this ability is supported by an interplay between exploratory search and guidance from episodic memory for previously observed target locations. Here we examined how the environment itself may influence this interplay. In particular, we examined how partitions in the environment (like buildings, rooms, and furniture) can impact memory during repeated search. We report that the presence of partitions in a display, independent of item configuration, reliably improves episodic memory for item locations. Repeated search through partitioned displays was faster overall and was characterized by more rapid ballistic orienting in later repetitions. Explicit recall was also both faster and more accurate when displays were partitioned. Finally, we found that search paths were more regular and systematic when displays were partitioned. Given the ubiquity of partitions in real-world environments, these results provide important insights into the mechanisms of naturalistic search and its relation to memory.
Collapse
|
45
|
|
46
|
Bahle B, Matsukura M, Hollingworth A. Contrasting gist-based and template-based guidance during real-world visual search. J Exp Psychol Hum Percept Perform 2017; 44:367-386. [PMID: 28795834 DOI: 10.1037/xhp0000468] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Visual search through real-world scenes is guided both by a representation of target features and by knowledge of the semantic properties of the scene (derived from scene gist recognition). In 3 experiments, we compared the relative roles of these 2 sources of guidance. Participants searched for a target object in the presence of a critical distractor object. The color of the critical distractor either matched or mismatched (a) the color of an item maintained in visual working memory for a secondary task (Experiment 1), or (b) the color of the target, cued by a picture before search commenced (Experiments 2 and 3). Capture of gaze by a matching distractor served as an index of template guidance. There were 4 main findings: (a) The distractor match effect was observed from the first saccade on the scene, (b) it was independent of the availability of scene-level gist-based guidance, (c) it was independent of whether the distractor appeared in a plausible location for the target, and (d) it was preserved even when gist-based guidance was available before scene onset. Moreover, gist-based, semantic guidance of gaze to target-plausible regions of the scene was delayed relative to template-based guidance. These results suggest that feature-based template guidance is not limited to plausible scene regions after an initial, scene-level analysis.
Collapse
Affiliation(s)
- Brett Bahle
- Department of Psychological and Brain Sciences, The University of Iowa
| | - Michi Matsukura
- Department of Psychological and Brain Sciences, The University of Iowa
| | | |
Collapse
|
47
|
Amor TA, Luković M, Herrmann HJ, Andrade JS. Influence of scene structure and content on visual search strategies. J R Soc Interface 2017; 14:rsif.2017.0406. [PMID: 28747401 DOI: 10.1098/rsif.2017.0406] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2017] [Accepted: 06/30/2017] [Indexed: 11/12/2022] Open
Abstract
When searching for a target within an image, our brain can adopt different strategies, but which one does it choose? This question can be answered by tracking the motion of the eye while it executes the task. Following many individuals performing various search tasks, we distinguish between two competing strategies. Motivated by these findings, we introduce a model that captures the interplay of the search strategies and allows us to create artificial eye-tracking trajectories, which could be compared with the experimental ones. Identifying the model parameters allows us to quantify the strategy employed in terms of ensemble averages, characterizing each experimental cohort. In this way, we can discern with high sensitivity the relation between the visual landscape and the average strategy, disclosing how small variations in the image induce changes in the strategy.
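The authors' model is not reproduced here, but the flavor of generating artificial eye-tracking trajectories from a mixture of two strategies can be sketched as a random walk that alternates between short local scanning steps and occasional long relocating jumps, with the mixing probability as the single strategy parameter. All numerical choices below are illustrative assumptions.

    import numpy as np

    def artificial_trajectory(n_steps, p_jump, rng, size=1.0,
                              local_step=0.02, jump_scale=0.3):
        """Random walk mixing short local steps with occasional long jumps."""
        pos = np.array([0.5, 0.5]) * size
        path = [pos.copy()]
        for _ in range(n_steps):
            if rng.random() < p_jump:
                step = rng.normal(0, jump_scale * size, 2)   # long relocating jump
            else:
                step = rng.normal(0, local_step * size, 2)   # short local scanning step
            pos = np.clip(pos + step, 0, size)               # stay inside the image
            path.append(pos.copy())
        return np.array(path)

    rng = np.random.default_rng(0)
    traj = artificial_trajectory(n_steps=100, p_jump=0.15, rng=rng)
    print("mean saccade length:", np.linalg.norm(np.diff(traj, axis=0), axis=1).mean())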
Collapse
Affiliation(s)
- Tatiana A Amor
- Computational Physics IfB, ETH Zurich, Stefano-Franscini-Platz 3, 8093, Zurich, Switzerland; Departamento de Física, Universidade Federal do Ceará, 60451-970, Fortaleza, Ceará, Brazil
| | - Mirko Luković
- Computational Physics IfB, ETH Zurich, Stefano-Franscini-Platz 3, 8093, Zurich, Switzerland
| | - Hans J Herrmann
- Computational Physics IfB, ETH Zurich, Stefano-Franscini-Platz 3, 8093, Zurich, Switzerland; Departamento de Física, Universidade Federal do Ceará, 60451-970, Fortaleza, Ceará, Brazil
| | - José S Andrade
- Departamento de Física, Universidade Federal do Ceará, 60451-970, Fortaleza, Ceará, Brazil
| |
Collapse
|
48
|
How do targets, nontargets, and scene context influence real-world object detection? Atten Percept Psychophys 2017; 79:2021-2036. [PMID: 28660468 DOI: 10.3758/s13414-017-1359-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
49
|
Abstract
Speakers' perception of a visual scene influences the language they use to describe it: which objects they choose to mention and how they characterize the relationships between them. We show that visual complexity can either delay or facilitate description generation, depending on how much disambiguating information is required and how useful the scene's complexity can be in providing, for example, helpful landmarks. To do so, we measure speech onset times, eye gaze, and utterance content in a reference production experiment in which the target object is either unique or non-unique in a visual scene of varying size and complexity. Speakers delay speech onset if the target object is non-unique and requires disambiguation, and we argue that this reflects the cost of deciding on a high-level strategy for describing it. The eye-tracking data demonstrate that these delays increase when speakers are able to conduct an extensive early visual search, implying that when speakers scan too little of the scene early on, they may decide to begin speaking before becoming aware that their description is underspecified. Speakers' content choices reflect the visual makeup of the scene: the number of distractors present and the availability of useful landmarks. Our results highlight the complex role of visual perception in reference production, showing that speakers can make good use of complexity in ways that reflect their visual processing of the scene.
Collapse
Affiliation(s)
- Micha Elsner
- Department of Linguistics, The Ohio State University
| | | | - Hannah Rohde
- Department of Linguistics and English Language, University of Edinburgh
| |
Collapse
|
50
|
R. Tavakoli H, Borji A, Laaksonen J, Rahtu E. Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.03.018] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|