1
Zhang M, Armendariz M, Xiao W, Rose O, Bendtz K, Livingstone M, Ponce C, Kreiman G. Look twice: A generalist computational model predicts return fixations across tasks and species. PLoS Comput Biol 2022; 18:e1010654. [PMID: 36413523] [PMCID: PMC9681066] [DOI: 10.1371/journal.pcbi.1010654]
Abstract
Primates constantly explore their surroundings via saccadic eye movements that bring different parts of an image into high resolution. In addition to exploring new regions in the visual field, primates also make frequent return fixations, revisiting previously foveated locations. We systematically studied a total of 44,328 return fixations out of 217,440 fixations. Return fixations were ubiquitous across different behavioral tasks, in monkeys and humans, both when subjects viewed static images and when subjects performed natural behaviors. Return fixation locations were consistent across subjects, tended to occur within short temporal offsets, and typically followed a 180-degree turn in saccadic direction. To understand the origin of return fixations, we propose a proof-of-principle, biologically-inspired and image-computable neural network model. The model combines five key modules: an image feature extractor, bottom-up saliency cues, task-relevant visual features, finite inhibition-of-return, and saccade size constraints. Even though there are no free parameters that are fine-tuned for each specific task, species, or condition, the model produces fixation sequences resembling the universal properties of return fixations. These results provide initial steps towards a mechanistic understanding of the trade-off between rapid foveal recognition and the need to scrutinize previous fixation locations.
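For orientation only, the sketch below illustrates the kind of fixation-selection loop the abstract describes: a priority map built from bottom-up saliency, task-relevant features, a saccade-size prior, and a decaying ("finite") inhibition-of-return map whose fading allows return fixations. It is not the authors' implementation; all function names, weightings, and parameter values are assumptions.

```python
import numpy as np

def next_fixation(saliency, task_relevance, ior, fix_xy, sigma_saccade=8.0):
    """One step of a generic fixation-selection loop.

    saliency, task_relevance, ior : 2-D arrays on the image grid (assumed to
        be produced elsewhere, e.g. by a feature-extractor front end).
    fix_xy : (row, col) of the current fixation.
    sigma_saccade : width of a Gaussian saccade-size prior (placeholder value).
    """
    h, w = saliency.shape
    rows, cols = np.mgrid[0:h, 0:w]
    dist2 = (rows - fix_xy[0]) ** 2 + (cols - fix_xy[1]) ** 2
    saccade_prior = np.exp(-dist2 / (2 * sigma_saccade ** 2))

    # Combine bottom-up, top-down, and saccade-amplitude cues, then suppress
    # recently visited locations with the inhibition-of-return map.
    priority = (saliency + task_relevance) * saccade_prior * (1.0 - ior)
    return np.unravel_index(np.argmax(priority), priority.shape)

def update_ior(ior, fix_xy, sigma_ior=4.0, decay=0.9):
    """Stamp inhibition at the new fixation and let old inhibition decay, so
    previously visited locations can eventually win again (return fixations)."""
    h, w = ior.shape
    rows, cols = np.mgrid[0:h, 0:w]
    dist2 = (rows - fix_xy[0]) ** 2 + (cols - fix_xy[1]) ** 2
    stamp = np.exp(-dist2 / (2 * sigma_ior ** 2))
    return np.clip(decay * ior + stamp, 0.0, 1.0)
```

Iterating these two steps over a saliency map and a task-relevance map yields a fixation sequence in which suppressed locations become eligible again once the inhibition has decayed, which is the mechanism the abstract credits for return fixations.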
Affiliation(s)
- Mengmi Zhang
- Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, United States of America
- CFAR and I2R, Agency for Science, Technology and Research, Singapore
- Marcelo Armendariz
- Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, United States of America
- Laboratory for Neuro- and Psychophysiology, KU Leuven, Leuven, Belgium
- Will Xiao
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Olivia Rose
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Katarina Bendtz
- Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, United States of America
- Margaret Livingstone
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Carlos Ponce
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Gabriel Kreiman
- Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, United States of America
2
Zhang M, Feng J, Ma KT, Lim JH, Zhao Q, Kreiman G. Finding any Waldo with zero-shot invariant and efficient visual search. Nat Commun 2018; 9:3730. [PMID: 30213937] [PMCID: PMC6137219] [DOI: 10.1038/s41467-018-06217-x]
Abstract
Searching for a target object in a cluttered scene constitutes a fundamental challenge in daily vision. Visual search must be selective enough to discriminate the target from distractors, invariant to changes in the appearance of the target, efficient to avoid exhaustive exploration of the image, and must generalize to locate novel target objects with zero-shot training. Previous work on visual search has focused on searching for perfect matches of a target after extensive category-specific training. Here, we show for the first time that humans can efficiently and invariantly search for natural objects in complex scenes. To gain insight into the mechanisms that guide visual search, we propose a biologically inspired computational model that can locate targets without exhaustive sampling and which can generalize to novel objects. The model provides an approximation to the mechanisms integrating bottom-up and top-down signals during search in natural scenes.
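One common way to implement the zero-shot, template-based guidance the abstract alludes to is to cross-correlate the target's features with the scene's feature maps to obtain a top-down attention map. The sketch below shows only that generic idea; it is not the paper's architecture, and the array shapes and function names are assumptions.

```python
import numpy as np
from scipy.signal import correlate2d

def topdown_map(scene_feats, target_feats):
    """Channel-wise cross-correlation of target features with scene features.

    scene_feats : (C, H, W) feature maps of the search image.
    target_feats : (C, h, w) feature maps of the target object.
    Returns an (H, W) guidance map, normalized to [0, 1].
    """
    c, H, W = scene_feats.shape
    attn = np.zeros((H, W))
    for ch in range(c):
        # Locations whose features resemble the target's respond strongly.
        attn += correlate2d(scene_feats[ch], target_feats[ch], mode="same")
    attn -= attn.min()
    return attn / (attn.max() + 1e-8)
```

Because the target enters only through its feature maps, the same machinery applies to objects never seen during training, which is the sense in which such guidance is "zero-shot".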
Affiliation(s)
- Mengmi Zhang
- Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore, 138632, Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore, 138632, Singapore
- Visual Intelligence Unit, Image/Video Analytics Dept, A*STAR, Singapore, 138632, Singapore
- Jiashi Feng
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore, 138632, Singapore
- Keng Teck Ma
- Artificial Intelligence Program, Agency for Science, Technology and Research, Singapore, 138632, Singapore
- Joo Hwee Lim
- Visual Intelligence Unit, Image/Video Analytics Dept, A*STAR, Singapore, 138632, Singapore
- Qi Zhao
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, 55455, USA
- Gabriel Kreiman
- Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA.
3
Searching the same display twice: Properties of short-term memory in repeated search. Atten Percept Psychophys 2013; 76:335-52. [DOI: 10.3758/s13414-013-0589-8]
4
Chanceaux M, Guérin-Dugué A, Lemaire B, Baccino T. A Computational Cognitive Model of Information Search in Textual Materials. Cognit Comput 2012. [DOI: 10.1007/s12559-012-9200-1]
5
Bisley JW, Mirpour K, Arcizet F, Ong WS. The role of the lateral intraparietal area in orienting attention and its implications for visual search. Eur J Neurosci 2011; 33:1982-90. [PMID: 21645094] [DOI: 10.1111/j.1460-9568.2011.07700.x]
Abstract
Orienting visual attention is of fundamental importance when viewing a visual scene. One of the areas thought to play a role in the guidance of this process is the posterior parietal cortex. In this review, we will describe the way the lateral intraparietal area (LIP) of the posterior parietal cortex acts as a priority map to help guide the allocation of covert attention and eye movements (overt attention). We will explain the concept of a priority map and then show that LIP activity is biased by both bottom-up stimulus-driven factors and top-down cognitive influences, and that this activity can be used to predict the locus of covert attention and initial saccadic latencies in simple visual search tasks. We will then describe evidence for how this system acts during covert visual search and how its activity could be used to optimize overt visual search performance.
Affiliation(s)
- James W Bisley
- Department of Neurobiology, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
6
Wolfe JM, Võ MLH, Evans KK, Greene MR. Visual search in scenes involves selective and nonselective pathways. Trends Cogn Sci 2011; 15:77-84. [PMID: 21227734] [DOI: 10.1016/j.tics.2010.12.001]
Abstract
How does one find objects in scenes? For decades, visual search models have been built on experiments in which observers search for targets, presented among distractor items, isolated and randomly arranged on blank backgrounds. Are these models relevant to search in continuous scenes? This article argues that the mechanisms that govern artificial, laboratory search tasks do play a role in visual search in scenes. However, scene-based information is used to guide search in ways that had no place in earlier models. Search in scenes might be best explained by a dual-path model: a 'selective' path in which candidate objects must be individually selected for recognition and a 'nonselective' path in which information can be extracted from global and/or statistical information.
Affiliation(s)
- Jeremy M Wolfe
- Brigham & Women's Hospital, Harvard Medical School, 64 Sidney St. Suite 170, Cambridge, MA 02139, USA.
7
Wolfe JM, Palmer EM, Horowitz TS. Reaction time distributions constrain models of visual search. Vision Res 2010; 50:1304-11. [PMID: 19895828] [PMCID: PMC2891283] [DOI: 10.1016/j.visres.2009.11.002]
Abstract
Many experiments have investigated visual search for simple stimuli like colored bars or alphanumeric characters. When eye movements are not a limiting factor, these tasks tend to produce roughly linear functions relating reaction time (RT) to the number of items in the display (set size). The slopes of the RT × set size functions for different searches fall on a continuum from highly efficient (slopes near zero) to inefficient (slopes > 25-30 ms/item). Many theories of search can produce the correct pattern of mean RTs. Producing the correct RT distributions is more difficult. In order to guide future modeling, we have collected a very large data set (about 112,000 trials) on three tasks: an efficient color feature search, an inefficient search for a 2 among 5s, and an intermediate color × orientation conjunction search. The RT distributions have interesting properties. For example, target absent distributions overlap target present more than would be expected if the decision to end search were based on a simple elapsed time threshold. Other qualitative properties of the RT distributions falsify some classes of model. For example, normalized RT distributions do not change shape as set size changes as a standard self-terminating model predicts that they should.
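To make the slope terminology concrete, here is a minimal simulation of a serial, self-terminating search with arbitrary timing parameters (everything here is an assumption, not the authors' model or data): mean target-present RT grows roughly linearly with set size, and the spread of the RT distribution grows with it, which is the kind of distributional prediction the abstract says can be tested against data.

```python
import numpy as np

rng = np.random.default_rng(0)

def serial_self_terminating_rts(set_size, n_trials=10000,
                                t_item=40.0, t_base=400.0, sd=20.0):
    """Simulate target-present RTs: the target sits at a random position,
    items are inspected one by one, and search stops at the target."""
    n_inspected = rng.integers(1, set_size + 1, size=n_trials)
    noise = rng.normal(0.0, sd, size=n_trials)
    return t_base + n_inspected * t_item + noise

# Mean RT rises roughly linearly with set size (slope ~ t_item / 2 on
# target-present trials), and the RT distribution widens as set size grows.
for n in (3, 6, 12, 18):
    rts = serial_self_terminating_rts(n)
    print(n, round(rts.mean(), 1), round(rts.std(), 1))
```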
Affiliation(s)
- Jeremy M Wolfe
- Visual Attention Laboratory, Brigham and Women's Hospital and Harvard Medical School, 64 Sidney Street, Suite 170, Cambridge, MA 02139-4170, United States.
8
Mirpour K, Arcizet F, Ong WS, Bisley JW. Been there, seen that: a neural mechanism for performing efficient visual search. J Neurophysiol 2009; 102:3481-91. [PMID: 19812286] [DOI: 10.1152/jn.00688.2009]
Abstract
In everyday life, we efficiently find objects in the world by moving our gaze from one location to another. The efficiency of this process is brought about by ignoring items that are dissimilar to the target and remembering which target-like items have already been examined. We trained two animals on a visual foraging task in which they had to find a reward-loaded target among five task-irrelevant distractors and five potential targets. We found that both animals performed the task efficiently, ignoring the distractors and rarely examining a particular target twice. We recorded the single unit activity of 54 neurons in the lateral intraparietal area (LIP) while the animals performed the task. The responses of the neurons differentiated between targets and distractors throughout the trial. Further, the responses marked off targets that had been fixated by a reduction in activity. This reduction acted like inhibition of return in saliency map models; items that had been fixated would no longer be represented by high enough activity to draw an eye movement. This reduction could also be seen as a correlate of reward expectancy; after a target had been identified as not containing the reward, the activity was reduced. Within a trial, responses to the remaining targets did not increase as they became more likely to yield a result, suggesting that only activity related to an event is updated on a moment-by-moment basis. Together, our data show that all the neural activity required to guide efficient search is present in LIP. Because LIP activity is known to correlate with saccade goal selection, we propose that LIP plays a significant role in the guidance of efficient visual search.
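The abstract's key computational idea, that fixated, non-rewarded targets are marked off by a drop in priority-map activity and therefore rarely draw a second saccade, can be caricatured in a few lines. The sketch below is a toy illustration under assumed numbers, not a model fitted to the recorded LIP responses.

```python
def forage(priorities, reward_item, suppression=0.2, max_fixations=20):
    """Toy foraging loop over discrete items: the most active item on the
    'priority map' is fixated next and, once examined without reward, its
    activity is knocked down so it is unlikely to draw another eye movement.
    All values are illustrative placeholders."""
    if reward_item not in priorities:
        raise ValueError("reward_item must be one of the display items")
    values = dict(priorities)          # item -> priority-map activity
    fixations = []
    for _ in range(max_fixations):
        item = max(values, key=values.get)
        fixations.append(item)
        if item == reward_item:        # reward found; trial ends
            break
        values[item] *= suppression    # 'been there, seen that' reduction
    return fixations

# Example: five target-like items (distractors assumed filtered out upstream).
print(forage({"T1": 0.9, "T2": 0.8, "T3": 0.7, "T4": 0.6, "T5": 0.5}, "T4"))
```

The loop visits each item at most once before finding the reward, mirroring the efficient, largely revisit-free search behavior described above.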
Affiliation(s)
- Koorosh Mirpour
- Department of Neurobiology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095-1763, USA.