51. van Renswoude DR, van den Berg L, Raijmakers ME, Visser I. Infants’ center bias in free viewing of real-world scenes. Vision Res 2019; 154:44-53. [DOI: 10.1016/j.visres.2018.10.003]
52. Li A, Chen Z. Representative Scanpath Identification for Group Viewing Pattern Analysis. J Eye Mov Res 2018; 11:10.16910/jemr.11.6.5. [PMID: 33828715] [PMCID: PMC7909138] [DOI: 10.16910/jemr.11.6.5]
Abstract
Scanpaths are composed of fixations and saccades. The viewing trends reflected by scanpaths play an important role in scientific studies such as saccadic model evaluation and in real-life applications such as artistic design. Several scanpath synthesis methods have been proposed to obtain a scanpath that is representative of the group viewing trend, but most of them either target a specific category of viewing materials, such as webpages, or leave out useful information such as gaze duration. Our previous work defined the representative scanpath as the barycenter of a group of scanpaths, which effectively shows the averaged shape of multiple scanpaths. In this paper, we extend our previous framework to take gaze duration into account, obtaining representative scanpaths that describe not only attention distribution and shift but also attention span. The extended framework consists of three steps: eye-gaze data preprocessing, scanpath aggregation, and gaze duration analysis. Experiments demonstrate that the framework serves the purpose of mining viewing patterns well and that "barycenter"-based representative scanpaths better characterize those patterns.
Affiliation(s)
- Aoqi Li
- Wuhan University, Wuhan, China
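As a rough illustration of the barycenter idea in [52], the sketch below resamples each observer's scanpath to a common length and averages positions and fixation durations point by point; the resampling scheme and the toy data are assumptions for illustration, not the authors' exact aggregation algorithm.

```python
import numpy as np

def resample_scanpath(fixations, n_points=10):
    """Resample a scanpath of (x, y, duration) fixations to n_points
    by linear interpolation along the fixation index."""
    fixations = np.asarray(fixations, dtype=float)            # shape (k, 3)
    src = np.linspace(0.0, 1.0, len(fixations))
    dst = np.linspace(0.0, 1.0, n_points)
    return np.column_stack([np.interp(dst, src, fixations[:, c]) for c in range(3)])

def barycenter_scanpath(scanpaths, n_points=10):
    """Average several resampled scanpaths point by point, yielding a
    'representative' scanpath with averaged positions and durations."""
    resampled = np.stack([resample_scanpath(sp, n_points) for sp in scanpaths])
    return resampled.mean(axis=0)

# Toy usage: three observers, fixations given as (x, y, duration_ms).
obs = [
    [(100, 120, 250), (300, 200, 180), (420, 260, 300)],
    [(90, 130, 220), (280, 210, 200), (430, 250, 260), (500, 300, 150)],
    [(110, 110, 240), (310, 190, 190), (410, 270, 280)],
]
print(barycenter_scanpath(obs, n_points=4))
```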
53. Berga D, Fdez-Vidal XR, Otazu X, Leborán V, Pardo XM. Psychophysical evaluation of individual low-level feature influences on visual attention. Vision Res 2018; 154:60-79. [PMID: 30408434] [DOI: 10.1016/j.visres.2018.10.006]
Abstract
In this study we analyze eye movement behavior elicited by low-level feature distinctiveness, using a dataset of synthetically generated image patterns. The design of the visual stimuli was inspired by those used in previous psychophysical experiments, namely free-viewing and visual-search tasks, giving a total of 15 types of stimuli divided according to the task and the feature to be analyzed. Our aim is to analyze the influence of low-level feature contrast between a salient region and the remaining distractors, characterizing fixation localization and the reaction time to land inside the salient region. Eye-tracking data were collected from 34 participants viewing a dataset of 230 images. Results show that saliency is predominantly and distinctively influenced by: 1. feature type, 2. feature contrast, 3. temporality of fixations, 4. task difficulty and 5. center bias. This experimentation proposes a new psychophysical basis for saliency model evaluation using synthetic images.
Affiliation(s)
- David Berga
- Computer Vision Center, Universitat Autonoma de Barcelona, Spain; Computer Science Department, Universitat Autonoma de Barcelona, Spain
- Xosé R Fdez-Vidal
- Centro de Investigacion en Tecnoloxias da Informacion, Universidade Santiago de Compostela, Spain
- Xavier Otazu
- Computer Vision Center, Universitat Autonoma de Barcelona, Spain; Computer Science Department, Universitat Autonoma de Barcelona, Spain
- Víctor Leborán
- Centro de Investigacion en Tecnoloxias da Informacion, Universidade Santiago de Compostela, Spain
- Xosé M Pardo
- Centro de Investigacion en Tecnoloxias da Informacion, Universidade Santiago de Compostela, Spain
54. Cohen-Lhyver B, Argentieri S, Gas B. The Head Turning Modulation System: An Active Multimodal Paradigm for Intrinsically Motivated Exploration of Unknown Environments. Front Neurorobot 2018; 12:60. [PMID: 30297995] [PMCID: PMC6160585] [DOI: 10.3389/fnbot.2018.00060]
Abstract
Over the last 20 years, a significant part of the research in exploratory robotics has shifted from finding the most efficient way of exploring an unknown environment to finding what could motivate a robot to explore it autonomously. Moreover, a growing literature focuses not only on the topological description of a space (dimensions, obstacles, usable paths, etc.) but also on more semantic components, such as the multimodal objects present in it. In the quest to design robots that behave autonomously through embedded life-long learning abilities, the inclusion of attention mechanisms is important. Indeed, whether endogenous or exogenous, attention constitutes a form of intrinsic motivation, for it can trigger motor commands toward specific stimuli, thus leading to an exploration of the space. The Head Turning Modulation model presented in this paper is composed of two modules that provide a robot with two different forms of intrinsic motivation, each leading to head movements toward audiovisual sources appearing in unknown environments. First, the Dynamic Weighting module implements a motivation based on the concept of Congruence, defined as an adaptive form of semantic saliency specific to each explored environment. Then, the Multimodal Fusion and Inference module implements a motivation based on the reduction of Uncertainty, through a self-supervised online learning algorithm that can autonomously determine local consistencies. One novelty of the proposed model is that it relies solely on semantic inputs (namely, the audio and visual labels the sources belong to), as opposed to the traditional analysis of the low-level characteristics of the perceived data. Another contribution lies in the way exploration is exploited to actively learn the relationship between the visual and auditory modalities. Importantly, the robot, endowed with binocular vision, binaural audition and a rotating head, does not have access to prior information about the different environments it will explore. Consequently, it has to learn in real time which audiovisual objects are of "importance" in order to rotate its head toward them. Results presented in this paper have been obtained in simulated environments as well as with a real robot in realistic experimental conditions.
Affiliation(s)
- Benjamin Cohen-Lhyver
- CNRS, Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, Paris, France
- Sylvain Argentieri
- CNRS, Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, Paris, France
- Bruno Gas
- CNRS, Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, Paris, France
55. Oculomotor behavior during non-visual tasks: The role of visual saliency. PLoS One 2018; 13:e0198242. [PMID: 29933381] [PMCID: PMC6014668] [DOI: 10.1371/journal.pone.0198242]
Abstract
Background: During visual exploration or free viewing, gaze positioning is largely determined by the tendency to maximize visual saliency: more salient locations are more likely to be fixated. However, when visual input is completely irrelevant for performance, such as with non-visual tasks, this saliency-maximization strategy may be less advantageous and potentially even disruptive for task performance. Here, we examined whether visual saliency remains a strong driving force in determining gaze positions even in non-visual tasks. We tested three alternative hypotheses: a) that saliency is disadvantageous for non-visual tasks, so gaze would tend to shift away from it and towards non-salient locations; b) that saliency is irrelevant during non-visual tasks, so gaze would not be directed towards it but also not away from it; c) that saliency maximization is a strong behavioral drive that would prevail even during non-visual tasks.
Methods: Gaze position was monitored as participants performed visual or non-visual tasks while they were presented with complex or simple images. The effect of attentional demands was examined by comparing an easy non-visual task with a more difficult one.
Results: Exploratory behavior was evident, regardless of task difficulty, even when the task was non-visual and the visual input was entirely irrelevant. The observed exploratory behaviors included a strong tendency to fixate salient locations, a central fixation bias, and a gradual reduction in saliency for later fixations. These exploratory behaviors were spatially similar to those of an explicit visual exploration task, but they were nevertheless attenuated. Temporal differences were also found: in the non-visual task there were longer fixations and later first fixations than in the visual task, reflecting slower visual sampling in this task.
Conclusion: We conclude that in the presence of a rich visual environment, visual exploration is evident even when there is no explicit instruction to explore. Compared to visually motivated tasks, exploration in non-visual tasks follows similar selection mechanisms but occurs at a lower rate. This is consistent with the view that the non-visual task is the equivalent of a dual task: it combines the instructed task with an uninstructed, perhaps even mandatory, exploratory behavior.
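A generic analysis sketch in the spirit of [55] (not the authors' actual pipeline): it reads normalized saliency values under each fixation and averages them by fixation rank, which is one simple way to quantify the tendency to fixate salient locations and its decline for later fixations.

```python
import numpy as np

def saliency_at_fixations(saliency_map, fixations):
    """Read the normalized saliency value under each fixation.
    saliency_map: 2-D array; fixations: list of (x, y) pixel coordinates."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-9)  # NSS-style normalization
    h, w = s.shape
    vals = []
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < w and 0 <= yi < h:
            vals.append(s[yi, xi])
    return np.array(vals)

def saliency_by_fixation_index(saliency_maps, all_fixations, max_index=8):
    """Average normalized saliency per fixation rank across trials, e.g. to test
    whether saliency at fixation decreases for later fixations."""
    per_rank = [[] for _ in range(max_index)]
    for smap, fix in zip(saliency_maps, all_fixations):
        vals = saliency_at_fixations(smap, fix)
        for i, v in enumerate(vals[:max_index]):
            per_rank[i].append(v)
    return [float(np.mean(r)) if r else float("nan") for r in per_rank]
```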
56. Sitzmann V, Serrano A, Pavel A, Agrawala M, Gutierrez D, Masia B, Wetzstein G. Saliency in VR: How Do People Explore Virtual Environments? IEEE Trans Vis Comput Graph 2018; 24:1633-1642. [PMID: 29553930] [DOI: 10.1109/tvcg.2018.2793599]
Abstract
Understanding how people explore immersive virtual environments is crucial for many applications, such as designing virtual reality (VR) content, developing new compression algorithms, or learning computational models of saliency or visual attention. Whereas a body of recent work has focused on modeling saliency in desktop viewing conditions, VR is very different from these conditions in that viewing behavior is governed by stereoscopic vision and by the complex interaction of head orientation, gaze, and other kinematic constraints. To further our understanding of viewing behavior and saliency in VR, we capture and analyze gaze and head orientation data of 169 users exploring stereoscopic, static omni-directional panoramas, for a total of 1980 head and gaze trajectories for three different viewing conditions. We provide a thorough analysis of our data, which leads to several important insights, such as the existence of a particular fixation bias, which we then use to adapt existing saliency predictors to immersive VR conditions. In addition, we explore other applications of our data and analysis, including automatic alignment of VR video cuts, panorama thumbnails, panorama video synopsis, and saliency-based compression.
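One way to use an observed fixation bias to adapt a conventional saliency predictor to panoramas, loosely following the idea in [56], is to re-weight the map with an equator-centered latitude prior; the Gaussian prior and its width are illustrative assumptions rather than the paper's fitted bias.

```python
import numpy as np

def add_equator_bias(saliency, sigma_deg=20.0):
    """Re-weight an equirectangular saliency map with a latitude prior.
    Rows map linearly to latitude (+90 deg at the top, -90 deg at the bottom);
    a Gaussian centered on the equator encodes the assumed fixation bias."""
    h, w = saliency.shape
    latitude = np.linspace(90.0, -90.0, h)                # degrees per row
    prior = np.exp(-0.5 * (latitude / sigma_deg) ** 2)    # equator-centered weight
    biased = saliency * prior[:, None]                    # broadcast over columns
    return biased / (biased.sum() + 1e-12)                # renormalize to a distribution

# Usage: bias a uniform map; probability mass concentrates near the equator.
uniform = np.ones((90, 180))
print(add_equator_bias(uniform).sum())   # ~1.0
```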
57.
Abstract
How people look at visual information reveals fundamental information about them: their interests and their states of mind. Previous studies showed that the scanpath, i.e., the sequence of eye movements made by an observer exploring a visual stimulus, can be used to infer observer-related (e.g., task at hand) and stimuli-related (e.g., image semantic category) information. However, eye movements are complex signals and many of these studies rely on limited gaze descriptors and bespoke datasets. Here, we provide a turnkey method for scanpath modeling and classification. This method relies on variational hidden Markov models (HMMs) and discriminant analysis (DA). HMMs encapsulate the dynamic and individualistic dimensions of gaze behavior, allowing DA to capture systematic patterns diagnostic of a given class of observers and/or stimuli. We test our approach on two very different datasets. First, we use fixations recorded while viewing 800 static natural scene images, and infer an observer-related characteristic: the task at hand. We achieve an average of 55.9% correct classification rate (chance = 33%). We show that correct classification rates positively correlate with the number of salient regions present in the stimuli. Second, we use eye positions recorded while viewing 15 conversational videos, and infer a stimulus-related characteristic: the presence or absence of the original soundtrack. We achieve an average 81.2% correct classification rate (chance = 50%). HMMs make it possible to integrate bottom-up, top-down, and oculomotor influences into a single model of gaze behavior. This synergistic approach between behavior and machine learning will open new avenues for simple quantification of gaze behavior. We release SMAC with HMM, a Matlab toolbox freely available to the community under an open-source license agreement.
Affiliation(s)
- Janet H Hsiao
- Department of Psychology, The University of Hong Kong, Pok Fu Lam, Hong Kong
- Antoni B Chan
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
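A simplified sketch of HMM-based scanpath classification related to [57]: instead of the variational HMMs plus discriminant analysis described in the abstract, it fits one maximum-likelihood Gaussian HMM per class (assuming the hmmlearn package) and assigns a new scanpath to the class with the highest log-likelihood.

```python
import numpy as np
from hmmlearn import hmm   # assumed dependency for Gaussian-emission HMMs

def fit_class_hmm(scanpaths, n_states=3, seed=0):
    """Fit one Gaussian HMM to all scanpaths (sequences of (x, y) fixations) of a class."""
    X = np.vstack(scanpaths)
    lengths = [len(sp) for sp in scanpaths]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="full",
                            n_iter=100, random_state=seed)
    model.fit(X, lengths)
    return model

def classify(scanpath, class_models):
    """Assign a scanpath to the class whose HMM gives the highest log-likelihood."""
    scores = {label: m.score(np.asarray(scanpath)) for label, m in class_models.items()}
    return max(scores, key=scores.get)
```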
58. Le Meur O, Coutrot A, Liu Z, Rama P, Le Roch A, Helo A. Visual Attention Saccadic Models Learn to Emulate Gaze Patterns From Childhood to Adulthood. IEEE Trans Image Process 2017; 26:4777-4789. [PMID: 28682255] [DOI: 10.1109/tip.2017.2722238]
Abstract
How people look at visual information reveals fundamental information about them: their interests and their state of mind. While previous visual attention models output static 2D saliency maps, saccadic models aim to predict not only where observers look but also how they move their eyes to explore the scene. In this paper, we demonstrate that saccadic models are a flexible framework that can be tailored to emulate observers' viewing tendencies. More specifically, we use fixation data from 101 observers split into five age groups (adults, 8-10 y.o., 6-8 y.o., 4-6 y.o., and 2 y.o.) to train our saccadic model for different stages of the development of the human visual system. We show that the joint distribution of saccade amplitude and orientation is a visual signature specific to each age group, and can be used to generate age-dependent scan paths. Our age-dependent saccadic model not only outputs human-like, age-specific visual scan paths, but also significantly outperforms other state-of-the-art saliency models. We demonstrate that the computational modeling of visual attention, through the use of a saccadic model, can be efficiently adapted to emulate the gaze behavior of a specific group of observers.
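A minimal saccadic-model sketch in the spirit of [58]: the next fixation is sampled from bottom-up saliency re-weighted by a joint prior over saccade amplitude and orientation, which is the quantity the paper reports as age-specific; the toy prior and parameter values are assumptions.

```python
import numpy as np

def next_fixation(saliency, current_xy, amp_orient_pdf, rng):
    """Pick the next fixation by combining bottom-up saliency with a
    group-specific joint prior over saccade amplitude and orientation.
    amp_orient_pdf(amplitude_px, orientation_rad) -> positive weight."""
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - current_xy[0], ys - current_xy[1]
    amplitude = np.hypot(dx, dy)
    orientation = np.arctan2(dy, dx)
    prob = saliency * amp_orient_pdf(amplitude, orientation)
    prob = prob.ravel() / prob.sum()
    idx = rng.choice(prob.size, p=prob)
    return idx % w, idx // w

# Toy prior: short saccades preferred, horizontal directions favored.
def toy_pdf(amp, ori):
    return np.exp(-amp / 80.0) * (1.0 + 0.5 * np.cos(2 * ori))

rng = np.random.default_rng(0)
sal = np.random.default_rng(1).random((60, 80))
fix = (40, 30)
for _ in range(5):
    fix = next_fixation(sal, fix, toy_pdf, rng)
print(fix)
```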
59. Ito J, Yamane Y, Suzuki M, Maldonado P, Fujita I, Tamura H, Grün S. Switch from ambient to focal processing mode explains the dynamics of free viewing eye movements. Sci Rep 2017; 7:1082. [PMID: 28439075] [PMCID: PMC5430715] [DOI: 10.1038/s41598-017-01076-w]
Abstract
Previous studies have reported that humans employ ambient and focal modes of visual exploration while they freely view natural scenes. These two modes have been characterized based on eye movement parameters such as saccade amplitude and fixation duration, but not by any visual features of the viewed scenes. Here we propose a new characterization of eye movements during free viewing based on how the eyes are moved from and to objects in a visual scene. We applied this characterization to data obtained from freely viewing macaque monkeys. We show that the analysis based on this characterization gives a direct indication of a behavioral shift from the ambient to the focal processing mode over the course of free-viewing exploration. We further propose a stochastic model of saccade sequence generation incorporating a switch between the two processing modes, which quantitatively reproduces the behavioral features observed in the data.
Affiliation(s)
- Junji Ito
- Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA BRAIN Institute I, Jülich Research Centre, Jülich, Germany
- Yukako Yamane
- Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Center for Information and Neural Networks, Osaka University and National Institute of Information and Communications Technology, Osaka, Japan
- Mika Suzuki
- Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Pedro Maldonado
- BNI, CENEM and Programa de Fisiología y Biofísica, ICBM, Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Ichiro Fujita
- Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Center for Information and Neural Networks, Osaka University and National Institute of Information and Communications Technology, Osaka, Japan
- Hiroshi Tamura
- Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Center for Information and Neural Networks, Osaka University and National Institute of Information and Communications Technology, Osaka, Japan
- Sonja Grün
- Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA BRAIN Institute I, Jülich Research Centre, Jülich, Germany
- Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Theoretical Systems Neurobiology, RWTH Aachen University, Aachen, Germany
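A toy generator illustrating the two-mode idea in [59]: an ambient regime with large saccades and short fixations that can switch to a focal regime with small saccades and long fixations. The distributions and the switching probability are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def simulate_modes(n_saccades=40, p_switch_to_focal=0.08, seed=0):
    """Generate a saccade sequence from a two-mode process: an initial 'ambient'
    mode (large saccades, short fixations) that can switch once to a 'focal'
    mode (small saccades, long fixations)."""
    rng = np.random.default_rng(seed)
    mode = "ambient"
    records = []
    for i in range(n_saccades):
        if mode == "ambient" and rng.random() < p_switch_to_focal:
            mode = "focal"
        if mode == "ambient":
            amplitude = rng.gamma(shape=4.0, scale=2.5)    # roughly 10-deg saccades
            fix_dur = rng.gamma(shape=2.0, scale=80.0)     # roughly 160-ms fixations
        else:
            amplitude = rng.gamma(shape=2.0, scale=1.0)    # roughly 2-deg saccades
            fix_dur = rng.gamma(shape=3.0, scale=120.0)    # roughly 360-ms fixations
        records.append((i, mode, amplitude, fix_dur))
    return records

for rec in simulate_modes()[:6]:
    print(rec)
```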
60. Naqvi RA, Arsalan M, Park KR. Fuzzy System-Based Target Selection for a NIR Camera-Based Gaze Tracker. Sensors 2017; 17:s17040862. [PMID: 28420114] [PMCID: PMC5424739] [DOI: 10.3390/s17040862]
Abstract
Gaze-based interaction (GBI) techniques have been a popular subject of research in the last few decades. Among other applications, GBI can be used by persons with disabilities to perform everyday tasks, can serve as a game interface, and can play a pivotal role in the human-computer interface (HCI) field. While gaze tracking systems have shown high accuracy in GBI, detecting a user's gaze for target selection is a challenging problem that needs to be considered while using a gaze detection system. Past research has used eye blinks as well as dwell-time-based methods for this purpose, but these techniques are either inconvenient for the user or require a long time for target selection. Therefore, in this paper, we propose a method for fuzzy system-based target selection for near-infrared (NIR) camera-based gaze trackers. The results of experiments performed in addition to tests of the usability and on-screen keyboard use of the proposed method show that it is better than previous methods.
Affiliation(s)
- Rizwan Ali Naqvi
- Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro 1-gil, Jung-gu, Seoul 100-715, Korea
- Muhammad Arsalan
- Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro 1-gil, Jung-gu, Seoul 100-715, Korea
- Kang Ryoung Park
- Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro 1-gil, Jung-gu, Seoul 100-715, Korea
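A minimal fuzzy-selection sketch loosely related to [60]: two membership functions and a single AND rule turn dwell time and gaze dispersion into a selection score. The membership breakpoints, inputs, and threshold are illustrative assumptions, not the paper's tuned fuzzy system.

```python
def tri(x, a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def selection_score(dwell_ms, dispersion_px):
    """Tiny two-input fuzzy rule: 'long dwell AND compact gaze -> select'."""
    long_dwell = tri(dwell_ms, 300, 800, 1500)
    compact = tri(dispersion_px, -1, 0, 40)     # high when gaze stays within ~40 px
    return min(long_dwell, compact)             # AND = min; used directly as firing strength

print(selection_score(dwell_ms=700, dispersion_px=15) > 0.5)   # likely a selection
print(selection_score(dwell_ms=250, dispersion_px=60) > 0.5)   # not a selection
```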
61. Mills M, Alwatban M, Hage B, Barney E, Truemper EJ, Bashford GR, Dodd MD. Cerebral hemodynamics during scene viewing: Hemispheric lateralization predicts temporal gaze behavior associated with distinct modes of visual processing. J Exp Psychol Hum Percept Perform 2017; 43:1291-1302. [PMID: 28287758] [DOI: 10.1037/xhp0000357]
Abstract
Systematic patterns of eye movements during scene perception suggest a functional distinction between two viewing modes: an ambient mode (characterized by short fixations and large saccades) thought to reflect dorsal activity involved with spatial analysis, and a focal mode (characterized by long fixations and small saccades) thought to reflect ventral activity involved with object analysis. Little neuroscientific evidence exists supporting this claim. Here, functional transcranial Doppler ultrasound (fTCD) was used to investigate whether these modes show hemispheric specialization. Participants viewed scenes for 20 s under instructions to search or memorize. Overall, early viewing was right-lateralized, whereas later viewing was left-lateralized. This right-to-left shift interacted with viewing task (it was more pronounced in the memory task). Importantly, changes in lateralization correlated with changes in eye movements. This is the first demonstration of a right-hemisphere bias for eye movements servicing spatial analysis and a left-hemisphere bias for eye movements servicing object analysis.
Affiliation(s)
- Mohammed Alwatban
- Department of Biological Systems Engineering, University of Nebraska-Lincoln
- Benjamin Hage
- Department of Biological Systems Engineering, University of Nebraska-Lincoln
- Erin Barney
- Department of Biological Systems Engineering, University of Nebraska-Lincoln
- Edward J Truemper
- Department of Biological Systems Engineering, University of Nebraska-Lincoln
- Gregory R Bashford
- Department of Biological Systems Engineering, University of Nebraska-Lincoln
62. Influence of initial fixation position in scene viewing. Vision Res 2016; 129:33-49. [PMID: 27771330] [DOI: 10.1016/j.visres.2016.09.012]
Abstract
During scene perception our eyes generate complex sequences of fixations. Predictors of fixation locations are bottom-up factors such as luminance contrast, top-down factors like viewing instruction, and systematic biases, e.g., the tendency to place fixations near the center of an image. However, comparatively little is known about the dynamics of scanpaths after experimental manipulation of specific fixation locations. Here we investigate the influence of initial fixation position on subsequent eye-movement behavior on an image. We presented 64 colored photographs to participants who started their scanpaths from one of two experimentally controlled positions in the right or left part of an image. Additionally, we used computational models to predict the images' fixation locations and classified them as balanced images or images with high conspicuity on either the left or right side of a picture. The manipulation of the starting position influenced viewing behavior for several seconds and produced a tendency to overshoot to the image side opposite to the starting position. Possible mechanisms for the generation of this overshoot were investigated using numerical simulations of statistical and dynamical models. Our model comparisons show that inhibitory tagging is a viable mechanism for dynamical planning of scanpaths.
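A sketch of inhibitory tagging, one of the candidate mechanisms discussed in [62]: each fixated location is suppressed by a Gaussian tag so that subsequent fixations are pushed elsewhere. The greedy selection rule and the parameter values are assumptions for illustration.

```python
import numpy as np

def simulate_scanpath(saliency, start_xy, n_fixations=8, ior_sigma=15.0, ior_strength=0.9):
    """Greedy scanpath simulation with inhibitory tagging: each fixated location
    is suppressed by a Gaussian 'tag' so the gaze moves on."""
    sal = saliency.astype(float).copy()
    h, w = sal.shape
    ys, xs = np.mgrid[0:h, 0:w]
    path = [start_xy]
    x, y = start_xy
    for _ in range(n_fixations):
        tag = ior_strength * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * ior_sigma ** 2))
        sal *= (1.0 - tag)                       # inhibit the neighborhood of the fixation
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        path.append((int(x), int(y)))
    return path

sal = np.random.default_rng(2).random((60, 80))
print(simulate_scanpath(sal, start_xy=(10, 30)))
```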
63. Wang Y, Wang B, Wu X, Zhang L. Scanpath estimation based on foveated image saliency. Cogn Process 2016; 18:87-95. [PMID: 27743143] [DOI: 10.1007/s10339-016-0781-6]
Abstract
The estimation of gaze shifts has been an important research area in saliency modeling. Gaze movement is a dynamic process, yet existing estimation methods are limited to estimating scanpaths within only one saliency map, providing results with unsatisfactory accuracy. A bio-inspired method for gaze shift prediction is thus proposed. We take the effect of foveation into account in the proposed model, which plays an important role in the search for dynamic salient regions. The saccadic bias of gaze shifts and the mechanism of inhibition of return in short-term memory are also considered. Based on the probability map derived from these factors, candidates for the next fixation can be randomly generated, and the final scanpath can be acquired point by point. Evaluation with objective measures shows that this method performs better on several datasets than many existing models do.
Affiliation(s)
- Yixiu Wang
- Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai, 200433, China; Research Center of Smart Networks and Systems, School of Information Science and Technology, Fudan University, Shanghai, 200433, China
- Bin Wang
- Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai, 200433, China; Research Center of Smart Networks and Systems, School of Information Science and Technology, Fudan University, Shanghai, 200433, China
- Xiaofeng Wu
- Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai, 200433, China; Research Center of Smart Networks and Systems, School of Information Science and Technology, Fudan University, Shanghai, 200433, China
- Liming Zhang
- Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai, 200433, China; Research Center of Smart Networks and Systems, School of Information Science and Technology, Fudan University, Shanghai, 200433, China
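A sketch of the foveation step suggested by [63]: the saliency map is blended with a blurred copy according to eccentricity from the current fixation (assuming SciPy for the blur), so that peripheral saliency is coarser. The next fixation can then be sampled from this map combined with a saccadic bias and inhibition of return, as the abstract describes; the constants here are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter   # assumed dependency for the coarse map

def foveated_saliency(saliency, fixation_xy, fovea_sigma_px=60.0, blur_sigma=8.0):
    """Blend a fine and a heavily blurred saliency map according to eccentricity
    from the current fixation, mimicking the loss of resolution outside the fovea."""
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ecc2 = (xs - fixation_xy[0]) ** 2 + (ys - fixation_xy[1]) ** 2
    acuity = np.exp(-ecc2 / (2 * fovea_sigma_px ** 2))   # 1 at the fovea, -> 0 in the periphery
    coarse = gaussian_filter(saliency, sigma=blur_sigma)
    return acuity * saliency + (1.0 - acuity) * coarse
```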
64.
Abstract
A key component of interacting with the world is how to direct one's sensors so as to extract task-relevant information, a process referred to as active sensing. In this review, we present a framework for active sensing that forms a closed loop between an ideal observer, which extracts task-relevant information from a sequence of observations, and an ideal planner, which specifies the actions that lead to the most informative observations. We discuss active sensing as an approximation to exploration in the wider framework of reinforcement learning, and conversely, discuss several sensory, perceptual, and motor processes as approximations to active sensing. Based on this framework, we introduce a taxonomy of sensing strategies, identify hallmarks of active sensing, and discuss recent advances in formalizing and quantifying active sensing.
Affiliation(s)
- Scott Cheng-Hsin Yang
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK
- Daniel M Wolpert
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK
- Máté Lengyel
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK; Department of Cognitive Science, Central European University, Budapest H-1051, Hungary
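A compact worked example of the ideal observer/planner loop described in [64]: a Bayesian observer maintains a posterior over hypotheses, and a greedy planner picks the sensing action with the largest expected information gain. The discrete toy problem is an assumption for illustration.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_information_gain(prior, likelihoods):
    """Ideal-planner score for one sensing action.
    prior: P(hypothesis), shape (H,).
    likelihoods: P(observation | hypothesis) for this action, shape (O, H)."""
    p_obs = likelihoods @ prior                            # P(observation)
    gain = entropy(prior)
    for o in range(likelihoods.shape[0]):
        post = likelihoods[o] * prior
        if post.sum() > 0:
            post /= post.sum()
            gain -= p_obs[o] * entropy(post)               # subtract expected posterior entropy
    return gain

def choose_action(prior, action_likelihoods):
    """Pick the action (e.g., gaze target) with maximal expected information gain."""
    gains = [expected_information_gain(prior, L) for L in action_likelihoods]
    return int(np.argmax(gains)), gains

# Toy problem: two hypotheses, two candidate gaze targets; target 0 is informative.
prior = np.array([0.5, 0.5])
informative = np.array([[0.9, 0.1], [0.1, 0.9]])     # observation strongly depends on hypothesis
uninformative = np.array([[0.5, 0.5], [0.5, 0.5]])
print(choose_action(prior, [informative, uninformative]))   # -> action 0
```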
65. Wang J, Borji A, Jay Kuo CC, Itti L. Learning a Combined Model of Visual Saliency for Fixation Prediction. IEEE Trans Image Process 2016; 25:1566-1579. [PMID: 26829792] [DOI: 10.1109/tip.2016.2522380]
Abstract
A large number of saliency models, each based on a different hypothesis, have been proposed over the past 20 years. In practice, while subscribing to one hypothesis or computational principle makes a model perform well on some types of images, it hinders the model's general performance on arbitrary images and large-scale data sets. One natural approach to improve overall saliency detection accuracy would then be to fuse different types of models. In this paper, inspired by the success of late-fusion strategies in semantic analysis and multi-modal biometrics, we propose to fuse state-of-the-art saliency models at the score level in a para-boosting learning fashion. First, saliency maps generated by several models are used as confidence scores. Then, these scores are fed into our para-boosting learner (i.e., support vector machine, adaptive boosting, or probability density estimator) to generate the final saliency map. In order to explore the strength of para-boosting learners, traditional transformation-based fusion strategies, such as Sum, Min, and Max, are also explored and compared in this paper. To further reduce the computational cost of fusing many models, only a few of them are considered in the next step. Experimental results show that score-level fusion outperforms each individual model and can further reduce the performance gap between current models and the human inter-observer model.
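A sketch of score-level fusion in the spirit of [65]: the transformation-based Sum/Min/Max baselines, plus a learned fusion in which each pixel's vector of model scores is a feature and fixated pixels are positives (using scikit-learn's AdaBoost as one of the mentioned para-boosting learners). The feature construction details are assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier   # one of the 'para-boosting' learners mentioned

def fixed_rule_fusion(maps, rule="sum"):
    """Transformation-based fusion of min-max-normalized saliency maps (Sum/Min/Max baselines)."""
    stack = np.stack([(m - m.min()) / (m.max() - m.min() + 1e-9) for m in maps])
    return {"sum": stack.sum(0), "min": stack.min(0), "max": stack.max(0)}[rule]

def learned_fusion(train_maps, fixation_mask, test_maps):
    """Score-level fusion: each pixel's vector of model scores is a feature,
    fixated pixels are positives, and a boosting learner outputs the fused map."""
    X = np.stack([m.ravel() for m in train_maps], axis=1)
    y = fixation_mask.ravel().astype(int)
    clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
    X_test = np.stack([m.ravel() for m in test_maps], axis=1)
    fused = clf.predict_proba(X_test)[:, 1]
    return fused.reshape(test_maps[0].shape)
```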
66.