351
Fu K, Gong C, Gu IYH, Yang J. Normalized cut-based saliency detection by adaptive multi-level region merging. IEEE Transactions on Image Processing 2015; 24:5671-5683. [PMID: 26441448] [DOI: 10.1109/tip.2015.2485782]
Abstract
Existing salient object detection models favor over-segmented regions upon which saliency is computed. Such local regions are less effective at representing objects holistically and weaken the emphasis of entire salient objects. As a result, existing methods often fail to highlight an entire object against a complex background. Toward better grouping of objects and background, in this paper we consider graph cuts, specifically the normalized graph cut (Ncut), for saliency detection. Since the Ncut partitions a graph by normalized energy minimization, the resulting eigenvectors contain cluster information that can group visual content. Motivated by this, we induce saliency maps directly from the Ncut eigenvectors, yielding accurate saliency estimates for visual clusters. We implement the Ncut on a graph derived from a moderate number of superpixels; this graph captures both the intrinsic color and the edge information of the image. Starting from the superpixels, an adaptive multi-level region merging scheme is employed to extract cluster information from the Ncut eigenvectors. With saliency measures developed for each merged region, encouraging performance is obtained after across-level integration. Experiments comparing against 13 existing methods on four benchmark datasets (MSRA-1000, SOD, SED, and CSSD) show that the proposed method, Ncut saliency, enhances objects uniformly and performs comparably to or better than state-of-the-art methods.
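For readers who want to connect the abstract's description to the underlying machinery, the normalized-cut step can be sketched in a few lines: build an affinity matrix over superpixels, form the symmetrically normalized Laplacian, and read cluster structure off the eigenvector of the second-smallest eigenvalue (the Fiedler vector). This is a generic illustration, not the authors' implementation; the toy 6-node affinity matrix `W` is invented for the example.

```python
import numpy as np

def ncut_bipartition(W):
    """Bipartition a graph via the Fiedler vector of the normalized
    Laplacian, as in the normalized-cut (Ncut) formulation."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetrically normalized Laplacian: L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # ascending eigenvalues
    # The eigenvector of the second-smallest eigenvalue carries cluster info.
    fiedler = D_inv_sqrt @ eigvecs[:, 1]
    return fiedler > 0  # boolean cluster labels

# Two dense clusters (nodes 0-2 and 3-5) joined by one weak edge:
W = np.array([
    [0,    1, 1, 0.01, 0, 0],
    [1,    0, 1, 0,    0, 0],
    [1,    1, 0, 0,    0, 0],
    [0.01, 0, 0, 0,    1, 1],
    [0,    0, 0, 1,    0, 1],
    [0,    0, 0, 1,    1, 0],
], dtype=float)
labels = ncut_bipartition(W)
```

The sign pattern of the Fiedler vector splits the graph across its weakest normalized cut, which is why the eigenvectors "contain good cluster information" as the abstract puts it.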
353
Napoletano P, Boccignone G, Tisato F. Attentive Monitoring of Multiple Video Streams Driven by a Bayesian Foraging Strategy. IEEE Transactions on Image Processing 2015; 24:3266-3281. [PMID: 25966475] [DOI: 10.1109/tip.2015.2431438]
Abstract
In this paper, we consider the problem of deploying attention to subsets of video streams to collate the data and information most relevant to a given task. We formalize this monitoring problem as a foraging problem and propose a probabilistic framework that models the observer's attentive behavior as that of a forager. The forager, moment to moment, focuses its attention on the most informative stream/camera, detects interesting objects or activities, or switches to a more profitable stream. The proposed approach is suitable for multistream video summarization and can also serve as a preliminary step for more sophisticated video surveillance, e.g., activity and behavior analysis. Experimental results on the publicly available UCR Videoweb Activities Data Set illustrate the utility of the proposed technique.
354
Zhou L, Yang Z, Yuan Q, Zhou Z, Hu D. Salient Region Detection via Integrating Diffusion-Based Compactness and Local Contrast. IEEE Transactions on Image Processing 2015; 24:3308-3320. [PMID: 26080382] [DOI: 10.1109/tip.2015.2438546]
Abstract
Salient region detection is a challenging problem and an important topic in computer vision, with a wide range of applications such as object recognition and segmentation. Many approaches detect salient regions using different visual cues, such as compactness, uniqueness, and objectness; however, each cue-based method has its own limitations. After analyzing the advantages and limitations of different visual cues, we found that compactness and local contrast are complementary, and that local contrast can effectively recover salient regions incorrectly suppressed by the compactness cue. Motivated by this, we propose a bottom-up salient region detection method that integrates compactness and local contrast cues. Furthermore, to produce a pixel-accurate saliency map that covers the salient objects more uniformly, we propagate the saliency information using a diffusion process. Experimental results on four benchmark data sets demonstrate the effectiveness of the proposed method: it produces more accurate saliency maps, with better precision-recall curves and higher F-measures than 19 other state-of-the-art approaches on the ASD, CSSD, and ECSSD data sets.
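The diffusion step the abstract mentions can be illustrated with a standard graph-diffusion sketch: seed saliency values are propagated over a superpixel affinity graph by solving a linear system with the normalized affinity matrix. This is a generic manifold-ranking-style diffusion, not the paper's exact formulation; the chain graph and the `alpha` value are illustrative.

```python
import numpy as np

def diffuse_saliency(W, seed, alpha=0.5):
    """Propagate seed saliency over an affinity graph by solving
    (I - alpha * S) s = seed, where S = D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))   # symmetrically normalized affinity
    s = np.linalg.solve(np.eye(len(seed)) - alpha * S, seed)
    return s / s.max()                # normalize to [0, 1]

# Chain graph of 5 superpixels; seed saliency on node 0 spreads and decays:
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
s = diffuse_saliency(W, np.array([1.0, 0, 0, 0, 0]))
```

Saliency decays with graph distance from the seed, which is how diffusion "propagates the saliency information" to cover an object more uniformly than the seeds alone.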
355
Hua Y, Yang M, Zhao Z, Zhou R, Cai A. On semantic-instructed attention: From video eye-tracking dataset to memory-guided probabilistic saliency model. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.05.033]
356
Opitz R, Limp WF. Recent Developments in High-Density Survey and Measurement (HDSM) for Archaeology: Implications for Practice and Theory. Annual Review of Anthropology 2015. [DOI: 10.1146/annurev-anthro-102214-013845]
Abstract
HDSM, high-density survey and measurement, is the collective term for a range of new technologies that make it possible to measure, record, and analyze the spatial, locational, and morphological properties of objects, sites, structures, and landscapes with higher density and greater precision than ever before. This article considers HDSM technologies, including airborne lidar, real-time kinematic global navigation satellite system (GNSS) survey, robotic total stations, terrestrial laser scanning, structured light scanning, close-range photogrammetry [CRP, also known as structure from motion (SfM)], and unmanned aerial vehicle (UAV)-based SfM/CRP and scanning, and discusses the impact of these technologies on contemporary archaeological practice. It reflects on how the democratization and proliferation of HDSM opens up new applications and greatly broadens the set of problems addressed explicitly and directly through shape and place.
Affiliation(s)
- Rachel Opitz
- Center for Advanced Spatial Technologies (CAST), University of Arkansas, Fayetteville, Arkansas 72701
- W. Fred Limp
- Center for Advanced Spatial Technologies (CAST), University of Arkansas, Fayetteville, Arkansas 72701
357
Calvo MG, Gutiérrez-García A, del Líbano M. Sensitivity to emotional scene content outside the focus of attention. Acta Psychol (Amst) 2015; 161:36-44. [PMID: 26301803] [DOI: 10.1016/j.actpsy.2015.08.002]
Abstract
We investigated whether the emotional content of visual scenes depicting people is processed in peripheral vision. Emotional or neutral scene photographs were paired with a matched scrambled image for 150 ms in peripheral vision (≥5°). The pictures were immediately followed by a digit or letter in a discrimination task; interference (i.e., slowed reaction times) with performance in this task indexed the processing resources drawn by the pictures. Twelve types of specific emotional scene content (e.g., erotica or mutilation) were compared. Results showed, first, that emotional scenes caused greater interference than neutral scenes in the absence of fixations, suggesting that emotional scenes are processed and draw covert attention outside the focus of overt attention. Second, interference was similar for female and male participants with pleasant scenes (except for erotica), but females were more affected than males by all types of unpleasant scenes, revealing that sensitivity to emotional content in peripheral vision is modulated by sex and affective valence. Third, low-level image properties, visual saliency, and the size of bodies and faces were generally equivalent for emotional and neutral scenes, ruling out the alternative hypothesis that non-emotional, purely perceptual factors contributed.
359
Wang W, Shen J, Li X, Porikli F. Robust video object cosegmentation. IEEE Transactions on Image Processing 2015; 24:3137-3148. [PMID: 26080051] [DOI: 10.1109/tip.2015.2438550]
Abstract
With ever-increasing volumes of video data, automatic extraction of salient object regions has become even more significant for visual analytic solutions. This surge has opened up opportunities to exploit, in a cooperative manner, the collective cues encapsulated in multiple videos, but it also brings major challenges, such as handling drastic variations in the appearance, motion pattern, and pose of foreground objects, as well as indiscriminate backgrounds. Here, we present a cosegmentation framework to discover and segment common object regions across multiple frames and multiple videos jointly. We incorporate three types of cues, i.e., intraframe saliency, interframe consistency, and across-video similarity, into an energy optimization framework that makes no restrictive assumptions about foreground appearance or motion models and does not require objects to be visible in all frames. We also introduce a spatio-temporal scale-invariant feature transform (SIFT) flow descriptor that integrates across-video correspondence from conventional SIFT flow with interframe motion flow from optical flow. This novel spatio-temporal SIFT flow generates reliable estimates of common foregrounds over the entire video data set. Experimental results show that our method outperforms the state of the art on a new extensive data set (ViCoSeg).
360
Abstract
In this paper, visual attention spreading is formulated as a nonlocal diffusion equation. Unlike other diffusion-based methods, a nonlocal diffusion tensor is introduced to account for both the diffusion strength and the diffusion direction. With the help of the diffusion tensor, diffusion along the principal direction is suppressed to preserve the dissimilarity between foreground and background, while diffusion in the other directions is boosted to combine similar regions and highlight the salient object as a whole. The final saliency maps are obtained through a two-stage diffusion. Extensive quantitative and visual comparisons are performed on three widely used benchmark datasets, i.e., the MSRA-ASD, MSRA-B, and PASCAL-1500 datasets. Experimental results demonstrate the superior performance of our method.
Affiliation(s)
- Xiujun Zhang
- College of Information Engineering, Shenzhen University, Nanhai Ave 3688, Shenzhen 518060, Guangdong, P. R. China
- Chen Xu
- Institute of Intelligent Computing Science, Shenzhen University, Nanhai Ave 3688, Shenzhen 518060, Guangdong, P. R. China
- Xiaoli Sun
- College of Mathematics and Computational Science, Shenzhen University, Nanhai Ave 3688, Shenzhen 518060, Guangdong, P. R. China
- George Baciu
- GAMA Lab, Department of Computing, The Hong Kong Polytechnic University, Hong Kong
361
Orquin JL, Lagerkvist CJ. Effects of salience are both short- and long-lived. Acta Psychol (Amst) 2015; 160:69-76. [PMID: 26188691] [DOI: 10.1016/j.actpsy.2015.07.001]
Abstract
A salient object can attract attention irrespective of its relevance to current goals. However, this bottom-up effect tends to be short-lived (e.g., <150 ms), and it is generally assumed that top-down processes such as goals or task instructions, operating in later time windows, override the effect of salience operating in early time windows. While the majority of studies on visual search and scene viewing comply with the assumptions that top-down and bottom-up processes operate in different time windows and that the former overrides the latter, we point to some possible anomalies in decision research. To explore these anomalies, and thereby test the two key assumptions, we manipulated the salience and valence of one information cue in a decision task. Our analyses reveal that in decision tasks top-down and bottom-up processes do not operate in different time windows as predicted, nor does the former necessarily override the latter. Instead, we find that the maximum effect of salience on the likelihood of making a saccade to the target cue is delayed until about 20 saccades after stimulus onset, and that the effects of salience and valence are additive rather than multiplicative. Further, in the positive and neutral valence conditions, salience continues to exert pressure on saccadic latency, i.e., the interval between saccades to the target, with high-salience targets being fixated faster than low-salience targets. Our findings challenge the assumptions that top-down and bottom-up processes operate in different time windows and that top-down processes necessarily override bottom-up processes.
362
Yoo BS, Kim JH. Fuzzy Integral-Based Gaze Control of a Robotic Head for Human Robot Interaction. IEEE Transactions on Cybernetics 2015; 45:1769-1783. [PMID: 25312975] [DOI: 10.1109/tcyb.2014.2360205]
Abstract
During the last few decades, as part of the effort to enhance natural human-robot interaction (HRI), considerable research has been carried out to develop human-like gaze control. However, most studies did not consider hardware implementation, real-time processing, or real environments, factors that should be taken into account to achieve natural HRI. This paper proposes a fuzzy integral-based gaze control algorithm for a robotic head that operates in real time in real environments. We formulate gaze control as a multicriteria decision-making problem and devise seven human-gaze-inspired criteria. Partial evaluations of all candidate gaze directions are carried out with respect to the seven criteria, defined from perceived visual, auditory, and internal inputs, and fuzzy measures are assigned to the power set of the criteria to reflect user-defined preferences. A fuzzy integral of the partial evaluations with respect to the fuzzy measures yields global evaluations of all candidate gaze directions. The global evaluation values are adjusted by applying inhibition of return and are compared with those of the previous gaze directions to decide the final gaze direction. The effectiveness of the proposed algorithm is demonstrated with a robotic head, developed in the Robot Intelligence Technology Laboratory at the Korea Advanced Institute of Science and Technology, through three interaction scenarios and three comparison scenarios with another algorithm.
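The fuzzy (Choquet) integral at the heart of this approach aggregates partial evaluations non-additively: criteria scores are sorted in ascending order, and each increment is weighted by the fuzzy measure of the set of criteria still active. A minimal sketch with a made-up two-criterion fuzzy measure (the paper uses seven gaze criteria and its own measures):

```python
def choquet_integral(scores, mu):
    """Choquet fuzzy integral of partial evaluations `scores`
    (criterion -> value in [0, 1]) with respect to the fuzzy measure
    `mu` (frozenset of criteria -> weight)."""
    # Sort criteria by ascending score, then sum score increments
    # weighted by the measure of the criteria not yet passed.
    items = sorted(scores.items(), key=lambda kv: kv[1])
    total, prev = 0.0, 0.0
    remaining = set(scores)
    for crit, val in items:
        total += (val - prev) * mu[frozenset(remaining)]
        prev = val
        remaining.discard(crit)
    return total

# Hypothetical measure: criteria "a" and "b" interact sub-additively.
mu = {frozenset(): 0.0, frozenset({"a"}): 0.6,
      frozenset({"b"}): 0.5, frozenset({"a", "b"}): 1.0}
value = choquet_integral({"a": 0.2, "b": 0.8}, mu)
```

Here the result is 0.2 * mu({a, b}) + (0.8 - 0.2) * mu({b}) = 0.5, illustrating how the measure over criteria subsets, rather than per-criterion weights alone, shapes the global evaluation of each gaze direction.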
363
Liu L, Gao SS, Bailey ST, Huang D, Li D, Jia Y. Automated choroidal neovascularization detection algorithm for optical coherence tomography angiography. Biomedical Optics Express 2015; 6:3564-76. [PMID: 26417524] [PMCID: PMC4574680] [DOI: 10.1364/boe.6.003564]
Abstract
Optical coherence tomography angiography has recently been used to visualize choroidal neovascularization (CNV) in participants with age-related macular degeneration. Identification and quantification of the CNV area are clinically important for disease assessment. An automated algorithm for CNV area detection is presented in this article. It relies on denoising and a saliency detection model to overcome issues such as projection artifacts and the heterogeneity of CNV. Qualitative and quantitative evaluations were performed on scans of 7 participants. Results from the algorithm agreed well with manual delineation of the CNV area.
Affiliation(s)
- Li Liu
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
- College of Physics and Electronics, Shandong Normal University, Jinan, China
- Simon S. Gao
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
- Steven T. Bailey
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
- David Huang
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
- Dengwang Li
- College of Physics and Electronics, Shandong Normal University, Jinan, China
- Yali Jia
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
364
Carlin MA, Elhilali M. Modeling attention-driven plasticity in auditory cortical receptive fields. Front Comput Neurosci 2015; 9:106. [PMID: 26347643] [PMCID: PMC4541291] [DOI: 10.3389/fncom.2015.00106]
Abstract
To navigate complex acoustic environments, listeners adapt neural processes to focus on behaviorally relevant sounds in the acoustic foreground while minimizing the impact of distractors in the background, an ability referred to as top-down selective attention. Particularly striking examples of attention-driven plasticity have been reported in primary auditory cortex via dynamic reshaping of spectro-temporal receptive fields (STRFs). By enhancing the neural response to features of the foreground while suppressing those to the background, STRFs can act as adaptive contrast matched filters that directly contribute to an improved cognitive segregation between behaviorally relevant and irrelevant sounds. In this study, we propose a novel discriminative framework for modeling attention-driven plasticity of STRFs in primary auditory cortex. The model describes a general strategy for cortical plasticity via an optimization that maximizes discriminability between the foreground and distractors while maintaining a degree of stability in the cortical representation. The first instantiation of the model describes a form of feature-based attention and yields STRF adaptation patterns consistent with a contrast matched filter previously reported in neurophysiological studies. An extension of the model captures a form of object-based attention, where top-down signals act on an abstracted representation of the sensory input characterized in the modulation domain. The object-based model makes explicit predictions in line with the limited neurophysiological data currently available and can be readily evaluated experimentally. Finally, we draw parallels between the model and anatomical circuits reported to be engaged during active attention. The proposed model strongly suggests an interpretation of attention-driven plasticity as a discriminative adaptation operating at the level of sensory cortex, in line with similar strategies previously described across different sensory modalities.
Affiliation(s)
- Michael A Carlin
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
365
Souly N, Shah M. Visual Saliency Detection Using Group Lasso Regularization in Videos of Natural Scenes. Int J Comput Vis 2015. [DOI: 10.1007/s11263-015-0853-6]
366
Beltrán D, Calvo MG. Brain signatures of perceiving a smile: Time course and source localization. Hum Brain Mapp 2015; 36:4287-303. [PMID: 26252428] [DOI: 10.1002/hbm.22917]
Abstract
Facial happiness is consistently recognized faster than other expressions of emotion. In this study, to determine when and where in the brain this recognition advantage develops, EEG activity during an expression categorization task was subjected to temporospatial PCA and LAURA source localization. Happy, angry, and neutral faces were presented either in whole or bottom-half format (with the mouth region visible); the comparison of part- versus whole-face conditions served to examine the role of the smile. Two neural signatures underlying the happy-face advantage emerged. One peaked around 140 ms (left N140) and was source-located in the left IT cortex (MTG), with greater activity for happy versus non-happy faces in both whole and bottom-half format, suggesting an enhanced perceptual encoding mechanism for salient smiles. The other peaked around 370 ms (P3b and N3) and was located in the right IT (FG) and dorsal cingulate (CC) cortices, with greater activity specifically for bottom-half happy versus non-happy faces, suggesting an enhanced recruitment of face-specific information to categorize (or reconstruct) facial happiness from diagnostic smiling mouths. Additional differential brain responses revealed a specific "anger effect," with greater activity for angry versus non-angry expressions (right N170 and P230; right pSTS and IPL), and a coarse "emotion effect," with greater activity for happy and angry versus neutral expressions (anterior P2 and posterior N170; vmPFC and right IFG).
Affiliation(s)
- David Beltrán
- Department of Cognitive Psychology, University of La Laguna, Tenerife, Spain
- Manuel G Calvo
- Department of Cognitive Psychology, University of La Laguna, Tenerife, Spain
367
Jian M, Lam KM, Dong J, Shen L. Visual-Patch-Attention-Aware Saliency Detection. IEEE Transactions on Cybernetics 2015; 45:1575-1586. [PMID: 25291809] [DOI: 10.1109/tcyb.2014.2356200]
Abstract
The human visual system (HVS) can reliably perceive salient objects in an image, but it remains a challenge to computationally model the process of detecting salient objects without prior knowledge of the image contents. This paper proposes a visual-attention-aware model that mimics the HVS for salient-object detection. Informative and directional patches can be seen as visual stimuli and used as neuronal cues for humans to interpret and detect salient objects. To simulate this process, two typical types of patches are extracted individually and in parallel from the intensity channel and the discriminant color channel, respectively, as primitives. In our algorithm, an improved wavelet-based salient-patch detector is used to extract the visually informative patches. In addition, as humans are sensitive to orientation features and directional patches are reliable cues, we also propose a method for extracting directional patches. These two types of patches are then combined to form the most important patches, called preferential patches, which are considered the visual stimuli applied to the HVS for salient-object detection. Compared with state-of-the-art methods for salient-object detection, experimental results on publicly available datasets show that the proposed algorithm is reliable and effective.
368
Calvo MG, Nummenmaa L. Perceptual and affective mechanisms in facial expression recognition: An integrative review. Cogn Emot 2015. [PMID: 26212348] [DOI: 10.1080/02699931.2015.1049124]
Abstract
Facial expressions of emotion involve a physical component of morphological changes in a face and an affective component conveying information about the expresser's internal feelings. It remains unresolved how much recognition and discrimination of expressions rely on the perception of morphological patterns or the processing of affective content. This review of research on the role of visual and emotional factors in expression recognition reached three major conclusions. First, behavioral, neurophysiological, and computational measures indicate that basic expressions are reliably recognized and discriminated from one another, albeit the effect may be inflated by the use of prototypical expression stimuli and forced-choice responses. Second, affective content along the dimensions of valence and arousal is extracted early from facial expressions, although this coarse affective representation contributes minimally to categorical recognition of specific expressions. Third, the physical configuration and visual saliency of facial features contribute significantly to expression recognition, with "emotionless" computational models being able to reproduce some of the basic phenomena demonstrated in human observers. We conclude that facial expression recognition, as it has been investigated in conventional laboratory tasks, depends to a greater extent on perceptual than affective information and mechanisms.
Affiliation(s)
- Manuel G Calvo
- Department of Cognitive Psychology, University of La Laguna, Tenerife, Spain
- Lauri Nummenmaa
- School of Science, Aalto University, Espoo, Finland; Department of Psychology and Turku PET Centre, University of Turku, Turku, Finland
370
Xu J, Yue S. Building up a Bio-Inspired Visual Attention Model by Integrating Top-Down Shape Bias and Improved Mean Shift Adaptive Segmentation. Int J Pattern Recogn 2015. [DOI: 10.1142/s0218001415550058]
Abstract
Driver-assistance systems (DAS) have become essential in-vehicle equipment owing to the large number of road traffic accidents worldwide. An efficient DAS that detects hazardous situations robustly is key to reducing road accidents. The core of a DAS is to identify salient regions, or regions of interest relevant to visually attended objects, in real visual scenes for further processing. To achieve this goal, we present a method that locates regions of interest automatically based on a novel adaptive mean shift segmentation algorithm to obtain salient objects. In the proposed mean shift algorithm, we use an adaptive Bayesian bandwidth to find the convergence of all data points via iterations and k-nearest-neighbor queries. Experiments showed that the proposed algorithm is efficient and yields better visual salient regions compared with ground-truth benchmarks. It consistently outperformed other known visual saliency methods, generating higher precision and better recall, when challenged with natural scenes collected locally and with one of the largest publicly available data sets. The proposed algorithm can also be extended naturally to detect moving vehicles in dynamic scenes once integrated with top-down shape-biased cues, as demonstrated in our experiments.
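The mean shift procedure the abstract builds on can be sketched with a flat kernel: each point repeatedly moves to the mean of the data points within its bandwidth until it settles on a density mode, and points sharing a mode form a segment. This toy version uses a fixed bandwidth rather than the paper's adaptive Bayesian bandwidth:

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=50):
    """Shift every point toward a local density mode using a flat kernel:
    each point moves to the mean of the data points within `bandwidth`."""
    modes = points.astype(float).copy()
    for _ in range(iters):
        for i, p in enumerate(modes):
            nbrs = points[np.linalg.norm(points - p, axis=1) < bandwidth]
            modes[i] = nbrs.mean(axis=0)
    return modes

# Two well-separated blobs; each point converges to its blob's centroid.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (20, 2)),
                 rng.normal(5, 0.1, (20, 2))])
modes = mean_shift(pts, bandwidth=1.0)
```

The bandwidth is the critical parameter, which is why the paper replaces the fixed value used here with an adaptively estimated one.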
Affiliation(s)
- Jiawei Xu
- School of Computer Science, University of Lincoln, Lincoln LN6 7TS, UK
- Shigang Yue
- School of Computer Science, University of Lincoln, Lincoln LN6 7TS, UK
371
Affordance Estimation Enhances Artificial Visual Attention: Evidence from a Change-Blindness Study. Cognit Comput 2015. [DOI: 10.1007/s12559-015-9329-9]
372
Frutos-Pascual M, Garcia-Zapirain B. Assessing visual attention using eye tracking sensors in intelligent cognitive therapies based on serious games. Sensors (Basel) 2015; 15:11092-117. [PMID: 25985158] [PMCID: PMC4481919] [DOI: 10.3390/s150511092]
Abstract
This study examines the use of eye tracking sensors as a means to identify children's behavior in attention-enhancement therapies. For this purpose, a dataset collected from 32 children with different attention skills is analyzed during their interaction with a set of puzzle games. The authors hypothesize that participants with better performance may have quantifiably different eye-movement patterns from users with poorer results. The use of eye trackers outside the research community may help to extend their potential within available intelligent therapies, bringing state-of-the-art technologies to end users. Gaze data constitute a new information source in intelligent therapies that may help to build new approaches fully customized to final users' needs; this may be achieved by implementing machine learning algorithms for classification. An initial study of the dataset yielded a classification accuracy of 0.88 (±0.11) with a random forest classifier, using cross-validation and hierarchical tree-based feature selection. Further approaches need to be examined in order to establish more detailed attention behaviors and patterns among children with and without attention problems.
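The evaluation protocol described, classification accuracy under k-fold cross-validation, can be sketched without the original gaze data. The snippet below runs 5-fold cross-validation on simulated features, with a simple nearest-centroid classifier standing in for the study's random forest; the data and classifier choice are illustrative only:

```python
import numpy as np

def kfold_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validated accuracy of a nearest-centroid classifier
    (a stand-in for the random forest used in the study)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        # Fit: one centroid per class from the training split only.
        centroids = {c: X[train][y[train] == c].mean(axis=0)
                     for c in np.unique(y)}
        # Predict: assign each held-out sample to the nearest centroid.
        preds = [min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
                 for x in X[fold]]
        accs.append(np.mean(preds == y[fold]))
    return np.mean(accs), np.std(accs)

# Two separable groups of simulated gaze features (30 samples each, 4-D):
X = np.vstack([np.random.default_rng(1).normal(0, 1, (30, 4)),
               np.random.default_rng(2).normal(4, 1, (30, 4))])
y = np.array([0] * 30 + [1] * 30)
mean_acc, std_acc = kfold_accuracy(X, y)
```

Reporting the mean and standard deviation across folds is what produces an accuracy figure of the "0.88 (±0.11)" form quoted in the abstract.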
Affiliation(s)
- Maite Frutos-Pascual
- DeustoTech Life (eVIDA), Faculty of Engineering, University of Deusto, Avda de las Universidades 24, Bilbao 48015, Spain
- Begonya Garcia-Zapirain
- DeustoTech Life (eVIDA), Faculty of Engineering, University of Deusto, Avda de las Universidades 24, Bilbao 48015, Spain
373
Duan H, Wang X. Visual attention model based on statistical properties of neuron responses. Sci Rep 2015; 5:8873. [PMID: 25747859] [PMCID: PMC4352866] [DOI: 10.1038/srep08873]
Abstract
Visual attention is a mechanism of the visual system that selects relevant objects from a specific scene. Interactions among neurons in multiple cortical areas are considered to be involved in attentional allocation, yet the characteristics of the encoded features and neuron responses in those attention-related cortices remain unclear. The investigations carried out in this study therefore aim to demonstrate that unusual regions arousing more attention generally cause particular neuron responses. We suppose that visual saliency is obtained on the basis of neuron responses to contexts in natural scenes. A bottom-up visual attention model based on the self-information of neuron responses is proposed to test and verify this hypothesis. Four different color spaces are adopted, and a novel entropy-based combination scheme is designed to make full use of color information. Valuable regions are highlighted while redundant backgrounds are suppressed in the saliency maps obtained by the proposed model. Comparative results reveal that the proposed model outperforms several state-of-the-art models. This study provides insights into neuron-response-based saliency detection and may shed light on the neural mechanism of early visual cortices for bottom-up visual attention.
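The core of the hypothesis, saliency as the self-information of responses, can be sketched in a few lines. The 1-D "scene" and the histogram binning below are illustrative stand-ins for the paper's multi-color-space pipeline:

```python
import math
from collections import Counter

def self_information_map(responses, bins=8):
    """Score each response by its self-information, -log2 p(response).

    Rare (quantized) responses carry more information and are treated
    as more salient; frequent background responses are suppressed.
    """
    lo, hi = min(responses), max(responses)
    width = (hi - lo) / bins or 1.0          # guard against a flat input
    quantized = [min(int((r - lo) / width), bins - 1) for r in responses]
    counts = Counter(quantized)
    n = len(quantized)
    return [-math.log2(counts[q] / n) for q in quantized]

# A mostly uniform "scene" with one unusual response: the outlier
# receives the largest self-information, -log2(1/16) = 4 bits.
scene = [0.1] * 15 + [0.9]
sal = self_information_map(scene)
print(round(sal[-1], 2))  # prints 4.0
```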
Affiliation(s)
- Haibin Duan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, P. R. China
- Science and Technology on Aircraft Control Laboratory, School of Automation Science and Electronic Engineering, Beihang University, Beijing 100191, P. R. China
- Xiaohua Wang
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, P. R. China
- Science and Technology on Aircraft Control Laboratory, School of Automation Science and Electronic Engineering, Beihang University, Beijing 100191, P. R. China
|
374
|
Khorsand P, Moore T, Soltani A. Combined contributions of feedforward and feedback inputs to bottom-up attention. Front Psychol 2015; 6:155. [PMID: 25784883 PMCID: PMC4345765 DOI: 10.3389/fpsyg.2015.00155] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2014] [Accepted: 01/30/2015] [Indexed: 11/30/2022] Open
Abstract
To deal with the large amount of information carried by visual inputs entering the brain at any given point in time, the brain swiftly uses the same inputs to enhance processing in one part of the visual field at the expense of others. These processes, collectively called bottom-up attentional selection, are assumed to rely solely on feedforward processing of the external inputs, as the nomenclature implies. Nevertheless, evidence from recent experimental and modeling studies points to a role for feedback in bottom-up attention. Here, we review behavioral and neural evidence that feedback inputs are important for the formation of signals that could guide attentional selection based on exogenous inputs. Moreover, we review results from a modeling study elucidating the mechanisms underlying the emergence of these signals in successive layers of neural populations and how they depend on feedback from higher visual areas. We use these results to interpret and discuss more recent findings that can further unravel the feedforward and feedback neural mechanisms underlying bottom-up attention. We argue that while it is descriptively useful to separate feedforward and feedback processes underlying bottom-up attention, these processes cannot be mechanistically separated into two successive stages, as they occur at almost the same time and affect neural activity within the same brain areas through similar neural mechanisms. Therefore, understanding the interaction and integration of feedforward and feedback inputs is crucial for a better understanding of bottom-up attention.
Affiliation(s)
- Tirin Moore
- Department of Neurobiology, Stanford University School of Medicine, Stanford, CA, USA; Howard Hughes Medical Institute, Stanford, CA, USA
- Alireza Soltani
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
|
375
|
Nuthmann A, Einhäuser W. A new approach to modeling the influence of image features on fixation selection in scenes. Ann N Y Acad Sci 2015; 1339:82-96. [PMID: 25752239 PMCID: PMC4402003 DOI: 10.1111/nyas.12705] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
Abstract
Which image characteristics predict where people fixate when memorizing natural images? To answer this question, we introduce a new analysis approach that combines a novel scene-patch analysis with generalized linear mixed models (GLMMs). Our method allows for (1) directly describing the relationship between continuous feature value and fixation probability, and (2) assessing each feature's unique contribution to fixation selection. To demonstrate this method, we estimated the relative contribution of various image features to fixation selection: luminance and luminance contrast (low-level features); edge density (a mid-level feature); visual clutter and image segmentation to approximate local object density in the scene (higher-level features). An additional predictor captured the central bias of fixation. The GLMM results revealed that edge density, clutter, and the number of homogenous segments in a patch can independently predict whether image patches are fixated or not. Importantly, neither luminance nor contrast had an independent effect above and beyond what could be accounted for by the other predictors. Since the parcellation of the scene and the selection of features can be tailored to the specific research question, our approach allows for assessing the interplay of various factors relevant for fixation selection in scenes in a powerful and flexible manner.
Affiliation(s)
- Antje Nuthmann
- Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, United Kingdom
|
376
|
Li W, Wang P, Jiang R, Qiao H. Robust object tracking guided by top-down spectral analysis visual attention. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.11.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
377
|
Cheng MM, Mitra NJ, Huang X, Torr PHS, Hu SM. Global Contrast Based Salient Region Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2015; 37:569-582. [PMID: 26353262 DOI: 10.1109/tpami.2014.2345401] [Citation(s) in RCA: 486] [Impact Index Per Article: 54.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional-contrast-based salient object detection algorithm, which simultaneously evaluates global contrast differences and spatially weighted coherence scores. The proposed algorithm is simple, efficient, naturally multi-scale, and produces full-resolution, high-quality saliency maps. These saliency maps are further used to initialize a novel iterative version of GrabCut, namely SaliencyCut, for high-quality unsupervised salient object segmentation. We extensively evaluated our algorithm on traditional salient object detection datasets, as well as on a more challenging Internet image dataset. Our experimental results demonstrate that our algorithm consistently outperforms 15 existing salient object detection and segmentation methods, yielding higher precision and better recall rates. We also show that our algorithm can be used to efficiently extract salient object masks from Internet images, enabling effective sketch-based image retrieval (SBIR) via simple shape comparisons. Even for such noisy Internet images, where salient regions are ambiguous, our saliency-guided image retrieval achieves a superior retrieval rate compared with state-of-the-art SBIR methods, and additionally provides important target object region information.
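A toy sketch of the global-contrast idea: a region is salient when it differs in color from large, spatially nearby regions. The hypothetical regions below use scalar colors and 1-D centroids for brevity; the paper's full regional-contrast formulation works on segmented image regions with color histograms:

```python
import math

def region_saliency(regions, sigma2=0.4):
    """Simplified global-contrast saliency for pre-segmented regions.

    Each region is (size, color, centroid), with a scalar color and a 1-D
    centroid for brevity. A region scores high when it differs in color
    from large, spatially nearby regions.
    """
    sal = []
    for k, (_, col_k, pos_k) in enumerate(regions):
        s = 0.0
        for i, (size_i, col_i, pos_i) in enumerate(regions):
            if i == k:
                continue
            spatial_w = math.exp(-((pos_k - pos_i) ** 2) / sigma2)
            s += size_i * spatial_w * abs(col_k - col_i)
        sal.append(s)
    return sal

# Two large, similarly colored "background" regions plus one small,
# distinctly colored region; the distinct region scores highest.
regions = [(0.45, 0.20, 0.1), (0.45, 0.25, 0.9), (0.10, 0.95, 0.5)]
sal = region_saliency(regions)
assert sal.index(max(sal)) == 2
```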
|
378
|
Hammer R, Sloutsky V, Grill-Spector K. Feature saliency and feedback information interactively impact visual category learning. Front Psychol 2015; 6:74. [PMID: 25745404 PMCID: PMC4333777 DOI: 10.3389/fpsyg.2015.00074] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2014] [Accepted: 01/13/2015] [Indexed: 11/21/2022] Open
Abstract
Visual category learning (VCL) involves detecting which features are most relevant for categorization. VCL relies on attentional learning, which enables effectively redirecting attention to an object's features most relevant for categorization while 'filtering out' irrelevant features. When the features relevant for categorization are not salient, VCL also relies on perceptual learning, which enables becoming more sensitive to subtle yet important differences between objects. Little is known about how attentional learning and perceptual learning interact when VCL relies on both processes at the same time. Here we tested this interaction. Participants performed VCL tasks in which they learned to categorize novel stimuli by detecting the feature dimension relevant for categorization. Tasks varied both in feature saliency (low-saliency tasks that required perceptual learning vs. high-saliency tasks) and in feedback information (mid-information tasks with moderately ambiguous feedback that increased attentional load vs. high-information tasks with non-ambiguous feedback). We found that mid-information and high-information feedback were similarly effective for VCL in high-saliency tasks. This suggests that the increased attentional load associated with processing moderately ambiguous feedback has little effect on VCL when features are salient. In low-saliency tasks, VCL relied on slower perceptual learning; but when the feedback was highly informative, participants were able to ultimately attain the same performance as in the high-saliency VCL tasks. However, VCL was significantly compromised in the low-saliency mid-information feedback task. We suggest that such low-saliency mid-information learning scenarios are characterized by a 'cognitive loop paradox' in which two interdependent learning processes have to take place simultaneously.
Affiliation(s)
- Rubi Hammer
- Department of Psychology, Stanford University, Stanford, CA, USA; Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA; Interdepartmental Neuroscience Program, Northwestern University, Evanston, IL, USA
- Vladimir Sloutsky
- Department of Psychology and Center for Cognitive Science, The Ohio State University, Columbus, OH, USA
- Kalanit Grill-Spector
- Department of Psychology, Stanford University, Stanford, CA, USA; Stanford Neuroscience Institute, Stanford University, Stanford, CA, USA
|
379
|
Koide N, Kubo T, Nishida S, Shibata T, Ikeda K. Art expertise reduces influence of visual salience on fixation in viewing abstract-paintings. PLoS One 2015; 10:e0117696. [PMID: 25658327 PMCID: PMC4319974 DOI: 10.1371/journal.pone.0117696] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2014] [Accepted: 12/30/2014] [Indexed: 11/25/2022] Open
Abstract
When viewing a painting, artists perceive more information from the painting on the basis of their experience and knowledge than art novices do. This difference can be reflected in eye scan paths during viewing of paintings. Distributions of scan paths of artists are different from those of novices even when the paintings contain no figurative object (i.e. abstract paintings). There are two possible explanations for this difference of scan paths. One is that artists have high sensitivity to high-level features such as textures and composition of colors and therefore their fixations are more driven by such features compared with novices. The other is that fixations of artists are more attracted by salient features than those of novices and the fixations are driven by low-level features. To test these, we measured eye fixations of artists and novices during the free viewing of various abstract paintings and compared the distribution of their fixations for each painting with a topological attentional map that quantifies the conspicuity of low-level features in the painting (i.e. saliency map). We found that the fixation distribution of artists was more distinguishable from the saliency map than that of novices. This difference indicates that fixations of artists are less driven by low-level features than those of novices. Our result suggests that artists may extract visual information from paintings based on high-level features. This ability of artists may be associated with artists’ deep aesthetic appreciation of paintings.
Affiliation(s)
- Naoko Koide
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
- Takatomi Kubo
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
- Satoshi Nishida
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Osaka, Japan
- Tomohiro Shibata
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
- Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Fukuoka, Japan
- Kazushi Ikeda
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
|
380
|
|
381
|
Attentional Scene-Exploration and Object Discovery in Image and RGB-D Data. KUNSTLICHE INTELLIGENZ 2015. [DOI: 10.1007/s13218-014-0337-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
382
|
|
383
|
|
384
|
Mettler B, Kong Z, Li B, Andersh J. Systems view on spatial planning and perception based on invariants in agent-environment dynamics. Front Neurosci 2015; 8:439. [PMID: 25628524 PMCID: PMC4292452 DOI: 10.3389/fnins.2014.00439] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2014] [Accepted: 12/14/2014] [Indexed: 11/13/2022] Open
Abstract
Modeling agile and versatile spatial behavior remains a challenging task, due to the intricate coupling of planning, control, and perceptual processes. Previous results have shown that humans plan and organize their guidance behavior by exploiting patterns in the interactions between the agent or organism and the environment. These patterns, described under the concept of Interaction Patterns (IPs), capture invariants arising from equivalences and symmetries in the interaction with the environment, as well as effects arising from intrinsic properties of human control and guidance processes, such as perceptual guidance mechanisms. The paper takes a systems perspective, considering the IP as a unit of organization, and builds on its properties to present a hierarchical model that delineates the planning, control, and perceptual processes and their integration. The model's planning process is further elaborated by showing that the IP can be abstracted using spatial time-to-go functions. The perceptual processes are likewise elaborated from the hierarchical model. The paper provides experimental support for the model's ability to predict the spatial organization of behavior and the perceptual processes.
Affiliation(s)
- Bérénice Mettler
- Interactive Guidance and Control Lab, Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, MN, USA
- Zhaodan Kong
- Department of Mechanical Engineering, Boston University, Boston, MA, USA
- Bin Li
- Interactive Guidance and Control Lab, Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, MN, USA
- Jonathan Andersh
- Interactive Guidance and Control Lab, Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, MN, USA; Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
|
385
|
|
386
|
|
387
|
Gan L, Duan H. Chemical Reaction Optimization for Feature Combination in Bio-inspired Visual Attention. INT J COMPUT INT SYS 2015. [DOI: 10.1080/18756891.2015.1036220] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022] Open
|
388
|
Scharfenberger C, Wong A, Clausi DA. Structure-guided statistical textural distinctiveness for salient region detection in natural images. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2015; 24:457-470. [PMID: 25695960 DOI: 10.1109/tip.2014.2380351] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
We propose a simple yet effective structure-guided statistical textural distinctiveness approach to salient region detection. Our method uses a multilayer approach to analyze the structural and textural characteristics of natural images as important features for salient region detection from a scale point of view. To represent the structural characteristics, we abstract the image using structured image elements and extract rotational-invariant neighborhood-based textural representations to characterize each element by an individual texture pattern. We then learn a set of representative texture atoms for sparse texture modeling and construct a statistical textural distinctiveness matrix to determine the distinctiveness between all representative texture atom pairs in each layer. Finally, we determine saliency maps for each layer based on the occurrence probability of the texture atoms and their respective statistical textural distinctiveness and fuse them to compute a final saliency map. Experimental results using four public data sets and a variety of performance evaluation metrics show that our approach provides promising results when compared with existing salient region detection approaches.
|
389
|
Lateralized discrimination of emotional scenes in peripheral vision. Exp Brain Res 2014; 233:997-1006. [DOI: 10.1007/s00221-014-4174-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2014] [Accepted: 12/04/2014] [Indexed: 10/24/2022]
|
390
|
Gao R, Uchida S, Shahab A, Shafait F, Frinken V. Visual Saliency Models for Text Detection in Real World. PLoS One 2014; 9:e114539. [PMID: 25494196 PMCID: PMC4262416 DOI: 10.1371/journal.pone.0114539] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2014] [Accepted: 10/27/2014] [Indexed: 11/30/2022] Open
Abstract
This paper evaluates the degree of saliency of texts in natural scenes using visual saliency models. A large scale scene image database with pixel level ground truth is created for this purpose. Using this scene image database and five state-of-the-art models, visual saliency maps that represent the degree of saliency of the objects are calculated. The receiver operating characteristic curve is employed in order to evaluate the saliency of scene texts, which is calculated by visual saliency models. A visualization of the distribution of scene texts and non-texts in the space constructed by three kinds of saliency maps, which are calculated using Itti's visual saliency model with intensity, color and orientation features, is given. This visualization of distribution indicates that text characters are more salient than their non-text neighbors, and can be captured from the background. Therefore, scene texts can be extracted from the scene images. With this in mind, a new visual saliency architecture, named hierarchical visual saliency model, is proposed. Hierarchical visual saliency model is based on Itti's model and consists of two stages. In the first stage, Itti's model is used to calculate the saliency map, and Otsu's global thresholding algorithm is applied to extract the salient region that we are interested in. In the second stage, Itti's model is applied to the salient region to calculate the final saliency map. An experimental evaluation demonstrates that the proposed model outperforms Itti's model in terms of captured scene texts.
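The first-stage thresholding step of the hierarchical model can be illustrated with a self-contained Otsu implementation. The toy 1-D "saliency map" below stands in for an actual Itti saliency map, which the paper computes first:

```python
def otsu_threshold(values, bins=256):
    """Otsu's global threshold: pick t maximizing between-class variance."""
    n = len(values)
    hist = [0] * bins
    for v in values:
        hist[v] += 1
    total = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(bins):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = n - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (total - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy "saliency map": background values near 30, a salient
# blob near 200; Otsu places the threshold inside the gap.
saliency = [28, 30, 31, 29, 32, 30, 198, 200, 202, 199]
t = otsu_threshold(saliency)
assert 32 <= t < 198
```

Locations above the threshold form the salient region on which, in the paper's second stage, Itti's model is run again.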
Affiliation(s)
- Renwu Gao
- Department of Advanced Information Technology, Kyushu University, Fukuoka, Fukuoka, Japan
- Seiichi Uchida
- Department of Advanced Information Technology, Kyushu University, Fukuoka, Fukuoka, Japan
- Asif Shahab
- German Research Center for Artificial Intelligence, Kaiserslautern, Rhineland-Palatinate, Germany
- Faisal Shafait
- School of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia
- Volkmar Frinken
- Department of Advanced Information Technology, Kyushu University, Fukuoka, Fukuoka, Japan
|
391
|
Waldner M, Le Muzic M, Bernhard M, Purgathofer W, Viola I. Attractive Flicker--Guiding Attention in Dynamic Narrative Visualizations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2014; 20:2456-2465. [PMID: 26356959 DOI: 10.1109/tvcg.2014.2346352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Focus+context techniques provide visual guidance in visualizations by giving strong visual prominence to elements of interest while the context is suppressed. However, finding a visual feature to enhance for the focus to pop out from its context in a large dynamic scene, while leading to minimal visual deformation and subjective disturbance, is challenging. This paper proposes Attractive Flicker, a novel technique for visual guidance in dynamic narrative visualizations. We first show that flicker is a strong visual attractor in the entire visual field, without distorting, suppressing, or adding any scene elements. The novel aspect of our Attractive Flicker technique is that it consists of two signal stages: The first "orientation stage" is a short but intensive flicker stimulus to attract the attention to elements of interest. Subsequently, the intensive flicker is reduced to a minimally disturbing luminance oscillation ("engagement stage") as visual support to keep track of the focus elements. To find a good trade-off between attraction effectiveness and subjective annoyance caused by flicker, we conducted two perceptual studies to find suitable signal parameters. We showcase Attractive Flicker with the parameters obtained from the perceptual statistics in a study of molecular interactions. With Attractive Flicker, users were able to easily follow the narrative of the visualization on a large display, while the flickering of focus elements was not disturbing when observing the context.
|
392
|
Fu S, He H, Hou ZG. Learning Race from Face: A Survey. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2014; 36:2483-2509. [PMID: 26353153 DOI: 10.1109/tpami.2014.2321570] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Faces convey a wealth of social signals, including race, expression, identity, age and gender, all of which have attracted increasing attention from multi-disciplinary research, such as psychology, neuroscience, computer science, to name a few. Gleaned from recent advances in computer vision, computer graphics, and machine learning, computational intelligence based racial face analysis has been particularly popular due to its significant potential and broader impacts in extensive real-world applications, such as security and defense, surveillance, human computer interface (HCI), biometric-based identification, among others. These studies raise an important question: How implicit, non-declarative racial category can be conceptually modeled and quantitatively inferred from the face? Nevertheless, race classification is challenging due to its ambiguity and complexity depending on context and criteria. To address this challenge, recently, significant efforts have been reported toward race detection and categorization in the community. This survey provides a comprehensive and critical review of the state-of-the-art advances in face-race perception, principles, algorithms, and applications. We first discuss race perception problem formulation and motivation, while highlighting the conceptual potentials of racial face processing. Next, taxonomy of feature representational models, algorithms, performance and racial databases are presented with systematic discussions within the unified learning scenario. Finally, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potentially important cross-cutting themes and research directions for the issue of learning race from face.
|
393
|
Augmented saliency model using automatic 3D head pose detection and learned gaze following in natural scenes. Vision Res 2014; 116:113-26. [PMID: 25448115 DOI: 10.1016/j.visres.2014.10.027] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2014] [Revised: 10/22/2014] [Accepted: 10/28/2014] [Indexed: 11/23/2022]
Abstract
Previous studies have shown that gaze direction of actors in a scene influences eye movements of passive observers during free-viewing (Castelhano, Wieth, & Henderson, 2007; Borji, Parks, & Itti, 2014). However, no computational model has been proposed to combine bottom-up saliency with actor's head pose and gaze direction for predicting where observers look. Here, we first learn probability maps that predict fixations leaving head regions (gaze following fixations), as well as fixations on head regions (head fixations), both dependent on the actor's head size and pose angle. We then learn a combination of gaze following, head region, and bottom-up saliency maps with a Markov chain composed of head region and non-head region states. This simple structure allows us to inspect the model and make comments about the nature of eye movements originating from heads as opposed to other regions. Here, we assume perfect knowledge of actor head pose direction (from an oracle). The combined model, which we call the Dynamic Weighting of Cues model (DWOC), explains observers' fixations significantly better than each of the constituent components. Finally, in a fully automatic combined model, we replace the oracle head pose direction data with detections from a computer vision model of head pose. Using these (imperfect) automated detections, we again find that the combined model significantly outperforms its individual components. Our work extends the engineering and scientific applications of saliency models and helps better understand mechanisms of visual attention.
|
394
|
Sun X, Yao H, Ji R, Liu XM. Toward statistical modeling of saccadic eye-movement and visual saliency. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2014; 23:4649-4662. [PMID: 25029460 DOI: 10.1109/tip.2014.2337758] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGCs using projection pursuit and generates eye movements by selecting the location with the maximum SGC response. Besides simulating human saccadic behavior, we also demonstrate superior effectiveness and robustness over state-of-the-art methods through extensive experiments on synthetic patterns and human eye fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences and the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show the promising potential of statistical approaches for human behavior research.
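A rough illustration of selecting a super-Gaussian component: rank candidate filter responses by kurtosis, a standard measure of super-Gaussianity, and fixate the winner's peak. This is an interpretation for illustration only, not the authors' projection-pursuit implementation; the filters and responses are invented:

```python
import math

def kurtosis(xs):
    """Excess kurtosis; large positive values indicate a super-Gaussian
    (sparse, heavy-tailed) distribution of responses."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / var ** 2 - 3.0

def next_fixation(responses_per_filter):
    """Pick the filter whose responses are most super-Gaussian, then
    fixate the location where that filter responds most strongly."""
    best = max(responses_per_filter, key=kurtosis)
    return best.index(max(best))

# Filter A responds almost uniformly (sub-Gaussian); filter B responds
# sparsely with one strong peak (super-Gaussian), so its peak wins.
filter_a = [0.9, 1.0, 1.1, 1.0, 0.9, 1.1]
filter_b = [0.1, 0.1, 0.1, 2.0, 0.1, 0.1]
assert next_fixation([filter_a, filter_b]) == 3
```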
|
395
|
Tian H, Fang Y, Zhao Y, Lin W, Ni R, Zhu Z. Salient region detection by fusing bottom-up and top-down features extracted from a single image. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2014; 23:4389-4398. [PMID: 25163061 DOI: 10.1109/tip.2014.2350914] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Recently, some global-contrast-based salient region detection models have been proposed that rely only on the low-level feature of color. It is necessary to consider both color and orientation features to overcome their limitations, and thus improve the performance of salient region detection for images with low contrast in color and high contrast in orientation. In addition, the existing fusion methods for different feature maps, like the simple averaging method and the selective method, are not sufficiently effective. To overcome these limitations of existing salient region detection models, we propose a novel salient region model based on bottom-up and top-down mechanisms: color contrast and orientation contrast are adopted to calculate the bottom-up feature maps, while the top-down cue of depth-from-focus from the same single image is used to guide the generation of the final salient regions, since depth-from-focus reflects the photographer's preference and knowledge of the task. A more general and effective fusion method is designed to combine the bottom-up feature maps. According to the degree of scattering and the eccentricities of feature maps, the proposed fusion method can assign adaptive weights to different feature maps to reflect the confidence level of each. The depth-from-focus of the image, as a significant top-down feature for visual attention, is used to guide the salient regions during the fusion process; with its aid, the proposed fusion method can filter out the background and highlight salient regions. Experimental results show that the proposed model outperforms state-of-the-art models on three publicly available data sets.
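The adaptive-weighting intuition, trusting compact feature maps more than diffuse ones, can be sketched as below. The inverse-scatter weight and the 1-D maps are illustrative simplifications of the paper's degree-of-scattering/eccentricity scheme, and omit the depth-from-focus guidance:

```python
def scatter(fmap):
    """Degree of scattering of a 1-D feature map: value-weighted
    variance of the positions where the map responds."""
    total = sum(fmap)
    mean = sum(p * v for p, v in enumerate(fmap)) / total
    return sum(v * (p - mean) ** 2 for p, v in enumerate(fmap)) / total

def fuse(maps):
    """Combine feature maps with adaptive weights: a compact
    (low-scatter) map is trusted more than a diffuse one."""
    weights = [1.0 / (scatter(m) + 1e-9) for m in maps]
    norm = sum(weights)
    weights = [w / norm for w in weights]
    return [sum(w * m[i] for w, m in zip(weights, maps))
            for i in range(len(maps[0]))]

# A compact color-contrast map and a diffuse orientation map:
# the fused map follows the compact, confident one.
color_map = [0.0, 0.0, 1.0, 0.0, 0.0]
orient_map = [0.5, 0.5, 0.5, 0.5, 0.5]
fused = fuse([color_map, orient_map])
assert fused.index(max(fused)) == 2
```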
|
396
|
Zhang L, Shen Y, Li H. VSI: a visual saliency-induced index for perceptual image quality assessment. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2014; 23:4270-4281. [PMID: 25122572 DOI: 10.1109/tip.2014.2346028] [Citation(s) in RCA: 155] [Impact Index Per Article: 15.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Perceptual image quality assessment (IQA) aims to use computational models to measure image quality consistently with subjective evaluations. Visual saliency (VS) has been widely studied by psychologists, neurobiologists, and computer scientists over the last decade to investigate which areas of an image attract the most attention of the human visual system. Intuitively, VS is closely related to IQA in that suprathreshold distortions can largely alter the VS maps of images. With this in mind, we propose a simple but very effective full-reference IQA method using VS. In the proposed model, the role of VS is twofold. First, VS is used as a feature when computing the local quality map of the distorted image. Second, when pooling the quality score, VS serves as a weighting function reflecting the importance of each local region. The proposed index is called the visual saliency-induced index (VSI). Several prominent computational VS models were investigated in the context of IQA, and the best one was chosen for VSI. Extensive experiments on four large-scale benchmark databases demonstrate that VSI achieves higher prediction accuracy than all state-of-the-art IQA indices we could find, while maintaining moderate computational complexity. The MATLAB source code of VSI and the evaluation results are publicly available online at http://sse.tongji.edu.cn/linzhang/IQA/VSI/VSI.htm.
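The twofold role of VS (as a local quality feature and as a pooling weight) can be sketched as follows. This is an illustrative simplification, not the published VSI code: the similarity form `(2ab + c) / (a² + b² + c)` and the use of the pointwise maximum of the two saliency maps as the pooling weight are assumptions for the sketch, and VSI's chromatic term is omitted.

```python
import numpy as np

def vsi_like_score(vs_ref, vs_dist, grad_ref, grad_dist, c1=1e-3, c2=1e-3):
    """Sketch of a VSI-style index: saliency similarity and gradient
    similarity form a local quality map, which is pooled with the
    pointwise max of the two saliency maps as weights."""
    s_vs = (2 * vs_ref * vs_dist + c1) / (vs_ref ** 2 + vs_dist ** 2 + c1)
    s_g = (2 * grad_ref * grad_dist + c2) / (grad_ref ** 2 + grad_dist ** 2 + c2)
    local_q = s_vs * s_g                       # local quality map
    w = np.maximum(vs_ref, vs_dist)            # salient regions weigh more
    return float((local_q * w).sum() / (w.sum() + 1e-12))
```

Identical reference and distorted inputs yield a score of 1; any distortion that perturbs saliency or gradients lowers it.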
|
397
|
Calvo MG, Gutiérrez-García A, Fernández-Martín A, Nummenmaa L. Recognition of Facial Expressions of Emotion is Related to their Frequency in Everyday Life. JOURNAL OF NONVERBAL BEHAVIOR 2014. [DOI: 10.1007/s10919-014-0191-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
398
|
Han S, Vasconcelos N. Object recognition with hierarchical discriminant saliency networks. Front Comput Neurosci 2014; 8:109. [PMID: 25249971 PMCID: PMC4158795 DOI: 10.3389/fncom.2014.00109] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2014] [Accepted: 08/22/2014] [Indexed: 12/22/2022] Open
Abstract
The benefits of integrating attention and object recognition are investigated. While attention is frequently modeled as a pre-processor for recognition, we investigate the hypothesis that attention is an intrinsic component of recognition and vice-versa. This hypothesis is tested with a recognition model, the hierarchical discriminant saliency network (HDSN), whose layers are top-down saliency detectors, tuned for a visual class according to the principles of discriminant saliency. As a model of neural computation, the HDSN has two possible implementations. In a biologically plausible implementation, all layers comply with the standard neurophysiological model of visual cortex, with sub-layers of simple and complex units that implement a combination of filtering, divisive normalization, pooling, and non-linearities. In a convolutional neural network implementation, all layers are convolutional and implement a combination of filtering, rectification, and pooling. The rectification is performed with a parametric extension of the now popular rectified linear units (ReLUs), whose parameters can be tuned for the detection of target object classes. This enables a number of functional enhancements over neural network models that lack a connection to saliency, including optimal feature denoising mechanisms for recognition, modulation of saliency responses by the discriminant power of the underlying features, and the ability to detect both feature presence and absence. In either implementation, each layer has a precise statistical interpretation, and all parameters are tuned by statistical learning. Each saliency detection layer learns more discriminant saliency templates than its predecessors and higher layers have larger pooling fields. This enables the HDSN to simultaneously achieve high selectivity to target object classes and invariance. 
The performance of the network in saliency and object recognition tasks is compared to those of models from the biological and computer vision literatures. This demonstrates benefits for all the functional enhancements of the HDSN, the class tuning inherent to discriminant saliency, and saliency layers based on templates of increasing target selectivity and invariance. Altogether, these experiments suggest that there are non-trivial benefits in integrating attention and recognition.
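The parametric extension of the ReLU mentioned above can be sketched minimally. This is an assumed form for illustration, not the HDSN definition: a rectifier with a learnable threshold `theta`, so that weak responses below the threshold are suppressed as noise, in the spirit of the feature-denoising role the abstract describes.

```python
import numpy as np

def thresholded_relu(x, theta):
    """Rectifier with a tunable threshold: responses below theta are
    treated as noise and zeroed, stronger responses pass shifted.
    With theta = 0 this reduces to the standard ReLU."""
    return np.maximum(x - theta, 0.0)
```

Tuning `theta` per feature channel for a target class is one way such a unit could modulate responses by the discriminant power of the underlying feature.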
Affiliation(s)
- Sunhyoung Han, Analytics Department, ID Analytics, San Diego, CA, USA
- Nuno Vasconcelos, Statistical and Visual Computing Lab, Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
|
399
|
Marfil R, Palomino AJ, Bandera A. Combining segmentation and attention: a new foveal attention model. Front Comput Neurosci 2014; 8:96. [PMID: 25177289 PMCID: PMC4132578 DOI: 10.3389/fncom.2014.00096] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2014] [Accepted: 07/24/2014] [Indexed: 11/13/2022] Open
Abstract
Artificial vision systems cannot process all the information they receive from the world in real time, because doing so is highly expensive and inefficient in terms of computational cost. Inspired by biological perception systems, artificial attention models aim to select only the relevant parts of the scene. In human vision, it is also well established that these units of attention are not merely spatial but closely related to perceptual objects (proto-objects). This implies a strong bidirectional relationship between segmentation and attention: while segmentation is responsible for extracting the proto-objects from the scene, attention can in turn guide segmentation, giving rise to the concept of foveal attention. When the focus of attention moves from one visual unit to another, the rest of the scene is still perceived, but at a lower resolution than the focused object. The result is multi-resolution visual perception in which the fovea, a dimple on the central retina, provides the highest-resolution vision. In this paper, a bottom-up foveal attention model is presented. The input image is a foveal image represented using a Cartesian Foveal Geometry (CFG), which encodes the field of view of the sensor as a fovea (placed at the focus of attention) surrounded by a set of concentric rings of decreasing resolution. Multi-resolution perceptual segmentation is then performed by building a foveal polygon using the Bounded Irregular Pyramid (BIP). Bottom-up attention is embedded in the same structure, allowing the fovea to be set over the most salient image proto-object. Saliency is computed as a linear combination of multiple low-level features such as color and intensity contrast, symmetry, orientation, and roundness. Results obtained on natural images show that the combination of hierarchical foveal segmentation and saliency estimation performs well in terms of both accuracy and speed.
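The linear-combination saliency step stated above can be sketched in a few lines. This is a generic illustration, not the authors' code: the min-max normalization and the default equal weights are assumptions, and the feature maps themselves (color contrast, symmetry, etc.) are taken as given inputs.

```python
import numpy as np

def combine_saliency(feature_maps, weights=None):
    """Normalize each low-level feature map to [0, 1], then fuse them
    as a weighted linear combination (equal weights by default)."""
    maps = [(m - m.min()) / (m.max() - m.min() + 1e-12) for m in feature_maps]
    if weights is None:
        weights = [1.0 / len(maps)] * len(maps)
    return sum(w * m for w, m in zip(weights, maps))
```

The location of the maximum of the fused map is then a natural choice for where to place the fovea.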
Affiliation(s)
- Rebeca Marfil, ISIS Group, Department of Electronic Technology, University of Málaga, Málaga, Spain
- Antonio J Palomino, ISIS Group, Department of Electronic Technology, University of Málaga, Málaga, Spain
- Antonio Bandera, ISIS Group, Department of Electronic Technology, University of Málaga, Málaga, Spain
|
400
|
Qiao H, Li Y, Tang T, Wang P. Introducing memory and association mechanism into a biologically inspired visual model. IEEE TRANSACTIONS ON CYBERNETICS 2014; 44:1485-1496. [PMID: 24184793 DOI: 10.1109/tcyb.2013.2287014] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
A well-known biologically inspired hierarchical model (the HMAX model), proposed recently and corresponding to areas V1 to V4 of the ventral pathway in primate visual cortex, has been successfully applied to multiple visual recognition tasks. The model achieves position- and scale-tolerant recognition, a central problem in pattern recognition. In this paper, guided by further biological experimental evidence, we introduce a memory and association mechanism into the HMAX model. The main contributions of this work are: 1) mimicking the active memory and association mechanism and adding top-down adjustment to the HMAX model, the first attempt to add active adjustment to this well-known model; and 2) from an information perspective, algorithms based on the new model reduce computation and storage while maintaining good recognition performance. The new model is also applied to object recognition. Preliminary experimental results show that our method is efficient, with a much lower memory requirement.
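A memory-and-association step of the kind described above can be sketched as follows. This is a schematic reconstruction, not the authors' model: the class name `PrototypeMemory`, the cosine-similarity matching, and the confidence threshold are all assumptions chosen to illustrate how an association step can short-circuit the full feed-forward computation and reduce storage (one prototype per class).

```python
import numpy as np

class PrototypeMemory:
    """Memory on top of a feed-forward feature hierarchy: store one
    unit-norm prototype vector per class and associate a new input with
    the best-matching prototype when the match is confident enough."""

    def __init__(self, threshold=0.9):
        self.prototypes = {}          # label -> unit-norm feature vector
        self.threshold = threshold

    def remember(self, label, feat):
        self.prototypes[label] = feat / (np.linalg.norm(feat) + 1e-12)

    def associate(self, feat):
        """Return (label, similarity) of the best prototype, or
        (None, similarity) when nothing clears the threshold."""
        f = feat / (np.linalg.norm(feat) + 1e-12)
        best, sim = None, 0.0
        for label, proto in self.prototypes.items():
            s = float(f @ proto)
            if s > sim:
                best, sim = label, s
        return (best, sim) if sim >= self.threshold else (None, sim)
```

A confident association can then stand in for the full recognition pass, which is one concrete way such a mechanism lowers computation for familiar inputs.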
|