1. Spjut J, Boudaoud B, Kim J, Greer T, Albert R, Stengel M, Aksit K, Luebke D. Toward Standardized Classification of Foveated Displays. IEEE Transactions on Visualization and Computer Graphics 2020; 26:2126-2134. PMID: 32078547. DOI: 10.1109/tvcg.2020.2973053.
Abstract
Emergent in the field of head-mounted display design is a desire to leverage the limitations of the human visual system to reduce the computation, communication, and display workload in power- and form-factor-constrained systems. Fundamental to this reduced workload is the ability to match display resolution to the acuity of the human visual system, along with a resulting need to follow the gaze of the eye as it moves, a process referred to as foveation. A display that moves its content along with the eye may be called a Foveated Display, though this term is also commonly used to describe displays with non-uniform resolution that attempt to mimic human visual acuity. We therefore recommend a definition for the term Foveated Display that accepts both of these interpretations. Furthermore, we include a simplified model for human visual Acuity Distribution Functions (ADFs) at various levels of visual acuity across wide fields of view, and we propose comparing this ADF with the Resolution Distribution Function of a foveated display to evaluate its resolution at a particular gaze direction. We also provide a taxonomy that allows the field to meaningfully compare and contrast various aspects of foveated displays in a manner agnostic to display and optical technology.
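To make the proposed ADF/RDF comparison concrete, here is a minimal Python sketch that pits a simplified acuity falloff model against the resolution profile of a hypothetical two-region foveated display. The falloff constants (peak_cpd, e2) and the display numbers are illustrative assumptions, not values from the paper.

```python
# Sketch: compare a simplified human Acuity Distribution Function (ADF)
# against a display's Resolution Distribution Function (RDF) by eccentricity.
import numpy as np

def adf_cpd(ecc_deg, peak_cpd=60.0, e2=2.3):
    """Simplified acuity falloff in cycles/degree; constants are illustrative."""
    return peak_cpd * e2 / (e2 + np.abs(ecc_deg))

def rdf_cpd(ecc_deg, fovea_cpd=30.0, periphery_cpd=5.0, fovea_radius=10.0):
    """Hypothetical two-region foveated display: sharp inset, coarse surround."""
    ecc = np.abs(ecc_deg)
    return np.where(ecc <= fovea_radius, fovea_cpd, periphery_cpd)

ecc = np.linspace(0.0, 50.0, 501)            # gaze-relative eccentricity (deg)
undersampled = rdf_cpd(ecc) < adf_cpd(ecc)   # where the display limits the eye
print(f"fraction of eccentricities where the display undersamples the eye: "
      f"{undersampled.mean():.0%}")
```

Under these toy numbers the display undersamples the eye both near the fovea (where acuity exceeds the inset's 30 cpd) and in an annulus just outside the inset, which is exactly the kind of mismatch the proposed ADF/RDF comparison is meant to expose.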
2. Aksit K, Chakravarthula P, Rathinavel K, Jeong Y, Albert R, Fuchs H, Luebke D. Manufacturing Application-Driven Foveated Near-Eye Displays. IEEE Transactions on Visualization and Computer Graphics 2019; 25:1928-1939. PMID: 30794179. DOI: 10.1109/tvcg.2019.2898781.
Abstract
Traditional optical manufacturing poses a great challenge to near-eye display designers due to lead times on the order of multiple weeks, limiting the ability of optical designers to iterate quickly and explore beyond conventional designs. We present a complete near-eye display manufacturing pipeline with a one-day lead time using commodity hardware. Our pipeline consists of several innovations: a rapid production technique that improves the surface of a 3D-printed component to an optical quality suitable for near-eye display applications; a computational design methodology that uses machine learning and ray tracing to create freeform static projection screen surfaces for near-eye displays that can represent arbitrary focal surfaces; and a custom projection lens design that distributes pixels non-uniformly for a foveated near-eye display hardware candidate. We demonstrate untethered augmented reality near-eye display prototypes to assess the success of our technique, and show that a ski-goggles form factor, a large monocular field of view (30°×55°), and a resolution of 12 cycles per degree can be achieved.
3. Implicit processing during change blindness revealed with mouse-contingent and gaze-contingent displays. Atten Percept Psychophys 2018; 80:844-859. PMID: 29363028. PMCID: PMC5948240. DOI: 10.3758/s13414-017-1468-5.
Abstract
People often miss salient events that occur right in front of them. This phenomenon, known as change blindness, reveals the limits of visual awareness. Here, we investigate the role of implicit processing in change blindness using an approach that allows partial dissociation of covert and overt attention. Traditional gaze-contingent paradigms adapt the display in real time according to current gaze position. We compare such a paradigm with a newly designed mouse-contingent paradigm where the visual display changes according to the real-time location of a user-controlled mouse cursor, effectively allowing comparison of change detection with mainly overt attention (gaze-contingent display; Experiment 2) and untethered overt and covert attention (mouse-contingent display; Experiment 1). We investigate implicit indices of target detection during change blindness in eye movement and behavioral data, and test whether affective devaluation of unnoticed targets may contribute to change blindness. The results show that unnoticed targets are processed implicitly, but that the processing is shallower than if the target is consciously detected. Additionally, the partial untethering of covert attention with the mouse-contingent display changes the pattern of search and leads to faster detection of the changing target. Finally, although it remains possible that the deployment of covert attention is linked to implicit processing, the results fall short of establishing a direct connection.
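For readers unfamiliar with the paradigm, the sketch below shows the core of a mouse-contingent display: the scene is fully visible only within a circular window that follows the user-controlled cursor, with everything else degraded. It assumes pygame is installed and uses a noise image as a stand-in stimulus; it is not the study's software.

```python
# Sketch of a mouse-contingent display: a clear circular window follows the
# cursor; the rest of the scene is occluded by a uniform gray "periphery".
import numpy as np
import pygame

pygame.init()
W, H, RADIUS = 800, 600, 120
screen = pygame.display.set_mode((W, H))
rng = np.random.default_rng(0)
noise = rng.integers(0, 255, (W, H, 3), dtype=np.uint8)  # stand-in scene
scene = pygame.surfarray.make_surface(noise)

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
    mx, my = pygame.mouse.get_pos()          # cursor plays the role of gaze
    screen.blit(scene, (0, 0))
    mask = pygame.Surface((W, H), pygame.SRCALPHA)
    mask.fill((64, 64, 64, 255))             # opaque gray everywhere...
    pygame.draw.circle(mask, (0, 0, 0, 0), (mx, my), RADIUS)  # ...except here
    screen.blit(mask, (0, 0))
    pygame.display.flip()
pygame.quit()
```

A gaze-contingent version would differ only in replacing pygame.mouse.get_pos() with samples from an eye tracker, which is what makes the two paradigms directly comparable.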
4. Le Callet P, Niebur E. Visual Attention and Applications in Multimedia Technologies. Proceedings of the IEEE 2013; 101:2058-2067. PMID: 24489403. PMCID: PMC3902206. DOI: 10.1109/jproc.2013.2265801.
Abstract
Making technological advances in the field of human-machine interaction requires that the capabilities and limitations of the human perceptual system be taken into account. The focus of this report is an important mechanism of perception, visual selective attention, which is becoming increasingly important for multimedia applications. We introduce the concept of visual attention and describe its underlying mechanisms. In particular, we introduce the concepts of overt and covert visual attention, and of bottom-up and top-down processing. Challenges related to modeling visual attention, and to validating such models against ad hoc ground truth, are also discussed. Examples of the use of visual attention models in image and video processing are presented, with emphasis on multimedia delivery, retargeting and quality assessment of images and video, medical imaging, and applications in stereoscopic 3D imaging.
Affiliation(s)
- Patrick Le Callet: LUNAM Université, Université de Nantes, Institut de Recherche en Communications et Cybernétique de Nantes, Polytech Nantes, UMR CNRS 6597, France.
- Ernst Niebur: Solomon Snyder Department of Neuroscience and the Zanvyl Krieger Mind Brain Institute, Johns Hopkins University, Baltimore, MD 21218, USA.
5. Komogortsev OV, Gobert DV, Jayarathna S, Koh DH, Gowda S. Standardization of Automated Analyses of Oculomotor Fixation and Saccadic Behaviors. IEEE Trans Biomed Eng 2010; 57. PMID: 20667803. DOI: 10.1109/tbme.2010.2057429.
6. Guo C, Zhang L. A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression. IEEE Transactions on Image Processing 2010; 19:185-198. PMID: 19709976. DOI: 10.1109/tip.2009.2030969.
Abstract
Salient areas in natural scenes are generally regarded as the areas on which the human eye will typically focus, and finding these areas is the key step in object detection. In computer vision, many models have been proposed to simulate the behavior of eyes, such as the SaliencyToolBox (STB) and the Neuromorphic Vision Toolkit (NVT), but they demand high computational cost and their results depend heavily on parameter choices. Although some region-based approaches have been proposed to reduce the computational complexity of feature maps, these approaches still could not run in real time. Recently, a simple and fast approach called spectral residual (SR) was proposed, which uses the SR of the amplitude spectrum to calculate the image's saliency map. However, in our previous work we pointed out that it is the phase spectrum, not the amplitude spectrum, of an image's Fourier transform that is key to locating salient areas, and we proposed the phase spectrum of Fourier transform (PFT) model. In this paper, we present a quaternion representation of an image composed of intensity, color, and motion features. Based on the principle of PFT, we propose a novel multiresolution spatiotemporal saliency detection model, the phase spectrum of quaternion Fourier transform (PQFT), which calculates the spatiotemporal saliency map of an image from its quaternion representation. Distinct from other models, the added motion dimension allows the phase spectrum to represent spatiotemporal saliency, so that attention selection can be performed not only for images but also for videos. In addition, the PQFT model can compute the saliency map of an image at various resolutions from coarse to fine. A hierarchical selectivity (HS) framework based on the PQFT model is therefore introduced to construct a tree-structure representation of an image. With the help of HS, we propose a multiresolution wavelet domain foveation (MWDF) model to improve coding efficiency in image and video compression. Extensive tests on videos, natural images, and psychological patterns show that the proposed PQFT model is more effective in saliency detection and predicts eye fixations better than other state-of-the-art models in the previous literature. Moreover, our model has low computational cost and can therefore run in real time. Additional experiments on image and video compression show that the HS-MWDF model achieves a higher compression rate than the traditional model.
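As background, the single-channel PFT baseline that PQFT extends is compact enough to sketch: reconstruct the image from its phase spectrum alone, square, and smooth. This is a minimal illustration, not the authors' full PQFT implementation, which adds quaternion color and motion channels and a multiresolution hierarchy.

```python
# Sketch of PFT (phase spectrum of Fourier transform) saliency: keep only the
# phase of the 2D FFT, invert, square, and smooth to get a saliency map.
import numpy as np
from scipy.ndimage import gaussian_filter

def pft_saliency(gray, sigma=3.0):
    f = np.fft.fft2(gray.astype(np.float64))
    phase_only = np.exp(1j * np.angle(f))        # discard amplitude, keep phase
    recon = np.abs(np.fft.ifft2(phase_only)) ** 2
    sal = gaussian_filter(recon, sigma)           # post-smoothing
    return sal / (sal.max() + 1e-12)

# Usage: a uniform bright patch on a low-contrast background pops out.
img = np.random.rand(128, 128) * 0.1
img[60:70, 60:70] = 1.0
sal = pft_saliency(img)
print(np.unravel_index(sal.argmax(), sal.shape))  # peak lands in/near the patch
```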
Affiliation(s)
- Chenlei Guo: Department of Electronic Engineering, Fudan University, Shanghai, 200433, China.
7. Komogortsev OV, Khan JI. Eye movement prediction by oculomotor plant Kalman filter with brainstem control. Journal of Control Theory and Applications 2009. DOI: 10.1007/s11768-009-7218-z.
8. Quek F, Ehrich R, Lockhart T. As Go the Feet…: On the Estimation of Attentional Focus from Stance. Proceedings of the 10th International Conference on Multimodal Interfaces (ICMI '08) 2008:97-104. PMID: 20830212. PMCID: PMC2935654. DOI: 10.1145/1452392.1452412.
Abstract
The estimation of the direction of visual attention is critical to a large number of interactive systems. This paper investigates the cross-modal relation between the position of one's feet (or standing stance) and the focus of gaze. The intuition is that while one CAN have a range of attentional foci from a particular stance, one may be MORE LIKELY to look in specific directions given an approach vector and stance. We posit that this cross-modal relationship is constrained by biomechanics and personal style. We define a stance vector that models the approach direction before stopping and the pose of a subject's feet. We present a study in which the subjects' feet and approach vector are tracked. The subjects read aloud the contents of note cards at 4 locations, with the order of 'visits' to the cards randomized. Ten subjects read 40 lines of text each, yielding 400 stance vectors and gaze directions. We divided our data into 4 sets of 300 training and 100 test vectors and trained a neural net to estimate the gaze direction given the stance vector. Our results show that 31% of our gaze orientation estimates were within 5°, 51% were within 10°, and 60% were within 15°. Given the ability to track foot position, the procedure is minimally invasive.
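A minimal sketch of this kind of learning setup follows: a small neural-network regressor mapping a stance vector to a gaze direction, evaluated with the same within-tolerance accuracy the abstract reports. The feature layout and the synthetic data are illustrative assumptions standing in for the study's 400 recorded vectors (assumes scikit-learn is installed).

```python
# Sketch: train an MLP to predict gaze direction (deg) from a stance vector.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 400
# Hypothetical stance-vector layout: approach heading, foot yaws, stance width.
X = np.column_stack([
    rng.uniform(-180, 180, n),   # approach direction (deg)
    rng.uniform(-45, 45, n),     # left foot yaw (deg)
    rng.uniform(-45, 45, n),     # right foot yaw (deg)
    rng.uniform(0.2, 0.6, n),    # stance width (m)
])
# Synthetic target: gaze loosely follows approach direction plus foot pose.
y = X[:, 0] + 0.5 * (X[:, 1] + X[:, 2]) + rng.normal(0, 10, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=100, random_state=0)
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                   random_state=0).fit(X_tr, y_tr)
err = np.abs(net.predict(X_te) - y_te)
for tol in (5, 10, 15):          # report accuracy the way the abstract does
    print(f"within {tol} deg: {100 * np.mean(err <= tol):.0f}%")
```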
Affiliation(s)
- Francis Quek: Center for Human-Computer Interaction, Virginia Tech.
10. A technique for simulating visual field losses in virtual environments to study human navigation. Behav Res Methods 2007; 39:552-560. PMID: 17958167. DOI: 10.3758/bf03193025.
Abstract
This paper describes a new technique for simulating peripheral field losses in virtual environments in order to study the roles of the central and peripheral visual fields during navigation. Based on Geisler and Perry's (2002) gaze-contingent multiresolution display concept, the technique extends their methodology to work with three-dimensional images that are both transformed and rendered in real time by a computer graphics system. To assess the usefulness of this method for studying visual field losses, an experiment was run in which seven participants were required to walk to a target tree in a virtual forest as quickly and efficiently as possible while artificial head- and eye-based delays were systematically introduced. Bilinear fits were applied to the mean trial times to assess at what delay lengths breaks in performance could be observed. The results suggest that breaks occur beyond the delays currently inherent in the system. Increases in trial times across all delays tested were also observed when simulated peripheral field losses were applied, compared with full field-of-view conditions. Possible applications and limitations of the system are discussed. The source code needed to program visual field losses can be found at lions.med.jhu.edu/archive/turanolab/Simulated_Visual_Field_Loss_Code.html.
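The masking step at the heart of such a simulation is easy to sketch offline: blend a sharp image with a heavily blurred copy using a gaze-centered window. This is a 2D illustration under assumed parameters, not the paper's real-time 3D renderer.

```python
# Sketch: simulate peripheral field loss by keeping detail only within a
# gaze-centered radius and fading to a blurred copy outside it.
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_field_loss(img, gaze_xy, radius_px=60, edge_px=15, blur_sigma=8.0):
    h, w = img.shape                      # grayscale image assumed
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(xx - gaze_xy[0], yy - gaze_xy[1])
    # Smooth transition from intact central vision (1) to lost periphery (0).
    keep = np.clip((radius_px + edge_px - dist) / (2 * edge_px), 0.0, 1.0)
    return keep * img + (1 - keep) * gaussian_filter(img, blur_sigma)

img = np.random.rand(240, 320)
out = simulate_field_loss(img, gaze_xy=(160, 120))  # gaze at image center
```

In the actual experiment the gaze position would come from the eye tracker each frame, and the delay between tracker sample and display update is exactly the variable the study manipulates.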
11. Toet A. Gaze directed displays as an enabling technology for attention aware systems. Computers in Human Behavior 2006. DOI: 10.1016/j.chb.2005.12.010.
12. Loschky L, McConkie G, Yang J, Miller M. The limits of visual resolution in natural scene viewing. Visual Cognition 2005. DOI: 10.1080/13506280444000652.
13. Duchowski AT, Cournia N, Murphy H. Gaze-contingent displays: a review. Cyberpsychol Behav 2004; 7:621-634.
Abstract
Gaze-contingent displays (GCDs) attempt to balance the amount of information displayed against the visual information-processing capacity of the observer through real-time eye movement sensing. Based on the assumed knowledge of the instantaneous location of the observer's focus of attention, GCD content can be "tuned" through several display processing means. Screen-based displays alter pixel-level information, generally matching the resolvability of the human retina in an effort to maximize bandwidth. Model-based displays alter geometric-level primitives with similar goals. Attentive user interfaces (AUIs) manage object-level entities (e.g., windows, applications) depending on the assumed attentive state of the observer. Such real-time display manipulation is generally achieved through non-contact, unobtrusive tracking of the observer's eye movements. This paper briefly reviews past and present display techniques as well as emerging graphics and eye tracking technology for GCD development.
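The screen-based approach the review describes reduces, in essence, to choosing a coarser resolution level per pixel as eccentricity from the tracked gaze point grows. The sketch below builds a crude resolution pyramid and indexes into it; the level spacing is an illustrative assumption, not a calibrated match to retinal resolvability.

```python
# Sketch of a screen-based GCD: pick a pyramid level per pixel from its
# eccentricity relative to the gaze point (coarser with distance).
import numpy as np
from scipy.ndimage import zoom

def build_pyramid(img, levels=4):
    pyr = [img]
    for _ in range(levels - 1):
        half = zoom(pyr[-1], 0.5, order=1)               # lose detail
        pyr.append(zoom(half, (img.shape[0] / half.shape[0],
                               img.shape[1] / half.shape[1]), order=1))
    return pyr  # every level resampled back to full size for easy indexing

def gaze_contingent(img, gaze_xy, deg_per_px=0.05, levels=4):
    pyr = np.stack(build_pyramid(img, levels))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    ecc = np.hypot(xx - gaze_xy[0], yy - gaze_xy[1]) * deg_per_px
    level = np.clip((ecc / 5.0).astype(int), 0, levels - 1)  # ~1 level per 5 deg
    return pyr[level, yy, xx]

img = np.random.rand(240, 320)
out = gaze_contingent(img, gaze_xy=(160, 120))
```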
Affiliation(s)
- Andrew T Duchowski: Computer Science Department, Clemson University, Clemson, South Carolina 29634-0974, USA.
14. Itti L. Automatic foveation for video compression using a neurobiological model of visual attention. IEEE Transactions on Image Processing 2004; 13:1304-1318. PMID: 15462141. DOI: 10.1109/tip.2004.834657.
Abstract
We evaluate the applicability of a biologically motivated algorithm for selecting visually salient regions of interest in video streams for multiply-foveated video compression. Regions are selected based on a nonlinear integration of low-level visual cues, mimicking processing in primate occipital and posterior parietal cortex. A dynamic foveation filter then blurs every frame, increasingly with distance from salient locations. Sixty-three variants of the algorithm (varying the number and shape of virtual foveas, maximum blur, and saliency competition) are evaluated against an outdoor video scene, using MPEG-1 and constant-quality MPEG-4 (DivX) encoding. Additional compression ratios of 1.1 to 8.5 are achieved by foveation. Two variants of the algorithm are validated against eye fixations recorded from four to six human observers on a heterogeneous collection of 50 video clips (over 45,000 frames in total). Significantly higher overlap than expected by chance is found between human and algorithmic foveations. With both variants, foveated clips are, on average, approximately half the size of unfoveated clips, for both MPEG-1 and MPEG-4. These results suggest the general-purpose usefulness of the algorithm for improving compression ratios of unconstrained video.
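The foveation filter itself can be sketched independently of the saliency model: blur each frame with a strength that grows with distance to the nearest "virtual fovea". The fovea locations and blur schedule below are hand-picked illustrations; the paper derives its foveas from the attention model.

```python
# Sketch of a multi-fovea foveation filter: blend progressively blurred
# copies of a frame by distance to the nearest salient point.
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(frame, foveas, sigmas=(0.0, 2.0, 4.0, 8.0), falloff_px=80.0):
    h, w = frame.shape                   # grayscale frame assumed
    yy, xx = np.mgrid[0:h, 0:w]
    # Distance from every pixel to the nearest virtual fovea.
    dist = np.min([np.hypot(xx - fx, yy - fy) for fx, fy in foveas], axis=0)
    stack = np.stack([gaussian_filter(frame, s) if s > 0 else frame
                      for s in sigmas])
    idx = np.clip((dist / falloff_px).astype(int), 0, len(sigmas) - 1)
    return stack[idx, yy, xx]

frame = np.random.rand(240, 320)
out = foveate(frame, foveas=[(80, 60), (240, 180)])
```

Discarding high frequencies away from the foveas is what lets the downstream MPEG encoder spend fewer bits there, which is where the reported additional compression ratios of 1.1 to 8.5 come from.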
Affiliation(s)
- Laurent Itti: Departments of Computer Science, Psychology and Neuroscience Graduate Program, University of Southern California, Los Angeles, CA 90089-2520, USA.
15. Reingold EM, Loschky LC, McConkie GW, Stampe DM. Gaze-contingent multiresolutional displays: an integrative review. Human Factors 2003; 45:307-328. PMID: 14529201. DOI: 10.1518/hfes.45.2.307.27235.
Abstract
Gaze-contingent multiresolutional displays (GCMRDs) center high-resolution information on the user's gaze position, matching the user's area of interest (AOI). Image resolution and details outside the AOI are reduced, lowering the requirements for processing resources and transmission bandwidth in demanding display and imaging applications. This review provides a general framework within which GCMRD research can be integrated, evaluated, and guided. GCMRDs (or "moving windows") are analyzed in terms of (a) the nature of their images (i.e., "multiresolution," "variable resolution," "space variant," or "level of detail"), and (b) the movement of the AOI (i.e., "gaze contingent," "foveated," or "eye slaved"). We also synthesize the known human factors research on GCMRDs and point out important questions for future research and development. Actual or potential applications of this research include flight, medical, and driving simulators; virtual reality; remote piloting and teleoperation; infrared and indirect vision; image transmission and retrieval; telemedicine; video teleconferencing; and artificial vision systems.
Affiliation(s)
- Eyal M Reingold: Department of Psychology, University of Toronto, Toronto, Ontario, Canada.