1
Bischof WF, Anderson NC, Kingstone A. A tutorial: Analyzing eye and head movements in virtual reality. Behav Res Methods 2024; 56:8396-8421. [PMID: 39117987] [DOI: 10.3758/s13428-024-02482-5]
Abstract
This tutorial provides instruction on how to use the eye tracking technology built into virtual reality (VR) headsets, emphasizing the analysis of head and eye movement data when an observer is situated in the center of an omnidirectional environment. We begin with a brief description of how VR eye movement research differs from previous forms of eye movement research, as well as identifying some outstanding gaps in the current literature. We then introduce the basic methodology used to collect VR eye movement data both in general and with regard to the specific data that we collected to illustrate different analytical approaches. We continue with an introduction of the foundational ideas regarding data analysis in VR, including frames of reference, how to map eye and head position, and event detection. In the next part, we introduce core head and eye data analyses focusing on determining where the head and eyes are directed. We then expand on what has been presented, introducing several novel spatial, spatio-temporal, and temporal head-eye data analysis techniques. We conclude with a reflection on what has been presented, and how the techniques introduced in this tutorial provide the scaffolding for extensions to more complex and dynamic VR environments.
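A recurring step in this kind of analysis is expressing head and gaze directions of an observer at the center of an omnidirectional scene as longitude/latitude coordinates on the viewing sphere. The sketch below is a minimal illustration of that mapping, not code from the tutorial; the coordinate convention (y up, z forward) is an assumption.

```python
import numpy as np

def direction_to_lonlat(d):
    """Map a 3D unit direction vector to (longitude, latitude) in degrees.

    Convention (assumed here): x right, y up, z forward; longitude is the
    azimuth around the vertical axis, latitude the elevation above the horizon.
    """
    x, y, z = d / np.linalg.norm(d)
    lon = np.degrees(np.arctan2(x, z))   # -180..180, 0 = straight ahead
    lat = np.degrees(np.arcsin(y))       # -90..90, 0 = horizon
    return lon, lat

# Example: a gaze direction slightly up and to the right of straight ahead
print(direction_to_lonlat(np.array([0.2, 0.1, 0.97])))
```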
Affiliation(s)
- Walter F Bischof
- Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC, V6T 1Z4, Canada.
- Nicola C Anderson
- Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC, V6T 1Z4, Canada
- Alan Kingstone
- Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC, V6T 1Z4, Canada
2
Sanaei M, Gilbert SB, Perron AJ, Dorneich MC, Kelly JW. An examination of scene complexity's role in cybersickness. ERGONOMICS 2024:1-12. [PMID: 39530917] [DOI: 10.1080/00140139.2024.2427862]
Abstract
This study explored the effect of scene complexity on cybersickness. In this between-subjects experiment, 44 participants played the Pendulum Chair VR game, half with a simple scene and half with a complex scene. The complex scene featured higher optic flow (a lower-level perceptual factor) and higher familiarity (a higher-level factor). Dependent variables were cybersickness and task performance. Results were unexpected in that cybersickness did not differ significantly between the simple and complex scenes. These results suggest that the impact of optic flow and familiarity on cybersickness may be affected by each other or by other factors, making them unreliable predictors of cybersickness when considered alone. Both lower-level and higher-level factors would benefit from further research to deduce the conditions under which they affect cybersickness. VR designers should be aware that optic flow and familiarity alone are not reliable factors when predicting the cybersickness-inducing effects of a new environment.
Affiliation(s)
- Mohammadamin Sanaei
- VRAC, Iowa State University, Ames, Iowa, USA
- Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, Iowa, USA
- Stephen B Gilbert
- VRAC, Iowa State University, Ames, Iowa, USA
- Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, Iowa, USA
- Michael C Dorneich
- VRAC, Iowa State University, Ames, Iowa, USA
- Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, Iowa, USA
- Jonathan W Kelly
- VRAC, Iowa State University, Ames, Iowa, USA
- Department of Psychology, Iowa State University, Ames, Iowa, USA
3
Peng X, Zhang Y, Jimenez-Navarro D, Serrano A, Myszkowski K, Sun Q. Measuring and Predicting Multisensory Reaction Latency: A Probabilistic Model for Visual-Auditory Integration. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:7364-7374. [PMID: 39250397] [DOI: 10.1109/tvcg.2024.3456185]
Abstract
Virtual/augmented reality (VR/AR) devices offer both immersive imagery and sound. With those wide-field cues, we can simultaneously acquire and process visual and auditory signals to quickly identify objects, make decisions, and take action. While vision often takes precedence in perception, our visual sensitivity degrades in the periphery. In contrast, auditory sensitivity can exhibit an opposite trend due to the elevated interaural time difference. What occurs when these senses are simultaneously integrated, as is common in VR applications such as 360° video watching and immersive gaming? We present a computational and probabilistic model to predict VR users' reaction latency to visual-auditory multisensory targets. To this aim, we first conducted a psychophysical experiment in VR to measure the reaction latency by tracking the onset of eye movements. Experiments with numerical metrics and user studies with naturalistic scenarios showcase the model's accuracy and generalizability. Lastly, we discuss the potential applications, such as measuring the sufficiency of target appearance duration in immersive video playback, and suggesting the optimal spatial layouts for AR interface design.
4
Hu Z, Yin Z, Haeufle D, Schmitt S, Bulling A. HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:7375-7385. [PMID: 39255111] [DOI: 10.1109/tvcg.2024.3456161]
Abstract
We present HOIMotion - a novel approach for human motion forecasting during human-object interactions that integrates information about past body poses and egocentric 3D object bounding boxes. Human motion forecasting is important in many augmented reality applications but most existing methods have only used past body poses to predict future motion. HOIMotion first uses an encoder-residual graph convolutional network (GCN) and multi-layer perceptrons to extract features from body poses and egocentric 3D object bounding boxes, respectively. Our method then fuses pose and object features into a novel pose-object graph and uses a residual-decoder GCN to forecast future body motion. We extensively evaluate our method on the Aria digital twin (ADT) and MoGaze datasets and show that HOIMotion consistently outperforms state-of-the-art methods by a large margin of up to 8.7% on ADT and 7.2% on MoGaze in terms of mean per joint position error. Complementing these evaluations, we report a human study (N=20) that shows that the improvements achieved by our method result in forecasted poses being perceived as both more precise and more realistic than those of existing methods. Taken together, these results reveal the significant information content available in egocentric 3D object bounding boxes for human motion forecasting and the effectiveness of our method in exploiting this information.
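The reported improvements are expressed in mean per joint position error (MPJPE), a standard metric in pose forecasting. A minimal sketch of how MPJPE is typically computed is shown below; it is illustrative only, and the array shapes are assumptions rather than the paper's code.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per joint position error.

    pred, gt: arrays of shape (frames, joints, 3) holding 3D joint positions.
    Returns the Euclidean error averaged over joints and frames.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy example: 10 forecast frames, 21 joints
rng = np.random.default_rng(0)
gt = rng.normal(size=(10, 21, 3))
pred = gt + 0.01 * rng.normal(size=gt.shape)
print(f"MPJPE: {mpjpe(pred, gt):.4f} (same units as the joint coordinates)")
```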
5
Marañes C, Gutierrez D, Serrano A. Revisiting the Heider and Simmel experiment for social meaning attribution in virtual reality. Sci Rep 2024; 14:17103. [PMID: 39048600] [PMCID: PMC11269668] [DOI: 10.1038/s41598-024-65532-0]
Abstract
In their seminal experiment in 1944, Heider and Simmel revealed that humans have a pronounced tendency to impose narrative meaning even on simple animations of geometric shapes. Despite the shapes having no discernible features or emotions, participants attributed strong social context, meaningful interactions, and even emotions to them. This experiment, run on traditional 2D displays, has since had a significant impact on fields ranging from psychology to narrative storytelling. Virtual Reality (VR), on the other hand, offers a significantly new viewing paradigm, a fundamentally different type of experience with the potential to enhance presence, engagement and immersion. In this work, we explore and analyze to what extent the findings of the original experiment by Heider and Simmel carry over into a VR setting. We replicate this experiment both on traditional 2D displays and with a head-mounted display (HMD) in VR, and use both subjective (questionnaire-based) and objective (eye-tracking) metrics to record the observers' visual behavior. We perform a thorough analysis of these data and propose novel metrics for assessing the observers' visual behavior. Our questionnaire-based results suggest that participants who viewed the animation through a VR headset developed stronger emotional connections with the geometric shapes than those who viewed it on a traditional 2D screen. Additionally, the analysis of our eye-tracking data indicates that participants who watched the animation in VR exhibited fewer shifts in gaze, suggesting greater engagement with the action. However, we did not find evidence of differences in how subjects perceived the roles of the shapes, with both groups interpreting the animation's plot at the same level of accuracy. Our findings may have important implications for future psychological research using VR, especially regarding our understanding of social cognition and emotions.
Affiliation(s)
- Ana Serrano
- Universidad de Zaragoza, I3A, Zaragoza, Spain.
6
Wu 吴奕忱 Y, Li 李晟 S. Complexity Matters: Normalization to Prototypical Viewpoint Induces Memory Distortion along the Vertical Axis of Scenes. J Neurosci 2024; 44:e1175232024. [PMID: 38777600] [PMCID: PMC11223457] [DOI: 10.1523/jneurosci.1175-23.2024]
Abstract
Scene memory is prone to systematic distortions potentially arising from experience with the external world. Boundary transformation, a well-known memory distortion effect along the near-far axis of three-dimensional space, represents the observer's erroneous recall of a scene's viewing distance. Researchers have argued that normalization to a prototypical viewpoint with a high-probability viewing distance underlies this phenomenon. Herein, we hypothesized that a prototypical viewpoint also exists in the vertical angle of view (AOV) dimension and could cause memory distortion along a scene's vertical axis. Human subjects of both sexes were recruited to test this hypothesis, and two behavioral experiments were conducted, revealing a systematic memory distortion in the vertical AOV in both the forced choice (n = 79) and free adjustment (n = 30) tasks. The regression analysis implied that the complexity information asymmetry along the scene's vertical axis and independent subjective AOV ratings from a large set of online participants (n = 1,208) could jointly predict AOV biases. Moreover, in a functional magnetic resonance imaging experiment (n = 24), we demonstrated the involvement of areas in the ventral visual pathway (V3/V4, PPA, and OPA) in AOV bias judgment. Additionally, in a magnetoencephalography experiment (n = 20), we could significantly decode the subjects' AOV bias judgments ∼140 ms after scene onset, as well as the low-level visual complexity information around a similar temporal interval. These findings suggest that AOV bias is driven by the normalization process and associated with neural activity in the early stage of scene processing.
Affiliation(s)
- Yichen Wu 吴奕忱
- School of Psychological and Cognitive Sciences, Peking University, Beijing 100871, China
- Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing 100871, China
- PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- National Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China
- Sheng Li 李晟
- School of Psychological and Cognitive Sciences, Peking University, Beijing 100871, China
- Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing 100871, China
- PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- National Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China
7
Wilson E, Ibragimov A, Proulx MJ, Tetali SD, Butler K, Jain E. Privacy-Preserving Gaze Data Streaming in Immersive Interactive Virtual Reality: Robustness and User Experience. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:2257-2268. [PMID: 38457326] [DOI: 10.1109/tvcg.2024.3372032]
Abstract
Eye tracking is routinely being incorporated into virtual reality (VR) systems. Prior research has shown that eye tracking data, if exposed, can be used for re-identification attacks [14]. The state of our knowledge about currently existing privacy mechanisms is limited to privacy-utility trade-off curves based on data-centric metrics of utility, such as prediction error, and black-box threat models. We propose that for interactive VR applications, it is essential to consider user-centric notions of utility and a variety of threat models. We develop a methodology to evaluate real-time privacy mechanisms for interactive VR applications that incorporates subjective user experience and task performance metrics. We evaluate selected privacy mechanisms using this methodology and find that re-identification accuracy can be decreased to as low as 14% while maintaining a high usability score and reasonable task performance. Finally, we elucidate three threat scenarios (black-box, black-box with exemplars, and white-box) and assess how well the different privacy mechanisms hold up to these adversarial scenarios. This work advances the state of the art in VR privacy by providing a methodology for end-to-end assessment of the risk of re-identification attacks and potential mitigating solutions.
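A common baseline in this line of work is to perturb streamed gaze samples with additive noise, trading re-identification risk against utility. The sketch below illustrates only that generic idea; it is not one of the mechanisms evaluated in the paper, and the noise level and utility proxy are arbitrary choices for illustration.

```python
import numpy as np

def noisy_gaze_stream(gaze_xy, sigma_deg=1.0, seed=0):
    """Perturb a stream of gaze samples with additive Gaussian noise.

    gaze_xy: array of shape (samples, 2) with gaze angles in degrees.
    sigma_deg: noise standard deviation; larger values lower
               re-identification risk but also degrade utility.
    """
    rng = np.random.default_rng(seed)
    return gaze_xy + rng.normal(scale=sigma_deg, size=gaze_xy.shape)

# Utility proxy: how far the released samples drift from the true gaze
true_gaze = np.cumsum(np.random.default_rng(1).normal(size=(500, 2)), axis=0) * 0.1
released = noisy_gaze_stream(true_gaze, sigma_deg=2.0)
print("mean angular offset (deg):",
      np.linalg.norm(released - true_gaze, axis=1).mean())
```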
8
Javerliat C, Villenave S, Raimbaud P, Lavoue G. PLUME: Record, Replay, Analyze and Share User Behavior in 6DoF XR Experiences. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:2087-2097. [PMID: 38437111] [DOI: 10.1109/tvcg.2024.3372107]
Abstract
From education to medicine to entertainment, a wide range of industrial and academic fields now utilize eXtended Reality (XR) technologies. This diversity and growing use are boosting research and leading to an increasing number of XR experiments involving human subjects. The main aim of these studies is to understand the user experience in the broadest sense, such as the user's cognitive and emotional states. Behavioral data collected during XR experiments, such as user movements, gaze, actions, and physiological signals, constitute precious assets for analyzing and understanding the user experience. While they help overcome the intrinsic flaws of explicit data such as post-experiment questionnaires, the required acquisition and analysis tools are costly and challenging to develop, especially for 6DoF (six degrees of freedom) XR experiments. Moreover, there is no common format for XR behavioral data, which restricts data sharing and thus hinders wide usage across the community, the replicability of studies, and the constitution of large datasets or meta-analyses. In this context, we present PLUME, an open-source software toolbox (PLUME Recorder, PLUME Viewer, PLUME Python) that allows for the exhaustive recording of XR behavioral data (including synchronous physiological signals), their offline interactive replay and analysis with a standalone application, and their easy sharing thanks to a compact and interoperable data format. We believe that PLUME can greatly benefit the scientific community by making behavioral and physiological data available to the greatest number of researchers, contributing to the reproducibility and replicability of XR user studies, enabling the creation of large datasets, and leading to a deeper understanding of user experience.
9
Bernal-Berdun E, Vallejo M, Sun Q, Serrano A, Gutierrez D. Modeling the Impact of Head-Body Rotations on Audio-Visual Spatial Perception for Virtual Reality Applications. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:2624-2632. [PMID: 38446650] [DOI: 10.1109/tvcg.2024.3372112]
Abstract
Humans perceive the world by integrating multimodal sensory feedback, including visual and auditory stimuli, which holds true in virtual reality (VR) environments. Proper synchronization of these stimuli is crucial for perceiving a coherent and immersive VR experience. In this work, we focus on the interplay between audio and vision during localization tasks involving natural head-body rotations. We explore the impact of audio-visual offsets and rotation velocities on users' directional localization acuity for various viewing modes. Using psychometric functions, we model perceptual disparities between visual and auditory cues and determine offset detection thresholds. Our findings reveal that target localization accuracy is affected by perceptual audio-visual disparities during head-body rotations, but remains consistent in the absence of stimuli-head relative motion. We then showcase the effectiveness of our approach in predicting and enhancing users' localization accuracy within realistic VR gaming applications. To provide additional support for our findings, we implement a natural VR game wherein we apply a compensatory audio-visual offset derived from our measured psychometric functions. As a result, we demonstrate a substantial improvement of up to 40% in participants' target localization accuracy. We additionally provide guidelines for content creation to ensure coherent and seamless VR experiences.
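Offset detection thresholds of this kind are commonly obtained by fitting a psychometric function (for example, a cumulative Gaussian) to the proportion of "offset detected" responses as a function of audio-visual offset. The sketch below shows such a fit with SciPy; the data points and the 75% threshold criterion are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(offset_ms, mu, sigma):
    """Cumulative-Gaussian psychometric function: P(detect | offset)."""
    return norm.cdf(offset_ms, loc=mu, scale=sigma)

# Illustrative data: audio-visual offsets (ms) and detection proportions
offsets = np.array([0, 20, 40, 60, 80, 120, 160])
p_detect = np.array([0.05, 0.10, 0.30, 0.55, 0.75, 0.92, 0.98])

(mu, sigma), _ = curve_fit(psychometric, offsets, p_detect, p0=[60, 30])
threshold_75 = norm.ppf(0.75, loc=mu, scale=sigma)  # offset detected 75% of the time
print(f"fitted mu={mu:.1f} ms, sigma={sigma:.1f} ms, 75% threshold={threshold_75:.1f} ms")
```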
10
Haskins AJ, Mentch J, Van Wicklin C, Choi YB, Robertson CE. Brief Report: Differences in Naturalistic Attention to Real-World Scenes in Adolescents with 16p.11.2 Deletion. J Autism Dev Disord 2024; 54:1078-1087. [PMID: 36512194] [DOI: 10.1007/s10803-022-05850-2]
Abstract
Sensory differences are nearly universal in autism, but their genetic origins are poorly understood. Here, we tested how individuals with an autism-linked genotype, 16p.11.2 deletion ("16p"), attend to visual information in immersive, real-world photospheres. We monitored participants' (N = 44) gaze while they actively explored 360° scenes via head-mounted virtual reality. We modeled the visually salient and semantically meaningful information in scenes and quantified the relative bottom-up vs. top-down influences on attentional deployment. We found that, compared to typically developing control (TD) participants, 16p participants' attention was less dominantly predicted by semantically meaningful scene regions relative to visually salient regions. These results suggest that a reduction in top-down relative to bottom-up attention characterizes how individuals with 16p.11.2 deletions engage with naturalistic visual environments.
Affiliation(s)
- Amanda J Haskins
- Department of Psychological & Brain Sciences, Dartmouth College, 3 Maynard Street, Hanover, NH, 03755, USA.
- Jeff Mentch
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Boston, MA, 02115, USA
- McGovern Institute for Brain Research, MIT, Cambridge, MA, 02139, USA
- Yeo Bi Choi
- Department of Psychological & Brain Sciences, Dartmouth College, 3 Maynard Street, Hanover, NH, 03755, USA
- Caroline E Robertson
- Department of Psychological & Brain Sciences, Dartmouth College, 3 Maynard Street, Hanover, NH, 03755, USA
11
Martin D, Fandos A, Masia B, Serrano A. SAL3D: a model for saliency prediction in 3D meshes. THE VISUAL COMPUTER 2024; 40:7761-7771. [PMID: 39525941] [PMCID: PMC11541373] [DOI: 10.1007/s00371-023-03206-0]
Abstract
Advances in virtual and augmented reality have increased the demand for immersive and engaging 3D experiences. To create such experiences, it is crucial to understand visual attention in 3D environments, which is typically modeled by means of saliency maps. While attention in 2D images and traditional media has been widely studied, there is still much to explore in 3D settings. In this work, we propose a deep learning-based model for predicting saliency when viewing 3D objects, which is a first step toward understanding and predicting attention in 3D environments. Previous approaches rely solely on low-level geometric cues or on unnatural viewing conditions; in contrast, our model is trained on a dataset of real viewing data that we captured ourselves, which reflects actual human viewing behavior. Our approach outperforms existing state-of-the-art methods and closely approximates the ground-truth data. Our results demonstrate the effectiveness of our approach in predicting attention on 3D objects, which can pave the way for creating more immersive and engaging 3D experiences.
Affiliation(s)
- Belen Masia
- Universidad de Zaragoza, I3A, Zaragoza, Spain
- Ana Serrano
- Universidad de Zaragoza, I3A, Zaragoza, Spain
12
Bernal-Berdun E, Martin D, Malpica S, Perez PJ, Gutierrez D, Masia B, Serrano A. D-SAV360: A Dataset of Gaze Scanpaths on 360° Ambisonic Videos. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:4350-4360. [PMID: 37782595] [DOI: 10.1109/tvcg.2023.3320237]
Abstract
Understanding human visual behavior within virtual reality environments is crucial to fully leverage their potential. While previous research has provided rich visual data from human observers, existing gaze datasets often suffer from the absence of multimodal stimuli. Moreover, no dataset has yet gathered eye gaze trajectories (i.e., scanpaths) for dynamic content with directional ambisonic sound, which is a critical aspect of sound perception by humans. To address this gap, we introduce D-SAV360, a dataset of 4,609 head and eye scanpaths for 360° videos with first-order ambisonics. This dataset enables a more comprehensive study of multimodal interaction on visual behavior in virtual reality environments. We analyze our collected scanpaths from a total of 87 participants viewing 85 different videos and show that various factors such as viewing mode, content type, and gender significantly impact eye movement statistics. We demonstrate the potential of D-SAV360 as a benchmarking resource for state-of-the-art attention prediction models and discuss its possible applications in further research. By providing a comprehensive dataset of eye movement data for dynamic, multimodal virtual environments, our work can facilitate future investigations of visual behavior and attention in virtual reality.
13
Sidenmark L, Prummer F, Newn J, Gellersen H. Comparing Gaze, Head and Controller Selection of Dynamically Revealed Targets in Head-Mounted Displays. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:4740-4750. [PMID: 37782604] [DOI: 10.1109/tvcg.2023.3320235]
Abstract
This paper presents a head-mounted virtual reality study that compared gaze, head, and controller pointing for selection of dynamically revealed targets. Existing studies on head-mounted 3D interaction have focused on pointing and selection tasks where all targets are visible to the user. Our study compared the effects of screen width (field of view), target amplitude and width, and prior knowledge of target location on modality performance. Results show that gaze and controller pointing are significantly faster than head pointing and that increased screen width only positively impacts performance up to a certain point. We further investigated the applicability of existing pointing models. Our analysis confirmed the suitability of previously proposed two-component models for all modalities while uncovering differences for gaze at known and unknown target positions. Our findings provide new empirical evidence for understanding input with gaze, head, and controller and are significant for applications that extend around the user.
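The two-component pointing models mentioned here separate the contributions of target amplitude and target width to selection time, in contrast to a single-term Fitts-style law. The sketch below fits both forms by least squares on synthetic data; the data and the specific Welford-style two-part formulation are illustrative assumptions, not the models or data from the paper.

```python
import numpy as np

# Synthetic selection times (s) for combinations of amplitude A and width W (deg)
A = np.array([10, 10, 20, 20, 40, 40, 10, 20, 40], dtype=float)
W = np.array([1.5, 3.0, 1.5, 3.0, 1.5, 3.0, 6.0, 6.0, 6.0])
MT = 0.35 + 0.12 * np.log2(A) - 0.09 * np.log2(W) \
     + np.random.default_rng(0).normal(0, 0.02, A.size)

# One-component (Fitts-style): MT = a + b * log2(A/W + 1)
X1 = np.column_stack([np.ones_like(A), np.log2(A / W + 1)])
coef1, res1, *_ = np.linalg.lstsq(X1, MT, rcond=None)

# Two-component (Welford-style): MT = a + b1 * log2(A) - b2 * log2(W)
X2 = np.column_stack([np.ones_like(A), np.log2(A), -np.log2(W)])
coef2, res2, *_ = np.linalg.lstsq(X2, MT, rcond=None)

print("one-component coefficients:", np.round(coef1, 3), "SSE:", res1)
print("two-component coefficients:", np.round(coef2, 3), "SSE:", res2)
```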
14
Malpica S, Martin D, Serrano A, Gutierrez D, Masia B. Task-Dependent Visual Behavior in Immersive Environments: A Comparative Study of Free Exploration, Memory and Visual Search. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:4417-4425. [PMID: 37788210] [DOI: 10.1109/tvcg.2023.3320259]
Abstract
Visual behavior depends on both bottom-up mechanisms, where gaze is driven by the visual conspicuity of the stimuli, and top-down mechanisms, guiding attention towards relevant areas based on the task or goal of the viewer. While this is well-known, visual attention models often focus on bottom-up mechanisms. Existing works have analyzed the effect of high-level cognitive tasks like memory or visual search on visual behavior; however, they have often done so with different stimuli, methodology, metrics and participants, which makes drawing conclusions and comparisons between tasks particularly difficult. In this work we present a systematic study of how different cognitive tasks affect visual behavior in a novel within-subjects design scheme. Participants performed free exploration, memory and visual search tasks in three different scenes while their eye and head movements were being recorded. We found significant, consistent differences between tasks in the distributions of fixations, saccades and head movements. Our findings can provide insights for practitioners and content creators designing task-oriented immersive applications.
15
Sendjasni A, Larabi MC. Attention-Aware Patch-Based CNN for Blind 360-Degree Image Quality Assessment. SENSORS (BASEL, SWITZERLAND) 2023; 23:8676. [PMID: 37960376] [PMCID: PMC10647793] [DOI: 10.3390/s23218676]
Abstract
An attention-aware patch-based deep-learning model for blind 360-degree image quality assessment (360-IQA) is introduced in this paper. It employs spatial attention mechanisms to focus on spatially significant features, in addition to short skip connections to align them. A long skip connection is adopted to allow features from the earliest layers to be used at the final level. Patches are sampled on the sphere so that they correspond to the viewports displayed to the user in head-mounted displays. The sampling incorporates the relevance of patches by considering (i) the exploration behavior and (ii) a latitude-based selection. An adaptive strategy is applied to improve the pooling of local patch qualities into a global image quality score. This includes an outlier rejection step that relies on the standard deviation of the obtained scores to account for their agreement, as well as a saliency-based weighting of the scores according to their visual significance. Experiments on available 360-IQA databases show that our model outperforms the state of the art in terms of accuracy and generalization ability. This holds for general deep-learning-based models, multichannel models, and natural-scene-statistics-based models. Furthermore, when compared to multichannel models, the computational complexity is significantly reduced. Finally, an extensive ablation study gives insights into the efficacy of each component of the proposed model.
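The adaptive pooling step described above (rejecting outlier patch scores via their standard deviation, then weighting the remaining scores by saliency) can be summarized with a short sketch. The rejection rule (two standard deviations) and the example inputs below are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def pool_patch_scores(scores, saliency, k=2.0):
    """Pool per-patch quality scores into a global image score.

    scores:   per-patch predicted quality values.
    saliency: per-patch visual-saliency weights (same length as scores).
    k:        patches whose score deviates more than k standard deviations
              from the mean are treated as outliers and rejected.
    """
    scores, saliency = np.asarray(scores, float), np.asarray(saliency, float)
    keep = np.abs(scores - scores.mean()) <= k * scores.std()
    w = saliency[keep] / saliency[keep].sum()
    return float(np.sum(w * scores[keep]))

# Example: one very low patch score is rejected, the rest are saliency-weighted
print(pool_patch_scores([3.1, 3.4, 3.3, 3.2, 3.0, 3.3, 0.5, 3.2],
                        [0.15, 0.2, 0.1, 0.1, 0.1, 0.15, 0.05, 0.15]))
```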
16
Schirm J, Gómez-Vargas AR, Perusquía-Hernández M, Skarbez RT, Isoyama N, Uchiyama H, Kiyokawa K. Identification of Language-Induced Mental Load from Eye Behaviors in Virtual Reality. SENSORS (BASEL, SWITZERLAND) 2023; 23:6667. [PMID: 37571449] [PMCID: PMC10422404] [DOI: 10.3390/s23156667]
Abstract
Experiences of virtual reality (VR) can easily break if the method of evaluating subjective user states is intrusive. Behavioral measures are increasingly used to avoid this problem. One such measure is eye tracking, which recently became more standard in VR and is often used for content-dependent analyses. This research is an endeavor to utilize content-independent eye metrics, such as pupil size and blinks, for identifying mental load in VR users. We generated mental load independently from visuals through auditory stimuli. We also defined and measured a new eye metric, focus offset, which seeks to measure the phenomenon of "staring into the distance" without focusing on a specific surface. In the experiment, VR-experienced participants listened to two native and two foreign language stimuli inside a virtual phone booth. The results show that with increasing mental load, relative pupil size on average increased 0.512 SDs (0.118 mm), with 57% reduced variance. To a lesser extent, mental load led to fewer fixations, less voluntary gazing at distracting content, and a larger focus offset as if looking through surfaces (about 0.343 SDs, 5.10 cm). These results are in agreement with previous studies. Overall, we encourage further research on content-independent eye metrics, and we hope that hardware and algorithms will be developed in the future to further increase tracking stability.
Affiliation(s)
- Johannes Schirm
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma 630-0192, Japan
- Andrés Roberto Gómez-Vargas
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma 630-0192, Japan
- Monica Perusquía-Hernández
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma 630-0192, Japan
- Richard T. Skarbez
- Department of Computer Science and Information Technology, School of Computing, Engineering and Mathematical Sciences, La Trobe University, Melbourne Campus, Melbourne, VIC 3086, Australia
- Naoya Isoyama
- Faculty of Social Information Studies, Otsuma Women’s University, Tokyo 102-8357, Japan
- Hideaki Uchiyama
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma 630-0192, Japan
- Kiyoshi Kiyokawa
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma 630-0192, Japan
17
Nguyen A, Yan Z. Enhancing 360 Video Streaming through Salient Content in Head-Mounted Displays. SENSORS (BASEL, SWITZERLAND) 2023; 23:4016. [PMID: 37112356] [PMCID: PMC10143939] [DOI: 10.3390/s23084016]
Abstract
Predicting where users will look inside head-mounted displays (HMDs) and fetching only the relevant content is an effective approach for streaming bulky 360 videos over bandwidth-constrained networks. Despite previous efforts, anticipating users' fast and sudden head movements is still difficult because there is a lack of clear understanding of the unique visual attention in 360 videos that dictates the users' head movement in HMDs. This in turn reduces the effectiveness of streaming systems and degrades the users' Quality of Experience. To address this issue, we propose to extract salient cues unique in the 360 video content to capture the attentive behavior of HMD users. Empowered by the newly discovered saliency features, we devise a head-movement prediction algorithm to accurately predict users' head orientations in the near future. A 360 video streaming framework that takes full advantage of the head movement predictor is proposed to enhance the quality of delivered 360 videos. Practical trace-driven results show that the proposed saliency-based 360 video streaming system reduces the stall duration by 65% and the stall count by 46%, while saving 31% more bandwidth than state-of-the-art approaches.
18
Hu Z, Bulling A, Li S, Wang G. EHTask: Recognizing User Tasks From Eye and Head Movements in Immersive Virtual Reality. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:1992-2004. [PMID: 34962869] [DOI: 10.1109/tvcg.2021.3138902]
Abstract
Understanding human visual attention in immersive virtual reality (VR) is crucial for many important applications, including gaze prediction, gaze guidance, and gaze-contingent rendering. However, previous works on visual attention analysis typically only explored one specific VR task and paid less attention to the differences between different tasks. Moreover, existing task recognition methods typically focused on 2D viewing conditions and only explored the effectiveness of human eye movements. We first collect eye and head movements of 30 participants performing four tasks, i.e., Free viewing, Visual search, Saliency, and Track, in 15 360-degree VR videos. Using this dataset, we analyze the patterns of human eye and head movements and reveal significant differences across different tasks in terms of fixation duration, saccade amplitude, head rotation velocity, and eye-head coordination. We then propose EHTask, a novel learning-based method that employs eye and head movements to recognize user tasks in VR. We show that our method significantly outperforms the state-of-the-art methods derived from 2D viewing conditions both on our dataset (accuracy of 84.4% versus 62.8%) and on a real-world dataset (61.9% versus 44.1%). As such, our work provides meaningful insights into human visual attention under different VR tasks and guides future work on recognizing user tasks in VR.
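The analysis above rests on descriptive statistics of eye and head behavior (fixation duration, saccade amplitude, head rotation velocity, eye-head coordination). A sketch of extracting such window-level features from synchronized gaze and head angle streams is given below; the feature definitions, thresholds, and sampling rate are assumptions for illustration, not the EHTask implementation.

```python
import numpy as np

def window_features(eye_deg, head_deg, fs=100.0, saccade_thresh=80.0):
    """Summarize one window of eye-in-head and head angles (N x 2, degrees).

    Returns simple descriptors often used for task recognition:
    mean eye speed, proportion of saccade-like samples, mean head rotation
    velocity, and eye-head correlation (horizontal component).
    """
    eye_vel = np.linalg.norm(np.diff(eye_deg, axis=0), axis=1) * fs   # deg/s
    head_vel = np.linalg.norm(np.diff(head_deg, axis=0), axis=1) * fs
    eh_corr = np.corrcoef(eye_deg[:, 0], head_deg[:, 0])[0, 1]
    return {
        "mean_eye_speed": eye_vel.mean(),
        "saccade_ratio": (eye_vel > saccade_thresh).mean(),
        "mean_head_speed": head_vel.mean(),
        "eye_head_corr": eh_corr,
    }

rng = np.random.default_rng(0)
print(window_features(rng.normal(0, 2, (200, 2)), rng.normal(0, 5, (200, 2))))
```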
19
Martin D, Sun X, Gutierrez D, Masia B. A Study of Change Blindness in Immersive Environments. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; PP:2446-2455. [PMID: 37027712] [DOI: 10.1109/tvcg.2023.3247102]
Abstract
Humans are poor at detecting certain changes in a scene, a phenomenon known as change blindness. Although the exact reasons for this effect are not yet completely understood, there is a consensus that it is due to our constrained attention and memory capacity: we create our own mental, structured representation of what surrounds us, but this representation is limited and imprecise. Previous efforts investigating this effect have focused on 2D images; however, there are significant differences regarding attention and memory between 2D images and the viewing conditions of daily life. In this work, we present a systematic study of change blindness using immersive 3D environments, which offer more natural viewing conditions closer to our daily visual experience. We devise two experiments; in the first, we focus on analyzing how different change properties (namely type, distance, complexity, and field of view) may affect change blindness. We then further explore its relation to the capacity of our visual working memory and conduct a second experiment analyzing the influence of the number of changes. Besides gaining a deeper understanding of the change blindness effect, our results may be leveraged in several VR applications such as redirected walking, games, or even studies on saliency or attention prediction.
20
Diaz-Guerra F, Jimenez-Molina A. Continuous Prediction of Web User Visual Attention on Short Span Windows Based on Gaze Data Analytics. SENSORS (BASEL, SWITZERLAND) 2023; 23:2294. [PMID: 36850892] [PMCID: PMC9960063] [DOI: 10.3390/s23042294]
Abstract
Understanding users' visual attention on websites is paramount to enhancing the browsing experience, for example by providing emergent information or dynamically adapting Web interfaces. Existing approaches to these challenges are generally based on computing salience maps of static Web interfaces, while websites are becoming increasingly dynamic and interactive. This paper proposes a method, and provides a proof of concept, to predict a user's visual attention on specific regions of a website with dynamic components. The method predicts the regions of a user's visual attention without requiring a constant recording of the current layout of the website, but rather by knowing the structure it presented in a past period. To address this challenge, the concept of visit intention is introduced in this paper, defined as the probability that a user, while browsing, will fixate their gaze on a specific region of the website in the next period. Our approach uses the gaze patterns of a population that browsed a specific website, captured via an eye-tracker device, to aid personalized prediction models built with individual visual kinetics features. We show experimentally that it is possible to conduct such a prediction through multilabel classification models using a small number of users, obtaining an average area under the curve of 84.3% and an average accuracy of 79%. Furthermore, the user's visual kinetics features are consistently selected in every set of a cross-validation evaluation.
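The visit-intention prediction described above is framed as multilabel classification: each website region is a label indicating whether the user will fixate it in the next time window. A minimal sketch of that framing with scikit-learn follows; the synthetic features, classifier choice, and evaluation split are illustrative assumptions, not the models evaluated in the paper.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_samples, n_features, n_regions = 400, 12, 5

# X: visual-kinetics features per short time window; Y: region fixated next (0/1)
X = rng.normal(size=(n_samples, n_features))
Y = (X[:, :n_regions] + rng.normal(scale=1.0, size=(n_samples, n_regions)) > 0.5).astype(int)

clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X[:300], Y[:300])

# Average per-region AUC on held-out windows
probs = np.column_stack([p[:, 1] for p in clf.predict_proba(X[300:])])
aucs = [roc_auc_score(Y[300:, k], probs[:, k]) for k in range(n_regions)]
print("mean AUC:", float(np.mean(aucs)))
```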
Affiliation(s)
- Angel Jimenez-Molina
- Department of Industrial Engineering, University of Chile, Santiago 8370456, Chile
- Engineering Complex Systems Institute, Santiago 8370398, Chile
21
Bischof WF, Anderson NC, Kingstone A. Eye and head movements while encoding and recognizing panoramic scenes in virtual reality. PLoS One 2023; 18:e0282030. [PMID: 36800398] [PMCID: PMC9937482] [DOI: 10.1371/journal.pone.0282030]
Abstract
One approach to studying the recognition of scenes and objects relies on the comparison of eye movement patterns during encoding and recognition. Past studies typically analyzed the perception of flat stimuli of limited extent presented on a computer monitor that did not require head movements. In contrast, participants in the present study saw omnidirectional panoramic scenes through an immersive 3D virtual reality viewer, and they could move their head freely to inspect different parts of the visual scenes. This allowed us to examine how unconstrained observers use their head and eyes to encode and recognize visual scenes. By studying head and eye movement within a fully immersive environment, and applying cross-recurrence analysis, we found that eye movements are strongly influenced by the content of the visual environment, as are head movements, though to a much lesser degree. Moreover, we found that the head and eyes are linked, with the head supporting, and by and large mirroring, the movements of the eyes, consistent with the notion that the head operates to support the acquisition of visual information by the eyes.
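The eye-head coupling reported here was quantified with cross-recurrence analysis. The sketch below computes a basic cross-recurrence rate between two angular time series, i.e., how often the eye direction at one time falls close to the head direction at another time; the radius parameter and the reduction to a single recurrence rate are simplifying assumptions relative to the full analysis.

```python
import numpy as np

def cross_recurrence_rate(eye_deg, head_deg, radius=5.0):
    """Fraction of (i, j) time pairs where eye(i) and head(j) directions
    are within `radius` degrees of each other (both series: N x 2, degrees)."""
    d = np.linalg.norm(eye_deg[:, None, :] - head_deg[None, :, :], axis=-1)
    return (d <= radius).mean()

rng = np.random.default_rng(0)
head = np.cumsum(rng.normal(0, 0.5, (300, 2)), axis=0)   # slow head drift
eye = head + rng.normal(0, 2.0, head.shape)              # eyes roughly follow the head
print("cross-recurrence rate:", round(cross_recurrence_rate(eye, head), 3))
```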
Affiliation(s)
- Walter F. Bischof
- Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- Nicola C. Anderson
- Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- Alan Kingstone
- Department of Psychology, University of British Columbia, Vancouver, BC, Canada
22
Aizenman AM, Koulieris GA, Gibaldi A, Sehgal V, Levi DM, Banks MS. The Statistics of Eye Movements and Binocular Disparities during VR Gaming: Implications for Headset Design. ACM TRANSACTIONS ON GRAPHICS 2023; 42:7. [PMID: 37122317] [PMCID: PMC10139447] [DOI: 10.1145/3549529]
Abstract
The human visual system evolved in environments with statistical regularities. Binocular vision is adapted to these regularities such that depth perception and eye movements are more precise, faster, and more comfortable in environments consistent with them. We measured the statistics of eye movements and binocular disparities in virtual-reality (VR) gaming environments and found that they are quite different from those in the natural environment. Fixation distance and direction are more restricted in VR, and fixation distance is farther. The pattern of disparity across the visual field is less regular in VR and does not conform to a prominent property of naturally occurring disparities. From this we predict that double vision is more likely in VR than in the natural environment. We also determined the optimal screen distance to minimize discomfort due to the vergence-accommodation conflict, and the optimal nasal-temporal positioning of head-mounted display (HMD) screens to maximize the binocular field of view. Finally, in a user study we investigated how VR content affects comfort and performance. Content that is more consistent with the statistics of the natural world yields less discomfort than content that is not. Furthermore, consistent content yields slightly better performance than inconsistent content.
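The discomfort analysis here revolves around the vergence-accommodation conflict, commonly expressed as the difference in diopters between the headset's focal distance and the fixated depth. The sketch below computes this conflict for a set of fixation distances and picks the focal distance under one simple criterion (minimizing the worst-case conflict); the criterion and the candidate distances are illustrative assumptions, not the paper's measured optimum.

```python
import numpy as np

def va_conflict(screen_dist_m, fixation_dist_m):
    """Vergence-accommodation conflict in diopters (1/m)."""
    return np.abs(1.0 / screen_dist_m - 1.0 / fixation_dist_m)

# Illustrative distribution of fixation distances during VR gaming (meters)
fixations = np.array([0.5, 0.8, 1.0, 1.5, 2.0, 4.0, 10.0])

# Choose the focal distance whose worst-case conflict over fixations is smallest
candidates = np.linspace(0.5, 3.0, 251)
worst = np.array([va_conflict(c, fixations).max() for c in candidates])
best = candidates[np.argmin(worst)]
print(f"best focal distance ~ {best:.2f} m, worst-case conflict {worst.min():.2f} D")
```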
23
Cannavò A, Castiello A, Pratticò FG, Mazali T, Lamberti F. Immersive movies: the effect of point of view on narrative engagement. AI & SOCIETY 2023. [DOI: 10.1007/s00146-022-01622-9]
Abstract
Cinematic virtual reality (CVR) offers filmmakers a wide range of possibilities to explore new techniques regarding movie scripting, shooting and editing. Despite the many experiments performed so far with both live-action and computer-generated movies, only a few studies have focused on analyzing how the various techniques actually affect the viewers' experience. As in traditional cinema, a key step for CVR screenwriters and directors is to choose from which perspective the viewers will see the scene, the so-called point of view (POV). The aim of this paper is to understand to what extent watching an immersive movie from a specific POV could impact narrative engagement (NE), i.e., the viewers' sensation of being immersed in the movie environment and being connected with its characters and story. Two POVs that are typically used in CVR, i.e., first-person perspective (1-PP) and external perspective (EP), are investigated through a user study in which both objective and subjective metrics were collected. The user study was carried out by leveraging two live-action 360° short films with distinct scripts. The results suggest that the 1-PP experience could be more pleasant than the EP one in terms of overall NE and narrative presence, or even for all the NE dimensions if the potential of that POV is specifically exploited.
24
Abstract
This chapter explores the current state of the art in eye tracking within 3D virtual environments. It begins with the motivation for eye tracking in Virtual Reality (VR) in psychological research, followed by descriptions of the hardware and software used for presenting virtual environments as well as for tracking eye and head movements in VR. This is followed by a detailed description of an example project on eye and head tracking while observers look at 360° panoramic scenes. The example is illustrated with descriptions of the user interface and program excerpts to show the measurement of eye and head movements in VR. The chapter continues with fundamentals of data analysis, in particular methods for the determination of fixations and saccades when viewing spherical displays. We then extend these methodological considerations to determining the spatial and temporal coordination of the eyes and head in VR perception. The chapter concludes with a discussion of outstanding problems and future directions for conducting eye- and head-tracking research in VR. We hope that this chapter will serve as a primer for those intending to implement VR eye tracking in their own research.
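For the fixation and saccade determination mentioned above, a common starting point when gaze is recorded as 3D direction vectors is a velocity-threshold (I-VT) rule applied to the angular velocity between successive samples. The sketch below illustrates that generic idea; the velocity threshold and the 90 Hz sampling rate are assumptions, not values taken from the chapter.

```python
import numpy as np

def angular_velocity(directions, fs=90.0):
    """Angular velocity (deg/s) between successive unit gaze direction vectors."""
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    cos_ang = np.clip(np.sum(d[:-1] * d[1:], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_ang)) * fs

def label_saccades(directions, fs=90.0, thresh_deg_s=100.0):
    """I-VT style labeling: True where sample-to-sample velocity exceeds the threshold."""
    return angular_velocity(directions, fs) > thresh_deg_s

# Synthetic trace: slow drift around straight ahead plus one abrupt gaze shift
rng = np.random.default_rng(0)
gaze = np.array([0.0, 0.0, 1.0]) + np.cumsum(rng.normal(0, 0.002, (500, 3)), axis=0)
gaze[250:] += np.array([0.3, 0.0, 0.0])
print("saccade samples:", int(label_saccades(gaze).sum()))
```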
25
Haskins AJ, Mentch J, Botch TL, Garcia BD, Burrows AL, Robertson CE. Reduced social attention in autism is magnified by perceptual load in naturalistic environments. Autism Res 2022; 15:2310-2323. [PMID: 36207799] [PMCID: PMC10092155] [DOI: 10.1002/aur.2829]
Abstract
Individuals with autism spectrum conditions (ASC) describe differences in both social cognition and sensory processing, but little is known about the causal relationship between these disparate functional domains. In the present study, we sought to understand how a core characteristic of autism-reduced social attention-is impacted by the complex multisensory signals present in real-world environments. We tested the hypothesis that reductions in social attention associated with autism would be magnified by increasing perceptual load (e.g., motion, multisensory cues). Adult participants (N = 40; 19 ASC) explored a diverse set of 360° real-world scenes in a naturalistic, active viewing paradigm (immersive virtual reality + eyetracking). Across three conditions, we systematically varied perceptual load while holding the social and semantic information present in each scene constant. We demonstrate that reduced social attention is not a static signature of the autistic phenotype. Rather, group differences in social attention emerged with increasing perceptual load in naturalistic environments, and the susceptibility of social attention to perceptual load predicted continuous measures of autistic traits across groups. Crucially, this pattern was specific to the social domain: we did not observe differential impacts of perceptual load on attention directed toward nonsocial semantic (i.e., object, place) information or low-level fixation behavior (i.e., overall fixation frequency or duration). This study provides a direct link between social and sensory processing in autism. Moreover, reduced social attention may be an inaccurate characterization of autism. Instead, our results suggest that social attention in autism is better explained by "social vulnerability," particularly to the perceptual load of real-world environments.
Affiliation(s)
- Amanda J. Haskins
- Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire, USA
- Jeff Mentch
- Speech and Hearing Bioscience and Technology, Harvard University, Boston, Massachusetts, USA
- McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, USA
- Thomas L. Botch
- Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire, USA
- Brenda D. Garcia
- Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire, USA
- Alexandra L. Burrows
- Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire, USA
- Caroline E. Robertson
- Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire, USA
26
Eye movement behavior in a real-world virtual reality task reveals ADHD in children. Sci Rep 2022; 12:20308. [PMID: 36434040] [PMCID: PMC9700686] [DOI: 10.1038/s41598-022-24552-4]
Abstract
Eye movements and other rich data obtained in virtual reality (VR) environments resembling situations where symptoms are manifested could help in the objective detection of various symptoms in clinical conditions. In the present study, 37 children with attention deficit hyperactivity disorder (ADHD) and 36 typically developing controls (9-13 years old) played a lifelike prospective memory game using a head-mounted display with a built-in 90 Hz eye tracker. Eye movement patterns showed prominent group differences, but these were dispersed across the full performance time rather than associated with specific events or stimulus features. A support vector machine classifier trained on eye movement data showed excellent discrimination ability with an area under the curve of 0.92, which was significantly higher than for task performance measures or for eye movements obtained in a visual search task. We demonstrated that a naturalistic VR task combined with eye tracking allows accurate prediction of attention deficits, paving the way for precision diagnostics.
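The classification result reported above follows a standard pattern: per-child eye movement features feed a support vector machine evaluated with the area under the ROC curve. The sketch below reproduces only that generic pipeline (cross-validated SVM plus AUC) on synthetic features; it does not use the study's features or data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_children, n_features = 73, 20                 # 37 ADHD + 36 controls, as in the study
y = np.array([1] * 37 + [0] * 36)
X = rng.normal(size=(n_children, n_features)) + 0.8 * y[:, None]  # synthetic group difference

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
scores = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
print("cross-validated AUC:", round(roc_auc_score(y, scores), 3))
```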
27
Wang Y, Li H, Jiang Q. Dynamically attentive viewport sequence for no-reference quality assessment of omnidirectional images. Front Neurosci 2022; 16:1022041. [PMID: 36507332] [PMCID: PMC9727405] [DOI: 10.3389/fnins.2022.1022041]
Abstract
Omnidirectional images (ODIs) have drawn great attention in virtual reality (VR) due to their capability of providing an immersive experience to users. However, ODIs are usually subject to various quality degradations during different processing stages. Thus, the quality assessment of ODIs is of critical importance to the VR community. The quality assessment of ODIs is quite different from that of traditional 2D images. Existing IQA methods focus on extracting features from spherical scenes while ignoring how humans actually view an ODI, i.e., by continuously browsing it through an HMD; they therefore fail to characterize the temporal dynamics of the browsing process in terms of the temporal order of viewports. In this article, we resort to the law of gravity to detect the dynamically attentive regions of humans when viewing ODIs, and we propose a novel no-reference (NR) ODI quality evaluation method by making efforts on two aspects: the construction of a Dynamically Attentive Viewport Sequence (DAVS) from ODIs and the extraction of Quality-Aware Features (QAFs) from the DAVS. The construction of the DAVS aims to build a sequence of viewports that are likely to be explored by viewers, based on the prediction of the visual scanpath when viewers freely explore the ODI via an HMD within the exploration time. A DAVS that contains only global motion can then be obtained by sampling a series of viewports from the ODI along the predicted visual scanpath. The subsequent quality evaluation of ODIs is performed based on the DAVS alone. The extraction of QAFs aims to obtain effective feature representations that are highly discriminative in terms of perceived distortion and visual quality. Finally, we adopt a regression model to map the extracted QAFs to a single predicted quality score. Experimental results on two datasets demonstrate that the proposed method is able to deliver state-of-the-art performance.
Affiliation(s)
- Yuhong Wang
- School of Information Science and Engineering, Ningbo University, Ningbo, China
- College of Science and Technology, Ningbo University, Ningbo, China
- Hong Li
- College of Science and Technology, Ningbo University, Ningbo, China
- Qiuping Jiang
- School of Information Science and Engineering, Ningbo University, Ningbo, China
28
Yang L, Xu M, Guo Y, Deng X, Gao F, Guan Z. Hierarchical Bayesian LSTM for Head Trajectory Prediction on Omnidirectional Images. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:7563-7580. [PMID: 34596534] [DOI: 10.1109/tpami.2021.3117019]
Abstract
When viewing omnidirectional images (ODIs), viewers can access different viewports via head movement (HM), which sequentially forms head trajectories in spatial-temporal domain. Thus, head trajectories play a key role in modeling human attention on ODIs. In this paper, we establish a large-scale dataset collecting 21,600 head trajectories on 1,080 ODIs. By mining our dataset, we find two important factors influencing head trajectories, i.e., temporal dependency and subject-specific variance. Accordingly, we propose a novel approach integrating hierarchical Bayesian inference into long short-term memory (LSTM) network for head trajectory prediction on ODIs, which is called HiBayes-LSTM. In HiBayes-LSTM, we develop a mechanism of Future Intention Estimation (FIE), which captures the temporal correlations from previous, current and estimated future information, for predicting viewport transition. Additionally, a training scheme called Hierarchical Bayesian inference (HBI) is developed for modeling inter-subject uncertainty in HiBayes-LSTM. For HBI, we introduce a joint Gaussian distribution in a hierarchy, to approximate the posterior distribution over network weights. By sampling subject-specific weights from the approximated posterior distribution, our HiBayes-LSTM approach can yield diverse viewport transition among different subjects and obtain multiple head trajectories. Extensive experiments validate that our HiBayes-LSTM approach significantly outperforms 9 state-of-the-art approaches for trajectory prediction on ODIs, and then it is successfully applied to predict saliency on ODIs.
Collapse
|
29
|
Xu Y, Zhang Z, Gao S. Spherical DNNs and Their Applications in 360° Images and Videos. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:7235-7252. [PMID: 34314354 DOI: 10.1109/tpami.2021.3100259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Spherical images or videos, as typical non-Euclidean data, are usually stored as 2D panoramas obtained through an equirectangular projection, which is neither equal-area nor conformal. The distortion caused by the projection limits the performance of vanilla Deep Neural Networks (DNNs) designed for traditional Euclidean data. In this paper, we design a novel Spherical DNN to deal with the distortion caused by the equirectangular projection. Specifically, we customize a set of components, including a spherical convolution, a spherical pooling, a spherical ConvLSTM cell, and a spherical MSE loss, as replacements for their counterparts in vanilla DNNs for spherical data. The core idea is to make the conventional operations, which behave identically across feature patches in vanilla DNNs, adapt to the distortion caused by the varying sampling rate across different feature patches. We demonstrate the effectiveness of our Spherical DNNs for saliency detection and gaze estimation in 360° videos. For saliency detection, we take the temporal coherence of an observer's viewing process into consideration and propose to use a Spherical U-Net and a Spherical ConvLSTM to predict the saliency maps for each frame sequentially. For gaze prediction, we propose to leverage a Spherical Encoder Module to extract spatial panoramic features, which we combine with the gaze trajectory feature extracted by an LSTM for future gaze prediction. To facilitate the study of 360° video saliency detection, we further construct a large-scale 360° video saliency detection dataset that consists of 104 360° videos viewed by 20+ human subjects. Comprehensive experiments validate the effectiveness of our proposed Spherical DNNs for 360° handwritten digit classification and sport classification, saliency detection, and gaze tracking in 360° videos. We also visualize the regions contributing to the classification decisions of our Spherical DNNs via the Grad-CAM technique, and the results show that our Spherical DNNs consistently leverage reasonable and important regions for decision making, regardless of the large distortions. All code and the dataset are available at https://github.com/svip-lab/SphericalDNNs.
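The following sketch illustrates the general flavor of a latitude-aware operation on an equirectangular feature map: the horizontal kernel footprint is widened by 1/cos(latitude) so that it covers a roughly constant solid angle. This is a simplified stand-in written for clarity, not the paper's spherical convolution, pooling, ConvLSTM, or loss.

```python
"""Sketch: a latitude-aware convolution on an equirectangular feature map."""
import numpy as np

def spherical_conv2d(feat, kernel):
    """feat: (H, W) equirectangular map; kernel: (k, k) weights."""
    H, W = feat.shape
    k = kernel.shape[0]
    r = k // 2
    out = np.zeros_like(feat)
    lats = (0.5 - (np.arange(H) + 0.5) / H) * np.pi          # row-centre latitudes
    for i in range(H):
        stretch = 1.0 / max(np.cos(lats[i]), 1e-3)           # wider footprint near poles
        for j in range(W):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii = np.clip(i + di, 0, H - 1)
                    jj = int(round(j + dj * stretch)) % W    # wrap around in longitude
                    acc += kernel[di + r, dj + r] * feat[ii, jj]
            out[i, j] = acc
    return out

feat = np.random.rand(32, 64)
kernel = np.ones((3, 3)) / 9.0
print(spherical_conv2d(feat, kernel).shape)                  # (32, 64)
```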
Collapse
|
30
|
Li M, Li J, Gu S, Wu F, Zhang D. End-to-End Optimized 360° Image Compression. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:6267-6281. [PMID: 36166564 DOI: 10.1109/tip.2022.3208429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
The 360° image, which offers a 360-degree view of the world, is widely used in virtual reality and has drawn increasing attention. In 360° image compression, the spherical image is first transformed into a planar image with a projection such as the equirectangular projection (ERP) and then saved with existing codecs. ERP images, which represent different circles of latitude with the same number of pixels, suffer from an unbalanced sampling problem, making planar compression methods inefficient, especially deep neural network (DNN)-based codecs. To tackle this problem, we introduce a latitude-adaptive coding scheme for DNNs that allocates a variable number of codes to different regions according to their latitude on the sphere. Specifically, taking both the number of codes allocated to each region and their entropy into consideration, we introduce a flexible regional adaptive rate loss for region-wise rate control. Latitude-adaptive constraints are then introduced to prevent spending too many codes on over-sampled regions. Furthermore, we introduce a viewport-based distortion loss by calculating the average distortion over a set of viewports. We optimize and test our model on a large 360° dataset containing 19,790 images collected from the Internet. The experimental results demonstrate the superiority of the proposed latitude-adaptive coding scheme. Overall, our model outperforms the existing image compression standards, including JPEG, JPEG2000, HEVC Intra Coding, and VVC Intra Coding, and saves around 15% of bits compared to the baseline learned image compression model for planar images.
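A toy version of the latitude-adaptive allocation idea is sketched below: each horizontal band of the ERP image receives a code budget proportional to its true spherical area, and a simple penalty discourages bands from exceeding that share. The learned codec, entropy model, and viewport-based distortion loss are not reproduced; the band count and budget are made-up parameters.

```python
"""Sketch: latitude-adaptive code allocation for ERP images."""
import numpy as np

def latitude_rate_weights(n_bands):
    """Relative code budget for horizontal bands of an ERP image."""
    lat_edges = np.linspace(np.pi / 2, -np.pi / 2, n_bands + 1)
    # spherical area of each band is proportional to |sin(lat_top) - sin(lat_bottom)|
    area = np.abs(np.diff(np.sin(lat_edges)))
    return area / area.sum()

def regional_rate_loss(bits_per_band, total_budget, weights):
    """Penalise bands that exceed their area-proportional share of the budget."""
    target = weights * total_budget
    over = np.maximum(bits_per_band - target, 0.0)
    return over.sum() / total_budget

w = latitude_rate_weights(8)
print(np.round(w, 3))                        # polar bands get far fewer codes
print(regional_rate_loss(np.full(8, 0.125) * 1e5, 1e5, w))
```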
Collapse
|
31
|
Rondon MFR, Sassatelli L, Aparicio-Pardo R, Precioso F. TRACK: A New Method From a Re-Examination of Deep Architectures for Head Motion Prediction in 360° Videos. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:5681-5699. [PMID: 33819149 DOI: 10.1109/tpami.2021.3070520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
We consider predicting the user's head motion in 360° videos with only 2 modalities: the user's past positions and the video content (without knowing other users' traces). We make two main contributions. First, we re-examine existing deep-learning approaches for this problem and identify hidden flaws through a thorough root-cause analysis. Second, from the results of this analysis, we design a new proposal that establishes state-of-the-art performance. Re-assessing the existing methods that use both modalities, we obtain the surprising result that they all perform worse than baselines using the user's trajectory only. A root-cause analysis of the metrics, datasets, and neural architectures shows in particular that (i) the content can inform the prediction for horizons longer than 2 to 3 s (existing methods consider shorter horizons), and that (ii) to compete with the baselines, it is necessary to have a recurrent unit dedicated to processing the positions, but this is not sufficient. From a re-examination of the problem supported by the concept of Structural-RNN, we design a new deep neural architecture, named TRACK. TRACK achieves state-of-the-art performance on all considered datasets and prediction horizons, outperforming competitors by up to 20 percent on focus-type videos for horizons of 2-5 seconds. The entire framework (code and datasets) is online and received an ACM reproducibility badge: https://gitlab.com/miguelfromeror/head-motion-prediction.
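The sketch below shows the general shape of the architectural lesson reported here, a recurrent unit dedicated to the position modality fused with a content branch. It is a hedged illustration only, not the TRACK architecture or its Structural-RNN formulation; the feature dimensions and fusion head are arbitrary.

```python
"""Sketch: two-branch head-motion predictor (position RNN + content features)."""
import torch
import torch.nn as nn

class TwoBranchPredictor(nn.Module):
    def __init__(self, sal_dim=256, hidden=128):
        super().__init__()
        self.pos_rnn = nn.GRU(input_size=3, hidden_size=hidden, batch_first=True)
        self.sal_rnn = nn.GRU(input_size=sal_dim, hidden_size=hidden, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 3))

    def forward(self, past_pos, sal_feats):
        """past_pos: (B, T, 3) unit vectors; sal_feats: (B, T, sal_dim)."""
        hp, _ = self.pos_rnn(past_pos)              # dedicated position branch
        hs, _ = self.sal_rnn(sal_feats)             # content (saliency) branch
        delta = self.fuse(torch.cat([hp[:, -1], hs[:, -1]], dim=-1))
        nxt = past_pos[:, -1] + delta
        return nxt / nxt.norm(dim=-1, keepdim=True)  # keep prediction on the unit sphere

m = TwoBranchPredictor()
print(m(torch.randn(2, 8, 3), torch.randn(2, 8, 256)).shape)  # torch.Size([2, 3])
```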
Collapse
|
32
|
Decoupled Dynamic Group Equivariant Filter for Saliency Prediction on Omnidirectional Image. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
33
|
Towards mesh saliency in 6 degrees of freedom. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
34
|
Sui X, Ma K, Yao Y, Fang Y. Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:3022-3034. [PMID: 33434131 DOI: 10.1109/tvcg.2021.3050888] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Omnidirectional images (also referred to as static 360° panoramas) impose viewing conditions much different from those of regular 2D images. How humans perceive image distortions in immersive virtual reality (VR) environments is an important problem that has received little attention. We argue that, apart from the distorted panorama itself, two types of VR viewing conditions are crucial in determining the viewing behaviors of users and the perceived quality of the panorama: the starting point and the exploration time. We first carry out a psychophysical experiment to investigate the interplay among the VR viewing conditions, the user viewing behaviors, and the perceived quality of 360° images. Then, we provide a thorough analysis of the collected human data, leading to several interesting findings. Moreover, we propose a computational framework for objective quality assessment of 360° images, embodying viewing conditions and behaviors in a delightful way. Specifically, we first transform an omnidirectional image into several video representations using different user viewing behaviors under different viewing conditions. We then leverage advanced 2D full-reference video quality models to compute the perceived quality. We construct a set of specific quality measures within the proposed framework and demonstrate their promise on three VR quality databases.
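As a toy version of the "moving camera video" idea, the sketch below crops a viewport per scanpath sample (a crude equirectangular crop rather than a proper gnomonic rendering), scores each frame with PSNR against the reference, and averages over time. The actual framework plugs in full 2D video quality models and behavior-driven scanpaths; the crop size and scanpath here are assumptions.

```python
"""Sketch: scoring an omnidirectional image as a 'moving camera video'."""
import numpy as np

def crop_viewport(erp, lon, lat, w=256, h=256):
    H, W = erp.shape
    cx = int(((lon / (2 * np.pi) + 0.5) % 1.0) * W)
    cy = int(np.clip((0.5 - lat / np.pi) * H, h // 2, H - h // 2))
    cols = np.arange(cx - w // 2, cx + w // 2) % W          # wrap in longitude
    return erp[cy - h // 2: cy + h // 2][:, cols]

def psnr(a, b):
    mse = np.mean((a - b) ** 2) + 1e-12
    return 10 * np.log10(1.0 / mse)                         # signals in [0, 1]

def moving_camera_quality(ref_erp, dist_erp, scanpath):
    scores = [psnr(crop_viewport(ref_erp, lon, lat), crop_viewport(dist_erp, lon, lat))
              for lon, lat in scanpath]
    return float(np.mean(scores))

ref = np.random.rand(512, 1024)
dist = np.clip(ref + 0.02 * np.random.randn(512, 1024), 0, 1)
path = [(l, 0.0) for l in np.linspace(-np.pi, np.pi, 10, endpoint=False)]
print(moving_camera_quality(ref, dist, path))
```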
Collapse
|
35
|
Zhu D, Chen Y, Zhao D, Zhu Y, Zhou Q, Zhai G, Yang X. Multiscale Brain-Like Neural Network for Saliency Prediction on Omnidirectional Images. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2021.3052526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Affiliation(s)
- Dandan Zhu
- MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
| | - Yongqing Chen
- School of Information and Communication Engineering, Hainan University, Haikou, China
| | - Defang Zhao
- School of Software Engineering, Tongji University, Shanghai, China
| | - Yucheng Zhu
- Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Qiangqiang Zhou
- School of Software, Jiangxi Normal University, Nanchang, China
| | - Guangtao Zhai
- Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Xiaokang Yang
- MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
| |
Collapse
|
36
|
Martin D, Serrano A, Bergman AW, Wetzstein G, Masia B. ScanGAN360: A Generative Model of Realistic Scanpaths for 360° Images. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:2003-2013. [PMID: 35167469 DOI: 10.1109/tvcg.2022.3150502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Understanding and modeling the dynamics of human gaze behavior in 360° environments is crucial for creating, improving, and developing emerging virtual reality applications. However, recruiting human observers and acquiring enough data to analyze their behavior when exploring virtual environments requires complex hardware and software setups, and can be time-consuming. Being able to generate virtual observers can help overcome this limitation, and thus stands as an open problem in this medium. Particularly, generative adversarial approaches could alleviate this challenge by generating a large number of scanpaths that reproduce human behavior when observing new scenes, essentially mimicking virtual observers. However, existing methods for scanpath generation do not adequately predict realistic scanpaths for 360° images. We present ScanGAN360, a new generative adversarial approach to address this problem. We propose a novel loss function based on dynamic time warping and tailor our network to the specifics of 360° images. The quality of our generated scanpaths outperforms competing approaches by a large margin, and is almost on par with the human baseline. ScanGAN360 allows fast simulation of large numbers of virtual observers, whose behavior mimics real users, enabling a better understanding of gaze behavior, facilitating experimentation, and aiding novel applications in virtual reality and beyond.
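Since the proposed loss is built on dynamic time warping, the sketch below shows a plain DTW distance between two spherical scanpaths with great-circle local cost. ScanGAN360 itself uses a differentiable (soft) DTW term inside a GAN, which is not reproduced here.

```python
"""Sketch: dynamic-time-warping distance between two spherical scanpaths."""
import numpy as np

def great_circle(p, q):
    """p, q: (lon, lat) in radians; returns angular distance in radians."""
    dlon = p[0] - q[0]
    return np.arccos(np.clip(np.sin(p[1]) * np.sin(q[1]) +
                             np.cos(p[1]) * np.cos(q[1]) * np.cos(dlon), -1.0, 1.0))

def dtw(path_a, path_b):
    """Classic O(n*m) dynamic-programming DTW."""
    n, m = len(path_a), len(path_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = great_circle(path_a[i - 1], path_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = [(t, 0.0) for t in np.linspace(0, np.pi / 2, 20)]
b = [(t, 0.1) for t in np.linspace(0, np.pi / 2, 25)]
print(dtw(a, b))
```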
Collapse
|
37
|
Chen S, Duinkharjav B, Sun X, Wei LY, Petrangeli S, Echevarria J, Silva C, Sun Q. Instant Reality: Gaze-Contingent Perceptual Optimization for 3D Virtual Reality Streaming. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:2157-2167. [PMID: 35148266 DOI: 10.1109/tvcg.2022.3150522] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Media streaming, with an edge-cloud setting, has been adopted for a variety of applications such as entertainment, visualization, and design. Unlike video/audio streaming, where the content is usually consumed passively, virtual reality applications require 3D assets stored on the edge to facilitate frequent edge-side interactions such as object manipulation and viewpoint movement. Compared to audio and video streaming, 3D asset streaming often involves larger data sizes yet requires lower latency to ensure sufficient rendering quality and resolution for perceptual comfort. Thus, streaming 3D assets poses remarkably greater challenges than streaming audio/video, and existing solutions often suffer from long loading times or limited quality. To address this challenge, we propose a perceptually optimized progressive 3D streaming method for spatial quality and temporal consistency in immersive interactions. On the cloud side, our main idea is to estimate perceptual importance in 2D image space based on user gaze behaviors, including where users are looking and how their eyes move. The estimated importance is then mapped to 3D object space to schedule the streaming priorities for edge-side rendering. Since this computational pipeline could be heavy, we also develop a simple neural network to accelerate the cloud-side scheduling process. We evaluate our method via subjective studies and objective analysis under varying network conditions (from 3G to 5G) and edge devices (HMDs and traditional displays), and demonstrate better visual quality and temporal consistency than alternative solutions.
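A heavily simplified version of the gaze-to-priority mapping is sketched below: assets are ranked by angular proximity to the current gaze direction, with the weighting flattened during fast (saccadic) eye movement. The sigma value, the saccade threshold, and the ranking rule are illustrative assumptions, not the paper's learned scheduler.

```python
"""Sketch: ranking 3D assets for streaming by gaze-derived importance."""
import numpy as np

def stream_priorities(gaze_dir, eye_speed_deg_s, object_dirs, sigma_deg=10.0):
    """gaze_dir, object_dirs: unit view-space vectors; returns indices, best first."""
    ang = np.degrees(np.arccos(np.clip(object_dirs @ gaze_dir, -1.0, 1.0)))
    importance = np.exp(-(ang / sigma_deg) ** 2)
    # during saccades (high eye speed) acuity drops, so flatten the weighting
    if eye_speed_deg_s > 180.0:
        importance = np.full_like(importance, importance.mean())
    return np.argsort(-importance)

objs = np.random.randn(6, 3)
objs /= np.linalg.norm(objs, axis=1, keepdims=True)
print(stream_priorities(np.array([0.0, 0.0, 1.0]), 20.0, objs))
```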
Collapse
|
38
|
Xu M, Jiang L, Li C, Wang Z, Tao X. Viewport-Based CNN: A Multi-Task Approach for Assessing 360° Video Quality. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:2198-2215. [PMID: 33017289 DOI: 10.1109/tpami.2020.3028509] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
For 360° video, existing visual quality assessment (VQA) approaches are designed based on either whole frames or cropped patches, ignoring the fact that subjects can only access viewports. When watching 360° video, subjects select viewports through head movement (HM) and then fixate on attractive regions within the viewports through eye movement (EM). Therefore, this paper proposes a two-stage multi-task approach for viewport-based VQA on 360° video. Specifically, we first establish a large-scale VQA dataset of 360° video, called VQA-ODV, which collects the subjective quality scores and the HM and EM data on 600 video sequences. By mining our dataset, we find that the subjective quality of 360° video is related to camera motion, viewport positions, and saliency within viewports. Accordingly, we propose a viewport-based convolutional neural network (V-CNN) approach for VQA on 360° video, which has a novel multi-task architecture composed of a viewport proposal network (VP-net) and a viewport quality network (VQ-net). The VP-net handles the auxiliary tasks of camera motion detection and viewport proposal, while the VQ-net accomplishes the auxiliary task of viewport saliency prediction and the main task of VQA. The experiments validate that our V-CNN approach significantly advances state-of-the-art VQA performance on 360° video and is also effective in the three auxiliary tasks.
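Only the final fusion step of a viewport-based pipeline is sketched below: per-viewport quality scores are weighted by predicted viewport saliency and pooled over frames. The VP-net and VQ-net themselves are not reproduced; the score ranges are placeholders.

```python
"""Sketch: aggregating per-viewport scores into a 360-degree video quality score."""
import numpy as np

def fuse_viewport_scores(viewport_quality, viewport_saliency):
    """Both arrays shaped (frames, viewports); returns one sequence-level score."""
    w = viewport_saliency / (viewport_saliency.sum(axis=1, keepdims=True) + 1e-12)
    per_frame = (w * viewport_quality).sum(axis=1)   # saliency-weighted per frame
    return float(per_frame.mean())                   # simple temporal pooling

q = np.random.uniform(40, 80, size=(30, 5))          # placeholder quality scores
s = np.random.rand(30, 5)                            # placeholder saliency weights
print(fuse_viewport_scores(q, s))
```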
Collapse
|
39
|
Malpica S, Masia B, Herman L, Wetzstein G, Eagleman DM, Gutierrez D, Bylinskii Z, Sun Q. Larger visual changes compress time: The inverted effect of asemantic visual features on interval time perception. PLoS One 2022; 17:e0265591. [PMID: 35316292 PMCID: PMC8939824 DOI: 10.1371/journal.pone.0265591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2021] [Accepted: 03/04/2022] [Indexed: 12/01/2022] Open
Abstract
Time perception is fluid and affected by manipulations to visual inputs. Previous literature shows that changes to low-level visual properties alter time judgments at the millisecond level. At longer intervals, in the span of seconds and minutes, high-level cognitive effects (e.g., emotions, memories) elicited by visual inputs affect time perception, but these effects are confounded with the semantic information in those inputs and are therefore challenging to measure and control. In this work, we investigate the effect of asemantic visual properties (pure visual features devoid of emotional or semantic value) on interval time perception. Our experiments were conducted with binary and production tasks on both conventional and head-mounted displays, testing the effects of four different visual features (spatial luminance contrast, temporal frequency, field of view, and visual complexity). Our results reveal a consistent pattern: larger visual changes all shorten perceived time in intervals of up to 3 min, remarkably contrary to their effect on millisecond-level perception. Our findings may help alter participants' time perception, which can have broad real-world implications.
Collapse
Affiliation(s)
| | | | - Laura Herman
- Adobe, Inc., Mountain View, CA, United States of America
| | - Gordon Wetzstein
- Department of Electrical Engineering, Stanford University, Stanford, CA, United States of America
| | - David M. Eagleman
- Department of Psychiatry, Stanford University School of Medicine, Stanford, CA, United States of America
| | | | - Zoya Bylinskii
- Adobe, Inc., Mountain View, CA, United States of America
| | - Qi Sun
- Adobe, Inc., Mountain View, CA, United States of America
- New York University, New York, NY, United States of America
| |
Collapse
|
40
|
David EJ, Lebranchu P, Perreira Da Silva M, Le Callet P. What are the visuo-motor tendencies of omnidirectional scene free-viewing in virtual reality? J Vis 2022; 22:12. [PMID: 35323868 PMCID: PMC8963670 DOI: 10.1167/jov.22.4.12] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2021] [Accepted: 02/08/2022] [Indexed: 11/24/2022] Open
Abstract
Central and peripheral vision during visual tasks have been extensively studied on two-dimensional screens, highlighting their perceptual and functional disparities. This study has two objectives: replicating on-screen gaze-contingent experiments that remove the central or peripheral field of view in virtual reality, and identifying visuo-motor biases specific to the exploration of 360° scenes with a wide field of view. Our results are useful for vision modelling, with applications in gaze position prediction (e.g., content compression and streaming). We ask how previous on-screen findings translate to conditions where observers can use their head to explore stimuli. We implemented a gaze-contingent paradigm to simulate loss of vision in virtual reality, in which participants could freely view omnidirectional natural scenes. This protocol allows the simulation of vision loss with an extended field of view (>80°) and the study of the head's contributions to visual attention. The time course of visuo-motor variables in our pure free-viewing task reveals long fixations and short saccades during the first seconds of exploration, contrary to the literature on visual tasks guided by instructions. We show that the effect of vision loss is reflected primarily in eye movements, in a manner consistent with the two-dimensional screen literature. We hypothesize that head movements mainly serve to explore the scenes during free viewing; the presence of masks did not significantly impact head scanning behaviours. We present new fixational and saccadic visuo-motor tendencies in a 360° context that we hope will help in the creation of gaze prediction models dedicated to virtual reality.
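A minimal sketch of the gaze-contingent masking used to simulate vision loss is given below, assuming a rendered viewport of known field of view and a gaze sample in pixel coordinates. The actual experiment applies such masks at the rendering stage in VR with a calibrated eye tracker.

```python
"""Sketch: gaze-contingent masking of central or peripheral vision."""
import numpy as np

def gaze_contingent_mask(frame, gaze_px, radius_deg, fov_deg, mode="central"):
    """frame: (H, W); gaze_px: (x, y) pixels; mask inside or outside the radius."""
    H, W = frame.shape[:2]
    deg_per_px = fov_deg / W                               # assume square pixels
    ys, xs = np.mgrid[0:H, 0:W]
    ecc = np.hypot(xs - gaze_px[0], ys - gaze_px[1]) * deg_per_px
    hide = ecc <= radius_deg if mode == "central" else ecc > radius_deg
    out = frame.copy()
    out[hide] = 0.5                                        # neutral grey fill
    return out

frame = np.random.rand(600, 800)
print(gaze_contingent_mask(frame, (400, 300), 10.0, 110.0, "peripheral").shape)
```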
Collapse
Affiliation(s)
- Erwan Joël David
- Department of Psychology, Goethe-Universität, Frankfurt, Germany
| | - Pierre Lebranchu
- LS2N UMR CNRS 6004, University of Nantes and Nantes University Hospital, Nantes, France
| | | | - Patrick Le Callet
- LS2N UMR CNRS 6004, University of Nantes, Nantes, France
- http://pagesperso.ls2n.fr/~lecallet-p/index.html
| |
Collapse
|
41
|
Wearable Technology and Visual Reality Application for Healthcare Systems. ELECTRONICS 2022. [DOI: 10.3390/electronics11020178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
This study developed a virtual reality interactive game with smart wireless wearable technology for the healthcare of elderly users. The proposed wearable system uses its intelligent and wireless features to collect electromyography signals and upload them to a cloud database for further analysis. The electromyography signals are then analyzed for the users' muscle fatigue, health, strength, and other physiological conditions. The average slope maximum So and Chan (ASM S & C) algorithm is integrated into the proposed system to effectively detect the number of electromyography peaks, with an accuracy as high as 95%. The proposed system can improve the health of elderly users and motivate them to acquire new knowledge of science and technology.
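As a generic stand-in for the peak-counting step, the sketch below rectifies and smooths a simulated EMG trace and counts envelope peaks with SciPy. It is not the ASM S & C algorithm; the sampling rate, window, and thresholds are assumptions.

```python
"""Sketch: counting peaks in a surface-EMG envelope (generic, not ASM S & C)."""
import numpy as np
from scipy.signal import find_peaks

def emg_peak_count(emg, fs=1000, win_ms=100, min_height=0.1):
    rectified = np.abs(emg - emg.mean())
    win = max(int(fs * win_ms / 1000), 1)
    envelope = np.convolve(rectified, np.ones(win) / win, mode="same")
    peaks, _ = find_peaks(envelope, height=min_height, distance=fs // 4)
    return len(peaks)

# simulate three contraction bursts on a noisy baseline
t = np.arange(0, 5, 1 / 1000)
emg = 0.05 * np.random.randn(t.size)
for c in (1.0, 2.5, 4.0):
    emg += 0.5 * np.exp(-((t - c) ** 2) / 0.01) * np.sin(2 * np.pi * 80 * t)
print(emg_peak_count(emg))                     # expect about 3
```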
Collapse
|
42
|
Ren X, Duan H, Min X, Zhu Y, Shen W, Wang L, Shi F, Fan L, Yang X, Zhai G. Where are the Children with Autism Looking in Reality? ARTIF INTELL 2022. [DOI: 10.1007/978-3-031-20500-2_48] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
|
43
|
|
44
|
Zhu D, Chen Y, Min X, Zhu Y, Zhang G, Zhou Q, Zhai G, Yang X. RANSP: Ranking attention network for saliency prediction on omnidirectional images. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.06.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
45
|
Saliency prediction on omnidirectional images with attention-aware feature fusion network. APPL INTELL 2021. [DOI: 10.1007/s10489-020-01857-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
46
|
Masia B, Camon J, Gutierrez D, Serrano A. Influence of Directional Sound Cues on Users' Exploration Across 360° Movie Cuts. IEEE COMPUTER GRAPHICS AND APPLICATIONS 2021; 41:64-75. [PMID: 33705310 DOI: 10.1109/mcg.2021.3064688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Virtual reality (VR) is a powerful medium for 360° storytelling, yet content creators are still in the process of developing cinematographic rules for effectively communicating stories in VR. Traditional cinematography has relied for over a century on well-established editing techniques, and one of the most recurrent resources is the cinematic cut, which allows content creators to seamlessly transition between scenes. One fundamental assumption of these techniques is that the content creator can control the camera; however, this assumption breaks down in VR: users are free to explore 360° around them. Recent works have studied the effectiveness of different cuts in 360° content, but the effect of directional sound cues while experiencing these cuts has been less explored. In this work, we provide the first systematic analysis of the influence of directional sound cues on users' behavior across 360° movie cuts, providing insights that can have an impact on deriving conventions for VR storytelling.
Collapse
|
47
|
Du R, Varshney A, Potel M. Saliency Computation for Virtual Cinematography in 360° Videos. IEEE COMPUTER GRAPHICS AND APPLICATIONS 2021; 41:99-106. [PMID: 34264820 DOI: 10.1109/mcg.2021.3080320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Recent advances in virtual reality cameras have contributed to a phenomenal growth of 360° videos. Estimating regions likely to attract user attention is critical for efficiently streaming and rendering 360° videos. In this article, we present a simple, novel, GPU-driven pipeline for saliency computation and virtual cinematography in 360° videos using spherical harmonics (SH). We efficiently compute 360° video saliency through the spectral residual of the SH coefficients between multiple bands at over 60 FPS for 4K-resolution videos. Further, our interactive computation of spherical saliency can be used for saliency-guided virtual cinematography in 360° videos.
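A low-resolution sketch of spherical-harmonic band-residual saliency is given below: SH coefficients of an equirectangular map are estimated by numerical integration, and saliency is taken as the difference between reconstructions at two band limits. The GPU pipeline and the paper's exact band selection are not reproduced; the band limits here are arbitrary.

```python
"""Sketch: spherical-harmonic band-residual saliency on a small ERP image."""
import numpy as np
from scipy.special import sph_harm

def sh_band_reconstructions(img, l_low=2, l_high=6):
    H, W = img.shape
    polar = (np.arange(H) + 0.5) / H * np.pi               # colatitude in [0, pi]
    azim = (np.arange(W) + 0.5) / W * 2 * np.pi            # azimuth in [0, 2*pi]
    PH, AZ = np.meshgrid(polar, azim, indexing="ij")
    dA = np.sin(PH) * (np.pi / H) * (2 * np.pi / W)        # pixel solid angle
    rec_lo = np.zeros_like(img, dtype=complex)
    rec_hi = np.zeros_like(img, dtype=complex)
    for l in range(l_high + 1):
        for m in range(-l, l + 1):
            Y = sph_harm(m, l, AZ, PH)                     # scipy order: (m, l, azimuth, polar)
            coef = np.sum(img * np.conj(Y) * dA)           # numerical SH analysis
            rec_hi += coef * Y
            if l <= l_low:
                rec_lo += coef * Y
    return rec_lo.real, rec_hi.real

img = np.random.rand(32, 64)
lo, hi = sh_band_reconstructions(img)
saliency = np.abs(hi - lo)                                 # band-residual map
print(saliency.shape, float(saliency.max()))
```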
Collapse
|
48
|
Thatte J, Girod B. Real-World Virtual Reality With Head-Motion Parallax. IEEE COMPUTER GRAPHICS AND APPLICATIONS 2021; 41:29-39. [PMID: 34010127 DOI: 10.1109/mcg.2021.3082041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Most of the real-world virtual reality (VR) content available today is captured and rendered from a fixed vantage point. The visual-vestibular conflict arising from the lack of head-motion parallax degrades the feeling of presence in the virtual environment and has been shown to induce nausea and visual discomfort. We present an end-to-end framework for VR with head-motion parallax for real-world scenes. To capture both horizontally and vertically separated perspectives, we use a camera rig with two vertically stacked rings of outward-facing cameras. The data from the rig are processed offline and stored in a compact intermediate representation, which is used to render novel views for a head-mounted display in accordance with the viewer's head movements. We compare two promising intermediate representations, Stacked OmniStereo and Layered Depth Panoramas, and evaluate them in terms of objective image quality metrics and the occurrence of disocclusion holes in synthesized novel views.
Collapse
|
49
|
Hepperle D, Purps CF, Deuchler J, Wölfel M. Aspects of visual avatar appearance: self-representation, display type, and uncanny valley. THE VISUAL COMPUTER 2021; 38:1227-1244. [PMID: 34177022 PMCID: PMC8211459 DOI: 10.1007/s00371-021-02151-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Accepted: 04/23/2021] [Indexed: 06/13/2023]
Abstract
The visual representation of human-like entities in virtual worlds is becoming a very important aspect as virtual reality becomes more and more "social". The resemblance of a character's visual representation to a real person, the emotional response to it, and the expectations it raises have been a topic of discussion for several decades and have been debated by scientists from different disciplines. But as with any new technology, the findings may need to be reevaluated and adapted to new modalities. In this context, we make two contributions that may have implications for how avatars should be represented in social virtual reality applications. First, we determine how default and customized characters on current social virtual reality platforms appear in terms of human likeness, eeriness, and likability, and whether there is a clear resemblance to a given person. It can be concluded that the investigated platforms vary strongly in their representation of avatars; common to all is that a clear resemblance does not exist. Second, we show that the uncanny valley effect is also present in head-mounted displays and, compared to 2D monitors, is even more pronounced.
Collapse
Affiliation(s)
- Daniel Hepperle
- Faculty of Computer Science and Business Information Systems, Karlsruhe University of Applied Sciences, Karlsruhe, Germany
- Faculty of Business, Economics and Social Sciences, University of Hohenheim, Stuttgart, Germany
| | - Christian Felix Purps
- Faculty of Computer Science and Business Information Systems, Karlsruhe University of Applied Sciences, Karlsruhe, Germany
| | - Jonas Deuchler
- Faculty of Computer Science and Business Information Systems, Karlsruhe University of Applied Sciences, Karlsruhe, Germany
| | - Matthias Wölfel
- Faculty of Computer Science and Business Information Systems, Karlsruhe University of Applied Sciences, Karlsruhe, Germany
- Faculty of Business, Economics and Social Sciences, University of Hohenheim, Stuttgart, Germany
| |
Collapse
|
50
|
Hu Z, Bulling A, Li S, Wang G. FixationNet: Forecasting Eye Fixations in Task-Oriented Virtual Environments. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:2681-2690. [PMID: 33750707 DOI: 10.1109/tvcg.2021.3067779] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Human visual attention in immersive virtual reality (VR) is key for many important applications, such as content design, gaze-contingent rendering, or gaze-based interaction. However, prior works typically focused on free-viewing conditions that have limited relevance for practical applications. We first collect eye tracking data of 27 participants performing a visual search task in four immersive VR environments. Based on this dataset, we provide a comprehensive analysis of the collected data and reveal correlations between users' eye fixations and other factors, i.e., users' historical gaze positions, task-related objects, saliency information of the VR content, and users' head rotation velocities. Based on this analysis, we propose FixationNet, a novel learning-based model to forecast users' eye fixations in the near future in VR. We evaluate the performance of our model in free-viewing and task-oriented settings and show that it outperforms the state of the art by a large margin of 19.8% (from a mean error of 2.93° to 2.35°) in free-viewing and of 15.1% (from 2.05° to 1.74°) in task-oriented situations. As such, our work provides new insights into task-oriented attention in virtual environments and guides future work on this important topic in VR research.
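The sketch below fuses the cue types identified in this analysis (past gaze positions, task-object location, saliency features, and head rotation velocity) into a small forecasting model. It is a hedged illustration of the inputs, not the FixationNet architecture; all dimensions are placeholders.

```python
"""Sketch: forecasting the next fixation from gaze history and context features."""
import torch
import torch.nn as nn

class FixationForecaster(nn.Module):
    def __init__(self, sal_dim=64, hidden=64):
        super().__init__()
        self.gaze_rnn = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.ctx = nn.Linear(2 + sal_dim + 3, hidden)   # task object + saliency + head velocity
        self.head = nn.Linear(2 * hidden, 2)            # next fixation (x, y) in degrees

    def forward(self, gaze_hist, task_obj, sal_feat, head_vel):
        """gaze_hist: (B, T, 2); task_obj: (B, 2); sal_feat: (B, 64); head_vel: (B, 3)."""
        hg, _ = self.gaze_rnn(gaze_hist)                # encode past gaze positions
        hc = torch.relu(self.ctx(torch.cat([task_obj, sal_feat, head_vel], dim=-1)))
        return self.head(torch.cat([hg[:, -1], hc], dim=-1))

m = FixationForecaster()
out = m(torch.randn(8, 40, 2), torch.randn(8, 2), torch.randn(8, 64), torch.randn(8, 3))
print(out.shape)                                        # torch.Size([8, 2])
```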
Collapse
|