1. Afzal S, Ghani S, Hittawe MM, Rashid SF, Knio OM, Hadwiger M, Hoteit I. Visualization and Visual Analytics Approaches for Image and Video Datasets: A Survey. ACM Transactions on Interactive Intelligent Systems 2023. DOI: 10.1145/3576935.
Abstract
Image and video data analysis has become an increasingly important research area with applications in different domains such as security surveillance, healthcare, augmented and virtual reality, video and image editing, activity analysis and recognition, synthetic content generation, distance education, telepresence, remote sensing, sports analytics, art, non-photorealistic rendering, search engines, and social media. Recent advances in Artificial Intelligence (AI) and particularly deep learning have sparked new research challenges and led to significant advancements, especially in image and video analysis. These advancements have also resulted in significant research and development in other areas such as visualization and visual analytics, and have created new opportunities for future lines of research. In this survey paper, we present the current state of the art at the intersection of visualization and visual analytics, and image and video data analysis. We categorize the visualization papers included in our survey based on different taxonomies used in visualization and visual analytics research. We review these papers in terms of task requirements, tools, datasets, and application areas. We also discuss insights based on our survey results, trends and patterns, the current focus of visualization research, and opportunities for future research.
Affiliation(s)
- Shehzad Afzal, King Abdullah University of Science & Technology, Saudi Arabia
- Sohaib Ghani, King Abdullah University of Science & Technology, Saudi Arabia
- Omar M Knio, King Abdullah University of Science & Technology, Saudi Arabia
- Markus Hadwiger, King Abdullah University of Science & Technology, Saudi Arabia
- Ibrahim Hoteit, King Abdullah University of Science & Technology, Saudi Arabia
2. Deng Z, Weng D, Liu S, Tian Y, Xu M, Wu Y. A survey of urban visual analytics: Advances and future directions. Computational Visual Media 2022; 9:3-39. DOI: 10.1007/s41095-022-0275-7. PMID: 36277276; PMCID: PMC9579670.
Abstract
Developing effective visual analytics systems demands careful characterization of domain problems and careful integration of visualization techniques and computational models. Urban visual analytics has already achieved remarkable success in tackling urban problems and providing fundamental services for smart cities. To promote further academic research and assist the development of industrial urban analytics systems, we comprehensively review urban visual analytics studies from four perspectives. In particular, we identify 8 urban domains and 22 types of popular visualizations, analyze 7 types of computational methods, and categorize existing systems into 4 types based on their integration of visualization techniques and computational models. We conclude with potential research directions and opportunities.
Affiliation(s)
- Zikun Deng, State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China
- Di Weng, Microsoft Research Asia, Beijing 100080, China
- Shuhan Liu, State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China
- Yuan Tian, State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China
- Mingliang Xu, School of Information Engineering, Zhengzhou University, Zhengzhou, China; Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450001, China
- Yingcai Wu, State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China
3. Helala MA, Qureshi FZ, Pu KQ. A Stream Algebra for Performance Optimization of Large Scale Computer Vision Pipelines. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:905-923. DOI: 10.1109/TPAMI.2020.3015867. PMID: 32780697.
Abstract
There has been large growth in hardware and software systems capable of producing vast amounts of image and video data. These systems are rich sources of continuous image and video streams, which motivates researchers to build scalable computer vision systems that use data-streaming concepts to process visual data streams. However, several challenges exist in building large-scale computer vision systems. For example, computer vision algorithms have different accuracy and speed profiles depending on the content, type, and speed of incoming data, and it is not clear how to adaptively tune these algorithms in large-scale systems. These challenges exist because we lack formal frameworks for building and optimizing large-scale visual processing pipelines. This paper presents formal methods and algorithms that aim to overcome these challenges and support the construction and optimization of large-scale computer vision systems. We describe a formal algebra framework for the mathematical description of computer vision pipelines that process image and video streams. The algebra naturally describes feedback control and provides a formal, abstract method for optimizing computer vision pipelines. We then show that a general optimizer can be used with the feedback-control mechanisms of our stream algebra to provide a common online parameter optimization method for computer vision pipelines.
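The abstract stays at the level of the formal algebra, but the core idea (operators composed into a pipeline, with a feedback edge that tunes an operator's parameters online against a performance target) can be illustrated with a small sketch. Everything below is an illustrative assumption written in Python, not the authors' algebra or optimizer; the operator names, the latency budget, and the simple nudging rule are made up for demonstration.

```python
# Illustrative sketch: composable stream operators plus a feedback edge that
# tunes one operator's parameter online to meet a latency budget.
from collections import deque
import random
import time

class Op:
    """A stream operator: applies a function with tunable parameters to each item."""
    def __init__(self, fn, **params):
        self.fn = fn
        self.params = params

    def __call__(self, item):
        return self.fn(item, **self.params)

def run_pipeline(ops, item):
    """Serial composition of operators; an operator may drop an item by returning None."""
    for op in ops:
        item = op(item)
        if item is None:
            return None
    return item

def detect(frame, scale):
    """Toy detector whose cost (and output richness) grows with 'scale'."""
    time.sleep(0.0005 * scale)              # pretend compute cost depends on scale
    return {"frame": frame, "detections": int(scale * random.random())}

def keep_busy_frames(result, min_dets):
    """Pass only frames with enough detections."""
    return result if result["detections"] >= min_dets else None

detector = Op(detect, scale=8)
busy_filter = Op(keep_busy_frames, min_dets=2)

# Feedback control: measure average latency over a sliding window and nudge the
# detector's 'scale' to stay under a budget (a stand-in for a general online
# parameter optimizer attached to the pipeline's feedback edge).
LATENCY_BUDGET = 0.004
recent = deque(maxlen=20)
for frame_id in range(200):
    start = time.perf_counter()
    run_pipeline([detector, busy_filter], frame_id)
    recent.append(time.perf_counter() - start)
    avg_latency = sum(recent) / len(recent)
    if avg_latency > LATENCY_BUDGET:
        detector.params["scale"] = max(1, detector.params["scale"] - 1)
    elif avg_latency < 0.5 * LATENCY_BUDGET:
        detector.params["scale"] += 1

print("tuned detector scale:", detector.params["scale"])
```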
4. Tang T, Wu Y, Wu Y, Yu L, Li Y. VideoModerator: A Risk-aware Framework for Multimodal Video Moderation in E-Commerce. IEEE Transactions on Visualization and Computer Graphics 2022; 28:846-856. DOI: 10.1109/TVCG.2021.3114781. PMID: 34587029.
Abstract
Video moderation, which refers to removing deviant or explicit content from e-commerce livestreams, has become increasingly important as livestreaming has grown popular owing to its social and engaging features. However, this task is tedious and time-consuming because of the difficulty of watching and reviewing multimodal video content, including video frames and audio clips. To ensure effective video moderation, we propose VideoModerator, a risk-aware framework that seamlessly integrates human knowledge with machine insights. The framework incorporates a set of advanced machine learning models to extract risk-aware features from multimodal video content and discover potentially deviant videos. Moreover, it introduces an interactive visualization interface with three views, namely, a video view, a frame view, and an audio view. In the video view, we adopt a segmented timeline and highlight high-risk periods that may contain deviant information. In the frame view, we present a novel visual summarization method that combines risk-aware features and video context to enable quick video navigation. In the audio view, we employ a storyline-based design to provide a multi-faceted overview that can be used to explore audio content. Furthermore, we report the usage of VideoModerator through a case scenario and conduct experiments and a controlled user study to validate its effectiveness.
5. Zeng H, Shu X, Wang Y, Wang Y, Zhang L, Pong TC, Qu H. EmotionCues: Emotion-Oriented Visual Summarization of Classroom Videos. IEEE Transactions on Visualization and Computer Graphics 2021; 27:3168-3181. DOI: 10.1109/TVCG.2019.2963659. PMID: 31902765.
Abstract
Analyzing students' emotions from classroom videos can help both teachers and parents quickly gauge student engagement in class. The availability of high-definition cameras creates opportunities to record class scenes. However, watching videos is time-consuming, and it is challenging to gain a quick overview of the emotion distribution and to find abnormal emotions. In this article, we propose EmotionCues, a visual analytics system that integrates emotion recognition algorithms with visualizations to support both emotion summarization and detailed analysis of classroom videos. It consists of three coordinated views: a summary view depicting the overall emotions and their dynamic evolution, a character view presenting the detailed emotion status of an individual, and a video view enhancing the video analysis with further details. Considering the possible inaccuracy of emotion recognition, we also explore several factors affecting the emotion analysis, such as face size and occlusion; these factors provide hints for inferring likely inaccuracies and their causes. Two use cases and interviews with end users and domain experts show that the proposed system is useful and effective for analyzing emotions in classroom videos.
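As one hedged illustration of the point about recognition inaccuracy, the sketch below aggregates per-face emotion predictions into a per-student summary while down-weighting small faces; this is not the EmotionCues implementation, and the detections, threshold, and weighting rule are hypothetical.

```python
# Illustrative only: aggregate per-face emotion predictions into a per-student
# summary, down-weighting small faces whose recognition results are less reliable.
from collections import defaultdict

# Hypothetical per-frame detections: (student_id, emotion, confidence, face_area_px)
detections = [
    ("s1", "happy",   0.9, 4200),
    ("s1", "neutral", 0.6,  900),   # small face: reduced weight
    ("s2", "sad",     0.8, 3800),
    ("s2", "sad",     0.7, 3500),
]

MIN_AREA = 1500  # faces smaller than this are considered less reliable

def summarize(dets):
    scores = defaultdict(lambda: defaultdict(float))
    for student, emotion, conf, area in dets:
        weight = conf * (1.0 if area >= MIN_AREA else area / MIN_AREA)
        scores[student][emotion] += weight
    # Dominant emotion per student with its reliability-weighted score.
    return {s: max(emo.items(), key=lambda kv: kv[1]) for s, emo in scores.items()}

print(summarize(detections))  # e.g. s1 -> ('happy', 0.9), s2 -> ('sad', ~1.5)
```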
6. Sun G, Wu H, Zhu L, Xu C, Liang H, Xu B, Liang R. VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model. ACM Transactions on Intelligent Systems and Technology 2021. DOI: 10.1145/3458928.
Abstract
With the rapid development of the mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. Using machine learning methods combined with well-designed features, we can automatically obtain video summaries that reduce video storage and retrieval costs. However, there is always a gap between the summaries produced by the model and those annotated by users. How to help users understand this difference, provide insights for improving the model, and enhance trust in the model remains challenging. To address these challenges, we propose VSumVis, a visual analysis system designed under a user-centered methodology that supports multi-feature examination and multi-level exploration, helping users explore and analyze video content as well as the intrinsic relationships within our video summarization model. The system contains multiple coordinated views, i.e., a video view, a projection view, a detail view, and a sequential frames view. A multi-level analysis process that integrates video events and frames is presented with cluster and node visualizations in our system. Temporal patterns concerning the difference between the manual annotation score and the saliency score produced by our model are further investigated and distinguished in the sequential frames view. Moreover, we propose a set of rich user interactions that enable an in-depth, multi-faceted analysis of the features in our video summarization model. We conduct case studies and interviews with domain experts to provide anecdotal evidence about the effectiveness of our approach. Quantitative feedback from a user study confirms the usefulness of our visual system for exploring the video summarization model.
Affiliation(s)
- Guodao Sun, Zhejiang University of Technology, Hangzhou, China
- Hao Wu, Zhejiang University of Technology, Hangzhou, China
- Lin Zhu, Zhejiang University of Technology, Hangzhou, China
- Chaoqing Xu, Zhejiang University of Technology, Hangzhou, China
- Haoran Liang, Zhejiang University of Technology, Hangzhou, China
- Binwei Xu, Zhejiang University of Technology, Hangzhou, China
7. Park JH, Nadeem S, Boorboor S, Marino J, Kaufman A. CMed: Crowd Analytics for Medical Imaging Data. IEEE Transactions on Visualization and Computer Graphics 2021; 27:2869-2880. DOI: 10.1109/TVCG.2019.2953026. PMID: 31751242; PMCID: PMC7859862.
Abstract
We present a visual analytics framework, CMed, for exploring medical image data annotations acquired from crowdsourcing. CMed can be used to visualize, classify, and filter crowdsourced clinical data based on a number of different metrics, such as detection rate, logged events, and clustering of the annotations. CMed provides several interactive linked visualization components for analyzing the crowd annotation results for a particular video and the associated workers. Additionally, all results of an individual worker can be inspected using multiple linked views in our CMed framework. We allow a crowdsourcing application analyst to observe patterns and gather insights into the crowdsourced medical data, helping them design future crowdsourcing applications for optimal output from the workers. We demonstrate the efficacy of our framework with two medical crowdsourcing studies: polyp detection in virtual colonoscopy videos and lung nodule detection in CT thin-slab maximum intensity projection videos. We also provide experts' feedback to show the effectiveness of our framework. Lastly, we share the lessons we learned from our framework with suggestions for integrating it into a clinical workflow.
8. Li W, Qi D, Zhang C, Guo J, Yao J. Video Summarization Based on Mutual Information and Entropy Sliding Window Method. Entropy (Basel) 2020; 22(11):1285. DOI: 10.3390/e22111285. PMID: 33287053; PMCID: PMC7711815.
Abstract
This paper proposes a video summarization algorithm called the Mutual Information and Entropy based adaptive Sliding Window (MIESW) method, designed specifically for static summaries of gesture videos. Considering that gesture videos usually contain uncertain transition postures, unclear movement boundaries, and inexplicable frames, we propose a three-step method in which the first step browses the video, the second step applies the MIESW method to select candidate key frames, and the third step removes most redundant key frames. In detail, the first step converts the video into a sequence of frames and adjusts the frame size. In the second step, the MIESW key frame extraction algorithm is executed: the inter-frame mutual information value is used as a metric to adaptively adjust the size of the sliding window that groups frames with similar content. Then, based on the entropy value of each frame and the average mutual information value of the frame group, a threshold method is applied to optimize the grouping, and the key frames are extracted. In the third step, speeded up robust features (SURF) analysis is performed to eliminate redundant frames among these candidate key frames. The calculation of Precision, Recall, and F-measure is adapted from the perspective of practicality and feasibility. Experiments demonstrate that key frames extracted using our method provide high-quality video summaries that cover the main content of the gesture video.
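A minimal sketch of the two quantities the method is built on, inter-frame mutual information and frame entropy computed from grayscale histograms, together with a simplified sliding-window grouping, is given below. The window-closing rule and the synthetic frames are assumptions for illustration, and the SURF-based redundancy-removal step is omitted; this is not the exact MIESW procedure.

```python
# Sketch of inter-frame mutual information, frame entropy, and a simplified
# adaptive window that closes when MI drops, keeping one key frame per group.
import numpy as np

def entropy(gray, bins=256):
    """Shannon entropy of a grayscale frame (pixel values in 0..255)."""
    p, _ = np.histogram(gray, bins=bins, range=(0, 256), density=True)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(a, b, bins=64):
    """Mutual information between two grayscale frames via a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins, range=[[0, 256]] * 2)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz])))

def candidate_keyframes(frames, mi_drop=0.5):
    """Group consecutive frames while MI stays high; from each group keep the
    frame with maximum entropy as a candidate key frame (a simplification)."""
    keys, group = [], [0]
    for i in range(1, len(frames)):
        mi = mutual_information(frames[i - 1], frames[i])
        base = mutual_information(frames[group[0]], frames[group[0]])  # self-MI = entropy scale
        if mi < mi_drop * base:  # content changed: close the window
            keys.append(max(group, key=lambda j: entropy(frames[j])))
            group = [i]
        else:
            group.append(i)
    keys.append(max(group, key=lambda j: entropy(frames[j])))
    return keys

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "video": three static scenes, ten mildly noisy copies each.
    scenes = [rng.integers(0, 256, (120, 160)) for _ in range(3)]
    frames = [np.clip(s + rng.normal(0, 2, s.shape), 0, 255) for s in scenes for _ in range(10)]
    print("candidate key frame indices:", candidate_keyframes(frames))
```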
Affiliation(s)
- WenLin Li, School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
- DeYu Qi, School of Software Engineering, South China University of Technology, Guangzhou 510006, China
- ChangJian Zhang, School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
- Jing Guo, School of Software Engineering, South China University of Technology, Guangzhou 510006, China
- JiaJun Yao, School of Software Engineering, South China University of Technology, Guangzhou 510006, China
9. Efficient CNN based summarization of surveillance videos for resource-constrained devices. Pattern Recognition Letters 2020. DOI: 10.1016/j.patrec.2018.08.003.
10. Zeng H, Wang X, Wu A, Wang Y, Li Q, Endert A, Qu H. EmoCo: Visual Analysis of Emotion Coherence in Presentation Videos. IEEE Transactions on Visualization and Computer Graphics 2020; 26:927-937. DOI: 10.1109/TVCG.2019.2934656. PMID: 31443002.
Abstract
Emotions play a key role in human communication and public presentations. Human emotions are usually expressed through multiple modalities. Therefore, exploring multimodal emotions and their coherence is of great value for understanding emotional expressions in presentations and improving presentation skills. However, manually watching and studying presentation videos is often tedious and time-consuming. There is a lack of tool support to help conduct an efficient and in-depth multi-level analysis. Thus, in this paper, we introduce EmoCo, an interactive visual analytics system to facilitate efficient analysis of emotion coherence across facial, text, and audio modalities in presentation videos. Our visualization system features a channel coherence view and a sentence clustering view that together enable users to obtain a quick overview of emotion coherence and its temporal evolution. In addition, a detail view and word view enable detailed exploration and comparison from the sentence level and word level, respectively. We thoroughly evaluate the proposed system and visualization techniques through two usage scenarios based on TED Talk videos and interviews with two domain experts. The results demonstrate the effectiveness of our system in gaining insights into emotion coherence in presentations.
11. Integration of GIS and Moving Objects in Surveillance Video. ISPRS International Journal of Geo-Information 2017. DOI: 10.3390/ijgi6040094.
12. Zhao B, Xu S, Lin S, Luo X, Duan L. A new visual navigation system for exploring biomedical Open Educational Resource (OER) videos. Journal of the American Medical Informatics Association 2015; 23:e34-41. DOI: 10.1093/jamia/ocv123. PMID: 26335986.
Abstract
OBJECTIVE Biomedical videos as open educational resources (OERs) are increasingly proliferating on the Internet. Unfortunately, seeking personally valuable content from among the vast corpus of quality yet diverse OER videos is nontrivial due to limitations of today's keyword- and content-based video retrieval techniques. To address this need, this study introduces a novel visual navigation system that facilitates users' information seeking from large collections of biomedical OER videos by interactively offering visual and textual navigational clues that are both semantically revealing and user-friendly. MATERIALS AND METHODS The authors collected and processed around 25 000 YouTube videos, with a total length of about 4000 h, in the broad field of biomedical sciences for the experiment. For each video, semantic clues are first extracted automatically by computationally analyzing audio and visual signals, as well as text either accompanying or embedded in the video. These extracted clues are subsequently stored in a metadata database and indexed by a high-performance text search engine. During the online retrieval stage, the system renders video search results as dynamic web pages using a JavaScript library that allows users to interactively and intuitively explore video content both efficiently and effectively. RESULTS The authors produced a prototype implementation of the proposed system, which is publicly accessible at https://patentq.njit.edu/oer. To examine the overall advantage of the proposed system for exploring biomedical OER videos, the authors further conducted a user study of modest scale. The study results encouragingly demonstrate the functional effectiveness and user-friendliness of the new system for facilitating information seeking from, and content exploration among, massive biomedical OER videos. CONCLUSION Using the proposed tool, users can efficiently and effectively find videos of interest, precisely locate video segments delivering personally valuable information, and intuitively and conveniently preview essential content of a single video or a collection of videos.
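A toy sketch of the described pipeline, per-segment metadata indexed by a text search engine so users can jump to relevant video segments, is shown below; the in-memory inverted index stands in for the real high-performance engine, and the segment data and field names are invented for illustration.

```python
# Toy stand-in for the described pipeline: per-segment semantic clues are stored
# as metadata and indexed so a query returns the segments a user should jump to.
from collections import defaultdict

segments = [
    {"video": "cardiology-101", "start_s": 0,   "clue": "anatomy of the human heart chambers"},
    {"video": "cardiology-101", "start_s": 310, "clue": "electrocardiogram basics and QRS complex"},
    {"video": "genomics-intro", "start_s": 45,  "clue": "DNA transcription and RNA polymerase"},
]

index = defaultdict(set)  # term -> set of segment ids
for sid, seg in enumerate(segments):
    for term in seg["clue"].lower().split():
        index[term].add(sid)

def search(query):
    """Return segments whose extracted clues contain every query term."""
    terms = query.lower().split()
    hits = set.intersection(*(index.get(t, set()) for t in terms)) if terms else set()
    return [segments[sid] for sid in sorted(hits)]

for seg in search("heart anatomy"):
    print(f'{seg["video"]} @ {seg["start_s"]}s: {seg["clue"]}')
```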
Affiliation(s)
- Baoquan Zhao, National Engineering Research Center of Digital Life, School of Information Science and Technology, Sun Yat-sen University, Guangzhou, P.R. China
- Songhua Xu, Information Systems Department, New Jersey Institute of Technology, Newark, NJ, USA
- Shujin Lin, National Engineering Research Center of Digital Life, School of Information Science and Technology, Sun Yat-sen University, Guangzhou, P.R. China; School of Communication and Design, Sun Yat-sen University, Guangzhou, P.R. China
- Xiaonan Luo, National Engineering Research Center of Digital Life, School of Information Science and Technology, Sun Yat-sen University, Guangzhou, P.R. China
- Lian Duan, Department of Information Systems and Business Analytics, Hofstra University, NY, USA