1
Warchol S, Troidl J, Muhlich J, Krueger R, Hoffer J, Lin T, Beyer J, Glassman E, Sorger PK, Pfister H. psudo: Exploring Multi-Channel Biomedical Image Data with Spatially and Perceptually Optimized Pseudocoloring. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.11.589087. [PMID: 38659870 PMCID: PMC11042212 DOI: 10.1101/2024.04.11.589087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Over the past century, multichannel fluorescence imaging has been pivotal in myriad scientific breakthroughs by enabling the spatial visualization of proteins within a biological sample. With the shift to digital methods and visualization software, experts can now flexibly pseudocolor and combine image channels, each corresponding to a different protein, to explore their spatial relationships. We thus propose psudo, an interactive system that allows users to create optimal color palettes for multichannel spatial data. In psudo, a novel optimization method generates palettes that maximize the perceptual differences between channels while mitigating confusing color blending in overlapping channels. We integrate this method into a system that allows users to explore multi-channel image data and compare and evaluate color palettes for their data. An interactive lensing approach provides on-demand feedback on channel overlap and a color confusion metric while giving context to the underlying channel values. Color palettes can be applied globally or, using the lens, to local regions of interest. We evaluate our palette optimization approach using three graphical perception tasks in a crowdsourced user study with 150 participants, showing that users are more accurate at discerning and comparing the underlying data using our approach. Additionally, we showcase psudo in a case study exploring the complex immune responses in cancer tissue data with a biologist.
Affiliation(s)
- Simon Warchol
  - Harvard John A. Paulson School Of Engineering And Applied Sciences
  - Visual Computing Group, Harvard University
  - Laboratory of Systems Pharmacology, Harvard Medical School
- Jakob Troidl
  - Harvard John A. Paulson School Of Engineering And Applied Sciences
  - Visual Computing Group, Harvard University
- Jeremy Muhlich
  - Department of Systems Biology, Harvard Medical School
  - Visual Computing Group, Harvard University
- Robert Krueger
  - Laboratory of Systems Pharmacology, Harvard Medical School
- John Hoffer
  - Department of Systems Biology, Harvard Medical School
  - Laboratory of Systems Pharmacology, Harvard Medical School
- Tica Lin
  - Harvard John A. Paulson School Of Engineering And Applied Sciences
  - Visual Computing Group, Harvard University
- Johanna Beyer
  - Harvard John A. Paulson School Of Engineering And Applied Sciences
  - Visual Computing Group, Harvard University
- Elena Glassman
  - Harvard John A. Paulson School Of Engineering And Applied Sciences
- Peter K Sorger
  - Department of Systems Biology, Harvard Medical School
  - Laboratory of Systems Pharmacology, Harvard Medical School
- Hanspeter Pfister
  - Harvard John A. Paulson School Of Engineering And Applied Sciences
  - Visual Computing Group, Harvard University
  - Laboratory of Systems Pharmacology, Harvard Medical School
2
Chen C, Fan L, Gao Y, Qiu S, Wei W, He H. EEG-FRM: a neural network based familiar and unfamiliar face EEG recognition method. Cogn Neurodyn 2024; 18:357-370. [PMID: 38699605 PMCID: PMC11061081 DOI: 10.1007/s11571-024-10073-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 12/28/2023] [Accepted: 01/23/2024] [Indexed: 05/05/2024] Open
Abstract
Recognizing familiar faces holds great value in various fields such as medicine, criminal investigation, and lie detection. In this paper, we designed a Complex Trial Protocol-based familiar and unfamiliar face recognition experiment that uses self-face information, and collected EEG data from 147 subjects. A novel neural network-based method, the EEG-based Face Recognition Model (EEG-FRM), is proposed for cross-subject familiar/unfamiliar face recognition; it combines a multi-scale convolutional classification network with the maximum probability mechanism to realize individual face recognition. The multi-scale convolutional neural network extracts temporal information and spatial features from the EEG data, and the attention module and supervised contrastive learning module are employed to improve classification performance. Experimental results on the dataset reveal that familiar face stimuli could evoke significant P300 responses, mainly concentrated in the parietal lobe and nearby regions. Our proposed model achieved a balanced accuracy of 85.64%, a true positive rate of 73.23%, and a false positive rate of 1.96% on the collected dataset, outperforming the other compared methods. The experimental results demonstrate the effectiveness and superiority of our proposed model.
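The three figures reported in this abstract can be cross-checked against each other. The sketch below assumes the standard definition of balanced accuracy as the mean of the true positive rate and the true negative rate (with TNR = 1 - FPR); the abstract does not state which formula the authors used, and the function name `balanced_accuracy` is illustrative only.

```python
def balanced_accuracy(tpr: float, fpr: float) -> float:
    """Mean of sensitivity (TPR) and specificity (TNR = 1 - FPR).

    Assumption: the standard binary-classification definition,
    not necessarily the exact computation used in the paper.
    """
    tnr = 1.0 - fpr
    return (tpr + tnr) / 2.0


# Rates as reported in the abstract, expressed as fractions.
reported_tpr = 0.7323
reported_fpr = 0.0196

bal_acc = balanced_accuracy(reported_tpr, reported_fpr)
print(f"balanced accuracy: {bal_acc * 100:.3f}%")
```

Under that assumption the result is about 85.635%, which is consistent with the reported balanced accuracy of 85.64%.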
Affiliation(s)
- Chao Chen
  - Key Laboratory of Complex System Control Theory and Application, Tianjin University of Technology, Tianjin, China
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Lingfeng Fan
  - Key Laboratory of Complex System Control Theory and Application, Tianjin University of Technology, Tianjin, China
- Ying Gao
  - Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190 China
- Shuang Qiu
  - Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190 China
  - University of Chinese Academy of Sciences, Beijing, China
- Wei Wei
  - Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190 China
- Huiguang He
  - Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190 China
  - University of Chinese Academy of Sciences, Beijing, China
3
Du X, Xu X, Liu Y, Wang Z, Qiu H, Zhao A, Lu L. Cell Heterogeneity Analysis Revealed the Key Role of Fibroblasts in the Magnum Regression of Ducks. Animals (Basel) 2024; 14:1072. [PMID: 38612311 PMCID: PMC11011120 DOI: 10.3390/ani14071072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 03/26/2024] [Accepted: 03/29/2024] [Indexed: 04/14/2024] Open
Abstract
Duck egg production, like that of laying hens, follows a typical low-peak-low cycle, reflecting the dynamics of the reproductive system. Post-peak, some ducks undergo a cessation of egg laying, indicative of a regression process in the oviduct. Notably, the magnum, being the longest segment of the oviduct, plays a crucial role in protein secretion. Despite its significance, few studies have investigated the molecular mechanisms underlying oviduct regression in ducks that have ceased laying eggs. In this study, we conducted single-cell transcriptome sequencing on the magnum tissue of Shaoxing ducks at 467 days of age, utilizing the 10× Genomics platform. This approach allowed us to generate a detailed magnum transcriptome map of both egg-laying and ceased-laying ducks. We collected transcriptome data from 13,708 individual cells, which were then subjected to computational analysis, resulting in the identification of 27 distinct cell clusters. Marker genes were subsequently employed to categorize these clusters into specific cell types. Our analysis revealed notable heterogeneity in magnum cells between the egg-laying and ceased-laying ducks, primarily characterized by variations in cells involved in protein secretion and extracellular matrix (ECM)-producing fibroblasts. Specifically, cells engaged in protein secretion were predominantly observed in the egg-laying ducks, indicative of their role in functional albumen deposition within the magnum, a phenomenon not observed in the ceased-laying ducks. Moreover, the proportion of THY1+ cells within the ECM-producing fibroblasts was found to be significantly higher in the egg-laying ducks (59%) compared to the ceased-laying ducks (24%). Similarly, TIMP4+ fibroblasts constituted a greater proportion of the ECM-producing fibroblasts in the egg-laying ducks (83%) compared to the ceased-laying ducks (58%). 
These findings suggest a potential correlation between the expression of THY1 and TIMP4 in ECM-producing fibroblasts and oviduct activity during functional reproduction. Our study provides valuable single-cell insights that warrant further investigation into the biological implications of fibroblast subsets in the degeneration of the reproductive tract. Moreover, these insights hold promise for enhancing the production efficiency of laying ducks.
Affiliation(s)
- Xue Du
  - Key Laboratory of Applied Technology on Green-Eco-Healthy Animal Husbandry of Zhejiang Province, Zhejiang Provincial Engineering Laboratory for Animal Health Inspection & Internet Technology, Zhejiang International Science and Technology Cooperation Base for Veterinary Medicine and Health Management, China-Australia Joint Laboratory for Animal Health Big Data Analytics, College of Animal Science and Technology & College of Veterinary Medicine of Zhejiang A&F University, Hangzhou 311300, China
- Xiaoqin Xu
  - Institute of Ecology, China West Normal University, Nanchong 637002, China
- Yali Liu
  - Zhejiang Provincial Animal Husbandry Technology Promotion and Breeding Livestock and Poultry Monitoring Station, Hangzhou 310020, China
- Zhijun Wang
  - Key Laboratory of Applied Technology on Green-Eco-Healthy Animal Husbandry of Zhejiang Province, Zhejiang Provincial Engineering Laboratory for Animal Health Inspection & Internet Technology, Zhejiang International Science and Technology Cooperation Base for Veterinary Medicine and Health Management, China-Australia Joint Laboratory for Animal Health Big Data Analytics, College of Animal Science and Technology & College of Veterinary Medicine of Zhejiang A&F University, Hangzhou 311300, China
- Hao Qiu
  - Independent Researcher, Hangzhou 310021, China
- Ayong Zhao
  - Key Laboratory of Applied Technology on Green-Eco-Healthy Animal Husbandry of Zhejiang Province, Zhejiang Provincial Engineering Laboratory for Animal Health Inspection & Internet Technology, Zhejiang International Science and Technology Cooperation Base for Veterinary Medicine and Health Management, China-Australia Joint Laboratory for Animal Health Big Data Analytics, College of Animal Science and Technology & College of Veterinary Medicine of Zhejiang A&F University, Hangzhou 311300, China
- Lizhi Lu
  - Key Laboratory of Livestock and Poultry Resources (Poultry) Evaluation and Utilization, Ministry of Agriculture and Rural Affairs of China, State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Institute of Animal Science & Veterinary, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
4
Narvaez-Montoya C, Mahlknecht J, Torres-Martínez JA, Mora A, Pino-Vargas E. FlowSOM clustering - A novel pattern recognition approach for water research: Application to a hyper-arid coastal aquifer system. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 915:169988. [PMID: 38211857 DOI: 10.1016/j.scitotenv.2024.169988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 01/04/2024] [Accepted: 01/05/2024] [Indexed: 01/13/2024]
Abstract
Monitoring and understanding of water resources have become essential in designing effective and sustainable management strategies to overcome the growing water quality challenges. In this context, the utilization of unsupervised learning techniques for evaluating environmental tracers has facilitated the exploration of sources and dynamics of groundwater systems through pattern recognition. However, conventional techniques may overlook spatial and temporal non-linearities present in water research data. This paper introduces the adaptation of FlowSOM, a pioneering approach that combines self-organizing maps (SOM) and minimal spanning trees (MST), with the fast-greedy network clustering algorithm to unravel intricate relationships within multivariate water quality datasets. By capturing connections within the data, this ensemble tool enhances clustering and pattern recognition. Applied to the complex water quality context of the hyper-arid transboundary Caplina/Concordia coastal aquifer system (Peru/Chile), the FlowSOM network and clustering yielded compelling results in pattern recognition of the aquifer salinization. Analyzing 143 groundwater samples across eight variables, including major ions, the approach supports the identification of distinct clusters and connections between them. Three primary sources of salinization were identified: river percolation, slow lateral aquitard recharge, and seawater intrusion. The analysis demonstrated the superiority of FlowSOM clustering over traditional techniques in the case study, producing clusters that align more closely with the actual hydrogeochemical pattern. The outcomes broaden the utilization of multivariate analysis in water research, presenting a comprehensive approach to support the understanding of groundwater systems.
Affiliation(s)
- Christian Narvaez-Montoya
  - Escuela de Ingenieria y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, Monterrey, N.L. 64849, Mexico
- Jürgen Mahlknecht
  - Escuela de Ingenieria y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, Monterrey, N.L. 64849, Mexico
- Juan Antonio Torres-Martínez
  - Escuela de Ingenieria y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, Monterrey, N.L. 64849, Mexico
- Abrahan Mora
  - Escuela de Ingenieria y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, Monterrey, N.L. 64849, Mexico
- Edwin Pino-Vargas
  - Facultad de Ingenieria Civil, Arquitectura y Geotecnia, Universidad Nacional Jorge Basadre Grohmann, Av. Miraflores S/N, Tacna 23000, Peru
5
Wang J, Wang H, Xu J, Song Q, Zhou B, Shangguan J, Xue M, Wang Y. Identification of protein signatures for lung cancer subtypes based on BPSO method. PLoS One 2023; 18:e0294243. [PMID: 38060494 PMCID: PMC10703216 DOI: 10.1371/journal.pone.0294243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023] Open
Abstract
The objective of this study was to identify protein biomarkers that can distinguish between LUAD and LUSC, a distinction critical for personalized treatment plans. Proteomic profiling data of LUAD and LUSC samples from the TCPA database, along with phenotype and survival information from the TCGA database, were downloaded and preprocessed for analysis. We used the BPSO feature selection method and identified 10 candidate protein biomarkers that have better classifying performance, as analyzed by the t-SNE and PCA algorithms. To explore the causalities among these proteins and their associations with tumor subtypes, we applied the PCStable algorithm to construct a regulatory network. Results indicated that 4 proteins, MIG6, CD26, NF2, and INPP4B, were directly linked to the lung cancer subtypes and may be useful in guiding therapeutic decision-making. In addition, Spearman correlation, the Cox proportional hazards model, and Kaplan-Meier curves were employed to validate the biological significance of the candidate proteins. In summary, our study highlights the importance of protein biomarkers in the classification of lung cancer subtypes and the potential of computational methods for identifying key biomarkers and understanding their underlying biological mechanisms.
Affiliation(s)
- Jihan Wang
  - Department of Basic Medicine, School of Medicine, Xi’an International University, Xi’an, 710077, China
- Hanping Wang
  - Department of Basic Medicine, School of Medicine, Xi’an International University, Xi’an, 710077, China
  - Engineering Research Center of Personalized Anti-aging Health Product Development and Transformation, Universities of Shaanxi Province, Xi’an, 710077, China
- Jing Xu
  - Department of Basic Medicine, School of Medicine, Xi’an International University, Xi’an, 710077, China
- Qiying Song
  - Department of Basic Medicine, School of Medicine, Xi’an International University, Xi’an, 710077, China
- Baozhen Zhou
  - Department of Basic Medicine, School of Medicine, Xi’an International University, Xi’an, 710077, China
- Jingbo Shangguan
  - Department of Basic Medicine, School of Medicine, Xi’an International University, Xi’an, 710077, China
- Mengju Xue
  - Department of Basic Medicine, School of Medicine, Xi’an International University, Xi’an, 710077, China
- Yangyang Wang
  - School of Electronics and Information, Northwestern Polytechnical University, Xi’an, 710129, China
6
Espadoto M, Appleby G, Suh A, Cashman D, Li M, Scheidegger C, Anderson EW, Chang R, Telea AC. UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:1559-1572. [PMID: 34748493 DOI: 10.1109/tvcg.2021.3125576] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Projection techniques are often used to visualize high-dimensional data, allowing users to better understand the overall structure of multi-dimensional spaces on a 2D screen. Although many such methods exist, comparably little work has been done on generalizable methods of inverse-projection - the process of mapping the projected points, or more generally, the projection space back to the original high-dimensional space. In this article we present NNInv, a deep learning technique with the ability to approximate the inverse of any projection or mapping. NNInv learns to reconstruct high-dimensional data from any arbitrary point on a 2D projection space, giving users the ability to interact with the learned high-dimensional representation in a visual analytics system. We provide an analysis of the parameter space of NNInv, and offer guidance in selecting these parameters. We extend validation of the effectiveness of NNInv through a series of quantitative and qualitative analyses. We then demonstrate the method's utility by applying it to three visualization tasks: interactive instance interpolation, classifier agreement, and gradient visualization.
7
Liu Q, Ren Y, Zhu Z, Li D, Ma X, Li Q. RankAxis: Towards a Systematic Combination of Projection and Ranking in Multi-Attribute Data Exploration. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:701-711. [PMID: 36155453 DOI: 10.1109/tvcg.2022.3209463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Projection and ranking are frequently used analysis techniques in multi-attribute data exploration. Both families of techniques help analysts with tasks such as identifying similarities between observations and determining ordered subgroups, and have shown good performances in multi-attribute data exploration. However, they often exhibit problems such as distorted projection layouts, obscure semantic interpretations, and non-intuitive effects produced by selecting a subset of (weighted) attributes. Moreover, few studies have attempted to combine projection and ranking into the same exploration space to complement each other's strengths and weaknesses. For this reason, we propose RankAxis, a visual analytics system that systematically combines projection and ranking to facilitate the mutual interpretation of these two techniques and jointly support multi-attribute data exploration. A real-world case study, expert feedback, and a user study demonstrate the efficacy of RankAxis.
8
Liu S, Weng D, Tian Y, Deng Z, Xu H, Zhu X, Yin H, Zhan X, Wu Y. ECoalVis: Visual Analysis of Control Strategies in Coal-fired Power Plants. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:1091-1101. [PMID: 36191102 DOI: 10.1109/tvcg.2022.3209430] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Improving the efficiency of coal-fired power plants has numerous benefits. The control strategy is one of the major factors affecting such efficiency. However, due to the complex and dynamic environment inside the power plants, it is hard to extract and evaluate control strategies and their cascading impact across massive numbers of sensors. Existing manual and data-driven approaches do not adequately support the analysis of control strategies because they are time-consuming and do not scale with the complexity of the power plant systems. Three challenges were identified: a) interactive extraction of control strategies from large-scale dynamic sensor data, b) intuitive visual representation of cascading impact among the sensors in a complex power plant system, and c) time-lag-aware analysis of the impact of control strategies on electricity generation efficiency. By collaborating with energy domain experts, we addressed these challenges with ECoalVis, a novel interactive system for experts to visually analyze the control strategies of coal-fired power plants extracted from historical sensor data. We evaluated the effectiveness of the proposed system with two usage scenarios on a real-world historical dataset and received positive feedback from experts.
9
Lawonn K, Meuschke M, Eulzer P, Mitterreiter M, Giesen J, Gunther T. GRay: Ray Casting for Visualization and Interactive Data Exploration of Gaussian Mixture Models. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:526-536. [PMID: 36155437 DOI: 10.1109/tvcg.2022.3209374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The Gaussian mixture model (GMM) describes the distribution of random variables from several different populations. GMMs have widespread applications in probability theory, statistics, machine learning for unsupervised cluster analysis and topic modeling, as well as in deep learning pipelines. So far, few efforts have been made to explore the underlying point distribution in combination with the GMMs, in particular when the data becomes high-dimensional and when the GMMs are composed of many Gaussians. We present an analysis tool comprising various GPU-based visualization techniques to explore such complex GMMs. To facilitate the exploration of high-dimensional data, we provide a novel navigation system to analyze the underlying data. Instead of projecting the data to 2D, we utilize interactive 3D views to better support users in understanding the spatial arrangements of the Gaussian distributions. The interactive system is composed of two parts: (1) raycasting-based views that visualize cluster memberships and spatial arrangements and support the discovery of new modes, and (2) overview visualizations that enable the comparison of Gaussians with each other, as well as small multiples of different choices of basis vectors. Users are supported in their exploration with customization tools and smooth camera navigation. Our tool was developed and assessed by five domain experts, and its usefulness was evaluated with 23 participants. To demonstrate the effectiveness, we identify interesting features in several data sets.
10
Russig B, Grab D, Dachselt R, Gumhold S. On-Tube Attribute Visualization for Multivariate Trajectory Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:1288-1298. [PMID: 36170405 DOI: 10.1109/tvcg.2022.3209400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Stylized tubes are an established visualization primitive for line data as encountered in many scientific fields, ranging from characteristic lines in flow fields and fiber tracks reconstructed from diffusion tensor imaging to trajectories of moving objects as they arise from cyber-physical systems in many engineering disciplines. Typical challenges include large data set sizes that demand efficient rendering techniques as well as a large number of attributes that cannot be mapped simultaneously to the basic visual attributes provided by a tube-based visualization. In this work, we tackle both challenges with a new on-tube visualization approach. We improve recent work on high-quality GPU ray casting of Hermite spline tubes supporting ambient occlusion and extend it with a new layered procedural texturing technique. In the proposed framework, a large number of data set attributes can be mapped simultaneously to a variety of glyphs and plots that are embedded in texture space and organized in layers. Efficient rendering with minimal data transfer is achieved by generating the glyphs procedurally and drawing them in a deferred shading pass. We integrated these techniques in a prototype visualization tool that facilitates flexible mapping of data set attributes to visual tube and glyph attributes. We studied our approach on a variety of example data from different fields and found it to provide a highly adaptable and extensible toolbox to quickly craft tailor-made tube-based trajectory visualizations.
11
Du X, Lai S, Zhao W, Xu X, Xu W, Zeng T, Tian Y, Lu L. Single-cell RNA sequencing revealed the liver heterogeneity between egg-laying duck and ceased-laying duck. BMC Genomics 2022; 23:857. [PMID: 36577943 PMCID: PMC9798604 DOI: 10.1186/s12864-022-09089-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 12/19/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND: In the late phase of production, ducks cease laying prematurely, leading to lower feed conversion. The liver plays a vital role in the synthesis and transport of yolk materials during egg formation in birds. However, the molecular mechanism of the liver in ceased-laying ducks is far from clear, and higher-resolution, deeper analysis is needed. Single-cell RNA sequencing on the 10× Genomics platform can help to map the single-cell gene expression atlas of the Shaoxing duck liver and provide new insights into the liver differences between egg-laying and ceased-laying ducks. RESULTS: About 20,000 single cells were profiled and 22 clusters were identified. The clusters were assigned to 6 cell types. The dominant cell type was the hepatocyte, accounting for about 60% of all cells. Of note, the heterogeneity between egg-laying and ceased-laying ducks occurred mainly in hepatocytes. Cells of clusters 3 and 12 were hepatocyte states unique to egg-laying ducks, while cells of clusters 0 and 15 were hepatocyte states unique to ceased-laying ducks. The expression patterns of yolk precursor transporters, lipid-metabolizing enzymes, and fibrinogens differed in hepatocytes between egg-laying and ceased-laying ducks. APOV1, VTG2, VTG1, APOB, RBP, VTDB and SCD might be activated in egg-laying ducks, while APOA1, APOA4, APOC3, FGB and FGG might be activated in ceased-laying ducks. CONCLUSIONS: Our study provides further evidence that APOV1 and APOB, rather than APOA1 and APOA4, play key roles in egg production. It is also the first to detect a correlation between higher expression of APOC3, FGB, and FGG and ceased laying in ducks.
Affiliation(s)
- Xue Du
  - State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Institute of Animal Husbandry and Veterinary Medicine, Zhejiang Academy of Agricultural Sciences, Hangzhou, 310021 Zhejiang, China
  - College of Animal Science and Technology, College of Veterinary Medicine, Zhejiang A & F University, Hangzhou, China
- Shujing Lai
  - Shanghai Institute of Immunology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Wanqiu Zhao
  - Institute of Horticulture, Zhejiang Academy of Agricultural Sciences, Hangzhou, 310022 Zhejiang, China
- Xiaoqin Xu
  - Institute of Ecology, China West Normal University, Nanchong, 637002 Sichuan, China
- Wenwu Xu
  - State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Institute of Animal Husbandry and Veterinary Medicine, Zhejiang Academy of Agricultural Sciences, Hangzhou, 310021 Zhejiang, China
- Tao Zeng
  - State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Institute of Animal Husbandry and Veterinary Medicine, Zhejiang Academy of Agricultural Sciences, Hangzhou, 310021 Zhejiang, China
- Yong Tian
  - State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Institute of Animal Husbandry and Veterinary Medicine, Zhejiang Academy of Agricultural Sciences, Hangzhou, 310021 Zhejiang, China
- Lizhi Lu
  - State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Institute of Animal Husbandry and Veterinary Medicine, Zhejiang Academy of Agricultural Sciences, Hangzhou, 310021 Zhejiang, China
12
iHELP: interactive hierarchical linear projections for interpreting non-linear projections. J Vis (Tokyo) 2022. [DOI: 10.1007/s12650-022-00900-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
13
Kumpf A, Stumpfegger J, Hartl PF, Westermann R. Visual Analysis of Multi-Parameter Distributions Across Ensembles of 3D Fields. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:3530-3545. [PMID: 33625986 DOI: 10.1109/tvcg.2021.3061925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
For an ensemble of 3D multi-parameter fields, we present a visual analytics workflow to analyse whether and which parts of a selected multi-parameter distribution are present in all ensemble members. Supported by a parallel coordinate plot, a multi-parameter brush is applied to all ensemble members to select data points with a similar multi-parameter distribution. By a combination of spatial sub-division and a covariance analysis of partitioned sub-sets of data points, a tight partition in multi-parameter space with a reduced number of selected data points is obtained. To assess the representativeness of the selected multi-parameter distribution across the ensemble, we propose a novel extension of violin plots that can show multiple parameter distributions simultaneously. We investigate the visual design that effectively conveys (dis-)similarities in multi-parameter distributions, and demonstrate that users can quickly comprehend parameter-specific differences regarding distribution shape and representativeness from a side-by-side view of these plots. In a 3D spatial view, users can analyse and compare the spatial distribution of selected data points in different ensemble members via interval-based isosurface raycasting. In two real-world application cases we show how our approach is used to analyse the multi-parameter distributions across an ensemble of 3D fields.
|
14
|
Limberger D, Scheibel W, Döllner J, Trapp M. Visual variables and configuration of software maps. J Vis (Tokyo) 2022. [DOI: 10.1007/s12650-022-00868-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Software maps provide a general-purpose interactive user interface and information display in software analytics. This paper classifies software maps as a containment-based treemap embedded into a 3D attribute space and introduces respective terminology. It provides a comprehensive overview of advanced visual metaphors and techniques, each suitable for interactive visual analytics tasks. The metaphors and techniques are briefly described, located within a visualization pipeline model, and considered within a software map design space. The general expressiveness and applicability of visual variables are detailed and discussed. Consequent applications and use cases for different software system data and software engineering data are discussed, arguing for the versatile use of software maps in visual software analytics.
|
15
|
Toward a taxonomy for 2D non-paired General Line Coordinates: a comprehensive survey. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00361-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
|
16
|
Heimerl F, Kralj C, Moller T, Gleicher M. embComp: Visual Interactive Comparison of Vector Embeddings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:2953-2969. [PMID: 33347410 DOI: 10.1109/tvcg.2020.3045918] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
This article introduces embComp, a novel approach for comparing two embeddings that capture the similarity between objects, such as word and document embeddings. We survey scenarios where comparing these embedding spaces is useful. From those scenarios, we derive common tasks, introduce visual analysis methods that support these tasks, and combine them into a comprehensive system. Among embComp's central features are overview visualizations that are based on metrics for measuring differences in the local structure around objects. Summarizing these local metrics over the embeddings provides global overviews of similarities and differences. Detail views allow comparison of the local structure around selected objects and relating this local information to the global views. Integrating and connecting all of these components, embComp supports a range of analysis workflows that help understand similarities and differences between embedding spaces. We assess our approach by applying it in several use cases, including understanding corpora differences via word vector embeddings, and understanding algorithmic differences in generating embeddings.
|
17
|
Exploration and Assessment of Interaction in an Immersive Analytics Module: A Software-Based Comparison. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12083817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The focus of computer systems in the field of visual analytics is to make the results clear and understandable. However, enhancing human-computer interaction (HCI) in the field is less investigated. Data visualization and visual analytics (VA) are usually performed using traditional desktop settings and mouse interaction. These methods are based on the window, icon, menu, and pointer (WIMP) interface, which often results in information clutter and is difficult to analyze and understand, especially by novice users. Researchers believe that introducing adequate, natural interaction techniques to the field is necessary for building effective and enjoyable visual analytics systems. This work introduces a novel virtual reality (VR) module to perform basic visual analytics tasks and aims to explore new interaction techniques in the field. A pilot study was conducted to measure the time it takes students to perform basic analytics tasks using the developed VR module and to compare it to the time it takes them to perform the same tasks using a traditional desktop, in order to assess the effectiveness of the VR module in enhancing students’ performance. The results show that novice users (participants with less programming experience) took about 50% less time to complete tasks using the developed VR module as compared to a programming language, notably R. Experts (participants with advanced programming experience) took about the same time to complete tasks under both conditions (R and VR).
|
18
|
Ahn Y, Yan M, Lin YR, Chung WT, Hwa R. Tribe or Not? Critical Inspection of Group Differences Using TribalGram. ACM T INTERACT INTEL 2022. [DOI: 10.1145/3484509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/01/2022]
Abstract
With the rise of AI and data mining techniques, group profiling and group-level analysis have been increasingly used in many domains, including policy making and direct marketing. In some cases, the statistics extracted from data may provide insights to a group’s shared characteristics; in others, the group-level analysis can lead to problems, including stereotyping and systematic oppression. How can analytic tools facilitate a more conscientious process in group analysis? In this work, we identify a set of accountable group analytics design guidelines to explicate the needs for group differentiation and preventing overgeneralization of a group. Following the design guidelines, we develop TribalGram, a visual analytic suite that leverages interpretable machine learning algorithms and visualization to offer inference assessment, model explanation, data corroboration, and sense-making. Through the interviews with domain experts, we showcase how our design and tools can bring a richer understanding of “groups” mined from the data.
Affiliation(s)
- Yongsu Ahn: University of Pittsburgh, North Bellefield Avenue Pittsburgh, PA
- Muheng Yan: University of Pittsburgh, North Bellefield Avenue Pittsburgh, PA
- Yu-Ru Lin: University of Pittsburgh, North Bellefield Avenue Pittsburgh, PA
- Wen-Ting Chung: University of Pittsburgh, North Bellefield Avenue Pittsburgh, PA
- Rebecca Hwa: University of Pittsburgh, North Bellefield Avenue Pittsburgh, PA
|
19
|
Zhou Z, Zu X, Wang Y, Lelieveldt BPF, Tao Q. Deep Recursive Embedding for High-Dimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:1237-1248. [PMID: 34699363 DOI: 10.1109/tvcg.2021.3122388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Embedding high-dimensional data onto a low-dimensional manifold is of both theoretical and practical value. In this article, we propose to combine deep neural networks (DNN) with mathematics-guided embedding rules for high-dimensional data embedding. We introduce a generic deep embedding network (DEN) framework, which is able to learn a parametric mapping from high-dimensional space to low-dimensional space, guided by well-established objectives such as Kullback-Leibler (KL) divergence minimization. We further propose a recursive strategy, called deep recursive embedding (DRE), to make use of the latent data representations for boosted embedding performance. We exemplify the flexibility of DRE with different architectures and loss functions, and benchmark our method against the two most popular embedding methods, namely, t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP). The proposed DRE method can map out-of-sample data and scale to extremely large datasets. Experiments on a range of public datasets demonstrated improved embedding performance in terms of local and global structure preservation, compared with other state-of-the-art embedding methods. Code is available at https://github.com/tao-aimi/DeepRecursiveEmbedding.
|
20
|
Sohns JT, Schmitt M, Jirasek F, Hasse H, Leitte H. Attribute-based Explanation of Non-Linear Embeddings of High-Dimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:540-550. [PMID: 34587086 DOI: 10.1109/tvcg.2021.3114870] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Embeddings of high-dimensional data are widely used to explore data, to verify analysis results, and to communicate information. Their explanation, in particular with respect to the input attributes, is often difficult. With linear projections like PCA the axes can still be annotated meaningfully. With non-linear projections this is no longer possible and alternative strategies such as attribute-based color coding are required. In this paper, we review existing augmentation techniques and discuss their limitations. We present the Non-Linear Embeddings Surveyor (NoLiES) that combines a novel augmentation strategy for projected data (rangesets) with interactive analysis in a small multiples setting. Rangesets use a set-based visualization approach for binned attribute values that enable the user to quickly observe structure and detect outliers. We detail the link between algebraic topology and rangesets and demonstrate the utility of NoLiES in case studies with various challenges (complex attribute value distribution, many attributes, many data points) and a real-world application to understand latent features of matrix completion in thermodynamics.
|
21
|
Fujiwara T, Wei X, Zhao J, Ma KL. Interactive Dimensionality Reduction for Comparative Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:758-768. [PMID: 34591765 DOI: 10.1109/tvcg.2021.3114807] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Finding the similarities and differences between groups of datasets is a fundamental analysis task. For high-dimensional data, dimensionality reduction (DR) methods are often used to find the characteristics of each group. However, existing DR methods provide limited capability and flexibility for such comparative analysis as each method is designed only for a narrow analysis target, such as identifying factors that most differentiate groups. This paper presents an interactive DR framework where we integrate our new DR method, called ULCA (unified linear comparative analysis), with an interactive visual interface. ULCA unifies two DR schemes, discriminant analysis and contrastive learning, to support various comparative analysis tasks. To provide flexibility for comparative analysis, we develop an optimization algorithm that enables analysts to interactively refine ULCA results. Additionally, the interactive visualization interface facilitates interpretation and refinement of the ULCA results. We evaluate ULCA and the optimization algorithm to show their efficiency as well as present multiple case studies using real-world datasets to demonstrate the usefulness of this framework.
|
22
|
Castro SC, Quinan PS, Hosseinpour H, Padilla L. Examining Effort in 1D Uncertainty Communication Using Individual Differences in Working Memory and NASA-TLX. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:411-421. [PMID: 34587043 DOI: 10.1109/tvcg.2021.3114803] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
As uncertainty visualizations for general audiences become increasingly common, designers must understand the full impact of uncertainty communication techniques on viewers' decision processes. Prior work demonstrates mixed performance outcomes with respect to how individuals make decisions using various visual and textual depictions of uncertainty. Part of the inconsistency across findings may be due to an over-reliance on task accuracy, which cannot, on its own, provide a comprehensive understanding of how uncertainty visualization techniques support reasoning processes. In this work, we advance the debate surrounding the efficacy of modern 1D uncertainty visualizations by conducting converging quantitative and qualitative analyses of both the effort and strategies used by individuals when provided with quantile dotplots, density plots, interval plots, mean plots, and textual descriptions of uncertainty. We utilize two approaches for examining effort across uncertainty communication techniques: a measure of individual differences in working-memory capacity known as an operation span (OSPAN) task and self-reports of perceived workload via the NASA-TLX. The results reveal that both visualization methods and working-memory capacity impact participants' decisions. Specifically, quantile dotplots and density plots (i.e., distributional annotations) result in more accurate judgments than interval plots, textual descriptions of uncertainty, and mean plots (i.e., summary annotations). Additionally, participants' open-ended responses suggest that individuals viewing distributional annotations are more likely to employ a strategy that explicitly incorporates uncertainty into their judgments than those viewing summary annotations. When comparing quantile dotplots to density plots, this work finds that both methods are equally effective for low-working-memory individuals. However, for individuals with high-working-memory capacity, quantile dotplots evoke more accurate responses with less perceived effort. Given these results, we advocate for the inclusion of converging behavioral and subjective workload metrics in addition to accuracy performance to further disambiguate meaningful differences among visualization techniques.
|
23
|
Shavazipour B, López-Ibáñez M, Miettinen K. Visualizations for decision support in scenario-based multiobjective optimization. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.07.025] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
24
|
Alcaide D, Aerts J. Spanning Trees as Approximation of Data Structures. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3994-4008. [PMID: 32746253 DOI: 10.1109/tvcg.2020.2995465] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
The connections in a graph generate a structure that is independent of a coordinate system. This visual metaphor allows creating a more flexible representation of data than a two-dimensional scatterplot. In this article, we present STAD (Simplified Topological Abstraction of Data), a parameter-free dimensionality reduction method that projects high-dimensional data into a graph. STAD generates an abstract representation of high-dimensional data by giving each data point a location in a graph which preserves the approximate distances in the original high-dimensional space. The STAD graph is built upon the Minimum Spanning Tree (MST) to which new edges are added until the correlation between the distances from the graph and the original dataset is maximized. Additionally, STAD supports the inclusion of additional functions to focus the exploration and allow the analysis of data from new perspectives, emphasizing traits in data which otherwise would remain hidden. We demonstrate the effectiveness of our method by applying it to two real-world datasets: traffic density in Barcelona and temporal measurements of air quality in Castile and León in Spain.
|
25
|
Hägele D, Abdelaal M, Oguz OS, Toussaint M, Weiskopf D. Visual analytics for nonlinear programming in robot motion planning. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-021-00786-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Nonlinear programming is a complex methodology where a problem is mathematically expressed in terms of optimality while imposing constraints on feasibility. Such problems are formulated by humans and solved by optimization algorithms. We support domain experts in their challenging tasks of understanding and troubleshooting optimization runs of intricate and high-dimensional nonlinear programs through a visual analytics system. The system was designed for our collaborators’ robot motion planning problems, but is domain agnostic in most parts of the visualizations. It allows for an exploration of the iterative solving process of a nonlinear program through several linked views of the computational process. We give insights into this design study, demonstrate our system for selected real-world cases, and discuss the extension of visualization and visual analytics methods for nonlinear programming.
|
26
|
IXVC: An interactive pipeline for explaining visual clusters in dimensionality reduction visualizations with decision trees. ARRAY 2021. [DOI: 10.1016/j.array.2021.100080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
|
27
|
Rees D, Laramee RS, Brookes P, D'Cruze T, Smith GA, Miah A. AgentVis: Visual Analysis of Agent Behavior With Hierarchical Glyphs. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3626-3643. [PMID: 32305921 DOI: 10.1109/tvcg.2020.2985923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Glyphs representing complex behavior provide a useful and common means of visualizing multivariate data. However, due to their complex shape, overlap and occlusion of glyphs are a common and prominent limitation. This limits the number of discrete data tuples that can be displayed in a given image. Using a real-world application, glyphs are used to depict agent behavior in a call center. However, many call centers feature thousands of agents. A standard approach representing thousands of agents with glyphs does not scale. To accommodate the visualization incorporating thousands of glyphs we develop clustering of overlapping glyphs into a single parent glyph. This hierarchical glyph represents the mean value of all child agent glyphs, removing overlap and reducing visual clutter. Multi-variate clustering techniques are explored and developed in collaboration with domain experts in the call center industry. We implement dynamic control of glyph clusters according to zoom level and customized distance metrics, to utilize image space with reduced overplotting and cluttering. We demonstrate our technique with examples and a usage scenario using real-world call-center data to visualize thousands of call center agents, revealing insight into their behavior and reporting feedback from expert call-center analysts.
|
28
|
Dou H, Xu B, Shen F, Zhao J. V-SOINN: A topology preserving visualization method for multidimensional data. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.113] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
29
|
Quantitative and Qualitative Comparison of 2D and 3D Projection Techniques for High-Dimensional Data. INFORMATION 2021. [DOI: 10.3390/info12060239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Projections are well-known techniques that help the visual exploration of high-dimensional data by creating depictions thereof in a low-dimensional space. While projections that target the 2D space have been studied in detail both quantitatively and qualitatively, 3D projections are far less well understood, with authors arguing both for and against the added value of a third visual dimension. We fill this gap by first presenting a quantitative study that compares 2D and 3D projections along a rich selection of datasets, projection techniques, and quality metrics. To refine these insights, we conduct a qualitative study that compares the preference of users in exploring high-dimensional data using 2D vs. 3D projections, both without and with visual explanations. Our quantitative and qualitative findings indicate that, in general, 3D projections bring only limited added value on top of that provided by their 2D counterparts. However, certain 3D projection techniques can show more structure than their 2D counterparts, and can stimulate users to further exploration. All our datasets, source code, and measurements are made public for ease of replication and extension.
|
30
|
Garrison L, Muller J, Schreiber S, Oeltze-Jafra S, Hauser H, Bruckner S. DimLift: Interactive Hierarchical Data Exploration Through Dimensional Bundling. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:2908-2922. [PMID: 33544674 DOI: 10.1109/tvcg.2021.3057519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
The identification of interesting patterns and relationships is essential to exploratory data analysis. This becomes increasingly difficult in high dimensional datasets. While dimensionality reduction techniques can be utilized to reduce the analysis space, these may unintentionally bury key dimensions within a larger grouping and obfuscate meaningful patterns. With this work we introduce DimLift, a novel visual analysis method for creating and interacting with dimensional bundles. Generated through an iterative dimensionality reduction or user-driven approach, dimensional bundles are expressive groups of dimensions that contribute similarly to the variance of a dataset. Interactive exploration and reconstruction methods via a layered parallel coordinates plot allow users to lift interesting and subtle relationships to the surface, even in complex scenarios of missing and mixed data types. We exemplify the power of this technique in an expert case study on clinical cohort data alongside two additional case examples from nutrition and ecology.
|
31
|
Alcaide D, Aerts J. A visual analytic approach for the identification of ICU patient subpopulations using ICD diagnostic codes. PeerJ Comput Sci 2021; 7:e430. [PMID: 33954230 PMCID: PMC8049127 DOI: 10.7717/peerj-cs.430] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2020] [Accepted: 02/15/2021] [Indexed: 05/03/2023]
Abstract
A large number of clinical concepts are categorized under standardized formats that ease the manipulation, understanding, analysis, and exchange of information. One of the most widespread codifications is the International Classification of Diseases (ICD) used for characterizing diagnoses and clinical procedures. With formatted ICD concepts, a patient profile can be described through a set of standardized and sorted attributes according to the relevance or chronology of events. This structured data is fundamental to quantify the similarity between patients and detect relevant clinical characteristics. Data visualization tools allow the representation and comprehension of data patterns, usually of a high dimensional nature, where only a partial picture can be projected. In this paper, we provide a visual analytics approach for the identification of homogeneous patient cohorts by combining custom distance metrics with a flexible dimensionality reduction technique. First, we define a new metric to measure the similarity between diagnosis profiles through the concordance and relevance of events. Second, we describe a variation of the Simplified Topological Abstraction of Data (STAD) dimensionality reduction technique to enhance the projection of signals preserving the global structure of data. The MIMIC-III clinical database is used for implementing the analysis into an interactive dashboard, providing a highly expressive environment for the exploration and comparison of patient groups with at least one identical diagnostic ICD code. The combination of the distance metric and STAD not only allows the identification of patterns but also provides a new layer of information to establish additional relationships between patient cohorts. The method and tool presented here add a valuable new approach for exploring heterogeneous patient populations. In addition, the distance metric described can be applied in other domains that employ ordered lists of categorical data.
Affiliation(s)
- Daniel Alcaide: Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium
- Jan Aerts: Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium; UHasselt, I-BioStat, Data Science Institute, Hasselt, Belgium
|
32
|
Espadoto M, Martins RM, Kerren A, Hirata NST, Telea AC. Toward a Quantitative Survey of Dimension Reduction Techniques. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:2153-2173. [PMID: 31567092 DOI: 10.1109/tvcg.2019.2944182] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Dimensionality reduction methods, also known as projections, are frequently used in multidimensional data exploration in machine learning, data science, and information visualization. Tens of such techniques have been proposed, aiming to address a wide set of requirements, such as ability to show the high-dimensional data structure, distance or neighborhood preservation, computational scalability, stability to data noise and/or outliers, and practical ease of use. However, it is far from clear for practitioners how to choose the best technique for a given use context. We present a survey of a wide body of projection techniques that helps answer this question. For this, we characterize the input data space, projection techniques, and the quality of projections, by several quantitative metrics. We sample these three spaces according to these metrics, aiming at good coverage with bounded effort. We describe our measurements and outline observed dependencies of the measured variables. Based on these results, we draw several conclusions that help compare projection techniques, explain their results for different types of data, and ultimately help practitioners when choosing a projection for a given context. Our methodology, datasets, projection implementations, metrics, visualizations, and results are publicly open, so interested stakeholders can examine and/or extend this benchmark.
|
33
|
Reipschlager P, Flemisch T, Dachselt R. Personal Augmented Reality for Information Visualization on Large Interactive Displays. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1182-1192. [PMID: 33052863 DOI: 10.1109/tvcg.2020.3030460] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
In this work we propose the combination of large interactive displays with personal head-mounted Augmented Reality (AR) for information visualization to facilitate data exploration and analysis. Even though large displays provide more display space, they are challenging with regard to perception, effective multi-user support, and managing data density and complexity. To address these issues and illustrate our proposed setup, we contribute an extensive design space comprising first, the spatial alignment of display, visualizations, and objects in AR space. Next, we discuss which parts of a visualization can be augmented. Finally, we analyze how AR can be used to display personal views in order to show additional information and to minimize the mutual disturbance of data analysts. Based on this conceptual foundation, we present a number of exemplary techniques for extending visualizations with AR and discuss their relation to our design space. We further describe how these techniques address typical visualization problems that we have identified during our literature research. To examine our concepts, we introduce a generic AR visualization framework as well as a prototype implementing several example techniques. In order to demonstrate their potential, we further present a use case walkthrough in which we analyze a movie data set. From these experiences, we conclude that the contributed techniques can be useful in exploring and understanding multivariate data. We are convinced that the extension of large displays with AR for information visualization has a great potential for data analysis and sense-making.
|
34
|
Wang ZJ, Turko R, Shaikh O, Park H, Das N, Hohman F, Kahng M, Polo Chau DH. CNN Explainer: Learning Convolutional Neural Networks with Interactive Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1396-1406. [PMID: 33048723 DOI: 10.1109/tvcg.2020.3030418] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
Deep learning's great success motivates many practitioners and students to learn about this exciting technology. However, it is often challenging for beginners to take their first step due to the complexity of understanding and applying deep learning. We present CNN Explainer, an interactive visualization tool designed for non-experts to learn and examine convolutional neural networks (CNNs), a foundational deep learning model architecture. Our tool addresses key challenges that novices face while learning about CNNs, which we identify from interviews with instructors and a survey with past students. CNN Explainer tightly integrates a model overview that summarizes a CNN's structure, and on-demand, dynamic visual explanation views that help users understand the underlying components of CNNs. Through smooth transitions across levels of abstraction, our tool enables users to inspect the interplay between low-level mathematical operations and high-level model structures. A qualitative user study shows that CNN Explainer helps users more easily understand the inner workings of CNNs, and is engaging and enjoyable to use. We also derive design lessons from our study. Developed using modern web technologies, CNN Explainer runs locally in users' web browsers without the need for installation or specialized hardware, broadening the public's education access to modern deep learning techniques.
|
35
|
So W, Bogucka EP, Scepanovic S, Joglekar S, Zhou K, Quercia D. Humane Visual AI: Telling the Stories Behind a Medical Condition. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:678-688. [PMID: 33048711 DOI: 10.1109/tvcg.2020.3030391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
A biological understanding is key for managing medical conditions, yet psychological and social aspects matter too. The main problem is that these two aspects are hard to quantify and inherently difficult to communicate. To quantify psychological aspects, this work mined around half a million Reddit posts in the sub-communities specialised in 14 medical conditions, and it did so with a new deep-learning framework. In so doing, it was able to associate mentions of medical conditions with those of emotions. To then quantify social aspects, this work designed a probabilistic approach that mines open prescription data from the National Health Service in England to compute the prevalence of drug prescriptions, and to relate such a prevalence to census data. To finally visually communicate each medical condition's biological, psychological, and social aspects through storytelling, we designed a narrative-style layered Martini Glass visualization. In a user study involving 52 participants, after interacting with our visualization, a considerable number of them changed their mind on previously held opinions: 10% gave more importance to the psychological aspects of medical conditions, and 27% were more favourable to the use of social media data in healthcare, suggesting the importance of persuasive elements in interactive visualizations.
|
36
|
Fujiwara T, Sakamoto N, Nonaka J, Yamamoto K, Ma KL. A Visual Analytics Framework for Reviewing Multivariate Time-Series Data with Dimensionality Reduction. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1601-1611. [PMID: 33026990 DOI: 10.1109/tvcg.2020.3028889] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Data-driven problem solving in many real-world applications involves analysis of time-dependent multivariate data, for which dimensionality reduction (DR) methods are often used to uncover the intrinsic structure and features of the data. However, DR is usually applied to a subset of data that is either single-time-point multivariate or univariate time-series, resulting in the need to manually examine and correlate the DR results out of different data subsets. When the number of dimensions is large either in terms of the number of time points or attributes, this manual task becomes too tedious and infeasible. In this paper, we present MulTiDR, a new DR framework that enables processing of time-dependent multivariate data as a whole to provide a comprehensive overview of the data. With the framework, we employ DR in two steps. When treating the instances, time points, and attributes of the data as a 3D array, the first DR step reduces the three axes of the array to two, and the second DR step visualizes the data in a lower-dimensional space. In addition, by coupling with a contrastive learning method and interactive visualizations, our framework enhances analysts' ability to interpret DR results. We demonstrate the effectiveness of our framework with four case studies using real-world datasets.
|
37
|
Weng D, Zheng C, Deng Z, Ma M, Bao J, Zheng Y, Xu M, Wu Y. Towards Better Bus Networks: A Visual Analytics Approach. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:817-827. [PMID: 33048743 DOI: 10.1109/tvcg.2020.3030458] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Bus routes are typically updated every 3-5 years to meet constantly changing travel demands. However, identifying deficient bus routes and finding their optimal replacements remain challenging due to the difficulties in analyzing a complex bus network and the large solution space comprising alternative routes. Most of the automated approaches cannot produce satisfactory results in real-world settings without laborious inspection and evaluation of the candidates. The limitations observed in these approaches motivate us to collaborate with domain experts and propose a visual analytics solution for the performance analysis and incremental planning of bus routes based on an existing bus network. Developing such a solution involves three major challenges, namely, a) the in-depth analysis of complex bus route networks, b) the interactive generation of improved route candidates, and c) the effective evaluation of alternative bus routes. For challenge a, we employ an overview-to-detail approach by dividing the analysis of a complex bus network into three levels to facilitate the efficient identification of deficient routes. For challenge b, we improve a route generation model and interpret the performance of the generation with tailored visualizations. For challenge c, we incorporate a conflict resolution strategy in the progressive decision-making process to assist users in evaluating the alternative routes and finding the best one. The proposed system is evaluated with two usage scenarios based on real-world data and received positive feedback from the experts. Index Terms: bus route planning, spatial decision-making, urban data visual analytics.
|
38
|
Sabando MV, Ulbrich P, Selzer M, Byska J, Mican J, Ponzoni I, Soto AJ, Ganuza ML, Kozlikova B. ChemVA: Interactive Visual Analysis of Chemical Compound Similarity in Virtual Screening. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:891-901. [PMID: 33048734 DOI: 10.1109/tvcg.2020.3030438] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
In the modern drug discovery process, medicinal chemists deal with the complexity of analysis of large ensembles of candidate molecules. Computational tools, such as dimensionality reduction (DR) and classification, are commonly used to efficiently process the multidimensional space of features. These underlying calculations often hinder interpretability of results and prevent experts from assessing the impact of individual molecular features on the resulting representations. To provide a solution for scrutinizing such complex data, we introduce ChemVA, an interactive application for the visual exploration of large molecular ensembles and their features. Our tool consists of multiple coordinated views: Hexagonal view, Detail view, 3D view, Table view, and a newly proposed Difference view designed for the comparison of DR projections. These views display DR projections combined with biological activity, selected molecular features, and confidence scores for each of these projections. This conjunction of views allows the user to drill down through the dataset and to efficiently select candidate compounds. Our approach was evaluated on two case studies of finding structurally similar ligands with similar binding affinity to a target protein, as well as on an external qualitative evaluation. The results suggest that our system allows effective visual inspection and comparison of different high-dimensional molecular representations. Furthermore, ChemVA assists in the identification of candidate compounds while providing information on the certainty behind different molecular representations.
|
39
|
Ma Y, Maciejewski R. Visual Analysis of Class Separations With Locally Linear Segments. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:241-253. [PMID: 32746282 DOI: 10.1109/tvcg.2020.3011155] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
High-dimensional labeled data widely exists in many real-world applications such as classification and clustering. One main task in analyzing such datasets is to explore class separations and class boundaries derived from machine learning models. Dimension reduction techniques are commonly applied to support analysts in exploring the underlying decision boundary structures by depicting a low-dimensional representation of the data distributions from multiple classes. However, such projection-based analyses are limited due to their lack of ability to show separations in complex non-linear decision boundary structures and can suffer from heavy distortion and low interpretability. To overcome these issues of separability and interpretability, we propose a visual analysis approach that utilizes the power of explainability from linear projections to support analysts when exploring non-linear separation structures. Our approach is to extract a set of locally linear segments that approximate the original non-linear separations. Unlike traditional projection-based analysis where the data instances are mapped to a single scatterplot, our approach supports the exploration of complex class separations through multiple local projection results. We conduct case studies on two labeled datasets to demonstrate the effectiveness of our approach.
|
40
|
Patashnik O, Lu M, Bermano AH, Cohen-Or D. Temporal scatterplots. COMPUTATIONAL VISUAL MEDIA 2020; 6:385-400. [PMID: 33194253 PMCID: PMC7648217 DOI: 10.1007/s41095-020-0197-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Accepted: 09/12/2020] [Indexed: 06/11/2023]
Abstract
Visualizing high-dimensional data on a 2D canvas is generally challenging. It becomes significantly more difficult when multiple time-steps are to be presented, as the visual clutter quickly increases. Moreover, the challenge to perceive the significant temporal evolution is even greater. In this paper, we present a method to plot temporal high-dimensional data in a static scatterplot; it uses the established PCA technique to project data from multiple time-steps. The key idea is to extend each individual displacement prior to applying PCA, so as to skew the projection process, and to set a projection plane that balances the directions of temporal change and spatial variance. We present numerous examples and various visual cues to highlight the data trajectories, and demonstrate the effectiveness of the method for visualizing temporal data.
Affiliation(s)
- Min Lu
  - Shenzhen University, Shenzhen, China
|
41
|
Track Iran's national COVID-19 response committee's major concerns using two-stage unsupervised topic modeling. Int J Med Inform 2020; 145:104309. [PMID: 33181447 PMCID: PMC7609243 DOI: 10.1016/j.ijmedinf.2020.104309] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2020] [Revised: 09/22/2020] [Accepted: 10/20/2020] [Indexed: 12/20/2022]
Abstract
Background: Since the World Health Organization (WHO) declared COVID-19 a Public Health Emergency of International Concern (PHEIC) on January 31, 2020, governments have faced pressure to respond in a timely manner. The efficacy of these responses depends directly on the social behaviors of the target society. People react to these actions according to the information they receive from different channels, such as news and social networks. Thus, analyzing news gives a brief view of the information users received during the outbreak. Methods: The raw data used in this study were collected from the official Telegram channels of news wires and agencies and exceed 2,400,000 posts. The posts that are quoted by NCRC members are collected, cleaned, and divided into sentences. Topic modeling and tracking are used in a two-stage framework, customized for this problem, to separate miscellaneous sentences from those presenting concerns. The first stage is fed with embedding vectors of the sentences, which are grouped by the Mapper algorithm. Sentences belonging to singleton nodes are labeled as miscellaneous. In the second stage, the remaining sentences are vectorized using the TF-IDF weighting scheme and topically modeled with the LDA method. Finally, relevant topics are aligned to the list of policies and actions, named topic themes, set up by the NCRC. Results: Our results show that the major concerns, presented in about half of the sentences, are (1) PCR lab tests, diagnosis, and screening, (2) closure of the education system, and (3) awareness actions about hand washing and face mask usage. Among the eight themes, intra-provincial travel and traffic restrictions, as well as briefings on the national and provincial status, are under-represented. The timeline of concerns, annotated with the preventive actions, illustrates the changes in concerns addressed by the NCRC. This timeline shows that although the announcements and public responses do not lag behind the events, they cannot be considered timely. Furthermore, the fluctuating series of concerns reveals that the NCRC does not have a long-term response map, and members react to the closest announced policy or act. Conclusion: The results of our study can be used as a quantitative indicator for evaluating the availability of an on-time public response of Iran’s NCRC during the first three months of the outbreak. Moreover, they can be used in comparative studies to investigate the differences between awareness acts in various countries. The results of our custom-designed framework showed that about one-third of the discussions of the NCRC’s members cover miscellaneous topics that must be removed from the data.
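The TF-IDF weighting used in the second stage of the abstract above can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant the authors used, so the formula here (relative term frequency times log(N / document frequency)) is an assumption, as are the function name `tf_idf` and the toy sentences. The Mapper grouping and LDA steps are not shown.

```python
import math
from collections import Counter

def tf_idf(sentences):
    """Return one {term: weight} dict per tokenized sentence (assumed variant)."""
    n = len(sentences)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for sent in sentences for term in set(sent))
    weights = []
    for sent in sentences:
        counts = Counter(sent)
        total = len(sent)
        # Relative term frequency scaled by inverse document frequency.
        weights.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already tokenized sentences (not data from the study).
docs = [
    ["school", "closure", "announced"],
    ["mask", "usage", "announced"],
    ["pcr", "test", "announced"],
]
vectors = tf_idf(docs)
```

A term that occurs in every sentence, such as "announced" here, receives weight zero, while terms specific to one sentence receive positive weight. That is the property that makes the weighting useful before topic modeling.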
|
42
|
Ränger LM, von Kurnatowski M, Bortz M, Grützner T. Multi-Objective Optimization of Dividing Wall Columns and Visualization of the High-Dimensional Results. Comput Chem Eng 2020. [DOI: 10.1016/j.compchemeng.2020.107059] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
43
|
Araújo AFR, Antonino VO, Ponce-Guevara KL. Self-organizing subspace clustering for high-dimensional and multi-view data. Neural Netw 2020; 130:253-268. [PMID: 32711348 DOI: 10.1016/j.neunet.2020.06.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2019] [Revised: 04/30/2020] [Accepted: 06/28/2020] [Indexed: 12/14/2022]
Abstract
A surge in the availability of data from multiple sources and modalities is correlated with advances in how to obtain, compress, store, transfer, and process large amounts of complex high-dimensional data. The clustering challenge increases with the growth of data dimensionality, which decreases the discriminative power of the distance metrics. Subspace clustering aims to group data drawn from a union of subspaces. There is a large number of state-of-the-art approaches, which we divide into families according to the method used for clustering. We introduce a soft subspace clustering algorithm, a Self-organizing Map (SOM) with a time-varying structure, to cluster data without any prior knowledge of the number of categories or of the neural network topology, both determined during the training process. The model also assigns proper relevancies (weights) to different dimensions, capturing from the learning process the influence of each dimension on uncovering clusters. We employ a number of real-world datasets to validate the model. This algorithm presents a competitive performance in a diverse range of contexts, among them data mining, gene expression, multi-view, computer vision, and text clustering problems, which include high-dimensional data. Extensive experiments suggest that our method very often outperforms the state-of-the-art approaches in all types of problems considered.
Affiliation(s)
- Aluizio F R Araújo
  - Centro de Informática, Universidade Federal de Pernambuco, 50740560, Recife, Brazil
- Victor O Antonino
  - Centro de Informática, Universidade Federal de Pernambuco, 50740560, Recife, Brazil
|
44
|
Affiliation(s)
- Priyanga Dilini Talagala
  - Department of Econometrics and Business Statistics, Monash University, Clayton, VIC, Australia
  - ARC Centre of Excellence for Mathematics and Statistical Frontiers (ACEMS), University of Melbourne, Parkville, VIC, Australia
  - Department of Computational Mathematics, University of Moratuwa, Moratuwa, Sri Lanka
- Rob J. Hyndman
  - Department of Econometrics and Business Statistics, Monash University, Clayton, VIC, Australia
  - ARC Centre of Excellence for Mathematics and Statistical Frontiers (ACEMS), University of Melbourne, Parkville, VIC, Australia
- Kate Smith-Miles
  - School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, Australia
  - ARC Centre of Excellence for Mathematics and Statistical Frontiers (ACEMS), University of Melbourne, Parkville, VIC, Australia
|
45
|
Salekin S, Mostavi M, Chiu YC, Chen Y, Zhang J(M), Huang Y. Predicting sites of epitranscriptome modifications using unsupervised representation learning based on generative adversarial networks. FRONTIERS IN PHYSICS 2020; 8:196. [PMID: 33274189 PMCID: PMC7710330 DOI: 10.3389/fphy.2020.00196] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
The epitranscriptome is an exciting area of study concerned with different types of modifications in transcripts, and the prediction of such modification sites from the transcript sequence is of significant interest. However, the scarcity of positive sites for most modifications imposes critical challenges for training robust algorithms. To circumvent this problem, we propose MR-GAN, a generative adversarial network (GAN) based model, which is trained in an unsupervised fashion on the entire pre-mRNA sequences to learn a low-dimensional embedding of transcriptomic sequences. MR-GAN was then applied to extract embeddings of the sequences in a training dataset we created for eight epitranscriptome modifications, including m6A, m1A, m1G, m2G, m5C, m5U, 2'-O-Me, Pseudouridine (Ψ) and Dihydrouridine (D), of which the positive samples are very limited. Prediction models were trained based on the embeddings extracted by MR-GAN. We compared the prediction performance with the one-hot encoding of the training sequences and SRAMP, a state-of-the-art m6A site prediction algorithm, and demonstrated that the learned embeddings outperform one-hot encoding by a significant margin, with up to 15% improvement. Using MR-GAN, we also investigated the sequence motifs for each modification type and uncovered known motifs as well as new motifs that could not be found from the sequences directly. The results demonstrated that transcriptome features extracted using unsupervised learning could lead to high precision for predicting multiple types of epitranscriptome modifications, even when the data size is small and extremely imbalanced.
Affiliation(s)
- Sirajul Salekin
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, TX, 78207, USA
- Milad Mostavi
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, TX, 78207, USA
- Yu-Chiao Chiu
  - Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Yidong Chen
  - Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Jianqiu (Michelle) Zhang
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, TX, 78207, USA
- Yufei Huang
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, TX, 78207, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
|
46
|
Kammer D, Keck M, Grunder T, Maasch A, Thom T, Kleinsteuber M, Groh R. Glyphboard: Visual Exploration of High-Dimensional Data Combining Glyphs with Dimensionality Reduction. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1661-1671. [PMID: 31985425 DOI: 10.1109/tvcg.2020.2969060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Rigorous data science is interdisciplinary at its core. In order to make sense of high-dimensional data, data scientists need to enter into a dialogue with domain experts. We present Glyphboard, a visualization tool that aims to support this dialogue. Glyphboard is a zoomable user interface that combines well-known methods such as dimensionality reduction and glyph-based visualizations in a novel, seamless, and integrated tool. While the dimensionality reduction affords a quick overview over the data, glyph-based visualizations are able to show the most relevant dimensions in the data set at one glance. We contribute an open-source prototype of Glyphboard, a general exchange format for high-dimensional data, and a case study with nine data scientists and domain experts from four exemplary domains in order to evaluate how the different visualization and interaction features of Glyphboard are used.
|
47
|
Suh J, Ghorashi S, Ramos G, Chen NC, Drucker S, Verwey J, Simard P. AnchorViz. ACM T INTERACT INTEL 2020. [DOI: 10.1145/3241379] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
Abstract
When building a classifier in interactive machine learning (iML), human knowledge about the target class can be a powerful reference to make the classifier robust to unseen items. The main challenge lies in finding unlabeled items that can either help discover or refine concepts for which the current classifier has no corresponding features (i.e., it has feature blindness). Yet it is unrealistic to ask humans to come up with an exhaustive list of items, especially for rare concepts that are hard to recall. This article presents AnchorViz, an interactive visualization that facilitates the discovery of prediction errors and previously unseen concepts through human-driven semantic data exploration. By creating example-based or dictionary-based anchors representing concepts, users create a topology that (a) spreads data based on their similarity to the concepts and (b) surfaces the prediction and label inconsistencies between data points that are semantically related. Once such inconsistencies and errors are discovered, users can encode the new information as labels or features and interact with the retrained classifier to validate their actions in an iterative loop. We evaluated AnchorViz through two user studies. Our results show that AnchorViz helps users discover more prediction errors than stratified random and uncertainty sampling methods. Furthermore, during the beginning stages of a training task, an iML tool with AnchorViz can help users build classifiers comparable to the ones built with the same tool with uncertainty sampling and keyword search, but with fewer labels and more generalizable features. We discuss exploration strategies observed during the two studies and how AnchorViz supports discovering, labeling, and refining of concepts through a sensemaking loop.
Affiliation(s)
- Jina Suh
  - Microsoft Research, Redmond, Washington
|
48
|
Does design matter when visualizing Big Data? An empirical study to investigate the effect of visualization type and interaction use. JOURNAL OF MANAGEMENT CONTROL 2020. [DOI: 10.1007/s00187-020-00294-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
The need for good visualization is increasing as data volume and complexity expand. In order to work with high volumes of structured and unstructured data, visualizations that support the ability of humans to make perceptual inferences are of the utmost importance. In this regard, many interactive visualization techniques have been developed in recent years. However, little emphasis has been placed on the evaluation of their usability and, in particular, on design characteristics. This paper contributes to closing this research gap by measuring the effects of appropriate visualization use based on data and task characteristics. Further, we specifically test the feature of interaction, as it has been said to be an essential component of Big Data visualizations but has scarcely been isolated as an independent variable in experimental research. Data collection for the large-scale quantitative experiment was done using crowdsourcing (Amazon Mechanical Turk). The results indicate that both choosing an appropriate visualization based on task characteristics and using the feature of interaction increase usability considerably.
|
49
|
Kumeno F. Software engineering challenges for machine learning applications: A literature review. INTELLIGENT DECISION TECHNOLOGIES 2020. [DOI: 10.3233/idt-190160] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
50
|
Fujiwara T, Kwon OH, Ma KL. Supporting Analysis of Dimensionality Reduction Results with Contrastive Learning. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:45-55. [PMID: 31425080 DOI: 10.1109/tvcg.2019.2934251] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Dimensionality reduction (DR) is frequently used for analyzing and visualizing high-dimensional data as it provides a good first glance of the data. However, to interpret the DR result for gaining useful insights from the data, it would take additional analysis effort such as identifying clusters and understanding their characteristics. While there are many automatic methods (e.g., density-based clustering methods) to identify clusters, effective methods for understanding a cluster's characteristics are still lacking. A cluster can be mostly characterized by its distribution of feature values. Reviewing the original feature values is not a straightforward task when the number of features is large. To address this challenge, we present a visual analytics method that effectively highlights the essential features of a cluster in a DR result. To extract the essential features, we introduce an enhanced usage of contrastive principal component analysis (cPCA). Our method, called ccPCA (contrasting clusters in PCA), can calculate each feature's relative contribution to the contrast between one cluster and other clusters. With ccPCA, we have created an interactive system including a scalable visualization of clusters' feature contributions. We demonstrate the effectiveness of our method and system with case studies using several publicly available datasets.
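As a rough illustration of the contrastive PCA (cPCA) idea that the abstract above builds on, the sketch below finds directions along which a target group varies strongly while a background group does not. It shows plain cPCA only, not the authors' ccPCA. The function name `contrastive_pca`, the contrast parameter `alpha`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def contrastive_pca(target, background, alpha=1.0, n_components=1):
    """Return directions with high target variance and low background variance."""
    # Covariance matrices (rows are samples, columns are features).
    c_target = np.cov(target, rowvar=False)
    c_background = np.cov(background, rowvar=False)
    # cPCA: eigen-decompose the contrast matrix C_target - alpha * C_background.
    eigvals, eigvecs = np.linalg.eigh(c_target - alpha * c_background)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]

rng = np.random.default_rng(0)
# Hypothetical data: feature 0 varies a lot in both groups,
# feature 1 varies mostly in the target group.
target = rng.normal(0.0, [3.0, 2.0], size=(2000, 2))
background = rng.normal(0.0, [3.0, 0.5], size=(2000, 2))
direction = contrastive_pca(target, background)[:, 0]
```

In this toy setup ordinary PCA on the target alone would favor feature 0, which has the larger variance, whereas the contrastive direction is dominated by feature 1. The absolute loadings of such a direction are one simple way to read off which features distinguish a group, which is the kind of feature contribution the abstract refers to.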
|