1. Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H. UpSet: Visualization of Intersecting Sets. IEEE Transactions on Visualization and Computer Graphics 2014; 20:1983-92. [PMID: 26356912] [PMCID: PMC4720993] [DOI: 10.1109/tvcg.2014.2346248] Cited in RCA: 1192.
Abstract
Understanding relationships between sets is an important analysis task that has received widespread attention in the visualization community. The major challenge in this context is the combinatorial explosion of the number of set intersections if the number of sets exceeds a trivial threshold. In this paper we introduce UpSet, a novel visualization technique for the quantitative analysis of sets, their intersections, and aggregates of intersections. UpSet is focused on creating task-driven aggregates, communicating the size and properties of aggregates and intersections, and a duality between the visualization of the elements in a dataset and their set membership. UpSet visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries. The matrix layout enables the effective representation of associated data, such as the number of elements in the aggregates and intersections, as well as additional summary statistics derived from subset or element attributes. Sorting according to various measures enables a task-driven analysis of relevant intersections and aggregates. The elements represented in the sets and their associated attributes are visualized in a separate view. Queries based on containment in specific intersections, aggregates or driven by attribute filters are propagated between both views. We also introduce several advanced visual encodings and interaction methods to overcome the problems of varying scales and to address scalability. UpSet is web-based and open source. We demonstrate its general utility in multiple use cases from various domains.
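For context, the quantities behind the rows of an UpSet matrix are exclusive intersection sizes. A minimal Python sketch (set names and data invented, not UpSet's implementation) computes them directly; the loop over all 2^n - 1 combinations is exactly the combinatorial explosion the abstract refers to:

```python
from itertools import combinations

def exclusive_intersections(sets):
    """Size of each exclusive intersection: elements belonging to
    exactly the named sets and to none of the others -- one value
    per row of an UpSet matrix."""
    names = list(sets)
    sizes = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            sizes[combo] = len(inside - outside)
    return sizes

genres = {"comedy": {1, 2, 3, 6}, "drama": {2, 3, 4}, "action": {3, 5, 6}}
for combo, size in sorted(exclusive_intersections(genres).items(),
                          key=lambda kv: -kv[1]):
    print(combo, size)
```

Sorting the resulting dictionary by size mirrors the tool's task-driven sorting of intersections.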
2. Kerpedjiev P, Abdennur N, Lekschas F, McCallum C, Dinkla K, Strobelt H, Luber JM, Ouellette SB, Azhir A, Kumar N, Hwang J, Lee S, Alver BH, Pfister H, Mirny LA, Park PJ, Gehlenborg N. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol 2018; 19:125. [PMID: 30143029] [PMCID: PMC6109259] [DOI: 10.1186/s13059-018-1486-1] Cited in RCA: 1084.
Abstract
We present HiGlass, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others. We demonstrate its utility in exploring different experimental conditions, comparing the results of analyses, and creating interactive snapshots to share with collaborators and the broader public. HiGlass is accessible online at http://higlass.io and is also available as a containerized application that can be run on any platform.
3. Schwaller P, Hoover B, Reymond JL, Strobelt H, Laino T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Science Advances 2021; 7:eabe4166. [PMID: 33827815] [PMCID: PMC8026122] [DOI: 10.1126/sciadv.abe4166] Cited in RCA: 95.
Abstract
Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of "reaction rules" from knowing how atoms rearrange during a chemical transformation, a process called atom-mapping. Atom-mapping is a laborious experimental task and, when tackled with computational methods, requires continuous annotation of chemical reactions and the extension of logically consistent directives. Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks.
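To make the attention-guided mapping idea concrete, here is a hedged numpy sketch: given a product-to-reactant attention matrix (a toy stand-in below; the paper derives it from Transformer heads, and its actual scheme is neighbor-aware rather than purely greedy), atoms can be paired by repeatedly taking the strongest remaining weight:

```python
import numpy as np

def map_atoms(attention):
    """Greedy atom-mapping from a product->reactant attention matrix:
    pick the highest remaining weight, then block its row and column
    so every atom is mapped exactly once."""
    att = attention.astype(float).copy()
    mapping = []
    for _ in range(min(att.shape)):
        p, r = np.unravel_index(np.argmax(att), att.shape)
        mapping.append((int(p), int(r)))
        att[p, :] = -np.inf
        att[:, r] = -np.inf
    return sorted(mapping)

# Toy 3x3 attention between 3 product atoms and 3 reactant atoms.
toy = np.array([[0.1, 0.8, 0.1],
                [0.7, 0.2, 0.1],
                [0.2, 0.1, 0.7]])
print(map_atoms(toy))   # [(0, 1), (1, 0), (2, 2)]
```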
4. Strobelt H, Gehrmann S, Pfister H, Rush AM. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. IEEE Transactions on Visualization and Computer Graphics 2018; 24:667-676. [PMID: 28866526] [DOI: 10.1109/tvcg.2017.2744158] Cited in RCA: 71.
Abstract
Recurrent neural networks, and in particular long short-term memory (LSTM) networks, are a remarkably effective tool for sequence modeling that learn a dense black-box hidden representation of their sequential input. Researchers interested in better understanding these models have studied the changes in hidden state representations over time and noticed some interpretable patterns but also significant noise. In this work, we present LSTMVis, a visual analysis tool for recurrent neural networks with a focus on understanding these hidden state dynamics. The tool allows users to select a hypothesis input range to focus on local state changes, to match these state changes to similar patterns in a large data set, and to align these results with structural annotations from their domain. We show several use cases of the tool for analyzing specific hidden state properties on datasets containing nesting, phrase structure, and chord progressions, and demonstrate how the tool can be used to isolate patterns for further statistical analysis. We characterize the domain, the different stakeholders, and their goals and tasks. Long-term usage data after putting the tool online revealed great interest in the machine learning community.
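The matching step can be approximated in a few lines of numpy. This is a simplified sketch of the idea only (the threshold and array shapes are assumptions, not the tool's exact criterion): dimensions that stay "on" throughout the selected range are searched for elsewhere in the corpus.

```python
import numpy as np

def match_hidden_pattern(states, start, end, threshold=0.3):
    """states: (T, D) hidden-state matrix over a corpus of length T.
    Find dimensions that stay above `threshold` throughout the
    user-selected range [start, end), then return every position
    whose equally long window keeps those same dimensions on."""
    span = end - start
    active = np.all(states[start:end] > threshold, axis=0)   # (D,) bool
    hits = []
    for t in range(states.shape[0] - span + 1):
        window = np.all(states[t:t + span] > threshold, axis=0)
        if np.array_equal(window & active, active):
            hits.append(t)
    return hits, np.flatnonzero(active)

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 0.0, size=(60, 8))   # everything "off"
states[10:14, 3] = states[40:44, 3] = 0.9       # dim 3 "on" in two spans
hits, dims = match_hidden_pattern(states, 10, 14)
print(dims, hits)   # [3] and positions 10 and 40
```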
5. Bau D, Zhu JY, Strobelt H, Lapedriza A, Zhou B, Torralba A. Understanding the role of individual units in a deep neural network. Proc Natl Acad Sci U S A 2020; 117:30071-30078. [PMID: 32873639] [PMCID: PMC7720226] [DOI: 10.1073/pnas.1907375117] Cited in RCA: 69.
Abstract
Deep neural networks excel at finding hierarchical representations that solve complex tasks over large datasets. How can we humans understand these learned representations? In this work, we present network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks. First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts. We find evidence that the network has learned many object classes that play crucial roles in classifying scene classes. Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes. By analyzing changes made when small sets of units are activated or deactivated, we find that objects can be added and removed from the output scenes while adapting to the context. Finally, we apply our analytic framework to understanding adversarial attacks and to semantic image editing.
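At the heart of network dissection is an intersection-over-union score between a unit's thresholded activation map and a concept's segmentation mask. A minimal sketch follows (the paper selects the threshold from per-unit activation quantiles, which is omitted here; the arrays are synthetic):

```python
import numpy as np

def unit_concept_iou(activation, concept_mask, threshold):
    """Intersection-over-union between the binarized activation map
    of one unit and a binary segmentation mask for one concept."""
    unit_mask = activation > threshold
    inter = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return inter / union if union else 0.0

rng = np.random.default_rng(1)
act = rng.normal(size=(112, 112))          # upsampled unit activations
tree = np.zeros((112, 112), dtype=bool)    # "tree" pixels in this image
tree[30:80, 20:60] = True
print(round(unit_concept_iou(act, tree, threshold=1.0), 3))
```

Averaging this score over a labeled image set and keeping the best-matching concept per unit yields the kind of unit-to-concept assignment the abstract describes.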
6. Hakenberg J, Plake C, Royer L, Strobelt H, Leser U, Schroeder M. Gene mention normalization and interaction extraction with context models and sentence motifs. Genome Biol 2008; 9 Suppl 2:S14. [PMID: 18834492] [PMCID: PMC2559985] [DOI: 10.1186/gb-2008-9-s2-s14] Cited in RCA: 37.
Abstract
BACKGROUND: The goal of text mining is to make the information conveyed in scientific publications accessible to structured search and automatic analysis. Two important subtasks of text mining are entity mention normalization - to identify biomedical objects in text - and extraction of qualified relationships between those objects. We describe a method for identifying genes and relationships between proteins. RESULTS: We present solutions to gene mention normalization and extraction of protein-protein interactions. For the first task, we identify genes by using background knowledge on each gene, namely annotations related to function, location, disease, and so on. Our approach currently achieves an f-measure of 86.4% on the BioCreative II gene normalization data. For the extraction of protein-protein interactions, we pursue an approach that builds on classical sequence analysis: motifs derived from multiple sequence alignments. The method achieves an f-measure of 24.4% (micro-average) in the BioCreative II interaction pair subtask. CONCLUSION: For gene mention normalization, our approach outperforms strategies that only match gene names against dictionaries without invoking further knowledge on each gene. Motifs derived from alignments of sentences are successful at identifying protein interactions in text; the approach we present in this report is fully automated and performs similarly to systems that require human intervention at one or more stages. AVAILABILITY: Our methods for gene, protein, and species identification, and for extraction of protein-protein interactions, are available as part of the BioCreative Meta Services (BCMS); see http://bcms.bioinfo.cnio.es/.
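For reference, the f-measure cited above is the harmonic mean of precision and recall. A minimal computation (the counts below are invented for illustration and merely chosen to land near the paper's 86.4%):

```python
def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical gene-normalization counts.
print(round(f_measure(tp=640, fp=90, fn=111), 3))   # 0.864
```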
7. Strobelt H, Gehrmann S, Behrisch M, Perer A, Pfister H, Rush AM. Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. IEEE Transactions on Visualization and Computer Graphics 2018; 25:353-363. [PMID: 30334796] [DOI: 10.1109/tvcg.2018.2865044] Cited in RCA: 37.
Abstract
Neural sequence-to-sequence models have proven to be accurate and robust for many sequence prediction tasks, and have become the standard approach for automatic translation of text. The models work with a five-stage blackbox pipeline that begins with encoding a source sequence to a vector space and then decoding out to a new target sequence. This process is now standard, but like many deep learning methods remains quite difficult to understand or debug. In this work, we present a visual analysis tool that allows interaction and "what if"-style exploration of trained sequence-to-sequence models through each stage of the translation process. The aim is to identify which patterns have been learned, to detect model errors, and to probe the model with counterfactual scenarios. We demonstrate the utility of our tool through several real-world sequence-to-sequence use cases on large-scale models.
8. Al-Awami AK, Beyer J, Strobelt H, Kasthuri N, Lichtman JW, Pfister H, Hadwiger M. NeuroLines: A Subway Map Metaphor for Visualizing Nanoscale Neuronal Connectivity. IEEE Transactions on Visualization and Computer Graphics 2014; 20:2369-2378. [PMID: 26356951] [DOI: 10.1109/tvcg.2014.2346312] Cited in RCA: 14.
Abstract
We present NeuroLines, a novel visualization technique designed for scalable detailed analysis of neuronal connectivity at the nanoscale level. The topology of 3D brain tissue data is abstracted into a multi-scale, relative distance-preserving subway map visualization that allows domain scientists to conduct an interactive analysis of neurons and their connectivity. Nanoscale connectomics aims at reverse-engineering the wiring of the brain. Reconstructing and analyzing the detailed connectivity of neurons and neurites (axons, dendrites) will be crucial for understanding the brain and its development and diseases. However, the enormous scale and complexity of nanoscale neuronal connectivity pose big challenges to existing visualization techniques in terms of scalability. NeuroLines offers a scalable visualization framework that can interactively render thousands of neurites, and that supports the detailed analysis of neuronal structures and their connectivity. We describe and analyze the design of NeuroLines based on two real-world use-cases of our collaborators in developmental neuroscience, and investigate its scalability to large-scale neuronal connectivity data.
9. Strobelt H, Bertini E, Braun J, Deussen O, Groth U, Mayer TU, Merhof D. HiTSEE KNIME: a visualization tool for hit selection and analysis in high-throughput screening experiments for the KNIME platform. BMC Bioinformatics 2012; 13 Suppl 8:S4. [PMID: 22607449] [PMCID: PMC3355333] [DOI: 10.1186/1471-2105-13-s8-s4] Cited in RCA: 14.
Abstract
We present HiTSEE (High-Throughput Screening Exploration Environment), a visualization tool for the analysis of large chemical screens used to examine biochemical processes. The tool supports the investigation of structure-activity relationships (SAR analysis) and, through a flexible interaction mechanism, the navigation of large chemical spaces. Our approach is based on the projection of one or a few molecules of interest and the expansion around their neighborhood, and allows for the exploration of large chemical libraries without the need to create an all-encompassing overview of the whole library. We describe the requirements we collected during our collaboration with biologists and chemists, the design rationale behind the tool, and two case studies on different datasets. The described integration (HiTSEE KNIME) into the KNIME platform allows additional flexibility in adapting our approach to a wide range of different biochemical problems and enables other research groups to use HiTSEE.
10. Carpendale S, Chen M, Evanko D, Gehlenborg N, Gorg C, Hunter L, Rowland F, Storey MA, Strobelt H. Ontologies in biological data visualization. IEEE Computer Graphics and Applications 2014; 34:8-15. [PMID: 24808195] [DOI: 10.1109/mcg.2014.33] Cited in RCA: 10.
Abstract
In computer science, an ontology is essentially a graph-based knowledge representation in which each node corresponds to a concept and each edge specifies a relation between two concepts. Ontological development in biology can serve as a focus to discuss the challenges and possible research directions for ontologies in visualization. The principle challenges are the dynamic and evolving nature of ontologies, the ever-present issue of scale, the diversity and richness of the relationships in ontologies, and the need to better understand the relationship between ontologies and the data analysis tasks scientists wish to support. Research directions include visualizing ontologies; visualizing semantically or ontologically annotated texts, documents, and corpora; automated generation of visualizations using ontologies; and visualizing ontological context to support search. Although this discussion uses issues of ontologies in biological data visualization as a springboard, these topics are of general relevance to visualization.
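To make the graph framing concrete, a toy sketch of an ontology as a labeled directed graph (all concept and relation names here are illustrative only, not drawn from any real ontology):

```python
# node = concept; edge = (relation, target concept)
ontology = {
    "mitochondrion": [("is_a", "organelle"), ("part_of", "cell")],
    "organelle":     [("part_of", "cell")],
    "cell":          [],
}

def related(concept, relation):
    """Follow one relation type transitively from a concept."""
    out, stack = set(), [concept]
    while stack:
        for rel, tgt in ontology.get(stack.pop(), []):
            if rel == relation and tgt not in out:
                out.add(tgt)
                stack.append(tgt)
    return out

print(related("mitochondrion", "part_of"))   # {'cell'}
```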
11. Strobelt H, Oelke D, Kwon BC, Schreck T, Pfister H. Guidelines for Effective Usage of Text Highlighting Techniques. IEEE Transactions on Visualization and Computer Graphics 2016; 22:489-498. [PMID: 26529715] [DOI: 10.1109/tvcg.2015.2467759] Cited in RCA: 9.
Abstract
Semi-automatic text analysis involves manual inspection of text. Often, different text annotations (like part-of-speech or named entities) are indicated by using distinctive text highlighting techniques. In typesetting there exist well-known formatting conventions, such as bold typeface, italics, or background coloring, that are useful for highlighting certain parts of a given text. Also, many advanced techniques for visualization and highlighting of text exist; yet, standard typesetting is common, and the effects of standard typesetting on the perception of text are not fully understood. As such, we surveyed and tested the effectiveness of common text highlighting techniques, both individually and in combination, to discover how to maximize pop-out effects while minimizing visual interference between techniques. To validate our findings, we conducted a series of crowdsourced experiments to determine: i) a ranking of nine commonly-used text highlighting techniques; ii) the degree of visual interference between pairs of text highlighting techniques; iii) the effectiveness of techniques for visual conjunctive search. Our results show that increasing font size works best as a single highlighting technique, and that there are significant visual interferences between some pairs of highlighting techniques. We discuss the pros and cons of different combinations as a design guideline to choose text highlighting techniques for text viewers.
12. Strobelt H, Alsallakh B, Botros J, Peterson B, Borowsky M, Pfister H, Lex A. Vials: Visualizing Alternative Splicing of Genes. IEEE Transactions on Visualization and Computer Graphics 2016; 22:399-408. [PMID: 26529712] [PMCID: PMC4720991] [DOI: 10.1109/tvcg.2015.2467911] Cited in RCA: 8.
Abstract
Alternative splicing is a process by which the same DNA sequence is used to assemble different proteins, called protein isoforms. Alternative splicing works by selectively omitting some of the coding regions (exons) typically associated with a gene. Detection of alternative splicing is difficult and uses a combination of advanced data acquisition methods and statistical inference. Knowledge about the abundance of isoforms is important for understanding both normal processes and diseases and to eventually improve treatment through targeted therapies. The data, however, is complex and current visualizations for isoforms are neither perceptually efficient nor scalable. To remedy this, we developed Vials, a novel visual analysis tool that enables analysts to explore the various datasets that scientists use to make judgments about isoforms: the abundance of reads associated with the coding regions of the gene, evidence for junctions, i.e., edges connecting the coding regions, and predictions of isoform frequencies. Vials is scalable as it allows for the simultaneous analysis of many samples in multiple groups. Our tool thus enables experts to (a) identify patterns of isoform abundance in groups of samples and (b) evaluate the quality of the data. We demonstrate the value of our tool in case studies using publicly available datasets.
13. Strobelt H, Oelke D, Rohrdantz C, Stoffel A, Keim DA, Deussen O. Document cards: a top trumps visualization for documents. IEEE Transactions on Visualization and Computer Graphics 2009; 15:1145-1152. [PMID: 19834183] [DOI: 10.1109/tvcg.2009.139] Cited in RCA: 8.
Abstract
Finding suitable, less space-consuming views for a document's main content is crucial to provide convenient access to large document collections on display devices of different size. We present a novel compact visualization which represents the document's key semantics as a mixture of images and important key terms, similar to cards in a top trumps game. The key terms are extracted using an advanced text mining approach based on a fully automatic document structure extraction. The images and their captions are extracted using a graphical heuristic, and the captions are used for a semi-semantic image weighting. Furthermore, we use the image color histogram for classification and show at least one representative from each non-empty image class. The approach is demonstrated for the IEEE InfoVis publications of a complete year. The method can easily be applied to other publication collections and sets of documents which contain images.
14. Zinsmaier M, Brandes U, Deussen O, Strobelt H. Interactive Level-of-Detail Rendering of Large Graphs. IEEE Transactions on Visualization and Computer Graphics 2012; 18:2486-2495. [PMID: 26357157] [DOI: 10.1109/tvcg.2012.238] Cited in RCA: 7.
Abstract
We propose a technique that allows straight-line graph drawings to be rendered interactively with adjustable level of detail. The approach consists of a novel combination of edge cumulation with density-based node aggregation and is designed to exploit common graphics hardware for speed. It operates directly on graph data and does not require precomputed hierarchies or meshes. As proof of concept, we present an implementation that scales to graphs with millions of nodes and edges, and discuss several example applications.
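The paper's aggregation runs as density estimation on graphics hardware; as a CPU stand-in, here is a hedged sketch of density-based node aggregation via grid binning, where grid resolution plays the role of the level-of-detail parameter (not the paper's actual kernel-based method):

```python
import numpy as np

def aggregate_nodes(positions, cells=32):
    """Collapse nodes into grid-cell centroids weighted by count.
    positions: (N, 2) layout coordinates in [0, 1]^2.
    Returns (centroids, counts); coarser grids = lower detail."""
    idx = np.minimum((positions * cells).astype(int), cells - 1)
    keys = idx[:, 0] * cells + idx[:, 1]
    centroids, counts = [], []
    for k in np.unique(keys):
        members = positions[keys == k]
        centroids.append(members.mean(axis=0))
        counts.append(len(members))
    return np.array(centroids), np.array(counts)

rng = np.random.default_rng(2)
pts = rng.random((100_000, 2))
cents, cnts = aggregate_nodes(pts, cells=16)
print(len(cents), cnts.sum())   # up to 256 aggregates, 100000 nodes total
```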
15. Razban RM, Gilson AI, Durfee N, Strobelt H, Dinkla K, Choi JM, Pfister H, Shakhnovich EI. ProteomeVis: a web app for exploration of protein properties from structure to sequence evolution across organisms' proteomes. Bioinformatics 2018; 34:3557-3565. [PMID: 29741573] [PMCID: PMC6184454] [DOI: 10.1093/bioinformatics/bty370] Cited in RCA: 6.
Abstract
Motivation: Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level. Results: We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S.cerevisiae and E.coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of -0.49 (P-value < 10^-10) and -0.46 (P-value < 10^-10) for S.cerevisiae and E.coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant. Availability and implementation: ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
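The reported abundance-ER relationships are ordinary Spearman rank correlations; the analysis pattern looks like the following sketch (the data here is synthetic and only mimics the paper's negative trend):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
abundance = rng.lognormal(mean=5.0, sigma=1.5, size=500)
# Synthetic rates that fall with abundance, plus noise.
evo_rate = -np.log(abundance) + rng.normal(0.0, 1.0, size=500)

rho, p = spearmanr(abundance, evo_rate)
print(f"Spearman rho = {rho:.2f}, p = {p:.1e}")   # strongly negative rho
```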
16. Aylett-Bullock J, Cuesta-Lazaro C, Quera-Bofarull A, Katta A, Hoffmann Pham K, Hoover B, Strobelt H, Moreno Jimenez R, Sedgewick A, Samir Evers E, Kennedy D, Harlass S, Gidraf Kahindo Maina A, Hussien A, Luengo-Oroz M. Operational response simulation tool for epidemics within refugee and IDP settlements: A scenario-based case study of the Cox's Bazar settlement. PLoS Comput Biol 2021; 17:e1009360. [PMID: 34710090] [PMCID: PMC8553081] [DOI: 10.1371/journal.pcbi.1009360] Cited in RCA: 6.
Abstract
The spread of infectious diseases such as COVID-19 presents many challenges to healthcare systems and infrastructures across the world, exacerbating inequalities and leaving the world's most vulnerable populations most affected. Given their density and available infrastructure, refugee and internally displaced person (IDP) settlements can be particularly susceptible to disease spread. In this paper we present an agent-based modeling approach to simulating the spread of disease in refugee and IDP settlements under various non-pharmaceutical intervention strategies. The model, based on the June open-source framework, is informed by data on geography, demographics, comorbidities, physical infrastructure and other parameters obtained from real-world observations and previous literature. The development and testing of this approach focuses on the Cox's Bazar refugee settlement in Bangladesh, although our model is designed to be generalizable to other informal settings. Our findings suggest that encouraging mild to severely symptomatic patients to self-isolate at home, as opposed to isolating all positive cases in purpose-built isolation and treatment centers, does not increase the risk of secondary infection, meaning the centers can be used to provide hospital support to the most intense cases of COVID-19. Secondly, we find that mask wearing in all indoor communal areas can be effective at dampening viral spread, even with low mask efficacy and compliance rates. Finally, we model the effects of reopening learning centers in the settlement under various mitigation strategies. For example, a combination of mask wearing in the classroom, halving attendance regularity to enable physical distancing, and better ventilation can almost completely mitigate the increased risk of infection which keeping the learning centers open may cause. These modeling efforts are being incorporated into decision making processes to inform future planning, and further exercises should be carried out in similar geographies to help protect those most vulnerable.
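A drastically reduced sketch of the agent-based idea (the June framework models geography, demographics, venues and much more; this toy SIR loop only illustrates how an intervention parameter such as mask wearing enters the simulation):

```python
import numpy as np

def simulate(n_agents=5000, days=120, beta=0.3, recover=0.1,
             mask_factor=1.0, seed=0):
    """Minimal SIR-style agent loop: each day every susceptible agent
    meets one random other agent and is infected with probability
    beta * mask_factor if that contact is infectious."""
    rng = np.random.default_rng(seed)
    state = np.zeros(n_agents, dtype=int)   # 0=S, 1=I, 2=R
    state[:10] = 1                          # seed infections
    infected_per_day = []
    for _ in range(days):
        contacts = rng.integers(0, n_agents, size=n_agents)
        exposed = (state == 0) & (state[contacts] == 1)
        infect = exposed & (rng.random(n_agents) < beta * mask_factor)
        recov = (state == 1) & (rng.random(n_agents) < recover)
        state[infect] = 1
        state[recov] = 2
        infected_per_day.append(int((state == 1).sum()))
    return infected_per_day

# Masking lowers the effective transmission rate and dampens the peak.
print(max(simulate(mask_factor=1.0)), max(simulate(mask_factor=0.5)))
```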
17. Gehrmann S, Strobelt H, Kruger R, Pfister H, Rush AM. Visual Interaction with Deep Learning Models through Collaborative Semantic Inference. IEEE Transactions on Visualization and Computer Graphics 2020; 26:884-894. [PMID: 31425116] [DOI: 10.1109/tvcg.2019.2934595] Cited in RCA: 6.
Abstract
Automation of tasks can have critical consequences when humans lose agency over decision processes. Deep learning models are particularly susceptible since current black-box approaches lack explainable reasoning. We argue that both the visual interface and model structure of deep learning systems need to take interaction design into account. We propose a framework of collaborative semantic inference (CSI) for the co-design of interactions and models to enable visual collaboration between humans and algorithms. The approach exposes the intermediate reasoning process of models, which allows semantic interactions with the visual metaphors of a problem, so that a user can both understand and control parts of the model's reasoning process. We demonstrate the feasibility of CSI with a co-designed case study of a document summarization system.
18. Partl C, Lex A, Streit M, Strobelt H, Wassermann AM, Pfister H, Schmalstieg D. ConTour: Data-Driven Exploration of Multi-Relational Datasets for Drug Discovery. IEEE Transactions on Visualization and Computer Graphics 2014; 20:1883-92. [PMID: 26356902] [PMCID: PMC4720990] [DOI: 10.1109/tvcg.2014.2346752] Cited in RCA: 5.
Abstract
Large scale data analysis is nowadays a crucial part of drug discovery. Biologists and chemists need to quickly explore and evaluate potentially effective yet safe compounds based on many datasets that are in relationship with each other. However, there is a lack of tools that support them in these processes. To remedy this, we developed ConTour, an interactive visual analytics technique that enables the exploration of these complex, multi-relational datasets. At its core ConTour lists all items of each dataset in a column. Relationships between the columns are revealed through interaction: selecting one or multiple items in one column highlights and re-sorts the items in other columns. Filters based on relationships enable drilling down into the large data space. To identify interesting items in the first place, ConTour employs advanced sorting strategies, including strategies based on connectivity strength and uniqueness, as well as sorting based on item attributes. ConTour also introduces interactive nesting of columns, a powerful method to show the related items of a child column for each item in the parent column. Within the columns, ConTour shows rich attribute data about the items as well as information about the connection strengths to other datasets. Finally, ConTour provides a number of detail views, which can show items from multiple datasets and their associated data at the same time. We demonstrate the utility of our system in case studies conducted with a team of chemical biologists, who investigate the effects of chemical compounds on cells and need to understand the underlying mechanisms.
19. Dinkla K, Strobelt H, Genest B, Reiling S, Borowsky M, Pfister H. Screenit: Visual Analysis of Cellular Screens. IEEE Transactions on Visualization and Computer Graphics 2017; 23:591-600. [PMID: 27875174] [DOI: 10.1109/tvcg.2016.2598587] Cited in RCA: 4.
Abstract
High-throughput and high-content screening enables large scale, cost-effective experiments in which cell cultures are exposed to a wide spectrum of drugs. The resulting multivariate data sets have a large but shallow hierarchical structure. The deepest level of this structure describes cells in terms of numeric features that are derived from image data. The subsequent level describes enveloping cell cultures in terms of imposed experiment conditions (exposure to drugs). We present Screenit, a visual analysis approach designed in close collaboration with screening experts. Screenit enables the navigation and analysis of multivariate data at multiple hierarchy levels and at multiple levels of detail. Screenit integrates the interactive modeling of cell physical states (phenotypes) and the effects of drugs on cell cultures (hits). In addition, quality control is enabled via the detection of anomalies that indicate low-quality data, while providing an interface that is designed to match workflows of screening experts. We demonstrate analyses for a real-world data set, CellMorph, with 6 million cells across 20,000 cell cultures.
20. Cashman D, Perer A, Chang R, Strobelt H. Ablate, Variate, and Contemplate: Visual Analytics for Discovering Neural Architectures. IEEE Transactions on Visualization and Computer Graphics 2020; 26:863-873. [PMID: 31502978] [DOI: 10.1109/tvcg.2019.2934261] Cited in RCA: 4.
Abstract
The performance of deep learning models is dependent on the precise configuration of many layers and parameters. However, there are currently few systematic guidelines for how to configure a successful model. This means model builders often have to experiment with different configurations by manually programming different architectures (which is tedious and time consuming) or rely on purely automated approaches to generate and train the architectures (which is expensive). In this paper, we present Rapid Exploration of Model Architectures and Parameters, or REMAP, a visual analytics tool that allows a model builder to discover a deep learning model quickly via exploration and rapid experimentation of neural network architectures. In REMAP, the user explores the large and complex parameter space for neural network architectures using a combination of global inspection and local experimentation. Through a visual overview of a set of models, the user identifies interesting clusters of architectures. Based on their findings, the user can run ablation and variation experiments to identify the effects of adding, removing, or replacing layers in a given architecture and generate new models accordingly. They can also handcraft new models using a simple graphical interface. As a result, a model builder can build deep learning models quickly, efficiently, and without manual programming. We inform the design of REMAP through a design study with four deep learning model builders. Through a use case, we demonstrate that REMAP allows users to discover performant neural network architectures efficiently using visual exploration and user-defined semi-automated searches through the model space.
21. Hinterreiter A, Ruch P, Stitz H, Ennemoser M, Bernard J, Strobelt H, Streit M. ConfusionFlow: A Model-Agnostic Visualization for Temporal Analysis of Classifier Confusion. IEEE Transactions on Visualization and Computer Graphics 2022; 28:1222-1236. [PMID: 32746284] [DOI: 10.1109/tvcg.2020.3012063] Cited in RCA: 4.
Abstract
Classifiers are among the most widely used supervised machine learning algorithms. Many classification models exist, and choosing the right one for a given task is difficult. During model selection and debugging, data scientists need to assess classifiers' performances, evaluate their learning behavior over time, and compare different models. Typically, this analysis is based on single-number performance measures such as accuracy. A more detailed evaluation of classifiers is possible by inspecting class errors. The confusion matrix is an established way for visualizing these class errors, but it was not designed with temporal or comparative analysis in mind. More generally, established performance analysis systems do not allow a combined temporal and comparative analysis of class-level information. To address this issue, we propose ConfusionFlow, an interactive, comparative visualization tool that combines the benefits of class confusion matrices with the visualization of performance characteristics over time. ConfusionFlow is model-agnostic and can be used to compare performances for different model types, model architectures, and/or training and test datasets. We demonstrate the usefulness of ConfusionFlow in a case study on instance selection strategies in active learning. We further assess the scalability of ConfusionFlow and present a use case in the context of neural network pruning.
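The structure ConfusionFlow consumes, one confusion matrix per training epoch, is straightforward to log. A minimal sketch with simulated predictions (all names and numbers invented; a real setup would record `y_pred` from the model under training):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """counts[i, j] = examples of true class i predicted as class j."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(m, (y_true, y_pred), 1)
    return m

rng = np.random.default_rng(4)
y_true = rng.integers(0, 3, size=1000)
history = []                       # one matrix per epoch: the temporal axis
for epoch in range(5):
    # Simulated predictions that improve each epoch.
    noise = rng.integers(0, 3, size=1000)
    keep = rng.random(1000) < 0.5 + 0.1 * epoch
    y_pred = np.where(keep, y_true, noise)
    history.append(confusion_matrix(y_true, y_pred, 3))
print(history[0].trace(), "->", history[-1].trace())   # rising diagonal
```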
22. Schwab M, Strobelt H, Tompkin J, Fredericks C, Huff C, Higgins D, Strezhnev A, Komisarchik M, King G, Pfister H. booc.io: An Education System with Hierarchical Concept Maps and Dynamic Non-linear Learning Plans. IEEE Transactions on Visualization and Computer Graphics 2017; 23:571-580. [PMID: 27875172] [DOI: 10.1109/tvcg.2016.2598518] Cited in RCA: 2.
Abstract
Information hierarchies are difficult to express when real-world space or time constraints force traversing the hierarchy in linear presentations, such as in educational books and classroom courses. We present booc.io, which allows linear and non-linear presentation and navigation of educational concepts and material. To support a breadth of material for each concept, booc.io is Web based, which allows adding material such as lecture slides, book chapters, videos, and LTIs. A visual interface assists the creation of the needed hierarchical structures. The goals of our system were formed in expert interviews, and we explain how our design meets these goals. We adapt a real-world course into booc.io, and perform introductory qualitative evaluation with students.
23. Strobelt H, Webson A, Sanh V, Hoover B, Beyer J, Pfister H, Rush AM. Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1146-1156. [PMID: 36191099] [DOI: 10.1109/tvcg.2022.3209479] Cited in RCA: 2.
Abstract
State-of-the-art neural language models can now be used to solve ad-hoc language tasks through zero-shot prompting without the need for supervised training. This approach has gained popularity in recent years, and researchers have demonstrated prompts that achieve strong accuracy on specific NLP tasks. However, finding a prompt for new tasks requires experimentation. Different prompt templates with different wording choices lead to significant accuracy differences. PromptIDE allows users to experiment with prompt variations, visualize prompt performance, and iteratively optimize prompts. We developed a workflow that allows users to first focus on model feedback using small data before moving on to a large data regime that allows empirical grounding of promising prompts using quantitative measures of the task. The tool then allows easy deployment of the newly created ad-hoc models. We demonstrate the utility of PromptIDE (demo: http://prompt.vizhub.ai) and our workflow using several real-world use cases.
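The small-data evaluation loop can be sketched generically. In the sketch below, `query_model` is a placeholder callable standing in for whatever LLM call is available, not PromptIDE's API:

```python
def evaluate_prompts(templates, examples, query_model):
    """Score each prompt template by accuracy on labeled examples.
    templates: strings with a {text} placeholder.
    examples: list of (text, gold_label) pairs.
    query_model: callable str -> str (a stand-in for a real LLM)."""
    scores = {}
    for template in templates:
        correct = sum(
            query_model(template.format(text=text)).strip().lower() == gold
            for text, gold in examples
        )
        scores[template] = correct / len(examples)
    return scores

# Trivial stand-in "model" that looks for an exclamation mark.
fake_llm = lambda prompt: "positive" if "!" in prompt else "negative"
templates = ["Review: {text}\nSentiment:", "Is this positive? {text}"]
examples = [("Great movie!", "positive"), ("Dull and slow.", "negative")]
print(evaluate_prompts(templates, examples, fake_llm))
```

The per-template scores are exactly the kind of quantitative grounding the workflow uses to promote a promising prompt from the small-data to the large-data regime.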
24. Beauxis-Aussalet E, Behrisch M, Borgo R, Chau DH, Collins C, Ebert D, El-Assady M, Endert A, Keim DA, Kohlhammer J, Oelke D, Peltonen J, Riveiro M, Schreck T, Strobelt H, van Wijk JJ, Rhyne TM. The Role of Interactive Visualization in Fostering Trust in AI. IEEE Computer Graphics and Applications 2021; 41:7-12. [PMID: 34890313] [DOI: 10.1109/mcg.2021.3107875] Cited in RCA: 1.
Abstract
The increasing use of artificial intelligence (AI) technologies across application domains has prompted our society to pay closer attention to AI's trustworthiness, fairness, interpretability, and accountability. In order to foster trust in AI, it is important to consider the potential of interactive visualization, and how such visualizations help build trust in AI systems. This manifesto discusses the relevance of interactive visualizations and makes the following four claims: i) trust is not a technical problem, ii) trust is dynamic, iii) visualization cannot address all aspects of trust, and iv) visualization is crucial for human agency in AI.
25. Strobelt H, Kinley J, Krueger R, Beyer J, Pfister H, Rush AM. GenNI: Human-AI Collaboration for Data-Backed Text Generation. IEEE Transactions on Visualization and Computer Graphics 2022; 28:1106-1116. [PMID: 34587072] [DOI: 10.1109/tvcg.2021.3114845] Cited in RCA: 1.
Abstract
Table2Text systems generate textual output based on structured data utilizing machine learning. These systems are essential for fluent natural language interfaces in tools such as virtual assistants; however, left to generate freely, these ML systems often produce misleading or unexpected outputs. GenNI (Generation Negotiation Interface) is an interactive visual system for high-level human-AI collaboration in producing descriptive text. The tool utilizes a deep learning model designed with explicit control states. These controls allow users to globally constrain model generations without sacrificing the representation power of the deep learning models. The visual interface makes it possible for users to interact with AI systems following a Refine-Forecast paradigm to ensure that the generation system acts in a manner human users find suitable. We report multiple use cases on two experiments that improve over uncontrolled generation approaches, while at the same time providing fine-grained control. A demo and source code are available at https://genni.vizhub.ai.