1. van den Brandt A, Jonkheer EM, van Workum DJM, van de Wetering H, Smit S, Vilanova A. PanVA: Pangenomic Variant Analysis. IEEE Transactions on Visualization and Computer Graphics 2024; 30:4895-4909. [PMID: 37267130 DOI: 10.1109/tvcg.2023.3282364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Genomics researchers increasingly use multiple reference genomes to comprehensively explore genetic variants underlying differences in detectable characteristics between organisms. Pangenomes allow for an efficient data representation of multiple related genomes and their associated metadata. However, current visual analysis approaches for exploring these complex genotype-phenotype relationships are often based on single reference approaches or lack adequate support for interpreting the variants in the genomic context with heterogeneous (meta)data. This design study introduces PanVA, a visual analytics design for pangenomic variant analysis developed with the active participation of genomics researchers. The design uniquely combines tailored visual representations with interactions such as sorting, grouping, and aggregation, allowing users to navigate and explore different perspectives on complex genotype-phenotype relations. Through evaluation in the context of plants and pathogen research, we show that PanVA helps researchers explore variants in genes and generate hypotheses about their role in phenotypic variation.
2. Zhou L, Feng T, Xu S, Gao F, Lam TT, Wang Q, Wu T, Huang H, Zhan L, Li L, Guan Y, Dai Z, Yu G. ggmsa: a visual exploration tool for multiple sequence alignment and associated data. Brief Bioinform 2022; 23:6603927. [PMID: 35671504 DOI: 10.1093/bib/bbac222] [Citation(s) in RCA: 90] [Impact Index Per Article: 30.0] [Received: 01/28/2022] [Revised: 05/07/2022] [Accepted: 05/11/2022] [Indexed: 12/25/2022] [Open Access]
Abstract
The identification of the conserved and variable regions in the multiple sequence alignment (MSA) is critical to accelerating the process of understanding the function of genes. MSA visualizations allow us to transform sequence features into understandable visual representations. As the sequence-structure-function relationship gains increasing attention in molecular biology studies, the simple display of nucleotide or protein sequence alignment is no longer sufficient. A more scalable visualization is required to broaden the scope of sequence investigation. Here we present ggmsa, an R package for mining comprehensive sequence features and integrating the associated data of MSA by a variety of display methods. To uncover sequence conservation patterns, variations and recombination at the site level, sequence bundles, sequence logos, stacked sequence alignment and comparative plots are implemented. ggmsa supports integrating the correlation of MSA sequences and their phenotypes, as well as other traits such as ancestral sequences, molecular structures, molecular functions and expression levels. We also design a new visualization method for genome alignments in multiple alignment format to explore the pattern of within and between species variation. Combining these visual representations with prior knowledge, ggmsa assists researchers in exploring MSAs and making decisions. The ggmsa package is open-source software released under the Artistic-2.0 license, and it is freely available on Bioconductor (https://bioconductor.org/packages/ggmsa) and GitHub (https://github.com/YuLab-SMU/ggmsa).
Affiliation(s)
- Lang Zhou
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Division of Laboratory Medicine, Microbiome Medicine Center, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Tingze Feng
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Shuangbin Xu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Fangluan Gao
  - Institute of Plant Virology, Fujian Agriculture and Forestry University, Fuzhou, China
- Tommy T Lam
  - State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Laboratory of Data Discovery for Health Limited, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, China
- Qianwen Wang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology and School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Tianzhi Wu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Huina Huang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Zhuhai International Travel Healthcare Center, Zhuhai, Guangdong, China
- Li Zhan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Lin Li
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Yi Guan
  - State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Joint Institute of Virology (Shantou University - The University of Hong Kong), Shantou University, Shantou, China
- Zehan Dai
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangchuang Yu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Division of Laboratory Medicine, Microbiome Medicine Center, Zhujiang Hospital, Southern Medical University, Guangzhou, China
3. Prescott L. SARS-CoV-2 3CLpro whole human proteome cleavage prediction and enrichment/depletion analysis. Comput Biol Chem 2022; 98:107671. [PMID: 35429835 PMCID: PMC8958254 DOI: 10.1016/j.compbiolchem.2022.107671] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/09/2021] [Revised: 03/21/2022] [Accepted: 03/25/2022] [Indexed: 12/12/2022]
Abstract
A novel coronavirus (SARS-CoV-2) has devastated the globe as a pandemic that has killed millions of people. Widespread vaccination is still uncertain, so many scientific efforts have been directed toward discovering antiviral treatments. Many drugs are being investigated to inhibit the coronavirus main protease, 3CLpro, from cleaving its viral polyprotein, but few publications have addressed this protease’s interactions with the host proteome or their probable contribution to virulence. Too few host protein cleavages have been experimentally verified to fully understand 3CLpro’s global effects on relevant cellular pathways and tissues. Here, I set out to determine this protease’s targets and corresponding potential drug targets. Using a neural network trained on cleavages from 392 coronavirus proteomes with a Matthews correlation coefficient of 0.985, I predict that a large proportion of the human proteome is vulnerable to 3CLpro, with 4898 out of approximately 20,000 human proteins containing at least one putative cleavage site. These cleavages are nonrandomly distributed and are enriched in the epithelium along the respiratory tract, brain, testis, plasma, and immune tissues and depleted in olfactory and gustatory receptors despite the prevalence of anosmia and ageusia in COVID-19 patients. Affected cellular pathways include cytoskeleton/motor/cell adhesion proteins, nuclear condensation and other epigenetics, host transcription and RNAi, ribosomal stoichiometry and nascent-chain detection and degradation, ubiquitination, pattern recognition receptors, coagulation, lipoproteins, redox, and apoptosis. This whole proteome cleavage prediction demonstrates the importance of 3CLpro in expected and nontrivial pathways affecting virulence, leads me to propose more than a dozen potential therapeutic targets against coronaviruses, and should therefore be applied to all viral proteases and subsequently experimentally verified.
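Note: the abstract above reports classifier quality as a Matthews correlation coefficient (MCC). As a reminder of how that metric is defined, the following is a minimal sketch that computes MCC from binary confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the paper).
    print(matthews_corrcoef(tp=90, tn=95, fp=5, fn=10))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often preferred over accuracy when cleavage and non-cleavage sites are heavily imbalanced.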
4. Xia M, Velumani RP, Wang Y, Qu H, Ma X. QLens: Visual Analytics of Multi-step Problem-solving Behaviors for Improving Question Design. IEEE Transactions on Visualization and Computer Graphics 2021; 27:870-880. [PMID: 33048682 DOI: 10.1109/tvcg.2020.3030337] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
With the rapid development of online education in recent years, there has been an increasing number of learning platforms that provide students with multi-step questions to cultivate their problem-solving skills. To guarantee the high quality of such learning materials, question designers need to inspect how students' problem-solving processes unfold step by step to infer whether students' problem-solving logic matches their design intent. They also need to compare the behaviors of different groups (e.g., students from different grades) to distribute questions to students with the right level of knowledge. The availability of fine-grained interaction data, such as mouse movement trajectories from the online platforms, provides the opportunity to analyze problem-solving behaviors. However, it is still challenging to interpret, summarize, and compare the high-dimensional problem-solving sequence data. In this paper, we present a visual analytics system, QLens, to help question designers inspect detailed problem-solving trajectories, compare different student groups, and distill insights for design improvements. In particular, QLens models problem-solving behavior as a hybrid state transition graph and visualizes it through a novel glyph-embedded Sankey diagram, which reflects students' problem-solving logic, engagement, and encountered difficulties. We conduct three case studies and three expert interviews to demonstrate the usefulness of QLens on real-world datasets that consist of thousands of problem-solving traces.
5. Bartolomeo SD, Zhang Y, Sheng F, Dunne C. Sequence Braiding: Visual Overviews of Temporal Event Sequences and Attributes. IEEE Transactions on Visualization and Computer Graphics 2021; 27:1353-1363. [PMID: 33074822 DOI: 10.1109/tvcg.2020.3030442] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/11/2023]
Abstract
Temporal event sequence alignment has been used in many domains to visualize nuanced changes and interactions over time. Existing approaches align one or two sentinel events. Overview tasks require examining all alignments of interest using interaction and time or juxtaposition of many visualizations. Furthermore, any event attribute overviews are not closely tied to sequence visualizations. We present Sequence Braiding, a novel overview visualization for temporal event sequences and attributes using a layered directed acyclic network. Sequence Braiding visually aligns many temporal events and attribute groups simultaneously and supports arbitrary ordering, absence, and duplication of events. In a controlled experiment we compare Sequence Braiding and IDMVis on user task completion time, correctness, error, and confidence. Our results provide good evidence that users of Sequence Braiding can understand high-level patterns and trends faster and with similar error. A full version of this paper with all appendices; the evaluation stimuli, data, and analysis code; and source code are available at [Formula: see text].
6. Qi J, Bloemen V, Wang S, van Wijk J, van de Wetering H. STBins: Visual Tracking and Comparison of Multiple Data Sequences Using Temporal Binning. IEEE Transactions on Visualization and Computer Graphics 2020; 26:1054-1063. [PMID: 31425095 DOI: 10.1109/tvcg.2019.2934289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
While analyzing multiple data sequences, the following questions typically arise: how does a single sequence change over time, how do multiple sequences compare within a period, and how does such comparison change over time. This paper presents a visual technique named STBins to answer these questions. STBins is designed for visual tracking of individual data sequences and also for comparison of sequences. The latter is done by showing the similarity of sequences within temporal windows. A perception study is conducted to examine the readability of alternative visual designs based on sequence tracking and comparison tasks. Also, two case studies based on real-world datasets are presented in detail to demonstrate usage of our technique.
7. Nusrat S, Harbig T, Gehlenborg N. Tasks, Techniques, and Tools for Genomic Data Visualization. Computer Graphics Forum: Journal of the European Association for Computer Graphics 2019; 38:781-805. [PMID: 31768085 PMCID: PMC6876635 DOI: 10.1111/cgf.13727] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 05/06/2023]
Abstract
Genomic data visualization is essential for interpretation and hypothesis generation as well as a valuable aid in communicating discoveries. Visual tools bridge the gap between algorithmic approaches and the cognitive skills of investigators. Addressing this need has become crucial in genomics, as biomedical research is increasingly data-driven and many studies lack well-defined hypotheses. A key challenge in data-driven research is to discover unexpected patterns and to formulate hypotheses in an unbiased manner in vast amounts of genomic and other associated data. Over the past two decades, this has driven the development of numerous data visualization techniques and tools for visualizing genomic data. Based on a comprehensive literature survey, we propose taxonomies for data, visualization, and tasks involved in genomic data visualization. Furthermore, we provide a comprehensive review of published genomic visualization tools in the context of the proposed taxonomies.
Affiliation(s)
- S Nusrat
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- T Harbig
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- N Gehlenborg
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
8. Yachdav G, Wilzbach S, Rauscher B, Sheridan R, Sillitoe I, Procter J, Lewis SE, Rost B, Goldberg T. MSAViewer: interactive JavaScript visualization of multiple sequence alignments. Bioinformatics 2016; 32:3501-3503. [PMID: 27412096 PMCID: PMC5181560 DOI: 10.1093/bioinformatics/btw474] [Citation(s) in RCA: 92] [Impact Index Per Article: 10.2] [Received: 04/16/2016] [Revised: 06/03/2016] [Accepted: 06/29/2016] [Indexed: 11/29/2022] [Open Access]
Abstract
Summary: The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is ‘web ready’: written entirely in JavaScript, compatible with modern web browsers and does not require any specialized software. The MSAViewer is part of the BioJS collection of components. Availability and Implementation: The MSAViewer is released as open source software under the Boost Software License 1.0. Documentation, source code and the viewer are available at http://msa.biojs.net/. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: msa@bio.sh
Affiliation(s)
- Guy Yachdav
  - Bioinformatik - I12, TUM, Garching, 85748, Germany; Biosof LLC, New York, NY 10001, USA
- Robert Sheridan
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Ian Sillitoe
  - Institute of Structure and Molecular Biology, University College London, London, UK
- James Procter
  - Biological Chemistry and Drug Discovery, University of Dundee, Dundee, UK
- Burkhard Rost
  - Bioinformatik - I12, TUM, Garching, 85748, Germany; Biosof LLC, New York, NY 10001, USA
9. Schwarz RF, Tamuri AU, Kultys M, King J, Godwin J, Florescu AM, Schultz J, Goldman N. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments. Nucleic Acids Res 2016; 44:e77. [PMID: 26819408 PMCID: PMC4856975 DOI: 10.1093/nar/gkw022] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/03/2015] [Accepted: 01/08/2016] [Indexed: 12/19/2022] [Open Access]
Abstract
Sequence Logos and their variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles).
Affiliation(s)
- Roland F Schwarz
  - European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Asif U Tamuri
  - European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Marek Kultys
  - Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
- James King
  - Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
- James Godwin
  - Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
- Ana M Florescu
  - Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
- Jörg Schultz
  - Center for Computational and Theoretical Biology and Department of Bioinformatics, University of Würzburg, Biocenter, Am Hubland, 97074 Würzburg, Germany
- Nick Goldman
  - European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
10. Yu JF, Dou XH, Wang HB, Sun X, Zhao HY, Wang JH. A Novel Cylindrical Representation for Characterizing Intrinsic Properties of Protein Sequences. J Chem Inf Model 2015; 55:1261-70. [PMID: 25945398 DOI: 10.1021/ci500577m] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
The composition and sequence order of amino acid residues are the two most important characteristics to describe a protein sequence. Graphical representations facilitate visualization of biological sequences and produce biologically useful numerical descriptors. In this paper, we propose a novel cylindrical representation by placing the 20 amino acid residue types in a circle and sequence positions along the z axis. This representation allows visualization of the composition and sequence order of amino acids at the same time. Ten numerical descriptors and one weighted numerical descriptor have been developed to quantitatively describe intrinsic properties of protein sequences on the basis of the cylindrical model. Their applications to similarity/dissimilarity analysis of nine ND5 proteins indicated that these numerical descriptors are more effective than several classical numerical matrices. Thus, the cylindrical representation obtained here provides a new useful tool for visualizing and characterizing protein sequences. An online server is available at http://biophy.dzu.edu.cn:8080/CNumD/input.jsp.
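Note: the geometric idea in the abstract above (20 residue types on a circle, sequence position along the z axis) can be sketched as a simple coordinate mapping. The sketch below is illustrative only. The alphabetical ordering of residues around the circle, the unit radius, and the 1-based z coordinate are assumptions made here, since the abstract does not specify them, and the function name is hypothetical.

```python
import math

# Assumed ordering of the 20 standard residues around the circle
# (alphabetical by one-letter code); the paper may use a different order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cylindrical_coords(sequence: str, radius: float = 1.0):
    """Map each residue to an (x, y, z) point on a cylinder.

    The angle is set by the residue type and z by the 1-based sequence position.
    Raises ValueError for symbols outside the 20 standard residues.
    """
    step = 2 * math.pi / len(AMINO_ACIDS)
    coords = []
    for position, residue in enumerate(sequence, start=1):
        angle = AMINO_ACIDS.index(residue) * step
        coords.append(
            (radius * math.cos(angle), radius * math.sin(angle), float(position))
        )
    return coords


if __name__ == "__main__":
    for point in cylindrical_coords("ACDA"):
        print(point)
```

Connecting consecutive points yields a curve whose angular spread reflects composition and whose progression along z reflects sequence order, which is the property the abstract highlights.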
Affiliation(s)
- Jia-Feng Yu
  - Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Xiang-Hua Dou
  - Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Hong-Bo Wang
  - Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Xiao Sun
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Hui-Ying Zhao
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4000, Australia
- Ji-Hua Wang
  - Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; College of Physics and Electronic Information, Dezhou University, Dezhou 253023, China
11. Ray WC, Rumpf RW, Sullivan B, Callahan N, Magliery T, Machiraju R, Wong B, Krzywinski M, Bartlett CW. Understanding the sequence requirements of protein families: insights from the BioVis 2013 contests. BMC Proc 2014; 8:S1. [PMID: 25237388 PMCID: PMC4155613 DOI: 10.1186/1753-6561-8-s2-s1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] [Open Access]
Abstract
Introduction: In 2011, the BioVis symposium of the IEEE VisWeek conferences inaugurated a new variety of data analysis contest. Aimed at fostering collaborations between computational scientists and biologists, the BioVis contest provided real data from biological domains with emerging visualization needs, in the hope that novel approaches would result in powerful new tools for the community. In 2011 and 2012 the theme of these contests was expression Quantitative Trait Locus analysis, within and across tissues respectively. In 2013 the topic was updated to protein sequence and mutation visualization. Methods: The contest was framed in the context of a real protein with numerous mutations that had lost function, and the question posed "what minimal set of changes would you propose to rescue function, or how could you support a biologist attempting to answer that question?". The data was grounded in actual experimental results in triosephosphate isomerase (TIM) enzymes. Seven teams composed of 36 individuals submitted entries with proposed solutions and approaches to the challenge. Their contributions ranged from careful analysis of the visualization and analytical requirements for the problem through integration of existing tools for analyzing the context and consequences of protein mutations, to completely new tools addressing the problem. Results: Judges found valuable and novel contributions in each of the entries, including interesting ways to hierarchicalize the protein into domains of informational interaction, tools for simultaneously understanding both sequential and spatial order, and approaches for conveying some types of inter-residue dependencies. In this manuscript we document the problem presented to the contestants, summarize the biological contributions of their entries, and suggest opportunities that this work has highlighted for even more improved tools in the future.
Affiliation(s)
- William C Ray
  - Nationwide Children's Hospital, 575 Children's Crossroad, 43215, Columbus, OH, USA; The Ohio State University, 100 W. 18th Ave, 43210, Columbus, OH, USA; Contest Chairs
- R Wolfgang Rumpf
  - Nationwide Children's Hospital, 575 Children's Crossroad, 43215, Columbus, OH, USA
- Brandon Sullivan
  - The Ohio State University, 100 W. 18th Ave, 43210, Columbus, OH, USA; Domain Experts
- Nicholas Callahan
  - The Ohio State University, 100 W. 18th Ave, 43210, Columbus, OH, USA; Domain Experts
- Thomas Magliery
  - The Ohio State University, 100 W. 18th Ave, 43210, Columbus, OH, USA; Domain Experts
- Raghu Machiraju
  - The Ohio State University, 100 W. 18th Ave, 43210, Columbus, OH, USA; Contest Chairs
- Bang Wong
  - The Broad Institute, 7 Cambridge Center, 02142, Cambridge, MA, USA; Contest Chairs
- Martin Krzywinski
  - Genome Sciences Centre, 570 W, 7th Avenue, V5Z 4S6, Vancouver, BC, Canada; Contest Chairs
- Christopher W Bartlett
  - Nationwide Children's Hospital, 575 Children's Crossroad, 43215, Columbus, OH, USA; The Ohio State University, 100 W. 18th Ave, 43210, Columbus, OH, USA; Contest Chairs
12. Roca AI. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos. BMC Proc 2014; 8:S6. [PMID: 25237393 PMCID: PMC4155610 DOI: 10.1186/1753-6561-8-s2-s6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022] [Open Access]
Abstract
Background: The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results: The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions: The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
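Note: the data structure behind a ProfileGrid, as described in the abstract above, is a matrix of residue frequencies per alignment column. The following is a minimal sketch of how such a matrix could be computed. It is not the JProfileGrid implementation: the function name and toy alignment are hypothetical, and treating the gap character as an ordinary symbol is an assumption made here.

```python
from collections import Counter


def residue_frequency_grid(alignment):
    """Return {residue: [frequency at each column]} for equal-length aligned sequences.

    Gaps ('-') are counted like any other symbol (an assumption of this sketch).
    """
    if not alignment:
        return {}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    grid = {}
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        for residue, count in counts.items():
            # Create the row on first sight, then fill in this column's frequency.
            grid.setdefault(residue, [0.0] * length)[col] = count / len(alignment)
    return grid


if __name__ == "__main__":
    # Toy alignment for illustration only.
    toy = ["ACD", "ACE", "GCD", "A-D"]
    for residue, freqs in sorted(residue_frequency_grid(toy).items()):
        print(residue, freqs)
```

Each row can then be color-coded by frequency, so that conserved positions show up as a single saturated cell per column and variable positions as several lighter ones.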
Affiliation(s)
- Alberto I Roca
  - ProfileGrid.org, P.O. Box 6414, Irvine, California 92616, USA