1
Nowak S, Rosin M, Stuerzlinger W, Bartram L. Visual Analytics: A Method to Explore Natural Histories of Oral Epithelial Dysplasia. Frontiers in Oral Health 2022; 2:703874. [PMID: 35048041 PMCID: PMC8757761 DOI: 10.3389/froh.2021.703874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2021] [Accepted: 07/02/2021] [Indexed: 11/17/2022]
Abstract
Risk assessment and follow-up of oral potentially malignant disorders in patients with mild or moderate oral epithelial dysplasia is an ongoing challenge for improved oral cancer prevention. Part of the challenge is a lack of understanding of how observable features of such dysplasia, gathered as data by clinicians during follow-up, relate to underlying biological processes driving progression. Current research is at an exploratory phase where the precise questions to ask are not known. While traditional statistical and the newer machine learning and artificial intelligence methods are effective in well-defined problem spaces with large datasets, these are not the circumstances we face currently. We argue that the field is in need of exploratory methods that can better integrate clinical and scientific knowledge into analysis to iteratively generate viable hypotheses. In this perspective, we propose that visual analytics presents a set of methods well-suited to these needs. We illustrate how visual analytics excels at generating viable research hypotheses by describing our experiences using visual analytics to explore temporal shifts in the clinical presentation of epithelial dysplasia. Visual analytics complements existing methods and fulfills a critical and at-present neglected need in the formative stages of inquiry we are facing.
Affiliation(s)
- Stan Nowak
- School of Interactive Arts and Technology, Simon Fraser University, Burnaby, BC, Canada
- Miriam Rosin
- BC Oral Cancer Prevention Program, Cancer Control Research, BC Cancer, Vancouver, BC, Canada; Department of Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, BC, Canada
- Wolfgang Stuerzlinger
- School of Interactive Arts and Technology, Simon Fraser University, Burnaby, BC, Canada
- Lyn Bartram
- School of Interactive Arts and Technology, Simon Fraser University, Burnaby, BC, Canada
2
Peng Y, Tang S, Wang D, Zhong H, Jia H, Cai X, Zhang Z, Xiao M, Yang H, Wang J, Kristiansen K, Xu X, Li J. MetaPGN: a pipeline for construction and graphical visualization of annotated pangenome networks. Gigascience 2018; 7:5114262. [PMID: 30277499 PMCID: PMC6251982 DOI: 10.1093/gigascience/giy121] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/26/2018] [Accepted: 09/20/2018] [Indexed: 02/01/2023]
Abstract
Pangenome analyses facilitate the interpretation of genetic diversity and evolutionary history of a taxon. However, there is an urgent and unmet need to develop new tools for advanced pangenome construction and visualization, especially for metagenomic data. Here, we present an integrated pipeline, named MetaPGN, for construction and graphical visualization of pangenome networks from either microbial genomes or metagenomes. Given either isolated genomes or metagenomic assemblies coupled with a reference genome of the targeted taxon, MetaPGN generates a pangenome in a topological network, consisting of genes (nodes) and gene-gene genomic adjacencies (edges) of which biological information can be easily updated and retrieved. MetaPGN also includes a self-developed Cytoscape plugin for layout of and interaction with the resulting pangenome network, providing an intuitive and interactive interface for full exploration of genetic diversity. We demonstrate the utility of MetaPGN by constructing Escherichia coli pangenome networks from five E. coli pathogenic strains and 760 human gut microbiomes, revealing extensive genetic diversity of E. coli within both isolates and gut microbial populations. With the ability to extract and visualize gene contents and gene-gene physical adjacencies of a specific taxon from large-scale metagenomic data, MetaPGN provides advantages in expanding pangenome analysis to uncultured microbial taxa.
Affiliation(s)
- Ye Peng
- School of Biology and Biological Engineering, South China University of Technology, Building B6, 382 Zhonghuan Road East, Guangzhou Higher Education Mega Center, Guangzhou 510006, China; BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Shanmei Tang
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China; Shenzhen Key Laboratory of Human commensal microorganisms and Health Research, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- Dan Wang
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China; Shenzhen Key Laboratory of Human commensal microorganisms and Health Research, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- Huanzi Zhong
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China; Shenzhen Key Laboratory of Human commensal microorganisms and Health Research, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen Biocenter, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
- Huijue Jia
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China; Shenzhen Key Laboratory of Human commensal microorganisms and Health Research, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- Xianghang Cai
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Zhaoxi Zhang
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Minfeng Xiao
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Huanming Yang
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; James D. Watson Institute of Genome Sciences, No. 51, Zhijiang Road, Xihu District, Hangzhou 310058, China
- Jian Wang
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; James D. Watson Institute of Genome Sciences, No. 51, Zhijiang Road, Xihu District, Hangzhou 310058, China
- Karsten Kristiansen
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen Biocenter, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
- Xun Xu
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Junhua Li
- School of Biology and Biological Engineering, South China University of Technology, Building B6, 382 Zhonghuan Road East, Guangzhou Higher Education Mega Center, Guangzhou 510006, China; BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China; Shenzhen Key Laboratory of Human commensal microorganisms and Health Research, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
3
Pedersen TL. Hierarchical sets: analyzing pangenome structure through scalable set visualizations. Bioinformatics 2017; 33:1604-1612. [PMID: 28130242 PMCID: PMC5447240 DOI: 10.1093/bioinformatics/btx034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/10/2016] [Accepted: 01/24/2017] [Indexed: 12/13/2022]
Abstract
Motivation: The increase in available microbial genome sequences has resulted in an increase in the size of the pangenomes being analyzed. Current pangenome visualizations are not intended for the pangenome sizes possible today and new approaches are necessary in order to convert the increase in available information to increase in knowledge. As the pangenome data structure is essentially a collection of sets we explore the potential for scalable set visualization as a tool for pangenome analysis. Results: We present a new hierarchical clustering algorithm based on set arithmetics that optimizes the intersection sizes along the branches. The intersection and union sizes along the hierarchy are visualized using a composite dendrogram and icicle plot, which, in pangenome context, shows the evolution of pangenome and core size along the evolutionary hierarchy. Outlying elements, i.e. elements whose presence pattern does not correspond with the hierarchy, can be visualized using hierarchical edge bundles. When applied to pangenome data this plot shows putative horizontal gene transfers between the genomes and can highlight relationships between genomes that are not represented by the hierarchy. We illustrate the utility of hierarchical sets by applying it to a pangenome based on 113 Escherichia and Shigella genomes and find it provides a powerful addition to pangenome analysis. Availability and Implementation: The described clustering algorithm and visualizations are implemented in the hierarchicalSets R package available from CRAN (https://cran.r-project.org/web/packages/hierarchicalSets). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thomas Lin Pedersen
- Department of Systems Biology, Center for Biological Sequence Analysis, The Technical University of Denmark, Building 208, Lyngby, Denmark; Assays, Culture and Enzymes Division, Chr. Hansen A/S, Hørsholm, Denmark
4
Pedersen TL, Nookaew I, Wayne Ussery D, Månsson M. PanViz: interactive visualization of the structure of functionally annotated pangenomes. Bioinformatics 2017; 33:1081-1082. [PMID: 28057677 PMCID: PMC5859990 DOI: 10.1093/bioinformatics/btw761] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 04/04/2016] [Accepted: 11/29/2016] [Indexed: 12/13/2022]
Abstract
Summary: PanViz is a novel, interactive, visualization tool for pangenome analysis. PanViz allows visualization of changes in gene group (groups of similar genes across genomes) classification as different subsets of pangenomes are selected, as well as comparisons of individual genomes to pangenomes with gene ontology based navigation of gene groups. Furthermore it allows for rich and complex visual querying of gene groups in the pangenome. PanViz visualizations require no external programs and are easily sharable, allowing for rapid pangenome analyses. Availability and Implementation: PanViz is written entirely in JavaScript and is available on https://github.com/thomasp85/PanViz. A companion R package that facilitates the creation of PanViz visualizations from a range of data formats is released through Bioconductor and is available at https://bioconductor.org/packages/PanVizGenerator. Contact: thomasp85@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thomas Lin Pedersen
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, DK-2800 Lyngby, Denmark; Assays, Culture and Enzymes Division, DK-2970 Hørsholm, Denmark
- Intawat Nookaew
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Department Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- David Wayne Ussery
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Department Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Maria Månsson
- Assays, Culture and Enzymes Division, DK-2970 Hørsholm, Denmark
5
Nishimura A, Nishimura K, Kada A, Iihara K. Status and Future Perspectives of Utilizing Big Data in Neurosurgical and Stroke Research. Neurol Med Chir (Tokyo) 2016; 56:655-663. [PMID: 27680330 PMCID: PMC5221776 DOI: 10.2176/nmc.ra.2016-0174] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
Abstract
The management, analysis, and integration of Big Data have received increasing attention in healthcare research as well as in medical bioinformatics. The J-ASPECT study is the first nationwide survey in Japan on the real-world setting of stroke care using data obtained from the diagnosis procedure combination-based payment system. The J-ASPECT study demonstrated a significant association between comprehensive stroke care (CSC) capacity and the hospital volume of stroke interventions in Japan; further, it showed that CSC capabilities were associated with reduced in-hospital mortality rates. Our study aims to create new evidence and insight from ‘real world’ neurosurgical practice and stroke care in Japan using Big Data. The final aim of this study is to develop effective methods to bridge the evidence-practice gap in acute stroke healthcare. In this study, the authors describe the status and future perspectives of the development of a new method of stroke registry as a powerful tool for acute stroke care research.
Affiliation(s)
- Ataru Nishimura
- Department of Neurosurgery, Graduate School of Medical Sciences, Kyushu University
6
Lin G, Chai J, Yuan S, Mai C, Cai L, Murphy RW, Zhou W, Luo J. VennPainter: A Tool for the Comparison and Identification of Candidate Genes Based on Venn Diagrams. PLoS One 2016; 11:e0154315. [PMID: 27120465 PMCID: PMC4847855 DOI: 10.1371/journal.pone.0154315] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 11/22/2015] [Accepted: 04/12/2016] [Indexed: 12/21/2022]
Abstract
VennPainter is a program for depicting unique and shared sets of gene lists and generating Venn diagrams, by using the Qt C++ framework. The software produces Classic Venn, Edwards’ Venn and Nested Venn diagrams and allows for eight sets in a graph mode and 31 sets in data processing mode only. In comparison, previous programs produce Classic Venn and Edwards’ Venn diagrams and allow for a maximum of six sets. The software incorporates user-friendly features and works in Windows, Linux and Mac OS. Its graphical interface does not require a user to have programming skills. Users can modify diagram content for up to eight datasets because of the Scalable Vector Graphics output. VennPainter can provide output results in vertical, horizontal and matrix formats, which facilitates sharing datasets as required for further identification of candidate genes. Users can obtain gene lists from shared sets by clicking the numbers on the diagram. Thus, VennPainter is an easy-to-use, highly efficient, cross-platform and powerful program that provides a more comprehensive tool for identifying candidate genes and visualizing the relationships among genes or gene families in comparative analysis.
Affiliation(s)
- Guoliang Lin
- Key Laboratory for Animal Genetic Diversity and Evolution of High Education in Yunnan Province, School of Life Sciences, Yunnan University, Kunming, 650091, China
- State Key Laboratory for Conservation and Utilization of Bio-resource, Yunnan University, Kunming, 650091, Yunnan, China
- Jing Chai
- Key Laboratory for Animal Genetic Diversity and Evolution of High Education in Yunnan Province, School of Life Sciences, Yunnan University, Kunming, 650091, China
- State Key Laboratory for Conservation and Utilization of Bio-resource, Yunnan University, Kunming, 650091, Yunnan, China
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, The Chinese Academy of Sciences, Kunming, 650223, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650000, China
- Shuo Yuan
- School of Software, Yunnan University, Kunming, 650091, Yunnan, China
- Chao Mai
- School of Software, Yunnan University, Kunming, 650091, Yunnan, China
- Li Cai
- School of Software, Yunnan University, Kunming, 650091, Yunnan, China
- School of Computer and Science, Fudan University, Shanghai, 200433, China
- Robert W. Murphy
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, The Chinese Academy of Sciences, Kunming, 650223, Yunnan, China
- Centre for Biodiversity and Conservation Biology, Royal Ontario Museum, Toronto, M5S 2C6, Canada
- Wei Zhou
- School of Software, Yunnan University, Kunming, 650091, Yunnan, China
- Jing Luo
- Key Laboratory for Animal Genetic Diversity and Evolution of High Education in Yunnan Province, School of Life Sciences, Yunnan University, Kunming, 650091, China
- State Key Laboratory for Conservation and Utilization of Bio-resource, Yunnan University, Kunming, 650091, Yunnan, China
7
Helbig C, Bilke L, Bauer HS, Böttinger M, Kolditz O. MEVA--An Interactive Visualization Application for Validation of Multifaceted Meteorological Data with Multiple 3D Devices. PLoS One 2015; 10:e0123811. [PMID: 25915061 PMCID: PMC4411171 DOI: 10.1371/journal.pone.0123811] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/31/2014] [Accepted: 03/07/2015] [Indexed: 11/30/2022]
Abstract
Background: To achieve more realistic simulations, meteorologists develop and use models with increasing spatial and temporal resolution. The analyzing, comparing, and visualizing of resulting simulations becomes more and more challenging due to the growing amounts and multifaceted character of the data. Various data sources, numerous variables and multiple simulations lead to a complex database. Although a variety of software exists suited for the visualization of meteorological data, none of them fulfills all of the typical domain-specific requirements: support for quasi-standard data formats and different grid types, standard visualization techniques for scalar and vector data, visualization of the context (e.g., topography) and other static data, support for multiple presentation devices used in modern sciences (e.g., virtual reality), a user-friendly interface, and suitability for cooperative work. Methods and Results: Instead of attempting to develop yet another new visualization system to fulfill all possible needs in this application domain, our approach is to provide a flexible workflow that combines different existing state-of-the-art visualization software components in order to hide the complexity of 3D data visualization tools from the end user. To complete the workflow and to enable the domain scientists to interactively visualize their data without advanced skills in 3D visualization systems, we developed a lightweight custom visualization application (MEVA - multifaceted environmental data visualization application) that supports the most relevant visualization and interaction techniques and can be easily deployed. Specifically, our workflow combines a variety of different data abstraction methods provided by a state-of-the-art 3D visualization application with the interaction and presentation features of a computer-games engine. Our customized application includes solutions for the analysis of multirun data, specifically with respect to data uncertainty and differences between simulation runs. In an iterative development process, our easy-to-use application was developed in close cooperation with meteorologists and visualization experts. The usability of the application has been validated with user tests. We report on how this application supports the users to prove and disprove existing hypotheses and discover new insights. In addition, the application has been used at public events to communicate research results.
Affiliation(s)
- Carolin Helbig
- Department of Environmental Informatics, Helmholtz Centre for Environmental Research (UFZ), Leipzig, Germany; Faculty of Environmental Sciences, Technical University Dresden, Dresden, Germany
- Lars Bilke
- Department of Environmental Informatics, Helmholtz Centre for Environmental Research (UFZ), Leipzig, Germany
- Hans-Stefan Bauer
- Institute of Physics and Meteorology, University of Hohenheim, Stuttgart, Germany
- Olaf Kolditz
- Department of Environmental Informatics, Helmholtz Centre for Environmental Research (UFZ), Leipzig, Germany; Faculty of Environmental Sciences, Technical University Dresden, Dresden, Germany
8
Simpao AF, Ahumada LM, Rehman MA. Big data and visual analytics in anaesthesia and health care. Br J Anaesth 2015; 115:350-6. [PMID: 25627395 DOI: 10.1093/bja/aeu552] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 01/13/2023]
Abstract
Advances in computer technology, patient monitoring systems, and electronic health record systems have enabled rapid accumulation of patient data in electronic form (i.e. big data). Organizations such as the Anesthesia Quality Institute and Multicenter Perioperative Outcomes Group have spearheaded large-scale efforts to collect anaesthesia big data for outcomes research and quality improvement. Analytics--the systematic use of data combined with quantitative and qualitative analysis to make decisions--can be applied to big data for quality and performance improvements, such as predictive risk assessment, clinical decision support, and resource management. Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces, and it can facilitate performance of cognitive activities involving big data. Ongoing integration of big data and analytics within anaesthesia and health care will increase demand for anaesthesia professionals who are well versed in both the medical and the information sciences.
Affiliation(s)
- A F Simpao
- Department of Anesthesiology and Critical Care, Perelman School of Medicine at the University of Pennsylvania and the Children's Hospital of Philadelphia, 3401 Civic Center Boulevard, Suite 9329, Philadelphia, PA 19104-4399, USA
- L M Ahumada
- Enterprise Analytics and Reporting, The Children's Hospital of Philadelphia, 1300 Market Street, Room W-8006, Philadelphia, PA 19107-3323, USA
- M A Rehman
- Department of Anesthesiology and Critical Care, Perelman School of Medicine at the University of Pennsylvania and the Children's Hospital of Philadelphia, 3401 Civic Center Boulevard, Suite 9329, Philadelphia, PA 19104-4399, USA
9
Morrison SS, Pyzh R, Jeon MS, Amaro C, Roig FJ, Baker-Austin C, Oliver JD, Gibas CJ. Impact of analytic provenance in genome analysis. BMC Genomics 2014; 15 Suppl 8:S1. [PMID: 25435180 PMCID: PMC4248810 DOI: 10.1186/1471-2164-15-s8-s1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/06/2023]
Abstract
Background: Many computational methods are available for assembly and annotation of newly sequenced microbial genomes. However, when new genomes are reported in the literature, there is frequently very little critical analysis of choices made during the sequence assembly and gene annotation stages. These choices have a direct impact on the biologically relevant products of a genomic analysis - for instance identification of common and differentiating regions among genomes in a comparison, or identification of enriched gene functional categories in a specific strain. Here, we examine the outcomes of different assembly and analysis steps in typical workflows in a comparison among strains of Vibrio vulnificus. Results: Using six recently sequenced strains of V. vulnificus, we demonstrate the "alternate realities" of comparative genomics, and how they depend on the choice of a robust assembly method and accurate ab initio annotation. We apply several popular assemblers for paired-end Illumina data, and three well-regarded ab initio genefinders. We demonstrate significant differences in detected gene overlap among comparative genomics workflows that depend on these two steps. The divergence between workflows, even those using widely adopted methods, is obvious both at the single genome level and when a comparison is performed. In a typical example where multiple workflows are applied to the strain V. vulnificus CECT 4606, a workflow that uses the Velvet assembler and Glimmer gene finder identifies 3275 gene features, while a workflow that uses the Velvet assembler and the RAST annotation system identifies 5011 gene features. Only 3171 genes are identical between both workflows. When we examine 9 assembly/annotation workflow scenarios as input to a three-way genome comparison, differentiating genes and even differentially represented functional categories change significantly from scenario to scenario. Conclusions: Inconsistencies in genomic analysis can arise depending on the choices that are made during the assembly and annotation stages. These inconsistencies can have a significant impact on the interpretation of an individual genome's content. The impact is multiplied when comparison of content and function among multiple genomes is the goal. Tracking the analysis history of the data - its analytic provenance - is critical for reproducible analysis of genome data.
10
Morrison SS, Williams T, Cain A, Froelich B, Taylor C, Baker-Austin C, Verner-Jeffreys D, Hartnell R, Oliver JD, Gibas CJ. Pyrosequencing-based comparative genome analysis of Vibrio vulnificus environmental isolates. PLoS One 2012; 7:e37553. [PMID: 22662170 PMCID: PMC3360785 DOI: 10.1371/journal.pone.0037553] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Received: 07/18/2011] [Accepted: 04/25/2012] [Indexed: 01/22/2023]
Abstract
Between 1996 and 2006, the US Centers for Disease Control reported that the only category of food-borne infections increasing in frequency were those caused by members of the genus Vibrio. The Gram-negative bacterium Vibrio vulnificus is a ubiquitous inhabitant of estuarine waters, and is the number one cause of seafood-related deaths in the US. Many V. vulnificus isolates have been studied, and it has been shown that two genetically distinct subtypes, distinguished by 16S rDNA and other gene polymorphisms, are associated predominantly with either environmental or clinical isolation. While local genetic differences between the subtypes have been probed, only the genomes of clinical isolates have so far been completely sequenced. In order to better understand V. vulnificus as an agent of disease and to identify the molecular components of its virulence mechanisms, we have completed whole genome shotgun sequencing of three diverse environmental genotypes using a pyrosequencing approach. V. vulnificus strain JY1305 was sequenced to a depth of 33×, and strains E64MW and JY1701 were sequenced to lesser depth, covering approximately 99.9% of each genome. We have performed a comparative analysis of these sequences against the previously published sequences of three V. vulnificus clinical isolates. We find that the genome of V. vulnificus is dynamic, with 1.27% of genes in the C-genotype genomes not found in the E-genotype genomes. We identified key genes that differentiate between the genomes of the clinical and environmental genotypes. 167 genes were found to be specifically associated with environmental genotypes and 278 genes with clinical genotypes. Genes specific to the clinical strains include components of sialic acid catabolism, mannitol fermentation, and a component of a Type IV secretory pathway VirB4, as well as several other genes with potential significance for human virulence. Genes specific to environmental strains included several that may have implications for the balance between self-preservation under stress and nutritional competence.
Affiliation(s)
- Shatavia S. Morrison
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Tiffany Williams
- Department of Biology, the University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Aurora Cain
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Brett Froelich
- Department of Biology, the University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Casey Taylor
- Department of Biology, the University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Craig Baker-Austin
- Centre for Environment, Fisheries, and Aquaculture Science, Weymouth, Dorset, United Kingdom
- David Verner-Jeffreys
- Centre for Environment, Fisheries, and Aquaculture Science, Weymouth, Dorset, United Kingdom
- Rachel Hartnell
- Centre for Environment, Fisheries, and Aquaculture Science, Weymouth, Dorset, United Kingdom
- James D. Oliver
- Department of Biology, the University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Cynthia J. Gibas
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America