1
Aplakidou E, Vergoulidis N, Chasapi M, Venetsianou NK, Kokoli M, Panagiotopoulou E, Iliopoulos I, Karatzas E, Pafilis E, Georgakopoulos-Soares I, Kyrpides NC, Pavlopoulos GA, Baltoumas FA. Visualizing metagenomic and metatranscriptomic data: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2011-2033. [PMID: 38765606 PMCID: PMC11101950 DOI: 10.1016/j.csbj.2024.04.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
The fields of metagenomics and metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and discusses their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.
Affiliation(s)
- Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nikolaos Vergoulidis
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Chasapi
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nefeli K. Venetsianou
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Kokoli
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Eleni Panagiotopoulou
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Ioannis Iliopoulos
- Department of Basic Sciences, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikos C. Kyrpides
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Center of New Biotechnologies & Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Greece
- Hellenic Army Academy, 16673 Vari, Greece
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
2
Pangenomics in Microbial and Crop Research: Progress, Applications, and Perspectives. Genes (Basel) 2022; 13:genes13040598. [PMID: 35456404 PMCID: PMC9031676 DOI: 10.3390/genes13040598] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/14/2022] [Revised: 03/16/2022] [Accepted: 03/25/2022] [Indexed: 01/25/2023]
Abstract
Advances in sequencing technologies and bioinformatics tools have fueled a renewed interest in whole genome sequencing efforts in many organisms. The growing availability of multiple genome sequences has advanced our understanding of within-species diversity, in the form of a pangenome. Pangenomics has opened new avenues for future research, such as dissecting complex molecular mechanisms and mapping genomes with greater confidence. To comprehensively capture the genetic diversity for improving plant performance, the pangenome concept is further extended from the species to the genus level by the inclusion of wild species, constituting a super-pangenome. Characterization of the pangenome has implications for both basic and applied research. The concept of the pangenome has transformed the way biological questions are addressed. From understanding evolution and adaptation, elucidating host–pathogen interactions, and finding novel genes or breeding targets for crop improvement, to designing effective vaccines for human prophylaxis, the increasing availability of pangenomes has revolutionized several aspects of biological research. The future availability of high-resolution pangenomes based on reference-level, near-complete genome assemblies would greatly improve our ability to address complex biological problems.
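To make the core/dispensable distinction behind the pangenome concept concrete, here is a minimal Python sketch. It assumes each genome has already been reduced to a set of gene-family identifiers (the genome and gene names are hypothetical) and uses the common textbook definitions of core, accessory and unique genes; it is an illustration, not a method described in this review.

```python
from functools import reduce


def partition_pangenome(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split a pangenome into core, accessory and unique gene sets.

    core:      genes present in every genome
    unique:    genes present in exactly one genome
    accessory: remaining genes (shared by some, but not all, genomes)
    Assumes at least two genomes so that the three classes are disjoint.
    """
    sets = list(genomes.values())
    pan = set().union(*sets)
    core = reduce(set.intersection, sets)
    counts = {gene: sum(gene in s for s in sets) for gene in pan}
    unique = {gene for gene, n in counts.items() if n == 1}
    accessory = pan - core - unique
    return {"pan": pan, "core": core, "accessory": accessory, "unique": unique}


# Hypothetical gene-family presence sets for three genomes
genomes = {
    "genome_A": {"gyrA", "recA", "nptII", "hypo1"},
    "genome_B": {"gyrA", "recA", "nptII"},
    "genome_C": {"gyrA", "recA", "cas9"},
}
result = partition_pangenome(genomes)
print({k: sorted(v) for k, v in result.items()})
```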
3
Durant É, Sabot F, Conte M, Rouard M. Panache: a Web Browser-Based Viewer for Linearized Pangenomes. Bioinformatics 2021; 37:4556-4558. [PMID: 34601567 PMCID: PMC8652104 DOI: 10.1093/bioinformatics/btab688] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/03/2021] [Revised: 07/28/2021] [Accepted: 09/24/2021] [Indexed: 11/15/2022]
Abstract
Motivation: Pangenomics has evolved since its first applications to bacteria, extending from the study of the genes of a given population to the study of all of its available sequences. While multiple methods are being developed to construct pangenomes in eukaryotic species, there is still a gap in efficient and user-friendly visualization tools. Emerging graph representations come with their own challenges, and linear representations remain a suitable option for user-friendliness. Results: We introduce Panache, a tool for the visualization and exploration of linear representations of gene-based and sequence-based pangenomes. It uses a layout similar to genome browsers to display presence/absence variations and additional tracks along a linear axis from a pangenomic perspective. Availability and implementation: Panache is available at github.com/SouthGreenPlatform/panache under the MIT License.
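Panache renders presence/absence variation along a linear axis in the browser. As a rough illustration of that layout idea only, the Python sketch below orders hypothetical pangenome blocks by coordinate and prints one text track per genome. The `Block` type, its fields and the `#`/`.` encoding are assumptions made for this example, not Panache's input format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A pangenome block placed on a shared linear coordinate axis (hypothetical model)."""
    name: str
    start: int
    end: int
    present_in: frozenset[str]


def pav_tracks(blocks: list[Block], genomes: list[str]) -> dict[str, str]:
    """Render one presence/absence track per genome, with blocks ordered by start.

    '#' marks a block present in the genome and '.' marks an absent block.
    """
    ordered = sorted(blocks, key=lambda b: b.start)
    return {
        g: "".join("#" if g in b.present_in else "." for b in ordered)
        for g in genomes
    }


def block_frequencies(blocks: list[Block], genomes: list[str]) -> dict[str, float]:
    """Fraction of genomes carrying each block (1.0 means the block is core)."""
    return {b.name: len(b.present_in & set(genomes)) / len(genomes) for b in blocks}


blocks = [
    Block("b2", 100, 199, frozenset({"g1", "g2"})),
    Block("b1", 0, 99, frozenset({"g1", "g2", "g3"})),
    Block("b3", 200, 260, frozenset({"g3"})),
]
for genome, track in pav_tracks(blocks, ["g1", "g2", "g3"]).items():
    print(genome, track)
```

Sorting by `start` is what turns an unordered set of blocks into a browser-like linear view; the per-block frequency is the kind of value an additional track could display alongside it.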
Affiliation(s)
- Éloi Durant
- DIADE, Univ Montpellier, CIRAD, IRD, Montpellier, 34830, France
- Syngenta Seeds SAS, Saint-Sauveur, 31790, France
- Bioversity International, Parc Scientifique Agropolis II, Montpellier, 34397, France
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, 34398, France
- François Sabot
- DIADE, Univ Montpellier, CIRAD, IRD, Montpellier, 34830, France
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, 34398, France
- Mathieu Rouard
- Bioversity International, Parc Scientifique Agropolis II, Montpellier, 34397, France
4
Ferrés I, Iraola G. An object-oriented framework for evolutionary pangenome analysis. Cell Rep Methods 2021; 1:100085. [PMID: 35474671 PMCID: PMC9017228 DOI: 10.1016/j.crmeth.2021.100085] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/09/2021] [Revised: 06/04/2021] [Accepted: 08/25/2021] [Indexed: 05/13/2023]
Abstract
Pangenome analysis is fundamental to exploring the molecular evolution occurring in bacterial populations. Here, we introduce Pagoo, an R framework that enables straightforward handling of pangenome data. The encapsulated nature of Pagoo allows complex molecular and phenotypic information to be stored using an object-oriented approach. This makes it easy to move back and forth through the data within a single programming environment and to save any stage of the analysis (including the raw data) in a single file, making it shareable and reproducible. Pagoo provides tools to query, subset, compare, visualize, and perform statistical analyses, in concert with other microbial genomics packages available in the R ecosystem. As working examples, we used 1,000 Escherichia coli genomes to show that Pagoo is scalable, and a global dataset of Campylobacter fetus genomes to identify evolutionary patterns and genomic markers of host adaptation in this pathogen.
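The object-oriented design described above, in which data and analysis methods travel together, can be sketched in Python (Pagoo itself is an R framework). Everything below, including the class name, the cluster-to-organisms mapping and the `select_by` method, is a hypothetical illustration and not Pagoo's interface.

```python
from dataclasses import dataclass, field


@dataclass
class PangenomeObject:
    """Toy container that keeps gene clusters and organism metadata together.

    Loosely inspired by the encapsulation idea described for Pagoo; the class,
    field and method names are hypothetical, not part of Pagoo's R API.
    """
    clusters: dict[str, set[str]]  # cluster id -> organisms carrying it
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)  # organism -> attributes

    @property
    def organisms(self) -> set[str]:
        return set().union(*self.clusters.values()) | set(self.metadata)

    def core_clusters(self) -> set[str]:
        """Clusters present in every organism held by this object."""
        orgs = self.organisms
        return {c for c, members in self.clusters.items() if members >= orgs}

    def subset(self, keep) -> "PangenomeObject":
        """Return a new object restricted to the given organisms."""
        keep = set(keep)
        clusters = {c: m & keep for c, m in self.clusters.items() if m & keep}
        metadata = {o: a for o, a in self.metadata.items() if o in keep}
        return PangenomeObject(clusters, metadata)

    def select_by(self, key: str, value: str) -> "PangenomeObject":
        """Subset the organisms whose metadata attribute `key` equals `value`."""
        return self.subset(o for o, a in self.metadata.items() if a.get(key) == value)


pg = PangenomeObject(
    clusters={"c1": {"o1", "o2", "o3"}, "c2": {"o1", "o2"}, "c3": {"o3"}},
    metadata={"o1": {"host": "bovine"}, "o2": {"host": "bovine"}, "o3": {"host": "human"}},
)
bovine = pg.select_by("host", "bovine")
print(sorted(pg.core_clusters()), sorted(bovine.core_clusters()))
```

Because `subset` returns a new object instead of modifying the original, each intermediate stage of an analysis remains available, which mirrors the back-and-forth workflow the abstract describes.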
Affiliation(s)
- Ignacio Ferrés
- Microbial Genomics Laboratory, Institut Pasteur Montevideo, Montevideo, Uruguay
- Center for Innovation in Epidemiological Surveillance, Institut Pasteur Montevideo, Montevideo, Uruguay
- Gregorio Iraola
- Microbial Genomics Laboratory, Institut Pasteur Montevideo, Montevideo, Uruguay
- Center for Innovation in Epidemiological Surveillance, Institut Pasteur Montevideo, Montevideo, Uruguay
- Wellcome Sanger Institute, Hinxton, UK
- Center for Integrative Biology, Universidad Mayor, Santiago de Chile, Chile
5
Razzaq A, Kaur P, Akhter N, Wani SH, Saleem F. Next-Generation Breeding Strategies for Climate-Ready Crops. Front Plant Sci 2021; 12:620420. [PMID: 34367194 PMCID: PMC8336580 DOI: 10.3389/fpls.2021.620420] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 10/22/2020] [Accepted: 06/14/2021] [Indexed: 05/17/2023]
Abstract
Climate change threatens global food security by reducing crop productivity around the globe. Food security is a matter of concern for stakeholders and policymakers, as the global population is predicted to surpass 10 billion in the coming years. Crop improvement via modern breeding techniques, together with efficient agronomic practices, innovations in microbiome applications, and the exploitation of natural variation in underutilized crops, is an excellent way forward to meet future food requirements. In this review, we describe the next-generation breeding tools that can be used to increase crop production by developing climate-resilient superior genotypes to cope with future challenges to global food security. Recent innovations in genomic-assisted breeding (GAB) strategies allow the construction of highly annotated crop pan-genomes that give a snapshot of the full landscape of genetic diversity (GD) and recapture the lost gene repertoire of a species. Pan-genomes provide new platforms to exploit these unique genes or genetic variations for optimizing breeding programs. The advent of next-generation clustered regularly interspaced short palindromic repeat/CRISPR-associated (CRISPR/Cas) systems, such as prime editing, base editing, and de novo domestication, has established genome editing as a revitalized approach to crop improvement. Moreover, the availability of versatile Cas orthologs, including Cas9, Cas12, Cas13, and Cas14, has improved editing efficiency. CRISPR/Cas systems now have numerous applications in crop research and have been used successfully to edit major crops for resistance to abiotic and biotic stresses. By adopting high-throughput phenotyping approaches and big data analytics tools such as artificial intelligence (AI) and machine learning (ML), agriculture is heading toward automation and digitalization. The integration of speed breeding with genomic and phenomic tools can allow rapid gene identification and ultimately accelerate crop improvement programs. In addition, the integration of next-generation multidisciplinary breeding platforms can open exciting avenues for developing climate-ready crops toward global food security.
Affiliation(s)
- Ali Razzaq
- Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad, Pakistan
- Parwinder Kaur
- UWA School of Agriculture and Environment, The University of Western Australia, Perth, WA, Australia
- Naheed Akhter
- College of Allied Health Professional, Faculty of Medical Sciences, Government College University Faisalabad, Faisalabad, Pakistan
- Shabir Hussain Wani
- Mountain Research Center for Field Crops, Khudwani, Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir, Srinagar, India
- Fozia Saleem
- Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad, Pakistan
6
viromeBrowser: A Shiny App for Browsing Virome Sequencing Analysis Results. Viruses 2021; 13:v13030437. [PMID: 33803225 PMCID: PMC7999463 DOI: 10.3390/v13030437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2020] [Revised: 02/26/2021] [Accepted: 03/04/2021] [Indexed: 11/16/2022]
Abstract
Experiments that generate complex virome sequencing data remain difficult to explore and unpack for scientists without a background in data science. Processing raw sequencing data with high-throughput sequencing workflows usually yields contigs in FASTA format coupled to an annotation file linking the contigs to a reference sequence or taxonomic identifier. The next step is to compare the viromes of different samples based on the metadata of the experimental setup and to extract sequences of interest for use in subsequent analyses. The viromeBrowser is an application written in the open-source R Shiny framework that was developed in collaboration with end users and focuses on three common data analysis steps. First, the application allows interactive filtering of annotations by default or custom quality thresholds. Next, multiple samples can be visualized to facilitate comparison of contig annotations based on sample-specific metadata values. Last, the application makes it easy for users to extract sequences of interest in FASTA format. With the interactive features of the viromeBrowser, we aim to enable scientists without a data science background to compare and extract annotation data and sequences from virome sequencing analysis results.
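viromeBrowser is an R Shiny application; the Python sketch below only mirrors two of the steps listed above (threshold filtering of annotations, then FASTA extraction) on plain data structures. The column names, the default thresholds and the toy annotation table are assumptions for illustration, and the toy sequences are deliberately shorter than the annotated lengths to keep the output readable.

```python
from typing import Iterable


def filter_annotations(rows: list[dict], min_identity: float = 90.0, min_length: int = 500) -> list[dict]:
    """Keep annotation rows that pass both quality thresholds (defaults are arbitrary)."""
    return [r for r in rows if r["identity"] >= min_identity and r["length"] >= min_length]


def to_fasta(contigs: dict[str, str], ids: Iterable[str], width: int = 60) -> str:
    """Format the selected contig sequences as FASTA text, wrapping lines at `width`."""
    lines = []
    for cid in ids:
        seq = contigs[cid]
        lines.append(f">{cid}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


# Hypothetical annotation table; toy sequences are shortened for display
rows = [
    {"contig": "c1", "sample": "S1", "taxon": "Norovirus", "identity": 97.5, "length": 7500},
    {"contig": "c2", "sample": "S1", "taxon": "Astrovirus", "identity": 82.0, "length": 3000},
    {"contig": "c3", "sample": "S2", "taxon": "Norovirus", "identity": 99.0, "length": 300},
]
contigs = {"c1": "ACGTACGTACGT", "c2": "TTGACC", "c3": "GGC"}

kept = filter_annotations(rows)
print(to_fasta(contigs, [r["contig"] for r in kept], width=5), end="")
```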
7
Anani H, Zgheib R, Hasni I, Raoult D, Fournier PE. Interest of bacterial pangenome analyses in clinical microbiology. Microb Pathog 2020; 149:104275. [PMID: 32562810 DOI: 10.1016/j.micpath.2020.104275] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/01/2020] [Revised: 05/22/2020] [Accepted: 05/25/2020] [Indexed: 12/12/2022]
Abstract
Thanks to progress and decreasing costs in genome sequencing technologies, more than 250,000 bacterial genomes are currently available in public databases, covering most, if not all, of the major human-associated phylogenetic groups of these microorganisms, whether pathogenic or not. Moreover, for many species, sequences from several strains are available, making it possible to evaluate their genetic diversity and study their evolution. The significant cost reduction of bacterial whole genome sequencing, together with the rapid increase in the number of available bacterial genomes, has also prompted the development of pangenomic software tools. The study of the bacterial pangenome has many applications in clinical microbiology. It can unveil the pathogenic potential of bacteria and their ability to resist antimicrobials, as well as identify specific sequences and predict antigenic epitopes that allow molecular or serologic assays and vaccines to be designed. Bacterial pangenome analysis constitutes a powerful method for understanding the history of human bacteria and relating these findings to diagnosis in clinical microbiology laboratories in order to optimize patient management.
Affiliation(s)
- Hussein Anani
- Aix Marseille Univ, Institut de Recherche pour le Développement (IRD), Service de Santé des Armées, AP-HM, UMR Vecteurs Infections Tropicales et Méditerranéennes (VITROME), Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France; Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
- Rita Zgheib
- Aix Marseille Univ, Institut de Recherche pour le Développement (IRD), Service de Santé des Armées, AP-HM, UMR Vecteurs Infections Tropicales et Méditerranéennes (VITROME), Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France; Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
- Issam Hasni
- Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France; Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), UMR Microbes Evolution Phylogeny and Infections (MEPHI), Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, France
- Didier Raoult
- Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France; Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), UMR Microbes Evolution Phylogeny and Infections (MEPHI), Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, France; Special Infectious Agents Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
- Pierre-Edouard Fournier
- Aix Marseille Univ, Institut de Recherche pour le Développement (IRD), Service de Santé des Armées, AP-HM, UMR Vecteurs Infections Tropicales et Méditerranéennes (VITROME), Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France; Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
8
Wagner J, Chelaru F, Kancherla J, Paulson JN, Zhang A, Felix V, Mahurkar A, Elmqvist N, Corrada Bravo H. Metaviz: interactive statistical and visual analysis of metagenomic data. Nucleic Acids Res 2019. [PMID: 29529268 PMCID: PMC5887897 DOI: 10.1093/nar/gky136] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/30/2022]
Abstract
Large studies profiling microbial communities and their association with healthy or disease phenotypes are now commonplace. Processed data from many of these studies are publicly available, but significant effort is required for users to effectively organize, explore and integrate them, limiting the utility of these rich data resources. Effective integrative and interactive visual and statistical tools for analyzing many metagenomic samples can greatly increase the value of these data for researchers. We present Metaviz, a tool for interactive exploratory data analysis of annotated microbiome taxonomic community profiles derived from marker gene or whole metagenome shotgun sequencing. Metaviz is uniquely designed to address the challenge of browsing the hierarchical structure of metagenomic data features while rendering visualizations of data values that are dynamically updated in response to user navigation. We use Metaviz to provide the UMD Metagenome Browser web service, allowing users to browse and explore data for more than 7000 microbiomes from published studies. Users can also deploy Metaviz as a web service, or use it to analyze data through the metavizr package to interoperate with state-of-the-art analysis tools available through Bioconductor. Metaviz is free and open source, with the code, documentation and tutorials publicly accessible.
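A core operation behind browsing a feature hierarchy while plotted values update is aggregating leaf-level counts to whatever taxonomic level is currently displayed. A minimal Python sketch follows; it assumes lineages are stored as tuples of rank names and that the counts shown are hypothetical. It is not Metaviz's implementation, which the abstract does not describe.

```python
from collections import defaultdict


def aggregate_to_depth(counts: dict[tuple[str, ...], int], depth: int) -> dict[tuple[str, ...], int]:
    """Sum leaf abundances over the first `depth` ranks of each lineage.

    A shallower depth gives the coarser view shown when a user navigates up the
    taxonomy; a deeper depth restores finer-grained features.
    """
    out: dict[tuple[str, ...], int] = defaultdict(int)
    for lineage, n in counts.items():
        out[lineage[:depth]] += n
    return dict(out)


# Hypothetical leaf-level counts for one sample: (domain, phylum, class) -> reads
counts = {
    ("Bacteria", "Firmicutes", "Clostridia"): 10,
    ("Bacteria", "Firmicutes", "Bacilli"): 5,
    ("Bacteria", "Bacteroidetes", "Bacteroidia"): 7,
}
print(aggregate_to_depth(counts, 2))
```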
Affiliation(s)
- Justin Wagner
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- University of Maryland Institute for Advanced Computer Studies, College Park, MD 20742, USA
- Florin Chelaru
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- University of Maryland Institute for Advanced Computer Studies, College Park, MD 20742, USA
- Jayaram Kancherla
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- University of Maryland Institute for Advanced Computer Studies, College Park, MD 20742, USA
- Joseph N Paulson
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA 02115, USA
- Alexander Zhang
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Victor Felix
- Institute for Genome Sciences, University of Maryland, Baltimore, MD 21201, USA
- Anup Mahurkar
- Institute for Genome Sciences, University of Maryland, Baltimore, MD 21201, USA
- Niklas Elmqvist
- University of Maryland Institute for Advanced Computer Studies, College Park, MD 20742, USA
- College of Information Studies, University of Maryland, College Park, MD 20742, USA
- Human-Computer Interaction Lab, University of Maryland, College Park, MD 20742, USA
- Héctor Corrada Bravo
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- University of Maryland Institute for Advanced Computer Studies, College Park, MD 20742, USA
9
Peng Y, Tang S, Wang D, Zhong H, Jia H, Cai X, Zhang Z, Xiao M, Yang H, Wang J, Kristiansen K, Xu X, Li J. MetaPGN: a pipeline for construction and graphical visualization of annotated pangenome networks. Gigascience 2018; 7:5114262. [PMID: 30277499 PMCID: PMC6251982 DOI: 10.1093/gigascience/giy121] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/26/2018] [Accepted: 09/20/2018] [Indexed: 02/01/2023]
Abstract
Pangenome analyses facilitate the interpretation of the genetic diversity and evolutionary history of a taxon. However, there is an urgent and unmet need for new tools for advanced pangenome construction and visualization, especially for metagenomic data. Here, we present an integrated pipeline, named MetaPGN, for the construction and graphical visualization of pangenome networks from either microbial genomes or metagenomes. Given either isolated genomes or metagenomic assemblies coupled with a reference genome of the targeted taxon, MetaPGN generates a pangenome as a topological network consisting of genes (nodes) and gene-gene genomic adjacencies (edges), whose biological information can be easily updated and retrieved. MetaPGN also includes a self-developed Cytoscape plugin for laying out and interacting with the resulting pangenome network, providing an intuitive and interactive interface for full exploration of genetic diversity. We demonstrate the utility of MetaPGN by constructing Escherichia coli pangenome networks from five E. coli pathogenic strains and 760 human gut microbiomes, revealing extensive genetic diversity of E. coli within both isolates and gut microbial populations. With the ability to extract and visualize the gene contents and gene-gene physical adjacencies of a specific taxon from large-scale metagenomic data, MetaPGN offers advantages for expanding pangenome analysis to uncultured microbial taxa.
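The node/edge model described above can be illustrated with a short Python sketch that turns ordered gene lists (from isolate genomes or metagenomic contigs) into a weighted adjacency network. It assumes genes have already been clustered into shared identifiers; the input names are hypothetical, and the code is not part of MetaPGN, whose own pipeline and Cytoscape plugin handle construction and layout.

```python
from collections import Counter


def build_gene_network(gene_orders: dict[str, list[str]]) -> tuple[Counter, Counter]:
    """Build a gene-adjacency network from ordered gene lists.

    Nodes are gene (family) identifiers; an undirected edge links two genes that
    are neighbours in at least one input sequence. Node and edge weights count
    how many input sequences contain that gene or that adjacency.
    """
    nodes: Counter = Counter()
    edges: Counter = Counter()
    for order in gene_orders.values():
        nodes.update(set(order))
        edges.update({frozenset(pair) for pair in zip(order, order[1:]) if pair[0] != pair[1]})
    return nodes, edges


# Hypothetical gene orders from two isolate genomes and one metagenomic contig
gene_orders = {
    "strain1": ["dnaA", "dnaN", "recF", "gyrB"],
    "strain2": ["dnaA", "dnaN", "stx2", "recF", "gyrB"],
    "mag_contig1": ["dnaN", "recF"],
}
nodes, edges = build_gene_network(gene_orders)
for edge, weight in sorted(edges.items(), key=lambda kv: sorted(kv[0])):
    print(" -- ".join(sorted(edge)), weight)
```

An insertion such as `stx2` between two otherwise adjacent genes shows up as an alternative path (`dnaN -- stx2 -- recF`) next to the direct `dnaN -- recF` edge, which is the kind of variation a network view makes visible.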
Affiliation(s)
- Ye Peng
- School of Biology and Biological Engineering, South China University of Technology, Building B6, 382 Zhonghuan Road East, Guangzhou Higher Education Mega Center, Guangzhou 510006, China
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Shanmei Tang
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Shenzhen Key Laboratory of Human commensal microorganisms and Health Research, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- Dan Wang
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Shenzhen Key Laboratory of Human commensal microorganisms and Health Research, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- Huanzi Zhong
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Shenzhen Key Laboratory of Human commensal microorganisms and Health Research, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen Biocenter, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
- Huijue Jia
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Shenzhen Key Laboratory of Human commensal microorganisms and Health Research, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- Xianghang Cai
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Zhaoxi Zhang
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Minfeng Xiao
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Huanming Yang
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- James D. Watson Institute of Genome Sciences, No. 51, Zhijiang Road, Xihu District, Hangzhou 310058, China
- Jian Wang
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- James D. Watson Institute of Genome Sciences, No. 51, Zhijiang Road, Xihu District, Hangzhou 310058, China
- Karsten Kristiansen
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen Biocenter, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
- Xun Xu
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Junhua Li
- School of Biology and Biological Engineering, South China University of Technology, Building B6, 382 Zhonghuan Road East, Guangzhou Higher Education Mega Center, Guangzhou 510006, China
- BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
- China National GeneBank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
- Shenzhen Key Laboratory of Human commensal microorganisms and Health Research, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian, Shenzhen 518083, China
10
Clarke TH, Brinkac LM, Inman JM, Sutton G, Fouts DE. PanACEA: a bioinformatics tool for the exploration and visualization of bacterial pan-chromosomes. BMC Bioinformatics 2018; 19:246. [PMID: 29945570 PMCID: PMC6020400 DOI: 10.1186/s12859-018-2250-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/08/2017] [Accepted: 06/14/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Bacterial pan-genomes, composed of conserved and variable genes across multiple sequenced bacterial genomes, allow the identification of genomic regions that are phylogenetically discriminating or functionally important. Pan-genomes consist of large amounts of data, which can restrict researchers' ability to locate and analyze these regions. Multiple software packages are available to visualize pan-genomes, but their ability to address these concerns is currently limited because they use only pre-computed data sets, prioritize core over variable gene clusters, or do not account for pan-chromosome positioning in the viewer. RESULTS: We introduce PanACEA (Pan-genome Atlas with Chromosome Explorer and Analyzer), which uses locally computed interactive web pages to view ordered pan-genome data. It consists of multi-tiered, hierarchical display pages that extend from pan-chromosomes to both core and variable regions to single genes. Regions and genes are functionally annotated to allow rapid searching and visual identification of regions of interest, and user-supplied genomic phylogenies and metadata can optionally be incorporated. PanACEA's memory and time requirements are within the capacities of standard laptops. The capability of PanACEA as a research tool is demonstrated by highlighting a variable region important in differentiating strains of Enterobacter hormaechei. CONCLUSIONS: PanACEA can rapidly translate the results of pan-chromosome programs into an intuitive and interactive visual representation. It will enable researchers to visually explore and identify the most biologically interesting regions of the pan-chromosome and to obtain publication-quality images of these regions.
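Moving from a whole pan-chromosome to its core and variable regions, as the hierarchical pages described above do, requires segmenting the ordered list of gene clusters. The Python sketch below shows one simple way to do that; it assumes a precomputed cluster order and per-cluster frequencies, and the `core_cutoff` parameter and toy data are illustrative assumptions, not PanACEA settings.

```python
def variable_regions(order: list[str], freq: dict[str, float], core_cutoff: float = 1.0) -> list[tuple[int, int]]:
    """Group consecutive non-core clusters along an ordered pan-chromosome.

    `freq` maps each cluster to the fraction of genomes carrying it; clusters with
    freq >= core_cutoff are treated as core. Returns inclusive (start, end) index
    pairs, one per variable region.
    """
    regions: list[tuple[int, int]] = []
    start = None
    for i, cluster in enumerate(order):
        if freq[cluster] < core_cutoff:
            if start is None:
                start = i  # a variable region opens here
        elif start is not None:
            regions.append((start, i - 1))  # a core cluster closes the open region
            start = None
    if start is not None:
        regions.append((start, len(order) - 1))  # region runs to the end of the order
    return regions


order = ["c1", "c2", "v1", "v2", "c3", "v3"]
freq = {"c1": 1.0, "c2": 1.0, "v1": 0.4, "v2": 0.6, "c3": 1.0, "v3": 0.2}
print(variable_regions(order, freq))
```

The sketch treats the order as linear, so a variable region spanning the origin of a circular chromosome would be reported as two separate pieces.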
Affiliation(s)
- Lauren M Brinkac
- J. Craig Venter Institute, Rockville, MD, 20850, USA
- Department of Biotechnology and Food Technology, Durban University of Technology, Durban, 4000, South Africa
- Jason M Inman
- J. Craig Venter Institute, Rockville, MD, 20850, USA
11
Pedersen TL. Hierarchical sets: analyzing pangenome structure through scalable set visualizations. Bioinformatics 2017; 33:1604-1612. [PMID: 28130242 PMCID: PMC5447240 DOI: 10.1093/bioinformatics/btx034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/10/2016] [Accepted: 01/24/2017] [Indexed: 12/13/2022]
Abstract
Motivation: The increase in available microbial genome sequences has led to an increase in the size of the pangenomes being analyzed. Current pangenome visualizations are not designed for the pangenome sizes possible today, and new approaches are needed to convert the increase in available information into an increase in knowledge. As the pangenome data structure is essentially a collection of sets, we explore the potential of scalable set visualization as a tool for pangenome analysis. Results: We present a new hierarchical clustering algorithm based on set arithmetic that optimizes the intersection sizes along the branches. The intersection and union sizes along the hierarchy are visualized using a composite dendrogram and icicle plot, which, in a pangenome context, shows the evolution of pangenome and core size along the evolutionary hierarchy. Outlying elements, i.e. elements whose presence pattern does not correspond with the hierarchy, can be visualized using hierarchical edge bundles. When applied to pangenome data, this plot shows putative horizontal gene transfers between the genomes and can highlight relationships between genomes that are not represented by the hierarchy. We illustrate the utility of hierarchical sets by applying it to a pangenome based on 113 Escherichia and Shigella genomes and find that it provides a powerful addition to pangenome analysis. Availability and Implementation: The described clustering algorithm and visualizations are implemented in the hierarchicalSets R package available from CRAN (https://cran.r-project.org/web/packages/hierarchicalSets). Supplementary information: Supplementary data are available at Bioinformatics online.
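To illustrate clustering sets while tracking intersection (core) and union (pangenome) sizes along the hierarchy, here is a deliberately simplified Python sketch that greedily merges the pair of clusters with the largest shared core. It is a stand-in for, not a reimplementation of, the algorithm in the hierarchicalSets package; the genome labels and gene sets are hypothetical.

```python
from itertools import combinations


def greedy_set_hierarchy(sets: dict[str, frozenset[str]]) -> list[tuple[str, int, int]]:
    """Repeatedly merge the two clusters whose shared core is largest.

    Each cluster keeps its intersection ('core') and union ('pan').
    Returns the merge history as (merged_label, core_size, pan_size).
    """
    clusters = {name: (s, s) for name, s in sets.items()}  # name -> (core, pan)
    history = []
    while len(clusters) > 1:
        # Ties are broken by the first pair in sorted-name order
        a, b = max(
            combinations(sorted(clusters), 2),
            key=lambda pair: len(clusters[pair[0]][0] & clusters[pair[1]][0]),
        )
        core = clusters[a][0] & clusters[b][0]
        pan = clusters[a][1] | clusters[b][1]
        label = f"({a},{b})"
        del clusters[a], clusters[b]
        clusters[label] = (core, pan)
        history.append((label, len(core), len(pan)))
    return history


sets = {
    "E1": frozenset({"a", "b", "c", "d", "e"}),
    "E2": frozenset({"a", "b", "c", "d", "f"}),
    "S1": frozenset({"a", "b", "g", "h"}),
}
for step in greedy_set_hierarchy(sets):
    print(step)
```

Reading the history from top to bottom shows the core shrinking and the pangenome growing as more distant genomes join, which is the trend the composite dendrogram and icicle plot are meant to display.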
Affiliation(s)
- Thomas Lin Pedersen
- Department of Systems Biology, Center for Biological Sequence Analysis, The Technical University of Denmark, Building 208, Lyngby, Denmark
- Assays, Culture and Enzymes Division, Chr. Hansen A/S, Hørsholm, Denmark