1. Ritsch M, Cassman NA, Saghaei S, Marz M. Navigating the Landscape: A Comprehensive Review of Current Virus Databases. Viruses 2023; 15:1834. [PMID: 37766241 PMCID: PMC10537806 DOI: 10.3390/v15091834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 08/18/2023] [Accepted: 08/21/2023] [Indexed: 09/29/2023]
Abstract
Viruses are abundant and diverse entities that have important roles in public health, ecology, and agriculture. The identification and surveillance of viruses rely on an understanding of their genome organization, sequences, and replication strategy. Despite technological advancements in sequencing methods, our current understanding of virus diversity remains incomplete, highlighting the need to explore undiscovered viruses. Virus databases play a crucial role in providing access to sequences, annotations and other metadata, and analysis tools for studying viruses. However, there has not been a comprehensive review of virus databases in the last five years. This study aimed to fill this gap by identifying 24 active virus databases and included an extensive evaluation of their content, functionality and compliance with the FAIR principles. In this study, we thoroughly assessed the search capabilities of five database catalogs, which serve as comprehensive repositories housing a diverse array of databases and offering essential metadata. Moreover, we conducted a comprehensive review of different types of errors, encompassing taxonomy, names, missing information, sequences, sequence orientation, and chimeric sequences, with the intention of empowering users to effectively tackle these challenges. We expect this review to aid users in selecting suitable virus databases and other resources, and to help databases in error management and improve their adherence to the FAIR principles. The databases listed here represent the current knowledge of viruses and will help users find databases of interest based on content, functionality, and scope. The use of virus databases is integral to gaining new insights into the biology, evolution, and transmission of viruses, and developing new strategies to manage virus outbreaks and preserve global health.
Affiliation(s)
- Muriel Ritsch
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, 07743 Jena, Germany
- Noriko A. Cassman
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, 07743 Jena, Germany
- Shahram Saghaei
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, 07743 Jena, Germany
- Manja Marz
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany
  - FLI Leibniz Institute for Age Research, 07745 Jena, Germany
2. Chindelevitch L, van Dongen M, Graz H, Pedrotta A, Suresh A, Uplekar S, Jauneikaite E, Wheeler N. Ten simple rules for the sharing of bacterial genotype-Phenotype data on antimicrobial resistance. PLoS Comput Biol 2023; 19:e1011129. [PMID: 37347768 PMCID: PMC10286994 DOI: 10.1371/journal.pcbi.1011129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2023]
Abstract
The increasing availability of high-throughput sequencing (frequently termed next-generation sequencing (NGS)) data has created opportunities to gain deeper insights into the mechanisms of a number of diseases and is already impacting many areas of medicine and public health. The area of infectious diseases stands somewhat apart from other human diseases insofar as the relevant genomic data comes from the microbes rather than their human hosts. A particular concern about the threat of antimicrobial resistance (AMR) has driven the collection and reporting of large-scale datasets containing information from microbial genomes together with antimicrobial susceptibility test (AST) results. Unfortunately, the lack of clear standards or guiding principles for the reporting of such data is hampering the field's advancement. We therefore present our recommendations for the publication and sharing of genotype and phenotype data on AMR, in the form of 10 simple rules. The adoption of these recommendations will enhance AMR data interoperability and help enable its large-scale analyses using computational biology tools, including mathematical modelling and machine learning. We hope that these rules can shed light on often overlooked but nonetheless very necessary aspects of AMR data sharing and enhance the field's ability to address the problems of understanding AMR mechanisms, tracking their emergence and spread in populations, and predicting microbial susceptibility to antimicrobials for diagnostic purposes.
Affiliation(s)
- Leonid Chindelevitch
  - MRC Centre for Global Infectious Disease Analysis, Imperial College, London, England, United Kingdom
- Anita Suresh
  - FIND, the global alliance for diagnostics, Geneva, Switzerland
- Swapna Uplekar
  - FIND, the global alliance for diagnostics, Geneva, Switzerland
- Elita Jauneikaite
  - MRC Centre for Global Infectious Disease Analysis, Imperial College, London, England, United Kingdom
  - NIHR HPRU in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College, London, England, United Kingdom
- Nicole Wheeler
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, England, United Kingdom
3. Mukherjee S, Stamatis D, Li C, Ovchinnikova G, Bertsch J, Sundaramurthi J, Kandimalla M, Nicolopoulos P, Favognano A, Chen IM, Kyrpides N, Reddy TBK. Twenty-five years of Genomes OnLine Database (GOLD): data updates and new features in v.9. Nucleic Acids Res 2023; 51:D957-D963. [PMID: 36318257 PMCID: PMC9825498 DOI: 10.1093/nar/gkac974] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 09/13/2022] [Revised: 10/05/2022] [Accepted: 10/16/2022] [Indexed: 01/09/2023]
Abstract
The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute (DOE-JGI) continues to maintain its role as one of the flagship genomic metadata repositories of the world. The ever-increasing number of projects and metadata are freely available to the user community world-wide. GOLD's metadata is consumed by scientists and remains an important source for large-scale comparative genomics analysis initiatives. Encouraged by this active user engagement and growth, GOLD has continued to add new components and capabilities. The new features such as a public Application Programming Interface (API) and Ecosystem landing page as well as the growth of different entities in this current GOLD v.9 edition are described in detail in this manuscript.
Affiliation(s)
- Supratim Mukherjee
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Dimitri Stamatis
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Cindy Tianqing Li
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Galina Ovchinnikova
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jon Bertsch
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Mahathi Kandimalla
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Paul A Nicolopoulos
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Alessandro Favognano
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- I-Min A Chen
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Nikos C Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- T B K Reddy
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
4. Arita M, Karsch-Mizrachi I, Cochrane G. The international nucleotide sequence database collaboration. Nucleic Acids Res 2021; 49:D121-D124. [PMID: 33166387 PMCID: PMC7778961 DOI: 10.1093/nar/gkaa967] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 09/14/2020] [Revised: 10/07/2020] [Accepted: 10/09/2020] [Indexed: 12/20/2022]
Abstract
The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been the core infrastructure for collecting and providing nucleotide sequence data and metadata for >30 years. Three partner organizations, the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA have been collaboratively maintaining the INSDC for the benefit of not only science but all types of community worldwide.
Affiliation(s)
- Masanori Arita
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Ilene Karsch-Mizrachi
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Guy Cochrane
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
5. Chen IMA, Chu K, Palaniappan K, Ratner A, Huang J, Huntemann M, Hajek P, Ritter S, Varghese N, Seshadri R, Roux S, Woyke T, Eloe-Fadrosh EA, Ivanova NN, Kyrpides N. The IMG/M data management and analysis system v.6.0: new tools and advanced capabilities. Nucleic Acids Res 2021; 49:D751-D763. [PMID: 33119741 PMCID: PMC7778900 DOI: 10.1093/nar/gkaa939] [Citation(s) in RCA: 266] [Impact Index Per Article: 88.7] [Received: 09/09/2020] [Revised: 10/04/2020] [Accepted: 10/07/2020] [Indexed: 12/22/2022]
Abstract
The Integrated Microbial Genomes & Microbiomes system (IMG/M: https://img.jgi.doe.gov/m/) contains annotated isolate genome and metagenome datasets sequenced at the DOE's Joint Genome Institute (JGI), submitted by external users, or imported from public sources such as NCBI. IMG v 6.0 includes advanced search functions and a new tool for statistical analysis of mixed sets of genomes and metagenome bins. The new IMG web user interface also has a new Help page with additional documentation and webinar tutorials to help users better understand how to use various IMG functions and tools for their research. New datasets have been processed with the prokaryotic annotation pipeline v.5, which includes extended protein family assignments.
Affiliation(s)
- I-Min A Chen
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Ken Chu
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Krishnaveni Palaniappan
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Anna Ratner
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Jinghua Huang
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Marcel Huntemann
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Patrick Hajek
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Stephan Ritter
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Neha Varghese
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Rekha Seshadri
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Simon Roux
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Tanja Woyke
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Emiley A Eloe-Fadrosh
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Natalia N Ivanova
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Nikos C Kyrpides
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
6. Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Sundaramurthi J, Lee J, Kandimalla M, Chen IMA, Kyrpides NC, Reddy TBK. Genomes OnLine Database (GOLD) v.8: overview and updates. Nucleic Acids Res 2021; 49:D723-D733. [PMID: 33152092 PMCID: PMC7778979 DOI: 10.1093/nar/gkaa983] [Citation(s) in RCA: 109] [Impact Index Per Article: 36.3] [Received: 09/14/2020] [Revised: 10/08/2020] [Accepted: 10/19/2020] [Indexed: 12/28/2022]
Abstract
The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) is a manually curated, daily updated collection of genome projects and their metadata accumulated from around the world. The current version of the database includes over 1.17 million entries organized broadly into Studies (45 770), Organisms (387 382) or Biosamples (101 207), Sequencing Projects (355 364) and Analysis Projects (283 481). These four levels contain over 600 metadata fields, which includes 76 controlled vocabulary (CV) tables containing 3873 terms. GOLD provides an interactive web user interface for browsing and searching by a wide range of project and metadata fields. Users can enter details about their own projects in GOLD, which acts as a gatekeeper to ensure that metadata is accurately documented before submitting sequence information to the Integrated Microbial Genomes (IMG) system for analysis. In order to maintain a reference dataset for use by members of the scientific community, GOLD also imports projects from public repositories such as GenBank and SRA. The current status of the database, along with recent updates and improvements are described in this manuscript.
Affiliation(s)
- Supratim Mukherjee
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Dimitri Stamatis
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jon Bertsch
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Galina Ovchinnikova
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Janey Lee
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Mahathi Kandimalla
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- I-Min A Chen
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Nikos C Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- T B K Reddy
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
7. Muturi SM, Muthui LW, Njogu PM, Onguso JM, Wachira FN, Opiyo SO, Pelle R. Metagenomics survey unravels diversity of biogas microbiomes with potential to enhance productivity in Kenya. PLoS One 2021; 16:e0244755. [PMID: 33395690 PMCID: PMC7781671 DOI: 10.1371/journal.pone.0244755] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/01/2020] [Accepted: 12/16/2020] [Indexed: 12/27/2022]
Abstract
The obstacle to optimal utilization of biogas technology is poor understanding of biogas microbiomes diversities over a wide geographical coverage. We performed random shotgun sequencing on twelve environmental samples. Randomized complete block design was utilized to assign the twelve treatments to four blocks, within eastern and central regions of Kenya. We obtained 42 million paired-end reads that were annotated against sixteen reference databases using two ENVO ontologies, prior to β-diversity studies. We identified 37 phyla, 65 classes and 132 orders. Bacteria dominated and comprised 28 phyla, 42 classes and 92 orders, conveying substrate's versatility in the treatments. Though Fungi and Archaea comprised 5 phyla, the Fungi were richer, suggesting the importance of hydrolysis and fermentation in biogas production. High β-diversity within the taxa was largely linked to communities' metabolic capabilities. Clostridiales and Bacteroidales, the most prevalent guilds, metabolize organic macromolecules. The identified Cytophagales, Alteromonadales, Flavobacteriales, Fusobacteriales, Deferribacterales, Elusimicrobiales, Chlamydiales, Synergistales, to mention but a few, also catabolize macromolecules into smaller substrates to conserve energy. Furthermore, δ-Proteobacteria, Gloeobacteria and Clostridia affiliates syntrophically regulate PH2 and reduce metal to provide reducing equivalents. Methanomicrobiales and other Methanomicrobia species were the most prevalent Archaea, converting formate, CO2(g), acetate and methylated substrates into CH4(g). Thermococci, Thermoplasmata and Thermoprotei were among the sulfur and other metal reducing Archaea that contributed to redox balancing and other metabolism within treatments. Eukaryotes, mainly fungi, were the least abundant guild, comprising largely Ascomycota and Basidiomycota species. Chytridiomycetes, Blastocladiomycetes and Mortierellomycetes were among the rare species, suggesting their metabolic and substrate limitations. Generally, we observed that environmental and treatment perturbations influenced communities' abundance, β-diversity and reactor performance largely through stochastic effect. Understanding diversity of biogas microbiomes over wide environmental variables and its productivity provided insights into better management strategies that ameliorate biochemical limitations to effective biogas production.
Affiliation(s)
- Samuel Mwangangi Muturi
  - Department of Biological Sciences, University of Eldoret, Eldoret, Kenya
  - Institute for Biotechnology Research, Jomo Kenyatta University of Agriculture and Technology, Juja, Kenya
- Lucy Wangui Muthui
  - Biosciences Eastern and Central Africa—International Livestock Research Institute (BecA-ILRI) Hub, Nairobi, Kenya
- Paul Mwangi Njogu
  - Institute for Energy and Environmental Technology, Jomo Kenyatta University of Agriculture and Technology, Juja, Kenya
- Justus Mong’are Onguso
  - Institute for Biotechnology Research, Jomo Kenyatta University of Agriculture and Technology, Juja, Kenya
- Stephen Obol Opiyo
  - OARDC, Molecular and Cellular Imaging Center-Columbus, Ohio State University, Columbus, Ohio, United States of America
  - The University of Sacred Heart, Gulu, Uganda
- Roger Pelle
  - Biosciences Eastern and Central Africa—International Livestock Research Institute (BecA-ILRI) Hub, Nairobi, Kenya
8. Chen IMA, Chu K, Palaniappan K, Pillay M, Ratner A, Huang J, Huntemann M, Varghese N, White JR, Seshadri R, Smirnova T, Kirton E, Jungbluth SP, Woyke T, Eloe-Fadrosh EA, Ivanova NN, Kyrpides NC. IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes. Nucleic Acids Res 2020; 47:D666-D677. [PMID: 30289528 PMCID: PMC6323987 DOI: 10.1093/nar/gky901] [Citation(s) in RCA: 556] [Impact Index Per Article: 139.0] [Received: 09/05/2018] [Accepted: 09/24/2018] [Indexed: 11/12/2022]
Abstract
The Integrated Microbial Genomes & Microbiomes system v.5.0 (IMG/M: https://img.jgi.doe.gov/m/) contains annotated datasets categorized into: archaea, bacteria, eukarya, plasmids, viruses, genome fragments, metagenomes, cell enrichments, single particle sorts, and metatranscriptomes. Source datasets include those generated by the DOE's Joint Genome Institute (JGI), submitted by external scientists, or collected from public sequence data archives such as NCBI. All submissions are typically processed through the IMG annotation pipeline and then loaded into the IMG data warehouse. IMG's web user interface provides a variety of analytical and visualization tools for comparative analysis of isolate genomes and metagenomes in IMG. IMG/M allows open access to all public genomes in the IMG data warehouse, while its expert review (ER) system (IMG/MER: https://img.jgi.doe.gov/mer/) allows registered users to access their private genomes and to store their private datasets in workspace for sharing and for further analysis. IMG/M data content has grown by 60% since the last report published in the 2017 NAR Database Issue. IMG/M v.5.0 has a new and more powerful genome search feature, new statistical tools, and supports metagenome binning.
Affiliation(s)
- I-Min A Chen
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Ken Chu
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Krishna Palaniappan
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Manoj Pillay
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Anna Ratner
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Jinghua Huang
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Marcel Huntemann
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Neha Varghese
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Rekha Seshadri
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Tatyana Smirnova
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Edward Kirton
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Sean P Jungbluth
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Tanja Woyke
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Emiley A Eloe-Fadrosh
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Natalia N Ivanova
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Nikos C Kyrpides
  - Department of Energy, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
9. Gonzalez-Beltran AN, Masuzzo P, Ampe C, Bakker GJ, Besson S, Eibl RH, Friedl P, Gunzer M, Kittisopikul M, Dévédec SEL, Leo S, Moore J, Paran Y, Prilusky J, Rocca-Serra P, Roudot P, Schuster M, Sergeant G, Strömblad S, Swedlow JR, van Erp M, Van Troys M, Zaritsky A, Sansone SA, Martens L. Community standards for open cell migration data. Gigascience 2020; 9:giaa041. [PMID: 32396199 PMCID: PMC7317087 DOI: 10.1093/gigascience/giaa041] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/26/2019] [Revised: 04/02/2020] [Accepted: 04/02/2020] [Indexed: 01/08/2023]
Abstract
Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited owing to the diversity of experimental protocols and non-standardized output formats. In addition, typically the datasets are not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR) will enable meta-analysis, data integration, and data mining. Standardized data formats and controlled vocabularies are essential for building a suitable infrastructure for that purpose but are not available in the cell migration domain. We here present standardization efforts by the Cell Migration Standardisation Organisation (CMSO), an open community-driven organization to facilitate the development of standards for cell migration data. This work will foster the development of improved algorithms and tools and enable secondary analysis of public datasets, ultimately unlocking new knowledge of the complex biological process of cell migration.
Affiliation(s)
- Alejandra N Gonzalez-Beltran
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford OX1 3QG, Oxford, UK
- Paola Masuzzo
  - VIB-UGent Center for Medical Biotechnology, VIB, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Institute for Globally Distributed Open Research and Education (IGDORE), Kabupaten Gianyar, Bali 80571, Indonesia
- Christophe Ampe
  - Department of Biomolecular Medicine, Ghent University, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
- Gert-Jan Bakker
  - Department of Cell Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein 28 6525 GA Nijmegen, The Netherlands
- Sébastien Besson
  - Centre for Gene Regulation & Expression & Division of Computational Biology, University of Dundee, School of Life Sciences, Dow St Dundee DD1 5EH, Scotland, UK
- Robert H Eibl
  - German Cancer Research Center, DKFZ Alumni Association, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Peter Friedl
  - Department of Cell Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein 28 6525 GA Nijmegen, The Netherlands
  - David H. Koch Center for Applied Genitourinary Medicine, UT MD Anderson Cancer Center, 6767 Bertner Ave, Mitchell Basic Science Research Building, 77030 Houston, TX, USA
  - Cancer Genomics Center, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands
- Matthias Gunzer
  - Institute for Experimental Immunology and Imaging, University Hospital, University Duisburg-Essen, Universitätsstr. 2, 45141 Essen, Germany
  - Leibniz Institute for Analytical Sciences, ISAS, Bunsen-Kirchhoff-Straße 11, 44139 Dortmund, Germany
- Mark Kittisopikul
  - Department of Biophysics, UT Southwestern Medical Center, 5323 Harry Hines Blvd. Dallas, TX 75390, USA
  - Department of Cell and Developmental Biology, Feinberg School of Medicine, Northwestern University, 303 E. Chicago Ave, Chicago, IL 60611, USA
- Sylvia E Le Dévédec
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, PO box 9502 2300 RA Leiden, The Netherlands
- Simone Leo
  - Centre for Gene Regulation & Expression & Division of Computational Biology, University of Dundee, School of Life Sciences, Dow St Dundee DD1 5EH, Scotland, UK
  - Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Loc. Piscina Manna, Edificio 1, 09050 Pula (CA), Italy
- Josh Moore
  - Centre for Gene Regulation & Expression & Division of Computational Biology, University of Dundee, School of Life Sciences, Dow St Dundee DD1 5EH, Scotland, UK
- Yael Paran
  - IDEA Bio-Medical Ltd, 2 Prof. Bergman St., Rehovot 76705, Israel
- Jaime Prilusky
  - Life Science Core Facilities, Weizmann Institute of Science, P.O. Box 26 Rehovot 76100, Israel
- Philippe Rocca-Serra
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford OX1 3QG, Oxford, UK
- Philippe Roudot
  - Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, 5323 Harry Hines Blvd. Dallas, TX 75390, USA
- Marc Schuster
  - Institute for Experimental Immunology and Imaging, University Hospital, University Duisburg-Essen, Universitätsstr. 2, 45141 Essen, Germany
- Gwendolien Sergeant
  - Department of Biomolecular Medicine, Ghent University, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
- Staffan Strömblad
  - Department of Biosciences and Nutrition, Karolinska Institutet, Neo, SE-141 83 Huddinge, Sweden
- Jason R Swedlow
  - Centre for Gene Regulation & Expression & Division of Computational Biology, University of Dundee, School of Life Sciences, Dow St Dundee DD1 5EH, Scotland, UK
- Merijn van Erp
  - Department of Cell Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein 28 6525 GA Nijmegen, The Netherlands
- Marleen Van Troys
  - Department of Biomolecular Medicine, Ghent University, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
- Assaf Zaritsky
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, P.O.B. 653, 8410501 Beer-Sheva, Israel
- Susanna-Assunta Sansone
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford OX1 3QG, Oxford, UK
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
10. Meyer F, Bagchi S, Chaterji S, Gerlach W, Grama A, Harrison T, Paczian T, Trimble WL, Wilke A. MG-RAST version 4-lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis. Brief Bioinform 2020; 20:1151-1159. [PMID: 29028869 DOI: 10.1093/bib/bbx105] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 05/25/2017] [Revised: 07/21/2017] [Indexed: 11/12/2022]
Abstract
As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners that will enable double barcoding, stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, both to facilitate development and efficient high-performance implementation of the community's data analysis tasks.
|
11
|
Karsch-Mizrachi I, Takagi T, Cochrane G. The international nucleotide sequence database collaboration. Nucleic Acids Res 2019; 46:D48-D51. [PMID: 29190397 PMCID: PMC5753279 DOI: 10.1093/nar/gkx1097] [Citation(s) in RCA: 113] [Impact Index Per Article: 22.6] [Received: 09/20/2017] [Accepted: 10/25/2017] [Indexed: 11/14/2022]
Abstract
For more than 30 years, the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been committed to capturing, preserving and providing access to comprehensive public domain nucleotide sequence and associated metadata which enables discovery in biomedicine, biodiversity and biological sciences. Since 1987, the DNA Data Bank of Japan (DDBJ) at the National Institute for Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA have worked collaboratively to enable access to nucleotide sequence data in standardized formats for the worldwide scientific community. In this article, we reiterate the principles of the INSDC collaboration and briefly summarize the trends of the archival content.
Affiliation(s)
- Ilene Karsch-Mizrachi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | | - Guy Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
12
|
Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Katta HY, Mojica A, Chen IMA, Kyrpides NC, Reddy TBK. Genomes OnLine database (GOLD) v.7: updates and new features. Nucleic Acids Res 2019; 47:D649-D659. [PMID: 30357420 PMCID: PMC6323969 DOI: 10.1093/nar/gky977] [Citation(s) in RCA: 129] [Impact Index Per Article: 25.8] [Received: 09/14/2018] [Revised: 10/04/2018] [Accepted: 10/08/2018] [Indexed: 12/11/2022]
Abstract
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is an open online resource, which maintains an up-to-date catalog of genome and metagenome projects in the context of a comprehensive list of associated metadata. Information in GOLD is organized into four levels: Study, Biosample/Organism, Sequencing Project and Analysis Project. Currently GOLD hosts information on 33 415 Studies, 49 826 Biosamples, 313 324 Organisms, 215 881 Sequencing Projects and 174 454 Analysis Projects with a total of 541 metadata fields, of which 80 are based on controlled vocabulary (CV) terms. GOLD provides a user-friendly web interface to browse sequencing projects and launch advanced search tools across four classification levels. Users submit metadata on a wide range of Sequencing and Analysis Projects in GOLD before depositing sequence data to the Integrated Microbial Genomes (IMG) system for analysis. GOLD conforms with and supports the rules set by the Genomic Standards Consortium (GSC) Minimum Information standards. The current version of GOLD (v.7) has seen the number of projects and associated metadata increase exponentially over the years. This paper provides an update on the current status of GOLD and highlights the new features added over the last two years.
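The four-level organization described in this abstract can be pictured as a simple nested data structure. The following is only an illustrative sketch: the class and field names are hypothetical and are not GOLD's actual schema or API, and the example records are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnalysisProject:
    name: str


@dataclass
class SequencingProject:
    name: str
    analysis_projects: List[AnalysisProject] = field(default_factory=list)


@dataclass
class Sample:
    # Second level: a Biosample (environmental) or an Organism (isolate).
    name: str
    kind: str  # "Biosample" or "Organism"
    sequencing_projects: List[SequencingProject] = field(default_factory=list)


@dataclass
class Study:
    name: str
    samples: List[Sample] = field(default_factory=list)


def count_levels(study: Study) -> Dict[str, int]:
    """Count the entries found at each level below a Study."""
    seq = [sp for s in study.samples for sp in s.sequencing_projects]
    ana = [ap for sp in seq for ap in sp.analysis_projects]
    return {
        "samples": len(study.samples),
        "sequencing_projects": len(seq),
        "analysis_projects": len(ana),
    }


# Invented example records, for illustration only.
example_study = Study(
    "Example study",
    [
        Sample(
            "soil core",
            "Biosample",
            [SequencingProject("metagenome", [AnalysisProject("assembly"),
                                              AnalysisProject("binning")])],
        ),
        Sample("isolate strain", "Organism", [SequencingProject("isolate genome")]),
    ],
)
```

Nesting each level inside its parent mirrors the Study, Biosample/Organism, Sequencing Project, Analysis Project ordering given in the abstract; a real catalog would more likely use identifiers and cross-references than in-memory nesting.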
Affiliation(s)
- Supratim Mukherjee
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
| | - Dimitri Stamatis
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
| | - Jon Bertsch
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
| | - Galina Ovchinnikova
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
| | - Hema Y Katta
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
| | - Alejandro Mojica
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
| | - I-Min A Chen
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
| | - Nikos C Kyrpides
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
| | - TBK Reddy
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
|
13
|
Meisel JS, Nasko DJ, Brubach B, Cepeda-Espinoza V, Chopyk J, Corrada-Bravo H, Fedarko M, Ghurye J, Javkar K, Olson ND, Shah N, Allard SM, Bazinet AL, Bergman NH, Brown A, Caporaso JG, Conlan S, DiRuggiero J, Forry SP, Hasan NA, Kralj J, Luethy PM, Milton DK, Ondov BD, Preheim S, Ratnayake S, Rogers SM, Rosovitz MJ, Sakowski EG, Schliebs NO, Sommer DD, Ternus KL, Uritskiy G, Zhang SX, Pop M, Treangen TJ. Current progress and future opportunities in applications of bioinformatics for biodefense and pathogen detection: report from the Winter Mid-Atlantic Microbiome Meet-up, College Park, MD, January 10, 2018. Microbiome 2018; 6:197. [PMID: 30396371 PMCID: PMC6219074 DOI: 10.1186/s40168-018-0582-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/21/2018] [Accepted: 10/18/2018] [Indexed: 06/08/2023]
Abstract
The Mid-Atlantic Microbiome Meet-up (M3) organization brings together academic, government, and industry groups to share ideas and develop best practices for microbiome research. In January of 2018, M3 held its fourth meeting, which focused on recent advances in biodefense, specifically those relating to infectious disease, and the use of metagenomic methods for pathogen detection. Presentations highlighted the utility of next-generation sequencing technologies for identifying and tracking microbial community members across space and time. However, they also stressed the current limitations of genomic approaches for biodefense, including insufficient sensitivity to detect low-abundance pathogens and the inability to quantify viable organisms. Participants discussed ways in which the community can improve software usability and shared new computational tools for metagenomic processing, assembly, annotation, and visualization. Looking to the future, they identified the need for better bioinformatics toolkits for longitudinal analyses, improved sample processing approaches for characterizing viruses and fungi, and more consistent maintenance of database resources. Finally, they addressed the necessity of improving data standards to incentivize data sharing. Here, we summarize the presentations and discussions from the meeting, identifying the areas where microbiome analyses have improved our ability to detect and manage biological threats and infectious disease, as well as gaps of knowledge in the field that require future funding and focus.
Affiliation(s)
- Jacquelyn S Meisel
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
| | - Daniel J Nasko
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
| | - Brian Brubach
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
| | - Victoria Cepeda-Espinoza
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
| | - Jessica Chopyk
- School of Public Health, University of Maryland, College Park, College Park, MD, USA
| | - Héctor Corrada-Bravo
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
| | - Marcus Fedarko
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
| | - Jay Ghurye
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
| | - Kiran Javkar
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
| | - Nathan D Olson
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nidhi Shah
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
| | - Sarah M Allard
- School of Public Health, University of Maryland, College Park, College Park, MD, USA
| | - Adam L Bazinet
- National Biodefense Analysis and Countermeasures Center, Frederick, MD, USA
| | - Nicholas H Bergman
- National Biodefense Analysis and Countermeasures Center, Frederick, MD, USA
| | - Alexis Brown
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
| | - J Gregory Caporaso
- The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Sean Conlan
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Samuel P Forry
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nur A Hasan
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
- CosmosID, Inc., Rockville, MD, USA
| | - Jason Kralj
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Paul M Luethy
- Department of Pathology, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Donald K Milton
- Maryland Institute for Applied Environmental Health, School of Public Health, University of Maryland, College Park, College Park, MD, USA
| | - Brian D Ondov
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sarah Preheim
- Environmental Health and Engineering, Johns Hopkins University, Baltimore, MD, USA
| | | | | | - M J Rosovitz
- National Biodefense Analysis and Countermeasures Center, Frederick, MD, USA
| | - Eric G Sakowski
- Environmental Health and Engineering, Johns Hopkins University, Baltimore, MD, USA
| | | | - Daniel D Sommer
- National Biodefense Analysis and Countermeasures Center, Frederick, MD, USA
| | | | - Gherman Uritskiy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Sean X Zhang
- Division of Medical Microbiology, Department of Pathology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Mihai Pop
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
| | - Todd J Treangen
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA.
- Present address: Department of Computer Science - MS-132, Rice University, P.O. Box 1892, Houston, TX, 77005-1892, USA.
|
14
|
Gauthier J, Vincent AT, Charette SJ, Derome N. A brief history of bioinformatics. Brief Bioinform 2018; 20:1981-1996. [DOI: 10.1093/bib/bby063] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 02/26/2018] [Revised: 06/22/2018] [Indexed: 02/06/2023]
Abstract
It is easy for today’s students and researchers to believe that modern bioinformatics emerged recently to assist next-generation sequencing data analysis. However, the very beginnings of bioinformatics occurred more than 50 years ago, when desktop computers were still a hypothesis and DNA could not yet be sequenced. The foundations of bioinformatics were laid in the early 1960s with the application of computational methods to protein sequence analysis (notably, de novo sequence assembly, biological sequence databases and substitution models). Later on, DNA analysis also emerged due to parallel advances in (i) molecular biology methods, which allowed easier manipulation of DNA, as well as its sequencing, and (ii) computer science, which saw the rise of increasingly miniaturized and more powerful computers, as well as novel software better suited to handle bioinformatics tasks. In the 1990s through the 2000s, major improvements in sequencing technology, along with reduced costs, gave rise to an exponential increase of data. The arrival of ‘Big Data’ has laid out new challenges in terms of data mining and management, calling for more expertise from computer science into the field. Coupled with an ever-increasing amount of bioinformatics tools, biological Big Data had (and continues to have) profound implications on the predictive power and reproducibility of bioinformatics results. To overcome this issue, universities are now fully integrating this discipline into the curriculum of biology students. Recent subdisciplines such as synthetic biology, systems biology and whole-cell modeling have emerged from the ever-increasing complementarity between computer science and biology.
Affiliation(s)
- Jeff Gauthier
- Institut de Biologie Intégrative et des Systèmes (IBIS), Département de Biologie, Université Laval, 1030, av. de la Médecine, Québec, Canada
| | - Antony T Vincent
- INRS-Institut Armand-Frappier, Bacterial Symbionts Evolution, 531 boul. des Prairies, Laval, QC, Canada
| | - Steve J Charette
- Centre de Recherche de l'Institut, Universitaire de Cardiologie et de Pneumologie de Québec (CRIUCPQ), 2725 Chemin Sainte-Foy, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-informatique, Université Laval, Québec, Canada
| | - Nicolas Derome
- Institut de Biologie Intégrative et des Systèmes (IBIS), Département de Biologie, Université Laval, 1030, av. de la Médecine, Québec, Canada
|
15
|
De Anda V, Zapata-Peñasco I, Poot-Hernandez AC, Eguiarte LE, Contreras-Moreira B, Souza V. MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle. Gigascience 2018; 6:1-17. [PMID: 29069412 PMCID: PMC5737871 DOI: 10.1093/gigascience/gix096] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/13/2017] [Accepted: 10/01/2017] [Indexed: 01/30/2023]
Abstract
The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large “omic” datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. 
Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa.
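The entropy-based scoring idea in this abstract can be illustrated with a short sketch. The abstract does not state the exact formula, so the code below assumes the standard relative-entropy (Kullback-Leibler) term, p · log2(p / q), where p is a domain's frequency among the genomes of interest and q is its frequency in a background set. The function names, the frequencies, and the second domain identifier are hypothetical; only PF04358 (DsrC) appears in the abstract.

```python
import math
from typing import Dict, Iterable


def relative_entropy(p_observed: float, q_background: float) -> float:
    """Assumed per-domain term p * log2(p / q); 0 if the domain is never observed."""
    if p_observed == 0:
        return 0.0
    return p_observed * math.log2(p_observed / q_background)


def sample_score(domain_entropy: Dict[str, float], domains_in_sample: Iterable[str]) -> float:
    """Sum the entropies of the characteristic domains detected in a sample."""
    return sum(domain_entropy.get(domain, 0.0) for domain in set(domains_in_sample))


# Illustrative frequencies only: (frequency in genomes of interest, background frequency).
frequencies = {
    "PF04358": (0.5, 0.25),     # enriched in the genomes of interest
    "PF_EXAMPLE": (0.25, 0.5),  # hypothetical domain, depleted
}
entropies = {dom: relative_entropy(p, q) for dom, (p, q) in frequencies.items()}
```

Under this assumption, a domain that is enriched among the genomes of interest contributes a positive term and a depleted one a negative term, so a sample's summed score rises with the number of informative domains it carries.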
Affiliation(s)
- Valerie De Anda
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, 70-275, Coyoacán 04510, D.F., México
| | - Icoquih Zapata-Peñasco
- Dirección de Investigación en Transformación de Hidrocarburos, Instituto Mexicano del Petróleo, Eje Central Lázaro Cárdenas, Norte 152, Col. San Bartolo Atepehuacan, 07730, México
| | - Augusto Cesar Poot-Hernandez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización. Sección de Ingeniería de Sistemas Computacionales. Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas. Circuito Escolar 3000, Cd. Universitaria, 04510 Ciudad de México
| | - Luis E Eguiarte
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, 70-275, Coyoacán 04510, D.F., México
| | - Bruno Contreras-Moreira
- Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas (EEAD-CSIC), Avda. Montañana, 1005, Zaragoza 50059, Spain.,Fundación ARAID, calle María de Luna 11, 50018 Zaragoza, Spain
| | - Valeria Souza
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, 70-275, Coyoacán 04510, D.F., México
|
16
|
Yassin AF, Langenberg S, Huntemann M, Clum A, Pillay M, Palaniappan K, Varghese N, Mikhailova N, Mukherjee S, Reddy TBK, Daum C, Shapiro N, Ivanova N, Woyke T, Kyrpides NC. Draft genome sequence of Actinotignum schaalii DSM 15541T: Genetic insights into the lifestyle, cell fitness and virulence. PLoS One 2017; 12:e0188914. [PMID: 29216246 PMCID: PMC5720513 DOI: 10.1371/journal.pone.0188914] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/18/2017] [Accepted: 11/15/2017] [Indexed: 11/19/2022]
Abstract
The permanent draft genome sequence of Actinotignum schaalii DSM 15541T is presented. The annotated genome includes 2,130,987 bp, with 1777 protein-coding and 58 rRNA-coding genes. Genome sequence analysis revealed the absence of genes encoding components of the PTS systems and enzymes of the TCA cycle, glyoxylate shunt and gluconeogenesis. Genomic data revealed that A. schaalii is able to oxidize carbohydrates via glycolysis, the nonoxidative pentose phosphate and the Entner-Doudoroff pathways. In addition, the genome harbors genes encoding enzymes involved in the conversion of pyruvate to lactate, acetate and ethanol, which are found to be the end products of carbohydrate fermentation. The genome contained the gene encoding Type I fatty acid synthase required for de novo FAS biosynthesis. The plsY and plsX genes encoding the acyltransferases necessary for phosphatidic acid biosynthesis were absent from the genome. The genome harbors genes encoding enzymes responsible for isoprene biosynthesis via the mevalonate (MVA) pathway. Genes encoding enzymes that confer resistance to reactive oxygen species (ROS) were identified. In addition, A. schaalii harbors genes that protect the genome against viral infections. These include restriction-modification (RM) systems, type II toxin-antitoxin (TA), CRISPR-Cas and abortive infection systems. The A. schaalii genome also encodes several virulence factors that contribute to adhesion and internalization of this pathogen, such as the tad genes encoding proteins required for pili assembly, the nanI gene encoding exo-alpha-sialidase, genes encoding heat shock proteins and genes encoding the type VII secretion system. These features are consistent with anaerobic and pathogenic lifestyles. Finally, resistance to ciprofloxacin occurs by mutation in chromosomal genes that encode the subunits of DNA gyrase (GyrA) and topoisomerase IV (ParC) enzymes, while resistance to metronidazole was due to the frxA gene, which encodes NADPH-flavin oxidoreductase.
Affiliation(s)
- Atteyet F. Yassin
- Institut für medizinische Mikrobiologie und Immunologie der Universität Bonn, Bonn, Germany
- * E-mail:
| | - Stefan Langenberg
- Klinik und Poliklinik für Hals-Nasen-Ohrenheilkunde/Chirurgie, Bonn, Germany
| | - Marcel Huntemann
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - Alicia Clum
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - Manoj Pillay
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - Krishnaveni Palaniappan
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - Neha Varghese
- Klinik und Poliklinik für Hals-Nasen-Ohrenheilkunde/Chirurgie, Bonn, Germany
| | - Natalia Mikhailova
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - Supratim Mukherjee
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - T. B. K. Reddy
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - Chris Daum
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - Nicole Shapiro
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - Natalia Ivanova
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - Tanja Woyke
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
| | - Nikos C. Kyrpides
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, United States of America
|
17
|
Draft genome sequence of Marinobacterium rhizophilum CL-YJ9 T (DSM 18822 T), isolated from the rhizosphere of the coastal tidal-flat plant Suaeda japonica. Stand Genomic Sci 2017; 12:65. [PMID: 29093768 PMCID: PMC5663061 DOI: 10.1186/s40793-017-0275-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/24/2017] [Accepted: 09/25/2017] [Indexed: 11/16/2022]
Abstract
The genus Marinobacterium belongs to the family Alteromonadaceae within the class Gammaproteobacteria and was reported in 1997. Currently the genus Marinobacterium contains 16 species. Marinobacterium rhizophilum CL-YJ9T was isolated from sediment associated with the roots of a plant growing in a tidal flat of Youngjong Island, Korea. The genome of the strain CL-YJ9T was sequenced through the Genomic Encyclopedia of Type Strains, Phase I: KMG project. Here we report the main features of the draft genome of the strain. The 5,364,574 bp long draft genome consists of 58 scaffolds with 4762 protein-coding and 91 RNA genes. Based on the genomic analyses, the strain seems to adapt to osmotic changes by intracellular production as well as extracellular uptake of compatible solutes, such as ectoine and betaine. In addition, the strain has a number of genes to defend against oxygen stresses such as reactive oxygen species and hypoxia.
|
18
|
Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Verezemska O, Isbandi M, Thomas AD, Ali R, Sharma K, Kyrpides NC, Reddy TBK. Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements. Nucleic Acids Res 2017; 45:D446-D456. [PMID: 27794040 PMCID: PMC5210664 DOI: 10.1093/nar/gkw992] [Citation(s) in RCA: 135] [Impact Index Per Article: 19.3] [Received: 09/20/2016] [Revised: 10/11/2016] [Accepted: 10/19/2016] [Indexed: 01/28/2023]
Abstract
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four-level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields, of which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years.
Affiliation(s)
- Supratim Mukherjee
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
| | - Dimitri Stamatis
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
| | - Jon Bertsch
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
| | - Galina Ovchinnikova
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
| | - Olena Verezemska
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
| | - Michelle Isbandi
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
| | - Alex D Thomas
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
| | - Rida Ali
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
| | - Kaushal Sharma
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
| | - Nikos C Kyrpides
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| | - T B K Reddy
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
|
19
|
Sinclair L, Ijaz UZ, Jensen LJ, Coolen MJL, Gubry-Rangin C, Chroňáková A, Oulas A, Pavloudi C, Schnetzer J, Weimann A, Ijaz A, Eiler A, Quince C, Pafilis E. Seqenv: linking sequences to environments through text mining. PeerJ 2016; 4:e2690. [PMID: 28028456 PMCID: PMC5178346 DOI: 10.7717/peerj.2690] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 07/26/2016] [Accepted: 10/14/2016] [Indexed: 11/24/2022]
Abstract
Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the “nt” nucleotide database provided by NCBI and, out of every hit, extracts–if it is available–the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.
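The "weighted summation" step mentioned in this abstract can be sketched roughly as follows. This is not seqenv's actual code or API: the function names are hypothetical, the environment terms are invented examples, and weighting each sequence by its relative abundance in the sample is an assumption about how such a summary could be computed.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def sequence_profile(term_hits: List[str]) -> Dict[str, float]:
    """Turn the environment terms found across one sequence's search hits into frequencies."""
    counts = Counter(term_hits)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def sample_profile(
    seq_profiles: Dict[str, Dict[str, float]],
    abundances: Dict[str, float],
) -> Dict[str, float]:
    """Summarize a sample as the abundance-weighted sum of its sequences' profiles."""
    total = sum(abundances.values())
    summary: Dict[str, float] = defaultdict(float)
    for seq_id, profile in seq_profiles.items():
        weight = abundances.get(seq_id, 0.0) / total
        for term, freq in profile.items():
            summary[term] += weight * freq
    return dict(summary)


# Invented example: two sequences with terms mined from their hits.
profiles = {
    "seq_A": sequence_profile(["marine", "marine", "soil", "marine"]),
    "seq_B": sequence_profile(["soil"]),
}
summary = sample_profile(profiles, {"seq_A": 3, "seq_B": 1})
```

Normalizing per sequence first keeps a sequence with many database hits from dominating the sample summary; only its abundance in the sample determines its weight.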
Affiliation(s)
- Lucas Sinclair
- Department of Ecology and Genetics, Limnology, Uppsala University, Uppsala, Sweden
| | - Umer Z Ijaz
- Infrastructure and Environment Research Division, School of Engineering, University of Glasgow, Glasgow, United Kingdom
| | - Lars Juhl Jensen
- The Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Marco J L Coolen
- Western Australia Organic and Isotope Geochemistry Centre (WA-OIGC), Department of Chemistry, Curtin University of Technology, Bentley, WA, Australia
| | - Cecile Gubry-Rangin
- Institute of Biological & Environmental Sciences, University of Aberdeen, Aberdeen, United Kingdom
| | - Alica Chroňáková
- Institute of Soil Biology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
| | - Anastasis Oulas
- Bioinformatics Group, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus.,Institute of Marine Biology Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion Crete, Greece
| | - Christina Pavloudi
- Institute of Marine Biology Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion Crete, Greece
| | - Julia Schnetzer
- Department of Molecular Ecology, Microbial Genomics and Bioinformatics Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
| | - Aaron Weimann
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Ali Ijaz
- Hawkesbury Institute for the Environment, University of Western Sydney, Hawkesbury, Sydney, Australia
| | - Alexander Eiler
- Department of Ecology and Genetics, Limnology, Uppsala University, Uppsala, Sweden
| | | | - Evangelos Pafilis
- Institute of Marine Biology Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion Crete, Greece
|
20
|
Choi DH, Ahn C, Jang GI, Lapidus A, Han J, Reddy TBK, Huntemann M, Pati A, Ivanova N, Markowitz V, Rohde M, Tindall B, Göker M, Woyke T, Klenk HP, Kyrpides NC, Cho BC. High-quality draft genome sequence of Gracilimonas tropica CL-CB462(T) (DSM 19535(T)), isolated from a Synechococcus culture. Stand Genomic Sci 2015; 10:98. [PMID: 26566423 PMCID: PMC4642740 DOI: 10.1186/s40793-015-0088-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/29/2014] [Accepted: 10/23/2015] [Indexed: 02/02/2023]
Abstract
Gracilimonas tropica Choi et al. 2009 is a member of the order Sphingobacteriales, class Sphingobacteriia. Three species of the genus Gracilimonas have been isolated from seawater or a salt mine and show extremely halotolerant and mesophilic features, although close relatives are extremely halophilic or thermophilic. The type strain of the type species of Gracilimonas, G. tropica DSM19535T, was isolated from a Synechococcus culture established from tropical sea-surface water of the Pacific Ocean. The genome of strain DSM19535T was sequenced through the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes project. Here, we describe the genomic features of the strain. The 3,831,242 bp long draft genome consists of 48 contigs with 3373 protein-coding and 53 RNA genes. The strain seems to be adapted to phosphate limitation and to require amino acids from the external environment. In addition, genomic analyses and a pasteurization experiment suggested that G. tropica DSM19535T does not form spores.
Affiliation(s)
- Dong Han Choi
- Biological Oceanography & Marine Biology Division, Korea Institute of Ocean Science and Technology, Ansan, 426-744 Republic of Korea
| | - Chisang Ahn
- Microbial Oceanography Laboratory, School of Earth and Environmental Sciences, and Research Institute of Oceanography, Seoul National University, Gwanak-ro, Gwanak-gu Seoul, 151-742 Republic of Korea
| | - Gwang Il Jang
- Microbial Oceanography Laboratory, School of Earth and Environmental Sciences, and Research Institute of Oceanography, Seoul National University, Gwanak-ro, Gwanak-gu Seoul, 151-742 Republic of Korea
| | - Alla Lapidus
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia ; Algorithmic Biology Lab, St. Petersburg Academic University, St. Petersburg, Russia
| | - James Han
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - T B K Reddy
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - Marcel Huntemann
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - Amrita Pati
- Algorithmic Biology Lab, St. Petersburg Academic University, St. Petersburg, Russia
| | - Natalia Ivanova
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - Victor Markowitz
- Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA USA
| | - Manfred Rohde
- Central Facility for Microscopy, HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Brian Tindall
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Markus Göker
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Tanja Woyke
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - Hans-Peter Klenk
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany ; School of Biology, Newcastle University, Newcastle upon Tyne, UK
| | - Nikos C Kyrpides
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA ; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Byung Cheol Cho
- Microbial Oceanography Laboratory, School of Earth and Environmental Sciences, and Research Institute of Oceanography, Seoul National University, Gwanak-ro, Gwanak-gu Seoul, 151-742 Republic of Korea
|
21
|
Markowitz VM, Chen IMA, Chu K, Pati A, Ivanova NN, Kyrpides NC. Ten Years of Maintaining and Expanding a Microbial Genome and Metagenome Analysis System. Trends Microbiol 2015; 23:730-741. [DOI: 10.1016/j.tim.2015.07.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/24/2015] [Revised: 07/15/2015] [Accepted: 07/31/2015] [Indexed: 10/22/2022]
|
22
|
Yassin AF, Lapidus A, Han J, Reddy TBK, Huntemann M, Pati A, Ivanova N, Markowitz V, Woyke T, Klenk HP, Kyrpides NC. High quality draft genome sequence of Corynebacterium ulceribovis type strain IMMIB-L1395(T) (DSM 45146(T)). Stand Genomic Sci 2015; 10:50. [PMID: 26380638 PMCID: PMC4572677 DOI: 10.1186/s40793-015-0036-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/24/2014] [Accepted: 07/07/2015] [Indexed: 01/21/2023] Open
Abstract
Corynebacterium ulceribovis strain IMMIB L-1395(T) (= DSM 45146(T)) is an aerobic to facultatively anaerobic, Gram-positive, non-spore-forming, non-motile, rod-shaped bacterium that was isolated from the skin of the udder of a cow in Schleswig-Holstein, Germany. The cell wall of C. ulceribovis contains corynemycolic acids. The cellular fatty acids are those described for the genus Corynebacterium, but tuberculostearic acid is not present. Here we describe the features of C. ulceribovis strain IMMIB L-1395(T), together with genome sequence information and its annotation. The 2,300,451 bp long genome contains 2,104 protein-coding genes and 54 RNA-encoding genes and is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project.
Affiliation(s)
- Atteyet F Yassin
- Institut für Medizinische Mikrobiologie und Immunologie der Universität Bonn, Bonn, Germany
| | - Alla Lapidus
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia ; Algorithmic Biology Lab, St. Petersburg Academic University, St. Petersburg, Russia
| | - James Han
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - T B K Reddy
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - Marcel Huntemann
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - Amrita Pati
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - Natalia Ivanova
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - Victor Markowitz
- Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, California USA
| | - Tanja Woyke
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA
| | - Hans-Peter Klenk
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Nikos C Kyrpides
- Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA USA ; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
|
23
|
Hahnke RL, Stackebrandt E, Meier-Kolthoff JP, Tindall BJ, Huang S, Rohde M, Lapidus A, Han J, Trong S, Haynes M, Reddy TBK, Huntemann M, Pati A, Ivanova NN, Mavromatis K, Markowitz V, Woyke T, Göker M, Kyrpides NC, Klenk HP. High quality draft genome sequence of Flavobacterium rivuli type strain WB 3.3-2(T) (DSM 21788(T)), a valuable source of polysaccharide decomposing enzymes. Stand Genomic Sci 2015; 10:46. [PMID: 26380634 PMCID: PMC4572689 DOI: 10.1186/s40793-015-0032-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/17/2015] [Accepted: 06/29/2015] [Indexed: 11/23/2022] Open
Abstract
Flavobacterium rivuli Ali et al. 2009 emend. Dong et al. 2013 is one of about 100 species in the genus Flavobacterium (family Flavobacteriaceae, phylum Bacteroidetes) with a validly published name, and has been isolated from the spring of a hard water rivulet in Northern Germany. Including all type strains of the genera Myroides and Flavobacterium in the 16S rRNA gene sequence phylogeny revealed a clustering of members of the genus Myroides as a monophyletic group within the genus Flavobacterium. Furthermore, F. rivuli WB 3.3-2T and its closest relatives seem more closely related to the genus Myroides than to the type species of the genus Flavobacterium, F. aquatile. The 4,489,248 bp long genome with its 3,391 protein-coding and 65 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome of F. rivuli has almost as many genes encoding carbohydrate active enzymes (151 CAZymes) as genes encoding peptidases (177). Peptidases comprised mostly metallo (M) and serine (S) peptidases. Among CAZymes, 30 glycoside hydrolase families, 10 glycosyl transferase families, 7 carbohydrate binding module families and 7 carbohydrate esterase families were identified. Furthermore, we found four polysaccharide utilization loci (PUL) and one large CAZy rich gene cluster that might enable strain WB 3.3-2T to decompose plant and algae derived polysaccharides. Based on these results we propose F. rivuli as an interesting candidate for further physiological studies and for studies of the role of Bacteroidetes in the decomposition of complex polymers in the environment.
Affiliation(s)
- Richard L Hahnke
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, Braunschweig, Germany
| | - Erko Stackebrandt
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, Braunschweig, Germany
| | - Jan P Meier-Kolthoff
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, Braunschweig, Germany
| | - Brian J Tindall
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, Braunschweig, Germany
| | - Sixing Huang
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, Braunschweig, Germany
| | - Manfred Rohde
- Helmholtz Centre for Infection Research, Inhoffenstraße 7, Braunschweig, Germany
| | - Alla Lapidus
- St. Petersburg State University, St. Petersburg, Russia ; Algorithmic Biology Lab, St. Petersburg Academic University, St. Petersburg, Russia
| | - James Han
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Stephan Trong
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Matthew Haynes
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - T B K Reddy
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Amrita Pati
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Victor Markowitz
- Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA USA
| | - Tanja Woyke
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Markus Göker
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, Braunschweig, Germany
| | - Nikos C Kyrpides
- DOE Joint Genome Institute, Walnut Creek, California, USA ; School of Biology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Hans-Peter Klenk
- School of Biology, Newcastle University, Newcastle upon Tyne, UK
|
24
|
Mukherjee S, Lapidus A, Shapiro N, Cheng JF, Han J, Reddy TBK, Huntemann M, Ivanova N, Mikhailova N, Chen A, Palaniappan K, Spring S, Göker M, Markowitz V, Woyke T, Tindall BJ, Klenk HP, Kyrpides NC, Pati A. High quality draft genome sequence and analysis of Pontibacter roseus type strain SRC-1(T) (DSM 17521(T)) isolated from muddy waters of a drainage system in Chandigarh, India. Stand Genomic Sci 2015; 10:8. [PMID: 26203325 PMCID: PMC4511580 DOI: 10.1186/1944-3277-10-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/10/2014] [Accepted: 11/24/2014] [Indexed: 12/21/2022] Open
Abstract
Pontibacter roseus is a member of the genus Pontibacter, family Cytophagaceae, class Cytophagia. While the type species of the genus, Pontibacter actiniarum, was isolated in 2005 from a marine environment, subsequent species of the same genus have been found in different types of habitats, including seawater, sediment, desert soil, rhizosphere, contaminated sites, solar saltern and muddy water. Here we describe the features of Pontibacter roseus strain SRC-1(T) along with its complete genome sequence and annotation from a culture of DSM 17521(T). The 4,581,480 bp long draft genome consists of 12 scaffolds with 4,003 protein-coding and 50 RNA genes and is part of the Genomic Encyclopedia of Type Strains: KMG-I project.
Affiliation(s)
| | - Alla Lapidus
- T. Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
- Algorithmic Biology Lab, St. Petersburg Academic University, St. Petersburg, Russia
| | - Nicole Shapiro
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Jan-Fang Cheng
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - James Han
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - TBK Reddy
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Amy Chen
- Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Krishna Palaniappan
- Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Stefan Spring
- Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Markus Göker
- Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Victor Markowitz
- Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Tanja Woyke
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Brian J Tindall
- Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Hans-Peter Klenk
- Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Nikos C Kyrpides
- DOE Joint Genome Institute, Walnut Creek, California, USA
- King Abdulaziz University, Jeddah, Saudi Arabia
| | - Amrita Pati
- DOE Joint Genome Institute, Walnut Creek, California, USA
|