1
Author Correction: A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses. BMC Biol 2023; 21:261. [PMID: 37974169 PMCID: PMC10655412 DOI: 10.1186/s12915-023-01764-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2023] Open
2
Microbial dynamics and bioreactor performance are interlinked with organic matter removal from wastewater treatment plant effluent. BIORESOURCE TECHNOLOGY 2023; 372:128659. [PMID: 36690219 DOI: 10.1016/j.biortech.2023.128659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 01/17/2023] [Accepted: 01/18/2023] [Indexed: 06/17/2023]
Abstract
Optimizing bioreactor performance for organic matter removal can achieve sustainable and energy-efficient micropollutant removal in subsequent tertiary treatment. Bioreactor performance heavily depends on its resident microbial community; hence, a deeper understanding of community dynamics is essential. The microbial communities of three different bioreactors (biological activated carbon, moving bed biofilm reactor, sand filter), used for organic matter removal from wastewater treatment effluent, were characterized by 16S rRNA gene amplicon sequence analysis. An interdependency between bioreactor performance and microbial community profile was observed. Overall, Proteobacteria was the most predominant phylum, and Comamonadaceae was the most predominant family in all bioreactors. The relative abundance of the genus Roseococcus was positively correlated with organic matter removal. A generalized Lotka-Volterra (gLV) model was established to understand the interactions in the microbial community. By identifying microbial dynamics and their role in bioreactors, a strategy can be developed to improve bioreactor performance.
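The abstract does not give the fitted model, so the following is only a minimal sketch of what a generalized Lotka-Volterra system looks like: each taxon i changes as dx_i/dt = x_i (r_i + sum_j A_ij x_j). The two-taxon growth rates `r`, the interaction matrix `A`, the step size and the fixed-step Runge-Kutta integrator are all illustrative assumptions, not values or methods from the study.

```python
import numpy as np

def glv_rhs(x, r, A):
    """Right-hand side of the gLV equations: dx_i/dt = x_i * (r_i + sum_j A_ij * x_j)."""
    return x * (r + A @ x)

def simulate_glv(x0, r, A, dt=0.01, steps=5000):
    """Integrate the gLV system with a fixed-step classical Runge-Kutta (RK4) scheme."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        k1 = glv_rhs(x, r, A)
        k2 = glv_rhs(x + 0.5 * dt * k1, r, A)
        k3 = glv_rhs(x + 0.5 * dt * k2, r, A)
        k4 = glv_rhs(x + dt * k3, r, A)
        x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x.copy())
    return np.array(traj)

# Hypothetical parameters for two weakly competing taxa (not from the paper).
r = np.array([1.0, 0.8])                 # intrinsic growth rates
A = np.array([[-1.0, -0.5],              # A[i, j]: effect of taxon j on taxon i
              [-0.3, -1.0]])

traj = simulate_glv([0.1, 0.1], r, A)

# The coexistence equilibrium solves r + A x = 0.
x_star = np.linalg.solve(A, -r)
print("final state:", traj[-1])
print("equilibrium:", x_star)
```

With these assumed parameters self-limitation outweighs competition, so the simulated abundances settle at the coexistence equilibrium. In practice the entries of `A` would be estimated from time-series abundance data.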
3
FAIR data station for lightweight metadata management and validation of omics studies. Gigascience 2022; 12:giad014. [PMID: 36879493 PMCID: PMC9989329 DOI: 10.1093/gigascience/giad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Revised: 01/19/2023] [Accepted: 02/21/2023] [Indexed: 03/08/2023] Open
Abstract
BACKGROUND The life sciences are one of the biggest suppliers of scientific data. Reusing and connecting these data can uncover hidden insights and lead to new concepts. Efficient reuse of these datasets is strongly promoted when they are interlinked with a sufficient amount of machine-actionable metadata. While the FAIR (Findable, Accessible, Interoperable, Reusable) guiding principles have been accepted by all stakeholders, in practice, there are only a limited number of easy-to-adopt implementations available that fulfill the needs of data producers. FINDINGS We developed the FAIR Data Station, a lightweight application written in Java, that aims to support researchers in managing research metadata according to the FAIR principles. It implements the ISA metadata framework and uses minimal information metadata standards to capture experiment metadata. The FAIR Data Station consists of 3 modules. Based on the minimal information model(s) selected by the user, the "form generation module" creates a metadata template Excel workbook with a header row of machine-actionable attribute names. The Excel workbook is subsequently used by the data producer(s) as a familiar environment for sample metadata registration. At any point during this process, the format of the recorded values can be checked using the "validation module." Finally, the "resource module" can be used to convert the set of metadata recorded in the Excel workbook into RDF format, enabling (cross-project) (meta)data searches and, for publishing of sequence data, into a European Nucleotide Archive-compatible XML metadata file. CONCLUSIONS Turning FAIR into reality requires the availability of easy-to-adopt data FAIRification workflows that are also of direct use for data producers. As such, the FAIR Data Station provides, in addition to the means to correctly FAIRify (omics) data, the means to build searchable metadata databases of similar projects and can assist in ENA metadata submission of sequence data. The FAIR Data Station is available at https://fairbydesign.nl.
4
SALARECON connects the Atlantic salmon genome to growth and feed efficiency. PLoS Comput Biol 2022; 18:e1010194. [PMID: 35687595 PMCID: PMC9223387 DOI: 10.1371/journal.pcbi.1010194] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/08/2021] [Revised: 06/23/2022] [Accepted: 05/10/2022] [Indexed: 11/19/2022] Open
Abstract
Atlantic salmon (Salmo salar) is the most valuable farmed fish globally and there is much interest in optimizing its genetics and rearing conditions for growth and feed efficiency. Marine feed ingredients must be replaced to meet global demand, with challenges for fish health and sustainability. Metabolic models can address this by connecting genomes to metabolism, which converts nutrients in the feed to energy and biomass, but such models are currently not available for major aquaculture species such as salmon. We present SALARECON, a model focusing on energy, amino acid, and nucleotide metabolism that links the Atlantic salmon genome to metabolic fluxes and growth. It performs well in standardized tests and captures expected metabolic (in)capabilities. We show that it can explain observed hypoxic growth in terms of metabolic fluxes and apply it to aquaculture by simulating growth with commercial feed ingredients. Predicted limiting amino acids and feed efficiencies agree with data, and the model suggests that marine feed efficiency can be achieved by supplementing a few amino acids to plant- and insect-based feeds. SALARECON is a high-quality model that makes it possible to simulate Atlantic salmon metabolism and growth. It can be used to explain Atlantic salmon physiology and address key challenges in aquaculture such as development of sustainable feeds. Atlantic salmon aquaculture generates billions of euros annually, but faces challenges of sustainability. Salmon are carnivores by nature, and fish oil and fish meal have become scarce resources in fish feed production. Novel, sustainable feedstuffs are being trialed hand in hand with studies of the genetics of growth and feed efficiency. This calls for a mathematical-biological framework to integrate data with understanding of the effects of novel feeds on salmon physiology and its interplay with genetics. 
We have developed the SALARECON model of the core salmon metabolic reaction network, linking its genome to metabolic fluxes and growth. Computational analyses show good agreement with observed growth, amino acid limitations, and feed efficiencies, illustrating the potential for in silico studies of potential feed mixtures. In particular, in silico screening of possible diets will enable more efficient animal experiments with improved knowledge gain. We have adopted best practices for test-driven development, virtual experiments to assay metabolic capabilities, revision control, and FAIR data and model management. This facilitates fast, collaborative, reliable development of the model for future applications in sustainable production biology.
5
Machine learning approaches to predict the Plant-associated phenotype of Xanthomonas strains. BMC Genomics 2021; 22:848. [PMID: 34814827 PMCID: PMC8612006 DOI: 10.1186/s12864-021-08093-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/07/2021] [Accepted: 10/15/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The genus Xanthomonas has long been considered to consist predominantly of plant pathogens, but over the last decade there has been an increasing number of reports on non-pathogenic and endophytic members. As Xanthomonas species are prevalent pathogens on a wide variety of important crops around the world, there is a need to distinguish between these plant-associated phenotypes. To date a large number of Xanthomonas genomes have been sequenced, which enables the application of machine learning (ML) approaches on the genome content to predict this phenotype. Until now such approaches to the pathogenomics of Xanthomonas strains have been hampered by the fragmentation of information regarding pathogenicity of individual strains over many studies. Unification of this information into a single resource was therefore considered to be an essential step. RESULTS Mining of 39 papers considering both plant-associated phenotypes allowed for a phenotypic classification of 578 Xanthomonas strains. For 65 plant-pathogenic and 53 non-pathogenic strains the corresponding genomes were available and de novo annotated for the presence of Pfam protein domains used as features to train and compare three ML classification algorithms: CART, Lasso and Random Forest. CONCLUSION The literature resource in combination with recursive feature extraction used in the ML classification algorithms provided further insights into the virulence enabling factors, but also highlighted domains linked to traits not present in pathogenic strains.
6
Genomic convergence between Akkermansia muciniphila in different mammalian hosts. BMC Microbiol 2021; 21:298. [PMID: 34715771 PMCID: PMC8555344 DOI: 10.1186/s12866-021-02360-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/06/2021] [Accepted: 10/12/2021] [Indexed: 02/07/2023] Open
Abstract
Background Akkermansia muciniphila is a member of the human gut microbiota where it resides in the mucus layer and uses mucin as the sole carbon, nitrogen and energy source. A. muciniphila is the only representative of the Verrucomicrobia phylum in the human gut. However, A. muciniphila 16S rRNA gene sequences have also been found in the intestines of many vertebrates. Results We detected A. muciniphila-like bacteria in the intestines of animals belonging to 15 out of 16 mammalian orders. In addition, other species belonging to the Verrucomicrobia phylum were detected in fecal samples. We isolated 10 new A. muciniphila strains from the feces of chimpanzee, siamang, mouse, pig, reindeer, horse and elephant. The physiology and genome of these strains were highly similar in comparison to the type strain A. muciniphila MucT. Overall, the genomes of the new strains showed high average nucleotide identity (93.9 to 99.7%). In these genomes, we detected considerable conservation of at least 75 of the 78 mucin degradation genes that were previously detected in the genome of the type strain MucT. Conclusions The low genomic divergence observed in the new strains may indicate that A. muciniphila favors mucosal colonization independent of the differences in hosts. In addition, the conserved mucus degradation capability points towards a similar beneficial role of the new strains in regulating host metabolic health. Supplementary Information The online version contains supplementary material available at 10.1186/s12866-021-02360-6.
7
A chromosome-level assembly of the black tiger shrimp (Penaeus monodon) genome facilitates the identification of growth-associated genes. Mol Ecol Resour 2021; 21:1620-1640. [PMID: 33586292 PMCID: PMC8197738 DOI: 10.1111/1755-0998.13357] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 05/17/2020] [Revised: 01/31/2021] [Accepted: 02/10/2021] [Indexed: 12/13/2022]
Abstract
To salvage marine ecosystems from fishery overexploitation, sustainable and efficient aquaculture must be emphasized. The knowledge obtained from available genome sequence of marine organisms has accelerated marine aquaculture in many cases. The black tiger shrimp (Penaeus monodon) is one of the most prominent cultured penaeid shrimps (Crustacean) with an average annual global production of half a million tons in the last decade. However, its currently available genome assemblies lack the contiguity and completeness required for accurate genome annotation due to the highly repetitive nature of the genome and technical difficulty in extracting high-quality, high-molecular weight DNA. Here, we report the first chromosome-level whole-genome assembly of P. monodon. The combination of long-read Pacific Biosciences (PacBio) and long-range Chicago and Hi-C technologies enabled a successful assembly of this first high-quality genome sequence. The final assembly covered 2.39 Gb (92.3% of the estimated genome size) and contained 44 pseudomolecules, corresponding to the haploid chromosome number. Repetitive elements occupied a substantial portion of the assembly (62.5%), the highest of the figures reported among crustacean species. The availability of this high-quality genome assembly enabled the identification of genes associated with rapid growth in the black tiger shrimp through the comparison of hepatopancreas transcriptome of slow-growing and fast-growing shrimps. The results highlighted several growth-associated genes. Our high-quality genome assembly provides an invaluable resource for genetic improvement and breeding penaeid shrimp in aquaculture. The availability of P. monodon genome enables analyses of ecological impact, environment adaptation and evolution, as well as the role of the genome to protect the ecological resources by promoting sustainable shrimp farming.
8
Galactocerebroside biosynthesis pathways of Mycoplasma species: an antigen triggering Guillain-Barré-Stohl syndrome. Microb Biotechnol 2021; 14:1201-1211. [PMID: 33773097 PMCID: PMC8085918 DOI: 10.1111/1751-7915.13794] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/20/2020] [Accepted: 02/22/2021] [Indexed: 12/18/2022] Open
Abstract
Infection by Mycoplasma pneumoniae has been identified as a preceding factor of Guillain-Barré-Stohl syndrome. The Guillain-Barré-Stohl syndrome is triggered by an immune reaction against the major glycolipids and it has been postulated that M. pneumoniae infection triggers this syndrome due to bacterial production of galactocerebroside. Here, we present an extensive comparison of 224 genome sequences from 104 Mycoplasma species to characterize the genetic determinants of galactocerebroside biosynthesis. Hidden Markov models were used to analyse glycosyltransferases, leading to identification of a functional protein domain, termed M2000535, that appears in about a third of the studied genomes. This domain appears to be associated with a potential UDP-glucose epimerase, which converts UDP-glucose into UDP-galactose, a main substrate for the biosynthesis of galactocerebroside. These findings clarify the pathogenic mechanisms underlying the triggering of Guillain-Barré-Stohl syndrome by M. pneumoniae infections.
9
A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses. BMC Biol 2021; 19:12. [PMID: 33482803 PMCID: PMC7820539 DOI: 10.1186/s12915-020-00940-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/12/2020] [Accepted: 12/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Pandemics, even more than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions. RESULTS As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. How this model can be used to make data between various resources interoperable is demonstrated by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications or bots were written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates. CONCLUSIONS Although this workflow is developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43).
10
A metabolic and physiological design study of Pseudomonas putida KT2440 capable of anaerobic respiration. BMC Microbiol 2021; 21:9. [PMID: 33407113 PMCID: PMC7789669 DOI: 10.1186/s12866-020-02058-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/26/2020] [Accepted: 12/02/2020] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Pseudomonas putida KT2440 is a metabolically versatile, HV1-certified, genetically accessible, and thus interesting microbial chassis for biotechnological applications. However, its obligate aerobic nature hampers production of oxygen sensitive products and drives up costs in large scale fermentation. The inability to perform anaerobic fermentation has been attributed to insufficient ATP production and an inability to produce pyrimidines under these conditions. Addressing these bottlenecks enabled growth under micro-oxic conditions but does not lead to growth or survival under anoxic conditions. RESULTS Here, a data-driven approach was used to develop a rational design for a P. putida KT2440 derivative strain capable of anaerobic respiration. To come to the design, data derived from a genome comparison of 1628 Pseudomonas strains was combined with genome-scale metabolic modelling simulations and a transcriptome dataset of 47 samples representing 14 environmental conditions from the facultative anaerobe Pseudomonas aeruginosa. CONCLUSIONS The results indicate that the implementation of anaerobic respiration in P. putida KT2440 would require at least 49 additional genes of known function, at least 8 genes encoding proteins of unknown function, and 3 externally added vitamins.
11
NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis. Front Genet 2020; 10:1366. [PMID: 32117417 PMCID: PMC6989550 DOI: 10.3389/fgene.2019.01366] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 03/05/2019] [Accepted: 12/12/2019] [Indexed: 12/20/2022] Open
Abstract
NG-Tax 2.0 is a semantic framework for FAIR high-throughput analysis and classification of marker gene amplicon sequences including bacterial and archaeal 16S ribosomal RNA (rRNA), eukaryotic 18S rRNA and ribosomal intergenic transcribed spacer sequences. It can directly use single or merged reads, paired-end reads and unmerged paired-end reads from long range fragments as input to generate de novo amplicon sequence variants (ASVs). Using the RDF data model, ASVs can be automatically stored in a graph database as objects that link ASV sequences with the full data-wise and element-wise provenance, thereby achieving the level of interoperability required to utilize such data to its full potential. The graph database can be directly queried, allowing for comparative analyses of thousands of samples, and is connected with an interactive R Shiny toolbox for analysis and visualization of (meta)data. Additionally, NG-Tax 2.0 exports an extended BIOM 1.0 (JSON) file as a starting point for further analyses by other means. The extended BIOM file contains new attribute types to include information about the command arguments used, the sequences of the ASVs formed, and classification confidence scores, and is backwards compatible. The performance of NG-Tax 2.0 was compared with DADA2, using the plugin in the QIIME 2 analysis pipeline. Fourteen 16S rRNA gene amplicon mock community samples were obtained from the literature and evaluated. Precision of NG-Tax 2.0 was significantly higher with an average of 0.95 vs 0.58 for QIIME2-DADA2 while recall was comparable with an average of 0.85 and 0.77, respectively. NG-Tax 2.0 is written in Java. The code, the ontology, a Galaxy platform implementation, the analysis toolbox, tutorials and example SPARQL queries are freely available at http://wurssb.gitlab.io/ngtax under the MIT License.
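The abstract reports precision and recall on mock communities without spelling out how they were scored. As a hedged illustration only, the sketch below assumes a simple set comparison between the taxa a mock community is known to contain and the taxa a pipeline reports; the function name `precision_recall` and the genus lists are hypothetical and are not taken from the study.

```python
def precision_recall(expected, observed):
    """Score a pipeline's taxa calls against a mock community of known composition.

    precision = TP / (TP + FP): fraction of reported taxa that are truly present.
    recall    = TP / (TP + FN): fraction of truly present taxa that were reported.
    """
    expected, observed = set(expected), set(observed)
    tp = len(expected & observed)      # correctly reported taxa
    fp = len(observed - expected)      # reported but not in the mock community
    fn = len(expected - observed)      # in the mock community but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical mock community and pipeline output (illustrative genera only).
mock_expected = {"Bacillus", "Escherichia", "Listeria", "Pseudomonas"}
pipeline_observed = {"Bacillus", "Escherichia", "Listeria",
                     "Staphylococcus", "Clostridium"}

p, r = precision_recall(mock_expected, pipeline_observed)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
```

In this toy case three of five reported genera are correct and one expected genus is missed. A benchmark across several mock samples would average the per-sample values, which is presumably how the averages quoted above were obtained.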
12
Diversity of tryptophan halogenases in sponges of the genus Aplysina. FEMS Microbiol Ecol 2019; 95:fiz108. [PMID: 31276591 PMCID: PMC6644159 DOI: 10.1093/femsec/fiz108] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/06/2019] [Accepted: 07/04/2019] [Indexed: 12/21/2022] Open
Abstract
Marine sponges are a prolific source of novel enzymes with promising biotechnological potential. Especially halogenases, which are key enzymes in the biosynthesis of brominated and chlorinated secondary metabolites, possess interesting properties towards the production of pharmaceuticals that are often halogenated. In this study we used a polymerase chain reaction (PCR)-based screening to simultaneously examine and compare the richness and diversity of putative tryptophan halogenase protein sequences and bacterial community structures of six Aplysina species from the Mediterranean and Caribbean seas. At the phylum level, bacterial community composition was similar amongst all investigated species and predominated by Actinobacteria, Chloroflexi, Cyanobacteria, Gemmatimonadetes, and Proteobacteria. We detected four phylogenetically diverse clades of putative tryptophan halogenase protein sequences, which were only distantly related to previously reported halogenases. The Mediterranean species Aplysina aerophoba harbored unique halogenase sequences, of which the most predominant was related to a sponge-associated Psychrobacter-derived sequence. In contrast, the Caribbean species shared numerous novel halogenase sequence variants and exhibited a highly similar bacterial community composition at the operational taxonomic unit (OTU) level. Correlations of relative abundances of halogenases with those of bacterial taxa suggest that prominent sponge symbiotic bacteria, including Chloroflexi and Actinobacteria, are putative producers of the detected enzymes and may thus contribute to the chemical defense of their host.
13
SAPP: functional genome annotation and analysis through a semantic framework using FAIR principles. Bioinformatics 2019; 34:1401-1403. [PMID: 29186322 PMCID: PMC5905645 DOI: 10.1093/bioinformatics/btx767] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 10/10/2017] [Accepted: 11/22/2017] [Indexed: 11/17/2022] Open
Abstract
Summary To unlock the full potential of genome data and to enhance data interoperability and reusability of genome annotations we have developed SAPP, a Semantic Annotation Platform with Provenance. SAPP is designed as an infrastructure supporting FAIR de novo computational genomics but can also be used to process and analyze existing genome annotations. SAPP automatically predicts, tracks and stores structural and functional annotations and associated dataset- and element-wise provenance in a Linked Data format, thereby enabling information mining and retrieval with Semantic Web technologies. This greatly reduces the administrative burden of handling multiple analysis tools and versions thereof and facilitates multi-level large scale comparative analysis. Availability and implementation SAPP is written in Java, is freely available at https://gitlab.com/sapp and runs on Unix-like operating systems. The documentation, examples and a tutorial are available at https://sapp.gitlab.io.
14
Comparative Genomics Highlights Symbiotic Capacities and High Metabolic Flexibility of the Marine Genus Pseudovibrio. Genome Biol Evol 2018; 10:125-142. [PMID: 29319806 PMCID: PMC5765558 DOI: 10.1093/gbe/evx271] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Accepted: 12/18/2017] [Indexed: 12/19/2022] Open
Abstract
Pseudovibrio is a marine bacterial genus members of which are predominantly isolated from sessile marine animals, and particularly sponges. It has been hypothesized that Pseudovibrio spp. form mutualistic relationships with their hosts. Here, we studied Pseudovibrio phylogeny and genetic adaptations that may play a role in host colonization by comparative genomics of 31 Pseudovibrio strains, including 25 sponge isolates. All genomes were highly similar in terms of encoded core metabolic pathways, albeit with substantial differences in overall gene content. Based on gene composition, Pseudovibrio spp. clustered by geographic region, indicating geographic speciation. Furthermore, the fact that isolates from the Mediterranean Sea clustered by sponge species suggested host-specific adaptation or colonization. Genome analyses suggest that Pseudovibrio hongkongensis UST20140214-015BT is only distantly related to other Pseudovibrio spp., thereby challenging its status as typical Pseudovibrio member. All Pseudovibrio genomes were found to encode numerous proteins with SEL1 and tetratricopeptide repeats, which have been suggested to play a role in host colonization. For evasion of the host immune system, Pseudovibrio spp. may depend on type III, IV, and VI secretion systems that can inject effector molecules into eukaryotic cells. Furthermore, Pseudovibrio genomes carry on average seven secondary metabolite biosynthesis clusters, reinforcing the role of Pseudovibrio spp. as potential producers of novel bioactive compounds. Tropodithietic acid, bacteriocin, and terpene biosynthesis clusters were highly conserved within the genus, suggesting an essential role in survival, for example through growth inhibition of bacterial competitors. Taken together, these results support the hypothesis that Pseudovibrio spp. have mutualistic relations with sponges.
15
Persistence of Functional Protein Domains in Mycoplasma Species and their Role in Host Specificity and Synthetic Minimal Life. Front Cell Infect Microbiol 2017; 7:31. [PMID: 28224116 PMCID: PMC5293770 DOI: 10.3389/fcimb.2017.00031] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/08/2016] [Accepted: 01/23/2017] [Indexed: 11/26/2022] Open
Abstract
Mycoplasmas are the smallest self-replicating organisms and obligate parasites of a specific vertebrate host. An in-depth analysis of the functional capabilities of mycoplasma species is fundamental to understand how some of the simplest forms of life on Earth succeeded in subverting complex hosts with highly sophisticated immune systems. In this study we present a genome-scale comparison, focused on identification of functional protein domains, of 80 publicly available mycoplasma genomes which were consistently re-annotated using a standardized annotation pipeline embedded in a semantic framework to keep track of the data provenance. We examined the pan- and core-domainome and studied predicted functional capability in relation to host specificity and phylogenetic distance. We show that the pan- and core-domainome of mycoplasma species is closed. A comparison with the proteome of the “minimal” synthetic bacterium JCVI-Syn3.0 allowed us to classify domains and proteins essential for minimal life. Many of those essential protein domains, essential Domains of Unknown Function (DUFs) and essential hypothetical proteins are not persistent across mycoplasma genomes, suggesting that mycoplasma species support alternative domain configurations that bypass their essentiality. Based on the protein domain composition, we could separate mycoplasma species infecting blood and tissue. For selected genomes of tissue infecting mycoplasmas, we could also predict whether the host is ruminant, pig or human. Functionally closely related mycoplasma species, which have a highly similar protein domain repertoire but different hosts, could not be separated. This study provides a concise overview of the functional capabilities of mycoplasma species, which can be used as a basis to further understand host-pathogen interaction or to design synthetic minimal life.
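To make the pan- and core-domainome terminology concrete, here is a minimal sketch under stated assumptions: each genome is reduced to the set of protein domains it encodes, the pan-domainome is the union of those sets and the core-domainome their intersection. The genome names and domain labels (`D1` to `D4`) are placeholders, and the per-genome count of newly seen domains is only one rough way to look at whether the pan-domainome levels off; the study's actual procedure is not described in the abstract.

```python
def pan_and_core(genomes):
    """Return (pan, core): the union and the intersection of per-genome domain sets."""
    sets = [set(domains) for domains in genomes.values()]
    pan = set().union(*sets)
    core = set.intersection(*sets)
    return pan, core

def new_domains_per_genome(genomes):
    """Count domains not seen in any earlier genome, in insertion order.

    Counts that fall towards zero as genomes are added are what one would
    expect of a 'closed' pan-domainome.
    """
    seen, counts = set(), []
    for domains in genomes.values():
        new = set(domains) - seen
        counts.append(len(new))
        seen |= new
    return counts

# Placeholder genomes and domain identifiers (illustrative only).
genomes = {
    "genome_1": {"D1", "D2", "D3"},
    "genome_2": {"D1", "D2", "D4"},
    "genome_3": {"D1", "D3", "D4"},
}

pan, core = pan_and_core(genomes)
print(sorted(pan))                      # ['D1', 'D2', 'D3', 'D4']
print(sorted(core))                     # ['D1']
print(new_domains_per_genome(genomes))  # [3, 1, 0]
```

A real analysis would typically average such accumulation counts over many random genome orderings, since a single ordering can be misleading.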
16
Comparison of 432 Pseudomonas strains through integration of genomic, functional, metabolic and expression data. Sci Rep 2016; 6:38699. [PMID: 27922098 PMCID: PMC5138606 DOI: 10.1038/srep38699] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 05/05/2016] [Accepted: 11/14/2016] [Indexed: 11/08/2022] Open
Abstract
Pseudomonas is a highly versatile genus containing species that can be harmful to humans and plants while others are widely used for bioengineering and bioremediation. We analysed 432 sequenced Pseudomonas strains by integrating results from a large scale functional comparison using protein domains with data from six metabolic models, nearly a thousand transcriptome measurements and four large scale transposon mutagenesis experiments. Through heterogeneous data integration we linked gene essentiality, persistence and expression variability. The pan-genome of Pseudomonas is closed, indicating a limited role of horizontal gene transfer in the evolutionary history of this genus. A large fraction of essential genes are highly persistent; still, non-essential genes represent a considerable fraction of the core-genome. Our results emphasize the power of integrating large scale comparative functional genomics with heterogeneous data for exploring bacterial diversity and versatility.
|
17
|
Complete genome sequence of thermophilic Bacillus smithii type strain DSM 4216(T). Stand Genomic Sci 2016; 11:52. [PMID: 27559429 PMCID: PMC4995803 DOI: 10.1186/s40793-016-0172-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 08/21/2015] [Accepted: 08/09/2016] [Indexed: 11/10/2022] Open
Abstract
Bacillus smithii is a facultatively anaerobic, thermophilic bacterium able to use a variety of sugars that can be derived from lignocellulosic feedstocks. Being genetically accessible, it is a potential new host for biotechnological production of green chemicals from renewable resources. We determined the complete genomic sequence of the B. smithii type strain DSM 4216T, which consists of a 3,368,778 bp chromosome (GenBank accession number CP012024.1) and a 12,514 bp plasmid (GenBank accession number CP012025.1), together encoding 3880 genes. Genome annotation via RAST was complemented by a protein domain analysis. Some unique features of B. smithii central metabolism in comparison to related organisms included the lack of a standard acetate production pathway with no apparent pyruvate formate lyase, phosphotransacetylase, and acetate kinase genes, while acetate was the second fermentation product.
|
18
|
Protein domain architectures provide a fast, efficient and scalable alternative to sequence-based methods for comparative functional genomics. F1000Res 2016; 5:1987. [PMID: 27703668 PMCID: PMC5031134 DOI: 10.12688/f1000research.9416.3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Accepted: 06/26/2017] [Indexed: 11/20/2022] Open
Abstract
A functional comparative genome analysis is essential to understand the mechanisms underlying bacterial evolution and adaptation. Detection of functional orthologs using standard global sequence similarity methods faces several problems: the need for defining arbitrary acceptance thresholds for similarity and alignment length, lateral gene acquisition, and the high computational cost of finding bi-directional best matches at a large scale. We investigated the use of protein domain architectures for large-scale functional comparative analysis as an alternative method. The performance of both approaches was assessed through functional comparison of 446 bacterial genomes sampled at different taxonomic levels. We show that protein domain architectures provide a fast and efficient alternative to methods based on sequence similarity for identifying groups of functionally equivalent proteins within and across taxonomic boundaries, and that they are suitable for large-scale comparative analysis. Running both methods in parallel pinpoints potential functional adaptations that may add to bacterial fitness.
|
19
|
RDF2Graph a tool to recover, understand and validate the ontology of an RDF resource. J Biomed Semantics 2015; 6:39. [PMID: 26500754 PMCID: PMC4619317 DOI: 10.1186/s13326-015-0038-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/23/2015] [Accepted: 09/23/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Semantic web technologies have a tremendous potential for the integration of heterogeneous data sets. Therefore, an increasing number of widely used biological resources are becoming available in the RDF data model. There are, however, no tools available that provide structural overviews of these resources. Such structural overviews are essential to efficiently query these resources and to assess their structural integrity and design, thereby strengthening their use and potential. RESULTS: Here we present RDF2Graph, a tool that automatically recovers the structure of an RDF resource. The generated overview allows users to create complex queries on these resources and to structurally validate newly created resources. CONCLUSION: RDF2Graph facilitates the creation of complex queries, thereby enabling access to knowledge stored across multiple RDF resources. RDF2Graph also facilitates the creation of high-quality resources and resource descriptions, which in turn increases the usability of semantic web technologies.
|
20
|
Comparative genomics of Streptococcus pyogenes M1 isolates differing in virulence and propensity to cause systemic infection in mice. Int J Med Microbiol 2015; 305:532-43. [PMID: 26129624 DOI: 10.1016/j.ijmm.2015.06.002] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/19/2015] [Revised: 06/15/2015] [Accepted: 06/19/2015] [Indexed: 12/14/2022] Open
Abstract
Streptococcus pyogenes serotype M1 is a frequent cause of severe infections in humans. Some M1 isolates are pathogenic in mice and used in studies on infection pathogenesis. We observed marked differences in murine infections caused by M1 strains SF370, 5448, 5448AP or AP1, which prompted us to sequence the whole genome of isolates 5448 and AP1 for comparative analysis. Strain 5448 is known to acquire inactivating mutations in the CovRS two-component system during mouse infection, producing hypervirulent progeny such as 5448AP. Isolates AP1 and 5448AP, more than 5448, caused disseminating infections that became systemic and lethal. SF370 was not pathogenic. Phages caused gross genetic differences and increased the gene content of AP1 by 8% as compared to 5448 and SF370. Each of six examined M1 genomes contained two CRISPR-Cas systems. Phage insertion destroyed a type II CRISPR-Cas system in AP1 and other strains of serotypes M1, M3, M6 and M24, but not in M1 strains 5448, SF370, MGAS5005, A20 or M1 476. A resulting impaired defence against invading genetic elements could have led to the wealth of phages in AP1. AP1 lacks genetic features of the MGAS5005-like clonal complex, including the streptodornase that drives selection for hypervirulent clones with an inactivated CovRS system. Still, inactivating mutations in covS were a common genetic feature of AP1 and the MGAS5005-like isolate 5448AP. Abolished expression of the cysteine proteinase SpeB due to CovRS inactivation could be a common cause for hypervirulence of the two isolates. Moreover, an additional protein H-coding gene and a mutation in the regulator gene rofA distinguished AP1 from other M1 isolates. In conclusion, hypervirulence of S. pyogenes M1 in mice is not limited to the MGAS5005-like genotype.
|
21
|
Genome and proteome analysis of Pseudomonas chloritidismutans AW-1T that grows on n-decane with chlorate or oxygen as electron acceptor. Environ Microbiol 2015; 18:3247-3257. [PMID: 25900248 DOI: 10.1111/1462-2920.12880] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 01/15/2015] [Revised: 02/25/2015] [Accepted: 03/05/2015] [Indexed: 01/15/2023]
Abstract
Growth of Pseudomonas chloritidismutans AW-1T on C7 to C12 n-alkanes with oxygen or chlorate as electron acceptor was studied by genome and proteome analysis. Whole genome shotgun sequencing resulted in a 5 Mbp assembled sequence with a G + C content of 62.5%. The automatic annotation identified 4767 protein-encoding genes and a putative function could be assigned to almost 80% of the predicted proteins. The distinct phylogenetic position of P. chloritidismutans AW-1T within the Pseudomonas stutzeri cluster became clear by comparison of average nucleotide identity values of sequenced genomes. Analysis of the proteome of P. chloritidismutans AW-1T showed the versatility of this bacterium to adapt to aerobic and anaerobic growth conditions with acetate or n-decane as substrates. All enzymes involved in the alkane oxidation pathway were identified. An alkane monooxygenase was detected in n-decane-grown cells, but not in acetate-grown cells. The enzyme was found when grown in the presence of oxygen or chlorate, indicating that under both conditions an oxygenase-mediated pathway is employed for alkane degradation. Proteomic and biochemical data also showed that both chlorate reductase and chlorite dismutase are constitutively present, but most abundant under chlorate-reducing conditions.
|
22
|
Abstract
Background: Eukaryotic Argonaute proteins mediate RNA-guided RNA interference, allowing both regulation of host gene expression and defense against invading mobile genetic elements. Recently, it has become evident that prokaryotic Argonaute homologs mediate DNA-guided DNA interference, and play a role in host defense. Argonaute of the bacterium Thermus thermophilus (TtAgo) targets invading plasmid DNA during and after transformation. Using small interfering DNA guides, TtAgo can cleave single- and double-stranded DNAs. Although TtAgo additionally has been demonstrated to cleave RNA targets complementary to its DNA guide in vitro, RNA targeting by TtAgo has not been demonstrated in vivo. Methods: To investigate if TtAgo also has the potential to control RNA levels, we analyzed RNA-seq data derived from cultures of four T. thermophilus strain HB27 variants: wild type, TtAgo knockout (Δago), and either strain transformed with a plasmid. Additionally, we determined the effect of TtAgo on expression of plasmid-encoded RNA and plasmid DNA levels. Results: In the absence of exogenous DNA (plasmid), TtAgo presence or absence had no effect on gene expression levels. When plasmid DNA is present, TtAgo reduces plasmid DNA levels 4-fold, and a corresponding reduction of plasmid gene transcript levels was observed. We therefore conclude that TtAgo interferes with plasmid DNA, but not with plasmid-encoded RNA. Interestingly, TtAgo presence stimulates expression of specific endogenous genes, but only when exogenous plasmid DNA was present. Specifically, the presence of TtAgo directly or indirectly stimulates expression of CRISPR loci and associated genes, some of which are involved in CRISPR adaptation. This suggests that TtAgo-mediated interference with plasmid DNA stimulates CRISPR adaptation.
|
23
|
RNA targeting by the type III-A CRISPR-Cas Csm complex of Thermus thermophilus. Mol Cell 2014; 56:518-30. [PMID: 25457165 PMCID: PMC4342149 DOI: 10.1016/j.molcel.2014.10.005] [Citation(s) in RCA: 215] [Impact Index Per Article: 21.5] [Received: 07/25/2014] [Revised: 09/27/2014] [Accepted: 10/02/2014] [Indexed: 02/07/2023]
Abstract
CRISPR-Cas is a prokaryotic adaptive immune system that provides sequence-specific defense against foreign nucleic acids. Here we report the structure and function of the effector complex of the Type III-A CRISPR-Cas system of Thermus thermophilus: the Csm complex (TtCsm). TtCsm is composed of five different protein subunits (Csm1-Csm5) with an uneven stoichiometry and a single crRNA of variable size (35-53 nt). The TtCsm crRNA content is similar to the Type III-B Cmr complex, indicating that crRNAs are shared among different subtypes. A negative stain EM structure of the TtCsm complex exhibits the characteristic architecture of Type I and Type III CRISPR-associated ribonucleoprotein complexes. crRNA-protein crosslinking studies show extensive contacts between the Csm3 backbone and the bound crRNA. We show that, like TtCmr, TtCsm cleaves complementary target RNAs at multiple sites. Unlike Type I complexes, interference by TtCsm does not proceed via initial base pairing by a seed sequence.
|
24
|
Genome analyses of the carboxydotrophic sulfate-reducers Desulfotomaculum nigrificans and Desulfotomaculum carboxydivorans and reclassification of Desulfotomaculum caboxydivorans as a later synonym of Desulfotomaculum nigrificans. Stand Genomic Sci 2014; 9:655-75. [PMID: 25197452 PMCID: PMC4149029 DOI: 10.4056/sigs.4718645] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 11/27/2022] Open
Abstract
Desulfotomaculum nigrificans and D. carboxydivorans are moderately thermophilic members of the polyphyletic spore-forming genus Desulfotomaculum in the family Peptococcaceae. They are phylogenetically very closely related and belong to ‘subgroup a’ of the Desulfotomaculum cluster 1. D. nigrificans and D. carboxydivorans have a similar growth substrate spectrum; they can grow with glucose and fructose as electron donors in the presence of sulfate. Additionally, both species are able to ferment fructose, although fermentation of glucose is only reported for D. carboxydivorans. D. nigrificans is able to grow with 20% carbon monoxide (CO) coupled to sulfate reduction, while D. carboxydivorans can grow at 100% CO with and without sulfate. Hydrogen is produced during growth with CO by D. carboxydivorans. Here we present a summary of the features of D. nigrificans and D. carboxydivorans together with the description of the complete genome sequencing and annotation of both strains. Moreover, we compared the genomes of both strains to reveal their differences. This comparison led us to propose a reclassification of D. carboxydivorans as a later heterotypic synonym of D. nigrificans.
|
25
|
Differential translation tunes uneven production of operon-encoded proteins. Cell Rep 2013; 4:938-44. [PMID: 24012761 DOI: 10.1016/j.celrep.2013.07.049] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 12/20/2012] [Revised: 07/15/2013] [Accepted: 07/31/2013] [Indexed: 12/27/2022] Open
Abstract
Clustering of functionally related genes in operons allows for coregulated gene expression in prokaryotes. This is advantageous when equal amounts of gene products are required. Production of protein complexes with an uneven stoichiometry, however, requires tuning mechanisms to generate subunits in appropriate relative quantities. Using comparative genomic analysis, we show that differential translation is a key determinant of modulated expression of genes clustered in operons and that codon bias generally is the best in silico indicator of unequal protein production. Variable ribosome density profiles of polycistronic transcripts correlate strongly with differential translation patterns. In addition, we provide experimental evidence that de novo initiation of translation can occur at intercistronic sites, allowing for differential translation of any gene irrespective of its position on a polycistronic messenger. Thus, modulation of translation efficiency appears to be a universal mode of control in bacteria and archaea that allows for differential production of operon-encoded proteins.
|
26
|
The Constructor: a web application optimizing cloning strategies based on modules from the registry of standard biological parts. J Biol Eng 2012; 6:14. [PMID: 22947262 PMCID: PMC3520746 DOI: 10.1186/1754-1611-6-14] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/27/2012] [Accepted: 09/02/2012] [Indexed: 11/28/2022] Open
Abstract
Synthetic biology is an emerging field that combines molecular biology with engineering principles, which requires abstraction levels applied to a modular biological componentry. The Registry of Standard Biological Parts harbours such a repository of standardized parts, and thereby facilitates the combination of complex molecular modules into novel genetic circuits and devices. However, since finding the best parts for a pre-determined genetic design can be time-consuming, we devised the Constructor, a web tool that recommends the smallest number of cloning steps for pre-designed circuits, and implements user-defined quality checks. We present the Constructor (http://www.systemsbiology.nl/the_constructor) as a constructive web tool that simplifies the in silico assembly of pre-designed gene circuitries from standard parts, reducing both planning and subsequent cloning time.
|