1
|
Gontijo MTP, Jorge GP, Brocchi M. Current Status of Endolysin-Based Treatments against Gram-Negative Bacteria. Antibiotics (Basel) 2021; 10:1143. [PMID: 34680724 PMCID: PMC8532960 DOI: 10.3390/antibiotics10101143] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/04/2021] [Revised: 07/14/2021] [Accepted: 07/21/2021] [Indexed: 12/31/2022]
Abstract
The prevalence of multidrug-resistant Gram-negative bacteria is a public health concern. Bacteriophages and bacteriophage-derived lytic enzymes have been studied in response to the emergence of multidrug-resistant bacteria. Limited tRNA availability and endolysin toxicity during recombinant protein expression are circumvented by codon optimization and by lower expression levels using inducible pET-type plasmids and controlled cultivation conditions, respectively. The use of polyhistidine tags facilitates endolysin purification and alters antimicrobial activity. Outer membrane permeabilizers, such as organic acids, act synergistically with endolysins, but some endolysins permeate the outer membrane of Gram-negative bacteria per se. However, the outer membrane permeation mechanisms of endolysins remain unclear. Other strategies, such as the co-administration of endolysins with polymyxins, silver nanoparticles, and liposomes, confer additional outer membrane permeation. Engineering endolysins that comprise domains for outer membrane permeation is also a strategy used to overcome the current challenges in controlling multidrug-resistant Gram-negative bacteria. Metagenomics is a new strategy for screening endolysins with interesting antimicrobial properties from uncultured phage genomes. Here, we review the current state of the art on the heterologous expression of endolysins, showing the potential of bacteriophage endolysins in controlling bacterial infections.
Affiliation(s)
- Marco Túlio Pardini Gontijo
- Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas (UNICAMP), Rua Monteiro Lobato 255, Campinas 13083-862, Brazil
|
2
|
Priya R, Sneha P, Dass JFP, Doss C GP, Manickavasagam M, Siva R. Exploring the codon patterns between CCD and NCED genes among different plant species. Comput Biol Med 2019; 114:103449. [DOI: 10.1016/j.compbiomed.2019.103449] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/06/2019] [Revised: 09/13/2019] [Accepted: 09/13/2019] [Indexed: 01/16/2023]
|
3
|
Danchin A, Sekowska A, Noria S. Functional Requirements in the Program and the Cell Chassis for Next-Generation Synthetic Biology. Synth Biol (Oxf) 2018. [DOI: 10.1002/9783527688104.ch5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Affiliation(s)
- Antoine Danchin
- Institute of Cardiometabolism and Nutrition; 47 boulevard de l'Hôpital Paris 75013 France
- Agnieszka Sekowska
- Institute of Cardiometabolism and Nutrition; 47 boulevard de l'Hôpital Paris 75013 France
- Stanislas Noria
- Fondation Fourmentin-Guilbert; 2 avenue du Pavé Neuf Noisy le Grand 93160 France
|
4
|
Junier I, Frémont P, Rivoire O. Universal and idiosyncratic characteristic lengths in bacterial genomes. Phys Biol 2018; 15:035001. [PMID: 29512518 DOI: 10.1088/1478-3975/aab4ac] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Abstract
In condensed matter physics, simplified descriptions are obtained by coarse-graining the features of a system at a certain characteristic length, defined as the typical length beyond which some properties are no longer correlated. From a physics standpoint, in vitro DNA has thus a characteristic length of 300 base pairs (bp), the Kuhn length of the molecule beyond which correlations in its orientations are typically lost. From a biology standpoint, in vivo DNA has a characteristic length of 1000 bp, the typical length of genes. Since bacteria live in very different physico-chemical conditions and since their genomes lack translational invariance, whether larger, universal characteristic lengths exist is a non-trivial question. Here, we examine this problem by leveraging the large number of fully sequenced genomes available in public databases. By analyzing GC content correlations and the evolutionary conservation of gene contexts (synteny) in hundreds of bacterial chromosomes, we conclude that a fundamental characteristic length around 10-20 kb can be defined. This characteristic length reflects elementary structures involved in the coordination of gene expression, which are present all along the genome of nearly all bacteria. Technically, reaching this conclusion required us to implement methods that are insensitive to the presence of large idiosyncratic genomic features, which may co-exist along these fundamental universal structures.
Affiliation(s)
- Ivan Junier
- CNRS, TIMC-IMAG, Grenoble, France. Univ. Grenoble Alpes, TIMC-IMAG, Grenoble, France
|
5
|
Jha M, Malhotra R, Acharya R. A Generalized Lattice based Probabilistic Approach for Metagenomic Clustering. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:749-761. [PMID: 27168602 DOI: 10.1109/tcbb.2016.2563422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Metagenomics involves the analysis of genomes of microorganisms sampled directly from their environment. Next Generation Sequencing allows a high-throughput sampling of small segments from genomes in the metagenome to generate reads. To study the properties and relationships of the microorganisms present, clustering can be performed based on the inherent composition of the sampled reads for unknown species. We propose a two-dimensional lattice based probabilistic model for clustering metagenomic datasets. The occurrence of a species in the metagenome is estimated using a lattice of probabilistic distributions over small sized genomic sequences. The two dimensions denote distributions for different sizes and groups of words respectively. The lattice structure allows for additional support for a node from its neighbors when the probabilistic support for the species using the parameters of the current node is deemed insufficient. We also show convergence for our algorithm. We test our algorithm on simulated metagenomic data containing bacterial species and observe more than 85% precision. We also evaluate our algorithm on an in vitro-simulated bacterial metagenome and on human patient data, and show a better clustering than other algorithms even for short reads and varied abundance. The software and datasets can be downloaded from https://github.com/lattclus/lattice-metage.
|
6
|
Pang TY, Lercher MJ. Supra-operonic clusters of functionally related genes (SOCs) are a source of horizontal gene co-transfers. Sci Rep 2017; 7:40294. [PMID: 28067311 PMCID: PMC5220362 DOI: 10.1038/srep40294] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 09/27/2016] [Accepted: 12/01/2016] [Indexed: 12/14/2022]
Abstract
Adaptation of bacteria occurs predominantly via horizontal gene transfer (HGT). While it is widely recognized that horizontal acquisitions frequently encompass multiple genes, it is unclear what the size distribution of successfully transferred DNA segments looks like and what evolutionary forces shape this distribution. Here, we identified 1790 gene family pairs that were consistently co-gained on the same branches across a phylogeny of 53 E. coli strains. We estimated a lower limit of their genomic distances at the time they were transferred to their host genomes; this distribution shows a sharp upper bound at 30 kb. The same gene-pairs can have larger distances (up to 70 kb) in other genomes. These more distant pairs likely represent recent acquisitions via transduction that involve the co-transfer of excised prophage genes, as they are almost always associated with intervening phage-associated genes. The observed distribution of genomic distances of co-transferred genes is much broader than expected from a model based on the co-transfer of genes within operons; instead, this distribution is highly consistent with the size distribution of supra-operonic clusters (SOCs), groups of co-occurring and co-functioning genes that extend beyond operons. Thus, we propose that SOCs form a basic unit of horizontal gene transfer.
Affiliation(s)
- Tin Yau Pang
- Institute for Computer Science, Heinrich Heine University, Düsseldorf, 40225, Germany
- Martin J Lercher
- Institute for Computer Science, Heinrich Heine University, Düsseldorf, 40225, Germany
|
7
|
Touchon M, Rocha EPC. Coevolution of the Organization and Structure of Prokaryotic Genomes. Cold Spring Harb Perspect Biol 2016; 8:a018168. [PMID: 26729648 DOI: 10.1101/cshperspect.a018168] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Indexed: 11/25/2022]
Abstract
The cytoplasm of prokaryotes contains many molecular machines interacting directly with the chromosome. These vital interactions depend on the chromosome structure, as a molecule, and on the genome organization, as a unit of genetic information. Strong selection for the organization of the genetic elements implicated in these interactions drives replicon ploidy, gene distribution, operon conservation, and the formation of replication-associated traits. The genomes of prokaryotes are also very plastic, with high rates of horizontal gene transfer and gene loss. The evolutionary conflicts between plasticity and organization lead to the formation of regions with high genetic diversity whose impact on chromosome structure is poorly understood. Prokaryotic genomes are remarkable documents of natural history because they carry the imprint of all of these selective and mutational forces. Their study allows a better understanding of molecular mechanisms, their impact on microbial evolution, and how they can be tinkered with in synthetic biology.
Affiliation(s)
- Marie Touchon
- Microbial Evolutionary Genomics, Institut Pasteur, 75015 Paris, France CNRS, UMR3525, 75015 Paris, France
- Eduardo P C Rocha
- Microbial Evolutionary Genomics, Institut Pasteur, 75015 Paris, France CNRS, UMR3525, 75015 Paris, France
|
8
|
From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems. Cell Mol Life Sci 2015; 72:4287-308. [PMID: 26254872 PMCID: PMC4611022 DOI: 10.1007/s00018-015-2004-1] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Received: 05/17/2015] [Revised: 07/23/2015] [Accepted: 07/28/2015] [Indexed: 12/30/2022]
Abstract
Microorganisms and the viruses that infect them are the most numerous biological entities on Earth and enclose its greatest biodiversity and genetic reservoir. With strength in their numbers, these microscopic organisms are major players in the cycles of energy and matter that sustain all life. Scientists have only scratched the surface of this vast microbial world through culture-dependent methods. Recent developments in generating metagenomes, large random samples of nucleic acid sequences isolated directly from the environment, are providing comprehensive portraits of the composition, structure, and functioning of microbial communities. Moreover, advances in metagenomic analysis have created the possibility of obtaining complete or nearly complete genome sequences from uncultured microorganisms, providing important means to study their biology, ecology, and evolution. Here we review some of the recent developments in the field of metagenomics, focusing on the discovery of genetic novelty and on methods for obtaining uncultured genome sequences, including through the recycling of previously published datasets. Moreover we discuss how metagenomics has become a core scientific tool to characterize eco-evolutionary patterns of microbial ecosystems, thus allowing us to simultaneously discover new microbes and study their natural communities. We conclude by discussing general guidelines and challenges for modeling the interactions between uncultured microorganisms and viruses based on the information contained in their genome sequences. These models will significantly advance our understanding of the functioning of microbial ecosystems and the roles of microbes in the environment.
|
9
|
Shen W, Wang D, Ye B, Shi M, Ma L, Zhang Y, Zhao Z. GC3-biased gene domains in mammalian genomes. Bioinformatics 2015; 31:3081-4. [PMID: 26019240 PMCID: PMC4576692 DOI: 10.1093/bioinformatics/btv329] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 03/02/2015] [Accepted: 05/19/2015] [Indexed: 01/17/2023]
Abstract
Motivation: Synonymous codon usage bias has been shown to be correlated with many genomic features among different organisms. However, the biological significance of codon bias with respect to gene function and genome organization remains unclear. Results: Guanine and cytosine content at the third codon position (GC3) could be used as a good indicator of codon bias. Here, we used relative GC3 bias values to compare the strength of GC3 bias of genes in human and mouse. We reported, for the first time, that GC3-rich and GC3-poor gene products might have distinct sub-cellular spatial distributions. Moreover, we extended the view of genomic gene domains and identified conserved GC3 biased gene domains along chromosomes. Our results indicated that similar GC3 biased genes might be co-translated in specific spatial regions to share local translational machineries, and that GC3 could be involved in the organization of genome architecture. Availability and implementation: Source code is available upon request from the authors. Contact: zhaozh@nic.bmi.ac.cn or zany1983@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online.
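As an illustration of the GC3 quantity named in this abstract, the following minimal sketch computes the raw fraction of codons whose third position is G or C. It is not the authors' code (which the abstract says is available on request), and it does not compute their "relative GC3 bias values"; the function name `gc3_content` and the choice to ignore trailing bases that do not fill a codon are assumptions made for this example.

```python
def gc3_content(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence; trailing bases that do
    not complete a codon are ignored (an assumption of this sketch).
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop an incomplete final codon
    thirds = cds[2:usable:3]           # every third base: positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)
```

For instance, `gc3_content("ATGGCC")` looks only at the bases G and C in the third positions of the two codons, so every other position in the sequence has no effect on the result.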
Affiliation(s)
- Wenlong Shen
- Beijing Institute of Biotechnology, Beijing 100071, China
- Dong Wang
- Beijing Institute of Biotechnology, Beijing 100071, China
- Bingyu Ye
- Beijing Institute of Biotechnology, Beijing 100071, China; College of Life Sciences, Capital Normal University, Beijing 100048, China
- Minglei Shi
- Beijing Institute of Biotechnology, Beijing 100071, China
- Lei Ma
- College of Life Sciences, Shihezi University, Shihezi 832003, China
- Yan Zhang
- Beijing Institute of Biotechnology, Beijing 100071, China
- Zhihu Zhao
- Beijing Institute of Biotechnology, Beijing 100071, China
|
10
|
Genome-wide patterns of codon bias are shaped by natural selection in the purple sea urchin, Strongylocentrotus purpuratus. G3: Genes, Genomes, Genetics 2013; 3:1069-83. [PMID: 23637123 PMCID: PMC3704236 DOI: 10.1534/g3.113.005769] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 01/25/2023]
Abstract
Codon usage bias has been documented in a wide diversity of species, but the relative contributions of mutational bias and various forms of natural selection remain unclear. Here, we describe for the first time genome-wide patterns of codon bias at 4623 genes in the purple sea urchin, Strongylocentrotus purpuratus. Preferred codons were identified at 18 amino acids that exclusively used G or C at third positions, which contrasted with the strong AT bias of the genome (overall GC content is 36.9%). The GC content of third positions and coding regions exhibited significant correlations with the magnitude of codon bias. In contrast, the GC content of introns and flanking regions was indistinguishable from the genome-wide background, which suggested a limited contribution of mutational bias to synonymous codon usage. Five distinct clusters of genes were identified that had significantly different synonymous codon usage patterns. A significant correlation was observed between codon bias and mRNA expression supporting translational selection, but this relationship was driven by only one highly biased cluster that represented only 8.6% of all genes. In all five clusters preferred codons were evolutionarily conserved to a similar degree despite differences in their synonymous codon usage distributions and magnitude of codon bias. The third positions of preferred codons in two codon usage groups also paired significantly more often in stems than in loops of mRNA secondary structure predictions, which suggested that codon bias might also affect mRNA stability. Our results suggest that mutational bias has played a minor role in determining codon bias in S. purpuratus and that preferred codon usage may be heterogeneous across different genes and subject to different forms of natural selection.
|
11
|
Cardinale DJ, DeRosa K, Duffy S. Base composition and translational selection are insufficient to explain codon usage bias in plant viruses. Viruses 2013; 5:162-81. [PMID: 23322170 PMCID: PMC3564115 DOI: 10.3390/v5010162] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 12/13/2012] [Revised: 01/09/2013] [Accepted: 01/11/2013] [Indexed: 02/06/2023]
Abstract
Viral codon usage bias may be the product of a number of synergistic or antagonistic factors, including genomic nucleotide composition, translational selection, genomic architecture, and mutational or repair biases. Most studies of viral codon bias evaluate only the relative importance of genomic base composition and translational selection, ignoring other possible factors. We analyzed the codon preferences of ssRNA (luteoviruses and potyviruses) and ssDNA (geminiviruses) plant viruses that infect translationally distinct monocot and dicot hosts. We found that neither genomic base composition nor translational selection satisfactorily explains their codon usage biases. Furthermore, we observed a strong relationship between the codon preferences of viruses in the same family or genus, regardless of host or genomic nucleotide content. Our results suggest that analyzing codon bias as either due to base composition or translational selection is a false dichotomy that obscures the role of other factors. Constraints such as genomic architecture and secondary structure can and do influence codon usage in plant viruses, and likely in viruses of other hosts.
Affiliation(s)
- Daniel J Cardinale
- Department of Ecology, Evolution, and Natural Resources, School of Environmental and Biological Sciences, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA.
|
12
|
Potapov I, Mäkelä J, Yli-Harja O, Ribeiro AS. Effects of codon sequence on the dynamics of genetic networks. J Theor Biol 2012; 315:17-25. [PMID: 22960571 DOI: 10.1016/j.jtbi.2012.08.029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 03/16/2012] [Revised: 08/17/2012] [Accepted: 08/22/2012] [Indexed: 11/27/2022]
Abstract
In prokaryotes, the rate at which codons are translated varies from one codon to the next. Using a stochastic model of transcription and translation at the nucleotide and codon levels, we investigate the effects of the codon sequence on the dynamics of protein numbers. For sequences generated according to the codon frequencies in Escherichia coli, we find that mean protein numbers at near equilibrium differ with the codon sequence, due to the mean codon translation efficiencies, in particular of the codons at the ribosome binding site region. We find close agreement between these predictions and measurements of protein expression levels as a function of the codon sequence. Next, we investigate the effects of short codon sequences at the start/end of the RNA sequence with linearly increasing/decreasing translation efficiencies, known as slow ramps. The ramps affect the mean, but not the fluctuations, in protein numbers by affecting the rate of translation initiation. Finally, we show that slow ramps affect the dynamics of small genetic circuits, namely, switches and clocks. In switches, ramps affect the frequency of switching and bias the robustness of the noisy attractors. In repressilators, ramps alter the robustness of periodicity. We conclude that codon sequences affect the dynamics of gene expression and genetic circuits and, thus, are likely to be under selection regarding both mean codon frequency and spatial arrangement along the sequence.
Affiliation(s)
- Ilya Potapov
- Computational Systems Biology Research Group, Department of Signal Processing, Tampere University of Technology, P.O. Box 527, FIN-33101, Finland.
|
13
|
Phan TH, Nguyen DL. Species-specificity of DNA trimer densities in chromosomes and their use in the classification of closely related organisms. J Microbiol Methods 2012; 91:30-7. [PMID: 22820348 DOI: 10.1016/j.mimet.2012.07.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2012] [Revised: 07/09/2012] [Accepted: 07/10/2012] [Indexed: 11/27/2022]
Abstract
16S rDNA sequences are conventionally used for classification of organisms. However, the use of these sequences is sometimes not successful, especially for closely related species. For better classification of these organisms, several methods that are genome sequence-based have been developed. Sequence alignment-based methods are tedious and time-consuming, as they need conserved coding sequences to be identified and deduced prior to sequence alignment. Likewise, methods that rely on gene function need genes to be assessed for functional similarity. Other alignment-free methods, which are based on particular genome sequence properties, have so far been complex and not species-specific enough for classification of organisms below genus level. The present study found that the ratios of DNA trimer frequencies to chromosomal length were species-specific. Density of a trimer in a chromosomal sequence was defined as the average frequency of the trimer per 1 kbp. The species-specificity of trimer densities in chromosomes of many closely related bacteria was compared in parallel with 16S rDNA sequences in these same bacteria. The results of these comparisons indicate that trimer densities in chromosomes can be used to simply and efficiently classify the organisms below genus level.
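The trimer density defined in this abstract (average frequency of a trimer per 1 kbp of chromosome) can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the function name `trimer_densities` is hypothetical, trimers are counted in overlapping windows on a single strand, and windows containing characters other than A, C, G, or T are skipped. The abstract does not specify any of these choices.

```python
from collections import Counter


def trimer_densities(seq: str) -> dict[str, float]:
    """Return occurrences of each DNA trimer per 1 kbp of sequence.

    Assumptions of this sketch: overlapping windows, one strand only,
    and windows with ambiguous bases (e.g. N) are not counted.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= valid
    )
    length_kbp = len(seq) / 1000  # chromosome length in kbp
    return {trimer: n / length_kbp for trimer, n in counts.items()}
```

Two chromosomes could then be compared by treating their 64 densities as vectors; which distance measure to use is left open here because the abstract does not name one.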
Affiliation(s)
- Thi Huyen Phan
- Department of Biotechnology, Ho Chi Minh City University of Technology, VNU-HCM, Ward 14, District 10, Ho Chi Minh City, Vietnam.
|
14
|
Unsupervised two-way clustering of metagenomic sequences. J Biomed Biotechnol 2012; 2012:153647. [PMID: 22577288 PMCID: PMC3336163 DOI: 10.1155/2012/153647] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/15/2011] [Accepted: 01/26/2012] [Indexed: 11/30/2022]
Abstract
A major challenge facing metagenomics is the development of tools for the characterization of functional and taxonomic content of vast amounts of short metagenome reads. The efficacy of clustering methods depends on the number of reads in the dataset, the read length and relative abundances of source genomes in the microbial community. In this paper, we formulate an unsupervised naive Bayes multispecies, multidimensional mixture model for reads from a metagenome. We use the proposed model to cluster metagenomic reads by their species of origin and to characterize the abundance of each species. We model the distribution of word counts along a genome as a Gaussian for shorter, frequent words and as a Poisson for longer words that are rare. We employ either a mixture of Gaussians or mixture of Poissons to model reads within each bin. Further, we handle the high-dimensionality and sparsity associated with the data, by grouping the set of words comprising the reads, resulting in a two-way mixture model. Finally, we demonstrate the accuracy and applicability of this method on simulated and real metagenomes. Our method can accurately cluster reads as short as 100 bps and is robust to varying abundances, divergences and read lengths.
|
15
|
Dass JFP, Sudandiradoss C. Insight into pattern of codon biasness and nucleotide base usage in serotonin receptor gene family from different mammalian species. Gene 2012; 503:92-100. [PMID: 22480817 DOI: 10.1016/j.gene.2012.03.057] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 06/08/2011] [Revised: 03/14/2012] [Accepted: 03/17/2012] [Indexed: 11/16/2022]
Abstract
5-HT (5-hydroxytryptamine) or serotonin receptors are found both in the central and peripheral nervous systems and in non-neuronal tissues. In the animal and human nervous system, serotonin produces various functional effects through a variety of membrane-bound receptors. In this study, we focus on the 5-HT receptor family from different mammals and examine the factors that account for codon and nucleotide usage variation. A total of 110 homologous coding sequences from 11 different mammalian species were analyzed using relative synonymous codon usage (RSCU), correspondence analysis (COA) and hierarchical cluster analysis, together with the nucleotide base usage frequency of chemically similar amino acid codons. The mean effective number of codons (ENc) value of 37.06 for 5-HT(6) shows very high codon bias within the family and may be due to high selective translational efficiency. The COA and Spearman's rank correlation reveal that nucleotide compositional mutation bias is the major factor influencing codon usage in serotonin receptor genes. The hierarchical cluster analysis suggests that gene function is another dominant factor that affects the codon usage bias, while species is a minor factor. Nucleotide base usage, reported using the Goldman, Engelman, Steitz (GES) scale, reveals the presence of high uracil (>45%) content at functionally important hydrophobic regions. Our in silico approach will help further investigations into the evolution, structure, function and gene expression of the 5-HT receptor family, whose members are potential antipsychotic drug targets.
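Relative synonymous codon usage (RSCU), one of the measures named in this abstract, is commonly defined as the observed count of a codon divided by the mean count of all codons that encode the same amino acid. The sketch below follows that common definition using the standard genetic code; it is not the authors' pipeline, and the names `CODON_TABLE` and `rscu`, as well as the choice to skip stop codons and amino acids absent from the sequence, are assumptions of this example.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stops).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def rscu(cds: str) -> dict[str, float]:
    """Return RSCU per codon for an in-frame DNA coding sequence.

    RSCU = observed codon count / mean count over its synonymous family.
    Stop codons and amino acids that never occur are omitted.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families by encoded amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue
        mean = total / len(codons)
        for c in codons:
            values[c] = counts[c] / mean
    return values
```

A value of 1.0 means a codon is used exactly as often as expected under uniform synonymous usage; values above or below 1.0 indicate over- or under-representation within its family.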
Affiliation(s)
- J Febin Prabhu Dass
- School of Biosciences and Technology, VIT University, Vellore, Tamil Nadu State, India
|
16
|
Fritsche M, Li S, Heermann DW, Wiggins PA. A model for Escherichia coli chromosome packaging supports transcription factor-induced DNA domain formation. Nucleic Acids Res 2012; 40:972-80. [PMID: 21976727 PMCID: PMC3273793 DOI: 10.1093/nar/gkr779] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 06/23/2011] [Revised: 09/05/2011] [Accepted: 09/05/2011] [Indexed: 01/07/2023]
Abstract
What physical mechanism leads to organization of a highly condensed and confined circular chromosome? Computational modeling shows that confinement-induced organization is able to overcome the chromosome's propensity to mix by the formation of topological domains. The experimentally observed high precision of separate subcellular positioning of loci (located on different chromosomal domains) in Escherichia coli naturally emerges as a result of entropic demixing of such chromosomal loops. We propose one possible mechanism for organizing these domains: regulatory control defined by the underlying E. coli gene regulatory network requires the colocalization of transcription factor genes and target genes. Investigating this assumption, we find the DNA chain to self-organize into several topologically distinguishable domains where the interplay between the entropic repulsion of chromosomal loops and their compression due to the confining geometry induces an effective nucleoid filament-type of structure. Thus, we propose that the physical structure of the chromosome is a direct result of regulatory interactions. To reproduce the observed precise ordering of the chromosome, we estimate that the domain sizes are distributed between 10 and 700 kb, in agreement with the size of topological domains identified in the context of DNA supercoiling.
Affiliation(s)
- Miriam Fritsche
- Institute for Theoretical Physics, University of Heidelberg, Philosophenweg 19, D-69120 Heidelberg, Germany.
|
17
|
Waegeman H, Beauprez J, Moens H, Maertens J, De Mey M, Foulquié-Moreno MR, Heijnen JJ, Charlier D, Soetaert W. Effect of iclR and arcA knockouts on biomass formation and metabolic fluxes in Escherichia coli K12 and its implications on understanding the metabolism of Escherichia coli BL21 (DE3). BMC Microbiol 2011; 11:70. [PMID: 21481254 PMCID: PMC3094197 DOI: 10.1186/1471-2180-11-70] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 09/13/2010] [Accepted: 04/11/2011] [Indexed: 11/10/2022]
Abstract
Background: Gene expression is regulated through a complex interplay of different transcription factors (TFs) which can enhance or inhibit gene transcription. ArcA is a global regulator that regulates genes involved in different metabolic pathways, while IclR, as a local regulator, controls the transcription of the glyoxylate pathway genes of the aceBAK operon. This study investigates the physiological and metabolic consequences of arcA and iclR deletions on E. coli K12 MG1655 under glucose abundant and limiting conditions and compares the results with the metabolic characteristics of E. coli BL21 (DE3). Results: The deletion of arcA and iclR results in an increase in the biomass yield both under glucose abundant and limiting conditions, approaching the maximum theoretical yield of 0.65 c-mole/c-mole glucose under glucose abundant conditions. This can be explained by the lower flux through several CO2 producing pathways in the E. coli K12 ΔarcAΔiclR double knockout strain. Due to iclR gene deletion, the glyoxylate pathway is activated, resulting in a redirection of 30% of the isocitrate molecules directly to succinate and malate without CO2 production. Furthermore, a higher flux at the entrance of the TCA was noticed due to arcA gene deletion, resulting in a reduced production of acetate and less carbon loss. Under glucose limiting conditions the flux through the glyoxylate pathway is further increased in the ΔiclR knockout strain, but this effect was not observed in the double knockout strain. Also, a striking correlation between the glyoxylate flux data and the isocitrate lyase activity was observed for almost all strains and under both growth conditions, illustrating the transcriptional control of this pathway. Finally, similar central metabolic fluxes were observed in E. coli K12 ΔarcAΔiclR compared to the industrially relevant E. coli BL21 (DE3), especially with respect to the pentose pathway, the glyoxylate pathway, and the TCA fluxes.
In addition, a comparison of the genome sequences of the two strains showed that BL21 possesses two mutations in the promoter region of iclR and that rare codons are present in arcA, implying a lower tRNA acceptance. Both phenomena presumably result in a reduced ArcA and IclR synthesis in BL21, which contributes to the similar physiology as observed in E. coli K12 ΔarcAΔiclR. Conclusions: The deletion of arcA results in a decrease of repression on transcription of TCA cycle genes under glucose abundant conditions, without significantly affecting the glyoxylate pathway activity. IclR clearly represses transcription of glyoxylate pathway genes under glucose abundance, a condition in which Crp activation is absent. Under glucose limitation, Crp is responsible for the high glyoxylate flux, but IclR still represses transcription. Finally, in E. coli BL21 (DE3), ArcA and IclR are poorly expressed, explaining the similar fluxes observed compared to the ΔarcAΔiclR strain.
Affiliation(s)
- Hendrik Waegeman
- Centre of Expertise-Industrial Biotechnology and Biocatalysis, Department of Biochemical and Microbial Technology, Ghent University, Coupure Links 653, B-9000 Ghent, Belgium.
18
Abstract
Metagenomics has revolutionized microbiology by paving the way for a cultivation-independent assessment and exploitation of microbial communities present in complex ecosystems. Metagenomics comprising construction and screening of metagenomic DNA libraries has proven to be a powerful tool to isolate new enzymes and drugs of industrial importance. So far, the majority of the metagenomically exploited habitats comprised temperate environments, such as soil and marine environments. Recently, metagenomes of extreme environments have also been used as sources of novel biocatalysts. The employment of next-generation sequencing techniques for metagenomics resulted in the generation of large sequence data sets derived from various environments, such as soil, the human body, and ocean water. Analyses of these data sets opened a window into the enormous taxonomic and functional diversity of environmental microbial communities. To assess the functional dynamics of microbial communities, metatranscriptomics and metaproteomics have been developed. The combination of DNA-based, mRNA-based, and protein-based analyses of microbial communities present in different environments is a way to elucidate the compositions, functions, and interactions of microbial communities and to link these to environmental processes.
20
Poptsova MS, Larionov SA, Ryadchenko EV, Rybalko SD, Zakharov IA, Loskutov A. Hidden chromosome symmetry: in silico transformation reveals symmetry in 2D DNA walk trajectories of 671 chromosomes. PLoS One 2009; 4:e6396. [PMID: 19636424 PMCID: PMC2712679 DOI: 10.1371/journal.pone.0006396] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/07/2009] [Accepted: 06/23/2009] [Indexed: 11/18/2022] Open
Abstract
Maps of 2D DNA walk of 671 examined chromosomes show composition complexity change from symmetrical half-turn in bacteria to pseudo-random trajectories in archaea, fungi and humans. In silico transformation of gene order and strand position returns most of the analyzed chromosomes to a symmetrical bacterial-like state with one transition point. The transformed chromosomal sequences also reveal remarkable segmental compositional symmetry between regions from different strands located equidistantly from the transition point. Despite extensive chromosome rearrangement the relation of gene numbers on opposite strands for chromosomes of different taxa varies in narrow limits around unity with Pearson coefficient r = 0.98. Similar relation is observed for total genes' length (r = 0.86) and cumulative GC (r = 0.95) and AT (r = 0.97) skews. This is also true for human coding sequences (CDS), which comprise only several percent of the entire chromosome length. We found that frequency distributions of the length of gene clusters, continuously located on the same strand, have close values for both strands. Eukaryotic gene distribution is believed to be non-random. Contribution of different subsystems to the noted symmetries and distributions, and evolutionary aspects of symmetry are discussed.
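The two quantities this abstract relies on, a 2D DNA walk and a cumulative GC skew, can be illustrated with a minimal Python sketch. The step convention used here (G up, C down, T right, A left) and the function names `dna_walk` and `cumulative_gc_skew` are assumptions made for illustration; the abstract does not state which convention the authors used.

```python
from itertools import accumulate

# Assumed step convention (not taken from the paper): G up, C down, T right, A left.
STEPS = {"A": (-1, 0), "T": (1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk(seq):
    """Return the list of (x, y) points visited by a 2D walk over the sequence."""
    x = y = 0
    path = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS.get(base, (0, 0))  # ambiguous bases (e.g. N) do not move
        x += dx
        y += dy
        path.append((x, y))
    return path


def cumulative_gc_skew(seq):
    """Running count of G minus C along the sequence."""
    return list(accumulate((b == "G") - (b == "C") for b in seq.upper()))


if __name__ == "__main__":
    print(dna_walk("GATC"))
    print(cumulative_gc_skew("GGCA"))
```

Under this convention the vertical coordinate of the walk equals the cumulative GC skew, and the horizontal coordinate is the cumulative count of T minus A.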
Affiliation(s)
- Maria S Poptsova
- University of Connecticut, Storrs, Connecticut, United States of America.
21
Barbe V, Cruveiller S, Kunst F, Lenoble P, Meurice G, Sekowska A, Vallenet D, Wang T, Moszer I, Médigue C, Danchin A. From a consortium sequence to a unified sequence: the Bacillus subtilis 168 reference genome a decade later. Microbiology (Reading, England) 2009; 155:1758-1775. [PMID: 19383706 PMCID: PMC2885750 DOI: 10.1099/mic.0.027839-0] [Citation(s) in RCA: 257] [Impact Index Per Article: 17.1] [Received: 01/26/2009] [Revised: 02/25/2009] [Accepted: 02/25/2009] [Indexed: 11/18/2022]
Abstract
Comparative genomics is the cornerstone of identification of gene functions. The immense number of living organisms precludes experimental identification of functions except in a handful of model organisms. The bacterial domain is split into large branches, among which the Firmicutes occupy a considerable space. Bacillus subtilis has been the model of Firmicutes for decades and its genome has been a reference for more than 10 years. Sequencing the genome involved more than 30 laboratories, with different areas of expertise, in an attempt to make the most of the experimental information that could be associated with the sequence. This had the expected drawback that the sequencing expertise was quite varied among the groups involved, especially at a time when sequencing genomes was extremely hard work. The recent development of very efficient, fast and accurate sequencing techniques, in parallel with the development of high-level annotation platforms, motivated the present resequencing work. The updated sequence has been reannotated in agreement with the UniProt protein knowledge base, keeping in perspective the split between the paleome (genes necessary for sustaining and perpetuating life) and the cenome (genes required for occupation of a niche, suggesting here that B. subtilis is an epiphyte). This should permit investigators to make reliable inferences to prepare validation experiments in a variety of domains of bacterial growth and development as well as build up accurate phylogenies.
Affiliation(s)
- Valérie Barbe
- CEA, Institut de Génomique, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Stéphane Cruveiller
- CEA, Institut de Génomique, Laboratoire de Génomique Comparative/CNRS UMR8030, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Frank Kunst
- CEA, Institut de Génomique, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Patricia Lenoble
- CEA, Institut de Génomique, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Guillaume Meurice
- Institut Pasteur, Intégration et Analyse Génomiques, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- Agnieszka Sekowska
- Institut Pasteur, Génétique des Génomes Bactériens/CNRS URA2171, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- David Vallenet
- CEA, Institut de Génomique, Laboratoire de Génomique Comparative/CNRS UMR8030, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Tingzhang Wang
- Institut Pasteur, Génétique des Génomes Bactériens/CNRS URA2171, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- Ivan Moszer
- Institut Pasteur, Intégration et Analyse Génomiques, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- Claudine Médigue
- CEA, Institut de Génomique, Laboratoire de Génomique Comparative/CNRS UMR8030, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Antoine Danchin
- Institut Pasteur, Génétique des Génomes Bactériens/CNRS URA2171, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
22
Chapter 1 A Phylogenetic View of Bacterial Ribonucleases. Progress in Molecular Biology and Translational Science 2009; 85:1-41. [DOI: 10.1016/s0079-6603(08)00801-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 01/06/2023]
23
Danchin A. Bacteria as computers making computers. FEMS Microbiol Rev 2009; 33:3-26. [PMID: 19016882 PMCID: PMC2704931 DOI: 10.1111/j.1574-6976.2008.00137.x] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Received: 05/15/2008] [Revised: 09/20/2008] [Accepted: 09/21/2008] [Indexed: 12/13/2022] Open
Abstract
Various efforts to integrate biological knowledge into networks of interactions have produced a lively microbial systems biology. Putting molecular biology and computer sciences in perspective, we review another trend in systems biology, in which recursivity and information replace the usual concepts of differential equations, feedback and feedforward loops and the like. Noting that the processes of gene expression separate the genome from the cell machinery, we analyse the role of the separation between machine and program in computers. However, computers do not make computers. For cells to make cells requires a specific organization of the genetic program, which we investigate using available knowledge. Microbial genomes are organized into a paleome (the name emphasizes the role of the corresponding functions from the time of the origin of life), comprising a constructor and a replicator, and a cenome (emphasizing community-relevant genes), made up of genes that permit life in a particular context. The cell duplication process supposes rejuvenation of the machine and replication of the program. The paleome also possesses genes that enable information to accumulate in a ratchet-like process down the generations. The systems biology must include the dynamics of information creation in its future developments.
Affiliation(s)
- Antoine Danchin
- Génétique des Génomes Bactériens, Institut Pasteur, Paris, France.
24
Gaudriault S, Pages S, Lanois A, Laroui C, Teyssier C, Jumas-Bilak E, Givaudan A. Plastic architecture of bacterial genome revealed by comparative genomics of Photorhabdus variants. Genome Biol 2008; 9:R117. [PMID: 18647395 PMCID: PMC2530875 DOI: 10.1186/gb-2008-9-7-r117] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 04/15/2008] [Revised: 06/12/2008] [Accepted: 07/22/2008] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND The phenotypic consequences of large genomic architecture modifications within a clonal bacterial population are rarely evaluated because of the difficulties associated with using molecular approaches in a mixed population. Bacterial variants frequently arise among Photorhabdus luminescens, a nematode-symbiotic and insect-pathogenic bacterium. We therefore studied genome plasticity within Photorhabdus variants. RESULTS We used a combination of macrorestriction and DNA microarray experiments to perform a comparative genomic study of different P. luminescens TT01 variants. Prolonged culturing of TT01 strain and a genomic variant, collected from the laboratory-maintained symbiotic nematode, generated bacterial lineages composed of primary and secondary phenotypic variants and colonial variants. The primary phenotypic variants exhibit several characteristics that are absent from the secondary forms. We identify substantial plasticity of the genome architecture of some variants, mediated mainly by deletions in the 'flexible' gene pool of the TT01 reference genome and also by genomic amplification. We show that the primary or secondary phenotypic variant status is independent from global genomic architecture and that the bacterial lineages are genomic lineages. We focused on two unusual genomic changes: a deletion at a new recombination hotspot composed of long approximate repeats; and a 275 kilobase single block duplication belonging to a new class of genomic duplications. CONCLUSION Our findings demonstrate that major genomic variations occur in Photorhabdus clonal populations. The phenotypic consequences of these genomic changes are cryptic. This study provides insight into the field of bacterial genome architecture and further elucidates the role played by clonal genomic variation in bacterial genome evolution.
Affiliation(s)
- Sophie Gaudriault
- INRA, UMR 1133, Laboratoire EMIP, Place Eugène Bataillon, F-34095 Montpellier, France
- Université Montpellier 2, UMR 1133, Laboratoire EMIP, Place Eugène Bataillon, F-34095 Montpellier, France
- Sylvie Pages
- INRA, UMR 1133, Laboratoire EMIP, Place Eugène Bataillon, F-34095 Montpellier, France
- Université Montpellier 2, UMR 1133, Laboratoire EMIP, Place Eugène Bataillon, F-34095 Montpellier, France
- Anne Lanois
- INRA, UMR 1133, Laboratoire EMIP, Place Eugène Bataillon, F-34095 Montpellier, France
- Université Montpellier 2, UMR 1133, Laboratoire EMIP, Place Eugène Bataillon, F-34095 Montpellier, France
- Christine Laroui
- INRA, UMR 1133, Laboratoire EMIP, Place Eugène Bataillon, F-34095 Montpellier, France
- Université Montpellier 2, UMR 1133, Laboratoire EMIP, Place Eugène Bataillon, F-34095 Montpellier, France
- Corinne Teyssier
- Université Montpellier 1, EA 3755, Laboratoire de Bactériologie-Virologie, 15, Avenue Charles Flahault, BP 14491, F-34060 Montpellier Cedex 5, France
- Estelle Jumas-Bilak
- Université Montpellier 1, EA 3755, Laboratoire de Bactériologie-Virologie, 15, Avenue Charles Flahault, BP 14491, F-34060 Montpellier Cedex 5, France
- Alain Givaudan
- INRA, UMR 1133, Laboratoire EMIP, Place Eugène Bataillon, F-34095 Montpellier, France
- Université Montpellier 2, UMR 1133, Laboratoire EMIP, Place Eugène Bataillon, F-34095 Montpellier, France
25
Kloster M, Tang C. SCUMBLE: a method for systematic and accurate detection of codon usage bias by maximum likelihood estimation. Nucleic Acids Res 2008; 36:3819-27. [PMID: 18495752 PMCID: PMC2441815 DOI: 10.1093/nar/gkn288] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 11/27/2007] [Revised: 04/22/2008] [Accepted: 04/25/2008] [Indexed: 11/23/2022] Open
Abstract
The genetic code is degenerate: most amino acids can be encoded by two to as many as six different codons. The synonymous codons are not used with equal frequency: not only are some codons favored over others, but also their usage can vary significantly from species to species and between different genes in the same organism. Known causes of codon bias include differences in mutation rates as well as selection pressure related to the expression level of a gene, but the standard analysis methods can account for only a fraction of the observed codon usage variation. We here introduce an explicit model of codon usage bias, inspired by statistical physics. Combining this model with a maximum likelihood approach, we are able to clearly identify different sources of bias in various genomes. We have applied the algorithm to Saccharomyces cerevisiae as well as 325 prokaryote genomes, and in most cases our model explains essentially all observed variance.
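To make "synonymous codons are not used with equal frequency" concrete, the following Python sketch computes relative synonymous codon usage (RSCU) for a coding sequence. This is a simple descriptive statistic and is not the SCUMBLE maximum likelihood model described in the abstract; the function name `rscu` and the choice of the standard genetic code are assumptions made for illustration.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Relative synonymous codon usage: observed count divided by the count
    expected if all synonymous codons of that amino acid were used equally."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    result = {}
    for aa in set(CODON_TABLE.values()):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence
        expected = total / len(family)
        for codon in family:
            result[codon] = counts[codon] / expected
    return result


if __name__ == "__main__":
    # Hypothetical toy sequence: four leucine codons, three of them CTG.
    for codon, value in sorted(rscu("CTGCTGCTGTTA").items()):
        print(codon, round(value, 2))
```

An RSCU value of 1 means a codon is used as often as expected under uniform synonymous usage; values above 1 indicate a preferred codon.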
Affiliation(s)
- Morten Kloster
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, California 94158, USA and Center for Theoretical Biology, Peking University, Beijing 100871, China
- Chao Tang
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, California 94158, USA and Center for Theoretical Biology, Peking University, Beijing 100871, China
26
Riley M, Staley JT, Danchin A, Wang TZ, Brettin TS, Hauser LJ, Land ML, Thompson LS. Genomics of an extreme psychrophile, Psychromonas ingrahamii. BMC Genomics 2008; 9:210. [PMID: 18460197 PMCID: PMC2405808 DOI: 10.1186/1471-2164-9-210] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.6] [Received: 09/03/2007] [Accepted: 05/06/2008] [Indexed: 11/10/2022] Open
Abstract
Background The genome sequence of the sea-ice bacterium Psychromonas ingrahamii 37, which grows exponentially at −12 °C, may reveal features that help to explain how this extreme psychrophile is able to grow at such low temperatures. Determination of the whole genome sequence allows comparison with genes of other psychrophiles and mesophiles. Results Correspondence analysis of the composition of all P. ingrahamii proteins showed that (1) there are 6 classes of proteins, at least one more than other bacteria, (2) integral inner membrane proteins are not sharply separated from bulk proteins suggesting that, overall, they may have a lower hydrophobic character, (3) there is strong opposition between asparagine and the oxygen-sensitive amino acids methionine, arginine, cysteine and histidine, and (4) one of the previously unseen clusters of proteins has a high proportion of "orphan" hypothetical proteins, raising the possibility these are cold-specific proteins. Based on annotation of proteins by sequence similarity, (1) P. ingrahamii has a large number (61) of regulators of cyclic GDP, suggesting that this bacterium produces an extracellular polysaccharide that may help sequester water or lower the freezing point in the vicinity of the cell. (2) P. ingrahamii has genes for production of the osmolyte, betaine choline, which may balance the osmotic pressure as sea ice freezes. (3) P. ingrahamii has a large number (11) of three-subunit TRAP systems that may play an important role in the transport of nutrients into the cell at low temperatures. (4) Chaperones and stress proteins may play a critical role in transforming nascent polypeptides into 3-dimensional configurations that permit low temperature growth. (5) Metabolic properties of P. ingrahamii were deduced. Finally, a few small sets of proteins of unknown function which may play a role in psychrophily have been singled out as worthy of future study. 
Conclusion The results of this genomic analysis provide a springboard for further investigations into mechanisms of psychrophily. Focus on the role of asparagine excess in proteins, targeted phenotypic characterizations and gene expression investigations are needed to ascertain if and how the organism regulates various proteins in response to growth at lower temperatures.
Affiliation(s)
- Monica Riley
- Bay Paul Center, Marine Biological Laboratory, Woods Hole, MA 02543, USA.
27
Chan CKK, Hsu AL, Halgamuge SK, Tang SL. Binning sequences using very sparse labels within a metagenome. BMC Bioinformatics 2008; 9:215. [PMID: 18442374 PMCID: PMC2383919 DOI: 10.1186/1471-2105-9-215] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 10/23/2007] [Accepted: 04/28/2008] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing sequence similarity or sequence composition with already-sequenced genomes that are still far from comprehensive. We propose a semi-supervised seeding method for binning that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity. RESULTS The proposed seeding method is implemented on an unsupervised Growing Self-Organising Map (GSOM), and called Seeded GSOM (S-GSOM). We compared it with four well-known semi-supervised learning methods in a preliminary test, separating random-length prokaryotic sequence fragments sampled from the NCBI genome database. We identified the flanking sequences of the highly conserved 16S rRNA as suitable seeds that could be used to group the sequence fragments according to their species. S-GSOM showed superior performance compared to the semi-supervised methods tested. Additionally, S-GSOM may also be used to visually identify some species that do not have seeds. The proposed method was then applied to simulated metagenomic datasets using two different confidence threshold settings and compared with PhyloPythia, k-mer and BLAST. At the reference taxonomic level Order, S-GSOM outperformed all k-mer and BLAST results and showed comparable results with PhyloPythia for each of the corresponding confidence settings, where S-GSOM performed better than PhyloPythia in the ≥ 10 reads datasets and comparable in the ≥ 8 kb benchmark tests. CONCLUSION In the task of binning using semi-supervised learning methods, results indicate S-GSOM to be the best of the methods tested. 
Most importantly, the proposed method does not require knowledge from known genomes and uses only very few labels (one per species is sufficient in most cases), which are extracted from the metagenome itself. These advantages make it a very attractive binning method. S-GSOM outperformed the binning methods that depend on already-sequenced genomes, and compares well to the current most advanced binning method, PhyloPythia.
28
Abstract
Phages have highly compact genomes with sizes reflecting their capacity to exploit the host resources. Here, we investigate the reasons for tRNAs being the only translation-associated genes frequently found in phages. We were able to unravel the selective processes shaping the tRNA distribution in phages by analyzing their genomes and those of their hosts. We found ample evidence against tRNAs being selected to facilitate phage integration in the prokaryotic chromosomes. Conversely, there is a significant association between tRNA distribution and codon usage. We support this observation by introducing a master equation model, where tRNAs are randomly gained from their hosts and then lost either neutrally or according to a set of different selection mechanisms. Those tRNAs present in phages tend to correspond to codons that are simultaneously highly used by the phage genes, while rare in the host genome. Accordingly, we propose that a selective recruitment of tRNAs compensates for the compositional differences between the phage and the host genomes. To further understand the importance of these results in phage biology, we analyzed the differences between temperate and virulent phages. Virulent phages contain more tRNAs than temperate ones, higher codon usage biases, and more important compositional differences with respect to the host genome. These differences are thus in perfect agreement with the results of our master equation model and further suggest that tRNA acquisition may contribute to higher virulence. Thus, even though phages use most of the cell's translation machinery, they can complement it with their own genetic information to attain higher fitness. These results suggest that similar selection pressures may act upon other cellular essential genes that are being found in the recently uncovered large viruses.
Affiliation(s)
- Marc Bailly-Bechet
- CNRS URA 2171, Institut Pasteur, Unité Génétique in silico, F-75724 Paris Cedex 15, France.
29
McMurdie PJ, Behrens SF, Holmes S, Spormann AM. Unusual codon bias in vinyl chloride reductase genes of Dehalococcoides species. Appl Environ Microbiol 2007; 73:2744-7. [PMID: 17308190 PMCID: PMC1855607 DOI: 10.1128/aem.02768-06] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022] Open
Abstract
Vinyl chloride reductases (VC-RDase) are the key enzymes for complete microbial reductive dehalogenation of chloroethenes, including the groundwater pollutants tetrachloroethene and trichloroethene. Analysis of the codon usage of the VC-RDase genes vcrA and bvcA showed that these genes are highly unusual and are characterized by a low G+C fraction at the third position. The third position of codons in VC-RDase genes is biased toward the nucleotide T, even though available Dehalococcoides genome sequences indicate the absence of any tRNAs matching codons that end in T. The comparatively high level of abnormality in the codon usage of VC-RDase genes suggests an evolutionary history that is different from that of most other Dehalococcoides genes.
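The third-position statistics mentioned in this abstract (G+C fraction at the third codon position and the bias toward T) can be computed with a short Python sketch. The function names `third_position_fractions` and `gc3` are hypothetical, and the sketch assumes the input is an in-frame coding sequence; it is not the analysis pipeline used by the authors.

```python
def third_position_fractions(cds):
    """Fraction of each nucleotide at the third position of every complete codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    n = len(thirds)
    return {base: thirds.count(base) / n for base in "ACGT"}


def gc3(cds):
    """G+C fraction at the third codon position."""
    fractions = third_position_fractions(cds)
    return fractions["G"] + fractions["C"]


if __name__ == "__main__":
    # Hypothetical toy sequence with third positions T, T, C.
    toy = "GCTGCTGCC"
    print(third_position_fractions(toy)["T"], gc3(toy))
```

A gene with a low `gc3` value and a high T fraction relative to the rest of its genome would show the kind of pattern the abstract describes for vcrA and bvcA.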
Affiliation(s)
- Paul J McMurdie
- Department of Civil and Environmental Engineering, James H Clark Center East Wing, E250A, Stanford University, Stanford, CA 94305-5429, USA
30
Charles H, Calevro F, Vinuelas J, Fayard JM, Rahbe Y. Codon usage bias and tRNA over-expression in Buchnera aphidicola after aromatic amino acid nutritional stress on its host Acyrthosiphon pisum. Nucleic Acids Res 2006; 34:4583-92. [PMID: 16963497 PMCID: PMC1636365 DOI: 10.1093/nar/gkl597] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
Codon usage bias and relative abundances of tRNA isoacceptors were analysed in the obligate intracellular symbiotic bacterium, Buchnera aphidicola from the aphid Acyrthosiphon pisum, using a dedicated 35mer oligonucleotide microarray. Buchnera is archetypal of organisms living with minimal metabolic requirements and presents a reduced genome with high-evolutionary rate. Codon usage in Buchnera has been overcome by the high mutational bias towards AT bases. However, several lines of evidence for codon usage selection are given here. A significant correlation was found between tRNA relative abundances and codon composition of Buchnera genes. A significant codon usage bias was found for the choice of rare codons in Buchnera: C-ending codons are preferred in highly expressed genes, whereas G-ending codons are avoided. This bias is not explained by GC skew in the bacteria and might correspond to a selection for perfect matching between codon-anticodon pairs for some essential amino acids in Buchnera proteins. Nutritional stress applied to the aphid host induced a significant overexpression of most of the tRNA isoacceptors in bacteria. Although molecular regulation of the tRNA operons in Buchnera was not investigated, a correlation between relative expression levels and organization in transcription units was found in the genome of Buchnera.
Affiliation(s)
- Hubert Charles
- Laboratoire de Biologie Fonctionnelle Insectes et Interactions, UMR INRA/INSA de Lyon, 203 Bâtiment Louis Pasteur, 69621 Villeurbanne Cedex, France.
31
Pascal G, Médigue C, Danchin A. Persistent biases in the amino acid composition of prokaryotic proteins. Bioessays 2006; 28:726-38. [PMID: 16850406 DOI: 10.1002/bies.20431] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Abstract
Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the proteins' amino acid distribution. Integral Inner Membrane Proteins always form an individual cluster, which can then be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. Another bias is also ubiquitous: the amino acid composition is driven by the G + C content of the first codon position. An unexpected bias is driven, in many proteomes, by the AAN box of the genetic code, suggesting some functional biochemical relationship between asparagine and lysine. Less-significant biases are driven by the rare amino acids, cysteine and tryptophan. Some allow identification of species-specific functions or localisation such as surface or exported proteins. Errors in genome annotations are also revealed by correspondence analysis, making it useful for quality control and correction.
Affiliation(s)
- Géraldine Pascal
- Genoscope/CNRS UMR 8030, Atelier de Génomique Comparative, Evry, France