51
Wei Z, Wang L, Zhang M, Xuan J, Wang Y, Liu B, Shao L, Li J, Zeng Z, Li T, Liu J, Wang T, Zhang M, Qin S, Xu Y, Feng G, He L, Xing Q. A pharmacogenetic study of risperidone on histamine H3 receptor gene (HRH3) in Chinese Han schizophrenia patients. J Psychopharmacol 2012; 26:813-8. [PMID: 21652606 DOI: 10.1177/0269881111405358] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
Abstract
Evidence suggests that the human histamine H3 receptor (HRH3) may be involved in the pharmacodynamics of risperidone and may influence its clinical efficacy. More information on the pharmacogenetics of this receptor may therefore be useful in developing individualized therapy. However, to our knowledge, no study has been reported in this area. The aim of this investigation was to clarify whether polymorphisms of the H3 receptor gene affect risperidone efficacy. We genotyped tag single nucleotide polymorphisms (SNPs) of the HRH3 gene (rs3787429 and rs3787430) and analyzed their association with the reduction in Brief Psychiatric Rating Scale (BPRS) score in Chinese Han schizophrenia patients (N = 129) after eight weeks of risperidone monotherapy. The confounding effects of non-genetic factors were estimated, and any significant factor was then included as a covariate for adjustment in the statistical analysis. Baseline symptom score was the only significant confounder and was therefore used as the covariate. After adjustment, a significant association of HRH3 with antipsychotic efficacy was detected (for rs3787429, p = 0.013 and 0.087 after 4 and 8 weeks of treatment, respectively; for rs3787430, p = 0.024 and 0.010 after 4 and 8 weeks of treatment, respectively) and survived conservative Bonferroni correction. Our results demonstrate that polymorphism of the HRH3 gene may be a potential genetic marker for predicting the therapeutic effect of risperidone, and suggest novel pharmacological links between HRH3 and risperidone. Further studies with larger samples and different ethnic populations are warranted to confirm our results.
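The abstract reports that the associations survived Bonferroni correction. As a reminder of what that correction does, here is a minimal Python sketch: each raw p-value is multiplied by the number of tests m and capped at 1. The p-values in the usage line are hypothetical, and the number of tests the authors corrected for is not stated in the abstract.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust a list of raw p-values.

    Returns (adjusted, significant): adjusted[i] = min(p[i] * m, 1.0), where m is
    the number of tests, and significant[i] is True when adjusted[i] <= alpha.
    """
    m = len(p_values)
    if m == 0:
        return [], []
    adjusted = [min(p * m, 1.0) for p in p_values]
    significant = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical raw p-values for two tests (illustrative, not the study's data).
adjusted, significant = bonferroni([0.01, 0.03])
print(adjusted, significant)
```

Equivalently, each raw p-value can be compared with alpha / m.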
Affiliation(s)
- Zhiyun Wei
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China

52
Raiford DW, Heizer EM, Miller RV, Doom TE, Raymer ML, Krane DE. Metabolic and translational efficiency in microbial organisms. J Mol Evol 2012; 74:206-16. [PMID: 22538926 DOI: 10.1007/s00239-012-9500-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 07/29/2011] [Accepted: 04/05/2012] [Indexed: 11/25/2022]
Abstract
Metabolic efficiency, as a selective force shaping proteomes, has been shown to exist in Escherichia coli and Bacillus subtilis and in a small number of organisms with photoautotrophic and thermophilic lifestyles. Earlier attempts at larger-scale analyses have used proxies (such as molecular weight) for biosynthetic cost, and did not consider lifestyle or auxotrophy. This study extends the analysis to all currently sequenced microbial organisms that are amenable to these analyses while using lifestyle-specific amino acid biosynthesis pathways (where possible) to determine protein production costs and compensating for auxotrophy. The tendency for highly expressed proteins (with adherence to codon usage bias as a proxy for expressivity) to use less biosynthetically expensive amino acids is taken as evidence of cost selection. A comprehensive analysis of sequenced genomes to identify those that exhibit strong translational efficiency bias (389 out of 1,700 sequenced organisms) is also presented.
Affiliation(s)
- Douglas W Raiford
- Department of Computer Science, University of Montana, Missoula, MT, USA.

53
Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, Yu J. Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics 2012; 13:43. [PMID: 22435713 PMCID: PMC3368730 DOI: 10.1186/1471-2105-13-43] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 11/29/2011] [Accepted: 03/22/2012] [Indexed: 02/07/2023]
Abstract
Background Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis. Results Here we propose a novel measure, the Codon Deviation Coefficient (CDC), that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and uses bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. Conclusions As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.
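The core idea described above is to measure how far observed codon usage departs from the usage expected from position-specific nucleotide composition, and to attach a bootstrap-based measure of uncertainty. The Python sketch below illustrates that idea under stated simplifications, and is not the published CDC formula: expected usage is the product of the base frequencies at codon positions 1, 2 and 3 (renormalised over the 61 sense codons), the distance (1 minus cosine similarity) is an assumption of this sketch, and the interval is a generic percentile bootstrap over resampled codons. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
SENSE = ["".join(c) for c in product("ACGT", repeat=3) if "".join(c) not in STOPS]
SENSE_SET = set(SENSE)


def sense_codons(seq):
    """In-frame complete codons of `seq`, keeping only the 61 sense codons."""
    seq = seq.upper()
    cods = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in cods if c in SENSE_SET]


def composition_deviation(cods):
    """1 - cosine similarity between observed and composition-expected codon usage."""
    n = len(cods)
    if n == 0:
        raise ValueError("no sense codons")
    observed = Counter(cods)
    # Base frequencies at codon positions 1, 2 and 3.
    pos = [Counter(c[k] for c in cods) for k in range(3)]
    expected = {c: pos[0][c[0]] * pos[1][c[1]] * pos[2][c[2]] / n ** 3 for c in SENSE}
    total = sum(expected.values())  # renormalise after excluding stop codons
    o = [observed[c] / n for c in SENSE]
    e = [expected[c] / total for c in SENSE]
    dot = sum(a * b for a, b in zip(o, e))
    norm = math.sqrt(sum(a * a for a in o)) * math.sqrt(sum(b * b for b in e))
    return 1.0 - dot / norm


def bootstrap_interval(cods, reps=200, level=0.95, seed=0):
    """Percentile bootstrap interval of the deviation, resampling codons with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        composition_deviation(rng.choices(cods, k=len(cods))) for _ in range(reps)
    )
    tail = (1.0 - level) / 2
    return stats[int(tail * reps)], stats[min(reps - 1, int((1 - tail) * reps))]


cods = sense_codons("ATGGCCAAATTTGGG" * 5)
print(composition_deviation(cods), bootstrap_interval(cods))
```

A sequence made of one repeated codon matches its own composition exactly and scores 0; the more the codon usage departs from what positional composition alone predicts, the closer the score moves towards 1.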
Affiliation(s)
- Zhang Zhang
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia

54
Das S, Roymondal U, Chottopadhyay B, Sahoo S. Gene expression profile of the cynobacterium synechocystis genome. Gene 2012; 497:344-52. [PMID: 22310391 DOI: 10.1016/j.gene.2012.01.023] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/29/2011] [Accepted: 01/19/2012] [Indexed: 11/26/2022]
Abstract
The expression of functional proteins plays a crucial role in modern biotechnology. The free-living cyanobacterium Synechocystis PCC 6803 is an interesting model organism for studying oxygenic photosynthesis as well as other metabolic processes. Here we analyze a gene expression profiling methodology, RCBS (the score of relative codon usage bias), to elucidate expression patterns of genes in the Synechocystis genome. To assess the predictive performance of the methodology, we propose a simple algorithm to calculate the threshold score that identifies the highly expressed genes in a genome. Analysis of differential expression of the genes of this genome reveals that most of the genes involved in photosynthesis and respiration belong to the highly expressed category. Other genes with higher predicted expression levels include ribosomal proteins, translation processing factors and many hypothetical proteins. Only 9.5% of genes are identified as highly expressed, and we observe that highly expressed genes in the Synechocystis genome often have strong compositional bias in terms of codon usage. An important application is the automatic detection of a set of impact codons: genes that are highly expressed tend to use this narrow set of preferred codons and display high codon bias. We further observe a strong correlation between RCBS and protein length, indicating natural selection in favor of shorter genes being expressed at higher levels. The better correlations of RCBS with 2D electrophoresis and microarray data for heat shock proteins, compared with the expression measure based on codon usage difference, E(g), and the codon adaptation index (CAI), indicate that the genomic expression profile available in our method can be applied meaningfully to study mRNA expression patterns, which are themselves necessary for the quantitative description of biological states.
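RCBS compares each codon's frequency in a gene with the frequency expected from that same gene's base composition at the three codon positions, then takes a geometric mean over the gene's codons. A minimal Python sketch, assuming the commonly cited form RCBS = (prod_i (1 + d_i))^(1/L) - 1 with 1 + d_i = f(xyz) / (f1(x) f2(y) f3(z)); stop-codon handling and other details of the original implementation are not reproduced, and every complete codon is kept.

```python
import math
from collections import Counter


def rcbs(seq):
    """Relative codon usage bias score (RCBS) of one in-frame coding sequence.

    For the i-th codon xyz of the gene, 1 + d_i = f(xyz) / (f1(x) * f2(y) * f3(z)),
    where f is the codon frequency and f1, f2, f3 are the base frequencies at
    codon positions 1-3, all estimated from the same gene.
    RCBS = (prod_i (1 + d_i)) ** (1 / L) - 1, with L the number of codons.
    """
    seq = seq.upper()
    cods = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    L = len(cods)
    if L == 0:
        raise ValueError("need at least one complete codon")
    freq = Counter(cods)
    pos = [Counter(c[k] for c in cods) for k in range(3)]
    log_sum = 0.0
    for c in cods:
        expected = pos[0][c[0]] * pos[1][c[1]] * pos[2][c[2]] / L ** 3
        log_sum += math.log((freq[c] / L) / expected)
    # Working in log space keeps the geometric mean stable for long genes.
    return math.exp(log_sum / L) - 1.0


print(rcbs("ATGGCC"))
```

Because both the codon frequencies and the positional base frequencies are estimated from the scored sequence itself, very short sequences can score high by construction (the two-codon sequence above scores approximately 3). This is the intrinsic short-sequence bias that Fox and Erill (entry 65 in this list) point out.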
Affiliation(s)
- Shibsankar Das
- Department of Mathematics, Uluberia College, Uluberia, Howrah, India.

55
Aoi MC, Rourke BC. Interspecific and intragenic differences in codon usage bias among vertebrate myosin heavy-chain genes. J Mol Evol 2011; 73:74-93. [PMID: 21915654 DOI: 10.1007/s00239-011-9457-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/24/2010] [Accepted: 08/19/2011] [Indexed: 01/13/2023]
Abstract
Synonymous codon usage bias is a broadly observed phenomenon in bacteria, plants, and invertebrates and may result from selection. However, the role of selective pressures in shaping codon bias is still controversial in vertebrates, particularly in mammals. The myosin heavy-chain (MyHC) gene family comprises multiple isoforms of the major force-producing contractile protein in cardiac and skeletal muscles. Slow and fast genes are tandemly arrayed on separate chromosomes, and have distinct patterns of functionality and expression in muscle. We analyze both full-length MyHC genes (~5400 bp) and a larger collection of partial sequences at the 3' end (~500 bp). The MyHC isoforms are an interesting system in which to study codon usage bias because of their length, expression, and critical importance to organismal mobility. Codon bias and GC content differ among MyHC genes with regard to functional type, isoform, and position within the gene. Codon bias even varies by isoform within a species. We find evidence in favor of both chromosomal influences on nucleotide composition and selection against nonsense errors (SANE) acting on codon usage in MyHC genes. Intragenic variation in codon bias and elongation rate is significant, with a strong trend for increasing codon bias and elongation rate towards the 3' end of the gene, although the trend depends on the degeneracy class of the codons. Therefore, patterns of codon usage in MyHC genes are consistent with models supporting SANE as a major force shaping codon usage.
Affiliation(s)
- Mikio C Aoi
- Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA

56
Retchless AC, Lawrence JG. Quantification of codon selection for comparative bacterial genomics. BMC Genomics 2011; 12:374. [PMID: 21787402 PMCID: PMC3162537 DOI: 10.1186/1471-2164-12-374] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 03/23/2011] [Accepted: 07/25/2011] [Indexed: 11/16/2022]
Abstract
Background Statistics measuring codon selection seek to compare genes by their sensitivity to selection for translational efficiency, but existing statistics lack a model for testing the significance of differences between genes. Here, we introduce a new statistic for measuring codon selection, the Adaptive Codon Enrichment (ACE). Results This statistic represents codon usage bias in terms of a probabilistic distribution, quantifying the extent to which preferred codons are over-represented in the gene of interest relative to the mean and variance that would result from stochastic sampling of codons. Expected codon frequencies are derived from the observed codon usage frequencies of a broad set of genes, such that they are likely to reflect nonselective, genome-wide influences on codon usage (e.g. mutational biases). The relative adaptiveness of synonymous codons is deduced from the frequency of codon usage in a pre-selected set of genes relative to the expected frequency. The ACE can predict both transcript abundance during rapid growth and the rate of synonymous substitutions, with accuracy comparable to or greater than that of existing metrics. We further examine how the composition of reference gene sets affects the accuracy of the statistic, and suggest methods for selecting appropriate reference sets for any genome, including bacteriophages. Finally, we demonstrate that the ACE may naturally be extended to quantify the genome-wide influence of codon selection in a manner that is sensitive to a large fraction of codons in the genome. This reveals substantial variation among genomes, correlated with tRNA gene number, even among groups of bacteria where previously proposed whole-genome measures show little variation. Conclusions The statistical framework of the ACE allows rigorous comparison of the level of codon selection acting on genes, both within a genome and between genomes.
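The ACE is described as comparing observed use of preferred codons with the mean and variance expected from stochastic sampling of codons. The Python sketch below is a simplified z-score in that spirit, not the published ACE formula: it takes a hand-chosen set of "preferred" codons and hypothetical background frequencies as inputs, whereas the ACE derives relative adaptiveness from a reference gene set. The function name and the alanine-only example data are hypothetical.

```python
import math


def preferred_codon_z(gene_codons, codon_to_aa, expected_freq, preferred):
    """z-score of the observed number of preferred codons in a gene.

    expected_freq[aa][codon] holds background (expected) frequencies of the
    synonymous codons of amino acid aa; p_aa is the share of that background
    mass falling on preferred codons. Under independent sampling the count of
    preferred codons has mean sum(p_aa) and variance sum(p_aa * (1 - p_aa)).
    """
    observed = mean = var = 0.0
    for codon in gene_codons:
        aa = codon_to_aa.get(codon)
        if aa is None or aa not in expected_freq:
            continue  # skip codons without a background model
        synonyms = expected_freq[aa]
        p = sum(f for c, f in synonyms.items() if c in preferred) / sum(synonyms.values())
        observed += 1.0 if codon in preferred else 0.0
        mean += p
        var += p * (1.0 - p)
    if var == 0:
        raise ValueError("no informative codons")
    return (observed - mean) / math.sqrt(var)


# Hypothetical example: alanine codons only, uniform background, GCT "preferred".
ALA = {"GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A"}
background = {"A": {"GCT": 0.25, "GCC": 0.25, "GCA": 0.25, "GCG": 0.25}}
print(preferred_codon_z(["GCT"] * 4, ALA, background, {"GCT"}))
```

Treating each codon as an independent Bernoulli draw is what makes the mean and variance additive across codons; a positive z means preferred codons are over-represented relative to the background model.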
Affiliation(s)
- Adam C Retchless
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA

57
Zhang J, Wang M, Liu WQ, Zhou JH, Chen HT, Ma LN, Ding YZ, Gu YX, Liu YS. Analysis of codon usage and nucleotide composition bias in polioviruses. Virol J 2011; 8:146. [PMID: 21450075 PMCID: PMC3079669 DOI: 10.1186/1743-422x-8-146] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 12/09/2010] [Accepted: 03/30/2011] [Indexed: 12/15/2022]
Abstract
Background Poliovirus, the causative agent of poliomyelitis, is a human enterovirus, a member of the family Picornaviridae, and among the most rapidly evolving viruses known. Analysis of codon usage can reveal much about the molecular evolution of viruses. However, little information about the synonymous codon usage pattern of poliovirus genomes has been acquired to date. Methods The relative synonymous codon usage (RSCU) values, effective number of codons (ENC) values, nucleotide contents and dinucleotide frequencies were investigated, and a comparative analysis of codon usage patterns was performed for the open reading frames (ORFs) of 48 poliovirus isolates, including 31 of genotype 1, 13 of genotype 2 and 4 of genotype 3. Results The overall extent of codon usage bias in the poliovirus samples is low (mean ENC = 53.754 > 40). The general correlation between base composition and codon usage bias suggests that mutational pressure rather than natural selection is the main factor determining codon usage bias in these polioviruses. Based on the RSCU data, there was significant variation in codon usage bias among the three genotypes. Geographic factors also have some effect on the codon usage pattern (in genotype 1 polioviruses). No significant effect of gene length, or of virus type (vaccine-derived polioviruses (VDPVs), wild viruses and live attenuated viruses), on the variation of synonymous codon usage in the virus genes was observed. The relative abundance of the dinucleotide CpG in the ORFs of polioviruses is far below expected values, especially in VDPVs and attenuated viruses of genotype 1. Conclusion The information from this study may not only have theoretical value for understanding poliovirus evolution, especially of genotype 1 VDPVs, but also have potential value for the development of poliovirus vaccines.
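RSCU, one of the measures used above, divides a codon's observed count by the average count of all codons for the same amino acid, so RSCU = 1 means no preference. For ENC, whose values range from 20 (one codon per amino acid) to 61 (uniform use), a common companion is Wright's expected curve ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3s content; genes lying on or near it are usually read as reflecting mutational pressure alone. The abstract does not say whether the authors used this curve. A minimal Python sketch of both quantities, assuming the standard genetic code:

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(seq):
    """RSCU of each codon: observed count / mean count of its synonymous codons.

    Stop codons and amino acids absent from the sequence are skipped.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    synonyms = defaultdict(list)
    for codon, aa in CODE.items():
        synonyms[aa].append(codon)
    values = {}
    for aa, group in synonyms.items():
        total = sum(counts[c] for c in group)
        if aa == "*" or total == 0:
            continue
        mean = total / len(group)
        for c in group:
            values[c] = counts[c] / mean
    return values


def expected_enc(gc3s):
    """Wright's expected ENC when codon bias reflects GC3s mutation pressure alone."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


print(rscu("GCTGCTGCCGCA")["GCT"], expected_enc(0.5))
```

In the usage line, GCT is used twice among four alanine codons, so its RSCU is 2; a GC3s of 0.5 gives the curve's maximum expected ENC of 60.5.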
Affiliation(s)
- Jie Zhang
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, 730046 Gansu, China

58
Wang M, Liu YS, Zhou JH, Chen HT, Ma LN, Ding YZ, Liu WQ, Gu YX, Zhang J. Analysis of codon usage in Newcastle disease virus. Virus Genes 2011; 42:245-53. [PMID: 21249440 PMCID: PMC7088932 DOI: 10.1007/s11262-011-0574-z] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 10/12/2010] [Accepted: 01/09/2011] [Indexed: 11/28/2022]
Abstract
In this study, the relative synonymous codon usage (RSCU) values, effective number of codons (ENC) values, nucleotide contents, and dinucleotide frequencies were used to investigate the codon usage pattern of each protein-coding gene and genome among 31 Newcastle disease virus (NDV) isolates. The overall extent of codon usage bias in NDV is low (mean ENC = 56.15 > 40). The good correlation between (C + G)12% and (G + C)3% suggests that mutational pressure, rather than natural selection, is the main factor determining codon usage bias and base composition in NDV. The synonymous codon usage pattern in NDV genes is specific to gene function and geography, but not to host. By contrasting the synonymous codon usage patterns of different NDV isolates, we suggest that more than one genotype of NDV circulates in waterfowl in the USA, and that gene length has no significant effect on the variation of synonymous codon usage in these virus genes. CpG under-representation is a characteristic of NDV's adaptation to its host. These results not only provide insight into the variation of codon usage pattern among NDV genomes, but may also help in understanding the processes governing the evolution of NDV.
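In the abstract, (C + G)12% denotes the mean G+C content at the first and second codon positions and (G + C)3% the content at the third; a strong correlation between them is commonly read as evidence of mutational pressure. CpG under-representation is usually quantified by the relative abundance rho(CpG) = f(CG) / (f(C) f(G)). A minimal Python sketch of both quantities follows; how the authors handled stop codons or concatenated genes is not stated, so those details are assumptions (every complete codon is counted here).

```python
from collections import Counter


def gc_by_position(seq):
    """G+C fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    seq = seq.upper()
    cods = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not cods:
        raise ValueError("need at least one complete codon")
    return tuple(sum(c[k] in "GC" for c in cods) / len(cods) for k in range(3))


def cpg_relative_abundance(seq):
    """rho(CpG) = f(CG) / (f(C) * f(G)); values well below 1 mean CpG is under-represented."""
    seq = seq.upper()
    n = len(seq)
    bases = Counter(seq)
    dinucs = Counter(seq[i:i + 2] for i in range(n - 1))
    f_cg = dinucs["CG"] / (n - 1)
    return f_cg / ((bases["C"] / n) * (bases["G"] / n))


gc1, gc2, gc3 = gc_by_position("ATGGCCAAAGCG")
gc12 = (gc1 + gc2) / 2  # one point of a GC12-versus-GC3 (neutrality) plot
print(gc12, gc3, cpg_relative_abundance("ATGGCCAAAGCG"))
```

Computing GC12 and GC3 for every gene and regressing one on the other gives the neutrality plot often used alongside such analyses.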
Affiliation(s)
- Meng Wang
- Key Laboratory of Animal Virology of Ministry of Agriculture, State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, 730046 Gansu, People's Republic of China

59
Selected codon usage bias in members of the class Mollicutes. Gene 2010; 473:110-8. [PMID: 21147204 DOI: 10.1016/j.gene.2010.11.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/16/2010] [Revised: 11/20/2010] [Accepted: 11/22/2010] [Indexed: 11/24/2022]
Abstract
Mollicutes are parasitic microorganisms mainly characterized by small cell sizes, reduced genomes and a strong A+T mutational bias. We analyzed the codon usage patterns of the completely sequenced genomes of bacteria that belong to this class. We found that for many organisms not only mutational bias but also selection has a major effect on codon usage. Through a comparative perspective and based on three widely used criteria, we were able to classify Mollicutes according to the effect of selection on codon usage. We found conserved optimal codons in many species and studied the tRNA gene pool in each genome. Previous results are reinforced by the fact that, when selection is operative, the putative optimal codons found match the respective cognate tRNAs. Finally, we trace the selection effect back to the common ancestor of the class and estimate the phylogenetic inertia associated with this character. We discuss the possible scenarios that explain the observed evolutionary patterns.

60
Supek F, Vlahoviček K. Erratum to: Comparison of codon usage measures and their applicability in prediction of microbial gene expressivity. BMC Bioinformatics 2010. [PMCID: PMC2945942 DOI: 10.1186/1471-2105-11-463] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]

61
von Mandach C, Merkl R. Genes optimized by evolution for accurate and fast translation encode in Archaea and Bacteria a broad and characteristic spectrum of protein functions. BMC Genomics 2010; 11:617. [PMID: 21050470 PMCID: PMC3091758 DOI: 10.1186/1471-2164-11-617] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/10/2010] [Accepted: 11/04/2010] [Indexed: 11/13/2022]
Abstract
Background In many microbial genomes, a strong preference for a small number of codons can be observed in genes whose products are needed by the cell in large quantities. This codon usage bias (CUB) improves translational accuracy and speed and is one of several factors optimizing cell growth. Whereas CUB and the overrepresentation of individual proteins have been studied in detail, it is still unclear which high-level metabolic categories are subject to translational optimization in different habitats. Results In a systematic study of 388 microbial species, we have identified for each genome a specific subset of genes characterized by a marked CUB, which we named the effectome. As expected, gene products related to protein synthesis are abundant in both archaeal and bacterial effectomes. In addition, enzymes contributing to energy production and gene products involved in protein folding and stabilization are overrepresented. The comparison of genomes from eleven habitats shows that the environment has only a minor effect on the composition of the effectomes. As a paradigmatic example, we describe in detail the effectome content of 37 bacterial genomes that are most likely exposed to the strongest selective pressure towards translational optimization. These effectomes accommodate a broad range of protein functions, such as enzymes related to glycolysis/gluconeogenesis and the TCA cycle, ATP synthases, aminoacyl-tRNA synthetases, chaperones, proteases that degrade misfolded proteins, protectants against oxidative damage, as well as cold shock and outer membrane proteins. Conclusions We show that effectomes consist of specific subsets of the proteome involved in several cellular functions. As expected, some functions are related to cell growth and affect the speed and quality of protein synthesis. Additionally, the effectomes contain enzymes of central metabolic pathways and cellular functions sustaining microbial life under stress. 
These findings indicate that cell growth is an important but not the only factor modulating translational accuracy and speed by means of CUB.

62
Flynn KM, Vohr SH, Hatcher PJ, Cooper VS. Evolutionary rates and gene dispensability associate with replication timing in the archaeon Sulfolobus islandicus. Genome Biol Evol 2010; 2:859-69. [PMID: 20978102 PMCID: PMC3000693 DOI: 10.1093/gbe/evq068] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022]
Abstract
In bacterial chromosomes, the position of a gene relative to the single origin of replication generally reflects its replication timing, how often it is expressed, and consequently, its rate of evolution. However, because some archaeal genomes contain multiple origins of replication, bias in gene dosage caused by delayed replication should be minimized and hence the substitution rate of genes should associate less with chromosome position. To test this hypothesis, six archaeal genomes from the genus Sulfolobus containing three origins of replication were selected, conserved orthologs were identified, and the evolutionary rates (dN and dS) of these orthologs were quantified. Ortholog families were grouped by their consensus position and designated by their proximity to one of the three origins (O1, O2, O3). Conserved orthologs were concentrated near the origins and most variation in genome content occurred distant from the origins. Linear regressions of both synonymous and nonsynonymous substitution rates on distance from replication origins were significantly positive, the rates being greatest in the region furthest from any of the origins and slowest among genes near the origins. Genes near O1 also evolved faster than those near O2 and O3, which suggests that this origin may fire later in the cell cycle. Increased evolutionary rates and gene dispensability are strongly associated with reduced gene expression caused in part by reduced gene dosage during the cell cycle. Therefore, in this genus of Archaea as well as in many Bacteria, evolutionary rates and variation in genome content associate with replication timing.
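The regression described above relates each ortholog's substitution rate to its distance from the nearest of the three replication origins on a circular chromosome. A minimal Python sketch of that distance and of an ordinary least-squares slope follows; the genome length, origin coordinates and dS values are hypothetical, and the significance testing reported in the abstract is omitted.

```python
def distance_to_nearest_origin(position, origins, genome_length):
    """Shortest distance (bp) from `position` to any origin on a circular chromosome."""
    best = genome_length
    for origin in origins:
        d = abs(position - origin) % genome_length
        # On a circle the gap can be measured in either direction.
        best = min(best, d, genome_length - d)
    return best


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


# Hypothetical 3 Mb genome with three origins and made-up per-gene dS values.
origins = [0, 1_000_000, 2_000_000]
positions = [10_000, 400_000, 1_450_000, 2_500_000]
ds = [0.10, 0.18, 0.21, 0.25]
dist = [distance_to_nearest_origin(p, origins, 3_000_000) for p in positions]
print(dist, ols_slope(dist, ds))
```

A positive slope corresponds to the pattern in the abstract: rates increase with distance from the nearest origin.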
Affiliation(s)
- Kenneth M Flynn
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, USA

63
Zhao WM, Qiao N, Wang XB, Chen Q, Cheng JH, Xu Q, Chen GH. Comparative genomic analysis of growth hormone gene in geese. Anim Sci J 2010; 82:62-6. [PMID: 21269361 DOI: 10.1111/j.1740-0929.2010.00812.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
Abstract
To explore the mutation characteristics of the growth hormone (GH) gene in geese, all exons and introns of the gene were amplified using 20 primer pairs, and single nucleotide polymorphisms (SNPs) were detected by single-strand conformation polymorphism (SSCP) analysis and subsequently confirmed by sequencing. There were six SNPs per 1000 nucleotides in exons compared with two SNPs per 1000 nucleotides in introns. The exonic variants comprised only one non-synonymous mutation and three synonymous mutations. The sequence identity of the goose GH gene with those of chicken and duck was 77.54% and 92.38%, respectively, suggesting that the GH gene is highly conserved in phylogeny, although waterfowl and chicken differ in their evolutionary history.
Affiliation(s)
- Wen-ming Zhao
- Animal Science and Technology College, Yangzhou University, Yangzhou, China

64
Supek F, Škunca N, Repar J, Vlahoviček K, Šmuc T. Translational selection is ubiquitous in prokaryotes. PLoS Genet 2010; 6:e1001004. [PMID: 20585573 PMCID: PMC2891978 DOI: 10.1371/journal.pgen.1001004] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 12/11/2009] [Accepted: 05/26/2010] [Indexed: 11/29/2022]
Abstract
Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome (between 5% and 33%, depending on genome size), while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of, codon-optimized genes, the trends of enrichment/depletion being conserved between Archaea and Bacteria. Prominent exceptions from these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxification of oxygen radicals and ammonia and to possible misannotations of asparaginyl-tRNA synthetases. 
Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an “adaptome” by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea. Synonymous codons are not equally common in genomes. The main causes of unequal codon usage are varying nucleotide substitution patterns, as manifested in the wide range of genomic nucleotide compositions. However, since the first E. coli and yeast genes were sequenced, it has been evident that there is also a bias towards codons that can be translated to protein faster and more accurately. This bias was stronger in highly expressed genes, and its driving force was termed translational selection. Researchers sought effects of translational selection in microbial genomes as they became available, employing a flurry of mathematical approaches which sometimes led to contradictory conclusions. We introduce a sensitive and accurate machine learning-based methodology and find that highly expressed genes have a recognizable codon usage pattern in almost every bacterial and archaeal genome analyzed, even after accounting for large differences in background nucleotide composition. We also show that the gene functional category has a great bearing on whether that gene is subject to translational selection. Since the presence of codon optimizations can be used as a purely sequence-derived proxy for expression levels, we can delineate “adaptomes” by relating predicted gene activity to organisms' phenotypes, which we demonstrate on genomes of temperature-resistant Bacteria and Archaea.
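The abstract benchmarks its Random Forest approach against the codon adaptation index (CAI). For reference, the CAI of Sharp and Li (1987) is the geometric mean, over a gene's codons, of each codon's relative adaptiveness w: its count in a reference set of highly expressed genes (such as ribosomal protein genes) divided by the count of the most-used synonymous codon. The Python sketch below assumes the standard genetic code, skips Met, Trp and stop codons, and gives codons absent from the reference set a floor of 0.01; that floor is an arbitrary choice for illustration. It does not reproduce the authors' classifier, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SKIP = {"M", "W", "*"}  # single-codon amino acids and stops carry no information


def split_codons(seq):
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_seqs, floor=0.01):
    """w = count / max synonymous count, estimated from the reference set."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    by_aa = defaultdict(list)
    for codon, aa in CODE.items():
        by_aa[aa].append(codon)
    w = {}
    for aa, group in by_aa.items():
        top = max(counts[c] for c in group)
        if aa in SKIP or top == 0:
            continue  # amino acid uninformative or absent from the reference set
        for c in group:
            # Codons never seen in the reference set get a small floor (assumption).
            w[c] = max(counts[c] / top, floor)
    return w


def cai(gene, w):
    """Geometric mean of w over the gene's codons that have a weight."""
    logs = [math.log(w[c]) for c in split_codons(gene) if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


w = relative_adaptiveness(["GCTGCTGCCAAA"])  # hypothetical reference set
print(cai("GCTGCCAAG", w))
```

Working with logarithms turns the geometric mean into an arithmetic mean, which avoids underflow for long genes.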
Affiliation(s)
- Fran Supek
- Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia
- Nives Škunca
- Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia
- Jelena Repar
- Division of Molecular Biology, Rudjer Boskovic Institute, Zagreb, Croatia
- Kristian Vlahoviček
- Division of Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia
- Department of Informatics, University of Oslo, Oslo, Norway
- Tomislav Šmuc
- Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia

65
Fox JM, Erill I. Relative codon adaptation: a generic codon bias index for prediction of gene expression. DNA Res 2010; 17:185-96. [PMID: 20453079 PMCID: PMC2885275 DOI: 10.1093/dnares/dsq012] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Indexed: 11/13/2022]
Abstract
The development of codon bias indices (CBIs) remains an active field of research due to their myriad applications in computational biology. Recently, the relative codon usage bias (RCBS) was introduced as a novel CBI able to estimate codon bias without using a reference set. The results of this new index when applied to Escherichia coli and Saccharomyces cerevisiae led the authors of the original publications to conclude that natural selection favours higher expression and enhanced codon usage optimization in short genes. Here, we show that this conclusion was flawed and based on the systematic oversight of an intrinsic bias for short sequences in the RCBS index and of biases in the small data sets used for validation in E. coli. Furthermore, we show how the RCBS can be corrected to produce useful results and how its underlying principle, which we here term relative codon adaptation (RCA), can be made into a powerful reference-set-based index that directly takes into account the genomic base composition. Finally, we show that RCA outperforms the codon adaptation index (CAI) as a predictor of gene expression when operating on the CAI reference set, and that this improvement is significantly larger when analysing genomes with high mutational bias.
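The relative codon adaptation (RCA) described above applies an RCBS-style ratio, f(xyz) / (f1(x) f2(y) f3(z)), to a fixed reference set instead of to the gene being scored, and scores a gene by the geometric mean of those ratios over its codons. A minimal Python sketch follows; the additive pseudocount scheme and its default value are assumptions made here to avoid zero frequencies and may not match the published index. The function names and the reference sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALL_CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
ALL_CODON_SET = set(ALL_CODONS)


def split_codons(seq):
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def rca_weights(reference_seqs, pseudocount=0.5):
    """Observed/expected ratio for each codon, estimated from a reference gene set."""
    cods = [c for s in reference_seqs for c in split_codons(s) if c in ALL_CODON_SET]
    n = len(cods)
    if n == 0:
        raise ValueError("empty reference set")
    counts = Counter(cods)
    pos = [Counter(c[k] for c in cods) for k in range(3)]
    weights = {}
    for codon in ALL_CODONS:
        observed = (counts[codon] + pseudocount) / (n + 64 * pseudocount)
        expected = 1.0
        for k in range(3):
            expected *= (pos[k][codon[k]] + pseudocount) / (n + 4 * pseudocount)
        if observed > 0:  # with pseudocount 0, unseen codons get no weight
            weights[codon] = observed / expected
    return weights


def rca(gene, weights):
    """Geometric mean of the reference-set ratios over the gene's codons."""
    logs = [math.log(weights[c]) for c in split_codons(gene) if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


reference = ["ATGGCCGCTAAAGAA", "ATGGCCAAAGAAGCC"]  # hypothetical highly expressed genes
w = rca_weights(reference)
print(rca("ATGGCCAAA", w), rca("ATGGCGAAG", w))
```

Because the weights are fixed by the reference set, a gene's score no longer depends on its own length in the way RCBS does.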
Affiliation(s)
- Jesse M Fox
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Road, Baltimore, MD 21228, USA
66
Martin J, Zhu W, Passalacqua KD, Bergman N, Borodovsky M. Bacillus anthracis genome organization in light of whole transcriptome sequencing. BMC Bioinformatics 2010; 11 Suppl 3:S10. [PMID: 20438648 PMCID: PMC2863060 DOI: 10.1186/1471-2105-11-s3-s10] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
Abstract
Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. the codon adaptation index and the RBS score, related to gene expression? We used whole transcriptome shotgun sequencing of the bacterial pathogen Bacillus anthracis to assess the correlation of gene expression level with promoter, terminator and RBS scores, the codon adaptation index, and a new measure of gene translational efficiency, average translation speed. We compared computational predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessing the accuracy of the current annotation of protein-coding genes in the B. anthracis genome, we have shown that the transcriptome data indicate the existence of more than a hundred genes that are missing from the annotation although predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.
67
On relevance of codon usage to expression of synthetic and natural genes in Escherichia coli. Genetics 2010; 185:1129-34. [PMID: 20421604 DOI: 10.1534/genetics.110.115477] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022]
Abstract
A recent investigation concluded that codon bias did not affect expression of green fluorescent protein (GFP) variants in Escherichia coli, while stability of an mRNA secondary structure near the 5' end played a dominant role. We demonstrate that combining the two variables using regression trees or support vector regression yields a biologically plausible model with better support in the GFP data set and in other experimental data: codon usage is relevant for protein levels if the 5' mRNA structures are not strong. Natural E. coli genes had weaker 5' mRNA structures than the examined set of GFP variants and did not exhibit a correlation between the folding free energy of 5' mRNA structures and protein expression.
68
Ishikawa T, Sakurai A, Hirano H, Lezhava A, Sakurai M, Hayashizaki Y. Emerging New Technologies in Pharmacogenomics: Rapid SNP detection, molecular dynamic simulation, and QSAR analysis methods to validate clinically important genetic variants of human ABC Transporter ABCB1 (P-gp/MDR1). Pharmacol Ther 2010; 126:69-81. [DOI: 10.1016/j.pharmthera.2010.01.005] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 01/17/2010] [Accepted: 01/19/2010] [Indexed: 01/18/2023]
69
Why genes evolve faster on secondary chromosomes in bacteria. PLoS Comput Biol 2010; 6:e1000732. [PMID: 20369015 PMCID: PMC2848543 DOI: 10.1371/journal.pcbi.1000732] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Received: 09/08/2009] [Accepted: 03/03/2010] [Indexed: 01/01/2023]
Abstract
In bacterial genomes composed of more than one chromosome, one replicon is typically larger, harbors more essential genes than the others, and is considered primary. The greater variability of secondary chromosomes among related taxa has led to the theory that they serve as an accessory genome for specific niches or conditions. By this rationale, purifying selection should be weaker on genes on secondary chromosomes because of their reduced necessity or usage. To test this hypothesis we selected bacterial genomes composed of multiple chromosomes from two genera, Burkholderia and Vibrio, and quantified the evolutionary rates (dN and dS) of all orthologs within each genus. Both evolutionary rate parameters were higher among orthologs found on secondary chromosomes than among those on the primary chromosome. Further, in every bacterial genome with multiple chromosomes that we studied, genes on secondary chromosomes exhibited significantly weaker codon usage bias than those on primary chromosomes. Faster evolution and reduced codon bias could in turn result either from global effects of chromosome position, as genes on secondary chromosomes experience reduced dosage and expression due to their delayed replication, or from selection on specific gene attributes. These alternatives were evaluated using orthologs common to genomes with multiple chromosomes and genomes with single chromosomes. Analysis of these ortholog sets suggested that inherently fast-evolving genes tend to be sorted to secondary chromosomes when they arise; however, prolonged evolution on a secondary chromosome further accelerated substitution rates. In summary, secondary chromosomes in bacteria are evolutionary test beds where genes are weakly preserved and evolve more rapidly, likely because they are used less frequently.
Why many bacteria have multiple chromosomes is largely unknown, but a leading hypothesis is that secondary chromosomes evolved from plasmids and now serve as accessory genomes. We tested a key prediction of this theory: that genes on secondary chromosomes should evolve faster because they are under less selective constraint. Indeed, orthologous genes shared within two groups of bacteria (Burkholderia or Vibrio) with multiple chromosomes were less conserved and evolved more rapidly when found on secondary chromosomes. Many of these patterns could stem from the tendency of secondary chromosomes to be replicated later in the cell cycle, which reduces their gene dosage, their potential for expression, and selection for their optimal translation. However, the content of secondary chromosomes appears to be predisposed to evolve faster, because these same genes still evolve more rapidly in single-chromosome genomes. Overall, the evolution of divided genomes appears to allow the long-term segregation of genome content by rates of expression and dispensability, placing some genes at increased risk of mutational decay and greater turnover.
70
Boda U, Vadapalli S, Calambur N, Nallari P. Novel mutations in beta-myosin heavy chain, actin and troponin-I genes associated with dilated cardiomyopathy in Indian population. J Genet 2010; 88:373-7. [PMID: 20086309 DOI: 10.1007/s12041-009-0057-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/20/2022]
Affiliation(s)
- Ushasree Boda
- Department of Genetics, Osmania University, Hyderabad 500 001, India
71
McMahon DP, Hayward A, Kathirithamby J. The mitochondrial genome of the 'twisted-wing parasite' Mengenilla australiensis (Insecta, Strepsiptera): a comparative study. BMC Genomics 2009; 10:603. [PMID: 20003419 PMCID: PMC2800125 DOI: 10.1186/1471-2164-10-603] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 07/15/2009] [Accepted: 12/14/2009] [Indexed: 12/05/2022]
Abstract
BACKGROUND Strepsiptera are an unusual group of sexually dimorphic, entomophagous parasitoids whose evolutionary origins remain elusive. The lineage leading to Mengenilla australiensis (Family Mengenillidae) is the sister group to all remaining extant strepsipterans. It is unique in that members of this family have retained a less derived condition, where females are free-living from pupation onwards, and are structurally much less simplified. We sequenced almost the entire mitochondrial genome of M. australiensis as an important comparative data point to the already available genome of its distant relative Xenos vesparum (Family Xenidae). This study represents the first in-depth comparative mitochondrial genomic analysis of Strepsiptera. RESULTS The partial genome of M. australiensis is presented as a 13421 bp fragment, across which all 13 protein-coding genes (PCGs), 2 ribosomal RNA (rRNA) genes and 18 transfer RNA (tRNA) sequences are identified. Two tRNA translocations disrupt an otherwise ancestral insect mitochondrial genome order. A+T content is 84.3%, and C content is also strongly skewed. Compared with M. australiensis, codon bias in X. vesparum is more balanced. Interestingly, the size of the protein-coding genome is truncated in both strepsipterans, especially in X. vesparum, which uniquely has 4.3% fewer amino acids than the average holometabolan complement. A revised assessment of mitochondrial rRNA secondary structure based on comparative structural considerations is presented for M. australiensis and X. vesparum. CONCLUSIONS The mitochondrial genome of X. vesparum has undergone a series of alterations which are probably related to an extremely derived lifestyle. Although M. australiensis shares some of these attributes, it has retained greater signal from the hypothetical most recent common ancestor (MRCA) of Strepsiptera, raising the possibility that a shift in the mitochondrial selective environment might be related to the specialization that accompanied the evolution of a small, morphologically simplified, completely host-dependent lifestyle. These results provide useful insights into the nature of the evolutionary transitions that accompanied the emergence of Strepsiptera, but we emphasize the need for adequate sampling across the order in future investigations of the extraordinary developmental and evolutionary origins of this group.
Affiliation(s)
- Dino P McMahon
- Department of Zoology, University of Oxford, The Tinbergen Building, South Parks Road, Oxford, OX1 3PS, UK
- Alexander Hayward
- Department of Zoology, University of Oxford, The Tinbergen Building, South Parks Road, Oxford, OX1 3PS, UK
- Jeyaraney Kathirithamby
- Department of Zoology, University of Oxford, The Tinbergen Building, South Parks Road, Oxford, OX1 3PS, UK
72
Gao J, Chen LL. Theoretical methods for identifying important functional genes in bacterial genomes. Res Microbiol 2009; 161:1-8. [PMID: 19900539 DOI: 10.1016/j.resmic.2009.10.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 05/22/2009] [Revised: 10/05/2009] [Accepted: 10/21/2009] [Indexed: 12/30/2022]
Abstract
Some functional genes, such as essential genes, highly expressed genes and horizontally transferred genes, play important roles in the survival and pathogenicity of bacteria. This review attempts to summarize current computational methods for identifying these functional genes in bacterial genomes, a task of considerable importance for exploring bacterial genomes.
Affiliation(s)
- Junxiang Gao
- School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, PR China
73
Das S, Roymondal U, Sahoo S. Analyzing gene expression from relative codon usage bias in Yeast genome: a statistical significance and biological relevance. Gene 2009; 443:121-31. [PMID: 19410638 DOI: 10.1016/j.gene.2009.04.022] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 11/22/2008] [Revised: 03/08/2009] [Accepted: 04/20/2009] [Indexed: 11/17/2022]
Abstract
Based on the hypothesis that highly expressed genes are often characterized by strong compositional bias in codon usage, a number of measures are currently used to quantify codon usage bias in genes and hence provide numerical indices for predicting gene expression levels. With the recent introduction of an expression measure based on the score of the relative codon usage bias (RCBS), we have explicitly tested the performance of this numerical measure in predicting gene expression level and illustrate this with an analysis of the yeast genome. In contrast to other previous studies, we observe a weak correlation between GC content and RCBS, but a selective pressure on the codon preferences in highly expressed genes. The data support the assertion that the expression of a given gene depends on its score of relative codon usage bias (RCBS). We further observe a strong correlation between RCBS and protein length, indicating natural selection in favour of shorter genes being expressed at higher levels. We also attempt a statistical analysis to assess the strength of relative codon bias in genes as a guide to their likely expression level, which suggests a decrease in informational entropy in highly expressed genes.
Affiliation(s)
- Shibsankar Das
- Department of Mathematics, Uluberia College, Uluberia, Howrah, W.B., India
74
Roymondal U, Das S, Sahoo S. Predicting gene expression level from relative codon usage bias: an application to Escherichia coli genome. DNA Res 2009; 16:13-30. [PMID: 19131380 PMCID: PMC2646356 DOI: 10.1093/dnares/dsn029] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Indexed: 11/13/2022]
Abstract
We present an expression measure of a gene, devised to predict the level of gene expression from relative codon bias (RCB). A number of measures are currently used to quantify codon usage in genes. Based on the hypothesis that gene expressivity and codon composition are strongly correlated, RCB has been defined to provide an intuitively meaningful measure of the extent of codon preference in a gene. We outline a simple approach to assess the strength of RCB (RCBS) in genes as a guide to their likely expression levels and illustrate this with an analysis of the Escherichia coli (E. coli) genome. Our efforts to quantitatively predict gene expression levels in E. coli met with a high level of success. Surprisingly, we observe a strong correlation between RCBS and protein length, indicating natural selection in favour of shorter genes being expressed at higher levels. The agreement of our results with high protein abundances, microarray data and radioactive data demonstrates that the genomic expression profile obtained with our method can be applied in a meaningful way to the study of cell physiology and to more detailed studies of particular genes of interest.
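A minimal Python sketch of the RCBS score described above, assuming the published definition in which each codon xyz contributes 1 + d_xyz = f(xyz) / (f1(x) f2(y) f3(z)), where f(xyz) is the codon's frequency in the gene and f1, f2, f3 are the gene's own nucleotide frequencies at codon positions 1 to 3; RCBS is then the geometric mean of these ratios over all codons, minus 1. The function name `rcbs` is illustrative, and the sketch performs no stop-codon or reading-frame checks.

```python
import math
from collections import Counter


def rcbs(gene: str) -> float:
    """Relative codon bias strength of one gene, computed without a reference set."""
    gene = gene.upper()
    codons = [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]
    if not codons:
        raise ValueError("gene must contain at least one complete codon")
    n = len(codons)
    codon_counts = Counter(codons)
    # Nucleotide counts at codon positions 1, 2 and 3 (f1, f2, f3 before normalising).
    position_counts = [Counter(c[k] for c in codons) for k in range(3)]
    log_sum = 0.0
    for c in codons:
        observed = codon_counts[c] / n
        expected = 1.0
        for k in range(3):
            expected *= position_counts[k][c[k]] / n
        log_sum += math.log(observed / expected)  # log(1 + d_xyz)
    # Geometric mean of (1 + d_xyz) over all codons, minus 1.
    return math.exp(log_sum / n) - 1


print(rcbs("GCT" * 5))  # one repeated codon: observed equals expected, so 0.0
print(rcbs("GCTAAA"))   # two distinct codons, each ratio 0.5 / 0.125 = 4, so about 3.0
```

Because f1, f2 and f3 are estimated from the gene itself, even a two-codon sequence such as GCTAAA receives a high score, which hints at the short-sequence sensitivity discussed by Fox and Erill.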
Affiliation(s)
- Uttam Roymondal
- Department of Mathematics, Raidighi College, South 24 Parganas, Raidighi, West Bengal, India
75
Salvato P, Simonato M, Battisti A, Negrisolo E. The complete mitochondrial genome of the bag-shelter moth Ochrogaster lunifer (Lepidoptera, Notodontidae). BMC Genomics 2008; 9:331. [PMID: 18627592 PMCID: PMC2488359 DOI: 10.1186/1471-2164-9-331] [Citation(s) in RCA: 174] [Impact Index Per Article: 10.9] [Received: 05/21/2008] [Accepted: 07/15/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Knowledge of animal mitochondrial genomes is very important for understanding their molecular evolution as well as for phylogenetic and population genetic studies. The Lepidoptera encompass more than 160,000 described species and are one of the largest insect orders. To date only nine lepidopteran mitochondrial DNAs have been fully sequenced and two others partly sequenced. Furthermore, the taxon sampling is very scant. Advancing lepidopteran mitogenomics therefore requires new genomes derived from broad taxon sampling. In the present work we describe the mitochondrial genome of the moth Ochrogaster lunifer. RESULTS The mitochondrial genome of O. lunifer is a circular molecule 15593 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes. It also contains 7 intergenic spacers. The gene order of the newly sequenced genome is typical for Lepidoptera and differs from the ancestral insect type in the placement of trnM. The 77.84% A+T content of its alpha strand is the lowest among known lepidopteran genomes. The mitochondrial genome of O. lunifer exhibits one of the most marked C-skews among available pterygote insect genomes. The protein-coding genes have typical mitochondrial start codons except for cox1, which presents an unusual CGA. The O. lunifer genome exhibits the least biased synonymous codon usage among lepidopterans. Comparative genomic analysis identified atp6, cox1, cox2, cox3, cob, nad1, nad2, nad4, and nad5 as potential markers for population genetics/phylogenetics studies. A peculiar feature of the O. lunifer mitochondrial genome is that the intergenic spacers are mostly made up of repetitive sequences. CONCLUSION The mitochondrial genome of O. lunifer is the first representative of the superfamily Noctuoidea, which accounts for about 40% of all described Lepidoptera. The new genome shares many features with other known lepidopteran genomes. It differs, however, in its low A+T content and marked C-skew. Compared to other lepidopteran genomes, it is less biased in synonymous codon usage. Comparative evolutionary analysis of lepidopteran mitochondrial genomes allowed the identification of previously neglected coding genes as potential phylogenetic markers. The presence of repetitive elements in the intergenic spacers of the O. lunifer genome supports DNA slippage as a possible mechanism for producing spacers during replication.
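The A+T content and strand skews quoted in the mitochondrial-genome abstracts can be computed directly from base counts. A minimal sketch, assuming the commonly used formulas AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); a negative GC skew (more C than G on the strand analysed) is presumably what these abstracts call a C-skew. The function name `composition` is hypothetical, and ambiguous bases such as N are simply ignored.

```python
def composition(seq: str) -> dict:
    """A+T percentage plus AT and GC skews of a single DNA strand."""
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    total = a + t + g + c  # ambiguous bases (N, R, Y, ...) are not counted
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "at_percent": 100.0 * (a + t) / total,
        "at_skew": (a - t) / (a + t) if a + t else 0.0,
        "gc_skew": (g - c) / (g + c) if g + c else 0.0,  # negative means C-rich
    }


print(composition("AAATGCCC"))
# {'at_percent': 50.0, 'at_skew': 0.5, 'gc_skew': -0.5}
```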
Affiliation(s)
- Paola Salvato
- Department of Public Health, Comparative Pathology and Veterinary Hygiene, University of Padova, Agripolis, Viale dell'Università 16, 35020 Legnaro, Italy.
76
Puigbò P, Bravo IG, Garcia-Vallvé S. E-CAI: a novel server to estimate an expected value of Codon Adaptation Index (eCAI). BMC Bioinformatics 2008; 9:65. [PMID: 18230160 PMCID: PMC2246156 DOI: 10.1186/1471-2105-9-65] [Citation(s) in RCA: 123] [Impact Index Per Article: 7.7] [Received: 03/30/2007] [Accepted: 01/29/2008] [Indexed: 11/26/2022]
Abstract
Background The Codon Adaptation Index (CAI) is a measure of the synonymous codon usage bias for a DNA or RNA sequence. It quantifies the similarity between the synonymous codon usage of a gene and the synonymous codon frequency of a reference set. Extreme values in the nucleotide or in the amino acid composition have a large impact on differential preference for synonymous codons. It is therefore essential to define the limits for the expected value of CAI on the basis of sequence composition in order to properly interpret the CAI and provide statistical support to CAI analyses. Though several freely available programs calculate the CAI for a given DNA sequence, none of them corrects for compositional biases or provides confidence intervals for CAI values. Results The E-CAI server, available at , is a web application that calculates an expected value of CAI for a set of query sequences by generating random sequences with G+C and amino acid content similar to those of the input. An executable file, a tutorial, a Frequently Asked Questions (FAQ) section and several examples are also available. To exemplify the use of the E-CAI server, we have analysed the codon adaptation of human mitochondrial genes that encode a subunit of the mitochondrial respiratory chain (excluding those genes that lack a prokaryotic orthologue) and are encoded in the nuclear genome. It is assumed that these genes were transferred from the proto-mitochondrial to the nuclear genome and that their codon usage was then ameliorated. Conclusion The E-CAI server provides a direct threshold value for discerning whether the differences in CAI are statistically significant or whether they are merely artifacts that arise from internal biases in the G+C composition and/or amino acid composition of the query sequences.
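The CAI described above follows Sharp and Li's construction: each codon's relative adaptiveness w is its count in a reference set divided by the count of the most frequent synonymous codon, and a gene's CAI is the geometric mean of w over its codons. The sketch below assumes the standard genetic code, skips Met, Trp and stop codons, and substitutes a pseudocount of 0.5 for codons absent from the reference set; it computes only the observed CAI, not the composition-matched expected value (eCAI) that the E-CAI server reports. Function names are illustrative.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TCAG at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq: str) -> list:
    """In-frame codons of seq; a trailing partial codon is dropped."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes: list, pseudocount: float = 0.5) -> dict:
    """w = count(codon) / count(most used synonymous codon) in the reference set."""
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    w = {}
    for aa in set(AMINO_ACIDS) - {"*", "M", "W"}:  # M and W have no synonymous choice
        synonyms = [c for c, a in GENETIC_CODE.items() if a == aa]
        # Assumption: unseen codons get a small pseudocount so log(w) stays finite.
        observed = {c: counts[c] or pseudocount for c in synonyms}
        top = max(observed.values())
        for c in synonyms:
            w[c] = observed[c] / top
    return w


def cai(gene: str, w: dict) -> float:
    """Geometric mean of w over the gene's codons that have a w value."""
    logs = [math.log(w[c]) for c in split_codons(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else math.nan
```

In the original method the reference set would be a collection of highly expressed genes, so a gene is scored with `cai(gene, relative_adaptiveness(reference_set))`.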
Affiliation(s)
- Pere Puigbò
- Evolutionary Genomics Group, Department of Biochemistry and Biotechnology, Rovira i Virgili University (URV), Campus Sescelades, c/Marcelli Domingo s/n, 43007 Tarragona, Spain.
77
Fuglsang A. Impact of bias discrepancy and amino acid usage on estimates of the effective number of codons used in a gene, and a test for selection on codon usage. Gene 2007; 410:82-8. [PMID: 18248919 DOI: 10.1016/j.gene.2007.12.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 05/03/2007] [Revised: 10/22/2007] [Accepted: 12/03/2007] [Indexed: 11/26/2022]
Abstract
The effective number of codons (Nc) used in a gene is one of the most commonly used measures of synonymous codon usage bias, owing much of its popularity to the fact that it is species-independent and that simulation studies have shown it to be less dependent on gene length than other measures. In this paper I provide a clear and practically meaningful definition of bias discrepancy (BD; when the degree of codon bias varies within a degeneracy class). Moreover, I evaluate the impact of BD and amino acid usage on estimates of Nc. Both factors are shown to have a significant effect on accuracy and precision; amino acid usage and BD influence accuracy considerably, especially in short genes. Finally, I demonstrate how the definition of bias discrepancy can be applied to investigate whether codon usage is influenced by selection, and I discuss this test in relation to the incongruous literature that exists for Buchnera sp. APS and Borrelia burgdorferi.
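Nc is usually computed with Wright's (1990) formula: for each amino acid with n observed codons, the homozygosity F = (n Σ p_i² − 1)/(n − 1) is averaged within each degeneracy class, and Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6 under the standard genetic code. A minimal sketch under those assumptions: it treats six-fold Leu, Ser and Arg as single families, uses Wright's fallback of averaging F2 and F4 when Ile is absent, caps the result at 61, and returns NaN when a class cannot be estimated. It does not implement the bias-discrepancy analysis of the paper itself, and the function name is illustrative.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TCAG at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
FAMILIES = defaultdict(list)  # amino acid -> synonymous codons (stops excluded)
for codon, aa in GENETIC_CODE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def effective_number_of_codons(seq: str) -> float:
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    f_by_class = defaultdict(list)  # degeneracy class k -> F values
    for aa, synonyms in FAMILIES.items():
        k = len(synonyms)
        if k == 1:
            continue  # Met and Trp carry no synonymous information
        n = sum(counts[c] for c in synonyms)
        if n < 2:
            continue  # F is undefined with fewer than two observations
        homozygosity = sum((counts[c] / n) ** 2 for c in synonyms)
        f_by_class[k].append((n * homozygosity - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Wright's fallback for Ile
    families_per_class = {2: 9, 3: 1, 4: 5, 6: 3}
    if any(k not in mean_f or mean_f[k] <= 0 for k in families_per_class):
        return math.nan
    nc = 2 + sum(families_per_class[k] / mean_f[k] for k in families_per_class)
    return min(nc, 61.0)
```

A gene that uses exactly one codon per amino acid gives the minimum value of 20, while even usage of all synonyms pushes the estimate to the 61 cap.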
Affiliation(s)
- Anders Fuglsang
- University of Copenhagen, Faculty of Pharmaceutical Sciences, 2 Universitetsparken, Copenhagen O, Denmark.
78
Chen R, Yan H, Zhao KN, Martinac B, Liu GB. Comprehensive analysis of prokaryotic mechanosensation genes: their characteristics in codon usage. ACTA ACUST UNITED AC 2007; 18:269-78. [PMID: 17541832 DOI: 10.1080/10425170601136564] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 10/23/2022]
Abstract
In the present study, we examined GC nucleotide composition, relative synonymous codon usage (RSCU), effective number of codons (ENC), codon adaptation index (CAI) and gene length for 308 prokaryotic mechanosensitive ion channel (MSC) genes from six evolutionary groups: Euryarchaeota, Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Firmicutes, and Gammaproteobacteria. Results showed that: (1) nucleotide overrepresentation varies widely among the MSC genes; (2) codon usage bias varies considerably among the MSC genes; (3) both nucleotide constraint and gene length play an important role in shaping codon usage of the bacterial MSC genes; and (4) synonymous codon usage of prokaryotic MSC genes is phylogenetically conserved. Knowledge of codon usage in prokaryotic MSC genes may benefit the study of MSC genes in eukaryotes, in which few MSC genes have been identified and functionally analysed.
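RSCU, one of the measures listed above, is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 mark preferred codons. A minimal sketch assuming the standard genetic code; stop codons are skipped and amino acids absent from the sequence get no entries. The function name `rscu` is illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TCAG at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(seq: str) -> dict:
    """Relative synonymous codon usage for every codon of each amino acid present."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)
    for codon, aa in GENETIC_CODE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for synonyms in families.values():
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        for c in synonyms:
            # observed count / (total count / number of synonymous codons)
            values[c] = counts[c] * len(synonyms) / total
    return values


print(rscu("GCTGCTGCAGCC"))
# {'GCT': 2.0, 'GCC': 1.0, 'GCA': 1.0, 'GCG': 0.0}
```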
Affiliation(s)
- Rong Chen
- School of Medicine, Xi'an Jiaotong University, Xi'an, People's Republic of China
79
Ferro A, Giugno R, Pigola G, Pulvirenti A, Di Pietro C, Purrello M, Ragusa M. Sequence similarity is more relevant than species specificity in probabilistic backtranslation. BMC Bioinformatics 2007; 8:58. [PMID: 17313665 PMCID: PMC1810562 DOI: 10.1186/1471-2105-8-58] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 07/28/2006] [Accepted: 02/21/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous, since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species. RESULTS This paper describes EasyBack, a new parameter-free, fully automated software tool for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. CONCLUSION The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of the most used codon in the same species with a Hidden Markov Model trained with a set of the most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
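For contrast with EasyBack, the conventional criterion the abstract describes, the most used codon in the target species, can be sketched as picking the most frequent codon for each amino acid in a reference set of coding sequences. This is not EasyBack's hidden Markov model; it assumes the standard genetic code, in-frame reference sequences, ties broken by table order, and hypothetical function names.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TCAG at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def preferred_codons(reference_cds: list) -> dict:
    """Most frequent codon per amino acid ('*' = stop) in in-frame reference CDSs."""
    counts = Counter(
        cds.upper()[i:i + 3] for cds in reference_cds for i in range(0, len(cds) - 2, 3)
    )
    best = {}
    for codon, aa in GENETIC_CODE.items():  # table order decides ties
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def backtranslate(protein: str, best: dict) -> str:
    """Replace each residue with its preferred codon."""
    return "".join(best[aa] for aa in protein.upper())


best = preferred_codons(["GCCGCCGCTTAA"])
print(backtranslate("MA*", best))  # ATGGCCTAA
```

This deterministic choice is exactly what a similarity-trained model aims to improve on, since it ignores which codons homologous proteins actually use.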
Affiliation(s)
- Alfredo Ferro
- Dipartimento di Matematica e Informatica, Università di Catania, Viale A. Doria 6, I-95125 Catania, Italy
- Dipartimento di Scienze Biomediche, Università di Catania, Via S. Sofia 87, I-95125 Catania, Italy
- Rosalba Giugno
- Dipartimento di Matematica e Informatica, Università di Catania, Viale A. Doria 6, I-95125 Catania, Italy
- Giuseppe Pigola
- Dipartimento di Matematica e Informatica, Università di Catania, Viale A. Doria 6, I-95125 Catania, Italy
- Alfredo Pulvirenti
- Dipartimento di Matematica e Informatica, Università di Catania, Viale A. Doria 6, I-95125 Catania, Italy
- Cinzia Di Pietro
- Dipartimento di Scienze Biomediche, Università di Catania, Via S. Sofia 87, I-95125 Catania, Italy
- Michele Purrello
- Dipartimento di Scienze Biomediche, Università di Catania, Via S. Sofia 87, I-95125 Catania, Italy
- Marco Ragusa
- Dipartimento di Scienze Biomediche, Università di Catania, Via S. Sofia 87, I-95125 Catania, Italy
80
Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM. A "silent" polymorphism in the MDR1 gene changes substrate specificity. Science 2006; 315:525-8. [PMID: 17185560 DOI: 10.1126/science.1135308] [Citation(s) in RCA: 1799] [Impact Index Per Article: 99.9] [Indexed: 11/02/2022]
Abstract
Synonymous single-nucleotide polymorphisms (SNPs) do not produce altered coding sequences, and therefore they are not expected to change the function of the protein in which they occur. We report that a synonymous SNP in the Multidrug Resistance 1 (MDR1) gene, part of a haplotype previously linked to altered function of the MDR1 gene product P-glycoprotein (P-gp), nonetheless results in P-gp with altered drug and inhibitor interactions. Similar mRNA and protein levels, but altered conformations, were found for wild-type and polymorphic P-gp. We hypothesize that the presence of a rare codon, marked by the synonymous polymorphism, affects the timing of cotranslational folding and insertion of P-gp into the membrane, thereby altering the structure of substrate and inhibitor interaction sites.
MESH Headings
- ATP Binding Cassette Transporter, Subfamily B, Member 1/antagonists & inhibitors
- ATP Binding Cassette Transporter, Subfamily B, Member 1/chemistry
- ATP Binding Cassette Transporter, Subfamily B, Member 1/genetics
- ATP Binding Cassette Transporter, Subfamily B, Member 1/metabolism
- Animals
- Cell Line
- Cell Membrane/metabolism
- Chlorocebus aethiops
- Codon
- Cyclosporine/pharmacology
- Genes, MDR
- Haplotypes
- HeLa Cells
- Humans
- Mutagenesis, Site-Directed
- Polymorphism, Single Nucleotide
- Protein Biosynthesis
- Protein Conformation
- Protein Folding
- Protein Structure, Tertiary
- RNA, Messenger/genetics
- RNA, Messenger/metabolism
- Reverse Transcriptase Polymerase Chain Reaction
- Rhodamine 123/metabolism
- Rhodamine 123/pharmacology
- Sirolimus/pharmacology
- Substrate Specificity
- Transfection
- Verapamil/metabolism
- Verapamil/pharmacology
Affiliation(s)
- Chava Kimchi-Sarfaty
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA.
81
Sánchez J, López-Villaseñor I. A simple model to explain three-base periodicity in coding DNA. FEBS Lett 2006; 580:6413-22. [PMID: 17097640 DOI: 10.1016/j.febslet.2006.10.056] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 07/04/2006] [Revised: 10/04/2006] [Accepted: 10/19/2006] [Indexed: 11/26/2022]
Abstract
A simple model is put forward to explain the long-known three-base periodicity in coding DNA. We propose the concept of same-phase triplet clustering, i.e. a condition wherein a triplet appears several times in one phase without interruption by the two other possible phases. For instance, in the sequence (i): NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN (where N is any nucleotide but combinations producing TTG are excluded) there would be clustering of same-phase TTG because this triplet appears uninterruptedly in phase 2. In contrast, in the sequence (ii): TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN there is no same-phase clustering because neighboring TTGs are all in different phases. Observe also that in sequence (i) TTG triplets are separated by 3, 3 and 6 nucleotides (3n distances), while in sequence (ii) they are separated by 1, 4 and 5 nucleotides (non-3n distances). In this work, we demonstrate that in coding DNA the 3n distances generated by (i)-type sequences proportionally outnumber the non-3n distances generated by (ii)-type sequences, and we propose that this condition is the basis of three-base periodicity. Randomized sequences also contained (i)- and (ii)-type sequences, but their clustering was statistically different. To test our model, we generated (i)-type sequences in a randomized sequence by inducing clustering of same-phase triplets. In agreement with the model, this sequence displayed three-base periodicity. Furthermore, two- and four-base periodicities could also be induced by artificially clustering duplets and tetraplets.
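The distinction between 3n and non-3n spacings can be checked mechanically. A minimal sketch (the function name is hypothetical): it finds every occurrence of a triplet, measures the number of nucleotides separating consecutive occurrences, and counts how many of those gaps are multiples of 3, i.e. how many neighbours share a reading phase. Substituting A for every N in the two example sequences reproduces the 3, 3, 6 and 1, 4, 5 spacings quoted in the abstract; this only illustrates the clustering idea and is not the authors' statistical test.

```python
def triplet_spacing(seq: str, triplet: str):
    """Gaps between consecutive occurrences of `triplet`, split into 3n and non-3n."""
    seq, triplet = seq.upper(), triplet.upper()
    starts = [i for i in range(len(seq) - 2) if seq[i:i + 3] == triplet]
    # Nucleotides lying between the end of one copy and the start of the next.
    gaps = [b - a - 3 for a, b in zip(starts, starts[1:])]
    in_phase = sum(1 for g in gaps if g % 3 == 0)  # same reading phase
    return gaps, in_phase, len(gaps) - in_phase


seq_i = "NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN".replace("_", "").replace("N", "A")
seq_ii = "TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN".replace("_", "").replace("N", "A")
print(triplet_spacing(seq_i, "TTG"))   # ([3, 3, 6], 3, 0)
print(triplet_spacing(seq_ii, "TTG"))  # ([1, 4, 5], 0, 3)
```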
Affiliation(s)
- Joaquín Sánchez
- Facultad de Medicina, UAEM, Av. Universidad 1001, Cuernavaca, Morelos, CP 62210, México D.F., Mexico.
82
Webster BL, Mackenzie-Dodds JA, Telford MJ, Littlewood DTJ. The mitochondrial genome of Priapulus caudatus Lamarck (Priapulida: Priapulidae). Gene 2006; 389:96-105. [PMID: 17123748 DOI: 10.1016/j.gene.2006.10.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 08/17/2006] [Revised: 09/19/2006] [Accepted: 10/09/2006] [Indexed: 11/16/2022]
Abstract
We sequenced and annotated the complete mitochondrial (mt) genome of the priapulid Priapulus caudatus to provide a source of phylogenetic characters, including an assessment of gene order arrangement. The complete genome was 14,919 bp long, with few short non-coding regions. A number of protein-coding and tRNA genes overlapped, making the genome relatively compact. The gene order was: cox1, cox2, trnK, trnD, atp8, atp6, cox3, trnG, nad3, trnA, trnR, trnN, rrnS, trnV, rrnL, trnL(yaa), trnL(nag), nad1, -trnS(nga), -cob, -nad6, trnP, -trnT, nad4L, nad4, trnH, nad5, trnF, -trnE, -trnS(nct), trnI, -trnQ, trnM, nad2, trnW, -trnC, -trnY, where '-' indicates genes transcribed on the opposite strand. Although unique among Metazoa, this gene order shared the greatest number of gene boundaries and the longest contiguous fragments with that of the chelicerate Limulus polyphemus. The mt genomes of these two taxa differed only by a single inversion of 18 contiguous genes bounded by rrnS and trnS(nct). Other arthropods and nematodes shared fewer gene boundaries with P. caudatus, but still considerably more than the most similar non-ecdysozoan did.
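The listed gene order and the single-inversion statement can be checked mechanically. The sketch below is an illustration written for this summary, not the authors' method: it parses the order (a leading '-' marks the opposite strand), counts gene classes, applies a signed inversion to the block from rrnS to trnS(nct) (reversing the block and flipping each gene's strand), and counts the gene boundaries shared before and after. Treating the genome as circular and identifying a boundary (a, b) with (-b, -a) are assumptions of the sketch, and all function names are hypothetical. It does not reconstruct the Limulus polyphemus order, which the abstract does not list.

```python
# Priapulus caudatus mt gene order as listed in the abstract; "-" = opposite strand.
ORDER = (
    "cox1, cox2, trnK, trnD, atp8, atp6, cox3, trnG, nad3, trnA, trnR, trnN, "
    "rrnS, trnV, rrnL, trnL(yaa), trnL(nag), nad1, -trnS(nga), -cob, -nad6, "
    "trnP, -trnT, nad4L, nad4, trnH, nad5, trnF, -trnE, -trnS(nct), trnI, "
    "-trnQ, trnM, nad2, trnW, -trnC, -trnY"
)

Gene = tuple[str, int]  # (name, strand), strand is +1 or -1


def parse_order(text: str) -> list[Gene]:
    """Split the comma-separated order into (name, strand) pairs."""
    genes = []
    for token in (t.strip() for t in text.split(",")):
        if token.startswith("-"):
            genes.append((token[1:], -1))
        else:
            genes.append((token, 1))
    return genes


def gene_classes(genes: list[Gene]) -> dict[str, int]:
    """Count tRNA (trn*), rRNA (rrn*) and protein-coding genes."""
    counts = {"protein": 0, "rRNA": 0, "tRNA": 0}
    for name, _ in genes:
        if name.startswith("trn"):
            counts["tRNA"] += 1
        elif name.startswith("rrn"):
            counts["rRNA"] += 1
        else:
            counts["protein"] += 1
    return counts


def invert(genes: list[Gene], first: str, last: str) -> list[Gene]:
    """Signed inversion of the block first..last: reverse its order, flip strands."""
    names = [name for name, _ in genes]
    i, j = names.index(first), names.index(last)
    block = [(name, -strand) for name, strand in reversed(genes[i : j + 1])]
    return genes[:i] + block + genes[j + 1 :]


def boundaries(genes: list[Gene]) -> set[tuple[Gene, Gene]]:
    """Boundaries of a circular genome; (a, b) is identified with (-b, -a)."""
    result = set()
    for k, (a, sa) in enumerate(genes):
        b, sb = genes[(k + 1) % len(genes)]
        forward = ((a, sa), (b, sb))
        backward = ((b, -sb), (a, -sa))
        result.add(min(forward, backward))  # canonical representative
    return result


if __name__ == "__main__":
    genome = parse_order(ORDER)
    names = [name for name, _ in genome]
    block_size = names.index("trnS(nct)") - names.index("rrnS") + 1
    inverted = invert(genome, "rrnS", "trnS(nct)")
    shared = boundaries(genome) & boundaries(inverted)
    print(len(genome), gene_classes(genome), block_size, len(shared))
    # 37 {'protein': 13, 'rRNA': 2, 'tRNA': 22} 18 35
```

The block from rrnS to trnS(nct) contains 18 genes, matching the abstract. A single inversion preserves every boundary inside and outside the block and breaks only the two at its ends, so 35 of the 37 boundaries of this circular genome survive.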
Affiliation(s)
- Bonnie L Webster
- Department of Zoology, Natural History Museum, Cromwell Road, London SW7 5BD, UK
83
Waack S, Keller O, Asper R, Brodag T, Damm C, Fricke WF, Surovcik K, Meinicke P, Merkl R. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models. BMC Bioinformatics 2006; 7:142. [PMID: 16542435 PMCID: PMC1489950 DOI: 10.1186/1471-2105-7-142] [Citation(s) in RCA: 265] [Impact Index Per Article: 14.7] [Received: 12/09/2005] [Accepted: 03/16/2006] [Indexed: 01/25/2023]
Abstract
Background: Horizontal gene transfer (HGT) is considered a strong evolutionary force that substantially shapes the content of microbial genomes. What distinguishes HGT from gene genesis, duplication or mutation is its speed, which enables rapid adaptation to changing environmental demands. A precise characterization requires algorithms that identify transfer events with high reliability. The transferred pieces of DNA are frequently of considerable length and comprise several genes; they are called genomic islands (GIs) or, more specifically, pathogenicity or symbiotic islands.
Results: We have implemented the program SIGI-HMM, which predicts GIs and the putative donor of each individual alien gene. It is based on an analysis of the codon usage (CU) of each individual gene of the genome under study. The CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. In this way, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working at the gene level. For the transition probabilities, we draw on classical test theory in order to integrate a sensitivity controller in a consistent manner. SIGI-HMM was written in Java and is publicly available. It accepts as input any file in EMBL format and generates output in the common GFF format, which is readable by genome browsers. Benchmark tests showed that the output of SIGI-HMM agrees with known findings: its predictions were consistent both with annotated GIs and with predictions generated by other methods.
Conclusion: SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows users to analyze genomes interactively and in detail, and to generate or test hypotheses about the origin of acquired genes.
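The abstract describes the decoding model only as an inhomogeneous hidden Markov model working at the gene level. As a generic illustration, not SIGI-HMM's actual model, the sketch below runs Viterbi decoding over two hypothetical states, native and alien, where each gene supplies its own emission probabilities (standing in for the per-gene codon-usage tests, which is what makes the model inhomogeneous) and a single hypothetical parameter p_stay sets the transition probabilities. The state names, p_stay, the uniform start distribution and the toy probabilities are all assumptions. SIGI-HMM additionally predicts putative donors, masks highly expressed genes and derives its transition probabilities from classical test theory; none of that is reproduced here.

```python
import math

STATES = ("native", "alien")  # hypothetical two-state model


def viterbi(emissions: list[dict[str, float]], p_stay: float = 0.9) -> list[str]:
    """Most probable native/alien labelling of a run of consecutive genes.

    emissions[k][state] is an assumed P(codon usage of gene k | state).
    Every gene carries its own emission values, so the model is inhomogeneous.
    """
    log_t = {
        (s, t): math.log(p_stay if s == t else 1.0 - p_stay)
        for s in STATES
        for t in STATES
    }
    # Uniform start distribution (an assumption of this sketch).
    score = {s: math.log(0.5) + math.log(emissions[0][s]) for s in STATES}
    backpointers = []
    for e in emissions[1:]:
        prev, score, ptr = score, {}, {}
        for t in STATES:
            # Best predecessor state for ending this gene in state t.
            cand = {s: prev[s] + log_t[(s, t)] for s in STATES}
            best = max(cand, key=cand.get)
            score[t] = cand[best] + math.log(e[t])
            ptr[t] = best
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    N = {"native": 0.9, "alien": 0.1}  # toy gene whose codon usage looks native
    A = {"native": 0.1, "alien": 0.9}  # toy gene whose codon usage looks alien
    print(viterbi([N, N, A, A, A, N, N]))
    # ['native', 'native', 'alien', 'alien', 'alien', 'native', 'native']
    print(viterbi([N, N, A, N, N]))
    # ['native', 'native', 'native', 'native', 'native']
    print(viterbi([N, N, A, N, N], p_stay=0.5))
    # ['native', 'native', 'alien', 'native', 'native']
```

With p_stay = 0.9, three alien-looking genes in a row are reported as an island, while a single one between native neighbours is absorbed into the native run; with p_stay = 0.5 the labelling simply follows each gene's emissions. In this toy setting the transition probabilities therefore act as a sensitivity knob, loosely analogous to the sensitivity controller mentioned in the abstract.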
Affiliation(s)
- Stephan Waack
- Institut für Informatik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Oliver Keller
- Institut für Informatik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Roman Asper
- Institut für Informatik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Thomas Brodag
- Institut für Informatik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Carsten Damm
- Institut für Numerische und Angewandte Mathematik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Wolfgang Florian Fricke
- Göttingen Genomics Laboratory, Universität Göttingen, Grisebachstr. 8, 37077 Göttingen, Germany
- Katharina Surovcik
- Institut für Informatik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Peter Meinicke
- Institut für Mikrobiologie und Genetik, Universität Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
- Rainer Merkl
- Institut für Biophysik und Physikalische Biochemie, Universität Regensburg, Universitätsstr. 31, 93053 Regensburg, Germany