1
Alonso AM, Diambra L. SARS-CoV-2 Codon Usage Bias Downregulates Host Expressed Genes With Similar Codon Usage. Front Cell Dev Biol 2020; 8:831. [PMID: 32974353; PMCID: PMC7468442; DOI: 10.3389/fcell.2020.00831] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 05/01/2020] [Accepted: 08/04/2020] [Indexed: 12/31/2022] Open
Abstract
Severe acute respiratory syndrome has spread quickly throughout the world and was declared a pandemic by the World Health Organization (WHO). The pathogenic agent is a new coronavirus (SARS-CoV-2) that infects pulmonary cells with great effectiveness. In this study we focus on the codon composition for the viral protein synthesis and its relationship with the protein synthesis of the host. Our analysis reveals that SARS-CoV-2 preferred codons have poor representation of G or C nucleotides in the third position, a characteristic which could result in an imbalance in the tRNA pools of the infected cells with serious implications for host protein synthesis. By integrating this observation with proteomic data from infected cells, we observe a reduced translation rate of host proteins associated with highly expressed genes, and that these genes share the codon usage bias of the virus. The functional analysis of these genes suggests that this mechanism of epistasis can contribute to understanding how this virus evades the immune response and the etiology of some deleterious collateral effects of viral replication. In this manner, our finding contributes to the understanding of SARS-CoV-2 pathogenesis and could be useful for the design of a vaccine based on the live attenuated strategy.
Affiliation(s)
- Andres Mariano Alonso
  - InTech, Universidad Nacional de San Martin, Chascomús, Argentina
  - CONICET, Chascomús, Argentina
- Luis Diambra
  - CONICET, Chascomús, Argentina
  - CREG, Universidad Nacional de La Plata, La Plata, Argentina
2
Sawyer EB, Grabowska AD, Cortes T. Translational regulation in mycobacteria and its implications for pathogenicity. Nucleic Acids Res 2019; 46:6950-6961. [PMID: 29947784; PMCID: PMC6101614; DOI: 10.1093/nar/gky574] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/09/2018] [Accepted: 06/14/2018] [Indexed: 01/13/2023] Open
Abstract
Protein synthesis is a fundamental requirement of all cells for survival and replication. To date, vast numbers of genetic and biochemical studies have been performed to address the mechanisms of translation and its regulation in Escherichia coli, but only a limited number of studies have investigated these processes in other bacteria, particularly in slow growing bacteria like Mycobacterium tuberculosis, the causative agent of human tuberculosis. In this Review, we highlight important differences in the translational machinery of M. tuberculosis compared with E. coli, specifically the presence of two additional proteins and subunit stabilizing elements such as the B9 bridge. We also consider the role of leaderless translation in the ability of M. tuberculosis to establish latent infection and look at the experimental evidence that translational regulatory mechanisms operate in mycobacteria during stress adaptation, particularly focussing on differences in toxin-antitoxin systems between E. coli and M. tuberculosis and on the role of tuneable translational fidelity in conferring phenotypic antibiotic resistance. Finally, we consider the implications of these differences in the context of the biological adaptation of M. tuberculosis and discuss how these regulatory mechanisms could aid in the development of novel therapeutics for tuberculosis.
Affiliation(s)
- Elizabeth B Sawyer
  - Pathogen Molecular Biology Department, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
  - TB Centre, London School of Hygiene & Tropical Medicine, Keppel Street, London WC1E 7HT, UK
- Anna D Grabowska
  - Pathogen Molecular Biology Department, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
  - TB Centre, London School of Hygiene & Tropical Medicine, Keppel Street, London WC1E 7HT, UK
- Teresa Cortes
  - Pathogen Molecular Biology Department, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
  - TB Centre, London School of Hygiene & Tropical Medicine, Keppel Street, London WC1E 7HT, UK
3
Differential interaction strategies of hepatitis C virus genotypes during entry - An in silico investigation of envelope glycoprotein E2 - CD81 interaction. Infection, Genetics and Evolution 2019; 69:48-60. [PMID: 30639544; DOI: 10.1016/j.meegid.2019.01.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/03/2018] [Revised: 12/12/2018] [Accepted: 01/08/2019] [Indexed: 12/12/2022]
Abstract
Hepatitis C Virus is a blood-borne pathogen responsible for chronic hepatitis in more than 71 million people. Wide variations across strains and genotypes are one of the major hurdles in therapeutic development. While genotype 1 remains the most extensively studied and abundant strain, genotype 3 is more virulent and second most prevalent. This study aimed to compare differences in the glycoprotein E2 across HCV genotypes at nucleotide, protein and structural levels. Nucleotide sequences of E2 from 29 strains across genotypes 1a, 1b, 3a and 3b revealed a stark preference for C-richness which was attributed to a distinct bias for C-rich codons in genotype 1. Genotype 3 exhibited a similar preference to a lesser extent. Amino acid level comparison revealed that the majority of the changes lie in the C-terminal half of the proteins, leaving the N-terminal region conspicuously conserved apart from the two hypervariable regions. Amino acid changes across genotypes were mostly polar-nonpolar alterations. In silico models of E2 glycoproteins and docking analysis with the energy-minimized PDB-CD81 model revealed unique interacting residues in both E2 and CD81. While several CD81 binding residues were common for all four genotypes, the number and composition of interacting residues varied. The interacting residues of E2 were however unique for each genotype. E2 of genotype 3a and CD81 had the strongest interaction. In conclusion this is the first comprehensive study comparing E2 sequences across genotypes 1a, 1b, 3a and 3b, revealing stark genotype-specific differences which require more extensive investigation.
4
Comprehensive Analysis and Comparison on the Codon Usage Pattern of Whole Mycobacterium tuberculosis Coding Genome from Different Area. BioMed Research International 2018; 2018:3574976. [PMID: 29854746; PMCID: PMC5964552; DOI: 10.1155/2018/3574976] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 12/13/2017] [Revised: 02/25/2018] [Accepted: 03/28/2018] [Indexed: 11/18/2022]
Abstract
The unequal use of synonymous codons is common in Mycobacterium tuberculosis. Codon usage bias not only plays an important regulatory role at the level of gene expression, but also helps improve the accuracy and efficiency of translation. The codon usage pattern of the Mycobacterium tuberculosis genome is also important for interpreting the evolutionary characteristics of the species. To investigate this pattern, 12 Mycobacterium tuberculosis genomes from different areas were downloaded from GenBank. The correlations between G3, GC12, whole GC content, codon adaptation index, codon bias index, and so on of the Mycobacterium tuberculosis genomes were calculated. The ENC-plot, the relationship between A3/(A3 + T3) and G3/(G3 + C3), the GC12 versus GC3 plot, and the RSCU of overall/separated genomes all show that codon usage bias exists in all 12 Mycobacterium tuberculosis genomes. Lastly, the relationship between CBI and the equalization of ENC shows a strong negative correlation between them. The relationship between protein length and GC content (GC3 and GC12) shows that more obvious differences in GC content may occur in shorter proteins. These results show that the codon usage bias existing in the Mycobacterium tuberculosis genomes could be used for further study of their evolution.
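The positional GC measures named in this abstract (GC12 and GC3) can be illustrated with a minimal Python sketch. It assumes an in-frame coding sequence and counts every codon; published analyses often exclude Met, Trp and stop codons when computing GC3, which this sketch does not do. The function name `gc_by_codon_position` is hypothetical and is not taken from the cited study.

```python
def gc_by_codon_position(cds: str) -> dict[str, float]:
    """Return GC fractions at codon positions 1, 2 and 3, plus GC12.

    Assumption: `cds` is an in-frame coding sequence. Trailing bases that
    do not complete a codon are ignored. Illustrative sketch only.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA alphabets
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("sequence must contain at least one full codon")

    gc = [0, 0, 0]  # G/C counts at codon positions 1, 2, 3
    for i in range(n_codons * 3):
        if seq[i] in "GC":
            gc[i % 3] += 1

    gc1, gc2, gc3 = (count / n_codons for count in gc)
    # GC12 is the mean GC fraction of the first two codon positions.
    return {"GC1": gc1, "GC2": gc2, "GC3": gc3, "GC12": (gc1 + gc2) / 2}


if __name__ == "__main__":
    print(gc_by_codon_position("ATGGCC"))
```

Computing these values per gene and plotting GC12 against GC3 gives the kind of plot the abstract refers to, commonly called a neutrality plot.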
5
McCarthy C, Carrea A, Diambra L. Bicodon bias can determine the role of synonymous SNPs in human diseases. BMC Genomics 2017; 18:227. [PMID: 28288557; PMCID: PMC5347174; DOI: 10.1186/s12864-017-3609-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 08/09/2016] [Accepted: 03/04/2017] [Indexed: 01/09/2023] Open
Abstract
Background: For a long time synonymous single nucleotide polymorphisms were considered as silent mutations. However, nowadays it is well known that they can affect protein conformation and function, leading to altered disease susceptibilities, differential prognosis and/or drug responses, among other clinically relevant genetic traits. This occurs through different mechanisms: by disrupting the splicing signals of precursor mRNAs, affecting regulatory binding-sites of transcription factors and miRNAs, or by modifying the secondary structure of mRNAs. Results: In this paper we considered 22 human genetic diseases or traits, linked to 35 synonymous single nucleotide polymorphisms in 27 different genes. We performed a local sequence context analysis in terms of the ribosomal pause propensity affected by synonymous single nucleotide polymorphisms. We found that synonymous mutations related to the above mentioned mechanisms presented small pause propensity changes, whereas synonymous mutations that were not related to those mechanisms presented large pause propensity changes. On the other hand, we did not observe large variations in the codon usage of codons associated with these mutations. Furthermore, we showed that the changes in the pause propensity associated with benign sSNPs are significantly lower than the pause propensity changes related to sSNPs associated with diseases. Conclusions: These results suggest that the genetic diseases or traits related to synonymous mutations with large pause propensity changes could be the consequence of another mechanism underlying non-silent synonymous mutations. Namely, alternative protein configuration related, in turn, to alterations in the ribosome-mediated translational attenuation program encoded by pairs of consecutive codons, not codons. These findings shed light on the latter mechanism based on the perturbation of the co-translational folding process.
Affiliation(s)
- Christina McCarthy
  - Centro Regional de Estudios Genómicos, Universidad Nacional de La Plata, Boulevard 120, La Plata, Argentina
  - CONICET, Buenos Aires, Argentina
  - Departamento de Informática y Tecnología, Escuela de Ciencias Agrarias, Naturales y Ambientales, Universidad Nacional del Noroeste de la Provincia de Buenos Aires, Pergamino, Argentina
- Alejandra Carrea
  - Centro Regional de Estudios Genómicos, Universidad Nacional de La Plata, Boulevard 120, La Plata, Argentina
  - CONICET, Buenos Aires, Argentina
- Luis Diambra
  - Centro Regional de Estudios Genómicos, Universidad Nacional de La Plata, Boulevard 120, La Plata, Argentina
  - CONICET, Buenos Aires, Argentina
6
Diambra LA. Differential bicodon usage in lowly and highly abundant proteins. PeerJ 2017; 5:e3081. [PMID: 28289571; PMCID: PMC5346287; DOI: 10.7717/peerj.3081] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 11/01/2016] [Accepted: 02/10/2017] [Indexed: 01/23/2023] Open
Abstract
Degeneracy in the genetic code implies that different codons can encode the same amino acid. Usage preference of synonymous codons has been observed in all domains of life. There is much evidence suggesting that this bias has a major role on protein elongation rate, contributing to differential expression and to co-translational folding. In addition to codon usage bias, other preference variations have been observed such as codon pairs. In this paper, I report that codon pairs have significant different frequency usage for coding either lowly or highly abundant proteins. These usage preferences cannot be explained by the frequency usage of the single codons. The statistical analysis of coding sequences of nine organisms reveals that in many cases bicodon preferences are shared between related organisms. Furthermore, it is observed that misfolding in the drug-transport protein, encoded by MDR1 gene, is better explained by a big change in the pause propensity due to the synonymous bicodon variant, rather than by a relatively small change in codon usage. These findings suggest that codon pair usage can be a more powerful framework to understand translation elongation rate, protein folding efficiency, and to improve protocols to optimize heterologous gene expression.
Affiliation(s)
- Luis A. Diambra
  - Centro Regional de Estudios Genómicos, Universidad Nacional de La Plata, CONICET, La Plata, Argentina
7
Meyer MM. The role of mRNA structure in bacterial translational regulation. Wiley Interdisciplinary Reviews: RNA 2016; 8. [PMID: 27301829; DOI: 10.1002/wrna.1370] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 02/01/2016] [Revised: 05/12/2016] [Accepted: 05/16/2016] [Indexed: 01/08/2023]
Abstract
The characteristics of bacterial messenger RNAs (mRNAs) that influence translation efficiency provide many convenient handles for regulation of gene expression, especially when coupled with the processes of transcription termination and mRNA degradation. An mRNA's structure, especially near the site of initiation, has profound consequences for how readily it is translated. This property allows bacterial gene expression to be altered by changes to mRNA structure induced by temperature, or interactions with a wide variety of cellular components including small molecules, other RNAs (such as sRNAs and tRNAs), and RNA-binding proteins. This review discusses the links between mRNA structure and translation efficiency, and how mRNA structure is manipulated by conditions and signals within the cell to regulate gene expression. The range of RNA regulators discussed follows a continuum from very complex tertiary structures such as riboswitch aptamers and ribosomal protein-binding sites to thermosensors and mRNA:sRNA interactions that involve only base-pairing interactions. Furthermore, the high degrees of diversity observed for both mRNA structures and the mechanisms by which inhibition of translation occur have significant consequences for understanding the evolution of bacterial translational regulation. WIREs RNA 2017, 8:e1370. doi: 10.1002/wrna.1370
8
Chakraborti P, Banerjee R, Roy A, Mandal S, Mukhopadhyay S. Molecular characterization influencing metal resistance in the Cupriavidus/Ralstonia genomes. J Biomol Struct Dyn 2015; 33:2330-46. [PMID: 26156561; DOI: 10.1080/07391102.2015.1069214] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
Abstract
Our environment is stressed with a load of heavy and toxic metals. Microbes, abundant in our environment, are found to adapt well to this metal-stressed condition. A comparative study among five Cupriavidus/Ralstonia genomes can offer a better perception of their evolutionary mechanisms to adapt to these conditions. We have studied codon usage among 1051 genes common to all these organisms and identified 15 optimal codons frequently used in highly expressed genes present within these 1051 genes. We found that the core genes of Cupriavidus metallidurans CH34 have a different optimal codon choice for arginine, glycine and alanine in comparison with the other four bacteria. We also found that the synonymous codon usage bias within these 1051 core genes is highly correlated with their gene expression. This supports the view that translational selection drives synonymous codon usage in the core genes of these genomes. Synonymous codon usage is highly conserved in the core genes of these five genomes. The only exception among them is C. metallidurans CH34. This genome-wide shift in synonymous codon choice in C. metallidurans CH34 may have taken place due to the insertion of new genes in its genome, enabling it to survive in heavy-metal-containing environments, and the co-evolution of the other genes in its genome to achieve a balance in gene expression. Structural studies indicated the presence of a longer N-terminal region containing a copper-binding domain in the cupC proteins of C. metallidurans CH34 that helps it to attain higher binding efficacy with copper in comparison with its orthologs.
Affiliation(s)
- Pratim Chakraborti
  - Apt Software Avenues Pvt. Ltd, Unit G 301, Block DC, City Centre, Sector I, Salt Lake, Kolkata 700064, India
- Rachana Banerjee
  - Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, 92, A.P.C. Road, Kolkata 700009, India
- Ayan Roy
  - NBU Bioinformatics Facility, Department of Botany, University of North Bengal, Raja Rammohanpur, Siliguri 734013, India
- Sunanda Mandal
  - Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, 92, A.P.C. Road, Kolkata 700009, India
- Subhasish Mukhopadhyay
  - Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, 92, A.P.C. Road, Kolkata 700009, India
9
Ma YP, Liu ZX, Hao L, Ma JY, Liang ZL, Li YG, Ke H. Analysing codon usage bias of cyprinid herpesvirus 3 and adaptation of this virus to the hosts. Journal of Fish Diseases 2015; 38:665-673. [PMID: 25491502; DOI: 10.1111/jfd.12316] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/17/2014] [Revised: 08/31/2014] [Accepted: 09/04/2014] [Indexed: 06/04/2023]
Abstract
The codon usage patterns of open reading frames (ORFs) in cyprinid herpesvirus 3 (CyHV-3) have been investigated in this study. The high correlation between GC12 % and GC3 % suggests that mutational pressure rather than natural selection is the main factor that determines the codon usage and base component in the CyHV-3, while mutational pressure effect results from the high correlation between GC3 % and the first principal axis of principal component analysis (Axis 1) on the relative synonymous codon usage (RSCU) value of the viral functional genes. However, the interaction between the absolute codon usage bias and GC3 % suggests that selection pressures other than mutational pressure also take part in the formation of codon usage. It is noted that the similarity degree of codon usage between the CyHV-3 and goldfish, Carassius auratus (L.), is higher than that between the virus and common carp, Cyprinus carpio L., suggesting that the goldfish plays a more important role than the common carp in codon usage pattern of the CyHV-3. The study of codon usage in CyHV-3 can provide some evidence about the molecular evolution of the virus. It can also enrich our understanding about the relationship between the CyHV-3 and its hosts by analysing their codon usage patterns.
Affiliation(s)
- Y P Ma
  - Guangdong Public Laboratory of Veterinary Public Health, Institute of Animal Health, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Z X Liu
  - Guangdong Public Laboratory of Veterinary Public Health, Institute of Animal Health, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- L Hao
  - Guangdong Public Laboratory of Veterinary Public Health, Institute of Animal Health, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- J Y Ma
  - Guangdong Public Laboratory of Veterinary Public Health, Institute of Animal Health, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Z L Liang
  - Guangdong Public Laboratory of Veterinary Public Health, Institute of Animal Health, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Y G Li
  - South China Agricultural University, Guangzhou, China
- H Ke
  - Guangdong Public Laboratory of Veterinary Public Health, Institute of Animal Health, Guangdong Academy of Agricultural Sciences, Guangzhou, China
10
Zhou HQ, Ning LW, Zhang HX, Guo FB. Analysis of the relationship between genomic GC Content and patterns of base usage, codon usage and amino acid usage in prokaryotes: similar GC content adopts similar compositional frequencies regardless of the phylogenetic lineages. PLoS One 2014; 9:e107319. [PMID: 25255224; PMCID: PMC4177787; DOI: 10.1371/journal.pone.0107319] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 05/09/2014] [Accepted: 08/08/2014] [Indexed: 11/19/2022] Open
Abstract
The GC contents of 2670 prokaryotic genomes that belong to diverse phylogenetic lineages were analyzed in this paper. These genomes had GC contents that ranged from 13.5% to 74.9%. We analyzed the distance of base frequencies at the three codon positions, codon frequencies, and amino acid compositions across genomes with respect to the differences in the GC content of these prokaryotic species. We found that although the phylogenetic lineages were remote among some species, a similar genomic GC content forced them to adopt similar base usage patterns at the three codon positions, codon usage patterns, and amino acid usage patterns. Our work demonstrates that in prokaryotic genomes: a) base usage, codon usage, and amino acid usage change with GC content with a linear correlation; b) the distance of each usage has a linear correlation with the GC content difference; and c) GC content is more essential than phylogenetic lineage in determining base usage, codon usage, and amino acid usage. This work is exceptional in that we adopted intuitively graphic methods for all analyses, and we used these analyses to examine as many as 2670 prokaryotes. We hope that this work is helpful for understanding common features in the organization of microbial genomes.
Affiliation(s)
- Hui-Qi Zhou
  - Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
- Lu-Wen Ning
  - Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
- Hui-Xiong Zhang
  - Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
- Feng-Biao Guo
  - Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
11
Codon usage bias of the phosphoprotein gene of spring viraemia of carp virus and high codon adaptation to the host. Arch Virol 2014; 159:1841-7. [PMID: 24519460; DOI: 10.1007/s00705-014-2000-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/22/2013] [Accepted: 10/05/2013] [Indexed: 10/25/2022]
Abstract
In this study, we calculated the relative synonymous codon usage (RSCU) value and the effective number of codons (ENC) value to carry out principal component analysis (PCA) and correlation analysis of the codon usage pattern of the phosphoprotein gene (P gene) of spring viraemia of carp virus (SVCV). The synonymous codon usage pattern in P genes is geography-specific, based on PCA analysis. The high correlation between (G + C)1,2 % and (G + C)3 % suggests that mutational pressure rather than natural selection is the main factor that determines the codon usage and base components in P genes. At least 40 out of 59 synonymous codons are similarly selected in all functional genes within five complete SVCV genomes, and the hosts based on the RSCU data. These results not only provide insight into variations in the codon usage pattern of SVCV but also may help in understanding the processes governing the evolution of SVCV.
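As an illustration of the RSCU statistic used in this abstract, the sketch below computes it for a single coding sequence as the observed count of a codon divided by the mean count of all codons synonymous with it. It assumes the standard genetic code and an in-frame sequence; the helper name `rscu` is hypothetical and this is not the authors' pipeline. Excluding Met, Trp and the stop codons leaves the 59 synonymous codons mentioned above.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard code applies; other translation tables would differ).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for one in-frame coding sequence.

    RSCU = observed codon count / mean count over its synonymous family.
    Single-codon amino acids (Met, Trp) and stop codons are skipped, as
    are families that never occur in the sequence (RSCU is undefined).
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore an incomplete trailing codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if len(codons) < 2 or total == 0:
            continue
        for c in codons:
            result[c] = counts[c] * len(codons) / total
    return result


if __name__ == "__main__":
    for codon, value in sorted(rscu("GCTGCTGCCAAAAAG").items()):
        print(codon, round(value, 3))
```

A value of 1 means a codon is used as often as expected under uniform synonymous usage; values above 1 indicate over-representation within its family.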
12
Belalov IS, Lukashev AN. Causes and implications of codon usage bias in RNA viruses. PLoS One 2013; 8:e56642. [PMID: 23451064; PMCID: PMC3581513; DOI: 10.1371/journal.pone.0056642] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.7] [Received: 07/05/2012] [Accepted: 01/15/2013] [Indexed: 12/03/2022] Open
Abstract
Choice of synonymous codons depends on nucleotide/dinucleotide composition of the genome (termed mutational pressure) and relative abundance of tRNAs in a cell (translational pressure). Mutational pressure is commonly simplified to genomic GC content; however mononucleotide and dinucleotide frequencies in different genomes or mRNAs may vary significantly, especially in RNA viruses. A series of in silico shuffling algorithms were developed to account for these features and analyze the relative impact of mutational pressure components on codon usage bias in RNA viruses. Total GC content was a poor descriptor of viral genome composition and causes of codon usage bias. Genomic nucleotide content was the single most important factor of synonymous codon usage. Moreover, the choice between compatible amino acids (e.g., leucine and isoleucine) was strongly affected by genomic nucleotide composition. Dinucleotide composition at codon positions 2-3 had additional effect on codon usage. Together with mononucleotide composition bias, it could explain almost the entire codon usage bias in RNA viruses. On the other hand, strong dinucleotide content bias at codon position 3-1 found in some viruses had very little effect on codon usage. A hypothetical innate immunity sensor for CpG in RNA could partially explain the codon usage bias, but due to dependence of virus translation upon biased host translation machinery, experimental studies are required to further explore the source of dinucleotide bias in RNA viruses.
Affiliation(s)
- Ilya S. Belalov
  - Chumakov Institute of Poliomyelitis and Viral Encephalitides, Russian Academy of Medical Sciences, Moscow, Russia
- Alexander N. Lukashev
  - Chumakov Institute of Poliomyelitis and Viral Encephalitides, Russian Academy of Medical Sciences, Moscow, Russia
  - Institute for Virology, University of Bonn Medical Center, Bonn, Germany
13
A comparative analysis on the synonymous codon usage pattern in viral functional genes and their translational initiation region of ASFV. Virus Genes 2012; 46:271-9. [PMID: 23161403; DOI: 10.1007/s11262-012-0847-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/03/2012] [Accepted: 11/01/2012] [Indexed: 01/21/2023]
Abstract
The synonymous codon usage pattern of African swine fever virus (ASFV), the similarity degree of the synonymous codon usage between this virus and some organisms and the synonymous codon usage bias for the translation initiation region of viral functional genes in the whole genome of ASFV have been investigated by some simple statistical analyses. Although both GC12% (the GC content at the first and second codon positions) and GC3% (the GC content at the third codon position) of viral functional genes have a large fluctuation, the significant correlations between GC12% and GC3% and between GC3% and the first principal axis of principal component analysis on the relative synonymous codon usage of the viral functional genes imply that mutation pressure of ASFV plays an important role in the synonymous codon usage pattern. Turning to the synonymous codon usage of this virus, the codons ending in U/A predominate in the synonymous codon family for the same amino acid, and a weak codon usage bias in both leading and lagging strands suggests that strand compositional asymmetry does not take part in the formation of codon usage in ASFV. The interaction between the absolute codon usage bias and GC3% suggests that selection pressures other than mutation pressure also take part in the formation of codon usage. It is noted that the similarity degree of codon usage between ASFV and soft tick is higher than that between the virus and the pig, suggesting that the soft tick plays a more important role than the pig in the codon usage pattern of ASFV. The translational initiation region of the viral functional genes generally has a strong tendency to select some synonymous codons with low GC content, suggesting that the synonymous codon usage bias caused by translation selection from the host takes part in modulating the translation initiation efficiency of ASFV functional genes.
14
Sanjukta R, Farooqi MS, Sharma N, Rai A, Mishra DC, Singh DP. Trends in the codon usage patterns of Chromohalobacter salexigens genes. Bioinformation 2012; 8:1087-95. [PMID: 23251043; PMCID: PMC3523223; DOI: 10.6026/97320630081087] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/22/2012] [Accepted: 10/06/2012] [Indexed: 11/23/2022] Open
Abstract
Chromohalobacter salexigens, a Gammaproteobacterium belonging to the family Halomonadaceae, shows a broad salinity range for growth. In order to reveal the factors influencing architecture of protein coding genes in C. salexigens, pattern of synonymous codon usage bias has been investigated. Overall codon usage analysis of the microorganism revealed that C and G ending codons are predominantly used in all the genes which are indicative of mutational bias. Multivariate statistical analysis showed that the genes are separated along the first major explanatory axis according to their expression levels and their genomic GC content at the synonymous third positions of the codons. Both NC plot and correspondence analysis on Relative Synonymous Codon Usage (RSCU) indicates that the variation in codon usage among the genes may be due to mutational bias at the DNA level and natural selection acting at the level of mRNA translation. Gene length and the hydrophobicity of the encoded protein also influence the codon usage variation of genes to some extent. A comparison of the relative synonymous codon usage between 10% each of highly and lowly expressed genes determines 23 optimal codons, which are statistically over represented in the former group of genes and may provide useful information for salt-stressed gene prediction and gene-transformation. Furthermore, genes for regulatory functions; mobile and extrachromosomal element functions; and cell envelope are observed to be highly expressed. The study could provide insight into the gene expression response of halophilic bacteria and facilitate establishment of effective strategies to develop salt-tolerant crops of agronomic value.
Affiliation(s)
- Rajkumari Sanjukta
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Pusa, New Delhi – 110 012
- Mohammad Samir Farooqi
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Pusa, New Delhi – 110 012
- Naveen Sharma
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Pusa, New Delhi – 110 012
- Anil Rai
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Pusa, New Delhi – 110 012
- Dwijesh Chandra Mishra
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Pusa, New Delhi – 110 012
- Dhananjaya P Singh
- National Bureau of Agriculturally Important Microorganisms, Mau Nath Bhanjan, UP – 275 101
|
15
|
Analysis of base and codon usage by rubella virus. Arch Virol 2012; 157:889-99. [DOI: 10.1007/s00705-012-1243-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 11/17/2011] [Accepted: 12/24/2011] [Indexed: 11/25/2022]
|
16
|
Pan A, Chanda I, Chakrabarti J. Analysis of the genome and proteome composition of Bdellovibrio bacteriovorus: indication for recent prey-derived horizontal gene transfer. Genomics 2011; 98:213-22. [PMID: 21722725 DOI: 10.1016/j.ygeno.2011.06.007] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 10/14/2010] [Revised: 05/18/2011] [Accepted: 06/14/2011] [Indexed: 10/18/2022]
Abstract
The genome/proteome composition of Bdellovibrio bacteriovorus, the predatory microorganism that preys on other Gram-negative bacteria, has been analyzed. The study elucidates that translational selection plays a major role in genome compositional variation with higher intensity compared to other deltaproteobacteria. Other sources of variations having relatively minor contributions are local GC-bias, horizontal gene transfer and strand-specific mutational bias. The study identifies a group of AT-rich genes with distinct codon composition that is presumably acquired by Bdellovibrio recently from Gram-negative prey-bacteria other than deltaproteobacteria. The proteome composition of this species is influenced by various physico-chemical factors, viz, alcoholicity, residue-charge, aromaticity and hydropathy. Cell-wall-surface-anchor-family (CSAPs) and transporter proteins with distinct amino acid composition and specific secondary-structure also contribute notably to proteome compositional variation. CSAPs, which are low molecular-weight, outer-membrane proteins with highly disordered secondary-structure, have preference toward polar-uncharged residues and cysteine that presumably help in prey-predator interaction by providing particular bonds of attachment.
Affiliation(s)
- Archana Pan
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry-605014, India.
|
17
|
Tang SL, Chang BC, Halgamuge SK. Gene functionality's influence on the second codon: A large-scale survey of second codon composition in three domains. Genomics 2010; 96:92-101. [DOI: 10.1016/j.ygeno.2010.04.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/09/2009] [Revised: 02/03/2010] [Accepted: 04/07/2010] [Indexed: 10/19/2022]
|
18
|
Fox JM, Erill I. Relative codon adaptation: a generic codon bias index for prediction of gene expression. DNA Res 2010; 17:185-96. [PMID: 20453079 PMCID: PMC2885275 DOI: 10.1093/dnares/dsq012] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Indexed: 11/13/2022] Open
Abstract
The development of codon bias indices (CBIs) remains an active field of research due to their myriad applications in computational biology. Recently, the relative codon usage bias (RCBS) was introduced as a novel CBI able to estimate codon bias without using a reference set. The results of this new index when applied to Escherichia coli and Saccharomyces cerevisiae led the authors of the original publications to conclude that natural selection favours higher expression and enhanced codon usage optimization in short genes. Here, we show that this conclusion was flawed and based on the systematic oversight of an intrinsic bias for short sequences in the RCBS index and of biases in the small data sets used for validation in E. coli. Furthermore, we reveal how the RCBS can be corrected to produce useful results and how its underlying principle, which we here term relative codon adaptation (RCA), can be made into a powerful reference-set-based index that directly takes into account the genomic base composition. Finally, we show that RCA outperforms the codon adaptation index (CAI) as a predictor of gene expression when operating on the CAI reference set and that this improvement is significantly larger when analysing genomes with high mutational bias.
Affiliation(s)
- Jesse M Fox
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Road, Baltimore, MD 21228, USA
|
19
|
RoyChoudhury S, Mukherjee D. A detailed comparative analysis on the overall codon usage pattern in herpesviruses. Virus Res 2010; 148:31-43. [DOI: 10.1016/j.virusres.2009.11.018] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 08/24/2009] [Revised: 11/27/2009] [Accepted: 11/30/2009] [Indexed: 11/30/2022]
|
20
|
Synonymous codon usage analysis of thirty two mycobacteriophage genomes. Adv Bioinformatics 2010:316936. [PMID: 20150956 PMCID: PMC2817497 DOI: 10.1155/2009/316936] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 07/30/2009] [Accepted: 10/27/2009] [Indexed: 11/17/2022] Open
Abstract
Synonymous codon usage of protein-coding genes of thirty two completely sequenced mycobacteriophage genomes was studied using multivariate statistical analysis. One of the major factors influencing codon usage is identified to be compositional bias. Codons ending with either C or G are preferred in highly expressed genes, among which C-ending codons are highly preferred over G-ending codons. A strong negative correlation between effective number of codons (Nc) and GC3s content was also observed, showing that the codon usage was affected by gene nucleotide composition. Translational selection is also identified to play a role in shaping the codon usage, operative at the level of translational accuracy. A high level of heterogeneity is seen among and between the genomes. Length of genes is also identified to influence the codon usage in 11 out of 32 phage genomes. Mycobacteriophage Cooper is identified to be the highly biased genome with better translation efficiency comparing well with the host specific tRNA genes.
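The relationship between Nc and GC3s mentioned in this abstract can be made concrete with a short Python sketch. It is only an illustration, under two assumptions that the abstract does not spell out. First, GC3s is taken as the G+C fraction at the third position of synonymous codons, excluding Met, Trp and stop codons. Second, the null expectation for Nc is taken as Wright's curve Nc = 2 + s + 29 / (s^2 + (1 - s)^2), with s = GC3s. The function names are hypothetical.

```python
# Codons with no synonymous alternative (Met, Trp) plus the three stop codons
# of the standard genetic code are excluded from GC3s.
NON_SYNONYMOUS_OR_STOP = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds):
    """G+C fraction at third positions of synonymous codons (hypothetical helper)."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    usable = [
        c for c in codons
        if c not in NON_SYNONYMOUS_OR_STOP and set(c) <= set("ACGT")
    ]
    if not usable:
        return float("nan")
    return sum(c[2] in "GC" for c in usable) / len(usable)


def expected_nc(s):
    """Expected Nc when codon choice depends only on GC3s (Wright's curve)."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


# Toy sequence: ATG GCC GCT TGG AAA TAA -> usable codons are GCC, GCT, AAA.
s = gc3s("ATGGCCGCTTGGAAATAA")
print(round(s, 3), round(expected_nc(s), 2))
```

Genes whose observed Nc falls well below `expected_nc(gc3s(...))` are the ones usually read as carrying codon bias beyond what base composition alone explains.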
|
21
|
Chen LL, Ma BG, Gao N. Reannotation of hypothetical ORFs in plant pathogen Erwinia carotovora subsp. atroseptica SCRI1043. FEBS J 2007; 275:198-206. [PMID: 18067578 DOI: 10.1111/j.1742-4658.2007.06190.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Abstract
Over-annotation of hypothetical ORFs is a common phenomenon in bacterial genomes, which necessitates confirming the coding reliability of hypothetical ORFs and then predicting their functions. The important plant pathogen Erwinia carotovora subsp. atroseptica SCRI1043 (Eca1043) is a typical case because more than a quarter of its annotated ORFs are hypothetical. Our analysis focuses on annotation of Eca1043 hypothetical ORFs, and comprises two efforts: (a) based on the Z-curve method, 49 originally annotated hypothetical ORFs are recognized as noncoding, this is further supported by principal components analysis and other evidence; and (b) using sequence-alignment tools and some functional resources, more than a half of the hypothetical genes were assigned functions. The potential functions of 427 hypothetical genes are summarized according to the cluster of orthologous groups functional category. Moreover, 114 and 86 hypothetical genes are recognized as putative 'membrane proteins' and 'exported proteins', respectively. Reannotation of Eca1043 hypothetical ORFs will benefit research into the lifestyle, metabolism and pathogenicity of the important plant pathogen. Also, our study proffers a model for the reannotation of hypothetical ORFs in microbial genomes.
Affiliation(s)
- Ling-Ling Chen
- Shandong Provincial Research Center for Bioinformatic Engineering and Technique, Shandong University of Technology, Zibo, China.
|
22
|
Moura GR, Lousado JP, Pinheiro M, Carreto L, Silva RM, Oliveira JL, Santos MAS. Codon-triplet context unveils unique features of the Candida albicans protein coding genome. BMC Genomics 2007; 8:444. [PMID: 18047667 PMCID: PMC2244636 DOI: 10.1186/1471-2164-8-444] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 06/15/2007] [Accepted: 11/29/2007] [Indexed: 11/29/2022] Open
Abstract
Background: The evolutionary forces that determine the arrangement of synonymous codons within open reading frames and fine tune mRNA translation efficiency are not yet understood. In order to tackle this question we have carried out a large scale study of codon-triplet contexts in 11 fungal species to unravel associations or relationships between codons present at the ribosome A-, P- and E-sites during each decoding cycle. Results: Our analysis unveiled high bias within the context of codon-triplets, in particular strong preference for triplets of identical codons. We have also identified a surprisingly large number of codon-triplet combinations that vanished from fungal ORFeomes. Candida albicans exacerbated these features, showed an unbalanced tRNA population for decoding its pool of codons and used near-cognate decoding for a large set of codons, suggesting that unique evolutionary forces shaped the evolution of its ORFeome. Conclusion: We have developed bioinformatics tools for large-scale analysis of codon-triplet contexts. These algorithms identified codon-triplets context biases, allowed for large scale comparative codon-triplet analysis, and identified rules governing codon-triplet context. They could also detect alterations to the standard genetic code.
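The idea of a codon-triplet context (three adjacent codons, as occupied by the ribosome E-, P- and A-sites in one decoding cycle) can be sketched as a sliding-window count. The Python below is a hypothetical illustration only and is not the authors' software. The function names are invented, and no statistical test for bias is attempted.

```python
from collections import Counter


def codon_triplet_counts(orf):
    """Count every run of three consecutive in-frame codons in one ORF."""
    seq = orf.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # zip over three offset views yields each window of adjacent codons.
    return Counter(zip(codons, codons[1:], codons[2:]))


def identical_triplet_fraction(counts):
    """Fraction of observed windows made of three identical codons."""
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    same = sum(n for (a, b, c), n in counts.items() if a == b == c)
    return same / total


counts = codon_triplet_counts("AAAAAAAAAGGG")  # codons: AAA AAA AAA GGG
print(counts[("AAA", "AAA", "AAA")], identical_triplet_fraction(counts))
```

A real analysis would pool counts over a whole ORFeome and compare them with an expectation derived from single-codon frequencies. The sketch covers only the counting step.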
Affiliation(s)
- Gabriela R Moura
- Department of Biology and CESAM, University of Aveiro, 3810-193 Aveiro, Portugal.
|
23
|
Das S, Paul S, Dutta C. Evolutionary constraints on codon and amino acid usage in two strains of human pathogenic actinobacteria Tropheryma whipplei. J Mol Evol 2006; 62:645-58. [PMID: 16557339 DOI: 10.1007/s00239-005-0164-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 07/04/2005] [Accepted: 12/20/2005] [Indexed: 12/13/2022]
Abstract
The factors governing codon and amino acid usages in the predicted protein-coding sequences of Tropheryma whipplei TW08/27 and Twist genomes have been analyzed. Multivariate analysis identifies the replicational-transcriptional selection coupled with DNA strand-specific asymmetric mutational bias as a major driving force behind the significant interstrand variations in synonymous codon usage patterns in T. whipplei genes, while a residual intrastrand synonymous codon bias is imparted by a selection force operating at the level of translation. The strand-specific mutational pressure has little influence on the amino acid usage, for which the mean hydropathy level and aromaticity are the major sources of variation, both having nearly equal impact. In spite of the intracellular lifestyle, the amino acid usage in highly expressed gene products of T. whipplei follows the cost-minimization hypothesis. The products of the highly expressed genes of these relatively A + T-rich actinobacteria prefer to use the residues encoded by GC-rich codons, probably due to greater conservation of a GC-rich ancestral state in the highly expressed genes, as suggested by the lower values of the rate of nonsynonymous divergences between orthologous sequences of highly expressed genes from the two strains of T. whipplei. Both the genomes under study are characterized by the presence of two distinct groups of membrane-associated genes, products of which exhibit significant differences in primary and potential secondary structures as well as in the propensity of protein disorder.
Affiliation(s)
- Sabyasachi Das
- Bioinformatics Centre, Indian Institute of Chemical Biology, 4 Raja S. C. Mullick Road, Kolkata 700 032, India
|
24
|
Cayabyab MJ, Hovav AH, Hsu T, Krivulka GR, Lifton MA, Gorgone DA, Fennelly GJ, Haynes BF, Jacobs WR, Letvin NL. Generation of CD8+ T-cell responses by a recombinant nonpathogenic Mycobacterium smegmatis vaccine vector expressing human immunodeficiency virus type 1 Env. J Virol 2006; 80:1645-52. [PMID: 16439521 PMCID: PMC1367151 DOI: 10.1128/jvi.80.4.1645-1652.2006] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Indexed: 01/25/2023] Open
Abstract
Because the vaccine vectors currently being evaluated in human populations all have significant limitations in their immunogenicity, novel vaccine strategies are needed for the elicitation of cell-mediated immunity. The nonpathogenic, rapidly growing mycobacterium Mycobacterium smegmatis was engineered as a vector expressing full-length human immunodeficiency virus type 1 (HIV-1) HXBc2 envelope protein. Immunization of mice with recombinant M. smegmatis led to the expansion of major histocompatibility complex class I-restricted HIV-1 epitope-specific CD8(+) T cells that were cytolytic and secreted gamma interferon. Effector and memory T lymphocytes were elicited, and repeated immunization generated a stable central memory pool of virus-specific cells. Importantly, preexisting immunity to Mycobacterium bovis BCG had only a marginal effect on the immunogenicity of recombinant M. smegmatis. This mycobacterium may therefore be a useful vaccine vector.
Affiliation(s)
- Mark J Cayabyab
- Department of Medicine, Division of Viral Pathogenesis, Beth Israel Deaconess Medical Center, Harvard Medical School, 330 Brookline Ave., Boston, MA 02130, USA
|
25
|
Wu G, Nie L, Zhang W. Predicted highly expressed genes in Nocardia farcinica and the implication for its primary metabolism and nocardial virulence. Antonie van Leeuwenhoek 2006; 89:135-46. [PMID: 16496092 DOI: 10.1007/s10482-005-9016-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 08/22/2005] [Accepted: 09/26/2005] [Indexed: 01/30/2023]
Abstract
Nocardia farcinica is a Gram-positive, filamentous bacterium, and is considered an opportunistic pathogen. In this study, the highly expressed genes in N. farcinica were predicted using the codon adaptation index (CAI) as a numerical estimator of gene expressivity. Using ribosomal protein (RP) genes as references, the top approximately 10% of the genes were predicted to be the predicted highly expressed (PHX) genes in N. farcinica using a CAI cutoff of greater than 0.73. Consistent with earlier analysis of Streptomyces genomes, most of the PHX genes in N. farcinica were involved in various 'house-keeping' functions important for cell growth. However, 15 genes putatively involved in nocardial virulence were predicted as PHX genes in N. farcinica, which included genes encoding four Mce proteins, cyclopropane fatty acid synthase, which is involved in the modification of the cell wall and may be important for nocardial virulence, polyketide synthase PKS13 for mycolic acid synthesis and a non-ribosomal peptide synthetase involved in biosynthesis of a mycobactin-related siderophore. In addition, multiple genes involved in defense against reactive oxygen species (ROS) produced by the phagocyte were predicted with high expressivity, which included alkylhydroperoxide reductase (ahpC), catalase (katG), superoxide dismutase (sodF), thioredoxin, thioredoxin reductase, glutathione peroxidase, and peptide methionine sulfoxide reductase, suggesting that combating ROS is essential for survival of N. farcinica in host cells. The study also showed that the distribution of PHX genes in the N. farcinica circular chromosome was uneven, with more PHX genes located in the regions close to the replication initiation site. The results provided the first estimates of global gene expression patterns in N. farcinica, which will be useful in guiding experimental design for further investigations.
Affiliation(s)
- Gang Wu
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
|
26
|
Pascal G, Médigue C, Danchin A. Persistent biases in the amino acid composition of prokaryotic proteins. Bioessays 2006; 28:726-38. [PMID: 16850406 DOI: 10.1002/bies.20431] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Abstract
Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the proteins' amino acid distribution. Integral Inner Membrane Proteins always form an individual cluster, which can then be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. Another bias is also ubiquitous: the amino acid composition is driven by the G + C content of the first codon position. An unexpected bias is driven, in many proteomes, by the AAN box of the genetic code, suggesting some functional biochemical relationship between asparagine and lysine. Less-significant biases are driven by the rare amino acids, cysteine and tryptophan. Some allow identification of species-specific functions or localisation such as surface or exported proteins. Errors in genome annotations are also revealed by correspondence analysis, making it useful for quality control and correction.
Affiliation(s)
- Géraldine Pascal
- Genoscope/CNRS UMR 8030, Atelier de Génomique Comparative, Evry, France
|
27
|
Das S, Paul S, Dutta C. Synonymous codon usage in adenoviruses: influence of mutation, selection and protein hydropathy. Virus Res 2005; 117:227-36. [PMID: 16307819 DOI: 10.1016/j.virusres.2005.10.007] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.2] [Received: 07/13/2005] [Revised: 10/19/2005] [Accepted: 10/19/2005] [Indexed: 11/23/2022]
Abstract
Trends in synonymous codon usage in adenoviruses have been examined through multivariate statistical analysis of the annotated protein-coding regions of 22 adenoviral species for which complete genome sequences are available. One of the major determinants of such trends is the G+C content at third codon positions of the genes, the average value of which varied from one viral genome to another depending on the overall mutational bias of the species. G3S and C3S interacted synergistically along the first principal axis of correspondence analysis on the Relative Synonymous Codon Usage of adenoviral genes, but antagonistically along the second principal axis. The intra-genomic variation in codon usage pattern in adenoviruses is generally influenced by asymmetrical mutational bias in the two DNA strands. Other major determinants of the trends are natural selection, putatively operative at the level of translation, and, quite interestingly, hydropathy of the encoded proteins. The trends in codon usage, though characterized by distinct virus-specific mutational bias, do not exhibit any sign of host-specificity. Significant variations are observed in synonymous codon choice in structural and nonstructural genes of adenoviruses.
Affiliation(s)
- Sabyasachi Das
- Bioinformatics Centre, Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Kolkata 700032, India
|
28
|
Wu G, Culley DE, Zhang W. Predicted highly expressed genes in the genomes of Streptomyces coelicolor and Streptomyces avermitilis and the implications for their metabolism. MICROBIOLOGY-SGM 2005; 151:2175-2187. [PMID: 16000708 DOI: 10.1099/mic.0.27833-0] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.3] [Indexed: 11/18/2022]
Abstract
Highly expressed genes in bacteria often have a stronger codon bias than genes expressed at lower levels, due to translational selection. In this study, a comparative analysis of predicted highly expressed (PHX) genes in the Streptomyces coelicolor and Streptomyces avermitilis genomes was performed using the codon adaptation index (CAI) as a numerical estimator of gene expression level. Although it has been suggested that there is little heterogeneity in codon usage in G+C-rich bacteria, considerable heterogeneity was found among genes in these two G+C-rich Streptomyces genomes. Using ribosomal protein genes as references, approximately 10% of the genes were predicted to be PHX genes using a CAI cutoff value of greater than 0.78 and 0.75 in S. coelicolor and S. avermitilis, respectively. The PHX genes showed good agreement with the experimental data on expression levels obtained from proteomic analysis by previous workers. Among 724 and 730 PHX genes identified from S. coelicolor and S. avermitilis, 368 are orthologue genes present in both genomes, which were mostly 'housekeeping' genes involved in cell growth. In addition, 61 orthologous gene pairs with unknown functions were identified as PHX. Only one polyketide synthase gene from each Streptomyces genome was predicted as PHX. Nevertheless, several key genes responsible for producing precursors for secondary metabolites, such as crotonyl-CoA reductase and propionyl-CoA carboxylase, and genes necessary for initiation of secondary metabolism, such as adenosylmethionine synthetase, were among the PHX genes in the two Streptomyces species. The PHX genes exclusive to each genome, and what they imply regarding cellular metabolism, are also discussed.
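The CAI-based prediction of highly expressed genes described in this abstract can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation. It assumes the standard genetic code, the usual CAI definition (geometric mean of each codon's relative adaptiveness w, where w is the codon's count in the reference set divided by the count of the most used synonymous codon), and a pseudocount of 0.5 for codons absent from the reference set. All function names and toy sequences are hypothetical. Only the use of ribosomal protein genes as the reference set and the 0.78 cutoff for S. coelicolor come from the abstract.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    seq = cds.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """w values from a reference set, e.g. ribosomal protein genes."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(c for c in split_codons(gene) if c in CODON_TABLE)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    w = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stops, Met and Trp do not contribute to CAI
        best = max(counts[c] for c in family)
        if best == 0:
            continue  # amino acid never seen in the reference set
        for codon in family:
            # Assumption: unseen codons get a pseudocount of 0.5 so log(w) is defined.
            w[codon] = max(counts[codon], 0.5) / best
    return w


def cai(cds, w):
    """Geometric mean of w over the scorable codons of one gene."""
    logs = [math.log(w[c]) for c in split_codons(cds) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def predicted_highly_expressed(genes, w, cutoff):
    """Names of genes whose CAI exceeds the cutoff (e.g. 0.78 for S. coelicolor)."""
    return {name for name, seq in genes.items() if cai(seq, w) > cutoff}


w = relative_adaptiveness(["GCTGCTGCTGCC"])  # toy reference: GCT x3, GCC x1
genes = {"geneA": "GCTGCT", "geneB": "GCTGCC"}
print(predicted_highly_expressed(genes, w, cutoff=0.78))
```

In the toy example only `geneA` passes the cutoff, because every one of its codons is the reference set's preferred alanine codon.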
Affiliation(s)
- Gang Wu
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- David E Culley
- Microbiology Department, Pacific Northwest National Laboratory, 902 Battelle Boulevard, PO Box 999, Mail Stop P7-50, Richland, WA 99352, USA
- Weiwen Zhang
- Microbiology Department, Pacific Northwest National Laboratory, 902 Battelle Boulevard, PO Box 999, Mail Stop P7-50, Richland, WA 99352, USA
|
29
|
Das S, Pan A, Paul S, Dutta C. Comparative Analyses of Codon and Amino Acid Usage in Symbiotic Island and Core Genome in Nitrogen-Fixing Symbiotic Bacterium Bradyrhizobium japonicum. J Biomol Struct Dyn 2005; 23:221-32. [PMID: 16060695 DOI: 10.1080/07391102.2005.10507061] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/28/2022]
Abstract
Genes involved in the symbiotic interactions between the nitrogen-fixing endosymbiont Bradyrhizobium japonicum and its leguminous host are mostly clustered in a symbiotic island (SI), acquired by the bacterium through a process of horizontal transfer. A comparative analysis of the codon and amino acid usage in core and SI genes/proteins of B. japonicum has been carried out in the present study. Mutational bias, translational selection, and gene length are found to be the major sources of variation in synonymous codon usage in the core genome as well as in SI, the strength of translational selection being higher in core genes than in SI. In core proteins, hydrophobicity is the main source of variation in amino acid usage, expressivity and aromaticity being the second and third important sources. But in SI proteins, aromaticity is the chief source of variation, followed by expressivity and hydrophobicity. In SI proteins, both the mean molecular weight and mean aromaticity of individual proteins exhibit significant positive correlation with gene expressivity, which violates the cost-minimization hypothesis. Investigation of nucleotide substitution patterns in B. japonicum and Mesorhizobium loti orthologous genes reveals that both synonymous and non-synonymous sites of highly expressed genes are more conserved than their lowly expressed counterparts and this conservation is more pronounced in the genes present in the core genome than in SI.
Affiliation(s)
- Sabyasachi Das
- Bioinformatics Centre, Indian Institute of Chemical Biology, 4 Raja SC Mullick Road, Kolkata 700 032, India
|
30
|
Gupta SK, Banerjee T, Basak S, Sahu K, Sau S, Ghosh TC. Studies on codon usage in Thermoplasma acidophilum and its possible implications on the occurrences of lateral gene transfer. J Basic Microbiol 2005; 45:344-54. [PMID: 16187257 DOI: 10.1002/jobm.200510576] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
Codon usage studies have been carried out on the coding sequences of Thermoplasma acidophilum, an archaeon that grows at very low pH and high temperature. Overall codon usage data analysis indicates that all four bases are almost equifrequent at the third position of codons, which is expected, since the genomic GC content of this organism is about 46%. However, multivariate statistical analysis indicates that there are two major trends in the codon usage variation among the genes in this organism. In the first major trend, it is observed that genes having G- and C-ending codons are clustered at one end, while A- and T-ending ones are clustered at the other end. We have also found a significant positive correlation between the expressivities of genes and GC contents at the synonymous third codon positions. In the second major trend, it is seen that the genes are clustered into three distinct parts. A comparative analysis of codon usage data of T. acidophilum and Sulfolobus solfataricus reveals that one of the three clusters of genes of T. acidophilum is very similar to a considerable number of S. solfataricus genes, suggesting possible occurrences of lateral gene transfer between these two microorganisms, as reported by earlier workers.
Affiliation(s)
- S K Gupta
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme, VII M Calcutta 700 054. India
|
31
|
Das S, Ghosh S, Pan A, Dutta C. Compositional variation in bacterial genes and proteins with potential expression level. FEBS Lett 2005; 579:5205-10. [PMID: 16165133 DOI: 10.1016/j.febslet.2005.08.042] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 07/11/2005] [Accepted: 08/22/2005] [Indexed: 11/22/2022]
Abstract
Usage of guanine and cytosine at the three codon sites in eubacterial genes varies distinctly with potential expressivity, as predicted by the Codon Adaptation Index (CAI). In bacteria with moderate/high GC-content, G(3) follows a biphasic relationship, while C(3) increases with CAI. In AT-rich bacteria, the correlation of CAI is negative with G(3), but non-specific with C(3). Correlations of CAI with residues encoded by G-starting codons are positive, while those with residues encoded by C-starting codons are usually negative/random. Average Size/Complexity Score and aromaticity of gene-products decrease with CAI, confirming general validity of the cost-minimization principle in free-living eubacteria. Alcoholicity of bacterial gene-products usually decreases with expressivity.
Affiliation(s)
- Sabyasachi Das
- Bioinformatics Center, Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Kolkata 700 032, India
|
32
|
Wernegreen JJ, Funk DJ. Mutation exposed: a neutral explanation for extreme base composition of an endosymbiont genome. J Mol Evol 2005; 59:849-58. [PMID: 15599516 DOI: 10.1007/s00239-003-0192-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 12/15/2003] [Accepted: 06/29/2004] [Indexed: 10/26/2022]
Abstract
The influence of neutral mutation pressure versus selection on base composition evolution is a subject of considerable controversy. Yet the present study represents the first explicit population genetic analysis of this issue in prokaryotes, the group in which base composition variation is most dramatic. Here, we explore the impact of mutation and selection on the dynamics of synonymous changes in Buchnera aphidicola, the AT-rich bacterial endosymbiont of aphids. Specifically, we evaluated three forms of evidence. (i) We compared the frequencies of directional base changes (AT-->GC vs. GC-->AT) at synonymous sites within and between Buchnera species, to test for selective preference versus effective neutrality of these mutational categories. Reconstructed mutational changes across a robust intraspecific phylogeny showed a nearly 1:1 AT-->GC:GC-->AT ratio. Likewise, stationarity of base composition among Buchnera species indicated equal rates of AT-->GC and GC-->AT substitutions. The similarity of these patterns within and between species supported the neutral model. (ii) We observed an equivalence of relative per-site AT mutation rate and current AT content at synonymous sites, indicating that base composition is at mutational equilibrium. (iii) We demonstrated statistically greater equality in the frequency of mutational categories in Buchnera than in parallel mammalian studies that documented selection on synonymous sites. Our results indicate that effectively neutral mutational pressure, rather than selection, represents the major force driving base composition evolution in Buchnera. Thus they further corroborate recent evidence for the critical role of reduced N(e) in the molecular evolution of bacterial endosymbionts.
Affiliation(s)
- Jennifer J Wernegreen
- Josephine Bay Paul Center for Comparative Molecular Biology & Evolution, The Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA.
|
33
|
Merkl R. A survey of codon and amino acid frequency bias in microbial genomes focusing on translational efficiency. J Mol Evol 2004; 57:453-66. [PMID: 14708578 DOI: 10.1007/s00239-003-2499-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 10/26/2022]
Abstract
Unequal use of synonymous codons has been found in several prokaryotic and eukaryotic genomes. This bias has been associated with translational efficiency. The prevalence of this bias across lineages is currently unknown. Here, a new method (GCB) to measure codon usage bias is presented. It uses an iterative approach for the determination of codon scores and allows the computation of an index of codon bias suitable for interspecies comparison. A server to calculate GCB-values of individual genes as well as a list of compiled results are available at www.g21.bio.uni-goettingen.de. The method was applied to complete bacterial genomes. The relation of codon usage bias with amino acid composition and the choice of stop codons were determined and discussed.
Affiliation(s)
- Rainer Merkl
- Abteilung Molekulare Genetik und Präparative Molekularbiologie, Institut für Mikrobiologie und Genetik, Göttingen Genomics Laboratory, Georg-August-Universität Göttingen, Grisebachstrasse 8, D-37077 Göttingen, Germany.
|
34
|
Gupta SK, Bhattacharyya TK, Ghosh TC. Synonymous Codon Usage in Lactococcus lactis: Mutational Bias Versus Translational Selection. J Biomol Struct Dyn 2004; 21:527-36. [PMID: 14692797 DOI: 10.1080/07391102.2004.10506946] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.0] [Indexed: 10/28/2022]
Abstract
In this study, the codon usage bias of all experimentally known genes of Lactococcus lactis was analyzed. Because Lactococcus lactis is an AT-rich organism, A and/or T is expected at the third position of codons, and detailed analysis of overall codon usage data indicates that A- and/or T-ending codons are indeed predominant in this organism. However, multivariate statistical analyses based both on codon counts and on relative synonymous codon usage (RSCU) show that a large number of putatively highly expressed genes cluster at one end of the first major axis, while the majority of the putatively lowly expressed genes cluster at the other end. C- and T-ending codons are significantly more frequent in the highly expressed genes than in the lowly expressed genes; C-ending codons predominate in the duets of highly expressed genes, whereas T-ending codons are abundant in the quartets. The abundance of C- and T-ending codons in the highly expressed genes suggests that, besides compositional bias, translational selection also shapes the codon usage variation among the genes in this organism, as observed in other compositionally skewed organisms. The second major axis generated by correspondence analysis on simple codon counts separates the genes into two distinct groups according to their hydrophobicity values, but the same analysis computed with RSCU values could not discriminate the genes according to their hydropathy values. This suggests that amino acid composition exerts constraints on codon usage in this organism. On the other hand, the second major axis produced by correspondence analysis on RSCU values separates the genes into two groups according to synonymous codon usage for cysteine residues (the rarest amino acid in this organism), which is merely an artifact induced by the RSCU values. Other factors, such as the length of the genes and their position on the leading or lagging strand of replication, have practically no influence on the codon usage variation among the genes in this organism.
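The RSCU measure that this abstract relies on is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes the standard genetic code and a single in-frame coding sequence; the function name `rscu` is illustrative and is not taken from the cited work.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(sequence: str) -> dict:
    """Relative synonymous codon usage for one in-frame coding sequence.

    Amino acids that do not occur in the sequence are omitted from the
    result; stop codons are ignored.
    """
    seq = sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for members in families.values():
        total = sum(counts[c] for c in members)
        if total == 0:
            continue
        expected = total / len(members)  # count under equal usage
        for c in members:
            values[c] = counts[c] / expected
    return values
```

For example, `rscu("AAAAAAAAAAAG")` gives 1.5 for AAA and 0.5 for AAG. A sequence containing a single cysteine codon, such as `rscu("TGT")`, yields the extreme values 2.0 and 0.0, which illustrates why RSCU for a rare amino acid can dominate an axis as an artifact.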
Affiliation(s)
- S K Gupta
- Bioinformatics Centre Bose Institute, P 1/12, CIT Scheme VII M, Kolkata 700 054, India
|
35
|
Abstract
Guanine plus cytosine (GC) content ranges broadly among bacterial genomes. In this study, we explore the use of a Brownian-motion model for the evolution of GC content over time. This model assumes that GC content varies over time in a continuous and homogeneous manner. Using this model and a maximum-likelihood approach, we analyzed the evolution of GC content across several bacterial phylogenies. Using three independent tests, we found that the observed divergence in GC content was consistent with a homogeneous Brownian-motion model. For example, similar rates of GC content evolution were inferred in several different bacterial subclades, indicating that there is relatively little rate heterogeneity in GC content evolution over broad evolutionary time scales. We thus argue that the homogeneous Brownian-motion model provides a good working model for GC content evolution. We then use this model to determine the overall rate of GC content evolution among eubacteria. We also determine the time frame over which GC content remains similar in related taxa, using a flexible definition for "similarity" in GC content so that, depending on the context, more or less stringent criteria may be applied. Our results have implications for models of sequence evolution, including those used for phylogenetic reconstruction and for inferring unusual changes in GC content.
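To make the Brownian-motion idea concrete: under a homogeneous model with rate sigma^2, the GC difference between two taxa that diverged t time units ago is normally distributed with mean 0 and variance 2 * sigma^2 * t. The following is a simplified sketch that assumes phylogenetically independent taxon pairs with known divergence times. It is not the authors' maximum-likelihood procedure over full phylogenies, and all names are hypothetical.

```python
import math


def ml_brownian_rate(pairs) -> float:
    """Maximum-likelihood estimate of the Brownian rate sigma^2.

    pairs: iterable of (gc_a, gc_b, divergence_time) for independent taxon
    pairs. Each difference d is modeled as N(0, 2 * sigma^2 * t), so the
    estimate is the mean of d^2 / (2 * t).
    """
    terms = []
    for gc_a, gc_b, t in pairs:
        if t <= 0:
            raise ValueError("divergence times must be positive")
        terms.append((gc_a - gc_b) ** 2 / (2.0 * t))
    if not terms:
        raise ValueError("at least one taxon pair is required")
    return sum(terms) / len(terms)


def expected_abs_difference(rate: float, t: float) -> float:
    """Expected |GC difference| after divergence time t under the model."""
    variance = 2.0 * rate * t
    return math.sqrt(2.0 * variance / math.pi)
```

A function such as `expected_abs_difference` gives one way to ask over what time frame related taxa are expected to stay within a chosen GC similarity threshold.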
Affiliation(s)
- Eric Haywood-Farmer
- Department of Zoology, University of British Columbia, Vancouver V6T 1Z4, Canada
|
36
|
Abstract
The ORFs of microbial genomes in annotation files are usually classified into two groups: the first corresponds to known genes, whereas the second includes 'putative', 'probable', 'conserved hypothetical', 'hypothetical', 'unknown' and 'predicted' ORFs, etc. Since the annotation is not 100% accurate, it is essential to confirm which ORFs of the latter group are coding and which are not. Starting from known genes in the former group, this paper describes an improved Z curve method to recognize genes in the latter. Ten-fold cross-validation tests show that the average accuracy of the algorithm is greater than 99% for recognizing the known genes in 57 bacterial and archaeal genomes. The method is then applied to recognize genes of the latter group. The likely non-coding ORFs in each of the 57 bacterial or archaeal genomes studied here are recognized and listed at the website http://tubic.tju.edu.cn/ZCURVE_C_html/noncoding.html. The working mechanism of the algorithm is discussed in detail. A computer program, called ZCURVE_C, was written to calculate a coding score, called the Z-curve score, for ORFs in the above 57 bacterial and archaeal genomes. An ORF is classified as coding if its Z-curve score is greater than 0 and as non-coding if its score is less than 0. A website has been set up to provide a service for calculating the Z-curve score. A user may submit the DNA sequence of an ORF to the server at http://tubic.tju.edu.cn/ZCURVE_C/Default.cgi, and the Z-curve score of the ORF is calculated and returned to the user immediately.
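As a rough illustration of the kind of features a Z-curve classifier works with, the sketch below computes the three Z-curve components at each codon position of an ORF and applies a linear score whose sign gives the call. It is a simplified assumption-laden sketch, not ZCURVE_C itself: the published method may use additional variables, and `weights` and `threshold` are hypothetical parameters that would have to be trained on known genes.

```python
def zcurve_components(orf: str) -> list:
    """Nine Z-curve components: (x, y, z) for codon positions 1, 2 and 3.

    x = (A + G) - (C + T)   purine vs pyrimidine
    y = (A + C) - (G + T)   amino vs keto
    z = (A + T) - (G + C)   weak vs strong hydrogen bonding
    Base frequencies are computed separately at each codon position.
    """
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("ORF must contain at least one complete codon")
    n_codons = usable // 3
    features = []
    for pos in range(3):
        column = seq[pos:usable:3]  # every base at this codon position
        a = column.count("A") / n_codons
        c = column.count("C") / n_codons
        g = column.count("G") / n_codons
        t = column.count("T") / n_codons
        features.extend([(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)])
    return features


def zcurve_score(features, weights, threshold: float) -> float:
    """Linear discriminant score; > 0 is read as coding, < 0 as non-coding.

    weights and threshold are hypothetical, trained parameters.
    """
    return sum(w * f for w, f in zip(weights, features)) - threshold
```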
Affiliation(s)
- Ling-Ling Chen
- Department of Physics, Tianjin University, Tianjin, 300072, China
|
37
|
McHardy AC, Pühler A, Kalinowski J, Meyer F. Comparing expression level-dependent features in codon usage with protein abundance: An analysis of ‘predictive proteomics’. Proteomics 2003; 4:46-58. [PMID: 14730671 DOI: 10.1002/pmic.200300501] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Abstract
Synonymous codon usage is a commonly used means for estimating gene expression levels of Escherichia coli genes and has also been used for predicting highly expressed genes for a number of prokaryotic genomes. By comparison of expression level-dependent features in codon usage with protein abundance data from two proteome studies of exponentially growing E. coli and Bacillus subtilis cells, we try to evaluate whether the implicit assumption of this approach can be confirmed with experimental data. Log-odds ratio scores are used to model differences in codon usage between highly expressed genes and genomic average. Using these, the strength and significance of expression level-dependent features in codon usage were determined for the genes of the Escherichia coli, Bacillus subtilis and Haemophilus influenzae genomes. The comparison of codon usage features with protein abundance data confirmed a relationship between these to be present, although exceptions to this, possibly related to functional context, were found. For species with expression level-dependent features in their codon usage, the applied methodology could be used to improve in silico simulations of the outcome of two-dimensional gel electrophoretic experiments.
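A log-odds score of the kind mentioned here compares how often a codon appears in highly expressed genes with how often it appears genome-wide. The sketch below is one plausible formulation, not the scoring scheme of the cited study: it assumes codon frequencies are taken over all codons rather than per amino acid, and it adds a pseudocount to avoid division by zero. All function names are illustrative.

```python
import math
from collections import Counter


def codon_counts(sequences) -> Counter:
    """Count in-frame codons across a collection of coding sequences."""
    counts = Counter()
    for seq in sequences:
        s = seq.upper()
        for i in range(0, len(s) - len(s) % 3, 3):
            counts[s[i:i + 3]] += 1
    return counts


def log_odds_scores(high_seqs, genome_seqs, pseudocount: float = 1.0) -> dict:
    """log(f_high / f_genome) per codon, with additive smoothing (assumption)."""
    high = codon_counts(high_seqs)
    background = codon_counts(genome_seqs)
    codons = set(high) | set(background)
    high_total = sum(high.values()) + pseudocount * len(codons)
    bg_total = sum(background.values()) + pseudocount * len(codons)
    return {
        c: math.log(((high[c] + pseudocount) / high_total)
                    / ((background[c] + pseudocount) / bg_total))
        for c in codons
    }


def gene_score(sequence: str, scores: dict) -> float:
    """Mean log-odds score over the codons of one gene (unseen codons score 0)."""
    counts = codon_counts([sequence])
    n = sum(counts.values())
    if n == 0:
        raise ValueError("sequence must contain at least one complete codon")
    return sum(scores.get(c, 0.0) * k for c, k in counts.items()) / n
```

A positive gene score then indicates codon usage closer to the highly expressed set than to the genomic average, which is the quantity one would compare against measured protein abundance.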
|
38
|
Chen LL, Zhang CT. Seven GC-rich microbial genomes adopt similar codon usage patterns regardless of their phylogenetic lineages. Biochem Biophys Res Commun 2003; 306:310-7. [PMID: 12788106 DOI: 10.1016/s0006-291x(03)00973-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Seven GC-rich (group I) and three AT-rich (group II) microbial genomes are analyzed in this paper. The seven microbes in group I belong to different phylogenetic lineages, even different domains of life. Their common feature is that they are highly GC-rich organisms, with more than 60% genomic GC content. Group II includes three bacteria, which belong to the same subdivision as Pseudomonas aeruginosa in group I. The genomic GC content of the three bacteria is in the range of 26-50%. It is shown that although the phylogenetic lineages of the organisms in group I are remote, their shared high genomic GC content forces them to adopt similar codon usage patterns, which constitutes the basis of an algorithm using a set of universal parameters to recognize known genes in the seven genomes. The common codon usage pattern of genes of known function in the seven genomes is of the GḠS type, where G, Ḡ, and S denote G, non-G, and G/C, respectively. In contrast, although the phylogenetic lineages of the three bacteria in group II are quite close, the codon usage patterns of genes of known function in these genomes are clearly distinct. There are no universal parameters to identify known genes in the three genomes in group II. It can be deduced that genomic GC content is more important than phylogenetic lineage in gene recognition programs. We hope that the work might be useful for understanding the common characteristics in the organization of microbial genomes.
Affiliation(s)
- Ling-Ling Chen
- Department of Physics, Tianjin University, 300072, Tianjin, China
|
39
|
Wall DP, Herbeck JT. Evolutionary patterns of codon usage in the chloroplast gene rbcL. J Mol Evol 2003; 56:673-88; discussion 689-90. [PMID: 12911031 DOI: 10.1007/s00239-002-2436-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 10/26/2022]
Abstract
In this study we reconstruct the evolution of codon usage bias in the chloroplast gene rbcL using a phylogeny of 92 green-plant taxa. We employ a measure of codon usage bias that accounts for chloroplast genomic nucleotide content, as an attempt to limit plausible explanations for patterns of codon bias evolution to selection- or drift-based processes. This measure uses maximum likelihood-ratio tests to compare the performance of two models, one in which a single codon is overrepresented and one in which two codons are overrepresented. The measure allowed us to analyze both the extent of bias in each lineage and the evolution of codon choice across the phylogeny. Despite predictions based primarily on the low G + C content of the chloroplast and the high functional importance of rbcL, we found large differences in the extent of bias, suggesting differential molecular selection that is clade specific. The seed plants and simple leafy liverworts each independently derived a low level of bias in rbcL, perhaps indicating relaxed selectional constraint on molecular changes in the gene. Overrepresentation of a single codon was typically plesiomorphic, and transitions to overrepresentation of two codons occurred commonly across the phylogeny, possibly indicating biochemical selection. The total codon bias in each taxon, when regressed against the total bias of each amino acid, suggested that twofold amino acids play a strong role in inflating the level of codon usage bias in rbcL, despite the fact that twofolds compose a minority of residues in this gene. Those amino acids that contributed most to the total codon usage bias of each taxon are known through amino acid knockout and replacement to be of high functional importance. This suggests that codon usage bias may be constrained by particular amino acids and, thus, may serve as a good predictor of what residues are most important for protein fitness.
Affiliation(s)
- Dennis P Wall
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA.
|
40
|
Abstract
The association of codon context and codon usage was studied in seven bacteria as well as Schizosaccharomyces pombe and Encephalitozoon cuniculi. The association is strongest in magnitude closest to the codons of interest, but there is apparently no rule about which of the two contexts is generally most strongly associated with codon usage. In all bacterial species and in the intron-rich Sch. pombe, it was furthermore observed from plots of χ² versus N that the wobble positions of nearby codons cause regular peaks both upstream and downstream. This observation is discussed in relation to a possible effect of mutational pressure on the association of codon usage and codon context. The absence of peaks corresponding to the wobble positions in the intron-poor En. cuniculi, and their presence in Sch. pombe, may indicate that the role of introns in the context-dependent codon bias is negligible.
|
41
|
|
42
|
Abstract
Bacterial genomes are extremely dynamic and mosaic in nature. A substantial amount of genetic information is inserted into or deleted from such genomes through the process of horizontal transfer. Through the introduction of novel physiological traits from distantly related organisms, horizontal gene transfer often causes drastic changes in the ecological and pathogenic character of bacterial species and thereby promotes microbial diversification and speciation. This review discusses how the recent influx of complete chromosomal sequences of various microorganisms has allowed for a quantitative assessment of the scope, rate and impact of horizontally transmitted information on microbial evolution.
Affiliation(s)
- Chitra Dutta
- Human Genetics and Genomics Group, Indian Institute of Chemical Biology, 4, Raja SC Mullick Road, Kolkata 700 032, India.
|
43
|
Gupta SK, Ghosh TC. Gene expressivity is the main factor in dictating the codon usage variation among the genes in Pseudomonas aeruginosa. Gene 2001; 273:63-70. [PMID: 11483361 DOI: 10.1016/s0378-1119(01)00576-5] [Citation(s) in RCA: 89] [Impact Index Per Article: 3.9] [Indexed: 11/22/2022]
Abstract
Codon usage biases of all DNA sequences (length greater than or equal to 300 bp) from the complete genome of Pseudomonas aeruginosa have been analyzed. As P. aeruginosa is a GC-rich organism, G and/or C are expected to predominate in its codons. Overall codon usage data analysis indicates that codons ending in G and/or C are indeed predominant in this organism. However, multivariate statistical analysis indicates that there is a single major trend in the codon usage variation among the genes in this organism, which has a strong negative correlation with the expressivities of the genes. The majority of the lowly expressed genes are scattered towards the positive end of the major axis, whereas the highly expressed genes are clustered towards the negative end. This is the first report in which codon usage variation in a prokaryote with highly skewed base composition is found to be dictated mainly by translational selection, although other factors, such as the lengths of the genes and the hydrophobicity of the encoded proteins, also influence the codon usage variation among the genes in this organism in a minor way.
Affiliation(s)
- S K Gupta
- Distributed Information Centre, Bose Institute, P 1/12, C.I.T. Scheme, VII M, Calcutta 700 054, India
|
44
|
Abstract
Codon usage bias of Entamoeba histolytica, a protozoan parasite, was investigated using the available DNA sequence data. Because Entamoeba histolytica has an AT-rich genome, A and/or T is expected at the third position of its codons. Overall codon usage data analysis indicates that the coding regions of this organism are strongly biased towards A- and/or T-ending codons. However, multivariate statistical analysis suggests that there is a single major trend in codon usage variation among the genes. The genes that are presumed to be highly expressed are clustered at one end, while the majority of the putatively lowly expressed genes are clustered at the other end. The codon usage pattern is distinctly different in these two sets of genes. C-ending codons are significantly more frequent in the putatively highly expressed genes, suggesting that C-ending codons are translationally optimal in this organism. In the putatively lowly expressed genes, A- and/or T-ending codons are predominant, which suggests that compositional constraints play the major role in shaping codon usage variation among the lowly expressed genes. These results suggest that both mutational bias and translational selection operate on the codon usage variation in this organism.
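The contrast drawn here between gene sets rests on the base found at the third codon position. A minimal sketch of that tally follows; it counts every in-frame codon, so unlike the stricter GC3s statistic it does not exclude Met, Trp or stop codons. The function name is illustrative and does not come from the cited work.

```python
from collections import Counter


def third_position_composition(sequences) -> dict:
    """Fraction of codons ending in A, C, G and T across coding sequences."""
    counts = Counter()
    for seq in sequences:
        s = seq.upper()
        usable = len(s) - len(s) % 3
        for i in range(2, usable, 3):  # index of each codon's third base
            counts[s[i]] += 1
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("no complete codons with A/C/G/T at position 3")
    return {b: counts[b] / total for b in "ACGT"}
```

Running this separately on putatively highly and lowly expressed genes would show the kind of shift described above, for instance a higher fraction of C-ending codons in the highly expressed set.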
Affiliation(s)
- T C Ghosh
- Distributed Information Centre, Bose Institute, P 1/12, C.I.T. Scheme, VII M, 700 054, Calcutta, India.
|