1
Cutler RW, Chantawannakul P. The effect of local nucleotides on synonymous codon usage in the honeybee (Apis mellifera L.) genome. J Mol Evol 2007; 64:637-45. [PMID: 17541680 DOI: 10.1007/s00239-006-0198-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/31/2006] [Accepted: 02/12/2007] [Indexed: 10/23/2022]
Abstract
Using all currently predicted coding regions in the honeybee genome, a novel form of synonymous codon bias is presented that affects the usage of particular codons dependent on the surrounding nucleotides in the coding region. Nucleotides at the third codon site are correlated, dependent on their weak (adenine [A] or thymine [T]) versus strong (guanine [G] or cytosine [C]) status, to nucleotides on the first codon site which are dependent on their purine (A/G) versus pyrimidine (C/T) status. In particular, for adjacent third and first site nucleotides, weak-pyrimidine and strong-purine nucleotide combinations occur much more frequently than the underabundant weak-purine and strong-pyrimidine nucleotide combinations. Since a similar effect is also found in the noncoding regions, but is present for all adjacent nucleotides, this coding effect is most likely due to a genome-wide context-dependent mutation error correcting mechanism in combination with selective constraints on adjacent first and second nucleotide pairs within codons. The position-dependent relationship of synonymous codon usage is evidence for a novel form of codon position bias which utilizes the redundancy in the genetic code to minimize the effect of nucleotide mutations within coding regions.
Affiliation(s)
- Robert W Cutler
- Department of Biology, Bard College, Annandale-on-Hudson, NY 12504, USA
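The pairing described in the abstract above (third codon site classed as weak or strong, the following first codon site classed as purine or pyrimidine) can be tallied with a short script. This is a minimal sketch, not the authors' method: the function name `third_first_class_counts` and the rule of skipping ambiguous bases are assumptions made for illustration.

```python
from collections import Counter

WEAK = set("AT")     # weak bases (A or T); strong bases are G or C
PURINES = set("AG")  # purines (A or G); pyrimidines are C or T


def third_first_class_counts(cds):
    """Count adjacent (third site, next first site) pairs by class.

    Hypothetical helper: `cds` is an in-frame coding sequence string.
    """
    cds = cds.upper()
    counts = Counter()
    # The third site of codon k sits at index 3k + 2; the first site of
    # codon k + 1 sits at index 3k + 3.
    for i in range(2, len(cds) - 1, 3):
        third, first = cds[i], cds[i + 1]
        if third not in "ACGT" or first not in "ACGT":
            continue  # assumption: ambiguous bases such as N are skipped
        strength = "weak" if third in WEAK else "strong"
        ring = "purine" if first in PURINES else "pyrimidine"
        counts[(strength, ring)] += 1
    return counts
```

Summing these counts over many coding regions and comparing the four classes would show whether weak-pyrimidine and strong-purine pairs are overrepresented, as the abstract reports for the honeybee.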
2
Desai D, Zhang K, Barik S, Srivastava A, Bolander ME, Sarkar G. Intragenic codon bias in a set of mouse and human genes. J Theor Biol 2004; 230:215-25. [PMID: 15302553 DOI: 10.1016/j.jtbi.2004.05.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 01/28/2004] [Revised: 05/06/2004] [Accepted: 05/06/2004] [Indexed: 11/20/2022]
Abstract
To better conceptualize the mechanism underlying the evolution of synonymous codons, we have analysed intragenic codon usage in chosen "regions" of some mouse and human genes. We divided a given gene into two regions: one consisting of a trinucleotide repeat (TNR) and the other consisting of the "rest of the coding region" (RCR). Usually, a TNR is composed of a repetitive single codon, which may reflect its frequency in a gene. In contrast, a non-random frequency of a codon in the RCR versus TNR (or vice versa) of a gene should indicate a bias for that codon within the TNR. We examined this scenario by comparing codon frequency between the RCR and the cognate TNR(s) for a set of human and mouse genes. A TNR length of six amino acids or more was used to identify genes from the Genbank database. Twenty nine human and twenty one mouse genes containing TNRs coding for nine different amino acid runs were identified. The ratio of codon frequency in a TNR versus the corresponding RCR was expressed as "fold change" which was also regarded as a measure of codon bias (defined as preferential use either in TNR or in RCR). Chi-square values were then determined from the distribution of codon frequency in a TNR vs. the cognate RCR. At p<0.001, 22% and 27%, respectively, of human and mouse TNRs showed codon bias. Greater than 40% of the TNRs (29 out of 69 in human, and 18 of 42 in mouse) showed codon bias at p<0.05. In addition, we identify eight single-codon TNRs in mouse and ten in human genes. Thus, our results show intragenic codon bias in both mouse and human genes expressed in diverse tissue types. Since our results are independent of the Codon Adaptation Index (CAI) and starvation CAI, and since the tRNA repertoire in a cell or in a tissue is constant, our data suggest that other constraints besides tRNA abundance played a role in creating intragenic codon bias in these genes.
Affiliation(s)
- Dinakar Desai
- Department of Orthopedics, Mayo Clinic and Foundation, Medical Science Building 3-69, 200 1st Street, SW, Rochester, MN 55905, USA
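The "fold change" and chi-square comparison between a TNR and its cognate RCR, as described in the abstract above, might be computed along the following lines. This is a sketch under stated assumptions: the abstract does not give the exact contingency table, so a 2x2 table (the codon of interest versus all other codons, in the TNR versus the RCR) and a Pearson statistic without continuity correction are assumed here, and both function names are hypothetical.

```python
def fold_change(codon_in_tnr, total_tnr, codon_in_rcr, total_rcr):
    """Ratio of a codon's relative frequency in the TNR to that in the RCR."""
    freq_tnr = codon_in_tnr / total_tnr
    freq_rcr = codon_in_rcr / total_rcr
    return freq_tnr / freq_rcr


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Assumed layout: a = codon of interest in TNR, b = other codons in TNR,
    c = codon of interest in RCR, d = other codons in RCR.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p<0.05 and above roughly 10.83 to p<0.001, the two thresholds quoted in the abstract.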
3
Takeuchi F, Futamura Y, Yoshikura H, Yamamoto K. Statistics of trinucleotides in coding sequences and evolution. J Theor Biol 2003; 222:139-49. [PMID: 12727450 DOI: 10.1016/s0022-5193(03)00021-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
The aim of this paper is to give measurements indicative of evolutional stages of the species. Two types of statistics of trinucleotides in coding regions are analysed for 27 species. The first one is the codon space, the nucleotide ratio for each of the three codon positions. We apply principal component analysis on this space and extract two principal components faithfully describing the original distribution of the codon space. The first principal component corresponds to the GC content. The second principal component classifies the species into three evolutional groups, Archaea, Bacteria and Eukaryota. The second statistic is the real and theoretical frequency of amino acids. The real frequency of an amino acid in a coding sequence is its frequency in the translated protein. The theoretical frequency is the expected frequency calculated from the ratio of nucleotides. We introduce the discrepancy between these two frequencies as an index of non-randomness of nucleotides in the sequence. This index of non-randomness divides the species into two groups: eukaryotes having smaller non-randomness (i.e. being more random) and prokaryotes having higher non-randomness.
Affiliation(s)
- Fumihiko Takeuchi
- Research Institute, International Medical Center of Japan, 1-21-1 Toyama, Shinjuku-ku, Japan.
4
Majumdar S, Gupta SK, Sundararajan VS, Ghosh TC. Compositional correlation studies among the three different codon positions in 12 bacterial genomes. Biochem Biophys Res Commun 1999; 266:66-71. [PMID: 10581166 DOI: 10.1006/bbrc.1999.1774] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Compositional distributions in the three codon positions of the coding sequences of 12 fully sequenced prokaryotic genomes, which are publicly available, were investigated. A universal compositional correlation was observed in most of the genomes under investigation irrespective of their overall genomic GC contents. In all the genomes, the GC contents at the first codon positions are always greater than the overall GC contents of the genomes whereas the reverse is true in the case of second codon positions. GC contents at the third codon positions are higher than the overall genomic GC contents in high-GC genomes, and the opposite situation was found in low-GC genomes except for Helicobacter pylori. In high-GC genomes, the GC contents at the first + second codon positions are less than the GC contents at the third codon positions, and they are low in low-GC genomes except for Helicobacter pylori. The distributions of four bases at the three different positions were also investigated for all 12 organisms. It was observed that in high-GC genomes G is the most dominant base and in low-GC genomes A is the most dominant base in the first codon positions. But purine bases, i.e., (A + G), predominantly occur in the first codon position. In the second codon position, A is the most dominant base in most of the organisms and G is the least dominant base in all the organisms. There is no unique regular pattern of individual bases at the third codon positions; however, there are significant differences in the occurrences of (G + C) contents in the third codon positions among the different organisms. Calculations of dinucleotide frequencies in 12 different organisms indicate that in GC-rich genomes GG, GC, CC, and CG dinucleotides are the most dominant whereas the reverse is true in the case of low-GC genomes. Biological implications of these results are discussed in this paper.
Affiliation(s)
- S Majumdar
- Distributed Information Centre, Bose Institute, P 1/12, C.I.T. Scheme, VII M Calcutta, 700 054, India
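The per-position GC contents compared in the abstract above can be obtained from an in-frame coding sequence with a few lines of code. This is an illustrative sketch only: the function name `gc_by_codon_position` and the choice to ignore a trailing partial codon are assumptions, not details from the paper.

```python
def gc_by_codon_position(cds):
    """Return the GC fraction at codon positions 1, 2 and 3 as a tuple."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # assumption: drop a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this position
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)
```

Comparing each of the three values with the overall genomic GC content gives the kind of contrast the abstract reports between first, second and third codon positions.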
5
Musto H, Cacciò S, Rodríguez-Maseda H, Bernardi G. Compositional constraints in the extremely GC-poor genome of Plasmodium falciparum. Mem Inst Oswaldo Cruz 1997; 92:835-41. [PMID: 9566216 DOI: 10.1590/s0074-02761997000600020] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023] Open
Abstract
We have analyzed the compositional properties of coding (protein encoding) and non-coding sequences of Plasmodium falciparum, a unicellular parasite characterized by an extremely AT-rich genome. GC% levels, base and dinucleotide frequencies were studied. We found that among the various factors that contribute to the properties of the sequences analyzed, the most relevant are the compositional constraints which operate on the whole genome.
Affiliation(s)
- H Musto
- Sección Bioquímica, Facultad de Ciencias, Montevideo, Uruguay.
6
Rodríguez-Maseda H, Musto H. The compositional compartments of the nuclear genomes of Trypanosoma brucei and T. cruzi. Gene 1994; 151:221-4. [PMID: 7828878 DOI: 10.1016/0378-1119(94)90660-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/27/2023]
Abstract
Fractionation of DNA from Trypanosoma brucei and T. equiperdum by centrifugation in a Cs2SO4/BAMD density gradient indicated that these genomes are compositionally compartmentalized, a conclusion confirmed by the analysis of the compositional distribution of third codon positions from T. brucei and T. cruzi. In order to investigate whether this compartmentalization is accompanied by the often different properties of coding sequences, we have analyzed and compared the compositional compartments with respect to dinucleotide frequency and amino-acid usage of the encoded proteins of all gene sequences available in the GenBank database from T. brucei and T. cruzi. In all cases, the compartments displayed remarkable differences. These results are similar to findings obtained in highly compartmentalized genomes, like those of warm-blooded vertebrates.
Affiliation(s)
- H Rodríguez-Maseda
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, Paris, France
7
Musto H, Alvarez F, Tort J, Maseda HR. Dinucleotide biases in the platyhelminth Schistosoma mansoni. Int J Parasitol 1994; 24:277-83. [PMID: 8026908 DOI: 10.1016/0020-7519(94)90039-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 01/28/2023]
Abstract
The analysis of dinucleotide biases in coding and flanking regions, introns, rDNA and repetitive sequences, in the flatworm Schistosoma mansoni is reported. Except for rDNA, all regions display CpG avoidance and TpG plus CpA excess, which might be evidence of the presence of 5mC. The distribution and hierarchies of dinucleotides differ from the data published for invertebrate and vertebrate coding sequences.
Affiliation(s)
- H Musto
- Sección Bioquímica, Facultad de Ciencias, Montevideo, Uruguay
8
Musto H, Rodríguez-Maseda H, Alvarez F, Tort J. Possible implications of CpG avoidance in the flatworm Schistosoma mansoni. J Mol Evol 1994; 38:36-40. [PMID: 8151713 DOI: 10.1007/bf00175493] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 01/29/2023]
Abstract
We report the analysis of the biases of CpG, TpG, and CpA in all the DNA sequence data from the trematode Schistosoma mansoni. Our results show CpG avoidance whereas TpG and CpA frequencies are over the expected values. These characteristics are similar to the biases displayed by methylated genomes, but in platyhelminths 5mC has not been detected by biochemical methods. The possible implications of this CpG shortage are discussed.
Affiliation(s)
- H Musto
- Departamento de Bioquímica, Facultad de Ciencias, Montevideo, Uruguay
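CpG avoidance and TpG or CpA excess, as reported in the abstract above, are commonly expressed as an observed/expected ratio in which the expected dinucleotide frequency is the product of the two base frequencies. The sketch below uses that common convention as an assumption; the abstract does not state the exact formula used, and the function name `dinucleotide_obs_exp` is hypothetical.

```python
def dinucleotide_obs_exp(seq, dinuc):
    """Observed/expected ratio for a dinucleotide on a single strand.

    Assumed convention: expected frequency = f(first base) * f(second base).
    """
    seq = seq.upper()
    dinuc = dinuc.upper()
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of length >= 2 and a 2-letter dinucleotide")
    # Count overlapping occurrences across all n - 1 adjacent positions.
    observed = sum(seq[i:i + 2] == dinuc for i in range(n - 1)) / (n - 1)
    expected = (seq.count(dinuc[0]) / n) * (seq.count(dinuc[1]) / n)
    if expected == 0:
        raise ValueError("one of the bases is absent from the sequence")
    return observed / expected
```

A ratio well below 1 for "CG" together with ratios above 1 for "TG" and "CA" would correspond to the pattern the abstract describes.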
9
Gutiérrez G, Oliver JL, Marín A. Dinucleotides and G+C content in human genes: opposite behavior of GpG, GpC, and TpC at II-III codon positions and in introns. J Mol Evol 1993; 37:131-6. [PMID: 8411202 DOI: 10.1007/bf02407348] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 01/30/2023]
Abstract
We have studied the behavior of the dinucleotide preferences under G+C content variation in human genes. The doublet preferences for each dinucleotide were compared between two functionally distinct zones in genes, the II-III codon positions, and the introns. The 16 dinucleotides have been tentatively classified in three groups: (1) AA, AC, CC, CT, and GA, doublets showing no difference between introns and II-III codon positions in the full range of G+C variation; (2) TG and TA, which differ in the full range of G+C variation; and (3) AT, AG, GT, TC, TT, GG, GC, CG, and CA, which show differences in regions over 50% G+C. A remarkable pattern observed concerns the behavior of GG, GC, and TC, which showed opposite trends in II-III codon positions and in introns. If codon positions and introns are under the same structural requirements and the same mutational bias, our results indicate that the differences observed could be related to post-transcriptional constraints acting on mRNA.
Affiliation(s)
- G Gutiérrez
- Departamento de Genética, Universidad de Sevilla, Spain
10
11
Shpaer EG, Mullins JI. Selection against CpG dinucleotides in lentiviral genes: a possible role of methylation in regulation of viral expression. Nucleic Acids Res 1990; 18:5793-7. [PMID: 2170945 PMCID: PMC332316 DOI: 10.1093/nar/18.19.5793] [Citation(s) in RCA: 50] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022] Open
Abstract
Extremely low frequencies of CpG dinucleotides are found in the genomes of the lentivirus subfamily of retroviruses, including the human, simian and feline immunodeficiency viruses (HIV1, HIV2, SIV, and FIV, respectively), equine infectious anemia virus (EIAV), and the ovine lentivirus, Visna. The occurrence of CpG dinucleotides is greater in the 2-3 (NCG) than in the 1-2 (CGN) codon-defined frame, as well as in the gag and env genes, compared to the more conserved pol gene. These differences suggest that CpG depletion in lentiviruses occurs as a result of selection against CpG rather than mutational bias; the latter is responsible for low CpG frequencies in vertebrate genomes. CpG levels in the onco-retrovirus subfamily are reduced to a lesser extent, principally due to mutational bias. The difference between the retrovirus subfamilies appears to reflect their evolutionary origin, that is, lentiviruses have no known endogenous counterparts whereas most oncoviruses have endogenous cellular counterparts with which they can undergo recombination. Furthermore, we suggest that the number of CpG dinucleotides in a lentiviral genome determines the maximum potential DNA methylation level of the provirus, which in turn affects viral transcription in host cells.
Affiliation(s)
- E G Shpaer
- Department of Microbiology and Immunology, Stanford University School of Medicine, CA 94305-5402
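The frame-specific comparison in the abstract above (CpG at codon positions 1-2, written CGN, versus positions 2-3, written NCG) can be counted directly from an in-frame coding sequence. This is a minimal sketch for illustration; the function name `cpg_by_codon_frame` and the handling of a trailing partial codon are assumptions rather than the authors' procedure.

```python
def cpg_by_codon_frame(cds):
    """Count CpG dinucleotides in the 1-2 (CGN) and 2-3 (NCG) codon frames."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # assumption: ignore a trailing partial codon
    counts = {"1-2": 0, "2-3": 0}
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon[:2] == "CG":
            counts["1-2"] += 1  # CGN: CpG spans codon positions 1 and 2
        if codon[1:] == "CG":
            counts["2-3"] += 1  # NCG: CpG spans codon positions 2 and 3
    return counts
```

A CpG at positions 1-2 constrains the encoded amino acid (CGN codons all encode arginine), whereas a CpG at positions 2-3 has its G at the largely synonymous third site, which is why the two frames are informative when separating selection from mutational bias.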