1
Štambuk N, Konjevoda P, Štambuk A. How ambiguity codes specify molecular descriptors and information flow in Code Biology. Biosystems 2023; 233:105034. [PMID: 37739308 DOI: 10.1016/j.biosystems.2023.105034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/12/2023] [Accepted: 09/12/2023] [Indexed: 09/24/2023]
Abstract
The article presents IUPAC ambiguity codes for incomplete nucleic acid specification, and their use in Code Biology. It is shown how to use this nomenclature in order to extract accurate information on different properties of the biological systems. We investigated the use of ambiguity codes, as mathematical and logical operators and truth table elements, for the encoding of amino acids by means of the Standard Genetic Code. It is explained how to use ambiguity codes and truth functions in order to obtain accurate information on different properties of the biological systems. Nucleotide ambiguity codes could be applied to: 1. encoding descriptive information of nucleotides, amino acids and proteins (e.g., of polarity, relative solvent accessibility, atom depth, etc.), and 2. system modelling ranging from standard bioinformatics tools to classic evolutionary models (i.e. from Miyazawa-Jernigan statistical potential to Kimura three-substitution-type model, respectively). It is shown that the algorithms based on IUPAC ambiguity codes, Boolean functions and truth tables, Probabilistic Square of Opposition/Semiotic Square and Klein 4-groups could be used for the bioinformatics analyses and Relational data modelling in natural science. Underlying mathematical, logical and semiotic concepts of interest are presented and addressed.
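A minimal sketch of how an ambiguity code can act as a set-valued operator over the Standard Genetic Code: the Python below expands an IUPAC-coded codon into its concrete codons and collects the amino acids they encode. The function names `expand` and `amino_acids` are illustrative assumptions, not taken from the article; only the IUPAC symbol table and the standard code table are standard facts.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (DNA alphabet) mapped to concrete bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard Genetic Code with codons enumerated in TCAG order; '*' is a stop.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expand(codon):
    """Return every concrete codon matched by an IUPAC-coded codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon.upper()))]


def amino_acids(codon):
    """Return the set of amino acids encoded by an IUPAC-coded codon."""
    return {CODE[c] for c in expand(codon)}


print(sorted(amino_acids("AAR")))  # ['K']
print(sorted(amino_acids("MGR")))  # ['R']
```

Under these assumptions, a codon such as `NTN` (any base, T, any base) returns the set of amino acids sharing a central T, which is one way a single ambiguity pattern can stand for a descriptor class.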
Affiliation(s)
- Nikola Štambuk
- Centre for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- Paško Konjevoda
- Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- Albert Štambuk
- Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, HR-10000 Zagreb, Croatia
2
Liu Y, Sharp JS, Do DHT, Kahn RA, Schwalbe H, Buhr F, Prestegard JH. Mistakes in translation: Reflections on mechanism. PLoS One 2017; 12:e0180566. [PMID: 28662217 PMCID: PMC5491249 DOI: 10.1371/journal.pone.0180566] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/13/2016] [Accepted: 06/16/2017] [Indexed: 01/25/2023] Open
Abstract
Mistakes in translation of messenger RNA into protein are clearly a detriment to the recombinant production of pure proteins for biophysical study or the biopharmaceutical market. However, they may also provide insight into mechanistic details of the translation process. Mistakes often involve the substitution of an amino acid having an abundant codon for one having a rare codon, differing by substitution of a G base by an A base, as in the case of substitution of a lysine (AAA) for arginine (AGA). In these cases one expects the substitution frequency to depend on the relative abundances of the respective tRNAs, and thus, one might expect frequencies to be similar for all sites having the same rare codon. Here we demonstrate that, for the ADP-ribosylation factor from yeast expressed in E. coli, lysine-for-arginine substitution frequencies are not the same at the 9 sites containing a rare arginine codon; mis-incorporation frequencies instead vary from less than 1 to 16%. We suggest that the context in which the codons occur (clustering of rare sites) may be responsible for the variation. The method employed to determine the frequency of mis-incorporation involves a novel mass spectrometric analysis of the products from the parallel expression of wild type and codon-optimized genes in 15N and 14N enriched media, respectively. The high sensitivity and low material requirements of the method make this a promising technology for the collection of data relevant to other mis-incorporations. The additional data could be of value in refining models for the ribosomal translation elongation process.
Affiliation(s)
- Yizhou Liu
- Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia, United States of America
- Joshua S. Sharp
- Department of BioMolecular Sciences, University of Mississippi, Oxford, Mississippi, United States of America
- Duc H-T. Do
- Department of Food Science and Technology, University of Georgia, Athens, Georgia, United States of America
- Richard A. Kahn
- Department of Biochemistry, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Harald Schwalbe
- Institute for Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University, Frankfurt, Germany
- Florian Buhr
- Institute for Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University, Frankfurt, Germany
- James H. Prestegard
- Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia, United States of America
3
Frappat L, Sciarrino A, Sorba P. Prediction of physical-chemical properties of amino acids from genetic code. J Biol Phys 2013; 28:17-26. [PMID: 23345754 DOI: 10.1023/a:1016274329603] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
Using the crystal basis model of the genetic code, a set of relations between the physical-chemical properties of the amino acids are derived and compared with the experimental data. A prediction for the not yet measured thermodynamical parameters of three amino acids is done.
5
Castro-Chavez F. The rules of variation: amino acid exchange according to the rotating circular genetic code. J Theor Biol 2010; 264:711-21. [PMID: 20371250 PMCID: PMC3130497 DOI: 10.1016/j.jtbi.2010.03.046] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 12/01/2009] [Revised: 03/06/2010] [Accepted: 03/30/2010] [Indexed: 12/11/2022]
Abstract
General guidelines for the molecular basis of functional variation are presented while focused on the rotating circular genetic code and allowable exchanges that make it resistant to genetic diseases under normal conditions. The rules of variation, bioinformatics aids for preventative medicine, are: (1) same position in the four quadrants for hydrophobic codons, (2) same or contiguous position in two quadrants for synonymous or related codons, and (3) same quadrant for equivalent codons. To preserve protein function, amino acid exchange according to the first rule takes into account the positional homology of essential hydrophobic amino acids with every codon with a central uracil in the four quadrants, the second rule includes codons for identical, acidic, or their amidic amino acids present in two quadrants, and the third rule, the smaller, aromatic, stop codons, and basic amino acids, each in proximity within a 90 degree angle. I also define codifying genes and palindromati, CTCGTGCCGAATTCGGCACGAG.
6
Dynamic covariation between gene expression and genome characteristics. Gene 2008; 410:53-66. [PMID: 18191345 DOI: 10.1016/j.gene.2007.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2007] [Revised: 11/13/2007] [Accepted: 11/29/2007] [Indexed: 11/21/2022]
Abstract
Gene and protein expression is controlled so that cells can react to changing intra- and extracellular signals by modulating biochemical networks and pathways. We have previously shown that gene expression and the properties of expressed proteins are dynamically correlated. Here we investigated correlations between gene related parameters and gene expression patterns, and found statistically significant correlations in microarray datasets for different cell types, organisms and processes, including human B and T cell stimulation, cell cycle in HeLa cells, infection in intestinal epithelial cells, Drosophila melanogaster life span, and Saccharomyces cerevisiae cell cycle. Our method was applied to time course datasets individually for each time point. We derived from sequence information numerous parameters for nucleotide composition, two-base composition, codon usage, skew parameters, and codon bias. In addition to coding regions, we also investigated correlations for complete genes and introns. Significant dynamic correlations were identified for each of the analyses. Our method also proved useful for detecting dynamic shifts in gene expression profiles, such as in the D. melanogaster dataset. Detection of changes in the properties of expressed genes and proteins might be useful for predicting or following biological processes, responses, growth, differentiation and possibly in related disorders.
7
Mukhopadhyay P, Basak S, Ghosh TC. Synonymous codon usage in different protein secondary structural classes of human genes: implication for increased non-randomness of GC3 rich genes towards protein stability. J Biosci 2007; 32:947-63. [PMID: 17914237 DOI: 10.1007/s12038-007-0095-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Abstract
The relationship between the synonymous codon usage and different protein secondary structural classes was investigated using 401 Homo sapiens proteins extracted from Protein Data Bank (PDB). A simple Chi-square test was used to assess the significance of deviation of the observed and expected frequencies of 59 codons at the level of individual synonymous families in the four different protein secondary structural classes. It was observed that synonymous codon families show non-randomness in codon usage in four different secondary structural classes. However, when the genes were classified according to their GC3 levels there was an increase in non-randomness in the high GC3 group of genes. The non-randomness in codon usage was further tested among the same protein secondary structures belonging to four different protein folding classes of the high GC3 group of genes. The results show that in each of the protein secondary structural units there exists some synonymous family that shows a class-specific codon-usage pattern. Moreover, there is an increased non-random behaviour of synonymous codons in sheet structure of all secondary structural classes in the high GC3 group of genes. Biological implications of these results have been discussed.
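GC3, the quantity used above to group genes, is the fraction of codons whose third base is G or C. The sketch below computes it for an in-frame coding sequence; the function name `gc3` is an assumption for illustration, and the abstract does not state the threshold the authors used to define the "high GC3" group.

```python
def gc3(cds):
    """Fraction of codons in an in-frame coding sequence ending in G or C."""
    seq = cds.upper()
    third_bases = seq[2::3]  # bases at codon position 3 of each full codon
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Codons ATG, GCC, AAA end in G, C, A, so two of three count.
print(round(gc3("ATGGCCAAA"), 3))  # 0.667
```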
Affiliation(s)
- Pamela Mukhopadhyay
- Bioinformatics Centre, Bose Institute, P 1/12, CIT Scheme VII M, Kolkata 700 054, India
8
Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM. A "silent" polymorphism in the MDR1 gene changes substrate specificity. Science 2006; 315:525-8. [PMID: 17185560 DOI: 10.1126/science.1135308] [Citation(s) in RCA: 1807] [Impact Index Per Article: 100.4] [Indexed: 11/02/2022]
Abstract
Synonymous single-nucleotide polymorphisms (SNPs) do not produce altered coding sequences, and therefore they are not expected to change the function of the protein in which they occur. We report that a synonymous SNP in the Multidrug Resistance 1 (MDR1) gene, part of a haplotype previously linked to altered function of the MDR1 gene product P-glycoprotein (P-gp), nonetheless results in P-gp with altered drug and inhibitor interactions. Similar mRNA and protein levels, but altered conformations, were found for wild-type and polymorphic P-gp. We hypothesize that the presence of a rare codon, marked by the synonymous polymorphism, affects the timing of cotranslational folding and insertion of P-gp into the membrane, thereby altering the structure of substrate and inhibitor interaction sites.
MESH Headings
- ATP Binding Cassette Transporter, Subfamily B, Member 1/antagonists & inhibitors
- ATP Binding Cassette Transporter, Subfamily B, Member 1/chemistry
- ATP Binding Cassette Transporter, Subfamily B, Member 1/genetics
- ATP Binding Cassette Transporter, Subfamily B, Member 1/metabolism
- Animals
- Cell Line
- Cell Membrane/metabolism
- Chlorocebus aethiops
- Codon
- Cyclosporine/pharmacology
- Genes, MDR
- Haplotypes
- HeLa Cells
- Humans
- Mutagenesis, Site-Directed
- Polymorphism, Single Nucleotide
- Protein Biosynthesis
- Protein Conformation
- Protein Folding
- Protein Structure, Tertiary
- RNA, Messenger/genetics
- RNA, Messenger/metabolism
- Reverse Transcriptase Polymerase Chain Reaction
- Rhodamine 123/metabolism
- Rhodamine 123/pharmacology
- Sirolimus/pharmacology
- Substrate Specificity
- Transfection
- Verapamil/metabolism
- Verapamil/pharmacology
Affiliation(s)
- Chava Kimchi-Sarfaty
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA.
9
Rakocević MM. A harmonic structure of the genetic code. J Theor Biol 2004; 229:221-34. [PMID: 15207477 DOI: 10.1016/j.jtbi.2004.03.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 10/07/2003] [Revised: 03/19/2004] [Accepted: 03/26/2004] [Indexed: 10/26/2022]
Abstract
This paper presents a new, very harmonic structure of the genetic code (GC) within a system of "4 x 5" (and/or of "5 x 4") of amino acids (AAs) in two variants. In the first variant, the five rows within the system start with one polar charged amino acid (AA) each, making the first column, consisting of five polar charged AAs (D, R, K, H, E). Five polar non-charged AAs (N, P, Y, W, Q) follow, then five non-polar AAs as last column (A, L, F, V, I) and, finally, five polar or non-polar AAs, in a combination, as first to last column (A as non-polar; S, T as polar, and G, P as ambivalent AAs). A second variant is subsequent to this one: a "4 x 5" system with five nitrogen AAs (K, R, P, H, W), five oxygen (D, E, Y, S, T), five solely carbon (A, L, F, V, I) and five "combined" AAs (G with hydrogen as side chain; C and M with carbon and sulfur; N and Q with carbon, oxygen and nitrogen). A strict balance of atom and nucleon number as well as molecule mass follows the classification in both system variants.
Affiliation(s)
- Miloje M Rakocević
- Department of Chemistry, Faculty of Science, University of Nis, Cirila i Metodija 2, Serbia & Montenegro, Yugoslavia.
10
Chen LL, Zhang CT. Seven GC-rich microbial genomes adopt similar codon usage patterns regardless of their phylogenetic lineages. Biochem Biophys Res Commun 2003; 306:310-7. [PMID: 12788106 DOI: 10.1016/s0006-291x(03)00973-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Seven GC-rich (group I) and three AT-rich (group II) microbial genomes are analyzed in this paper. The seven microbes in group I belong to different phylogenetic lineages, even different domains of life. The common feature is that they are highly GC-rich organisms, with more than 60% genomic GC content. Group II includes three bacteria, which belong to the same subdivision as Pseudomonas aeruginosa in group I. The genomic GC content of the three bacteria is in the range of 26-50%. It is shown that although the phylogenetic lineages of the organisms in group I are remote, the common feature of high genomic GC content forces them to adopt similar codon usage patterns, which constitutes the basis of an algorithm using a set of universal parameters to recognize known genes in the seven genomes. The common codon usage pattern of genes of known function in the seven genomes is of the GḠS type, where G, Ḡ, and S denote the bases G, non-G, and G/C, respectively. On the contrary, although the phylogenetic lineages of the three bacteria in group II are quite close, the codon usage patterns of genes of known function in these genomes are obviously distinct. There are no universal parameters to identify known genes in the three genomes in group II. It can be deduced that the genomic GC content is more important than phylogenetic lineage in gene recognition programs. We hope that the work might be useful for understanding the common characteristics in the organization of microbial genomes.
Affiliation(s)
- Ling-Ling Chen
- Department of Physics, Tianjin University, 300072, Tianjin, China
11
D'Onofrio G, Ghosh TC, Bernardi G. The base composition of the genes is correlated with the secondary structures of the encoded proteins. Gene 2002; 300:179-87. [PMID: 12468099 DOI: 10.1016/s0378-1119(02)01045-4] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Indexed: 10/27/2022]
Abstract
The analysis of a non-redundant set of human proteins, for which both the crystallographic structures and the corresponding gene sequences are available, shows that bases at the third codon position are non-uniformly distributed along the coding sequences. Significant compositional differences are found by comparing the gene regions corresponding to the different secondary structures of the proteins. Inter- and intra-structure differences were most pronounced in the GC-richest genes. These results are not compatible with any proposed hypotheses based on a neutral process of formation/maintenance of the high GC(3) levels of the genes localized in the GC-richest isochores of the human genome.
Affiliation(s)
- Giuseppe D'Onofrio
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica A. Dohrn, Naples, Italy.
12
Zhang CT, Wang J, Zhang R. Using a Euclid distance discriminant method to find protein coding genes in the yeast genome. COMPUTERS & CHEMISTRY 2002; 26:195-206. [PMID: 11868909 DOI: 10.1016/s0097-8485(01)00107-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
The Euclid distance discriminant method is used to find protein coding genes in the yeast genome, based on the single nucleotide frequencies at three codon positions in the ORFs. The method is extremely simple and may be extended to find genes in prokaryotic genomes or eukaryotic genomes with fewer introns. Six-fold cross-validation tests have demonstrated that the accuracy of the algorithm is better than 93%. Based on this, it is found that the total number of protein coding genes in the yeast genome is less than or equal to 5579 only, about 3.8-7.0% less than 5800-6000, which is currently widely accepted. The base compositions at three codon positions are analyzed in detail using a graphic method. The result shows that the preference codons adopted by yeast genes are of the RḠW type, where R, Ḡ and W indicate the bases of purine, non-G and A/T, whereas the 'codons' in the intergenic sequences are of the form NNN, where N denotes any base. This fact constitutes the basis of the algorithm to distinguish between coding and non-coding ORFs in the yeast genome. The names of putative non-coding ORFs are listed here in detail.
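The idea of discriminating ORFs by single-nucleotide frequencies at the three codon positions can be sketched as a nearest-centre rule in a 12-dimensional space. This is an illustrative reading of the abstract, not the authors' exact discriminant: the function names, the A/C/G/T ordering, and the two reference centres (which would have to be estimated from training sets of known coding and non-coding ORFs) are all assumptions.

```python
import math

BASES = "ACGT"


def position_frequencies(orf):
    """Frequencies of A, C, G, T at codon positions 1, 2 and 3 (12 values)."""
    seq = orf.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("ORF is shorter than one codon")
    vector = []
    for pos in range(3):
        column = seq[pos:3 * n_codons:3]  # all bases at this codon position
        vector.extend(column.count(base) / n_codons for base in BASES)
    return vector


def euclid_distance(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(orf, coding_centre, noncoding_centre):
    """Label an ORF by whichever (hypothetical) class centre is closer."""
    x = position_frequencies(orf)
    closer_to_coding = (euclid_distance(x, coding_centre)
                        < euclid_distance(x, noncoding_centre))
    return "coding" if closer_to_coding else "non-coding"
```

A uniform centre such as `[0.25] * 12` corresponds to the NNN pattern that the abstract attributes to intergenic sequences.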
13
Wang J, Guo FB. Base frequencies at the second codon position of Vibrio cholerae genes connect with protein function. Biochem Biophys Res Commun 2002; 290:81-4. [PMID: 11779136 DOI: 10.1006/bbrc.2001.6174] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 12/14/2022]
Abstract
In this paper, the base frequency at the second codon position of the 3839 open reading frames (ORFs) in the Vibrio cholerae genome is analyzed. It is shown that according to the base content at this codon site, the ORFs can be divided into two clusters, each containing 673 and 3166 ORFs, respectively. ORFs in the smaller cluster usually have significantly higher T frequency than that of A at the second codon position. For the two clusters of ORFs, there are significant differences in the frequencies for 18 of the 20 amino acids in the encoding proteins. The two clusters of ORFs are also significantly different in their functions. More than half of the known genes involved in transport and binding are included in the smaller cluster, while few genes involved in amino acid biosynthesis, protein synthesis, and so on are included in this cluster.
Affiliation(s)
- Ju Wang
- Department of Physics, Tianjin University, Tianjin 300072, China.
14
Abstract
The systematics of indices of physico-chemical properties of codons and amino acids across the genetic code are examined. Using a simple numerical labelling scheme for nucleic acid bases, A=(-1,0), C=(0,-1), G=(0,1), U=(1,0), data can be fitted as low order polynomials of the six coordinates in the 64-dimensional codon weight space. The work confirms and extends the recent studies by Siemion et al. (1995. BioSystems 36, 231-238) of the conformational parameters. Fundamental patterns in the data such as codon periodicities, and related harmonics and reflection symmetries, are here associated with the structure of the set of basis monomials chosen for fitting. Results are plotted using the Siemion one-step mutation ring scheme, and variants thereof. The connections between the present work, and recent studies of the genetic code structure using dynamical symmetry algebras, are pointed out.
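The labelling scheme quoted above assigns each base a pair of numbers, so a codon becomes a point with six coordinates. A minimal sketch of that mapping follows; the function name `codon_coordinates` and the acceptance of T as a synonym for U are assumptions added for convenience.

```python
# Base labels exactly as given in the abstract.
BASE_LABEL = {"A": (-1, 0), "C": (0, -1), "G": (0, 1), "U": (1, 0)}


def codon_coordinates(codon):
    """Concatenate the two labels of each base into a 6-tuple."""
    rna = codon.upper().replace("T", "U")
    if len(rna) != 3:
        raise ValueError("a codon has exactly three bases")
    return tuple(value for base in rna for value in BASE_LABEL[base])


print(codon_coordinates("AUG"))  # (-1, 0, 1, 0, 0, 1)
```

A property index could then be fitted as a low-order polynomial in these six numbers, for example by least squares over the 64 codons.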
Affiliation(s)
- J D Bashford
- Centre for the Structure of Subatomic Matter, University of Adelaide, Adelaide, SA 5005, Australia
15
Gupta SK, Majumdar S, Bhattacharya TK, Ghosh TC. Studies on the relationships between the synonymous codon usage and protein secondary structural units. Biochem Biophys Res Commun 2000; 269:692-6. [PMID: 10720478 DOI: 10.1006/bbrc.2000.2351] [Citation(s) in RCA: 92] [Impact Index Per Article: 3.8] [Indexed: 11/22/2022]
Abstract
The relationship between the synonymous codon usage and protein secondary structural elements (alpha helices and beta sheets) was reinvestigated by taking structural information of proteins from Protein Data Bank (PDB) and their corresponding mRNA sequences from GenBank for four different organisms: E. coli, B. subtilis, S. cerevisiae, and Homo sapiens. It was observed that synonymous codon families have non-random codon usage, but there does not exist any species-invariant universal correlation between the synonymous codon usage and protein secondary structural elements. The secondary structural units of proteins can be distinguished from the occurrences of bases at the second codon position.
Affiliation(s)
- S K Gupta
- Distributed Information Centre, Bose Institute, P 1/12, C.I.T. Scheme, VII M, Calcutta, 700 054, India
16
Abstract
The hypothesis that synonymous codon usage is related to protein three-dimensional structure is examined by investigating the correlation between synonymous codon usage and protein secondary structure. All except two codons in E. coli show the same secondary structural preference for alpha-helix, beta-strand or coil as that of amino acids to be encoded by the respective codons, while 17 codons show secondary structural bias in mammalian proteins. The results indicate that there is no significant correlation between synonymous codon usage and protein secondary structure in E. coli, but there is a correlation in mammals. It could be deduced that synonymous codons carry much less structural information in prokaryotes than in eukaryotes due to their divergent evolutionary mechanism.
Affiliation(s)
- T Xie
- Shanghai Institute of Biochemistry, Academia Sinica, People's Republic of China
17
Leluk J. A new algorithm for analysis of the homology in protein primary structure. COMPUTERS & CHEMISTRY 1998; 22:123-31. [PMID: 9570113 DOI: 10.1016/s0097-8485(97)00035-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023]
Abstract
A new algorithm for analysis of the homology and genetic semihomology in protein sequence is described. It assumes the close relation between the compared amino acids and their codons in related proteins. The algorithm is based on the network of the genetic relationship between amino acids and, thus differs from the commonly used statistical matrices. The results obtained by using this method are more comprehensive than used at present, and reflect the actual mechanism of protein differentiation and evolution. They concern: (1) location of homologous and semihomologous sites in compared proteins; (2) precise estimation of insertion/deletion gaps in non-homologous fragments; (3) analysis of internal homology and semihomology; (4) precise location of domains in multidomain proteins; (5) estimation of genetic code of non-homologous fragments; (6) construction of genetic probes; (7) studies on differentiation processes among related proteins; (8) estimation of the degree of relationship among related proteins; (9) studies on the evolution mechanism within homologous protein families and (10) confirmation of actual relationship of sequences showing low degree of homology.
Affiliation(s)
- J Leluk
- Institute of Biochemistry and Molecular Biology, University of Wrocław, Poland.
18
Bashford JD, Tsohantjis I, Jarvis PD. A supersymmetric model for the evolution of the genetic code. Proc Natl Acad Sci U S A 1998; 95:987-92. [PMID: 9448272 PMCID: PMC18647 DOI: 10.1073/pnas.95.3.987] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.6] [Indexed: 02/05/2023] Open
Abstract
A model is presented for the structure and evolution of the eukaryotic and vertebrate mitochondrial genetic codes, based on the representation theory of the Lie superalgebra A(5,0) approximately sl(6/1). A key role is played by pyrimidine and purine exchange symmetries in codon quartets.
Affiliation(s)
- J D Bashford
- Department of Physics and Mathematical Physics, University of Adelaide Adelaide, S. A. 5005, Australia
19
Thanaraj TA, Argos P. Protein secondary structural types are differentially coded on messenger RNA. Protein Sci 1996; 5:1973-83. [PMID: 8897597 PMCID: PMC2143259 DOI: 10.1002/pro.5560051003] [Citation(s) in RCA: 126] [Impact Index Per Article: 4.5] [Indexed: 02/02/2023]
Abstract
Tricodon regions on messenger RNAs corresponding to a set of proteins from Escherichia coli were scrutinized for their translation speed. The fractional frequency values of the individual codons as they occur in mRNAs of highly expressed genes from Escherichia coli were taken as an indicative measure of the translation speed. The tricodons were classified by the sum of the frequency values of the constituent codons. Examination of the conformation of the encoded amino acid residues in the corresponding protein tertiary structures revealed a correlation between codon usage in mRNA and topological features of the encoded proteins. Alpha helices on proteins tend to be preferentially coded by translationally fast mRNA regions while the slow segments often code for beta strands and coil regions. Fast regions correspondingly avoid coding for beta strands and coil regions while the slow regions similarly move away from encoding alpha helices. Structural and mechanistic aspects of the ribosome peptide channel support the relevance of sequence fragment translation and subsequent conformation. A discussion is presented relating the observation to the reported kinetic data on the formation and stabilization of protein secondary structural types during protein folding. The observed absence of such strong positive selection for codons in non-highly expressed genes is compatible with existing theories that mutation pressure may well dominate codon selection in non-highly expressed genes.
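The tricodon classification described above, scoring each run of three codons by the sum of their frequency values, can be sketched as follows. The sliding one-codon step, the function name `tricodon_scores`, and the toy frequency table are assumptions made for illustration; the abstract does not say whether windows overlap, and the real frequency values would come from highly expressed E. coli genes.

```python
def tricodon_scores(mrna, codon_freq):
    """Sum the frequency values of each window of three consecutive codons."""
    seq = mrna.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [sum(codon_freq[c] for c in codons[i:i + 3])
            for i in range(len(codons) - 2)]


# Hypothetical frequency values, chosen only to make the example runnable.
toy_freq = {"AAA": 0.7, "CTG": 0.5, "AGA": 0.1}
scores = tricodon_scores("AAACTGAGAAAA", toy_freq)
print(len(scores))  # 2
```

Under this reading, windows with high sums would be treated as translationally fast and windows with low sums as slow.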
Affiliation(s)
- T A Thanaraj
- European Molecular Biology Laboratory, Heidelberg, Germany.
20
Abstract
Because regions on the messenger ribonucleic acid differ in the rate at which they are translated by the ribosome and because proteins can fold cotranslationally on the ribosome, a question arises as to whether the kinetics of translation influence the folding events in the growing nascent polypeptide chain. Translationally slow regions were identified on mRNAs for a set of 37 multidomain proteins from Escherichia coli with known three-dimensional structures. The frequencies of individual codons in mRNAs of highly expressed genes from E. coli were taken as a measure of codon translation speed. Analysis of codon usage in slow regions showed a consistency with the experimentally determined translation rates of codons; abundant codons that are translated with faster speeds compared with their synonymous codons were found to be avoided; rare codons that are translated at an unexpectedly higher rate were also found to be avoided in slow regions. The statistical significance of the occurrence of such slow regions on mRNA spans corresponding to the oligopeptide domain termini and linking regions on the encoded proteins was assessed. The amino acid type and the solvent accessibility of the residues coded by such slow regions were also examined. The results indicated that protein domain boundaries that mark higher-order structural organization are largely coded by translationally slow regions on the RNA and are composed of such amino acids that are stickier to the ribosome channel through which the synthesized polypeptide chain emerges into the cytoplasm. The translationally slow nucleotide regions on mRNA possess the potential to form hairpin secondary structures and such structures could further slow the movement of ribosome. The results point to an intriguing correlation between protein synthesis machinery and in vivo protein folding. Examination of available mutagenic data indicated that the effects of some of the reported mutations were consistent with our hypothesis.
Affiliation(s)
- T A Thanaraj
- European Molecular Biology Laboratory, Heidelberg, Germany.
21
Siemion IZ, Siemion PJ, Krajewski K. Chou-Fasman conformational amino acid parameters and the genetic code. Biosystems 1995; 36:231-8. [PMID: 8573701 DOI: 10.1016/0303-2647(95)01559-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.2] [Indexed: 01/31/2023]
Abstract
It was found that the distribution of Chou-Fasman P alpha conformational parameters within the genetic code (arranged into the one-step mutation ring) may be described by a quite simple trigonometric function of mutational angle. The mutational angle is defined as kπ/32, where k is a number of codons count from i under consideration to k. The principal eight-codon periodicity defines the P alpha-genetic code correspondence, but the other periodicities seem also to modulate the principal function. The eight-codon periodicity finds its explanation in the regular changes of third bases of successive codons. These changes appear in the order C, U, A, G, G, A, U, C, assigning eight maxima and eight minima of the P alpha curve. The experimental P alpha values fit the dependence found well, except for proline, the amino acid which breaks the regular eight-codon P alpha periodicity. The analysis of the dependence obtained suggests that, in agreement with the hypothesis of Jukes (1973), arginine CGR and AGR codons could have been used in an earlier genetic code for coding of ornithine.
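The mutational angle and the eight-codon periodicity can be illustrated numerically. The abstract does not give the fitted trigonometric function, so the cosine term below is only an assumed stand-in that has the stated period; `mutational_angle` and `eight_codon_harmonic` are hypothetical names.

```python
import math


def mutational_angle(k):
    """Angle k*pi/32, so the 64 codons of the ring span a full turn."""
    return k * math.pi / 32


def eight_codon_harmonic(k):
    """Illustrative term cos(8 * angle) = cos(k*pi/4), repeating every 8 codons."""
    return math.cos(8 * mutational_angle(k))


# Over k = 0..63 this term goes through eight maxima and eight minima.
maxima = sum(1 for k in range(64) if abs(eight_codon_harmonic(k) - 1.0) < 1e-9)
print(maxima)  # 8
```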
Affiliation(s)
- I Z Siemion
- Institute of Chemistry, Wroclaw University, Poland