1
|
Michel CJ. Circular code identified by the codon usage. Biosystems 2024; 244:105308. [PMID: 39159879 DOI: 10.1016/j.biosystems.2024.105308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2024] [Revised: 08/04/2024] [Accepted: 08/13/2024] [Indexed: 08/21/2024]
Abstract
Since 1996, circular codes in genes have been identified thanks to the development of 6 statistical approaches: trinucleotide frequencies per frame (Arquès and Michel, 1996), correlation functions per frame (Arquès and Michel, 1997), frame permuted trinucleotide frequencies (Frey and Michel, 2003, 2006), advanced statistical functions at the gene population level (Michel, 2015) and at the gene level (Michel, 2017). All these 3-frame statistical methods analyse the trinucleotide information in the 3 frames of genes: the reading frame and the 2 shifted frames. Notably, codon usage does not allow for the identification of circular codes (Michel, 2020). This has been a long-standing problem since 1996, hindering biologists' access to circular code theory. By considering circular code conditions resulting from code theory, particularly the concept of permutation class, and building upon previous statistical work, a new statistical approach based solely on the codon usage, i.e. a 1-frame statistical method, surprisingly reveals the maximal C3 self-complementary trinucleotide circular code X in bacterial genes and in average (bacterial, archaeal, eukaryotic) genes, and almost in archaeal genes. Additionally, a new parameter definition indicates that bacterial and archaeal genes exhibit codon usage dispersion of the same order of magnitude, but significantly higher than that observed in eukaryotic genes. This statistical finding may explain the greater variability of codes in eukaryotic genes compared to bacterial and archaeal genes, an issue that has been open for many years. Finally, biologists can now search for new (variant) circular codes at both the genome level (across all genes in a given genome) and the gene level using only codon usage, without the need for analysing the shifted frames.
Collapse
Affiliation(s)
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, C.N.R.S., 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
| |
Collapse
|
2
|
Michel CJ. Circular code in introns. Biosystems 2024; 239:105215. [PMID: 38641199 DOI: 10.1016/j.biosystems.2024.105215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/21/2024]
Abstract
A massive statistical analysis based on the autocorrelation function of the circular code X observed in genes is performed on the (eukaryotic) introns. Surprisingly, a circular code periodicity 0 modulo 3 is identified in 5 groups of introns: birds, ascomycetes, basidiomycetes, green algae and land plants. This circular code periodicity, which is a property of retrieving the reading frame in (protein coding) genes, may suggest that these introns have a coding property. In a well-known way, a periodicity 1 modulo 2 is observed in 6 groups of introns: amphibians, fishes, mammals, other animals, reptiles and apicomplexans. A mixed periodicity modulo 2 and 3 is found in the introns of insects. Astonishing, a subperiodicity 3 modulo 6 is a common statistical property in these 3 classes of introns. When the particular trinucleotides N1N2N1 of the circular code X are not considered, the circular code periodicity 0 modulo 3, hidden by the periodicity 1 modulo 2, is now retrieved in 5 groups of introns: amphibians, fishes, other animals, reptiles and insects. Thus, 10 groups of introns, taxonomically different, out of 12 have a coding property related to the reading frame retrieval. The trinucleotides N1N2N1 are analysed in the 216 maximal C3 self-complementary trinucleotide circular codes. A hexanucleotide code (words of 6 letters) is proposed to explain the periodicity 3 modulo 6. It could be a trace of more general circular codes at the origin of the circular code X.
Collapse
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
| |
Collapse
|
3
|
Fimmel E, Strüngmann L. The spiderweb of error-detecting codes in the genetic information. Biosystems 2023; 233:105009. [PMID: 37640191 DOI: 10.1016/j.biosystems.2023.105009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 08/21/2023] [Accepted: 08/21/2023] [Indexed: 08/31/2023]
Abstract
Nature possesses inherent mechanisms for error detection and correction during the translation of genetic information, as demonstrated by the discovery of a self-complementary circular C3-code called X0 in various organisms such as bacteria, eukaryotes, plasmids, and viruses (Arquès and Michel, 1996; Michel, 2015, 2017). Since then, extensive research has focused on circular codes, which are believed to be remnants of ancient comma-free codes. These codes can be regarded as an additional genetic code specifically optimized for detecting and preserving the proper reading frame in protein-coding sequences. A study by Fimmel et al. in 2014 identified that a total of 216 maximal self-complementary C3-codes can be grouped into 27 equivalence classes with eight codes in each class. In this work, we study how the 27 equivalence classes are related to each other. While the codes in each equivalence class obtained by Fimmel et al. in 2014 are permutations of each other, i.e. one code can be obtained from the other by applying a permutation of the bases, it has not been clear how the equvalence classes are connected. We show that there is an ordering of the equivalence classes such that one gets from one class to the next one by substituting only one pair of codon/anticodon in the corresponding codes, i.e. the corresponding codes have a maximal intersection of 18 codons. To perform this analysis, we define two graphs, G216 and G27, whose vertices are, respectively, all 216 maximal self-complementary C3-codes and 27 equivalence classes. Several properties of the graphs are obtained. Most surprisingly, it turns out that G27 contains Hamiltonian paths of length 27. This fact ultimately leads to a representation of the set of all 216 maximal self-complementary C3-codes as a kind of spider web. Finally, we define dinucleotide cuts of such codes by projecting each codon to its first two bases and show that the paths of lengths 27 in G216 can even be chosen so that all the codes contain a special subset of dinucleotides defined by Rumer's roots. These observations raise a lot of new questions about the biological function of such structures.
Collapse
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
| | - Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
| |
Collapse
|
4
|
Rosandić M, Paar V. The Evolution of Life Is a Road Paved with the DNA Quadruplet Symmetry and the Supersymmetry Genetic Code. Int J Mol Sci 2023; 24:12029. [PMID: 37569405 PMCID: PMC10418607 DOI: 10.3390/ijms241512029] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2023] [Revised: 07/19/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] Open
Abstract
Symmetries have not been completely determined and explained from the discovery of the DNA structure in 1953 and the genetic code in 1961. We show, during 10 years of investigation and research, our discovery of the Supersymmetry Genetic Code table in the form of 2 × 8 codon boxes, quadruplet DNA symmetries, and the classification of trinucleotides/codons, all built with the same physiochemical double mirror symmetry and Watson-Crick pairing. We also show that single-stranded RNA had the complete code of life in the form of the Supersymmetry Genetic Code table simultaneously with instructions of codons' relationship as to how to develop the DNA molecule on the principle of Watson-Crick pairing. We show that the same symmetries between the genetic code and DNA quadruplet are highly conserved during the whole evolution even between phylogenetically distant organisms. In this way, decreasing disorder and entropy enabled the evolution of living beings up to sophisticated species with cognitive features. Our hypothesis that all twenty amino acids are necessary for the origin of life on the Earth, which entirely changes our view on evolution, confirms the evidence of organic natural amino acids from the extra-terrestrial asteroid Ryugu, which is nearly as old as our solar system.
Collapse
Affiliation(s)
- Marija Rosandić
- Department of Internal Medicine, University Hospital Centre Zagreb, (Ret.), 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia;
| | - Vladimir Paar
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia;
- Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
| |
Collapse
|
5
|
Fimmel E, Michel CJ, Strüngmann L. Circular mixed sets. Biosystems 2023; 229:104906. [PMID: 37196893 DOI: 10.1016/j.biosystems.2023.104906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2023] [Accepted: 04/29/2023] [Indexed: 05/19/2023]
Abstract
In this article, we introduce the new mathematical concept of circular mixed sets of words over an arbitrary finite alphabet. These circular mixed sets may not be codes in the classical sense and hence allow a higher amount of information to be encoded. After describing their basic properties, we generalize a recent graph theoretical approach for circularity and apply it to distinguish codes from sets (i.e. non-codes). Moreover, several methods are given to construct circular mixed sets. Finally, this approach allows us to propose a new evolution model of the present genetic code that could have evolved from a dinucleotide world to a trinucleotide world via circular mixed sets of dinucleotides and trinucleotides.
Collapse
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
| | - Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, C.N.R.S., 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
| | - Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
| |
Collapse
|
6
|
Michel CJ, Sereni JS. Reading Frame Retrieval of Genes: A New Parameter of Codon Usage Based on the Circular Code Theory. Bull Math Biol 2023; 85:24. [PMID: 36826719 PMCID: PMC9950712 DOI: 10.1007/s11538-023-01129-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2022] [Accepted: 01/26/2023] [Indexed: 02/25/2023]
Abstract
Based on the circular code theory, we define a new function f that quantifies the property of reading frame retrieval (RFR) of genes from their codon usage. This RFR function f is computed on a massive scale in genes of genomes of bacteria, eukaryotes and archaea. By expressing f as a function of the mean number [Formula: see text] of codons per gene, a "universal" property is identified, whatever the kingdom: the reading frame retrieval is enhanced in large genes. By investigating this property according to the theory developed, a Spearman's rank correlation with a strong negative coefficient is observed between the codon usage dispersion d (from the uniform codon distribution [Formula: see text]) and the RFR function f, whatever the kingdom (p-values [Formula: see text] in bacteria, [Formula: see text] in eukaryotes and [Formula: see text] in archaea). Thus, the reading frame retrieval is enhanced with the codon usage dispersion. Furthermore, this approach identifies a "genome centre" from which emerge two distinct "genome arms": an upper arm and a lower arm, respectively, above and below the linear regression. The RFR function by itself or combined with classical methods (alignment, phylogeny) could also be a new approach to classify the genomes in the future.
Collapse
Affiliation(s)
- Christian J. Michel
- grid.11843.3f0000 0001 2157 9291Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
| | - Jean-Sébastien Sereni
- grid.11843.3f0000 0001 2157 9291Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
| |
Collapse
|
7
|
Borah C, Ali T. Genetic code noise immunity features: Degeneracy and frameshift correction. GENE REPORTS 2022. [DOI: 10.1016/j.genrep.2022.101707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
8
|
Property based analysis: Optimality of RNY comma-free code versus circular code (X) after frameshift errors. GENE REPORTS 2022. [DOI: 10.1016/j.genrep.2022.101652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
9
|
Trinucleotide k-circular codes II: Biology. Biosystems 2022; 217:104668. [DOI: 10.1016/j.biosystems.2022.104668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2021] [Revised: 02/27/2022] [Accepted: 03/16/2022] [Indexed: 11/22/2022]
|
10
|
Michel CJ. Genes on the circular code alphabet. Biosystems 2021; 206:104431. [PMID: 33894288 DOI: 10.1016/j.biosystems.2021.104431] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2021] [Revised: 04/15/2021] [Accepted: 04/15/2021] [Indexed: 02/07/2023]
Abstract
The X motifs, motifs from the circular code X, are enriched in the (protein coding) genes of bacteria, archaea, eukaryotes, plasmids and viruses, moreover, in the minimal gene set belonging to the three domains of life, as well as in tRNA and rRNA sequences. They allow to retrieve, maintain and synchronize the reading frame in genes, and contribute to the regulation of gene expression. These results lead here to a theoretical study of genes based on the circular code alphabet. A new occurrence relation of the circular code X under the hypothesis of an equiprobable (balanced) strand pairing is given. Surprisingly, a statistical analysis of a large set of bacterial genes retrieves this relation on the circular code alphabet, but not on the DNA alphabet. Furthermore, the circular code X has the strongest balanced circular code pairing among 216 maximal C3 self-complementary trinucleotide circular codes, a new property of this circular code X. As an application of this theory, different tRNAs studied on the circular code alphabet reveal an unexpected stem structure. Thus, the circular code X would have constructed a coding stem in tRNAs as an outline of the future gene structure and the future DNA double helix.
Collapse
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, CNRS, University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
| |
Collapse
|
11
|
Gumbel M, Wiedemann P. Motif lengths of circular codes in coding sequences. J Theor Biol 2021; 523:110708. [PMID: 33862087 DOI: 10.1016/j.jtbi.2021.110708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2020] [Revised: 03/22/2021] [Accepted: 03/30/2021] [Indexed: 10/21/2022]
Abstract
Protein synthesis is a crucial process in any cell. Translation, in which mRNA is translated into proteins, can lead to several errors, notably frame shifts where the ribosome accidentally skips or re-reads one or more nucleotides. So-called circular codes are capable of discovering frame shifts and their codons can be found disproportionately often in coding sequences. Here, we analyzed motifs of circular codes, i.e. sequences only containing codons of circular codes, in biological and artificial sequences. The lengths of these motifs were compared to a statistical model in order to elucidate if coding sequences contain significantly longer motifs than non-coding sequences. Our findings show that coding sequences indeed show on average greater motif lengths than expected by chance. On the other hand, the motifs are too short for a possible frame shift recognition to take place within an entire coding sequence. This suggests that as much as circular codes might have been used in ancient life forms in order to prevent frame shift errors, it remains to be seen whether they are still functional in current organisms.
Collapse
Affiliation(s)
- M Gumbel
- Competence Center for Mathematical and Algorithmical Methods in Biology, Biotechnology and Medicine, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
| | - P Wiedemann
- Competence Center for Mathematical and Algorithmical Methods in Biology, Biotechnology and Medicine, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
| |
Collapse
|
12
|
Shelah S, Strüngmann L. Infinite combinatorics in mathematical biology. Biosystems 2021; 204:104392. [PMID: 33731280 DOI: 10.1016/j.biosystems.2021.104392] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2020] [Revised: 02/11/2021] [Accepted: 02/18/2021] [Indexed: 12/12/2022]
Abstract
Is it possible to apply infinite combinatorics and (infinite) set theory in theoretical biology? We do not know the answer yet but in this article we try to present some techniques from infinite combinatorics and set theory that have been used over the last decades in order to prove existence results and independence theorems in algebra and that might have the flexibility and generality to be also used in theoretical biology. In particular, we will introduce the theory of forcing and an algebraic construction technique based on trees and forests using infinite binary sequences. We will also present an overview of the theory of circular codes. Such codes had been found in the genetic information and are assumed to play an important role in error detecting and error correcting mechanisms during the process of translation. Finally, examples and constructions of infinite mixed circular codes using binary sequences hopefully show some similarity between these theories - a starting point for future applications.
Collapse
Affiliation(s)
- Saharon Shelah
- Einstein Institute of Mathematics, The Hebrew University of Jerusalem(1), 9190401, Jerusalem, Israel; Department of Mathematics, Rutgers University, Piscataway, NJ, 08854-8019, USA.
| | - Lutz Strüngmann
- Institute of Mathematical Biology, Faculty of Computer Sciences, Mannheim University of Applied Sciences, 68163, Mannheim, Germany.
| |
Collapse
|
13
|
Thompson JD, Ripp R, Mayer C, Poch O, Michel CJ. Potential role of the X circular code in the regulation of gene expression. Biosystems 2021; 203:104368. [PMID: 33567309 DOI: 10.1016/j.biosystems.2021.104368] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2020] [Revised: 01/18/2021] [Accepted: 01/20/2021] [Indexed: 02/06/2023]
Abstract
The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. a series of codons belonging to X and called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code which are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examined recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density ds(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, the X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability. Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.
Collapse
Affiliation(s)
- Julie D Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
| | - Raymond Ripp
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
| | - Claudine Mayer
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France; Unité de Microbiologie Structurale, Institut Pasteur, CNRS, 75724, Paris Cedex 15, France; Université Paris Diderot, Sorbonne Paris Cité, 75724, Paris Cedex 15, France.
| | - Olivier Poch
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
| | - Christian J Michel
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
| |
Collapse
|
14
|
Fayazi F, Fimmel E, Strüngmann L. Equivalence classes of circular codes induced by permutation groups. Theory Biosci 2021; 140:107-121. [PMID: 33523355 PMCID: PMC7897628 DOI: 10.1007/s12064-020-00337-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2020] [Accepted: 12/27/2020] [Indexed: 11/27/2022]
Abstract
In the 1950s, Crick proposed the concept of so-called comma-free codes as an answer to the frame-shift problem that biologists have encountered when studying the process of translating a sequence of nucleotide bases into a protein. A little later it turned out that this proposal unfortunately does not correspond to biological reality. However, in the mid-90s, a weaker version of comma-free codes, so-called circular codes, was discovered in nature in J Theor Biol 182:45–58, 1996. Circular codes allow to retrieve the reading frame during the translational process in the ribosome and surprisingly the circular code discovered in nature is even circular in all three possible reading-frames (\documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$C^3$$\end{document}C3-property). Moreover, it is maximal in the sense that it contains 20 codons and is self-complementary which means that it consists of pairs of codons and corresponding anticodons. In further investigations, it was found that there are exactly 216 codes that have the same strong properties as the originally found code from J Theor Biol 182:45–58. Using an algebraic approach, it was shown in J Math Biol, 2004 that the class of 216 maximal self-complementary \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$C^3$$\end{document}C3-codes can be partitioned into 27 equally sized equivalence classes by the action of a transformation group \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$L \subseteq S_4$$\end{document}L⊆S4 which is isomorphic to the dihedral group. Here, we extend the above findings to circular codes over a finite alphabet of even cardinality \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$|\Sigma |=2n$$\end{document}|Σ|=2n for \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$n \in {\mathbb {N}}$$\end{document}n∈N. We describe the corresponding group \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$L_n$$\end{document}Ln using matrices and we investigate what classes of circular codes are split into equally sized equivalence classes under the natural equivalence relation induced by \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$L_n$$\end{document}Ln. Surprisingly, this is not always the case. All results and constructions are illustrated by examples.
Collapse
Affiliation(s)
- Fariba Fayazi
- Department of Mathematics, Faculty of Science, University of Qom, Qom, IR Iran
| | - Elena Fimmel
- Faculty of Computer Science, Insitute for Mathematical Biology, Mannheim University of Applied Science, Mannheim, Germany
| | - Lutz Strüngmann
- Faculty of Computer Science, Insitute for Mathematical Biology, Mannheim University of Applied Science, Mannheim, Germany
| |
Collapse
|
15
|
The maximality of circular codes in genes statistically verified. Biosystems 2020; 197:104201. [DOI: 10.1016/j.biosystems.2020.104201] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2020] [Revised: 06/22/2020] [Accepted: 06/22/2020] [Indexed: 11/18/2022]
|
16
|
Fimmel E, Michel CJ, Pirot F, Sereni JS, Starman M, Strüngmann L. The Relation Between k-Circularity and Circularity of Codes. Bull Math Biol 2020; 82:105. [PMID: 32754878 PMCID: PMC7402406 DOI: 10.1007/s11538-020-00770-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2020] [Accepted: 06/24/2020] [Indexed: 12/15/2022]
Abstract
A code X is k-circular if any concatenation of at most k words from X, when read on a circle, admits exactly one partition into words from X. It is circular if it is k-circular for every integer k. While it is not a priori clear from the definition, there exists, for every pair \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$(n,\ell )$$\end{document}(n,ℓ), an integer k such that every k-circular \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\ell $$\end{document}ℓ-letter code over an alphabet of cardinality n is circular, and we determine the least such integer k for all values of n and \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\ell $$\end{document}ℓ. The k-circular codes may represent an important evolutionary step between the circular codes, such as the comma-free codes, and the genetic code.
Collapse
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
| | - Christian J. Michel
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
| | - François Pirot
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
- LORIA (Orpailleur), C.N.R.S., University of Lorraine, INRIA, Campus scientifique, 54506 Vandœuvre-lès-Nancy Cedex, France
| | - Jean-Sébastien Sereni
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
| | - Martin Starman
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
| | - Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
| |
Collapse
|
17
|
Dila G, Michel CJ, Thompson JD. Optimality of circular codes versus the genetic code after frameshift errors. Biosystems 2020; 195:104134. [DOI: 10.1016/j.biosystems.2020.104134] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2020] [Revised: 03/23/2020] [Accepted: 03/25/2020] [Indexed: 12/24/2022]
|
18
|
Abstract
The origin of the modern genetic code and the mechanisms that have contributed to its present form raise many questions. The main goal of this work is to test two hypotheses concerning the development of the genetic code for their compatibility and complementarity and see if they could benefit from each other. On the one hand, Gonzalez, Giannerini and Rosa developed a theory, based on four-based codons, which they called tesserae. This theory can explain the degeneracy of the modern vertebrate mitochondrial code. On the other hand, in the 1990s, so-called circular codes were discovered in nature, which seem to ensure the maintenance of a correct reading-frame during the translation process. It turns out that the two concepts not only do not contradict each other, but on the contrary complement and enrichen each other.
Collapse
|
19
|
Michel CJ, Thompson JD. Identification of a circular code periodicity in the bacterial ribosome: origin of codon periodicity in genes? RNA Biol 2020; 17:571-583. [PMID: 31960748 PMCID: PMC8647727 DOI: 10.1080/15476286.2020.1719311] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Revised: 01/10/2020] [Accepted: 01/14/2020] [Indexed: 02/09/2023] Open
Abstract
Three-base periodicity (TBP), where nucleotides and higher order n-tuples are preferentially spaced by 3, 6, 9, etc. bases, is a well-known intrinsic property of protein-coding DNA sequences. However, its origins are still not fully understood. One hypothesis is that the periodicity reflects a primordial coding system that was used before the emergence of the modern standard genetic code (SGC). Recent evidence suggests that the X circular code, a set of 20 trinucleotides allowing the reading frames in genes to be retrieved locally, represents a possible ancestor of the SGC. Motifs from the X circular code have been found in the reading frame of protein-coding regions in extant organisms from bacteria to eukaryotes, in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase centre and the decoding centre. Here, we have used a powerful correlation function to search for periodicity patterns involving the 20 trinucleotides of the X circular code in a large set of bacterial protein-coding genes, as well as in the translation machinery, including rRNA and tRNA sequences. As might be expected, we found a strong circular code periodicity 0 modulo 3 in the protein-coding genes. More surprisingly, we also identified a similar circular code periodicity in a large region of the 16S rRNA. This region includes the 3' major domain corresponding to the primordial proto-ribosome decoding centre and containing numerous sites that interact with the tRNA and messenger RNA (mRNA) during translation. Furthermore, 3D structural analysis shows that the periodicity region surrounds the mRNA channel that lies between the head and the body of the SSU. Our results support the hypothesis that the X circular code may constitute an ancestral translation code involved in reading frame retrieval and maintenance, traces of which persist in modern mRNA, tRNA and rRNA despite their long evolution and adaptation to the SGC.
Collapse
Affiliation(s)
- Christian J. Michel
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
| | - Julie D. Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
| |
Collapse
|
20
|
Demongeot J, Seligmann H. Why Is AUG the Start Codon?: Theoretical Minimal RNA Rings: Maximizing Coded Information Biases 1st Codon for the Universal Initiation Codon AUG. Bioessays 2020; 42:e1900201. [PMID: 32227358 DOI: 10.1002/bies.201900201] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2019] [Revised: 02/09/2020] [Indexed: 01/04/2023]
Abstract
The rational design of theoretical minimal RNA rings predetermines AUG as the universal start codon. This design maximizes coded amino acid diversity over minimal sequence length, defining in silico theoretical minimal RNA rings, candidate ancestral genes. RNA rings code for 21 amino acids and a stop codon after three consecutive translation rounds, and form a degradation-delaying stem-loop hairpin. Twenty-five RNA rings match these constraints, ten start with the universal initiation codon AUG. No first codon bias exists among remaining RNA rings. RNA ring design predetermines AUG as initiation codon. This is the only explanation yet for AUG as start codon. RNA ring design determines additional RNA ring gene- and tRNA-like properties described previously, because it presumably mimics constraints on life's primordial RNAs.
Collapse
Affiliation(s)
- Jacques Demongeot
- Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, F-38700, France
| | - Hervé Seligmann
- Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, F-38700, France.,The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, 91404, Israel
| |
Collapse
|
21
|
Pentamers with Non-redundant Frames: Bias for Natural Circular Code Codons. J Mol Evol 2020; 88:194-201. [DOI: 10.1007/s00239-019-09925-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2019] [Accepted: 12/17/2019] [Indexed: 02/06/2023]
|
22
|
Dila G, Ripp R, Mayer C, Poch O, Michel CJ, Thompson JD. Circular code motifs in the ribosome: a missing link in the evolution of translation? RNA (NEW YORK, N.Y.) 2019; 25:1714-1730. [PMID: 31506380 PMCID: PMC6859856 DOI: 10.1261/rna.072074.119] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/23/2019] [Accepted: 09/06/2019] [Indexed: 05/29/2023]
Abstract
The origin of the genetic code remains enigmatic five decades after it was elucidated, although there is growing evidence that the code coevolved progressively with the ribosome. A number of primordial codes were proposed as ancestors of the modern genetic code, including comma-free codes such as the RRY, RNY, or GNC codes (R = G or A, Y = C or T, N = any nucleotide), and the X circular code, an error-correcting code that also allows identification and maintenance of the reading frame. It was demonstrated previously that motifs of the X circular code are significantly enriched in the protein-coding genes of most organisms, from bacteria to eukaryotes. Here, we show that imprints of this code also exist in the ribosomal RNA (rRNA). In a large-scale study involving 133 organisms representative of the three domains of life, we identified 32 universal X motifs that are conserved in the rRNA of >90% of the organisms. Intriguingly, most of the universal X motifs are located in rRNA regions involved in important ribosome functions, notably in the peptidyl transferase center and the decoding center that form the original "proto-ribosome." Building on the existing accretion models for ribosome evolution, we propose that error-correcting circular codes represented an important step in the emergence of the modern genetic code. Thus, circular codes would have allowed the simultaneous coding of amino acids and synchronization of the reading frame in primitive translation systems, prior to the emergence of more sophisticated start codon recognition and translation initiation mechanisms.
Collapse
Affiliation(s)
- Gopal Dila
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
| | - Raymond Ripp
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
| | - Claudine Mayer
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
- Unité de Microbiologie Structurale, Institut Pasteur, CNRS, 75724 Paris Cedex 15, France
- Université Paris Diderot, Sorbonne Paris Cité, 75724 Paris Cedex 15, France
| | - Olivier Poch
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
| | - Christian J Michel
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
| | - Julie D Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
| |
Collapse
|
23
|
Dragovich B, Mišić NŽ. p-Adic hierarchical properties of the genetic code. Biosystems 2019; 185:104017. [PMID: 31433999 DOI: 10.1016/j.biosystems.2019.104017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2019] [Revised: 08/10/2019] [Accepted: 08/13/2019] [Indexed: 12/11/2022]
Abstract
In this article, we consider p-adic modeling of the standard genetic code and the vertebrate mitochondrial one. To this end, we use 5-adic and 2-adic distance as a mathematical tool to describe closeness (nearness, similarity) between codons as elements of a bioinformation space. Codons which are simultaneously at the smallest 5-adic and 2-adic distance code the same (or similar) amino acid or stop signal. The set of codons is presented as an ultrametric tree as well as a fractal and p-adic network. It is shown that genetic code can be treated as sequential translation between genetic languages. This p-adic approach gives possibility to be applied to sequences of nucleotides of an arbitrary finite length. We present an overview of published and some new results on various p-adic properties of the genetic code.
Collapse
Affiliation(s)
- Branko Dragovich
- Institute of Physics, University of Belgrade, Belgrade, Serbia; Mathematical Institute, Serbian Academy of Sciences and Arts, Belgrade, Serbia.
| | - Nataša Ž Mišić
- Research and Development Institute Lola Ltd, Kneza Višeslava 70a, Belgrade, Serbia.
| |
Collapse
|
24
|
Fimmel E, Michel CJ, Pirot F, Sereni JS, Strüngmann L. Mixed circular codes. Math Biosci 2019; 317:108231. [PMID: 31325443 DOI: 10.1016/j.mbs.2019.108231] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2018] [Revised: 07/16/2019] [Accepted: 07/17/2019] [Indexed: 12/11/2022]
Abstract
By an extensive statistical analysis in genes of bacteria, archaea, eukaryotes, plasmids and viruses, a maximal C3-self-complementary trinucleotide circular code has been found to have the highest average occurrence in the reading frame of the ribosome during translation. Circular codes may play an important role in maintaining the correct reading frame. On the other hand, as several evolutionary theories propose primeval codes based on dinucleotides, trinucleotides and tetranucleotides, mixed circular codes were investigated. By using a graph-theoretical approach of circular codes recently developed, we study mixed circular codes, which are the union of a dinucleotide circular code, a trinucleotide circular code and a tetranucleotide circular code. Maximal mixed circular codes of (di,tri)-nucleotides, (tri,tetra)-nucleotides and (di,tri,tetra)-nucleotides are constructed, respectively. In particular, we show that any maximal dinucleotide circular code of size 6 can be embedded into a maximal mixed (di,tri)-nucleotide circular code such that its trinucleotide component is a maximal C3-comma-free code. The growth function of self-complementary mixed circular codes of dinucleotides and trinucleotides is given. Self-complementary mixed circular codes could have been involved in primitive genetic processes.
Collapse
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, Mannheim 68163, Germany.
| | - Christian J Michel
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, Illkirch 67400, France.
| | - François Pirot
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, Illkirch 67400, France; LORIA (Orpailleur) and Dept. of Mathematics, University of Lorraine and Radboud University, Vandœuvre-lès-Nancy, France and Nijmegen, Netherlands.
| | - Jean-Sébastien Sereni
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, Illkirch 67400, France.
| | - Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, Mannheim 68163, Germany.
| |
Collapse
|
25
|
Demongeot J, Seligmann H. Spontaneous evolution of circular codes in theoretical minimal RNA rings. Gene 2019; 705:95-102. [DOI: 10.1016/j.gene.2019.03.069] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2018] [Revised: 03/08/2019] [Accepted: 03/29/2019] [Indexed: 02/06/2023]
|
26
|
BłaŻej P, Wnetrzak M, Mackiewicz D, Mackiewicz P. The influence of different types of translational inaccuracies on the genetic code structure. BMC Bioinformatics 2019; 20:114. [PMID: 30841864 PMCID: PMC6404327 DOI: 10.1186/s12859-019-2661-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2018] [Accepted: 01/29/2019] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND The standard genetic code is a recipe for assigning unambiguously 21 labels, i.e. amino acids and stop translation signal, to 64 codons. However, at early stages of the translational machinery development, the codons did not have to be read unambiguously and the early genetic codes could have contained some ambiguous assignments of codons to amino acids. Therefore, the goal of this work was to obtain the genetic code structures which could have evolved assuming different types of inaccuracy of the translational machinery starting from unambiguous assignments of codons to amino acids. RESULTS We developed a theoretical model assuming that the level of uncertainty of codon assignments can gradually decrease during the simulations. Since it is postulated that the standard code has evolved to be robust against point mutations and mistranslations, we developed three simulation scenarios assuming that such errors can influence one, two or three codon positions. The simulated codes were selected using the evolutionary algorithm methodology to decrease coding ambiguity and increase their robustness against mistranslation. CONCLUSIONS The results indicate that the typical codon block structure of the genetic code could have evolved to decrease the ambiguity of amino acid to codon assignments and to increase the fidelity of reading the genetic information. However, the robustness to errors was not the decisive factor that influenced the genetic code evolution because it is possible to find theoretical codes that minimize the reading errors better than the standard genetic code.
Collapse
Affiliation(s)
- Paweł BłaŻej
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
| | - Małgorzata Wnetrzak
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
| | - Dorota Mackiewicz
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
| | - Paweł Mackiewicz
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
| |
Collapse
|
27
|
Single-Frame, Multiple-Frame and Framing Motifs in Genes. Life (Basel) 2019; 9:life9010018. [PMID: 30744207 PMCID: PMC6463195 DOI: 10.3390/life9010018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2018] [Revised: 01/24/2019] [Accepted: 01/31/2019] [Indexed: 12/12/2022] Open
Abstract
We study the distribution of new classes of motifs in genes, a research field that has not been investigated to date. A single-frame motif SF has no trinucleotide in reading frame (frame 0) that occurs in a shifted frame (frame 1 or 2), e.g., the dicodon AAACAA is SF as the trinucleotides AAA and CAA do not occur in a shifted frame. A motif which is not single-frame SF is multiple-frame MF. Several classes of MF motifs are defined and analysed. The distributions of single-frame SF motifs (associated with an unambiguous trinucleotide decoding in the two 5′–3′ and 3′–5′ directions) and 5′ unambiguous motifs 5′U (associated with an unambiguous trinucleotide decoding in the 5′–3′ direction only) are analysed without and with constraints. The constraints studied are: initiation and stop codons, periodic codons {AAA,CCC,GGG,TTT}, antiparallel complementarity and parallel complementarity. Taken together, these results suggest that the complementarity property involved in the antiparallel (DNA double helix, RNA stem) and parallel sequences could also be fundamental for coding genes with an unambiguous trinucleotide decoding in the two 5′–3′ and 3′–5′ directions or the 5′–3′ direction only. Furthermore, the single-frame motifs SF with a property of trinucleotide decoding and the framing motifs F (also called circular code motifs; first introduced by Michel (2012)) with a property of reading frame decoding may have been involved in the early life genes to build the modern genetic code and the extant genes. They could have been involved in the stage without anticodon-amino acid interactions or in the Implicated Site Nucleotides (ISN) of RNA interacting with the amino acids. Finally, the SF and MF dipeptides associated with the SF and MF dicodons, respectively, are studied and their importance for biology and the origin of life discussed.
Collapse
|
28
|
Dila G, Michel CJ, Poch O, Ripp R, Thompson JD. Evolutionary conservation and functional implications of circular code motifs in eukaryotic genomes. Biosystems 2019; 175:57-74. [DOI: 10.1016/j.biosystems.2018.10.014] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2018] [Revised: 10/08/2018] [Accepted: 10/17/2018] [Indexed: 12/13/2022]
|
29
|
Seligmann H. Protein Sequences Recapitulate Genetic Code Evolution. Comput Struct Biotechnol J 2018; 16:177-189. [PMID: 30002789 PMCID: PMC6040577 DOI: 10.1016/j.csbj.2018.05.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2018] [Revised: 05/14/2018] [Accepted: 05/17/2018] [Indexed: 12/16/2022] Open
Abstract
Several hypotheses predict ranks of amino acid assignments to genetic code's codons. Analyses here show that average positions of amino acid species in proteins correspond to assignment ranks, in particular as predicted by Juke's neutral mutation hypothesis for codon assignments. In all tested protein groups, including co- and post-translationally folding proteins, 'recent' amino acids are on average closer to gene 5' extremities than 'ancient' ones. Analyses of pairwise residue contact energies matrices suggest that early amino acids stereochemically selected late ones that stablilize residue interactions within protein cores, presumably producing 5'-late-to-3'-early amino acid protein sequence gradients. The gradient might reduce protein misfolding, also after mutations, extending principles of neutral mutations to protein folding. Presumably, in self-perpetuating and self-correcting systems like the genetic code, initial conditions produce similarities between evolution of the process (the genetic code) and 'ontogeny' of resulting structures (here proteins), producing apparent teleonomy between process and product.
Collapse
Affiliation(s)
- Hervé Seligmann
- Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes, UMR MEPHI, Aix-Marseille Université, IRD, Assistance Publique-Hôpitaux de Marseille, Institut Hospitalo-Universitaire Méditerranée-Infection, 19-21 boulevard Jean Moulin, 13005 Marseille, France.
| |
Collapse
|
30
|
Affiliation(s)
- Jan-Hendrik S Hofmeyr
- Centre for Complex Systems in Transition and Dept. of Biochemistry, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa.
| |
Collapse
|