1
Michel CJ. Genes on the circular code alphabet. Biosystems 2021; 206:104431. [PMID: 33894288 DOI: 10.1016/j.biosystems.2021.104431] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/09/2021] [Revised: 04/15/2021] [Accepted: 04/15/2021] [Indexed: 02/07/2023]
Abstract
The X motifs, i.e., motifs from the circular code X, are enriched in the (protein-coding) genes of bacteria, archaea, eukaryotes, plasmids and viruses, in the minimal gene set belonging to the three domains of life, and in tRNA and rRNA sequences. They make it possible to retrieve, maintain and synchronize the reading frame in genes, and contribute to the regulation of gene expression. These results lead here to a theoretical study of genes based on the circular code alphabet. A new occurrence relation of the circular code X under the hypothesis of an equiprobable (balanced) strand pairing is given. Surprisingly, a statistical analysis of a large set of bacterial genes retrieves this relation on the circular code alphabet, but not on the DNA alphabet. Furthermore, the circular code X has the strongest balanced circular code pairing among the 216 maximal C3 self-complementary trinucleotide circular codes, a new property of this circular code X. As an application of this theory, different tRNAs studied on the circular code alphabet reveal an unexpected stem structure. Thus, the circular code X would have constructed a coding stem in tRNAs as an outline of the future gene structure and the future DNA double helix.
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, CNRS, University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
2
Michel CJ, Ngoune VN, Poch O, Ripp R, Thompson JD. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae. Life (Basel) 2017; 7:life7040052. [PMID: 29207500 PMCID: PMC5745565 DOI: 10.3390/life7040052] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/09/2017] [Revised: 11/27/2017] [Accepted: 11/27/2017] [Indexed: 12/17/2022] [Open Access]
Abstract
A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. 
These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
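The frame comparison described in this abstract can be illustrated with a minimal sketch. The 20 trinucleotides below are the set X as published by Arquès and Michel (1996). The function names, the toy sequence and the `min_trinucleotides` cutoff are assumptions made for illustration only; the paper classifies X motifs by cardinality and length, which this sketch does not reproduce.

```python
# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def x_motifs(seq, frame=0, min_trinucleotides=4):
    """Maximal runs of consecutive X trinucleotides read in `frame`.

    Returns a list of (start_index, [trinucleotides]) pairs.
    `min_trinucleotides` is an illustrative cutoff, not the paper's definition.
    """
    motifs, run, start = [], [], None
    for i in range(frame, len(seq) - 2, 3):
        tri = seq[i:i + 3]
        if tri in X:
            if not run:
                start = i  # a new run begins here
            run.append(tri)
        else:
            if len(run) >= min_trinucleotides:
                motifs.append((start, run))
            run = []
    if len(run) >= min_trinucleotides:  # run that reaches the end of seq
        motifs.append((start, run))
    return motifs


def x_counts_by_frame(seq):
    """Number of X trinucleotides read in frames 0, 1 and 2."""
    return [
        sum(seq[i:i + 3] in X for i in range(frame, len(seq) - 2, 3))
        for frame in range(3)
    ]
```

On the toy sequence `GAAGACGTTTACAAC`, `x_counts_by_frame` finds five X trinucleotides in frame 0 and none in the two shifted frames, which is the kind of asymmetry the study measures at genome scale.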
Affiliation(s)
- Christian J Michel
- Complex Systems and Translational Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Viviane Nguefack Ngoune
- Complex Systems and Translational Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Olivier Poch
- Complex Systems and Translational Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Raymond Ripp
- Complex Systems and Translational Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Julie D Thompson
- Complex Systems and Translational Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
3
Diletter circular codes over finite alphabets. Math Biosci 2017; 294:120-129. [PMID: 29024747 DOI: 10.1016/j.mbs.2017.10.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/27/2017] [Revised: 08/26/2017] [Accepted: 10/08/2017] [Indexed: 11/22/2022]
Abstract
The recently developed graph approach to circular codes (Fimmel et al., 2016) allows here a detailed study of diletter circular codes over finite alphabets. A new class of circular codes, strong comma-free codes, is identified. New theorems are proved for the diletter circular codes of maximal length in relation to (i) a characterisation of their graphs as acyclic tournaments; (ii) their explicit description; and (iii) the non-existence of other maximal diletter circular codes. The maximal lengths of paths in the graphs of the comma-free and strong comma-free codes are determined. Furthermore, for the first time, diletter circular codes are enumerated over finite alphabets. Biological consequences of dinucleotide circular codes are analysed with respect to their embedding in the trinucleotide circular code X identified in genes and to the periodicity modulo 2 observed in introns. An evolutionary hypothesis of circular codes is also proposed according to their combinatorial properties.
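The graph criterion mentioned in this abstract can be sketched for diletter codes. The sketch assumes the construction in which each diletter `ab` contributes a directed edge `a -> b`, and the code is circular exactly when the resulting graph has no directed cycle. The function names are hypothetical, and `transitive_tournament` only builds one example of an acyclic tournament from an ordered alphabet; it is not the paper's enumeration.

```python
from itertools import combinations


def is_circular_diletter(code):
    """Return True if the diletter code has an acyclic graph.

    Each word 'ab' adds the edge a -> b; acyclicity is tested with
    Kahn's algorithm (repeated removal of vertices with in-degree 0).
    """
    graph = {}
    for a, b in code:  # each word must have exactly two letters
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1
    queue = [v for v, d in indegree.items() if d == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for w in graph[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # Every vertex can be removed only if no directed cycle exists.
    return removed == len(graph)


def transitive_tournament(alphabet):
    """Diletter code whose graph is the acyclic tournament induced by
    the order of the letters in `alphabet` (illustrative helper)."""
    return {a + b for a, b in combinations(alphabet, 2)}
```

For the four-letter alphabet `ACGT`, `transitive_tournament` yields the six words AC, AG, AT, CG, CT and GT, matching the n(n-1)/2 edges of a tournament on n = 4 vertices, whereas a code such as {AC, CA} fails the test because its graph contains the cycle A -> C -> A.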
4
Abstract
In this paper, we explain why the chaotic mutation (CM) model of J. M. Bahi and C. Michel (2008) simulates gene mutations over time with good accuracy. It is first shown that the CM model is truly chaotic in the sense of Devaney. It is then established that the mutations occurring in genes indeed have the same chaotic dynamics, which makes the use of chaotic models relevant for genome evolution. Transposition and inversion dynamics are finally investigated.
Affiliation(s)
- Jacques M. Bahi
- Institut FEMTO-ST, Université Bourgogne Franche-Comté, 25000 Besançon, France
- Christophe Guyeux
- Institut FEMTO-ST, Université Bourgogne Franche-Comté, 25000 Besançon, France
- Antoine Perasso
- UMR Chrono-Environnement, Université Bourgogne Franche-Comté, 25000 Besançon, France
5
Benard E, Lèbre S, Michel CJ. Genome evolution by transformation, expansion and contraction (GETEC). Biosystems 2015; 135:15-34. [PMID: 26135206 DOI: 10.1016/j.biosystems.2015.05.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/16/2014] [Revised: 05/04/2015] [Accepted: 05/21/2015] [Indexed: 10/23/2022]
Abstract
We propose here the GETEC (Genome Evolution by Transformation, Expansion and Contraction) model of gene evolution based on substitution, insertion and deletion of genetic motifs. The GETEC model unifies two classes of evolution models: models of substitution, insertion and deletion of nucleotides as a function of time (Lèbre and Michel, 2010) and sequence length (Lèbre and Michel, 2012), and models of symmetric substitution of genetic motifs as a function of time (Benard and Michel, 2011). Evolution of genetic motifs based on substitution, insertion and deletion is modeled by a differential equation whose analytical solutions give an expression of the genetic motif occurrence probabilities as a function of time or sequence length, in either the direct time direction (past-present) or the inverse time direction (present-past). Evolution models with "substitution only", i.e. without insertion and deletion, and with "insertion and deletion only", i.e. without substitution, are particular cases of the GETEC model. We have also developed research software for computing the analytical solutions of the GETEC model. It is freely accessible at http://icube-bioinfo.u-strasbg.fr/webMathematica/GETEC/ or via the web site http://dpt-info.u-strasbg.fr/∼michel/.
Affiliation(s)
- Emmanuel Benard
- Theoretical Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Sophie Lèbre
- Theoretical Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Christian J Michel
- Theoretical Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
6
Michel CJ. The maximal C(3) self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses. J Theor Biol 2015; 380:156-77. [PMID: 25934352 DOI: 10.1016/j.jtbi.2015.04.009] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 08/26/2014] [Revised: 02/28/2015] [Accepted: 04/09/2015] [Indexed: 11/28/2022]
Abstract
In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes which has on average the highest occurrence in the reading frame compared to the two shifted frames (Arquès and Michel, 1996). Furthermore, this set X has an interesting mathematical property, as X is a maximal C(3) self-complementary trinucleotide circular code (Arquès and Michel, 1996). By 2014, the number of trinucleotides in prokaryotic genes had been multiplied by a factor of 527. Furthermore, two new gene kingdoms, plasmids and viruses, contain enough trinucleotide data to be analysed. The approach used in 1996 for identifying a preferential frame for a trinucleotide is quantified here with a new definition analysing the occurrence probability of a complementary/permutation (CP) trinucleotide set in a gene kingdom. Furthermore, in order to increase the statistical significance of results compared to those of 1996, the circular code X is studied on several gene taxonomic groups in a kingdom. Based on this new statistical approach, the circular code X is strengthened in genes of prokaryotes and eukaryotes, and now also identified in genes of plasmids. A subset of X with 18 or 16 trinucleotides is identified in genes of viruses. Furthermore, a simple probabilistic model based on the independent occurrence of trinucleotides in the reading frame of genes explains the circular code frequencies and asymmetries observed in the shifted frames in all studied gene kingdoms. Finally, the developed approach makes it possible to identify variant X codes in genes, i.e. trinucleotide codes which differ from X. In genes of bacteria, eukaryotes and plasmids, 14 among the 47 studied gene taxonomic groups (about 30%) have variant X codes. Seven variant X codes are identified with at least 16 trinucleotides of X. Two variant X codes, XA (in cyanobacteria and plasmids of cyanobacteria) and XD (in birds), are self-complementary and without permuted trinucleotides, but non-circular. Five variant X codes, XB (in deinococcus, plasmids of chloroflexi and deinococcus, mammals and kinetoplasts), XC (in elusimicrobia and apicomplexans), XE (in fishes), XF (in insects) and XG (in basidiomycetes and plasmids of spirochaetes), are C(3) self-complementary circular codes. In genes of viruses, no variant X code is found.
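The properties named in this abstract (self-complementary, circular, C(3)) can be checked mechanically for a given trinucleotide set. The sketch below assumes the graph criterion of Fimmel et al. (2016) in the following form: each word b1b2b3 contributes the edges b1 -> b2b3 and b1b2 -> b3, and the code is circular exactly when the graph has no directed cycle. The set X is the one published by Arquès and Michel (1996); all function names are hypothetical.

```python
# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Complement each base, then reverse the word."""
    return word.translate(_COMPLEMENT)[::-1]


def is_self_complementary(code):
    """True if the code is closed under reverse complementation."""
    return {reverse_complement(w) for w in code} == set(code)


def shift(code):
    """Circular permutation b1b2b3 -> b2b3b1 applied to every word."""
    return frozenset(w[1:] + w[0] for w in code)


def is_circular(code):
    """Graph criterion (assumed): circular iff the graph is acyclic."""
    graph = {}
    for w in code:
        for i in range(1, len(w)):
            graph.setdefault(w[:i], set()).add(w[i:])
            graph.setdefault(w[i:], set())
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def has_cycle(v):
        state[v] = 1
        for nxt in graph[v]:
            if state.get(nxt) == 1:
                return True  # back edge: directed cycle found
            if nxt not in state and has_cycle(nxt):
                return True
        state[v] = 2
        return False

    return not any(v not in state and has_cycle(v) for v in graph)


def is_c3(code):
    """True if the code and its two circular permutations are circular."""
    x1 = shift(code)
    x2 = shift(x1)
    return is_circular(code) and is_circular(x1) and is_circular(x2)
```

Applied to X, both `is_self_complementary(X)` and `is_c3(X)` hold, and X together with its two shifted codes covers 60 trinucleotides, i.e. all 64 except AAA, CCC, GGG and TTT. The same functions could be applied to a candidate variant code, although the variant codes XA to XG themselves are not listed in this abstract.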
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
7
Michel CJ. An extended genetic scale of reading frame coding. J Theor Biol 2014; 365:164-74. [PMID: 25311909 DOI: 10.1016/j.jtbi.2014.09.040] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/22/2014] [Revised: 08/29/2014] [Accepted: 09/30/2014] [Indexed: 11/29/2022]
Abstract
The reading frame coding (RFC) of codes (sets) of trinucleotides is a genetic concept which has been largely ignored during the last 50 years. An extended definition of the statistical parameter PrRFC (Michel, 2014) is proposed here for analysing the probability (efficiency) of reading frame coding of the usage of any trinucleotide code. It is applied to the analysis of the RFC efficiency of the usage of the C(3) self-complementary trinucleotide circular code X identified in prokaryotic and eukaryotic genes (Arquès and Michel, 1996). The usage of X is called usage XU. The highest RFC probabilities of usage XU are identified in bacterial plasmids and bacteria (about 49.0%). Then, by decreasing values, the RFC probabilities of usage XU are observed in archaea (47.5%), viruses (45.4%) and nuclear eukaryotes (42.8%). The lowest RFC probabilities of usage XU are found in mitochondria and chloroplasts (about 36.5%). Thus, genes contain information for reading frame coding. Such a genetic property, which to our knowledge has never been identified, may bring new insights into the origin and evolution of the genetic code.
Affiliation(s)
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
8
Michel CJ. Circular code motifs in transfer RNAs. Comput Biol Chem 2013; 45:17-29. [DOI: 10.1016/j.compbiolchem.2013.02.004] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 01/14/2013] [Accepted: 02/28/2013] [Indexed: 10/27/2022]