1
Michel CJ. Circular code identified by the codon usage. Biosystems 2024; 244:105308. [PMID: 39159879 DOI: 10.1016/j.biosystems.2024.105308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Revised: 08/04/2024] [Accepted: 08/13/2024] [Indexed: 08/21/2024]
Abstract
Since 1996, circular codes in genes have been identified thanks to the development of 6 statistical approaches: trinucleotide frequencies per frame (Arquès and Michel, 1996), correlation functions per frame (Arquès and Michel, 1997), frame permuted trinucleotide frequencies (Frey and Michel, 2003, 2006), advanced statistical functions at the gene population level (Michel, 2015) and at the gene level (Michel, 2017). All these 3-frame statistical methods analyse the trinucleotide information in the 3 frames of genes: the reading frame and the 2 shifted frames. Notably, codon usage does not allow for the identification of circular codes (Michel, 2020). This has been a long-standing problem since 1996, hindering biologists' access to circular code theory. By considering circular code conditions resulting from code theory, particularly the concept of permutation class, and building upon previous statistical work, a new statistical approach based solely on the codon usage, i.e. a 1-frame statistical method, surprisingly reveals the maximal C3 self-complementary trinucleotide circular code X in bacterial genes and in average (bacterial, archaeal, eukaryotic) genes, and almost in archaeal genes. Additionally, a new parameter definition indicates that bacterial and archaeal genes exhibit codon usage dispersion of the same order of magnitude, but significantly lower than that observed in eukaryotic genes. This statistical finding may explain the greater variability of codes in eukaryotic genes compared to bacterial and archaeal genes, an issue that has been open for many years. Finally, biologists can now search for new (variant) circular codes at both the genome level (across all genes in a given genome) and the gene level using only codon usage, without the need for analysing the shifted frames.
Affiliation(s)
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, C.N.R.S., 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
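The permutation-class condition mentioned in the abstract above can be illustrated with a short Python sketch. It is not the statistical method of the paper: it only groups the 60 non-periodic trinucleotides into their 20 circular-permutation classes and, under the simplifying assumption that the most used codon of each class is retained, selects one codon per class. The names `permutation_class`, `permutation_classes`, `preferred_codons` and the counts in `toy_usage` are hypothetical.

```python
from itertools import product

BASES = "ACGT"

def permutation_class(codon):
    """Return the circular permutations of a trinucleotide as a sorted tuple."""
    return tuple(sorted({codon, codon[1:] + codon[0], codon[2] + codon[:2]}))

def permutation_classes():
    """Group the 60 trinucleotides other than AAA, CCC, GGG, TTT into classes of 3."""
    codons = ("".join(p) for p in product(BASES, repeat=3))
    return sorted({permutation_class(c) for c in codons if len(set(c)) > 1})

def preferred_codons(usage):
    """Pick, in each permutation class, the codon with the highest usage count.

    Codons missing from `usage` count as 0; ties keep the first codon in
    alphabetical order. This selection rule is an assumption of the sketch.
    """
    return {max(cls, key=lambda c: usage.get(c, 0)) for cls in permutation_classes()}

# Toy codon usage (hypothetical counts, not real data).
toy_usage = {"GAA": 30, "AAG": 12, "AGA": 5, "GTC": 22, "TCG": 9, "CGT": 7}
selected = preferred_codons(toy_usage)
print(len(permutation_classes()), len(selected))
print("GAA" in selected, "GTC" in selected)
```

Because a circular code can contain at most one trinucleotide per permutation class, any set selected this way has at most 20 codons; whether it is actually circular still has to be checked separately.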
2
The maximality of circular codes in genes statistically verified. Biosystems 2020; 197:104201. [DOI: 10.1016/j.biosystems.2020.104201] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/10/2020] [Revised: 06/22/2020] [Accepted: 06/22/2020] [Indexed: 11/18/2022]
3
Warthi G, Seligmann H. Transcripts with systematic nucleotide deletion of 1-12 nucleotide in human mitochondrion suggest potential non-canonical transcription. PLoS One 2019; 14:e0217356. [PMID: 31120958 PMCID: PMC6532905 DOI: 10.1371/journal.pone.0217356] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/14/2019] [Accepted: 05/09/2019] [Indexed: 11/22/2022] Open
Abstract
Raw transcriptomic data contain numerous RNA reads whose homology with template DNA doesn't match canonical transcription. Transcriptome analyses usually ignore such noncanonical RNA reads. Here, analyses search for noncanonical mitochondrial RNAs systematically deleting 1 to 12 nucleotides after each transcribed nucleotide triplet, producing deletion-RNAs (delRNAs). We detected delRNAs in the human whole cell and purified mitochondrial transcriptomes, and in Genbank's human EST database corresponding to systematic deletions of 1 to 12 nucleotides after each transcribed trinucleotide. DelRNAs detected in both transcriptomes mapped along with 55.63% of the EST delRNAs. A bias exists for delRNAs covering identical mitogenomic regions in both transcriptomic and EST datasets. Among 227 delRNAs detected in these 3 datasets, 81.1% and 8.4% of delRNAs were mapped on mitochondrial coding regions and hypervariable region 2 of the dloop, respectively. Del-transcription analyses of GenBank's EST database confirm observations from whole cell and purified mitochondrial transcriptomes, eliminating the possibility that detected delRNAs are false-positive matches, cytosolic DNA/RNA nuclear contamination or sequencing artefacts. These detected delRNAs are enriched in frameshift-inducing homopolymers and are poor in frameshift-preventing circular code codons (a set of 20 codons which regulate reading frame detection, over- and underrepresented in coding and other frames of genes, respectively), suggesting a motif-based regulation of non-canonical transcription. These findings show that rare non-canonical transcripts exist. Such non-canonical del-transcription increases mitochondrial coding potential and non-coding regulation of intracellular mechanisms, and could explain the dark DNA conundrum.
Affiliation(s)
- Ganesh Warthi
- Aix-Marseille Université, IRD, VITROME, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, France
- Hervé Seligmann
- Aix-Marseille Université, IRD, MEPHI, Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, Marseille, France
- The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel
4
Mathematical fundamentals for the noise immunity of the genetic code. Biosystems 2018; 164:186-198. [DOI: 10.1016/j.biosystems.2017.09.007] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 06/13/2017] [Revised: 09/07/2017] [Accepted: 09/08/2017] [Indexed: 01/05/2023]
5
El Houmami N, Seligmann H. Evolution of Nucleotide Punctuation Marks: From Structural to Linear Signals. Front Genet 2017; 8:36. [PMID: 28396681 PMCID: PMC5366352 DOI: 10.3389/fgene.2017.00036] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 10/14/2016] [Accepted: 03/13/2017] [Indexed: 01/13/2023] Open
Abstract
We present an evolutionary hypothesis assuming that signals marking nucleotide synthesis (DNA replication and RNA transcription) evolved from multi- to unidimensional structures, and were carried over from transcription to translation. This evolutionary scenario presumes that signals combining secondary and primary nucleotide structures are evolutionary transitions. Mitochondrial replication initiation fits this scenario. Some observations reported in the literature corroborate that several signals for nucleotide synthesis function in translation, and vice versa. (a) Polymerase-induced frameshift mutations occur preferentially at translational termination signals (nucleotide deletion is interpreted as termination of nucleotide polymerization, paralleling the role of stop codons in translation). (b) Stem-loop hairpin presence/absence modulates codon-amino acid assignments, showing that translational signals sometimes combine primary and secondary nucleotide structures (here codon and stem-loop). (c) Homopolymer nucleotide triplets (AAA, CCC, GGG, TTT) cause transcriptional and ribosomal frameshifts. Here we find in recently described human mitochondrial RNAs that systematically lack mono-, dinucleotides after each trinucleotide (delRNAs) that delRNA triplets include 2x more homopolymers than mitogenome regions not covered by delRNA. Further analyses of delRNAs show that the natural circular code X (a little-known group of 20 translational signals enabling ribosomal frame retrieval consisting of 20 codons {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} universally overrepresented in coding versus other frames of gene sequences), regulates frameshift in transcription and translation. This dual transcription and translation role confirms for X the hypothesis that translational signals were carried over from transcriptional signals.
Affiliation(s)
- Nawal El Houmami
- URMITE, Aix Marseille Université UM63, CNRS 7278, IRD 198, INSERM 1095, IHU - Méditerranée Infection Marseille, France
- Hervé Seligmann
- URMITE, Aix Marseille Université UM63, CNRS 7278, IRD 198, INSERM 1095, IHU - Méditerranée Infection Marseille, France
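The 20 codons of X listed in the abstract above can be checked mechanically. The following Python sketch tests self-complementarity directly and tests circularity with a graph criterion from the circular code literature: a trinucleotide code is circular exactly when the directed graph with edges b1 -> b2b3 and b1b2 -> b3, for each codon b1b2b3, has no cycle. The function names are hypothetical and the sketch is only an illustration, not code from any of the cited papers.

```python
X = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
     "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(codon):
    """Complement each base, then reverse the trinucleotide."""
    return codon.translate(COMPLEMENT)[::-1]

def is_self_complementary(code):
    """True if the reverse complement of every codon is also in the code."""
    return all(reverse_complement(c) in code for c in code)

def is_circular(code):
    """True if the graph with edges b1 -> b2b3 and b1b2 -> b3 is acyclic."""
    edges = {}
    for c in code:
        edges.setdefault(c[0], set()).add(c[1:])
        edges.setdefault(c[:2], set()).add(c[2])
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def has_cycle(v):
        state[v] = 1
        for w in edges.get(v, ()):
            if state.get(w) == 1 or (w not in state and has_cycle(w)):
                return True
        state[v] = 2
        return False

    return not any(v not in state and has_cycle(v) for v in list(edges))

def is_c3(code):
    """True if the code and its two circular permutations are all circular."""
    shift1 = {c[1:] + c[0] for c in code}
    shift2 = {c[2] + c[:2] for c in code}
    return is_circular(code) and is_circular(shift1) and is_circular(shift2)

print(len(X), is_self_complementary(X), is_circular(X), is_c3(X))
print(is_circular({"AAC", "ACA"}))  # two codons of one permutation class
```

The last line shows the opposite case: a set containing two circular permutations of the same trinucleotide yields a cycle in the graph and is therefore not circular.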
6
Codon Distribution in Error-Detecting Circular Codes. Life (Basel) 2016; 6:life6010014. [PMID: 26999215 PMCID: PMC4810245 DOI: 10.3390/life6010014] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/15/2015] [Revised: 02/24/2016] [Accepted: 03/10/2016] [Indexed: 11/17/2022] Open
Abstract
In 1957, Francis Crick et al. suggested an ingenious explanation for the process of frame maintenance. The idea was based on the notion of comma-free codes. Although Crick’s hypothesis proved to be wrong, in 1996, Arquès and Michel discovered the existence of a weaker version of such codes in eukaryote and prokaryote genomes, namely the so-called circular codes. Since then, circular code theory has invariably evoked great interest and made significant progress. In this article, the codon distributions in maximal comma-free, maximal self-complementary C3 and maximal self-complementary circular codes are discussed, i.e., we investigate in how many of such codes a given codon participates. As the main (and surprising) result, it is shown that the codons can be separated into very few classes (three, or five, or six) with respect to their frequency. Moreover, the distribution classes can be hierarchically ordered as refinements from maximal comma-free codes via maximal self-complementary C3 codes to maximal self-complementary circular codes.
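The difference between comma-free codes and the weaker circular codes mentioned above can be made concrete with a small Python sketch. It assumes the usual definition of a comma-free trinucleotide code: for any two codons u and v of the code, neither of the two shifted-frame trinucleotides of the concatenation uv belongs to the code. The function names are hypothetical, and the 20-codon set is the circular code X as listed in the El Houmami and Seligmann abstract on this page.

```python
X = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
     "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}

def comma_free_violations(code):
    """List (u, v, w) where codon w of the code occurs in a shifted frame of u + v."""
    found = []
    for u in sorted(code):
        for v in sorted(code):
            pair = u + v
            for shifted in (pair[1:4], pair[2:5]):
                if shifted in code:
                    found.append((u, v, shifted))
    return found

def is_comma_free(code):
    """True if no codon of the code appears in a shifted frame of two codons."""
    return not comma_free_violations(code)

print(is_comma_free({"AAC", "GTT"}))   # a small comma-free set
print(is_comma_free(X))                # X is circular but not comma-free
print(("AAC", "CAG", "ACC") in comma_free_violations(X))
```

In the concatenation AAC CAG, the shifted trinucleotide ACC is itself a codon of X, so X fails the comma-free condition even though it is a circular code.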
7
Fimmel E, Giannerini S, Gonzalez DL, Strüngmann L. Dinucleotide circular codes and bijective transformations. J Theor Biol 2015; 386:159-65. [PMID: 26423358 DOI: 10.1016/j.jtbi.2015.08.034] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 01/20/2015] [Revised: 07/30/2015] [Accepted: 08/29/2015] [Indexed: 11/20/2022]
Abstract
The presence of circular codes in mRNA coding sequences is postulated to be involved in informational mechanisms aimed at detecting and maintaining the normal reading frame during protein synthesis. Most of the recent research is focused on trinucleotide circular codes. However, dinucleotide circular codes are also important, since dinucleotides are ubiquitous in genomes and associated with important biological functions. In this work we adopt the group theoretic approach used for trinucleotide codes in Fimmel et al. (2015) to study dinucleotide circular codes and highlight their symmetry properties. Moreover, we characterize such codes in terms of n-circularity and provide a graph representation that allows them to be visualized geometrically. The results establish a theoretical framework for the study of the biological implications of dinucleotide circular codes in genomic sequences.
Affiliation(s)
- Elena Fimmel
- Institute for Mathematical Biology, Faculty of Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Simone Giannerini
- Department of Statistical Sciences, University of Bologna, 40126, Bologna, Italy.
- Diego Luis Gonzalez
- CNR-IMM, Sezione di Bologna, Via Gobetti 101, I-40129, Bologna, Italia; Department of Statistical Sciences, University of Bologna, 40126, Bologna, Italy.
- Lutz Strüngmann
- Institute for Mathematical Biology, Faculty of Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
8
Michel CJ. The maximal C(3) self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses. J Theor Biol 2015; 380:156-77. [PMID: 25934352 DOI: 10.1016/j.jtbi.2015.04.009] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 08/26/2014] [Revised: 02/28/2015] [Accepted: 04/09/2015] [Indexed: 11/28/2022]
Abstract
In 1996, a set X of 20 trinucleotides is identified in genes of both prokaryotes and eukaryotes which has on average the highest occurrence in reading frame compared to the two shifted frames (Arquès and Michel, 1996). Furthermore, this set X has an interesting mathematical property as X is a maximal C(3) self-complementary trinucleotide circular code (Arquès and Michel, 1996). In 2014, the number of trinucleotides in prokaryotic genes has been multiplied by a factor of 527. Furthermore, two new gene kingdoms of plasmids and viruses contain enough trinucleotide data to be analysed. The approach used in 1996 for identifying a preferential frame for a trinucleotide is quantified here with a new definition analysing the occurrence probability of a complementary/permutation (CP) trinucleotide set in a gene kingdom. Furthermore, in order to increase the statistical significance of results compared to those of 1996, the circular code X is studied on several gene taxonomic groups in a kingdom. Based on this new statistical approach, the circular code X is strengthened in genes of prokaryotes and eukaryotes, and now also identified in genes of plasmids. A subset of X with 18 or 16 trinucleotides is identified in genes of viruses. Furthermore, a simple probabilistic model based on the independent occurrence of trinucleotides in reading frame of genes explains the circular code frequencies and asymmetries observed in the shifted frames in all studied gene kingdoms. Finally, the developed approach allows the identification of variant X codes in genes, i.e. trinucleotide codes which differ from X. In genes of bacteria, eukaryotes and plasmids, 14 among the 47 studied gene taxonomic groups (about 30%) have variant X codes. Seven variant X codes are identified with at least 16 trinucleotides of X. Two variant X codes, XA in cyanobacteria and plasmids of cyanobacteria, and XD in birds, are self-complementary, without permuted trinucleotides but non-circular. Five variant X codes, XB in deinococcus, plasmids of chloroflexi and deinococcus, mammals and kinetoplasts, XC in elusimicrobia and apicomplexans, XE in fishes, XF in insects, and XG in basidiomycetes and plasmids of spirochaetes, are C(3) self-complementary circular. In genes of viruses, no variant X code is found.
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
9
Michel CJ. An extended genetic scale of reading frame coding. J Theor Biol 2014; 365:164-74. [PMID: 25311909 DOI: 10.1016/j.jtbi.2014.09.040] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/22/2014] [Revised: 08/29/2014] [Accepted: 09/30/2014] [Indexed: 11/29/2022]
Abstract
The reading frame coding (RFC) of codes (sets) of trinucleotides is a genetic concept which has been largely ignored during the last 50 years. An extended definition of the statistical parameter PrRFC (Michel, 2014) is proposed here for analysing the probability (efficiency) of reading frame coding of usage of any trinucleotide code. It is applied to the analysis of the RFC efficiency of usage of the C(3) self-complementary trinucleotide circular code X identified in prokaryotic and eukaryotic genes (Arquès and Michel, 1996). The usage of X is called usage XU. The highest RFC probabilities of usage XU are identified in bacterial plasmids and bacteria (about 49.0%). Then, by decreasing values, the RFC probabilities of usage XU are observed in archaea (47.5%), viruses (45.4%) and nuclear eukaryotes (42.8%). The lowest RFC probabilities of usage XU are found in mitochondria and chloroplasts (about 36.5%). Thus, genes contain information for reading frame coding. Such a genetic property, which to our knowledge has never been identified, may bring new insights into the origin and evolution of the genetic code.
Affiliation(s)
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
10
Michel CJ. A genetic scale of reading frame coding. J Theor Biol 2014; 355:83-94. [DOI: 10.1016/j.jtbi.2014.03.029] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/06/2013] [Revised: 03/18/2014] [Accepted: 03/18/2014] [Indexed: 11/27/2022]
11
Fimmel E, Giannerini S, Gonzalez DL, Strüngmann L. Circular codes, symmetries and transformations. J Math Biol 2014; 70:1623-44. [PMID: 25008961 DOI: 10.1007/s00285-014-0806-7] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 10/09/2013] [Revised: 02/19/2014] [Indexed: 11/29/2022]
Abstract
Circular codes, putative remnants of primeval comma-free codes, have gained considerable attention in recent years. In fact they represent a second kind of genetic code potentially involved in detecting and maintaining the normal reading frame in protein coding sequences. The discovery of a universal code across species raised many theoretical and experimental questions. However, there is a key aspect that relates circular codes to symmetries and transformations that remains to a large extent unexplored. In this article we aim at addressing the issue by studying the symmetries and transformations that connect different circular codes. The main result is that the class of 216 C3 maximal self-complementary codes can be partitioned into 27 equivalence classes defined by a particular set of transformations. We show that such transformations can be put in a group theoretic framework with an intuitive geometric interpretation. More general mathematical results about symmetry transformations which are valid for any kind of circular codes are also presented. Our results pave the way to the study of the biological consequences of the mathematical structure behind circular codes and help shed light on the evolutionary steps that led to the observed symmetries of present codes.
Affiliation(s)
- Elena Fimmel
- Faculty of Computer Sciences, Institute of Applied Mathematics, Mannheim University of Applied Sciences, 68163 , Mannheim, Germany,
12
Gonzalez D, Giannerini S, Rosa R. Circular codes revisited: A statistical approach. J Theor Biol 2011; 275:21-8. [DOI: 10.1016/j.jtbi.2011.01.028] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 09/01/2010] [Revised: 01/18/2011] [Accepted: 01/19/2011] [Indexed: 11/29/2022]
13
Michel CJ. An analytical model of gene evolution with 9 mutation parameters: an application to the amino acids coded by the common circular code. Bull Math Biol 2006; 69:677-98. [PMID: 16952018 DOI: 10.1007/s11538-006-9147-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 01/17/2006] [Accepted: 05/31/2006] [Indexed: 10/24/2022]
Abstract
We develop here an analytical evolutionary model based on a trinucleotide mutation matrix 64 x 64 with nine substitution parameters associated with the three types of substitutions in the three trinucleotide sites. It generalizes the previous models based on the nucleotide mutation matrices 4 x 4 and the trinucleotide mutation matrix 64 x 64 with three and six parameters. It determines at some time t the exact occurrence probabilities of trinucleotides mutating randomly according to these nine substitution parameters. An application of this model allows an evolutionary study of the common circular code [Formula: see text] of eukaryotes and prokaryotes and its 12 coded amino acids. The main property of this code [Formula: see text] is the retrieval of the reading frames in genes, both locally, i.e. anywhere in genes and in particular without a start codon, and automatically with a window of a few nucleotides. However, since its identification in 1996, amino acid information coded by [Formula: see text] has never been studied. Very unexpectedly, this evolutionary model demonstrates that random substitutions in this code [Formula: see text] and with particular values for the nine substitution parameters retrieve after a certain time of evolution a frequency distribution of these 12 amino acids very close to the one coded by the actual genes.
Affiliation(s)
- Christian J Michel
- Equipe de Bioinformatique Théorique, LSIIT (UMR CNRS-ULP 7005), Université Louis Pasteur de Strasbourg, Pôle API, Boulevard Sébastien Brant, 67400 Illkirch, France.
14
Frey G, Michel CJ. An analytical model of gene evolution with six mutation parameters: an application to archaeal circular codes. Comput Biol Chem 2006; 30:1-11. [PMID: 16324886 DOI: 10.1016/j.compbiolchem.2005.09.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 08/26/2005] [Revised: 09/04/2005] [Accepted: 09/05/2005] [Indexed: 11/17/2022]
Abstract
We develop here an analytical evolutionary model based on a trinucleotide mutation matrix 64 x 64 with six substitution parameters associated with the transitions and transversions in the three trinucleotide sites. It generalizes the previous models based on the nucleotide mutation matrices 4 x 4 and the trinucleotide mutation matrix 64 x 64 with three parameters. It determines at some time t the exact occurrence probabilities of trinucleotides mutating randomly according to six substitution parameters. An application of this model allows an evolutionary study of the common circular code COM and the 15 archaeal circular codes X which have been recently identified in several archaeal genomes. The main property of a circular code is the retrieval of the reading frames in genes, both locally, i.e. anywhere in genes and in particular without a start codon, and automatically with a window of a few nucleotides. In genes, the circular code is superimposed on the traditional genetic one. Very unexpectedly, the evolutionary model demonstrates that the archaeal circular codes can derive from the common circular code subjected to random substitutions with particular values for six substitution parameters. It has a strong correlation with the statistical observations of three archaeal codes in actual genes. Furthermore, the properties of these substitution rates allow proposal of an evolutionary classification of the 15 archaeal codes into three main classes according to this model. In almost all the cases, they agree with the actual degeneracy of the genetic code with substitutions more frequent in the third trinucleotide site and with transitions more frequent than transversions in any trinucleotide site.
Affiliation(s)
- Gabriel Frey
- Equipe de Bioinformatique Théorique, LSIIT (UMR CNRS-ULP 7005), Université Louis Pasteur de Strasbourg, Pôle API, Boulevard Sébastien Brant, 67400 Illkirch, France.
15
Frey G, Michel CJ. Identification of circular codes in bacterial genomes and their use in a factorization method for retrieving the reading frames of genes. Comput Biol Chem 2006; 30:87-101. [PMID: 16439185 DOI: 10.1016/j.compbiolchem.2005.11.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 09/30/2005] [Revised: 11/07/2005] [Accepted: 11/07/2005] [Indexed: 10/25/2022]
Abstract
We developed a statistical method that allows each trinucleotide to be associated with a unique frame among the three possible ones in a (protein coding) gene. An extensive gene study in 175 complete bacterial genomes based on this statistical approach resulted in identification of 72 new circular codes. Finding a circular code enables an immediate retrieval of the reading frame locally anywhere in a gene. No knowledge of location of the start codon is required and a short window of only a few nucleotides is sufficient for automatic retrieval. We have therefore developed a factorization method (that explores previously found circular codes) for retrieving the reading frames of bacterial genes. Its principle is new and easy to understand. Neither complex treatment nor specific information on the nucleotide sequences is necessary. Moreover, the method can be used for short regions in nucleotide sequences (less than 25 nucleotides in protein coding genes). Selected additional properties of circular codes and their possible biological consequences are also discussed.
Affiliation(s)
- Gabriel Frey
- Equipe de Bioinformatique Théorique, LSIIT (UMR CNRS-ULP 7005), Université Louis Pasteur de Strasbourg, Pôle API, Boulevard Sébastien Brant, 67400 Illkirch, France.
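The local frame retrieval described in the abstract above can be illustrated with a toy Python sketch. It is not the factorization method of Frey and Michel, whose details are not given on this page: it simply tries the three possible frames of a fragment and keeps those in which every complete trinucleotide belongs to the code. The function name `candidate_frames` and the example sequence are hypothetical, and the code used is the common circular code X as listed in the El Houmami and Seligmann abstract on this page.

```python
X = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
     "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}

def candidate_frames(fragment, code):
    """Return the frames (0, 1, 2) in which all complete trinucleotides are in the code."""
    frames = []
    for frame in range(3):
        codons = [fragment[i:i + 3] for i in range(frame, len(fragment) - 2, 3)]
        if codons and all(c in code for c in codons):
            frames.append(frame)
    return frames

# A word built from codons of X: GAA TTC GCC AAC GGT CTG.
gene = "GAATTCGCCAACGGTCTG"
# Drop the first nucleotide so that the start of the reading frame is unknown.
fragment = gene[1:]
print(candidate_frames(fragment, X))
```

For this fragment only one frame survives, the one that realigns with the original codons, and no start codon is needed to find it. On real gene sequences, which are not pure concatenations of codons of X, a tolerant variant (for example counting code trinucleotides per frame) would be needed; that extension is left out of this sketch.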
16
Abstract
A new statistical method associating each trinucleotide with a frame is developed for identifying circular codes. Its sensitivity allows the detection of several circular codes in the (protein coding) genes of archaeal genomes. Several properties of these circular codes are described, in particular the lengths of the minimal windows to retrieve the construction frames, a new definition of a parameter for measuring some probabilities of words generated by the circular codes, and the types of nucleotides in the trinucleotide sites. Some biological consequences are presented in the Discussion.
Affiliation(s)
- Gabriel Frey
- Equipe de Bioinformatique Théorique, LSIIT, UMR CNRS-ULP 7005, Université Louis Pasteur de Strasbourg, Pôle API, Boulevard Sébastien Brant, 67400 Illkirch, France.