1
Michel CJ. Circular code in introns. Biosystems 2024; 239:105215. [PMID: 38641199 DOI: 10.1016/j.biosystems.2024.105215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/21/2024]
Abstract
A massive statistical analysis based on the autocorrelation function of the circular code X observed in genes is performed on the (eukaryotic) introns. Surprisingly, a circular code periodicity 0 modulo 3 is identified in 5 groups of introns: birds, ascomycetes, basidiomycetes, green algae and land plants. This circular code periodicity, which is a property of retrieving the reading frame in (protein coding) genes, may suggest that these introns have a coding property. As is well known, a periodicity 1 modulo 2 is observed in 6 groups of introns: amphibians, fishes, mammals, other animals, reptiles and apicomplexans. A mixed periodicity modulo 2 and 3 is found in the introns of insects. Astonishingly, a subperiodicity 3 modulo 6 is a common statistical property in these 3 classes of introns. When the particular trinucleotides N1N2N1 of the circular code X are not considered, the circular code periodicity 0 modulo 3, hidden by the periodicity 1 modulo 2, is now retrieved in 5 groups of introns: amphibians, fishes, other animals, reptiles and insects. Thus, 10 groups of introns, taxonomically different, out of 12 have a coding property related to the reading frame retrieval. The trinucleotides N1N2N1 are analysed in the 216 maximal C3 self-complementary trinucleotide circular codes. A hexanucleotide code (words of 6 letters) is proposed to explain the periodicity 3 modulo 6. It could be a trace of more general circular codes at the origin of the circular code X.
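The idea of an autocorrelation of the circular code X, as described in the abstract above, can be illustrated with a minimal Python sketch. It is a simplified stand-in and not the function used in the paper: for each distance i, it estimates how often a trinucleotide of X is followed, i nucleotides later, by another trinucleotide of X. The set `X_CODE` is the 20-trinucleotide code X as commonly listed in the circular code literature (it is not given in this abstract), and the name `x_autocorrelation` is hypothetical.

```python
# The 20 trinucleotides of the circular code X (assumed from the literature).
X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def x_autocorrelation(seq, max_distance):
    """Return, for i = 0..max_distance, the fraction of X trinucleotides
    that are followed by an X trinucleotide after a gap of i nucleotides."""
    seq = seq.upper()
    n = len(seq)
    # in_x[p] is True when the trinucleotide starting at position p is in X.
    in_x = [seq[p:p + 3] in X_CODE for p in range(n - 2)]
    result = []
    for i in range(max_distance + 1):
        total = hits = 0
        # The second trinucleotide starts at p + 3 + i and must fit in seq.
        for p in range(n - 5 - i):
            if in_x[p]:
                total += 1
                hits += in_x[p + 3 + i]
        result.append(hits / total if total else 0.0)
    return result


# Toy sequence in which X trinucleotides occur only in one frame.
print(x_autocorrelation("GAT" * 10, 3))  # [1.0, 0.0, 0.0, 1.0]
```

On this toy sequence the values peak at distances 0 and 3, which is the kind of signal a periodicity 0 modulo 3 refers to; real introns would of course give much noisier curves.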
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
2
Michel CJ, Sereni JS. Reading Frame Retrieval of Genes: A New Parameter of Codon Usage Based on the Circular Code Theory. Bull Math Biol 2023; 85:24. [PMID: 36826719 PMCID: PMC9950712 DOI: 10.1007/s11538-023-01129-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/25/2022] [Accepted: 01/26/2023] [Indexed: 02/25/2023]
Abstract
Based on the circular code theory, we define a new function f that quantifies the property of reading frame retrieval (RFR) of genes from their codon usage. This RFR function f is computed on a massive scale in genes of genomes of bacteria, eukaryotes and archaea. By expressing f as a function of the mean number [Formula: see text] of codons per gene, a "universal" property is identified, whatever the kingdom: the reading frame retrieval is enhanced in large genes. By investigating this property according to the theory developed, a Spearman's rank correlation with a strong negative coefficient is observed between the codon usage dispersion d (from the uniform codon distribution [Formula: see text]) and the RFR function f, whatever the kingdom (p-values [Formula: see text] in bacteria, [Formula: see text] in eukaryotes and [Formula: see text] in archaea). Thus, the reading frame retrieval is enhanced with the codon usage dispersion. Furthermore, this approach identifies a "genome centre" from which emerge two distinct "genome arms": an upper arm and a lower arm, respectively, above and below the linear regression. The RFR function by itself or combined with classical methods (alignment, phylogeny) could also be a new approach to classify the genomes in the future.
Affiliation(s)
- Christian J. Michel
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
- Jean-Sébastien Sereni
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
3
Michel CJ, Thompson JD. Identification of a circular code periodicity in the bacterial ribosome: origin of codon periodicity in genes? RNA Biol 2020; 17:571-583. [PMID: 31960748 DOI: 10.1080/15476286.2020.1719311] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 02/09/2023]
Abstract
Three-base periodicity (TBP), where nucleotides and higher order n-tuples are preferentially spaced by 3, 6, 9, etc. bases, is a well-known intrinsic property of protein-coding DNA sequences. However, its origins are still not fully understood. One hypothesis is that the periodicity reflects a primordial coding system that was used before the emergence of the modern standard genetic code (SGC). Recent evidence suggests that the X circular code, a set of 20 trinucleotides allowing the reading frames in genes to be retrieved locally, represents a possible ancestor of the SGC. Motifs from the X circular code have been found in the reading frame of protein-coding regions in extant organisms from bacteria to eukaryotes, in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase centre and the decoding centre. Here, we have used a powerful correlation function to search for periodicity patterns involving the 20 trinucleotides of the X circular code in a large set of bacterial protein-coding genes, as well as in the translation machinery, including rRNA and tRNA sequences. As might be expected, we found a strong circular code periodicity 0 modulo 3 in the protein-coding genes. More surprisingly, we also identified a similar circular code periodicity in a large region of the 16S rRNA. This region includes the 3' major domain corresponding to the primordial proto-ribosome decoding centre and containing numerous sites that interact with the tRNA and messenger RNA (mRNA) during translation. Furthermore, 3D structural analysis shows that the periodicity region surrounds the mRNA channel that lies between the head and the body of the SSU. Our results support the hypothesis that the X circular code may constitute an ancestral translation code involved in reading frame retrieval and maintenance, traces of which persist in modern mRNA, tRNA and rRNA despite their long evolution and adaptation to the SGC.
Affiliation(s)
- Christian J Michel
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Julie D Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
4
Fimmel E, Michel CJ, Pirot F, Sereni JS, Strüngmann L. Mixed circular codes. Math Biosci 2019; 317:108231. [PMID: 31325443 DOI: 10.1016/j.mbs.2019.108231] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/07/2018] [Revised: 07/16/2019] [Accepted: 07/17/2019] [Indexed: 12/11/2022]
Abstract
By an extensive statistical analysis in genes of bacteria, archaea, eukaryotes, plasmids and viruses, a maximal C3-self-complementary trinucleotide circular code has been found to have the highest average occurrence in the reading frame of the ribosome during translation. Circular codes may play an important role in maintaining the correct reading frame. On the other hand, as several evolutionary theories propose primeval codes based on dinucleotides, trinucleotides and tetranucleotides, mixed circular codes were investigated. By using a graph-theoretical approach of circular codes recently developed, we study mixed circular codes, which are the union of a dinucleotide circular code, a trinucleotide circular code and a tetranucleotide circular code. Maximal mixed circular codes of (di,tri)-nucleotides, (tri,tetra)-nucleotides and (di,tri,tetra)-nucleotides are constructed, respectively. In particular, we show that any maximal dinucleotide circular code of size 6 can be embedded into a maximal mixed (di,tri)-nucleotide circular code such that its trinucleotide component is a maximal C3-comma-free code. The growth function of self-complementary mixed circular codes of dinucleotides and trinucleotides is given. Self-complementary mixed circular codes could have been involved in primitive genetic processes.
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, Mannheim 68163, Germany.
- Christian J Michel
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, Illkirch 67400, France.
- François Pirot
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, Illkirch 67400, France; LORIA (Orpailleur) and Dept. of Mathematics, University of Lorraine and Radboud University, Vandœuvre-lès-Nancy, France and Nijmegen, Netherlands.
- Jean-Sébastien Sereni
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, Illkirch 67400, France.
- Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, Mannheim 68163, Germany.
5
Fimmel E, Gumbel M, Karpuzoglu A, Petoukhov S. On comparing composition principles of long DNA sequences with those of random ones. Biosystems 2019; 180:101-108. [DOI: 10.1016/j.biosystems.2019.04.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/11/2019] [Revised: 04/05/2019] [Accepted: 04/06/2019] [Indexed: 11/25/2022]
6
Diletter circular codes over finite alphabets. Math Biosci 2017; 294:120-129. [PMID: 29024747 DOI: 10.1016/j.mbs.2017.10.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/27/2017] [Revised: 08/26/2017] [Accepted: 10/08/2017] [Indexed: 11/22/2022]
Abstract
The graph approach of circular codes recently developed (Fimmel et al., 2016) allows here a detailed study of diletter circular codes over finite alphabets. A new class of circular codes is identified, strong comma-free codes. New theorems are proved with the diletter circular codes of maximal length in relation to (i) a characterisation of their graphs as acyclic tournaments; (ii) their explicit description; and (iii) the non-existence of other maximal diletter circular codes. The maximal lengths of paths in the graphs of the comma-free and strong comma-free codes are determined. Furthermore, for the first time, diletter circular codes are enumerated over finite alphabets. Biological consequences of dinucleotide circular codes are analysed with respect to their embedding in the trinucleotide circular code X identified in genes and to the periodicity modulo 2 observed in introns. An evolutionary hypothesis of circular codes is also proposed according to their combinatorial properties.
7
Fimmel E, Michel CJ, Strüngmann L. n-Nucleotide circular codes in graph theory. Philos Trans A Math Phys Eng Sci 2016; 374:rsta.2015.0058. [PMID: 26857680 DOI: 10.1098/rsta.2015.0058] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Accepted: 09/05/2015] [Indexed: 06/05/2023]
Abstract
The circular code theory proposes that genes are constituted of two trinucleotide codes: the classical genetic code with 61 trinucleotides for coding the 20 amino acids (except the three stop codons {TAA,TAG,TGA}) and a circular code based on 20 trinucleotides for retrieving, maintaining and synchronizing the reading frame. It relies on two main results: the identification of a maximal C(3) self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses (Michel 2015 J. Theor. Biol. 380, 156-177. (doi:10.1016/j.jtbi.2015.04.009); Arquès & Michel 1996 J. Theor. Biol. 182, 45-58. (doi:10.1006/jtbi.1996.0142)) and the finding of X circular code motifs in tRNAs and rRNAs, in particular in the ribosome decoding centre (Michel 2012 Comput. Biol. Chem. 37, 24-37. (doi:10.1016/j.compbiolchem.2011.10.002); El Soufi & Michel 2014 Comput. Biol. Chem. 52, 9-17. (doi:10.1016/j.compbiolchem.2014.08.001)). The universally conserved nucleotides A1492 and A1493 and the conserved nucleotide G530 are included in X circular code motifs. Recently, dinucleotide circular codes were also investigated (Michel & Pirillo 2013 ISRN Biomath. 2013, 538631. (doi:10.1155/2013/538631); Fimmel et al. 2015 J. Theor. Biol. 386, 159-165. (doi:10.1016/j.jtbi.2015.08.034)). As the genetic motifs of different lengths are ubiquitous in genes and genomes, we introduce a new approach based on graph theory to study in full generality n-nucleotide circular codes X, i.e. of length 2 (dinucleotide), 3 (trinucleotide), 4 (tetranucleotide), etc. Indeed, we prove that an n-nucleotide code X is circular if and only if the corresponding graph [Formula: see text] is acyclic. Moreover, the maximal length of a path in [Formula: see text] corresponds to the window of nucleotides in a sequence for detecting the correct reading frame. Finally, the graph theory of tournaments is applied to the study of dinucleotide circular codes. It is fully equivalent to the combinatorics theory (Michel & Pirillo 2013 ISRN Biomath. 2013, 538631. (doi:10.1155/2013/538631)) and the group theory (Fimmel et al. 2015 J. Theor. Biol. 386, 159-165. (doi:10.1016/j.jtbi.2015.08.034)) of dinucleotide circular codes, while its mathematical approach is simpler.
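The acyclicity criterion stated in this abstract can be illustrated with a minimal Python sketch. It assumes the graph construction in which every word contributes one edge from each proper prefix to the remaining suffix (for a trinucleotide N1N2N3: N1 → N2N3 and N1N2 → N3); the abstract itself does not spell out the construction, so treat this as an assumption. The names `code_graph` and `is_circular` are hypothetical, and `X_CODE` is the code X as commonly listed in the circular code literature.

```python
# The 20 trinucleotides of the circular code X (assumed from the literature).
X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def code_graph(code):
    """Build the directed graph of a code as an adjacency dict:
    each word adds an edge prefix -> suffix for every proper split."""
    edges = {}
    for word in code:
        for k in range(1, len(word)):
            edges.setdefault(word[:k], set()).add(word[k:])
            edges.setdefault(word[k:], set())  # make sure the suffix is a vertex
    return edges


def is_circular(code):
    """Return True when the graph of the code has no directed cycle."""
    graph = code_graph(code)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = dict.fromkeys(graph, WHITE)

    def has_cycle(v):
        state[v] = GREY
        for w in graph[v]:
            # A grey neighbour closes a cycle; a white one is explored further.
            if state[w] == GREY or (state[w] == WHITE and has_cycle(w)):
                return True
        state[v] = BLACK
        return False

    return not any(state[v] == WHITE and has_cycle(v) for v in graph)


print(is_circular(X_CODE))          # True
print(is_circular({"ACG", "CGA"}))  # False: A -> CG -> A is a cycle
```

The second call shows why two circular permutations of the same trinucleotide cannot belong to one circular code: their prefixes and suffixes form a cycle in the graph.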
Affiliation(s)
- Elena Fimmel
- Faculty for Computer Sciences, Institute of Mathematical Biology, Mannheim University of Applied Sciences, Mannheim 68163, Germany
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, Illkirch 67400, France
- Lutz Strüngmann
- Faculty for Computer Sciences, Institute of Mathematical Biology, Mannheim University of Applied Sciences, Mannheim 68163, Germany
8
Gonzalez D, Giannerini S, Rosa R. Circular codes revisited: A statistical approach. J Theor Biol 2011; 275:21-8. [DOI: 10.1016/j.jtbi.2011.01.028] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 09/01/2010] [Revised: 01/18/2011] [Accepted: 01/19/2011] [Indexed: 11/29/2022]
9
Michel CJ. Evolution probabilities and phylogenetic distance of dinucleotides. J Theor Biol 2007; 249:271-7. [PMID: 17884102 DOI: 10.1016/j.jtbi.2007.07.032] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/18/2007] [Revised: 07/18/2007] [Accepted: 07/20/2007] [Indexed: 11/15/2022]
Abstract
We develop here an analytical evolution model based on a 16 x 16 dinucleotide mutation matrix with six substitution parameters associated with the three types of substitutions in the two dinucleotide sites. It generalizes the previous models based on the 4 x 4 nucleotide mutation matrices. It determines at some time t the exact occurrence probabilities of dinucleotides mutating randomly according to these six substitution parameters. Furthermore, several properties and two applications of this model allow the derivation of 16 evolutionary analytical solutions of dinucleotides and also of a dinucleotide phylogenetic distance. Finally, based on this mathematical model, the SED (Stochastic Evolution of Dinucleotides) web server has been developed for deriving evolutionary analytical solutions of dinucleotides.
Affiliation(s)
- Christian J Michel
- Equipe de Bioinformatique Théorique, LSIIT (UMR CNRS-ULP 7005), Université Louis Pasteur de Strasbourg, Pôle API, Boulevard Sébastien Brant, 67400 Illkirch, France.
10
Laskin AA, Kudryashov NA, Skryabin KG, Korotkov EV. Latent periodicity of serine-threonine and tyrosine protein kinases and other protein families. Comput Biol Chem 2005; 29:229-43. [PMID: 15979043 DOI: 10.1016/j.compbiolchem.2005.04.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/27/2004] [Revised: 04/18/2005] [Accepted: 04/18/2005] [Indexed: 11/22/2022]
Abstract
We identified latent periodicity in catalytic domains of approximately 85% of annotated serine-threonine and tyrosine protein kinases. Similar results were obtained for 22 other protein families and domains. We also designed the method of noise decomposition, which aims to distinguish between different periodicity types of the same period length. The method is to be used in conjunction with the method of cyclic profile alignment, and this combination is able to reveal structure-related or function-related patterns of latent periodicity. Possible origins of the periodic structure of protein kinase active sites are discussed. In summary, we presume that latent periodicity is a common property of many catalytic protein domains.
Affiliation(s)
- Andrew A Laskin
- Bioengineering Center of Russian Academy of Sciences, Prospect 60-tya Oktyabrya, 7/1, 117312 Moscow, Russia.
11
Xu R, Xiao Y. A common sequence-associated physicochemical feature for proteins of beta-trefoil family. Comput Biol Chem 2005; 29:79-82. [PMID: 15680588 DOI: 10.1016/j.compbiolchem.2004.12.003] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 11/12/2004] [Revised: 12/10/2004] [Accepted: 12/10/2004] [Indexed: 11/26/2022]
Abstract
Different amino acid sequences can fold into similar tertiary structures but the reasons for it are not very clear. It has been suggested in the literature that these sequences may have some common features associated with them but the exact nature of such shared properties remains largely unknown. We studied a representative sample of proteins from the beta-trefoil family and observed that their amino acid sequences, despite being considerably divergent from each other, can be accounted for by matching to a repetition of three physicochemically similar segments. This observation in turn is consistent with the three-fold pseudo-symmetry in tertiary structures of these proteins.
Affiliation(s)
- Ruizhen Xu
- Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
12
Vendramini D. Noncoding DNA and the teem theory of inheritance, emotions and innate behaviour. Med Hypotheses 2005; 64:512-9. [PMID: 15617858 DOI: 10.1016/j.mehy.2004.08.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/16/2004] [Accepted: 08/25/2004] [Indexed: 10/26/2022]
Abstract
The evolutionary function of noncoding 'junk' DNA remains one of the most challenging mysteries of genetics. Here a new model of DNA is proposed to explain this function. The hypothesis asserts the DNA molecule contains not one, but two separate modes of inheritance. In addition to exons that code for proteins and physical traits, it is argued noncoding repetitive elements code for the inheritance of emotions and innate behaviour in metazoans. That is to say, noncoding DNA functions as the medium of a second, hitherto unknown evolutionary process that genetically archives adaptive information, configured as emotions and acquired during the life of an organism, into an inheritable form. This second evolutionary process, here called 'Teemosis', is a selectionist process, but paradoxically, because it does not affect physical traits, it has no maladaptive Lamarckian consequences. The medical implications of the hypothesis are discussed.
13
Holste D, Grosse I, Buldyrev SV, Stanley HE, Herzel H. Optimization of coding potentials using positional dependence of nucleotide frequencies. J Theor Biol 2000; 206:525-37. [PMID: 11013113 DOI: 10.1006/jtbi.2000.2144] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
We study the coding potential of human DNA sequences, using the positional asymmetry function (D(p)) and the positional information function (I(q)). Both D(p) and I(q) are based on the positional dependence of single nucleotide frequencies. We investigate the accuracy of D(p) and I(q) in distinguishing coding and non-coding DNA as a function of the parameters p and q, respectively, and explore at which parameters p(opt) and q(opt) both D(p) and I(q) distinguish coding and non-coding DNA most accurately. We compare our findings with classically used parameter values and find that optimized coding potentials yield comparable accuracies as classical frame-independent coding potentials trained on prior data. We find that p(opt) and q(opt) vary only slightly with the sequence length.
Affiliation(s)
- D Holste
- Department of Theoretical Biophysics, Humboldt University Berlin, Invalidenstr. 42, D-10115, Berlin, Germany
14
Abstract
Recognition of function of newly sequenced DNA fragments is an important area of computational molecular biology. Here we present an extensive review of methods for prediction of functional sites, tRNA, and protein-coding genes and discuss possible further directions of research in this area.
Affiliation(s)
- M S Gelfand
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow region, Russia
15
Abstract
A number of methods for recognizing protein coding genes in DNA sequence have been published over the last 13 years, and new, more comprehensive algorithms, drawing on the repertoire of existing techniques, continue to be developed. To optimize continued development, it is valuable to systematically review and evaluate published techniques. At the core of most gene recognition algorithms is one or more coding measures--functions which produce, given any sample window of sequence, a number or vector intended to measure the degree to which a sample sequence resembles a window of 'typical' exonic DNA. In this paper we review and synthesize the underlying coding measures from published algorithms. A standardized benchmark is described, and each of the measures is evaluated according to this benchmark. Our main conclusion is that a very simple and obvious measure--counting oligomers--is more effective than any of the more sophisticated measures. Different measures contain different information. However there is a great deal of redundancy in the current suite of measures. We show that in future development of gene recognition algorithms, attention can probably be limited to six of the twenty or so measures proposed to date.
Affiliation(s)
- J W Fickett
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, NM 87545
16
Michel CJ. A study of the purine/pyrimidine codon occurrence with a reduced centered variable and an evaluation compared to the frequency statistic. Math Biosci 1989; 97:161-77. [PMID: 2520209 DOI: 10.1016/0025-5564(89)90003-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 01/01/2023]
Abstract
With the three-letter alphabet [R,Y,N] (R = purine, Y = pyrimidine, N = R or Y), there are 26 codons (NNN being excluded): RNN,...,NNY (six codons at two unspecified bases N), RRN,...,NYY (12 codons at one unspecified base N), RRR,...,YYY (eight specified codons). A statistical methodology that uses the codon frequency and a reduced centered variable leads to similar results for a codon occurrence study, regardless of gene function and regardless of a particular protein coding gene taxonomic population. Therefore, this variable can be considered a new codon usage index, whose use removes certain nonsignificant results found with the frequency statistic. This methodology identifies the common and rare codons (i.e., the codons having the highest and lowest occurrence) and leads to a model of codon evolution at three successive states: RNN, then RNY, and finally RYY. Some biological relations between this model and the YRY(N)6YRY preferential occurrence are also presented.
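The codon counts quoted in this abstract (26 codons, split into 6, 12 and 8) can be checked by direct enumeration. The following Python sketch is only an arithmetic illustration; the function names `ry_codons` and `count_by_unspecified` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import product


def ry_codons():
    """All words of length 3 over the alphabet {R, Y, N}, excluding NNN."""
    words = ("".join(letters) for letters in product("RYN", repeat=3))
    return [w for w in words if w != "NNN"]


def count_by_unspecified(codons):
    """Group codons by how many unspecified bases N they contain."""
    return dict(Counter(c.count("N") for c in codons))


codons = ry_codons()
print(len(codons))                   # 26 = 3**3 - 1
print(count_by_unspecified(codons))  # 8 with no N, 12 with one N, 6 with two N
```

The split follows from choosing the positions of N: three positions times two letters for codons with two N (6), three positions times four letter pairs for codons with one N (12), and 2**3 fully specified codons (8).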
17
Sibbald PR. Patterns of base usage, nearest neighbour analysis and identification of genes in two completely sequenced chloroplast genomes. Curr Genet 1988. [DOI: 10.1007/bf02427759] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 10/24/2022]
18
Louis BG, Ganoza MC. Signals determining translational start-site recognition in eukaryotes and their role in prediction of genetic reading frames. Mol Biol Rep 1988; 13:103-15. [PMID: 3221841 DOI: 10.1007/bf00539058] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.1] [Indexed: 01/04/2023]
Abstract
A special methionyl-tRNA (tRNAi) is universally required to initiate translation. The conservation of this reactant throughout evolution, as well as its unusual decoding properties, suggested an alternate mechanism for tRNA-mRNA interactions at initiation. We have reported that the sequences of bases neighboring the start codons of many eubacterial genes are complementary not only to the 16S rRNA 3' end and to the anticodon of tRNAi but also have the potential to base-pair the D, T or extended anticodon loops of this tRNAi. The coding properties of tRNAi and mutations that affect translation suggest that these signals may function. This hypothesis explains the observation that unusual triplets can start prokaryotic and mitochondrial genes and predicts the occurrence of other reading frames. Furthermore, it suggests a unifying model of chain initiation based on RNA-RNA contacts and displacements. Here we examine the start domain of 290 eukaryotic genes for their ability to base-pair the tRNAi loops and the 18S rRNA. We observe that both methionine start and methionine coding regions have the potential to pair with the 18S rRNA, but that the nucleotide distribution about start codons strongly favoured such pairings over that near internal AUGs. The 5' extended anticodon of tRNAi is methylated, and was not represented in the mRNA with high frequency. However, the tetramer AUGg did occur with high frequency in the start domain. A modification of the tRNAi T loop also decreases its base-pairing potential. Interestingly, complementarity to the T loop did not occur with high frequency in the start sites. The early coding region, 10 to 34 nucleotides 3' to the initiator AUG, is complementary to the tRNAi D loop in many cases, while no such affinity is found near internal AUGs. The nucleotides around initiator AUGs were heavily biased toward the sequence gccaccAUGgcg. No such tendency was noted around internal AUGs. Although the role of this sequence bias is unclear, the sequence gccaccAUGg has been shown by Kozak to promote initiation. Another distinguishing feature was a C-rich tract 7 to 34 nucleotides 5' to the initiator AUGs. Ability to pair with more than eight bases of the start consensus sequence, matching of 6 or 7 nucleotides to the D loop on the 3' side, and C-richness on the 5' side were used as criteria for distinguishing start AUGs. (ABSTRACT TRUNCATED AT 400 WORDS)
Affiliation(s)
- B G Louis
- Banting and Best Department of Medical Research, University of Toronto, Ontario, Canada
19
Abstract
The sequence information for the splicing process of introns is found in the consensus sequences at the two splice sites. For long introns, of 300 or more nucleotides, the middle regions may provide additional specificity for splicing, which can be investigated by defining an adequate quantitative parameter. This methodology makes it possible to retrieve the coding periodicity in the viral and mitochondrial introns and to identify, with statistical significance, a surprising alternating purine-pyrimidine base sequence (i.e. a modulo 2 periodicity) in the eukaryotic introns, and particularly in the vertebrate introns. This alternating structure suggests that the vertebrate introns do not have the genetic information to code for proteins but carry structural and regulatory functions.
Affiliation(s)
- D G Arquès
- Friedrich Miescher Institut, Bioinformatic group, Basel, Switzerland