1. Fimmel E, Strüngmann L. The spiderweb of error-detecting codes in the genetic information. Biosystems 2023; 233:105009. [PMID: 37640191 DOI: 10.1016/j.biosystems.2023.105009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 08/21/2023] [Accepted: 08/21/2023] [Indexed: 08/31/2023]
Abstract
Nature possesses inherent mechanisms for error detection and correction during the translation of genetic information, as demonstrated by the discovery of a self-complementary circular C3-code called X0 in various organisms such as bacteria, eukaryotes, plasmids, and viruses (Arquès and Michel, 1996; Michel, 2015, 2017). Since then, extensive research has focused on circular codes, which are believed to be remnants of ancient comma-free codes. These codes can be regarded as an additional genetic code specifically optimized for detecting and preserving the proper reading frame in protein-coding sequences. A study by Fimmel et al. in 2014 identified that a total of 216 maximal self-complementary C3-codes can be grouped into 27 equivalence classes with eight codes in each class. In this work, we study how the 27 equivalence classes are related to each other. While the codes in each equivalence class obtained by Fimmel et al. in 2014 are permutations of each other, i.e. one code can be obtained from the other by applying a permutation of the bases, it has not been clear how the equivalence classes are connected. We show that there is an ordering of the equivalence classes such that one gets from one class to the next one by substituting only one pair of codon/anticodon in the corresponding codes, i.e. the corresponding codes have a maximal intersection of 18 codons. To perform this analysis, we define two graphs, G216 and G27, whose vertices are, respectively, all 216 maximal self-complementary C3-codes and 27 equivalence classes. Several properties of the graphs are obtained. Most surprisingly, it turns out that G27 contains Hamiltonian paths of length 27. This fact ultimately leads to a representation of the set of all 216 maximal self-complementary C3-codes as a kind of spider web.
Finally, we define dinucleotide cuts of such codes by projecting each codon to its first two bases and show that the paths of lengths 27 in G216 can even be chosen so that all the codes contain a special subset of dinucleotides defined by Rumer's roots. These observations raise a lot of new questions about the biological function of such structures.
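The notions used in this abstract (self-complementarity, the dinucleotide cut, and the intersection of two codes after one codon/anticodon pair is substituted) can be illustrated with a minimal Python sketch. The function names and the four-codon toy set below are assumptions made for illustration only; the toy set is not one of the 216 maximal codes, and circularity is not checked here.

```python
# Illustrative sketch only: function names and the toy code are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def anticodon(codon):
    """Return the reversed complement of a codon, e.g. AAC -> GTT."""
    return codon.translate(COMPLEMENT)[::-1]

def is_self_complementary(code):
    """True if the code contains the anticodon of every codon it contains."""
    return all(anticodon(c) in code for c in code)

def dinucleotide_cut(code):
    """Project each codon onto its first two bases."""
    return {c[:2] for c in code}

def shared_codons(code_a, code_b):
    """Number of codons two codes have in common."""
    return len(code_a & code_b)

# Toy example: two codon/anticodon pairs.
toy = {"AAC", "GTT", "GAC", "GTC"}
# Substitute one codon/anticodon pair by another pair.
swapped = (toy - {"GAC", "GTC"}) | {"CAG", "CTG"}

print(is_self_complementary(toy))        # both pairs are complete
print(sorted(dinucleotide_cut(toy)))     # first two bases of each codon
print(shared_codons(toy, swapped))       # codons kept after the substitution
```

For a 20-codon self-complementary code, the same substitution of a single codon/anticodon pair leaves 18 shared codons, which is the maximal intersection mentioned in the abstract.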
Affiliation(s)
- Elena Fimmel: Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Lutz Strüngmann: Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
2. Borah C, Ali T. Genetic code noise immunity features: Degeneracy and frameshift correction. Gene Reports 2022. [DOI: 10.1016/j.genrep.2022.101707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
3. Property based analysis: Optimality of RNY comma-free code versus circular code (X) after frameshift errors. Gene Reports 2022. [DOI: 10.1016/j.genrep.2022.101652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
4. Wang X, Dong Q, Chen G, Zhang J, Liu Y, Cai Y. Frameshift and wild-type proteins are often highly similar because the genetic code and genomes were optimized for frameshift tolerance. BMC Genomics 2022; 23:416. [PMID: 35655139 PMCID: PMC9164415 DOI: 10.1186/s12864-022-08435-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Accepted: 03/02/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
Frameshift mutations have been considered of significant importance for the molecular evolution of proteins and their coding genes, while frameshift protein sequences encoded in the alternative reading frames of coding genes have been considered to be meaningless. However, functional frameshifts have been found to be widespread. It was puzzling how a frameshift protein kept its structure and functionality while substantial changes occurred in its primary amino-acid sequence. This study shows that the similarities among frameshifts and wild types are higher than random similarities and are determined at different levels. Frameshift substitutions are more conservative than random substitutions in the standard genetic code (SGC). The frameshift substitutions score of SGC ranks in the top 2.0-3.5% of alternative genetic codes, showing that SGC is nearly optimal for frameshift tolerance. In many genes and certain genomes, frameshift-resistant codons and codon pairs appear more frequently than expected, suggesting that frameshift tolerance is achieved through not only the optimality of the genetic code but, more importantly, the further optimization of a specific gene or genome through the usage of codons and codon pairs, which sheds light on the role of frameshift mutations in molecular and genomic evolution.
5. Štambuk N, Konjevoda P, Pavan J. Antisense Peptide Technology for Diagnostic Tests and Bioengineering Research. Int J Mol Sci 2021; 22:9106. [PMID: 34502016 PMCID: PMC8431130 DOI: 10.3390/ijms22179106] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/22/2021] [Revised: 08/10/2021] [Accepted: 08/13/2021] [Indexed: 01/01/2023] [Open Access]
Abstract
Antisense peptide technology (APT) is based on a useful heuristic algorithm for rational peptide design. It was deduced from empirical observations that peptides consisting of complementary (sense and antisense) amino acids interact with higher probability and affinity than the randomly selected ones. This phenomenon is closely related to the structure of the standard genetic code table, and at the same time, is unrelated to the direction of its codon sequence translation. The concept of complementary peptide interaction is discussed, and its possible applications to diagnostic tests and bioengineering research are summarized. Problems and difficulties that may arise using APT are discussed, and possible solutions are proposed. The methodology was tested on the example of SARS-CoV-2. It is shown that the CABS-dock server accurately predicts the binding of antisense peptides to the SARS-CoV-2 receptor binding domain without requiring predefinition of the binding site. It is concluded that the benefits of APT outweigh the costs of random peptide screening and could lead to considerable savings in time and resources, especially if combined with other computational and immunochemical methods.
Affiliation(s)
- Nikola Štambuk: Center for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia
- Paško Konjevoda: Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia
- Josip Pavan: Department of Ophthalmology, University Hospital Dubrava, Avenija Gojka Šuška 6, HR-10000 Zagreb, Croatia
6. Dujon B. On the origin of the genetic code: a 27-codon hypothetical precursor of an intricate 64-codon intermediate shaped the modern code. C R Biol 2021; 343:15-52. [PMID: 33988323 DOI: 10.5802/crbiol.47] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2021] [Accepted: 03/03/2021] [Indexed: 11/24/2022]
Abstract
The modern genetic code reveals numerous traces of specific relationships between the early codons which, together with its internal asymmetries, suggest a sequential appearance of the nucleobases in primitive RNA molecules. Keeping the hypothesis of triplet pairings between primitive RNA molecules at the origin of the code, this work systematically examines complete codon-anticodon interaction matrices assuming distinct pairing options at each position of the triplet duplexes. Application of these principles suggests that a 27-codon precursor having a reasonable coding capacity for short peptide synthesis could have started with primitive RNA molecules able to form two distinct pairs with different free energies between a single purine and two pyrimidines (such as G with C and U). Conservation of the same pairing options at positions 1 and 2 of codons at the arrival of a second purine with distinct pairing preferences (such as A) generated a 64-codon intermediate code made of interrelated pairs or groups of codons (designated here as intricacy). The numerous traces of this hypothetical scheme that are visible in the standard and variant forms of the modern code demonstrate without ambiguity that the ancestral codon-anticodon duplexes required high energetic pairings at their central position (Watson-Crick) but tolerated less energetic pairings at the first codon position (G • U type). Combined with the sequential appearance of the nucleobases, the predicted codon intricacy allows a stepwise reconstruction of the evolution of the coding repertoire, by simple a posteriori comparison to the modern code. This reconstruction reveals a remarkable internal coherence in terms of amino acids and tRNA synthetases recruitment. 
The code started with a group of amino acids (Ala, Gly, Pro, Ser and Thr) that are now all activated by class II tRNA synthetases before reaching an intermediate period during which up to 14 distinct amino acids could be encoded by a full set of intricated codons. The perfect coincidence between the last 6 amino acids predicted in this reconstruction and the speculated action of the arrival of free atmospheric oxygen on proteins is spectacular, and suggests that the code has only reached its present form after the great oxidation event.
Affiliation(s)
- Bernard Dujon: Institut Pasteur, Dept. Genomes and Genetics, CNRS (UMR3525) and Sorbonne Université (UFR927), Paris, France
7. Thompson JD, Ripp R, Mayer C, Poch O, Michel CJ. Potential role of the X circular code in the regulation of gene expression. Biosystems 2021; 203:104368. [PMID: 33567309 DOI: 10.1016/j.biosystems.2021.104368] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/09/2020] [Revised: 01/18/2021] [Accepted: 01/20/2021] [Indexed: 02/06/2023]
Abstract
The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. a series of codons belonging to X and called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code which are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examined recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density ds(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, the X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability. 
Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.
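The density parameter ds(X) described in this abstract, the number of X motifs per kilobase of a sequence s, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: here a motif is taken to be a maximal in-frame run of at least `min_codons` consecutive codons from the code, and both that threshold and the two-codon `toy_code` are hypothetical (the actual X code has 20 trinucleotides, which the abstract does not list).

```python
# Illustrative sketch only: the motif definition, min_codons, and toy_code are assumptions.

def count_motifs(seq, code, min_codons=2, frame=0):
    """Count maximal in-frame runs of at least min_codons codons that belong to code."""
    count = 0
    run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in code:
            run += 1
        else:
            if run >= min_codons:
                count += 1
            run = 0
    # Close a run that reaches the end of the sequence.
    if run >= min_codons:
        count += 1
    return count

def motif_density(seq, code, min_codons=2, frame=0):
    """Number of motifs per kilobase of sequence."""
    if not seq:
        return 0.0
    return 1000.0 * count_motifs(seq, code, min_codons, frame) / len(seq)

toy_code = {"AAC", "GTT"}            # hypothetical stand-in, not the X code
s = "AACGTTCCCAACGTTAAC"             # codons: AAC GTT | CCC | AAC GTT AAC
print(count_motifs(s, toy_code))     # two runs separated by CCC
print(motif_density(s, toy_code))    # motifs per 1000 nucleotides
```

Passing the code as a parameter keeps the sketch independent of any particular trinucleotide set, so the same functions could be applied to the X code once its 20 codons are supplied.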
Affiliation(s)
- Julie D Thompson: Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Raymond Ripp: Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Claudine Mayer: Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France; Unité de Microbiologie Structurale, Institut Pasteur, CNRS, 75724, Paris Cedex 15, France; Université Paris Diderot, Sorbonne Paris Cité, 75724, Paris Cedex 15, France.
- Olivier Poch: Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Christian J Michel: Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
8. Nesterov-Mueller A, Popov R, Seligmann H. Combinatorial Fusion Rules to Describe Codon Assignment in the Standard Genetic Code. Life (Basel) 2020; 11:life11010004. [PMID: 33374866 PMCID: PMC7824455 DOI: 10.3390/life11010004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/01/2020] [Revised: 12/15/2020] [Accepted: 12/21/2020] [Indexed: 11/16/2022] [Open Access]
Abstract
We propose combinatorial fusion rules that describe the codon assignment in the standard genetic code simply and uniformly for all canonical amino acids. These rules become obvious if the origin of the standard genetic code is considered as a result of a fusion of four protocodes: two dominant AU and GC protocodes and two recessive AU and GC protocodes. The biochemical meaning of the fusion rules consists of retaining the complementarity between cognate codons of the small hydrophobic amino acids and large charged or polar amino acids within the protocodes. The proto tRNAs were assembled in the form of two kissing hairpins with 9-base and 10-base loops in the case of dominant protocodes and two 9-base loops in the case of recessive protocodes. The fusion rules reveal the connection between the stop codons, the non-canonical amino acids, pyrrolysine and selenocysteine, and deviations in the translation of mitochondria. Using fusion rules, we predicted the existence of additional amino acids that are essential for the development of the standard genetic code. The validity of the proposed partition of the genetic code into dominant and recessive protocodes is considered with reference to state-of-the-art hypotheses. The formation of two aminoacyl-tRNA synthetase classes is compatible with the four-protocode partition.
Affiliation(s)
- Alexander Nesterov-Mueller: Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Roman Popov: Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Hervé Seligmann: Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany; The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem 91904, Israel; Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, F-38700 La Tronche, France
9. Demongeot J, Moreira A, Seligmann H. Negative CG dinucleotide bias: An explanation based on feedback loops between Arginine codon assignments and theoretical minimal RNA rings. Bioessays 2020; 43:e2000071. [PMID: 33319381 DOI: 10.1002/bies.202000071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/02/2020] [Revised: 11/23/2020] [Accepted: 11/26/2020] [Indexed: 01/05/2023]
Abstract
Theoretical minimal RNA rings are candidate primordial genes evolved for non-redundant coding of the genetic code's 22 coding signals (one codon per biogenic amino acid, a start and a stop codon) over the shortest possible length: 29,520 RNA rings, each 22 nucleotides long, solve this min-max constraint. Numerous RNA ring properties are reminiscent of natural genes. Here we present analyses showing that all RNA rings lack the dinucleotide CG (a mutable, chemically unstable dinucleotide coding for Arginine), bearing a resemblance to known CG-depleted genomes. CG in "incomplete" RNA rings (not coding for all coding signals, with only 3-12 nucleotides) gradually decreases towards CG absence in complete, 22-nucleotide-long RNA rings. Presumably, feedback loops during RNA ring growth in the course of evolution (when amino acid assignment fixed the genetic code) assigned Arg to codons lacking CG (AGR) to avoid CG. Hence, as a chemical property of base pairs, CG mutability restructured the genetic code, thereby establishing itself as genetically encoded biological information.
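Because the rings are circular, a check for the CG dinucleotide has to include the junction between the last and the first nucleotide. The following minimal Python sketch shows one way to do this; the function names and the four-nucleotide toy rings are assumptions for illustration and are not taken from the 22-nucleotide rings of the study.

```python
# Illustrative sketch only: function names and toy rings are hypothetical.

def ring_dinucleotides(ring):
    """List all dinucleotides of a circular sequence, including the end-to-start junction."""
    n = len(ring)
    return [ring[i] + ring[(i + 1) % n] for i in range(n)]

def lacks_cg(ring):
    """True if no CG dinucleotide occurs anywhere on the ring."""
    return "CG" not in ring_dinucleotides(ring)

print(ring_dinucleotides("AUGC"))   # AU, UG, GC and the junction CA
print(lacks_cg("AUGC"))             # no CG on this ring
print(lacks_cg("GAUC"))             # the junction C + G forms CG
```

The second toy ring shows why the wrap-around matters: read as a linear string, GAUC contains no CG, but closing it into a ring joins its final C to its initial G.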
Affiliation(s)
- Jacques Demongeot: Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, France
- Andrés Moreira: Departamento de Informática, Universidad Técnica Federico Santa María, Santiago, Chile
- Hervé Seligmann: Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, France; The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel; Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), Eggenstein-Leopoldshafen, Germany