1
Feistel R. Self-Organisation of Prediction Models. Entropy (Basel) 2023; 25:1596. [PMID: 38136476 PMCID: PMC10743227 DOI: 10.3390/e25121596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 11/21/2023] [Accepted: 11/24/2023] [Indexed: 12/24/2023]
Abstract
Living organisms are active open systems far from thermodynamic equilibrium. The ability to behave actively corresponds to dynamical metastability: minor but supercritical internal or external effects may trigger major substantial actions such as gross mechanical motion, dissipating internally accumulated energy reserves. Gaining a selective advantage from the beneficial use of activity requires a consistent combination of sensory perception, memorised experience, statistical or causal prediction models, and the resulting favourable decisions on actions. This information processing chain originated from mere physical interaction processes prior to life, here denoted as structural information exchange. From there, the self-organised transition to symbolic information processing marks the beginning of life, evolving through the novel purposivity of trial-and-error feedback and the accumulation of symbolic information. The emergence of symbols and prediction models can be described as a ritualisation transition, a symmetry-breaking kinetic phase transition of the second kind previously known from behavioural biology. The related new symmetry is the neutrally stable arbitrariness, conventionality, or code invariance of symbols with respect to their meaning. The meaning of such symbols is given by the structural effect they ultimately unleash, directly or indirectly, by deciding on which actions to take. The early genetic code represents the first symbols. The genetically inherited symbolic information is the first prediction model for activities sufficient for survival under the condition of environmental continuity, sometimes understood as the "final causality" property of the model.
Affiliation(s)
- Rainer Feistel
- Leibniz Institute for Baltic Sea Research (IOW), 18119 Rostock, Germany
2
Štambuk N, Konjevoda P, Štambuk A. How ambiguity codes specify molecular descriptors and information flow in Code Biology. Biosystems 2023; 233:105034. [PMID: 37739308 DOI: 10.1016/j.biosystems.2023.105034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/12/2023] [Accepted: 09/12/2023] [Indexed: 09/24/2023]
Abstract
The article presents IUPAC ambiguity codes for incomplete nucleic acid specification and their use in Code Biology. It is shown how to use this nomenclature to extract accurate information on different properties of biological systems. We investigated the use of ambiguity codes, as mathematical and logical operators and truth table elements, for the encoding of amino acids by means of the Standard Genetic Code, and we explain how ambiguity codes and truth functions yield such information. Nucleotide ambiguity codes could be applied to: 1. encoding descriptive information of nucleotides, amino acids and proteins (e.g., polarity, relative solvent accessibility, atom depth), and 2. system modelling ranging from standard bioinformatics tools to classic evolutionary models (i.e., from the Miyazawa-Jernigan statistical potential to the Kimura three-substitution-type model, respectively). It is shown that algorithms based on IUPAC ambiguity codes, Boolean functions and truth tables, the Probabilistic Square of Opposition/Semiotic Square, and Klein 4-groups could be used for bioinformatics analyses and relational data modelling in natural science. The underlying mathematical, logical and semiotic concepts are presented and addressed.
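Because each IUPAC letter stands for a set of bases, the "logical operator" reading can be illustrated with bitmasks: union is bitwise OR and intersection is bitwise AND. The following is a minimal sketch under our own assumptions (a 4-bit A/C/G/U encoding and the hypothetical helpers `degenerate_codon` and `expand`); it is not the authors' algorithm.

```python
# IUPAC nucleotide ambiguity codes as 4-bit masks (assumed bit order: A, C, G, U).
A, C, G, U = 0b0001, 0b0010, 0b0100, 0b1000

IUPAC = {
    "A": A, "C": C, "G": G, "U": U,
    "R": A | G, "Y": C | U, "S": C | G, "W": A | U,
    "K": G | U, "M": A | C,
    "B": C | G | U, "D": A | G | U, "H": A | C | U, "V": A | C | G,
    "N": A | C | G | U,
}
CODE_OF = {mask: letter for letter, mask in IUPAC.items()}  # 15 distinct masks

def degenerate_codon(codons):
    """Collapse a set of codons into one IUPAC codon by OR-ing each position."""
    masks = [0, 0, 0]
    for codon in codons:
        for i, base in enumerate(codon):
            masks[i] |= IUPAC[base]
    return "".join(CODE_OF[m] for m in masks)

def expand(iupac_codon):
    """List every concrete codon an IUPAC codon stands for."""
    out = [""]
    for letter in iupac_codon:
        bases = [b for b in "ACGU" if IUPAC[letter] & IUPAC[b]]
        out = [prefix + b for prefix in out for b in bases]
    return out

print(degenerate_codon(["GCU", "GCC", "GCA", "GCG"]))  # GCN (Ala)
ser = ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"]
print(degenerate_codon(ser))                           # WSN
print(len(expand("WSN")))                              # 16, more than Ser's 6
print(CODE_OF[IUPAC["R"] & IUPAC["S"]])                # G: bases shared by R and S
```

The Ser case shows the limit of a per-position OR: when a codon set is not a Cartesian product of positions, the single degenerate codon over-covers it.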
Affiliation(s)
- Nikola Štambuk
- Centre for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- Paško Konjevoda
- Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- Albert Štambuk
- Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, HR-10000 Zagreb, Croatia
3
Konjevoda P, Štambuk N. Relational model of the standard genetic code. Biosystems 2021; 210:104529. [PMID: 34464669 DOI: 10.1016/j.biosystems.2021.104529] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/03/2021] [Revised: 08/26/2021] [Accepted: 08/27/2021] [Indexed: 11/28/2022]
Abstract
The genetic code is a set of rules that establishes the mapping between triplets in messenger RNA and amino acids in proteins. The most common way to display these rules is the Standard Genetic Code (SGC) table. This paper takes an alternative approach, based on the relational data model by Edgar F. Codd (Commun. ACM, 13:377-387, 1970). The relational model (RM) proposes distributed storage of data in a collection of tables (called relations) that can be connected by shared commonality. The basic elements of a table are rows (called records or tuples) and columns (called fields or attributes). According to the relational data model, the SGC table represents the so-called unnormalized form of a table. Using normalization rules, it is possible to subdivide the SGC table into four tables. The rows and columns of each table are defined by the first and second bases, and the individual tables by the third codon base. The result of this model is an approach to managing genetic code data, represented in terms of tuples and grouped into relations, with table structure and language consistent with first-order (predicate) logic. The RM explains that the final step in the development of the SGC was the adoption of a coding function by the third base, which forms an informational/functional unit with the first base despite its different physical location in the triplet. This enabled the synthesis of specific proteins without ambiguity, in accordance with the concept of ambiguity reduction and the five phases of the general model on the origin of biological codes by Marcello Barbieri (BioSystems 181:11-19, 2019).
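The decomposition described above (one flat relation of codon tuples, split into four tables keyed by the third base, with rows indexed by the first base and columns by the second) can be sketched in a few lines. This is a minimal illustration assuming a tuple layout of our own, `(first, second, third, amino_acid)`, and the hypothetical helper `split_by_third_base`; it is not the authors' schema.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (NCBI translation table 1); '*' = stop.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# One flat relation: a tuple (first, second, third, amino_acid) per codon.
relation = [
    (b1, b2, b3, AAS[16 * i + 4 * j + k])
    for (i, b1), (j, b2), (k, b3) in product(enumerate(BASES), repeat=3)
]

def split_by_third_base(rows):
    """Partition the relation into four 4x4 tables keyed by the third base.

    Inside each table, rows are indexed by the first base and columns
    by the second base.
    """
    tables = {b3: {b1: {} for b1 in BASES} for b3 in BASES}
    for b1, b2, b3, aa in rows:
        tables[b3][b1][b2] = aa
    return tables

tables = split_by_third_base(relation)

# Print the table for third base G (rows: first base, columns: U C A G).
for b1 in BASES:
    print(b1, " ".join(tables["G"][b1][b2] for b2 in BASES))
# U L S * W
# C L P Q R
# A M T K R
# G V A E G
```

Each of the four tables is a lossless projection of the flat relation, so the 64 tuples can be recovered by joining them back on the third-base key.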
Affiliation(s)
- Paško Konjevoda
- Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
- Nikola Štambuk
- Center for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
4
Muthugobal BKN, Ramesh G, Parthasarathy S, Suvaithenamudhan S, Muthuvel Prasath K. Gray code representation of the universal genetic code: Generation of never born protein sequences using Toeplitz matrix approach. Biosystems 2020; 198:104280. [PMID: 33161051 DOI: 10.1016/j.biosystems.2020.104280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2020] [Revised: 10/23/2020] [Accepted: 10/23/2020] [Indexed: 01/21/2023]
Abstract
In this paper, we identify all possible Gray Code and Partitioned Gray Code representations of the Universal Genetic Code for n = 2-bit and 3-bit binary numbers. We analyse the Hamming distance matrices of all these Gray Code and Partitioned Gray Code possibilities, which yield the Toeplitz and Partitioned Toeplitz matrices, respectively. We use these Gray Code and Partitioned Gray Code representations of the Universal Genetic Code, combined with the novel Toeplitz matrix approach, to generate many Never Born Protein (NBP) sequences, which exhibit intrinsic structural stability. In general, NBP sequences may have many potential applications in synthetic biology and open a new vista in understanding this new subset of proteins for better applications in drug discovery, synthesis of fine chemicals, etc.
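The Gray-code and Hamming-distance step can be checked directly. The sketch below assumes the standard reflected binary Gray code and a hypothetical nucleotide order `CUGA` along the 2-bit sequence 00, 01, 11, 10; the paper enumerates all admissible assignments, and neither its partitioned variant nor its protein-generation step is reproduced here.

```python
def gray_codes(n_bits):
    """Reflected binary Gray code: consecutive words differ in exactly one bit."""
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]

def hamming_matrix(words):
    """Pairwise Hamming distances between integer-encoded bit strings."""
    return [[bin(a ^ b).count("1") for b in words] for a in words]

def is_toeplitz(m):
    """True if every diagonal of the square matrix m is constant."""
    n = len(m)
    return all(m[i][j] == m[i - 1][j - 1] for i in range(1, n) for j in range(1, n))

# Hypothetical assignment of nucleotides to the 2-bit Gray words 00, 01, 11, 10.
ORDER = "CUGA"
BITS = dict(zip(ORDER, gray_codes(2)))

m2 = hamming_matrix([BITS[b] for b in ORDER])
print(m2)               # [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
print(is_toeplitz(m2))  # True
print(is_toeplitz(hamming_matrix(gray_codes(3))))  # False
```

For 2 bits the distance matrix of the Gray sequence is Toeplitz; for 3 bits the plain reflected code does not give a Toeplitz matrix, as the last line shows.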
Affiliation(s)
- Ganapathy Ramesh
- Ramanujan Research Centre, Department of Mathematics, Government Arts College (Autonomous), Kumbakonam, 612 001, Tamil Nadu, India
- Subbiah Parthasarathy
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, 620 024, Tamil Nadu, India.
- Suvaiyarasan Suvaithenamudhan
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, 620 024, Tamil Nadu, India
- Karuppasamy Muthuvel Prasath
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, 620 024, Tamil Nadu, India
5
Determining amino acid scores of the genetic code table: Complementarity, structure, function and evolution. Biosystems 2020; 187:104026. [DOI: 10.1016/j.biosystems.2019.104026] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/26/2019] [Accepted: 08/28/2019] [Indexed: 11/22/2022]
6
Grosjean H, Westhof E. An integrated, structure- and energy-based view of the genetic code. Nucleic Acids Res 2016; 44:8020-40. [PMID: 27448410 PMCID: PMC5041475 DOI: 10.1093/nar/gkw608] [Citation(s) in RCA: 177] [Impact Index Per Article: 22.1] [Received: 05/08/2016] [Revised: 06/11/2016] [Accepted: 06/17/2016] [Indexed: 12/25/2022]
Abstract
The principles of mRNA decoding are conserved among all extant life forms. We present an integrative view of all the interaction networks between mRNA, tRNA and rRNA: the intrinsic stability of codon-anticodon duplex, the conformation of the anticodon hairpin, the presence of modified nucleotides, the occurrence of non-Watson-Crick pairs in the codon-anticodon helix and the interactions with bases of rRNA at the A-site decoding site. We derive a more information-rich, alternative representation of the genetic code, that is circular with an unsymmetrical distribution of codons leading to a clear segregation between GC-rich 4-codon boxes and AU-rich 2:2-codon and 3:1-codon boxes. All tRNA sequence variations can be visualized, within an internal structural and energy framework, for each organism, and each anticodon of the sense codons. The multiplicity and complexity of nucleotide modifications at positions 34 and 37 of the anticodon loop segregate meaningfully, and correlate well with the necessity to stabilize AU-rich codon-anticodon pairs and to avoid miscoding in split codon boxes. The evolution and expansion of the genetic code is viewed as being originally based on GC content with progressive introduction of A/U together with tRNA modifications. The representation we present should help the engineering of the genetic code to include non-natural amino acids.
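The segregation between GC-rich four-codon boxes and AU-rich split boxes can be checked on the standard table. A minimal sketch, assuming the conventional 16 boxes defined by the first two bases (not the circular representation derived in the paper), counting stop codons as their own class; `box_split` and `gc_count` are hypothetical helpers.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code in UCAG order (NCBI translation table 1); '*' = stop.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def box_split(first, second):
    """Describe how a codon box is shared out, e.g. '4', '2:2', '3:1', '2:1:1'."""
    i, j = BASES.index(first), BASES.index(second)
    counts = Counter(AAS[16 * i + 4 * j + k] for k in range(4))
    return ":".join(str(c) for c in sorted(counts.values(), reverse=True))

def gc_count(first, second):
    """Number of G/C among the first two codon positions (0, 1 or 2)."""
    return sum(b in "GC" for b in (first, second))

boxes = {f + s: (gc_count(f, s), box_split(f, s)) for f in BASES for s in BASES}
for gc in (2, 1, 0):
    print(gc, sorted((box, split) for box, (g, split) in boxes.items() if g == gc))
```

On the standard code, every box whose first two bases are both G or C is an unsplit four-codon box, every box with neither is split (2:2, or 3:1 for AUN), and the mixed boxes go either way.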
Affiliation(s)
- Henri Grosjean
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, 91198 Gif-sur-Yvette, France
- Eric Westhof
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 rue René Descartes, 67084 Strasbourg, France
7
The Graph, Geometry and Symmetries of the Genetic Code with Hamming Metric. Symmetry (Basel) 2015. [DOI: 10.3390/sym7031211] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
8
Rosandić M, Paar V. Codon sextets with leading role of serine create "ideal" symmetry classification scheme of the genetic code. Gene 2014; 543:45-52. [PMID: 24709107 DOI: 10.1016/j.gene.2014.04.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/06/2014] [Accepted: 04/03/2014] [Indexed: 11/17/2022]
Abstract
The standard classification scheme of the genetic code is organized for alphabetic ordering of nucleotides. Here we introduce the new, "ideal" classification scheme in compact form, for the first time generated by codon sextets encoding Ser, Arg and Leu amino acids. The new scheme creates the known purine/pyrimidine, codon-anticodon, and amino/keto type symmetries and a novel A+U rich/C+G rich symmetry. This scheme is built from "leading" and "nonleading" groups of 32 codons each. In the ensuing 4 × 16 scheme, based on trinucleotide quadruplets, Ser has a central role as initial generator. Six codons encoding Ser and six encoding Arg extend continuously along a linear array in the "leading" group, and together with four of six Leu codons uniquely define construction of the "leading" group. The remaining two Leu codons enable construction of the "nonleading" group. The "ideal" genetic code suggests the evolution of genetic code with serine as an initiator.
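The three codon sextets from which the scheme is built can be read off the standard table. A minimal sketch, not the authors' 4 × 16 construction: it only identifies the sixfold-degenerate amino acids and shows how each sextet splits into one four-codon box plus a two-codon pair (`sextets` is a hypothetical name).

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code in UCAG order (NCBI translation table 1); '*' = stop.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AAS))

# Amino acids encoded by exactly six codons, with their codons.
degeneracy = Counter(CODE.values())
sextets = {
    aa: sorted(c for c in CODONS if CODE[c] == aa)
    for aa, n in degeneracy.items()
    if n == 6
}

print(sorted(sextets))  # ['L', 'R', 'S']
# Group each sextet by its first two bases: one 4-codon box plus one 2-codon pair.
print({aa: dict(Counter(c[:2] for c in cs)) for aa, cs in sextets.items()})
```

The grouping shows Ser and Arg sharing the AG box (AGY for Ser, AGR for Arg), while the extra Leu pair sits in the UU box.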
Affiliation(s)
- Marija Rosandić
- Croatian Academy of Sciences and Arts, Zrinski trg 11, 10000 Zagreb, Croatia
- Vladimir Paar
- Croatian Academy of Sciences and Arts, Zrinski trg 11, 10000 Zagreb, Croatia; Faculty of Science, University of Zagreb, Bijenička 32, 10000 Zagreb, Croatia.
9
Lenstra R. Evolution of the genetic code through progressive symmetry breaking. J Theor Biol 2014; 347:95-108. [DOI: 10.1016/j.jtbi.2014.01.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 12/12/2012] [Revised: 12/18/2013] [Accepted: 01/01/2014] [Indexed: 01/18/2023]
10
Abstract
In this article, the pattern learned from the classic or conventional rotating circular genetic code is transferred to a 64-grid model. In this non-static representation, the codons for the same amino acid within each quadrant could be exchanged, wobbling or rotating in a quantic way similar to the electrons within an atomic orbit. Represented in this 64-grid format are the three rules of variation encompassing 4, 2, or 1 quadrant, respectively: 1) same position in four quadrants for the essential hydrophobic amino acids that have U at the center, 2) same or contiguous position for the same or related amino acids in two quadrants, and 3) equivalent amino acids within one quadrant. Also represented are the mathematical balance of the odd and even codons, and the most-used codons per amino acid in humans compared with one diametrically opposed organism, the plant Arabidopsis thaliana. This comparison depicts the difference in third-nucleotide preferences: a C/U exchange for 11 amino acids, a G/A exchange for 2 amino acids, and G/U or C/A exchanges for one amino acid each. From these codon usage preferences per amino acid we present two hypotheses: 1) a slower translation in vertebrates and 2) a faster translation in invertebrates, possibly due to the aqueous environments where they live. These codon usage preferences may also help determine genomic compatibility by comparing individual mRNAs and their functional three-dimensional structure, transport and translation within cells and organisms. These observations are aimed at the design of bioinformatics computational tools to compare human genomes, to determine the exchange between compatible codons and amino acids, to preserve and/or bring back extinct biodiversity, and to detect early the incompatible changes that lead to genetic diseases.
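Comparing preferred codons between organisms rests on which third-position exchanges are silent. A minimal sketch on the standard table; `silent_boxes` and `third_base_exchange` are hypothetical helpers (not the author's tools), and a stop-to-stop swap counts as synonymous.

```python
BASES = "UCAG"
# Standard genetic code in UCAG order (NCBI translation table 1); '*' = stop.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AAS))
BOXES = [a + b for a in BASES for b in BASES]  # the 16 first-two-base prefixes

def silent_boxes(x, y):
    """Number of the 16 boxes in which swapping third base x for y keeps the amino acid."""
    return sum(CODE[box + x] == CODE[box + y] for box in BOXES)

def third_base_exchange(c1, c2):
    """Label two codons that differ only at the third position; report if the swap is silent."""
    if c1[:2] != c2[:2] or c1[2] == c2[2]:
        raise ValueError("codons must differ only at the third position")
    kind = "/".join(sorted(c1[2] + c2[2]))
    return kind, CODE[c1] == CODE[c2]

for x, y in [("C", "U"), ("A", "G"), ("G", "U"), ("A", "C")]:
    print(f"{x}/{y}: silent in {silent_boxes(x, y)} of 16 boxes")
print(third_base_exchange("GCC", "GCU"))  # ('C/U', True)
print(third_base_exchange("AUA", "AUG"))  # ('A/G', False)
```

Under the standard code every C/U swap at the third position is silent, A/G swaps are silent except in the UGN and AUN boxes, G/U swaps only in the eight four-codon boxes, and A/C swaps in those eight plus AUN.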
Affiliation(s)
- Fernando Castro-Chavez
- Department of Medicine, Atherosclerosis and Vascular Medicine Section, Baylor College of Medicine, Houston, TX, USA.
11
Zhang Z, Yu J. On the organizational dynamics of the genetic code. Genomics Proteomics Bioinformatics 2011; 9:21-9. [PMID: 21641559 PMCID: PMC5054158 DOI: 10.1016/s1672-0229(11)60004-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 08/30/2010] [Accepted: 10/26/2010] [Indexed: 11/23/2022]
Abstract
The organization of the canonical genetic code needs to be thoroughly illuminated. Here we reorder the four nucleotides—adenine, thymine, guanine and cytosine—according to their emergence in evolution, and apply the organizational rules to devising an algebraic representation for the canonical genetic code. Under a framework of the devised code, we quantify codon and amino acid usages from a large collection of 917 prokaryotic genome sequences, and associate the usages with its intrinsic structure and classification schemes as well as amino acid physicochemical properties. Our results show that the algebraic representation of the code is structurally equivalent to a content-centric organization of the code and that codon and amino acid usages under different classification schemes were correlated closely with GC content, implying a set of rules governing composition dynamics across a wide variety of prokaryotic genome sequences. These results also indicate that codons and amino acids are not randomly allocated in the code, where the six-fold degenerate codons and their amino acids have important balancing roles for error minimization. Therefore, the content-centric code is of great usefulness in deciphering its hitherto unknown regularities as well as the dynamics of nucleotide, codon, and amino acid compositions.
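The per-sequence quantities such a study relates to one another (codon usage, overall GC content, GC at third codon positions) are simple to compute. A minimal sketch, assuming in-frame coding sequences in the DNA alphabet used in the abstract; the toy sequence is made up, and the paper's algebraic representation is not implemented here.

```python
from collections import Counter

def codons(seq):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def codon_usage(seq):
    """Raw codon counts for one coding sequence."""
    return Counter(codons(seq))

def gc_content(seq):
    """Fraction of G + C over all bases."""
    seq = seq.upper()
    return sum(b in "GC" for b in seq) / len(seq)

def gc3(seq):
    """Fraction of G + C at third codon positions (GC3)."""
    thirds = [c[2] for c in codons(seq)]
    return sum(b in "GC" for b in thirds) / len(thirds)

toy = "ATGGCCGCGTAA"  # made-up ORF: Met-Ala-Ala-stop
print(codon_usage(toy))
print(round(gc_content(toy), 3), gc3(toy))  # 0.583 0.75
```

Summing `codon_usage` over all genes of a genome gives the genome-level table that can then be correlated with its GC content.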
Affiliation(s)
- Zhang Zhang
- Plant Stress Genomics Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
12
Seligmann H. Undetected antisense tRNAs in mitochondrial genomes? Biol Direct 2010; 5:39. [PMID: 20553583 PMCID: PMC2907346 DOI: 10.1186/1745-6150-5-39] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 05/31/2010] [Accepted: 06/16/2010] [Indexed: 12/18/2022]
Abstract
BACKGROUND: The hypothesis that both complementary DNA strands of mitochondrial (mt) tRNA genes code for tRNAs (sense-antisense coding) is explored. This could explain why mt tRNA mutations are 6.5 times more frequently pathogenic than mutations in other mt sequences. Antisense tRNA expression is plausible because tRNA punctuation signals mt sense RNA maturation: both sense and antisense tRNAs form secondary structures potentially signalling processing. By default, sense RNA maturation processes 11 antisense tRNAs neighbouring sense genes. If antisense tRNAs are expressed, processed antisense tRNAs should have adapted more for translational activity than unprocessed ones. Four tRNA properties are examined: antisense tRNA 5' and 3' end processing by sense RNA maturation and its accuracy, cloverleaf stability, and misacylation potential.
RESULTS: Processed antisense tRNAs align better than unprocessed antisense tRNAs with standard tRNA sequences having the same cognate amino acid, suggesting fewer misacylations. Misacylation increases with cloverleaf fragility and processing inaccuracy. Cloverleaf fragility, misacylation and processing accuracy of antisense tRNAs decrease with genome-wide usage of their predicted cognate amino acid.
CONCLUSIONS: These properties correlate as if they adaptively coevolved to enable translational activity in some antisense tRNAs and to avoid such activity in others. Analyses also suggest previously unsuspected particularities of aminoacylation specificity in mt tRNAs: combinations of competition between tRNAs for tRNA synthetases with competition between tRNA synthetases for tRNAs determine the specificities of tRNA aminoacylation. The latter analyses show that alignment methods used to detect tRNA cognates yield relatively robust results, even when they apparently fail to detect the tRNA's cognate amino acid and indicate high misacylation potential.
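The sense-antisense idea can be made concrete at the anticodon. A minimal sketch under three simplifying assumptions of our own, not the paper's method: the antisense tRNA's anticodon is the reverse complement of the sense anticodon, an anticodon pairs with its reverse-complement codon without wobble, and the vertebrate mitochondrial code applies. All helper names are hypothetical.

```python
BASES = "UCAG"
# Vertebrate mitochondrial code (NCBI translation table 2) in UCAG order; '*' = stop.
MT_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
MT_CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), MT_AAS))
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]

def cognate_amino_acid(anticodon):
    """Amino acid of the codon that a 5'->3' anticodon pairs with (wobble ignored)."""
    return MT_CODE[reverse_complement(anticodon)]

def antisense_anticodon(sense_anticodon):
    """ASSUMPTION: the antisense tRNA's anticodon is the reverse complement of the sense one."""
    return reverse_complement(sense_anticodon)

sense = "UCA"                      # pairs with codon UGA, Trp in this code
anti = antisense_anticodon(sense)  # "UGA"
print(sense, cognate_amino_acid(sense), "->", anti, cognate_amino_acid(anti))
# UCA W -> UGA S
```

Under these assumptions the antisense anticodon equals the sense tRNA's codon, so the antisense tRNA's predicted cognate is simply the amino acid encoded by the reverse complement of that codon.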
Affiliation(s)
- Hervé Seligmann
- Department of Biology, University of Oslo, Center for Ecological and Evolutionary Synthesis, Blindern, 3016 Oslo, Norway.
13
Castro-Chavez F. The rules of variation: amino acid exchange according to the rotating circular genetic code. J Theor Biol 2010; 264:711-21. [PMID: 20371250 PMCID: PMC3130497 DOI: 10.1016/j.jtbi.2010.03.046] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 12/01/2009] [Revised: 03/06/2010] [Accepted: 03/30/2010] [Indexed: 12/11/2022]
Abstract
General guidelines for the molecular basis of functional variation are presented while focused on the rotating circular genetic code and allowable exchanges that make it resistant to genetic diseases under normal conditions. The rules of variation, bioinformatics aids for preventative medicine, are: (1) same position in the four quadrants for hydrophobic codons, (2) same or contiguous position in two quadrants for synonymous or related codons, and (3) same quadrant for equivalent codons. To preserve protein function, amino acid exchange according to the first rule takes into account the positional homology of essential hydrophobic amino acids with every codon with a central uracil in the four quadrants, the second rule includes codons for identical, acidic, or their amidic amino acids present in two quadrants, and the third rule, the smaller, aromatic, stop codons, and basic amino acids, each in proximity within a 90 degree angle. I also define codifying genes and palindromati, CTCGTGCCGAATTCGGCACGAG.
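In the molecular-biology sense, a palindromic DNA sequence is one that equals its own reverse complement, so both strands read the same 5' to 3'. A minimal sketch that checks the 22-nt sequence quoted above, assuming this reverse-complement reading of "palindrome" (the helper names are hypothetical):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(dna):
    """True if a double-stranded DNA sequence reads the same 5'->3' on both strands."""
    return dna.upper() == reverse_complement(dna)

SEQ = "CTCGTGCCGAATTCGGCACGAG"  # sequence quoted in the abstract

print(len(SEQ), is_palindromic(SEQ))  # 22 True
print(is_palindromic("GATTC"))        # False
```

An odd-length sequence can never pass this check, because its middle base would have to be its own complement.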