1
|
Di Giulio M. Theories of the origin of the genetic code: Strong corroboration for the coevolution theory. Biosystems 2024; 239:105217. [PMID: 38663520 DOI: 10.1016/j.biosystems.2024.105217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/16/2024] [Accepted: 04/18/2024] [Indexed: 04/29/2024]
Abstract
I analyzed all the theories and models of the origin of the genetic code, and over the years, I have considered the main suggestions that could explain this origin. The conclusion of this analysis is that the coevolution theory of the origin of the genetic code is the theory that best captures the majority of observations concerning the organization of the genetic code. In other words, the biosynthetic relationships between amino acids would have heavily influenced the origin of the organization of the genetic code, as supported by the coevolution theory. Instead, the presence in the genetic code of physicochemical properties of amino acids, which have also been linked to the physicochemical properties of anticodons or codons or bases by stereochemical and physicochemical theories, would simply be the result of natural selection. More explicitly, I maintain that these correlations between codons, anticodons or bases and amino acids are in fact the result not of a real correlation between amino acids and codons, for example, but are only the effect of the intervention of natural selection. Specifically, in the genetic code table we expect, for example, that the most similar codons - that is, those that differ by only one base - will have more similar physicochemical properties. Therefore, the 64 codons of the genetic code table ordered in a certain way would also represent an ordering of some of their physicochemical properties. Now, a study aimed at clarifying which physicochemical property of amino acids has influenced the allocation of amino acids in the genetic code has established that the partition energy of amino acids has played a decisive role in this. Indeed, under some conditions, the genetic code was found to be approximately 98% optimized on its columns. In this same work, it was shown that this was most likely the result of the action of natural selection.
If natural selection had truly allocated the amino acids in the genetic code in such a way that similar amino acids also have similar codons - this, not through a mechanism of physicochemical interaction between, for example, codons and amino acids - then it might turn out that even different physicochemical properties of codons (or anticodons or bases) show some correlation with the physicochemical properties of amino acids, simply because the partition energy of amino acids is correlated with other physicochemical properties of amino acids. It is very likely that this would inevitably lead to a correlation between codons (or anticodons or bases) and amino acids. In other words, since the codons (anticodons or bases) are ordered in the genetic code, that is to say, some of their physicochemical properties should also be ordered by a similar order, and given that the amino acids would also appear to have been ordered in the genetic code by natural selection, then it should inevitably turn out that there is a correlation between, for example, the hydrophobicity of anticodons and that of amino acids. Instead, the intervention of natural selection in organizing the genetic code would appear to be highly compatible with the main mechanism of structuring the genetic code as supported by the coevolution theory. This would make the coevolution theory the only plausible explanation for the origin of the genetic code.
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Early Evolution of Life Department, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena, L'Aquila, Italy.
|
2
|
Yarus M. Ordering events in a developing genetic code. RNA Biol 2024; 21:1-8. [PMID: 38169326 PMCID: PMC10766418 DOI: 10.1080/15476286.2023.2299615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/20/2023] [Indexed: 01/05/2024] Open
Abstract
Preexisting partial genetic codes can fuse to evolve towards the complete Standard Genetic Code (SGC). Such code fusion provides a path of 'least selection', readily generating precursor codes that resemble the SGC. Consequently, such least selections produce the SGC via minimal, thus rapid, change. Optimal code evolution therefore requires delayed wobble. Early wobble encoding slows code evolution, very specifically diminishing the most likely SGC precursors: near-complete, accurate codes which are the products of code fusions. In contrast: given delayed wobble, the SGC can emerge from a truncation selection/evolutionary radiation based on proficient fused coding.
Affiliation(s)
- Michael Yarus
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO, USA
|
3
|
Yarus M. A crescendo of competent coding (c3) contains the Standard Genetic Code. RNA (New York, N.Y.) 2022; 28:1337-1347. [PMID: 35868841 PMCID: PMC9479743 DOI: 10.1261/rna.079275.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 07/18/2022] [Indexed: 06/15/2023]
Abstract
The Standard Genetic Code (SGC) can arise by fusion of partial codes evolved in different individuals, perhaps for differing prior tasks. Such code fragments can be unified into an SGC after later evolution of accurate third-position Crick wobble. Late wobble advent fills in the coding table, leaving only later development of translational initiation and termination to reach the SGC in separated domains of life. This code fusion mechanism is computationally implemented here. Late Crick wobble after C3 fusion (c3-lCw) is tested for its ability to evolve the SGC. Compared with previously studied isolated coding tables, or with increasing numbers of parallel, but nonfusing codes, c3-lCw reaches the SGC sooner, is successful in a smaller population, and presents accurate and complete codes more frequently. Notably, a long crescendo of SGC-like codes is exposed for selection of superior translation. c3-lCw also effectively suppresses varied disordered assignments, thus converging on a unified code. Such merged codes closely approach the SGC, making its selection plausible. For example: Under routine conditions, ≈1 of 22 c3-lCw environments evolves codes with ≥20 assignments and ≤3 differences from the SGC, notably including codes identical to the Standard Genetic Code.
Affiliation(s)
- Michael Yarus
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado 80309-0347, USA
|
4
|
Caldararo F, Di Giulio M. The genetic code is very close to a global optimum in a model of its origin taking into account both the partition energy of amino acids and their biosynthetic relationships. Biosystems 2022; 214:104613. [DOI: 10.1016/j.biosystems.2022.104613] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/19/2021] [Revised: 01/16/2022] [Accepted: 01/17/2022] [Indexed: 01/23/2023]
|
5
|
The phylogenetic distribution of the glutaminyl-tRNA synthetase and Glu-tRNAGln amidotransferase in the fundamental lineages would imply that the ancestor of archaea, that of eukaryotes and LUCA were progenotes. Biosystems 2020; 196:104174. [PMID: 32535177 DOI: 10.1016/j.biosystems.2020.104174] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/27/2020] [Revised: 05/25/2020] [Accepted: 05/25/2020] [Indexed: 12/21/2022]
Abstract
The function of the glutaminyl-tRNA synthetase and Glu-tRNAGln amidotransferase might be related to the origin of the genetic code because, for example, glutaminyl-tRNA synthetase catalyses the fundamental reaction that makes the genetic code. If the evolutionary stage of the origin of these two enzymes could be unambiguously identified, then the genetic code should still have been originating at that particular evolutionary stage because the fundamental reaction that makes the code itself was still evidently evolving. This would result in that particular evolutionary moment being attributed to the evolutionary stage of the progenote because it would have a relationship between the genotype and the phenotype not yet fully realized because the genetic code was precisely still originating. I then analyzed the distribution of the glutaminyl-tRNA synthetase and Glu-tRNAGln amidotransferase in the main phyletic lineages. Since in some cases the origin of these two enzymes can be related to the evolutionary stages of ancestors of archaea and eukaryotes, this would indicate these ancestors as progenotes because at that evolutionary moment the genetic code was evidently still evolving, thus realizing the definition of progenote. The conclusion that the ancestor of archaea and that of eukaryotes were progenotes would imply that even the last universal common ancestor (LUCA) was a progenote because it appeared, on the tree of life, temporally before these ancestors.
|
6
|
Barbieri M. Evolution of the genetic code: The ambiguity-reduction theory. Biosystems 2019; 185:104024. [DOI: 10.1016/j.biosystems.2019.104024] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/06/2019] [Revised: 08/26/2019] [Accepted: 08/26/2019] [Indexed: 10/26/2022]
|
7
|
Collins-Hed AI, Ardell DH. Match fitness landscapes for macromolecular interaction networks: Selection for translational accuracy and rate can displace tRNA-binding interfaces of non-cognate aminoacyl-tRNA synthetases. Theor Popul Biol 2019; 129:68-80. [PMID: 31042487 DOI: 10.1016/j.tpb.2019.03.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/01/2018] [Revised: 01/26/2019] [Accepted: 03/13/2019] [Indexed: 12/21/2022]
Abstract
Advances in structural biology of aminoacyl-tRNA synthetases (aaRSs) have revealed incredible diversity in how aaRSs bind their tRNA substrates. The causes of this diversity remain mysterious. We developed a new class of highly rugged fitness landscape models called match landscapes, through which genes encode the assortative interactions of their gene products through the complementarity and identifiability of their structural features. We used results from coding theory to prove bounds and equalities on fitness in match landscapes assuming additive interaction energies, macroscopic aminoacylation kinetics including proofreading, site-specific modifiers of interaction, and selection for translational accuracy in multiple, perfectly encoded site-types. Using genotypes based on extended Hamming codes we show that over a wide array of interface sizes and numbers of encoded cognate pairs, selection for translational accuracy alone is insufficient to displace the tRNA-binding interfaces of aaRSs. Yet, under combined selection for translational accuracy and rate, site-specific modifiers are selected to adaptively displace the tRNA-binding interfaces of non-cognate aaRS-tRNA pairs. We describe a remarkable correspondence between the lengths of perfect RNA (quaternary) codes and the modal sizes of small non-coding RNA families.
Affiliation(s)
- Andrea I Collins-Hed
- Quantitative and Systems Biology Program, University of California, Merced, CA, 95306, United States
- David H Ardell
- Quantitative and Systems Biology Program, University of California, Merced, CA, 95306, United States; Molecular and Cell Biology Department, School of Natural Sciences, University of California, Merced, CA, 95306, United States.
|
8
|
Błażej P, Wnetrzak M, Mackiewicz D, Mackiewicz P. The influence of different types of translational inaccuracies on the genetic code structure. BMC Bioinformatics 2019; 20:114. [PMID: 30841864 PMCID: PMC6404327 DOI: 10.1186/s12859-019-2661-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/02/2018] [Accepted: 01/29/2019] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND The standard genetic code is a recipe for assigning unambiguously 21 labels, i.e. amino acids and stop translation signal, to 64 codons. However, at early stages of the translational machinery development, the codons did not have to be read unambiguously and the early genetic codes could have contained some ambiguous assignments of codons to amino acids. Therefore, the goal of this work was to obtain the genetic code structures which could have evolved assuming different types of inaccuracy of the translational machinery starting from unambiguous assignments of codons to amino acids. RESULTS We developed a theoretical model assuming that the level of uncertainty of codon assignments can gradually decrease during the simulations. Since it is postulated that the standard code has evolved to be robust against point mutations and mistranslations, we developed three simulation scenarios assuming that such errors can influence one, two or three codon positions. The simulated codes were selected using the evolutionary algorithm methodology to decrease coding ambiguity and increase their robustness against mistranslation. CONCLUSIONS The results indicate that the typical codon block structure of the genetic code could have evolved to decrease the ambiguity of amino acid to codon assignments and to increase the fidelity of reading the genetic information. However, the robustness to errors was not the decisive factor that influenced the genetic code evolution because it is possible to find theoretical codes that minimize the reading errors better than the standard genetic code.
Affiliation(s)
- Paweł Błażej
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
- Małgorzata Wnetrzak
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
- Dorota Mackiewicz
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
- Paweł Mackiewicz
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
|
9
|
Alignment-based and alignment-free methods converge with experimental data on amino acids coded by stop codons at split between nuclear and mitochondrial genetic codes. Biosystems 2018; 167:33-46. [DOI: 10.1016/j.biosystems.2018.03.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 01/15/2018] [Revised: 03/18/2018] [Accepted: 03/19/2018] [Indexed: 12/11/2022]
|
10
|
Zamudio GS, José MV. Phenotypic Graphs and Evolution Unfold the Standard Genetic Code as the Optimal. Origins Life Evol B 2017; 48:83-91. [PMID: 29082465 DOI: 10.1007/s11084-017-9552-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/02/2017] [Accepted: 10/16/2017] [Indexed: 10/18/2022]
Abstract
In this work, we explicitly consider the evolution of the Standard Genetic Code (SGC) by assuming two evolutionary stages, to wit, the primeval RNY code and two intermediate codes in between. We used network theory and graph theory to measure the connectivity of each phenotypic graph. The connectivity values are compared to the values of the codes under different randomization scenarios. An error-correcting optimal code is one in which the algebraic connectivity is minimized. We show that the SGC is optimal in regard to its robustness and error-tolerance when compared to all random codes under different assumptions.
Affiliation(s)
- Gabriel S Zamudio
- Theoretical Biology Group, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, C.P. 04510, Ciudad de México CDMX, Mexico
- Marco V José
- Theoretical Biology Group, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, C.P. 04510, Ciudad de México CDMX, Mexico.
|
11
|
Kuruoglu EE, Arndt PF. The information capacity of the genetic code: Is the natural code optimal? J Theor Biol 2017; 419:227-237. [PMID: 28163008 DOI: 10.1016/j.jtbi.2017.01.046] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/30/2016] [Revised: 01/25/2017] [Accepted: 01/31/2017] [Indexed: 10/20/2022]
Abstract
We envision the molecular evolution process as an information transfer process and provide a quantitative measure for information preservation in terms of the channel capacity according to the channel coding theorem of Shannon. We calculate Information capacities of DNA on the nucleotide (for non-coding DNA) and the amino acid (for coding DNA) level using various substitution models. We extend our results on coding DNA to a discussion about the optimality of the natural codon-amino acid code. We provide the results of an adaptive search algorithm in the code domain and demonstrate the existence of a large number of genetic codes with higher information capacity. Our results support the hypothesis of an ancient extension from a 2-nucleotide codon to the current 3-nucleotide codon code to encode the various amino acids.
Affiliation(s)
- Ercan E Kuruoglu
- Institute of Information Science and Technologies, "A. Faedo", CNR, via G Moruzzi 1, 56124 Pisa, Italy.
- Peter F Arndt
- Max Planck Institute for Molecular Genetics, Department of Computational Molecular Biology, Ihnestr. 63/73, 14195 Berlin, Germany
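The channel-capacity measure mentioned in the Kuruoglu and Arndt abstract can be illustrated with a minimal sketch. It assumes the simplest possible substitution model, in which a nucleotide is replaced with total probability `p` spread evenly over the three other bases; the paper itself reports results for various substitution models, which are not reproduced here. The function name and parameters are hypothetical and chosen only for illustration.

```python
import math


def symmetric_channel_capacity(p, alphabet_size=4):
    """Shannon capacity (bits per symbol) of a symmetric channel.

    Assumption for illustration: each symbol is replaced with total
    probability p, split evenly among the other alphabet_size - 1 symbols.
    For a symmetric channel the capacity is log2(alphabet size) minus the
    entropy of one row of the transition matrix.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    others = alphabet_size - 1
    row = [1.0 - p] + [p / others] * others
    # Skip zero-probability terms, since q * log2(q) tends to 0 as q -> 0.
    row_entropy = -sum(q * math.log2(q) for q in row if q > 0.0)
    return math.log2(alphabet_size) - row_entropy


if __name__ == "__main__":
    for p in (0.0, 0.05, 0.25, 0.75):
        print(p, symmetric_channel_capacity(p))
```

With no substitutions the sketch returns the full 2 bits per nucleotide, and when every base is equally likely after substitution (`p = 0.75`) the capacity drops to zero.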
|
12
|
Aggarwal N, Bandhu AV, Sengupta S. Finite population analysis of the effect of horizontal gene transfer on the origin of an universal and optimal genetic code. Phys Biol 2016; 13:036007. [DOI: 10.1088/1478-3975/13/3/036007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 10/21/2022]
|
13
|
Tlusty T. Self-referring DNA and protein: a remark on physical and geometrical aspects. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2016; 374:rsta.2015.0070. [PMID: 26857671 DOI: 10.1098/rsta.2015.0070] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Accepted: 10/19/2015] [Indexed: 06/05/2023]
Abstract
All known life forms are based upon a hierarchy of interwoven feedback loops, operating over a cascade of space, time and energy scales. Among the most basic loops are those connecting DNA and proteins. For example, in genetic networks, DNA genes are expressed as proteins, which may bind near the same genes and thereby control their own expression. In this molecular type of self-reference, information is mapped from the DNA sequence to the protein and back to DNA. There is a variety of dynamic DNA-protein self-reference loops, and the purpose of this remark is to discuss certain geometrical and physical aspects related to the back and forth mapping between DNA and proteins. The mappings are examined as dimensional reductions and expansions between high- and low-dimensional manifolds in molecular spaces. The discussion raises basic questions regarding the nature of DNA and proteins as self-referring matter, which are examined in a simple toy model.
Affiliation(s)
- Tsvi Tlusty
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ 08540, USA; Center for Soft and Living Matter, Institute for Basic Science (IBS), Ulsan 689-798, Republic of Korea; Department of Physics, Ulsan National Institute of Science and Technology, Ulsan 689-798, Republic of Korea
|
14
|
Abstract
Carl Woese is known to the scientific community primarily through his landmark contributions to microbiology, in particular, his discovery of the third Domain of Life, which came to be known as the Archaea. While it is well known how he made this discovery, through the techniques he developed based on his studies of rRNA, the reasons why he was driven in this scientific direction, and what he saw as the principal outcome of his discovery--it was not the Archaea!--are not so widely appreciated. In this essay, I discuss his vision of evolution, one which transcends population genetics, and which has ramifications not only for our understanding of the origin of life on Earth and elsewhere, but also for our understanding of biology as a novel class of complex dynamical systems.
Affiliation(s)
- Nigel Goldenfeld
- Institute for Universal Biology; Institute for Genomic Biology, and Department of Physics; University of Illinois at Urbana-Champaign; Urbana, IL USA
|
15
|
Becich PJ, Stark BP, Bhat HS, Ardell DH. CMCpy: Genetic Code-Message Coevolution Models in Python. Evol Bioinform Online 2013; 9:111-25. [PMID: 23532367 PMCID: PMC3596977 DOI: 10.4137/ebo.s11169] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022] Open
Abstract
Code-message coevolution (CMC) models represent coevolution of a genetic code and a population of protein-coding genes (“messages”). Formally, CMC models are sets of quasispecies coupled together for fitness through a shared genetic code. Although CMC models display plausible explanations for the origin of multiple genetic code traits by natural selection, useful modern implementations of CMC models are not currently available. To meet this need we present CMCpy, an object-oriented Python API and command-line executable front-end that can reproduce all published results of CMC models. CMCpy implements multiple solvers for leading eigenpairs of quasispecies models. We also present novel analytical results that extend and generalize applications of perturbation theory to quasispecies models and pioneer the application of a homotopy method for quasispecies with non-unique maximally fit genotypes. Our results therefore facilitate the computational and analytical study of a variety of evolutionary systems. CMCpy is free open-source software available from http://pypi.python.org/pypi/CMCpy/.
Affiliation(s)
- Peter J Becich
- Center for Computational Biology, University of California, Merced, CA
|
16
|
Morgens DW, Cavalcanti ARO. An alternative look at code evolution: using non-canonical codes to evaluate adaptive and historic models for the origin of the genetic code. J Mol Evol 2013; 76:71-80. [PMID: 23344715 DOI: 10.1007/s00239-013-9542-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/21/2012] [Accepted: 01/15/2013] [Indexed: 10/27/2022]
Abstract
The canonical code has been shown many times to be highly robust against point mutations; that is, mutations that change a single nucleotide tend to result in similar amino acids more often than expected by chance. There are two major types of models for the origin of the code, which explain how this sophisticated structure evolved. Adaptive models state that the primitive code was specifically selected for error minimization, while historic models hypothesize that the robustness of the code is an artifact or by-product of the mechanism of code evolution. In this paper, we evaluated the levels of robustness in existing non-canonical codes as well as codes that differ in only one codon assignment from the standard code. We found that the level of robustness of many of these codes is comparable to or better than that of the standard code. Although these results do not preclude an adaptive origin of the genetic code, they suggest that the code was not selected for minimizing the effects of point mutations.
Affiliation(s)
- David W Morgens
- Department of Biology, Pomona College, 175 W 6th Street, Claremont, CA, USA
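The comparison described in the Morgens and Cavalcanti abstract, between the standard code and codes that differ from it in a single codon assignment, can be sketched as follows. This is not the authors' robustness measure: as a deliberately crude proxy it only counts the fraction of sense-to-sense point mutations that leave the amino acid unchanged, whereas the paper evaluates amino acid similarity. All function names are hypothetical; the codon table is the standard code written with the DNA alphabet (T in place of U).

```python
from itertools import product

BASES = "TCAG"  # DNA alphabet; T stands in for U
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def point_neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos, base in enumerate(codon):
        for other in BASES:
            if other != base:
                yield codon[:pos] + other + codon[pos + 1:]


def synonymous_fraction(code):
    """Fraction of sense-to-sense point mutations that keep the amino acid."""
    same = total = 0
    for codon, aa in code.items():
        if aa == "*":
            continue  # ignore mutations starting from a stop codon
        for mutant in point_neighbors(codon):
            if code[mutant] == "*":
                continue  # ignore mutations that create a stop codon
            total += 1
            same += code[mutant] == aa
    return same / total


def reassign(code, codon, amino_acid):
    """Return a copy of `code` in which one codon has a new meaning."""
    variant = dict(code)
    variant[codon] = amino_acid
    return variant


if __name__ == "__main__":
    # Illustrative variant: TGA read as tryptophan instead of stop,
    # a reassignment found in several mitochondrial codes.
    variant = reassign(STANDARD_CODE, "TGA", "W")
    print(synonymous_fraction(STANDARD_CODE), synonymous_fraction(variant))
```

Swapping `synonymous_fraction` for a score based on an amino acid property scale would bring the sketch closer to the kind of robustness measure the abstract refers to.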
|
17
|
Nikolajewa S, Friedel M, Beyer A, Wilhelm T. The new classification scheme of the genetic code, its early evolution, and tRNA usage. J Bioinform Comput Biol 2011; 4:609-20. [PMID: 16819806 DOI: 10.1142/s0219720006001825] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 09/19/2005] [Revised: 12/09/2005] [Accepted: 12/23/2005] [Indexed: 11/18/2022]
Abstract
We present a new classification scheme of the genetic code. In contrast to the standard form it clearly shows five codon symmetries: codon-anticodon, codon-reverse codon, and sense-antisense symmetry, as well as symmetries with respect to purine-pyrimidine (A versus G, U versus C) and keto-aminobase (G versus U, A versus C) exchanges. We study the number of tRNA genes of 16 archaea, 81 bacteria and 7 eucaryotes to analyze whether these symmetries are reflected in the corresponding tRNA usage patterns. Two features are especially striking: reverse stop codons do not have their own tRNAs (just one exception in human), and A** anticodons are significantly suppressed. Our classification scheme of the genetic code and the identified tRNA usage patterns support recent speculations about the early evolution of the genetic code. In particular, pre-tRNAs might have had the ability to bind their codons in two directions to the corresponding codons.
Affiliation(s)
- Swetlana Nikolajewa
- Theoretical Systems Biology, Institute of Molecular Biotechnology Beutenbergstr, 11, Jena, D-07745, Germany
|
18
|
Caporaso JG, Knight R. New insight into the diversity of life's building blocks: evenness, not variance. Astrobiology 2011; 11:197-198. [PMID: 21417743 DOI: 10.1089/ast.2011.2280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/30/2023]
|
19
|
Santos J, Monteagudo A. Simulated evolution applied to study the genetic code optimality using a model of codon reassignments. BMC Bioinformatics 2011; 12:56. [PMID: 21338505 PMCID: PMC3053255 DOI: 10.1186/1471-2105-12-56] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 09/20/2010] [Accepted: 02/21/2011] [Indexed: 11/29/2022] Open
Abstract
Background As the canonical code is not universal, different theories about its origin and organization have appeared. The optimization or level of adaptation of the canonical genetic code was measured taking into account the harmful consequences resulting from point mutations leading to the replacement of one amino acid for another. There are two basic theories to measure the level of optimization: the statistical approach, which compares the canonical genetic code with many randomly generated alternative ones, and the engineering approach, which compares the canonical code with the best possible alternative. Results Here we used a genetic algorithm to search for better adapted hypothetical codes and as a method to guess the difficulty in finding such alternative codes, allowing to clearly situate the canonical code in the fitness landscape. This novel proposal of the use of evolutionary computing provides a new perspective in the open debate between the use of the statistical approach, which postulates that the genetic code conserves amino acid properties far better than expected from a random code, and the engineering approach, which tends to indicate that the canonical genetic code is still far from optimal. We used two models of hypothetical codes: one that reflects the known examples of codon reassignment and the model most used in the two approaches which reflects the current genetic code translation table. Although the standard code is far from a possible optimum considering both models, when the more realistic model of the codon reassignments was used, the evolutionary algorithm had more difficulty to overcome the efficiency of the canonical genetic code. Conclusions Simulated evolution clearly reveals that the canonical genetic code is far from optimal regarding its optimization. 
Nevertheless, the efficiency of the canonical code increases when mistranslations are taken into account with the two models, as indicated by the fact that the best possible codes show the patterns of the standard genetic code. Our results are in accordance with the postulates of the engineering approach and indicate that the main arguments of the statistical approach are not enough to support its assertion of the extreme efficiency of the canonical genetic code.
Affiliation(s)
- José Santos
- Department of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain.
|
20
|
A new model of amino acids evolution, evolution index of amino acids and its application in graphical representation of protein sequences. Chem Phys Lett 2010. [DOI: 10.1016/j.cplett.2010.08.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022]
|
21
|
Tlusty T. A colorful origin for the genetic code: Information theory, statistical mechanics and the emergence of molecular codes. Phys Life Rev 2010; 7:362-76. [DOI: 10.1016/j.plrev.2010.06.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 09/25/2009] [Revised: 01/25/2010] [Accepted: 02/06/2010] [Indexed: 10/19/2022]
|
22
|
José MV, Morgado ER, Govezensky T. Genetic hotels for the standard genetic code: evolutionary analysis based upon novel three-dimensional algebraic models. Bull Math Biol 2010; 73:1443-76. [PMID: 20725796 DOI: 10.1007/s11538-010-9571-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 11/05/2009] [Accepted: 07/02/2010] [Indexed: 11/30/2022]
Abstract
Herein, we rigorously develop novel 3-dimensional algebraic models called Genetic Hotels of the Standard Genetic Code (SGC). We start by considering the primeval RNA genetic code which consists of the 16 codons of type RNY (purine-any base-pyrimidine). Using simple algebraic operations, we show how the RNA code could have evolved toward the current SGC via two different intermediate evolutionary stages called Extended RNA code type I and II. By rotations or translations of the subset RNY, we arrive at the SGC via the former (type I) or via the latter (type II), respectively. Biologically, the Extended RNA code type I, consists of all codons of the type RNY plus codons obtained by considering the RNA code but in the second (NYR type) and third (YRN type) reading frames. The Extended RNA code type II, comprises all codons of the type RNY plus codons that arise from transversions of the RNA code in the first (YNY type) and third (RNR) nucleotide bases. Since the dimensions of remarkable subsets of the Genetic Hotels are not necessarily integer numbers, we also introduce the concept of algebraic fractal dimension. A general decoding function which maps each codon to its corresponding amino acid or the stop signals is also derived. The Phenotypic Hotel of amino acids is also illustrated. The proposed evolutionary paths are discussed in terms of the existing theories of the evolution of the SGC. The adoption of 3-dimensional models of the Genetic and Phenotypic Hotels will facilitate the understanding of the biological properties of the SGC.
Affiliation(s)
- Marco V José
- Theoretical Biology Group, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Mexico.
|
23
|
Santos J, Monteagudo Á. Study of the genetic code adaptability by means of a genetic algorithm. J Theor Biol 2010; 264:854-65. [DOI: 10.1016/j.jtbi.2010.02.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 09/03/2009] [Revised: 01/05/2010] [Accepted: 02/23/2010] [Indexed: 11/30/2022]
|
24
|
Jestin JL, Kempf A. Optimization models and the structure of the genetic code. J Mol Evol 2009; 69:452-7. [PMID: 19841850 DOI: 10.1007/s00239-009-9287-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 06/03/2009] [Accepted: 09/18/2009] [Indexed: 11/29/2022]
Abstract
The codon assignment of the quasi-universal genetic code can be assumed to have resulted from the evolutionary pressures that prevailed when the code was still evolving. Here, we review studies of the structure of the genetic code based on optimization models. We also review studies that, from the structure of the code, attempt to derive aspects of the primordial circumstances in which the genetic code froze. Different rationales are summarized, compared with experimental data, discussed in the context of the transition from an RNA world to a DNA-protein world, and linked to the emergence of the last universal common ancestor.
Affiliation(s)
- J L Jestin
- Département de Biologie Structurale et Chimie, Institut Pasteur, CNRS, 25 rue du Dr. Roux, 75724, Paris 15, France.
|
25
|
Butler T, Goldenfeld N, Mathew D, Luthey-Schulten Z. Extreme genetic code optimality from a molecular dynamics calculation of amino acid polar requirement. Phys Rev E Stat Nonlin Soft Matter Phys 2009; 79:060901. [PMID: 19658466 DOI: 10.1103/physreve.79.060901] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 12/20/2007] [Revised: 04/21/2009] [Indexed: 05/28/2023]
Abstract
A molecular dynamics calculation of the amino acid polar requirement is used to score the canonical genetic code. Monte Carlo simulation shows that this computational polar requirement has been optimized by the canonical genetic code, an order of magnitude more than any previously known measure, effectively ruling out a vertical evolution dynamics. The sensitivity of the optimization to the precise metric used in code scoring is consistent with code evolution having proceeded through the communal dynamics of statistical proteins using horizontal gene transfer, as recently proposed. The extreme optimization of the genetic code therefore strongly supports the idea that the genetic code evolved from a communal state of life prior to the last universal common ancestor.
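The general shape of such a Monte Carlo comparison can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' calculation: the Kyte-Doolittle hydropathy scale is swapped in for the computed polar requirement, the cost is an unweighted mean squared property change over all single-base substitutions, and random codes are generated by permuting the 20 amino acids among the fixed synonymous blocks of the standard code. The names `mean_squared_change`, `shuffled` and `fraction_better` are hypothetical.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' marks stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy, used here as a stand-in amino acid property (assumption)
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_squared_change(code, prop):
    """Mean squared property change over all single-base substitutions, ignoring stops."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue
            other = code[codon[:pos] + base + codon[pos + 1:]]
            if other != "*":
                total += (prop[aa] - prop[other]) ** 2
                count += 1
    return total / count

def shuffled(code, rng):
    """Permute amino acids among the synonymous blocks; stop codons stay in place."""
    acids = sorted(set(code.values()) - {"*"})
    permuted = acids[:]
    rng.shuffle(permuted)
    relabel = dict(zip(acids, permuted), **{"*": "*"})
    return {codon: relabel[aa] for codon, aa in code.items()}

def fraction_better(n_samples=1000, seed=0):
    """Fraction of sampled random codes with a lower cost than the standard code."""
    rng = random.Random(seed)
    reference = mean_squared_change(STANDARD, HYDROPATHY)
    better = sum(
        mean_squared_change(shuffled(STANDARD, rng), HYDROPATHY) < reference
        for _ in range(n_samples)
    )
    return better / n_samples

if __name__ == "__main__":
    print(fraction_better())
```

A small fraction indicates that few random codes beat the standard code under the chosen property. The result depends on the property scale and on how random codes are generated, which is the sensitivity to the scoring metric that the abstract highlights.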
Affiliation(s)
- Thomas Butler
- Department of Physics and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801, USA
|
26
|
Higgs PG. A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code. Biol Direct 2009; 4:16. [PMID: 19393096 PMCID: PMC2689856 DOI: 10.1186/1745-6150-4-16] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.9] [Received: 03/31/2009] [Accepted: 04/24/2009] [Indexed: 11/18/2022] Open
Abstract
Background: The arrangement of the amino acids in the genetic code is such that neighbouring codons are assigned to amino acids with similar physical properties. Hence, the effects of translational error are minimized with respect to randomly reshuffled codes. Further inspection reveals that it is amino acids in the same column of the code (i.e. same second base) that are similar, whereas those in the same row show no particular similarity. We propose a 'four-column' theory for the origin of the code that explains how the action of selection during the build-up of the code leads to a final code that has the observed properties.
Results: The theory makes the following propositions. (i) The earliest amino acids in the code were those that are easiest to synthesize non-biologically, namely Gly, Ala, Asp, Glu and Val. (ii) These amino acids are assigned to codons with G at first position. Therefore the first code may have used only these codons. (iii) The code rapidly developed into a four-column code where all codons in the same column coded for the same amino acid: NUN = Val, NCN = Ala, NAN = Asp and/or Glu, and NGN = Gly. (iv) Later amino acids were added sequentially to the code by a process of subdivision of codon blocks in which a subset of the codons assigned to an early amino acid were reassigned to a later amino acid. (v) Later amino acids were added into positions formerly occupied by amino acids with similar properties because this can occur with minimal disruption to the proteins already encoded by the earlier code. As a result, the properties of the amino acids in the final code retain a four-column pattern that is a relic of the earliest stages of code evolution.
Conclusion: The driving force during this process is not the minimization of translational error, but positive selection for the increased diversity and functionality of the proteins that can be made with a larger amino acid alphabet. Nevertheless, the code that results is one in which translational error is minimized. We define a cost function with which we can compare the fitness of codes with varying numbers of amino acids, and a barrier function, which measures the change in cost immediately after addition of a new amino acid. We show that the barrier is positive if an amino acid is added into a column with dissimilar properties, but negative if an amino acid is added into a column with similar physical properties. Thus, natural selection favours the assignment of amino acids to the positions that they occupy in the final code.
Reviewers: This article was reviewed by David Ardell, Eugene Koonin and Stephen Freeland (nominated by Laurence Hurst).
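Propositions (ii) and (iii) of this abstract can be checked against the standard codon table in a few lines of Python. The sketch below is illustrative only and does not implement the paper's cost or barrier functions; the names `FOUR_COLUMN` and `early_code` are introduced here, and the NAN column is treated as ambiguous between Asp and Glu, following "Asp and/or Glu" in the abstract.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' marks stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical early code: the second base alone decides the amino acid.
# NUN = Val, NCN = Ala, NAN = Asp and/or Glu, NGN = Gly
FOUR_COLUMN = {"U": "V", "C": "A", "A": "DE", "G": "G"}
early_code = {codon: FOUR_COLUMN[codon[1]] for codon in STANDARD}

# Codons with G at the first position, where the earliest amino acids sit today
g_first = [codon for codon in STANDARD if codon[0] == "G"]

# Codons whose present-day assignment is still allowed by the four-column code
consistent = [codon for codon in g_first if STANDARD[codon] in early_code[codon]]

print(sorted({STANDARD[codon] for codon in g_first}))
print(len(consistent), "of", len(g_first))
```

All 16 G-first codons keep an assignment permitted by the four-column code, which is the relic pattern the abstract describes. Later subdivisions of the columns are not modelled here.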
Affiliation(s)
- Paul G Higgs
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1, Canada.
|
27
|
Koonin EV, Novozhilov AS. Origin and evolution of the genetic code: the universal enigma. IUBMB Life 2009; 61:99-111. [PMID: 19117371 DOI: 10.1002/iub.146] [Citation(s) in RCA: 199] [Impact Index Per Article: 13.3] [Indexed: 01/31/2023]
Abstract
The genetic code is nearly universal, and the arrangement of the codons in the standard codon table is highly nonrandom. The three main concepts on the origin and evolution of the code are the stereochemical theory, according to which codon assignments are dictated by physicochemical affinity between amino acids and the cognate codons (anticodons); the coevolution theory, which posits that the code structure coevolved with amino acid biosynthesis pathways; and the error minimization theory, under which selection to minimize the adverse effect of point mutations and translation errors was the principal factor of the code's evolution. These theories are not mutually exclusive and are also compatible with the frozen accident hypothesis, that is, the notion that the standard code might have no special properties but was fixed simply because all extant life forms share a common ancestor, with subsequent changes to the code mostly precluded by the deleterious effect of codon reassignment. Mathematical analysis of the structure and possible evolutionary trajectories of the code shows that it is highly robust to translational misreading, but there are numerous more robust codes, so the standard code potentially could evolve from a random code via a short sequence of codon series reassignments. Thus, much of the evolution that led to the standard code could be a combination of frozen accident with selection for error minimization, although contributions from coevolution of the code with metabolic pathways and weak affinities between amino acids and nucleotide triplets cannot be ruled out. However, such scenarios for the code evolution are based on formal schemes whose relevance to the actual primordial evolution is uncertain. A real understanding of the code origin and evolution is likely to be attainable only in conjunction with a credible scenario for the evolution of the coding principle itself and the translation system.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
|
28
|
Schmitt A, Schuchhardt J, Brockmann GA. The action of key factors in protein evolution at high temporal resolution. PLoS One 2009; 4:e4821. [PMID: 19279682 PMCID: PMC2652826 DOI: 10.1371/journal.pone.0004821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2008] [Accepted: 02/05/2009] [Indexed: 11/18/2022] Open
Abstract
Background: Protein evolution is particularly shaped by the conservation of the amino acids' physico-chemical properties and the structure of the genetic code. While conservation is the result of negative selection against proteins with reduced functionality, the codon sequences determine the stochastic aspect of amino acid exchanges. Thus far, it is known that the genetic code is the dominant factor if little time has elapsed since the divergence of one gene into two, but physico-chemical forces gain importance at greater evolutionary distances. Further details, however, on how the influence of these factors varies with time are unknown to date.
Methodology/Principal Findings: Here, we derive 10,000 divergence-specific substitution matrices each for orthologues and paralogues from the Pfam collection of multiple protein alignments and quantify the action of three physico-chemical forces and of the structure of the genetic code at high resolution using correlation analysis. For closely related proteins, the codon sequence similarity is the most influential factor controlling protein evolution, but its influence decreases rapidly as divergence grows. From a protein sequence divergence of about 20 percent onward, the maintenance of the hydrophobic character of an amino acid is the most influential factor. All factors lose importance from about 40 percent divergence onward. This suggests that the original protein structure often no longer represents a constraint on the protein sequence. The proteins then become free to adopt new functions. We furthermore show that the constraints exerted by both physico-chemical forces and by the genetic code are quite comparable for orthologues and paralogues, though somewhat weaker for paralogues than for orthologues in weakly or moderately diverged proteins.
Conclusion/Significance: Our analysis substantiates earlier findings that protein evolution is mainly governed by the structure of the genetic code in the early phase after divergence and by the conservation of physico-chemical properties in the later phase. We determine the level of sequence divergence beyond which the conservation of the hydrophobic character gains importance over the genetic code to be 20 percent. The evolution of orthologues and paralogues is shaped by evolutionary forces in quite comparable ways.
Affiliation(s)
- Armin Schmitt
- Institute for Animal Sciences, Humboldt-Universität zu Berlin, Berlin, Germany.
|
29
|
Tlusty T. A simple model for the evolution of molecular codes driven by the interplay of accuracy, diversity and cost. Phys Biol 2008; 5:016001. [DOI: 10.1088/1478-3975/5/1/016001] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/11/2022]
|
30
|
Schmitt AO, Schuchhardt J, Ludwig A, Brockmann GA. Protein evolution within and between species. J Theor Biol 2007; 249:376-83. [PMID: 17881006 DOI: 10.1016/j.jtbi.2007.08.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 05/11/2007] [Revised: 07/16/2007] [Accepted: 08/01/2007] [Indexed: 11/17/2022]
Abstract
Protein evolution can be seen as the successive replacement of amino acids by other amino acids. In general, it is a very slow process which is triggered by point mutations in the nucleotide sequence. These mutations can transform into single nucleotide polymorphisms (SNPs) within populations and diverging proteins between species. It is well known that in many cases amino acids can be replaced by others without impeding the functioning of the protein, even if these are of quite different physico-chemical character. In some cases, however, almost any replacement would result in a functionally deficient protein. Based upon comprehensive published SNP data and applying correlation analysis, we quantified the two antagonistic factors controlling the process of amino acid replacement and thus protein evolution: first, the degenerate structure of the genetic code, which facilitates the exchange of certain amino acids, and second, the physico-chemical forces, which limit the range of possible exchanges to maintain a functional protein. We found that the observed frequencies of amino acid exchanges within species are best explained by the genetic code and that the conservation of physico-chemical properties plays a subordinate role, but nevertheless has to be considered a key factor. Between moderately diverged species, the genetic code and physico-chemical properties exert comparable influence on amino acid exchanges. We furthermore studied amino acid exchanges in more detail for six species (four mammals, one bird, and one insect) and found that the profiles are highly correlated across all examined species despite their large evolutionary divergence of up to 800 million years. The species-specific exchange profiles are also correlated with the exchange profile observed between different species. The huge body of SNP data currently available makes it possible to characterize the role of two major shaping forces of protein evolution more quantitatively than before.
Affiliation(s)
- Armin O Schmitt
- Institute for Animal Sciences, Humboldt-Universität zu Berlin, Invalidenstrasse 42, 10115 Berlin, Germany.
|
31
|
Novozhilov AS, Wolf YI, Koonin EV. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biol Direct 2007; 2:24. [PMID: 17956616 PMCID: PMC2211284 DOI: 10.1186/1745-6150-2-24] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.0] [Received: 10/11/2007] [Accepted: 10/23/2007] [Indexed: 11/30/2022] Open
Abstract
Background: The standard genetic code table has a distinctly non-random structure, with similar amino acids often encoded by codon series that differ by a single nucleotide substitution, typically in the third or the first position of the codon. It has been repeatedly argued that this structure of the code results from selective optimization for robustness to translation errors such that translational misreading has the minimal adverse effect. Indeed, it has been shown in several studies that the standard code is more robust than a substantial majority of random codes. However, it remains unclear how much evolution the standard code underwent, what is the level of optimization, and what is the likely starting point.
Results: We explored possible evolutionary trajectories of the genetic code within a limited domain of the vast space of possible codes. Only those codes were analyzed for robustness to translation error that possess the same block structure and the same degree of degeneracy as the standard code. This choice of a small part of the vast space of possible codes is based on the notion that the block structure of the standard code is a consequence of the structure of the complex between the cognate tRNA and the codon in mRNA where the third base of the codon plays a minimum role as a specificity determinant. Within this part of the fitness landscape, a simple evolutionary algorithm, with elementary evolutionary steps comprising swaps of four-codon or two-codon series, was employed to investigate the optimization of codes for the maximum attainable robustness. The properties of the standard code were compared to the properties of four sets of codes, namely, purely random codes, random codes that are more robust than the standard code, and two sets of codes that resulted from optimization of the first two sets. The comparison of these sets of codes with the standard code and its locally optimized version showed that, on average, optimization of random codes yielded evolutionary trajectories that converged at the same level of robustness to translation errors as the optimization path of the standard code; however, the standard code required considerably fewer steps to reach that level than an average random code. When evolution starts from random codes whose fitness is comparable to that of the standard code, they typically reach a much higher level of optimization than the standard code, i.e., the standard code is much closer to its local minimum (fitness peak) than most of the random codes with similar levels of robustness. Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code) about half the way to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak.
Conclusion: The standard code appears to be the result of partial optimization of a random code for robustness to errors of translation. The reason the code is not fully optimized could be the trade-off between the beneficial effect of increasing robustness to translation errors and the deleterious effect of codon series reassignment that becomes increasingly severe with growing complexity of the evolving system. Thus, evolution of the code can be represented as a combination of adaptation and frozen accident.
Reviewers: This article was reviewed by David Ardell, Allan Drummond (nominated by Laura Landweber), and Rob Knight.
Affiliation(s)
- Artem S Novozhilov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
32
|
Sengupta S, Yang X, Higgs PG. The mechanisms of codon reassignments in mitochondrial genetic codes. J Mol Evol 2007; 64:662-88. [PMID: 17541678 PMCID: PMC1894752 DOI: 10.1007/s00239-006-0284-7] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Received: 04/17/2006] [Accepted: 03/07/2007] [Indexed: 11/26/2022]
Abstract
Many cases of nonstandard genetic codes are known in mitochondrial genomes. We carry out analysis of phylogeny and codon usage of organisms for which the complete mitochondrial genome is available, and we determine the most likely mechanism for codon reassignment in each case. Reassignment events can be classified according to the gain-loss framework. The “gain” represents the appearance of a new tRNA for the reassigned codon or the change of an existing tRNA such that it gains the ability to pair with the codon. The “loss” represents the deletion of a tRNA or the change in a tRNA so that it no longer translates the codon. One possible mechanism is codon disappearance (CD), where the codon disappears from the genome prior to the gain and loss events. In the alternative mechanisms, the codon does not disappear. In the unassigned codon mechanism, the loss occurs first, whereas in the ambiguous intermediate mechanism, the gain occurs first. Codon usage analysis gives clear evidence of cases where the codon disappeared at the point of the reassignment and also cases where it did not disappear. CD is the probable explanation for stop-to-sense reassignments and a small number of reassignments of sense codons. However, the majority of sense-to-sense reassignments cannot be explained by CD. In the latter cases, by analysis of the presence or absence of tRNAs in the genome and of the changes in tRNA sequences, it is sometimes possible to distinguish between the unassigned codon and the ambiguous intermediate mechanisms. We emphasize that not all reassignments follow the same scenario and that it is necessary to consider the details of each case carefully.
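The three mechanisms of the gain-loss framework reduce to a small decision rule, sketched here in Python. The function name `classify_reassignment` and its two inputs are assumptions made for illustration; the paper's actual inference relies on phylogeny, codon usage and tRNA evidence, which this sketch does not model.

```python
def classify_reassignment(codon_disappeared: bool, first_event: str) -> str:
    """Name the reassignment mechanism from two observations.

    codon_disappeared: whether the codon vanished from the genome
                       before the gain and loss events.
    first_event:       "gain" or "loss", whichever happened first.
    """
    if first_event not in ("gain", "loss"):
        raise ValueError("first_event must be 'gain' or 'loss'")
    if codon_disappeared:
        # The order of gain and loss does not matter once the codon is absent.
        return "codon disappearance"
    if first_event == "loss":
        return "unassigned codon"
    return "ambiguous intermediate"

print(classify_reassignment(False, "gain"))
```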
Affiliation(s)
- Supratim Sengupta
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1 Canada
- Department of Physics and Atmospheric Science, Dalhousie University, Halifax, Nova Scotia B3H 3J5 Canada
- Xiaoguang Yang
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1 Canada
- Paul G. Higgs
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1 Canada
|
33
|
Sella G, Ardell DH. The Coevolution of Genes and Genetic Codes: Crick’s Frozen Accident Revisited. J Mol Evol 2006; 63:297-313. [PMID: 16838217 DOI: 10.1007/s00239-004-0176-7] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Received: 06/08/2004] [Accepted: 10/21/2005] [Indexed: 10/24/2022]
Abstract
The standard genetic code is the nearly universal system for the translation of genes into proteins. The code exhibits two salient structural characteristics: it possesses a distinct organization that makes it extremely robust to errors in replication and translation, and it is highly redundant. The origin of these properties has intrigued researchers since the code was first discovered. One suggestion, which is the subject of this review, is that the code's organization is the outcome of the coevolution of genes and genetic codes. In 1968, Francis Crick explored the possible implications of coevolution at different stages of code evolution. Although he argued that coevolution was likely to influence the evolution of the code, he concluded that it falls short of explaining the organization of the code we see today. The recent application of mathematical modeling to study the effects of errors on the course of coevolution suggests a different conclusion. It shows that coevolution readily generates genetic codes that are highly redundant and similar in their error-correcting organization to the standard code. We review this recent work and suggest that further affirmation of the role of coevolution can be attained by investigating the extent to which the outcome of coevolution is robust to other influences that were present during the evolution of the code.
Affiliation(s)
- Guy Sella
- Center for the Study of Rationality, The Hebrew University, Givat Ram, 91904, Jerusalem, Israel.
|
34
|
Abstract
A dynamical theory for the evolution of the genetic code is presented, which accounts for its universality and optimality. The central concept is that a variety of collective, but non-Darwinian, mechanisms likely to be present in early communal life generically lead to refinement and selection of innovation-sharing protocols, such as the genetic code. Our proposal is illustrated by using a simplified computer model and placed within the context of a sequence of transitions that early life may have made, before the emergence of vertical descent.
Affiliation(s)
- Carl Woese
- Department of Microbiology and Institute for Genomic Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Nigel Goldenfeld
- Department of Physics and Institute for Genomic Biology, University of Illinois at Urbana–Champaign, 1110 West Green Street, Urbana, IL 61801
|
35
|
Zhu W, Freeland S. The standard genetic code enhances adaptive evolution of proteins. J Theor Biol 2005; 239:63-70. [PMID: 16325205 DOI: 10.1016/j.jtbi.2005.07.012] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 05/20/2005] [Revised: 06/29/2005] [Accepted: 07/19/2005] [Indexed: 11/15/2022]
Abstract
The standard genetic code, by which most organisms translate genetic material into protein metabolism, is non-randomly organized. The Error Minimization hypothesis interprets this non-randomness as an adaptation, proposing that natural selection produced a pattern of codon assignments that buffers genomes against the impact of mutations. Indeed, on average any given point mutation has a lesser effect on the chemical properties of the utilized amino acid than expected by chance. Might it also, however, be the case that the non-random nature of the code affects the rate of adaptive evolution? To investigate this, here we develop population genetic simulations to test the rate of adaptive gene evolution under different genetic codes. We identify two independent properties of a genetic code that profoundly influence the speed of adaptive evolution. Noting that the standard genetic code exhibits both, we offer a new insight into the effects of the "error minimizing" code: such a code enhances the efficacy of adaptive sequence evolution.
Affiliation(s)
- Wen Zhu
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA.
|
36
|
Marquez R, Smit S, Knight R. Do universal codon-usage patterns minimize the effects of mutation and translation error? Genome Biol 2005; 6:R91. [PMID: 16277746 PMCID: PMC1297647 DOI: 10.1186/gb-2005-6-11-r91] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 07/07/2005] [Revised: 08/24/2005] [Accepted: 09/21/2005] [Indexed: 12/03/2022] Open
Abstract
The analysis of codon usage in nearly 900 species of the three domains of life suggests that codon usage patterns in mRNA messages do not minimize the effects of translation error.
Background: Do species use codons that reduce the impact of errors in translation or replication? The genetic code is arranged in a way that minimizes errors, defined as the sum of the differences in amino-acid properties caused by single-base changes from each codon to each other codon. However, the extent to which organisms optimize the genetic messages written in this code has been far less studied. We tested whether codon and amino-acid usages from 457 bacteria, 264 eukaryotes, and 33 archaea minimize errors compared to random usages, and whether changes in genome G+C content influence these error values.
Results: We tested the hypotheses that organisms choose their codon usage to minimize errors, and that the large observed variation in G+C content in coding sequences, but the low variation in G+U or G+A content, is due to differences in the effects of variation along these axes on the error value. Surprisingly, the biological distribution of error values has far lower variance than randomized error values, but error values of actual codon and amino-acid usages are actually greater than would be expected by chance.
Conclusion: These unexpected findings suggest that selection against translation error has not produced codon or amino-acid usages that minimize the effects of errors, and that even messages with very different nucleotide compositions somehow maintain a relatively constant error value. They raise the question: why do all known organisms use highly error-minimizing genetic codes, but fail to minimize the errors in the mRNA messages they encode?
Affiliation(s)
- Roberto Marquez
- Department of Computer Science, New Mexico State University, MSC CS, Las Cruces, NM 88003, USA
- Sandra Smit
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA
- Rob Knight
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA
|
37
|
Abstract
There is very significant evidence that cognate codons and/or anticodons are unexpectedly frequent in RNA-binding sites for seven of eight biological amino acids that have been tested. This suggests that a substantial fraction of the genetic code has a stereochemical basis, the triplets having escaped from their original function in amino acid-binding sites to become modern codons and anticodons. We explicitly show that this stereochemical basis is consistent with subsequent optimization of the code to minimize the effect of coding mistakes on protein structure. These data also strengthen the argument for invention of the genetic code in an RNA world and for the RNA world itself.
Affiliation(s)
- Michael Yarus
- Department of Molecular Cellular and Developmental Biology, University of Colorado, Boulder, Colorado 80309-0347, USA.
|
38
|
Wu HL, Bagby S, van den Elsen JMH. Evolution of the Genetic Triplet Code via Two Types of Doublet Codons. J Mol Evol 2005; 61:54-64. [PMID: 16059752 DOI: 10.1007/s00239-004-0224-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 07/20/2004] [Accepted: 02/21/2005] [Indexed: 10/25/2022]
Abstract
Explaining the apparent non-random codon distribution and the nature and number of amino acids in the 'standard' genetic code remains a challenge, despite the various hypotheses so far proposed. In this paper we propose a simple new hypothesis for code evolution involving a progression from singlet to doublet to triplet codons with a reading mechanism that moves three bases each step. We suggest that triplet codons gradually evolved from two types of ambiguous doublet codons, those in which the first two bases of each three-base window were read ('prefix' codons) and those in which the last two bases of each window were read ('suffix' codons). This hypothesis explains multiple features of the genetic code such as the origin of the pattern of four-fold degenerate and two-fold degenerate triplet codons, the origin of its error minimising properties, and why there are only 20 amino acids.
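The two doublet reading modes can be made concrete with a short Python sketch. It is a hedged illustration of the reading scheme as described in this abstract, not the authors' model; the function name `read_doublets` and the example sequence are assumptions. The reader advances three bases per step and reports either the first two bases of each window ('prefix') or the last two ('suffix').

```python
def read_doublets(rna: str, mode: str) -> list[str]:
    """Read a sequence in three-base steps, keeping two bases per window.

    mode="prefix" keeps bases 1-2 of each window,
    mode="suffix" keeps bases 2-3 of each window.
    Trailing bases that do not fill a whole window are ignored.
    """
    if mode not in ("prefix", "suffix"):
        raise ValueError("mode must be 'prefix' or 'suffix'")
    offset = 0 if mode == "prefix" else 1
    return [rna[i + offset:i + offset + 2] for i in range(0, len(rna) - 2, 3)]

message = "AUGGCCUAA"  # hypothetical example sequence
print(read_doublets(message, "prefix"))
print(read_doublets(message, "suffix"))
```

Under the prefix mode the third base of each window is unread, which mirrors the four-fold degeneracy at the third codon position mentioned in the abstract.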
Affiliation(s)
- Huan-Lin Wu
- Department of Biology and Biochemistry, University of Bath, 4 South, Claverton Down, Bath BA2 7AY, UK
|
39
|
Abstract
Since the early days of the discovery of the genetic code nonrandom patterns have been searched for in the code in the hope of providing information about its origin and early evolution. Here we present a new classification scheme of the genetic code that is based on a binary representation of the purines and pyrimidines. This scheme reveals known patterns more clearly than the common one, for instance, the classification of strong, mixed, and weak codons as well as the ordering of codon families. Furthermore, new patterns have been found that have not been described before: Nearly all quantitative amino acid properties, such as Woese's polarity and the specific volume, show a perfect correlation to Lagerkvist's codon-anticodon binding strength. Our new scheme leads to new ideas about the evolution of the genetic code. It is hypothesized that it started with a binary doublet code and developed via a quaternary doublet code into the contemporary triplet code. Furthermore, arguments are presented against suggestions that a "simpler" code, where only the midbase was informational, was at the origin of the genetic code.
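A binary purine/pyrimidine representation of codons can be sketched as follows. The abstract does not give the details of the classification scheme, so the encoding here (purine = 1, pyrimidine = 0) and the names `codon_to_binary` and `group_by_pattern` are assumptions made for illustration only, not the author's scheme.

```python
from itertools import product

# Assumed encoding: purines (A, G) -> "1", pyrimidines (C, U) -> "0".
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def codon_to_binary(codon):
    """Return the purine/pyrimidine pattern of a codon as a bit string."""
    bits = []
    for base in codon.upper():
        if base in PURINES:
            bits.append("1")
        elif base in PYRIMIDINES:
            bits.append("0")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(bits)


def group_by_pattern():
    """Group all 64 RNA codons by their purine/pyrimidine pattern."""
    groups = {}
    for bases in product("UCAG", repeat=3):
        codon = "".join(bases)
        groups.setdefault(codon_to_binary(codon), []).append(codon)
    return groups


if __name__ == "__main__":
    for pattern, codons in sorted(group_by_pattern().items()):
        print(pattern, " ".join(codons))
```

Under this assumed encoding the 64 codons fall into 8 patterns of 8 codons each, because each position has two purines and two pyrimidines.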
Affiliation(s)
- Thomas Wilhelm
- Institute of Molecular Biotechnology, Beutenbergstr. 11, 07745 Jena, Germany.
|
40
|
Abstract
The coevolution theory of the genetic code, which postulates that prebiotic synthesis was an inadequate source of all twenty protein amino acids, and therefore some of them had to be derived from the coevolving pathways of amino acid biosynthesis, has been assessed in the light of the discoveries of the past three decades. Its four fundamental tenets regarding the essentiality of amino acid biosynthesis, role of pretran synthesis, biosynthetic imprint on codon allocations and mutability of the encoded amino acids are proven by the new knowledge. Of the factors that guided the evolutionary selection of the universal code, the relative contributions of Amino Acid Biosynthesis: Error Minimization: Stereochemical Interaction are estimated to first approximation as 40,000,000:400:1, which suggests that amino acid biosynthesis represents the dominant factor shaping the code. The utility of the coevolution theory is demonstrated by its opening up experimental expansions of the code and providing a basis for locating the root of life.
Affiliation(s)
- J Tze-Fei Wong
- Applied Genomics Laboratory and Department of Biochemistry, Hong Kong University of Science & Technology, Hong Kong, China.
|
41
|
Di Giulio M. The origin of the genetic code: theories and their relationships, a review. Biosystems 2004; 80:175-84. [PMID: 15823416 DOI: 10.1016/j.biosystems.2004.11.005] [Citation(s) in RCA: 97] [Impact Index Per Article: 4.9] [Received: 08/03/2004] [Revised: 11/12/2004] [Accepted: 11/18/2004] [Indexed: 10/26/2022]
Abstract
A review of the main theories proposed to explain the origin of the genetic code is presented. I analyze arguments and data in favour of different theories proposed to explain the origin of the organization of the genetic code. It is possible to suggest a mechanism that makes compatible the different theories of the origin of the code, even if these are based on a historical or physicochemical determinism and thus appear incompatible by definition. Finally, I discuss the question of why a given number of synonymous codons was attributed to the amino acids in the genetic code.
Affiliation(s)
- Massimo Di Giulio
- Institute of Genetics and Biophysics Adriano Buzzati-Traverso, CNR, Naples, Italy
|
42
|
|
43
|
Affiliation(s)
- Rufus A. Johnstone
- Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
- Sasha R. X. Dall
- Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
|