1
Di Giulio M. Theories of the origin of the genetic code: Strong corroboration for the coevolution theory. Biosystems 2024; 239:105217. [PMID: 38663520 DOI: 10.1016/j.biosystems.2024.105217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/16/2024] [Accepted: 04/18/2024] [Indexed: 04/29/2024]
Abstract
I analyzed all the theories and models of the origin of the genetic code, and over the years, I have considered the main suggestions that could explain this origin. The conclusion of this analysis is that the coevolution theory of the origin of the genetic code is the theory that best captures the majority of observations concerning the organization of the genetic code. In other words, the biosynthetic relationships between amino acids would have heavily influenced the origin of the organization of the genetic code, as supported by the coevolution theory. Instead, the presence in the genetic code of physicochemical properties of amino acids, which have also been linked to the physicochemical properties of anticodons or codons or bases by stereochemical and physicochemical theories, would simply be the result of natural selection. More explicitly, I maintain that these correlations between codons, anticodons or bases and amino acids are in fact the result not of a real correlation between amino acids and codons, for example, but are only the effect of the intervention of natural selection. Specifically, in the genetic code table we expect, for example, that the most similar codons - that is, those that differ by only one base - will have more similar physicochemical properties. Therefore, the 64 codons of the genetic code table ordered in a certain way would also represent an ordering of some of their physicochemical properties. Now, a study aimed at clarifying which physicochemical property of amino acids has influenced the allocation of amino acids in the genetic code has established that the partition energy of amino acids played a decisive role in this. Indeed, under some conditions, the genetic code was found to be approximately 98% optimized on its columns. In this same work, it was shown that this was most likely the result of the action of natural selection.
If natural selection had truly allocated the amino acids in the genetic code in such a way that similar amino acids also have similar codons - and this not through a mechanism of physicochemical interaction between, for example, codons and amino acids - then it might turn out that even different physicochemical properties of codons (or anticodons or bases) show some correlation with the physicochemical properties of amino acids, simply because the partition energy of amino acids is correlated with other physicochemical properties of amino acids. It is very likely that this would inevitably lead to a correlation between codons (or anticodons or bases) and amino acids. In other words, since the codons (anticodons or bases) are ordered in the genetic code, so that some of their physicochemical properties should also follow a similar order, and given that the amino acids would also appear to have been ordered in the genetic code by natural selection, then it should inevitably turn out that there is a correlation between, for example, the hydrophobicity of anticodons and that of amino acids. Instead, the intervention of natural selection in organizing the genetic code would appear to be highly compatible with the main mechanism of structuring the genetic code as supported by the coevolution theory. This would make the coevolution theory the only plausible explanation for the origin of the genetic code.
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Early Evolution of Life Department, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena, L'Aquila, Italy.
2
Di Giulio M. The time of appearance of the genetic code. Biosystems 2024; 237:105159. [PMID: 38373543 DOI: 10.1016/j.biosystems.2024.105159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 02/13/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024]
Abstract
I support the hypothesis that the origin of the genetic code occurred simultaneously with the evolution of cellularity. That is to say, I favour the hypothesis that the origin of the genetic code is a very, very late event in the history of life on Earth. I corroborate this hypothesis with observations favouring the progenote's stage for the Last Universal Common Ancestor (LUCA), for the ancestor of bacteria and that of archaea. Indeed, these progenotic stages would imply that - at that time - the origin of the genetic code was still ongoing simply because this origin would fall within the very definition of progenote. Therefore, if the evolution of cellularity had truly been coeval with the origin of the genetic code - at least in its terminal part - then this would favour theories such as the coevolution theory of the origin of the genetic code because this theory would postulate that this origin must have occurred in extremely complex protocellular conditions and not concerning stereochemical or physicochemical interactions having to do with other stages of the origin of life. In this sense, the coevolution theory would be corroborated while the stereochemical and physicochemical theories would be damaged. Therefore, the origin of the genetic code would be linked to the origin of the cell and not to the origin of life as sometimes asserted. Therefore, I will discuss the late hypothesis of the origin of the genetic code in the context of the theories proposed to explain this origin and more generally of its implications for the early evolution of life.
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Early Evolution of Life Department, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena, L'Aquila, Italy.
3
Caldararo F, Di Giulio M. The genetic code is very close to a global optimum in a model of its origin taking into account both the partition energy of amino acids and their biosynthetic relationships. Biosystems 2022; 214:104613. [DOI: 10.1016/j.biosystems.2022.104613] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/19/2021] [Revised: 01/16/2022] [Accepted: 01/17/2022] [Indexed: 01/23/2023]
4
Use of the Codon Table to Quantify the Evolutionary Role of Random Mutations. Algorithms 2021. [DOI: 10.3390/a14090270] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/06/2023]
Abstract
The various biases affecting RNA mutations during evolution are the subject of intense research, leaving the extent of the role of random mutations undefined. To remedy this lacuna, using the codon table, the number of codons representing each amino acid was correlated with the amino acid frequencies in different branches of the evolutionary tree. The correlations were seen to increase as evolution progressed. Furthermore, the number of RNA mutations that resulted in a given amino acid mutation was found to be correlated with several widely used amino acid similarity tables (used in sequence alignments). These correlations were seen to increase when the observed codon usage was factored in.
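The first correlation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the helper names (`codon_counts`, `pearson`, `degeneracy_vs_frequency`) are hypothetical, the standard genetic code is assumed, and the amino acid frequencies must be supplied by the reader from a proteome of interest.

```python
from collections import Counter
from math import sqrt

# Standard genetic code, written in the usual T, C, A, G table order
# ("*" marks stop codons). One-letter amino acid codes.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((x, y, z) for x in BASES for y in BASES for z in BASES), AMINO
    )
}


def codon_counts(table=CODON_TABLE):
    """Number of codons assigned to each amino acid (stop codons excluded)."""
    return Counter(aa for aa in table.values() if aa != "*")


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def degeneracy_vs_frequency(freqs):
    """Correlate codon counts with observed amino acid frequencies.

    `freqs` is a hypothetical input: a dict mapping one-letter amino acid
    codes to their observed frequencies in some set of proteins.
    """
    counts = codon_counts()
    aas = sorted(counts)
    return pearson([counts[a] for a in aas], [freqs[a] for a in aas])
```

Calling `degeneracy_vs_frequency` once per taxonomic group would give one coefficient per group, which is the kind of quantity the abstract compares across branches of the evolutionary tree.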
5
The Mutational Robustness of the Genetic Code and Codon Usage in Environmental Context: A Non-Extremophilic Preference? Life (Basel) 2021; 11:life11080773. [PMID: 34440517 PMCID: PMC8398314 DOI: 10.3390/life11080773] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/16/2021] [Revised: 07/23/2021] [Accepted: 07/28/2021] [Indexed: 12/12/2022] Open
Abstract
The genetic code evolved, to some extent, to minimize the effects of mutations. The effects of mutations depend on the amino acid repertoire, the structure of the genetic code and frequencies of amino acids in proteomes. The amino acid compositions of proteins and corresponding codon usages are still under selection, which allows us to ask what kind of environment the standard genetic code is adapted to. Using simple computational models and comprehensive datasets comprising genomic and environmental data from all three domains of Life, we estimate the expected severity of non-synonymous genomic mutations in proteins, measured by the change in amino acid physicochemical properties. We show that the fidelity in these physicochemical properties is expected to deteriorate with extremophilic codon usages, especially in thermophiles. These findings suggest that the genetic code performs better under non-extremophilic conditions, which explains the low substitution rates encountered in halophiles and thermophiles; moreover, the revealed relationship between the genetic code and habitat allows us to ponder earlier phases in the history of Life.
6
Phylogenetic analysis of mutational robustness based on codon usage supports that the standard genetic code does not prefer extreme environments. Sci Rep 2021; 11:10963. [PMID: 34040064 PMCID: PMC8154912 DOI: 10.1038/s41598-021-90440-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/04/2021] [Accepted: 05/10/2021] [Indexed: 02/04/2023] Open
Abstract
The mutational robustness of the genetic code is rarely discussed in the context of biological diversity, such as codon usage and related factors, and is often considered independent of the actual organism's proteome. Here we put living beings back into the picture and use distortion as a metric of mutational robustness. Distortion estimates the expected severity of non-synonymous mutations, measured by amino acid physicochemical properties and weighted by codon usage. Using the biological variance of codon frequencies, we interpret the mutational robustness of the standard genetic code with regard to the corresponding environments and genomic compositions (GC-content). Employing phylogenetic analyses, we show that coding fidelity in physicochemical properties can deteriorate with codon usages adapted to extreme environments and that these putative effects are not artefacts of phylogenetic bias. High temperature environments select for codon usages with decreased mutational robustness of hydrophobic, volumetric, and isoelectric properties. Selection at high saline concentrations also leads to reduced fidelity in polar and isoelectric patterns. These results show that the genetic code performs best with mesophilic codon usages, strengthening the view that LUCA or its ancestors preferred lower temperature environments. Taxonomic implications, such as rooting the tree of life, are also discussed.
7
Di Giulio M. The key role of the elongation factors in the origin of the organization of the genetic code. Biosystems 2019; 181:20-26. [DOI: 10.1016/j.biosystems.2019.04.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/04/2019] [Revised: 04/13/2019] [Accepted: 04/13/2019] [Indexed: 11/29/2022]
8
Facchiano A, Di Giulio M. The genetic code is not an optimal code in a model taking into account both the biosynthetic relationships between amino acids and their physicochemical properties. J Theor Biol 2018; 459:45-51. [DOI: 10.1016/j.jtbi.2018.09.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/28/2018] [Revised: 09/04/2018] [Accepted: 09/19/2018] [Indexed: 01/22/2023]
9
Di Giulio M. A Non-neutral Origin for Error Minimization in the Origin of the Genetic Code. J Mol Evol 2018; 86:593-597. [PMID: 30361751 DOI: 10.1007/s00239-018-9871-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/19/2018] [Accepted: 10/17/2018] [Indexed: 11/29/2022]
Abstract
Massey (J Mol Evol 67:510-516, 2008; J Theor Biol 408:237-242, 2016; Nat Comput. https://doi.org/10.1007/s11047-017-9669-3, 2018) claims that the error minimization of the genetic code is derived by means of a neutral process and was not due to the action of natural selection. Here, I argue that this neutralist hypothesis of the origin of error minimization is not based directly on any neutral process, but could be so only indirectly. On the contrary, it has been natural selection that has acted during the origin of the genetic code, determining the property that similar amino acids are coded by similar codons within the genetic code table.
Affiliation(s)
- Massimo Di Giulio
- Early Evolution of Life Laboratory, Institute of Biosciences and Bioresources, CNR, Via P. Castellino, 111, 80131, Naples, Italy.
10
Di Giulio M. A discriminative test among the different theories proposed to explain the origin of the genetic code: The coevolution theory finds additional support. Biosystems 2018; 169-170:1-4. [DOI: 10.1016/j.biosystems.2018.05.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/23/2018] [Revised: 04/26/2018] [Accepted: 05/07/2018] [Indexed: 11/29/2022]
11
Bijective codon transformations show genetic code symmetries centered on cytosine's coding properties. Theory Biosci 2017; 137:17-31. [PMID: 29147851 DOI: 10.1007/s12064-017-0258-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 06/06/2017] [Accepted: 11/13/2017] [Indexed: 12/11/2022]
Abstract
Homology of some RNAs with template DNA requires systematic exchanges between nucleotides. Such exchanges produce 'swinger' RNA along 23 bijective transformations (nine symmetric, X ↔ Y; and 14 asymmetric, X → Y → Z → X, for example A ↔ C and A → C → G → A, respectively). Here, analyses compare amino acids coded by swinger-transformed codons to those coded by untransformed codons, defining coding invariance after transformations. Swinger transformations cluster according to coding invariance in four groups characterized by transformations into cytosine (C = C, T → C, A → C, and G → C). C's central mutational coding role shows that swinger transformations constrained genetic code genesis. Coding invariance post-transformations correlate positively/negatively with mitochondrial swinger transcription/lepidosaurian body temperature. Presumably, low/high temperatures stabilize/revert rare swinger polymerization modes, producing long swinger sequences/point mutations, respectively. Coding invariance after swinger transformations might compensate effects of swinger polymerizations in species with low body temperatures. Hypothetically, swinger transcription increased coding potential of RNA self-replicating protolife systems under heating/cooling cycles.
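The count of 23 transformations (nine symmetric, 14 asymmetric) follows from enumerating the non-identity permutations of the four nucleotides, which a short sketch can check. The function names are hypothetical, and the `coding_invariance` measure below (the fraction of the 64 codons whose meaning is unchanged after transformation, under the standard genetic code) is a simplified reading of the abstract, not necessarily the paper's exact definition.

```python
from itertools import permutations

NUCLEOTIDES = "ACGT"

# Standard genetic code in T, C, A, G table order ("*" marks stop codons).
TABLE_ORDER = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((x, y, z) for x in TABLE_ORDER for y in TABLE_ORDER for z in TABLE_ORDER),
        AMINO,
    )
}


def swinger_transformations():
    """Split the 23 non-identity nucleotide permutations into two classes.

    Symmetric transformations are their own inverse (X <-> Y exchanges);
    all remaining ones are asymmetric (cyclic exchanges such as X -> Y -> Z -> X).
    """
    identity = tuple(NUCLEOTIDES)
    symmetric, asymmetric = [], []
    for image in permutations(NUCLEOTIDES):
        if image == identity:
            continue
        mapping = dict(zip(NUCLEOTIDES, image))
        is_involution = all(mapping[mapping[b]] == b for b in NUCLEOTIDES)
        (symmetric if is_involution else asymmetric).append(mapping)
    return symmetric, asymmetric


def coding_invariance(mapping, table=CODON_TABLE):
    """Fraction of codons that keep their meaning after the transformation."""
    unchanged = sum(
        table[codon] == table["".join(mapping[b] for b in codon)]
        for codon in table
    )
    return unchanged / len(table)
```

Ranking the 23 mappings by `coding_invariance` is one way to explore how transformations group, though the clustering into four cytosine-centered groups reported in the abstract rests on the authors' own analysis.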
12
Zamudio GS, José MV. Phenotypic Graphs and Evolution Unfold the Standard Genetic Code as the Optimal. Orig Life Evol Biosph 2017; 48:83-91. [PMID: 29082465 DOI: 10.1007/s11084-017-9552-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/02/2017] [Accepted: 10/16/2017] [Indexed: 10/18/2022]
Abstract
In this work, we explicitly consider the evolution of the Standard Genetic Code (SGC) by assuming two evolutionary stages, to wit, the primeval RNY code and two intermediate codes in between. We used network theory and graph theory to measure the connectivity of each phenotypic graph. The connectivity values are compared to the values of the codes under different randomization scenarios. An error-correcting optimal code is one in which the algebraic connectivity is minimized. We show that the SGC is optimal in regard to its robustness and error-tolerance when compared to all random codes under different assumptions.
Affiliation(s)
- Gabriel S Zamudio
- Theoretical Biology Group, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, C.P. 04510, Ciudad de México CDMX, Mexico
- Marco V José
- Theoretical Biology Group, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, C.P. 04510, Ciudad de México CDMX, Mexico.
13
Some pungent arguments against the physico-chemical theories of the origin of the genetic code and corroborating the coevolution theory. J Theor Biol 2017; 414:1-4. [DOI: 10.1016/j.jtbi.2016.11.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/20/2016] [Revised: 10/26/2016] [Accepted: 11/16/2016] [Indexed: 10/20/2022]
14
Klitting R, Gould EA, de Lamballerie X. G+C content differs in conserved and variable amino acid residues of flaviviruses and other evolutionary groups. Infect Genet Evol 2016; 45:332-340. [PMID: 27663721 DOI: 10.1016/j.meegid.2016.09.017] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/08/2016] [Revised: 09/01/2016] [Accepted: 09/19/2016] [Indexed: 11/25/2022]
Abstract
Flaviviruses are small RNA viruses that exhibit genetic and ecological diversity and a wide range of G+C content (GC%). We discovered that, amongst flaviviruses, the GC% of nucleotides encoding conserved amino acid (AA) residues was consistently higher than that of nucleotides encoding variable AAs. This intriguing phenomenon was also identified for a wide range of other viruses, and some non-viral evolutionary groups. Here, we analyse the possible mechanisms underlying this imbalanced nucleotide content (in particular the role of the specific G content and the AA composition in flaviviral genomes) and discuss its evolutionary implications. Our findings suggest that one of the simplest characteristics of the genetic code (i.e., the G or G+C content of codons) is linked with the evolutionary behavior of the corresponding encoded AAs.
Affiliation(s)
- Raphaëlle Klitting
- UMR "Emergence des Pathologies Virales" (EPV: Aix-Marseille University - IRD 190 - Inserm 1207 - EHESP), 27 bd Jean Moulin, 13385 Marseille, France.
- Ernest Andrew Gould
- UMR "Emergence des Pathologies Virales" (EPV: Aix-Marseille University - IRD 190 - Inserm 1207 - EHESP), 27 bd Jean Moulin, 13385 Marseille, France.
- Xavier de Lamballerie
- UMR "Emergence des Pathologies Virales" (EPV: Aix-Marseille University - IRD 190 - Inserm 1207 - EHESP), 27 bd Jean Moulin, 13385 Marseille, France; Institut Hospitalo-Universitaire Méditerranée-Infection, 27 bd Jean Moulin, 13385 Marseille, France.
15
Aggarwal N, Bandhu AV, Sengupta S. Finite population analysis of the effect of horizontal gene transfer on the origin of an universal and optimal genetic code. Phys Biol 2016; 13:036007. [DOI: 10.1088/1478-3975/13/3/036007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 10/21/2022]
16
Di Giulio M. The lack of foundation in the mechanism on which are based the physico-chemical theories for the origin of the genetic code is counterposed to the credible and natural mechanism suggested by the coevolution theory. J Theor Biol 2016; 399:134-40. [PMID: 27067244 DOI: 10.1016/j.jtbi.2016.04.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 02/04/2016] [Revised: 03/29/2016] [Accepted: 04/01/2016] [Indexed: 11/25/2022]
Abstract
I analyze the mechanism on which the majority of theories that place the physico-chemical properties of amino acids at the center of the origin of the genetic code are based. As this mechanism is based on excessive mutational steps, I conclude that it could not have been operative or, if operative, it would not have allowed a full realization of the predictions of these theories, because this mechanism evidently contained a high indeterminacy. In doing so, I criticize the four-column theory of the origin of the genetic code (Higgs, 2009) and reply to the criticism that was directed towards the coevolution theory of the origin of the genetic code. In this context, I suggest a new hypothesis that clarifies the mechanism by which the domains of codons of the precursor amino acids would have evolved, as predicted by the coevolution theory. This mechanism would have used particular elongation factors that would have constrained the evolution of all amino acids belonging to a given biosynthetic family to the progenitor pre-tRNA that first recognized the first codons that evolved in a certain codon domain of a given precursor amino acid. This happened because the elongation factors recognized two characteristics of the progenitor pre-tRNAs of precursor amino acids, which prevented the elongation factors from recognizing the pre-tRNAs belonging to biosynthetic families of different precursor amino acids. Finally, I analyze, by means of Fisher's exact test, the distribution within the genetic code of the biosynthetic classes of amino acids and of the classes of polarity values of amino acids. This analysis would seem to support the biosynthetic classes of amino acids over the classes of polarity values as the main factor that led to the structuring of the genetic code, with the physico-chemical properties of amino acids playing only a subsidiary role in this evolution.
As a whole, the full analysis leads to the conclusion that the coevolution theory of the origin of the genetic code would be a highly corroborated theory.
Affiliation(s)
- Massimo Di Giulio
- Early Evolution of Life Laboratory, Institute of Biosciences and Bioresources, CNR, Via P. Castellino, 111, 80131 Naples, Italy.
17
Coevolution Theory of the Genetic Code at Age Forty: Pathway to Translation and Synthetic Life. Life (Basel) 2016; 6:life6010012. [PMID: 26999216 PMCID: PMC4810243 DOI: 10.3390/life6010012] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 01/08/2016] [Revised: 02/26/2016] [Accepted: 03/04/2016] [Indexed: 11/17/2022] Open
Abstract
The origins of the components of genetic coding are examined in the present study. Genetic information arose from replicator induction by metabolite in accordance with the metabolic expansion law. Messenger RNA and transfer RNA stemmed from a template for binding the aminoacyl-RNA synthetase ribozymes employed to synthesize peptide prosthetic groups on RNAs in the Peptidated RNA World. Coevolution of the genetic code with amino acid biosynthesis generated tRNA paralogs that identify a last universal common ancestor (LUCA) of extant life close to Methanopyrus, which in turn points to archaeal tRNA introns as the most primitive introns and the anticodon usage of Methanopyrus as an ancient mode of wobble. The prediction of the coevolution theory of the genetic code that the code should be a mutable code has led to the isolation of optional and mandatory synthetic life forms with altered protein alphabets.
18
Massey SE. Genetic code evolution reveals the neutral emergence of mutational robustness, and information as an evolutionary constraint. Life (Basel) 2015; 5:1301-32. [PMID: 25919033 PMCID: PMC4500140 DOI: 10.3390/life5021301] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/02/2015] [Revised: 04/02/2015] [Accepted: 04/03/2015] [Indexed: 01/09/2023] Open
Abstract
The standard genetic code (SGC) is central to molecular biology and its origin and evolution is a fundamental problem in evolutionary biology, the elucidation of which promises to reveal much about the origins of life. In addition, we propose that study of its origin can also reveal some fundamental and generalizable insights into mechanisms of molecular evolution, utilizing concepts from complexity theory. The first is that beneficial traits may arise by non-adaptive processes, via a process of "neutral emergence". The structure of the SGC is optimized for the property of error minimization, which reduces the deleterious impact of point mutations. Via simulation, it can be shown that genetic codes with error minimization superior to the SGC can emerge in a neutral fashion simply by a process of genetic code expansion via tRNA and aminoacyl-tRNA synthetase duplication, whereby similar amino acids are added to codons related to that of the parent amino acid. This process of neutral emergence has implications beyond that of the genetic code, as it suggests that not all beneficial traits have arisen by the direct action of natural selection; we term these "pseudaptations", and discuss a range of potential examples. Secondly, consideration of genetic code deviations (codon reassignments) reveals that these are mostly associated with a reduction in proteome size. This code malleability implies the existence of a proteomic constraint on the genetic code, proportional to the size of the proteome (P), and that its reduction in size leads to an "unfreezing" of the codon - amino acid mapping that defines the genetic code, consistent with Crick's Frozen Accident theory. 
The concept of a proteomic constraint may be extended to propose a general informational constraint on genetic fidelity, which may be used to explain variously, differences in mutation rates in genomes with differing proteome sizes, differences in DNA repair capacity and genome GC content between organisms, a selective pressure in the evolution of sexual reproduction, and differences in translational fidelity. Lastly, the utility of the concept of an informational constraint to other diverse fields of research is explored.
Affiliation(s)
- Steven E Massey
- Biology Department, PO Box 23360, University of Puerto Rico-Rio Piedras, San Juan, PR 00931, USA.
19
On How Many Fundamental Kinds of Cells are Present on Earth: Looking for Phylogenetic Traits that Would Allow the Identification of the Primary Lines of Descent. J Mol Evol 2014; 78:313-20. [DOI: 10.1007/s00239-014-9626-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/03/2014] [Accepted: 05/21/2014] [Indexed: 11/26/2022]
20
Copp JN, Hanson-Manful P, Ackerley DF, Patrick WM. Error-prone PCR and effective generation of gene variant libraries for directed evolution. Methods Mol Biol 2014; 1179:3-22. [PMID: 25055767 DOI: 10.1007/978-1-4939-1053-3_1] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 12/23/2022]
Abstract
Any single-enzyme directed evolution strategy has two fundamental requirements: the need to efficiently introduce variation into a gene of interest and the need to create an effective library from those variants. Generation of a maximally diverse gene library is particularly important when employing nontargeted mutagenesis strategies such as error-prone PCR (epPCR), which seek to explore very large areas of sequence space. Here we present comprehensive protocols and tips for using epPCR to generate gene variants that exhibit a relatively balanced spectrum of mutations and for capturing as much diversity as possible through effective cloning of those variants. The detailed library preparation methods that we describe are generally applicable to any directed evolution strategy that uses restriction enzymes to clone gene variants into an expression plasmid.
Affiliation(s)
- Janine N Copp
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
21
Bandhu AV, Aggarwal N, Sengupta S. Revisiting the physico-chemical hypothesis of code origin: an analysis based on code-sequence coevolution in a finite population. Orig Life Evol Biosph 2013; 43:465-89. [PMID: 24500541 DOI: 10.1007/s11084-014-9353-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 11/23/2013] [Accepted: 01/13/2014] [Indexed: 01/23/2023]
Abstract
The origin of the genetic code marked a major transition from a plausible RNA world to the world of DNA and proteins and is an important milestone in our understanding of the origin of life. We examine the efficacy of the physico-chemical hypothesis of code origin by carrying out simulations of code-sequence coevolution in finite populations in stages, leading first to the emergence of ten amino acid code(s) and subsequently to 14 amino acid code(s). We explore two different scenarios of primordial code evolution. In one scenario, competition occurs between populations of equilibrated code-sequence sets, while in the other scenario new codes compete with existing codes as they are gradually introduced into the population with a finite probability. In either case, we find that natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. The code whose structure is most consistent with the standard genetic code is often not among the codes that have a high fixation probability. However, we find that the composition of the code population affects the code fixation probability. A physico-chemically optimized code gets fixed with a significantly higher probability if it competes against a set of randomly generated codes. Our results suggest that physico-chemical optimization may not be the sole driving force in ensuring the emergence of the standard genetic code.
Affiliation(s)
- Ashutosh Vishwa Bandhu
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India
22
Di Giulio M. The Origin of the Genetic Code: Matter of Metabolism or Physicochemical Determinism? J Mol Evol 2013; 77:131-3. [DOI: 10.1007/s00239-013-9593-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 10/14/2013] [Accepted: 10/18/2013] [Indexed: 12/27/2022]
23
A realistic model under which the genetic code is optimal. J Mol Evol 2013; 77:170-84. [PMID: 23877342 DOI: 10.1007/s00239-013-9571-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 12/20/2012] [Accepted: 06/27/2013] [Indexed: 01/23/2023]
Abstract
The genetic code has a high level of error robustness. Using values of hydrophobicity scales as a proxy for amino acid character, and the mean square measure as a function quantifying error robustness, a value can be obtained for a genetic code which reflects the error robustness of that code. By comparing this value with a distribution of values belonging to codes generated by random permutations of amino acid assignments, the level of error robustness of a genetic code can be quantified. We present a calculation in which the standard genetic code is shown to be optimal. We obtain this result by (1) using recently updated values of polar requirement as input; (2) fixing seven assignments (Ile, Trp, His, Phe, Tyr, Arg, and Leu) based on aptamer considerations; and (3) using known biosynthetic relations of the 20 amino acids. This last point is reflected in an approach of subdivision (restricting the random reallocation of assignments to amino acid subgroups, the set of 20 being divided into four such subgroups). The three approaches to explaining the robustness of the code (specific selection for robustness, amino acid-RNA interactions leading to assignments, or a slow growth process of assignment patterns) are reexamined in light of our findings. We offer a comprehensive hypothesis, stressing the importance of biosynthetic relations, with the code evolving from an early stage with just glycine and alanine, via intermediate stages, towards 64 codons carrying today's meaning.
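The permutation test described in this abstract can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses the Kyte-Doolittle hydropathy scale rather than the updated polar requirement values the authors used, it fixes no assignments, and it applies no biosynthetic subdivision. The names `ms_cost`, `shuffled_code` and `fraction_better` are hypothetical and do not come from the paper.

```python
import itertools
import random

BASES = "TCAG"
# Standard genetic code, codons enumerated with bases in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, AMINO))

# Assumption: Kyte-Doolittle hydropathy stands in for the paper's polar requirement.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def ms_cost(code, scale):
    """Mean squared property change over all single-base substitutions
    between sense codons (substitutions to or from stop are ignored)."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                aa2 = code[codon[:pos] + b + codon[pos + 1:]]
                if aa2 != "*":
                    total += (scale[aa] - scale[aa2]) ** 2
                    n += 1
    return total / n


def shuffled_code(code, rng):
    """Randomly permute the 20 amino acids among the synonymous codon sets,
    keeping the block structure and the stop codons unchanged."""
    aas = sorted(set(code.values()) - {"*"})
    perm = aas[:]
    rng.shuffle(perm)
    mapping = dict(zip(aas, perm))
    mapping["*"] = "*"
    return {c: mapping[a] for c, a in code.items()}


def fraction_better(code, scale, trials=2000, seed=0):
    """Fraction of random permuted codes with a lower (better) cost than `code`."""
    rng = random.Random(seed)
    ref = ms_cost(code, scale)
    better = sum(ms_cost(shuffled_code(code, rng), scale) < ref
                 for _ in range(trials))
    return better / trials
```

Calling `fraction_better(STANDARD, KD)` estimates how often a random permutation beats the standard code under this particular scale; the value depends on the scale, on the number of trials, and on which assignments are held fixed, so it should not be read as reproducing the paper's result.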
|
24
|
Morgens DW, Cavalcanti ARO. An alternative look at code evolution: using non-canonical codes to evaluate adaptive and historic models for the origin of the genetic code. J Mol Evol 2013; 76:71-80. [PMID: 23344715 DOI: 10.1007/s00239-013-9542-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/21/2012] [Accepted: 01/15/2013] [Indexed: 10/27/2022]
Abstract
The canonical code has been shown many times to be highly robust against point mutations; that is, mutations that change a single nucleotide tend to result in similar amino acids more often than expected by chance. There are two major types of models for the origin of the code, which explain how this sophisticated structure evolved. Adaptive models state that the primitive code was specifically selected for error minimization, while historic models hypothesize that the robustness of the code is an artifact or by-product of the mechanism of code evolution. In this paper, we evaluated the levels of robustness in existing non-canonical codes as well as codes that differ in only one codon assignment from the standard code. We found that the level of robustness of many of these codes is comparable to or better than that of the standard code. Although these results do not preclude an adaptive origin of the genetic code, they suggest that the code was not selected for minimizing the effects of point mutations.
Affiliation(s)
- David W Morgens
- Department of Biology, Pomona College, 175 W 6th Street, Claremont, CA, USA
|
25
|
Zhang Z, Yu J. Does the genetic code have a eukaryotic origin? Genomics Proteomics Bioinformatics 2013; 11:41-55. [PMID: 23402863 PMCID: PMC4357656 DOI: 10.1016/j.gpb.2013.01.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/15/2012] [Revised: 01/09/2013] [Accepted: 01/11/2013] [Indexed: 11/29/2022]
Abstract
In the RNA world, RNA is assumed to be the dominant macromolecule performing most, if not all, core “house-keeping” functions. The ribo-cell hypothesis suggests that the genetic code and the translation machinery may both be born of the RNA world, and the introduction of DNA to ribo-cells may take over the informational role of RNA gradually, such as a mature set of genetic code and mechanism enabling stable inheritance of sequence and its variation. In this context, we modeled the genetic code in two content variables, GC and purine contents, of protein-coding sequences and measured the purine content sensitivities for each codon when the sensitivity (% usage) is plotted as a function of GC content variation. The analysis leads to a new pattern, the symmetric pattern, where the sensitivity of purine content variation shows diagonal symmetry in the codon table, more significantly in the two GC content invariable quarters, in addition to the two existing patterns where the table is divided into either four GC content sensitivity quarters or two amino acid diversity halves. The most insensitive codon sets are GUN (valine) and CAN (CAR for asparagine and CAY for aspartic acid) and the most biased amino acid is valine (always over-estimated) followed by alanine (always under-estimated). The unique position of valine and its codons suggests its key roles in the final recruitment of the complete codon set of the canonical table. The distinct choice may only be attributable to sequence signatures or signals of splice sites for spliceosomal introns shared by all extant eukaryotes.
Affiliation(s)
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
|
26
|
The genetic code and its optimization for kinetic energy conservation in polypeptide chains. Biosystems 2012; 109:141-4. [DOI: 10.1016/j.biosystems.2012.03.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 12/28/2011] [Revised: 01/27/2012] [Accepted: 03/06/2012] [Indexed: 10/28/2022]
|
27
|
Stability of the genetic code and optimal parameters of amino acids. J Theor Biol 2010; 269:57-63. [PMID: 20955716 DOI: 10.1016/j.jtbi.2010.10.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/08/2010] [Revised: 09/20/2010] [Accepted: 10/12/2010] [Indexed: 11/24/2022]
Abstract
The standard genetic code is known to be much more efficient in minimizing adverse effects of misreading errors and one-point mutations in comparison with a random code having the same structure, i.e. the same number of codons coding for each particular amino acid. We study the inverse problem, how the code structure affects the optimal physico-chemical parameters of amino acids ensuring the highest stability of the genetic code. It is shown that the choice of two or more amino acids with given properties determines unambiguously all the others. In this sense the code structure determines strictly the optimal parameters of amino acids or the corresponding scales may be derived directly from the genetic code. In the code with the structure of the standard genetic code the resulting values for hydrophobicity obtained in the scheme "leave one out" and in the scheme with fixed maximum and minimum parameters correlate significantly with the natural scale. The comparison of the optimal and natural parameters allows assessing relative impact of physico-chemical and error-minimization factors during evolution of the genetic code. As the resulting optimal scale depends on the choice of amino acids with given parameters, the technique can also be applied to testing various scenarios of the code evolution with increasing number of codified amino acids. Our results indicate the co-evolution of the genetic code and physico-chemical properties of recruited amino acids.
|
28
|
Seaborg DM. Was Wright right? The canonical genetic code is an empirical example of an adaptive peak in nature; deviant genetic codes evolved using adaptive bridges. J Mol Evol 2010; 71:87-99. [PMID: 20711776 PMCID: PMC2924497 DOI: 10.1007/s00239-010-9373-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/22/2010] [Accepted: 07/02/2010] [Indexed: 11/30/2022]
Abstract
The canonical genetic code is on a sub-optimal adaptive peak with respect to its ability to minimize errors, and is close to, but not quite, optimal. This is demonstrated by the near-total adjacency of synonymous codons, the similarity of adjacent codons, and comparisons of frequency of amino acid usage with number of codons in the code for each amino acid. As a rare empirical example of an adaptive peak in nature, it shows adaptive peaks are real, not merely theoretical. The evolution of deviant genetic codes illustrates how populations move from a lower to a higher adaptive peak. This is done by the use of "adaptive bridges," neutral pathways that cross over maladaptive valleys by virtue of masking of the phenotypic expression of some maladaptive aspects in the genotype. This appears to be the general mechanism by which populations travel from one adaptive peak to another. There are multiple routes a population can follow to cross from one adaptive peak to another. These routes vary in the probability that they will be used, and this probability is determined by the number and nature of the mutations that happen along each of the routes. A modification of the depiction of adaptive landscapes showing genetic distances and probabilities of travel along their multiple possible routes would throw light on this important concept.
Affiliation(s)
- David M Seaborg
- Foundation for Biological Conservation and Research, 1888 Pomar Way, Walnut Creek, CA 94598-1424, USA.
|
29
|
Görnerup O, Jacobi MN. A model-independent approach to infer hierarchical codon substitution dynamics. BMC Bioinformatics 2010; 11:201. [PMID: 20412602 PMCID: PMC2868013 DOI: 10.1186/1471-2105-11-201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/20/2009] [Accepted: 04/23/2010] [Indexed: 12/03/2022] Open
Abstract
Background: Codon substitution constitutes a fundamental process in molecular biology that has been studied extensively. However, prior studies rely on various assumptions, e.g. regarding the relevance of specific biochemical properties, or on conservation criteria for defining substitution groups. Ideally, one would instead like to analyze the substitution process in terms of raw dynamics, independently of underlying system specifics. In this paper we propose a method for doing this by identifying groups of codons and amino acids such that these groups imply closed dynamics. The approach relies on recently developed spectral and agglomerative techniques for identifying hierarchical organization in dynamical systems. Results: We have applied the techniques on an empirically derived Markov model of the codon substitution process that is provided in the literature. Without system specific knowledge of the substitution process, the techniques manage to "blindly" identify multiple levels of dynamics; from amino acid substitutions (via the standard genetic code) to higher order dynamics on the level of amino acid groups. We hypothesize that the acquired groups reflect earlier versions of the genetic code. Conclusions: The results demonstrate the applicability of the techniques. Due to their generality, we believe that they can be used to coarse grain and identify hierarchical organization in a broad range of other biological systems and processes, such as protein interaction networks, genetic regulatory networks and food webs.
Affiliation(s)
- Olof Görnerup
- Complex Systems Group, Department of Energy and Environment, Chalmers University of Technology, 412 96 Göteborg, Sweden.
|
30
|
Sánchez R, Grau R. An algebraic hypothesis about the primeval genetic code architecture. Math Biosci 2009; 221:60-76. [PMID: 19607845 DOI: 10.1016/j.mbs.2009.07.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/23/2008] [Revised: 06/23/2009] [Accepted: 07/09/2009] [Indexed: 11/26/2022]
Abstract
A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B(3))(N) of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
Affiliation(s)
- Robersy Sánchez
- Research Institute of Tropical Roots, Tuber Crops and Plantains (INIVIT), Biotechnology Group, Villa Clara, Cuba
|
31
|
Wong TS, Roccatano D, Schwaneberg U. Challenges of the genetic code for exploring sequence space in directed protein evolution. Biocatal Biotransfor 2009. [DOI: 10.1080/10242420701444280] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 10/22/2022]
|
32
|
Koonin EV, Novozhilov AS. Origin and evolution of the genetic code: the universal enigma. IUBMB Life 2009; 61:99-111. [PMID: 19117371 DOI: 10.1002/iub.146] [Citation(s) in RCA: 199] [Impact Index Per Article: 13.3] [Indexed: 01/31/2023]
Abstract
The genetic code is nearly universal, and the arrangement of the codons in the standard codon table is highly nonrandom. The three main concepts on the origin and evolution of the code are the stereochemical theory, according to which codon assignments are dictated by physicochemical affinity between amino acids and the cognate codons (anticodons); the coevolution theory, which posits that the code structure coevolved with amino acid biosynthesis pathways; and the error minimization theory under which selection to minimize the adverse effect of point mutations and translation errors was the principal factor of the code's evolution. These theories are not mutually exclusive and are also compatible with the frozen accident hypothesis, that is, the notion that the standard code might have no special properties but was fixed simply because all extant life forms share a common ancestor, with subsequent changes to the code, mostly, precluded by the deleterious effect of codon reassignment. Mathematical analysis of the structure and possible evolutionary trajectories of the code shows that it is highly robust to translational misreading but there are numerous more robust codes, so the standard code potentially could evolve from a random code via a short sequence of codon series reassignments. Thus, much of the evolution that led to the standard code could be a combination of frozen accident with selection for error minimization although contributions from coevolution of the code with metabolic pathways and weak affinities between amino acids and nucleotide triplets cannot be ruled out. However, such scenarios for the code evolution are based on formal schemes whose relevance to the actual primordial evolution is uncertain. A real understanding of the code origin and evolution is likely to be attainable only in conjunction with a credible scenario for the evolution of the coding principle itself and the translation system.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
|
33
|
Novozhilov AS, Wolf YI, Koonin EV. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biol Direct 2007; 2:24. [PMID: 17956616 PMCID: PMC2211284 DOI: 10.1186/1745-6150-2-24] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.0] [Received: 10/11/2007] [Accepted: 10/23/2007] [Indexed: 11/30/2022] Open
Abstract
Background: The standard genetic code table has a distinctly non-random structure, with similar amino acids often encoded by codon series that differ by a single nucleotide substitution, typically, in the third or the first position of the codon. It has been repeatedly argued that this structure of the code results from selective optimization for robustness to translation errors such that translational misreading has the minimal adverse effect. Indeed, it has been shown in several studies that the standard code is more robust than a substantial majority of random codes. However, it remains unclear how much evolution the standard code underwent, what is the level of optimization, and what is the likely starting point. Results: We explored possible evolutionary trajectories of the genetic code within a limited domain of the vast space of possible codes. Only those codes were analyzed for robustness to translation error that possess the same block structure and the same degree of degeneracy as the standard code. This choice of a small part of the vast space of possible codes is based on the notion that the block structure of the standard code is a consequence of the structure of the complex between the cognate tRNA and the codon in mRNA where the third base of the codon plays a minimum role as a specificity determinant. Within this part of the fitness landscape, a simple evolutionary algorithm, with elementary evolutionary steps comprising swaps of four-codon or two-codon series, was employed to investigate the optimization of codes for the maximum attainable robustness. The properties of the standard code were compared to the properties of four sets of codes, namely, purely random codes, random codes that are more robust than the standard code, and two sets of codes that resulted from optimization of the first two sets. The comparison of these sets of codes with the standard code and its locally optimized version showed that, on average, optimization of random codes yielded evolutionary trajectories that converged at the same level of robustness to translation errors as the optimization path of the standard code; however, the standard code required considerably fewer steps to reach that level than an average random code. When evolution starts from random codes whose fitness is comparable to that of the standard code, they typically reach a much higher level of optimization than the standard code, i.e., the standard code is much closer to its local minimum (fitness peak) than most of the random codes with similar levels of robustness. Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code) about half the way to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak. Conclusion: The standard code appears to be the result of partial optimization of a random code for robustness to errors of translation. The reason the code is not fully optimized could be the trade-off between the beneficial effect of increasing robustness to translation errors and the deleterious effect of codon series reassignment that becomes increasingly severe with growing complexity of the evolving system. Thus, evolution of the code can be represented as a combination of adaptation and frozen accident. Reviewers: This article was reviewed by David Ardell, Allan Drummond (nominated by Laura Landweber), and Rob Knight.
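The kind of local optimization described in this abstract can be illustrated with a greedy hill climb. This is a simplified sketch, not the authors' algorithm: each step swaps two amino acids across all of their codons (rather than swapping individual four-codon or two-codon series), the cost is a mean squared change in Kyte-Doolittle hydropathy (an assumed stand-in for the paper's cost function), and the names `cost`, `swap` and `hill_climb` are hypothetical.

```python
import itertools

BASES = "TCAG"
# Standard genetic code, codons enumerated with bases in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, AMINO))

# Assumption: Kyte-Doolittle hydropathy as the amino acid property.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def cost(code, scale=KD):
    """Mean squared property change over single-base substitutions
    between sense codons; lower means more error-robust."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                aa2 = code[codon[:pos] + b + codon[pos + 1:]]
                if aa2 != "*":
                    total += (scale[aa] - scale[aa2]) ** 2
                    n += 1
    return total / n


def swap(code, a, b):
    """Return a new code in which amino acids a and b exchange all their codons."""
    m = {a: b, b: a}
    return {c: m.get(x, x) for c, x in code.items()}


def hill_climb(code):
    """Apply the best cost-reducing swap repeatedly until none remains.
    Returns (locally optimal code, its cost, number of swaps taken)."""
    aas = sorted(set(code.values()) - {"*"})
    current, current_cost, steps = dict(code), cost(code), 0
    while True:
        best, best_cost = None, current_cost
        for a, b in itertools.combinations(aas, 2):
            cand = swap(current, a, b)
            c = cost(cand)
            if c < best_cost - 1e-12:  # require a strict improvement
                best, best_cost = cand, c
        if best is None:
            return current, current_cost, steps
        current, current_cost, steps = best, best_cost, steps + 1
```

Running `hill_climb(STANDARD)` and comparing the number of steps with that obtained from randomly permuted starting codes gives a rough analogue of the step-count comparison in the abstract, though under this simplified move set and cost the numbers need not match the published ones.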
Affiliation(s)
- Artem S Novozhilov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
34
|
Sella G, Ardell DH. The Coevolution of Genes and Genetic Codes: Crick’s Frozen Accident Revisited. J Mol Evol 2006; 63:297-313. [PMID: 16838217 DOI: 10.1007/s00239-004-0176-7] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Received: 06/08/2004] [Accepted: 10/21/2005] [Indexed: 10/24/2022]
Abstract
The standard genetic code is the nearly universal system for the translation of genes into proteins. The code exhibits two salient structural characteristics: it possesses a distinct organization that makes it extremely robust to errors in replication and translation, and it is highly redundant. The origin of these properties has intrigued researchers since the code was first discovered. One suggestion, which is the subject of this review, is that the code's organization is the outcome of the coevolution of genes and genetic codes. In 1968, Francis Crick explored the possible implications of coevolution at different stages of code evolution. Although he argued that coevolution was likely to influence the evolution of the code, he concluded that it falls short of explaining the organization of the code we see today. The recent application of mathematical modeling to study the effects of errors on the course of coevolution suggests a different conclusion. It shows that coevolution readily generates genetic codes that are highly redundant and similar in their error-correcting organization to the standard code. We review this recent work and suggest that further affirmation of the role of coevolution can be attained by investigating the extent to which the outcome of coevolution is robust to other influences that were present during the evolution of the code.
Affiliation(s)
- Guy Sella
- Center for the Study of Rationality, The Hebrew University, Givat Ram, 91904, Jerusalem, Israel.
|
35
|
Abstract
A dynamical theory for the evolution of the genetic code is presented, which accounts for its universality and optimality. The central concept is that a variety of collective, but non-Darwinian, mechanisms likely to be present in early communal life generically lead to refinement and selection of innovation-sharing protocols, such as the genetic code. Our proposal is illustrated by using a simplified computer model and placed within the context of a sequence of transitions that early life may have made, before the emergence of vertical descent.
Affiliation(s)
- Carl Woese
- Microbiology and Institute for Genomic Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Nigel Goldenfeld
- Department of Physics and Institute for Genomic Biology, University of Illinois at Urbana–Champaign, 1110 West Green Street, Urbana, IL 61801
|
36
|
Patrick WM, Firth AE. Strategies and computational tools for improving randomized protein libraries. Biomol Eng 2005; 22:105-12. [PMID: 16095966 DOI: 10.1016/j.bioeng.2005.06.001] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.1] [Received: 05/02/2005] [Revised: 06/20/2005] [Accepted: 06/21/2005] [Indexed: 11/15/2022]
Abstract
In the last decade, directed evolution has become a routine approach for engineering proteins with novel or altered properties. Concurrently, a trend away from purely 'blind' randomization strategies and towards more 'semi-rational' approaches has also become apparent. In this review, we discuss ways in which structural information and predictive computational tools are playing an increasingly important role in guiding the design of randomized libraries: web servers such as ConSurf-HSSP and SCHEMA allow the prediction of sites to target for producing functional variants, while algorithms such as GLUE, PEDEL and DRIVeR are useful for estimating library completeness and diversity. In addition, we review recent methodological developments that facilitate the construction of unbiased libraries, which are inherently more diverse than biased libraries and therefore more likely to yield improved variants.
Affiliation(s)
- Wayne M Patrick
- Center for Fundamental and Applied Molecular Evolution, Emory University, 1510 Clifton Road, Atlanta GA 30322, USA.
|
37
|
Goodarzi H, Najafabadi HS, Torabi N. Designing a neural network for the constraint optimization of the fitness functions devised based on the load minimization of the genetic code. Biosystems 2005; 81:91-100. [PMID: 15936137 DOI: 10.1016/j.biosystems.2005.02.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/02/2004] [Revised: 01/16/2005] [Accepted: 02/02/2005] [Indexed: 11/20/2022]
Abstract
Nonrandom patterns in codon assignments are supported by many statistical and biochemical studies in the last two decades. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslational errors and point mutations, an ability termed "load minimization". Prior studies have included many attempts at quantitative estimation of the fraction of randomly generated codes which, in terms of load minimization, score higher than the canonical genetic code. In this study, a neural network that estimates a highly optimized genetic code in a relatively short period of time has been devised. Several fitness functions were used throughout this text. Meanwhile, we have made use of two cost measure matrices, PAM74-100 and mutation matrix.
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Enghelab St., Tehran, Iran.
|
38
|
Abstract
The coevolution theory of the genetic code, which postulates that prebiotic synthesis was an inadequate source of all twenty protein amino acids, and therefore some of them had to be derived from the coevolving pathways of amino acid biosynthesis, has been assessed in the light of the discoveries of the past three decades. Its four fundamental tenets regarding the essentiality of amino acid biosynthesis, role of pretran synthesis, biosynthetic imprint on codon allocations and mutability of the encoded amino acids are proven by the new knowledge. Of the factors that guided the evolutionary selection of the universal code, the relative contributions of Amino Acid Biosynthesis: Error Minimization: Stereochemical Interaction are estimated to first approximation as 40,000,000:400:1, which suggests that amino acid biosynthesis represents the dominant factor shaping the code. The utility of the coevolution theory is demonstrated by its opening up experimental expansions of the code and providing a basis for locating the root of life.
Affiliation(s)
- J Tze-Fei Wong
- Applied Genomics Laboratory and Department of Biochemistry, Hong Kong University of Science & Technology, Hong Kong, China.
|
39
|
Di Giulio M. The origin of the genetic code: theories and their relationships, a review. Biosystems 2004; 80:175-84. [PMID: 15823416 DOI: 10.1016/j.biosystems.2004.11.005] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.4] [Received: 08/03/2004] [Revised: 11/12/2004] [Accepted: 11/18/2004] [Indexed: 10/26/2022]
Abstract
A review of the main theories proposed to explain the origin of the genetic code is presented. I analyze arguments and data in favour of different theories proposed to explain the origin of the organization of the genetic code. It is possible to suggest a mechanism that makes compatible the different theories of the origin of the code, even if these are based on a historical or physicochemical determinism and thus appear incompatible by definition. Finally, I discuss the question of why a given number of synonymous codons was attributed to the amino acids in the genetic code.
Affiliation(s)
- Massimo Di Giulio
- Institute of Genetics and Biophysics Adriano Buzzati-Traverso, CNR, Naples, Italy
|
40
|
|
41
|
|
42
|
Abstract
Since discovering the pattern by which amino acids are assigned to codons within the standard genetic code, investigators have explored the idea that natural selection placed biochemically similar amino acids near to one another in coding space so as to minimize the impact of mutations and/or mistranslations. The analytical evidence to support this theory has grown in sophistication and strength over the years, and counterclaims questioning its plausibility and quantitative support have yet to transcend some significant weaknesses in their approach. These weaknesses are illustrated here by means of a simple simulation model for adaptive genetic code evolution. There remain ill-explored facets of the 'error minimizing' code hypothesis, however, including the mechanism and pathway by which an adaptive pattern of codon assignments emerged, the extent to which natural selection created synonym redundancy, its role in shaping the amino acid and nucleotide languages, and even the correct interpretation of the adaptive codon assignment pattern: these represent fertile areas for future research.
Affiliation(s)
- Stephen J Freeland
- Department of Biology, University of Maryland, Baltimore County, Catonsville, MD, USA.
|
43
|
Abstract
The primordial genetic code was probably a drastically simplified ancestor of the canonical code that is used by contemporary cells. In order to understand how the present-day code came about, we first need to explain how the language of the building plan can change without destroying the encoded information. In this work we introduce a minimal organism model that is based on biophysically reasonable descriptions of RNA and protein, namely secondary structure folding and knowledge-based potentials. The evolution of a population of such organisms under competition for a common resource is simulated explicitly at the level of individual replication events. Starting with very simple codes, and hence greatly reduced amino acid alphabets, we observe a diversification of the codes in most simulation runs. The driving force behind this effect is the possibility to produce fitter proteins when the repertoire of amino acids is enlarged.
Affiliation(s)
- Günter Weberndorfer
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Wien, Austria
|
44
|
Ardell DH, Sella G. No accident: genetic codes freeze in error-correcting patterns of the standard genetic code. Philos Trans R Soc Lond B Biol Sci 2002; 357:1625-42. [PMID: 12495519 PMCID: PMC1693064 DOI: 10.1098/rstb.2002.1071] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022] Open
Abstract
The standard genetic code poses a challenge in understanding the evolution of information processing at a fundamental level of biological organization. Genetic codes are generally coadapted with, or 'frozen' by, the protein-coding genes that they translate, and so cannot easily change by natural selection. Yet the standard code has a significantly non-random pattern that corrects common errors in the transmission of information in protein-coding genes. Because of the freezing effect and for other reasons, this pattern has been proposed not to be due to selection but rather to be incidental to other evolutionary forces or even entirely accidental. We present results from a deterministic population genetic model of code-message coevolution. We explicitly represent the freezing effect of genes on genetic codes and the perturbative effect of changes in genetic codes on genes. We incorporate characteristic patterns of mutation and translational error, namely, transition bias and positional asymmetry, respectively. Repeated selection over small successive changes produces genetic codes that are substantially, but not optimally, error correcting. In particular, our model reproduces the error-correcting patterns of the standard genetic code. Aspects of our model and results may be applicable to the general problem of adaptation to error in other natural information-processing systems.
Affiliation(s)
- David H Ardell
- Department of Biological Sciences, Stanford University, Stanford, CA 94305-5020, USA.
|
45
|
Gilis D, Massar S, Cerf NJ, Rooman M. Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biol 2001; 2:RESEARCH0049. [PMID: 11737948 PMCID: PMC60310 DOI: 10.1186/gb-2001-2-11-research0049] [Citation(s) in RCA: 134] [Impact Index Per Article: 5.8] [Received: 06/19/2001] [Revised: 07/06/2001] [Accepted: 09/28/2001] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND: The genetic code is known to be efficient in limiting the effect of mistranslation errors. A misread codon often codes for the same amino acid or one with similar biochemical properties, so the structure and function of the coded protein remain relatively unaltered. Previous studies have attempted to address this question quantitatively, by estimating the fraction of randomly generated codes that do better than the genetic code in respect of overall robustness. We extended these results by investigating the role of amino-acid frequencies in the optimality of the genetic code. RESULTS: We found that taking the amino-acid frequency into account decreases the fraction of random codes that beat the natural code. This effect is particularly pronounced when more refined measures of the amino-acid substitution cost are used than hydrophobicity. To show this, we devised a new cost function by evaluating in silico the change in folding free energy caused by all possible point mutations in a set of protein structures. With this function, which measures protein stability while being unrelated to the code's structure, we estimated that around two random codes in a billion (10^9) are fitter than the natural code. When alternative codes are restricted to those that interchange biosynthetically related amino acids, the genetic code appears even more optimal. CONCLUSIONS: These results lead us to discuss the role of amino-acid frequencies and other parameters in the genetic code's evolution, in an attempt to propose a tentative picture of primitive life.
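The random-code comparison described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it uses the Kyte-Doolittle hydropathy scale as the substitution cost (the abstract itself notes that hydrophobicity is a coarser measure than the folding free energy function), it ignores amino-acid frequencies, and it builds alternative codes by permuting the 20 amino acids among the synonymous blocks of the standard code. The function names `code_cost`, `shuffled_code`, and `fraction_better` are hypothetical.

```python
import itertools
import random

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, AAS))

# Kyte-Doolittle hydropathy values (assumed stand-in cost measure).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def code_cost(code):
    """Mean squared hydropathy change over all single-base substitutions.

    Substitutions to or from stop codons are skipped.
    """
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                aa2 = code[mutant]
                if aa2 == "*":
                    continue
                total += (KD[aa] - KD[aa2]) ** 2
                n += 1
    return total / n


def shuffled_code(rng):
    """Random code: permute amino acids among the standard synonymous blocks."""
    aas = sorted(set(AAS) - {"*"})
    perm = aas[:]
    rng.shuffle(perm)
    mapping = dict(zip(aas, perm))
    mapping["*"] = "*"  # stop codons stay where they are
    return {codon: mapping[aa] for codon, aa in STANDARD.items()}


def fraction_better(n_samples=1000, seed=0):
    """Fraction of sampled random codes with a lower cost than the standard code."""
    rng = random.Random(seed)
    reference = code_cost(STANDARD)
    better = sum(code_cost(shuffled_code(rng)) < reference for _ in range(n_samples))
    return better / n_samples


if __name__ == "__main__":
    print("standard code cost:", code_cost(STANDARD))
    print("fraction of random codes that do better:", fraction_better())
```

With a sample of this size the estimate can only resolve fractions down to about one in a thousand, so it cannot reproduce a figure on the order of two in a billion; it only shows the shape of the calculation.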
Affiliation(s)
- D Gilis
- Biomolecular Engineering, Université Libre de Bruxelles, ave F D Roosevelt, 1050 Bruxelles, Belgium.
|
46
|
Lahav N, Nir S, Elitzur AC. The emergence of life on Earth. Prog Biophys Mol Biol 2001; 75:75-120. [PMID: 11311715 DOI: 10.1016/s0079-6107(01)00003-7] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
Abstract
Combined top-down and bottom-up research strategies and the principle of biological continuity were employed in an attempt to reconstruct a comprehensive origin of life theory, which is an extension of the coevolution theory (Lahav and Nir, Origins of Life Evol. Biosphere (1997) 27, 377-395). The resulting theory of emergence of templated-information and functionality (ETIF) addresses the emergence of living entities from inanimate matter, and that of the central mechanisms of their further evolution. It proposes the emergence of short organic catalysts (peptides and proto-ribozymes) and feedback-loop systems, plus their template-and-sequence-directed (TSD) reactions, encompassing catalyzed replication and translation of populations of molecules organized as chemical-informational feedback loop entities, in a fluctuating (wetting-drying) environment, functioning as simplified extant molecular-biological systems. The feedback loops with their TSD systems are chemically and functionally continuous with extant living organisms and their emergence in an inanimate environment may be defined as the beginning of life. The ETIF theory considers the emergence of bio-homochirality, a primordial genetic code, information and the incorporation of primordial metabolic cycles and compartmentation into the emerging living entities. This theory helps to establish a novel measure of biological information, which focuses on its physical effects rather than on the structure of the message, and makes it possible to estimate the time needed for the transition from the inanimate state to the closure of the first feedback-loop systems. Moreover, it forms the basis for novel laboratory experiments and computer modeling, encompassing catalytic activity of short peptides and proto-RNAs and the emergence of bio-homochirality and feedback-loop systems.
Affiliation(s)
- N Lahav
- Department of Soil and Water Sciences, The Faculty of Agriculture, The Hebrew University of Jerusalem, Rehovot 76100, Israel.
|
47
|
Abstract
The coevolution theory of genetic code origin (Wong, J.T. 1975, Proc. Natl Acad. Sci. U.S.A. 72, 1909-1912) is assumed here to be substantially correct. This theory is based on the strict parallelism of the biosynthetic relationships between amino acids and the organization of the genetic code and postulates that these relationships were mediated by tRNA-like molecules on which the biosynthetic transformations between precursor and product amino acids took place. These transformations underlay the mechanism that gave rise to genetic code organization. One of the pathways that represents these transformations in current organisms, and that is thus probably a molecular fossil, is the Met-tRNA(fMet) --> fMet-tRNA(fMet) pathway. This pathway is present only in the Bacteria domain. This, along with other observations and arguments, leads us to believe that this pathway is a clear violation of the universality of the genetic code. Furthermore, the presence of this pathway only in the Bacteria domain seems to imply that the translation apparatus was still rapidly evolving when this pathway was fixed. This, in turn, appears to imply that the last universal common ancestor was a progenote. Finally, the implications that the finding of this pathway has for the stereochemical theory of genetic code origin are discussed.
Affiliation(s)
- M Di Giulio
- International Institute of Genetics and Biophysics, CNR, Via G. Marconi 10, 80125 Naples, Italy.
|
48
|
Di Giulio M. The origin of the genetic code cannot be studied using measurements based on the PAM matrix because this matrix reflects the code itself, making any such analyses tautologous. J Theor Biol 2001; 208:141-4. [PMID: 11162059 DOI: 10.1006/jtbi.2000.2206] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
Abstract
Freeland et al. (Mol. Biol. Evol. 2000a, 17, 511-518) have recently used a transformation of the PAM 74-100 matrix to study the level of optimization reached during genetic code origin. Since the PAM matrix counts the amino acid substitutions that occurred in families of homologous proteins during molecular evolution and as this process is mediated by the genetic code structure itself, it could be that the influence of the code on this matrix is such as to make any conclusion insignificant. As will be shown in the present paper, the transformation of the PAM matrix is affected in a non-marginal way by the organization of the genetic code and, thus, renders the analysis of Freeland et al. tautologous. Although, under the hypothesis of a highly optimized genetic code, some correlations may be expected between a measurement of similarity between amino acids and the genetic code structure, no certain conclusions can be drawn for the measurement used by Freeland et al.
Affiliation(s)
- M Di Giulio
- International Institute of Genetics and Biophysics, CNR, Via G. Marconi 10, 80125 Naples, Napoli, Italy.
|
49
|
Affiliation(s)
- M Di Giulio
- International Institute of Genetics and Biophysics, CNR, Via G. Marconi 10, Naples, Napoli, 80125, Italy.
|
50
|
Cavalcanti AR, Neto BD, Ferreira R. On the classes of aminoacyl-tRNA synthetases and the error minimization in the genetic code. J Theor Biol 2000; 204:15-20. [PMID: 10772845 DOI: 10.1006/jtbi.2000.1082] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
As a consequence of the existence of two classes of aminoacyl-tRNA synthetases (aaRSs), we defined two types of mutations: g (mutations that do not change the class of the involved amino acids) and u (those which change the class). We have found that the mean chemical distance resulting from g mutations is smaller than that corresponding to u mutations, indicating that g mutations are responsible for most of the known minimization of the genetic code. This supports models for the origin and evolution of the code, in which new amino acids were added after duplications or modification of existing aaRSs.
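The g/u partition described in this abstract can be sketched as follows. This is an illustrative calculation only, with several assumptions that are not taken from the paper: the absolute difference in Kyte-Doolittle hydropathy stands in for the "chemical distance", synonymous substitutions and substitutions involving stop codons are excluded, and lysine is assigned to class II (its usual class; some archaea use a class I enzyme). The function name `mean_distances` is hypothetical.

```python
import itertools

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(("".join(c) for c in itertools.product(BASES, repeat=3)), AAS))

# Conventional aminoacyl-tRNA synthetase classes (one-letter amino acid codes).
CLASS_I = set("RCQEILMWYV")
CLASS_II = set("ANDGHKFPST")  # Lys placed in class II (assumption, see lead-in)

# Kyte-Doolittle hydropathy values (assumed stand-in for chemical distance).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_distances():
    """Mean |hydropathy change| for class-preserving (g) and class-changing (u)
    single-base substitutions in the standard code."""
    sums = {"g": 0.0, "u": 0.0}
    counts = {"g": 0, "u": 0}
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                aa2 = CODE[codon[:pos] + base + codon[pos + 1:]]
                if aa2 == "*" or aa2 == aa:
                    continue  # skip stop codons and synonymous changes
                same_class = (aa in CLASS_I) == (aa2 in CLASS_I)
                kind = "g" if same_class else "u"
                sums[kind] += abs(KD[aa] - KD[aa2])
                counts[kind] += 1
    return {kind: sums[kind] / counts[kind] for kind in sums}


if __name__ == "__main__":
    for kind, value in mean_distances().items():
        print(kind, round(value, 3))
```

Whether the g mean comes out below the u mean depends on the distance measure chosen, so this sketch should not be read as reproducing the paper's result.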
Affiliation(s)
- A R Cavalcanti
- Departamento de Química Fundamental, Universidade Federal de Pernambuco, Recife, PE, 50670-901, Brazil.
|