1. Wesp V, Theißen G, Schuster S. Statistical analysis of synonymous and stop codons in pseudo-random and real sequences as a function of GC content. Sci Rep 2023; 13:22996. [PMID: 38151539 PMCID: PMC10752896 DOI: 10.1038/s41598-023-49626-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 12/10/2023] [Indexed: 12/29/2023]
Abstract
Knowledge of the frequencies of synonymous triplets in protein-coding and non-coding DNA stretches can be used in gene finding. These frequencies depend on the GC content of the genome or parts of it. An example of interest is provided by stop codons. This is relevant for the definition of Open Reading Frames. A generic case is provided by pseudo-random sequences, especially when they code for complex proteins or when they are non-coding and not subject to selection pressure. Here, we calculate, for such sequences and for all 25 known genetic codes, the frequency of each amino acid and stop codon based on their set of codons and as a function of GC content. The amino acids can be classified into five groups according to the GC content where their expected frequency reaches its maximum. We determine the overall Shannon information based on groups of synonymous codons and show that it becomes maximum at a percent GC of 43.3% (for the standard code). This is in line with the observation that in most fungi, plants, and animals, this genomic parameter is in the range from 35 to 50%. By analysing natural sequences, we show that there is a clear bias for triplets corresponding to stop codons near the 5'- and 3'-splice sites in the introns of various clades.
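The expected frequencies described in this abstract can be sketched under a simple model that is an assumption here, not something taken from the paper: each nucleotide is drawn independently, with G and C each having probability gc/2 and A and T each having probability (1 - gc)/2. The function names (`base_prob`, `group_frequencies`, `shannon_entropy`) are illustrative, and only the standard code is covered, written as the NCBI translation table 1 string in TCAG base order.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def base_prob(base, gc):
    """Probability of one base, assuming G and C share gc equally and A and T share 1 - gc."""
    return gc / 2 if base in "GC" else (1 - gc) / 2


def group_frequencies(gc):
    """Expected frequency of each amino acid and of stop ('*') in a random sequence."""
    freqs = {}
    for codon, aa in CODON_TABLE.items():
        p = math.prod(base_prob(b, gc) for b in codon)
        freqs[aa] = freqs.get(aa, 0.0) + p
    return freqs


def shannon_entropy(gc):
    """Shannon entropy (in bits) over the 21 groups of synonymous codons."""
    return -sum(p * math.log2(p) for p in group_frequencies(gc).values() if p > 0)


if __name__ == "__main__":
    # Scan GC content in steps of 0.1% and report where the entropy peaks in this model.
    best = max(range(0, 1001), key=lambda i: shannon_entropy(i / 1000))
    print(f"stop codon frequency at 50% GC: {group_frequencies(0.5)['*']:.6f}")
    print(f"entropy maximum near GC = {best / 10:.1f}%")
```

At 50% GC every codon has probability 1/64, so the three stop codons together account for 3/64 of all triplets. Whether the scan lands on the 43.3% quoted in the abstract depends on whether the paper uses the same nucleotide model, which the abstract does not specify.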
Affiliation(s)
- Valentin Wesp: Department of Bioinformatics, Matthias Schleiden Institute, Friedrich Schiller University Jena, Ernst-Abbe-Platz 2, 07743, Jena, Germany
- Günter Theißen: Department of Genetics, Matthias Schleiden Institute, Friedrich Schiller University Jena, Philosophenweg 12, 07743, Jena, Germany
- Stefan Schuster: Department of Bioinformatics, Matthias Schleiden Institute, Friedrich Schiller University Jena, Ernst-Abbe-Platz 2, 07743, Jena, Germany
2. Caldararo F, Di Giulio M. The genetic code is very close to a global optimum in a model of its origin taking into account both the partition energy of amino acids and their biosynthetic relationships. Biosystems 2022; 214:104613. [DOI: 10.1016/j.biosystems.2022.104613] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/19/2021] [Revised: 01/16/2022] [Accepted: 01/17/2022] [Indexed: 01/23/2023]
3.
Abstract
A near-universal Standard Genetic Code (SGC) implies a single origin for present Earth life. To study this unique event, I compute paths to the SGC, comparing different plausible histories. Notably, SGC-like coding emerges from traditional evolutionary mechanisms, and a superior route can be identified. To objectively measure evolution, progress values from 0 (random coding) to 1 (SGC-like) are defined: these measure fractions of random-code-to-SGC distance. Progress types are spacing/distance/delta Polar Requirement, detecting space between identical assignments/mutational distance to the SGC/chemical order, respectively. The coding system is based on selected RNAs performing aminoacyl-RNA synthetase reactions. Acceptor RNAs exhibit SGC-like Crick wobble; alternatively, non-wobbling triplets uniquely encode 20 amino acids/start/stop. Triplets acquire 22 functions by stereochemistry, selection, coevolution, or at random. Assignments also propagate to an assigned triplet’s neighborhood via single mutations, but can also decay. A vast code universe makes futile evolutionary paths plentiful. Thus, SGC evolution is critically sensitive to disorder from random assignments. Evolution also inevitably slows near coding completion. The SGC likely avoided these difficulties, and two suitable paths are compared. In late wobble, a majority of non-wobble assignments are made before wobble is adopted. In continuous wobble, a uniquely advantageous early intermediate yields an ordered SGC. Revised coding evolution (limited randomness, late wobble, concentration on amino acid encoding, chemically conservative coevolution with a chemically ordered elite) produces varied full codes with excellent joint progress values. A population of only 600 independent coding tables includes SGC-like members; a Bayesian path toward more accurate SGC evolution is available.
Affiliation(s)
- Michael Yarus: Department of Molecular, Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309-0347, USA
4. Facchiano A, Di Giulio M. The genetic code is not an optimal code in a model taking into account both the biosynthetic relationships between amino acids and their physicochemical properties. J Theor Biol 2018; 459:45-51. [DOI: 10.1016/j.jtbi.2018.09.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/28/2018] [Revised: 09/04/2018] [Accepted: 09/19/2018] [Indexed: 01/22/2023]
5. Nemzer LR. Shannon information entropy in the canonical genetic code. J Theor Biol 2017; 415:158-170. [DOI: 10.1016/j.jtbi.2016.12.010] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 04/25/2016] [Revised: 11/30/2016] [Accepted: 12/12/2016] [Indexed: 11/15/2022]
6. Grosjean H, Westhof E. An integrated, structure- and energy-based view of the genetic code. Nucleic Acids Res 2016; 44:8020-40. [PMID: 27448410 PMCID: PMC5041475 DOI: 10.1093/nar/gkw608] [Citation(s) in RCA: 177] [Impact Index Per Article: 22.1] [Received: 05/08/2016] [Revised: 06/11/2016] [Accepted: 06/17/2016] [Indexed: 12/25/2022]
Abstract
The principles of mRNA decoding are conserved among all extant life forms. We present an integrative view of all the interaction networks between mRNA, tRNA and rRNA: the intrinsic stability of codon-anticodon duplex, the conformation of the anticodon hairpin, the presence of modified nucleotides, the occurrence of non-Watson-Crick pairs in the codon-anticodon helix and the interactions with bases of rRNA at the A-site decoding site. We derive a more information-rich, alternative representation of the genetic code, that is circular with an unsymmetrical distribution of codons leading to a clear segregation between GC-rich 4-codon boxes and AU-rich 2:2-codon and 3:1-codon boxes. All tRNA sequence variations can be visualized, within an internal structural and energy framework, for each organism, and each anticodon of the sense codons. The multiplicity and complexity of nucleotide modifications at positions 34 and 37 of the anticodon loop segregate meaningfully, and correlate well with the necessity to stabilize AU-rich codon-anticodon pairs and to avoid miscoding in split codon boxes. The evolution and expansion of the genetic code is viewed as being originally based on GC content with progressive introduction of A/U together with tRNA modifications. The representation we present should help the engineering of the genetic code to include non-natural amino acids.
Affiliation(s)
- Henri Grosjean: Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, 91198 Gif-sur-Yvette, France
- Eric Westhof: Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 rue René Descartes, 67084 Strasbourg, France
7. Wills PR. The generation of meaningful information in molecular systems. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2016; 374:rsta.2015.0066. [PMID: 26857673 DOI: 10.1098/rsta.2015.0066] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Accepted: 07/17/2015] [Indexed: 06/05/2023]
Abstract
The physico-chemical processes occurring inside cells are under the computational control of genetic (DNA) and epigenetic (internal structural) programming. The origin and evolution of genetic information (nucleic acid sequences) is reasonably well understood, but scant attention has been paid to the origin and evolution of the molecular biological interpreters that give phenotypic meaning to the sequence information that is quite faithfully replicated during cellular reproduction. The near universality and age of the mapping from nucleotide triplets to amino acids embedded in the functionality of the protein synthetic machinery speaks to the early development of a system of coding which is still extant in every living organism. We take the origin of genetic coding as a paradigm of the emergence of computation in natural systems, focusing on the requirement that the molecular components of an interpreter be synthesized autocatalytically. Within this context, it is seen that interpreters of increasing complexity are generated by series of transitions through stepped dynamic instabilities (non-equilibrium phase transitions). The early phylogeny of the aminoacyl-tRNA synthetase enzymes is discussed in such terms, leading to the conclusion that the observed optimality of the genetic code is a natural outcome of the processes of self-organization that produced it.
Affiliation(s)
- Peter R Wills: Department of Physics, University of Auckland, PB 92019, Auckland 1142, Aotearoa, New Zealand
8. Lajoie MJ, Söll D, Church GM. Overcoming Challenges in Engineering the Genetic Code. J Mol Biol 2016; 428:1004-21. [PMID: 26348789 PMCID: PMC4779434 DOI: 10.1016/j.jmb.2015.09.003] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 06/02/2015] [Revised: 08/19/2015] [Accepted: 09/01/2015] [Indexed: 11/24/2022]
Abstract
Withstanding 3.5 billion years of genetic drift, the canonical genetic code remains such a fundamental foundation for the complexity of life that it is highly conserved across all three phylogenetic domains. Genome engineering technologies are now making it possible to rationally change the genetic code, offering resistance to viruses, genetic isolation from horizontal gene transfer, and prevention of environmental escape by genetically modified organisms. We discuss the biochemical, genetic, and technological challenges that must be overcome in order to engineer the genetic code.
Affiliation(s)
- M J Lajoie: Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Program in Chemical Biology, Harvard University, Cambridge, MA 02138, USA
- D Söll: Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8114, USA
- G M Church: Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
9. Gardini S, Cheli S, Baroni S, Di Lascio G, Mangiavacchi G, Micheletti N, Monaco CL, Savini L, Alocci D, Mangani S, Niccolai N. On Nature's Strategy for Assigning Genetic Code Multiplicity. PLoS One 2016; 11:e0148174. [PMID: 26849571 PMCID: PMC4746209 DOI: 10.1371/journal.pone.0148174] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/07/2015] [Accepted: 01/13/2016] [Indexed: 11/26/2022]
Abstract
Genetic code redundancy would yield, on average, the assignment of three codons to each of the natural amino acids. The fact that this number is observed only for incorporating Ile and for stopping RNA translation still awaits an overall explanation. Through a structural bioinformatics approach, the wealth of information stored in the Protein Data Bank has been used here to look for unambiguous clues to decipher the rationale of the standard genetic code (SGC) in assigning from one to six different codons for amino acid translation. Leu and Arg, both protected from translational errors by six codons, offer the clearest clue by appearing as the most abundant amino acids in protein-protein and protein-nucleic acid interfaces. Other SGC hidden messages have been sought by analyzing, in a protein structure framework, the roles of over- and under-protected amino acids.
Affiliation(s)
- Simone Gardini: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Sara Cheli: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Silvia Baroni: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Gabriele Di Lascio: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Guido Mangiavacchi: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Nicholas Micheletti: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Carmen Luigia Monaco: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Lorenzo Savini: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Davide Alocci: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Stefano Mangani: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Neri Niccolai: Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
10. Massey SE. Genetic code evolution reveals the neutral emergence of mutational robustness, and information as an evolutionary constraint. Life (Basel) 2015; 5:1301-32. [PMID: 25919033 PMCID: PMC4500140 DOI: 10.3390/life5021301] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/02/2015] [Revised: 04/02/2015] [Accepted: 04/03/2015] [Indexed: 01/09/2023]
Abstract
The standard genetic code (SGC) is central to molecular biology and its origin and evolution is a fundamental problem in evolutionary biology, the elucidation of which promises to reveal much about the origins of life. In addition, we propose that study of its origin can also reveal some fundamental and generalizable insights into mechanisms of molecular evolution, utilizing concepts from complexity theory. The first is that beneficial traits may arise by non-adaptive processes, via a process of "neutral emergence". The structure of the SGC is optimized for the property of error minimization, which reduces the deleterious impact of point mutations. Via simulation, it can be shown that genetic codes with error minimization superior to the SGC can emerge in a neutral fashion simply by a process of genetic code expansion via tRNA and aminoacyl-tRNA synthetase duplication, whereby similar amino acids are added to codons related to that of the parent amino acid. This process of neutral emergence has implications beyond that of the genetic code, as it suggests that not all beneficial traits have arisen by the direct action of natural selection; we term these "pseudaptations", and discuss a range of potential examples. Secondly, consideration of genetic code deviations (codon reassignments) reveals that these are mostly associated with a reduction in proteome size. This code malleability implies the existence of a proteomic constraint on the genetic code, proportional to the size of the proteome (P), and that its reduction in size leads to an "unfreezing" of the codon - amino acid mapping that defines the genetic code, consistent with Crick's Frozen Accident theory. The concept of a proteomic constraint may be extended to propose a general informational constraint on genetic fidelity, which may be used to explain variously, differences in mutation rates in genomes with differing proteome sizes, differences in DNA repair capacity and genome GC content between organisms, a selective pressure in the evolution of sexual reproduction, and differences in translational fidelity. Lastly, the utility of the concept of an informational constraint to other diverse fields of research is explored.
Affiliation(s)
- Steven E Massey: Biology Department, PO Box 23360, University of Puerto Rico-Rio Piedras, San Juan, PR 00931, USA
11. Di Giulio M. The Origin of the Genetic Code: Matter of Metabolism or Physicochemical Determinism? J Mol Evol 2013; 77:131-3. [DOI: 10.1007/s00239-013-9593-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 10/14/2013] [Accepted: 10/18/2013] [Indexed: 12/27/2022]