1
|
Yu X, Reva ON. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees. Evol Bioinform Online 2018; 14:1176934318759299. [PMID: 29511354 PMCID: PMC5826093 DOI: 10.1177/1176934318759299] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/14/2017] [Accepted: 01/24/2018] [Indexed: 11/17/2022] Open
Abstract
Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the unsuitability of standard phylogenetic tools for processing genome-wide data, and a lack of reliable substitution models that correlate with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA.
Affiliation(s)
- Xiaoyu Yu
- Department of Biochemistry, Centre for Bioinformatics and Computational Biology, University of Pretoria, Pretoria, South Africa
- Oleg N Reva
- Department of Biochemistry, Centre for Bioinformatics and Computational Biology, University of Pretoria, Pretoria, South Africa
|
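To make the idea in the SWPhylo abstract above concrete, the following is a minimal Python sketch of an alignment-free comparison of genome-wide tetranucleotide frequencies. It is an illustration under stated assumptions, not the SWPhylo implementation: the names `tetra_freqs` and `oup_distance`, the counting of overlapping windows on a single strand, and the Euclidean distance are hypothetical choices that the abstract does not specify.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Relative frequency of each tetranucleotide over overlapping windows.

    Assumption: one strand only; windows with ambiguous bases (e.g. N)
    are ignored because they never match an entry of TETRAMERS.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def oup_distance(seq_a, seq_b):
    """Euclidean distance between two frequency vectors (assumed metric)."""
    fa, fb = tetra_freqs(seq_a), tetra_freqs(seq_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method; which method the tool itself uses is not stated in the abstract.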
2
|
Higgs PG, Hao W, Golding GB. Identification of Conflicting Selective Effects on Highly Expressed Genes. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022] Open
Abstract
Many different selective effects on DNA and proteins influence the frequency of codons and amino acids in coding sequences. Selection is often stronger on highly expressed genes. Hence, by comparing high- and low-expression genes it is possible to distinguish the factors that are selected by evolution. It has been proposed that highly expressed genes should (i) preferentially use codons matching abundant tRNAs (translational efficiency), (ii) preferentially use amino acids with low cost of synthesis, (iii) be under stronger selection to maintain the required amino acid content, and (iv) be selected for translational robustness. These effects act simultaneously and can be contradictory. We develop a model that combines these factors, and use Akaike's Information Criterion for model selection. We consider pairs of paralogues that arose by whole-genome duplication in Saccharomyces cerevisiae. A codon-based model is used that includes asymmetric effects due to selection on highly expressed genes. The largest effect is translational efficiency, which is found to strongly influence synonymous, but not non-synonymous rates. Minimization of the cost of amino acid synthesis is implicated. However, when a more general measure of selection for amino acid usage is used, the cost minimization effect becomes redundant. Small effects that we attribute to selection for translational robustness can be identified as an improvement in the model fit on top of the effects of translational efficiency and amino acid usage.
Affiliation(s)
- Paul G. Higgs
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1
- Weilong Hao
- Department of Biology, McMaster University, Hamilton, Ontario L8S 4K1
- G. Brian Golding
- Department of Biology, McMaster University, Hamilton, Ontario L8S 4K1
|
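The abstract above relies on Akaike's Information Criterion for model selection. As a reminder of how that comparison works, here is a small Python sketch using the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood; the model with the lowest AIC is preferred. The model names and numeric values are hypothetical and are not taken from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(models):
    """Return the name of the model with the lowest AIC.

    `models` maps a model name to (maximized log-likelihood, parameter count).
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical fits, for illustration only.
candidates = {
    "efficiency_only": (-1050.0, 3),
    "efficiency_plus_usage": (-1040.0, 5),
}
```

In this made-up example the richer model gains 10 log-likelihood units for two extra parameters, so its AIC is lower and `best_model(candidates)` selects it.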
3
|
Seligmann H, Warthi G. Genetic Code Optimization for Cotranslational Protein Folding: Codon Directional Asymmetry Correlates with Antiparallel Betasheets, tRNA Synthetase Classes. Comput Struct Biotechnol J 2017; 15:412-424. [PMID: 28924459 PMCID: PMC5591391 DOI: 10.1016/j.csbj.2017.08.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 05/25/2017] [Revised: 07/20/2017] [Accepted: 08/05/2017] [Indexed: 12/14/2022] Open
Abstract
A new codon property, codon directional asymmetry in nucleotide content (CDA), reveals a biologically meaningful genetic code dimension: palindromic codons (first and last nucleotides identical, codon structure XZX) are symmetric (CDA = 0), codons with structures ZXX/XXZ are 5'/3' asymmetric (CDA = -1/1; CDA = -0.5/0.5 if Z and X are both purines or both pyrimidines; assigning negative/positive (-/+) signs is an arbitrary convention). Negative/positive CDAs associate with (a) Fujimoto's tetrahedral codon stereo-table; (b) tRNA synthetase class I/II (aminoacylate the 2'/3' hydroxyl group of the tRNA's last ribose, respectively); and (c) high/low antiparallel (not parallel) betasheet conformation parameters. Preliminary results suggest CDA-whole organism associations (body temperature, developmental stability, lifespan). Presumably, CDA impacts spatial kinetics of codon-anticodon interactions, affecting cotranslational protein folding. Some synonymous codons have opposite CDA sign (alanine, leucine, serine, and valine), putatively explaining how synonymous mutations sometimes affect protein function. Correlations between CDA and tRNA synthetase classes are weaker than between CDA and antiparallel betasheet conformation parameters. This effect is stronger for mitochondrial genetic codes, and potentially drives mitochondrial codon-amino acid reassignments. CDA reveals information ruling nucleotide-protein relations embedded in reversed (not reverse-complement) sequences (5'-ZXX-3'/5'-XXZ-3').
Affiliation(s)
- Hervé Seligmann
- Aix-Marseille Univ, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes, UM 63, CNRS UMR7278, IRD 198, INSERM U1095, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, Postal code 13385, France
- Dept. Ecol Evol Behav, Alexander Silberman Inst Life Sci, The Hebrew University of Jerusalem, IL-91904 Jerusalem, Israel
- Ganesh Warthi
- Aix-Marseille Univ, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes, UM 63, CNRS UMR7278, IRD 198, INSERM U1095, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, Postal code 13385, France
|
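The CDA rules quoted in the abstract above can be written down as a short Python function. This is a sketch of only the cases the abstract states (XZX, ZXX and XXZ codons); the function name `cda` is hypothetical, and codons made of three different nucleotides return `None` because the abstract does not say how they are scored.

```python
PURINES = {"A", "G"}


def cda(codon):
    """Codon directional asymmetry for the codon structures named in the abstract.

    Returns 0.0 for XZX (first == last), a negative value for ZXX,
    a positive value for XXZ, and None for codons with three distinct
    nucleotides (not covered by the abstract; an assumption of this sketch).
    """
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or any(base not in "ACGT" for base in codon):
        raise ValueError("expected a codon of three A/C/G/T(U) bases")
    first, middle, last = codon
    if first == last:      # XZX, including XXX: symmetric
        return 0.0
    if middle == last:     # ZXX: 5' asymmetric, negative by convention
        z, x, sign = first, middle, -1.0
    elif first == middle:  # XXZ: 3' asymmetric, positive by convention
        z, x, sign = last, first, 1.0
    else:
        return None
    # Magnitude is halved when Z and X are both purines or both pyrimidines.
    same_class = (z in PURINES) == (x in PURINES)
    return sign * (0.5 if same_class else 1.0)
```

For example, `cda("GAA")` gives -0.5 (G and A are both purines), while `cda("AAC")` gives 1.0.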
4
|
Santos J, Monteagudo Á. Inclusion of the fitness sharing technique in an evolutionary algorithm to analyze the fitness landscape of the genetic code adaptability. BMC Bioinformatics 2017; 18:195. [PMID: 28347270 PMCID: PMC5369190 DOI: 10.1186/s12859-017-1608-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 11/26/2016] [Accepted: 03/16/2017] [Indexed: 11/26/2022] Open
Abstract
Background: The canonical code, although prevailing in complex genomes, is not universal. The canonical genetic code has been shown to be more robust than random codes, but it is not clearly determined how it evolved towards its current form. The error minimization theory considers the minimization of the adverse effects of point mutations as the main selection factor in the evolution of the code. We have used simulated evolution in a computer to search for optimized codes, which helps to obtain information about the optimization level of the canonical code in its evolution. A genetic algorithm searches for efficient codes in a fitness landscape that corresponds with the adaptability of possible hypothetical genetic codes. The lower the effects of errors or mutations in the codon bases of a hypothetical code, the more efficient or optimal is that code. The inclusion of the fitness sharing technique in the evolutionary algorithm allows the extent to which the canonical genetic code is in an area corresponding to a deep local minimum to be easily determined, even in the high dimensional spaces considered. Results: The analyses show that the canonical code is not in a deep local minimum and that the fitness landscape is not a multimodal fitness landscape with deep and separated peaks. Moreover, the canonical code is clearly far away from the areas of higher fitness in the landscape. Conclusions: Given the absence of deep local minima in the landscape, although the code could evolve and different forces could shape its structure, the nature of the fitness landscape considered in the error minimization theory does not explain why the canonical code ended its evolution in a location which is not an area of a localized deep minimum of the huge fitness landscape.
Affiliation(s)
- José Santos
- Department of Computer Science, University of A Coruña, Campus de Elviña s/n, A Coruña, 15071, Spain.
- Ángel Monteagudo
- Department of Computer Science, University of A Coruña, Campus de Elviña s/n, A Coruña, 15071, Spain
|
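Fitness sharing, the technique named in the entry above, is commonly formulated by dividing each individual's raw fitness by a niche count that grows with the number of similar individuals, which discourages the population from crowding onto a single optimum. The Python sketch below shows that textbook form for a maximization problem. The Hamming distance on equal-length strings, the niche radius `sigma`, the exponent `alpha`, and all function names are assumptions; the abstract does not say which variant or distance the authors used, and their error measure is minimized rather than maximized.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def sharing(distance, sigma, alpha=1.0):
    """Sharing function: 1 at distance 0, falling to 0 at the niche radius."""
    return 1.0 - (distance / sigma) ** alpha if distance < sigma else 0.0


def shared_fitness(population, raw_fitness, sigma, alpha=1.0):
    """Divide each raw fitness by its niche count (maximization form)."""
    shared = []
    for individual, fitness in zip(population, raw_fitness):
        # The niche count is at least 1 because every individual
        # is at distance 0 from itself.
        niche_count = sum(
            sharing(hamming(individual, other), sigma, alpha)
            for other in population
        )
        shared.append(fitness / niche_count)
    return shared
```

With the population `["AAAA", "AAAA", "CCCC"]`, equal raw fitness 1.0 and `sigma=2`, the two identical individuals share a niche and drop to 0.5 each, while the isolated one keeps 1.0.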
5
|
Gardini S, Cheli S, Baroni S, Di Lascio G, Mangiavacchi G, Micheletti N, Monaco CL, Savini L, Alocci D, Mangani S, Niccolai N. On Nature's Strategy for Assigning Genetic Code Multiplicity. PLoS One 2016; 11:e0148174. [PMID: 26849571 PMCID: PMC4746209 DOI: 10.1371/journal.pone.0148174] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/07/2015] [Accepted: 01/13/2016] [Indexed: 11/26/2022] Open
Abstract
Genetic code redundancy would yield, on the average, the assignment of three codons for each of the natural amino acids. The fact that this number is observed only for incorporating Ile and to stop RNA translation still waits for an overall explanation. Through a Structural Bioinformatics approach, the wealth of information stored in the Protein Data Bank has been used here to look for unambiguous clues to decipher the rationale of standard genetic code (SGC) in assigning from one to six different codons for amino acid translation. Leu and Arg, both protected from translational errors by six codons, offer the clearest clue by appearing as the most abundant amino acids in protein-protein and protein-nucleic acid interfaces. Other SGC hidden messages have been sought by analyzing, in a protein structure framework, the roles of over- and under-protected amino acids.
Affiliation(s)
- Simone Gardini
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Sara Cheli
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Silvia Baroni
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Gabriele Di Lascio
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Guido Mangiavacchi
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Nicholas Micheletti
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Carmen Luigia Monaco
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Lorenzo Savini
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Davide Alocci
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Stefano Mangani
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Neri Niccolai
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
|
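The arithmetic behind the opening sentences of the abstract above can be checked directly against the standard genetic code. The Python sketch below builds the standard translation table from its usual compact string form (bases ordered T, C, A, G) and counts how many codons each amino acid or stop signal receives. The variable names are illustrative, and averaging over 21 signals (20 amino acids plus stop) is an assumption about how the "three codons on average" figure is meant.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons assigned to each amino acid ('*' marks stop).
multiplicity = Counter(CODON_TABLE.values())

# 64 codons spread over 20 amino acids plus the stop signal.
mean_multiplicity = len(CODON_TABLE) / len(multiplicity)

# Signals that receive exactly three codons.
three_codon_signals = sorted(aa for aa, n in multiplicity.items() if n == 3)
```

The mean comes out at roughly 3.05, yet only isoleucine (`I`) and the stop signal (`*`) have exactly three codons, while Leu, Arg and Ser each have six.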
6
|
Błażej P, Miasojedow B, Grabińska M, Mackiewicz P. Optimization of Mutation Pressure in Relation to Properties of Protein-Coding Sequences in Bacterial Genomes. PLoS One 2015; 10:e0130411. [PMID: 26121655 PMCID: PMC4488281 DOI: 10.1371/journal.pone.0130411] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 10/21/2014] [Accepted: 05/19/2015] [Indexed: 12/22/2022] Open
Abstract
Most mutations are deleterious and require energetically costly repairs. Therefore, it seems that any minimization of mutation rate is beneficial. On the other hand, mutations generate genetic diversity indispensable for evolution and adaptation of organisms to changing environmental conditions. Thus, it is expected that a spontaneous mutational pressure should be an optimal compromise between these two extremes. In order to study the optimization of the pressure, we compared mutational transition probability matrices from bacterial genomes with artificial matrices fulfilling the same general features as the real ones, e.g., the stationary distribution and the speed of convergence to the stationarity. The artificial matrices were optimized on real protein-coding sequences based on Evolutionary Strategies approach to minimize or maximize the probability of non-synonymous substitutions and costs of amino acid replacements depending on their physicochemical properties. The results show that the empirical matrices have a tendency to minimize the effects of mutations rather than maximize their costs on the amino acid level. They were also similar to the optimized artificial matrices in the nucleotide substitution pattern, especially the high transitions/transversions ratio. We observed no substantial differences between the effects of mutational matrices on protein-coding sequences in genomes under study in respect of differently replicated DNA strands, mutational cost types and properties of the referenced artificial matrices. The findings indicate that the empirical mutational matrices are rather adapted to minimize mutational costs in the studied organisms in comparison to other matrices with similar mathematical constraints.
Affiliation(s)
- Paweł Błażej
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
- Błażej Miasojedow
- Section of Mathematical Statistics, The Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Małgorzata Grabińska
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
- Paweł Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
|
7
|
Massey SE. Genetic code evolution reveals the neutral emergence of mutational robustness, and information as an evolutionary constraint. Life (Basel) 2015; 5:1301-32. [PMID: 25919033 PMCID: PMC4500140 DOI: 10.3390/life5021301] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/02/2015] [Revised: 04/02/2015] [Accepted: 04/03/2015] [Indexed: 01/09/2023] Open
Abstract
The standard genetic code (SGC) is central to molecular biology and its origin and evolution is a fundamental problem in evolutionary biology, the elucidation of which promises to reveal much about the origins of life. In addition, we propose that study of its origin can also reveal some fundamental and generalizable insights into mechanisms of molecular evolution, utilizing concepts from complexity theory. The first is that beneficial traits may arise by non-adaptive processes, via a process of "neutral emergence". The structure of the SGC is optimized for the property of error minimization, which reduces the deleterious impact of point mutations. Via simulation, it can be shown that genetic codes with error minimization superior to the SGC can emerge in a neutral fashion simply by a process of genetic code expansion via tRNA and aminoacyl-tRNA synthetase duplication, whereby similar amino acids are added to codons related to that of the parent amino acid. This process of neutral emergence has implications beyond that of the genetic code, as it suggests that not all beneficial traits have arisen by the direct action of natural selection; we term these "pseudaptations", and discuss a range of potential examples. Secondly, consideration of genetic code deviations (codon reassignments) reveals that these are mostly associated with a reduction in proteome size. This code malleability implies the existence of a proteomic constraint on the genetic code, proportional to the size of the proteome (P), and that its reduction in size leads to an "unfreezing" of the codon - amino acid mapping that defines the genetic code, consistent with Crick's Frozen Accident theory. 
The concept of a proteomic constraint may be extended to propose a general informational constraint on genetic fidelity, which may be used to explain variously, differences in mutation rates in genomes with differing proteome sizes, differences in DNA repair capacity and genome GC content between organisms, a selective pressure in the evolution of sexual reproduction, and differences in translational fidelity. Lastly, the utility of the concept of an informational constraint to other diverse fields of research is explored.
Affiliation(s)
- Steven E Massey
- Biology Department, PO Box 23360, University of Puerto Rico-Rio Piedras, San Juan, PR 00931, USA.
|
8
|
Babbitt GA, Alawad MA, Schulze KV, Hudson AO. Synonymous codon bias and functional constraint on GC3-related DNA backbone dynamics in the prokaryotic nucleoid. Nucleic Acids Res 2014; 42:10915-26. [PMID: 25200075 PMCID: PMC4176184 DOI: 10.1093/nar/gku811] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/01/2023] Open
Abstract
While mRNA stability has been demonstrated to control rates of translation, generating both global and local synonymous codon biases in many unicellular organisms, this explanation cannot adequately explain why codon bias strongly tracks neighboring intergene GC content; suggesting that structural dynamics of DNA might also influence codon choice. Because minor groove width is highly governed by 3-base periodicity in GC, the existence of triplet-based codons might imply a functional role for the optimization of local DNA molecular dynamics via GC content at synonymous sites (≈GC3). We confirm a strong association between GC3-related intrinsic DNA flexibility and codon bias across 24 different prokaryotic multiple whole-genome alignments. We develop a novel test of natural selection targeting synonymous sites and demonstrate that GC3-related DNA backbone dynamics have been subject to moderate selective pressure, perhaps contributing to our observation that many genes possess extreme DNA backbone dynamics for their given protein space. This dual function of codons may impose universal functional constraints affecting the evolution of synonymous and non-synonymous sites. We propose that synonymous sites may have evolved as an 'accessory' during an early expansion of a primordial genetic code, allowing for multiplexed protein coding and structural dynamic information within the same molecular context.
Affiliation(s)
- Gregory A Babbitt
- Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester NY, USA 14623
- Mohammed A Alawad
- B. Thomas Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester NY, USA 14623
- Katharina V Schulze
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston TX, USA 77030
- André O Hudson
- Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester NY, USA 14623
|
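GC3, the quantity at the centre of the abstract above, is the GC content at third codon positions of a coding sequence. A minimal Python sketch of that calculation follows; the function name `gc3` is hypothetical, and the sequence is assumed to be in frame starting at its first base, with any trailing incomplete codon ignored.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame sequence."""
    # Positions 2, 5, 8, ... are the third base of each complete codon.
    third_bases = cds.upper()[2::3]
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)
```

For instance, `gc3("ATGGCCAAA")` looks at the third bases G, C and A and returns two thirds.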
9
|
Santos J, Monteagudo A. Simulated evolution applied to study the genetic code optimality using a model of codon reassignments. BMC Bioinformatics 2011; 12:56. [PMID: 21338505 PMCID: PMC3053255 DOI: 10.1186/1471-2105-12-56] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 09/20/2010] [Accepted: 02/21/2011] [Indexed: 11/29/2022] Open
Abstract
Background As the canonical code is not universal, different theories about its origin and organization have appeared. The optimization or level of adaptation of the canonical genetic code was measured taking into account the harmful consequences resulting from point mutations leading to the replacement of one amino acid for another. There are two basic theories to measure the level of optimization: the statistical approach, which compares the canonical genetic code with many randomly generated alternative ones, and the engineering approach, which compares the canonical code with the best possible alternative. Results Here we used a genetic algorithm to search for better adapted hypothetical codes and as a method to guess the difficulty in finding such alternative codes, allowing to clearly situate the canonical code in the fitness landscape. This novel proposal of the use of evolutionary computing provides a new perspective in the open debate between the use of the statistical approach, which postulates that the genetic code conserves amino acid properties far better than expected from a random code, and the engineering approach, which tends to indicate that the canonical genetic code is still far from optimal. We used two models of hypothetical codes: one that reflects the known examples of codon reassignment and the model most used in the two approaches which reflects the current genetic code translation table. Although the standard code is far from a possible optimum considering both models, when the more realistic model of the codon reassignments was used, the evolutionary algorithm had more difficulty to overcome the efficiency of the canonical genetic code. Conclusions Simulated evolution clearly reveals that the canonical genetic code is far from optimal regarding its optimization. 
Nevertheless, the efficiency of the canonical code increases when mistranslations are taken into account with the two models, as indicated by the fact that the best possible codes show the patterns of the standard genetic code. Our results are in accordance with the postulates of the engineering approach and indicate that the main arguments of the statistical approach are not enough to support its assertion of the extreme efficiency of the canonical genetic code.
Affiliation(s)
- José Santos
- Department of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain.
|
10
|
Welch M, Villalobos A, Gustafsson C, Minshull J. Designing genes for successful protein expression. Methods Enzymol 2011; 498:43-66. [PMID: 21601673 DOI: 10.1016/b978-0-12-385120-8.00003-6] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 12/24/2022]
Abstract
DNA sequences are now far more readily available in silico than as physical DNA. De novo gene synthesis is an increasingly cost-effective method for building genetic constructs, and effectively removes the constraint of basing constructs on extant sequences. This allows scientists and engineers to experimentally test their hypotheses relating sequence to function. Molecular biologists, and now synthetic biologists, are characterizing and cataloging genetic elements with specific functions, aiming to combine them to perform complex functions. However, the most common purpose of synthetic genes is for the expression of an encoded protein. The huge number of different proteins makes it impossible to characterize and catalog each functional gene. Instead, it is necessary to abstract design principles from experimental data: data that can be generated by making predictions followed by synthesizing sequences to test those predictions. Because of the degeneracy of the genetic code, design of gene sequences to encode proteins is a high-dimensional problem, so there is no single simple formula to guarantee success. Nevertheless, there are several straightforward steps that can be taken to greatly increase the probability that a designed sequence will result in expression of the encoded protein. In this chapter, we discuss gene sequence parameters that are important for protein expression. We also describe algorithms for optimizing these parameters, and troubleshooting procedures that can be helpful when initial attempts fail. Finally, we show how many of these methods can be accomplished using the synthetic biology software tool Gene Designer.
Affiliation(s)
- Mark Welch
- DNA2.0, Inc., Suite A, Menlo Park, California, USA
|
11
|
Görnerup O, Jacobi MN. A model-independent approach to infer hierarchical codon substitution dynamics. BMC Bioinformatics 2010; 11:201. [PMID: 20412602 PMCID: PMC2868013 DOI: 10.1186/1471-2105-11-201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/20/2009] [Accepted: 04/23/2010] [Indexed: 12/03/2022] Open
Abstract
Background Codon substitution constitutes a fundamental process in molecular biology that has been studied extensively. However, prior studies rely on various assumptions, e.g. regarding the relevance of specific biochemical properties, or on conservation criteria for defining substitution groups. Ideally, one would instead like to analyze the substitution process in terms of raw dynamics, independently of underlying system specifics. In this paper we propose a method for doing this by identifying groups of codons and amino acids such that these groups imply closed dynamics. The approach relies on recently developed spectral and agglomerative techniques for identifying hierarchical organization in dynamical systems. Results We have applied the techniques on an empirically derived Markov model of the codon substitution process that is provided in the literature. Without system specific knowledge of the substitution process, the techniques manage to "blindly" identify multiple levels of dynamics; from amino acid substitutions (via the standard genetic code) to higher order dynamics on the level of amino acid groups. We hypothesize that the acquired groups reflect earlier versions of the genetic code. Conclusions The results demonstrate the applicability of the techniques. Due to their generality, we believe that they can be used to coarse grain and identify hierarchical organization in a broad range of other biological systems and processes, such as protein interaction networks, genetic regulatory networks and food webs.
Affiliation(s)
- Olof Görnerup
- Complex Systems Group, Department of Energy and Environment, Chalmers University of Technology, 412 96 Göteborg, Sweden.
|
12
|
Certain non-standard coding tables appear to be more robust to error than the standard genetic code. J Mol Evol 2009; 70:13-28. [PMID: 20012032 DOI: 10.1007/s00239-009-9303-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/17/2009] [Accepted: 11/10/2009] [Indexed: 10/20/2022]
Abstract
Since the identification of the Standard Coding Table as a "universal" method to translate genetic information into amino acids, exceptions to this rule have been reported, and to date there are nearly 20 alternative genetic coding tables deployed by either nuclear genomes or organelles of organisms. Why are these codes still in use and why are new codon reassignments occurring? This present study aims to provide a new method to address these questions and to analyze whether these alternative codes present any advantages or disadvantages to the organisms or organelles in terms of robustness to error. We show that two of the alternative coding tables, The Ciliate, Dasycladacean and Hexamita Nuclear Code (CDH) and The Flatworm Mitochondrial Code (FMC), exhibit an advantage, while others such as The Yeast Mitochondrial Code (YMC) are at a significant disadvantage. We propose that the Standard Code is likely to have emerged as a "local minimum" and that the "coding landscape" is still being searched for a "global" minimum.
|
13
|
Huang Y, Koonin EV, Lipman DJ, Przytycka TM. Selection for minimization of translational frameshifting errors as a factor in the evolution of codon usage. Nucleic Acids Res 2009; 37:6799-810. [PMID: 19745054 PMCID: PMC2777431 DOI: 10.1093/nar/gkp712] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022] Open
Abstract
In a wide range of genomes, it was observed that the usage of synonymous codons is biased toward specific codons and codon patterns. Factors that are implicated in the selection for codon usage include facilitation of fast and accurate translation. There are two types of translational errors: missense errors and processivity errors. There is considerable evidence in support of the hypothesis that codon usage is optimized to minimize missense errors. In contrast, little is known about the relationship between codon usage and frameshifting errors, an important form of processivity errors, which appear to occur at frequencies comparable to the frequencies of missense errors. Based on the recently proposed pause-and-slip model of frameshifting, we developed the Frameshifting Robustness Score (FRS). We used this measure to test if the pattern of codon usage indicates optimization against frameshifting errors. We found that the FRS values of protein-coding sequences from four analyzed genomes (the bacteria Bacillus subtilis and Escherichia coli, and the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe) were typically higher than expected by chance. Other properties of FRS patterns observed in B. subtilis, S. cerevisiae and S. pombe, such as the tendency of FRS to increase from the 5′- to 3′-end of protein-coding sequences, were also consistent with the hypothesis of optimization against frameshifting errors in translation. For E. coli, the results of different tests were less consistent, suggestive of a much weaker optimization, if any. Collectively, the results fit the concept of selection against mistranslation-induced protein misfolding being one of the factors shaping the evolution of both coding and non-coding sequences.
Affiliation(s)
- Yang Huang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
14
|
Jermiin LS, Ho JWK, Lau KW, Jayaswal V. SeqVis: a tool for detecting compositional heterogeneity among aligned nucleotide sequences. Methods Mol Biol 2009; 537:65-91. [PMID: 19378140 DOI: 10.1007/978-1-59745-251-9_4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 02/19/2023]
Abstract
Compositional heterogeneity is a poorly appreciated attribute of aligned nucleotide and amino acid sequences. It is a common property of molecular phylogenetic data, and it has been found to occur across sequences and/or across sites. Most molecular phylogenetic methods assume that the sequences have evolved under globally stationary, reversible, and homogeneous conditions, implying that the sequences should be compositionally homogeneous. The presence of the above-mentioned compositional heterogeneity implies that the sequences must have evolved under more general conditions than is commonly assumed. Consequently, there is a need for reliable methods to detect under what conditions alignments of nucleotides or amino acids may have evolved. In this chapter, we describe one such program. SeqVis is designed to survey aligned nucleotide sequences. We discuss the pros and cons of this program in the context of other methods to detect compositional heterogeneity and violated phylogenetic assumptions. The benefits provided by SeqVis are demonstrated in two studies of alignments of nucleotides, one of which contained 7542 nucleotides from 53 species.
Affiliation(s)
- Lars Sommer Jermiin
- School of Biological Sciences, Centre for Mathematical Biology and Sydney Bioinformatics, University of Sydney, Sydney, Australia
15
|
Archetti M. Genetic robustness at the codon level as a measure of selection. Gene 2009; 443:64-9. [PMID: 19477246 DOI: 10.1016/j.gene.2009.05.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/09/2008] [Revised: 05/15/2009] [Accepted: 05/19/2009] [Indexed: 10/20/2022]
Abstract
Selection at the DNA level is usually detected by analysing substitution rates from multiple-species comparisons. It has been suggested that measures of genetic robustness at the codon level, which can be measured by analysing a single coding sequence, can be used to estimate selection, but the validity of these measures has been questioned. Here I test the efficiency of different measures of genetic robustness at the codon level to estimate the level of selection acting on a gene. I find that volatility and other measures of robustness are correlated with dN/dS, and that this is not simply the effect of a preference for translationally optimal codons. I discuss the possible implications and the possible problems of these methods based on single-sequence codon usage analysis.
16
|
Welch M, Villalobos A, Gustafsson C, Minshull J. You're one in a googol: optimizing genes for protein expression. J R Soc Interface 2009; 6 Suppl 4:S467-76. [PMID: 19324676 DOI: 10.1098/rsif.2008.0520.focus] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.9] [Indexed: 11/12/2022]
Abstract
A vast number of different nucleic acid sequences can all be translated by the genetic code into the same amino acid sequence. These sequences are not all equally useful however; the exact sequence chosen can have profound effects on the expression of the encoded protein. Despite the importance of protein-coding sequences, there has been little systematic study to identify parameters that affect expression. This is probably because protein expression has largely been tackled on an ad hoc basis in many independent projects: once a sequence has been obtained that yields adequate expression for that project, there is little incentive to continue work on the problem. Synthetic biology may now provide the impetus to transform protein expression folklore into design principles, so that DNA sequences may easily be designed to express any protein in any system. In this review, we offer a brief survey of the literature, outline the major challenges in interpreting existing data and constructing robust design algorithms, and propose a way to proceed towards the goal of rational sequence engineering.
Affiliation(s)
- Mark Welch
- DNA 2.0, Inc., 1430 O'Brien Drive, Menlo Park, CA 94025, USA
17
|
Mahdi RN, Rouchka EC. Evidence of bias towards buffered codons in human transcripts. Proceedings of the ... IEEE International Symposium on Signal Processing and Information Technology. IEEE International Symposium on Signal Processing and Information Technology 2008; 2008:29-34. [PMID: 20622995 PMCID: PMC2901532 DOI: 10.1109/isspit.2008.4775640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
Codon usage bias is well established in different species from bacteria to mammals. A number of models have been proposed to show this bias as a balance between mutation and selection. Most of these models emphasize controlling the speed of protein translation from the mRNA and increasing the accuracy where this bias is dependent on the abundance and properties of the available tRNA. In this work, codon usage bias in general is considered from a different angle based on a new hypothesis where selection is expected to act in a direction to favor codons that are more buffered, or protected, from mutation than those more sensitive to mutation. It is anticipated that the more buffered the original coding sequence, the higher the survival chance for the whole organism since the resulting protein sequence remains unchanged. Two different complementary measures are developed to compute the average buffering capacity in a given sequence. We show that the buffering capacity of coding sequences in humans is in general higher than that of randomly generated sequences and that of shifted reading frames. Highly expressed genes are shown to have an even higher buffering capacity than non-housekeeping genes. This result is to be expected due to the necessity of housekeeping genes.
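The abstract does not define its two buffering measures, so the following Python sketch is only a generic illustration of the underlying idea, not the authors' method. It assumes one simple proxy: for each codon, the fraction of its nine single-nucleotide substitutions that leave the encoded amino acid unchanged under the standard genetic code. The names `synonymous_fraction` and `mean_synonymous_fraction` are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction(codon: str) -> float:
    """Fraction of the 9 single-nucleotide substitutions that keep the amino acid.

    Assumption: this is a simple stand-in for a 'buffering' score, not the
    measures used in the cited work.
    """
    codon = codon.upper().replace("U", "T")
    amino_acid = CODE[codon]
    neighbours = [
        codon[:i] + base + codon[i + 1:]
        for i in range(3)
        for base in BASES
        if base != codon[i]
    ]
    return sum(CODE[n] == amino_acid for n in neighbours) / len(neighbours)


def mean_synonymous_fraction(cds: str) -> float:
    """Average the per-codon score over an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    return sum(synonymous_fraction(c) for c in codons) / len(codons)
```

Under this proxy, a four-fold degenerate codon such as GGG scores 3/9, whereas ATG and TGG score 0 because every substitution changes the amino acid. Comparing the mean score of a real coding sequence with that of shuffled or frame-shifted versions would mirror the kind of comparison the abstract describes.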
Affiliation(s)
- Rami N. Mahdi
- University of Louisville, Department of Computer Engineering and Computer Science
- Eric C. Rouchka
- University of Louisville, Department of Computer Engineering and Computer Science
18
|
Pienaar E, Viljoen HJ. The tri-frame model. J Theor Biol 2007; 251:616-27. [PMID: 18237749 DOI: 10.1016/j.jtbi.2007.12.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 07/17/2007] [Revised: 12/07/2007] [Accepted: 12/07/2007] [Indexed: 11/18/2022]
Abstract
The tri-frame model gives mathematical expression to the transcription and translation processes, and considers all three reading frames (RFs). RNA polymerases transcribe DNA in single nucleotide increments, but ribosomes translate mRNA in pairings of three (triplets or codons). The set of triplets in the mRNA, starting with the initiation codon (usually AUG) defines the open reading frame (ORF). Since ribosomes do not always translocate three nucleotide positions, two additional RFs are accessible. The -1 RF and the +1 RF are triplet pairings of the mRNA, which are accessed by shifting one nucleotide position in the 5' and 3' directions, respectively. Transcription is modeled as a linear operator that maps the initial codons in all three frames into other codon sets to account for possible transcriptional errors. Translational errors (missense errors) originate from misacylation of tRNAs and misreading of aa-tRNAs by the ribosome. Translation is modeled as a linear mapping from codons into aa-tRNA species, which includes misreading errors. A final transformation from aa-tRNA species into amino acids provides the probability distributions of possible amino acids into which the codons in all three frames could be translated. An important element of the tri-frame model is the ribosomal occupancy probability. It is a vector in ℝ³ that gives the probability to find the ribosome in the ORF, -1 or +1 RF at each codon position. The sequence of vectors, from the first to the final codon position, gives a history of ribosome frameshifting. The model is powerful: it provides explicit expressions for (1) yield of error-free protein, (2) fraction of prematurely terminated polypeptides, (3) number of transcription errors, (4) number of translation errors and (5) mutations due to frameshifting. The theory is demonstrated for the three genes rpsU, dnaG and rpoD of Escherichia coli, which lie on the same operon, as well as for the prfB gene.
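The frame definitions in this abstract can be illustrated with a minimal Python sketch. It covers only the splitting of an mRNA into triplets for the ORF and the -1 and +1 reading frames; it does not implement the transcription or translation operators or the occupancy vectors of the model. The function name `reading_frames`, its `start` parameter (0-based index of the initiation codon), and the example sequence are assumptions made for illustration.

```python
from __future__ import annotations


def reading_frames(mrna: str, start: int) -> dict[str, list[str]]:
    """Split an mRNA into complete triplets for the ORF, -1 and +1 frames.

    `start` is the 0-based index of the first base of the initiation codon.
    The -1 frame begins one nucleotide toward the 5' end, the +1 frame one
    nucleotide toward the 3' end. Trailing incomplete triplets are dropped.
    """
    if start < 1:
        # Hypothetical restriction: the -1 frame needs one base upstream.
        raise ValueError("start must be >= 1 so that the -1 frame exists")

    def triplets(offset: int) -> list[str]:
        begin = start + offset
        return [mrna[i:i + 3] for i in range(begin, len(mrna) - 2, 3)]

    return {"ORF": triplets(0), "-1": triplets(-1), "+1": triplets(+1)}


# Example with a made-up sequence whose AUG starts at index 1:
# reading_frames("GAUGGCCUAA", 1)
# -> {"ORF": ["AUG", "GCC", "UAA"], "-1": ["GAU", "GGC", "CUA"], "+1": ["UGG", "CCU"]}
```

A per-codon occupancy vector as described in the abstract would then hold three probabilities, one for each key of this dictionary, at every codon position.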
Affiliation(s)
- Elsje Pienaar
- Department of Chemical and Biomolecular Engineering, University of Nebraska, 211 Othmer Hall, Lincoln, NE 68588-0643, USA
19
|
Novozhilov AS, Wolf YI, Koonin EV. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biol Direct 2007; 2:24. [PMID: 17956616 PMCID: PMC2211284 DOI: 10.1186/1745-6150-2-24] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.0] [Received: 10/11/2007] [Accepted: 10/23/2007] [Indexed: 11/30/2022]
Abstract
Background The standard genetic code table has a distinctly non-random structure, with similar amino acids often encoded by codon series that differ by a single nucleotide substitution, typically in the third or the first position of the codon. It has been repeatedly argued that this structure of the code results from selective optimization for robustness to translation errors such that translational misreading has the minimal adverse effect. Indeed, it has been shown in several studies that the standard code is more robust than a substantial majority of random codes. However, it remains unclear how much evolution the standard code underwent, what is the level of optimization, and what is the likely starting point. Results We explored possible evolutionary trajectories of the genetic code within a limited domain of the vast space of possible codes. Only those codes were analyzed for robustness to translation error that possess the same block structure and the same degree of degeneracy as the standard code. This choice of a small part of the vast space of possible codes is based on the notion that the block structure of the standard code is a consequence of the structure of the complex between the cognate tRNA and the codon in mRNA where the third base of the codon plays a minimum role as a specificity determinant. Within this part of the fitness landscape, a simple evolutionary algorithm, with elementary evolutionary steps comprising swaps of four-codon or two-codon series, was employed to investigate the optimization of codes for the maximum attainable robustness. The properties of the standard code were compared to the properties of four sets of codes, namely, purely random codes, random codes that are more robust than the standard code, and two sets of codes that resulted from optimization of the first two sets.
The comparison of these sets of codes with the standard code and its locally optimized version showed that, on average, optimization of random codes yielded evolutionary trajectories that converged at the same level of robustness to translation errors as the optimization path of the standard code; however, the standard code required considerably fewer steps to reach that level than an average random code. When evolution starts from random codes whose fitness is comparable to that of the standard code, they typically reach a much higher level of optimization than the standard code, i.e., the standard code is much closer to its local minimum (fitness peak) than most of the random codes with similar levels of robustness. Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code) about half the way to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak. Conclusion The standard code appears to be the result of partial optimization of a random code for robustness to errors of translation. The reason the code is not fully optimized could be the trade-off between the beneficial effect of increasing robustness to translation errors and the deleterious effect of codon series reassignment that becomes increasingly severe with growing complexity of the evolving system. Thus, evolution of the code can be represented as a combination of adaptation and frozen accident. Reviewers This article was reviewed by David Ardell, Allan Drummond (nominated by Laura Landweber), and Rob Knight.
Affiliation(s)
- Artem S Novozhilov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
20
|
Higgs PG, Hao W, Golding GB. Identification of conflicting selective effects on highly expressed genes. Evol Bioinform Online 2007; 3:1-13. [PMID: 19430600 PMCID: PMC2674637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2022]
Abstract
Many different selective effects on DNA and proteins influence the frequency of codons and amino acids in coding sequences. Selection is often stronger on highly expressed genes. Hence, by comparing high- and low-expression genes it is possible to distinguish the factors that are selected by evolution. It has been proposed that highly expressed genes should (i) preferentially use codons matching abundant tRNAs (translational efficiency), (ii) preferentially use amino acids with low cost of synthesis, (iii) be under stronger selection to maintain the required amino acid content, and (iv) be selected for translational robustness. These effects act simultaneously and can be contradictory. We develop a model that combines these factors, and use Akaike's Information Criterion for model selection. We consider pairs of paralogues that arose by whole-genome duplication in Saccharomyces cerevisiae. A codon-based model is used that includes asymmetric effects due to selection on highly expressed genes. The largest effect is translational efficiency, which is found to strongly influence synonymous, but not non-synonymous rates. Minimization of the cost of amino acid synthesis is implicated. However, when a more general measure of selection for amino acid usage is used, the cost minimization effect becomes redundant. Small effects that we attribute to selection for translational robustness can be identified as an improvement in the model fit on top of the effects of translational efficiency and amino acid usage.
Affiliation(s)
- Paul G. Higgs
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1. Correspondence: Paul G. Higgs. Tel: 905 525 9140, ext 25870; Fax: 905 546 1252
- Weilong Hao
- Department of Biology, McMaster University, Hamilton, Ontario L8S 4K1
- G. Brian Golding
- Department of Biology, McMaster University, Hamilton, Ontario L8S 4K1
21
|
Using the nucleotide substitution rate matrix to detect horizontal gene transfer. BMC Bioinformatics 2006; 7:476. [PMID: 17067382 PMCID: PMC1657035 DOI: 10.1186/1471-2105-7-476] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 04/27/2006] [Accepted: 10/26/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Horizontal gene transfer (HGT) has allowed bacteria to evolve many new capabilities. Because transferred genes perform many medically important functions, such as conferring antibiotic resistance, improved detection of horizontally transferred genes from sequence data would be an important advance. Existing sequence-based methods for detecting HGT focus on changes in nucleotide composition or on differences between gene and genome phylogenies; these methods have high error rates. RESULTS First, we introduce a new class of methods for detecting HGT based on the changes in nucleotide substitution rates that occur when a gene is transferred to a new organism. Our new methods discriminate simulated HGT events with an error rate up to 10 times lower than does GC content. Use of models that are not time-reversible is crucial for detecting HGT. Second, we show that using combinations of multiple predictors of HGT offers substantial improvements over using any single predictor, yielding as much as a factor of 18 improvement in performance (a maximum reduction in error rate from 38% to about 3%). Multiple predictors were combined by using the random forests machine learning algorithm to identify optimal classifiers that separate HGT from non-HGT trees. CONCLUSION The new class of HGT-detection methods introduced here combines advantages of phylogenetic and compositional HGT-detection techniques. These new techniques offer order-of-magnitude improvements over compositional methods because they are better able to discriminate HGT from non-HGT trees under a wide range of simulated conditions. We also found that combining multiple measures of HGT is essential for detecting a wide range of HGT events. These novel indicators of horizontal transfer will be widely useful in detecting HGT events linked to the evolution of important bacterial traits, such as antibiotic resistance and pathogenicity.
22
|
Mitreva M, Wendl MC, Martin J, Wylie T, Yin Y, Larson A, Parkinson J, Waterston RH, McCarter JP. Codon usage patterns in Nematoda: analysis based on over 25 million codons in thirty-two species. Genome Biol 2006; 7:R75. [PMID: 26271136 PMCID: PMC1779591 DOI: 10.1186/gb-2006-7-8-r75] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Received: 04/20/2006] [Revised: 06/30/2006] [Accepted: 08/14/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Codon usage has direct utility in molecular characterization of species and is also a marker for molecular evolution. To understand codon usage within the diverse phylum Nematoda, we analyzed a total of 265,494 expressed sequence tags (ESTs) from 30 nematode species. The full genomes of Caenorhabditis elegans and C. briggsae were also examined. A total of 25,871,325 codons were analyzed and a comprehensive codon usage table for all species was generated. This is the first codon usage table available for 24 of these organisms. RESULTS Codon usage similarity in Nematoda usually persists over the breadth of a genus but then rapidly diminishes even within each clade. Globodera, Meloidogyne, Pristionchus, and Strongyloides have the most highly derived patterns of codon usage. The major factor affecting differences in codon usage between species is the coding sequence GC content, which varies in nematodes from 32% to 51%. Coding GC content (measured as GC3) also explains much of the observed variation in the effective number of codons (R = 0.70), which is a measure of codon bias, and it even accounts for differences in amino acid frequency. Codon usage is also affected by neighboring nucleotides (N1 context). Coding GC content correlates strongly with estimated noncoding genomic GC content (R = 0.92). On examining abundant clusters in five species, candidate optimal codons were identified that may be preferred in highly expressed transcripts. CONCLUSION Evolutionary models indicate that total genomic GC content, probably the product of directional mutation pressure, drives codon usage rather than the converse, a conclusion that is supported by examination of nematode genomes.
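For readers unfamiliar with GC3, the measure named in this abstract, the short Python sketch below computes it in its usual sense: the fraction of G or C bases at third codon positions of an in-frame coding sequence. The function name `gc3` and the example sequence are illustrative assumptions; the abstract does not describe how the authors computed the value (for example, whether particular codons were excluded).

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence.

    Assumption: every complete codon is counted, including Met, Trp and stop
    codons; some definitions exclude these.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop a trailing partial codon
    third_bases = seq[2:usable:3]      # bases at positions 3, 6, 9, ...
    if not third_bases:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Example: codons ATG GCC AAA TAA have third bases G, C, A, A.
# gc3("ATGGCCAAATAA") -> 0.5
```

Computing this value per species and correlating it with the effective number of codons would reproduce the type of comparison summarized in the RESULTS section.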
Affiliation(s)
- Makedonka Mitreva
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Michael C Wendl
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- John Martin
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Todd Wylie
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Yong Yin
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Allan Larson
- Department of Biology, Washington University, St. Louis, Missouri 63130, USA
- John Parkinson
- Hospital for Sick Children, Toronto, and Departments of Biochemistry/Medical Genetics and Microbiology, University of Toronto, M5G 1X8, Canada
- Robert H Waterston
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- James P McCarter
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Divergence Inc., St Louis, Missouri 63141, USA