101
Ma W. The scenario on the origin of translation in the RNA world: in principle of replication parsimony. Biol Direct 2010; 5:65. [PMID: 21110883 PMCID: PMC3002371 DOI: 10.1186/1745-6150-5-65] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 11/11/2010] [Accepted: 11/27/2010] [Indexed: 01/06/2023] Open
Abstract
Background: It is now believed that in the origin of life, proteins should have been "invented" in an RNA world. However, due to the complexity of a possible RNA-based proto-translation system, this evolving process seems quite complicated and the associated scenario remains very blurry. Considering that RNA can bind amino acids with specificity, it has been reasonably supposed that initial peptides might have been synthesized on "RNA templates" containing multiple amino acid binding sites. This "Direct RNA Template (DRT)" mechanism is attractive because it should be the simplest mechanism for RNA to synthesize peptides, thus very likely to have been adopted initially in the RNA world. Then, how this mechanism could develop into a proto-translation system mechanism is an interesting problem.
Presentation of the hypothesis: Here an explanation to this problem is shown considering the principle of "replication parsimony" --- genetic information tends to be utilized in a parsimonious way under selection pressure, due to its replication cost (e.g., in the RNA world, nucleotides and ribozymes for RNA replication). Because a DRT would be quite long even for a short peptide, its replication cost would be great. Thus the diversity and the length of functional peptides synthesized by the DRT mechanism would be seriously limited. Adaptors (proto-tRNAs) would arise to allow a DRT's complementary strand (called "C-DRT" here) to direct the synthesis of the same peptide synthesized by the DRT itself. Because the C-DRT is a necessary part in the DRT's replication, fewer turns of the DRT's replication would be needed to synthesize definite copies of the functional peptide, thus saving the replication cost. Acting through adaptors, C-DRTs could transform into much shorter templates (called "proto-mRNAs" here) and substitute the role of DRTs, thus significantly saving the replication cost. A proto-rRNA corresponding to the small subunit rRNA would then emerge to aid the binding of proto-tRNAs and proto-mRNAs, allowing the reduction of base pairs between them (ultimately resulting in the triplet anticodon/codon pair), thus further saving the replication cost. In this context, the replication cost saved would allow the appearance of more and longer functional peptides and, finally, proteins. The hypothesis could be called "DRT-RP" ("RP" for "replication parsimony").
Testing the hypothesis: The scenario described here is open for experimental work at some key scenes, including the compact DRT mechanism, the development of adaptors from aa-aptamers, the synthesis of peptides by proto-tRNAs and proto-mRNAs without the participation of proto-rRNAs, etc. Interestingly, a recent computer simulation study has demonstrated the plausibility of one of the evolving processes driven by replication parsimony in the scenario.
Implication of the hypothesis: An RNA-based proto-translation system could arise gradually from the DRT mechanism according to the principle of "replication parsimony" --- to save the replication cost of RNA templates for functional peptides. A surprising side deduction along the logic of the hypothesis is that complex, biosynthetic amino acids might have entered the genetic code earlier than simple, prebiotic amino acids, which is opposite to the common sense. Overall, the present discussion clarifies the blurry scenario concerning the origin of translation with a major clue, which shows vividly how life could "manage" to exploit potential chemical resources in nature, eventually in an efficient way over evolution.
Reviewers: This article was reviewed by Eugene V. Koonin, Juergen Brosius, and Arcady Mushegian.
Affiliation(s)
- Wentao Ma
- College of Life Sciences, Wuhan University, Wuhan 430072, PR China.
102
Stability of the genetic code and optimal parameters of amino acids. J Theor Biol 2010; 269:57-63. [PMID: 20955716 DOI: 10.1016/j.jtbi.2010.10.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/08/2010] [Revised: 09/20/2010] [Accepted: 10/12/2010] [Indexed: 11/24/2022]
Abstract
The standard genetic code is known to be much more efficient in minimizing adverse effects of misreading errors and one-point mutations in comparison with a random code having the same structure, i.e. the same number of codons coding for each particular amino acid. We study the inverse problem, how the code structure affects the optimal physico-chemical parameters of amino acids ensuring the highest stability of the genetic code. It is shown that the choice of two or more amino acids with given properties determines unambiguously all the others. In this sense the code structure determines strictly the optimal parameters of amino acids or the corresponding scales may be derived directly from the genetic code. In the code with the structure of the standard genetic code the resulting values for hydrophobicity obtained in the scheme "leave one out" and in the scheme with fixed maximum and minimum parameters correlate significantly with the natural scale. The comparison of the optimal and natural parameters allows assessing relative impact of physico-chemical and error-minimization factors during evolution of the genetic code. As the resulting optimal scale depends on the choice of amino acids with given parameters, the technique can also be applied to testing various scenarios of the code evolution with increasing number of codified amino acids. Our results indicate the co-evolution of the genetic code and physico-chemical properties of recruited amino acids.
103
Tlusty T. A colorful origin for the genetic code: Information theory, statistical mechanics and the emergence of molecular codes. Phys Life Rev 2010; 7:362-76. [DOI: 10.1016/j.plrev.2010.06.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 09/25/2009] [Revised: 01/25/2010] [Accepted: 02/06/2010] [Indexed: 10/19/2022]
104
Sammet SG, Bastolla U, Porto M. Comparison of translation loads for standard and alternative genetic codes. BMC Evol Biol 2010; 10:178. [PMID: 20546599 PMCID: PMC2909233 DOI: 10.1186/1471-2148-10-178] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/07/2009] [Accepted: 06/14/2010] [Indexed: 11/25/2022] Open
Abstract
Background: The (almost) universality of the genetic code is one of the most intriguing properties of cellular life. Nevertheless, several variants of the standard genetic code have been observed, which differ in one or several of 64 codon assignments and occur mainly in mitochondrial genomes and in nuclear genomes of some bacterial and eukaryotic parasites. These variants are usually considered to be the result of non-adaptive evolution. It has been shown that the standard genetic code is preferential to randomly assembled codes for its ability to reduce the effects of errors in protein translation.
Results: Using a genotype-to-phenotype mapping based on a quantitative model of protein folding, we compare the standard genetic code to seven of its naturally occurring variants with respect to the fitness loss associated to mistranslation and mutation. These fitness losses are computed through computer simulations of protein evolution with mutations that are either neutral or lethal, and different mutation biases, which influence the balance between unfolding and misfolding stability. We show that the alternative codes may produce significantly different mutation and translation loads, particularly for genomes evolving with a rather large mutation bias. Most of the alternative genetic codes are found to be disadvantageous to the standard code, in agreement with the view that the change of genetic code is a mutationally driven event. Nevertheless, one of the studied alternative genetic codes is predicted to be preferable to the standard code for a broad range of mutation biases.
Conclusions: Our results show that, with one exception, the standard genetic code is generally better able to reduce the translation load than the naturally occurring variants studied here. Besides this exception, some of the other alternative genetic codes are predicted to be better adapted for extreme mutation biases. Hence, the fixation of alternative genetic codes might be a neutral or nearly-neutral event in the majority of the cases, but adaptation cannot be excluded for some of the studied cases.
Affiliation(s)
- Stefanie Gabriele Sammet
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr, 8, 64289 Darmstadt, Germany
105
Santos J, Monteagudo Á. Study of the genetic code adaptability by means of a genetic algorithm. J Theor Biol 2010; 264:854-65. [DOI: 10.1016/j.jtbi.2010.02.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 09/03/2009] [Revised: 01/05/2010] [Accepted: 02/23/2010] [Indexed: 11/30/2022]
106
Searching of Code Space for an Error-Minimized Genetic Code Via Codon Capture Leads to Failure, or Requires At Least 20 Improving Codon Reassignments Via the Ambiguous Intermediate Mechanism. J Mol Evol 2010; 70:106-15. [DOI: 10.1007/s00239-009-9313-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/04/2009] [Accepted: 12/07/2009] [Indexed: 10/19/2022]
107
Nassif H, Al-Ali H, Khuri S, Keirouz W, Page D. An Inductive Logic Programming Approach to Validate Hexose Binding Biochemical Knowledge. INDUCTIVE LOGIC PROGRAMMING. ILP 2010; 5989:149-165. [PMID: 25309972 PMCID: PMC4190110 DOI: 10.1007/978-3-642-13840-9_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/04/2023]
Abstract
Hexoses are simple sugars that play a key role in many cellular pathways, and in the regulation of development and disease mechanisms. Current protein-sugar computational models are based, at least partially, on prior biochemical findings and knowledge. They incorporate different parts of these findings in predictive black-box models. We investigate the empirical support for biochemical findings by comparing Inductive Logic Programming (ILP) induced rules to actual biochemical results. We mine the Protein Data Bank for a representative data set of hexose binding sites, non-hexose binding sites and surface grooves. We build an ILP model of hexose-binding sites and evaluate our results against several baseline machine learning classifiers. Our method achieves an accuracy similar to that of other black-box classifiers while providing insight into the discriminating process. In addition, it confirms wet-lab findings and reveals a previously unreported Trp-Glu amino acids dependency.
Affiliation(s)
- Houssam Nassif
- Department of Computer Sciences, University of Wisconsin-Madison, USA
- Hassan Al-Ali
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, USA
- Sawsan Khuri
- Department of Biochemistry and Molecular Biology, University of Miami, Florida, USA
- Walid Keirouz
- Center for Computational Science, University of Miami, Florida, USA
- David Page
- The Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Florida, USA
108
Chechetkin V, Lobzin V. Local stability and evolution of the genetic code. J Theor Biol 2009; 261:643-53. [DOI: 10.1016/j.jtbi.2009.08.031] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/22/2009] [Revised: 08/31/2009] [Accepted: 08/31/2009] [Indexed: 11/25/2022]
109
Novozhilov AS, Koonin EV. Exceptional error minimization in putative primordial genetic codes. Biol Direct 2009; 4:44. [PMID: 19925661 PMCID: PMC2785773 DOI: 10.1186/1745-6150-4-44] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 11/17/2009] [Accepted: 11/19/2009] [Indexed: 11/11/2022] Open
Abstract
Background: The standard genetic code is redundant and has a highly non-random structure. Codons for the same amino acids typically differ only by the nucleotide in the third position, whereas similar amino acids are encoded, mostly, by codon series that differ by a single base substitution in the third or the first position. As a result, the code is highly albeit not optimally robust to errors of translation, a property that has been interpreted either as a product of selection directed at the minimization of errors or as a non-adaptive by-product of evolution of the code driven by other forces.
Results: We investigated the error-minimization properties of putative primordial codes that consisted of 16 supercodons, with the third base being completely redundant, using a previously derived cost function and the error minimization percentage as the measure of a code's robustness to mistranslation. It is shown that, when the 16-supercodon table is populated with 10 putative primordial amino acids, inferred from the results of abiotic synthesis experiments and other evidence independent of the code's evolution, and with minimal assumptions used to assign the remaining supercodons, the resulting 2-letter codes are nearly optimal in terms of the error minimization level.
Conclusion: The results of the computational experiments with putative primordial genetic codes that contained only two meaningful letters in all codons and encoded 10 to 16 amino acids indicate that such codes are likely to have been nearly optimal with respect to the minimization of translation errors. This near-optimality could be the outcome of extensive early selection during the co-evolution of the code with the primordial, error-prone translation system, or a result of a unique, accidental event. Under this hypothesis, the subsequent expansion of the code resulted in a decrease of the error minimization level that became sustainable owing to the evolution of a high-fidelity translation system.
Reviewers: This article was reviewed by Paul Higgs (nominated by Arcady Mushegian), Rob Knight, and Sandor Pongor. For the complete reports, go to the Reviewers' Reports section.
Affiliation(s)
- Artem S Novozhilov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
110
Baranov PV, Venin M, Provan G. Codon size reduction as the origin of the triplet genetic code. PLoS One 2009; 4:e5708. [PMID: 19479032 PMCID: PMC2682656 DOI: 10.1371/journal.pone.0005708] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 02/10/2009] [Accepted: 04/22/2009] [Indexed: 11/26/2022] Open
Abstract
The genetic code appears to be optimized in its robustness to missense errors and frameshift errors. In addition, the genetic code is near-optimal in terms of its ability to carry information in addition to the sequences of encoded proteins. As evolution has no foresight, optimality of the modern genetic code suggests that it evolved from less optimal code variants. The length of codons in the genetic code is also optimal, as three is the minimal nucleotide combination that can encode the twenty standard amino acids. The apparent impossibility of transitions between codon sizes in a discontinuous manner during evolution has resulted in an unbending view that the genetic code was always triplet. Yet, recent experimental evidence on quadruplet decoding, as well as the discovery of organisms with ambiguous and dual decoding, suggest that the possibility of the evolution of triplet decoding from living systems with non-triplet decoding merits reconsideration and further exploration. To explore this possibility we designed a mathematical model of the evolution of primitive digital coding systems which can decode nucleotide sequences into protein sequences. These coding systems can evolve their nucleotide sequences via genetic events of Darwinian evolution, such as point-mutations. The replication rates of such coding systems depend on the accuracy of the generated protein sequences. Computer simulations based on our model show that decoding systems with codons of length greater than three spontaneously evolve into predominantly triplet decoding systems. Our findings suggest a plausible scenario for the evolution of the triplet genetic code in a continuous manner. This scenario suggests an explanation of how protein synthesis could be accomplished by means of long RNA-RNA interactions prior to the emergence of the complex decoding machinery, such as the ribosome, that is required for stabilization and discrimination of otherwise weak triplet codon-anticodon interactions.
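The abstract's remark that three is the minimal codon length able to encode the twenty standard amino acids is simple counting: with a four-letter alphabet, codons of length n give 4^n combinations. A minimal sketch of that arithmetic follows; the function name `min_codon_length` is illustrative and does not come from the paper.

```python
def min_codon_length(n_meanings: int, alphabet_size: int = 4) -> int:
    """Smallest codon length whose combinations cover n_meanings."""
    if n_meanings < 1 or alphabet_size < 2:
        raise ValueError("need at least one meaning and two letters")
    length = 1
    # Grow the codon until alphabet_size ** length is large enough.
    while alphabet_size ** length < n_meanings:
        length += 1
    return length


if __name__ == "__main__":
    # 4 and 16 combinations are too few for 20 amino acids; 64 suffices.
    print(min_codon_length(20))
    # Adding one stop signal (21 meanings) does not change the answer.
    print(min_codon_length(21))
```

Doublets give only 16 combinations, so triplets are the shortest fixed-length codons that can cover 20 amino acids plus a stop signal.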
Affiliation(s)
- Pavel V Baranov
- Biochemistry Department, University College Cork, Cork, Ireland.
111
Singh TR, Pardasani KR. Ambush hypothesis revisited: Evidences for phylogenetic trends. Comput Biol Chem 2009; 33:239-44. [PMID: 19473880 DOI: 10.1016/j.compbiolchem.2009.04.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 08/28/2008] [Revised: 04/15/2009] [Accepted: 04/23/2009] [Indexed: 10/20/2022]
Abstract
Recoding events occur in competition with standard readout of the transcript, and are site-specific. Recoding is the reprogramming of mRNA translation by localized alterations in the standard translational rules. Frame-shifting is one class of recoding and defined as protein translations that start not at the first, but either at the second (+1 frame-shift) or the third (-1 frame-shift) nucleotide of the codon. Coding sequences lack stop codons, but frame-shifted sequences contain many stop codons, termed off-frame stops or hidden stops. These hidden stops terminate frame-shifted translation, potentially decreasing energy, and resource waste on non-functional proteins. Our results support this putative ancient adaptive event for the selection of codons that can be part of hidden stop codons. All taxonomic groups represent positive correlation between codon usage frequencies and contribution of codons to hidden stops in off-frame context. Our analysis on nuclear and mitochondrial genomic data revealed phylogenomic selection of ambush mechanism. Strongest impact of this event was found in viruses and bacteria. It has been suggested that this mechanism has occurred and been utilized in the early stages of evolution.
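To make the notion of hidden (off-frame) stops concrete, the sketch below counts stop codons encountered when a coding sequence is read from its second or third nucleotide, following the abstract's definition of the two shifted frames. It is an illustration only, not the authors' pipeline; the function name `count_stops` and the toy sequence are assumptions, and the stop codons are those of the standard code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def count_stops(cds: str, offset: int = 0) -> int:
    """Count stop codons when reading `cds` in triplets from `offset`.

    offset=0 is the annotated frame; offset=1 and offset=2 start at the
    second and third nucleotide (the abstract's +1 and -1 frame-shifts).
    """
    if offset not in (0, 1, 2):
        raise ValueError("offset must be 0, 1 or 2")
    seq = cds.upper().replace("U", "T")  # accept RNA input as well
    return sum(
        seq[i:i + 3] in STOP_CODONS
        for i in range(offset, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    toy = "ATGCTGATAGCTTAA"  # hypothetical sequence: ATG CTG ATA GCT TAA
    for frame in (0, 1, 2):
        print(frame, count_stops(toy, frame))
```

In the toy sequence the annotated frame contains only the terminal stop, while the frame starting at the second nucleotide reads TGC TGA TAG CTT and so meets two hidden stops.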
Affiliation(s)
- Tiratha Raj Singh
- Department of Zoology, Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel.
112
Higgs PG. A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code. Biol Direct 2009; 4:16. [PMID: 19393096 PMCID: PMC2689856 DOI: 10.1186/1745-6150-4-16] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.4] [Received: 03/31/2009] [Accepted: 04/24/2009] [Indexed: 11/18/2022] Open
Abstract
Background: The arrangement of the amino acids in the genetic code is such that neighbouring codons are assigned to amino acids with similar physical properties. Hence, the effects of translational error are minimized with respect to randomly reshuffled codes. Further inspection reveals that it is amino acids in the same column of the code (i.e. same second base) that are similar, whereas those in the same row show no particular similarity. We propose a 'four-column' theory for the origin of the code that explains how the action of selection during the build-up of the code leads to a final code that has the observed properties.
Results: The theory makes the following propositions. (i) The earliest amino acids in the code were those that are easiest to synthesize non-biologically, namely Gly, Ala, Asp, Glu and Val. (ii) These amino acids are assigned to codons with G at first position. Therefore the first code may have used only these codons. (iii) The code rapidly developed into a four-column code where all codons in the same column coded for the same amino acid: NUN = Val, NCN = Ala, NAN = Asp and/or Glu, and NGN = Gly. (iv) Later amino acids were added sequentially to the code by a process of subdivision of codon blocks in which a subset of the codons assigned to an early amino acid were reassigned to a later amino acid. (v) Later amino acids were added into positions formerly occupied by amino acids with similar properties because this can occur with minimal disruption to the proteins already encoded by the earlier code. As a result, the properties of the amino acids in the final code retain a four-column pattern that is a relic of the earliest stages of code evolution.
Conclusion: The driving force during this process is not the minimization of translational error, but positive selection for the increased diversity and functionality of the proteins that can be made with a larger amino acid alphabet. Nevertheless, the code that results is one in which translational error is minimized. We define a cost function with which we can compare the fitness of codes with varying numbers of amino acids, and a barrier function, which measures the change in cost immediately after addition of a new amino acid. We show that the barrier is positive if an amino acid is added into a column with dissimilar properties, but negative if an amino acid is added into a column with similar physical properties. Thus, natural selection favours the assignment of amino acids to the positions that they occupy in the final code.
Reviewers: This article was reviewed by David Ardell, Eugene Koonin and Stephen Freeland (nominated by Laurence Hurst)
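Proposition (iii) of the abstract above can be written out as a lookup table: every codon is assigned by its second base alone. The sketch below builds that hypothetical 64-codon table; the names `COLUMN_ASSIGNMENT` and `four_column_code` are illustrative, and the ambiguous "Asp and/or Glu" column is kept as a single label rather than resolved.

```python
from collections import Counter

BASES = "UCAG"

# Second base -> amino acid, as listed in proposition (iii).
COLUMN_ASSIGNMENT = {
    "U": "Val",      # NUN
    "C": "Ala",      # NCN
    "A": "Asp/Glu",  # NAN, left ambiguous as in the abstract
    "G": "Gly",      # NGN
}


def four_column_code() -> dict:
    """Map all 64 RNA codons to an amino acid using only the second base."""
    return {
        first + second + third: COLUMN_ASSIGNMENT[second]
        for first in BASES
        for second in BASES
        for third in BASES
    }


if __name__ == "__main__":
    code = four_column_code()
    print(Counter(code.values()))  # each column holds 16 codons
    print(code["GCU"], code["AAA"])
```

Restricting the table to codons with G in the first position leaves GUN, GCN, GAN and GGN, which is the smaller starting code described in proposition (ii).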
Affiliation(s)
- Paul G Higgs
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1, Canada.
113
Abstract
The genetic code is nearly universal, and the arrangement of the codons in the standard codon table is highly nonrandom. The three main concepts on the origin and evolution of the code are the stereochemical theory, according to which codon assignments are dictated by physicochemical affinity between amino acids and the cognate codons (anticodons); the coevolution theory, which posits that the code structure coevolved with amino acid biosynthesis pathways; and the error minimization theory under which selection to minimize the adverse effect of point mutations and translation errors was the principal factor of the code's evolution. These theories are not mutually exclusive and are also compatible with the frozen accident hypothesis, that is, the notion that the standard code might have no special properties but was fixed simply because all extant life forms share a common ancestor, with subsequent changes to the code, mostly, precluded by the deleterious effect of codon reassignment. Mathematical analysis of the structure and possible evolutionary trajectories of the code shows that it is highly robust to translational misreading but there are numerous more robust codes, so the standard code potentially could evolve from a random code via a short sequence of codon series reassignments. Thus, much of the evolution that led to the standard code could be a combination of frozen accident with selection for error minimization although contributions from coevolution of the code with metabolic pathways and weak affinities between amino acids and nucleotide triplets cannot be ruled out. However, such scenarios for the code evolution are based on formal schemes whose relevance to the actual primordial evolution is uncertain. A real understanding of the code origin and evolution is likely to be attainable only in conjunction with a credible scenario for the evolution of the coding principle itself and the translation system.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
114
Mahdi RN, Rouchka EC. Evidence of bias towards buffered codons in human transcripts. PROCEEDINGS OF THE ... IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY. IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY 2008; 2008:29-34. [PMID: 20622995 PMCID: PMC2901532 DOI: 10.1109/isspit.2008.4775640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
Codon usage bias is well established in different species from bacteria to mammals. A number of models have been proposed to show this bias as a balance between mutation and selection. Most of these models emphasize controlling the speed of protein translation from the mRNA and increasing the accuracy where this bias is dependent on the abundance and properties of the available tRNA. In this work, codon usage bias in general is considered from a different angle based on a new hypothesis where selection is expected to act in a direction to favor codons that are more buffered, or protected, from mutation than those more sensitive to mutation. It is anticipated that the more buffered the original coding sequence, the higher the survival chance for the whole organism since the resulting protein sequence remains unchanged. Two different complementary measures are developed to compute the average buffering capacity in a given sequence. We show that the buffering capacity of coding sequences in humans is in general higher than that of randomly generated sequences and that of shifted reading frames. Highly expressed genes are shown to have an even higher buffering capacity than non-housekeeping genes. This result is to be expected due to the necessity of housekeeping genes.
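The abstract does not define its two buffering measures, so the sketch below uses a simple stand-in that should be read as an assumption, not as the authors' method: a codon's buffering is the fraction of its nine single-nucleotide substitutions that leave the encoded amino acid (or stop signal) unchanged under the standard genetic code, and a sequence's buffering is the mean over its codons. The function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_buffering(codon: str) -> float:
    """Fraction of the 9 point mutations of `codon` that are synonymous."""
    synonymous = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a mutation
            mutant = codon[:pos] + base + codon[pos + 1:]
            synonymous += CODE[mutant] == CODE[codon]
    return synonymous / 9


def sequence_buffering(cds: str) -> float:
    """Mean codon buffering over the complete codons of `cds`."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return sum(codon_buffering(c) for c in codons) / len(codons)


if __name__ == "__main__":
    for codon in ("ATG", "GGG", "CTG"):
        print(codon, round(codon_buffering(codon), 3))
    print(round(sequence_buffering("ATGCTGGGG"), 3))
```

Under this stand-in measure, ATG (Met) has no synonymous neighbours, GGG (Gly) has three, all at the third position, and CTG (Leu) has four because TTG also encodes Leu.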
Affiliation(s)
- Rami N. Mahdi
- University of Louisville, Department of Computer Engineering and Computer Science,
- Eric C. Rouchka
- University of Louisville, Department of Computer Engineering and Computer Science,
115
Massey SE. A Neutral Origin for Error Minimization in the Genetic Code. J Mol Evol 2008; 67:510-6. [DOI: 10.1007/s00239-008-9167-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Received: 02/27/2008] [Revised: 09/03/2008] [Accepted: 09/03/2008] [Indexed: 10/21/2022]
116
Demongeot J, Glade N, Moreira A. Evolution and RNA relics. a systems biology view. Acta Biotheor 2008; 56:5-25. [PMID: 18273683 DOI: 10.1007/s10441-008-9028-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/19/2007] [Accepted: 12/19/2007] [Indexed: 02/05/2023]
Abstract
The genetic code has evolved from its initial non-degenerate wobble version until reaching its present state of degeneracy. By using the stereochemical hypothesis, we revisit the problem of codon assignations to the synonymy classes of amino-acids. We obtain these classes with a simple classifier based on physico-chemical properties of nucleic bases, like hydrophobicity and molecular weight. Then we propose simple RNA (or more generally XNA, with X for D, P or R) ring structures that present, overlap included, one and only one codon by synonymy class as solutions of a combinatory variational problem. We compare these solutions to sequences of present RNAs considered as relics, with a high interspecific invariance, like invariant parts of (t)RNAs and micro-RNAs. We conclude by emphasizing some optimal properties of the genetic code.
Affiliation(s)
- Jacques Demongeot
- TIMC-IMAG, UMR CNRS 5525, Faculty of Medicine of Grenoble, University J. Fourier, 38 700 La Tronche, France.
117
Gutfraind A, Kempf A. Error-reducing structure of the genetic code indicates code origin in non-thermophile organisms. ORIGINS LIFE EVOL B 2008; 38:75-85. [PMID: 17554636 DOI: 10.1007/s11084-007-9071-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/20/2006] [Revised: 03/28/2007] [Accepted: 04/03/2007] [Indexed: 10/23/2022]
Abstract
During the RNA World, organisms experienced high rates of genetic errors, which implies that there was strong evolutionary pressure to reduce the errors' phenotypical impact by suitably structuring the still-evolving genetic code. Therefore, the relative rates of the various types of genetic errors should have left characteristic imprints in the structure of the genetic code. Here, we show that, therefore, it is possible to some extent to reconstruct those error rates, as well as the nucleotide frequencies, for the time when the code was fixed. We find evidence indicating that the frequencies of G and C in the genome were not elevated. Since, for thermodynamic reasons, RNA in thermophiles tends to possess elevated G+C content, this result indicates that the fixation of the genetic code occurred in organisms which were either not thermophiles or that the code's fixation occurred after the rise of DNA.
Affiliation(s)
- Alexander Gutfraind
- Center for Applied Mathematics, Cornell University, Ithaca, New York 14853, USA.
118
Novozhilov AS, Wolf YI, Koonin EV. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biol Direct 2007; 2:24. [PMID: 17956616 PMCID: PMC2211284 DOI: 10.1186/1745-6150-2-24] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.7] [Received: 10/11/2007] [Accepted: 10/23/2007] [Indexed: 11/30/2022] Open
Abstract
Background The standard genetic code table has a distinctly non-random structure, with similar amino acids often encoded by codons series that differ by a single nucleotide substitution, typically, in the third or the first position of the codon. It has been repeatedly argued that this structure of the code results from selective optimization for robustness to translation errors such that translational misreading has the minimal adverse effect. Indeed, it has been shown in several studies that the standard code is more robust than a substantial majority of random codes. However, it remains unclear how much evolution the standard code underwent, what is the level of optimization, and what is the likely starting point. Results We explored possible evolutionary trajectories of the genetic code within a limited domain of the vast space of possible codes. Only those codes were analyzed for robustness to translation error that possess the same block structure and the same degree of degeneracy as the standard code. This choice of a small part of the vast space of possible codes is based on the notion that the block structure of the standard code is a consequence of the structure of the complex between the cognate tRNA and the codon in mRNA where the third base of the codon plays a minimum role as a specificity determinant. Within this part of the fitness landscape, a simple evolutionary algorithm, with elementary evolutionary steps comprising swaps of four-codon or two-codon series, was employed to investigate the optimization of codes for the maximum attainable robustness. The properties of the standard code were compared to the properties of four sets of codes, namely, purely random codes, random codes that are more robust than the standard code, and two sets of codes that resulted from optimization of the first two sets. 
The comparison of these sets of codes with the standard code and its locally optimized version showed that, on average, optimization of random codes yielded evolutionary trajectories that converged at the same level of robustness to translation errors as the optimization path of the standard code; however, the standard code required considerably fewer steps to reach that level than an average random code. When evolution starts from random codes whose fitness is comparable to that of the standard code, they typically reach a much higher level of optimization than the standard code, i.e., the standard code is much closer to its local minimum (fitness peak) than most of the random codes with similar levels of robustness. Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code) about halfway to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak. Conclusion The standard code appears to be the result of partial optimization of a random code for robustness to errors of translation. The reason the code is not fully optimized could be the trade-off between the beneficial effect of increasing robustness to translation errors and the deleterious effect of codon series reassignment that becomes increasingly severe with growing complexity of the evolving system. Thus, evolution of the code can be represented as a combination of adaptation and frozen accident. Reviewers This article was reviewed by David Ardell, Allan Drummond (nominated by Laura Landweber), and Rob Knight.
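The optimization described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cost function here is an unweighted mean squared change in Kyte-Doolittle hydropathy (an assumed stand-in for the paper's cost measure), and the elementary step swaps the complete codon sets of two amino acids, which preserves the block structure but is simpler than the four-codon and two-codon series swaps the abstract mentions. The names `cost`, `swap`, and `hill_climb` are hypothetical.

```python
import random

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {a + b + c: AMINO[16 * i + 4 * j + k]
            for i, a in enumerate(BASES)
            for j, b in enumerate(BASES)
            for k, c in enumerate(BASES)}

# Kyte-Doolittle hydropathy: an assumption, used only as an example property.
HYDROPATHY = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
              "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
              "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
              "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def cost(code):
    """Mean squared hydropathy change over single-base misreadings
    between sense codons (lower cost means a more robust code)."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other == "*":
                    continue  # misreadings to stop codons are ignored here
                total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
                n += 1
    return total / n


def swap(code, aa1, aa2):
    """Return a new code in which two amino acids exchange codon sets."""
    trans = {aa1: aa2, aa2: aa1}
    return {codon: trans.get(aa, aa) for codon, aa in code.items()}


def hill_climb(code, steps=2000, seed=0):
    """Greedy search: keep a random swap only if it lowers the cost."""
    rng = random.Random(seed)
    amino_acids = sorted(HYDROPATHY)
    best, best_cost = code, cost(code)
    for _ in range(steps):
        aa1, aa2 = rng.sample(amino_acids, 2)
        candidate = swap(best, aa1, aa2)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

Calling `hill_climb(STANDARD)` returns a locally optimized code and its cost. Starting the same search from shuffled codes would give the kind of comparison the abstract describes, although the numbers depend entirely on the assumed cost function.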
Affiliation(s)
- Artem S Novozhilov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
119
|
Foettinger A, Leitner A, Lindner W. Selective Enrichment of Tryptophan-Containing Peptides from Protein Digests Employing a Reversible Derivatization with Malondialdehyde and Solid-Phase Capture on Hydrazide Beads. J Proteome Res 2007; 6:3827-34. [PMID: 17655347 DOI: 10.1021/pr0702767] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Abstract
A method for the selective enrichment of tryptophan-containing peptides from complex peptide mixtures such as protein digests is presented. It is based on the reversible reaction of tryptophan with malondialdehyde and trapping of the derivatized Trp-peptides on hydrazide beads via the free aldehyde group of the modified peptides. The peptides are subsequently recovered in their native form by specific cleavage reactions for further (mass spectrometric) analysis. The method was optimized and evaluated using a tryptic digest of a mixture of 10 model proteins, demonstrating a significant reduction in sample complexity while still allowing the identification of all proteins. The applicability of the tryptophan-specific enrichment procedure to complex biological samples is demonstrated for a total yeast cell lysate. Analysis of the processed fraction by 1D-LC-MS/MS confirms the specificity of the enrichment procedure, as more than 85% of the peptides recovered from the enrichment step contained tryptophan. The reduction in sample complexity also resulted in the identification of additional proteins in comparison to the untreated lysate.
Affiliation(s)
- Alexandra Foettinger
- Department of Analytical Chemistry and Food Chemistry, University of Vienna, Waehringer Strasse 38, 1090 Vienna, Austria
|
120
|
Zou G, de Leeuw E, Li C, Pazgier M, Li C, Zeng P, Lu WY, Lubkowski J, Lu W. Toward Understanding the Cationicity of Defensins. J Biol Chem 2007; 282:19653-65. [PMID: 17452329 DOI: 10.1074/jbc.m611003200] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.2] [Indexed: 12/28/2022]
Abstract
Human defensins are a family of small antimicrobial proteins found predominantly in leukocytes and epithelial cells that play important roles in the innate and adaptive immune defense against microbial infection. The most distinct molecular feature of defensins is cationicity, manifested by abundant Arg and/or Lys residues in their sequences. Sequence analysis indicates that Arg is strongly selected over Lys in alpha-defensins but not in beta-defensins. To understand this Arg/Lys disparity in defensins, we chemically synthesized human alpha-defensin 1 (HNP1) and several HNP1 analogs where three Arg residues were replaced by each of the following six alpha-amino acids: Lys, ornithine (Orn), diaminobutyric acid (Dab), diaminopropionic acid (Dap), N,N-dimethyl-Lys ((diMe)Lys), and homo-Arg ((homo)Arg). In addition, we prepared human beta-defensin 1 (hBD1) and (Lys→Arg)hBD1, in which all four Lys residues were replaced by Arg. Bactericidal activity assays revealed the following. 1) Arg-containing HNP1 and (Lys→Arg)hBD1 are functionally better than Lys-HNP1 and hBD1, respectively; the difference between Arg and Lys is more evident in the alpha-defensin than in the beta-defensin and is more evident at low salt concentrations than at high salt concentrations. 2) For HNP1, the Arg/Lys disparity is much more pronounced with Staphylococcus aureus than with Escherichia coli, and the Arg-rich HNP1 kills bacteria faster than its Lys-rich analog. 3) Arg and Lys appear to have optimal chain lengths for bacterial killing, as shortening Lys or lengthening Arg in HNP1 invariably becomes functionally deleterious. Our findings provide insights into the Arg/Lys disparity in defensins, and shed light on the cationicity of defensins with respect to their antimicrobial activity and specificity.
Affiliation(s)
- Guozhang Zou
- Institute of Human Virology, University of Maryland Biotechnology Institute, Baltimore, Maryland 21201, USA
|
121
|
Sánchez R, Grau R. A novel algebraic structure of the genetic code over the Galois field of four DNA bases. Acta Biotheor 2007; 54:27-42. [PMID: 16823609 DOI: 10.1007/s10441-006-6192-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 02/10/2005] [Accepted: 12/22/2005] [Indexed: 11/29/2022]
Abstract
A novel algebraic structure of the genetic code is proposed. Here, the principal partitions of the genetic code table were obtained as equivalence classes of quotient spaces of the genetic code vector space over the Galois field of the four DNA bases. The new algebraic structure shows strong connections among algebraic relationships, codon assignment and physicochemical properties of amino acids. Moreover, a distance function defined between the codon binary representations in the vector space was demonstrated to behave linearly with respect to physical variables such as the mean of amino acid interaction energies in proteins. It was also noticed that the distance between wild-type and mutant codons approaches smaller values in mutational variants of four genes, i.e., human phenylalanine hydroxylase, human beta-globin, HIV-1 protease and HIV-1 reverse transcriptase. These results strongly suggest that deterministic rules must be involved in the origin of the genetic code.
Affiliation(s)
- Robersy Sánchez
- Research Institute of Tropical Roots, Tuber Crops and Banana (INIVIT), Biotechnology Group, Santo Domingo, Villa Clara, Cuba.
|
122
|
Goodarzi H, Katanforoush A, Torabi N, Najafabadi HS. Solvent accessibility, residue charge and residue volume, the three ingredients of a robust amino acid substitution matrix. J Theor Biol 2007; 245:715-25. [PMID: 17240399 DOI: 10.1016/j.jtbi.2006.12.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 07/27/2006] [Revised: 10/31/2006] [Accepted: 12/08/2006] [Indexed: 11/25/2022]
Abstract
Cost measure matrices or different amino acid indices have been widely used for studies in many fields of biology. One major criticism of these studies might be based on the unavailability of an unbiased and yet effective amino acid substitution matrix. In this study we have devised a cost measure matrix based on the solvent accessibility, residue charge, and residue volume indices. Analyses performed on this novel substitution matrix (i.e., the solvent accessibility charge volume (SCV) matrix) support the uncontaminated nature of this matrix regarding the genetic code. Although highly similar to a number of previously available cost measure matrices, the SCV matrix results in a more significant optimality in the error-buffering capacity of the genetic code when compared to many other amino acid substitution matrices. In addition, a method to compare an SCV-based scoring matrix with a number of widely used matrices has been devised, the results of which highlight the robustness of this matrix in protein family discrimination.
Affiliation(s)
- Hani Goodarzi
- Molecular Biology Department, Princeton University, Princeton, NJ, USA.
|
123
|
Itzkovitz S, Alon U. The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res 2007; 17:405-12. [PMID: 17293451 PMCID: PMC1832087 DOI: 10.1101/gr.5987307] [Citation(s) in RCA: 157] [Impact Index Per Article: 8.7] [Indexed: 11/25/2022]
Abstract
DNA sequences that code for proteins need to convey, in addition to the protein-coding information, several different signals at the same time. These "parallel codes" include binding sequences for regulatory and structural proteins, signals for splicing, and RNA secondary structure. Here, we show that the universal genetic code can efficiently carry arbitrary parallel codes much better than the vast majority of other possible genetic codes. This property is related to the identity of the stop codons. We find that the ability to support parallel codes is strongly tied to another useful property of the genetic code--minimization of the effects of frame-shift translation errors. Whereas many of the known regulatory codes reside in nontranslated regions of the genome, the present findings suggest that protein-coding regions can readily carry abundant additional information.
Affiliation(s)
- Shalev Itzkovitz
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
- Uri Alon
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
- Corresponding author. Fax: 972-8-934125
|
124
|
David MPC, Asprer JJT, Ibana JSA, Concepcion GP, Padlan EA. A study of the structural correlates of affinity maturation: Antibody affinity as a function of chemical interactions, structural plasticity and stability. Mol Immunol 2007; 44:1342-51. [PMID: 16854467 DOI: 10.1016/j.molimm.2006.05.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 04/21/2006] [Revised: 05/15/2006] [Accepted: 05/17/2006] [Indexed: 11/23/2022]
Abstract
Mutations introduced in an antibody germline sequence as a result of somatic hypermutation could cause its derivatives to have an altered affinity for their target. Affinity maturation favors the selection of the antibodies that exhibit increased affinity. The mutations in 80 high-affinity anti-thyroid peroxidase sequences derived from six germlines were analysed in terms of the physicochemical properties of the replacement residues, namely hydrophilicity, size and polarizability, and charge and polarity, in the context of their position and probable solvent accessibility. The effects of these substitutions were evaluated in terms of the resultant increased chemical interactivity potential of the affinity-matured antibodies relative to the germline. The results of the analysis would be useful in the rational design of antibodies and of other proteins for improved binding properties.
Affiliation(s)
- Maria Pamela C David
- Virtual Laboratory of Biomolecular Structures, Marine Science Institute, College of Science, University of the Philippines Diliman, Quezon City 1101, Philippines.
|
125
|
Torabi N, Goodarzi H, Shateri Najafabadi H. The case for an error minimizing set of coding amino acids. J Theor Biol 2007; 244:737-44. [PMID: 17069856 DOI: 10.1016/j.jtbi.2006.09.021] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 08/14/2006] [Revised: 09/17/2006] [Accepted: 09/19/2006] [Indexed: 10/24/2022]
Abstract
The fidelity of the translation machinery largely depends on the accuracy with which the tRNAs within living cells are charged. Aminoacyl-tRNA synthetases (aaRSs) attach amino acids to their cognate tRNAs, ensuring the fidelity of translation in coding sequences. Based on sequence analysis and catalytic domain structure, these enzymes are classified into two major groups of 10 enzymes each. In this study, we have examined the role of aaRSs in decreasing the effects of mistranslations and consequently in the evolution of the translation machinery. To this end, a fitness function was introduced in order to measure the accuracy with which each tRNA is charged with its cognate amino acid. Our results suggest that the aaRSs are very well optimized for "load minimization" based on their classes and their mechanisms for distinguishing the correct amino acids. In addition, our results support the idea that, from an evolutionary point of view, a selective pressure on translational fidelity seems to be responsible for the occurrence of the 20 coding amino acids.
Affiliation(s)
- Noorossadat Torabi
- Department of Biotechnology, Faculty of Sciences, University of Tehran, Tehran, Iran
|
126
|
Najafabadi HS, Lehmann J, Omidi M. Error minimization explains the codon usage of highly expressed genes in Escherichia coli. Gene 2007; 387:150-5. [PMID: 17097242 DOI: 10.1016/j.gene.2006.09.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 05/28/2006] [Revised: 08/01/2006] [Accepted: 09/04/2006] [Indexed: 10/24/2022]
Abstract
Different organisms use synonymous codons with different preferences. Several measures have been introduced to compute the extent of codon usage bias within a gene or genome, among which the codon adaptation index (CAI) has been shown to be well correlated with mRNA levels of Escherichia coli. In this work an error adaptation index (eAI) is introduced, which estimates the level at which a gene can tolerate the effects of mistranslations. It is shown that the eAI has a strong correlation with CAI, as well as with mRNA levels, which suggests that the codons of highly expressed genes are selected so that mistranslation would have the minimum possible effect on the structure and function of the related proteins.
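For orientation, the codon adaptation index mentioned in this abstract can be sketched as the geometric mean of per-codon relative adaptiveness values derived from a reference gene set. The sketch below is a simplified reading of that general definition, not the code used in the paper: codons never seen in the reference set are skipped rather than given a pseudo-count, single-codon amino acids (Met, Trp) are excluded following the usual convention, and the function names are hypothetical. The eAI is not sketched because the abstract does not define it.

```python
import math
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {a + b + c: AMINO[16 * i + 4 * j + k]
            for i, a in enumerate(BASES)
            for j, b in enumerate(BASES)
            for k, c in enumerate(BASES)}


def codons(seq):
    """Split a coding sequence into complete codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """w(codon) = count / count of the most used synonymous codon,
    estimated from a reference set of highly expressed genes."""
    counts = Counter(c for s in reference_seqs for c in codons(s))
    weights = {}
    for codon, aa in STANDARD.items():
        if aa in "*MW":
            continue  # stops and single-codon amino acids carry no signal
        family_max = max(counts[c] for c, a in STANDARD.items() if a == aa)
        if family_max and counts[codon]:
            weights[codon] = counts[codon] / family_max
    return weights


def cai(seq, weights):
    """Geometric mean of w over the codons of seq that have a weight."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

With a toy reference such as `["CTGCTGCTGCTA"]`, a gene using only `CTG` scores 1.0 and a gene using only `CTA` scores about one third; real use would need an actual set of highly expressed genes.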
|
127
|
Use of gene dependent mutation probability in evolutionary neural networks for non-stationary problems. Neurocomputing 2006. [DOI: 10.1016/j.neucom.2006.07.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]
|
128
|
Demongeot J, Elena A, Weil G. Potential automata. Application to the genetic code III. C R Biol 2006; 329:953-62. [PMID: 17126799 DOI: 10.1016/j.crvi.2006.07.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 03/03/2006] [Revised: 07/07/2006] [Accepted: 07/10/2006] [Indexed: 12/26/2022]
Abstract
In previous notes, we have described both mathematical properties of potential (n-switches) and potential-Hamiltonian (Liénard systems) continuous differential systems, and also biological applications, especially those concerning primitive cyclic RNAs related to the genetic code. In the present note, we give a general definition of a potential automaton, and we show that a discrete Hopfield-like system already introduced by Goles et al. is a good candidate for such a potential automaton: it has a Lyapunov functional that decreases on its trajectories and whose time derivative is just its discrete velocity. Then we apply this new notion of potential automaton to the genetic code. We show in particular that the consideration of only physicochemical properties of amino-acids, like their molecular weight, hydrophobicity and ability to create hydrogen bonds suffices to build a potential decreasing on trajectories corresponding to the synonymy classes of the genetic code. Such an 'a minima' construction reinforces the classical stereochemical hypothesis about the origin of the genetic code and authorizes new views about the optimality of its synonymy classes.
Affiliation(s)
- Jacques Demongeot
- TIMC-IMAG UMR CNRS 5525, Faculté de Médecine, Université Joseph-Fourier, Grenoble, 38700 La Tronche, France.
|
129
|
Bulka B, desJardins M, Freeland SJ. An interactive visualization tool to explore the biophysical properties of amino acids and their contribution to substitution matrices. BMC Bioinformatics 2006; 7:329. [PMID: 16817972 PMCID: PMC1524819 DOI: 10.1186/1471-2105-7-329] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 12/12/2005] [Accepted: 07/03/2006] [Indexed: 11/26/2022]
Abstract
Background Quantitative descriptions of amino acid similarity, expressed as probabilistic models of evolutionary interchangeability, are central to many mainstream bioinformatic procedures such as sequence alignment, homology searching, and protein structural prediction. Here we present a web-based, user-friendly analysis tool that allows any researcher to quickly and easily visualize relationships between these bioinformatic metrics and to explore their relationships to underlying indices of amino acid molecular descriptors. Results We demonstrate the three fundamental types of question that our software can address by taking as a specific example the connections between 49 measures of amino acid biophysical properties (e.g., size, charge and hydrophobicity), a generalized model of amino acid substitution (as represented by the PAM74-100 matrix), and the mutational distance that separates amino acids within the standard genetic code (i.e., the number of point mutations required for interconversion during protein evolution). We show that our software allows a user to recapture the insights from several key publications on these topics in just a few minutes. Conclusion Our software facilitates rapid, interactive exploration of three interconnected topics: (i) the multidimensional molecular descriptors of the twenty proteinaceous amino acids, (ii) the correlation of these biophysical measurements with observed patterns of amino acid substitution, and (iii) the causal basis for differences between any two observed patterns of amino acid substitution. This software acts as an intuitive bioinformatic exploration tool that can guide more comprehensive statistical analyses relating to a diverse array of specific research questions.
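The mutational distance mentioned in this abstract (the number of point mutations required to interconvert two amino acids in the standard genetic code) can be computed directly from the code table. The following is a small illustrative sketch, not the authors' software; the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {a + b + c: AMINO[16 * i + 4 * j + k]
            for i, a in enumerate(BASES)
            for j, b in enumerate(BASES)
            for k, c in enumerate(BASES)}


def codons_for(aa):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, a in STANDARD.items() if a == aa]


def mutational_distance(aa1, aa2):
    """Minimum number of single-base changes separating any codon of
    aa1 from any codon of aa2 in the standard genetic code."""
    return min(sum(x != y for x, y in zip(c1, c2))
               for c1, c2 in product(codons_for(aa1), codons_for(aa2)))
```

For example, `mutational_distance("F", "L")` is 1 (TTT to TTA), whereas Met (ATG) and Asp (GAT, GAC) differ at all three positions.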
Affiliation(s)
- Blazej Bulka
- Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Marie desJardins
- Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Stephen J Freeland
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
|
130
|
Smialowski P, Schmidt T, Cox J, Kirschner A, Frishman D. Will my protein crystallize? A sequence-based predictor. Proteins 2006; 62:343-55. [PMID: 16315316 DOI: 10.1002/prot.20789] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.1] [Indexed: 11/10/2022]
Abstract
We propose a machine-learning approach to sequence-based prediction of protein crystallizability in which we exploit subtle differences between proteins whose structures were solved by X-ray analysis [or by both X-ray and nuclear magnetic resonance (NMR) spectroscopy] and those proteins whose structures were solved by NMR spectroscopy alone. Because the NMR technique is usually applied on relatively small proteins, sequence length distributions of the X-ray and NMR datasets were adjusted to avoid predictions biased by protein size. As feature space for classification, we used frequencies of mono-, di-, and tripeptides represented by the original 20-letter amino acid alphabet as well as by several reduced alphabets in which amino acids were grouped by their physicochemical and structural properties. The classification algorithm was constructed as a two-layered structure in which the output of primary support vector machine classifiers operating on peptide frequencies was combined by a second-level Naive Bayes classifier. Due to the application of metamethods for cost sensitivity, our method is able to handle real datasets with unbalanced class representation. An overall prediction accuracy of 67% [65% on the positive (crystallizable) and 69% on the negative (noncrystallizable) class] was achieved in a 10-fold cross-validation experiment, indicating that the proposed algorithm may be a valuable tool for more efficient target selection in structural genomics. A Web server for protein crystallizability prediction called SECRET is available at http://webclu.bio.wzw.tum.de:8080/secret.
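The feature space described in this abstract (frequencies of mono-, di-, and tripeptides over the full or a reduced amino acid alphabet) can be sketched as follows. The four-group reduced alphabet below is a hypothetical grouping chosen for illustration only; the abstract does not specify the groupings used, and the classifier stage is not shown.

```python
from collections import Counter

# Hypothetical reduced alphabet: hydrophobic, polar, positive, negative.
GROUPS = {"h": "AVLIMFWC", "p": "STNQYGP", "+": "KRH", "-": "DE"}
REDUCED = {aa: symbol for symbol, members in GROUPS.items() for aa in members}


def peptide_frequencies(seq, k, alphabet=None):
    """Relative frequencies of overlapping k-peptides in a protein
    sequence, optionally after mapping residues to a reduced alphabet."""
    if alphabet is not None:
        seq = "".join(alphabet[aa] for aa in seq)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {peptide: n / total for peptide, n in counts.items()}
```

Concatenating the outputs for k = 1, 2, 3 over each alphabet would give one feature vector per protein, which is the kind of input a support vector machine classifier could consume.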
Affiliation(s)
- Pawel Smialowski
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
|
131
|
Sánchez R, Grau R, Morgado E. A novel Lie algebra of the genetic code over the Galois field of four DNA bases. Math Biosci 2006; 202:156-74. [PMID: 16780898 DOI: 10.1016/j.mbs.2006.03.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 01/19/2005] [Revised: 03/10/2006] [Accepted: 03/22/2006] [Indexed: 11/29/2022]
Abstract
Starting from the order of the four DNA bases in the Boolean lattice, a novel Lie algebra of the genetic code is proposed. Here, the main partitions of the genetic code table were obtained as equivalence classes of quotient spaces of the genetic code vector space over the Galois field of the four DNA bases. The new algebraic structure shows strong connections among algebraic relationships, codon assignments and physicochemical properties of amino acids. Moreover, a distance defined between codons expresses a physicochemical meaning. It was also noticed that the distance between wild-type and mutant codons tends to be small in mutational variants of four genes: human phenylalanine hydroxylase, human beta-globin, HIV-1 protease and HIV-1 reverse transcriptase. These results strongly suggest that deterministic rules must be involved in the origin of the genetic code.
Affiliation(s)
- Robersy Sánchez
- Research Institute of Tropical Roots, Tuber Crops, Bananas and Plantains (INIVIT), Biotechnology Group, Cuba.
|
132
|
Zougman A, Wiśniewski JR. Beyond Linker Histones and High Mobility Group Proteins: Global Profiling of Perchloric Acid Soluble Proteins. J Proteome Res 2006; 5:925-34. [PMID: 16602700 DOI: 10.1021/pr050415p] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Extraction with HClO4 provides an easy method for efficient enrichment of both histone H1 and HMG proteins from a variety of tissues. Usually, the histone and the HMG proteins are the most abundant components of the extracts; however, other proteins have frequently been observed but only seldom studied in more detail. Here we describe a study aimed at global characterization of HClO4-extractable proteins from breast cancer cell lines. We report identification of 150 unique proteins by liquid chromatography tandem mass spectrometry, including almost all major histone H1 variants and canonical members of the HMG protein families. In the extracts, diverse proteins with HMG-like amino acid composition were identified and their post-translational modifications were mapped. Importantly, those include multiple proteins known or supposed to be related to cell proliferation and cancer. Since purification of these proteins, as well as of low-abundance variants of histone and HMG proteins, is difficult due to their metabolic instability, characterization of these proteins from crude extracts can facilitate studies aimed at better understanding of their function.
|
133
|
Urbina D, Tang B, Higgs PG. The response of amino acid frequencies to directional mutation pressure in mitochondrial genome sequences is related to the physical properties of the amino acids and to the structure of the genetic code. J Mol Evol 2006; 62:340-61. [PMID: 16477524 DOI: 10.1007/s00239-005-0051-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 03/07/2005] [Accepted: 10/01/2005] [Indexed: 11/29/2022]
Abstract
The frequencies of A, C, G, and T in mitochondrial DNA vary among species due to unequal rates of mutation between the bases. The frequencies of bases at fourfold degenerate sites respond directly to mutation pressure. At first and second positions, selection reduces the degree of frequency variation. Using a simple evolutionary model, we show that first position sites are less constrained by selection than second position sites and, therefore, that the frequencies of bases at first position are more responsive to mutation pressure than those at second position. We define a measure of distance between amino acids that is dependent on eight measured physical properties and a similarity measure that is the inverse of this distance. Columns 1, 2, 3, and 4 of the genetic code correspond to codons with U, C, A, and G in their second position, respectively. The similarity of amino acids in the four columns decreases systematically from column 1 to column 2 to column 3 to column 4. We then show that the responsiveness of first position bases to mutation pressure is dependent on the second position base and follows the same decreasing trend through the four columns. Again, this shows the correlation between physical properties and responsiveness. We determine a proximity measure for each amino acid, which is the average similarity between an amino acid and all others that are accessible via single point mutations in the mitochondrial genetic code structure. We also define a responsiveness for each amino acid, which measures how rapidly an amino acid frequency changes as a result of mutation pressure acting on the base frequencies. We show that there is a strong correlation between responsiveness and proximity, and that both these quantities are also correlated with the mutability of amino acids estimated from the mtREV substitution rate matrix. We also consider the variation of base frequencies between strands and between genes on a strand. 
These trends are consistent with the patterns expected from analysis of the variation among genomes.
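The fourfold degenerate sites referred to in this abstract are third codon positions where any of the four bases encodes the same amino acid. A minimal sketch of how such sites can be located and their base composition counted is given below. It uses the standard genetic code as an assumption for simplicity, whereas the paper analyses mitochondrial codes; the function names are hypothetical.

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {a + b + c: AMINO[16 * i + 4 * j + k]
            for i, a in enumerate(BASES)
            for j, b in enumerate(BASES)
            for k, c in enumerate(BASES)}


def fourfold_families(code):
    """Map each two-base prefix whose four codons all encode the same
    amino acid to that amino acid."""
    families = {}
    for first in BASES:
        for second in BASES:
            encoded = {code[first + second + third] for third in BASES}
            if len(encoded) == 1 and "*" not in encoded:
                families[first + second] = encoded.pop()
    return families


def fourfold_base_counts(seq, code=STANDARD):
    """Count the third-position bases at fourfold degenerate sites of
    an in-frame coding sequence."""
    prefixes = fourfold_families(code)
    counts = Counter()
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon[:2] in prefixes:
            counts[codon[2]] += 1
    return counts
```

Because selection on the encoded protein does not distinguish between bases at these sites, their base frequencies are the ones the abstract describes as responding directly to mutation pressure.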
Affiliation(s)
- Daniel Urbina
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario, Canada
|
134
|
Goodarzi H, Shateri Najafabadi H, Torabi N. On the coevolution of genes and genetic code. Gene 2005; 362:133-40. [PMID: 16213111 DOI: 10.1016/j.gene.2005.08.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/29/2005] [Revised: 07/17/2005] [Accepted: 08/03/2005] [Indexed: 10/25/2022]
Abstract
The canonical genetic code acts efficiently in minimizing the effects of mistranslations and point mutations. In the work presented we have also considered the effects of single nucleotide insertions and deletions on the optimality of the genetic code. Our results suggest that the canonical genetic code compensates for the ins/del mutations as well as mistranslations and point mutations. On the other hand, we highlighted the point that ins/del mutations have a lesser impact on the selected genes of Saccharomyces cerevisiae compared to randomly generated ones. We hypothesized that the codon usage preferences in S. cerevisiae genes are responsible for the higher efficiency of translation machinery in this organism. Our results support the conjecture that codon usage preferences render the genetic code more effective in minimizing the effects of ins/del mutations.
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Tehran, Iran.
|
135
|
Najafabadi HS, Goodarzi H, Torabi N. Optimality of codon usage in Escherichia coli due to load minimization. J Theor Biol 2005; 237:203-9. [PMID: 15932760 DOI: 10.1016/j.jtbi.2005.04.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 01/16/2005] [Revised: 04/02/2005] [Accepted: 04/04/2005] [Indexed: 11/19/2022]
Abstract
The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslational errors and point mutations, an ability termed "load minimization". One parameter involved in calculating the load-minimizing property of the genetic code is codon usage. In most bacteria, synonymous codons are not used with equal frequencies. Different factors have been proposed to contribute to codon usage preference. It has been shown that the codon preference is correlated with the composition of the tRNA pool. Selection for translational efficiency and translational accuracy both result in such a correlation. In this work, it is shown that codon usage bias in Escherichia coli works so as to minimize the consequences of translational errors, i.e., it is optimized for load minimization.
Affiliation(s)
- Hamed Shateri Najafabadi
- Department of Biotechnology, Faculty of Science, University of Tehran, Enghelab Ave., Tehran, Iran.
|
136
|
Marquez R, Smit S, Knight R. Do universal codon-usage patterns minimize the effects of mutation and translation error? Genome Biol 2005; 6:R91. [PMID: 16277746 PMCID: PMC1297647 DOI: 10.1186/gb-2005-6-11-r91] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Received: 07/07/2005] [Revised: 08/24/2005] [Accepted: 09/21/2005] [Indexed: 12/03/2022] Open
Abstract
The analysis of codon usage in nearly 900 species of the three domains of life suggests that codon usage patterns in mRNA messages do not minimize the effects of translation error.
Background: Do species use codons that reduce the impact of errors in translation or replication? The genetic code is arranged in a way that minimizes errors, defined as the sum of the differences in amino-acid properties caused by single-base changes from each codon to each other codon. However, the extent to which organisms optimize the genetic messages written in this code has been far less studied. We tested whether codon and amino-acid usages from 457 bacteria, 264 eukaryotes, and 33 archaea minimize errors compared to random usages, and whether changes in genome G+C content influence these error values.
Results: We tested the hypotheses that organisms choose their codon usage to minimize errors, and that the large observed variation in G+C content in coding sequences, but the low variation in G+U or G+A content, is due to differences in the effects of variation along these axes on the error value. Surprisingly, the biological distribution of error values has far lower variance than randomized error values, but error values of actual codon and amino-acid usages are actually greater than would be expected by chance.
Conclusion: These unexpected findings suggest that selection against translation error has not produced codon or amino-acid usages that minimize the effects of errors, and that even messages with very different nucleotide compositions somehow maintain a relatively constant error value. They raise the question: why do all known organisms use highly error-minimizing genetic codes, but fail to minimize the errors in the mRNA messages they encode?
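The error value described in this abstract (differences in amino-acid properties summed over single-base changes, optionally weighted by codon usage) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Kyte-Doolittle hydropathy scale stands in for whatever amino-acid property the study used, the squared difference is an assumed distance, changes to or from stop codons are simply skipped, and the `usage` weights are a hypothetical input.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD_CODE = dict(zip(CODONS, AMINO_ACIDS))

# Assumption: Kyte-Doolittle hydropathy as the amino-acid property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def neighbors(codon):
    """Yield the nine codons that differ from `codon` by a single base."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def error_value(code, usage=None, prop=HYDROPATHY):
    """Usage-weighted mean squared property change over single-base changes.

    `usage` maps codon -> weight; uniform weights are used when it is None.
    Changes involving stop codons are ignored in this sketch.
    """
    total = 0.0
    weight_sum = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        weight = 1.0 if usage is None else usage.get(codon, 0.0)
        for other in neighbors(codon):
            other_aa = code[other]
            if other_aa == "*":
                continue
            total += weight * (prop[aa] - prop[other_aa]) ** 2
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


uniform_error = error_value(STANDARD_CODE)
atg_only_error = error_value(STANDARD_CODE, usage={"ATG": 1.0})
```

Passing a codon-usage table as `usage` gives the error value of a particular message composition; comparing it against values for shuffled usage tables would be one way to approximate the randomization test the abstract mentions.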
Affiliation(s)
- Roberto Marquez
- Department of Computer Science, New Mexico State University, MSC CS, Las Cruces, NM 88003, USA
- Sandra Smit
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA
- Rob Knight
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA
|
137
|
Goodarzi H, Najafabadi HS, Torabi N. Designing a neural network for the constraint optimization of the fitness functions devised based on the load minimization of the genetic code. Biosystems 2005; 81:91-100. [PMID: 15936137 DOI: 10.1016/j.biosystems.2005.02.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/02/2004] [Revised: 01/16/2005] [Accepted: 02/02/2005] [Indexed: 11/20/2022]
Abstract
Nonrandom patterns in codon assignments are supported by many statistical and biochemical studies from the last two decades. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslational errors and point mutations, an ability which in turn is designated "load minimization". Prior studies have included many attempts at quantitative estimation of the fraction of randomly generated codes which, in terms of load minimization, score higher than the canonical genetic code. In this study, a neural network that estimates a highly optimized genetic code in a relatively short period of time has been devised. Several fitness functions were used throughout this work, together with two cost measure matrices: PAM74-100 and the mutation matrix.
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Enghelab St., Tehran, Iran.
|
138
|
Goodarzi H, Najafabadi HS, Hassani K, Nejad HA, Torabi N. On the optimality of the genetic code, with the consideration of coevolution theory by comparison of prominent cost measure matrices. J Theor Biol 2005; 235:318-25. [PMID: 15882694 DOI: 10.1016/j.jtbi.2005.01.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 12/10/2004] [Revised: 01/20/2005] [Accepted: 01/24/2005] [Indexed: 11/22/2022]
Abstract
Statistical and biochemical studies have revealed non-random patterns in codon assignments. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations: when an amino acid is converted to another due to an error, the biochemical properties of the resulting amino acid are usually very similar to those of the original one. In this study, using altered forms of the fitness functions used in prior studies, we have optimized the parameters involved in the calculation of the error-minimizing property of the genetic code so that the genetic code outscores the random codes as much as possible. This work also compares two prominent matrices, the Mutation Matrix and Point Accepted Mutations 74-100 (PAM(74-100)). The results show that the hypothetical properties of the coevolution theory of the genetic code are already incorporated in PAM(74-100), giving more evidence for the existence of a bias towards the genetic code in this matrix. Furthermore, our results indicate that PAM(74-100) is biased towards single-base mistranslation occurrences in the second codon position as well as towards the frequency of amino acids. Thus, PAM(74-100) is not a suitable substitution matrix for studies of the evolution of the genetic code.
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Enghelab Ave., Tehran, Iran.
|
139
|
Sanchez R, Morgado E, Grau R. Gene algebra from a genetic code algebraic structure. J Math Biol 2005; 51:431-57. [PMID: 16012800 DOI: 10.1007/s00285-005-0332-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 08/16/2004] [Revised: 03/17/2005] [Indexed: 12/14/2022]
Abstract
By considering two important factors involved in codon-anticodon interactions, the number of hydrogen bonds and the chemical type of the bases, a codon array of the genetic code table was obtained as an increasing code scale of interaction energies of amino acids in proteins. Next, in order to obtain all codons consecutively from the codon AAC, a sum operation was introduced on the set of codons. The group obtained over the set of codons is isomorphic to the group $(\mathbb{Z}_{64}, +)$ of integers modulo 64. On the $\mathbb{Z}_{64}$-algebra of the set of $64^N$ codon sequences of length $N$, gene mutations are described by means of endomorphisms $f: (\mathbb{Z}_{64})^N \to (\mathbb{Z}_{64})^N$. Endomorphisms and automorphisms helped us describe the gene mutation pathways. For instance, 77.7% of the mutations in 749 HIV protease gene sequences correspond to unique diagonal endomorphisms of the wild-type strain HXB2. In particular, most of the reported mutations that confer drug resistance to the HIV protease gene correspond to diagonal automorphisms of the wild type. Moreover, a similar situation appears in the human beta-globin gene, where most of the single-codon mutations correspond to automorphisms. Hence, in analyses of the molecular evolution process on the set of DNA sequences of length $N$, the $\mathbb{Z}_{64}$-algebra will help explain the quantitative relationships between genes.
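The group structure and the diagonal maps mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the base ordering or the positional weighting used by the authors, so the mapping below (A=0, C=1, G=2, U=3, with the first base as the most significant digit) is purely an assumption chosen so that AAC maps to 1 and therefore generates the whole group; the function names are hypothetical as well. The one fact relied on is standard: multiplication by a in the integers modulo 64 is invertible exactly when a is odd.

```python
DIGITS = "ACGU"  # assumed base ordering, not taken from the paper
BASE_DIGIT = {base: i for i, base in enumerate(DIGITS)}


def codon_to_int(codon):
    """Map a codon to an element of Z_64 (assumed positional weighting)."""
    x, y, z = (BASE_DIGIT[b] for b in codon)
    return 16 * x + 4 * y + z


def int_to_codon(n):
    """Inverse of codon_to_int for 0 <= n < 64."""
    return DIGITS[n // 16] + DIGITS[(n // 4) % 4] + DIGITS[n % 4]


def add_codons(c1, c2):
    """Sum of two codons, i.e. addition modulo 64 on their integer images."""
    return int_to_codon((codon_to_int(c1) + codon_to_int(c2)) % 64)


def diagonal_map(wild, mutant):
    """Find multipliers a_i with a_i * w_i = m_i (mod 64) for each codon pair.

    Returns a list of multipliers (an odd one is preferred when it exists),
    or None when no diagonal endomorphism maps `wild` onto `mutant`.
    """
    multipliers = []
    for w, m in zip(wild, mutant):
        wi, mi = codon_to_int(w), codon_to_int(m)
        candidates = [a for a in range(64) if (a * wi) % 64 == mi]
        if not candidates:
            return None
        odd = [a for a in candidates if a % 2 == 1]
        multipliers.append(odd[0] if odd else candidates[0])
    return multipliers


def is_automorphism(multipliers):
    """A diagonal map on (Z_64)^N is invertible iff every multiplier is odd."""
    return all(a % 2 == 1 for a in multipliers)


# Repeatedly adding AAC visits every codon under the assumed mapping.
generated = set()
current = "AAA"
for _ in range(64):
    current = add_codons(current, "AAC")
    generated.add(current)
```

For example, under these assumptions `diagonal_map(["AAC"], ["AAU"])` returns `[3]`, an automorphism, while `diagonal_map(["AAC"], ["AAG"])` returns `[2]`, an endomorphism that is not invertible.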
Affiliation(s)
- R Sanchez
- Research Institute of Tropical Roots, Tuber Crops and Banana (INIVIT), Biotechnology group, Santo Domingo, Villa Clara, Cuba.
|
140
|
How reliable re-adjustment is: correspondence regarding A. Fuglsang, "The 'effective number of codons' revisited". Biochem Biophys Res Commun 2004; 324:1-2. [PMID: 15464973 DOI: 10.1016/j.bbrc.2004.08.213] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Received: 08/22/2004] [Indexed: 11/18/2022]
Abstract
A. Fuglsang [Biochem. Biophys. Res. Commun. 317 (2004) 957-964] suggested that effective number of codons for individual amino acids (Nc-values) should be re-adjusted to the number of synonymous codons of those amino acids, in order to prevent the overestimation of the effective number of codons. Here, it is shown that re-adjustment at the level of individual amino acids results in loss of considerable amounts of information. Furthermore, we have shown that theoretical Nc-values are functions of GC3s (and GC1s); as a result, when an amino acid Nc-value exceeds the related theoretical Nc-value, the implication of re-adjustment depends on the GC composition of the gene.
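The overestimation issue discussed here can be made concrete with a short sketch based on Wright's homozygosity estimator, F = (n Σp² − 1)/(n − 1), with the per-amino-acid Nc taken as 1/F. Re-adjustment is read here as capping the value at the number of synonymous codons, which is an interpretation of the abstract rather than a confirmed detail of either paper. The `expected_nc` function is Wright's gene-level curve, included only to illustrate the dependence on GC3s; the amino-acid-level theoretical values mentioned in the abstract are not reproduced. All function names are hypothetical.

```python
def homozygosity(counts):
    """Wright's codon homozygosity F for one amino acid.

    `counts` lists the observed counts of each synonymous codon.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two observed codons")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return (n * sum_p2 - 1) / (n - 1)


def amino_acid_nc(counts, readjust=False):
    """Effective number of codons for one amino acid, 1 / F.

    With readjust=True the value is capped at the number of synonymous
    codons (assumed reading of the re-adjustment under discussion).
    """
    f = homozygosity(counts)
    nc = float("inf") if f <= 0 else 1.0 / f
    if readjust:
        nc = min(nc, len(counts))
    return nc


def expected_nc(gc3s):
    """Wright's expected gene-level Nc when only GC3s shapes codon usage."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


# A four-fold degenerate amino acid observed only ten times:
raw = amino_acid_nc([3, 3, 2, 2])                        # exceeds 4
capped = amino_acid_nc([3, 3, 2, 2], readjust=True)      # capped at 4
```

With only ten observations, sampling noise pushes the raw estimate above the four codons that actually exist, which is the overestimation that re-adjustment targets; the cap also discards how far above the limit the estimate was, which is in the spirit of the information-loss concern raised in the abstract.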
|
141
|
Goodarzi H, Nejad HA, Torabi N. On the optimality of the genetic code, with the consideration of termination codons. Biosystems 2004; 77:163-73. [PMID: 15527955 DOI: 10.1016/j.biosystems.2004.05.031] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.8] [Received: 02/14/2004] [Revised: 05/09/2004] [Accepted: 05/25/2004] [Indexed: 11/18/2022]
Abstract
The existence of nonrandom patterns in codon assignments is supported by many statistical and biochemical studies. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations. For example, it is known that when an error induces the conversion of an amino acid to another, the biochemical properties of the resulting amino acid are usually very similar to those of the original. Prior studies include many attempts at quantitative estimation of the fraction of randomly generated codes which, based upon load minimization, score higher than the canonical genetic code. In this study, we took into consideration both the relative frequencies of amino acids and nonsense mistranslations, factors which had previously been ignored. Incorporating these parameters resulted in a fitness function (phi) under which the canonical genetic code is highly optimized with respect to load minimization. Considering termination codons, we applied a biosynthetic version of the coevolution theory, although with low significance. We employed a revised cost for the precursor-product pairs of amino acids and showed that the significance of this approach depends on the cost measure matrix used by the researcher. Thus, we have compared the two prominent matrices, point accepted mutations 74-100 (PAM(74-100)) and the mutation matrix, in our study.
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Enghelab Avenue, Tehran, Iran.
|
142
|
Abstract
Since discovering the pattern by which amino acids are assigned to codons within the standard genetic code, investigators have explored the idea that natural selection placed biochemically similar amino acids near to one another in coding space so as to minimize the impact of mutations and/or mistranslations. The analytical evidence to support this theory has grown in sophistication and strength over the years, and counterclaims questioning its plausibility and quantitative support have yet to transcend some significant weaknesses in their approach. These weaknesses are illustrated here by means of a simple simulation model for adaptive genetic code evolution. There remain ill-explored facets of the "error minimizing" code hypothesis, however, including the mechanism and pathway by which an adaptive pattern of codon assignments emerged, the extent to which natural selection created synonym redundancy, its role in shaping the amino acid and nucleotide languages, and even the correct interpretation of the adaptive codon assignment pattern. These represent fertile areas for future research.
Affiliation(s)
- Stephen J Freeland
- Department of Biology, University of Maryland, Baltimore County, Catonsville, MD, USA.
|