1
Hia F, Takeuchi O. The effects of codon bias and optimality on mRNA and protein regulation. Cell Mol Life Sci 2021; 78:1909-1928. [PMID: 33128106 PMCID: PMC11072601 DOI: 10.1007/s00018-020-03685-7] [Received: 06/08/2020] [Revised: 10/05/2020] [Accepted: 10/12/2020]
Abstract
The central dogma of molecular biology entails that genetic information is transferred from nucleic acid to proteins. Notwithstanding retro-transcribing genetic elements, DNA is transcribed to RNA, which in turn is translated into proteins. Recent advances have shown that each stage is regulated to control protein abundances for a variety of essential physiological processes. In this regard, mRNA regulation is essential in fine-tuning or calibrating protein abundances. In this review, we discuss one of several mRNA-intrinsic features of mRNA regulation that has recently been gaining traction: codon bias and optimality. Specifically, we address the effects of codon bias with regard to codon optimality in several biological processes centred on translation, such as mRNA stability and protein folding, among others. Finally, we examine how different organisms or cell types, through this system, are able to coordinate physiological pathways to respond to a variety of stress or growth conditions.
Affiliation(s)
- Fabian Hia
- Department of Medical Chemistry, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Osamu Takeuchi
- Department of Medical Chemistry, Graduate School of Medicine, Kyoto University, Kyoto, Japan

2
Faure G, Ogurtsov AY, Shabalina SA, Koonin EV. Role of mRNA structure in the control of protein folding. Nucleic Acids Res 2016; 44:10898-10911. [PMID: 27466388 PMCID: PMC5159526 DOI: 10.1093/nar/gkw671] [Received: 05/03/2016] [Revised: 07/12/2016] [Accepted: 07/14/2016]
Abstract
Specific structures in mRNA modulate translation rate and thus can affect protein folding. Using the protein structures from two eukaryotes and three prokaryotes, we explore the connections between protein compactness, inferred from solvent accessibility, and mRNA structure, inferred from mRNA folding energy (ΔG). In both prokaryotes and eukaryotes, the ΔG value of the most stable 30-nucleotide segment of the mRNA (ΔGmin) strongly and positively correlates with protein solvent accessibility. Thus, mRNAs containing exceptionally stable secondary structure elements typically encode compact proteins. The correlations between ΔG and protein compactness are much more pronounced in the predicted ordered parts of proteins than in the predicted disordered parts, indicative of an important role of mRNA secondary structure elements in the control of protein folding. Additionally, ΔG correlates with mRNA length and with the evolutionary rate of synonymous positions. The correlations are partially independent and were used to construct multiple regression models that explain about half of the variance in protein solvent accessibility. These findings suggest a model in which mRNA structure, particularly exceptionally stable RNA structural elements, acts as a gauge of protein co-translational folding by reducing ribosome speed when the nascent peptide needs time to form and optimize its core structure.
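The ΔGmin statistic described above (the folding energy of the most stable 30-nt window) can be sketched as a sliding-window minimum. This is a minimal illustration, assuming the caller supplies `fold_energy`, a hypothetical function that returns a minimum free energy for a short RNA segment (in practice a wrapper around an RNA folding package such as ViennaRNA). The GC-count `toy_energy` in the usage line is a stand-in for testing only, not a thermodynamic model.

```python
from typing import Callable


def delta_g_min(mrna: str, fold_energy: Callable[[str], float], window: int = 30) -> float:
    """Lowest (most stable) folding energy over all `window`-nt segments of an mRNA.

    `fold_energy` is a caller-supplied (hypothetical) function returning a
    minimum free energy for one segment; more negative means more stable.
    """
    seq = mrna.upper().replace("T", "U")  # treat DNA input as RNA
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    # Slide one nucleotide at a time and keep the minimum energy seen.
    return min(fold_energy(seq[i:i + window]) for i in range(len(seq) - window + 1))


# Toy energy proxy: -1 per G or C (illustration only, not a real folding model).
def toy_energy(segment: str) -> float:
    return -float(sum(base in "GC" for base in segment))


print(delta_g_min("A" * 40 + "G" * 30 + "A" * 40, toy_energy))  # -30.0
```

Taking the minimum rather than the mean is what makes ΔGmin sensitive to a single exceptionally stable element, which is the feature the abstract links to compact proteins.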
Affiliation(s)
- Guilhem Faure
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Aleksey Y Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

3
Xu L, Tang H, Chen DW, El-Naggar AK, Wei P, Sturgis EM. Genome-wide association study identifies common genetic variants associated with salivary gland carcinoma and its subtypes. Cancer 2015; 121:2367-74. [PMID: 25823930 DOI: 10.1002/cncr.29381] [Received: 12/18/2014] [Revised: 01/20/2015] [Accepted: 02/09/2015]
Abstract
BACKGROUND Salivary gland carcinomas (SGCs) are a rare malignancy with unknown etiology. The objective of the current study was to identify genetic variants modifying the risk of SGC and its major subtypes: adenoid cystic carcinoma and mucoepidermoid carcinoma. METHODS The authors conducted a genome-wide association study in 309 well-defined SGC cases and 535 cancer-free controls. A single-nucleotide polymorphism (SNP)-level discovery study was performed in non-Hispanic white individuals, followed by a replication study in Hispanic individuals. Logistic regression analysis was applied to calculate odds ratios (ORs) and 95% confidence intervals (95% CIs). A meta-analysis of the results was conducted. RESULTS A genome-wide significant association with SGC in non-Hispanic white individuals was detected at coding SNPs in CHRNA2 (cholinergic receptor, nicotinic, alpha 2 [neuronal]) (OR, 8.55; 95% CI, 4.53-16.13 [P = 3.6 × 10⁻¹¹]), OR4F15 (olfactory receptor, family 4, subfamily F, member 15) (OR, 5.26; 95% CI, 3.13-8.83 [P = 3.5 × 10⁻¹⁰]), ZNF343 (zinc finger protein 343) (OR, 3.28; 95% CI, 2.12-5.07 [P = 9.1 × 10⁻⁸]), and PARP4 (poly(ADP-ribose) polymerase family, member 4) (OR, 2.00; 95% CI, 1.54-2.59 [P = 1.7 × 10⁻⁷]). Meta-analysis of the non-Hispanic white and Hispanic cohorts identified another genome-wide significant SNP in ELL2 (meta-OR, 1.86; 95% CI, 1.48-2.34 [P = 1.3 × 10⁻⁷]). Risk alleles were largely enriched in mucoepidermoid carcinoma, in which the SNPs in CHRNA2, OR4F15, and ZNF343 had ORs of 15.71 (95% CI, 6.59-37.47 [P = 5.2 × 10⁻¹⁰]), 15.60 (95% CI, 6.50-37.41 [P = 7.5 × 10⁻¹⁰]), and 6.49 (95% CI, 3.36-12.52 [P = 2.5 × 10⁻⁸]), respectively. None of these SNPs retained a significant association with adenoid cystic carcinoma. CONCLUSIONS To the best of the authors' knowledge, the current study is the first to identify a panel of SNPs associated with the risk of SGC. Confirmation of these findings, along with functional analysis of the identified SNPs, is needed.
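The odds ratios and 95% CIs reported above come from logistic regression. For a single binary predictor with no covariates, logistic regression gives the same estimate as the classic 2×2 table calculation with Woolf's log-scale interval, sketched below; the adjusted models used in the study may differ, and the counts in the usage line are invented. Because the interval is symmetric on the log scale, the geometric mean of the reported bounds should recover the point estimate, which gives a quick consistency check on the published figures: √(4.53 × 16.13) ≈ 8.55 for CHRNA2.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale Wald) confidence interval.

    2x2 layout:  a = exposed cases,    b = unexposed cases
                 c = exposed controls, d = unexposed controls
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def or_from_ci(lower: float, upper: float) -> float:
    """Recover the point estimate implied by a log-symmetric interval."""
    return math.sqrt(lower * upper)


print(odds_ratio_ci(20, 10, 30, 60))       # OR = 4.0, roughly (1.66, 9.61)
print(round(or_from_ci(4.53, 16.13), 2))   # 8.55 (CHRNA2, as reported)
```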
Affiliation(s)
- Li Xu
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Hongwei Tang
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Diane W Chen
- Clinical Research, Quality Improvement, Baylor College of Medicine, Houston, Texas
- Adel K El-Naggar
- Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Peng Wei
- Division of Biostatistics and Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, Texas
- Erich M Sturgis
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, Texas; Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas

4
Su JH, Ma XX, He YL, Li JD, Ma XS, Dou YX, Luo XN, Cai XP. Mapping codon usage of the translation initiation region in porcine reproductive and respiratory syndrome virus genome. Virol J 2011; 8:476. [PMID: 22014033 PMCID: PMC3219751 DOI: 10.1186/1743-422x-8-476] [Received: 08/09/2011] [Accepted: 10/21/2011]
Abstract
BACKGROUND Porcine reproductive and respiratory syndrome virus (PRRSV) is a recently emerged pathogen that severely affects swine populations worldwide. PRRSV replication is tightly controlled by viral gene expression, and the codon usage of the translation initiation region within each gene could potentially regulate the translation rate. Therefore, a better understanding of the codon usage pattern of the translation initiation region would shed light on the regulation of PRRSV gene expression. RESULTS In this study, codon usage in the translation initiation region and in the whole coding sequence was compared in PRRSV ORF1a and ORFs 2-7. To investigate the potential role of codon usage in affecting the translation initiation rate, we established a codon usage model for the PRRSV translation initiation region. We observed that some non-preferred codons are preferentially used in the translation initiation region of particular ORFs. Although the codon used at some positions varies, these positions tend to use codons with negative codon usage bias (CUB). Furthermore, our codon usage model showed that the conserved CUB pattern does not directly coincide with the conserved sequence but is shaped by translational selection. CONCLUSIONS The invariant pattern of negative CUB in the ribosome-scanned PRRSV translation initiation region is considered to be associated with the rate-limiting step of translation.
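The abstract does not spell out how its CUB score is computed, so the sketch below substitutes relative synonymous codon usage (RSCU), a standard measure in which values below 1 mean a codon is used less often than expected under uniform synonymous usage. It compares RSCU in an initiation window with RSCU over the whole coding sequence; the window length `n_init` is a hypothetical parameter, not a value taken from the study.

```python
from collections import Counter

# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES) for j, y in enumerate(BASES) for k, z in enumerate(BASES)}


def split_codons(cds: str) -> list[str]:
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]  # drops a trailing partial codon


def rscu(codons: list[str]) -> dict[str, float]:
    """RSCU = observed count / mean count across the codon's synonymous family."""
    counts = Counter(c for c in codons if c in CODE)
    result = {}
    for aa in set(CODE.values()):
        family = [c for c, a in CODE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this region
        expected = total / len(family)
        for c in family:
            result[c] = counts[c] / expected
    return result


def initiation_vs_whole(cds: str, n_init: int = 10) -> tuple[dict[str, float], dict[str, float]]:
    codons = split_codons(cds)
    return rscu(codons[:n_init]), rscu(codons)


init, whole = initiation_vs_whole("ATG" + "GCA" * 2 + "GCT" * 6, n_init=3)
print(init["GCA"], whole["GCA"])  # 4.0 1.0
```

In the toy input, GCA is over-represented near the start codon relative to the whole sequence, which is the kind of contrast the study examines between initiation regions and complete ORFs.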
Affiliation(s)
- Jun-hong Su
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, 730046, PR China

5
Deane CM, Saunders R. The imprint of codons on protein structure. Biotechnol J 2011; 6:641-9. [DOI: 10.1002/biot.201000329] [Received: 01/21/2011] [Revised: 03/10/2011] [Accepted: 03/23/2011]

6
Li Y, Wang C, Cheng X, Wu T, Zhang C. Synonymous codon usage of the VP2 gene of a very virulent infectious bursal disease virus isolate serial passaged in chicken embryos. Biosystems 2011; 104:42-7. [DOI: 10.1016/j.biosystems.2010.12.009] [Received: 04/30/2010] [Revised: 10/28/2010] [Accepted: 12/23/2010]

7
Saunders R, Deane CM. Synonymous codon usage influences the local protein structure observed. Nucleic Acids Res 2010; 38:6719-28. [PMID: 20530529 PMCID: PMC2965230 DOI: 10.1093/nar/gkq495]
Abstract
Translation of mRNA into protein is a unidirectional information flow process. Analysing the input (mRNA) and output (protein) of translation, we find that local protein structure information is encoded in the mRNA nucleotide sequence. The Coding Sequence and Structure (CSandS) database developed in this work provides a detailed mapping between over 4000 solved protein structures and their mRNA. CSandS facilitates a comprehensive analysis of codon usage over many organisms. In assigning translation speed, we find that relative codon usage is less informative than tRNA concentration. For all speed measures, no evidence was found that domain boundaries are enriched with slow codons. In fact, genes seemingly avoid slow codons around structurally defined domain boundaries. Translation speed, however, does decrease at the transition into secondary structure. Codons are identified that have structural preferences significantly different from the amino acid they encode. However, each organism has its own set of ‘significant codons’. Our results support the premise that codons encode more information than merely amino acids and give insight into the role of translation in protein folding.
Affiliation(s)
- Rhodri Saunders
- Department of Statistics, Oxford University, 1 South Parks Road, Oxford OX1 3TG, UK.

8
AT2-AT3-profiling: a new look at synonymous codon usage. J Theor Biol 2006; 243:308-21. [PMID: 16930630 DOI: 10.1016/j.jtbi.2006.07.004] [Received: 10/03/2005] [Revised: 06/22/2006] [Accepted: 07/10/2006]
Abstract
The teleology of synonymous codon usage (SCU) still awaits a unifying concept. Here, with the aid of a computer program, the 2nd letter of human mRNA codons was graphically related to the 3rd codon letter, the carrier of SCU. AT2, the density of A+T at the 2nd codon position, mostly varies inversely with AT3, the analogous density at the 3rd codon position, and this relationship can be expressed as characteristic figures: mRNAs with an overall AT density below 50% tend to produce bulky figures called "red dragons" (redness is assigned to graph areas where AT3 < AT2), while mRNAs with an AT density above 50% produce a pattern called "harlequin", consisting of alternating red and blue (blueness, analogously, where AT3 > AT2) diamonds. With greater divergence of AT3 from AT2, harlequin patterns can turn into a "blue dragon" pattern. By analysing the mRNA of known proteins, these patterns can be correlated with certain functional regions: proteins with multiple transmembrane passages show bulky "red dragons", while structural proteins with a high glycine and proline content, such as collagen, produce "blue dragons". Non-coding mRNAs tend to show a balance between AT2 and AT3 and hence "harlequin" patterns. Signal peptides usually code red, owing to a low AT3 with an AT2 density at the expected level. With this technique, DNA sequences of as yet unknown functional meaning were scanned. When stretches of harlequin patterns are interrupted by red or blue dragons, closer scrutiny of these stretches can reveal ORFs that deserve a closer look for the protein information they may encode. At least in humans, SCU appears to follow protein-dependent AT2 density in a reciprocal fashion and does not seem to serve the purpose of influencing mRNA secondary structure, which is discussed in depth.
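The AT2/AT3 comparison underlying the "red dragon", "blue dragon" and "harlequin" patterns can be sketched as a sliding window over codons. This is an illustration, not the original program. The window size and step are hypothetical parameters, and the colour rule is read directly from the abstract (red where AT3 < AT2, blue where AT3 > AT2); the "balanced" label for ties is our own addition.

```python
def at2_at3_profile(cds: str, window: int = 20, step: int = 1) -> list[tuple[int, float, float, str]]:
    """Slide a window of `window` codons along a CDS; report (start, AT2, AT3, colour).

    AT2 / AT3 = fraction of A or T at codon position 2 / 3 inside the window.
    Colour: "red" if AT3 < AT2, "blue" if AT3 > AT2, "balanced" if equal (our label).
    """
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    profile = []
    for start in range(0, len(codons) - window + 1, step):
        win = codons[start:start + window]
        at2 = sum(c[1] in "AT" for c in win) / window
        at3 = sum(c[2] in "AT" for c in win) / window
        colour = "red" if at3 < at2 else "blue" if at3 > at2 else "balanced"
        profile.append((start, at2, at3, colour))
    return profile


print(at2_at3_profile("AAC" * 4, window=4))  # [(0, 1.0, 0.0, 'red')]
print(at2_at3_profile("GCA" * 4, window=4))  # [(0, 0.0, 1.0, 'blue')]
```

Plotting the `colour` track along a real transcript would give the kind of red/blue banding the abstract describes; long uninterrupted red runs correspond to the "red dragon" case.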

9
Jia M, Luo L. The relation between mRNA folding and protein structure. Biochem Biophys Res Commun 2006; 343:177-82. [PMID: 16530729 DOI: 10.1016/j.bbrc.2006.02.135] [Received: 02/19/2006] [Accepted: 02/23/2006]
Abstract
About 200 mRNA sequences of Escherichia coli and human with matching protein secondary structure data were studied. The mRNA folding for each native sequence and for corresponding randomized sequences was calculated through free energy minimization. We have found that the folding energy of mRNA segments in different protein secondary structures is significantly different. The average Z score is more negative for regular secondary structure (alpha-helix and beta-strand) than that for coil. This suggests that the codon choice in native mRNA sequence coding for protein regular structure contributes more to the mRNA folding stability.
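The Z score used above compares a native sequence's folding energy with those of randomized versions. A minimal sketch follows, assuming a caller-supplied `fold_energy` function (hypothetical here) and plain mononucleotide shuffling. The abstract does not say how the randomized sequences were generated, so the shuffling scheme is an assumption. A more negative Z means the native sequence is more stably folded than expected from its composition.

```python
import random
import statistics
from typing import Callable


def folding_z_score(seq: str, fold_energy: Callable[[str], float],
                    n_shuffles: int = 100, seed: int = 0) -> float:
    """Z = (E_native - mean(E_shuffled)) / sd(E_shuffled)."""
    rng = random.Random(seed)  # seeded for reproducibility
    native = fold_energy(seq)
    shuffled = []
    for _ in range(n_shuffles):
        letters = list(seq)
        rng.shuffle(letters)  # mononucleotide shuffle: preserves base composition only
        shuffled.append(fold_energy("".join(letters)))
    sd = statistics.stdev(shuffled)
    if sd == 0:
        raise ValueError("shuffled energies have zero spread")
    return (native - statistics.mean(shuffled)) / sd


# Toy energy (illustration only): -1 per "GC" step, standing in for real stacking energies.
def toy_energy(s: str) -> float:
    return -float(s.count("GC"))


print(folding_z_score("GC" * 10, toy_energy, n_shuffles=200) < 0)  # True
```

Applied per protein secondary-structure segment, averaging these Z values by segment type is one way to reproduce the helix/strand versus coil comparison reported above.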
Affiliation(s)
- Mengwen Jia
- Laboratory of Theoretical Biophysics, Faculty of Science and Technology, Inner Mongolia University, Hohhot 010021, China

10
Luo L, Jia M, Li X. Protein structure preference, tRNA copy number, and mRNA stem/loop content. Biopolymers 2004; 74:432-47. [PMID: 15274087 DOI: 10.1002/bip.20094]
Abstract
From statistical analyses of protein sequences for humans and Escherichia coli, we found that mRNA segments of m codons (for m = 2 to 6) with a high average tRNA copy number (TCN) (larger than approximately 10.5 for humans or approximately 1.95 for E. coli) preferentially code for the alpha helix, and those with a low TCN (smaller than approximately 7.5 for humans or approximately 1.7 for E. coli) preferentially code for coil. Between them there is an intermediate region with no correlation to structure preference. For the beta strand, the preference/avoidance tendency is not obvious. All strong preference modes of TCN for protein secondary structures have been deduced. The mutual interaction between the two factors, protein secondary structural type and codon TCN, is tested with the F distribution. A phenomenological model of the relation between structure preference and translational efficiency or accuracy is proposed. We point out that the structure preference of codons is related to the distribution of mRNA stem/loop content in the three TCN regions.
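The TCN classification above can be sketched as averaging a per-codon tRNA copy number over every run of m consecutive codons and binning the mean with the approximate human thresholds quoted in the abstract (10.5 and 7.5). The TCN table in the usage line is invented for illustration; real values would come from a tRNA gene count for the organism.

```python
def classify_tcn_windows(codons: list[str], tcn: dict[str, float], m: int = 3,
                         high: float = 10.5, low: float = 7.5) -> list[tuple[int, float, str]]:
    """Mean TCN of each m-codon segment, labelled 'high', 'low' or 'intermediate'.

    Per the abstract, high-TCN segments tend to code for alpha helix and
    low-TCN segments for coil (human thresholds by default; E. coli ~1.95 / ~1.7).
    """
    out = []
    for i in range(len(codons) - m + 1):
        mean = sum(tcn[c] for c in codons[i:i + m]) / m
        label = "high" if mean > high else "low" if mean < low else "intermediate"
        out.append((i, mean, label))
    return out


toy_tcn = {"AAA": 12.0, "GGG": 6.0}  # hypothetical copy numbers
print([lab for _, _, lab in classify_tcn_windows(["AAA"] * 3 + ["GGG"] * 3, toy_tcn)])
# ['high', 'intermediate', 'intermediate', 'low']
```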
Affiliation(s)
- Liaofu Luo
- Department of Physics, Inner Mongolia University, Hohhot 010021, China.

11
Kato S, Han SY, Liu W, Otsuka K, Shibata H, Kanamaru R, Ishioka C. Understanding the function-structure and function-mutation relationships of p53 tumor suppressor protein by high-resolution missense mutation analysis. Proc Natl Acad Sci U S A 2003; 100:8424-9. [PMID: 12826609 PMCID: PMC166245 DOI: 10.1073/pnas.1431692100]
Abstract
Inactivation of the tumor suppressor p53 by missense mutations is the most frequent genetic alteration in human cancers. The common missense mutations in the TP53 gene disrupt the ability of p53 to bind to DNA and consequently to transactivate downstream genes. However, it is still not fully understood how a large number of the remaining mutations affect p53 structure and function. Here, we used a comprehensive site-directed mutagenesis technique and a yeast-based functional assay to construct, express, and evaluate 2,314 p53 mutants representing all possible amino acid substitutions caused by a point mutation throughout the protein (5.9 substitutions per residue), and correlated p53 function with structure- and tumor-derived mutations. This high-resolution mutation analysis allows evaluation of previous predictions and hypotheses through interrelation of function, structure and mutation.
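The figure of 5.9 substitutions per residue follows from the structure of the genetic code. Each codon has nine single-nucleotide neighbours, and after synonymous and nonsense changes are removed only a handful of distinct amino acids remain; 2,314 mutants over the 393 residues of human p53 is about 5.89 per residue. The sketch below counts the distinct missense substitutions reachable from one codon under the standard code. It is an illustration, not the study's mutagenesis design.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES) for j, y in enumerate(BASES) for k, z in enumerate(BASES)}


def missense_neighbours(codon: str) -> set[str]:
    """Distinct amino acids reachable from `codon` by one nucleotide substitution,
    excluding synonymous changes and stop codons."""
    codon = codon.upper()
    own = CODE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if aa not in ("*", own):
                reachable.add(aa)
    return reachable


print(sorted(missense_neighbours("ATG")))  # ['I', 'K', 'L', 'R', 'T', 'V']
print(len(missense_neighbours("TGG")))     # 5
print(round(2314 / 393, 2))                # 5.89
```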

12
Jia M, Luo L, Liu C. Statistical correlation between protein secondary structure and messenger RNA stem-loop structure. Biopolymers 2003; 73:16-26. [PMID: 14691936 DOI: 10.1002/bip.10496]
Abstract
A new integrated sequence-structure database, called IADE (Integrated ASTRAL-DSSP-EMBL), incorporating matching mRNA sequence, amino acid sequence, and protein secondary structural data, is constructed. It includes 648 protein domains. Based on the IADE database, we studied the relation between RNA stem-loop frequencies and protein secondary structure. Alpha-helices and beta-strands in proteins were found to be preferentially "coded" by mRNA stem regions, while coils tend to be preferentially "coded" by mRNA loop regions. These tendencies are more obvious for structural words (SWs). An SW is defined as a four-amino-acid fragment that shows a pronounced secondary structural (alpha-helix or beta-strand) propensity. We demonstrate that the deduced correlation between protein and mRNA structure can hardly be explained as a stochastic fluctuation effect.
Affiliation(s)
- Mengwen Jia
- Department of Physics, Inner Mongolia University, Hohhot 010021, China

13
Reiss C, Ehrlich R, Lesnick T, Parvez S, Parvez H. Conformational diseases: misfolding mechanisms may pave the way to early therapy. Neurotoxicol Teratol 2002; 24:ix-xiv. [PMID: 12200201 DOI: 10.1016/s0892-0362(02)00312-4]
Affiliation(s)
- C Reiss
- Alzheim' R&D-Vigicell, 2, rue de la Noue, F91190 Gif, France

14
Chiusano ML, Alvarez-Valin F, Di Giulio M, D'Onofrio G, Ammirato G, Colonna G, Bernardi G. Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code. Gene 2000; 261:63-9. [PMID: 11164038 DOI: 10.1016/s0378-1119(00)00521-7]
Abstract
The nucleotide frequencies in the second codon positions of genes are remarkably different for the coding regions that correspond to different secondary structures in the encoded proteins, namely, helix, beta-strand and aperiodic structures. Indeed, hydrophobic and hydrophilic amino acids are encoded by codons having U or A, respectively, in their second position. Moreover, the beta-strand structure is strongly hydrophobic, while aperiodic structures contain more hydrophilic amino acids. The relationship between nucleotide frequencies and protein secondary structures is associated not only with the physico-chemical properties of these structures but also with the organisation of the genetic code. In fact, this organisation seems to have evolved so as to preserve the secondary structures of proteins by preventing deleterious amino acid substitutions that could modify the physico-chemical properties required for an optimal structure.
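The second-position analysis described above amounts to tabulating the base at codon position 2 separately for each secondary-structure class. A minimal sketch, assuming one structure label per codon (for example 'H', 'E' and 'C' from a DSSP-style three-state reduction, an assumption here) and a coding sequence with the stop codon removed. With real data, a higher T (U) fraction under 'E' than under 'C' would reflect the hydrophobicity pattern the abstract describes.

```python
from collections import Counter, defaultdict


def second_position_composition(cds: str, ss: str) -> dict[str, dict[str, float]]:
    """Base frequencies at codon position 2, split by secondary-structure label.

    `ss` holds one label per codon (e.g. 'H' helix, 'E' strand, 'C' aperiodic);
    `cds` must not include the stop codon.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) != 3 * len(ss):
        raise ValueError("need exactly one structure label per codon")
    counts: dict[str, Counter] = defaultdict(Counter)
    for i, label in enumerate(ss):
        counts[label][cds[3 * i + 1]] += 1  # middle base of codon i
    return {label: {base: n / sum(c.values()) for base, n in c.items()}
            for label, c in counts.items()}


print(second_position_composition("CTGAAAGTT", "HCE"))
# {'H': {'T': 1.0}, 'C': {'A': 1.0}, 'E': {'T': 1.0}}
```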
Affiliation(s)
- M L Chiusano
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, I-80121, Naples, Italy

15
Gupta SK, Majumdar S, Bhattacharya TK, Ghosh TC. Studies on the relationships between the synonymous codon usage and protein secondary structural units. Biochem Biophys Res Commun 2000; 269:692-6. [PMID: 10720478 DOI: 10.1006/bbrc.2000.2351]
Abstract
The relationship between synonymous codon usage and protein secondary structural elements (alpha helices and beta sheets) was reinvestigated by taking structural information on proteins from the Protein Data Bank (PDB) and their corresponding mRNA sequences from GenBank for four different organisms: E. coli, B. subtilis, S. cerevisiae, and Homo sapiens. Synonymous codon families were observed to have non-random codon usage, but there is no species-invariant, universal correlation between synonymous codon usage and protein secondary structural elements. The secondary structural units of proteins can be distinguished by the occurrence of bases at the second codon position.
Affiliation(s)
- S K Gupta
- Distributed Information Centre, Bose Institute, P 1/12, C.I.T. Scheme, VII M, Calcutta, 700 054, India

16
Komar AA, Lesnik T, Reiss C. Synonymous codon substitutions affect ribosome traffic and protein folding during in vitro translation. FEBS Lett 1999; 462:387-91. [PMID: 10622731 DOI: 10.1016/s0014-5793(99)01566-5]
Abstract
To investigate the possible influence of the local rates of translation on protein folding, 16 consecutive rare (in Escherichia coli) codons in the chloramphenicol acetyltransferase (CAT) gene have been replaced by frequent ones. Site-directed silent mutagenesis reduced the pauses in translation of CAT in E. coli S30 extract cell-free system and led to the acceleration of the overall rate of CAT protein synthesis. At the same time, the silently mutated protein (with unaltered protein sequence) synthesized in the E. coli S30 extract system was shown to possess 20% lower specific activity. The data suggest that kinetics of protein translation can affect the in vivo protein-folding pathway, leading to increased levels of protein misfolding.
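The silent-mutagenesis design in this study (replacing rare codons with frequent synonyms while leaving the protein sequence unchanged) can be sketched as follows. The usage table and threshold are hypothetical inputs, for example per-thousand frequencies from a codon usage database. The sketch simply picks the most frequent synonym for each rare codon and checks that the translation is preserved.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES) for j, y in enumerate(BASES) for k, z in enumerate(BASES)}


def translate(cds: str) -> str:
    cds = cds.upper()
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def replace_rare_codons(cds: str, usage: dict[str, float], threshold: float) -> str:
    """Swap every codon whose usage is below `threshold` for its most-used synonym."""
    cds = cds.upper()
    out = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if usage.get(codon, 0.0) < threshold:
            synonyms = [c for c, aa in CODE.items() if aa == CODE[codon]]
            codon = max(synonyms, key=lambda c: usage.get(c, 0.0))
        out.append(codon)
    return "".join(out)


usage = {"CGT": 20.0, "CGC": 15.0, "CGA": 0.1, "AGG": 0.2, "AAA": 30.0}  # hypothetical
new = replace_rare_codons("CGAAAAAGG", usage, threshold=1.0)
print(new, translate(new) == translate("CGAAAAAGG"))  # CGTAAACGT True
```

The translation check matters because the whole point of the experiment is that only the local translation rate changes, not the encoded protein.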
Affiliation(s)
- A A Komar
- Centre de Génétique Moléculaire, CNRS, Gif-sur-Yvette, France.

17
Torney DC, Whittaker CC, Xie G. The stationary statistical properties of human coding sequences. J Mol Biol 1999; 286:1461-9. [PMID: 10064709 DOI: 10.1006/jmbi.1998.2567]
Abstract
We introduce a generally applicable method for the discovery and quantitation of all of the characteristic statistical properties of a class of biological sequences, given examples from the class. This method employs a reversible binary encoding of sequences into the binary digits -1 and +1. Then, provided that the sample is sufficient, the sample cumulants on the subsets of digit positions will manifest all of the statistical properties of the class. As an illustration, we present the main results of a complete characterization of the stationary statistical properties of human coding sequences, in terms of their sample cumulants. Many of the telling sample cumulants are described.
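The reversible ±1 encoding and the cumulant idea can be illustrated with a small sketch. The specific mapping here (first digit purine/pyrimidine, second digit amino/keto) is an assumption chosen because it is reversible; the abstract does not state the encoding used. Only the second-order cumulant (the covariance between two digit positions across sampled sequences) is computed; the abstract's cumulants over larger subsets of positions generalize this.

```python
# Purine (+1) / pyrimidine (-1), then amino (+1) / keto (-1): a reversible 2-digit code.
ENCODE = {"A": (1, 1), "G": (1, -1), "C": (-1, 1), "T": (-1, -1)}
DECODE = {digits: base for base, digits in ENCODE.items()}


def encode(seq: str) -> list[int]:
    return [d for base in seq.upper() for d in ENCODE[base]]


def decode(digits: list[int]) -> str:
    return "".join(DECODE[(digits[i], digits[i + 1])] for i in range(0, len(digits), 2))


def second_cumulant(samples: list[list[int]], i: int, j: int) -> float:
    """Covariance (1/n normalization) of digit positions i and j across encoded sequences."""
    n = len(samples)
    xi = [s[i] for s in samples]
    xj = [s[j] for s in samples]
    mean_i, mean_j = sum(xi) / n, sum(xj) / n
    return sum(a * b for a, b in zip(xi, xj)) / n - mean_i * mean_j


samples = [encode(s) for s in ["AA", "TT", "AA", "TT"]]
print(decode(encode("ACGT")), second_cumulant(samples, 0, 2))  # ACGT 1.0
```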
Affiliation(s)
- D C Torney
- Theoretical Division and U.S.D.O.E. Joint Genome Institute, Mail Stop K710, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA.

18
Abstract
Two more organisms from different taxonomic groups were added to a new version of the Integrated Sequence-Structure Database (ISSD). ISSD serves as an integrated source of sequence and structure information for the analysis of correlations between mRNA synonymous codon usage and the three-dimensional structure of the encoded proteins. ISSD now holds 88 non-homologous Escherichia coli proteins and 25 yeast Saccharomyces cerevisiae proteins in addition to the expanded set of mammalian proteins, which includes 166 proteins (107 in ISSD Version 1.0). Comparison of ISSD sequences with organism-specific codon usage data derived from the CUTG database shows that it is a representative subset of the GenBank coding sequence data. Preliminary results of the statistical analysis confirm that the sequence-structure correlations we observed earlier are also present in the upgraded ISSD (Version 2.0), including bacterial and yeast proteins. The ISSD Version 2.0 release includes an improved Web-based data search and retrieval system and is accessible via URL http://www.protein.bio.msu.su/issd/. ISSD can also be accessed at ExPASy, URL http://www.expasy.ch/swissmod/swiss-model.html
Affiliation(s)
- I A Adzhubei
- Department of Molecular Biology, Faculty of Biology, Lomonosov Moscow State University, 119899 Moscow, Russia

19
Abstract
The hypothesis that synonymous codon usage is related to protein three-dimensional structure is examined by investigating the correlation between synonymous codon usage and protein secondary structure. All but two codons in E. coli show the same secondary structural preference (for alpha-helix, beta-strand or coil) as the amino acids they encode, while 17 codons show secondary structural bias in mammalian proteins. The results indicate that there is no significant correlation between synonymous codon usage and protein secondary structure in E. coli, but there is a correlation in mammals. It could be deduced that synonymous codons carry much less structural information in prokaryotes than in eukaryotes owing to their divergent evolutionary mechanisms.
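The comparison of codon and amino-acid structural preferences above can be sketched with a conventional propensity ratio, P(state | item) / P(state), computed once per codon and once per amino acid. A codon shows its own structural bias when its preferred state differs from that of the amino acid it encodes. This ratio is a common choice assumed here for illustration, not necessarily the statistic used in the study.

```python
from collections import Counter

# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES) for j, y in enumerate(BASES) for k, z in enumerate(BASES)}


def propensities(cds: str, ss: str) -> tuple[dict, dict]:
    """Return (codon_propensity, amino_acid_propensity), keyed by (item, state).

    propensity(x, S) = P(S | x) / P(S); values above 1 mean x is enriched in S.
    `ss` holds one state label per codon, e.g. 'H', 'E' or 'C'.
    """
    cds = cds.upper()
    if len(cds) < 3 * len(ss):
        raise ValueError("need one codon per structure label")
    codons = [cds[3 * i:3 * i + 3] for i in range(len(ss))]
    n = len(ss)
    state_freq = Counter(ss)

    def ratio(items: list[str]) -> dict:
        item_count = Counter(items)
        joint = Counter(zip(items, ss))
        return {(x, s): (joint[(x, s)] / item_count[x]) / (state_freq[s] / n)
                for x in item_count for s in state_freq}

    return ratio(codons), ratio([CODE[c] for c in codons])


codon_p, aa_p = propensities("CTGCTGCTTCTT", "HHCC")
print(codon_p[("CTG", "H")], aa_p[("L", "H")])  # 2.0 1.0
```

In the toy input, leucine as a whole has no helix preference, yet its CTG codon does; that divergence is the pattern the abstract reports for 17 mammalian codons.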
Affiliation(s)
- T Xie
- Shanghai Institute of Biochemistry, Academia Sinica, People's Republic of China