1
Zavala B, Dineen L, Fisher KJ, Opulente DA, Harrison MC, Wolters JF, Shen XX, Zhou X, Groenewald M, Hittinger CT, Rokas A, LaBella AL. Genomic factors shaping codon usage across the Saccharomycotina subphylum. G3 (Bethesda) 2024; 14:jkae207. [PMID: 39213398 PMCID: PMC11540330 DOI: 10.1093/g3journal/jkae207] [Received: 06/13/2024] [Revised: 08/15/2024] [Accepted: 08/21/2024]
Abstract
Codon usage bias, or the unequal use of synonymous codons, is observed across genes, genomes, and between species. It has been implicated in many cellular functions, such as translation dynamics and transcript stability, but can also be shaped by neutral forces. We characterized codon usage across 1,154 strains from 1,051 species from the fungal subphylum Saccharomycotina to gain insight into the biases, molecular mechanisms, evolution, and genomic features contributing to codon usage patterns. We found a general preference for A/T-ending codons and correlations between codon usage bias, GC content, and tRNA-ome size. Codon usage bias is distinct between the 12 orders to such a degree that yeasts can be classified into orders with >90% accuracy using a machine learning algorithm. We also characterized the degree to which codon usage bias is impacted by translational selection. We found that the degree of translational selection was influenced by a combination of features, including the number of coding sequences, BUSCO count, and genome length. Our analysis also revealed an extreme bias in codon usage in the Saccharomycodales associated with a lack of predicted arginine tRNAs that decode CGN codons, leaving only the AGN codons to encode arginine. Analysis of Saccharomycodales gene expression, tRNA sequences, and codon evolution suggests that avoidance of the CGN codons is associated with a decline in arginine tRNA function. Consistent with previous findings, codon usage bias within the Saccharomycotina is shaped by genomic features and GC bias. However, we find cases of extreme codon usage preference and avoidance along yeast lineages, suggesting additional forces may be shaping the evolution of specific codons.
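A minimal sketch of the composition measures behind an A/T-ending codon preference and its link to GC content: overall GC content, GC content at third codon positions (GC3), and the share of A/T-ending codons. It assumes an uppercase-insensitive, in-frame coding sequence containing only A/C/G/T, and it counts every codon (including Met, Trp and stops), whereas GC3s-style variants restrict the count to synonymous sites. The function names are hypothetical and this is not the authors' pipeline.

```python
def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into complete codons; trailing bases are dropped."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in the whole sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds: str) -> float:
    """Fraction of codons whose third base is G or C (all codons counted)."""
    thirds = [codon[2] for codon in split_codons(cds)]
    return sum(base in "GC" for base in thirds) / len(thirds)


def at_ending_fraction(cds: str) -> float:
    """Share of A/T-ending codons: the complement of GC3 for A/C/G/T-only input."""
    return 1.0 - gc3(cds)


# Example: codons ATG AAA GCT TTA have third bases G, A, T, A.
example = "ATGAAAGCTTTA"
print(gc_content(example), gc3(example), at_ending_fraction(example))
```

Comparing GC3 with genome-wide GC content across many species is one simple way to see whether third-position composition tracks overall GC bias.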
Affiliation(s)
- Bryan Zavala
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, North Carolina Research Campus, Kannapolis, NC 28081, USA
- Lauren Dineen
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, North Carolina Research Campus, Kannapolis, NC 28081, USA
- Kaitlin J Fisher
- Department of Biological Sciences, SUNY Oswego, Oswego, NY 13126, USA
- Laboratory of Genetics, Wisconsin Energy Institute, Center for Genomic Science Innovation, J. F. Crow Institute for the Study of Evolution, University of Wisconsin–Madison, Madison, WI 53726, USA
- Dana A Opulente
- Department of Biology, Villanova University, Villanova, PA 19085, USA
- Laboratory of Genetics, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, Center for Genomic Science Innovation, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53726, USA
- Marie-Claire Harrison
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
- John F Wolters
- Laboratory of Genetics, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, Center for Genomic Science Innovation, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53726, USA
- Xing-Xing Shen
- Institute of Insect Sciences and Centre for Evolutionary and Organismal Biology, Zhejiang University, Hangzhou 310058, China
- Xiaofan Zhou
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Center, South China Agricultural University, Guangzhou 510642, China
- Chris Todd Hittinger
- Laboratory of Genetics, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, Center for Genomic Science Innovation, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53726, USA
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
- Abigail Leavitt LaBella
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, North Carolina Research Campus, Kannapolis, NC 28081, USA
- Center for Computational Intelligence to Predict Health and Environmental Risks (CIPHER), University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28233, USA
2
Zavala B, Dineen L, Fisher KJ, Opulente DA, Harrison MC, Wolters JF, Shen XX, Zhou X, Groenewald M, Hittinger CT, Rokas A, LaBella AL. Genomic factors shaping codon usage across the Saccharomycotina subphylum. bioRxiv 2024:2024.05.23.595506. [PMID: 38826271 PMCID: PMC11142207 DOI: 10.1101/2024.05.23.595506]
Abstract
Codon usage bias, or the unequal use of synonymous codons, is observed across genes, genomes, and between species. The biased use of synonymous codons has been implicated in many cellular functions, such as translation dynamics and transcript stability, but can also be shaped by neutral forces. The Saccharomycotina, the fungal subphylum containing the yeasts Saccharomyces cerevisiae and Candida albicans, has been a model system for studying codon usage. We characterized codon usage across 1,154 strains from 1,051 species to gain insight into the biases, molecular mechanisms, evolution, and genomic features contributing to codon usage patterns across the subphylum. We found evidence of a general preference for A/T-ending codons and correlations between codon usage bias, GC content, and tRNA-ome size. Codon usage bias is also distinct between the 12 orders within the subphylum to such a degree that yeasts can be classified into orders with an accuracy greater than 90% using a machine learning algorithm trained on codon usage. We also characterized the degree to which codon usage bias is impacted by translational selection. Interestingly, the degree of translational selection was influenced by a combination of genome features and assembly metrics that included the number of coding sequences, BUSCO count, and genome length. Our analysis also revealed an extreme bias in codon usage in the Saccharomycodales associated with a lack of predicted arginine tRNAs. The order contains 24 species, and 23 are computationally predicted to lack tRNAs that decode CGN codons, leaving only the AGN codons to encode arginine. Analysis of Saccharomycodales gene expression, tRNA sequences, and codon evolution suggests that extreme avoidance of the CGN codons is associated with a decline in arginine tRNA function.
Codon usage bias within the Saccharomycotina is generally consistent with previous investigations in fungi, which show a role for both genomic features and GC bias in shaping codon usage. However, we find cases of extreme codon usage preference and avoidance along yeast lineages, suggesting additional forces may be shaping the evolution of specific codons.
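As a sketch of how CGN avoidance might be quantified gene by gene, the function below splits a gene's arginine codons into the CGN family and the arginine-encoding codons of the AGN block. In the standard genetic code only AGA and AGG (AGR) of the AGN codons encode arginine, while AGT and AGC encode serine, so the sketch counts AGR. The function name and the assumption of an in-frame, standard-code coding sequence are ours, not the authors'.

```python
from collections import Counter

# Standard genetic code: the four CGN codons and the two arginine codons of the AGN block.
ARG_CGN = ("CGT", "CGC", "CGA", "CGG")
ARG_AGR = ("AGA", "AGG")


def arginine_codon_split(cds: str) -> dict[str, float]:
    """Return the fractions of arginine codons that are CGN versus AGR in an in-frame CDS."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    cgn = sum(counts[c] for c in ARG_CGN)
    agr = sum(counts[c] for c in ARG_AGR)
    total = cgn + agr
    if total == 0:
        # No arginine codons: report zeros rather than dividing by zero.
        return {"CGN": 0.0, "AGR": 0.0}
    return {"CGN": cgn / total, "AGR": agr / total}


# Example: AGA AGG CGT AAA contains two AGR codons and one CGN codon.
print(arginine_codon_split("AGAAGGCGTAAA"))
```

Aggregated over all genes of a genome, a CGN fraction near zero would be the kind of signal the abstract describes for the Saccharomycodales.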
3
Abstract
Evolutionary biologists have thought about the role of genetic variation during adaptation for a very long time, since before we understood the organization of the genetic code, the provenance of genetic variation, and how such variation influenced the phenotypes on which natural selection acts. Half a century after the discovery of the structure of DNA and the unraveling of the genetic code, we have a rich understanding of these problems and the means to both delve deeper and widen our perspective across organisms and natural populations. The 2022 Vice Presidential Symposium of the American Society of Naturalists highlighted examples of recent insights into the role of genetic variation in adaptive processes, which are compiled in this special section. The work was conducted in different parts of the world, included theoretical and empirical studies with diverse organisms, and addressed distinct aspects of how genetic variation influences adaptation. In our introductory article to the special section, we discuss some important recent insights about the generation and maintenance of genetic variation, its impacts on phenotype and fitness, its fate in natural populations, and its role in driving adaptation. By placing the special section articles in the broader context of recent developments, we hope that this overview will also serve as a useful introduction to the field.
4
Das S, Roymondal U, Sahoo S. Analyzing gene expression from relative codon usage bias in Yeast genome: a statistical significance and biological relevance. Gene 2009; 443:121-31. [PMID: 19410638 DOI: 10.1016/j.gene.2009.04.022] [Received: 11/22/2008] [Revised: 03/08/2009] [Accepted: 04/20/2009]
Abstract
Based on the hypothesis that highly expressed genes are often characterized by strong compositional bias in codon usage, a number of measures currently in use quantify codon usage bias in genes and hence provide numerical indices to predict gene expression levels. Using the recently introduced score of relative codon usage bias (RCBS) as an expression measure, we explicitly tested how well this numerical measure predicts gene expression level and illustrate this with an analysis of yeast genomes. In contrast to several previous studies, we observe weak correlations between GC content and RCBS, but a selective pressure on codon preferences in highly expressed genes. The data support the assertion that the expression of a given gene depends on its RCBS. We further observe a strong correlation between RCBS and protein length, indicating natural selection in favour of higher expression of shorter genes. We also attempt a statistical analysis to assess the strength of relative codon bias in genes as a guide to their likely expression level, suggesting a decrease of informational entropy in highly expressed genes.
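A minimal sketch of a relative codon bias score, assuming the form commonly given for RCBS: for each codon xyz in a gene, compare its observed frequency f(xyz) with the product f1(x) f2(y) f3(z) of position-specific base frequencies, take the geometric mean of these ratios over all L codons, and subtract 1. The exact normalization used by Das et al. may differ; treat this as an illustration rather than a reference implementation. The function name is hypothetical.

```python
import math
from collections import Counter


def rcbs(cds: str) -> float:
    """Relative codon bias score (assumed form): geometric mean over all codons of
    f(xyz) / (f1(x) f2(y) f3(z)), minus 1."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    n = len(codons)
    codon_freq = {c: k / n for c, k in Counter(codons).items()}
    # Position-specific base frequencies f1, f2, f3 within this gene.
    pos_freq = [Counter(c[p] for c in codons) for p in range(3)]
    log_sum = 0.0
    for c in codons:
        expected = 1.0
        for p in range(3):
            expected *= pos_freq[p][c[p]] / n
        # Sum logs so the product over a long gene does not overflow or underflow.
        log_sum += math.log(codon_freq[c] / expected)
    return math.exp(log_sum / n) - 1.0


# Two codons, each four times more frequent than positional base composition predicts.
print(rcbs("AAACCC"))
```

A score of 0 means codon frequencies match what position-specific base composition predicts; larger values indicate stronger bias.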
Affiliation(s)
- Shibsankar Das
- Department of Mathematics, Uluberia College, Uluberia, Howrah, W.B., India
5
Mougel F, Manichanh C, Duchateau N'guyen G, Termier M. Genomic Choice of Codons in 16 Microbial Species. J Biomol Struct Dyn 2004; 22:315-29. [PMID: 15473705 DOI: 10.1080/07391102.2004.10507003]
Abstract
We study codon usage over the whole set of ORFs of 16 unicellular microbial species: eight archaebacteria, seven eubacteria, and one eukaryote. We first try to define, for each species, the neutral expected codon usage so as to better assess the influence of selection. Overlapping triplets counted from the complete genomic DNA sequence and the mean amino acid composition of ORFs allow us to build a satisfactory expected codon usage for each species. Within-species deviation from this neutral model is then studied through correspondence analysis and characterized with a bias index, N(C)' (the effective number of codons relative to the neutral model). Comparison with previously published results for three species shows good agreement despite very different methods. We thus propose a set of codons probably preferred by selection for nine other species. In the last four species, no clear preference can be detected. Finally, we characterize variation of codon usage across functional categories. We propose that the high degree of bias of proteins involved in translation, ribosomal structure and biogenesis has a positive influence on overexpression of the corresponding genes under optimum growth conditions and is a negative regulator of the same genes when amino acids become limited resources.
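One way to read the neutral model described above, as a sketch: estimate the expected frequency of codon c for amino acid a as the amino acid's frequency in the ORFs multiplied by c's share of the overlapping-triplet counts within its synonymous family. This interpretation, the function names, and the caller-supplied synonym table (a full genetic-code table is omitted for brevity) are assumptions for illustration; the authors' exact construction may differ.

```python
from collections import Counter


def overlapping_triplets(genome: str) -> Counter:
    """Count the triplet starting at every position (all three frames, one strand)."""
    genome = genome.upper()
    return Counter(genome[i:i + 3] for i in range(len(genome) - 2))


def expected_codon_usage(genome: str, aa_freq: dict[str, float],
                         synonyms: dict[str, list[str]]) -> dict[str, float]:
    """Expected codon frequency = amino-acid frequency x the codon's share of
    overlapping-triplet counts within its synonymous family."""
    triplets = overlapping_triplets(genome)
    expected = {}
    for aa, family in synonyms.items():
        family_total = sum(triplets[c] for c in family)
        for c in family:
            # Fall back to equal shares if no family member occurs in the genome.
            share = triplets[c] / family_total if family_total else 1.0 / len(family)
            expected[c] = aa_freq.get(aa, 0.0) * share
    return expected


# Toy example with a hypothetical one-family table (lysine: AAA, AAG).
print(expected_codon_usage("AAAAAG", {"K": 1.0}, {"K": ["AAA", "AAG"]}))
```

Deviations of observed ORF codon usage from these expected values are then what a selection-oriented analysis would examine.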
Affiliation(s)
- F Mougel
- Bioinformatique des Genomes, IGM, bat. 400, 91405 Orsay CEDEX, France
7
Cerutti H, Osman M, Grandoni P, Jagendorf AT. A homolog of Escherichia coli RecA protein in plastids of higher plants. Proc Natl Acad Sci U S A 1992; 89:8068-72. [PMID: 1518831 PMCID: PMC49857 DOI: 10.1073/pnas.89.17.8068]
Abstract
Studies of chloroplast DNA variation, and several direct experimental observations, indicate the existence of recombination ability in algal and higher plant plastids. However, the biochemical pathways involved have not been studied. Using part of a cyanobacterial recA gene as a probe in Southern blots, we found homologous sequences in total DNA from Pisum sativum and Arabidopsis thaliana and in a cDNA library from Arabidopsis. A cDNA was cloned and sequenced, and its predicted amino acid sequence is 60.7% identical to that of the cyanobacterial RecA protein. This finding is consistent with our other results showing both DNA strand transfer activity and the existence of a protein of the predicted molecular mass, cross-reactive with antibodies to Escherichia coli RecA, in the stroma of pea chloroplasts.
Affiliation(s)
- H Cerutti
- Section of Plant Biology, Cornell University, Ithaca, NY 14853
8
Oliver JL, Marín A, Martínez-Zapater JM. Chloroplast genes transferred to the nuclear plant genome have adjusted to nuclear base composition and codon usage. Nucleic Acids Res 1990; 18:65-73. [PMID: 2308837 PMCID: PMC330204 DOI: 10.1093/nar/18.1.65]
Abstract
During plant evolution, some plastid genes have been moved to the nuclear genome. These transferred genes are now correctly expressed in the nucleus, and their products are transported into the chloroplast. We compared the base compositions, the distributions of some dinucleotides, and the codon usage of transferred, nuclear, and chloroplast genes in two dicot and two monocot plant species. Our results indicate that transferred genes have adjusted to nuclear base composition and codon usage, and are now more similar to the nuclear genes than to the chloroplast ones in every species analyzed.
Affiliation(s)
- J L Oliver
- Unidad de Genética, Facultad de Ciencias, Universidad de Granada, Spain
9
Blumberg BM, Crowley JC, Silverman JI, Menonna J, Cook SD, Dowling PC. Measles virus L protein evidences elements of ancestral RNA polymerase. Virology 1988; 164:487-97. [PMID: 2835864 DOI: 10.1016/0042-6822(88)90563-6]
Abstract
We have determined the nucleotide sequence of the measles virus (MV) L gene using a cDNA library encompassing the entire MV genome (J. Crowley et al. (1987) Intervirology, 28, 65-77). The L gene is 6639 nucleotides in length and contains a single long open reading frame that could code for a protein of 247,611 Da. Both the L gene and, in particular, the predicted L protein of MV bear substantial homology to their counterparts in Sendai virus and Newcastle disease virus, suggesting that the multifunctional nature of paramyxovirus L proteins imposes strong evolutionary constraints. The predicted MV L protein also contains distinct elements of a postulated ancestral RNA polymerase.
Affiliation(s)
- B M Blumberg
- Neurology Service, East Orange VA Medical Center, New Jersey 07019
10
Abstract
Higher plant nuclear sequences reveal avoidance of CpG and TpA doublets. Chloroplast sequences avoid the TpA doublet in all codon positions. The chloroplast genome is not methylated but codon positions II-III and untranslated regions avoid CpG. The mitochondrial genome, also unmethylated, avoids CpG in all codon positions. We therefore deduce that methylation is not sufficient to explain CpG avoidance in the higher plant systems. Other factors must be taken into account such as amino acid composition, codon choices and perhaps stability of the DNA helix.
11
Hyde JE, Sims PF. Anomalous dinucleotide frequencies in both coding and non-coding regions from the genome of the human malaria parasite Plasmodium falciparum. Gene X 1987; 61:177-87. [PMID: 3327756 DOI: 10.1016/0378-1119(87)90112-0]
Abstract
We have statistically analysed the distribution of nucleotides and dinucleotides in 21 genes of the 81% A + T-rich human malaria parasite Plasmodium falciparum. The mRNA-synonymous strands of this protozoan show in general a marked excess of purines over pyrimidines, correlated with abnormally high levels of Lys and Glu. We have used the large differences in base composition between coding and non-coding regions to estimate that the parasite possesses in the range of 2700-5400 genes. The dinucleotide preference patterns are compared with consensus patterns derived from other organisms [Nussinov, Nucl. Acids Res. 12 (1984) 1749-1763]. Patterns in the coding regions surprisingly resemble those of higher, rather than lower eukaryotes, particularly with respect to TG elevation and CG suppression. The latter is correlated with an abnormally low level of Arg in these parasites. In the non-coding regions, the four dinucleotides made up of C and/or G are found with significantly higher frequencies than expected (approx. 50-150%), specifically to the 5' side of the coding regions. The possible role of these dinucleotides in control sequences is discussed.
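CG suppression of the kind described above is commonly quantified with the dinucleotide odds ratio rho(XY) = f(XY) / (f(X) f(Y)), where values well below 1 indicate under-representation relative to base composition. The abstract does not state which statistic Hyde and Sims used, so this is a generic single-strand sketch with a hypothetical function name.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq: str, dinuc: str) -> float:
    """rho(XY) = f(XY) / (f(X) f(Y)) on one strand; values < 1 mean XY is under-represented."""
    seq = seq.upper()
    dinuc = dinuc.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    f_xy = di[dinuc] / (len(seq) - 1)
    f_x = mono[dinuc[0]] / len(seq)
    f_y = mono[dinuc[1]] / len(seq)
    if f_x == 0 or f_y == 0:
        raise ValueError("a constituent base is absent; the ratio is undefined")
    return f_xy / (f_x * f_y)


print(dinucleotide_odds_ratio("CGCG", "CG"))
```

Computing the ratio separately for coding and non-coding segments would mirror the comparison the abstract describes.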
Affiliation(s)
- J E Hyde
- Department of Biochemistry and Applied Molecular Biology, University of Manchester Institute of Science and Technology, U.K
12
Morgan EM, Rakestraw KM. Sequence of the Sendai virus L gene: open reading frames upstream of the main coding region suggest that the gene may be polycistronic. Virology 1986; 154:31-40. [PMID: 3019006 DOI: 10.1016/0042-6822(86)90427-7]
Abstract
The sequence of the L gene of Sendai virus, encompassing 6799 nucleotides, has been determined, completing the primary sequence of the entire virus genome. An open reading frame beginning at position 569 codes for a basic protein of 2048 amino acids with an estimated Mr of 231,608. No nucleotide sequence similarities with the analogous L gene of vesicular stomatitis virus were observed. However, comparison of the deduced amino acid sequences of both proteins revealed a conserved 18-amino-acid sequence that may have functional significance. Two additional overlapping reading frames that precede the L protein sequence could encode proteins with Mr values of 6474 and 14,026, suggesting that the gene is polycistronic.
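Statements such as "an open reading frame beginning at position 569 codes for a protein of 2048 amino acids" come from scanning a sequence for ATG-to-stop stretches. A minimal sketch under simplifying assumptions (forward strand only, standard stop codons, the first ATG in each open stretch taken as the start, positions reported 1-based); the function name is hypothetical and this is not the authors' procedure. Because each frame is scanned independently, overlapping ORFs in different frames, like the upstream reading frames mentioned above, are all reported.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (1-based start, codons before the stop) for ATG...stop ORFs in the
    three forward frames; the first ATG of each open stretch is used as the start."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # includes the ATG, excludes the stop
                if n_codons >= min_codons:
                    orfs.append((start + 1, n_codons))
                start = None
        # An ATG with no downstream in-frame stop is not reported.
    return sorted(orfs)


print(find_orfs("GGATGCCCTGAGG"))
```

The codon count equals the number of encoded amino acids, which is how a protein length such as 2048 residues relates to an ORF.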
13
Kolaskar AS, Reddy BV. Contextual constraints on codon pair usage: structural and biological implications. J Biomol Struct Dyn 1986; 3:725-38. [PMID: 3271046 DOI: 10.1080/07391102.1986.10508458]
Abstract
Complementary DNA sequence data of 278 protein-coding genes from prokaryotic systems have been analysed at the level of near-neighbour codon pairs. Our analysis shows that constraints exist even at the level of near-neighbour codon pairs. These constraints are in addition to those that arise from relative tRNA levels. Codon pairs whose occurrence in the database differs from the expected value neither share a common secondary structure nor are better stabilized by stronger base stacking. Our study indicates strong interactions between the constituent codons of these pairs. We suggest that these strongly interacting codon pairs are involved in forming three-dimensional structural elements of cDNA/mRNA and interact with the ribosome, thus modulating translation.
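A sketch of the observed-versus-expected comparison for adjacent codon pairs, assuming that under independence the expected count of a pair (c1, c2) is the number of adjacent pairs times f(c1) f(c2). It works on a single in-frame sequence with a hypothetical function name; pooling counts across a gene set such as a 278-gene database, and any correction for tRNA levels, are not shown.

```python
from collections import Counter


def codon_pair_ratios(cds: str) -> dict[tuple[str, str], float]:
    """Observed/expected ratio for each adjacent codon pair that occurs in an in-frame CDS.
    Expected count = (number of adjacent pairs) x f(c1) x f(c2)."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    single = Counter(codons)
    pairs = Counter(zip(codons, codons[1:]))
    n_codons = len(codons)
    n_pairs = n_codons - 1
    ratios = {}
    for (c1, c2), observed in pairs.items():
        expected = n_pairs * (single[c1] / n_codons) * (single[c2] / n_codons)
        ratios[(c1, c2)] = observed / expected
    return ratios


print(codon_pair_ratios("AAACCCAAACCC"))
```

Pairs with ratios far from 1 are the candidates for the contextual constraints discussed above; pairs never observed have an implicit ratio of 0 and are omitted from the dictionary.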
Affiliation(s)
- A S Kolaskar
- Centre for Cellular and Molecular Biology, Hyderabad, India
14
Abstract
The use of non-parametric statistics for nucleic acid sequence studies is illustrated by some examples. This method is highly flexible and allows design of specific tests for detecting sequence structure. Tests devoted to local repetitivity, codon nearest neighbors, and dinucleotide avoidance are discussed in detail. An appendix indicates all computations required to use these tests.
15
Kolaskar AS, Reddy BVB. Complimentary DNA sequence data analysis of prokaryotic systems. J Biosci 1985. [DOI: 10.1007/bf02716766]
16
Abstract
A DNA molecule representing all but the three terminal bases of the Sendai virus nucleoprotein (NP) gene, copied from viral mRNA, was inserted into pBR322. The NP insert comprised 1673 bases. The first AUG protein initiation codon, at position 65, began an open reading frame of 1551 bases, encoding a protein of 517 amino acids with an amino acid composition corresponding to previously published data. The NP gene sequence determined in the present work is similar to that described by Shioda et al. [Nucl. Acids Res. 11, 7317 (1983)], but there are 14 amino acid differences that probably reflect differences in virus strains. The predicted secondary structure of the NP molecule and the locations within that structure of potential protease cleavage sites are in accord with structural domains previously defined by controlled protease digestion.
17
Pieber M, Tohá J. Code dependent conservation of the physico-chemical properties in amino acid substitutions. Origins of Life 1983; 13:139-46. [PMID: 6669374 DOI: 10.1007/bf00928891]
Abstract
The frequency of amino acid replacements in families of typical proteins has been elegantly analyzed by Argyle (1980) showing that the most frequent replacements involve a conservation of the amino acid chemical properties. The cyclic arrangement of the twenty amino acids resulting from the most frequent replacements has been described as an amino acid chemical ring. In this work, a novel amino acid replacement frequency ring is proposed, for which a conservation of over 90% of the most general physico-chemical properties can be deduced. The amino acid chemical similarity ring is also analyzed in terms of the genetic code base probability changes, showing that the discrepancy that exists between the standard deviation value of the amino acid replacement frequency matrix and its respective ideal value is almost equal to that deduced from the corresponding base codon replacement probability matrices. These differences are finally evaluated and discussed in terms of the restrictions imposed by the structure of the genetic code and the physico-chemical dissimilarities between some codons of amino acids which are chemically similar.
18
Abstract
We present a model in which we treat the DNA sequence as a Markov process. It has been suggested by several workers that some basic biological or chemical features of nucleic acids underlie the frequencies of dinucleotides (doublets) in these chains. Comparing patterns of doublet frequencies in DNA of different organisms was shown to be a fruitful approach to some phylogenetic questions (Russel & Subak-Sharpe, 1977). Grantham (1978) formulated mRNA sequence indices, some of which involve certain doublet frequencies. He suggested that using these indices may provide indications of the molecular constraints existing during gene evolution. Nussinov (1981) has shown that a set of dinucleotide preference rules holds consistently for eukaryotes, and suggested a strong correlation between these rules and degenerate codon usage. Gruenbaum, Cedar & Razin (1982) found that methylation in eukaryotic DNA occurs exclusively at C-G sites. Important biological information thus seems to be contained in the doublet frequencies. One of the basic questions to be asked (the "correlation question") is to what extent the 64 trinucleotide (triplet) frequencies measured in a sequence are determined by the 16 doublet frequencies in the same sequence. The DNA is described here as a Markov process, with the nucleotides being outcomes of a sequence generator. Answering the correlation question means finding the order of the Markov process. The difficulty is that natural sequences are of finite length, and statistical noise is quite strong. We show that even for a 16,000-nucleotide sequence (like that of the human mitochondrial genome) the finite-length effect cannot be neglected. Using the Markov chain model, the correlation between doublet and triplet frequencies can, however, be determined even for finite sequences, taking proper account of the finite length.
Two natural DNA sequences, the human mitochondrial genome and the SV40 DNA, are analysed as examples of the method.
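If triplet frequencies were fully determined by doublet frequencies, the sequence would behave as a first-order Markov chain, and P(xyz) = P(xy) P(z | y) = P(xy) P(yz) / P(y). The sketch below compares observed triplet frequencies with that prediction, using naive frequency denominators (n, n - 1, n - 2) for a sequence of length n. It does not reproduce the authors' finite-length correction, and the function name is hypothetical.

```python
from collections import Counter


def markov1_triplet_check(seq: str) -> dict[str, tuple[float, float]]:
    """For each observed triplet xyz, return (observed f(xyz), first-order Markov
    prediction f(xy) f(yz) / f(y)), using naive frequency denominators."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    tri = Counter(seq[i:i + 3] for i in range(n - 2))
    result = {}
    for t, count in tri.items():
        observed = count / (n - 2)
        f_xy = di[t[:2]] / (n - 1)
        f_yz = di[t[1:]] / (n - 1)
        f_y = mono[t[1]] / n
        result[t] = (observed, f_xy * f_yz / f_y)
    return result


print(markov1_triplet_check("ACAC"))
```

For the strictly alternating sequence ACAC, which a first-order chain describes exactly, the naive prediction for ACA (4/9) still differs from the observed frequency (1/2) purely because of edge effects, a small instance of the finite-length effect the abstract says cannot be neglected.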
19
Abstract
The relationship between the deficiency of CpG dinucleotides and the coding and non-coding segments of DNA has been examined. Analysis of five human alpha-like globin DNA sequences and five human beta-like globin DNA sequences reveals no apparent difference between protein-coding and non-coding portions of DNA. Rather, CpG deficiency appears to be a property of long contiguous segments of DNA consisting of several genes and their intergenic regions. Thus we propose that CpG deficiency is not involved with translation or transcription but rather is related to chromosomal constraints.
22
Orcutt BC, George DG, Fredrickson JA, Dayhoff MO. Nucleic acid sequence database computer system. Nucleic Acids Res 1982; 10:157-74. [PMID: 6174933 PMCID: PMC326123 DOI: 10.1093/nar/10.1.157]
23
Rothberg PG, Wimmer E. Mononucleotide and dinucleotide frequencies, and codon usage in poliovirion RNA. Nucleic Acids Res 1981; 9:6221-9. [PMID: 6275352 PMCID: PMC327599 DOI: 10.1093/nar/9.23.6221]
Abstract
The polio type 1 (Mahoney) RNA sequence (1) has been analyzed in terms of the distribution of its mononucleotides, dinucleotides and trinucleotides (codons). The distribution of adenosine in the sequence is nonuniform, being lower at the 5' end and higher at the 3' end. The dinucleotide CG is relatively rare and the dinucleotides UG and CA are relatively more common than expected. Codon usage is decidedly nonrandom. Codons containing CG are avoided and those ending in adenosine are favored. The asymmetric use of mononucleotides, dinucleotides and codons in polio RNA is unexplained at the present time although the lowered CG frequency may be the result of a DNA origin for polio RNA.
24
Pratt K, Hattman S. Deoxyribonucleic acid methylation and chromatin organization in Tetrahymena thermophila. Mol Cell Biol 1981; 1:600-8. [PMID: 9279374 PMCID: PMC369708 DOI: 10.1128/mcb.1.7.600-608.1981]
Abstract
Deoxyribonucleic acid (DNA) of the transcriptionally active macronucleus of Tetrahymena thermophila is methylated at the N6 position of adenine to produce methyladenine (MeAde); approximately 1 in every 125 adenine residues (0.8 mol%) is methylated. Transcriptionally inert micronuclear DNA is not methylated (< or = 0.01 mol% MeAde; M. A. Gorovsky, S. Hattman, and G. L. Pleger, J. Cell Biol. 56:697-701, 1973). There is no detectable cytosine methylation in macronuclei in Tetrahymena DNA (< or = 0.01 mol% 5-methylcytosine). MeAde-containing DNA sequences in macronuclei are preferentially digested by both staphylococcal nuclease and pancreatic deoxyribonuclease I. In contrast, there is no preferential release of MeAde during digestion of purified DNA. These results indicate that MeAde residues are predominantly located in "linker DNA" and perhaps have a function in transcription. Pulse-chase studies showed that labeled MeAde remains preferentially in linker DNA during subsequent rounds of DNA replication; i.e., there is little, if any, movement of nucleosomes during chromatin replication. This implies that nucleosomes may be phased with respect to DNA sequence.
Affiliation: K Pratt, Department of Biology, University of Rochester, New York 14627, USA
25
Nussinov R. The universal dinucleotide asymmetry rules in DNA and the amino acid codon choice. J Mol Evol 1981; 17:237-44. [PMID: 7021862 DOI: 10.1007/bf01732761] [Citation(s) in RCA: 39] [Impact Index Per Article: 0.9] [Indexed: 01/23/2023]
Abstract
Natural DNA sequences were recently found to contain distinct nearest-neighbor patterns. Hetero-dinucleotides were shown to appear consistently more (or less) often than their mirror-image counterparts. This paper shows that this asymmetric behavior does not stem from the coding requirements of the DNA. It also describes some codon patterns in prokaryotic and eukaryotic genomes that emerged in the course of this work.
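The abstract contrasts hetero-dinucleotides with their "mirror-image counterparts". We assume this means the reversed doublet on the same strand, so XY is compared with YX (for example CA with AC). Under that reading, the comparison can be tabulated as below. The helper `doublet_asymmetry` is hypothetical and reports raw count differences. Nussinov's own statistics may have been normalized differently.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Tuple


def doublet_asymmetry(seq: str) -> Dict[Tuple[str, str], int]:
    """For each hetero-dinucleotide XY, return count(XY) - count(YX).

    Assumption: "mirror image" = reversed doublet read on the same strand.
    Overlapping doublets are read 5'->3'; a positive value means XY is
    more frequent than YX.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    result = {}
    for x, y in combinations("ACGT", 2):  # AC, AG, AT, CG, CT, GT
        result[(x + y, y + x)] = counts[x + y] - counts[y + x]
    return result


# Toy sequence: CA occurs twice, AC once.
print(doublet_asymmetry("CACA")[("AC", "CA")])  # -1
```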
26
Nussinov R. Eukaryotic dinucleotide preference rules and their implications for degenerate codon usage. J Mol Biol 1981; 149:125-31. [PMID: 6273582 DOI: 10.1016/0022-2836(81)90264-3] [Citation(s) in RCA: 75] [Impact Index Per Article: 1.7] [Indexed: 01/19/2023]
27
Chavancy G, Garel JP. Does quantitative tRNA adaptation to codon content in mRNA optimize the ribosomal translation efficiency? Proposal for a translation system model. Biochimie 1981; 63:187-95. [PMID: 7225463 DOI: 10.1016/s0300-9084(81)80192-7] [Citation(s) in RCA: 41] [Impact Index Per Article: 0.9] [Indexed: 01/24/2023]
Abstract
Neither dynamic nor energetic approaches to the translation process have taken into account that intracellular levels of iso-tRNA species are adapted or adjusted to the codon frequency of the mRNA being decoded (Bombyx mori silk gland, rabbit reticulocyte). A critical study of available experimental data suggests that the average elongation rate of a protein is maximized in the presence of an adapted tRNA population, usually a homologous tRNA. In addition, the amount of synthesized protein parallels that of the corresponding mRNA. Other evidence, including in vitro and in vivo elongation assays with fibroin mRNA, shows that individual elongation rates are not uniform. Pauses occur at certain sites of the mRNA chain, and the relative lifetime of these pauses depends on the tRNA pool used. Finally, translation accuracy also appears to depend on a balanced tRNA population. We propose to explain these different effects with a codon-anticodon recognition model, called the "trial and error system", based on stochastic processing by the ribosome. Accordingly, the various acylated tRNA species surrounding a ribosome randomly encounter the receptor A site. Every trapped tRNA species is tested for proper pairing with the codon to be recognized at the level of a comparator or discriminator function. If the pairing is correct, transpeptidation becomes irreversible. If not, the aminoacyl-tRNA is rejected and another randomly trapped tRNA is processed in turn. Mathematical analysis of this model shows that the mean number of trials needed to translate the whole sequence of an mRNA is minimized when the proportion of the different iso-tRNA species is correlated with the square root of codon frequency. Quantifications of reticulocyte tRNA support such a parabolic relation. Our translation system model sheds some light on the role of tRNA adaptation in optimizing translation efficiency, i.e. maximizing both speed and accuracy. Some consequences of the model are discussed.
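The square-root result can be recovered under some simplifying assumptions. These are ours and not necessarily those of the original analysis:

- Codon type i occurs in the mRNA with frequency f_i, and all n codon types have f_i > 0.
- Exactly one iso-tRNA species reads codon i, and its proportion in the pool is p_i.
- Each trial at the A site draws a tRNA independently.

Under these assumptions, decoding codon i takes on average 1/p_i trials (a geometric waiting time). Minimizing the mean number of trials per codon, T, subject to the p_i summing to 1 gives:

```latex
\begin{align}
T(\mathbf{p}) &= \sum_{i=1}^{n} \frac{f_i}{p_i},
  \qquad \sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} f_i = 1 \\
\mathcal{L}(\mathbf{p}, \lambda) &= \sum_{i=1}^{n} \frac{f_i}{p_i}
  + \lambda \Big( \sum_{i=1}^{n} p_i - 1 \Big) \\
\frac{\partial \mathcal{L}}{\partial p_i} &= -\frac{f_i}{p_i^{2}} + \lambda = 0
  \;\Longrightarrow\; p_i = \sqrt{f_i / \lambda} \\
p_i^{*} &= \frac{\sqrt{f_i}}{\sum_{j=1}^{n} \sqrt{f_j}},
  \qquad
  T(\mathbf{p}^{*}) = \Big( \sum_{j=1}^{n} \sqrt{f_j} \Big)^{2} \le n = T(\mathbf{f})
\end{align}
```

Each second derivative 2f_i/p_i^3 is positive, so T is convex in p and the stationary point is the minimum. The last inequality follows from Cauchy-Schwarz. It shows that, within this toy model, setting tRNA proportions proportional to the square root of codon frequency beats setting them equal to codon frequency itself, for which T equals n.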
28
29
Mahler HR. Mitochondrial evolution: organization and regulation of mitochondrial genes. Ann N Y Acad Sci 1981. [DOI: 10.1111/j.1749-6632.1981.tb54357.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
30
Grantham R, Gautier C, Gouy M, Jacobzone M, Mercier R. Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res 1981; 9:r43-74. [PMID: 7208352 PMCID: PMC326682 DOI: 10.1093/nar/9.1.213-b] [Citation(s) in RCA: 735] [Impact Index Per Article: 16.7] [Indexed: 01/24/2023]
Abstract
The nucleic acid sequence bank now contains 161 mRNAs; 43 new genes have been added. One sequence, that of B. mori fibroin, is dropped because of uncertainty about the starting point for translation. Frequencies of all codons are given for each gene added and for each genome type in the total bank. A new series of correspondence analyses on codon use is presented, substantiating the genome hypothesis. Internal regulation of mRNA expression by different third-base choices between quartet and duet codons is proposed for bacterial genes.
31
Cedergren RJ, Sankoff D, LaRue B, Grosjean H. The evolving tRNA molecule. CRC Critical Reviews in Biochemistry 1981; 11:35-104. [PMID: 7030617 DOI: 10.3109/10409238109108699] [Citation(s) in RCA: 74] [Impact Index Per Article: 1.7] [Indexed: 01/23/2023]
Abstract
The study of tRNA molecular evolution is crucial to understanding the origin and establishment of the genetic code as well as the differentiation and refinement of the machinery of protein synthesis in prokaryotes, eukaryotes, organelles, and phage systems. The small size of the molecule and its critical involvement in a multiplicity of roles distinguish its study from classical protein molecular evolution with respect to goals and methods. Here, the authors assess available and missing data, existing and needed methodology, and the impact of tRNA studies on current theories both of genetic code evolution and of the evolution of species. They analyze mutational "hot spots", the role of base modification, synthetase recognition, codon-anticodon interactions and the status of organelle tRNA.
32
Sood AK, Pereira D, Weissman SM. Isolation and partial nucleotide sequence of a cDNA clone for human histocompatibility antigen HLA-B by use of an oligodeoxynucleotide primer. Proc Natl Acad Sci U S A 1981; 78:616-20. [PMID: 6165999 PMCID: PMC319105 DOI: 10.1073/pnas.78.1.616] [Citation(s) in RCA: 314] [Impact Index Per Article: 7.1] [Indexed: 01/18/2023]
Abstract
We have isolated a cDNA clone for one of the HLA-B locus alloantigens by hybridization with a 30-nucleotide-long DNA probe. The probe was isolated from a reverse transcriptase (RNA-dependent DNA nucleotidyltransferase)-catalyzed cDNA synthesis reaction on poly(A)-mRNA in which an oligonucleotide (5'-32P)dC-T-T-C-T-C-C-A-C-A-TOH served as a primer and in which dideoxynucleoside triphosphates were used to reduce the size and heterogeneity of the cDNA products. The desired cDNA clone was isolated from a library of recombinant cDNA clones in the plasmid pBR322. The partial nucleotide sequence of the cDNA clone corresponds to the amino acid sequence of HLA-B7 antigen. The approach described in this paper is extremely sensitive and may be useful in cloning other genes for which the corresponding mRNA is present at low levels. This cDNA clone is nearly full length and can be used to isolate and to study the genes within the HLA region and to obtain expression of HLA-B peptides in cells.
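A DNA primer anneals to the mRNA stretch that is complementary to it and runs antiparallel. The sketch below shows that bookkeeping. It assumes the token string in the abstract reads as the 11-mer CTTCTCCACAT, with "OH" marking the free 3'-hydroxyl of the final T. That reading of the extracted notation is ours. The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def mrna_target(primer: str) -> str:
    """mRNA stretch (5'->3', RNA letters) to which a DNA primer anneals."""
    return reverse_complement(primer).replace("T", "U")


# Assumption: "dC-T-T-C-T-C-C-A-C-A-TOH" is read as the 11-mer CTTCTCCACAT,
# with "OH" marking the free 3'-hydroxyl of the last T.
print(mrna_target("CTTCTCCACAT"))  # AUGUGGAGAAG
```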
33
34
35
Abstract
Natural DNA sequences contain distinct nearest neighbor patterns. Eukaryotic as well as prokaryotic sequences show a consistent hierarchy in the frequencies of appearance of most doublets.
36
Miyata T, Yasunaga T. Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application. J Mol Evol 1980; 16:23-36. [PMID: 6449605 DOI: 10.1007/bf01732067] [Citation(s) in RCA: 299] [Impact Index Per Article: 6.6] [Indexed: 01/20/2023]
Abstract
A method for estimating the evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences is presented. This method is applied to genes of the phi X174 and G4 genomes, histone genes and beta-globin genes, for which homologous nucleotide sequences are available for comparison. The rates of synonymous substitution are shown to be quite uniform among the non-overlapping genes of phi X174 and G4 and among histone genes H4, H2B, H3 and H2A. A comparison between phi X174 and G4 reveals that, in the overlapping segments of the A gene, the rate of synonymous substitution is reduced more significantly than the rate of amino acid substitution, relative to the corresponding rates in the non-overlapping segment. It is also suggested that rigid secondary structures exist in the coding regions surrounding the splicing points of the intervening sequences of beta-globin genes. Only in these regions do the beta-globin genes show a slowdown in the evolutionary rates of both synonymous and amino acid substitutions in the primate line.
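Miyata and Yasunaga's estimator is not reproduced here. As a minimal illustration of the distinction it builds on, the sketch below compares two aligned coding sequences. It classifies each codon difference as synonymous or amino acid-changing under the standard genetic code, counting only codons that differ at a single position. A real rate estimate also needs several things that are all omitted here:

- the numbers of synonymous and nonsynonymous sites,
- a treatment of codons differing at two or three positions,
- usually a correction for multiple substitutions.

The helper name is hypothetical, and input is assumed to be DNA letters (T, not U).

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def count_single_differences(seq1: str, seq2: str) -> Tuple[int, int]:
    """Return (synonymous, amino acid-changing) single-position codon differences.

    Assumption: both sequences are aligned, in frame and written in DNA letters.
    Codon pairs differing at two or three positions are skipped.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # identical codons or multi-position differences
        if CODE[c1] == CODE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# CTG->CTA (Leu->Leu) and AAA->AAG (Lys->Lys) are synonymous; GAA->GAC is Glu->Asp.
print(count_single_differences("CTGAAAGAA", "CTAAAGGAC"))  # (2, 1)
```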
37
Grantham R, Gautier C, Gouy M. Codon frequencies in 119 individual genes confirm consistent choices of degenerate bases according to genome type. Nucleic Acids Res 1980; 8:1893-912. [PMID: 6159596 PMCID: PMC324046 DOI: 10.1093/nar/8.9.1893] [Citation(s) in RCA: 282] [Impact Index Per Article: 6.3] [Indexed: 01/18/2023]
Abstract
The poor printing of our previous Figure 2 (1) is corrected. Codon usage in recently published mRNA sequences is also given. A new correspondence analysis is performed, based on a simultaneous comparison of the use of the 61 codons across all mRNAs. This analysis reinforces our claim that most genes in a genome, or genome type, share the same coding strategy; that is, they show similar choices among synonymous codons, or among degenerate bases (2). A similar analysis of variation in the frequencies of the encoded amino acids reveals an entirely different pattern.
38
Yang RC, Young A, Wu R. BK virus DNA sequence coding for the t and T antigens and evaluation of methods for determining sequence homology. J Virol 1980; 34:416-30. [PMID: 6246273 PMCID: PMC288720 DOI: 10.1128/jvi.34.2.416-430.1980] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023]
Abstract
The DNA sequence of the early region of the human papovavirus BK (MM strain) was determined. A potential initiation signal for translation is located at nucleotides 3,047 to 3,045 or map position 0.614. Extending counterclockwise from this AUG signal there is only one open reading frame, which can code for a putative t antigen of 100 amino acids in length. If the early mRNA of BKV is spliced, then the regions between nucleotides 3,047 to 2,808 and 2,725 to 884 can code for a T antigen 694 amino acids in length. The sequences of the deduced T antigens in BK virus share 71% amino acid homology with those in simian virus 40, whereas the coding sequences of the two viruses share 70% DNA homology. Comparison of DNA sequences and evaluation of homology measurements between these two viruses are discussed.
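Identity figures such as 71% and 70% depend on how the sequences are aligned and on which columns enter the denominator. The sketch below assumes one simple convention, which is not necessarily the one used by Yang, Young and Wu. It divides the number of identical aligned positions by the number of aligned columns that are not gapped in both sequences. The helper `percent_identity` is hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: columns gapped in both sequences are skipped;
    a gap facing a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("AC-T", "ACGT"))  # 75.0
```

Other conventions would change the figure for the same alignment. One example is dividing by the length of the shorter ungapped sequence. Published identity values should therefore be compared only when their denominators match.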
39
Ninio J. Prediction of pairing schemes in RNA molecules-loop contributions and energy of wobble and non-wobble pairs. Biochimie 1980; 61:1133-50. [PMID: 394764 DOI: 10.1016/s0300-9084(80)80227-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
Previously published models for predicting pairing schemes in RNA molecules, when applied to tRNA, give the clover leaf structure in only half the cases. We made a systematic investigation of the predictability of the clover leaf structure under various assumptions concerning the energetic contributions of single- and double-stranded regions. We tested 21 different models and variants on a set of 100 tRNA sequences, and many other variants on a smaller set of sequences. In our models we allowed not only G.C, A.U and G.U pairing, but also every other pair. Under conditions much less restrictive than those of previous attempts, we can nevertheless reach 90 per cent predictability for the clover leaf structure of tRNA. A most surprising and far-reaching result is that we can assign to G.G and C.C pairs binding energies quite close to the energies of G.U pairs and still predict the clover leaf. The following ranking for non-complementary pairs was obtained: G.U, G.G and C.C, U.U, C.A, A.A and G.A, U.C. The main practical innovations that made the improvements in predictability possible are: i) not counting the stacking of base pairs separated by a bulge loop; ii) making the terminal C.C's in stems more stable than the terminal A.U's by merely -0.7 kcal; iii) replacing the distinction between G.C- and A.U-closed loops by a distinction based on the presence of loop-favoring residues; iv) carefully adjusting the energetic balance between the various kinds of loops; v) narrowing the gap between the GC/GC and the GC/AU contributions; vi) using observations on nearest neighbours in tRNA sequences to refine the contributions of G.U pairs.
40
Grantham R, Gautier C. Genetic distances from mRNA sequences. The Science of Nature (Naturwissenschaften) 1980; 67:93-4. [PMID: 6990278 DOI: 10.1007/bf01054695] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.2] [Indexed: 01/22/2023]
41
Grantham R, Gautier C, Gouy M, Mercier R, Pavé A. Codon catalog usage and the genome hypothesis. Nucleic Acids Res 1980; 8:r49-r62. [PMID: 6986610 PMCID: PMC327256 DOI: 10.1093/nar/8.1.197-c] [Citation(s) in RCA: 478] [Impact Index Per Article: 10.6] [Indexed: 01/22/2023]
Abstract
Frequencies for each of the 61 amino acid codons have been determined in every published mRNA sequence of 50 or more codons. The frequencies are shown for each kind of genome and for each individual gene. A surprising consistency of choices exists among genes of the same or similar genomes. Thus each genome, or kind of genome, appears to possess a "system" for choosing between codons. Frameshift genes, however, have widely different choice strategies from normal genes. Our work indicates that the main factors distinguishing between mRNA sequences relate to choices among degenerate bases. These systematic third base choices can therefore be used to establish a new kind of genetic distance, which reflects differences in coding strategy. The choice patterns we find seem compatible with the idea that the genome and not the individual gene is the unit of selection. Each gene in a genome tends to conform to its species' usage of the codon catalog; this is our genome hypothesis.
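The tabulation described above amounts to counting in-frame codons per gene. The sketch below makes three assumptions of its own:

- The coding sequence is written in DNA letters and starts in frame.
- The 50-codon cut-off is applied to sense codons only.
- Stop codons and codons containing ambiguous bases are ignored.

The helper `codon_frequencies` is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
SENSE_CODONS = ["".join(c) for c in product("TCAG", repeat=3)
                if "".join(c) not in STOP_CODONS]  # the 61 amino acid codons
SENSE_SET = set(SENSE_CODONS)


def codon_frequencies(cds: str, min_codons: int = 50) -> Optional[Dict[str, float]]:
    """Relative frequency of each sense codon in an in-frame coding sequence.

    Assumptions: `cds` starts at the first codon; stop codons, codons with
    ambiguous bases and a trailing partial codon are skipped; genes with
    fewer than `min_codons` sense codons return None.
    """
    cds = cds.upper().replace("U", "T")
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    counts = Counter(c for c in codons if c in SENSE_SET)
    total = sum(counts.values())
    if total < min_codons:
        return None
    return {c: counts[c] / total for c in SENSE_CODONS}


gene = "ATG" + "GCT" * 49 + "TAA"  # toy 51-codon gene, not real data
freqs = codon_frequencies(gene)
print(freqs["GCT"], freqs["ATG"])  # 0.98 0.02
```

Returning all 61 keys, including zeros, keeps every gene's profile the same length. This makes the profiles directly comparable across genes and genomes.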
42
Grosjean H, Chantrenne H. On codon-anticodon interactions. Molecular Biology, Biochemistry, and Biophysics 1980; 32:347-67. [PMID: 7003350 DOI: 10.1007/978-3-642-81503-4_27] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.4] [Indexed: 01/22/2023]
43
Hasegawa M, Yasunaga T, Miyata T. Secondary structure of MS2 phage RNA and bias in code word usage. Nucleic Acids Res 1979; 7:2073-9. [PMID: 537920 PMCID: PMC342367 DOI: 10.1093/nar/7.7.2073] [Citation(s) in RCA: 55] [Impact Index Per Article: 1.2] [Indexed: 12/23/2022]
Abstract
Based on the secondary structure model of MS2 RNA, it is shown that in base-pairing regions of the RNA there is a bias in the use of synonymous codons that favours C and/or G over U and/or A in the third codon positions, whereas in non-pairing regions there is an opposite bias favouring U and/or A over C and/or G. This pattern is interpreted as the result of a selective constraint that stabilises the secondary structure of the single-stranded RNA genome of the MS2 phage.
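Given a secondary-structure model, the comparison in this abstract reduces to tallying G/C at third codon positions separately for paired and unpaired nucleotides. The sketch below assumes the structure is supplied in dot-bracket notation, with brackets for paired positions and dots for unpaired ones. The helper `gc3_by_pairing` and its `frame` parameter are hypothetical.

```python
from typing import Tuple


def gc3_by_pairing(seq: str, structure: str, frame: int = 0) -> Tuple[float, float]:
    """Return (GC3 fraction at paired sites, GC3 fraction at unpaired sites).

    Assumption: `structure` is dot-bracket notation of the same length as
    `seq`, where "(" or ")" marks a paired base and "." an unpaired one.
    `frame` is the 0-based index of the first base of the first codon.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    gc = {True: 0, False: 0}     # G/C third positions, keyed by "paired?"
    total = {True: 0, False: 0}  # all third positions, keyed by "paired?"
    for i in range(frame + 2, len(seq), 3):  # third base of each codon
        paired = structure[i] in "()"
        total[paired] += 1
        if seq[i].upper() in "GC":
            gc[paired] += 1
    nan = float("nan")
    return (gc[True] / total[True] if total[True] else nan,
            gc[False] / total[False] if total[False] else nan)


# Toy example: both paired third positions are G/C, the unpaired one is A.
print(gc3_by_pairing("AUGGCCAAA", "..(..)..."))  # (1.0, 0.0)
```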
44
Mazabraud A, Garel JP. Analysis of tRNA population from Drosophila melanogaster by means of polyacrylamide gel mapping. FEBS Lett 1979; 105:70-6. [PMID: 114421 DOI: 10.1016/0014-5793(79)80889-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
45
Chavancy G, Chevallier A, Fournier A, Garel JP. Adaptation of iso-tRNA concentration to mRNA codon frequency in the eukaryote cell. Biochimie 1979; 61:71-8. [PMID: 435560 DOI: 10.1016/s0300-9084(79)80314-4] [Citation(s) in RCA: 63] [Impact Index Per Article: 1.4] [Indexed: 12/15/2022]