1
|
Karlin S, Brocchieri L, Trent J, Blaisdell BE, Mrázek J. Heterogeneity of genome and proteome content in bacteria, archaea, and eukaryotes. Theor Popul Biol 2002; 61:367-90. [PMID: 12167359 DOI: 10.1006/tpbi.2002.1606] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Indexed: 11/22/2022]
Abstract
Our analysis compares bacteria, archaea, and eukaryota with respect to a wide assortment of genome and proteome properties. These properties include ribosomal protein gene distributions, chaperone protein contrasts, major variation of transcription/translation factors, gene encoding pathways of energy metabolism, and predicted protein expression levels. Significant differences within and between the three domains of life include protein lengths, information processing procedures, many metabolic and lipid biosynthesis pathways, cellular controls, and regulatory proteins. Differences among genomes are influenced by lifestyle, habitat, physiology, energy sources, and other factors.
Affiliation(s)
- Samuel Karlin
- Department of Mathematics, Stanford University, California 94305-2125, USA
|
2
|
The pattern of substitution mutation in different nearest-neighbor environments of the human genome. ACTA ACUST UNITED AC 1992. [DOI: 10.1016/0097-8485(92)80043-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
|
3
|
Blake RD, Hess ST, Nicholson-Tuell J. The influence of nearest neighbors on the rate and pattern of spontaneous point mutations. J Mol Evol 1992; 34:189-200. [PMID: 1588594 DOI: 10.1007/bf00162968] [Citation(s) in RCA: 96] [Impact Index Per Article: 3.0] [Indexed: 12/27/2022]
Abstract
The numbers and local sequence environments of the two types of substitution mutation plus additions and deletions have been obtained directly in this study from differences between a large number of extant primate gene and pseudogene sequences. A total of 3786 mutations were scored in regions where the similarity between pseudogene and corresponding gene sequences is ≥ 85%, comprising approximately 30% of the pseudogene database of 80,584 bp. The pattern of mutations obtained in this fashion is almost identical to that obtained by Li et al. (1984) using a slightly different, more direct approach and with a smaller database. When mutations were scored, the neighbor pairs on the 5' and 3' sides were also noted, leading to a large 16 x 12 matrix of transitions and transversions. Biases of varying magnitude are found in the rates of substitution of the same base pair in different local sequence environments. The overall order for the effect of the 5' neighbor on the rates of substitution mutation of a pyrimidine is A > C >> T > G, and G > A > T > C for the 3' neighbor, where these results represent the average of substitution rates for the complement purine with complement neighbors of bases ordered above. The order for the 3' neighbor is essentially the same for the two transitions and most of the four transversions as well; however, the order for the 5' neighbor is more variable. The overall rate for the C.G→T.A transition is not unusual; however, the presence of a 3' neighboring G.C pair boosts the rate substantially, presumably due to specific cytosine methylation of the CG doublet in primate DNAs. The rate of the T.A→C.G transition is also well above average when the 3' neighbor is an A.T, and to a lesser extent a G.C, pair. The latter bias is typical in that it reflects the association of alternating pyrimidine-purine sequences with increasing mutation rates. 
The substitution of the pyrimidine in a 5'-purine-pyrimidine-purine-3' sequence generally occurs much faster than in a pyrimidine tract and points to the local conformation as a major determining factor of the substitution rate. An apparent inverse relationship is found between starting and product doublet frequencies of base pairs undergoing mutations with specific 3' neighbors, indicating that differences in intrinsic substitution rates of base pairs with specific neighbors are a key factor in producing the familiar biases of nearest-neighbor frequencies.
Affiliation(s)
- R D Blake
- Department of Biochemistry, Microbiology and Molecular Biology, University of Maine, Orono 04469
|
4
|
Abstract
Doublet preference analysis was carried out on coding and noncoding regions of Escherichia coli, Saccharomyces cerevisiae, and human mitochondrial and nuclear DNA. The preference pattern in 1-2 and 2-3 doublets in E. coli and S. cerevisiae correlated with that in noncoding regions. The 3-1 doublet preference in E. coli genes with low optimal codon frequency and in S. cerevisiae genes also showed a correlation with the noncoding doublet preference of the respective organism. A mechanism to explain these correlations in doublet preference is presented: mutational biases, the origin of the noncoding region doublet preference, evolved so as to maintain the 1-2 and 2-3 doublet preference, which is determined by codon usage. These biases then acted on the 3-1 doublet, which was almost free of coding constraints, resulting in a similar preference in this doublet.
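The 1-2, 2-3 and 3-1 doublets referred to in this abstract are the dinucleotides formed by codon positions 1 and 2, positions 2 and 3, and position 3 together with position 1 of the following codon. A minimal Python sketch of how such position-specific counts could be tallied is given here; the function name `codon_position_doublets` and the assumption that the input is an in-frame coding sequence are illustrative choices, not details taken from the paper.

```python
from collections import Counter


def codon_position_doublets(cds):
    """Count doublets by codon position in an in-frame coding sequence.

    Hypothetical helper: assumes `cds` starts at codon position 1.
    Any trailing bases that do not form a full codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # length covered by whole codons
    counts = {"1-2": Counter(), "2-3": Counter(), "3-1": Counter()}
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        counts["1-2"][codon[0:2]] += 1
        counts["2-3"][codon[1:3]] += 1
        # The 3-1 doublet spans the boundary to the next codon, if any.
        if i + 3 < usable:
            counts["3-1"][codon[2] + cds[i + 3]] += 1
    return counts
```

For the toy sequence `ATGGCCTAA` (codons ATG, GCC, TAA), the 3-1 doublets are GG and CT. Preferences would then be obtained by comparing such counts with an expectation, a step the abstract does not specify and that is left out here.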
Affiliation(s)
- R Hanai
- Department of Physics, Faculty of Science, University of Tokyo, Japan
|
5
|
Hanai R, Suyama A, Wada A. Characteristic features of thermal stability map of DNA in Escherichia coli and eukaryotic genes. J Biomol Struct Dyn 1988; 6:51-62. [PMID: 3078238 DOI: 10.1080/07391102.1988.10506482] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/04/2023]
Abstract
The distribution of double-helix thermal stability of Escherichia coli and eukaryotic DNAs was analyzed. The results confirmed the previous propositions based on the study of the stability distribution in phage DNAs: (1) stability fluctuation appears near the boundaries of protein coding regions (PCRs) and non-protein-coding regions (NPCRs); (2) PCRs have less fluctuation than NPCRs. The present analysis also revealed that the local G + C content is lower at the beginning of PCRs of E. coli than the average G + C content of PCRs and that deviations in the amino acid composition and the third-letter usage of PCRs are involved in the low G + C content; the biological meaning of this is discussed in relation to mRNA structure.
Affiliation(s)
- R Hanai
- Department of Physics, Faculty of Science, University of Tokyo, Japan
|
6
|
Hanai R, Wada A. The effects of guanine and cytosine variation on dinucleotide frequency and amino acid composition in the human genome. J Mol Evol 1988; 27:321-5. [PMID: 3146642 DOI: 10.1007/bf02101194] [Citation(s) in RCA: 29] [Impact Index Per Article: 0.8] [Indexed: 01/04/2023]
Abstract
One hundred twelve human DNA sequences were analyzed with respect to dinucleotide frequency and amino acid composition. The variation in guanine and cytosine (G + C) content revealed: (1) at 2-3 and 3-1 doublet positions CG discrimination is attenuated at high G + C, but TA disfavor is enhanced, and (2) several amino acids are subject to G + C change. These findings have been reported in part for collections of sequences from various species. The present study confirms that in a single organism--the human--the G + C effects do exist. Aspects of the argument that connects G + C with protein thermal stability are also discussed.
Affiliation(s)
- R Hanai
- Department of Physics, Faculty of Science, University of Tokyo, Japan
|
7
|
|
8
|
Nur I, Szyf M, Razin A, Glaser G, Rottem S, Razin S. Procaryotic and eucaryotic traits of DNA methylation in spiroplasmas (mycoplasmas). J Bacteriol 1985; 164:19-24. [PMID: 4044519 PMCID: PMC214205 DOI: 10.1128/jb.164.1.19-24.1985] [Citation(s) in RCA: 90] [Impact Index Per Article: 2.3] [Indexed: 01/08/2023] Open
Abstract
Differences in the type of base methylated (cytosine or adenine) and in the extent of methylation were detected by high-pressure liquid chromatography in the DNAs of five spiroplasmas. Nearest neighbor analysis and digestion by restriction enzyme isoschizomers also revealed differences in methylation sequence specificity. Whereas in Spiroplasma floricola and Spiroplasma sp. strain PPS-1 5-methylcytosine was found on the 5' side of each of the four major bases, the cytosine in Spiroplasma apis DNA was methylated only when its 3' neighboring base was adenine or thymine. In Spiroplasma sp. strain MQ-1 over 95% of the methylated cytosine was in C-G sequences. Essentially all of the C-G sequences in the MQ-1 DNA were methylated. Partially purified extracts of S. apis and Spiroplasma sp. strain MQ-1 were used to study substrate and sequence specificity of the methylase activity. Methylation by the MQ-1 enzyme was exclusively at C-G sequences, resembling in this respect eucaryotic DNA methylases. However, the MQ-1 methylase differed from eucaryotic methylases by showing high activity on nonmethylated DNA duplexes, low activity with hemimethylated DNA duplexes, and no activity on single-stranded DNA.
|
9
|
Hinds PW, Blake RD. Degrees of divergence in the E. coli genome from correlations between dinucleotide, trinucleotide and codon frequencies. J Biomol Struct Dyn 1984; 2:101-18. [PMID: 6401128 DOI: 10.1080/07391102.1984.10507550] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.2] [Indexed: 01/20/2023]
Abstract
Oligonucleotide and codon frequencies have been determined in published sequences of E. coli DNA totaling 103,100 bp with 18,459 reading frame trinucleotides, corresponding to 2.5% of the total genome. Dinucleotide frequencies are in excellent agreement with those determined by nearest neighbor chemical analysis, indicating the computer count of a limited sampling to be a good representation of the overall frequencies in total genomic DNA. The distinctive nonrandom codon pattern is found to be uniformly distributed and contributes to a distinctive nonrandom oligonucleotide pattern, enabling correlations between frequency levels to be extended beyond reading frame sequences. Correlation analysis indicates a surprisingly high degree of correlation everywhere in the genome. Coefficients of correlation between oligonucleotide frequencies overall and those in specific segments vary as follows: primary strands of individual coding sequences > 0.9 > lambda DNA > noncoding, non-RNA > phi X174 DNA > complementary strands > RNA genes ≅ 0.6 > transposon-insertion elements > T7 DNA >> eukaryotic sequences ≅ 0. It is concluded that this high degree of oligonucleotide and codon correspondence in E. coli reflects the widespread distribution of remnants of an early and slowly changing codon pattern that has been continually dispersed by duplication-divergence processes, leading to the present genome.
Affiliation(s)
- P W Hinds
- Department of Biochemistry, University of Maine, Orono 04469
|
10
|
Lipman DJ, Smith TF, Beckman RJ, Waterman MS. Hierarchical analysis of influenza A hemagglutinin gene sequences. Nucleic Acids Res 1982; 10:5375-89. [PMID: 7145705 PMCID: PMC320879 DOI: 10.1093/nar/10.17.5375] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 01/23/2023] Open
Abstract
Five recently sequenced hemagglutinin genes from Influenza A virus strains are studied for similarities in a hierarchical fashion. The sequences are compared for similarity, first on the level of sequence homology, and then on several progressively more general levels. Though the HA1 subsequences contain regions where homology drops to that of a Monte Carlo generated reference value, subsequent tests reveal great similarity due to constraints on the level of amino acid sequence. Other tests detect statistically significant differences between subtypes due to constraints acting below the level of amino acid sequence, such as the secondary structure of the viral RNA, or involving translation of the mRNA. The general applicability of the hierarchical approach to sequence analysis is discussed.
|
11
|
Durup J. On the relations between error rates in DNA replication and elementary chemical rate constants. J Theor Biol 1982; 94:607-32. [PMID: 7078220 DOI: 10.1016/0022-5193(82)90303-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.1] [Indexed: 01/23/2023]
|
12
|
Ninio J. Prediction of pairing schemes in RNA molecules-loop contributions and energy of wobble and non-wobble pairs. Biochimie 1980; 61:1133-50. [PMID: 394764 DOI: 10.1016/s0300-9084(80)80227-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
Previously published models for predicting pairing schemes in RNA molecules, when applied to tRNA, give the clover leaf structure in only half the cases. We made a systematic investigation of the predictability of the clover leaf structure under various assumptions concerning the energetic contributions of single and double-stranded regions. We tested 21 different models and variants on a set of 100 tRNA sequences and many other variants on a smaller set of sequences. In our models we allowed not only G.C, A.U and G.U pairing, but also every other pair. Under conditions which are much less restrictive than those of previous attempts, we can nevertheless reach 90 per cent predictability for the clover leaf structure of tRNA. A most surprising and far-reaching result is that we can assign to C.G and C.C pairs binding energies quite close to the energies of G.U pairs, and still predict the clover leaf. The following ranking for non-complementary pairs was obtained: G.U, G.G and C.C, U.U, C.A, A.A and G.A, U.C. The main practical innovations that made possible the improvements in predictability are: i) not counting the stacking of base pairs separated by a bulge loop; ii) making the terminal C.C's in stems more stable than the terminal A.U's by merely -0.7 kcal; iii) replacing the distinction between G.C and A.U-closed loops by a distinction based on the presence of loop-favoring residues; iv) carefully adjusting the energetic balance between the various kinds of loops; v) narrowing the gap between the GC/GC and the GC/AU contributions; vi) using observations on nearest-neighbours in tRNA sequences to refine the contributions of G.U pairs.
|
13
|
Marck C, Guschlbauer W. A simple method for the computation of first neighbour frequencies of DNAs from CD spectra. Nucleic Acids Res 1978; 5:2013-31. [PMID: 673843 PMCID: PMC342141 DOI: 10.1093/nar/5.6.2013] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.2] [Indexed: 12/23/2022] Open
Abstract
A procedure for the computation of the first neighbour frequencies of DNAs is presented. This procedure is based on the first neighbour approximation of Gray and Tinoco. We show that knowledge of all ten elementary CD signals attached to the ten double stranded first neighbour configurations is not necessary. One can obtain the ten frequencies of an unknown DNA with the use of eight elementary CD signals corresponding to eight linearly independent polymer sequences. These signals can be extracted very simply from any eight or more CD spectra of double stranded DNAs of known frequencies. The ten frequencies of a DNA are obtained by a least-squares fit of its CD spectrum with these elementary signals. One advantage of this procedure is that it does not necessitate linear programming; it can be used with CD data digitized at a large number of wavelengths, thus permitting an accurate resolution of the CD spectra. In favorable cases, the ten frequencies of a DNA (not used as input data) can be determined with an average absolute error < 2%. We have also observed that certain satellite DNAs, those of Drosophila virilis and Callinectes sapidus, have CD spectra compatible with those of DNAs of quasi-random sequence; these satellite DNAs should also adopt the B-form in solution.
|
14
|
Russell GJ, Subak-Sharpe JH. Similarity of the general designs of protochordates and invertebrates. Nature 1977; 266:533-6. [PMID: 558523 DOI: 10.1038/266533a0] [Citation(s) in RCA: 40] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022]
|
15
|
Russell GJ, Walker PM, Elton RA, Subak-Sharpe JH. Doublet frequency analysis of fractionated vertebrate nuclear DNA. J Mol Biol 1976; 108:1-23. [PMID: 1003479 DOI: 10.1016/s0022-2836(76)80090-3] [Citation(s) in RCA: 140] [Impact Index Per Article: 2.9] [Indexed: 12/25/2022]
|
16
|
Abstract
On the basis of the results of an analysis of frequencies of pyrimidine oligonucleotides, the degree of pyrimidine clustering of DNA in species from different taxa has been determined. A tendency for an increase in the index of clustering of DNA was revealed in the sequence: invertebrates, fishes, amphibians, reptiles, birds, mammals. A mechanism is postulated, according to which the increase in the degree of clustering of DNA during evolution may be associated with the accumulation of mutations, purine ⇌ pyrimidine transversions, resulting in a selective enrichment of one of the chains of DNA with pyrimidines and the other with purines, i.e. in an increase in the degree of purine-pyrimidine imbalance (asymmetry) of DNA complementary chains. This mechanism of DNA evolution is supported by the presence of a positive correlation between the degree of clustering and the degree of the chain asymmetry of natural DNAs, as well as the character of the amino acid substitutions in cytochromes c in different species. The progressive evolution of different groups of organisms on the whole may have been accompanied by an acceleration of the rates of evolution of the DNA structure. On the basis of the amino acid sequence of cytochromes c in different species, the degree of clustering and the degree of the chain asymmetry of the corresponding structural genes of DNA were found to have a general tendency towards an increase in the following order: invertebrates, fishes, amphibians, reptiles, birds, mammals. Thus, evolution of the cytochrome c cistron is a vector process based on a selection of mutations which, on the one hand, are neutral to protein, and, on the other hand, result in the sense chain of DNA being enriched with pyrimidines and the nonsense one (and the corresponding mRNA) with purines. Hence, it is the polynucleotide template rather than protein that must have been the "object of selection". 
The frequency of substitutions in the cytochrome c cistron for vertebrates is 1.56 × 10^-9 per nucleotide per year. It is believed that the evolutionary modification of the DNA structure may be associated with an increase in the interference resistance of translation, i.e. with selection for codons of highest readout stability.
|
17
|
Elton RA, Russell GJ, Subak-Sharpe JH. Doublet frequencies and codon weighting in the DNA of Escherichia coli and its phages. J Mol Evol 1976; 8:117-35. [PMID: 787545 DOI: 10.1007/bf01739098] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
Abstract
A compilation of nucleic acid sequences from E. coli and its phages has been analysed for the frequency of occurrence of nearest neighbour base doublets and codons. Several statistically significant deviations from random are found in both doublet and codon frequencies. The deviations in E. coli also appear to occur in lambda and in the coat protein gene of MS2, whereas T4 and other parts of the MS2 genome show different sequence properties. These and other findings are discussed in relation to the hypothesis that rapidity of translation of mRNAs in the E. coli system is dependent on doublet frequency and codon usage patterns.
|
18
|
Halliburton IW, Hill EA, Russell GJ. Identification of strains of herpes simplex virus by comparison of the density of their DNA using the preparative ultracentrifuge. Arch Virol 1975; 48:157-68. [PMID: 167692 DOI: 10.1007/bf01318148] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 12/13/2022]
Abstract
The buoyant densities of the DNA of herpes simplex virus type 1, type 2 and Pseudorabies virus, as determined in the analytical ultracentrifuge, are 1.725, 1.727 and 1.731 correlating with G+C contents of 67, 69 and 73 per cent respectively. The density differences for the DNA's of type 1 and type 2 herpes simplex viruses have been confirmed in experiments with isotopically labelled DNA from four type 1 and six type 2 strains by preparative CsCl gradient ultracentrifugation. The DNA of all the type 2 strains was denser than that of any of the type 1 strains examined. Despite these differences in DNA base composition of type 1 and type 2 strains, nearest neighbour analysis of their DNA's disclosed no obvious differences in doublet pattern or general design.
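The correspondence between buoyant density and G+C content quoted in this abstract can be illustrated with a short Python sketch. It assumes the widely used empirical linear relation for DNA in CsCl, density = 1.660 + 0.098 × (G+C fraction) in g/cm³; the abstract does not state which relation the authors applied, so both the constants and the function name `gc_fraction_from_density` are assumptions made for illustration.

```python
# Assumed empirical relation for double-stranded DNA in CsCl:
#   density (g/cm^3) = 1.660 + 0.098 * (G+C fraction)
# These constants are an assumption, not taken from the abstract.
INTERCEPT = 1.660
SLOPE = 0.098


def gc_fraction_from_density(density):
    """Estimate the G+C fraction (0..1) from a CsCl buoyant density."""
    return (density - INTERCEPT) / SLOPE


if __name__ == "__main__":
    for name, rho in [("HSV type 1", 1.725),
                      ("HSV type 2", 1.727),
                      ("Pseudorabies virus", 1.731)]:
        print(f"{name}: {100 * gc_fraction_from_density(rho):.1f}% G+C")
```

Under this assumed relation the three densities give roughly 66, 68 and 72 per cent G+C, within about one percentage point of the 67, 69 and 73 per cent reported, which suggests the authors used a similar but not necessarily identical calibration.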
|
19
|
Abstract
A doublet frequency count (set of frequencies of the 16 possible two-base sequences) can be calculated from the experimentally determined overall sequence of a nucleic acid. In this paper, a statistical methodology is developed for comparing such counts with random, with others of the same type or with doublet proportions found in whole DNAs. The methods are applied to two major categories of sequenced RNAs. It is found that vertebrate ribosomal and transfer RNAs show significant differences from the overall vertebrate DNA pattern, especially in the frequency of the doublet CG. Bacterial rRNA and tRNA, on the other hand, show less dissimilarity from total DNA. In the RNA of the small bacteriophage MS2, the doublet frequencies of the translated regions of the genome resemble those in the host E. coli, whereas those in the intercistronic regions differ substantially. All these findings are discussed in relation to the origin, evolution and selection of the nucleic acids concerned.
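A doublet frequency count as defined in this abstract can be sketched in a few lines of Python. The example counts overlapping two-base sequences and, as one simple way of comparing with random, divides each observed doublet frequency by the product of the frequencies of its two bases. This ratio is a common illustrative measure and is not claimed to be the statistical methodology developed in the paper; the function names and the DNA alphabet `ACGT` (rather than `ACGU` for RNA) are assumptions.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"  # assumed alphabet; use "ACGU" for RNA sequences
DOUBLETS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 doublets


def doublet_frequencies(seq):
    """Frequencies of the 16 overlapping doublets in a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Doublets containing symbols outside BASES are ignored.
    total = sum(counts[d] for d in DOUBLETS)
    return {d: (counts[d] / total if total else 0.0) for d in DOUBLETS}


def observed_over_expected(seq):
    """Observed doublet frequency divided by the random expectation.

    The expectation for doublet XY is f(X) * f(Y), where f is the
    single-base frequency. Doublets with zero expectation are omitted.
    """
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in BASES)
    n = sum(base_counts.values())
    base_freq = {b: base_counts[b] / n for b in BASES} if n else {}
    freqs = doublet_frequencies(seq)
    return {
        d: freqs[d] / (base_freq[d[0]] * base_freq[d[1]])
        for d in DOUBLETS
        if base_freq.get(d[0]) and base_freq.get(d[1])
    }
```

A ratio near 1 indicates a doublet occurring about as often as expected from base composition alone; a ratio well below 1, as the abstract reports for CG in vertebrate DNA, indicates depletion.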
|
20
|
|