51
Jost D, Everaers R. Genome wide application of DNA melting analysis. J Phys Condens Matter 2009; 21:034108. [PMID: 21817253 DOI: 10.1088/0953-8984/21/3/034108] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
Correspondences between functional and thermodynamic melting properties in a genome are being increasingly employed for ab initio gene finding and for the interpretation of the evolution of genomes. Here we present the first systematic genome wide comparison between biologically coding domains and thermodynamically stable regions. In particular, we develop statistical methods to estimate the reliability of the resulting predictions. Not surprisingly, we find that the success of the approach depends on the difference in GC content between the coding and the non-coding parts of the genome and on the percentage of coding base-pairs in the sequence. These prerequisites vary strongly between species, where we observe no systematic differences between eukaryotes and prokaryotes. We find a number of organisms in which the strong correlation of coding domains and thermodynamically stable regions allows us to identify putative exons or genes to complement existing approaches. In contrast to previous investigations along these lines we have not employed the Poland-Scheraga (PS) model of DNA melting but use the earlier Zimm-Bragg (ZB) model. The Ising-like form of the ZB model can be viewed as an approximation to the PS model, with averaged loop entropies included into the cooperative factor σ. This results in a speed-up by a factor of 20-100 compared to the Fixman-Freire algorithm for the solution of the PS model. We show that for genomic sequences the resulting systematic errors are negligible compared to the parameterization uncertainty of the models. We argue that for limited computing resources, available CPU power is better invested in broadening the statistical base for genomic investigations than in marginal improvements of the description of the physical melting behavior.
Affiliation(s)
- Daniel Jost
- Laboratoire de Physique de l'École Normale Supérieure de Lyon, Université de Lyon, CNRS UMR 5672, 46 Allée d'Italie 69364 Lyon Cedex 07, France
52
Wang Y, Leung FCC. GC content increased at CpG flanking positions of fish genes compared with sea squirt orthologs as a mechanism for reducing impact of DNA methylation. PLoS One 2008; 3:e3612. [PMID: 19005573 PMCID: PMC2580031 DOI: 10.1371/journal.pone.0003612] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/11/2008] [Accepted: 10/13/2008] [Indexed: 01/18/2023] Open
Abstract
Background: Fractional DNA methylation in sea squirts evolved to global DNA methylation in fish. The impact of global DNA methylation is reflected by more CpG depletions and/or more A/T to G/C changes at CpG flanking positions due to context-dependent mutations of methylated CpG sites. Methods and Findings: In this report, we demonstrate that the sea squirt genes have undergone more CpG to TpG/CpA substitutions than the fish orthologs using homologous fragments from orthologous genes among Ciona intestinalis, Ciona savignyi, fugufish and zebrafish. To avoid premature transcription, the TGA sites derived from CGA were largely converted to TGG in sea squirt genes. By contrast, a significant increment of GC content at CpG flanking positions was shown in fish genes. The positively selected A/T to G/C substitutions, in combination with the CpG to TpG/CpA substitutions, are the sources of the extremely low CpG observed/expected ratios in vertebrates. The nonsynonymous substitutions caused by the GC content increase have resulted in frequent amino acid replacements in the directions that were not noticed previously. Conclusion: The increased GC content at CpG flanking positions can reduce CpG loss in fish genes and attenuate the impact of DNA methylation on CpG-containing codons, probably accounting for evolution towards vertebrates.
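The CpG observed/expected ratio mentioned in this abstract can be illustrated with a short sketch. It assumes the commonly used definition (CpG count × sequence length) / (C count × G count); the abstract does not state which normalization the authors applied, and the function name `cpg_obs_exp` is hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio under one common definition (an assumption):
    (number of CpG dinucleotides * sequence length) / (number of C * number of G)."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        # No CpG is possible without both C and G; report 0.0 rather than divide by zero.
        return 0.0
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
    return n_cpg * len(seq) / (n_c * n_g)
```

A ratio well below 1 indicates CpG depletion relative to what the base composition alone would predict.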
Affiliation(s)
- Yong Wang
- Department of Zoology and Genome Research Centre, The University of Hong Kong, Pokfulam, Hong Kong
- Frederick C. C. Leung
- Department of Zoology and Genome Research Centre, The University of Hong Kong, Pokfulam, Hong Kong
53
Gatherer D. Evolution of the G+C Content Frontier in the Rat Cytomegalovirus Genome. Virology (Auckl) 2008. [DOI: 10.4137/vrt.s1023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
Within the 230138 bp of the rat cytomegalovirus (RCMV) genome, the G+C content changes abruptly at position 142644, constituting a G+C content frontier. To the left of this point, overall G+C content is 69.2%, and to the right it is only 47.6%. A region of extremely low G+C content (33.8%) is found in the 5 kb immediately to the right of the frontier, in which there are no predicted coding sequences. To the right of position 147501, the G+C content rises and predicted coding sequences reappear. However, these genes are much shorter (average 848 bp, 50% G+C) than those in the left two-thirds of the genome (average 1462 bp, 70% G+C). Whole genome alignment of several viruses indicates that the initial ultra-low G+C region appeared in the common ancestor of the genera Cytomegalovirus and Muromegalovirus, and that the lowering of G+C in the right third has been a subsequent process in the lineage leading to RCMV. The left two-thirds of RCMV has stop codon occurrences at 67.5% of their expected level, based on a modified Markov chain model of stop codon distribution, and the corresponding figure for the right third is 78%. Therefore, despite heavy mutation pressure, selective constraint has operated in the right third of the RCMV genome to maintain a degree of gene length unusual for such low G+C sequences.
Affiliation(s)
- Derek Gatherer
- MRC Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow, G11 5JR, U.K
54
Atambayeva SA, Khailenko VA, Ivashchenko AT. Intron and exon length variation in Arabidopsis, rice, nematode, and human. Mol Biol 2008. [DOI: 10.1134/s0026893308020180] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
55
Li MK, Gu L, Chen SS, Dai JQ, Tao SH. Evolution of the isochore structure in the scale of chromosome: insight from the mutation bias and fixation bias. J Evol Biol 2007; 21:173-182. [DOI: 10.1111/j.1420-9101.2007.01455.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
56
Kan XZ, Wang SS, Ding X, Wang XQ. Structural evolution of nrDNA ITS in Pinaceae and its phylogenetic implications. Mol Phylogenet Evol 2007; 44:765-77. [PMID: 17596969 DOI: 10.1016/j.ympev.2007.05.004] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 09/11/2006] [Revised: 04/24/2007] [Accepted: 05/07/2007] [Indexed: 11/29/2022]
Abstract
Nuclear ribosomal DNA (nrDNA) has been considered an important tool for inferring phylogenetic relationships at many taxonomic levels. In comparison with its fast concerted evolution in angiosperms, nrDNA is characterized by slow concerted evolution and substantial ITS region length variation in gymnosperms, particularly in Pinaceae. Here we studied structural characteristics, including subrepeat composition, size, GC content and secondary structure, of nrDNA ITS regions of all Pinaceae genera. The results showed that the ITS regions of all taxa studied contained subrepeat units, ranging from 2 to 9 in number, and these units could be divided into two types, longer subrepeat (LSR) without the motif (5'-GGCCACCCTAGTC) and shorter subrepeat (SSR) with the motif. Phylogenetic analyses indicate that the homology of some SSRs can still be recognized, providing important information for the evolutionary history of nrDNA ITS and phylogeny of Pinaceae. In particular, the adjacent tandem SSRs are not more closely related to one another than they are to remote SSRs in some genera, which may imply that multiple structure variations such as recombination have occurred in the ITS1 region of these groups. This study also found that GC content in the ITS1 region is relevant to its sequence length and subrepeat number, and could provide some phylogenetic information, especially supporting the close relationships among Picea, Pinus, and Cathaya. Moreover, several characteristics of the secondary structure of Pinaceae ITS1 were found as follows: (1) the structure is dominated by several extended hairpins; (2) the configuration complexity is positively correlated with subrepeat number; (3) paired subrepeats often partially overlap at the conserved motif (5'-GGCCACCCTAGTC), and form a long stem, while other subrepeats fold onto themselves, leaving part of the conserved motif exposed in hairpin loops.
Affiliation(s)
- Xian-Zhao Kan
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, The Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, China
57
Lin FH, Forsdyke DR. Prokaryotes that grow optimally in acid have purine-poor codons in long open reading frames. Extremophiles 2006; 11:9-18. [PMID: 16957882 DOI: 10.1007/s00792-006-0005-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 02/20/2006] [Accepted: 03/29/2006] [Indexed: 10/24/2022]
Abstract
In nucleic acids the N-glycosyl bonds between purines and their ribose sugar moieties are broken under acid conditions. If one strand of a duplex DNA segment were more vulnerable to mutation than the other, then the archaeon Picrophilus torridus, with an optimum growth pH near zero, could have adapted by decreasing the purine content of that strand. Yet, P. torridus has an optimum growth temperature near 60 °C, and thermophiles prefer purine-rich codons. We found that, as in other thermophiles, high growth temperature correlates with the use of purine-rich codons. The extra purines are often in third, non-amino acid determining, codon positions. However, as in other acidophiles, as open reading frame lengths increase, there is increased use of purine-poor codons, particularly those without purines in second, amino acid-determining, codon positions. Thus, P. torridus can be seen as adapting (a) to temperature by increasing its purines in all open reading frames without greatly impacting protein amino acid compositions, and (b) to pH by decreasing purines in longer open reading frames, thereby potentially impacting protein amino acid compositions. It is proposed that longer open reading frames, being larger mutational targets, have become less vulnerable to depurination by virtue of pyrimidine for purine substitutions.
Affiliation(s)
- Feng-Hsu Lin
- Department of Biochemistry, Queen's University, K7L3N6, Kingston, ON, Canada
58
Synonymous codon usage and its potential link with optimal growth temperature in prokaryotes. Gene 2006; 385:128-36. [PMID: 16989961 DOI: 10.1016/j.gene.2006.05.033] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 01/10/2006] [Accepted: 05/29/2006] [Indexed: 12/01/2022]
Abstract
The relationship between codon usage in prokaryotes and their ability to grow at extreme temperatures has been given much attention over the past years. Previous studies have suggested that the difference in synonymous codon usage between (hyper)thermophiles and mesophiles is a consequence of a selective pressure linked to growth temperature. Here, we performed an updated analysis of the variation in synonymous codon usage with growth temperature; our study includes a large number of species from a wide taxonomic and growth temperature range. The presence of psychrophilic species in our study allowed us to test whether the same selective pressure acts on synonymous codon usage at very low growth temperature. Our results show that the synonymous codon usage for Arg (through the AGG, AGA and CGT codons) is the most discriminating factor between (hyper)thermophilic and non-thermophilic species, thus confirming previous studies. We report the unusual clustering of an Archaeal psychrophile with the thermophilic and hyperthermophilic species on the synonymous codon usage factorial map; the other psychrophiles in our study cluster with the mesophilic species. Our conclusion is that the difference in synonymous codon usage between (hyper)thermophilic and non-thermophilic species cannot be clearly attributed to a selective pressure linked to growth at high temperatures.
59
Abstract
Though generally small and gene rich, bacterial genomes are constantly subjected to both mutational and population-level processes that operate to increase amounts of functionless DNA. As a result, the coding potential of bacterial genomes can be substantially lower than originally predicted. Whereas only a single pseudogene was included in the original annotation of the bacterium Escherichia coli, we estimate that this genome harbors hundreds of inactivated and otherwise functionless genes. Such regions will never yield a detectable phenotype, but their identification is vital to efforts to elucidate the biological role of all the proteins within the cell.
Affiliation(s)
- Howard Ochman
- Department of Biochemistry and Molecular Biophysics, University of Arizona, Tucson, AZ 85721, USA.
60
Zavala A, Naya H, Romero H, Sabbia V, Piovani R, Musto H. Genomic GC content prediction in prokaryotes from a sample of genes. Gene 2005; 357:137-43. [PMID: 16125339 DOI: 10.1016/j.gene.2005.06.030] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 04/08/2005] [Revised: 05/05/2005] [Accepted: 06/16/2005] [Indexed: 11/29/2022]
Abstract
GC level is a key feature of prokaryotic genomes. Although it is widely employed in evolutionary studies, new insights appear limited because of the relatively low number of characterized genomes. Since public databases mainly comprise several hundred prokaryotes with a low number of sequences per genome, a reliable prediction method based on available sequences may be useful for studies that need a trustworthy estimation of whole genomic GC. As the analysis of completely sequenced genomes shows a great variability in distributional shapes, it is of interest to compare different estimators. Our analysis shows that the mean of GC values of a random sample of genes is a reasonable estimator, based on simplicity of the calculation and overall performance. However, sequences usually come from a process that cannot be considered random sampling. When we analyzed two introduced sources of bias (gene length and protein functional categories) we were able to detect an additional bias in the estimation for some cases, although the precision was not affected. We conclude that the mean genic GC level of a sample of 10 genes is a reliable estimator of genomic GC content, showing comparable accuracy with many widely employed experimental methods.
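The estimator described in this abstract, the mean GC level of a random sample of genes, can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function names, the handling of non-ACGT characters, and the `seed` parameter are assumptions introduced here.

```python
import random
from statistics import fmean

def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0

def estimate_genomic_gc(genes, sample_size=10, seed=None):
    """Mean genic GC of a random sample of genes (hypothetical helper).

    sample_size defaults to 10, the sample size quoted in the abstract.
    If fewer genes are available, all of them are used.
    """
    rng = random.Random(seed)
    genes = list(genes)
    sample = rng.sample(genes, min(sample_size, len(genes)))
    return fmean([gc_fraction(g) for g in sample])
```

Sampling uniformly at random is the idealized case; as the abstract notes, database sequences are rarely a random sample, so gene length and functional category can bias the estimate.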
Affiliation(s)
- Alejandro Zavala
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Iguá 4225, Montevideo 11400, Uruguay
61
Abstract
We present a coding measure that is based on the statistical properties of the stop codons and is able to estimate accurately the variation of coding content along an anonymous sequence. As the stop codons play the same role in all the genomes (with very few exceptions), the measure turns out to be species-independent. We show results both for prokaryotic and for eukaryotic genomes, indicating, first, the accuracy of the measure, and, second, that better prediction is achieved if the measure is applied on homogeneous, isochore-like sequences than if it is applied following the standard moving window approach. Finally, we discuss some of the possible applications of the measure.
Affiliation(s)
- P Carpena
- Departamento de Física Aplicada II, E.T.S.I. de Telecomunicación, Universidad de Málaga, Malaga, Spain.
62
63
Gregory TR. A bird's-eye view of the C-value enigma: genome size, cell size, and metabolic rate in the class Aves. Evolution 2002; 56:121-30. [PMID: 11913657 DOI: 10.1111/j.0014-3820.2002.tb00854.x] [Citation(s) in RCA: 180] [Impact Index Per Article: 8.2] [Indexed: 11/27/2022]
Abstract
For half a century, variation in genome size (C-value) has been an unresolved puzzle in evolutionary biology. While the initial "C-value paradox" was solved with the discovery of noncoding DNA, a much more complex "C-value enigma" remains. The present study focuses on one aspect of this puzzle, namely the small genome sizes of birds. Significant negative correlations are reported between resting metabolic rate and both C-value and erythrocyte size. Cell size is positively correlated with both nucleus size and C-value in birds, as in other vertebrates. These findings shed light on the constraints acting on genome size in birds and illustrate the importance of interactions among various levels of the biological hierarchy, ranging from the subchromosomal to the ecological. Following from a discussion of the mechanistic bases of the correlations reported and the processes by which birds achieved and/or maintain small genomes, a pluralistic approach to the C-value enigma is recommended.
Affiliation(s)
- T Ryan Gregory
- Department of Zoology, University of Guelph, Ontario, Canada.
64
Lobry JR, Sueoka N. Asymmetric directional mutation pressures in bacteria. Genome Biol 2002; 3:RESEARCH0058. [PMID: 12372146 PMCID: PMC134625 DOI: 10.1186/gb-2002-3-10-research0058] [Citation(s) in RCA: 127] [Impact Index Per Article: 5.8] [Received: 11/13/2001] [Revised: 06/18/2002] [Accepted: 08/15/2002] [Indexed: 11/20/2022] Open
Abstract
Background: When there are no strand-specific biases in mutation and selection rates (that is, in the substitution rates) between the two strands of DNA, the average nucleotide composition is theoretically expected to be A = T and G = C within each strand. Deviations from these equalities are therefore evidence for an asymmetry in selection and/or mutation between the two strands. By focusing on weakly selected regions that could be oriented with respect to replication in 43 out of 51 completely sequenced bacterial chromosomes, we have been able to detect asymmetric directional mutation pressures. Results: Most of the 43 chromosomes were found to be relatively enriched in G over C and T over A, and slightly depleted in G+C, in their weakly selected positions (intergenic regions and third codon positions) in the leading strand compared with the lagging strand. Deviations from A = T and G = C were highly correlated between third codon positions and intergenic regions, with a lower degree of deviation in intergenic regions, and were not correlated with overall genomic G+C content. Conclusions: During the course of bacterial chromosome evolution, the effects of asymmetric directional mutation pressures are commonly observed in weakly selected positions. The degree of deviation from equality is highly variable among species, and within species is higher in third codon positions than in intergenic regions. The orientation of these effects is almost universal and is compatible in most cases with the hypothesis of an excess of cytosine deamination in the single-stranded state during DNA replication. However, the variation in G+C content between species is influenced by factors other than asymmetric mutation pressure.
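Deviations from A = T and G = C within a single strand are often summarized as normalized skews. The sketch below uses (G − C)/(G + C) and (T − A)/(T + A); this normalization and sign convention, and the function name `strand_skews`, are assumptions for illustration and may differ from the statistics used in the paper.

```python
def strand_skews(seq: str) -> tuple[float, float]:
    """Return (GC skew, TA skew) for one strand.

    GC skew = (G - C) / (G + C); TA skew = (T - A) / (T + A).
    Both are 0 when the within-strand equalities G = C and A = T hold.
    Positive values mean an excess of G over C, or of T over A.
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    ta_skew = (t - a) / (t + a) if (t + a) else 0.0
    return gc_skew, ta_skew
```

Under this convention, the leading-strand enrichment in G over C and T over A reported in the abstract would appear as two positive skews when computed on weakly selected positions of the leading strand.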
Affiliation(s)
- Jean R Lobry
- Laboratoire BBE CNRS UMR 5558, Université Claude Bernard, 43 Bd du 11 Novembre 1918, F-69622 Villeurbanne cedex, France.
65
Tiggemann M, Jeske S, Larsen M, Meinhardt F. Kluyveromyces lactis cytoplasmic plasmid pGKL2: heterologous expression of Orf3p and proof of guanylyltransferase and mRNA-triphosphatase activities. Yeast 2001; 18:815-25. [PMID: 11427964 DOI: 10.1002/yea.728] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022] Open
Abstract
The predicted ORF3 polypeptide (Orf3p) of the linear genetic element pGKL2 from Kluyveromyces lactis was expressed in Bacillus megaterium as a fusion protein with a His(6X)-tag at the C-terminus for isolation by Ni-affinity chromatography. This is the first time that a yeast cytoplasmic gene product has been expressed heterologously as a functional protein in a bacterial system. The purified protein was found to display both RNA 5'-triphosphatase and guanylyltransferase activities. When the lysine residue present at position 177 of the protein within the sequence motif (KXDG), highly conserved in capping enzymes and other nucleotidyl transferases, was substituted by alanine, the guanylyltransferase activity was lost, thereby proving an important role for the transfer of GMP from GTP to the 5'-diphosphate end of the mRNA. Our in vitro data provide the first direct evidence that the polypeptide encoded by ORF3 of the cytoplasmic yeast plasmid pGKL2 functions as a plasmid-specific capping enzyme. Since genes equivalent to ORF3 of pGKL2 have been identified in all autonomous cytoplasmic yeast DNA elements investigated so far, our findings are of general significance for these widely distributed yeast extranuclear genetic elements.
Affiliation(s)
- M Tiggemann
- Institut für Mikrobiologie, Westfälische Wilhelms-Universität Münster, Corrensstrasse 3, 48149 Münster, Germany
66
Aronovich EL, Johnston JM, Wang P, Giger U, Whitley CB. Molecular basis of mucopolysaccharidosis type IIIB in emu (Dromaius novaehollandiae): an avian model of Sanfilippo syndrome type B. Genomics 2001; 74:299-305. [PMID: 11414757 DOI: 10.1006/geno.2001.6552] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Sanfilippo syndrome type B, or mucopolysaccharidosis (MPS) IIIB, is an autosomal recessive disease caused by a deficiency of lysosomal alpha-N-acetylglucosaminidase (NAGLU). In Dromaius novaehollandiae (emu), a progressive neurologic disease was recently discovered, which was characterized by NAGLU deficiency and heparan sulfate accumulation. To define the molecular basis, the sequences of the normal emu NAGLU cDNA and gene were determined by PCR-based approaches using primers for highly conserved regions of evolutionarily distant NAGLU homologues. It was observed that the emu NAGLU gene is structurally similar to that of human and mouse, but the introns are considerably shorter. The cDNA had an open reading frame (ORF) of 2259 bp. The deduced amino acid sequence is estimated to share 64% identity with human, 63% with mouse, 41% with Drosophila, 39% with tobacco, and 35% with the Caenorhabditis elegans enzyme. Three normal and two affected emus were studied for nucleotide sequence covering the entire coding region and exon-intron boundaries. Unlike the human gene, emu NAGLU appeared to be highly polymorphic: 19 variations were found in the coding region alone. The two affected emus were found to be homozygous for a 2-bp deletion, 1098-1099delGG, in exon 6. The resulting frameshift predicts a longer ORF of 2370 bp encoding a polypeptide with 37 additional amino acids and 387 altered amino acids. The availability of mutation screening in emus now permits early detection of MPS IIIB in breeding stocks and is an important step in characterizing this unique, naturally occurring avian model for the development of gene transfer studies.
Affiliation(s)
- E L Aronovich
- Department of Pediatrics, Institute of Human Genetics, Minneapolis, Minnesota 55455, USA
67
Fridolfsson AK, Ellegren H. Molecular evolution of the avian CHD1 genes on the Z and W sex chromosomes. Genetics 2000; 155:1903-12. [PMID: 10924484 PMCID: PMC1461215 DOI: 10.1093/genetics/155.4.1903] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022] Open
Abstract
Genes shared between the nonrecombining parts of the two types of sex chromosomes offer a potential means to study the molecular evolution of the same gene exposed to different genomic environments. We have analyzed the molecular evolution of the coding sequence of the first pair of genes found to be shared by the avian Z (present in both sexes) and W (female-specific) sex chromosomes, CHD1Z and CHD1W. We show here that these two genes evolve independently but are highly conserved at nucleotide as well as amino acid levels, thus not indicating a female-specific role of the CHD1W gene. From comparisons of sequence data from three avian lineages, the frequency of nonsynonymous substitutions (K(a)) was found to be higher for CHD1W (1.55 per 100 sites) than for CHD1Z (0.81), while the opposite was found for synonymous substitutions (K(s), 13.5 vs. 22.7). We argue that the lower effective population size and the absence of recombination on the W chromosome will generally imply that nonsynonymous substitutions accumulate faster on this chromosome than on the Z chromosome. The same should be true for the Y chromosome relative to the X chromosome in XY systems. Our data are compatible with a male-biased mutation rate, manifested by the faster rate of neutral evolution (synonymous substitutions) on the Z chromosome than on the female-specific W chromosome.
Affiliation(s)
- A K Fridolfsson
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752-36 Uppsala, Sweden
68
Abstract
Cryptosporidium parvum is an obligate intracellular pathogen responsible for widespread infections in humans and animals. The inability to obtain purified samples of this organism's various developmental stages has limited the understanding of the biochemical mechanisms important for C. parvum development or host-parasite interaction. To identify C. parvum genes independent of their developmental expression, a random sequence analysis of the 10.4-megabase genome of C. parvum was undertaken. Total genomic DNA was sheared by nebulization, and fragments between 800 and 1,500 bp were gel purified and cloned into a plasmid vector. A total of 442 clones were randomly selected and subjected to automated sequencing by using one or two primers flanking the cloning site. In this way, 654 genomic survey sequences (GSSs) were generated, corresponding to >320 kb of genomic sequence. These sequences were assembled into 408 contigs containing >250 kb of unique sequence, representing approximately 2.5% of the C. parvum genome. Comparison of the GSSs with sequences in the public DNA and protein databases revealed that 107 contigs (26%) displayed similarity to previously identified proteins and rRNA and tRNA genes. These included putative genes involved in the glycolytic pathway, DNA, RNA, and protein metabolism, and signal transduction pathways. The repetitive sequence elements identified included a telomere-like sequence containing hexamer repeats, 57 microsatellite-like elements composed of dinucleotide or trinucleotide repeats, and a direct repeat sequence. This study demonstrates that large-scale genomic sequencing is an efficient approach to analyze the organizational characteristics and information content of the C. parvum genome.
Affiliation(s)
- C Liu
- Department of Veterinary PathoBiology, University of Minnesota, St. Paul, Minnesota, USA
69
Li W. Statistical properties of open reading frames in complete genome sequences. Comput Chem 1999; 23:283-301. [PMID: 10404621 DOI: 10.1016/s0097-8485(99)00014-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
Some statistical properties of open reading frames in all currently available complete genome sequences are analyzed (17 prokaryotic genomes and 16 chromosome sequences from the yeast genome). The size distribution of open reading frames is characterized by various techniques, such as quantile tables, QQ-plots, rank-size plots (Zipf's plots), and spatial densities. The issue of the influence of CG% on the size distribution is addressed. When yeast chromosomes are compared with archaeal and eubacterial genomes, they tend to have more long open reading frames. There is little or no evidence to reject the null hypothesis that open reading frames on six different reading frames and two strands distribute similarly. A topic of current interest, the base composition asymmetry in open reading frames between the two strands, is studied using regression analysis. The base composition asymmetry at three codon positions is analyzed separately. It was shown in these genome sequences that the first codon position is G- and A-rich (i.e. purine-rich); there is a co-existence of A- and T-rich branches at the second codon position; and the third codon position is weakly T-rich.
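Collecting open reading frame sizes in all six reading frames, the raw input for the size distributions discussed in this abstract, can be sketched as below. The sketch assumes that an ORF is a maximal run of stop-free codons in a frame (stop-to-stop) and that the standard stop codons TAA, TAG and TGA apply; the abstract does not give the paper's exact ORF definition, and all function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def orf_lengths(seq: str, frame: int) -> list[int]:
    """Lengths, in codons, of maximal stop-free codon runs in one frame (0, 1 or 2)."""
    lengths = []
    run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:
        lengths.append(run)  # run that reaches the end of the sequence
    return lengths

def six_frame_orf_lengths(seq: str) -> dict[tuple[str, int], list[int]]:
    """ORF lengths keyed by (strand, frame) for both strands and three frames each."""
    seq = seq.upper()
    strands = {"+": seq, "-": reverse_complement(seq)}
    return {(s, f): orf_lengths(strands[s], f) for s in strands for f in range(3)}
```

The six per-frame lists can then be compared with quantile tables or rank-size plots, which is the kind of comparison behind the abstract's null hypothesis that the frames distribute similarly.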
Affiliation(s)
- W Li
- Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10021, USA.
70
Li W, Stolovitzky G, Bernaola-Galván P, Oliver JL. Compositional heterogeneity within, and uniformity between, DNA sequences of yeast chromosomes. Genome Res 1998; 8:916-28. [PMID: 9750191 DOI: 10.1101/gr.8.9.916] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.1] [Indexed: 11/24/2022]
Abstract
The heterogeneity within, and similarities between, yeast chromosomes are studied. For the former, we show by the size distribution of domains, coding density, size distribution of open reading frames, spatial power spectra, and deviation from binomial distribution for C + G% in large moving windows that there is a strong deviation of the yeast sequences from random sequences. For the latter, not only do we graphically illustrate the similarity for the above mentioned statistics, but we also carry out a rigorous analysis of variance (ANOVA) test. The hypothesis that all yeast chromosomes are similar cannot be rejected by this test. We examine the two possible explanations of this interchromosomal uniformity: a common origin, such as genome-wide duplication (polyploidization), and a concerted evolutionary process.
Affiliation(s)
- W Li
- Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10021, USA.
|
71
|
Abstract
Transcriptional repression in eukaryotes often involves tens or hundreds of kilobase pairs, two to three orders of magnitude more than the bacterial operator/repressor model does. Classical repression, represented by this model, was maintained over the whole span of evolution under different guises, and consists of repressor factors interacting primarily with promoters and, in later evolution, also with enhancers. The use of much larger amounts of DNA in the other mode of repression, here called the sectorial mode ('superrepression'), results in the conceptual transfer of so-called junk DNA to the domain of functional DNA. This contribution to the solution of the C-value paradox involves perhaps 15% of genomic 'junk,' and encompasses the bulk of the introns, thought to fill a stabilizing role in sectorially repressed chromatin structures. In the case of developmental genes, such structures appear to be heterochromatoid in character. However, solid clues regarding general structural features of superrepressed terminal differentiation genes remain elusive. The competition among superrepressible DNA sectors for sectorially binding factors offers, in principle, a molecular mechanism for developmental switches. Position effect variegation may be considered an abnormal manifestation of normal processes that underlie development and involve heterochromatoid sectorial repression, which is apparently required for local elimination or modulation of morphological features (morpholysis). Sectorial repression of genes participating either in development or in terminal differentiation is considered instrumental in establishing stable cell types, and provides a basis for the distinction between determination and cell type specification. The gamut of possible stable cell types may have been broadened by the appearance in evolution of heavy isochores.
Additional types of relatively frequent GC-rich cis-acting DNA motifs may offer reiterated binding sites to factors endowed with a selective (though not individually strong) affinity for these motifs. The majority of sequence motifs thought to be used in superrepression need not be individually maintained by natural selection. It is re-emphasized that the dispensability of sequences is not an indicator of their nonfunctionality and that in many cases, along noncoding sequences, nucleotides tend to fill functions collectively, rather than individually.
Affiliation(s)
- E Zuckerkandl
- Institute of Molecular Medical Sciences, Palo Alto, CA 94306, USA
|
72
|
Lobry JR. Influence of genomic G+C content on average amino-acid composition of proteins from 59 bacterial species. Gene 1997; 205:309-16. [PMID: 9461405 DOI: 10.1016/s0378-1119(97)00403-4] [Citation(s) in RCA: 106] [Impact Index Per Article: 3.9] [Indexed: 02/06/2023]
Abstract
The amino-acid composition of 23,490 proteins from 59 bacterial species was analyzed as a function of genomic G+C content. Observed amino-acid frequencies were compared with those expected from a neutral model assuming the absence of selection on average protein composition. Integral membrane proteins and non-integral membrane proteins were analyzed separately. The average deviation from this neutral model shows that there is a selective pressure increasing content in charged amino acids for non-integral membrane proteins, and content in hydrophobic amino acids for integral membrane proteins. Amino-acid frequencies were greatly influenced by genomic G+C content, but the influence was found to be often weaker than predicted. This may be evidence for a selective pressure, maintaining most amino-acid frequencies close to an optimal value. Concordance between the genetic code and protein composition is discussed in the light of this observation.
Affiliation(s)
- J R Lobry
- CNRS UMR 5558-Laboratoire BGBP, Université Claude Bernard, Villeurbanne, France.
|