101
Marsolier-Kergoat MC, Goldar A. DNA replication induces compositional biases in yeast. Mol Biol Evol 2011; 29:893-904. [PMID: 21948086 DOI: 10.1093/molbev/msr240] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 02/06/2023] Open
Abstract
Asymmetries intrinsic to the process of DNA replication are expected to cause differences in the substitution patterns of the leading and the lagging strands and to induce compositional biases. These biases have been detected in the majority of eubacterial genomes but rarely in eukaryotes. Only in the human genome, the activity of a minority of replication origins seems to generate compositional biases. In this work, we provide evidence for replication-associated GC and TA skews in the genomes of two yeast species, Saccharomyces cerevisiae and Kluyveromyces lactis, whereas the data for the Schizosaccharomyces pombe genome are less conclusive. In contrast with the genomes of Homo sapiens and of the majority of eubacteria, the leading strand is enriched in cytosine and adenine in both S. cerevisiae and K. lactis. We observed significant variations across the interorigin intervals of several substitution rates in the S. cerevisiae lineage since its divergence from S. paradoxus. We also found that the S. cerevisiae genome is far from compositional equilibrium and that its present compositional biases are due to substitution rates operating before its divergence from S. paradoxus. Finally, we observed that replication and transcription tend to be cooriented in the S. cerevisiae genome, especially for genes encoding subunits of protein complexes. Taken together, our results suggest that replication-related compositional biases may be a feature of many eukaryotic genomes despite the stochastic nature of the firing of replication origins in these genomes.
102
Voets AM, van den Bosch BJC, Stassen AP, Hendrickx AT, Hellebrekers DM, Van Laer L, Van Eyken E, Van Camp G, Pyle A, Baudouin SV, Chinnery PF, Smeets HJM. Large scale mtDNA sequencing reveals sequence and functional conservation as major determinants of homoplasmic mtDNA variant distribution. Mitochondrion 2011; 11:964-72. [PMID: 21946566 DOI: 10.1016/j.mito.2011.09.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/03/2010] [Revised: 04/19/2011] [Accepted: 09/09/2011] [Indexed: 02/07/2023]
Abstract
The mitochondrial DNA (mtDNA) is highly variable, containing large numbers of pathogenic mutations and neutral polymorphisms. The spectrum of homoplasmic mtDNA variation was characterized in 730 subjects and compared with known pathogenic sites. The frequency and distribution of variants in protein coding genes were inversely correlated with conservation at the amino acid level. Analysis of tRNA secondary structures indicated a preference of variants for the loops and some acceptor stem positions. This comprehensive overview of mtDNA variants distinguishes between regions and positions which are likely not critical, mainly conserved regions with pathogenic mutations and essential regions containing no mutations at all.
Affiliation(s)
- A M Voets
- Department of Genetics and Cell Biology, Maastricht University, Maastricht, The Netherlands
103
Comparative genomic analysis of dinucleotide repeats in Tritryps. Gene 2011; 487:29-37. [PMID: 21824509 DOI: 10.1016/j.gene.2011.07.022] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 02/27/2011] [Revised: 07/12/2011] [Accepted: 07/14/2011] [Indexed: 12/29/2022]
Abstract
The protozoans Trypanosoma cruzi, Trypanosoma brucei and Leishmania major (Tritryps), are evolutionarily ancient eukaryotes which cause worldwide human parasitosis. They present unique biological features. Indeed, canonical DNA/RNA cis-acting elements remain mostly elusive. Repetitive sequences, originally considered as selfish DNA, have been lately recognized as potentially important functional sequence elements in cell biology. In particular, the dinucleotide patterns have been related to genome compartmentalization, gene evolution and gene expression regulation. Thus, we perform a comparative analysis of the occurrence, length and location of dinucleotide repeats (DRs) in the Tritryp genomes and their putative associations with known biological processes. We observe that most types of DRs are more abundant than would be expected by chance. Complementary DRs usually display asymmetrical strand distribution, favoring TT and GT repeats in the coding strands. In addition, we find that GT repeats are among the longest DRs in the three genomes. We also show that specific DRs are non-uniformly distributed along the polycistronic unit, decreasing toward its boundaries. Distinctive non-uniform density patterns were also found in the intergenic regions, with predominance at the vicinity of the ORFs. These findings further support that DRs may control genome structure and gene expression.
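As a rough illustration of what locating dinucleotide repeats and measuring their length involves, the sketch below scans a sequence for tandem runs of a two-base motif. It is not the authors' pipeline: the function names, the `min_units` threshold, and the choice to count homodinucleotide runs such as TT as repeats are all assumptions made for this example.

```python
import re
from collections import defaultdict


def find_dinucleotide_repeats(seq, min_units=3):
    """Return (start, motif, units) for each tandem dinucleotide run.

    min_units is a hypothetical threshold: the minimum number of
    consecutive copies of the two-base motif that counts as a repeat.
    """
    seq = seq.upper()
    # ([ACGT]{2}) captures a two-base motif; \1{n,} demands n more copies.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(seq):
        units = len(match.group(0)) // 2
        hits.append((match.start(), match.group(1), units))
    return hits


def longest_by_motif(hits):
    """Map each motif to the length (in units) of its longest run."""
    longest = defaultdict(int)
    for _, motif, units in hits:
        longest[motif] = max(longest[motif], units)
    return dict(longest)
```

Calling `find_dinucleotide_repeats` on each strand separately would be one way to look at the strand asymmetry the abstract describes, though the abstract does not say how the authors did this.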
104
Bainbridge MN, Wang M, Wu Y, Newsham I, Muzny DM, Jefferies JL, Albert TJ, Burgess DL, Gibbs RA. Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biol 2011; 12:R68. [PMID: 21787409 PMCID: PMC3218830 DOI: 10.1186/gb-2011-12-7-r68] [Citation(s) in RCA: 166] [Impact Index Per Article: 12.8] [Received: 11/23/2010] [Revised: 01/16/2011] [Accepted: 07/25/2011] [Indexed: 01/12/2023] Open
Abstract
Background Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, and non-coding functional elements such as untranslated and regulatory regions. The number of variants per base pair (variant density) and our ability to interrogate regions outside of the CCDS regions is consequently less well understood. Results We examine capture sequence data from outside of the CCDS regions and find that extremes of GC content that are present in different subregions of the genome can reduce the local capture sequence coverage to less than 50% relative to the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNA and predicted exons, the capture process yields higher than expected coverage when compared to whole genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS. Conclusions We show that regions outside of the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5-times higher than that observed in the CCDS.
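The abstract defines variant density as the number of variants per base pair. A minimal sketch of that calculation follows, assuming variants are given as positions and regions as half-open `(start, end)` intervals on a single chromosome; the function name, the input layout, and the assumption that intervals do not overlap are illustrative choices, not details taken from the paper.

```python
def variant_density(variant_positions, intervals):
    """Variants per base pair across a set of non-overlapping intervals.

    variant_positions: iterable of integer coordinates.
    intervals: list of half-open (start, end) pairs (assumed layout).
    """
    total_bases = sum(end - start for start, end in intervals)
    if total_bases <= 0:
        raise ValueError("intervals cover no bases")
    n_variants = sum(
        1
        for pos in variant_positions
        if any(start <= pos < end for start, end in intervals)
    )
    return n_variants / total_bases
```

Dividing the density of one region set by that of another gives the kind of relative figure the abstract reports for predicted exons versus the CCDS.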
Affiliation(s)
- Matthew N Bainbridge
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
105
Mugal CF, Ellegren H. Substitution rate variation at human CpG sites correlates with non-CpG divergence, methylation level and GC content. Genome Biol 2011; 12:R58. [PMID: 21696599 PMCID: PMC3218846 DOI: 10.1186/gb-2011-12-6-r58] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 02/06/2011] [Revised: 05/04/2011] [Accepted: 06/22/2011] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND A major goal in the study of molecular evolution is to unravel the mechanisms that induce variation in the germ line mutation rate and in the genome-wide mutation profile. The rate of germ line mutation is considerably higher for cytosines at CpG sites than for any other nucleotide in the human genome, an increase commonly attributed to cytosine methylation at CpG sites. The CpG mutation rate, however, is not uniform across the genome and, as methylation levels have recently been shown to vary throughout the genome, it has been hypothesized that methylation status may govern variation in the rate of CpG mutation. RESULTS Here, we use genome-wide methylation data from human sperm cells to investigate the impact of DNA methylation on the CpG substitution rate in introns of human genes. We find that there is a significant correlation between the extent of methylation and the substitution rate at CpG sites. Further, we show that the CpG substitution rate is positively correlated with non-CpG divergence, suggesting susceptibility to factors responsible for the general mutation rate in the genome, and negatively correlated with GC content. We only observe a minor contribution of gene expression level, while recombination rate appears to have no significant effect. CONCLUSIONS Our study provides the first direct empirical support for the hypothesis that variation in the level of germ line methylation contributes to substitution rate variation at CpG sites. Moreover, we show that other genomic features also impact on CpG substitution rate variation.
Affiliation(s)
- Carina F Mugal
- Department of Evolutionary Biology, Uppsala University, Norbyvägen 18D, Uppsala, Sweden
106
Du P, Yang Y, Wang H, Liu D, Gao GF, Chen C. A large scale comparative genomic analysis reveals insertion sites for newly acquired genomic islands in bacterial genomes. BMC Microbiol 2011; 11:135. [PMID: 21672261 PMCID: PMC3148964 DOI: 10.1186/1471-2180-11-135] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 08/18/2010] [Accepted: 06/15/2011] [Indexed: 01/15/2023] Open
Abstract
Background Bacterial virulence enhancement and drug resistance are major threats to public health worldwide. Interestingly, newly acquired genomic islands (GIs) from horizontal transfer between different bacterial strains were found in Vibrio cholerae, Streptococcus suis, and Mycobacterium tuberculosis, which caused outbreaks of epidemic diseases in recent years. Results Using a large-scale comparative genomic analysis of 1088 complete genomes from all available bacteria (1009) and Archaea (79), we found that newly acquired GIs are often anchored around switch sites of GC-skew (sGCS). After calculating correlations between relative genomic distances of genomic islands to sGCSs and the evolutionary distances of the genomic islands themselves, we found that newly acquired genomic islands are closer to sGCSs than the old ones, indicating that regions around sGCSs are hotspots for genomic island insertion. Conclusions Based on our results, we believe that genomic regions near sGCSs are hotspots for horizontal transfer of genomic islands, which may significantly affect key properties of epidemic disease-causing pathogens, such as virulence and adaptation to new environments.
Affiliation(s)
- Pengcheng Du
- National Institute for Communicable Disease Control and Prevention, Center for Disease Control and Prevention/State Key Laboratory for Infectious Disease Prevention and Control, Beijing 102206, China
107
Calistri E, Livi R, Buiatti M. Evolutionary trends of GC/AT distribution patterns in promoters. Mol Phylogenet Evol 2011; 60:228-35. [PMID: 21554969 DOI: 10.1016/j.ympev.2011.04.015] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/19/2010] [Revised: 03/25/2011] [Accepted: 04/17/2011] [Indexed: 11/18/2022]
Abstract
Nucleotide distributions in genomes are known not to be random, showing the presence of specific motifs, long and short range correlations, periodicities, etc. Particularly, motifs are critical for the recognition by specific proteins affecting chromosome organization, transcription and DNA replication, but little is known about the possible functional effects of nucleotide distributions on the conformational landscape of DNA, putatively leading to differential selective pressures throughout evolution. Promoter sequences have a fundamental role in the regulation of gene activity and a vast literature suggests that their conformational landscapes may be a critical factor in gene expression dynamics. On these grounds, with the aim of investigating the putative existence of phylogenetic patterns of promoter base distributions, we analyzed GC/AT ratios along the 1000 nucleotide sequences upstream of TSS in wide sets of promoters belonging to organisms ranging from bacteria to pluricellular eukaryotes. The data obtained showed very clear phylogenetic trends throughout evolution of promoter sequence base distributions. Particularly, in all cases either GC-rich or AT-rich monotone gradients were observed: the former being present in eukaryotes, the latter in bacteria along with strand biases. Moreover, within eukaryotes, GC-rich gradients increased in length from unicellular organisms to plants, to vertebrates and, within them, from ancestral to more recent species. Finally, results were thoroughly discussed with particular attention to the possible correlation between nucleotide distribution patterns, evolution, and the putative existence of differential selection pressures, deriving from structural and/or functional constraints, between and within prokaryotes and eukaryotes.
Affiliation(s)
- Elisa Calistri
- Dipartimento di Biologia Evoluzionistica, Universita' degli Studi di Firenze, via Romana 19, 50125 Firenze, Italy.
108
Unexpected functional similarities between gatekeeper tumour suppressor genes and proto-oncogenes revealed by systems biology. J Hum Genet 2011; 56:369-76. [PMID: 21368766 DOI: 10.1038/jhg.2011.21] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Familial tumor suppressor genes comprise two subgroups: caretaker genes (CTs) that repair DNA, and gatekeeper genes (GKs) that trigger cell death. Since GKs may also induce cell cycle delay and thus enhance cell survival by facilitating DNA repair, we hypothesized that the prosurvival phenotype of GKs could be selected during cancer progression, and we used a multivariable systems biology approach to test this. We performed multidimensional data analysis, non-negative matrix factorization and logistic regression to compare the features of GKs with those of their putative antagonists, the proto-oncogenes (POs), as well as with control groups of CTs and functionally unrelated congenital heart disease genes (HDs). GKs and POs closely resemble each other, but not CTs or HDs, in terms of gene structure (P<0.001), expression level and breadth (P<0.01), DNA methylation signature (P<0.001) and evolutionary rate (P<0.001). The similar selection pressures and epigenetic trajectories of GKs and POs so implied suggest a common functional attribute that is strongly negatively selected-that is, a shared phenotype that enhances cell survival. The counterintuitive finding of similar evolutionary pressures affecting GKs and POs raises an intriguing possibility: namely, that cancer microevolution is accelerated by an epistatic cascade in which upstream suppressor gene defects subvert the normal bifunctionality of wild-type GKs by constitutively shifting the phenotype away from apoptosis towards survival. If correct, this interpretation would explain the hitherto unexplained phenomenon of frequent wild-type GK (for example, p53) overexpression in tumors.
109
Chen CL, Duquenne L, Audit B, Guilbaud G, Rappailles A, Baker A, Huvet M, d'Aubenton-Carafa Y, Hyrien O, Arneodo A, Thermes C. Replication-associated mutational asymmetry in the human genome. Mol Biol Evol 2011; 28:2327-37. [PMID: 21368316 DOI: 10.1093/molbev/msr056] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Indexed: 01/12/2023] Open
Abstract
During evolution, mutations occur at rates that can differ between the two DNA strands. In the human genome, nucleotide substitutions occur at different rates on the transcribed and non-transcribed strands that may result from transcription-coupled repair. These mutational asymmetries generate transcription-associated compositional skews. To date, the existence of such asymmetries associated with replication has not been established. Here, we compute the nucleotide substitution matrices around replication initiation zones identified as sharp peaks in replication timing profiles and associated with abrupt jumps in the compositional skew profile. We show that the substitution matrices computed in these regions fully explain the jumps in the compositional skew profile when crossing initiation zones. In intergenic regions, we observe mutational asymmetries measured as differences between complementary substitution rates; their sign changes when crossing initiation zones. These mutational asymmetries are unlikely to result from cryptic transcription but can be explained by a model based on replication errors and strand-biased repair. In transcribed regions, mutational asymmetries associated with replication superimpose on the previously described mutational asymmetries associated with transcription. We separate the substitution asymmetries associated with both mechanisms, which allows us to determine, for the first time in eukaryotes, the mutational asymmetries associated with replication and to reevaluate those associated with transcription. Replication-associated mutational asymmetry may result from unequal rates of complementary base misincorporation by the DNA polymerases coupled with DNA mismatch repair (MMR) acting with different efficiencies on the leading and lagging strands. Replication, acting in germ line cells during long evolutionary times, contributed equally with transcription to produce the present abrupt jumps in the compositional skew. These results demonstrate that DNA replication is one of the major processes that shape human genome composition.
Affiliation(s)
- Chun-Long Chen
- Centre de Génétique Moléculaire, Centre National de la Recherche Scientifique (CNRS), Gif-sur-Yvette, France
110
Steele EJ, Williamson JF, Lester S, Stewart BJ, Millman JA, Carnegie P, Lindley RA, Pain GN, Dawkins RL. Genesis of ancestral haplotypes: RNA modifications and reverse transcription-mediated polymorphisms. Hum Immunol 2010; 72:283-293.e1. [PMID: 21156194 DOI: 10.1016/j.humimm.2010.12.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/31/2010] [Revised: 11/15/2010] [Accepted: 12/06/2010] [Indexed: 11/30/2022]
Abstract
Understanding the genesis of the block haplotype structure of the genome is a major challenge. With the completion of the sequencing of the Human Genome and the initiation of the HapMap project, the concept that the chromosomes of the mammalian genome are a mosaic, or patchwork, of conserved extended block haplotype sequences is now accepted by the mainstream genomics research community. Ancestral Haplotypes (AHs) can be viewed as a recombined string of smaller Polymorphic Frozen Blocks (PFBs). How have such variant extended DNA sequence tracts emerged in evolution? Here the relevant literature on the problem is reviewed from various fields of molecular and cell biology, particularly molecular immunology and comparative and functional genomics. Based on our synthesis we then advance a testable molecular and cellular model. A critical part of the analysis concerns the origin of the strand biased mutation signatures in the transcribed regions of the human and higher primate genome, A-to-G versus T-to-C (ratio ∼ 1.5 fold) and C-to-T versus G-to-A (≥ 1.5 fold). A comparison and evaluation of the current state of the fields of immunoglobulin Somatic Hypermutation (SHM) and Transcription-Coupled DNA Repair focused on how mutations in newly synthesized RNA might be copied back to DNA thus accounting for some of the genome-wide strand biases (e.g., the A-to-G vs T-to-C component of the strand biased spectrum). We hypothesize that the genesis of PFBs and extended AHs occurs during mutagenic episodes in evolution (e.g., retroviral infections) and that many of the critical DNA sequence diversifying events occur first at the RNA level, e.g., recombination between RNA strings resulting in tandem and dispersed RNA duplications (retroduplications), RNA mutations via adenosine-to-inosine pre-mRNA editing events as well as error prone RNA synthesis. These are then copied back into DNA by a cellular reverse transcription process (also likely to be error-prone) that we have called "reverse transcription-mediated long DNA conversion." Finally, we suggest that all these activities and others can be envisaged as being brought physically under the umbrella of special sites in the nucleus involved in transcription known as "transcription factories."
Affiliation(s)
- Edward J Steele
- C.Y O'Connor ERADE Village Foundation, Canning Vale, Western Australia, Australia.
111
Nakken S, Rødland EA, Hovig E. Impact of DNA physical properties on local sequence bias of human mutation. Hum Mutat 2010; 31:1316-25. [PMID: 20886615 DOI: 10.1002/humu.21371] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 01/26/2010] [Accepted: 08/31/2010] [Indexed: 01/07/2023]
Abstract
In selectively neutral regions of the human genome, nucleotide substitutions do not occur at random with respect to the local DNA sequence neighborhood. However, apart from the hypermutability of methylated CpG dinucleotides, which can explain the overrepresentation of nucleotide transitions in this context, the sequence-specific factors underlying point mutation bias remain largely to be determined, both in nature and in quantitative impact. One hypothesis suggests that the physical characteristics of a DNA context could have a modulating effect on its mutability, adjusting the impact of damage or the efficiency of repair. Here, we report a genome-wide computational test of this hypothesis, in which we utilize a constrained set of human non-CpG SNPs as the source of selectively neutral germline mutations. Interestingly, we observe that the quantitative context-dependencies of some substitution types display significant associations to measures of local structural topography and helix stability in DNA. Most prominently, we find that the local sequence bias of transition mutations is significantly associated with the sequence-dependent level of helix instability imposed by the potentially underlying DNA mismatches. The results of our work indicate the extent to which DNA physical properties could have shaped the recent point mutational spectrum in the human genome.
Affiliation(s)
- Sigve Nakken
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Norwegian Radium Hospital, Norway
112
Weber CC, Hurst LD. Intronic AT skew is a defendable proxy for germline transcription but does not predict crossing-over or protein evolution rates in Drosophila melanogaster. J Mol Evol 2010; 71:415-26. [PMID: 20938653 DOI: 10.1007/s00239-010-9395-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/24/2010] [Accepted: 09/17/2010] [Indexed: 01/28/2023]
Abstract
Recent evidence suggests that germline transcription may affect both protein evolutionary rates, possibly mediated by repair processes, and recombination rates, possibly mediated by chromatin and epigenetic modification. Here, we test these propositions in Drosophila melanogaster. The challenge for such analyses is to provide defendable measures of germline gene expression. Intronic AT skew is a good candidate measure as it is thought to be a consequence, at least in part, of transcription-coupled repair. Prior evidence suggests that intronic AT skew in D. melanogaster is not affected by proximity to intron extremities and differs between transcribed DNA and flanking sequence. We now also establish that intronic AT skew is a defendable proxy for germline expression as (a) it is more similar than expected by chance between introns of the same gene (which is not accounted for by physical proximity), (b) is correlated with male germline expression, and (c) is more pronounced in broadly expressed genes. Furthermore, (d) a trend for intronic skew to differ between 3' and 5' ends of genes is particular to broadly expressed genes. Finally, (e) controlling for physical distance, introns of proximate genes are most different in skew if they have different tissue specificity. We find that intronic AT skew, employed as a proxy for germline transcription, correlates neither with recombination rates nor with the rate of protein evolution. We conclude that there is no prima facie evidence that germline expression modulates recombination rates or monotonically affects protein evolution rates in D. melanogaster.
Affiliation(s)
- Claudia C Weber
- Department of Biology and Biochemistry, University of Bath, Bath, UK
113
Wei SJ, Shi M, Chen XX, Sharkey MJ, van Achterberg C, Ye GY, He JH. New views on strand asymmetry in insect mitochondrial genomes. PLoS One 2010; 5:e12708. [PMID: 20856815 PMCID: PMC2939890 DOI: 10.1371/journal.pone.0012708] [Citation(s) in RCA: 198] [Impact Index Per Article: 14.1] [Received: 10/05/2009] [Accepted: 08/20/2010] [Indexed: 01/16/2023] Open
Abstract
Strand asymmetry in nucleotide composition is a remarkable feature of animal mitochondrial genomes. Understanding the mutation processes that shape strand asymmetry is essential for comprehensive knowledge of genome evolution, demographical population history and accurate phylogenetic inference. Previous studies found that the relative contributions of different substitution types to strand asymmetry are associated with replication alone or both replication and transcription. However, the relative contributions of replication and transcription to strand asymmetry remain unclear. Here we conducted a broad survey of strand asymmetry across 120 insect mitochondrial genomes, with special reference to the correlation between the signs of skew values and replication orientation/gene direction. The results show that the sign of GC skew on entire mitochondrial genomes is reversed in all species of three distantly related families of insects, Philopteridae (Phthiraptera), Aleyrodidae (Hemiptera) and Braconidae (Hymenoptera); the replication-related elements in the A+T-rich regions of these species are inverted, confirming that reversal of strand asymmetry (GC skew) was caused by inversion of replication origin; and finally, the sign of GC skew value is associated with replication orientation but not with gene direction, while that of AT skew value varies with gene direction, replication and codon positions used in analyses. These findings show that deaminations during replication and other mutations contribute more than selection on amino acid sequences to strand compositions of G and C, and that the replication process has a stronger effect on A and T content than does transcription. Our results may contribute to genome-wide studies of replication and transcription mechanisms.
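The abstract reports signs of GC and AT skew without stating a formula. The sketch below uses the conventional definitions, GC skew = (G - C)/(G + C) and AT skew = (A - T)/(A + T), computed on one strand; treating these as the measures used in the paper is an assumption, and the function names and the windowing scheme are illustrative only.

```python
def skew(seq, plus, minus):
    """(plus - minus) / (plus + minus) for two bases on a single strand."""
    seq = seq.upper()
    n_plus = seq.count(plus)
    n_minus = seq.count(minus)
    total = n_plus + n_minus
    # Return 0.0 when neither base occurs, rather than dividing by zero.
    return (n_plus - n_minus) / total if total else 0.0


def gc_skew(seq):
    return skew(seq, "G", "C")


def at_skew(seq):
    return skew(seq, "A", "T")


def windowed_gc_skew(seq, window):
    """GC skew in consecutive non-overlapping windows (tail is dropped)."""
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]
```

A positive GC skew means the analysed strand carries more G than C; a sign change between adjacent windows is the kind of signal that skew-based analyses of replication orientation look for.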
Affiliation(s)
- Shu-Jun Wei
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Institute of Plant and Environmental Protection, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- Min Shi
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Xue-Xin Chen
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Michael J. Sharkey
- Department of Entomology, University of Kentucky, Lexington, Kentucky, United States of America
- Gong-Yin Ye
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Jun-Hua He
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
114
Abstract
Genomes encode multiple signals, raising the question of how these different codes are organized along the linear genome sequence. Within protein-coding regions, the redundancy of the genetic code can, in principle, allow for the overlapping encoding of signals in addition to the amino acid sequence, but it is not known to what extent genomes exploit this potential and, if so, for what purpose. Here, we systematically explore whether protein-coding regions accommodate overlapping codes, by comparing the number of occurrences of each possible short sequence within the protein-coding regions of over 700 species from viruses to plants, to the same number in randomizations that preserve amino acid sequence and codon bias. We find that coding regions across all phyla encode additional information, with bacteria carrying more information than eukaryotes. The detailed signals consist of both known and potentially novel codes, including position-dependent secondary RNA structure, bacteria-specific depletion of transcription and translation initiation signals, and eukaryote-specific enrichment of microRNA target sites. Our results suggest that genomes may have evolved to encode extensive overlapping information within protein-coding regions.
115
Mutation biases and mutation rate variation around very short human microsatellites revealed by human-chimpanzee-orangutan genomic sequence alignments. J Mol Evol 2010; 71:192-201. [PMID: 20700734 DOI: 10.1007/s00239-010-9377-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 04/12/2010] [Accepted: 07/26/2010] [Indexed: 01/21/2023]
Abstract
I have studied mutation patterns around very short microsatellites, focusing mainly on sequences carrying only two repeat units. By using human-chimpanzee-orangutan alignments, inferences can be made about both the relative rates of mutations and which bases have mutated. I find remarkable non-randomness, with mutation rate depending on a base's position relative to the microsatellite, the identity of the base itself and the motif in the microsatellite. Comparing the patterns around AC2 with those around other four-base combinations reveals that AC2 does not stand out as being special in the sense that non-repetitive tetramers also generate strong mutation biases. However, comparing AC2 and AC3 with AC4 reveals a step change in both the rate and nature of mutations occurring, suggesting a transition state, AC4 exhibiting an alternating high-low mutation rate pattern consistent with the sequence patterning seen around longer microsatellites. Surprisingly, most changes in repeat number occur through base substitutions rather than slippage, and the relative probability of gaining versus losing a repeat in this way varies greatly with repeat number. Slippage mutations reveal rather similar patterns of mutability compared with point mutations, being rare at two repeats where most cause the loss of a repeat, with both mutation rate and the proportion of expansion mutations increasing up to 6-8 repeats. Inferences about longer repeat tracts are hampered by uncertainties about the proportion of multi-species alignments that fail due to multi-repeat mutations and other rearrangements.
|
116
|
Baele G, Van de Peer Y, Vansteelandt S. Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences. BMC Evol Biol 2010; 10:244. [PMID: 20698960 PMCID: PMC2928787 DOI: 10.1186/1471-2148-10-244] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 03/30/2010] [Accepted: 08/10/2010] [Indexed: 12/04/2022]
Abstract
Background: Recent approaches for context-dependent evolutionary modelling assume that the evolution of a given site depends upon its ancestor and that ancestor's immediate flanking sites. Because such a dependency pattern cannot be imposed on the root sequence, we consider the use of different orders of Markov chains to model dependence at the ancestral root sequence. Root distributions which are coupled to the context-dependent model across the underlying phylogenetic tree are deemed more realistic than decoupled Markov chain models, as the evolutionary process is responsible for shaping the composition of the ancestral root sequence. Results: We find strong support, in terms of Bayes Factors, for using a second-order Markov chain at the ancestral root sequence along with a context-dependent model throughout the remainder of the phylogenetic tree in an ancestral repeats dataset, and for using a first-order Markov chain at the ancestral root sequence in a pseudogene dataset. Relaxing the assumption of a single context-independent set of independent model frequencies as presented in previous work, yields a further drastic increase in model fit. We show that the substitution rates associated with the CpG-methylation-deamination process can be modelled through context-dependent model frequencies and that their accuracy depends on the (order of the) Markov chain imposed at the ancestral root sequence. In addition, we provide evidence that this approach (which assumes that root distribution and evolutionary model are decoupled) outperforms an approach inspired by the work of Arndt et al., where the root distribution is coupled to the evolutionary model. We show that the continuous-time approximation of Hwang and Green has stronger support in terms of Bayes Factors, but the parameter estimates show minimal differences.
Conclusions: We show that the combination of a dependency scheme at the ancestral root sequence and a context-dependent evolutionary model across the remainder of the tree allows for accurate estimation of the model's parameters. The different assumptions tested in this manuscript clearly show that designing accurate context-dependent models is a complex process, with many different assumptions that require validation. Further, these assumptions are shown to change across different datasets, making the search for an adequate model for a given dataset quite challenging.
Affiliation(s)
- Guy Baele
- Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium
|
117
|
Abstract
Transcribed regions in the human genome differ from adjacent intergenic regions in transposable element density, crossover rates, and asymmetric substitution and sequence composition patterns. We tested whether these differences reflect selection or are instead a byproduct of germline transcription, using publicly available gene expression data from a variety of germline and somatic tissues. Crossover rate shows a strong negative correlation with gene expression in meiotic tissues, suggesting that crossover is inhibited by transcription. Strand-biased composition (G+T content) and A → G versus T → C substitution asymmetry are both positively correlated with germline gene expression. We find no evidence for a strand bias in allele frequency data, implying that the substitution asymmetry reflects a mutation rather than a fixation bias. The density of transposable elements is positively correlated with germline expression, suggesting that such elements preferentially insert into regions that are actively transcribed. For each of the features examined, our analyses favor a nonselective explanation for the observed trends and point to the role of germline gene expression in shaping the mammalian genome.
Affiliation(s)
- Graham McVicker
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
118
|
Baele G, Van de Peer Y, Vansteelandt S. Using non-reversible context-dependent evolutionary models to study substitution patterns in primate non-coding sequences. J Mol Evol 2010; 71:34-50. [PMID: 20623275 DOI: 10.1007/s00239-010-9362-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/08/2009] [Accepted: 05/26/2010] [Indexed: 11/28/2022]
Abstract
We discuss the importance of non-reversible evolutionary models when analyzing context-dependence. Given the inherent non-reversible nature of the well-known CpG-methylation-deamination process in mammalian evolution, non-reversible context-dependent evolutionary models may be well able to accurately model such a process. In particular, the lack of constraints on non-reversible substitution models might allow for more accurate estimation of context-dependent substitution parameters. To demonstrate this, we have developed different time-homogeneous context-dependent evolutionary models to analyze a large genomic dataset of primate ancestral repeats based on existing independent evolutionary models. We have calculated the difference in model fit for each of these models using Bayes Factors obtained via thermodynamic integration. We find that non-reversible context-dependent models can drastically increase model fit when compared to independent models, and that this holds on two primate non-coding datasets. We also show that further improvements are possible by clustering similar parameters across contexts.
Affiliation(s)
- Guy Baele
- Department of Plant Systems Biology, VIB, Ghent University, Technologiepark 927, 9052, Ghent, Belgium.
|
119
|
Polak P, Querfurth R, Arndt PF. The evolution of transcription-associated biases of mutations across vertebrates. BMC Evol Biol 2010; 10:187. [PMID: 20565875 PMCID: PMC2927911 DOI: 10.1186/1471-2148-10-187] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/07/2009] [Accepted: 06/18/2010] [Indexed: 02/03/2024]
Abstract
Background: The interplay between transcription and mutational processes can lead to particular mutation patterns in transcribed regions of the genome. Transcription introduces several biases in mutational patterns; in particular it invokes strand-specific mutations. In order to understand the forces that have shaped transcripts during evolution, one has to study mutation patterns associated with transcription across animals. Results: Using multiple alignments of related species we estimated the regional single-nucleotide substitution patterns along genes in four vertebrate taxa: primates, rodents, laurasiatheria and bony fishes. Our analysis is focused on intronic and intergenic regions and reveals differences in the patterns of substitution asymmetries between mammals and fishes. In mammals, the levels of asymmetries are stronger for genes starting within CpG islands than in genes lacking this property. In contrast to all other species analyzed, we found a mutational pressure in dog and stickleback, promoting an increase of GC-contents in the proximity to transcriptional start sites. Conclusions: We propose that the asymmetric patterns in transcribed regions are results of transcription-associated mutagenic processes and transcription-coupled repair, which both seem to evolve in a taxon-related manner. We also discuss alternative mechanisms that can generate strand biases and involve error-prone DNA polymerases and reverse transcription. A localized increase of the GC content near the transcription start site is a signature of biased gene conversion (BGC) that occurs during recombination and heteroduplex formation. Since dog and stickleback are known to be subject to rapid adaptations due to population bottlenecks and breeding, we further hypothesize that an increase in recombination rates near gene starts has been part of an adaptive process.
Affiliation(s)
- Paz Polak
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
|
120
|
Abstract
The accumulation of base substitutions (mutations) not subject to natural selection is the neutral mutation rate. Because this rate reflects the in vivo processes involved in maintaining the integrity of genetic information, the factors that affect the neutral mutation rate are of considerable interest. Mammals exhibit two dramatically different neutral mutation rates. The first is the CpG mutation rate, wherein the C of most CpGs (i.e., methyl-CpG) mutates at 10-50 times the rate of C in any other context or of any other base; mutations of the latter kind constitute the non-CpG rate. The high CpG rate results from the spontaneous deamination of methyl-C to T and incomplete restoration of the ensuing T:G mismatches to C:Gs. Here, we determined the neutral non-CpG mutation rate as a function of CpG content by comparing sequence divergence of thousands of pairs of neutrally evolving chimpanzee and human orthologs that differ primarily in CpG content. Both the mutation rate and the mutational spectrum (transition/transversion ratio) of non-CpG residues change in parallel as sigmoidal (logistic) functions of CpG content. As different mechanisms generate transitions and transversions, these results indicate that both mutation rate and mutational processes are contingent on the local CpG content. We consider several possible mechanisms that might explain how CpG exerts these effects.
Affiliation(s)
- Jean-Claude Walser
- Section on Genomic Structure and Function, Laboratory of Molecular and Cellular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0830, USA
|
121
|
Kim H, Lee BS, Tomita M, Kanai A. Transcription-associated mutagenesis increases protein sequence diversity more effectively than does random mutagenesis in Escherichia coli. PLoS One 2010; 5:e10567. [PMID: 20479947 PMCID: PMC2866735 DOI: 10.1371/journal.pone.0010567] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 10/19/2009] [Accepted: 04/19/2010] [Indexed: 01/15/2023]
Abstract
Background: During transcription, the nontranscribed DNA strand becomes single-stranded DNA (ssDNA), which can form secondary structures. Unpaired bases in the ssDNA are less protected from mutagens and hence experience more mutations than do paired bases. These mutations are called transcription-associated mutations. Transcription-associated mutagenesis is increased under stress and depends on the DNA sequence. Therefore, selection might significantly influence protein-coding sequences in terms of the transcription-associated mutability per transcription event under stress to improve the survival of Escherichia coli. Methodology/Principal Findings: The mutability index (MI) was developed by Wright et al. to estimate the relative transcription-associated mutability of bases per transcription event. Using the most stable fold of each ssDNA that have an average length n, MI was defined as (the number of folds in which the base is unpaired)/n×(highest –ΔG of all n folds in which the base is unpaired), where ΔG is the free energy. The MI values show a significant correlation with mutation data under stress but not with spontaneous mutations in E. coli. Protein sequence diversity is preferred under stress but not under favorable conditions. Therefore, we evaluated the selection pressure on MI in terms of the protein sequence diversity for all the protein-coding sequences in E. coli. The distributions of the MI values were lower at bases that could be substituted with each of the other three bases without affecting the amino acid sequence than at bases that could not be so substituted. Start codons had lower distributions of MI values than did nonstart codons. Conclusions/Significance: Our results suggest that the majority of protein-coding sequences have evolved to promote protein sequence diversity and to reduce gene knockout under stress. Consequently, transcription-associated mutagenesis increases protein sequence diversity more effectively than does random mutagenesis under stress. Nonrandom transcription-associated mutagenesis under stress should improve the survival of E. coli.
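The MI definition quoted in this abstract can be written out as a short sketch. It takes, for one base, a flag per fold saying whether the base is unpaired and the free energy ΔG of each fold. The function name, the input layout, and the choice to return 0 when the base is never unpaired are assumptions made for illustration; the abstract does not specify them, and the fold values below are made up.

```python
def mutability_index(unpaired, delta_g):
    """Mutability index of one base, following the definition in the abstract:

        MI = (number of folds in which the base is unpaired) / n
             * (highest -dG among the folds in which the base is unpaired)

    unpaired : list of bool, one flag per fold (True if the base is unpaired)
    delta_g  : list of float, free energy of each fold (same order and length)
    """
    if not unpaired or len(unpaired) != len(delta_g):
        raise ValueError("unpaired and delta_g must be non-empty and equally long")
    n = len(unpaired)
    # -dG of every fold in which this base is unpaired
    stabilities = [-dg for flag, dg in zip(unpaired, delta_g) if flag]
    if not stabilities:
        # Assumption: a base that is never unpaired gets MI = 0.
        return 0.0
    return (len(stabilities) / n) * max(stabilities)


# Hypothetical base seen in n = 4 folds, unpaired in three of them.
mi = mutability_index(
    unpaired=[True, False, True, True],
    delta_g=[-2.0, -5.0, -3.5, -1.0],
)
# (3 / 4) * 3.5 = 2.625
```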
Affiliation(s)
- Hyunchul Kim
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Baek-Seok Lee
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
- Masaru Tomita
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Akio Kanai
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
|
122
|
Kondrashov FA, Kondrashov AS. Measurements of spontaneous rates of mutations in the recent past and the near future. Philos Trans R Soc Lond B Biol Sci 2010; 365:1169-76. [PMID: 20308091 PMCID: PMC2871817 DOI: 10.1098/rstb.2009.0286] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Indexed: 12/13/2022]
Abstract
The rate of spontaneous mutation in natural populations is a fundamental parameter for many evolutionary phenomena. Because the rate of mutation is generally low, most of what is currently known about mutation has been obtained through indirect, complex and imprecise methodological approaches. However, in the past few years genome-wide sequencing of closely related individuals has made it possible to estimate the rates of mutation directly at the level of the DNA, avoiding most of the problems associated with using indirect methods. Here, we review the methods used in the past with an emphasis on next generation sequencing, which may soon make the accurate measurement of spontaneous mutation rates a matter of routine.
Affiliation(s)
- Fyodor A Kondrashov
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation, C/Dr. Aiguader 88, Barcelona Biomedical Research Park Building 08003, Barcelona, Spain.
|
123
|
Eory L, Halligan DL, Keightley PD. Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes. Mol Biol Evol 2010; 27:177-92. [PMID: 19759235 DOI: 10.1093/molbev/msp219] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Indexed: 12/30/2022]
Abstract
Protein-coding sequences make up only about 1% of the mammalian genome. Much of the remaining 99% has been long assumed to be junk DNA, with little or no functional significance. Here, we show that in hominids, a group with historically low effective population sizes, all classes of noncoding DNA evolve more slowly than ancestral transposable elements and so appear to be subject to significant evolutionary constraints. Under the nearly neutral theory, we expected to see lower levels of selective constraints on most sequence types in hominids than murids, a group that is thought to have a higher effective population size. We found that this is the case for many sequence types examined, the most extreme example being 5'UTRs, for which constraint in hominids is only about one-third that of murids. Surprisingly, however, we observed higher constraints for some sequence types in hominids, notably 4-fold sites, where constraint is more than twice as high as in murids. This implies that more than about one-fifth of mutations at 4-fold sites are effectively selected against in hominids. The higher constraint at 4-fold sites in hominids suggests a more complex protein-coding gene structure than murids and indicates that methods for detecting selection on protein-coding sequences (e.g., using the d(N)/d(S) ratio), with 4-fold sites as a neutral standard, may lead to biased estimates, particularly in hominids. Our constraint estimates imply that 5.4% of nucleotide sites in the human genome are subject to effective negative selection and that there are three times as many constrained sites within noncoding sequences as within protein-coding sequences. Including coding and noncoding sites, we estimate that the genomic deleterious mutation rate U = 4.2. The mutational load predicted under a multiplicative model is therefore about 99% in hominids.
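The step from U = 4.2 to a load of about 99% can be checked with a short calculation. The sketch assumes the textbook result for multiplicative fitness effects, in which mean fitness relative to a mutation-free genotype is e^(-U) and the load is therefore 1 - e^(-U); the abstract does not spell out the formula, so treating it as the one used here is an assumption. The function name is hypothetical.

```python
import math


def multiplicative_load(u):
    """Mutational load under a multiplicative fitness model.

    Assumes mean relative fitness exp(-U), so load L = 1 - exp(-U),
    where U is the genomic deleterious mutation rate per generation.
    """
    return 1.0 - math.exp(-u)


load = multiplicative_load(4.2)
print(f"U = 4.2 gives load = {load:.3f}")  # prints "U = 4.2 gives load = 0.985"
```

The result, roughly 0.985, is consistent with the "about 99%" quoted in the abstract.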
Affiliation(s)
- Lél Eory
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom.
|
124
|
Ying H, Epps J, Williams R, Huttley G. Evidence that localized variation in primate sequence divergence arises from an influence of nucleosome placement on DNA repair. Mol Biol Evol 2010; 27:637-49. [PMID: 19843619 PMCID: PMC2822288 DOI: 10.1093/molbev/msp253] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
Abstract
Understanding the origins of localized substitution rate heterogeneity has important implications for identifying functional genomic sequences. Outside of gene regions, the origins of rate heterogeneity remain unclear. Experimental studies establish that chromatin compaction affects rates of both DNA lesion formation and repair. A functional association between chromatin status and 5-methyl-cytosine also exists. These suggest that both the total rate and the type of substitution will be affected by chromatin status. Regular positioning of nucleosomes, the building block of chromatin, further predicts that substitution rate and type should vary spatially in an oscillating manner. We addressed chromatin's influence on substitution rate and type in primates. Matched numbers of sites were sampled from DNase I hypersensitive (DHS) and closed chromatin control flank (Flank). Likelihood ratio tests revealed significant excesses of total and of transition substitutions in Flank compared with matched DHS for both intergenic and intronic samples. An additional excess of CpG transitions was evident for the intergenic, but not intronic, regions. Fluctuation in substitution rate along approximately 1,800 primate promoters was measured using phylogenetic footprinting. Significant positive correlations were evident between the substitution rate and a nucleosome score from resting human T-cells, with up to approximately 50% of the variance in substitution rate accounted for. Using signal processing techniques, a dominant oscillation at approximately 200 bp was evident in both the substitution rate and the nucleosome score. Our results support a role for differential DNA repair rates between open and closed chromatin in the spatial distribution of rate heterogeneity.
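One common way to find a dominant oscillation in a positional signal is to take the largest peak of its discrete Fourier power spectrum. The sketch below does this on synthetic data only; it is not the authors' pipeline, and the abstract does not say which signal processing technique was used. The function name and the synthetic rate profile are hypothetical.

```python
import cmath
import math


def dominant_period(signal):
    """Period (in positions) of the strongest component of a 1-D signal.

    Computes a plain discrete Fourier transform of the mean-centered signal
    and returns n / k for the frequency index k with the greatest power.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Synthetic "substitution rate" along 1,000 bp: a 200 bp oscillation
# plus a weaker, unrelated 37 bp component standing in for noise.
n_sites = 1000
rate = [
    1.0
    + 0.3 * math.sin(2 * math.pi * pos / 200)
    + 0.05 * math.sin(2 * math.pi * pos / 37)
    for pos in range(n_sites)
]
period = dominant_period(rate)  # 200.0 for this synthetic input
```

With real data the same function would be applied to both the substitution rate profile and the nucleosome score, and the two dominant periods compared.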
Affiliation(s)
- Hua Ying
- John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
- Julian Epps
- School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, New South Wales, Australia
- Rohan Williams
- John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
- Gavin Huttley
- John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
|
125
|
Geraci G, D'Elia I, del Gaudio R, Di Giaimo R. Evidence of genetic instability in tumors and normal nearby tissues. PLoS One 2010; 5:e9343. [PMID: 20186333 PMCID: PMC2826410 DOI: 10.1371/journal.pone.0009343] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/29/2009] [Accepted: 01/18/2010] [Indexed: 11/18/2022]
Abstract
Background: Comprehensive analyses have recently been performed on many human cancer tissues, leading to the identification of a number of mutated genes but providing no information on the variety of mutations present in each of them. This information is of interest to understand the possible origin of gene mutations that cause tumors. Methodology/Principal Findings: We have analyzed the sequence heterogeneity of the transcripts of the human HPRT and G6PD single copy genes that are not considered tumor markers. Analyses have been performed on different colon cancers and on the nearby histologically normal tissues of two male patients. Several copies of each cDNA, which were produced by cloning the RT-PCR-amplified fragments of the specific mRNA, have been sequenced. Similar analyses have been performed on blood samples of two ostensibly healthy males as reference controls. The sequence heterogeneity of the HPRT and G6PD genes was also determined on DNA from tumor tissues. The employed analytical approach revealed the presence of low-frequency mutations not detectable by other procedures. The results show that genetic heterogeneity is detectable in HPRT and G6PD transcripts in both tumors and nearby healthy tissues of the two studied colon tumors. Similar frequencies of mutations are observed in patient genomic DNA, indicating that mutations have a somatic origin. HPRT transcripts show genetic heterogeneity also in healthy individuals, in agreement with previous results on human T-cells, while G6PD transcript heterogeneity is a characteristic of the patient tissues. Interestingly, data on TP53 show little, if any, heterogeneity in the same tissues. Conclusions/Significance: These findings show that genetic heterogeneity is a peculiarity not only of cancer cells but also of the normal tissue where a tumor arises.
Affiliation(s)
- Giuseppe Geraci
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
- Ceinge Biotecnologie Avanzate s.c. a r.l., Napoli, Italy
- Ida D'Elia
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
- Rosanna del Gaudio
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
- Rossella Di Giaimo
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
|
126
|
Abstract
Is it possible to mutate DNA during transcription? A new study shows that UV-damaged DNA is deaminated during transcription, which is a probable mechanism underlying CC tandem mutations found in the p53 gene in skin cancers.
Affiliation(s)
- Thomas Helleday
- Gray Institute for Radiation Oncology and Biology, University of Oxford, Oxford, OX3 7DQ, UK.
|
127
|
Mugal CF, Wolf JBW, von Grünberg HH, Ellegren H. Conservation of neutral substitution rate and substitutional asymmetries in mammalian genes. Genome Biol Evol 2010; 2:19-28. [PMID: 20333222 PMCID: PMC2839347 DOI: 10.1093/gbe/evp056] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Accepted: 12/22/2009] [Indexed: 12/21/2022]
Abstract
Local variation in neutral substitution rate across mammalian genomes is governed by several factors, including sequence context variables and structural variables. In addition, the interplay of replication and transcription, known to induce a strand bias in mutation rate, gives rise to variation in substitutional strand asymmetries. Here, we address the conservation of variation in mutation rate and substitutional strand asymmetries using primate- and rodent-specific repeat elements located within the introns of protein-coding genes. We find significant but weak conservation of local mutation rates between human and mouse orthologs. Likewise, substitutional strand asymmetries are conserved between human and mouse, where substitution rate asymmetries show a higher degree of conservation than mutation rate. Moreover, we provide evidence that replication and transcription are correlated to the strength of substitutional asymmetries. The effect of transcription is particularly visible for genes with highly conserved gene expression. In comparison with replication and transcription, mutation rate influences the strength of substitutional asymmetries only marginally.
Affiliation(s)
- C F Mugal
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.
|
128
|
Pink CJ, Hurst LD. Timing of replication is a determinant of neutral substitution rates but does not explain slow Y chromosome evolution in rodents. Mol Biol Evol 2009; 27:1077-86. [PMID: 20026481 DOI: 10.1093/molbev/msp314] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
Mutation rates, assayed as substitution rates of putatively neutral sites, are highly variable around mammalian genomes: There is heterogeneity between genes, between autosomes, and between X, Y, and autosomes. The differences between X, Y, and autosomes are typically assumed to reflect the greater number of cell divisions in the male germ-line. Such an effect can neither account for within-autosome differences nor does it predict the differences between X, Y, and autosome observed in rodents. It has recently been proposed that in primates, the time during S-phase when a gene is replicated is an important determinant of neutral rates of evolution. Here we ask 1) whether we can replicate this result in rodents, 2) whether different autosomes replicate on average at different times, and 3) whether this might explain differences in their substitution rates. Finally we ask 4) whether X, Y, and autosome replicate at different times and 5) whether any difference might explain why the number of replication events alone cannot explain their substitution rates. We find that, as in primates, autosomal intronic rates of evolution increase significantly during S-phase. Different autosomes do have different average replication times, and together with rearrangement, this is a significant predictor of between-autosome differences in substitution rate. Although we find that autosomal, X-, and Y-linked genes replicate at different times, it is paradoxical that the Y-linked genes replicate latest, and replicate more often, but are not especially fast evolving. These results support the hypothesis that replication timing is an important source of substitution rate heterogeneity.
Affiliation(s)
- Catherine J Pink
- Department of Biology and Biochemistry, University of Bath, Somerset, United Kingdom
|
129
|
Relative mutation rates of each nucleotide for another estimated from allele frequency spectra at human gene loci. Genet Res (Camb) 2009; 91:293-303. [PMID: 19640324 DOI: 10.1017/s0016672309990164] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
Abstract
This study aims to comprehensively examine the mutation rates of one base for another in human gene loci. In contrast to most previous efforts based on divergence data from untranscribed regions, the present study employs the basic theory of the reversible recurrent mutation model using large-scale, high-quality re-sequencing data from public databases of gene loci. Population mutation parameters (4Nnu and 4Nmu) are obtained for each pair of base substitutions. The estimated parameters show good strand reversal symmetry, supporting the existence of mutation-drift equilibrium. Analysis of specific gene regions including mRNA, coding sequence (CDS), 5'-untranslated region (5'-UTRs), 3'-UTR and intron shows that there are clear differences in the mutation rates of each base for another depending on the location of the base in question. Results from analyses that take the adjacent bases into account exhibit excellent strand reversal symmetry, confirming that the identity of an adjacent base influences mutation rates. The CpG to TpG (or CpG to CpA) substitution is found at a rate approximately seven-fold higher than the reverse transition in intron regions due to cytosine deamination, but the effect is strongly reduced in mRNA regions and almost entirely lost in 5'-UTRs. However, from the overall increased transitions in sites other than CpGs and the proportion of CpGs in the total sequence, CpG methylation is not the main factor responsible for the increased rate of transitions as compared with transversions. In this report, after adjusting average mutation rates to the sequence compositions, no substitution bias is found between A+T and C+G, indicating base composition equilibrium in human gene loci. Population differences are also identified between groups of people of African and European descent, presumably due to past population histories. 
By applying the basic theory of population genetics to re-sequenced data, this study contributes new, detailed information regarding mutations in human gene regions.
|
130
|
Abstract
Recent large-scale cancer sequencing studies have focused primarily on identifying cancer-associated genes, but as an important byproduct provide "passenger mutation" data that can potentially illuminate the mutational mechanisms at work in cancer cells. Here, we explore patterns of nucleotide substitution in several cancer types using published data. We first show that selection (negative or positive) has affected only a small fraction of mutations, allowing us to attribute observed trends to underlying mutational processes rather than selection. We then show that the increased CpG mutation frequency observed in some cancers primarily occurs outside of CpG islands and CpG island shores, thus rejecting the hypothesis that the increase is a byproduct of island or shore methylation followed by deamination. We observe an A-->G vs. T-->C mutational asymmetry in some cancers similar to one that has been observed in germline mutations in transcribed regions, suggesting that the mutation process may be influenced by gene expression. We also demonstrate that the relative frequency of mutations at dinucleotide "hotspots" can be used as a tool to detect likely technical artifacts in large-scale studies.
|
131
|
Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 2009; 20:110-21. [PMID: 19858363 DOI: 10.1101/gr.097857.109] [Citation(s) in RCA: 1515] [Impact Index Per Article: 101.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions, and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36 mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint, and differences in clade-specific selection in the primate and glires clades. We also describe new "Conservation" tracks in the UCSC Genome Browser that display both phyloP and phastCons scores for genome-wide alignments of 44 vertebrate species.
Affiliation(s)
- Katherine S Pollard
- Gladstone Institutes, University of California, San Francisco, San Francisco, California 94158, USA.
|
132
|
Increased rate of human mutations where DNA and RNA polymerases collide. Trends Genet 2009; 25:523-7. [PMID: 19853958 DOI: 10.1016/j.tig.2009.10.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2009] [Revised: 10/05/2009] [Accepted: 10/05/2009] [Indexed: 12/27/2022]
Abstract
Gene density and orientation of genes in eukaryotes seem to be correlated with the replication origin and the mutation rate is greater in late replicating regions; however, the reason for these patterns is unknown. Here, we investigate predicted replication origins in the human genome and find that levels of polymorphism as well as divergence from the chimpanzee genome are greater in genes transcribed on the lagging strand than those on the leading strand. This might be caused by interference between RNA and DNA polymerases, and avoidance of collisions between these enzymes might be an evolutionary force shaping gene orientation and density surrounding replication start sites. Physical constraints might have a larger influence on genome evolution than previously thought.
|
133
|
Yap VB, Lindsay H, Easteal S, Huttley G. Estimates of the effect of natural selection on protein-coding content. Mol Biol Evol 2009; 27:726-34. [PMID: 19815689 PMCID: PMC2822286 DOI: 10.1093/molbev/msp232] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open
Abstract
Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease gave significant discrepancies in estimates with approximately 10-30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.
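The interpretation of omega stated in this abstract can be sketched in a few lines. This is a minimal illustration only: the function name `classify_omega` and the tolerance `tol` around 1 are assumptions introduced here, and it does not implement the conditional-nucleotide-frequency model the authors propose.

```python
def classify_omega(dn: float, ds: float, tol: float = 0.05) -> str:
    """Classify the selection regime from nonsynonymous (dn) and
    synonymous (ds) substitution rates.

    tol is a hypothetical tolerance for treating omega as equal to 1;
    a real analysis would use a statistical test instead.
    """
    if ds <= 0:
        raise ValueError("ds must be positive to form the ratio dn/ds")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "neutral"      # omega = 1
    if omega < 1.0:
        return "purifying"    # omega < 1
    return "positive"         # omega > 1


if __name__ == "__main__":
    for dn, ds in [(0.1, 0.5), (0.3, 0.3), (0.6, 0.3)]:
        print(dn, ds, classify_omega(dn, ds))
```

A point estimate of omega says nothing about its uncertainty, which is one reason the choice of substitution model discussed in the abstract matters.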
Affiliation(s)
- Von Bing Yap
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore.
|
134
|
Polak P, Arndt PF. Long-range bidirectional strand asymmetries originate at CpG islands in the human genome. Genome Biol Evol 2009; 1:189-97. [PMID: 20333189 PMCID: PMC2817419 DOI: 10.1093/gbe/evp024] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/22/2009] [Indexed: 12/24/2022] Open
Abstract
In the human genome, CpG islands (CGIs), which are GC- and CpG-rich sequences, are associated with transcription starting sites (TSSs); in addition, there is evidence that CGIs harbor origins of bidirectional replication (OBRs) and are preferred sites for heteroduplex formation during recombination. Transcription, replication, and recombination processes are known to induce specific mutational patterns in various genomes, and therefore, these patterns are expected to be found around CGIs. We use triple alignments of human, chimp, and macaque to compute the rates of nucleotide substitutions in up to 1 Mbp long intergenic regions on both sides of CGIs. Our analysis revealed that around a CGI there is an asymmetry between complementary substitution rates that is similar to the one found around the OBR in bacteria. We hypothesize that these asymmetries are induced by differences in the replication of the leading and lagging strand and that a significant number of CGIs overlap OBRs. Within CGIs, we observed a mutational signature of GC-biased gene conversion that is associated with recombination. We suggest that recombination has played a major role in the creation of CGIs.
Affiliation(s)
- Paz Polak
- Max Planck Institute for Molecular Genetics, Berlin, Germany.
|
135
|
Poptsova MS, Larionov SA, Ryadchenko EV, Rybalko SD, Zakharov IA, Loskutov A. Hidden chromosome symmetry: in silico transformation reveals symmetry in 2D DNA walk trajectories of 671 chromosomes. PLoS One 2009; 4:e6396. [PMID: 19636424 PMCID: PMC2712679 DOI: 10.1371/journal.pone.0006396] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2009] [Accepted: 06/23/2009] [Indexed: 11/18/2022] Open
Abstract
Maps of 2D DNA walk of 671 examined chromosomes show composition complexity change from symmetrical half-turn in bacteria to pseudo-random trajectories in archaea, fungi and humans. In silico transformation of gene order and strand position returns most of the analyzed chromosomes to a symmetrical bacterial-like state with one transition point. The transformed chromosomal sequences also reveal remarkable segmental compositional symmetry between regions from different strands located equidistantly from the transition point. Despite extensive chromosome rearrangement the relation of gene numbers on opposite strands for chromosomes of different taxa varies in narrow limits around unity with Pearson coefficient r = 0.98. Similar relation is observed for total genes' length (r = 0.86) and cumulative GC (r = 0.95) and AT (r = 0.97) skews. This is also true for human coding sequences (CDS), which comprise only several percent of the entire chromosome length. We found that frequency distributions of the length of gene clusters, continuously located on the same strand, have close values for both strands. Eukaryotic gene distribution is believed to be non-random. Contribution of different subsystems to the noted symmetries and distributions, and evolutionary aspects of symmetry are discussed.
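A 2D DNA walk and the associated skews can be sketched as follows. The step mapping used here (T and A move along x, G and C move along y) is an assumption chosen so that the walk coordinates equal the cumulative TA and GC differences; the convention used in the paper may differ, and the function names are hypothetical.

```python
# Assumed step convention: x tracks cumulative (T - A), y tracks cumulative (G - C).
STEPS = {"A": (-1, 0), "T": (1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk(seq: str) -> list[tuple[int, int]]:
    """Return the 2D walk trajectory for a nucleotide sequence."""
    x = y = 0
    path = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS.get(base, (0, 0))  # ambiguous bases such as N do not move
        x += dx
        y += dy
        path.append((x, y))
    return path


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C), or 0.0 when the sequence has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


if __name__ == "__main__":
    example = "GGCATTGC"
    print(dna_walk(example)[-1], gc_skew(example))
```

Under this mapping, a single change of direction in the trajectory corresponds to a switch in the sign of the skew, which is the kind of transition point the abstract describes for bacterial chromosomes.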
Affiliation(s)
- Maria S Poptsova
- University of Connecticut, Storrs, Connecticut, United States of America.
|
136
|
Understanding what determines the frequency and pattern of human germline mutations. Nat Rev Genet 2009; 10:478-88. [PMID: 19488047 DOI: 10.1038/nrg2529] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Surprising findings about human germline mutation have come from applying new technologies to detect rare mutations in germline DNA, from analysing DNA sequence divergence between humans and closely related species, and from investigating human polymorphic variation. In this Review we discuss how these approaches affect our current understanding of the roles of sex, age, mutation hot spots, germline selection and genomic factors in determining human nucleotide substitution mutation patterns and frequencies. To enhance our understanding of mutation and disease, more extensive molecular data on the human germ line with regard to mutation origin, DNA repair, epigenetic status and the effect of newly arisen mutations on gamete development are needed.
|
137
|
Kvikstad EM, Chiaromonte F, Makova KD. Ride the wavelet: A multiscale analysis of genomic contexts flanking small insertions and deletions. Genome Res 2009; 19:1153-64. [PMID: 19502380 DOI: 10.1101/gr.088922.108] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Recent studies have revealed that insertions and deletions (indels) are more different in their formation than previously assumed. What remains enigmatic is how the local DNA sequence context contributes to these differences. To investigate the relative impact of various molecular mechanisms to indel formation, we analyzed sequence contexts of indels in the non protein- or RNA-coding, nonrepetitive (NCNR) portion of the human genome. We considered small (≤30-bp) indels occurring in the human lineage since its divergence from chimpanzee and used wavelet techniques to study, simultaneously for multiple scales, the spatial patterns of short sequence motifs associated with indel mutagenesis. In particular, we focused on motifs associated with DNA polymerase activity, topoisomerase cleavage, double-strand breaks (DSBs), and their repair. We came to the following conclusions. First, many motifs are characterized by unique enrichment profiles in the vicinity of indels vs. indel-free portions of the genome, verifying the importance of sequence context in indel mutagenesis. Second, only limited similarity in motif frequency profiles is evident flanking insertions vs. deletions, confirming differences in their mutagenesis. Third, substantial similarity in frequency profiles exists between pairs of individual motifs flanking insertions (and separately deletions), suggesting "cooperation" among motifs, and thus molecular mechanisms, during indel formation. Fourth, the wavelet analyses demonstrate that all these patterns are highly dependent on scale (the size of an interval considered). Finally, our results depict a model of indel mutagenesis comprising both replication and recombination (via repair of paused replication forks and site-specific recombination).
Affiliation(s)
- Erika M Kvikstad
- Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania 16802, USA
|
138
|
Pink CJ, Swaminathan SK, Dunham I, Rogers J, Ward A, Hurst LD. Evidence that replication-associated mutation alone does not explain between-chromosome differences in substitution rates. Genome Biol Evol 2009; 1:13-22. [PMID: 20333173 PMCID: PMC2817397 DOI: 10.1093/gbe/evp001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/05/2009] [Indexed: 12/12/2022] Open
Abstract
Since Haldane first noticed an excess of paternally derived mutations, it has been considered that most mutations derive from errors during germ line replication. Miyata et al. (1987) proposed that differences in the rate of neutral evolution on X, Y, and autosome can be employed to measure the extent of this male bias. This commonly applied method assumes replication to be the sole source of between-chromosome variation in substitution rates. We propose a simple test of this assumption: If true, estimates of the male bias should be independent of which two chromosomal classes are compared. Prior evidence from rodents suggested that this might not be true, but conclusions were limited by a lack of rat Y-linked sequence. We therefore sequenced two rat Y-linked bacterial artificial chromosomes and determined evolutionary rate by comparison with mouse. For estimation of rates we consider both introns and synonymous rates. Surprisingly, for both data sets the prediction of congruent estimates of alpha is strongly rejected. Indeed, some comparisons suggest a female bias with autosomes evolving faster than Y-linked sequence. We conclude that the method of Miyata et al. (1987) has the potential to provide incorrect estimates. Correcting the method requires understanding of the other causes of substitution that might differ between chromosomal classes. One possible cause is recombination-associated substitution bias for which we find some evidence. We note that if, as some suggest, this association is dominantly owing to male recombination, the high estimates of alpha seen in birds are to be expected as Z chromosomes recombine in males.
Affiliation(s)
- Catherine J Pink
- Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom
|
139
|
Abstract
Why are some genomic positions more mutable than others? The identification of cryptic mutation hotspots in the human genome indicates that the determinants of mutation rates are more complex than anticipated.
Affiliation(s)
- Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, France.
|
140
|
Hodgkinson A, Ladoukakis E, Eyre-Walker A. Cryptic variation in the human mutation rate. PLoS Biol 2009; 7:e1000027. [PMID: 19192947 PMCID: PMC2634788 DOI: 10.1371/journal.pbio.1000027] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2008] [Accepted: 12/12/2008] [Indexed: 11/18/2022] Open
Abstract
The mutation rate is known to vary between adjacent sites within the human genome as a consequence of context, the most well-studied example being the influence of CpG dinucleotides. We investigated whether there is additional variation by testing whether there is an excess of sites at which both humans and chimpanzees have a single-nucleotide polymorphism (SNP). We found a highly significant excess of such sites, and we demonstrated that this excess is not due to neighbouring nucleotide effects, ancestral polymorphism, or natural selection. We therefore infer that there is cryptic variation in the mutation rate. However, although this variation in the mutation rate is not associated with the adjacent nucleotides, we show that there are highly nonrandom patterns of nucleotides that extend approximately 80 base pairs on either side of sites with coincident SNPs, suggesting that there are extensive and complex context effects. Finally, we estimate the level of variation needed to produce the excess of coincident SNPs and show that there is a similar, or higher, level of variation in the mutation rate associated with this cryptic process than there is associated with adjacent nucleotides, including the CpG effect. We conclude that there is substantial variation in the mutation rate that has, until now, been hidden from view.
|
141
|
Sasaki S, Mello CC, Shimada A, Nakatani Y, Hashimoto SI, Ogawa M, Matsushima K, Gu SG, Kasahara M, Ahsan B, Sasaki A, Saito T, Suzuki Y, Sugano S, Kohara Y, Takeda H, Fire A, Morishita S. Chromatin-associated periodicity in genetic variation downstream of transcriptional start sites. Science 2009; 323:401-4. [PMID: 19074313 PMCID: PMC2757552 DOI: 10.1126/science.1163183] [Citation(s) in RCA: 108] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
Might DNA sequence variation reflect germline genetic activity and underlying chromatin structure? We investigated this question using medaka (Japanese killifish, Oryzias latipes), by comparing the genomic sequences of two strains (Hd-rR and HNI) and by mapping approximately 37.3 million nucleosome cores from Hd-rR blastulae and 11,654 representative transcription start sites from six embryonic stages. We observed a distinctive approximately 200-base pair (bp) periodic pattern of genetic variation downstream of transcription start sites; the rate of insertions and deletions longer than 1 bp peaked at positions of approximately +200, +400, and +600 bp, whereas the point mutation rate showed corresponding valleys. This approximately 200-bp periodicity was correlated with the chromatin structure, with nucleosome occupancy minimized at positions 0, +200, +400, and +600 bp. These data exemplify the potential for genetic activity (transcription) and chromatin structure to contribute to molding the DNA sequence on an evolutionary time scale.
Affiliation(s)
- Shin Sasaki
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, 277-0882, Japan
|
142
|
Necsulea A, Guillet C, Cadoret JC, Prioleau MN, Duret L. The relationship between DNA replication and human genome organization. Mol Biol Evol 2009; 26:729-41. [PMID: 19126867 DOI: 10.1093/molbev/msn303] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
Assessment of the impact of DNA replication on genome architecture in Eukaryotes has long been hampered by the scarcity of experimental data. Recent work, relying on computational predictions of origins of replication, suggested that replication might be a major determinant of gene organization in human (Huvet et al. 2007. Human gene organization driven by the coordination of replication and transcription. Genome Res. 17:1278-1285). Here, we address this question by analyzing the first large-scale data set of experimentally determined origins of replication in human: 283 origins identified in HeLa cells, in 1% of the genome covered by ENCODE regions (Cadoret et al. 2008. Genome-wide studies highlight indirect links between human replication origins and gene regulation. Proc Natl Acad Sci USA. 105:15837-15842). We show that origins of replication are not randomly distributed as they display significant overlap with promoter regions and CpG islands. The hypothesis of a selective pressure to avoid frontal collisions between replication and transcription polymerases is not supported by experimental data as we find no evidence for gene orientation bias in the proximity of origins of replication. The lack of a significant orientation bias remains manifest even when considering only genes expressed at a high rate, or in a wide number of tissues, and is not affected by the regional replication timing. Gene expression breadth does not appear to be correlated with the distance from the origins of replication. We conclude that the impact of DNA replication on human genome organization is considerably weaker than previously proposed.
|
143
|
Lindsay H, Yap VB, Ying H, Huttley GA. Pitfalls of the most commonly used models of context dependent substitution. Biol Direct 2008; 3:52. [PMID: 19087239 PMCID: PMC2628887 DOI: 10.1186/1745-6150-3-52] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2008] [Accepted: 12/16/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Neighboring nucleotides exert a striking influence on mutation, with the hypermutability of CpG dinucleotides in many genomes being an exemplar. Among the approaches employed to measure the relative importance of sequence neighbors on molecular evolution have been continuous-time Markov process models for substitutions that treat sequences as a series of independent tuples. The most widely used examples are the codon substitution models. We evaluated the suitability of derivatives of the nucleotide frequency weighted (hereafter NF) and tuple frequency weighted (hereafter TF) models for measuring sequence context dependent substitution. Critical properties we address are their relationships to an independent nucleotide process and the robustness of parameter estimation to changes in sequence composition. We then consider the impact on inference concerning dinucleotide substitution processes from application of these two forms to intron sequence alignments from primates. RESULTS We prove that the NF form always nests the independent nucleotide process and that this is not true for the TF form. As a consequence, using TF to study context effects can be misleading, which is shown by both theoretical calculations and simulations. We describe a simple example where a context parameter estimated under TF is confounded with composition terms unless all sequence states are equi-frequent. We illustrate this for the dinucleotide case by simulation under a nucleotide model, showing that the TF form identifies a CpG effect when none exists. Our analysis of primate introns revealed that the effect of nucleotide neighbors is over-estimated under TF compared with NF. Parameter estimates for a number of contexts are also strikingly discordant between the two model forms. CONCLUSION Our results establish that the NF form should be used for analysis of independent-tuple context dependent processes. 
Although neighboring effects in general are still important, prominent influences such as the elevated CpG transversion rate previously identified using the TF form are an artifact. Our results further suggest as few as 5 parameters may account for approximately 85% of neighboring nucleotide influence.
Affiliation(s)
- Helen Lindsay
- Computational Genomics Laboratory, John Curtin School of Medical Research, The Australian National University, Canberra, Australia.
|
144
|
Mugal CF, von Grünberg HH, Peifer M. Transcription-induced mutational strand bias and its effect on substitution rates in human genes. Mol Biol Evol 2008; 26:131-42. [PMID: 18974087 DOI: 10.1093/molbev/msn245] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
If substitution rates are not the same on the two complementary DNA strands, a substitution is considered strand asymmetric. Such substitutional strand asymmetries are determined here for the three most frequent types of substitution on the human genome (C --> T, A --> G, and G --> T). Substitution rate differences between both strands are estimated for 4,590 human genes by aligning all repeats occurring within the introns with their ancestral consensus sequences. For 1,630 of these genes, both coding strand and noncoding strand rates could be compared with rates in gene-flanking regions. All three rates considered are found to be on average higher on the coding strand and lower on the transcribed strand in comparison to their values in the gene-flanking regions. This finding points to the simultaneous action of rate-increasing effects on the coding strand (such as increased adenine and cytosine deamination) and transcription-coupled repair as a rate-reducing effect on the transcribed strand. The common behavior of the three rates leads to strong correlations of the rate asymmetries: Whenever one rate is strand biased, the other two rates are likely to show the same bias. Furthermore, we determine all three rate asymmetries as a function of time: the A --> G and G --> T rate asymmetries are both found to be constant in time, whereas the C --> T rate asymmetry shows a pronounced time dependence, an observation that explains the difference between our results and those of an earlier work by Green et al. (2003. Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 33:514-517.). Finally, we show that, in addition to transcription, the replication process also biases the substitution rates in genes.
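The notion of a strand-asymmetric substitution can be made concrete with a small sketch. A substitution on one strand reads as its complement on the other (C --> T corresponds to G --> A), so comparing the two rates measured on the same reference strand exposes the asymmetry. The normalized difference used below is an assumption chosen for illustration, not the estimator used in the paper, and all names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary(sub: tuple[str, str]) -> tuple[str, str]:
    """Return the substitution as it reads on the opposite strand."""
    ancestral, derived = sub
    return (COMPLEMENT[ancestral], COMPLEMENT[derived])


def strand_asymmetry(rates: dict[tuple[str, str], float],
                     sub: tuple[str, str]) -> float:
    """Normalized difference between a rate and its complementary rate.

    Returns a value in [-1, 1]; 0 means the two strands are symmetric.
    """
    r = rates[sub]
    r_comp = rates[complementary(sub)]
    total = r + r_comp
    return (r - r_comp) / total if total else 0.0


if __name__ == "__main__":
    # Made-up rates, for demonstration only.
    rates = {("C", "T"): 3.0, ("G", "A"): 1.0}
    print(strand_asymmetry(rates, ("C", "T")))
```

A positive value for C --> T on the coding strand would be in the direction the abstract reports, with the rate higher on the coding strand than on the transcribed strand.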
Affiliation(s)
- Carina F Mugal
- Institute of Chemistry, Karl-Franzens University Graz, Graz, Austria
|
145
|
Marín A, Xia X. GC skew in protein-coding genes between the leading and lagging strands in bacterial genomes: New substitution models incorporating strand bias. J Theor Biol 2008; 253:508-13. [DOI: 10.1016/j.jtbi.2008.04.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2007] [Revised: 02/29/2008] [Accepted: 04/04/2008] [Indexed: 10/22/2022]
|
146
|
Abstract
A regional analysis of nucleotide substitution rates along human genes and their flanking regions allows us to quantify the effect of mutational mechanisms associated with transcription in germ line cells. Our analysis reveals three distinct patterns of substitution rates. First, a sharp decline in the deamination rate of methylated CpG dinucleotides, which is observed in the vicinity of the 5' end of genes. Second, a strand asymmetry in complementary substitution rates, which extends from the 5' end to 1 kbp downstream from the 3' end, associated with transcription-coupled repair. Finally, a localized strand asymmetry, an excess of C-->T over G-->A substitution in the nontemplate strand confined to the first 1-2 kbp downstream of the 5' end of genes. We hypothesize that higher exposure of the nontemplate strand near the 5' end of genes leads to a higher cytosine deamination rate. Up to now, only the somatic hypermutation (SHM) pathway has been known to mediate localized and strand-specific mutagenic processes associated with transcription in mammalia. The mutational patterns in SHM are induced by cytosine deaminase, which just targets single-stranded DNA. This DNA conformation is induced by R-loops, which preferentially occur at the 5' ends of genes. We predict that R-loops are extensively formed in the beginning of transcribed regions in germ line cells.
|
147
|
RNA landscape of evolution for optimal exon and intron discrimination. Proc Natl Acad Sci U S A 2008; 105:5797-802. [PMID: 18391195 DOI: 10.1073/pnas.0801692105] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Accurate pre-mRNA splicing requires primary splicing signals, including the splice sites, a polypyrimidine tract, and a branch site, as well as other splicing-regulatory elements (SREs). The SREs include exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs), and intronic splicing silencers (ISSs), which are typically located near the splice sites. However, it is unclear to what extent splicing-driven selective pressure constrains exonic and intronic sequences, especially those distant from the splice sites. Here, we studied the distribution of SREs in human genes in terms of DNA strand-asymmetry patterns. Under a neutral evolution model, each mononucleotide or oligonucleotide should have a symmetric (Chargaff's second parity rule), or weakly asymmetric yet uniform, distribution throughout a pre-mRNA transcript. However, we found that large sets of unbiased, experimentally determined SREs show a distinct strand-asymmetry pattern that is inconsistent with the neutral evolution model, and reflects their functional roles in splicing. ESEs are selected in exons and depleted in introns and vice versa for ESSs. Surprisingly, this trend extends into deep intronic sequences, accounting for one third of the genome. Selection is detectable even at the mononucleotide level, so that the asymmetric base compositions of exons and introns are predictive of ESEs and ESSs. We developed a method that effectively predicts SREs based on strand asymmetry, expanding the current catalog of SREs. Our results suggest that human genes have been optimized for exon and intron discrimination through an RNA landscape shaped during evolution.
|
148
|
Aguilera A, Gómez-González B. Genome instability: a mechanistic view of its causes and consequences. Nat Rev Genet 2008; 9:204-17. [PMID: 18227811 DOI: 10.1038/nrg2268] [Citation(s) in RCA: 555] [Impact Index Per Article: 34.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Genomic instability in the form of mutations and chromosome rearrangements is usually associated with pathological disorders, and yet it is also crucial for evolution. Two types of elements have a key role in instability leading to rearrangements: those that act in trans to prevent instability--among them are replication, repair and S-phase checkpoint factors--and those that act in cis--chromosomal hotspots of instability such as fragile sites and highly transcribed DNA sequences. Taking these elements as a guide, we review the causes and consequences of instability with the aim of providing a mechanistic perspective on the origin of genomic instability.
Affiliation(s)
- Andrés Aguilera
- Centro Andaluz de Biologia Molecular y Medicina Regenerativa CABIMER, Universidad de Sevilla-CSIC, Avd. Américo Vespucio s/n, 41092 Sevilla, Spain.
|
149
|
Krishnan NM, Seligmann H, Rao BJ. Relationship between mRNA secondary structure and sequence variability in Chloroplast genes: possible life history implications. BMC Genomics 2008; 9:48. [PMID: 18226235 PMCID: PMC2276208 DOI: 10.1186/1471-2164-9-48] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 08/16/2007] [Accepted: 01/28/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Synonymous sites are freer to vary because of redundancy in the genetic code. Messenger RNA secondary structure restricts this freedom, as revealed by previous findings in mitochondrial genes that mutations at third codon position nucleotides in helices are more strongly selected against than those in loops. This motivated us to explore the constraints imposed by mRNA secondary structure on evolutionary variability at all codon positions in chloroplast systems. RESULTS: We found that the evolutionary variability and intrinsic secondary structure stability of these sequences share an inverse relationship. Simulations of most likely single nucleotide evolution in Psilotum nudum and Nephroselmis olivacea mRNAs indicate that helix-forming propensities of mutated mRNAs are greater than those of the natural mRNAs for short sequences and vice versa for long sequences. Moreover, helix-forming propensity, estimated by the percentage of total mRNA in helices, increases gradually with mRNA length, saturating beyond 1000 nucleotides. Protection levels of functionally important sites vary across plants and proteins: r-strategists minimize mutation costs in large genes; K-strategists do the opposite. CONCLUSION: mRNA length presumably predisposes shorter mRNAs to evolve under different constraints than longer mRNAs. The positive correlation between secondary structure protection and functional importance of sites suggests that some sites might be conserved due to packing-protection constraints at the nucleic acid level in addition to protein level constraints. Consequently, nucleic acid secondary structure a priori biases mutations. The converse (exposure of conserved sites) apparently occurs in a smaller number of cases, indicating a different evolutionary adaptive strategy in these plants. The differences between the protection levels of functionally important sites for r- and K-strategists reflect their respective molecular adaptive strategies. These converge with increasing domestication levels of K-strategists, perhaps because domestication increases reproductive output.
Affiliation(s)
- Neeraja M Krishnan
- Department of Biological Sciences, Tata Institute of Fundamental Research, 1 Homi Bhabha road, Colaba, Mumbai 400005, India.
|
150
|
Evans KJ. Genomic DNA from animals shows contrasting strand bias in large and small subsequences. BMC Genomics 2008; 9:43. [PMID: 18221531 PMCID: PMC2267173 DOI: 10.1186/1471-2164-9-43] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/07/2007] [Accepted: 01/25/2008] [Indexed: 01/09/2023] Open
Abstract
Background: For eukaryotes, there is almost no strand bias with regard to base composition, with exceptions for origins of replication, transcription start sites and transcribed regions. This paper revisits the question for subsequences of DNA taken at random from the genome. Results: For a typical mammal, for example mouse or human, there is a small strand bias throughout the genomic DNA: there is a correlation between (G - C) and (A - T) on the same strand (that is, between the difference in the number of guanine and cytosine bases and the difference in the number of adenine and thymine bases). For small subsequences – up to 1 kb – this correlation is weak but positive; but for large windows – around 50 kb to 2 Mb – the correlation is strong and negative. This effect is largely independent of GC%. Transcribed and untranscribed regions give similar correlations both for small and large subsequences, but there is a difference in these regions for intermediate sized subsequences. An analysis of the human genome showed that position within the isochore structure did not affect these correlations. An analysis of available genomes of different species shows that this contrast between large and small windows is a general feature of mammals and birds. Further down the evolutionary tree, other organisms show a similar but smaller effect. Except for the nematode, all the animals analysed showed at least a small effect. Conclusion: The correlations on the large scale may be explained by DNA replication. Transcription may be a modifier of these effects but is not the fundamental cause. These results cast light on how DNA mutations affect the genome over evolutionary time. At least for vertebrates, there is a broad relationship between body temperature and the size of the correlation. The genome of mammals and birds has a structure marked by strand bias segments.
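The quantity described in the abstract, a correlation between (G - C) and (A - T) across subsequences of a given size, can be sketched as follows. This is a minimal illustration only: the use of consecutive non-overlapping windows, the Pearson coefficient, and all function names are assumptions made for the example, since the abstract says subsequences were taken at random and does not state the exact estimator.

```python
from math import sqrt


def window_differences(seq, window):
    """Return (G - C, A - T) base-count differences for consecutive,
    non-overlapping windows of the given size (an assumed windowing scheme).
    A trailing partial window is ignored."""
    seq = seq.upper()
    diffs = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g_minus_c = chunk.count("G") - chunk.count("C")
        a_minus_t = chunk.count("A") - chunk.count("T")
        diffs.append((g_minus_c, a_minus_t))
    return diffs


def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either series is constant."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def strand_bias_correlation(seq, window):
    """Correlate (G - C) with (A - T) across windows of one strand."""
    diffs = window_differences(seq, window)
    return pearson([d[0] for d in diffs], [d[1] for d in diffs])
```

Running `strand_bias_correlation` on the same sequence with a small window (for example 1,000 bases) and a large one (for example 100,000 bases) is one way to look for the sign change between scales that the abstract reports.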
Affiliation(s)
- Kenneth J Evans
- School of Crystallography, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK.
|