1
Teterina AA, Willis JH, Lukac M, Jovelin R, Cutter AD, Phillips PC. Genomic diversity landscapes in outcrossing and selfing Caenorhabditis nematodes. PLoS Genet 2023; 19:e1010879. [PMID: 37585484 PMCID: PMC10461856 DOI: 10.1371/journal.pgen.1010879] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/28/2022] [Revised: 08/28/2023] [Accepted: 07/21/2023] [Indexed: 08/18/2023] Open
Abstract
Caenorhabditis nematodes form an excellent model for studying how the mode of reproduction affects genetic diversity, as some species reproduce via outcrossing whereas others can self-fertilize. Currently, chromosome-level patterns of diversity and recombination are only available for self-reproducing Caenorhabditis, making the generality of genomic patterns across the genus unclear given the profound potential influence of reproductive mode. Here we present a whole-genome diversity landscape, coupled with a new genetic map, for the outcrossing nematode C. remanei. We demonstrate that the genomic distribution of recombination in C. remanei, like the model nematode C. elegans, shows high recombination rates on chromosome arms and low rates toward the central regions. Patterns of genetic variation across the genome are also similar between these species, but differ dramatically in scale, being tenfold greater for C. remanei. Historical reconstructions of variation in effective population size over the past million generations echo this difference in polymorphism. Evolutionary simulations demonstrate how selection, recombination, mutation, and selfing shape variation along the genome, and that multiple drivers can produce patterns similar to those observed in natural populations. The results illustrate how genome organization and selection play a crucial role in shaping the genomic pattern of diversity whereas demographic processes scale the level of diversity across the genome as a whole.
Affiliation(s)
- Anastasia A. Teterina
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
- Center of Parasitology, Severtsov Institute of Ecology and Evolution RAS, Moscow, Russia
- John H. Willis
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
- Matt Lukac
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
- Richard Jovelin
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Asher D. Cutter
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Patrick C. Phillips
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
2
Seemann SE, Mirza AH, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Workman CT, Pociot F, Tommerup N, Gorodkin J, Ruzzo WL. OUP accepted manuscript. Nucleic Acids Res 2022; 50:2452-2463. [PMID: 35188540 PMCID: PMC8934657 DOI: 10.1093/nar/gkac067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 01/07/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022] Open
Abstract
Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA’s structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) basepair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences—defined as structures where the primary sequences of human and mouse have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although these events are relatively rare.
Affiliation(s)
- Aashiq H Mirza
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Claus H Bang-Berthelsen
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Christian Garde
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Christopher T Workman
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Center for Biological Sequence Analysis, Technical University of Denmark, Denmark
- Flemming Pociot
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Niels Tommerup
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Department of Cellular and Molecular Medicine (ICMM), University of Copenhagen, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Department of Veterinary and Animal Sciences, University of Copenhagen, Denmark
- Walter L Ruzzo
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Computer Science and Engineering and Genome Sciences, University of Washington, USA
- Fred Hutchinson Cancer Research Center, Seattle, USA
3
Guiblet WM, Cremona MA, Harris RS, Chen D, Eckert KA, Chiaromonte F, Huang YF, Makova KD. Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome. Nucleic Acids Res 2021; 49:1497-1516. [PMID: 33450015 PMCID: PMC7897504 DOI: 10.1093/nar/gkaa1269] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 06/05/2020] [Revised: 12/14/2020] [Accepted: 01/11/2021] [Indexed: 12/12/2022] Open
Abstract
Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis with implications to evolution and genetic diseases.
Affiliation(s)
- Wilfried M Guiblet
- Bioinformatics and Genomics Graduate Program, Penn State University, University Park, PA 16802, USA
- Marzia A Cremona
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Operations and Decision Systems, Université Laval, Canada
- CHU de Québec – Université Laval Research Center, Canada
- Robert S Harris
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Di Chen
- Intercollege Graduate Degree Program in Genetics, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA
- Kristin A Eckert
- Department of Pathology, Penn State University, College of Medicine, Hershey, PA 17033, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Yi-Fei Huang
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
4
FERMI: A Novel Method for Sensitive Detection of Rare Mutations in Somatic Tissue. G3-GENES GENOMES GENETICS 2019; 9:2977-2987. [PMID: 31352405 PMCID: PMC6723130 DOI: 10.1534/g3.119.400438] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022]
Abstract
With growing interest in monitoring mutational processes in normal tissues, tumor heterogeneity, and cancer evolution under therapy, the ability to accurately and economically detect ultra-rare mutations is becoming increasingly important. However, this capability has often been compromised by significant sequencing, PCR and DNA preparation error rates. Here, we describe FERMI (Fast Extremely Rare Mutation Identification) - a novel method designed to eliminate the majority of these sequencing and library-preparation errors in order to significantly improve rare somatic mutation detection. This method leverages barcoded targeting probes to capture and sequence DNA of interest with single copy resolution. The variant calls from the barcoded sequencing data are then further filtered in a position-dependent fashion against an adaptive, context-aware null model in order to distinguish true variants. As a proof of principle, we employ FERMI to probe bone marrow biopsies from leukemia patients, and show that rare mutations and clonal evolution can be tracked throughout cancer treatment, including during historically intractable periods like minimum residual disease. Importantly, FERMI is able to accurately detect nascent clonal expansions within leukemias in a manner that may facilitate the early detection and characterization of cancer relapse.
5
Terekhanova NV, Seplyarskiy VB, Soldatov RA, Bazykin GA. Evolution of Local Mutation Rate and Its Determinants. Mol Biol Evol 2017; 34:1100-1109. [PMID: 28138076 PMCID: PMC5850301 DOI: 10.1093/molbev/msx060] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 12/30/2022] Open
Abstract
Mutation rate varies along the human genome, and part of this variation is explainable by measurable local properties of the DNA molecule. Moreover, mutation rates differ between orthologous genomic regions of different species, but the drivers of this change are unclear. Here, we use data on human divergence from chimpanzee, human rare polymorphism, and human de novo mutations to predict the substitution rate at orthologous regions of non-human mammals. We show that the local mutation rates are very similar between human and apes, implying that their variation has a strong underlying cryptic component not explainable by the known genomic features. Mutation rates become progressively less similar in more distant species, and these changes are partially explainable by changes in the local genomic features of orthologous regions, most importantly, in the recombination rate. However, they are much more rapid, implying that the cryptic component underlying the mutation rate is more ephemeral than the known genomic features. These findings shed light on the determinants of mutation rate evolution.
Key words: local mutation rate, molecular evolution, recombination rate.
Affiliation(s)
- Nadezhda V. Terekhanova
- Sector for Molecular Evolution, Institute for Information Transmission Problems of the RAS (Kharkevich Institute), Moscow, Russia
- M. V. Lomonosov Moscow State University, Moscow, Russia
- Vladimir B. Seplyarskiy
- Sector for Molecular Evolution, Institute for Information Transmission Problems of the RAS (Kharkevich Institute), Moscow, Russia
- Ruslan A. Soldatov
- Sector for Molecular Evolution, Institute for Information Transmission Problems of the RAS (Kharkevich Institute), Moscow, Russia
- M. V. Lomonosov Moscow State University, Moscow, Russia
- Georgii A. Bazykin
- Sector for Molecular Evolution, Institute for Information Transmission Problems of the RAS (Kharkevich Institute), Moscow, Russia
- M. V. Lomonosov Moscow State University, Moscow, Russia
- Skolkovo Institute of Science and Technology, Skolkovo, Russia
6
Stephens ZD, Hudson ME, Mainzer LS, Taschuk M, Weber MR, Iyer RK. Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models. PLoS One 2016; 11:e0167047. [PMID: 27893777 PMCID: PMC5125660 DOI: 10.1371/journal.pone.0167047] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 05/15/2016] [Accepted: 11/08/2016] [Indexed: 12/31/2022] Open
Abstract
An obstacle to validating and benchmarking methods for genome analysis is that there are few reference datasets available for which the “ground truth” about the mutational landscape of the sample genome is known and fully validated. Additionally, the free and public availability of real human genome datasets is incompatible with the preservation of donor privacy. In order to better analyze and understand genomic data, we need test datasets that model all variants, reflecting known biology as well as sequencing artifacts. Read simulators can fulfill this requirement, but are often criticized for limited resemblance to true data and overall inflexibility. We present NEAT (NExt-generation sequencing Analysis Toolkit), a set of tools that not only includes an easy-to-use read simulator, but also scripts to facilitate variant comparison and tool evaluation. NEAT has a wide variety of tunable parameters which can be set manually on the default model or parameterized using real datasets. The software is freely available at github.com/zstephens/neat-genreads.
Affiliation(s)
- Zachary D. Stephens
- Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Matthew E. Hudson
- Department of Crop Sciences, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Institute for Genomic Biology, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Liudmila S. Mainzer
- Institute for Genomic Biology, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- National Center for Supercomputing Applications, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Morgan Taschuk
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Matthew R. Weber
- National Center for Supercomputing Applications, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Ravishankar K. Iyer
- Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
7
Sun L, Zhang Y, Zhang Z, Zheng Y, Du L, Zhu B. Preferential Protection of Genetic Fidelity within Open Chromatin by the Mismatch Repair Machinery. J Biol Chem 2016; 291:17692-705. [PMID: 27382058 DOI: 10.1074/jbc.m116.719971] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 02/04/2016] [Indexed: 12/30/2022] Open
Abstract
Epigenetic systems are well known for the roles they play in regulating the differential expression of the same genome in different cell types. However, epigenetic systems can also directly impact genomic integrity by protecting genetic sequences. Using an experimental evolutionary approach, we studied rates of mutation in the fission yeast Schizosaccharomyces pombe strains that lacked genes encoding several epigenetic regulators or mismatch repair components. We report that loss of a functional mismatch repair pathway in S. pombe resulted in the preferential enrichment of mutations in euchromatin, indicating that the mismatch repair machinery preferentially protected genetic fidelity in euchromatin. This preference is probably determined by differences in the accessibility of chromatin at distinct chromatin regions, which is supported by our observations that chromatin accessibility positively correlated with mutation rates in S. pombe or human cancer samples with deficiencies in mismatch repair. Importantly, such positive correlation was not observed in S. pombe strains or human cancer samples with functional mismatch repair machinery.
Affiliation(s)
- Lue Sun
- Tsinghua University-Peking University-National Institute of Biological Sciences Joint Graduate Program, School of Life Sciences, Tsinghua University, Beijing 100084; National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101; National Institute of Biological Sciences, Beijing 102206
- Yan Zhang
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101
- Zhuqiang Zhang
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101
- Yong Zheng
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101
- Lilin Du
- National Institute of Biological Sciences, Beijing 102206
- Bing Zhu
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
8
Natural Selection and Functional Potentials of Human Noncoding Elements Revealed by Analysis of Next Generation Sequencing Data. PLoS One 2015; 10:e0129023. [PMID: 26053627 PMCID: PMC4460046 DOI: 10.1371/journal.pone.0129023] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/04/2014] [Accepted: 05/04/2015] [Indexed: 11/19/2022] Open
Abstract
Noncoding DNA sequences (NCS) have attracted much attention recently due to their functional potentials. Here we attempted to reveal the functional roles of noncoding sequences from the point of view of natural selection, which typically indicates the functional potentials of certain genomic elements. We analyzed nearly 37 million single nucleotide polymorphisms (SNPs) of Phase I data of the 1000 Genomes Project. We estimated a series of key parameters of population genetics and molecular evolution to characterize sequence variations of the noncoding genome within and between populations, and identified the natural selection footprints in NCS in worldwide human populations. Our results showed that purifying selection is prevalent and there is substantial constraint of variations in NCS, while positive selection is more likely to be specific to some particular genomic regions and regional populations. Intriguingly, we observed a larger fraction of non-conserved NCS variants with lower derived allele frequency in the genome, indicating possible functional gain of non-conserved NCS. Notably, NCS elements are enriched for potentially functional markers such as eQTLs, TF motifs, and DNase I footprints in the genome. More interestingly, some NCS variants associated with diseases such as Alzheimer's disease, Type 1 diabetes, and immune-related bowel disorder (IBD) showed signatures of positive selection, although the majority of NCS variants reported as risk alleles by genome-wide association studies showed signatures of negative selection. Our analyses provided compelling evidence of natural selection forces on noncoding sequences in the human genome and advanced our understanding of their functional potentials that play important roles in disease etiology and human evolution.
9
Abstract
Species survival depends on the faithful replication of genetic information, which is continually monitored and maintained by DNA repair pathways that correct replication errors and the thousands of lesions that arise daily from the inherent chemical lability of DNA and the effects of genotoxic agents. Nonetheless, neutrally evolving DNA (not under purifying selection) accumulates base substitutions with time (the neutral mutation rate). Thus, repair processes are not 100% efficient. The neutral mutation rate varies both between and within chromosomes. For example it is 10-50 fold higher at CpGs than at non-CpG positions. Interestingly, the neutral mutation rate at non-CpG sites is positively correlated with CpG content. Although the basis of this correlation was not immediately apparent, some bioinformatic results were consistent with the induction of non-CpG mutations by DNA repair at flanking CpG sites. Recent studies with a model system showed that in vivo repair of preformed lesions (mismatches, abasic sites, single stranded nicks) can in fact induce mutations in flanking DNA. Mismatch repair (MMR) is an essential component for repair-induced mutations, which can occur as distant as 5 kb from the introduced lesions. Most, but not all, mutations involved the C of TpCpN (G of NpGpA) which is the target sequence of the C-preferring single-stranded DNA specific APOBEC deaminases. APOBEC-mediated mutations are not limited to our model system: Recent studies by others showed that some tumors harbor mutations with the same signature, as can intermediates in RNA-guided endonuclease-mediated genome editing. APOBEC deaminases participate in normal physiological functions such as generating mutations that inactivate viruses or endogenous retrotransposons, or that enhance immunoglobulin diversity in B cells. The recruitment of normally physiological error-prone processes during DNA repair would have important implications for disease, aging and evolution. 
This perspective briefly reviews both the bioinformatic and biochemical literature relevant to repair-induced mutagenesis and discusses future directions required to understand the mechanistic basis of this process.
Affiliation(s)
- Jia Chen
- School of Life Science and Technology, ShanghaiTech University, Building 8, 319 Yueyang Road, Shanghai 200031, China
- Anthony V Furano
- Section on Genomic Structure and Function, Laboratory of Cell and Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Building 8, Room 203, 8 Center Drive, MSC 0830, Bethesda, MD 20892-0830, USA.
10
Makova KD, Hardison RC. The effects of chromatin organization on variation in mutation rates in the genome. Nat Rev Genet 2015; 16:213-23. [PMID: 25732611 PMCID: PMC4500049 DOI: 10.1038/nrg3890] [Citation(s) in RCA: 145] [Impact Index Per Article: 16.1] [Indexed: 12/12/2022]
Abstract
The variation in local rates of mutations can affect both the evolution of genes and their function in normal and cancer cells. Deciphering the molecular determinants of this variation will be aided by the elucidation of distinct types of mutations, as they differ in regional preferences and in associations with genomic features. Chromatin organization contributes to regional variation in mutation rates, but its contribution differs among mutation types. In both germline and somatic mutations, base substitutions are more abundant in regions of closed chromatin, perhaps reflecting error accumulation late in replication. By contrast, a distinctive mutational state with very high levels of insertions and deletions (indels) and substitutions is enriched in regions of open chromatin. These associations indicate an intricate interplay between the nucleotide sequence of DNA and its dynamic packaging into chromatin, and have important implications for current biomedical research. This Review focuses on recent studies showing associations between chromatin state and mutation rates, including pairwise and multivariate investigations of germline and somatic (particularly cancer) mutations.
Affiliation(s)
- Kateryna D Makova
- Department of Biology, Huck Institute for Genome Sciences, The Pennsylvania State University, University Park, State College, Pennsylvania 16802, USA
- Ross C Hardison
- Department of Biochemistry and Molecular Biology, Huck Institute for Genome Sciences, The Pennsylvania State University, University Park, State College, Pennsylvania 16802, USA
11
Zhu A, Guo W, Jain K, Mower JP. Unprecedented Heterogeneity in the Synonymous Substitution Rate within a Plant Genome. Mol Biol Evol 2014; 31:1228-36. [DOI: 10.1093/molbev/msu079] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Indexed: 12/12/2022] Open
12
Segmenting the human genome based on states of neutral genetic divergence. Proc Natl Acad Sci U S A 2013; 110:14699-704. [PMID: 23959903 DOI: 10.1073/pnas.1221792110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/21/2023] Open
Abstract
Many studies have demonstrated that divergence levels generated by different mutation types vary and covary across the human genome. To improve our still-incomplete understanding of the mechanistic basis of this phenomenon, we analyze several mutation types simultaneously, anchoring their variation to specific regions of the genome. Using hidden Markov models on insertion, deletion, nucleotide substitution, and microsatellite divergence estimates inferred from human-orangutan alignments of neutrally evolving genomic sequences, we segment the human genome into regions corresponding to different divergence states--each uniquely characterized by specific combinations of divergence levels. We then parsed the mutagenic contributions of various biochemical processes associating divergence states with a broad range of genomic landscape features. We find that high divergence states inhabit guanine- and cytosine (GC)-rich, highly recombining subtelomeric regions; low divergence states cover inner parts of autosomes; chromosome X forms its own state with lowest divergence; and a state of elevated microsatellite mutability is interspersed across the genome. These general trends are mirrored in human diversity data from the 1000 Genomes Project, and departures from them highlight the evolutionary history of primate chromosomes. We also find that genes and noncoding functional marks [annotations from the Encyclopedia of DNA Elements (ENCODE)] are concentrated in high divergence states. Our results provide a powerful tool for biomedical data analysis: segmentations can be used to screen personal genome variants--including those associated with cancer and other diseases--and to improve computational predictions of noncoding functional elements.
13
Characterization of bud emergence 46 (BEM46) protein: sequence, structural, phylogenetic and subcellular localization analyses. Biochem Biophys Res Commun 2013; 438:526-32. [PMID: 23916612 DOI: 10.1016/j.bbrc.2013.07.103] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/17/2013] [Accepted: 07/25/2013] [Indexed: 02/04/2023]
Abstract
The bud emergence 46 (BEM46) protein from Neurospora crassa belongs to the α/β-hydrolase superfamily. Recently, we have reported that the BEM46 protein is localized in the perinuclear ER and also forms spots close to the plasma membrane. The protein appears to be required for cell type-specific polarity formation in N. crassa. Furthermore, initial studies suggested that the BEM46 amino acid sequence is conserved in eukaryotes and is considered to be one of the widespread conserved "known unknown" eukaryotic genes. This warrants a comprehensive phylogenetic analysis of this superfamily to unravel the origin and molecular evolution of these genes in different eukaryotes. Herein, we observe that all eukaryotes have at least a single copy of a bem46 ortholog. Upon scanning of these proteins in various genomes, we find that there are expansions leading to several paralogs in vertebrates. Using comparative genomic analyses, we identified insertions/deletions (indels) in the conserved domain of the BEM46 protein, which allow differentiation of fungal classes such as ascomycetes from basidiomycetes. We also find that exonic indels are able to differentiate BEM46 homologs of different eukaryotic lineages. Furthermore, we show using GFP-fusion tagging experiments that the BEM46 protein from N. crassa possesses a novel endoplasmic-retention signal (PEKK). We propose that three residues, namely a serine 188S, a histidine 292H and an aspartic acid 262D, are the most critical residues, forming a catalytic triad in the BEM46 protein from N. crassa. We carried out a comprehensive study on bem46 genes from a molecular evolution perspective in combination with functional analyses. The evolutionary history of BEM46 proteins is characterized by exonic indels in a lineage-specific manner.
14
Gossmann TI, Keightley PD, Eyre-Walker A. The effect of variation in the effective population size on the rate of adaptive molecular evolution in eukaryotes. Genome Biol Evol 2012; 4:658-67. [PMID: 22436998 PMCID: PMC3381672 DOI: 10.1093/gbe/evs027] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Indexed: 02/07/2023] Open
Abstract
The role of adaptation is a fundamental question in molecular evolution. Theory predicts that species with large effective population sizes should undergo a higher rate of adaptive evolution than species with low effective population sizes if adaptation is limited by the supply of mutations. Previous analyses have appeared to support this conjecture because estimates of the proportion of nonsynonymous substitutions fixed by adaptive evolution, α, tend to be higher in species with large Ne. However, α is a function of both the number of advantageous and effectively neutral substitutions, either of which might depend on Ne. Here, we investigate the relationship between Ne and ωa, the rate of adaptive evolution relative to the rate of neutral evolution, using nucleotide polymorphism and divergence data from 13 independent pairs of eukaryotic species. We find a highly significant positive correlation between ωa and Ne. We also find some evidence that the rate of adaptive evolution varies between groups of organisms for a given Ne. The correlation between ωa and Ne does not appear to be an artifact of demographic change or selection on synonymous codon use. Our results suggest that adaptation is to some extent limited by the supply of mutations and that at least some adaptation depends on newly occurring mutations rather than on standing genetic variation. Finally, we show that the proportion of nearly neutral nonadaptive substitutions declines with increasing Ne. The low rate of adaptive evolution and the high proportion of effectively neutral substitution in species with small Ne are expected to combine to make it difficult to detect adaptive molecular evolution in species with small Ne.
Affiliation(s)
- Toni I Gossmann
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
15
Ponting CP, Nellåker C, Meader S. Rapid turnover of functional sequence in human and other genomes. Annu Rev Genomics Hum Genet 2011; 12:275-99. [PMID: 21721940 DOI: 10.1146/annurev-genom-090810-183115] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
The amount of a genome's sequence that is functional has been surprisingly difficult to estimate accurately. This has severely hindered analyses asking whether the amount of functional genomic sequence correlates with organismal complexity. Most studies estimate these amounts by considering nucleotide substitution rates within aligned sequences. These approaches show reduced power to identify sequence that is aligned, functional, and constrained only within narrowly defined phyla. The neutral indel model exploits insertions or deletions (indels) rather than substitutions in predicting functional sequence. Surprisingly, this method indicates that half of all functional sequence is specific to individual eutherian lineages. This review considers the rates at which coding or noncoding and functional or nonfunctional sequence changes among mammalian genomes. In contrast to the slow rate at which protein-coding sequence changes, functional noncoding sequence appears to change or be turned over at rapid rates in mammals.
Affiliation(s)
- Chris P Ponting
- Medical Research Council Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom.
16
Hodgkinson A, Chen Y, Eyre-Walker A. The large-scale distribution of somatic mutations in cancer genomes. Hum Mutat 2011; 33:136-43. [PMID: 21953857 DOI: 10.1002/humu.21616] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 06/13/2011] [Accepted: 08/28/2011] [Indexed: 11/12/2022]
Abstract
Recently, the genome sequences from several cancers have been published, along with the genome from a noncancer tissue from the same individual, allowing the identification of new somatic mutations in the cancer. We show that there is significant variation in the density of mutations at the 1-Mb scale within three cancer genomes and that the density of mutations is correlated between them. We also demonstrate that the density of mutations is correlated with that in the germline, as measured by the divergence between humans and chimpanzees and between humans and macaques. We show that the density of mutations is correlated with guanine and cytosine (GC) content, replication time, distance to telomere and centromere, gene density, and nucleosome occupancy in the cancer genomes. However, overall, all factors explain less than 40% of the variance in mutation density and each factor explains very little of the variance. We find that genes associated with cancer occupy regions of the genome with significantly lower mutation rates than the average. Finally, we show that the density of mutations varies at a 10-Mb and a chromosomal scale, but that the variation at these scales is weak.
Affiliation(s)
- Alan Hodgkinson
- School of Life Sciences, University of Sussex, Brighton, United Kingdom.
17
Abstract
It has been known for many years that the mutation rate varies across the genome. However, only with the advent of large genomic data sets is the full extent of this variation becoming apparent. The mutation rate varies over many different scales, from adjacent sites to whole chromosomes, with the strongest variation seen at the smallest scales. Some of these patterns have clear mechanistic bases, but much of the rate variation remains unexplained, and some of it is deeply perplexing. Variation in the mutation rate has important implications in evolutionary biology and underexplored implications for our understanding of hereditary disease and cancer.
18
Late replicating domains are highly recombining in females but have low male recombination rates: implications for isochore evolution. PLoS One 2011; 6:e24480. [PMID: 21949720 PMCID: PMC3176772 DOI: 10.1371/journal.pone.0024480] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/06/2011] [Accepted: 08/11/2011] [Indexed: 01/01/2023] Open
Abstract
In mammals, sequences that are either late replicating or highly recombining have high rates of evolution at putatively neutral sites. As early replicating domains and highly recombining domains both tend to be GC rich, we a priori expect these two variables to covary. If so, the relative contribution of either of these variables to the local neutral substitution rate might have been wrongly estimated owing to covariance with the other. Against our expectations, we find that sex-averaged recombination rates show little or no correlation with replication timing, suggesting that they are independent determinants of substitution rates. However, this result masks significant sex-specific complexity: late replicating domains tend to have high recombination rates in females but low recombination rates in males. That these trends are antagonistic explains why sex-averaged recombination is not correlated with replication timing. This unexpected result has several important implications. First, although both male and female recombination rates covary significantly with intronic substitution rates, the magnitude of this correlation is moderately underestimated for male recombination and slightly overestimated for female recombination, owing to covariance with replication timing. Second, the result could explain why male recombination is strongly correlated with GC content but female recombination is not. If, to explain the correlation between GC content and replication timing, we suppose that late replication forces reduced GC content, then GC promotion by biased gene conversion during female recombination is partly countered by the antagonistic effect of later replicating sequence tending to increase AT content. Indeed, the strength of the correlation between female recombination rate and local GC content is more than doubled by control for replication timing.
Our results underpin the need to consider sex-specific recombination rates and potential covariates in analysis of GC content and rates of evolution.
19
Abstract
Many evolutionary studies over the past decade have estimated α(sel), the proportion of all nucleotides in the human genome that are subject to purifying selection because of their biological function. Most of these studies have estimated the nucleotide substitution rates from genome sequence alignments across many diverse mammals. Some α(sel) estimates will be affected by the heterogeneity of substitution rates in neutral sequence across the genome. Most will also be inaccurate if change in the functional sequence repertoire occurs rapidly relative to the separation of lineages that are being compared. Evidence gathered from both evolutionary and experimental analyses now indicates that rates of "turnover" of functional, predominantly noncoding, sequence are, indeed, high. They are sufficiently high that an estimated 50% of mouse constrained noncoding sequence is predicted not to be shared with rat, a closely related rodent. The rapidity of turnover results in at least a twofold underestimate of α(sel) by analyses that measure constraint across the eutherian phylogeny. Approaches that take account of turnover estimate that the steady-state value of α(sel) lies between 10% and 15%. Experimental studies corroborate the predicted rates of loss and gain of noncoding functional sites. These studies show the limitations inherent in the use of deep sequence conservation for identifying functional sequence. Experimental investigations focusing on lineage-specific, noncoding, and functional sequence are now essential if we are to appreciate the complete functional repertoire of the human genome.
Affiliation(s)
- Chris P Ponting
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom.
20
Panchin AY, Mitrofanov SI, Alexeevski AV, Spirin SA, Panchin YV. New words in human mutagenesis. BMC Bioinformatics 2011; 12:268. [PMID: 21718472 PMCID: PMC3152918 DOI: 10.1186/1471-2105-12-268] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/11/2010] [Accepted: 06/30/2011] [Indexed: 11/28/2022] Open
Abstract
Background: The substitution rates within different nucleotide contexts are subject to varying levels of bias. The most well known example of such bias is the excess of C to T (C > T) mutations in CpG (CG) dinucleotides. The molecular mechanisms underlying this bias are important factors in human genome evolution and cancer development. The discovery of other nucleotide contexts that have profound effects on substitution rates can improve our understanding of how mutations are acquired, and why mutation hotspots exist. Results: We compared rates of inherited mutations in 1-4 bp nucleotide contexts using reconstructed ancestral states of human single nucleotide polymorphisms (SNPs) from intergenic regions. Chimp and orangutan genomic sequences were used as outgroups. We uncovered 3.5 and 3.3-fold excesses of T > C mutations in the second position of ATTG and ATAG words, respectively, and a 3.4-fold excess of A > C mutations in the first position of the ACAA word. Conclusions: Although all the observed biases are less pronounced than the 5.1-fold excess of C > T mutations in CG dinucleotides, the three 4 bp mutation contexts mentioned above (and their complementary contexts) are well distinguished from all other mutation contexts. This provides a challenge to discover the underlying mechanisms responsible for the observed excesses of mutations.
Affiliation(s)
- Alexander Y Panchin
- Department of Bioengineering and Bioinformatics, Moscow State University, Vorbyevy Gory 1-73, Moscow, 119992, Russian Federation.
21
Mugal CF, Ellegren H. Substitution rate variation at human CpG sites correlates with non-CpG divergence, methylation level and GC content. Genome Biol 2011; 12:R58. [PMID: 21696599 PMCID: PMC3218846 DOI: 10.1186/gb-2011-12-6-r58] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 02/06/2011] [Revised: 05/04/2011] [Accepted: 06/22/2011] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: A major goal in the study of molecular evolution is to unravel the mechanisms that induce variation in the germ line mutation rate and in the genome-wide mutation profile. The rate of germ line mutation is considerably higher for cytosines at CpG sites than for any other nucleotide in the human genome, an increase commonly attributed to cytosine methylation at CpG sites. The CpG mutation rate, however, is not uniform across the genome and, as methylation levels have recently been shown to vary throughout the genome, it has been hypothesized that methylation status may govern variation in the rate of CpG mutation. RESULTS: Here, we use genome-wide methylation data from human sperm cells to investigate the impact of DNA methylation on the CpG substitution rate in introns of human genes. We find that there is a significant correlation between the extent of methylation and the substitution rate at CpG sites. Further, we show that the CpG substitution rate is positively correlated with non-CpG divergence, suggesting susceptibility to factors responsible for the general mutation rate in the genome, and negatively correlated with GC content. We only observe a minor contribution of gene expression level, while recombination rate appears to have no significant effect. CONCLUSIONS: Our study provides the first direct empirical support for the hypothesis that variation in the level of germ line methylation contributes to substitution rate variation at CpG sites. Moreover, we show that other genomic features also impact on CpG substitution rate variation.
Affiliation(s)
- Carina F Mugal
- Department of Evolutionary Biology, Uppsala University, Norbyvägen 18D, Uppsala, Sweden
22
Ananda G, Chiaromonte F, Makova KD. A genome-wide view of mutation rate co-variation using multivariate analyses. Genome Biol 2011; 12:R27. [PMID: 21426544 PMCID: PMC3129677 DOI: 10.1186/gb-2011-12-3-r27] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 12/07/2010] [Revised: 02/21/2011] [Accepted: 03/22/2011] [Indexed: 01/03/2023] Open
Abstract
Background: While the abundance of available sequenced genomes has led to many studies of regional heterogeneity in mutation rates, the co-variation among rates of different mutation types remains largely unexplored, hindering a deeper understanding of mutagenesis and genome dynamics. Here, utilizing primate and rodent genomic alignments, we apply two multivariate analysis techniques (principal components and canonical correlations) to investigate the structure of rate co-variation for four mutation types and simultaneously explore the associations with multiple genomic features at different genomic scales and phylogenetic distances. Results: We observe a consistent, largely linear co-variation among rates of nucleotide substitutions, small insertions and small deletions, with some non-linear associations detected among these rates on chromosome X and near autosomal telomeres. This co-variation appears to be shaped by a common set of genomic features, some previously investigated and some novel to this study (nuclear lamina binding sites, methylated non-CpG sites and nucleosome-free regions). Strong non-linear relationships are also detected among genomic features near the centromeres of large chromosomes. Microsatellite mutability co-varies with other mutation rates at finer scales, but not at 1 Mb, and shows varying degrees of association with genomic features at different scales. Conclusions: Our results allow us to speculate about the role of different molecular mechanisms, such as replication, recombination, repair and local chromatin environment, in mutagenesis. The software tools developed for our analyses are available through Galaxy, an open-source genomics portal, to facilitate the use of multivariate techniques in future large-scale genomics studies.
Affiliation(s)
- Guruprasad Ananda
- Center for Medical Genomics, Penn State University, University Park, PA 16802, USA
23
Cooper DN, Ball EV, Mort M. Chromosomal distribution of disease genes in the human genome. Genet Test Mol Biomarkers 2010; 14:441-6. [PMID: 20642358 DOI: 10.1089/gtmb.2010.0081] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Abstract
Genes are nonrandomly distributed in the human genome, both within and between chromosomes. Thus, genes of similar function and common evolutionary origin are often clustered, as are genes with similar expression profiles. We now report that the >2400 genes known to underlie human monogenic inherited disease are non-randomly distributed in the genome over and above the general nonrandomness evident in the distribution of human genes. Further, a subset of 315 inherited disease genes subject to gross deletion was found to exhibit a degree of clustering that was twice that manifested by disease genes in general. The clustering of human disease genes is likely to have important implications for understanding the genotype-phenotype relationship in contiguous gene syndromes as well as those conditions characterized by multigene deletions or complex chromosomal rearrangements.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, United Kingdom.
24
Hodgkinson A, Eyre-Walker A. The genomic distribution and local context of coincident SNPs in human and chimpanzee. Genome Biol Evol 2010; 2:547-57. [PMID: 20675616 PMCID: PMC2997558 DOI: 10.1093/gbe/evq039] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 12/19/2022] Open
Abstract
We have previously shown that there is an excess of sites that are polymorphic at orthologous positions in humans and chimpanzees and that this is most likely due to cryptic variation in the mutation rate. We showed that this might be a consequence of complex context effects since we found significant heterogeneity in triplet frequencies around coincident single nucleotide polymorphism (SNP) sites. Here, we show that the heterogeneity in triplet frequencies is not specifically associated with coincident SNPs but is instead driven by base composition bias around CpG dinucleotides. As a result, we suggest that cryptic variation in the mutation rate is truly cryptic, in the sense that the mutation rate does not appear to depend on any specific primary sequence context. Furthermore, we propose that the patterns around CpG dinucleotides are driven by the mutability of CpG dinucleotides in different DNA contexts. We also show that the genomic distribution of coincident SNPs is nonuniform and that there are some subtle differences between the distributions of single and coincident SNPs. Furthermore, we identify regions that contain high numbers of coincident SNPs and suggest that one in particular, a region containing the gene PRIM2, may be under balancing selection.
Affiliation(s)
- Alan Hodgkinson
- Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom.
25
Abstract
The rate at which base substitutions (mutations) not subject to natural selection accumulate is the neutral mutation rate. Because this rate reflects the in vivo processes involved in maintaining the integrity of genetic information, the factors that affect the neutral mutation rate are of considerable interest. Mammals exhibit two dramatically different neutral mutation rates: the CpG mutation rate, wherein the C of most CpGs (i.e., methyl-CpG) mutates at 10-50 times the rate of C in any other context or of any other base. The latter mutations constitute the non-CpG rate. The high CpG rate results from the spontaneous deamination of methyl-C to T and incomplete restoration of the ensuing T:G mismatches to C:Gs. Here, we determined the neutral non-CpG mutation rate as a function of CpG content by comparing sequence divergence of thousands of pairs of neutrally evolving chimpanzee and human orthologs that differ primarily in CpG content. Both the mutation rate and the mutational spectrum (transition/transversion ratio) of non-CpG residues change in parallel as sigmoidal (logistic) functions of CpG content. As different mechanisms generate transitions and transversions, these results indicate that both mutation rate and mutational processes are contingent on the local CpG content. We consider several possible mechanisms that might explain how CpG exerts these effects.
Affiliation(s)
- Jean-Claude Walser
- Section on Genomic Structure and Function, Laboratory of Molecular and Cellular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0830, USA
26
Pink CJ, Hurst LD. Timing of replication is a determinant of neutral substitution rates but does not explain slow Y chromosome evolution in rodents. Mol Biol Evol 2009; 27:1077-86. [PMID: 20026481 DOI: 10.1093/molbev/msp314] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022] Open
Abstract
Mutation rates, assayed as substitution rates of putatively neutral sites, are highly variable around mammalian genomes: There is heterogeneity between genes, between autosomes, and between X, Y, and autosomes. The differences between X, Y, and autosomes are typically assumed to reflect the greater number of cell divisions in the male germ-line. Such an effect can neither account for within-autosome differences nor does it predict the differences between X, Y, and autosome observed in rodents. It has recently been proposed that in primates, the time during S-phase when a gene is replicated is an important determinant of neutral rates of evolution. Here we ask 1) whether we can replicate this result in rodents, 2) whether different autosomes replicate on average at different times, and 3) whether this might explain differences in their substitution rates. Finally we ask 4) whether X, Y, and autosome replicate at different times and 5) whether any difference might explain why the number of replication events alone cannot explain their substitution rates. We find that, as in primates, autosomal intronic rates of evolution increase significantly during S-phase. Different autosomes do have different average replication times, and together with rearrangement, this is a significant predictor of between-autosome differences in substitution rate. Although we find that autosomal, X-, and Y-linked genes replicate at different times, it is paradoxical that the Y-linked genes replicate latest, and replicate more often, but are not especially fast evolving. These results support the hypothesis that replication timing is an important source of substitution rate heterogeneity.
Affiliation(s)
- Catherine J Pink
- Department of Biology and Biochemistry, University of Bath, Somerset, United Kingdom
27
Torgerson DG, Boyko AR, Hernandez RD, Indap A, Hu X, White TJ, Sninsky JJ, Cargill M, Adams MD, Bustamante CD, Clark AG. Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet 2009; 5:e1000592. [PMID: 19662163 PMCID: PMC2714078 DOI: 10.1371/journal.pgen.1000592] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Received: 12/23/2008] [Accepted: 07/10/2009] [Indexed: 01/30/2023] Open
Abstract
Analysis of polymorphism and divergence in the non-coding portion of the human genome yields crucial information about factors driving the evolution of gene regulation. Candidate cis-regulatory regions spanning more than 15,000 genes in 15 African Americans and 20 European Americans were re-sequenced and aligned to the chimpanzee genome in order to identify potentially functional polymorphism and to characterize and quantify departures from neutral evolution. Distortions of the site frequency spectra suggest a general pattern of selective constraint on conserved non-coding sites in the flanking regions of genes (CNCs). Moreover, there is an excess of fixed differences that cannot be explained by a Gamma model of deleterious fitness effects, suggesting the presence of positive selection on CNCs. Extensions of the McDonald-Kreitman test identified candidate cis-regulatory regions with high probabilities of positive and negative selection near many known human genes, the biological characteristics of which exhibit genome-wide trends that differ from patterns observed in protein-coding regions. Notably, there is a higher probability of positive selection in candidate cis-regulatory regions near genes expressed in the fetal brain, suggesting that a larger portion of adaptive regulatory changes has occurred in genes expressed during brain development. Overall we find that natural selection has played an important role in the evolution of candidate cis-regulatory regions throughout hominid evolution. It has been suggested that changes in gene expression may have played a more important role in the evolution of modern humans than changes in protein-coding sequences. In order to identify signatures of natural selection on candidate cis-regulatory regions, we examined single nucleotide polymorphisms obtained from the complete re-sequencing of conserved non-coding sites (CNCs) in the flanking regions of over 15,000 genes in 35 humans. 
Patterns of allele frequencies in CNCs indicate the presence of both positive and negative selection acting on standing variation within these candidate cis-regulatory regions, particularly for the 5′ and 3′ UTRs of genes. Gene-specific tests comparing levels of polymorphism and divergence identify several genes with strong signatures of selection on candidate cis-regulatory regions and suggest that the biological characteristics of genes subject to selection are different between coding and candidate cis-regulatory regions with respect to gene expression and function. For example, we find stronger signatures of positive selection in candidate cis-regulatory regions near genes expressed in the fetal brain, which we do not observe in a concurrent analysis on protein-coding regions. Our results suggest that both positive and negative selection have acted on candidate cis-regulatory regions and that the evolution of non-coding DNA has played an important role throughout hominid evolution.
Affiliation(s)
- Dara G Torgerson
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA.
28
Abstract
Each human carries a large number of deleterious mutations. Together, these mutations make a significant contribution to human disease. Identification of deleterious mutations within individual genome sequences could substantially impact an individual's health through personalized prevention and treatment of disease. Yet, distinguishing deleterious mutations from the massive number of nonfunctional variants that occur within a single genome is a considerable challenge. Using a comparative genomics data set of 32 vertebrate species we show that a likelihood ratio test (LRT) can accurately identify a subset of deleterious mutations that disrupt highly conserved amino acids within protein-coding sequences, which are likely to be unconditionally deleterious. The LRT is also able to identify known human disease alleles and performs as well as two commonly used heuristic methods, SIFT and PolyPhen. Application of the LRT to three human genomes reveals 796-837 deleterious mutations per individual, approximately 40% of which are estimated to be at <5% allele frequency. However, the overlap between predictions made by the LRT, SIFT, and PolyPhen, is low; 76% of predictions are unique to one of the three methods, and only 5% of predictions are shared across all three methods. Our results indicate that only a small subset of deleterious mutations can be reliably identified, but that this subset provides the raw material for personalized medicine.
29
Li JB, Gao Y, Aach J, Zhang K, Kryukov GV, Xie B, Ahlford A, Yoon JK, Rosenbaum AM, Zaranek AW, LeProust E, Sunyaev SR, Church GM. Multiplex padlock targeted sequencing reveals human hypermutable CpG variations. Genome Res 2009; 19:1606-15. [PMID: 19525355 DOI: 10.1101/gr.092213.109] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Indexed: 11/24/2022]
Abstract
Utilizing the full power of next-generation sequencing often requires the ability to perform large-scale multiplex enrichment of many specific genomic loci in multiple samples. Several technologies have been recently developed but await substantial improvements. We report the 10,000-fold improvement of a previously developed padlock-based approach, and apply the assay to identifying genetic variations in hypermutable CpG regions across human chromosome 21. From approximately 3 million reads derived from a single Illumina Genome Analyzer lane, approximately 94% (approximately 50,500) target sites can be observed with at least one read. The uniformity of coverage was also greatly improved; up to 93% and 57% of all targets fell within a 100- and 10-fold coverage range, respectively. Alleles at >400,000 target base positions were determined across six subjects and examined for single nucleotide polymorphisms (SNPs), and the concordance with independently obtained genotypes was 98.4%-100%. We detected >500 SNPs not currently in dbSNP, 362 of which were in targeted CpG locations. Transitions in CpG sites were at least 13.7 times more abundant than non-CpG transitions. Fractions of polymorphic CpG sites are lower in CpG-rich regions and show higher correlation with human-chimpanzee divergence within CpG versus non-CpG sites. This is consistent with the hypothesis that methylation rate heterogeneity along chromosomes contributes to mutation rate variation in humans. Our success suggests that targeted CpG resequencing is an efficient way to identify common and rare genetic variations. In addition, the significantly improved padlock capture technology can be readily applied to other projects that require multiplex sample preparation.
Affiliation(s)
- Jin Billy Li
- Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
|
30
|
Patrushev LI, Minkevich IG. The problem of the eukaryotic genome size. Biochemistry (Moscow) 2009; 73:1519-52. [PMID: 19216716 DOI: 10.1134/s0006297908130117] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
Abstract
The current state of knowledge concerning the unsolved problem of the huge interspecific eukaryotic genome size variations not correlating with the species phenotypic complexity (C-value enigma also known as C-value paradox) is reviewed. Characteristic features of eukaryotic genome structure and molecular mechanisms that are the basis of genome size changes are examined in connection with the C-value enigma. It is emphasized that endogenous mutagens, including reactive oxygen species, create a constant nuclear environment where any genome evolves. An original quantitative model and general conception are proposed to explain the C-value enigma. In accordance with the theory, the noncoding sequences of the eukaryotic genome provide genes with global and differential protection against chemical mutagens and (in addition to the anti-mutagenesis and DNA repair systems) form a new, third system that protects eukaryotic genetic information. The joint action of these systems controls the spontaneous mutation rate in coding sequences of the eukaryotic genome. It is hypothesized that the genome size is inversely proportional to functional efficiency of the anti-mutagenesis and/or DNA repair systems in a particular biological species. In this connection, a model of eukaryotic genome evolution is proposed.
Affiliation(s)
- L I Patrushev
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia.
|
31
|
Imamura H, Karro JE, Chuang JH. Weak preservation of local neutral substitution rates across mammalian genomes. BMC Evol Biol 2009; 9:89. [PMID: 19416516 PMCID: PMC2689173 DOI: 10.1186/1471-2148-9-89] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 11/20/2008] [Accepted: 05/05/2009] [Indexed: 01/06/2023] Open
Abstract
Background The rate at which neutral (non-functional) bases undergo substitution is highly dependent on their location within a genome. However, it is not clear how fast these location-dependent rates change, or to what extent the substitution rate patterns are conserved between lineages. To address this question, which is critical not only for understanding the substitution process but also for evaluating phylogenetic footprinting algorithms, we examine ancestral repeats: a predominantly neutral dataset with a significantly higher genomic density than other datasets commonly used to study substitution rate variation. Using this repeat data, we measure the extent to which orthologous ancestral repeat sequences exhibit similar substitution patterns in separate mammalian lineages, allowing us to ascertain how well local substitution rates have been preserved across species. Results We calculated substitution rates for each ancestral repeat in each of three independent mammalian lineages (primate – from human/macaque alignments, rodent – from mouse/rat alignments, and laurasiatheria – from dog/cow alignments). We then measured the correlation of local substitution rates among these lineages. Overall we found the correlations between lineages to be statistically significant, but too weak to have much predictive power (r² < 5%). These correlations were found to be primarily driven by regional effects at the scale of several hundred kb or larger. A few repeat classes (e.g. 7SK, Charlie8, and MER121) also exhibited stronger conservation of rate patterns, likely due to the effect of repeat-specific purifying selection. These classes should be excluded when estimating local neutral substitution rates. Conclusion Although local neutral substitution rates have some correlations among mammalian species, these correlations have little predictive power on the scale of individual repeats.
This indicates that local substitution rates have changed significantly among the lineages we have studied, and are likely to have changed even more for more diverged lineages. The correlations that do persist are too weak to be responsible for many of the highly conserved elements found by phylogenetic footprinting algorithms, leading us to conclude that such elements must be conserved due to selective forces.
Affiliation(s)
- Hideo Imamura
- Boston College, Department of Biology, Chestnut Hill, MA 02467, USA.
|
32
|
Hodgkinson A, Ladoukakis E, Eyre-Walker A. Cryptic variation in the human mutation rate. PLoS Biol 2009; 7:e1000027. [PMID: 19192947 PMCID: PMC2634788 DOI: 10.1371/journal.pbio.1000027] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 07/08/2008] [Accepted: 12/12/2008] [Indexed: 11/18/2022] Open
Abstract
The mutation rate is known to vary between adjacent sites within the human genome as a consequence of context, the most well-studied example being the influence of CpG dinucleotides. We investigated whether there is additional variation by testing whether there is an excess of sites at which both humans and chimpanzees have a single-nucleotide polymorphism (SNP). We found a highly significant excess of such sites, and we demonstrated that this excess is not due to neighbouring nucleotide effects, ancestral polymorphism, or natural selection. We therefore infer that there is cryptic variation in the mutation rate. However, although this variation in the mutation rate is not associated with the adjacent nucleotides, we show that there are highly nonrandom patterns of nucleotides that extend approximately 80 base pairs on either side of sites with coincident SNPs, suggesting that there are extensive and complex context effects. Finally, we estimate the level of variation needed to produce the excess of coincident SNPs and show that there is a similar, or higher, level of variation in the mutation rate associated with this cryptic process than there is associated with adjacent nucleotides, including the CpG effect. We conclude that there is substantial variation in the mutation rate that has, until now, been hidden from view.
|
33
|
Schmidt S, Gerasimova A, Kondrashov FA, Adzuhbei IA, Kondrashov AS, Sunyaev S. Hypermutable non-synonymous sites are under stronger negative selection. PLoS Genet 2008; 4:e1000281. [PMID: 19043566 PMCID: PMC2583910 DOI: 10.1371/journal.pgen.1000281] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 12/08/2007] [Accepted: 10/27/2008] [Indexed: 12/04/2022] Open
Abstract
Mutation rate varies greatly between nucleotide sites of the human genome and depends both on the global genomic location and the local sequence context of a site. In particular, CpG context elevates the mutation rate by an order of magnitude. Mutations also vary widely in their effect on the molecular function, phenotype, and fitness. Independence of the probability of occurrence of a new mutation's effect has been a fundamental premise in genetics. However, highly mutable contexts may be preserved by negative selection at important sites but destroyed by mutation at sites under no selection. Thus, there may be a positive correlation between the rate of mutations at a nucleotide site and the magnitude of their effect on fitness. We studied the impact of CpG context on the rate of human-chimpanzee divergence and on intrahuman nucleotide diversity at non-synonymous coding sites. We compared nucleotides that occupy identical positions within codons of identical amino acids and only differ by being within versus outside CpG context. Nucleotides within CpG context are under a stronger negative selection, as revealed by their lower, proportionally to the mutation rate, rate of evolution and nucleotide diversity. In particular, the probability of fixation of a non-synonymous transition at a CpG site is two times lower than at a non-CpG site. Thus, sites with different mutation rates are not necessarily selectively equivalent. This suggests that the mutation rate may complement sequence conservation as a characteristic predictive of functional importance of nucleotide sites.
Affiliation(s)
- Steffen Schmidt
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Biochemistry, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Anna Gerasimova
- Life Sciences Institute, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Fyodor A. Kondrashov
- Section on Ecology, Behavior, and Evolution, Division of Biological Sciences, University of California San Diego, La Jolla, California, United States of America
- Ivan A. Adzuhbei
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Alexey S. Kondrashov
- Life Sciences Institute, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Shamil Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
|
34
|
Khil PP, Camerini-Otero RD. Molecular Features and Functional Constraints in the Evolution of the Mammalian X Chromosome. Crit Rev Biochem Mol Biol 2008; 40:313-30. [PMID: 16338684 DOI: 10.1080/10409230500356703] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
Abstract
Recent advances in genomic sequencing of multiple organisms have fostered significant advances in our understanding of the evolution of the sex chromosomes. The integration of this newly available sequence information with functional data has facilitated a considerable refinement of our conceptual framework of the forces driving this evolution. Here we address multiple functional constraints that were encountered in the evolution of the X chromosome and the impact that this evolutionary history has had on its modern behavior.
Affiliation(s)
- Pavel P Khil
- Genetics and Biochemistry Branch, National Institutes of Health, Bethesda, MD 20892, USA
|
35
|
Peifer M, Karro JE, von Grünberg HH. Is there an acceleration of the CpG transition rate during the mammalian radiation? Bioinformatics 2008; 24:2157-64. [PMID: 18662928 PMCID: PMC2553435 DOI: 10.1093/bioinformatics/btn391] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/11/2008] [Revised: 07/27/2008] [Accepted: 07/27/2008] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION In this article we build a model of the CpG dinucleotide substitution rate and use it to challenge the claim that this rate underwent a sudden mammalian-specific increase approximately 90 million years ago. The evidence supporting this hypothesis comes from the application of a model of neutral substitution rates able to account for elevated CpG dinucleotide substitution rates. With the initial goal of improving that model's accuracy, we introduced a modification enabling us to account for boundary effects arising from the truncation of the Markov field, as well as improving the optimization procedure required for estimating the substitution rates. RESULTS When using this modified method to reproduce the supporting analysis, the evidence of the rate shift vanished. Our analysis suggests that the CpG-specific rate has been constant over the relevant time period and that the asserted acceleration of the CpG rate is likely an artifact of the original model.
Affiliation(s)
- M Peifer
- Institute of Chemistry, Karl-Franzens University Graz, Graz, Austria.
|
36
|
Gaffney DJ, Keightley PD. Effect of the assignment of ancestral CpG state on the estimation of nucleotide substitution rates in mammals. BMC Evol Biol 2008; 8:265. [PMID: 18826599 PMCID: PMC2576242 DOI: 10.1186/1471-2148-8-265] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 04/29/2008] [Accepted: 09/30/2008] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Molecular evolutionary studies in mammals often estimate nucleotide substitution rates within and outside CpG dinucleotides separately. Frequently, in alignments of two sequences, the division of sites into CpG and non-CpG classes is based simply on the presence or absence of a CpG dinucleotide in either sequence, a procedure that we refer to as CpG/non-CpG assignment. Although it is likely that this procedure is biased, it is generally assumed that the bias is negligible if species are very closely related. RESULTS Using simulations of DNA sequence evolution we show that assignment of the ancestral CpG state based on the simple presence/absence of the CpG dinucleotide can seriously bias estimates of the substitution rate, because many true non-CpG changes are misassigned as CpG. Paradoxically, this bias is most severe between closely related species, because a minimum of two substitutions are required to misassign a true ancestral CpG site as non-CpG whereas only a single substitution is required to misassign a true ancestral non-CpG site as CpG in a two-branch tree. We also show that CpG misassignment bias differentially affects fourfold degenerate and noncoding sites due to differences in base composition such that fourfold degenerate sites can appear to be evolving more slowly than noncoding sites. We demonstrate that the effects predicted by our simulations occur in a real evolutionary setting by comparing substitution rates estimated from human-chimp coding and intronic sequence using CpG/non-CpG assignment with estimates derived from a method that is largely free from bias. CONCLUSION Our study demonstrates that a common method of assigning sites into CpG and non-CpG classes in pairwise alignments is seriously biased and recommends against the adoption of ad hoc methods of ancestral state assignment.
Affiliation(s)
- Daniel J Gaffney
- McGill University and Genome Québec Innovation Centre, 740 ave Dr Penfield Rm 7208, Montréal (Québec), H3A 1A4, Canada
- Peter D Keightley
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, UK
|
37
|
Gaffney DJ, Blekhman R, Majewski J. Selective constraints in experimentally defined primate regulatory regions. PLoS Genet 2008; 4:e1000157. [PMID: 18704158 PMCID: PMC2490716 DOI: 10.1371/journal.pgen.1000157] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 12/19/2007] [Accepted: 07/09/2008] [Indexed: 11/18/2022] Open
Abstract
Changes in gene regulation may be important in evolution. However, the evolutionary properties of regulatory mutations are currently poorly understood. This is partly the result of an incomplete annotation of functional regulatory DNA in many species. For example, transcription factor binding sites (TFBSs), a major component of eukaryotic regulatory architecture, are typically short, degenerate, and therefore difficult to differentiate from randomly occurring, nonfunctional sequences. Furthermore, although sites such as TFBSs can be computationally predicted using evolutionary conservation as a criterion, estimates of the true level of selective constraint (defined as the fraction of strongly deleterious mutations occurring at a locus) in regulatory regions will, by definition, be upwardly biased in datasets that are a priori evolutionarily conserved. Here we investigate the fitness effects of regulatory mutations using two complementary datasets of human TFBSs that are likely to be relatively free of ascertainment bias with respect to evolutionary conservation but, importantly, are supported by experimental data. The first is a collection of almost >2,100 human TFBSs drawn from the literature in the TRANSFAC database, and the second is derived from several recent high-throughput chromatin immunoprecipitation coupled with genomic microarray (ChIP-chip) analyses. We also define a set of putative cis-regulatory modules (pCRMs) by spatially clustering multiple TFBSs that regulate the same gene. We find that a relatively high proportion (approximately 37%) of mutations at TFBSs are strongly deleterious, similar to that at a 2-fold degenerate protein-coding site. However, constraint is significantly reduced in human and chimpanzee pCRMs and ChIP-chip sequences, relative to macaques. We estimate that the fraction of regulatory mutations that have been driven to fixation by positive selection in humans is not significantly different from zero.
We also find that the level of selective constraint in our TFBSs, pCRMs, and ChIP-chip sequences is negatively correlated with the expression breadth of the regulated gene, whereas the opposite relationship holds at that gene's nonsynonymous and synonymous sites. Finally, we find that the rate of protein evolution in a transcription factor appears to be positively correlated with the breadth of expression of the gene it regulates. Our study suggests that strongly deleterious regulatory mutations are considerably more likely (1.6-fold) to occur in tissue-specific than in housekeeping genes, implying that there is a fitness cost to increasing "complexity" of gene expression.
|
38
|
Fox AK, Tuch BB, Chuang JH. Measuring the prevalence of regional mutation rates: an analysis of silent substitutions in mammals, fungi, and insects. BMC Evol Biol 2008; 8:186. [PMID: 18588686 PMCID: PMC2447844 DOI: 10.1186/1471-2148-8-186] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 03/18/2008] [Accepted: 06/27/2008] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND The patterns of mutation vary both within and across genomes. It has been shown for a few mammals that mutation rates vary within the genome, while for unknown reasons, the sensu stricto yeasts have uniform rates instead. The generality of these observations has been unknown. Here we examine silent site substitutions in a more expansive set (20 mammals, 27 fungi, 4 insects) to determine why some genomes demonstrate this mosaic distribution and why others are uniform. RESULTS We applied several intragene and intergene correlation tests to measure regional substitution patterns. Assuming that silent sites are a reasonable approximation to neutrally mutating sequence, our results show that all multicellular eukaryotes exhibit mutational heterogeneity. In striking contrast, all fungi are mutationally uniform - with the exception of three Candida species: C. albicans, C. dubliniensis, and C. tropicalis. We speculate that aspects of replication timing may be responsible for distinguishing these species. Our analysis also reveals classes of genes whose silent sites behave anomalously with respect to the mutational background in many species, indicating prevalent selective pressures. Genes associated with nucleotide binding or gene regulation have consistently low silent substitution rates in every mammalian species, as well as multiple fungi. On the other hand, receptor genes repeatedly exhibit high silent substitution rates, suggesting they have been influenced by diversifying selection. CONCLUSION Our findings provide a framework for understanding the regional mutational properties of eukaryotes, revealing a sharp difference between fungi and multicellular species. They also elucidate common selective pressures acting on eukaryotic silent sites, with frequent evidence for both purifying and diversifying selection.
Affiliation(s)
- Aleah K Fox
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA.
|
39
|
Tyekucheva S, Makova KD, Karro JE, Hardison RC, Miller W, Chiaromonte F. Human-macaque comparisons illuminate variation in neutral substitution rates. Genome Biol 2008; 9:R76. [PMID: 18447906 PMCID: PMC2643947 DOI: 10.1186/gb-2008-9-4-r76] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Received: 01/10/2008] [Revised: 04/04/2008] [Accepted: 04/30/2008] [Indexed: 11/10/2022] Open
Abstract
Background The evolutionary distance between human and macaque is particularly attractive for investigating local variation in neutral substitution rates, because substitutions can be inferred more reliably than in comparisons with rodents and are less influenced by the effects of current and ancient diversity than in comparisons with closer primates. Here we investigate the human-macaque neutral substitution rate as a function of a number of genomic parameters. Results Using regression analyses we find that male mutation bias, male (but not female) recombination rate, distance to telomeres and substitution rates computed from orthologous regions in mouse-rat and dog-cow comparisons are prominent predictors of the neutral rate. Additionally, we demonstrate that the previously observed biphasic relationship between neutral rate and GC content can be accounted for by properly combining rates at CpG and non-CpG sites. Finally, we find the neutral rate to be negatively correlated with the densities of several classes of computationally predicted functional elements, and less so with the densities of certain classes of experimentally verified functional elements. Conclusion Our results suggest that while female recombination may be mainly responsible for driving evolution in GC content, male recombination may be mutagenic, and that other mutagenic mechanisms acting near telomeres, and mechanisms whose effects are shared across mammalian genomes, play significant roles. We also have evidence that the nonlinear increase in rates at high GC levels may be largely due to hyper-mutability of CpG dinucleotides. Finally, our results suggest that the performance of conservation-based prediction methods can be improved by accounting for neutral rates.
Affiliation(s)
- Svitlana Tyekucheva
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
|
40
|
Elango N, Kim SH, Vigoda E, Yi SV. Mutations of different molecular origins exhibit contrasting patterns of regional substitution rate variation. PLoS Comput Biol 2008; 4:e1000015. [PMID: 18463707 PMCID: PMC2265638 DOI: 10.1371/journal.pcbi.1000015] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Received: 09/12/2007] [Accepted: 01/30/2008] [Indexed: 11/19/2022] Open
Abstract
Transitions at CpG dinucleotides, referred to as “CpG substitutions”, are a major mutational input into vertebrate genomes and a leading cause of human genetic disease. The prevalence of CpG substitutions is due to their mutational origin, which is dependent on DNA methylation. In comparison, other single nucleotide substitutions (for example those occurring at GpC dinucleotides) mainly arise from errors during DNA replication. Here we analyzed high quality BAC-based data from human, chimpanzee, and baboon to investigate regional variation of CpG substitution rates. We show that CpG substitutions occur approximately 15 times more frequently than other single nucleotide substitutions in primate genomes, and that they exhibit substantial regional variation. Patterns of CpG rate variation are consistent with differences in methylation level and susceptibility to subsequent deamination. In particular, we propose a “distance-decaying” hypothesis, positing that due to the molecular mechanism of a CpG substitution, rates are correlated with the stability of double-stranded DNA surrounding each CpG dinucleotide, and the effect of local DNA stability may decrease with distance from the CpG dinucleotide. Consistent with our “distance-decaying” hypothesis, rates of CpG substitution are strongly (negatively) correlated with regional G+C content. The influence of G+C content decays as the distance from the target CpG site increases. We estimate that the influence of local G+C content extends up to 1,500–2,000 bps centered on each CpG site. We also show that the distance-decaying relationship persisted when we controlled for the effect of long-range homogeneity of nucleotide composition. GpC sites, in contrast, do not exhibit such a “distance-decaying” relationship. Our results highlight an example of the distinctive properties of methylation-dependent substitutions versus substitutions mostly arising from errors during DNA replication.
Furthermore, the negative relationship between G+C content and CpG rates may provide an explanation for the observation that GC-rich SINEs show lower CpG rates than other repetitive elements.

Mutations are raw materials of evolution. Earlier studies have shown that mutations occur at different frequencies in different genomic regions. By investigating the patterns and causes of such “regional” variation of mutations, we can better understand the mechanisms underlying mutagenesis. In the human and other mammalian genomes, the most common type of mutation is caused by DNA methylation, which targets cytosines followed by guanine (CpG dinucleotides). Methylated cytosines are then subject to spontaneous deamination, which will cause a C to T (or G to A) transition (CpG substitution). Because this mutational process is unique to CpG substitutions, we reasoned that they might show different patterns of variability from other substitutions. Using high quality genomic sequences from primates and by separately analyzing variability of CpG substitutions and other substitutions, we demonstrate that CpG substitutions occur approximately 15 times more frequently than other substitutions, and show a distinctive pattern of regional variability. Particularly, we propose and provide evidence that because the deamination step requires temporary strand separation, G+C composition up to roughly 1,500–2,000 bps in each direction from a target CpG affects the probability of a CpG substitution. Incorporating the difference in CpG and other substitutions discovered in this study will help build more realistic evolutionary models.
Affiliation(s)
- Navin Elango
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Seong-Ho Kim
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- NISC Comparative Sequencing Program
- Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Eric Vigoda
- College of Computing, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Soojin V. Yi
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
|
41
|
Tanay A, Siggia ED. Sequence context affects the rate of short insertions and deletions in flies and primates. Genome Biol 2008; 9:R37. [PMID: 18291026 PMCID: PMC2374710 DOI: 10.1186/gb-2008-9-2-r37] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 06/14/2007] [Revised: 09/25/2007] [Accepted: 02/21/2008] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Insertions and deletions (indels) are an important evolutionary force, making the evolutionary process more efficient and flexible by copying and removing genomic fragments of various lengths instead of rediscovering them by point mutations. As a mutational process, indels are known to be more active in specific sequences (like microsatellites) but not much is known about the more general and mechanistic effect of sequence context on the insertion and deletion susceptibility of genomic loci. RESULTS Here we analyze a large collection of high-confidence short insertions and deletions in primates and flies, revealing extensive correlations between sequence context and indel rates and building principled models for predicting these rates from sequence. According to our results, the rate of insertion or deletion of specific lengths can vary by more than 100-fold, depending on the surrounding sequence. These mutational biases can strongly influence the composition of the genome and the rate at which particular sequences appear. We exemplify this by showing how degenerate loci in human exons are selected to reduce their frame-shifting indel propensity. CONCLUSION Insertions and deletions are strongly affected by sequence context. Consequently, genomes must adapt to significant variation in the mutational input at indel-prone and indel-immune loci.
Affiliation(s)
- Amos Tanay
- Center for Studies in Physics and Biology, The Rockefeller University, York Ave, New York, NY 10021, USA.
|
42
|
|
43
|
Karro JE, Peifer M, Hardison RC, Kollmann M, von Grünberg HH. Exponential decay of GC content detected by strand-symmetric substitution rates influences the evolution of isochore structure. Mol Biol Evol 2007; 25:362-74. [PMID: 18042807 DOI: 10.1093/molbev/msm261] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
The distribution of guanine and cytosine nucleotides throughout a genome, or the GC content, is associated with numerous features in mammals; understanding the pattern and evolutionary history of GC content is crucial to our efforts to annotate the genome. The local GC content is decaying toward an equilibrium point, but the causes and rates of this decay, as well as the value of the equilibrium point, remain topics of debate. By comparing the results of 2 methods for estimating local substitution rates, we identify 620 Mb of the human genome in which the rates of the various types of nucleotide substitutions are the same on both strands. These strand-symmetric regions show an exponential decay of local GC content at a pace determined by local substitution rates. DNA segments subjected to higher rates experience disproportionately accelerated decay and are AT rich, whereas segments subjected to lower rates decay more slowly and are GC rich. Although we are unable to draw any conclusions about causal factors, the results support the hypothesis proposed by Khelifi A, Meunier J, Duret L, and Mouchiroud D (2006. GC content evolution of the human and mouse genomes: insights from the study of processed pseudogenes in regions of different recombination rates. J Mol Evol. 62:745-752.) that the isochore structure has been reshaped over time. If rate variation were a determining factor, then the current isochore structure of mammalian genomes could result from the local differences in substitution rates. We predict that under current conditions strand-symmetric portions of the human genome will stabilize at an average GC content of 30% (considerably less than the current 42%), thus confirming that the human genome has not yet reached equilibrium.
Affiliation(s)
- J E Karro
- Department of Computer Science and Systems Analysis, Miami University, Ohio, USA.
|
44
|
Kelkar YD, Tyekucheva S, Chiaromonte F, Makova KD. The genome-wide determinants of human and chimpanzee microsatellite evolution. Genome Res 2007; 18:30-8. [PMID: 18032720 DOI: 10.1101/gr.7113408] [Citation(s) in RCA: 176] [Impact Index Per Article: 10.4] [Indexed: 12/24/2022]
Abstract
Mutation rates of microsatellites vary greatly among loci. The causes of this heterogeneity remain largely enigmatic yet are crucial for understanding numerous human neurological diseases and genetic instability in cancer. In this first genome-wide study, the relative contributions of intrinsic features and regional genomic factors to the variation in mutability among orthologous human-chimpanzee microsatellites are investigated with resampling and regression techniques. As a result, we uncover the intricacies of microsatellite mutagenesis as follows. First, intrinsic features (repeat number, length, and motif size), which all influence the probability and rate of slippage, are the strongest predictors of mutability. Second, mutability increases nonuniformly with length, suggesting that processes additional to slippage, such as faulty repair, contribute to mutations. Third, mutability varies among microsatellites with different motif composition likely due to dissimilarities in secondary DNA structure formed by their slippage intermediates. Fourth, mutability of mononucleotide microsatellites is impacted by their location on sex chromosomes vs. autosomes and inside vs. outside of Alu repeats, the former confirming the importance of replication and the latter suggesting a role for gene conversion. Fifth, transcription status and location in a particular isochore do not influence microsatellite mutability. Sixth, compared with intrinsic features, regional genomic factors have only minor effects. Finally, our regression models explain approximately 90% of variation in microsatellite mutability and can generate useful predictions for the studies of human diseases, forensics, and conservation genetics.
Affiliation(s)
- Yogeshwar D Kelkar
- Department of Biology, Penn State University, University Park, Pennsylvania 16802, USA
|
45
|
Asthana S, Roytberg M, Stamatoyannopoulos J, Sunyaev S. Analysis of sequence conservation at nucleotide resolution. PLoS Comput Biol 2007; 3:e254. [PMID: 18166073 PMCID: PMC2230682 DOI: 10.1371/journal.pcbi.0030254] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Received: 05/10/2007] [Accepted: 11/13/2007] [Indexed: 12/02/2022]
Abstract
One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation are focused on delineation of conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes are available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans as evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved “chunks.” Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. 
A small subset of these fragmented positions may be identified with high confidence.

The structure of the human genome remains largely unknown, including which parts of the genome are functionally relevant and which parts are “junk.” The availability of genomic sequence from a large number of mammals allows a more detailed exploration of this structure, using comparison of related sequences from different species to identify portions of the genome that have remained unchanged, conserved by the action of natural selection, and thus likely to be functionally significant. To date, most efforts focused on localizing the functional fraction of the human genome have been based on identifying contiguous stretches of positions conserved in multiple species. Here, we present an analysis that is based instead on a single-position measure of conservation called SCONE. Our analysis suggests that the majority of conserved and putatively functional positions are highly fragmented and lie outside contiguous regions of conserved sequence. A subset of these fragmented positions may be identified based on local clustering.
Affiliation(s)
- Saurabh Asthana
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Mikhail Roytberg
- Computational Biology Group, Institute of Mathematical Problems in Biology, Russian Academy of Sciences, Pushchino, Russia
- John Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- * To whom correspondence should be addressed. E-mail: (SS), (JS)
- Shamil Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- * To whom correspondence should be addressed. E-mail: (SS), (JS)
|
46
|
Minkevich IG, Patrushev LI. [Genomic noncoding sequences and the size of eukaryotic cell nucleus as important factors of gene protection from chemical mutagens]. Russian Journal of Bioorganic Chemistry 2007; 33:474-7. [PMID: 17886440 DOI: 10.1134/s1068162007040115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
Abstract
An improved quantitative model describing a protective function of eukaryotic genomic noncoding sequences was developed. In this new model, two factors affecting gene protection from chemical mutagens are considered: (1) the ratio of the total lengths of coding and noncoding genomic sequences and (2) the volume of the cell nucleus. An increase in the noncoding DNA in the genome reduces the number of mutagen-damaged nucleotides in the coding region, whereas an increase in the volume of the nucleus decreases the flow of mutagens per unit of nuclear volume that attacks its surface.
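The abstract names two factors but gives no equations, so the following Python sketch is only a toy illustration and not the authors' model. It assumes that mutagen hits fall uniformly along the genome and that the nucleus is a sphere; both function names and all numbers are hypothetical.

```python
def expected_coding_hits(total_hits, coding_bp, noncoding_bp):
    """Expected number of damaged coding nucleotides.

    Assumption: hits are spread uniformly over the genome, so the coding
    region receives a share equal to its fraction of the total length.
    """
    if coding_bp <= 0 or noncoding_bp < 0:
        raise ValueError("coding_bp must be positive and noncoding_bp non-negative")
    return total_hits * coding_bp / (coding_bp + noncoding_bp)


def surface_to_volume(radius):
    """Surface-area-to-volume ratio of a sphere: 4*pi*r^2 / (4/3*pi*r^3) = 3 / r."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    return 3.0 / radius


# More noncoding DNA dilutes the damage reaching coding sequence.
print(expected_coding_hits(100, coding_bp=1, noncoding_bp=9))
print(expected_coding_hits(100, coding_bp=1, noncoding_bp=99))

# A larger spherical nucleus has less surface per unit volume, so a flux
# arriving through the surface is lower per unit of nuclear volume.
print(surface_to_volume(1.0), surface_to_volume(2.0))
```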
|
47
|
Baer CF, Miyamoto MM, Denver DR. Mutation rate variation in multicellular eukaryotes: causes and consequences. Nat Rev Genet 2007; 8:619-31. [PMID: 17637734 DOI: 10.1038/nrg2158] [Citation(s) in RCA: 294] [Impact Index Per Article: 17.3] [Indexed: 11/09/2022]
Abstract
A basic knowledge about mutation rates is central to our understanding of a myriad of evolutionary phenomena, including the maintenance of sex and rates of molecular evolution. Although there is substantial evidence that mutation rates vary among taxa, relatively little is known about the factors that underlie this variation at an empirical level, particularly in multicellular eukaryotes. Here we integrate several disparate lines of theoretical and empirical inquiry into a unified framework to guide future studies that are aimed at understanding why and how mutation rates evolve in multicellular species.
Affiliation(s)
- Charles F Baer
- Department of Zoology, University of Florida, Gainesville, Florida 32611, USA.
|
48
|
Dreszer TR, Wall GD, Haussler D, Pollard KS. Biased clustered substitutions in the human genome: the footprints of male-driven biased gene conversion. Genome Res 2007; 17:1420-30. [PMID: 17785536 PMCID: PMC1987345 DOI: 10.1101/gr.6395807] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.2] [Indexed: 11/25/2022]
Abstract
We examined fixed substitutions in the human lineage since divergence from the common ancestor with the chimpanzee, and determined what fraction are AT to GC (weak-to-strong). Substitutions that are densely clustered on the chromosomes show a remarkable excess of weak-to-strong "biased" substitutions. These unexpected biased clustered substitutions (UBCS) are common near the telomeres of all autosomes but not the sex chromosomes. Regions of extreme bias are enriched for genes. Human and chimp orthologous regions show a striking similarity in the shape and magnitude of their respective UBCS maps, suggesting a relatively stable force leads to clustered bias. The strong and stable signal near telomeres may have participated in the evolution of isochores. One exception to the UBCS pattern found in all autosomes is chromosome 2, which shows a UBCS peak midchromosome, mapping to the fusion site of two ancestral chromosomes. This provides evidence that the fusion occurred as recently as 740,000 years ago and no more than approximately 3 million years ago. No biased clustering was found in SNPs, suggesting that clusters of biased substitutions are selected from mutations. UBCS is strongly correlated with male (and not female) recombination rates, which explains the lack of UBCS signal on chromosome X. These observations support the hypothesis that biased gene conversion (BGC), specifically in the male germline, played a significant role in the evolution of the human genome.
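The quantity at the center of this abstract, the fraction of substitutions that are AT to GC (weak-to-strong), can be sketched in Python as follows. The function name, the input representation as (ancestral, derived) base pairs, and the example data are hypothetical; the paper's clustering and significance procedures are not reproduced.

```python
WEAK = {"A", "T"}    # bases forming two hydrogen bonds
STRONG = {"G", "C"}  # bases forming three hydrogen bonds


def weak_to_strong_fraction(substitutions):
    """Fraction of substitutions that change an A or T into a G or C.

    substitutions: iterable of (ancestral_base, derived_base) pairs.
    """
    subs = [(a.upper(), d.upper()) for a, d in substitutions]
    if not subs:
        raise ValueError("at least one substitution is required")
    weak_to_strong = sum(1 for a, d in subs if a in WEAK and d in STRONG)
    return weak_to_strong / len(subs)


# Toy input: two weak-to-strong changes, one strong-to-weak, one strong-to-strong.
example = [("A", "G"), ("T", "C"), ("G", "A"), ("C", "G")]
print(weak_to_strong_fraction(example))
```

Computing this fraction separately for densely clustered and for isolated substitutions is one simple way to look for the kind of excess the abstract describes.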
MESH Headings
- Animals
- Chromosomes, Human, Pair 2/genetics
- Chromosomes, Human, X/genetics
- Chromosomes, Human, Y/genetics
- Evolution, Molecular
- Female
- Gene Conversion
- Gene Fusion
- Genome, Human
- Humans
- Male
- Models, Genetic
- Pan troglodytes/genetics
- Polymorphism, Single Nucleotide
- Recombination, Genetic
- Sex Characteristics
- Species Specificity
- Telomere/genetics
- Time Factors
Affiliation(s)
- Timothy R. Dreszer
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Gregory D. Wall
- Department of Statistics, University of California, Davis, California 95616, USA
- David Haussler
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA
- Corresponding authors. Fax (831) 459-1809; fax (530) 754-9658.
- Katherine S. Pollard
- Department of Statistics, University of California, Davis, California 95616, USA
- UC Davis Genome Center, University of California, Davis, California 95616, USA
- Corresponding authors. Fax (831) 459-1809; fax (530) 754-9658.
|
49
|
Chuang JH, Li H. Similarity of synonymous substitution rates across mammalian genomes. J Mol Evol 2007; 65:236-48. [PMID: 17674075 DOI: 10.1007/s00239-007-9008-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 09/05/2006] [Accepted: 04/04/2007] [Indexed: 11/29/2022]
Abstract
Given that a gene has a high (or low) synonymous substitution rate in one mammalian species, will it also have a high (or low) synonymous substitution rate in another mammalian species? Such similarities in the rate of synonymous substitution can reveal both selective pressures and neutral processes acting on mammalian gene sequences; however, the existence of such an effect has been a matter of disagreement. We resolve whether such synonymous substitution rate similarities exist using 7462 ortholog triplets aligned across rat, mouse, and human, a dataset two orders of magnitude larger than previous studies. We find that a gene's synonymous substitution rate in the rat-mouse branch of the phylogeny is correlated with its rate in the branch connecting human and the rat-mouse ancestor. We confirm this for several different measures of synonymous substitution rate, including corrections for base composition and CpG dinucleotides, and we verify the results in the larger mouse-human-rat-dog phylogeny. This similarity of rates is most apparent for genes in which synonymous sites are well conserved across species, suggesting that a significant component of the effect is due to purifying selection. We observe rate correlations at a resolution as fine as a few hundred kilobases, and the genes with the most similar synonymous substitution rates are enriched for regulatory functions. Genes with above-average substitution rates also exhibit significant, though somewhat weaker, rate correlations, suggesting that some neutral processes may have persisted in the phylogeny as well.
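The rate similarity tested in this abstract amounts to correlating per-gene synonymous substitution rates between two branches of the phylogeny. Here is a minimal Python sketch using a plain Pearson correlation; the function name and the toy rate values are hypothetical, and the abstract does not say which correlation statistic or base-composition corrections the authors used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene synonymous rates on two branches (made-up values).
rat_mouse_branch = [0.10, 0.15, 0.22, 0.30]
human_branch = [0.25, 0.31, 0.40, 0.52]
print(pearson(rat_mouse_branch, human_branch))
```

A positive coefficient across many orthologous genes would correspond to the similarity of rates reported in the abstract.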
Affiliation(s)
- Jeffrey H Chuang
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA.
|
50
|
A macaque's-eye view of human insertions and deletions: differences in mechanisms. PLoS Comput Biol 2007; 3:1772-82. [PMID: 17941704 PMCID: PMC1976337 DOI: 10.1371/journal.pcbi.0030176] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Received: 06/04/2007] [Accepted: 07/26/2007] [Indexed: 11/19/2022]
Abstract
Insertions and deletions (indels) cause numerous genetic diseases and lead to pronounced evolutionary differences among genomes. The macaque sequences provide an opportunity to gain insights into the mechanisms generating these mutations on a genome-wide scale by establishing the polarity of indels occurring in the human lineage since its divergence from the chimpanzee. Here we apply novel regression techniques and multiscale analyses to demonstrate an extensive regional indel rate variation stemming from local fluctuations in divergence, GC content, male and female recombination rates, proximity to telomeres, and other genomic factors. We find that both replication and, surprisingly, recombination are significantly associated with the occurrence of small indels. Intriguingly, the relative inputs of replication versus recombination differ between insertions and deletions, thus the two types of mutations are likely guided in part by distinct mechanisms. Namely, insertions are more strongly associated with factors linked to recombination, while deletions are mostly associated with replication-related features. Indel as a term misleadingly groups the two types of mutations together by their effect on a sequence alignment. However, here we establish that the correct identification of a small gap as an insertion or a deletion (by use of an outgroup) is crucial to determining its mechanism of origin. In addition to providing novel insights into insertion and deletion mutagenesis, these results will assist in gap penalty modeling and eventually lead to more reliable genomic alignments.
|