1
Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. Genetics 2024; 226:iyae013. [PMID: 38298127 PMCID: PMC10990422 DOI: 10.1093/genetics/iyae013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 08/11/2023] [Accepted: 01/05/2024] [Indexed: 02/02/2024] Open
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than polymerase slippage in replicating progenitor cells. These results echo the recent finding that DNA damage in oocytes is a significant source of de novo single nucleotide variants and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to known hotspots of oocyte mutagenesis, nor are postzygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on de novo mutation (DNM) rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at G/C-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and contradict prior attribution of replication slippage as the primary mechanism of STR mutagenesis.
Affiliation(s)
- Michael E Goldberg
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Michelle D Noyes
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Aaron R Quinlan
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Kelley Harris
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Computational Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
2
Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. bioRxiv 2023:2023.12.22.573131. [PMID: 38187618 PMCID: PMC10769404 DOI: 10.1101/2023.12.22.573131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than the classical mechanism of polymerase slippage in replicating progenitor cells. These results also echo the recent finding that DNA damage in quiescent oocytes is a significant source of de novo SNVs and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to previously discovered hotspots of oocyte mutagenesis, nor are post-zygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on DNM rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at GC-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and are especially surprising considering the prior belief in replication slippage as the dominant mechanism of STR mutagenesis.
Affiliation(s)
- Michael E. Goldberg
  - Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
- Michelle D. Noyes
  - Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
  - Howard Hughes Medical Institute, 3720 15 Ave NE, University of Washington, Seattle, WA, 98195
- Aaron R. Quinlan
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
  - These authors contributed equally to this work
- Kelley Harris
  - Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
  - Computational Biology Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, 98109
  - These authors contributed equally to this work
3
Teterina AA, Willis JH, Lukac M, Jovelin R, Cutter AD, Phillips PC. Genomic diversity landscapes in outcrossing and selfing Caenorhabditis nematodes. PLoS Genet 2023; 19:e1010879. [PMID: 37585484 PMCID: PMC10461856 DOI: 10.1371/journal.pgen.1010879] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/28/2022] [Revised: 08/28/2023] [Accepted: 07/21/2023] [Indexed: 08/18/2023] Open
Abstract
Caenorhabditis nematodes form an excellent model for studying how the mode of reproduction affects genetic diversity, as some species reproduce via outcrossing whereas others can self-fertilize. Currently, chromosome-level patterns of diversity and recombination are only available for self-reproducing Caenorhabditis, making the generality of genomic patterns across the genus unclear given the profound potential influence of reproductive mode. Here we present a whole-genome diversity landscape, coupled with a new genetic map, for the outcrossing nematode C. remanei. We demonstrate that the genomic distribution of recombination in C. remanei, like the model nematode C. elegans, shows high recombination rates on chromosome arms and low rates toward the central regions. Patterns of genetic variation across the genome are also similar between these species, but differ dramatically in scale, being tenfold greater for C. remanei. Historical reconstructions of variation in effective population size over the past million generations echo this difference in polymorphism. Evolutionary simulations demonstrate how selection, recombination, mutation, and selfing shape variation along the genome, and that multiple drivers can produce patterns similar to those observed in natural populations. The results illustrate how genome organization and selection play a crucial role in shaping the genomic pattern of diversity whereas demographic processes scale the level of diversity across the genome as a whole.
Affiliation(s)
- Anastasia A. Teterina
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
  - Center of Parasitology, Severtsov Institute of Ecology and Evolution RAS, Moscow, Russia
- John H. Willis
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
- Matt Lukac
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
- Richard Jovelin
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Asher D. Cutter
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Patrick C. Phillips
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
4
Gill SE, Chain FJJ. Very Low Rates of Spontaneous Gene Deletions and Gene Duplications in Dictyostelium discoideum. J Mol Evol 2023; 91:24-32. [PMID: 36484794 PMCID: PMC9849192 DOI: 10.1007/s00239-022-10081-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2022] [Accepted: 12/02/2022] [Indexed: 12/13/2022]
Abstract
The study of spontaneous mutation rates has revealed a wide range of heritable point mutation rates across species, but there are comparatively few estimates for large-scale deletion and duplication rates. The handful of studies that have directly calculated spontaneous rates of deletion and duplication using mutation accumulation lines have estimated that genes are duplicated and deleted at orders of magnitude greater rates than the spontaneous point mutation rate. In our study, we tested whether spontaneous gene deletion and gene duplication rates are also high in Dictyostelium discoideum, a eukaryote with among the lowest point mutation rates (2.5 × 10⁻¹¹ per site per generation) and an AT-rich genome (GC content of 22%). We calculated mutation rates of gene deletions and duplications using whole-genome sequencing data originating from a mutation accumulation experiment and determined the association between the copy number mutations and GC content. Overall, we estimated an average of 3.93 × 10⁻⁸ gene deletions and 1.18 × 10⁻⁸ gene duplications per gene per generation. While orders of magnitude greater than their point mutation rate, these rates are much lower compared to gene deletion and duplication rates estimated from mutation accumulation lines in other organisms (that are on the order of ~10⁻⁶ per gene per generation). The deletions and duplications were enriched in regions that were AT-rich even compared to the genomic background, in contrast to our expectations if low GC content was contributing to low mutation rates. The low deletion and duplication mutation rates in D. discoideum compared to other eukaryotes mirror their low point mutation rates, supporting previous work suggesting that this organism has high replication fidelity and effective molecular machinery to avoid the accumulation of mutations in their genome.
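The "orders of magnitude" comparisons in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the constants are the rates quoted in the abstract, while the function and variable names are hypothetical and not taken from the paper. The point mutation rate is per site and the copy-number rates are per gene, so the ratios are rough scale comparisons rather than like-for-like ones.

```python
import math

# Rates quoted in the abstract, all per generation.
POINT_RATE = 2.5e-11          # point mutations per site
DELETION_RATE = 3.93e-8       # gene deletions per gene
DUPLICATION_RATE = 1.18e-8    # gene duplications per gene
OTHER_ORGANISMS_RATE = 1e-6   # rough order-of-magnitude figure for other organisms, per gene


def orders_of_magnitude(rate_a, rate_b):
    """Return log10(rate_a / rate_b), i.e. how many powers of ten separate the two rates."""
    return math.log10(rate_a / rate_b)


if __name__ == "__main__":
    for name, rate in [("deletion", DELETION_RATE), ("duplication", DUPLICATION_RATE)]:
        above_point = orders_of_magnitude(rate, POINT_RATE)
        below_other = orders_of_magnitude(OTHER_ORGANISMS_RATE, rate)
        print(f"{name}: {above_point:.2f} orders above the point mutation rate, "
              f"{below_other:.2f} orders below the ~1e-6 figure")
```

Under these figures, both copy-number rates sit roughly three orders of magnitude above the point mutation rate and between one and two orders below the ~10⁻⁶ figure cited for other organisms.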
Affiliation(s)
- Shelbi E Gill
  - Department of Biology, University of Massachusetts Lowell, Lowell, MA, 01854-2874, USA.
- Frédéric J J Chain
  - Department of Biology, University of Massachusetts Lowell, Lowell, MA, 01854-2874, USA.
5
Ho EKH, Bellis ES, Calkins J, Adrion JR, Latta IV LC, Schaack S. Engines of change: Transposable element mutation rates are high and variable within Daphnia magna. PLoS Genet 2021; 17:e1009827. [PMID: 34723969 PMCID: PMC8594854 DOI: 10.1371/journal.pgen.1009827] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/27/2021] [Revised: 11/16/2021] [Accepted: 09/16/2021] [Indexed: 12/22/2022] Open
Abstract
Transposable elements (TEs) represent a major portion of most eukaryotic genomes, yet little is known about their mutation rates or how their activity is shaped by other evolutionary forces. Here, we compare short- and long-term patterns of genome-wide mutation accumulation (MA) of TEs among 9 genotypes from three populations of Daphnia magna from across a latitudinal gradient. While the overall proportion of the genome comprised of TEs is highly similar among genotypes from Finland, Germany, and Israel, populations are distinguishable based on patterns of insertion site polymorphism. Our direct rate estimates indicate TE movement is highly variable (net rates ranging from -11.98 to 12.79 × 10⁻⁵ per copy per generation among genotypes), differing both among populations and TE families. Although gains outnumber losses when selection is minimized, both types of events appear to be highly deleterious based on their low frequency in control lines where propagation is not limited to random, single-progeny descent. With rate estimates 4 orders of magnitude higher than base substitutions, TEs clearly represent a highly mutagenic force in the genome. Quantifying patterns of intra- and interspecific variation in TE mobility with and without selection provides insight into a powerful mechanism generating genetic variation in the genome.
Affiliation(s)
- Eddie K. H. Ho
  - Department of Biology, Reed College, Portland, Oregon, United States of America
- Emily S. Bellis
  - Department of Biology, Reed College, Portland, Oregon, United States of America
  - Department of Computer Science, Arkansas State University, Jonesboro, Arkansas, United States of America
- Jaclyn Calkins
  - Department of Biology, Reed College, Portland, Oregon, United States of America
  - College of Human Medicine, Michigan State University, East Lansing, Michigan, United States of America
- Jeffrey R. Adrion
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
- Leigh C. Latta IV
  - Department of Biology, Reed College, Portland, Oregon, United States of America
  - Lewis-Clark State College, Lewiston, Idaho, United States of America
- Sarah Schaack
  - Department of Biology, Reed College, Portland, Oregon, United States of America
6
Goldberg ME, Harris K. Mutational signatures of replication timing and epigenetic modification persist through the global divergence of mutation spectra across the great ape phylogeny. Genome Biol Evol 2021; 14:6275268. [PMID: 33983415 PMCID: PMC8743035 DOI: 10.1093/gbe/evab104] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Accepted: 05/07/2021] [Indexed: 11/17/2022] Open
Abstract
Great ape clades exhibit variation in the relative mutation rates of different three-base-pair genomic motifs, with closely related species having more similar mutation spectra than distantly related species. This pattern cannot be explained by classical demographic or selective forces, but implies that DNA replication fidelity has been perturbed in different ways on each branch of the great ape phylogeny. Here, we use whole-genome variation from 88 great apes to investigate whether these species’ mutation spectra are broadly differentiated across the entire genome, or whether mutation spectrum differences are driven by DNA compartments that have particular functional features or chromatin states. We perform principal component analysis (PCA) and mutational signature deconvolution on mutation spectra ascertained from compartments defined by features including replication timing and ancient repeat content, finding evidence for consistent species-specific mutational signatures that do not depend on which functional compartments the spectra are ascertained from. At the same time, we find that many compartments have their own characteristic mutational signatures that appear stable across the great ape phylogeny. For example, in a mutation spectrum PCA compartmentalized by replication timing, the second principal component explaining 21.2% of variation separates all species’ late-replicating regions from their early-replicating regions. Our results suggest that great ape mutation spectrum evolution is not driven by epigenetic changes that modify mutation rates in specific genomic regions, but instead by trans-acting mutational modifiers that affect mutagenesis across the whole genome fairly uniformly.
Affiliation(s)
- Michael E Goldberg
  - University of Washington Department of Genome Sciences, 3720 15th Ave NE, Seattle WA 98105, United States of America
- Kelley Harris
  - University of Washington Department of Genome Sciences, 3720 15th Ave NE, Seattle WA 98105, United States of America
  - Fred Hutchinson Cancer Center Computational Biology Division, 1100 Fairview Ave N, Seattle, WA 98109, United States of America
7
Guiblet WM, Cremona MA, Harris RS, Chen D, Eckert KA, Chiaromonte F, Huang YF, Makova KD. Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome. Nucleic Acids Res 2021; 49:1497-1516. [PMID: 33450015 PMCID: PMC7897504 DOI: 10.1093/nar/gkaa1269] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 06/05/2020] [Revised: 12/14/2020] [Accepted: 01/11/2021] [Indexed: 12/12/2022] Open
Abstract
Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis, with implications for evolution and genetic diseases.
Affiliation(s)
- Wilfried M Guiblet
  - Bioinformatics and Genomics Graduate Program, Penn State University, University Park, PA 16802, USA
- Marzia A Cremona
  - Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
  - Department of Operations and Decision Systems, Université Laval, Canada
  - CHU de Québec – Université Laval Research Center, Canada
- Robert S Harris
  - Department of Biology, Penn State University, University Park, PA 16802, USA
- Di Chen
  - Intercollege Graduate Degree Program in Genetics, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA
- Kristin A Eckert
  - Department of Pathology, Penn State University, College of Medicine, Hershey, PA 17033, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Francesca Chiaromonte
  - Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
  - EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Yi-Fei Huang
  - Department of Biology, Penn State University, University Park, PA 16802, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Kateryna D Makova
  - Department of Biology, Penn State University, University Park, PA 16802, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
8
Extreme differences between human germline and tumor mutation densities are driven by ancestral human-specific deviations. Nat Commun 2020; 11:2512. [PMID: 32427823 PMCID: PMC7237693 DOI: 10.1038/s41467-020-16296-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/09/2019] [Accepted: 04/22/2020] [Indexed: 12/29/2022] Open
Abstract
Mutations do not accumulate uniformly across the genome. Human germline and tumor mutation density correlate poorly, and each is associated with different genomic features. Here, we use non-human great ape (NHGA) germlines to determine human germline- and tumor-specific deviations from an ancestral-like great ape genome-wide mutational landscape. Strikingly, we find that the distribution of mutation densities in tumors presents a stronger correlation with NHGA than with human germlines. This effect is driven by human-specific differences in the distribution of mutations at non-CpG sites. We propose that ancestral human demographic events, together with the human-specific mutation slowdown, disrupted the human genome-wide distribution of mutation densities. Tumors partially recover this distribution by accumulating preneoplastic-like somatic mutations. Our results highlight the potential utility of using NHGA population data, rather than human controls, to establish the expected mutational background of healthy somatic cells.
9
Supek F, Lehner B. Scales and mechanisms of somatic mutation rate variation across the human genome. DNA Repair (Amst) 2019; 81:102647. [PMID: 31307927 DOI: 10.1016/j.dnarep.2019.102647] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/22/2022]
Abstract
Cancer genome sequencing has revealed that somatic mutation rates vary substantially across the human genome and at scales from megabase-sized domains to individual nucleotides. Here we review recent work that has revealed both the major mutation biases that operate across the genome and the molecular mechanisms that cause them. The default mutation rate landscape in mammalian genomes results in active genes having low mutation rates because of a combination of factors that increase DNA repair: early DNA replication, transcription, active chromatin modifications and accessible chromatin. Therefore, either an increase in the global mutation rate or a redistribution of mutations from inactive to active DNA can increase the rate at which consequential mutations are acquired in active genes. Several environmental carcinogens and intrinsic mechanisms operating in tumor cells likely cause cancer by this second mechanism: by specifically increasing the mutation rate in active regions of the genome.
Affiliation(s)
- Fran Supek
  - Genome Data Science, Institut de Recerca Biomedica (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, 08010 Barcelona, Spain
- Ben Lehner
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, 08010 Barcelona, Spain
  - Systems Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Doctor Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
10
Zhai Y, Bouchard-Côté A. A Poissonian Model of Indel Rate Variation for Phylogenetic Tree Inference. Syst Biol 2018; 66:698-714. [PMID: 28204784 DOI: 10.1093/sysbio/syx033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/04/2015] [Accepted: 01/27/2017] [Indexed: 01/22/2023] Open
Abstract
While indel rate variation has been observed and analyzed in detail, it is not taken into account by current indel-aware phylogenetic reconstruction methods. In this work, we introduce a continuous time stochastic process, the geometric Poisson indel process, that generalizes the Poisson indel process by allowing insertion and deletion rates to vary across sites. We design an efficient algorithm for computing the probability of a given multiple sequence alignment based on our new indel model. We describe a method to construct phylogeny estimates from a fixed alignment using neighbor joining. Using simulation studies, we show that ignoring indel rate variation may have a detrimental effect on the accuracy of the inferred phylogenies, and that our proposed method can sidestep this issue by inferring latent indel rate categories. We also show that our phylogenetic inference method may be more stable to taxa subsampling than methods that either ignore indels or indel rate variation. [evolutionary stochastic process; indel rate variation; Poisson indel process; TKF91.].
Affiliation(s)
- Yongliang Zhai
  - Department of Statistics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Alexandre Bouchard-Côté
  - Department of Statistics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
11
Nuclear topology modulates the mutational landscapes of cancer genomes. Nat Struct Mol Biol 2017; 24:1000-1006. [PMID: 28967881 PMCID: PMC5744871 DOI: 10.1038/nsmb.3474] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 02/06/2017] [Accepted: 08/28/2017] [Indexed: 01/18/2023]
Abstract
Nuclear organization of genomic DNA affects DNA damage and repair processes, and yet its impact on mutational landscapes in cancer genomes remains unclear. Here we analyzed genome-wide somatic mutations from 366 samples of 6 cancer types. We found that lamina-associated regions, which are typically localized at the nuclear periphery, displayed higher somatic mutation frequencies compared to the inter-lamina regions at the nuclear core. This effect remained even after adjusting for features such as GC%, chromatin, and replication timing. Furthermore, mutational signatures differed between the nuclear core and periphery, indicating differences in the patterns of DNA damage and/or DNA repair processes. For instance, smoking and UV-related signatures were more enriched in the nuclear periphery. Substitutions at certain motifs were also more common in the nuclear periphery. Taken together, we found that the nuclear architecture influences mutational landscapes in cancer genomes beyond the effects already captured by chromatin and replication timing.
12
Terekhanova NV, Seplyarskiy VB, Soldatov RA, Bazykin GA. Evolution of Local Mutation Rate and Its Determinants. Mol Biol Evol 2017; 34:1100-1109. [PMID: 28138076 PMCID: PMC5850301 DOI: 10.1093/molbev/msx060] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/30/2022] Open
Abstract
Mutation rate varies along the human genome, and part of this variation is explainable by measurable local properties of the DNA molecule. Moreover, mutation rates differ between orthologous genomic regions of different species, but the drivers of this change are unclear. Here, we use data on human divergence from chimpanzee, human rare polymorphism, and human de novo mutations to predict the substitution rate at orthologous regions of non-human mammals. We show that the local mutation rates are very similar between human and apes, implying that their variation has a strong underlying cryptic component not explainable by the known genomic features. Mutation rates become progressively less similar in more distant species, and these changes are partially explainable by changes in the local genomic features of orthologous regions, most importantly, in the recombination rate. However, they are much more rapid, implying that the cryptic component underlying the mutation rate is more ephemeral than the known genomic features. These findings shed light on the determinants of mutation rate evolution.
Key words: local mutation rate, molecular evolution, recombination rate.
Affiliation(s)
- Nadezhda V. Terekhanova
  - Sector for Molecular Evolution, Institute for Information Transmission Problems of the RAS (Kharkevich Institute), Moscow, Russia
  - M. V. Lomonosov Moscow State University, Moscow, Russia
- Vladimir B. Seplyarskiy
  - Sector for Molecular Evolution, Institute for Information Transmission Problems of the RAS (Kharkevich Institute), Moscow, Russia
- Ruslan A. Soldatov
  - Sector for Molecular Evolution, Institute for Information Transmission Problems of the RAS (Kharkevich Institute), Moscow, Russia
  - M. V. Lomonosov Moscow State University, Moscow, Russia
- Georgii A. Bazykin
  - Sector for Molecular Evolution, Institute for Information Transmission Problems of the RAS (Kharkevich Institute), Moscow, Russia
  - M. V. Lomonosov Moscow State University, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
13
Abstract
Events in primate evolution are often dated by assuming a constant rate of substitution per unit time, but the validity of this assumption remains unclear. Among mammals, it is well known that there exists substantial variation in yearly substitution rates. Such variation is to be expected from differences in life history traits, suggesting it should also be found among primates. Motivated by these considerations, we analyze whole genomes from 10 primate species, including Old World Monkeys (OWMs), New World Monkeys (NWMs), and apes, focusing on putatively neutral autosomal sites and controlling for possible effects of biased gene conversion and methylation at CpG sites. We find that substitution rates are up to 64% higher in lineages leading from the hominoid-NWM ancestor to NWMs than to apes. Within apes, rates are ∼2% higher in chimpanzees and ∼7% higher in the gorilla than in humans. Substitution types subject to biased gene conversion show no more variation among species than those not subject to it. Not all mutation types behave similarly, however; in particular, transitions at CpG sites exhibit a more clocklike behavior than do other types, presumably because of their nonreplicative origin. Thus, not only the total rate, but also the mutational spectrum, varies among primates. This finding suggests that events in primate evolution are most reliably dated using CpG transitions. Taking this approach, we estimate the human and chimpanzee divergence time is 12.1 million years, and the human and gorilla divergence time is 15.1 million years.
|
14
|
Fungtammasan A, Tomaszkiewicz M, Campos-Sánchez R, Eckert KA, DeGiorgio M, Makova KD. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats. Mol Biol Evol 2016; 33:2744-58. [PMID: 27413049 PMCID: PMC5026258 DOI: 10.1093/molbev/msw139] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022] Open
Abstract
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA–DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD.
Affiliation(s)
- Arkarachai Fungtammasan
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University; Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Huck Institute of Genome Sciences, Pennsylvania State University
- Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University
- Rebeca Campos-Sánchez
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University
- Kristin A Eckert
- Center for Medical Genomics, Pennsylvania State University; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, The Pennsylvania State University College of Medicine
- Michael DeGiorgio
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Institute for CyberScience, Pennsylvania State University
- Kateryna D Makova
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Huck Institute of Genome Sciences, Pennsylvania State University
|
15
|
Renzette N, Kowalik TF, Jensen JD. On the relative roles of background selection and genetic hitchhiking in shaping human cytomegalovirus genetic diversity. Mol Ecol 2015. [PMID: 26211679 DOI: 10.1111/mec.13331] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/23/2022]
Abstract
A central focus of population genetics has been examining the contribution of selective and neutral processes in shaping patterns of intraspecies diversity. In terms of selection specifically, surveys of higher organisms have shown considerable variation in the relative contributions of background selection and genetic hitchhiking in shaping the distribution of polymorphisms, although these analyses have rarely been extended to bacteria and viruses. Here, we study the evolution of a ubiquitous, viral pathogen, human cytomegalovirus (HCMV), by analysing the relationship among intraspecies diversity, interspecies divergence and rates of recombination. We show that there is a strong correlation between diversity and divergence, consistent with expectations of neutral evolution. However, after correcting for divergence, there remains a significant correlation between intraspecies diversity and recombination rates, with additional analyses suggesting that this correlation is largely due to the effects of background selection. In addition, a small number of loci, centred on long noncoding RNAs, also show evidence of selective sweeps. These data suggest that HCMV evolution is dominated by neutral mechanisms as well as background selection, expanding our understanding of linked selection to a novel class of organisms.
Affiliation(s)
- Nicholas Renzette
- Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA, 01655, USA
- Timothy F Kowalik
- Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA, 01655, USA; Immunology and Microbiology Program, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA, 01655, USA
- Jeffrey D Jensen
- Swiss Institute of Bioinformatics (SIB), Lausanne, CH-1015, Switzerland; School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, CH-1015, Switzerland
|
16
|
Plyler ZE, Hill AE, McAtee CW, Cui X, Moseley LA, Sorscher EJ. SNP Formation Bias in the Murine Genome Provides Evidence for Parallel Evolution. Genome Biol Evol 2015; 7:2506-19. [PMID: 26253317 PMCID: PMC4607513 DOI: 10.1093/gbe/evv150] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
In this study, we show novel DNA motifs that promote single nucleotide polymorphism (SNP) formation and are conserved among exons, introns, and intergenic DNA from mice (Sanger Mouse Genomes Project), human genes (1000 Genomes), and tumor-specific somatic mutations (data from TCGA). We further characterize SNPs likely to be very recent in origin (i.e., formed in otherwise congenic mice) and show enrichment for both synonymous and parallel DNA variants occurring under circumstances not attributable to purifying selection. The findings provide insight regarding SNP contextual bias and eukaryotic codon usage as strategies that favor long-term exonic stability. The study also furnishes new information concerning rates of murine genomic evolution and features of DNA mutagenesis (at the time of SNP formation) that should be viewed as "adaptive."
Affiliation(s)
- Aubrey E Hill
- Department of Computer and Information Sciences, University of Alabama at Birmingham
- Christopher W McAtee
- Gregory Fleming James Cystic Fibrosis Research Center, University of Alabama at Birmingham
- Xiangqin Cui
- Department of Biostatistics, University of Alabama at Birmingham
- Leah A Moseley
- Gregory Fleming James Cystic Fibrosis Research Center, University of Alabama at Birmingham
- Eric J Sorscher
- Department of Pediatrics, Emory University School of Medicine
|
17
|
Abstract
Human cytomegalovirus (HCMV) exhibits surprisingly high genomic diversity during natural infection although little is known about the limits or patterns of HCMV diversity among humans. To address this deficiency, we analyzed genomic diversity among congenitally infected infants. We show that there is an upper limit to HCMV genomic diversity in these patient samples, with ∼25% of the genome being devoid of polymorphisms. These low diversity regions were distributed across 26 loci that were preferentially located in DNA-processing genes. Furthermore, by developing, to our knowledge, the first genome-wide mutation and recombination rate maps for HCMV, we show that genomic diversity is positively correlated with these two rates. In contrast, median levels of viral genomic diversity did not vary between putatively single or mixed strain infections. We also provide evidence that HCMV populations isolated from vascular compartments of hosts from different continents are genetically similar and that polymorphisms in glycoproteins and regulatory proteins are enriched in these viral populations. This analysis provides the most highly detailed map of HCMV genomic diversity in human hosts to date and informs our understanding of the distribution of HCMV genomic diversity within human hosts.
|
18
|
Abstract
Species survival depends on the faithful replication of genetic information, which is continually monitored and maintained by DNA repair pathways that correct replication errors and the thousands of lesions that arise daily from the inherent chemical lability of DNA and the effects of genotoxic agents. Nonetheless, neutrally evolving DNA (not under purifying selection) accumulates base substitutions with time (the neutral mutation rate). Thus, repair processes are not 100% efficient. The neutral mutation rate varies both between and within chromosomes. For example, it is 10- to 50-fold higher at CpGs than at non-CpG positions. Interestingly, the neutral mutation rate at non-CpG sites is positively correlated with CpG content. Although the basis of this correlation was not immediately apparent, some bioinformatic results were consistent with the induction of non-CpG mutations by DNA repair at flanking CpG sites. Recent studies with a model system showed that in vivo repair of preformed lesions (mismatches, abasic sites, single-stranded nicks) can in fact induce mutations in flanking DNA. Mismatch repair (MMR) is an essential component for repair-induced mutations, which can occur as distant as 5 kb from the introduced lesions. Most, but not all, mutations involved the C of TpCpN (G of NpGpA), which is the target sequence of the C-preferring, single-stranded DNA-specific APOBEC deaminases. APOBEC-mediated mutations are not limited to our model system: recent studies by others showed that some tumors harbor mutations with the same signature, as can intermediates in RNA-guided endonuclease-mediated genome editing. APOBEC deaminases participate in normal physiological functions such as generating mutations that inactivate viruses or endogenous retrotransposons, or that enhance immunoglobulin diversity in B cells. The recruitment of normally physiological error-prone processes during DNA repair would have important implications for disease, aging and evolution. This perspective briefly reviews both the bioinformatic and biochemical literature relevant to repair-induced mutagenesis and discusses future directions required to understand the mechanistic basis of this process.
Affiliation(s)
- Jia Chen
- School of Life Science and Technology, ShanghaiTech University, Building 8, 319 Yueyang Road, Shanghai 200031, China
- Anthony V Furano
- Section on Genomic Structure and Function, Laboratory of Cell and Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Building 8, Room 203, 8 Center Drive, MSC 0830, Bethesda, MD 20892-0830, USA.
|
19
|
Makova KD, Hardison RC. The effects of chromatin organization on variation in mutation rates in the genome. Nat Rev Genet 2015; 16:213-23. [PMID: 25732611 PMCID: PMC4500049 DOI: 10.1038/nrg3890] [Citation(s) in RCA: 160] [Impact Index Per Article: 16.0] [Indexed: 12/12/2022]
Abstract
The variation in local rates of mutations can affect both the evolution of genes and their function in normal and cancer cells. Deciphering the molecular determinants of this variation will be aided by the elucidation of distinct types of mutations, as they differ in regional preferences and in associations with genomic features. Chromatin organization contributes to regional variation in mutation rates, but its contribution differs among mutation types. In both germline and somatic mutations, base substitutions are more abundant in regions of closed chromatin, perhaps reflecting error accumulation late in replication. By contrast, a distinctive mutational state with very high levels of insertions and deletions (indels) and substitutions is enriched in regions of open chromatin. These associations indicate an intricate interplay between the nucleotide sequence of DNA and its dynamic packaging into chromatin, and have important implications for current biomedical research. This Review focuses on recent studies showing associations between chromatin state and mutation rates, including pairwise and multivariate investigations of germline and somatic (particularly cancer) mutations.
Affiliation(s)
- Kateryna D Makova
- Department of Biology, Huck Institute for Genome Sciences, The Pennsylvania State University, University Park, State College, Pennsylvania 16802, USA
- Ross C Hardison
- Department of Biochemistry and Molecular Biology, Huck Institute for Genome Sciences, The Pennsylvania State University, University Park, State College, Pennsylvania 16802, USA
|
20
|
Abstract
Mutational heterogeneity must be taken into account when reconstructing evolutionary histories, calibrating molecular clocks, and predicting links between genes and disease. Selective pressures and various DNA transactions have been invoked to explain the heterogeneous distribution of genetic variation between species, within populations, and in tissue-specific tumors. To examine relationships between such heterogeneity and variations in leading- and lagging-strand replication fidelity and mismatch repair, we accumulated 40,000 spontaneous mutations in eight diploid yeast strains in the absence of selective pressure. We found that replicase error rates vary by fork direction, coding state, nucleosome proximity, and sequence context. Further, error rates and DNA mismatch repair efficiency both vary by mismatch type, responsible polymerase, replication time, and replication origin proximity. Mutation patterns implicate replication infidelity as one driver of variation in somatic and germline evolution, suggest mechanisms of mutual modulation of genome stability and composition, and predict future observations in specific cancers.
|
21
|
Chuang TJ, Chen FC. DNA methylation is associated with an increased level of conservation at nondegenerate nucleotides in mammals. Mol Biol Evol 2013; 31:387-96. [PMID: 24157417 PMCID: PMC3907051 DOI: 10.1093/molbev/mst208] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022] Open
Abstract
DNA methylation at CpG dinucleotides can significantly increase the rate of cytosine-to-thymine mutations and the level of sequence divergence. Although the correlations between DNA methylation and genomic sequence evolution have been widely studied, an unaddressed yet fundamental question is how DNA methylation is associated with the conservation of individual nucleotides in different sequence contexts. Here, we demonstrate that in mammalian exons, the correlations between DNA methylation and the conservation of individual nucleotides are dependent on the type of exonic sequence (coding or untranslated), the degeneracy of coding nucleotides, background selection pressure, and the relative position (first or nonfirst exon in the transcript) where the nucleotides are located. For untranslated and nonzero-fold degenerate nucleotides, methylated sites are less conserved than unmethylated sites regardless of background selection pressure and the relative position of the exon. For zero-fold degenerate (or nondegenerate) nucleotides, however, the reverse trend is observed in nonfirst coding exons and first coding exons that are under stringent background selection pressure. Furthermore, cytosine-to-thymine mutations at methylated zero-fold degenerate nucleotides are predicted to be more detrimental than those that occur at unmethylated nucleotides. As zero-fold and nonzero-fold degenerate nucleotides are very close to each other, our results suggest that the "functional resolution" of DNA methylation may be finer than previously recognized. In addition, the positive correlation between CpG methylation and the level of conservation at zero-fold degenerate nucleotides implies that CpG methylation may serve as an "indicator" of functional importance of these nucleotides.
Affiliation(s)
- Trees-Juen Chuang
- Physical and Computational Genomics Division, Genomics Research Center, Academia Sinica, Taipei, Taiwan
|
22
|
Kvikstad EM, Duret L. Strong heterogeneity in mutation rate causes misleading hallmarks of natural selection on indel mutations in the human genome. Mol Biol Evol 2013; 31:23-36. [PMID: 24113537 PMCID: PMC3879449 DOI: 10.1093/molbev/mst185] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023] Open
Abstract
Elucidating the mechanisms of mutation accumulation and fixation is critical to understand the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scaled sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald-Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1-50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than deletions. However, this fixation bias is not consistent with either selection or biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected according to a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in determination of ancestral alleles via parsimony and advise caution interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results will have great impact on studies seeking to infer evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed homoplasy-free.
Affiliation(s)
- Erika M Kvikstad
- Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, CNRS, Université Lyon 1, Villeurbanne, France
|
23
|
Segmenting the human genome based on states of neutral genetic divergence. Proc Natl Acad Sci U S A 2013; 110:14699-704. [PMID: 23959903 DOI: 10.1073/pnas.1221792110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 01/21/2023] Open
Abstract
Many studies have demonstrated that divergence levels generated by different mutation types vary and covary across the human genome. To improve our still-incomplete understanding of the mechanistic basis of this phenomenon, we analyze several mutation types simultaneously, anchoring their variation to specific regions of the genome. Using hidden Markov models on insertion, deletion, nucleotide substitution, and microsatellite divergence estimates inferred from human-orangutan alignments of neutrally evolving genomic sequences, we segment the human genome into regions corresponding to different divergence states--each uniquely characterized by specific combinations of divergence levels. We then parsed the mutagenic contributions of various biochemical processes associating divergence states with a broad range of genomic landscape features. We find that high divergence states inhabit guanine- and cytosine (GC)-rich, highly recombining subtelomeric regions; low divergence states cover inner parts of autosomes; chromosome X forms its own state with lowest divergence; and a state of elevated microsatellite mutability is interspersed across the genome. These general trends are mirrored in human diversity data from the 1000 Genomes Project, and departures from them highlight the evolutionary history of primate chromosomes. We also find that genes and noncoding functional marks [annotations from the Encyclopedia of DNA Elements (ENCODE)] are concentrated in high divergence states. Our results provide a powerful tool for biomedical data analysis: segmentations can be used to screen personal genome variants--including those associated with cancer and other diseases--and to improve computational predictions of noncoding functional elements.
|
24
|
Ananda G, Walsh E, Jacob KD, Krasilnikova M, Eckert KA, Chiaromonte F, Makova KD. Distinct mutational behaviors differentiate short tandem repeats from microsatellites in the human genome. Genome Biol Evol 2013; 5:606-20. [PMID: 23241442 PMCID: PMC3622297 DOI: 10.1093/gbe/evs116] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Indexed: 12/23/2022] Open
Abstract
A tandem repeat's (TR) propensity to mutate increases with repeat number, and can become very pronounced beyond a critical boundary, transforming it into a microsatellite (MS). However, a clear understanding of the mutational behavior of different TR classes and motifs and related mechanisms is lacking, as is a consensus on the existence of a boundary separating short TRs (STRs) from MSs. This hinders our understanding of MSs' mutational properties and their effective use as genetic markers. Using indel calls for 179 individuals from 1000 Genomes Pilot-1 Project, we determined polymorphism incidence for four major TR classes, and formalized its varying relationship with repeat number using segmented regression. We observed a biphasic regime with a transition from a faster to a slower exponential growth at 9, 5, 4, and 4 repeats for mono-, di-, tri-, and tetranucleotide TRs, respectively. We used an in vitro mutagenesis assay to evaluate the contribution of strand slippage errors to mutability. STRs and MSs differ in their absolute polymorphism levels, but more importantly in their rates of mutability growth. Although strand slippage is a major factor driving mononucleotide polymorphism incidence, dinucleotide polymorphism incidence is greater than that expected due to strand slippage alone, indicating that additional cellular factors might be driving dinucleotide mutability in the human genome. Leveraging on hundreds of human genomes, we present the first comprehensive, genome-wide analysis of TR mutational behavior, encompassing several motif sizes and compositions.
Affiliation(s)
- Guruprasad Ananda
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University, PA, USA
|
25
|
Nygren K, Wallberg A, Samils N, Stajich JE, Townsend JP, Karlsson M, Johannesson H. Analyses of expressed sequence tags in Neurospora reveal rapid evolution of genes associated with the early stages of sexual reproduction in fungi. BMC Evol Biol 2012. [PMID: 23186325 PMCID: PMC3571971 DOI: 10.1186/1471-2148-12-229] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 01/27/2023] Open
Abstract
Background: The broadly accepted pattern of rapid evolution of reproductive genes is primarily based on studies of animal systems, although several examples of rapidly evolving genes involved in reproduction are found in diverse additional taxa. In fungi, genes involved in mate recognition have been found to evolve rapidly. However, the examples are too few to draw conclusions on a genome scale. Results: In this study, we performed microarray hybridizations between RNA from sexual and vegetative tissues of two strains of the heterothallic (self-sterile) filamentous ascomycete Neurospora intermedia, to identify a set of sex-associated genes in this species. We aligned Expressed Sequence Tags (ESTs) from sexual and vegetative tissue of N. intermedia to orthologs from three closely related species: N. crassa, N. discreta and N. tetrasperma. The resulting four-species alignments provided a dataset for molecular evolutionary analyses. Our results confirm a general pattern of rapid evolution of fungal sex-associated genes, compared to control genes with constitutive expression or a high relative expression during vegetative growth. Among the rapidly evolving sex-associated genes, we identified candidates that could be of importance for mating or fruiting-body development. Analyses of five of these candidate genes from additional species of heterothallic Neurospora revealed that three of them evolve under positive selection. Conclusions: Taken together, our study represents a novel finding of a genome-wide pattern of rapid evolution of sex-associated genes in the fungal kingdom, and provides a list of candidate genes important for reproductive isolation in Neurospora.
Affiliation(s)
- Kristiina Nygren
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, SE-752 36, Uppsala, Sweden
|
26
|
Minority of mammalian orthologs can be regarded as physiologically closest genes. Gene X 2012; 509:201-5. [DOI: 10.1016/j.gene.2012.08.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2012] [Revised: 07/31/2012] [Accepted: 08/19/2012] [Indexed: 11/18/2022] Open
|
27
|
Li Y, Zhang L, Ball RL, Liang X, Li J, Lin Z, Liang H. Comparative analysis of somatic copy-number alterations across different human cancer types reveals two distinct classes of breakpoint hotspots. Hum Mol Genet 2012; 21:4957-65. [PMID: 22899649 DOI: 10.1093/hmg/dds340] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/22/2022] Open
Abstract
Somatic copy-number alterations (SCNAs) play a crucial role in the development of human cancer. However, it is not well understood what evolutionary mechanisms contribute to the global patterns of SCNAs in cancer genomes. Taking advantage of data recently available through The Cancer Genome Atlas, we performed a systematic analysis on genome-wide SCNA breakpoint data for eight cancer types. First, we observed a high degree of overall similarity among the SCNA breakpoint landscapes of different cancer types. Then, we compiled 19 genomic features and evaluated their effects on the observed SCNA patterns. We found that evolutionary indel and substitution rates between species (i.e. humans and chimpanzees) consistently show the strongest correlations with breakpoint frequency among all the surveyed features; whereas the effects of some features are quite cancer-type dependent. Focusing on SCNA breakpoint hotspots, we found that cancer-type-specific breakpoint hotspots and common hotspots show distinct patterns. Cancer-type-specific hotspots are enriched with known cancer genes but are poorly predicted from genomic features; whereas common hotspots show the opposite patterns. This contrast suggests that explaining high-frequency SCNAs in cancer may require different evolutionary models: positive selection driven by cancer genes, and non-adaptive evolution related to an intrinsically unstable genomic context. Our results not only present a systematic view of the effects of genetic factors on genome-wide SCNA patterns, but also provide deep insights into the evolutionary process of SCNAs in cancer.
Affiliation(s)
- Yudong Li
- Department of Bioengineering, School of Food Sciences and Biotechnology, Zhejiang Gongshang University, Hangzhou, PR China
|
28
|
Fungtammasan A, Walsh E, Chiaromonte F, Eckert KA, Makova KD. A genome-wide analysis of common fragile sites: what features determine chromosomal instability in the human genome? Genome Res 2012; 22:993-1005. [PMID: 22456607 PMCID: PMC3371707 DOI: 10.1101/gr.134395.111] [Citation(s) in RCA: 120] [Impact Index Per Article: 9.2] [Indexed: 01/02/2023]
Abstract
Chromosomal common fragile sites (CFSs) are unstable genomic regions that break under replication stress and are involved in structural variation. They frequently are sites of chromosomal rearrangements in cancer and of viral integration. However, CFSs are undercharacterized at the molecular level and thus difficult to predict computationally. Newly available genome-wide profiling studies provide us with an unprecedented opportunity to associate CFSs with features of their local genomic contexts. Here, we contrasted the genomic landscape of cytogenetically defined aphidicolin-induced CFSs (aCFSs) to that of nonfragile sites, using multiple logistic regression. We also analyzed aCFS breakage frequencies as a function of their genomic landscape, using standard multiple regression. We show that local genomic features are effective predictors both of regions harboring aCFSs (explaining ∼77% of the deviance in logistic regression models) and of aCFS breakage frequencies (explaining ∼45% of the variance in standard regression models). In our optimal models (having highest explanatory power), aCFSs are predominantly located in G-negative chromosomal bands and away from centromeres, are enriched in Alu repeats, and have high DNA flexibility. In alternative models, CpG island density, transcription start site density, H3K4me1 coverage, and mononucleotide microsatellite coverage are significant predictors. Also, aCFSs have high fragility when colocated with evolutionarily conserved chromosomal breakpoints. Our models are predictive of the fragility of aCFSs mapped at a higher resolution. Importantly, the genomic features we identified here as significant predictors of fragility allow us to draw valuable inferences on the molecular mechanisms underlying aCFSs.
Affiliation(s)
- Arkarachai Fungtammasan
- The Integrative Biosciences Graduate Program, Bioinformatics and Genomics Option, Pennsylvania State University, University Park, PA 16802, USA
|
29
|
Abstract
It has been known for many years that the mutation rate varies across the genome. However, only with the advent of large genomic data sets is the full extent of this variation becoming apparent. The mutation rate varies over many different scales, from adjacent sites to whole chromosomes, with the strongest variation seen at the smallest scales. Some of these patterns have clear mechanistic bases, but much of the rate variation remains unexplained, and some of it is deeply perplexing. Variation in the mutation rate has important implications in evolutionary biology and underexplored implications for our understanding of hereditary disease and cancer.
|