1. Rodrigues MF, Kern AD, Ralph PL. Shared evolutionary processes shape landscapes of genomic variation in the great apes. Genetics 2024; 226:iyae006. [PMID: 38242701 PMCID: PMC10990428 DOI: 10.1093/genetics/iyae006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 10/26/2023] [Accepted: 01/03/2024] [Indexed: 01/21/2024]
Abstract
For at least the past 5 decades, population genetics, as a field, has worked to describe the precise balance of forces that shape patterns of variation in genomes. The problem is challenging because modeling the interactions between evolutionary processes is difficult, and different processes can impact genetic variation in similar ways. In this paper, we describe how diversity and divergence between closely related species change with time, using correlations between landscapes of genetic variation as a tool to understand the interplay between evolutionary processes. We find strong correlations between landscapes of diversity and divergence in a well-sampled set of great ape genomes, and explore how various processes such as incomplete lineage sorting, mutation rate variation, GC-biased gene conversion and selection contribute to these correlations. Through highly realistic, chromosome-scale, forward-in-time simulations, we show that the landscapes of diversity and divergence in the great apes are too well correlated to be explained via strictly neutral processes alone. Our best fitting simulation includes both deleterious and beneficial mutations in functional portions of the genome, in which 9% of fixations within those regions is driven by positive selection. This study provides a framework for modeling genetic variation in closely related species, an approach which can shed light on the complex balance of forces that have shaped genetic variation.
Affiliation(s)
- Murillo F Rodrigues
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
  - Department of Biology, University of Oregon, Eugene, OR 97403, USA
- Andrew D Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
  - Department of Biology, University of Oregon, Eugene, OR 97403, USA
- Peter L Ralph
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
  - Department of Biology, University of Oregon, Eugene, OR 97403, USA
  - Department of Mathematics, University of Oregon, Eugene, OR 97403, USA
2. Bitter MC, Berardi S, Oken H, Huynh A, Schmidt P, Petrov DA. Continuously fluctuating selection reveals extreme granularity and parallelism of adaptive tracking. bioRxiv 2024:2023.10.16.562586. [PMID: 37904939 PMCID: PMC10614893 DOI: 10.1101/2023.10.16.562586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
Temporally fluctuating environmental conditions are a ubiquitous feature of natural habitats. Yet, how finely natural populations adaptively track fluctuating selection pressures via shifts in standing genetic variation is unknown. We generated high-frequency, genome-wide allele frequency data from a genetically diverse population of Drosophila melanogaster in extensively replicated field mesocosms from late June to mid-December, a period of ∼12 generations. Adaptation throughout the fundamental ecological phases of population expansion, peak density, and collapse was underpinned by extremely rapid, parallel changes in genomic variation across replicates. Yet, the dominant direction of selection fluctuated repeatedly, even within each of these ecological phases. Comparing patterns of allele frequency change to an independent dataset procured from the same experimental system demonstrated that the targets of selection are predictable across years. In concert, our results reveal fitness-relevance of standing variation that is likely to be masked by inference approaches based on static population sampling, or insufficiently resolved time-series data. We propose such fine-scaled temporally fluctuating selection may be an important force maintaining functional genetic variation in natural populations and an important stochastic force affecting levels of standing genetic variation genome-wide.
3. Simon A, Coop G. The contribution of gene flow, selection, and genetic drift to five thousand years of human allele frequency change. Proc Natl Acad Sci U S A 2024; 121:e2312377121. [PMID: 38363870 PMCID: PMC10907250 DOI: 10.1073/pnas.2312377121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 01/09/2024] [Indexed: 02/18/2024]
Abstract
Genomic time series from experimental evolution studies and ancient DNA datasets offer us a chance to directly observe the interplay of various evolutionary forces. We show how the genome-wide variance in allele frequency change between two time points can be decomposed into the contributions of gene flow, genetic drift, and linked selection. In closed populations, the contribution of linked selection is identifiable because it creates covariances between time intervals, and genetic drift does not. However, repeated gene flow between populations can also produce directionality in allele frequency change, creating covariances. We show how to accurately separate the fraction of variance in allele frequency change due to admixture and linked selection in a population receiving gene flow. We use two human ancient DNA datasets, spanning around 5,000 y, as time transects to quantify the contributions to the genome-wide variance in allele frequency change. We find that a large fraction of genome-wide change is due to gene flow. In both cases, after correcting for known major gene flow events, we do not observe a signal of genome-wide linked selection. Thus despite the known role of selection in shaping long-term polymorphism levels, and an increasing number of examples of strong selection on single loci and polygenic scores from ancient DNA, it appears to be gene flow and drift, and not selection, that are the main determinants of recent genome-wide allele frequency change. Our approach should be applicable to the growing number of contemporary and ancient temporal population genomics datasets.
Affiliation(s)
- Alexis Simon
  - Center for Population Biology, University of California, Davis, CA 95616
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
- Graham Coop
  - Center for Population Biology, University of California, Davis, CA 95616
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
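As an illustration of the signal described in the abstract above (linked selection creates covariances between allele frequency changes in different time intervals, whereas drift in a closed population does not), the following is a minimal Python sketch of that covariance statistic. It is not the authors' method: the function names and the toy trajectories are hypothetical, and the sketch omits the admixture correction and the variance decomposition that the abstract describes.

```python
from statistics import mean


def allele_frequency_changes(trajectory):
    """Return the change in allele frequency over each consecutive interval."""
    return [later - earlier for earlier, later in zip(trajectory, trajectory[1:])]


def temporal_covariance(trajectories):
    """Covariance matrix, across loci, of allele frequency changes per interval.

    trajectories: one list of allele frequencies per locus, all sampled at the
    same time points (hypothetical input layout chosen for this sketch).
    Entry [i][j] is the sample covariance between changes in interval i and
    changes in interval j; the diagonal holds the per-interval variances.
    """
    deltas = [allele_frequency_changes(t) for t in trajectories]
    n_loci = len(deltas)
    n_intervals = len(deltas[0])
    if n_loci < 2:
        raise ValueError("need at least two loci to estimate a covariance")
    means = [mean(d[i] for d in deltas) for i in range(n_intervals)]
    cov = [[0.0] * n_intervals for _ in range(n_intervals)]
    for i in range(n_intervals):
        for j in range(n_intervals):
            cov[i][j] = sum(
                (d[i] - means[i]) * (d[j] - means[j]) for d in deltas
            ) / (n_loci - 1)
    return cov


# Toy data (assumed, not from the paper): four loci, three time points.
toy = [
    [0.1, 0.2, 0.3],
    [0.5, 0.4, 0.3],
    [0.2, 0.3, 0.4],
    [0.6, 0.5, 0.4],
]
cov = temporal_covariance(toy)
print(cov[0][1])  # off-diagonal term: covariance between the two intervals
```

In the toy data every locus keeps moving in the same direction across both intervals, so the off-diagonal covariance is positive. Under drift alone in a closed population, the expected off-diagonal covariance is zero.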
4. Zurita AMI, Kyriazis CC, Lohmueller KE. The impact of non-neutral synonymous mutations when inferring selection on non-synonymous mutations. bioRxiv 2024:2024.02.07.579314. [PMID: 38370782 PMCID: PMC10871344 DOI: 10.1101/2024.02.07.579314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
The distribution of fitness effects (DFE) describes the proportions of new mutations that have different effects on reproductive fitness. Accurate measurements of the DFE are important because the DFE is a fundamental parameter in evolutionary genetics and has implications for our understanding of other phenomena like complex disease or inbreeding depression. Current computational methods to infer the DFE for nonsynonymous mutations from natural variation first estimate demographic parameters from synonymous variants to control for the effects of demography and background selection. Then, conditional on these parameters, the DFE is inferred for nonsynonymous mutations. This approach relies on the assumption that synonymous variants are neutrally evolving. However, some evidence points toward synonymous mutations having measurable effects on fitness. To test whether selection on synonymous mutations affects inference of the DFE of nonsynonymous mutations, we simulated several possible models of selection on synonymous mutations using SLiM and attempted to recover the DFE of nonsynonymous mutations using Fit∂a∂i, a common method for DFE inference. Our results show that the presence of selection on synonymous variants leads to incorrect inferences of recent population growth. Furthermore, under certain parameter combinations, inferences of the DFE can have an inflated proportion of highly deleterious nonsynonymous mutations. However, this bias can be eliminated if the correct demographic parameters are used for DFE inference instead of the biased ones inferred from synonymous variants. Our work demonstrates how unmodeled selection on synonymous mutations may affect downstream inferences of the DFE.
Affiliation(s)
- Aina Martinez I Zurita
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, USA
- Christopher C Kyriazis
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, USA
- Kirk E Lohmueller
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, USA
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, USA
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, USA
5. Simon A, Coop G. The contribution of gene flow, selection, and genetic drift to five thousand years of human allele frequency change. bioRxiv 2024:2023.07.11.548607. [PMID: 37503227 PMCID: PMC10370008 DOI: 10.1101/2023.07.11.548607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Genomic time series from experimental evolution studies and ancient DNA datasets offer us a chance to directly observe the interplay of various evolutionary forces. We show how the genome-wide variance in allele frequency change between two time points can be decomposed into the contributions of gene flow, genetic drift, and linked selection. In closed populations, the contribution of linked selection is identifiable because it creates covariances between time intervals, and genetic drift does not. However, repeated gene flow between populations can also produce directionality in allele frequency change, creating covariances. We show how to accurately separate the fraction of variance in allele frequency change due to admixture and linked selection in a population receiving gene flow. We use two human ancient DNA datasets, spanning around 5,000 years, as time transects to quantify the contributions to the genome-wide variance in allele frequency change. We find that a large fraction of genome-wide change is due to gene flow. In both cases, after correcting for known major gene flow events, we do not observe a signal of genome-wide linked selection. Thus despite the known role of selection in shaping long-term polymorphism levels, and an increasing number of examples of strong selection on single loci and polygenic scores from ancient DNA, it appears to be gene flow and drift, and not selection, that are the main determinants of recent genome-wide allele frequency change. Our approach should be applicable to the growing number of contemporary and ancient temporal population genomics datasets.
Affiliation(s)
- Alexis Simon
  - Center for Population Biology, University of California, Davis, CA 95616
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
- Graham Coop
  - Center for Population Biology, University of California, Davis, CA 95616
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
6. Thomas GWC, Hughes JJ, Kumon T, Berv JS, Nordgren CE, Lampson M, Levine M, Searle JB, Good JM. The genomic landscape, causes, and consequences of extensive phylogenomic discordance in Old World mice and rats. bioRxiv 2023:2023.08.28.555178. [PMID: 37693498 PMCID: PMC10491188 DOI: 10.1101/2023.08.28.555178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
A species tree is a central concept in evolutionary biology whereby a single branching phylogeny reflects relationships among species. However, the phylogenies of different genomic regions often differ from the species tree. Although tree discordance is often widespread in phylogenomic studies, we still lack a clear understanding of how variation in phylogenetic patterns is shaped by genome biology or the extent to which discordance may compromise comparative studies. We characterized patterns of phylogenomic discordance across the murine rodents (Old World mice and rats) - a large and ecologically diverse group that gave rise to the mouse and rat model systems. Combining new linked-read genome assemblies for seven murine species with eleven published rodent genomes, we first used ultra-conserved elements (UCEs) to infer a robust species tree. We then used whole genomes to examine finer-scale patterns of discordance and found that phylogenies built from proximate chromosomal regions had similar phylogenies. However, there was no relationship between tree similarity and local recombination rates in house mice, suggesting that genetic linkage influences phylogenetic patterns over deeper timescales. This signal may be independent of contemporary recombination landscapes. We also detected a strong influence of linked selection whereby purifying selection at UCEs led to less discordance, while genes experiencing positive selection showed more discordant and variable phylogenetic signals. Finally, we show that assuming a single species tree can result in high error rates when testing for positive selection under different models. Collectively, our results highlight the complex relationship between phylogenetic inference and genome biology and underscore how failure to account for this complexity can mislead comparative genomic studies.
Affiliation(s)
- Gregg W. C. Thomas
  - Division of Biological Sciences, University of Montana, Missoula, MT, 59801
  - Informatics Group, Harvard University, Cambridge, MA, 02138
- Jonathan J. Hughes
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, 14853
  - Department of Evolution, Ecology, and Organismal Biology, University of California Riverside, Riverside, CA, 92521
- Tomohiro Kumon
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104
- Jacob S. Berv
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, 14853
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109
- C. Erik Nordgren
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104
- Michael Lampson
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104
- Mia Levine
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104
- Jeremy B. Searle
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, 14853
- Jeffrey M. Good
  - Division of Biological Sciences, University of Montana, Missoula, MT, 59801
7. Winbush A, Singh ND. Variation in fine-scale recombination rate in temperature-evolved Drosophila melanogaster populations in response to selection. G3 Genes|Genomes|Genetics 2022; 12:6663992. [PMID: 35961026 PMCID: PMC9526048 DOI: 10.1093/g3journal/jkac208] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2022] [Accepted: 08/05/2022] [Indexed: 11/16/2022]
Abstract
Meiotic recombination plays a critical evolutionary role in maintaining fitness in response to selective pressures due to changing environments. Variation in recombination rate has been observed amongst and between species and populations and within genomes across numerous taxa. Studies have demonstrated a link between changes in recombination rate and selection, but the extent to which fine-scale recombination rate varies between evolved populations during the evolutionary period in response to selection is under active research. Here, we utilize a set of 3 temperature-evolved Drosophila melanogaster populations that were shown to have diverged in several phenotypes, including recombination rate, based on the temperature regime in which they evolved. Using whole-genome sequencing data from these populations, we generated linkage disequilibrium-based fine-scale recombination maps for each population. With these maps, we compare recombination rates and patterns among the 3 populations and show that they have diverged at fine scales but are conserved at broader scales. We further demonstrate a correlation between recombination rates and genomic variation in the 3 populations. Lastly, we show variation in localized regions of enhanced recombination rates, termed warm spots, between the populations with these warm spots and associated genes overlapping areas previously shown to have diverged in the 3 populations due to selection. These data support the existence of recombination modifiers in these populations which are subject to selection during evolutionary change.
Affiliation(s)
- Ari Winbush
  - Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Nadia D Singh
  - Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
8. Booker TR, Payseur BA, Tigano A. Background selection under evolving recombination rates. Proc Biol Sci 2022; 289:20220782. [PMID: 35730151 PMCID: PMC9233929 DOI: 10.1098/rspb.2022.0782] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/29/2023]
Abstract
Background selection (BGS), the effect that purifying selection exerts on sites linked to deleterious alleles, is expected to be ubiquitous across eukaryotic genomes. The effects of BGS reflect the interplay of the rates and fitness effects of deleterious mutations with recombination. A fundamental assumption of BGS models is that recombination rates are invariant over time. However, in some lineages, recombination rates evolve rapidly, violating this central assumption. Here, we investigate how recombination rate evolution affects genetic variation under BGS. We show that recombination rate evolution modifies the effects of BGS in a manner similar to a localized change in the effective population size, potentially leading to underestimation or overestimation of the genome-wide effects of selection. Furthermore, we find evidence that recombination rate evolution in the ancestors of modern house mice may have impacted inferences of the genome-wide effects of selection in that species.
Affiliation(s)
- Tom R. Booker
  - Department of Zoology, University of British Columbia, Vancouver Campus, Vancouver, BC, Canada
- Bret A. Payseur
  - Laboratory of Genetics, University of Wisconsin - Madison, Madison, WI, USA
- Anna Tigano
  - Department of Biology, University of British Columbia, Okanagan Campus, Kelowna, BC, Canada
9. Dilber E, Terhorst J. Robust detection of natural selection using a probabilistic model of tree imbalance. Genetics 2022; 220:6511494. [PMID: 35100408 PMCID: PMC8893258 DOI: 10.1093/genetics/iyac009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Accepted: 12/16/2021] [Indexed: 01/21/2023]
Abstract
Neutrality tests such as Tajima's D and Fay and Wu's H are standard implements in the population genetics toolbox. One of their most common uses is to scan the genome for signals of natural selection. However, it is well understood that D and H are confounded by other evolutionary forces (in particular, population expansion) that may be unrelated to selection. Because they are not model-based, it is not clear how to deconfound these tests in a principled way. In this article, we derive new likelihood-based methods for detecting natural selection, which are robust to fluctuations in effective population size. At the core of our method is a novel probabilistic model of tree imbalance, which generalizes Kingman's coalescent to allow certain aberrant tree topologies to arise more frequently than is expected under neutrality. We derive a frequency spectrum-based estimator that can be used in place of D, and also extend to the case where genealogies are first estimated. We benchmark our methods on real and simulated data, and provide an open source software implementation.
Affiliation(s)
- Enes Dilber
  - Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Jonathan Terhorst
  - Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
  - Corresponding author: Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
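For reference, Tajima's D, one of the classical statistics that the abstract above contrasts with the authors' model-based approach, can be computed from an unfolded site frequency spectrum. The following is a minimal Python sketch of the standard formula, not the estimator derived in the paper; the function name `tajimas_d` and the input convention are assumptions made for illustration.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[k] is the number of segregating sites at which the derived allele is
    carried by k + 1 of the n = len(sfs) + 1 sampled chromosomes
    (hypothetical input convention chosen for this sketch).
    """
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if n < 4 or s == 0:
        # The variance term is zero for n < 4, and D is undefined without
        # any segregating sites.
        raise ValueError("need at least 4 samples and 1 segregating site")

    # Mean number of pairwise differences (pi).
    pi = 2.0 * sum(
        count * (k + 1) * (n - k - 1) for k, count in enumerate(sfs)
    ) / (n * (n - 1))

    # Constants of the standard variance approximation.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between pi and Watterson's estimator, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Spectrum proportional to 1/i (the neutral expectation): D is close to 0.
print(tajimas_d([12, 6, 4, 3]))
# Excess of singletons, as expected after population expansion: D is negative.
print(tajimas_d([20, 2, 1, 1]))
```

The second call illustrates the confounding the abstract mentions: an excess of rare variants yields a negative D whether it comes from selection or from population expansion.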
10. Gompert Z, Feder JL, Nosil P. Natural selection drives genome-wide evolution via chance genetic associations. Mol Ecol 2021; 31:467-481. [PMID: 34704650 DOI: 10.1111/mec.16247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/07/2021] [Revised: 10/13/2021] [Accepted: 10/15/2021] [Indexed: 11/29/2022]
Abstract
Understanding selection's impact on the genome is a major theme in biology. Functionally neutral genetic regions can be affected indirectly by natural selection, via their statistical association with genes under direct selection. The genomic extent of such indirect selection, particularly across loci not physically linked to those under direct selection, remains poorly understood, as does the time scale at which indirect selection occurs. Here, we use field experiments and genomic data in stick insects, deer mice and stickleback fish to show that widespread statistical associations with genes known to affect fitness cause many genetic loci across the genome to be impacted indirectly by selection. This includes regions physically distant from those directly under selection. Then, focusing on the stick insect system, we show that statistical associations between SNPs and other unknown, causal variants result in additional indirect selection in general and specifically within genomic regions of physically linked loci. This widespread indirect selection necessarily makes aspects of evolution more predictable. Thus, natural selection combines with chance genetic associations to affect genome-wide evolution across linked and unlinked loci and even in modest-sized populations. This process has implications for the application of evolutionary principles in basic and applied science.
Affiliation(s)
- Zachariah Gompert
  - Department of Biology, Utah State University, Logan, Utah, USA
  - Ecology Center, Utah State University, Logan, Utah, USA
- Jeffrey L Feder
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, Indiana, USA
- Patrik Nosil
  - CEFE, Univ Montpellier, CNRS, EPHE, IRD, Univ Paul Valéry Montpellier 3, Montpellier, France
11. Buffalo V. Quantifying the relationship between genetic diversity and population size suggests natural selection cannot explain Lewontin's Paradox. eLife 2021; 10:e67509. [PMID: 34409937 PMCID: PMC8486380 DOI: 10.7554/elife.67509] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 02/12/2021] [Accepted: 08/16/2021] [Indexed: 12/21/2022]
Abstract
Neutral theory predicts that genetic diversity increases with population size, yet observed levels of diversity across metazoans vary only two orders of magnitude while population sizes vary over several. This unexpectedly narrow range of diversity is known as Lewontin's Paradox of Variation (1974). While some have suggested selection constrains diversity, tests of this hypothesis seem to fall short. Here, I revisit Lewontin's Paradox to assess whether current models of linked selection are capable of reducing diversity to this extent. To quantify the discrepancy between pairwise diversity and census population sizes across species, I combine previously-published estimates of pairwise diversity from 172 metazoan taxa with newly derived estimates of census sizes. Using phylogenetic comparative methods, I show this relationship is significant accounting for phylogeny, but with high phylogenetic signal and evidence that some lineages experience shifts in the evolutionary rate of diversity deep in the past. Additionally, I find a negative relationship between recombination map length and census size, suggesting abundant species have less recombination and experience greater reductions in diversity due to linked selection. However, I show that even assuming strong and abundant selection, models of linked selection are unlikely to explain the observed relationship between diversity and census sizes across species.
Affiliation(s)
- Vince Buffalo
  - Institute for Ecology and Evolution, University of Oregon, Eugene, United States
12. Mathur S, DeWoody JA. Genetic load has potential in large populations but is realized in small inbred populations. Evol Appl 2021; 14:1540-1557. [PMID: 34178103 PMCID: PMC8210801 DOI: 10.1111/eva.13216] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 09/01/2020] [Revised: 02/25/2021] [Accepted: 03/02/2021] [Indexed: 12/20/2022]
Abstract
Populations with higher genetic diversity and larger effective sizes have greater evolutionary capacity (i.e., adaptive potential) to respond to ecological stressors. We are interested in how the variation captured in protein-coding genes fluctuates relative to overall genomic diversity and whether smaller populations suffer greater costs due to their genetic load of deleterious mutations compared with larger populations. We analyzed individual whole-genome sequences (N = 74) from three different populations of Montezuma quail (Cyrtonyx montezumae), a small ground-dwelling bird that is sustainably harvested in some portions of its range but is of conservation concern elsewhere. Our historical demographic results indicate that Montezuma quail populations in the United States exhibit low levels of genomic diversity due in large part to long-term declines in effective population sizes over nearly a million years. The smaller and more isolated Texas population is significantly more inbred than the large Arizona and the intermediate-sized New Mexico populations we surveyed. The Texas gene pool has a significantly smaller proportion of strongly deleterious variants segregating in the population compared with the larger Arizona gene pool. Our results demonstrate that even in small populations, highly deleterious mutations are effectively purged and/or lost due to drift. However, we find that in small populations the realized genetic load is elevated because of inbreeding coupled with a higher frequency of slightly deleterious mutations that are manifested in homozygotes. Overall, our study illustrates how population genomics can be used to proactively assess both neutral and functional aspects of contemporary genetic diversity in a conservation framework while simultaneously considering deeper demographic histories.
Affiliation(s)
- Samarth Mathur
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
  - Present address: Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, Ohio, USA
- J. Andrew DeWoody
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
  - Department of Forestry and Natural Resources, Purdue University, West Lafayette, Indiana, USA
13. Hill T, Unckless RL. Adaptation, ancestral variation and gene flow in a 'Sky Island' Drosophila species. Mol Ecol 2021; 30:83-99. [PMID: 33089581 PMCID: PMC7945764 DOI: 10.1111/mec.15701] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/16/2020] [Revised: 09/28/2020] [Accepted: 10/08/2020] [Indexed: 02/06/2023]
Abstract
Over time, populations of species can expand, contract, fragment and become isolated, creating subpopulations that must adapt to local conditions. Understanding how species maintain variation after divergence as well as adapt to these changes in the face of gene flow is of great interest, especially as the current climate crisis has caused range shifts and frequent migrations for many species. Here, we characterize how a mycophageous fly species, Drosophila innubila, came to inhabit and adapt to its current range which includes mountain forests in south-western USA separated by large expanses of desert. Using population genomic data from more than 300 wild-caught individuals, we examine four populations to determine their population history in these mountain forests, looking for signatures of local adaptation. In this first extensive study, establishing D. innubila as a key genomic "Sky Island" model, we find D. innubila spread northwards during the previous glaciation period (30-100 KYA) and have recently expanded even further (0.2-2 KYA). D. innubila shows little evidence of population structure, consistent with a recent establishment and genetic variation maintained since before geographic stratification. We also find some signatures of recent selective sweeps in chorion proteins and population differentiation in antifungal immune genes suggesting differences in the environments to which flies are adapting. However, we find little support for long-term recurrent selection in these genes. In contrast, we find evidence of long-term recurrent positive selection in immune pathways such as the Toll signalling system and the Toll-regulated antimicrobial peptides.
Affiliation(s)
- Tom Hill
  - 4055 Haworth Hall, The Department of Molecular Biosciences, University of Kansas, 1200 Sunnyside Avenue, Lawrence, KS 66045
- Robert L. Unckless
  - 4055 Haworth Hall, The Department of Molecular Biosciences, University of Kansas, 1200 Sunnyside Avenue, Lawrence, KS 66045
14. Kartje ME, Jing P, Payseur BA. Weak Correlation between Nucleotide Variation and Recombination Rate across the House Mouse Genome. Genome Biol Evol 2020; 12:293-299. [PMID: 32108880 PMCID: PMC7186785 DOI: 10.1093/gbe/evaa045] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 02/25/2020] [Indexed: 01/01/2023]
Abstract
Positive selection and purifying selection reduce levels of variation at linked neutral loci. One consequence of these processes is that the amount of neutral diversity and the meiotic recombination rate are predicted to be positively correlated across the genome, a prediction met in some species but not others. To better document the prevalence of selection at linked sites, we used new and published whole-genome sequences to survey nucleotide variation in population samples of the western European house mouse (Mus musculus domesticus) from Germany, France, and Gough Island, a remote volcanic island in the south Atlantic. Correlations between sequence variation and recombination rates estimated independently from dense linkage maps were consistently very weak (ρ ≤ 0.06), though they exceeded conventional significance thresholds. This pattern persisted in comparisons between genomic regions with the highest and lowest recombination rates, as well as in models incorporating the density of transcribed sites, the density of CpG dinucleotides, and divergence between mouse and rat as covariates. We conclude that natural selection affects linked neutral variation in a restricted manner in the western European house mouse.
Affiliation(s)
- Michael E Kartje
  - Laboratory of Genetics, University of Wisconsin – Madison, Madison
- Peicheng Jing
  - Laboratory of Genetics, University of Wisconsin – Madison, Madison
- Bret A Payseur
  - Laboratory of Genetics, University of Wisconsin – Madison, Madison
|
15
|
Woerner AE, Veeramah KR, Watkins JC, Hammer MF. The Role of Phylogenetically Conserved Elements in Shaping Patterns of Human Genomic Diversity. Mol Biol Evol 2020; 35:2284-2295. [PMID: 30113695 DOI: 10.1093/molbev/msy145] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/19/2023] Open
Abstract
Evolutionary genetic studies have shown a positive correlation between levels of nucleotide diversity and either rates of recombination or genetic distance to genes. Both positive-directional and purifying selection have been offered as the source of these correlations via genetic hitchhiking and background selection, respectively. Phylogenetically conserved elements (CEs) are short (∼100 bp), widely distributed (comprising ∼5% of genome), sequences that are often found far from genes. While the function of many CEs is unknown, CEs also are associated with reduced diversity at linked sites. Using high coverage (>80×) whole genome data from two human populations, the Yoruba and the CEU, we perform fine scale evaluations of diversity, rates of recombination, and linkage to genes. We find that the local rate of recombination has a stronger effect on levels of diversity than linkage to genes, and that these effects of recombination persist even in regions far from genes. Our whole genome modeling demonstrates that, rather than recombination or GC-biased gene conversion, selection on sites within or linked to CEs better explains the observed genomic diversity patterns. A major implication is that very few sites in the human genome are predicted to be free of the effects of selection. These sites, which we refer to as the human "neutralome," comprise only 1.2% of the autosomes and 5.1% of the X chromosome. Demographic analysis of the neutralome reveals larger population sizes and lower rates of growth for ancestral human populations than inferred by previous analyses.
Affiliation(s)
- August E Woerner
  - ARL Division of Biotechnology, University of Arizona, Tucson, AZ; Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX
- Krishna R Veeramah
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY
- Michael F Hammer
  - ARL Division of Biotechnology, University of Arizona, Tucson, AZ
|
16
|
The Temporal Dynamics of Background Selection in Nonequilibrium Populations. Genetics 2020; 214:1019-1030. [PMID: 32071195 DOI: 10.1534/genetics.119.302892] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/09/2019] [Accepted: 01/30/2020] [Indexed: 01/06/2023] Open
Abstract
Neutral genetic diversity across the genome is determined by the complex interplay of mutation, demographic history, and natural selection. While the direct action of natural selection is limited to functional loci across the genome, its impact can have effects on nearby neutral loci due to genetic linkage. These effects of selection at linked sites, referred to as genetic hitchhiking and background selection (BGS), are pervasive across natural populations. However, only recently has there been a focus on the joint consequences of demography and selection at linked sites, and some empirical studies have come to apparently contradictory conclusions as to their combined effects. To understand the relationship between demography and selection at linked sites, we conducted an extensive forward simulation study of BGS under a range of demographic models. We found that the relative levels of diversity in BGS and neutral regions vary over time and that the initial dynamics after a population size change are often in the opposite direction of the long-term expected trajectory. Our detailed observations of the temporal dynamics of neutral diversity in the context of selection at linked sites in nonequilibrium populations provide new intuition about why patterns of diversity under BGS vary through time in natural populations and help reconcile previously contradictory observations. Most notably, our results highlight that classical models of BGS are poorly suited for predicting diversity in nonequilibrium populations.
|
17
|
Matthey‐Doret R, Whitlock MC. Background selection and FST: Consequences for detecting local adaptation. Mol Ecol 2019; 28:3902-3914. [DOI: 10.1111/mec.15197] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 05/20/2018] [Revised: 06/19/2019] [Accepted: 07/03/2019] [Indexed: 01/03/2023]
Affiliation(s)
- Remi Matthey‐Doret
  - Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Michael C. Whitlock
  - Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
|
18
|
Igoshin AV, Gunbin KV, Yudin NS, Voevoda MI. Searching for Signatures of Cold Climate Adaptation in TRPM8 Gene in Populations of East Asian Ancestry. Front Genet 2019; 10:759. [PMID: 31507633 PMCID: PMC6716346 DOI: 10.3389/fgene.2019.00759] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/30/2018] [Accepted: 07/17/2019] [Indexed: 12/14/2022] Open
Abstract
Dispersal of Homo sapiens across the globe during the last 200,000 years was accompanied by adaptation to local climatic conditions, with severe winter temperatures probably being one of the most significant selective forces. The TRPM8 gene codes for a cold-sensing ion channel, and adaptation to low temperatures is the major determinant of its molecular evolution. Here, our aim was to search for signatures of cold climate adaptation in the TRPM8 gene using a combined data set of 19 populations of East Asian ancestry from the 1000 Genomes Project and Human Genome Diversity Project. As a result, out of a total of 60 markers under study, none showed significant association with the average winter temperatures at the locations of the studied populations considering the multiple testing thresholds. This might suggest that the principal mode of TRPM8 evolution may be different from widespread models, where adaptive alleles are additive, dominant or recessive, at least in populations with a predominant East Asian component. For example, evolution by means of selectively preferable epistatic interactions among amino acids may have taken place. Despite the lack of strong signals of association, however, a very promising single nucleotide polymorphism (SNP) was found. The SNP rs7577262 is considered the best candidate based on its allelic correlations with winter temperatures, signatures of a selective sweep and physiological evidence. The second top SNP, rs17862920, may participate in adaptation as well. Additionally, to assist in interpreting the nominal associations that the other markers reached, we performed SNP prioritization based on functional evidence found in the literature and on evolutionary conservation.
Affiliation(s)
- Alexander V. Igoshin
  - Sector of the Genetics of Industrial Microorganisms, The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch, The Russian Academy of Sciences, Novosibirsk, Russia
- Konstantin V. Gunbin
  - Center of Brain Neurobiology and Neurogenetics, The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch, The Russian Academy of Sciences, Novosibirsk, Russia
  - V. Zelman Institute for Medicine and Psychology Novosibirsk State University, Novosibirsk, Russia
  - Center for Mitochondrial Functional Genomics, Institute of Living Systems, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Nikolay S. Yudin
  - V. Zelman Institute for Medicine and Psychology Novosibirsk State University, Novosibirsk, Russia
  - Laboratory of Livestock Molecular Genetics and Breeding, The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch, The Russian Academy of Sciences, Novosibirsk, Russia
- Mikhail I. Voevoda
  - Laboratory of Human Molecular Genetics, The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch, The Russian Academy of Sciences, Novosibirsk, Russia
|
19
|
Bourgeois Y, Ruggiero RP, Manthey JD, Boissinot S. Recent Secondary Contacts, Linked Selection, and Variable Recombination Rates Shape Genomic Diversity in the Model Species Anolis carolinensis. Genome Biol Evol 2019; 11:2009-2022. [PMID: 31134281 PMCID: PMC6681179 DOI: 10.1093/gbe/evz110] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Accepted: 05/23/2019] [Indexed: 12/14/2022] Open
Abstract
A better understanding of how selection and neutral processes affect genomic diversity is essential for gaining insight into the mechanisms driving adaptation and speciation. However, the evolutionary processes affecting variation at a genomic scale have not been investigated in most vertebrate lineages. Here, we present the first population genomics survey using whole genome resequencing in the green anole (Anolis carolinensis). Anoles have been intensively studied to understand mechanisms underlying adaptation and speciation. The green anole in particular is an important model to study genome evolution. We quantified how demography, recombination, and selection have led to the current genetic diversity of the green anole by using whole-genome resequencing of five genetic clusters covering the entire species range. The differentiation of green anole populations is consistent with a northward expansion from South Florida followed by genetic isolation and subsequent gene flow among adjacent genetic clusters. Dispersal out-of-Florida was accompanied by a drastic population bottleneck followed by a rapid population expansion. This event was accompanied by male-biased dispersal and/or selective sweeps on the X chromosome. We show that the interaction between linked selection and recombination is the main contributor to the genomic landscape of differentiation in the anole genome.
Affiliation(s)
- Joseph D Manthey
  - New York University Abu Dhabi, United Arab Emirates
  - Department of Biological Sciences, Texas Tech University
|
20
|
Liu L, Sanderford MD, Patel R, Chandrashekar P, Gibson G, Kumar S. Biological relevance of computationally predicted pathogenicity of noncoding variants. Nat Commun 2019; 10:330. [PMID: 30659175 PMCID: PMC6338804 DOI: 10.1038/s41467-018-08270-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 04/19/2018] [Accepted: 12/19/2018] [Indexed: 11/15/2022] Open
Abstract
Computational prediction of the phenotypic propensities of noncoding single nucleotide variants typically combines annotation of genomic, functional and evolutionary attributes into a single score. Here, we evaluate if the claimed excellent accuracies of these predictions translate into high rates of success in addressing questions important in biological research, such as fine mapping causal variants, distinguishing pathogenic allele(s) at a given position, and prioritizing variants for genetic risk assessment. A significant disconnect is found to exist between the statistical modelling and biological performance of predictive approaches. We discuss fundamental reasons underlying these deficiencies and suggest that future improvements of computational predictions need to address confounding of allelic, positional and regional effects as well as imbalance of the proportion of true positive variants in candidate lists. Researchers can make use of a variety of computational tools to prioritize genetic variants and predict their pathogenicity. Here, the authors evaluate the performance of six of these tools in three typical biological tasks and find generally low concordance of predictions and experimental confirmation.
Affiliation(s)
- Li Liu
  - College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Maxwell D Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Ravi Patel
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA
- Pramod Chandrashekar
  - College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Greg Gibson
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA
|
21
|
Petersdorf EW, O'hUigin C. The MHC in the era of next-generation sequencing: Implications for bridging structure with function. Hum Immunol 2019; 80:67-78. [PMID: 30321633 PMCID: PMC6542361 DOI: 10.1016/j.humimm.2018.10.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 08/29/2018] [Revised: 09/24/2018] [Accepted: 10/01/2018] [Indexed: 12/19/2022]
Abstract
The MHC continues to have the most disease-associations compared to other regions of the human genome, even in the genome-wide association study (GWAS) and single nucleotide polymorphism (SNP) era. Analysis of non-coding variation and their impact on the level of expression of HLA allotypes has shed new light on the potential mechanisms underlying HLA disease associations and alloreactivity in transplantation. Next-generation sequencing (NGS) technology has the capability of delineating the phase of variants in the HLA antigen-recognition site (ARS) with non-coding regulatory polymorphisms. These relationships are critical for understanding the qualitative and quantitative implications of HLA gene diversity. This article summarizes current understanding of non-coding region variation of HLA loci, the consequences of regulatory variation on HLA expression, the role for evolution in shaping lineage-specific expression, and the impact of HLA expression on disease susceptibility and transplantation outcomes. A role for phased sequencing methods for the MHC, and perspectives for future directions in basic and applied immunogenetic studies of the MHC are presented.
Affiliation(s)
- Effie W Petersdorf
  - University of Washington, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, D4-115, Seattle, WA 98109, United States
- Colm O'hUigin
  - Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Microbiome and Genetics Core, Building 37, Room 4140B, Bethesda, MD 20852, United States
|
22
|
Mooney JA, Huber CD, Service S, Sul JH, Marsden CD, Zhang Z, Sabatti C, Ruiz-Linares A, Bedoya G, Freimer N, Lohmueller KE. Understanding the Hidden Complexity of Latin American Population Isolates. Am J Hum Genet 2018; 103:707-726. [PMID: 30401458 PMCID: PMC6218714 DOI: 10.1016/j.ajhg.2018.09.013] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/26/2018] [Accepted: 09/26/2018] [Indexed: 12/12/2022] Open
Abstract
Most population isolates examined to date were founded from a single ancestral population. Consequently, there is limited knowledge about the demographic history of admixed population isolates. Here we investigate genomic diversity of recently admixed population isolates from Costa Rica and Colombia and compare their diversity to a benchmark population isolate, the Finnish. These Latin American isolates originated during the 16th century from admixture between a few hundred European males and Amerindian females, with a limited contribution from African founders. We examine whole-genome sequence data from 449 individuals, ascertained as families to build multigenerational pedigrees, with a mean sequencing depth of coverage of approximately 36×. We find that Latin American isolates have increased genetic diversity relative to the Finnish. However, there is an increase in the amount of identity by descent (IBD) segments in the Latin American isolates relative to the Finnish. The increase in IBD segments is likely a consequence of a very recent and severe population bottleneck during the founding of the admixed population isolates. Furthermore, the proportion of the genome that falls within a long run of homozygosity (ROH) in Costa Rican and Colombian individuals is significantly greater than that in the Finnish, suggesting more recent consanguinity in the Latin American isolates relative to that seen in the Finnish. Lastly, we find that recent consanguinity increased the number of deleterious variants found in the homozygous state, which is relevant if deleterious variants are recessive. Our study suggests that there is no single genetic signature of a population isolate.
Affiliation(s)
- Jazlyn A Mooney
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Christian D Huber
  - Department of Ecology & Evolutionary Biology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Susan Service
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Jae Hoon Sul
  - Department of Psychiatry and Biobehavioral Sciences, Semel Center for Informatics and Personalized Genomics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Clare D Marsden
  - Department of Ecology & Evolutionary Biology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Zhongyang Zhang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Chiara Sabatti
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA; Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Andrés Ruiz-Linares
  - Ministry of Education Key Laboratory of Contemporary Anthropology and Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai 200438, China; Aix-Marseille Univ, CNRS, EFS, ADES, Marseille, France
- Gabriel Bedoya
  - Genética Molecular (GENMOL), Universidad de Antioquia, Medellín, Colombia
- Nelson Freimer
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Kirk E Lohmueller
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Ecology & Evolutionary Biology, University of California Los Angeles, Los Angeles, CA 90095, USA
|
23
|
Song K, Li L, Zhang G. Relationship Among Intron Length, Gene Expression, and Nucleotide Diversity in the Pacific Oyster Crassostrea gigas. Mar Biotechnol (NY) 2018; 20:676-684. [PMID: 29967965 DOI: 10.1007/s10126-018-9838-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/19/2018] [Accepted: 06/05/2018] [Indexed: 06/08/2023]
Abstract
Crassostrea gigas is a model mollusk, but its genetic features have not been studied comprehensively. In this study, we used whole-genome resequencing data to identify and characterize nucleotide diversity and population recombination rate in a diverse collection of 21 C. gigas samples. Our analyses revealed that C. gigas harbors both extremely high genetic diversity and recombination rates across the whole genome compared with those of other taxa. The noncoding regions, introns, intergenic spacers, and untranslated regions (UTRs) showed a lower level of diversity than the synonymous sites. The larger introns tended to have lower diversity. Moreover, we found a negative association of the non-synonymous diversity with gene expression, which suggested that purifying selection played an important role in shaping genetic diversity. The nucleotide diversity at the 100- and 50-kb levels was positively correlated with population recombination rates, which was expected if the diversity was shaped by purifying selection or hitchhiking of advantageous mutants. Our work gives a general picture of the oyster's polymorphism pattern and its association with recombination rates.
Affiliation(s)
- Kai Song
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, Shandong, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, Shandong, China
  - National & Local Joint Engineering Laboratory of Ecological Mariculture, Qingdao, 266071, Shandong, China
- Li Li
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, Shandong, China
  - National & Local Joint Engineering Laboratory of Ecological Mariculture, Qingdao, 266071, Shandong, China
  - Laboratory for Marine Fisheries and Aquaculture, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, Shandong, China
  - Institute of Oceanology, Chinese Academy of Sciences, 7th Nanhai Rd., Qingdao, China
- Guofan Zhang
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, Shandong, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, Shandong, China
  - National & Local Joint Engineering Laboratory of Ecological Mariculture, Qingdao, 266071, Shandong, China
  - Institute of Oceanology, Chinese Academy of Sciences, 7th Nanhai Rd., Qingdao, China
24
|
Lo E, Bonizzoni M, Hemming-Schroeder E, Ford A, Janies DA, James AA, Afrane Y, Etemesi H, Zhou G, Githeko A, Yan G. Selection and Utility of Single Nucleotide Polymorphism Markers to Reveal Fine-Scale Population Structure in Human Malaria Parasite Plasmodium falciparum. Front Ecol Evol 2018. [DOI: 10.3389/fevo.2018.00145] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/30/2023] Open
|
25
|
Carlson J, Locke AE, Flickinger M, Zawistowski M, Levy S, Myers RM, Boehnke M, Kang HM, Scott LJ, Li JZ, Zöllner S. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat Commun 2018; 9:3753. [PMID: 30218074 PMCID: PMC6138700 DOI: 10.1038/s41467-018-05936-5] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 10/04/2017] [Accepted: 07/30/2018] [Indexed: 12/30/2022] Open
Abstract
A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here, we use ~36 million singleton variants from 3560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms. Remarkably, GC content, DNase hypersensitivity, CpG islands, and H3K36 trimethylation are associated with both increased and decreased mutation rates depending on nucleotide context. We validate these estimated effects in an independent dataset of ~46,000 de novo mutations, and confirm our estimates are more accurate than previously published results based on ancestrally older variants without considering genomic features. Our results thus provide the most refined portrait to date of the factors contributing to genome-wide variability of the human germline mutation rate.
Affiliation(s)
- Jedidiah Carlson
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Adam E Locke
  - McDonnell Genome Institute & Department of Medicine, Washington University, St. Louis, MO, 63108, USA
- Matthew Flickinger
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Matthew Zawistowski
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Shawn Levy
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
- Richard M Myers
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
- Michael Boehnke
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Hyun Min Kang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Laura J Scott
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Jun Z Li
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Sebastian Zöllner
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, 48109, USA
26
|
Pouyet F, Aeschbacher S, Thiéry A, Excoffier L. Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences. eLife 2018; 7:e36317. [PMID: 30125248 PMCID: PMC6177262 DOI: 10.7554/elife.36317] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 03/01/2018] [Accepted: 08/17/2018] [Indexed: 12/15/2022] Open
Abstract
Disentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.
Affiliation(s)
- Fanny Pouyet
  - Computational and Molecular Population Genetics, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Simon Aeschbacher
  - Computational and Molecular Population Genetics, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Alexandre Thiéry
  - Computational and Molecular Population Genetics, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Laurent Excoffier
  - Computational and Molecular Population Genetics, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
27
|
Torres R, Szpiech ZA, Hernandez RD. Human demographic history has amplified the effects of background selection across the genome. PLoS Genet 2018; 14:e1007387. [PMID: 29912945 PMCID: PMC6056204 DOI: 10.1371/journal.pgen.1007387] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 08/29/2017] [Revised: 07/23/2018] [Accepted: 04/30/2018] [Indexed: 01/22/2023] Open
Abstract
Natural populations often grow, shrink, and migrate over time. Such demographic processes can affect genome-wide levels of genetic diversity. Additionally, genetic variation in functional regions of the genome can be altered by natural selection, which drives adaptive mutations to higher frequencies or purges deleterious ones. Such selective processes affect not only the sites directly under selection but also nearby neutral variation through genetic linkage via processes referred to as genetic hitchhiking in the context of positive selection and background selection (BGS) in the context of purifying selection. While there is extensive literature examining the consequences of selection at linked sites at demographic equilibrium, less is known about how non-equilibrium demographic processes influence the effects of hitchhiking and BGS. Utilizing a global sample of human whole-genome sequences from the Thousand Genomes Project and extensive simulations, we investigate how non-equilibrium demographic processes magnify and dampen the consequences of selection at linked sites across the human genome. When binning the genome by inferred strength of BGS, we observe that, compared to Africans, non-African populations have experienced larger proportional decreases in neutral genetic diversity in strong BGS regions. We replicate these findings in admixed populations by showing that non-African ancestral components of the genome have also been affected more severely in these regions. We attribute these differences to the strong, sustained/recurrent population bottlenecks that non-Africans experienced as they migrated out of Africa and throughout the globe. Furthermore, we observe a strong correlation between FST and the inferred strength of BGS, suggesting a stronger rate of genetic drift. Forward simulations of human demographic history with a model of BGS support these observations. 
Our results show that non-equilibrium demography significantly alters the consequences of selection at linked sites and support the need for more work investigating the dynamic process of multiple evolutionary forces operating in concert.
Affiliation(s)
- Raul Torres
  - Biomedical Sciences Graduate Program, University of California San Francisco, San Francisco, CA, United States of America
- Zachary A. Szpiech
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, United States of America
- Ryan D. Hernandez
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, United States of America
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, United States of America
  - Institute for Computational Health Sciences, University of California San Francisco, San Francisco, CA, United States of America
  - Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, United States of America

28
Smith TCA, Arndt PF, Eyre-Walker A. Large scale variation in the rate of germ-line de novo mutation, base composition, divergence and diversity in humans. PLoS Genet 2018; 14:e1007254. [PMID: 29590096 PMCID: PMC5891062 DOI: 10.1371/journal.pgen.1007254] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 03/07/2017] [Revised: 04/09/2018] [Accepted: 02/13/2018] [Indexed: 01/17/2023] Open
Abstract
It has long been suspected that the rate of mutation varies across the human genome at a large scale based on the divergence between humans and other species. However, it is now possible to directly investigate this question using the large number of de novo mutations (DNMs) that have been discovered in humans through the sequencing of trios. We investigate a number of questions pertaining to the distribution of mutations using more than 130,000 DNMs from three large datasets. We demonstrate that the amount and pattern of variation differs between datasets at the 1MB and 100KB scales, probably as a consequence of differences in sequencing technology and processing. In particular, datasets show different patterns of correlation to genomic variables such as replication time. Nevertheless, there are many commonalities between datasets, which likely represent true patterns. We show that there is variation in the mutation rate at the 100KB, 1MB and 10MB scale that cannot be explained by variation at smaller scales; however, the level of this variation is modest at large scales: at the 1MB scale we infer that ~90% of regions have a mutation rate within 50% of the mean. Different types of mutation show similar levels of variation and appear to vary in concert, which suggests the pattern of mutation is relatively constant across the genome. We demonstrate that variation in the mutation rate does not generate large-scale variation in GC-content, and hence that mutation bias does not maintain the isochore structure of the human genome. We find that genomic features explain less than 40% of the explainable variance in the rate of DNM. As expected, the rate of divergence between species is correlated to the rate of DNM. However, the correlations are weaker than expected if all the variation in divergence was due to variation in the mutation rate. We provide evidence that this is due to the effect of biased gene conversion on the probability that a mutation will become fixed. In contrast to divergence, we find that most of the variation in diversity can be explained by variation in the mutation rate. Finally, we show that the correlation between divergence and DNM density declines as increasingly divergent species are considered.
Affiliation(s)
- Peter F. Arndt
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
- Adam Eyre-Walker
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom

29
Kamdem C, Fouet C, White BJ. Chromosome arm-specific patterns of polymorphism associated with chromosomal inversions in the major African malaria vector, Anopheles funestus. Mol Ecol 2017; 26:5552-5566. [PMID: 28833796 PMCID: PMC5927613 DOI: 10.1111/mec.14335] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/20/2016] [Revised: 08/08/2017] [Accepted: 08/14/2017] [Indexed: 02/02/2023]
Abstract
Chromosomal inversions facilitate local adaptation of beneficial mutations and modulate genetic polymorphism, but the extent of their effects within the genome is still insufficiently understood. The genome of Anopheles funestus, a malaria mosquito endemic to sub-Saharan Africa, contains an impressive number of paracentric polymorphic inversions, which are unevenly distributed among chromosomes and provide an excellent framework for investigating the genomic impacts of chromosomal rearrangements. Here, we present results of a fine-scale analysis of genetic variation within the genome of two weakly differentiated populations of Anopheles funestus inhabiting contrasting moisture conditions in Cameroon. Using population genomic analyses, we found that genetic divergence between the two populations is centred on regions of the genome corresponding to three inversions, which are characterized by high values of FST, absolute sequence divergence and fixed differences. Importantly, in contrast to the 2L chromosome arm, which is collinear, nucleotide diversity is significantly reduced along the entire length of three autosome arms bearing multiple overlapping chromosomal rearrangements. These findings support the idea that interactions between reduced recombination and natural selection within inversions help to sculpt nucleotide polymorphism across chromosomes in An. funestus.
Affiliation(s)
- Colince Kamdem
  - Department of Entomology, University of California, Riverside, CA 92521
- Caroline Fouet
  - Department of Entomology, University of California, Riverside, CA 92521
- Bradley J. White
  - Department of Entomology, University of California, Riverside, CA 92521

30
Booker TR, Ness RW, Keightley PD. The Recombination Landscape in Wild House Mice Inferred Using Population Genomic Data. Genetics 2017; 207:297-309. [PMID: 28751421 PMCID: PMC5586380 DOI: 10.1534/genetics.117.300063] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/27/2017] [Accepted: 07/19/2017] [Indexed: 11/29/2022] Open
Abstract
Characterizing variation in the rate of recombination across the genome is important for understanding several evolutionary processes. Previous analysis of the recombination landscape in laboratory mice has revealed that the different subspecies have different suites of recombination hotspots. It is unknown, however, whether hotspots identified in laboratory strains reflect the hotspot diversity of natural populations or whether broad-scale variation in the rate of recombination is conserved between subspecies. In this study, we constructed fine-scale recombination rate maps for a natural population of the Eastern house mouse, Mus musculus castaneus. We performed simulations to assess the accuracy of recombination rate inference in the presence of phase errors, and we used a novel approach to quantify phase error. The spatial distribution of recombination events is strongly positively correlated between our castaneus map and a map constructed using inbred lines derived predominantly from M. m. domesticus. Recombination hotspots in wild castaneus show little overlap, however, with the locations of double-strand breaks in wild-derived house mouse strains. Finally, we also find that genetic diversity in M. m. castaneus is positively correlated with the rate of recombination, consistent with pervasive natural selection operating in the genome. Our study suggests that recombination rate variation is conserved at broad scales between house mouse subspecies, but it is not strongly conserved at fine scales.
Affiliation(s)
- Tom R Booker
  - Institute of Evolutionary Biology, University of Edinburgh, EH9 3FL, United Kingdom
- Rob W Ness
  - Department of Biology, University of Toronto Mississauga, Ontario, L5L 1C6, Canada
- Peter D Keightley
  - Institute of Evolutionary Biology, University of Edinburgh, EH9 3FL, United Kingdom

31
Charlesworth et al. on Background Selection and Neutral Diversity. Genetics 2017; 204:829-832. [PMID: 28114095 DOI: 10.1534/genetics.116.196170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open

32
Phung TN, Huber CD, Lohmueller KE. Determining the Effect of Natural Selection on Linked Neutral Divergence across Species. PLoS Genet 2016; 12:e1006199. [PMID: 27508305 PMCID: PMC4980041 DOI: 10.1371/journal.pgen.1006199] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 12/11/2015] [Accepted: 06/25/2016] [Indexed: 11/18/2022] Open
Abstract
A major goal in evolutionary biology is to understand how natural selection has shaped patterns of genetic variation across genomes. Studies in a variety of species have shown that neutral genetic diversity (intra-species differences) has been reduced at sites linked to those under direct selection. However, the effect of linked selection on neutral sequence divergence (inter-species differences) remains ambiguous. While empirical studies have reported correlations between divergence and recombination, which is interpreted as evidence for natural selection reducing linked neutral divergence, theory argues otherwise, especially for species that have diverged long ago. Here we address these outstanding issues by examining whether natural selection can affect divergence between both closely and distantly related species. We show that neutral divergence between closely related species (e.g. human-primate) is negatively correlated with functional content and positively correlated with human recombination rate. We also find that neutral divergence between distantly related species (e.g. human-rodent) is negatively correlated with functional content and positively correlated with estimates of background selection from primates. These patterns persist after accounting for the confounding factors of hypermutable CpG sites, GC content, and biased gene conversion. Coalescent models indicate that even when the contribution of ancestral polymorphism to divergence is small, background selection in the ancestral population can still explain a large proportion of the variance in divergence across the genome, generating the observed correlations. Our findings reveal that, contrary to previous intuition, natural selection can indirectly affect linked neutral divergence between both closely and distantly related species. 
Though we cannot formally exclude the possibility that the direct effects of purifying selection drive some of these patterns, such a scenario would be possible only if more of the genome is under purifying selection than currently believed. Our work has implications for understanding the evolution of genomes and interpreting patterns of genetic variation.

Genetic variation at neutral sites can be reduced through linkage to nearby selected sites. This pattern has been used to show the widespread effects of natural selection at shaping patterns of genetic diversity across genomes from a variety of species. However, it is not entirely clear whether natural selection has an effect on neutral divergence between species. Here we show that putatively neutral divergence between closely related species (human and chimp) and between distantly related pairs of species (humans and mice) show signatures consistent with having been affected by linkage to selected sites. Further, our theoretical models and simulations show that natural selection indirectly affecting linked neutral sites can generate these patterns. Unless substantially more of the genome is under the direct effects of purifying selection than currently believed, our results argue that natural selection has played an important role in shaping variation in levels of putatively neutral sequence divergence across the genome. Our findings further suggest that divergence-based estimates of neutral mutation rate variation across the genome as well as certain estimators of population history may be confounded by linkage to selected sites.
Affiliation(s)
- Tanya N. Phung
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, United States of America
- Christian D. Huber
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Kirk E. Lohmueller
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, United States of America
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America

33
Osada N. Genetic diversity in humans and non-human primates and its evolutionary consequences. Genes Genet Syst 2016; 90:133-45. [PMID: 26510568 DOI: 10.1266/ggs.90.133] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/23/2022] Open
Abstract
Genetic diversity is a key parameter in population genetics and is important for understanding the process of evolution and for the development of appropriate conservation strategies. Recent advances in sequencing technology have enabled the measurement of genetic diversity of various organisms at the nucleotide level and on a genome-wide scale, yielding more precise estimates than were previously achievable. In this review, I have compiled and summarized the estimates of genetic diversity in humans and non-human primates based on recent genome-wide studies. Although studies on population genetics demonstrated fluctuations in population sizes over time, general patterns have emerged. As shown previously, genetic diversity in humans is one of the lowest among primates; however, certain other primate species exhibit genetic diversity that is comparable to or even lower than that in humans. There exists greater than 10-fold variation in genetic diversity among primate species, and I found weak correlation with species fecundity but not with body or propagule size. I further discuss the potential evolutionary consequences of population size decline on the evolution of primate species. The level of genetic diversity negatively correlates with the ratio of non-synonymous to synonymous polymorphisms in a population, suggesting that proportionally greater numbers of slightly deleterious mutations segregate in small rather than large populations. Although population size decline is likely to promote the fixation of slightly deleterious mutations, there are molecular mechanisms, such as compensatory mutations at various molecular levels, which may prevent fitness decline at the population level. The effects of slightly deleterious mutations from theoretical and empirical studies and their relevance to conservation biology are also discussed in this review.
Affiliation(s)
- Naoki Osada
  - Department of Population Genetics, National Institute of Genetics

34
Elyashiv E, Sattath S, Hu TT, Strutsovsky A, McVicker G, Andolfatto P, Coop G, Sella G. A Genomic Map of the Effects of Linked Selection in Drosophila. PLoS Genet 2016; 12:e1006130. [PMID: 27536991 PMCID: PMC4990265 DOI: 10.1371/journal.pgen.1006130] [Citation(s) in RCA: 88] [Impact Index Per Article: 11.0] [Received: 01/05/2015] [Accepted: 05/26/2016] [Indexed: 01/23/2023] Open
Abstract
Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of "linked selection" on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTRs). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in UTRs. Our inference further suggests a substantial effect of other modes of linked selection and of adaptation in particular. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.
Affiliation(s)
- Eyal Elyashiv
  - Department of Ecology, Evolution, and Behavior, Hebrew University of Jerusalem, Jerusalem, Israel
  - Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Shmuel Sattath
  - Department of Ecology, Evolution, and Behavior, Hebrew University of Jerusalem, Jerusalem, Israel
- Tina T. Hu
  - Department of Ecology and Evolutionary Biology and the Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Alon Strutsovsky
  - Department of Ecology, Evolution, and Behavior, Hebrew University of Jerusalem, Jerusalem, Israel
- Graham McVicker
  - The Laboratory of Genetics and The Integrative Biology Laboratory, Salk Institute for Biological Studies, La Jolla, California, United States of America
- Peter Andolfatto
  - Department of Ecology and Evolutionary Biology and the Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Graham Coop
  - Department of Evolution and Ecology, University of California, Davis, Davis, California, United States of America
- Guy Sella
  - Department of Biological Sciences, Columbia University, New York, New York, United States of America

35
A genetic method for dating ancient genomes provides a direct estimate of human generation interval in the last 45,000 years. Proc Natl Acad Sci U S A 2016; 113:5652-7. [PMID: 27140627 DOI: 10.1073/pnas.1514696113] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Indexed: 01/30/2023] Open
Abstract
The study of human evolution has been revolutionized by inferences from ancient DNA analyses. Key to these studies is the reliable estimation of the age of ancient specimens. High-resolution age estimates can often be obtained using radiocarbon dating, and, while precise and powerful, this method has some biases, making it of interest to directly use genetic data to infer a date for samples that have been sequenced. Here, we report a genetic method that uses the recombination clock. The idea is that an ancient genome has evolved less than the genomes of present-day individuals and thus has experienced fewer recombination events since the common ancestor. To implement this idea, we take advantage of the insight that all non-Africans have a common heritage of Neanderthal gene flow into their ancestors. Thus, we can estimate the date since Neanderthal admixture for present-day and ancient samples simultaneously and use the difference as a direct estimate of the ancient specimen's age. We apply our method to date five Upper Paleolithic Eurasian genomes with radiocarbon dates between 12,000 and 45,000 y ago and show an excellent correlation of the genetic and ¹⁴C dates. By considering the slope of the correlation between the genetic dates, which are in units of generations, and the ¹⁴C dates, which are in units of years, we infer that the mean generation interval in humans over this period has been 26-30 y. Extensions of this methodology that use older shared events may be applicable for dating beyond the radiocarbon frontier.
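The slope argument in this abstract can be illustrated with a small sketch. It assumes a plain ordinary least-squares fit of radiocarbon age (years) on genetic age (generations), so that the slope has units of years per generation; the paper's actual estimator may differ, and the five data points below are made-up toy values, not the published dates.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical toy values (NOT data from the paper): genetic ages in
# generations, and radiocarbon ages in years built as exactly 28 y per generation.
genetic_age_generations = [450, 800, 1200, 1400, 1600]
radiocarbon_age_years = [12600, 22400, 33600, 39200, 44800]

# Years per generation, i.e. the inferred mean generation interval.
generation_interval = ols_slope(genetic_age_generations, radiocarbon_age_years)
print(f"{generation_interval:.1f} years per generation")
```

With real data the points would scatter around the line, and the uncertainty in the slope would give a range such as the 26-30 y reported in the abstract.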

36
Inferring the Frequency Spectrum of Derived Variants to Quantify Adaptive Molecular Evolution in Protein-Coding Genes of Drosophila melanogaster. Genetics 2016; 203:975-84. [PMID: 27098912 DOI: 10.1534/genetics.116.188102] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 02/10/2016] [Accepted: 04/18/2014] [Indexed: 11/18/2022] Open
Abstract
Many approaches for inferring adaptive molecular evolution analyze the unfolded site frequency spectrum (SFS), a vector of counts of sites with different numbers of copies of derived alleles in a sample of alleles from a population. Accurate inference of the high-copy-number elements of the SFS is difficult, however, because of misassignment of alleles as derived vs. ancestral. This is a known problem with parsimony using outgroup species. Here we show that the problem is particularly serious if there is variation in the substitution rate among sites brought about by variation in selective constraint levels. We present a new method for inferring the SFS using one or two outgroups that attempts to overcome the problem of misassignment. We show that two outgroups are required for accurate estimation of the SFS if there is substantial variation in selective constraints, which is expected to be the case for nonsynonymous sites in protein-coding genes. We apply the method to estimate unfolded SFSs for synonymous and nonsynonymous sites in a population of Drosophila melanogaster from phase 2 of the Drosophila Population Genomics Project. We use the unfolded spectra to estimate the frequency and strength of advantageous and deleterious mutations and estimate that ∼50% of amino acid substitutions are positively selected but that <0.5% of new amino acid mutations are beneficial, with a scaled selection strength of N_e s ≈ 12.
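The definition of the unfolded SFS given in this abstract, and the reason misassignment inflates its high-copy-number entries, can be sketched as follows. This is only an illustration of the definition, not the inference method of the paper; the function names and the toy counts are hypothetical.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Return [xi_1, ..., xi_(n-1)], where xi_i is the number of polymorphic
    sites at which the derived allele is seen i times among n sampled alleles."""
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, n)]


def misassign(count, n):
    """Derived-allele count obtained when ancestral and derived states are
    swapped at a site: i copies become n - i copies."""
    return n - count


n = 6                        # sampled alleles (toy value)
true_counts = [1, 1, 2]      # true derived-allele counts at three sites (toy values)
print(unfolded_sfs(true_counts, n))

# Mispolarize the first site: a singleton is now scored as a 5-copy variant.
observed = [misassign(true_counts[0], n)] + true_counts[1:]
print(unfolded_sfs(observed, n))
```

Because most true variants are rare, even a low misassignment rate moves a noticeable number of sites from the low-count entries into the high-count entries, which is the distortion the abstract describes.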

37
Memon S, Jia X, Gu L, Zhang X. Genomic variations and distinct evolutionary rate of rare alleles in Arabidopsis thaliana. BMC Evol Biol 2016; 16:25. [PMID: 26817829 PMCID: PMC4728917 DOI: 10.1186/s12862-016-0590-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/30/2015] [Accepted: 01/12/2016] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND: The variation rate in genomic regions associated with different alleles contributes to distinct evolutionary patterns involving rare alleles. The rare alleles bias towards genome-wide association studies (GWASs), which aim to detect different variants at genomic loci associated with single-nucleotide polymorphisms (SNPs) inclined to produce different haplotypes. Here, we sequenced Arabidopsis thaliana and compared its coding and non-coding genomic regions with its closest outgroup relative, Arabidopsis lyrata, which accounted for the ancestral misinference. The use of genome-wide SNPs helps interpret the genetic architecture of rare alleles in Arabidopsis thaliana, elucidating a significant departure from a neutral evolutionary model, and the pattern of polymorphisms around a selected locus will exclusively influence natural selection.

RESULTS: We found 23.4% of the rare alleles existing randomly in the genome. Notably, in our results significant differences (P < 0.01) were estimated in the relative rates between rare versus intermediate alleles, between fixed versus non-fixed mutations, and between type I versus type II rare mutations by using the χ² test. However, the rare alleles generating negative values of Tajima's D suggest that they were generated under selective sweeps. Relative to polymorphic sites including SNPs, 67.5% of the fixed mutations were attributed, indicating major contributors to speciation. Substantially, an evolution occurred in the rare allele that was 1.42 times faster than that in a major haplotype.

CONCLUSION: Our results indicate that rare alleles fit a random occurrence model, meaning that rare alleles occur at any locus in a genome and in any accession in a species. Based on the higher relative rate of derived to ancient mutations and higher average Dxy, we conclude that rare alleles evolve faster than the higher frequency alleles. The rapid evolution of rare alleles indicates that they must have been newly generated with fixed mutations, compared with the other alleles. Finally, PCR and sequencing results in the flanking regions of rare allele loci confirm that they are of short extension, indicating the absence of a genome-wide pattern for a rare haplotype. The indel-associated model for rare alleles assumes that indel-associated mutations only occur in an indel heterozygote.
Affiliation(s)
- Shabana Memon
  - School of Life Sciences, Nanjing University, Nanjing, 210093, China
  - Lecturer, Department of Plant Breeding and Genetics, Sindh Agriculture University, Tando Jam, Hyderabad, 70060, Pakistan
- Xianqing Jia
  - School of Life Sciences, Nanjing University, Nanjing, 210093, China
- Longjiang Gu
  - School of Life Sciences, Nanjing University, Nanjing, 210093, China
- Xiaohui Zhang
  - School of Life Sciences, Nanjing University, Nanjing, 210093, China

38
Genes with monoallelic expression contribute disproportionately to genetic diversity in humans. Nat Genet 2016; 48:231-237. [PMID: 26808112 PMCID: PMC4942303 DOI: 10.1038/ng.3493] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 12/14/2014] [Accepted: 12/23/2015] [Indexed: 12/20/2022]
Abstract
An unexpectedly large number of human autosomal genes are subject to monoallelic expression (MAE). Our analysis of 4,227 such genes uncovers surprisingly high genetic variation across human populations. This increased diversity is unlikely to reflect relaxed purifying selection. Remarkably, MAE genes exhibit an elevated recombination rate and an increased density of hypermutable sequence contexts. However, these factors do not fully account for the increased diversity. We find that the elevated nucleotide diversity of MAE genes is also associated with greater allelic age: variants in these genes tend to be older and are enriched in polymorphisms shared by Neanderthals and chimpanzees. Both synonymous and nonsynonymous alleles of MAE genes have elevated average population frequencies. We also observed strong enrichment of the MAE signature among genes reported to evolve under balancing selection. We propose that an important biological function of widespread MAE might be the generation of cell-to-cell heterogeneity; the increased genetic variation contributes to this heterogeneity.

39
Sanseverino W, Hénaff E, Vives C, Pinosio S, Burgos-Paz W, Morgante M, Ramos-Onsins SE, Garcia-Mas J, Casacuberta JM. Transposon Insertions, Structural Variations, and SNPs Contribute to the Evolution of the Melon Genome. Mol Biol Evol 2015; 32:2760-74. [PMID: 26174143 DOI: 10.1093/molbev/msv152] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Indexed: 01/06/2023] Open
Abstract
The availability of extensive databases of crop genome sequences should allow analysis of crop variability at an unprecedented scale, which should have an important impact in plant breeding. However, up to now the analysis of genetic variability at the whole-genome scale has been mainly restricted to single nucleotide polymorphisms (SNPs). This is a strong limitation as structural variation (SV) and transposon insertion polymorphisms are frequent in plant species and have had an important mutational role in crop domestication and breeding. Here, we present the first comprehensive analysis of melon genetic diversity, which includes a detailed analysis of SNPs, SV, and transposon insertion polymorphisms. The variability found among seven melon varieties representing the species diversity, and including wild accessions and highly bred lines, is relatively high due in part to the marked divergence of some lineages. The diversity is distributed nonuniformly across the genome, being lower at the extremes of the chromosomes and higher in the pericentromeric regions, which is compatible with the effect of purifying selection and recombination forces over functional regions. Additionally, this variability is greatly reduced among elite varieties, probably due to selection during breeding. We have found some chromosomal regions showing a high differentiation of the elite varieties versus the rest, which could be considered as strongly selected candidate regions. Our data also suggest that transposons and SV may be at the origin of an important fraction of the variability in melon, which highlights the importance of analyzing all types of genetic variability to understand crop genome evolution.
Collapse
Affiliation(s)
- Walter Sanseverino
- Institut de Recerca i Tecnologia Agroalimentàries, Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Barcelona, Spain
| | - Elizabeth Hénaff
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Barcelona, Spain
| | - Cristina Vives
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Barcelona, Spain
| | - Sara Pinosio
- Dipartimento di szience agrarie e ambientali, Università degli studi di Udine, Udine, Italy
| | - William Burgos-Paz
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Barcelona, Spain
| | - Michele Morgante
- Dipartimento di szience agrarie e ambientali, Università degli studi di Udine, Udine, Italy
| | | | - Jordi Garcia-Mas
- Institut de Recerca i Tecnologia Agroalimentàries, Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Barcelona, Spain
|
40
|
Deinum EE, Halligan DL, Ness RW, Zhang YH, Cong L, Zhang JX, Keightley PD. Recent Evolution in Rattus norvegicus Is Shaped by Declining Effective Population Size. Mol Biol Evol 2015; 32:2547-58. [PMID: 26037536 PMCID: PMC4576703 DOI: 10.1093/molbev/msv126] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
The brown rat, Rattus norvegicus, is both a notorious pest and a frequently used model in biomedical research. By analyzing genome sequences of 12 wild-caught brown rats from their presumed ancestral range in NE China, along with the sequence of a black rat, Rattus rattus, we investigate the selective and demographic forces shaping variation in the genome. We estimate that the recent effective population size (Ne) of this species is 1.24 × 10^5, based on silent site diversity. We compare patterns of diversity in these genomes with patterns in multiple genome sequences of the house mouse (Mus musculus castaneus), which has a much larger Ne. This reveals an important role for variation in the strength of genetic drift in mammalian genome evolution. By a Pairwise Sequentially Markovian Coalescent analysis of demographic history, we infer that there has been a recent population size bottleneck in wild rats, which we date to approximately 20,000 years ago. Consistent with this, wild rat populations have experienced an increased flux of mildly deleterious mutations, which segregate at higher frequencies in protein-coding genes and conserved noncoding elements. This leads to negative estimates of the rate of adaptive evolution (α) in proteins and conserved noncoding elements, a result which we discuss in relation to the strongly positive estimates observed in wild house mice. As a consequence of the population bottleneck, wild rats also show a markedly slower decay of linkage disequilibrium with physical distance than wild house mice.
Affiliation(s)
- Eva E Deinum
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
| | - Daniel L Halligan
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
| | - Rob W Ness
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
| | - Yao-Hua Zhang
- State Key Laboratory of Integrated Management of Pest Insects and Rodents in Agriculture, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Lin Cong
- Institute of Plant Protection, Heilongjiang Academy of Agricultural Sciences, Harbin, China
| | - Jian-Xu Zhang
- State Key Laboratory of Integrated Management of Pest Insects and Rodents in Agriculture, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Peter D Keightley
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
|
41
|
Wallberg A, Glémin S, Webster MT. Extreme recombination frequencies shape genome variation and evolution in the honeybee, Apis mellifera. PLoS Genet 2015; 11:e1005189. [PMID: 25902173 PMCID: PMC4406589 DOI: 10.1371/journal.pgen.1005189] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2014] [Accepted: 04/01/2015] [Indexed: 01/10/2023] Open
Abstract
Meiotic recombination is a fundamental cellular process, with important consequences for evolution and genome integrity. However, we know little about how recombination rates vary across the genomes of most species and the molecular and evolutionary determinants of this variation. The honeybee, Apis mellifera, has extremely high rates of meiotic recombination, although the evolutionary causes and consequences of this are unclear. Here we use patterns of linkage disequilibrium in whole genome resequencing data from 30 diploid honeybees to construct a fine-scale map of rates of crossing over in the genome. We find that, in contrast to vertebrate genomes, the recombination landscape is not strongly punctate. Crossover rates strongly correlate with levels of genetic variation, but not divergence, which indicates a pervasive impact of selection on the genome. Germ-line methylated genes have reduced crossover rate, which could indicate a role of methylation in suppressing recombination. Controlling for the effects of methylation, we do not infer a strong association between gene expression patterns and recombination. The site frequency spectrum is strongly skewed from neutral expectations in honeybees: rare variants are dominated by AT-biased mutations, whereas GC-biased mutations are found at higher frequencies, indicative of a major influence of GC-biased gene conversion (gBGC), which we infer to generate an allele fixation bias 5 – 50 times the genomic average estimated in humans. We uncover further evidence that this repair bias specifically affects transitions and favours fixation of CpG sites. Recombination, via gBGC, therefore appears to have profound consequences on genome evolution in honeybees and interferes with the process of natural selection. These findings have important implications for our understanding of the forces driving molecular evolution. Evolution results from changes in allele frequencies in populations. 
The main forces that cause such changes are natural selection and random genetic drift. However, an additional process, GC-biased gene conversion (gBGC), associated with meiotic recombination, affects the probability that alleles are passed from one generation to the next. The honeybee, Apis mellifera, has extremely high recombination rates, more than 20 times those observed in humans. However, the reason for this is unknown and the effects of such high recombination rates on evolution are not well understood. Here we use patterns of genetic variation in the genomes of 30 honeybees to infer variation in the rate of recombination across the genome. We find that recombination rates and levels of genetic variation are strongly correlated, which is indicative of a pervasive impact of natural selection on genetic variation. We also infer a major role of DNA methylation in determining recombination rates in genes. Patterns of genetic variation appear to be strongly skewed due to the effects of gBGC, suggesting that recombination generates a bias in transmission of alleles during meiosis. This process seems to be interfering with the efficacy of selection at removing deleterious alleles and favouring beneficial ones. Recombination therefore has a huge impact on genetic variation and evolution in honeybees and appears to play a dominant role in genome evolution.
Affiliation(s)
- Andreas Wallberg
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Sylvain Glémin
- Institut des Sciences de l’Evolution (ISEM—UMR 5554 Université de Montpellier-CNRS-IRD-EPHE), France
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Matthew T. Webster
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
|
42
|
Corbett-Detig RB, Hartl DL, Sackton TB. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol 2015; 13:e1002112. [PMID: 25859758 PMCID: PMC4393120 DOI: 10.1371/journal.pbio.1002112] [Citation(s) in RCA: 196] [Impact Index Per Article: 21.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2014] [Accepted: 02/20/2015] [Indexed: 11/19/2022] Open
Abstract
The neutral theory of molecular evolution predicts that the amount of neutral polymorphisms within a species will increase proportionally with the census population size (Nc). However, this prediction has not been borne out in practice: while the range of Nc spans many orders of magnitude, levels of genetic diversity within species fall in a comparatively narrow range. Although theoretical arguments have invoked the increased efficacy of natural selection in larger populations to explain this discrepancy, few direct empirical tests of this hypothesis have been conducted. In this work, we provide a direct test of this hypothesis using population genomic data from a wide range of taxonomically diverse species. To do this, we relied on the fact that the impact of natural selection on linked neutral diversity depends on the local recombinational environment. In regions of relatively low recombination, selected variants affect more neutral sites through linkage, and the resulting correlation between recombination and polymorphism allows a quantitative assessment of the magnitude of the impact of selection on linked neutral diversity. By comparing whole genome polymorphism data and genetic maps using a coalescent modeling framework, we estimate the degree to which natural selection reduces linked neutral diversity for 40 species of obligately sexual eukaryotes. We then show that the magnitude of the impact of natural selection is positively correlated with Nc, based on body size and species range as proxies for census population size. These results demonstrate that natural selection removes more variation at linked neutral sites in species with large Nc than those with small Nc and provides direct empirical evidence that natural selection constrains levels of neutral genetic diversity across many species. This implies that natural selection may provide an explanation for this longstanding paradox of population genetics.
Affiliation(s)
- Russell B. Corbett-Detig
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge Massachusetts, United States of America
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - Daniel L. Hartl
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge Massachusetts, United States of America
| | - Timothy B. Sackton
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge Massachusetts, United States of America
|
43
|
Siepel A, Arbiza L. Cis-regulatory elements and human evolution. Curr Opin Genet Dev 2014; 29:81-9. [PMID: 25218861 PMCID: PMC4258466 DOI: 10.1016/j.gde.2014.08.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2014] [Revised: 08/17/2014] [Accepted: 08/23/2014] [Indexed: 11/20/2022]
Abstract
Modification of gene regulation has long been considered an important force in human evolution, particularly through changes to cis-regulatory elements (CREs) that function in transcriptional regulation. For decades, however, the study of cis-regulatory evolution was severely limited by the available data. New data sets describing the locations of CREs and genetic variation within and between species have now made it possible to study CRE evolution much more directly on a genome-wide scale. Here, we review recent research on the evolution of CREs in humans based on large-scale genomic data sets. We consider inferences based on primate divergence, human polymorphism, and combinations of divergence and polymorphism. We then consider 'new frontiers' in this field stemming from recent research on transcriptional regulation.
Affiliation(s)
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.
| | - Leonardo Arbiza
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
|
44
|
De Silva DR, Nichols R, Elgar G. Purifying selection in deeply conserved human enhancers is more consistent than in coding sequences. PLoS One 2014; 9:e103357. [PMID: 25062004 PMCID: PMC4111549 DOI: 10.1371/journal.pone.0103357] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2013] [Accepted: 07/01/2014] [Indexed: 12/30/2022] Open
Abstract
Comparison of polymorphism at synonymous and non-synonymous sites in protein-coding DNA can provide evidence for selective constraint. Non-coding DNA that forms part of the regulatory landscape presents more of a challenge since there is not such a clear-cut distinction between sites under stronger and weaker selective constraint. Here, we consider putative regulatory elements termed Conserved Non-coding Elements (CNEs) defined by their high level of sequence identity across all vertebrates. Some mutations in these regions have been implicated in developmental disorders; we analyse CNE polymorphism data to investigate whether such deleterious effects are widespread in humans. Single nucleotide variants from the HapMap and 1000 Genomes Projects were mapped across nearly 2000 CNEs. In the 1000 Genomes data we find a significant excess of rare derived alleles in CNEs relative to coding sequences; this pattern is absent in HapMap data, apparently obscured by ascertainment bias. The distribution of polymorphism within CNEs is not uniform; we could identify two categories of sites by exploiting deep vertebrate alignments: stretches that are non-variant, and those that have at least one substitution. The conserved category has fewer polymorphic sites and a greater excess of rare derived alleles, which can be explained by a large proportion of sites under strong purifying selection within humans--higher than that for non-synonymous sites in most protein coding regions, and comparable to that at the strongly conserved trans-dev genes. Conversely, the more evolutionarily labile CNE sites have an allele frequency distribution not significantly different from non-synonymous sites. Future studies should exploit genome-wide re-sequencing to obtain better coverage in selected non-coding regions, given the likelihood that mutations in evolutionarily conserved enhancer sequences are deleterious. 
Discovery pipelines should validate non-coding variants to aid in identifying causal and risk-enhancing variants in complex disorders, in contrast to the current focus on exome sequencing.
Affiliation(s)
- Dilrini R. De Silva
- Systems Biology, MRC National Institute for Medical Research, Mill Hill, London, United Kingdom
- School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
| | - Richard Nichols
- School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
| | - Greg Elgar
- Systems Biology, MRC National Institute for Medical Research, Mill Hill, London, United Kingdom
|
45
|
Rasmussen MD, Hubisz MJ, Gronau I, Siepel A. Genome-wide inference of ancestral recombination graphs. PLoS Genet 2014; 10:e1004342. [PMID: 24831947 PMCID: PMC4022496 DOI: 10.1371/journal.pgen.1004342] [Citation(s) in RCA: 179] [Impact Index Per Article: 17.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2013] [Accepted: 03/17/2014] [Indexed: 01/23/2023] Open
Abstract
The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.
Affiliation(s)
- Matthew D. Rasmussen
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- * E-mail: (MDR); (AS)
| | - Melissa J. Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
| | - Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
| | - Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambs, United Kingdom
- * E-mail: (MDR); (AS)
|
46
|
Wright FA, Sullivan PF, Brooks AI, Zou F, Sun W, Xia K, Madar V, Jansen R, Chung W, Zhou YH, Abdellaoui A, Batista S, Butler C, Chen G, Chen TH, D'Ambrosio D, Gallins P, Ha MJ, Hottenga JJ, Huang S, Kattenberg M, Kochar J, Middeldorp CM, Qu A, Shabalin A, Tischfield J, Todd L, Tzeng JY, van Grootheest G, Vink JM, Wang Q, Wang W, Wang W, Willemsen G, Smit JH, de Geus EJ, Yin Z, Penninx BWJH, Boomsma DI. Heritability and genomics of gene expression in peripheral blood. Nat Genet 2014; 46:430-7. [PMID: 24728292 PMCID: PMC4012342 DOI: 10.1038/ng.2951] [Citation(s) in RCA: 248] [Impact Index Per Article: 24.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2012] [Accepted: 03/14/2014] [Indexed: 12/14/2022]
Abstract
We assessed gene expression profiles in 2,752 twins, using a classic twin design to quantify expression heritability and quantitative trait loci (eQTLs) in peripheral blood. The most highly heritable genes (∼777) were grouped into distinct expression clusters, enriched in gene-poor regions, associated with specific gene function or ontology classes, and strongly associated with disease designation. The design enabled a comparison of twin-based heritability to estimates based on dizygotic identity-by-descent sharing and distant genetic relatedness. Consideration of sampling variation suggests that previous heritability estimates have been upwardly biased. Genotyping of 2,494 twins enabled powerful identification of eQTLs, which we further examined in a replication set of 1,895 unrelated subjects. A large number of non-redundant local eQTLs (6,756) met replication criteria, whereas a relatively small number of distant eQTLs (165) met quality control and replication standards. Our results provide a new resource toward understanding the genetic control of transcription.
Affiliation(s)
- Fred A Wright
- [1] Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA. [2] Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA. [3] Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, USA. [4]
| | - Patrick F Sullivan
- [1] Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. [2]
| | - Andrew I Brooks
- Department of Genetics, Rutgers University, New Brunswick, New Jersey, USA
| | - Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Wei Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Kai Xia
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Vered Madar
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Rick Jansen
- Department of Psychiatry, VU Medical Center, Amsterdam, The Netherlands
| | - Wonil Chung
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Yi-Hui Zhou
- [1] Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA. [2] Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
| | - Abdel Abdellaoui
- Department of Biological Psychology, VU University, Amsterdam, The Netherlands
| | - Sandra Batista
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Casey Butler
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Guanhua Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Ting-Huei Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - David D'Ambrosio
- Environmental and Occupational Health Sciences Institute, Rutgers University, New Brunswick, New Jersey, USA
| | - Paul Gallins
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Min Jin Ha
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Jouke Jan Hottenga
- Department of Biological Psychology, VU University, Amsterdam, The Netherlands
| | - Shunping Huang
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Mathijs Kattenberg
- Department of Biological Psychology, VU University, Amsterdam, The Netherlands
| | - Jaspreet Kochar
- Environmental and Occupational Health Sciences Institute, Rutgers University, New Brunswick, New Jersey, USA
| | | | - Ani Qu
- Environmental and Occupational Health Sciences Institute, Rutgers University, New Brunswick, New Jersey, USA
| | - Andrey Shabalin
- Department of Pharmacotherapy and Outcomes Science, Virginia Commonwealth University, Richmond, Virginia, USA
| | - Jay Tischfield
- Department of Genetics, Rutgers University, New Brunswick, New Jersey, USA
| | - Laura Todd
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Jung-Ying Tzeng
- [1] Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA. [2] Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
| | | | - Jacqueline M Vink
- Department of Biological Psychology, VU University, Amsterdam, The Netherlands
| | - Qi Wang
- Environmental and Occupational Health Sciences Institute, Rutgers University, New Brunswick, New Jersey, USA
| | - Wei Wang
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California, USA
| | - Weibo Wang
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Gonneke Willemsen
- Department of Biological Psychology, VU University, Amsterdam, The Netherlands
| | - Johannes H Smit
- Department of Psychiatry, VU Medical Center, Amsterdam, The Netherlands
| | - Eco J de Geus
- Department of Biological Psychology, VU University, Amsterdam, The Netherlands
| | - Zhaoyu Yin
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | | | - Dorret I Boomsma
- Department of Biological Psychology, VU University, Amsterdam, The Netherlands
|
47
|
Ye K, Lu J, Raj SM, Gu Z. Human expression QTLs are enriched in signals of environmental adaptation. Genome Biol Evol 2014; 5:1689-701. [PMID: 23960253 PMCID: PMC3787676 DOI: 10.1093/gbe/evt124] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
Expression quantitative trait loci (eQTLs) have been found to be enriched in trait-associated single-nucleotide polymorphisms (SNPs). However, whether eQTLs are adaptive to different environmental factors and their relative evolutionary significance compared with nonsynonymous SNPs (NS SNPs) are still elusive. Compiling environmental correlation data from three studies for more than 500,000 SNPs and 42 environmental factors, including climate, subsistence, pathogens, and dietary patterns, we performed a systematic examination of the adaptive patterns of eQTLs to local environment. Compared with intergenic SNPs, eQTLs are significantly enriched in the lower tail of a transformed rank statistic in the environmental correlation analysis, indicating possible adaptation of eQTLs to the majority of 42 environmental factors. The mean enrichment of eQTLs across 42 environmental factors is as great as, if not greater than, that of NS SNPs. The enrichment of eQTLs, although significant across all levels of recombination rate, is inversely correlated with recombination rate, suggesting the presence of selective sweep or background selection. Further pathway enrichment analysis identified a number of pathways with possible environmental adaptation from eQTLs. These pathways are mostly related to immune function and metabolism. Our results indicate that eQTLs might have played an important role in recent and ongoing human adaptation and are of special importance for some environmental factors and biological pathways.
Affiliation(s)
- Kaixiong Ye
- Division of Nutritional Sciences, Cornell University
|
48
|
Abstract
The role of positive selection in human evolution remains controversial. On the one hand, scans for positive selection have identified hundreds of candidate loci, and the genome-wide patterns of polymorphism show signatures consistent with frequent positive selection. On the other hand, recent studies have argued that many of the candidate loci are false positives and that most genome-wide signatures of adaptation are in fact due to reduction of neutral diversity by linked deleterious mutations, known as background selection. Here we analyze human polymorphism data from the 1000 Genomes Project and detect signatures of positive selection once we correct for the effects of background selection. We show that levels of neutral polymorphism are lower near amino acid substitutions, with the strongest reduction observed specifically near functionally consequential amino acid substitutions. Furthermore, amino acid substitutions are associated with signatures of recent adaptation that should not be generated by background selection, such as unusually long and frequent haplotypes and specific distortions in the site frequency spectrum. We use forward simulations to argue that the observed signatures require a high rate of strongly adaptive substitutions near amino acid changes. We further demonstrate that the observed signatures of positive selection correlate better with the presence of regulatory sequences, as predicted by the ENCODE Project Consortium, than with the positions of amino acid substitutions. Our results suggest that adaptation was frequent in human evolution and provide support for the hypothesis of King and Wilson that adaptive divergence is primarily driven by regulatory changes.
Affiliation(s)
- David Enard
- Department of Biology, Stanford University, Stanford, California 94305, USA
| | - Philipp W Messer
- Department of Biology, Stanford University, Stanford, California 94305, USA
| | - Dmitri A Petrov
- Department of Biology, Stanford University, Stanford, California 94305, USA
|
49
|
Halligan DL, Kousathanas A, Ness RW, Harr B, Eöry L, Keane TM, Adams DJ, Keightley PD. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet 2013; 9:e1003995. [PMID: 24339797 PMCID: PMC3854965 DOI: 10.1371/journal.pgen.1003995] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2013] [Accepted: 10/16/2013] [Indexed: 12/22/2022] Open
Abstract
The contribution of regulatory versus protein change to adaptive evolution has long been controversial. In principle, the rate and strength of adaptation within functional genetic elements can be quantified on the basis of an excess of nucleotide substitutions between species compared to the neutral expectation or from effects of recent substitutions on nucleotide diversity at linked sites. Here, we infer the nature of selective forces acting in proteins, their UTRs and conserved noncoding elements (CNEs) using genome-wide patterns of diversity in wild house mice and divergence to related species. By applying an extension of the McDonald-Kreitman test, we infer that adaptive substitutions are widespread in protein-coding genes, UTRs and CNEs, and we estimate that there are at least four times as many adaptive substitutions in CNEs and UTRs as in proteins. We observe pronounced reductions in mean diversity around nonsynonymous sites (whether or not they have experienced a recent substitution). This can be explained by selection on multiple, linked CNEs and exons. We also observe substantial dips in mean diversity (after controlling for divergence) around protein-coding exons and CNEs, which can also be explained by the combined effects of many linked exons and CNEs. A model of background selection (BGS) can adequately explain the reduction in mean diversity observed around CNEs. However, BGS fails to explain the wide reductions in mean diversity surrounding exons (encompassing ∼100 Kb, on average), implying that there is a substantial role for adaptation within exons or closely linked sites. The wide dips in diversity around exons, which are hard to explain by BGS, suggest that the fitness effects of adaptive amino acid substitutions could be substantially larger than substitutions in CNEs. We conclude that although there appear to be many more adaptive noncoding changes, substitutions in proteins may dominate phenotypic evolution. 
We present an analysis of the genome sequences of multiple wild house mice. Wild house mice are about ten times more genetically diverse than humans, reflecting the large effective population size of the species. This manifests itself as more effective natural selection acting against deleterious mutations and favouring advantageous mutations in mice than in humans. We show that there are strong signals of adaptive evolution at many sites in the genome. We estimate that 80% of adaptive changes in the genome are in gene regulatory elements and only 20% are in protein-coding genes. We find that nucleotide diversity is markedly reduced close to gene regulatory elements and protein-coding gene sequences. The reductions around regulatory elements can be explained by selection purging deleterious mutations that occur in the elements themselves, but this process only partially explains the diversity reductions around protein-coding genes. Recurrent adaptive evolution, which can also cause local reductions in diversity via selective sweeps, may be necessary to fully explain the patterns in diversity that we observe surrounding genes. Although most adaptive molecular evolution appears to be regulatory, adaptive phenotypic change may principally be driven by structural change in proteins.
Affiliation(s)
- Daniel L. Halligan
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Rob W. Ness
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Bettina Harr
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
- Lél Eöry
  - The Roslin Institute and R(D)SVS, University of Edinburgh, Midlothian, United Kingdom
- Thomas M. Keane
  - The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- David J. Adams
  - The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Peter D. Keightley
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
  - * E-mail:
|
50
|
Schaibley VM, Zawistowski M, Wegmann D, Ehm MG, Nelson MR, St. Jean PL, Abecasis GR, Novembre J, Zöllner S, Li JZ. The influence of genomic context on mutation patterns in the human genome inferred from rare variants. Genome Res 2013; 23:1974-84. [PMID: 23990608 PMCID: PMC3847768 DOI: 10.1101/gr.154971.113] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 01/15/2013] [Accepted: 08/19/2013] [Indexed: 01/22/2023]
Abstract
Understanding patterns of spontaneous mutations is of fundamental interest in studies of human genome evolution and genetic disease. Here, we used extremely rare variants in humans to model the molecular spectrum of single-nucleotide mutations. Compared to common variants in humans and human-chimpanzee fixed differences (substitutions), rare variants, on average, arose more recently in the human lineage and are less affected by the potentially confounding effects of natural selection, population demographic history, and biased gene conversion. We analyzed variants obtained from a population-based sequencing study of 202 genes in >14,000 individuals. We observed considerable variability in the per-gene mutation rate, which was correlated with local GC content, but not recombination rate. Using >20,000 variants with a derived allele frequency ≤ 10⁻⁴, we examined the effect of local GC content and recombination rate on individual variant subtypes and performed comparisons with common variants and substitutions. The influence of local GC content on rare variants differed from that on common variants or substitutions, and the differences varied by variant subtype. Furthermore, recombination rate and recombination hotspots have little effect on rare variants of any subtype, yet both have a relatively strong impact on multiple variant subtypes in common variants and substitutions. This observation is consistent with the effect of biased gene conversion or selection-dependent processes. Our results highlight the distinct biases inherent in the initial mutation patterns and subsequent evolutionary processes that affect segregating variants.
Affiliation(s)
- Valerie M. Schaibley
  - Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Matthew Zawistowski
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Daniel Wegmann
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
- Margaret G. Ehm
  - Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, North Carolina 27709, USA
- Matthew R. Nelson
  - Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, North Carolina 27709, USA
- Pamela L. St. Jean
  - Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, North Carolina 27709, USA
- Gonçalo R. Abecasis
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
- John Novembre
  - Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California 90095, USA
- Sebastian Zöllner
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Psychiatry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Jun Z. Li
  - Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA
|