1
Diamantidis D, Fan WTL, Birkner M, Wakeley J. Bursts of coalescence within population pedigrees whenever big families occur. Genetics 2024; 227:iyae030. [PMID: 38408329 DOI: 10.1093/genetics/iyae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 01/23/2024] [Accepted: 02/18/2024] [Indexed: 02/28/2024]
Abstract
We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright-Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.
Affiliation(s)
- Wai-Tong Louis Fan
  - Department of Mathematics, Indiana University, Bloomington, IN 47405, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Matthias Birkner
  - Institut für Mathematik, Johannes-Gutenberg-Universität, 55099 Mainz, Germany
- John Wakeley
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
2
Lyulina AS, Liu Z, Good BH. Linkage equilibrium between rare mutations. bioRxiv 2024:2024.03.28.587282. [PMID: 38617331 PMCID: PMC11014483 DOI: 10.1101/2024.03.28.587282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Recombination breaks down genetic linkage by reshuffling existing variants onto new genetic backgrounds. These dynamics are traditionally quantified by examining the correlations between alleles, and how they decay as a function of the recombination rate. However, the magnitudes of these correlations are strongly influenced by other evolutionary forces like natural selection and genetic drift, making it difficult to tease out the effects of recombination. Here we introduce a theoretical framework for analyzing an alternative family of statistics that measure the homoplasy produced by recombination. We derive analytical expressions that predict how these statistics depend on the rates of recombination and recurrent mutation, the strength of negative selection and genetic drift, and the present-day frequencies of the mutant alleles. We find that the degree of homoplasy can strongly depend on this frequency scale, which reflects the underlying timescales over which these mutations occurred. We show how these scaling properties can be used to isolate the effects of recombination, and discuss their implications for the rates of horizontal gene transfer in bacteria.
Affiliation(s)
- Anastasia S Lyulina
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
  - Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
- Zhiru Liu
  - Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
- Benjamin H Good
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
  - Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
  - Chan Zuckerberg Biohub - San Francisco, San Francisco, CA 94158, USA
3
Teterina AA, Willis JH, Lukac M, Jovelin R, Cutter AD, Phillips PC. Genomic diversity landscapes in outcrossing and selfing Caenorhabditis nematodes. PLoS Genet 2023; 19:e1010879. [PMID: 37585484 PMCID: PMC10461856 DOI: 10.1371/journal.pgen.1010879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 08/28/2023] [Accepted: 07/21/2023] [Indexed: 08/18/2023]
Abstract
Caenorhabditis nematodes form an excellent model for studying how the mode of reproduction affects genetic diversity, as some species reproduce via outcrossing whereas others can self-fertilize. Currently, chromosome-level patterns of diversity and recombination are only available for self-reproducing Caenorhabditis, making the generality of genomic patterns across the genus unclear given the profound potential influence of reproductive mode. Here we present a whole-genome diversity landscape, coupled with a new genetic map, for the outcrossing nematode C. remanei. We demonstrate that the genomic distribution of recombination in C. remanei, like the model nematode C. elegans, shows high recombination rates on chromosome arms and low rates toward the central regions. Patterns of genetic variation across the genome are also similar between these species, but differ dramatically in scale, being tenfold greater for C. remanei. Historical reconstructions of variation in effective population size over the past million generations echo this difference in polymorphism. Evolutionary simulations demonstrate how selection, recombination, mutation, and selfing shape variation along the genome, and that multiple drivers can produce patterns similar to those observed in natural populations. The results illustrate how genome organization and selection play a crucial role in shaping the genomic pattern of diversity whereas demographic processes scale the level of diversity across the genome as a whole.
Affiliation(s)
- Anastasia A. Teterina
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
  - Center of Parasitology, Severtsov Institute of Ecology and Evolution RAS, Moscow, Russia
- John H. Willis
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
- Matt Lukac
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
- Richard Jovelin
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Asher D. Cutter
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Patrick C. Phillips
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
4
Reid BN, Pinsky ML. Simulation-Based Evaluation of Methods, Data Types, and Temporal Sampling Schemes for Detecting Recent Population Declines. Integr Comp Biol 2022; 62:1849-1863. [PMID: 36104155 PMCID: PMC9801984 DOI: 10.1093/icb/icac144] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/12/2022] [Revised: 08/08/2022] [Accepted: 08/14/2022] [Indexed: 01/05/2023]
Abstract
Understanding recent population trends is critical to quantifying species vulnerability and implementing effective management strategies. To evaluate the accuracy of genomic methods for quantifying recent declines (beginning <120 generations ago), we simulated genomic data using forward-time methods (SLiM) coupled with coalescent simulations (msprime) under a number of demographic scenarios. We evaluated both site frequency spectrum (SFS)-based methods (momi2, Stairway Plot) and methods that employ linkage disequilibrium information (NeEstimator, GONE) with a range of sampling schemes (contemporary-only samples, sampling two time points, and serial sampling) and data types (RAD-like data and whole-genome sequencing). GONE and momi2 performed best overall, with >80% power to detect severe declines with large sample sizes. Two-sample and serial sampling schemes could accurately reconstruct changes in population size, and serial sampling was particularly valuable for making accurate inferences when genotyping errors or minor allele frequency cutoffs distort the SFS or under model mis-specification. However, sampling only contemporary individuals provided reliable inferences about contemporary size and size change using either site frequency or linkage-based methods, especially when large sample sizes or whole genomes from contemporary populations were available. These findings provide a guide for researchers designing genomics studies to evaluate recent demographic declines.
Affiliation(s)
- Malin L Pinsky
  - Department of Ecology, Evolution, and Natural Resources, Rutgers University, New Brunswick, NJ 08901, USA
5
A decade of genetic monitoring reveals increased inbreeding for the Endangered western leopard toad, Sclerophrys pantherina. Conserv Genet 2022. [DOI: 10.1007/s10592-022-01463-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
6
Biddanda A, Steinrücken M, Novembre J. Properties of Two-Locus Genealogies and Linkage Disequilibrium in Temporally Structured Samples. Genetics 2022; 221:6549526. [PMID: 35294015 PMCID: PMC9245597 DOI: 10.1093/genetics/iyac038] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/10/2022] [Accepted: 02/06/2022] [Indexed: 11/13/2022]
Abstract
Archaeogenetics has been revolutionary, revealing insights into demographic history and recent positive selection. However, most studies to date have ignored the non-random association of genetic variants at different loci (i.e., linkage disequilibrium, LD). This may be in part because basic properties of LD in samples from different times are still not well understood. Here, we derive several results for summary statistics of haplotypic variation under a model with time-stratified sampling: 1) The correlation between the number of pairwise differences observed between time-staggered samples (πΔt) in models with and without strict population continuity; 2) The product of the LD coefficient, D, between ancient and modern samples, which is a measure of haplotypic similarity between modern and ancient samples; and 3) The expected switch rate in the Li and Stephens haplotype copying model. The latter has implications for genotype imputation and phasing in ancient samples with modern reference panels. Overall, these results provide a characterization of how haplotype patterns are affected by sample age, recombination rates, and population sizes. We expect these results will help guide the interpretation and analysis of haplotype data from ancient and modern samples.
Affiliation(s)
- Arjun Biddanda
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Matthias Steinrücken
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA
- John Novembre
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA
7
Good BH. Linkage disequilibrium between rare mutations. Genetics 2022; 220:6503502. [PMID: 35100407 PMCID: PMC8982034 DOI: 10.1093/genetics/iyac004] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/07/2021] [Accepted: 12/21/2021] [Indexed: 01/13/2023]
Abstract
The statistical associations between mutations, collectively known as linkage disequilibrium, encode important information about the evolutionary forces acting within a population. Yet in contrast to single-site analogues like the site frequency spectrum, our theoretical understanding of linkage disequilibrium remains limited. In particular, little is currently known about how mutations with different ages and fitness costs contribute to expected patterns of linkage disequilibrium, even in simple settings where recombination and genetic drift are the major evolutionary forces. Here, I introduce a forward-time framework for predicting linkage disequilibrium between pairs of neutral and deleterious mutations as a function of their present-day frequencies. I show that the dynamics of linkage disequilibrium become much simpler in the limit that mutations are rare, where they admit a simple heuristic picture based on the trajectories of the underlying lineages. I use this approach to derive analytical expressions for a family of frequency-weighted linkage disequilibrium statistics as a function of the recombination rate, the frequency scale, and the additive and epistatic fitness costs of the mutations. I find that the frequency scale can have a dramatic impact on the shapes of the resulting linkage disequilibrium curves, reflecting the broad range of time scales over which these correlations arise. I also show that the differences between neutral and deleterious linkage disequilibrium are not purely driven by differences in their mutation frequencies and can instead display qualitative features that are reminiscent of epistasis. I conclude by discussing the implications of these results for recent linkage disequilibrium measurements in bacteria. This forward-time approach may provide a useful framework for predicting linkage disequilibrium across a range of evolutionary scenarios.
Affiliation(s)
- Benjamin H Good
  - Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
  - Corresponding author: Department of Applied Physics, Stanford University, Clark Center, 318 Campus Drive, Stanford, CA 94305, USA
8
Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations. Sci Rep 2021; 11:17741. [PMID: 34493766 PMCID: PMC8423758 DOI: 10.1038/s41598-021-97129-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2020] [Accepted: 08/13/2021] [Indexed: 11/11/2022]
Abstract
Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify necessary informative loci whose removal from the data deteriorates the PCA structure. Unlike classical AIMs, necessary informative loci densely cover the genome, hence can illuminate the evolution and mixing history of populations. We conduct a comprehensive analysis to the genotype data of the 1000 Genomes Project using necessary informative loci. Projections along the top seven principal components demarcate populations at distinct geographic levels. Millions of necessary informative loci along each PC are identified. Population identities along each PC are approximately determined by weighted sums of minor (or major) alleles over the informative loci. Variations of allele frequencies are aligned with the history and direction of population evolution. The population distribution of projections along the top three PCs is recapitulated by a simple demographic model based on several waves of founder population separation and mixing. Informative loci possess locational concentration in the genome and functional enrichment. Genes at two hot spots encompassing dense PC 7 informative loci exhibit differential expressions among European populations. The mosaic of local ancestry in the genome of a mixed descendant from multiple populations can be inferred from partial PCA projections of informative loci. Finally, informative loci derived from the 1000 Genomes data well predict the projections of an independent genotype data of South Asians. These results demonstrate the utility and relevance of informative loci to investigate human evolution.
9
Genetic diversity and population structure of Ottelia alismoides (Hydrocharitaceae), a vulnerable plant in agro-ecosystems of Japan. Glob Ecol Conserv 2021. [DOI: 10.1016/j.gecco.2021.e01676] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
10
Zeng K, Charlesworth B, Hobolth A. Studying models of balancing selection using phase-type theory. Genetics 2021; 218:6237896. [PMID: 33871627 DOI: 10.1093/genetics/iyab055] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/01/2021] [Accepted: 03/25/2021] [Indexed: 11/15/2022]
Abstract
Balancing selection (BLS) is the evolutionary force that maintains high levels of genetic variability in many important genes. To further our understanding of its evolutionary significance, we analyze models with BLS acting on a biallelic locus: an equilibrium model with long-term BLS, a model with long-term BLS and recent changes in population size, and a model of recent BLS. Using phase-type theory, a mathematical tool for analyzing continuous time Markov chains with an absorbing state, we examine how BLS affects polymorphism patterns in linked neutral regions, as summarized by nucleotide diversity, the expected number of segregating sites, the site frequency spectrum, and the level of linkage disequilibrium (LD). Long-term BLS affects polymorphism patterns in a relatively small genomic neighborhood, and such selection targets are easier to detect when the equilibrium frequencies of the selected variants are close to 50%, or when there has been a population size reduction. For a new mutation subject to BLS, its initial increase in frequency in the population causes linked neutral regions to have reduced diversity, an excess of both high and low frequency derived variants, and elevated LD with the selected locus. These patterns are similar to those produced by selective sweeps, but the effects of recent BLS are weaker. Nonetheless, compared to selective sweeps, nonequilibrium polymorphism and LD patterns persist for a much longer period under recent BLS, which may increase the chance of detecting such selection targets. An R package for analyzing these models, among others (e.g., isolation with migration), is available.
Affiliation(s)
- Kai Zeng
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
- Brian Charlesworth
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, UK
- Asger Hobolth
  - Department of Mathematics, Aarhus University, Aarhus DK-8000, Denmark
11
Lucek K, Willi Y. Drivers of linkage disequilibrium across a species' geographic range. PLoS Genet 2021; 17:e1009477. [PMID: 33770075 PMCID: PMC8026057 DOI: 10.1371/journal.pgen.1009477] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/12/2020] [Revised: 04/07/2021] [Accepted: 03/09/2021] [Indexed: 11/25/2022]
Abstract
While linkage disequilibrium (LD) is an important parameter in genetics and evolutionary biology, the drivers of LD remain elusive. Using whole-genome sequences from across a species’ range, we assessed the impact of demographic history and mating system on LD. Both range expansion and a shift from outcrossing to selfing in North American Arabidopsis lyrata were associated with increased average genome-wide LD. Our results indicate that range expansion increases short-distance LD at the farthest range edges by about the same amount as a shift to selfing. However, the extent over which LD in genic regions unfolds was shorter for range expansion compared to selfing. Linkage among putatively neutral variants and between neutral and deleterious variants increased to a similar degree with range expansion, providing support that genome-wide LD was positively associated with mutational load. As a consequence, LD combined with mutational load may decelerate range expansions and set range limits. Finally, a small number of genes were identified as LD outliers, suggesting that they experience selection by either of the two demographic processes. These included genes involved in flowering and photoperiod for range expansion, and the self-incompatibility locus for mating system.
Nearby genomic variants are often co-inherited because of limited recombination. The extent of non-random association of alleles at different loci is called linkage disequilibrium (LD) and is commonly used in genomic analyses, for example to detect regions under selection or to determine effective population size. Here we reversed testing and addressed how demographic history may affect LD within a species. Using genomic data from more than a thousand individuals of North American Arabidopsis lyrata from across the entire species’ range, we quantified the effect of postglacial range expansion and a shift in mating system from outcrossing to selfing on LD. We show that both factors lead to increased LD, and that the maximal effect of range expansion is comparable with a shift in mating system to selfing. Heightened LD involves deleterious mutations, and therefore, LD can also serve as an indicator of mutation accumulation. Furthermore, we provide evidence that some genes experienced stronger increases in LD possibly due to selection associated with the two demographic changes. Our results provide a novel and broad view on the evolutionary factors shaping LD that may also apply to the very many species that underwent postglacial range expansion.
Affiliation(s)
- Kay Lucek
  - Department of Environmental Sciences, University of Basel, Basel, Switzerland
- Yvonne Willi
  - Department of Environmental Sciences, University of Basel, Basel, Switzerland
12
Ragsdale AP, Gravel S. Unbiased Estimation of Linkage Disequilibrium from Unphased Data. Mol Biol Evol 2020; 37:923-932. [PMID: 31697386 DOI: 10.1093/molbev/msz265] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/13/2022]
Abstract
Linkage disequilibrium (LD) is used to infer evolutionary history, to identify genomic regions under selection, and to dissect the relationship between genotype and phenotype. In each case, we require accurate estimates of LD statistics from sequencing data. Unphased data present a challenge because multilocus haplotypes cannot be inferred exactly. Widely used estimators for the common statistics r² and D² exhibit large and variable upward biases that complicate interpretation and comparison across cohorts. Here, we show how to find unbiased estimators for a wide range of two-locus statistics, including D², for both single and multiple randomly mating populations. These unbiased statistics are particularly well suited to estimate effective population sizes from unlinked loci in small populations. We develop a simple inference pipeline and use it to refine estimates of recent effective population sizes of the threatened Channel Island Fox populations.
Affiliation(s)
- Aaron P Ragsdale
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
- Simon Gravel
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
13
Ralph P, Thornton K, Kelleher J. Efficiently Summarizing Relationships in Large Samples: A General Duality Between Statistics of Genealogies and Genomes. Genetics 2020; 215:779-797. [PMID: 32357960 PMCID: PMC7337078 DOI: 10.1534/genetics.120.303253] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 09/23/2019] [Accepted: 04/28/2020] [Indexed: 12/11/2022]
Abstract
As a genetic mutation is passed down across generations, it distinguishes those genomes that have inherited it from those that have not, providing a glimpse of the genealogical tree relating the genomes to each other at that site. Statistical summaries of genetic variation therefore also describe the underlying genealogies. We use this correspondence to define a general framework that efficiently computes single-site population genetic statistics using the succinct tree sequence encoding of genealogies and genome sequence. The general approach accumulates sample weights within the genealogical tree at each position on the genome, which are then combined using a summary function; different statistics result from different choices of weight and function. Results can be reported in three ways: by site, which corresponds to statistics calculated as usual from genome sequence; by branch, which gives the expected value of the dual site statistic under the infinite sites model of mutation, and by node, which summarizes the contribution of each ancestor to these statistics. We use the framework to implement many currently defined statistics of genome sequence (making the statistics' relationship to the underlying genealogical trees concrete and explicit), as well as the corresponding branch statistics of tree shape. We evaluate computational performance using simulated data, and show that calculating statistics from tree sequences using this general framework is several orders of magnitude more efficient than optimized matrix-based methods in terms of both run time and memory requirements. We also explore how well the duality between site and branch statistics holds in practice on trees inferred from the 1000 Genomes Project data set, and discuss ways in which deviations may encode interesting biological signals.
Affiliation(s)
- Peter Ralph
  - Institute of Evolution and Ecology, Departments of Mathematics and Biology, University of Oregon, Eugene, Oregon 97405
- Kevin Thornton
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, United Kingdom OX3 7LF
14
Osmond MM, Coop G. Genetic Signatures of Evolutionary Rescue by a Selective Sweep. Genetics 2020; 215:813-829. [PMID: 32398227 PMCID: PMC7337082 DOI: 10.1534/genetics.120.303173] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/08/2020] [Accepted: 05/06/2020] [Indexed: 12/31/2022]
Abstract
One of the most useful models in population genetics is that of a selective sweep and the consequent hitch-hiking of linked neutral alleles. While variations on this model typically assume constant population size, many instances of strong selection and rapid adaptation in nature may co-occur with complex demography. Here, we extend the hitch-hiking model to evolutionary rescue, where adaptation and demography not only co-occur but are intimately entwined. Our results show how this feedback between demography and evolution determines, and restricts, the genetic signatures of evolutionary rescue, and how these differ from the signatures of sweeps in populations of constant size. In particular, we find rescue to harden sweeps from standing variance or new mutation (but not from migration), reduce genetic diversity both at the selected site and genome-wide, and increase the range of observed Tajima's D values. For a given initial rate of population decline, the feedback between demography and evolution makes all of these differences more dramatic under weaker selection, where bottlenecks are prolonged. Nevertheless, it is likely difficult to infer the co-incident timing of the sweep and bottleneck from these simple signatures, never mind a feedback between them. Temporal samples spanning contemporary rescue events may offer one way forward.
Affiliation(s)
- Matthew M Osmond
  - Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, California 95616
- Graham Coop
  - Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, California 95616
15
Kang JTL, Rosenberg NA. Mathematical Properties of Linkage Disequilibrium Statistics Defined by Normalization of the Coefficient D = p_AB - p_A p_B. Hum Hered 2020; 84:127-143. [PMID: 32045910 DOI: 10.1159/000504171] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/16/2019] [Accepted: 10/10/2019] [Indexed: 11/19/2022]
Abstract
BACKGROUND: Many statistics for measuring linkage disequilibrium (LD) take the form of a normalization of the LD coefficient D. Different normalizations produce statistics with different ranges, interpretations, and arguments favoring their use. METHODS: Here, to compare the mathematical properties of these normalizations, we consider 5 of these normalized statistics, describing their upper bounds, the mean values of their maxima over the set of possible allele frequency pairs, and the size of the allele frequency regions accessible given specified values of the statistics. RESULTS: We produce detailed characterizations of these properties for the statistics d and ρ, analogous to computations previously performed for r². We examine the relationships among the statistics, uncovering conditions under which some of them have close connections. CONCLUSION: The results contribute insight into LD measurement, particularly the understanding of differences in the features of different LD measures when computed on the same data.
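As a rough illustration of what "normalizing D" means, the sketch below computes the coefficient D = p_AB - p_A p_B together with two widely used normalizations, Lewontin's D' and r². It is a minimal example under stated assumptions, not the method of the cited paper: the function name `ld_statistics` and its argument names are hypothetical, the two normalizations shown are standard textbook choices and not necessarily among the five statistics the paper analyzes, and D' is reported with the sign of D (conventions differ on this).

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    (Hypothetical helper for illustration only.)
    """
    # Raw LD coefficient: excess of the AB haplotype over independence.
    d = p_ab - p_a * p_b

    # Lewontin's D': divide D by the largest magnitude it could take
    # given the allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the allelic states at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0

    return d, d_prime, r2


if __name__ == "__main__":
    # Alleles A and B always occur together: complete LD.
    print(ld_statistics(0.5, 0.5, 0.5))
    # Haplotype frequency equals the product of allele frequencies: no LD.
    print(ld_statistics(0.25, 0.5, 0.5))
```

The two normalizations answer different questions: D' asks how close D is to its frequency-constrained maximum, while r² measures how well one locus predicts the other, which is one reason the choice of normalization affects interpretation.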
Affiliation(s)
- Jonathan T L Kang
  - Department of Biology, Stanford University, Stanford, California, USA
- Noah A Rosenberg
  - Department of Biology, Stanford University, Stanford, California, USA
16
Barroso GV, Puzović N, Dutheil JY. Inference of recombination maps from a single pair of genomes and its application to ancient samples. PLoS Genet 2019; 15:e1008449. [PMID: 31725722 PMCID: PMC6879166 DOI: 10.1371/journal.pgen.1008449] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/17/2019] [Revised: 11/26/2019] [Accepted: 09/30/2019] [Indexed: 12/11/2022]
Abstract
Understanding the causes and consequences of recombination landscape evolution is a fundamental goal in genetics that requires recombination maps from across the tree of life. Such maps can be obtained from population genomic datasets, but require large sample sizes. Alternative methods are therefore necessary to research organisms where such datasets cannot be generated easily, such as non-model or ancient species. Here we extend the sequentially Markovian coalescent model to jointly infer demography and the spatial variation in recombination rate. Using extensive simulations and sequence data from humans, fruit flies and a fungal pathogen, we demonstrate that iSMC accurately infers recombination maps under a wide range of scenarios, remarkably even from a single pair of unphased genomes. We exploit this possibility and reconstruct the recombination maps of ancient hominins. We report that the ancient and modern maps are correlated in a manner that reflects the established phylogeny of Neanderthals, Denisovans, and modern human populations.
Affiliation(s)
- Gustavo V. Barroso
  - Max Planck Institute for Evolutionary Biology, Department of Evolutionary Genetics, August-Thienemann-Straße, Plön, Germany
- Nataša Puzović
  - Max Planck Institute for Evolutionary Biology, Department of Evolutionary Genetics, August-Thienemann-Straße, Plön, Germany
- Julien Y. Dutheil
  - Max Planck Institute for Evolutionary Biology, Department of Evolutionary Genetics, August-Thienemann-Straße, Plön, Germany
|
17
|
Ragsdale AP, Gravel S. Models of archaic admixture and recent history from two-locus statistics. PLoS Genet 2019; 15:e1008204. [PMID: 31181058 PMCID: PMC6586359 DOI: 10.1371/journal.pgen.1008204] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 01/08/2019] [Revised: 06/20/2019] [Accepted: 05/17/2019] [Indexed: 11/18/2022] Open
Abstract
We learn about population history and underlying evolutionary biology through patterns of genetic polymorphism. Many approaches to reconstruct evolutionary histories focus on a limited number of informative statistics describing distributions of allele frequencies or patterns of linkage disequilibrium. We show that many commonly used statistics are part of a broad family of two-locus moments whose expectation can be computed jointly and rapidly under a wide range of scenarios, including complex multi-population demographies with continuous migration and admixture events. A full inspection of these statistics reveals that widely used models of human history fail to predict simple patterns of linkage disequilibrium. To jointly capture the information contained in classical and novel statistics, we implemented a tractable likelihood-based inference framework for demographic history. Using this approach, we show that human evolutionary models that include archaic admixture in Africa, Asia, and Europe provide a much better description of patterns of genetic diversity across the human genome. We estimate that an unidentified, deeply diverged population admixed with modern humans within Africa both before and after the split of African and Eurasian populations, contributing 4 - 8% genetic ancestry to individuals in world-wide populations.
Affiliation(s)
- Aaron P Ragsdale
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
- Simon Gravel
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
|
18
|
Ralph PL. An empirical approach to demographic inference with genomic data. Theor Popul Biol 2019; 127:91-101. [PMID: 30978307 DOI: 10.1016/j.tpb.2019.03.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/10/2018] [Revised: 03/21/2019] [Accepted: 03/27/2019] [Indexed: 01/20/2023]
Abstract
Inference with population genetic data usually treats the population pedigree as a nuisance parameter, the unobserved product of a past history of random mating. However, the history of genetic relationships in a given population is a fixed, unobserved object, and so an alternative approach is to treat this network of relationships as a complex object we wish to learn about, by observing how genomes have been noisily passed down through it. This paper explores this point of view, showing how to translate questions about population genetic data into calculations with a Poisson process of mutations on all ancestral genomes. This method is applied to give a robust interpretation to the f4 statistic used to identify admixture, and to design a new statistic that measures covariances in mean times to most recent common ancestor between two pairs of sequences. The method more generally interprets population genetic statistics in terms of sums of specific functions over ancestral genomes, thereby providing concrete, broadly interpretable interpretations for these statistics. This provides a method for describing demographic history without simplified demographic models. More generally, it brings into focus the population pedigree, which is averaged over in model-based demographic inference.
Affiliation(s)
- Peter L Ralph
  - Institute of Ecology and Evolution, Departments of Mathematics and Biology, University of Oregon, Eugene, OR, USA
|
19
|
García-Cortés LA, Austerlitz F, de Cara MAR. An evaluation of the methods to estimate effective population size from measures of linkage disequilibrium. J Evol Biol 2018; 32:267-277. [PMID: 30589978 DOI: 10.1111/jeb.13411] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/29/2018] [Revised: 12/05/2018] [Accepted: 12/06/2018] [Indexed: 11/28/2022]
Abstract
In 1971, John Sved derived an approximate relationship between linkage disequilibrium (LD) and effective population size for an ideal finite population. This seminal work was extended by Sved and Feldman (Theor Pop Biol 4, 129, 1973) and Weir and Hill (Genetics 95, 477, 1980) who derived additional equations with the same purpose. These equations yield useful estimates of effective population size, as they require a single sample in time. As these estimates of effective population size are now commonly used on a variety of genomic data, from arrays of single nucleotide polymorphisms to whole genome data, some authors have investigated their bias through simulation studies and proposed corrections for different mating systems. However, the cause of the bias remains elusive. Here, we show the problems of using LD as a statistical measure and, analogously, the problems in estimating effective population size from such measure. For that purpose, we compare three commonly used approaches with a transition probability-based method that we develop here. It provides an exact computation of LD. We show here that the bias in the estimates of LD and effective population size are partly due to low-frequency markers, tightly linked markers or to a small total number of crossovers per generation. These biases, however, do not decrease when increasing sample size or using unlinked markers. Our results show the issues of such measures of effective population based on LD and suggest which of the method here studied should be used in empirical studies as well as the optimal distance between markers for such estimates.
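As a rough illustration of the kind of relationship the abstract refers to, the sketch below uses the commonly quoted form of Sved's approximation, E[r²] ≈ 1/(1 + 4·Ne·c), where c is the recombination fraction between two markers, and inverts it to estimate Ne. This is an assumption about which form is meant; it omits sample-size corrections and is not the transition-probability method developed in the paper. The function names are hypothetical.

```python
def expected_r2_sved(ne, c):
    """Expected r^2 under the approximation E[r^2] = 1 / (1 + 4 * Ne * c)."""
    return 1.0 / (1.0 + 4.0 * ne * c)


def ne_from_r2(r2, c):
    """Invert the approximation to estimate Ne from an observed mean r^2.

    r2 must lie strictly between 0 and 1, and c must be positive.
    No correction for finite sample size is applied (an assumption of this sketch).
    """
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must be in (0, 1)")
    if c <= 0.0:
        raise ValueError("c must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c)


# Round trip: the expected r^2 for Ne = 1000 and c = 0.01 maps back to Ne = 1000
r2 = expected_r2_sved(ne=1000, c=0.01)
print(r2, ne_from_r2(r2, c=0.01))
```

Because the estimate depends on 1/r², small errors in r² at low values (for example from low-frequency or tightly linked markers) translate into large errors in Ne, which is consistent with the sensitivity the abstract describes.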
Affiliation(s)
- Frederic Austerlitz
  - Laboratoire d'Eco-anthropologie et Ethnobiologie, UMR 7206 CNRS/MNHN/Universite Paris 7, Museum National d'Histoire Naturelle, Paris, France
- M Angeles R de Cara
  - Laboratoire d'Eco-anthropologie et Ethnobiologie, UMR 7206 CNRS/MNHN/Universite Paris 7, Museum National d'Histoire Naturelle, Paris, France
|
20
|
Bertl J, Ringbauer H, Blum MG. Can secondary contact following range expansion be distinguished from barriers to gene flow? PeerJ 2018; 6:e5325. [PMID: 30294507 PMCID: PMC6171497 DOI: 10.7717/peerj.5325] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/29/2016] [Accepted: 07/01/2018] [Indexed: 11/20/2022] Open
Abstract
Secondary contact is the reestablishment of gene flow between sister populations that have diverged. For instance, at the end of the Quaternary glaciations in Europe, secondary contact occurred during the northward expansion of the populations which had found refugia in the southern peninsulas. With the advent of multi-locus markers, secondary contact can be investigated using various molecular signatures including gradients of allele frequency, admixture clines, and local increase of genetic differentiation. We use coalescent simulations to investigate if molecular data provide enough information to distinguish between secondary contact following range expansion and an alternative evolutionary scenario consisting of a barrier to gene flow in an isolation-by-distance model. We find that an excess of linkage disequilibrium and of genetic diversity at the suture zone is a unique signature of secondary contact. We also find that the directionality index ψ, which was proposed to study range expansion, is informative to distinguish between the two hypotheses. However, although evidence for secondary contact is usually conveyed by statistics related to admixture coefficients, we find that they can be confounded by isolation-by-distance. We recommend to account for the spatial repartition of individuals when investigating secondary contact in order to better reflect the complex spatio-temporal evolution of populations and species.
Affiliation(s)
- Johanna Bertl
  - Department of Molecular Medicine, Aarhus University, Aarhus, Denmark
  - Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Harald Ringbauer
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Institute of Science and Technology Austria, Klosterneuburg, Austria
- Michael G.B. Blum
  - Laboratoire TIMC-IMAG, UMR 5525, Université Grenoble Alpes, CNRS, Grenoble, France
|
21
|
Coalescence and Linkage Disequilibrium in Facultatively Sexual Diploids. Genetics 2018; 210:683-701. [PMID: 30097538 DOI: 10.1534/genetics.118.301244] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/11/2018] [Accepted: 08/10/2018] [Indexed: 01/26/2023] Open
Abstract
Under neutrality, linkage disequilibrium results from physically linked sites having nonindependent coalescent histories. In obligately sexual organisms, meiotic recombination is the dominant force separating linked variants from one another, and thus in determining the decay of linkage disequilibrium with physical distance. In facultatively sexual diploid organisms that principally reproduce clonally, mechanisms of mitotic exchange are expected to become relatively more important in shaping linkage disequilibrium. Here we outline mathematical and computational models of a facultative-sex coalescent process that includes meiotic and mitotic recombination, via both crossovers and gene conversion, to determine how linkage disequilibrium is affected with facultative sex. We demonstrate that the degree to which linkage disequilibrium is broken down by meiotic recombination simply scales with the probability of sex if it is sufficiently high (much greater than [Formula: see text] for population size N). However, with very rare sex (occurring with frequency on the order of [Formula: see text]), mitotic gene conversion plays a particularly important and complicated role because it both breaks down associations between sites and removes within-individual diversity. Strong population structure under rare sex leads to lower average linkage disequilibrium values than in panmictic populations, due to the influence of low-frequency polymorphisms created by allelic sequence divergence acting in individual subpopulations. These analyses provide information on how to interpret observed linkage disequilibrium patterns in facultative sexuals and to determine what genomic forces are likely to shape them.
|
22
|
Zhang C, Sun M, Zhang X, Chen S, Nie G, Peng Y, Huang L, Ma X. AFLP-based genetic diversity of wild orchardgrass germplasm collections from Central Asia and Western China, and the relation to environmental factors. PLoS One 2018; 13:e0195273. [PMID: 29641553 PMCID: PMC5894997 DOI: 10.1371/journal.pone.0195273] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/18/2017] [Accepted: 03/19/2018] [Indexed: 12/31/2022] Open
Abstract
Dactylis glomerata L. (orchardgrass) is an important perennial forage species in temperate areas of the world. It is usually used for silage, grazing and hay because of its high nutritional value and reproducibility. Central Asia, Xinjiang and the Tibetan Plateau in China possess various special micro-environments that harbor many valuable resources, while different degrees of degradation of the grassland ecosystem have occurred due to climate change and human activities. Investigating the genetic diversity of wild D. glomerata could provide a basis for the collection, protection, and utilization of excellent germplasm resources. In total, 210 individuals from 14 populations (five from Xinjiang, two from Kangding on the Tibetan Plateau, and seven from Central Asia) were analyzed using AFLP technology. The average values of Nei’s genetic diversity (Hj) and Shannon information index (Ho) were 0.383 and 0.394, respectively. The UPGMA tree, STRUCTURE analysis and principal coordinate analysis (PCoA) showed that populations from the same region clustered together. AMOVA revealed that 35.10% of the genetic differentiation (Fst) occurred among populations. Gene flow (Nm) was limited among all populations. Genetic diversity of D. glomerata was high but limited under an isolation-by-distance pattern, resulting in high genetic differentiation and low gene flow among populations. Adjacent regions also exhibited similar results because of the barriers of high mountains. Environmental factors, such as precipitation, elevation, latitude and longitude, also had some impact on the genetic diversity and structure pattern of populations.
Affiliation(s)
- Chenglin Zhang
  - Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu, China
- Ming Sun
  - Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu, China
- Xinquan Zhang
  - Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu, China
- Shiyong Chen
  - College of Life Science and Technology, Southwest University for Nationalities, Chengdu, China
- Gang Nie
  - Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu, China
- Yan Peng
  - Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu, China
- Linkai Huang
  - Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu, China
- Xiao Ma
  - Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu, China
|
23
|
Hamilton MB, Tartakovsky M, Battocletti A. speed-ne: Software to simulate and estimate genetic effective population size (Ne) from linkage disequilibrium observed in single samples. Mol Ecol Resour 2018; 18:714-728. [DOI: 10.1111/1755-0998.12759] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/18/2017] [Revised: 01/09/2018] [Accepted: 01/19/2018] [Indexed: 01/25/2023]
Affiliation(s)
- Amy Battocletti
  - Department of Biology, Regents Hall, Georgetown University, Washington, DC, USA
|
24
|
Durden C, Sullivant S. Identifiability of Phylogenetic Parameters from k-mer Data Under the Coalescent. Bull Math Biol 2018; 81:431-451. [PMID: 29392644 DOI: 10.1007/s11538-018-0399-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2017] [Accepted: 01/19/2018] [Indexed: 11/30/2022]
Abstract
Distances between sequences based on their k-mer frequency counts can be used to reconstruct phylogenies without first computing a sequence alignment. Past work has shown that effective use of k-mer methods depends on (1) model-based corrections to distances based on k-mers and (2) breaking long sequences into blocks to obtain repeated trials from the sequence-generating process. Good performance of such methods is based on having many high-quality blocks with many homologous sites, which can be problematic to guarantee a priori. Nature provides natural blocks of sequences into homologous regions, namely the genes. However, directly using past work in this setting is problematic because of possible discordance between different gene trees and the underlying species tree. Using the multispecies coalescent model as a basis, we derive model-based moment formulas that involve the species divergence times and the coalescent parameters. From this setting, we prove identifiability results for the tree and branch length parameters under the Jukes-Cantor model of sequence mutations.
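For readers unfamiliar with alignment-free comparison, the following sketch shows the basic ingredient the abstract starts from: counting k-mers in two sequences and taking the squared Euclidean distance between the count vectors. It is an uncorrected distance; the model-based corrections and coalescent moment formulas derived in the paper are not implemented here, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_squared_distance(seq1, seq2, k):
    """Squared Euclidean distance between the k-mer count vectors of two sequences.

    No alignment is needed; k-mers absent from one sequence count as zero.
    """
    c1, c2 = kmer_counts(seq1, k), kmer_counts(seq2, k)
    return sum((c1[kmer] - c2[kmer]) ** 2 for kmer in set(c1) | set(c2))


print(kmer_squared_distance("ACGTACGT", "ACGTACGA", k=2))
```

In a block-based analysis, one such distance would be computed per gene and the values combined across genes, which is where gene-tree discordance becomes relevant.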
Affiliation(s)
- Chris Durden
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
- Seth Sullivant
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
|
25
|
Dapper AL, Payseur BA. Effects of Demographic History on the Detection of Recombination Hotspots from Linkage Disequilibrium. Mol Biol Evol 2018; 35:335-353. [PMID: 29045724 PMCID: PMC5850621 DOI: 10.1093/molbev/msx272] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Indexed: 12/28/2022] Open
Abstract
In some species, meiotic recombination is concentrated in small genomic regions. These "recombination hotspots" leave signatures in fine-scale patterns of linkage disequilibrium, raising the prospect that the genomic landscape of hotspots can be characterized from sequence variation. This approach has led to the inference that hotspots evolve rapidly in some species, but are conserved in others. Historic demographic events, such as population bottlenecks, are known to affect patterns of linkage disequilibrium across the genome, violating population genetic assumptions of this approach. Although such events are prevalent, demographic history is generally ignored when making inferences about the evolution of recombination hotspots. To determine the effect of demography on the detection of recombination hotspots, we use the coalescent to simulate haplotypes with a known recombination landscape. We measure the ability of popular linkage disequilibrium-based programs to detect hotspots across a range of demographic histories, including population bottlenecks, hidden population structure, population expansions, and population contractions. We find that demographic events have the potential to greatly reduce the power and increase the false positive rate of hotspot discovery. Neither the power nor the false positive rate of hotspot detection can be predicted without also knowing the demographic history of the sample. Our results suggest that ignoring demographic history likely overestimates the power to detect hotspots and therefore underestimates the degree of hotspot sharing between species. We suggest strategies for incorporating demographic history into population genetic inferences about recombination hotspots.
Affiliation(s)
- Amy L Dapper
  - Laboratory of Genetics, University of Wisconsin, Madison, WI
- Bret A Payseur
  - Laboratory of Genetics, University of Wisconsin, Madison, WI
|
26
|
Fan Y, Zhang C, Wu W, He W, Zhang L, Ma X. Analysis of Genetic Diversity and Structure Pattern of Indigofera Pseudotinctoria in Karst Habitats of the Wushan Mountains Using AFLP Markers. Molecules 2017; 22:molecules22101734. [PMID: 29035322 PMCID: PMC6151804 DOI: 10.3390/molecules22101734] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/31/2017] [Revised: 10/09/2017] [Accepted: 10/09/2017] [Indexed: 11/16/2022] Open
Abstract
Indigofera pseudotinctoria Mats is an agronomically and economically important perennial legume shrub with high forage yield, high protein content and strong adaptability, which is subject to natural habitat fragmentation and serious human disturbance. Until now, our knowledge of the genetic relationships and intraspecific genetic diversity of its wild collections has been poor, especially at small spatial scales. Here amplified fragment length polymorphism (AFLP) technology was employed to analyze the genetic diversity, differentiation, and structure of 364 genotypes of I. pseudotinctoria from 15 natural locations in Wushan Mountain, a highly structured mountain with typical karst landforms in Southwest China. We also tested whether eco-climate factors have affected genetic structure by correlating genetic diversity with habitat features. A total of 515 distinctly scoreable bands were generated, and 324 of them were polymorphic. The polymorphic information content (PIC) ranged from 0.694 to 0.890 with an average of 0.789 per primer pair. At the species level, Nei’s gene diversity (Hj), the Bayesian genetic diversity index (HB) and the Shannon information index (I) were 0.2465, 0.2363 and 0.3772, respectively. High differentiation among all sampling sites was detected (FST = 0.2217, GST = 0.1746, G’ST = 0.2060, θB = 0.1844), and gene flow among accessions (Nm = 1.1819) was restricted. The population genetic structure resolved by the UPGMA tree, principal coordinate analysis, and Bayesian-based cluster analyses irrefutably grouped all accessions into two distinct clusters, i.e., lowland and highland groups. This structure pattern may indicate joint effects of neutral evolution and natural selection. Restricted Nm was observed across all accessions, and genetic barriers were detected between adjacent accessions due to the specific geographical landforms.
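The abstract does not say how Nm was obtained. A common convention, assumed here purely for illustration, is Wright's island-model approximation Nm ≈ (1/F − 1)/4, where F is a differentiation measure such as FST or GST. Applying it to the reported GST of 0.1746 gives a value close to the reported Nm, although that agreement does not confirm the authors' procedure. The function name is hypothetical.

```python
def nm_from_differentiation(f):
    """Estimate gene flow Nm from a differentiation measure F (e.g. GST or FST).

    Uses the island-model approximation Nm = (1 / F - 1) / 4.
    Assumption of this sketch: the populations are near migration-drift equilibrium.
    """
    if not 0.0 < f < 1.0:
        raise ValueError("f must be in (0, 1)")
    return (1.0 / f - 1.0) / 4.0


# GST value reported in the abstract above
print(round(nm_from_differentiation(0.1746), 3))
```

The relationship is steeply nonlinear at low F, so small differences between differentiation measures (FST versus GST here) produce noticeably different Nm values.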
Affiliation(s)
- Yan Fan
  - Chongqing Academy of Animal Husbandry, Chongqing 400039, China
- Chenglin Zhang
  - Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu 611130, China
- Wendan Wu
  - Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu 611130, China
- Wei He
  - Chongqing Academy of Animal Husbandry, Chongqing 400039, China
- Li Zhang
  - Chongqing Academy of Animal Husbandry, Chongqing 400039, China
- Xiao Ma
  - Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu 611130, China
|
27
|
A non-zero variance of Tajima's estimator for two sequences even for infinitely many unlinked loci. Theor Popul Biol 2017; 122:22-29. [PMID: 28341209 DOI: 10.1016/j.tpb.2017.03.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 08/16/2016] [Revised: 02/12/2017] [Accepted: 03/03/2017] [Indexed: 10/19/2022]
Abstract
The population-scaled mutation rate, θ, is informative on the effective population size and is thus widely used in population genetics. We show that for two sequences and n unlinked loci, the variance of Tajima's estimator ($\hat{\theta}$), which is the average number of pairwise differences, does not vanish even as n → ∞. The non-zero variance of $\hat{\theta}$ results from a (weak) correlation between coalescence times even at unlinked loci, which, in turn, is due to the underlying fixed pedigree shared by gene genealogies at all loci. We derive the correlation coefficient under a diploid, discrete-time, Wright-Fisher model, and we also derive a simple, closed-form lower bound. We also obtain empirical estimates of the correlation of coalescence times under demographic models inspired by large-scale human genealogies. While the effect we describe is small ($\mathrm{Var}[\hat{\theta}]/\theta^2 \approx O(N_e^{-1})$), it is important to recognize this feature of statistical population genetics, which runs counter to commonly held notions about unlinked loci.
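As a small illustration of the estimator discussed in this abstract, the sketch below computes the average number of pairwise differences between two sequences across a set of loci. The data layout (a list of sequence pairs, one pair per locus) and the function names are assumptions made for the example; the pedigree-induced correlation analysed in the paper is not modelled.

```python
def pairwise_differences(seq1, seq2):
    """Number of sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq1, seq2))


def tajima_theta_two_sequences(loci):
    """Average number of pairwise differences per locus for a sample of two sequences.

    loci: list of (seq1, seq2) tuples, one tuple per unlinked locus.
    """
    if not loci:
        raise ValueError("at least one locus is required")
    return sum(pairwise_differences(a, b) for a, b in loci) / len(loci)


loci = [("ACGT", "ACGA"), ("GGCC", "GGCC"), ("TTAA", "TAAA")]
print(tajima_theta_two_sequences(loci))
```

If the per-locus counts were independent, the variance of this average would shrink to zero as the number of loci grows; the abstract's point is that the shared pedigree leaves a small residual variance.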
|
28
|
Charlesworth B, Charlesworth D. Population genetics from 1966 to 2016. Heredity (Edinb) 2016; 118:2-9. [PMID: 27460498 PMCID: PMC5176116 DOI: 10.1038/hdy.2016.55] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 02/29/2016] [Revised: 06/08/2016] [Accepted: 06/20/2016] [Indexed: 11/09/2022] Open
Abstract
We describe the astonishing changes and progress that have occurred in the field of population genetics over the past 50 years, slightly longer than the time since the first Population Genetics Group (PGG) meeting in January 1968. We review the major questions and controversies that have preoccupied population geneticists during this time (and were often hotly debated at PGG meetings). We show how theoretical and empirical work has combined to generate a highly productive interaction involving successive developments in the ability to characterise variability at the molecular level, to apply mathematical models to the interpretation of the data and to use the results to answer biologically important questions, even in nonmodel organisms. We also describe the changes from a field that was largely dominated by UK and North American biologists to a much more international one (with the PGG meetings having made important contributions to the increased number of population geneticists in several European countries). Although we concentrate on the earlier history of the field, because developments in recent years are more familiar to most contemporary researchers, we end with a brief outline of topics in which new understanding is still actively developing.
Affiliation(s)
- B Charlesworth
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- D Charlesworth
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
|
29
|
Kamm JA, Spence JP, Chan J, Song YS. Two-Locus Likelihoods Under Variable Population Size and Fine-Scale Recombination Rate Estimation. Genetics 2016; 203:1381-99. [PMID: 27182948 PMCID: PMC4937484 DOI: 10.1534/genetics.115.184820] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 11/13/2015] [Accepted: 05/06/2016] [Indexed: 01/06/2023] Open
Abstract
Two-locus sampling probabilities have played a central role in devising an efficient composite-likelihood method for estimating fine-scale recombination rates. Due to mathematical and computational challenges, these sampling probabilities are typically computed under the unrealistic assumption of a constant population size, and simulation studies have shown that resulting recombination rate estimates can be severely biased in certain cases of historical population size changes. To alleviate this problem, we develop here new methods to compute the sampling probability for variable population size functions that are piecewise constant. Our main theoretical result, implemented in a new software package called LDpop, is a novel formula for the sampling probability that can be evaluated by numerically exponentiating a large but sparse matrix. This formula can handle moderate sample sizes ([Formula: see text]) and demographic size histories with a large number of epochs ([Formula: see text]). In addition, LDpop implements an approximate formula for the sampling probability that is reasonably accurate and scales to hundreds in sample size ([Formula: see text]). Finally, LDpop includes an importance sampler for the posterior distribution of two-locus genealogies, based on a new result for the optimal proposal distribution in the variable-size setting. Using our methods, we study how a sharp population bottleneck followed by rapid growth affects the correlation between partially linked sites. Then, through an extensive simulation study, we show that accounting for population size changes under such a demographic model leads to substantial improvements in fine-scale recombination rate estimation.
Affiliation(s)
- John A Kamm
  - Department of Statistics, University of California, Berkeley, California 94720
  - Computer Science Division, University of California, Berkeley, California 94720
- Jeffrey P Spence
  - Computational Biology Graduate Group, University of California, Berkeley, California 94720
- Jeffrey Chan
  - Computer Science Division, University of California, Berkeley, California 94720
- Yun S Song
  - Department of Statistics, University of California, Berkeley, California 94720
  - Computer Science Division, University of California, Berkeley, California 94720
  - Department of Integrative Biology, University of California, Berkeley, California 94720
  - Departments of Mathematics and Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
|
30
|
Rafajlović M, Emanuelsson A, Johannesson K, Butlin RK, Mehlig B. A universal mechanism generating clusters of differentiated loci during divergence-with-migration. Evolution 2016; 70:1609-21. [PMID: 27196373 PMCID: PMC5089645 DOI: 10.1111/evo.12957] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 01/05/2016] [Revised: 05/04/2016] [Accepted: 05/06/2016] [Indexed: 02/02/2023]
Abstract
Genome‐wide patterns of genetic divergence reveal mechanisms of adaptation under gene flow. Empirical data show that divergence is mostly concentrated in narrow genomic regions. This pattern may arise because differentiated loci protect nearby mutations from gene flow, but recent theory suggests this mechanism is insufficient to explain the emergence of concentrated differentiation during biologically realistic timescales. Critically, earlier theory neglects an inevitable consequence of genetic drift: stochastic loss of local genomic divergence. Here, we demonstrate that the rate of stochastic loss of weak local differentiation increases with recombination distance to a strongly diverged locus and, above a critical recombination distance, local loss is faster than local “gain” of new differentiation. Under high migration and weak selection, this critical recombination distance is much smaller than the total recombination distance of the genomic region under selection. Consequently, divergence between populations increases by net gain of new differentiation within the critical recombination distance, resulting in tightly linked clusters of divergence. The mechanism responsible is the balance between stochastic loss and gain of weak local differentiation, a mechanism acting universally throughout the genome. Our results will help to explain empirical observations and lead to novel predictions regarding changes in genomic architectures during adaptive divergence.
Affiliation(s)
- Marina Rafajlović
- Department of Physics, University of Gothenburg, SE-412 96, Gothenburg, Sweden; The Linnaeus Centre for Marine Evolutionary Biology, University of Gothenburg, SE-405 30, Gothenburg, Sweden
| | - Anna Emanuelsson
- Department of Physics, University of Gothenburg, SE-412 96, Gothenburg, Sweden
| | - Kerstin Johannesson
- The Linnaeus Centre for Marine Evolutionary Biology, University of Gothenburg, SE-405 30, Gothenburg, Sweden; Department of Marine Sciences-Tjärnö, University of Gothenburg, SE-452 96, Strömstad, Sweden
| | - Roger K Butlin
- The Linnaeus Centre for Marine Evolutionary Biology, University of Gothenburg, SE-405 30, Gothenburg, Sweden; Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, United Kingdom
| | - Bernhard Mehlig
- Department of Physics, University of Gothenburg, SE-412 96, Gothenburg, Sweden; The Linnaeus Centre for Marine Evolutionary Biology, University of Gothenburg, SE-405 30, Gothenburg, Sweden
| |
|
31
|
Ormond L, Foll M, Ewing GB, Pfeifer SP, Jensen JD. Inferring the age of a fixed beneficial allele. Mol Ecol 2016; 25:157-69. [PMID: 26576754 DOI: 10.1111/mec.13478] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 05/08/2015] [Revised: 10/14/2015] [Accepted: 11/09/2015] [Indexed: 12/28/2022]
Abstract
Estimating the age and strength of beneficial alleles is central to understanding how adaptation proceeds in response to changing environmental conditions. Several haplotype-based estimators exist for inferring the age of segregating beneficial mutations. Here, we develop an approximate Bayesian approach that instead estimates these parameters for fixed beneficial mutations in single populations. We integrate a range of existing diversity, site frequency spectrum, haplotype- and linkage disequilibrium-based summary statistics. We show that for strong selective sweeps on de novo mutations the method can estimate allele age and selection strength even in nonequilibrium demographic scenarios. We extend our approach to models of selection on standing variation, and co-infer the frequency at which selection began to act upon the mutation. Finally, we apply our method to estimate the age and selection strength of a previously identified mutation underpinning cryptic colour adaptation in a wild deer mouse population, and compare our findings with previously published estimates as well as with geological data pertaining to the presumed shift in selective pressure.
Affiliation(s)
- Louise Ormond
- School of Life Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Matthieu Foll
- School of Life Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- International Agency for Research on Cancer (IARC), Lyon, France
| | - Gregory B Ewing
- School of Life Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Susanne P Pfeifer
- School of Life Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Jeffrey D Jensen
- School of Life Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| |
|
32
|
Saura M, Tenesa A, Woolliams JA, Fernández A, Villanueva B. Evaluation of the linkage-disequilibrium method for the estimation of effective population size when generations overlap: an empirical case. BMC Genomics 2015; 16:922. [PMID: 26559809 PMCID: PMC4642667 DOI: 10.1186/s12864-015-2167-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 03/26/2015] [Accepted: 10/29/2015] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND: Within the genetic methods for estimating effective population size (Ne), the method based on linkage disequilibrium (LD) has advantages over other methods, although its accuracy when applied to populations with overlapping generations is a matter of controversy. The best way to account for mutation and sample size when this method is implemented is also unclear. Here we have addressed the applicability of this method using genome-wide information when generations overlap by taking advantage of a complete and accurate pedigree from an experimental population of Iberian pigs. Precise pedigree-based estimates of Ne were considered as a baseline against which to compare LD-based estimates. METHODS: We assumed six different statistical models that varied in the adjustments made for mutation and sample size. The approach allowed us to determine the most suitable statistical model of adjustment when the LD method is used for species with overlapping generations. A novel approach used here was to treat different generations as replicates of the same population in order to assess the error of the LD-based Ne estimates. RESULTS: LD-based Ne estimates obtained by estimating the mutation parameter from the data and by correcting sample size using the 1/2n term were the closest to pedigree-based estimates. The Ne at the time of the foundation of the herd (26 generations ago) was 20.8 ± 3.7 (average and SD across replicates), while the pedigree-based estimate was 21. From that time on, this trend was in good agreement with that followed by pedigree-based Ne. CONCLUSIONS: Our results showed that when using genome-wide information, the LD method is accurate and broadly applicable to small populations even when generations overlap. This supports the use of the method for estimating Ne when pedigree information is unavailable in order to effectively monitor and manage populations and to detect population declines early. To our knowledge this is the first study using replicates of empirical data to evaluate the applicability of the LD method by comparing results with accurate pedigree-based estimates.
Affiliation(s)
- María Saura
- Departamento de Mejora Genética Animal, INIA, Carretera de la Coruña km 7.5, 28040, Madrid, Spain.
| | - Albert Tenesa
- The Roslin Institute and R(D)SVS, University of Edinburgh, EH25 9RG, Midlothian, UK.
| | - John A Woolliams
- The Roslin Institute and R(D)SVS, University of Edinburgh, EH25 9RG, Midlothian, UK.
| | - Almudena Fernández
- Departamento de Mejora Genética Animal, INIA, Carretera de la Coruña km 7.5, 28040, Madrid, Spain.
| | - Beatriz Villanueva
- Departamento de Mejora Genética Animal, INIA, Carretera de la Coruña km 7.5, 28040, Madrid, Spain.
| |
|
33
|
Tassi F, Ghirotto S, Mezzavilla M, Vilaça ST, De Santi L, Barbujani G. Early modern human dispersal from Africa: genomic evidence for multiple waves of migration. Investigative Genetics 2015; 6:13. [PMID: 26550467 PMCID: PMC4636834 DOI: 10.1186/s13323-015-0030-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 07/02/2015] [Accepted: 10/27/2015] [Indexed: 12/22/2022]
Abstract
Background: Anthropological and genetic data agree in indicating the African continent as the main place of origin for anatomically modern humans. However, it is unclear whether early modern humans left Africa through a single, major process, dispersing simultaneously over Asia and Europe, or in two main waves, first through the Arab Peninsula into southern Asia and Oceania, and later through a northern route crossing the Levant. Results: Here, we show that accurate genomic estimates of the divergence times between European and African populations are more recent than those between Australo-Melanesia and Africa and incompatible with the effects of a single dispersal. This difference cannot possibly be accounted for by the effects of either hybridization with archaic human forms in Australo-Melanesia or back migration from Europe into Africa. Furthermore, in several populations of Asia we found evidence for relatively recent genetic admixture events, which could have obscured the signatures of the earliest processes. Conclusions: We conclude that the hypothesis of a single major human dispersal from Africa appears hardly compatible with the observed historical and geographical patterns of genome diversity and that Australo-Melanesian populations seem still to retain a genomic signature of a more ancient divergence from Africa. Electronic supplementary material: The online version of this article (doi:10.1186/s13323-015-0030-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Francesca Tassi
- Department of Life Sciences and Biotechnologies, University of Ferrara, Ferrara, Italy
| | - Silvia Ghirotto
- Department of Life Sciences and Biotechnologies, University of Ferrara, Ferrara, Italy
| | - Massimo Mezzavilla
- Institute for Maternal and Child Health-IRCCS "BurloGarofolo", University of Trieste, Trieste, Italy
| | - Sibelle Torres Vilaça
- Department of Life Sciences and Biotechnologies, University of Ferrara, Ferrara, Italy; Present Address: Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
| | - Lisa De Santi
- Department of Life Sciences and Biotechnologies, University of Ferrara, Ferrara, Italy
| | - Guido Barbujani
- Department of Life Sciences and Biotechnologies, University of Ferrara, Ferrara, Italy
| |
|
34
|
Abstract
The Genetics Society of America's Thomas Hunt Morgan Medal is awarded to an individual GSA member for lifetime achievement in the field of genetics. For over 40 years, 2015 recipient Brian Charlesworth has been a leader in both theoretical and empirical evolutionary genetics, making substantial contributions to our understanding of how evolution acts on genetic variation. Some of the areas in which Charlesworth's research has been most influential are the evolution of sex chromosomes, transposable elements, deleterious mutations, sexual reproduction, and life history. He also developed the influential theory of background selection, whereby the recurrent elimination of deleterious mutations reduces variation at linked sites, providing a general explanation for the correlation between recombination rate and genetic variation.
|
35
|
The SMC' is a highly accurate approximation to the ancestral recombination graph. Genetics 2015; 200:343-55. [PMID: 25786855 DOI: 10.1534/genetics.114.173898] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/17/2014] [Accepted: 03/12/2015] [Indexed: 11/18/2022] Open
Abstract
Two sequentially Markov coalescent models (SMC and SMC') are available as tractable approximations to the ancestral recombination graph (ARG). We present a Markov process describing coalescence at two fixed points along a pair of sequences evolving under the SMC'. Using our Markov process, we derive a number of new quantities related to the pairwise SMC', thereby analytically quantifying for the first time the similarity between the SMC' and the ARG. We use our process to show that the joint distribution of pairwise coalescence times at recombination sites under the SMC' is the same as it is marginally under the ARG, which demonstrates that the SMC' is, in a particular well-defined, intuitive sense, the most appropriate first-order sequentially Markov approximation to the ARG. Finally, we use these results to show that population size estimates under the pairwise SMC are asymptotically biased, while under the pairwise SMC' they are approximately asymptotically unbiased.
|
36
|
Global diversity lines - a five-continent reference panel of sequenced Drosophila melanogaster strains. G3: Genes, Genomes, Genetics 2015; 5:593-603. [PMID: 25673134 PMCID: PMC4390575 DOI: 10.1534/g3.114.015883] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Indexed: 12/30/2022]
Abstract
Reference collections of multiple Drosophila lines with accumulating collections of “omics” data have proven especially valuable for the study of population genetics and complex trait genetics. Here we present a description of a resource collection of 84 strains of Drosophila melanogaster whose genome sequences were obtained after 12 generations of full-sib inbreeding. The initial rationale for this resource was to foster development of a systems biology platform for modeling metabolic regulation by the use of natural polymorphisms as perturbations. As reference lines, they are amenable to repeated phenotypic measurements, and already a large collection of metabolic traits has been assayed. Another key feature of these strains is their widespread geographic origin, coming from Beijing, Ithaca, the Netherlands, Tasmania, and Zimbabwe. After obtaining 12.5× coverage of paired-end Illumina sequence reads, SNP and indel calls were made with the GATK platform. Thorough quality control was enabled by deep sequencing one line to >100×, and single-nucleotide polymorphisms and indels were validated using ddRAD-sequencing as an orthogonal platform. In addition, a series of preliminary population genetic tests were performed with these single-nucleotide polymorphism data for assessment of data quality. We found 83 segregating inversions among the lines, and as expected these were especially abundant in the African sample. We anticipate that this will make a useful addition to the set of reference D. melanogaster strains, thanks to its geographic structuring and unusually high level of genetic diversity.
|
37
|
Lee YS, Woo Lee J, Kim H. Estimating effective population size of thoroughbred horses using linkage disequilibrium and theta (4Nμ) value. Livest Sci 2014. [DOI: 10.1016/j.livsci.2014.08.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
|
38
|
Wang M, Huang X, Li R, Xu H, Jin L, He Y. Detecting recent positive selection with high accuracy and reliability by conditional coalescent tree. Mol Biol Evol 2014; 31:3068-80. [PMID: 25135945 DOI: 10.1093/molbev/msu244] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022] Open
Abstract
Studies of natural selection, followed by functional validation, are shedding light on the genetic mechanisms underlying human evolution and adaptation. Classic methods for detecting selection, such as the integrated haplotype score (iHS) and Fay and Wu's H statistic, are useful for finding candidate genes under positive selection. These methods, however, have limited capability to localize causal variants in selection target regions. In this study, we developed a novel method based on the conditional coalescent tree to detect recent positive selection by counting unbalanced mutations on coalescent gene genealogies. Extensive simulation studies revealed that our method is more robust than many other approaches against biases due to various demographic effects, including population bottleneck, expansion, or stratification, while not sacrificing its power. Furthermore, our method demonstrated its superiority in localizing causal variants among large numbers of linked genetic variants. The rate of successful localization was about 20-40% higher than that of other state-of-the-art methods on simulated data sets. On empirical data, validated functional causal variants of four well-known positively selected genes (ADH1B, MCM6, APOL1, and HBB) were all successfully localized by our method. Finally, the computational efficiency of this new method was much higher than that of iHS implementations, that is, 24-66 times faster than the REHH package, and more than 10,000 times faster than the original iHS implementation. These speedups make our method suitable for application to large sequencing data sets. Software can be downloaded from https://github.com/wavefancy/scct.
Affiliation(s)
- Minxian Wang
- Department of Computational Regulatory Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
| | - Xin Huang
- Department of Computational Regulatory Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
| | - Ran Li
- Department of Computational Regulatory Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
| | - Hongyang Xu
- Department of Computational Regulatory Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
| | - Li Jin
- Department of Computational Regulatory Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China; State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
| | - Yungang He
- Department of Computational Regulatory Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
| |
|
39
|
Fawcett JA, Iida T, Takuno S, Sugino RP, Kado T, Kugou K, Mura S, Kobayashi T, Ohta K, Nakayama JI, Innan H. Population genomics of the fission yeast Schizosaccharomyces pombe. PLoS One 2014; 9:e104241. [PMID: 25111393 PMCID: PMC4128662 DOI: 10.1371/journal.pone.0104241] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 06/03/2014] [Accepted: 07/06/2014] [Indexed: 02/02/2023] Open
Abstract
The fission yeast Schizosaccharomyces pombe has been widely used as a model eukaryote to study a diverse range of biological processes. However, population genetic studies of this species have been limited to date, and we know very little about the evolutionary processes and selective pressures that are shaping its genome. Here, we sequenced the genomes of 32 worldwide S. pombe strains and examined the pattern of polymorphisms across their genomes. In addition to introns and untranslated regions (UTRs), intergenic regions also exhibited lower levels of nucleotide diversity than synonymous sites, suggesting that a considerable amount of noncoding DNA is under selective constraint and thus likely to be functional. A number of genomic regions showed a reduction of nucleotide diversity probably caused by selective sweeps. We also identified a region close to the end of chromosome 3 where an extremely high level of divergence was observed between 5 of the 32 strains and the remaining 27, possibly due to introgression, strong positive selection, or that region being responsible for reproductive isolation. Our study should serve as an important starting point in using a population genomics approach to further elucidate the biology of this important model organism.
Affiliation(s)
- Jeffrey A. Fawcett
- Graduate University for Advanced Studies, Hayama, Kanagawa, Japan
| | | | - Shohei Takuno
- Graduate University for Advanced Studies, Hayama, Kanagawa, Japan
| | - Ryuichi P. Sugino
- Graduate University for Advanced Studies, Hayama, Kanagawa, Japan
- Department of Biology, Faculty of Sciences, Kyushu University, Fukuoka, Japan
| | - Tomoyuki Kado
- Graduate University for Advanced Studies, Hayama, Kanagawa, Japan
| | - Kazuto Kugou
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan
| | - Sachiko Mura
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan
| | | | - Kunihiro Ohta
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan
| | - Jun-ichi Nakayama
- Graduate School of Natural Sciences, Nagoya City University, Nagoya, Japan
| | - Hideki Innan
- Graduate University for Advanced Studies, Hayama, Kanagawa, Japan
| |
|
40
|
Zhu M, Zhu B, Wang YH, Wu Y, Xu L, Guo LP, Yuan ZR, Zhang LP, Gao X, Gao HJ, Xu SZ, Li JY. Linkage Disequilibrium Estimation of Chinese Beef Simmental Cattle Using High-density SNP Panels. Asian-Australasian Journal of Animal Sciences 2014; 26:772-9. [PMID: 25049849 PMCID: PMC4093237 DOI: 10.5713/ajas.2012.12721] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 12/28/2012] [Revised: 03/18/2013] [Accepted: 02/27/2013] [Indexed: 11/27/2022]
Abstract
Linkage disequilibrium (LD) plays an important role in genomic selection and mapping quantitative trait loci (QTL). In this study, the pattern of LD and effective population size (Ne) were investigated in Chinese beef Simmental cattle. A total of 640 bulls were genotyped with the Illumina BovineSNP50 BeadChip and the Illumina BovineHD BeadChip. We estimated LD for each autosomal chromosome at distances between two random SNPs of 0 to 25 kb, 25 to 50 kb, 50 to 100 kb, 100 to 500 kb, 0.5 to 1 Mb, 1 to 5 Mb and 5 to 10 Mb. The mean values of r² were 0.30, 0.16 and 0.08 when the separation between SNPs was 0 to 25 kb, 50 to 100 kb and 0.5 to 1 Mb, respectively. The LD estimates decreased as the distance between SNP pairs increased, and increased with increasing minor allele frequency (MAF) and with decreasing sample size. Estimates of effective population size for Chinese beef Simmental cattle decreased over the past generations, and Ne was 73 five generations ago.
Affiliation(s)
- M Zhu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - B Zhu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Y H Wang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Y Wu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - L Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - L P Guo
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Z R Yuan
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - L P Zhang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - X Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - H J Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - S Z Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - J Y Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| |
|
41
|
Abstract
Although the analysis of linkage disequilibrium (LD) plays a central role in many areas of population genetics, the sampling variance of LD is known to be very large with high sensitivity to numbers of nucleotide sites and individuals sampled. Here we show that a genome-wide analysis of the distribution of heterozygous sites within a single diploid genome can yield highly informative patterns of LD as a function of physical distance. The proposed statistic, the correlation of zygosity, is closely related to the conventional population-level measure of LD, but is agnostic with respect to allele frequencies and hence likely less prone to outlier artifacts. Application of the method to several vertebrate species leads to the conclusion that >80% of recombination events are typically resolved by gene-conversion-like processes unaccompanied by crossovers, with conversion patches averaging on the order of one to several kilobases in length. Thus, contrary to common assumptions, the recombination rate between sites does not scale linearly with distance, often even up to distances of 100 kb. In addition, the amount of LD between sites separated by <200 bp is uniformly much greater than can be explained by the conventional neutral model, possibly because of the nonindependent origin of mutations within this spatial scale. These results raise questions about the application of conventional population-genetic interpretations to LD on short spatial scales and also about the use of spatial patterns of LD to infer demographic histories.
|
42
|
Abstract
The "LD curve" relates the linkage disequilibrium (LD) between pairs of nucleotide sites to the distance that separates them along the chromosome. The shape of this curve reflects natural selection, admixture between populations, and the history of population size. This article derives new results about the last of these effects. When a population expands in size, the LD curve grows steeper, and this effect is especially pronounced following a bottleneck in population size. When a population shrinks, the LD curve rises but remains relatively flat. As LD converges toward a new equilibrium, its time path may not be monotonic. Following an episode of growth, for example, it declines to a low value before rising toward the new equilibrium. These changes happen at different rates for different LD statistics. They are especially slow for estimates of [Formula: see text], which therefore allow inferences about ancient population history. For the human population of Europe, these results suggest a history of population growth.
|
43
|
Genomic and cranial phenotype data support multiple modern human dispersals from Africa and a southern route into Asia. Proc Natl Acad Sci U S A 2014; 111:7248-53. [PMID: 24753576 DOI: 10.1073/pnas.1323666111] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Indexed: 11/18/2022] Open
Abstract
Despite broad consensus on Africa as the main place of origin for anatomically modern humans, their dispersal pattern out of the continent continues to be intensely debated. In extant human populations, the observation of decreasing genetic and phenotypic diversity at increasing distances from sub-Saharan Africa has been interpreted as evidence for a single dispersal, accompanied by a series of founder effects. In such a scenario, modern human genetic and phenotypic variation was primarily generated through successive population bottlenecks and drift during a rapid worldwide expansion out of Africa in the Late Pleistocene. However, recent genetic studies, as well as accumulating archaeological and paleoanthropological evidence, challenge this parsimonious model. They suggest instead a "southern route" dispersal into Asia as early as the late Middle Pleistocene, followed by a separate dispersal into northern Eurasia. Here we test these competing out-of-Africa scenarios by modeling hypothetical geographical migration routes and assessing their correlation with neutral population differentiation, as measured by genetic polymorphisms and cranial shape variables of modern human populations from Africa and Asia. We show that both lines of evidence support a multiple-dispersals model in which Australo-Melanesian populations are relatively isolated descendants of an early dispersal, whereas other Asian populations are descended from, or highly admixed with, members of a subsequent migration event.
|
44
|
Abstract
Sex-antagonistic (SA) selection has major evolutionary consequences: it can drive genomic change, constrain adaptation, and maintain genetic variation for fitness. The recombining (or pseudoautosomal) regions of sex chromosomes are a promising setting in which to study SA selection because they tend to accumulate SA polymorphisms and because recombination allows us to deploy the tools of molecular evolution to locate targets of SA selection and quantify evolutionary forces. Here we use coalescent models to characterize the patterns of polymorphism expected within and divergence between recombining X and Y (or Z and W) sex chromosomes. SA selection generates peaks of divergence between X and Y that can extend substantial distances away from the targets of selection. Linkage disequilibrium between neutral sites is also inflated. We show how the pattern of divergence is altered when the SA polymorphism or the sex-determining region was recently established. We use data from the flowering plant Silene latifolia to illustrate how the strength of SA selection might be quantified using molecular data from recombining sex chromosomes.
|
45
|
Genetic diversity and ecological niche modelling of wild barley: refugia, large-scale post-LGM range expansion and limited mid-future climate threats? PLoS One 2014; 9:e86021. [PMID: 24505252 PMCID: PMC3914776 DOI: 10.1371/journal.pone.0086021] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 08/26/2013] [Accepted: 12/04/2013] [Indexed: 11/19/2022] Open
Abstract
Describing genetic diversity in wild barley (Hordeum vulgare ssp. spontaneum) in geographic and environmental space in the context of current, past and potential future climates is important for conservation and for breeding the domesticated crop (Hordeum vulgare ssp. vulgare). Spatial genetic diversity in wild barley was revealed by both nuclear- (2,505 SNP, 24 nSSR) and chloroplast-derived (5 cpSSR) markers in 256 widely-sampled geo-referenced accessions. Results were compared with MaxEnt-modelled geographic distributions under current, past (Last Glacial Maximum, LGM) and mid-term future (anthropogenic scenario A2, the 2080s) climates. Comparisons suggest large-scale post-LGM range expansion in Central Asia and relatively small, but statistically significant, reductions in range-wide genetic diversity under future climate. Our analyses support the utility of ecological niche modelling for locating genetic diversity hotspots and determine priority geographic areas for wild barley conservation under anthropogenic climate change. Similar research on other cereal crop progenitors could play an important role in tailoring conservation and crop improvement strategies to support future human food security.
|
46
|
Vinkhuyzen AAE, Wray NR, Yang J, Goddard ME, Visscher PM. Estimation and partition of heritability in human populations using whole-genome analysis methods. Annu Rev Genet 2013; 47:75-95. [PMID: 23988118 PMCID: PMC4037293 DOI: 10.1146/annurev-genet-111212-133258] [Citation(s) in RCA: 122] [Impact Index Per Article: 11.1] [Indexed: 01/27/2023]
Abstract
Understanding genetic variation of complex traits in human populations has moved from the quantification of the resemblance between close relatives to the dissection of genetic variation into the contributions of individual genomic loci. However, major questions remain unanswered: How much phenotypic variation is genetic; how much of the genetic variation is additive and can be explained by fitting all genetic variants simultaneously in one model, and what is the joint distribution of effect size and allele frequency at causal variants? We review and compare three whole-genome analysis methods that use mixed linear models (MLMs) to estimate genetic variation. In all methods, genetic variation is estimated from the relationship between close or distant relatives on the basis of pedigree information and/or single nucleotide polymorphisms (SNPs). We discuss theory, estimation procedures, bias, and precision of each method and review recent advances in the dissection of genetic variation of complex traits in human populations. By using genome-wide data, it is now established that SNPs in total account for far more of the genetic variation than the statistically highly significant SNPs that have been detected in genome-wide association studies. All SNPs together, however, do not account for all of the genetic variance estimated by pedigree-based methods. We explain possible reasons for this remaining "missing heritability."
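As a reading aid, the generic form of a mixed linear model for heritability estimation can be written as below. This is a standard textbook formulation, not taken from the review itself; the symbols (phenotype vector y, fixed-effect design matrix X, relationship matrix A built from pedigree or SNP data, variance components σ²_g and σ²_e) are assumptions introduced for illustration.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon}, \\
\operatorname{Var}(\mathbf{y}) &= \mathbf{A}\,\sigma_g^2 + \mathbf{I}\,\sigma_e^2, \\
h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{aligned}
```

Under this sketch, the methods differ mainly in how A is constructed (from close relatives, distant relatives, or both), which in turn determines which part of the genetic variance the estimate of h² captures.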
Affiliation(s)
- Anna AE Vinkhuyzen
  - The University of Queensland, Queensland Brain Institute, Brisbane, Queensland, Australia
- Naomi R Wray
  - The University of Queensland, Queensland Brain Institute, Brisbane, Queensland, Australia
- Jian Yang
  - The University of Queensland, Queensland Brain Institute, Brisbane, Queensland, Australia
  - The University of Queensland Diamantina Institute, The Translation Research Institute, Brisbane, Queensland, Australia
- Michael E Goddard
  - University of Melbourne, Department of Food and Agricultural Systems, Parkville, Victoria, Australia
  - Biosciences Research Division, Department of Primary Industries, Bundoora, Victoria, Australia
- Peter M Visscher
  - The University of Queensland, Queensland Brain Institute, Brisbane, Queensland, Australia
  - The University of Queensland Diamantina Institute, The Translation Research Institute, Brisbane, Queensland, Australia
|
47
|
Gattepaille LM, Jakobsson M, Blum MGB. Inferring population size changes with sequence and SNP data: lessons from human bottlenecks. Heredity (Edinb) 2013; 110:409-19. [PMID: 23423148 PMCID: PMC3630807 DOI: 10.1038/hdy.2012.120] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Indexed: 11/09/2022] Open
Abstract
Reconstructing historical variation of population size from sequence and single-nucleotide polymorphism (SNP) data is valuable for understanding the evolutionary history of species. Changes in the population size of humans have been thoroughly investigated, and we review different methodologies of demographic reconstruction, specifically focusing on human bottlenecks. In addition to the classical approaches based on the site-frequency spectrum (SFS) or based on linkage disequilibrium, we also review more recent approaches that utilize atypical shared genomic fragments, such as identical by descent or homozygous segments between or within individuals. Compared with methods based on the SFS, these methods are well suited for detecting recent bottlenecks. In general, all these various methods suffer from bias and dependencies on confounding factors such as population structure or poor specification of the mutational and recombination processes, which can affect the demographic reconstruction. With the exception of SFS-based methods, the effects of confounding factors on the inference methods remain poorly investigated. We conclude that an important step when investigating population size changes rests on validating the demographic model by investigating to what extent the fitted demographic model can reproduce the main features of the polymorphism data.
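To make the site-frequency spectrum concrete, here is a minimal Python sketch that tabulates an unfolded SFS from a small set of haplotypes and gives the standard neutral, constant-size expectation (θ/k for derived-allele count k) for comparison. The function names and the string encoding of haplotypes ('0' = ancestral, '1' = derived) are assumptions made for illustration; they are not from the review.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry k-1 is the number of sites where exactly k of the
    n sampled haplotypes carry the derived allele ('1'), for k = 1..n-1."""
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    counts = Counter()
    for site in range(n_sites):
        derived = sum(h[site] == "1" for h in haplotypes)
        if 0 < derived < n:  # skip sites that are not polymorphic in the sample
            counts[derived] += 1
    return [counts.get(k, 0) for k in range(1, n)]


def expected_sfs_constant_size(n, theta):
    """Expected unfolded SFS under the standard neutral model with constant
    population size: E[xi_k] = theta / k."""
    return [theta / k for k in range(1, n)]


# Hypothetical sample of four haplotypes at four sites.
sample = ["0101", "0100", "1100", "0000"]
observed = site_frequency_spectrum(sample)
expected = expected_sfs_constant_size(len(sample), theta=2.0)
```

Comparing `observed` with `expected` is the basic idea behind SFS-based demographic inference: a bottleneck or expansion shifts the relative number of rare and common variants away from the θ/k pattern.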
Affiliation(s)
- L M Gattepaille
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
|
48
|
A sequential coalescent algorithm for chromosomal inversions. Heredity (Edinb) 2013; 111:200-9. [PMID: 23632894 DOI: 10.1038/hdy.2013.38] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/18/2012] [Revised: 02/04/2013] [Accepted: 03/25/2013] [Indexed: 01/06/2023] Open
Abstract
Chromosomal inversions are common in natural populations and are believed to be involved in many important evolutionary phenomena, including speciation, the evolution of sex chromosomes and local adaptation. While recent advances in sequencing and genotyping methods are leading to rapidly increasing amounts of genome-wide sequence data that reveal interesting patterns of genetic variation within inverted regions, efficient simulation methods to study these patterns are largely missing. In this work, we extend the sequential Markovian coalescent, an approximation to the coalescent with recombination, to include the effects of polymorphic inversions on patterns of recombination. Results show that our algorithm is fast, memory-efficient and accurate, making it feasible to simulate large inversions in large populations for the first time. The SMC algorithm enables studies of patterns of genetic variation (for example, linkage disequilibria) and tests of hypotheses (using simulation-based approaches) that were previously intractable.
|
49
|
Tachida H. Linkage disequilibrium in a population undergoing periodic fragmentation and admixture. Genes Genet Syst 2012; 87:125-35. [PMID: 22820386 DOI: 10.1266/ggs.87.125] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022] Open
Abstract
Glacial and interglacial cycles are considered to have caused the fragmentation and admixture of populations in many organisms. A simple model incorporating such periodic changes of the population structure is analysed in order to investigate the behaviour of neutral genetic variation at one and two loci. The equilibrium is reached very quickly in terms of cycles if the length of a cycle is long, as would be expected of the glaciation cycles. Heterozygosity and linkage disequilibrium are shown to depend on the length of time of the fragmented and admixed phases, population sizes, and number (n) of subpopulations in the fragmented phase. If the population size is small in the fragmented phase and its duration is long, the squared correlation coefficient of two loci (a measure of linkage disequilibrium) just after the admixture is approximated by 1/(n-1) for n > 1. After admixture, the correlation decays at a rate of approximately twice the recombination rate. Therefore, if post-glaciation admixture created linkage disequilibrium, we expect to observe linkage disequilibrium even between moderately linked loci, and its decay pattern along the chromosome is very different from that in a random mating population at equilibrium. This is especially true in organisms with long generation times such as trees.
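The two approximations quoted in this abstract can be combined into a small numerical sketch. The starting value 1/(n-1) and the decay rate of about twice the recombination rate come from the abstract; the exponential form of the decay and the function names are assumptions made here for illustration, not the paper's own expressions.

```python
import math


def r2_after_admixture(n_subpops):
    """Approximate squared correlation between two loci just after admixture,
    1/(n-1), for n > 1 subpopulations in the fragmented phase."""
    if n_subpops <= 1:
        raise ValueError("approximation requires n > 1 subpopulations")
    return 1.0 / (n_subpops - 1)


def r2_decay(n_subpops, recomb_rate, generations):
    """Squared correlation some generations after admixture, ASSUMING
    exponential decay at rate 2 * recomb_rate per generation."""
    return r2_after_admixture(n_subpops) * math.exp(-2.0 * recomb_rate * generations)


# Hypothetical example: 5 refugial subpopulations, loci with recombination
# rate 0.001, and 100 generations since admixture.
r2_now = r2_decay(n_subpops=5, recomb_rate=0.001, generations=100)
```

Under these assumptions, the value after 100 generations is still a large fraction of the initial 0.25. This is why long generation times (few generations since the last glaciation) leave detectable disequilibrium even between moderately linked loci.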
Affiliation(s)
- Hidenori Tachida
  - Department of Biology, Faculty of Sciences, Kyushu University, Higashi-ku, Fukuoka, Japan.
|
50
|
Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 2012; 193:347-65. [PMID: 23222650 DOI: 10.1534/genetics.112.147983] [Citation(s) in RCA: 239] [Impact Index Per Article: 19.9] [Indexed: 12/26/2022] Open
Abstract
The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.
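The two performance indicators named in this abstract, accuracy and bias, could be computed as in the following sketch. It assumes the common convention that accuracy is the Pearson correlation between predicted and observed values in the validation set, and that bias is judged from the slope of the regression of observed on predicted values (a slope of 1 indicating no inflation or deflation). The function names and the interleaved fold layout are hypothetical and are not taken from the article.

```python
def accuracy_and_bias(predicted, observed):
    """Return (accuracy, slope): Pearson correlation of predicted vs observed,
    and the slope of the regression of observed on predicted."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    accuracy = cov / (var_p * var_o) ** 0.5
    slope = cov / var_p
    return accuracy, slope


def k_fold_indices(n, k):
    """Yield (training, validation) index lists for k interleaved folds."""
    folds = [list(range(start, n, k)) for start in range(k)]
    for i, validation in enumerate(folds):
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield training, validation


# Hypothetical predictions that rank individuals perfectly but are
# compressed by a factor of two relative to the observed values.
acc, slope = accuracy_and_bias([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

In this toy case the accuracy is 1 while the slope is 2, which illustrates why the two indicators need to be reported together: a method can rank individuals well and still produce mis-scaled predictions.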
|