51
|
Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 2012; 193:347-65. [PMID: 23222650 DOI: 10.1534/genetics.112.147983] [Citation(s) in RCA: 239] [Impact Index Per Article: 19.9] [Indexed: 12/26/2022] Open
Abstract
The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.
|
52
|
Elhaik E. Empirical distributions of F(ST) from large-scale human polymorphism data. PLoS One 2012; 7:e49837. [PMID: 23185452 PMCID: PMC3504095 DOI: 10.1371/journal.pone.0049837] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 02/27/2012] [Accepted: 10/12/2012] [Indexed: 12/19/2022] Open
Abstract
Studies of the apportionment of human genetic variation have long established that most human variation is within population groups and that the additional variation between population groups is small but greatest when comparing different continental populations. These studies often used Wright's F(ST) that apportions the standardized variance in allele frequencies within and between population groups. Because local adaptations increase population differentiation, high-F(ST) may be found at closely linked loci under selection and used to identify genes undergoing directional or heterotic selection. We re-examined these processes using HapMap data. We analyzed 3 million SNPs on 602 samples from eight worldwide populations and a consensus subset of 1 million SNPs found in all populations. We identified four major features of the data: First, a hierarchical F(ST) analysis showed that only a paucity (12%) of the total genetic variation is distributed between continental populations and an even smaller share (1%) is found between intra-continental populations. Second, the global F(ST) distribution closely follows an exponential distribution. Third, although the overall F(ST) distribution is similarly shaped (inverse J), F(ST) distributions vary markedly by allele frequency when divided into non-overlapping groups by allele frequency range. Because the mean allele frequency is a crude indicator of allele age, these distributions mark the time-dependent change in genetic differentiation. Finally, the change in mean-F(ST) of these groups is linear in allele frequency. These results suggest that investigating the extremes of the F(ST) distribution for each allele frequency group is more efficient for detecting selection. Consequently, we demonstrate that such extreme SNPs are more clustered along the chromosomes than expected from linkage disequilibrium for each allele frequency group. These genomic regions are therefore likely candidates for natural selection.
Affiliation(s)
- Eran Elhaik
- Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA.
|
53
|
An ancestral recombination graph for diploid populations with skewed offspring distribution. Genetics 2012; 193:255-90. [PMID: 23150600 DOI: 10.1534/genetics.112.144329] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022] Open
Abstract
A large offspring-number diploid biparental multilocus population model of Moran type is our object of study. At each time step, a pair of diploid individuals drawn uniformly at random contributes offspring to the population. The number of offspring can be large relative to the total population size. Similar "heavily skewed" reproduction mechanisms have been recently considered by various authors (cf. e.g., Eldon and Wakeley 2006, 2008) and reviewed by Hedgecock and Pudovkin (2011). Each diploid parental individual contributes exactly one chromosome to each diploid offspring, and hence ancestral lineages can coalesce only when in distinct individuals. A separation-of-timescales phenomenon is thus observed. A result of Möhle (1998) is extended to obtain convergence of the ancestral process to an ancestral recombination graph necessarily admitting simultaneous multiple mergers of ancestral lineages. The usual ancestral recombination graph is obtained as a special case of our model when the parents contribute only one offspring to the population each time. Due to diploidy and large offspring numbers, novel effects appear. For example, the marginal genealogy at each locus admits simultaneous multiple mergers in up to four groups, and different loci remain substantially correlated even as the recombination rate grows large. Thus, genealogies for loci far apart on the same chromosome remain correlated. Correlation in coalescence times for two loci is derived and shown to be a function of the coalescence parameters of our model. Extending the observations by Eldon and Wakeley (2008), predictions of linkage disequilibrium are shown to be functions of the reproduction parameters of our model, in addition to the recombination rate. 
Correlations in ratios of coalescence times between loci can be high, even when the recombination rate is high and sample size is large, in large offspring-number populations, as suggested by simulations, hinting at how to distinguish between different population models.
|
54
|
Gautier M, Gharbi K, Cezard T, Foucaud J, Kerdelhué C, Pudlo P, Cornuet JM, Estoup A. The effect of RAD allele dropout on the estimation of genetic variation within and between populations. Mol Ecol 2012; 22:3165-78. [DOI: 10.1111/mec.12089] [Citation(s) in RCA: 219] [Impact Index Per Article: 18.3] [Received: 07/12/2012] [Revised: 09/04/2012] [Accepted: 09/12/2012] [Indexed: 12/17/2022]
Affiliation(s)
- Mathieu Gautier
- Inra; UMR CBGP (INRA - IRD - Cirad - Montpellier SupAgro); Campus international de Baillarguet, CS 30016, F-34988; Montferrier-sur-Lez; France
- Karim Gharbi
- The GenePool; School of Biological Sciences; University of Edinburgh; Edinburgh; EH9 3JT; UK
- Timothee Cezard
- The GenePool; School of Biological Sciences; University of Edinburgh; Edinburgh; EH9 3JT; UK
- Julien Foucaud
- Inra; UMR CBGP (INRA - IRD - Cirad - Montpellier SupAgro); Campus international de Baillarguet, CS 30016, F-34988; Montferrier-sur-Lez; France
- Carole Kerdelhué
- Inra; UMR CBGP (INRA - IRD - Cirad - Montpellier SupAgro); Campus international de Baillarguet, CS 30016, F-34988; Montferrier-sur-Lez; France
- Jean-Marie Cornuet
- Inra; UMR CBGP (INRA - IRD - Cirad - Montpellier SupAgro); Campus international de Baillarguet, CS 30016, F-34988; Montferrier-sur-Lez; France
- Arnaud Estoup
- Inra; UMR CBGP (INRA - IRD - Cirad - Montpellier SupAgro); Campus international de Baillarguet, CS 30016, F-34988; Montferrier-sur-Lez; France
|
55
|
Rikalainen K, Aspi J, Galarza JA, Koskela E, Mappes T. Maintenance of genetic diversity in cyclic populations-a longitudinal analysis in Myodes glareolus. Ecol Evol 2012; 2:1491-502. [PMID: 22957157 PMCID: PMC3434924 DOI: 10.1002/ece3.277] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 02/12/2012] [Revised: 04/05/2012] [Accepted: 04/11/2012] [Indexed: 11/08/2022] Open
Abstract
Conspicuous cyclic changes in population density characterize many populations of small northern rodents. The extreme crashes in individual number are expected to reduce the amount of genetic variation within a population during the crash phases of the population cycle. By long-term monitoring of a bank vole (Myodes glareolus) population, we show that despite the substantial and repetitive crashes in the population size, high heterozygosity is maintained throughout the population cycle. The striking population density fluctuation in fact only slightly reduced the allelic richness of the population during the crash phases. Effective population sizes of vole populations remained also relatively high even during the crash phases. We further evaluated potential mechanisms contributing to the genetic diversity of the population and found that the peak phases are characterized by both a change in spatial pattern of individuals and a rapid accession of new alleles probably due to migration. We propose that these events act together in maintaining the high genetic diversity within cyclical populations.
Affiliation(s)
- Kaisa Rikalainen
- Department of Biological and Environmental Science, University of Jyväskylä, P.O. Box 35, FI-40014 Jyväskylä, Finland
- Jouni Aspi
- Department of Biology, University of Oulu, P.O. Box 3000, FI-90014 Oulu, Finland
- Juan A Galarza
- Department of Biological and Environmental Science, University of Jyväskylä, P.O. Box 35, FI-40014 Jyväskylä, Finland
- Department of Biological and Environmental Science, Centre of Excellence in Biological Interactions, University of Jyväskylä, P.O. Box 35, FI-40014 Jyväskylä, Finland
- Esa Koskela
- Department of Biological and Environmental Science, University of Jyväskylä, P.O. Box 35, FI-40014 Jyväskylä, Finland
- Tapio Mappes
- Department of Biological and Environmental Science, University of Jyväskylä, P.O. Box 35, FI-40014 Jyväskylä, Finland
|
56
|
Griswold CK, Henry TA. Epistasis can increase multivariate trait diversity in haploid non-recombining populations. Theor Popul Biol 2012; 82:209-21. [PMID: 22771491 DOI: 10.1016/j.tpb.2012.06.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/27/2012] [Revised: 06/21/2012] [Accepted: 06/23/2012] [Indexed: 11/18/2022]
Abstract
We evaluate the effect of epistasis on genetically-based multivariate trait variation in haploid non-recombining populations. In a univariate setting, past work has shown that epistasis reduces genetic variance (additive plus epistatic) in a population experiencing stabilizing selection. Here we show that in a multivariate setting, epistasis also reduces total genetic variation across the entire multivariate trait in a population experiencing stabilizing selection. But, we also show that the pattern of variation across the multivariate trait can be more even when epistasis occurs compared to when epistasis is absent, such that some character combinations will have more genetic variance when epistasis occurs compared to when epistasis is absent. In fact, a measure of generalized multivariate trait variation can be increased by epistasis under weak to moderate stabilizing selection conditions, as well as neutral conditions. Likewise, a measure of conditional evolvability can be increased by epistasis under weak to moderate stabilizing selection and neutral conditions. We investigate the nature of epistasis assuming a multivariate-normal model of genetic effects and investigate the nature of epistasis underlying the biophysical properties of RNA. Increased multivariate diversity occurs for populations that are infinite in size, as well as populations that are finite in size. Our model of finite populations is explicitly genealogical and we link our findings about the evenness of eigenvalues with epistasis to prior work on the genealogical mapping of epistatic effects.
|
57
|
Corbin L, Liu A, Bishop S, Woolliams J. Estimation of historical effective population size using linkage disequilibria with marker data. J Anim Breed Genet 2012; 129:257-70. [DOI: 10.1111/j.1439-0388.2012.01003.x] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Indexed: 11/30/2022]
|
58
|
Evolutionary history of synthesis pathway genes for phloroglucinol and cyanide antimicrobials in plant-associated fluorescent pseudomonads. Mol Phylogenet Evol 2012; 63:877-90. [PMID: 22426436 DOI: 10.1016/j.ympev.2012.02.030] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 09/28/2011] [Revised: 02/24/2012] [Accepted: 02/29/2012] [Indexed: 11/22/2022]
Abstract
Plant-beneficial fluorescent Pseudomonas spp. play important ecological roles. Here, their evolutionary history was investigated by a multilocus approach targeting genes involved in synthesis of secondary antimicrobial metabolites implicated in biocontrol of phytopathogens. Some of these genes were proposed to be ancestral, and this was investigated using a worldwide collection of 30 plant-colonizing fluorescent pseudomonads, based on phylogenetic analysis of 14 loci involved in production of 2,4-diacetylphloroglucinol (phlACBDE, phlF, intergenic locus phlA/phlF), hydrogen cyanide (hcnABC, anr) or global regulation of secondary metabolism (gacA, gacS, rsmZ). The 10 housekeeping loci rrs, dsbA, gyrB, rpoD, fdxA, recA, rpoB, rpsL, rpsG, and fusA served as controls. Each strain was readily distinguished from the others when considering allelic combinations for these 14 biocontrol-relevant loci. Topology comparisons based on Shimodaira-Hasegawa tests showed extensive incongruence when comparing single-locus phylogenetic trees with one another, but less when comparing (after sequence concatenation) trees inferred for genes involved in 2,4-diacetylphloroglucinol synthesis, hydrogen cyanide synthesis, or secondary metabolism global regulation with trees for housekeeping genes. The 14 loci displayed linkage disequilibrium, as housekeeping loci did, and all 12 protein-coding loci were subjected to purifying selection except for one positively-selected site in HcnA. Overall, the evolutionary history of Pseudomonas genes involved in synthesis of secondary antimicrobial metabolites important for biocontrol functions is in fact similar to that of housekeeping genes, and results suggest that they are ancestral in pseudomonads producing hydrogen cyanide and 2,4-diacetylphloroglucinol.
|
59
|
Bürger R, Akerman A. The effects of linkage and gene flow on local adaptation: a two-locus continent-island model. Theor Popul Biol 2011; 80:272-88. [PMID: 21801739 PMCID: PMC3257863 DOI: 10.1016/j.tpb.2011.07.002] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 04/21/2011] [Revised: 07/08/2011] [Accepted: 07/11/2011] [Indexed: 11/24/2022]
Abstract
Population subdivision and migration are generally considered to be important causes of linkage disequilibrium (LD). We explore the combined effects of recombination and gene flow on the amount of LD, the maintenance of polymorphism, and the degree of local adaptation in a subdivided population by analyzing a diploid, deterministic continent-island model with genic selection on two linked loci (i.e., no dominance or epistasis). For this simple model, we characterize explicitly all possible equilibrium configurations. Simple and intuitive approximations for many quantities of interest are obtained in limiting cases, such as weak migration, weak selection, weak or strong recombination. For instance, we derive explicit expressions for the measures D(=p(AB)-p(A)p(B)) and r(2) (the squared correlation in allelic state) of LD. They depend in qualitatively different ways on the migration rate. Remarkably high values of r(2) are maintained between weakly linked loci, especially if gene flow is low. We determine how the maximum amount of gene flow that admits preservation of the locally adapted haplotype, hence of polymorphism at both loci, depends on recombination rate and selection coefficients. We also investigate the evolution of differentiation by examining the invasion of beneficial mutants of small effect that are linked to an already present, locally adapted allele. Mutants of much smaller effect can invade successfully than predicted by naive single-locus theory provided they are at least weakly linked. Finally, the influence of linkage on the degree of local adaptation, the migration load, and the effective migration rate at a neutral locus is explored. We discuss possible consequences for the evolution of genetic architecture, in particular, for the emergence of clusters of tightly linked, slightly beneficial mutations and the evolution of recombination and chromosome inversions.
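The two LD measures named in this abstract can be illustrated with a short sketch. It assumes the frequency of the AB haplotype and the marginal allele frequencies at two biallelic loci are already known, and it uses the standard normalization r(2) = D(2) / [p(A)(1 - p(A)) p(B)(1 - p(B))]. The function name `ld_measures` and the example frequencies are hypothetical and are not taken from the paper.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying both allele A and allele B
    p_a, p_b: marginal frequencies of allele A and allele B
    (All names here are illustrative assumptions.)
    """
    # D = p(AB) - p(A) p(B): deviation from independent association
    d = p_ab - p_a * p_b
    # Product of the allelic variances at the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    # r2: squared correlation in allelic state
    return d, d * d / denom


# Alleles A and B always co-occur: complete association
d, r2 = ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5)
print(d, r2)
```

With p(AB) equal to the product p(A)p(B), both measures are zero, which corresponds to linkage equilibrium.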
|
60
|
Abstract
To model deviations from selectively neutral genetic variation caused by different forms of selection, it is necessary to first understand patterns of neutral variation. Best understood is neutral genetic variation at a single locus. But, as is well known, additional insights can be gained by investigating multiple loci. The resulting patterns reflect the degree of association (linkage) between loci and provide information about the underlying multilocus gene genealogies. The statistical properties of two-locus gene genealogies have been intensively studied for populations of constant size, as well as for simple demographic histories such as exponential population growth and single bottlenecks. By contrast, the combined effect of recombination and sustained demographic fluctuations is poorly understood. Addressing this issue, we study a two-locus Wright-Fisher model of a population subject to recurrent bottlenecks. We derive coalescent approximations for the covariance of the times to the most recent common ancestor at two loci in samples of two chromosomes. This covariance reflects the degree of association and thus linkage disequilibrium between these loci. We find, first, that an effective population-size approximation describes the numerically observed association between two loci provided that recombination occurs either much faster or much more slowly than the population-size fluctuations. Second, when recombination occurs frequently between but rarely within bottlenecks, we observe that the association of gene histories becomes independent of physical distance over a certain range of distances. Third, we show that in this case, a commonly used measure of linkage disequilibrium, σ(2)(d) (closely related to r(2)), fails to capture the long-range association between two loci. The reason is that constituent terms, each reflecting the long-range association, cancel. 
Fourth, we analyze a limiting case in which the long-range association can be described in terms of a Xi coalescent allowing for simultaneous multiple mergers of ancestral lines.
|
61
|
Takuno S, Kado T, Sugino RP, Nakhleh L, Innan H. Population genomics in bacteria: a case study of Staphylococcus aureus. Mol Biol Evol 2011; 29:797-809. [PMID: 22009061 DOI: 10.1093/molbev/msr249] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 12/30/2022] Open
Abstract
We analyzed the genome-wide pattern of single nucleotide polymorphisms (SNPs) in a sample with 12 strains of Staphylococcus aureus. Population structure of S. aureus seems to be complex, and the 12 strains were divided into five groups, named A, B, C, D, and E. We conducted a detailed analysis of the topologies of gene genealogies across the genomes and observed a high rate and frequency of tree-shape switching, indicating extensive homologous recombination. Most of the detected recombination occurred in the ancestral population of A, B, and C, whereas there are a number of small regions that exhibit evidence for homologous recombination with a distinct related species. As such regions would contain a number of novel mutations, it is suggested that homologous recombination would play a crucial role to maintain genetic variation within species. In the A-B-C ancestral population, we found multiple lines of evidence that the coalescent pattern is very similar to what is expected in a panmictic population, suggesting that this population is suitable to apply the standard population genetic theories. Our analysis showed that homologous recombination caused a dramatic decay in linkage disequilibrium (LD) and there is almost no LD between SNPs with distance more than 10 kb. Coalescent simulations demonstrated that a high rate of homologous recombination (a relative rate of 0.6 to the mutation rate with an average tract length of about 10 kb) is required to produce patterns similar to those observed in the S. aureus genomes. Our results call for more research into the evolutionary role of homologous recombination in bacterial populations.
Affiliation(s)
- Shohei Takuno
- Graduate University for Advanced Studies, Hayama, Kanagawa, Japan
|
62
|
The joint effects of background selection and genetic recombination on local gene genealogies. Genetics 2011; 189:251-66. [PMID: 21705759 DOI: 10.1534/genetics.111.130575] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022] Open
Abstract
Background selection, the effects of the continual removal of deleterious mutations by natural selection on variability at linked sites, is potentially a major determinant of DNA sequence variability. However, the joint effects of background selection and genetic recombination on the shape of the neutral gene genealogy have proved hard to study analytically. The only existing formula concerns the mean coalescent time for a pair of alleles, making it difficult to assess the importance of background selection from genome-wide data on sequence polymorphism. Here we develop a structured coalescent model of background selection with recombination and implement it in a computer program that efficiently generates neutral gene genealogies for an arbitrary sample size. We check the validity of the structured coalescent model against forward-in-time simulations and show that it accurately captures the effects of background selection. The model produces more accurate predictions of the mean coalescent time than the existing formula and supports the conclusion that the effect of background selection is greater in the interior of a deleterious region than at its boundaries. The level of linkage disequilibrium between sites is elevated by background selection, to an extent that is well summarized by a change in effective population size. The structured coalescent model is readily extendable to more realistic situations and should prove useful for analyzing genome-wide polymorphism data.
|
63
|
Agudo R, Alcaide M, Rico C, Lemus JA, Blanco G, Hiraldo F, Donázar JA. Major histocompatibility complex variation in insular populations of the Egyptian vulture: inferences about the roles of genetic drift and selection. Mol Ecol 2011; 20:2329-40. [PMID: 21535276 DOI: 10.1111/j.1365-294x.2011.05107.x] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 01/01/2023]
Abstract
Insular populations have attracted the attention of evolutionary biologists because of their morphological and ecological peculiarities with respect to their mainland counterparts. Founder effects and genetic drift are known to distribute neutral genetic variability in these demes. However, elucidating whether these evolutionary forces have also shaped adaptive variation is crucial to evaluate the real impact of reduced genetic variation in small populations. Genes of the major histocompatibility complex (MHC) are classical examples of evolutionarily relevant loci because of their well-known role in pathogen confrontation and clearance. In this study, we aim to disentangle the partial roles of genetic drift and natural selection in the spatial distribution of MHC variation in insular populations. To this end, we integrate the study of neutral (22 microsatellites and one mtDNA locus) and MHC class II variation in one mainland (Iberia) and two insular populations (Fuerteventura and Menorca) of the endangered Egyptian vulture (Neophron percnopterus). Overall, the distribution of the frequencies of individual MHC alleles (n=17 alleles from two class II B loci) does not significantly depart from neutral expectations, which indicates a prominent role for genetic drift over selection. However, our results point towards an interesting co-evolution of gene duplicates that maintains different pairs of divergent alleles in strong linkage disequilibrium on islands. We hypothesize that the co-evolution of genes may counteract the loss of genetic diversity in insular demes, maximize antigen recognition capabilities when gene diversity is reduced, and promote the co-segregation of the most efficient allele combinations to cope with local pathogen communities.
Affiliation(s)
- Rosa Agudo
- Department of Conservation Biology, Doñana Biological Station-CSIC, Avenida Américo Vespucio s/n, E-41092 Sevilla, Spain.
|
64
|
McEvoy BP, Powell JE, Goddard ME, Visscher PM. Human population dispersal "Out of Africa" estimated from linkage disequilibrium and allele frequencies of SNPs. Genome Res 2011; 21:821-9. [PMID: 21518737 DOI: 10.1101/gr.119636.110] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Indexed: 11/25/2022]
Abstract
Genetic and fossil evidence supports a single, recent (<200,000 yr) origin of modern Homo sapiens in Africa, followed by later population divergence and dispersal across the globe (the "Out of Africa" model). However, there is less agreement on the exact nature of this migration event and dispersal of populations relative to one another. We use the empirically observed genetic correlation structure (or linkage disequilibrium) between 242,000 genome-wide single nucleotide polymorphisms (SNPs) in 17 global populations to reconstruct two key parameters of human evolution: effective population size (N(e)) and population divergence times (T). A linkage disequilibrium (LD)-based approach allows changes in human population size to be traced over time and reveals a substantial reduction in N(e) accompanying the "Out of Africa" exodus as well as the dramatic re-expansion of non-Africans as they spread across the globe. Secondly, two parallel estimates of population divergence times provide clear evidence of population dispersal patterns "Out of Africa" and subsequent dispersal of proto-European and proto-East Asian populations. Estimates of divergence times between European-African and East Asian-African populations are inconsistent with its simplest manifestation: a single dispersal from the continent followed by a split into Western and Eastern Eurasian branches. Rather, population divergence times are consistent with substantial ancient gene flow to the proto-European population after its divergence with proto-East Asians, suggesting distinct, early dispersals of modern H. sapiens from Africa. We use simulated genetic polymorphism data to demonstrate the validity of our conclusions against alternative population demographic scenarios.
Affiliation(s)
- Brian P McEvoy
- Queensland Institute of Medical Research, Brisbane 4006, Australia
|
65
|
Barton NH, Kelleher J, Etheridge AM. A new model for extinction and recolonization in two dimensions: quantifying phylogeography. Evolution 2011; 64:2701-15. [PMID: 20408876 DOI: 10.1111/j.1558-5646.2010.01019.x] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Indexed: 11/28/2022]
Abstract
Classical models of gene flow fail in three ways: they cannot explain large-scale patterns; they predict much more genetic diversity than is observed; and they assume that loosely linked genetic loci evolve independently. We propose a new model that deals with these problems. Extinction events kill some fraction of individuals in a region. These are replaced by offspring from a small number of parents, drawn from the preexisting population. This model of evolution forwards in time corresponds to a backwards model, in which ancestral lineages jump to a new location if they are hit by an event, and may coalesce with other lineages that are hit by the same event. We derive an expression for the identity in allelic state, and show that, over scales much larger than the largest event, this converges to the classical value derived by Wright and Malécot. However, rare events that cover large areas cause low genetic diversity, large-scale patterns, and correlations in ancestry between unlinked loci.
Affiliation(s)
- Nicholas H Barton
- Institute of Evolutionary Biology, University of Edinburgh, King's Buildings, West Mains Road, United Kingdom.
|
66
|
Rose CJ, Chapman JR, Marshall SDG, Lee SF, Batterham P, Ross HA, Newcomb RD. Selective sweeps at the organophosphorus insecticide resistance locus, Rop-1, have affected variation across and beyond the α-esterase gene cluster in the Australian sheep blowfly, Lucilia cuprina. Mol Biol Evol 2011; 28:1835-46. [PMID: 21228400 DOI: 10.1093/molbev/msr006] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 12/13/2022] Open
Abstract
A major theoretical consequence of selection at a locus is the genetic hitchhiking of linked sites (selective sweep). The extent of hitchhiking around a gene is related to the strength of selection and the rate of recombination, with its impact diminishing with distance from the selected site. At the Rop-1 locus of the sheep blowfly, Lucilia cuprina, polymorphisms at two different sites within the LcαE7 gene encode forms of the protein that confer organophosphorus insecticide resistance. To assess the impact of selection at these two sites on variation around LcαE7, we sequenced regions within six other genes along chromosome IV across isogenic (IV) strains of L. cuprina. High levels of linkage disequilibrium, characterized by low haplotype number (K) and diversity (H), and significant $R^2$ values were observed for two genes, LcαE1 and LcαE10, both members of the same α-esterase gene cluster as LcαE7. A significant $R^2$ value was also observed for a gene predicted to be the next closest to LcαE7, AL03, but not for any of the other genes, LcRpL13a, Lcdsx, or LcAce. Skews in the site frequency spectra toward high-frequency variants were significant for LcαE1 (Fay and Wu's H = -2.91), LcαE10 (H = -1.85), and Lcdsx (H = -2.00). Since the selective sweeps, two forms of likely returning variation were observed, including variation in microsatellites in an intron of LcαE10 and a recombination event between LcαE7 and LcαE10. These data suggest that two incomplete soft sweeps have occurred at LcαE7 that have significantly affected variation across, and beyond, the α-esterase gene cluster of L. cuprina. The speed and impact of these selective sweeps on surrounding genomic variation and the ability of L. cuprina to respond to future environmental challenges are discussed.
Affiliation(s)
- Caroline J Rose
- Molecular Sensing, Human Responses, Food Innovation, The New Zealand Institute for Plant & Food Research Limited (Plant & Food Research), Auckland, New Zealand
|
67
|
Zeng K, Charlesworth B. The effects of demography and linkage on the estimation of selection and mutation parameters. Genetics 2010; 186:1411-24. [PMID: 20923980 PMCID: PMC2998320 DOI: 10.1534/genetics.110.122150] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 08/13/2010] [Accepted: 09/27/2010] [Indexed: 11/18/2022] Open
Abstract
We explore the effects of demography and linkage on a maximum-likelihood (ML) method for estimating selection and mutation parameters in a reversible mutation model. This method assumes free recombination between sites and a randomly mating population of constant size and uses information from both polymorphic and monomorphic sites in the sample. Two likelihood-ratio test statistics were constructed under this ML framework: LRTγ for detecting selection and LRTκ for detecting mutational bias. By carrying out extensive simulations, we obtain the following results. When mutations are neutral and population size is constant, LRTγ and LRTκ follow a chi-square distribution with 1 d.f. regardless of the level of linkage, as long as the mutation rate is not very high. In addition, LRTγ and LRTκ are relatively insensitive to demographic effects and selection at linked sites. We find that the ML estimators of the selection and mutation parameters are usually approximately unbiased and that LRTκ usually has good power to detect mutational bias. Finally, with a recombination rate that is typical for Drosophila, LRTγ has good power to detect weak selection acting on synonymous sites. These results suggest that the method should be useful under many different circumstances.
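The statement that the test statistics follow a chi-square distribution with 1 d.f. can be made concrete with a short sketch. This is an illustration only, not the authors' software: the function names and the log-likelihood values are hypothetical, and the usual definition of a likelihood-ratio statistic, 2(log L1 - log L0), is assumed. For 1 d.f. the upper-tail probability has the closed form erfc(sqrt(x/2)).

```python
import math


def lrt_statistic(loglik_null, loglik_alt):
    """Likelihood-ratio statistic 2 * (log L_alt - log L_null).

    Assumes the null model is nested in the alternative model.
    """
    return 2.0 * (loglik_alt - loglik_null)


def chi2_1df_p_value(statistic):
    """Upper-tail p-value for a chi-square variable with 1 d.f.

    A chi-square(1) variable is the square of a standard normal, so
    P(X > x) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("LRT statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat = lrt_statistic(loglik_null=-1203.7, loglik_alt=-1200.1)
    print(f"LRT = {stat:.2f}, p = {chi2_1df_p_value(stat):.4f}")
```

A statistic near 3.84 corresponds to p ≈ 0.05 under this null distribution, which is the threshold such a test would typically use.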
Affiliation(s)
- Kai Zeng
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.
|
68
|
Peng B, Amos CI. Forward-time simulation of realistic samples for genome-wide association studies. BMC Bioinformatics 2010; 11:442. [PMID: 20809983 PMCID: PMC2939614 DOI: 10.1186/1471-2105-11-442] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 04/08/2010] [Accepted: 09/01/2010] [Indexed: 12/21/2022] Open
Abstract
Background Forward-time simulations have unique advantages in power and flexibility for the simulation of genetic samples of complex human diseases because they can closely mimic the evolution of human populations carrying these diseases. However, a number of methodological and computational constraints have prevented the power of this simulation method from being fully explored in existing forward-time simulation methods. Results Using a general-purpose forward-time population genetics simulation environment, we developed a forward-time simulation method that can be used to simulate realistic samples for genome-wide association studies. We examined the properties of this simulation method by comparing simulated samples with real data and demonstrated its wide applicability using four examples, including a simulation of case-control samples with a disease caused by multiple interacting genetic and environmental factors, a simulation of trio families affected by a disease-predisposing allele that had been subjected to either slow or rapid selective sweep, and a simulation of a structured population resulting from recent population admixture. Conclusions Our algorithm simulates populations that closely resemble the complex structure of the human genome, while allowing the introduction of signals of natural selection. Because of its flexibility to generate different types of samples with arbitrary disease or quantitative trait models, this simulation method can simulate realistic samples to evaluate the performance of a wide variety of statistical gene mapping methods for genome-wide association studies.
Affiliation(s)
- Bo Peng
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA.
|
69
|
Geneva A, Garrigan D. Population genomics of secondary contact. Genes (Basel) 2010; 1:124-42. [PMID: 24710014 PMCID: PMC3960861 DOI: 10.3390/genes1010124] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/11/2010] [Revised: 06/23/2010] [Accepted: 06/23/2010] [Indexed: 11/16/2022] Open
Abstract
One common form of reticulate evolution arises as a consequence of secondary contact between previously allopatric populations. Using extensive coalescent simulations, we describe the conditions for, and extent of, the introgression of genetic material into the genome of a colonizing population from an endemic population. The simulated coalescent histories are sampled from models that describe the evolution of entire chromosomes, thereby allowing the expected length of introgressed haplotypes to be estimated. The results indicate that our ability to identify reticulate evolution from genetic data is highly variable and depends critically upon the duration of the period of allopatry, the timing of the secondary contact event, as well as the sizes of the populations at the time of contact. One particularly interesting result arises when secondary contact occurs close to the time of a severe founder event; in this case, genetic introgression can be substantially more difficult to detect. However, if secondary contact occurs after such a founding event, when the range of the colonizing population increases, introgression is more readily detectable across the genome. This result may have important implications for our ability to detect introgression between ancestrally bottlenecked modern human populations and archaic hominin species, such as Neanderthals.
Affiliation(s)
- Anthony Geneva
- Department of Biology, University of Rochester, Rochester, New York, USA.
- Daniel Garrigan
- Department of Biology, University of Rochester, Rochester, New York, USA.
|
70
|
|
71
|
Abstract
Many animals and plants have sex chromosomes that recombine over much of their length. Here we develop coalescent models for neutral sites on these chromosomes. The emphasis is on expected coalescence times (proportional to the expected amount of neutral genetic polymorphism), but we also derive some results for linkage disequilibria between neutral sites. We analyze the standard neutral model, a model with polymorphic Y chromosomes under balancing selection, and the invasion of a neo-Y chromosome. The results may be useful for testing hypotheses regarding how new sex chromosomes originate and how selection acts upon them.
|
72
|
Rexroad CE, Vallejo RL. Estimates of linkage disequilibrium and effective population size in rainbow trout. BMC Genet 2009; 10:83. [PMID: 20003428 PMCID: PMC2800115 DOI: 10.1186/1471-2156-10-83] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 09/08/2009] [Accepted: 12/14/2009] [Indexed: 12/19/2022] Open
Abstract
Background The use of molecular genetic technologies for broodstock management and selective breeding of aquaculture species is becoming increasingly more common with the continued development of genome tools and reagents. Several laboratories have produced genetic maps for rainbow trout to aid in the identification of loci affecting phenotypes of interest. These maps have resulted in the identification of many quantitative/qualitative trait loci affecting phenotypic variation in traits associated with albinism, disease resistance, temperature tolerance, sex determination, embryonic development rate, spawning date, condition factor and growth. Unfortunately, the elucidation of the precise allelic variation and/or genes underlying phenotypic diversity has yet to be achieved in this species having low marker densities and lacking a whole genome reference sequence. Experimental designs which integrate segregation analyses with linkage disequilibrium (LD) approaches facilitate the discovery of genes affecting important traits. To date the extent of LD has been characterized for humans and several agriculturally important livestock species but not for rainbow trout. Results We observed that the level of LD between syntenic loci decayed rapidly at distances greater than 2 cM which is similar to observations of LD in other agriculturally important species including cattle, sheep, pigs and chickens. However, in some cases significant LD was also observed up to 50 cM. Our estimate of effective population size based on genome wide estimates of LD for the NCCCWA broodstock population was 145, indicating that this population will respond well to high selection intensity. However, the range of effective population size based on individual chromosomes was 75.51 - 203.35, possibly indicating that suites of genes on each chromosome are disproportionately under selection pressures. 
Conclusions Our results indicate that large numbers of markers, more than are currently available for this species, will be required to enable the use of genome-wide integrated mapping approaches aimed at identifying genes of interest in rainbow trout.
Affiliation(s)
- Caird E Rexroad
- USDA/ARS National Center for Cool and Cold Water Aquaculture, Leetown, West Virginia 25430, USA.
|
73
|
A genealogical interpretation of principal components analysis. PLoS Genet 2009; 5:e1000686. [PMID: 19834557 PMCID: PMC2757795 DOI: 10.1371/journal.pgen.1000686] [Citation(s) in RCA: 334] [Impact Index Per Article: 22.3] [Received: 06/02/2009] [Accepted: 09/16/2009] [Indexed: 11/24/2022] Open
Abstract
Principal components analysis, PCA, is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's fst and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference. Genetic variation in natural populations typically demonstrates structure arising from diverse processes including geographical isolation, founder events, migration, and admixture. One technique commonly used to uncover such structure is principal components analysis, which identifies the primary axes of variation in data and projects the samples onto these axes in a graphically appealing and intuitive manner. However, as the method is non-parametric, it can be hard to relate PCA to underlying process. Here, I show that the underlying genealogical history of the samples can be related directly to the PC projection. The result is useful because it is straightforward to predict the effects of different demographic processes on the sample genealogy. 
However, the result also reveals the limitations of PCA, in that multiple processes can give the same projections, it is strongly influenced by uneven sampling, and it discards important information in the spatial structure of genetic variation along chromosomes.
|
74
|
Rosenberg NA, Vanliere JM. Replication of genetic associations as pseudoreplication due to shared genealogy. Genet Epidemiol 2009; 33:479-87. [PMID: 19191270 DOI: 10.1002/gepi.20400] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
The genotypes of individuals in replicate genetic association studies have some level of correlation due to shared descent in the complete pedigree of all living humans. As a result of this genealogical sharing, replicate studies that search for genotype-phenotype associations using linkage disequilibrium between marker loci and disease-susceptibility loci can be considered as "pseudoreplicates" rather than true replicates. We examine the size of the pseudoreplication effect in association studies simulated from evolutionary models of the history of a population, evaluating the excess probability that both of a pair of studies detect a disease association compared to the probability expected under the assumption that the two studies are independent. Each of nine combinations of a demographic model and a penetrance model leads to a detectable pseudoreplication effect, suggesting that the degree of support that can be attributed to a replicated genetic association result is less than that which can be attributed to a replicated result in a context of true independence.
Affiliation(s)
- Noah A Rosenberg
- Department of Human Genetics, Center for Computational Medicine and Biology, and the Life Sciences Institute, University of Michigan, Ann Arbor, Michigan 48109-2218, USA.
|
75
|
Correlation measures for linkage disequilibrium within and between populations. Genet Res (Camb) 2009; 91:183-92. [PMID: 19589188 DOI: 10.1017/s0016672309000159] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022] Open
Abstract
Correlation statistics can be used to measure the amount of linkage disequilibrium (LD) between two loci in subdivided populations. Within populations, the square of the correlation of gene frequencies, $r^2$, is a convenient measure of LD. Between populations, the statistic $r_i r_j$, for populations $i$ and $j$, measures the relatedness of LD. Recurrence relationships for these two parameters are derived for the island model of population subdivision, under the assumptions of the linked identity-by-descent (LIBD) model in which correlation measures are equated to probability measures. The recurrence relationships closely predict the build-up of $r^2$ and $r_i r_j$ following population subdivision in computer simulations. The LIBD model predicts that a steady state will be reached with $r^2$ equal to $1/[1 + 4N_e c(1 + (k-1)\rho)]$, where $k$ is the number of island populations, $N_e$ is the effective local population (island) size, and $\rho$ measures the ratio of migration ($m$) to recombination ($c$) and is equal to $m/[c(k-1) + m]$. For low values of $m/c$, $\rho = 0$, and $E(r^2)$ is equal to $1/(1 + 4N_e c)$. For high values of $m/c$, $\rho = 1$, and $E(r^2)$ is equal to $1/(1 + 4kN_e c)$. The value of $r_i r_j$ following separation eventually settles down to a steady state whose expectation, $E(r_i r_j)$, is equal to $E(r^2)$ multiplied by $\rho$. Equations predicting the change in $r_i r_j$ values are applied to the separation of African (Yoruba - YRI) and non-African (European - CEU) populations, using data from HapMap. The primary data lead to an estimate of separation time of less than 1000 generations if there has been no migration, which is around one-third of minimum current estimates. Ancient rather than recent migration can explain the form of the data.
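The steady-state expressions quoted in this abstract can be evaluated numerically. The sketch below is illustrative only: the function names and the parameter values (Ne = 1000, c = 0.001, k = 2) are assumptions chosen for the example, not values taken from the paper. It checks that the general formula reduces to the two stated limits when migration is absent and when it is very large relative to recombination.

```python
def rho(m, c, k):
    """Migration-to-recombination ratio: rho = m / (c * (k - 1) + m)."""
    return m / (c * (k - 1) + m)


def expected_r2(ne, c, m, k):
    """Steady-state E(r^2) = 1 / (1 + 4 * Ne * c * (1 + (k - 1) * rho))."""
    return 1.0 / (1.0 + 4.0 * ne * c * (1.0 + (k - 1) * rho(m, c, k)))


def expected_rirj(ne, c, m, k):
    """Steady-state E(r_i r_j) = E(r^2) * rho."""
    return expected_r2(ne, c, m, k) * rho(m, c, k)


if __name__ == "__main__":
    ne, c, k = 1000, 0.001, 2  # hypothetical values for illustration

    # No migration: rho = 0, so E(r^2) = 1 / (1 + 4 * Ne * c).
    print(expected_r2(ne, c, m=0.0, k=k))

    # Migration far exceeding recombination: rho -> 1,
    # so E(r^2) -> 1 / (1 + 4 * k * Ne * c).
    print(expected_r2(ne, c, m=1e6, k=k))

    # Between-population statistic at an intermediate migration rate.
    print(expected_rirj(ne, c, m=0.001, k=k))
```

With these assumed values the no-migration case gives 1/(1 + 4) and the high-migration case approaches 1/(1 + 8), matching the two limiting forms in the text.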
|
76
|
Strasburg JL, Rieseberg LH. How robust are "isolation with migration" analyses to violations of the im model? A simulation study. Mol Biol Evol 2009; 27:297-310. [PMID: 19793831 DOI: 10.1093/molbev/msp233] [Citation(s) in RCA: 207] [Impact Index Per Article: 13.8] [Indexed: 12/15/2022] Open
Abstract
Methods developed over the past decade have made it possible to estimate molecular demographic parameters such as effective population size, divergence time, and gene flow with unprecedented accuracy and precision. However, they make simplifying assumptions about certain aspects of the species' histories and the nature of the genetic data, and it is not clear how robust they are to violations of these assumptions. Here, we use simulated data sets to examine the effects of a number of violations of the "Isolation with Migration" (IM) model, including intralocus recombination, population structure, gene flow from an unsampled species, linkage among loci, and divergent selection, on demographic parameter estimates made using the program IMA. We also examine the effect of having data that fit a nucleotide substitution model other than the two relatively simple models available in IMA. We find that IMA estimates are generally quite robust to small to moderate violations of the IM model assumptions, comparable with what is often encountered in real-world scenarios. In particular, population structure within species, a condition encountered to some degree in virtually all species, has little effect on parameter estimates even for fairly high levels of structure. Likewise, most parameter estimates are robust to significant levels of recombination when data sets are pared down to apparently nonrecombining blocks, although substantial bias is introduced to several estimates when the entire data set with recombination is included. In contrast, a poor fit to the nucleotide substitution model can result in an increased error rate, in some cases due to a predictable bias and in other cases due to an increase in variance in parameter estimates among data sets simulated under the same conditions.
|
77
|
Abstract
Linkage disequilibrium (LD), the association in populations between genes at linked loci, has achieved a high degree of prominence in recent years, primarily because of its use in identifying and cloning genes of medical importance. The field has recently been reviewed by Slatkin (2008). The present article is largely devoted to a review of the theory of LD in populations, including historical aspects.
Affiliation(s)
- John A Sved
- School of Biological Sciences, University of Sydney, Australia.
|
78
|
Depaulis F, Orlando L, Hänni C. Using classical population genetics tools with heterochroneous data: time matters! PLoS One 2009; 4:e5541. [PMID: 19440242 PMCID: PMC2678253 DOI: 10.1371/journal.pone.0005541] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 10/31/2008] [Accepted: 04/15/2009] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND New polymorphism datasets from heterochroneous data have arisen thanks to recent advances in experimental and microbial molecular evolution, and the sequencing of ancient DNA (aDNA). However, classical tools for population genetics analyses do not take into account heterochrony between subsets, despite potential bias on neutrality and population structure tests. Here, we characterize the extent of such possible biases using serial coalescent simulations. METHODOLOGY/PRINCIPAL FINDINGS We first use a coalescent framework to generate datasets assuming no or different levels of heterochrony and contrast most classical population genetic statistics. We show that even weak levels of heterochrony (approximately 10% of the average depth of a standard population tree) affect the distribution of polymorphism substantially, leading to overestimation of the level of polymorphism theta and to star-like trees, with an excess of rare mutations and a deficit of linkage disequilibrium, which are the hallmark of, e.g., population expansion (possibly after a drastic bottleneck). Substantial departures of the tests are detected in the opposite direction for more heterochroneous and equilibrated datasets, with balanced trees mimicking in particular population contraction, balancing selection, and population differentiation. We therefore introduce simple corrections to classical estimators of polymorphism and of the genetic distance between populations, in order to remove heterochrony-driven bias. Finally, we show that these effects do occur on real aDNA datasets, taking advantage of the currently available sequence data for Cave Bears (Ursus spelaeus), for which large mtDNA haplotypes have been reported over a substantial time period (22-130 thousand years ago (KYA)).
CONCLUSIONS/SIGNIFICANCE Considering serial sampling changed the conclusion of several tests, indicating that neglecting heterochrony could provide significant support for false past history of populations and inappropriate conservation decisions. We therefore argue for systematically considering heterochroneous models when analyzing heterochroneous samples covering a large time scale.
Affiliation(s)
- Frantz Depaulis
- Laboratoire d'Ecologie et Evolution, CNRS UMR 7625, UPMC Paris Universitas, Ecole Normale Supérieure, Paris, France
- Ludovic Orlando
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS, INRA, Ecole Normale Supérieure de Lyon, Lyon, France
- Catherine Hänni
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS, INRA, Ecole Normale Supérieure de Lyon, Lyon, France
|
79
|
Eriksson A, Mahjani B, Mehlig B. Sequential Markov coalescent algorithms for population models with demographic structure. Theor Popul Biol 2009; 76:84-91. [PMID: 19433100 DOI: 10.1016/j.tpb.2009.05.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 01/19/2009] [Revised: 05/04/2009] [Accepted: 05/04/2009] [Indexed: 10/24/2022]
Abstract
We analyse sequential Markov coalescent algorithms for populations with demographic structure: for a bottleneck model, a population-divergence model, and for a two-island model with migration. The sequential Markov coalescent method is an approximation to the coalescent suggested by McVean and Cardin, and by Marjoram and Wall. Within this algorithm we compute, for two individuals randomly sampled from the population, the correlation between times to the most recent common ancestor and the linkage probability corresponding to two different loci with recombination rate R between them. These quantities characterise the linkage between the two loci in question. We find that the sequential Markov coalescent method approximates the coalescent well in general in models with demographic structure. An exception is the case where individuals are sampled from populations separated by reduced gene flow. In this situation, the correlations may be significantly underestimated. We explain why this is the case.
Affiliation(s)
- A Eriksson
- Department of Physics, University of Gothenburg, SE-41296 Gothenburg, Sweden
|
80
|
Tooming-Klunderud A, Fewer DP, Rohrlack T, Jokela J, Rouhiainen L, Sivonen K, Kristensen T, Jakobsen KS. Evidence for positive selection acting on microcystin synthetase adenylation domains in three cyanobacterial genera. BMC Evol Biol 2008; 8:256. [PMID: 18808704 PMCID: PMC2564945 DOI: 10.1186/1471-2148-8-256] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.8] [Received: 03/25/2008] [Accepted: 09/22/2008] [Indexed: 11/30/2022] Open
Abstract
Background Cyanobacteria produce a wealth of secondary metabolites, including the group of small cyclic heptapeptide hepatotoxins that constitutes the microcystin family. The enzyme complex that directs the biosynthesis of microcystin is encoded in a single large gene cluster (mcy). mcy genes have a widespread distribution among cyanobacteria and are likely to have an ancient origin. The notable diversity within some of the Mcy modules is generated through various recombination events including horizontal gene transfer. Results A comparative analysis of the adenylation domains from the first module of McyB (McyB1) and McyC in the microcystin synthetase complex was performed on a large number of microcystin-producing strains from the Anabaena, Microcystis and Planktothrix genera. We found no decisive evidence for recombination between strains from different genera. However, we detected frequent recombination events in the mcyB and mcyC genes between strains within the same genus. Frequent interdomain recombination events were also observed between mcyB and mcyC sequences in Anabaena and Microcystis. Recombination and mutation rate ratios suggest that the diversification of mcyB and mcyC genes is driven by recombination events as well as point mutations in all three genera. Sequence analysis suggests that generally the adenylation domains of the first domain of McyB and McyC are under purifying selection. However, we found clear evidence for positive selection acting on a number of amino acid residues within these adenylation domains. These include residues important for active site selectivity of the adenylation domain, strongly suggesting selection for novel microcystin variants. 
Conclusion We provide the first clear evidence for positive selection acting on amino acid residues involved directly in the recognition and activation of amino acids incorporated into microcystin, indicating that the microcystin complement of a given strain may influence the ability of a particular strain to interact with its environment.
Affiliation(s)
- Ave Tooming-Klunderud
- University of Oslo, Department of Biology, Centre for Ecological and Evolutionary Synthesis, 0316 Oslo, Norway.
|
81
|
Jensen JD, Thornton KR, Andolfatto P. An approximate bayesian estimator suggests strong, recurrent selective sweeps in Drosophila. PLoS Genet 2008; 4:e1000198. [PMID: 18802463 PMCID: PMC2529407 DOI: 10.1371/journal.pgen.1000198] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.0] [Received: 01/24/2008] [Accepted: 08/13/2008] [Indexed: 11/18/2022] Open
Abstract
The recurrent fixation of newly arising, beneficial mutations in a species reduces levels of linked neutral variability. Models positing frequent weakly beneficial substitutions or, alternatively, rare, strongly selected substitutions predict similar average effects on linked neutral variability, if the product of the rate and strength of selection is held constant. We propose an approximate Bayesian (ABC) polymorphism-based estimator that can be used to distinguish between these models, and apply it to multi-locus data from Drosophila melanogaster. We investigate the extent to which inference about the strength of selection is sensitive to assumptions about the underlying distributions of the rates of substitution and recombination, the strength of selection, heterogeneity in mutation rate, as well as the population's demographic history. We show that assuming fixed values of selection parameters in estimation leads to overestimates of the strength of selection and underestimates of the rate. We estimate parameters for an African population of D. melanogaster (ŝ approximately 2E-03, ) and compare these to previous estimates. Finally, we show that surveying larger genomic regions is expected to lend much more discriminatory power to the approach. It will thus be of great interest to apply this method to emerging whole-genome polymorphism data sets in many taxa.
Affiliation(s)
- Jeffrey D Jensen
- Section of Ecology, Behavior and Evolution, University of California San Diego, La Jolla, California, United States of America.
|
82
|
Abstract
In a 2007 article, McVean studied the effect of recombination on linkage disequilibrium (LD) between two neutral loci located near a third locus that has undergone a selective sweep. The results demonstrated that two loci on the same side of a selected locus might show substantial LD, whereas the expected LD for two loci on opposite sides of a selected locus is zero. In this article, we extend McVean's model to include gene conversion. We show that one of the conclusions is strongly affected by gene conversion: when gene conversion is present, there may be substantial LD between two loci on opposite sides of a selective sweep.
|
83
|
Temporal and spatial dynamics of human immunodeficiency virus type 1 circulating recombinant forms 08_BC and 07_BC in Asia. J Virol 2008; 82:9206-15. [PMID: 18596096 DOI: 10.1128/jvi.00399-08] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.3] [Indexed: 11/20/2022] Open
Abstract
Human immunodeficiency virus type 1 (HIV-1) CRF08_BC and CRF07_BC are two major recombinants descended from subtypes B' and C. Despite their massive epidemic impact in China, their migration patterns and divergence times remain unknown. Phylogenetic and population genetic analyses were performed on 228 HIV-1 sequences representing CRF08_BC, CRF07_BC, and subtype C strains from different locations across China, India, and Myanmar. Genome-specific rates of evolution and divergence times were estimated using a Bayesian Markov chain Monte Carlo framework under various evolutionary models. CRF08_BC originated in 1990.3 (95% credible region [CR], 1988.6 to 1991.9) in Yunnan province before spreading to Guangxi (south) and Liaoning (northeast) around 1995. Within the Guangxi region, the eastward expansion of CRF08_BC continued from Baise city (west) to Binyang (central) between 1997 and 1998 and later spread into Pingxiang in the south around 1999, mainly through injecting drug users. Additionally, CRF07_BC diverged from its common ancestor in 1993.3 (95% CR, 1991.2 to 1995.2) before crossing the border into southern Taiwan in the late 1990s. Phylogenetic analysis indicates that both CRF08_BC and CRF07_BC can trace their origins to Yunnan. The parental Indian subtype C lineage likely entered China around 1981.2 (95% CR, 1976.7 to 1985.9). Using a multiple unlinked locus model, we also showed that the dates of divergence calculated in this study may not be significantly affected by intrasubtype recombination among different lineages. This is the first phylodynamic study depicting the spatiotemporal dynamics of HIV/AIDS in East Asia.
|
84
|
Hellmann I, Mang Y, Gu Z, Li P, de la Vega FM, Clark AG, Nielsen R. Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals. Genome Res 2008; 18:1020-9. [PMID: 18411405 PMCID: PMC2493391 DOI: 10.1101/gr.074187.107] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.9] [Received: 11/08/2007] [Accepted: 04/07/2008] [Indexed: 01/25/2023]
Abstract
We introduce a simple, broadly applicable method for obtaining estimates of nucleotide diversity from genomic shotgun sequencing data. The method takes into account the special nature of these data: random sampling of genomic segments from one or more individuals and a relatively high error rate for individual reads. Applying this method to data from the Celera human genome sequencing and SNP discovery project, we obtain estimates of nucleotide diversity in windows spanning the human genome and show that the diversity to divergence ratio is reduced in regions of low recombination. Furthermore, we show that the elevated diversity in telomeric regions is mainly due to elevated mutation rates and not due to decreased levels of background selection. However, we find indications that telomeres as well as centromeres experience greater impact from natural selection than intrachromosomal regions. Finally, we identify a number of genomic regions with increased or reduced diversity compared with the local level of human-chimpanzee divergence and the local recombination rate.
Affiliation(s)
- Ines Hellmann
- Departments of Integrative Biology and Statistics, University of California, Berkeley, California 94720, USA.
|
85
|
Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA, Degnan JH, Wang K, Guerreiro R, Bras JM, Schymick JC, Hernandez DG, Traynor BJ, Simon-Sanchez J, Matarin M, Britton A, van de Leemput J, Rafferty I, Bucan M, Cann HM, Hardy JA, Rosenberg NA, Singleton AB. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 2008; 451:998-1003. [PMID: 18288195 DOI: 10.1038/nature06742] [Citation(s) in RCA: 613] [Impact Index Per Article: 38.3] [Received: 12/02/2007] [Accepted: 01/29/2008] [Indexed: 11/09/2022]
Abstract
Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups. Here we report the analysis of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with increasing geographic distance from Africa, as expected under a serial founder effect for the out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected--including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas--the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations.
Affiliation(s)
- Mattias Jakobsson
- Center for Computational Medicine and Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
|
86
|
Linkage disequilibrium under skewed offspring distribution among individuals in a population. Genetics 2008; 178:1517-32. [PMID: 18245371 DOI: 10.1534/genetics.107.075200] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022] Open
Abstract
Correlations in coalescence times between two loci are derived under selectively neutral population models in which the offspring of an individual can number on the order of the population size. The correlations depend on the rates of recombination and random drift and are shown to be functions of the parameters controlling the size and frequency of these large reproduction events. Since a prediction of linkage disequilibrium can be written in terms of correlations in coalescence times, it follows that the prediction of linkage disequilibrium is a function not only of the rate of recombination but also of the reproduction parameters. Low linkage disequilibrium is predicted if the offspring of a single individual frequently replace almost the entire population. However, high linkage disequilibrium can be predicted if the offspring of a single individual replace an intermediate fraction of the population. In some cases the model reproduces the standard Wright-Fisher predictions. Contrary to common intuition, high linkage disequilibrium can be predicted despite frequent recombination, and low linkage disequilibrium under infrequent recombination. Simulations support the analytical results but show that the variance of linkage disequilibrium is very large.
|
87
|
Tenaillon MI, Austerlitz F, Tenaillon O. Apparent mutational hotspots and long distance linkage disequilibrium resulting from a bottleneck. J Evol Biol 2008; 21:541-50. [PMID: 18205779 DOI: 10.1111/j.1420-9101.2007.01490.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
Abstract
Genome wide patterns of nucleotide diversity and recombination reveal considerable variation including hotspots. Some studies suggest that these patterns are primarily dictated by individual locus history related at a broader scale to the population demographic history. Because bottlenecks have occurred in the history of numerous species, we undertook a simulation approach to investigate their impact on the patterns of aggregation of polymorphic sites and linkage disequilibrium (LD). We developed a new index (Polymorphism Aggregation Index) to characterize this aggregation and showed that variation in the density of polymorphic sites results from an interplay between the bottleneck scenario and the recombination rate. Under particular conditions, aggregation is maximized and apparent mutation hotspots resulting in a 50-fold increase in polymorphic sites density can occur. In similar conditions, long distance LD can be detected.
Affiliation(s)
- M I Tenaillon
- UMR8120 de Génétique Végétale, INRA/Univ. Paris-Sud/CNRS/AgroParisTech, Ferme du Moulon, Gif-sur-Yvette, France.
|
88
|
Macpherson JM, González J, Witten DM, Davis JC, Rosenberg NA, Hirsh AE, Petrov DA. Nonadaptive explanations for signatures of partial selective sweeps in Drosophila. Mol Biol Evol 2008; 25:1025-42. [PMID: 18199829 DOI: 10.1093/molbev/msn007] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022] Open
Abstract
A beneficial mutation that has nearly but not yet fixed in a population produces a characteristic haplotype configuration, called a partial selective sweep. Whether nonadaptive processes might generate similar haplotype configurations has not been extensively explored. Here, we consider 5 population genetic data sets taken from regions flanking high-frequency transposable elements in North American strains of Drosophila melanogaster, each of which appears to be consistent with the expectations of a partial selective sweep. We use coalescent simulations to explore whether incorporation of the species' demographic history, purifying selection against the element, or suppression of recombination caused by the element could generate putatively adaptive haplotype configurations. Whereas most of the data sets would be rejected as nonneutral under the standard neutral null model, only the data set for which there is strong external evidence in support of an adaptive transposition appears to be nonneutral under the more complex null model and in particular when demography is taken into account. High-frequency, derived mutations from a recently bottlenecked population, such as we study here, are of great interest to evolutionary genetics in the context of scans for adaptive events; we discuss the broader implications of our findings in this context.
|
89
|
Slate J, Pemberton JM. Admixture and patterns of linkage disequilibrium in a free-living vertebrate population. J Evol Biol 2007; 20:1415-27. [PMID: 17584236 DOI: 10.1111/j.1420-9101.2007.01339.x] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Indexed: 12/20/2022]
Abstract
Linkage disequilibrium (LD), a measure of nonrandom association of alleles at different loci, is of great interest to evolutionary geneticists as it can be used to help identify loci that explain phenotypic variation. Surveys of the extent of LD across genomes have been carried out in a number of systems, most notably humans and model organisms. However, studies of natural populations of vertebrates have rarely been performed. Here, we describe an investigation of LD in a free-living island population of red deer Cervus elaphus. Relatively high levels of LD extended several tens of centimorgans, and significant LD was frequently detected between unlinked markers. The magnitude of LD varied depending on how the population was sampled. It also varied across different chromosomes, and was shown to be a function of sample size, intermarker distance and marker heterozygosity. A recent admixture event in the population led to an ephemeral increase in LD. Association mapping may be possible in this population, although a high 'baseline' level of LD could lead to false positive associations between marker loci and a trait of interest.
Affiliation(s)
- J Slate
- Department of Animal & Plant Sciences, University of Sheffield, Sheffield, UK.
|
90
|
Kamau E, Charlesworth B, Charlesworth D. Linkage disequilibrium and recombination rate estimates in the self-incompatibility region of Arabidopsis lyrata. Genetics 2007; 176:2357-69. [PMID: 17565949 PMCID: PMC1950637 DOI: 10.1534/genetics.107.072231] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Received: 02/16/2007] [Accepted: 05/17/2007] [Indexed: 11/18/2022] Open
Abstract
Genetic diversity is unusually high at loci in the S-locus region of the self-incompatible species of the flowering plant, Arabidopsis lyrata, not just in the S loci themselves, but also at two nearby loci. In a previous study of a single natural population from Iceland, we attributed this elevated polymorphism to linkage disequilibrium (LD) between variants at loci close to the S locus and the S alleles, which are maintained in the population by balancing selection. With the four S-flanking loci whose diversity we previously studied, we could not determine the extent of the region linked to the S loci in which neutral sites are affected. We also could not exclude the possibility of a population bottleneck, or of admixture, as causes of the LD. We have now studied four more distant loci flanking the S-locus region, and more populations, and we analyze the results using a theoretical model of the effect of balancing selection on diversity at linked neutral sites within and between different functional S-allelic classes. In the model, diversity is a function of the number of selectively maintained alleles and the recombination distances from the selectively maintained sites. We use the model to estimate the number of different functional S alleles, their turnover rate, and recombination rates between the S-locus region and other loci. Our estimates suggest that there is a small region of very low recombination surrounding the S-locus region.
Affiliation(s)
- Esther Kamau
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, W. Mains Road, Edinburgh, United Kingdom
|
91
|
Zauner H, Mayer WE, Herrmann M, Weller A, Erwig M, Sommer RJ. Distinct patterns of genetic variation in Pristionchus pacificus and Caenorhabditis elegans, two partially selfing nematodes with cosmopolitan distribution. Mol Ecol 2007; 16:1267-80. [PMID: 17391412 DOI: 10.1111/j.1365-294x.2006.03222.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Hermaphroditism has evolved several times independently in nematodes. The model organism Caenorhabditis elegans and Pristionchus pacificus are self-fertile hermaphrodites with rare facultative males. Both species are members of different families: C. elegans belongs to the Rhabditidae and P. pacificus to the Diplogastridae. Also, both species differ in their ecology: C. elegans is a soil-dwelling nematode that is often found in compost heaps. In contrast, field studies in Europe and North America indicate that Pristionchus nematodes are closely associated with scarab beetles. In C. elegans, several recent studies have found low genetic diversity and rare out-crossing events. Little is known about diversity levels and population structure in free-living hermaphroditic nematodes outside the genus Caenorhabditis. Taking a comparative approach, we analyse patterns of molecular diversity and linkage disequilibrium in 18 strains of P. pacificus from eight countries and four continents. Mitochondrial sequence data of P. pacificus isolates reveal a substantially higher genetic diversity on a global scale when compared to C. elegans. A mitochondrial-derived hermaphrodite phylogeny shows little geographic structuring, indicating several worldwide dispersal events. Amplified fragment length polymorphism and single strand conformation polymorphism analyses demonstrate a high degree of genome-wide linkage disequilibrium, which also extends to the mitochondrial genome. Together, these findings indicate distinct patterns of genetic variation of the two species. The low level of genetic diversity observed in C. elegans might reflect a recent human-associated dispersal, whereas the P. pacificus diversity might reflect a long-lasting and ongoing insect association. Thus, despite similar lifestyle characteristics in the laboratory, the reproductive mode of hermaphroditism with rare facultative males can result in distinct genetic variability patterns in different ecological settings.
Affiliation(s)
- Hans Zauner
- Max Planck Institute for Developmental Biology, Department of Evolutionary Biology, 72076 Tübingen, Germany
|
92
|
Thornton KR, Jensen JD, Becquet C, Andolfatto P. Progress and prospects in mapping recent selection in the genome. Heredity (Edinb) 2007; 98:340-8. [PMID: 17473869 DOI: 10.1038/sj.hdy.6800967] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.6] [Indexed: 11/09/2022] Open
Abstract
One of the central goals of evolutionary biology is to understand the genetic basis of adaptive evolution. The availability of nearly complete genome sequences from a variety of organisms has facilitated the collection of data on naturally occurring genetic variation on the scale of hundreds of loci to whole genomes. Such data have changed the focus of molecular population genetics from making inferences about adaptive evolution at single loci to identifying which loci, out of hundreds to thousands, have been recent targets of natural selection. A major challenge in this effort is distinguishing the effects of selection from those of the demographic history of populations. Here we review some current progress and remaining challenges in the field.
Affiliation(s)
- K R Thornton
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
|
93
|
Visscher PM. Variation of estimates of SNP and haplotype diversity and linkage disequilibrium in samples from the same population due to experimental and evolutionary sample size. Ann Hum Genet 2007; 71:119-26. [PMID: 17227482 DOI: 10.1111/j.1469-1809.2006.00305.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
Studies of genetic polymorphisms and diversity between and within human populations are increasingly characterised by a very large number of genetic markers but using a relatively small number of individuals from which DNA samples were taken. In this report we examine the limitations of a small experimental sample size relative to a large genomic sample size, and quantify the sampling variance of a number of measures of diversity and linkage disequilibrium. The relationship between sample size and observed levels of polymorphism and haplotype diversity at the level of a gene is investigated under a neutral model of sequence evolution, using coalescent simulations. It is shown that the effect of evolutionary sampling, as manifested by differences between samples (genes) in measures of diversity estimated using very large sample sizes, is substantial, with a coefficient of variation of the number of detected polymorphic SNPs or haplotypes in the order of 15%. The effect of experimental design (sample size) is also very large, and a number of 'significant' results reported in the literature can be explained by sampling alone. The expected correlation coefficient of measures of linkage disequilibrium across samples from the same population has been quantified and found to be consistent with empirical estimates from the literature.
Affiliation(s)
- P M Visscher
- Queensland Institute of Medical Research, Brisbane, Australia.
|
94
|
Tenesa A, Navarro P, Hayes BJ, Duffy DL, Clarke GM, Goddard ME, Visscher PM. Recent human effective population size estimated from linkage disequilibrium. Genome Res 2007; 17:520-6. [PMID: 17351134 PMCID: PMC1832099 DOI: 10.1101/gr.6023607] [Citation(s) in RCA: 291] [Impact Index Per Article: 17.1] [Indexed: 11/24/2022]
Abstract
Effective population size (N_e) determines the amount of genetic variation, genetic drift, and linkage disequilibrium (LD) in populations. Here, we present the first genome-wide estimates of human effective population size from LD data. Chromosome-specific effective population size was estimated for all autosomes and the X chromosome from estimated LD between SNP pairs <100 kb apart. We account for variation in recombination rate by using coalescent-based estimates of fine-scale recombination rate from one sample and correlating these with LD in an independent sample. Phase I of the HapMap project produced between 18 and 22 million SNP pairs in samples from four populations: Yoruba from Ibadan (YRI), Nigeria; Japanese from Tokyo (JPT); Han Chinese from Beijing (HCB); and residents from Utah with ancestry from northern and western Europe (CEU). For CEU, JPT, and HCB, the estimate of effective population size, adjusted for SNP ascertainment bias, was approximately 3100, whereas the estimate for the YRI was approximately 7500, consistent with the out-of-Africa theory of ancestral human population expansion and concurrent bottlenecks. We show that the decay in LD over distance between SNPs is consistent with recent population growth. The estimates of N_e are lower than previously published estimates based on heterozygosity, possibly because they represent one or more bottlenecks in human population size that occurred approximately 10,000 to 200,000 years ago.
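As a rough illustration of how an effective population size can be backed out of LD data, the sketch below inverts Sved's (1971) approximation E[r^2] ≈ 1/(1 + 4·Ne·c), where c is the recombination distance in Morgans. This is an assumption made for illustration only; it is not necessarily the estimator, recombination map, or ascertainment correction used in the study above. The function name `ne_from_r2` and the optional 1/n sampling correction are hypothetical.

```python
def ne_from_r2(mean_r2, c_morgans, n_haplotypes=None):
    """Estimate effective population size (Ne) from the mean r^2 of SNP pairs.

    Assumes Sved's approximation E[r^2] ~ 1 / (1 + 4 * Ne * c). This is an
    illustrative choice, not the estimator of any particular paper.
    """
    if c_morgans <= 0.0:
        raise ValueError("recombination distance must be positive")
    r2 = mean_r2
    if n_haplotypes is not None:
        # Optional (assumed) correction: finite sampling inflates r^2 by ~1/n.
        r2 -= 1.0 / n_haplotypes
    if not 0.0 < r2 < 1.0:
        raise ValueError("corrected r^2 must lie strictly between 0 and 1")
    # Invert r2 = 1 / (1 + 4 * Ne * c) for Ne.
    return (1.0 / r2 - 1.0) / (4.0 * c_morgans)


if __name__ == "__main__":
    # SNP pairs 0.1 cM apart (c = 0.001 Morgans) with mean r^2 = 0.2.
    print(ne_from_r2(0.2, 0.001))
```

Under this approximation, a mean r^2 of 0.2 at 0.1 cM corresponds to an Ne of about 1000; smaller r^2 at the same distance implies a larger Ne.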
Affiliation(s)
- Albert Tenesa
- Colon Cancer Genetics Group, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, United Kingdom
- MRC Human Genetics Unit, Western General Hospital, Edinburgh EH4 2XU, United Kingdom
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
- Pau Navarro
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
- Ben J. Hayes
- Victorian Institute of Animal Science, DPI, Attwood 3049, Australia
- David L. Duffy
- Queensland Institute of Medical Research, Royal Brisbane Hospital, Brisbane 4006, Australia
- Geraldine M. Clarke
- The Wellcome Trust Centre for Human Genetics, The University of Oxford, Oxford OX3 7BN, United Kingdom
- Mike E. Goddard
- Victorian Institute of Animal Science, DPI, Attwood 3049, Australia
- Institute of Land and Food Resources, University of Melbourne, Parkville 3010, Australia
- Peter M. Visscher
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
- Queensland Institute of Medical Research, Royal Brisbane Hospital, Brisbane 4006, Australia
- Corresponding author. Fax +61-7-3362-0101
|
95
|
Abstract
The fixation of advantageous mutations by natural selection has a profound impact on patterns of linked neutral variation. While it has long been appreciated that such selective sweeps influence the frequency spectrum of nearby polymorphism, it has only recently become clear that they also have dramatic effects on local linkage disequilibrium. By extending previous results on the relationship between genealogical structure and linkage disequilibrium, I obtain simple expressions for the influence of a selective sweep on patterns of allelic association. I show that sweeps can increase, decrease, or even eliminate linkage disequilibrium (LD) entirely depending on the relative position of the selected and neutral loci. I also show the importance of the age of the neutral mutations in predicting their degree of association and describe the consequences of such results for the interpretation of empirical data. In particular, I demonstrate that while selective sweeps can eliminate LD, they generate patterns of genetic variation very different from those expected from recombination hotspots.
Affiliation(s)
- Gil McVean
- Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom.
|
96
|
Minichiello MJ, Durbin R. Mapping trait loci by use of inferred ancestral recombination graphs. Am J Hum Genet 2006; 79:910-22. [PMID: 17033967 PMCID: PMC1698562 DOI: 10.1086/508901] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Received: 03/31/2006] [Accepted: 09/01/2006] [Indexed: 12/26/2022] Open
Abstract
Large-scale association studies are being undertaken with the hope of uncovering the genetic determinants of complex disease. We describe a computationally efficient method for inferring genealogies from population genotype data and show how these genealogies can be used to fine map disease loci and interpret association signals. These genealogies take the form of the ancestral recombination graph (ARG). The ARG defines a genealogical tree for each locus, and, as one moves along the chromosome, the topologies of consecutive trees shift according to the impact of historical recombination events. There are two stages to our analysis. First, we infer plausible ARGs, using a heuristic algorithm, which can handle unphased and missing data and is fast enough to be applied to large-scale studies. Second, we test the genealogical tree at each locus for a clustering of the disease cases beneath a branch, suggesting that a causative mutation occurred on that branch. Since the true ARG is unknown, we average this analysis over an ensemble of inferred ARGs. We have characterized the performance of our method across a wide range of simulated disease models. Compared with simpler tests, our method gives increased accuracy in positioning untyped causative loci and can also be used to estimate the frequencies of untyped causative alleles. We have applied our method to Ueda et al.'s association study of CTLA4 and Graves disease, showing how it can be used to dissect the association signal, giving potentially interesting results of allelic heterogeneity and interaction. Similar approaches analyzing an ensemble of ARGs inferred using our method may be applicable to many other problems of inference from population genotype data.
Affiliation(s)
- Mark J Minichiello
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, United Kingdom
|
97
|
Fraser DJ, Jones MW, McParland TL, Hutchings JA. Loss of historical immigration and the unsuccessful rehabilitation of extirpated salmon populations. Conserv Genet 2006. [DOI: 10.1007/s10592-006-9188-8] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
|
98
|
Song YS, Song JS. Analytic computation of the expectation of the linkage disequilibrium coefficient r2. Theor Popul Biol 2006; 71:49-60. [PMID: 17069867 DOI: 10.1016/j.tpb.2006.09.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 02/14/2006] [Revised: 06/25/2006] [Accepted: 09/13/2006] [Indexed: 11/19/2022]
Abstract
The squared correlation coefficient r² (sometimes denoted Δ²) is a widely used measure of linkage disequilibrium, but computing its expectation E[r²] in the population has remained an intriguing open problem. The expectation E[r²] is often approximated by the standard linkage deviation σ_d², which is a ratio of two expectations amenable to analytic computation. In this paper, a method of computing the population-wide E[r²] is introduced for a model with recurrent mutation, genetic drift and recombination. The approach is algebraic and is based on the diffusion process approximation. In the limit as the population-scaled recombination rate ρ approaches infinity, it is shown rigorously that the asymptotic behavior of E[r²] is given by 1/ρ + O(ρ⁻²), which, incidentally, is the same as that of σ_d². Software that computes E[r²] numerically is available upon request.
Affiliation(s)
- Yun S Song
- Department of Computer Science, University of California at Davis, 2063 Kemper Hall, One Shields Avenue, Davis, CA 95616, USA.
|
99
|
Pennings PS, Hermisson J. Soft sweeps III: the signature of positive selection from recurrent mutation. PLoS Genet 2006; 2:e186. [PMID: 17173482 PMCID: PMC1698945 DOI: 10.1371/journal.pgen.0020186] [Citation(s) in RCA: 200] [Impact Index Per Article: 11.1] [Received: 05/19/2006] [Accepted: 09/14/2006] [Indexed: 11/18/2022] Open
Abstract
Polymorphism data can be used to identify loci at which a beneficial allele has recently gone to fixation, given that an accurate description of the signature of selection is available. In the classical model that is used, a favored allele derives from a single mutational origin. This ignores the fact that beneficial alleles can enter a population recurrently by mutation during the selective phase. In this study, we present a combination of analytical and simulation results to demonstrate the effect of adaptation from recurrent mutation on summary statistics for polymorphism data from a linked neutral locus. We also analyze the power of standard neutrality tests based on the frequency spectrum or on linkage disequilibrium (LD) under this scenario. For recurrent beneficial mutation at biologically realistic rates, we find substantial deviations from the classical pattern of a selective sweep from a single new mutation. Deviations from neutrality in the level of polymorphism and in the frequency spectrum are much less pronounced than in the classical sweep pattern. In contrast, for levels of LD, the signature is even stronger if recurrent beneficial mutation plays a role. We suggest a variant of existing LD tests that increases their power to detect this signature.
Populations adapt to their environment through fixation of beneficial alleles. Such fixation events leave a signature in neutral DNA variation of the population. An accurate description of this signature, also called a selective sweep, can be used to identify genes that have been involved in recent adaptations. The classical model of a selective sweep assumes that the beneficial allele was created only once by mutation, whereas the authors have shown, in a previous paper, that this assumption does not always hold. If a substitution involves multiple copies of an allele that have originated by independent mutation, it leads to a different signature, which the authors call a soft selective sweep.
In this study, Pennings and Hermisson use analytical tools and coalescent simulations to describe this soft-sweep pattern. They show that this pattern is characterized by strong linkage disequilibrium. They also analyze the power of standard tests of neutrality to detect this pattern and suggest a variant of existing linkage-disequilibrium–based tests that increases the power to detect positive selection in the form of a soft selective sweep.
Affiliation(s)
- Pleuni S Pennings
- Section of Evolutionary Biology, Department Biology II, Ludwig-Maximilians-University Munich, Planegg-Martinsried, Germany.
|
100
|
Ruderfer DM, Pratt SC, Seidel HS, Kruglyak L. Population genomic analysis of outcrossing and recombination in yeast. Nat Genet 2006; 38:1077-81. [PMID: 16892060 DOI: 10.1038/ng1859] [Citation(s) in RCA: 168] [Impact Index Per Article: 9.3] [Received: 05/03/2006] [Accepted: 07/10/2006] [Indexed: 11/09/2022]
Abstract
The budding yeast Saccharomyces cerevisiae has been used by humans for millennia to make wine, beer and bread. More recently, it became a key model organism for studies of eukaryotic biology and for genomic analysis. However, relatively little is known about the natural lifestyle and population genetics of yeast. One major question is whether genetically diverse yeast strains mate and recombine in the wild. We developed a method to infer the evolutionary history of a species from genome sequences of multiple individuals and applied it to whole-genome sequence data from three strains of Saccharomyces cerevisiae and the sister species Saccharomyces paradoxus. We observed a pattern of sequence variation among yeast strains in which ancestral recombination events lead to a mosaic of segments with shared genealogy. Based on sequence divergence and the inferred median size of shared segments (approximately 2,000 bp), we estimated that although any two strains have undergone approximately 16 million cell divisions since their last common ancestor, only 314 outcrossing events have occurred during this time (roughly one every 50,000 divisions). Local correlations in polymorphism rates indicate that linkage disequilibrium in yeast should extend over kilobases. Our results provide the initial foundation for population studies of association between genotype and phenotype in S. cerevisiae.
Affiliation(s)
- Douglas M Ruderfer
- Lewis-Sigler Institute for Integrative Genomics and Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey 08544, USA.
|