51
Bhaskar A, Wang YXR, Song YS. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res 2015; 25:268-79. [PMID: 25564017 PMCID: PMC4315300 DOI: 10.1101/gr.178756.114] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 01/23/2023]
Abstract
With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions.
Affiliation(s)
- Anand Bhaskar
- Simons Institute for the Theory of Computing, Berkeley, California 94720, USA; Computer Science Division, University of California, Berkeley, California 94720, USA
- Y X Rachel Wang
- Department of Statistics, University of California, Berkeley, California 94720, USA
- Yun S Song
- Simons Institute for the Theory of Computing, Berkeley, California 94720, USA; Computer Science Division, University of California, Berkeley, California 94720, USA; Department of Statistics, University of California, Berkeley, California 94720, USA; Department of Integrative Biology, University of California, Berkeley, California 94720, USA
52
Uricchio LH, Torres R, Witte JS, Hernandez RD. Population genetic simulations of complex phenotypes with implications for rare variant association tests. Genet Epidemiol 2014; 39:35-44. [PMID: 25417809 DOI: 10.1002/gepi.21866] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/17/2014] [Revised: 09/09/2014] [Accepted: 09/26/2014] [Indexed: 12/12/2022]
Abstract
Demographic events and natural selection alter patterns of genetic variation within populations and may play a substantial role in shaping the genetic architecture of complex phenotypes and disease. However, the joint impact of these basic evolutionary forces is often ignored in the assessment of statistical tests of association. Here, we provide a simulation-based framework for generating DNA sequences that incorporates selection and demography with flexible models for simulating phenotypic variation (sfs_coder). This tool also allows the user to perform locus-specific simulations by automatically querying annotated genomic functional elements and genetic maps. We demonstrate the effects of evolutionary forces on patterns of genetic variation by simulating recently inferred models of human selection and demography. We use these simulations to show that the demographic model and locus-specific features, such as the proportion of sites under selection, may have practical implications for estimating the statistical power of sequencing-based rare variant association tests. In particular, for some phenotype models, there may be higher power to detect rare variant associations in African populations compared to non-Africans, but power is considerably reduced in regions of the genome with rampant negative selection. Furthermore, we show that existing methods for simulating large samples based on resampling from a small set of observed haplotypes fail to recapitulate the distribution of rare variants in the presence of rapid population growth (as has been observed in several human populations).
Affiliation(s)
- Lawrence H Uricchio
- Graduate Program in Bioinformatics, University of California, San Francisco, California, United States of America
53
Wilson BA, Petrov DA, Messer PW. Soft selective sweeps in complex demographic scenarios. Genetics 2014; 198:669-84. [PMID: 25060100 PMCID: PMC4266194 DOI: 10.1534/genetics.114.165571] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 04/21/2014] [Accepted: 07/16/2014] [Indexed: 01/07/2023] Open
Abstract
Adaptation from de novo mutation can produce so-called soft selective sweeps, where adaptive alleles of independent mutational origin sweep through the population at the same time. Population genetic theory predicts that such soft sweeps should be likely if the product of the population size and the mutation rate toward the adaptive allele is sufficiently large, such that multiple adaptive mutations can establish before one has reached fixation; however, it remains unclear how demographic processes affect the probability of observing soft sweeps. Here we extend the theory of soft selective sweeps to realistic demographic scenarios that allow for changes in population size over time. We first show that population bottlenecks can lead to the removal of all but one adaptive lineage from an initially soft selective sweep. The parameter regime under which such "hardening" of soft selective sweeps is likely is determined by a simple heuristic condition. We further develop a generalized analytical framework, based on an extension of the coalescent process, for calculating the probability of soft sweeps under arbitrary demographic scenarios. Two important limits emerge within this analytical framework: In the limit where population-size fluctuations are fast compared to the duration of the sweep, the likelihood of soft sweeps is determined by the harmonic mean of the variance effective population size estimated over the duration of the sweep; in the opposing slow fluctuation limit, the likelihood of soft sweeps is determined by the instantaneous variance effective population size at the onset of the sweep. We show that as a consequence of this finding the probability of observing soft sweeps becomes a function of the strength of selection. Specifically, in species with sharply fluctuating population size, strong selection is more likely to produce soft sweeps than weak selection. Our results highlight the importance of accurate demographic estimates over short evolutionary timescales for understanding the population genetics of adaptation from de novo mutation.
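The fast-fluctuation limit in this abstract turns on a harmonic mean of population size over the sweep. The sketch below only illustrates why that choice of mean matters; the population sizes are made-up values, the function name is hypothetical, and plain per-generation sizes stand in for the variance effective population size that the paper actually uses.

```python
from statistics import harmonic_mean


def harmonic_mean_size(sizes):
    """Harmonic mean of per-generation population sizes (all must be > 0)."""
    return harmonic_mean(sizes)


# Hypothetical trajectory: nine generations at 1,000,000 and one bottleneck
# generation at 1,000. These numbers are illustrative, not from the paper.
sizes = [1_000_000] * 9 + [1_000]

arithmetic = sum(sizes) / len(sizes)
harmonic = harmonic_mean_size(sizes)

print(f"arithmetic mean: {arithmetic:,.0f}")
print(f"harmonic mean:   {harmonic:,.0f}")
```

With these illustrative numbers the arithmetic mean stays near 900,000 while the harmonic mean falls to roughly 9,900, because the harmonic mean is dominated by the smallest sizes in the series.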
Affiliation(s)
- Benjamin A Wilson
- Department of Biology, Stanford University, Stanford, California 94305
- Dmitri A Petrov
- Department of Biology, Stanford University, Stanford, California 94305
- Philipp W Messer
- Department of Biology, Stanford University, Stanford, California 94305
54
Inferring population structure and demographic history using Y-STR data from worldwide populations. Mol Genet Genomics 2014; 290:141-50. [PMID: 25159112 DOI: 10.1007/s00438-014-0903-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/21/2014] [Accepted: 08/17/2014] [Indexed: 10/24/2022]
Abstract
The Y chromosome is one of the best genetic materials for exploring the evolutionary history of human populations. Global analyses of Y-chromosomal short tandem repeat (STR) data can reveal worldwide population structures and histories. However, previous Y-STR studies tended to focus on small geographical ranges or included only limited sample sizes. In this study, we investigated population structure and demographic history using data from 17 Y-chromosomal STRs in 979 males from 44 worldwide populations. The largest genetic distances were observed between pairs of African and non-African populations. American populations, which had the lowest genetic diversities, also showed large genetic distances and coancestry coefficients with other populations, whereas Eurasian populations displayed close genetic affinities. African populations tend to have the oldest times to the most recent common ancestor (TMRCAs), the largest effective population sizes, and the earliest expansion times, whereas the American, Siberian, Melanesian, and isolated Atayal populations have the most recent TMRCAs and expansion times and the smallest effective population sizes. This clear geographic pattern is consistent with the serial founder model for the origin of populations outside Africa. The Y-STR dataset presented here provides the most detailed view of worldwide population structure and human male demographic history, and will additionally be of great benefit to future forensic applications and population genetic studies.
55
Impact of range expansions on current human genomic diversity. Curr Opin Genet Dev 2014; 29:22-30. [PMID: 25156518 DOI: 10.1016/j.gde.2014.07.007] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 06/07/2014] [Revised: 07/09/2014] [Accepted: 07/25/2014] [Indexed: 12/19/2022]
Abstract
The patterns of population genetic diversity depend to a large extent on past demographic history. Most human populations are known to have recently gone through a series of range expansions within and out of Africa, but these spatial expansions are rarely taken into account when interpreting observed genomic diversity, possibly because they are difficult to model. Here we review available evidence in favour of range expansions out of Africa, and we discuss several of their consequences for neutral and selected diversity, including some recent observations of an excess of rare neutral and selected variants in large samples. We further show that in spatially subdivided populations, the sampling strategy can severely impact the resulting genetic diversity and be confounded by past demography. We conclude that ignoring the spatial structure of human populations can lead to misinterpretations of extant genetic diversity.
56
Abstract
Evolutionary processes of natural selection may be expected to leave their mark on age patterns of survival and reproduction. Demographic theory includes three main strands: mutation accumulation, stochastic vitality, and optimal life histories. This paper reviews the three strands and, concentrating on mutation accumulation, extends a mathematical result with broad implications concerning the effect of interactions between small age-specific effects of deleterious mutant alleles. Empirical data from genomic sequencing along with prospects for combining strands of theory hold hope for future progress.
57
Saint Pierre A, Genin E. How important are rare variants in common disease? Brief Funct Genomics 2014; 13:353-61. [DOI: 10.1093/bfgp/elu025] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022] Open
58
Arbiza L, Gottipati S, Siepel A, Keinan A. Contrasting X-linked and autosomal diversity across 14 human populations. Am J Hum Genet 2014; 94:827-44. [PMID: 24836452 DOI: 10.1016/j.ajhg.2014.04.011] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 03/05/2014] [Accepted: 04/15/2014] [Indexed: 12/29/2022] Open
Abstract
Contrasting the genetic diversity of the human X chromosome (X) and autosomes has facilitated understanding historical differences between males and females and the influence of natural selection. Previous studies based on smaller data sets have left questions regarding how empirical patterns extend to additional populations and which forces can explain them. Here, we address these questions by analyzing the ratio of X-to-autosomal (X/A) nucleotide diversity with the complete genomes of 569 females from 14 populations. Results show that X/A diversity is similar within each continental group but notably lower in European (EUR) and East Asian (ASN) populations than in African (AFR) populations. X/A diversity increases in all populations with increasing distance from genes, highlighting the stronger impact of diversity-reducing selection on X than on the autosomes. However, relative X/A diversity (between two populations) is invariant with distance from genes, suggesting that selection does not drive the relative reduction in X/A diversity in non-Africans (0.842 ± 0.012 for EUR-to-AFR and 0.820 ± 0.032 for ASN-to-AFR comparisons). Finally, an array of models with varying population bottlenecks, expansions, and migration from the latest studies of human demographic history account for about half of the observed reduction in relative X/A diversity from the expected value of 1. They predict values between 0.91 and 0.94 for EUR-to-AFR comparisons and between 0.91 and 0.92 for ASN-to-AFR comparisons. Further reductions can be predicted by more extreme demographic events in excess of those captured by the latest studies but, in the absence of these, also by historical sex-biased demographic events or other processes.
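The "about half" statement in this abstract can be checked against the figures it quotes. The sketch below is a back-of-the-envelope calculation only: the variable and function names are illustrative, the reported standard errors are ignored, and the numbers are copied from the abstract rather than from the paper's data.

```python
# Relative X/A diversity: the expected value with no reduction is 1.
EXPECTED = 1.0

# Observed point estimates quoted in the abstract (error bars ignored here).
observed = {"EUR-to-AFR": 0.842, "ASN-to-AFR": 0.820}

# Range of values predicted by the demographic models, per the abstract.
predicted = {"EUR-to-AFR": (0.91, 0.94), "ASN-to-AFR": (0.91, 0.92)}


def explained_fraction(obs, pred):
    """Share of the observed reduction from 1 that a predicted value accounts for."""
    return (EXPECTED - pred) / (EXPECTED - obs)


fractions = {}
for pair, obs in observed.items():
    low_pred, high_pred = predicted[pair]
    # A lower predicted ratio explains a larger share of the reduction,
    # so the high prediction gives the lower bound of the explained share.
    fractions[pair] = (
        explained_fraction(obs, high_pred),
        explained_fraction(obs, low_pred),
    )
    print(pair, [round(f, 2) for f in fractions[pair]])
```

On these point estimates the models account for somewhere between roughly two-fifths and three-fifths of the reduction, which is in line with the abstract's wording.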
59
Lohmueller KE. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet 2014; 10:e1004379. [PMID: 24875776 PMCID: PMC4038606 DOI: 10.1371/journal.pgen.1004379] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 06/21/2013] [Accepted: 03/28/2014] [Indexed: 02/06/2023] Open
Abstract
Population genetic studies have found evidence for dramatic population growth in recent human history. It is unclear how this recent population growth, combined with the effects of negative natural selection, has affected patterns of deleterious variation, as well as the number, frequency, and effect sizes of mutations that contribute risk to complex traits. Because researchers are performing exome sequencing studies aimed at uncovering the role of low-frequency variants in the risk of complex traits, this topic is of critical importance. Here I use simulations under population genetic models where a proportion of the heritability of the trait is accounted for by mutations in a subset of the exome. I show that recent population growth increases the proportion of nonsynonymous variants segregating in the population, but does not affect the genetic load relative to a population that did not expand. Under a model where a mutation's effect on a trait is correlated with its effect on fitness, rare variants explain a greater portion of the additive genetic variance of the trait in a population that has recently expanded than in a population that did not recently expand. Further, when using a single-marker test, for a given false-positive rate and sample size, recent population growth decreases the expected number of significant associations with the trait relative to the number detected in a population that did not expand. However, in a model where there is no correlation between a mutation's effect on fitness and the effect on the trait, common variants account for much of the additive genetic variance, regardless of demography. Moreover, here demography does not affect the number of significant associations detected. These findings suggest recent population history may be an important factor influencing the power of association tests and in accounting for the missing heritability of certain complex traits.
Affiliation(s)
- Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
60
Gao F, Keinan A. High burden of private mutations due to explosive human population growth and purifying selection. BMC Genomics 2014; 15 Suppl 4:S3. [PMID: 25056720 PMCID: PMC4083409 DOI: 10.1186/1471-2164-15-s4-s3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 01/15/2023] Open
Abstract
Background: Recent studies have shown that human populations have experienced a complex demographic history, including a recent epoch of rapid population growth that led to an excess in the proportion of rare genetic variants in humans today. This excess can impact the burden of private mutations for each individual, defined here as the proportion of heterozygous variants in each newly sequenced individual that are novel compared to another large sample of sequenced individuals. Results: We calculated the burden of private mutations predicted by different demographic models and compared it with empirical estimates based on data from the NHLBI Exome Sequencing Project and data from the Neutral Regions (NR) dataset. We observed a significant excess in the proportion of private mutations in the empirical data compared with models of demographic history without a recent epoch of population growth. Incorporating recent growth into the model provides a much improved fit to empirical observations. This phenomenon becomes more marked for larger sample sizes; e.g., extrapolating to a scenario in which 10,000 individuals from the same population have been sequenced with perfect accuracy, about 1 in 400 heterozygous sites (or about 6,000 variants) in the 10,001st individual are still predicted to be novel, 18 times the number predicted in the absence of recent population growth. The proportion of private mutations is additionally increased by purifying selection, which differentially affects mutations of different functional annotations. Conclusions: The burden of private mutations for each individual, which are singletons (i.e., appearing in a single copy) in a larger sample that includes this individual, is predicted to be greatly increased by recent population growth, as well as by purifying selection. Comparison with empirical data supports the conclusion that European populations have experienced recent rapid population growth, consistent with previous studies. These results have important implications for the design and analysis of sequencing-based association studies of complex human disease as they pertain to private and very rare variants. They also imply that personalized genomics will indeed have to be very personal in accounting for the large number of private mutations.
61
Abstract
Understanding the forces that shape patterns of genetic variation across the genome is a major aim in evolutionary genetics. An emerging insight from analyses of genome-wide polymorphism and divergence data is that selection on linked sites can have an important impact on neutral genetic variation. However, in contrast to Drosophila, which exhibits a signature of recurrent hitchhiking, many plant genomes studied so far seem to mainly be affected by background selection. Moreover, many plants do not exhibit classic signatures of linked selection, such as a correlation between recombination rate and neutral diversity. In this review, I discuss the impact of genome architecture and mating system on the expected signature of linked selection in plants and review empirical evidence for linked selection, with a focus on plant model systems. Finally, I discuss the implications of linked selection for inference of demographic history in plants.
62