1
|
Forien R, Ringbauer H, Coop G. Demographic inference for spatially heterogeneous populations using long shared haplotypes. Theor Popul Biol 2024:S0040-5809(24)00028-5. [PMID: 38492811 DOI: 10.1016/j.tpb.2024.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 03/04/2024] [Accepted: 03/12/2024] [Indexed: 03/18/2024]
Abstract
We introduce a modified spatial Λ-Fleming-Viot process to model the ancestry of individuals in a population occupying a continuous spatial habitat divided into two areas by a sharp discontinuity of the dispersal rate and effective population density. We derive an analytical formula for the expected number of shared haplotype segments between two individuals depending on their sampling locations. This formula involves the transition density of a skew diffusion which appears as a scaling limit of the ancestral lineages of individuals in this model. We then show that this formula can be used to infer the dispersal parameters and the effective population density of both regions, using a composite likelihood approach, and we demonstrate the efficiency of this method on a range of simulated data sets.
Affiliation(s)
- Raphaël Forien
- INRAE - BioSP, Centre INRAE PACA, 228 route de l'aérodrome, Domaine St-Paul - Site Agroparc, 84914, Avignon Cedex 9, France.
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany.
- Graham Coop
- Center for Population Biology, Department of Evolution and Ecology, University of California, 2320 Storer Hall, CA 95616, Davis, United States.
|
2
|
Forien R, Ringbauer H, Coop G. Demographic inference for spatially heterogeneous populations using long shared haplotypes. bioRxiv 2023:2023.06.13.544589. [PMID: 37398501 PMCID: PMC10312651 DOI: 10.1101/2023.06.13.544589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
We introduce a modified spatial Λ-Fleming-Viot process to model the ancestry of individuals in a population occupying a continuous spatial habitat divided into two areas by a sharp discontinuity of the dispersal rate and effective population density. We derive an analytical formula for the expected number of shared haplotype segments between two individuals depending on their sampling locations. This formula involves the transition density of a skew diffusion which appears as a scaling limit of the ancestral lineages of individuals in this model. We then show that this formula can be used to infer the dispersal parameters and the effective population density of both regions, using a composite likelihood approach, and we demonstrate the efficiency of this method on a range of simulated data sets.
Affiliation(s)
- Raphaël Forien
- INRAE - BioSP, Centre INRAE PACA, 228 route de l’aérodrome, Domaine St-Paul - Site Agroparc, 84914, Avignon Cedex 9, France
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany
- Graham Coop
- Center for Population Biology, Department of Evolution and Ecology, University of California, 2320 Storer Hall, CA 95616, Davis, United States
|
3
|
Smith CCR, Tittes S, Ralph PL, Kern AD. Dispersal inference from population genetic variation using a convolutional neural network. Genetics 2023; 224:iyad068. [PMID: 37052957 PMCID: PMC10213498 DOI: 10.1093/genetics/iyad068] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/08/2023] [Revised: 02/08/2023] [Accepted: 04/07/2023] [Indexed: 04/14/2023]
Abstract
The geographic nature of biological dispersal shapes patterns of genetic variation over landscapes, making it possible to infer properties of dispersal from genetic variation data. Here, we present an inference tool that uses geographically distributed genotype data in combination with a convolutional neural network to estimate a critical population parameter: the mean per-generation dispersal distance. Using extensive simulation, we show that our deep learning approach is competitive with or outperforms state-of-the-art methods, particularly at small sample sizes. In addition, we evaluate varying nuisance parameters during training (including population density, demographic history, habitat size, and sampling area) and show that this strategy is effective for estimating dispersal distance when other model parameters are unknown. Whereas competing methods depend on information about local population density or accurate inference of identity-by-descent tracts, our method uses only single-nucleotide-polymorphism data and the spatial scale of sampling as input. Strikingly, and unlike other methods, our method does not use the geographic coordinates of the genotyped individuals. These features make our method, which we call "disperseNN," a potentially valuable new tool for estimating dispersal distance in nonmodel systems with whole genome data or reduced representation data. We apply disperseNN to 12 different species with publicly available data, yielding reasonable estimates for most species. Importantly, our method estimated consistently larger dispersal distances than mark-recapture calculations in the same species, which may be due to the limited geographic sampling area covered by some mark-recapture studies. Thus genetic tools like ours complement direct methods for improving our understanding of dispersal.
Affiliation(s)
- Chris C R Smith
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Silas Tittes
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Peter L Ralph
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Andrew D Kern
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
|
4
|
Genetic and demographic consequences of range contraction patterns during biological annihilation. Sci Rep 2023; 13:1691. [PMID: 36717685 PMCID: PMC9886963 DOI: 10.1038/s41598-023-28927-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 01/27/2023] [Indexed: 01/31/2023]
Abstract
Species range contractions both contribute to, and result from, biological annihilation, yet do not receive the same attention as extinctions. Range contractions can lead to marked impacts on populations but are usually characterized only by reduction in extent of range. For effective conservation, it is critical to recognize that not all range contractions are the same. We propose three distinct patterns of range contraction: shrinkage, amputation, and fragmentation. We tested the impact of these patterns on populations of a generalist species using forward-time simulations. All three patterns caused 86-88% reduction in population abundance and significantly increased average relatedness, with differing patterns in declines of nucleotide diversity relative to the contraction pattern. The fragmentation pattern resulted in the strongest effects on post-contraction genetic diversity and structure. Defining and quantifying range contraction patterns and their consequences for Earth's biodiversity would provide useful and necessary information to combat biological annihilation.
|
5
|
Forien R. Stochastic partial differential equations describing neutral genetic diversity under short range and long range dispersal. Electron J Probab 2022. [DOI: 10.1214/22-ejp827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
6
|
Bradburd GS, Ralph PL. Spatial Population Genetics: It's About Time. Annu Rev Ecol Evol Syst 2019. [DOI: 10.1146/annurev-ecolsys-110316-022659] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Indexed: 11/09/2022]
Abstract
Many important questions about the history and dynamics of organisms have a geographical component: How many are there, and where do they live? How do they move and interbreed across the landscape? How were they moving a thousand years ago, and where were the ancestors of a particular individual alive today? Answers to these questions can have profound consequences for our understanding of history, ecology, and the evolutionary process. In this review, we discuss how geographic aspects of the distribution, movement, and reproduction of organisms are reflected in their pedigree across space and time. Because the structure of the pedigree is what determines patterns of relatedness in modern genetic variation, our aim is to thus provide intuition for how these processes leave an imprint in genetic data. We also highlight some current methods and gaps in the statistical toolbox of spatial population genetics.
Affiliation(s)
- Gideon S. Bradburd
- Ecology, Evolutionary Biology, and Behavior Group, Department of Integrative Biology, Michigan State University, East Lansing, Michigan 48824, USA
- Peter L. Ralph
- Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, Oregon 97403, USA
- Department of Mathematics, University of Oregon, Eugene, Oregon 97403, USA
|
7
|
Forien R. The stepping stone model in a random environment and the effect of local heterogneities on isolation by distance patterns. Electron J Probab 2019. [DOI: 10.1214/19-ejp314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
8
|
Sainudiin R, Véber A. Full likelihood inference from the site frequency spectrum based on the optimal tree resolution. Theor Popul Biol 2018; 124:1-15. [DOI: 10.1016/j.tpb.2018.07.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/26/2017] [Revised: 06/11/2018] [Accepted: 07/09/2018] [Indexed: 10/28/2022]
|
9
|
Ringbauer H, Kolesnikov A, Field DL, Barton NH. Estimating Barriers to Gene Flow from Distorted Isolation-by-Distance Patterns. Genetics 2018; 208:1231-1245. [PMID: 29311149 PMCID: PMC5844333 DOI: 10.1534/genetics.117.300638] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/19/2017] [Accepted: 12/23/2017] [Indexed: 11/18/2022]
Abstract
In continuous populations with local migration, nearby pairs of individuals have on average more similar genotypes than geographically well-separated pairs. A barrier to gene flow distorts this classical pattern of isolation by distance. Genetic similarity is decreased for sample pairs on different sides of the barrier and increased for pairs on the same side near the barrier. Here, we introduce an inference scheme that uses this signal to detect and estimate the strength of a linear barrier to gene flow in two dimensions. We use a diffusion approximation to model the effects of a barrier on the geographic spread of ancestry backward in time. This approach allows us to calculate the chance of recent coalescence and probability of identity by descent. We introduce an inference scheme that fits these theoretical results to the geographic covariance structure of biallelic genetic markers. It can estimate the strength of the barrier as well as several demographic parameters. We investigate the power of our inference scheme to detect barriers by applying it to a wide range of simulated data. We also showcase an example application to an Antirrhinum majus (snapdragon) flower-color hybrid zone, where we do not detect any signal of a strong genome-wide barrier to gene flow.
Affiliation(s)
- Harald Ringbauer
- Institute of Science and Technology Austria, Klosterneuburg A-3400, Austria
- David L Field
- Department of Botany and Biodiversity Research, University of Vienna, A-1030, Austria
- Nicholas H Barton
- Institute of Science and Technology Austria, Klosterneuburg A-3400, Austria
|
10
|
Montano V. Coalescent inferences in conservation genetics: should the exception become the rule? Biol Lett 2017; 12:rsbl.2016.0211. [PMID: 27330172 DOI: 10.1098/rsbl.2016.0211] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/14/2016] [Accepted: 05/23/2016] [Indexed: 01/25/2023]
Abstract
Genetic estimates of effective population size (Ne) are an established means to develop informed conservation policies. Another key goal in the conservation of endangered species is maintaining connectivity across fragmented environments, to which genetic inferences of gene flow and dispersal greatly contribute. Most current statistical tools for estimating such population demographic parameters are based on Kingman's coalescent (KC). However, KC is inappropriate for taxa displaying skewed reproductive variance, a property widely observed in natural species. Coalescent models that consider skewed reproductive success, called multiple merger coalescents (MMCs), have been shown to substantially improve estimates of Ne when the distribution of offspring per capita is highly skewed. MMC predictions of standard population genetic parameters, including the rate of loss of genetic variation and the fixation probability of strongly selected alleles, substantially depart from KC predictions. These extended models also allow studying gene genealogies in a spatial continuum, providing a novel theoretical framework to investigate spatial connectivity. Therefore, development of statistical tools based on MMCs should substantially improve estimates of population demographic parameters with major conservation implications.
Affiliation(s)
- Valeria Montano
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
|
11
|
Etheridge A, Freeman N, Penington S, Straulino D. Branching Brownian motion and selection in the spatial $\Lambda$-Fleming–Viot process. Ann Appl Probab 2017. [DOI: 10.1214/16-aap1245] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
|
12
|
Ringbauer H, Coop G, Barton NH. Inferring Recent Demography from Isolation by Distance of Long Shared Sequence Blocks. Genetics 2017; 205:1335-1351. [PMID: 28108588 PMCID: PMC5340342 DOI: 10.1534/genetics.116.196220] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 09/24/2016] [Accepted: 01/13/2017] [Indexed: 12/12/2022]
Abstract
Recently it has become feasible to detect long blocks of nearly identical sequence shared between pairs of genomes. These identity-by-descent (IBD) blocks are direct traces of recent coalescence events and, as such, contain ample signal to infer recent demography. Here, we examine sharing of such blocks in two-dimensional populations with local migration. Using a diffusion approximation to trace genetic ancestry, we derive analytical formulas for patterns of isolation by distance of IBD blocks, which can also incorporate recent population density changes. We introduce an inference scheme that uses a composite-likelihood approach to fit these formulas. We then extensively evaluate our theory and inference method on a range of scenarios using simulated data. We first validate the diffusion approximation by showing that the theoretical results closely match the simulated block-sharing patterns. We then demonstrate that our inference scheme can accurately and robustly infer dispersal rate and effective density, as well as bounds on recent dynamics of population density. To demonstrate an application, we use our estimation scheme to explore the fit of a diffusion model to Eastern European samples in the Population Reference Sample data set. We show that ancestry diffusing with a rate of [Formula: see text] during the last centuries, combined with accelerating population growth, can explain the observed exponential decay of block sharing with increasing pairwise sample distance.
Affiliation(s)
- Harald Ringbauer
- Institute of Science and Technology Austria, A-3400 Klosterneuburg, Austria
- Graham Coop
- Department of Evolution and Ecology, University of California, Davis, California 95616
- Center for Population Biology, University of California, Davis, California 95616
- Nicholas H Barton
- Institute of Science and Technology Austria, A-3400 Klosterneuburg, Austria
|
13
|
Demographic inference under the coalescent in a spatial continuum. Theor Popul Biol 2016; 111:43-50. [PMID: 27184386 DOI: 10.1016/j.tpb.2016.05.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/26/2016] [Revised: 05/04/2016] [Accepted: 05/06/2016] [Indexed: 11/22/2022]
Abstract
Understanding population dynamics from the analysis of molecular and spatial data requires sound statistical modeling. Current approaches assume that populations are naturally partitioned into discrete demes, thereby failing to be relevant in cases where individuals are scattered on a spatial continuum. Other models predict the formation of increasingly tight clusters of individuals in space, which, again, conflicts with biological evidence. Building on recent theoretical work, we introduce a new genealogy-based inference framework that alleviates these issues. This approach effectively implements a stochastic model in which the distribution of individuals is homogeneous and stationary, thereby providing a relevant null model for the fluctuation of genetic diversity in time and space. Importantly, the spatial density of individuals in a population and their range of dispersal during the course of evolution are two parameters that can be inferred separately with this method. The validity of the new inference framework is confirmed with extensive simulations and the analysis of influenza sequences collected over five seasons in the USA.
|
14
|
Kelleher J, Etheridge AM, McVean G. Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Comput Biol 2016; 12:e1004842. [PMID: 27145223 PMCID: PMC4856371 DOI: 10.1371/journal.pcbi.1004842] [Citation(s) in RCA: 328] [Impact Index Per Article: 41.0] [Received: 12/10/2015] [Accepted: 03/02/2016] [Indexed: 01/23/2023]
Abstract
A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods. Our understanding of the distribution of genetic variation in natural populations has been driven by mathematical models of the underlying biological and demographic processes. A key strength of such coalescent models is that they enable efficient simulation of data we might see under a variety of evolutionary scenarios. However, current methods are not well suited to simulating genome-scale data sets on hundreds of thousands of samples, which is essential if we are to understand the data generated by population-scale sequencing projects. Similarly, processing the results of large simulations also presents researchers with a major challenge, as it can take many days just to read the data files. In this paper we solve these problems by introducing a new way to represent information about the ancestral process. This new representation leads to huge gains in simulation speed and storage efficiency so that large simulations complete in minutes and the output files can be processed in seconds.
Affiliation(s)
- Jerome Kelleher
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Gilean McVean
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
|
15
|
Joseph TA, Hickerson MJ, Alvarado-Serrano DF. Demographic inference under a spatially continuous coalescent model. Heredity (Edinb) 2016; 117:94-9. [PMID: 27118157 DOI: 10.1038/hdy.2016.28] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/27/2015] [Revised: 03/18/2016] [Accepted: 03/18/2016] [Indexed: 01/19/2023]
Abstract
In contrast with the classical population genetics theory that models population structure as discrete panmictic units connected by migration, many populations exhibit heterogeneous spatial gradients in population connectivity across semi-continuous habitats. The historical dynamics of such spatially structured populations can be captured by a spatially explicit coalescent model recently proposed by Etheridge (2008) and Barton et al. (2010a, 2010b), in which allelic lineages are distributed in a two-dimensional spatial continuum and move within this continuum based on extinction and coalescent events. Though theoretically rigorous, this model, which we here refer to as the continuum model, has not yet been implemented for demographic inference. To this end, here we introduce and demonstrate a statistical pipeline that couples the coalescent simulator of Kelleher et al. (2014), which simulates genealogies under the continuum model, with an approximate Bayesian computation (ABC) framework for parameter estimation of neighborhood size (that is, the number of locally breeding individuals) and dispersal ability (that is, the distance an offspring can travel within a generation). Using empirically informed simulations and simulation-based ABC cross-validation, we first show that neighborhood size can be accurately estimated. We then apply our pipeline to the South African endemic shrub species Berkheya cuneata to use the resulting estimates of dispersal ability and neighborhood size to infer the average population density of the species. More generally, we show that spatially explicit coalescent models can be successfully integrated into model-based demographic inference.
Affiliation(s)
- T A Joseph
- Biology Department, The City College of New York, City University of New York, New York, NY, USA
- M J Hickerson
- Biology Department, The City College of New York, City University of New York, New York, NY, USA; Program in Ecology, Evolutionary Biology, & Behavior, The Graduate Center, City University of New York (CUNY), New York, NY, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
- D F Alvarado-Serrano
- Biology Department, The City College of New York, City University of New York, New York, NY, USA
|
16
|
Furstenau TN, Cartwright RA. The effect of the dispersal kernel on isolation-by-distance in a continuous population. PeerJ 2016; 4:e1848. [PMID: 27069794 PMCID: PMC4824897 DOI: 10.7717/peerj.1848] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/28/2015] [Accepted: 03/04/2016] [Indexed: 11/29/2022]
Abstract
Under models of isolation-by-distance, population structure is determined by the probability of identity-by-descent between pairs of genes according to the geographic distance between them. Well established analytical results indicate that the relationship between geographical and genetic distance depends mostly on the neighborhood size of the population which represents a standardized measure of gene flow. To test this prediction, we model local dispersal of haploid individuals on a two-dimensional landscape using seven dispersal kernels: Rayleigh, exponential, half-normal, triangular, gamma, Lomax and Pareto. When neighborhood size is held constant, the distributions produce similar patterns of isolation-by-distance, confirming predictions. Considering this, we propose that the triangular distribution is the appropriate null distribution for isolation-by-distance studies. Under the triangular distribution, dispersal is uniform over the neighborhood area which suggests that the common description of neighborhood size as a measure of an effective, local panmictic population is valid for popular families of dispersal distributions. We further show how to draw random variables from the triangular distribution efficiently and argue that it should be utilized in other studies in which computational efficiency is important.
Affiliation(s)
- Tara N Furstenau
- School of Life Sciences and the Biodesign Institute, Arizona State University, Tempe, AZ, United States of America
- Reed A Cartwright
- School of Life Sciences and the Biodesign Institute, Arizona State University, Tempe, AZ, United States of America
|
17
|
Kelleher J, Etheridge A, Véber A, Barton N. Spread of pedigree versus genetic ancestry in spatially distributed populations. Theor Popul Biol 2016; 108:1-12. [DOI: 10.1016/j.tpb.2015.10.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/19/2015] [Revised: 09/07/2015] [Accepted: 10/27/2015] [Indexed: 11/28/2022]
|
18
|
Bradburd GS, Ralph PL, Coop GM. A Spatial Framework for Understanding Population Structure and Admixture. PLoS Genet 2016; 12:e1005703. [PMID: 26771578 PMCID: PMC4714911 DOI: 10.1371/journal.pgen.1005703] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Received: 02/10/2015] [Accepted: 11/05/2015] [Indexed: 01/26/2023]
Abstract
Geographic patterns of genetic variation within modern populations, produced by complex histories of migration, can be difficult to infer and visually summarize. A general consequence of geographically limited dispersal is that samples from nearby locations tend to be more closely related than samples from distant locations, and so genetic covariance often recapitulates geographic proximity. We use genome-wide polymorphism data to build "geogenetic maps," which, when applied to stationary populations, produce a map of the geographic positions of the populations, but with distances distorted to reflect historical rates of gene flow. In the underlying model, allele frequency covariance is a decreasing function of geogenetic distance, and nonlocal gene flow such as admixture can be identified as anomalously strong covariance over long distances. This admixture is explicitly co-estimated and depicted as arrows, from the source of admixture to the recipient, on the geogenetic map. We demonstrate the utility of this method on a circum-Tibetan sampling of the greenish warbler (Phylloscopus trochiloides), in which we find evidence for gene flow between the adjacent, terminal populations of the ring species. We also analyze a global sampling of human populations, for which we largely recover the geography of the sampling, with support for significant histories of admixture in many samples. This new tool for understanding and visualizing patterns of population structure is implemented in a Bayesian framework in the program SpaceMix.
Affiliation(s)
- Gideon S. Bradburd
- Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, California, United States of America
- Peter L. Ralph
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Graham M. Coop
- Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, California, United States of America
|
19
|
Abstract
Recent genomic studies have highlighted the important role of admixture in shaping genome-wide patterns of diversity. Past admixture leaves a population genomic signature of linkage disequilibrium (LD), reflecting the mixing of parental chromosomes by segregation and recombination. These patterns of LD can be used to infer the timing of admixture, but the results of inference can depend strongly on the assumed demographic model. Here, we introduce a theoretical framework for modeling patterns of LD in a geographic contact zone where two differentiated populations have come into contact and are mixing by diffusive local migration. Assuming that this secondary contact is recent enough that genetic drift can be ignored, we derive expressions for the expected LD and admixture tract lengths across geographic space as a function of the age of the contact zone and the dispersal distance of individuals. We develop an approach to infer age of contact zones, using population genomic data from multiple spatially sampled populations by fitting our model to the decay of LD with recombination distance. To demonstrate an application of our model, we use our approach to explore the fit of a geographic contact zone model to three human genomic data sets from populations in Indonesia, Central Asia, and India and compare our results to inference under different demographic models. We obtain substantially different results from those of the commonly used model of panmictic admixture, highlighting the sensitivity of admixture timing results to the choice of demographic model.
|
20
|
Véber A, Wakolbinger A. The spatial Lambda-Fleming–Viot process: An event-based construction and a lookdown representation. Ann Inst Henri Poincaré Probab Stat 2015. [DOI: 10.1214/13-aihp571] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]
|
21
|
Geneva AJ, Muirhead CA, Kingan SB, Garrigan D. A new method to scan genomes for introgression in a secondary contact model. PLoS One 2015; 10:e0118621. [PMID: 25874895 PMCID: PMC4396994 DOI: 10.1371/journal.pone.0118621] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 10/02/2014] [Accepted: 01/21/2015] [Indexed: 12/20/2022]
Abstract
Secondary contact between divergent populations or incipient species may result in the exchange and introgression of genomic material. We develop a simple DNA sequence measure, called Gmin, which is designed to identify genomic regions experiencing introgression in a secondary contact model. Gmin is defined as the ratio of the minimum between-population number of nucleotide differences in a genomic window to the average number of between-population differences. Although it is conceptually simple, one advantage of Gmin is that it is computationally inexpensive relative to model-based methods for detecting gene flow and it scales easily to the level of whole-genome analysis. We compare the sensitivity and specificity of Gmin to those of the widely used index of population differentiation, FST, and suggest a simple statistical test for identifying genomic outliers. Extensive computer simulations demonstrate that Gmin has both greater sensitivity and specificity for detecting recent introgression than does FST. Furthermore, we find that the sensitivity of Gmin is robust with respect to both the population mutation and recombination rates. Finally, a scan of Gmin across the X chromosome of Drosophila melanogaster identifies candidate regions of introgression between sub-Saharan African and cosmopolitan populations that were previously missed by other methods. These results show that Gmin is a biologically straightforward, yet powerful, alternative to FST, as well as to more computationally intensive model-based methods for detecting gene flow.
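The definition of Gmin given in this abstract can be illustrated with a minimal sketch. It assumes aligned, equal-length haplotype strings for a single genomic window and counts differences by simple Hamming distance; the function names `hamming` and `gmin` and the toy sequences are hypothetical and are not taken from the authors' software.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def gmin(pop1: list[str], pop2: list[str]) -> float:
    """Ratio of the minimum to the mean between-population number of
    differences in one window (illustrative, per the abstract's definition)."""
    # All pairwise comparisons with one sequence from each population.
    diffs = [hamming(a, b) for a, b in product(pop1, pop2)]
    if not diffs:
        raise ValueError("both populations must contain at least one sequence")
    mean_diff = sum(diffs) / len(diffs)
    if mean_diff == 0:
        raise ValueError("ratio undefined: no between-population differences")
    return min(diffs) / mean_diff


# Toy window: hypothetical haplotypes from two populations.
window_pop1 = ["AAAA", "AAAT"]
window_pop2 = ["AATT", "TTTT"]
print(gmin(window_pop1, window_pop2))
```

Under this definition, values near 1 mean that the most similar between-population pair is about as divergent as the average pair, while values well below 1 flag a window in which at least one pair is unusually similar, the pattern the abstract associates with recent introgression.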
Affiliation(s)
- Anthony J. Geneva
- Department of Biology, University of Rochester, Rochester, New York, United States of America
- Christina A. Muirhead
- Department of Biology, University of Rochester, Rochester, New York, United States of America
- Ronin Institute, Montclair, New Jersey, United States of America
- Sarah B. Kingan
- Department of Biology, University of Rochester, Rochester, New York, United States of America
- Daniel Garrigan
- Department of Biology, University of Rochester, Rochester, New York, United States of America
|
22
|
Kelleher J, Etheridge AM, Barton NH. Coalescent simulation in continuous space: algorithms for large neighbourhood size. Theor Popul Biol 2014; 95:13-23. [PMID: 24910324 DOI: 10.1016/j.tpb.2014.05.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 10/28/2013] [Revised: 05/20/2014] [Accepted: 05/22/2014] [Indexed: 11/15/2022]
Abstract
Many species have an essentially continuous distribution in space, in which there are no natural divisions between randomly mating subpopulations. Yet, the standard approach to modelling these populations is to impose an arbitrary grid of demes, adjusting deme sizes and migration rates in an attempt to capture the important features of the population. Such indirect methods are required because of the failure of the classical models of isolation by distance, which have been shown to have major technical flaws. A recently introduced model of extinction and recolonisation in two dimensions solves these technical problems, and provides a rigorous technical foundation for the study of populations evolving in a spatial continuum. The coalescent process for this model is simply stated, but direct simulation is very inefficient for large neighbourhood sizes. We present efficient and exact algorithms to simulate this coalescent process for arbitrary sample sizes and numbers of loci, and analyse these algorithms in detail.
Affiliation(s)
- J Kelleher
- Institute of Evolutionary Biology, University of Edinburgh, Kings Buildings, West Mains Road, Edinburgh EH9 3JT, UK.
- A M Etheridge
- Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK.
- N H Barton
- Institute of Science and Technology, Am Campus I, A-3400 Klosterneuburg, Austria.
|
23
|
Heuer B, Sturm A. On spatial coalescents with multiple mergers in two dimensions. Theor Popul Biol 2013; 87:90-104. [DOI: 10.1016/j.tpb.2012.11.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/31/2012] [Revised: 11/13/2012] [Accepted: 11/29/2012] [Indexed: 11/16/2022]
|
24
|
Affiliation(s)
- John Wakeley
- Harvard University, 4096 Biological Laboratories, 16 Divinity Avenue, Cambridge, MA 02138, USA.
|