1. Smith CCR, Patterson G, Ralph PL, Kern AD. Estimation of spatial demographic maps from polymorphism data using a neural network. bioRxiv 2024:2024.03.15.585300. [PMID: 38559192 PMCID: PMC10980082 DOI: 10.1101/2024.03.15.585300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
A fundamental goal in population genetics is to understand how variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and source-sink dynamics of dispersal should shape genetic variation over space; however, there are few tools currently available that can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input and output pairings, where the input consists of genotypes and sampling locations generated from a continuous-space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the different approaches; in particular, our program is unique because it infers the magnitude of both dispersal and density as well as their variation over the landscape, and it does so using SNP data. Similar methods are constrained to estimating relative migration rates, or require identity-by-descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters, but was affected by incomplete spatial sampling. Genetic-based methods like ours complement other, direct methods for estimating past and present demography, and we believe they will serve as valuable tools for applications in conservation, ecology, and evolutionary biology. An open-source software package implementing our method is available from https://github.com/kr-colab/mapNN.
Affiliation(s)
- Chris C. R. Smith
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Gilia Patterson
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Peter L. Ralph
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Andrew D. Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
2. Forien R, Ringbauer H, Coop G. Demographic inference for spatially heterogeneous populations using long shared haplotypes. Theor Popul Biol 2024:S0040-5809(24)00028-5. [PMID: 38492811 DOI: 10.1016/j.tpb.2024.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 03/04/2024] [Accepted: 03/12/2024] [Indexed: 03/18/2024]
Abstract
We introduce a modified spatial Λ-Fleming-Viot process to model the ancestry of individuals in a population occupying a continuous spatial habitat divided into two areas by a sharp discontinuity of the dispersal rate and effective population density. We derive an analytical formula for the expected number of shared haplotype segments between two individuals depending on their sampling locations. This formula involves the transition density of a skew diffusion which appears as a scaling limit of the ancestral lineages of individuals in this model. We then show that this formula can be used to infer the dispersal parameters and the effective population density of both regions, using a composite likelihood approach, and we demonstrate the efficiency of this method on a range of simulated data sets.
Affiliation(s)
- Raphaël Forien
  - INRAE - BioSP, Centre INRAE PACA, 228 route de l'aérodrome, Domaine St-Paul - Site Agroparc, 84914, Avignon Cedex 9, France
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany
- Graham Coop
  - Center for Population Biology, Department of Evolution and Ecology, University of California, 2320 Storer Hall, CA 95616, Davis, United States
3. Freudiger A, Jovanovic VM, Huang Y, Snyder-Mackler N, Conrad DF, Miller B, Montague MJ, Westphal H, Stadler PF, Bley S, Horvath JE, Brent LJN, Platt ML, Ruiz-Lambides A, Tung J, Nowick K, Ringbauer H, Widdig A. Taking identity-by-descent analysis into the wild: Estimating realized relatedness in free-ranging macaques. bioRxiv 2024:2024.01.09.574911. [PMID: 38260273 PMCID: PMC10802400 DOI: 10.1101/2024.01.09.574911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
Affiliation(s)
- Annika Freudiger
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Vladimir M Jovanovic
  - Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
  - Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Noah Snyder-Mackler
  - Center for Evolution & Medicine, School of Life Sciences, Arizona State University, Tempe, USA
- Donald F Conrad
  - Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Brian Miller
  - Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Michael J Montague
  - Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Hendrikje Westphal
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Peter F Stadler
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Santa Fe Institute, Santa Fe, NM, USA
- Stefanie Bley
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Julie E Horvath
  - Department of Biological and Biomedical Sciences, North Carolina Central University, North Carolina, Durham, USA
  - Research and Collections Section, North Carolina Museum of Natural Sciences, North Carolina, Raleigh, USA
  - Department of Biological Sciences, North Carolina State University, North Carolina, Raleigh, USA
  - Department of Evolutionary Anthropology, Duke University, North Carolina, Durham, USA
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Lauren J N Brent
  - Centre for Research in Animal Behaviour, University of Exeter, Exeter, UK
- Michael L Platt
  - Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Marketing Department, the Wharton School of Business, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Psychology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Angelina Ruiz-Lambides
  - Cayo Santiago Field Station, Caribbean Primate Research Center, University of Puerto Rico, Punta Santiago, Puerto Rico
- Jenny Tung
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Evolutionary Anthropology, Duke University, North Carolina, Durham, USA
  - Department of Biology, Duke University, Durham, North Carolina, USA
  - Duke University Population Research Institute, Durham, North Carolina, USA
- Katja Nowick
  - Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
  - Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Anja Widdig
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
4. Ringbauer H, Huang Y, Akbari A, Mallick S, Olalde I, Patterson N, Reich D. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet 2024; 56:143-151. [PMID: 38123640 PMCID: PMC10786714 DOI: 10.1038/s41588-023-01582-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Accepted: 10/20/2023] [Indexed: 12/23/2023]
Abstract
Long DNA segments shared between two individuals, known as identity-by-descent (IBD), reveal recent genealogical connections. Here we introduce ancIBD, a method for identifying IBD segments in ancient human DNA (aDNA) using a hidden Markov model and imputed genotype probabilities. We demonstrate that ancIBD accurately identifies IBD segments >8 cM for aDNA data with an average depth of >0.25× for whole-genome sequencing or >1× for 1240k single nucleotide polymorphism capture data. Applying ancIBD to 4,248 ancient Eurasian individuals, we identify relatives up to the sixth degree and genealogical connections between archaeological groups. Notably, we reveal long IBD sharing between Corded Ware and Yamnaya groups, indicating that the Yamnaya herders of the Pontic-Caspian Steppe and the Steppe-related ancestry in various European Corded Ware groups share substantial co-ancestry within only a few hundred years. These results show that detecting IBD segments can generate powerful insights into the growing aDNA record, both on a small scale relevant to life stories and on a large scale relevant to major cultural-historical events.
Affiliation(s)
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Swapan Mallick
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Iñigo Olalde
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - BIOMICs Research Group, University of the Basque Country, Vitoria-Gasteiz, Spain
  - Ikerbasque-Basque Foundation of Science, Bilbao, Spain
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
5. Yüncü E, Işıldak U, Williams MP, Huber CD, Flegontova O, Vyazov LA, Changmai P, Flegontov P. False discovery rates of qpAdm-based screens for genetic admixture. bioRxiv 2023:2023.04.25.538339. [PMID: 37904998 PMCID: PMC10614728 DOI: 10.1101/2023.04.25.538339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Although a broad range of methods exists for reconstructing population history from genome-wide single nucleotide polymorphism data, just a few methods gained popularity in archaeogenetics: principal component analysis (PCA); ADMIXTURE, an algorithm that models individuals as mixtures of multiple ancestral sources represented by actual or inferred populations; formal tests for admixture such as f3-statistics and D/f4-statistics; and qpAdm, a tool for fitting two-component and more complex admixture models to groups or individuals. Despite their popularity in archaeogenetics, which is explained by modest computational requirements and ability to analyze data of various types and qualities, protocols relying on qpAdm that screen numerous alternative models of varying complexity and find "fitting" models (often considering both estimated admixture proportions and p-values as a composite criterion of model fit) remain untested on complex simulated population histories in the form of admixture graphs of random topology. We analyzed genotype data extracted from such simulations and tested various types of high-throughput qpAdm protocols ("rotating" and "non-rotating", with or without temporal stratification of target groups and proxy ancestry sources, and with or without a "model competition" step). We caution that high-throughput qpAdm protocols may be inappropriate for exploratory analyses in poorly studied regions/periods since their false discovery rates varied between 12% and 68% depending on the details of the protocol and on the amount and quality of simulated data (i.e., >12% of fitting two-way admixture models imply gene flows that were not simulated). 
We demonstrate that for reducing false discovery rates of qpAdm protocols to nearly 0% it is advisable to use large SNP sets with low missing data rates, the rotating qpAdm protocol with a strictly enforced rule that target groups do not pre-date their proxy sources, and an unsupervised ADMIXTURE analysis as a way to verify feasible qpAdm models. Our study has a number of limitations: for instance, these recommendations depend on the assumption that the underlying genetic history is a complex admixture graph and not a stepping-stone model.
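The false discovery rate quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual definition, the share of "fitting" models that correspond to gene flows that were not simulated; the function name and the counts are hypothetical and are not taken from the study.

```python
def false_discovery_rate(n_false_fits, n_true_fits):
    """Fraction of fitting admixture models that are false positives.

    Assumed definition: false fits / (false fits + true fits).
    """
    total_fits = n_false_fits + n_true_fits
    if total_fits == 0:
        raise ValueError("no fitting models, so the rate is undefined")
    return n_false_fits / total_fits


# Hypothetical screen: 100 fitting two-way models, 12 of which imply
# gene flows that were never simulated.
fdr = false_discovery_rate(n_false_fits=12, n_true_fits=88)
print(f"FDR = {fdr:.0%}")
```

With these illustrative counts the rate sits at the lower end of the 12% to 68% range reported in the abstract.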
Affiliation(s)
- Eren Yüncü
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Ulaş Işıldak
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Matthew P. Williams
  - Department of Biology, Eberly College of Science, Pennsylvania State University, PA, USA
- Christian D. Huber
  - Department of Biology, Eberly College of Science, Pennsylvania State University, PA, USA
- Olga Flegontova
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
  - Institute of Parasitology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czechia
- Leonid A. Vyazov
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Piya Changmai
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Pavel Flegontov
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
6. Smith CCR, Kern AD. disperseNN2: a neural network for estimating dispersal distance from georeferenced polymorphism data. BMC Bioinformatics 2023; 24:385. [PMID: 37817115 PMCID: PMC10566146 DOI: 10.1186/s12859-023-05522-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Accepted: 10/05/2023] [Indexed: 10/12/2023] Open
Abstract
Spatial genetic variation is shaped in part by an organism's dispersal ability. We present a deep learning tool, disperseNN2, for estimating the mean per-generation dispersal distance from georeferenced polymorphism data. Our neural network performs feature extraction on pairs of genotypes, and uses the geographic information that comes with each sample. These attributes led disperseNN2 to outperform a state-of-the-art deep learning method that does not use explicit spatial information: the mean relative absolute error was reduced by 33% and 48% using sample sizes of 10 and 100 individuals, respectively. disperseNN2 is particularly useful for non-model organisms or systems with sparse genomic resources, as it uses unphased, single nucleotide polymorphisms as its input. The software is open source and available from https://github.com/kr-colab/disperseNN2, with documentation located at https://dispersenn2.readthedocs.io/en/latest/.
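The error metric named in this abstract can be sketched as follows. The sketch assumes that "mean relative absolute error" means the average of |estimate - truth| / truth over test datasets; the function name and the example values are hypothetical and do not come from disperseNN2.

```python
def mean_relative_absolute_error(true_values, estimates):
    """Average of |estimate - truth| / truth over paired values (assumed definition)."""
    if len(true_values) != len(estimates) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    relative_errors = [abs(e - t) / t for t, e in zip(true_values, estimates)]
    return sum(relative_errors) / len(relative_errors)


# Hypothetical true and estimated dispersal distances (arbitrary units).
true_sigma = [1.0, 2.0, 4.0]
estimated_sigma = [1.5, 1.0, 4.0]
mrae = mean_relative_absolute_error(true_sigma, estimated_sigma)
print(f"MRAE = {mrae:.3f}")
```

Because each error is scaled by the true value, a 0.5-unit miss on a short dispersal distance counts for more than the same miss on a long one.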
Affiliation(s)
- Chris C R Smith
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR, 97403, USA
- Andrew D Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR, 97403, USA
7. Smith CCR, Kern AD. disperseNN2: a neural network for estimating dispersal distance from georeferenced polymorphism data. bioRxiv 2023:2023.07.30.551115. [PMID: 37577624 PMCID: PMC10418106 DOI: 10.1101/2023.07.30.551115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Spatial genetic variation is shaped in part by an organism's dispersal ability. We present a deep learning tool, disperseNN2, for estimating the mean per-generation dispersal distance from georeferenced polymorphism data. Our neural network performs feature extraction on pairs of genotypes, and uses the geographic information that comes with each sample. These attributes led disperseNN2 to outperform a state-of-the-art deep learning method that does not use explicit spatial information: the mean relative absolute error was reduced by 33% and 48% using sample sizes of 10 and 100 individuals, respectively. disperseNN2 is particularly useful for non-model organisms or systems with sparse genomic resources, as it uses unphased, single nucleotide polymorphisms as its input. The software is open source and available from https://github.com/kr-colab/disperseNN2, with documentation located at https://dispersenn2.readthedocs.io/en/latest/.
Affiliation(s)
- Chris C. R. Smith
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Andrew D. Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
8. Forien R, Ringbauer H, Coop G. Demographic inference for spatially heterogeneous populations using long shared haplotypes. bioRxiv 2023:2023.06.13.544589. [PMID: 37398501 PMCID: PMC10312651 DOI: 10.1101/2023.06.13.544589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
We introduce a modified spatial Λ-Fleming-Viot process to model the ancestry of individuals in a population occupying a continuous spatial habitat divided into two areas by a sharp discontinuity of the dispersal rate and effective population density. We derive an analytical formula for the expected number of shared haplotype segments between two individuals depending on their sampling locations. This formula involves the transition density of a skew diffusion which appears as a scaling limit of the ancestral lineages of individuals in this model. We then show that this formula can be used to infer the dispersal parameters and the effective population density of both regions, using a composite likelihood approach, and we demonstrate the efficiency of this method on a range of simulated data sets.
Affiliation(s)
- Raphaël Forien
  - INRAE - BioSP, Centre INRAE PACA, 228 route de l’aérodrome, Domaine St-Paul - Site Agroparc, 84914, Avignon Cedex 9, France
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany
- Graham Coop
  - Center for Population Biology, Department of Evolution and Ecology, University of California, 2320 Storer Hall, CA 95616, Davis, United States
9. Arango-Isaza E, Capodiferro MR, Aninao MJ, Babiker H, Aeschbacher S, Achilli A, Posth C, Campbell R, Martínez FI, Heggarty P, Sadowsky S, Shimizu KK, Barbieri C. The genetic history of the Southern Andes from present-day Mapuche ancestry. Curr Biol 2023:S0960-9822(23)00607-3. [PMID: 37279753 DOI: 10.1016/j.cub.2023.05.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 03/01/2023] [Accepted: 05/05/2023] [Indexed: 06/08/2023]
Abstract
The southernmost regions of South America harbor some of the earliest evidence of human presence in the Americas. However, connections with the rest of the continent and the contextualization of present-day indigenous ancestries remain poorly resolved. In this study, we analyze the genetic ancestry of one of the largest indigenous groups in South America: the Mapuche. We generate genome-wide data from 64 participants from three Mapuche populations in Southern Chile: Pehuenche, Lafkenche, and Huilliche. Broadly, we describe three main ancestry blocks with a common origin, which characterize the Southern Cone, the Central Andes, and Amazonia. Within the Southern Cone, ancestors of the Mapuche lineages differentiated from those of the Far South during the Middle Holocene and did not experience further migration waves from the north. We find that the deep genetic split between the Central and Southern Andes is followed by instances of gene flow, which may have accompanied the southward spread of cultural traits from the Central Andes, including crops and loanwords from Quechua into Mapudungun (the language of the Mapuche). Finally, we report close genetic relatedness between the three populations analyzed, with the Huilliche characterized additionally by intense recent exchanges with the Far South. Our findings add new perspectives on the genetic (pre)history of South America, from the first settlement through to the present-day indigenous presence. Follow-up fieldwork took these results back to the indigenous communities to contextualize the genetic narrative alongside indigenous knowledge and perspectives.
Affiliation(s)
- Epifanía Arango-Isaza
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich 8057, Switzerland; Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich 8050, Switzerland
- Marco Rosario Capodiferro
  - Trinity College Dublin, Dublin 2, Ireland; Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia 27100, Italy
- Hiba Babiker
  - Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
- Simon Aeschbacher
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich 8057, Switzerland
- Alessandro Achilli
  - Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia 27100, Italy
- Cosimo Posth
  - Institute for Archaeological Sciences, Archaeo, and Palaeogenetics, University of Tübingen, Tübingen 72074, Germany; Senckenberg Centre for Human Evolution and Palaeoenvironment, University of Tübingen, Tübingen 72074, Germany
- Roberto Campbell
  - Escuela de Antropología, Pontificia Universidad Católica de Chile, Santiago 6904411, Chile
- Felipe I Martínez
  - Escuela de Antropología, Pontificia Universidad Católica de Chile, Santiago 6904411, Chile; Center for Intercultural and Indigenous Research, Santiago 7820436, Chile
- Paul Heggarty
  - "Waves" ERC Group, Department of Human Behavior, Evolution and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
- Scott Sadowsky
  - Department of Linguistics and Literature, Universidad de Cartagena, Cartagena 130001, Colombia
- Kentaro K Shimizu
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich 8057, Switzerland; Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich 8050, Switzerland
- Chiara Barbieri
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich 8057, Switzerland; Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich 8050, Switzerland; Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
10. Smith CCR, Tittes S, Ralph PL, Kern AD. Dispersal inference from population genetic variation using a convolutional neural network. Genetics 2023; 224:iyad068. [PMID: 37052957 PMCID: PMC10213498 DOI: 10.1093/genetics/iyad068] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/08/2023] [Revised: 02/08/2023] [Accepted: 04/07/2023] [Indexed: 04/14/2023] Open
Abstract
The geographic nature of biological dispersal shapes patterns of genetic variation over landscapes, making it possible to infer properties of dispersal from genetic variation data. Here, we present an inference tool that uses geographically distributed genotype data in combination with a convolutional neural network to estimate a critical population parameter: the mean per-generation dispersal distance. Using extensive simulation, we show that our deep learning approach is competitive with or outperforms state-of-the-art methods, particularly at small sample sizes. In addition, we evaluate varying nuisance parameters during training (including population density, demographic history, habitat size, and sampling area) and show that this strategy is effective for estimating dispersal distance when other model parameters are unknown. Whereas competing methods depend on information about local population density or accurate inference of identity-by-descent tracts, our method uses only single-nucleotide-polymorphism data and the spatial scale of sampling as input. Strikingly, and unlike other methods, our method does not use the geographic coordinates of the genotyped individuals. These features make our method, which we call "disperseNN," a potentially valuable new tool for estimating dispersal distance in nonmodel systems with whole genome data or reduced representation data. We apply disperseNN to 12 different species with publicly available data, yielding reasonable estimates for most species. Importantly, our method estimated consistently larger dispersal distances than mark-recapture calculations in the same species, which may be due to the limited geographic sampling area covered by some mark-recapture studies. Thus, genetic tools like ours complement direct methods for improving our understanding of dispersal.
Affiliation(s)
- Chris C R Smith
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Silas Tittes
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Peter L Ralph
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Andrew D Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
11. Smith TB, Weissman DB. Isolation by distance in populations with power-law dispersal. G3 (Bethesda) 2023; 13:jkad023. [PMID: 36718551 PMCID: PMC10085794 DOI: 10.1093/g3journal/jkad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 01/07/2023] [Indexed: 02/01/2023]
Abstract
Limited dispersal of individuals between generations results in isolation by distance, in which individuals further apart in space tend to be less related. Classic models of isolation by distance assume that dispersal distances are drawn from a thin-tailed distribution and predict that the proportion of the genome that is identical by descent between a pair of individuals should decrease exponentially with the spatial separation between them. However, in many natural populations, individuals occasionally disperse over very long distances. In this work, we use mathematical analysis and coalescent simulations to study the effect of long-range (power-law) dispersal on patterns of isolation by distance. We find that it leads to power-law decay of identity-by-descent at large distances with the same exponent as dispersal. We also find that broad power-law dispersal produces another, shallow power-law decay of identity-by-descent at short distances. These results suggest that the distribution of long-range dispersal events could be estimated from sequencing large population samples taken from a wide range of spatial scales.
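The contrast between exponential and power-law decay described in this abstract can be made concrete with a small numerical sketch. The two functional forms, the length scale, and the exponent below are illustrative assumptions chosen only to show the difference in tail behavior; they are not the formulas or parameter values derived in the paper.

```python
import math


def exponential_decay(distance, length_scale):
    """Thin-tailed case: relatedness falls off as exp(-distance / length_scale)."""
    return math.exp(-distance / length_scale)


def power_law_decay(distance, exponent):
    """Heavy-tailed case: relatedness falls off as distance ** (-exponent)."""
    return distance ** (-exponent)


# Hypothetical parameters, in arbitrary distance units.
LENGTH_SCALE = 5.0
EXPONENT = 2.0

for d in (1.0, 10.0, 100.0):
    exp_value = exponential_decay(d, LENGTH_SCALE)
    pl_value = power_law_decay(d, EXPONENT)
    print(f"distance={d:>5}: exponential={exp_value:.3e}, power law={pl_value:.3e}")
```

At large separations the power-law curve stays many orders of magnitude above the exponential one, which is why samples spanning a wide range of spatial scales could, in principle, carry information about long-range dispersal.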
Affiliation(s)
- Tyler B Smith
- Department of Physics, Emory University, Atlanta, Georgia 30322, USA
- Daniel B Weissman
- Corresponding author: Department of Physics, Emory University, Atlanta, Georgia 30322, USA.
|
12
|
Hancock ZB, Toczydlowski RH, Bradburd GS. A spatial approach to jointly estimate Wright's neighborhood size and long-term effective population size. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.10.532094. [PMID: 36945591 PMCID: PMC10029013 DOI: 10.1101/2023.03.10.532094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Abstract
Spatially continuous patterns of genetic differentiation, which are common in nature, are often poorly described by existing population genetic theory or methods that assume panmixia or discrete, clearly definable populations. There is therefore a need for statistical approaches in population genetics that can accommodate continuous geographic structure, and that ideally use georeferenced individuals as the unit of analysis, rather than populations or subpopulations. In addition, researchers are often interested in describing the diversity of a population distributed continuously in space, and this diversity is intimately linked to the dispersal potential of the organism. A statistical model that leverages information from patterns of isolation-by-distance to jointly infer parameters that control local demography (such as Wright's neighborhood size), and the long-term effective size (Ne) of a population would be useful. Here, we introduce such a model that uses individual-level pairwise genetic and geographic distances to infer Wright's neighborhood size and long-term Ne. We demonstrate the utility of our model by applying it to complex, forward-time demographic simulations as well as an empirical dataset of the Red Sea clownfish (Amphiprion bicinctus). The model performed well on simulated data relative to alternative approaches and produced reasonable empirical results given the natural history of clownfish. The resulting inferences provide important insights into the population genetic dynamics of spatially structured populations.
Affiliation(s)
- Zachary B. Hancock
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 481103, USA
- Gideon S. Bradburd
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 481103, USA
|
13
|
Ringbauer H, Huang Y, Akbari A, Mallick S, Patterson N, Reich D. ancIBD - Screening for identity by descent segments in human ancient DNA. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.08.531671. [PMID: 36945531 PMCID: PMC10028887 DOI: 10.1101/2023.03.08.531671] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/18/2023]
Abstract
Long DNA sequences shared between two individuals, known as Identical by descent (IBD) segments, are a powerful signal for identifying close and distant biological relatives because they only arise when the pair shares a recent common ancestor. Existing methods to call IBD segments between present-day genomes cannot be straightforwardly applied to ancient DNA data (aDNA) due to typically low coverage and high genotyping error rates. We present ancIBD, a method to identify IBD segments for human aDNA data implemented as a Python package. Our approach is based on a Hidden Markov Model, using as input genotype probabilities imputed based on a modern reference panel of genomic variation. Through simulation and downsampling experiments, we demonstrate that ancIBD robustly identifies IBD segments longer than 8 centimorgan for aDNA data with at least either 0.25x average whole-genome sequencing (WGS) coverage depth or at least 1x average depth for in-solution enrichment experiments targeting a widely used aDNA SNP set ('1240k'). This application range allows us to screen a substantial fraction of the aDNA record for IBD segments and we showcase two downstream applications. First, leveraging the fact that biological relatives up to the sixth degree are expected to share multiple long IBD segments, we identify relatives between 10,156 ancient Eurasian individuals and document evidence of long-distance migration, for example by identifying a pair of two approximately fifth-degree relatives who were buried 1410km apart in Central Asia 5000 years ago. Second, by applying ancIBD, we reveal new details regarding the spread of ancestry related to Steppe pastoralists into Europe starting 5000 years ago. 
We find that the first individuals in Central and Northern Europe carrying high amounts of Steppe-ancestry, associated with the Corded Ware culture, share high rates of long IBD (12-25 cM) with Yamnaya herders of the Pontic-Caspian steppe, signaling a strong bottleneck and a recent biological connection on the order of only a few hundred years, providing evidence that the Yamnaya themselves are a main source of Steppe ancestry in Corded Ware people. We also detect elevated sharing of long IBD segments between Corded Ware individuals and people associated with the Globular Amphora culture (GAC) from Poland and Ukraine, who were Copper Age farmers not yet carrying Steppe-like ancestry. These IBD links appear for all Corded Ware groups in our analysis, indicating that individuals related to GAC contexts must have had a major demographic impact early on in the genetic admixtures giving rise to various Corded Ware groups across Europe. These results show that detecting IBD segments in aDNA can generate new insights both on a small scale, relevant to understanding the life stories of people, and on the macroscale, relevant to large-scale cultural-historical events.
Affiliation(s)
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
|
14
|
Genetic and demographic consequences of range contraction patterns during biological annihilation. Sci Rep 2023; 13:1691. [PMID: 36717685 PMCID: PMC9886963 DOI: 10.1038/s41598-023-28927-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 01/27/2023] [Indexed: 01/31/2023] Open
Abstract
Species range contractions both contribute to, and result from, biological annihilation, yet do not receive the same attention as extinctions. Range contractions can lead to marked impacts on populations but are usually characterized only by reduction in extent of range. For effective conservation, it is critical to recognize that not all range contractions are the same. We propose three distinct patterns of range contraction: shrinkage, amputation, and fragmentation. We tested the impact of these patterns on populations of a generalist species using forward-time simulations. All three patterns caused 86-88% reduction in population abundance and significantly increased average relatedness, with differing patterns in declines of nucleotide diversity relative to the contraction pattern. The fragmentation pattern resulted in the strongest effects on post-contraction genetic diversity and structure. Defining and quantifying range contraction patterns and their consequences for Earth's biodiversity would provide useful and necessary information to combat biological annihilation.
|
15
|
Genome-wide data from medieval German Jews show that the Ashkenazi founder event pre-dated the 14th century. Cell 2022; 185:4703-4716.e16. [PMID: 36455558 PMCID: PMC9793425 DOI: 10.1016/j.cell.2022.11.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/08/2022] [Revised: 08/26/2022] [Accepted: 11/01/2022] [Indexed: 12/05/2022]
Abstract
We report genome-wide data from 33 Ashkenazi Jews (AJ), dated to the 14th century, obtained following a salvage excavation at the medieval Jewish cemetery of Erfurt, Germany. The Erfurt individuals are genetically similar to modern AJ, but they show more variability in Eastern European-related ancestry than modern AJ. A third of the Erfurt individuals carried a mitochondrial lineage common in modern AJ and eight carried pathogenic variants known to affect AJ today. These observations, together with high levels of runs of homozygosity, suggest that the Erfurt community had already experienced the major reduction in size that affected modern AJ. The Erfurt bottleneck was more severe, implying substructure in medieval AJ. Overall, our results suggest that the AJ founder event and the acquisition of the main sources of ancestry pre-dated the 14th century and highlight late medieval genetic heterogeneity no longer present in modern AJ.
|
16
|
Forien R. Stochastic partial differential equations describing neutral genetic diversity under short range and long range dispersal. ELECTRON J PROBAB 2022. [DOI: 10.1214/22-ejp827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
17
|
Arciero E, Dogra SA, Malawsky DS, Mezzavilla M, Tsismentzoglou T, Huang QQ, Hunt KA, Mason D, Sharif SM, van Heel DA, Sheridan E, Wright J, Small N, Carmi S, Iles MM, Martin HC. Fine-scale population structure and demographic history of British Pakistanis. Nat Commun 2021; 12:7189. [PMID: 34893604 PMCID: PMC8664933 DOI: 10.1038/s41467-021-27394-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/05/2021] [Accepted: 11/09/2021] [Indexed: 02/08/2023] Open
Abstract
Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genotype chip data from 2,200 British Pakistanis. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low recent effective population sizes (Ne), with some showing a decrease 15‒20 generations ago that has resulted in extensive identity-by-descent sharing and homozygosity, increasing the risk of recessive disorders. Our results from two orthogonal methods (one using machine learning and the other coalescent-based) suggest that the detailed reporting of parental relatedness for mothers in the cohort under-represents the true levels of consanguinity. These results demonstrate the impact of cultural practices on population structure and genomic diversity in Pakistanis, and have important implications for medical genetic studies.
Affiliation(s)
- Elena Arciero
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
- Sufyan A. Dogra
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Daniel S. Malawsky
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Massimo Mezzavilla
- Department of Medical Sciences, University of Trieste, Trieste, Italy
- Theofanis Tsismentzoglou
- Leeds Institute for Data Analytics, University of Leeds, Leeds, UK; Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Qin Qin Huang
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Karen A. Hunt
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Dan Mason
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Saghira Malik Sharif
- Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- David A. van Heel
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Eamonn Sheridan
- Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- John Wright
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Neil Small
- Faculty of Health Studies, University of Bradford, Richmond Road, Bradford, UK
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Mark M. Iles
- Leeds Institute for Data Analytics, University of Leeds, Leeds, UK; Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Hilary C. Martin
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
|
18
|
Parental relatedness through time revealed by runs of homozygosity in ancient DNA. Nat Commun 2021; 12:5425. [PMID: 34521843 PMCID: PMC8440622 DOI: 10.1038/s41467-021-25289-w] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 09/22/2020] [Accepted: 07/21/2021] [Indexed: 02/08/2023] Open
Abstract
Parental relatedness of present-day humans varies substantially across the globe, but little is known about the past. Here we analyze ancient DNA, leveraging that parental relatedness leaves genomic traces in the form of runs of homozygosity. We present an approach to identify such runs in low-coverage ancient DNA data aided by haplotype information from a modern phased reference panel. Simulation and experiments show that this method robustly detects runs of homozygosity longer than 4 centimorgan for ancient individuals with at least 0.3× coverage. Analyzing genomic data from 1,785 ancient humans who lived in the last 45,000 years, we detect low rates of first cousin or closer unions across most ancient populations. Moreover, we find a marked decay in background parental relatedness co-occurring with or shortly after the advent of sedentary agriculture. We observe this signal, likely linked to increasing local population sizes, across several geographic transects worldwide.
|
19
|
Kivisild T, Saag L, Hui R, Biagini SA, Pankratov V, D'Atanasio E, Pagani L, Saag L, Rootsi S, Mägi R, Metspalu E, Valk H, Malve M, Irdt K, Reisberg T, Solnik A, Scheib CL, Seidman DN, Williams AL, Tambets K, Metspalu M. Patterns of genetic connectedness between modern and medieval Estonian genomes reveal the origins of a major ancestry component of the Finnish population. Am J Hum Genet 2021; 108:1792-1806. [PMID: 34411538 DOI: 10.1016/j.ajhg.2021.07.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/29/2021] [Accepted: 07/23/2021] [Indexed: 11/20/2022] Open
Abstract
The Finnish population is a unique example of a genetic isolate affected by a recent founder event. Previous studies have suggested that the ancestors of Finnic-speaking Finns and Estonians reached the circum-Baltic region by the 1st millennium BC. However, high linguistic similarity points to a more recent split of their languages. To study genetic connectedness between Finns and Estonians directly, we first assessed the efficacy of imputation of low-coverage ancient genomes by sequencing a medieval Estonian genome to high depth (23×) and evaluated the performance of its down-sampled replicas. We find that ancient genomes imputed from >0.1× coverage can be reliably used in principal-component analyses without projection. By searching for long shared allele intervals (LSAIs; similar to identity-by-descent segments) in unphased data for >143,000 present-day Estonians, 99 Finns, and 14 imputed ancient genomes from Estonia, we find unexpectedly high levels of individual connectedness between Estonians and Finns for the last eight centuries in contrast to their clear differentiation by allele frequencies. High levels of sharing of these segments between Estonians and Finns predate the demographic expansion and late settlement process of Finland. One plausible source of this extensive sharing is the 8th-10th centuries AD migration event from North Estonia to Finland that has been proposed to explain uniquely shared linguistic features between the Finnish language and the northern dialect of Estonian and shared Christianity-related loanwords from Slavic. These results suggest that LSAI detection provides a computationally tractable way to detect fine-scale structure in large cohorts.
Affiliation(s)
- Toomas Kivisild
- Department of Human Genetics, KU Leuven, Leuven 3000, Belgium; Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia; McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK.
- Lehti Saag
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Research Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, UK
- Ruoyun Hui
- McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- Vasili Pankratov
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Eugenia D'Atanasio
- Instituto di Biologia e Patologia Molecolari, Consiglio Nazionale delle Ricerche, Rome, Italy
- Luca Pagani
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Department of Biology, University of Padova, 35131 Padova, Italy
- Lauri Saag
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Siiri Rootsi
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Reedik Mägi
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Ene Metspalu
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Heiki Valk
- Department of Archaeology, Institute of History and Archaeology, University of Tartu, Tartu 51014, Estonia
- Martin Malve
- Department of Archaeology, Institute of History and Archaeology, University of Tartu, Tartu 51014, Estonia
- Kadri Irdt
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Tuuli Reisberg
- Core Facility, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Anu Solnik
- Core Facility, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Christiana L Scheib
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia; McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK; St John's College, University of Cambridge, Cambridge CB2 1TP, UK
- Daniel N Seidman
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Amy L Williams
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Kristiina Tambets
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Mait Metspalu
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
|
20
|
Kerdoncuff E, Lambert A, Achaz G. Testing for population decline using maximal linkage disequilibrium blocks. Theor Popul Biol 2020; 134:171-181. [DOI: 10.1016/j.tpb.2020.03.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/09/2019] [Revised: 03/26/2020] [Accepted: 03/29/2020] [Indexed: 02/02/2023]
|
21
|
From molecules to populations: appreciating and estimating recombination rate variation. Nat Rev Genet 2020; 21:476-492. [DOI: 10.1038/s41576-020-0240-1] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Accepted: 04/15/2020] [Indexed: 02/07/2023]
|
22
|
Battey CJ, Ralph PL, Kern AD. Space is the Place: Effects of Continuous Spatial Structure on Analysis of Population Genetic Data. Genetics 2020; 215:193-214. [PMID: 32209569 PMCID: PMC7198281 DOI: 10.1534/genetics.120.303143] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 12/06/2019] [Accepted: 03/12/2020] [Indexed: 12/14/2022] Open
Abstract
Real geography is continuous, but standard models in population genetics are based on discrete, well-mixed populations. As a result, many methods of analyzing genetic data assume that samples are a random draw from a well-mixed population, but are applied to clustered samples from populations that are structured clinally over space. Here, we use simulations of populations living in continuous geography to study the impacts of dispersal and sampling strategy on population genetic summary statistics, demographic inference, and genome-wide association studies (GWAS). We find that most common summary statistics have distributions that differ substantially from those seen in well-mixed populations, especially when Wright's neighborhood size is < 100 and sampling is spatially clustered. "Stepping-stone" models reproduce some of these effects, but discretizing the landscape introduces artifacts that in some cases are exacerbated at higher resolutions. The combination of low dispersal and clustered sampling causes demographic inference from the site frequency spectrum to infer more turbulent demographic histories, but averaged results across multiple simulations revealed surprisingly little systematic bias. We also show that the combination of spatially autocorrelated environments and limited dispersal causes GWAS to identify spurious signals of genetic association with purely environmentally determined phenotypes, and that this bias is only partially corrected by regressing out principal components of ancestry. Last, we discuss the relevance of our simulation results for inference from genetic variation in real organisms.
Affiliation(s)
- C J Battey
- Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, Oregon
- Peter L Ralph
- Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, Oregon
- Andrew D Kern
- Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, Oregon
|
23
|
Mapping co-ancestry connections between the genome of a Medieval individual and modern Europeans. Sci Rep 2020; 10:6843. [PMID: 32321996 PMCID: PMC7176696 DOI: 10.1038/s41598-020-64007-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/13/2019] [Accepted: 04/06/2020] [Indexed: 11/08/2022] Open
Abstract
Historical genetic links among similar populations can be difficult to establish. Identity by descent (IBD) analyses find genomic blocks that represent direct genealogical relationships among individuals. However, this method has rarely been applied to ancient genomes because IBD stretches are progressively fragmented by recombination and thus not recognizable after a few tens of generations. To explore such genealogical relationships, we estimated long IBD blocks among modern Europeans, generating networks to uncover the genetic structures. We found that Basques, Sardinians, Icelanders and Orcadians each form highly intraconnected sub-clusters in a European network, indicating dense genealogical links within small, isolated populations. We also exposed individual genealogical links (such as the connection between one Basque and one Icelandic individual) that cannot be uncovered with other, widely used population genetics methods such as PCA or ADMIXTURE. Moreover, using ancient DNA technology we sequenced a Late Medieval individual (Barcelona, Spain) to high genomic coverage and identified IBD blocks shared between her and modern Europeans. The Medieval IBD blocks are statistically overrepresented only in modern Spaniards, which is the geographically closest population. This approach can be used to produce a fine-scale reflection of shared ancestry across different populations of the world, offering a direct genetic link from the past to the present.
|
24
|
Abstract
Geographic patterns in human genetic diversity carry footprints of population history and provide insights for genetic medicine and its application across human populations. Summarizing and visually representing these patterns of diversity has been a persistent goal for human geneticists, and has revealed that genetic differentiation is frequently correlated with geographic distance. However, most analytical methods to represent population structure do not incorporate geography directly, and it must be considered post hoc alongside a visual summary of the genetic structure. Here, we estimate "effective migration" surfaces to visualize how human genetic diversity is geographically structured. The results reveal local patterns of differentiation in detail and emphasize that while genetic similarity generally decays with geographic distance, the relationship is often subtly distorted. Overall, the visualizations provide a new perspective on genetics and geography in humans and insight to the geographic distribution of human genetic variation.
Affiliation(s)
- Benjamin M Peter
- Department of Human Genetics, University of Chicago, Chicago, IL
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Desislava Petkova
- Wellcome Trust Center for Human Genetics, University of Oxford, Oxford, United Kingdom
- John Novembre
- Department of Human Genetics, University of Chicago, Chicago, IL
- Department of Ecology & Evolution, University of Chicago, Chicago, IL
|
25
|
Leitwein M, Duranton M, Rougemont Q, Gagnaire PA, Bernatchez L. Using Haplotype Information for Conservation Genomics. Trends Ecol Evol 2020; 35:245-258. [DOI: 10.1016/j.tree.2019.10.012] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 08/01/2019] [Revised: 10/18/2019] [Accepted: 10/28/2019] [Indexed: 12/19/2022]
|
26
|
Bradburd GS, Ralph PL. Spatial Population Genetics: It's About Time. ANNUAL REVIEW OF ECOLOGY EVOLUTION AND SYSTEMATICS 2019. [DOI: 10.1146/annurev-ecolsys-110316-022659] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Indexed: 11/09/2022]
Abstract
Many important questions about the history and dynamics of organisms have a geographical component: How many are there, and where do they live? How do they move and interbreed across the landscape? How were they moving a thousand years ago, and where were the ancestors of a particular individual alive today? Answers to these questions can have profound consequences for our understanding of history, ecology, and the evolutionary process. In this review, we discuss how geographic aspects of the distribution, movement, and reproduction of organisms are reflected in their pedigree across space and time. Because the structure of the pedigree is what determines patterns of relatedness in modern genetic variation, our aim is to thus provide intuition for how these processes leave an imprint in genetic data. We also highlight some current methods and gaps in the statistical toolbox of spatial population genetics.
Affiliation(s)
- Gideon S. Bradburd
- Ecology, Evolutionary Biology, and Behavior Group, Department of Integrative Biology, Michigan State University, East Lansing, Michigan 48824, USA
- Peter L. Ralph
- Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, Oregon 97403, USA
- Department of Mathematics, University of Oregon, Eugene, Oregon 97403, USA
|
27
|
Duranton M, Bonhomme F, Gagnaire P. The spatial scale of dispersal revealed by admixture tracts. Evol Appl 2019; 12:1743-1756. [PMID: 31548854 PMCID: PMC6752141 DOI: 10.1111/eva.12829] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/05/2018] [Accepted: 05/28/2019] [Indexed: 12/11/2022] Open
Abstract
Evaluating species dispersal across the landscape is essential to design appropriate management and conservation actions. However, technical difficulties often preclude direct measures of individual movement, while indirect genetic approaches rely on assumptions that sometimes limit their application. Here, we show that the temporal decay of admixture tracts lengths can be used to assess genetic connectivity within a population introgressed by foreign haplotypes. We present a proof-of-concept approach based on local ancestry inference in a high gene flow marine fish species, the European sea bass (Dicentrarchus labrax). Genetic admixture in the contact zone between Atlantic and Mediterranean sea bass lineages allows the introgression of Atlantic haplotype tracts within the Mediterranean Sea. Once introgressed, blocks of foreign ancestry are progressively eroded by recombination as they diffuse from the western to the eastern Mediterranean basin, providing a means to estimate dispersal. By comparing the length distributions of Atlantic tracts between two Mediterranean populations located at different distances from the contact zone, we estimated the average per-generation dispersal distance within the Mediterranean lineage to less than 50 km. Using simulations, we showed that this approach is robust to a range of demographic histories and sample sizes. Our results thus support that the length of admixture tracts can be used together with a recombination clock to estimate genetic connectivity in species for which the neutral migration-drift balance is not informative or simply does not exist.
Affiliation(s)
- Maud Duranton
- ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
|
28
|
Lundgren E, Ralph PL. Are populations like a circuit? Comparing isolation by resistance to a new coalescent-based method. Mol Ecol Resour 2019; 19:1388-1406. [PMID: 31099173 DOI: 10.1111/1755-0998.13035] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/06/2018] [Revised: 04/22/2019] [Accepted: 05/01/2019] [Indexed: 11/27/2022]
Abstract
A number of methods commonly used in landscape genetics use an analogy to electrical resistance on a network to describe and fit barriers to movement across the landscape using genetic distance data. These are motivated by a mathematical equivalence between electrical resistance between two nodes of a network and the 'commute time', which is the mean time for a random walk on that network to leave one node, visit the other, and return. However, genetic data are more accurately modelled by a different quantity, the coalescence time. Here, we describe the differences between resistance distance and coalescence time, and explore the consequences for inference. We implemented a Bayesian method to infer effective movement rates and population sizes under both these models, and found that inference using commute times could produce misleading results in the presence of biased gene flow. We then used forwards-time simulation with continuous geography to demonstrate that coalescence-based inference remains more accurate than resistance-based methods on realistic data, but difficulties highlight the need for methods that explicitly model continuous, heterogeneous geography.
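The equivalence between resistance distance and commute time described in this abstract can be sketched for a small network. This is an illustration rather than the authors' Bayesian method. It assumes a connected graph with symmetric edge weights and uses the standard identity that commute time equals the total edge weight (summed over both directions) times the effective resistance. The function name is hypothetical.

```python
import numpy as np

def commute_times(weights):
    """Pairwise commute times of a random walk on a weighted graph.

    Assumes `weights` is a symmetric, non-negative adjacency matrix of a
    connected graph (symmetric movement only; biased gene flow, the case
    the abstract highlights, is not covered by this identity).
    """
    W = np.asarray(weights, dtype=float)
    if W.ndim != 2 or W.shape[0] != W.shape[1] or not np.allclose(W, W.T):
        raise ValueError("weights must be a square symmetric matrix")
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W
    # The Moore-Penrose pseudoinverse of the Laplacian gives resistances.
    L_pinv = np.linalg.pinv(laplacian)
    d = np.diag(L_pinv)
    resistance = d[:, None] + d[None, :] - 2.0 * L_pinv
    # Commute time = (sum of all degrees) * effective resistance.
    return degrees.sum() * resistance

if __name__ == "__main__":
    # Three demes in a line: 0 - 1 - 2, unit weights.
    path = np.array([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 0]])
    print(commute_times(path))
```

For the three-deme line, a walk from one end to the other and back takes 8 steps on average, matching a resistance of 2 times a total degree of 4. Coalescence time, which the abstract argues is the more appropriate quantity, is not computed here.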
Affiliation(s)
- Erik Lundgren
- Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Peter L Ralph
- Institute for Ecology and Evolution, University of Oregon, Eugene, OR, USA
| |
|
29
|
Long-Distance Benefits of Marine Reserves: Myth or Reality? Trends Ecol Evol 2019; 34:342-354. [PMID: 30777295 DOI: 10.1016/j.tree.2019.01.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/15/2018] [Revised: 01/05/2019] [Accepted: 01/07/2019] [Indexed: 02/08/2023]
Abstract
Long-distance (>40 km) dispersal from marine reserves is poorly documented; yet, it can provide essential benefits such as seeding fished areas or connecting marine reserves into networks. From a meta-analysis, we suggest that the spatial scale of marine connectivity is underestimated due to the limited geographic extent of sampling designs. We also found that the largest marine reserves (>1000 km²) are the most isolated. These findings have important implications for the assessment of evolutionary, ecological, and socio-economic long-distance benefits of marine reserves. We conclude that existing methods to infer dispersal should consider up-to-date genomic advances and also expand the spatial scale of sampling designs. Incorporating long-distance connectivity in conservation planning will help increase the benefits of marine reserve networks.
|
30
|
Peñalba JV, Joseph L, Moritz C. Current geography masks dynamic history of gene flow during speciation in northern Australian birds. Mol Ecol 2019; 28:630-643. [PMID: 30561150 DOI: 10.1111/mec.14978] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 09/05/2017] [Revised: 12/11/2018] [Accepted: 12/12/2018] [Indexed: 12/25/2022]
Abstract
Genome divergence is greatly influenced by gene flow during early stages of speciation. As populations differentiate, geographic barriers can constrain gene flow and so affect the dynamics of divergence and speciation. Current geography, specifically disjunction and continuity of ranges, is often used to predict the historical gene flow during the divergence process. We test this prediction in eight meliphagoid bird species complexes codistributed in four regions. These regions are separated by known biogeographical barriers across northern Australia and Papua New Guinea. We find that bird populations currently separated by terrestrial habitat barriers within Australia and marine barriers between Australia and Papua New Guinea have a range of divergence levels and probability of gene flow not associated with current range connectivity. Instead, geographic distance and historical range connectivity better predict divergence and probability of gene flow. In this dynamic environmental context, we also find support for a nonlinear decrease of the probability of gene flow during the divergence process. The probability of gene flow initially decreases gradually after a certain level of divergence is reached. Its decrease then accelerates until the probability is close to zero. This implies that although geographic connectivity may have more of an effect early in speciation, other factors associated with higher divergence may play a more important role in influencing gene flow midway through and later in speciation. Current geographic connectivity may then mislead inferences regarding potential for gene flow during speciation under a complex and dynamic history of geographic and reproductive isolation.
Affiliation(s)
- Joshua V Peñalba
- Ecology and Evolution, Australian National University, Acton, ACT, Australia
- Centre for Biodiversity Analysis, Acton, ACT, Australia
- Australian National Wildlife Collection, CSIRO National Research Collections Australia, Canberra, ACT, Australia
- Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität Munich, Planegg-Martinsried, Germany
| | - Leo Joseph
- Centre for Biodiversity Analysis, Acton, ACT, Australia
- Australian National Wildlife Collection, CSIRO National Research Collections Australia, Canberra, ACT, Australia
| | - Craig Moritz
- Ecology and Evolution, Australian National University, Acton, ACT, Australia
- Centre for Biodiversity Analysis, Acton, ACT, Australia
| |
|
31
|
Al-Asadi H, Petkova D, Stephens M, Novembre J. Estimating recent migration and population-size surfaces. PLoS Genet 2019; 15:e1007908. [PMID: 30640906 PMCID: PMC6347299 DOI: 10.1371/journal.pgen.1007908] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 08/08/2018] [Revised: 01/25/2019] [Accepted: 12/19/2018] [Indexed: 12/21/2022] Open
Abstract
In many species a fundamental feature of genetic diversity is that genetic similarity decays with geographic distance; however, this relationship is often complex, and may vary across space and time. Methods to uncover and visualize such relationships have widespread use for analyses in molecular ecology, conservation genetics, evolutionary genetics, and human genetics. While several frameworks exist, a promising approach is to infer maps of how migration rates vary across geographic space. Such maps could, in principle, be estimated across time to reveal the full complexity of population histories. Here, we take a step in this direction: we present a method to infer maps of population sizes and migration rates associated with different time periods from a matrix of genetic similarity between every pair of individuals. Specifically, genetic similarity is measured by counting the number of long segments of haplotype sharing (also known as identity-by-descent tracts). By varying the length of these segments we obtain parameter estimates associated with different time periods. Using simulations, we show that the method can reveal time-varying migration rates and population sizes, including changes that are not detectable when using a similar method that ignores haplotypic structure. We apply the method to a dataset of contemporary European individuals (POPRES), and provide an integrated analysis of recent population structure and growth over the last ∼3,000 years in Europe. We introduce a novel statistical method to infer migration rates and population sizes across space in recent time periods. Our approach builds upon the previously developed EEMS method, which infers effective migration rates under a dense lattice. Similarly, we infer demographic parameters under a lattice and use a (Voronoi) prior to regularize parameters of the model. However, our method differs from EEMS in a few key respects. 
First, we use the coalescent model parameterized by migration rates and population sizes while EEMS uses a resistance model. As another key difference, our method uses haplotype data while EEMS uses the average genetic distance. A consequence of using haplotype data is that our method can separately estimate migration rates and population sizes, which in essence is done by using a recombination rate map to calibrate the decay of haplotypes over time. An additional useful feature of haplotype data is that, by varying the lengths analyzed, we can infer demography associated with different recent time periods. We call our method MAPS for estimating Migration And Population-size Surfaces. To illustrate MAPS on real data, we analyze a genome-wide SNP dataset on 2224 individuals of European ancestry.
Affiliation(s)
- Hussein Al-Asadi
- Evolutionary Biology, University of Chicago, Chicago, Illinois, United States of America
- Department of Statistics, University of Chicago, Illinois, United States of America
| | - Desislava Petkova
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Matthew Stephens
- Department of Statistics, University of Chicago, Illinois, United States of America
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| | - John Novembre
- Evolutionary Biology, University of Chicago, Chicago, Illinois, United States of America
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| |
|
32
|
Forien R. The stepping stone model in a random environment and the effect of local heterogneities on isolation by distance patterns. ELECTRON J PROBAB 2019. [DOI: 10.1214/19-ejp314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
33
|
Cayuela H, Rougemont Q, Prunier JG, Moore JS, Clobert J, Besnard A, Bernatchez L. Demographic and genetic approaches to study dispersal in wild animal populations: A methodological review. Mol Ecol 2018; 27:3976-4010. [DOI: 10.1111/mec.14848] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 05/09/2018] [Revised: 08/17/2018] [Accepted: 08/19/2018] [Indexed: 12/31/2022]
Affiliation(s)
- Hugo Cayuela
- Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City Québec Canada
| | - Quentin Rougemont
- Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City Québec Canada
| | - Jérôme G. Prunier
- Station d'Ecologie Théorique et Expérimentale; Unité Mixte de Recherche (UMR) 5321; Centre National de la Recherche Scientifique (CNRS); Université Paul Sabatier (UPS); Moulis France
| | - Jean-Sébastien Moore
- Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City Québec Canada
| | - Jean Clobert
- Station d'Ecologie Théorique et Expérimentale; Unité Mixte de Recherche (UMR) 5321; Centre National de la Recherche Scientifique (CNRS); Université Paul Sabatier (UPS); Moulis France
| | - Aurélien Besnard
- CNRS; PSL Research University; EPHE; UM, SupAgro, IRD; INRA; UMR 5175 CEFE; Montpellier France
| | - Louis Bernatchez
- Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City Québec Canada
| |
|
34
|
Ringbauer H, Kolesnikov A, Field DL, Barton NH. Estimating Barriers to Gene Flow from Distorted Isolation-by-Distance Patterns. Genetics 2018; 208:1231-1245. [PMID: 29311149 PMCID: PMC5844333 DOI: 10.1534/genetics.117.300638] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/19/2017] [Accepted: 12/23/2017] [Indexed: 11/18/2022] Open
Abstract
In continuous populations with local migration, nearby pairs of individuals have on average more similar genotypes than geographically well-separated pairs. A barrier to gene flow distorts this classical pattern of isolation by distance. Genetic similarity is decreased for sample pairs on different sides of the barrier and increased for pairs on the same side near the barrier. Here, we introduce an inference scheme that uses this signal to detect and estimate the strength of a linear barrier to gene flow in two dimensions. We use a diffusion approximation to model the effects of a barrier on the geographic spread of ancestry backward in time. This approach allows us to calculate the chance of recent coalescence and probability of identity by descent. We introduce an inference scheme that fits these theoretical results to the geographic covariance structure of biallelic genetic markers. It can estimate the strength of the barrier as well as several demographic parameters. We investigate the power of our inference scheme to detect barriers by applying it to a wide range of simulated data. We also showcase an example application to an Antirrhinum majus (snapdragon) flower-color hybrid zone, where we do not detect any signal of a strong genome-wide barrier to gene flow.
Affiliation(s)
- Harald Ringbauer
- Institute of Science and Technology Austria, Klosterneuburg A-3400, Austria
| | | | - David L Field
- Department of Botany and Biodiversity Research, University of Vienna, A-1030, Austria
| | - Nicholas H Barton
- Institute of Science and Technology Austria, Klosterneuburg A-3400, Austria
| |
|
35
|
Triska P, Chekanov N, Stepanov V, Khusnutdinova EK, Kumar GPA, Akhmetova V, Babalyan K, Boulygina E, Kharkov V, Gubina M, Khidiyatova I, Khitrinskaya I, Khrameeva EE, Khusainova R, Konovalova N, Litvinov S, Marusin A, Mazur AM, Puzyrev V, Ivanoshchuk D, Spiridonova M, Teslyuk A, Tsygankova S, Triska M, Trofimova N, Vajda E, Balanovsky O, Baranova A, Skryabin K, Tatarinova TV, Prokhortchouk E. Between Lake Baikal and the Baltic Sea: genomic history of the gateway to Europe. BMC Genet 2017; 18:110. [PMID: 29297395 PMCID: PMC5751809 DOI: 10.1186/s12863-017-0578-3] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND The history of human populations occupying the plains and mountain ridges separating Europe from Asia has been eventful, as these natural obstacles were crossed westward by multiple waves of Turkic and Uralic-speaking migrants as well as eastward by Europeans. Unfortunately, the material records of the history of this region are not dense enough to reconstruct details of population history. These considerations stimulate growing interest in obtaining a genetic picture of the demographic history of migrations and admixture in Northern Eurasia. RESULTS We genotyped and analyzed 1076 individuals from 30 populations with geographical coverage spanning from the Baltic Sea to Lake Baikal. Our dense sampling allowed us to describe in detail the population structure, provide insight into the genomic history of numerous European and Asian populations, and significantly increase the quantity of genetic data available for modern populations in the region of North Eurasia. Our study doubles the number of genome-wide profiles available for this region. We detected an unusually high amount of shared identical-by-descent (IBD) genomic segments between several Siberian populations, such as Khanty and Ket, providing evidence of genetic relatedness across vast geographic distances and between speakers of different language families. Additionally, we observed excessive IBD sharing between Khanty and Bashkir, a group of Turkic speakers from the Southern Urals region. While adding some weight to the "Finno-Ugric" origin of Bashkir, our studies highlighted that the Bashkir gene pool lacks a main "core", being a multi-layered amalgamation of Turkic, Ugric, Finnish and Indo-European contributions, which points to the intricacy of the genetic interface between Turkic and Uralic populations.
Comparison of the genetic structure of Siberian ethnicities and the geography of the region they inhabit points to the existence of the "Great Siberian Vortex" directing genetic exchanges in populations across the Siberian part of Asia. Slavic speakers of Eastern Europe are, in general, very similar in their genetic composition. Ukrainians, Belarusians and Russians have almost identical proportions of Caucasus and Northern European components and have virtually no Asian influence. We capitalized on the wide geographic span of our sampling to address an intriguing question about the place of origin of Russian Starovers, an enigmatic Eastern Orthodox Old Believers religious group relocated to Siberia in the seventeenth century. A comparative reAdmix analysis, complemented by IBD sharing, placed their roots in the region of the Northern European Plain, occupied by North Russians and Finno-Ugric Komi and Karelian people. Russians from Novosibirsk and Russian Starovers exhibit ancestral proportions close to those of European Eastern Slavs; however, they also include between 5% and 10% of Central Siberian ancestry, not present at this level in their European counterparts. CONCLUSIONS Our project has patched the hole in the genetic map of Eurasia: we demonstrated the complexity of the genetic structure of Northern Eurasians and the existence of East-West and North-South genetic gradients, and assessed different inputs of ancient populations into modern populations.
MESH Headings
- Algorithms
- Asia
- DNA
- Datasets as Topic
- Emigration and Immigration/history
- Ethnicity/genetics
- Europe
- Female
- Genetic Variation
- Genetics, Population
- Genotyping Techniques
- History, 15th Century
- History, 16th Century
- History, 17th Century
- History, 18th Century
- History, 19th Century
- History, 20th Century
- History, 21st Century
- History, Ancient
- History, Medieval
- Humans
- Male
- Russia
Affiliation(s)
- Petr Triska
- Children's Hospital Los Angeles, Los Angeles, CA, USA
| | - Nikolay Chekanov
- Federal State Institution "Federal Research Centre «Fundamentals of Biotechnology» of the Russian Academy of Sciences", Moscow, Russia
- "Genoanalytica" CJSC, Moscow, Russia
| | - Vadim Stepanov
- Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
| | - Elza K Khusnutdinova
- Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
- Bashkir State University, Ufa, Russia
| | | | - Vita Akhmetova
- Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
| | - Konstantin Babalyan
- Moscow Institute of Physics and Technology, Department of Molecular and Bio-Physics, Moscow, Russia
| | | | - Vladimir Kharkov
- Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
| | - Marina Gubina
- Institute of Cytology and Genetics, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
| | - Irina Khidiyatova
- Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
- Bashkir State University, Ufa, Russia
| | - Irina Khitrinskaya
- Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
| | - Ekaterina E Khrameeva
- "Genoanalytica" CJSC, Moscow, Russia
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, Russia
| | - Rita Khusainova
- Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
- Bashkir State University, Ufa, Russia
| | | | - Sergey Litvinov
- Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
| | - Andrey Marusin
- Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
| | - Alexandr M Mazur
- Federal State Institution "Federal Research Centre «Fundamentals of Biotechnology» of the Russian Academy of Sciences", Moscow, Russia
| | - Valery Puzyrev
- Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
| | - Dinara Ivanoshchuk
- Institute of Cytology and Genetics, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
| | - Maria Spiridonova
- Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
| | - Anton Teslyuk
- Moscow Institute of Physics and Technology, Department of Molecular and Bio-Physics, Moscow, Russia
| | - Svetlana Tsygankova
- Moscow Institute of Physics and Technology, Department of Molecular and Bio-Physics, Moscow, Russia
| | - Martin Triska
- Children's Hospital Los Angeles, Los Angeles, CA, USA
| | - Natalya Trofimova
- Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
| | - Edward Vajda
- Department of Modern and Classical Languages, Western Washington University, Bellingham, WA, USA
| | - Oleg Balanovsky
- Research Centre for Medical Genetics, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
| | - Ancha Baranova
- Research Centre for Medical Genetics, Moscow, Russia
- School of Systems Biology, George Mason University, Fairfax, VA, USA
- Atlas Biomed Group, Moscow, Russia
| | - Konstantin Skryabin
- Federal State Institution "Federal Research Centre «Fundamentals of Biotechnology» of the Russian Academy of Sciences", Moscow, Russia
- Russian Scientific Centre "Kurchatov Institute", Moscow, Russia
- Department of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Tatiana V Tatarinova
- Vavilov Institute of General Genetics, Moscow, Russia.
- School of Systems Biology, George Mason University, Fairfax, VA, USA.
- Atlas Biomed Group, Moscow, Russia.
- Department of Biology, University of La Verne, La Verne, CA, USA.
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia.
| | - Egor Prokhortchouk
- Federal State Institution "Federal Research Centre «Fundamentals of Biotechnology» of the Russian Academy of Sciences", Moscow, Russia.
- Department of Biology, Lomonosov Moscow State University, Moscow, Russia.
| |
|
36
|
Deconstructing isolation-by-distance: The genomic consequences of limited dispersal. PLoS Genet 2017; 13:e1006911. [PMID: 28771477 PMCID: PMC5542401 DOI: 10.1371/journal.pgen.1006911] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 03/14/2017] [Accepted: 07/06/2017] [Indexed: 12/31/2022] Open
Abstract
Geographically limited dispersal can shape genetic population structure and result in a correlation between genetic and geographic distance, commonly called isolation-by-distance. Despite the prevalence of isolation-by-distance in nature, to date few studies have empirically demonstrated the processes that generate this pattern, largely because few populations have direct measures of individual dispersal and pedigree information. Intensive, long-term demographic studies and exhaustive genomic surveys in the Florida Scrub-Jay (Aphelocoma coerulescens) provide an excellent opportunity to investigate the influence of dispersal on genetic structure. Here, we used a panel of genome-wide SNPs and extensive pedigree information to explore the role of limited dispersal in shaping patterns of isolation-by-distance in both sexes, and at an exceedingly fine spatial scale (within ~10 km). Isolation-by-distance patterns were stronger in male-male and male-female comparisons than in female-female comparisons, consistent with observed differences in dispersal propensity between the sexes. Using the pedigree, we demonstrated how various genealogical relationships contribute to fine-scale isolation-by-distance. Simulations using field-observed distributions of male and female natal dispersal distances showed good agreement with the distribution of geographic distances between breeding individuals of different pedigree relationship classes. Furthermore, we built coalescent simulations parameterized by the observed dispersal curve, population density, and immigration rate, and showed how incorporating these extensions to Malécot’s theory of isolation-by-distance allows us to accurately reconstruct observed sex-specific isolation-by-distance patterns in autosomal and Z-linked SNPs. Therefore, patterns of fine-scale isolation-by-distance in the Florida Scrub-Jay can be well understood as a result of limited dispersal over contemporary timescales. 
Dispersal is a fundamental component of the life history of most organisms and therefore influences many biological processes. Dispersal is particularly important in creating genetic structure on the landscape. We often observe a pattern of decreased genetic relatedness between individuals as geographic distance increases, or isolation-by-distance. This pattern is particularly pronounced in organisms with extremely short dispersal distances. Despite the ubiquity of isolation-by-distance patterns in nature, there are few examples that explicitly demonstrate how limited dispersal influences spatial genetic structure. Here we investigate the processes that result in spatial genetic structure using the Florida Scrub-Jay, a bird with extremely limited dispersal behavior and extensive genome-wide data. We take advantage of the long-term monitoring of a contiguous population of Florida Scrub-Jays, which has resulted in a detailed pedigree and measurements of dispersal for hundreds of individuals. We show how limited dispersal results in close genealogical relatives living closer together geographically, which generates a strong pattern of isolation-by-distance at an extremely small spatial scale (<10 km) in just a few generations. Given the detailed dispersal, pedigree, and genomic data, we can achieve a fairly complete understanding of how dispersal shapes patterns of genetic diversity over short spatial scales.
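The isolation-by-distance pattern described above, a correlation between genetic and geographic distance, is often summarized by a regression slope. The following is a minimal sketch and not the authors' pedigree-based analysis. It assumes pairwise genetic distances are regressed on the logarithm of geographic distance, a common choice for two-dimensional habitats. The function name and the toy data are hypothetical.

```python
import numpy as np

def ibd_slope(geo_dist, gen_dist):
    """Least-squares slope and intercept of genetic distance on
    log geographic distance for a set of sample pairs.

    Assumption (not from the source): inputs are 1-D arrays with one
    entry per pair, and all geographic distances are positive.
    """
    geo = np.asarray(geo_dist, dtype=float)
    gen = np.asarray(gen_dist, dtype=float)
    if geo.shape != gen.shape or geo.size < 2:
        raise ValueError("need two equal-length arrays with >= 2 pairs")
    if np.any(geo <= 0):
        raise ValueError("geographic distances must be positive")
    slope, intercept = np.polyfit(np.log(geo), gen, deg=1)
    return slope, intercept

if __name__ == "__main__":
    # Toy pairs (distances in km); genetic distance rises with log distance.
    geo = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
    gen = 0.02 * np.log(geo) + 0.10
    print(ibd_slope(geo, gen))
```

A positive slope indicates that more distant pairs are less related. Because pairs share individuals and are not independent, significance is usually assessed with a permutation procedure such as a Mantel test, which this sketch omits.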
|