1
Agranat-Tamir L, Mooney JA, Rosenberg NA. Counting the genetic ancestors from source populations in members of an admixed population. Genetics 2024; 226:iyae011. [PMID: 38289724 PMCID: PMC10990421 DOI: 10.1093/genetics/iyae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 11/02/2023] [Accepted: 01/12/2024] [Indexed: 02/01/2024] Open
Abstract
In a genetically admixed population, admixed individuals possess genealogical and genetic ancestry from multiple source groups. Under a mechanistic model of admixture, we study the number of distinct ancestors from the source populations that the admixture represents. Combining a mechanistic admixture model with a recombination model that describes the probability that a genealogical ancestor is a genetic ancestor, for a member of a genetically admixed population, we count genetic ancestors from the source populations: those genealogical ancestors from the source populations who contribute to the genome of the modern admixed individual. We compare patterns in the numbers of genealogical and genetic ancestors across the generations. To illustrate the enumeration of genetic ancestors from source populations in an admixed group, we apply the model to the African-American population, extending recent results on the numbers of African and European genealogical ancestors that contribute to the pedigree of an African-American chosen at random, so that we also evaluate the numbers of African and European genetic ancestors who contribute to random African-American genomes. The model suggests that the autosomal genome of a random African-American born in the interval 1960-1965 contains genetic contributions from a mean of 162 African (standard deviation 47, interquartile range 127-192) and 32 European ancestors (standard deviation 14, interquartile range 21-43). The enumeration of genetic ancestors can potentially be performed in other diploid species in which admixture and recombination models can be specified.
Affiliation(s)
- Jazlyn A Mooney
- Department of Quantitative & Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
2
Mooney JA, Agranat-Tamir L, Pritchard JK, Rosenberg NA. On the number of genealogical ancestors tracing to the source groups of an admixed population. Genetics 2023; 224:iyad079. [PMID: 37410594 PMCID: PMC10324943 DOI: 10.1093/genetics/iyad079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 04/05/2023] [Indexed: 07/08/2023] Open
Abstract
Members of genetically admixed populations possess ancestry from multiple source groups, and studies of human genetic admixture frequently estimate ancestry components corresponding to fractions of individual genomes that trace to specific ancestral populations. However, the same numerical ancestry fraction can represent a wide array of admixture scenarios within an individual's genealogy. Using a mechanistic model of admixture, we consider admixture genealogically: how many ancestors from the source populations does the admixture represent? We consider African-Americans, for whom continent-level estimates produce a 75-85% value for African ancestry on average and 15-25% for European ancestry. Genetic studies together with key features of African-American demographic history suggest ranges for parameters of a simple three-epoch model. Considering parameter sets compatible with estimates of current ancestry levels, we infer that if all genealogical lines of a random African-American born during 1960-1965 are traced back until they reach members of source populations, the mean over parameter sets of the expected number of genealogical lines terminating with African individuals is 314 (interquartile range 240-376), and the mean of the expected number terminating in Europeans is 51 (interquartile range 32-69). Across discrete generations, the peak number of African genealogical ancestors occurs in birth cohorts from the early 1700s, and the probability exceeds 50% that at least one European ancestor was born more recently than 1835. Our genealogical perspective can contribute to further understanding the admixture processes that underlie admixed populations. For African-Americans, the results provide insight both on how many of the ancestors of a typical African-American might have been forcibly displaced in the Transatlantic Slave Trade and on how many separate European admixture events might exist in a typical African-American genealogy.
Affiliation(s)
- Jazlyn A Mooney
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Jonathan K Pritchard
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
3
Severson AL, Carmi S, Rosenberg NA. Variance and limiting distribution of coalescence times in a diploid model of a consanguineous population. Theor Popul Biol 2021; 139:50-65. [PMID: 33675872 DOI: 10.1016/j.tpb.2021.02.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/11/2021] [Accepted: 02/14/2021] [Indexed: 10/22/2022]
Abstract
Recent modeling studies interested in runs of homozygosity (ROH) and identity by descent (IBD) have sought to connect these properties of genomic sharing to pairwise coalescence times. Here, we examine a variety of features of pairwise coalescence times in models that consider consanguinity. In particular, we extend a recent diploid analysis of mean coalescence times for lineage pairs within and between individuals in a consanguineous population to derive the variance of coalescence times, studying its dependence on the frequency of consanguinity and the kinship coefficient of consanguineous relationships. We also introduce a separation-of-time-scales approach that treats consanguinity models analogously to mathematically similar phenomena such as partial selfing, using this approach to obtain coalescence-time distributions. This approach shows that the consanguinity model behaves similarly to a standard coalescent, scaling population size by a factor 1-3c, where c represents the kinship coefficient of a randomly chosen mating pair. It provides the explanation for an earlier result describing mean coalescence time in the consanguinity model in terms of c. The results extend the potential to make predictions about ROH and IBD in relation to demographic parameters of diploid populations.
Affiliation(s)
- Alissa L Severson
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Shai Carmi
- Braun School of Public Health and Community Medicine, Hebrew University of Jerusalem, Ein Kerem, 9112102, Israel
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
4
Battey CJ, Ralph PL, Kern AD. Predicting geographic location from genetic variation with deep neural networks. eLife 2020; 9:e54507. [PMID: 32511092 PMCID: PMC7324158 DOI: 10.7554/elife.54507] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 12/17/2019] [Accepted: 06/03/2020] [Indexed: 12/12/2022] Open
Abstract
Most organisms are more closely related to nearby than distant members of their species, creating spatial autocorrelations in genetic data. This allows us to predict the location of origin of a genetic sample by comparing it to a set of samples of known geographic origin. Here, we describe a deep learning method, which we call Locator, to accomplish this task faster and more accurately than existing approaches. In simulations, Locator infers sample location to within 4.1 generations of dispersal and runs at least an order of magnitude faster than a recent model-based approach. We leverage Locator's computational efficiency to predict locations separately in windows across the genome, which allows us to both quantify uncertainty and describe the mosaic ancestry and patterns of geographic mixing that characterize many populations. Applied to whole-genome sequence data from Plasmodium parasites, Anopheles mosquitoes, and global human populations, this approach yields median test errors of 16.9 km, 5.7 km, and 85 km, respectively.
Affiliation(s)
- CJ Battey
- University of Oregon, Institute of Ecology and Evolution, Eugene, United States
- Peter L Ralph
- University of Oregon, Institute of Ecology and Evolution, Eugene, United States
- Andrew D Kern
- University of Oregon, Institute of Ecology and Evolution, Eugene, United States
5
Bradburd GS, Ralph PL. Spatial Population Genetics: It's About Time. Annual Review of Ecology, Evolution, and Systematics 2019. [DOI: 10.1146/annurev-ecolsys-110316-022659] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Indexed: 11/09/2022]
Abstract
Many important questions about the history and dynamics of organisms have a geographical component: How many are there, and where do they live? How do they move and interbreed across the landscape? How were they moving a thousand years ago, and where were the ancestors of a particular individual alive today? Answers to these questions can have profound consequences for our understanding of history, ecology, and the evolutionary process. In this review, we discuss how geographic aspects of the distribution, movement, and reproduction of organisms are reflected in their pedigree across space and time. Because the structure of the pedigree is what determines patterns of relatedness in modern genetic variation, our aim is to thus provide intuition for how these processes leave an imprint in genetic data. We also highlight some current methods and gaps in the statistical toolbox of spatial population genetics.
Affiliation(s)
- Gideon S. Bradburd
- Ecology, Evolutionary Biology, and Behavior Group, Department of Integrative Biology, Michigan State University, East Lansing, Michigan 48824, USA
- Peter L. Ralph
- Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, Oregon 97403, USA
- Department of Mathematics, University of Oregon, Eugene, Oregon 97403, USA
6
Ryman N, Laikre L, Hössjer O. Do estimates of contemporary effective population size tell us what we want to know? Mol Ecol 2019; 28:1904-1918. [PMID: 30663828 PMCID: PMC6850010 DOI: 10.1111/mec.15027] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 04/18/2018] [Revised: 01/14/2019] [Accepted: 01/15/2019] [Indexed: 12/25/2022]
Abstract
Estimation of effective population size (Ne) from genetic marker data is a major focus for biodiversity conservation because it is essential to know at what rates inbreeding is increasing and additive genetic variation is lost. But are these the rates assessed when applying commonly used Ne estimation techniques? Here we use recently developed analytical tools and demonstrate that in the case of substructured populations the answer is no. This is because the following: Genetic change can be quantified in several ways reflecting different types of Ne such as inbreeding (NeI), variance (NeV), additive genetic variance (NeAV), linkage disequilibrium equilibrium (NeLD), eigenvalue (NeE) and coalescence (NeCo) effective size. They are all the same for an isolated population of constant size, but the realized values of these effective sizes can differ dramatically in populations under migration. Commonly applied Ne‐estimators target NeV or NeLD of individual subpopulations. While such estimates are safe proxies for the rates of inbreeding and loss of additive genetic variation under isolation, we show that they are poor indicators of these rates in populations affected by migration. In fact, both the local and global inbreeding (NeI) and additive genetic variance (NeAV) effective sizes are consistently underestimated in a subdivided population. This is serious because these are the effective sizes that are relevant to the widely accepted 50/500 rule for short and long term genetic conservation. The bias can be infinitely large and is due to inappropriate parameters being estimated when applying theory for isolated populations to subdivided ones.
Affiliation(s)
- Nils Ryman
- Department of Zoology, Division of Population Genetics, Stockholm University, Stockholm, Sweden
- Linda Laikre
- Department of Zoology, Division of Population Genetics, Stockholm University, Stockholm, Sweden
- Ola Hössjer
- Department of Mathematics, Stockholm University, Stockholm, Sweden
7
Population structure and coalescence in pedigrees: Comparisons to the structured coalescent and a framework for inference. Theor Popul Biol 2017; 115:1-12. [DOI: 10.1016/j.tpb.2017.01.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/23/2016] [Revised: 01/02/2017] [Accepted: 01/18/2017] [Indexed: 01/08/2023]
8
Cantet R, García-Baccino C, Rogberg-Muñoz A, Forneris N, Munilla S. Beyond genomic selection: The animal model strikes back (one generation)! J Anim Breed Genet 2017; 134:224-231. [DOI: 10.1111/jbg.12271] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/16/2016] [Accepted: 03/06/2017] [Indexed: 01/09/2023]
Affiliation(s)
- R.J.C. Cantet
- Departamento de Producción Animal; Facultad de Agronomía; Universidad de Buenos Aires; Ciudad Autónoma de Buenos Aires Argentina
- Instituto de Investigaciones en Producción Animal (INPA) UBA-CONICET; Buenos Aires Argentina
- C.A. García-Baccino
- Departamento de Producción Animal; Facultad de Agronomía; Universidad de Buenos Aires; Ciudad Autónoma de Buenos Aires Argentina
- A. Rogberg-Muñoz
- Departamento de Producción Animal; Facultad de Agronomía; Universidad de Buenos Aires; Ciudad Autónoma de Buenos Aires Argentina
- Instituto de Genética Veterinaria (IGEVET); Facultad de Ciencias Veterinarias; Universidad Nacional de La Plata (UNLP) - CONICET; La Plata Provincia de Buenos Aires Argentina
- N.S. Forneris
- Departamento de Producción Animal; Facultad de Agronomía; Universidad de Buenos Aires; Ciudad Autónoma de Buenos Aires Argentina
- S. Munilla
- Departamento de Producción Animal; Facultad de Agronomía; Universidad de Buenos Aires; Ciudad Autónoma de Buenos Aires Argentina
9
Joseph TA, Hickerson MJ, Alvarado-Serrano DF. Demographic inference under a spatially continuous coalescent model. Heredity (Edinb) 2016; 117:94-9. [PMID: 27118157 DOI: 10.1038/hdy.2016.28] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/27/2015] [Revised: 03/18/2016] [Accepted: 03/18/2016] [Indexed: 01/19/2023] Open
Abstract
In contrast with the classical population genetics theory that models population structure as discrete panmictic units connected by migration, many populations exhibit heterogeneous spatial gradients in population connectivity across semi-continuous habitats. The historical dynamics of such spatially structured populations can be captured by a spatially explicit coalescent model recently proposed by Etheridge (2008) and Barton et al. (2010a, 2010b), whereby allelic lineages are distributed in a two-dimensional spatial continuum and move within this continuum based on extinction and coalescent events. Though theoretically rigorous, this model, which we here refer to as the continuum model, has not yet been implemented for demographic inference. To this end, here we introduce and demonstrate a statistical pipeline that couples the coalescent simulator of Kelleher et al. (2014), which simulates genealogies under the continuum model, with an approximate Bayesian computation (ABC) framework for parameter estimation of neighborhood size (that is, the number of locally breeding individuals) and dispersal ability (that is, the distance an offspring can travel within a generation). Using empirically informed simulations and simulation-based ABC cross-validation, we first show that neighborhood size can be accurately estimated. We then apply our pipeline to the South African endemic shrub species Berkheya cuneata to use the resulting estimates of dispersal ability and neighborhood size to infer the average population density of the species. More generally, we show that spatially explicit coalescent models can be successfully integrated into model-based demographic inference.
Affiliation(s)
- T A Joseph
- Biology Department, The City College of New York, City University of New York, New York, NY, USA
- M J Hickerson
- Biology Department, The City College of New York, City University of New York, New York, NY, USA
- Program in Ecology, Evolutionary Biology, & Behavior, The Graduate Center, City University of New York (CUNY), New York, NY, USA
- Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
- D F Alvarado-Serrano
- Biology Department, The City College of New York, City University of New York, New York, NY, USA