1
|
Diamantidis D, Fan WTL, Birkner M, Wakeley J. Bursts of coalescence within population pedigrees whenever big families occur. Genetics 2024; 227:iyae030. [PMID: 38408329 DOI: 10.1093/genetics/iyae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Revised: 01/23/2024] [Accepted: 02/18/2024] [Indexed: 02/28/2024] Open
Abstract
We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright-Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.
Collapse
Affiliation(s)
| | - Wai-Tong Louis Fan
- Department of Mathematics, Indiana University, Bloomington, IN 47405, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Matthias Birkner
- Institut für Mathematik, Johannes-Gutenberg-Universität, 55099 Mainz, Germany
| | - John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| |
Collapse
|
2
|
Mooney JA, Agranat-Tamir L, Pritchard JK, Rosenberg NA. On the number of genealogical ancestors tracing to the source groups of an admixed population. Genetics 2023; 224:iyad079. [PMID: 37410594 PMCID: PMC10324943 DOI: 10.1093/genetics/iyad079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2023] [Accepted: 04/05/2023] [Indexed: 07/08/2023] Open
Abstract
Members of genetically admixed populations possess ancestry from multiple source groups, and studies of human genetic admixture frequently estimate ancestry components corresponding to fractions of individual genomes that trace to specific ancestral populations. However, the same numerical ancestry fraction can represent a wide array of admixture scenarios within an individual's genealogy. Using a mechanistic model of admixture, we consider admixture genealogically: how many ancestors from the source populations does the admixture represent? We consider African-Americans, for whom continent-level estimates produce a 75-85% value for African ancestry on average and 15-25% for European ancestry. Genetic studies together with key features of African-American demographic history suggest ranges for parameters of a simple three-epoch model. Considering parameter sets compatible with estimates of current ancestry levels, we infer that if all genealogical lines of a random African-American born during 1960-1965 are traced back until they reach members of source populations, the mean over parameter sets of the expected number of genealogical lines terminating with African individuals is 314 (interquartile range 240-376), and the mean of the expected number terminating in Europeans is 51 (interquartile range 32-69). Across discrete generations, the peak number of African genealogical ancestors occurs in birth cohorts from the early 1700s, and the probability exceeds 50% that at least one European ancestor was born more recently than 1835. Our genealogical perspective can contribute to further understanding the admixture processes that underlie admixed populations. For African-Americans, the results provide insight both on how many of the ancestors of a typical African-American might have been forcibly displaced in the Transatlantic Slave Trade and on how many separate European admixture events might exist in a typical African-American genealogy.
Collapse
Affiliation(s)
- Jazlyn A Mooney
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | | | - Jonathan K Pritchard
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
| |
Collapse
|
3
|
Pedigree in the biparental Moran model. J Math Biol 2022; 84:51. [DOI: 10.1007/s00285-022-01752-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2021] [Revised: 12/07/2021] [Accepted: 04/05/2022] [Indexed: 11/25/2022]
|
4
|
Severson AL, Carmi S, Rosenberg NA. Variance and limiting distribution of coalescence times in a diploid model of a consanguineous population. Theor Popul Biol 2021; 139:50-65. [PMID: 33675872 DOI: 10.1016/j.tpb.2021.02.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2021] [Accepted: 02/14/2021] [Indexed: 10/22/2022]
Abstract
Recent modeling studies interested in runs of homozygosity (ROH) and identity by descent (IBD) have sought to connect these properties of genomic sharing to pairwise coalescence times. Here, we examine a variety of features of pairwise coalescence times in models that consider consanguinity. In particular, we extend a recent diploid analysis of mean coalescence times for lineage pairs within and between individuals in a consanguineous population to derive the variance of coalescence times, studying its dependence on the frequency of consanguinity and the kinship coefficient of consanguineous relationships. We also introduce a separation-of-time-scales approach that treats consanguinity models analogously to mathematically similar phenomena such as partial selfing, using this approach to obtain coalescence-time distributions. This approach shows that the consanguinity model behaves similarly to a standard coalescent, scaling population size by a factor 1-3c, where c represents the kinship coefficient of a randomly chosen mating pair. It provides the explanation for an earlier result describing mean coalescence time in the consanguinity model in terms of c. The results extend the potential to make predictions about ROH and IBD in relation to demographic parameters of diploid populations.
Collapse
Affiliation(s)
- Alissa L Severson
- Department of Genetics, Stanford University, Stanford, CA 94305, USA.
| | - Shai Carmi
- Braun School of Public Health and Community Medicine, Hebrew University of Jerusalem, Ein Kerem, 9112102, Israel
| | - Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
| |
Collapse
|
5
|
Wakeley J. Developments in coalescent theory from single loci to chromosomes. Theor Popul Biol 2020; 133:56-64. [DOI: 10.1016/j.tpb.2020.02.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2019] [Revised: 02/19/2020] [Accepted: 02/26/2020] [Indexed: 10/24/2022]
|
6
|
Nelson D, Kelleher J, Ragsdale AP, Moreau C, McVean G, Gravel S. Accounting for long-range correlations in genome-wide simulations of large cohorts. PLoS Genet 2020; 16:e1008619. [PMID: 32369493 PMCID: PMC7266353 DOI: 10.1371/journal.pgen.1008619] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2019] [Revised: 06/02/2020] [Accepted: 01/21/2020] [Indexed: 11/20/2022] Open
Abstract
Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency can be maintained via a hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past. Coalescent theory has provided deep theoretical insight into patterns of human diversity. Implementations of coalescent models in simulation software such as ms have further provided tools to interpret thousands of genomic studies. Recent technical progress has allowed for a dramatic increase in the scale at which genomes can be both measured and simulated, opening up opportunities for a finer understanding of evolutionary biology. However, we show that coalescent simulations of long regions of the genome exhibit large biases in sample relatedness, distorting haplotype sharing and ancestry patterns in simulated cohorts. We trace these biases to basic assumptions of the coalescent model, and show how the assumptions can be relaxed to provide a better description of the observed patterns of genetic polymorphism at a fraction of the computational cost.
Collapse
Affiliation(s)
- Dominic Nelson
- McGill University and Genome Québec Innovation Centre, McGill University, Montréal, Québec, Canada
| | - Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
| | - Aaron P. Ragsdale
- McGill University and Genome Québec Innovation Centre, McGill University, Montréal, Québec, Canada
| | - Claudia Moreau
- Centre Intersectoriel en Santé Durable, Université du Québec à Chicoutimi, Saguenay, Québec, Canada
| | - Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
| | - Simon Gravel
- McGill University and Genome Québec Innovation Centre, McGill University, Montréal, Québec, Canada
- * E-mail:
| |
Collapse
|
7
|
The Effect of Consanguinity on Between-Individual Identity-by-Descent Sharing. Genetics 2019; 212:305-316. [PMID: 30926583 PMCID: PMC6499533 DOI: 10.1534/genetics.119.302136] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2018] [Accepted: 03/22/2019] [Indexed: 11/18/2022] Open
Abstract
Consanguineous unions increase the rate at which identical genomic segments are paired within individuals to produce runs of homozygosity (ROH). The extent to which such unions affect identity-by-descent (IBD) genomic sharing between rather than within individuals in a population, however, is not immediately evident from within-individual ROH levels. Using the fact that the time to the most recent common ancestor [Formula: see text] for a pair of genomes at a specific locus is inversely related to the extent of IBD sharing between the genomes in the neighborhood of the locus, we study IBD sharing for a pair of genomes sampled either within the same individual or in different individuals. We develop a coalescent model for a set of mating pairs in a diploid population, treating the fraction of consanguineous unions as a parameter. Considering mating models that include unions between sibs, first cousins, and nth cousins, we determine the effect of the consanguinity rate on the mean [Formula: see text] for pairs of lineages sampled either within the same individual or in different individuals. The results indicate that consanguinity not only increases ROH sharing between the two genomes within an individual, it also increases IBD sharing between individuals in the population, the magnitude of the effect increasing with the kinship coefficient of the type of consanguineous union. Considering computations of ROH and between-individual IBD in Jewish populations whose consanguinity rates have been estimated from demographic data, we find that, in accord with the theoretical results, increases in consanguinity and ROH levels inflate levels of IBD sharing between individuals in a population. The results contribute more generally to the interpretation of runs of homozygosity, IBD sharing between individuals, and the relationship between ROH and IBD.
Collapse
|
8
|
Shchur V, Nielsen R. On the number of siblings and p-th cousins in a large population sample. J Math Biol 2018; 77:1279-1298. [PMID: 29876645 DOI: 10.1007/s00285-018-1252-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2017] [Revised: 05/03/2018] [Indexed: 11/28/2022]
Abstract
The number of individuals in a random sample with close relatives in the sample is a quantity of interest when designing Genome Wide Association Studies and other cohort based genetic, and non-genetic, studies. In this paper, we develop expressions for the distribution and expectation of the number of p-th cousins in a sample from a population of size N under two diploid Wright-Fisher models. We also develop simple asymptotic expressions for large values of N. For example, the expected proportion of individuals with at least one p-th cousin in a sample of K individuals, for a diploid dioecious Wright-Fisher model, is approximately [Formula: see text]. Our results show that a substantial fraction of individuals in the sample will have at least a second cousin if the sampling fraction (K / N) is on the order of [Formula: see text]. This confirms that, for large cohort samples, relatedness among individuals cannot easily be ignored.
Collapse
Affiliation(s)
- Vladimir Shchur
- Departments of Integrative Biology and Statistics, University of California Berkeley, 4098 Valley Life Sciences Building (VLSB), Berkeley, CA, 94720-3140, USA.
| | - Rasmus Nielsen
- Departments of Integrative Biology and Statistics, University of California Berkeley, 4098 Valley Life Sciences Building (VLSB), Berkeley, CA, 94720-3140, USA.,Museum of Natural History, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark
| |
Collapse
|