1
Hobolth A, Rivas-González I, Bladt M, Futschik A. Phase-type distributions in mathematical population genetics: An emerging framework. Theor Popul Biol 2024; 157:14-32. [PMID: 38460602 DOI: 10.1016/j.tpb.2024.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 02/29/2024] [Accepted: 03/04/2024] [Indexed: 03/11/2024]
Abstract
A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the 'phases' in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this review is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference. In particular, we show the relation between classical first-step analysis of coalescent models and phase-type calculations. We also show how reward transformations in phase-type theory lead to easy calculation of covariances and correlation coefficients between e.g. tree height, tree length, external branch length, and internal branch length. Furthermore, we discuss how these quantities can be used for statistical inference based on estimating equations. Providing an alternative to previous work based on the Laplace transform, we derive likelihoods for small-size coalescent trees based on phase-type theory. Overall, our main aim is to demonstrate that phase-type distributions provide a convenient general set of tools to understand aspects of coalescent models that are otherwise difficult to derive. 
Throughout the review, we emphasize the versatility of the phase-type framework, which is also illustrated by our accompanying R-code. All our analyses and figures can be reproduced from code available on GitHub.
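As a small illustration of the matrix calculations the abstract refers to, the following sketch computes the mean time to the most recent common ancestor for the standard coalescent as the mean absorption time of a phase-type distribution, using the standard identity E[T] = alpha (-S)^(-1) 1. It is not taken from the authors' R code: the function names are hypothetical, time is assumed to be measured in units where each pair of lineages coalesces at rate 1, and the states are assumed to be the lineage counts n, n-1, ..., 2.

```python
from fractions import Fraction


def kingman_subintensity(n):
    """Sub-intensity matrix S over transient states with n, n-1, ..., 2 lineages.

    Assumption: each pair of lineages coalesces at rate 1, so the total
    rate out of the state with k lineages is k(k-1)/2.
    """
    size = n - 1
    S = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        k = n - i                      # number of lineages in state i
        rate = Fraction(k * (k - 1), 2)
        S[i][i] = -rate
        if i + 1 < size:
            S[i][i + 1] = rate         # move to the state with k-1 lineages
    return S


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    m = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]


def mean_absorption_time(alpha, S):
    """Mean of a phase-type distribution: alpha * (-S)^(-1) * 1."""
    neg_S = [[-x for x in row] for row in S]
    expected_from_state = solve(neg_S, [Fraction(1)] * len(S))
    return sum(a * t for a, t in zip(alpha, expected_from_state))


n = 4
S = kingman_subintensity(n)
alpha = [Fraction(1)] + [Fraction(0)] * (n - 2)   # start with all n lineages
mean_tmrca = mean_absorption_time(alpha, S)
print(mean_tmrca)  # 3/2
```

For n = 4 the result agrees with the classical formula 2(1 - 1/n) for the mean tree height. Exact fractions are used here only to keep the sketch dependency-free; a numerical linear algebra library would be the more usual choice for larger state spaces.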
Affiliation(s)
- Asger Hobolth
- Department of Mathematics, Aarhus University, Denmark.
- Mogens Bladt
- Department of Mathematical Sciences, University of Copenhagen, Denmark.
- Andreas Futschik
- Institute of Applied Statistics, Johannes Kepler University, Austria.
2
Cotter DJ, Severson AL, Kang JTL, Godrej HN, Carmi S, Rosenberg NA. Modeling the effects of consanguinity on autosomal and X-chromosomal runs of homozygosity and identity-by-descent sharing. G3 (Bethesda, Md.) 2024; 14:jkad264. [PMID: 37972246 PMCID: PMC10849319 DOI: 10.1093/g3journal/jkad264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 11/01/2023] [Accepted: 11/08/2023] [Indexed: 11/19/2023]
Abstract
Runs of homozygosity (ROH) and identity-by-descent (IBD) sharing can be studied in diploid coalescent models by noting that ROH and IBD-sharing at a genomic site are predicted to be inversely related to coalescence times, which in turn can be mathematically obtained in terms of parameters describing consanguinity rates. Comparing autosomal and X-chromosomal coalescent models, we consider ROH and IBD-sharing in relation to consanguinity that proceeds via multiple forms of first-cousin mating. We predict that across populations with different levels of consanguinity, (1) in a manner that is qualitatively parallel to the increase of autosomal IBD-sharing with autosomal ROH, X-chromosomal IBD-sharing increases with X-chromosomal ROH, owing to the dependence of both quantities on consanguinity levels; (2) even in the absence of consanguinity, X-chromosomal ROH and IBD-sharing levels exceed corresponding values for the autosomes, owing to the smaller population size and lower coalescence time for the X chromosome than for autosomes; (3) with matrilateral consanguinity, the relative increase in ROH and IBD-sharing on the X chromosome compared to the autosomes is greater than in the absence of consanguinity. Examining genome-wide SNPs in human populations for which consanguinity levels have been estimated, we find that autosomal and X-chromosomal ROH and IBD-sharing levels generally accord with the predictions. We find that each 1% increase in autosomal ROH is associated with an increase of 2.1% in X-chromosomal ROH, and each 1% increase in autosomal IBD-sharing is associated with an increase of 1.6% in X-chromosomal IBD-sharing. For each calculation, particularly for ROH, the estimate is reasonably close to the increase of 2% predicted by the population-size difference between autosomes and X chromosomes. The results support the utility of coalescent models for understanding patterns of genomic sharing and their dependence on sex-biased processes.
Affiliation(s)
- Daniel J Cotter
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Alissa L Severson
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Jonathan T L Kang
- School of Math and Science, Singapore Polytechnic, 139651, Singapore
- Hormazd N Godrej
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Shai Carmi
- Braun School of Public Health and Community Medicine, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
3
Mooney JA, Agranat-Tamir L, Pritchard JK, Rosenberg NA. On the number of genealogical ancestors tracing to the source groups of an admixed population. Genetics 2023; 224:iyad079. [PMID: 37410594 PMCID: PMC10324943 DOI: 10.1093/genetics/iyad079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 04/05/2023] [Indexed: 07/08/2023]
Abstract
Members of genetically admixed populations possess ancestry from multiple source groups, and studies of human genetic admixture frequently estimate ancestry components corresponding to fractions of individual genomes that trace to specific ancestral populations. However, the same numerical ancestry fraction can represent a wide array of admixture scenarios within an individual's genealogy. Using a mechanistic model of admixture, we consider admixture genealogically: how many ancestors from the source populations does the admixture represent? We consider African-Americans, for whom continent-level estimates produce a 75-85% value for African ancestry on average and 15-25% for European ancestry. Genetic studies together with key features of African-American demographic history suggest ranges for parameters of a simple three-epoch model. Considering parameter sets compatible with estimates of current ancestry levels, we infer that if all genealogical lines of a random African-American born during 1960-1965 are traced back until they reach members of source populations, the mean over parameter sets of the expected number of genealogical lines terminating with African individuals is 314 (interquartile range 240-376), and the mean of the expected number terminating in Europeans is 51 (interquartile range 32-69). Across discrete generations, the peak number of African genealogical ancestors occurs in birth cohorts from the early 1700s, and the probability exceeds 50% that at least one European ancestor was born more recently than 1835. Our genealogical perspective can contribute to further understanding the admixture processes that underlie admixed populations. For African-Americans, the results provide insight both on how many of the ancestors of a typical African-American might have been forcibly displaced in the Transatlantic Slave Trade and on how many separate European admixture events might exist in a typical African-American genealogy.
Affiliation(s)
- Jazlyn A Mooney
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Jonathan K Pritchard
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
4
Cotter DJ, Severson AL, Carmi S, Rosenberg NA. Limiting distribution of X-chromosomal coalescence times under first-cousin consanguineous mating. Theor Popul Biol 2022; 147:1-15. [PMID: 35973448 PMCID: PMC9867987 DOI: 10.1016/j.tpb.2022.07.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/25/2022] [Revised: 07/21/2022] [Accepted: 07/22/2022] [Indexed: 01/26/2023]
Abstract
By providing additional opportunities for coalescence within families, the presence of consanguineous unions in a population reduces coalescence times relative to non-consanguineous populations. First-cousin consanguinity can take one of six forms differing in the configuration of sexes in the pedigree of the male and female cousins who join in a consanguineous union: patrilateral parallel, patrilateral cross, matrilateral parallel, matrilateral cross, bilateral parallel, and bilateral cross. Considering populations with each of the six types of first-cousin consanguinity individually and a population with a mixture of the four unilateral types, we examine coalescent models of consanguinity. We previously computed, for first-cousin consanguinity models, the mean coalescence time for X-chromosomal loci and the limiting distribution of coalescence times for autosomal loci. Here, we use the separation-of-time-scales approach to obtain the limiting distribution of coalescence times for X-chromosomal loci. This limiting distribution has an instantaneous coalescence probability that depends on the probability that a union is consanguineous; lineages that do not coalesce instantaneously coalesce according to an exponential distribution. We study the effects on the coalescence time distribution of the type of first-cousin consanguinity, showing that patrilateral-parallel and patrilateral-cross consanguinity have no effect on X-chromosomal coalescence time distributions and that matrilateral-parallel consanguinity decreases coalescence times to a greater extent than does matrilateral-cross consanguinity.
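The shape of the limiting distribution described in this abstract, a point mass at zero combined with an exponential component, can be written generically as follows. The symbols p_0 (instantaneous coalescence probability) and \mu (mean of the exponential component) are placeholders introduced here for illustration; they are not the paper's notation, and the paper's expressions for them in terms of the consanguinity probability are not reproduced.

```latex
% Generic mixture: mass p_0 at t = 0, otherwise exponential with mean \mu.
% p_0 and \mu are illustrative symbols, not taken from the paper.
\[
  F(t) \;=\; \Pr(T \le t) \;=\; p_0 + (1 - p_0)\bigl(1 - e^{-t/\mu}\bigr),
  \qquad t \ge 0,
\]
\[
  \mathbb{E}[T] \;=\; p_0 \cdot 0 + (1 - p_0)\,\mu \;=\; (1 - p_0)\,\mu .
\]
```

Under this form, any increase in p_0 lowers the mean coalescence time in proportion, which matches the abstract's statement that consanguinity reduces coalescence times.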
Affiliation(s)
- Daniel J Cotter
- Department of Genetics, Stanford University, Stanford, CA 94305, USA.
- Alissa L Severson
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
5
Arciero E, Dogra SA, Malawsky DS, Mezzavilla M, Tsismentzoglou T, Huang QQ, Hunt KA, Mason D, Sharif SM, van Heel DA, Sheridan E, Wright J, Small N, Carmi S, Iles MM, Martin HC. Fine-scale population structure and demographic history of British Pakistanis. Nat Commun 2021; 12:7189. [PMID: 34893604 PMCID: PMC8664933 DOI: 10.1038/s41467-021-27394-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/05/2021] [Accepted: 11/09/2021] [Indexed: 02/08/2023]
Abstract
Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genotype chip data from 2,200 British Pakistanis. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low recent effective population sizes (Ne), with some showing a decrease 15‒20 generations ago that has resulted in extensive identity-by-descent sharing and homozygosity, increasing the risk of recessive disorders. Our results from two orthogonal methods (one using machine learning and the other coalescent-based) suggest that the detailed reporting of parental relatedness for mothers in the cohort under-represents the true levels of consanguinity. These results demonstrate the impact of cultural practices on population structure and genomic diversity in Pakistanis, and have important implications for medical genetic studies.
Affiliation(s)
- Elena Arciero
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
- Sufyan A Dogra
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Theofanis Tsismentzoglou
- Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
- Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Qin Qin Huang
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Karen A Hunt
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Dan Mason
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Saghira Malik Sharif
- Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- David A van Heel
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Eamonn Sheridan
- Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- John Wright
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Neil Small
- Faculty of Health Studies, University of Bradford, Richmond Road, Bradford, UK
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Mark M Iles
- Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
- Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Hilary C Martin
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
6
Cotter DJ, Severson AL, Rosenberg NA. The effect of consanguinity on coalescence times on the X chromosome. Theor Popul Biol 2021; 140:32-43. [PMID: 33901539 DOI: 10.1016/j.tpb.2021.03.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/03/2021] [Revised: 03/22/2021] [Accepted: 03/26/2021] [Indexed: 10/21/2022]
Abstract
Consanguineous unions increase the frequency at which identical genomic segments are inherited along separate paths of descent, decreasing coalescence times for pairs of alleles drawn from an individual who is the offspring of a consanguineous pair. For an autosomal locus, it has recently been shown that the mean time to the most recent common ancestor (TMRCA) for two alleles in the same individual and the mean TMRCA for two alleles in two separate individuals both decrease with increasing consanguinity in a population. Here, we extend this analysis to the X chromosome, considering X-chromosomal coalescence times under a coalescent model with diploid, male-female mating pairs. We examine four possible first-cousin mating schemes that are equivalent in their effects on autosomes, but that have differing effects on the X chromosome: patrilateral-parallel, patrilateral-cross, matrilateral-parallel, and matrilateral-cross. In each mating model, we calculate mean TMRCA for X-chromosomal alleles sampled either within or between individuals. We describe a consanguinity effect on X-chromosomal TMRCA that differs from the autosomal pattern under matrilateral but not under patrilateral first-cousin mating. For matrilateral first cousins, the effect of consanguinity in reducing TMRCA is stronger on the X chromosome than on the autosomes, with an increased effect of parallel-cousin mating compared to cross-cousin mating. The theoretical computations support the utility of the model in understanding patterns of genomic sharing on the X chromosome.
Affiliation(s)
- Daniel J Cotter
- Department of Genetics, Stanford University, Stanford, CA 94305, USA.
- Alissa L Severson
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
7
Sahoo SA, Zaidi RA, Anagol S, Mathieson I. Long Runs of Homozygosity Are Correlated with Marriage Preferences across Global Population Samples. Hum Biol 2021; 93:201-216. [PMID: 37701498 PMCID: PMC10497073 DOI: 10.1353/hub.2021.0011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Children of consanguineous unions carry long runs of homozygosity (ROH) in their genomes, due to their parents' recent shared ancestry. This increases the burden of recessive disease in populations with high levels of consanguinity and has been heavily studied in some groups. However, there has been little investigation of the broader effect of consanguinity on patterns of genetic variation on a global scale. This study, which collected published genetic data and information about marriage practice from 395 worldwide populations, shows that reported preference for cousin marriage has a detectable association with the distribution of long ROH in this sample, increasing the expected number of ROH longer than 10 cM by a factor of 2.2. Variation in marriage practice and consequent rates of consanguinity are therefore an important aspect of demographic history for the purposes of modeling human genetic variation. However, reported marriage practices explain a relatively small proportion of the variation in ROH distribution, and consequently, population genetic data are only partially informative about cultural preferences.
Affiliation(s)
- Samali Anova Sahoo
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arslan A. Zaidi
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Santosh Anagol
- Business Economics and Public Policy, Wharton School of Business, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA