1. Forien R, Ringbauer H, Coop G. Demographic inference for spatially heterogeneous populations using long shared haplotypes. Theor Popul Biol 2024; 159:108-124. [PMID: 38492811 DOI: 10.1016/j.tpb.2024.03.002] [Received: 06/09/2023] [Revised: 03/04/2024] [Accepted: 03/12/2024]
Abstract
We introduce a modified spatial Λ-Fleming-Viot process to model the ancestry of individuals in a population occupying a continuous spatial habitat divided into two areas by a sharp discontinuity of the dispersal rate and effective population density. We derive an analytical formula for the expected number of shared haplotype segments between two individuals depending on their sampling locations. This formula involves the transition density of a skew diffusion which appears as a scaling limit of the ancestral lineages of individuals in this model. We then show that this formula can be used to infer the dispersal parameters and the effective population density of both regions, using a composite likelihood approach, and we demonstrate the efficiency of this method on a range of simulated data sets.
Affiliation(s)
- Raphaël Forien
- INRAE - BioSP, Centre INRAE PACA, 228 route de l'aérodrome, Domaine St-Paul - Site Agroparc, 84914, Avignon Cedex 9, France.
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany.
- Graham Coop
- Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, 2320 Storer Hall, Davis, CA 95616, United States.

2. Cotter DJ, Severson AL, Kang JTL, Godrej HN, Carmi S, Rosenberg NA. Modeling the effects of consanguinity on autosomal and X-chromosomal runs of homozygosity and identity-by-descent sharing. G3 (Bethesda) 2024; 14:jkad264. [PMID: 37972246 PMCID: PMC10849319 DOI: 10.1093/g3journal/jkad264] [Received: 07/26/2023] [Revised: 11/01/2023] [Accepted: 11/08/2023]
Abstract
Runs of homozygosity (ROH) and identity-by-descent (IBD) sharing can be studied in diploid coalescent models by noting that ROH and IBD-sharing at a genomic site are predicted to be inversely related to coalescence times, which in turn can be mathematically obtained in terms of parameters describing consanguinity rates. Comparing autosomal and X-chromosomal coalescent models, we consider ROH and IBD-sharing in relation to consanguinity that proceeds via multiple forms of first-cousin mating. We predict that across populations with different levels of consanguinity, (1) in a manner that is qualitatively parallel to the increase of autosomal IBD-sharing with autosomal ROH, X-chromosomal IBD-sharing increases with X-chromosomal ROH, owing to the dependence of both quantities on consanguinity levels; (2) even in the absence of consanguinity, X-chromosomal ROH and IBD-sharing levels exceed corresponding values for the autosomes, owing to the smaller population size and lower coalescence time for the X chromosome than for autosomes; (3) with matrilateral consanguinity, the relative increase in ROH and IBD-sharing on the X chromosome compared to the autosomes is greater than in the absence of consanguinity. Examining genome-wide SNPs in human populations for which consanguinity levels have been estimated, we find that autosomal and X-chromosomal ROH and IBD-sharing levels generally accord with the predictions. We find that each 1% increase in autosomal ROH is associated with an increase of 1.51% in X-chromosomal ROH, and each 1% increase in autosomal IBD-sharing is associated with an increase of 1.09% in X-chromosomal IBD-sharing. For each calculation, particularly for ROH, the estimate is reasonably close to the increase of 4/3 predicted by the population-size difference between autosomes and X chromosomes. The results support the utility of coalescent models for understanding patterns of genomic sharing and their dependence on sex-biased processes.
Affiliation(s)
- Daniel J Cotter
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Alissa L Severson
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Jonathan T L Kang
- School of Math and Science, Singapore Polytechnic, 139651, Singapore
- Hormazd N Godrej
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Shai Carmi
- Braun School of Public Health and Community Medicine, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA

3. Ki C, Terhorst J. Exact Decoding of a Sequentially Markov Coalescent Model in Genetics. J Am Stat Assoc 2023; 119:2242-2255. [PMID: 39323740 PMCID: PMC11421421 DOI: 10.1080/01621459.2023.2252570] [Received: 10/06/2020] [Revised: 08/01/2023] [Accepted: 08/17/2023]
Abstract
In statistical genetics, the sequentially Markov coalescent (SMC) is an important family of models for approximating the distribution of genetic variation data under complex evolutionary models. Methods based on SMC are widely used in genetics and evolutionary biology, with significant applications to genotype phasing and imputation, recombination rate estimation, and inferring population history. SMC allows for likelihood-based inference using hidden Markov models (HMMs), where the latent variable represents a genealogy. Because genealogies are continuous while the hidden states of a standard HMM are discrete, SMC requires discretizing the space of trees in a way that is awkward and introduces bias. In this work, we propose a method that circumvents this requirement, enabling SMC-based inference to be performed in the natural setting of a continuous state space. We derive fast, exact procedures for frequentist and Bayesian inference using SMC. Compared to existing methods, ours requires minimal user intervention or parameter tuning, no numerical optimization or EM, and is faster and more accurate.
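For readers unfamiliar with the discretized approach this work avoids, the sketch below shows the standard scaled forward algorithm for a discrete HMM, where each hidden state stands for one interval of pairwise coalescence time. It is a minimal illustration under stated assumptions: the three states, the transition matrix `TRANS`, and the emission matrix `EMIT` are hypothetical numbers, not the SMC or PSMC transition kernel, and none of this is the authors' continuous-state method. A brute-force sum over all hidden paths is included only to check the result on a tiny input.

```python
import itertools
import math

# Hypothetical 3-state example: each hidden state is one discretized TMRCA
# interval (recent, intermediate, ancient). These values are made up for
# illustration and are NOT the SMC/PSMC transition kernel.
INIT = [0.5, 0.3, 0.2]
TRANS = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
]
# Observation 0 = homozygous site, 1 = heterozygous site; older intervals
# are given a higher probability of heterozygosity.
EMIT = [
    [0.99, 0.01],
    [0.95, 0.05],
    [0.80, 0.20],
]


def forward_loglik(obs, init, trans, emit):
    """Scaled forward algorithm: returns log P(obs) for a discrete HMM."""
    n_states = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n_states)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][o]
            for j in range(n_states)
        ]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def brute_force_loglik(obs, init, trans, emit):
    """Sum over every hidden path; only feasible for tiny inputs."""
    n_states = len(init)
    total = 0.0
    for path in itertools.product(range(n_states), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return math.log(total)


OBS = [0, 0, 1, 0, 1, 1, 0]
print(forward_loglik(OBS, INIT, TRANS, EMIT))
```

The cost of this recursion grows with the number of discrete time states, which is the dependence that continuous-state approaches are designed to remove.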
Affiliation(s)
- Caleb Ki
- Department of Statistics, University of Michigan

4. Schweiger R, Durbin R. Ultrafast genome-wide inference of pairwise coalescence times. Genome Res 2023; 33:1023-1031. [PMID: 37562965 PMCID: PMC10538485 DOI: 10.1101/gr.277665.123] [Received: 01/06/2023] [Accepted: 06/21/2023]
Abstract
The pairwise sequentially Markovian coalescent (PSMC) algorithm and its extensions infer the coalescence time of two homologous chromosomes at each genomic position. This inference is used in reconstructing demographic histories, detecting selection signatures, studying genome-wide associations, constructing ancestral recombination graphs, and more. Inference of coalescence times between each pair of haplotypes in a large data set is of great interest, as they may provide rich information about the population structure and history of the sample. Here, we introduce a new method, Gamma-SMC, which is more than 10 times faster than current methods. To obtain this speed-up, we represent the posterior coalescence time distributions succinctly as a gamma distribution with just two parameters; in contrast, PSMC and its extensions hold these in a vector over discrete intervals of time. Thus, Gamma-SMC has constant time-complexity per site, without dependence on the number of discrete time states. Additionally, because of this continuous representation, our method is able to infer times spanning many orders of magnitude and, as such, is robust to parameter misspecification. We describe how this approach works, show its performance on simulated and real data, and illustrate its use in studying recent positive selection in the 1000 Genomes Project data set.
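The two-parameter gamma representation works because, under a Poisson mutation model, conditioning a gamma belief about the coalescence time on one site gives another gamma distribution. The sketch below shows only that emission step, assuming a scaled per-site mutation rate `theta` (our notation) so that P(hom | t) = exp(-theta t) and P(het | t) is proportional to t exp(-theta t). The transition step between sites, which Gamma-SMC also needs, is not sketched, and `GammaBelief` and `emission_update` are hypothetical names, not the tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief over the pairwise coalescence time t."""

    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate


def emission_update(belief: GammaBelief, is_het: bool, theta: float) -> GammaBelief:
    """Condition the gamma belief on one observed site.

    Prior density is proportional to t^(shape-1) * exp(-rate * t).
    Homozygous site: multiply by exp(-theta t)      -> Gamma(shape, rate + theta).
    Heterozygous site: multiply by t * exp(-theta t) -> Gamma(shape + 1, rate + theta).
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    if is_het:
        return GammaBelief(belief.shape + 1.0, belief.rate + theta)
    return GammaBelief(belief.shape, belief.rate + theta)


# Usage: start from the standard-coalescent prior Exp(1) = Gamma(1, 1),
# then observe a homozygous site followed by a heterozygous site.
belief = GammaBelief(1.0, 1.0)
for site_is_het in (False, True):
    belief = emission_update(belief, site_is_het, theta=0.5)
print(belief, belief.mean)
```

Because each update only adds to two numbers, the per-site cost is constant, independent of any time discretization.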
Affiliation(s)
- Regev Schweiger
- Department of Genetics, University of Cambridge, Cambridge CB2 1TN, United Kingdom
- Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge CB2 1TN, United Kingdom

5. Forien R, Ringbauer H, Coop G. Demographic inference for spatially heterogeneous populations using long shared haplotypes. bioRxiv [Preprint] 2023:2023.06.13.544589. [PMID: 37398501 PMCID: PMC10312651 DOI: 10.1101/2023.06.13.544589]
Abstract
We introduce a modified spatial Λ-Fleming-Viot process to model the ancestry of individuals in a population occupying a continuous spatial habitat divided into two areas by a sharp discontinuity of the dispersal rate and effective population density. We derive an analytical formula for the expected number of shared haplotype segments between two individuals depending on their sampling locations. This formula involves the transition density of a skew diffusion which appears as a scaling limit of the ancestral lineages of individuals in this model. We then show that this formula can be used to infer the dispersal parameters and the effective population density of both regions, using a composite likelihood approach, and we demonstrate the efficiency of this method on a range of simulated data sets.
Affiliation(s)
- Raphaël Forien
- INRAE - BioSP, Centre INRAE PACA, 228 route de l’aérodrome, Domaine St-Paul - Site Agroparc, 84914, Avignon Cedex 9, France
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany
- Graham Coop
- Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, 2320 Storer Hall, Davis, CA 95616, United States

6. Genome-wide data from medieval German Jews show that the Ashkenazi founder event pre-dated the 14th century. Cell 2022; 185:4703-4716.e16. [PMID: 36455558 PMCID: PMC9793425 DOI: 10.1016/j.cell.2022.11.002] [Received: 05/08/2022] [Revised: 08/26/2022] [Accepted: 11/01/2022]
Abstract
We report genome-wide data from 33 Ashkenazi Jews (AJ), dated to the 14th century, obtained following a salvage excavation at the medieval Jewish cemetery of Erfurt, Germany. The Erfurt individuals are genetically similar to modern AJ, but they show more variability in Eastern European-related ancestry than modern AJ. A third of the Erfurt individuals carried a mitochondrial lineage common in modern AJ and eight carried pathogenic variants known to affect AJ today. These observations, together with high levels of runs of homozygosity, suggest that the Erfurt community had already experienced the major reduction in size that affected modern AJ. The Erfurt bottleneck was more severe, implying substructure in medieval AJ. Overall, our results suggest that the AJ founder event and the acquisition of the main sources of ancestry pre-dated the 14th century and highlight late medieval genetic heterogeneity no longer present in modern AJ.

7. Cotter DJ, Severson AL, Carmi S, Rosenberg NA. Limiting distribution of X-chromosomal coalescence times under first-cousin consanguineous mating. Theor Popul Biol 2022; 147:1-15. [PMID: 35973448 PMCID: PMC9867987 DOI: 10.1016/j.tpb.2022.07.002] [Received: 04/25/2022] [Revised: 07/21/2022] [Accepted: 07/22/2022]
Abstract
By providing additional opportunities for coalescence within families, the presence of consanguineous unions in a population reduces coalescence times relative to non-consanguineous populations. First-cousin consanguinity can take one of six forms differing in the configuration of sexes in the pedigree of the male and female cousins who join in a consanguineous union: patrilateral parallel, patrilateral cross, matrilateral parallel, matrilateral cross, bilateral parallel, and bilateral cross. Considering populations with each of the six types of first-cousin consanguinity individually and a population with a mixture of the four unilateral types, we examine coalescent models of consanguinity. We previously computed, for first-cousin consanguinity models, the mean coalescence time for X-chromosomal loci and the limiting distribution of coalescence times for autosomal loci. Here, we use the separation-of-time-scales approach to obtain the limiting distribution of coalescence times for X-chromosomal loci. This limiting distribution has an instantaneous coalescence probability that depends on the probability that a union is consanguineous; lineages that do not coalesce instantaneously coalesce according to an exponential distribution. We study the effects on the coalescence time distribution of the type of first-cousin consanguinity, showing that patrilateral-parallel and patrilateral-cross consanguinity have no effect on X-chromosomal coalescence time distributions and that matrilateral-parallel consanguinity decreases coalescence times to a greater extent than does matrilateral-cross consanguinity.
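The limiting distribution described above has a simple shape: an atom at zero (instantaneous coalescence) plus an exponential tail. The sketch below encodes that shape, with `p_instant` and `rate` as hypothetical free parameters; in the paper both depend on the consanguinity model and the type of first-cousin union, and those expressions are not reproduced here.

```python
import math
import random


def tmrca_cdf(t: float, p_instant: float, rate: float) -> float:
    """P(T <= t): point mass p_instant at 0 plus an Exp(rate) tail."""
    if t < 0:
        return 0.0
    return p_instant + (1.0 - p_instant) * (1.0 - math.exp(-rate * t))


def tmrca_mean(p_instant: float, rate: float) -> float:
    """Mean of the mixture: only the non-instantaneous part contributes."""
    return (1.0 - p_instant) / rate


def sample_tmrca(p_instant: float, rate: float, rng: random.Random) -> float:
    if rng.random() < p_instant:
        return 0.0  # lineages coalesce instantaneously
    return rng.expovariate(rate)  # otherwise exponential waiting time


# Usage with hypothetical parameters: compare a Monte Carlo mean with the formula.
rng = random.Random(7)
draws = [sample_tmrca(0.1, 0.5, rng) for _ in range(200_000)]
print(sum(draws) / len(draws), tmrca_mean(0.1, 0.5))
```

Raising `p_instant` (more instantaneous coalescence) lowers the mean coalescence time, which is the qualitative effect consanguinity has in the model.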
Affiliation(s)
- Daniel J Cotter
- Department of Genetics, Stanford University, Stanford, CA 94305, USA.
- Alissa L Severson
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA

8. Parental relatedness through time revealed by runs of homozygosity in ancient DNA. Nat Commun 2021; 12:5425. [PMID: 34521843 PMCID: PMC8440622 DOI: 10.1038/s41467-021-25289-w] [Received: 09/22/2020] [Accepted: 07/21/2021]
Abstract
Parental relatedness of present-day humans varies substantially across the globe, but little is known about the past. Here we analyze ancient DNA, leveraging the fact that parental relatedness leaves genomic traces in the form of runs of homozygosity. We present an approach to identify such runs in low-coverage ancient DNA data aided by haplotype information from a modern phased reference panel. Simulations and experiments show that this method robustly detects runs of homozygosity longer than 4 centimorgans for ancient individuals with at least 0.3× coverage. Analyzing genomic data from 1,785 ancient humans who lived in the last 45,000 years, we detect low rates of first cousin or closer unions across most ancient populations. Moreover, we find a marked decay in background parental relatedness co-occurring with or shortly after the advent of sedentary agriculture. We observe this signal, likely linked to increasing local population sizes, across several geographic transects worldwide.
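Once runs of homozygosity have been called, per-individual summaries are simple to compute. The sketch below assumes the calls are already available as lengths in centimorgans; the calling step itself (the reference-panel approach described above) is not reproduced. The 4 cM threshold follows the abstract, while `roh_summary`, the example lengths, and the 3,500 cM autosomal map length are hypothetical values for illustration only.

```python
def roh_summary(roh_lengths_cm, min_len_cm=4.0, genome_length_cm=None):
    """Count and sum runs of homozygosity longer than min_len_cm.

    Returns (number of long runs, their total length in cM,
    fraction of the given map length they cover, or None if no length is given).
    """
    long_runs = [length for length in roh_lengths_cm if length > min_len_cm]
    total_cm = sum(long_runs)
    fraction = total_cm / genome_length_cm if genome_length_cm else None
    return len(long_runs), total_cm, fraction


# Hypothetical ROH calls (cM) for one individual; 3500 cM is a rough,
# assumed autosomal map length used only for this illustration.
n_long, total_cm, frac = roh_summary([2.5, 4.5, 10.0, 3.9, 21.0], 4.0, 3500.0)
print(n_long, total_cm, frac)
```

Restricting to long runs matters because short runs can reflect ancient background relatedness rather than recent parental relatedness.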

9. Deng Y, Song YS, Nielsen R. The distribution of waiting distances in ancestral recombination graphs. Theor Popul Biol 2021; 141:34-43. [PMID: 34186053 DOI: 10.1016/j.tpb.2021.06.003] [Received: 12/30/2020] [Revised: 06/11/2021] [Accepted: 06/16/2021]
Abstract
The ancestral recombination graph (ARG) contains the full genealogical information of the sample, and many population genetic inference problems can be solved using inferred or sampled ARGs. In particular, the waiting distance between tree changes along the genome can be used to make inferences about the distribution and evolution of recombination rates. To this end, we derive an analytic expression for the distribution of waiting distances between tree changes under the sequentially Markovian coalescent model and obtain an accurate approximation to the distribution of waiting distances for topology changes. We use these results to show that some of the recently proposed methods for inferring sequences of trees along the genome provide strongly biased distributions of waiting distances. In addition, we provide a correction to an undercounting problem facing all available ARG inference methods, thereby facilitating the use of ARG inference methods to estimate temporal changes in the recombination rate.
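The simplest case of the waiting-distance question has a closed form. Assume, as a toy model rather than the paper's general result, a sample of two sequences under the SMC, a pairwise coalescence time t ~ Exp(1) in coalescent units at a fixed starting position, and tree changes occurring along the genome at rate rho·t per unit length (rho is a scaled recombination rate; every recombination changes the two-sample tree under the SMC). Then P(D > x) = ∫ e^(-t) e^(-rho t x) dt = 1 / (1 + rho x). The sketch checks that formula by simulation; the function names are hypothetical.

```python
import random


def survival_closed_form(x: float, rho: float) -> float:
    """P(D > x) in the toy two-sample model: 1 / (1 + rho * x)."""
    return 1.0 / (1.0 + rho * x)


def simulate_waiting_distance(rho: float, rng: random.Random) -> float:
    """Distance from a fixed starting site to the next tree change."""
    t = rng.expovariate(1.0)  # pairwise TMRCA at the start, Exp(1) in coalescent units
    if t == 0.0:
        return float("inf")  # measure-zero case: no branch length, no recombination
    return rng.expovariate(rho * t)  # given t, changes arrive at rate rho * t


def empirical_survival(x: float, rho: float, n: int = 100_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    hits = sum(simulate_waiting_distance(rho, rng) > x for _ in range(n))
    return hits / n


print(empirical_survival(2.0, 1.0), survival_closed_form(2.0, 1.0))
```

Mixing over t makes the tail decay like 1/x rather than exponentially, so in this toy model the mean waiting distance is infinite; any method that produces near-exponential waiting distances would be visibly biased against it.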
Affiliation(s)
- Yun Deng
- Center for Computational Biology, University of California, Berkeley, CA 94720, United States of America.
- Yun S Song
- Department of Statistics, University of California, Berkeley, CA 94720, United States of America; Computer Science Division, University of California, Berkeley, CA 94720, United States of America; Chan Zuckerberg Biohub, San Francisco, CA 94158, United States of America
- Rasmus Nielsen
- Department of Statistics, University of California, Berkeley, CA 94720, United States of America; Department of Integrative Biology, University of California, Berkeley, CA 94720, United States of America.

10. Cotter DJ, Severson AL, Rosenberg NA. The effect of consanguinity on coalescence times on the X chromosome. Theor Popul Biol 2021; 140:32-43. [PMID: 33901539 DOI: 10.1016/j.tpb.2021.03.004] [Received: 02/03/2021] [Revised: 03/22/2021] [Accepted: 03/26/2021]
Abstract
Consanguineous unions increase the frequency at which identical genomic segments are inherited along separate paths of descent, decreasing coalescence times for pairs of alleles drawn from an individual who is the offspring of a consanguineous pair. For an autosomal locus, it has recently been shown that the mean time to the most recent common ancestor (TMRCA) for two alleles in the same individual and the mean TMRCA for two alleles in two separate individuals both decrease with increasing consanguinity in a population. Here, we extend this analysis to the X chromosome, considering X-chromosomal coalescence times under a coalescent model with diploid, male-female mating pairs. We examine four possible first-cousin mating schemes that are equivalent in their effects on autosomes, but that have differing effects on the X chromosome: patrilateral-parallel, patrilateral-cross, matrilateral-parallel, and matrilateral-cross. In each mating model, we calculate mean TMRCA for X-chromosomal alleles sampled either within or between individuals. We describe a consanguinity effect on X-chromosomal TMRCA that differs from the autosomal pattern under matrilateral but not under patrilateral first-cousin mating. For matrilateral first cousins, the effect of consanguinity in reducing TMRCA is stronger on the X chromosome than on the autosomes, with an increased effect of parallel-cousin mating compared to cross-cousin mating. The theoretical computations support the utility of the model in understanding patterns of genomic sharing on the X chromosome.
Affiliation(s)
- Daniel J Cotter
- Department of Genetics, Stanford University, Stanford, CA 94305, USA.
- Alissa L Severson
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA

11. Severson AL, Carmi S, Rosenberg NA. Variance and limiting distribution of coalescence times in a diploid model of a consanguineous population. Theor Popul Biol 2021; 139:50-65. [PMID: 33675872 DOI: 10.1016/j.tpb.2021.02.002] [Received: 02/11/2021] [Accepted: 02/14/2021]
Abstract
Recent modeling studies interested in runs of homozygosity (ROH) and identity by descent (IBD) have sought to connect these properties of genomic sharing to pairwise coalescence times. Here, we examine a variety of features of pairwise coalescence times in models that consider consanguinity. In particular, we extend a recent diploid analysis of mean coalescence times for lineage pairs within and between individuals in a consanguineous population to derive the variance of coalescence times, studying its dependence on the frequency of consanguinity and the kinship coefficient of consanguineous relationships. We also introduce a separation-of-time-scales approach that treats consanguinity models analogously to mathematically similar phenomena such as partial selfing, using this approach to obtain coalescence-time distributions. This approach shows that the consanguinity model behaves similarly to a standard coalescent with population size scaled by a factor of 1 - 3c, where c represents the kinship coefficient of a randomly chosen mating pair. It also explains an earlier result describing mean coalescence time in the consanguinity model in terms of c. The results extend the potential to make predictions about ROH and IBD in relation to demographic parameters of diploid populations.
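In the approximation described above, consanguinity enters only through c and the size factor 1 - 3c. The sketch below computes that factor, assuming (our assumption, not a definition taken from the paper) that c for a population mixing several union types is the frequency-weighted average of standard pedigree kinship coefficients, with non-consanguineous unions contributing zero. `KINSHIP`, `mean_pair_kinship`, and `size_scaling_factor` are hypothetical names; exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction

# Standard pedigree kinship coefficients for a few union types.
KINSHIP = {
    "full_sibs": Fraction(1, 4),
    "first_cousins": Fraction(1, 16),
    "second_cousins": Fraction(1, 64),
}


def mean_pair_kinship(union_freqs):
    """Frequency-weighted kinship of a random mating pair (assumed definition of c).

    union_freqs maps a union type to its frequency among all unions; the
    remaining unions are treated as non-consanguineous (kinship 0).
    """
    total = sum(union_freqs.values())
    if total > 1:
        raise ValueError("union frequencies must sum to at most 1")
    return sum(Fraction(f) * KINSHIP[kind] for kind, f in union_freqs.items())


def size_scaling_factor(c):
    """Factor by which the effective population size is scaled: 1 - 3c."""
    return 1 - 3 * c


# Usage: 10% of unions between first cousins, the rest non-consanguineous.
c = mean_pair_kinship({"first_cousins": Fraction(1, 10)})
print(c, size_scaling_factor(c), float(size_scaling_factor(c)))
```

Even a fairly high rate of first-cousin unions shrinks the effective size by only a few percent in this approximation, which is consistent with consanguinity mainly acting through recent, within-family coalescence.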
Affiliation(s)
- Alissa L Severson
- Department of Genetics, Stanford University, Stanford, CA 94305, USA.
- Shai Carmi
- Braun School of Public Health and Community Medicine, Hebrew University of Jerusalem, Ein Kerem, 9112102, Israel
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA

12. Kerdoncuff E, Lambert A, Achaz G. Testing for population decline using maximal linkage disequilibrium blocks. Theor Popul Biol 2020; 134:171-181. [DOI: 10.1016/j.tpb.2020.03.004] [Received: 07/09/2019] [Revised: 03/26/2020] [Accepted: 03/29/2020]

13. Craig RJ, Böndel KB, Arakawa K, Nakada T, Ito T, Bell G, Colegrave N, Keightley PD, Ness RW. Patterns of population structure and complex haplotype sharing among field isolates of the green alga Chlamydomonas reinhardtii. Mol Ecol 2019; 28:3977-3993. [DOI: 10.1111/mec.15193] [Received: 04/01/2019] [Revised: 07/05/2019] [Accepted: 07/17/2019]
Affiliation(s)
- Rory J. Craig
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Department of Biology, University of Toronto Mississauga, Mississauga, ON, Canada
- Katharina B. Böndel
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany
- Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Takashi Nakada
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Faculty of Environment and Information Sciences, Yokohama National University, Yokohama, Japan
- Takuro Ito
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Graham Bell
- Department of Biology, McGill University, Montreal, QC, Canada
- Nick Colegrave
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Peter D. Keightley
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Rob W. Ness
- Department of Biology, University of Toronto Mississauga, Mississauga, ON, Canada

14. The Effect of Consanguinity on Between-Individual Identity-by-Descent Sharing. Genetics 2019; 212:305-316. [PMID: 30926583 PMCID: PMC6499533 DOI: 10.1534/genetics.119.302136] [Received: 09/25/2018] [Accepted: 03/22/2019]
Abstract
Consanguineous unions increase the rate at which identical genomic segments are paired within individuals to produce runs of homozygosity (ROH). The extent to which such unions affect identity-by-descent (IBD) genomic sharing between rather than within individuals in a population, however, is not immediately evident from within-individual ROH levels. Using the fact that the time to the most recent common ancestor (TMRCA) for a pair of genomes at a specific locus is inversely related to the extent of IBD sharing between the genomes in the neighborhood of the locus, we study IBD sharing for a pair of genomes sampled either within the same individual or in different individuals. We develop a coalescent model for a set of mating pairs in a diploid population, treating the fraction of consanguineous unions as a parameter. Considering mating models that include unions between sibs, first cousins, and nth cousins, we determine the effect of the consanguinity rate on the mean TMRCA for pairs of lineages sampled either within the same individual or in different individuals. The results indicate that consanguinity not only increases ROH sharing between the two genomes within an individual, it also increases IBD sharing between individuals in the population, the magnitude of the effect increasing with the kinship coefficient of the type of consanguineous union. Considering computations of ROH and between-individual IBD in Jewish populations whose consanguinity rates have been estimated from demographic data, we find that, in accord with the theoretical results, increases in consanguinity and ROH levels inflate levels of IBD sharing between individuals in a population. The results contribute more generally to the interpretation of runs of homozygosity, IBD sharing between individuals, and the relationship between ROH and IBD.
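The inverse relation between TMRCA and IBD sharing can be made concrete with a standard back-of-the-envelope argument, which is an approximation and not the paper's model. If two genomes share an ancestor t generations ago at a locus, they are separated by 2t meioses; treating crossovers as a Poisson process at rate 1 per Morgan per meiosis, the shared segment extends an Exp(2t) distance in Morgans on each side, so its expected total length is 2/(2t) = 1/t Morgans, or 100/t cM. The sketch ignores chromosome ends and recombinations that leave the MRCA unchanged; the function names are hypothetical.

```python
import random


def expected_segment_length_cm(tmrca_generations: float) -> float:
    """Approximate expected length (cM) of the shared segment around a locus."""
    return 100.0 / tmrca_generations


def simulate_segment_length_cm(tmrca_generations: float, rng: random.Random) -> float:
    """One draw: distance to the first crossover on each side, summed, in cM."""
    meioses = 2.0 * tmrca_generations
    left = rng.expovariate(meioses)  # Morgans to the first crossover on the left
    right = rng.expovariate(meioses)  # Morgans to the first crossover on the right
    return 100.0 * (left + right)


# Usage: an ancestor 50 generations back gives segments of about 2 cM on average.
rng = random.Random(3)
sims = [simulate_segment_length_cm(50, rng) for _ in range(100_000)]
print(sum(sims) / len(sims), expected_segment_length_cm(50))
```

Halving the TMRCA doubles the expected segment length, which is why processes that shorten coalescence times, such as consanguinity, inflate both ROH and between-individual IBD.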

15. Al-Asadi H, Petkova D, Stephens M, Novembre J. Estimating recent migration and population-size surfaces. PLoS Genet 2019; 15:e1007908. [PMID: 30640906 PMCID: PMC6347299 DOI: 10.1371/journal.pgen.1007908] [Received: 08/08/2018] [Revised: 01/25/2019] [Accepted: 12/19/2018]
Abstract
In many species a fundamental feature of genetic diversity is that genetic similarity decays with geographic distance; however, this relationship is often complex, and may vary across space and time. Methods to uncover and visualize such relationships have widespread use for analyses in molecular ecology, conservation genetics, evolutionary genetics, and human genetics. While several frameworks exist, a promising approach is to infer maps of how migration rates vary across geographic space. Such maps could, in principle, be estimated across time to reveal the full complexity of population histories. Here, we take a step in this direction: we present a method to infer maps of population sizes and migration rates associated with different time periods from a matrix of genetic similarity between every pair of individuals. Specifically, genetic similarity is measured by counting the number of long segments of haplotype sharing (also known as identity-by-descent tracts). By varying the length of these segments we obtain parameter estimates associated with different time periods. Using simulations, we show that the method can reveal time-varying migration rates and population sizes, including changes that are not detectable when using a similar method that ignores haplotypic structure. We apply the method to a dataset of contemporary European individuals (POPRES), and provide an integrated analysis of recent population structure and growth over the last ∼3,000 years in Europe.

We introduce a novel statistical method to infer migration rates and population sizes across space in recent time periods. Our approach builds upon the previously developed EEMS method, which infers effective migration rates under a dense lattice. Similarly, we infer demographic parameters under a lattice and use a (Voronoi) prior to regularize parameters of the model. However, our method differs from EEMS in a few key respects. First, we use the coalescent model parameterized by migration rates and population sizes while EEMS uses a resistance model. Second, our method uses haplotype data while EEMS uses the average genetic distance. A consequence of using haplotype data is that our method can separately estimate migration rates and population sizes, which in essence is done by using a recombination rate map to calibrate the decay of haplotypes over time. An additional useful feature of haplotype data is that, by varying the lengths analyzed, we can infer demography associated with different recent time periods. We call our method MAPS for estimating Migration And Population-size Surfaces. To illustrate MAPS on real data, we analyze a genome-wide SNP dataset on 2224 individuals of European ancestry.
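The similarity matrix described above is built by counting long shared segments per pair, with separate counts for different length ranges. The sketch below shows that bookkeeping, assuming IBD calls are already available as `(individual_a, individual_b, length_cm)` tuples; the bin edges (2 to 6 cM and at least 6 cM), the sample IDs, and `count_shared_segments` are hypothetical choices for illustration, not the method's actual settings.

```python
import math
from collections import Counter


def count_shared_segments(segments, bins):
    """Count IBD segments per unordered pair and per length bin.

    segments: iterable of (id_a, id_b, length_cm)
    bins: list of half-open (lo, hi) ranges in cM; hi may be math.inf
    Returns a Counter keyed by ((id_1, id_2), bin_index); segments that fall
    outside every bin are ignored.
    """
    counts = Counter()
    for a, b, length in segments:
        pair = (a, b) if a <= b else (b, a)  # treat (A, B) and (B, A) as one pair
        for k, (lo, hi) in enumerate(bins):
            if lo <= length < hi:
                counts[(pair, k)] += 1
                break
    return counts


# Hypothetical IBD calls and bins: shorter segments tend to reflect older
# shared ancestry, longer segments more recent ancestry.
BINS = [(2.0, 6.0), (6.0, math.inf)]
SEGMENTS = [("A", "B", 3.0), ("B", "A", 7.5), ("A", "C", 2.1), ("A", "B", 1.0)]
counts = count_shared_segments(SEGMENTS, BINS)
print(dict(counts))
```

Fitting a model separately to each length bin is what lets a method of this kind attach its estimates to different recent time periods.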
Affiliation(s)
- Hussein Al-Asadi
- Evolutionary Biology, University of Chicago, Chicago, Illinois, United States of America; Department of Statistics, University of Chicago, Illinois, United States of America
- Desislava Petkova
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Matthew Stephens
- Department of Statistics, University of Chicago, Illinois, United States of America; Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- John Novembre
- Evolutionary Biology, University of Chicago, Chicago, Illinois, United States of America; Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America

16. Johndrow JE, Palacios JA. Exact limits of inference in coalescent models. Theor Popul Biol 2018; 125:75-93. [PMID: 30571959 PMCID: PMC6541399 DOI: 10.1016/j.tpb.2018.11.004] [Received: 02/11/2018] [Revised: 11/12/2018] [Accepted: 11/27/2018]
Abstract
Recovery of population size history from molecular sequence data is an important problem in population genetics. Inference commonly relies on a coalescent model linking the population size history to genealogies. The high computational cost of estimating parameters from these models usually compels researchers to select a subset of the available data or to rely on insufficient summary statistics for statistical inference. We consider the problem of recovering the true population size history from two possible alternatives on the basis of coalescent time data previously considered by Kim et al. (2015). We improve upon previous results by giving exact expressions for the probability of correctly distinguishing between the two hypotheses as a function of the separation between the alternative size histories, the numbers of individuals and loci, and the sampling times. In more complicated settings, we estimate the exact probability of correct recovery by Monte Carlo simulation. Our results give considerably more pessimistic inferential limits than those previously reported. We also extend our analyses to the pairwise SMC and SMC’ models of recombination. This work is relevant for optimal design when the inference goal is to test scientific hypotheses about population size trajectories in coalescent models with and without recombination.
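The sketch below illustrates what "the probability of correctly distinguishing between the two hypotheses" means by estimating it with Monte Carlo simulation. The setting is a deliberately simple assumption made here and is not the paper's:

- two constant population sizes, N0 and N1;
- independent loci, with one pairwise coalescence time per locus;
- equal prior weight on each hypothesis;
- a likelihood-ratio decision rule.

The function names are hypothetical.

```python
import math
import random


def log_likelihood(times, n_diploid):
    # Assumed model: pairwise coalescence time (generations) ~ Exponential(rate 1/(2N)).
    rate = 1.0 / (2.0 * n_diploid)
    return sum(math.log(rate) - rate * t for t in times)


def prob_correct_recovery(n0, n1, n_loci, reps=5000, seed=1):
    """Monte Carlo estimate of P(the likelihood-ratio rule picks the true size)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(reps):
        truth = rng.randrange(2)  # equal prior on the two hypotheses
        n_true = (n0, n1)[truth]
        # Independent loci, one pairwise coalescence time each.
        times = [rng.expovariate(1.0 / (2.0 * n_true)) for _ in range(n_loci)]
        guess = 0 if log_likelihood(times, n0) >= log_likelihood(times, n1) else 1
        correct += guess == truth
    return correct / reps
```

In this toy setting, 50 loci almost always separate a tenfold size difference. A 10% difference with only 5 loci, by contrast, is close to a coin flip. This mirrors the dependence on separation and number of loci described in the abstract.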
17
Yang S, Carmi S, Pe'er I. Rapidly Registering Identity-by-Descent Across Ancestral Recombination Graphs. J Comput Biol 2016; 23:495-507. [PMID: 27104872 DOI: 10.1089/cmb.2016.0016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
The genomes of remotely related individuals occasionally contain long segments that are identical by descent (IBD). Sharing of IBD segments has many applications in population and medical genetics, and it is thus desirable to study their properties in simulations. However, no current method provides a direct, efficient means to extract IBD segments from simulated genealogies. Here, we introduce computationally efficient approaches to extract ground-truth IBD segments from a sequence of genealogies, or equivalently, an ancestral recombination graph. Specifically, we use a two-step scheme, where we first identify putative shared segments by comparing the common ancestors of all pairs of individuals at some distance apart. This reduces the search space considerably, and we then proceed by determining the true IBD status of the candidate segments. Under some assumptions and when allowing a limited resolution of segment lengths, our run-time complexity is reduced from O(n³ log n) for the naïve algorithm to O(n log n), where n is the number of individuals in the sample.
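The notion of ground-truth IBD used here can be sketched under an illustrative assumption of ours: each local genealogy is stored as a child-to-parent map, and node IDs are shared across trees, as in an ARG. A pair is then IBD along a run of adjacent intervals whose most recent common ancestor (MRCA) is the same node. This is only the naive per-pair scan, not the authors' two-step O(n log n) scheme, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a local tree is a child -> parent map with the
# root mapped to None. Node IDs are assumed to persist across local trees (as
# in an ARG), so the same ID in adjacent trees denotes the same ancestor.
Tree = Dict[int, Optional[int]]
Segment = Tuple[float, float, int]  # (start, end, MRCA node)


def _path_to_root(tree: Tree, node: int) -> List[int]:
    path: List[int] = []
    current: Optional[int] = node
    while current is not None:
        path.append(current)
        current = tree[current]
    return path


def mrca(tree: Tree, a: int, b: int) -> int:
    """Lowest node on both root paths (may be a or b itself)."""
    ancestors_of_a = set(_path_to_root(tree, a))
    for node in _path_to_root(tree, b):
        if node in ancestors_of_a:
            return node
    raise ValueError("samples are not in the same tree")


def ibd_segments(
    local_trees: List[Tuple[float, float, Tree]], a: int, b: int, min_len: float
) -> List[Segment]:
    """Merge adjacent intervals whose MRCA of (a, b) is the same node and keep
    merged runs of length >= min_len. local_trees must be in chromosome order."""
    segments: List[Segment] = []
    run: Optional[Segment] = None  # run currently being extended
    for start, end, tree in local_trees:
        node = mrca(tree, a, b)
        if run is not None and run[2] == node and run[1] == start:
            run = (run[0], end, node)  # same ancestor continues: extend run
            continue
        if run is not None and run[1] - run[0] >= min_len:
            segments.append(run)
        run = (start, end, node)
    if run is not None and run[1] - run[0] >= min_len:
        segments.append(run)
    return segments
```

Consider tree_a = {0: 3, 1: 3, 2: 4, 3: 4, 4: None} spanning [0, 2) and tree_c = {0: 4, 1: 3, 2: 3, 3: 4, 4: None} spanning [2, 3). Samples 0 and 1 share node 3 over [0, 2) and node 4 over [2, 3), so a 1.5-unit length cutoff keeps only (0.0, 2.0, 3).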
Affiliation(s)
- Shuo Yang
- Department of Computer Science, Columbia University, New York, New York
- Shai Carmi
- Braun School of Public Health, Faculty of Medicine, Hebrew University, Jerusalem, Israel
- Itsik Pe'er
- Department of Computer Science, Columbia University, New York, New York; Department of Systems Biology, Columbia University, New York, New York
18
Fedorova L, Qiu S, Dutta R, Fedorov A. Atlas of Cryptic Genetic Relatedness Among 1000 Human Genomes. Genome Biol Evol 2016; 8:777-90. [PMID: 26907499 PMCID: PMC4824066 DOI: 10.1093/gbe/evw034] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
A novel computational method for detecting identical-by-descent (IBD) chromosomal segments between sequenced genomes is presented. It utilizes the distribution patterns of very rare genetic variants (vrGVs), which have minor allele frequencies <0.2%. Contrary to existing probabilistic approaches, our method is largely deterministic, because it considers a group of very rare events that cannot co-occur by chance alone. This method was applied in an exhaustive computational search for shared IBD segments among 1,092 sequenced individuals from 14 populations. It demonstrated that clusters of vrGVs are unique and powerful markers of genetic relatedness that uncover IBD chromosomal segments between and within populations, irrespective of whether divergence was recent or occurred hundreds to thousands of years ago. We found that several IBD segments are shared by practically every possible pair of individuals belonging to the same population. Moreover, shared short IBD segments (median size 183 kb) were found in 10% of inter-continental human pairs, each comprising a person from sub-Saharan Africa and a person from Southern Europe. The shortest shared IBD segments (median size 54 kb) were found in 0.42% of inter-continental pairs composed of individuals from Chinese/Japanese populations and Africans from Kenya and Nigeria. Knowledge of inheritance of IBD segments is important in clinical case–control and cohort studies, since unknown distant familial relationships could compromise interpretation of collected data. Clusters of vrGVs should be useful markers for familial relationships and common multifactorial disorders.
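The core idea is to flag places where two individuals share several very rare variants close together. A simplified version might look like the sketch below. The 0.2% MAF cutoff comes from the abstract. The following are illustrative choices, not the published criteria:

- the maximum gap between neighbouring shared variants;
- the minimum cluster size;
- the assumption that every carrier is heterozygous.

The function name is hypothetical.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def shared_rare_variant_clusters(
    positions: Sequence[int],
    carriers: Sequence[Set[Hashable]],
    n_chromosomes: int,
    a: Hashable,
    b: Hashable,
    maf_max: float = 0.002,
    max_gap: int = 500_000,
    min_count: int = 3,
) -> List[Tuple[int, int, int]]:
    """Return (first_pos, last_pos, n_variants) for clusters of very rare
    variants carried by both a and b. positions must be sorted (bp)."""
    shared: List[int] = []
    for pos, who in zip(positions, carriers):
        # Assumes every carrier is heterozygous, so allele count == len(who).
        maf = len(who) / n_chromosomes
        if maf < maf_max and a in who and b in who:
            shared.append(pos)

    clusters: List[Tuple[int, int, int]] = []
    current: List[int] = []
    for pos in shared:
        # A large gap closes the current cluster; keep it only if big enough.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_count:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_count:
        clusters.append((current[0], current[-1], len(current)))
    return clusters
```

With 1,092 individuals (2,184 chromosomes), a variant carried by two people passes the 0.2% cutoff, while one carried by about a hundred people does not.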
Affiliation(s)
- Shuhao Qiu
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo; Department of Medicine, University of Toledo
- Rajib Dutta
- Program in Biomedical Sciences, University of Toledo
- Alexei Fedorov
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo; Department of Medicine, University of Toledo
19
The SMC' is a highly accurate approximation to the ancestral recombination graph. Genetics 2015; 200:343-55. [PMID: 25786855 DOI: 10.1534/genetics.114.173898] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 12/17/2014] [Accepted: 03/12/2015] [Indexed: 11/18/2022]
Abstract
Two sequentially Markov coalescent models (SMC and SMC') are available as tractable approximations to the ancestral recombination graph (ARG). We present a Markov process describing coalescence at two fixed points along a pair of sequences evolving under the SMC'. Using our Markov process, we derive a number of new quantities related to the pairwise SMC', thereby analytically quantifying for the first time the similarity between the SMC' and the ARG. We use our process to show that the joint distribution of pairwise coalescence times at recombination sites under the SMC' is the same as it is marginally under the ARG, which demonstrates that the SMC' is, in a particular well-defined, intuitive sense, the most appropriate first-order sequentially Markov approximation to the ARG. Finally, we use these results to show that population size estimates under the pairwise SMC are asymptotically biased, while under the pairwise SMC' they are approximately asymptotically unbiased.
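Back-coalescence is one concrete way the two approximations differ. Under the SMC', the lineage detached by a recombination may rejoin its own former branch, leaving the pairwise coalescence time unchanged. Under the SMC this cannot happen.

The sketch below rests on three assumptions: a constant population size, time in coalescent units, and a recombination height uniform on (0, t). Under those assumptions, it derives a closed-form probability of back-coalescence and compares it with a simulated single pairwise SMC' transition. It is not the two-locus Markov process introduced in the paper, and the function names are hypothetical.

```python
import math
import random


def back_coalescence_prob(t: float) -> float:
    """P(pairwise coalescence time unchanged after one recombination), SMC'.

    Recombination height s ~ Uniform(0, t). Below t the floating lineage can
    join either of two branches (total rate 2) and picks its own branch with
    probability 1/2. Averaging over s gives 1/2 - (1 - exp(-2t)) / (4t).
    """
    return 0.5 - (1.0 - math.exp(-2.0 * t)) / (4.0 * t)


def smc_prime_step(t: float, rng: random.Random) -> float:
    """New pairwise coalescence time after one recombination under SMC'."""
    s = rng.uniform(0.0, t)
    wait = rng.expovariate(2.0)  # two branches available between s and t
    if s + wait < t:
        # Equal chance of rejoining its own branch (time unchanged)
        # or the other sample's branch (new, earlier coalescence).
        return t if rng.random() < 0.5 else s + wait
    # Above t only one ancestral lineage remains: coalescence rate 1.
    return t + rng.expovariate(1.0)


def simulated_back_coalescence(t: float, reps: int = 100_000, seed: int = 7) -> float:
    rng = random.Random(seed)
    return sum(smc_prime_step(t, rng) == t for _ in range(reps)) / reps
```

For t = 1 the formula gives about 0.28. The probability rises from 0 for very recent coalescences toward 1/2 for very old ones.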