1
Kardos M, Waples RS. Low-coverage sequencing and Wahlund effect severely bias estimates of inbreeding, heterozygosity and effective population size in North American wolves. Mol Ecol 2024:e17415. [PMID: 38785346 DOI: 10.1111/mec.17415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 05/03/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024]
Abstract
vonHoldt et al. (2024, Molecular Ecology, 33, e17231) (vH24) used low-coverage (average ~7X read depth) restriction site-associated DNA sequence data to estimate individual inbreeding and heterozygosity, and recent effective population size (Ne), in Great Lakes (GL) and Northern Rocky Mountain (RM) wolves. They concluded that RM heterozygosity rapidly declined between 1991 and 2020, and that Ne declined substantially in GL and RM over the last 50 generations. Here, we evaluate the effects of low sequence coverage and sampling strategy on vH24's findings and provide general recommendations for using sequence data to evaluate inbreeding, heterozygosity and Ne. Low-coverage sequencing resulted in downwardly biased estimates of individual inbreeding and heterozygosity, and an erroneous large temporal decline in RM heterozygosity due to declining read depth through time. Additionally, vH24's sampling strategy, which combined individuals from several genetically differentiated populations and across at least eight wolf generations, is expected to have resulted in severe downward bias in estimates of recent Ne for RM. We recommend using high-coverage sequence data (≥15-20X) to estimate inbreeding and heterozygosity. Carefully filtering individuals, loci and genotypes, and using genotype imputation or likelihoods can help to minimise bias when low-coverage sequence data must be used. For estimation of contemporary Ne, the marginal benefits of using more than 10³-10⁴ loci are small, so aggressive filtering of loci with low average read depth potentially can retain most individuals without sacrificing much precision. Individuals are relatively more valuable than loci because analyses of contemporary Ne should focus on roughly single-generation samples from local breeding populations.
Affiliation(s)
- Marty Kardos
- Conservation Biology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, Washington, USA
- Robin S Waples
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, Washington, USA
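One way low read depth can depress called heterozygosity, as the abstract above reports, can be sketched with a simple binomial model. This is an illustration only, not the analysis used by Kardos and Waples or by vH24. It assumes error-free reads, that each read from a true heterozygote samples either allele with probability 0.5, and a hypothetical naive caller that reports a heterozygote only when each allele is seen at least `min_reads_per_allele` times. The function name and parameter are introduced here for illustration.

```python
from math import comb


def prob_het_called(depth: int, min_reads_per_allele: int = 1) -> float:
    """Probability that a true heterozygote is called heterozygous.

    Assumptions (illustrative, not from the source): reads are error-free,
    each read shows either allele with probability 0.5, and the caller needs
    at least `min_reads_per_allele` reads of each allele.
    """
    if depth < 0 or min_reads_per_allele < 1:
        raise ValueError("depth must be >= 0 and min_reads_per_allele >= 1")
    # Count read configurations in which both alleles reach the threshold.
    favourable = sum(
        comb(depth, k)
        for k in range(min_reads_per_allele, depth - min_reads_per_allele + 1)
    )
    return favourable / 2 ** depth


if __name__ == "__main__":
    for depth in (2, 4, 7, 15, 20):
        lenient = prob_het_called(depth, 1)
        strict = prob_het_called(depth, 2)
        print(f"depth={depth:2d}  >=1 read/allele: {lenient:.4f}  >=2 reads/allele: {strict:.4f}")
```

Under these assumptions, a site with exactly 7 reads recovers about 98% of true heterozygotes when one read per allele suffices and 87.5% when two are required. Sites in a library averaging ~7X will often have fewer reads than the average, so the realised loss can be larger.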
2
Gargiulo R, Decroocq V, González‐Martínez SC, Paz‐Vinas I, Aury J, Lesur Kupin I, Plomion C, Schmitt S, Scotti I, Heuertz M. Estimation of contemporary effective population size in plant populations: Limitations of genomic datasets. Evol Appl 2024; 17:e13691. [PMID: 38707994 PMCID: PMC11069024 DOI: 10.1111/eva.13691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 03/22/2024] [Accepted: 04/03/2024] [Indexed: 05/07/2024] Open
Abstract
Effective population size (Ne) is a pivotal evolutionary parameter with crucial implications in conservation practice and policy. Genetic methods to estimate Ne have been preferred over demographic methods because they rely on genetic data rather than time-consuming ecological monitoring. Methods based on linkage disequilibrium (LD), in particular, have become popular in conservation as they require a single sampling and provide estimates that refer to recent generations. A software program based on the LD method, GONE, looks particularly promising to estimate contemporary and recent-historical Ne (up to 200 generations in the past). Genomic datasets from non-model species, especially plants, may present some constraints to the use of GONE, as linkage maps and reference genomes are seldom available, and SNP genotyping is usually based on reduced-representation methods. In this study, we use empirical datasets from four plant species to explore the limitations of plant genomic datasets when estimating Ne using the algorithm implemented in GONE, in addition to exploring some typical biological limitations that may affect Ne estimation using the LD method, such as the occurrence of population structure. We show how accuracy and precision of Ne estimates potentially change with the following factors: occurrence of missing data, limited number of SNPs/individuals sampled, and lack of information about the location of SNPs on chromosomes, with the latter producing a significant bias, previously unexplored with empirical data. We finally compare the Ne estimates obtained with GONE for the last generations with the contemporary Ne estimates obtained with the programs currentNe and NeEstimator.
Affiliation(s)
- Ivan Paz‐Vinas
- Department of Biology, Colorado State University, Fort Collins, Colorado, USA
- CNRS, ENTPE, UMR5023 LEHNA, Université Claude Bernard Lyon 1, Villeurbanne, France
- Jean‐Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris‐Saclay, Evry, France
- Sylvain Schmitt
- AMAP, Univ. Montpellier, CIRAD, CNRS, INRAE, IRD, Montpellier, France
3
Fabreti LG, Coghill LM, Thomson RC, Höhna S, Brown JM. The Expected Behaviors of Posterior Predictive Tests and Their Unexpected Interpretation. Mol Biol Evol 2024; 41:msae051. [PMID: 38437512 PMCID: PMC10946647 DOI: 10.1093/molbev/msae051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 01/09/2024] [Indexed: 03/06/2024] Open
Abstract
Poor fit between models of sequence or trait evolution and empirical data is known to cause biases and lead to spurious conclusions about evolutionary patterns and processes. Bayesian posterior prediction is a flexible and intuitive approach for detecting such cases of poor fit. However, the expected behavior of posterior predictive tests has never been characterized for evolutionary models, which is critical for their proper interpretation. Here, we show that the expected distribution of posterior predictive P-values is generally not uniform, in contrast to frequentist P-values used for hypothesis testing, and extreme posterior predictive P-values often provide more evidence of poor fit than typically appreciated. Posterior prediction assesses model adequacy under highly favorable circumstances, because the model is fitted to the data, which leads to expected distributions that are often concentrated around intermediate values. Nonuniform expected distributions of P-values do not pose a problem for the application of these tests, however, and posterior predictive P-values can be interpreted as the posterior probability that the fitted model would predict a dataset with a test statistic value as extreme as the value calculated from the observed data.
Affiliation(s)
- Luiza Guimarães Fabreti
- GeoBio-Center, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, Munich 80333, Germany
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, Munich 80333, Germany
- Lyndon M Coghill
- Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, USA
- Present address: Division of Research, Innovation, and Impact & Department of Veterinary Pathobiology, University of Missouri, Columbia, MO 65211, USA
- Robert C Thomson
- School of Life Sciences, University of Hawai‘i at Mānoa, Honolulu, HI 96822, USA
- Sebastian Höhna
- GeoBio-Center, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, Munich 80333, Germany
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, Munich 80333, Germany
- Jeremy M Brown
- Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
4
Waples RS. Practical application of the linkage disequilibrium method for estimating contemporary effective population size: A review. Mol Ecol Resour 2024; 24:e13879. [PMID: 37873672 DOI: 10.1111/1755-0998.13879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 09/26/2023] [Accepted: 09/29/2023] [Indexed: 10/25/2023]
Abstract
The method to estimate contemporary effective population size (Ne) based on patterns of linkage disequilibrium (LD) at unlinked loci has been widely applied to natural and managed populations. The underlying model makes many simplifying assumptions, most of which have been evaluated in numerous studies published over the last two decades. Here, these performance evaluations are reviewed and summarized, with a focus on information that facilitates practical application to real populations in nature. Potential sources of bias that are discussed include calculation of r² (a measure of LD), adjustments for sampling error, physical linkage, age structure, migration and spatial structure, mutation and selection, mating systems, changes in abundance, rare alleles, missing data, genotyping errors, data filtering choices and methods for combining multiple Ne estimates. Factors that affect precision are reviewed, including pseudoreplication that limits the information gained from large genomics datasets, constraints imposed by small samples of individuals, and the challenges in obtaining robust estimates for large populations. Topics that merit further research include the potential to weight r² values by allele frequency, lump samples of individuals, use genotypic likelihoods rather than called genotypes, prune large LD values and apply the method to species practising partial monogamy.
Affiliation(s)
- Robin S Waples
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, Washington, USA
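The core idea of the LD method summarised in the abstract above can be sketched with the classical approximation for unlinked loci under random mating, E[r²] ≈ 1/(3Ne) + 1/S, where S is the number of sampled individuals and 1/S is the contribution of sampling error. The function below simply inverts that relation. It is a minimal sketch, not the estimator reviewed in the paper: it omits the empirical bias corrections and the other adjustments the review discusses (physical linkage, age structure, rare alleles, missing data), and the name `ld_ne_estimate` is introduced here for illustration.

```python
def ld_ne_estimate(mean_r2: float, sample_size: int) -> float:
    """Point estimate of contemporary Ne from mean r^2 at unlinked loci.

    Inverts E[r^2] ~= 1/(3*Ne) + 1/S (random mating, unlinked loci).
    Illustrative only: no bias correction for small S, no confidence interval.
    """
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    # Remove the part of r^2 expected from sampling a finite number of individuals.
    r2_drift = mean_r2 - 1.0 / sample_size
    if r2_drift <= 0:
        # All observed LD is explained by sampling error: no finite estimate.
        return float("inf")
    return 1.0 / (3.0 * r2_drift)


if __name__ == "__main__":
    # Hypothetical input: 50 individuals and a mean r^2 of 1/300 + 1/50.
    print(ld_ne_estimate(1 / 300 + 1 / 50, 50))
```

Because the sampling term 1/S is subtracted from an observed mean, small samples or large populations leave little drift signal, which is one way to see why the abstract stresses the difficulty of obtaining robust estimates for large populations.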
5
Mooney JA, Agranat-Tamir L, Pritchard JK, Rosenberg NA. On the number of genealogical ancestors tracing to the source groups of an admixed population. Genetics 2023; 224:iyad079. [PMID: 37410594 PMCID: PMC10324943 DOI: 10.1093/genetics/iyad079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 04/05/2023] [Indexed: 07/08/2023] Open
Abstract
Members of genetically admixed populations possess ancestry from multiple source groups, and studies of human genetic admixture frequently estimate ancestry components corresponding to fractions of individual genomes that trace to specific ancestral populations. However, the same numerical ancestry fraction can represent a wide array of admixture scenarios within an individual's genealogy. Using a mechanistic model of admixture, we consider admixture genealogically: how many ancestors from the source populations does the admixture represent? We consider African-Americans, for whom continent-level estimates produce a 75-85% value for African ancestry on average and 15-25% for European ancestry. Genetic studies together with key features of African-American demographic history suggest ranges for parameters of a simple three-epoch model. Considering parameter sets compatible with estimates of current ancestry levels, we infer that if all genealogical lines of a random African-American born during 1960-1965 are traced back until they reach members of source populations, the mean over parameter sets of the expected number of genealogical lines terminating with African individuals is 314 (interquartile range 240-376), and the mean of the expected number terminating in Europeans is 51 (interquartile range 32-69). Across discrete generations, the peak number of African genealogical ancestors occurs in birth cohorts from the early 1700s, and the probability exceeds 50% that at least one European ancestor was born more recently than 1835. Our genealogical perspective can contribute to further understanding the admixture processes that underlie admixed populations. For African-Americans, the results provide insight both on how many of the ancestors of a typical African-American might have been forcibly displaced in the Transatlantic Slave Trade and on how many separate European admixture events might exist in a typical African-American genealogy.
Affiliation(s)
- Jazlyn A Mooney
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Jonathan K Pritchard
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
6
Waples RS, Waples RK, Ward EJ. Pseudoreplication in genomics-scale datasets. Mol Ecol Resour 2021; 22:503-518. [PMID: 34351073 DOI: 10.1111/1755-0998.13482] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/17/2020] [Revised: 06/14/2021] [Accepted: 07/23/2021] [Indexed: 11/30/2022]
Abstract
In genomics-scale datasets, loci are closely packed within chromosomes and hence provide correlated information. Averaging across loci as if they were independent creates pseudoreplication, which reduces the effective degrees of freedom (df') compared to the nominal degrees of freedom, df. This issue has been known for some time, but consequences have not been systematically quantified across the entire genome. Here we measured pseudoreplication (quantified by the ratio df'/df) for a common metric of genetic differentiation (FST) and a common measure of linkage disequilibrium between pairs of loci (r²). Based on data simulated using models (SLiM and msprime) that allow efficient forward-in-time and coalescent simulations while precisely controlling population pedigrees, we estimated df' and df'/df by measuring the rate of decline in the variance of mean FST and mean r² as more loci were used. For both indices, df' increases with Ne and genome size, as expected. However, even for large Ne and large genomes, df' for mean r² plateaus after a few thousand loci, and a variance components analysis indicates that the limiting factor is uncertainty associated with sampling individuals rather than genes. Pseudoreplication is less extreme for FST, but df'/df ≤ 0.01 can occur in datasets using tens of thousands of loci. Commonly-used block-jackknife methods consistently overestimated var(FST), producing very conservative confidence intervals. Predicting df' based on our modeling results as a function of Ne, L, S, and genome size provides a robust way to quantify precision associated with genomics-scale datasets.
Affiliation(s)
- Robin S Waples
- NOAA Fisheries, Northwest Fisheries Science Center, 2725 Montlake Blvd. East, Seattle, WA, 98112, USA
- Ryan K Waples
- Department of Biology, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Eric J Ward
- NOAA Fisheries, Northwest Fisheries Science Center, 2725 Montlake Blvd. East, Seattle, WA, 98112, USA
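The plateau in effective degrees of freedom described in the abstract above can be illustrated with a textbook toy model. This is not the authors' simulation-based predictor of df'. It assumes `n` values that all share the same pairwise correlation `rho`, in which case the variance of their mean is (σ²/n)(1 + (n − 1)ρ) and the effective number of independent values is n / (1 + (n − 1)ρ). The function name and the equicorrelation assumption are introduced here for illustration.

```python
def effective_sample_size(n: int, rho: float) -> float:
    """Effective number of independent values among n equicorrelated values.

    Toy model (an assumption, not the source's method): every pair of values
    has correlation rho, so var(mean) = sigma^2 / n * (1 + (n - 1) * rho).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return n / (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    rho = 0.01  # hypothetical average correlation between values
    for n in (10, 100, 1_000, 10_000, 100_000):
        n_eff = effective_sample_size(n, rho)
        print(f"n={n:7d}  effective n={n_eff:8.2f}  ratio={n_eff / n:.4f}")
```

In this toy model the effective number can never exceed 1/ρ however many values are added, so the ratio of effective to nominal degrees of freedom keeps shrinking as the dataset grows, which mirrors the qualitative pattern the abstract reports.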
7
Severson AL, Carmi S, Rosenberg NA. Variance and limiting distribution of coalescence times in a diploid model of a consanguineous population. Theor Popul Biol 2021; 139:50-65. [PMID: 33675872 DOI: 10.1016/j.tpb.2021.02.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/11/2021] [Accepted: 02/14/2021] [Indexed: 10/22/2022]
Abstract
Recent modeling studies interested in runs of homozygosity (ROH) and identity by descent (IBD) have sought to connect these properties of genomic sharing to pairwise coalescence times. Here, we examine a variety of features of pairwise coalescence times in models that consider consanguinity. In particular, we extend a recent diploid analysis of mean coalescence times for lineage pairs within and between individuals in a consanguineous population to derive the variance of coalescence times, studying its dependence on the frequency of consanguinity and the kinship coefficient of consanguineous relationships. We also introduce a separation-of-time-scales approach that treats consanguinity models analogously to mathematically similar phenomena such as partial selfing, using this approach to obtain coalescence-time distributions. This approach shows that the consanguinity model behaves similarly to a standard coalescent, scaling population size by a factor 1-3c, where c represents the kinship coefficient of a randomly chosen mating pair. It provides the explanation for an earlier result describing mean coalescence time in the consanguinity model in terms of c. The results extend the potential to make predictions about ROH and IBD in relation to demographic parameters of diploid populations.
Affiliation(s)
- Alissa L Severson
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Shai Carmi
- Braun School of Public Health and Community Medicine, Hebrew University of Jerusalem, Ein Kerem, 9112102, Israel
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
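The scaling factor 1 − 3c quoted in the abstract above lends itself to a short worked example. The sketch assumes, for illustration only, that a fraction of unions is of a single consanguineous type with a known kinship coefficient and that all other mating pairs have kinship 0, so that c is the product of the two. The helper names `mean_kinship` and `scaled_population_size` are hypothetical and do not come from the paper.

```python
def mean_kinship(fraction_consanguineous: float, kinship_of_union: float) -> float:
    """Kinship coefficient c of a randomly chosen mating pair.

    Assumption for illustration: one consanguineous union type, and
    non-consanguineous pairs contribute kinship 0.
    """
    if not 0.0 <= fraction_consanguineous <= 1.0:
        raise ValueError("fraction_consanguineous must lie in [0, 1]")
    return fraction_consanguineous * kinship_of_union


def scaled_population_size(population_size: float, c: float) -> float:
    """Population size rescaled by the factor (1 - 3c) stated in the abstract."""
    if not 0.0 <= c < 1.0 / 3.0:
        raise ValueError("c must lie in [0, 1/3) for a positive scaling factor")
    return population_size * (1.0 - 3.0 * c)


if __name__ == "__main__":
    # Hypothetical scenario: 20% of unions are between first cousins (kinship 1/16).
    c = mean_kinship(0.20, 1.0 / 16.0)
    print(c, scaled_population_size(1000, c))
```

With these hypothetical inputs c = 0.0125 and the factor is 0.9625, so a population of 1000 behaves like a standard coalescent with size of about 962.5. If every union were between full sibs (kinship 1/4), the factor would fall to 0.25.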
8
Hohenlohe PA, Funk WC, Rajora OP. Population genomics for wildlife conservation and management. Mol Ecol 2020; 30:62-82. [PMID: 33145846 PMCID: PMC7894518 DOI: 10.1111/mec.15720] [Citation(s) in RCA: 180] [Impact Index Per Article: 45.0] [Received: 03/20/2020] [Revised: 10/02/2020] [Accepted: 10/29/2020] [Indexed: 12/21/2022]
Abstract
Biodiversity is under threat worldwide. Over the past decade, the field of population genomics has developed across nonmodel organisms, and the results of this research have begun to be applied in conservation and management of wildlife species. Genomics tools can provide precise estimates of basic features of wildlife populations, such as effective population size, inbreeding, demographic history and population structure, that are critical for conservation efforts. Moreover, population genomics studies can identify particular genetic loci and variants responsible for inbreeding depression or adaptation to changing environments, allowing for conservation efforts to estimate the capacity of populations to evolve and adapt in response to environmental change and to manage for adaptive variation. While connections from basic research to applied wildlife conservation have been slow to develop, these connections are increasingly strengthening. Here we review the primary areas in which population genomics approaches can be applied to wildlife conservation and management, highlight examples of how they have been used, and provide recommendations for building on the progress that has been made in this field.
Affiliation(s)
- Paul A Hohenlohe
- Department of Biological Sciences and Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, Idaho, USA
- W Chris Funk
- Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, Colorado, USA
- Om P Rajora
- Faculty of Forestry and Environmental Management, University of New Brunswick, Fredericton, New Brunswick, Canada
9
Santiago E, Novo I, Pardiñas AF, Saura M, Wang J, Caballero A. Recent Demographic History Inferred by High-Resolution Analysis of Linkage Disequilibrium. Mol Biol Evol 2020; 37:3642-3653. [DOI: 10.1093/molbev/msaa169] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Indexed: 12/28/2022] Open
Abstract
Inferring changes in effective population size (Ne) in the recent past is of special interest for conservation of endangered species and for human history research. Current methods for estimating the very recent historical Ne are unable to detect complex demographic trajectories involving multiple episodes of bottlenecks, drops, and expansions. We develop a theoretical and computational framework to infer the demographic history of a population within the past 100 generations from the observed spectrum of linkage disequilibrium (LD) of pairs of loci over a wide range of recombination rates in a sample of contemporary individuals. The cumulative contributions of all of the previous generations to the observed LD are included in our model, and a genetic algorithm is used to search for the sequence of historical Ne values that best explains the observed LD spectrum. The method can be applied from large samples to samples of fewer than ten individuals using a variety of genotyping and DNA sequencing data: haploid, diploid with phased or unphased genotypes and pseudohaploid data from low-coverage sequencing. The method was tested by computer simulation for sensitivity to genotyping errors, temporal heterogeneity of samples, population admixture, and structural division into subpopulations, showing high tolerance to deviations from the assumptions of the model. Computer simulations also show that the proposed method outperforms other leading approaches when the inference concerns recent timeframes. Analysis of data from a variety of human and animal populations gave results in agreement with previous estimations by other methods or with records of historical events.
Affiliation(s)
- Enrique Santiago
- Departamento de Biología Funcional, Facultad de Biología, Universidad de Oviedo, Oviedo, Spain
- Irene Novo
- Centro de Investigación Mariña, Departamento de Bioquímica, Genética e Inmunología, Edificio CC Experimentais, Campus de Vigo, Universidade de Vigo, Vigo, Spain
- Antonio F Pardiñas
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, United Kingdom
- María Saura
- Departamento de Mejora Genética Animal, INIA, Madrid, Spain
- Jinliang Wang
- Institute of Zoology, Zoological Society of London, London, United Kingdom
- Armando Caballero
- Centro de Investigación Mariña, Departamento de Bioquímica, Genética e Inmunología, Edificio CC Experimentais, Campus de Vigo, Universidade de Vigo, Vigo, Spain
10
Nelson D, Kelleher J, Ragsdale AP, Moreau C, McVean G, Gravel S. Accounting for long-range correlations in genome-wide simulations of large cohorts. PLoS Genet 2020; 16:e1008619. [PMID: 32369493 PMCID: PMC7266353 DOI: 10.1371/journal.pgen.1008619] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/23/2019] [Revised: 06/02/2020] [Accepted: 01/21/2020] [Indexed: 11/20/2022] Open
Abstract
Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency can be maintained via a hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past. Coalescent theory has provided deep theoretical insight into patterns of human diversity. Implementations of coalescent models in simulation software such as ms have further provided tools to interpret thousands of genomic studies. Recent technical progress has allowed for a dramatic increase in the scale at which genomes can be both measured and simulated, opening up opportunities for a finer understanding of evolutionary biology. However, we show that coalescent simulations of long regions of the genome exhibit large biases in sample relatedness, distorting haplotype sharing and ancestry patterns in simulated cohorts. We trace these biases to basic assumptions of the coalescent model, and show how the assumptions can be relaxed to provide a better description of the observed patterns of genetic polymorphism at a fraction of the computational cost.
Affiliation(s)
- Dominic Nelson
- McGill University and Genome Québec Innovation Centre, McGill University, Montréal, Québec, Canada
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Aaron P. Ragsdale
- McGill University and Genome Québec Innovation Centre, McGill University, Montréal, Québec, Canada
- Claudia Moreau
- Centre Intersectoriel en Santé Durable, Université du Québec à Chicoutimi, Saguenay, Québec, Canada
- Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Simon Gravel
- McGill University and Genome Québec Innovation Centre, McGill University, Montréal, Québec, Canada
11
The Effect of Consanguinity on Between-Individual Identity-by-Descent Sharing. Genetics 2019; 212:305-316. [PMID: 30926583 PMCID: PMC6499533 DOI: 10.1534/genetics.119.302136] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/25/2018] [Accepted: 03/22/2019] [Indexed: 11/18/2022] Open
Abstract
Consanguineous unions increase the rate at which identical genomic segments are paired within individuals to produce runs of homozygosity (ROH). The extent to which such unions affect identity-by-descent (IBD) genomic sharing between rather than within individuals in a population, however, is not immediately evident from within-individual ROH levels. Using the fact that the time to the most recent common ancestor [Formula: see text] for a pair of genomes at a specific locus is inversely related to the extent of IBD sharing between the genomes in the neighborhood of the locus, we study IBD sharing for a pair of genomes sampled either within the same individual or in different individuals. We develop a coalescent model for a set of mating pairs in a diploid population, treating the fraction of consanguineous unions as a parameter. Considering mating models that include unions between sibs, first cousins, and nth cousins, we determine the effect of the consanguinity rate on the mean [Formula: see text] for pairs of lineages sampled either within the same individual or in different individuals. The results indicate that consanguinity not only increases ROH sharing between the two genomes within an individual, it also increases IBD sharing between individuals in the population, the magnitude of the effect increasing with the kinship coefficient of the type of consanguineous union. Considering computations of ROH and between-individual IBD in Jewish populations whose consanguinity rates have been estimated from demographic data, we find that, in accord with the theoretical results, increases in consanguinity and ROH levels inflate levels of IBD sharing between individuals in a population. The results contribute more generally to the interpretation of runs of homozygosity, IBD sharing between individuals, and the relationship between ROH and IBD.
12
Tavaré S, Buzbas EO. Introduction to the Paul Joyce special issue. Theor Popul Biol 2018; 122:1-2. [PMID: 30025565 DOI: 10.1016/j.tpb.2018.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Affiliation(s)
- Simon Tavaré
- DAMTP, University of Cambridge, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, UK
- Erkan Ozge Buzbas
- Department of Statistical Science, University of Idaho, 875 Perimeter Drive, MS 1104 Moscow, ID 83844, United States
13
Shchur V, Nielsen R. On the number of siblings and p-th cousins in a large population sample. J Math Biol 2018; 77:1279-1298. [PMID: 29876645 DOI: 10.1007/s00285-018-1252-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/08/2017] [Revised: 05/03/2018] [Indexed: 11/28/2022]
Abstract
The number of individuals in a random sample with close relatives in the sample is a quantity of interest when designing Genome Wide Association Studies and other cohort based genetic, and non-genetic, studies. In this paper, we develop expressions for the distribution and expectation of the number of p-th cousins in a sample from a population of size N under two diploid Wright-Fisher models. We also develop simple asymptotic expressions for large values of N. For example, the expected proportion of individuals with at least one p-th cousin in a sample of K individuals, for a diploid dioecious Wright-Fisher model, is approximately [Formula: see text]. Our results show that a substantial fraction of individuals in the sample will have at least a second cousin if the sampling fraction (K / N) is on the order of [Formula: see text]. This confirms that, for large cohort samples, relatedness among individuals cannot easily be ignored.
Affiliation(s)
- Vladimir Shchur
- Departments of Integrative Biology and Statistics, University of California Berkeley, 4098 Valley Life Sciences Building (VLSB), Berkeley, CA, 94720-3140, USA
- Rasmus Nielsen
- Departments of Integrative Biology and Statistics, University of California Berkeley, 4098 Valley Life Sciences Building (VLSB), Berkeley, CA, 94720-3140, USA
- Museum of Natural History, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark
14
Jennings WB. On the independent gene trees assumption in phylogenomic studies. Mol Ecol 2017; 26:4862-4871. [PMID: 28752599 DOI: 10.1111/mec.14274] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/01/2016] [Revised: 07/13/2017] [Accepted: 07/24/2017] [Indexed: 11/28/2022]
Abstract
Multilocus coalescent methods for inferring species trees or historical demographic parameters typically require the assumption that gene trees for sampled SNPs or DNA sequence loci are conditionally independent given their species tree. In practice, researchers have used different criteria to delimit "independent loci." One criterion identifies sampled loci as being independent of each other if they undergo Mendelian independent assortment (IA criterion). O'Neill et al. (2013, Molecular Ecology, 22, 111-129) used this approach in their phylogeographic study of North American tiger salamander species complex. In two other studies, researchers developed a pair of related methods that employ an independent genealogies criterion (IG criterion), which considers the effects of population-level recombination on correlations between the gene trees of intrachromosomal loci. Here, I explain these three methods, illustrate their use with example data, and evaluate their efficacies. I show that the IA approach is more conservative, is simpler to use and requires fewer assumptions than the IG approaches. However, IG approaches can identify much larger numbers of independent loci than the IA method, which, in turn, allows researchers to obtain more precise and accurate estimates of species trees and historical demographic parameters. A disadvantage of the IG methods is that they require an estimate of the population recombination rate. Despite their drawbacks, IA and IG approaches provide molecular ecologists with promising a priori methods for selecting SNPs or DNA sequence loci that likely meet the independence assumption in coalescent-based phylogenomic studies.
Affiliation(s)
- W Bryan Jennings
- Departamento de Vertebrados, Museu Nacional, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
|
15
|
Xue J, Lencz T, Darvasi A, Pe’er I, Carmi S. The time and place of European admixture in Ashkenazi Jewish history. PLoS Genet 2017; 13:e1006644. [PMID: 28376121 PMCID: PMC5380316 DOI: 10.1371/journal.pgen.1006644] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 07/25/2016] [Accepted: 02/18/2017] [Indexed: 12/21/2022]
Abstract
The Ashkenazi Jewish (AJ) population is important in genetics due to its high rate of Mendelian disorders. AJ appeared in Europe in the 10th century, and their ancestry is thought to comprise European (EU) and Middle-Eastern (ME) components. However, both the time and place of admixture are subject to debate. Here, we attempt to characterize the AJ admixture history using a careful application of new and existing methods on a large AJ sample. Our main approach was based on local ancestry inference, in which we first classified each AJ genomic segment as EU or ME, and then compared allele frequencies along the EU segments to those of different EU populations. The contribution of each EU source was also estimated using GLOBETROTTER and haplotype sharing. The time of admixture was inferred based on multiple statistics, including ME segment lengths, the total EU ancestry per chromosome, and the correlation of ancestries along the chromosome. The major source of EU ancestry in AJ was found to be Southern Europe (≈60–80% of EU ancestry), with the rest being likely Eastern European. The inferred admixture time was ≈30 generations ago, but multiple lines of evidence suggest that it represents an average over two or more events, pre- and post-dating the founder event experienced by AJ in late medieval times. The time of the pre-bottleneck admixture event, which was likely Southern European, was estimated to ≈25–50 generations ago.
The Ashkenazi Jewish population has resided in Europe for much of its 1000-year existence. However, its ethnic and geographic origins are controversial, due to the scarcity of reliable historical records. Previous genetic studies have found links to Middle-Eastern and European ancestries, but the admixture history has not been studied in detail yet, partly due to technical difficulties in disentangling signals from multiple admixture events. Here, we present an in-depth analysis of the sources of European gene flow and the time of admixture events by using multiple new and existing methods and extensive simulations. Our results suggest a model of at least two events of European admixture. One event slightly pre-dated a late medieval founder event and was likely from a Southern European source. Another event post-dated the founder event and likely occurred in Eastern Europe. These results, as well as the methods introduced, will be highly valuable for geneticists and other researchers interested in Ashkenazi Jewish origins.
Affiliation(s)
- James Xue
- Department of Computer Science, Columbia University, New York, New York, United States of America
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Todd Lencz
- Center for Psychiatric Neuroscience, The Feinstein Institute for Medical Research, North Shore-Long Island Jewish Health System, Manhasset, New York, United States of America
- Department of Psychiatry, Division of Research, The Zucker Hillside Hospital Division of the North Shore–Long Island Jewish Health System, Glen Oaks, New York, United States of America
- Departments of Psychiatry and Molecular Medicine, Hofstra Northwell School of Medicine, Hempstead, New York, United States of America
- Ariel Darvasi
- Department of Genetics, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- Itsik Pe’er
- Department of Computer Science, Columbia University, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
|