1
Setter D, Ebdon S, Jackson B, Lohse K. Estimating the rates of crossover and gene conversion from individual genomes. Genetics 2022; 222:6623412. [PMID: 35771626 PMCID: PMC9434185 DOI: 10.1093/genetics/iyac100] [Received: 05/11/2022] [Accepted: 06/01/2022]
Abstract
Recombination can occur either as a result of crossover or gene conversion events. Population genetic methods for inferring the rate of recombination from patterns of linkage disequilibrium generally assume a simple model of recombination that only involves crossover events and ignore gene conversion. However, distinguishing the 2 processes is not only necessary for a complete description of recombination, but also essential for understanding the evolutionary consequences of inversions and other genomic partitions in which crossover (but not gene conversion) is reduced. We present heRho, a simple composite likelihood scheme for coestimating the rate of crossover and gene conversion from individual diploid genomes. The method is based on analytic results for the distance-dependent probability of heterozygous and homozygous states at 2 loci. We apply heRho to simulations and data from the house mouse Mus musculus castaneus, a well-studied model. Our analyses show (1) that the rates of crossover and gene conversion can be accurately coestimated at the level of individual chromosomes and (2) that previous estimates of the population scaled rate of recombination ρ = 4N_e r under a pure crossover model are likely biased.
Affiliation(s)
- Derek Setter
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, EH9 3FL, UK
- Sam Ebdon
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, EH9 3FL, UK
- Ben Jackson
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, EH9 3FL, UK
- Konrad Lohse
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, EH9 3FL, UK
2
Ragsdale AP, Gravel S. Unbiased Estimation of Linkage Disequilibrium from Unphased Data. Mol Biol Evol 2020; 37:923-932. [PMID: 31697386 DOI: 10.1093/molbev/msz265]
Abstract
Linkage disequilibrium (LD) is used to infer evolutionary history, to identify genomic regions under selection, and to dissect the relationship between genotype and phenotype. In each case, we require accurate estimates of LD statistics from sequencing data. Unphased data present a challenge because multilocus haplotypes cannot be inferred exactly. Widely used estimators for the common statistics r² and D² exhibit large and variable upward biases that complicate interpretation and comparison across cohorts. Here, we show how to find unbiased estimators for a wide range of two-locus statistics, including D², for both single and multiple randomly mating populations. These unbiased statistics are particularly well suited to estimate effective population sizes from unlinked loci in small populations. We develop a simple inference pipeline and use it to refine estimates of recent effective population sizes of the threatened Channel Island fox populations.
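The abstract deals with unphased genotypes, where the estimator is more involved. The underlying moment idea is easier to see in the simpler phased case: if n haplotypes are a multinomial sample, each product of haplotype frequencies in D² = (p_AB p_ab - p_Ab p_aB)² can be estimated without bias by a ratio of falling factorials of the counts. The Python sketch below assumes phased haplotype counts; it is not the authors' unphased estimator, and the function names are hypothetical.

```python
def falling(n: int, k: int) -> int:
    """Falling factorial n * (n - 1) * ... * (n - k + 1)."""
    out = 1
    for i in range(k):
        out *= n - i
    return out


def d2_naive(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Plug-in estimate: D from sample haplotype frequencies, then squared."""
    n = n_AB + n_Ab + n_aB + n_ab
    d = (n_AB * n_ab - n_Ab * n_aB) / (n * n)  # D = p_AB*p_ab - p_Ab*p_aB
    return d * d


def d2_unbiased(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Unbiased estimate of D^2 from phased haplotype counts (assumed multinomial).

    Each monomial of D^2 = p_AB^2 p_ab^2 + p_Ab^2 p_aB^2 - 2 p_AB p_Ab p_aB p_ab
    is replaced by falling factorials of counts divided by n(n-1)(n-2)(n-3).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n < 4:
        raise ValueError("need at least 4 haplotypes")
    num = (falling(n_AB, 2) * falling(n_ab, 2)
           + falling(n_Ab, 2) * falling(n_aB, 2)
           - 2 * n_AB * n_Ab * n_aB * n_ab)
    return num / falling(n, 4)


# Usage: 10 haplotypes in perfect LD (only AB and ab observed).
print(d2_naive(5, 0, 0, 5))  # 0.0625
print(d2_unbiased(5, 0, 0, 5))
```

Only the expectation of the unbiased estimate equals D²; for an individual sample it can be negative (for example, one copy of each haplotype).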
Affiliation(s)
- Aaron P Ragsdale
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Simon Gravel
- Department of Human Genetics, McGill University, Montreal, QC, Canada
3
Golding GB, Strobeck C. Increased number of alleles found in hybrid populations due to intragenic recombination. Evolution 1983; 37:17-29. [PMID: 28568015 DOI: 10.1111/j.1558-5646.1983.tb05510.x] [Received: 07/15/1981] [Revised: 03/31/1982]
Affiliation(s)
- G B Golding
- Department of Genetics, University of Alberta, Edmonton, Alberta, T6G 2E9
- C Strobeck
- Department of Genetics, University of Alberta, Edmonton, Alberta, T6G 2E9
4
Abstract
Although the analysis of linkage disequilibrium (LD) plays a central role in many areas of population genetics, the sampling variance of LD is known to be very large with high sensitivity to numbers of nucleotide sites and individuals sampled. Here we show that a genome-wide analysis of the distribution of heterozygous sites within a single diploid genome can yield highly informative patterns of LD as a function of physical distance. The proposed statistic, the correlation of zygosity, is closely related to the conventional population-level measure of LD, but is agnostic with respect to allele frequencies and hence likely less prone to outlier artifacts. Application of the method to several vertebrate species leads to the conclusion that >80% of recombination events are typically resolved by gene-conversion-like processes unaccompanied by crossovers, with average conversion-patch lengths on the order of one to several kilobases. Thus, contrary to common assumptions, the recombination rate between sites does not scale linearly with distance, often even up to distances of 100 kb. In addition, the amount of LD between sites separated by <200 bp is uniformly much greater than can be explained by the conventional neutral model, possibly because of the nonindependent origin of mutations within this spatial scale. These results raise questions about the application of conventional population-genetic interpretations to LD on short spatial scales and also about the use of spatial patterns of LD to infer demographic histories.
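The claim that recombination between two sites does not scale linearly with distance can be made concrete with a commonly used approximation, stated here as an assumption rather than taken from the abstract: crossovers occur at rate r per bp, gene conversion tracts initiate at rate g per bp, and tract lengths are roughly exponential with mean L bp. Two sites d bp apart are then separated at rate r·d + 2gL(1 - e^(-d/L)), so the conversion term grows like 2g·d while d is much smaller than L but saturates at 2gL once d is much larger than L. The parameter values in the usage lines are hypothetical.

```python
import math


def separation_rate(d: float, r: float, g: float, L: float) -> float:
    """Per-generation rate at which two sites d bp apart are separated.

    Assumed model: crossovers at rate r per bp, gene-conversion tracts
    initiated at rate g per bp with exponentially distributed lengths of
    mean L bp. Valid when rates are small (multiple events ignored).
    """
    crossover = r * d
    conversion = 2.0 * g * L * (1.0 - math.exp(-d / L))
    return crossover + conversion


# Hypothetical values: conversion initiations outnumber crossovers 4:1,
# with a mean tract length of 1 kb.
r, g, L = 1e-8, 4e-8, 1000.0
for d in (10, 100, 1_000, 10_000, 100_000):
    rate = separation_rate(d, r, g, L)
    print(f"d={d:>7} bp  rate={rate:.3e}  rate per bp={rate / d:.3e}")
```

Under these assumptions the per-bp rate falls from about r + 2g at short range toward r at long range, so a single linear fit cannot describe both regimes. Multiplying by 4N_e gives the population-scaled analogue.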
5
Abstract
The "LD curve" relates the linkage disequilibrium (LD) between pairs of nucleotide sites to the distance that separates them along the chromosome. The shape of this curve reflects natural selection, admixture between populations, and the history of population size. This article derives new results about the last of these effects. When a population expands in size, the LD curve grows steeper, and this effect is especially pronounced following a bottleneck in population size. When a population shrinks, the LD curve rises but remains relatively flat. As LD converges toward a new equilibrium, its time path may not be monotonic. Following an episode of growth, for example, it declines to a low value before rising toward the new equilibrium. These changes happen at different rates for different LD statistics. They are especially slow for estimates of [Formula: see text], which therefore allow inferences about ancient population history. For the human population of Europe, these results suggest a history of population growth.
6
Rasmussen MD, Hubisz MJ, Gronau I, Siepel A. Genome-wide inference of ancestral recombination graphs. PLoS Genet 2014; 10:e1004342. [PMID: 24831947 PMCID: PMC4022496 DOI: 10.1371/journal.pgen.1004342] [Received: 12/03/2013] [Accepted: 03/17/2014]
Abstract
The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n + 1 chromosomes conditional on an ARG of n chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.
Affiliation(s)
- Matthew D. Rasmussen
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Melissa J. Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambs, United Kingdom
7
Romero PA, Arnold FH. Random field model reveals structure of the protein recombinational landscape. PLoS Comput Biol 2012; 8:e1002713. [PMID: 23055915 PMCID: PMC3464211 DOI: 10.1371/journal.pcbi.1002713] [Received: 06/05/2012] [Accepted: 08/03/2012]
Abstract
We are interested in how intragenic recombination contributes to the evolution of proteins and how this mechanism complements and enhances the diversity generated by random mutation. Experiments have revealed that proteins are highly tolerant to recombination with homologous sequences (mutation by recombination is conservative); more surprisingly, they have also shown that homologous sequence fragments make largely additive contributions to biophysical properties such as stability. Here, we develop a random field model to describe the statistical features of the subset of protein space accessible by recombination, which we refer to as the recombinational landscape. This model shows quantitative agreement with experimental results compiled from eight libraries of proteins that were generated by recombining gene fragments from homologous proteins. The model reveals a recombinational landscape that is highly enriched in functional sequences, with properties dominated by a large-scale additive structure. It also quantifies the relative contributions of parent sequence identity, crossover locations, and protein fold to the tolerance of proteins to recombination. Intragenic recombination explores a unique subset of sequence space that promotes rapid molecular diversification and functional adaptation.
Mutation and recombination are the primary sources of genetic variation in evolving populations. The relative benefit of these two diversification mechanisms and how they complement each other has been a long-standing question in evolutionary biology. While it is clear what types of genetic diversity these two mechanisms can create, a significant challenge is relating these sequence changes to changes in fitness. The fitness landscape, which describes this mapping from genotype to phenotype, is extraordinarily complex and defined over an incomprehensibly large space of sequences. Here, we develop a model of the landscape that relies not on the details of this mapping, but rather on the statistical relationships between sequences. By studying the expected values of landscape properties, we can gain insights into the structure of the landscape that are independent of the details of how genotype dictates phenotype. We use this random field model to understand how recombination explores a functionally enriched and diverse subset of protein sequence space.
Affiliation(s)
- Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
8
Georgiades K, Raoult D. How microbiology helps define the rhizome of life. Front Cell Infect Microbiol 2012; 2:60. [PMID: 22919651 PMCID: PMC3417629 DOI: 10.3389/fcimb.2012.00060] [Received: 01/18/2012] [Accepted: 04/16/2012]
Abstract
In contrast to the tree of life (TOL) theory, species are mosaics of gene sequences with different origins. Observations of the extensive lateral sequence transfers in all organisms have demonstrated that the genomes of all life forms are collections of genes with different evolutionary histories that cannot be represented by a single TOL. Moreover, genes themselves commonly have several origins due to recombination. The human genome is not free from recombination events, so it is a mosaic like other organisms' genomes. Recent studies have demonstrated evidence for the integration of parasitic DNA into the human genome. Lateral transfer events have been accepted as major contributors to genome evolution in free-living bacteria. Furthermore, the accumulation of genomic sequence data provides evidence for extended genetic exchanges in intracellular bacteria and suggests that such events constitute an agent that promotes and maintains all bacterial species. Archaea and viruses also form chimeras containing primarily bacterial but also eukaryotic sequences. In addition to lateral transfers, orphan genes are indicative of the fact that gene creation is a permanent and unsettled phenomenon. Currently, a rhizome may more adequately represent the multiplicity and de novo creation of a genome. We wanted to confirm that the term “rhizome” in evolutionary biology applies to the entire cellular life history. This view of evolution should resemble a clump of roots representing the multiple origins of the repertoires of the genes of each species.
Affiliation(s)
- Kalliopi Georgiades
- Faculté de Médecine La Timone, Unité de Recherche en Maladies Infectieuses Tropical Emergentes (URMITE), CNRS-IRD UMR 6236-198, Université de la Méditerranée Marseille, France
9
Haubold B, Pfaffelhuber P, Lynch M. mlRho - a program for estimating the population mutation and recombination rates from shotgun-sequenced diploid genomes. Mol Ecol 2010; 19 Suppl 1:277-84. [PMID: 20331786 DOI: 10.1111/j.1365-294x.2009.04482.x]
Abstract
Improvements in sequencing technology over the past 5 years are leading to routine application of shotgun sequencing in the fields of ecology and evolution. However, the theory to estimate evolutionary parameters from these data is still being worked out. Here we present an extension and implementation of part of this theory, mlRho. This program can efficiently compute the following three maximum likelihood estimators based on shotgun sequence data obtained from single diploid individuals: the population mutation rate (4N_e μ), the sequencing error rate, and the population recombination rate (4N_e c). We demonstrate the accuracy of mlRho by applying it to simulated data sets. In addition, we analyse the genomes of the sea squirt Ciona intestinalis and the water flea Daphnia pulex. Ciona intestinalis is an obligate outcrosser, while D. pulex is a cyclic parthenogen, and we discuss how these contrasting life histories are reflected in our parameter estimates. The program mlRho is freely available from http://guanine.evolbio.mpg.de/mlRho.
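mlRho maximizes a likelihood over its parameters jointly, and its error model is not described here. As a much simpler illustration of the maximum-likelihood idea for θ = 4N_e μ alone, the sketch below assumes error-free genotypes, independent sites, and the standard two-sequence coalescent result that a site is heterozygous with probability θ/(1 + θ). Under those assumptions the estimator has a closed form: the binomial MLE of the heterozygosity, mapped back to θ. This is not the mlRho model, and the function names are hypothetical.

```python
import math


def theta_mle(n_het: int, n_sites: int) -> float:
    """ML estimate of theta = 4*Ne*mu from one diploid genome.

    Hypothetical simplified model: no sequencing error, independent sites,
    P(site heterozygous) = theta / (1 + theta).
    """
    if n_sites <= 0 or not 0 <= n_het < n_sites:
        raise ValueError("need n_sites > 0 and 0 <= n_het < n_sites")
    h = n_het / n_sites   # binomial MLE of the heterozygosity probability
    return h / (1.0 - h)  # invert h = theta / (1 + theta)


def log_likelihood(theta: float, n_het: int, n_sites: int) -> float:
    """Binomial log-likelihood of the heterozygous-site count (theta > 0)."""
    h = theta / (1.0 + theta)
    return n_het * math.log(h) + (n_sites - n_het) * math.log(1.0 - h)


# Usage: 1 heterozygous site among 101 sequenced sites.
est = theta_mle(1, 101)
print(round(est, 6))  # 0.01
```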
Affiliation(s)
- Bernhard Haubold
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, Plön, Germany.
10
Abstract
The gene conversion parameters which affect allele frequencies in populations are defined, and their ranges and typical values are given for several genera of fungi, where meiotic octads and tetrads provide the best information on conversion. Both gene conversion and disparity in direction of conversion are common. Data from Ascobolus immersus show that conversion properties are largely stable with time, but can be changed environmentally and by genetic conversion control factors. Equations are given for the interactions of selection, mutation and gene conversion in determining equilibrium frequencies. Numerical examples, using typical values of conversion parameters from the fungal data, show that for alleles which are selectively neutral or have very low selection coefficients, conversion will often have very large effects on their equilibrium frequencies and may lead to fixation. Where selection coefficients are higher, conversion has major effects on the frequencies of deleterious recessive alleles, but lesser effects on deleterious dominant alleles: a critical comparison is that of s with 2y. The available estimates for conversion parameters (at least in fungi) are of a magnitude to make gene conversion an important factor in evolution.
11
Abstract
The model of genetic hitchhiking predicts a reduction in sequence diversity at a neutral locus closely linked to a beneficial allele. In addition, it has been shown that the same process results in a specific pattern of correlations (linkage disequilibrium) between neutral polymorphisms along the chromosome at the time of fixation of the beneficial allele. During the hitchhiking event, linkage disequilibrium on either side of the beneficial allele is built up whereas it is destroyed across the selected site. We derive explicit formulas for the expectation of the covariance measure D and standardized linkage disequilibrium σ_D² between a pair of polymorphic sites. For our analysis we use the approximation of a star-like genealogy at the selected site. The resulting expressions are approximately correct in the limit of large selection coefficients. Using simulations we show that the resulting pattern of linkage disequilibrium is quickly destroyed (i.e., in <0.1N generations) after the fixation of the beneficial allele for moderately distant neutral loci, where N is the diploid population size.
12
Jones D, Wakeley J. Recombination, gene conversion, and identity-by-descent at three loci. Theor Popul Biol 2008; 73:264-76. [DOI: 10.1016/j.tpb.2007.10.006] [Received: 07/10/2007] [Revised: 10/24/2007] [Accepted: 10/25/2007]
13
Linkage disequilibrium under skewed offspring distribution among individuals in a population. Genetics 2008; 178:1517-32. [PMID: 18245371 DOI: 10.1534/genetics.107.075200]
Abstract
Correlations in coalescence times between two loci are derived under selectively neutral population models in which the offspring of an individual can number on the order of the population size. The correlations depend on the rates of recombination and random drift and are shown to be functions of the parameters controlling the size and frequency of these large reproduction events. Since a prediction of linkage disequilibrium can be written in terms of correlations in coalescence times, it follows that the prediction of linkage disequilibrium is a function not only of the rate of recombination but also of the reproduction parameters. Low linkage disequilibrium is predicted if the offspring of a single individual frequently replace almost the entire population. However, high linkage disequilibrium can be predicted if the offspring of a single individual replace an intermediate fraction of the population. In some cases the model reproduces the standard Wright-Fisher predictions. Contrary to common intuition, high linkage disequilibrium can be predicted despite frequent recombination, and low linkage disequilibrium under infrequent recombination. Simulations support the analytical results but show that the variance of linkage disequilibrium is very large.
14
Woodruff DS. Genetic anomalies associated with Cerion hybrid zones: the origin and maintenance of new electromorphic variants called hybrizymes. Biol J Linn Soc Lond 1989. [DOI: 10.1111/j.1095-8312.1989.tb00495.x]
15
Cai X, Xu SS. Meiosis-driven genome variation in plants. Curr Genomics 2007; 8:151-61. [PMID: 18645601 PMCID: PMC2435351 DOI: 10.2174/138920207780833847] [Received: 02/02/2007] [Revised: 02/26/2007] [Accepted: 03/06/2007]
Abstract
Meiosis includes two successive divisions of the nucleus with one round of DNA replication and leads to the formation of gametes with half of the chromosomes of the mother cell during sexual reproduction. It provides a cytological basis for gametogenesis and inheritance in eukaryotes. Meiotic cell division is a complex and dynamic process that involves a number of molecular and cellular events, such as DNA and chromosome replication, chromosome pairing, synapsis and recombination, chromosome segregation, and cytokinesis. Meiosis maintains genome stability and integrity over sexual life cycles. On the other hand, meiosis generates genome variations in several ways. Variant meiotic recombination resulting from specific genome structures induces deletions, duplications, and other rearrangements within the genic and non-genic genomic regions and has been considered a major driving force for gene and genome evolution in nature. Meiotic abnormalities in chromosome segregation lead to chromosomally imbalanced gametes and aneuploidy. Meiotic restitution due to failure of the first or second meiotic division gives rise to unreduced gametes, which triggers polyploidization and genome expansion. This paper reviews research regarding meiosis-driven genome variation, including deletion and duplication of genomic regions, aneuploidy, and polyploidization, and discusses the effect of related meiotic events on genome variation and evolution in plants. Knowledge of various meiosis-driven genome variations provides insight into genome evolution and genetic variability in plants and facilitates plant genome research.
Affiliation(s)
- Xiwen Cai
- Department of Plant Sciences, North Dakota State University
- Steven S Xu
- USDA-ARS, Northern Crop Science Laboratory, Fargo, ND 58105, USA
16
Abstract
The fixation of advantageous mutations by natural selection has a profound impact on patterns of linked neutral variation. While it has long been appreciated that such selective sweeps influence the frequency spectrum of nearby polymorphism, it has only recently become clear that they also have dramatic effects on local linkage disequilibrium. By extending previous results on the relationship between genealogical structure and linkage disequilibrium, I obtain simple expressions for the influence of a selective sweep on patterns of allelic association. I show that sweeps can increase, decrease, or even eliminate linkage disequilibrium (LD) entirely depending on the relative position of the selected and neutral loci. I also show the importance of the age of the neutral mutations in predicting their degree of association and describe the consequences of such results for the interpretation of empirical data. In particular, I demonstrate that while selective sweeps can eliminate LD, they generate patterns of genetic variation very different from those expected from recombination hotspots.
Affiliation(s)
- Gil McVean
- Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom.
17
Webster MT, Clegg JB, Harding RM. Common 5' β-globin RFLP haplotypes harbour a surprising level of ancestral sequence mosaicism. Hum Genet 2003; 113:123-39. [PMID: 12736816 DOI: 10.1007/s00439-003-0954-0] [Received: 12/10/2002] [Accepted: 03/20/2003]
Abstract
Blocks of linkage disequilibrium (LD) in the human genome represent segments of ancestral chromosomes. To investigate the relationship between LD and genealogy, we analysed diversity associated with restriction fragment length polymorphism (RFLP) haplotypes of the 5' β-globin gene complex. Genealogical analyses were based on sequence alleles that spanned a 12.2-kb interval, covering 3.1 kb around the ψβ gene and 6.2 kb of the δ-globin gene and its 5' flanking sequence known as the R/T region. Diversity was sampled from a Kenyan Luo population where recent malarial selection has contributed to substantial LD. A single common sequence allele spanning the 12.2-kb interval exclusively identified the ancestral chromosome bearing the "Bantu" βS (sickle-cell) RFLP haplotype. Other common 5' RFLP haplotypes comprised interspersed segments from multiple ancestral chromosomes. Nucleotide diversity was similar between ψβ and R/T-δ-globin but was non-uniformly distributed within the R/T-δ-globin region. High diversity associated with the 5' R/T identified two ancestral lineages that probably date back more than 2 million years. Within this genealogy, variation has been introduced into the 3' R/T by gene conversion from other ancestral chromosomes. Diversity in δ-globin was found to lead through parts of the main genealogy but to coalesce in a more recent ancestor. The well-known recombination hotspot is clearly restricted to the region 3' of δ-globin. Our analyses show that, whereas one common haplotype in a block of high LD represents a long segment from a single ancestral chromosome, others are mosaics of short segments from multiple ancestors related in genealogies of unsuspected complexity.
Affiliation(s)
- Matthew T Webster
- MRC Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, University of Oxford, Headington, Oxford, OX3 9DS, UK
18
Wakeley J, Lessard S. Theory of the effects of population structure and sampling on patterns of linkage disequilibrium applied to genomic data from humans. Genetics 2003; 164:1043-53. [PMID: 12871914 PMCID: PMC1462626 DOI: 10.1093/genetics/164.3.1043]
Abstract
We develop predictions for the correlation of heterozygosity and for linkage disequilibrium between two loci using a simple model of population structure that includes migration among local populations, or demes. We compare the results for a sample of size two from the same deme (a single-deme sample) to those for a sample of size two from two different demes (a scattered sample). The correlation in heterozygosity for a scattered sample is surprisingly insensitive to both the migration rate and the number of demes. In contrast, the correlation in heterozygosity for a single-deme sample is sensitive to both, and the effect of an increase in the number of demes is qualitatively similar to that of a decrease in the migration rate: both increase the correlation in heterozygosity. These same conclusions hold for a commonly used measure of linkage disequilibrium (r²). We compare the predictions of the theory to genomic data from humans and show that subdivision might account for a substantial portion of the genetic associations observed within the human genome, even though migration rates among local populations of humans are relatively large. Because correlations due to subdivision rather than to physical linkage can be large even in a single-deme sample, the common practice of disease mapping using linkage disequilibrium in "isolated" local populations may be subject to error if long-term migration has been important in shaping patterns of human polymorphism.
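For reference, r² here is the squared correlation between alleles at two biallelic loci: with haplotype frequency p_AB and allele frequencies p_A and p_B, D = p_AB - p_A p_B and r² = D² / [p_A(1 - p_A) p_B(1 - p_B)]. The sketch below computes it from known frequencies; it does not implement the authors' predictions under population structure, and the function name is hypothetical.

```python
def r_squared(p_AB: float, p_A: float, p_B: float) -> float:
    """Squared allelic correlation r^2 between two biallelic loci."""
    d = p_AB - p_A * p_B  # coefficient of linkage disequilibrium, D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


print(r_squared(0.5, 0.5, 0.5))  # complete LD: 1.0
print(r_squared(0.2, 0.5, 0.4))  # linkage equilibrium: 0.0
```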
Affiliation(s)
- John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
19
Abstract
For finite populations, differences in individual histories can cause between-locus allelic dependencies even for unlinked loci. The main motivation for this study is to quantify the effect of such dependencies on genotypic match probabilities. We compare the two-locus match probability, the probability that two individuals (four gametes) chosen at random will have the same genotype at both loci, with the probability computed as the product of the one-locus match probabilities. It is demonstrated that the product-rule probability always underestimates the two-locus match probability. For highly mutable minisatellite loci, these probabilities can differ by an order of magnitude or more. A simplified three-locus problem is explored, providing evidence that the degree of underestimation worsens for more loci.
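The product-rule baseline that the abstract compares against can be sketched as follows, assuming Hardy-Weinberg genotype frequencies computed from known allele frequencies: the one-locus match probability is the sum of squared genotype frequencies, and the product rule multiplies these across loci. The abstract's point is that this product underestimates the true two-locus probability in finite populations; the sketch computes only the baseline, and the function names are hypothetical.

```python
from math import prod


def one_locus_match(freqs: list[float]) -> float:
    """P(two random individuals share a genotype) at one locus, assuming HWE."""
    total = 0.0
    for i, p_i in enumerate(freqs):
        for j in range(i, len(freqs)):
            # HWE frequency of genotype {i, j}: p_i^2 if homozygous, else 2 p_i p_j
            g = p_i * p_i if i == j else 2.0 * p_i * freqs[j]
            total += g * g
    return total


def product_rule(*loci: list[float]) -> float:
    """Multi-locus match probability assuming independence across loci."""
    return prod(one_locus_match(freqs) for freqs in loci)


print(one_locus_match([0.5, 0.5]))           # 0.375
print(product_rule([0.5, 0.5], [0.5, 0.5]))  # 0.140625
```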
Affiliation(s)
- Cecelia Laurie
- Department of Mathematics, University of Alabama, Tuscaloosa, AL 35487-0350, USA.
20
Abstract
The degree of association between alleles at different loci, or linkage disequilibrium, is widely used to infer details of evolutionary processes. Here I explore how associations between alleles relate to properties of the underlying genealogy of sequences. Under the neutral, infinite-sites assumption I show that there is a direct correspondence between the covariance in coalescence times at different parts of the genome and the degree of linkage disequilibrium. These covariances can be calculated exactly under the standard neutral model and by Monte Carlo simulation under different demographic models. I show that the effects of population growth, population bottlenecks, and population structure on linkage disequilibrium can be described through their effects on the covariance in coalescence times.
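As a concrete instance of the covariance discussed above, a classical result for a sample of two sequences under the standard neutral model gives the correlation of coalescence times at two loci as (ρ + 18)/(ρ² + 13ρ + 18), where ρ = 4N_e r is the scaled recombination rate between the loci. It is quoted here from the wider coalescent literature rather than from this abstract; the sketch simply evaluates it, and the function name is hypothetical.

```python
def coalescence_time_corr(rho: float) -> float:
    """Correlation of pairwise coalescence times at two loci (standard neutral model).

    rho = 4 * Ne * r is the population-scaled recombination rate between the loci.
    """
    if rho < 0:
        raise ValueError("rho must be non-negative")
    return (rho + 18.0) / (rho * rho + 13.0 * rho + 18.0)


print(coalescence_time_corr(0.0))  # 1.0 (complete linkage)
print(coalescence_time_corr(1.0))  # 0.59375
for rho in (10.0, 100.0, 1000.0):
    print(rho, coalescence_time_corr(rho))
```

For large ρ the correlation decays roughly as 1/ρ, which mirrors the decay of LD with recombination distance.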
21
Reich DE, Schaffner SF, Daly MJ, McVean G, Mullikin JC, Higgins JM, Richter DJ, Lander ES, Altshuler D. Human genome sequence variation and the influence of gene history, mutation and recombination. Nat Genet 2002; 32:135-42. [PMID: 12161752 DOI: 10.1038/ng947]
Abstract
Variation in the human genome sequence is key to understanding susceptibility to disease in modern populations and the history of ancestral populations. Unlocking this information requires knowledge of the patterns and underlying causes of human sequence diversity. By applying a new population-genetic framework to two genome-wide polymorphism surveys, we find that the human genome contains sizeable regions (stretching over tens of thousands of base pairs) that have intrinsically high and low rates of sequence variation. We show that the primary determinant of these patterns is shared genealogical history. Only a fraction of the variation (at most 25%) is due to the local mutation rate. By measuring the average distance over which genealogical histories are typically preserved, these data provide the first genome-wide estimate of the average extent of correlation among variants (linkage disequilibrium). The results are best explained by extreme variability in the recombination rate at a fine scale, and provide the first empirical evidence that such recombination 'hot spots' are a general feature of the human genome and have a principal role in shaping genetic variation in the human population.
Affiliation(s)
- David E Reich
- Whitehead Institute/MIT Center for Genome Research, One Kendall Square, Cambridge, Massachusetts 02139, USA
22
Affiliation(s)
- PETER E. SMOUSE
- Center for Theoretical and Applied Genetics, and Department of Ecology, Evolution and Natural Resources, Rutgers University, New Brunswick, NJ 08903–0231, USA
23
Burdzy K, Holyst R, Ingerman D, March P. Configurational transition in a Fleming-Viot-type model and probabilistic interpretation of Laplacian eigenfunctions. ACTA ACUST UNITED AC 1999. [DOI: 10.1088/0305-4470/29/11/004]
24
Harding RM, Fullerton SM, Griffiths RC, Bond J, Cox MJ, Schneider JA, Moulin DS, Clegg JB. Archaic African and Asian lineages in the genetic ancestry of modern humans. Am J Hum Genet 1997; 60:772-89. [PMID: 9106523 PMCID: PMC1712470]
Abstract
A 3-kb region encompassing the beta-globin gene has been analyzed for allelic sequence polymorphism in nine populations from Africa, Asia, and Europe. A unique gene tree was constructed from 326 sequences of 349 in the total sample. New maximum-likelihood methods for analyzing gene trees on the basis of coalescence theory have been used. The most recent common ancestor of the beta-globin gene tree is a sequence found only in Africa and estimated to have arisen approximately 800,000 years ago. There is no evidence for an exponential expansion out of a bottlenecked founding population, and an effective population size of approximately 10,000 has been maintained. Modest differences in levels of beta-globin diversity between Africa and Asia are better explained by greater African effective population size than by greater time depth. There may have been a reduction of Asian effective population size in recent evolutionary history. Characteristically Asian ancestry is estimated to be older than 200,000 years, suggesting that the ancestral hominid population at this time was widely dispersed across Africa and Asia. Patterns of beta-globin diversity suggest extensive worldwide late Pleistocene gene flow and are not easily reconciled with a unidirectional migration out of Africa 100,000 years ago and total replacement of archaic populations in Asia.
Affiliation(s)
- R M Harding
- MRC Molecular Haematology Unit, Institute of Molecular Medicine, John Radcliffe Hospital, Oxford, United Kingdom.
25
Martinson JJ, Excoffier L, Swinburn C, Boyce AJ, Harding RM, Langaney A, Clegg JB. High diversity of alpha-globin haplotypes in a Senegalese population, including many previously unreported variants. Am J Hum Genet 1995; 57:1186-98. [PMID: 7485171 PMCID: PMC1801359]
Abstract
RFLP haplotypes at the alpha-globin gene complex have been examined in 190 individuals from the Niokolo Mandenka population of Senegal: haplotypes were assigned unambiguously for 210 chromosomes. The Mandenka share with other African populations a sample size-independent haplotype diversity that is much greater than that in any non-African population: the number of haplotypes observed in the Mandenka is typically twice that seen in the non-African populations sampled to date. Of these haplotypes, 17.3% had not been observed in any previous surveys, and a further 19.1% have previously been reported only in African populations. The haplotype distribution shows clear differences between African and non-African peoples, but this is on the basis of population-specific haplotypes combined with haplotypes common to all. The relationship of the newly reported haplotypes to those previously recorded suggests that several mutation processes, particularly recombination as homologous exchange or gene conversion, have been involved in their production. A computer program based on the expectation-maximization (EM) algorithm was used to obtain maximum-likelihood estimates of haplotype frequencies for the entire data set: good concordance between the unambiguous and EM-derived sets was seen for the overall haplotype frequencies. Some of the low-frequency haplotypes reported by the estimation algorithm differ greatly, in structure, from those haplotypes known to be present in human populations, and they may not represent haplotypes actually present in the sample.
Affiliation(s)
- J J Martinson
- MRC Molecular Haematology Unit, Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford, United Kingdom
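The EM approach mentioned above treats unobserved phase as missing data. A minimal sketch for the simplest case, two biallelic loci, assuming Hardy-Weinberg proportions; it illustrates the general technique and is not the program the authors used, which handled multi-site RFLP haplotypes (all names here are illustrative):

```python
from collections import Counter

# Two biallelic loci; alleles coded 0/1, haplotype = (allele at locus 1, allele at locus 2).
HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def em_haplotype_frequencies(genotypes, max_iter=200, tol=1e-10):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    Each genotype is (g1, g2), where gi in {0, 1, 2} counts copies of allele 1
    at locus i. Only double heterozygotes (1, 1) have ambiguous phase.
    Assumes Hardy-Weinberg proportions.
    """
    known = Counter()
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        # At most one locus is heterozygous, so the two haplotypes are determined.
        locus1 = [1, 0] if g1 == 1 else [g1 // 2] * 2
        locus2 = [1, 0] if g2 == 1 else [g2 // 2] * 2
        known[(locus1[0], locus2[0])] += 1
        known[(locus1[1], locus2[1])] += 1

    n_chrom = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting values
    for _ in range(max_iter):
        # E-step: posterior probability that a double heterozygote is 11/00 rather than 10/01.
        cis = freqs[(1, 1)] * freqs[(0, 0)]
        trans = freqs[(1, 0)] * freqs[(0, 1)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = Counter(known)
        expected[(1, 1)] += n_double_het * w
        expected[(0, 0)] += n_double_het * w
        expected[(1, 0)] += n_double_het * (1 - w)
        expected[(0, 1)] += n_double_het * (1 - w)
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new = {h: expected[h] / n_chrom for h in HAPLOTYPES}
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs


# Double heterozygotes end up assigned almost entirely to the 11/00 phase here.
sample = [(2, 2)] * 5 + [(0, 0)] * 5 + [(1, 1)] * 2
print({h: round(f, 3) for h, f in em_haplotype_frequencies(sample).items()})
```

As the abstract cautions, EM output can include low-frequency haplotypes that are not actually present in the sample, so estimates for rare classes deserve scrutiny.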
26
Fullerton SM, Harding RM, Boyce AJ, Clegg JB. Molecular and population genetic analysis of allelic sequence diversity at the human beta-globin locus. Proc Natl Acad Sci U S A 1994; 91:1805-9. [PMID: 7907422 PMCID: PMC43252 DOI: 10.1073/pnas.91.5.1805]
Abstract
Allelic sequence polymorphism at the beta-globin locus was investigated in a group of 36 Melanesians. A 3-kilobase fragment containing the gene and its flanking regions was sequenced in 60 normal (beta A) and 12 thalassemic (intron 1, position 5, G→C) chromosomes. Haplotype relationships between linked polymorphisms were derived by allele-specific PCR amplification and sequencing. Seventeen nucleotide polymorphisms and 2 length variants were identified, and these sites segregated as 17 sequence haplotypes in the normal chromosomes. This haplotype diversity is higher than that expected on the basis of the nucleotide polymorphism observed and is probably due to recombination and gene conversion. Nucleotide diversity at synonymous sites in the sample is 0.14%, suggesting an average age of sequence divergence of approximately 450,000 years, consistent with that expected for a neutrally evolving human nuclear locus.
Affiliation(s)
- S M Fullerton
- Medical Research Council Molecular Haematology Unit, University of Oxford, John Radcliffe Hospital, Headington, United Kingdom
27
Harding RM, Boyce AJ, Martinson JJ, Flint J, Clegg JB. A computer simulation study of VNTR population genetics: constrained recombination rules out the infinite alleles model. Genetics 1993; 135:911-22. [PMID: 8293988 PMCID: PMC1205730 DOI: 10.1093/genetics/135.3.911]
Abstract
Extensive allelic diversity in variable numbers of tandem repeats (VNTRs) has been discovered in the human genome. For population genetic studies of VNTRs, such as forensic applications, it is important to know whether a neutral mutation-drift balance of VNTR polymorphism can be represented by the infinite alleles model. The assumption of the infinite alleles model that each new mutant is unique is very likely to be violated by unequal sister chromatid exchange (USCE), the primary process believed to generate VNTR mutants. We show that increasing both mutation rates and misalignment constraint for intrachromosomal recombination in a computer simulation model reduces simulated VNTR diversity below the expectations of the infinite alleles model. Maximal constraint, represented as slippage of single repeats, reduces simulated VNTR diversity to levels expected from the stepwise mutation model. Although the misalignment rule is the more important variable, mutation rate also has an effect. At moderate rates of USCE, simulated VNTR diversity fluctuates around the infinite alleles expectation. However, if rates of USCE are high, as for hypervariable VNTRs, simulated VNTR diversity is consistently lower than predicted by the infinite alleles model. This has been observed for many VNTRs and accounted for by technical problems in distinguishing alleles of neighboring size classes. We use sampling theory to confirm the intrinsically poor fit to the infinite alleles model of both simulated VNTR diversity and observed VNTR polymorphisms sampled from two Papua New Guinean populations.
Affiliation(s)
- R M Harding
- MRC Molecular Haematology Unit, University of Oxford, John Radcliffe Hospital, Headington, United Kingdom
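To make the comparison concrete, the textbook equilibrium expectations for the two mutation models can be written in terms of θ = 4Nμ: expected homozygosity is 1/(1 + θ) under the infinite alleles model and 1/√(1 + 2θ) under the one-step stepwise mutation model. The Ewens sampling formula gives the expected number of distinct alleles in a sample of n genes under the infinite alleles model as Σ_{i=0}^{n-1} θ/(θ + i). A minimal sketch evaluating these standard expressions (function names are illustrative; this is not the authors' simulation):

```python
from math import sqrt


def homozygosity_iam(theta: float) -> float:
    """Expected homozygosity at mutation-drift equilibrium, infinite alleles model."""
    return 1 / (1 + theta)


def homozygosity_smm(theta: float) -> float:
    """Expected homozygosity at equilibrium, one-step stepwise mutation model."""
    return 1 / sqrt(1 + 2 * theta)


def expected_alleles_iam(theta: float, n: int) -> float:
    """Expected number of distinct alleles in a sample of n genes (Ewens sampling formula)."""
    return sum(theta / (theta + i) for i in range(n))


for theta in (0.5, 2.0, 10.0):
    h_iam = 1 - homozygosity_iam(theta)
    h_smm = 1 - homozygosity_smm(theta)
    print(f"theta={theta}: heterozygosity IAM={h_iam:.3f}, SMM={h_smm:.3f}")
```

For any θ > 0 the stepwise expectation for heterozygosity is lower, since √(1 + 2θ) < 1 + θ; this matches the direction of the shortfall the abstract reports under maximal misalignment constraint.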
28
29
Sharma S, Sandhu DK, Bagga PS. Isozyme polymorphism of endo-beta-1,4-glucanase in Aspergillus nidulans. Biochem Genet 1990; 28:21-9. [PMID: 2188645 DOI: 10.1007/bf00554818]
Abstract
An electrophoretic survey of the natural populations of Aspergillus nidulans, the A. nidulans group, and various species belonging to the genus Aspergillus from diverse geographical areas of India was carried out to determine the isozyme polymorphism of endoglucanase. The data revealed the presence of three forms of endoglucanase designated EG I, EG II, and EG III. In some isolates, EG I and EG II were present separately; in others, instead of two separate bands, one thick band was detected, which was designated EG I. In natural isolates of A. nidulans and the A. nidulans group, EG III was detected in most, but not all, isolates, while EG I and EG II were always present. However, in various other species of the genus Aspergillus, EG II was totally lacking. In all the populations at the EG I and EG II region, seven electrophoretic variants each were detected, and at the EG III region four variants were seen. The data suggest that there may be two structural genes for endoglucanase, one coding for proteins in the EG I/EG II zone and another for protein in the EG III zone.
Affiliation(s)
- S Sharma
- School of Life Sciences, Guru Nanak Dev University, Punjab, India
30
Hudson RR. Estimating the recombination parameter of a finite population model without selection. Genet Res (Camb) 1987; 50:245-50. [PMID: 3443297 DOI: 10.1017/s0016672300023776]
Abstract
Summary: An estimator is proposed for the parameter C = 4Nc, where N is the population size and c is the recombination rate. The estimator is appropriate for use with sequence or restriction site data from random samples from within populations. Properties of the estimator are investigated for an infinite-sites neutral model using Monte Carlo simulations. The median and mode of the distribution of the estimator are close to the true value for all parameter values examined, but large data sets are required to obtain reliable estimates.
31
32
Kaplan N, Hudson RR. The use of sample genealogies for studying a selectively neutral m-loci model with recombination. Theor Popul Biol 1985; 28:382-96. [PMID: 4071443 DOI: 10.1016/0040-5809(85)90036-x]
Abstract
A selectively neutral m-loci model with recombination is studied. A general method is developed to calculate the variance of the number of segregating sites in samples of arbitrary size and the m-loci homozygosity. The method is based on properties of the genealogy of the sample rather than diffusion approximations. To demonstrate the scope of the method several calculations are presented.
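Two limiting cases of this variance have simple closed forms that bracket the results with recombination. With no recombination within the locus, Watterson's result Var(S) = θa_n + θ²b_n applies, where a_n = Σ_{i=1}^{n-1} 1/i and b_n = Σ_{i=1}^{n-1} 1/i². With free recombination between sites, the genealogies of different sites are independent and the variance falls to θa_n, equal to the mean. A minimal sketch of these two limits only (the intermediate case needs genealogical calculations like those in the paper; names here are illustrative):

```python
def harmonic_sums(n: int) -> tuple[float, float]:
    """a_n = sum of 1/i and b_n = sum of 1/i^2 for i = 1..n-1."""
    a_n = sum(1 / i for i in range(1, n))
    b_n = sum(1 / i ** 2 for i in range(1, n))
    return a_n, b_n


def mean_segregating_sites(theta: float, n: int) -> float:
    """E[S] = theta * a_n, which holds regardless of recombination."""
    a_n, _ = harmonic_sums(n)
    return theta * a_n


def var_segregating_sites(theta: float, n: int, free_recombination: bool = False) -> float:
    """Variance of the number of segregating sites S in a sample of n sequences.

    theta = 4N*mu for the whole locus. free_recombination=False gives the
    no-recombination limit; True gives the limit of independent site genealogies.
    """
    a_n, b_n = harmonic_sums(n)
    if free_recombination:
        return theta * a_n
    return theta * a_n + theta ** 2 * b_n


for n in (2, 10, 50):
    print(n, mean_segregating_sites(5.0, n),
          var_segregating_sites(5.0, n), var_segregating_sites(5.0, n, free_recombination=True))
```

The θ²b_n term comes entirely from sites sharing one genealogy, which is why recombination reduces the variance.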
33
Hudson RR. The sampling distribution of linkage disequilibrium under an infinite allele model without selection. Genetics 1985; 109:611-31. [PMID: 3979817 PMCID: PMC1216291 DOI: 10.1093/genetics/109.3.611]
Abstract
The sampling distributions of several statistics that measure the association of alleles on gametes (linkage disequilibrium) are estimated under a two-locus neutral infinite allele model using an efficient Monte Carlo method. An often used approximation for the mean squared linkage disequilibrium is shown to be inaccurate unless the proper statistical conditioning is used. The joint distribution of linkage disequilibrium and the allele frequencies in the sample is studied. This estimated joint distribution is sufficient for obtaining an approximate maximum likelihood estimate of C = 4Nc, where N is the population size and c is the recombination rate. It has been suggested that observations of high linkage disequilibrium might be a good basis for rejecting a neutral model in favor of a model in which natural selection maintains genetic variation. It is found that a single sample of chromosomes examined at two loci cannot provide sufficient information for such a test if C < 10, because with C this small, very high levels of linkage disequilibrium are not unexpected under the neutral model. In samples of size 50, it is found that, even when C is as large as 50, the distribution of linkage disequilibrium conditional on the allele frequencies is substantially different from the distribution when there is no linkage between the loci. When conditioned on the number of alleles at each locus in the sample, all of the sample statistics examined are nearly independent of θ = 4Nμ, where μ is the neutral mutation rate.
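A widely used approximation of this kind is Ohta and Kimura's ratio of expectations, σ²_d = E[D²]/E[p(1 - p)q(1 - q)] ≈ (10 + C)/(22 + 13C + C²). That this is the approximation the abstract refers to is assumed here, not stated in the abstract. Evaluating it shows why, for small C, high levels of disequilibrium are not surprising:

```python
def sigma_d_squared(c: float) -> float:
    """Ohta-Kimura approximation E[D^2] / E[p(1-p)q(1-q)] for scaled recombination C = 4Nc."""
    if c < 0:
        raise ValueError("C must be non-negative")
    return (10 + c) / (22 + 13 * c + c ** 2)


for c in (0, 1, 10, 50):
    print(f"C={c}: sigma_d^2 = {sigma_d_squared(c):.4f}")
```

It is a ratio of expectations rather than the expectation of r², and it ignores the conditioning on sample allele frequencies that the abstract identifies as necessary.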
34
Two-locus, fourth-order gene frequency moments: Implications for the variance of squared linkage disequilibrium and the variance of homozygosity. Theor Popul Biol 1983. [DOI: 10.1016/0040-5809(83)90040-0]
35
Kreitman M. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 1983; 304:412-7. [PMID: 6410283 DOI: 10.1038/304412a0]
Abstract
The sequencing of eleven cloned Drosophila melanogaster alcohol dehydrogenase (Adh) genes from five natural populations has revealed a large number of previously hidden polymorphisms. Only one of the 43 polymorphisms results in an amino acid change, the one responsible for the two electrophoretic variants (fast, Adh-f, and slow, Adh-s) found in nearly all natural populations. The implication is that most amino acid changes in Adh would be selectively deleterious.
36
Abstract
An infinite-site neutral allele model with crossing-over possible at any of an infinite number of sites is studied. A formula for the variance of the number of segregating sites in a sample of gametes is obtained. An approximate expression for the expected homozygosity is also derived. Simulation results are presented to indicate the accuracy of the approximations. The results concerning the number of segregating sites and the expected homozygosity indicate that a two-locus model and the infinite-site model behave similarly for 4Nu ≤ 2 and r ≤ 5u, where N is the population size, u is the neutral mutation rate, and r is the recombination rate. Simulations of a two-locus model and a four-locus model were also carried out to determine the effect of intragenic recombination on the homozygosity test of Watterson (Genetics 85, 789-814; 88, 405-417) and on the number of unique alleles in a sample. The results indicate that for 4Nu ≤ 2 and r ≤ 10u, the effect of recombination is quite small.
37
Abstract
Previous studies of human populations have failed to find a significant relationship between genetic variability, as measured by total heterozygosity, and cistron size, as measured by subunit molecular weight of proteins, but the number of different rare alleles in human populations has been shown to be correlated with subunit size. The present paper examines these relationships further, utilizing data on electrophoretic variants at 27 loci for 12 human populations with a total of 800 000 individual system observations. The results indicate that, if genetic variability is measured by rare allele heterozygosity instead of total heterozygosity, there is a significant correlation with subunit size. In addition, there are significant differences for rare allele heterozygosity between multimeric and monomeric proteins, the range of variability being less in the multimers (and in the total) than for monomers. Finally, rare allele heterozygosity has a much bigger range of variability than the range of subunit size. By contrast, the range of rare allele heterozygosity between populations is less than ten-fold, a factor not evident in effective population sizes. Both interlocus and interpopulational estimates of relative electromorph mutation rates (REMR) have been calculated, utilizing the distributions of the number of different rare alleles as well as rare allele heterozygosity. The range of these estimates is much lower than that of the estimates given by Zouros (1979) using total heterozygosity as input.
38
39
Wehrhahn CF, Gulizia C. A model of electrophoretic variation: finite populations and finite numbers of mutable sites. Theor Popul Biol 1980; 18:1-15. [PMID: 7466667 DOI: 10.1016/0040-5809(80)90036-2]
40
Bhatia KK. Factors affecting electromorph mutation rates in man: an analysis of data from Australian Aborigines. Ann Hum Biol 1980; 7:45-54. [PMID: 7396405 DOI: 10.1080/03014468000004041]
Abstract
The constraints of molecular size and structure on the relative magnitudes of electromorph mutation rates, as calculated indirectly, have been studied using data for Australian Aborigines. The role of sample size in detecting rare electromorphs is important. In addition, subunit size shows a positive and subunit number a negative correlation with mutation rate. The differences in mutation rates were 2- to 9-fold when calculated for different categories of the data. The importance of physicochemical constraints is discussed.
41
Singh RS. Genic heterogeneity within electrophoretic "alleles" and the pattern of variation among loci in Drosophila pseudoobscura. Genetics 1979; 93:997-1018. [PMID: 546677 PMCID: PMC1214126 DOI: 10.1093/genetics/93.4.997]
Abstract
An investigation, similar to our previously reported xanthine dehydrogenase study, was undertaken to examine the extent of hidden genic variation at nine loci (five larval proteins, three esterases and one aldehyde oxidase) by sequential application of various electrophoretic criteria employing pH, gel concentration and buffer variation. Polymorphic loci appear to fall into two distinct groups: weakly polymorphic, including larval protein 6, 7, 8, 10 and 13 and esterase-1 and -6; and highly polymorphic, including esterase-5, Xdh and possibly Ao. Monomorphic loci may belong to a third group different from all polymorphic loci. Bogota, a geographical isolate that is reproductively isolated from the mainland population, was found to be genetically distinct at four of the ten loci examined in detail so far, including Xdh, whereas previously it was found to be genetically distinct at none. These results are discussed in the light of balancing selection, neutral and mutation-selection hypotheses of genic variation in natural populations.
42
Morgan K, Strobeck C. Is intragenic recombination a factor in the maintenance of genetic variation in natural populations? Nature 1979; 277:383-4. [PMID: 551258 DOI: 10.1038/277383a0]