1
Dapper AL, Payseur BA. Effects of Demographic History on the Detection of Recombination Hotspots from Linkage Disequilibrium. Mol Biol Evol 2018; 35:335-353. [PMID: 29045724 PMCID: PMC5850621 DOI: 10.1093/molbev/msx272]
Abstract
In some species, meiotic recombination is concentrated in small genomic regions. These "recombination hotspots" leave signatures in fine-scale patterns of linkage disequilibrium, raising the prospect that the genomic landscape of hotspots can be characterized from sequence variation. This approach has led to the inference that hotspots evolve rapidly in some species, but are conserved in others. Historic demographic events, such as population bottlenecks, are known to affect patterns of linkage disequilibrium across the genome, violating population genetic assumptions of this approach. Although such events are prevalent, demographic history is generally ignored when making inferences about the evolution of recombination hotspots. To determine the effect of demography on the detection of recombination hotspots, we use the coalescent to simulate haplotypes with a known recombination landscape. We measure the ability of popular linkage disequilibrium-based programs to detect hotspots across a range of demographic histories, including population bottlenecks, hidden population structure, population expansions, and population contractions. We find that demographic events have the potential to greatly reduce the power and increase the false positive rate of hotspot discovery. Neither the power nor the false positive rate of hotspot detection can be predicted without also knowing the demographic history of the sample. Our results suggest that ignoring demographic history likely overestimates the power to detect hotspots and therefore underestimates the degree of hotspot sharing between species. We suggest strategies for incorporating demographic history into population genetic inferences about recombination hotspots.
Affiliation(s)
- Amy L Dapper
- Laboratory of Genetics, University of Wisconsin, Madison, WI
- Bret A Payseur
- Laboratory of Genetics, University of Wisconsin, Madison, WI
2
Variation in Recombination Rate: Adaptive or Not? Trends Genet 2017; 33:364-374. [DOI: 10.1016/j.tig.2017.03.003] [Received: 12/19/2016] [Revised: 03/06/2017] [Accepted: 03/07/2017]
3
Kamm JA, Spence JP, Chan J, Song YS. Two-Locus Likelihoods Under Variable Population Size and Fine-Scale Recombination Rate Estimation. Genetics 2016; 203:1381-99. [PMID: 27182948 PMCID: PMC4937484 DOI: 10.1534/genetics.115.184820] [Received: 11/13/2015] [Accepted: 05/06/2016]
Abstract
Two-locus sampling probabilities have played a central role in devising an efficient composite-likelihood method for estimating fine-scale recombination rates. Due to mathematical and computational challenges, these sampling probabilities are typically computed under the unrealistic assumption of a constant population size, and simulation studies have shown that resulting recombination rate estimates can be severely biased in certain cases of historical population size changes. To alleviate this problem, we develop here new methods to compute the sampling probability for variable population size functions that are piecewise constant. Our main theoretical result, implemented in a new software package called LDpop, is a novel formula for the sampling probability that can be evaluated by numerically exponentiating a large but sparse matrix. This formula can handle moderate sample sizes ([Formula: see text]) and demographic size histories with a large number of epochs ([Formula: see text]). In addition, LDpop implements an approximate formula for the sampling probability that is reasonably accurate and scales to hundreds in sample size ([Formula: see text]). Finally, LDpop includes an importance sampler for the posterior distribution of two-locus genealogies, based on a new result for the optimal proposal distribution in the variable-size setting. Using our methods, we study how a sharp population bottleneck followed by rapid growth affects the correlation between partially linked sites. Then, through an extensive simulation study, we show that accounting for population size changes under such a demographic model leads to substantial improvements in fine-scale recombination rate estimation.
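The piecewise-constant setting can be illustrated with the simplest quantity it affects: the probability that two sampled lineages have not yet coalesced by time t. The sketch below is not LDpop's two-locus formula, which exponentiates a large sparse rate matrix. It assumes the standard single-locus coalescent, with time in units of 2N_0 generations, so that a pair of lineages coalesces at rate 1/λ_k during epoch k, where λ_k = N_k/N_0. The function name and the epoch encoding are illustrative choices.

```python
import math


def pair_noncoalescence_prob(t, epoch_starts, relative_sizes):
    """P(two lineages have not coalesced by time t) under a piecewise-constant size history.

    Time is in coalescent units (2 * N_0 generations, looking backward from the present).
    epoch_starts[k] is the time at which epoch k begins (epoch_starts[0] must be 0.0),
    and relative_sizes[k] = N_k / N_0 is the population size during epoch k.
    The last epoch extends to infinity.
    """
    if epoch_starts[0] != 0.0 or len(epoch_starts) != len(relative_sizes):
        raise ValueError("epochs must start at 0 and match the size list")
    integrated_rate = 0.0
    for k, start in enumerate(epoch_starts):
        if t <= start:
            break
        # End of epoch k (infinity for the last one), truncated at t.
        end = epoch_starts[k + 1] if k + 1 < len(epoch_starts) else math.inf
        duration = min(t, end) - start
        # A pair coalesces at rate 1 / lambda_k while the size is lambda_k * N_0.
        integrated_rate += duration / relative_sizes[k]
    return math.exp(-integrated_rate)


# Constant size: the survival probability is exp(-t).
print(pair_noncoalescence_prob(1.0, [0.0], [1.0]))
# A bottleneck (size 0.1 * N_0) between t = 0.1 and t = 0.2 speeds up coalescence.
print(pair_noncoalescence_prob(1.0, [0.0, 0.1, 0.2], [1.0, 0.1, 1.0]))
```

Because the coalescence rate is constant within each epoch, the survival probability factorizes into one exponential per epoch.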
Affiliation(s)
- John A Kamm
- Department of Statistics, University of California, Berkeley, California 94720
- Computer Science Division, University of California, Berkeley, California 94720
- Jeffrey P Spence
- Computational Biology Graduate Group, University of California, Berkeley, California 94720
- Jeffrey Chan
- Computer Science Division, University of California, Berkeley, California 94720
- Yun S Song
- Department of Statistics, University of California, Berkeley, California 94720
- Computer Science Division, University of California, Berkeley, California 94720
- Department of Integrative Biology, University of California, Berkeley, California 94720
- Departments of Mathematics and Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
4
Abstract
The diffusion-generator approximation technique developed by De Iorio and Griffiths (2004a) is a very useful method of constructing importance-sampling proposal distributions. Being based on general mathematical principles, the method can be applied to various models in population genetics. In this paper we extend the technique to the neutral coalescent model with recombination, thus obtaining novel sampling distributions for the two-locus model. We consider the case with subdivided population structure, as well as the classic case with only a single population. In the latter case we also consider the importance-sampling proposal distributions suggested by Fearnhead and Donnelly (2001), and show that their two-locus distributions generally differ from ours. In the case of the infinitely-many-alleles model, our approximate sampling distributions are shown to be generally closer to the true distributions than are Fearnhead and Donnelly's.
5
Genetic Background, Maternal Age, and Interaction Effects Mediate Rates of Crossing Over in Drosophila melanogaster Females. G3 (Bethesda) 2016; 6:1409-16. [PMID: 26994290 PMCID: PMC4856091 DOI: 10.1534/g3.116.027631]
Abstract
Meiotic recombination is a genetic process that is critical for proper chromosome segregation in many organisms. Although crossing over is fundamental for organismal fitness, its rate varies greatly between taxa. Both genetic and environmental factors contribute to phenotypic variation in crossover frequency, as do genotype-environment interactions. Here, we test the hypothesis that maternal age influences rates of crossing over in a genotype-specific manner. Using classical genetic techniques, we estimated rates of crossing over for individual Drosophila melanogaster females from five strains over their lifetime from a single mating event. We find that both age and genetic background significantly contribute to observed variation in recombination frequency, as do genotype-age interactions. We further find differences in the effect of age on recombination frequency in the two genomic regions surveyed. Our results highlight the complexity of recombination rate variation and reveal a new role of genotype-by-maternal-age interactions in mediating recombination rate.
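In crosses of this kind, the rate of crossing over in a marker interval is estimated from the fraction of progeny that are recombinant for the flanking markers. A minimal sketch follows, assuming each offspring is scored simply as recombinant or parental for one interval. The binomial standard error and the conversion to centimorgans with Haldane's map function (which assumes no crossover interference) are standard textbook additions, not necessarily the analysis used in the study. The counts in the example are hypothetical.

```python
import math


def recombination_fraction(recombinants, total):
    """Estimate r = recombinants / total with a binomial standard error."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    r = recombinants / total
    se = math.sqrt(r * (1.0 - r) / total)
    return r, se


def haldane_cm(r):
    """Convert a recombination fraction (0 <= r < 0.5) to map distance in cM.

    Haldane's map function assumes no crossover interference: d = -0.5 * ln(1 - 2r) Morgans.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("Haldane's function is defined for 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical counts for one female's progeny in one interval.
r, se = recombination_fraction(66, 200)
print(f"r = {r:.3f} +/- {se:.3f}, Haldane distance = {haldane_cm(r):.1f} cM")
```

Tracking the same female across successive broods and repeating this estimate per brood is one simple way to look at age effects.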
6
Hunter CM, Huang W, Mackay TFC, Singh ND. The Genetic Architecture of Natural Variation in Recombination Rate in Drosophila melanogaster. PLoS Genet 2016; 12:e1005951. [PMID: 27035832 PMCID: PMC4817973 DOI: 10.1371/journal.pgen.1005951] [Received: 10/16/2015] [Accepted: 03/01/2016]
Abstract
Meiotic recombination ensures proper chromosome segregation in many sexually reproducing organisms. Despite this crucial function, rates of recombination are highly variable within and between taxa, and the genetic basis of this variation remains poorly understood. Here, we exploit natural variation in the inbred, sequenced lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) to map genetic variants affecting recombination rate. We used a two-step crossing scheme and visible markers to measure rates of recombination in a 33 cM interval on the X chromosome and in a 20.4 cM interval on chromosome 3R for 205 DGRP lines. Though we cannot exclude some bias due to viability effects associated with the visible markers used in this study, we find ~2-fold variation in recombination rate among lines. Interestingly, we further find that recombination rates are uncorrelated between the two chromosomal intervals. We performed a genome-wide association study to identify genetic variants associated with recombination rate in each of the two intervals surveyed. We refined our list of candidate variants and genes associated with recombination rate variation and selected twenty genes for functional assessment. We present strong evidence that five genes are likely to contribute to natural variation in recombination rate in D. melanogaster; these genes lie outside the canonical meiotic recombination pathway. We also find a weak effect of Wolbachia infection on recombination rate, and we confirm the interchromosomal effect. Our results highlight the magnitude of population variation in recombination rate present in D. melanogaster and implicate new genetic factors mediating natural variation in this quantitative trait.
During meiosis, homologous chromosomes exchange genetic material through recombination. In most sexually reproducing species, recombination is necessary for chromosomes to segregate properly. Recombination defects can generate gametes with an incorrect number of chromosomes, which is devastating for organismal fitness. Despite the central role of recombination in chromosome segregation, recombination is a highly variable process both within and between species. Though it is clear that this variation is due at least in part to genetics, the specific genes contributing to variation in recombination within and between species remain largely unknown. This is particularly true in the model organism Drosophila melanogaster. Here, we use the D. melanogaster Genetic Reference Panel to determine the scale of population-level variation in recombination rate and to identify genes significantly associated with this variation. We estimated rates of recombination on two different chromosomes in 205 strains of D. melanogaster. We also used genome-wide association mapping to identify genetic factors associated with recombination rate variation. We find that recombination rates on the two chromosomes are independent traits. We further find that population-level variation in recombination is mediated by many loci of small effect, and that the genes contributing to variation in recombination rate are outside of the well-characterized meiotic recombination pathway.
Affiliation(s)
- Chad M. Hunter
- Program in Genetics, Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, United States of America
- W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, North Carolina, United States of America
- Wen Huang
- Program in Genetics, Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, United States of America
- W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, North Carolina, United States of America
- Initiative in Biological Complexity, North Carolina State University, Raleigh, North Carolina, United States of America
- Trudy F. C. Mackay
- Program in Genetics, Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, United States of America
- W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, North Carolina, United States of America
- Nadia D. Singh
- Program in Genetics, Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, United States of America
- W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, North Carolina, United States of America
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
7
Recombination hotspots: Models and tools for detection. DNA Repair (Amst) 2016; 40:47-56. [PMID: 26991854 DOI: 10.1016/j.dnarep.2016.02.005] [Received: 02/05/2016] [Accepted: 02/09/2016]
Abstract
Recombination hotspots are genomic regions, typically 1 to 2 kb in size, where the rate of recombination is elevated. The recombination event is initiated by double-strand break formation, guided by the combined enzymatic action of DNA topoisomerase and the Spo11 endonuclease. Hotspots are distributed non-uniformly throughout the human genome and cause distortions in the genetic map. The number of known human hotspots has increased manyfold in recent years. Studies of hotspot evolution have also revealed differences in hotspot positions between chimpanzees and humans. In mice, recombination hotspots have been found clustered within the major histocompatibility complex (MHC) region. Several models have been proposed to explain meiotic recombination, and computational tools have been developed to locate hotspots and estimate their recombination rates in humans, a topic of great interest to population and medical geneticists. Here we review the molecular mechanisms, models, and in silico techniques for predicting hotspots.
8
Talas F, McDonald BA. Genome-wide analysis of Fusarium graminearum field populations reveals hotspots of recombination. BMC Genomics 2015; 16:996. [PMID: 26602546 PMCID: PMC4659151 DOI: 10.1186/s12864-015-2166-0] [Received: 03/09/2015] [Accepted: 10/29/2015]
Abstract
BACKGROUND Fusarium graminearum (Fg) is a ubiquitous pathogen of wheat, barley and maize causing Fusarium head blight. Large annual yield losses and contamination of foodstuffs with harmful mycotoxins make Fg one of the most-studied plant pathogens. Analyses of natural field populations can lead to a better understanding of the evolutionary processes affecting this pathogen. Restriction site associated DNA sequencing (RADseq) was used to conduct population genomics analyses including 213 pathogen isolates from 13 German field populations of Fg. RESULTS High genetic diversity was found within Fg field populations and low differentiation (FST = 0.003) was found among populations. Linkage disequilibrium (LD) decayed rapidly over a distance of 1000 bp. The low multilocus LD indicates that significant sexual recombination occurs in all populations. Several recombination hotspots were detected on each chromosome, but different chromosomes showed different levels of recombination. There was some evidence for selection hotspots. CONCLUSIONS The population genomic structure of Fg is consistent with a high degree of sexual recombination that is not equally distributed across the chromosomes. The high gene flow found among these field populations should enable this pathogen to adapt rapidly to changes in its environment, including deployment of resistant cultivars, applications of fungicides and a warming climate.
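LD decay of this kind is commonly summarized by the squared correlation r² between pairs of biallelic sites, averaged within distance bins. A minimal sketch follows, assuming haploid genotypes coded 0/1 with no missing data (natural for a haploid fungus, but the coding, the 100 bp bin width, and the function names are illustrative, not the study's pipeline).

```python
import math
from collections import defaultdict
from itertools import combinations


def r_squared(site_a, site_b):
    """Squared correlation between two biallelic sites scored as 0/1 across the same haplotypes."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # the classical LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


def ld_decay(positions, genotypes, bin_bp=100):
    """Mean r^2 per distance bin; genotypes[i] holds the 0/1 alleles of the site at positions[i]."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, j in combinations(range(len(positions)), 2):
        r2 = r_squared(genotypes[i], genotypes[j])
        if math.isnan(r2):  # skip pairs involving a monomorphic site
            continue
        b = abs(positions[j] - positions[i]) // bin_bp
        sums[b] += r2
        counts[b] += 1
    # Key each bin by its lower distance bound in bp.
    return {b * bin_bp: sums[b] / counts[b] for b in sorted(sums)}


positions = [0, 50, 900]
genotypes = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(ld_decay(positions, genotypes))
```

Plotting the binned means against distance gives the decay curve; a rapid drop toward background within about 1 kb is what the abstract reports.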
Affiliation(s)
- Firas Talas
- ETH Zurich, Institute of Integrative Biology, Zurich (IBZ), Plant Pathology, 8092, Zurich, Switzerland.
- Bruce A McDonald
- ETH Zurich, Institute of Integrative Biology, Zurich (IBZ), Plant Pathology, 8092, Zurich, Switzerland.
9
Hellenthal G, Busby GB, Band G, Wilson JF, Capelli C, Falush D, Myers S. A genetic atlas of human admixture history. Science 2014; 343:747-751. [PMID: 24531965 PMCID: PMC4209567 DOI: 10.1126/science.1243518]
Abstract
Modern genetic data combined with appropriate statistical methods have the potential to contribute substantially to our understanding of human history. We have developed an approach that exploits the genomic structure of admixed populations to date and characterize historical mixture events at fine scales. We used this to produce an atlas of worldwide human admixture history, constructed by using genetic data alone and encompassing over 100 events occurring over the past 4000 years. We identified events whose dates and participants suggest they describe genetic impacts of the Mongol empire, Arab slave trade, Bantu expansion, first millennium CE migrations in Eastern Europe, and European colonialism, as well as unrecorded events, revealing admixture to be an almost universal force shaping human populations.
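A common basis for dating admixture from genomic data is that, after a single pulse of mixture g generations ago, ancestry-related LD between two markers decays roughly as A·exp(-g·d) with their genetic distance d in Morgans, because every generation of recombination breaks ancestry tracts further. A minimal sketch of that one idea follows. It assumes a hypothetical curve of positive LD-like values that has already been measured at several distances, with any background level subtracted, and fits g by least squares on the log scale. The published method builds its curves from haplotype copying and is considerably more elaborate; the data below are synthetic.

```python
import math


def fit_admixture_generations(distances_morgans, ld_values):
    """Fit y = A * exp(-g * d) by ordinary least squares on log(y); returns (g, A).

    Assumes every ld_value is positive and any background level has already been subtracted.
    """
    xs = list(distances_morgans)
    ys = [math.log(y) for y in ld_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The decay rate in Morgans^-1 is the number of generations since admixture.
    return -slope, math.exp(intercept)


# Synthetic curve generated with g = 30 generations and A = 0.02.
d = [0.005 * k for k in range(1, 11)]
y = [0.02 * math.exp(-30 * x) for x in d]
g, amplitude = fit_admixture_generations(d, y)
print(round(g, 6), round(amplitude, 6))
```

Multiplying g by an assumed generation time converts the estimate to years; real curves are noisy and often need a nonlinear fit with a free background term.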
Affiliation(s)
- Garrett Hellenthal
- UCL Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK
- George B.J. Busby
- Department of Zoology, Oxford University, South Parks Road, Oxford OX1 3PS, UK
- Gavin Band
- Wellcome Trust Centre for Human Genetics, Oxford University, Roosevelt Drive, Oxford OX3 7BN, UK
- James F. Wilson
- Centre for Population Health Sciences, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, UK
- Cristian Capelli
- Department of Zoology, Oxford University, South Parks Road, Oxford OX1 3PS, UK
- Daniel Falush
- Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany
- Simon Myers
- Wellcome Trust Centre for Human Genetics, Oxford University, Roosevelt Drive, Oxford OX3 7BN, UK
- Department of Statistics, Oxford University, 1 South Parks Road, Oxford OX1 3TG, UK
10
Estimating variable effective population sizes from multiple genomes: a sequentially Markov conditional sampling distribution approach. Genetics 2013; 194:647-62. [PMID: 23608192 DOI: 10.1534/genetics.112.149096]
Abstract
Throughout history, the population size of modern humans has varied considerably due to changes in environment, culture, and technology. More accurate estimates of population size changes, and when they occurred, should provide a clearer picture of human colonization history and help remove confounding effects from natural selection inference. Demography influences the pattern of genetic variation in a population, and thus genomic data of multiple individuals sampled from one or more present-day populations contain valuable information about the past demographic history. Recently, Li and Durbin developed a coalescent-based hidden Markov model, called the pairwise sequentially Markovian coalescent (PSMC), for a pair of chromosomes (or one diploid individual) to estimate past population sizes. This is an efficient, useful approach, but its accuracy in the very recent past is hampered by the fact that, because of the small sample size, only a few coalescence events occur in that period. Multiple genomes from the same population contain more information about the recent past, but are also more computationally challenging to study jointly in a coalescent framework. Here, we present a new coalescent-based method that can efficiently infer population size changes from multiple genomes, providing access to a new store of information about the recent past. Our work generalizes the recently developed sequentially Markov conditional sampling distribution framework, which provides an accurate approximation of the probability of observing a newly sampled haplotype given a set of previously sampled haplotypes. Simulation results demonstrate that we can accurately reconstruct the true population histories, with a significant improvement over the PSMC in the recent past. We apply our method, called diCal, to the genomes of multiple human individuals of European and African ancestry to obtain a detailed population size change history during recent times.
11
Fine-scale heterogeneity in crossover rate in the garnet-scalloped region of the Drosophila melanogaster X chromosome. Genetics 2013; 194:375-87. [PMID: 23410829 DOI: 10.1534/genetics.112.146746]
Abstract
Homologous recombination affects myriad aspects of genome evolution, from standing levels of nucleotide diversity to the efficacy of natural selection. Rates of crossing over show marked variability at all scales surveyed, including species-, population-, and individual-level differences. Even within genomes, crossovers are nonrandomly distributed in a wide diversity of taxa. Although intra- and intergenomic heterogeneities in crossover distribution have been documented in Drosophila, the scale and degree of crossover rate heterogeneity remain unclear. In addition, the genetic features mediating this heterogeneity are unknown. Here we quantify fine-scale heterogeneity in crossover distribution in a 2.1-Mb region of the Drosophila melanogaster X chromosome by localizing crossover breakpoints in 2500 individuals, each containing a single crossover in this specific X chromosome region. We show 90-fold variation in rates of crossing over at a 5-kb scale, place this variation in the context of several aspects of genome evolution, and identify several genetic features associated with crossover rates. Our results shed new light on the scale and magnitude of crossover rate heterogeneity in D. melanogaster and highlight potential features mediating this heterogeneity.
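Once each crossover breakpoint has been localized, rates at a fixed scale follow from counting: the fraction of all scored meioses with a breakpoint in a window, expressed as centimorgans per megabase. A minimal sketch follows, assuming each breakpoint is given as a single coordinate (in practice it is an interval between flanking markers) and that the total number of meioses scored, including those without a crossover in the region, is known. Without that total, the output is only a relative profile. The function name, the 5 kb default, and the example numbers are illustrative.

```python
from collections import Counter


def windowed_crossover_rates(breakpoints_bp, region_start, region_end, total_meioses, window_bp=5_000):
    """Crossover rate (cM/Mb) in consecutive windows of [region_start, region_end).

    breakpoints_bp: one coordinate per observed crossover.
    total_meioses: number of meioses scored, including those without a crossover in the region.
    Returns a list of (window_start, rate_cM_per_Mb); the last window may be shorter.
    """
    counts = Counter(
        (bp - region_start) // window_bp
        for bp in breakpoints_bp
        if region_start <= bp < region_end
    )
    n_windows = -(-(region_end - region_start) // window_bp)  # ceiling division
    rates = []
    for w in range(n_windows):
        start = region_start + w * window_bp
        width_mb = (min(start + window_bp, region_end) - start) / 1e6
        centimorgans = 100.0 * counts.get(w, 0) / total_meioses
        rates.append((start, centimorgans / width_mb))
    return rates


# Hypothetical example: 3 crossovers among 1000 meioses in a 12 kb region.
for start, rate in windowed_crossover_rates([1_200, 3_900, 11_000], 0, 12_000, 1000):
    print(start, round(rate, 2))
```

The ratio of the largest to the smallest nonzero window rate gives a fold-variation figure of the kind quoted in the abstract.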
12
Chan AH, Jenkins PA, Song YS. Genome-wide fine-scale recombination rate variation in Drosophila melanogaster. PLoS Genet 2012; 8:e1003090. [PMID: 23284288 PMCID: PMC3527307 DOI: 10.1371/journal.pgen.1003090] [Received: 05/04/2012] [Accepted: 09/29/2012]
Abstract
Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At fine scales, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots, each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than in the North American population. The correlation between various genomic features (including recombination rates, diversity, divergence, GC content, gene content, and sequence quality) is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity.
Recombination is a process by which chromosomes exchange genetic material during meiosis. It is important in evolution because it provides offspring with new combinations of genes, and so estimating the rate of recombination is of fundamental importance in various population genomic inference problems. In this paper, we develop a new statistical method to enable robust estimation of fine-scale recombination maps of Drosophila, a genus of common fruit flies, in which the background recombination rate is high and natural selection has been prevalent. We apply our method to produce fine-scale recombination maps for a North American population and an African population of D. melanogaster. For both populations, we find extensive fine-scale variation in recombination rate throughout the genome. We provide a quantitative characterization of the similarities and differences between the recombination maps of the two populations; our study reveals high correlation at broad scales and low correlation at fine scales, as has been documented among human populations. We also examine the correlation between various genomic features. Furthermore, using a conservative approach, we find a handful of putative recombination “hotspot” regions with solid statistical support for a local elevation of at least 10 times the background recombination rate.
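The tenfold criterion can be illustrated with a deliberately simple rule: flag windows whose estimated rate is at least ten times a background level. A minimal sketch follows, assuming a hypothetical list of per-window rate estimates and, when no background is supplied, taking the median window rate as a robust stand-in. This is not the paper's procedure, which requires statistical support for each putative hotspot rather than simply thresholding point estimates.

```python
from statistics import median


def flag_hotspots(window_rates, fold=10.0, background=None):
    """Return indices of windows whose rate is at least `fold` times the background rate.

    If no background is supplied, the median window rate is used as a robust stand-in.
    """
    if background is None:
        background = median(window_rates)
    if background <= 0:
        raise ValueError("background rate must be positive")
    return [i for i, rate in enumerate(window_rates) if rate >= fold * background]


# Hypothetical per-window rates (arbitrary units).
rates = [1.0, 1.2, 0.9, 14.0, 1.1, 1.0, 9.5, 1.3]
print(flag_hotspots(rates))
```

Using the median rather than the mean keeps a few very hot windows from inflating the background they are compared against.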
Affiliation(s)
- Andrew H. Chan
- Computer Science Division, University of California Berkeley, Berkeley, California, United States of America
- Paul A. Jenkins
- Computer Science Division, University of California Berkeley, Berkeley, California, United States of America
- Yun S. Song
- Computer Science Division, University of California Berkeley, Berkeley, California, United States of America
- Department of Statistics, University of California Berkeley, Berkeley, California, United States of America
13
Paul JS, Song YS. Blockwise HMM computation for large-scale population genomic inference. Bioinformatics 2012; 28:2008-15. [PMID: 22641715 DOI: 10.1093/bioinformatics/bts314]
Abstract
MOTIVATION A promising class of methods for large-scale population genomic inference uses the conditional sampling distribution (CSD), which approximates the probability of sampling an individual with a particular DNA sequence, given that a collection of sequences from the population has already been observed. The CSD has a wide range of applications, including imputing missing sequence data, estimating recombination rates, inferring human colonization history and identifying tracts of distinct ancestry in admixed populations. Most well-used CSDs are based on hidden Markov models (HMMs). Although computationally efficient in principle, methods resulting from the common implementation of the relevant HMM techniques remain intractable for large genomic datasets. RESULTS To address this issue, a set of algorithmic improvements for performing the exact HMM computation is introduced here, by exploiting the particular structure of the CSD and typical characteristics of genomic data. It is empirically demonstrated that these improvements result in a speedup of several orders of magnitude for large datasets and that the speedup continues to increase with the number of sequences. The optimized algorithms can be adopted in methods for various applications, including those mentioned above, and make previously impracticable analyses possible. AVAILABILITY Software available upon request. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. CONTACT yss@eecs.berkeley.edu.
Affiliation(s)
- Joshua S Paul
- Computer Science Division and Department of Statistics, University of California, Berkeley, CA 94720, USA
14
Abstract
Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms, and its two forms, crossing over and gene conversion, both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although our method is computationally more expensive than methods that use approximate likelihoods, its relative efficiency is expected to be optimal in theory. It is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences, while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm.
15
Paul JS, Steinrücken M, Song YS. An accurate sequentially Markov conditional sampling distribution for the coalescent with recombination. Genetics 2011; 187:1115-28. [PMID: 21270390 PMCID: PMC3070520 DOI: 10.1534/genetics.110.125534] [Received: 11/30/2010] [Accepted: 01/21/2011]
Abstract
The sequentially Markov coalescent is a simplified genealogical process that aims to capture the essential features of the full coalescent model with recombination, while being scalable in the number of loci. In this article, the sequentially Markov framework is applied to the conditional sampling distribution (CSD), which is at the core of many statistical tools for population genetic analyses. Briefly, the CSD describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. A hidden Markov model (HMM) formulation of the sequentially Markov CSD is developed here, yielding an algorithm with time complexity linear in both the number of loci and the number of haplotypes. This work provides a highly accurate, practical approximation to a recently introduced CSD derived from the diffusion process associated with the coalescent with recombination. It is empirically demonstrated that the improvement in accuracy of the new CSD over previously proposed HMM-based CSDs increases substantially with the number of loci. The framework presented here can be adopted in a wide range of applications in population genetics, including imputing missing sequence data, estimating recombination rates, and inferring human colonization history.
Affiliation(s)
- Joshua S. Paul
- Computer Science Division and Department of Statistics, University of California, Berkeley, California 94720
- Matthias Steinrücken
- Computer Science Division and Department of Statistics, University of California, Berkeley, California 94720
- Yun S. Song
- Computer Science Division and Department of Statistics, University of California, Berkeley, California 94720
16
Abstract
Homologous recombination during meiosis is critical for the formation of gametes. Recombination is initiated by programmed DNA double-strand breaks which preferentially occur at hotspots dispersed throughout the genome. These double-strand breaks are repaired from the homolog, resulting in either a crossover or noncrossover product. Multiple noncrossover events are required for homolog pairing, and at least one crossover is critical for proper chromosome segregation at the first meiotic division. Consequently, homologous recombination in meiosis occurs at high frequencies. This chapter describes how to characterize crossovers and noncrossovers at a hotspot in mice using allele-specific PCR. Amplification of recombinant products directly from sperm DNA is a powerful approach to determine recombination frequencies and map recombination breakpoints, providing insight into homologous recombination mechanisms.
Affiliation(s)
- Francesca Cole
- Developmental Biology Program, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA.
17
A principled approach to deriving approximate conditional sampling distributions in population genetics models with recombination. Genetics 2010; 186:321-38. [PMID: 20592264 DOI: 10.1534/genetics.110.117986] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
Abstract
The multilocus conditional sampling distribution (CSD) describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. The CSD has a wide range of applications in both computational biology and population genomics analysis, including phasing genotype data into haplotype data, imputing missing data, estimating recombination rates, inferring local ancestry in admixed populations, and importance sampling of coalescent genealogies. Unfortunately, the true CSD under the coalescent with recombination is not known, so approximations, formulated as hidden Markov models, have been proposed in the past. These approximations have led to a number of useful statistical tools, but it is important to recognize that they were not derived from, though were certainly motivated by, principles underlying the coalescent process. The goal of this article is to develop a principled approach to derive improved CSDs directly from the underlying population genetics model. Our approach is based on the diffusion process approximation and the resulting mathematical expressions admit intuitive genealogical interpretations, which we utilize to introduce further approximations and make our method scalable in the number of loci. The general algorithm presented here applies to an arbitrary number of loci and an arbitrary finite-alleles recurrent mutation model. Empirical results are provided to demonstrate that our new CSDs are in general substantially more accurate than previously proposed approximations.
18
Khil PP, Camerini-Otero RD. Genetic crossovers are predicted accurately by the computed human recombination map. PLoS Genet 2010; 6:e1000831. [PMID: 20126534 PMCID: PMC2813264 DOI: 10.1371/journal.pgen.1000831] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 07/21/2009] [Accepted: 12/28/2009] [Indexed: 11/26/2022]
Abstract
Hotspots of meiotic recombination can change rapidly over time. This instability, together with the reported high level of inter-individual variation in meiotic recombination, calls into question the accuracy of the calculated hotspot map, which is based on the summation of past genetic crossovers. To estimate the accuracy of the computed recombination rate map, we have mapped genetic crossovers to a median resolution of 70 kb in 10 CEPH pedigrees. We then compared the positions of crossovers with the hotspots computed from HapMap data and performed extensive computer simulations to compare the observed distributions of crossovers with the distributions expected from the calculated recombination rate maps. Here we show that a population-averaged hotspot map computed from linkage disequilibrium data accurately predicts present-day genetic crossovers. We find that computed hotspot maps accurately estimate both the strength and the position of meiotic hotspots. An in-depth examination of crossovers that were not predicted shows that they are preferentially located in regions where hotspots are found in other populations. In summary, we find that by combining several computed population-specific maps we can capture the variation in individual hotspots to generate a hotspot map that can predict almost all present-day genetic crossovers. In eukaryotes, genetic crossovers are responsible for generating genetic diversity and ensuring the proper segregation of chromosomes. Genetic crossovers are tightly clustered in hotspots. Although the existence of hotspots in humans is clearly proven, the mechanisms of their formation and the regulation of meiotic recombination in general remain poorly understood. An additional complication in studies of meiotic recombination is that direct experimental mapping of human hotspots on a genome-wide scale is not feasible with current methods.
The best available indirect methods compute the position of hotspots from patterns of historic associations between genetic markers in population samples. In this study we determined the positions of genetic crossovers in ten pedigrees of European origin and then compared the positions of crossovers with the hotspots computed from HapMap data. Importantly, we find that the population-averaged computed map is in close agreement with the observed distribution of genetic crossovers. We also find that cryptic hotspots that are not easily detected in the computed European map can be more effectively identified if other populations are included in the analysis. Our analysis shows that high-resolution recombination profiles are highly similar between distantly related populations and that by including computed hotspots from several populations we can predict nearly all crossovers.
Affiliation(s)
- Pavel P. Khil
- Genetics and Biochemistry Branch, The National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- R. Daniel Camerini-Otero
- Genetics and Biochemistry Branch, The National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
19
Cairns KM, Wolff JN, Brooks RC, Ballard JWO. Evidence of recent population expansion in the field cricket Teleogryllus commodus. Aust J Zool 2010. [DOI: 10.1071/zo09118] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
The patterns of intraspecific genetic variation can be driven by large-scale environmental events or smaller-scale phenomena such as land clearing. In Australia, European farming techniques have altered the landscape by increasing the amount of arable farmland. We hypothesised that this increase in farmland would result in a concomitant increase in the effective population size of the black field cricket (Teleogryllus commodus). To test our hypothesis, we investigated genetic variation in 1350 bp of mitochondrial DNA (mtDNA) and in two nuclear-encoded loci, hexokinase and elongation factor 1-α, from 20 crickets collected at Smiths Lake, New South Wales. Molecular variation in T. commodus was characterised by an over-representation of singleton mutations (negative Tajima’s D and Fu and Li’s D) in all loci studied. Further, HKA tests do not suggest that selection is acting on any one gene. Combined, these data support the hypothesis that population expansion is the force driving molecular variation in T. commodus. If an increase in agricultural habitats is the cause of population expansion in T. commodus, we hypothesise greater genetic subdivision in natural than in farmland habitats. An alternative possibility is that the effective geographical range of the species has increased but the density at a given site remains unchanged.
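Tajima's D compares two estimators of θ: the mean number of pairwise differences (π) and Watterson's estimator S/a1. An excess of singletons, as reported for T. commodus, lowers π relative to S/a1 and pushes D below zero. Below is a minimal sketch of the standard formula, assuming phased haplotypes coded as equal-length strings of '0'/'1' with no missing data; it is not the software used in the study.

```python
import math
from typing import Optional, Sequence


def tajimas_d(haplotypes: Sequence[str]) -> Optional[float]:
    """Tajima's D for 0/1 haplotype strings (assumed input format); None if no segregating sites."""
    n = len(haplotypes)
    if n < 4:
        # For n = 2 or 3 the variance constants are zero, so D is undefined.
        raise ValueError("need at least 4 sequences")
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    pi = 0.0
    for site in range(n_sites):
        k = sum(1 for h in haplotypes if h[site] == "1")
        if 0 < k < n:
            seg_sites += 1
            pi += k * (n - k) / n_pairs  # fraction of pairs differing at this site
    if seg_sites == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - seg_sites / a1) / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


print(tajimas_d(["1000", "0100", "0010", "0001"]))  # only singletons: negative D
print(tajimas_d(["1111", "1111", "0000", "0000"]))  # intermediate frequencies: positive D
```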
20
Singh ND, Aquadro CF, Clark AG. Estimation of fine-scale recombination intensity variation in the white-echinus interval of D. melanogaster. J Mol Evol 2009; 69:42-53. [PMID: 19504037 DOI: 10.1007/s00239-009-9250-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 11/25/2008] [Revised: 04/27/2009] [Accepted: 05/15/2009] [Indexed: 01/19/2023]
Abstract
Accurate assessment of local recombination rate variation is crucial for understanding the recombination process and for determining the impact of natural selection on linked sites. In Drosophila, local recombination intensity has been estimated primarily by statistical approaches, by estimating the local slope of the relationship between the physical and genetic maps. However, these estimates are limited in resolution and, as a result, the physical scale at which recombination intensity varies in Drosophila is largely unknown. Although there is some evidence suggesting as much as a 40-fold variation in crossover rate at a local scale in D. pseudoobscura, little is known about the fine-scale structure of recombination rate variation in D. melanogaster. Here we experimentally examine the fine-scale distribution of crossover events in a 1.2-Mb region on the D. melanogaster X chromosome using a classic genetic mapping approach. Our results show that crossover frequency is significantly heterogeneous within this region, varying approximately 3.5-fold. Simulations suggest that this degree of heterogeneity is sufficient to affect levels of standing nucleotide diversity, although the magnitude of this effect is small. We recover no statistical association between empirical estimates of nucleotide diversity and recombination intensity, which is likely due to the limited number of loci sampled in our population genetic data set. However, codon bias is significantly negatively correlated with fine-scale recombination intensity estimates, as expected. Our results shed light on the relevant physical scale to consider in evolutionary analyses relating to recombination rate and highlight the motivations to increase the resolution of the recombination map in Drosophila.
Affiliation(s)
- Nadia D Singh
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.
21
Statistical power analysis of neutrality tests under demographic expansions, contractions and bottlenecks with recombination. Genetics 2008; 179:555-67. [PMID: 18493071 DOI: 10.1534/genetics.107.083006] [Citation(s) in RCA: 217] [Impact Index Per Article: 13.6] [Indexed: 11/18/2022]
Abstract
Several tests have been proposed to detect departures of nucleotide variability patterns from neutral expectations. However, very different kinds of evolutionary processes, such as selective events or demographic changes, can produce similar deviations from these tests, making interpretation difficult when a significant departure from neutrality is detected. Here we study the effects of demography and recombination upon neutrality tests by analyzing their power under sudden population expansions, sudden contractions, and bottlenecks. We evaluate tests based on the frequency spectrum of mutations and the distribution of haplotypes and explore the consequences of using incorrect estimates of the rates of recombination when testing for neutrality. We show that tests that rely on haplotype frequencies (especially Fs and ZnS, which are based, respectively, on the number of different haplotypes and on the r2 values between all pairs of polymorphic sites) are the most powerful for detecting expansions on nonrecombining genomic regions. Nevertheless, they are strongly affected by misestimations of recombination, so they should not be used when recombination levels are unknown. Instead, class I tests, particularly Tajima's D or R2, are recommended.
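ZnS (Kelly's statistic) is the average r² taken over all pairs of polymorphic sites, where r² = D²/(pA(1 − pA)pB(1 − pB)) and D = pAB − pA·pB. A minimal sketch, assuming biallelic sites supplied as lists of 0/1 alleles (one entry per haplotype):

```python
from itertools import combinations
from typing import Sequence


def r_squared(site_a: Sequence[int], site_b: Sequence[int]) -> float:
    """r^2 between two biallelic sites coded 0/1 across the same haplotypes."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


def kelly_zns(sites: Sequence[Sequence[int]]) -> float:
    """Average r^2 over all pairs of polymorphic sites; monomorphic sites are skipped."""
    poly = [s for s in sites if 0 < sum(s) < len(s)]
    if len(poly) < 2:
        raise ValueError("need at least two polymorphic sites")
    pairs = list(combinations(poly, 2))
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)


sites = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(kelly_zns(sites))  # r^2 values 1, 0, 0 -> ZnS = 1/3
```

Because recombination breaks down associations between sites, a value like this depends strongly on the recombination rate, which is why the authors caution against haplotype-based tests when recombination levels are unknown.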
22
Fraction of informative recombinations: a heuristic approach to analyze recombination rates. Genetics 2008; 178:2069-79. [PMID: 18430934 DOI: 10.1534/genetics.107.082255] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
In this article we present a new heuristic approach (informative recombinations, InfRec) to analyze recombination density at the sequence level. InfRec is intuitive and easy to use, and it combines previously developed methods that (i) resolve genotypes into haplotypes, (ii) estimate the minimum number of recombinations, and (iii) evaluate the fraction of informative recombinations. We tested the sliding-window version of this approach on 117 genes from the SeattleSNPs program, resequenced in 24 African-Americans (AAs) and 23 European-Americans (EAs). We obtained population recombination rate estimates (ρ_obs) of 0.85 and 0.37 per kb in AAs and EAs, respectively. Coalescent simulations indicated that these values account for both the recombinations and the gene conversions in the history of the sample. The intensity of ρ_obs varied considerably along the sequence, revealing the presence of recombination hotspots. Overall, approximately 80% of recombinations fell in one-third of the sequence, and approximately 50% in only 10% of it. InfRec performance, tested on published simulated and additional experimental data sets, was similar to that of other hotspot detection methods. Fast, intuitive, and visual, InfRec is not constrained by sample size limitations. It facilitates understanding data and provides a simple and flexible tool to analyze recombination intensity along the sequence.
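Step (ii), estimating the minimum number of recombinations, is commonly done with Hudson and Kaplan's Rm: every pair of sites showing all four gametes (00, 01, 10, 11) requires a recombination somewhere between them, and Rm is the largest number of such intervals that do not overlap. The sketch below computes that classic lower bound under the infinite-sites assumption (phased 0/1 haplotypes, no recurrent mutation); it is not necessarily the exact estimator InfRec uses.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def incompatible_pairs(haplotypes: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Site pairs (i, j) that fail the four-gamete test."""
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # all of 00, 01, 10, 11 observed
            pairs.append((i, j))
    return pairs


def hudson_kaplan_rm(haplotypes: Sequence[Sequence[int]]) -> int:
    """Maximum number of non-overlapping incompatible intervals (Hudson-Kaplan Rm)."""
    intervals = sorted(incompatible_pairs(haplotypes), key=lambda p: p[1])
    count = 0
    last_end = -1
    for start, end in intervals:
        # (i, j) needs an event between sites i and j; intervals that touch only at a
        # shared site need separate events, so start == last_end still counts.
        if start >= last_end:
            count += 1
            last_end = end
    return count


haps = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(incompatible_pairs(haps))  # every pair of sites is incompatible
print(hudson_kaplan_rm(haps))    # events needed between sites 0-1 and 1-2
```

Choosing intervals greedily by earliest right endpoint yields a maximum set of disjoint intervals, which equals the minimum number of breakpoints needed to hit every incompatible pair.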
23
Griffiths RC, Jenkins PA, Song YS. Importance sampling and the two-locus model with subdivided population structure. Adv Appl Probab 2008; 40:473-500. [PMID: 19936262 DOI: 10.1239/aap/1214950213] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
Abstract
The diffusion-generator approximation technique developed by De Iorio and Griffiths (2004a) is a very useful method of constructing importance sampling proposal distributions. Being based on general mathematical principles, the method can be applied to various models in population genetics. In this paper we extend the technique to the neutral coalescent model with recombination, thus obtaining novel sampling distributions for the two-locus model. We consider the case with subdivided population structure, as well as the classic case with only a single population. In the latter case we also consider the importance sampling proposal distributions suggested by Fearnhead and Donnelly (2001), and show that their two-locus distributions generally differ from ours. In the case of the infinitely-many-alleles model, our approximate sampling distributions are shown to be generally closer to the true distributions than are Fearnhead and Donnelly's.
24
Abstract
Our understanding of the details of mammalian meiotic recombination has recently advanced significantly. Sperm typing technologies, linkage studies, and computational inferences from population genetic data have together provided information in unprecedented detail about the location and activity of the sites of crossing-over in mice and humans. The results show that the vast majority of meiotic recombination events are localized to narrow DNA regions (hot spots) that constitute only a small fraction of the genome. The data also suggest that the molecular basis of hot spot activity is unlikely to be strictly determined by specific DNA sequence motifs in cis. Further molecular studies are needed to understand how hot spots originate, function and evolve.
Affiliation(s)
- Norman Arnheim
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089-2910, USA.
25
Hellenthal G, Auton A, Falush D. Inferring human colonization history using a copying model. PLoS Genet 2008; 4:e1000078. [PMID: 18497854 PMCID: PMC2367454 DOI: 10.1371/journal.pgen.1000078] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.5] [Received: 11/01/2007] [Accepted: 04/18/2008] [Indexed: 01/12/2023]
Abstract
Genome-wide scans of genetic variation can potentially provide detailed information on how modern humans colonized the world but require new methods of analysis. We introduce a statistical approach that uses Single Nucleotide Polymorphism (SNP) data to identify sharing of chromosomal segments between populations and uses the pattern of sharing to reconstruct a detailed colonization scenario. We apply our model to the SNP data for the 53 populations of the Human Genome Diversity Project described in Conrad et al. (Nature Genetics 38,1251-60, 2006). Our results are consistent with the consensus view of a single “Out-of-Africa” bottleneck and serial dilution of diversity during global colonization, including a prominent East Asian bottleneck. They also suggest novel details including: (1) the most northerly East Asian population in the sample (Yakut) has received a significant genetic contribution from the ancestors of the most northerly European one (Orcadian). (2) Native South Americans have received ancestry from a source closely related to modern North-East Asians (Mongolians and Oroquen) that is distinct from the sources for native North Americans, implying multiple waves of migration into the Americas. A detailed depiction of the peopling of the world is available in animated form. Humans like to tell stories. Amongst the most captivating is the story of the global spread of modern humans from their original homeland in Africa. Traditionally this has been the preserve of anthropologists, but geneticists are starting to make an important contribution. However, genetic evidence is typically analyzed in the context of anthropological preconceptions. For genetics to provide an accurate and detailed history without reference to anthropology, methods are required that translate DNA sequence data into histories. We introduce a statistical method that has three virtues. 
First, it is based on a copying model that incorporates the block-by-block inheritance of DNA from one generation to the next. This allows it to capture the rich information provided by patterns of DNA sharing across the whole genome. Second, its parameter space includes an enormous number of possible colonization scenarios, meaning that inferences are correspondingly rich in detail. Third, the inferred colonization scenario is determined algorithmically. We have applied this method to data from 53 human populations and find that while the current consensus is broadly supported, some populations have surprising histories. This scenario can be viewed as a movie, making it transparent where statistical analysis ends and where interpretation begins.
Affiliation(s)
- Adam Auton
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Daniel Falush
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Department of Microbiology, Environmental Research Institute, Cork, Ireland
26
Li N. The promise of composite likelihood methods for addressing computationally intensive challenges. Adv Genet 2008; 60:637-654. [PMID: 18358335 DOI: 10.1016/s0065-2660(07)00422-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023]
Abstract
High-dimensional genetic data, due to its complex correlation structure, poses an enormous challenge to standard likelihood-based methods for making statistical inference. As an approximation, composite likelihood has proved to be a successful strategy for some genetic applications. It has the potential for even wider application, although much research is still needed. We first give a brief description of composite likelihood. The advantages of this method and potential challenges in inference are noted. Next, its applications in genetic studies are reviewed, specifically in estimating population genetics parameters such as recombination rate, and in multi-locus linkage disequilibrium mapping of disease genes, with some discussion of future research directions.
Affiliation(s)
- Na Li
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
27
Khil PP, Camerini-Otero RD. Variation in patterns of human meiotic recombination. Genome Dyn 2008; 5:117-127. [PMID: 18948711 DOI: 10.1159/000166623] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/16/2022]
Abstract
In the last 30 years it has become evident that patterns of meiotic recombination can be highly variable among individuals. The evidence comes from both low and high resolution analyses of hotspots of recombination in human and other species. In addition, a comparison of the recombination profiles in closely related species such as human and chimpanzee reveals essentially no correlation in the position of hotspots. Although the variation in hotspots of meiotic recombination is clearly documented, the mechanisms responsible for such variation are far from being understood. Here we will review the available evidence of natural variation in meiotic recombination and will discuss potential implications of this variation on the functional mechanisms of crossover formation and control.
Affiliation(s)
- P P Khil
- Genetics and Biochemistry Branch, NIDDK, National Institutes of Health, Bethesda, Md., USA
28
Jiang P, Wu H, Wei J, Sang F, Sun X, Lu Z. RF-DYMHC: detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features. Nucleic Acids Res 2007; 35:W47-51. [PMID: 17478517 PMCID: PMC1933199 DOI: 10.1093/nar/gkm217] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 11/24/2022]
Abstract
In yeast, meiotic recombination is initiated by double-strand DNA breaks (DSBs), which occur at relatively high frequencies in some genomic regions (hotspots) and at relatively low frequencies in others (coldspots). Although observations of individual hot/cold spots have given clues about the mechanism of recombination initiation, predicting hot/cold spots from DNA sequence information is a challenging task. In this article, we introduce a random forest (RF) prediction model to detect recombination hot/cold spots in the yeast genome. Out-of-bag (OOB) estimation indicated that the RF classifier achieved high prediction performance, with 82.05% total accuracy and a Matthews correlation coefficient (MCC) of 0.638. The RF method outperforms an alternative machine-learning algorithm, the support vector machine (SVM), in both sensitivity and specificity. The prediction model is implemented as a web server (RF-DYMHC) and is freely available at http://www.bioinf.seu.edu.cn/Recombination/rf_dymhc.htm. Given a yeast genome and prediction parameters (RI-value and non-overlapping window scan size), the program reports the predicted hot/cold spots and marks them in color.
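Both performance figures quoted above are functions of a 2×2 confusion matrix (hotspot vs. coldspot). A minimal sketch of the standard definitions follows; the counts in the example are hypothetical, not the study's data. Libraries such as scikit-learn also provide `sklearn.metrics.matthews_corrcoef`.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 (inverted) to 1 (perfect)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: hotspots are the positive class.
print(accuracy(tp=40, tn=42, fp=8, fn=10))
print(matthews_corrcoef(tp=40, tn=42, fp=8, fn=10))
```

Unlike accuracy, MCC uses all four cells of the matrix, so it stays informative when hotspots and coldspots are unevenly represented.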
Affiliation(s)
- Zuhong Lu
- *To whom correspondence should be addressed: +86 25 83793779
29
Freudenberg J, Fu YH, Ptácek LJ. Human recombination rates are increased around accelerated conserved regions—evidence for continued selection? Bioinformatics 2007; 23:1441-3. [PMID: 17463031 DOI: 10.1093/bioinformatics/btm137] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
MOTIVATION We hypothesized that recombination rates might be increased at genetic loci that are subject to more intense selection. Here, we test this hypothesis by using a recently published set of accelerated conserved regions and fine-scale recombination rate estimates provided by the HapMap project. RESULTS We observed that fine-scale recombination rates are increased around conserved noncoding regions that show accelerated evolution in human or chimp, as compared to noncoding regions showing accelerated evolution in mouse and those being conserved between human and fugu. Recombination rates around hominid accelerated conserved regions (ACRs) are furthermore increased as compared to exonic regions. On the other hand, GC-content is reduced around ACRs, excluding a major confounding influence of GC-content on the observed variation in recombination rate. CONCLUSION Our observations indicate that selection intensity could be an important determinant of local recombination rate variation and that continued positive selection might act at many ACR loci. Alternatively, a confounding factor needs to be found that causes a congruent signal in recombination rate estimates based on human polymorphism data and in the comparative genomic data. Researchers who consider the explanation involving selection as more likely may expect more common functional sequence variants at ACRs in genetic association studies.
Affiliation(s)
- Jan Freudenberg
- University of California San Francisco, Department of Neurology, Institute of Human Genetics, San Francisco, CA 94158-2922, USA.
30
Raedt TD, Stephens M, Heyns I, Brems H, Thijs D, Messiaen L, Stephens K, Lazaro C, Wimmer K, Kehrer-Sawatzki H, Vidaud D, Kluwe L, Marynen P, Legius E. Conservation of hotspots for recombination in low-copy repeats associated with the NF1 microdeletion. Nat Genet 2006; 38:1419-23. [PMID: 17115058 DOI: 10.1038/ng1920] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Received: 07/06/2006] [Accepted: 10/16/2006] [Indexed: 11/08/2022]
Abstract
Several large-scale studies of human genetic variation have provided insights into processes such as recombination that have shaped human diversity. However, regions such as low-copy repeats (LCRs) have proven difficult to characterize, hindering efforts to understand the processes operating in these regions. We present a detailed study of genetic variation and underlying recombination processes in two copies of an LCR (NF1REPa and NF1REPc) on chromosome 17 involved in the generation of NF1 microdeletions and in a third copy (REP19) on chromosome 19 from which the others originated over 6.7 million years ago. We find evidence for shared hotspots of recombination among the LCRs. REP19 seems to contain hotspots in the same place as the nonallelic recombination hotspots in NF1REPa and NF1REPc. This apparent conservation of patterns of recombination hotspots in moderately diverged paralogous regions contrasts with recent evidence that these patterns are not conserved in less-diverged orthologous regions of chimpanzees.
Affiliation(s)
- Thomas De Raedt
- Department of Human Genetics, Catholic University Leuven, Leuven, Belgium
31
Ostrovsky O, Korostishevsky M, Levite I, Leiba M, Galski H, Gazit E, Vlodavsky I, Nagler A. Characterization of HPSE gene single nucleotide polymorphisms in Jewish populations of Israel. Acta Haematol 2006; 117:57-64. [PMID: 17095861 DOI: 10.1159/000096790] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 05/15/2006] [Accepted: 05/15/2006] [Indexed: 12/23/2022]
Abstract
Heparanase is a mammalian endoglucuronidase responsible for heparan sulfate (HS) degradation. HS is a major constituent of the extracellular matrix (ECM), and HS-degrading activity plays a decisive role in fundamental biological processes associated with remodeling of the ECM, such as cancer metastasis, angiogenesis and inflammation. There is great interest in the prospect of genome-wide association studies to identify genetic factors underlying complex diseases, so it is important to establish a detailed description of the heparanase (HPSE) gene single nucleotide polymorphisms (SNPs). In this study, four Israeli Jewish populations (Ashkenazi, North African, Mediterranean and Near Eastern) were examined for 7 HPSE gene SNPs. Four of the 7 SNPs (rs4693608, db11099592, rs4364254, db6856901) were found to be polymorphic. Population comparisons revealed significant differences in SNP allele frequencies between the Near Eastern population and each of the other three populations. Genotype and allele frequencies in Jewish populations differed from those in non-Jewish populations, except for some similarity to Caucasians. Although the distance between SNPs is relatively small, the db11099592 SNP was in linkage disequilibrium (LD) only with the proximal SNP rs4693608. LD between the distal SNPs rs4364254 and db6856901 was found only in Mediterraneans and North Africans. The current study provides a characterization of the normally occurring HPSE gene SNPs in different populations. This information is necessary for further studies on the linkage between these SNPs and heparanase expression and function in various pathological processes, primarily cancer progression.
Affiliation(s)
- Olga Ostrovsky
- Laboratory of Molecular Immunobiology, Department of Hematology and Bone Marrow Transplantation, Chaim Sheba Medical Center, Tel Hashomer, Israel.
32
Marjoram P, Tavaré S. Modern computational approaches for analysing molecular genetic variation data. Nat Rev Genet 2006; 7:759-70. [PMID: 16983372 DOI: 10.1038/nrg1961] [Citation(s) in RCA: 158] [Impact Index Per Article: 8.8] [Indexed: 11/09/2022]
Abstract
An explosive growth is occurring in the quantity, quality and complexity of molecular variation data that are being collected. Historically, such data have been analysed by using model-based methods. Models are useful for sharpening intuition, for explanation and for prediction: they add to our understanding of how the data were formed, and they can provide quantitative answers to questions of interest. We outline some of these model-based approaches, including the coalescent, and discuss the applicability of the computational methods that are necessary given the highly complex nature of current and future data sets.
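The coalescent mentioned in this abstract can be illustrated with a minimal simulation, assuming the standard neutral Kingman coalescent for a constant-size population with no recombination (a simplification chosen for illustration; this is not code from the review, and the function names are hypothetical). While k lineages remain, the waiting time to the next merger is exponential with rate k(k-1)/2 in units of 2N generations, so the simulated mean time to the most recent common ancestor should approach 2(1 - 1/n):

```python
import random


def simulate_coalescent_times(n, rng):
    """Return the inter-coalescence waiting times for a sample of n lineages.

    Time is measured in units of 2N generations (standard Kingman scaling):
    while k lineages remain, the next merger happens after an exponential
    waiting time with rate k*(k-1)/2.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


def mean_tmrca(n, replicates=20000, seed=1):
    """Average time to the most recent common ancestor over many replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        total += sum(simulate_coalescent_times(n, rng))
    return total / replicates


n = 10
print(mean_tmrca(n), 2 * (1 - 1 / n))  # simulated vs. theoretical E[T_MRCA]
```

Real analyses of the kind the review discusses add mutation, recombination and demography on top of this genealogy, which is where the computational cost grows.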
Affiliation(s)
- Paul Marjoram
- University of Southern California, Keck School of Medicine, Preventive Medicine, 1540 Alcazar Street, CHP-220, Los Angeles, California 90089-99011, USA
33
Abstract
MOTIVATION There is much local variation in recombination rates across the human genome, with the majority of recombination occurring in recombination hotspots: short regions approximately 2 kb in length that have much higher recombination rates than neighbouring regions. Knowledge of this local variation is important, e.g. in the design and analysis of association studies for disease genes. Population genetic data, such as that generated by the HapMap project, can be used to infer the location of these hotspots. We present a new, efficient and powerful method for detecting recombination hotspots from population data. RESULTS We compare our method with four current methods for detecting hotspots. It is orders of magnitude quicker, and has greater power, than two related approaches. It appears to be more powerful than HotspotFisher, though less accurate at inferring the precise positions of the hotspot. It was also more powerful than LDhot in some situations, particularly for weaker hotspots (10-40 times the background rate) when SNP density is lower (< 1/kb). AVAILABILITY Program, data sets, and full details of results are available at: http://www.maths.lancs.ac.uk/~fearnhea/Hotspot.
Affiliation(s)
- Paul Fearnhead
- Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, UK.
34
Conrad DF, Jakobsson M, Coop G, Wen X, Wall JD, Rosenberg NA, Pritchard JK. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet 2006; 38:1251-60. [PMID: 17057719 DOI: 10.1038/ng1911] [Citation(s) in RCA: 355] [Impact Index Per Article: 19.7] [Received: 06/28/2006] [Accepted: 09/22/2006] [Indexed: 12/30/2022]
Abstract
Recent genomic surveys have produced high-resolution haplotype information, but only in a small number of human populations. We report haplotype structure across 12 Mb of DNA sequence in 927 individuals representing 52 populations. The geographic distribution of haplotypes reflects human history, with a loss of haplotype diversity as distance increases from Africa. Although the extent of linkage disequilibrium (LD) varies markedly across populations, considerable sharing of haplotype structure exists, and inferred recombination hotspot locations generally match across groups. The four samples in the International HapMap Project contain the majority of common haplotypes found in most populations: averaging across populations, 83% of common 20-kb haplotypes in a population are also common in the most similar HapMap sample. Consequently, although the portability of tag SNPs based on the HapMap is reduced in low-LD Africans, the HapMap will be helpful for the design of genome-wide association mapping studies in nearly all human populations.
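The 83% figure summarizes, for each population, the fraction of its common haplotypes that are also common in the most similar HapMap sample. The abstract does not give the frequency cutoff for "common" or the windowing details, so the following is only a sketch of that kind of summary: the 5% default threshold, the function names and the toy data are hypothetical, and haplotypes are assumed to be already phased and cut into matching windows (represented as strings of alleles).

```python
from collections import Counter


def common_haplotypes(haplotypes, min_freq=0.05):
    """Set of haplotypes whose sample frequency is at least `min_freq`."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return {h for h, c in counts.items() if c / n >= min_freq}


def fraction_shared(pop, reference, min_freq=0.05):
    """Fraction of the common haplotypes in `pop` that are also common in `reference`."""
    common_pop = common_haplotypes(pop, min_freq)
    if not common_pop:
        return 0.0
    common_ref = common_haplotypes(reference, min_freq)
    return len(common_pop & common_ref) / len(common_pop)


def best_match_fraction(pop, references, min_freq=0.05):
    """Return (name, fraction) for the reference sample that shares the most."""
    scores = {name: fraction_shared(pop, ref, min_freq) for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data: each string is one phased haplotype over three SNPs.
pop = ["AAA"] * 5 + ["ACA"] * 4 + ["CCC"]
ref_1 = ["AAA"] * 6 + ["CCA"] * 4
ref_2 = ["AAA"] * 3 + ["ACA"] * 3 + ["CCC"] * 4
print(best_match_fraction(pop, {"ref_1": ref_1, "ref_2": ref_2}, min_freq=0.2))  # ('ref_2', 1.0)
```

Averaging the best-match fraction over many windows and populations would give a summary of the same shape as the one reported, though the study's exact procedure may differ.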
Affiliation(s)
- Donald F Conrad
- Department of Human Genetics, University of Chicago, 920 East 58th Street, Chicago, Illinois 60637, USA
35
Hellenthal G, Stephens M. Insights into recombination from population genetic variation. Curr Opin Genet Dev 2006; 16:565-72. [PMID: 17049225 DOI: 10.1016/j.gde.2006.10.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 07/19/2006] [Accepted: 10/04/2006] [Indexed: 11/20/2022]
Abstract
Patterns of genetic variation in natural populations are shaped by, and hence carry valuable information about, the underlying recombination process. In the past five years, the increasing availability of large-scale population genetic data on dense sets of markers, coupled with advances in statistical methods for extracting information from these data, has led to several important advances in our understanding of the recombination process in humans. These advances include the identification of large numbers of 'hotspots', where recombination appears to take place considerably more frequently than in the surrounding sequence, and the identification of DNA sequence motifs that are associated with the locations of these hotspots.
Affiliation(s)
- Garrett Hellenthal
- Department of Statistics, University of Washington, Seattle, WA 98195, USA
36
Li J, Zhang MQ, Zhang X. A new method for detecting human recombination hotspots and its applications to the HapMap ENCODE data. Am J Hum Genet 2006; 79:628-39. [PMID: 16960799 PMCID: PMC1592557 DOI: 10.1086/508066] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 05/09/2006] [Accepted: 07/25/2006] [Indexed: 11/03/2022]
Abstract
Computational detection of recombination hotspots from population polymorphism data is important both for understanding the nature of recombination and for applications such as association studies. We propose a new method for this task based on a multiple-hotspot model and an (approximate) log-likelihood ratio test. A truncated, weighted pairwise log-likelihood is introduced and applied to the calculation of the log-likelihood ratio, and a forward-selection procedure is adopted to search for the optimal hotspot predictions. The method shows relatively high power and a low false-positive rate in detecting multiple hotspots in simulated data, and performs comparably to the best results of leading computational methods on experimental data for which recombination hotspots have been characterized by sperm-typing experiments. The method can be applied directly to both phased and unphased data, with very fast computation. We applied the method to the ten 500-kb regions of the HapMap ENCODE data and found 172 hotspots among the three populations, with an average hotspot width of 2.4 kb. By comparison with the simulated data, we found some evidence that hotspots are not all identical across populations. The correlations between detected hotspots and several genomic characteristics were examined. In particular, we observed that DNaseI-hypersensitive sites are enriched in hotspots, suggesting the existence of human beta hotspots similar to those found in yeast.
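The search strategy described here, adding hotspots one at a time while the log-likelihood ratio keeps improving, can be sketched as a greedy forward-selection loop. This is a hedged illustration, not the authors' implementation: the truncated, weighted pairwise log-likelihood is replaced by a caller-supplied function, and the stopping threshold, the candidate positions and the toy likelihood below are all hypothetical.

```python
def forward_select(candidates, log_likelihood, threshold):
    """Greedy forward selection of hotspot positions.

    `log_likelihood(selected)` must return the (approximate) log-likelihood
    of the data under a model with hotspots at the positions in the tuple
    `selected`. At each step the candidate with the largest gain is added,
    but only if that gain (the log-likelihood ratio of the larger model
    against the current one) exceeds `threshold`.
    """
    selected = []
    current = log_likelihood(tuple(selected))
    remaining = list(candidates)
    while remaining:
        best_gain, best_pos = None, None
        for pos in remaining:
            gain = log_likelihood(tuple(selected + [pos])) - current
            if best_gain is None or gain > best_gain:
                best_gain, best_pos = gain, pos
        if best_gain <= threshold:
            break  # no remaining candidate improves the fit enough
        selected.append(best_pos)
        remaining.remove(best_pos)
        current += best_gain
    return sorted(selected)


# Hypothetical toy likelihood: true hotspots at 20 and 50 kb each add 10
# log-likelihood units, and every spurious hotspot costs 3 units.
TRUE_HOTSPOTS = {20, 50}


def toy_log_likelihood(selected):
    return sum(10.0 if pos in TRUE_HOTSPOTS else -3.0 for pos in selected)


print(forward_select([10, 20, 30, 40, 50], toy_log_likelihood, threshold=2.0))  # [20, 50]
```

Forward selection avoids scoring every subset of candidate positions, which is one reason such searches can stay fast; the price is that a greedy path may miss the globally best set of hotspots.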
Affiliation(s)
- Jun Li
- Bioinformatics Division, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
37
Padhukasahasram B, Wall JD, Marjoram P, Nordborg M. Estimating recombination rates from single-nucleotide polymorphisms using summary statistics. Genetics 2006; 174:1517-28. [PMID: 16980396 PMCID: PMC1667054 DOI: 10.1534/genetics.106.060723] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
We describe a novel method for jointly estimating crossing-over and gene-conversion rates from population genetic data using summary statistics. The performance of our method was tested on simulated data sets and compared with the composite-likelihood method of R. R. Hudson. For several realistic parameter values, the new method performed similarly to the composite-likelihood approach for estimating crossing-over rates and better when estimating gene-conversion rates. We used our method to analyze a human data set recently genotyped by Perlegen Sciences.
Affiliation(s)
- Badri Padhukasahasram
- Molecular and Computational Biology and Biostatistics Division, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California 90089, USA.
38
Abstract
Meiotic recombination in humans is thought to occur as part of the resolution of double-strand breaks (DSBs). Biases in the repair of DSBs can potentially distort the population frequencies of alleles at single-nucleotide polymorphisms. Genome-wide variation data provide evidence for a weak fixation bias in favour of G and C alleles that is strongest at the centre of inferred recombination hotspots.
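A fixation bias of the kind described here is often modelled, as an assumption, like weak genic selection with a bias coefficient b in favour of G/C alleles. Kimura's diffusion formula, u(p) = (1 - e^{-4N_e b p}) / (1 - e^{-4N_e b}), then shows how a bias that is tiny per generation still shifts fixation probabilities once 4N_e b is of order one. The population size and bias values below are illustrative only and are not estimates from this work; the formula assumes genic (additive) selection and equal census and effective population sizes.

```python
import math


def fixation_probability(p, n_e, s):
    """Kimura's diffusion approximation for the fixation probability.

    p:   initial allele frequency
    n_e: effective population size (diploid; census size assumed equal)
    s:   genic selection coefficient, here standing in for a repair bias b
    Computes (1 - exp(-4 n_e s p)) / (1 - exp(-4 n_e s)); expm1 keeps this
    accurate when 4 n_e s is small.
    """
    x = 4 * n_e * s
    if abs(x) < 1e-12:
        return p  # neutral limit: fixation probability equals initial frequency
    return math.expm1(-x * p) / math.expm1(-x)


n_e = 10_000             # hypothetical effective population size
p0 = 1 / (2 * n_e)       # frequency of a single new mutation
for b in (0.0, 1e-5, -1e-5):  # hypothetical per-generation biases
    ratio = fixation_probability(p0, n_e, b) / p0
    print(f"b = {b:+.0e}, 4*Ne*b = {4 * n_e * b:+.1f}, fixation relative to neutral = {ratio:.3f}")
```

With a positive bias, a new G or C allele fixes somewhat more often than a neutral one and the opposite allele somewhat less often, which is the direction of the effect reported near hotspot centres.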
Affiliation(s)
- C C A Spencer
- Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK.
39
Spencer CCA, Deloukas P, Hunt S, Mullikin J, Myers S, Silverman B, Donnelly P, Bentley D, McVean G. The influence of recombination on human genetic diversity. PLoS Genet 2006; 2:e148. [PMID: 17044736 PMCID: PMC1575889 DOI: 10.1371/journal.pgen.0020148] [Citation(s) in RCA: 203] [Impact Index Per Article: 11.3] [Received: 05/15/2006] [Accepted: 07/31/2006] [Indexed: 11/25/2022]
Abstract
In humans, the rate of recombination, as measured on the megabase scale, is positively associated with the level of genetic variation, as measured at the genic scale. Despite considerable debate, it is not clear whether these factors are causally linked or, if they are, whether this is driven by the repeated action of adaptive evolution or molecular processes such as double-strand break formation and mismatch repair. We introduce three innovations to the analysis of recombination and diversity: fine-scale genetic maps estimated from genotype experiments that identify recombination hotspots at the kilobase scale, analysis of an entire human chromosome, and the use of wavelet techniques to identify correlations acting at different scales. We show that recombination influences genetic diversity only at the level of recombination hotspots. Hotspots are also associated with local increases in GC content and the relative frequency of GC-increasing mutations but have no effect on substitution rates. Broad-scale association between recombination and diversity is explained through covariance of both factors with base composition. To our knowledge, these results are the first evidence of a direct and local influence of recombination hotspots on genetic variation and the fate of individual mutations. However, that hotspots have no influence on substitution rates suggests that they are too ephemeral on an evolutionary time scale to have a strong influence on broader scale patterns of base composition and long-term molecular evolution.
Patterns of genetic variation in the human genome provide a history of the evolutionary forces that have shaped our species. The role of one factor, recombination, in shaping variation is much debated. The observation is that regions of the genome with high recombination also have high levels of genetic variation, but this pattern can be interpreted as evidence for either repeated, widespread adaptive evolution or correlation through neutral factors such as base composition. To resolve this issue, the authors constructed a genetic map of human Chromosome 20 with a resolution more than three orders of magnitude greater than previous maps. By comparing the location of recombination hotspots with patterns of genetic variation, evolution, and base composition, the authors show that recombination has only a very local influence on diversity, which suggests that molecular mechanisms, such as mismatch-associated repair or double-strand break formation, rather than adaptive evolution, drive the association.
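The wavelet idea in this abstract, measuring correlation separately at each spatial scale, can be sketched with a plain Haar transform, assuming two equally spaced series along a chromosome (for example per-window recombination rate and diversity) whose length is a power of two. The choice of the Haar wavelet, the level-by-level Pearson correlation and all function names are assumptions for illustration; the paper's actual wavelet, boundary handling and significance testing are not described in the abstract.

```python
import math


def haar_details(signal):
    """Return Haar detail coefficients per level, finest level first.

    At each level, neighbouring pairs are split into a scaled average
    (carried to the next, coarser level) and a scaled difference (the
    detail coefficient at that scale). Length must be a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    levels = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        details = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        levels.append(details)
    return levels


def pearson(x, y):
    """Pearson correlation; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def scale_correlations(x, y):
    """Correlate the Haar details of x and y level by level (level 1 = finest).

    Levels with fewer than three coefficients are skipped.
    """
    result = {}
    for level, (dx, dy) in enumerate(zip(haar_details(x), haar_details(y)), start=1):
        if len(dx) >= 3:
            result[level] = pearson(dx, dy)
    return result
```

Level 1 holds the finest-scale differences between neighbouring windows, and each higher level doubles the scale. A correlation that appears only at the finest levels would be consistent with the abstract's conclusion that recombination influences diversity only at the level of hotspots.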
Affiliation(s)
- Panos Deloukas
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Sarah Hunt
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Jim Mullikin
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Simon Myers
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Broad Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Bernard Silverman
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Peter Donnelly
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Gil McVean
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- * To whom correspondence should be addressed. E-mail: