1
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. Am J Hum Genet 2024; 111:691-700. [PMID: 38513668 PMCID: PMC11023918 DOI: 10.1016/j.ajhg.2024.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/23/2024]
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
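The fold-increase quoted in this abstract can be turned into a rough count for the earlier familial analysis. The short sketch below is only a back-of-the-envelope check: it treats "more than 9 million" as exactly 9,000,000 (an assumption, since the abstract gives a lower bound) and the variable names are illustrative.

```python
# Figures taken from the abstract; 9_000_000 is a lower bound ("more than 9 million").
converted_alleles = 9_000_000
fold_increase = 2_900  # "2,900 times more" than the previous familial analysis

# Implied number of converted alleles detected in the earlier family-based study.
implied_previous = converted_alleles / fold_increase
print(f"implied previous count: about {implied_previous:,.0f} alleles")
```

Under these assumptions the earlier analysis would have detected on the order of 3,100 converted alleles.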
Affiliation(s)
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA.
2
Schrider DR. Allelic gene conversion softens selective sweeps. bioRxiv 2023:2023.12.05.570141. [PMID: 38106127 PMCID: PMC10723294 DOI: 10.1101/2023.12.05.570141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The prominence of positive selection, in which beneficial mutations are favored by natural selection and rapidly increase in frequency, is a subject of intense debate. Positive selection can result in selective sweeps, in which the haplotype(s) bearing the adaptive allele "sweep" through the population, thereby removing much of the genetic diversity from the region surrounding the target of selection. Two models of selective sweeps have been proposed: classical sweeps, or "hard sweeps", in which a single copy of the adaptive allele sweeps to fixation, and "soft sweeps", in which multiple distinct copies of the adaptive allele leave descendants after the sweep. Soft sweeps can be the outcome of recurrent mutation to the adaptive allele, or the presence of standing genetic variation consisting of multiple copies of the adaptive allele prior to the onset of selection. Importantly, soft sweeps will be common when populations can rapidly adapt to novel selective pressures, either because of a high mutation rate or because adaptive alleles are already present. The prevalence of soft sweeps is especially controversial, and it has been noted that selection on standing variation or recurrent mutations may not always produce soft sweeps. Here, we show that the inverse is true: selection on single-origin de novo mutations may often result in an outcome that is indistinguishable from a soft sweep. This is made possible by allelic gene conversion, which "softens" hard sweeps by copying the adaptive allele onto multiple genetic backgrounds, a process we refer to as a "pseudo-soft" sweep. We carried out a simulation study examining the impact of gene conversion on sweeps from a single de novo variant in models of human, Drosophila, and Arabidopsis populations. The fraction of simulations in which gene conversion had produced multiple haplotypes with the adaptive allele upon fixation was appreciable. 
Indeed, under realistic demographic histories and gene conversion rates, even if selection always acts on a single-origin mutation, sweeps involving multiple haplotypes are more likely than hard sweeps in large populations, especially when selection is not extremely strong. Thus, even when the mutation rate is low or there is no standing variation, hard sweeps are expected to be the exception rather than the rule in large populations. These results also imply that the presence of signatures of soft sweeps does not necessarily mean that adaptation has been especially rapid or is not mutation limited.
Affiliation(s)
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
3
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. bioRxiv 2023:2023.11.03.565574. [PMID: 37961601 PMCID: PMC10635131 DOI: 10.1101/2023.11.03.565574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more efficient collection and storage of identity by descent (IBD) information than approaches that detect and store pairwise IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach.
Affiliation(s)
- Brian L. Browning
- Department of Biostatistics, University of Washington, Seattle, WA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
4
Ki C, Terhorst J. Exact Decoding of a Sequentially Markov Coalescent Model in Genetics. J Am Stat Assoc 2023; 119:2242-2255. [PMID: 39323740 PMCID: PMC11421421 DOI: 10.1080/01621459.2023.2252570] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/06/2020] [Revised: 08/01/2023] [Accepted: 08/17/2023] [Indexed: 09/27/2024]
Abstract
In statistical genetics, the sequentially Markov coalescent (SMC) is an important family of models for approximating the distribution of genetic variation data under complex evolutionary models. Methods based on SMC are widely used in genetics and evolutionary biology, with significant applications to genotype phasing and imputation, recombination rate estimation, and inferring population history. SMC allows for likelihood-based inference using hidden Markov models (HMMs), where the latent variable represents a genealogy. Because genealogies are continuous, while HMMs are discrete, SMC requires discretizing the space of trees in a way that is awkward and creates bias. In this work, we propose a method that circumvents this requirement, enabling SMC-based inference to be performed in the natural setting of a continuous state space. We derive fast, exact procedures for frequentist and Bayesian inference using SMC. Compared to existing methods, ours requires minimal user intervention or parameter tuning, no numerical optimization or E-M, and is faster and more accurate.
Affiliation(s)
- Caleb Ki
- Department of Statistics, University of Michigan
5
Lauterbur ME, Cavassim MIA, Gladstein AL, Gower G, Pope NS, Tsambos G, Adrion J, Belsare S, Biddanda A, Caudill V, Cury J, Echevarria I, Haller BC, Hasan AR, Huang X, Iasi LNM, Noskova E, Obsteter J, Pavinato VAC, Pearson A, Peede D, Perez MF, Rodrigues MF, Smith CCR, Spence JP, Teterina A, Tittes S, Unneberg P, Vazquez JM, Waples RK, Wohns AW, Wong Y, Baumdicker F, Cartwright RA, Gorjanc G, Gutenkunst RN, Kelleher J, Kern AD, Ragsdale AP, Ralph PL, Schrider DR, Gronau I. Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations. eLife 2023; 12:RP84874. [PMID: 37342968 DOI: 10.7554/elife.84874] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/23/2023]
Abstract
Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.
Affiliation(s)
- M Elise Lauterbur
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Maria Izabel A Cavassim
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Graham Gower
- Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Nathaniel S Pope
- Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Georgia Tsambos
- School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Jeffrey Adrion
- Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Ancestry DNA, San Francisco, United States
- Saurabh Belsare
- Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Victoria Caudill
- Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Jean Cury
- Universite Paris-Saclay, CNRS, INRIA, Laboratoire Interdisciplinaire des Sciences du Numerique, Orsay, France
- Benjamin C Haller
- Department of Computational Biology, Cornell University, Ithaca, United States
- Ahmed R Hasan
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Department of Biology, University of Toronto Mississauga, Mississauga, Canada
- Xin Huang
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
- Ekaterina Noskova
- Computer Technologies Laboratory, ITMO University, St Petersburg, Russian Federation
- Jana Obsteter
- Agricultural Institute of Slovenia, Department of Animal Science, Ljubljana, Slovenia
- Alice Pearson
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- David Peede
- Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, United States
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Manolo F Perez
- Department of Genetics and Evolution, Federal University of Sao Carlos, Sao Carlos, Brazil
- Murillo F Rodrigues
- Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Chris C R Smith
- Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Jeffrey P Spence
- Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Anastasia Teterina
- Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Silas Tittes
- Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Per Unneberg
- Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Juan Manuel Vazquez
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
- Ryan K Waples
- Department of Biostatistics, University of Washington, Seattle, United States
- Yan Wong
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Franz Baumdicker
- Cluster of Excellence - Controlling Microbes to Fight Infections, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Reed A Cartwright
- School of Life Sciences and The Biodesign Institute, Arizona State University, Tempe, United States
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Andrew D Kern
- Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Aaron P Ragsdale
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, United States
- Peter L Ralph
- Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Department of Mathematics, University of Oregon, Eugene, United States
- Daniel R Schrider
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Ilan Gronau
- Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
6
Guarracino A, Buonaiuto S, de Lima LG, Potapova T, Rhie A, Koren S, Rubinstein B, Fischer C, Gerton JL, Phillippy AM, Colonna V, Garrison E. Recombination between heterologous human acrocentric chromosomes. Nature 2023; 617:335-343. [PMID: 37165241 PMCID: PMC10172130 DOI: 10.1038/s41586-023-05976-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 08/15/2022] [Accepted: 03/17/2023] [Indexed: 05/12/2023]
Abstract
The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications [1,2]. Although the resolution of these regions in the first complete assembly of a human genome (the Telomere-to-Telomere Consortium's CHM13 assembly, T2T-CHM13) provided a model of their homology [3], it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium [4] (HPRC), we find that contigs from all of the SAACs form a community. A variation graph [5] constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination [6,7]. The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations [8], and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago [9].
Affiliation(s)
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
- Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Vincenza Colonna
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA.
7
Setter D, Ebdon S, Jackson B, Lohse K. Estimating the rates of crossover and gene conversion from individual genomes. Genetics 2022; 222:iyac100. [PMID: 35771626 PMCID: PMC9434185 DOI: 10.1093/genetics/iyac100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 06/01/2022] [Indexed: 11/14/2022]
Abstract
Recombination can occur either as a result of crossover or gene conversion events. Population genetic methods for inferring the rate of recombination from patterns of linkage disequilibrium generally assume a simple model of recombination that only involves crossover events and ignore gene conversion. However, distinguishing the 2 processes is not only necessary for a complete description of recombination, but also essential for understanding the evolutionary consequences of inversions and other genomic partitions in which crossover (but not gene conversion) is reduced. We present heRho, a simple composite likelihood scheme for coestimating the rate of crossover and gene conversion from individual diploid genomes. The method is based on analytic results for the distance-dependent probability of heterozygous and homozygous states at 2 loci. We apply heRho to simulations and data from the house mouse Mus musculus castaneus, a well-studied model. Our analyses show (1) that the rates of crossover and gene conversion can be accurately coestimated at the level of individual chromosomes and (2) that previous estimates of the population scaled rate of recombination ρ = 4N_e r under a pure crossover model are likely biased.
Affiliation(s)
- Derek Setter
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
- Sam Ebdon
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
- Ben Jackson
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
- Konrad Lohse
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
8
Wall JD, Robinson JA, Cox LA. High-Resolution Estimates of Crossover and Noncrossover Recombination from a Captive Baboon Colony. Genome Biol Evol 2022; 14:evac040. [PMID: 35325119 PMCID: PMC9048888 DOI: 10.1093/gbe/evac040] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Accepted: 03/02/2022] [Indexed: 11/17/2022]
Abstract
Homologous recombination has been extensively studied in humans and a handful of model organisms. Much less is known about recombination in other species, including nonhuman primates. Here, we present a study of crossovers (COs) and noncrossover (NCO) recombination in olive baboons (Papio anubis) from two pedigrees containing a total of 20 paternal and 17 maternal meioses, and compare these results to linkage disequilibrium (LD) based recombination estimates from 36 unrelated olive baboons. We demonstrate how COs, combined with LD-based recombination estimates, can be used to identify genome assembly errors. We also quantify sex-specific differences in recombination rates, including elevated male CO and reduced female CO rates near telomeres. Finally, we add to the increasing body of evidence suggesting that while most NCO recombination tracts in mammals are short (e.g., <500 bp), there is a non-negligible fraction of longer (e.g., >1 kb) NCO tracts. For NCO tracts shorter than 10 kb, we fit a mixture of two (truncated) geometric distributions model to the NCO tract length distribution and estimate that >99% of all NCO tracts are very short (mean 24 bp), but the remaining tracts can be quite long (mean 4.3 kb). A single geometric distribution model for NCO tract lengths is incompatible with the data, suggesting that LD-based methods for estimating NCO recombination rates that make this assumption may need to be modified.
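The two-component tract-length model described in this abstract can be illustrated with a short numerical sketch. This is not the authors' fitting procedure: it ignores the 10 kb truncation, fixes the short-tract weight at exactly 0.99 (the abstract only states ">99%"), and uses illustrative function names. It shows how a small fraction of long tracts dominates both the mean and the upper tail.

```python
def geometric_tail(mean_bp: float, k: int) -> float:
    """P(L > k) for a geometric length L on {1, 2, ...} with the given mean."""
    p = 1.0 / mean_bp  # per-base probability that the tract ends
    return (1.0 - p) ** k


def mixture_mean(w_short: float, mean_short: float, mean_long: float) -> float:
    """Mean tract length of a two-component geometric mixture."""
    return w_short * mean_short + (1.0 - w_short) * mean_long


def mixture_tail(k: int, w_short: float, mean_short: float, mean_long: float) -> float:
    """P(L > k) under the two-component geometric mixture."""
    return (w_short * geometric_tail(mean_short, k)
            + (1.0 - w_short) * geometric_tail(mean_long, k))


# Component means come from the abstract; the weight 0.99 is an assumed value.
W_SHORT, MEAN_SHORT, MEAN_LONG = 0.99, 24.0, 4300.0

mean_len = mixture_mean(W_SHORT, MEAN_SHORT, MEAN_LONG)
frac_over_1kb = mixture_tail(1000, W_SHORT, MEAN_SHORT, MEAN_LONG)
print(f"mixture mean: {mean_len:.1f} bp")
print(f"fraction of tracts > 1 kb: {frac_over_1kb:.4f}")
```

With these assumed values the mixture mean is roughly 67 bp, almost three times the short-component mean, and somewhat under 1% of tracts exceed 1 kb, nearly all of them from the long component. A single geometric distribution cannot reproduce both features at once, which is the point the abstract makes about LD-based methods.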
Affiliation(s)
- Jeffrey D. Wall
- Institute for Human Genetics, University of California San Francisco, USA
- Laura A. Cox
- Center for Precision Medicine, Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, USA
9
Harkness A, Goldberg EE, Brandvain Y. Diversification or Collapse of Self-Incompatibility Haplotypes as a Rescue Process. Am Nat 2021; 197:E89-E109. [DOI: 10.1086/712424] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/03/2022]
10
Abstract
Accurately inferring the genome-wide landscape of recombination rates in natural populations is a central aim in genomics, as patterns of linkage influence everything from genetic mapping to understanding evolutionary history. Here, we describe recombination landscape estimation using recurrent neural networks (ReLERNN), a deep learning method for estimating a genome-wide recombination map that is accurate even with small numbers of pooled or individually sequenced genomes. Rather than use summaries of linkage disequilibrium as its input, ReLERNN takes columns from a genotype alignment, which are then modeled as a sequence across the genome using a recurrent neural network. We demonstrate that ReLERNN improves accuracy and reduces bias relative to existing methods and maintains high accuracy in the face of demographic model misspecification, missing genotype calls, and genome inaccessibility. We apply ReLERNN to natural populations of African Drosophila melanogaster and show that genome-wide recombination landscapes, although largely correlated among populations, exhibit important population-specific differences. Lastly, we connect the inferred patterns of recombination with the frequencies of major inversions segregating in natural Drosophila populations.
Affiliation(s)
- Jeffrey R Adrion
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR
- Jared G Galloway
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR
- Andrew D Kern
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR
11
Adapting Biased Gene Conversion theory to account for intensive GC-content deterioration in the human genome by novel mutations. PLoS One 2020; 15:e0232167. [PMID: 32353016 PMCID: PMC7192473 DOI: 10.1371/journal.pone.0232167] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/01/2019] [Accepted: 04/09/2020] [Indexed: 12/23/2022]
Abstract
We examined seventy million well-characterized human mutations, and their impact on G+C-compositional dynamics, in order to understand the formation and maintenance of major genomic nucleotide sequence patterns. Among novel mutations, those that change a strong (S) base pair G:C/C:G to a weak (W) pair A:T/T:A occur at nearly twice the frequency of the opposite mutations. Such imbalance puts strong downward pressure on overall GC-content. However, along protracted paths to fixation, S→W mutations are much less likely to propagate than W→S mutations. The magnitude of relative propagation disadvantages for S→W mutations is inexplicable by any currently-accepted model. This fact forced us to re-examine the quantitative features of Biased Gene Conversion (BGC) theory. Revised parameters of BGC that, per average individual, convert 7–14 W base pairs into S pairs, would account for the S-content turnover differences between new and old mutations, and make BGC an instrumental force for nucleotide dynamics and evolution. BGC should thus be considered seriously in both theories and biomedical practice. In particular, BGC should be taken into account during allele imputations, where missing SNP alleles are computationally predicted based on the information about several neighboring alleles. Finally, we analyzed the effect of neighboring nucleotide context on the mutation frequencies, dynamics, and GC-composition turnover. For this purpose, we examined genomic regions having extremely biased nucleotide compositions (enriched for S-, W-, purine/pyrimidine strand asymmetry, or AC/GT-strand asymmetry). It was found that point mutations in these regions preferentially degrade the nucleotide inhomogeneities, decreasing the sequence biases. Degradation of sequence bias is highest for novel mutations, and considerably lower for older mutations (those widespread across populations). 
Besides BGC, there may be additional, still uncharacterized molecular mechanisms that either preserve genomic regions with biased nucleotide compositions from mutational degradation or fail to degrade such inhomogeneities in specific chromosomal regions.
12
Campbell MC, Ashong B, Teng S, Harvey J, Cross CN. Multiple selective sweeps of ancient polymorphisms in and around LTα located in the MHC class III region on chromosome 6. BMC Evol Biol 2019; 19:218. [PMID: 31791241 PMCID: PMC6889576 DOI: 10.1186/s12862-019-1516-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/28/2019] [Accepted: 09/20/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Lymphotoxin-α (LTα), located in the Major Histocompatibility Complex (MHC) class III region on chromosome 6, encodes a cytotoxic protein that mediates a variety of antiviral responses among other biological functions. Furthermore, several genotypes at this gene have been implicated in the onset of a number of complex diseases, including myocardial infarction, autoimmunity, and various types of cancer. However, little is known about levels of nucleotide variation and linkage disequilibrium (LD) in and near LTα, which could also influence phenotypic variance. To address this gap in knowledge, we examined sequence variation across ~10 kilobases (kb), encompassing LTα and the upstream region, in 2039 individuals from the 1000 Genomes Project originating from 21 global populations.
RESULTS: Here, we observed striking patterns of diversity, including an excess of intermediate-frequency alleles, the maintenance of multiple common haplotypes and a deep coalescence time for variation (dating >1.0 million years ago), in global populations. While these results are generally consistent with a model of balancing selection, we also uncovered a signature of positive selection in the form of long-range LD on chromosomes with derived alleles primarily in Eurasian populations. To reconcile these findings, which appear to support different models of selection, we argue that selective sweeps (particularly, soft sweeps) of multiple derived alleles in and/or near LTα occurred in non-Africans after their ancestors left Africa. Furthermore, these targets of selection were predicted to alter transcription factor binding site affinity and protein stability, suggesting they play a role in gene function. Additionally, our data showed that a subset of these functional adaptive variants is present in archaic hominin genomes.
CONCLUSIONS: Overall, this study identified candidate functional alleles in a biologically relevant genomic region and offers new insights into the evolutionary origins of these loci in modern human populations.
Affiliation(s)
- Michael C. Campbell
- Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
- Bryan Ashong
- Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
- Shaolei Teng
- Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
- Jayla Harvey
- Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
- Christopher N. Cross
- Department of Anatomy, College of Medicine, Howard University, Washington, DC 20059 USA
|
13
|
Tian X, Browning BL, Browning SR. Estimating the Genome-wide Mutation Rate with Three-Way Identity by Descent. Am J Hum Genet 2019; 105:883-893. [PMID: 31587867 DOI: 10.1016/j.ajhg.2019.09.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2019] [Accepted: 09/09/2019] [Indexed: 12/20/2022] Open
Abstract
The two primary methods for estimating the genome-wide mutation rate have been counting de novo mutations in parent-offspring trios and comparing sequence data between closely related species. With parent-offspring trio analysis it is difficult to control for genotype error, and resolution is limited because each trio provides information from only two meioses. Inter-species comparison is difficult to calibrate due to uncertainty in the number of meioses separating species, and it can be biased by selection and by changing mutation rates over time. An alternative class of approaches for estimating mutation rates that avoids these limitations is based on identity by descent (IBD) segments that arise from common ancestry within the past few thousand years. Existing IBD-based methods are limited to highly inbred samples, or lack robustness to genotype error and error in the estimated demographic history. We present an IBD-based method that uses sharing of IBD segments among sets of three individuals to estimate the mutation rate. Our method is applicable to accurately phased genotype data, such as parent-offspring trio data phased using Mendelian rules of inheritance. Unlike standard parent-offspring analysis, our method utilizes distant relationships and is robust to genotype error. We apply our method to data from 1,307 European-ancestry individuals in the Framingham Heart Study sequenced by the NHLBI TOPMed project. We obtain an estimate of 1.29 × 10⁻⁸ mutations per base pair per meiosis with a 95% confidence interval of [1.02 × 10⁻⁸, 1.56 × 10⁻⁸].
|
14
|
Schweiger R, Erlich Y, Carmi S. FactorialHMM: fast and exact inference in factorial hidden Markov models. Bioinformatics 2019; 35:2162-2164. [PMID: 30445428 DOI: 10.1093/bioinformatics/bty944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2018] [Revised: 11/07/2018] [Accepted: 11/13/2018] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Hidden Markov models (HMMs) are powerful tools for modeling processes along the genome. In a standard genomic HMM, observations are drawn, at each genomic position, from a distribution whose parameters depend on a hidden state, and the hidden states evolve along the genome as a Markov chain. Often, the hidden state is the Cartesian product of multiple processes, each evolving independently along the genome. Inference in these so-called Factorial HMMs has a naïve running time that scales as the square of the number of possible states, which by itself increases exponentially with the number of sub-chains; such a running time scaling is impractical for many applications. While faster algorithms exist, there is no available implementation suitable for developing bioinformatics applications. RESULTS We developed FactorialHMM, a Python package for fast exact inference in Factorial HMMs. Our package allows simulating either directly from the model or from the posterior distribution of states given the observations. Additionally, we allow the inference of all key quantities related to HMMs: (i) the (Viterbi) sequence of states with the highest posterior probability; (ii) the likelihood of the data and (iii) the posterior probability (given all observations) of the marginal and pairwise state probabilities. The running time and space requirement of all procedures is linearithmic in the number of possible states. Our package is highly modular, providing the user with maximal flexibility for developing downstream applications. AVAILABILITY AND IMPLEMENTATION https://github.com/regevs/factorial_hmm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
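The quadratic cost mentioned in the abstract can be illustrated with a minimal sketch. It does not use the FactorialHMM package or its API; it builds the Cartesian-product state space by hand and runs the textbook forward algorithm on it. The function names, the three binary sub-chains, and the emission model are all hypothetical choices made for illustration.

```python
from itertools import product


def product_chain(sub_transitions):
    """Combine independent sub-chain transition matrices into a single
    transition matrix over the Cartesian-product state space."""
    sizes = [len(t) for t in sub_transitions]
    states = list(product(*[range(s) for s in sizes]))
    trans = []
    for a in states:
        row = []
        for b in states:
            p = 1.0
            # Sub-chains evolve independently, so joint probabilities multiply.
            for k, t in enumerate(sub_transitions):
                p *= t[a[k]][b[k]]
            row.append(p)
        trans.append(row)
    return states, trans


def forward_likelihood(init, trans, emit, observations):
    """Naive forward algorithm, O(T * S^2) for S joint states.
    emit[s][o] is the probability of symbol o in joint state s."""
    n = len(init)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


# Three hypothetical binary sub-chains give 2**3 = 8 joint states.
sub = [[[0.9, 0.1], [0.2, 0.8]]] * 3
states, trans = product_chain(sub)
init = [1.0 / len(states)] * len(states)
# Toy emission: the chance of observing 1 grows with the number of "on" sub-chains.
emit = [[1 - (0.1 + 0.25 * sum(s)), 0.1 + 0.25 * sum(s)] for s in states]
likelihood = forward_likelihood(init, trans, emit, [0, 1, 1, 0])
```

Each added binary sub-chain doubles the number of joint states and therefore quadruples the work in the inner sum, which is the scaling the abstract describes as impractical.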
Affiliation(s)
- Regev Schweiger
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel; MyHeritage, Or Yehuda, Israel
| | - Yaniv Erlich
- MyHeritage, Or Yehuda, Israel; Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY, USA; Department of Systems Biology, Center for Computational Biology and Bioinformatics (C2B2), Columbia University, New York, NY, USA; New York Genome Center, New York, NY, USA
| | - Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
|
15
|
Dutta R, Saha-Mandal A, Cheng X, Qiu S, Serpen J, Fedorova L, Fedorov A. 1000 human genomes carry widespread signatures of GC biased gene conversion. BMC Genomics 2018; 19:256. [PMID: 29661137 PMCID: PMC5902838 DOI: 10.1186/s12864-018-4593-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2017] [Accepted: 03/12/2018] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND GC-Biased Gene Conversion (gBGC) is one of the important theories put forward to explain profound long-range non-randomness in nucleotide compositions along mammalian chromosomes. Nucleotide changes due to gBGC are hard to distinguish from regular mutations. Here, we present an algorithm for analysis of millions of known SNPs that detects a subset of so-called "SNP flip-over" events representing recent gBGC nucleotide changes, which occurred in previous generations via non-crossover meiotic recombination. RESULTS This algorithm has been applied in a large-scale analysis of 1092 sequenced human genomes. Altogether, 56,328 regions on all autosomes have been examined, which revealed 223,955 putative gBGC cases leading to SNP flip-overs. We detected a strong bias (11.7% ± 0.2% excess) in AT→GC over GC→AT base pair changes within the entire set of putative gBGC cases. CONCLUSIONS On average, a human gamete acquires 7 SNP flip-over events, in which one allele is replaced by its complementary allele during the process of meiotic non-crossover recombination. In each meiosis event, on average, gBGC results in replacement of 7 AT base pairs by GC base pairs, while only 6 GC pairs are replaced by AT pairs. Therefore, every human gamete is enriched by one GC pair. Happening over millions of years of evolution, this bias may be a noticeable force in changing the nucleotide composition landscape along chromosomes.
Affiliation(s)
- Rajib Dutta
- Program in Biomedical Sciences, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Present Address: Center for Cardiovascular and Pulmonary Research, Nationwide Children’s Hospital, 700 Children’s Dr, Columbus, OH USA
| | - Arnab Saha-Mandal
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Present Address: Biochemistry and Molecular Biology Graduate Program, Cumming School of Medicine, University of Calgary, Calgary, AB T2N4N1 Canada
| | - Xi Cheng
- Program in Biomedical Sciences, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
| | - Shuhao Qiu
- Program in Biomedical Sciences, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
| | - Jasmine Serpen
- SURF Program, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- College of Arts and Sciences, Washington University in St. Louis, 1 Brookings Dr, St. Louis, MO 63130 USA
| | | | - Alexei Fedorov
- Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
|
16
|
Population genetic evidence for positive and purifying selection acting at the human IFN-γ locus in Africa. Genes Immun 2018; 20:143-157. [PMID: 29599512 DOI: 10.1038/s41435-018-0016-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2017] [Revised: 01/22/2018] [Accepted: 01/26/2018] [Indexed: 01/09/2023]
Abstract
Despite its critical role in the defense against microbial infection and tumor development, little is known about the range of nucleotide and haplotype variation at IFN-γ, or the evolutionary forces that have shaped patterns of diversity at this locus. To address this gap in knowledge, we examined sequence data from the IFN-γ gene in 1461 individuals from 15 worldwide populations. Our analyses uncovered novel patterns of variation in distinct African populations, including an excess of high frequency-derived alleles, unusually long haplotype structure surrounding the IFN-γ gene, and a "star-like" genealogy of African-specific haplotypes carrying variants previously associated with infectious disease. We also inferred a deep time to coalescence of variation at IFN-γ (~ 0.8 million years ago) and ancient ages for common polymorphisms predating the evolution of modern humans. Taken together, these results are congruent with a model of positive selection on standing variation in African populations. Furthermore, we inferred that common variants in intron 3 of IFN-γ are the likely targets of selection. In addition, we observed a paucity of non-synonymous substitutions relative to synonymous changes in the exons of IFN-γ in African and non-African populations, suggestive of strong purifying selection. Therefore, we contend that positive and purifying selection have influenced levels of diversity in different regions of IFN-γ, implying that these distinct genic regions are, or have been, functionally important. Overall, this study provides additional insights into the evolutionary events that have contributed to the frequency and distribution of alleles having a role in human health and disease.
|
17
|
Fyon F, Lenormand T. Cis-regulator runaway and divergence in asexuals. Evolution 2018; 72:426-439. [DOI: 10.1111/evo.13424] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2017] [Revised: 12/06/2017] [Accepted: 12/09/2017] [Indexed: 12/26/2022]
Affiliation(s)
- Frédéric Fyon
- CEFE, CNRS, Univ Montpellier, Univ Paul Valéry Montpellier 3, EPHE, IRD; Montpellier France
| | - Thomas Lenormand
- CEFE, CNRS, Univ Montpellier, Univ Paul Valéry Montpellier 3, EPHE, IRD; Montpellier France
|
18
|
Korunes KL, Noor MAF. Gene conversion and linkage: effects on genome evolution and speciation. Mol Ecol 2016; 26:351-364. [DOI: 10.1111/mec.13736] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2016] [Revised: 06/07/2016] [Accepted: 06/22/2016] [Indexed: 12/12/2022]
|
19
|
Affiliation(s)
- Yun S Song
- Computer Science Division and Department of Statistics, University of California, Berkeley, California 94720, Department of Mathematics and Department of Biology and University of Pennsylvania, Philadelphia, Pennsylvania 19104
|
20
|
Beck EA, Thompson AC, Sharbrough J, Brud E, Llopart A. Gene flow between Drosophila yakuba and Drosophila santomea in subunit V of cytochrome c oxidase: A potential case of cytonuclear cointrogression. Evolution 2015; 69:1973-86. [PMID: 26155926 PMCID: PMC5042076 DOI: 10.1111/evo.12718] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2014] [Revised: 06/13/2015] [Accepted: 06/16/2015] [Indexed: 12/11/2022]
Abstract
Introgression is the effective exchange of genetic information between species through natural hybridization. Previous genetic analyses of the Drosophila yakuba—D. santomea hybrid zone showed that the mitochondrial genome of D. yakuba had introgressed into D. santomea and completely replaced its native form. Since mitochondrial proteins work intimately with nuclear‐encoded proteins in the oxidative phosphorylation (OXPHOS) pathway, we hypothesized that some nuclear genes in OXPHOS cointrogressed along with the mitochondrial genome. We analyzed nucleotide variation in the 12 nuclear genes that form cytochrome c oxidase (COX) in 33 Drosophila lines. COX is an OXPHOS enzyme composed of both nuclear‐ and mitochondrial‐encoded proteins and shows evidence of cytonuclear coadaptation in some species. Using maximum‐likelihood methods, we detected significant gene flow from D. yakuba to D. santomea for the entire COX complex. Interestingly, the signal of introgression is concentrated in the three nuclear genes composing subunit V, which shows population migration rates significantly greater than the background level of introgression in these species. The detection of introgression in three proteins that work together, interact directly with the mitochondrial‐encoded core, and are critical for early COX assembly suggests this could be a case of cytonuclear cointrogression.
Affiliation(s)
- Emily A Beck
- Interdisciplinary Graduate Program in Genetics, The University of Iowa, Iowa City, Iowa, 52242
| | - Aaron C Thompson
- The Department of Biology, The University of Iowa, Iowa City, IA, 52242
| | - Joel Sharbrough
- The Department of Biology, The University of Iowa, Iowa City, IA, 52242
| | - Evgeny Brud
- The Department of Biology, The University of Iowa, Iowa City, IA, 52242
| | - Ana Llopart
- Interdisciplinary Graduate Program in Genetics, The University of Iowa, Iowa City, Iowa, 52242; The Department of Biology, The University of Iowa, Iowa City, IA, 52242.
|
21
|
Williams AL, Genovese G, Dyer T, Altemose N, Truax K, Jun G, Patterson N, Myers SR, Curran JE, Duggirala R, Blangero J, Reich D, Przeworski M. Non-crossover gene conversions show strong GC bias and unexpected clustering in humans. eLife 2015; 4. [PMID: 25806687 PMCID: PMC4404656 DOI: 10.7554/elife.04637] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2014] [Accepted: 03/20/2015] [Indexed: 12/15/2022] Open
Abstract
Although the past decade has seen tremendous progress in our understanding of fine-scale recombination, little is known about non-crossover (NCO) gene conversion. We report the first genome-wide study of NCO events in humans. Using SNP array data from 98 meioses, we identified 103 sites affected by NCO, of which 50/52 were confirmed in sequence data. Overlap with double strand break (DSB) hotspots indicates that most of the events are likely of meiotic origin. We estimate that a site is involved in a NCO at a rate of 5.9 × 10⁻⁶/bp/generation, consistent with sperm-typing studies, and infer that tract lengths span at least an order of magnitude. Observed NCO events show strong allelic bias at heterozygous AT/GC SNPs, with 68% (58-78%) transmitting GC alleles (p = 5 × 10⁻⁴). Strikingly, in 4 of 15 regions with resequencing data, multiple disjoint NCO tracts cluster in close proximity (∼20-30 kb), a phenomenon not previously seen in mammals.
Affiliation(s)
- Amy L Williams
- Department of Biological Sciences, Columbia University, New York, United States
| | - Giulio Genovese
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, United States
| | - Thomas Dyer
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, United States
| | - Nicolas Altemose
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford, United Kingdom
| | - Katherine Truax
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, United States
| | - Goo Jun
- Department of Biostatistics, University of Michigan, Ann Arbor, United States
| | - Nick Patterson
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, United States
| | - Simon R Myers
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford, United Kingdom
| | - Joanne E Curran
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, United States
| | - Ravi Duggirala
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, United States
| | - John Blangero
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, United States
| | - David Reich
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, United States
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, United States
|
22
|
Yin J. Hypothesis testing of meiotic recombination rates from population genetic data. BMC Genet 2014; 15:122. [PMID: 25433522 PMCID: PMC4267743 DOI: 10.1186/s12863-014-0122-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2014] [Accepted: 10/28/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Meiotic recombination, one of the central biological processes studied in population genetics, comes in two known forms: crossovers and gene conversions. A number of previous studies have shown that when one of these two events is nonexistent in the genealogical model, the point estimation of the corresponding recombination rate by population genetic methods tends to be inflated. Therefore, it has become necessary to obtain statistical evidence from population genetic data about whether one of the two recombination events is absent. RESULTS In this paper, we formulate this problem in a hypothesis testing framework and devise a testing procedure based on the likelihood ratio test (LRT). However, because the null value (i.e., zero) lies on the boundary of the parameter space, the regularity conditions for the large‐sample approximation to the distribution of the LRT statistic do not apply. In turn, the standard chi‐squared approximation is inaccurate. To address this critical issue, we propose a parametric bootstrap procedure to obtain an approximate p‐value for the observed test statistic. Coalescent simulations are conducted to show that our approach yields accurate null p‐values that closely follow the theoretical prediction while the estimated alternative p‐values tend to concentrate closer to zero. Finally, the method is demonstrated on a real biological data set from the telomere of the X chromosome of African Drosophila melanogaster. CONCLUSIONS Our methodology provides a necessary complement to the existing procedures of estimating meiotic recombination rates from population genetic data. ELECTRONIC SUPPLEMENTARY MATERIAL The online version of this article (doi:10.1186/s12863-014-0122-7) contains supplementary material, which is available to authorized users.
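The boundary problem and the bootstrap remedy described in the abstract can be sketched on a toy model. This is not the coalescent-based procedure of the paper. It assumes unit-variance normal data with a mean constrained to be non-negative, so that the null value zero sits on the boundary of the parameter space. The function names, the sample values, and the number of bootstrap replicates are hypothetical.

```python
import random


def lrt_statistic(sample):
    """LRT statistic for H0: mu = 0 against mu >= 0 with unit-variance
    normal data. The constrained MLE is max(mean, 0), so the statistic
    reduces to n * max(mean, 0) ** 2."""
    n = len(sample)
    mean = sum(sample) / n
    return n * max(mean, 0.0) ** 2


def bootstrap_p_value(sample, n_boot=2000, seed=0):
    """Parametric bootstrap: simulate data sets under the null model,
    recompute the statistic, and report the share at least as extreme
    as the observed value (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = lrt_statistic(sample)
    n = len(sample)
    exceed = 0
    for _ in range(n_boot):
        null_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if lrt_statistic(null_sample) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


# Hypothetical observations, used only to exercise the functions.
p_value = bootstrap_p_value([0.4, -0.2, 0.9, 0.1, 0.5, -0.3, 0.7, 0.2])
```

In this toy model about half of the null data sets have a negative sample mean and therefore a statistic of exactly zero, which is why a plain chi-squared reference distribution would be miscalibrated. The bootstrap reproduces that point mass automatically. In a model with nuisance parameters, the simulation step would draw from the null model fitted to the data rather than from a fully specified distribution.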
Affiliation(s)
- Junming Yin
- Department of Management Information Systems, Eller College of Management, University of Arizona, Tucson, 85721, USA.
|
23
|
Abstract
Recombination maps of ancestral species can be constructed from comparative analyses of genomes from closely related species, exemplified by a recently published map of the human-chimpanzee ancestor. Such maps resolve differences in recombination rate between species into changes along individual branches in the speciation tree, and allow identification of associated changes in the genomic sequences. We describe how coalescent hidden Markov models are able to call individual recombination events in ancestral species through inference of incomplete lineage sorting along a genomic alignment. In the great apes, speciation events are sufficiently close in time that a map can be inferred for the ancestral species at each internal branch - allowing evolution of recombination rate to be tracked over evolutionary time scales from speciation event to speciation event. We see this approach as a way of characterizing the evolution of recombination rate and the genomic properties that influence it.
|
24
|
Popovic I, Marko PB, Wares JP, Hart MW. Selection and demographic history shape the molecular evolution of the gamete compatibility protein bindin in Pisaster sea stars. Ecol Evol 2014; 4:1567-88. [PMID: 24967076 PMCID: PMC4063459 DOI: 10.1002/ece3.1042] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2013] [Revised: 02/15/2014] [Accepted: 02/26/2014] [Indexed: 12/18/2022] Open
Abstract
Reproductive compatibility proteins have been shown to evolve rapidly under positive selection leading to reproductive isolation, despite the potential homogenizing effects of gene flow. This process has been implicated in both primary divergence among conspecific populations and reinforcement during secondary contact; however, these two selective regimes can be difficult to discriminate from each other. Here, we describe the gene that encodes the gamete compatibility protein bindin for three sea star species in the genus Pisaster. First, we compare the full-length bindin-coding sequence among all three species and analyze the evolutionary relationships between the repetitive domains of the variable second bindin exon. The comparison suggests that concerted evolution of repetitive domains has an effect on bindin divergence among species and bindin variation within species. Second, we characterize population variation in the second bindin exon of two species: We show that positive selection acts on bindin variation in Pisaster ochraceus but not in Pisaster brevispinus, which is consistent with higher polyspermy risk in P. ochraceus. Third, we show that there is no significant genetic differentiation among populations and no apparent effect of sympatry with congeners that would suggest selection based on reinforcement. Fourth, we combine bindin and cytochrome c oxidase 1 data in isolation-with-migration models to estimate gene flow parameter values and explore the historical demographic context of our positive selection results. Our findings suggest that positive selection on bindin divergence among P. ochraceus alleles can be accounted for in part by relatively recent northward population expansions that may be coupled with the potential homogenizing effects of concerted evolution.
Affiliation(s)
- Iva Popovic
- Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Peter B Marko
- Department of Biology, University of Hawai'i, Mānoa, Hawaii
| | - John P Wares
- Department of Genetics, University of Georgia, Athens, Georgia
| | - Michael W Hart
- Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
|
25
|
Herrig DK, Modrick AJ, Brud E, Llopart A. Introgression in the Drosophila subobscura--D. madeirensis sister species: evidence of gene flow in nuclear genes despite mitochondrial differentiation. Evolution 2013; 68:705-19. [PMID: 24152112 PMCID: PMC4255303 DOI: 10.1111/evo.12295] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2013] [Accepted: 10/15/2013] [Indexed: 12/19/2022]
Abstract
Species hybridization, and thus the potential for gene flow, was once viewed as a reproductive mistake. However, recent analyses based on large datasets and newly developed models suggest that gene exchange is not as rare as originally suspected. To investigate the history and speciation of the closely related species Drosophila subobscura, D. madeirensis, and D. guanche, we obtained polymorphism and divergence data for 26 regions throughout the genome, including the Y chromosome and mitochondrial DNA. We found that the D. subobscura X/autosome ratio of silent nucleotide diversity is significantly smaller than the 0.75 expected under neutrality. This pattern, if held genomewide, may reflect a faster accumulation of beneficial mutations on the X chromosome than on autosomes. We also detected evidence of gene flow in autosomal regions, while sex chromosomes remain distinct. This is consistent with the large X effect on hybrid male sterility seen in this system and the presence of two X chromosome inversions fixed between species. Overall, our data conform to chromosomal speciation models in which rearrangements are proposed to serve as gene flow barriers. Contrary to other observations in Drosophila, the mitochondrial genome appears resilient to gene flow in the presence of nuclear exchange.
Affiliation(s)
- Danielle K Herrig
- Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA, 52242
|
26
|
Padhukasahasram B, Rannala B. Meiotic gene-conversion rate and tract length variation in the human genome. Eur J Hum Genet 2013:ejhg201330. [PMID: 23443031 DOI: 10.1038/ejhg.2013.30] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2012] [Revised: 12/17/2012] [Accepted: 01/10/2013] [Indexed: 01/11/2023] Open
Abstract
Meiotic recombination occurs in the form of two different mechanisms called crossing-over and gene-conversion and both processes have an important role in shaping genetic variation in populations. Although variation in crossing-over rates has been studied extensively using sperm-typing experiments, pedigree studies and population genetic approaches, our knowledge of variation in gene-conversion parameters (ie, rates and mean tract lengths) remains far from complete. To explore variability in population gene-conversion rates and its relationship to crossing-over rate variation patterns, we have developed and validated using coalescent simulations a comprehensive Bayesian full-likelihood method that can jointly infer crossing-over and gene-conversion rates as well as tract lengths from population genomic data under general variable rate models with recombination hotspots. Here, we apply this new method to SNP data from multiple human populations and attempt to characterize for the first time the fine-scale variation in gene-conversion parameters along the human genome. We find that the estimated ratio of gene-conversion to crossing-over rates varies considerably across genomic regions as well as between populations. However, there is a great degree of uncertainty associated with such estimates. We also find substantial evidence for variation in the mean conversion tract length. The estimated tract lengths did not show any negative relationship with the local heterozygosity levels in our analysis. European Journal of Human Genetics advance online publication, 27 February 2013; doi:10.1038/ejhg.2013.30.
Affiliation(s)
- Badri Padhukasahasram
- [1] Center for Health Policy and Health Services Research, Henry Ford Health System, Detroit, MI, USA; [2] Genome Center and Department of Evolution and Ecology, University of California, Davis, Davis, CA, USA
| | - Bruce Rannala
- Genome Center and Department of Evolution and Ecology, University of California, Davis, Davis, CA, USA
|
27
|
Comeron JM, Ratnappan R, Bailin S. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet 2012; 8:e1002905. [PMID: 23071443 PMCID: PMC3469467 DOI: 10.1371/journal.pgen.1002905] [Citation(s) in RCA: 334] [Impact Index Per Article: 27.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2012] [Accepted: 07/02/2012] [Indexed: 01/06/2023] Open
Abstract
Recombination is a fundamental biological process with profound evolutionary implications. Theory predicts that recombination increases the effectiveness of selection in natural populations. Yet, direct tests of this prediction have been restricted to qualitative trends due to the lack of detailed characterization of recombination rate variation across genomes and within species. The use of imprecise recombination rates can also skew population genetic analyses designed to assess the presence and mode of selection across genomes. Here we report the first integrated high-resolution description of genomic and population variation in recombination, which also distinguishes between the two outcomes of meiotic recombination: crossing over (CO) and gene conversion (GC). We characterized the products of 5,860 female meioses in Drosophila melanogaster by genotyping a total of 139 million informative SNPs and mapped 106,964 recombination events at a resolution down to 2 kilobases. This approach allowed us to generate whole-genome CO and GC maps as well as a detailed description of variation in recombination among individuals of this species. We describe many levels of variation in recombination rates. At a large-scale (100 kb), CO rates exhibit extreme and highly punctuated variation along chromosomes, with hot and coldspots. We also show extensive intra-specific variation in CO landscapes that is associated with hotspots at low frequency in our sample. GC rates are more uniformly distributed across the genome than CO rates and detectable in regions with reduced or absent CO. At a local scale, recombination events are associated with numerous sequence motifs and tend to occur within transcript regions, thus suggesting that chromatin accessibility favors double-strand breaks. 
All these non-independent layers of variation in recombination across genomes and among individuals need to be taken into account in order to obtain relevant estimates of recombination rates, and should be included in a new generation of population genetic models of the interaction between selection and linkage.
Affiliation(s)
- Josep M Comeron
- Department of Biology, University of Iowa, Iowa City, Iowa, USA.
|
28
|
Sun Y, Ambrose JH, Haughey BS, Webster TD, Pierrie SN, Muñoz DF, Wellman EC, Cherian S, Lewis SM, Berchowitz LE, Copenhaver GP. Deep genome-wide measurement of meiotic gene conversion using tetrad analysis in Arabidopsis thaliana. PLoS Genet 2012; 8:e1002968. [PMID: 23055940 PMCID: PMC3464199 DOI: 10.1371/journal.pgen.1002968] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2012] [Accepted: 08/08/2012] [Indexed: 11/18/2022] Open
Abstract
Gene conversion, the non-reciprocal exchange of genetic information, is one of the potential products of meiotic recombination. It can shape genome structure by acting on repetitive DNA elements, influence allele frequencies at the population level, and is known to be implicated in human disease. But gene conversion is hard to detect directly except in organisms, like fungi, that group their gametes following meiosis. We have developed a novel visual assay that enables us to detect gene conversion events directly in the gametes of the flowering plant Arabidopsis thaliana. Using this assay we measured gene conversion events across the genome of more than one million meioses and determined that the genome-wide average frequency is 3.5×10⁻⁴ conversions per locus per meiosis. We also detected significant locus-to-locus variation in conversion frequency but no intra-locus variation. Significantly, we found one locus on the short arm of chromosome 4 that experienced 3-fold to 6-fold more gene conversions than the other loci tested. Finally, we demonstrated that we could modulate conversion frequency by varying experimental conditions.
Affiliation(s)
- Yujin Sun
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Jonathan H. Ambrose
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Brena S. Haughey
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Tyler D. Webster
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Sarah N. Pierrie
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Daniela F. Muñoz
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Curriculum in Genetics and Molecular Biology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Emily C. Wellman
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Shalom Cherian
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Scott M. Lewis
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Luke E. Berchowitz
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Gregory P. Copenhaver
- Department of Biology and the Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Curriculum in Genetics and Molecular Biology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, The University of North Carolina School of Medicine, Chapel Hill, North Carolina, United States of America
|
29
|
Wang J, Fan HC, Behr B, Quake SR. Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm. Cell 2012; 150:402-12. [PMID: 22817899 DOI: 10.1016/j.cell.2012.06.030] [Citation(s) in RCA: 364] [Impact Index Per Article: 30.3] [Received: 03/22/2012] [Revised: 05/31/2012] [Accepted: 06/13/2012] [Indexed: 02/01/2023]
Abstract
Meiotic recombination and de novo mutation are the two main contributions toward gamete genome diversity, and many questions remain about how an individual human's genome is edited by these two processes. Here, we describe a high-throughput method for single-cell whole-genome analysis that was used to measure the genomic diversity in one individual's gamete genomes. A microfluidic system was used for highly parallel sample processing and to minimize nonspecific amplification. High-density genotyping results from 91 single cells were used to create a personal recombination map, which was consistent with population-wide data at low resolution but revealed significant differences from pedigree data at higher resolution. We used the data to test for meiotic drive and found evidence for gene conversion. High-throughput sequencing on 31 single cells was used to measure the frequency of large-scale genome instability, and deeper sequencing of eight single cells revealed de novo mutation rates with distinct characteristics.
Affiliation(s)
- Jianbin Wang
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
|
30
|
Steinrücken M, Paul JS, Song YS. A sequentially Markov conditional sampling distribution for structured populations with migration and recombination. Theor Popul Biol 2012; 87:51-61. [PMID: 23010245 DOI: 10.1016/j.tpb.2012.08.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 04/04/2012] [Revised: 08/20/2012] [Accepted: 08/28/2012] [Indexed: 10/27/2022]
Abstract
Conditional sampling distributions (CSDs), sometimes referred to as copying models, underlie numerous practical tools in population genomic analyses. Though an important application that has received much attention is the inference of population structure, the explicit exchange of migrants at specified rates has not hitherto been incorporated into the CSD in a principled framework. Recently, in the case of a single panmictic population, a sequentially Markov CSD has been developed as an accurate, efficient approximation to a principled CSD derived from the diffusion process dual to the coalescent with recombination. In this paper, the sequentially Markov CSD framework is extended to incorporate subdivided population structure, thus providing an efficiently computable CSD that admits a genealogical interpretation related to the structured coalescent with migration and recombination. As a concrete application, it is demonstrated empirically that the CSD developed here can be employed to yield accurate estimation of a wide range of migration rates.
|
31
|
Paul JS, Song YS. Blockwise HMM computation for large-scale population genomic inference. Bioinformatics 2012; 28:2008-15. [PMID: 22641715 DOI: 10.1093/bioinformatics/bts314] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION: A promising class of methods for large-scale population genomic inference uses the conditional sampling distribution (CSD), which approximates the probability of sampling an individual with a particular DNA sequence, given that a collection of sequences from the population has already been observed. The CSD has a wide range of applications, including imputing missing sequence data, estimating recombination rates, inferring human colonization history and identifying tracts of distinct ancestry in admixed populations. Most well-used CSDs are based on hidden Markov models (HMMs). Although computationally efficient in principle, methods resulting from the common implementation of the relevant HMM techniques remain intractable for large genomic datasets. RESULTS: To address this issue, a set of algorithmic improvements for performing the exact HMM computation is introduced here, by exploiting the particular structure of the CSD and typical characteristics of genomic data. It is empirically demonstrated that these improvements result in a speedup of several orders of magnitude for large datasets and that the speedup continues to increase with the number of sequences. The optimized algorithms can be adopted in methods for various applications, including the ones mentioned above, and make previously impracticable analyses possible. AVAILABILITY: Software available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: yss@eecs.berkeley.edu.
Affiliation(s)
- Joshua S Paul
- Computer Science Division and Department of Statistics, University of California, Berkeley, CA 94720, USA
|
32
|
Pervasive recombination and sympatric genome diversification driven by frequency-dependent selection in Borrelia burgdorferi, the Lyme disease bacterium. Genetics 2011; 189:951-66. [PMID: 21890743 DOI: 10.1534/genetics.111.130773] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Indexed: 01/14/2023] Open
Abstract
How genomic diversity within bacterial populations originates and is maintained in the presence of frequent recombination is a central problem in understanding bacterial evolution. Natural populations of Borrelia burgdorferi, the bacterial agent of Lyme disease, consist of diverse genomic groups co-infecting single individual vertebrate hosts and tick vectors. To understand mechanisms of sympatric genome differentiation in B. burgdorferi, we sequenced and compared 23 genomes representing major genomic groups in North America and Europe. Linkage analysis of >13,500 single-nucleotide polymorphisms revealed pervasive horizontal DNA exchanges. Although three times more frequent than point mutation, recombination is localized and weakly affects genome-wide linkage disequilibrium. We show by computer simulations that, while enhancing population fitness, recombination constrains neutral and adaptive divergence among sympatric genomes through periodic selective sweeps. In contrast, simulations of frequency-dependent selection with recombination produced the observed pattern of a large number of sympatric genomic groups associated with major sequence variations at the selected locus. We conclude that negative frequency-dependent selection targeting a small number of surface-antigen loci (ospC in particular) sufficiently explains the maintenance of sympatric genome diversity in B. burgdorferi without adaptive divergence. We suggest that pervasive recombination makes it less likely for local B. burgdorferi genomic groups to achieve host specialization. B. burgdorferi genomic groups in the northeastern United States are thus best viewed as constituting a single bacterial species, whose generalist nature is a key to its rapid spread and human virulence.
|
33
|
Abstract
Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms and its different forms, crossing over and gene conversion, both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm.
|
34
|
Katzman S, Capra JA, Haussler D, Pollard KS. Ongoing GC-biased evolution is widespread in the human genome and enriched near recombination hot spots. Genome Biol Evol 2011; 3:614-26. [PMID: 21697099 PMCID: PMC3157837 DOI: 10.1093/gbe/evr058] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 01/03/2023] Open
Abstract
Fast evolving regions of many metazoan genomes show a bias toward substitutions that change weak (A,T) into strong (G,C) base pairs. Single-nucleotide polymorphisms (SNPs) do not share this pattern, suggesting that it results from biased fixation rather than biased mutation. Supporting this hypothesis, analyses of polymorphism in specific regions of the human genome have identified a positive correlation between weak to strong (W→S) SNPs and derived allele frequency (DAF), suggesting that SNPs become increasingly GC biased over time, especially in regions of high recombination. Using polymorphism data generated by the 1000 Genomes Project from 179 individuals from 4 human populations, we evaluated the extent and distribution of ongoing GC-biased evolution in the human genome. We quantified GC fixation bias by comparing the DAFs of W→S mutations and S→W mutations using a Mann-Whitney U test. Genome-wide, W→S SNPs have significantly higher DAFs than S→W SNPs. This pattern is widespread across the human genome but varies in magnitude along the chromosomes. We found extreme GC-biased evolution in neighborhoods of recombination hot spots, a significant correlation between GC bias and recombination rate, and an inverse correlation between GC bias and chromosome arm length. These findings demonstrate the presence of ongoing fixation bias favoring G and C alleles throughout the human genome and suggest that the bias is caused by a recombination-associated process, such as GC-biased gene conversion.
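As a rough illustration of the DAF comparison described in this abstract, the sketch below computes a Mann-Whitney U statistic with a tie-corrected normal approximation in plain Python. The function name `mann_whitney_u` and the toy frequency values are illustrative assumptions; the abstract does not describe the authors' actual implementation.

```python
import math


def mann_whitney_u(xs, ys):
    """Return (U for xs, z-score) comparing two samples.

    U counts, over all (x, y) pairs, how often x exceeds y (ties count 1/2).
    The z-score uses the normal approximation with a correction for ties.
    """
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean) / math.sqrt(var) if var > 0 else 0.0
    return u, z


# Toy derived allele frequencies (made-up values, for illustration only).
daf_weak_to_strong = [0.6, 0.7, 0.8]
daf_strong_to_weak = [0.1, 0.2, 0.3]
u_stat, z_score = mann_whitney_u(daf_weak_to_strong, daf_strong_to_weak)
```

A positive z-score here would indicate that the first group (W→S in the toy data) tends to have higher DAFs than the second; a real analysis would use far larger samples and a proper p-value.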
Affiliation(s)
- Sol Katzman
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, USA
|
35
|
Jacquemin J, Chaparro C, Laudié M, Berger A, Gavory F, Goicoechea JL, Wing RA, Cooke R. Long-range and targeted ectopic recombination between the two homeologous chromosomes 11 and 12 in Oryza species. Mol Biol Evol 2011; 28:3139-50. [PMID: 21616911 DOI: 10.1093/molbev/msr144] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022] Open
Abstract
Whole genome duplication (WGD) and subsequent evolution of gene pairs have been shown to have shaped the present day genomes of most, if not all, plants and to have played an essential role in the evolution of many eukaryotic genomes. Analysis of the rice (Oryza sativa ssp. japonica) genome sequence suggested an ancestral WGD ∼50-70 Ma common to all cereals and a segmental duplication between chromosomes 11 and 12 as recently as 5 Ma. More recent studies based on coding sequences have demonstrated that gene conversion is responsible for the high sequence conservation which suggested such a recent duplication. We previously showed that gene conversion has been a recurrent process throughout the Oryza genus and in closely related species and that orthologous duplicated regions are also highly conserved in other cereal genomes. We have extended these studies to compare megabase regions of genomic (coding and noncoding) sequences between two cultivated (O. sativa, Oryza glaberrima) and one wild (Oryza brachyantha) rice species using a novel approach of topological incongruency. The high levels of intraspecies conservation of both gene and nongene sequences, particularly in O. brachyantha, indicate long-range conversion events less than 4 Ma in all three species. These observations demonstrate megabase-scale conversion initiated within a highly rearranged region located at ∼2.1 Mb from the chromosome termini and emphasize the importance of gene conversion in cereal genome evolution.
Affiliation(s)
- J Jacquemin
- Laboratoire Génome et Développement des Plantes, Unité Mixte de Recherche Centre National de la Recherche Scientifique/Institut de Recherche pour le Développement/Université de Perpignan Via Domitia, Université de Perpignan, Perpignan-Cedex, France.
|
36
|
Paul JS, Steinrücken M, Song YS. An accurate sequentially Markov conditional sampling distribution for the coalescent with recombination. Genetics 2011; 187:1115-28. [PMID: 21270390 PMCID: PMC3070520 DOI: 10.1534/genetics.110.125534] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 11/30/2010] [Accepted: 01/21/2011] [Indexed: 02/07/2023] Open
Abstract
The sequentially Markov coalescent is a simplified genealogical process that aims to capture the essential features of the full coalescent model with recombination, while being scalable in the number of loci. In this article, the sequentially Markov framework is applied to the conditional sampling distribution (CSD), which is at the core of many statistical tools for population genetic analyses. Briefly, the CSD describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. A hidden Markov model (HMM) formulation of the sequentially Markov CSD is developed here, yielding an algorithm with time complexity linear in both the number of loci and the number of haplotypes. This work provides a highly accurate, practical approximation to a recently introduced CSD derived from the diffusion process associated with the coalescent with recombination. It is empirically demonstrated that the improvement in accuracy of the new CSD over previously proposed HMM-based CSDs increases substantially with the number of loci. The framework presented here can be adopted in a wide range of applications in population genetics, including imputing missing sequence data, estimating recombination rates, and inferring human colonization history.
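To make the "linear in both the number of loci and the number of haplotypes" claim concrete, here is a minimal sketch of the forward recursion for a generic Li and Stephens-style copying HMM. It is not the sequentially Markov CSD developed in the paper. The function name and the `switch_prob` and `mismatch_prob` parameters are hypothetical, and biallelic 0/1 data are assumed.

```python
def copying_model_likelihood(target, panel, switch_prob=0.01, mismatch_prob=0.001):
    """Probability of `target` under a simple haplotype-copying HMM.

    Hidden state: which panel haplotype is being copied at each locus.
    Per-locus cost is O(n) because the sum over previous states is shared
    by all n destination states, giving O(n * L) overall.
    """
    n = len(panel)

    def emit(h, locus):
        # Copy faithfully with probability 1 - mismatch_prob.
        if panel[h][locus] == target[locus]:
            return 1.0 - mismatch_prob
        return mismatch_prob

    # Start by copying each haplotype with equal probability.
    forward = [emit(h, 0) / n for h in range(n)]
    for locus in range(1, len(target)):
        total = sum(forward)  # computed once per locus, reused for every state
        forward = [
            emit(h, locus)
            * ((1.0 - switch_prob) * forward[h] + switch_prob * total / n)
            for h in range(n)
        ]
    return sum(forward)


# Toy panel of two haplotypes over three loci (illustrative data only).
panel = [[0, 0, 1], [1, 1, 0]]
likelihood_match = copying_model_likelihood([0, 0, 1], panel)
likelihood_mosaic = copying_model_likelihood([0, 1, 0], panel)
```

The design point is the shared `total`: a naive transition step would cost O(n²) per locus, whereas reusing the sum keeps each locus at O(n).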
Affiliation(s)
- Joshua S. Paul
- Computer Science Division and Department of Statistics, University of California, Berkeley, California 94720
- Matthias Steinrücken
- Computer Science Division and Department of Statistics, University of California, Berkeley, California 94720
- Yun S. Song
- Computer Science Division and Department of Statistics, University of California, Berkeley, California 94720
|
37
|
Clark AG, Wang X, Matise T. Contrasting methods of quantifying fine structure of human recombination. Annu Rev Genomics Hum Genet 2010; 11:45-64. [PMID: 20690817 DOI: 10.1146/annurev-genom-082908-150031] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]
Abstract
There has been considerable excitement over the ability to construct linkage maps based only on genome-wide genotype data for single nucleotide polymorphic sites (SNPs) in a population sample. These maps, which are derived from estimates of linkage disequilibrium (LD), rely on population genetics theory to relate the decay of LD to the local rate of recombination, but other population processes also come into play. Here we contrast these LD maps to the classically derived, pedigree-based human recombination maps. The LD maps have a level of resolution greatly exceeding that of the pedigree maps, and at this fine scale, sperm typing allows a means of validation. While at a gross level both the pedigree maps and the sperm typing methods generally agree with LD maps, there are significant local differences between them, and the fact that these maps measure different genetic features should be remembered when using them for other genetic inferences.
Affiliation(s)
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.
|
38
|
A principled approach to deriving approximate conditional sampling distributions in population genetics models with recombination. Genetics 2010; 186:321-38. [PMID: 20592264 DOI: 10.1534/genetics.110.117986] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022] Open
Abstract
The multilocus conditional sampling distribution (CSD) describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. The CSD has a wide range of applications in both computational biology and population genomics analysis, including phasing genotype data into haplotype data, imputing missing data, estimating recombination rates, inferring local ancestry in admixed populations, and importance sampling of coalescent genealogies. Unfortunately, the true CSD under the coalescent with recombination is not known, so approximations, formulated as hidden Markov models, have been proposed in the past. These approximations have led to a number of useful statistical tools, but it is important to recognize that they were not derived from, though were certainly motivated by, principles underlying the coalescent process. The goal of this article is to develop a principled approach to derive improved CSDs directly from the underlying population genetics model. Our approach is based on the diffusion process approximation and the resulting mathematical expressions admit intuitive genealogical interpretations, which we utilize to introduce further approximations and make our method scalable in the number of loci. The general algorithm presented here applies to an arbitrary number of loci and an arbitrary finite-alleles recurrent mutation model. Empirical results are provided to demonstrate that our new CSDs are in general substantially more accurate than previously proposed approximations.
|
39
|
Arguello JR, Zhang Y, Kado T, Fan C, Zhao R, Innan H, Wang W, Long M. Recombination yet inefficient selection along the Drosophila melanogaster subgroup's fourth chromosome. Mol Biol Evol 2010; 27:848-61. [PMID: 20008457 PMCID: PMC2877538 DOI: 10.1093/molbev/msp291] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022] Open
Abstract
A central goal of evolutionary genetics is an understanding of the forces responsible for the observed variation, both within and between species. Theoretical and empirical work have demonstrated that genetic recombination contributes to this variation by breaking down linkage between nucleotide sites, thus allowing them to behave independently and for selective forces to act efficiently on them. The Drosophila fourth chromosome, which is believed to experience no, or very low, rates of recombination, has been an important model for investigating these effects. Despite previous efforts, central questions regarding the extent of recombination and the predominant modes of selection acting on it remain open. In order to more comprehensively test hypotheses regarding recombination and its potential influence on selection along the fourth chromosome, we have resequenced regions from most of its genes from Drosophila melanogaster, D. simulans, and D. yakuba. These data, along with available outgroup sequence, demonstrate that recombination is low but significantly greater than zero for the three species. Despite there being recombination, there is strong evidence that its frequency is low enough to have rendered selection relatively inefficient. The signatures of relaxed constraint can be detected at both the level of polymorphism and divergence.
Affiliation(s)
- J. Roman Arguello
- Committee on Evolutionary Biology, University of Chicago
- Department of Ecology and Evolution, University of Chicago
- Yue Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Tomoyuki Kado
- Hayama Center for Advanced Studies, The Graduate University for Advanced Studies, Hayama, Kanagawa, Japan
- Chuanzhu Fan
- Department of Ecology and Evolution, University of Chicago
- Ruoping Zhao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Hideki Innan
- Hayama Center for Advanced Studies, The Graduate University for Advanced Studies, Hayama, Kanagawa, Japan
- Wen Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Manyuan Long
- Committee on Evolutionary Biology, University of Chicago
- Department of Ecology and Evolution, University of Chicago
|
40
|
Abstract
Interlocus gene conversion can homogenize DNA sequences of duplicated regions with high homology. Such nonvertical events sometimes cause a misleading evolutionary interpretation of data when the effect of gene conversion is ignored. To avoid this problem, it is crucial to test the data for the presence of gene conversion. Here, we performed extensive simulations to compare four major methods to detect gene conversion. One might expect that the power increases with increase of the gene conversion rate. However, we found this is true for only two methods. For the other two, limited power is expected when gene conversion is too frequent. We suggest using multiple methods to minimize the chance of missing the footprint of gene conversion.
|
41
|
Yin J, Jordan MI, Song YS. Joint estimation of gene conversion rates and mean conversion tract lengths from population SNP data. Bioinformatics 2009; 25:i231-9. [PMID: 19477993 PMCID: PMC2687983 DOI: 10.1093/bioinformatics/btp229] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
Motivation: Two known types of meiotic recombination are crossovers and gene conversions. Although they leave behind different footprints in the genome, it is a challenging task to tease apart their relative contributions to the observed genetic variation. In particular, for a given population SNP dataset, the joint estimation of the crossover rate, the gene conversion rate and the mean conversion tract length is widely viewed as a very difficult problem. Results: In this article, we devise a likelihood-based method using an interleaved hidden Markov model (HMM) that can jointly estimate the aforementioned three parameters fundamental to recombination. Our method significantly improves upon a recently proposed method based on a factorial HMM. We show that modeling overlapping gene conversions is crucial for improving the joint estimation of the gene conversion rate and the mean conversion tract length. We test the performance of our method on simulated data. We then apply our method to analyze real biological data from the telomere of the X chromosome of Drosophila melanogaster, and show that the ratio of the gene conversion rate to the crossover rate for the region may not be nearly as high as previously claimed. Availability: A software implementation of the algorithms discussed in this article is available at http://www.cs.berkeley.edu/~yss/software.html. Contact: yss@eecs.berkeley.edu
Affiliation(s)
- Junming Yin
- Computer Science Division and Department of Statistics, University of California, Berkeley, CA, USA
|
42
|
Characterization of equine and other vertebrate TLR3, TLR7, and TLR8 genes. Immunogenetics 2009; 61:529-39. [DOI: 10.1007/s00251-009-0381-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 03/12/2009] [Accepted: 06/08/2009] [Indexed: 01/15/2023]
|
43
|
Davison D, Pritchard JK, Coop G. An approximate likelihood for genetic data under a model with recombination and population splitting. Theor Popul Biol 2009; 75:331-45. [PMID: 19362099 PMCID: PMC3108256 DOI: 10.1016/j.tpb.2009.04.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 01/26/2009] [Revised: 03/26/2009] [Accepted: 04/02/2009] [Indexed: 10/20/2022]
Abstract
We describe a new approximate likelihood for population genetic data under a model in which a single ancestral population has split into two daughter populations. The approximate likelihood is based on the 'Product of Approximate Conditionals' likelihood and 'copying model' of Li and Stephens [Li, N., Stephens, M., 2003. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165 (4), 2213-2233]. The approach developed here may be used for efficient approximate likelihood-based analyses of unlinked data. However our copying model also considers the effects of recombination. Hence, a more important application is to loosely-linked haplotype data, for which efficient statistical models explicitly featuring non-equilibrium population structure have so far been unavailable. Thus, in addition to the information in allele frequency differences about the timing of the population split, the method can also extract information from the lengths of haplotypes shared between the populations. There are a number of challenges posed by extracting such information, which makes parameter estimation difficult. We discuss how the approach could be extended to identify haplotypes introduced by migrants.
Affiliation(s)
- D Davison
- Committee on Evolutionary Biology, University of Chicago, USA.
|
44
|
Haddrill PR, Waldron FM, Charlesworth B. Elevated levels of expression associated with regions of the Drosophila genome that lack crossing over. Biol Lett 2008; 4:758-61. [PMID: 18782733 DOI: 10.1098/rsbl.2008.0376] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/18/2022] Open
Abstract
The recombinational environment influences patterns of molecular evolution through the effects of Hill-Robertson interference. Here, we examine genome-wide patterns of gene expression with respect to recombinational environment in Drosophila melanogaster. We find that regions of the genome lacking crossing over exhibit elevated levels of expression, and this is most pronounced for genes on the entirely non-crossing over fourth chromosome. We find no evidence for differences in the patterns of gene expression between regions of high, intermediate and low crossover frequencies. These results suggest that, in the absence of crossing over, selection to maintain control of expression may be compromised, perhaps due to the accumulation of deleterious mutations in regulatory regions. Alternatively, higher gene expression may be evolving to compensate for defective protein products or reduced translational efficiency.
Affiliation(s)
- Penelope R Haddrill
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Ashworth Laboratories, King's Buildings, Edinburgh EH9 3JT, UK.
|
45
|
Haddrill PR, Charlesworth B. Non-neutral processes drive the nucleotide composition of non-coding sequences in Drosophila. Biol Lett 2008; 4:438-41. [PMID: 18505714 PMCID: PMC2515589 DOI: 10.1098/rsbl.2008.0174] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 11/25/2022]
Abstract
The nature of the forces affecting base composition is a key question in genome evolution. There is uncertainty as to whether differences in the GC contents of non-coding sequences reflect differences in mutational bias, or in the intensity of selection or biased gene conversion. We have used a polymorphism dataset for non-coding sequences on the X chromosome of Drosophila simulans to examine this question. The proportion of GC→AT versus AT→GC polymorphic mutations in a locus is correlated with its GC content. This implies the action of forces that favour GC over AT base pairs, which are apparently strongest in GC-rich sequences.
Affiliation(s)
- Penelope R Haddrill
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Ashworth Laboratories, King's Buildings, Edinburgh, UK.
|
46
|
Abstract
In a 2007 article, McVean studied the effect of recombination on linkage disequilibrium (LD) between two neutral loci located near a third locus that has undergone a selective sweep. The results demonstrated that two loci on the same side of a selected locus might show substantial LD, whereas the expected LD for two loci on opposite sides of a selected locus is zero. In this article, we extend McVean's model to include gene conversion. We show that one of the conclusions is strongly affected by gene conversion: when gene conversion is present, there may be substantial LD between two loci on opposite sides of a selective sweep.
|
47
|
Slatkin M. Linkage disequilibrium--understanding the evolutionary past and mapping the medical future. Nat Rev Genet 2008; 9:477-85. [PMID: 18427557 PMCID: PMC5124487 DOI: 10.1038/nrg2361] [Citation(s) in RCA: 789] [Impact Index Per Article: 49.3] [Indexed: 11/09/2022]
Abstract
Linkage disequilibrium--the nonrandom association of alleles at different loci--is a sensitive indicator of the population genetic forces that structure a genome. Because of the explosive growth of methods for assessing genetic variation at a fine scale, evolutionary biologists and human geneticists are increasingly exploiting linkage disequilibrium in order to understand past evolutionary and demographic events, to map genes that are associated with quantitative characters and inherited diseases, and to understand the joint evolution of linked sets of genes. This article introduces linkage disequilibrium, reviews the population genetic processes that affect it and describes some of its uses. At present, linkage disequilibrium is used much more extensively in the study of humans than in non-humans, but that is changing as technological advances make extensive genomic studies feasible in other species.
Affiliation(s)
- Montgomery Slatkin
- Department of Integrative Biology, University of California, Berkeley, California 94720-3140, USA.
|
48
|
Bullaughey K, Przeworski M, Coop G. No effect of recombination on the efficacy of natural selection in primates. Genome Res 2008; 18:544-54. [PMID: 18199888 DOI: 10.1101/gr.071548.107] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 11/24/2022]
Abstract
Population genetic theory suggests that natural selection should be less effective in regions of low recombination, potentially leading to differences in rates of adaptation among recombination environments. To date, this prediction has mainly been tested in Drosophila, with somewhat conflicting results. We investigated the association between human recombination rates and adaptation in primates, by considering rates of protein evolution (measured by dN/dS) between human, chimpanzee, and rhesus macaque. We found no correlation between either broad- or fine-scale rates of recombination and rates of protein evolution, once GC content is taken into account. Moreover, genes in regions of very low recombination, which are expected to show the most pronounced reduction in the efficacy of selection, do not evolve at a different rate than other genes. Thus, there is no evidence for differences in the efficacy of selection across recombinational environments. An interesting implication is that indirect selection for recombination modifiers has probably been a weak force in primate evolution.
Affiliation(s)
- Kevin Bullaughey
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA.
|
49
|
Khil PP, Camerini-Otero RD. Variation in patterns of human meiotic recombination. Genome Dynamics 2008; 5:117-127. [PMID: 18948711 DOI: 10.1159/000166623] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/16/2022]
Abstract
In the last 30 years it has become evident that patterns of meiotic recombination can be highly variable among individuals. The evidence comes from both low and high resolution analyses of hotspots of recombination in human and other species. In addition, a comparison of the recombination profiles in closely related species such as human and chimpanzee reveals essentially no correlation in the position of hotspots. Although the variation in hotspots of meiotic recombination is clearly documented, the mechanisms responsible for such variation are far from being understood. Here we will review the available evidence of natural variation in meiotic recombination and will discuss potential implications of this variation on the functional mechanisms of crossover formation and control.
Affiliation(s)
- P P Khil
- Genetics and Biochemistry Branch, NIDDK, National Institutes of Health, Bethesda, Md., USA
|