1
|
Hara Y, Kuraku S. The impact of local genomic properties on the evolutionary fate of genes. eLife 2023; 12:82290. [PMID: 37223962 DOI: 10.7554/elife.82290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Accepted: 04/25/2023] [Indexed: 05/25/2023] Open
Abstract
Functionally indispensable genes are likely to be retained and otherwise to be lost during evolution. This evolutionary fate of a gene can also be affected by factors independent of gene dispensability, including the mutability of genomic positions, but such features have not been examined well. To uncover the genomic features associated with gene loss, we investigated the characteristics of genomic regions where genes have been independently lost in multiple lineages. With a comprehensive scan of gene phylogenies of vertebrates with a careful inspection of evolutionary gene losses, we identified 813 human genes whose orthologs were lost in multiple mammalian lineages: designated 'elusive genes.' These elusive genes were located in genomic regions with rapid nucleotide substitution, high GC content, and high gene density. A comparison of the orthologous regions of such elusive genes across vertebrates revealed that these features had been established before the radiation of the extant vertebrates approximately 500 million years ago. The association of human elusive genes with transcriptomic and epigenomic characteristics illuminated that the genomic regions containing such genes were subject to repressive transcriptional regulation. Thus, the heterogeneous genomic features driving gene fates toward loss have been in place and may sometimes have relaxed the functional indispensability of such genes. This study sheds light on the complex interplay between gene function and local genomic properties in shaping gene evolution that has persisted since the vertebrate ancestor.
Collapse
Affiliation(s)
- Yuichiro Hara
- Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Shigehiro Kuraku
- Molecular Life History Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Japan
- Department of Genetics, Sokendai (Graduate University for Advanced Studies), Mishima, Japan
- RIKEN Center for Biosystems Dynamics Research, Kobe, Japan
| |
Collapse
|
2
|
Foley NM, Mason VC, Harris AJ, Bredemeyer KR, Damas J, Lewin HA, Eizirik E, Gatesy J, Karlsson EK, Lindblad-Toh K, Springer MS, Murphy WJ, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. A genomic timescale for placental mammal evolution. Science 2023; 380:eabl8189. [PMID: 37104581 DOI: 10.1126/science.abl8189] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
The precise pattern and timing of speciation events that gave rise to all living placental mammals remain controversial. We provide a comprehensive phylogenetic analysis of genetic variation across an alignment of 241 placental mammal genome assemblies, addressing prior concerns regarding limited genomic sampling across species. We compared neutral genome-wide phylogenomic signals using concatenation and coalescent-based approaches, interrogated phylogenetic variation across chromosomes, and analyzed extensive catalogs of structural variants. Interordinal relationships exhibit relatively low rates of phylogenomic conflict across diverse datasets and analytical methods. Conversely, X-chromosome versus autosome conflicts characterize multiple independent clades that radiated during the Cenozoic. Genomic time trees reveal an accumulation of cladogenic events before and immediately after the Cretaceous-Paleogene (K-Pg) boundary, implying important roles for Cretaceous continental vicariance and the K-Pg extinction in the placental radiation.
Collapse
Affiliation(s)
- Nicole M Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
| | - Victor C Mason
- Institute of Cell Biology, University of Bern, Bern, Switzerland
| | - Andrew J Harris
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics and Genomics, Texas A&M University, College Station, TX, USA
| | - Kevin R Bredemeyer
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics and Genomics, Texas A&M University, College Station, TX, USA
| | - Joana Damas
- The Genome Center, University of California, Davis, CA, USA
| | - Harris A Lewin
- The Genome Center, University of California, Davis, CA, USA
- Department of Evolution and Ecology, University of California, Davis, CA, USA
| | - Eduardo Eizirik
- School of Health and Life Sciences, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Brazil
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, USA
| | - Elinor K Karlsson
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Molecular Medicine, University of Massachussetts Chan Medical School, Worcester, MA 01605, USA
| | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, USA
| | - William J Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics and Genomics, Texas A&M University, College Station, TX, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
3
|
Gergelits V, Parvanov E, Simecek P, Forejt J. Chromosome-wide characterization of meiotic noncrossovers (gene conversions) in mouse hybrids. Genetics 2021; 217:1-14. [PMID: 33683354 DOI: 10.1093/genetics/iyaa013] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2020] [Accepted: 11/13/2020] [Indexed: 01/16/2023] Open
Abstract
During meiosis, the recombination-initiating DNA double-strand breaks (DSBs) are repaired by crossovers or noncrossovers (gene conversions). While crossovers are easily detectable, noncrossover identification is hampered by the small size of their converted tracts and the necessity of sequence polymorphism. We report identification and characterization of a mouse chromosome-wide set of noncrossovers by next-generation sequencing of 10 mouse intersubspecific chromosome substitution strains. Based on 94 identified noncrossovers, we determined the mean length of a conversion tract to be 32 bp. The spatial chromosome-wide distribution of noncrossovers and crossovers significantly differed, although both sets overlapped the known hotspots of PRDM9-directed histone methylation and DNA DSBs, thus supporting their origin in the standard DSB repair pathway. A significant deficit of noncrossovers descending from asymmetric DSBs proved their proposed adverse effect on meiotic recombination and pointed to sister chromatids as an alternative template for their repair. The finding has implications for the molecular mechanism of hybrid sterility in mice from crosses between closely related Mus musculus musculus and Mus musculus domesticus subspecies.
Collapse
Affiliation(s)
- Vaclav Gergelits
- Laboratory of Mouse Molecular Genetics, Division BIOCEV, Institute of Molecular Genetics, Czech Academy of Sciences, CZ-25250 Vestec, Czech Republic.,Department of Cell Biology, Faculty of Science, Charles University, CZ-12000 Prague, Czech Republic
| | - Emil Parvanov
- Laboratory of Mouse Molecular Genetics, Division BIOCEV, Institute of Molecular Genetics, Czech Academy of Sciences, CZ-25250 Vestec, Czech Republic
| | - Petr Simecek
- Laboratory of Mouse Molecular Genetics, Division BIOCEV, Institute of Molecular Genetics, Czech Academy of Sciences, CZ-25250 Vestec, Czech Republic
| | - Jiri Forejt
- Laboratory of Mouse Molecular Genetics, Division BIOCEV, Institute of Molecular Genetics, Czech Academy of Sciences, CZ-25250 Vestec, Czech Republic
| |
Collapse
|
4
|
Abstract
Recombination increases the local GC-content in genomic regions through GC-biased gene conversion (gBGC). The recent discovery of a large genomic region with extreme GC-content in the fat sand rat Psammomys obesus provides a model to study the effects of gBGC on chromosome evolution. Here, we compare the GC-content and GC-to-AT substitution patterns across protein-coding genes of four gerbil species and two murine rodents (mouse and rat). We find that the known high-GC region is present in all the gerbils, and is characterized by high substitution rates for all mutational categories (AT-to-GC, GC-to-AT, and GC-conservative) both at synonymous and nonsynonymous sites. A higher AT-to-GC than GC-to-AT rate is consistent with the high GC-content. Additionally, we find more than 300 genes outside the known region with outlying values of AT-to-GC synonymous substitution rates in gerbils. Of these, over 30% are organized into at least 17 large clusters observable at the megabase-scale. The unusual GC-skewed substitution pattern suggests the evolution of genomic regions with very high recombination rates in the gerbil lineage, which can lead to a runaway increase in GC-content. Our results imply that rapid evolution of GC-content is possible in mammals, with gerbil species providing a powerful model to study the mechanisms of gBGC.
Collapse
Affiliation(s)
- Rodrigo Pracana
- Department of Zoology, University of Oxford, Oxford, United Kingdom
| | | | - John F Mulley
- School of Natural Sciences, Bangor University, Bangor, Gwynedd, United Kingdom
| | | |
Collapse
|
5
|
Pichon F, Shen Y, Busato F, P Jochems S, Jacquelin B, Grand RL, Deleuze JF, Müller-Trutwin M, Tost J. Analysis and annotation of DNA methylation in two nonhuman primate species using the Infinium Human Methylation 450K and EPIC BeadChips. Epigenomics 2021; 13:169-186. [PMID: 33471557 DOI: 10.2217/epi-2020-0200] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
Aim: Nonhuman primates are essential for research on many human diseases. The Infinium Human Methylation450/EPIC BeadChips are popular tools for the study of the methylation state across the human genome at affordable cost. Methods: We performed a precise evaluation and re-annotation of the BeadChip probes for the analysis of genome-wide DNA methylation patterns in rhesus macaques and African green monkeys through in silico analyses combined with functional validation by pyrosequencing. Results: Up to 165,847 of the 450K and 261,545 probes of the EPIC BeadChip can be reliably used. The annotation files are provided in a format compatible with a variety of standard bioinformatic pipelines. Conclusion: Our study will facilitate high-throughput DNA methylation analyses in Macaca mulatta and Chlorocebus sabaeus.
Collapse
Affiliation(s)
- Fabien Pichon
- Laboratory for Epigenetics & Environment, Centre National de Recherche en Génomique Humaine, CEA-Institut de Biologie François Jacob, Evry, France
| | - Yimin Shen
- Laboratory for Epigenetics & Environment, Centre National de Recherche en Génomique Humaine, CEA-Institut de Biologie François Jacob, Evry, France.,Laboratory for Bioinformatics, Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 75010 Paris, France
| | - Florence Busato
- Laboratory for Epigenetics & Environment, Centre National de Recherche en Génomique Humaine, CEA-Institut de Biologie François Jacob, Evry, France
| | - Simon P Jochems
- Institut Pasteur, HIV Inflammation & Persistence Unit, Paris, France.,Université Paris Diderot, Sorbonne Paris Cité, Paris, France.,Leiden University Medical Center, Leiden, The Netherlands
| | | | - Roger Le Grand
- Université Paris-Saclay, Inserm, CEA, Center for Immunology of Viral, Auto-immune, Hematological and Bacterial diseases (IMVA-HB/IDMIT), Fontenay-aux-Roses, France
| | - Jean-Francois Deleuze
- Laboratory for Epigenetics & Environment, Centre National de Recherche en Génomique Humaine, CEA-Institut de Biologie François Jacob, Evry, France.,Laboratory for Bioinformatics, Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 75010 Paris, France
| | | | - Jörg Tost
- Laboratory for Epigenetics & Environment, Centre National de Recherche en Génomique Humaine, CEA-Institut de Biologie François Jacob, Evry, France
| |
Collapse
|
6
|
Woerner AE, Veeramah KR, Watkins JC, Hammer MF. The Role of Phylogenetically Conserved Elements in Shaping Patterns of Human Genomic Diversity. Mol Biol Evol 2020; 35:2284-2295. [PMID: 30113695 DOI: 10.1093/molbev/msy145] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open
Abstract
Evolutionary genetic studies have shown a positive correlation between levels of nucleotide diversity and either rates of recombination or genetic distance to genes. Both positive-directional and purifying selection have been offered as the source of these correlations via genetic hitchhiking and background selection, respectively. Phylogenetically conserved elements (CEs) are short (∼100 bp), widely distributed (comprising ∼5% of genome), sequences that are often found far from genes. While the function of many CEs is unknown, CEs also are associated with reduced diversity at linked sites. Using high coverage (>80×) whole genome data from two human populations, the Yoruba and the CEU, we perform fine scale evaluations of diversity, rates of recombination, and linkage to genes. We find that the local rate of recombination has a stronger effect on levels of diversity than linkage to genes, and that these effects of recombination persist even in regions far from genes. Our whole genome modeling demonstrates that, rather than recombination or GC-biased gene conversion, selection on sites within or linked to CEs better explains the observed genomic diversity patterns. A major implication is that very few sites in the human genome are predicted to be free of the effects of selection. These sites, which we refer to as the human "neutralome," comprise only 1.2% of the autosomes and 5.1% of the X chromosome. Demographic analysis of the neutralome reveals larger population sizes and lower rates of growth for ancestral human populations than inferred by previous analyses.
Collapse
Affiliation(s)
- August E Woerner
- ARL Division of Biotechnology, University of Arizona, Tucson, AZ.,Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX
| | - Krishna R Veeramah
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY
| | | | - Michael F Hammer
- ARL Division of Biotechnology, University of Arizona, Tucson, AZ
| |
Collapse
|
7
|
Borges R, Szöllősi GJ, Kosiol C. Quantifying GC-Biased Gene Conversion in Great Ape Genomes Using Polymorphism-Aware Models. Genetics 2019; 212:1321-1336. [PMID: 31147380 PMCID: PMC6707462 DOI: 10.1534/genetics.119.302074] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2018] [Accepted: 05/20/2019] [Indexed: 11/18/2022] Open
Abstract
As multi-individual population-scale data become available, more complex modeling strategies are needed to quantify genome-wide patterns of nucleotide usage and associated mechanisms of evolution. Recently, the multivariate neutral Moran model was proposed. However, it was shown insufficient to explain the distribution of alleles in great apes. Here, we propose a new model that includes allelic selection. Our theoretical results constitute the basis of a new Bayesian framework to estimate mutation rates and selection coefficients from population data. We apply the new framework to a great ape dataset, where we found patterns of allelic selection that match those of genome-wide GC-biased gene conversion (gBGC). In particular, we show that great apes have patterns of allelic selection that vary in intensity-a feature that we correlated with great apes' distinct demographies. We also demonstrate that the AT/GC toggling effect decreases the probability of a substitution, promoting more polymorphisms in the base composition of great ape genomes. We further assess the impact of GC-bias in molecular analysis, and find that mutation rates and genetic distances are estimated under bias when gBGC is not properly accounted for. Our results contribute to the discussion on the tempo and mode of gBGC evolution, while stressing the need for gBGC-aware models in population genetics and phylogenetics.
Collapse
Affiliation(s)
- Rui Borges
- Institut für Populationsgenetik, Vetmeduni Vienna, 1210 Wien, Wien, Austria
| | - Gergely J Szöllősi
- Department of Biological Physics, MTA-ELTE "Lendulet" Evolutionary Genomics Research Group, Eötvös University, Pázmány P. stny. 1A, Budapest 1117, Hungary
| | - Carolin Kosiol
- Institut für Populationsgenetik, Vetmeduni Vienna, 1210 Wien, Wien, Austria
- Centre for Biological Diversity, School of Biology, University of St Andrews, Fife KY16 9TH, UK
| |
Collapse
|
8
|
Laurin-Lemay S, Rodrigue N, Lartillot N, Philippe H. Conditional Approximate Bayesian Computation: A New Approach for Across-Site Dependency in High-Dimensional Mutation-Selection Models. Mol Biol Evol 2019; 35:2819-2834. [PMID: 30203003 DOI: 10.1093/molbev/msy173] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
A key question in molecular evolutionary biology concerns the relative roles of mutation and selection in shaping genomic data. Moreover, features of mutation and selection are heterogeneous along the genome and over time. Mechanistic codon substitution models based on the mutation-selection framework are promising approaches to separating these effects. In practice, however, several complications arise, since accounting for such heterogeneities often implies handling models of high dimensionality (e.g., amino acid preferences), or leads to across-site dependence (e.g., CpG hypermutability), making the likelihood function intractable. Approximate Bayesian Computation (ABC) could address this latter issue. Here, we propose a new approach, named Conditional ABC (CABC), which combines the sampling efficiency of MCMC and the flexibility of ABC. To illustrate the potential of the CABC approach, we apply it to the study of mammalian CpG hypermutability based on a new mutation-level parameter implying dependence across adjacent sites, combined with site-specific purifying selection on amino-acids captured by a Dirichlet process. Our proof-of-concept of the CABC methodology opens new modeling perspectives. Our application of the method reveals a high level of heterogeneity of CpG hypermutability across loci and mild heterogeneity across taxonomic groups; and finally, we show that CpG hypermutability is an important evolutionary factor in rendering relative synonymous codon usage. All source code is available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).
Collapse
Affiliation(s)
- Simon Laurin-Lemay
- Robert-Cedergren Center for Bioinformatics and Genomics, Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
| | - Nicolas Rodrigue
- Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, Ottawa, ON, Canada
| | - Nicolas Lartillot
- Laboratoire de Biométrie et Biologie Évolutive, UMR CNRS 5558, Université Lyon 1, Lyon, France
| | - Hervé Philippe
- Robert-Cedergren Center for Bioinformatics and Genomics, Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada.,Centre de Théorisation et de Modélisation de la Biodiversité, Station d'Écologie Théorique et Expérimentale, UMR CNRS 5321, Moulis, France
| |
Collapse
|
9
|
Nath D, Deka H, Uddin A, Chakraborty S. Chronic obstructive pulmonary disease: A crosstalk on nucleotide compositional dynamics and codon usage patterns of the genes involved in disease. J Cell Biochem 2019; 120:7649-7656. [PMID: 30390329 DOI: 10.1002/jcb.28039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2018] [Accepted: 10/15/2018] [Indexed: 01/24/2023]
Abstract
Chronic obstructive pulmonary disease (COPD), a lung disease, affects a large number of people worldwide, leading to death. Here, we analyzed the compositional features and trends of codon usage of the genes influencing COPD to understand molecular biology, genetics, and evolutionary relationships of these genes as no work was reported yet. Coding sequences of COPD genes were found to be rich in guanine-cytosine (GC) content. A high value (34-60) of the effective number of codons of the genes indicated low codon usage bias (CUB). Correspondence analysis suggested that the COPD genes were distinct in their codon usage patterns. Relative synonymous codon usage values of codons differed between the more preferred codons and the less-preferred ones. Correlation analysis between overall nucleotides and those at third codon position revealed that mutation pressure might influence the CUB of the genes. The high correlation between GC12 and GC3 signified that directional mutation pressure might have operated at all the three codon positions in COPD genes.
Collapse
Affiliation(s)
- Durbba Nath
- Department of Biotechnology, Assam University, Silchar, India
| | - Himangshu Deka
- Department of Biotechnology, Assam University, Silchar, India
| | - Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Algapur, India
| | | |
Collapse
|
10
|
Pouyet F, Aeschbacher S, Thiéry A, Excoffier L. Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences. eLife 2018; 7:e36317. [PMID: 30125248 PMCID: PMC6177262 DOI: 10.7554/elife.36317] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2018] [Accepted: 08/17/2018] [Indexed: 12/15/2022] Open
Abstract
Disentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.
Collapse
Affiliation(s)
- Fanny Pouyet
- Computational and Molecular Population Genetics, Institute of Ecology and EvolutionUniversity of BernBernSwitzerland
- Swiss Institute of BioinformaticsLausanneSwitzerland
| | - Simon Aeschbacher
- Computational and Molecular Population Genetics, Institute of Ecology and EvolutionUniversity of BernBernSwitzerland
- Swiss Institute of BioinformaticsLausanneSwitzerland
- Department of Evolutionary Biology and Environmental StudiesUniversity of ZurichZurichSwitzerland
| | - Alexandre Thiéry
- Computational and Molecular Population Genetics, Institute of Ecology and EvolutionUniversity of BernBernSwitzerland
- Swiss Institute of BioinformaticsLausanneSwitzerland
| | - Laurent Excoffier
- Computational and Molecular Population Genetics, Institute of Ecology and EvolutionUniversity of BernBernSwitzerland
- Swiss Institute of BioinformaticsLausanneSwitzerland
| |
Collapse
|
11
|
Dutta R, Saha-Mandal A, Cheng X, Qiu S, Serpen J, Fedorova L, Fedorov A. 1000 human genomes carry widespread signatures of GC biased gene conversion. BMC Genomics 2018; 19:256. [PMID: 29661137 PMCID: PMC5902838 DOI: 10.1186/s12864-018-4593-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2017] [Accepted: 03/12/2018] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND GC-Biased Gene Conversion (gBGC) is one of the important theories put forward to explain profound long-range non-randomness in nucleotide compositions along mammalian chromosomes. Nucleotide changes due to gBGC are hard to distinguish from regular mutations. Here, we present an algorithm for analysis of millions of known SNPs that detects a subset of so-called "SNP flip-over" events representing recent gBGC nucleotide changes, which occurred in previous generations via non-crossover meiotic recombination. RESULTS This algorithm has been applied in a large-scale analysis of 1092 sequenced human genomes. Altogether, 56,328 regions on all autosomes have been examined, which revealed 223,955 putative gBGC cases leading to SNP flip-overs. We detected a strong bias (11.7% ± 0.2% excess) in AT- > GC over GC- > AT base pair changes within the entire set of putative gBGC cases. CONCLUSIONS On average, a human gamete acquires 7 SNP flip-over events, in which one allele is replaced by its complementary allele during the process of meiotic non-crossover recombination. In each meiosis event, on average, gBGC results in replacement of 7 AT base pairs by GC base pairs, while only 6 GC pairs are replaced by AT pairs. Therefore, every human gamete is enriched by one GC pair. Happening over millions of years of evolution, this bias may be a noticeable force in changing the nucleotide composition landscape along chromosomes.
Collapse
Affiliation(s)
- Rajib Dutta
- Program in Biomedical Sciences, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Present Address: Center for Cardiovascular and Pulmonary Research, Nationwide Children’s Hospital, 700 Children’s Dr, Columbus, OH USA
| | - Arnab Saha-Mandal
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Present Address: Biochemistry and Molecular Biology Graduate Program, Cumming School of Medicine, University of Calgary, Calgary, AB T2N4N1 Canada
| | - Xi Cheng
- Program in Biomedical Sciences, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
| | - Shuhao Qiu
- Program in Biomedical Sciences, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
| | - Jasmine Serpen
- SURF Program, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- College of Arts and Sciences, Washington University in St. Louis, 1 Brookings Dr, St. Louis, MO 63130 USA
| | | | - Alexei Fedorov
- Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
| |
Collapse
|
12
|
Smith TCA, Arndt PF, Eyre-Walker A. Large scale variation in the rate of germ-line de novo mutation, base composition, divergence and diversity in humans. PLoS Genet 2018; 14:e1007254. [PMID: 29590096 PMCID: PMC5891062 DOI: 10.1371/journal.pgen.1007254] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2017] [Revised: 04/09/2018] [Accepted: 02/13/2018] [Indexed: 01/17/2023] Open
Abstract
It has long been suspected that the rate of mutation varies across the human genome at a large scale based on the divergence between humans and other species. However, it is now possible to directly investigate this question using the large number of de novo mutations (DNMs) that have been discovered in humans through the sequencing of trios. We investigate a number of questions pertaining to the distribution of mutations using more than 130,000 DNMs from three large datasets. We demonstrate that the amount and pattern of variation differs between datasets at the 1MB and 100KB scales probably as a consequence of differences in sequencing technology and processing. In particular, datasets show different patterns of correlation to genomic variables such as replication time. Never-the-less there are many commonalities between datasets, which likely represent true patterns. We show that there is variation in the mutation rate at the 100KB, 1MB and 10MB scale that cannot be explained by variation at smaller scales, however the level of this variation is modest at large scales-at the 1MB scale we infer that ~90% of regions have a mutation rate within 50% of the mean. Different types of mutation show similar levels of variation and appear to vary in concert which suggests the pattern of mutation is relatively constant across the genome. We demonstrate that variation in the mutation rate does not generate large-scale variation in GC-content, and hence that mutation bias does not maintain the isochore structure of the human genome. We find that genomic features explain less than 40% of the explainable variance in the rate of DNM. As expected the rate of divergence between species is correlated to the rate of DNM. However, the correlations are weaker than expected if all the variation in divergence was due to variation in the mutation rate. We provide evidence that this is due the effect of biased gene conversion on the probability that a mutation will become fixed. In contrast to divergence, we find that most of the variation in diversity can be explained by variation in the mutation rate. Finally, we show that the correlation between divergence and DNM density declines as increasingly divergent species are considered.
Collapse
Affiliation(s)
| | - Peter F. Arndt
- Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| |
Collapse
|
13
|
Identification of susceptible genes for complex chronic diseases based on disease risk functional SNPs and interaction networks. J Biomed Inform 2017; 74:137-144. [DOI: 10.1016/j.jbi.2017.09.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2017] [Revised: 09/15/2017] [Accepted: 09/16/2017] [Indexed: 01/05/2023]
|
14
|
Glinsky GV. Mechanistically Distinct Pathways of Divergent Regulatory DNA Creation Contribute to Evolution of Human-Specific Genomic Regulatory Networks Driving Phenotypic Divergence of Homo sapiens. Genome Biol Evol 2016; 8:2774-88. [PMID: 27503290 PMCID: PMC5630920 DOI: 10.1093/gbe/evw185] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
Thousands of candidate human-specific regulatory sequences (HSRS) have been identified, supporting the hypothesis that unique to human phenotypes result from human-specific alterations of genomic regulatory networks. Collectively, a compendium of multiple diverse families of HSRS that are functionally and structurally divergent from Great Apes could be defined as the backbone of human-specific genomic regulatory networks. Here, the conservation patterns analysis of 18,364 candidate HSRS was carried out requiring that 100% of bases must remap during the alignments of human, chimpanzee, and bonobo sequences. A total of 5,535 candidate HSRS were identified that are: (i) highly conserved in Great Apes; (ii) evolved by the exaptation of highly conserved ancestral DNA; (iii) defined by either the acceleration of mutation rates on the human lineage or the functional divergence from non-human primates. The exaptation of highly conserved ancestral DNA pathway seems mechanistically distinct from the evolution of regulatory DNA segments driven by the species-specific expansion of transposable elements. Genome-wide proximity placement analysis of HSRS revealed that a small fraction of topologically associating domains (TADs) contain more than half of HSRS from four distinct families. TADs that are enriched for HSRS and termed rapidly evolving in humans TADs (revTADs) comprise 0.8–10.3% of 3,127 TADs in the hESC genome. RevTADs manifest distinct correlation patterns between placements of human accelerated regions, human-specific transcription factor-binding sites, and recombination rates. There is a significant enrichment within revTAD boundaries of hESC-enhancers, primate-specific CTCF-binding sites, human-specific RNAPII-binding sites, hCONDELs, and H3K4me3 peaks with human-specific enrichment at TSS in prefrontal cortex neurons (P < 0.0001 in all instances). Present analysis supports the idea that phenotypic divergence of Homo sapiens is driven by the evolution of human-specific genomic regulatory networks via at least two mechanistically distinct pathways of creation of divergent sequences of regulatory DNA: (i) recombination-associated exaptation of the highly conserved ancestral regulatory DNA segments; (ii) human-specific insertions of transposable elements.
Collapse
Affiliation(s)
- Gennadi V Glinsky
- Institute of Engineering in Medicine, University of California-San Diego
| |
Collapse
|
15
|
Kenigsberg E, Yehuda Y, Marjavaara L, Keszthelyi A, Chabes A, Tanay A, Simon I. The mutation spectrum in genomic late replication domains shapes mammalian GC content. Nucleic Acids Res 2016; 44:4222-32. [PMID: 27085808 PMCID: PMC4872117 DOI: 10.1093/nar/gkw268] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2015] [Revised: 03/10/2016] [Accepted: 03/30/2016] [Indexed: 11/14/2022] Open
Abstract
Genome sequence compositions and epigenetic organizations are correlated extensively across multiple length scales. Replication dynamics, in particular, is highly correlated with GC content. We combine genome-wide time of replication (ToR) data, topological domains maps and detailed functional epigenetic annotations to study the correlations between replication timing and GC content at multiple scales. We find that the decrease in genomic GC content at large scale late replicating regions can be explained by mutation bias favoring A/T nucleotide, without selection or biased gene conversion. Quantification of the free dNTP pool during the cell cycle is consistent with a mechanism involving replication-coupled mutation spectrum that favors AT nucleotides at late S-phase. We suggest that mammalian GC content composition is shaped by independent forces, globally modulating mutation bias and locally selecting on functional element. Deconvoluting these forces and analyzing them on their native scales is important for proper characterization of complex genomic correlations.
Collapse
Affiliation(s)
- Ephraim Kenigsberg
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
| | - Yishai Yehuda
- Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Lisette Marjavaara
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
| | - Andrea Keszthelyi
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
| | - Andrei Chabes
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
| | - Amos Tanay
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
| | - Itamar Simon
- Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
| |
Collapse
|
16
|
Simonti CN, Capra JA. The evolution of the human genome. Curr Opin Genet Dev 2015; 35:9-15. [PMID: 26338498 DOI: 10.1016/j.gde.2015.08.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2015] [Revised: 08/08/2015] [Accepted: 08/12/2015] [Indexed: 02/05/2023]
Abstract
Human genomes hold a record of the evolutionary forces that have shaped our species. Advances in DNA sequencing, functional genomics, and population genetic modeling have deepened our understanding of human demographic history, natural selection, and many other long-studied topics. These advances have also revealed many previously underappreciated factors that influence the evolution of the human genome, including functional modifications to DNA and histones, conserved 3D topological chromatin domains, structural variation, and heterogeneous mutation patterns along the genome. Using evolutionary theory as a lens to study these phenomena will lead to significant breakthroughs in understanding what makes us human and why we get sick.
Collapse
Affiliation(s)
- Corinne N Simonti
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37235, USA
| | - John A Capra
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37235, USA; Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37235, USA.
| |
Collapse
|
17
|
Sharma AS, Gupta HO, Prasad R. PPDB: A Tool for Investigation of Plants Physiology Based on Gene Ontology. Interdiscip Sci 2015; 7:295-308. [DOI: 10.1007/s12539-015-0017-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2013] [Revised: 01/10/2014] [Accepted: 02/07/2014] [Indexed: 01/23/2023]
|
18
|
Glémin S, Arndt PF, Messer PW, Petrov D, Galtier N, Duret L. Quantification of GC-biased gene conversion in the human genome. Genome Res 2015; 25:1215-28. [PMID: 25995268 PMCID: PMC4510005 DOI: 10.1101/gr.185488.114] [Citation(s) in RCA: 108] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2014] [Accepted: 05/18/2015] [Indexed: 11/25/2022]
Abstract
Much evidence indicates that GC-biased gene conversion (gBGC) has a major impact on the evolution of mammalian genomes. However, a detailed quantification of the process is still lacking. The strength of gBGC can be measured from the analysis of derived allele frequency spectra (DAF), but this approach is sensitive to a number of confounding factors. In particular, we show by simulations that the inference is pervasively affected by polymorphism polarization errors and by spatial heterogeneity in gBGC strength. We propose a new general method to quantify gBGC from DAF spectra, incorporating polarization errors, taking spatial heterogeneity into account, and jointly estimating mutation bias. Applying it to human polymorphism data from the 1000 Genomes Project, we show that the strength of gBGC does not differ between hypermutable CpG sites and non-CpG sites, suggesting that in humans gBGC is not caused by the base-excision repair machinery. Genome-wide, the intensity of gBGC is in the nearly neutral area. However, given that recombination occurs primarily within recombination hotspots, 1%–2% of the human genome is subject to strong gBGC. On average, gBGC is stronger in African than in non-African populations, reflecting differences in effective population sizes. However, due to more heterogeneous recombination landscapes, the fraction of the genome affected by strong gBGC is larger in non-African than in African populations. Given that the location of recombination hotspots evolves very rapidly, our analysis predicts that, in the long term, a large fraction of the genome is affected by short episodes of strong gBGC.
Collapse
Affiliation(s)
- Sylvain Glémin
- Institut des Sciences de l'Evolution (ISEM - UMR 5554 Université de Montpellier-CNRS-IRD-EPHE), 34095 Montpellier, France; Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
| | - Peter F Arndt
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
| | - Philipp W Messer
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
| | - Dmitri Petrov
- Department of Biology, Stanford University, Stanford, California 94305-5020, USA
| | - Nicolas Galtier
- Institut des Sciences de l'Evolution (ISEM - UMR 5554 Université de Montpellier-CNRS-IRD-EPHE), 34095 Montpellier, France
| | - Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Lyon 1, 69622 Villeurbanne, France
| |
Collapse
|
19
|
Clément Y, Fustier MA, Nabholz B, Glémin S. The bimodal distribution of genic GC content is ancestral to monocot species. Genome Biol Evol 2014; 7:336-48. [PMID: 25527839 PMCID: PMC4316631 DOI: 10.1093/gbe/evu278] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
In grasses such as rice or maize, the distribution of genic GC content is well known to be bimodal. It is mainly driven by GC content at third codon positions (GC3 for short). This feature is thought to be specific to grasses as closely related species like banana have a unimodal GC3 distribution. GC3 is associated with numerous genomics features and uncovering the origin of this peculiar distribution will help understanding the potential roles and consequences of GC3 variations within and between genomes. Until recently, the origin of the peculiar GC3 distribution in grasses has remained unknown. Thanks to the recent publication of several complete genomes and transcriptomes of nongrass monocots, we studied more than 1,000 groups of one-to-one orthologous genes in seven grasses and three outgroup species (banana, palm tree, and yam). Using a maximum likelihood-based method, we reconstructed GC3 at several ancestral nodes. We found that the bimodal GC3 distribution observed in extant grasses is ancestral to both grasses and most monocot species, and that other species studied here have lost this peculiar structure. We also found that GC3 in grass lineages is globally evolving very slowly and that the decreasing GC3 gradient observed from 5′ to 3′ along coding sequences is also conserved and ancestral to monocots. This result strongly challenges the previous views on the specificity of grass genomes and we discuss its implications for the possible causes of the evolution of GC content in monocots.
Collapse
Affiliation(s)
- Yves Clément
- Montpellier SupAgro, Unité Mixte de Recherche 1334, Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, Montpellier, France Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
| | | | - Benoit Nabholz
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
| | - Sylvain Glémin
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
| |
Collapse
|
20
|
Figuet E, Ballenghien M, Romiguier J, Galtier N. Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates. Genome Biol Evol 2014; 7:240-50. [PMID: 25527834 PMCID: PMC4316630 DOI: 10.1093/gbe/evu277] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrates species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins.
Collapse
Affiliation(s)
- Emeric Figuet
- CNRS, Université Montpellier 2, UMR 5554, Institut des Sciences de l'Evolution de Montpellier, France
| | - Marion Ballenghien
- CNRS, Université Montpellier 2, UMR 5554, Institut des Sciences de l'Evolution de Montpellier, France
| | - Jonathan Romiguier
- CNRS, Université Montpellier 2, UMR 5554, Institut des Sciences de l'Evolution de Montpellier, France Department of Ecology and Evolution, Biophore, University of Lausanne, Switzerland
| | - Nicolas Galtier
- CNRS, Université Montpellier 2, UMR 5554, Institut des Sciences de l'Evolution de Montpellier, France
| |
Collapse
|
21
|
Lachance J, Tishkoff SA. Biased gene conversion skews allele frequencies in human populations, increasing the disease burden of recessive alleles. Am J Hum Genet 2014; 95:408-20. [PMID: 25279983 PMCID: PMC4185123 DOI: 10.1016/j.ajhg.2014.09.008] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2014] [Revised: 08/21/2014] [Accepted: 09/10/2014] [Indexed: 10/25/2022] Open
Abstract
Gene conversion results in the nonreciprocal transfer of genetic information between two recombining sequences, and there is evidence that this process is biased toward G and C alleles. However, the strength of GC-biased gene conversion (gBGC) in human populations and its effects on hereditary disease have yet to be assessed on a genomic scale. Using high-coverage whole-genome sequences of African hunter-gatherers, agricultural populations, and primate outgroups, we quantified the effects of GC-biased gene conversion on population genomic data sets. We find that genetic distances (FST and population branch statistics) are modified by gBGC. In addition, the site frequency spectrum is left-shifted when ancestral alleles are favored by gBGC and right-shifted when derived alleles are favored by gBGC. Allele frequency shifts due to gBGC mimic the effects of natural selection. As expected, these effects are strongest in high-recombination regions of the human genome. By comparing the relative rates of fixation of unbiased and biased sites, the strength of gene conversion was estimated to be on the order of Nb ≈ 0.05 to 0.09. We also find that derived alleles favored by gBGC are much more likely to be homozygous than derived alleles at unbiased SNPs (+42.2% to 62.8%). This results in a curse of the converted, whereby gBGC causes substantial increases in hereditary disease risks. Taken together, our findings reveal that GC-biased gene conversion has important population genetic and public health implications.
Collapse
MESH Headings
- Bias
- Evolution, Molecular
- Gene Conversion
- Gene Frequency
- Genes, Recessive/genetics
- Genetic Diseases, Inborn/genetics
- Genetics, Population
- Genome, Human/genetics
- Humans
- Models, Genetic
- Models, Theoretical
- Polymorphism, Single Nucleotide/genetics
- Recombination, Genetic
- Selection, Genetic/genetics
Collapse
Affiliation(s)
- Joseph Lachance
- Departments of Biology and Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA.
| | - Sarah A Tishkoff
- Departments of Biology and Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA.
| |
Collapse
|
22
|
Sharma AS, Gupta HO, Prasad R. PPDB - A tool for investigation of plants physiology based on gene ontology. Interdiscip Sci 2014. [PMID: 25183354 DOI: 10.1007/s12539-013-0065-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2013] [Revised: 01/10/2014] [Accepted: 02/07/2014] [Indexed: 09/29/2022]
Abstract
Representing the way forward, from functional genomics and its ontology to functional understanding and physiological model, in a computationally tractable fashion is one of the ongoing challenges faced by computational biology. To tackle the standpoint, we herein feature the applications of contemporary database management to the development of PPDB, a searching and browsing tool for the Plants Physiology Database that is based upon the mining of a large amount of gene ontology data currently available. The working principles and search options associated with the PPDB are publicly available and freely accessible on-line ( http://www.iitr.ernet.in/ajayshiv/ ) through a user friendly environment generated by means of Drupal-6.24. By knowing that genes are expressed in temporally and spatially characteristic patterns and that their functionally distinct products often reside in specific cellular compartments and may be part of one or more multi-component complexes, this sort of work is intended to be relevant for investigating the functional relationships of gene products at a system level and, thus, helps us approach to the full physiology.
Collapse
Affiliation(s)
- Ajay Shiv Sharma
- Department of Electrical Engineering, Indian Institute of Technology Roorkee, Roorkee, 247667, India,
| | | | | |
Collapse
|
23
|
Hubisz MJ, Pollard KS. Exploring the genesis and functions of Human Accelerated Regions sheds light on their role in human evolution. Curr Opin Genet Dev 2014; 29:15-21. [PMID: 25156517 DOI: 10.1016/j.gde.2014.07.005] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2014] [Revised: 07/23/2014] [Accepted: 07/25/2014] [Indexed: 12/31/2022]
Abstract
Human accelerated regions (HARs) are DNA sequences that changed very little throughout mammalian evolution, but then experienced a burst of changes in humans since divergence from chimpanzees. This unexpected evolutionary signature is suggestive of deeply conserved function that was lost or changed on the human lineage. Since their discovery, the actual roles of HARs in human evolution have remained somewhat elusive, due to their being almost exclusively non-coding sequences with no annotation. Ongoing research is beginning to crack this problem by leveraging new genome sequences, functional genomics data, computational approaches, and genetic assays to reveal that many HARs are developmental gene regulatory elements and RNA genes, most of which evolved their uniquely human mutations through positive selection before divergence of archaic hominins and diversification of modern humans.
Collapse
Affiliation(s)
- Melissa J Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, 102D Weill Hall, Ithaca, NY 14853, USA
| | - Katherine S Pollard
- Gladstone Institutes, Division of Biostatistics & Institute for Human Genetics, University of California, 1650 Owens Street, San Francisco, CA 94158, USA.
| |
Collapse
|
24
|
Hinch AG, Altemose N, Noor N, Donnelly P, Myers SR. Recombination in the human Pseudoautosomal region PAR1. PLoS Genet 2014; 10:e1004503. [PMID: 25033397 PMCID: PMC4102438 DOI: 10.1371/journal.pgen.1004503] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2014] [Accepted: 05/27/2014] [Indexed: 12/19/2022] Open
Abstract
The pseudoautosomal region (PAR) is a short region of homology between the mammalian X and Y chromosomes, which has undergone rapid evolution. A crossover in the PAR is essential for the proper disjunction of X and Y chromosomes in male meiosis, and PAR deletion results in male sterility. This leads the human PAR with the obligatory crossover, PAR1, to having an exceptionally high male crossover rate, which is 17-fold higher than the genome-wide average. However, the mechanism by which this obligatory crossover occurs remains unknown, as does the fine-scale positioning of crossovers across this region. Recent research in mice has suggested that crossovers in PAR may be mediated independently of the protein PRDM9, which localises virtually all crossovers in the autosomes. To investigate recombination in this region, we construct the most fine-scale genetic map containing directly observed crossovers to date using African-American pedigrees. We leverage recombination rates inferred from the breakdown of linkage disequilibrium in human populations and investigate the signatures of DNA evolution due to recombination. Further, we identify direct PRDM9 binding sites using ChIP-seq in human cells. Using these independent lines of evidence, we show that, in contrast with mouse, PRDM9 does localise peaks of recombination in the human PAR1. We find that recombination is a far more rapid and intense driver of sequence evolution in PAR1 than it is on the autosomes. We also show that PAR1 hotspot activities differ significantly among human populations. Finally, we find evidence that PAR1 hotspot positions have changed between human and chimpanzee, with no evidence of sharing among the hottest hotspots. We anticipate that the genetic maps built and validated in this work will aid research on this vital and fascinating region of the genome.
Collapse
Affiliation(s)
- Anjali G. Hinch
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford, United Kingdom
| | - Nicolas Altemose
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford, United Kingdom
| | - Nudrat Noor
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford, United Kingdom
| | - Peter Donnelly
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford, United Kingdom
| | - Simon R. Myers
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford, United Kingdom
| |
Collapse
|
25
|
Glémin S, Clément Y, David J, Ressayre A. GC content evolution in coding regions of angiosperm genomes: a unifying hypothesis. Trends Genet 2014; 30:263-70. [PMID: 24916172 DOI: 10.1016/j.tig.2014.05.002] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2014] [Revised: 05/09/2014] [Accepted: 05/13/2014] [Indexed: 01/06/2023]
Abstract
In angiosperms (as in other species), GC content varies along and between genes, within a genome, and between genomes of different species, but the reason for this distribution is still an open question. Grass genomes are particularly intriguing because they exhibit a strong bimodal distribution of genic GC content and a sharp 5'-3' decreasing GC content gradient along most genes. Here, we propose a unifying model to explain the main patterns of GC content variation at the gene and genome scale. We argue that GC content patterns could be mainly determined by the interactions between gene structure, recombination patterns, and GC-biased gene conversion. Recent studies on fine-scale recombination maps in angiosperms support this hypothesis and previous results also fit this model. We propose that our model could be used as a null hypothesis to search for additional forces that affect GC content in angiosperms.
Collapse
Affiliation(s)
- Sylvain Glémin
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, UMR 5554 CNRS, Université Montpellier 2, F-34095 Montpellier, France.
| | - Yves Clément
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, UMR 5554 CNRS, Université Montpellier 2, F-34095 Montpellier, France; Montpellier SupAgro, Unité Mixte de Recherche 1334 Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, F-34398 Montpellier, France
| | - Jacques David
- Montpellier SupAgro, Unité Mixte de Recherche 1334 Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, F-34398 Montpellier, France
| | - Adrienne Ressayre
- INRA, UMR de Génétique Végétale, INRA/CNRS/Univ Paris-Sud/AgroParistech, Ferme du Moulon, F-91190 Gif sur Yvette, France
| |
Collapse
|
26
|
Borštnik B, Pumpernik D. The apparent enhancement of CpG transversions in primate lineage is a consequence of multiple replacements. J Bioinform Comput Biol 2014; 12:1450011. [PMID: 24969749 DOI: 10.1142/s0219720014500115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
We claim that the apparently enhanced CpG transversions in the form CpG to CpC/GpG or to ApG/CpT are caused by the hypermutable CpG to CpA/TpG transition. The nucleotide replacement counts obtained from the human/chimpanzee/gorilla/orangutan sequence alignments representing the replacements due to the evolutionary species divergence and the results of 1000 genomes project that provide us with the differences due to the intraspecies diversification were analyzed to estimate the ratio of CpG versus non-CpG transversion probabilities. The trinucleotide replacement counts were extracted from the regions that are free of functional constraints. The CpG transversion probabilities based upon the genomic comparisons were found to exceed more than twice the non-CpG transversions. The diversity data emerging from 14 population groups were partitioned in five classes as a function of the parameter quantifying the spread of the polymorphic allele among the group of individuals. The results based upon the human polymorphism exhibit a trend where CpG over non-CpG transversion probability ratio is less and less exceeding unity as the values of the derived allele frequency (DAF) of snps are diminishing. A computer simulation of a simplified model indicates that the phenomenon of the apparent enhancement of CpG transversions can have its source in the interference of the entropic effects with the maximum likelihood methodologies.
Collapse
Affiliation(s)
- Branko Borštnik
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
| | | |
Collapse
|
27
|
Yang P, Wu M, Guo J, Kwoh CK, Przytycka TM, Zheng J. LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms. BMC Bioinformatics 2014; 15:48. [PMID: 24533858 PMCID: PMC3936957 DOI: 10.1186/1471-2105-15-48] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2012] [Accepted: 01/27/2014] [Indexed: 11/10/2022] Open
Abstract
Background As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Results Recently, an algorithm called “LDsplit” has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. Conclusions LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of recombination hotspots among individuals, opening a new avenue for motif finding. Tested on an established motif and simulated datasets, LDsplit shows promise to discover novel DNA motifs for meiotic recombination hotspots.
Collapse
Affiliation(s)
| | | | | | | | | | - Jie Zheng
- Bioinformatics Research Centre (BIRC), School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore.
| |
Collapse
|
28
|
Odenthal-Hesse L, Berg IL, Veselis A, Jeffreys AJ, May CA. Transmission distortion affecting human noncrossover but not crossover recombination: a hidden source of meiotic drive. PLoS Genet 2014; 10:e1004106. [PMID: 24516398 PMCID: PMC3916235 DOI: 10.1371/journal.pgen.1004106] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2013] [Accepted: 11/27/2013] [Indexed: 11/19/2022] Open
Abstract
Meiotic recombination ensures the correct segregation of homologous chromosomes during gamete formation and contributes to DNA diversity through both large-scale reciprocal crossovers and very localised gene conversion events, also known as noncrossovers. Considerable progress has been made in understanding factors such as PRDM9 and SNP variants that influence the initiation of recombination at human hotspots but very little is known about factors acting downstream. To address this, we simultaneously analysed both types of recombinant molecule in sperm DNA at six highly active hotspots, and looked for disparity in the transmission of allelic variants indicative of any cis-acting influences. At two of the hotspots we identified a novel form of biased transmission that was exclusive to the noncrossover class of recombinant, and which presumably arises through differences between crossovers and noncrossovers in heteroduplex formation and biased mismatch repair. This form of biased gene conversion is not predicted to influence hotspot activity as previously noted for SNPs that affect recombination initiation, but does constitute a powerful and previously undetected source of recombination-driven meiotic drive that by extrapolation may affect thousands of recombination hotspots throughout the human genome. Intriguingly, at both of the hotspots described here, this drive favours strong (G/C) over weak (A/T) base pairs as might be predicted from the well-established correlations between high GC content and recombination activity in mammalian genomes.
Collapse
Affiliation(s)
| | - Ingrid L. Berg
- Department of Genetics, University of Leicester, Leicester, United Kingdom
| | - Amelia Veselis
- Department of Genetics, University of Leicester, Leicester, United Kingdom
| | - Alec J. Jeffreys
- Department of Genetics, University of Leicester, Leicester, United Kingdom
| | - Celia A. May
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- * E-mail:
| |
Collapse
|
29
|
Tatarinova T, Elhaik E, Pellegrini M. Cross-species analysis of genic GC3 content and DNA methylation patterns. Genome Biol Evol 2013; 5:1443-56. [PMID: 23833164 PMCID: PMC3762193 DOI: 10.1093/gbe/evt103] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
The GC content in the third codon position (GC3) exhibits a unimodal distribution in many plant and animal genomes. Interestingly, grasses and homeotherm vertebrates exhibit a unique bimodal distribution. High GC3 was previously found to be associated with variable expression, higher frequency of upstream TATA boxes, and an increase of GC3 from 5′ to 3′. Moreover, GC3-rich genes are predominant in certain gene classes and are enriched in CpG dinucleotides that are potential targets for methylation. Based on the GC3 bimodal distribution we hypothesize that GC3 has a regulatory role involving methylation and gene expression. To test that hypothesis, we selected diverse taxa (rice, thale cress, bee, and human) that varied in the modality of their GC3 distribution and tested the association between GC3, DNA methylation, and gene expression. We examine the relationship between cytosine methylation levels and GC3, gene expression, genome signature, gene length, and other gene compositional features. We find a strong negative correlation (Pearson’s correlation coefficient r = −0.67, P value < 0.0001) between GC3 and genic CpG methylation. The comparison between 5′-3′ gradients of CG3-skew and genic methylation for the taxa in the study suggests interplay between gene-body methylation and transcription-coupled cytosine deamination effect. Compositional features are correlated with methylation levels of genes in rice, thale cress, human, bee, and fruit fly (which acts as an unmethylated control). These patterns allow us to generate evolutionary hypotheses about the relationships between GC3 and methylation and how these affect expression patterns. Specifically, we propose that the opposite effects of methylation and compositional gradients along coding regions of GC3-poor and GC3-rich genes are the products of several competing processes.
Collapse
Affiliation(s)
- Tatiana Tatarinova
- Laboratory of Applied Pharmacokinetics and Bioinformatics, University of Southern California.
| | | | | |
Collapse
|
30
|
Robinson MC, Stone EA, Singh ND. Population genomic analysis reveals no evidence for GC-biased gene conversion in Drosophila melanogaster. Mol Biol Evol 2013; 31:425-33. [PMID: 24214536 DOI: 10.1093/molbev/mst220] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
Gene conversion is the nonreciprocal exchange of genetic material between homologous chromosomes. Multiple lines of evidence from a variety of taxa strongly suggest that gene conversion events are biased toward GC-bearing alleles. However, in Drosophila, the data have largely been indirect and unclear, with some studies supporting the predictions of a GC-biased gene conversion model and other data showing contradictory findings. Here, we test whether gene conversion events are GC-biased in Drosophila melanogaster using whole-genome polymorphism and divergence data. Our results provide no support for GC-biased gene conversion and thus suggest that this process is unlikely to significantly contribute to patterns of polymorphism and divergence in this system.
Collapse
Affiliation(s)
- Matthew C Robinson
- Department of Biological Sciences, Program in Genetics, North Carolina State University
| | | | | |
Collapse
|
31
|
Kvikstad EM, Duret L. Strong heterogeneity in mutation rate causes misleading hallmarks of natural selection on indel mutations in the human genome. Mol Biol Evol 2013; 31:23-36. [PMID: 24113537 PMCID: PMC3879449 DOI: 10.1093/molbev/mst185] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
Elucidating the mechanisms of mutation accumulation and fixation is critical to understand the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scaled sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald-Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1-50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than deletions. However, this fixation bias is not consistent with either selection or biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected according to a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in determination of ancestral alleles via parsimony and advise caution interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results will have great impact on studies seeking to infer evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed homoplasy-free.
Collapse
Affiliation(s)
- Erika M Kvikstad
- Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, CNRS, Université Lyon 1, Villeurbanne, France
| | | |
Collapse
|
32
|
Capra JA, Hubisz MJ, Kostka D, Pollard KS, Siepel A. A model-based analysis of GC-biased gene conversion in the human and chimpanzee genomes. PLoS Genet 2013; 9:e1003684. [PMID: 23966869 PMCID: PMC3744432 DOI: 10.1371/journal.pgen.1003684] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2013] [Accepted: 06/14/2013] [Indexed: 01/03/2023] Open
Abstract
GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. Interpreting patterns of DNA sequence variation in the genomes of closely related species is critically important for understanding the causes and functional effects of nucleotide substitutions. Classical models describe patterns of substitution in terms of the fundamental forces of mutation, recombination, neutral drift, and natural selection. However, an entirely separate force, called GC-biased gene conversion (gBGC), also appears to have an important influence on substitution patterns in many species. gBGC is a recombination-associated evolutionary process that favors the fixation of strong (G/C) over weak (A/T) alleles. In mammals, gBGC is thought to promote variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations. However, its genome-wide influence remains poorly understood, in part because, it is difficult to incorporate gBGC into statistical models of evolution. In this paper, we describe a new evolutionary model that jointly describes the effects of selection and gBGC and apply it to the human and chimpanzee genomes. Our genome-wide predictions of gBGC tracts indicate that gBGC has been an important force in recent human evolution. Our publicly available computer program, called phastBias, and our genome-wide predictions will enable other researchers to consider gBGC in their analyses.
Collapse
Affiliation(s)
- John A. Capra
- Gladstone Institutes, University of California, San Francisco, California, United States of America
| | - Melissa J. Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
| | - Dennis Kostka
- Department of Developmental Biology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Katherine S. Pollard
- Gladstone Institutes, University of California, San Francisco, California, United States of America
- Institute for Human Genetics and Division of Biostatistics, University of California, San Francisco, California, United States of America
- * E-mail: (KSP); (AS)
| | - Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- * E-mail: (KSP); (AS)
| |
Collapse
|
33
|
Oliveira PH, da Silva CL, Cabral JMS. An appraisal of human mitochondrial DNA instability: new insights into the role of non-canonical DNA structures and sequence motifs. PLoS One 2013; 8:e59907. [PMID: 23555828 PMCID: PMC3612095 DOI: 10.1371/journal.pone.0059907] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2013] [Accepted: 02/20/2013] [Indexed: 01/29/2023] Open
Abstract
Mitochondrial DNA (mtDNA) deletion mutations are frequently observed in aged postmitotic tissues and are the cause of a wide range of human disorders. Presently, the molecular bases underlying mtDNA deletion formation remain a matter of intense debate, and it is commonly accepted that several mechanisms contribute to the spectra of mutations in the mitochondrial genome. In this work we performed an extensive screening of human mtDNA deletions and evaluated the association between breakpoint density and presence of non-canonical DNA elements and over-represented sequence motifs. Our observations support the involvement of helix-distorting intrinsically curved regions and long G-tetrads in eliciting instability events. In addition, higher breakpoint densities were consistently observed within GC-skewed regions and in the close vicinity of the degenerate sequence motif YMMYMNNMMHM. A parallelism is also established with hot spot motifs previously identified in the nuclear genome, as well as with the minimal binding site for the mitochondrial transcription termination factor mTERF. This study extends the current knowledge on the mechanisms driving mitochondrial rearrangements and opens up exciting avenues for further research.
Collapse
Affiliation(s)
- Pedro H Oliveira
- Department of Bioengineering and Institute for Biotechnology and Bioengineering, Instituto Superior Técnico, Lisbon, Portugal.
| | | | | |
Collapse
|
34
|
Lesecque Y, Mouchiroud D, Duret L. GC-biased gene conversion in yeast is specifically associated with crossovers: molecular mechanisms and evolutionary significance. Mol Biol Evol 2013; 30:1409-19. [PMID: 23505044 PMCID: PMC3649680 DOI: 10.1093/molbev/mst056] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
GC-biased gene conversion (gBGC) is a process associated with recombination that favors the transmission of GC alleles over AT alleles during meiosis. gBGC plays a major role in genome evolution in many eukaryotes. However, the molecular mechanisms of gBGC are still unknown. Different steps of the recombination process could potentially cause gBGC: the formation of double-strand breaks (DSBs), the invasion of the homologous or sister chromatid, and the repair of mismatches in heteroduplexes. To investigate these models, we analyzed a genome-wide data set of crossovers (COs) and noncrossovers (NCOs) in Saccharomyces cerevisiae. We demonstrate that the overtransmission of GC alleles is specific to COs and that it occurs among conversion tracts in which all alleles are converted from the same donor haplotype. Thus, gBGC results from a process that leads to long-patch repair. We show that gBGC is associated with longer tracts and that it is driven by the nature (GC or AT) of the alleles located at the extremities of the tract. These observations invalidate the hypotheses that gBGC is due to the base excision repair machinery or to a bias in DSB formation and suggest that in S. cerevisiae, gBGC is caused by the mismatch repair (MMR) system. We propose that the presence of nicks on both DNA strands during CO resolution could be the cause of the bias in MMR activity. Our observations are consistent with the hypothesis that gBGC is a nonadaptive consequence of a selective pressure to limit the mutation rate in mitotic cells.
Collapse
Affiliation(s)
- Yann Lesecque
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université de Lyon, Université Lyon 1, Villeurbanne, France
| | | | | |
Collapse
|
35
|
Montgomery SB, Goode DL, Kvikstad E, Albers CA, Zhang ZD, Mu XJ, Ananda G, Howie B, Karczewski KJ, Smith KS, Anaya V, Richardson R, Davis J, MacArthur DG, Sidow A, Duret L, Gerstein M, Makova KD, Marchini J, McVean G, Lunter G. The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes. Genome Res 2013; 23:749-61. [PMID: 23478400 PMCID: PMC3638132 DOI: 10.1101/gr.148718.112] [Citation(s) in RCA: 163] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%–48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.
Collapse
Affiliation(s)
- Stephen B Montgomery
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, 1211, Switzerland.
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
36
|
Günther T, Lampei C, Schmid KJ. Mutational bias and gene conversion affect the intraspecific nitrogen stoichiometry of the Arabidopsis thaliana transcriptome. Mol Biol Evol 2012; 30:561-8. [PMID: 23115321 DOI: 10.1093/molbev/mss249] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
The transcriptome and proteome of Arabidopsis thaliana are reduced in nitrogen content when compared with other taxa, which may result from ecological nitrogen limitation. We hypothesized that if the A. thaliana transcriptome is selected for a low nitrogen content, nitrogen-reducing derived alleles of single nucleotide polymorphisms (SNPs) should segregate at higher frequencies than nitrogen-increasing alleles. This pattern should be stronger in populations with a larger effective population size (N(e)) if natural selection is more efficient in large than in small populations. We analyzed variation in the nitrogen content in the transcriptome of 80 natural accessions of A. thaliana. In contrast to our expectations, derived alleles increase the nitrogen content in all accessions, and there is a positive correlation between nitrogen difference and derived allele frequency, which is strongest with nonsynonymous SNPs (nsSNPs). Also, there is a positive correlation between nitrogen difference and N(e) that was mainly caused by nsSNPs. These observations led us to reject the hypothesis that the transcriptome of A. thaliana is currently under selection to reduce nitrogen content. Instead, we show that a change in nitrogen content is a side effect of interacting evolutionary factors that influence base composition and include mutational bias, purifying selection of functionally deleterious alleles, and GC-biased gene conversion. We provide strong evidence that GC-biased gene conversion may play an important role for base composition in the highly selfing plant A. thaliana.
Collapse
Affiliation(s)
- Torsten Günther
- Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany
| | | | | |
Collapse
|
37
|
Lartillot N. Phylogenetic patterns of GC-biased gene conversion in placental mammals and the evolutionary dynamics of recombination landscapes. Mol Biol Evol 2012; 30:489-502. [PMID: 23079417 DOI: 10.1093/molbev/mss239] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
GC-biased gene conversion (gBGC) is a major evolutionary force shaping genomic nucleotide landscapes, distorting the estimation of the strength of selection, and having potentially deleterious effects on genome-wide fitness. Yet, a global quantitative picture, at large evolutionary scale, of the relative strength of gBGC compared with selection and random drift is still lacking. Furthermore, owing to its dependence on the local recombination rate, gBGC results in modulations of the substitution patterns along genomes and across time which, if correctly interpreted, may yield quantitative insights into the long-term evolutionary dynamics of recombination landscapes. Deriving a model of the substitution process at putatively neutral nucleotide positions from population-genetics arguments, and accounting for among-lineage and among-gene effects, we propose a reconstruction of the variation in gBGC intensity at the scale of placental mammals, and of its scaling with body-size and karyotypic traits. Our results are compatible with a simple population genetics model relating gBGC to effective population size and recombination rate. In addition, among-gene variation and phylogenetic patterns of exon-specific levels of gBGC reveal the presence of rugged recombination landscapes, and suggest that short-lived recombination hot-spots are a general feature of placentals. Across placental mammals, variation in gBGC strength spans two orders of magnitude, at its lowest in apes, strongest in lagomorphs, microbats or tenrecs, and near or above the nearly neutral threshold in most other lineages. Combined with among-gene variation, such high levels of biased gene conversion are likely to significantly impact midly selected positions, and to represent a substantial mutation load. Altogether, our analysis suggests a more important role of gBGC in placental genome evolution, compared with what could have been anticipated from studies conducted in anthropoid primates.
Collapse
Affiliation(s)
- Nicolas Lartillot
- Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
| |
Collapse
|
38
|
Lartillot N. Interaction between selection and biased gene conversion in mammalian protein-coding sequence evolution revealed by a phylogenetic covariance analysis. Mol Biol Evol 2012; 30:356-68. [PMID: 23024185 DOI: 10.1093/molbev/mss231] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
According to the nearly-neutral model, variation in long-term effective population size among species should result in correlated variation in the ratio of nonsynonymous over synonymous substitution rates (dN/dS). Previous empirical investigations in mammals have been consistent with this prediction, suggesting an important role for nearly-neutral effects on protein-coding sequence evolution. GC-biased gene conversion (gBGC), on the other hand, is increasingly recognized as a major evolutionary force shaping genome nucleotide composition. When sufficiently strong compared with random drift, gBGC may significantly interfere with a nearly-neutral regime and impact dN/dS in a complex manner. Here, we investigate the phylogenetic correlations between dN/dS, the equilibrium GC composition (GC*), and several life-history and karyotypic traits in placental mammals. We show that the equilibrium GC composition decreases with body mass and increases with the number of chromosomes, suggesting a modulation of the strength of biased gene conversion due to changes in effective population size and genome-wide recombination rate. The variation in dN/dS is complex and only partially fits the prediction of the nearly-neutral theory. However, specifically restricting estimation of the dN/dS ratio on GC-conservative transversions, which are immune from gBGC, results in correlations that are more compatible with a nearly-neutral interpretation. Our investigation indicates the presence of complex interactions between selection and biased gene conversion and suggests that further mechanistic development is warranted, to tease out mutation, selection, drift, and conversion.
Collapse
Affiliation(s)
- Nicolas Lartillot
- Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
| |
Collapse
|
39
|
Speranskaya AS, Krinitsina AA, Kudryavtseva AV, Poltronieri P, Santino A, Oparina NY, Dmitriev AA, Belenikin MS, Guseva MA, Shevelev AB. Impact of recombination on polymorphism of genes encoding Kunitz-type protease inhibitors in the genus Solanum. Biochimie 2012; 94:1687-96. [DOI: 10.1016/j.biochi.2012.03.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2011] [Accepted: 03/11/2012] [Indexed: 11/29/2022]
|
40
|
Auton A, Fledel-Alon A, Pfeifer S, Venn O, Ségurel L, Street T, Leffler EM, Bowden R, Aneas I, Broxholme J, Humburg P, Iqbal Z, Lunter G, Maller J, Hernandez RD, Melton C, Venkat A, Nobrega MA, Bontrop R, Myers S, Donnelly P, Przeworski M, McVean G. A fine-scale chimpanzee genetic map from population sequencing. Science 2012; 336:193-8. [PMID: 22422862 PMCID: PMC3532813 DOI: 10.1126/science.1216872] [Citation(s) in RCA: 208] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
To study the evolution of recombination rates in apes, we developed methodology to construct a fine-scale genetic map from high-throughput sequence data from 10 Western chimpanzees, Pan troglodytes verus. Compared to the human genetic map, broad-scale recombination rates tend to be conserved, but with exceptions, particularly in regions of chromosomal rearrangements and around the site of ancestral fusion in human chromosome 2. At fine scales, chimpanzee recombination is dominated by hotspots, which show no overlap with those of humans even though rates are similarly elevated around CpG islands and decreased within genes. The hotspot-specifying protein PRDM9 shows extensive variation among Western chimpanzees, and there is little evidence that any sequence motifs are enriched in hotspots. The contrasting locations of hotspots provide a natural experiment, which demonstrates the impact of recombination on base composition.
Collapse
Affiliation(s)
- Adam Auton
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Genetics, Albert Einstein College of Medicine, New York, New York, USA
| | - Adi Fledel-Alon
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - Susanne Pfeifer
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| | - Oliver Venn
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
| | - Laure Ségurel
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois, USA
| | - Teresa Street
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| | - Ellen M. Leffler
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - Rory Bowden
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
- Oxford Biomedical Research Centre, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 9DU, UK
| | - Ivy Aneas
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - John Broxholme
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
| | - Peter Humburg
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
| | - Zamin Iqbal
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
| | - Gerton Lunter
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
| | - Julian Maller
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| | - Ryan D. Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143-0912, USA
| | - Cord Melton
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - Aarti Venkat
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois, USA
| | - Marcelo A. Nobrega
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - Ronald Bontrop
- Department of Comparative Genetics and Refinement, Biomedical Primate Research Center, Lange Kleiweg 139 2288 GJ, Rijswijk, Netherlands
| | - Simon Myers
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| | - Peter Donnelly
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| | - Molly Przeworski
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
| | - Gil McVean
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| |
Collapse
|
41
|
Serres-Giardi L, Belkhir K, David J, Glémin S. Patterns and evolution of nucleotide landscapes in seed plants. THE PLANT CELL 2012; 24:1379-97. [PMID: 22492812 PMCID: PMC3398553 DOI: 10.1105/tpc.111.093674] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
Nucleotide landscapes, which are the way base composition is distributed along a genome, strongly vary among species. The underlying causes of these variations have been much debated. Though mutational bias and selection were initially invoked, GC-biased gene conversion (gBGC), a recombination-associated process favoring the G and C over A and T bases, is increasingly recognized as a major factor. As opposed to vertebrates, evolution of GC content is less well known in plants. Most studies have focused on the GC-poor and homogeneous Arabidopsis thaliana genome and the much more GC-rich and heterogeneous rice (Oryza sativa) genome and have often been generalized as a dicot/monocot dichotomy. This vision is clearly phylogenetically biased and does not allow understanding the mechanisms involved in GC content evolution in plants. To tackle these issues, we used EST data from more than 200 species and provided the most comprehensive description of gene GC content across the seed plant phylogeny so far available. As opposed to the classically assumed dicot/monocot dichotomy, we found continuous variations in GC content from the probably ancestral GC-poor and homogeneous genomes to the more derived GC-rich and highly heterogeneous ones, with several independent enrichment episodes. Our results suggest that gBGC could play a significant role in the evolution of GC content in plant genomes.
Collapse
Affiliation(s)
- Laurana Serres-Giardi
- Institut des Sciences de l’Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier 2, F-34095 Montpellier, France
- Montpellier SupAgro, Unité Mixte de Recherche 1334, Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, F-34398 Montpellier, France
| | - Khalid Belkhir
- Institut des Sciences de l’Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier 2, F-34095 Montpellier, France
| | - Jacques David
- Montpellier SupAgro, Unité Mixte de Recherche 1334, Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, F-34398 Montpellier, France
| | - Sylvain Glémin
- Institut des Sciences de l’Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier 2, F-34095 Montpellier, France
- Address correspondence to
| |
Collapse
|
42
|
Frenkel S, Kirzhner V, Korol A. Organizational heterogeneity of vertebrate genomes. PLoS One 2012; 7:e32076. [PMID: 22384143 PMCID: PMC3288070 DOI: 10.1371/journal.pone.0032076] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2011] [Accepted: 01/23/2012] [Indexed: 01/06/2023] Open
Abstract
Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
Collapse
Affiliation(s)
| | | | - Abraham Korol
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Mount Carmel, Haifa, Israel
| |
Collapse
|
43
|
Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri AK, Keefe D, Keenan S, Kinsella R, Komorowska M, Koscielny G, Kulesha E, Larsson P, Longden I, McLaren W, Muffato M, Overduin B, Pignatelli M, Pritchard B, Riat HS, Ritchie GRS, Ruffier M, Schuster M, Sobral D, Tang YA, Taylor K, Trevanion S, Vandrovcova J, White S, Wilson M, Wilder SP, Aken BL, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Harrow J, Herrero J, Hubbard TJP, Parker A, Proctor G, Spudich G, Vogel J, Yates A, Zadissa A, Searle SMJ. Ensembl 2012. Nucleic Acids Res 2011; 40:D84-90. [PMID: 22086963 PMCID: PMC3245178 DOI: 10.1093/nar/gkr991] [Citation(s) in RCA: 806] [Impact Index Per Article: 62.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
Collapse
Affiliation(s)
- Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK.
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|