1
Huttener R, Thorrez L, In't Veld T, Granvik M, Van Lommel L, Waelkens E, Derua R, Lemaire K, Goyvaerts L, De Coster S, Buyse J, Schuit F. Sequencing refractory regions in bird genomes are hotspots for accelerated protein evolution. BMC Ecol Evol 2021; 21:176. [PMID: 34537008; PMCID: PMC8449477; DOI: 10.1186/s12862-021-01905-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/16/2020] [Accepted: 08/31/2021] [Indexed: 11/29/2022]
Abstract
Background: Approximately 1000 protein-encoding genes common to vertebrates are still unannotated in avian genomes. Are these genes evolutionarily lost, or have they not yet been found for technical reasons? Using genome landscapes as a tool to visualize large-scale regional effects of genome evolution, we reexamined this question. Results: On the basis of gene annotation in non-avian vertebrate genomes, we established a list of 15,135 common vertebrate genes. Of these, 1026 were not found in any of eight examined bird genomes. Visualizing regional genome effects with our sliding window approach showed that the majority of these “missing” genes can be clustered into 14 regions of the human reference genome. In these clusters, an additional 1517 genes (often gene fragments) were underrepresented in bird genomes. The clusters of “missing” genes coincided with regions of very high GC content, particularly in avian genomes, making them “hidden” because of incomplete sequencing. Moreover, proteins encoded by genes in these sequencing-refractory regions showed signs of accelerated protein evolution. As a proof of principle for this idea, we experimentally characterized the mRNA and protein products of four “hidden” bird genes that are crucial for energy homeostasis in skeletal muscle: ALDOA, ENO3, PYGM and SLC2A4. Conclusions: At least part of the “missing” genes in bird genomes can be attributed to an artifact caused by the difficulty of sequencing regions with extreme GC% (“hidden” genes). Biologically, these “hidden” genes are of interest because they encode proteins that evolve more rapidly than the genome-wide average. Finally, we show that four of these “hidden” genes encode key proteins for energy metabolism in flight muscle. Supplementary Information: The online version contains supplementary material available at 10.1186/s12862-021-01905-7.
Affiliation(s)
- R Huttener
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- L Thorrez
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium; Tissue Engineering Laboratory, Department of Development and Regeneration, KU Leuven Campus Kulak, Kortrijk, Belgium
- T In't Veld
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- M Granvik
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- L Van Lommel
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- E Waelkens
  - Laboratory of Protein Phosphorylation and Proteomics, KU Leuven, Leuven, Belgium
- R Derua
  - Laboratory of Protein Phosphorylation and Proteomics, KU Leuven, Leuven, Belgium
- K Lemaire
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- L Goyvaerts
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- S De Coster
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- J Buyse
  - Laboratory of Livestock Physiology, Department of Biosystems, KU Leuven, Leuven, Belgium
- F Schuit
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
2
Suurväli J, Whiteley AR, Zheng Y, Gharbi K, Leptin M, Wiehe T. The Laboratory Domestication of Zebrafish: From Diverse Populations to Inbred Substrains. Mol Biol Evol 2021; 37:1056-1069. [PMID: 31808937; PMCID: PMC7086173; DOI: 10.1093/molbev/msz289] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/22/2022]
Abstract
We know from human genetic studies that practically all aspects of biology are strongly influenced by the genetic background, as reflected in the advent of “personalized medicine.” Yet, with few exceptions, this is not taken into account when using laboratory populations as animal model systems for research in these fields. Laboratory strains of zebrafish (Danio rerio) are widely used for research in vertebrate developmental biology, behavior, and physiology, for modeling diseases, and for testing pharmaceutic compounds in vivo. However, all of these strains are derived from artificial bottleneck events and therefore are likely to represent only a fraction of the genetic diversity present within the species. Here, we use restriction site-associated DNA sequencing to genetically characterize wild populations of zebrafish from India, Nepal, and Bangladesh, and to compare them to previously published data on four common laboratory strains. We measured nucleotide diversity, heterozygosity, and allele frequency spectra, and find that wild zebrafish are much more diverse than laboratory strains. Further, in wild zebrafish, there is a clear signal of GC-biased gene conversion that is missing in laboratory strains. We also find that zebrafish populations in Nepal and Bangladesh are most distinct from all other strains studied, making them an attractive subject for future studies of zebrafish population genetics and molecular ecology. Finally, isolates of the same strains kept in different laboratories show a pattern of ongoing differentiation into genetically distinct substrains. Together, our findings broaden the basis for future genetic, physiological, pharmaceutic, and evolutionary studies in Danio rerio.
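The diversity statistics named in this abstract can be illustrated with a minimal sketch. It assumes phased, equal-length aligned haplotype strings as input, which is a simplification; the abstract does not describe the authors' pipeline for restriction site-associated DNA data, and the function names below are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(haplotypes: Sequence[str]) -> float:
    """Average number of pairwise differences per site (pi).

    Assumption: haplotypes are aligned strings of equal length.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and of equal length")
    # Count mismatching sites over every unordered pair of haplotypes.
    diffs = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    pairs = n * (n - 1) // 2
    return diffs / (pairs * length)


def expected_heterozygosity(alleles: Sequence[str]) -> float:
    """Expected heterozygosity at one site: 1 minus the sum of squared allele frequencies."""
    n = len(alleles)
    if n == 0:
        raise ValueError("need at least one allele")
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

Under this definition a population with more segregating sites, or with alleles at more even frequencies, yields higher values, which is the sense in which wild fish are reported as more diverse than laboratory strains.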
Affiliation(s)
- Jaanus Suurväli
  - Institute for Genetics, University of Cologne, Cologne, Germany
- Andrew R Whiteley
  - Wildlife Biology Program, Department of Ecosystem and Conservation Sciences, College of Forestry and Conservation, University of Montana, Missoula, MT
- Yichen Zheng
  - Institute for Genetics, University of Cologne, Cologne, Germany
- Karim Gharbi
  - Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, United Kingdom; Earlham Institute, Norwich Research Park, Norwich, United Kingdom
- Maria Leptin
  - Institute for Genetics, University of Cologne, Cologne, Germany
- Thomas Wiehe
  - Institute for Genetics, University of Cologne, Cologne, Germany
3
Gao NL, He Z, Zhu Q, Jiang P, Hu S, Chen WH. Selection for Cheaper Amino Acids Drives Nucleotide Usage at the Start of Translation in Eukaryotic Genes. Genomics Proteomics & Bioinformatics 2021; 19:949-957. [PMID: 33741525; PMCID: PMC9403032; DOI: 10.1016/j.gpb.2021.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2018] [Revised: 05/30/2019] [Accepted: 08/18/2019] [Indexed: 12/04/2022]
Abstract
Coding regions have complex interactions among multiple selective forces, which are manifested as biases in nucleotide composition. Previous studies have revealed a decreasing GC gradient from the 5′-end to 3′-end of coding regions in various organisms. We confirmed that this gradient is universal in eukaryotic genes, but the decrease only starts from the ∼ 25th codon. This trend is mostly found in nonsynonymous (ns) sites at which the GC gradient is universal across the eukaryotic genome. Increased GC contents at ns sites result in cheaper amino acids, indicating a universal selection for energy efficiency toward the N-termini of encoded proteins. Within a genome, the decreasing GC gradient is intensified from lowly to highly expressed genes (more and more protein products), further supporting this hypothesis. This reveals a conserved selective constraint for cheaper amino acids at the translation start that drives the increased GC contents at ns sites. Elevated GC contents can facilitate transcription but result in a more stable local secondary structure around the start codon and subsequently impede translation initiation. Conversely, the GC gradients at four-fold and two-fold synonymous sites vary across species. They could decrease or increase, suggesting different constraints acting at the GC contents of different codon sites in different species. This study reveals that the overall GC contents at the translation start are consequences of complex interactions among several major biological processes that shape the nucleotide sequences, especially efficient energy usage.
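A GC gradient along coding regions can be tabulated by codon index, as in the sketch below. This is only an illustration under simplifying assumptions: it pools all three codon positions, whereas the abstract distinguishes nonsynonymous from four-fold and two-fold synonymous sites, and the function name and input format (in-frame coding sequences beginning at the start codon) are hypothetical.

```python
from typing import List, Optional, Sequence


def gc_by_codon_index(cds_list: Sequence[str], n_codons: int) -> List[Optional[float]]:
    """GC fraction at each codon index (0 = start codon), pooled over sequences.

    Assumption: every sequence is in frame and starts at its start codon.
    Sequences shorter than a given codon index are skipped for that index;
    an index covered by no sequence yields None.
    """
    profile: List[Optional[float]] = []
    for i in range(n_codons):
        gc = 0
        total = 0
        for cds in cds_list:
            codon = cds[3 * i : 3 * i + 3].upper()
            if len(codon) < 3:
                continue  # this sequence does not reach codon i
            gc += codon.count("G") + codon.count("C")
            total += 3
        profile.append(gc / total if total else None)
    return profile
```

Plotting such a profile for a genome-wide set of coding sequences would show where along the gene the GC fraction begins to fall; separating the signal by site class would require a codon table and is not attempted here.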
Affiliation(s)
- Na L Gao
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Institute for Computer Science and Cluster of Excellence on Plant Sciences, Heinrich Heine University, Duesseldorf 40225, Germany
- Zilong He
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China; State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China; Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Interdisciplinary Innovation Institute of Medicine and Engineering, Beihang University, Beijing 100191, China
- Qianhui Zhu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China; State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Puzi Jiang
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Songnian Hu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China; State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Wei-Hua Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
4
Huttener R, Thorrez L, In't Veld T, Granvik M, Snoeck L, Van Lommel L, Schuit F. GC content of vertebrate exome landscapes reveal areas of accelerated protein evolution. BMC Evol Biol 2019; 19:144. [PMID: 31311498; PMCID: PMC6636035; DOI: 10.1186/s12862-019-1469-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/18/2019] [Accepted: 06/26/2019] [Indexed: 11/10/2022]
Abstract
Background: The rapid accumulation of vertebrate genome sequences renders comparative genomics a powerful approach to study macro-evolutionary events. The assessment of phylogenetic relationships between species routinely depends on the analysis of sequence homology at the nucleotide or protein level. Results: We analyzed mRNA GC content, codon usage and divergence of orthologous proteins in 55 vertebrate genomes. Data were visualized in genome-wide landscapes using a sliding window approach. Landscapes of GC content reveal both evolutionary conservation of clustered genes and lineage-specific changes, so that it was possible to construct a phylogenetic tree that closely matched the classic "tree of life". Landscapes of GC content also correlated strongly with landscapes of amino acid usage: positive correlation with glycine, alanine, arginine and proline, and negative correlation with phenylalanine, tyrosine, methionine, isoleucine, asparagine and lysine. Peaks of GC content correlated strongly with increased protein divergence. Conclusions: Landscapes of base and amino acid composition of the coding genome open a new approach in comparative genomics, allowing identification of discrete regions in which protein evolution accelerated over deep evolutionary time. Insight into the evolution of genome structure may spur novel studies assessing the evolutionary benefit of genes in particular genomic regions.
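A sliding window landscape of GC content can be sketched as follows. The abstract does not state the window size or whether windows are defined over genes or base pairs, so this example assumes windows of consecutive genes in chromosomal order and averages per-gene mRNA GC fractions; the function names and the `window` parameter are hypothetical.

```python
from typing import List, Sequence


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_landscape(gene_seqs: Sequence[str], window: int) -> List[float]:
    """Mean per-gene GC fraction for each window of `window` consecutive genes.

    Assumption: gene_seqs is ordered by genomic position, so neighbouring
    entries are neighbouring genes. Returns an empty list when there are
    fewer genes than the window size.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    per_gene = [gc_fraction(s) for s in gene_seqs]
    return [
        sum(per_gene[i : i + window]) / window
        for i in range(len(per_gene) - window + 1)
    ]
```

Peaks in the returned series would mark gene clusters with elevated GC content; comparing the series between species at orthologous positions is one plausible way to look for the lineage-specific changes the abstract mentions.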
Affiliation(s)
- R Huttener
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Thorrez
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium; Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- T In't Veld
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- M Granvik
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Snoeck
  - Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- L Van Lommel
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- F Schuit
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
5
Bolívar P, Guéguen L, Duret L, Ellegren H, Mugal CF. GC-biased gene conversion conceals the prediction of the nearly neutral theory in avian genomes. Genome Biol 2019; 20:5. [PMID: 30616647; PMCID: PMC6322265; DOI: 10.1186/s13059-018-1613-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/12/2018] [Accepted: 12/17/2018] [Indexed: 11/10/2022]
Abstract
Background: The nearly neutral theory of molecular evolution predicts that the efficacy of natural selection increases with the effective population size. This prediction has been verified by independent observations in diverse taxa, which show that life-history traits are strongly correlated with measures of the efficacy of selection, such as the dN/dS ratio. Surprisingly, avian taxa are an exception to this theory because correlations between life-history traits and dN/dS are apparently absent. Here we explore the role of GC-biased gene conversion on estimates of substitution rates as a potential driver of these unexpected observations. Results: We analyze the relationship between dN/dS estimated from alignments of 47 avian genomes and several proxies for effective population size. To distinguish the impact of GC-biased gene conversion from selection, we use an approach that accounts for non-stationary base composition and estimate dN/dS separately for changes affected or unaffected by GC-biased gene conversion. This analysis shows that the impact of GC-biased gene conversion on substitution rates can explain the lack of correlations between life-history traits and dN/dS. Strong correlations between life-history traits and dN/dS are recovered after accounting for GC-biased gene conversion. The correlations are robust to variation in base composition and genomic location. Conclusions: Our study shows that gene sequence evolution across a wide range of avian lineages meets the prediction of the nearly neutral theory: the efficacy of selection increases with effective population size. Moreover, our study illustrates that accounting for GC-biased gene conversion is important to correctly estimate the strength of selection.
Affiliation(s)
- Paulina Bolívar
  - Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Laurent Guéguen
  - Laboratoire de Biologie et Biométrie Évolutive CNRS UMR 5558, Université Claude Bernard Lyon 1, Lyon, France
- Laurent Duret
  - Laboratoire de Biologie et Biométrie Évolutive CNRS UMR 5558, Université Claude Bernard Lyon 1, Lyon, France
- Hans Ellegren
  - Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Carina F. Mugal
  - Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
6
Tiemann-Boege I, Schwarz T, Striedner Y, Heissl A. The consequences of sequence erosion in the evolution of recombination hotspots. Philos Trans R Soc Lond B Biol Sci 2018; 372:rstb.2016.0462. [PMID: 29109225; PMCID: PMC5698624; DOI: 10.1098/rstb.2016.0462] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Accepted: 09/05/2017] [Indexed: 12/18/2022]
Abstract
Meiosis is initiated by a double-strand break (DSB) that is introduced in the DNA by a highly controlled process and repaired by recombination. In many organisms, recombination occurs at specific and narrow regions of the genome, known as recombination hotspots, which overlap with regions enriched for DSBs. In recent years, it has been demonstrated that conversions and mutations resulting from the repair of DSBs lead to rapid sequence evolution at recombination hotspots, eroding target sites for DSBs. We still do not fully understand the effect of this erosion on recombination activity, but evidence has shown that the binding of trans-acting factors like PRDM9 is affected. PRDM9 is a meiosis-specific, multi-domain protein that recognizes DNA target motifs by its zinc finger domain and directs DSBs to these target sites. Here we discuss the changes in affinity of PRDM9 to eroded recognition sequences, and explain how these changes in affinity can affect recombination, leading sometimes to sterility in the context of hybrid crosses. We also present experimental data showing that DNA methylation reduces PRDM9 binding in vitro. Finally, we discuss PRDM9-independent hotspots, posing the question of how these hotspots evolve and change with sequence erosion. This article is part of the themed issue ‘Evolutionary causes and consequences of recombination rate variation in sexual organisms’.
Affiliation(s)
- Irene Tiemann-Boege
  - Institute of Biophysics, Johannes Kepler University, Linz, Gruberstraße 40, 4020 Linz, Austria
- Theresa Schwarz
  - Institute of Biophysics, Johannes Kepler University, Linz, Gruberstraße 40, 4020 Linz, Austria
- Yasmin Striedner
  - Institute of Biophysics, Johannes Kepler University, Linz, Gruberstraße 40, 4020 Linz, Austria
- Angelika Heissl
  - Institute of Biophysics, Johannes Kepler University, Linz, Gruberstraße 40, 4020 Linz, Austria
7
Pouyet F, Mouchiroud D, Duret L, Sémon M. Recombination, meiotic expression and human codon usage. eLife 2017; 6:27344. [PMID: 28826480; PMCID: PMC5576983; DOI: 10.7554/elife.27344] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 03/31/2017] [Accepted: 08/14/2017] [Indexed: 12/17/2022]
Abstract
Synonymous codon usage (SCU) varies widely among human genes. In particular, genes involved in different functional categories display a distinct codon usage, which was interpreted as evidence that SCU is adaptively constrained to optimize translation efficiency in distinct cellular states. We demonstrate here that SCU is not driven by constraints on tRNA abundance, but by large-scale variation in GC-content, caused by meiotic recombination, via the non-adaptive process of GC-biased gene conversion (gBGC). Expression in meiotic cells is associated with a strong decrease in recombination within genes. Differences in SCU among functional categories reflect differences in levels of meiotic transcription, which is linked to variation in recombination and therefore in gBGC. Overall, the gBGC model explains 70% of the variance in SCU among genes. We argue that the strong heterogeneity of SCU induced by gBGC in mammalian genomes precludes any optimization of the tRNA pool to the demand in codon usage.
Affiliation(s)
- Fanny Pouyet
  - Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Claude Bernard, Villeurbanne, France
- Dominique Mouchiroud
  - Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Claude Bernard, Villeurbanne, France
- Laurent Duret
  - Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Claude Bernard, Villeurbanne, France
- Marie Sémon
  - Laboratory of Biology and Modelling of the Cell, UnivLyon, ENS de Lyon, Univ Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratoire de Biologie et Modélisation de la Cellule, Lyon, France
8
Kenigsberg E, Yehuda Y, Marjavaara L, Keszthelyi A, Chabes A, Tanay A, Simon I. The mutation spectrum in genomic late replication domains shapes mammalian GC content. Nucleic Acids Res 2016; 44:4222-32. [PMID: 27085808; PMCID: PMC4872117; DOI: 10.1093/nar/gkw268] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 12/02/2015] [Revised: 03/10/2016] [Accepted: 03/30/2016] [Indexed: 11/14/2022]
Abstract
Genome sequence compositions and epigenetic organizations are correlated extensively across multiple length scales. Replication dynamics, in particular, are highly correlated with GC content. We combine genome-wide time of replication (ToR) data, topological domain maps and detailed functional epigenetic annotations to study the correlations between replication timing and GC content at multiple scales. We find that the decrease in genomic GC content at large-scale late-replicating regions can be explained by a mutation bias favoring A/T nucleotides, without selection or biased gene conversion. Quantification of the free dNTP pool during the cell cycle is consistent with a mechanism involving a replication-coupled mutation spectrum that favors AT nucleotides in late S-phase. We suggest that mammalian GC content is shaped by independent forces, globally modulating mutation bias and locally selecting on functional elements. Deconvoluting these forces and analyzing them on their native scales is important for proper characterization of complex genomic correlations.
Affiliation(s)
- Ephraim Kenigsberg
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Yishai Yehuda
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Lisette Marjavaara
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
- Andrea Keszthelyi
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
- Andrei Chabes
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
- Amos Tanay
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Itamar Simon
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
9
Hillmer M, Wagner D, Summerer A, Daiber M, Mautner VF, Messiaen L, Cooper DN, Kehrer-Sawatzki H. Fine mapping of meiotic NAHR-associated crossovers causing large NF1 deletions. Hum Mol Genet 2015; 25:484-96. [PMID: 26614388; DOI: 10.1093/hmg/ddv487] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/12/2015] [Accepted: 11/19/2015] [Indexed: 02/06/2023]
Abstract
Large deletions encompassing the NF1 gene and its flanking regions belong to the group of genomic disorders caused by copy number changes that are mediated by the local genomic architecture. Although nonallelic homologous recombination (NAHR) is known to be a major mutational mechanism underlying such genomic copy number changes, the sequence determinants of NAHR location and frequency are still poorly understood since few high-resolution mapping studies of NAHR hotspots have been performed to date. Here, we have characterized two NAHR hotspots, PRS1 and PRS2, separated by 20 kb and located within the low-copy repeats NF1-REPa and NF1-REPc, which flank the human NF1 gene region. High-resolution mapping of the crossover sites identified in 78 type 1 NF1 deletions mediated by NAHR indicated that PRS2 is a much stronger NAHR hotspot than PRS1 since 80% of these deletions exhibited crossovers within PRS2, whereas 20% had crossovers within PRS1. The identification of the most common strand exchange regions of these 78 deletions served to demarcate the cores of the PRS1 and PRS2 hotspots encompassing 1026 and 1976 bp, respectively. Several sequence features were identified that may influence hotspot intensity and direct the positional preference of NAHR to the hotspot cores. These features include regions of perfect sequence identity encompassing 700 bp at the hotspot core, the presence of PRDM9 binding sites perfectly matching the consensus motif for the most common PRDM9 variant, specific pre-existing patterns of histone modification and open chromatin conformations that are likely to facilitate PRDM9 binding.
Affiliation(s)
- Morten Hillmer
  - Institute of Human Genetics, University of Ulm, 89081 Ulm, Germany
- David Wagner
  - Institute of Human Genetics, University of Ulm, 89081 Ulm, Germany
- Anna Summerer
  - Institute of Human Genetics, University of Ulm, 89081 Ulm, Germany
- Michaela Daiber
  - Institute of Human Genetics, University of Ulm, 89081 Ulm, Germany
- Victor-Felix Mautner
  - Department of Neurology, University Hospital Hamburg Eppendorf, 20246 Hamburg, Germany
- Ludwine Messiaen
  - Medical Genomics Laboratory, Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35242, USA
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
10
Bolívar P, Mugal CF, Nater A, Ellegren H. Recombination Rate Variation Modulates Gene Sequence Evolution Mainly via GC-Biased Gene Conversion, Not Hill-Robertson Interference, in an Avian System. Mol Biol Evol 2015; 33:216-27. [PMID: 26446902; PMCID: PMC4693978; DOI: 10.1093/molbev/msv214] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Indexed: 12/11/2022]
Abstract
The ratio of nonsynonymous to synonymous substitution rates (ω) is often used to measure the strength of natural selection. However, ω may be influenced by linkage among different targets of selection, that is, Hill–Robertson interference (HRI), which reduces the efficacy of selection. Recombination modulates the extent of HRI but may also affect ω by means of GC-biased gene conversion (gBGC), a process leading to a preferential fixation of G:C (“strong,” S) over A:T (“weak,” W) alleles. As HRI and gBGC can have opposing effects on ω, it is essential to understand their relative impact to make proper inferences of ω. We used a model that separately estimated S-to-S, S-to-W, W-to-S, and W-to-W substitution rates in 8,423 avian genes in the Ficedula flycatcher lineage. We found that the W-to-S substitution rate was positively, and the S-to-W rate negatively, correlated with recombination rate, in accordance with gBGC but not predicted by HRI. The W-to-S rate further showed the strongest impact on both dN and dS. However, since the effects were stronger at 4-fold than at 0-fold degenerated sites, likely because the GC content of these sites is farther away from its equilibrium, ω slightly decreases with increasing recombination rate, which could falsely be interpreted as a consequence of HRI. We corroborated this hypothesis analytically and demonstrate that under particular conditions, ω can decrease with increasing recombination rate. Analyses of the site-frequency spectrum showed that W-to-S mutations were skewed toward high, and S-to-W mutations toward low, frequencies, consistent with a prevalent gBGC-driven fixation bias.
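The four substitution classes used in this abstract follow directly from the strong (G, C) versus weak (A, T) distinction, as the sketch below shows. It only tallies observed ancestral-to-derived changes; it is not the substitution-rate model the authors fitted, and the function names and the input format (pairs of ancestral and derived bases) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

STRONG = frozenset("GC")  # G:C pairs, three hydrogen bonds
WEAK = frozenset("AT")    # A:T pairs, two hydrogen bonds


def classify_substitution(ancestral: str, derived: str) -> str:
    """Label a single-base change as S-to-S, S-to-W, W-to-S or W-to-W."""
    a, d = ancestral.upper(), derived.upper()
    valid = STRONG | WEAK
    if a not in valid or d not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if a == d:
        raise ValueError("ancestral and derived bases are identical")
    src = "S" if a in STRONG else "W"
    dst = "S" if d in STRONG else "W"
    return f"{src}-to-{dst}"


def count_classes(changes: Iterable[Tuple[str, str]]) -> Counter:
    """Tally substitution classes over (ancestral, derived) base pairs."""
    return Counter(classify_substitution(a, d) for a, d in changes)
```

GC-biased gene conversion is expected to inflate the W-to-S tally relative to S-to-W in highly recombining regions, which is the asymmetry the abstract reports; turning counts into rates would additionally require normalizing by the number of weak and strong sites available.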
Affiliation(s)
- Paulina Bolívar
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Carina F Mugal
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Alexander Nater
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Hans Ellegren
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
11
Weber CC, Nabholz B, Romiguier J, Ellegren H. Kr/Kc but not dN/dS correlates positively with body mass in birds, raising implications for inferring lineage-specific selection. Genome Biol 2015; 15:542. [PMID: 25607475; PMCID: PMC4264323; DOI: 10.1186/s13059-014-0542-8] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 05/30/2014] [Accepted: 11/13/2014] [Indexed: 02/02/2023]
Abstract
Background: The ratio of the rates of non-synonymous and synonymous substitution (dN/dS) is commonly used to estimate selection in coding sequences. It is often suggested that, all else being equal, dN/dS should be lower in populations with large effective size (Ne) due to increased efficacy of purifying selection. As Ne is difficult to measure directly, life history traits such as body mass, which is typically negatively associated with population size, have commonly been used as proxies in empirical tests of this hypothesis. However, evidence of whether the expected positive correlation between body mass and dN/dS is consistently observed is conflicting. Results: Employing whole genome sequence data from 48 avian species, we assess the relationship between rates of molecular evolution and life history in birds. We find a negative correlation between dN/dS and body mass, contrary to nearly neutral expectation. This raises the question of whether the correlation might be a method artefact. We therefore in turn consider non-stationary base composition, divergence time and saturation as possible explanations, but find no clear patterns. However, in striking contrast to dN/dS, the ratio of radical to conservative amino acid substitutions (Kr/Kc) correlates positively with body mass. Conclusions: Our results in principle accord with the notion that non-synonymous substitutions causing radical amino acid changes are more efficiently removed by selection in large populations, consistent with nearly neutral theory. These findings have implications for the use of dN/dS and suggest that caution is warranted when drawing conclusions about lineage-specific modes of protein evolution using this metric. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0542-8) contains supplementary material, which is available to authorized users.
|
12
|
Lassalle F, Périan S, Bataillon T, Nesme X, Duret L, Daubin V. GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet 2015; 11:e1004941. [PMID: 25659072 PMCID: PMC4450053 DOI: 10.1371/journal.pgen.1004941] [Received: 05/16/2014] [Accepted: 12/08/2014]
Abstract
The characterization of functional elements in genomes relies on the identification of the footprints of natural selection. In this quest, taking into account neutral evolutionary processes such as mutation and genetic drift is crucial because these forces can generate patterns that may obscure or mimic signatures of selection. In mammals, and probably in many eukaryotes, another such confounding factor called GC-Biased Gene Conversion (gBGC) has been documented. This mechanism generates patterns identical to what is expected under selection for higher GC-content, specifically in highly recombining genomic regions. Recent results have suggested that a mysterious selective force favouring higher GC-content exists in Bacteria but the possibility that it could be gBGC has been excluded. Here, we show that gBGC is probably at work in most if not all bacterial species. First we find a consistent positive relationship between the GC-content of a gene and evidence of intra-genic recombination throughout a broad spectrum of bacterial clades. Second, we show that the evolutionary force responsible for this pattern is acting independently from selection on codon usage, and could potentially interfere with selection in favor of optimal AU-ending codons. A comparison with data from human populations shows that the intensity of gBGC in Bacteria is comparable to what has been reported in mammals. We propose that gBGC is not restricted to sexual Eukaryotes but also widespread among Bacteria and could therefore be an ancestral feature of cellular organisms. We argue that if gBGC occurs in bacteria, it can account for previously unexplained observations, such as the apparent non-equilibrium of base substitution patterns and the heterogeneity of gene composition within bacterial genomes. Because gBGC produces patterns similar to positive selection, it is essential to take this process into account when studying the evolutionary forces at work in bacterial genomes. 
Classical population genetics models indicate that the efficiency of selection, and hence adaptation, depends on a number of non-selective factors, such as the size of a population or the intensity of recombination. In the last 10 years, evidence has accumulated that another mechanism called GC-Biased Gene Conversion (gBGC) can interfere with selection and even mimic its effects. This phenomenon, which arises from a particularity of the recombination machinery, was first thought to be restricted to sexual eukaryotic organisms. Here, we show that this mechanism probably exists in Bacteria and has a strong impact on their genome evolution. This discovery not only explains many previously unconnected features of bacterial genome evolution, but also highlights the importance of non-adaptive evolutionary processes in Bacteria.
Affiliation(s)
- Florent Lassalle
- Université de Lyon, Lyon, France
- Université Lyon 1, Villeurbanne, France
- CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
- CNRS, UMR 5557, Ecologie Microbienne, Villeurbanne, France
- INRA, USC 1364, Ecologie Microbienne, Villeurbanne, France
- Ecole Normale Supérieure de Lyon, Lyon, France
- Séverine Périan
- Université de Lyon, Lyon, France
- Université Lyon 1, Villeurbanne, France
- CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
- Thomas Bataillon
- Aarhus University, Bioinformatics Research Center, Århus, Denmark
- Université de Lyon, Lyon, France
- Xavier Nesme
- Université de Lyon, Lyon, France
- Université Lyon 1, Villeurbanne, France
- CNRS, UMR 5557, Ecologie Microbienne, Villeurbanne, France
- INRA, USC 1364, Ecologie Microbienne, Villeurbanne, France
- Laurent Duret
- Université de Lyon, Lyon, France
- Université Lyon 1, Villeurbanne, France
- CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
- Vincent Daubin
- Université de Lyon, Lyon, France
- Université Lyon 1, Villeurbanne, France
- CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
|
13
|
Crossovers are associated with mutation and biased gene conversion at recombination hotspots. Proc Natl Acad Sci U S A 2015; 112:2109-14. [PMID: 25646453 DOI: 10.1073/pnas.1416622112]
Abstract
Meiosis is a potentially important source of germline mutations, as sites of meiotic recombination experience recurrent double-strand breaks (DSBs). However, evidence for a local mutagenic effect of recombination from population sequence data has been equivocal, likely because mutation is only one of several forces shaping sequence variation. By sequencing large numbers of single crossover molecules obtained from human sperm for two recombination hotspots, we find direct evidence that recombination is mutagenic: Crossovers carry more de novo mutations than nonrecombinant DNA molecules analyzed for the same donors and hotspots. The observed mutations were primarily CG to TA transitions, with a higher frequency of transitions at CpG than at non-CpG sites. This enrichment of mutations at CpG sites at hotspots could predominate in methylated regions involving frequent single-stranded DNA processing as part of DSB repair. In addition, our data set provides evidence that GC alleles are preferentially transmitted during crossing over, opposing mutation, and shows that GC-biased gene conversion (gBGC) predominates over mutation in the sequence evolution of hotspots. These findings are consistent with the idea that gBGC could be an adaptation to counteract the mutational load of recombination.
|
14
|
Berglund J, Quilez J, Arndt PF, Webster MT. Germline methylation patterns determine the distribution of recombination events in the dog genome. Genome Biol Evol 2014; 7:522-30. [PMID: 25527838 PMCID: PMC4350167 DOI: 10.1093/gbe/evu282]
Abstract
The positive-regulatory domain containing nine gene, PRDM9, which strongly associates with the location of recombination events in several vertebrates, is inferred to be inactive in the dog genome. Here, we address several questions regarding the control of recombination and its influence on genome evolution in dogs. First, we address whether the association between CpG islands (CGIs) and recombination hotspots is generated by lack of methylation, GC-biased gene conversion (gBGC), or both. Using a genome-wide dog single nucleotide polymorphism data set and comparisons of the dog genome with related species, we show that recombination-associated CGIs have low CpG mutation rates, and that CpG mutation rate is negatively correlated with recombination rate genome wide, indicating that nonmethylation attracts the recombination machinery. We next use a neighbor-dependent model of nucleotide substitution to disentangle the effects of CpG mutability and gBGC and analyze the effects that loss of PRDM9 has on these rates. We infer that methylation patterns have been stable during canid genome evolution, but that dog CGIs have experienced a drastic increase in substitution rate due to gBGC, consistent with increased levels of recombination in these regions. We also show that gBGC is likely to have generated many new CGIs in the dog genome, but these mostly occur away from genes, whereas the number of CGIs in gene promoter regions has not increased greatly in recent evolutionary history. Recombination has a major impact on the distribution of CGIs that are detected in the dog genome due to the interaction between methylation and gBGC. The results indicate that germline methylation patterns are the main determinant of recombination rates in the absence of PRDM9.
Affiliation(s)
- Jonas Berglund
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
- Javier Quilez
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
- Peter F Arndt
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Matthew T Webster
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
|
15
|
Robinson MC, Stone EA, Singh ND. Population genomic analysis reveals no evidence for GC-biased gene conversion in Drosophila melanogaster. Mol Biol Evol 2013; 31:425-33. [PMID: 24214536 DOI: 10.1093/molbev/mst220]
Abstract
Gene conversion is the nonreciprocal exchange of genetic material between homologous chromosomes. Multiple lines of evidence from a variety of taxa strongly suggest that gene conversion events are biased toward GC-bearing alleles. However, in Drosophila, the data have largely been indirect and unclear, with some studies supporting the predictions of a GC-biased gene conversion model and other data showing contradictory findings. Here, we test whether gene conversion events are GC-biased in Drosophila melanogaster using whole-genome polymorphism and divergence data. Our results provide no support for GC-biased gene conversion and thus suggest that this process is unlikely to significantly contribute to patterns of polymorphism and divergence in this system.
Affiliation(s)
- Matthew C Robinson
- Department of Biological Sciences, Program in Genetics, North Carolina State University
|
16
|
Munch K, Mailund T, Dutheil JY, Schierup MH. A fine-scale recombination map of the human-chimpanzee ancestor reveals faster change in humans than in chimpanzees and a strong impact of GC-biased gene conversion. Genome Res 2013; 24:467-74. [PMID: 24190946 PMCID: PMC3941111 DOI: 10.1101/gr.158469.113]
Abstract
Recombination is a major determinant of adaptive and nonadaptive evolution. Understanding how the recombination landscape has evolved in humans is thus key to the interpretation of human genomic evolution. Comparison of fine-scale recombination maps of human and chimpanzee has revealed large changes at fine genomic scales and conservation over large scales. Here we demonstrate how a fine-scale recombination map can be derived for the ancestor of human and chimpanzee, allowing us to study the changes that have occurred in human and chimpanzee since these species diverged. The map is produced from more than one million accurately determined recombination events. We find that this new recombination map is intermediate to the maps of human and chimpanzee but that the recombination landscape has evolved more rapidly in the human lineage than in the chimpanzee lineage. We use the map to show that recombination rate, through the effect of GC-biased gene conversion, is an even stronger determinant of base composition evolution than previously reported.
Affiliation(s)
- Kasper Munch
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus C, Denmark
|
17
|
Abstract
Crossovers play mechanical roles in meiotic chromosome segregation, generate genetic diversity by producing new allelic combinations, and facilitate evolution by decoupling linked alleles. In almost every species studied to date, crossover distributions are dramatically nonuniform, differing among sexes and across genomes, with spatial variation in crossover rates on scales from whole chromosomes to subkilobase hotspots. To understand the regulatory forces dictating these heterogeneous distributions, a crucial first step is the fine-scale characterization of crossover distributions. Here we define the wild-type distribution of crossovers along a region of the C. elegans chromosome II at unprecedented resolution, using recombinant chromosomes of 243 hermaphrodites and 226 males. We find that well-characterized large-scale domains, with little fine-scale rate heterogeneity, dominate this region's crossover landscape. Using the Gini coefficient as a summary statistic, we find that this region of the C. elegans genome has the least heterogeneous fine-scale crossover distribution yet observed among model organisms, and we show by simulation that the data are incompatible with a mammalian-type hotspot-rich landscape. The large-scale structural domains (the low-recombination center and the high-recombination arm) have a discrete boundary that we localize to a small region. This boundary coincides with the arm-center boundary defined both by nuclear-envelope attachment of DNA in somatic cells and GC content, consistent with proposals that these features of chromosome organization may be mechanical causes and evolutionary consequences of crossover recombination.
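The abstract above uses the Gini coefficient to summarize how unevenly crossovers are spread along a region. As a rough illustration only, the statistic can be sketched as below. The function name `gini` and the toy counts are hypothetical, and the equal-width-interval form is a textbook formulation that may differ from the weighting used in the study.

```python
def gini(values):
    """Gini coefficient of non-negative values (hypothetical helper).

    Assumes every value describes an interval of equal width, e.g.
    crossover counts per equally sized genomic window.
    0 means perfectly even; values near 1 mean highly concentrated.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy data, invented for illustration: crossover counts in four windows.
even_landscape = [5, 5, 5, 5]      # no rate heterogeneity
hotspot_landscape = [0, 0, 0, 20]  # all crossovers in one window

print(gini(even_landscape))
print(gini(hotspot_landscape))
```

With equal-width windows, an even landscape gives a coefficient of 0, and concentrating all events in one of n windows gives (n - 1) / n, so a hotspot-rich landscape scores higher than a uniform one.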
|
18
|
Capra JA, Hubisz MJ, Kostka D, Pollard KS, Siepel A. A model-based analysis of GC-biased gene conversion in the human and chimpanzee genomes. PLoS Genet 2013; 9:e1003684. [PMID: 23966869 PMCID: PMC3744432 DOI: 10.1371/journal.pgen.1003684] [Received: 03/11/2013] [Accepted: 06/14/2013]
Abstract
GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. 
Interpreting patterns of DNA sequence variation in the genomes of closely related species is critically important for understanding the causes and functional effects of nucleotide substitutions. Classical models describe patterns of substitution in terms of the fundamental forces of mutation, recombination, neutral drift, and natural selection. However, an entirely separate force, called GC-biased gene conversion (gBGC), also appears to have an important influence on substitution patterns in many species. gBGC is a recombination-associated evolutionary process that favors the fixation of strong (G/C) over weak (A/T) alleles. In mammals, gBGC is thought to promote variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations. However, its genome-wide influence remains poorly understood, in part because it is difficult to incorporate gBGC into statistical models of evolution. In this paper, we describe a new evolutionary model that jointly describes the effects of selection and gBGC and apply it to the human and chimpanzee genomes. Our genome-wide predictions of gBGC tracts indicate that gBGC has been an important force in recent human evolution. Our publicly available computer program, called phastBias, and our genome-wide predictions will enable other researchers to consider gBGC in their analyses.
Affiliation(s)
- John A. Capra
- Gladstone Institutes, University of California, San Francisco, California, United States of America
- Melissa J. Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Dennis Kostka
- Department of Developmental Biology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Katherine S. Pollard
- Gladstone Institutes, University of California, San Francisco, California, United States of America
- Institute for Human Genetics and Division of Biostatistics, University of California, San Francisco, California, United States of America
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
|
19
|
Xu K, Wang J, Elango N, Yi SV. The evolution of lineage-specific clusters of single nucleotide substitutions in the human genome. Mol Phylogenet Evol 2013; 69:276-85. [PMID: 23770436 DOI: 10.1016/j.ympev.2013.06.003] [Received: 01/26/2013] [Revised: 05/17/2013] [Accepted: 06/04/2013]
Abstract
Genomic regions harboring large numbers of human-specific single nucleotide substitutions are of significant interest since they are potential genomic foci underlying the evolution of human-specific traits as well as human adaptive evolution. Previous studies that aimed to identify such regions either used pre-defined genomic locations such as coding sequences and conserved genomic elements or employed sliding window methods. Such approaches may miss clusters of substitutions occurring in regions other than those pre-defined locations, or may be unable to distinguish human-specific clusters of substitutions from regions of generally high substitution rates. Here, we conduct a 'maximal segment' analysis to scan the whole human genome to identify clusters of human-specific substitutions that occurred since the divergence of the human and the chimpanzee genomes. This method can identify species-specific clusters of substitutions while not relying on pre-defined regions. We thus identify thousands of clusters of human-specific single nucleotide substitutions. The evolution of such clusters is driven by a combination of several different evolutionary processes including increased regional mutation rate, recombination-associated processes, and positive selection. These newly identified regions of human-specific substitution clusters include large numbers of previously identified human accelerated regions, and exhibit significant enrichments of genes involved in several developmental processes. Our study provides a useful tool to study the evolution of the human genome.
Affiliation(s)
- Ke Xu
- School of Biology, Georgia Institute of Technology, 310 Ferst Drive, Atlanta, GA 30332, USA.
|
20
|
Mugal CF, Arndt PF, Ellegren H. Twisted signatures of GC-biased gene conversion embedded in an evolutionary stable karyotype. Mol Biol Evol 2013; 30:1700-12. [PMID: 23564940 PMCID: PMC3684855 DOI: 10.1093/molbev/mst067]
Abstract
The genomes of many vertebrates show a characteristic heterogeneous distribution of GC content, the so-called GC isochore structure. The origin of isochores has been explained via the mechanism of GC-biased gene conversion (gBGC). However, although the isochore structure is declining in many mammalian genomes, the heterogeneity in GC content is being reinforced in the avian genome. Despite this discrepancy, which remains unexplained, examinations of individual substitution frequencies in mammals and birds are both consistent with the gBGC model of isochore evolution. On the other hand, a negative correlation between substitution and recombination rate found in the chicken genome is inconsistent with the gBGC model. It should therefore be important to consider along with gBGC other consequences of recombination on the origin and fate of mutations, as well as to account for relationships between recombination rate and other genomic features. We therefore developed an analytical model to describe the substitution patterns found in the chicken genome, and further investigated the relationships between substitution patterns and several genomic features in a rigorous statistical framework. Our analysis indicates that GC content itself, either directly or indirectly via interrelations to other genomic features, has an impact on the substitution pattern. Further, we suggest that this phenomenon is particularly visible in avian genomes due to their unusually low rate of chromosomal evolution. Because of this, interrelations between GC content and other genomic features are being reinforced, and are as such more pronounced in avian genomes as compared with other vertebrate genomes with a less stable karyotype.
Affiliation(s)
- Carina F Mugal
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
|
21
|
Lesecque Y, Mouchiroud D, Duret L. GC-biased gene conversion in yeast is specifically associated with crossovers: molecular mechanisms and evolutionary significance. Mol Biol Evol 2013; 30:1409-19. [PMID: 23505044 PMCID: PMC3649680 DOI: 10.1093/molbev/mst056]
Abstract
GC-biased gene conversion (gBGC) is a process associated with recombination that favors the transmission of GC alleles over AT alleles during meiosis. gBGC plays a major role in genome evolution in many eukaryotes. However, the molecular mechanisms of gBGC are still unknown. Different steps of the recombination process could potentially cause gBGC: the formation of double-strand breaks (DSBs), the invasion of the homologous or sister chromatid, and the repair of mismatches in heteroduplexes. To investigate these models, we analyzed a genome-wide data set of crossovers (COs) and noncrossovers (NCOs) in Saccharomyces cerevisiae. We demonstrate that the overtransmission of GC alleles is specific to COs and that it occurs among conversion tracts in which all alleles are converted from the same donor haplotype. Thus, gBGC results from a process that leads to long-patch repair. We show that gBGC is associated with longer tracts and that it is driven by the nature (GC or AT) of the alleles located at the extremities of the tract. These observations invalidate the hypotheses that gBGC is due to the base excision repair machinery or to a bias in DSB formation and suggest that in S. cerevisiae, gBGC is caused by the mismatch repair (MMR) system. We propose that the presence of nicks on both DNA strands during CO resolution could be the cause of the bias in MMR activity. Our observations are consistent with the hypothesis that gBGC is a nonadaptive consequence of a selective pressure to limit the mutation rate in mitotic cells.
Affiliation(s)
- Yann Lesecque
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université de Lyon, Université Lyon 1, Villeurbanne, France
|
22
|
Günther T, Lampei C, Schmid KJ. Mutational bias and gene conversion affect the intraspecific nitrogen stoichiometry of the Arabidopsis thaliana transcriptome. Mol Biol Evol 2012; 30:561-8. [PMID: 23115321 DOI: 10.1093/molbev/mss249]
Abstract
The transcriptome and proteome of Arabidopsis thaliana are reduced in nitrogen content when compared with other taxa, which may result from ecological nitrogen limitation. We hypothesized that if the A. thaliana transcriptome is selected for a low nitrogen content, nitrogen-reducing derived alleles of single nucleotide polymorphisms (SNPs) should segregate at higher frequencies than nitrogen-increasing alleles. This pattern should be stronger in populations with a larger effective population size (N(e)) if natural selection is more efficient in large than in small populations. We analyzed variation in the nitrogen content in the transcriptome of 80 natural accessions of A. thaliana. In contrast to our expectations, derived alleles increase the nitrogen content in all accessions, and there is a positive correlation between nitrogen difference and derived allele frequency, which is strongest with nonsynonymous SNPs (nsSNPs). Also, there is a positive correlation between nitrogen difference and N(e) that was mainly caused by nsSNPs. These observations led us to reject the hypothesis that the transcriptome of A. thaliana is currently under selection to reduce nitrogen content. Instead, we show that a change in nitrogen content is a side effect of interacting evolutionary factors that influence base composition and include mutational bias, purifying selection of functionally deleterious alleles, and GC-biased gene conversion. We provide strong evidence that GC-biased gene conversion may play an important role for base composition in the highly selfing plant A. thaliana.
Affiliation(s)
- Torsten Günther
- Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany
|
23
|
Lartillot N. Phylogenetic patterns of GC-biased gene conversion in placental mammals and the evolutionary dynamics of recombination landscapes. Mol Biol Evol 2012; 30:489-502. [PMID: 23079417 DOI: 10.1093/molbev/mss239]
Abstract
GC-biased gene conversion (gBGC) is a major evolutionary force shaping genomic nucleotide landscapes, distorting the estimation of the strength of selection, and having potentially deleterious effects on genome-wide fitness. Yet, a global quantitative picture, at large evolutionary scale, of the relative strength of gBGC compared with selection and random drift is still lacking. Furthermore, owing to its dependence on the local recombination rate, gBGC results in modulations of the substitution patterns along genomes and across time which, if correctly interpreted, may yield quantitative insights into the long-term evolutionary dynamics of recombination landscapes. Deriving a model of the substitution process at putatively neutral nucleotide positions from population-genetics arguments, and accounting for among-lineage and among-gene effects, we propose a reconstruction of the variation in gBGC intensity at the scale of placental mammals, and of its scaling with body-size and karyotypic traits. Our results are compatible with a simple population genetics model relating gBGC to effective population size and recombination rate. In addition, among-gene variation and phylogenetic patterns of exon-specific levels of gBGC reveal the presence of rugged recombination landscapes, and suggest that short-lived recombination hot-spots are a general feature of placentals. Across placental mammals, variation in gBGC strength spans two orders of magnitude, at its lowest in apes, strongest in lagomorphs, microbats or tenrecs, and near or above the nearly neutral threshold in most other lineages. Combined with among-gene variation, such high levels of biased gene conversion are likely to significantly impact mildly selected positions, and to represent a substantial mutation load. Altogether, our analysis suggests a more important role of gBGC in placental genome evolution, compared with what could have been anticipated from studies conducted in anthropoid primates.
Affiliation(s)
- Nicolas Lartillot
- Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
|
24
|
Lartillot N. Interaction between selection and biased gene conversion in mammalian protein-coding sequence evolution revealed by a phylogenetic covariance analysis. Mol Biol Evol 2012; 30:356-68. [PMID: 23024185 DOI: 10.1093/molbev/mss231]
Abstract
According to the nearly-neutral model, variation in long-term effective population size among species should result in correlated variation in the ratio of nonsynonymous over synonymous substitution rates (dN/dS). Previous empirical investigations in mammals have been consistent with this prediction, suggesting an important role for nearly-neutral effects on protein-coding sequence evolution. GC-biased gene conversion (gBGC), on the other hand, is increasingly recognized as a major evolutionary force shaping genome nucleotide composition. When sufficiently strong compared with random drift, gBGC may significantly interfere with a nearly-neutral regime and impact dN/dS in a complex manner. Here, we investigate the phylogenetic correlations between dN/dS, the equilibrium GC composition (GC*), and several life-history and karyotypic traits in placental mammals. We show that the equilibrium GC composition decreases with body mass and increases with the number of chromosomes, suggesting a modulation of the strength of biased gene conversion due to changes in effective population size and genome-wide recombination rate. The variation in dN/dS is complex and only partially fits the prediction of the nearly-neutral theory. However, specifically restricting estimation of the dN/dS ratio on GC-conservative transversions, which are immune from gBGC, results in correlations that are more compatible with a nearly-neutral interpretation. Our investigation indicates the presence of complex interactions between selection and biased gene conversion and suggests that further mechanistic development is warranted, to tease out mutation, selection, drift, and conversion.
Affiliation(s)
- Nicolas Lartillot
- Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
|
25
|
Pessia E, Popa A, Mousset S, Rezvoy C, Duret L, Marais GAB. Evidence for widespread GC-biased gene conversion in eukaryotes. Genome Biol Evol 2012; 4:675-82. [PMID: 22628461 PMCID: PMC5635611 DOI: 10.1093/gbe/evs052] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.1] [Indexed: 01/02/2023] Open
Abstract
GC-biased gene conversion (gBGC) is a process that tends to increase the GC content of recombining DNA over evolutionary time and is thought to explain the evolution of GC content in mammals and yeasts. Evidence for gBGC outside these two groups is growing but is still limited. Here, we analyzed 36 completely sequenced genomes representing four of the five major groups in eukaryotes (Unikonts, Excavates, Chromalveolates and Plantae). gBGC was investigated by directly comparing GC content and recombination rates in species where recombination data are available, that is, half of them. To study all species of our dataset, we used chromosome size as a proxy for recombination rate and compared it with GC content. Among the 17 species showing a significant relationship between GC content and chromosome size, 15 are consistent with the predictions of the gBGC model. Importantly, the species showing a pattern consistent with gBGC are found in all the four major groups of eukaryotes studied, which suggests that gBGC may be widespread in eukaryotes.
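The chromosome-size proxy described in this abstract can be illustrated with a rank correlation. The sketch below is only an illustration: the abstract does not state which correlation statistic the authors used, so Spearman's rho is an assumption, and the chromosome sizes and GC values are made-up toy numbers, not data from the paper. It also assumes that the gBGC prediction is a negative relationship (smaller chromosomes, with more recombination per base, being GC-richer).

```python
from math import sqrt


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on midranks."""
    return pearson(midranks(x), midranks(y))


# Hypothetical per-chromosome values for one species (illustration only).
chrom_size_mb = [196.0, 149.0, 111.0, 90.0, 59.0, 21.0, 12.0, 5.0]
gc_percent = [38.9, 39.4, 39.8, 40.1, 41.5, 44.0, 46.2, 48.7]

rho = spearman(chrom_size_mb, gc_percent)
print(f"Spearman rho between chromosome size and GC content: {rho:.2f}")
```

With these toy values GC content falls steadily as chromosome size grows, so rho is strongly negative. Real data would also need a significance test and per-species handling, which this sketch leaves out.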
Affiliation(s)
- Eugénie Pessia
- Université Lyon 1, Centre National de la Recherche Scientifique, UMR5558, Laboratoire de Biométrie et Biologie évolutive, Villeurbanne, Cedex, France
|
26
|
Voelker RB, Erkelenz S, Reynoso V, Schaal H, Berglund JA. Frequent gain and loss of intronic splicing regulatory elements during the evolution of vertebrates. Genome Biol Evol 2012; 4:659-74. [PMID: 22619362 PMCID: PMC3606033 DOI: 10.1093/gbe/evs051] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022] Open
Abstract
Splicing regulatory elements (SREs) are sequences bound by proteins that influence splicing of nearby splice sites. Constitutively spliced introns have evolved to utilize many different splicing factors. The evolutionary processes that influenced which splicing factors are used for splicing of individual introns are generally unclear. We demonstrate that in the lineage that gave rise to mammals, many introns lost U-rich sequences and gained G-rich sequences, both of which resemble known SREs. The apparent conversion of U-rich to G-rich SREs suggests that the associated splicing factors are functionally equivalent. In support of this we demonstrated that U-rich and G-rich SREs are both capable of promoting splicing of an SRE-dependent splicing reporter. Furthermore, we demonstrate, using the heterologous MS2 tethering system (bacterial MS2 coat fusion-protein and its RNA stem-loop binding site), that both the U-rich SRE-binding protein (TIA1) and the G-rich SRE-binding protein (HNRNPF) can promote splicing of the same intron. We also observed that gain of G-rich SREs is significantly associated with G/C-rich genomic isochores, suggesting that gain or loss of SREs was driven by the same processes that ultimately resulted in the formation of mammalian genomic isochores. We propose the following model for the gain and loss of mammalian SREs. Ancestral U-rich SREs located in genomic regions that were experiencing high rates of A/T to G/C conversion would have suffered frequent deleterious mutations. However, this same process resulted in increased formation of functionally equivalent G-rich SREs, and acquisition of new G-rich SREs decreased purifying selection on the U-rich SREs, which were then free to decay.
Affiliation(s)
- Rodger B Voelker
- Institute of Molecular Biology, Department of Chemistry, University of Oregon, OR, USA
|
27
|
Webster MT, Hurst LD. Direct and indirect consequences of meiotic recombination: implications for genome evolution. Trends Genet 2011; 28:101-9. [PMID: 22154475 DOI: 10.1016/j.tig.2011.11.002] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.4] [Received: 07/13/2011] [Revised: 11/08/2011] [Accepted: 11/09/2011] [Indexed: 12/23/2022]
Abstract
There is considerable variation within eukaryotic genomes in the local rate of crossing over. Why is this and what effect does it have on genome evolution? On the genome scale, it is known that by shuffling alleles, recombination increases the efficacy of selection. By contrast, the extent to which differences in the recombination rate modulate the efficacy of selection between genomic regions is unclear. Recombination also has direct consequences on the origin and fate of mutations: biased gene conversion and other forms of meiotic drive promote the fixation of mutations in a similar way to selection, and recombination itself may be mutagenic. Consideration of both the direct and indirect effects of recombination is necessary to understand why its rate is so variable and for correct interpretation of patterns of genome evolution.
Affiliation(s)
- Matthew T Webster
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
|
28
|
Kostka D, Hubisz MJ, Siepel A, Pollard KS. The role of GC-biased gene conversion in shaping the fastest evolving regions of the human genome. Mol Biol Evol 2011; 29:1047-57. [PMID: 22075116 PMCID: PMC3278478 DOI: 10.1093/molbev/msr279] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 02/06/2023] Open
Abstract
GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that accelerates the fixation of guanine or cytosine alleles, regardless of their effects on fitness. gBGC can increase the overall rate of substitutions, a hallmark of positive selection. Many fast-evolving genes and noncoding sequences in the human genome have GC-biased substitution patterns, suggesting that gBGC, in contrast to adaptive processes, may have driven the human changes in these sequences. To investigate this hypothesis, we developed a substitution model for DNA sequence evolution that quantifies the nonlinear interacting effects of selection and gBGC on substitution rates and patterns. Based on this model, we used a series of lineage-specific likelihood ratio tests to evaluate sequence alignments for evidence of changes in mode of selection, action of gBGC, or both. With a false positive rate of less than 5% for individual tests, we found that the majority (76%) of previously identified human accelerated regions are best explained without gBGC, whereas a substantial minority (19%) are best explained by the action of gBGC alone. Further, more than half (55%) have substitution rates that significantly exceed local estimates of the neutral rate, suggesting that these regions may have been shaped by positive selection rather than by relaxation of constraint. By distinguishing the effects of gBGC, relaxation of constraint, and positive selection we provide an integrated analysis of the evolutionary forces that shaped the fastest evolving regions of the human genome, which facilitates the design of targeted functional studies of adaptation in humans.
Affiliation(s)
- Dennis Kostka
- Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA.
|
29
|
Katzman S, Capra JA, Haussler D, Pollard KS. Ongoing GC-biased evolution is widespread in the human genome and enriched near recombination hot spots. Genome Biol Evol 2011; 3:614-26. [PMID: 21697099 PMCID: PMC3157837 DOI: 10.1093/gbe/evr058] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 01/03/2023] Open
Abstract
Fast evolving regions of many metazoan genomes show a bias toward substitutions that change weak (A,T) into strong (G,C) base pairs. Single-nucleotide polymorphisms (SNPs) do not share this pattern, suggesting that it results from biased fixation rather than biased mutation. Supporting this hypothesis, analyses of polymorphism in specific regions of the human genome have identified a positive correlation between weak to strong (W→S) SNPs and derived allele frequency (DAF), suggesting that SNPs become increasingly GC biased over time, especially in regions of high recombination. Using polymorphism data generated by the 1000 Genomes Project from 179 individuals from 4 human populations, we evaluated the extent and distribution of ongoing GC-biased evolution in the human genome. We quantified GC fixation bias by comparing the DAFs of W→S mutations and S→W mutations using a Mann-Whitney U test. Genome-wide, W→S SNPs have significantly higher DAFs than S→W SNPs. This pattern is widespread across the human genome but varies in magnitude along the chromosomes. We found extreme GC-biased evolution in neighborhoods of recombination hot spots, a significant correlation between GC bias and recombination rate, and an inverse correlation between GC bias and chromosome arm length. These findings demonstrate the presence of ongoing fixation bias favoring G and C alleles throughout the human genome and suggest that the bias is caused by a recombination-associated process, such as GC-biased gene conversion.
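The comparison of derived allele frequencies described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the SNP records are made-up toy values, the function names are hypothetical, and the one-sided p-value uses the normal approximation with a tie correction, which is only rough for samples this small.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist

WEAK, STRONG = {"A", "T"}, {"G", "C"}


def classify(ancestral, derived):
    """Label a SNP as weak-to-strong, strong-to-weak, or GC-conservative."""
    if ancestral in WEAK and derived in STRONG:
        return "WS"
    if ancestral in STRONG and derived in WEAK:
        return "SW"
    return "other"


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, z, one-sided p that x tends to be larger than y).

    Assumes the pooled values are not all identical (variance would be zero).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    return u1, z, 1 - NormalDist().cdf(z)


# Hypothetical SNPs: (ancestral allele, derived allele, derived allele frequency).
snps = [("A", "G", 0.62), ("T", "C", 0.48), ("A", "C", 0.35), ("T", "G", 0.71),
        ("G", "A", 0.12), ("C", "T", 0.30), ("G", "T", 0.08), ("C", "A", 0.22),
        ("A", "T", 0.40)]

ws_dafs = [daf for anc, der, daf in snps if classify(anc, der) == "WS"]
sw_dafs = [daf for anc, der, daf in snps if classify(anc, der) == "SW"]
u, z, p = mann_whitney_u(ws_dafs, sw_dafs)
print(f"U = {u}, z = {z:.2f}, one-sided p = {p:.3f}")
```

In this toy set every W→S SNP has a higher DAF than every S→W SNP, so U takes its maximum value of 16. The A→T SNP is GC-conservative and is excluded from both groups.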
Affiliation(s)
- Sol Katzman
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, USA
|