1
Poszewiecka B, Gogolewski K, Karolak JA, Stankiewicz P, Gambin A. PhaseDancer: a novel targeted assembler of segmental duplications unravels the complexity of the human chromosome 2 fusion going from 48 to 46 chromosomes in hominin evolution. Genome Biol 2023; 24:205. [PMID: 37697406 PMCID: PMC10496407 DOI: 10.1186/s13059-023-03022-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Accepted: 07/25/2023] [Indexed: 09/13/2023]
Abstract
Resolving complex genomic regions rich in segmental duplications (SDs) is challenging due to the high error rate of long-read sequencing. Here, we describe a targeted approach with a novel genome assembler, PhaseDancer, that iteratively extends SD-rich regions of interest. We validate its robustness and efficiency using a gold-standard set of human BAC clones and in silico-generated SDs with predefined evolutionary scenarios. PhaseDancer enables extension of the incomplete complex SD-rich subtelomeric regions of Great Ape chromosomes orthologous to the human chromosome 2 (HSA2) fusion site, informing a model of HSA2 formation and unravelling the evolution of human and Great Ape genomes.
Affiliation(s)
- Barbara Poszewiecka
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
- Krzysztof Gogolewski
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
- Justyna A. Karolak
- Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, 77030 Houston, TX USA
- Chair and Department of Genetics and Pharmaceutical Microbiology, Poznan University of Medical Sciences, 60-806 Poznan, Poland
- Paweł Stankiewicz
- Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, 77030 Houston, TX USA
- Anna Gambin
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland

2
Hu W, Hao Z, Du P, Di Vincenzo F, Manzi G, Cui J, Fu YX, Pan YH, Li H. Genomic inference of a severe human bottleneck during the Early to Middle Pleistocene transition. Science 2023; 381:979-984. [PMID: 37651513 DOI: 10.1126/science.abq7487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 07/11/2023] [Indexed: 09/02/2023]
Abstract
Population size history is essential for studying human evolution. However, ancient population size history during the Pleistocene is notoriously difficult to unravel. In this study, we developed a fast infinitesimal time coalescent process (FitCoal) to circumvent this difficulty and calculated the composite likelihood for present-day human genomic sequences of 3154 individuals. Results showed that human ancestors went through a severe population bottleneck with about 1280 breeding individuals between around 930,000 and 813,000 years ago. The bottleneck lasted for about 117,000 years and brought human ancestors close to extinction. This bottleneck is congruent with a substantial chronological gap in the available African and Eurasian fossil record. Our results provide new insights into our ancestry and suggest a coincident speciation event.
Affiliation(s)
- Wangjie Hu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Key Laboratory of Brain Functional Genomics of Ministry of Education, School of Life Science, East China Normal University, Shanghai, China
- Ziqian Hao
- College of Artificial Intelligence and Big Data for Medical Sciences, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
- Pengyuan Du
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- College of Artificial Intelligence and Big Data for Medical Sciences, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
- Giorgio Manzi
- Department of Environmental Biology, Sapienza University of Rome, Rome, Italy
- Jialong Cui
- Key Laboratory of Brain Functional Genomics of Ministry of Education, School of Life Science, East China Normal University, Shanghai, China
- Yun-Xin Fu
- Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Key Laboratory for Conservation and Utilization of Bioresources, Yunnan University, Kunming, China
- Yi-Hsuan Pan
- Key Laboratory of Brain Functional Genomics of Ministry of Education, School of Life Science, East China Normal University, Shanghai, China
- Haipeng Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China

3
Poszewiecka B, Gogolewski K, Stankiewicz P, Gambin A. Revised time estimation of the ancestral human chromosome 2 fusion. BMC Genomics 2022; 23:616. [PMID: 36008753 PMCID: PMC9413910 DOI: 10.1186/s12864-022-08828-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Accepted: 08/08/2022] [Indexed: 11/24/2022]
Abstract
Background The reduction of the chromosome number from 48 in the Great Apes to 46 in modern humans is thought to result from the end-to-end fusion of two ancestral non-human primate chromosomes forming the human chromosome 2 (HSA2). Genomic signatures of this event are the presence of inverted telomeric repeats at the HSA2 fusion site and a block of degenerate satellite sequences that mark the remnants of the ancestral centromere. It has been estimated that this fusion arose up to 4.5 million years ago (Mya). Results We have developed an enhanced algorithm for the detection and efficient counting of the locally over-represented weak-to-strong (AT to GC) substitutions. By analyzing the enrichment of these substitutions around the fusion site of HSA2 we estimated its formation time at 0.9 Mya with a 95% confidence interval of 0.4-1.5 Mya. Additionally, based on the statistics derived from our algorithm, we have reconstructed the evolutionary distances among the Great Apes (Hominoidea). Conclusions Our results shed light on the HSA2 fusion formation and provide a novel computational alternative for the estimation of the speciation chronology.
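To make the idea of counting locally over-represented weak-to-strong substitutions concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes an ungapped pairwise alignment between an inferred ancestral sequence and a derived sequence, and the function name `weak_to_strong_counts`, the window size, and the toy sequences are all hypothetical choices made for illustration.

```python
# Illustrative sketch only; not the algorithm from the cited paper.
WEAK = {"A", "T"}
STRONG = {"G", "C"}


def weak_to_strong_counts(ancestral, derived, window, step):
    """Count A/T -> G/C substitutions per window of an ungapped alignment.

    Returns a list of (window_start, count) tuples.
    """
    if len(ancestral) != len(derived):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the ancestral base is weak and the derived base is strong, else 0
    flags = [
        1 if a in WEAK and d in STRONG else 0
        for a, d in zip(ancestral.upper(), derived.upper())
    ]
    counts = []
    for start in range(0, len(flags) - window + 1, step):
        counts.append((start, sum(flags[start:start + window])))
    return counts


# Toy data (hypothetical): three weak-to-strong changes, clustered at the 5' end
ancestral = "ATATATATGCGC"
derived = "GTACATGTGCGC"
print(weak_to_strong_counts(ancestral, derived, window=4, step=4))
# -> [(0, 2), (4, 1), (8, 0)]
```

Windows with unusually high counts relative to the rest of the alignment would be the candidates for local enrichment; turning such counts into a time estimate requires a substitution-rate model that this sketch does not attempt.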
Affiliation(s)
- Paweł Stankiewicz
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, US
- Anna Gambin
- Institute of Informatics, Warsaw University, Warsaw, Poland

4
Ho AT, Hurst LD. Unusual mammalian usage of TGA stop codons reveals that sequence conservation need not imply purifying selection. PLoS Biol 2022; 20:e3001588. [PMID: 35550630 PMCID: PMC9129041 DOI: 10.1371/journal.pbio.3001588] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/17/2022] [Revised: 05/24/2022] [Accepted: 04/20/2022] [Indexed: 11/18/2022]
Abstract
The assumption that conservation of sequence implies the action of purifying selection is central to diverse methodologies to infer functional importance. GC-biased gene conversion (gBGC), a meiotic mismatch repair bias strongly favouring GC over AT, can in principle mimic the action of selection, and this is thought to be especially important in mammals. As mutation is GC→AT biased, demonstrating that gBGC does indeed cause false signals requires evidence that an AT-rich residue is selectively optimal compared to its more GC-rich allele, while also showing that the GC-rich alternative is conserved. We propose that mammalian stop codon evolution provides a robust test case. Although in most taxa TAA is the optimal stop codon, TGA is both abundant and conserved in mammalian genomes. We show that this mammalian exceptionalism is well explained by gBGC mimicking purifying selection and that TAA is the selectively optimal codon. Supportive of gBGC, we observe that (i) TGA usage trends are consistent at the focal stop codon and elsewhere (in UTR sequences); (ii) higher TGA usage and higher TAA→TGA substitution rates are predicted by a high recombination rate; and (iii) across species, the difference in TAA↔TGA substitution rates between GC-rich and GC-poor genes is largest in genomes that possess higher between-gene GC variation. TAA optimality is supported both by enrichment in highly expressed genes and by trends associated with effective population size. High TGA usage and high TAA→TGA rates in mammals are thus consistent with gBGC’s predicted ability to “drive” deleterious mutations and support the hypothesis that sequence conservation need not be indicative of purifying selection. A general trend for GC-rich trinucleotides to reside at frequencies far above their mutational equilibrium in highly recombining domains supports the generality of these results.
Affiliation(s)
- Alexander Thomas Ho
- Milner Centre for Evolution, University of Bath, Bath, United Kingdom

5
Huttener R, Thorrez L, Veld TI, Granvik M, Van Lommel L, Waelkens E, Derua R, Lemaire K, Goyvaerts L, De Coster S, Buyse J, Schuit F. Sequencing refractory regions in bird genomes are hotspots for accelerated protein evolution. BMC Ecol Evol 2021; 21:176. [PMID: 34537008 PMCID: PMC8449477 DOI: 10.1186/s12862-021-01905-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/16/2020] [Accepted: 08/31/2021] [Indexed: 11/29/2022]
Abstract
Background Approximately 1000 protein-encoding genes common to vertebrates are still unannotated in avian genomes. Are these genes evolutionarily lost, or are they not yet found for technical reasons? Using genome landscapes as a tool to visualize large-scale regional effects of genome evolution, we reexamined this question. Results On the basis of gene annotation in non-avian vertebrate genomes, we established a list of 15,135 common vertebrate genes. Of these, 1026 were not found in any of eight examined bird genomes. Visualizing regional genome effects by our sliding window approach showed that the majority of these “missing” genes can be clustered to 14 regions of the human reference genome. In these clusters, an additional 1517 genes (often gene fragments) were underrepresented in bird genomes. The clusters of “missing” genes coincided with regions of very high GC content, particularly in avian genomes, making them “hidden” because of incomplete sequencing. Moreover, proteins encoded by genes in these sequencing refractory regions showed signs of accelerated protein evolution. As a proof of principle for this idea, we experimentally characterized the mRNA and protein products of four “hidden” bird genes that are crucial for energy homeostasis in skeletal muscle: ALDOA, ENO3, PYGM and SLC2A4. Conclusions At least part of the “missing” genes in bird genomes can be attributed to an artifact caused by the difficulty of sequencing regions with extreme GC% (“hidden” genes). Biologically, these “hidden” genes are of interest as they encode proteins that evolve more rapidly than the genome-wide average. Finally, we show that four of these “hidden” genes encode key proteins for energy metabolism in flight muscle.
Affiliation(s)
- R Huttener
- Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- L Thorrez
- Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium; Tissue Engineering Laboratory, Department of Development and Regeneration, KU Leuven Campus Kulak, Kortrijk, Belgium
- T In't Veld
- Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- M Granvik
- Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- L Van Lommel
- Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- E Waelkens
- Laboratory of Protein Phosphorylation and Proteomics, KU Leuven, Leuven, Belgium
- R Derua
- Laboratory of Protein Phosphorylation and Proteomics, KU Leuven, Leuven, Belgium
- K Lemaire
- Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- L Goyvaerts
- Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- S De Coster
- Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium
- J Buyse
- Laboratory of Livestock Physiology, Department of Biosystems, KU Leuven, Leuven, Belgium
- F Schuit
- Gene Expression Unit, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, bus 901, 3000, Leuven, Belgium

6
Huttener R, Thorrez L, Veld TI, Potter B, Baele G, Granvik M, Van Lommel L, Schuit F. Regional effect on the molecular clock rate of protein evolution in Eutherian and Metatherian genomes. BMC Ecol Evol 2021; 21:153. [PMID: 34348656 PMCID: PMC8336415 DOI: 10.1186/s12862-021-01882-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2019] [Accepted: 07/22/2021] [Indexed: 11/24/2022]
Abstract
BACKGROUND Different types of proteins diverge at vastly different rates. Moreover, the same type of protein has been observed to evolve with different rates in different phylogenetic lineages. In the present study we measured the rates of protein evolution in Eutheria (placental mammals) and Metatheria (marsupials) on a genome-wide basis and we propose that the gene position in the genome landscape has an important influence on the rate of protein divergence. RESULTS We analyzed a protein-encoding gene set (n = 15,727) common to 16 mammals (12 Eutheria and 4 Metatheria). Using sliding windows that averaged regional effects of protein divergence we constructed landscapes in which strong and lineage-specific regional effects were seen on the molecular clock rate of protein divergence. Within each lineage, the relatively high rates were preferentially found in subtelomeric chromosomal regions. Such regions were observed to contain important and well-studied loci for fetal growth, uterine function and the generation of diversity in the adaptive repertoire of immunoglobulins. CONCLUSIONS A genome landscape approach visualizes lineage-specific regional differences between Eutherian and Metatherian rates of protein evolution. This phenomenon of chromosomal position is a new element that explains at least part of the lineage-specific effects and differences between proteins on the molecular clock rates.
Affiliation(s)
- Raf Huttener
- Gene Expression Unit, Dept. of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, Bus 901, 3000, Leuven, Belgium
- Lieven Thorrez
- Tissue Engineering Laboratory, Department of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- Thomas In't Veld
- Gene Expression Unit, Dept. of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, Bus 901, 3000, Leuven, Belgium
- Barney Potter
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Mikaela Granvik
- Gene Expression Unit, Dept. of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, Bus 901, 3000, Leuven, Belgium
- Leentje Van Lommel
- Gene Expression Unit, Dept. of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, Bus 901, 3000, Leuven, Belgium
- Frans Schuit
- Gene Expression Unit, Dept. of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, Bus 901, 3000, Leuven, Belgium

7
Charlesworth D, Zhang Y, Bergero R, Graham C, Gardner J, Yong L. Using GC Content to Compare Recombination Patterns on the Sex Chromosomes and Autosomes of the Guppy, Poecilia reticulata, and Its Close Outgroup Species. Mol Biol Evol 2021; 37:3550-3562. [PMID: 32697821 DOI: 10.1093/molbev/msaa187] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/26/2022]
Abstract
Genetic and physical mapping of the guppy (Poecilia reticulata) have shown that recombination patterns differ greatly between males and females. Crossover events occur evenly across the chromosomes in females, but in male meiosis they are restricted to the tip furthest from the centromere of each chromosome, creating very high recombination rates per megabase, as in pseudoautosomal regions of mammalian sex chromosomes. We used GC content to indirectly infer recombination patterns on guppy chromosomes, based on evidence that recombination is associated with GC-biased gene conversion, so that genome regions with high recombination rates should be detectable by high GC content. We used intron sequences and third positions of codons to make comparisons between sequences that are matched, as far as possible, and are all probably under weak selection. Almost all guppy chromosomes, including the sex chromosome (LG12), have very high GC values near their assembly ends, suggesting high recombination rates due to strong crossover localization in male meiosis. Our test does not suggest that the guppy XY pair has stronger crossover localization than the autosomes, or than the homologous chromosome in the close relative, the platyfish (Xiphophorus maculatus). We therefore conclude that the guppy XY pair has not recently undergone an evolutionary change to a different recombination pattern, or reduced its crossover rate, but that the guppy evolved Y-linkage due to acquiring a male-determining factor that also conferred the male crossover pattern. We also identify the centromere ends of guppy chromosomes, which were not determined in the genome assembly.
Affiliation(s)
- Deborah Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Yexin Zhang
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Roberta Bergero
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Chay Graham
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Jim Gardner
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Lengxob Yong
- Centre for Ecology and Conservation, University of Exeter, Falmouth, Cornwall, United Kingdom

8
Abstract
Recombination increases the local GC-content in genomic regions through GC-biased gene conversion (gBGC). The recent discovery of a large genomic region with extreme GC-content in the fat sand rat Psammomys obesus provides a model to study the effects of gBGC on chromosome evolution. Here, we compare the GC-content and GC-to-AT substitution patterns across protein-coding genes of four gerbil species and two murine rodents (mouse and rat). We find that the known high-GC region is present in all the gerbils, and is characterized by high substitution rates for all mutational categories (AT-to-GC, GC-to-AT, and GC-conservative) both at synonymous and nonsynonymous sites. A higher AT-to-GC than GC-to-AT rate is consistent with the high GC-content. Additionally, we find more than 300 genes outside the known region with outlying values of AT-to-GC synonymous substitution rates in gerbils. Of these, over 30% are organized into at least 17 large clusters observable at the megabase-scale. The unusual GC-skewed substitution pattern suggests the evolution of genomic regions with very high recombination rates in the gerbil lineage, which can lead to a runaway increase in GC-content. Our results imply that rapid evolution of GC-content is possible in mammals, with gerbil species providing a powerful model to study the mechanisms of gBGC.
Affiliation(s)
- Rodrigo Pracana
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- John F Mulley
- School of Natural Sciences, Bangor University, Bangor, Gwynedd, United Kingdom

9
Rahnama M, Novikova O, Starnes JH, Zhang S, Chen L, Farman ML. Transposon-mediated telomere destabilization: a driver of genome evolution in the blast fungus. Nucleic Acids Res 2020; 48:7197-7217. [PMID: 32558886 PMCID: PMC7367193 DOI: 10.1093/nar/gkaa287] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/08/2019] [Revised: 04/03/2020] [Accepted: 04/14/2020] [Indexed: 01/01/2023]
Abstract
The fungus Magnaporthe oryzae causes devastating diseases of crops, including rice and wheat, and in various grasses. Strains from ryegrasses have highly unstable chromosome ends that undergo frequent rearrangements, and this has been associated with the presence of retrotransposons (Magnaporthe oryzae Telomeric Retrotransposons-MoTeRs) inserted in the telomeres. The objective of the present study was to determine the mechanisms by which MoTeRs promote telomere instability. Targeted cloning, mapping, and sequencing of parental and novel telomeric restriction fragments (TRFs), along with MinION sequencing of genomic DNA allowed us to document the precise molecular alterations underlying 109 newly-formed TRFs. These included truncations of subterminal rDNA sequences; acquisition of MoTeR insertions by 'plain' telomeres; insertion of the MAGGY retrotransposons into MoTeR arrays; MoTeR-independent expansion and contraction of subtelomeric tandem repeats; and a variety of rearrangements initiated through breaks in interstitial telomere tracts that are generated during MoTeR integration. Overall, we estimate that alterations occurred in approximately sixty percent of chromosomes (one in three telomeres) analyzed. Most importantly, we describe an entirely new mechanism by which transposons can promote genomic alterations at exceptionally high frequencies, and in a manner that can promote genome evolution while minimizing collateral damage to overall chromosome architecture and function.
Affiliation(s)
- Mostafa Rahnama
- Department of Plant Pathology, University of Kentucky, 1405 Veteran's Dr., Lexington, KY 40546, USA
- Olga Novikova
- Department of Plant Pathology, University of Kentucky, 1405 Veteran's Dr., Lexington, KY 40546, USA
- John H Starnes
- Department of Plant Pathology, University of Kentucky, 1405 Veteran's Dr., Lexington, KY 40546, USA
- Shouan Zhang
- Department of Plant Pathology, University of Kentucky, 1405 Veteran's Dr., Lexington, KY 40546, USA
- Li Chen
- Department of Plant Pathology, University of Kentucky, 1405 Veteran's Dr., Lexington, KY 40546, USA
- Mark L Farman
- Department of Plant Pathology, University of Kentucky, 1405 Veteran's Dr., Lexington, KY 40546, USA

10
Huttener R, Thorrez L, In't Veld T, Granvik M, Snoeck L, Van Lommel L, Schuit F. GC content of vertebrate exome landscapes reveal areas of accelerated protein evolution. BMC Evol Biol 2019; 19:144. [PMID: 31311498 PMCID: PMC6636035 DOI: 10.1186/s12862-019-1469-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/18/2019] [Accepted: 06/26/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Rapid accumulation of vertebrate genome sequences renders comparative genomics a powerful approach to study macro-evolutionary events. The assessment of phylogenetic relationships between species routinely depends on the analysis of sequence homology at the nucleotide or protein level. RESULTS We analyzed mRNA GC content, codon usage and divergence of orthologous proteins in 55 vertebrate genomes. Data were visualized in genome-wide landscapes using a sliding window approach. Landscapes of GC content reveal both evolutionary conservation of clustered genes and lineage-specific changes, so that it was possible to construct a phylogenetic tree that closely matched the classic "tree of life". Landscapes of GC content also strongly correlated with landscapes of amino acid usage: positive correlation with glycine, alanine, arginine and proline and negative correlation with phenylalanine, tyrosine, methionine, isoleucine, asparagine and lysine. Peaks of GC content correlated strongly with increased protein divergence. CONCLUSIONS Landscapes of base and amino acid composition of the coding genome open a new approach in comparative genomics, allowing identification of discrete regions in which protein evolution accelerated over deep evolutionary time. Insight into the evolution of genome structure may spur novel studies assessing the evolutionary benefit of genes in particular genomic regions.
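A sliding-window GC landscape of the kind described in this abstract can be sketched in a few lines of Python. This is an assumption-laden illustration, not the authors' pipeline: it assumes transcripts are already ordered by genomic position, and the helper names `gc_fraction` and `sliding_mean`, the window size, and the toy sequences are hypothetical.

```python
# Illustrative sketch only; not the pipeline from the cited paper.

def gc_fraction(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def sliding_mean(values, window):
    """Mean of each run of `window` consecutive values (step of one gene)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Toy transcripts (hypothetical), listed in genome order
genes = ["ATGGCGCC", "ATGAAATT", "ATGGGGCC", "ATGATATA"]
per_gene_gc = [gc_fraction(g) for g in genes]
print(per_gene_gc)                   # -> [0.75, 0.125, 0.75, 0.125]
print(sliding_mean(per_gene_gc, 2))  # -> [0.4375, 0.4375, 0.4375]
```

Averaging over neighbouring genes smooths gene-to-gene noise so that regional peaks stand out; the window size trades resolution against smoothness and would need tuning on real data.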
Affiliation(s)
- R Huttener
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Thorrez
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium; Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- T In't Veld
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- M Granvik
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Snoeck
- Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- L Van Lommel
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- F Schuit
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium

11
Rousselle M, Laverré A, Figuet E, Nabholz B, Galtier N. Influence of Recombination and GC-biased Gene Conversion on the Adaptive and Nonadaptive Substitution Rate in Mammals versus Birds. Mol Biol Evol 2019; 36:458-471. [PMID: 30590692 PMCID: PMC6389324 DOI: 10.1093/molbev/msy243] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/22/2022]
Abstract
Recombination is expected to affect functional sequence evolution in several ways. On the one hand, recombination is thought to improve the efficiency of multilocus selection by dissipating linkage disequilibrium. On the other hand, natural selection can be counteracted by recombination-associated transmission distorters such as GC-biased gene conversion (gBGC), which tends to promote G and C alleles irrespective of their fitness effect in high-recombining regions. It has been suggested that gBGC might impact coding sequence evolution in vertebrates, and particularly the ratio of nonsynonymous to synonymous substitution rates (dN/dS). However, distinctive gBGC patterns have been reported in mammals and birds, maybe reflecting the documented contrasts in evolutionary dynamics of recombination rate between these two taxa. Here, we explore how recombination and gBGC affect coding sequence evolution in mammals and birds by analyzing proteome-wide data in six species of Galloanserae (fowls) and six species of catarrhine primates. We estimated the dN/dS ratio and rates of adaptive and nonadaptive evolution in bins of genes of increasing recombination rate, separately analyzing AT → GC, GC → AT, and G ↔ C/A ↔ T mutations. We show that in both taxa, recombination and gBGC entail a decrease in dN/dS. Our analysis indicates that recombination enhances the efficiency of purifying selection by lowering Hill-Robertson effects, whereas gBGC leads to an overestimation of the adaptive rate of AT → GC mutations. Finally, we report a mutagenic effect of recombination, which is independent of gBGC.
Affiliation(s)
- Alexandre Laverré
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Emeric Figuet
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Benoit Nabholz
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Nicolas Galtier
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France

12
Gossmann TI, Bockwoldt M, Diringer L, Schwarz F, Schumann VF. Evidence for Strong Fixation Bias at 4-fold Degenerate Sites Across Genes in the Great Tit Genome. Front Ecol Evol 2018. [DOI: 10.3389/fevo.2018.00203] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/18/2022]

13
Kalesinskas L, Cudone E, Fofanov Y, Putonti C. S-plot2: Rapid Visual and Statistical Analysis of Genomic Sequences. Evol Bioinform Online 2018; 14:1176934318797354. [PMID: 30245567 PMCID: PMC6144591 DOI: 10.1177/1176934318797354] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/29/2018] [Accepted: 08/08/2018] [Indexed: 12/12/2022]
Abstract
With the daily release of data from whole genome sequencing projects, tools to facilitate comparative studies are hard-pressed to keep pace. Graphical software solutions can readily recognize synteny by measuring similarities between sequences. Nevertheless, regions of dissimilarity can prove to be equally informative; these regions may harbor genes acquired via lateral gene transfer (LGT), signify gene loss or gain, or include coding regions under strong selection. Previously, we developed the software S-plot. This tool employed an alignment-free approach for comparing bacterial genomes and generated a heatmap representing the genomes’ similarities and dissimilarities in nucleotide usage. In prior studies, this tool proved valuable in identifying genome rearrangements as well as exogenous sequences acquired via LGT in several bacterial species. Herein, we present the next generation of this tool, S-plot2. Similar to its predecessor, S-plot2 creates an interactive, 2-dimensional heatmap capturing the similarities and dissimilarities in nucleotide usage between genomic sequences (partial or complete). This new version, however, includes additional metrics for analysis, new reporting options, and integrated BLAST query functionality for the user to interrogate regions of interest. Furthermore, S-plot2 can evaluate larger sequences, including whole eukaryotic chromosomes. To illustrate some of the applications of the tool, 2 case studies are presented. The first examines strain-specific variation across the Pseudomonas aeruginosa genome and strain-specific LGT events. In the second case study, corresponding human, chimpanzee, and rhesus macaque autosomes were studied and lineage specific contributions to divergence were estimated. S-plot2 provides a means to both visually and quantitatively compare nucleotide sequences, from microbial genomes to eukaryotic chromosomes. 
The case studies presented illustrate just 2 potential applications of the tool, highlighting its capability to identify and investigate the variation in molecular divergence rates across sequences. S-plot2 is freely available through https://bitbucket.org/lkalesinskas/splot and is supported on the Linux and MS Windows operating systems.
Affiliation(s)
- Laurynas Kalesinskas
  - Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA
  - Department of Biology, Loyola University Chicago, Chicago, IL, USA
- Evan Cudone
  - Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA
  - Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, USA
- Yuriy Fofanov
  - Department of Pharmacology and Toxicology, The University of Texas Medical Branch at Galveston, Galveston, TX, USA
- Catherine Putonti
  - Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA
  - Department of Biology, Loyola University Chicago, Chicago, IL, USA
  - Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
|
14
|
Inferring the Probability of the Derived vs. the Ancestral Allelic State at a Polymorphic Site. Genetics 2018; 209:897-906. [PMID: 29769282 PMCID: PMC6028244 DOI: 10.1534/genetics.118.301120] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 03/23/2018] [Accepted: 05/14/2018] [Indexed: 12/03/2022] Open
Abstract
It is known that the allele ancestral to the variation at a polymorphic site cannot be assigned with certainty, and that the most frequently used method to assign the ancestral state—maximum parsimony—is prone to misinference. Estimates of counts of sites that have a certain number of copies of the derived allele in a sample (the unfolded site frequency spectrum, uSFS) made by parsimony are therefore also biased. We previously developed a maximum likelihood method to estimate the uSFS for a focal species using information from two outgroups while assuming simple models of nucleotide substitution. Here, we extend this approach to allow multiple outgroups (implemented for three outgroups), potentially any phylogenetic tree topology, and more complex models of nucleotide substitution. We find, however, that two outgroups and the Kimura two-parameter model are adequate for uSFS inference in most cases. We show that using parsimony to infer the ancestral state at a specific site seriously breaks down in two situations. The first is where the outgroups provide no information about the ancestral state of variation in the focal species. In this case, nucleotide variation will be underestimated if such sites are excluded. The second is where the minor allele in the focal species agrees with the allelic state of the outgroups. In this situation, parsimony tends to overestimate the probability of the major allele being derived, because it fails to account for the fact that sites with a high frequency of the derived allele tend to be rare. We present a method that corrects this deficiency and is capable of providing nearly unbiased estimates of ancestral state probabilities on a site-by-site basis and the uSFS.
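As a rough illustration of the parsimony baseline that the abstract above describes as prone to misinference, the sketch below polarizes each site with a single outgroup and tallies an unfolded site frequency spectrum (uSFS). It is not the authors' maximum likelihood method; the function name `parsimony_usfs`, the input layout, and the single-outgroup simplification are assumptions made for this example.

```python
def parsimony_usfs(sites, n):
    """Tally an unfolded SFS by single-outgroup parsimony (illustrative only).

    sites: iterable of (focal, outgroup) pairs, where `focal` is a string of
           n bases sampled from the focal species and `outgroup` is one base.
    Returns (usfs, skipped): usfs[k] counts sites carrying k derived copies;
    skipped counts sites that this simple rule cannot polarize.
    """
    usfs = [0] * (n + 1)
    skipped = 0
    for focal, outgroup in sites:
        if len(focal) != n:
            raise ValueError("every site must have exactly n sampled alleles")
        alleles = set(focal)
        # Skip multi-allelic sites, and polymorphic sites where the outgroup
        # matches neither allele (the outgroup is uninformative there).
        if len(alleles) > 2 or (len(alleles) == 2 and outgroup not in alleles):
            skipped += 1
            continue
        # Parsimony: the outgroup state is taken as ancestral, so every
        # focal copy that differs from it is counted as derived.
        derived = sum(1 for base in focal if base != outgroup)
        usfs[derived] += 1
    return usfs, skipped


example_sites = [
    ("AAAT", "A"),  # T is inferred to be derived: 1 derived copy
    ("AAAT", "T"),  # A is inferred to be derived: 3 derived copies
    ("ACAC", "G"),  # outgroup matches neither allele: skipped
    ("GGGG", "G"),  # monomorphic and ancestral: 0 derived copies
]
print(parsimony_usfs(example_sites, 4))
```

The second example site is the case the abstract flags: the minor allele agrees with the outgroup, so parsimony labels the major allele as derived with certainty, whereas a probabilistic method would weigh that call against the rarity of high-frequency derived alleles.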
|
15
|
Lartillot N, Phillips MJ, Ronquist F. A mixed relaxed clock model. Philos Trans R Soc Lond B Biol Sci 2017; 371:rstb.2015.0132. [PMID: 27325829 PMCID: PMC4920333 DOI: 10.1098/rstb.2015.0132] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Accepted: 02/29/2016] [Indexed: 12/13/2022] Open
Abstract
Over recent years, several alternative relaxed clock models have been proposed in the context of Bayesian dating. These models fall in two distinct categories: uncorrelated and autocorrelated across branches. The choice between these two classes of relaxed clocks is still an open question. More fundamentally, the true process of rate variation may have both long-term trends and short-term fluctuations, suggesting that more sophisticated clock models unfolding over multiple time scales should ultimately be developed. Here, a mixed relaxed clock model is introduced, which can be mechanistically interpreted as a rate variation process undergoing short-term fluctuations on top of Brownian long-term trends. Statistically, this mixed clock represents an alternative solution to the problem of choosing between autocorrelated and uncorrelated relaxed clocks, by proposing instead to combine their respective merits. Fitting this model on a dataset of 105 placental mammals, using both node-dating and tip-dating approaches, suggests that the two pure clocks, Brownian and white noise, are rejected in favour of a mixed model with approximately equal contributions for its uncorrelated and autocorrelated components. The tip-dating analysis is particularly sensitive to the choice of the relaxed clock model. In this context, the classical pure Brownian relaxed clock appears to be overly rigid, leading to biases in divergence time estimation. By contrast, the use of a mixed clock leads to more recent and more reasonable estimates for the crown ages of placental orders and superorders. Altogether, the mixed clock introduced here represents a first step towards empirically more adequate models of the patterns of rate variation across phylogenetic trees. This article is part of the themed issue 'Dating species divergences using rocks and clocks'.
Affiliation(s)
- Nicolas Lartillot
  - Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard Lyon 1, F-69622 Villeurbanne Cedex, France
- Matthew J Phillips
  - School of Earth, Environmental and Biological Sciences, Queensland University of Technology, Brisbane, Australia
- Fredrik Ronquist
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, PO Box 50007, 104 05 Stockholm, Sweden
|
16
|
Stankiewicz P. One pedigree we all may have come from - did Adam and Eve have the chromosome 2 fusion? Mol Cytogenet 2016; 9:72. [PMID: 27708712 PMCID: PMC5037601 DOI: 10.1186/s13039-016-0283-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/18/2016] [Accepted: 09/16/2016] [Indexed: 11/18/2022] Open
Abstract
Background: In contrast to Great Apes, who have 48 chromosomes, modern humans and likely Neandertals and Denisovans have and had, respectively, 46 chromosomes. The reduction in chromosome number was caused by the head-to-head fusion of two ancestral chromosomes to form human chromosome 2 (HSA2) and may have contributed to the reproductive barrier with Great Apes. Results: Next generation sequencing and molecular clock analyses estimated that this fusion arose prior to our last common ancestor with Neandertal and Denisovan hominins ~0.74-4.5 million years ago. Hypotheses: I propose that, unlike recurrent Robertsonian translocations in humans, the HSA2 fusion was a single nonrecurrent event that spread through a small polygamous clan population bottleneck. Its heterozygous to homozygous conversion, fixation, and accumulation in the succeeding populations was likely facilitated by an evolutionary advantage through the genomic loss rather than deregulation of expression of the gene(s) flanking the HSA2 fusion site at 2q13. Conclusions: The origin of HSA2 might have been a critical evolutionary event influencing higher cognitive functions in various early subspecies of hominins. Next generation sequencing of Homo heidelbergensis and Homo erectus genomes and complete reconstruction of DNA sequence of the orthologous subtelomeric chromosomes in Great Apes should enable more precise timing of HSA2 formation and better understanding of its evolutionary consequences.
Affiliation(s)
- Paweł Stankiewicz
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Rm ABBR-R809, Houston, TX 77030 USA
|
17
|
Abstract
It has been long understood that mutation distribution is not completely random across genomic space and in time. Indeed, recent surprising discoveries identified multiple simultaneous mutations occurring in tiny regions within chromosomes while the rest of the genome remains relatively mutation-free. Mechanistic elucidation of these phenomena, called mutation showers, mutation clusters, or kataegis, in parallel with findings of abundant clustered mutagenesis in cancer genomes, is ongoing. So far, the combination of factors most important for clustered mutagenesis is the induction of DNA lesions within unusually long and persistent single-strand DNA intermediates. In addition to being a fascinating phenomenon, clustered mutagenesis also became an indispensable tool for identifying a previously unrecognized major source of mutation in cancer, APOBEC cytidine deaminases. Future research on clustered mutagenesis may shed light onto important mechanistic details of genome maintenance, with potentially profound implications for human health.
Affiliation(s)
- Kin Chan
  - Mechanisms of Genome Dynamics Group, National Institute of Environmental Health Sciences, Department of Health and Human Services, National Institutes of Health, Durham, North Carolina 27709
- Dmitry A Gordenin
  - Mechanisms of Genome Dynamics Group, National Institute of Environmental Health Sciences, Department of Health and Human Services, National Institutes of Health, Durham, North Carolina 27709
|
18
|
Chromosome-Specific Centromere Sequences Provide an Estimate of the Ancestral Chromosome 2 Fusion Event in Hominin Genomes. J Hered 2016; 108:45-52. [DOI: 10.1093/jhered/esw039] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/02/2016] [Accepted: 06/20/2016] [Indexed: 12/14/2022] Open
|
19
|
Kenigsberg E, Yehuda Y, Marjavaara L, Keszthelyi A, Chabes A, Tanay A, Simon I. The mutation spectrum in genomic late replication domains shapes mammalian GC content. Nucleic Acids Res 2016; 44:4222-32. [PMID: 27085808 PMCID: PMC4872117 DOI: 10.1093/nar/gkw268] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 12/02/2015] [Revised: 03/10/2016] [Accepted: 03/30/2016] [Indexed: 11/14/2022] Open
Abstract
Genome sequence compositions and epigenetic organizations are correlated extensively across multiple length scales. Replication dynamics, in particular, is highly correlated with GC content. We combine genome-wide time of replication (ToR) data, topological domains maps and detailed functional epigenetic annotations to study the correlations between replication timing and GC content at multiple scales. We find that the decrease in genomic GC content at large scale late replicating regions can be explained by mutation bias favoring A/T nucleotide, without selection or biased gene conversion. Quantification of the free dNTP pool during the cell cycle is consistent with a mechanism involving replication-coupled mutation spectrum that favors AT nucleotides at late S-phase. We suggest that mammalian GC content composition is shaped by independent forces, globally modulating mutation bias and locally selecting on functional element. Deconvoluting these forces and analyzing them on their native scales is important for proper characterization of complex genomic correlations.
Affiliation(s)
- Ephraim Kenigsberg
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Yishai Yehuda
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Lisette Marjavaara
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
- Andrea Keszthelyi
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
- Andrei Chabes
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
- Amos Tanay
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Itamar Simon
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
|
20
|
Mugal CF, Weber CC, Ellegren H. GC-biased gene conversion links the recombination landscape and demography to genomic base composition. Bioessays 2015; 37:1317-26. [DOI: 10.1002/bies.201500058] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 12/22/2022]
Affiliation(s)
- Carina F. Mugal
  - Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Claudia C. Weber
  - Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
  - Department of Biology; Center for Computational Genetics and Genomics; Temple University; Philadelphia PA USA
- Hans Ellegren
  - Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
|
21
|
Knief U, Schielzeth H, Ellegren H, Kempenaers B, Forstmeier W. A prezygotic transmission distorter acting equally in female and male zebra finches Taeniopygia guttata. Mol Ecol 2015; 24:3846-59. [DOI: 10.1111/mec.13281] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/18/2015] [Revised: 06/13/2015] [Accepted: 06/17/2015] [Indexed: 12/25/2022]
Affiliation(s)
- Ulrich Knief
  - Department of Behavioural Ecology and Evolutionary Genetics; Max Planck Institute for Ornithology; Eberhard-Gwinner-Str. 82319 Seewiesen Germany
- Holger Schielzeth
  - Department of Evolutionary Biology; Bielefeld University; Morgenbreede 45 33615 Bielefeld Germany
- Hans Ellegren
  - Department of Evolutionary Biology; Uppsala University; Norbyvägen 18D 752 36 Uppsala Sweden
- Bart Kempenaers
  - Department of Behavioural Ecology and Evolutionary Genetics; Max Planck Institute for Ornithology; Eberhard-Gwinner-Str. 82319 Seewiesen Germany
- Wolfgang Forstmeier
  - Department of Behavioural Ecology and Evolutionary Genetics; Max Planck Institute for Ornithology; Eberhard-Gwinner-Str. 82319 Seewiesen Germany
|
22
|
Glémin S, Arndt PF, Messer PW, Petrov D, Galtier N, Duret L. Quantification of GC-biased gene conversion in the human genome. Genome Res 2015; 25:1215-28. [PMID: 25995268 PMCID: PMC4510005 DOI: 10.1101/gr.185488.114] [Citation(s) in RCA: 108] [Impact Index Per Article: 12.0] [Received: 10/08/2014] [Accepted: 05/18/2015] [Indexed: 11/25/2022]
Abstract
Much evidence indicates that GC-biased gene conversion (gBGC) has a major impact on the evolution of mammalian genomes. However, a detailed quantification of the process is still lacking. The strength of gBGC can be measured from the analysis of derived allele frequency spectra (DAF), but this approach is sensitive to a number of confounding factors. In particular, we show by simulations that the inference is pervasively affected by polymorphism polarization errors and by spatial heterogeneity in gBGC strength. We propose a new general method to quantify gBGC from DAF spectra, incorporating polarization errors, taking spatial heterogeneity into account, and jointly estimating mutation bias. Applying it to human polymorphism data from the 1000 Genomes Project, we show that the strength of gBGC does not differ between hypermutable CpG sites and non-CpG sites, suggesting that in humans gBGC is not caused by the base-excision repair machinery. Genome-wide, the intensity of gBGC is in the nearly neutral area. However, given that recombination occurs primarily within recombination hotspots, 1%–2% of the human genome is subject to strong gBGC. On average, gBGC is stronger in African than in non-African populations, reflecting differences in effective population sizes. However, due to more heterogeneous recombination landscapes, the fraction of the genome affected by strong gBGC is larger in non-African than in African populations. Given that the location of recombination hotspots evolves very rapidly, our analysis predicts that, in the long term, a large fraction of the genome is affected by short episodes of strong gBGC.
Affiliation(s)
- Sylvain Glémin
  - Institut des Sciences de l'Evolution (ISEM - UMR 5554 Université de Montpellier-CNRS-IRD-EPHE), 34095 Montpellier, France
  - Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
- Peter F Arndt
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Philipp W Messer
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Dmitri Petrov
  - Department of Biology, Stanford University, Stanford, California 94305-5020, USA
- Nicolas Galtier
  - Institut des Sciences de l'Evolution (ISEM - UMR 5554 Université de Montpellier-CNRS-IRD-EPHE), 34095 Montpellier, France
- Laurent Duret
  - Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Lyon 1, 69622 Villeurbanne, France
|
23
|
Real-Time Evolution of a Subtelomeric Gene Family in Candida albicans. Genetics 2015; 200:907-19. [PMID: 25956943 PMCID: PMC4512551 DOI: 10.1534/genetics.115.177451] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 02/15/2015] [Accepted: 05/05/2015] [Indexed: 01/02/2023] Open
Abstract
Subtelomeric regions of the genome are notable for high rates of sequence evolution and rapid gene turnover. Evidence of subtelomeric evolution has relied heavily on comparisons of historical evolutionary patterns to infer trends and frequencies of these events. Here, we describe evolution of the subtelomeric TLO gene family in Candida albicans during laboratory passaging for over 4000 generations. C. albicans is a commensal and opportunistic pathogen of humans and the TLO gene family encodes a subunit of the Mediator complex that regulates transcription and affects a range of virulence factors. We identified 16 distinct subtelomeric recombination events that altered the TLO repertoire. Ectopic recombination between subtelomeres on different chromosome ends occurred approximately once per 5000 generations and was often followed by loss of heterozygosity, resulting in the complete loss of one TLO gene sequence with expansion of another. In one case, recombination within TLO genes produced a novel TLO gene sequence. TLO copy number changes were biased, with some TLOs preferentially being copied to novel chromosome arms and other TLO genes being frequently lost. The majority of these nonreciprocal recombination events occurred either within the 3′ end of the TLO coding sequence or within a conserved 50-bp sequence element centromere-proximal to TLO coding sequence. Thus, subtelomeric recombination is a rapid mechanism of generating genotypic diversity through alterations in the number and sequence of related gene family members.
|
24
|
Clément Y, Fustier MA, Nabholz B, Glémin S. The bimodal distribution of genic GC content is ancestral to monocot species. Genome Biol Evol 2014; 7:336-48. [PMID: 25527839 PMCID: PMC4316631 DOI: 10.1093/gbe/evu278] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/27/2022] Open
Abstract
In grasses such as rice or maize, the distribution of genic GC content is well known to be bimodal. It is mainly driven by GC content at third codon positions (GC3 for short). This feature is thought to be specific to grasses as closely related species like banana have a unimodal GC3 distribution. GC3 is associated with numerous genomics features and uncovering the origin of this peculiar distribution will help understanding the potential roles and consequences of GC3 variations within and between genomes. Until recently, the origin of the peculiar GC3 distribution in grasses has remained unknown. Thanks to the recent publication of several complete genomes and transcriptomes of nongrass monocots, we studied more than 1,000 groups of one-to-one orthologous genes in seven grasses and three outgroup species (banana, palm tree, and yam). Using a maximum likelihood-based method, we reconstructed GC3 at several ancestral nodes. We found that the bimodal GC3 distribution observed in extant grasses is ancestral to both grasses and most monocot species, and that other species studied here have lost this peculiar structure. We also found that GC3 in grass lineages is globally evolving very slowly and that the decreasing GC3 gradient observed from 5′ to 3′ along coding sequences is also conserved and ancestral to monocots. This result strongly challenges the previous views on the specificity of grass genomes and we discuss its implications for the possible causes of the evolution of GC content in monocots.
Affiliation(s)
- Yves Clément
  - Montpellier SupAgro, Unité Mixte de Recherche 1334, Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, Montpellier, France
  - Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
- Benoit Nabholz
  - Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
- Sylvain Glémin
  - Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
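For readers unfamiliar with the GC3 statistic named in the abstract above (GC content at third codon positions), a minimal sketch of how it might be computed for a single in-frame coding sequence follows. The function name `gc3` and the choice to ignore ambiguous bases are assumptions for this example, not details taken from the paper.

```python
def gc3(cds):
    """Return the fraction of G or C bases at third codon positions.

    cds: an in-frame coding sequence whose length is a multiple of 3.
    Third positions that are not A, C, G or T (for example N) are ignored.
    """
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Every third base starting at index 2 is a third codon position.
    third_positions = [base for base in seq[2::3] if base in "ACGT"]
    if not third_positions:
        raise ValueError("no unambiguous third codon positions found")
    gc_count = sum(1 for base in third_positions if base in "GC")
    return gc_count / len(third_positions)


# Codons ATG GCC AAA TAA have third positions G, C, A, A.
print(gc3("ATGGCCAAATAA"))
```

Computing this value per gene and plotting its distribution across a genome is one way to see whether the distribution is unimodal or bimodal, which is the contrast the abstract draws between grasses and other monocots.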
|
25
|
Berglund J, Quilez J, Arndt PF, Webster MT. Germline methylation patterns determine the distribution of recombination events in the dog genome. Genome Biol Evol 2014; 7:522-30. [PMID: 25527838 PMCID: PMC4350167 DOI: 10.1093/gbe/evu282] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022] Open
Abstract
The positive-regulatory domain containing nine gene, PRDM9, which strongly associates with the location of recombination events in several vertebrates, is inferred to be inactive in the dog genome. Here, we address several questions regarding the control of recombination and its influence on genome evolution in dogs. First, we address whether the association between CpG islands (CGIs) and recombination hotspots is generated by lack of methylation, GC-biased gene conversion (gBGC), or both. Using a genome-wide dog single nucleotide polymorphism data set and comparisons of the dog genome with related species, we show that recombination-associated CGIs have low CpG mutation rates, and that CpG mutation rate is negatively correlated with recombination rate genome wide, indicating that nonmethylation attracts the recombination machinery. We next use a neighbor-dependent model of nucleotide substitution to disentangle the effects of CpG mutability and gBGC and analyze the effects that loss of PRDM9 has on these rates. We infer that methylation patterns have been stable during canid genome evolution, but that dog CGIs have experienced a drastic increase in substitution rate due to gBGC, consistent with increased levels of recombination in these regions. We also show that gBGC is likely to have generated many new CGIs in the dog genome, but these mostly occur away from genes, whereas the number of CGIs in gene promoter regions has not increased greatly in recent evolutionary history. Recombination has a major impact on the distribution of CGIs that are detected in the dog genome due to the interaction between methylation and gBGC. The results indicate that germline methylation patterns are the main determinant of recombination rates in the absence of PRDM9.
Affiliation(s)
- Jonas Berglund
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
- Javier Quilez
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
- Peter F Arndt
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Matthew T Webster
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
|
26
|
Figuet E, Ballenghien M, Romiguier J, Galtier N. Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates. Genome Biol Evol 2014; 7:240-50. [PMID: 25527834 PMCID: PMC4316630 DOI: 10.1093/gbe/evu277] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 12/15/2022] Open
Abstract
Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrates species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins.
Affiliation(s)
- Emeric Figuet
  - CNRS, Université Montpellier 2, UMR 5554, Institut des Sciences de l'Evolution de Montpellier, France
- Marion Ballenghien
  - CNRS, Université Montpellier 2, UMR 5554, Institut des Sciences de l'Evolution de Montpellier, France
- Jonathan Romiguier
  - CNRS, Université Montpellier 2, UMR 5554, Institut des Sciences de l'Evolution de Montpellier, France
  - Department of Ecology and Evolution, Biophore, University of Lausanne, Switzerland
- Nicolas Galtier
  - CNRS, Université Montpellier 2, UMR 5554, Institut des Sciences de l'Evolution de Montpellier, France
|
27
|
Scala G, Affinito O, Miele G, Monticelli A, Cocozza S. Evidence for evolutionary and nonevolutionary forces shaping the distribution of human genetic variants near transcription start sites. PLoS One 2014; 9:e114432. [PMID: 25474578 PMCID: PMC4256220 DOI: 10.1371/journal.pone.0114432] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/07/2014] [Accepted: 11/09/2014] [Indexed: 11/19/2022] Open
Abstract
The regions surrounding transcription start sites (TSSs) of genes play a critical role in the regulation of gene expression. At the same time, current evidence indicates that these regions are particularly stressed by transcription-related mutagenic phenomena. In this work we performed a genome-wide analysis of the distribution of single nucleotide polymorphisms (SNPs) inside the 10 kb region flanking human TSSs by dividing SNPs into four classes according to their frequency (rare, two intermediate classes, and common). We found that, in this 10 kb region, the distribution of variants depends on their frequency and on their localization relative to the TSS. We found that the distribution of variants is generally different for TSSs located inside or outside of CpG islands. We found a significant relationship between the distribution of rare variants and nucleosome occupancy scores. Furthermore, our analysis suggests that evolutionary (purifying selection) and nonevolutionary (biased gene conversion) forces both play a role in determining the relative SNP frequency around TSSs. Finally, we analyzed the potential pathogenicity of each class of variant using the Combined Annotation Dependent Depletion score. In conclusion, this study provides a novel and detailed view of the distribution of genomic variants around TSSs, providing insight into the forces that instigate and maintain variability in such critical regions.
Affiliation(s)
- Giovanni Scala
  - Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università degli Studi di Napoli “Federico II”, Naples, Italy
  - Dipartimento di Fisica, Università degli Studi di Napoli “Federico II”, Naples, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Napoli, Naples, Italy
- Ornella Affinito
  - Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università degli Studi di Napoli “Federico II”, Naples, Italy
  - Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Università degli Studi di Napoli “Federico II”, Naples, Italy
  - Istituto di Endocrinologia ed Oncologia Sperimentale (IEOS), CNR, Naples, Italy
- Gennaro Miele
  - Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università degli Studi di Napoli “Federico II”, Naples, Italy
  - Dipartimento di Fisica, Università degli Studi di Napoli “Federico II”, Naples, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Napoli, Naples, Italy
- Antonella Monticelli
  - Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università degli Studi di Napoli “Federico II”, Naples, Italy
  - Istituto di Endocrinologia ed Oncologia Sperimentale (IEOS), CNR, Naples, Italy
- Sergio Cocozza
  - Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università degli Studi di Napoli “Federico II”, Naples, Italy
  - Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Università degli Studi di Napoli “Federico II”, Naples, Italy
|
28
|
Bell CG, Wilson GA, Beck S. Human-specific CpG 'beacons' identify human-specific prefrontal cortex H3K4me3 chromatin peaks. Epigenomics 2014; 6:21-31. [PMID: 24579944 DOI: 10.2217/epi.13.74] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Targeted recruitment of chromatin-modifying enzymes to clusters of CpG dinucleotides contributes toward the formation of accessible chromatin. By interprimate comparison we previously identified the set of nonpolymorphic human-specific CpGs (CpG 'beacons') and revealed that these loci were enriched for human disease traits. Due to their human-specific CpG density change, extreme CpG 'beacon' clusters (≥20 CpG beacons/kb) were predicted to identify permissive chromatin peaks within the human genome. AIM We set out to explore these sequence-defined regions for evidence of an active chromatin signature. RESULTS Using available comparative primate epigenomic data from neurons of the prefrontal cortex, we show that these CpG 'beacon' clusters are indeed enriched for being human-specific H3K4me3 peaks (χ²: p < 2.2 × 10⁻¹⁶) and thus predictive of permissive chromatin states. These sequence regions had a higher predictive value than previous selective analyses. We also show that both human-specific H3K4me3 and CpG 'beacon' clusters are increased within current and ancestral telomeric regions, supporting an association with recombination, which is higher towards the distal ends of chromosomes. CONCLUSION Therefore, CpG-focused comparative sequence analysis can precisely pinpoint chromatin structures that contribute to the human-specific phenotype and further supports an integrated approach in genomic and epigenomic studies.
Affiliation(s)
- Christopher G Bell
- Medical Genomics, UCL Cancer Institute, University College London, London, UK
|
29
|
Lachance J, Tishkoff SA. Biased gene conversion skews allele frequencies in human populations, increasing the disease burden of recessive alleles. Am J Hum Genet 2014; 95:408-20. [PMID: 25279983 PMCID: PMC4185123 DOI: 10.1016/j.ajhg.2014.09.008] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 01/24/2014] [Revised: 08/21/2014] [Accepted: 09/10/2014] [Indexed: 10/25/2022] Open
Abstract
Gene conversion results in the nonreciprocal transfer of genetic information between two recombining sequences, and there is evidence that this process is biased toward G and C alleles. However, the strength of GC-biased gene conversion (gBGC) in human populations and its effects on hereditary disease have yet to be assessed on a genomic scale. Using high-coverage whole-genome sequences of African hunter-gatherers, agricultural populations, and primate outgroups, we quantified the effects of GC-biased gene conversion on population genomic data sets. We find that genetic distances (FST and population branch statistics) are modified by gBGC. In addition, the site frequency spectrum is left-shifted when ancestral alleles are favored by gBGC and right-shifted when derived alleles are favored by gBGC. Allele frequency shifts due to gBGC mimic the effects of natural selection. As expected, these effects are strongest in high-recombination regions of the human genome. By comparing the relative rates of fixation of unbiased and biased sites, the strength of gene conversion was estimated to be on the order of Nb ≈ 0.05 to 0.09. We also find that derived alleles favored by gBGC are much more likely to be homozygous than derived alleles at unbiased SNPs (+42.2% to 62.8%). This results in a curse of the converted, whereby gBGC causes substantial increases in hereditary disease risks. Taken together, our findings reveal that GC-biased gene conversion has important population genetic and public health implications.
MESH Headings
- Bias
- Evolution, Molecular
- Gene Conversion
- Gene Frequency
- Genes, Recessive/genetics
- Genetic Diseases, Inborn/genetics
- Genetics, Population
- Genome, Human/genetics
- Humans
- Models, Genetic
- Models, Theoretical
- Polymorphism, Single Nucleotide/genetics
- Recombination, Genetic
- Selection, Genetic/genetics
Affiliation(s)
- Joseph Lachance
- Departments of Biology and Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA.
| | - Sarah A Tishkoff
- Departments of Biology and Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA.
|
30
|
Abstract
The great ape families are the species most closely related to our own, comprising chimpanzees, bonobos, gorillas, and orangutans. They live exclusively in tropical rainforests in Central Africa and the islands of Southeast Asia. Due to their close evolutionary relationship with humans, great apes share many cognitive, physiological, and morphological similarities with humans. The members of the great ape family make obvious models to facilitate the further understanding about humans' biology and history. This review will discuss how the recent addition of genome-wide data from great apes has furthered humans' understanding of these species and humanity, especially in the realm of evolutionary genetics.
|
31
|
Weber CC, Boussau B, Romiguier J, Jarvis ED, Ellegren H. Evidence for GC-biased gene conversion as a driver of between-lineage differences in avian base composition. Genome Biol 2014; 15:549. [PMID: 25496599 PMCID: PMC4290106 DOI: 10.1186/s13059-014-0549-1] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 04/08/2014] [Accepted: 11/19/2014] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND While effective population size (Ne) and life history traits such as generation time are known to impact substitution rates, their potential effects on base composition evolution are less well understood. GC content increases with decreasing body mass in mammals, consistent with recombination-associated GC biased gene conversion (gBGC) more strongly impacting these lineages. However, shifts in chromosomal architecture and recombination landscapes between species may complicate the interpretation of these results. In birds, interchromosomal rearrangements are rare and the recombination landscape is conserved, suggesting that this group is well suited to assess the impact of life history on base composition. RESULTS Employing data from 45 newly and 3 previously sequenced avian genomes covering a broad range of taxa, we found that lineages with large populations and short generations exhibit higher GC content. The effect extends to both coding and non-coding sites, indicating that it is not due to selection on codon usage. Consistent with recombination driving base composition, GC content and heterogeneity were positively correlated with the rate of recombination. Moreover, we observed ongoing increases in GC in the majority of lineages. CONCLUSIONS Our results provide evidence that gBGC may drive patterns of nucleotide composition in avian genomes and are consistent with more effective gBGC in large populations and a greater number of meioses per unit time; that is, a shorter generation time. Thus, in accord with theoretical predictions, base composition evolution is substantially modulated by species life history.
Affiliation(s)
- Claudia C Weber
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
| | - Bastien Boussau
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558 Villeurbanne, France
| | | | - Erich D Jarvis
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC USA
| | - Hans Ellegren
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
|
32
|
Robinson MC, Stone EA, Singh ND. Population genomic analysis reveals no evidence for GC-biased gene conversion in Drosophila melanogaster. Mol Biol Evol 2013; 31:425-33. [PMID: 24214536 DOI: 10.1093/molbev/mst220] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 12/19/2022] Open
Abstract
Gene conversion is the nonreciprocal exchange of genetic material between homologous chromosomes. Multiple lines of evidence from a variety of taxa strongly suggest that gene conversion events are biased toward GC-bearing alleles. However, in Drosophila, the data have largely been indirect and unclear, with some studies supporting the predictions of a GC-biased gene conversion model and other data showing contradictory findings. Here, we test whether gene conversion events are GC-biased in Drosophila melanogaster using whole-genome polymorphism and divergence data. Our results provide no support for GC-biased gene conversion and thus suggest that this process is unlikely to significantly contribute to patterns of polymorphism and divergence in this system.
Affiliation(s)
- Matthew C Robinson
- Department of Biological Sciences, Program in Genetics, North Carolina State University
|
33
|
Munch K, Mailund T, Dutheil JY, Schierup MH. A fine-scale recombination map of the human-chimpanzee ancestor reveals faster change in humans than in chimpanzees and a strong impact of GC-biased gene conversion. Genome Res 2013; 24:467-74. [PMID: 24190946 PMCID: PMC3941111 DOI: 10.1101/gr.158469.113] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 11/24/2022]
Abstract
Recombination is a major determinant of adaptive and nonadaptive evolution. Understanding how the recombination landscape has evolved in humans is thus key to the interpretation of human genomic evolution. Comparison of fine-scale recombination maps of human and chimpanzee has revealed large changes at fine genomic scales and conservation over large scales. Here we demonstrate how a fine-scale recombination map can be derived for the ancestor of human and chimpanzee, allowing us to study the changes that have occurred in human and chimpanzee since these species diverged. The map is produced from more than one million accurately determined recombination events. We find that this new recombination map is intermediate to the maps of human and chimpanzee but that the recombination landscape has evolved more rapidly in the human lineage than in the chimpanzee lineage. We use the map to show that recombination rate, through the effect of GC-biased gene conversion, is an even stronger determinant of base composition evolution than previously reported.
Affiliation(s)
- Kasper Munch
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus C, Denmark
|
34
|
Capra JA, Hubisz MJ, Kostka D, Pollard KS, Siepel A. A model-based analysis of GC-biased gene conversion in the human and chimpanzee genomes. PLoS Genet 2013; 9:e1003684. [PMID: 23966869 PMCID: PMC3744432 DOI: 10.1371/journal.pgen.1003684] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 03/11/2013] [Accepted: 06/14/2013] [Indexed: 01/03/2023] Open
Abstract
GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. 
Interpreting patterns of DNA sequence variation in the genomes of closely related species is critically important for understanding the causes and functional effects of nucleotide substitutions. Classical models describe patterns of substitution in terms of the fundamental forces of mutation, recombination, neutral drift, and natural selection. However, an entirely separate force, called GC-biased gene conversion (gBGC), also appears to have an important influence on substitution patterns in many species. gBGC is a recombination-associated evolutionary process that favors the fixation of strong (G/C) over weak (A/T) alleles. In mammals, gBGC is thought to promote variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations. However, its genome-wide influence remains poorly understood, in part because it is difficult to incorporate gBGC into statistical models of evolution. In this paper, we describe a new evolutionary model that jointly describes the effects of selection and gBGC and apply it to the human and chimpanzee genomes. Our genome-wide predictions of gBGC tracts indicate that gBGC has been an important force in recent human evolution. Our publicly available computer program, called phastBias, and our genome-wide predictions will enable other researchers to consider gBGC in their analyses.
Affiliation(s)
- John A. Capra
- Gladstone Institutes, University of California, San Francisco, California, United States of America
| | - Melissa J. Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
| | - Dennis Kostka
- Department of Developmental Biology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Katherine S. Pollard
- Gladstone Institutes, University of California, San Francisco, California, United States of America
- Institute for Human Genetics and Division of Biostatistics, University of California, San Francisco, California, United States of America
| | - Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
|
35
|
Smith JD, McManus KF, Fraser HB. A novel test for selection on cis-regulatory elements reveals positive and negative selection acting on mammalian transcriptional enhancers. Mol Biol Evol 2013; 30:2509-18. [PMID: 23904330 DOI: 10.1093/molbev/mst134] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/29/2022] Open
Abstract
Measuring natural selection on genomic elements involved in the cis-regulation of gene expression--such as transcriptional enhancers and promoters--is critical for understanding the evolution of genomes, yet it remains a major challenge. Many studies have attempted to detect positive or negative selection in these noncoding elements by searching for those with the fastest or slowest rates of evolution, but this can be problematic. Here, we introduce a new approach to this issue, and demonstrate its utility on three mammalian transcriptional enhancers. Using results from saturation mutagenesis studies of these enhancers, we classified all possible point mutations as upregulating, downregulating, or silent, and determined which of these mutations have occurred on each branch of a phylogeny. Applying a framework analogous to Ka/Ks in protein-coding genes, we measured the strength of selection on upregulating and downregulating mutations, in specific branches as well as entire phylogenies. We discovered distinct modes of selection acting on different enhancers: although all three have experienced negative selection against downregulating mutations, the selection pressures on upregulating mutations vary. In one case, we detected positive selection for upregulation, whereas the other two had no detectable selection on upregulating mutations. Our methodology is applicable to the growing number of saturation mutagenesis data sets, and provides a detailed picture of the mode and strength of natural selection acting on cis-regulatory elements.
|
36
|
Xu K, Wang J, Elango N, Yi SV. The evolution of lineage-specific clusters of single nucleotide substitutions in the human genome. Mol Phylogenet Evol 2013; 69:276-85. [PMID: 23770436 DOI: 10.1016/j.ympev.2013.06.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/26/2013] [Revised: 05/17/2013] [Accepted: 06/04/2013] [Indexed: 11/25/2022]
Abstract
Genomic regions harboring large numbers of human-specific single nucleotide substitutions are of significant interest since they are potential genomic foci underlying the evolution of human-specific traits as well as human adaptive evolution. Previous studies aimed to identify such regions either used pre-defined genomic locations such as coding sequences and conserved genomic elements or employed sliding window methods. Such approaches may miss clusters of substitutions occurring in regions other than those pre-defined locations, or not be able to distinguish human-specific clusters of substitutions from regions of generally high substitution rates. Here, we conduct a 'maximal segment' analysis to scan the whole human genome to identify clusters of human-specific substitutions that occurred since the divergence of the human and the chimpanzee genomes. This method can identify species-specific clusters of substitutions while not relying on pre-defined regions. We thus identify thousands of clusters of human-specific single nucleotide substitutions. The evolution of such clusters is driven by a combination of several different evolutionary processes including increased regional mutation rate, recombination-associated processes, and positive selection. These newly identified regions of human-specific substitution clusters include large numbers of previously identified human accelerated regions, and exhibit significant enrichments of genes involved in several developmental processes. Our study provides a useful tool to study the evolution of the human genome.
Affiliation(s)
- Ke Xu
- School of Biology, Georgia Institute of Technology, 310 Ferst Drive, Atlanta, GA 30332, USA.
|
37
|
Leushkin EV, Bazykin GA. Short indels are subject to insertion-biased gene conversion. Evolution 2013; 67:2604-13. [PMID: 24033170 DOI: 10.1111/evo.12129] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 01/31/2013] [Accepted: 04/05/2013] [Indexed: 11/29/2022]
Abstract
Recombination between homologous loci is accompanied by formation of heteroduplexes. Repairing mismatches in heteroduplexes often leads to single nucleotide substitutions in a process known as gene conversion. Gene conversion was shown to be GC-biased in different organisms; that is, a W(A or T)→S(G or C) substitution is more likely in this process than a S→W substitution. Here, we show that the insertion/deletion ratio for short noncoding indels that reach fixation between species is positively correlated with the recombination rate in Drosophila melanogaster, Homo sapiens, and Saccharomyces cerevisiae. This correlation is both due to an increase of the fixation rate of insertions and decrease of the fixation rate of deletions in regions of high recombination. Whole-genome data on indel polymorphism and divergence in D. melanogaster rule out mutation biases and selection as the cause of this trend, pointing to insertion-biased gene conversion as the most likely explanation. The bias toward insertions is the strongest for single-nucleotide indels, and decreases with indel length. In regions of high recombination rate this bias leads to an up to ∼5-fold excess of fixed short insertions over deletions, and substantially affects the evolution of DNA segments.
Affiliation(s)
- Evgeny V Leushkin
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskye Gory 1-73, Moscow, 119992, Russia; Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Bolshoi Karetny pereulok, 19, Moscow, 127994, Russia.
|
38
|
Cagliani R, Guerini FR, Rubio-Acero R, Baglio F, Forni D, Agliardi C, Griffanti L, Fumagalli M, Pozzoli U, Riva S, Calabrese E, Sikora M, Casals F, Comi GP, Bresolin N, Cáceres M, Clerici M, Sironi M. Long-standing balancing selection in the THBS4 gene: influence on sex-specific brain expression and gray matter volumes in Alzheimer disease. Hum Mutat 2013; 34:743-53. [PMID: 23420636 DOI: 10.1002/humu.22301] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/13/2012] [Accepted: 02/01/2013] [Indexed: 01/08/2023]
Abstract
The THBS4 gene encodes a glycoprotein involved in inflammatory responses and synaptogenesis. THBS4 is expressed at higher levels in the brain of humans compared with nonhuman primates, and the protein accumulates in β-amyloid plaques. We analyzed THBS4 genetic variability in humans and show that two haplotypes (hap1 and hap2) are maintained by balancing selection and modulate THBS4 expression in lymphocytes. Indeed, the balancing selection region covers a predicted transcriptional enhancer. In humans, but not in macaques and chimpanzees, THBS4 brain expression increases with age, and variants in the balancing selection region interact with sex in influencing THBS4 expression (p_interaction = 0.038), with hap1 homozygous females showing lowest expression. In Alzheimer disease (AD) patients, significant interactions between sex and THBS4 genotype were detected for peripheral gray matter (p_interaction = 0.014) and total gray matter (p_interaction = 0.012) volumes. Similarly to the gene expression results, the interaction is mainly mediated by hap1 homozygous AD females, who show reduced volumes. Thus, the balancing selection target in THBS4 is likely represented by one or more variants that regulate tissue-specific and sex-specific gene expression. The selection signature associated with THBS4 might not be related to AD pathogenesis, but rather to inflammatory responses.
|
39
|
Lartillot N. Phylogenetic patterns of GC-biased gene conversion in placental mammals and the evolutionary dynamics of recombination landscapes. Mol Biol Evol 2012; 30:489-502. [PMID: 23079417 DOI: 10.1093/molbev/mss239] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Indexed: 12/13/2022] Open
Abstract
GC-biased gene conversion (gBGC) is a major evolutionary force shaping genomic nucleotide landscapes, distorting the estimation of the strength of selection, and having potentially deleterious effects on genome-wide fitness. Yet, a global quantitative picture, at large evolutionary scale, of the relative strength of gBGC compared with selection and random drift is still lacking. Furthermore, owing to its dependence on the local recombination rate, gBGC results in modulations of the substitution patterns along genomes and across time which, if correctly interpreted, may yield quantitative insights into the long-term evolutionary dynamics of recombination landscapes. Deriving a model of the substitution process at putatively neutral nucleotide positions from population-genetics arguments, and accounting for among-lineage and among-gene effects, we propose a reconstruction of the variation in gBGC intensity at the scale of placental mammals, and of its scaling with body-size and karyotypic traits. Our results are compatible with a simple population genetics model relating gBGC to effective population size and recombination rate. In addition, among-gene variation and phylogenetic patterns of exon-specific levels of gBGC reveal the presence of rugged recombination landscapes, and suggest that short-lived recombination hot-spots are a general feature of placentals. Across placental mammals, variation in gBGC strength spans two orders of magnitude, at its lowest in apes, strongest in lagomorphs, microbats or tenrecs, and near or above the nearly neutral threshold in most other lineages. Combined with among-gene variation, such high levels of biased gene conversion are likely to significantly impact mildly selected positions, and to represent a substantial mutation load. Altogether, our analysis suggests a more important role of gBGC in placental genome evolution, compared with what could have been anticipated from studies conducted in anthropoid primates.
Affiliation(s)
- Nicolas Lartillot
- Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
|
40
|
Lartillot N. Interaction between selection and biased gene conversion in mammalian protein-coding sequence evolution revealed by a phylogenetic covariance analysis. Mol Biol Evol 2012; 30:356-68. [PMID: 23024185 DOI: 10.1093/molbev/mss231] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 12/21/2022] Open
Abstract
According to the nearly-neutral model, variation in long-term effective population size among species should result in correlated variation in the ratio of nonsynonymous over synonymous substitution rates (dN/dS). Previous empirical investigations in mammals have been consistent with this prediction, suggesting an important role for nearly-neutral effects on protein-coding sequence evolution. GC-biased gene conversion (gBGC), on the other hand, is increasingly recognized as a major evolutionary force shaping genome nucleotide composition. When sufficiently strong compared with random drift, gBGC may significantly interfere with a nearly-neutral regime and impact dN/dS in a complex manner. Here, we investigate the phylogenetic correlations between dN/dS, the equilibrium GC composition (GC*), and several life-history and karyotypic traits in placental mammals. We show that the equilibrium GC composition decreases with body mass and increases with the number of chromosomes, suggesting a modulation of the strength of biased gene conversion due to changes in effective population size and genome-wide recombination rate. The variation in dN/dS is complex and only partially fits the prediction of the nearly-neutral theory. However, specifically restricting estimation of the dN/dS ratio on GC-conservative transversions, which are immune from gBGC, results in correlations that are more compatible with a nearly-neutral interpretation. Our investigation indicates the presence of complex interactions between selection and biased gene conversion and suggests that further mechanistic development is warranted, to tease out mutation, selection, drift, and conversion.
Affiliation(s)
- Nicolas Lartillot
- Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
|
41
|
Bell CG, Wilson GA, Butcher LM, Roos C, Walter L, Beck S. Human-specific CpG "beacons" identify loci associated with human-specific traits and disease. Epigenetics 2012; 7:1188-99. [PMID: 22968434 PMCID: PMC3469460 DOI: 10.4161/epi.22127] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 12/25/2022] Open
Abstract
Regulatory change has long been hypothesized to drive the delineation of the human phenotype from other closely related primates. Here we provide evidence that CpG dinucleotides play a special role in this process. CpGs enable epigenome variability via DNA methylation, and this epigenetic mark functions as a regulatory mechanism. Therefore, species-specific CpGs may influence species-specific regulation. We report non-polymorphic species-specific CpG dinucleotides (termed “CpG beacons”) as a distinct genomic feature associated with CpG island (CGI) evolution, human traits and disease. Using an inter-primate comparison, we identified 21 extreme CpG beacon clusters (≥ 20/kb peaks, empirical p < 1.0 × 10⁻³) in humans, which include associations with four monogenic developmental and neurological disease related genes (Benjamini-Hochberg corrected p = 6.03 × 10⁻³). We also demonstrate that beacon-mediated CpG density gain in CGIs correlates with reduced methylation in these species in orthologous CGIs over time, via human, chimpanzee and macaque MeDIP-seq. Therefore mapping into both the genomic and epigenomic space the identified CpG beacon clusters define points of intersection where a substantial two-way interaction between genetic sequence and epigenetic state has occurred. Taken together, our data support a model for CpG beacons to contribute to CGI evolution from genesis to tissue-specific to constitutively active CGIs.
Affiliation(s)
- Christopher G Bell
- Medical Genomics, UCL Cancer Institute, University College London, London, UK.
|
42
|
The three clades of the telomere-associated TLO gene family of Candida albicans have different splicing, localization, and expression features. Eukaryot Cell 2012; 11:1268-75. [PMID: 22923044 DOI: 10.1128/ec.00230-12] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/31/2022]
Abstract
Candida albicans grows within a wide range of host niches, and this adaptability enhances its success as a commensal and as a pathogen. The telomere-associated TLO gene family underwent a recent expansion from one or two copies in other CUG clade members to 14 expressed copies in C. albicans. This correlates with increased virulence and clinical prevalence relative to those of other Candida clade species. The 14 expressed TLO gene family members have a conserved Med2 domain at the N terminus, suggesting a role in general transcription. The C-terminal half is more divergent, distinguishing three clades: clade α and clade β have no introns and encode proteins that localize primarily to the nucleus; clade γ sometimes undergoes splicing, and the gene products localize within the mitochondria as well as the nuclei. Additionally, TLOα genes are generally expressed at much higher levels than are TLOγ genes. We propose that expansion of the TLO gene family and the predicted role of Tlo proteins in transcription regulation provide C. albicans with the ability to adapt rapidly to the broad range of different environmental niches within the human host.
|
43
|
Voelker RB, Erkelenz S, Reynoso V, Schaal H, Berglund JA. Frequent gain and loss of intronic splicing regulatory elements during the evolution of vertebrates. Genome Biol Evol 2012; 4:659-74. [PMID: 22619362 PMCID: PMC3606033 DOI: 10.1093/gbe/evs051] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022] Open
Abstract
Splicing regulatory elements (SREs) are sequences bound by proteins that influence splicing of nearby splice sites. Constitutively spliced introns have evolved to utilize many different splicing factors. The evolutionary processes that influenced which splicing factors are used for splicing of individual introns are generally unclear. We demonstrate that in the lineage that gave rise to mammals, many introns lost U-rich sequences and gained G-rich sequences, both of which resemble known SREs. The apparent conversion of U-rich to G-rich SREs suggests that the associated splicing factors are functionally equivalent. In support of this we demonstrated that U-rich and G-rich SREs are both capable of promoting splicing of an SRE-dependent splicing reporter. Furthermore, we demonstrate, using the heterologous MS2 tethering system (bacterial MS2 coat fusion-protein and its RNA stem-loop binding site), that both the U-rich SRE-binding protein (TIA1) and the G-rich SRE-binding protein (HNRNPF) can promote splicing of the same intron. We also observed that gain of G-rich SREs is significantly associated with G/C-rich genomic isochores, suggesting that gain or loss of SREs was driven by the same processes that ultimately resulted in the formation of mammalian genomic isochores. We propose the following model for the gain and loss of mammalian SREs. Ancestral U-rich SREs located in genomic regions that were experiencing high rates of A/T to G/C conversion would have suffered frequent deleterious mutations. However, this same process resulted in increased formation of functionally equivalent G-rich SREs, and acquisition of new G-rich SREs decreased purifying selection on the U-rich SREs, which were then free to decay.
Affiliation(s)
- Rodger B Voelker
- Institute of Molecular Biology, Department of Chemistry, University of Oregon, OR, USA
|
44
|
Takahashi M, Saitou N. Identification and characterization of lineage-specific highly conserved noncoding sequences in Mammalian genomes. Genome Biol Evol 2012; 4:641-57. [PMID: 22505575 PMCID: PMC3381673 DOI: 10.1093/gbe/evs035] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Accepted: 03/23/2012] [Indexed: 01/12/2023] Open
Abstract
Vertebrate genome comparisons revealed that there are highly conserved noncoding sequences (HCNSs) among a wide range of species, many of which contain regulatory elements. However, recently emerged sequences conserved in specific lineages have not been well studied. Toward this end, we identified 8,198 primate-specific and 21,128 rodent-specific HCNSs as representative ones among mammals from human-marmoset and mouse-rat comparisons, respectively. Derived allele frequency analysis of primate-specific HCNSs showed that these HCNSs were under purifying selection, indicating that they may harbor important functions. We selected the top 1,000 largest HCNSs and compared the lineage-specific HCNS-flanking genes (LHF genes) with ultraconserved element (UCE)-flanking genes. Interestingly, the majority of LHF genes were different from UCE-flanking genes. This lineage-specific set of LHF genes was more enriched in protein-binding function. Conversely, the number of LHF genes that were also shared by UCEs was small but significantly larger than random expectation, and many of these genes were involved in anatomical development as transcriptional regulators, suggesting that certain groups of genes preferentially recruit new HCNSs in addition to old HCNSs that are conserved among vertebrates. This group of LHF genes might be involved in the various levels of lineage-specific evolution among vertebrates, mammals, primates, and rodents. If so, the emergence of HCNSs in and around these two groups of LHF genes developed lineage-specific characteristics. Our results provide new insight into lineage-specific evolution through interactions between HCNSs and their LHF genes.
Affiliation(s)
- Mahoko Takahashi
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies, Japan
- Division of Population Genetics, National Institute of Genetics, Japan
- Present address: Department of Genetics, Stanford University
| | - Naruya Saitou
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies, Japan
- Division of Population Genetics, National Institute of Genetics, Japan
|
45
|
Auton A, Fledel-Alon A, Pfeifer S, Venn O, Ségurel L, Street T, Leffler EM, Bowden R, Aneas I, Broxholme J, Humburg P, Iqbal Z, Lunter G, Maller J, Hernandez RD, Melton C, Venkat A, Nobrega MA, Bontrop R, Myers S, Donnelly P, Przeworski M, McVean G. A fine-scale chimpanzee genetic map from population sequencing. Science 2012; 336:193-8. [PMID: 22422862 PMCID: PMC3532813 DOI: 10.1126/science.1216872] [Citation(s) in RCA: 208] [Impact Index Per Article: 17.3] [Indexed: 12/11/2022]
Abstract
To study the evolution of recombination rates in apes, we developed methodology to construct a fine-scale genetic map from high-throughput sequence data from 10 Western chimpanzees, Pan troglodytes verus. Compared to the human genetic map, broad-scale recombination rates tend to be conserved, but with exceptions, particularly in regions of chromosomal rearrangements and around the site of ancestral fusion in human chromosome 2. At fine scales, chimpanzee recombination is dominated by hotspots, which show no overlap with those of humans even though rates are similarly elevated around CpG islands and decreased within genes. The hotspot-specifying protein PRDM9 shows extensive variation among Western chimpanzees, and there is little evidence that any sequence motifs are enriched in hotspots. The contrasting locations of hotspots provide a natural experiment, which demonstrates the impact of recombination on base composition.
Affiliation(s)
- Adam Auton
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Genetics, Albert Einstein College of Medicine, New York, New York, USA
| | - Adi Fledel-Alon
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - Susanne Pfeifer
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| | - Oliver Venn
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
| | - Laure Ségurel
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois, USA
| | - Teresa Street
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| | - Ellen M. Leffler
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - Rory Bowden
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
- Oxford Biomedical Research Centre, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 9DU, UK
| | - Ivy Aneas
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - John Broxholme
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
| | - Peter Humburg
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
| | - Zamin Iqbal
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
| | - Gerton Lunter
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
| | - Julian Maller
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| | - Ryan D. Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143-0912, USA
| | - Cord Melton
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - Aarti Venkat
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois, USA
| | - Marcelo A. Nobrega
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - Ronald Bontrop
- Department of Comparative Genetics and Refinement, Biomedical Primate Research Center, Lange Kleiweg 139 2288 GJ, Rijswijk, Netherlands
| | - Simon Myers
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| | - Peter Donnelly
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
| | - Molly Przeworski
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
| | - Gil McVean
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, University of Oxford, Oxford, OX1 3TG, UK
|
46
|
Popa A, Samollow P, Gautier C, Mouchiroud D. The sex-specific impact of meiotic recombination on nucleotide composition. Genome Biol Evol 2012; 4:412-22. [PMID: 22417915 PMCID: PMC3318449 DOI: 10.1093/gbe/evs023] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022] Open
Abstract
Meiotic recombination is an important evolutionary force shaping the nucleotide landscape of genomes. For most vertebrates, the frequency of recombination varies slightly or considerably between the sexes (heterochiasmy). In humans, male, rather than female, recombination rate has been found to be more highly correlated with the guanine and cytosine (GC) content across the genome. In the present study, we review the results in human and extend the examination of the evolutionary impact of heterochiasmy beyond primates to include four additional eutherian mammals (mouse, dog, pig, and sheep), a metatherian mammal (opossum), and a bird (chicken). Specifically, we compared sex-specific recombination rates (RRs) with nucleotide substitution patterns evaluated in transposable elements. Our results, based on a comparative approach, reveal a great diversity in the relationship between heterochiasmy and nucleotide composition. We find that the stronger male impact on this relationship is a conserved feature of human, mouse, dog, and sheep. In contrast, variation in genomic GC content in pig and opossum is more strongly correlated with female, rather than male, RR. Moreover, we show that the sex-differential impact of recombination is mainly driven by the chromosomal localization of recombination events. Independent of sex, the higher the RR in a genomic region and the longer this recombination activity is conserved in time, the stronger the bias in nucleotide substitution pattern, through such mechanisms as biased gene conversion. Over time, this bias will increase the local GC content of the region.
|
47
|
Axelsson E, Webster MT, Ratnakumar A, Ponting CP, Lindblad-Toh K. Death of PRDM9 coincides with stabilization of the recombination landscape in the dog genome. Genome Res 2012; 22:51-63. [PMID: 22006216 PMCID: PMC3246206 DOI: 10.1101/gr.124123.111] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Received: 03/31/2011] [Accepted: 10/05/2011] [Indexed: 11/25/2022]
Abstract
Analysis of diverse eukaryotes has revealed that recombination events cluster in discrete genomic locations known as hotspots. In humans, a zinc-finger protein, PRDM9, is believed to initiate recombination in >40% of hotspots by binding to a specific DNA sequence motif. However, the PRDM9 coding sequence is disrupted in the dog genome assembly, raising questions regarding the nature and control of recombination in dogs. By analyzing the sequences of PRDM9 orthologs in a number of dog breeds and several carnivores, we show here that this gene was inactivated early in canid evolution. We next use patterns of linkage disequilibrium across more than 170,000 SNP markers typed in almost 500 dogs to estimate recombination rates in the dog genome with a coalescent-based approach. Broad-scale recombination rates show good correspondence with an existing linkage-based map. Significant variation in recombination rate is observed on the fine scale, and we are able to detect over 4000 recombination hotspots with high confidence. In contrast to human hotspots, 40% of canine hotspots are characterized by a distinct peak in GC content. A comparative genomic analysis indicates that these peaks are also present as weaker peaks in the panda, suggesting that the hotspots have been continually reinforced by accelerated and strongly GC-biased nucleotide substitutions, consistent with the long-term action of biased gene conversion on the dog lineage. These results are consistent with the loss of PRDM9 in canids, resulting in a greater evolutionary stability of recombination hotspots. The genetic determinants of recombination hotspots in the dog genome may thus reflect a fundamental process of relevance to diverse animal species.
Affiliation(s)
- Erik Axelsson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75237 Uppsala, Sweden
| | - Matthew T. Webster
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75237 Uppsala, Sweden
| | - Abhirami Ratnakumar
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75237 Uppsala, Sweden
| | - Chris P. Ponting
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75237 Uppsala, Sweden
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02139, USA
|
48
|
Webster MT, Hurst LD. Direct and indirect consequences of meiotic recombination: implications for genome evolution. Trends Genet 2011; 28:101-9. [PMID: 22154475 DOI: 10.1016/j.tig.2011.11.002] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.4] [Received: 07/13/2011] [Revised: 11/08/2011] [Accepted: 11/09/2011] [Indexed: 12/23/2022]
Abstract
There is considerable variation within eukaryotic genomes in the local rate of crossing over. Why is this and what effect does it have on genome evolution? On the genome scale, it is known that by shuffling alleles, recombination increases the efficacy of selection. By contrast, the extent to which differences in the recombination rate modulate the efficacy of selection between genomic regions is unclear. Recombination also has direct consequences on the origin and fate of mutations: biased gene conversion and other forms of meiotic drive promote the fixation of mutations in a similar way to selection, and recombination itself may be mutagenic. Consideration of both the direct and indirect effects of recombination is necessary to understand why its rate is so variable and for correct interpretation of patterns of genome evolution.
Affiliation(s)
- Matthew T Webster
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
|
49
|
Kostka D, Hubisz MJ, Siepel A, Pollard KS. The role of GC-biased gene conversion in shaping the fastest evolving regions of the human genome. Mol Biol Evol 2011; 29:1047-57. [PMID: 22075116 PMCID: PMC3278478 DOI: 10.1093/molbev/msr279] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 02/06/2023] Open
Abstract
GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that accelerates the fixation of guanine or cytosine alleles, regardless of their effects on fitness. gBGC can increase the overall rate of substitutions, a hallmark of positive selection. Many fast-evolving genes and noncoding sequences in the human genome have GC-biased substitution patterns, suggesting that gBGC, in contrast to adaptive processes, may have driven the human changes in these sequences. To investigate this hypothesis, we developed a substitution model for DNA sequence evolution that quantifies the nonlinear interacting effects of selection and gBGC on substitution rates and patterns. Based on this model, we used a series of lineage-specific likelihood ratio tests to evaluate sequence alignments for evidence of changes in mode of selection, action of gBGC, or both. With a false positive rate of less than 5% for individual tests, we found that the majority (76%) of previously identified human accelerated regions are best explained without gBGC, whereas a substantial minority (19%) are best explained by the action of gBGC alone. Further, more than half (55%) have substitution rates that significantly exceed local estimates of the neutral rate, suggesting that these regions may have been shaped by positive selection rather than by relaxation of constraint. By distinguishing the effects of gBGC, relaxation of constraint, and positive selection we provide an integrated analysis of the evolutionary forces that shaped the fastest evolving regions of the human genome, which facilitates the design of targeted functional studies of adaptation in humans.
Affiliation(s)
- Dennis Kostka
- Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA.
|
50
|
Late replicating domains are highly recombining in females but have low male recombination rates: implications for isochore evolution. PLoS One 2011; 6:e24480. [PMID: 21949720 PMCID: PMC3176772 DOI: 10.1371/journal.pone.0024480] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/06/2011] [Accepted: 08/11/2011] [Indexed: 01/01/2023] Open
Abstract
In mammals, sequences that are either late replicating or highly recombining have high rates of evolution at putatively neutral sites. As early replicating domains and highly recombining domains both tend to be GC rich, we a priori expect these two variables to covary. If so, the relative contribution of either of these variables to the local neutral substitution rate might have been wrongly estimated owing to covariance with the other. Against our expectations, we find that sex-averaged recombination rates show little or no correlation with replication timing, suggesting that they are independent determinants of substitution rates. However, this result masks significant sex-specific complexity: late replicating domains tend to have high recombination rates in females but low recombination rates in males. That these trends are antagonistic explains why sex-averaged recombination is not correlated with replication timing. This unexpected result has several important implications. First, although both male and female recombination rates covary significantly with intronic substitution rates, the magnitude of this correlation is moderately underestimated for male recombination and slightly overestimated for female recombination, owing to covariance with replication timing. Second, the result could explain why male recombination is strongly correlated with GC content but female recombination is not. If, to explain the correlation between GC content and replication timing, we suppose that late replication forces reduced GC content, then GC promotion by biased gene conversion during female recombination is partly countered by the antagonistic effect of later replicating sequence tending to increase AT content. Indeed, the strength of the correlation between female recombination rate and local GC content is more than doubled by control for replication timing. Our results underpin the need to consider sex-specific recombination rates and potential covariates in analysis of GC content and rates of evolution.
|