1
Forni D, Pozzoli U, Mozzi A, Cagliani R, Sironi M. Depletion of CpG dinucleotides in bacterial genomes may represent an adaptation to high temperatures. NAR Genom Bioinform 2024; 6:lqae088. [PMID: 39071851 PMCID: PMC11282364 DOI: 10.1093/nargab/lqae088] [Received: 03/18/2024] [Revised: 06/17/2024] [Accepted: 07/18/2024] [Indexed: 07/30/2024]
Abstract
Dinucleotide biases have been widely investigated in the genomes of eukaryotes and viruses, but not in bacteria. We assembled a dataset of bacterial genomes (>15 000), which are representative of the genetic diversity in the kingdom Eubacteria, and we analyzed dinucleotide biases in relation to different traits. We found that TpA dinucleotides are the most depleted and that CpG dinucleotides show the widest dispersion. The abundances of both dinucleotides vary with genomic G + C content and show a very strong phylogenetic signal. After accounting for G + C content and phylogenetic inertia, we analyzed different bacterial lifestyle traits. We found that temperature preferences associate with the abundance of CpG dinucleotides, with thermophiles/hyperthermophiles being particularly depleted. Conversely, the TpA dinucleotide displays a bias that depends only on genomic G + C composition. Using predictions of intrinsic cyclizability, we also show that CpG depletion may associate with higher DNA bendability in both thermophiles/hyperthermophiles and mesophiles, and that the former are predicted to have significantly more flexible genomes than the latter. We suggest that higher bendability is advantageous at high temperatures because it facilitates DNA positive supercoiling. We further suggest that, through modulation of DNA mechanical properties, local or global CpG depletion controls genome organization, most likely not only in bacteria.
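Dinucleotide abundance is classically expressed as an observed/expected ratio, rho(XY) = f(XY) / (f(X) * f(Y)). Here f(XY) is the dinucleotide frequency and f(X), f(Y) are the mononucleotide frequencies. The Python sketch below computes this ratio for all 16 dinucleotides on a single strand. It is an illustration under stated assumptions, not the authors' pipeline:

- the function name `dinucleotide_rho` is hypothetical;
- non-ACGT characters are skipped, and the two strands are not pooled;
- the G + C and phylogenetic corrections described in the abstract are not applied.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_rho(seq: str) -> dict[str, float]:
    """Hypothetical helper: single-strand observed/expected dinucleotide ratios.

    rho(XY) = f(XY) / (f(X) * f(Y)). Positions involving non-ACGT characters
    (e.g. N) are skipped. Returns NaN where the expected frequency is zero.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("need at least one ACGT dinucleotide")
    rho = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        observed = di[x + y] / n_di
        rho[x + y] = observed / expected if expected > 0 else float("nan")
    return rho
```

Values well below 1 for "CG" or "TA" indicate depletion relative to what base composition alone would predict.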
Affiliation(s)
- Diego Forni
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Uberto Pozzoli
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Alessandra Mozzi
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Rachele Cagliani
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Manuela Sironi
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
2
Johnston SE. Understanding the Genetic Basis of Variation in Meiotic Recombination: Past, Present, and Future. Mol Biol Evol 2024; 41:msae112. [PMID: 38959451 PMCID: PMC11221659 DOI: 10.1093/molbev/msae112] [Received: 04/17/2024] [Revised: 06/03/2024] [Accepted: 06/05/2024] [Indexed: 07/05/2024]
Abstract
Meiotic recombination is a fundamental feature of sexually reproducing species. It is often required for proper chromosome segregation and plays an important role in adaptation and the maintenance of genetic diversity. The molecular mechanisms of recombination are remarkably conserved across eukaryotes, yet meiotic genes and proteins show substantial variation in their sequence and function, even between closely related species. Furthermore, the rate and distribution of recombination show a huge diversity within and between chromosomes, individuals, sexes, populations, and species. This variation has implications for many molecular and evolutionary processes, yet how and why this diversity has evolved is not well understood. A key step in understanding trait evolution is to determine its genetic basis, that is, the number, effect sizes, and distribution of loci underpinning variation. In this perspective, I discuss past and current knowledge on the genetic basis of variation in recombination rate and distribution, explore its evolutionary implications, and present open questions for future research.
Affiliation(s)
- Susan E Johnston
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, UK
3
Grant AR, Johnson KP, Stanley EL, Baldwin-Brown J, Kolenčík S, Allen JM. Rapid Targeted Assembly of the Proteome Reveals Evolutionary Variation of GC Content in Avian Lice. Bioinform Biol Insights 2024; 18:11779322241257991. [PMID: 38860163 PMCID: PMC11163934 DOI: 10.1177/11779322241257991] [Received: 09/25/2023] [Accepted: 05/02/2024] [Indexed: 06/12/2024]
Abstract
Nucleotide base composition plays an influential role in the molecular mechanisms involved in gene function, phenotype, and amino acid composition. GC content (proportion of guanine and cytosine in DNA sequences) shows a high level of variation within and among species. Many studies measure GC content in a small number of genes, which may not be representative of genome-wide GC variation. One challenge when assembling extensive genomic data sets for these studies is the significant amount of resources (monetary and computational) associated with data processing, and many bioinformatic tools have not been optimized for resource efficiency. Using a high-performance computing (HPC) cluster, we manipulated the resources provided to the targeted gene assembly program, automated target restricted assembly method (aTRAM), to determine the optimum way to run the program and make the most efficient use of resources. Using our optimum assembly approach, we assembled and measured the GC content of all of the protein-coding genes of a diverse group of parasitic feather lice. Of the 499 426 genes assembled across 57 species, feather lice were GC-poor (mean GC = 42.96%) with a significant amount of variation within and between species (GC range = 19.57%-73.33%). We found a significant correlation between GC content and standard deviation per taxon for overall GC and GC3, which could indicate selection for G and C nucleotides in some species. Phylogenetic signal of GC content was detected in both GC and GC3. This research provides a large-scale investigation of GC content in parasitic lice, laying the foundation for understanding the basis of variation in base composition across species.
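GC is the G + C fraction of a coding sequence, and GC3 is the same fraction at third codon positions only. The abstract relates these per-gene values to their spread within each taxon. The sketch below computes per-gene GC and GC3 and their per-taxon mean and sample standard deviation. It rests on these assumptions:

- coding sequences are in frame, starting at the first codon position, and any trailing partial codon is ignored;
- all function names are hypothetical;
- it is not the aTRAM-based workflow used in the study.

```python
from statistics import mean, stdev


def gc_content(seq: str) -> float:
    """Hypothetical helper: G + C fraction among unambiguous A/C/G/T positions."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        raise ValueError("sequence has no A/C/G/T bases")
    return sum(b in "GC" for b in acgt) / len(acgt)


def gc3_content(cds: str) -> float:
    """Hypothetical helper: G + C fraction at third codon positions of an in-frame CDS."""
    n_codons = len(cds) // 3  # a trailing partial codon is ignored
    third_positions = "".join(cds[3 * i + 2] for i in range(n_codons))
    return gc_content(third_positions)


def summarize_taxon(cds_list: list[str]) -> dict[str, float]:
    """Mean and sample SD of per-gene GC and GC3 for one taxon (needs >= 2 genes)."""
    gc = [gc_content(s) for s in cds_list]
    gc3 = [gc3_content(s) for s in cds_list]
    return {
        "mean_gc": mean(gc), "sd_gc": stdev(gc),
        "mean_gc3": mean(gc3), "sd_gc3": stdev(gc3),
    }
```

Run on each taxon's assembled genes, `summarize_taxon` returns the per-taxon mean and spread that a GC versus standard-deviation comparison would use.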
Affiliation(s)
- Avery R Grant
- Department of Biology, University of Nevada, Reno, Reno, NV, USA
- Kevin P Johnson
- Illinois Natural History Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Edward L Stanley
- Department of Natural History, Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
- Stanislav Kolenčík
- Faculty of Mathematics, Natural Sciences, and Information Technologies, University of Primorska, Koper, Slovenia
- Julie M Allen
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA, USA
4
Joseph J, Prentout D, Laverré A, Tricou T, Duret L. High prevalence of PRDM9-independent recombination hotspots in placental mammals. Proc Natl Acad Sci U S A 2024; 121:e2401973121. [PMID: 38809707 PMCID: PMC11161765 DOI: 10.1073/pnas.2401973121] [Received: 02/08/2024] [Accepted: 04/26/2024] [Indexed: 05/31/2024]
Abstract
In many mammals, recombination events are concentrated in hotspots directed by a sequence-specific DNA-binding protein named PRDM9. Intriguingly, PRDM9 has been lost several times in vertebrates, and notably among mammals, it has been pseudogenized in the ancestor of canids. In the absence of PRDM9, recombination hotspots tend to occur in promoter-like features such as CpG islands. It has thus been proposed that one role of PRDM9 could be to direct recombination away from PRDM9-independent hotspots. However, the ability of PRDM9 to direct recombination hotspots has been assessed in only a handful of species, and a clear picture of how much recombination occurs outside of PRDM9-directed hotspots in mammals is still lacking. In this study, we derived an estimator of past recombination activity based on signatures of GC-biased gene conversion in substitution patterns. We quantified recombination activity in PRDM9-independent hotspots in 52 species of boreoeutherian mammals. We observe a wide range of recombination rates at these loci: several species (such as mice, humans, some felids, or cetaceans) show a deficit of recombination, while a majority of mammals display a clear peak of recombination. Our results demonstrate that PRDM9-directed and PRDM9-independent hotspots can coexist in mammals and that their coexistence appears to be the rule rather than the exception. Additionally, we show that the location of PRDM9-independent hotspots is relatively more stable than that of PRDM9-directed hotspots, but that PRDM9-independent hotspots nevertheless evolve slowly in concert with DNA hypomethylation.
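In the absence of PRDM9, hotspots are described as falling in promoter-like features such as CpG islands. The abstract does not say how islands were annotated. For orientation, the sketch below applies the widely used Gardiner-Garden and Frommer thresholds in sliding windows:

- 200 bp windows;
- G + C fraction above 0.5;
- CpG observed/expected above 0.6, computed as n(CG) * window / (n(C) * n(G)).

Overlapping windows that pass are then merged. The window size, the step and both function names are assumptions for illustration.

```python
def cpg_island_windows(seq: str, window: int = 200, step: int = 1,
                       min_gc: float = 0.5, min_oe: float = 0.6) -> list[tuple[int, int]]:
    """Hypothetical helper: 0-based half-open windows passing CpG-island thresholds.

    Obs/exp CpG = n(CG) * window / (n(C) * n(G)), Gardiner-Garden & Frommer style.
    """
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        n_c, n_g = w.count("C"), w.count("G")
        if n_c == 0 or n_g == 0:
            continue  # obs/exp undefined without both C and G
        gc = (n_c + n_g) / window
        oe = w.count("CG") * window / (n_c * n_g)  # "CG" matches cannot overlap
        if gc > min_gc and oe > min_oe:
            hits.append((start, start + window))
    return hits


def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open intervals into regions."""
    merged: list[tuple[int, int]] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

Stricter definitions exist (for example, the Takai and Jones criteria). The thresholds are exposed as parameters so they can be changed.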
Affiliation(s)
- Julien Joseph
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR 5558, Villeurbanne 69100, France
- Djivan Prentout
- Department of Biological Sciences, Columbia University, New York, NY 10027
- Alexandre Laverré
- Department of Ecology and Evolution, University of Lausanne, Lausanne CH-1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Théo Tricou
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR 5558, Villeurbanne 69100, France
- Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR 5558, Villeurbanne 69100, France
5
Parée T, Noble L, Ferreira Gonçalves J, Teotónio H. rec-1 loss of function increases recombination in the central gene clusters at the expense of autosomal pairing centers. Genetics 2024; 226:iyad205. [PMID: 38001364 DOI: 10.1093/genetics/iyad205] [Received: 10/03/2023] [Revised: 10/03/2023] [Accepted: 11/08/2023] [Indexed: 11/26/2023]
Abstract
Meiotic control of crossover (CO) number and position is critical for homologous chromosome segregation and organismal fertility, recombination of parental genotypes, and the generation of novel genetic combinations. Here we characterize the recombination rate landscape of a rec-1 loss-of-function modifier of CO position in Caenorhabditis elegans, one of the first such modifiers ever discovered. By averaging CO position across hermaphrodite and male meioses and by genotyping 203 single-nucleotide variants covering about 95% of the genome, we find that the characteristic chromosomal arm-center recombination rate domain structure is lost in the rec-1 loss-of-function mutant. The mutant smooths the recombination rate landscape but is insufficient to eliminate the nonuniform position of COs. Lower recombination rates in the rec-1 mutant are found particularly in the autosomal arm domains containing the pairing centers. We further find that the rec-1 mutant is of little consequence for organismal fertility and egg viability, and thus for rates of autosomal nondisjunction. It nonetheless increases X chromosome nondisjunction rates and thus the appearance of males. Our findings question the maintenance of recombination rate heritability and genetic diversity among C. elegans natural populations. They further suggest that manipulating genetic modifiers of CO position will help find quantitative trait loci located in low-recombining genomic regions normally refractory to discovery.
Affiliation(s)
- Tom Parée
- Institut de Biologie de l'École Normale Supérieure, CNRS UMR 8197, Inserm U1024, PSL Research University, Paris F-75005, France
- Luke Noble
- Institut de Biologie de l'École Normale Supérieure, CNRS UMR 8197, Inserm U1024, PSL Research University, Paris F-75005, France
- EnviroDNA, 95 Albert St., Brunswick, Victoria 3065, Australia
- João Ferreira Gonçalves
- Institut de Biologie de l'École Normale Supérieure, CNRS UMR 8197, Inserm U1024, PSL Research University, Paris F-75005, France
- Henrique Teotónio
- Institut de Biologie de l'École Normale Supérieure, CNRS UMR 8197, Inserm U1024, PSL Research University, Paris F-75005, France
6
Forni D, Pozzoli U, Cagliani R, Sironi M. Dinucleotide biases in the genomes of prokaryotic and eukaryotic dsDNA viruses and their hosts. Mol Ecol 2024; 33:e17287. [PMID: 38263702 DOI: 10.1111/mec.17287] [Received: 09/18/2023] [Revised: 12/21/2023] [Accepted: 01/15/2024] [Indexed: 01/25/2024]
Abstract
The genomes of cellular organisms display CpG and TpA dinucleotide composition biases. Such biases have been poorly investigated in dsDNA viruses. Here, we show that in dsDNA virus, bacterial, and eukaryotic genomes, the representation of TpA and CpG dinucleotides is strongly dependent on genomic G + C content. Thus, the classical observed/expected ratios do not fully capture dinucleotide biases across genomes. Because a larger portion of the variance in TpA frequency was explained by G + C content, we explored which additional factors drive the distribution of CpG dinucleotides. We used the residuals of the linear regressions as a measure of dinucleotide abundance, together with ancestral state reconstruction across eukaryotic and prokaryotic virus trees. This identified an important role for phylogeny in driving CpG representation. Nonetheless, phylogenetic ANOVA analyses showed that a few host associations also account for significant variation. Among eukaryotic viruses, the most significant differences were observed between arthropod-infecting viruses and viruses that infect vertebrates or unicellular organisms. However, an effect of viral DNA methylation status (driven either by the host or by virus-encoded methyltransferases) is also likely. Among prokaryotic viruses, cyanobacteria-infecting phages were significantly CpG-depleted, whereas phages that infect bacteria in the genera Burkholderia and Staphylococcus were CpG-rich. Comparison with bacterial genomes indicated that this effect is largely driven by the general tendency of phages to resemble the host's genomic CpG content. Notably, this tendency is stronger for temperate than for lytic phages. Our data shed light on the processes that shape virus genome composition and inform manipulation strategies for biotechnological applications.
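A residual-based abundance measure of this kind can be sketched as follows. Regress each genome's CpG frequency on its G + C content across genomes and keep the residuals. A positive residual means more CpG than expected for that G + C content; a negative residual means relative depletion. This minimal Python version rests on these assumptions:

- ordinary least squares with an intercept;
- CpG counted on one strand, per dinucleotide position;
- hypothetical function names.

The study's exact model specification is not given in the abstract.

```python
from statistics import mean


def cpg_frequency(seq: str) -> float:
    """Hypothetical helper: CpG count per dinucleotide position (one strand)."""
    seq = seq.upper()
    return seq.count("CG") / (len(seq) - 1)  # "CG" matches cannot overlap


def gc_fraction(seq: str) -> float:
    """Hypothetical helper: G + C fraction of the whole sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def ols_residuals(x: list[float], y: list[float]) -> list[float]:
    """Residuals of y ~ a + b * x fitted by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def cpg_residuals(genomes: dict[str, str]) -> dict[str, float]:
    """Hypothetical helper: CpG abundance relative to the across-genome G + C trend."""
    names = list(genomes)
    gc = [gc_fraction(genomes[n]) for n in names]
    cpg = [cpg_frequency(genomes[n]) for n in names]
    return dict(zip(names, ols_residuals(gc, cpg)))
```

With an intercept in the model, the residuals sum to zero. Each value therefore reads as a departure from the shared G + C trend rather than as an absolute CpG level.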
Affiliation(s)
- Diego Forni
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Uberto Pozzoli
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Rachele Cagliani
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Manuela Sironi
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
7
Liu Y, Liang N, Xian Q, Zhang W. GC heterogeneity reveals sequence-structures evolution of angiosperm ITS2. BMC Plant Biol 2023; 23:608. [PMID: 38036992 PMCID: PMC10691020 DOI: 10.1186/s12870-023-04634-9] [Received: 04/21/2023] [Accepted: 11/26/2023] [Indexed: 12/02/2023]
Abstract
BACKGROUND Although GC variation constitutes a fundamental element of genome and species diversity, the precise mechanisms driving it remain unclear. The abundant sequence data available for ITS2, a commonly employed phylogenetic marker in plants, offer an exceptional resource for exploring GC variation across angiosperms. RESULTS A comprehensive selection of 8666 species, comprising 165 genera, 63 families, and 30 orders, was used for the analyses. The alignment of ITS2 sequence-structures and partitioning of secondary structures into paired and unpaired regions were performed using 4SALE. Substitution rates and frequencies among GC base-pairs in the paired regions of ITS2 were calculated using RNA-specific models in the PHASE package. The results showed that the distribution of ITS2 GC content on the angiosperm phylogeny was heterogeneous. However, increases in GC content were generally associated with ITS2 sequence homogenization, thereby supporting the occurrence of GC-biased gene conversion (gBGC) during the concerted evolution of ITS2. Additionally, the GC content in the paired regions of the ITS2 secondary structure was significantly higher than that of the unpaired regions, indicating selection for GC for thermodynamic stability. Furthermore, the RNA substitution models demonstrated that base-pair transformations favored both the elevation and fixation of GC in the paired regions, providing further support for gBGC. CONCLUSIONS Our findings highlight the significance of secondary structure in GC investigations. They demonstrate that both gBGC and structure-based selection are influential factors driving angiosperm ITS2 GC content.
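Once a secondary structure is written in dot-bracket notation, GC content can be split into paired and unpaired positions. In that notation, '(' and ')' mark paired bases and '.' marks unpaired ones. The study used 4SALE for alignment and structure partitioning. The sketch below instead assumes a single sequence with a precomputed dot-bracket string, and `gc_by_pairing` is a hypothetical name.

```python
def gc_by_pairing(seq: str, structure: str) -> tuple[float, float]:
    """Hypothetical helper: (GC of paired positions, GC of unpaired positions).

    `structure` is plain dot-bracket: '(' and ')' mark paired bases, '.' unpaired.
    Returns NaN for a class that has no positions. Works for RNA (U) or DNA (T).
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    depth = 0  # running bracket depth, used only to check the structure is balanced
    paired, unpaired = [], []
    for base, sym in zip(seq.upper(), structure):
        if sym == "(":
            depth += 1
            paired.append(base)
        elif sym == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in structure")
            paired.append(base)
        elif sym == ".":
            unpaired.append(base)
        else:
            raise ValueError(f"unexpected structure symbol {sym!r}")
    if depth != 0:
        raise ValueError("unbalanced '(' in structure")

    def gc(bases: list[str]) -> float:
        return sum(b in "GC" for b in bases) / len(bases) if bases else float("nan")

    return gc(paired), gc(unpaired)
```

Comparing the two returned values gives the paired-versus-unpaired contrast the abstract reports. Notations with extra pseudoknot brackets (such as '[' and ']') would need additional symbols.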
Affiliation(s)
- Yubo Liu
- Marine College, Shandong University, Weihai, 264209, China
- Division of Physical Biology, CAS Key Laboratory of Interfacial Physics and Technology, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, 201800, China
- Nan Liang
- Marine College, Shandong University, Weihai, 264209, China
- Allergy Department, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Qing Xian
- Marine College, Shandong University, Weihai, 264209, China
- Wei Zhang
- Marine College, Shandong University, Weihai, 264209, China
8
Salazar-Tortosa DF, Huang YF, Enard D. Assessing the Presence of Recent Adaptation in the Human Genome With Mixture Density Regression. Genome Biol Evol 2023; 15:evad170. [PMID: 37713622 PMCID: PMC10563788 DOI: 10.1093/gbe/evad170] [Received: 03/18/2023] [Revised: 08/30/2023] [Accepted: 09/04/2023] [Indexed: 09/17/2023]
Abstract
How much genome differences between species reflect neutral or adaptive evolution is a central question in evolutionary genomics. In humans and other mammals, the presence of adaptive versus neutral genomic evolution has proven particularly difficult to quantify. The difficulty notably stems from the highly heterogeneous organization of mammalian genomes at multiple levels (functional sequence density, recombination, etc.) which complicates the interpretation and distinction of adaptive versus neutral evolution signals. In this study, we introduce mixture density regressions (MDRs) for the study of the determinants of recent adaptation in the human genome. MDRs provide a flexible regression model based on multiple Gaussian distributions. We use MDRs to model the association between recent selection signals and multiple genomic factors likely to affect the occurrence/detection of positive selection, if the latter was present in the first place to generate these associations. We find that an MDR model with two Gaussian distributions provides an excellent fit to the genome-wide distribution of a common sweep summary statistic (integrated haplotype score), with one of the two distributions likely enriched in positive selection. We further find several factors associated with signals of recent adaptation, including the recombination rate, the density of regulatory elements in immune cells, GC content, gene expression in immune cells, the density of mammal-wide conserved elements, and the distance to the nearest virus-interacting gene. These results support the presence of strong positive selection in recent human evolution and highlight MDRs as a powerful tool to make sense of signals of recent genomic adaptation.
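Mixture density regression conditions the weights, means and variances of a Gaussian mixture on genomic covariates. The sketch below is a simplified, covariate-free illustration of that core, not the authors' model. It fits a two-component univariate Gaussian mixture by expectation-maximization (EM), the kind of fit one might apply to a genome-wide distribution of a sweep statistic such as iHS. The function names, the half-split initialisation, the fixed iteration count and the synthetic data generator are all assumptions.

```python
import math
import random


def normal_pdf(x: float, mu: float, sd: float) -> float:
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(data: list[float], n_iter: int = 200) -> list[tuple[float, float, float]]:
    """Hypothetical helper: EM fit of a 2-component 1-D Gaussian mixture.

    Returns [(weight, mean, sd), (weight, mean, sd)]. No covariates are used.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 data points")
    # Initialise one component on the lower half of the data, one on the upper half.
    params = []
    for part in (xs[: n // 2], xs[n // 2:]):
        mu = sum(part) / len(part)
        sd = math.sqrt(sum((x - mu) ** 2 for x in part) / len(part)) or 1.0
        params.append((0.5, mu, sd))
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 0.
        r0 = []
        for x in xs:
            p0 = params[0][0] * normal_pdf(x, params[0][1], params[0][2])
            p1 = params[1][0] * normal_pdf(x, params[1][1], params[1][2])
            r0.append(p0 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: responsibility-weighted re-estimation of weight, mean and sd.
        new_params = []
        for resp in (r0, [1.0 - r for r in r0]):
            nk = max(sum(resp), 1e-12)
            mu = sum(r * x for r, x in zip(resp, xs)) / nk
            var = sum(r * (x - mu) ** 2 for r, x in zip(resp, xs)) / nk
            new_params.append((nk / n, mu, max(math.sqrt(var), 1e-6)))
        params = new_params
    return params


def simulate_mixture(n_each: int, seed: int = 0) -> list[float]:
    """Synthetic illustration data: n_each draws each from N(0, 1) and N(5, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(n_each)] + [rng.gauss(5, 1) for _ in range(n_each)]
```

In an actual MDR, the weights, means and standard deviations would be modelled as functions of predictors such as recombination rate or GC content. Here they are single constants.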
Affiliation(s)
- Diego F Salazar-Tortosa
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona, USA
- Department of Ecology, University of Granada, Granada, Spain
- Yi-Fei Huang
- Department of Biology, Pennsylvania State University, University Park, State College, Pennsylvania, PA 16801, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, State College, Pennsylvania, PA 16801, USA
- David Enard
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona, USA
9
Brovkina MV, Chapman MA, Holding ML, Clowney EJ. Emergence and influence of sequence bias in evolutionarily malleable, mammalian tandem arrays. BMC Biol 2023; 21:179. [PMID: 37612705 PMCID: PMC10463633 DOI: 10.1186/s12915-023-01673-4] [Received: 04/25/2023] [Accepted: 08/01/2023] [Indexed: 08/25/2023]
Abstract
BACKGROUND The radiation of mammals at the extinction of the dinosaurs produced a plethora of new forms, as diverse as bats, dolphins, and elephants, in only 10-20 million years. Behind the scenes, adaptation to new niches is accompanied by extensive innovation in large families of genes that allow animals to contact the environment, including chemosensors, xenobiotic enzymes, and immune and barrier proteins. Genes in these "outward-looking" families are allelically diverse among humans and exhibit tissue-specific and sometimes stochastic expression. RESULTS Here, we show that these tandem arrays of outward-looking genes occupy AT-biased isochores and comprise the "tissue-specific" gene class whose promoters lack CpG islands. Models of mammalian genome evolution have not incorporated the sharply different functions and transcriptional patterns of genes in AT- versus GC-biased regions. To examine the relationship between gene family expansion, sequence content, and allelic diversity, we use population genetic data and comparative analysis. First, we find that AT bias can emerge during evolutionary expansion of gene families in cis. Second, human genes in AT-biased isochores or with GC-poor promoters experience relatively low rates of de novo point mutation today but are enriched for non-synonymous variants. Finally, we find that isochores containing gene clusters exhibit low rates of recombination. CONCLUSIONS Our analyses suggest that tolerance of non-synonymous variation and low recombination are two forces that have produced the depletion of GC bases in outward-facing gene arrays. In turn, high AT content exerts a profound effect on their chromatin organization and transcriptional regulation.
Affiliation(s)
- Margarita V Brovkina
- Graduate Program in Cellular and Molecular Biology, University of Michigan Medical School, Ann Arbor, MI, USA
- Margaret A Chapman
- Neurosciences Graduate Program, University of Michigan Medical School, Ann Arbor, MI, USA
- E Josephine Clowney
- Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, USA
- Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
10
Wu C, Paradis NJ, Lakernick PM, Hryb M. L-shaped distribution of the relative substitution rate (c/μ) observed for SARS-COV-2's genome, inconsistent with the selectionist theory, the neutral theory and the nearly neutral theory but a near-neutral balanced selection theory: Implication on "neutralist-selectionist" debate. Comput Biol Med 2023; 153:106522. [PMID: 36638615 PMCID: PMC9814386 DOI: 10.1016/j.compbiomed.2022.106522] [Received: 06/25/2022] [Revised: 12/17/2022] [Accepted: 12/31/2022] [Indexed: 01/07/2023]
Abstract
The genomic substitution rate (GSR) of SARS-CoV-2 exhibits a molecular clock feature and does not change under fluctuating environmental factors such as the size of the infected human population (10^0-10^7), vaccination, etc. The molecular clock feature is believed to be inconsistent with the selectionist theory (ST). The GSR shows a lack of dependence on the effective population size, suggesting that Ohta's nearly neutral theory (ONNT) is not applicable to this virus. The large variation of the substitution rate within its genome is also inconsistent with Kimura's neutral theory (KNT). Thus, all three existing evolutionary theories fail to explain the evolutionary nature of this virus. In this paper, we proposed a Segment Substitution Rate Model (SSRM) under non-neutral selection. We pointed out that a balance between negative and positive selection on some segments could also lead to the molecular clock feature. We named this hybrid mechanism the near-neutral balanced selection theory (NNBST) and examined whether SARS-CoV-2 follows it, using three independent sets of SARS-CoV-2 genomes selected by the Nextstrain team. Intriguingly, the relative substitution rate of this virus exhibited an L-shaped probability distribution, consistent with NNBST. It was inconsistent with the alternatives: the Poisson distribution predicted by KNT, the asymmetric distribution predicted by ONNT (in which nearly neutral sites are believed to be only slightly deleterious), and the distribution lacking nearly neutral sites predicted by ST. The substitution rates of some segments were time-dependent and correlated with vaccination, supporting NNBST. Our relative substitution rate method provides a tool to resolve the long-standing "neutralist-selectionist" controversy. Implications of NNBST for resolving Lewontin's Paradox are also discussed.
Affiliation(s)
- Chun Wu
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, NJ, 08028, USA; Department of Biological & Biomedical Sciences, Rowan University, Glassboro, NJ, 08028, USA
- Nicholas J Paradis
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, NJ, 08028, USA
- Phillip M Lakernick
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, NJ, 08028, USA
- Mariya Hryb
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, NJ, 08028, USA
11
Bergman J, Schierup MH. Evolutionary dynamics of pseudoautosomal region 1 in humans and great apes. Genome Biol 2022; 23:215. [PMID: 36253794 PMCID: PMC9575207 DOI: 10.1186/s13059-022-02784-x] [Received: 11/30/2021] [Accepted: 09/30/2022] [Indexed: 12/03/2022]
Abstract
BACKGROUND The pseudoautosomal region 1 (PAR1) is a 2.7 Mb telomeric region of human sex chromosomes. PAR1 has a crucial role in ensuring proper segregation of sex chromosomes during male meiosis, exposing it to extreme recombination and mutation processes. We investigate PAR1 evolution using population genomic datasets of extant humans, eight populations of great apes, and two archaic human genome sequences. RESULTS We find that PAR1 is fast evolving and closer to evolutionary nucleotide equilibrium than autosomal telomeres. We detect a difference between substitution patterns and extant diversity in PAR1, mainly driven by the conflict between strong mutation and recombination-associated fixation bias at CpG sites. We detect excess C-to-G mutations in PAR1 of all great apes, specific to the mutagenic effect of male recombination. Despite recent evidence for Y chromosome introgression from humans into Neanderthals, we find that the Neanderthal PAR1 retained similarity to the Denisovan sequence. We find differences between substitution spectra of these archaics suggesting rapid evolution of PAR1 in recent hominin history. Frequency analysis of alleles segregating in females and males provided no evidence for recent sexual antagonism in this region. We study repeat content and double-strand break hotspot regions in PAR1 and find that they may play roles in ensuring the obligate X-Y recombination event during male meiosis. CONCLUSIONS Our study provides an unprecedented quantification of population genetic forces governing PAR1 biology across extant and extinct hominids. PAR1 evolutionary dynamics are predominantly governed by recombination processes with a strong impact on mutation patterns across all species.
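A common way to express "closeness to nucleotide equilibrium" is to compare current GC content with the stationary (equilibrium) GC content implied by the substitution pattern. That equilibrium value is GC* = u / (u + v), where u is the per-site rate of weak-to-strong (A/T to G/C) substitutions and v is the per-site rate of strong-to-weak substitutions. The abstract does not state that this exact measure was used. The sketch below rests on these assumptions:

- input is an ungapped alignment of one ancestral and one derived A/C/G/T sequence;
- substitutions are classified by their ancestral state;
- CpG context and multiple hits are ignored;
- the function name is hypothetical.

```python
WEAK, STRONG = set("AT"), set("GC")


def equilibrium_gc(ancestral: str, derived: str) -> dict[str, float]:
    """Hypothetical helper: current GC and equilibrium GC* = u / (u + v).

    u = W->S substitutions per ancestral weak site,
    v = S->W substitutions per ancestral strong site.
    Assumes an ungapped alignment of A/C/G/T sequences.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    n_w = n_s = ws = sw = 0
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a in WEAK:
            n_w += 1
            ws += d in STRONG  # weak ancestral base became strong
        elif a in STRONG:
            n_s += 1
            sw += d in WEAK  # strong ancestral base became weak
    u = ws / n_w if n_w else 0.0
    v = sw / n_s if n_s else 0.0
    current = sum(b in STRONG for b in derived.upper()) / len(derived)
    gc_star = u / (u + v) if (u + v) > 0 else float("nan")
    return {"gc_current": current, "gc_equilibrium": gc_star}
```

When `gc_current` is close to `gc_equilibrium`, the sequence is near compositional equilibrium. A large gap indicates that base composition is still shifting.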
Affiliation(s)
- Juraj Bergman
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
12
Revealing the Complete Chloroplast Genome of an Andean Horticultural Crop, Sweet Cucumber (Solanum muricatum), and Its Comparison with Other Solanaceae Species. Data 2022. [DOI: 10.3390/data7090123] [Indexed: 11/17/2022]
Abstract
Sweet cucumber (Solanum muricatum) sect. Basarthrum is a neglected horticultural crop native to the Andean region. It is naturally distributed very close to two other Solanum crops of high importance, potatoes and tomatoes. To date, molecular tools for this crop remain lacking. In this study, the complete sweet cucumber chloroplast (cp) genome was obtained and compared with those of seven Solanaceae species. The cp genome of S. muricatum was 155,681 bp in length and included a large single-copy (LSC) region of 86,182 bp and a small single-copy (SSC) region of 18,360 bp, separated by a pair of inverted repeat (IR) regions of 25,568 bp. The cp genome possessed 87 protein-coding genes (CDS), 37 transfer RNA (tRNA) genes, eight ribosomal RNA (rRNA) genes, and one pseudogene. Furthermore, 48 perfect microsatellites were identified. These repeats were mainly located in the noncoding regions. Whole cp genome comparative analysis revealed that the SSC and LSC regions showed more divergence than the IR regions. Similar to previous studies, our phylogenetic analysis showed that S. muricatum is a sister species to members of sections Petota + Lycopersicum + Etuberosum. We expect that this first sweet cucumber chloroplast genome will provide potential molecular markers and genomic resources for genetic diversity and population studies of S. muricatum, which will allow us to identify varieties and ecotypes. Finally, the features and the structural differentiation will provide information about genes of interest. This will generate tools for more precise selection of the best sweet cucumber individuals, in less time and with fewer resources.
13
Lian Q, Solier V, Walkemeier B, Durand S, Huettel B, Schneeberger K, Mercier R. The megabase-scale crossover landscape is largely independent of sequence divergence. Nat Commun 2022; 13:3828. [PMID: 35780220 PMCID: PMC9250513 DOI: 10.1038/s41467-022-31509-8] [Received: 01/10/2022] [Accepted: 06/20/2022] [Indexed: 02/01/2023]
Abstract
Meiotic recombination frequency varies along chromosomes and strongly correlates with sequence divergence. However, the causal relationship between recombination landscapes and polymorphisms is unclear. Here, we characterize the genome-wide recombination landscape in the quasi-absence of polymorphisms, using Arabidopsis thaliana homozygous inbred lines in which a few hundred genetic markers were introduced through mutagenesis. We find that megabase-scale recombination landscapes in inbred lines are strikingly similar to the recombination landscapes in hybrids, with the notable exception of heterozygous large rearrangements where recombination is prevented locally. In addition, the megabase-scale recombination landscape can be largely explained by chromatin features. Our results show that polymorphisms are not a major determinant of the shape of the megabase-scale recombination landscape but rather favour alternative models in which recombination and chromatin shape sequence divergence across the genome. The frequency of recombination varies along chromosomes and highly correlates with sequence divergence. Here, the authors show that polymorphisms are not a major determinant of the megabase-scale recombination landscape in Arabidopsis, which is rather determined by chromatin accessibility and DNA methylation.
Affiliation(s)
- Qichao Lian
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
- Victor Solier
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
- Birgit Walkemeier
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
- Stéphanie Durand
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
- Bruno Huettel
- Max Planck-Genome-centre Cologne, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
- Korbinian Schneeberger
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany; Faculty of Biology, LMU Munich, 82152, Planegg-Martinsried, Germany
- Raphael Mercier
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
14
Ho AT, Hurst LD. Unusual mammalian usage of TGA stop codons reveals that sequence conservation need not imply purifying selection. PLoS Biol 2022; 20:e3001588. [PMID: 35550630 PMCID: PMC9129041 DOI: 10.1371/journal.pbio.3001588] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/17/2022] [Revised: 05/24/2022] [Accepted: 04/20/2022] [Indexed: 11/18/2022]
Abstract
The assumption that conservation of sequence implies the action of purifying selection is central to diverse methodologies to infer functional importance. GC-biased gene conversion (gBGC), a meiotic mismatch repair bias strongly favouring GC over AT, can in principle mimic the action of selection, this being thought to be especially important in mammals. As mutation is GC→AT biased, to demonstrate that gBGC does indeed cause false signals requires evidence that an AT-rich residue is selectively optimal compared to its more GC-rich allele, while showing also that the GC-rich alternative is conserved. We propose that mammalian stop codon evolution provides a robust test case. Although in most taxa TAA is the optimal stop codon, TGA is both abundant and conserved in mammalian genomes. We show that this mammalian exceptionalism is well explained by gBGC mimicking purifying selection and that TAA is the selectively optimal codon. Supportive of gBGC, we observe (i) that TGA usage trends are consistent at the focal stop codon and elsewhere (in UTR sequences); (ii) that higher TGA usage and higher TAA→TGA substitution rates are predicted by a high recombination rate; and (iii) that across species the difference in TAA↔TGA substitution rates between GC-rich and GC-poor genes is largest in genomes that possess higher between-gene GC variation. TAA optimality is supported both by enrichment in highly expressed genes and trends associated with effective population size. High TGA usage and high TAA→TGA rates in mammals are thus consistent with gBGC’s predicted ability to “drive” deleterious mutations and support the hypothesis that sequence conservation need not be indicative of purifying selection. A general trend for GC-rich trinucleotides to reside at frequencies far above their mutational equilibrium in high recombining domains supports the generality of these results.
Affiliation(s)
- Alexander Thomas Ho
- Milner Centre for Evolution, University of Bath, Bath, United Kingdom
15
Laverre A, Tannier E, Necsulea A. Long-range promoter-enhancer contacts are conserved during evolution and contribute to gene expression robustness. Genome Res 2021; 32:280-296. [PMID: 34930799 PMCID: PMC8805723 DOI: 10.1101/gr.275901.121] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/16/2021] [Accepted: 12/16/2021] [Indexed: 11/25/2022]
Abstract
Gene expression is regulated through complex molecular interactions, involving cis-acting elements that can be situated far away from their target genes. Data on long-range contacts between promoters and regulatory elements are rapidly accumulating. However, it remains unclear how these regulatory relationships evolve and how they contribute to the establishment of robust gene expression profiles. Here, we address these questions by comparing genome-wide maps of promoter-centered chromatin contacts in mouse and human. We show that there is significant evolutionary conservation of cis-regulatory landscapes, indicating that selective pressures act to preserve not only regulatory element sequences but also their chromatin contacts with target genes. The extent of evolutionary conservation is remarkable for long-range promoter–enhancer contacts, illustrating how the structure of regulatory landscapes constrains large-scale genome evolution. We show that the evolution of cis-regulatory landscapes, measured in terms of distal element sequences, synteny, or contacts with target genes, is significantly associated with gene expression evolution.
Affiliation(s)
- Alexandre Laverre
- Université de Lyon, Université Claude Bernard Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive
- Eric Tannier
- Université de Lyon, Université Claude Bernard Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive, Centre de recherche Inria de Lyon
16
Lucena-Perez M, Kleinman-Ruiz D, Marmesat E, Saveljev AP, Schmidt K, Godoy JA. Bottleneck-associated changes in the genomic landscape of genetic diversity in wild lynx populations. Evol Appl 2021; 14:2664-2679. [PMID: 34815746 PMCID: PMC8591332 DOI: 10.1111/eva.13302] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/14/2021] [Revised: 08/17/2021] [Accepted: 09/08/2021] [Indexed: 01/06/2023]
Abstract
Demographic bottlenecks generally reduce genetic diversity through more intense genetic drift, but their net effect may vary along the genome due to the random nature of genetic drift and to local effects of recombination, mutation, and selection. Here, we analyzed the changes in genetic diversity following a bottleneck by comparing whole-genome diversity patterns in populations with and without severe recent documented declines of Iberian (Lynx pardinus, n = 31) and Eurasian lynx (Lynx lynx, n = 29). As expected, overall genomic diversity correlated negatively with bottleneck intensity and/or duration. Correlations of genetic diversity with divergence, chromosome size, gene or functional site content, GC content, or recombination were observed in nonbottlenecked populations, but were weaker in bottlenecked populations. Also, functional features under intense purifying selection and the X chromosome showed an increase in the observed density of variants, even resulting in higher θW diversity than in nonbottlenecked populations. Increased diversity seems to be related to both a higher mutational input in those regions creating a large collection of low-frequency variants, a few of which increase in frequency during the bottleneck to the point they become detectable with our limited sample, and the reduced efficacy of purifying selection, which affects not only protein structure and function but also the regulation of gene expression. The results of this study alert to the possible reduction of fitness and adaptive potential associated with the genomic erosion in regulatory elements. Further, the detection of a gain of diversity in ultra-conserved elements can be used as a sensitive and easy-to-apply signature of genetic erosion in wild populations.
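The θW diversity referred to above is Watterson's estimator. It divides the number of segregating sites S by a harmonic number a_n that depends only on the number of sampled chromosomes n. Every segregating site counts equally whatever its allele frequency, so a surplus of low-frequency variants raises θW directly. The sketch below illustrates the generic estimator, not the authors' pipeline. The function and argument names are hypothetical.

```python
def watterson_theta(num_segregating: int, num_sequences: int, num_sites: int = 1) -> float:
    """Watterson's estimator theta_W = S / a_n, optionally expressed per site.

    num_segregating: S, the number of segregating (polymorphic) sites observed.
    num_sequences:   n, the number of sampled chromosomes (2 x individuals for diploids).
    num_sites:       number of callable sites; use 1 to get theta_W per locus.
    """
    if num_sequences < 2:
        raise ValueError("at least two sequences are needed")
    if num_sites < 1:
        raise ValueError("num_sites must be positive")
    # a_n = sum_{i=1}^{n-1} 1/i depends only on sample size, not on allele frequencies,
    # so each extra segregating site (even a singleton) raises theta_W by the same amount.
    a_n = sum(1.0 / i for i in range(1, num_sequences))
    return num_segregating / a_n / num_sites


# Hypothetical numbers: 31 diploid individuals -> 62 chromosomes, 1 Mb of callable sequence.
theta_per_bp = watterson_theta(num_segregating=1200, num_sequences=62, num_sites=1_000_000)
```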
Affiliation(s)
- Maria Lucena-Perez
- Departamento de Ecología Integrativa, Estación Biológica de Doñana (CSIC), Sevilla, Spain
- Daniel Kleinman-Ruiz
- Departamento de Ecología Integrativa, Estación Biológica de Doñana (CSIC), Sevilla, Spain
- Departamento de Genética, Facultad de Biología, Universidad Complutense, Madrid, Spain
- Elena Marmesat
- Departamento de Ecología Integrativa, Estación Biológica de Doñana (CSIC), Sevilla, Spain
- Alexander P Saveljev
- Department of Animal Ecology, Russian Research Institute of Game Management and Fur Farming, Kirov, Russia
- Krzysztof Schmidt
- Mammal Research Institute, Polish Academy of Sciences, Białowieża, Poland
- José A Godoy
- Departamento de Ecología Integrativa, Estación Biológica de Doñana (CSIC), Sevilla, Spain
17
Jackson EK, Bellott DW, Skaletsky H, Page DC. GC-biased gene conversion in X-chromosome palindromes conserved in human, chimpanzee, and rhesus macaque. G3 GENES|GENOMES|GENETICS 2021; 11:6317831. [PMID: 34849781 PMCID: PMC8981503 DOI: 10.1093/g3journal/jkab224] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/21/2021] [Accepted: 06/28/2021] [Indexed: 12/03/2022]
Abstract
Gene conversion is GC-biased across a wide range of taxa. Large palindromes on mammalian sex chromosomes undergo frequent gene conversion that maintains arm-to-arm sequence identity greater than 99%, which may increase their susceptibility to the effects of GC-biased gene conversion. Here, we demonstrate a striking history of GC-biased gene conversion in 12 palindromes conserved on the X chromosomes of human, chimpanzee, and rhesus macaque. Primate X-chromosome palindrome arms have significantly higher GC content than flanking single-copy sequences. Nucleotide replacements that occurred in human and chimpanzee palindrome arms over the past 7 million years are one-and-a-half times as GC-rich as the ancestral bases they replaced. Using simulations, we show that our observed pattern of nucleotide replacements is consistent with GC-biased gene conversion with a magnitude of 70%, similar to previously reported values based on analyses of human meioses. However, GC-biased gene conversion since the divergence of human and rhesus macaque explains only a fraction of the observed difference in GC content between palindrome arms and flanking sequence, suggesting that palindromes are older than 29 million years and/or had elevated GC content at the time of their formation. This work supports a greater than 2:1 preference for GC bases over AT bases during gene conversion and demonstrates that the evolution and composition of mammalian sex chromosome palindromes are strongly influenced by GC-biased gene conversion.
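The two figures quoted in this abstract can be related as follows. This rests on an assumption of ours, not a definition taken from the paper: that a gBGC magnitude b = 0.70 means 70% of AT/GC mismatches resolved by gene conversion end up as G or C, and 30% as A or T.

```latex
b = 0.70, \qquad
\frac{P(\text{GC retained})}{P(\text{AT retained})} = \frac{b}{1-b} = \frac{0.70}{0.30} \approx 2.33 > 2
```

Under that reading, a 70% bias corresponds to roughly 2.33 GC outcomes per AT outcome. That ratio is what a "greater than 2:1" preference describes.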
Affiliation(s)
- Emily K Jackson
- Whitehead Institute, Cambridge, MA 02142, USA
- Howard Hughes Medical Institute, Whitehead Institute, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Helen Skaletsky
- Whitehead Institute, Cambridge, MA 02142, USA
- Howard Hughes Medical Institute, Whitehead Institute, Cambridge, MA 02142, USA
- David C Page
- Whitehead Institute, Cambridge, MA 02142, USA
- Howard Hughes Medical Institute, Whitehead Institute, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
18
Seplyarskiy VB, Sunyaev S. The origin of human mutation in light of genomic data. Nat Rev Genet 2021; 22:672-686. [PMID: 34163020 DOI: 10.1038/s41576-021-00376-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Accepted: 05/06/2021] [Indexed: 02/05/2023]
Abstract
Despite years of active research into the role of DNA repair and replication in mutagenesis, surprisingly little is known about the origin of spontaneous human mutation in the germ line. With the advent of high-throughput sequencing, genome-scale data have revealed statistical properties of mutagenesis in humans. These properties include variation of the mutation rate and spectrum along the genome at different scales in relation to epigenomic features and dependency on parental age. Moreover, mutations originating in mothers are less frequent than mutations originating in fathers and have a distinct genomic distribution. Statistical analyses that interpret these patterns in the context of known biochemistry can provide mechanistic models of mutagenesis in humans.
Affiliation(s)
- Vladimir B Seplyarskiy
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Shamil Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
19
Fu Y, Mahmoud M, Muraliraman VV, Sedlazeck FJ, Treangen TJ. Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment. Gigascience 2021; 10:6375129. [PMID: 34561697 PMCID: PMC8463296 DOI: 10.1093/gigascience/giab063] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/02/2021] [Revised: 07/22/2021] [Accepted: 08/29/2021] [Indexed: 01/23/2023]
Abstract
BACKGROUND: Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that have primarily focused on either speed or accuracy. Various heuristics and scoring schemas have been implemented in widely used read mappers (minimap2 and NGMLR) to optimize for speed or accuracy, which have variable performance across different genomic regions and for specific structural variants. Our hypothesis is that constraining read mapping to the use of a single gap penalty across distinct mutational hot spots reduces read alignment accuracy and impedes structural variant detection. FINDINGS: We tested our hypothesis by implementing a read-mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. The high-level idea is that Vulcan leverages the computed normalized edit distance of the mapped reads via minimap2 to identify poorly aligned reads and realigns them using the more accurate yet computationally more expensive long-read mapper (NGMLR). In support of our hypothesis, we show that Vulcan improves the alignments for Oxford Nanopore Technology long reads for both simulated and real datasets. These improvements, in turn, lead to improved accuracy for structural variant calling performance on human genome datasets compared to either of the read-mapping methods alone. CONCLUSIONS: Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes for improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan.
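The dual-mode idea reduces to a filtering step: score each fast alignment by its normalized edit distance and send only the worst-scoring reads to the slower, more accurate mapper. The sketch below makes two assumptions. First, the normalized edit distance is taken as the edit distance (for example the SAM NM tag) divided by the aligned query length. Second, the cutoff is a hypothetical fixed threshold. Vulcan's actual definition and default cutoff may differ, and the code does not call minimap2 or NGMLR. The `Alignment` record and the function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Alignment:
    read_id: str
    edit_distance: int   # e.g. the NM tag of a primary alignment from the fast mapper
    aligned_length: int  # number of query bases covered by the alignment


def normalized_edit_distance(aln: Alignment) -> float:
    """Edit distance per aligned base; 0.0 means a perfect match."""
    if aln.aligned_length <= 0:
        return float("inf")  # an empty alignment is treated as the worst case
    return aln.edit_distance / aln.aligned_length


def split_for_realignment(
    alignments: Iterable[Alignment], threshold: float
) -> Tuple[List[str], List[str]]:
    """Keep reads whose fast alignment looks good; flag the rest for a slower, more accurate mapper."""
    keep: List[str] = []
    realign: List[str] = []
    for aln in alignments:
        if normalized_edit_distance(aln) > threshold:
            realign.append(aln.read_id)
        else:
            keep.append(aln.read_id)
    return keep, realign


# Hypothetical reads: r1 aligns cleanly, r2 has many edits per aligned base.
kept, to_realign = split_for_realignment(
    [Alignment("r1", 50, 10_000), Alignment("r2", 1_500, 9_000)], threshold=0.10
)
```

Normalizing by aligned length keeps long reads from being flagged merely because they accumulate more edits in absolute terms.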
Affiliation(s)
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX 77251-1892, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX 77251-1892, USA
20
Neupane S, Xu S. Adaptive Divergence of Meiotic Recombination Rate in Ecological Speciation. Genome Biol Evol 2021; 12:1869-1881. [PMID: 32857858 PMCID: PMC7594247 DOI: 10.1093/gbe/evaa182] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 08/24/2020] [Indexed: 02/06/2023]
Abstract
Theories predict that directional selection during adaptation to a novel habitat results in elevated meiotic recombination rate. Yet the lack of population-level recombination rate data leaves this hypothesis untested in natural populations. Here, we examine the population-level recombination rate variation in two incipient ecological species, the microcrustacean Daphnia pulex (an ephemeral-pond species) and Daphnia pulicaria (a permanent-lake species). The divergence of D. pulicaria from D. pulex involved habitat shifts from pond to lake habitats as well as strong local adaptation due to directional selection. Using a novel single-sperm genotyping approach, we estimated the male-specific recombination rate of two linkage groups in multiple populations of each species in common garden experiments and identified a significantly elevated recombination rate in D. pulicaria. Most importantly, population genetic analyses show that the divergence in recombination rate between these two species is most likely due to divergent selection in distinct ecological habitats rather than neutral evolution.
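In single-gamete data, each sperm carries one recombinant haplotype. Crossovers therefore show up as switches between the two parental haplotypes along ordered markers. The mean number of crossovers per gamete, multiplied by 100, gives the genetic map length of the linkage group in centimorgans.

The sketch below assumes each sperm is encoded as a list of parental-haplotype labels (0 or 1) at markers in physical order, with None for missing calls. This encoding and the function names are hypothetical, and the code is not the authors' genotyping method. It also ignores genotyping errors and any double crossover between two adjacent markers, which cannot be detected.

```python
from typing import List, Optional, Sequence

Haplotype = Optional[int]  # 0 or 1 = parental haplotype at a marker, None = missing call


def count_crossovers(sperm: Sequence[Haplotype]) -> int:
    """Count haplotype switches between consecutive called markers of one gamete."""
    called = [h for h in sperm if h is not None]  # skip missing genotypes
    return sum(1 for a, b in zip(called, called[1:]) if a != b)


def map_length_cm(gametes: List[Sequence[Haplotype]]) -> float:
    """Mean crossovers per gamete x 100 = map length of the linkage group in cM."""
    if not gametes:
        raise ValueError("no gametes supplied")
    total = sum(count_crossovers(g) for g in gametes)
    return 100.0 * total / len(gametes)
```

For example, three sperm with 1, 0 and 2 detected switches give a mean of one crossover per gamete, i.e. a 100 cM linkage group.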
Affiliation(s)
- Sen Xu
- Department of Biology, University of Texas at Arlington
21
Bergman J, Schierup MH. Population dynamics of GC-changing mutations in humans and great apes. Genetics 2021; 218:6291657. [PMID: 34081117 DOI: 10.1093/genetics/iyab083] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/15/2021] [Accepted: 05/27/2021] [Indexed: 11/14/2022]
Abstract
The nucleotide composition of the genome is a balance between origin and fixation rates of different mutations. For example, it is well-known that transitions occur more frequently than transversions, particularly at CpG sites. Differences in fixation rates of mutation types are less explored. Specifically, recombination-associated GC-biased gene conversion (gBGC) may differentially impact GC-changing mutations, due to differences in their genomic distributions and efficiency of mismatch repair mechanisms. Given that recombination evolves rapidly across species, we explore gBGC of different mutation types across human populations and great ape species. We report a stronger correlation between segregating GC frequency and recombination for transitions than for transversions. Notably, CpG transitions are most strongly affected by gBGC in humans and chimpanzees. We show that the overall strength of gBGC is generally correlated with effective population sizes in humans, with some notable exceptions, such as a stronger effect of gBGC on non-CpG transitions in populations of European descent. Furthermore, species of the Gorilla and Pongo genera have a greatly reduced gBGC effect on CpG sites. We also study the dependence of gBGC dynamics on flanking nucleotides and show that some mutation types evolve in opposition to the gBGC expectation, likely due to hypermutability of specific nucleotide contexts. Our results highlight the importance of different gBGC dynamics experienced by GC-changing mutations and their impact on nucleotide composition evolution.
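Under gBGC, only changes between weak (A/T) and strong (G/C) bases are GC-changing, and gBGC is expected to favour W→S over S→W changes. Transitions (A↔G, C↔T) always change GC status. Of the four transversion types, A↔C and G↔T change GC status, while A↔T and G↔C are GC-conservative. The minimal sketch below classifies single-nucleotide changes along these lines. The names are hypothetical, CpG context is checked on the given strand only, and it is not the authors' pipeline.

```python
from typing import NamedTuple, Optional

BASES = {"A", "C", "G", "T"}
PURINES = {"A", "G"}   # C and T are pyrimidines
STRONG = {"G", "C"}    # "S" bases; A and T are weak ("W")


class MutationClass(NamedTuple):
    kind: str          # "transition" or "transversion"
    direction: str     # "W->S", "S->W", "S->S" or "W->W"
    gc_changing: bool  # True for W->S and S->W changes
    cpg: bool          # True if the reference base sits in a CpG on this strand


def classify_snv(ref: str, alt: str,
                 prev_base: Optional[str] = None,
                 next_base: Optional[str] = None) -> MutationClass:
    """Classify a single-nucleotide change by type, GC direction and CpG context."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    # Transitions keep the purine/pyrimidine class (A<->G, C<->T).
    kind = "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"
    direction = ("S" if ref in STRONG else "W") + "->" + ("S" if alt in STRONG else "W")
    gc_changing = direction in ("W->S", "S->W")
    # CpG context: a C followed by G, or a G preceded by C.
    cpg = (ref == "C" and (next_base or "").upper() == "G") or \
          (ref == "G" and (prev_base or "").upper() == "C")
    return MutationClass(kind, direction, gc_changing, cpg)
```

For example, `classify_snv("C", "T", prev_base="A", next_base="G")` is a GC-changing CpG transition of the S→W kind.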
Affiliation(s)
- Juraj Bergman
- Bioinformatics Research Institute, Aarhus University, DK-8000 Aarhus C, Denmark
22
Chen D, Cremona MA, Qi Z, Mitra RD, Chiaromonte F, Makova KD. Human L1 Transposition Dynamics Unraveled with Functional Data Analysis. Mol Biol Evol 2021; 37:3576-3600. [PMID: 32722770 DOI: 10.1093/molbev/msaa194] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
Long INterspersed Elements-1 (L1s) constitute >17% of the human genome and still actively transpose in it. Characterizing L1 transposition across the genome is critical for understanding genome evolution and somatic mutations. However, to date, L1 insertion and fixation patterns have not been studied comprehensively. To fill this gap, we investigated three genome-wide data sets of L1s that integrated at different evolutionary times: 17,037 de novo L1s (from an L1 insertion cell-line experiment conducted in-house), and 1,212 polymorphic and 1,205 human-specific L1s (from public databases). We characterized 49 genomic features (proxying chromatin accessibility, transcriptional activity, replication, recombination, etc.) in the ±50 kb flanks of these elements. These features were contrasted between the three L1 data sets and L1-free regions using state-of-the-art Functional Data Analysis statistical methods, which treat high-resolution data as mathematical functions. Our results indicate that de novo, polymorphic, and human-specific L1s are surrounded by different genomic features acting at specific locations and scales. This led to an integrative model of L1 transposition, according to which L1s preferentially integrate into open-chromatin regions enriched in non-B DNA motifs, whereas they are fixed in regions largely free of purifying selection (depleted of genes and noncoding most conserved elements). Intriguingly, our results suggest that L1 insertions modify local genomic landscape by extending CpG methylation and increasing mononucleotide microsatellite density. Altogether, our findings substantially facilitate understanding of L1 integration and fixation preferences, pave the way for uncovering their role in aging and cancer, and inform their use as mutagenesis tools in genetic studies.
Affiliation(s)
- Di Chen
- Intercollege Graduate Degree Program in Genetics, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA
- Marzia A Cremona
- Department of Statistics, The Pennsylvania State University, University Park, PA; Department of Operations and Decision Systems, Université Laval, Québec, Canada
- Zongtai Qi
- Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO
- Robi D Mitra
- Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO
- Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, PA; EMbeDS, Sant'Anna School of Advanced Studies, Pisa, Italy; The Huck Institutes of the Life Sciences, Center for Medical Genomics, The Pennsylvania State University, University Park, PA
- Kateryna D Makova
- The Huck Institutes of the Life Sciences, Center for Medical Genomics, The Pennsylvania State University, University Park, PA; Department of Biology, The Pennsylvania State University, University Park, PA
23
Yi SV, Goodisman MAD. The impact of epigenetic information on genome evolution. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200114. [PMID: 33866804 DOI: 10.1098/rstb.2020.0114] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 02/06/2023]
Abstract
Epigenetic information affects gene function by interacting with chromatin, while not changing the DNA sequence itself. However, it has become apparent that the interactions between epigenetic information and chromatin can, in fact, indirectly lead to DNA mutations and ultimately influence genome evolution. This review evaluates the ways in which epigenetic information affects genome sequence and evolution. We discuss how DNA methylation has strong and pervasive effects on DNA sequence evolution in eukaryotic organisms. We also review how the physical interactions arising from the connections between histone proteins and DNA affect DNA mutation and repair. We then discuss how a variety of epigenetic mechanisms exert substantial effects on genome evolution by suppressing the movement of transposable elements. Finally, we examine how genome expansion through gene duplication is also partially controlled by epigenetic information. Overall, we conclude that epigenetic information has widespread indirect effects on DNA sequences in eukaryotes and represents a potent cause and constraint of genome evolution. This article is part of the theme issue 'How does epigenetics influence the course of evolution?'
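CpG depletion is one well-documented sequence signature of DNA methylation. Methylated cytosines in a CpG context deaminate to thymine at elevated rates, so heavily methylated genomes tend to carry fewer CpGs than their C and G content would predict. A common way to quantify this is the observed/expected CpG ratio, N_CG × L / (N_C × N_G). Values well below 1 indicate depletion. The sketch below uses that standard formula with a hypothetical function name. It counts only the given strand and assumes an A/C/G/T-only sequence, since L is the full string length.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: N_CG * L / (N_C * N_G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        raise ValueError("sequence must contain both C and G")
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    n_cg = seq.count("CG")
    return n_cg * len(seq) / (n_c * n_g)
```

A ratio near 1 means CpGs occur about as often as base composition predicts. Strongly methylated genomes typically fall well below that.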
Affiliation(s)
- Soojin V Yi
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Michael A D Goodisman
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
24
Abstract
Recombination increases the local GC-content in genomic regions through GC-biased gene conversion (gBGC). The recent discovery of a large genomic region with extreme GC-content in the fat sand rat Psammomys obesus provides a model to study the effects of gBGC on chromosome evolution. Here, we compare the GC-content and GC-to-AT substitution patterns across protein-coding genes of four gerbil species and two murine rodents (mouse and rat). We find that the known high-GC region is present in all the gerbils, and is characterized by high substitution rates for all mutational categories (AT-to-GC, GC-to-AT, and GC-conservative) both at synonymous and nonsynonymous sites. A higher AT-to-GC than GC-to-AT rate is consistent with the high GC-content. Additionally, we find more than 300 genes outside the known region with outlying values of AT-to-GC synonymous substitution rates in gerbils. Of these, over 30% are organized into at least 17 large clusters observable at the megabase-scale. The unusual GC-skewed substitution pattern suggests the evolution of genomic regions with very high recombination rates in the gerbil lineage, which can lead to a runaway increase in GC-content. Our results imply that rapid evolution of GC-content is possible in mammals, with gerbil species providing a powerful model to study the mechanisms of gBGC.
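A simple two-state model shows why a higher AT-to-GC than GC-to-AT rate goes with high GC content. This model is an illustrative assumption, not the authors' model. Suppose each AT site becomes GC at rate u and each GC site becomes AT at rate v. The GC fraction p then evolves toward an equilibrium p*:

```latex
\frac{dp}{dt} = u\,(1-p) - v\,p
\quad\Longrightarrow\quad
p^{*} = \frac{u}{u+v},
\qquad
p^{*} > \tfrac{1}{2} \iff u > v .
```

A sustained excess of AT-to-GC over GC-to-AT substitutions therefore pushes equilibrium GC content above 50%, and a stronger bias pushes it higher still.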
Affiliation(s)
- Rodrigo Pracana
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- John F Mulley
- School of Natural Sciences, Bangor University, Bangor, Gwynedd, United Kingdom
25
Guiblet WM, Cremona MA, Harris RS, Chen D, Eckert KA, Chiaromonte F, Huang YF, Makova KD. Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome. Nucleic Acids Res 2021; 49:1497-1516. [PMID: 33450015 PMCID: PMC7897504 DOI: 10.1093/nar/gkaa1269] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 06/05/2020] [Revised: 12/14/2020] [Accepted: 01/11/2021] [Indexed: 12/12/2022]
Abstract
Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis with implications for evolution and genetic diseases.
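In a G-quadruplex, the runs of guanines that stack into G-tetrads form the stems, and the bases between them form the loops. A common way to scan for candidate intramolecular G-quadruplexes is a regular expression that requires four runs of at least three G's separated by loops of one to seven bases. The sketch below adopts that widely used pattern as an assumption; the study's own non-B DNA annotations may have been defined differently. It scans only the given strand and reports non-overlapping matches. The names are hypothetical.

```python
import re
from typing import List, Tuple

# Four G-runs (>= 3 G's each, the "stems") separated by loops of 1-7 nt.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3,}G{3,}")


def find_g4_motifs(seq: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of candidate G4 motifs on this strand."""
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(seq.upper())]
```

A human-telomere-like repeat embedded in flanking sequence, `"AAAA" + "GGGTTAGGGTTAGGGTTAGGG" + "CCCC"`, is reported as the span (4, 25). To find motifs on the opposite strand, scan the reverse complement as well.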
Affiliation(s)
- Wilfried M Guiblet
- Bioinformatics and Genomics Graduate Program, Penn State University, University Park, PA 16802, USA
- Marzia A Cremona
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Operations and Decision Systems, Université Laval, Canada
- CHU de Québec – Université Laval Research Center, Canada
- Robert S Harris
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Di Chen
- Intercollege Graduate Degree Program in Genetics, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA
- Kristin A Eckert
- Department of Pathology, Penn State University, College of Medicine, Hershey, PA 17033, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Yi-Fei Huang
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
26
Yu Y, Li HT, Wu YH, Li DZ. Correlation Analysis Reveals an Important Role of GC Content in Accumulation of Deletion Mutations in the Coding Region of Angiosperm Plastomes. J Mol Evol 2021; 89:73-80. [PMID: 33433638 DOI: 10.1007/s00239-020-09987-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/02/2020] [Accepted: 12/21/2020] [Indexed: 10/22/2022]
Abstract
Variation in GC content is assumed to correlate with various processes, including mutation biases, recombination, and environmental parameters. To date, most genomic studies exploring the evolution of GC content have focused on nuclear genomes, and relatively few have examined organelle genomes. We explored the mechanisms maintaining the GC content in angiosperm plastomes, with a particular focus on the hypothesis of phylogenetic dependence and the correlation with deletion mutations. We measured three genetic traits, namely GC content, A/T tracts, and G/C tracts, in the coding region of plastid genomes for 1382 angiosperm species representing 350 families and 64 orders, and tested their phylogenetic signal. We then performed correlation analyses and characterized variation in the evolutionary rate of the selected traits using RRphylo. The plastid GC content in the coding region varied from 28.10% to 43.20% across angiosperms, with a few non-photosynthetic species showing highly reduced values, highlighting the significance of functional constraints. We found strong phylogenetic signal in A/T tracts but weak signal in GC content and G/C tracts, indicating adaptive potential. GC content was positively correlated with G/C tracts and negatively correlated with A/T tracts, suggesting a trade-off between these two types of deletion event. GC content evolved at various rates across the phylogeny, with significant increases in monocots and Lamiids and a decrease in Fabids, implying that other factors are also involved. We hypothesize that variation in plastid GC content might be a mixed strategy of species to optimize fitness in fluctuating climates, partly through influencing the trade-off between AT → GC and GC → AT mutations.
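The three traits measured in this study can be illustrated with a minimal Python sketch. The tract definition is an assumption made for illustration only: an A/T (or G/C) tract is taken to be a maximal run of at least `min_len` bases drawn solely from {A, T} (or {G, C}), with `min_len = 5` as a hypothetical default. The abstract does not state the authors' exact definitions, so `gc_content` and `count_tracts` are illustrative helpers, not their method.

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0  # no unambiguous bases, e.g. a run of Ns
    return (seq.count("G") + seq.count("C")) / acgt


def count_tracts(seq: str, bases: str, min_len: int = 5) -> int:
    """Count maximal runs made only of `bases` that are at least `min_len` long.

    Hypothetical tract definition, for illustration only.
    """
    # e.g. bases="AT", min_len=5 builds the pattern "[AT]{5,}"
    pattern = re.compile(f"[{bases}]{{{min_len},}}")
    return len(pattern.findall(seq.upper()))


seq = "AATTAAGGCCGGTTA"
print(round(gc_content(seq), 2))  # 0.4
print(count_tracts(seq, "AT"))    # 1 ("AATTAA"; the trailing "TTA" is too short)
print(count_tracts(seq, "GC"))    # 1 ("GGCCGG")
```

Because the quantifier is greedy and `findall` returns non-overlapping matches, each maximal run is counted once regardless of its length.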
Affiliation(s)
- Ying Yu
- College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, 311121, China
- Hong-Tao Li
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Yu-Huan Wu
- College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, 311121, China
- De-Zhu Li
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
27
Abstract
Today, massive amounts of sequenced metagenomic and metatranscriptomic data from different ecological niches and environmental locations are available. Scientific progress depends critically on methods that allow useful information to be extracted from the various types of sequence data. Here, we first discuss the types of information contained in the various flavours of biological sequence data, and how this information can be interpreted to increase our scientific knowledge and understanding. We argue that a mechanistic understanding of biological systems analysed from different perspectives is required to consistently interpret experimental observations, and that this understanding is greatly facilitated by the generation and analysis of dynamic mathematical models. We conclude that, in order to construct mathematical models and to test mechanistic hypotheses, time-series data are of critical importance. We review diverse techniques to analyse time-series data and discuss various approaches by which time series of biological sequence data have been successfully used to derive and test mechanistic hypotheses. Analysing the bottlenecks of current strategies for extracting knowledge and understanding from data, we conclude that combined experimental and theoretical efforts should be implemented as early as possible during the planning phase of individual experiments and scientific research projects. This article is part of the theme issue ‘Integrative research perspectives on marine conservation’.
Affiliation(s)
- Ovidiu Popa
- Institute of Quantitative and Theoretical Biology, CEPLAS, Heinrich-Heine University Düsseldorf, Germany
- Ellen Oldenburg
- Institute of Quantitative and Theoretical Biology, CEPLAS, Heinrich-Heine University Düsseldorf, Germany
- Oliver Ebenhöh
- Institute of Quantitative and Theoretical Biology, CEPLAS, Heinrich-Heine University Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences, CEPLAS, Heinrich-Heine University Düsseldorf, Germany
28
Sun J, Zhang Y, Wang M, Guan Q, Yang X, Ou JX, Yan M, Wang C, Zhang Y, Li ZH, Lan C, Mao C, Zhou HW, Hao B, Zhang Z. The Biological Significance of Multi-copy Regions and Their Impact on Variant Discovery. Genomics Proteomics Bioinformatics 2020; 18:516-524. [PMID: 32827758 PMCID: PMC8377240 DOI: 10.1016/j.gpb.2019.05.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2019] [Revised: 05/07/2019] [Accepted: 06/06/2019] [Indexed: 11/23/2022]
Abstract
Identification of genetic variants via high-throughput sequencing (HTS) technologies has been essential for both fundamental and clinical studies. However, to what extent the genome sequence composition affects variant calling remains unclear. In this study, we identified 63,897 multi-copy sequences (MCSs) with a minimum length of 300 bp, each of which occurs at least twice in the human genome. The 151,749 genomic loci (multi-copy regions, or MCRs) harboring these MCSs account for 1.98% of the genome and are distributed unevenly across chromosomes. MCRs containing the same MCS tend to be located on the same chromosome. Gene Ontology (GO) analyses revealed that 3800 genes whose UTRs or exons overlap with MCRs are enriched for Golgi-related cellular component terms and various enzymatic activities in the GO biological function category. MCRs are also enriched for loci that are sensitive to neocarzinostatin-induced double-strand breaks. Moreover, genetic variants discovered by genome-wide association studies and recorded in dbSNP are significantly underrepresented in MCRs. Using simulated HTS datasets, we show that false variant discovery rates are significantly higher in MCRs than in other genomic regions. These results suggest that extra caution must be taken when identifying genetic variants in the MCRs via HTS technologies.
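As a rough illustration of the multi-copy sequence (MCS) idea, the sketch below reports every fixed-length window that occurs at least twice in a sequence. It is a naive, hypothetical helper (`multi_copy_windows`) that assumes exact matches on the forward strand only and uses a toy window length in place of the 300 bp minimum. The abstract does not describe the authors' detection pipeline, and a genome-scale search would need an index such as a suffix array rather than a dictionary of all windows.

```python
from __future__ import annotations

from collections import defaultdict


def multi_copy_windows(seq: str, k: int, min_copies: int = 2) -> dict[str, list[int]]:
    """Return every length-k window occurring at least `min_copies` times,
    mapped to its 0-based start positions (exact matches, forward strand only)."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only windows that are present in more than one copy
    return {w: p for w, p in positions.items() if len(p) >= min_copies}


print(multi_copy_windows("ACGTTTACGTGG", 4))  # {'ACGT': [0, 6]}
```

Overlapping occurrences are counted separately (in `"AAAAA"` the window `"AAA"` starts at 0, 1 and 2), and merging adjacent repeated windows into larger regions, analogous to MCRs, is left out.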
Affiliation(s)
- Jing Sun
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China
- Center for Precision Medicine, Shunde Hospital of Southern Medical University, Foshan 528399, China
- Yanfang Zhang
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China
- Minhui Wang
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Qian Guan
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Xiujia Yang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China
- Jin Xia Ou
- Microbiome Medicine Center, Division of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510282, China
- Mingchen Yan
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Chengrui Wang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Yan Zhang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Zhi-Hao Li
- Division of Epidemiology, School of Public Health, Southern Medical University, Guangzhou 510515, China
- Chunhong Lan
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China
- Center for Precision Medicine, Shunde Hospital of Southern Medical University, Foshan 528399, China
- Chen Mao
- Division of Epidemiology, School of Public Health, Southern Medical University, Guangzhou 510515, China
- Hong-Wei Zhou
- Microbiome Medicine Center, Division of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510282, China
- Bingtao Hao
- Center for Precision Medicine, Shunde Hospital of Southern Medical University, Foshan 528399, China
- Zhenhai Zhang
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China
- Center for Precision Medicine, Shunde Hospital of Southern Medical University, Foshan 528399, China
29
Bruijnesteijn J, de Groot NG, Bontrop RE. The Genetic Mechanisms Driving Diversification of the KIR Gene Cluster in Primates. Front Immunol 2020; 11:582804. [PMID: 33013938 PMCID: PMC7516082 DOI: 10.3389/fimmu.2020.582804] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/13/2020] [Accepted: 08/18/2020] [Indexed: 12/26/2022]
Abstract
The activity and function of natural killer (NK) cells are modulated through the interactions of multiple receptor families, some of which recognize MHC class I molecules. The high level of MHC class I polymorphism requires their receptors either to interact with conserved epitopes, as is the case for the NKG2A receptor family, or to co-evolve with MHC class I allelic variation, a task taken up by the killer cell immunoglobulin-like receptor (KIR) family. Multiple molecular mechanisms are responsible for the diversification of the KIR gene system, including abundant chromosomal recombination, high mutation rates, alternative splicing, and variegated expression. Together, these genetic mechanisms generate a compound array of diversity, as reflected by the contraction and expansion of KIR haplotypes, the frequent birth of fusion genes, allelic polymorphism, structurally distinct isoforms, and variegated expression, in contrast to the mainly allelic nature of MHC class I polymorphism in humans. A comparison of the thoroughly studied human and macaque KIR gene repertoires reveals a similar, evolutionarily conserved toolbox through which selective forces drove and maintained the diversified nature of the KIR gene cluster. This hypothesis is further supported by the comparative genetics of KIR haplotypes and genes in other primate species. The complex nature of the KIR gene system affects the education, activity, and function of NK cells in concert with an individual’s MHC class I repertoire and pathogen encounters. Although selection operates on individuals, the continuous diversification of the KIR gene system in primates might protect populations against evolving pathogens.
Affiliation(s)
- Jesse Bruijnesteijn
- Comparative Genetics and Refinement, Biomedical Primate Research Centre, Rijswijk, Netherlands
- Natasja G de Groot
- Comparative Genetics and Refinement, Biomedical Primate Research Centre, Rijswijk, Netherlands
- Ronald E Bontrop
- Comparative Genetics and Refinement, Biomedical Primate Research Centre, Rijswijk, Netherlands
- Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands
30
Simon H, Huttley G. Quantifying Influences on Intragenomic Mutation Rate. G3 (Bethesda) 2020; 10:2641-2652. [PMID: 32527747 PMCID: PMC7407452 DOI: 10.1534/g3.120.401335] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/28/2020] [Accepted: 05/28/2020] [Indexed: 12/14/2022]
Abstract
We quantify the impact of both recombination and sequence context, at different scales, on the probability of polymorphism in the human genome. We use population-based analyses of human genetic variant data obtained from the public Ensembl database. For recombination, we calculate the variance due to recombination and the probability that a recombination event causes a mutation. We employ novel statistical procedures to account for the spatial autocorrelation of recombination and mutation rates along the genome. Our results support the view that genomic diversity in recombination hotspots arises largely from a direct effect of recombination on mutation rather than predominantly from the effect of selective sweeps. We also use the variance-due-to-context statistic to compare how contexts of various sizes affect the probability of polymorphism. We find that when the 12 point mutations are considered separately, variance due to context increases significantly as we move from 3-mer to 5-mer and from 5-mer to 7-mer contexts. However, when all mutations are considered in aggregate, these differences are outweighed by the effect of interaction between the central base and its immediate neighbors. This interaction is itself dominated by transition mutations, including, but not limited to, the CpG effect. We also demonstrate strand asymmetry of contextual influence in intronic regions, which is hypothesized to result from transcription-coupled DNA repair. Finally, we consider the extent to which these measures can be used to meaningfully compare the relative magnitudes of the impact of recombination and context on mutation.
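The 3-mer, 5-mer and 7-mer contexts compared here are windows centred on the variable base. The minimal Python sketch below extracts such a window and flags whether a position lies in a CpG dinucleotide. The function names are hypothetical, and the sketch does not implement the authors' variance-due-to-context statistic; it only shows what the context of a site looks like at each scale.

```python
from typing import Optional


def centered_context(seq: str, pos: int, k: int) -> Optional[str]:
    """Return the k-mer (k odd) centred on 0-based position `pos`,
    or None if the window would run off either end of the sequence."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    flank = k // 2
    if pos - flank < 0 or pos + flank >= len(seq):
        return None
    return seq[pos - flank:pos + flank + 1].upper()


def in_cpg(seq: str, pos: int) -> bool:
    """True if the base at `pos` is the C or the G of a CpG dinucleotide."""
    s = seq.upper()
    if s[pos] == "C":
        return pos + 1 < len(s) and s[pos + 1] == "G"
    if s[pos] == "G":
        return pos > 0 and s[pos - 1] == "C"
    return False


seq = "TTACGTAAG"
for k in (3, 5, 7, 9):
    print(k, centered_context(seq, 3, k))  # ACG, TACGT, TTACGTA, then None
print(in_cpg(seq, 3), in_cpg(seq, 8))      # True False
```

Returning `None` at sequence edges, rather than padding, keeps every reported context the same length, which matters if contexts of one size are later tallied together.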
Affiliation(s)
- Helmut Simon
- Research School of Biology, The Australian National University
- Gavin Huttley
- Research School of Biology, The Australian National University
31
Pedrola-Monfort J, Lázaro-Gimeno D, Boluda CG, Pedrola L, Garmendia A, Soler C, Soriano JM. Evolutionary Trends in the Mitochondrial Genome of Archaeplastida: How Does the GC Bias Affect the Transition from Water to Land? Plants (Basel) 2020; 9:plants9030358. [PMID: 32178249 PMCID: PMC7154891 DOI: 10.3390/plants9030358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/20/2020] [Revised: 03/09/2020] [Accepted: 03/11/2020] [Indexed: 12/22/2022]
Abstract
Among the most intriguing mysteries in the evolutionary biology of photosynthetic organisms are the genesis and consequences of the dramatic increase in mitochondrial and nuclear genome sizes, together with the concomitant evolution of the three genetic compartments, particularly during the transition from water to land. To clarify the evolutionary trends in the mitochondrial genome of Archaeplastida, we analyzed the sequences of 37 complete genomes. In addition, we used mitochondrial, plastidial, and nuclear ribosomal DNA molecular markers from 100 species of Streptophyta for each subunit. Hierarchical models of sequence evolution were fitted to test for heterogeneity in base composition. The best resulting phylogenies were used to reconstruct the ancestral guanine-cytosine (GC) content and equilibrium GC frequency (GC*) using non-homogeneous and non-stationary models fitted by maximum likelihood. Mitochondrial genome length was strongly related to repetitive sequences across Archaeplastida evolution; however, length did not appear to be linked to the other variables studied, as different lineages showed diverse evolutionary patterns. In contrast, Streptophyta exhibited a strong positive relationship between GC content, non-coding DNA, and repetitive sequences, whereas the evolution of Chlorophyta reflected a strong positive linear relationship between genome length and the number of genes.
Affiliation(s)
- Joan Pedrola-Monfort
- Cavanilles Institute of Biodiversity and Evolutionary Biology, University of Valencia, 46980 Paterna, Spain
- David Lázaro-Gimeno
- Cavanilles Institute of Biodiversity and Evolutionary Biology, University of Valencia, 46980 Paterna, Spain
- Carlos G. Boluda
- Cavanilles Institute of Biodiversity and Evolutionary Biology, University of Valencia, 46980 Paterna, Spain
- Unité de Phylogénie et Génétique Moléculaires, Conservatoire et Jardin Botaniques, Chambésy, 1292 Geneva, Switzerland
- Laia Pedrola
- Cavanilles Institute of Biodiversity and Evolutionary Biology, University of Valencia, 46980 Paterna, Spain
- Alfonso Garmendia
- Mediterranean Agroforestry Institute, Department of Agroforest Ecosystems, Polytechnic University of Valencia, 46022 Valencia, Spain
- Carla Soler
- Biomaterials, Institute of Materials Science, University of Valencia, 46980 Paterna, Spain
- Jose M. Soriano
- Biomaterials, Institute of Materials Science, University of Valencia, 46980 Paterna, Spain
- Correspondence: ; Tel.: +34-963-543-056
32
Abstract
Sex differences in overall recombination rates are well known, but little theoretical or empirical attention has been given to how and why sexes differ in their recombination landscapes: the patterns of recombination along chromosomes. In the first scientific review of this phenomenon, we find that recombination is biased toward telomeres in males and more uniformly distributed in females in most vertebrates and many other eukaryotes. Notable exceptions to this pattern exist, however. Fine-scale recombination patterns also frequently differ between males and females. The molecular mechanisms responsible for sex differences remain unclear, but chromatin landscapes play a role. Why these sex differences evolve also is unclear. Hypotheses suggest that they may result from sexually antagonistic selection acting on coding genes and their regulatory elements, meiotic drive in females, selection during the haploid phase of the life cycle, selection against aneuploidy, or mechanistic constraints. No single hypothesis, however, can adequately explain the evolution of sex differences in all cases. Sex-specific recombination landscapes have important consequences for population differentiation and sex chromosome evolution.
Affiliation(s)
- Jason M. Sardell
- Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712
- Mark Kirkpatrick
- Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712
33
Castellano D, Eyre-Walker A, Munch K. Impact of Mutation Rate and Selection at Linked Sites on DNA Variation across the Genomes of Humans and Other Homininae. Genome Biol Evol 2020; 12:3550-3561. [PMID: 31596481 PMCID: PMC6944223 DOI: 10.1093/gbe/evz215] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Accepted: 10/03/2019] [Indexed: 12/23/2022]
Abstract
DNA diversity varies across the genomes of many species. This variation might arise from regional variation in the mutation rate, variation in the intensity and mode of natural selection, and regional variation in the recombination rate. We show that both noncoding and nonsynonymous diversity are positively correlated with a measure of the mutation rate and with the recombination rate, and negatively correlated with the density of conserved sequences, in 50 kb windows across the genomes of humans and nonhuman Homininae. Interestingly, we find that although noncoding diversity is equally affected by these three genomic variables, nonsynonymous diversity is mostly dominated by the density of conserved sequences. The positive correlation between diversity and our measure of the mutation rate seems to be largely a direct consequence of regions with higher mutation rates having more diversity. However, the positive correlation with recombination rate and the negative correlation with the density of conserved sequences suggest that selection at linked sites also affects levels of diversity. This is supported by the observation that the ratio of the number of nonsynonymous to noncoding polymorphisms is negatively correlated with a measure of the effective population size across the genome. We show that these patterns persist even when we restrict our analysis to GC-conservative mutations, demonstrating that they are not driven by GC-biased gene conversion. In conclusion, our comparative analyses describe how recombination rate, gene density, and mutation rate interact to produce the patterns of DNA diversity observed along hominine genomes.
Affiliation(s)
- David Castellano
- Bioinformatics Research Centre, Aarhus University, Denmark
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona, Spain
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Kasper Munch
- Bioinformatics Research Centre, Aarhus University, Denmark
34
Kader F, Ghai M, Olaniran AO. Characterization of DNA methylation-based markers for human body fluid identification in forensics: a critical review. Int J Legal Med 2019; 134:1-20. [PMID: 31713682 DOI: 10.1007/s00414-019-02181-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/13/2019] [Accepted: 10/15/2019] [Indexed: 02/07/2023]
Abstract
Body fluid identification in crime scene investigations aids in the reconstruction of crime scenes. Several studies have identified and reported differentially methylated sites (DMSs) and regions (DMRs) that differ between forensically relevant tissues (tDMRs) and body fluids. Methylation patterns are affected by diverse factors, including environment, diet, lifestyle, disease, ethnicity, and genetic variation. It is therefore important to analyse the stability of markers employed for forensic identification. Furthermore, even though epigenetic modifications are described as stable and heritable, the long-term epigenetic inheritance of potential markers for body fluid identification needs to be assessed. Here, we discuss the current status of reported DNA methylation-based markers and their verification studies. Such thorough investigation is crucial to developing a stable panel of DNA methylation-based markers for accurate body fluid identification.
Affiliation(s)
- Farzeen Kader
- Discipline of Genetics, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal (Westville Campus), Private Bag X54001, Durban, Republic of South Africa
- Meenu Ghai
- Discipline of Genetics, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal (Westville Campus), Private Bag X54001, Durban, Republic of South Africa
- Ademola O Olaniran
- Discipline of Microbiology, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal (Westville Campus), Private Bag X54001, Durban, Republic of South Africa
35
Dapper AL, Payseur BA. Molecular evolution of the meiotic recombination pathway in mammals. Evolution 2019; 73:2368-2389. [PMID: 31579931 DOI: 10.1111/evo.13850] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/13/2019] [Accepted: 09/07/2019] [Indexed: 02/06/2023]
Abstract
Meiotic recombination shapes evolution and helps to ensure proper chromosome segregation in most species that reproduce sexually. Recombination itself evolves, with species showing considerable divergence in the rate of crossing-over. However, the genetic basis of this divergence is poorly understood. Recombination events are produced via a complicated, but increasingly well-described, cellular pathway. We apply a phylogenetic comparative approach to a carefully selected panel of genes involved in the processes leading to crossovers (spanning double-strand break formation, strand invasion, the crossover/non-crossover decision, and resolution) to reconstruct the evolution of the recombination pathway in eutherian mammals and identify components of the pathway likely to contribute to divergence between species. Eleven recombination genes, predominantly involved in the stabilization of homologous pairing and the crossover/non-crossover decision, show evidence of rapid evolution and positive selection across mammals. We highlight TEX11 and associated genes involved in the synaptonemal complex and the early stages of the crossover/non-crossover decision as candidates for the evolution of recombination rate. Evolutionary comparisons to MLH1 count, a surrogate for the number of crossovers, reveal a positive correlation between genome-wide recombination rate and the rate of evolution at TEX11 across the mammalian phylogeny. Our results illustrate the power of viewing the evolution of recombination from a pathway perspective.
Affiliation(s)
- Amy L Dapper
- Laboratory of Genetics, University of Wisconsin, Madison, Wisconsin, 53706
- Department of Biological Sciences, Mississippi State University, Mississippi, 39762
- Bret A Payseur
- Laboratory of Genetics, University of Wisconsin, Madison, Wisconsin, 53706
36
Lim MCW, Witt CC, Graham CH, Dávalos LM. Parallel Molecular Evolution in Pathways, Genes, and Sites in High-Elevation Hummingbirds Revealed by Comparative Transcriptomics. Genome Biol Evol 2019; 11:1552-1572. [PMID: 31028697 PMCID: PMC6553502 DOI: 10.1093/gbe/evz101] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Accepted: 05/12/2019] [Indexed: 12/13/2022]
Abstract
High-elevation organisms experience shared environmental challenges that include low oxygen availability, cold temperatures, and intense ultraviolet radiation. Consequently, repeated evolution of the same genetic mechanisms may occur across high-elevation taxa. To test this prediction, we investigated the extent to which the same biochemical pathways, genes, or sites were subject to parallel molecular evolution for 12 Andean hummingbird species (family: Trochilidae) representing several independent transitions to high elevation across the phylogeny. Across high-elevation species, we discovered parallel evolution for several pathways and genes with evidence of positive selection. In particular, positively selected genes were frequently part of cellular respiration, metabolism, or cell death pathways. To further examine the role of elevation in our analyses, we compared results for low- and high-elevation species and tested different thresholds for defining elevation categories. In analyses with different elevation thresholds, positively selected genes reflected similar functions and pathways, even though there were almost no specific genes in common. For example, EPAS1 (HIF2α), which has been implicated in high-elevation adaptation in other vertebrates, shows a signature of positive selection when high-elevation is defined broadly (>1,500 m), but not when defined narrowly (>2,500 m). Although a few biochemical pathways and genes change predictably as part of hummingbird adaptation to high-elevation conditions, independent lineages have rarely adapted via the same substitutions.
Affiliation(s)
- Marisa C W Lim
- Department of Ecology and Evolution, Stony Brook University
- Christopher C Witt
- Museum of Southwestern Biology and Department of Biology, University of New Mexico
- Catherine H Graham
- Department of Ecology and Evolution, Stony Brook University
- Swiss Federal Research Institute (WSL), Birmensdorf, Switzerland
- Liliana M Dávalos
- Department of Ecology and Evolution, Stony Brook University
- Consortium for Inter-Disciplinary Environmental Research, Stony Brook University
37
Amos W. Flanking heterozygosity influences the relative probability of different base substitutions in humans. R Soc Open Sci 2019; 6:191018. [PMID: 31598319 PMCID: PMC6774961 DOI: 10.1098/rsos.191018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/05/2019] [Accepted: 08/30/2019] [Indexed: 06/10/2023]
Abstract
Understanding when, where and which mutations are most likely to occur impacts many areas of evolutionary biology, from genetic diseases to phylogenetic reconstruction. Africans and non-African humans differ in the mutability of different triplet base combinations. Africans and non-Africans also differ in mutation rate, possibly because heterozygosity is mutagenic, such that diversity lost when humans expanded out of Africa also lowered the mutation rate. I show that these phenomena are linked: as flanking heterozygosity increases, some triplets become progressively more mutable while others become less so. Africans and non-Africans show near-identical patterns of dependence on heterozygosity. Thus, the striking differences in triplet mutation frequency between Africans and non-Africans seem, at least in part, to be an emergent property, driven by the way changes in heterozygosity 'out of Africa' have differentially affected the mutability of different triplets. As heterozygosity decreased, the mutation spectrum outside Africa became enriched for triplet mutations favoured by low heterozygosity, while those favoured by high heterozygosity became relatively rarer.
38
Li R, Bitoun E, Altemose N, Davies RW, Davies B, Myers SR. A high-resolution map of non-crossover events reveals impacts of genetic diversity on mammalian meiotic recombination. Nat Commun 2019; 10:3900. [PMID: 31467277 PMCID: PMC6715734 DOI: 10.1038/s41467-019-11675-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 10/13/2018] [Accepted: 07/17/2019] [Indexed: 12/21/2022]
Abstract
During meiotic recombination, homologue-templated repair of programmed DNA double-strand breaks (DSBs) produces relatively few crossovers and many difficult-to-detect non-crossovers. By intercrossing two diverged mouse subspecies over five generations and deep-sequencing 119 offspring, we detect thousands of crossover and non-crossover events genome-wide with unprecedented power and spatial resolution. We find that both crossovers and non-crossovers are strongly depleted at DSB hotspots where the DSB-positioning protein PRDM9 fails to bind to the unbroken homologous chromosome, revealing that PRDM9 also functions to promote homologue-templated repair. Our results show that complex non-crossovers are much rarer in mice than humans, consistent with complex events arising from accumulated non-programmed DNA damage. Unexpectedly, we also find that GC-biased gene conversion is restricted to non-crossover tracts containing only one mismatch. These results demonstrate that local genetic diversity profoundly alters meiotic repair pathway decisions via at least two distinct mechanisms, impacting genome evolution and Prdm9-related hybrid infertility. During meiotic recombination, genetic information is transferred or exchanged between parental chromosome copies. Using a large hybrid mouse pedigree, the authors generated high-resolution maps of these transfer/exchange events and discovered new properties governing their processing and resolution.
Affiliation(s)
- Ran Li
- The Wellcome Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, OX3 7BN, UK
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, UK
- Target Discovery Institute, NDM Research Building, University of Oxford, Old Road Campus, Headington, Oxford, OX3 7FZ, UK
- Emmanuelle Bitoun
- The Wellcome Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, OX3 7BN, UK
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, UK
- Nicolas Altemose
- The Wellcome Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, OX3 7BN, UK
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, UK
- Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA, 94720, USA
- Robert W Davies
- The Wellcome Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, OX3 7BN, UK
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, UK
- Benjamin Davies
- The Wellcome Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, OX3 7BN, UK
- Simon R Myers
- The Wellcome Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, OX3 7BN, UK
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, UK
39
A first genetic portrait of synaptonemal complex variation. PLoS Genet 2019; 15:e1008337. [PMID: 31449519 PMCID: PMC6730954 DOI: 10.1371/journal.pgen.1008337] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/11/2019] [Revised: 09/06/2019] [Accepted: 07/31/2019] [Indexed: 12/30/2022]
Abstract
The synaptonemal complex (SC) is a proteinaceous scaffold required for synapsis and recombination between homologous chromosomes during meiosis. Although the SC has been linked to differences in genome-wide crossover rates, the genetic basis of standing variation in SC structure remains unknown. To investigate the possibility that recombination evolves through changes to the SC, we characterized the genetic architecture of SC divergence on two evolutionary timescales. Applying a novel digital image analysis technique to spermatocyte spreads, we measured total SC length in 9,532 spermatocytes from recombinant offspring of wild-derived mouse strains with differences in this fundamental meiotic trait. Using this large dataset, we identified the first known genomic regions involved in the evolution of SC length. Distinct loci affect total SC length divergence between and within subspecies, with the X chromosome contributing to both. Joint genetic analysis of MLH1 foci—immunofluorescent markers of crossovers—from the same spermatocytes revealed that two of the identified loci also confer differences in the genome-wide recombination rate. Causal mediation analysis suggested that one pleiotropic locus acts early in meiosis to designate crossovers prior to SC assembly, whereas a second locus primarily shapes crossover number through its effect on SC length. One genomic interval shapes the relationship between SC length and recombination rate, likely modulating the strength of crossover interference. Our findings pinpoint SC formation as a key step in the evolution of recombination and demonstrate the power of genetic mapping on standing variation in the context of the recombination pathway. During the first stages of meiosis, the chromosome axes are organized along a protein scaffold in preparation for recombination and their subsequent segregation. This scaffold, known as the synaptonemal complex (SC), is critical for the regular progression of recombination. 
A complex relationship exists between the organization of the SC, the frequency of recombination, and the likelihood of improper chromosome segregation. In this study, we investigate the genetics of synaptonemal complex variation in the house mouse and connect it with variation in the rate of recombination. We found five loci and several compelling candidate genes responsible for the evolution of synaptonemal complex length within and between mouse subspecies. Several of these loci also affect recombination rate, and our joint analyses of the phenotypes suggest an order by which their effects manifest within the recombination pathway. Our results show that evolution of SC length is crucial to recombination rate divergence. Our work here also demonstrates that genetic analysis of additional meiotic phenotypes can help explain the evolution of recombination, a fundamental evolutionary force.
40
Laurin-Lemay S, Rodrigue N, Lartillot N, Philippe H. Conditional Approximate Bayesian Computation: A New Approach for Across-Site Dependency in High-Dimensional Mutation-Selection Models. Mol Biol Evol 2019; 35:2819-2834. [PMID: 30203003 DOI: 10.1093/molbev/msy173] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/11/2022]
Abstract
A key question in molecular evolutionary biology concerns the relative roles of mutation and selection in shaping genomic data. Moreover, features of mutation and selection are heterogeneous along the genome and over time. Mechanistic codon substitution models based on the mutation-selection framework are promising approaches to separating these effects. In practice, however, several complications arise, since accounting for such heterogeneities often implies handling models of high dimensionality (e.g., amino acid preferences) or leads to across-site dependence (e.g., CpG hypermutability), making the likelihood function intractable. Approximate Bayesian Computation (ABC) could address this latter issue. Here, we propose a new approach, named Conditional ABC (CABC), which combines the sampling efficiency of MCMC and the flexibility of ABC. To illustrate the potential of the CABC approach, we apply it to the study of mammalian CpG hypermutability based on a new mutation-level parameter implying dependence across adjacent sites, combined with site-specific purifying selection on amino acids captured by a Dirichlet process. Our proof-of-concept of the CABC methodology opens new modeling perspectives. Our application of the method reveals a high level of heterogeneity of CpG hypermutability across loci and mild heterogeneity across taxonomic groups; finally, we show that CpG hypermutability is an important evolutionary factor in shaping relative synonymous codon usage. All source code is available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).
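CABC couples MCMC with ABC; the sketch below shows only the plain rejection-ABC step that such methods build on, applied to a hypothetical toy model: a per-site mutation probability `p` with a Uniform(0, 0.5) prior, and the number of mutated sites as the summary statistic. The toy likelihood is actually tractable (binomial), which makes it easy to check that the accepted draws approximate the posterior. None of the names come from the authors' repository.

```python
import random


def simulate_mutations(p, n_sites, rng):
    """Hypothetical toy simulator: number of sites (out of n_sites) that mutate with probability p."""
    return sum(rng.random() < p for _ in range(n_sites))


def abc_rejection(observed, n_sites, n_draws, epsilon, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary lies within epsilon of the observed one."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    accepted = []
    for _ in range(n_draws):
        p = rng.uniform(0.0, 0.5)  # draw a parameter from the Uniform(0, 0.5) prior
        simulated = simulate_mutations(p, n_sites, rng)
        if abs(simulated - observed) <= epsilon:  # distance between summary statistics
            accepted.append(p)
    return accepted


posterior = abc_rejection(observed=30, n_sites=100, n_draws=20000, epsilon=2)
print(len(posterior), sum(posterior) / len(posterior))
```

The accepted values approximate the posterior of `p`; their mean should sit close to 30/100. Shrinking `epsilon` tightens the approximation but lowers the acceptance rate, which is the inefficiency that MCMC-based ABC variants aim to reduce.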
Affiliation(s)
- Simon Laurin-Lemay
- Robert-Cedergren Center for Bioinformatics and Genomics, Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
- Nicolas Rodrigue
- Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, Ottawa, ON, Canada
- Nicolas Lartillot
- Laboratoire de Biométrie et Biologie Évolutive, UMR CNRS 5558, Université Lyon 1, Lyon, France
- Hervé Philippe
- Robert-Cedergren Center for Bioinformatics and Genomics, Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
- Centre de Théorisation et de Modélisation de la Biodiversité, Station d'Écologie Théorique et Expérimentale, UMR CNRS 5321, Moulis, France
41
Enard D, Petrov DA. Evidence that RNA Viruses Drove Adaptive Introgression between Neanderthals and Modern Humans. Cell 2019; 175:360-371.e13. [PMID: 30290142 DOI: 10.1016/j.cell.2018.08.034] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 04/02/2018] [Revised: 07/04/2018] [Accepted: 08/16/2018] [Indexed: 01/01/2023]
Abstract
Neanderthals and modern humans interbred at least twice in the past 100,000 years. While there is evidence that most introgressed DNA segments from Neanderthals to modern humans were removed by purifying selection, less is known about the adaptive nature of introgressed sequences that were retained. We hypothesized that interbreeding between Neanderthals and modern humans led to (1) the exposure of each species to novel viruses and (2) the exchange of adaptive alleles that provided resistance against these viruses. Here, we find that long, frequent (and more likely adaptive) segments of Neanderthal ancestry in modern humans are enriched for proteins that interact with viruses (VIPs). We found that VIPs that interact specifically with RNA viruses were more likely to belong to introgressed segments in modern Europeans. Our results show that retained segments of Neanderthal ancestry can be used to detect ancient epidemics.
Affiliation(s)
- David Enard
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA.
- Dmitri A Petrov
- Department of Biology, Stanford University, Stanford, CA, USA
42
A century of bias in genetics and evolution. Heredity (Edinb) 2019; 123:33-43. [PMID: 31189901 DOI: 10.1038/s41437-019-0194-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/08/2018] [Revised: 01/29/2019] [Accepted: 01/29/2019] [Indexed: 02/08/2023]
Abstract
Mendel proposed that the heritable material is particulate and that transmission of alleles is unbiased. An assumption of unbiased transmission was necessary to show how variation can be preserved in the absence of selection, so overturning an early objection to Darwinism. In the second half of the twentieth century, it was widely recognised that even strongly deleterious alleles can invade if they have strongly biased transmission (i.e. strong segregation distortion). The spread of alleles with distorted segregation can explain many curiosities. More recently, the selectionist-neutralist duopoly was broken by the realisation that biased gene conversion can explain phenomena such as mammalian isochore structures. An initial focus on unbiased transmission in 1919 has thus given way to an interest in biased transmission in 2019. A focus on very weak bias is now possible owing to technological advances, although technical biases may put a limit on resolving power. To understand the relevance of weak bias, we could profit from having the concept of the effectively Mendelian allele, a companion to the effectively neutral allele. Understanding the implications of unbiased and biased transmission may, I suggest, be a good way to teach evolution so as to avoid psychological biases.
43
Lim MCW, Witt CC, Graham CH, Dávalos LM. Divergent Fine-Scale Recombination Landscapes between a Freshwater and Marine Population of Threespine Stickleback Fish. Genome Biol Evol 2019; 11:1573-1585. [PMID: 31028697 PMCID: PMC6553502 DOI: 10.1093/gbe/evz090] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Accepted: 04/17/2019] [Indexed: 12/27/2022]
Abstract
Meiotic recombination is a highly conserved process that has profound effects on genome evolution. At a fine scale, recombination rates can vary drastically across genomes, often localized into small recombination "hotspots" with highly elevated rates, surrounded by regions with little recombination. In most species studied, the location of hotspots within genomes is highly conserved across broad evolutionary timescales. The main exception to this pattern is in mammals, where hotspot location can evolve rapidly among closely related species and even among populations within a species. Hotspot position in mammals is controlled by the gene Prdm9, whereas in species with conserved hotspots, a functional Prdm9 is typically absent. Because fine-scale recombination rates have been estimated in only a limited number of species, it remains unclear whether hotspot conservation is always associated with the absence of a functional Prdm9. Threespine stickleback fish (Gasterosteus aculeatus) are an excellent model to examine the evolution of recombination over short evolutionary timescales. Using a linkage disequilibrium-based approach, we found that recombination rates indeed varied at a fine scale across the genome, with many regions organized into narrow hotspots. Hotspots had highly divergent landscapes between stickleback populations, with only ∼15% of these hotspots shared. Our results indicate that fine-scale recombination rates may be diverging between closely related populations of threespine stickleback fish. Interestingly, we found only a weak association of a PRDM9 binding motif within hotspots, which suggests that threespine stickleback fish may possess a novel mechanism for targeting recombination hotspots at a fine scale.
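The share of hotspots common to two populations depends on how "shared" is defined. A minimal sketch, assuming hotspots are half-open (chromosome, start, end) intervals and that a hotspot counts as shared when it overlaps any hotspot in the other population; the abstract does not state the paper's actual criterion, and the function names and toy intervals are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_fraction(hotspots_a, hotspots_b):
    """Fraction of hotspots in population A that overlap any hotspot in population B."""
    if not hotspots_a:
        return 0.0
    shared = sum(any(overlaps(h, g) for g in hotspots_b) for h in hotspots_a)
    return shared / len(hotspots_a)


# Hypothetical toy intervals, not data from the study.
marine = [("chrI", 100, 200), ("chrI", 500, 600), ("chrII", 0, 50), ("chrII", 300, 400)]
freshwater = [("chrI", 150, 250), ("chrII", 1000, 1100)]
print(shared_fraction(marine, freshwater))
```

The measure is asymmetric (A relative to B differs from B relative to A), and the all-pairs comparison is quadratic; genome-scale sets would normally be sorted per chromosome or stored in an interval tree.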
Affiliation(s)
- Marisa C W Lim
- Department of Ecology and Evolution, Stony Brook University
- Christopher C Witt
- Museum of Southwestern Biology and Department of Biology, University of New Mexico
- Catherine H Graham
- Department of Ecology and Evolution, Stony Brook University
- Swiss Federal Research Institute (WSL), Birmensdorf, Switzerland
- Liliana M Dávalos
- Department of Ecology and Evolution, Stony Brook University
- Consortium for Inter-Disciplinary Environmental Research, Stony Brook University
44
Mothay D, Ramesh KV. Evolutionary history and genetic diversity study of heat-shock protein 60 of Rhizophagus irregularis. J Genet 2019; 98:48. [PMID: 31204704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Despite the ubiquitous occurrence of heat-shock protein 60 (Hsp60) and its role in maintaining cell activity and integrity, this protein remains poorly characterized in many symbiotic soil mycorrhizal fungi such as Rhizophagus irregularis. Thus, in the current study, an attempt has been made to elucidate the evolutionary history and time of divergence of hsp60, followed by estimation of its population genetic parameters, using R. irregularis as a model organism. Sequence alignment reported here identified several close homologues of hsp60 (gene) and Hsp60 (protein) from diverse taxa, while the protein-based phylogenetic tree indicates that mitochondrial Hsp60 of R. irregularis shares a close evolutionary relationship with classical α-proteobacteria. This is perhaps the first line of evidence suggesting that hsp60 from fungal taxa shares a close evolutionary relationship, through a common ancestor, with classical α-proteobacteria. Comprehensive analysis of mitochondrial hsp60 from selected fungal taxa from an evolutionary point of view points to possible gene duplication and/or horizontal gene transfer of this gene across various fungal species. Synteny relationships and population genetics credibly explain the high genetic variability associated with fungal hsp60, presumably brought about by random genetic recombination events. The results presented here also confirm a high level of genetic differentiation of hsp60 among all three fungal populations analysed. In this context, the outcome of the current study, based on a computational approach, supports the possibility of increased genetic differentiation in hsp60 of R. irregularis.
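Genetic differentiation among populations is often summarized with F_ST. The abstract does not name the statistic used, so the following is only an illustration: a Hudson-style F_ST estimated as a ratio of averages from allele frequencies at biallelic sites, without any sample-size correction, computed for every pair of populations. Function names, population labels and the input layout are assumptions.

```python
from itertools import combinations


def hudson_fst(p1, p2):
    """Hudson-style F_ST from allele frequencies at the same biallelic sites in two populations.

    Ratio of averages: 1 - (summed within-population diversity) / (summed between-population diversity).
    No sample-size correction is applied, so this is the population-level quantity, not an estimator.
    """
    within = between = 0.0
    for a, b in zip(p1, p2):
        within += (2 * a * (1 - a) + 2 * b * (1 - b)) / 2  # mean heterozygosity within the two populations
        between += a * (1 - b) + b * (1 - a)  # chance that one allele from each population differs
    return 1.0 - within / between if between > 0 else 0.0


def pairwise_fst(freqs):
    """freqs maps a (hypothetical) population label to allele frequencies at shared sites."""
    return {(x, y): hudson_fst(freqs[x], freqs[y]) for x, y in combinations(sorted(freqs), 2)}


print(pairwise_fst({"pop1": [0.5, 0.1], "pop2": [0.5, 0.9], "pop3": [1.0, 0.2]}))
```

Values near 0 mean the populations share allele frequencies; values near 1 mean they are fixed for different alleles.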
Affiliation(s)
- Dipti Mothay
- Department of Biotechnology, Jain University, School of Sciences, Bengaluru, India.
45
Evolutionary history and genetic diversity study of heat-shock protein 60 of Rhizophagus irregularis. J Genet 2019. [DOI: 10.1007/s12041-019-1096-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
46
NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans. Genome Biol 2019; 20:32. [PMID: 30744685 PMCID: PMC6371618 DOI: 10.1186/s13059-019-1634-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 07/30/2018] [Accepted: 01/17/2019] [Indexed: 02/07/2023]
Abstract
State-of-the-art methods assessing pathogenic non-coding variants have mostly been characterized on common disease-associated polymorphisms, yet with modest accuracy and strong positional biases. In this study, we curated 737 high-confidence pathogenic non-coding variants associated with monogenic Mendelian diseases. In addition to interspecies conservation, a comprehensive set of recent and ongoing purifying selection signals in humans is explored, accounting for lineage-specific regulatory elements. Supervised learning using gradient tree boosting on such features achieves a high predictive performance and overcomes positional bias. NCBoost performs consistently across diverse learning and independent testing data sets and outperforms other existing reference methods.
47
Gossmann TI, Bockwoldt M, Diringer L, Schwarz F, Schumann VF. Evidence for Strong Fixation Bias at 4-fold Degenerate Sites Across Genes in the Great Tit Genome. Front Ecol Evol 2018. [DOI: 10.3389/fevo.2018.00203] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/18/2022]
48
Li W, Thanos D, Provata A. Quantifying local randomness in human DNA and RNA sequences using Erdös motifs. J Theor Biol 2018; 461:41-50. [PMID: 30336158 DOI: 10.1016/j.jtbi.2018.09.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2018] [Revised: 08/14/2018] [Accepted: 09/25/2018] [Indexed: 10/28/2022]
Abstract
In 1932, Paul Erdös asked whether a random walk constructed from a binary sequence can achieve the lowest possible deviation (lowest discrepancy), for the sequence itself and for all its subsequences formed by homogeneous arithmetic progressions. Although achieving low (bounded) discrepancy is impossible for infinite sequences, as recently proven by Terence Tao, attempts have been made to construct such sequences of finite length. We recognize that such constructed sequences (which we call "Erdös sequences") exhibit certain hallmarks of randomness at the local level: they show roughly equal frequencies of short subsequences and, at the same time, exclude trivial periodic patterns. For the human DNA, we examine the frequency of a set of length-10 Erdös motifs using three nucleotide-to-binary mappings. The particular length-10 Erdös sequence is derived from the length-11 Mathias sequence and is identical to the first 10 digits of the Thue-Morse sequence, underscoring the fact that both are deficient in periodicities. Our calculations indicate that: (1) the purine (A and G)/pyrimidine (C and T) based Erdös motifs are greatly underrepresented in the human genome, (2) the strong (G and C)/weak (A and T) based Erdös motifs are slightly overrepresented, (3) the densities of the two are negatively correlated, (4) the Erdös motifs based on all three mappings combined are slightly underrepresented, and (5) the strong/weak based Erdös motifs are greatly overrepresented in human messenger RNA sequences.
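The discrepancy and motif ideas above can be made concrete in a few lines. A sketch, assuming bit 0 maps to +1 and bit 1 to -1 and that positions are 1-based. The third nucleotide mapping (amino A/C versus keto G/T) is an assumption, since the abstract names only two of the three, and the counting routine looks for a single binary pattern rather than the paper's full motif set. All names are hypothetical.

```python
def thue_morse(n):
    """First n digits of the Thue-Morse sequence: digit i is the parity of the number of 1 bits in i."""
    return [bin(i).count("1") % 2 for i in range(n)]


def discrepancy(bits):
    """Largest |partial sum| of x_d + x_2d + x_3d + ... over all step sizes d (1-based positions)."""
    x = [1 if b == 0 else -1 for b in bits]  # map 0 -> +1, 1 -> -1
    n = len(x)
    worst = 0
    for d in range(1, n + 1):
        s = 0
        for i in range(d, n + 1, d):
            s += x[i - 1]
            worst = max(worst, abs(s))
    return worst


# Nucleotide-to-binary mappings; which class is 0 and which is 1 is an arbitrary choice here.
MAPPINGS = {
    "purine/pyrimidine": {"A": "0", "G": "0", "C": "1", "T": "1"},
    "strong/weak": {"G": "0", "C": "0", "A": "1", "T": "1"},
    "amino/keto": {"A": "0", "C": "0", "G": "1", "T": "1"},  # assumed third mapping
}


def count_motif(dna, motif, mapping):
    """Count overlapping windows of dna whose binary image equals motif; windows with non-ACGT are skipped."""
    dna = dna.upper()
    k = len(motif)
    count = 0
    for i in range(len(dna) - k + 1):
        window = dna[i:i + k]
        if all(c in mapping for c in window) and "".join(mapping[c] for c in window) == motif:
            count += 1
    return count


motif = "".join(map(str, thue_morse(10)))  # the length-10 pattern named in the abstract
print(motif, discrepancy(thue_morse(10)))
print(count_motif("ACTATGACTG", motif, MAPPINGS["purine/pyrimidine"]))
```

For instance, `discrepancy(thue_morse(10))` returns 1, while three identical bits already give 3.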
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, USA.
- Dimitrios Thanos
- Department of Mathematics, National and Kapodistrian University of Athens, Athens GR-15784, Greece
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", Athens GR-15341, Greece
- Astero Provata
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", Athens GR-15341, Greece
49
Kalesinskas L, Cudone E, Fofanov Y, Putonti C. S-plot2: Rapid Visual and Statistical Analysis of Genomic Sequences. Evol Bioinform Online 2018; 14:1176934318797354. [PMID: 30245567 PMCID: PMC6144591 DOI: 10.1177/1176934318797354] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/29/2018] [Accepted: 08/08/2018] [Indexed: 12/12/2022]
Abstract
With the daily release of data from whole genome sequencing projects, tools to facilitate comparative studies are hard-pressed to keep pace. Graphical software solutions can readily recognize synteny by measuring similarities between sequences. Nevertheless, regions of dissimilarity can prove to be equally informative; these regions may harbor genes acquired via lateral gene transfer (LGT), signify gene loss or gain, or include coding regions under strong selection. Previously, we developed the software S-plot. This tool employed an alignment-free approach for comparing bacterial genomes and generated a heatmap representing the genomes’ similarities and dissimilarities in nucleotide usage. In prior studies, this tool proved valuable in identifying genome rearrangements as well as exogenous sequences acquired via LGT in several bacterial species. Herein, we present the next generation of this tool, S-plot2. Similar to its predecessor, S-plot2 creates an interactive, 2-dimensional heatmap capturing the similarities and dissimilarities in nucleotide usage between genomic sequences (partial or complete). This new version, however, includes additional metrics for analysis, new reporting options, and integrated BLAST query functionality for the user to interrogate regions of interest. Furthermore, S-plot2 can evaluate larger sequences, including whole eukaryotic chromosomes. To illustrate some of the applications of the tool, 2 case studies are presented. The first examines strain-specific variation across the Pseudomonas aeruginosa genome and strain-specific LGT events. In the second case study, corresponding human, chimpanzee, and rhesus macaque autosomes were studied and lineage specific contributions to divergence were estimated. S-plot2 provides a means to both visually and quantitatively compare nucleotide sequences, from microbial genomes to eukaryotic chromosomes. 
The case studies presented illustrate just 2 potential applications of the tool, highlighting its capability to identify and investigate the variation in molecular divergence rates across sequences. S-plot2 is freely available through https://bitbucket.org/lkalesinskas/splot and is supported on the Linux and MS Windows operating systems.
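S-plot2's own metrics are not spelled out in this abstract, so the following only sketches the general alignment-free idea: slide windows along two sequences, summarize each window by its k-mer (here dinucleotide) frequencies, and fill a matrix of pairwise distances that could be drawn as a heatmap. Euclidean distance, the window size and the step are illustrative choices, not S-plot2 defaults, and the function names are hypothetical.

```python
import math
from itertools import product


def kmer_profile(seq, k=2):
    """Relative frequencies of all 4**k k-mers in seq (k-mers containing non-ACGT are ignored)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def windows(seq, size, step):
    """Full-length windows of the given size, starting every `step` bases."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def distance_matrix(seq_a, seq_b, size=1000, step=500, k=2):
    """Euclidean distance between k-mer profiles of every window of seq_a and every window of seq_b."""
    pa = [kmer_profile(w, k) for w in windows(seq_a.upper(), size, step)]
    pb = [kmer_profile(w, k) for w in windows(seq_b.upper(), size, step)]
    return [[math.dist(x, y) for y in pb] for x in pa]


m = distance_matrix("ACGT" * 50, "ACGT" * 50, size=100, step=50)
print(len(m), len(m[0]), m[0][0])
```

Low values mark windows with similar nucleotide usage; a block of high values against the rest of a genome is the kind of signal that can flag a laterally acquired region for closer inspection.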
Affiliation(s)
- Laurynas Kalesinskas
- Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA
- Department of Biology, Loyola University Chicago, Chicago, IL, USA
- Evan Cudone
- Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA
- Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, USA
- Yuriy Fofanov
- Department of Pharmacology and Toxicology, The University of Texas Medical Branch at Galveston, Galveston, TX, USA
- Catherine Putonti
- Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA
- Department of Biology, Loyola University Chicago, Chicago, IL, USA
- Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
50
Tiemann-Boege I, Schwarz T, Striedner Y, Heissl A. The consequences of sequence erosion in the evolution of recombination hotspots. Philos Trans R Soc Lond B Biol Sci 2018; 372:20160462. [PMID: 29109225 PMCID: PMC5698624 DOI: 10.1098/rstb.2016.0462] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Accepted: 09/05/2017] [Indexed: 12/18/2022]
Abstract
During meiosis, programmed double-strand breaks (DSBs) are introduced into the DNA by a highly controlled process and are then repaired by recombination. In many organisms, recombination occurs at specific and narrow regions of the genome, known as recombination hotspots, which overlap with regions enriched for DSBs. In recent years, it has been demonstrated that gene conversions and mutations resulting from the repair of DSBs lead to rapid sequence evolution at recombination hotspots, eroding the target sites for DSBs. We still do not fully understand the effect of this erosion on recombination activity, but evidence has shown that the binding of trans-acting factors such as PRDM9 is affected. PRDM9 is a meiosis-specific, multi-domain protein that recognizes DNA target motifs through its zinc finger domain and directs DSBs to these target sites. Here we discuss the changes in PRDM9 affinity for eroded recognition sequences and explain how these changes can affect recombination, sometimes leading to sterility in hybrid crosses. We also present experimental data showing that DNA methylation reduces PRDM9 binding in vitro. Finally, we discuss PRDM9-independent hotspots, posing the question of how these hotspots evolve and change with sequence erosion. This article is part of the themed issue ‘Evolutionary causes and consequences of recombination rate variation in sexual organisms’.
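Erosion of a PRDM9 target can be illustrated with a simple motif scan. This sketch assumes the degenerate 13-mer CCNCCNTNNCCNC, widely reported for the common human PRDM9 A allele, scans both strands with regular expressions, and reports overlapping matches. It treats binding as all-or-nothing, which is far cruder than the graded affinity changes discussed in the abstract; the function names are hypothetical.

```python
import re

# IUPAC codes needed for the motif; N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif):
    """Compile a degenerate motif; the lookahead lets overlapping matches all be reported."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in motif.upper()) + "))")


def count_motif_both_strands(seq, motif):
    """Number of motif matches on the forward strand plus the reverse complement."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    pattern = motif_regex(motif)
    return len(pattern.findall(seq)) + len(pattern.findall(rc))


MOTIF = "CCNCCNTNNCCNC"  # assumed hotspot motif, used for illustration only
intact = "CCACCATAACCAC"
eroded = "CCACCACAACCAC"  # same sequence with a T>C change at the fixed T position
print(count_motif_both_strands(intact, MOTIF), count_motif_both_strands(eroded, MOTIF))
```

A single substitution at one of the fixed positions removes the only match, which is the simplest picture of how repair-driven sequence change at a hotspot can erase its own target.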
Affiliation(s)
- Irene Tiemann-Boege
- Institute of Biophysics, Johannes Kepler University, Linz, Gruberstraße 40, 4020 Linz, Austria
- Theresa Schwarz
- Institute of Biophysics, Johannes Kepler University, Linz, Gruberstraße 40, 4020 Linz, Austria
- Yasmin Striedner
- Institute of Biophysics, Johannes Kepler University, Linz, Gruberstraße 40, 4020 Linz, Austria
- Angelika Heissl
- Institute of Biophysics, Johannes Kepler University, Linz, Gruberstraße 40, 4020 Linz, Austria