1
|
Mühlhausen S, Hurst LD. Transgene-design: a web application for the design of mammalian transgenes. Bioinformatics 2022; 38:2626-2627. [PMID: 35244144 PMCID: PMC9048660 DOI: 10.1093/bioinformatics/btac139] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/13/2021] [Revised: 02/15/2022] [Accepted: 03/02/2022] [Indexed: 11/19/2022] Open
Abstract
Summary: Transgene-design is a web application to help design transgenes for use in mammalian studies. It is predicated on the recent discovery that human intronless transgenes and native retrogenes can be expressed very effectively if the GC content at exonic synonymous sites is high. In addition, as exonic splice enhancers resident in intron-containing genes may have different utility in intronless genes, these can be reduced or increased in density. Input can be a native gene or a commercially ‘optimised’ gene. Users may also leave in the first intron and protect or avoid other motifs. Availability and implementation: Transgene-design is based on a Ruby on Rails platform. The application is available at https://transgene-design.bath.ac.uk. The code is available under the GNU General Public License from GitHub (https://github.com/smuehlh/transgenes). Supplementary information: Supplementary data are available at Bioinformatics online.
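The idea of raising GC content at synonymous sites can be illustrated with a minimal Python sketch. It is not the Transgene-design implementation: it uses GC3 (the fraction of codons ending in G or C) as a simple proxy for GC content at synonymous sites, assumes the standard genetic code, ignores exonic splice enhancers and other motifs entirely, and the function names (`split_codons`, `gc3`, `raise_gc3`, `translate`) are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split an in-frame coding sequence into upper-case codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence; stop codons are shown as '*'."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def gc3(cds):
    """Fraction of codons whose third base is G or C (a proxy, see lead-in)."""
    codons = split_codons(cds)
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def raise_gc3(cds):
    """Replace each A/T-ending codon with a synonymous G/C-ending codon.

    Stop codons are left untouched. Among synonymous candidates, the one
    differing from the original codon at the fewest positions is chosen.
    """
    recoded = []
    for codon in split_codons(cds):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*" or codon[2] in "GC":
            recoded.append(codon)
            continue
        options = [c for c, a in CODON_TABLE.items()
                   if a == amino_acid and c[2] in "GC"]
        best = min(options, key=lambda c: sum(x != y for x, y in zip(c, codon)))
        recoded.append(best)
    return "".join(recoded)


if __name__ == "__main__":
    original = "ATGGCTAAATTATAA"  # toy sequence, not a real gene
    recoded = raise_gc3(original)
    print(original, gc3(original))
    print(recoded, gc3(recoded))
    print(translate(original) == translate(recoded))
```

The recoded sequence encodes the same protein by construction, since every substitution is drawn from codons for the same amino acid. A real design tool would also have to weigh the motif constraints the abstract mentions.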
Affiliation(s)
- Stefanie Mühlhausen
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Laurence D Hurst
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
|
2
|
Ho AT, Hurst LD. Effective Population Size Predicts Local Rates but Not Local Mitigation of Read-through Errors. Mol Biol Evol 2021; 38:244-262. [PMID: 32797190 PMCID: PMC7783166 DOI: 10.1093/molbev/msaa210] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022] Open
Abstract
In correctly predicting that selection efficiency is positively correlated with the effective population size (Ne), the nearly neutral theory provides a coherent understanding of between-species variation in numerous genomic parameters, including heritable error (germline mutation) rates. Does the same theory also explain variation in phenotypic error rates and in abundance of error mitigation mechanisms? Translational read-through provides a model to investigate both issues as it is common, mostly nonadaptive, and has a good proxy for rate (TAA being the least leaky stop codon) and potential error mitigation via "fail-safe" 3' additional stop codons (ASCs). Prior theory of translational read-through has suggested that when population sizes are high, weak selection for local mitigation can be effective, thus predicting a positive correlation between ASC enrichment and Ne. Contrary to prediction, we find that ASC enrichment is not correlated with Ne. ASC enrichment, although highly phylogenetically patchy, is, however, more common both in unicellular species and in genes expressed in unicellular modes in multicellular species. By contrast, Ne does positively correlate with TAA enrichment. These results imply that local phenotypic error rates, not local mitigation rates, are consistent with a drift barrier/nearly neutral model.
Affiliation(s)
- Alexander T Ho
- Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Corresponding author
- Laurence D Hurst
- Milner Centre for Evolution, University of Bath, Bath, United Kingdom
|
3
|
Abrahams L, Hurst LD. A Depletion of Stop Codons in lincRNA is Owing to Transfer of Selective Constraint from Coding Sequences. Mol Biol Evol 2020; 37:1148-1164. [PMID: 31841162 PMCID: PMC7086181 DOI: 10.1093/molbev/msz299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/12/2022] Open
Abstract
Although the constraints on a gene’s sequence are often assumed to reflect the functioning of that gene, here we propose transfer selection, a constraint operating on one class of genes transferred to another, mediated by shared binding factors. We show that such transfer can explain an otherwise paradoxical depletion of stop codons in long intergenic noncoding RNAs (lincRNAs). Serine/arginine-rich proteins direct the splicing machinery by binding exonic splice enhancers (ESEs) in immature mRNA. As coding exons cannot contain stop codons in one reading frame, stop codons should be rare within ESEs. We confirm that the stop codon density (SCD) in ESE motifs is low, even accounting for nucleotide biases. Given that serine/arginine-rich proteins binding ESEs also facilitate lincRNA splicing, a low SCD could transfer to lincRNAs. As predicted, multiexon lincRNA exons are depleted in stop codons, a result not explained by open reading frame (ORF) contamination. Consistent with transfer selection, stop codon depletion in lincRNAs is most acute in exonic regions with the highest ESE density, disappears when ESEs are masked, is consistent with stop codon usage skews in ESEs, and is diminished in both single-exon lincRNAs and introns. Owing to low SCD, the maximum lengths of pseudo-ORFs frequently exceed null expectations. This has implications for ORF annotation and the evolution of de novo protein-coding genes from lincRNAs. We conclude that not all constraints operating on genes need be explained by the functioning of the gene but may instead be transferred owing to shared binding factors.
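As a rough illustration of what a stop codon density measurement involves, the following Python sketch counts stop trinucleotides per reading frame. The definition used here (stop codons as a fraction of all complete codons in a given frame) is an assumption for illustration and may differ from the SCD statistic used in the paper. The function name `stop_codon_density` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_density(seq, frame):
    """Fraction of complete codons in the given frame (0, 1 or 2) that are stops.

    Assumed definition for illustration only; RNA input (U) is accepted.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper().replace("U", "T")
    # range stops at len(seq) - 2 so that only complete codons are read
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return sum(codon in STOP_CODONS for codon in codons) / len(codons)


if __name__ == "__main__":
    toy = "ATGTAACCTGAG"  # toy sequence, not a real motif or transcript
    for frame in range(3):
        print(frame, stop_codon_density(toy, frame))
```

Comparing such densities against a nucleotide-composition-matched null, as the abstract describes, would require a separate simulation step that is not shown here.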
Affiliation(s)
- Liam Abrahams
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
|
4
|
Fontrodona N, Aubé F, Claude JB, Polvèche H, Lemaire S, Tranchevent LC, Modolo L, Mortreux F, Bourgeois CF, Auboeuf D. Interplay between coding and exonic splicing regulatory sequences. Genome Res 2019; 29:711-722. [PMID: 30962178 PMCID: PMC6499313 DOI: 10.1101/gr.241315.118] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/02/2018] [Accepted: 03/28/2019] [Indexed: 01/24/2023]
Abstract
The inclusion of exons during the splicing process depends on the binding of splicing factors to short low-complexity regulatory sequences. The relationship between exonic splicing regulatory sequences and coding sequences is still poorly understood. We demonstrate that exons that are coregulated by any given splicing factor share a similar nucleotide composition bias and preferentially code for amino acids with similar physicochemical properties because of the nonrandomness of the genetic code. Indeed, amino acids sharing similar physicochemical properties correspond to codons that have the same nucleotide composition bias. In particular, we uncover that the TRA2A and TRA2B splicing factors that bind to adenine-rich motifs promote the inclusion of adenine-rich exons coding preferentially for hydrophilic amino acids that correspond to adenine-rich codons. SRSF2 that binds guanine/cytosine-rich motifs promotes the inclusion of GC-rich exons coding preferentially for small amino acids, whereas SRSF3 that binds cytosine-rich motifs promotes the inclusion of exons coding preferentially for uncharged amino acids, like serine and threonine that can be phosphorylated. Finally, coregulated exons encoding amino acids with similar physicochemical properties correspond to specific protein features. In conclusion, the regulation of an exon by a splicing factor that relies on the affinity of this factor for specific nucleotide(s) is tightly interconnected with the exon-encoded physicochemical properties. We therefore uncover an unanticipated bidirectional interplay between the splicing regulatory process and its biological functional outcome.
Affiliation(s)
- Nicolas Fontrodona
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Fabien Aubé
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Jean-Baptiste Claude
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Hélène Polvèche
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Sébastien Lemaire
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Léon-Charles Tranchevent
- Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Laurent Modolo
- LBMC Biocomputing Center, CNRS UMR 5239, INSERM U1210, F-69007, Lyon, France
- Franck Mortreux
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Cyril F Bourgeois
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Didier Auboeuf
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
|
5
|
Guillén Y, Casillas S, Ruiz A. Genome-Wide Patterns of Sequence Divergence of Protein-Coding Genes Between Drosophila buzzatii and D. mojavensis. J Hered 2019; 110:92-101. [PMID: 30124907 DOI: 10.1093/jhered/esy041] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/23/2018] [Accepted: 08/14/2018] [Indexed: 12/15/2022] Open
Abstract
Evolutionary rates for protein-coding genes are determined not only by natural selection but also by multiple genomic factors including mutation rates, recombination, gene expression levels, and chromosomal location. To investigate the joint effects of different genomic determinants on protein evolution, we compared the coding sequences of 9017 single-copy orthologs between 2 cactophilic species from the Drosophila subgenus, Drosophila mojavensis and D. buzzatii, whose genomes have been previously sequenced. We assessed the impact of 7 genomic determinants, that is, chromosome type, recombination, chromosomal inversions, expression breadth, expression level, gene length, and the number of exons, on divergence rates of protein-coding genes to understand patterns of evolutionary variation. Integrative analysis of these factors revealed that 1) X-linked and autosomal genes evolve at significantly different rates in agreement with the faster-X hypothesis, 2) genes located on the dot chromosome and pericentromeric regions have higher divergence rates, 3) genes located at chromosomes with more fixed inversions have higher pairwise divergence than those located at nearly collinear chromosomes, and 4) gene expression patterns can be considered the strongest determinant of protein evolution. In addition, the number of exons and protein length had a significant effect on pairwise divergence at synonymous sites. All in all, our results show the relative importance of each genomic factor on the rates of protein evolution and functional constraint in these 2 cactophilic Drosophila species.
Affiliation(s)
- Yolanda Guillén
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Sònia Casillas
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain; The Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Alfredo Ruiz
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
|
6
|
Savisaar R, Hurst LD. Exonic splice regulation imposes strong selection at synonymous sites. Genome Res 2018; 28:1442-1454. [PMID: 30143596 PMCID: PMC6169883 DOI: 10.1101/gr.233999.117] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 01/11/2018] [Accepted: 07/31/2018] [Indexed: 01/17/2023]
Abstract
What proportion of coding sequence nucleotides have roles in splicing, and how strong is the selection that maintains them? Despite a large body of research into exonic splice regulatory signals, these questions have not been answered. This is because, to our knowledge, previous investigations have not explicitly disentangled the frequency of splice regulatory elements from the strength of the evolutionary constraint under which they evolve. Current data are consistent both with a scenario of weak and diffuse constraint, enveloping large swaths of sequence, as well as with well-defined pockets of strong purifying selection. In the former case, natural selection on exonic splice enhancers (ESEs) might primarily act as a slight modifier of codon usage bias. In the latter, mutations that disrupt ESEs are likely to have large fitness and, potentially, clinical effects. To distinguish between these scenarios, we used several different methods to determine the distribution of selection coefficients for new mutations within ESEs. The analyses converged to suggest that ∼15%-20% of fourfold degenerate sites are part of functional ESEs. Most of these sites are under strong evolutionary constraint. Therefore, exonic splice regulation does not simply impose a weak bias that gently nudges coding sequence evolution in a particular direction. Rather, the selection to preserve these motifs is a strong force that severely constrains the evolution of a substantial proportion of coding nucleotides. Thus synonymous mutations that disrupt ESEs should be considered as a potentially common cause of single-locus genetic disorders.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
|
7
|
Abstract
In Cryptococcus neoformans, nearly all genes are interrupted by small introns. In recent years, genome annotation and genetic analysis have illuminated the major roles these introns play in the biology of this pathogenic yeast. Introns are necessary for gene expression and alternative splicing can regulate gene expression in response to environmental cues. In addition, recent studies have revealed that C. neoformans introns help to prevent transposon dissemination and protect genome integrity. These characteristics of cryptococcal introns are probably not unique to Cryptococcus, and this yeast likely can be considered as a model for intron-related studies in fungi.
Affiliation(s)
- Guilhem Janbon
- Unité Biologie des ARN des Pathogènes Fongiques, Département de Mycologie, Institut Pasteur, Paris, France
|
8
|
Hurst LD, Batada NN. Depletion of somatic mutations in splicing-associated sequences in cancer genomes. Genome Biol 2017; 18:213. [PMID: 29115978 PMCID: PMC5678748 DOI: 10.1186/s13059-017-1337-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/08/2017] [Accepted: 10/12/2017] [Indexed: 01/01/2023] Open
Abstract
Background: An important goal of cancer genomics is to identify systematically cancer-causing mutations. A common approach is to identify sites with high ratios of non-synonymous to synonymous mutations; however, if synonymous mutations are under purifying selection, this methodology leads to identification of false-positive mutations. Here, using synonymous somatic mutations (SSMs) identified in over 4000 tumours across 15 different cancer types, we sought to test this assumption by focusing on coding regions required for splicing. Results: Exon flanks, which are enriched for sequences required for splicing fidelity, have ~17% lower SSM density compared to exonic cores, even after excluding canonical splice sites. While it is impossible to eliminate a mutation bias of unknown cause, multiple lines of evidence support a purifying selection model above a mutational bias explanation. The flank/core difference is not explained by skewed nucleotide content, replication timing, nucleosome occupancy or deficiency in mismatch repair. The depletion is not seen in tumour suppressors, consistent with their role in positive tumour selection, but is otherwise observed in cancer-associated and non-cancer genes, both essential and non-essential. Consistent with a role in splicing modulation, exonic splice enhancers have a lower SSM density before and after controlling for nucleotide composition; moreover, flanks at the 5’ end of the exons have significantly lower SSM density than at the 3’ end. Conclusions: These results suggest that the observable mutational spectrum of cancer genomes is not simply a product of various mutational processes and positive selection, but might also be shaped by negative selection. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-017-1337-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Nizar N Batada
- Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
|
9
|
Savisaar R, Hurst LD. Both Maintenance and Avoidance of RNA-Binding Protein Interactions Constrain Coding Sequence Evolution. Mol Biol Evol 2017; 34:1110-1126. [PMID: 28138077 PMCID: PMC5400389 DOI: 10.1093/molbev/msx061] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 01/07/2023] Open
Abstract
While the principal force directing coding sequence (CDS) evolution is selection on protein function, to ensure correct gene expression CDSs must also maintain interactions with RNA-binding proteins (RBPs). Understanding how our genes are shaped by these RNA-level pressures is necessary for diagnostics and for improving transgenes. However, the evolutionary impact of the need to maintain RBP interactions remains unresolved. Are coding sequences constrained by the need to specify RBP binding motifs? If so, what proportion of mutations are affected? Might sequence evolution also be constrained by the need not to specify motifs that might attract unwanted binding, for instance because it would interfere with exon definition? Here, we have scanned human CDSs for motifs that have been experimentally determined to be recognized by RBPs. We observe two sets of motifs: those that are enriched over a nucleotide-controlled null and those that are depleted. Importantly, the depleted set is enriched for motifs recognized by non-CDS binding RBPs. Supporting the functional relevance of our observations, we find that motifs that are more enriched are also slower-evolving. The net effect of this selection to preserve is a reduction in the overall rate of synonymous evolution of 2-3% in both primates and rodents. Stronger motif depletion, on the other hand, is associated with stronger selection against motif gain in evolution. The challenge faced by our CDSs is therefore not only one of attracting the right RBPs but also of avoiding the wrong ones, all while also evolving under selection pressures related to protein structure.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
|
10
|
Nikolaou C. Invisible cities: segregated domains in the yeast genome with distinct structural and functional attributes. Curr Genet 2017; 64:247-258. [PMID: 28780612 DOI: 10.1007/s00294-017-0731-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/19/2017] [Revised: 07/31/2017] [Accepted: 08/02/2017] [Indexed: 02/07/2023]
Abstract
Recent advances in our understanding of the three-dimensional organization of the eukaryotic nucleus have rendered the spatial distribution of genes increasingly relevant. In a recent work (Tsochatzidou et al., Nucleic Acids Res 45:5818-5828, 2017), we proposed the existence of a functional compartmentalization of the yeast genome according to which genes occupying the chromosomal regions at the nuclear periphery have distinct structural, functional and evolutionary characteristics compared to their centromeric-proximal counterparts. Around the same time, it was also shown that the genome of Saccharomyces cerevisiae is organized in topologically associated domains (TADs), which are largely associated with the replication timing. In this work, we proceed to investigate whether such units of three-dimensional genomic organization can be linked to transcriptional activity as a driving force for the shaping of genomic architecture. Through the application of a simple boundary-calling criterion in genome-wide 3C data, we define ~100 TAD-like domains which can be clustered in six different classes with radically different nucleosomal organizations, significant variations in transcription factor binding and uneven chromosomal distribution. Approximately 20% of the genome is found to be confined in regions with "closed" chromatin structure around gene promoters. Most interestingly, we find both "open" and "closed" regions to be segregated, in the sense that they tend to avoid inter-chromosomal interactions. Our data further reinforce the notion of a marked compartmentalization of the yeast genome in isolated territories, with implications for its function and evolution.
Affiliation(s)
- Christoforos Nikolaou
- Computational Genomics Group, Department of Biology, University of Crete, 70013, Herakleion, Greece
|
11
|
Savisaar R, Hurst LD. Estimating the prevalence of functional exonic splice regulatory information. Hum Genet 2017; 136:1059-1078. [PMID: 28405812 PMCID: PMC5602102 DOI: 10.1007/s00439-017-1798-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 01/26/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022]
Abstract
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
|
12
|
Wu X, Hurst LD. Determinants of the Usage of Splice-Associated cis-Motifs Predict the Distribution of Human Pathogenic SNPs. Mol Biol Evol 2016; 33:518-29. [PMID: 26545919 PMCID: PMC4866546 DOI: 10.1093/molbev/msv251] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 11/03/2015] [Revised: 10/21/2015] [Accepted: 10/25/2015] [Indexed: 12/11/2022] Open
Abstract
Where in genes do pathogenic mutations tend to occur and does this provide clues as to the possible underlying mechanisms by which single nucleotide polymorphisms (SNPs) cause disease? As splice-disrupting mutations tend to occur predominantly at exon ends, known also to be hot spots of cis-exonic splice control elements, we examine the relationship between the relative density of such exonic cis-motifs and pathogenic SNPs. In particular, we focus on the intragene distribution of exonic splicing enhancers (ESEs) and the covariance between them and disease-associated SNPs. In addition to showing that disease-causing genes tend to be genes with a high intron density, consistent with missplicing, five factors established as trends in ESE usage are considered: relative position in exons, relative position in genes, flanking intron size, splice site usage, and phase. We find that more than 76% of pathogenic SNPs are within 3-69 bp of exon ends where ESEs generally reside, this being 13% more than expected. Overall, from enrichment of pathogenic SNPs at exon ends, we estimate that approximately 20-45% of SNPs affect splicing. Importantly, we find that within genes pathogenic SNPs tend to occur in splicing-relevant regions with low ESE density: they are found to occur preferentially in the terminal half of genes, in exons flanked by short introns and at the ends of phase (0,0) exons with a 3' non-"AGgt" splice site. We suggest the concept of the "fragile" exon, one home to pathogenic SNPs owing to its vulnerability to splice disruption owing to low ESE density.
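The "within 3-69 bp of exon ends" measurement can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: coordinates are taken to be 1-based and inclusive, the first and last bases of an exon are assigned distance 1, and the function names (`distance_to_exon_end`, `fraction_in_flanks`) are hypothetical.

```python
def distance_to_exon_end(pos, exon_start, exon_end):
    """Distance in bp from a position to the nearest exon boundary.

    Assumes 1-based inclusive coordinates; the boundary base has distance 1.
    """
    if not exon_start <= pos <= exon_end:
        raise ValueError("position lies outside the exon")
    return min(pos - exon_start, exon_end - pos) + 1


def fraction_in_flanks(snps, lo=3, hi=69):
    """Fraction of SNPs lying lo..hi bp (inclusive) from the nearest exon end.

    `snps` is a list of (position, exon_start, exon_end) tuples.
    """
    if not snps:
        raise ValueError("no SNPs supplied")
    hits = sum(lo <= distance_to_exon_end(p, s, e) <= hi for p, s, e in snps)
    return hits / len(snps)


if __name__ == "__main__":
    # Toy coordinates, not real variants: one exon spanning 100..400.
    toy_snps = [(105, 100, 400), (250, 100, 400), (399, 100, 400), (101, 100, 400)]
    print(fraction_in_flanks(toy_snps))
```

Judging whether an observed fraction exceeds expectation, as in the abstract's "13% more than expected", would additionally need a null model of where SNPs fall, which this sketch does not attempt.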
Affiliation(s)
- XianMing Wu
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom
- Laurence D Hurst
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom
|
13
|
Abstract
Exonic splice enhancers (ESEs) are short nucleotide motifs, enriched near exon ends, that enhance the recognition of the splice site and thus promote splicing. Are intronless genes under selection to avoid these motifs so as not to attract the splicing machinery to an mRNA that should not be spliced, thereby preventing the production of an aberrant transcript? Consistent with this possibility, we find that ESEs in putative recent retrocopies are at a higher density and evolving faster than those in other intronless genes, suggesting that they are being lost. Moreover, intronless genes are less dense in putative ESEs than intron-containing ones. However, this latter difference is likely due to the skewed base composition of intronless sequences, a skew that is in line with the general GC richness of few-exon genes. Indeed, after controlling for such biases, we find that both intronless and intron-containing genes are denser in ESEs than expected by chance. Importantly, nucleotide-controlled analysis of evolutionary rates at synonymous sites in ESEs indicates that the ESEs in intronless genes are under purifying selection in both human and mouse. We conclude that on the loss of introns, some, but not all, ESE motifs are lost, the remainder having functions beyond a role in splice promotion. These results have implications for the design of intronless transgenes and for understanding the causes of selection on synonymous sites.
Affiliation(s)
- Rosina Savisaar
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
|
14
|
Zhou K, Salamov A, Kuo A, Aerts AL, Kong X, Grigoriev IV. Alternative splicing acting as a bridge in evolution. Stem Cell Investig 2015; 2:19. [PMID: 27358887 DOI: 10.3978/j.issn.2306-9759.2015.10.01] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/12/2015] [Accepted: 10/15/2015] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Alternative splicing (AS) regulates diverse cellular and developmental functions through alternative protein structures of different isoforms. Alternative exons dominate AS in vertebrates; however, very little is known about the extent and function of AS in lower eukaryotes. To understand the role of introns in gene evolution, we examined AS from a green algal and five fungal genomes using a novel EST-based gene-modeling algorithm (COMBEST). METHODS: AS from each genome was classified with COMBEST, which maps EST sequences to genomes to build gene models. Various aspects of AS were analyzed through statistical methods. The interplay of intron 3n length, phase, coding property, and intron retention (RI) was examined with Chi-square testing. RESULTS: With 3 to 834 times EST coverage, we identified up to 73% of AS in intron-containing genes and found a preponderance of RI among 11 types of AS. The number of exons, expression level, and maximum intron length correlated with number of AS per gene (NAG), and intron-rich genes suppressed AS. Genes with AS were more ancient, and AS was conserved among fungal genomes. Among stopless introns, non-retained introns (NRI) avoided, but major RI preferred 3n length. In contrast, stop-containing introns showed uniform distribution among 3n, 3n+1, and 3n+2 lengths. We found a clue to the intron phase enigma: it was the coding function of introns involved in AS that dictates the intron phase bias. CONCLUSIONS: The majority of AS is non-functional, and the extent of AS is suppressed for intron-rich genes. RI through 3n length, stop codon, and phase bias bridges the transition from functionless to functional alternative isoforms.
Affiliation(s)
- Kemin Zhou, Asaf Salamov, Alan Kuo, Andrea L Aerts, Xiangyang Kong, Igor V Grigoriev
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China

15
Smithers B, Oates ME, Gough J. Splice junctions are constrained by protein disorder. Nucleic Acids Res 2015; 43:4814-22. [PMID: 25934802 PMCID: PMC4446445 DOI: 10.1093/nar/gkv407] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 01/27/2015] [Accepted: 04/15/2015] [Indexed: 01/23/2023] Open
Abstract
We have discovered that positions of splice junctions in genes are constrained by the tolerance for disorder-promoting amino acids in the translated protein region. It is known that efficient splicing requires nucleotide bias at the splice junction; the preferred usage produces a distribution of amino acids that is disorder-promoting. We observe that efficiency of splicing, as seen in the amino-acid distribution, is not compromised to accommodate globular structure. Thus we infer that it is the positions of splice junctions in the gene that must be under constraint by the local protein environment. Examining exonic splicing enhancers (short DNA motifs) found near the splice junction in the gene reveals that these are more prevalent in exons that encode disordered protein regions than in exons encoding structured regions. Thus we also conclude that local protein features constrain efficient splicing more in structure than in disorder.
Affiliation(s)
- Ben Smithers
- Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Matt E Oates
- Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Julian Gough
- Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK

16
Wu X, Hurst LD. Why Selection Might Be Stronger When Populations Are Small: Intron Size and Density Predict within and between-Species Usage of Exonic Splice Associated cis-Motifs. Mol Biol Evol 2015; 32:1847-61. [PMID: 25771198 PMCID: PMC4476162 DOI: 10.1093/molbev/msv069] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 01/26/2023] Open
Abstract
The nearly neutral theory predicts that small effective population size provides the conditions for weakened selection. This is postulated to explain why our genome is more “bloated” than that of, for example, yeast, ours having large introns and large intergene spacer. If a bloated genome is also an error prone genome might it, however, be the case that selection for error-mitigating properties is stronger in our genome? We examine this notion using splicing as an exemplar, not least because large introns can predispose to noisy splicing. We thus ask whether, owing to genomic decay, selection for splice error-control mechanisms is stronger, not weaker, in species with large introns and small populations. In humans much information defining splice sites is in cis-exonic motifs, most notably exonic splice enhancers (ESEs). These act as splice-error control elements. Here then we ask whether within and between-species intron size is a predictor of the commonality of exonic cis-splicing motifs. We show that, as predicted, the proportion of synonymous sites that are ESE-associated and under selection in humans is weakly positively correlated with the size of the flanking intron. In a phylogenetically controlled framework, we observe, also as expected, that mean intron size is both predicted by Ne.μ and is a good predictor of cis-motif usage across species, this usage coevolving with splice site definition. Unexpectedly, however, across taxa intron density is a better predictor of cis-motif usage than intron size. We propose that selection for splice-related motifs is driven by a need to avoid decoy splice sites that will be more common in genes with many and large introns. That intron number and density predict ESE usage within human genes is consistent with this, as is the finding of intragenic heterogeneity in ESE density. 
As intronic content and splice site usage across species is also well predicted by Ne.μ, the result also suggests an unusual circumstance in which selection (for cis-modifiers of splicing) might be stronger when population sizes are smaller, as here splicing is noisier, resulting in a greater need to control error-prone splicing.
Affiliation(s)
- XianMing Wu
- Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom

17
Schüler A, Ghanbarian AT, Hurst LD. Purifying selection on splice-related motifs, not expression level nor RNA folding, explains nearly all constraint on human lincRNAs. Mol Biol Evol 2014; 31:3164-83. [PMID: 25158797 PMCID: PMC4245815 DOI: 10.1093/molbev/msu249] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Indexed: 12/28/2022] Open
Abstract
There are two strong and equally important predictors of rates of human protein evolution: The amount the gene is expressed and the proportion of exonic sequence devoted to control splicing, mediated largely by selection on exonic splice enhancer (ESE) motifs. Is the same true for noncoding RNAs, known to be under very weak purifying selection? Prior evidence suggests that selection at splice sites in long intergenic noncoding RNAs (lincRNAs) is important. We now report multiple lines of evidence indicating that the great majority of purifying selection operating on lincRNAs in humans is splice related. Splice-related parameters explain much of the between-gene variation in evolutionary rate in humans. Expression rate is not a relevant predictor, although expression breadth is weakly so. In contrast to protein-coding RNAs, we observe no relationship between evolutionary rate and lincRNA stability. As in protein-coding genes, ESEs are especially abundant near splice junctions and evolve slower than non-ESE sequence equidistant from boundaries. Nearly all constraint in lincRNAs is at exon ends (N.B. the same is not witnessed in Drosophila). Although we cannot definitely answer the question as to why splice-related selection is so important, we find no evidence that splicing might enable the nonsense-mediated decay pathway to capture transcripts incorrectly processed by ribosomes. We find evidence consistent with the notion that splicing modifies the underlying chromatin through recruitment of splice-coupled chromatin modifiers, such as CHD1, which in turn might modulate neighbor gene activity. We conclude that most selection on human lincRNAs is splice mediated and suggest that the possibility of splice-chromatin coupling is worthy of further scrutiny.
Affiliation(s)
- Andreas Schüler
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Avazeh T Ghanbarian
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom

18
Wollschlaeger C, Trevijano-Contador N, Wang X, Legrand M, Zaragoza O, Heitman J, Janbon G. Distinct and redundant roles of exonucleases in Cryptococcus neoformans: implications for virulence and mating. Fungal Genet Biol 2014; 73:20-8. [PMID: 25267175 DOI: 10.1016/j.fgb.2014.09.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/22/2014] [Revised: 09/17/2014] [Accepted: 09/22/2014] [Indexed: 01/26/2023]
Abstract
Opportunistic pathogens like Cryptococcus neoformans are constantly exposed to changing environments, in their natural habitat as well as when encountering a human host. This requires a coordinated program to regulate gene expression that can act at the levels of mRNA synthesis and also mRNA degradation. Here, we find that deletion of the gene encoding the major cytoplasmic 5'→3' exonuclease Xrn1p in C. neoformans has important consequences for virulence associated phenotypes such as growth at 37 °C, capsule and melanin. In an invertebrate model of cryptococcosis the alteration of these virulence properties corresponds to avirulence of the xrn1Δ mutant strains. Additionally, deletion of XRN1 impairs uni- and bisexual mating. On a molecular level, the absence of XRN1 is associated with the upregulation of other major exonuclease encoding genes (i.e. XRN2 and RRP44). Using inducible alleles of RRP44 and XRN2, we show that artificial overexpression of these genes alters LAC1 gene expression and mating. Our data thus suggest the existence of a complex interdependent regulation of exonuclease encoding genes that impact upon virulence and mating in C. neoformans.
Affiliation(s)
- Carolin Wollschlaeger
- Institut Pasteur, Unité Biologie et Pathogénicité Fongiques - INRA USC2019, 75015 Paris, France
- Nuria Trevijano-Contador
- Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
- Xuying Wang
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Mélanie Legrand
- Institut Pasteur, Unité Biologie et Pathogénicité Fongiques - INRA USC2019, 75015 Paris, France
- Oscar Zaragoza
- Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
- Joseph Heitman
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Guilhem Janbon
- Institut Pasteur, Unité Biologie et Pathogénicité Fongiques - INRA USC2019, 75015 Paris, France

19
Abstract
The ability to distinguish self from non-self nucleic acids enables eukaryotes to suppress mobile elements and maintain genome integrity. In organisms from protist to human, this function is performed by RNA silencing pathways. There have been major advances in our understanding of the RNA silencing machinery, but the mechanisms by which these pathways distinguish self from non-self remain unclear. Recent studies in the yeast C. neoformans indicate that transposon-derived transcripts encode suboptimal introns and tend to stall in spliceosomes, which promotes the biogenesis of siRNA that targets these transcripts. These findings identify gene expression signal strength as a metric by which a foreign element can be distinguished from a host gene, and reveal a new function for introns and the spliceosome in genome defense. Anticipating that these principles may apply to RNA silencing in other systems, we discuss strong hints in the literature suggesting that the spliceosome may guide small RNA biogenesis in the siRNA and piRNA pathways of plants and animals.
Affiliation(s)
- Phillip A Dumesic
- Department of Biochemistry and Biophysics; University of California; San Francisco, CA USA
- Hiten D Madhani
- Department of Biochemistry and Biophysics; University of California; San Francisco, CA USA

20
Abstract
In this work we review the current knowledge on the prehistory, origins, and evolution of spliceosomal introns. First, we briefly outline the major features of the different types of introns, with particular emphasis on the nonspliceosomal self-splicing group II introns, which are widely thought to be the ancestors of spliceosomal introns. Next, we discuss the main scenarios proposed for the origin and proliferation of spliceosomal introns, an event intimately linked to eukaryogenesis. We then summarize the evidence that suggests that the last eukaryotic common ancestor (LECA) had remarkably high intron densities and many associated characteristics resembling modern intron-rich genomes. From this intron-rich LECA, the different eukaryotic lineages have taken very distinct evolutionary paths leading to profoundly diverged modern genome structures. Finally, we discuss the origins of alternative splicing and the qualitative differences in alternative splicing forms and functions across lineages.
Affiliation(s)
- Manuel Irimia
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S3E1, Canada
- Scott William Roy
- Department of Biology, San Francisco State University, San Francisco, California 94132

21
Abstract
Evolutionary conservation has been an accurate predictor of functional elements across the first decade of metazoan genomics. More recently, there has been a move to define functional elements instead from biochemical annotations. Evolutionary methods are, however, more comprehensive than biochemical approaches can be and can assess quantitatively, especially for subtle effects, how biologically important (how injurious after mutation) different types of elements are. Evolutionary methods are thus critical for understanding the large fraction (up to 10%) of the human genome that does not encode proteins and yet might convey function. These methods can also capture the ephemeral nature of much noncoding functional sequence, with large numbers of functional elements having been gained and lost rapidly along each mammalian lineage. Here, we review how different strengths of purifying selection have impacted on protein-coding and non-protein-coding loci and on transcription factor binding sites in mammalian and fruit fly genomes.
Affiliation(s)
- Wilfried Haerty
- MRC Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom

22
Abstract
The intron-exon structures of eukaryotic nuclear genomes exhibit tremendous diversity across different species. The availability of many genomes from diverse eukaryotic species now allows for the reconstruction of the evolutionary history of this diversity. Consideration of spliceosomal systems in comparative context reveals a surprising and very complex portrait: in contrast to many expectations, gene structures in early eukaryotic ancestors were highly complex and "animal or plant-like" in many of their spliceosomal structures, and pronounced simplification of gene structures, splicing signals, and spliceosomal machinery has occurred independently in many lineages. In addition, next-generation sequencing of transcripts has revealed that alternative splicing is more common across eukaryotes than previously thought. However, much alternative splicing in diverse eukaryotes appears to play a regulatory role: alternative splicing fulfilling its most famous role, the production of multiple different proteins from a single gene, appears to be much more common in animal species than in nearly any other lineage.
Affiliation(s)
- Scott William Roy
- Department of Biology, San Francisco State University, San Francisco, CA, USA

23
Yang YF, Zhu T, Niu DK. Association of intron loss with high mutation rate in Arabidopsis: implications for genome size evolution. Genome Biol Evol 2013; 5:723-33. [PMID: 23516254 PMCID: PMC4104619 DOI: 10.1093/gbe/evt043] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 12/13/2022] Open
Abstract
Despite the prevalence of intron losses during eukaryotic evolution, the selective forces acting on them have not been extensively explored. Arabidopsis thaliana lost half of its genome and experienced an elevated rate of intron loss after diverging from A. lyrata. The selective force for genome reduction was suggested to have driven the intron loss. However, the evolutionary mechanism of genome reduction is still a matter of debate. In this study, we found that intron-lost genes have high synonymous substitution rates. Assuming that differences in mutability among different introns are conserved among closely related species, we used the nucleotide substitution rate between orthologous introns in other species as the proxy of the mutation rate of Arabidopsis introns, either lost or extant. The lost introns were found to have higher mutation rates than extant introns. At the genome-wide level, A. thaliana has a higher mutation rate than A. lyrata, which correlates with the higher rate of intron loss and rapid genome reduction of A. thaliana. Our results indicate that selection to minimize mutational hazards might be the selective force for intron loss, and possibly also for genome reduction, in the evolution of A. thaliana. Small genome size and lower genome-wide intron density were widely reported to be correlated with phenotypic features, such as high metabolic rates and rapid growth. We argue that the mutational-hazard hypothesis is compatible with these correlations, by suggesting that selection for rapid growth might indirectly increase mutational hazards.
Affiliation(s)
- Yu-Fei Yang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, China

24
Goebels C, Thonn A, Gonzalez-Hilarion S, Rolland O, Moyrand F, Beilharz TH, Janbon G. Introns regulate gene expression in Cryptococcus neoformans in a Pab2p dependent pathway. PLoS Genet 2013; 9:e1003686. [PMID: 23966870 PMCID: PMC3744415 DOI: 10.1371/journal.pgen.1003686] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 01/10/2012] [Accepted: 06/17/2013] [Indexed: 11/18/2022] Open
Abstract
Most Cryptococcus neoformans genes are interrupted by introns, and alternative splicing occurs very often. In this study, we examined the influence of introns on C. neoformans gene expression. For most tested genes, elimination of introns greatly reduces mRNA accumulation. Strikingly, the number and the position of introns modulate the gene expression level in a cumulative manner. A screen for mutant strains able to functionally express an intronless allele revealed that the nuclear poly(A) binding protein Pab2 modulates intron-dependent regulation of gene expression in C. neoformans. PAB2 deletion partially restored accumulation of intronless mRNA. In addition, our results demonstrated that the essential nucleases Rrp44p and Xrn2p are implicated in the degradation of mRNA transcribed from an intronless allele in C. neoformans. Double mutant constructions and over-expression experiments suggested that Pab2p and Xrn2p could act in the same pathway whereas Rrp44p appears to act independently. Finally, deletion of the RRP6 or the CID14 gene, encoding the nuclear exosome nuclease and the TRAMP complex associated poly(A) polymerase, respectively, has no effect on intronless allele expression. Cryptococcus neoformans is a major human pathogen responsible for deadly infection in immunocompromised patients. The analysis of its genome previously revealed that most of its genes are interrupted by introns. Here, we demonstrate that introns modulate gene expression in a cumulative manner. We also demonstrate that introns can play a positive or a negative role in this process. We identify a nuclear poly(A) binding protein (Pab2p) as implicated in the intron-dependent control of gene expression in C. neoformans. We also demonstrate that the essential nucleases Rrp44p and Xrn2p are implicated in two independent pathways controlling the intron-dependent regulation of gene expression in C. neoformans. Xrn2p regulation seems to depend on Pab2p whereas Rrp44p acts independently. 
In contrast, the other exosome nuclease Rrp6p and the TRAMP associated poly(A) polymerase Cid14p do not appear to be implicated in this regulation. Our results provide new insights into the regulation of gene expression in eukaryotes and more specifically into the biology and virulence of C. neoformans.
Affiliation(s)
- Carolin Goebels
- Institut Pasteur, Unité des Aspergillus, Département Parasitologie et Mycologie, Paris, France
- Aline Thonn
- Institut Pasteur, Unité des Aspergillus, Département Parasitologie et Mycologie, Paris, France
- Sara Gonzalez-Hilarion
- Institut Pasteur, Unité des Aspergillus, Département Parasitologie et Mycologie, Paris, France
- Olga Rolland
- Institut Pasteur, Unité des Aspergillus, Département Parasitologie et Mycologie, Paris, France
- Frederique Moyrand
- Institut Pasteur, Unité des Aspergillus, Département Parasitologie et Mycologie, Paris, France
- Traude H. Beilharz
- Monash University, Department of Biochemistry and Molecular Biology, Clayton, Australia
- Guilhem Janbon
- Institut Pasteur, Unité des Aspergillus, Département Parasitologie et Mycologie, Paris, France

25
Dumesic PA, Natarajan P, Chen C, Drinnenberg IA, Schiller BJ, Thompson J, Moresco JJ, Yates JR, Bartel DP, Madhani HD. Stalled spliceosomes are a signal for RNAi-mediated genome defense. Cell 2013; 152:957-68. [PMID: 23415457 DOI: 10.1016/j.cell.2013.01.046] [Citation(s) in RCA: 127] [Impact Index Per Article: 10.6] [Received: 07/11/2012] [Revised: 11/13/2012] [Accepted: 01/17/2013] [Indexed: 11/29/2022]
Abstract
Using the yeast Cryptococcus neoformans, we describe a mechanism by which transposons are initially targeted for RNAi-mediated genome defense. We show that intron-containing mRNA precursors template siRNA synthesis. We identify a Spliceosome-Coupled And Nuclear RNAi (SCANR) complex required for siRNA synthesis and demonstrate that it physically associates with the spliceosome. We find that RNAi target transcripts are distinguished by suboptimal introns and abnormally high occupancy on spliceosomes. Functional investigations demonstrate that the stalling of mRNA precursors on spliceosomes is required for siRNA accumulation. Lariat debranching enzyme is also necessary for siRNA production, suggesting a requirement for processing of stalled splicing intermediates. We propose that recognition of mRNA precursors by the SCANR complex is in kinetic competition with splicing, thereby promoting siRNA production from transposon transcripts stalled on spliceosomes. Disparity in the strength of expression signals encoded by transposons versus host genes offers an avenue for the evolution of genome defense.
Affiliation(s)
- Phillip A Dumesic
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94158, USA

26
Wu X, Tronholm A, Cáceres EF, Tovar-Corona JM, Chen L, Urrutia AO, Hurst LD. Evidence for deep phylogenetic conservation of exonic splice-related constraints: splice-related skews at exonic ends in the brown alga Ectocarpus are common and resemble those seen in humans. Genome Biol Evol 2013; 5:1731-45. [PMID: 23902749 PMCID: PMC3787667 DOI: 10.1093/gbe/evt115] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Accepted: 07/25/2013] [Indexed: 12/22/2022] Open
Abstract
The control of RNA splicing is often modulated by exonic motifs near splice sites. Chief among these are exonic splice enhancers (ESEs). Well-described ESEs in mammals are purine rich and cause predictable skews in codon and amino acid usage toward exonic ends. Looking across species, those with relatively abundant intronic sequence are those with the more profound end of exon skews, indicative of exonization of splice site recognition. To date, the only intron-rich species that have been analyzed are mammals, precluding any conclusions about the likely ancestral condition. Here, we examine the patterns of codon and amino acid usage in the vicinity of exon-intron junctions in the brown alga Ectocarpus siliculosus, a species with abundant large introns, known SR proteins, and classical splice sites. We find that amino acids and codons preferred/avoided at both 3' and 5' ends in Ectocarpus, of which there are many, tend, on average, to also be preferred/avoided at the same exon ends in humans. Moreover, the preferences observed at the 5' ends of exons are largely the same as those at the 3' ends, a symmetry trend only previously observed in animals. We predict putative hexameric ESEs in Ectocarpus and show that these are purine rich and that there are many more of these identified as functional ESEs in humans than expected by chance. These results are consistent with deep phylogenetic conservation of SR protein binding motifs. Assuming codons preferred near boundaries are "splice optimal" codons, in Ectocarpus, unlike Drosophila, splice optimal and translationally optimal codons are not mutually exclusive. The exclusivity of translationally optimal and splice optimal codon sets is thus not universal.
Affiliation(s)
- XianMing Wu
- Department of Biology and Biochemistry, University of Bath, Somerset, United Kingdom
- Ana Tronholm
- Department of Biology and Biochemistry, University of Bath, Somerset, United Kingdom
- Present address: Department of Biological Sciences, University of Alabama, Mary Harmon Bryant Hall, Tuscaloosa, AL
- Eva Fernández Cáceres
- Department of Biology and Biochemistry, University of Bath, Somerset, United Kingdom
- Jaime M. Tovar-Corona
- Department of Biology and Biochemistry, University of Bath, Somerset, United Kingdom
- Lu Chen
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, United Kingdom
- Araxi O. Urrutia
- Department of Biology and Biochemistry, University of Bath, Somerset, United Kingdom
- Laurence D. Hurst
- Department of Biology and Biochemistry, University of Bath, Somerset, United Kingdom

27
Williams C, Hoppe HJ, Rezgui D, Strickland M, Forbes BE, Grutzner F, Frago S, Ellis RZ, Wattana-Amorn P, Prince SN, Zaccheo OJ, Nolan CM, Mungall AJ, Jones EY, Crump MP, Hassan AB. An exon splice enhancer primes IGF2:IGF2R binding site structure and function evolution. Science 2012; 338:1209-13. [PMID: 23197533 PMCID: PMC4658703 DOI: 10.1126/science.1228633] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 12/16/2022]
Abstract
Placental development and genomic imprinting coevolved with parental conflict over resource distribution to mammalian offspring. The imprinted genes IGF2 and IGF2R code for the growth promoter insulin-like growth factor 2 (IGF2) and its inhibitor, mannose 6-phosphate (M6P)/IGF2 receptor (IGF2R), respectively. M6P/IGF2R of birds and fish do not recognize IGF2. In monotremes, which lack imprinting, IGF2 specifically bound M6P/IGF2R via a hydrophobic CD loop. We show that the DNA coding the CD loop in monotremes functions as an exon splice enhancer (ESE) and that structural evolution of binding site loops (AB, HI, FG) improved therian IGF2 affinity. We propose that ESE evolution led to the fortuitous acquisition of IGF2 binding by M6P/IGF2R that drew IGF2R into parental conflict; subsequent imprinting may then have accelerated affinity maturation.
Affiliation(s)
- Christopher Williams
- Department of Organic and Biological Chemistry, School of Chemistry, University of Bristol, Bristol BS8 1TS, UK

28
Weber CC, Pink CJ, Hurst LD. Late-replicating domains have higher divergence and diversity in Drosophila melanogaster. Mol Biol Evol 2011; 29:873-82. [PMID: 22046001 DOI: 10.1093/molbev/msr265] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 12/26/2022] Open
Abstract
Several reports from mammals indicate that an increase in the mutation rate in late-replicating regions may, in part, be responsible for the observed genomic heterogeneity in neutral substitution rates and levels of diversity, although the mechanisms for this remain poorly understood. Recent evidence also suggests that late replication is associated with high mutability in yeast. This then raises the question as to whether a similar effect is operating across all eukaryotes. Limited evidence from one chromosome arm in Drosophila melanogaster suggests the opposite pattern, with regions overlapping early-firing origins showing increased levels of diversity and divergence. Given the availability of genome-wide replication timing profiles for D. melanogaster, we now return to this issue. Consistent with what is seen in other taxa, we find that divergence at synonymous sites in exon cores, as well as divergence at putatively unconstrained intronic sites, is elevated in late-replicating regions. Analysis of genes with low codon usage bias suggests a ∼30% difference in mutation rate between the earliest and the latest replicating sequence. Intronic sequence suggests a more modest difference. We additionally show that an increase in diversity in late-replicating sequences is not owing to replication timing covarying with the local recombination rate. If anything, the effects of recombination mask the impact of replication timing. We conclude that, contrary to prior reports and consistent with what is seen in mammals and yeast, there is indeed a relationship between rates of nucleotide divergence and diversity and replication timing that is consistent with an increase in the mutation rate during late S-phase in D. melanogaster. It is therefore plausible that such an effect might be common among eukaryotes. The result may have implications for the inference of positive selection.
Affiliation(s)
- Claudia C Weber
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
|
29
|
Weber CC, Hurst LD. Intronic AT skew is a defendable proxy for germline transcription but does not predict crossing-over or protein evolution rates in Drosophila melanogaster. J Mol Evol 2010; 71:415-26. [PMID: 20938653 DOI: 10.1007/s00239-010-9395-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/24/2010] [Accepted: 09/17/2010] [Indexed: 01/28/2023]
Abstract
Recent evidence suggests that germline transcription may affect both protein evolutionary rates, possibly mediated by repair processes, and recombination rates, possibly mediated by chromatin and epigenetic modification. Here, we test these propositions in Drosophila melanogaster. The challenge for such analyses is to provide defendable measures of germline gene expression. Intronic AT skew is a good candidate measure as it is thought to be a consequence, at least in part, of transcription-coupled repair. Prior evidence suggests that intronic AT skew in D. melanogaster is not affected by proximity to intron extremities and differs between transcribed DNA and flanking sequence. We now also establish that intronic AT skew is a defendable proxy for germline expression as (a) it is more similar than expected by chance between introns of the same gene (which is not accounted for by physical proximity), (b) is correlated with male germline expression, and (c) is more pronounced in broadly expressed genes. Furthermore, (d) a trend for intronic skew to differ between 3' and 5' ends of genes is particular to broadly expressed genes. Finally, (e) controlling for physical distance, introns of proximate genes are most different in skew if they have different tissue specificity. We find that intronic AT skew, employed as a proxy for germline transcription, correlates neither with recombination rates nor with the rate of protein evolution. We conclude that there is no prima facie evidence that germline expression modulates recombination rates or monotonically affects protein evolution rates in D. melanogaster.
Affiliation(s)
- Claudia C Weber
- Department of Biology and Biochemistry, University of Bath, Bath, UK
|
30
|
Salato VK, Rediske NW, Zhang C, Hastings ML, Munroe SH. An exonic splicing enhancer within a bidirectional coding sequence regulates alternative splicing of an antisense mRNA. RNA Biol 2010; 7:179-90. [PMID: 20200494 DOI: 10.4161/rna.7.2.11182] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/31/2022] Open
Abstract
The discovery of increasing numbers of genes with overlapping sequences highlights the problem of expression in the context of constraining regulatory elements from more than one gene. This study identifies regulatory sequences encompassed within two genes that overlap in an antisense orientation at their 3' ends. The genes encode the alpha-thyroid hormone receptor (TRalpha or NR1A1) and Rev-erbalpha (NR1D1). In mammals, TRalpha pre-mRNAs are alternatively spliced to yield mRNAs encoding functionally antagonistic proteins: TRalpha1, an authentic thyroid hormone receptor; and TRalpha2, a non-hormone-binding variant that acts as a repressor. TRalpha2-specific splicing requires two regulatory elements that overlap with Rev-erbalpha sequences. Functional mapping of these elements reveals minimal splicing enhancer elements that have evolved within the constraints of the overlapping Rev-erbalpha sequence. These results provide insight into the evolution of regulatory elements within the context of bidirectional coding sequences. They also demonstrate the ability of the genetic code to accommodate multiple layers of information within a given sequence, an important property of the code recently suggested on theoretical grounds.
Affiliation(s)
- Valerie K Salato
- Department of Biological Sciences, Marquette University, Milwaukee, WI, USA
|
31
|
Why there is more to protein evolution than protein function: splicing, nucleosomes and dual-coding sequence. Biochem Soc Trans 2009; 37:756-61. [PMID: 19614589 DOI: 10.1042/bst0370756] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 12/19/2022]
Abstract
There is considerable variation in the rate at which different proteins evolve. Why is this? Classically, it has been considered that the density of functionally important sites must predict rates of protein evolution. Likewise, amino acid choice is usually assumed to reflect optimal protein function. In the present article, we briefly review evidence suggesting that this protein function-centred view is too simplistic. In particular, we concentrate on how selection acting during the protein's production history can also affect protein evolutionary rates and amino acid choice. Exploring the role of selection at the DNA and RNA level, we specifically address how the need (i) to specify exonic splice enhancer motifs in pre-mRNA, and (ii) to ensure nucleosome positioning on DNA have an impact on amino acid choice and rates of evolution. For both, we review evidence that sequence affected by more than one coding demand is particularly constrained. Strikingly, in mammals, splicing-related constraints are quantitatively as important as expression parameters in predicting rates of protein evolution. These results indicate that there is substantially more to protein evolution than protein functional constraints.
|
32
|
Irimia M, Roy SW, Neafsey DE, Abril JF, Garcia-Fernandez J, Koonin EV. Complex selection on 5' splice sites in intron-rich organisms. Genome Res 2009; 19:2021-7. [PMID: 19745111 DOI: 10.1101/gr.089276.108] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/17/2023]
Abstract
In contrast to the typically streamlined genomes of prokaryotes, many eukaryotic genomes are riddled with long intergenic regions, spliceosomal introns, and repetitive elements. What explains the persistence of these and other seemingly suboptimal structures? There are three general hypotheses: (1) the structures in question are not actually suboptimal but optimal, being favored by selection, for unknown reasons; (2) the structures are not suboptimal, but of (essentially) equal fitness to "optimal" ones; or (3) the structures are truly suboptimal, but selection is too weak to systematically eliminate them. The 5' splice sites of introns offer a rare opportunity to directly test these hypotheses. Intron-poor species show a clear consensus splice site; most introns begin with the same six-nucleotide sequence (typically GTAAGT or GTATGT), indicating efficient selection for this consensus sequence. In contrast, intron-rich species have much less pronounced boundary consensus sequences, and only small minorities of introns in intron-rich species share the same boundary sequence. We studied rates of evolutionary change of 5' splice sites in three groups of closely related intron-rich species: three primates, five Drosophila species, and four Cryptococcus fungi. Surprisingly, the results indicate that changes from consensus-to-variant nucleotides are generally disfavored by selection, but that changes from variant to consensus are neither favored nor disfavored. This evolutionary pattern is consistent with selective differences across introns, for instance, due to compensatory changes at other sites within the gene, which compensate for the otherwise suboptimal consensus-to-variant changes in splice boundaries.
Affiliation(s)
- Manuel Irimia
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, 08028 Barcelona, Spain
|
33
|
Splicing in the eukaryotic ancestor: form, function and dysfunction. Trends Ecol Evol 2009; 24:447-55. [DOI: 10.1016/j.tree.2009.04.005] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 11/20/2008] [Revised: 03/30/2009] [Accepted: 04/01/2009] [Indexed: 12/11/2022]
|
34
|
Evolution of alternative splicing regulation: changes in predicted exonic splicing regulators are not associated with changes in alternative splicing levels in primates. PLoS One 2009; 4:e5800. [PMID: 19495418 PMCID: PMC2686173 DOI: 10.1371/journal.pone.0005800] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 04/22/2009] [Accepted: 05/12/2009] [Indexed: 12/12/2022] Open
Abstract
Alternative splicing is tightly regulated in a spatio-temporal and quantitative manner. This regulation is achieved by a complex interplay between spliceosomal (trans) factors that bind to different sequence (cis) elements. cis-elements reside in both introns and exons and may either enhance or silence splicing. Differential combinations of cis-elements allow for a huge diversity of overall splicing signals, together comprising a complex ‘splicing code’. Many cis-elements have been identified, and their effects on exon inclusion levels demonstrated in reporter systems. However, the impact of interspecific differences in these elements on the evolution of alternative splicing levels has not yet been investigated at the genomic level. Here we study the effect of interspecific differences in predicted exonic splicing regulators (ESRs) on exon inclusion levels in human and chimpanzee. For this purpose, we compiled and studied comprehensive datasets of predicted ESRs, identified by several computational and experimental approaches, as well as microarray data for changes in alternative splicing levels between human and chimpanzee. Surprisingly, we found no association between changes in predicted ESRs and changes in alternative splicing levels. This observation holds across different ESR exon positions, exon lengths, and 5′ splice site strengths. We suggest that this lack of association is mainly due to the great importance of context for ESR functionality: many ESR-like motifs in primates may have little or no effect on splicing, and thus interspecific changes at short time scales may primarily occur in these effectively neutral ESRs. These results underscore the difficulties of using current computational ESR prediction algorithms to identify truly functionally important motifs, and provide a cautionary tale for studies of the effect of SNPs on splicing in human disease.
|