1
Radrizzani S, Kudla G, Izsvák Z, Hurst LD. Selection on synonymous sites: the unwanted transcript hypothesis. Nat Rev Genet 2024; 25:431-448. [PMID: 38297070 DOI: 10.1038/s41576-023-00686-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/04/2023] [Indexed: 02/02/2024]
Abstract
Although translational selection to favour codons that match the most abundant tRNAs is not readily observed in humans, there is nonetheless selection in humans on synonymous mutations. We hypothesize that much of this synonymous site selection can be explained in terms of protection against unwanted RNAs - spurious transcripts, mis-spliced forms or RNAs derived from transposable elements or viruses. We propose not only that selection on synonymous sites functions to reduce the rate of creation of unwanted transcripts (for example, through selection on exonic splice enhancers and cryptic splice sites) but also that high-GC content (but low-CpG content), together with intron presence and position, is both particular to functional native mRNAs and used to recognize transcripts as native. In support of this hypothesis, transcription, nuclear export, liquid phase condensation and RNA degradation have all recently been shown to promote GC-rich transcripts and suppress AU/CpG-rich ones. With such 'traps' being set against AU/CpG-rich transcripts, the codon usage of native genes has, in turn, evolved to avoid such suppression. That parallel filters against AU/CpG-rich transcripts also affect the endosomal import of RNAs further supports the unwanted transcript hypothesis of synonymous site selection and explains the similar design rules that have enabled the successful use of transgenes and RNA vaccines.
Affiliation(s)
- Sofia Radrizzani
- Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Grzegorz Kudla
- MRC Human Genetics Unit, Institute for Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
- Zsuzsanna Izsvák
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Society, Berlin, Germany
- Laurence D Hurst
- Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK.
2
Suo C, Polanski K, Dann E, Lindeboom RGH, Vilarrasa-Blasi R, Vento-Tormo R, Haniffa M, Meyer KB, Dratva LM, Tuong ZK, Clatworthy MR, Teichmann SA. Dandelion uses the single-cell adaptive immune receptor repertoire to explore lymphocyte developmental origins. Nat Biotechnol 2024; 42:40-51. [PMID: 37055623 PMCID: PMC10791579 DOI: 10.1038/s41587-023-01734-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/18/2022] [Accepted: 03/07/2023] [Indexed: 04/15/2023]
Abstract
Assessment of single-cell gene expression (single-cell RNA sequencing) and adaptive immune receptor (AIR) sequencing (scVDJ-seq) has been invaluable in studying lymphocyte biology. Here we introduce Dandelion, a computational pipeline for scVDJ-seq analysis. It enables the application of standard V(D)J analysis workflows to single-cell datasets, delivering improved V(D)J contig annotation and the identification of nonproductive and partially spliced contigs. We devised a strategy to create an AIR feature space that can be used for both differential V(D)J usage analysis and pseudotime trajectory inference. The application of Dandelion improved the alignment of human thymic development trajectories of double-positive T cells to mature single-positive CD4/CD8 T cells, generating predictions of factors regulating lineage commitment. Dandelion analysis of other cell compartments provided insights into the origins of human B1 cells and ILC/NK cell development, illustrating the power of our approach. Dandelion is available at https://www.github.com/zktuong/dandelion.
Affiliation(s)
- Chenqu Suo
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Department of Paediatrics, Cambridge University Hospitals, Cambridge, UK
- Emma Dann
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Muzlifah Haniffa
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Department of Dermatology and NIHR Newcastle Biomedical Research Centre, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- Kerstin B Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Lisa M Dratva
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Zewen Kelvin Tuong
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, UK.
- Frazer Institute, Faculty of Medicine, The University of Queensland, Brisbane, Queensland, Australia.
- Ian Frazer Centre for Children's Immunotherapy Research, Child Health Research Centre, Faculty of Medicine, The University of Queensland, Brisbane, Queensland, Australia.
- Menna R Clatworthy
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, UK.
- Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
- Theory of Condensed Matter, Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, UK.
3
Wong DK, Grisdale CJ, Slat VA, Rader SD, Fast NM. The evolution of pre-mRNA splicing and its machinery revealed by reduced extremophilic red algae. J Eukaryot Microbiol 2023; 70:e12927. [PMID: 35662328 DOI: 10.1111/jeu.12927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
Abstract
The Cyanidiales are a group of mostly thermophilic and acidophilic red algae that thrive near volcanic vents. Despite their phylogenetic relationship, the reduced genomes of Cyanidioschyzon merolae and Galdieria sulphuraria are strikingly different with respect to pre-mRNA splicing, a ubiquitous eukaryotic feature. Introns are rare and spliceosomal machinery is extremely reduced in C. merolae, in contrast to G. sulphuraria. Previous studies also revealed divergent spliceosomes in the mesophilic red alga Porphyridium purpureum and the red algal derived plastid of Guillardia theta (Cryptophyta), along with unusually high levels of unspliced transcripts. To further examine the evolution of splicing in red algae, we compared C. merolae and G. sulphuraria, investigating splicing levels, intron position, intron sequence features, and the composition of the spliceosome. In addition to identifying 11 additional introns in C. merolae, our transcriptomic analysis also revealed typical eukaryotic splicing in G. sulphuraria, whereas most transcripts in C. merolae remain unspliced. The distribution of intron positions within their host genes was examined to provide insight into patterns of intron loss in red algae. We observed increasing variability of 5' splice sites and branch donor regions with increasing intron richness. We also found these relationships to be connected to reductions in and losses of corresponding parts of the spliceosome. Our findings highlight patterns of intron and spliceosome evolution in related red algae under the pressures of genome reduction.
Affiliation(s)
- Donald K Wong
- Biodiversity Research Centre and Department of Botany, University of British Columbia, Vancouver, BC, Canada
- Cameron J Grisdale
- Biodiversity Research Centre and Department of Botany, University of British Columbia, Vancouver, BC, Canada
- Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
- Viktor A Slat
- Department of Chemistry, University of Northern British Columbia, Prince George, BC, Canada
- Stephen D Rader
- Department of Chemistry, University of Northern British Columbia, Prince George, BC, Canada
- Naomi M Fast
- Biodiversity Research Centre and Department of Botany, University of British Columbia, Vancouver, BC, Canada
4
An extended catalogue of tandem alternative splice sites in human tissue transcriptomes. PLoS Comput Biol 2021; 17:e1008329. [PMID: 33826604 PMCID: PMC8055015 DOI: 10.1371/journal.pcbi.1008329] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/07/2020] [Revised: 04/19/2021] [Accepted: 03/22/2021] [Indexed: 12/18/2022] Open
Abstract
Tandem alternative splice sites (TASS) are a special class of alternative splicing events that are characterized by a close tandem arrangement of splice sites. Most TASS lack functional characterization and are believed to arise from splicing noise. Based on the RNA-seq data from the Genotype Tissue Expression project, we present an extended catalogue of TASS in healthy human tissues and analyze their tissue-specific expression. The expression of TASS is usually dominated by one major splice site (maSS), while the expression of minor splice sites (miSS) is at least an order of magnitude lower. Among 46k miSS with sufficient read support, 9k (20%) are significantly expressed above the expected noise level, and among them 2.5k are expressed tissue-specifically. We found significant correlations between tissue-specific expression of RNA-binding proteins (RBP), tissue-specific expression of miSS, and miSS response to RBP inactivation by shRNA. In combination with RBP profiling by eCLIP, this allowed prediction of novel cases of tissue-specific splicing regulation including a miSS in QKI mRNA that is likely regulated by PTBP1. The analysis of human primary cell transcriptomes suggested that both tissue-specific and cell-type-specific factors contribute to the regulation of miSS expression. More than 20% of tissue-specific miSS affect structured protein regions and may adjust protein-protein interactions or modify the stability of the protein core. The significantly expressed miSS evolve under the same selection pressure as maSS, while other miSS lack signatures of evolutionary selection and conservation. Using mixture models, we estimated that not more than 15% of maSS and not more than 54% of tissue-specific miSS are noisy, while the proportion of noisy splice sites among non-significantly expressed miSS is above 63%.
Pre-mRNA splicing is an important step in the processing of the genomic information during gene expression. During splicing, introns are excised from a gene transcript, and the remaining exons are ligated. Our work concerns one particular subtype of splicing, which involves the so-called tandem alternative splice sites, a group of closely located exon borders that are used alternatively. We analyzed RNA-seq measurements of gene expression provided by the Genotype-Tissue Expression (GTEx) project, the largest to-date collection of such measurements in healthy human tissues, and constructed a detailed catalogue of tandem alternative splice sites. Within this catalogue, we characterized patterns of tissue-specific expression, regulation, impact on protein structure, and evolutionary selection acting on tandem alternative splice sites. In a number of genes, we predicted regulatory mechanisms that could be responsible for choosing one of many tandem alternative splice sites. The results of this study provide an invaluable resource for molecular biologists studying alternative splicing.
5
Abstract
BACKGROUND Eukaryotic protein-coding genes consist of exons and introns. Exon-intron borders are conserved between species, so changes in them might be observed only over quite long evolutionary distances. One of the rarest types of change, in which an intron relocates over a short distance, is called "intron sliding", but the reality of this event has long been debated. A search for intron sliding depends on the most accurate genome annotation and genome sequence available, as well as high-quality transcriptome data. We applied these resources in a search for sliding introns in mammals in order to widen knowledge about the presence or absence of this phenomenon in the group. RESULTS We did not find any significant evidence of intron sliding in the primate group (human, chimpanzee, rhesus macaque, crab-eating macaque, green monkey, marmoset). Only one possible intron sliding event supported by a set of high-quality transcriptomes was observed, between the human and sheep orthologs of the EIF1AX gene. We also checked a list of previously reported intron sliding events in mammals and showed that they are most likely artifacts of genome annotation: they do not appear in subsequent annotation versions and are not supported by transcriptomic data. CONCLUSIONS We assume that intron sliding is indeed a very rare evolutionary event, if it exists at all. Every case of intron sliding needs a lot of supporting data for detection and confirmation.
6
Palazzo AF, Koonin EV. Functional Long Non-coding RNAs Evolve from Junk Transcripts. Cell 2020; 183:1151-1161. [PMID: 33068526 DOI: 10.1016/j.cell.2020.09.047] [Citation(s) in RCA: 129] [Impact Index Per Article: 32.3] [Received: 05/19/2020] [Revised: 08/20/2020] [Accepted: 09/17/2020] [Indexed: 12/30/2022]
Abstract
Transcriptome studies reveal pervasive transcription of complex genomes, such as those of mammals. Despite popular arguments for functionality of most, if not all, of these transcripts, genome-wide analysis of selective constraints indicates that most of the produced RNA are junk. However, junk is not garbage. On the contrary, junk transcripts provide the raw material for the evolution of diverse long non-coding (lnc) RNAs by non-adaptive mechanisms, such as constructive neutral evolution. The generation of many novel functional entities, such as lncRNAs, that fuels organismal complexity does not seem to be driven by strong positive selection. Rather, the weak selection regime that dominates the evolution of most multicellular eukaryotes provides ample material for functional innovation with relatively little adaptation involved.
Affiliation(s)
- Alexander F Palazzo
- Department of Biochemistry, University of Toronto, Toronto, ON M5G 1M1, Canada.
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
7
Wong DK, Grisdale CJ, Fast NM. Evolution and Diversity of Pre-mRNA Splicing in Highly Reduced Nucleomorph Genomes. Genome Biol Evol 2018; 10:1573-1583. [PMID: 29860351 PMCID: PMC6009652 DOI: 10.1093/gbe/evy111] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 05/30/2018] [Indexed: 12/13/2022] Open
Abstract
Eukaryotic genes are interrupted by introns that are removed in a conserved process known as pre-mRNA splicing. Though well-studied in select model organisms, we are only beginning to understand the variation and diversity of this process across the tree of eukaryotes. We explored pre-mRNA splicing and other features of transcription in nucleomorphs, the highly reduced remnant nuclei of secondary endosymbionts. Strand-specific transcriptomes were sequenced from the cryptophyte Guillardia theta and the chlorarachniophyte Bigelowiella natans, whose plastids are derived from red and green algae, respectively. Both organisms exhibited elevated nucleomorph antisense transcription and gene expression relative to their respective nuclei, suggesting unique properties of gene regulation and transcriptional control in nucleomorphs. Marked differences in splicing were observed between the two nucleomorphs: the few introns of the G. theta nucleomorph were largely retained in mature transcripts, whereas the many short introns of the B. natans nucleomorph are spliced at typical eukaryotic levels (>90%). These differences in splicing levels could be reflecting the ancestries of the respective plastids, the different intron densities due to independent genome reduction events, or a combination of both. In addition to extending our understanding of the diversity of pre-mRNA splicing across eukaryotes, our study also indicates potential links between splicing, antisense transcription, and gene regulation in reduced genomes.
Affiliation(s)
- Donald K Wong
- Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
- Cameron J Grisdale
- Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
- Naomi M Fast
- Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
8
Abstract
In Cryptococcus neoformans, nearly all genes are interrupted by small introns. In recent years, genome annotation and genetic analysis have illuminated the major roles these introns play in the biology of this pathogenic yeast. Introns are necessary for gene expression and alternative splicing can regulate gene expression in response to environmental cues. In addition, recent studies have revealed that C. neoformans introns help to prevent transposon dissemination and protect genome integrity. These characteristics of cryptococcal introns are probably not unique to Cryptococcus, and this yeast likely can be considered as a model for intron-related studies in fungi.
Affiliation(s)
- Guilhem Janbon
- Unité Biologie des ARN des Pathogènes Fongiques, Département de Mycologie, Institut Pasteur, Paris, France
9
Zhou K, Salamov A, Kuo A, Aerts AL, Kong X, Grigoriev IV. Alternative splicing acting as a bridge in evolution. Stem Cell Investig 2015; 2:19. [PMID: 27358887 DOI: 10.3978/j.issn.2306-9759.2015.10.01] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/12/2015] [Accepted: 10/15/2015] [Indexed: 12/15/2022]
Abstract
BACKGROUND Alternative splicing (AS) regulates diverse cellular and developmental functions through alternative protein structures of different isoforms. Alternative exons dominate AS in vertebrates; however, very little is known about the extent and function of AS in lower eukaryotes. To understand the role of introns in gene evolution, we examined AS from a green algal and five fungal genomes using a novel EST-based gene-modeling algorithm (COMBEST). METHODS AS from each genome was classified with COMBEST that maps EST sequences to genomes to build gene models. Various aspects of AS were analyzed through statistical methods. The interplay of intron 3n length, phase, coding property, and intron retention (RI) were examined with Chi-square testing. RESULTS With 3 to 834 times EST coverage, we identified up to 73% of AS in intron-containing genes and found preponderance of RI among 11 types of AS. The number of exons, expression level, and maximum intron length correlated with number of AS per gene (NAG), and intron-rich genes suppressed AS. Genes with AS were more ancient, and AS was conserved among fungal genomes. Among stopless introns, non-retained introns (NRI) avoided, but major RI preferred 3n length. In contrast, stop-containing introns showed uniform distribution among 3n, 3n+1, and 3n+2 lengths. We found a clue to the intron phase enigma: it was the coding function of introns involved in AS that dictates the intron phase bias. CONCLUSIONS Majority of AS is non-functional, and the extent of AS is suppressed for intron-rich genes. RI through 3n length, stop codon, and phase bias bridges the transition from functionless to functional alternative isoforms.
Affiliation(s)
- Kemin Zhou
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Asaf Salamov
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Alan Kuo
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Andrea L Aerts
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Xiangyang Kong
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Igor V Grigoriev
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
10
Abstract
In this work we review the current knowledge on the prehistory, origins, and evolution of spliceosomal introns. First, we briefly outline the major features of the different types of introns, with particular emphasis on the nonspliceosomal self-splicing group II introns, which are widely thought to be the ancestors of spliceosomal introns. Next, we discuss the main scenarios proposed for the origin and proliferation of spliceosomal introns, an event intimately linked to eukaryogenesis. We then summarize the evidence that suggests that the last eukaryotic common ancestor (LECA) had remarkably high intron densities and many associated characteristics resembling modern intron-rich genomes. From this intron-rich LECA, the different eukaryotic lineages have taken very distinct evolutionary paths leading to profoundly diverged modern genome structures. Finally, we discuss the origins of alternative splicing and the qualitative differences in alternative splicing forms and functions across lineages.
Affiliation(s)
- Manuel Irimia
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S3E1, Canada
- Scott William Roy
- Department of Biology, San Francisco State University, San Francisco, California 94132
11
Denisov SV, Bazykin GA, Sutormin R, Favorov AV, Mironov AA, Gelfand MS, Kondrashov AS. Weak negative and positive selection and the drift load at splice sites. Genome Biol Evol 2014; 6:1437-47. [PMID: 24966225 PMCID: PMC4079205 DOI: 10.1093/gbe/evu100] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Accepted: 05/05/2014] [Indexed: 11/30/2022] Open
Abstract
Splice sites (SSs) are short sequences that are crucial for proper mRNA splicing in eukaryotic cells, and therefore can be expected to be shaped by strong selection. Nevertheless, in mammals and in other intron-rich organisms, many of the SSs often involve nonconsensus (Nc), rather than consensus (Cn), nucleotides, and beyond the two critical nucleotides, the SSs are not perfectly conserved between species. Here, we compare the SS sequences between primates, and between Drosophila fruit flies, to reveal the pattern of selection acting at SSs. Cn-to-Nc substitutions are less frequent, and Nc-to-Cn substitutions are more frequent, than neutrally expected, indicating, respectively, negative and positive selection. This selection is relatively weak (1 < |4Nes| < 4), and has a similar efficiency in primates and in Drosophila. Within some nucleotide positions, the positive selection in favor of Nc-to-Cn substitutions is weaker than the negative selection maintaining already established Cn nucleotides; this difference is due to site-specific negative selection favoring current Nc nucleotides. In general, however, the strength of negative selection protecting the Cn alleles is similar in magnitude to the strength of positive selection favoring replacement of Nc alleles, as expected under the simple nearly neutral turnover. In summary, although a fraction of the Nc nucleotides within SSs is maintained by selection, the abundance of deleterious nucleotides in this class suggests a substantial genome-wide drift load.
Affiliation(s)
- Stepan V Denisov
- A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
- Georgii A Bazykin
- A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- Roman Sutormin
- Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- Alexander V Favorov
- Division of Oncology Biostatistics, The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD
- Laboratory of System Biology and Computational Genetics, Department of Computational System Biology, N.I. Vavilov Institute of General Genetics, Moscow, Russia
- Laboratory of Bioinformatics, State Research Institute of Genetics and Selection of Industrial Microorganism (GosNIIGenetika), Moscow, Russia
- Andrey A Mironov
- A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- Mikhail S Gelfand
- A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- Alexey S Kondrashov
- Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- Life Sciences Institute and Department of Ecology and Evolutionary Biology, University of Michigan
12
Nasiri J, Naghavi M, Rad SN, Yolmeh T, Shirazi M, Naderi R, Nasiri M, Ahmadi S. Gene identification programs in bread wheat: a comparison study. NUCLEOSIDES NUCLEOTIDES & NUCLEIC ACIDS 2014; 32:529-54. [PMID: 24124688 DOI: 10.1080/15257770.2013.832773] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
Seven ab initio web-based gene prediction programs (AUGUSTUS, BGF, Fgenesh, Fgenesh+, GeneID, Genemark.hmm, and HMMgene) were assessed to compare their prediction accuracy using protein-coding sequences of bread wheat. At both the nucleotide and exon levels, Fgenesh+ was the best-performing program, followed by BGF and then Fgenesh. At the gene level, by contrast, Fgenesh was the best, predicting more than 75% of all genes precisely. Fgenesh+, BGF, and Fgenesh had the highest percentages of correctly predicted exons and therefore appear more likely to give trustworthy results, whereas GeneID and HMMgene would be expected to yield more false negatives. For initial exons, the 3' boundary was recognized accurately significantly more often than the 5' boundary, and the reverse was true for terminal exons. Lastly, HMMgene and Genemark.hmm were largely independent of GC content, while the other programs appeared slightly more sensitive when GC-poor sequences were used. Overall, our results indicate that gene finders still need further improvement before they can deliver consistently reliable predictions.
Affiliation(s)
- Jaber Nasiri
- Department of Agronomy and Plant Breeding, Division of Molecular Plant Genetics, College of Agricultural & Natural Resources, University of Tehran, Karaj, Tehran, Iran
13
Koonin EV, Csuros M, Rogozin IB. Whence genes in pieces: reconstruction of the exon-intron gene structures of the last eukaryotic common ancestor and other ancestral eukaryotes. WILEY INTERDISCIPLINARY REVIEWS-RNA 2012; 4:93-105. [PMID: 23139082 DOI: 10.1002/wrna.1143] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 01/27/2023]
Abstract
In eukaryotes, protein-coding sequences are interrupted by non-coding sequences known as introns. During mRNA maturation, introns are excised by the spliceosome and the coding regions, exons, are spliced to form the mature coding region. The intron densities widely differ between eukaryotic lineages, from 6 to 7 introns per kb of coding sequence in vertebrates, some invertebrates and green plants, to only a few introns across the entire genome in many unicellular eukaryotes. Evolutionary reconstructions using maximum likelihood methods suggest intron-rich ancestors for each major group of eukaryotes. For the last common ancestor of animals, the highest intron density of all extant and extinct eukaryotes was inferred, at 120-130% of the human intron density. Furthermore, an intron density within 53-74% of the human values was inferred for the last eukaryotic common ancestor. Accordingly, evolution of eukaryotic genes in all lines of descent involved primarily intron loss, with substantial gain only at the bases of several branches including plants and animals. These conclusions have substantial biological implications indicating that the common ancestor of all modern eukaryotes was a complex organism with a gene architecture resembling those in multicellular organisms. Alternative splicing most likely initially appeared as an inevitable result of splicing errors and only later was employed to generate structural and functional diversification of proteins.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information NLM/NIH, Bethesda, MD, USA.
14
Rogozin IB, Carmel L, Csuros M, Koonin EV. Origin and evolution of spliceosomal introns. Biol Direct 2012; 7:11. [PMID: 22507701 PMCID: PMC3488318 DOI: 10.1186/1745-6150-7-11] [Citation(s) in RCA: 224] [Impact Index Per Article: 18.7] [Received: 12/08/2011] [Accepted: 03/15/2012] [Indexed: 12/31/2022] Open
Abstract
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’ held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. 
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
Affiliation(s)
- Igor B Rogozin
- National Center for Biotechnology Information NLM/NIH, 8600 Rockville Pike, Bldg. 38A, Bethesda, MD 20894, USA
15
Abstract
The recent explosion of genome sequences from all major phylogenetic groups has unveiled an unexpected wealth of cases of recurrent evolution of strikingly similar genomic features in different lineages. Here, we review the diverse known types of recurrent evolution in eukaryotic genomes, with a special focus on metazoans, ranging from reductive genome evolution to origins of splice-leader trans-splicing, from tandem exon duplications to gene family expansions. We first propose a general classification scheme for evolutionary recurrence at the genomic level, based on the type of driving force-mutation or selection-and the environmental and genomic circumstances underlying these forces. We then discuss various cases of recurrent genomic evolution under this scheme. Finally, we provide a broader context for repeated genomic evolution, including the unique relationship of genomic recurrence with the genotype-phenotype map, and the ways in which the study of recurrent genomic evolution can be used to understand fundamental evolutionary processes.
Affiliation(s)
- Ignacio Maeso
- Department of Zoology, University of Oxford, United Kingdom
- Scott William Roy
- Department of Biology, Stanford University
- Department of Biology, San Francisco State University
- Manuel Irimia
- Department of Biology, Stanford University
- Banting and Best Department of Medical Research, Donnelly Centre, University of Toronto, Canada
16
A detailed history of intron-rich eukaryotic ancestors inferred from a global survey of 100 complete genomes. PLoS Comput Biol 2011; 7:e1002150. [PMID: 21935348 PMCID: PMC3174169 DOI: 10.1371/journal.pcbi.1002150] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.8] [Received: 12/30/2010] [Accepted: 06/21/2011] [Indexed: 11/19/2022]
Abstract
Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing the three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing. In eukaryotes, protein-coding genes are interrupted by non-coding introns. The intron densities widely differ, from 6–7 introns per kilobase of coding sequence in vertebrates, some invertebrates and plants, to only a few introns across the entire genome in many unicellular forms. 
We applied a robust statistical methodology, Markov Chain Monte Carlo, to reconstruct the history of intron gain and loss throughout the evolution of eukaryotes using a set of 245 homologous genes from 99 genomes that represent the diversity of eukaryotes. Intron-rich ancestors were confidently inferred for each major eukaryotic group including 53% to 74% of the human intron density for the last eukaryotic common ancestor, and 120% to 130% of the human value for the last common ancestor of animals. Evolution of eukaryotic genes involved primarily intron loss, with substantial gain only at the bases of several major branches including plants and animals. Thus, the common ancestor of all extant eukaryotes was a complex organism with a gene architecture resembling those in multicellular organisms. The line of descent from the last common ancestor to mammals was an uninterrupted intron-rich state that, given the error-prone splicing in intron-rich organisms, was conducive to the elaboration of functional alternative splicing.
17
Farlow A, Dolezal M, Hua L, Schlötterer C. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol 2011; 29:21-4. [PMID: 21878685 PMCID: PMC3245539 DOI: 10.1093/molbev/msr201] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 12/14/2022]
Abstract
Understanding the function of noncoding regions in the genome, such as introns, is of central importance to evolutionary biology. One approach is to assay for the targets of natural selection. On one hand, the sequence of introns, especially short introns, appears to evolve in an almost neutral manner, whereas on the other hand, a large proportion of intronic sequence is under selective constraint. This discrepancy is largely dependent on intron length and differences in the methods used to infer selection. We have used a method based on DNA strand asymmetry that does not require comparison with any putatively neutrally evolving sequence, nor sequence conservation between species, to detect selection within introns. The strongest signal we identify is associated with short introns. This signal comes from a family of motifs that could act as cryptic 5′ splice sites during mRNA processing, suggesting a mechanistic justification underlying this signal of selection. Together with an analysis of intron length and splice site strength, we observe that the genomic signature of splicing-coupled selection differs between long and short introns.
Affiliation(s)
- Ashley Farlow
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Present address: Gregor Mendel Institute of Molecular Plant Biology, Vienna, Austria
- Marlies Dolezal
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Liushuai Hua
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Present address: College of Animal Science and Technology, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Northwest A&F University, Yangling, Shaanxi, China
- Christian Schlötterer
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Corresponding author
18
Koonin EV, Wolf YI. Constraints and plasticity in genome and molecular-phenome evolution. Nat Rev Genet 2010; 11:487-98. [PMID: 20548290 DOI: 10.1038/nrg2810] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.2] [Indexed: 02/01/2023]
Abstract
Multiple constraints variously affect different parts of the genomes of diverse life forms. The selective pressures that shape the evolution of viral, archaeal, bacterial and eukaryotic genomes differ markedly, even among relatively closely related animal and bacterial lineages; by contrast, constraints affecting protein evolution seem to be more universal. The constraints that shape the evolution of genomes and phenomes are complemented by the plasticity and robustness of genome architecture, expression and regulation. Taken together, these findings are starting to reveal complex networks of evolutionary processes that must be integrated to attain a new synthesis of evolutionary biology.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
19
DNA double-strand break repair and the evolution of intron density. Trends Genet 2010; 27:1-6. [PMID: 21106271 PMCID: PMC3020277 DOI: 10.1016/j.tig.2010.10.004] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 08/12/2010] [Revised: 10/18/2010] [Accepted: 10/18/2010] [Indexed: 01/23/2023]
Abstract
The density of introns is both an important feature of genome architecture and a highly variable trait across eukaryotes. This heterogeneity has posed an evolutionary puzzle for the last 30 years. Recent evidence is consistent with novel introns being the outcome of the error-prone repair of DNA double-stranded breaks (DSBs) via non-homologous end joining (NHEJ). Here we suggest that deletion of pre-existing introns could occur via the same pathway. We propose a novel framework in which species-specific differences in the activity of NHEJ and homologous recombination (HR) during the repair of DSBs underlie changes in intron density.
20
Abstract
The 2.9-Mbp genome of the microsporidian Encephalitozoon cuniculi is severely reduced and compacted, possessing only 16 known tiny spliceosomal introns. Based on motif and expression data, intron profiles were constructed to screen the genome. Twenty additional introns were predicted and verified, doubling the previous estimate. We further predict that accurate 3' splice site (3'SS) selection is accomplished via a scanning mechanism with specificity achieved by maintaining a constrained variable length between the branch point motif and 3'SS. Only introns in ribosomal protein genes exhibit positional bias, and we hypothesize that splicing could be regulating expression of these genes. The large set of new introns in non-ribosomal protein genes suggests that current models of intron loss are unlikely sufficient to explain the distribution of introns. Together, these results extend our understanding of the role of intron loss in genome evolution and contribute to a novel model for splice site selection.