1
Yu R, Xue H, Lin W, Collins F, Mount S, Cao K. Progerin mRNA expression in non-HGPS patients is correlated with widespread shifts in transcript isoforms. NAR Genom Bioinform 2024; 6:lqae115. [PMID: 39211333 PMCID: PMC11358823 DOI: 10.1093/nargab/lqae115] [Received: 01/08/2024] [Revised: 08/06/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024]
Abstract
Hutchinson-Gilford Progeria Syndrome (HGPS) is a premature aging disease caused primarily by a C1824T mutation in LMNA. This mutation activates a cryptic splice donor site, producing a lamin variant called progerin. Interestingly, progerin has also been detected in cells and tissues of non-HGPS patients. Here, we investigated progerin expression using publicly available RNA-seq data from non-HGPS patients in the GTEx project. We found that progerin expression is present across all tissue types in non-HGPS patients and correlated with telomere shortening in the skin. Transcriptome-wide correlation analyses suggest that the level of progerin expression is correlated with switches in gene isoform expression patterns. Differential expression analyses show that progerin expression is correlated with significant changes in genes involved in splicing regulation and mitochondrial function. Interestingly, 5' splice sites whose use is correlated with progerin expression have significantly altered frequencies of consensus trinucleotides within the core 5' splice site. Furthermore, introns whose alternative splicing correlates with progerin have reduced GC content. Our study suggests that progerin expression in non-HGPS patients is part of a global shift in splicing patterns.
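The intron GC-content and core 5' splice site trinucleotide comparisons mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the 9-nt donor window (3 exonic + 6 intronic positions), the example sequences, and the helper names `gc_content` and `trinucleotide_counts` are assumptions made for illustration, and a real analysis would extract sequences from a genome annotation.

```python
from collections import Counter

UNAMBIGUOUS = set("ACGTU")


def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T/U); N and other symbols are ignored."""
    bases = [b for b in seq.upper() if b in UNAMBIGUOUS]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def trinucleotide_counts(sites: list[str], start: int) -> Counter:
    """Count the trinucleotide at offset `start` in each aligned splice-site window.

    Assumption: windows are 9 nt (3 exonic + 6 intronic), so start=0 gives the
    last three exonic bases; windows too short for the requested slice are skipped.
    """
    return Counter(s.upper()[start:start + 3] for s in sites if len(s) >= start + 3)


# Hypothetical sequences, for illustration only.
introns = ["GTAAGTCCGG", "GTGAGTATAT"]
print([round(gc_content(s), 2) for s in introns])  # [0.6, 0.3]

donors = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]
print(trinucleotide_counts(donors, 0))  # Counter({'CAG': 2, 'AAG': 1})
```

Comparing such GC fractions or trinucleotide counts between introns whose splicing does and does not correlate with progerin would then be a matter of standard statistical tests.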
Affiliation(s)
- Reynold Yu
- Department of Cell Biology and Molecular Genetics, University of Maryland College Park, MD, USA
- Huijing Xue
- Department of Cell Biology and Molecular Genetics, University of Maryland College Park, MD, USA
- Wanru Lin
- Department of Cell Biology and Molecular Genetics, University of Maryland College Park, MD, USA
- Francis S Collins
- Molecular Genetics Section, Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Stephen M Mount
- Department of Cell Biology and Molecular Genetics, University of Maryland College Park, MD, USA
- Kan Cao
- Department of Cell Biology and Molecular Genetics, University of Maryland College Park, MD, USA
2
Song Y, Zhang C, Omenn GS, O’Meara MJ, Welch JD. Predicting the Structural Impact of Human Alternative Splicing. bioRxiv 2023:2023.12.21.572928. [PMID: 38187531 PMCID: PMC10769328 DOI: 10.1101/2023.12.21.572928] [Indexed: 01/09/2024]
Abstract
Protein structure prediction with neural networks is a powerful new method for linking protein sequence, structure, and function, but structures have generally been predicted for only a single isoform of each gene, neglecting splice variants. To investigate the structural implications of alternative splicing, we used AlphaFold2 to predict the structures of more than 11,000 human isoforms. We employed multiple metrics to identify splicing-induced structural alterations, including template matching score, secondary structure composition, surface charge distribution, radius of gyration, accessibility of post-translational modification sites, and structure-based function prediction. We identified examples of how alternative splicing induced clear changes in each of these properties. Structural similarity between isoforms largely correlated with degree of sequence identity, but we identified a subset of isoforms with low structural similarity despite high sequence similarity. Exon skipping and alternative last exons tended to increase the surface charge and radius of gyration. Splicing also buried or exposed numerous post-translational modification sites, most notably among the isoforms of BAX. Functional prediction nominated numerous functional differences among isoforms of the same gene, with loss of function compared to the reference predominating. Finally, we used single-cell RNA-seq data from the Tabula Sapiens to determine the cell types in which each structure is expressed. Our work represents an important resource for studying the structure and function of splice isoforms across the cell types of the human body.
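Radius of gyration, one of the metrics listed above, is the root-mean-square distance of a structure's atoms from their centroid. The sketch below computes an unweighted version from plain (x, y, z) tuples; using one point per residue (for example C-alpha atoms), skipping structure-file parsing, and the function name `radius_of_gyration` are all assumptions for illustration.

```python
import math


def radius_of_gyration(coords: list[tuple[float, float, float]]) -> float:
    """Unweighted radius of gyration: RMS distance of the points from their centroid.

    Assumption: `coords` holds one (x, y, z) position per residue (e.g. C-alpha atoms)
    already parsed from a predicted model; file parsing is not shown.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / n)


# Hypothetical toy "structures": a compact and an extended arrangement of three points.
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(radius_of_gyration(compact) < radius_of_gyration(extended))  # True
```

A larger value for an isoform than for its reference model would indicate a less compact predicted structure, which is the kind of shift the abstract reports for exon skipping and alternative last exons.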
Affiliation(s)
- Yuxuan Song
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Matthew J. O’Meara
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Joshua D. Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA
3
Zhang J, Xu C. Gene product diversity: adaptive or not? Trends Genet 2022; 38:1112-1122. [PMID: 35641344 PMCID: PMC9560964 DOI: 10.1016/j.tig.2022.05.002] [Received: 02/03/2022] [Revised: 04/30/2022] [Accepted: 05/03/2022] [Indexed: 01/24/2023]
Abstract
One gene does not equal one RNA or protein. The genomic revolution has revealed numerous different RNA and protein molecules that can be produced from one gene, such as circular RNAs generated by back-splicing, proteins with residues mismatching the genomic encoding because of RNA editing, and proteins extended at the C terminus via stop codon readthrough in translation. Are these diverse products the result of exquisite gene regulation or of imprecise biological processes? While there are cases where gene product diversity appears beneficial, genome-scale patterns suggest that much of this diversity arises from nonadaptive molecular errors. This finding has important implications for studying the functions of diverse gene products and for understanding the fundamental properties and evolution of cellular life.
Affiliation(s)
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA.
- Chuan Xu
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
4
Osmanli Z, Falgarone T, Samadova T, Aldrian G, Leclercq J, Shahmuradov I, Kajava AV. The Difference in Structural States between Canonical Proteins and Their Isoforms Established by Proteome-Wide Bioinformatics Analysis. Biomolecules 2022; 12:1610. [PMID: 36358962 PMCID: PMC9687161 DOI: 10.3390/biom12111610] [Received: 09/15/2022] [Revised: 10/14/2022] [Accepted: 10/27/2022] [Indexed: 09/02/2023]
Abstract
Alternative splicing is an important means of generating the protein diversity necessary for cellular functions. Hence, there is growing interest in assessing the structural and functional impact of alternative protein isoforms. Typically, experimental studies determine the structures of canonical proteins and ignore the other isoforms. Therefore, there is still a large gap between abundant sequence information and meager structural data on these isoforms. During the last decade, significant progress has been achieved in the development of bioinformatics tools for the structural and functional annotation of proteins. Moreover, the appearance of the AlphaFold program opened up the possibility of modeling a large number of high-confidence isoform structures. In this study, using state-of-the-art tools, we performed an in silico analysis of 58 eukaryotic proteomes. The evaluated structural states included structured domains, intrinsically disordered regions, aggregation-prone regions, and tandem repeats. Among other things, we found that isoforms have fewer signal peptides, transmembrane regions, or tandem repeat regions than their canonical counterparts. This could change protein function and/or cellular localization. The AlphaFold modeling showed that isoforms whose sequences differ from the canonical ones can frequently still fold into similar structures, though with significant structural rearrangements that can change their functions. Based on the modeling, we suggest a classification of the structural differences between canonical proteins and isoforms. Altogether, we conclude that a majority of isoforms, like the canonical proteins, are under selective pressure for functional roles.
Affiliation(s)
- Zarifa Osmanli
- CRBM, Université de Montpellier, CNRS, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Institute of Biophysics, ANAS, Baku AZ1141, Azerbaijan
- Theo Falgarone
- CRBM, Université de Montpellier, CNRS, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Gudrun Aldrian
- CRBM, Université de Montpellier, CNRS, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Jeremy Leclercq
- CRBM, Université de Montpellier, CNRS, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Andrey V. Kajava
- CRBM, Université de Montpellier, CNRS, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
5
Wright CJ, Smith CWJ, Jiggins CD. Alternative splicing as a source of phenotypic diversity. Nat Rev Genet 2022; 23:697-710. [PMID: 35821097 DOI: 10.1038/s41576-022-00514-4] [Accepted: 06/13/2022] [Indexed: 12/27/2022]
Abstract
A major goal of evolutionary genetics is to understand the genetic processes that give rise to phenotypic diversity in multicellular organisms. Alternative splicing generates multiple transcripts from a single gene, enriching the diversity of proteins and phenotypic traits. It is well established that alternative splicing contributes to key innovations over long evolutionary timescales, such as brain development in bilaterians. However, recent developments in long-read sequencing and the generation of high-quality genome assemblies for diverse organisms have facilitated comparisons of splicing profiles between closely related species, providing insights into how alternative splicing evolves over shorter timescales. Although most splicing variants are probably non-functional, alternative splicing is nonetheless emerging as a dynamic, evolutionarily labile process that can facilitate adaptation and contribute to species divergence.
Affiliation(s)
- Charlotte J Wright
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK.
- Department of Zoology, University of Cambridge, Cambridge, UK.
- Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge, UK.
6
Reixachs‐Solé M, Eyras E. Uncovering the impacts of alternative splicing on the proteome with current omics techniques. Wiley Interdiscip Rev RNA 2022; 13:e1707. [PMID: 34979593 PMCID: PMC9542554 DOI: 10.1002/wrna.1707] [Received: 09/26/2021] [Revised: 11/27/2021] [Accepted: 11/29/2021] [Indexed: 12/15/2022]
Abstract
The high-throughput sequencing of cellular RNAs has underscored a broad effect of isoform diversification through alternative splicing on the transcriptome. Moreover, the differential production of transcript isoforms from gene loci has been recognized as a critical mechanism in cell differentiation, organismal development, and disease. Yet, the extent of the impact of alternative splicing on protein production and cellular function remains a matter of debate. Multiple experimental and computational approaches have been developed in recent years to address this question. These studies have unveiled how molecular changes at different steps in the RNA processing pathway can lead to differences in protein production and have functional effects. New and emerging experimental technologies open exciting opportunities to develop methods that fully establish the connection between messenger RNA expression and protein production and to further investigate how RNA variation impacts the proteome and cell function.
This article is categorized under:
- RNA Processing > Splicing Regulation/Alternative Splicing
- Translation > Regulation
- RNA Evolution and Genomics > Computational Analyses of RNA
Affiliation(s)
- Marina Reixachs‐Solé
- The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
- Eduardo Eyras
- The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
- Catalan Institution for Research and Advanced Studies, Barcelona, Spain
- Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
7
Kaisers W, Schwender H, Schaal H. Sample Size Estimation for Detection of Splicing Events in Transcriptome Sequencing Data. Int J Mol Sci 2017; 18:ijms18091900. [PMID: 28872584 PMCID: PMC5618549 DOI: 10.3390/ijms18091900] [Received: 06/30/2017] [Revised: 08/28/2017] [Accepted: 08/29/2017] [Indexed: 01/13/2023]
Abstract
Merging data from multiple samples is required to detect low-expressed transcripts or splicing events that might be present only in a subset of samples. However, the exact number of replicates required to detect such rare events often remains a mystery, but it can be approached through probability theory. Here, we describe a probabilistic model relating the number of observed events in a batch of samples to observation probabilities. Therein, samples appear as a heterogeneous collection of events, which are observed with some probability. The model is evaluated in a batch of 54 transcriptomes of human dermal fibroblast samples. The majority of putative splice sites (alignment gap-sites) are detected either in (almost) all samples or only sporadically, resulting in a U-shaped pattern of observation probabilities. The probabilistic model systematically underestimates event numbers due to a bias resulting from finite sampling. However, using an additional assumption, the probabilistic model can predict observed event numbers within a <10% deviation from the median. Single samples contain a considerable number of uniquely observed putative splicing events (mean 7122 in TopHat alignments and 86,215 in STAR alignments). We conclude that the probabilistic model provides an adequate description of the observation of gap-sites in transcriptome data. Thus, required sample sizes can be calculated by applying a simple binomial model to sporadically observed random events. Due to the large number of uniquely observed putative splice sites and the known stochastic noise in the splicing machinery, it appears advisable to include the observation of rare splicing events in analysis objectives. Therefore, it is beneficial to take scores for the validation of gap-sites into account.
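The point that required sample sizes follow from a simple binomial model can be sketched as below, assuming each sample independently shows a given splice event with the same probability p. This ignores the sample heterogeneity and finite-sampling bias correction described in the abstract, so it illustrates the idea rather than reproducing the published model; the function names are hypothetical.

```python
def detection_probability(p: float, n: int) -> float:
    """P(event seen in at least one of n samples) when each sample shows it independently with probability p."""
    return 1.0 - (1.0 - p) ** n


def required_samples(p: float, target: float) -> int:
    """Smallest n whose detection probability reaches `target` (simple binomial model)."""
    if not (0.0 < p <= 1.0) or not (0.0 < target < 1.0):
        raise ValueError("need 0 < p <= 1 and 0 < target < 1")
    n = 1
    while detection_probability(p, n) < target:
        n += 1
    return n


# Hypothetical rates: an event present in 10% of samples, to be seen with 95% probability.
print(required_samples(0.1, 0.95))  # 29
```

Stepping n upward, rather than taking the ceiling of log(1 - target) / log(1 - p), avoids off-by-one errors from floating-point rounding when that ratio lands on an integer.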
Affiliation(s)
- Wolfgang Kaisers
- Department for Anaesthesiology, Heinrich Heine University, 40225 Düsseldorf, Germany.
- BMFZ, Heinrich Heine University, 40225 Düsseldorf, Germany.
- Holger Schwender
- BMFZ, Heinrich Heine University, 40225 Düsseldorf, Germany.
- Mathematical Institute, Heinrich Heine University, 40225 Düsseldorf, Germany.
- Heiner Schaal
- BMFZ, Heinrich Heine University, 40225 Düsseldorf, Germany.
- Institute for Virology, Heinrich Heine University, 40225 Düsseldorf, Germany.
8
Ramanouskaya TV, Grinev VV. The determinants of alternative RNA splicing in human cells. Mol Genet Genomics 2017; 292:1175-1195. [PMID: 28707092 DOI: 10.1007/s00438-017-1350-0] [Received: 02/02/2017] [Accepted: 07/06/2017] [Indexed: 12/29/2022]
Abstract
Alternative splicing represents an important level of the regulation of gene function in eukaryotic organisms. It plays a critical role in virtually every biological process within an organism, including regulation of cell division and cell death, differentiation of tissues in the embryo and the adult organism, as well as in cellular response to diverse environmental factors. In turn, studies of the last decade have shown that alternative splicing itself is controlled by different mechanisms. Unfortunately, there is no clear understanding of how these diverse mechanisms, or determinants, regulate and constrain the set of alternative RNA species produced from any particular gene in every cell of the human body. Here, we provide a consolidated overview of alternative splicing determinants including RNA-protein interactions, epigenetic regulation via chromatin remodeling, coupling of transcription-to-alternative splicing, effect of secondary structures in pre-RNA, and function of the RNA quality control systems. We also extensively and critically discuss some mechanistic insights on coordinated inclusion/exclusion of exons during the formation of mature RNA molecules. We conclude that the final structure of RNA is pre-determined by a complex interplay between cis- and trans-acting factors. Altogether, currently available empirical data significantly expand our understanding of the functioning of the alternative splicing machinery of cells in normal and pathological conditions. On the other hand, there are still many blind spots that require further deep investigations.
9
Satyawan D, Kim MY, Lee S. Stochastic alternative splicing is prevalent in mungbean (Vigna radiata). Plant Biotechnol J 2017; 15:174-182. [PMID: 27400146 PMCID: PMC5258860 DOI: 10.1111/pbi.12600] [Received: 03/28/2016] [Revised: 06/10/2016] [Accepted: 07/05/2016] [Indexed: 05/20/2023]
Abstract
Alternative splicing (AS) can produce multiple mature mRNAs from the same primary transcript, thereby generating diverse proteins and phenotypes from the same gene. To assess the prevalence of AS in mungbean (Vigna radiata), we analysed whole-genome RNA sequencing data from root, leaf, flower and pod tissues and found that at least 37.9% of mungbean genes are subjected to AS. The number of AS transcripts exhibited a strong correlation with exon number and thus resembled a uniform probabilistic event rather than a specific regulatory function. The proportion of frameshift splicing was close to the expected frequency of random splicing. However, alternative donor and acceptor AS events tended to occur at multiples of three nucleotides (i.e. the codon length) from the main splice site. Genes with high exon number and expression level, which should have the most AS if splicing is purely stochastic, exhibited less AS, implying the existence of negative selection against excessive random AS. Functional AS is probably rare: a large proportion of AS isoforms exist at very low copy numbers per cell on average or are expressed at much lower levels than default transcripts. Conserved AS was only detected in 629 genes (2.8% of all genes in the genome) when compared to Vigna angularis, and in 16 genes in more distant species like soya bean. These observations highlight the challenges of finding and cataloguing candidates for experimentally proven AS isoforms in a crop genome.
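The multiple-of-three tendency for alternative donor and acceptor sites can be checked with a small helper. This sketch assumes the input is a list of nonzero nucleotide offsets between each alternative site and the main splice site; if shift lengths were uniformly random, roughly one third would be expected to be multiples of three. The function name and example offsets are hypothetical.

```python
def frame_preserving_fraction(offsets: list[int]) -> float:
    """Fraction of alternative-site shifts (in nt) that are multiples of three.

    Assumption: `offsets` are nonzero distances between each alternative donor or
    acceptor and the main splice site; the sign (upstream/downstream) is ignored.
    With uniformly random shift lengths, about 1/3 would be expected.
    """
    if not offsets:
        raise ValueError("no offsets given")
    if any(d == 0 for d in offsets):
        raise ValueError("offsets must be nonzero")
    return sum(abs(d) % 3 == 0 for d in offsets) / len(offsets)


# Hypothetical shifts for four alternative splice sites.
print(frame_preserving_fraction([3, 6, 4, -9]))  # 0.75
```

A fraction well above 1/3 across many events would point to the same enrichment the abstract describes.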
Affiliation(s)
- Dani Satyawan
- Department of Plant Science and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Korea
- Indonesian Center for Agricultural Biotechnology and Genetic Resources Research and Development, Bogor, Indonesia
- Moon Young Kim
- Department of Plant Science and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Korea
- Plant Genomics and Breeding Institute, Seoul National University, Seoul, Korea
- Suk‐Ha Lee
- Department of Plant Science and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Korea
- Plant Genomics and Breeding Institute, Seoul National University, Seoul, Korea
10
Hao Y, Colak R, Teyra J, Corbi-Verge C, Ignatchenko A, Hahne H, Wilhelm M, Kuster B, Braun P, Kaida D, Kislinger T, Kim PM. Semi-supervised Learning Predicts Approximately One Third of the Alternative Splicing Isoforms as Functional Proteins. Cell Rep 2015; 12:183-9. [PMID: 26146086 DOI: 10.1016/j.celrep.2015.06.031] [Received: 03/27/2014] [Revised: 02/18/2015] [Accepted: 06/09/2015] [Indexed: 12/30/2022]
Abstract
Alternative splicing acts on transcripts from almost all human multi-exon genes. Notwithstanding its ubiquity, fundamental ramifications of splicing on protein expression remain unresolved. The number and identity of spliced transcripts that form stably folded proteins remain the sources of considerable debate, due largely to low coverage of experimental methods and the resulting absence of negative data. We circumvent this issue by developing a semi-supervised learning algorithm, positive unlabeled learning for splicing elucidation (PULSE; http://www.kimlab.org/software/pulse), which uses 48 features spanning various categories. We validated its accuracy on sets of bona fide protein isoforms and directly on mass spectrometry (MS) spectra for an overall AU-ROC of 0.85. We predict that around 32% of "exon skipping" alternative splicing events produce stable proteins, suggesting that the process engenders a significant number of previously uncharacterized proteins. We also provide insights into the distribution of positive isoforms in various functional classes and into the structural effects of alternative splicing.
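The AU-ROC of 0.85 quoted above can be read as the probability that a randomly chosen true isoform receives a higher score than a randomly chosen negative. The sketch below computes AU-ROC in that pairwise (Mann-Whitney) form; it illustrates the metric only, does not implement PULSE or its 48 features, and the function name and scores are hypothetical.

```python
def au_roc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Area under the ROC curve as P(positive score > negative score), ties counted as 1/2.

    O(len(pos) * len(neg)) pairwise comparison; adequate for small illustrative sets.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for two positive and two negative isoforms.
print(au_roc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

For large validation sets a rank-based computation is faster, but the pairwise form makes the interpretation of the number explicit.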
Affiliation(s)
- Yanqi Hao
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada
- Recep Colak
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada
- Joan Teyra
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada
- Carles Corbi-Verge
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada
- Alexander Ignatchenko
- Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada
- Hannes Hahne
- Chair for Proteomics and Bioanalytics, TU Muenchen, Freising 85354, Germany
- Mathias Wilhelm
- Chair for Proteomics and Bioanalytics, TU Muenchen, Freising 85354, Germany
- Bernhard Kuster
- Chair for Proteomics and Bioanalytics, TU Muenchen, Freising 85354, Germany; German Cancer Consortium (DKTK), Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Center for Integrated Protein Science Munich, Munich, Germany; Bavarian Biomolecular Mass Spectrometry Center, Technische Universität München, Freising, Germany
- Pascal Braun
- Lehrstuhl fuer Systembiologie der Pflanzen, TU Muenchen, Munich, Germany
- Daisuke Kaida
- Frontier Research Core for Life Sciences, University of Toyama, Toyama 930-8555, Japan
- Thomas Kislinger
- Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada; Princess Margaret Cancer Center, University Health Network, Toronto, ON M5T 2M9, Canada
- Philip M Kim
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1AS, Canada.
11
Abascal F, Ezkurdia I, Rodriguez-Rivas J, Rodriguez JM, del Pozo A, Vázquez J, Valencia A, Tress ML. Alternatively Spliced Homologous Exons Have Ancient Origins and Are Highly Expressed at the Protein Level. PLoS Comput Biol 2015; 11:e1004325. [PMID: 26061177 PMCID: PMC4465641 DOI: 10.1371/journal.pcbi.1004325] [Received: 12/16/2014] [Accepted: 05/08/2015] [Indexed: 11/19/2022]
Abstract
Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectrometry experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectrometry. We find fewer splice events than would be expected: we identified peptides for almost 64% of human protein coding genes, but detected just 282 splice events. These data suggest that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved (all the homologous exons we identified evolved over 460 million years ago), and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles.
Alternative splicing is thought to be one means for generating the protein diversity necessary for the whole range of cellular functions. While the presence of alternatively spliced transcripts in the cell has been amply demonstrated, the same cannot be said for alternatively spliced proteins. The quest for alternative protein isoforms has focused primarily on the analysis of peptides from large-scale mass spectrometry experiments, but evidence for alternative isoforms has been patchy and contradictory. A careful analysis of the peptide evidence is needed to fully understand the scale of alternative splicing detectable at the protein level. Here we analysed peptides from eight large-scale data sets, identifying just 282 splice events among 12,716 genes. This suggests that most genes have a single dominant isoform. Many of the alternative isoforms that we identified were only subtly different from the main splice variant, and one in five was generated by swapping one homologous exon for another. Remarkably, the alternative isoforms generated from homologous exons were highly conserved, first appearing 460 million years ago, and several appear to have tissue-specific roles in the brain and heart. Our results suggest that these particular isoforms are likely to have important cellular roles.
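Protein-level evidence for an alternative isoform generally rests on peptides that map to that isoform and to no other. A minimal sketch of that filtering step, assuming exact substring matching against a hypothetical dictionary of isoform sequences, is shown below; it is not the authors' pipeline, and real workflows also handle isoleucine/leucine ambiguity, protease cleavage rules, and false-discovery-rate control.

```python
def discriminating_peptides(isoforms: dict[str, str], peptides: list[str]) -> dict[str, str]:
    """Map each peptide found in exactly one isoform sequence to that isoform's name.

    Exact substring matching only; peptides shared by several isoforms (or matching
    none) carry no isoform-specific evidence and are dropped.
    """
    result = {}
    for pep in peptides:
        hits = [name for name, seq in isoforms.items() if pep in seq]
        if len(hits) == 1:
            result[pep] = hits[0]
    return result


# Hypothetical isoforms differing by one internal segment.
isoforms = {"A": "MKTAYIAKQR", "B": "MKTAYGGKQR"}
print(discriminating_peptides(isoforms, ["MKTAY", "YIAK", "YGGK", "WWWW"]))
# {'YIAK': 'A', 'YGGK': 'B'}
```

Only the peptides spanning the differing segment survive, which is why shared-sequence peptides cannot confirm a splice event.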
Affiliation(s)
- Federico Abascal
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Iakes Ezkurdia
- Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Juan Rodriguez-Rivas
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Jose Manuel Rodriguez
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Angela del Pozo
- Instituto de Genetica Medica y Molecular, Hospital Universitario La Paz, Madrid, Spain
- Jesús Vázquez
- Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Alfonso Valencia
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Michael L. Tress
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
12
Chorev DS, Ben-Nissan G, Sharon M. Exposing the subunit diversity and modularity of protein complexes by structural mass spectrometry approaches. Proteomics 2015; 15:2777-91. [PMID: 25727951 DOI: 10.1002/pmic.201400517] [Received: 11/03/2014] [Revised: 01/08/2015] [Accepted: 02/24/2015] [Indexed: 12/11/2022]
Abstract
Although the number of protein-encoding genes in the human genome is only about 20,000, not far from the number found in the nematode worm genome, the number of proteins that are translated from these sequences is larger by several orders of magnitude. A number of mechanisms have evolved to enable this diversity. For example, genes can be alternatively spliced to create multiple transcripts; they may also be translated from different alternative initiation sites. After translation, hundreds of chemical modifications can be introduced into proteins, altering their chemical properties, folding, stability, and activity. The complexity is then further enhanced by the various combinations generated when different subunit variants assemble into protein complexes. This, in turn, confers structural and functional flexibility and endows the cell with the ability to adapt to various environmental conditions. Therefore, exposing the variability of protein complexes is an important step toward understanding their biological functions. Revealing this enormous diversity, however, is not a simple task. In this review, we focus on the array of MS-based strategies that are capable of performing this mission. We also discuss the challenges that lie ahead and the future directions toward which the field might be heading.
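The combinatorial effect of subunit variants can be illustrated with simple arithmetic: if each subunit position in a complex could independently be filled by any of its variants, the number of possible compositions is the product of the per-subunit variant counts. This is an upper bound under an independence assumption (real assembly is constrained), and the function name and counts below are hypothetical.

```python
from math import prod


def complex_variant_count(variants_per_subunit: list[int]) -> int:
    """Upper bound on distinct compositions if every subunit position is filled
    independently by any one of its variants (real assembly is more constrained)."""
    if any(k < 1 for k in variants_per_subunit):
        raise ValueError("each subunit needs at least one variant")
    return prod(variants_per_subunit)


# Hypothetical four-subunit complex with 2, 3, 1 and 4 variants per subunit.
print(complex_variant_count([2, 3, 1, 4]))  # 24
```

Because the count multiplies, even a few variants per subunit across a large complex quickly yields many possible assemblies, which is what makes MS-based characterization demanding.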
Affiliation(s)
- Dror S Chorev
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Gili Ben-Nissan
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Michal Sharon
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
13
Li YI, Sanchez-Pulido L, Haerty W, Ponting CP. RBFOX and PTBP1 proteins regulate the alternative splicing of micro-exons in human brain transcripts. Genome Res 2015; 25:1-13. [PMID: 25524026 PMCID: PMC4317164 DOI: 10.1101/gr.181990.114] [Citation(s) in RCA: 120] [Impact Index Per Article: 13.3] [Received: 07/23/2014] [Accepted: 10/27/2014] [Indexed: 11/24/2022]
Abstract
Ninety-four percent of mammalian protein-coding exons exceed 51 nucleotides (nt) in length. The paucity of micro-exons (≤ 51 nt) suggests that their recognition and correct processing by the splicing machinery present greater challenges than for longer exons. Yet, because thousands of human genes harbor processed micro-exons, specialized mechanisms may be in place to promote their splicing. Here, we survey deep genomic data sets to define 13,085 micro-exons and to study their splicing mechanisms and molecular functions. More than 60% of annotated human micro-exons exhibit a high level of sequence conservation, an indicator of functionality. While most human micro-exons require splicing-enhancing genomic features to be processed, the splicing of hundreds of micro-exons is enhanced by the adjacent binding of splice factors in the introns of pre-messenger RNAs. Notably, splicing of a significant number of micro-exons was found to be facilitated by the binding of RBFOX proteins, which promote their inclusion in the brain, muscle, and heart. Our analyses suggest that accurate regulation of micro-exon inclusion by RBFOX proteins and PTBP1 plays an important role in the maintenance of tissue-specific protein-protein interactions.
Affiliation(s)
- Yang I Li
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
- Luis Sanchez-Pulido
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
- Wilfried Haerty
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
- Chris P Ponting
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
14
Morata J, Béjar S, Talavera D, Riera C, Lois S, de Xaxars GM, de la Cruz X. The relationship between gene isoform multiplicity, number of exons and protein divergence. PLoS One 2013; 8:e72742. [PMID: 24023641 PMCID: PMC3758341 DOI: 10.1371/journal.pone.0072742] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/27/2013] [Accepted: 07/14/2013] [Indexed: 11/18/2022]
Abstract
At present we know that phenotypic differences between organisms arise from a variety of sources, such as protein sequence divergence, regulatory sequence divergence, and alternative splicing. However, we do not yet have a complete view of how these sources are related. Here we address this problem by studying the relationship between protein divergence and the ability of genes to express multiple isoforms. We used three genome-wide datasets of human-mouse orthologs to study the relationship between protein divergence and isoform multiplicity co-occurrence between orthologs (i.e., both orthologs having more than one isoform). In all cases our results showed a monotonic dependence between these two properties. We could explain this relationship in terms of a more fundamental one, between the exon number of the largest isoform and protein divergence. We found that this latter relationship was also present, although with variations, in other species (chimpanzee, cow, rat, chicken, zebrafish and fruit fly). In summary, we have identified a relationship between protein divergence and isoform multiplicity co-occurrence and explained its origin in terms of a simple gene-level property. Finally, we discuss the biological implications of these findings for our understanding of inter-species phenotypic differences.
Affiliation(s)
- Jordi Morata
- Department of Structural Biology, Institut de Biologia Molecular de Barcelona (IBMB)-Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
- Santi Béjar
- Department of Structural Biology, Institut de Biologia Molecular de Barcelona (IBMB)-Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
- David Talavera
- Faculty of Life Sciences, Manchester University, Manchester, United Kingdom
- Casandra Riera
- Laboratory of Translational Bioinformatics in Neuroscience, Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Sergio Lois
- Laboratory of Translational Bioinformatics in Neuroscience, Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Gemma Mas de Xaxars
- Laboratori de Botànica, Facultat de Farmàcia, Universitat de Barcelona, Barcelona, Spain
- Xavier de la Cruz
- Department of Structural Biology, Institut de Biologia Molecular de Barcelona (IBMB)-Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
- Laboratory of Translational Bioinformatics in Neuroscience, Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
15
Bianchi V, Colantoni A, Calderone A, Ausiello G, Ferrè F, Helmer-Citterich M. DBATE: database of alternative transcripts expression. Database (Oxford) 2013; 2013:bat050. [PMID: 23842462 PMCID: PMC5654372 DOI: 10.1093/database/bat050] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
The use of high-throughput RNA sequencing technology (RNA-seq) allows whole-transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Linking the expression of individual splicing variants to their functional roles is still an open and difficult problem. To address it, we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotations of alternative splicing variants. We processed 13 large RNA-seq panels from healthy human tissues and disease conditions. For each splicing variant, we report expression levels and functional annotations gathered and integrated from different sources with a variant-specific annotation transfer pipeline. Complex queries that cross-reference different functional annotations make it possible to retrieve the desired subsets of splicing variant expression values, which can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, transcriptome expression is shaped. Database URL: http://bioinformatica.uniroma2.it/DBATE/.
Affiliation(s)
- Valerio Bianchi
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica s.n.c., Rome 00133, Italy
16
Spinelli R, Pirola A, Redaelli S, Sharma N, Raman H, Valletta S, Magistroni V, Piazza R, Gambacorti-Passerini C. Identification of novel point mutations in splicing sites integrating whole-exome and RNA-seq data in myeloproliferative diseases. Mol Genet Genomic Med 2013; 1:246-59. [PMID: 24498620 PMCID: PMC3865592 DOI: 10.1002/mgg3.23] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 02/22/2013] [Revised: 05/22/2013] [Accepted: 05/24/2013] [Indexed: 12/13/2022]
Abstract
Point mutations in intronic regions near mRNA splice junctions can affect the splicing process. To identify novel splicing variants from exome sequencing data, we developed SpliceFinder, a bioinformatic splice-site prediction procedure for analyzing next-generation sequencing (NGS) data. SpliceFinder integrates two functional annotation tools for NGS, ANNOVAR and MutationTaster, and two canonical splice-site prediction programs for single-mutation analysis, SSPNN and NetGene2. Using SpliceFinder, we identified somatic mutations affecting RNA splicing in a colon cancer sample, in eight patients with atypical chronic myeloid leukemia (aCML), and in eight patients with CML. A novel homozygous splicing mutation was found in APC (NM_000038.4:c.1312+5G>A), and six heterozygous ones were found in GNAQ (NM_002072.2:c.735+1C>T), ABCC3 (NM_003786.3:c.1783-1G>A), KLHDC1 (NM_172193.1:c.568-2A>G), HOOK1 (NM_015888.4:c.1662-1G>A), SMAD9 (NM_001127217.2:c.1004-1C>T), and DNAH9 (NM_001372.3:c.10242+5G>A). By integrating whole-exome and RNA sequencing in aCML and CML, we assessed the phenotypic effect of the mutations on mRNA splicing for GNAQ, ABCC3, and HOOK1. In ABCC3 and HOOK1, RNA-Seq showed aberrant transcripts with activation of a cryptic splice site or intron retention, validated by reverse transcription-polymerase chain reaction (RT-PCR) in the case of HOOK1. In GNAQ, RNA-Seq showed 22% wild-type transcript and 78% mRNA skipping exon 5, resulting in a 4–6 frameshift fusion confirmed by RT-PCR. The pipeline can help identify intronic variants that affect RNA sequence by complementing conventional exome analysis.
Affiliation(s)
- Roberta Spinelli
- Department of Health Sciences, University of Milano-Bicocca, Monza, Italy
- Alessandra Pirola
- Department of Health Sciences, University of Milano-Bicocca, Monza, Italy
- Sara Redaelli
- Department of Health Sciences, University of Milano-Bicocca, Monza, Italy
- Nitesh Sharma
- Department of Health Sciences, University of Milano-Bicocca, Monza, Italy
- Hima Raman
- Department of Health Sciences, University of Milano-Bicocca, Monza, Italy
- Simona Valletta
- Department of Health Sciences, University of Milano-Bicocca, Monza, Italy
- Vera Magistroni
- Department of Health Sciences, University of Milano-Bicocca, Monza, Italy
- Rocco Piazza
- Department of Health Sciences, University of Milano-Bicocca, Monza, Italy
- Carlo Gambacorti-Passerini
- Department of Health Sciences, University of Milano-Bicocca, Monza, Italy; Hematology and Clinical Research Unit, San Gerardo Hospital, Monza, Italy
17
Riera M, Burguera D, Garcia-Fernàndez J, Gonzàlez-Duarte R. CERKL knockdown causes retinal degeneration in zebrafish. PLoS One 2013; 8:e64048. [PMID: 23671706 PMCID: PMC3650063 DOI: 10.1371/journal.pone.0064048] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 12/24/2012] [Accepted: 04/08/2013] [Indexed: 12/21/2022]
Abstract
The human CERKL gene is responsible for common and severe forms of retinal dystrophy. Despite intense in vitro studies at the molecular and cellular levels and in vivo analyses of the retina of murine knockout models, CERKL function remains unknown. In this study, we aimed to characterize the developmental and functional features of cerkl in Danio rerio within an Evo-Devo framework. We show that gene expression increases from early developmental stages until the formation of the retina in the optic cup. Unlike the high multiplicity of CERKL mRNA isoforms shown in mammals, the moderate transcriptional complexity in fish facilitates phenotypic studies derived from gene silencing. Moreover, of relevance to pathogenicity, teleost CERKL shares the two main human protein isoforms. Morpholino injection was used to generate a cerkl knockdown zebrafish model. The morphant phenotype consists of abnormal eye development with lamination defects, failure to develop photoreceptor outer segments, increased apoptosis of retinal cells and small eyes. Our data suggest that zebrafish Cerkl does not interfere with proliferation and neural differentiation during early developmental stages but is relevant for the survival and protection of the retinal tissue. Overall, we propose that this zebrafish model is a powerful tool to unveil the contribution of CERKL to human retinal degeneration.
Affiliation(s)
- Marina Riera
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Institut de Biomedicina (IBUB), Universitat de Barcelona, Barcelona, Spain
- CIBERER, Instituto de Salud Carlos III, Barcelona, Spain
- Demian Burguera
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Institut de Biomedicina (IBUB), Universitat de Barcelona, Barcelona, Spain
- Jordi Garcia-Fernàndez
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Institut de Biomedicina (IBUB), Universitat de Barcelona, Barcelona, Spain
- Roser Gonzàlez-Duarte
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Institut de Biomedicina (IBUB), Universitat de Barcelona, Barcelona, Spain
- CIBERER, Instituto de Salud Carlos III, Barcelona, Spain
18
Colak R, Kim T, Michaut M, Sun M, Irimia M, Bellay J, Myers CL, Blencowe BJ, Kim PM. Distinct types of disorder in the human proteome: functional implications for alternative splicing. PLoS Comput Biol 2013; 9:e1003030. [PMID: 23633940 PMCID: PMC3635989 DOI: 10.1371/journal.pcbi.1003030] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 05/14/2012] [Accepted: 02/26/2013] [Indexed: 01/07/2023]
Abstract
Intrinsically disordered regions have been associated with various cellular processes and are implicated in several human diseases, but their exact roles remain unclear. We previously defined two classes of conserved disordered regions in budding yeast, referred to as "flexible" and "constrained" conserved disorder. In flexible disorder, the property of disorder has been positionally conserved during evolution, whereas in constrained disorder, both the amino acid sequence and the property of disorder have been conserved. Here, we show that flexible and constrained disorder are widespread in the human proteome, and are particularly common in proteins with regulatory functions. Both classes of disordered sequences are highly enriched in regions of proteins that undergo tissue-specific (TS) alternative splicing (AS), but not in regions of proteins that undergo general (i.e., not tissue-regulated) AS. Flexible disorder is more highly enriched in TS alternative exons, whereas constrained disorder is more highly enriched in exons that flank TS alternative exons. These latter regions are also significantly more enriched in potential phosphosites and other short linear motifs associated with cell signaling. We further show that cancer driver mutations are significantly enriched in regions of proteins associated with TS and general AS. Collectively, our results point to distinct roles for TS alternative exons and flanking exons in the dynamic regulation of protein interaction networks in response to signaling activity, and they further suggest that alternatively spliced regions of proteins are often functionally altered by mutations responsible for cancer.
Affiliation(s)
- Recep Colak
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- TaeHyung Kim
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Magali Michaut
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- Mark Sun
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Manuel Irimia
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- Jeremy Bellay
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of America
- Chad L. Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of America
- Benjamin J. Blencowe
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- Philip M. Kim
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of America
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
19
Jacobs E, Mills JD, Janitz M. The role of RNA structure in posttranscriptional regulation of gene expression. J Genet Genomics 2012; 39:535-43. [PMID: 23089363 DOI: 10.1016/j.jgg.2012.08.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 06/27/2012] [Revised: 08/16/2012] [Accepted: 08/17/2012] [Indexed: 01/18/2023]
Abstract
As more information is gathered on the mechanisms of transcription and translation, it is becoming apparent that these processes are highly regulated. The formation of mRNA secondary and tertiary structures is one such regulatory process that, until recently, had not been analysed in depth. Formation of these mRNA structures has the potential to enhance or inhibit alternative splicing of transcripts and to regulate the rate and amount of translation. As this regulatory mechanism potentially acts at both the transcriptional and translational levels, while also potentially utilising the vast array of non-coding RNAs, it warrants further investigation. Currently, a variety of high-throughput sequencing techniques, including parallel analysis of RNA structure (PARS), fragmentation sequencing (FragSeq) and selective 2'-hydroxyl acylation analysed by primer extension (SHAPE), lead the way in the genome-wide identification and analysis of mRNA structure formation. These new sequencing techniques highlight the diversity and complexity of the transcriptome and demonstrate another regulatory mechanism that could become a target for new therapeutic approaches.
Affiliation(s)
- Elina Jacobs
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney NSW 2052, Australia
20
Frankish A, Mudge JM, Thomas M, Harrow J. The importance of identifying alternative splicing in vertebrate genome annotation. Database (Oxford) 2012; 2012:bas014. [PMID: 22434846 PMCID: PMC3308168 DOI: 10.1093/database/bas014] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 12/17/2022]
Abstract
While alternative splicing (AS) can potentially expand the functional repertoire of vertebrate genomes, relatively few AS transcripts have been experimentally characterized. We describe our detailed manual annotation of vertebrate genomes, which is generating a publicly available geneset rich in AS. To achieve this, we have adopted a highly sensitive approach to annotating gene models supported by correctly mapped, canonically spliced transcriptional evidence, combined with a highly cautious approach to adding unsupported extensions to models and to making decisions on their functional potential. We use information about the predicted functional potential and structural properties of every AS transcript annotated at a protein-coding or non-coding locus to place it into one of eleven subclasses. We describe the incorporation of new sequencing and proteomics technologies into our annotation pipelines, which are used to identify and validate AS. Combining all data sources has led to the production of a rich geneset containing an average of 6.3 AS transcripts for every human multi-exon protein-coding gene. The datasets produced have proved very useful in providing context to studies investigating the functional potential of genes and the effect that variation may have on gene structure and function. Database URLs: http://www.ensembl.org/index.html, http://vega.sanger.ac.uk/index.html
Affiliation(s)
- Adam Frankish
- Human and Vertebrate Analysis and Annotation Team, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
21
Severing EI, van Dijk ADJ, Morabito G, Busscher-Lange J, Immink RGH, van Ham RCHJ. Predicting the impact of alternative splicing on plant MADS domain protein function. PLoS One 2012; 7:e30524. [PMID: 22295091 PMCID: PMC3266260 DOI: 10.1371/journal.pone.0030524] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 08/29/2011] [Accepted: 12/18/2011] [Indexed: 11/18/2022]
Abstract
Several genome-wide studies have demonstrated that alternative splicing (AS) significantly increases transcriptome complexity in plants. However, the impact of AS on the functional diversity of proteins is difficult to assess using genome-wide approaches. The availability of detailed sequence annotations for specific genes and gene families allows a more detailed assessment of the potential effect of AS on their function. One example is the plant MADS-domain transcription factor family, whose members interact to form protein complexes that function in transcription regulation. Here, we perform an in silico analysis of the potential impact of AS on the protein-protein interaction capabilities of MIKC-type MADS-domain proteins. We first confirmed the expression of transcript isoforms resulting from predicted AS events. Expressed transcript isoforms were considered functional if they were likely to be translated and if their corresponding AS events either affected predicted dimerisation motifs or occurred in regions known to be involved in multimeric complex formation, or otherwise, if their effect was conserved in different species. Nine out of twelve MIKC MADS-box genes predicted to produce multiple protein isoforms harbored putative functional AS events according to those criteria. AS events with conserved effects were found only at the borders of, or within, the K-box domain. We illustrate how AS can contribute to the evolution of interaction networks through an example of selective inclusion of a recently evolved interaction motif in the MADS AFFECTING FLOWERING1-3 (MAF1-3) subclade. Furthermore, we demonstrate the potential effect of an AS event in SHORT VEGETATIVE PHASE (SVP), which results in the deletion of a short sequence stretch including a predicted interaction motif, by overexpressing the fully spliced and the alternatively spliced SVP transcripts. For most of the AS events we were able to formulate hypotheses about the potential impact on the interaction capabilities of the encoded MIKC proteins.
Affiliation(s)
- Edouard I. Severing
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University, Wageningen, The Netherlands
- Aalt D. J. van Dijk
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands
- Giuseppa Morabito
- Plant Developmental Systems, Plant Research International, Wageningen, The Netherlands
- Richard G. H. Immink
- Centre for BioSystems Genomics, Wageningen, The Netherlands
- Plant Developmental Systems, Plant Research International, Wageningen, The Netherlands
- Roeland C. H. J. van Ham
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University, Wageningen, The Netherlands
22
Fukuchi S, Hosoda K, Homma K, Gojobori T, Nishikawa K. Binary classification of protein molecules into intrinsically disordered and ordered segments. BMC Struct Biol 2011; 11:29. [PMID: 21693062 PMCID: PMC3199747 DOI: 10.1186/1472-6807-11-29] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Received: 02/16/2011] [Accepted: 06/22/2011] [Indexed: 11/17/2022]
Abstract
Background: Although structural domains (SDs) in proteins are important, half of the regions in the human proteome currently have no SD assignment. These unassigned regions consist not only of novel SDs but also of intrinsically disordered (ID) regions, since proteins, especially those in eukaryotes, generally contain a significant fraction of ID regions. As ID regions can be inferred from amino acid sequences, a method that combines SD and ID region assignments can determine the fractions of SDs and ID regions in any proteome. Results: In contrast to other available ID prediction programs that merely identify likely ID regions, the DICHOT system we previously developed classifies the entire protein sequence into SDs and ID regions. Application of DICHOT to the human proteome revealed that, on a per-residue basis, ID regions constitute 35%, SDs with similarity to PDB structures comprise 52%, and SDs with no similarity to PDB structures account for the remaining 13%. The last group consists of novel structural domains, termed cryptic domains, which are good targets for structural genomics. Applying DICHOT to the proteomes of other model organisms indicated that eukaryotes generally have high ID content, while prokaryotes do not. In human proteins, ID content differs among subcellular localizations: nuclear proteins had the highest residue-wise ID fraction (47%), while mitochondrial proteins had the lowest (13%). Phosphorylation and O-linked glycosylation sites were found to be located preferentially in ID regions. As O-linked glycans are attached to residues in the extracellular regions of proteins, the modification is likely to protect the ID regions from proteolytic cleavage in the extracellular environment. Alternative splicing events tend to occur more frequently in ID regions. We interpret this as evidence that natural selection operates at the protein level in alternative splicing.
Conclusions: We classified entire protein sequences into two categories, SDs and ID regions, and thereby obtained various kinds of complete genome-wide statistics. The results of the present study provide important basic information for understanding protein structural architectures and have been made publicly available at http://spock.genes.nig.ac.jp/~genome/DICHOT.
Affiliation(s)
- Satoshi Fukuchi
- Center for Information Biology & DNA Data Bank of Japan, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan.
23
Floris M, Raimondo D, Leoni G, Orsini M, Marcatili P, Tramontano A. MAISTAS: a tool for automatic structural evaluation of alternative splicing products. Bioinformatics 2011; 27:1625-9. [PMID: 21498402 PMCID: PMC3106191 DOI: 10.1093/bioinformatics/btr198] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 02/05/2023]
Abstract
Motivation: Analysis of the human genome revealed that the amount of transcribed sequence is an order of magnitude greater than that accounted for by the predicted and well-characterized genes. A sizeable fraction of these transcripts are alternatively spliced forms of known protein-coding genes. Inspection of the alternatively spliced transcripts identified in the pilot phase of the ENCODE project has clearly shown that their structure can often differ substantially from that of other isoforms of the same gene, and therefore that they might perform unrelated functions or might not even correspond to a functional protein. Identifying these cases is obviously relevant for the functional assignment of gene products and for interpreting the effect of variations in the corresponding proteins. Results: Here we describe a publicly available tool that, given a gene or a protein, retrieves and analyses all its annotated isoforms, provides users with three-dimensional models of the isoform(s) of interest whenever possible, and automatically assesses whether homology-derived structural models correspond to plausible structures. This information is relevant because, when the homology model of some isoforms of a gene does not seem structurally plausible, either they assume a structure unrelated to that of the other isoforms of the same gene, with presumably significant functional differences, or they do not correspond to functional products. We provide indications that the second hypothesis is likely to be true for a substantial fraction of the cases. Availability: http://maistas.bioinformatica.crs4.it/. Contact: anna.tramontano@uniroma1.it
Affiliation(s)
- Matteo Floris
- CRS4-Bioinformatics Laboratory, c/o Sardegna Ricerche Scientific Park, Pula, 09010 Cagliari, Italy
24
Characterization of an alternative splicing by a NAGNAG splice acceptor site in the porcine KIT gene. Genes Genomics 2011. [DOI: 10.1007/s13258-010-0156-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
25
Leoni G, Le Pera L, Ferrè F, Raimondo D, Tramontano A. Coding potential of the products of alternative splicing in human. Genome Biol 2011; 12:R9. [PMID: 21251333 PMCID: PMC3091307 DOI: 10.1186/gb-2011-12-1-r9] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 07/31/2010] [Revised: 12/17/2010] [Accepted: 01/20/2011] [Indexed: 12/22/2022]
Abstract
Background: Analysis of the human genome has revealed that as much as an order of magnitude more of the genomic sequence is transcribed than is accounted for by the predicted and characterized genes. A number of these transcripts are alternatively spliced forms of known protein coding genes; however, it is becoming clear that many of them do not necessarily correspond to a functional protein. Results: In this study we analyze alternative splicing isoforms of human gene products that are unambiguously identified by mass spectrometry and compare their properties with those of isoforms of the same genes for which no peptide was found in publicly available mass spectrometry datasets. We analyze them in detail for the presence of uninterrupted functional domains and active sites, as well as for the plausibility of their predicted structure. We report how well each of these strategies, and their combination, can correctly identify translated isoforms and derive a lower limit for their specificity, that is, their ability to correctly identify non-translated products. Conclusions: The most effective strategy for correctly identifying translated products relies on the conservation of active sites, but it can only be applied to a small fraction of isoforms, while reasonably high coverage, sensitivity and specificity can be achieved by analyzing the presence of non-truncated functional domains. Combining the latter with an assessment of the plausibility of the modeled structure of the isoform increases both coverage and specificity at a moderate cost in sensitivity.
Affiliation(s)
- Guido Leoni
- Dipartimento di Scienze Biochimiche, Sapienza Università di Roma, P.le A. Moro, 5 - 00185 Rome, Italy
26
Hegyi H, Kalmar L, Horvath T, Tompa P. Verification of alternative splicing variants based on domain integrity, truncation length and intrinsic protein disorder. Nucleic Acids Res 2010; 39:1208-19. [PMID: 20972208 PMCID: PMC3045584 DOI: 10.1093/nar/gkq843]
Abstract
According to current estimates, ∼95% of multi-exonic human protein-coding genes undergo alternative splicing (AS). However, of 4000 human proteins in the PDB, only 14 have structures of at least two alternative isoforms. Surveying these structural isoforms revealed that the maximum insertion accommodated by an isoform of a fully ordered protein domain was 5 amino acids; other instances of domain changes involved intrinsic structural disorder. After collecting 505 minor isoforms of human proteins with evidence for their existence, we analyzed their length, protein disorder and exposed hydrophobic surface. We found that strict rules govern the selection of alternative splice variants, aimed at preserving the integrity of globular domains: alternative splice sites (i) tend to avoid globular domains, (ii) affect them only marginally, (iii) tend to coincide with a location where the exposed hydrophobic surface is minimal, or (iv) fall where the protein is disordered. We also observed an inverse correlation between the fraction of the domain lost and the full length of the minor isoform containing the domain, possibly indicating a buffering effect of the isoform protein that counteracts the domain truncation. These observations provide the basis for a prediction method (currently under development) to predict the viability of splice variants.
Affiliation(s)
- Hedi Hegyi
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, PO Box 7, 1518 Budapest, Hungary.
27
Zambelli F, Pavesi G, Gissi C, Horner DS, Pesole G. Assessment of orthologous splicing isoforms in human and mouse orthologous genes. BMC Genomics 2010; 11:534. [PMID: 20920313 PMCID: PMC3091683 DOI: 10.1186/1471-2164-11-534] [Received: 05/07/2010] [Accepted: 10/01/2010]
Abstract
Background: Recent discoveries have highlighted the fact that alternative splicing and alternative transcripts are the rule, rather than the exception, in metazoan genes. Since multiple transcript and protein variants expressed by the same gene are, by definition, structurally distinct and need not be functionally equivalent, the concept of gene orthology should be extended to the transcript level in order to describe evolutionary relationships between structurally similar transcript variants. In other words, the identification of true orthology relationships between gene products should now progress beyond primary sequence, and "splicing orthology", consisting of ancestrally shared exon-intron structures, is required to define orthologous isoforms at the transcript level. Results: As a first step in this direction, we performed a large-scale human-mouse gene comparison with a twofold goal: first, to assess whether and to what extent traditional gene annotations such as RefSeq capture genuine splicing orthology; second, to provide a more detailed annotation and quantification of true human-mouse orthologous transcripts, defined as transcripts of orthologous genes exhibiting the same splicing patterns. Conclusions: We observed an identical exon/intron structure for 32% of human and mouse orthologous genes. This figure increases to 87% using less stringent criteria for gene structure similarity, implying that for about 13% of the human RefSeq annotated genes (and about 25% of the corresponding transcripts) we could not identify any mouse transcript similar enough to be confidently assigned as a splicing ortholog. Our data suggest that current gene and transcript data may still be rather incomplete, with several splicing variants still unknown.
The observation that alternative splicing produces large numbers of alternative transcripts and proteins, some of them conserved across species and others truly species-specific, suggests that, while maintaining the conventional definition of gene orthology, a new concept of "splicing orthology" can be defined at the transcript level.
Affiliation(s)
- Federico Zambelli
- Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, Milan, Italy
28
Barbazuk WB. A conserved alternative splicing event in plants reveals an ancient exonization of 5S rRNA that regulates TFIIIA. RNA Biol 2010; 7:397-402. [PMID: 20699638 DOI: 10.4161/rna.7.4.12684]
Abstract
Uncovering conserved alternative splicing (AS) events can identify AS events that perform important functions. This is especially useful for identifying AS isoforms containing a premature termination codon (PTC), which may regulate protein expression by being targets for nonsense-mediated decay. This report discusses the identification of a PTC-containing splice isoform of the TFIIIA gene that is highly conserved in land plants. TFIIIA is essential for RNA polymerase III-based transcription of 5S rRNA in eukaryotes. Two independent groups have determined that the PTC-containing alternative exon is ultraconserved and is coupled with nonsense-mediated mRNA decay. The alternative exon appears to have been derived by the exonization of 5S ribosomal RNA (5S rRNA) within the gene of its own transcription regulator, TFIIIA. This provides the first evidence of ancient exaptation of 5S rRNA in plants, suggesting a novel gene regulation model mediated by the AS of an anciently exonized non-coding element.
Affiliation(s)
- W Brad Barbazuk
- Department of Biology and the Florida Genetics Institute, University of Florida, Gainesville, FL USA.